
The Most Innovative Companies In Big Data - sgy
http://www.fastcompany.com/most-innovative-companies/2014/industry/big-data?utm_source=facebook
======
genofon
The right title should be: "The World's Top 10 Most Innovative Companies In
Big Data, according to their marketing teams." The article I would like to read
is about the real advances they made, not just "they are applying BIG DATA
to the X, Y industries". There is no description of the size of the problem,
nor of the advantages or details. (I'm especially looking at you, AYASDI...)

I've been to Big Data conferences where the main application was mean and
standard deviation on huge datasets. I'm just sick of companies inflating
the Big Data bubble (and this comes from a "data scientist").

~~~
michaelochurch
_I've been to Big Data conferences where the main application was mean and
standard deviation on huge datasets. I'm just sick of companies inflating
the Big Data bubble (and this comes from a "data scientist")._

"Data science" is a bizarre job description to me. Some companies' data
science teams are doing work that could be done in Excel, and others are doing
sophisticated machine learning research. In many companies, though, it's a
watered-down version of the (extinct, sadly) R&D job description. I've heard
people opine that 95% of "data scientists" have never implemented anything
more sophisticated than an SGD regression, and it wouldn't surprise me.

Granted, you can get some neat insights and visualizations with off-the-shelf
tools, but I still consider it important to know how the algorithms actually
work and what the basic assumptions are. For example, you can _usually_ use
linear regression for 2-class classification problems (as opposed to the
mathematically more correct logistic model, since probabilities are in [0, 1]
and a linear model's predictions are unbounded) and get a reasonable
predictive model, but it's worth knowing when (and why) that shortcut breaks
down.
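
A minimal sketch of that shortcut, under assumptions that are not from the
thread: the data is synthetic (one feature, made-up coefficients), the linear
model is ordinary least squares on the 0/1 labels, and the logistic model is
fit with plain gradient ascent. It compares thresholded accuracy and counts
how many linear "probabilities" land outside [0, 1].

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 2-class data (an assumption for illustration): one feature,
# class 1 becomes more likely as x grows.
n = 2000
x = rng.normal(0.0, 2.0, size=n)
p_true = 1.0 / (1.0 + np.exp(-1.5 * x))
y = (rng.random(n) < p_true).astype(float)

X = np.column_stack([np.ones(n), x])  # intercept column + feature

# Linear regression on the 0/1 labels (ordinary least squares).
beta_lin, *_ = np.linalg.lstsq(X, y, rcond=None)
score_lin = X @ beta_lin

# Logistic regression by gradient ascent on the mean log-likelihood.
beta_log = np.zeros(2)
for _ in range(5000):
    p = 1.0 / (1.0 + np.exp(-(X @ beta_log)))
    beta_log += 0.1 * X.T @ (y - p) / n
score_log = 1.0 / (1.0 + np.exp(-(X @ beta_log)))

# Classify by thresholding both scores at 0.5.
acc_lin = np.mean((score_lin > 0.5) == (y == 1))
acc_log = np.mean((score_log > 0.5) == (y == 1))

# Fraction of linear scores that cannot be read as probabilities.
out_of_range = np.mean((score_lin < 0) | (score_lin > 1))

print(f"accuracy  linear={acc_lin:.3f}  logistic={acc_log:.3f}")
print(f"linear scores outside [0, 1]: {out_of_range:.1%}")
```

On well-behaved data like this the two decision thresholds nearly coincide,
so accuracy barely differs. The shortcut is more likely to hurt when you need
the scores to be usable as probabilities, or when points far from the boundary
pull the least-squares line around.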

In software, you hear "data scientist syndrome" used to describe people who
have a lot of short job tenures because they (a) leave if there isn't
interesting work for them, and (b) tend to be the first laid off, not because
they're bad but because R&D is first to bleed when things go bad. The software
industry forces you to choose between short-term job security (people doing
interesting work are most exposed to organizational changes) and long-term
career health (people _not_ doing interesting work turn into dinosaurs). It
stands to reason that organizational credibility, which favors tenure, ends up
selecting for something other than deep knowledge of data science, which
requires a steady stream of interesting work and the job volatility that comes
with it.

It's sad that the typical corporate culture of reliable mediocrity, at the
expense of excellence, has also learned to use the vocabulary of "Big Data".
Reality: truly Big Data (> 10 TB) is a huge pain in the ass. It's what you
have to deal with when there is too much noise, or the model's essential
complexity is too high, to get a good model out of "small data".

~~~
dannypgh
Curious: why would you ever use linear regression for binary classification?

Logistic regression can be implemented as linear regression predicting the
input to the logistic function.
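
One way to read this, which may not be exactly what was meant: logistic
regression is commonly fit by iteratively reweighted least squares (IRLS),
where every step is a weighted linear regression whose target is a "working
response" on the log-odds scale, i.e. the input to the logistic function. A
rough sketch, with hypothetical names (`fit_logistic_irls`, `true_beta`) and
synthetic data:

```python
import numpy as np

def fit_logistic_irls(X, y, n_iter=25):
    """Logistic regression via iteratively reweighted least squares."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta                           # current log-odds
        p = 1.0 / (1.0 + np.exp(-eta))           # current probabilities
        w = np.clip(p * (1.0 - p), 1e-10, None)  # per-row weights
        z = eta + (y - p) / w                    # working response (log-odds scale)
        sw = np.sqrt(w)
        # Weighted linear regression of z on X.
        beta, *_ = np.linalg.lstsq(X * sw[:, None], z * sw, rcond=None)
    return beta

# Synthetic check (coefficients are made up for illustration).
rng = np.random.default_rng(1)
n = 5000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
true_beta = np.array([-0.5, 1.0])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(X @ true_beta)))).astype(float)

beta_hat = fit_logistic_irls(X, y)
print("true:", true_beta, "estimated:", beta_hat.round(3))
```

Note that you can't just regress on the logit of the raw labels, since the
logit of 0 or 1 is infinite; the working response sidesteps that.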

~~~
oldskoolbob2000
From my experience, it's easier to explain linear coefficients. Also, at least
in R, linear regression tends to run faster than logistic regression.

~~~
sgy
Might be a good read:
[http://mrvar.fdv.uni-lj.si/pub/mz/mz1.1/pohar.pdf](http://mrvar.fdv.uni-lj.si/pub/mz/mz1.1/pohar.pdf)

------
ghoul2
Eh. Not including Google in a list of big data companies is about as
ridiculous as not giving the Nobel Peace Prize to Gandhi. It doesn't take
anything away from Google/Gandhi, it just marks the list/prize as farcical.

~~~
Fishkins
Just for context, Google was #1 on their general "most innovative" list. I
think it's pretty common to try to have a variety of companies topping
different lists, even if one company really should be at the top of them all.
That doesn't make this list any more accurate, but it might have been what
they were thinking.

------
coldcode
What does "Big Data" even mean? Personally I think of it as some kind of
disruptive synergistic paradigm. Is it a sea change or a holistic approach?
Perhaps some kind of seamless mission-critical win-win? I hate when people use
overloaded terms that can be twisted to mean anything at all. To not put
Google in this list clearly shows it's just a random collection of companies,
or maybe based on advertising sales to the magazine.

~~~
oldskoolbob2000
Well said. To me, the definition of 'Big Data' is data so big that the
computation time is much greater than the data analysis/munging time. It has
nothing to do with how complex, robust, etc. the analysis is. Big data is
really not that sexy, but it's a fad that I'm riding because people are hiring
for data analysis skills.

------
jgalt212
I am a bit surprised to see both Palantir and Factual missing b/c 1. they are
doing interesting work, and 2. they probably spend a pretty penny on PR.

~~~
michaelochurch
Factual has smart people at every level (a rarity in the "tech" world) and
seems to be able to market itself organically to top developers. They have an
extremely competent core team that people want to work with. This doesn't
require a PR budget.

It's the Knewtons of the world that have to spend millions on PR. Actually,
Knewton is solidly OK at hiring good people but they usually leave within a
year because of the management.

~~~
S4M
I noticed several of your comments criticizing Knewton. Could you expand a
bit? I am not related to Knewton at all, but interested in ed tech. If you
don't want to say it in public, I'd appreciate if you could email me.

~~~
michaelochurch
I'll post half of it publicly and email you the other half.

Their management is known throughout the industry to be unethical. Some
recruiters refuse to work with them, and you'll hear some amazing stories if
you hang around New York. They've existed for almost 6 years, pivoted
constantly and recklessly, and delivered next to nothing. They're great at
using the founder's family connections to raise money and get partnerships,
but they treat engineers like commodities and have extreme architectural
instability. There's also one high-profile case of the execs spending months
trying to ruin the reputation of someone after he left.

Ultimately, their true mission is to fire teachers. That's a really terrible
mission, if you ask me. Obviously, the engineers and data scientists
(Knewton's management is scum, but they have good people at lower levels) are
being told a different story about "democratizing education", but their real
mission is to make teachers obsolete while making a few ed-pub incumbents
(like Pearson) very wealthy.

They're really damaging the reputation of the ed-tech space. It's a shame,
because there are some really good companies trying to advance the field, and
they're struggling to raise money due to the behavior (and declining
reputation) of a large no-longer-startup that has nothing to do with them.

------
capkutay
Excuse my marketing terminology, but I think Splunk was only successful because
they were the only enterprise "big data" player present on the way to the peak
of inflated expectations[0]. Technology-wise, it's easy to poke holes in, and
it's not even that valuable now that free indexing tools like elasticsearch[1]
are coming out and being paired with logstash. And if you think a distributed
inverted index for syslogs is the future of real-time analytics, you're
probably stuck in 2009.

0: [http://berglondon.com/wp-content/uploads/2010/07/trough.png](http://berglondon.com/wp-content/uploads/2010/07/trough.png)

1: [http://www.elasticsearch.org](http://www.elasticsearch.org)

------
beamer99
The slant in the list is toward companies doing interesting things with big
data, or taking data from unexpected areas like weather and mapping it onto
things like shopping habits. What I don't understand is how Splunk is on the
list, since operational data and business intelligence have historically been
a fit for big data. Nothing new. IBM is evangelizing big data. A bit of a weak
one, I thought, but the Smarter Cities initiatives are doing some good work.
There are some interesting companies on the list and in the comments section.

------
theyeti
A bit surprised not to see Google in that list.

