

Beware The Hype Over Big Data Analytics - yannickt
http://seekingalpha.com/article/441171-beware-the-hype-over-big-data-analytics

======
stiff
I am not sure if I disagree with the central thesis of the article, but the
argumentation seems rather weak in many places. For example, he says:

 _Let's take the example of the number of people who walk into a store and buy
a certain product. By studying 10 customers you can get an estimate of the
probability of the purchase and predict sales; by studying 100 customers you
get a better estimate. By analyzing 1000 customers you have a very good
estimate. You could study a million customers, but the results are unlikely to
be vastly different from the results of analyzing 1000 customers and probably
not worth the expense._
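
To put rough numbers on the quoted claim, here is a minimal sketch of how the standard error of an estimated purchase probability shrinks with sample size. It assumes a single yes/no outcome per customer and independent customers; the 20% purchase rate and the function name `standard_error` are made up for illustration, not taken from the article.

```python
import math

def standard_error(p, n):
    """Standard error of a proportion estimated from n independent customers."""
    return math.sqrt(p * (1.0 - p) / n)

p = 0.2  # hypothetical true purchase probability (an assumption)
for n in (10, 100, 1000, 1_000_000):
    # The error shrinks like 1/sqrt(n): 100x more data buys only 10x less error.
    print(f"n={n:>9,}  standard error ~ {standard_error(p, n):.4f}")
```

For one number like this the returns do diminish quickly, which is the one-dimensional case the author has in mind.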

This makes sense only for very low-dimensional data sets, but most real-life
problems are very high-dimensional and in high-dimensional spaces a million
customers (or any other number of samples you might reasonably expect to
gather) might still very well be only a small sample. So, there are situations
where because of various constraints you will never be able to get to this
level where more data stops yielding better results. You can watch a nice
presentation from Peter Norvig about the unreasonable effectiveness of data:

<http://www.youtube.com/watch?v=yvDCzhbjYWs>

Or read about the well-known curse of dimensionality:

<http://en.wikipedia.org/wiki/Curse_of_dimensionality>
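
A back-of-the-envelope sketch of why a million samples can still be small: if each customer is described by several attributes and each attribute is bucketed into a handful of levels, the number of distinct combinations grows exponentially. The bucket count of 10 and the helper name `max_cell_coverage` are assumptions for illustration only.

```python
def max_cell_coverage(n_samples, n_features, levels_per_feature=10):
    """Upper bound on the fraction of attribute combinations that
    n_samples customers could possibly cover (one customer per cell)."""
    cells = levels_per_feature ** n_features
    return min(1.0, n_samples / cells)

for d in (1, 3, 6, 10, 20):
    coverage = max_cell_coverage(1_000_000, d)
    print(f"{d:>2} features: at most {coverage:.2e} of combinations observed")
```

Under these assumptions a million customers could at best touch every combination of 6 attributes, but only a tiny fraction of the combinations of 10 or more.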

~~~
kvh
That specific point stuck out as incorrect to me as well. And given that it's
one of the only technical points in the article, it doesn't really inspire
confidence in the author. That said, I do tend to agree with the
overall hypothesis that the value add of "big data" is very marginal for most
established industries, for exactly the reason he mentions: they've all been
doing this for 20 years now. It's just become cheaper and easier recently.
There are certain domains where modern data techniques can have a
transformative effect; he mentions one (law). I don't think the data analytics
solutions and companies that are being talked about are capable of generating
these kinds of transformative innovations, though, so I think the author is
absolutely correct to say that the big data hype is overblown. There _will_ be
big opportunities for big data, but they will come from disruptive startups
finding new ways of doing old jobs. Not from crunching a few more GB of your
data for a few less bucks.

------
noelwelsh
This article seems to be discussing a very different 'Big Data' from the one I see.
I'm sure the author's points are true in their world, but largely they aren't
in mine (mostly online media stuff). Here's a rebuttal of a sort:

This data collection is a recent phenomenon: only since media moved online has
it been cheap enough to collect sufficient data to do interesting things.
Furthermore, you can actually use that data in more meaningful ways (e.g.
recommendation systems) that weren't possible before.

That value is not being extracted by the companies that do currently collect
data: surprisingly true from my discussions with media companies. The largest
companies (Yahoo!, Netflix, etc.) sure do, but the second tier don't.

That companies will need outside help to extract insights: This is perhaps
indicative of the biggest difference between our worlds. It seems the author
is talking about batch jobs, where you do some rudimentary analysis on a data
set, make some slides, present to management, and go home. The people I know are
talking about deploying automated systems into production (think
recommendation systems again). This requires a large array of skills (machine
learning, software engineering, distributed systems, etc.) that many companies
don't have. "Minitab, SAS, SPSS, Systat" are not sufficient tools in this
world.

The insights gathered from ever larger data sets have more value and are more
accurate than insights gathered from smaller data sets: The conventional
wisdom in machine learning is that more data trumps more sophisticated
algorithms. E.g.
<http://anand.typepad.com/datawocky/2008/03/more-data-usual.html>

Unstructured and cross functional data have huge value waiting to be
extracted: Here we agree!

~~~
lancewiggs
Perhaps, but many of the basic warnings still apply. In industry and web
businesses alike I see the biggest issue is not capturing or storing the data;
it's one of picking up the tools and data already at their disposal and using
them to help customers. That's not a gap outside consultants are going to help
a lot with as it is mainly a cultural thing, and smart web businesses have
that from the start.

------
bearmf
This guy comes off as a cynical MBA, and he actually is one. His valid points
are:

\- big data is overhyped

\- potential benefits are likely smaller than you think (this is true for most
new technologies)

BUT:

\- he seems unaware of modern analysis tools and their power

\- state of the art in machine learning has actually advanced a lot in the
last 20 years

\- as a result a lot of problems that were intractable back then are being
solved (Jeopardy, Go)

\- there are completely new problems (all the social stuff, huge recommender
systems)

\- big companies are not good at innovating

\- outsourcing is MUCH harder than it seems, especially to Bangalore

------
iusable
Glad to see somebody paint the counter-point to all the hype. I don't believe
that there is no substance to the 'big data story' but these points hold up
well.

------
jasonkolb
A rebuttal to these arguments:
<http://www.applieddatalabs.com/content/beware-hype-over-big-data-analytics>

------
claudiusd
A big reason companies need outside help is that they don't share data and
results - if something works at company A and saves them a lot of money, that
information isn't passed to company B and thus you have everyone trying to
reinvent the wheel.

A third party that works with many companies builds up a vocabulary of
techniques, data sets, and experience that allow them to enter a new
environment and be 10 times as effective as any internal team of analysts.

Don't know if anyone has heard of Lattice Engines
(<http://www.lattice-engines.com/>) but they are today integrating external data with internal
cross-functional data to extract insights that are saving large companies
(that I'm sure already have teams of very talented analysts) millions of
dollars.

------
ivan_ah
startup + custom analytics + fast response = good

big corp + generic big data + corporate management = ???

Established corporations may be too sluggish to incorporate the insights from
the data they are sitting on. Smaller companies will be able to do magical
things with machine learning techniques, and will have the agility to act on
their insights.

Thus, it is not so much that big data doesn't have value, but that you need a
whole management-fat-head-free organizational structure to take advantage of
the value.

