

Big Data vs Intelligent Data (and what Startups can do with it) - bialecki
http://www.klaviyo.com/blog/2012/09/05/big-data-vs-intelligent-data-startups/

======
equark
Another key fact is that "big data" is actually not that common, especially
when it gets to the analysis stage.

The median job size at Microsoft and Yahoo is only 15GB. And 90% of Hadoop
jobs at Facebook are under 100GB. Clearly you want to be able to crunch large
log files, but in terms of day-to-day analysis the files are much smaller than
that. (cite:
<http://research.microsoft.com/pubs/163083/hotcbp12%20final.pdf>).

At Sense (<http://www.senseplatform.com>) most of the clients we work with are
struggling not with the size of their data but with tricky modeling problems
that don't fit into standard black boxes and with integrating analytics into
actual production systems. Adopting something like Hadoop for these tasks is
not very productive.
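To make the job-size point concrete: a log file in the tens of gigabytes can often be aggregated by a single process that streams through it line by line. This is a minimal sketch, not the approach Sense uses. It assumes a hypothetical tab-separated log of `user_id<TAB>event_name` lines, and `count_events` is a hypothetical helper.

```python
import io
from collections import Counter


def count_events(lines):
    """Stream over log lines once, keeping only a small counter in memory."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # Hypothetical format: user_id<TAB>event_name
        _user_id, event = line.split("\t", 1)
        counts[event] += 1
    return counts


sample = io.StringIO("u1\tsignup\nu2\tpurchase\nu1\tpurchase\n\n")
print(count_events(sample).most_common())  # [('purchase', 2), ('signup', 1)]
```

On a real file you would pass an open file handle, e.g. `with open("events.tsv") as f: count_events(f)` (the filename is hypothetical). Memory use grows with the number of distinct events, not with the size of the file.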

~~~
noelwelsh
Interesting paper (and makes me feel more justified in rejecting Hadoop). Do
you have any blog posts / other material about the techniques you're using at
Sense?

~~~
equark
Unfortunately no, but you're welcome to email me at tristan@senseplatform.com.

------
wookietrader
From a data analyst's perspective, let's go through what he says.

First he states something along the lines of "More data does not always help."
This is right from a theoretical perspective. But: it never hurts. This is
also right from a theoretical perspective; it's a result from probability
theory: additional observations will always lead to lower or equal variance in
your estimates. There is no data like more data. There is no downside to
more data.
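The variance claim can be checked with a quick simulation. This is a minimal sketch, assuming independent, identically distributed observations (here standard normal draws), for which theory gives Var(sample mean) = σ²/n. `variance_of_sample_mean` is a hypothetical helper.

```python
import random
import statistics


def variance_of_sample_mean(n, trials=1000, seed=0):
    """Empirical variance of the mean of n i.i.d. standard normal draws."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(0.0, 1.0) for _ in range(n))
        for _ in range(trials)
    ]
    return statistics.pvariance(means)


for n in (10, 100, 1000):
    # Theory for i.i.d. data: Var(mean) = sigma^2 / n, i.e. 1 / n here.
    print(n, round(variance_of_sample_mean(n), 4), 1 / n)
```

The empirical column should track 1/n closely. Correlated or biased observations would not shrink the variance this way, which is where the i.i.d. assumption matters.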

I am not sure in what way (2) and (3) relate to big data. I'd even say that
(3) is pro big data.

Then there is this term "intelligent data". I can't emphasize enough how
badly chosen this term is. Intelligence is related to the quality of actions
someone takes. Data does not take actions; it just "is". Data cannot be
intelligent, just as a stone cannot be intelligent. He also thinks that data
measurements should be repeatable. Guess what: in all interesting cases, data
measurements are _not_ repeatable due to randomness in the source itself. One
of the main challenges of data analysis is to still get robust results. He
also thinks that data should be concise, i.e. that the data set at hand should
be the smallest one that still leads to the same actions. This sounds like a
chicken-and-egg problem. How would you even be able to assess this without
trying it out?
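On getting robust results from measurements that aren't repeatable: one common technique (not necessarily what the commenter has in mind) is to prefer robust statistics such as the median over the mean. This sketch assumes a hypothetical measurement model of Gaussian noise around a true value, plus one corrupted reading. `noisy_measurements` is a hypothetical helper.

```python
import random
import statistics


def noisy_measurements(true_value, n, outlier=None, seed=1):
    """Repeated measurements of one quantity under a hypothetical noise model.

    Each measurement is true_value + N(0, 1) noise; optionally one corrupted
    reading is appended at the end.
    """
    rng = random.Random(seed)
    data = [true_value + rng.gauss(0.0, 1.0) for _ in range(n)]
    if outlier is not None:
        data.append(outlier)
    return data


clean = noisy_measurements(10.0, 50)
dirty = noisy_measurements(10.0, 50, outlier=1000.0)  # same draws + one bad value

# One bad reading drags the mean far off; the median barely moves.
print(statistics.mean(clean), statistics.mean(dirty))
print(statistics.median(clean), statistics.median(dirty))
```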

~~~
noelwelsh
You're neglecting that data costs money and time to collect and process. More
data is more cost.

I am in agreement with the spirit of this post (I'm not interested in arguing
whether intelligent data is a good term or not). Heck, I even blogged along
similar lines just a few days ago:
<http://noelwelsh.com/streaming-algorithms/2012/08/29/lean-data/>. Here are a
few problems with collecting everything:

- Big Data infrastructure like Hadoop is expensive and slow. It's very much
not in the turn-on-a-dime spirit of startups.

- If you collect everything, the value per data item is low. This limits the
analyses you can profitably do. Compare the value that Klaviyo can deliver per
data point with, say, Mixpanel's. (And then ask yourself why Mixpanel is moving
into "People" analytics. My suggestion: because it's much more valuable.)

Disclaimer 1: My startup, Myna [<http://mynaweb.com/>], had a shout-out in the
blog post.

Disclaimer 2: I'm a Klaviyo user, as of a few days ago.

~~~
washedup
Today the cost of storing more data is negligible, and there are alternatives
to Hadoop. However, capturing new variables that exist in your marketplace
definitely takes time and money.

------
photorized
Data can't be intelligent.

