

Big Data Is Useless Without Science - physcab
http://kaleidoscope.kontagent.com/2011/11/09/big-data-is-useless-without-science/

======
tryitnow
The author makes a good point. Unfortunately, many companies are led by people
who just don't get data, its analysis, or its potential for driving change.

Many executives got to where they are by either lingering around for a long
time and waiting for competent people to leave or by mastering the skills
their company needed 10 years ago, but which are now irrelevant.

This is leading to ample opportunities for startups and growth companies to
beat the living daylights out of bigger competitors. No amount of MBA-speak
and PowerPoint presentations can help you understand what to do with the data
deluge. These leaders can try to hire smart analysts, but how are they going
to be able to differentiate between data scientists who can make a real
contribution and those who can't?

A lot of bigger, more established companies won't make the cut, no matter how
much data they accumulate.

There's ample opportunity to disrupt established markets with data-driven
strategies, but this disruption probably won't come from larger, more
established companies.

------
tikhonj
While I agree that "science" (as defined in the article) is a very good way to
make use of data, I do not think the data without the science is useless.

For one, if you collect the data now you can always apply the "science" later
--as long as you have the data and can query it, you're always ready. This
alone means that collecting data now could be useful even if you have no idea
of what to do with it. Storage is relatively cheap, and you--or somebody else
--might come up with a clever way to analyze it in the future.

Additionally, I think the author underestimates the potential of machine
learning. I am by no means an expert in the field (I'm taking an AI class--
that has to count for something! ;) I hope) but even the simple techniques
we've covered can get some interesting information without much domain
knowledge involved. As ML evolves, I suspect there will be more and more
technology that can find interesting trends and relationships in data
regardless of what the data is actually modelling. The article does mention ML
a bit, but I think it will be much more significant in the near future.

Ultimately, figuring out what questions to ask about data and harnessing human
curiosity will give you much more than just hoarding data. I just think it's
possible that sufficiently abstract and generic approaches in the near future
will make it easier to get similar--or perhaps orthogonal--results using
techniques from ML and AI. I firmly believe that hoarding the data without
doing anything to it is still much better than not collecting it at all.

~~~
sausax82
I agree with you that collecting data can be a good thing, and it can be
analyzed at some later point. But in the case of the online world, the lifetime
of data is very short (this might not hold for other industries). Trends change
very rapidly, and using old data to make inferences might not be profitable and
could even lead to a loss.

~~~
spaznode
Actually, that's exactly the opposite of the truth. In fact you -need- old data to
validate that your mathematical model actually works by applying it to
whatever historical timespan of data you are trying to derive intelligence
from. At least if any sort of unit of time is applicable to any of your
models.
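
As a rough illustration of that kind of check, here is a minimal Python sketch,
assuming a simple linear trend model and a made-up metric. The names
`fit_line`, `backtest`, and `daily_signups` are hypothetical and come from
neither the article nor this thread. It fits on the older part of a time series
and scores the fit on the most recent part.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def backtest(series, train_size):
    """Fit on the first train_size points, then score on the later ones.

    Returns the mean absolute error on the held-out (most recent) points.
    """
    xs = list(range(len(series)))
    a, b = fit_line(xs[:train_size], series[:train_size])
    errors = [
        abs((a + b * x) - y)
        for x, y in zip(xs[train_size:], series[train_size:])
    ]
    return mean(errors)


# Hypothetical daily metric: made-up numbers for illustration only.
daily_signups = [10, 12, 14, 16, 18, 20, 22, 24, 26, 28]
holdout_mae = backtest(daily_signups, train_size=7)
print(holdout_mae)
```

A large held-out error would suggest the model does not carry over to newer
data. With these made-up, perfectly linear numbers the error is zero.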

Of course, just sitting on the data doesn't do anyone any good at all... and
randomly picking at stuff isn't likely to do much either unless you go in
knowing what you want to find.

------
spaznode
I think his main point was that people are already hoarding data but, as with
most knowledge, don't know what they're missing or clueless about without some
good statistical analysts.

My employer has a whole department of these analysts, and after working with
them closely it is obvious to me that they are the most important and valuable
asset in the whole company, both to us and to our customers. Engineers can pick up
on some of these things but these guys aren't just picking stuff out of their
ass or anything. I'd have to agree with the author that any company storing or
collecting any significant amount of data should give this stuff some serious
thought.

