

R Is Not Enough For "Big Data"... - romil
http://www.forbes.com/sites/douglasmerrill/2012/05/01/r-is-not-enough-for-big-data/

======
mynegation
The title of the article is misleading. One may think that R as a tool is
worse than the alternatives. It's more like "current methods (or simple
methods) are not enough". R is as good for data science as any other tool (and
arguably is even better thanks to its extensive library of packages).

------
dxbydt
Amazingly bad article, riddled with inaccuracies. Is it because the stat/math
has to be toned down so much, since otherwise the average Forbes reader
can't grok the content? First he says, "mean and standard deviation define a
normal distribution". They do no such thing. You don't even know what
probability distribution attendance at sports events follows when you compute
mu and sigma. Then he says, ok, it's not normal, it's power law. But the reason
he gives for the power law is all wrong. It's not normal because there are no
"negative people" to lower the mean?! Ha ha ha!

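To make the mu/sigma point concrete, here is a rough sketch (simulated numbers, not real attendance data; the helper names `rescale` and `skewness` and the targets 100 and 15 are made up for illustration). It forces a normal sample and a power-law (Pareto) sample to share the same mean and standard deviation, then shows that their shapes still differ.

```python
import random
import statistics


def rescale(sample, mean, sd):
    """Shift and scale a sample so it has exactly this mean and population sd."""
    m = statistics.fmean(sample)
    s = statistics.pstdev(sample)
    return [mean + sd * (x - m) / s for x in sample]


def skewness(sample):
    """Average cubed z-score: near 0 for a symmetric sample, large for a long right tail."""
    m = statistics.fmean(sample)
    s = statistics.pstdev(sample)
    return statistics.fmean(((x - m) / s) ** 3 for x in sample)


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 10_000

# Simulated data only: the target mean (100) and sd (15) are arbitrary choices.
normal = rescale([rng.gauss(0.0, 1.0) for _ in range(n)], 100.0, 15.0)
heavy = rescale([rng.paretovariate(3.0) for _ in range(n)], 100.0, 15.0)

for name, sample in (("normal", normal), ("power law", heavy)):
    print(name,
          "mean=%.2f" % statistics.fmean(sample),
          "sd=%.2f" % statistics.pstdev(sample),
          "skewness=%.2f" % skewness(sample),
          "max=%.1f" % max(sample))
```

Both lines should report the same mean and sd, while the skewness and the maximum differ a lot. So the two numbers alone cannot tell you which distribution you are looking at.
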
His second example, the height-weight regression line, has little to do with
whether there is a physical relationship between height and weight. If you
really want to know, your weight might be a function of some two dozen
variables like body surface area, bone structure/density, genetics, food
intake, income levels, nature of work (sedentary vs active), etc. Even if you
gathered data on all those, 24 dimensions is a lot. So you do a principal
components analysis, and what does that do? The first PC from the PCA gives
you a conflated component that explains, say, 90% of your weight. That
conflated component is going to be a linear weighted sum of many of the 24
dimensions. That conflated component is just a math object; it doesn't exist
in reality. If you want, I can easily connect your body weight to the
attendance at your local sports event and a residual, and that regression
might even be a good fit :)
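
A minimal sketch of the "first PC is just a weighted sum" point, assuming three made-up correlated variables instead of 24 real ones. The data is simulated from one hidden factor, the function names are hypothetical, and power iteration stands in for a real PCA routine. Note that it measures the share of variance among the input variables, which is what PCA actually reports.

```python
import random
import statistics


def covariance_matrix(rows):
    """Sample covariance matrix of a list of equal-length observation rows."""
    n, d = len(rows), len(rows[0])
    means = [statistics.fmean(r[j] for r in rows) for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def first_principal_component(cov, iterations=200):
    """Leading eigenvector and eigenvalue of cov, found by power iteration."""
    d = len(cov)
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Rayleigh quotient v' C v gives the variance along the unit vector v.
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d))
                     for i in range(d))
    return v, eigenvalue


rng = random.Random(1)  # fixed seed so the run is repeatable
rows = []
for _ in range(2000):
    latent = rng.gauss(0.0, 1.0)  # hidden factor shared by all three variables
    rows.append([latent + 0.3 * rng.gauss(0.0, 1.0) for _ in range(3)])

cov = covariance_matrix(rows)
weights, variance = first_principal_component(cov)
explained = variance / sum(cov[i][i] for i in range(len(cov)))

print("weights of the first PC:", [round(w, 3) for w in weights])
print("share of total variance: %.3f" % explained)
```

The first component comes out as a blend of all three inputs with similar weights. It summarizes most of the variance, but it is not any one measurable quantity.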

In any case, none of this has anything to do with R, so his title is pure
linkbait.

~~~
jgmmo
I was really hoping this would say Big Data people are moving towards Python
-- or, I prayed, Ruby. But nah, this article had nothing to do with R or the
others.

------
streptomycin
> most stuff measured about humans is distributed like a power law

 _twitch_

<http://cscs.umich.edu/~crshalizi/weblog/491.html>

<http://cscs.umich.edu/~crshalizi/weblog/390.html>

------
canopylabs
Sorry to say, but I disagree with the premise of this article. You still need
R to answer many of the questions the author of the article is proposing to
answer.

On top of that, having a degree in AI will prepare you for a lot of real-world
problems because you've seen lots of data sets. I'd prefer to have a PhD-level
researcher who knows why my support vector machine is overtrained, than
a guy with real-world experience who is likely to overtrain and bias the
model.

Furthermore, the author's example of weight and height correlation isn't one
of a bad model. It would be a perfectly fine model _if that was all the data
you had_. If you ignore lots of input data to build a crappy model, you're
going to have a bad time... But everyone with a PhD and no "real world
experience" would know this.

------
a5x2h9k41l
I think Big Data is about convincing companies they need to upgrade their
"tools" and purchase your software or services, or convincing managers in
other departments you, the Big Data team, are doing valuable work. It's
primarily hype-driven. Classic IT.

Most money will be made not by actually exploiting the "big data", but by
exploiting the perceived need to collect and exploit it. We'll succeed in the
former, but not necessarily in the latter. It's more about keeping pace with
one's competitors. "If they're doing it, we need to be doing it too."

Lots of meetings with presentations that will use "big data". But few results
that ever come out of that data.

------
CurtHagenlocher
I don't think Forbes magazine is targeted at the same person as Hacker News.

------
nullsub
this guy was VP/CIO of engineering at Google? seriously?

