

More data beats better algorithm at predicting Google earnings - neilc
http://anand.typepad.com/datawocky/2008/04/more-data-beats.html

======
danohuiginn
Problem is, I doubt he would have blogged this if the better algorithm had
beaten the larger dataset.

As it happens, I broadly agree with his conclusion (data trumps algorithms),
but cherry-picking data points doesn't provide any evidence for it.

~~~
mlinsey
I'm not familiar with this guy's site so I'm not sure if he means something
more nuanced, but "more data beats better algorithms" is too vague a claim to
test by picking _any_ number of datapoints. 100 datapoints will of course beat
1,000 datapoints if the former is selected via a random sampling that uses a
uniform distribution across the entire population with no response bias and
the latter is selected by asking the first 1,000 people you happen to see.
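
A rough way to see this in Python. Everything here is invented for illustration: the population, the trait values, and the assumption that the people you see first all come from one skewed subgroup.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population: the 5,000 people "nearby" skew high on some trait,
# the other 5,000 skew low. All numbers are made up for this sketch.
nearby = [rng.gauss(70, 10) for _ in range(5000)]
far_away = [rng.gauss(50, 10) for _ in range(5000)]
population = nearby + far_away
true_mean = statistics.mean(population)

# 1,000 datapoints, but only from the people you happen to see first.
convenience_sample = population[:1000]

# 100 datapoints drawn uniformly from the entire population.
random_sample = rng.sample(population, 100)

convenience_error = abs(statistics.mean(convenience_sample) - true_mean)
random_error = abs(statistics.mean(random_sample) - true_mean)

print(f"1,000 convenience datapoints: off by {convenience_error:.2f}")
print(f"100 random datapoints:        off by {random_error:.2f}")
```

Under these assumptions the smaller random sample should land much closer to the true mean, because the bias in the convenience sample doesn't shrink no matter how many datapoints you add.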

------
brent
While I haven't looked in depth at either post, I find this to be an awfully
bold claim, and it appears to be based on the results of a small number of
example problems. There are probably hundreds of papers available where the
opposite is shown to be true.

In fact, many come to the conclusion that the performance of a given algorithm
approaches an asymptote as more data is added, at which point implementing a
new algorithm or using an ensemble classifier may substantially increase
performance.
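
A toy sketch of that plateau in Python, not taken from any of those papers. The data-generating curve, the noise level, and the helper names (make_data, fit_on_feature, mse) are all assumptions made up for the example: a straight-line fit stands in for the weaker algorithm and a fit on x squared for the better one.

```python
import random

rng = random.Random(1)  # fixed seed so the run is repeatable

def make_data(n):
    """Draw n points from a made-up curve y = x^2 plus Gaussian noise."""
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    return [(x, x * x + rng.gauss(0, 0.1)) for x in xs]

def fit_on_feature(data, feature):
    """Least-squares fit of y = a*feature(x) + b; returns a predictor."""
    n = len(data)
    fs = [feature(x) for x, _ in data]
    ys = [y for _, y in data]
    f_mean = sum(fs) / n
    y_mean = sum(ys) / n
    cov = sum((f - f_mean) * (y - y_mean) for f, y in zip(fs, ys))
    var = sum((f - f_mean) ** 2 for f in fs)
    a = cov / var
    b = y_mean - a * f_mean
    return lambda x: a * feature(x) + b

def mse(predict, data):
    """Mean squared error of a predictor on (x, y) pairs."""
    return sum((predict(x) - y) ** 2 for x, y in data) / len(data)

test_set = make_data(2000)

# Weaker algorithm (straight line) trained on more and more data.
line_mse = {}
for n in (100, 1000, 10000):
    line = fit_on_feature(make_data(n), lambda x: x)
    line_mse[n] = mse(line, test_set)
    print(f"line fit,  n={n:>5}: test MSE {line_mse[n]:.4f}")

# Better-matched algorithm (fit on x^2) trained on only 100 points.
curve = fit_on_feature(make_data(100), lambda x: x * x)
curve_mse = mse(curve, test_set)
print(f"curve fit, n=  100: test MSE {curve_mse:.4f}")
```

In this setup the line fit's error should barely move between 1,000 and 10,000 points, while the curve fit does far better with 100, since the line simply can't represent the shape of the data.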

------
rms
The blog author isn't trying to say that more data beats a better algorithm in
general! That isn't even really what the headline is saying.

The useful thing is that someone with an enormous amount of data is able to
predict Google's earnings very accurately. Bookmark their page and check it
before the next Google earnings reports come out; you can make some good money
by trading the stock.

~~~
danohuiginn
erm...that's exactly what he's saying.

 _Readers of this blog will be familiar with my belief that more data usually
beats better algorithms_

He may not mean what he says, but we can't guess nuances that aren't in his
post.

~~~
rms
Oh. OK. In that case, his broader point is best ignored in favor of the more
interesting prediction about Google's earnings.

------
mattj
umm.. This has nothing to do with more data. This is someone finding 2 random
numbers that, when combined, come out to another random number.

Their claim is the same as this: "Temperatures rose 2% last year. I ate 1%
more potatoes and 1% fewer salmon. Therefore, since 1%+1%=2%, adding the %
more potatoes and % fewer salmon will predict the change in temperature."

The blog author is, in fact, using only 2 data points to predict a 3rd. He's
not using more data, he's using basically no data.

~~~
neilc
I have no idea how you could reach that conclusion from the article. The data
in question is not "two random numbers":

EF reported a 19.2% increase in paid clicks and 11.2% increase in CPCs at
Google Y-O-Y. Do the math (1.192*1.112 = 1.325), that's a 32.5% Y-O-Y revenue
increase. That's the closest anyone got to the real numbers!
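
For anyone who wants to check the arithmetic, a quick sketch in Python. It assumes revenue is simply paid clicks times CPC, which is what the quoted math implies; the variable names are just illustrative.

```python
# Year-over-year growth figures from the quote above.
paid_click_growth = 0.192  # 19.2% more paid clicks
cpc_growth = 0.112         # 11.2% higher cost per click

# If revenue = clicks * cost per click, the growth factors multiply.
revenue_factor = (1 + paid_click_growth) * (1 + cpc_growth)
revenue_growth = revenue_factor - 1

# Simply adding the two percentages drops the cross term (0.192 * 0.112).
additive_estimate = paid_click_growth + cpc_growth

print(f"compounded growth: {revenue_growth:.4f}")
print(f"additive estimate: {additive_estimate:.4f}")
```

The product works out to about 1.3255, which the quote reports as 32.5%; adding the percentages instead would give 30.4%.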

------
yters
Since there isn't a free lunch, randomness wins in the end.

