

How the Netflix prize was won - bootload
http://www.wired.com/epicenter/2009/09/how-the-netflix-prize-was-won/

======
roundsquare
Neat. I wonder if it would be possible to do something like this...

Create a website where anyone can upload their algorithm. The website takes it
and runs it on the data set. Depending on its accuracy, and on the accuracy of
everything else already submitted, it gives it a weight.

I wonder how well this would do.
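
Something like this, maybe (a rough Python sketch; the inverse-error weighting
scheme and all the names here are made up for illustration):

    import numpy as np

    def weight_from_rmse(rmse, eps=1e-9):
        # Lower validation error -> higher weight; inverse error is one simple choice.
        return 1.0 / (rmse + eps)

    def ensemble_predict(models, weights, X):
        # Weighted average of every uploaded model's predictions.
        preds = np.array([m(X) for m in models])   # shape: (n_models, n_samples)
        w = np.array(weights) / np.sum(weights)    # normalize the weights
        return w @ preds

    # Hypothetical flow: score each upload on a held-out validation set.
    # rmses = [evaluate(m, X_val, y_val) for m in models]
    # weights = [weight_from_rmse(r) for r in rmses]
    # y_hat = ensemble_predict(models, weights, X_test)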

~~~
coglethorpe
Sounds kind of like the "Puzzle Bot" Facebook has to grade puzzle solutions to
find employment candidates.

------
markessien
It's like trying to model a noisy graph using a set of distinct y=f(x)
functions. There will be a major f(x) that captures most of the movement of
the graph, while the spikes and other weird happenings can be modelled by
another function h(x), which does not describe the main trajectory of the
initial function because it's describing a different effect.

For example, f(x) could predict the success of a film based on the amount of
marketing money spent on it, and h(x) could predict success based on the day
it was released. f(x) does not consider the release day, so there would be
small bumps it misses that depend on the day. And h(x) by itself would be a
very poor predictor, seeing as it only factors in the day of the week.

But combined! They fix each other.
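
Roughly, in Python (everything here is synthetic and hypothetical, just to
show the mechanics): fit f on marketing spend alone, then fit h on the
residuals that f leaves behind, so each function captures its own effect.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 500
    spend = rng.uniform(0, 10, n)        # marketing dollars (arbitrary units)
    day = rng.integers(0, 7, n)          # release day of week
    day_bump = np.array([0.0, 0.1, 0.2, 0.5, 0.8, 1.0, 0.6])
    success = 2.0 * spend + day_bump[day] + rng.normal(0, 0.3, n)

    # f(x): linear fit of success on spend -- captures the main trajectory.
    a, b = np.polyfit(spend, success, 1)
    f = a * spend + b

    # h(x): mean residual per release day -- a poor model alone, useful on top of f.
    resid = success - f
    h = np.array([resid[day == d].mean() for d in range(7)])[day]

    def rmse(e):
        return float(np.sqrt(np.mean(e ** 2)))

    print("f alone:", rmse(success - f))
    print("f + h:  ", rmse(success - (f + h)))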

------
JacobAldridge
I think the key point of this article is buried in the very last quote, from
Joe Sill of The Ensemble - _"One of the big lessons was developing diverse
models that captured distinct effects"_.

The rest of the piece seems to gloss over the importance of "distinct
effects". If you have a collection of algorithms which are each trying to
interpret the _same_ effect or data, but in different ways, combining them
could make things worse.
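
A quick synthetic illustration of both halves of that claim (the numbers are
made up; only the qualitative point matters): averaging models with
independent errors helps a lot, while averaging models that share their errors
buys almost nothing, and an even blend of a strong model with a weak one is
worse than the strong model alone.

    import numpy as np

    rng = np.random.default_rng(1)
    truth = rng.normal(0, 1, 10_000)

    def rmse(pred):
        return float(np.sqrt(np.mean((pred - truth) ** 2)))

    # Models capturing the *same* effect share most of their error:
    # averaging them buys almost nothing.
    shared_err = rng.normal(0, 1, truth.shape)
    same = [truth + shared_err + rng.normal(0, 0.1, truth.shape) for _ in range(3)]
    print(rmse(same[0]), rmse(np.mean(same, axis=0)))       # ~1.00 vs ~1.00

    # Models with independent errors ("distinct effects"): averaging helps a lot.
    diverse = [truth + rng.normal(0, 1, truth.shape) for _ in range(3)]
    print(rmse(diverse[0]), rmse(np.mean(diverse, axis=0))) # ~1.00 vs ~0.58

    # Evenly blending a strong model with a weak one hurts.
    strong = truth + rng.normal(0, 0.2, truth.shape)
    weak = truth + rng.normal(0, 1.0, truth.shape)
    print(rmse(strong), rmse(0.5 * strong + 0.5 * weak))    # ~0.20 vs ~0.51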

So this isn't just a great example that "proffered hard proof of a basic
crowdsourcing concept" (in fact, I would argue it doesn't even fit within the
concept of crowd-sourcing, at least as I understand it). It's a great example
of really smart people _who really know their models_ working together.

~~~
roundsquare
_If you have a collection of algorithms which are each trying to interpret the
same effect or data, but in different ways, combining them could make things
worse._

Can it? I wouldn't think so if you are combining them properly... at least, I
would expect the overall average to be better even if it's worse on some
predictions.

Obviously if you are doing something like evenly weighting the predictions you
could end up with something worse, but that's not really a smart combination.
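
For a concrete notion of "smart combination": learn the blend weights on a
hold-out set instead of weighting evenly, e.g. by least squares (a rough
sketch with hypothetical names, nothing like the elaborate blends the actual
Netflix teams used):

    import numpy as np

    def fit_blend_weights(P_holdout, y_holdout):
        # P_holdout: (n_samples, n_models) matrix, one column per model.
        # Least-squares weights minimize the blend's error on the hold-out set.
        w, *_ = np.linalg.lstsq(P_holdout, y_holdout, rcond=None)
        return w

    def blend(P, w):
        return P @ w

    # Hypothetical usage:
    # P_hold = np.column_stack([m(X_hold) for m in models])
    # w = fit_blend_weights(P_hold, y_hold)
    # y_hat = blend(np.column_stack([m(X_test) for m in models]), w)

On the data the weights are fit to, this can't do worse than the best single
model, since putting all the weight on that model is one of the candidate
solutions; out of sample it can still degrade if the weights are overfit to
too many models.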

~~~
JacobAldridge
Don't get me wrong - it won't necessarily make things worse, but it can. I
felt the article over-simplistically implied that adding more teams and
algorithms would improve results almost every time, which isn't the case - it
took a lot of smart combinations to win.

~~~
roundsquare
Maybe we agree... but I guess my point is that if you are doing things
correctly/intelligently, it should never make things worse (in aggregate)...

In other words, if combining algorithms is making things worse, then you are
probably combining them wrong.

I'm sorta pulling this out of my ass (in the sense that I haven't done this
before), but that's what I would think...

------
wglb
Good article. However, I think the wisdom of crowds, in the sense described by
James Surowiecki, requires that the participants act independently. Is it true
that "crowdsourced" now means something a little different, given that these
are researchers who started off on separate teams and joined together later?

