

Grand Prize Awarded To BellKor Pragmatic Chaos - physcab
http://www.netflixprize.com//community/viewtopic.php?id=1537

======
robg
Papers here:

[http://www.netflixprize.com/assets/GrandPrize2009_BPC_BellKo...](http://www.netflixprize.com/assets/GrandPrize2009_BPC_BellKor.pdf)

[http://www.netflixprize.com/assets/GrandPrize2009_BPC_BigCha...](http://www.netflixprize.com/assets/GrandPrize2009_BPC_BigChaos.pdf)

[http://www.netflixprize.com/assets/GrandPrize2009_BPC_Pragma...](http://www.netflixprize.com/assets/GrandPrize2009_BPC_PragmaticTheory.pdf)

------
mildweed
Did they figure out the Napoleon Dynamite conundrum?

~~~
physcab
I did a search on Google, but couldn't find the reference. Care to share?

~~~
arfrank
From what I remember reading about the contest, the problem showed up in
similarity-based recommendations: people either hated or loved Napoleon
Dynamite, and their rating for it didn't line up at all with their ratings of
otherwise similar movies, so it was unusually hard to predict. That's how I
understood it at a basic level.
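
A rough sketch of how that failure looks with a generic item-item recommender
(toy ratings and plain cosine similarity; this is not Netflix's actual
Cinematch algorithm or the prize-winning blend):

    # Toy item-item collaborative filtering sketch (made-up ratings, generic
    # cosine similarity -- not the actual Cinematch or prize-winning method).
    import numpy as np

    # rows = users, columns = movies; 0 means "not rated".
    # The last column stands in for a polarizing title like Napoleon Dynamite.
    ratings = np.array([
        [5, 4, 0, 1],
        [4, 5, 1, 5],
        [1, 2, 5, 5],
        [2, 1, 4, 1],
    ], dtype=float)

    def cosine_sim(a, b):
        mask = (a > 0) & (b > 0)          # compare only co-rated entries
        if not mask.any():
            return 0.0
        return float(a[mask] @ b[mask] /
                     (np.linalg.norm(a[mask]) * np.linalg.norm(b[mask]) + 1e-9))

    def predict(user, target):
        """Similarity-weighted average of the user's other ratings."""
        num = den = 0.0
        for other in range(ratings.shape[1]):
            if other == target or ratings[user, other] == 0:
                continue
            s = cosine_sim(ratings[:, target], ratings[:, other])
            num += s * ratings[user, other]
            den += abs(s)
        return num / den if den else None

    # Users 0 and 1 agree on everything else but rated the last movie 1 and 5;
    # the recommender predicts ~4.4 and ~3.4 for them, missing both extremes.
    print(predict(0, 3), predict(1, 3))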

------
pronoiac
I'll repost from another thread about the NYTimes article,
<http://news.ycombinator.com/item?id=834681> : The next contest info surprised
me:

"The data set of more than 100 million entries will include information about
renters’ ages, gender, ZIP codes..." It's enough to identify 87% of the
people, apparently: [http://www.freedom-to-tinker.com/blog/paul/netflixs-
impendin...](http://www.freedom-to-tinker.com/blog/paul/netflixs-impending-
still-avoidable-multi-million-dollar-privacy-blunder)

I hadn't realized that someone identified some of the raters in the previous
contest using their IMDb ratings: <http://www.cs.utexas.edu/~shmat/netflix-
faq.html>
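
The gist of that linkage, as I understand the FAQ, is record matching on
approximate (movie, rating, date) overlaps. A loose toy sketch, not the
paper's actual scoring function; the names, dates, and thresholds below are
made up:

    # Loose sketch of the record-linkage idea (made-up data and thresholds,
    # not the Narayanan-Shmatikov paper's actual scoring function).
    from datetime import date

    def score(anon_record, public_profile, day_slack=3):
        """Count approximate matches: same movie, rating within one star,
        and rating dates within a few days of each other."""
        public = {movie: (r, d) for movie, r, d in public_profile}
        matches = 0
        for movie, rating, when in anon_record:
            if movie in public:
                pub_rating, pub_date = public[movie]
                if (abs(rating - pub_rating) <= 1
                        and abs((when - pub_date).days) <= day_slack):
                    matches += 1
        return matches

    # One anonymized rating history vs. two public (IMDb-style) profiles.
    anon  = [("Movie A", 4, date(2005, 3, 1)),
             ("Movie B", 1, date(2005, 3, 4)),
             ("Movie C", 5, date(2005, 6, 20))]
    alice = [("Movie A", 4, date(2005, 3, 2)),
             ("Movie B", 2, date(2005, 3, 4)),
             ("Movie C", 5, date(2005, 6, 22))]
    bob   = [("Movie A", 2, date(2004, 1, 1)),
             ("Movie D", 3, date(2005, 5, 5))]

    # A score that stands out well above every other candidate's is a
    # probable re-identification.
    print(score(anon, alice), score(anon, bob))   # 3 vs. 0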

Also, why this matters, page 44 of a research paper (PDF):
<http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1450006>

------
tmsh
so why didn't The Ensemble win? in the rules it says, 'At the end of this
period, qualifying submissions will be judged (see Judging below) in order of
the largest improvement over the qualifying RMSE on the test subset.' this
isn't the ICFP contest. i assume the team at the top of the leaderboard is in
fact the leader.

the difference between the 'test' and 'quiz' sounds b.s. to me. at this rate,
i know i won't even be contemplating netflix prize 2. at the very least,
netflix owes it to the community to explain why they (at this point, seemingly
corruptly) made the decision they did.

i suppose they say they'll post the final 'test subset' scores on the
leaderboard. it doesn't appear to be a very 'open' contest if they don't
actually publish exactly what the test subset is and how the rankings are
determined. when it's that close, everything should be shown to be exactly
done within the rules. otherwise, they really risk delegitimizing the whole
contest, imho.

~~~
pmjordan
I'm not familiar with the details in any way, but it sounds like you could
probably build an algorithm that scores near-perfectly on data it has already
seen, simply because you've essentially encoded that data into the algorithm.
That doesn't mean the algorithm is good in general; the real test is always
how well you do on data you've never seen.

As for why the intermittent leaderboard only scores you on part of the data
(the "quiz" subset): with a specially crafted series of submissions, you could
leak information about the hidden data through whatever feedback the test
environment gives you, and then use that information to build an algorithm, as
above, that does well only on this specific case.

see also: <http://en.wikipedia.org/wiki/Overfitting>
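
A toy version of that leak, assuming purely for illustration that the feedback
were a plain RMSE over the hidden set (this is a generic leaderboard-probing
trick, not a claim about Netflix's actual pipeline): two submissions that
differ in a single prediction let you solve for that hidden rating, because
RMSE^2 * n is just the sum of squared errors.

    # Toy illustration of leaking hidden ratings through RMSE-only feedback
    # (hypothetical oracle; not Netflix's actual submission pipeline).
    import math

    hidden = [3, 5, 1, 4]                       # ratings the server keeps secret
    n = len(hidden)

    def rmse_oracle(submission):
        """Stand-in for the leaderboard: returns only an overall RMSE."""
        return math.sqrt(sum((s - h) ** 2 for s, h in zip(submission, hidden)) / n)

    base = [0.0] * n                            # baseline submission
    i = 2                                       # entry we want to recover
    probe = list(base)
    probe[i] = 1.0                              # second submission differs only at i

    sse_base  = rmse_oracle(base)  ** 2 * n     # RMSE^2 * n = sum of squared errors
    sse_probe = rmse_oracle(probe) ** 2 * n

    # (1 - y_i)^2 - (0 - y_i)^2 = 1 - 2*y_i  =>  y_i = (1 - (sse_probe - sse_base)) / 2
    y_i = (1 - (sse_probe - sse_base)) / 2
    print(y_i)   # recovers hidden[i] (up to floating-point noise), one entry per probe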

~~~
tmsh
i see. but if they don't make their private dataset public, what's to prevent
netflix from tailoring it to favor a particular entry?

~~~
mquander
Why would Netflix corrupt their own competition in this way? I don't
understand what you're concerned about.

~~~
tmsh
maybe because 'bellkor's pragmatic chaos' won the progress prize and they have
some special relationship with them. could be any number of reasons. this is,
i think, the danger of a private corporation running a programming contest.

i'm not saying netflix is unfair or corrupt. i hope not. but i think they owe
it to the contestants to explain exactly why they chose the #2 entry on the
leaderboard over the #1 entry. and not just (though i appreciate people's
comments) because of overfitting. how is the private 'test' subset chosen? i
have no affiliation with this contest or any of the teams in any way, but how
would you like to work on this for years only to be told that even though you
beat everyone in the quiz, in the private 'test', you lost.

~~~
luchak
Did you read the contest rules and FAQ? (<http://www.netflixprize.com/rules>,
<http://www.netflixprize.com/faq> -- and if you didn't, what are you doing
throwing around words like "corrupt"?) It looks to me like they're being very
transparent and explained things very clearly. You have a bunch of data in the
test set; these are partitioned randomly into two equally-sized subsets,
"quiz" and "test". You submit predictions for both subsets, but are only told
how you did on "quiz". Netflix provides an MD5 checksum for the judging file
that defines what the partition is; this file will be made available "at the
end of the Contest". So this will be verifiable by anyone soon.
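
Once that judging file is posted, checking it against the announced MD5 is a
couple of lines (the file name and checksum here are placeholders, not the
real ones):

    # Check the published judging file against the announced MD5 checksum
    # (the file name and checksum below are placeholders, not the real ones).
    import hashlib

    ANNOUNCED_MD5 = "0123456789abcdef0123456789abcdef"

    with open("judging.txt", "rb") as f:
        digest = hashlib.md5(f.read()).hexdigest()

    print("matches" if digest == ANNOUNCED_MD5 else "does NOT match", digest)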

Also, don't brush overfitting aside. In the (paraphrased) words of one machine
learning researcher, "life is a battle against entropy. In the same way,
machine learning research is a battle against overfitting." Any data whose
test results you use to select your algorithm or to adjust your algorithm's
parameters is not properly considered part of the test set; after your
optimization that data will provide an optimistically-biased estimate of your
true error. Since competitors could get regular updates on their performance
on the quiz dataset, one must assume that they were attempting to optimize
this performance, and so quiz set performance would not be a good estimate of
their true error. You can only get a good estimate of the true error of a
method by testing against data that has played no part in its development.
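
A tiny simulation of that optimistic bias, with made-up numbers that have
nothing to do with the actual contest data: generate a pile of equally
mediocre "models", keep the one with the best quiz error, and its quiz score
looks flattering while its test score, measured on data that played no part in
the selection, comes out noticeably worse.

    # Minimal simulation of selection bias ("overfitting to the quiz set");
    # toy numbers, nothing to do with the actual contest data.
    import random

    random.seed(0)
    truth = [random.gauss(0, 1) for _ in range(200)]
    quiz_truth, test_truth = truth[:100], truth[100:]   # random 50/50 split

    def rmse(preds, target):
        return (sum((p - t) ** 2 for p, t in zip(preds, target)) / len(target)) ** 0.5

    # 500 "models" that are all equally mediocre: independent random guesses.
    models = [[random.gauss(0, 1) for _ in range(200)] for _ in range(500)]

    best = min(models, key=lambda m: rmse(m[:100], quiz_truth))   # select on quiz only
    print("quiz RMSE of chosen model:", round(rmse(best[:100], quiz_truth), 3))
    print("test RMSE of chosen model:", round(rmse(best[100:], test_truth), 3))
    # The quiz number is flattering precisely because we picked the minimum;
    # the test number, computed on data that played no part in the selection,
    # is the honest estimate and typically comes out noticeably higher.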

~~~
tmsh
> So this will be verifiable by anyone soon

Yeah... what happens when the checksums don't check out?

------
AGorilla
This is great and all, but it still thinks I'll love Once:
<http://www.imdb.com/title/tt0907657/>

