

Netflix Progress Prize Awarded - linhir
http://biz.yahoo.com/prnews/081210/aqw507.html

======
FiReaNG3L
As per the prize rules, they are under an obligation to describe and publish
their method (as they did last time). I really wish someone would release an
open source recommender package, coded in C, with PHP bindings, based on the
latest/greatest algorithms, or any algorithm (just plain SVD if nothing else).
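The "plain SVD" approach is small enough to sketch. Below is a minimal Python sketch (Python rather than the C-with-PHP-bindings wished for above, just to keep it short) of Funk-style matrix factorization trained with stochastic gradient descent; the function names, hyperparameters, and any data are illustrative assumptions, not code from any released Netflix Prize entry.

```python
import random

def train_svd(ratings, n_factors=8, n_epochs=300, lr=0.05, reg=0.02, seed=0):
    """Factor a sparse rating matrix into user and item vectors via SGD.

    ratings: list of (user, item, rating) triples.
    Returns dicts P (user -> factor vector) and Q (item -> factor vector).
    """
    rng = random.Random(seed)
    users = {u for u, _, _ in ratings}
    items = {i for _, i, _ in ratings}
    P = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in users}
    Q = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for i in items}
    for _ in range(n_epochs):
        for u, i, r in ratings:
            pred = sum(pu * qi for pu, qi in zip(P[u], Q[i]))
            err = r - pred
            for f in range(n_factors):
                pu, qi = P[u][f], Q[i][f]
                # gradient step with L2 regularization on both factors
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def predict(P, Q, u, i):
    """Predicted rating is the dot product of user and item factors."""
    return sum(pu * qi for pu, qi in zip(P[u], Q[i]))
```

The regularization term (`reg`) keeps the factors from overfitting the sparse ratings; Funk's original write-up trained one factor at a time, whereas this sketch updates all factors per rating for brevity.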

Interesting links (from the winning teams' websites, about their latest
research):
[http://public.research.att.com/~volinsky/netflix/kdd08koren....](http://public.research.att.com/~volinsky/netflix/kdd08koren.pdf)
[http://www.commendo.at/index.php?lang=0&_0=2&_1=3](http://www.commendo.at/index.php?lang=0&_0=2&_1=3)

~~~
icefox
Well I am not going to do everything for you, but here is something to start
with:

[http://benjamin-meyer.blogspot.com/2006/10/netflix-prize-con...](http://benjamin-meyer.blogspot.com/2006/10/netflix-prize-contest.html)

It is a little framework I put together to play with the Netflix Prize
dataset. I implemented SVD as one of the examples.

On a side note I did think about the idea of putting together a small company
that would sell recommendation systems to companies for their website. The
value would not be in the algorithm, but in the ability to integrate with
their software.

~~~
FiReaNG3L
Very nice. I've seen many Netflix Prize-derived frameworks, but I somehow
missed this one.

Did you test the RMSE of your SVD on the Netflix dataset?

~~~
icefox
Nope, it is just a stock SVD; it won't win any awards, just get the job done
and serve as an example. Plenty of people have tweaked SVD up and down
already, so I didn't spend time doing it too.

------
aswanson
It may simply be mathematically impossible to get to 10 percent given the
observable variables. Maybe Netflix knows this and is just getting synthesized
research on the cheap.
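For reference, the "10 percent" is measured in RMSE (root mean squared error) on a held-out quiz set, relative to Netflix's own Cinematch baseline. A quick Python sketch of the arithmetic; the 0.9514 baseline is Cinematch's widely reported quiz-set score, so treat it as approximate:

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating lists."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual))
                     / len(actual))

cinematch = 0.9514          # Cinematch's reported quiz-set RMSE (approx.)
target = 0.9 * cinematch    # the 10%-improvement Grand Prize threshold
```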

~~~
pg
I'd guess the machine learning guys by this point understand the data set at
least as well as Netflix does.

~~~
jderick
The machine learning guys have only a subset of the full dataset to work with.
They have to send their algorithms to Netflix for testing on the full dataset.
So it may be possible Netflix knows something they don't. It does seem quite
strange how close and yet how far that 10% mark has been.

~~~
pg
Hmm. What could Netflix know, though? Is it even possible to quantify how hard
it is to improve a recommendation algorithm by a certain amount?

~~~
aswanson
Maybe there is an inherent noise floor for the optimal estimator on this
dataset, one that they have found or bounded.

------
madmanslitany
I started working on this about a month and a half ago. My RMSE is pathetic
thus far (I just started tweaking a kNN with Pearson correlation as the
distance metric), but I don't really care. It's a great way to brush up on a
lot of CompSci concepts you may be rusty on, all at once. Besides the obvious
machine learning, when you're trying to process 2 GB of training files on your
personal computer, O(n) time/space complexity REALLY starts to matter, as does
the choice of implementation language... I'm using mostly Python with C
extensions for the heavy math right now...
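The kNN-with-Pearson idea mentioned above is easy to sketch. The following Python is a toy illustration, not the commenter's actual code: it predicts a user's rating for a target item from the k items they have rated whose rating vectors correlate best with the target's. The item names and data in the test are made up.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length rating vectors."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def predict_knn(target_item, user_ratings, item_vectors, k=2):
    """Predict a user's rating for target_item from the k rated items
    whose rating vectors correlate best with target_item's vector.

    user_ratings: {item: rating} for the target user.
    item_vectors: {item: [ratings over a shared set of users]}.
    """
    sims = sorted(
        ((pearson(item_vectors[target_item], item_vectors[i]), r)
         for i, r in user_ratings.items() if i != target_item),
        reverse=True)[:k]
    den = sum(abs(s) for s, _ in sims)
    return sum(s * r for s, r in sims) / den if den else 0.0
```

In practice the expensive part is not this arithmetic but building the item vectors over shared raters, which is where the O(n) concerns above bite.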

------
tocomment
I'd love to win just so I could settle arguments by saying "anyone who has won
a million dollar data mining competition raise your hand"

------
staunch
I keep hoping someone will swoop down with a crazy solution and snatch the
prize from the people trying to solve it conventionally. So much more romantic
than this slog!

~~~
tocomment
cool thought. any ideas what the solution might look like?

~~~
inerte
Each movie rating will be represented by a star in the sky, the constellations
are the rating clusters, and on 09/20/2009 they'll align to achieve the 10%
mark.

Hey, it works for astronomy!

------
jmtame
2 years in and they still haven't given out the prize money. How's that
crowdsourcing working out for you, Netflix?

~~~
siong1987
Hey, it is not easy to develop a better recommendation engine. They have
problems like the "Napoleon Dynamite" problem, which remains unsolved even
now.

Computers can only predict human behavior that is unchanging and static, but
that is not always the case.

~~~
jmtame
I didn't say it was easy, I'm not sure where you're drawing that assumption
from.

What I did essentially say is that after over 2 years of collective research
effort from 35,000 teams, I'm glad one of those teams has seen $100k from a
corporation that will ultimately profit off their research.

~~~
inerte
One of the teams mentioned in the article started their own company based on
the notoriety they achieved, so yeah, it seems to be working pretty well for
everyone.

By the way, your post was just crawled by Google, and it probably represents a
couple cents into their billion dollar quarter. How do you feel helping the
big guys, huh? Working for free for the man... you're disgusting, sir!

~~~
jmtame
Yes, I've been bad! I'm going to sit in time out bbs

~~~
siong1987
LOL. I misunderstood what you said. Since the code will be open sourced, it
will definitely benefit thousands of companies.

In fact, there was a graduate student on our campus last semester doing
research in this area. He did a very good job and won the first ACM machine
learning prize for his paper. If you are interested in this area, you can
actually go to the library and find the paper. I think he is now working on
the Microsoft Research team (the usual career path on our campus).

I actually built a tiny recommendation engine over the last two days using one
of the collaborative filtering algorithms, the Slope One predictor. It will be
up on the web very soon. I am having problems updating my server because
rubygems.org is down!
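Slope One (due to Lemire and Maclachlan) is simple enough to show inline. Here is a minimal Python sketch of the weighted Slope One predictor; the app mentioned above isn't public, so the function name and the toy data in the usage are assumptions, not its actual code.

```python
def slope_one(ratings, user, target):
    """Weighted Slope One: predict ratings[user][target].

    ratings: {user: {item: rating}}.
    For each item the user rated, compute the average rating difference
    (target - item) over users who rated both, then combine the adjusted
    ratings weighted by how many users supported each difference.
    """
    num = den = 0.0
    for item, r in ratings[user].items():
        if item == target:
            continue
        diffs = [urs[target] - urs[item]
                 for urs in ratings.values()
                 if target in urs and item in urs]
        if diffs:
            num += (r + sum(diffs) / len(diffs)) * len(diffs)
            den += len(diffs)
    return num / den if den else None
```

The appeal for a web app is that the pairwise average differences can be precomputed and updated incrementally, so prediction at request time is cheap.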

~~~
jmtame
It's all good, I think I need to stop hanging out on YC today. I have to catch
up to Matt on this Cocoa Programming book, and then I've got a final project
to prep for, and finally a meeting tomorrow with the company to talk about
some marketing stuff.

Clearly YC is not where I should be right now :)

