

The Netflix Prize: 300 Days Later - nickb
http://whimsley.typepad.com/whimsley/2007/07/the-limitations.html

======
simpleenigma
Great article, and speaking as someone who is actively working with the Netflix
prize data on a daily basis, it summarizes some of the more bizarre quirks of
the dataset pretty well. The other problem he didn't really get into is the
sheer size of the dataset: the .txt files are about 4GB in size to hold the
100,480,507 ratings ...

Beyond that, I think he missed the really interesting ideas behind the
recommender system possibilities. Once a user has rated enough movies, and
there is a large enough cross section of other users rating the same movies,
the math of the problem starts to create a profile of the user. With that
profile it becomes possible to find the things the user will find interesting
instead of just what the crowd voted most popular.
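To make that concrete, here's a toy sketch of the idea (all the ratings below are made up for illustration, nothing from the actual dataset): once two users have enough co-rated movies, cosine similarity between their rating vectors acts as that "profile", predicting individual taste rather than overall popularity.

```python
import math

# user -> {movie: stars}; made-up toy data
ratings = {
    "alice": {"Alien": 5, "Heat": 4, "Clueless": 1},
    "bob":   {"Alien": 5, "Heat": 5, "Clueless": 2},
    "carol": {"Alien": 1, "Heat": 2, "Clueless": 5},
}

def similarity(u, v):
    """Cosine similarity over the movies both users rated."""
    common = set(ratings[u]) & set(ratings[v])
    if not common:
        return 0.0
    dot = sum(ratings[u][m] * ratings[v][m] for m in common)
    nu = math.sqrt(sum(ratings[u][m] ** 2 for m in common))
    nv = math.sqrt(sum(ratings[v][m] ** 2 for m in common))
    return dot / (nu * nv)

# alice's tastes line up with bob's, not carol's:
print(similarity("alice", "bob") > similarity("alice", "carol"))  # True
```

A real system would of course mean-center the ratings and cope with sparsity, but the shape of the computation is the same.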

The real trick is not to use some artificial error-measuring method to
validate how your algorithm is working, but instead to focus on how the
recommendation system changes the way the user interacts with your website.

The Netflix prize did a few things that were wonderful for the field of
recommendation systems: they brought a lot of attention to the field and they
gave people a huge dataset to compare their results against. In the long run,
even if no one wins the grand prize, they will have given much more to the
field of study than the $1 million.

Oh yeah .. and maybe this January there will be one more YC start-up doing
recommendation systems.

~~~
ntoshev
I will watch that with interest. I have played with the Netflix dataset,
although not as much as I'd like. You can do a lot of useful things with it
alone, but in the real world it is very useful to use a different definition
of the problem than the Netflix one. "Similar items" are useful because
customers can understand what they mean, as opposed to some abstract "we think
you would rate this X".

Also, it makes a lot of sense to integrate the other data you have about your
items (genre, director). It is actually pretty easy to do.
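One easy way to do it is a weighted blend of rating-based similarity with a simple metadata overlap score. This is a hypothetical sketch (the titles, fields, and the 0.7 weight are assumptions for illustration, not anything from the Netflix data):

```python
# Made-up item metadata
meta = {
    "Alien":  {"genre": "sci-fi", "director": "Ridley Scott"},
    "Aliens": {"genre": "sci-fi", "director": "James Cameron"},
    "Heat":   {"genre": "crime",  "director": "Michael Mann"},
}

def metadata_overlap(a, b):
    """Fraction of metadata fields (genre, director) the two items share."""
    fields = ("genre", "director")
    return sum(meta[a][f] == meta[b][f] for f in fields) / len(fields)

def blended_similarity(a, b, rating_sim, w=0.7):
    """Weighted mix: w from co-rating similarity, (1 - w) from metadata."""
    return w * rating_sim + (1 - w) * metadata_overlap(a, b)

# Given identical rating-based similarity, the shared genre nudges
# Alien/Aliens above Alien/Heat:
print(blended_similarity("Alien", "Aliens", 0.5) >
      blended_similarity("Alien", "Heat", 0.5))  # True
```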

------
ntoshev
From the YC crowd, Reddit and most recently Adpinion seem to be trying to
exploit recommendation systems. My experience with Reddit's "recommended" page
tells me it is useless for now. I hope Adpinion doesn't rely solely on getting
the recommendation system right.

------
sethjohn
Great article. The limit doesn't seem to be in the algorithm (as this article
makes clear); it's in the ability of a 1-through-5-star system to accurately
reflect a person's feelings about a movie.

While algorithms for predicting your upcoming ratings based on your previous
ratings may not be able to improve...surely there are some creative ways to
re-think the entire system:

Two obvious approaches would be to rate on multiple scales (funniness,
intellectual weight, 'light'-ness, etc.), and to include a comment-tracking
system alongside the simple rating system to provide a better, more complex
picture of the film (as Reddit and YCnews do).
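The multiple-scales idea could be sketched like this (the axes and scores below are invented for illustration): each movie gets a vector of ratings instead of one star count, and users are compared axis by axis.

```python
# Hypothetical multi-axis ratings for one movie from two users
rating_a = {"funniness": 4, "intellectual_weight": 2, "lightness": 5}
rating_b = {"funniness": 5, "intellectual_weight": 1, "lightness": 5}

def axis_distance(r1, r2):
    """Mean absolute difference across the shared rating axes."""
    axes = set(r1) & set(r2)
    return sum(abs(r1[a] - r2[a]) for a in axes) / len(axes)

# Two axes differ by 1 star each, one matches: mean gap is 2/3
print(axis_distance(rating_a, rating_b))
```

The point is that two users who agree on "funniness" but disagree on "intellectual weight" are no longer collapsed into one noisy 1-5 number.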

