

If You Liked This, Sure to Love That  - kurtosis
http://www.nytimes.com/2008/11/23/magazine/23Netflix-t.html

======
quan
I worked on the Netflix Challenge last year but did not get far and gave up
after submitting a few mediocre results. Actually, I was proud that my
results did not blow up and stayed within range of the Cinematch algorithm.
The experience has given me a lot of respect for companies that are
developing recommendation algorithms.

There should be more great competitions like this. At this point, I'd say the
benefits coming from all the research and development by the Netflix Challenge
community, as well as the experience gained by many hobbyists like myself as
a result of this competition, have already exceeded the $1 million prize.

~~~
randomwalker
I saw an informal study on the economics of prizes. Apparently, the monetary
value of the effort that is put into solving the problem, the media publicity,
etc. added together exceeds the value of the prize itself by a factor of
anywhere between 10 and 50, depending on the prize.

~~~
staunch
Sounds like a _very_ informal study.

------
jcdreads
I'm a fan of this old post (probably linked from Hacker News, actually, but I
can't really remember)...

[http://whimsley.typepad.com/whimsley/2007/07/the-limitations...](http://whimsley.typepad.com/whimsley/2007/07/the-limitations.html)

...that asks exactly how much better the experience will be for a particular
customer if all this research results in a recommendation engine that's really
10% better than Cinematch. Undoubtedly the marketing value (to Netflix) of the
challenge is incredible, and I don't question the reasonableness of their
desire to eke satisfaction out of every potentially satisfied customer; but
surely there's somewhere else in their business they could more easily improve
their margin and their customers' experience.

I, personally, use Netflix a bunch, and love the service. Furthermore, I love
the fact that they sponsored this contest and provided a large research
database to support it. I guess I'm just always a bit disappointed by
breathless mainstream press coverage that doesn't discuss these other meta-
contest questions. That and I wish I could somehow get Netflix to send me
season 4 of Lost before season 5 starts.

------
DaniFong
That's hilarious. The list of movies that are hard to classify reads like a
list of my favorites...

 _[Like] “Napoleon Dynamite” — culturally or politically polarizing and hard
to classify, including “I Heart Huckabees,” “Lost in Translation,” “Fahrenheit
9/11,” “The Life Aquatic With Steve Zissou,” “Kill Bill: Volume 1” and
“Sideways.”_

~~~
jfornear
I would classify those all as Indie mainstream, stuff that is popular with the
scenester, pseudo-intellectual, pretentious crowd.

~~~
olavk
No, it must be more complicated than that. If they were just consistently
popular with a specific demographic, it would be easy to predict - the problem
seems to be that even if you like Lost in Translation, you _might_ still hate
Kill Bill, or Fahrenheit 9/11.

~~~
kirse
After signing up for Netflix last month, I realized that I am one of those
users that the algorithm writers probably hate. I usually vote things "1" or
"5", because a movie I did not like meant I just wasted 2 hours of my time. On
the other hand, a movie I did like was an enjoyable and relaxing 2 hours.

So I rate everything to the polar ends, and also tend to have pretty varied
choices for what movies I like. Usually the only genre I avoid is
Horror/Psycho/Chainsaw Death films.

------
dangoldin
If anyone is interested in some Singular Value Decomposition type of work -
take a look at Principal Component Analysis.

<http://en.wikipedia.org/wiki/Principal_components_analysis>
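
For anyone who wants to see the connection concretely: the principal
components of a data set are the right singular vectors of the mean-centered
data matrix, or equivalently the eigenvectors of its covariance matrix. Below
is a minimal sketch, assuming plain Python and a made-up two-column data set,
that finds only the first component by power iteration. The function name
`first_principal_component` is hypothetical, and a real project would more
likely call a linear algebra library.

```python
import math
import random


def first_principal_component(rows, iterations=200, seed=0):
    """Return (unit direction, variance along it) for the top component."""
    n, d = len(rows), len(rows[0])
    # Center each column so the component describes variation around the mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeatedly multiply by the covariance and renormalize.
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(d)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v gives the variance captured by v.
    variance = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d))
                   for i in range(d))
    return v, variance


# Made-up data: the second column is roughly twice the first.
data = [[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 7.8], [5.0, 10.1]]
direction, variance = first_principal_component(data)
print(direction, variance)
```

On this toy data the direction should point roughly along (1, 2), since
almost all of the variation lies on that line.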

~~~
ctkrohn
PCA is an incredibly useful technique. At work we've been using it to model
the structure of the yield curve, i.e. the graph of interest rates vs.
maturity. Turns out you can decompose most daily movements of the yield curve
into three components: parallel shift up/down, steepening/flattening, and a
"bow" where 2s5s flattens, 5s10s steepens, and 10s30s flattens. It would be
interesting to build an interest rate model that evolves these three
components forward in time... it would probably be most useful for short time
scales where the principal components are unlikely to change.

~~~
kurtosis
Unrelated question: I've recently been reading a lot about wavelets and
multiscale analysis. My application area is in text processing and topic
models for legal document analysis. Wavelet transforms or statistical modeling
in the wavelet domain seems like the kind of thing that would have been tried
many times over in finance. Do you know of any instances where it has turned
out to be useful for time series?

~~~
ctkrohn
I've heard people talk about it, but never seen any concrete applications to
finance. If you know of any papers or introductory material, I'd be thrilled
to check it out. I know nothing about wavelets or multiscale analysis -- I
couldn't even define them if you asked -- but I have a decent math and
statistics background so I'd love to take a look.

------
jaytee_clone
My first thought was - the rating system itself creates a ceiling on the
accuracy of the prediction. And it seems that everyone knows that. (Netflix
and the 30,000 hackers.)

Then why spend so many resources to improve the existing rating system by 10%
instead of experimenting with new kinds of rating systems? (I'm sure someone
can come up with something clever yet simple.) Yes, it's costly to change the
infrastructure. But if you don't do it, some startup will come along and beat
them to it.

The worst part is that Netflix is paying people to think inside-of-the-box
(the rating system).

I read about Bertoni a while ago and was inspired by his out-of-the-box
approach (Behavioral Economics). Wouldn't that give Netflix a hint - "Yo
Netflix, here's this dude who's getting the fastest-growing result by
extracting more qualitative information out of the quantitative rating system.
Maybe you should just design a new rating system that better organizes this
qualitative information? Just maybe."

Or maybe I'm missing the point here?

------
FiReaNG3L
I really wish there was an open source incremental SVD-based library to do
collaborative filtering; the algorithm is known, and we know it's pretty
efficient, scalable and high performance. Sure, there is Mahout
(<http://lucene.apache.org/mahout/>) but development seems pretty slow, if
there is still dev work at all on this.

~~~
fizx
Lingpipe has a semi-open license, and a Java implementation of incremental
SVD. The original incremental SVD code is open:
<http://www.timelydevelopment.com/demos/NetflixPrize.aspx>

BTW, SVD is not scalable. I define scalable algorithms to have time and space
complexity O(n log(n)) or less. SVD is generally O(n*n)+, requiring matrix
multiplication. You'll have to shard your computations in order to scale
indefinitely.

------
GavinB
I assume it would be cheating to simply never recommend Napoleon Dynamite and
the other controversial movies?

------
mynameishere
If those teams got together, hacked up a program that ran 2 or more
independent systems (entries) at the same time and averaged the results,
they'd probably get to 10 percent.

~~~
aston
There are lots of teams on the leaderboard that are actually metateams. My
favorite is "When Gravity and Dinosaurs Unite."

<http://www.netflixprize.com/leaderboard>

They actually do one better than your suggestion: they use machine learning
to figure out how to weight one team's results against the other's.
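
A toy illustration of the difference between plain averaging and a learned
weight, under heavy assumptions: two hypothetical models, made-up
predictions, and a single least-squares weight rather than whatever the
metateams actually use. The names `blend` and `fit_weight` are mine.

```python
import math


def rmse(pred, truth):
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, truth))
                     / len(truth))


def blend(a, b, w):
    """Weighted combination w * a + (1 - w) * b of two prediction lists."""
    return [w * x + (1 - w) * y for x, y in zip(a, b)]


def fit_weight(a, b, truth):
    """Least-squares weight w minimizing the error of w*a + (1-w)*b."""
    num = sum((x - y) * (t - y) for x, y, t in zip(a, b, truth))
    den = sum((x - y) ** 2 for x, y in zip(a, b))
    return num / den if den else 0.5


# Made-up ratings and predictions; the two models err in opposite directions.
truth = [4.0, 2.0, 5.0, 3.0, 1.0, 4.0]
model_a = [3.5, 2.5, 4.5, 3.5, 1.5, 3.5]
model_b = [4.6, 1.8, 5.2, 2.2, 0.6, 4.8]

averaged = blend(model_a, model_b, 0.5)
w = fit_weight(model_a, model_b, truth)
learned = blend(model_a, model_b, w)
print(rmse(model_a, truth), rmse(model_b, truth))
print(rmse(averaged, truth), rmse(learned, truth), w)
```

Averaging helps here only because the two models' errors partly cancel. The
weight is fit and scored on the same six ratings for brevity; in practice you
would fit it on a held-out set.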

------
sharkfish
The vast amount of data and the prediction task remind me a lot of stock
market prediction attempts.

I enjoyed Pi, by the way.

