

This Psychologist Might Outsmart the Math Brains Competing for the Netflix Prize - elq
http://www.wired.com/techbiz/media/magazine/16-03/mf_netflix

======
hugh
I've always known that science journalists don't know anything about science.
I'm only now realizing that they don't know anything about journalism either.

They only know how to tell one story: the story of the underdog
scientist who is upsetting (or, more commonly, "might be about to" upset) the
stuffy old establishment. Everything that goes on in the scientific community
has to be shoehorned into the storyline of "Dodgeball" or "The Mighty Ducks",
or else they have no idea how to talk about it.

I was originally going to post this comment on the thread about the "surfer
dude with theory of everything" article, but it fits here just as well.

~~~
jwp
Sure, there's plenty of fluff in the article, but it isn't because the author
(<http://www.math.wisc.edu/~ellenber/>) is clueless about science. The article
is probably at the right level of technical depth. Seems ambitious to even
touch SVD in Wired.

~~~
hugh
Oops. I stand corrected. The author of this _particular_ article obviously
_does_ know the subject very well. My apologies.

------
dood
This article needed a lot more info about the psychologist's approach to make
it interesting; it was 90% background, with the barest hint at the end about
what he is actually doing. For all we know, he may have just made minor tweaks
to some existing algorithm.

~~~
cainus
The article mentions tweaking the algorithm to take the timing of ratings into
account. It gives the example that a person might rate two movies as 3/5, and
a third they watch right after as 4/5 because it was better than the previous
two, even though under normal(ized) circumstances that user would've given the
third movie a 3/5 too.

So... it's an interesting notion, that there can be time-segment-based
normalization of the data set. Team BellKor/KorBell credit a big part of their
gains to using the ordering of ratings, rather than the ratings themselves, to
test for similarity, so this guy's actually got a novel approach for
normalizing the dataset. I really don't know how he can detect whether he's
dealing with the kind of person who is meticulous enough to make sure their
new ratings take all their previous ratings into account, though, or with the
kind of person who gives everything a rating from 4/5 to 5/5... I wouldn't be
surprised if he hits a wall because of this.
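Just to make the notion concrete, here's a minimal sketch of what time-segment-based normalization could look like. This is my reading of the idea above, not the competitor's actual method, and the session-gap threshold and sample ratings are invented: group a user's ratings into "sittings" and replace raw stars with offsets from that sitting's mean, so a 4/5 given right after two 3/5s counts as "above that sitting's baseline" rather than an absolute 4.

```python
def sessionize(ratings, gap=3600):
    """Group (timestamp, stars) pairs into sessions split wherever
    consecutive ratings are more than `gap` seconds apart."""
    ratings = sorted(ratings)
    sessions, current = [], [ratings[0]]
    for prev, cur in zip(ratings, ratings[1:]):
        if cur[0] - prev[0] > gap:
            sessions.append(current)
            current = []
        current.append(cur)
    sessions.append(current)
    return sessions

def normalize(ratings, gap=3600):
    """Replace raw stars with (stars - session mean) offsets."""
    out = []
    for sess in sessionize(ratings, gap):
        mean = sum(stars for _, stars in sess) / len(sess)
        out.extend((t, stars - mean) for t, stars in sess)
    return out

# Three ratings in one sitting (3, 3, 4), then a 5 the next day:
# the 4 becomes +0.67 relative to its sitting, the lone 5 becomes 0.
print(normalize([(0, 3), (60, 3), (120, 4), (90000, 5)]))
```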

Really, I think Netflix would make much better strides toward their goals by
improving their data collection technique. They could probably make great
enhancements toward normalization just by saying "You gave this previous
movie 3/5 stars. How do you rate this latest movie?" That way the data would
be much more normalized. There are lots of possibilities for this sort of
enhancement, so I'm not sure why they're only letting competitors look at the
already-collected dataset.

------
brent
Why does he, or the author, consider him a psychologist? I think it's a bit
misleading if you simply read the title of the article. He has an
undergraduate degree in psychology, but his master's is in operations research
(and correct me if I'm wrong, but OR is usually the application of statistics
to business problems, right?). Wouldn't a more accurate title be something
like "Operations research guy uses statistical technique motivated by
psychology ... "

------
pskomoroch
It was a great article, but he hasn't really been anonymous in the Netflix
Prize world:

[http://www.netflixprize.com/community/viewtopic.php?pid=6090...](http://www.netflixprize.com/community/viewtopic.php?pid=6090#p6090)

Anyway, congrats Gavin...

------
nreece
"Just a guy in a garage" is now at #8. "When Gravity and Dinosaurs Unite" tops
the leaderboard ( <http://www.netflixprize.com/leaderboard> ), above BellKor
(the progress prize winner of 2007).

------
andreyf
Yet another psychological problem mathematicians too eagerly claimed as their
own. Why not try to find patterns in the movies that appeal to a certain
individual? Is it against the rules to use outside data (actors, directors,
etc.)?

~~~
andreyf
From the Netflix prize FAQ:

 _Why not provide other data about the movies, like genres, directors, or
actors?

We know others do. Again, Cinematch doesn’t currently use any of this data.
Use it if you want._

That seems like an easy target - if I've 5-starred every movie with Kevin
Spacey, I probably will like anything with Kevin Spacey. Why not mine blogs
and reviews, trying to find themes which are appealing to individuals?
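Here's a toy sketch of that "mine the metadata" idea, purely to illustrate it; all movie titles, casts, and ratings below are invented, and this isn't anything Netflix or the contestants actually do: score an unseen movie by the user's average rating for movies sharing its actors.

```python
def actor_score(user_ratings, cast_of, candidate):
    """Average of the user's ratings for movies that share at least one
    actor with `candidate`. Returns None if no seen movie overlaps.

    user_ratings: {movie: stars}; cast_of: {movie: set of actor names}.
    """
    cast = cast_of[candidate]
    relevant = [stars for movie, stars in user_ratings.items()
                if cast & cast_of[movie]]
    return sum(relevant) / len(relevant) if relevant else None

cast_of = {
    "Se7en": {"Kevin Spacey", "Brad Pitt"},
    "The Usual Suspects": {"Kevin Spacey"},
    "Meet Joe Black": {"Brad Pitt"},
}
seen = {"Se7en": 5, "Meet Joe Black": 3}

# 5-starred a Kevin Spacey movie -> high score for another Spacey movie.
print(actor_score(seen, cast_of, "The Usual Suspects"))
```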

~~~
cainus
It's probably just not that hot of a solution, believe it or not. The
recommendation engine should be able to make much better associations between
movies without even being able to describe what those associations are.
Looking at features like genre, director, actor, etc. is like a spam filter
looking for specific spammy words. As soon as the spammers start saying
"p3n1s" instead, or Eddie Murphy starts making family movies, it falls apart.
Check out <http://karmatics.com/docs/evolution-and-wisdom-of-crowds.html>
(scroll down to "3. Recommendation Systems" for the relevant part) for an
explanation.
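To illustrate the point about associations without features: here's a minimal sketch of item-item similarity computed purely from who rated what, with no genre/actor metadata at all. The movie labels and ratings are made up; this is the general collaborative-filtering idea, not any team's actual algorithm.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two {user: rating} dicts."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[u] * b[u] for u in common)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

ratings = {                            # movie -> {user: stars}
    "A": {"u1": 5, "u2": 4, "u3": 1},
    "B": {"u1": 5, "u2": 5, "u3": 1},  # rated like A by the same people
    "C": {"u1": 1, "u2": 2, "u3": 5},  # rated the opposite way
}

# A and B come out similar, A and C dissimilar, even though the system
# has no idea *why* -- no genres or actors were ever consulted.
print(cosine(ratings["A"], ratings["B"]), cosine(ratings["A"], ratings["C"]))
```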

~~~
toffer
Great link. Thanks for posting that.

------
aston
Kills me how fast WIRED postings make it in. We start discussing articles
before I even get to read them in the real magazine :(

~~~
food79
Wired and Reddit are part of the same media conglomerate, so that explains how
the Wired articles get on Reddit. I think a lot of n.yc readers read Reddit,
so I bet that's how they get here so fast.

------
tel
Now there just needs to be an economist to drop in and sneak an entry with a
9% improvement.

Statistics and math are excellent tools, but they're quite blind.

~~~
elq
Well... there are probably already a few teams with econometricians on board.
Based on my prior experience with that ilk, I'm sure their progress got held
up by the insistence that all returns (err, ratings) follow a Gaussian
distribution.

Also, all of that training in math and stats is pretty useless if you just use
it to "prove" something you already "know" :)

