

What is a Good Recommendation Algorithm? - Anon84
http://cacm.acm.org/blogs/blog-cacm/22925-what-is-a-good-recommendation-algorithm/fulltext

======
wheels
Greg nails something that seems to be passing the academic world of
recommendations by: you can't measure recommendation quality with RMSE. It's
just not a good metric. _User happiness_ is the goal, not the ability to
predict ratings of unrated items. I'm glad to have someone with a little more
clout than me saying this.
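For reference, RMSE just measures how far predicted ratings land from the actual ones. A minimal sketch in Python (function and variable names are mine, not from the contest):

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between predicted and actual ratings."""
    return math.sqrt(
        sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)
    )

# Off by one star in each direction on two ratings -> RMSE of 1.0:
print(rmse([4.0, 3.0], [3.0, 4.0]))  # 1.0
```

Note the metric says nothing about whether the items were worth recommending in the first place, which is exactly the complaint.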

Some ask, "What's the difference?" If I tell you about 5 albums that you've
already heard of, are the recommendations good? Even if we're pretty certain
you'll like them? If you're buying an obscure jazz album and you get "Kind of
Blue" as a recommendation (probably the most popular jazz album in history,
and one any jazz fan would know of), is that a good recommendation?

How do users build trust of recommendations? How does that factor into the
algorithms? It turns out you need a mix of obvious and surprising results. All
obvious and they don't discover anything; all surprising and they don't trust
them.

Those are the right questions. A good algorithm for recommendations is one
that people interact with and discover things with.

This is an awesome read (in fact, I uhm, submitted it a few minutes before at
Greg's blog, but it's good enough that I upvoted it here too). As soon as I
ran across it I immediately blogged, tweeted, and submitted here. I'd had a
draft of an essay along these lines kicking around for ages.

~~~
FiReaNG3L
I think they use RMSE because it's easy, not because it's ideal. BellKor, a
participating team in the Netflix challenge, discussed this in the paper
describing the method that won the progress prize; they calculated whether
minute differences in RMSE improved the quality of top-10 results, and they
did, pretty significantly.

~~~
wheels
Just fished it out -- paper is here for the curious:

[http://public.research.att.com/~volinsky/netflix/RecSys08tut...](http://public.research.att.com/~volinsky/netflix/RecSys08tutorial.pdf)

It's one, amusingly, that I'd skipped because it seemed to be less technical.
:-) Good stuff.

------
jfarmer
I'd argue that "user happiness" isn't the goal for Netflix, long-term revenue
is. That's relatively easy to measure, and certainly easier than something
nebulous like "user happiness." You can even test different recommendation
algorithms and see which maximizes long-term revenue.

Presumably Netflix knows that the recommendation algorithm has a significant
impact on their bottom line, which is why they launched the Netflix Prize to
outsource new algorithm development.

Now, Netflix can't give revenue data to third parties, and they also don't
want to let third-party recommendation algorithms run on their system because
an "average" algorithm will hurt their bottom line.

The question then becomes: which well-understood metric correlates best with
long-term revenue?

Perhaps the answer is RMSE, which is why Netflix chose it. That doesn't seem
totally implausible to me.

~~~
wheels
You'd expect that. In the recommendations world that's called "business rules"
and includes things from skewing results based on margins to not showing
inappropriate recommendations (say, women's clothing to men).

However, I'm pretty sure that Amazon's recommendations don't do that, or don't
do it much, anyway. Their "similar product" recommendations seem to be based
on a very simple (and often mediocre-quality) pure counting correlation
between purchases of two items. It's much harder to guess which algorithms are
at work for the personalized recommendations.
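A guess at what such a pure counting correlation might look like; this is a sketch of the general co-occurrence technique, not Amazon's actual code, and the basket data is made up:

```python
from collections import defaultdict
from itertools import combinations

def cooccurrence_counts(baskets):
    """Count how often each pair of items appears in the same purchase.
    The raw counts become a crude "customers who bought X also
    bought Y" signal."""
    counts = defaultdict(int)
    for basket in baskets:
        # Sort so each unordered pair is counted under one canonical key.
        for a, b in combinations(sorted(set(basket)), 2):
            counts[(a, b)] += 1
    return counts

baskets = [["kind_of_blue", "a_love_supreme"],
           ["kind_of_blue", "time_out"],
           ["kind_of_blue", "a_love_supreme", "time_out"]]
counts = cooccurrence_counts(baskets)
# A hugely popular item like "Kind of Blue" pairs highly with
# everything, which is exactly the quality problem described above.
```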

At the end of the day, profit margins aside, there's a lot that goes into
optimizing recommendations that can't be easily measured. How do you measure
customer loyalty based on good recommendations? There have been a number of
market research studies that indicate that recommendations do drive customer
loyalty, but it's hard to say where the sweet spot is between skewing things
toward higher margins vs. skewing things towards customer utility. About 80%
of Amazon's visitors aren't there to buy stuff -- and that's great for them!
They've become an information portal / window shopping location that happens
to also sell stuff. Which is a great position to be in when somebody does
think of buying stuff.

That Netflix uses RMSE for their contest doesn't bother me. What I think Greg
is reacting to, and certainly my sentiment (again, this is really similar to
something I'd been writing), is that a blurring between stimulus and response
is developing here: there's an assumption, if not in this subfield, then
certainly among those casually tracking recommendation advances, that RMSE
_is_ a good way of measuring a recommendation algorithm, not just "the metric
Netflix is using," when in fact it's a much more inexact science.

------
csytan
A simple item-based algorithm that has been reported to work quite well is
Slope One. The advantages are that it's easy to implement, can be updated on
the fly, and works well (enough) for sparse data sets.

<http://www.daniel-lemire.com/fr/abstracts/SDM2005.html>

There are also examples using Python, Java, and PHP/SQL.
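To give a feel for the method, here's a minimal, unoptimized Python sketch of weighted Slope One following Lemire's paper; the names and sample data are my own:

```python
from collections import defaultdict

def slope_one_deviations(ratings):
    """Average rating deviation dev[(j, i)] between each item pair,
    plus the number of users who rated both.
    ratings: {user: {item: rating}}"""
    dev = defaultdict(float)
    count = defaultdict(int)
    for user_ratings in ratings.values():
        for i in user_ratings:
            for j in user_ratings:
                if i != j:
                    dev[(j, i)] += user_ratings[j] - user_ratings[i]
                    count[(j, i)] += 1
    for pair in dev:
        dev[pair] /= count[pair]
    return dev, count

def predict(user_ratings, item, dev, count):
    """Weighted Slope One prediction: the user's known ratings shifted
    by the average deviation, weighted by how many users rated both."""
    num = den = 0.0
    for i, r in user_ratings.items():
        if i != item and (item, i) in dev:
            num += (r + dev[(item, i)]) * count[(item, i)]
            den += count[(item, i)]
    return num / den if den else None

ratings = {"alice": {"lp1": 5, "lp2": 3},
           "bob":   {"lp1": 3, "lp2": 2},
           "carol": {"lp1": 4}}
dev, count = slope_one_deviations(ratings)
print(predict(ratings["carol"], "lp2", dev, count))  # 2.5
```

The on-the-fly update property mentioned above comes from the fact that the deviations are just running sums and counts, so a new rating only touches the pairs it participates in.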

------
goodkarma
A friend of mine made a Rails plugin called acts_as_recommendable (a plugin
for collaborative filtering):
<http://github.com/maccman/acts_as_recommendable/tree/master>

------
Aron
1) Getting more preference-defining data from the user trumps algorithm
improvements at this point. Netflix would have improved RMSE even more by
turning over additional data like queue-adds, page views, user age/gender,
etc. 2) Use caution when criticizing RMSE as overly blunt. It may seem so, but
it is not obvious that an algorithm can be improved for top-N prediction
simply because you declare that as the focus.

------
ntoshev
Netflix needed a formal measure for their contest, so RMSE is a useful one
while "making people happy" is not. A business that relies on recommendations
can plug in a new algorithm with better RMSE and get improved results
immediately; it's an important part of the puzzle.

~~~
sah
"Making people happy" is hard to define, but you can pick better concrete
metrics than RMSE, and this article offers suggestions on how. An important
part of solving any problem is defining success correctly.
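For what it's worth, one common ranking-oriented metric is precision-at-k; this is just one concrete option, not necessarily one the article proposes:

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items the user actually
    engaged with (clicked, rented, rated highly, etc.)."""
    return sum(1 for item in recommended[:k] if item in relevant) / k

# 2 of the top 4 recommendations were hits:
print(precision_at_k(["a", "b", "c", "d", "e"], {"b", "d", "f"}, 4))  # 0.5
```

Unlike RMSE, this rewards putting the right items at the top of the list rather than predicting ratings accurately across the whole catalog.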

