

Bandit Algorithms for Recommendation Systems - sergeyfeldman
http://engineering.richrelevance.com/bandits-recommendation-systems/

======
j2kun
I've been studying a handful of these bandit algorithms for a while now (with
some technical expositions on my own blog [1, 2]), and my PhD advisor has done
a lot of work on improving the theoretical guarantees for various settings.

But the more I read about bandit algorithms the more I think they're not
enough. In particular, they don't take into account natural things we have in
practice: similarities in the structure of the actions, correlations in
rewards across rounds, and the ability to precompute without measuring your
regret against the results.

There are some new research directions that do each of these, though. People
are now starting to impose metrics on the actions so that "closer" actions
have more correlated rewards. They have also been doing a lot of work on
"contextual bandits" in which every round comes paired with a context (i.e.
browser type, information about past searches, etc), and relationships among
the contexts can be leveraged. And finally, there is a model of "pure
exploration," in which you're given some fixed (but unknown) amount of time to
explore freely, and the recommender you come up with when that period ends is
judged on how well it performs from then on.
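To make the contextual-bandit idea concrete, here's a minimal sketch of disjoint LinUCB (one ridge-regression model per action, pick the action with the highest upper confidence bound given the context). This is my own toy illustration, not code from anyone's paper, and the class/parameter names are made up:

```python
import numpy as np

class LinUCB:
    """Disjoint LinUCB sketch: one ridge model per arm; each round,
    play the arm whose upper confidence bound is highest for the
    current context vector x."""

    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha                               # exploration width
        self.A = [np.eye(dim) for _ in range(n_arms)]    # per-arm Gram matrix (ridge prior = I)
        self.b = [np.zeros(dim) for _ in range(n_arms)]  # per-arm reward-weighted context sums

    def select(self, x):
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b                            # ridge estimate of this arm's weights
            scores.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x
```

The point is that information is shared across rounds through the context: two different users with similar feature vectors inform the same per-arm model, which is exactly the structure a context-free bandit throws away.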

I don't have a lot of intuition about which algorithms/settings are best for
which situations, but if you're interested in this kind of thing you should
check out the work of John Langford, Peter Auer, Sebastien Bubeck, and the
many others in this field.

[1]: http://jeremykun.com/2013/10/28/optimism-in-the-face-of-uncertainty-the-ucb1-algorithm/
[2]: http://jeremykun.com/2013/11/08/adversarial-bandits-and-the-exp3-algorithm/

~~~
sergeyfeldman
Thanks for the comment. The third blog post in this series will discuss linear
contextual bandits, for which I used Langford's VW (Vowpal Wabbit).

It performs a huge amount better.

~~~
cbsmith
Yeah, in general, the contextual bandit approach seems to be a better
strategy.

------
noelwelsh
No love for Thompson Sampling?

~~~
sergeyfeldman
This is blog post 1 out of 3. The second is all about Thompson Sampling, and
should be coming out on Friday =)
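For the impatient, the core of Thompson Sampling for Bernoulli rewards fits in a few lines: sample a plausible success rate for each arm from its Beta posterior and play the arm with the highest draw. This is a generic textbook sketch with my own function names, not a preview of the upcoming post:

```python
import random

def thompson_select(successes, failures):
    """Draw one sample per arm from Beta(s+1, f+1) (uniform prior)
    and return the index of the arm with the largest draw."""
    samples = [random.betavariate(s + 1, f + 1)
               for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=lambda i: samples[i])

def run(true_rates, rounds, seed=0):
    """Simulate Bernoulli arms with the given true payoff rates."""
    random.seed(seed)
    wins = [0] * len(true_rates)
    losses = [0] * len(true_rates)
    for _ in range(rounds):
        arm = thompson_select(wins, losses)
        if random.random() < true_rates[arm]:
            wins[arm] += 1
        else:
            losses[arm] += 1
    return wins, losses
```

Because the posterior for a rarely-played arm stays wide, its samples occasionally come out on top, so exploration happens automatically without any explicit epsilon or confidence-bound term.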

