
Multi-Armed Bandits, Conjugate Models and Bayesian Reinforcement Learning - _eigenfoo
https://eigenfoo.xyz/bayesian-bandits/
======
heydenberk
I spent two years[0] designing, building and maintaining a system which used
contextual multi-armed bandits at large scale. A couple pieces of advice
relating to this post and this subject:

1\. Thompson sampling is great. It's intuitive and computationally tractable.
The literature is full of other strategies, specifically semi-uniform
strategies, but I strongly recommend using Thompson sampling if it works for
your problem.

2\. This is broadly true about ML, but for contextual bandits, most of the
engineering work will probably be the feature engineering, not algorithm
implementation. Plan accordingly. Choosing the right inputs in the first place
matters a lot, and the hashing trick (a la sklearn's dictvectorizer) can make
a huge difference.

3\. It can be difficult to obtain organizational alignment on the intention of
using reinforcement learning. Tell stakeholders early and often that you're
using bandit algos to produce some kind of outcome — say, clicks or
conversions — and not to do science which will uncover deep truths.

[0] along with an excellent data scientist and a team of excellent engineers,
of course :)
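
To make point 1 concrete: a minimal Beta-Bernoulli Thompson sampling loop in
plain Python (a toy sketch with made-up click-through rates, not anything from
the linked post):

```python
import random

# Hypothetical true click-through rates, unknown to the algorithm.
true_rates = [0.05, 0.12, 0.08]
n_arms = len(true_rates)

# Beta(1, 1) prior per arm; alpha counts successes + 1, beta failures + 1.
alphas = [1.0] * n_arms
betas = [1.0] * n_arms
pulls = [0] * n_arms

random.seed(0)
for _ in range(5000):
    # Sample a plausible rate from each arm's posterior and play the
    # arm with the highest sample -- that's all Thompson sampling is.
    samples = [random.betavariate(a, b) for a, b in zip(alphas, betas)]
    arm = samples.index(max(samples))
    pulls[arm] += 1

    # Observe a Bernoulli reward and update that arm's posterior.
    reward = 1 if random.random() < true_rates[arm] else 0
    alphas[arm] += reward
    betas[arm] += 1 - reward

# Over time, pulls should concentrate on the best arm.
```

Exploration falls out automatically: arms with wide posteriors occasionally
produce the winning sample, so they keep getting tried until the data rules
them out.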

~~~
ma2rten
_The hashing trick (a la sklearn's dictvectorizer) can make a huge
difference._

I'm not a huge fan of that. Hash collisions can lead to unexpected behaviors
in production and make feature attribution for debugging harder.

It's slightly more effort to implement, but with a trie data structure you can
store even the biggest feature mapping in memory.
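
A minimal sketch of such a feature-name-to-index trie (nested dicts for
clarity; a memory-conscious version would use a compressed trie structure):

```python
class FeatureTrie:
    """Map feature-name strings to dense integer indices character by
    character, instead of keeping one flat hash map of the vocabulary."""

    def __init__(self):
        self.root = {}
        self.next_index = 0

    def get_index(self, name):
        node = self.root
        for ch in name:
            node = node.setdefault(ch, {})
        # Store the index under a sentinel key that can't be a character,
        # so names that are prefixes of other names still work.
        if None not in node:
            node[None] = self.next_index
            self.next_index += 1
        return node[None]

trie = FeatureTrie()
i = trie.get_index("user_country=US")
j = trie.get_index("user_country=UK")
k = trie.get_index("user_country=US")  # same name -> same index as i
```

Unlike hashing, every distinct name gets its own index, so feature
attribution stays exact.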

~~~
lsorber
How are you compressing the feature space in that case, by truncating the
trie?

~~~
ma2rten
You could use clustering, dimensionality reduction, or feature selection.

However, the way I have seen the hashing trick being used is not to compress
the feature space. For most problems it would be a bad idea to just lump your
most discriminative features together with some other random ones. Instead
people just choose a very large feature space which makes collisions unlikely.
For model implementations using sparse matrices it doesn't matter if the
feature space is very large. The main advantage of this is that you don't have
to keep an expensive hash map of your vocabulary in memory (hence my
suggestion to use a trie).
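
A self-contained sketch of that usage pattern (hand-rolled hashing with
made-up sizes, not sklearn's actual implementation):

```python
import hashlib

N_BUCKETS = 2 ** 24  # very large fixed feature space; no vocabulary kept

def hash_features(raw):
    """Map a {"feature=value": weight} dict to a {index: weight} sparse vector."""
    vec = {}
    for name, weight in raw.items():
        digest = hashlib.md5(name.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "little") % N_BUCKETS
        vec[index] = vec.get(index, 0.0) + weight  # collisions just add up
    return vec

x = hash_features({"user_country=US": 1.0, "hour_of_day=14": 1.0})

# Rough birthday bound: expected colliding pairs among n distinct names
# is about n * (n - 1) / (2 * buckets) -- roughly 300 pairs out of
# 100,000 names here, i.e. the vast majority of features are unaffected.
n = 100_000  # assumed vocabulary size
expected_collisions = n * (n - 1) / (2 * N_BUCKETS)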

------
atrudeau
Though it's mentioned in the article, I'll add it here for posterity:
[https://web.stanford.edu/~bvr/pubs/TS_Tutorial.pdf](https://web.stanford.edu/~bvr/pubs/TS_Tutorial.pdf)

Great tutorial on Thompson sampling.

------
mopierotti
I would strongly recommend the post he cited. It is the same style but
features interactive visualizations: [https://dataorigami.net/blogs/napkin-folding/79031811-multi-...](https://dataorigami.net/blogs/napkin-folding/79031811-multi-armed-bandits)

I implemented something like this for my company and found the latter article
quite helpful in explaining the concept to people who understood the basics of
probability but not programming.

------
pengstrom
I want to recommend [http://camdavidsonpilon.github.io/Probabilistic-Programming-...](http://camdavidsonpilon.github.io/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/), a nice little book about probabilistic
programming in Python.

------
melling
There was a free Bandits algorithm book discussed on HN about a month ago.

[https://news.ycombinator.com/item?id=17642564](https://news.ycombinator.com/item?id=17642564)

~~~
_eigenfoo
Yes! I haven't read all of it yet, but from what I've seen so far, the book
spends a lot of time rigorously proving mathematical properties/bounds of
various bandit algorithms. I love rigor as much as the next guy, but I also
like seeing code :)

------
atrudeau
The math isn't rendering for me...

~~~
_eigenfoo
Right! Sorry about that, forgot to add mathjax. Should be fixed soon...

~~~
atrudeau
I was about to send you an email! :)

------
tegansnyder
This is an excellent, well-written post. Thanks for sharing.

