Introduction to Thompson Sampling: The Bernoulli Bandit (2017) (gdmarmerola.github.io)
57 points by pncnmnp 9 months ago | 13 comments



My favorite resource on Thompson Sampling is <https://everyday-data-science.tigyog.app/a-b-testing>.

After learning about it, I went on to replace the UCT formula in MCTS with it and the results were... not much better, actually. But it made me understand both a little better.
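
For the curious, the swap looks roughly like the sketch below (the node structure and names are hypothetical; it assumes binary win/loss playout results and a Beta(1, 1) prior per child):

```python
import random


class Node:
    """Hypothetical MCTS node tracking win/loss counts for its subtree."""
    def __init__(self, children=None):
        self.children = children or []   # child Nodes
        self.wins = 0                    # simulated wins seen through this node
        self.losses = 0                  # simulated losses seen through this node


def select_child_thompson(node):
    """Selection step with Thompson Sampling in place of UCT: draw one sample
    from each child's Beta posterior over its win rate and descend into the
    child with the highest draw."""
    draws = [random.betavariate(1 + c.wins, 1 + c.losses) for c in node.children]
    return node.children[draws.index(max(draws))]
```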


My favorite is this series from 2015 by Ian Osband:

https://iosband.github.io/2015/07/19/Efficient-experimentati...


Love it! Thanks for sharing


Thompson Sampling, a.k.a. Bayesian Bandits, is a powerful method for runtime performance optimization. We use it in ClickHouse to optimize compression and to choose between different instruction sets: https://clickhouse.com/blog/lz4-compression-in-clickhouse
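
The idea is roughly the following (a minimal sketch only, not ClickHouse's actual code; it assumes the reward for each trial is a throughput measurement normalized into [0, 1]):

```python
import random


class KernelBandit:
    """Thompson Sampling over runtime alternatives (e.g. decompression kernels
    or instruction-set variants). A sketch only."""

    def __init__(self, kernels):
        self.kernels = kernels
        self.stats = {k: [1.0, 1.0] for k in kernels}   # Beta(alpha, beta) per kernel

    def choose(self):
        # Sample a plausible mean reward for each kernel and pick the best draw.
        draws = {k: random.betavariate(a, b) for k, (a, b) in self.stats.items()}
        return max(draws, key=draws.get)

    def update(self, kernel, reward):
        # reward in [0, 1], treated as a "soft" success count.
        a, b = self.stats[kernel]
        self.stats[kernel] = [a + reward, b + (1.0 - reward)]
```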


This is great. I remember finding another really good resource on the Bernoulli bandit that was interactive. Putting feelers out there to see if anyone knows what I’m talking about off the top of their heads.


I built a contextual bandit combining XGBoost with Thompson Sampling; you can check it out at https://improve.ai


What's the added value over Thompson Sampling?


It can learn faster and generalize to unseen variants by learning the impact different features have on the conversion rate.

It can also learn how different variants perform in different contexts.
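
Roughly, one way to wire these two together (a sketch, not necessarily how improve.ai does it) is to train a small bootstrap ensemble of boosted-tree regressors on (context, variant) -> reward, sample one member per decision as an approximate posterior draw, and pick the variant it scores highest:

```python
import random
import numpy as np
from xgboost import XGBRegressor   # any regressor with fit/predict would do


def featurize(context, variant):
    # Hypothetical featurization: context and variant are both 1-D feature arrays.
    return np.concatenate([context, variant])


class BootstrapTS:
    """Contextual Thompson Sampling via a bootstrap ensemble of boosted trees."""

    def __init__(self, n_models=10):
        self.models = [XGBRegressor(n_estimators=50) for _ in range(n_models)]
        self.X, self.y = [], []                      # (features, observed reward)

    def choose(self, context, variants):
        if not self.y:                               # no data yet: explore uniformly
            return random.choice(variants)
        model = random.choice(self.models)           # one approximate posterior sample
        scores = model.predict(np.stack([featurize(context, v) for v in variants]))
        return variants[int(np.argmax(scores))]

    def observe(self, context, variant, reward):
        self.X.append(featurize(context, variant))
        self.y.append(reward)
        for m in self.models:                        # refit each member on a bootstrap resample
            idx = np.random.randint(0, len(self.y), size=len(self.y))
            m.fit(np.stack(self.X)[idx], np.array(self.y)[idx])
```

Refitting the whole ensemble on every observation is wasteful; in practice you'd batch the updates, but the decision rule is the same.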


If you have an NN that is probabilistic, how do you update the prior after sampling from the posterior?


You take the action which you computed to be optimal under the hypothetical of your posterior sample; this then yields a new observation. You add that to the dataset, and train a new NN.
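
Concretely, in the simplest case (a Bernoulli bandit with exact Beta posteriors) the loop looks like the sketch below. With a probabilistic NN, the betavariate draw becomes drawing one model from the approximate posterior, and the count update becomes retraining on the grown dataset:

```python
import random

true_rates = [0.3, 0.5, 0.7]                 # hidden conversion rates (simulation only)
alpha = [1] * len(true_rates)                # Beta(1, 1) prior per arm
beta = [1] * len(true_rates)

for t in range(1000):
    # 1. Sample one hypothesis from the posterior over each arm's rate.
    theta = [random.betavariate(a, b) for a, b in zip(alpha, beta)]
    # 2. Act optimally under that hypothetical.
    arm = theta.index(max(theta))
    # 3. Observe a reward.
    reward = random.random() < true_rates[arm]
    # 4. "Add it to the dataset and train a new model" -- here just a count update.
    if reward:
        alpha[arm] += 1
    else:
        beta[arm] += 1
```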


ah, so observe the reward and then take a gradient step


(Well, not necessarily, which is why I framed it as training from scratch, to make it clear that it doesn't necessarily have anything to do with SGD or HMC etc. In theory it shouldn't matter, by the likelihood principle, but in practice a single gradient step might not give you the same model as training from scratch. You'd like it to be a gradient step because that would save you a ton of compute, but I don't know how well Bayesian NNs actually manage that. And even if it works OK in supervised problems or the simplest bandit RL, it might not work in full PSRL because deep RL is so unstable.)


Beautifully composed article. Looking forward to trying this out.



