Bandit Algorithms Book [pdf] (tor-lattimore.com)
195 points by csabapalfi 7 months ago | 16 comments



This came up a couple of days ago: https://news.ycombinator.com/item?id=17637683


I skimmed through this and have already found a bunch of interesting sections, but there's also a ton of background information on topics related to bandit algorithms.

The authors say that this is the first draft of the book submitted to the publisher, so I suppose it's nearly complete? More details available at the site they put up, http://banditalgs.com/


Never heard of bandit algorithms before! Or if I did, I didn't recognize them as something different from probability. What have people around here used them for?


You can use them to find the best of the options being tested in as few trials as possible.

Say you are selling a product and you are A/B testing something related to buying it. When a user visits the site, you ideally want to show them the version you are more confident is better. With a bandit approach you can determine whether, say, option A is currently better (w.r.t. some confidence bounds). After each visit you update the bounds, and after sufficiently many visits you have a winner. The main difference from more traditional A/B testing is that the process is more adaptive, so less time is wasted exposing users to an inferior version.
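To make the "update the bounds after each visit" idea concrete, here is a minimal sketch using UCB1, one common confidence-bound rule. UCB1 is my choice for illustration, not necessarily what any given A/B tool does, and the conversion rates, function names, and visit count are all made up for the simulation.

```python
import math
import random


def ucb1_choose(counts, rewards, t):
    """Pick the variant with the highest upper confidence bound (UCB1)."""
    # Show every variant once before trusting any estimate.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    # Empirical conversion rate plus an exploration bonus that shrinks
    # as a variant accumulates visits.
    return max(
        range(len(counts)),
        key=lambda a: rewards[a] / counts[a]
        + math.sqrt(2 * math.log(t) / counts[a]),
    )


def run_ab_bandit(conversion_rates, visits, seed=0):
    """Simulate visitors; `conversion_rates` are hypothetical true rates."""
    rng = random.Random(seed)
    counts = [0] * len(conversion_rates)   # visitors shown each variant
    rewards = [0] * len(conversion_rates)  # purchases per variant
    for t in range(1, visits + 1):
        arm = ucb1_choose(counts, rewards, t)
        counts[arm] += 1
        if rng.random() < conversion_rates[arm]:
            rewards[arm] += 1
    return counts, rewards


# Variant A converts at 5%, variant B at 15% (assumed numbers).
counts, rewards = run_ab_bandit([0.05, 0.15], visits=10_000)
print("visitors per variant:", counts)
print("purchases per variant:", rewards)
```

Over the run, most visitors should end up on the better variant, which is the "less time wasted" part. A fixed 50/50 split would keep sending half the traffic to the worse one until the test ends.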


Bandits are probably one of the most underrated machine learning algorithms. One possible application is recommendation systems. Shameless self promotion. I wrote an article about it: https://towardsdatascience.com/how-not-to-sort-by-popularity...


They're probably the most fundamental kind of reinforcement learning algorithms. Understanding bandit algorithms is crucial to developing a good understanding of RL.


This Rust project, to manage the number of threads in a Monero miner, AFAIR: https://github.com/Ragnaroek/mithril


Doesn't AlphaGo use some form of bandit algorithm in its Monte Carlo code?


I believe that Monte Carlo Tree Search, used in AlphaGo, does work using bandit algorithms. On top of that AlphaGo uses Reinforcement Learning, which also uses bandit algorithms (in Sutton & Barto's book, "Reinforcement Learning: An Introduction", all of chapter 2 is about multi-armed bandits).


Readers who enjoy banditry may also enjoy John Langford's http://hunch.net


It always makes me sad that Thompson Sampling isn't (or at least doesn't appear to be) mentioned alongside things like UCB1. It's theoretically optimal, relatively easy to grok, and not significantly more difficult to implement.
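For anyone curious how little code it takes, here is a rough sketch of Thompson Sampling for yes/no rewards. It assumes a uniform Beta(1, 1) prior on each arm's success rate. The arm rates, function names, and round count are invented for the simulation.

```python
import random


def thompson_choose(successes, failures, rng):
    """Sample a plausible success rate per arm and play the best sample."""
    # Posterior for each arm is Beta(successes + 1, failures + 1)
    # under the assumed uniform prior.
    samples = [
        rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)
    ]
    return max(range(len(samples)), key=samples.__getitem__)


def run_thompson(true_rates, rounds, seed=0):
    """Simulate `rounds` pulls; `true_rates` are hypothetical arm rates."""
    rng = random.Random(seed)
    successes = [0] * len(true_rates)
    failures = [0] * len(true_rates)
    for _ in range(rounds):
        arm = thompson_choose(successes, failures, rng)
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures


ts_successes, ts_failures = run_thompson([0.05, 0.15], rounds=5_000)
ts_pulls = [s + f for s, f in zip(ts_successes, ts_failures)]
print("pulls per arm:", ts_pulls)
```

Exploration comes for free here: an arm with few observations has a wide posterior, so it occasionally produces a high sample and gets tried again.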


I really appreciate you sharing the book. However, to everyone in charge of naming these files: please don't call it "book.pdf". It makes everyone rename the file after downloading it so that they can find it later. Give it a more descriptive name.

Thanks


Cool, nice to see that Tor was a student at ANU.


Is this the book that is going to make me a master poker player?


If you play long enough it will make you regret less


Well that's really great! What is it?



