
Regret Analysis of Stochastic and Nonstochastic Multi-Armed Bandit Problems - dragontamer
https://arxiv.org/abs/1204.5721
======
dragontamer
There's a lot of talk about "AlphaGo" and "MCTS" around here. But people
should understand that MCTS is built on top of a particular statistical
problem called the "Multi-Armed Bandit Problem".

The Multi-Armed Bandit Problem describes a gambler who is trying to optimize
their gains. There are a finite number of slot machines in front of the
gambler, and every slot machine has a different probability of winning.

What strategy should the gambler adopt to maximize their winnings? In general,
the various algorithms balance "Exploitation" vs "Exploration". Exploration
looks for better machines, while Exploitation plays the machine with the best
statistics gathered so far.
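To make that concrete, here's a minimal sketch of the classic UCB1 strategy (one of the algorithms covered in the survey) against a hypothetical set of three slot machines. The win probabilities are my own made-up numbers; a real gambler wouldn't know them.

```python
import math
import random

random.seed(42)  # fixed seed just so the run is reproducible

# Hypothetical setup: three machines with win probabilities the gambler
# does NOT get to see.
TRUE_PROBS = [0.2, 0.5, 0.8]

def pull(arm):
    """Play one machine; reward is 1 on a win, 0 otherwise."""
    return 1.0 if random.random() < TRUE_PROBS[arm] else 0.0

def ucb1(n_rounds=10_000):
    n_arms = len(TRUE_PROBS)
    counts = [0] * n_arms     # plays per machine
    rewards = [0.0] * n_arms  # total winnings per machine
    for arm in range(n_arms):  # play every machine once to start
        rewards[arm] += pull(arm)
        counts[arm] += 1
    for t in range(n_arms + 1, n_rounds + 1):
        # Exploitation (observed mean reward) plus an exploration bonus
        # that grows for machines we haven't tried in a while.
        scores = [rewards[a] / counts[a]
                  + math.sqrt(2 * math.log(t) / counts[a])
                  for a in range(n_arms)]
        arm = max(range(n_arms), key=scores.__getitem__)
        rewards[arm] += pull(arm)
        counts[arm] += 1
    return counts

counts = ucb1()
print(counts)  # the best machine (index 2) should dominate the play counts
```

Note how the exploration bonus shrinks as a machine gets played more, so the algorithm naturally shifts from exploring to exploiting the apparent best machine.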

In the case of MCTS, the different branches of the search tree are treated as
the arms of a multi-armed bandit / different slot machines. UCT (Upper
Confidence bounds applied to Trees) is the UCB algorithm (described in this
survey) applied to a search tree.
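A sketch of the selection rule, assuming a node tracks its total reward and visit count (the names here are my own, not from the survey):

```python
import math

def uct_score(total_reward, visits, parent_visits, c=math.sqrt(2)):
    """UCB1-style score for choosing which child branch to descend into."""
    # Unvisited children get selected first.
    if visits == 0:
        return float("inf")
    # Exploitation (average reward of this branch) plus an exploration
    # bonus that shrinks as the branch accumulates visits.
    return total_reward / visits + c * math.sqrt(math.log(parent_visits) / visits)
```

At each node, MCTS descends into the child maximizing this score, so promising branches get searched more deeply while rarely-visited branches still get occasional attention.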

Marketing experts have also used multi-armed bandits as a mechanism for A/B
testing, e.g. to determine the best placement of ads.

Finally, there are applications to business and research funding. Which
research efforts should be funded, for example, is very much a multi-armed
bandit problem.

As such, the Multi-Armed Bandit Problem is a fundamental component of a lot of
Hacker News discussion, even if people don't yet realize it. That's why I'm
posting this excellent survey by Bubeck and Cesa-Bianchi, which provides a
good introduction to the Multi-Armed Bandit Problem.

