
Superhuman AI in heads-up no-limit poker: Libratus beats top professionals [pdf] - fmihaila
http://science.sciencemag.org/content/sci/early/2017/12/15/science.aao1733.full.pdf
======
antognini
One of the researchers on this team gave a great talk at NIPS this year about
this work and explained how playing an imperfect information game like poker
is fundamentally different from playing a perfect information game like chess
or go. A perfect information game has no path dependence --- once you arrive
at a state, it doesn't matter how you got there. You only need to consider the
current state of the game to figure out the best move to play.

In an imperfect information game this is no longer true. Now you need to
consider counterfactuals --- how would the opponent have played if I had done
something different earlier?

They illustrated with a toy game they called "Coin Toss". The idea is that
Player A flips a coin, but Player B doesn't get to see the result. Player A
then gets to choose whether to end the game there or continue playing. If they
end the game, they get $1 from Player B if the coin is heads and pay $1 to
Player B if the coin is tails. If they decide to continue playing, then Player
B gets to guess whether the coin is heads or tails. If Player B guesses
correctly, they get $2 from Player A, and otherwise pay $2 to Player A.

If you spend some time thinking about the strategy, you can figure out that
the best strategy for Player B is to guess heads 25% of the time and tails 75%
of the time. But if you change the rules to the game for Player A, so that
they get $1 if it's tails and lose $1 if it's heads, then the best strategy
for Player B reverses as well --- now Player B should guess heads 75% of the
time and tails 25% of the time. Nothing about Player B's situation has
changed --- the only change happened prior to Player B's involvement. Yet it
still changes Player B's strategy, because Player B must consider how Player
A would have acted given that the coin landed heads vs. tails.
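
You can check the 25/75 claim with a few lines of arithmetic. This is my own
illustration, not from the talk: it computes Player A's best-response value as
a function of Player B's heads-guessing probability p, and finds the p that
minimizes it (an unexploitable B minimizes what a best-responding A can win).

```python
# If B guesses heads with probability p, A's expected value of continuing is:
#   coin heads: continue = -2p + 2(1-p) = 2 - 4p,  vs. end = +1
#   coin tails: continue = +2p - 2(1-p) = 4p - 2,  vs. end = -1
# B is unexploitable where a best-responding A gains the least.

def a_value(p):
    """A's expected value against heads-guess probability p, A best-responding."""
    ev_heads = max(1.0, 2.0 - 4.0 * p)    # end vs. continue, coin = heads
    ev_tails = max(-1.0, 4.0 * p - 2.0)   # end vs. continue, coin = tails
    return 0.5 * ev_heads + 0.5 * ev_tails

# Scan a grid of probabilities; A's best-response value bottoms out at p = 0.25.
# In the flipped game ($1 for tails), the same computation gives p = 0.75.
best_p = min((p / 1000 for p in range(1001)), key=a_value)
print(best_p, a_value(best_p))  # → 0.25 0.0
```

Note that B's value doesn't depend on the coin B never sees --- only on how A
would have played each branch, which is exactly the counterfactual reasoning
described above.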

So to solve these imperfect information games, you have to be very careful to
model these sorts of counterfactuals. If you mess up, you will leave yourself
with an exploitable strategy --- that is, an opponent can devise a
counter-strategy that systematically beats yours.
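
For the curious, the standard way to do this is counterfactual regret
minimization (CFR), the family of algorithms Libratus builds on. Below is a
minimal self-play sketch of vanilla CFR for the Coin Toss game above --- my
own toy implementation under the comment's rules, not Libratus's actual
algorithm; the infoset names are made up for illustration.

```python
# Infosets: A sees the coin ("A_H", "A_T"), actions [end, continue];
# B does not see it, so B has one infoset "B", actions [heads, tails].

def regret_matching(regrets):
    """Turn cumulative regrets into a strategy (uniform if none positive)."""
    pos = [max(r, 0.0) for r in regrets]
    total = sum(pos)
    return [p / total for p in pos] if total > 0 else [0.5, 0.5]

regrets = {k: [0.0, 0.0] for k in ("A_H", "A_T", "B")}
strategy_sum = {k: [0.0, 0.0] for k in ("A_H", "A_T", "B")}

for _ in range(200_000):
    sig = {k: regret_matching(r) for k, r in regrets.items()}
    p_heads = sig["B"][0]

    # Player A's regrets (values from A's perspective; coin prob 0.5 each).
    for coin, key in (("H", "A_H"), ("T", "A_T")):
        end_v = 1.0 if coin == "H" else -1.0
        # If A continues, B's correct guess costs A $2, a wrong guess pays A $2.
        cont_v = (2.0 - 4.0 * p_heads) if coin == "H" else (4.0 * p_heads - 2.0)
        node_v = sig[key][0] * end_v + sig[key][1] * cont_v
        regrets[key][0] += 0.5 * (end_v - node_v)
        regrets[key][1] += 0.5 * (cont_v - node_v)
        for a in range(2):
            strategy_sum[key][a] += sig[key][a]

    # Player B's regrets (values from B's perspective), weighted by the
    # probability that A actually continued with each coin --- this is the
    # "counterfactual" part.
    w_h = 0.5 * sig["A_H"][1]          # reach probability via heads
    w_t = 0.5 * sig["A_T"][1]          # reach probability via tails
    v_heads = 2.0 * w_h - 2.0 * w_t    # B guesses heads
    v_tails = 2.0 * w_t - 2.0 * w_h    # B guesses tails
    node_v = p_heads * v_heads + (1.0 - p_heads) * v_tails
    regrets["B"][0] += v_heads - node_v
    regrets["B"][1] += v_tails - node_v
    for a in range(2):
        strategy_sum["B"][a] += sig["B"][a]

# The *average* strategy converges to the equilibrium: heads ~0.25, tails ~0.75.
avg_b = [s / sum(strategy_sum["B"]) for s in strategy_sum["B"]]
print(f"B's average strategy: heads {avg_b[0]:.3f}, tails {avg_b[1]:.3f}")
```

Libratus adds abstraction, subgame re-solving, and much faster CFR variants on
top of this basic loop, but the counterfactual bookkeeping is the same idea.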

Link to the paper here:

[https://arxiv.org/pdf/1705.02955.pdf](https://arxiv.org/pdf/1705.02955.pdf)

~~~
roenxi
I'm going to read the paper carefully and think hard, but going into it I
still have philosophical issues with calling imperfect information
'fundamentally' different.

Sure, the specific learning algorithm may not generalise from board games to
poker. But to claim that Go is perfect-information is more theoretical than
practical.

In practice, the 50% of moves your opponent will make are unknowns. Nobody can
assess the entire game tree, so the fact that it is theoretically knowable
isn't actually relevant in a game. Neither computers nor humans can compute
it. Once the tree can be reasonably computed, the game between high-level
players is over (bar the occasional blunder).

~~~
jpapon
>> Nobody can assess the entire game tree, so the fact that it is
theoretically knowable isn't actually relevant in a game.

That's not the point though. The point is that you only need to consider the
current state when choosing your next move in Go or Chess. This is not the
case in an incomplete information game, since the sequence of moves that led
to the current state contains a lot of information. Basically, Chess and Go
satisfy the Markov property, while Poker does not.

~~~
roenxi
The current state in Go is the state of the board, the current state of poker
includes some historical data. This difference is only significant to humans
because we have quite poor memories.

An AI has perfect recall, and the fact that a variable is temporal really
doesn't make a difference to the theory.

(Coincidental aside, this is an issue in Go as well due to the ko rule; but it
isn't a major part of the game such as in poker.)

~~~
jononor
Which AI architecture has perfect recall, in a way that actually makes use of
this information? Pretty sure AlphaGo et al. do not make use of memory at all.

------
tlholaday
I wonder about an AI being as reluctant to go bankrupt as is a human. Maybe
that reluctance is abstracted away by the betting evaluation.

~~~
xapata
Yeah, that's a major flaw in their research design. In their 120k-hand
competition they reset the players' bankroll after each hand in order to have
the same number of hands for each trial. This dramatically changes the payoff
for aggressive play. The humans were too cautious against the bot, as their
experience was in an environment when going bust could end your session.

In real play, going bust* doesn't just lose your current stack, but also all
future possible winnings. In their experiment, going bust only loses that
stack. Kind of like being a soldier rather than a general, or founder rather
than a VC.

* Yes, you can buy in again, but eventually you've exhausted your bankroll.

~~~
splonk
I don't believe this meaningfully affects top HUNL cash players, who tend to
have auto-rebuy set anyway and at least think that they're bankrolled for
most games they play. If anything, it might help them a bit by letting them
continuously play at the stack size they have the most experience with.

~~~
xapata
Bankrolled to what extent? The analysis after the experiment showed that the
bot pushed small edges far harder than the humans did. Little else differed.
That's consistent with my theory.

------
splonk
There's some previous discussion of the "beats top professionals" part of this
here:
[https://news.ycombinator.com/item?id=13534778](https://news.ycombinator.com/item?id=13534778)
[https://news.ycombinator.com/item?id=13358138](https://news.ycombinator.com/item?id=13358138)

Note that this shouldn't really be a surprise - it's very likely that the best
HUNL AIs were as good as or better than the best humans maybe 5 years ago. The
recent developments over the last couple of years are more about widening the gap
and increasing the confidence in the results. I haven't fully read this paper
yet but my impression is that it's mostly about architectural and abstraction
improvements, and possibly just chucking more hardware at it.

------
StanislavPetrov
Interesting research, although people familiar with poker would not consider
Jason Les, Dong Kim, Daniel McAulay, and Jimmy Chou "top professionals". They
may be very good at HUNL but they are definitely not the "top".

Many people will look at this and say, "computers have solved poker" without
understanding that this is applicable to heads up poker only. Full ring game
poker, where there are an uncertain number of non-GTO players still to act
behind, is an entirely different beast.

I'd also be interested in knowing how deep the stacks were in the human vs
computer matches. The complexity of the game is directly related to how deep
stacks are. A 10 big blind HU match, a 100 big blind HU match, and a 1,000 big
blind HU match are three entirely different games with entirely different
decision trees.
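
One way to see this is to count legal betting sequences as stacks deepen. The
sketch below is my own simplified model (a single betting round, whole-big-blind
bets, and the no-limit rule that a raise must be at least the size of the
previous raise), so the absolute numbers mean little, but the growth is the
point.

```python
from functools import lru_cache

def betting_sequences(stack_bb: int) -> int:
    """Count betting sequences in one round, starting facing the 1BB big blind."""
    @lru_cache(maxsize=None)
    def count(current_bet: int, min_raise: int) -> int:
        total = 2  # fold or call ends the action here
        # A raise must go to at least current_bet + min_raise, up to all-in.
        for new_bet in range(current_bet + min_raise, stack_bb + 1):
            total += count(new_bet, new_bet - current_bet)
        return total
    return count(1, 1)

for s in (10, 50, 100):
    print(s, betting_sequences(s))
```

Even in this stripped-down model the count grows rapidly with stack depth,
which is why abstraction choices that work at one depth don't transfer to
another.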

------
DominikPeters
Video of an interesting panel from AAAI-17 with the heads of the two teams who
built human-level heads-up no-limit programs.

[http://videolectures.net/aaai2017_bowling_sandholm_poker/](http://videolectures.net/aaai2017_bowling_sandholm_poker/)

They use very different approaches (deep learning versus clever game-tree
pruning). It's not clear how to tell which of them is the stronger program.
Both teams seem to think that theirs is stronger, but I think only Libratus
(the pruning-based approach) has beaten top poker pros in a tournament.

~~~
xapata
> beaten ... in a tournament

They didn't play a real tournament with cash at stake.

------
yazr
Recent reddit AMA on this

[https://www.reddit.com/r/MachineLearning/comments/7jn12v/ama...](https://www.reddit.com/r/MachineLearning/comments/7jn12v/ama_we_are_noam_brown_and_professor_tuomas/)

------
korbonits
The demo of this at NIPS was awesome!

------
nathanaldensr
One interesting error in their paper: 27o (two-seven offsuit) is actually the
worst hand _in a 10 player game_. 23o (two-three offsuit) is the worst hand
heads-up.

~~~
noambrown
Author of the paper here. This actually depends on the stack sizes. At 200BB
deep, 27o is the worst hand, though 23o is not far behind.

