
CMU's Libratus builds substantial lead in Brains vs. AI competition - nopinsight
http://www.cmu.edu/news/stories/archives/2017/january/AI-tough-poker-player.html
======
nopinsight
I believe we are witnessing a Cambrian explosion of intelligences. The
techniques behind Libratus (abstraction algorithms and game theory [1]) appear
to be qualitatively distinct from those behind AlphaGo (deep reinforcement
learning and Monte Carlo tree search) and Deep Blue (search and heuristics).

An ecology of Artificial Intelligences, unbounded by our evolutionary history
and neural architecture, could evolve to suit each particular task more
effectively than our brains can.

Promises and perils abound.

[1] [http://www.cs.cmu.edu/~sandholm/](http://www.cs.cmu.edu/~sandholm/)
section "Algorithms and complexity of solving games"

~~~
echelon
Chess, Go, Poker... All feel like variations on the same theme. While it's
obvious there is innovation being done, I want to see something more
challenging. Something with more dimensionality and integration of several
types of input.

How about a machine that can beat someone at Smash Bros, a game with varied
characters, complex comboing mechanics, and a nontrivial computer vision task?

Or--more difficult by a few orders of magnitude--what about a robot that can
beat someone at tennis? Or a team of robots that can best a professional
basketball team?

When do you suppose we'll begin to see these sorts of things? Within our
lifetime, I hope?

~~~
logicallee
>Something with more dimensionality and integration of several types of input.

like using a few degrees of freedom and little strength to chop onions and
some vegetables, crack eggs, whisk them, pour a bit of oil into a pan, light
the stove, pour in the omelette, flip it onto a plate, throw the eggshells in
the trash unless it's too full (and take the trash out if it is), and wash the
chopping board and pan with a sponge (adding a little dish soap) and not too
much water, then rinse them thoroughly, again without too much water. None of
this poses a challenge to most adult humans, who need just a bit of time to
learn it (if you never learned to make an omelette and you're an average
adult, by Wednesday you can make a perfect omelette every time). And that's
even though, objectively, humans have very weak hands, see things very, very
slowly compared with machines, and cannot do any single mechanical action as
reliably and predictably as robots.

Computers might be able to find and count all the primes between one and a
million before I can count the ones up to ten [1], but they can't even scrub
my bathtub with a sponge given a whole afternoon to do it - not without a lot
of specialized robotics, anyway.

[1] [https://www.quora.com/How-long-does-the-fastest-algorithm-
ta...](https://www.quora.com/How-long-does-the-fastest-algorithm-take-to-
generate-primes-up-to-a-million-and-up-to-ten-million-on-a-standard-home-
computer)
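
To make the speed gap concrete, here is a minimal Python sketch (a plain Sieve of Eratosthenes, not the optimized algorithms discussed in the linked thread) that counts every prime below one million in a fraction of a second:

```python
def count_primes_below(n):
    """Count primes strictly less than n using a boolean sieve."""
    if n < 3:
        return 0
    sieve = [True] * n
    sieve[0] = sieve[1] = False
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            # Cross out every multiple of p starting at p*p.
            sieve[p * p::p] = [False] * len(sieve[p * p::p])
    return sum(sieve)

print(count_primes_below(1_000_000))  # 78498 primes below one million
```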

~~~
Qworg
We make up for hand weakness with hand dexterity, slow vision processing with
incredibly rich feature sets and powerful cameras, and reliable actions with
previsioning and prediction on every task.

The human system is pretty incredible.

~~~
logicallee
Yes, this is the kind of task computers should "compete with" - using weak
and semi-reliable motor functions and cheap cameras (no infrared or laser
vision, etc.) and making up for it with "smarts" - the way humans do.

------
oculusthrift
It's important to note that last time, the poker players beat CMU's AI by
around 500k chips as well, and the team had the gall to declare it a
"statistical tie". Yet if the AI wins by a single chip, they will claim to
have "won".

~~~
smohnot
[http://motherboard.vice.com/read/a-poker-playing-
supercomput...](http://motherboard.vice.com/read/a-poker-playing-
supercomputer-just-barely-lost-to-human-pros) Lost by $732,713, with $170
million bet... pretty close. But you're right, a victory by a chip will be
declared a win

~~~
dgacmu
That's both an unfair slight to the ethics of the researchers involved, as
well as inaccurate. They've published in advance their criteria for declaring
a victory vs a tie: [http://www.cardplayer.com/poker-news/21215-poker-bot-
doubles...](http://www.cardplayer.com/poker-news/21215-poker-bot-doubles-lead-
in-historic-match-against-humans)

which is: "If after 120,000 hands either Libratus or the humans are one
standard deviation above break-even, they will have won the competition with
“statistical significance.”"

(I'm a professor at CMU, but I have nothing to do with this research or
competition.)
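
As a back-of-the-envelope Python sketch of that published criterion (the per-hand standard deviation below is an assumed placeholder; the article doesn't give the real figure for this match):

```python
import math

# One-standard-deviation threshold over N hands: treating per-hand results as
# roughly independent, the standard deviation of the total grows as sqrt(N).
def one_sigma_threshold(n_hands, sigma_per_hand_bb):
    """Big blinds above break-even needed to clear one standard deviation."""
    return sigma_per_hand_bb * math.sqrt(n_hands)

# Assuming (hypothetically) sigma = 10 big blinds per hand, over 120,000 hands:
threshold_bb = one_sigma_threshold(120_000, 10)
print(round(threshold_bb))  # 3464 big blinds
```

Under that assumption, at 50/100 blinds the winner would need to finish roughly 346K chips ahead, which gives a feel for why a 500K-chip margin could still be called a statistical tie.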

~~~
oculusthrift
We mostly mean by the media. The story if the bot barely loses: "Poker bot
statistically ties pros". The story if it is one chip up: "Poker bot
leads/beats pros".

------
angelofm
I used to play heads-up poker (one against one) part time, and the amount of
study and analysis players have to do just to reach a reasonable level is
huge.

This challenge is very unfair to the players, so I wouldn't say the bot won:
the players are at a massive disadvantage. Every professional player normally
has tracking software and a database to analyse every decision that has been
made.

This is extremely important, of course, because you can model your strategy to
exploit the suboptimal decisions made by your opponent. Yet here the players
have no access to any of these tools, so the bot adjusts its play based on its
human opponents, but the humans cannot do the same and are left with a
guessing game.

If they want to make it a proper challenge, the players need access to the
tools they usually use when playing on the Internet.

~~~
Anderkent
> the bot adjusts its play based on their human opponents but humans cannot
do the same and are left with a guessing game

Do we know if it actually does that? I imagine it's much simpler to build a
bot that plays a balanced, profitable strategy than one that tries to build a
model of its opponent and exploit it.

~~~
deong
It's stated in the article. During a day's play, the AI suggests plays based
on its current knowledge. At night, the day's events are fed into the system
for it to learn on for the next day.

I assume there are probably papers that specify what form of learning is
taking place, but the article didn't go into that level of detail and I
haven't tried to track it down.

------
kriro
Trying to play GTO (game-theory optimal) is an interesting (and very
profitable) approach, but ultimately it's not the most profitable one. It
makes sense for the AI to be constructed that way because the matches it's
likely to play are against good opponents. Most profit in poker comes from
playing against not-so-great opponents, though. Against those you usually
play an extremely exploitable style on purpose.

That being said, the AI seems pretty impressive. Not sure how they picked the
players; I could think of a few HUNL players I'd rather see, but they might
not be interested in a 200k freeroll.

~~~
xapata
It's tough to decide what skill in poker really means. Is it that the AI can
win against a good player or that the AI can earn money from bad players
faster than other good players?

------
lucidrains
They should do a Libratus vs Deepstack tournament
[https://arxiv.org/abs/1701.01724](https://arxiv.org/abs/1701.01724)

------
skoutus
I wonder if the players tried colluding: coordinate their bets to fake a
weakness so the AI starts to adopt a poor strategy, then up the bets and drop
the feint. I don't see how the AI can protect itself against that.

~~~
hunl
The aim of the AI isn't to adapt to poor strategies, but rather to play an
approximately optimal strategy itself. It's aiming to be unexploitable: the
further the other players deviate from optimal, the more it wins. Its EV
(expected value) comes from the other players not playing optimally; it
doesn't care about exploiting individual weaknesses.

~~~
xapata
Then I'd say it's not a very good poker player.

~~~
hunl
If you define a 'not very good' strategy as one that loses at most 0, then
sure. Playing optimally means the worst-case scenario against _any_ opponent
would be breaking even. It doesn't have to be trained on individual playing
styles; it is simply playing each spot theoretically correctly.

For example, say the humans are getting to a river situation with too many
bluffs for a given bet size; an exploit for the AI would be to always call.
The opposite is also true: if they are bluffing too little, it should always
fold. But the players would notice that the AI has adjusted and adjust their
own frequencies, now exploiting the AI. By taking an exploitative approach the
AI leaves itself open to being exploited, and that is not the goal.

If this were rock-paper-scissors, the AI would be doing the equivalent of
throwing each option at 1/3 - even when its opponent throws rock every time.
It could switch to paper, but a thinking opponent will then switch to
scissors, and this will continue until we are back at equilibrium. The AI aims
to play poker in this same fashion, having the correct frequencies of actions
for a given range in every spot.
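
A quick Python sketch of that rock-paper-scissors point: the uniform 1/3 mixed strategy earns exactly zero against *any* opponent strategy, including "always rock", which is precisely what makes it unexploitable:

```python
import itertools

PAYOFF = {  # (my_move, their_move) -> my result
    ('R', 'R'): 0, ('R', 'P'): -1, ('R', 'S'): 1,
    ('P', 'R'): 1, ('P', 'P'): 0, ('P', 'S'): -1,
    ('S', 'R'): -1, ('S', 'P'): 1, ('S', 'S'): 0,
}

def expected_value(mine, theirs):
    """Expected payoff of one mixed strategy against another."""
    return sum(mine[m] * theirs[t] * PAYOFF[(m, t)]
               for m, t in itertools.product('RPS', repeat=2))

uniform = {'R': 1 / 3, 'P': 1 / 3, 'S': 1 / 3}
always_rock = {'R': 1.0, 'P': 0.0, 'S': 0.0}
always_paper = {'R': 0.0, 'P': 1.0, 'S': 0.0}

print(expected_value(uniform, always_rock))       # 0.0: no gain, but no risk
print(expected_value(always_paper, always_rock))  # 1.0: exploits rock, but is
                                                  # itself exploitable
```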

~~~
xapata
A better AI should be able to fool the opponent into thinking it has thrown
rock (metaphorically) so that the opponent throws paper while the AI instead
throws scissors.

Poker isn't about equilibrium, it's about misdirection and exploitation. When
the table gets cold, you liven it up by convincing everyone to do a round of
straddle.

~~~
hunl
Heads-up poker is precisely about equilibrium. Your straddle reference is also
irrelevant; this is not live multiway poker.

"Tricking an opponent into thinking it has metaphorically thrown rock",
extrapolated into a poker example, would be betting larger/smaller, calling
more/less, or folding more/less than is optimal in a given scenario in the
hope that your opponent makes a (bigger) mistake. You're simply hoping he
makes more errors than you; the AI instead chooses to just make zero mistakes
and let the opponents do the rest. You can see this in action for yourself in
heads-up limit hold'em by playing Cepheus
([http://poker-play.srv.ualberta.ca](http://poker-play.srv.ualberta.ca)).

~~~
xapata
You're still thinking one hand at a time. It may be possible to confuse the
opponent into permanently shifting strategy.

I agree that would not happen if two equilibrium-seeking computers played each
other. Since the human strategy is unknown, it is possible that equilibrium
may not exist or be optimal. Even if it's two computers, if one of the
computers has the possibility of choosing a non-equilibrium strategy, then
again the optimal strategy may not be to seek equilibrium.

------
grizzles
Libratus's biggest edge is probably grinding away at the players' blunders and
tendencies (e.g. Don never check-raise bluffs in this spot, so I can safely
value bet). It would be really interesting if they published the bot's results
vs. a mythical generic player. The difference would be a nice estimate of how
big an edge it derives from backtesting its personalization strategies.

~~~
iopq
It doesn't use player tendencies, it plays closer to Nash equilibrium than the
human players, AFAIK.

~~~
grizzles
Yes it does. The Libratus bot uses a counterfactual regret minimization
algorithm variant of the CMU team's own design to calculate endgame strategy.
The inputs to that algorithm explicitly take into account previous player
behavior.

~~~
iopq
It doesn't know who it's playing against, so it accounts for the behavior of
ALL players it played against, no?

------
czzarr
That is very unconvincing, as 49k hands is really nothing - not enough to iron
out variance unless the edge is really big (which doesn't seem to be the case
here). Any serious poker player will tell you that. They should play 1-10
million hands (depending on the edge) in order to get a decent idea of where
this is going.

~~~
vannevar
Yes, I'm wondering if anyone is tracking the relative strength of each
player's hands. In the end, the bot should be declared the "winner" only if
its winnings were disproportionately high in relation to the strength of its
hands.

~~~
tansey
The usual way they reduce variance in these man vs. machine poker showdowns is
to do "pairs" play. You have two humans playing simultaneously in isolated
locations. The decks for both humans are the same, but the player 1 and player
2 hands are swapped for one of the humans. That way, the bot strategy has to
play both sides of each deal.

It doesn't totally eliminate variance, but they usually take that into account
and correct for it when looking at final outcomes. Right now the bot is up by
something like 800K over 60K (out of 120K) hands. If that rate continues, it
will win by around 1.6M, or 400K per human. The blinds are 50/100, so that
would equate to roughly 133 milli-big-blinds (thousandths of a big blind) per
hand. That isn't too far off from standard win rates in bot vs. bot
tournaments [1].

I'd say it's likely that the results of this tournament will be a
statistically significant win for the bot.

[1]
[http://www.computerpokercompetition.org/downloads/competitio...](http://www.computerpokercompetition.org/downloads/competitions/2014/results/results_2pn_tbr.pdf)
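
Checking that conversion in Python (chips per hand divided by the big blind, times 1000 for milli-big-blinds):

```python
# Convert a chip result into milli-big-blinds per hand, the usual
# bot-vs-bot win-rate unit.
def win_rate_mbb_per_hand(chips_won, hands, big_blind_chips):
    return chips_won / hands / big_blind_chips * 1000

# 800K chips over 60K hands at 50/100 blinds:
print(round(win_rate_mbb_per_hand(800_000, 60_000, 100)))  # 133 mbb/hand
```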

~~~
czzarr
That trick doesn't change the variance at all if decks are the same.

Typical win rates in human vs. human play are between 1-5 ptbb/100, where
1 ptbb = two big blinds. At 1 ptbb/100 the variance is big relative to the
edge and north of 1 million hands are probably necessary to establish an edge,
whereas at 5 ptbb/100 the edge dominates the variance and 100k hands are
usually enough to converge to the expected value.
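
A rough Python sketch of why the required sample scales this way (the edge grows linearly in hands while the standard deviation grows only as the square root; all numbers below are illustrative assumptions, not measured figures):

```python
import math

def hands_needed(win_rate_bb_per_100, sigma_bb_per_100, z=1.96):
    """Approximate hands needed for a z-sigma result; inputs are per 100 hands."""
    edge_per_hand = win_rate_bb_per_100 / 100
    # Standard deviation per single hand: sigma over 100 hands / sqrt(100).
    sigma_per_hand = sigma_bb_per_100 / math.sqrt(100)
    return (z * sigma_per_hand / edge_per_hand) ** 2

# E.g. a 2 bb/100 edge against an assumed 80 bb/100 standard deviation:
print(round(hands_needed(2, 80)))  # ~615,000 hands for ~95% confidence
```

Halving the assumed edge quadruples the required sample, which is why small edges push the requirement past a million hands.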

------
sagivo
I would love to know where they got their training data or if there's one
publicly available.

~~~
osti
This is not a neural net. It uses an algorithm called counterfactual regret
minimization to compute an approximation of the Nash equilibrium of the game;
no training data is required.
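
To illustrate the "no data required" point, here is a minimal Python sketch of regret matching, the building block of counterfactual regret minimization, applied to rock-paper-scissors (a toy game, not the actual Libratus implementation). Two copies of the algorithm play each other, and the *average* strategy drifts toward the 1/3-1/3-1/3 Nash equilibrium through self-play alone:

```python
import random

ACTIONS = 3  # 0 = rock, 1 = paper, 2 = scissors

def payoff(a, b):
    """+1 if action a beats b, -1 if it loses, 0 on a tie."""
    if a == b:
        return 0
    return 1 if (a - b) % 3 == 1 else -1

def current_strategy(regret_sum):
    """Regret matching: play in proportion to positive accumulated regret."""
    positives = [max(r, 0.0) for r in regret_sum]
    total = sum(positives)
    if total > 0:
        return [p / total for p in positives]
    return [1.0 / ACTIONS] * ACTIONS  # no positive regret: play uniformly

def train(iterations, rng):
    my_regret = [0.0] * ACTIONS
    opp_regret = [0.0] * ACTIONS
    strategy_sum = [0.0] * ACTIONS
    for _ in range(iterations):
        my_strat = current_strategy(my_regret)
        opp_strat = current_strategy(opp_regret)
        a = rng.choices(range(ACTIONS), weights=my_strat)[0]
        b = rng.choices(range(ACTIONS), weights=opp_strat)[0]
        # Regret of not having played each alternative action instead.
        for alt in range(ACTIONS):
            my_regret[alt] += payoff(alt, b) - payoff(a, b)
            opp_regret[alt] += payoff(alt, a) - payoff(b, a)
        for i in range(ACTIONS):
            strategy_sum[i] += my_strat[i]
    total = sum(strategy_sum)
    return [s / total for s in strategy_sum]  # time-averaged strategy

average = train(100_000, random.Random(0))
print([round(p, 2) for p in average])  # approaches [0.33, 0.33, 0.33]
```

The real algorithm works over game trees with hidden information rather than a one-shot game, but the self-generated "data" idea is the same.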

~~~
sagivo
what about -
[https://arxiv.org/abs/1701.01724](https://arxiv.org/abs/1701.01724)

~~~
osti
That's a new bot from the University of Alberta. Funnily enough, even that
doesn't require actual data; it uses randomly generated data.

------
mrkgnao
How would someone with no knowledge of the game learn to play poker? Any good
books that stress the mathematical/probabilistic(/game theoretic...?) side of
things?

~~~
davidivadavid
Assuming you at least know the rules of poker, you can probably start with the
MIT course on poker theory, available here:
[https://www.youtube.com/watch?v=OTkq4OsG_Yc&list=PLUl4u3cNGP...](https://www.youtube.com/watch?v=OTkq4OsG_Yc&list=PLUl4u3cNGP61kfOW3zAIfpNhf0piao8oo)

------
andrewprock
Unless things have changed significantly from the previous version (Claudico),
this is not really no limit hold'em, as the stack sizes are reset every hand.

~~~
poikniok
How is that not no limit?

~~~
notahacker
One of the distinctive traits of a no-limit tournament is that on any given
hand a player can be knocked out of the game by a big enough bet, which
affects how players are likely to play (as might one player amassing a large
chip lead over another if neither player opts to go all in on early hands).

Of course, it also makes games shorter and introduces a lot more variance,
which isn't so good for assessing how well a computer plays.

~~~
Jefro118
This isn't a tournament though, it's more like playing a very long cash game
where the stacks are reset on each hand. Whoever has the profit at the end of
the cash game will be the winner.

------
product50
Honestly, this doesn't look to be a true test of who is better at poker. Poker
is not like chess - a lot of emotion is involved in reading another player's
cues (whether he is touching his face, whether he is talking nervously, how
quickly he calls an all-in, etc.). Making all this computerized seems to take
the spirit of poker away. It is like playing online poker, which is an
entirely different ball game vs. real poker.

~~~
phaus
Physical tells are almost entirely Hollywood nonsense. When a professional
player talks about getting a read on someone, they are talking about
probabilities combined with the opponent's position at the table and a history
(even a short history) of that player's betting/play patterns.

~~~
patrick_haply
> Physical tells are almost entirely Hollywood nonsense

That's true for almost any level of serious player, but I see it often in
casual kitchen-table games with people who don't play much or are just
starting out. Even I can't control myself sometimes when the adrenaline starts
pumping. I'd rather say that a lack of self-control is one of the most amateur
mistakes you can make in poker, and what's nonsense is the idea that physical
tells are a meaningful aspect of any serious poker game.

------
LAMike
Wonder how long it will be until there is a crowdsourced AI winning the World
Series of Poker?

~~~
misja111
Very, very long. First of all, this bot is capable only of analysing heads-up
play. At the WSOP there can be a maximum of 10 players at a table, and with
each extra player the number of combinations to analyse grows exponentially.

Second, in the WSOP it is key to exploit weaker opponents. This bot was able
to find almost perfect play against expert opponents, but exploiting weak play
is a different ball game, especially if you are facing a mix of strong and
weak players at a multi-handed table.

------
EGreg
How good are the BEST poker bots right now at holdem with more than two
players?

------
rhlala
Easy with 16 terabytes of RAM

~~~
spectrum1234
And tanking for 1+ mins on turn and river decisions (it would time out in real
online poker). And $10M in computer resources. But impressive nonetheless.

~~~
xapata
Eh, I'll tank for a few minutes in live poker. I don't mind that the computer
does too.

------
eutectic
I wonder if it is possible to achieve super-human play purely using
reinforcement learning to train deep neural networks.

------
bossx
Is this really fair? Part of poker is "reading" when other players are
bluffing; it's a lot more challenging to read when a computer is...

~~~
iopq
First of all, that's not true. Most players play several tables at the same
time against the same player, up to four heads-up. That's something like 600
hands an hour in HU.

There's little heads-up play in live games, other than at the end of the
tournament, and the stack sizes are completely different. These are not
tournament players, they're heads-up specialists. They most assuredly play
online the majority of the time.

Second of all, the computer doesn't read the human players, why would it
matter? It's all up to actual strategies at that point.

