
No limit: AI poker bot is first to beat professionals at multiplayer game - Anon84
https://www.nature.com/articles/d41586-019-02156-9
======
thomasfl
One of the researchers, Tuomas Sandholm, has a real badass CV. Former pilot in
the Finnish air force. Finnish windsurfing champion. Snowboarder. Professor at
Carnegie Mellon. Speaks four European languages, including Swedish. And now, at
the age of 51, he has created the best AI-powered poker bot.

[https://www.cs.cmu.edu/~sandholm/cv.pdf](https://www.cs.cmu.edu/~sandholm/cv.pdf)

~~~
jacquesm
Not to belittle the man's other achievements but speaking four languages is
pretty normal in Europe, except when you're from the UK.

~~~
loup-vaillant
> _speaking four languages is pretty normal in Europe_

 _Northern_ Europe, maybe. French people for instance tend to suck at foreign
languages. We rarely go beyond 3 languages (French, English, then German or
Spanish. The last two are often forgotten after school.)

I suspect Spain and Italy are similar.

~~~
jfengel
As an American, I am now going to bang my head into a wall.

~~~
stronglikedan
Nothing to do with being American, since you're afforded the luxury to learn
other languages _for free_ through public schooling. If anything, bang your
head because you _chose_ not to.

~~~
jfengel
The offer is made, but the reason for doing so isn't made clear. I didn't
understand it at the time; I availed myself of it in a minimal way. Most don't
do that.

Some of that is the accident of geography: it simply wasn't necessary. Today,
we are more connected to our Spanish-speaking neighbors, and the value of
learning that language is becoming increasingly obvious. I don't know whether
the schools are doing a better job of stressing that than they did when I was
in school.

I have indeed chosen to learn other languages, several of them. I wish I'd
done it in school, at a time when my brain was more open to it. Unfortunately,
that was also a time when I didn't know very much and put my priority on other
things that ended up making less of a difference in my life.

~~~
Kaiyou
It's a myth that you learn languages more easily earlier in life. Mastering a
language takes about 10 years; it's just that when you start at age 6, you
could be done by age 16.

------
pesenti
Blog post: [https://ai.facebook.com/blog/pluribus-first-ai-to-beat-pros-...](https://ai.facebook.com/blog/pluribus-first-ai-to-beat-pros-in-6-player-poker/)

Science article:
[https://science.sciencemag.org/content/early/2019/07/10/scie...](https://science.sciencemag.org/content/early/2019/07/10/science.aay2400)

~~~
YeGoblynQueenne
>> Pluribus is also unusual because it costs far less to train and run than
other recent AI systems for benchmark games. Some experts in the field have
worried that future AI research will be dominated by large teams with access
to millions of dollars in computing resources. We believe Pluribus is powerful
evidence that novel approaches that require only modest resources can drive
cutting-edge AI research.

That's the best part in all of this. I'm not convinced by the claim the
authors repeatedly make, that this technique will translate well to real-world
problems. But I'm hoping that there is going to be more of this kind of
result, signalling a shift away from Big Data and huge compute and towards
well-designed and efficient algorithms.

In fact, I kind of expect it. The harder it gets to do the kind of machine
learning that only large groups like DeepMind and OpenAI can do, the more
smaller teams will push the other way and find ways to keep making progress
cheaply and efficiently.

~~~
kqr
Yes! I work for a company that does just this: pull big gears on limited data
and try to generalise across groups of things to get intelligent results even
on small data. In many ways, it absolutely feels like the future.

~~~
mooneater
Interesting, are you using bayesian methods?

~~~
kqr
Does "Bayesian methods" mean anything specific? Parts of the core algorithms
were written before I joined, and they are very improvised in the dog-in-a-
lab-coat way. I haven't analysed them to see how closely they follow Bayes'
theorem or how strictly they use conjugate priors etc. (we also rely heavily
on simple empirical distributions), but the general idea of updating priors
with new evidence is what it builds on, yes. I have a hard time imagining
doing things any other way and still getting quality results, but that is
probably a reflection of my shortcomings rather than a technical fact.
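For readers unfamiliar with the idea, here is a minimal sketch of "updating priors with new evidence" using a Beta-Binomial conjugate update. The numbers are purely illustrative and have nothing to do with the poster's actual algorithms:

```python
# Toy Beta-Binomial conjugate update: a Beta prior over a win rate,
# updated with observed wins and losses. Illustrative numbers only.
alpha, beta_ = 1.0, 1.0      # Beta(1, 1) = uniform prior over the win rate

wins, losses = 7, 3          # new evidence
alpha += wins                # conjugacy: the posterior is again a Beta,
beta_ += losses              # with the counts simply added to the parameters

posterior_mean = alpha / (alpha + beta_)   # (1 + 7) / (1 + 7 + 1 + 3) = 8/12
print(posterior_mean)
```

With a conjugate prior the update is just bookkeeping on two parameters, which is why it scales to being done "very improvised."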

------
gexla
It's easy to "take away" too much from this. The intended focus is that an AI
poker bot "did this," without getting too deep into adjacent subjects.

But what's the fun in that?

10,000 hands is an interesting number. If you search the poker forums, you'll
see it's the number people throw out for how many hands you need to see before
you can analyze your play. You then make adjustments and see another 10,000
hands before you can assess those changes.

In 2019, it's impractical to adapt as a competitive player in live poker: an
online grinder can see 10,000 hands within a day, while the live poker room
took 12 days. Another characteristic of online poker is that players can also
use data to their advantage.

So, I wouldn't consider 10K hands long term, even if it spanned a period of
12 days. Once players get a chance to adapt, they'll increase their win rate
against a bot. Once hand histories start being shared, it's all over. And
again, give these players their own software tools.

Remember that one of the most exciting events in online poker was the run of
Isildur1. That run was put to rest when he went bust against players who had
studied thousands of his hand histories.

This doesn't take away from the development of the bot. If we learn something
from it, then all good.

~~~
csa
You clearly didn’t read the additional links they posted. They mentioned why
they chose 10k (AIVAT), and it goes far beyond any of the variables you
mentioned.

For any number of hands, my money is on the bot.

~~~
Traster
That really doesn't address the point that was raised. It's not that the bot
wins through luck and that 10k is too small a sample, it's that a good
professional poker player isn't good over 10k hands, they're good over 5
years.

Any good player will have their play analyzed and exploited, and will have to
re-adjust their strategy in response, so there's a feedback loop. The question
is: how does the AI strategy adapt over time to players who know its hand
history? That's an extremely important part of being a top-level player. To
give you an example - if you watch Daniel Negreanu's vlog about his time at
the WSOP, he actively talks about changing his strategy in response to his
analysis of different players' profiles. This is especially
important in Sit & Go where at high stakes you'll have regular grinders who
build up reputations - less so in tournaments where you're less likely to meet
any given player.

~~~
hdkrgr
This will be interesting to see.

Brown and Sandholm's algorithm aims to play a Nash equilibrium, which by
definition _cannot_ be exploited by any single opponent as long as all other
players are playing the equilibrium strategy. As they note in the paper, this
gives you a strong optimality guarantee in the 2-player setting. It was
unclear whether this would transfer to real-world winnings in the multi-player
case, and while it looks like it does for now (for current strategy profiles
of human players), humans might be able to adapt to the strategy played by the
bot. Given that the bot wins against current human strategy profiles in the
n-player setting, human players would likely (but not certainly) have to team
up against the bot to exploit it. That seems rather unlikely to me.
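A toy illustration of that unexploitability property, using matching pennies rather than poker (my own sketch, not the paper's method): against an equilibrium strategy, no single-player deviation gains anything.

```python
# Matching pennies: the row player wins 1 if the coins match, loses 1
# otherwise. The column player's Nash equilibrium is the 50/50 mix.
PAYOFF = [[1, -1],   # row plays heads
          [-1, 1]]   # row plays tails

def expected_payoff(row_action, col_strategy):
    """Row player's expected payoff for a pure action vs a mixed column strategy."""
    return sum(p * PAYOFF[row_action][col] for col, p in enumerate(col_strategy))

equilibrium = [0.5, 0.5]

# Every pure deviation earns exactly the equilibrium value of the game (0),
# so no single opponent can profit against the equilibrium strategy.
best_response_value = max(expected_payoff(a, equilibrium) for a in (0, 1))
print(best_response_value)  # 0.0
```

The open question in the thread is precisely what this sketch cannot show: with three or more players, one player's equilibrium play no longer guarantees anything once the others deviate together.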

------
noambrown
I'm one of the authors of the bot, AMA

~~~
throwamay1241
Who _were_ the pros? Are they credible endbosses? Seth Davies works at RIO,
which deserves respect, but I've never heard of the others except Chris
Ferguson, who I doubt is a very good player by today's standards (or human
being, for that matter). Meanwhile, I do know the likes of LLinusLove (iirc,
the king of 6max), Polk and Phil Galfond.

Is 10,000 hands really considered a good enough sample? Most people consider
100k hands w/ a 4bb winrate to be an acceptable sample, other math aside.
However, when you and your opponent play with equal skill, variance increases
to the point where regs refuse to sit each other.

~~~
noambrown
LLinusLove was one of the players. Chris Ferguson was in one of the 5 AIs + 1
Human experiments but not the 5 Humans + 1 AI experiment.

We used AIVAT to reduce variance, which reduces the number of samples we need
by roughly a factor of 10:
[https://poker.cs.ualberta.ca/publications/aaai18-burch-aivat...](https://poker.cs.ualberta.ca/publications/aaai18-burch-aivat.pdf)

------
auggierose
This is fascinating stuff. So do I understand this right: Libratus worked by
computing the Nash equilibrium, while the new multiplayer version works using
self-play like AlphaGo Zero? Did you run the multiplayer version against the
two-player version? If yes, how did it go? Could you recommend a series of
books / papers that can take me from zero to being able to reimplement this (I
know programming and mathematics, but not much statistics)? And how much
computing resources / time did it take to train your bot?

~~~
noambrown
Training was super cheap. It would cost under $150 on cloud computing
services.

The training aspect has some improvements but is at its core similar to
Libratus. The search algorithm is the biggest difference.

There aren't that many great resources out there for helping newcomers get up
to speed in this area. That's something we hope to fix in the future. Maybe
this would be a good place to start?
[http://modelai.gettysburg.edu/2013/cfr/cfr.pdf](http://modelai.gettysburg.edu/2013/cfr/cfr.pdf)
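To give a flavor of what that tutorial covers, here is a hedged sketch (mine, not the paper's code) of regret matching on rock-paper-scissors, the building block that CFR extends to sequential, imperfect-information games. The average strategy converges toward the uniform Nash equilibrium:

```python
# Regret matching in self-play on rock-paper-scissors: the core loop
# that CFR generalizes. Sketch only; not the Pluribus implementation.
ACTIONS = 3  # 0 = rock, 1 = paper, 2 = scissors

def payoff(a, b):
    """Payoff to action a against action b: +1 win, 0 tie, -1 loss."""
    return [0, 1, -1][(a - b) % 3]

def strategy_from_regret(regret):
    """Play actions in proportion to their positive cumulative regret."""
    positive = [max(r, 0.0) for r in regret]
    total = sum(positive)
    return [p / total for p in positive] if total > 0 else [1.0 / ACTIONS] * ACTIONS

def train(iterations):
    # Perturbed start so the dynamics are visible (all-zero regret is a fixed point).
    regret = [1.0, 0.0, 0.0]
    strategy_sum = [0.0] * ACTIONS
    for _ in range(iterations):
        strat = strategy_from_regret(regret)
        for a in range(ACTIONS):
            strategy_sum[a] += strat[a]
        # Expected value of each action against the mirrored opponent.
        action_values = [sum(strat[b] * payoff(a, b) for b in range(ACTIONS))
                         for a in range(ACTIONS)]
        value = sum(strat[a] * action_values[a] for a in range(ACTIONS))
        for a in range(ACTIONS):
            regret[a] += action_values[a] - value  # accumulate regret
    total = sum(strategy_sum)
    return [s / total for s in strategy_sum]  # average strategy -> equilibrium

print(train(100_000))  # close to [1/3, 1/3, 1/3]
```

CFR applies this same regret update at every decision point of the game tree, weighting by the probability of reaching that point - which is what the linked tutorial works through.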

~~~
dharma1
Is Oskari Tammelin still working on this stuff? I remember he wrote some very
fast CFR optimisations a few years ago

------
JaRail
So let me see if I understand this. I don't believe it's hard to write a
probabilistic program to play poker. That's enough to win against humans in
2-player.

With one AI and multiple professional human players sitting at a physical
table, the humans outperform the probabilistic model because they take
advantage of each other's mistakes/styles. Some players crash out faster but
the winner gets ahead of the safe probabilistic style of play.

So this bot is better at the current professional player meta than the current
players. In a 1v1 against a probabilistic model, it would probably also lose?

Am I understanding this properly? Or is playing the probabilistic model
directly enough of a tell that it's also losing strategy? Meaning you need
some variation of strategies, strategy detection, or knowledge of the meta to
win?

~~~
rightbyte
Interesting article. Too bad I don't have a subscription to read the paper.

The bot played like 10,000 hands. There is no way that is enough to prove it's
better or worse than the opponents.

More so in no-limit, where some key all-ins can turn the game upside down. The
variance is higher than in fixed-limit, right?

I did a heads-up fixed-limit Texas hold'em bot with "counterfactual regret
minimization" like 8 years ago, from a paper I read. It had to play like
100,000 hands vs a crappy reference bot to prove it was better.

Strategy detection in such short games is probably worthless.

The edge is probably in seeing who is tired or drunk in live poker.

~~~
junar
They mention that they use AIVAT to reduce variance.

> Although poker is a game of skill, there is an extremely large luck
> component as well. It is common for top professionals to lose money even
> over the course of 10,000 hands of poker simply because of bad luck. To
> reduce the role of luck, we used a version of the AIVAT[1] variance
> reduction algorithm, which applies a baseline estimate of the value of each
> situation to reduce variance while still keeping the samples unbiased. For
> example, if the bot is dealt a really strong hand, AIVAT will subtract a
> baseline value from its winnings to counter the good luck. This adjustment
> allowed us to achieve statistically significant results with roughly 10x
> fewer hands than would normally be needed.

[1] [https://arxiv.org/abs/1612.06915](https://arxiv.org/abs/1612.06915)
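As a toy illustration of that baseline idea (a made-up game of my own, not the actual AIVAT algorithm): subtracting the known value of the dealt "luck" and adding back its expectation keeps the skill estimate unbiased while shrinking its variance.

```python
import random
import statistics

# Toy control-variate sketch of the baseline idea behind AIVAT (not the
# real algorithm): a player is dealt "luck" h ~ U(0, 1) and wins
# h + SKILL + noise; we want to estimate the skill contribution.
random.seed(0)
SKILL = 0.05
N = 10_000

raw, adjusted = [], []
for _ in range(N):
    h = random.random()                        # the luck component
    outcome = h + SKILL + random.gauss(0, 0.1)
    raw.append(outcome)
    # Subtract the baseline value of this situation (h), then add back
    # its expectation (0.5) so the estimator stays unbiased.
    adjusted.append(outcome - h + 0.5)

print(statistics.mean(raw), statistics.stdev(raw))
print(statistics.mean(adjusted), statistics.stdev(adjusted))
# Both means estimate the same quantity, but the adjusted estimator's
# standard deviation is roughly 3x smaller, so far fewer hands are
# needed for a statistically significant result.
```

A ~3x reduction in standard deviation means roughly 10x fewer samples for the same confidence interval, which matches the factor the blog post quotes for AIVAT.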

------
GCA10
Hi Noam: I'm intrigued that you trained/tested the bot against strategies that
were skewed to raise a lot, fold a lot and check a lot, as well as something
resembling GTO. Were there any kinds of table situations where the bot had a
harder time making money? Or where the AI crushed it?

I'm thinking in particular of unbalanced tables with an ever-changing mixture
of TAG and LAG play. I've changed my mind three times about whether that's
humans' best refuge -- or a situation that's a bot's dream.

You've done the work. Insights welcome.

------
cyberferret
With the advent of AI bots in poker, chess, etc., what happens to the old
adage of "Play the player, not the game"? How do modern human players manage
when they don't have the psychological aspects of the game to work with?

I see on chess channels that grand masters have to rethink their whole game
preparation methodology to cope with the "Alpha Zero" oddities that have now
been introduced into this ancient game. They literally have to "throw out the
book" of standard openings and middle games and start afresh.

~~~
pk2200
The chess channels you're visiting are grossly overstating Alpha Zero's
impact. AFAICT, it hasn't made any impact on opening theory at all. AZ's
strength is in the middlegame, where it appears to be slightly better than
traditional engines (like Stockfish) at finding material sacrifices for long
term piece activity and/or mating attacks.

------
neural_thing
How long until a slightly worse version of this model is reverse engineered
and appears at every table in online poker?

~~~
DevX101
Slightly worse versions are already out in the wild. Bots using the published
technique will be live in a couple of months, tops.

~~~
rightbyte
Colluding bots are the main worry if you play online though.

------
merlincorey
Pretty incredible that this has scaled down from 100 CPUs (and a couple
terabytes of RAM) for their two player limit hold'em bot to just two CPUs for
the no limit bot.

------
donk2019
Congrats Noam for the great breakthrough work!

I have a question about collusion. In the 5 Humans + 1 AI setting, since the
human pros know which player is the AI (read from your previous response), is
it possible for the human players to conspire to beat the AI? And in theory,
in a multi-player game, even if the AI plays the best strategy, can it still
be beaten by a conspiracy of the other players?

Thanks.

------
asdfman123
So, is this the end of online poker?

Will it just become increasingly sophisticated bots playing each other online?

~~~
trishume
I'm really confused about why stock for the company that makes PokerStars
hasn't moved at all today:
[https://www.google.com/search?tbm=fin&q=TSE:+TSGI#scso=_wqsn...](https://www.google.com/search?tbm=fin&q=TSE:+TSGI#scso=_wqsnXbuWBu2N_QbshojgBw2:0)

The fact that there's a published recipe for a superhuman bot that can be
trained for $150 and run on any desktop computer sounds like an existential
threat to their business.

The main mitigating factor I can think of is that you'd need to also
adversarially train it so it isn't distinguishable from a skilled human. But
that doesn't seem like it would be too difficult.

~~~
asdfman123
You know, now that we're talking about it I'm wondering if someone hasn't
already come up with a better bot and has just been silently using it to win
money online.

I'm sure the sites have been crawling with bots as long as they've been
around, some better than others. As long as it doesn't drive away too many
customers, I doubt the sites care. They still take a rake on bot games.
However, better AI could change that as the "dumb money" slowly dries up.

~~~
bcassedy
Dumb money has been drying up for years. There have been bots taking millions
of dollars out of games for more than a decade. Even bots from 10 years ago
were sophisticated enough to win money at mid-stakes poker (up to $2,000
buy-in 6-max no-limit games).

~~~
dbancajas
Proof? I don't believe this. 2K buy-in has a lot of regs that are pretty good
overall in cash games. Plus, Pokerstars/FT has a pretty good anti-bot policy:
if you get caught, bye bye to the $.

~~~
bcassedy
[https://forumserver.twoplustwo.com/153/high-stakes-pl-omaha/...](https://forumserver.twoplustwo.com/153/high-stakes-pl-omaha/massive-bot-ring-pokerstars-party-how-spot-them-1537778/)

There are a bunch of such threads over the years where through statistical
analysis, users have identified groups of dozens of bots.

While years ago many of the pros could theoretically beat these bots, it may
not have been by enough of a margin to overcome the rake. Of course, if the
bots are practicing any game selection, they can take money out of the economy
even if they can't beat pros.

Anti-bot measures are an arms race, and the sites aren't always ahead of the
game.

------
solidasparagus
So Dota 2 doesn't count as a multiplayer game?

OpenAI Five beat the world champions in back-to-back games...

~~~
taejavu
Yes, Dota 2 is not a multiplayer poker game. I agree that the title is
ambiguous, but it's not a stretch to imagine that "poker" is implied here.

~~~
solidasparagus
I don't think it's implied, considering the article compares the poker bot to
Go and chess bots (which are the non-multiplayer games the title is referring
to).

------
r00fus
I was really hoping the article would go into more detail on how the AI
engaged with the human players.

Was it online? The picture in the article seems to imply IRL.

If IRL, what inputs did it have - simply the cards shown, or could it read
tells? Did those players know they were playing an AI?

~~~
noambrown
It was online. The players were playing from home on their own schedules. The
bot did not look at any tells (timing tells or otherwise). The players knew
they were playing a bot and knew which player the bot was.

------
grandtour001
Were the games played with real money? Nobody is going to take fake money
games seriously.

~~~
slashcom
From the paper:

"$50,000 was divided among the human participants based on their performance
to incentivize them to play their best. Each player was guaranteed a minimum
of $0.40 per hand for participating, but this could increase to as much as
$1.60 per hand based on performance."

So the humans weren't betting their own money, but they still made more money
if they won.

------
rofo1
I'd love to see high-stakes heads-up bot vs Tom Dwan or Negreanu.

Maybe a bot technically qualifies as an opponent in durrr's challenge [0]? :)

How would bluffing influence the outcome? Both of these players, who are
considered very strong, are known to play all kinds of hands.

[0] -
[https://en.wikipedia.org/wiki/Tom_Dwan#Full_Tilt_Poker_Durrr...](https://en.wikipedia.org/wiki/Tom_Dwan#Full_Tilt_Poker_Durrrr_Million_Dollar_Challenge)

------
nishantvyas
I don't get this... Poker isn't purely mathematical... it has emotions
involved (greed, fear, belief, reading others, manipulation (to fool the
opponent)... and maybe more... and all of these emotions arise differently for
different people based on their time, place, world view, background and
history...)

Are we now saying that a computer can do all this in simulation? If so, it's a
great breakthrough in human history.

~~~
throwamay1241
At the nosebleeds, poker hasn't been about those things in a long time.

Poker is about exploitative play against people who base their play on
emotions, and imperfect game-theory-optimal play against players who don't.
The more perfect the GTO play is, the higher the winrate against the latter
group, but higher-stakes games are built around one or more bad players - pros
will literally stop playing as soon as the fish busts.

------
luckyalog
Isn't it just possible that the bot got lucky? It plays well. Maybe really
well, but does it play as well as a pro? Would it win 9 WSOP bracelets? Would
it make it to day 3 of the World Series of Poker?

Chris Moneymaker got some damn good hands. It's part of the game. It's why
this feat is unremarkable and why poker is a crap game for AI. The outcomes
are very loose, especially when the reason these guys are pros is partially
because of their ability to read.

You are taking away a tool that made these poker players great and then
expect them to be a metric to test the AI. A better test would be to have pro
players play a set of 1, 2, 4, 7 basic rule-based bots and have the AI do the
same. Then you compare differences in play. With enough data points you can
compare situations that are similar but where the AI did better or worse. That
is a fair comparison of skill.

Also, if there are professional players at a multiplayer game, the AI is
getting help from the other players. Just like in Civ V, where I get help from
the AI attacking itself, I'm sure this AI got help from the players attacking
each other (especially if they were doing so and making the pot bigger for the
AI to grab up - think of a player reraising another player after the bot does
a check all-in).

~~~
awal2
Despite the luck/noise in poker, there are reasonable measures of performance,
and while I'm not an expert in this area, the bot seems to be doing very well
(see the paper for details). Poker is not a "crap game for AI"; it's actually
quite a good game. It's a very simple example of a game with a lot of
randomness (a feature, not a bug) and hidden information that still admits a
wide range of skill levels (expert play is much better than intermediate play,
which is much better than novice play). This is a great accomplishment.

More links for reference:
[https://ai.facebook.com/blog/pluribus-first-ai-to-beat-pros-...](https://ai.facebook.com/blog/pluribus-first-ai-to-beat-pros-in-6-player-poker/)
[https://science.sciencemag.org/content/early/2019/07/10/scie...](https://science.sciencemag.org/content/early/2019/07/10/science.aay2400)

------
w_s_l
I would love to get my hands on the source. Hook it up to an API like
[https://pokr.live](https://pokr.live) and then basically build a
computer-vision poker bot.

The trick is how to create natural mouse-click movements or keyboard inputs.
That's the part I'm most shaky on, but the pokr.live API works by being sent
screenshots, which it translates into player actions at the table.

Disclaimer: the pokr.live API is a WIP.

~~~
auggierose
You do that by letting a human play, informed by the bot.

------
DeathArrow
I was thinking a year ago about using deep reinforcement learning in a poker
bot. What stopped me was the impossible amount of data and computation due to
the imperfect-information nature of poker. If I have the time, I'll try to
implement something akin to the search technique described in the paper.

It might pay better than a full-time job.

------
axilmar
"At each decision point, it compares the state of the game with its blueprint
and searches a few moves ahead to see how the action played out. It then
decides whether it can improve on it."

That's exactly how the brain operates.

------
gringoDan
Curious if we'll see human poker pros get much better in the coming years as
they incorporate training regimens that involve bots (analogous to chess today
vs. 50 years ago). Seems like this will be the trend in almost every game.

~~~
TylerE
As someone who plays both games....I doubt it.

Poker is an incomplete-information game with crushingly high variance. The
bot's strategy is likely not quantifiable.

~~~
gringoDan
Can you expand on this? I'm a novice at both games, but the Facebook blog post
mentioned that the bot exhibited some unconventional strategies:

> Pluribus disagrees with the folk wisdom that donk betting (starting a round
> by betting when one ended the previous betting round with a call) is a
> mistake; Pluribus does this far more often than professional humans do.

Is it overly simplistic to think that humans could improve their game by
incorporating some strategies like this more/less often than they were
previously?

------
DeathArrow
I wonder what would be the impact of using Counterfactual Regret Minimization
instead of training a neural network based on hands played by real players?

Whys is using CFR better than training based on real data?

~~~
Tenoke
It's not necessarily better, but with CFR you can learn beyond what humans
have learned; on the other hand, you don't learn their usual mistakes, which
would let you exploit them more easily. Also, this approach needs CFR because
at every point you are checking what would've happened if you had picked
something else, which is just impossible with a fixed dataset.

------
auggierose
Would you say it would be hard to expand this to tables with 9 players?

~~~
noambrown
No, it wouldn't be hard. We chose six players because it's the most
common/popular form of poker.

Also, as you add more players it becomes harder and harder to evaluate because
the bot's involved in fewer hands, we need to have more pros at the table, and
we need to coordinate more schedules. Six was logistically pretty tough
already.

------
User23
The most interesting thing about this to me is the lesson it teaches human
players about bluffing.

------
indigodaddy
Was this cash or tourney format? How many blinds deep was the bot and the rest
of the players at the start?

~~~
GCA10
From the sample hands, it looks as if it's a cash game with stacks equal to
200BB. Plenty of room to play real poker.

------
ayemeng
Curious, why was 100BB used for six-max? If I recall right, the heads-up
experiment was 200BB?

~~~
noambrown
We considered both options but decided to go with 100BB because that is the
standard in the poker world. It doesn't make a big difference for these
techniques though.

~~~
srkigo
Could you try to run a training with an ante included in the pot? I wonder if
open-limping would be a viable strategy with some hands. No one knows, and it
would be really interesting to find out. The ante should be equal to the BB,
like it was in the WSOP Main Event.

------
cklaus
Is the source code and data available to allow others to play against this
bot?

------
anbop
Would love to wire this into some kind of device I could play with at a
casino.

------
zzo38computer
I have seen fixed-limit AI, and now here is no-limit AI. Is there a pot-limit
AI?

------
IloveHN84
Basically, all the online poker rooms are now rigged and prone to fraud.

------
david-gpu
Time to cross poker off the list?

[0] [https://xkcd.com/1002/](https://xkcd.com/1002/)

------
alexashka
The title is misleading - bots have been beating no-limit pros in 1v1 matches
for quite some time.

This is for 6-player games. The article mentions 10,000 hands - a very small
sample size to draw any real conclusions from, as anyone who has dabbled in
online poker for more than a few thousand dollars can attest. Regardless, it's
trivial to write a bot that'll beat 90% of players, as site runners can all
attest (bots are a serious problem, and not a new one). What does it matter
that a bot can beat 'the best' or 'professionals'? It's enough that it can do
better than the vast majority, outside of dystopian woes about robots taking
over or being 'superior' to human beings.

Glossing over all that - I am curious whether this can be used for something
other than ruining online poker, which has largely already been ruined by
multi-tabling professionals with custom software that gathers statistics on
players (data mining), by existing bots, by the US government, and by
irresponsible (criminal) site runners (looking at you, Ultimate Bet).

