
I'm one of the authors of the bot, AMA

What took you so long? I mean not the Pluribus team specifically, but Poker AI researchers in general.

The desire to master this sort of game has inspired the development of entire branches of mathematics. Computers are better at maths than humans. They're less prone to hazardous cognitive biases (gambler's fallacy etc.) and can put on an excellent poker face.

As a layperson who's rather ignorant about both no-limit Texas hold 'em and applicable AI techniques, my intuition would tell me that super-human hold 'em should have been achieved before super-human Go. Apparently your software requires way less CPU power than AlphaGo/AlphaZero, which seems to support my hypothesis. What am I missing?

Bonus questions in case you have the time and inclination to oblige:

What does this mean for people who like to play on-line Poker for real money?

Could you recommend some literature (white papers/books/lecture series/whatever) to someone interested in writing an AI (running on potato-grade hardware) for a niche "draft and pass" card game (e.g. Sushi Go!) as a recreational programming exercise?

I think it took the community a while to come up with the right algorithms. So much of early AI research was focused on beating humans at chess and later Go. But those techniques don't directly carry over to an imperfect-information game like poker. The challenge of hidden information was kind of neglected by the AI community. This line of research really has its origins in the game theory community actually (which is why the notation is completely different from reinforcement learning).

Fortunately, these techniques now work really really well for poker. It's now quite inexpensive to make a superhuman poker bot.

So will this be the end of online poker?

It's pretty easy for good players to recognize other good players. And since the house takes such a large cut, the only way for pro players to have positive expected value online is to seek out games with poor players. So even if they couldn't recognize the bots as such, they would see them as tough players and avoid them.

That said, I suppose it would be possible for the bots to become so prevalent that all this sort of opportunity is effectively used up, so the return vs time and risk for a human player is no longer worthwhile. (That already happened long ago for most players, as the initial online poker boom faded and most casual players left.)

On the other hand, all the major platforms have terms prohibiting using bots, so their numbers might be sufficiently limited to prevent that scenario.

It's my understanding the big sites have some pretty sophisticated bot detection systems, so in theory a bot that would be successful at beating online poker couldn't be a huge winner, it'd presumably raise too many red flags. However, if it were a near break-even player, with dozens, if not hundreds, of instances running at any given time, it's going to slowly grind out a substantial figure. You'd also have to take into account that the sites are monitoring things like reaction times to bets and raises, hand range consistency, etc. I'm not a coder, but it seems like it'd be a tremendous undertaking to code a bot that would be a substantial threat to players. Then again, maybe I'm naive about the level of scrutiny the poker sites are employing.

One of the professors I used to work with some years ago was involved in stylometry research on human-computer interactions such as keystrokes and mouse input (for example, to determine if a user who had authenticated successfully earlier is not the same person currently typing based on keystroke cadence and pattern analysis - e.g. if someone sat down at an unlocked workstation and started typing, you could detect it and force them to reauthenticate).
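A toy sketch of the kind of cadence check described above (the interval values, the 2.0 threshold, and the simple z-score approach are all invented for illustration; real stylometry systems model per-key-pair digraph timings, not just a mean interval):

```python
import statistics

def cadence_score(profile_intervals, session_intervals):
    """Crude stylometry check: z-score of the session's mean keystroke
    interval against a stored user profile. Real systems are far richer;
    this only shows the shape of the idea."""
    mu = statistics.mean(profile_intervals)
    sd = statistics.stdev(profile_intervals)
    return abs(statistics.mean(session_intervals) - mu) / sd

# Enrolled user's typical inter-keystroke intervals, in seconds (made up).
profile = [0.18, 0.22, 0.20, 0.19, 0.21, 0.20, 0.18, 0.22]
same_user = [0.19, 0.21, 0.20, 0.20]
intruder = [0.09, 0.11, 0.10, 0.10]  # a much faster typist at the same keyboard

print(cadence_score(profile, same_user) < 2.0)  # True: accept
print(cadence_score(profile, intruder) > 2.0)   # True: force reauthentication
```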

It would probably be possible to figure out the types of detection being performed by the poker sites and use adversarial training methods to train a machine learning solution to mimic human input patterns. Or, more pragmatically, have the bot analyse the state of the game and give orders for a human to perform at their own natural pace.

Poker sites mainly detect bots based on their login times, number of tables, time per action, etc.

A successful bot shouldn't get caught for "playing like a bot", because the moment its actions are that predictable it would presumably no longer be effective.

But it will get caught for operating like a bot. So, don't run it 24hrs a day. Sites also randomize things to keep bots at bay, even card imagery.

If your performance and success drops whenever they randomize something that gives the bot false inputs, then you might get caught.

Inputting all of the poker events manually would be really tedious I'd imagine.

Of course, if you're winning millions, they can interview you about your poker history and how you got so good.

It sounds like easy money, but probably not.

Just play as you normally would, with the bot advising moves from the laptop next to you.

Right, but the bot needs to know who is in what position what the bets are, who folded, etc. Try inputting all of that information manually to the laptop next to you and you'll quickly get frustrated. Online poker is a fast game with lots of data points.

Computer-vision and machine-learning libraries (OpenCV, TensorFlow, PyTorch, Keras, MXNet, Caffe) could read the game off the screen if you split the video input between the player and the bot.
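To show what the screen-reading step boils down to, here is a bare-bones template matcher in plain Python, a stand-in for something like OpenCV's `cv2.matchTemplate` (the 6x6 "screenshot" and 2x2 "card glyph" are made-up toy arrays):

```python
def match_template(screen, template):
    """Return (row, col) where `template` best matches inside `screen`,
    scored by sum of squared differences. Both arguments are 2-D lists
    of pixel values; real pipelines would work on actual screenshots."""
    H, W = len(screen), len(screen[0])
    h, w = len(template), len(template[0])
    best_score, best_pos = float("inf"), (0, 0)
    for r in range(H - h + 1):
        for c in range(W - w + 1):
            score = sum(
                (screen[r + i][c + j] - template[i][j]) ** 2
                for i in range(h) for j in range(w)
            )
            if score < best_score:
                best_score, best_pos = score, (r, c)
    return best_pos

# Toy 6x6 "screenshot" with a 2x2 "card glyph" pasted at row 3, col 1.
glyph = [[1, 2], [3, 4]]
screen = [[0] * 6 for _ in range(6)]
for i in range(2):
    for j in range(2):
        screen[3 + i][1 + j] = glyph[i][j]

print(match_template(screen, glyph))  # -> (3, 1)
```

This is also exactly the sort of pipeline that breaks when the site randomizes card imagery, as discussed below: a fixed template stops matching.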

Yes, but see my previous comment.

People have tried it, and online poker sites know they've tried it, so they'll randomize images and other data. If your play takes a dive when the randomizations are triggered but you outperform otherwise, good luck trying to collect your winnings.

An external camera with an image processor could handle that.

Not to mention, if you get caught, there could be worse consequences than just having your account locked. The site could (and likely would if the scale was significant) sue you for not only all your winnings, but damage to their business. They would likely win (since you're flagrantly breaking their terms of use contract), and bankrupt you.

Edit: In fact, if we're talking worst case, circumventing their anti-bot restrictions would presumably be illegal under the CFAA. So if you're in the US you could even be charged criminally, although I expect in reality that would be less likely.

>You'd also have to take into account that the sites are monitoring things like reaction times to bets and raises, hand range consistency, etc.

You might be surprised by the lengths people go to in order to bypass bot-detection just for ordinary games. All of the things you mentioned are pretty standard. Considering there is serious money on the line here, I am positive that plenty of poker bots will be virtually indistinguishable from professional players, if they aren't already.

The same argument of money being on the line applies to the detection. Poker software is already pretty damn impressive with its tracking. The online casinos actually stand to lose more money than the bot creators could make, so the detection has a greater incentive, and is likely to triumph.

They only lose if there are fewer hands played, surely? I assume they take a cut of all winnings; they're not putting up stakes.

Yes, I'm assuming that if bots work their way into everyday online poker, people will stop using it, so there would be fewer players.

I guess the real threat isn't a "bot" but something in the way of a program that interprets the data on the screen real-time and whose output instructs the player of the "optimal" play, given the circumstances. How the hell would you deter that as a site operator?

No, I think your earlier example of a swarm of just-above-break-even bots would be much more difficult to combat. Even if they can be detected, the anti-detection countermeasures can evolve, turning it into an arms race. Anything you can model in your bot detection algorithm, the bot-maker can model too.

Reaction times ought to be one of the easiest things to fake. All it would take is a bunch of monitoring of large numbers of games to create a nice model of real player reaction times, which in all likelihood are normally distributed anyway.

Not normally distributed, as negative reaction times are unlikely. You could use log-normal, but I believe reaction-time researchers tend to fit a Gaussian-plus-exponential model (search ex-Gaussian).
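To make that concrete, a bot wanting humanlike delays might sample from an ex-Gaussian-style distribution: a Gaussian "motor" component plus an exponential "decision" tail. All parameters below are invented for illustration, not fitted to real poker data:

```python
import random

def human_delay(mu=0.8, sigma=0.15, tau=0.6):
    """Sample a plausible reaction time in seconds: Gaussian motor
    component plus an exponential decision tail (the ex-Gaussian shape
    commonly fitted to human reaction-time data). Parameters are made up."""
    t = random.gauss(mu, sigma) + random.expovariate(1.0 / tau)
    return max(t, 0.05)  # clamp: no negative (or superhuman) reactions

random.seed(1)
samples = [human_delay() for _ in range(10_000)]
mean = sum(samples) / len(samples)
print(round(mean, 2))  # should land near mu + tau = 1.4 seconds
```

Note the clamp enforces exactly the point above: the distribution has a hard lower bound, which a plain normal distribution would violate.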

> negative reaction times are unlikely

Oh, right. I was thinking along the lines of 100m dash, where people often do have negative reaction times (which we penalize as false starts).

In poker we don't have much of an incentive to react instantly to any play.

Pretty sure I read a while back on 2p2 that large, consistent winners on certain sites have been asked to submit camera footage of their play with a clear view of screens and inputs. So this is probably something that companies like PokerStars have been dealing with for years already.

It would be pretty easy to hide whatever is signalling you what to play from the cameras.

True, but ultimately if they're unsure they'll just ban you from the platform anyway. Consistent, winning players aren't really where they make their money, and they're free to ban anyone they like. (I realize technically they take a cut from all players, but more money gets sloshed around for them to skim off of if winning players aren't removing it from the system.)

That was what I was thinking, the bot augmenting a human's playing ability rather than playing itself.

> So even if they couldn't recognize the bots as such, they would see them as tough players and avoid them.

The problem would be that if I were a pro, I would rather run 1000 bots than play myself. Which means the only players left are AI and fish. Once the fishies learn of this fact, they will leave in droves.

It's all gonna go back to live poker soon.

No, having losing odds never stopped anyone from gambling.

That's simply not true... I don't play casino games because of the losing odds. I play poker because of the winning odds. I guess you meant "having losing odds doesn't stop everyone from gambling".

Even with a magical human test, you couldn't know whether it was human + robot performance.

Just bet on bots playing each other.

So... like Wall Street!

I'm sitting here considering the possibility of making my own bot to play low stakes online poker ($1.50 sit n go). Run it on 6 tables at once and I imagine it would be facing really poor opponents and would have a steady flow of cash.

Until your bot gets caught (possibly quickly), and then you're banned from the sites.

Even if it is, it means a new live poker boom which is a very good thing

It must be. It is way too hard to prevent humans from using an AI. Some chess services try to check if you're playing "too perfect", but in poker that's harder to do, and there's way more money on the line.

>> Some chess services try to check if you're playing "too perfect", but in poker that's harder to do, and there's way more money on the line.

Not really. With perfect information you know the correct strict equity plays assuming normal opponents. This doesn't give you the ultimate answer, because a player's reads and inference about another player is definitely an input - especially at the highest level - but it is more than enough to give you a winning/losing player at the small/midstakes.

source: worked for an online poker company that had these tools... and far more available to us

> a player's reads and inference about another player is definitely an input - especially at the highest level

I think player-dependent strategy is more important at lower levels because the players are much further away from what you call "normal opponents", so there's far more opportunity to exploit their mistakes.

>Some chess services try to check if you're playing "too perfect" [...]

That's interesting, could you share an example? Most of my search results are anecdotal Reddit threads about how many people cheat in online chess.

All the major online chess websites have anti-cheat mechanisms. They don't publish details of how they detect cheaters though, and I don't know how good they are.

From what I've read, they work by comparing the player's moves against chess engines, and if the player is picking engine's choice too often in positions where there are multiple roughly equal moves, they get flagged.
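In its simplest form, that kind of check reduces to an engine-agreement rate. The moves, threshold, and flagging rule below are purely illustrative (real systems weight positions by how forced the moves are, and the engine's top moves would come from actually running something like Stockfish, which is not shown):

```python
def match_rate(player_moves, engine_top_moves):
    """Fraction of positions where the player chose the engine's #1 move."""
    hits = sum(p == e for p, e in zip(player_moves, engine_top_moves))
    return hits / len(player_moves)

# Hypothetical six-move sample; the engine column is invented.
player = ["e4", "Nf3", "Bb5", "O-O", "Re1", "d4"]
engine = ["e4", "Nf3", "Bc4", "O-O", "Re1", "d3"]
rate = match_rate(player, engine)
print(rate)  # 4 of 6 moves agree

SUSPICION_THRESHOLD = 0.9  # illustrative cutoff only
flagged = rate > SUSPICION_THRESHOLD
```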

I always found it weird that someone would want to cheat in a game like online chess. I mean, what's the point? Does anyone have insight into what's going on in the head of cheaters?

A few reasons come to mind. One is simply that if you have any metrics (ranking, win/loss ratio, greater site access..) it's going to feel nice to see them improve. Another is that losing at anything can be ego-hurting (similar reason good players sometimes sandbag with new accounts / lower ranks they can't possibly lose to, they need to 'win' more). Or reverse sandbagging/trolling with a bot might be amusing. Another is the cheater may justify it as a self-teaching game, and might not always play the strongest move but see if their move is even in consideration or try to improve ability to see the better moves by having them always pointed out -- but why not just play the bot, or save that for post-game analysis? I like to run my go games through gnugo's annotated analysis at the end (as I'm very weak I assume even the weak gnugo can teach me things), it'd be too troublesome to use it in a live game.

Other players justify cheating by convincing themselves that everyone else is cheating.

It's where the enjoyment comes from. Cheaters don't enjoy the game as much as they enjoy seeing their Elo/MMR go up, or in the worst case they're psychopaths who just want to mess with other people's heads.

People enjoy the feeling of having power over others.

Even so, most people don't bother with standard games online, since it's way too easy to cheat by mirroring the game in an engine, and it's basically undetectable if they are good enough not to play lines that look like "computerish" moves.

>> Computers are better at maths than humans.

OP discussed this: while it's true that computers are better at maths, that advantage doesn't carry over straightforwardly to games with hidden information like poker. This is more of a game-theoretic problem (economics) than a purely mathematical one, and that line of work had less support in the AI/ML community, hence the delay.

The lower CPU/GPU/resource use supports that fact as does your intuition. Breaking poker required a lot of manual work and model design over brute force algorithms and reinforcement learning.

The bot does not seem to consider previous hands in its decisions. That is to say, it does not consider who it is playing against. Should this affect how we perceive the bot as “strategic” or not? Bots that play purely mathematically optimally on expected value aren’t effective or interesting. But it feels like this is playing on just a much higher order expected value.

It feels like a more down-to-earth version of the sci-fi superhuman running impossible differential equations to predict exactly what you will do, given that he knows that you know what he knows... etc., ad infinitum. But since it doesn’t actually consider the person it’s predicting, it may simply be a really, really good approximation of the game-theoretic dominant strategy.

At what complexity of game and hidden information should we feel like the bot can’t win by running a lookup table?

The bot bluffs, and understands that when its opponent bets it might be a bluff. I would consider that to be strategic behavior. The fact that its strategy is determined by a mathematical process doesn't change that in my opinion.

It does bluff, but that’s not my point. My issue is that it bluffs without consideration of its opponent. High level strategic play of most games is about adapting to your opponents play. This bot does not do that. It is secretly a giant lookup table of game state to response.

In the case of poker, it appears that adaptability is not as good as pure mathematical optimization. Humans can adapt their strategy, but it’s basically just worse regardless because this thing has cracked the code.

I’m surprised that you managed to beat pros without adaptability. It’s pretty impressive and says a lot about how we define strategy. If human adaptability is just not as good as machine optimality across all games, we could imagine discovering that an adaptable poker AI can’t outperform this one. It raises a whole lot of interesting questions, because a lot of the criticism of something like StarCraft AI is that it is strategically stupid and doesn’t adapt. The StarCraft AI is admittedly kind of stupid right now, but we may hit a wall on its creativity simply because creativity is, despite human intuition, a dumb idea.

If you think about it, any AI that's stopped learning and is now efficiently doing pattern matching or pattern completion (assuming memory and attractor states), instead of running a complex search, is arguably a fancy lookup table hashed by similarity. This includes humans. In other words, lookup table isn't the slight most think it is. But the bot does do real time search so it's not "merely doing" a look-up.

Because poker is not subgame-solvable (it is not possible to self-locate within the game tree), this bot's play has to get into its opponents' mindspace in a sense. To not be exploitable, it essentially has to infer the other players' hidden state and paths from observed actions. This isn't something I've seen in Dota, StarCraft, chess, or Go bots.

It's true that it doesn't learn online to find exploitable patterns in other players, but doing that without making yourself exploitable in turn is another very difficult problem. Minimally exploitable, near-optimal play according to game-theoretic notions is considered strategy.

While you're correct that online learning is powerful and something machines are not currently good at (in complex spaces), you can avoid being exploited without learning if your experience is rich enough and you know how to infer what your opponent is trying to do and anticipate them. I'd argue this lineage of poker bots is the closest of the major game-playing bots to playing that way.

I don’t mean look up table as a bad thing. I mean it’s a lookup table on game state, without incorporating any information about the players. But good points

> High level strategic play of most games is about adapting to your opponents play.

Is this true in any meaningful sense?

For heavily studied games there's usually a theoretically optimal play independent of the opponent's interior state. This is obviously true for all the "solved" games, which include the simpler Heads-Up Limit Hold 'Em poker (solved by Alberta's Cepheus project), but it seems pretty clearly true for as-yet unsolved games like Go and Chess too.

I'm very impressed by this achievement because I had expected good multi-player poker AI (as opposed to simple colluding bots found online making money today) to be some years away. But I would not expect "adaptability" to ever be a sensible way forward for winning a single strategy game.

Adaptability is certainly not necessary (almost by definition) if you're playing a near to equilibrium strategy but adaptability is a useful skill to have in a general non-stationary world.

That said, for this bot, I wouldn't say it's playing completely independently of the other players' interior state. Pluribus must infer its opponents' strategy profiles and, according to the paper, maintains a distribution over possible hole cards and updates its belief based on observed actions. This is part of playing in a minimally exploitable way in such a large space for an imperfect-information game.

> Pluribus must infer its opponents' strategy profiles

This is what interests me. It doesn’t do this. In fact, because it only played against itself, it should be assumed that the only strategy profile it considers is its own.

You're right that it uses itself as a prototype for decisions, but the fact that it also maintains a probability distribution over possible hole cards, updated according to observed actions, is already richer than the local-decision-only approach taken by most other bots. This is sort of forced by the simplicity of poker's action space combined with the large search space and imperfect information. Here, the simplicity ends up making things more difficult! They also use multiple play styles as "continuation strategies", so it's a bit more robust. And to be fair, I suspect most human players use themselves and their own experience as a substitute too.
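The hole-card belief update described above is, at its core, a Bayes step. In this sketch the hand "buckets", the prior, and the action likelihoods are all invented for illustration; Pluribus's actual abstraction and update are far more elaborate:

```python
# Prior belief over the opponent's hand strength (made-up buckets).
prior = {"strong": 0.2, "medium": 0.5, "weak": 0.3}

# P(observed action = "raise" | bucket) under an assumed opponent strategy.
likelihood_raise = {"strong": 0.8, "medium": 0.3, "weak": 0.1}

def update(belief, likelihood):
    """One Bayes step: posterior is proportional to prior times likelihood,
    then normalized to sum to 1."""
    post = {h: belief[h] * likelihood[h] for h in belief}
    z = sum(post.values())
    return {h: p / z for h, p in post.items()}

posterior = update(prior, likelihood_raise)
print({h: round(p, 3) for h, p in posterior.items()})
# Observing a raise shifts probability mass toward the strong bucket.
```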

> For heavily studied games there's usually a theoretically optimal play independent of the opponent's interior state. This is obviously true for all the "solved" games, which include the simpler Heads-Up Limit Hold 'Em poker (solved by Alberta's Cepheus project), but it seems pretty clearly true for as-yet unsolved games like Go and Chess too.

In an n-player game, a table can be in a (perhaps unstable) equilibrium which the "optimal" strategy will lose at. This has been demonstrated for something as simple as iterated prisoners' dilemma (tit-for-tat is "best" for most populations, but there are populations that a tit-for-tat player will lose to). I don't play poker but I've definitely experienced that in (riichi) mahjong - if you play for high-value hands the way you would in a pro league, on a table where the other three players are going for the fastest hands possible, you will likely lose.

Well, in online poker, high-level players make great use of player tagging: taking notes on players they've faced before, what those players have done in important hands, and their patterns. Software exists to track how opponents behave in any given situation, and if that situation pops up again, you use it.

I would think if professional players are utilising this information, a bot could benefit from it. I don't see how they would ever lose out from this information, even if it only uses situations where the opponent has a history of 100% of the time responding a certain way.

I am impressed by the bot, but I have to laugh a bit: years ago I joked with a friend about making an "amnesiac bot" that had no recollection of previous hands. It seemed so useless that we obviously didn't make it. We've evidently been proven wrong. (Pointless tangent there.)

Player tagging just makes you exploitable. I play one way now, you tag me "Haha, fool bet-folds way too much" and then I change it up to exploit you, "Huh, I keep trying to fold him out with worse and he doesn't bite even though my notes say he will".

The theoretically optimal play just skips that meta and meta-meta play and performs optimally anyway. Because poker involves chance the optimal play will be stochastic and so you can stare at the noise and think you see a pattern, that just means you'll play worse against it, because you're trying to beat a ghost.

For example, suppose in a certain situation optimally I should raise $50 10% of the time. It so happens, by chance, that I do so twice in a row, and you, the note-taker, record that I "always" raise $50 here. Bzzt, 90% of the time your note will be wrong next time.

You would be a fool to act based on only two instances of seeing a particular behaviour. That's why you have to weigh how many instances you've seen. Sometimes, if it's fewer than X instances, it's not worth treating that particular statistic as valid.

Now say I have thousands of hands viewed against you, and you raise pre-flop 50% of the time. That is pretty significant information about the types of hands you play. If I have only 10 hands I've observed, that same stat means nothing.

The theoretical optimal play depends on who you're playing, as more value could be extracted in certain situations vs certain players.

For example, if I've seen you face a pre-flop 3-bet 1000 times and you've folded 99% of the time, that would be a good opportunity to recognise that 3-bet bluffing this player more often would have value, and would be a more optimal play than some default. Contrast that with someone who calls pre-flop 3-bets 75% of the time; it wouldn't be optimal to 3-bet bluff there. Different opponents, different optimal plays.
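The sample-size point above can be made precise with a quick confidence-interval check. This is a crude normal approximation, and the hand counts are made up, but it shows why the same observed frequency means very different things at 10 hands versus 1000:

```python
import math

def freq_interval(successes, n, z=1.96):
    """Normal-approximation 95% confidence interval for an observed
    frequency, e.g. 'raises pre-flop'. Crude, but enough to show why
    sample size matters when tagging opponents."""
    p = successes / n
    half = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half), min(1.0, p + half)

# Same observed 50% frequency, wildly different certainty:
print(freq_interval(5, 10))      # roughly (0.19, 0.81): nearly useless
print(freq_interval(500, 1000))  # roughly (0.47, 0.53): actionable
```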

I think we need to make a distinction between two kinds/styles of play:

1. Coming up with an unexploitable strategy, then scaling it up by playing as many hands as you can, earning the slim expected value each time.

2. Picking a good table / card room / 'scene', and then trying to extract as much value from it as possible.

You most often see 1 online, and 2 live, for obvious reasons.

A skilled human would be a lot more successful, I believe, than a bot in case 2. For 2, important skills are:

1. Be entertaining. You have to play in a way that is entertaining to those playing with you, such that they want to continue playing with you (and losing money to you). Good opponents (i.e. that are bad at poker but want to play high stakes) are hard to find, it is vital that you retain them.

2. Cultivate a table image, then exploit it. Especially important for tournament play, where you have the concept of "key hands" that you really need to win to potentially win the tournament. With the right table image, you may be able to win hands you otherwise wouldn't have won.

3. Exploit the specifics of the players you are playing against. Yes, that also makes you exploitable, but the idea is to stay one step ahead of your opponents.

Note that 1) is only true if your opponent is also not making many mistakes, which fails to hold for most humans, for whom the combination of randomization and calculating state-appropriate ranges is very difficult. This means that weak players can still lose heavily to mistakes/poor play within a reasonable number of hands; the edge need not be slim.

Furthermore, you can kind of account for such players by including more random or aggressive profiles in the inference/search stage.

Player tagging is more complicated than a single game, and goes far deeper than playing a few hands one way and then switching it up. You can have player stats based on thousands of hands, you can know things about your opponent even they don't know.

I don't think you play very much, which is fine, but makes this discussion a bit pointless.

> In the case of poker, it appears that adaptability is not as good as pure mathematical optimization. Humans can adapt their strategy, but it’s basically just worse regardless because this thing has cracked the code.

Adaptability is beaten by perfect strategic play in games with clear victory conditions.

My familiarity with optimal control theory is nil, but Kydland and Prescott (1977) applied it to monetary policy to show that the right rules dominate discretion. What the right rules are for monetary policy is still an open question though, because while the victory conditions in economic policy are clearly defined, the surrounding environment is very far from static, so you deal with out-of-training-set data regularly. Once AI can deal with these kinds of out-of-context problems, it seems plausible that AGI is a matter of time.


> Rules Rather than Discretion: The Inconsistency of Optimal Plans

> Even if there is an agreed-upon, fixed social objective function and policymakers know the timing and magnitude of the effects of their actions, discretionary policy, namely, the selection of that decision which is best, given the current situation and a correct evaluation of the end-of-period position, does not result in the social objective function being maximized. The reason for this apparent paradox is that economic planning is not a game against nature but, rather, a game against rational economic agents. We conclude that there is no way control theory can be made applicable to economic planning when expectations are rational.

"Strategic" is probably the wrong word, but I think there is a valid question here regarding the approach the AI is taking. One of the key things for a good poker player is having the ability to adapt and adjust their strategy depending on how others at the table are playing. Sometimes you can have the exact same cards in the exact same position and in one game it is smart to fold and in another game it is smart to raise. From the description in the article, it doesn't appear that this AI takes those ebbs and flows into consideration. Instead it seems to play "purely mathematically optimally on expected value" that was honed through trillions of simulations.

There is a cliche about how poker is about playing your opponents and not the cards. Is this AI is only focusing on its cards and ignoring its opponents?

The AI doesn't adapt to the opponents, and that's still an interesting challenge for AI research. That said, at the end of the day, it was making quite a bit of money playing against elite human pros. I think that suggests the cliche is, at least in part, wrong.

Making "quite a bit of money" still leaves open the possibility that the AI is leaving a lot of money on the table by not taking opponents into consideration.

Also I would be curious to see how it performs against people that aren't "elite human pros". Would this AI win at a higher rate in a game against average recreational players compared to the rate a pro would win?

Lastly it is also possible that the pros simply didn't have enough time to adapt to the AI which would be extra important considering the AI plays unlike humans and therefore is harder to predict.

I think the bot would make a lot of money playing against average recreational players, but it's absolutely true that if you can exploit bad players' weaknesses, then you can make more money than what the bot would earn.

We played 10,000 hands over 12 days in the 5 humans + 1 AI experiment. That's quite a long time, and there's no indication that they even began to uncover any weaknesses in that time period. So I'm fairly confident the AI is robust to exploitation, and I think that's a very important quality to have in any AI system.

That 10,000 total hands number isn't particularly meaningful on the point of adaptability, because the humans aren't sharing information with each other. The important number is how many hands each individual human played against the AI. Another question would be whether the pros knew which player was the AI. Because if they didn't, you are basically throwing a modified Turing Test at the pros before they can even begin to try to find tendencies in the AI. Predicting opponents is a huge part of how people play poker. If the AI plays unlike any human, pros are at a huge disadvantage against it compared to how they would fare against a similarly skilled but more traditional human player.

None of this is meant to diminish what you all accomplished, I'm just highlighting areas of poker in which this AI would be less successful than humans even if it is more successful overall.

The humans knew the whole time which player was the bot.

There was an interesting IRL poker game a few years ago. The player who was running behind started going all in on every hand without even looking at their hand (with a huge amount of success).

Out of curiosity, how does a bot deal with oddities like this?

This is a solved problem. Open-shoving is a feature of sit-n-gos, so of course people have simulated these and compiled so-called "pushbot tables". The parameters are basically pot size and winning probabilities against a random hand.

While this particular bot may not have those programmed in, a more powerful variant eventually will.
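A pushbot-table entry boils down to a chip-EV calculation like the one below. The fold frequency and equity numbers are invented, and real tables also fold in ICM pressure and multiway calls; this only shows the shape of the arithmetic:

```python
def shove_ev(pot, stack, p_fold, equity_when_called):
    """Chip EV of open-shoving `stack` into `pot` against one opponent.
    If they fold we win the pot; if they call we realize our equity in
    the final pot of pot + 2*stack, having put in our stack."""
    p_call = 1 - p_fold
    ev_fold_branch = p_fold * pot
    ev_call_branch = p_call * (equity_when_called * (pot + 2 * stack) - stack)
    return ev_fold_branch + ev_call_branch

# 10bb shove into a 1.5bb pot: opponent folds 60%, we have 35% equity when called.
print(round(shove_ev(1.5, 10, 0.60, 0.35), 2))  # slightly negative in this made-up spot
```

Sweeping `p_fold` and `equity_when_called` over ranges of hands is essentially how those tables get compiled.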

The mathematical theory explored in the paper is that if multiplayer poker isn't one of the multiplayer finite state games that pathologically fails to converge to a Nash equilibrium, then it has one, and this strategy should approximate it. Intuitions about adaptability and the advantages thereof aren't applicable in the scenario where the opponent is playing to a Nash equilibrium. You can perform equally well by participating in the other side of the Nash equilibrium, but anything else is a losing strategy. The fact that this approximation converges to a strategy that's actually really good suggests that there is a Nash equilibrium, and that the converged-upon strategy is converging on it.

You can't out-think or adapt to a rock-paper-scissors opponent who selects at random. All you can do is also select at random and accept that the two of you have even odds.
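To make that concrete, here's a toy calculation showing why uniform random play in rock-paper-scissors can't be exploited: every response earns exactly zero in expectation.

```python
# PAYOFF[i][j]: row player's payoff. 0 = rock, 1 = paper, 2 = scissors.
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]

def ev_vs_uniform(my_action):
    # The opponent plays each action with probability 1/3.
    return sum(PAYOFF[my_action][opp] / 3 for opp in range(3))

evs = [ev_vs_uniform(a) for a in range(3)]
# Every pure strategy (hence every mixed strategy) earns 0 in expectation,
# so no strategy beats the uniform random opponent.
```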

>> Bots that play purely mathematically optimally on expected value aren’t effective or interesting.

Interesting is up to you, but effective is definitely wrong.

ICM-perfect bots, which take no account of opponent behavior and merely model the game state, crush small tournaments. The faster the blinds and the smaller the stacks, the better, but even normal structures get killed by these so-called "expected value"-only bots.

Game Theory Optimal (GTO) attacks are incredibly effective at all levels of the game. The AI need not incorporate opponent feedback to be a winner. It can make it better, but it is not at all required.

First of all, I laughed at the 20-second average per game in self-play, since I ran into the same thing and have been trying to speed up the algorithm but haven't been able to get it faster (without throwing more hardware at it).

Second, I haven't read everything, but I believe you are playing a cash-game and not tournament-style. Is that correct? If that is the case, any chance you will be doing a tourney-style version?

[For those who don't play, in cash, a dollar is a dollar. In Tourney play, the top 2 or 3 players get paid out, so all dollars are not equal, as your strategy changes when you have only a few chips left (avoid risky bets that would knock you out) or when you are chip leader (take risky bets when they are cheap to push around your opponents).]

Also, curious how much poker you folks play in the lab for "research".

We're doing cash games in this experiment. At the end of the day, this is about advancing AI, not about making a poker bot. Going from two-player to multi-player has important implications for AI beyond just poker. I don't think the same is true for cash game vs tournament.

There's a cash game almost every night at the FBNY office! I don't usually play though -- I'm not nearly as good as the bot.

> In Tourney play, the top 2 or 3 players get paid out

Or top 2 or 3 thousand... depends on the tournament but it's usually the top 15% ish.

True, I am thinking "sit and go" tournament where you would have 6 players like in this research.

Is there much to do here? ICM bots have this space covered pretty effectively.

But ICM is only a model that helps you evaluate information in the tournament; players will often use it to cap their bets or as a tipping point on a call, but I've never seen it used as a complete basis of play.
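For reference, the standard ICM model the thread is referring to (Malmuth-Harville) is easy to sketch: each player's chance of finishing first is proportional to their stack, second-place chances are computed the same way after removing the winner, and so on. The payout structure below is made up for illustration:

```python
# Hedged sketch of the Malmuth-Harville ICM model.

def icm_equity(stacks, payouts):
    """$EV for each player given chip stacks and a list of payouts."""
    n = len(stacks)
    equity = [0.0] * n

    def walk(remaining, prob, place):
        # Recurse over finishing orders down to the last paid place.
        if place >= len(payouts) or not remaining:
            return
        total = sum(stacks[i] for i in remaining)
        for i in remaining:
            p = prob * stacks[i] / total  # chance i takes this place
            equity[i] += p * payouts[place]
            walk([j for j in remaining if j != i], p, place + 1)

    walk(list(range(n)), 1.0, 0)
    return equity

# Equal stacks with a 50/30/20 payout: each player's equity is a third
# of the prize pool, as symmetry demands.
eq = icm_equity([1000, 1000, 1000], [50, 30, 20])
```

The "dollars aren't chips" effects mentioned upthread fall out of this model: doubling your stack less than doubles your equity.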

How do you think these same pros would do in a follow-up match? As described in the article, the bot put players off their game with much more varied betting and with donks. Do you think the margin would decrease as players are exposed to these strategies?

Players face mental fatigue and have so over-learned their existing strategies that it takes time to adopt new strategies, and even more time for those new strategies to become second nature.

It reminds me of sports in a way. Teams start running a new wrinkle of offense in the NFL like the wildcat and it takes a few seasons for teams to instinctively know how to play defense correctly against that option.

In the paper we include a graph of performance over the course of the 10,000-hand 5 humans + 1 AI experiment that was played over 12 days. There's no indication that the bot's performance decreased over time (there is a temporary downward blip in the middle, but that's likely just variance). Based on discussions with pros, it sounds like they didn't find any weaknesses and they didn't seem to think they'd find any given more time.

I think it would be hard for the pros to find exploits against the bot, but they could definitely lose less. When using solvers, pros generally only input a couple of sizings for bets, and avoid 2x+ pot sizings, which from the video it seemed like the bot used at much higher frequencies than other pros.

I'm not great at poker, but I did play a decent amount and I know a lot of my strategy involves probing for other people's weaknesses and shifting my strategy mid game to throw people off.

I feel like a lot of trained ML models have laughable weaknesses, but perhaps since they've been trained on so many games they're well prepared for any tomfoolery.

The bot is trained to play Game Theory Optimal, aka it's playing to be breakeven at worst, which is why I believe it would be hard for a human to beat it. It's not playing perfectly, but the edges it's giving up are so marginal relative to perfect play that a human is going to lose simply by making a mistake at some point, even if the human were to use a solver to completely optimize their strategy.

I also suspect it would not be able to maintain a ~40bb/100 hand win rate. The thing about human players is, while the best are capable of learning and employing truly balanced GTO strategies, in practice they rarely adhere to these because other humans (even good pros) will still have exploitable flaws in their strategies, and attempting to exploit these will be more profitable than sticking to the unexploitable strategy; of course it also opens the exploiter to counter-exploitation, creating a fluctuating cycle of players trying to exploit, getting exploited, then moving back towards playing unexploitably. That's the normal state of a pro's strategy in a given game - so to switch to a steady state of always playing unexploitably would be a fairly big adjustment even to top tier pros who are capable of it.

Yeah, that is kinda what I was trying to tease out. These 10K hands are nothing compared to the X million hands these pros have already played. It would be interesting to see how well they did after 1M hands. I'm sure the bot would likely still have an edge, but I'd assume the players would adjust their strategy and be less confused by the randomly sized bets.

I was also confused by the sample videos where everyone had $10K at the start of each of the demo hands. It was unclear to me whether that was just a simulation of the hands or actual game play. If everyone starts every hand with $10K, the feat seems less impressive, as going all-in carries less risk.

Stacks are reset to 10k at the beginning of each hand, so they can use every hand to train a single model with the same starting state.

The fixed stack size doesn't really discount anything to me - it makes sense as an experimental control; and it's a cash game so there's no additional risk to going all in regardless of stack size.

But yeah, the sample size is definitely too small IMO; when they tested the heads-up version of the bot some years ago, they had it play a bigger sample (50 or 100k, IIRC?).

In online poker (at least with 100BB stacks) it's customary to top up between hands if you're below full stack.

The reason is simple: with table stakes, your maximum win for a hand is constrained by your own stack size.

I remember reading in the mid-to-late aughts that a lot of old-school poker players that used more swagger and intuition were starting to be run out of the game by kids who applied statistical methods.

Could you perhaps speak to some of the engineering details that the paper glosses over. E.g.:

- Are the action and information abstraction procedures hand-engineered or learned in some manner?

- How does it decide how many bets to consider in a particular situation?

- Is there anything interesting going on with how the strategy is compressed in memory?

- How do you decide in the first betting round if a bet is far enough off-tree that online search is needed?

- When searching beyond leaf nodes, how did you choose how far to bias the strategies toward calling, raising, and folding?

- After it calculates how it would act with every possible hand, how does it use that to balance its strategy while taking into account the hand it is actually holding?

- In general, how much do these kind of engineering details and hyperparameters matter to your results and to the efficiency of training? How much time did you spend on this? Roughly how many lines of code are important for making this work?

- Why does this training method work so well on CPUs vs GPUs? Do you think there are any lessons here that might improve training efficiency for 2-player perfect-information systems such as AlphaZero?

We tried to make the paper as accessible as possible. A lot of these questions are covered in the supplementary material (along with pseudocode).

- Are the action and information abstraction procedures hand-engineered or learned in some manner?

- How does it decide how many bets to consider in a particular situation?

The information abstraction is determined by k-means clustering on certain features. There wasn't much thought put into the action abstraction because it turns out the exact sizes you use don't matter that much as long as the bot has enough options to choose from. We basically just did 0.25x pot, 0.5x pot, 1x pot, etc. The number of sizes varied depending on the situation.
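To illustrate the kind of clustering described here (the actual Pluribus features and code aren't public, so everything below is a made-up toy version): hands are mapped to feature vectors, and k-means groups strategically similar hands into shared buckets.

```python
# Toy sketch of information abstraction via k-means. The features are
# invented for illustration; a real abstraction would use equity
# distributions computed over many rollouts.

import random

def kmeans(points, k, iters=50, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        # Assign each point to its nearest center.
        buckets = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            buckets[j].append(p)
        # Move each center to the mean of its bucket.
        for j, b in enumerate(buckets):
            if b:
                centers[j] = tuple(sum(xs) / len(b) for xs in zip(*b))
    return centers

# Illustrative feature vectors: (current equity, equity variability across runouts)
hands = [(0.85, 0.05), (0.83, 0.06),   # strong, stable made hands
         (0.45, 0.30), (0.48, 0.28),   # draws: medium equity, high variability
         (0.20, 0.04), (0.22, 0.05)]   # weak hands
centers = kmeans(hands, k=3)
```

Every hand mapped to the same bucket then shares one entry in the precomputed strategy, which is what keeps the game tree tractable.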

- Is there anything interesting going on with how the strategy is compressed in memory?


- How do you decide in the first betting round if a bet is far enough off-tree that online search is needed?

We set a threshold at $100.

- When searching beyond leaf nodes, how did you choose how far to bias the strategies toward calling, raising, and folding?

In each case, we multiplied the biased action's probability by a factor of 5 and renormalized. In theory it doesn't really matter what the factor is.
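The bias-and-renormalize step is simple to sketch. This is just an illustration of the arithmetic described above, not the authors' code:

```python
# Scale one action's probability by a constant factor (5, per the
# reply above), then renormalize so the distribution sums to 1.

def bias_strategy(probs, biased_action, factor=5.0):
    scaled = [p * factor if i == biased_action else p
              for i, p in enumerate(probs)]
    total = sum(scaled)
    return [p / total for p in scaled]

# Bias an illustrative 0.2/0.5/0.3 fold/call/raise mix toward calling:
biased = bias_strategy([0.2, 0.5, 0.3], biased_action=1)
```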

- After it calculates how it would act with every possible hand, how does it use that to balance its strategy while taking into account the hand it is actually holding?

This comes out naturally from our use of Linear Counterfactual Regret Minimization in the search space. It's covered in more detail in the supplementary material.

- In general, how much do these kind of engineering details and hyperparameters matter to your results and to the efficiency of training? How much time did you spend on this? Roughly how many lines of code are important for making this work?

I think it's all pretty robust to the choice of parameters, but we didn't do extensive testing to confirm. While these bots are quite easy to train, the variance in poker is so high that getting meaningful experimental results is quite computationally expensive.

- Why does this training method work so well on CPUs vs GPUs? Do you think there are any lessons here that might improve training efficiency for 2-player perfect-information systems such as AlphaZero?

I think the key is that the search algorithm is picking up so much of the slack that we don't really need to train an amazing precomputed strategy. If we weren't using search, it would probably be infeasible to generate a strong 6-player poker AI. Search was also critical for previous AI benchmark victories like chess and Go.

Any chance of the code being released, or a Cepheus-style answer key being provided? http://poker.srv.ualberta.ca/strategy

I don't think the poker world would be happy with us if we did that. Heads-up limit hold'em isn't really played professionally anymore, but six-player no-limit hold'em is very popular.

It depends who you ask. I think it's inevitable that it's released one day. By not releasing you're just delaying it.

All the top high stakes players already have solvers that they've spent a lot of money developing and studying privately. They would definitely be upset with you, but by releasing the code you are democratizing the information to all the midstakes pros who want to study but don't have the resources to pay developers and solve the game privately.

If you're already using programs to help you, I don't see how you can be upset if someone else is cheating better than you are.

Someone watch this guy and see if he buys any fancy watches or nice cars in the next few years. ;)

Doesn't that make it a rather poor candidate for a scientific paper? Chest-thumping without data and code is, well, chest-thumping without data and code.

Have you thought about open sourcing the non-AI pieces? It would be great for other researchers so they wouldn’t have to build the poker pieces from scratch

There is some open-source code in this area, and hopefully there will be more going forward. Here's one example: https://github.com/EricSteinberger/Deep-CFR

a. Is CFR applicable in single-player hidden-information games? (e.g. the state is initially hidden and gradually revealed to the agent, but there is no adversary)

b. How much more efficient is the improved search algorithm? The $150 number sounds like a couple of orders of magnitude...

a. There was this paper a couple years ago applying CFR to single-agent settings: https://arxiv.org/abs/1710.11424

b. It really depends on the game and the situation. It can be several orders of magnitude in six-player poker. In other games, it can be even more.

Why are you concerned about the happiness of the poker world?

Well if they upset the poker world do you think they would have top pros willing to go on record endorsing them?

Top pros will endorse whatever they're paid to endorse.

This is falsified by any number of cases; Isaac Haxton spurning PokerStars is probably one of the best examples, so others can see your comment is not universally applicable.


>However, Haxton isn’t accepting PokerStars’ olive branch as he was among the victims defrauded by the online giant for millions of dollars.

I'm not sure that really provides strong opposition to the GP's claim.

PokerStars offered to make him - and him alone - whole through sponsorship dollars. Haxton used to be their lead pro and is widely considered one of the very best players in the world.

It could just be for ethical reasons. I think anbop has a good reason even for unethical folks: hitting the best players hard in their wallets will definitely make it harder to recruit them for comparisons that validate these experiments. My prediction is that releasing this software will lead to profitable cheating like what people do with Blackjack at casinos.

Why not run the bot, post its proceeds transparently online, and donate everything to charity?

By not releasing it, you're ensuring a higher concentration of money in the hands of a few, IMO.

Anyone with access to this source code could run a bot themselves, or employ someone to do so.

Plus, if you've accomplished this, no doubt someone can replicate it.

By not releasing it, you leave the experiment unvalidated. How can we be sure there wasn't human support?

As other commenters have said, I too think you're delaying the inevitable, but releasing now would mean you get credited with the first free solution.

In your Science paper, you mention playing 1H-5AI against 2 human players: Chris Ferguson and Darren Elias. In your blog post you also mention playing 1H-5AI against Linus Loelinger, who was within standard error of even money. Why did Linus not make it into the Science paper?

That took place after the final version of the Science paper was submitted. It would have been nice to include but it takes a while to do those experiments and we didn't feel it was worth delaying the publication process for it.

The article makes it sound like the AI is trained by evaluating results of decisions it makes on a per-hand basis. Is there any sense in which the AI learns about strategies that depend upon multiple hands? I’m thinking of bluffing/detecting bluffs and identifying recent patterns, which is something human poker players talk about.

The bot handles each hand independently. How the players play in one hand does not affect how the bot plays in future hands at all.

That said, it did train by playing against itself (before the experiment against the humans began).

Interesting. Does this mean that it cannot adjust to human players "switching gears"? Isn't this a huge leak?

It's not a leak; it just means it can't beat the opponent for the maximum it could win by playing the exploitative counter-strategy against their tendencies. Instead it just plays GTO, which will win against any given non-GTO strategy, though not for as much as the exploitative counter-strategy. Playing an exploitative strategy, however, leaves you open to exploitation, and this goes back and forth until the players converge on GTO, assuming the players are (very) good.

t. former poker pro

Was Judea Pearl's work relevant for the counterfactual regret minimization, or is there some other basis? I've added CR to the list of things to look into later but skimming the paper it was exciting to think advances are being made using causal theory...

The CFR algorithm is actually somewhat similar to Q-learning, but the connection is difficult to see because the algorithms came out of different communities, so the notation is all different.
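For readers who want to see that connection: the update rule at the heart of CFR is regret matching, which (like Q-learning) keeps per-action statistics, but tracks cumulative regret and plays actions in proportion to positive regret. A toy self-play sketch on rock-paper-scissors, not the authors' code:

```python
import random

# Antisymmetric payoff matrix: PAYOFF[a][b] is the payoff to the player
# choosing a against an opponent choosing b. 0=rock, 1=paper, 2=scissors.
PAYOFF = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]

def strategy(regrets):
    # Regret matching: play in proportion to positive cumulative regret.
    pos = [max(r, 0.0) for r in regrets]
    total = sum(pos)
    return [p / total for p in pos] if total > 0 else [1.0 / 3] * 3

rng = random.Random(0)
regrets = [[0.0] * 3, [0.0] * 3]
strategy_sums = [[0.0] * 3, [0.0] * 3]

for _ in range(50000):
    strats = [strategy(regrets[p]) for p in (0, 1)]
    actions = [rng.choices(range(3), weights=strats[p])[0] for p in (0, 1)]
    for p in (0, 1):
        opp = actions[1 - p]
        utils = [PAYOFF[a][opp] for a in range(3)]  # value of each action
        for a in range(3):
            # Regret: how much better action a would have done than what we played.
            regrets[p][a] += utils[a] - utils[actions[p]]
            strategy_sums[p][a] += strats[p][a]

avg = [s / sum(strategy_sums[0]) for s in strategy_sums[0]]
# The *average* strategy drifts toward the uniform (1/3, 1/3, 1/3) Nash
# equilibrium, even though the per-iteration strategy keeps cycling.
```

Full CFR applies this same update at every information set of the game tree, weighting by the probability of reaching that set.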

Who were the pros? Are they credible endbosses? Seth Davies works at RIO, which deserves respect, but I've never heard of the others except Chris Ferguson, who I doubt is a very good player by today's standards (or human being, for that matter). Meanwhile I do know the likes of LLinusLove (IIRC, the king of 6-max), Polk, and Phil Galfond.

Is 10,000 hands really considered a good enough sample? Most people consider 100k hands with a 4bb win rate to be acceptable, other math aside. However, as you and your opponents approach equal skill, variance increases to the point where regs refuse to sit with each other.

LLinusLove was one of the players. Chris Ferguson was in one of the 5 AI's + 1 Human experiment but not the 5 Humans + 1 AI experiment.

We used AIVAT to reduce variance, which reduces the number of samples we need by roughly a factor of 10: https://poker.cs.ualberta.ca/publications/aaai18-burch-aivat...

What? The pros chosen were definitely highly skilled players. They're fairly well known in the online poker community.

Furthermore, Chris Ferguson, scumbag aside, is absolutely still a very good player by today's standards, and way higher than the mean participant in a research experiment.

10,000 hands is an effective enough sample at a certain win rate and analysis of variance of play; the n-value alone is not enough to tell you if it was enough hands.

They're credible enough. I'd like the sample sizes to be bigger as well but they're enough to verify that even if the bot got lucky over the sample size, it's close enough that it doesn't really matter. Add a bit more compute, optimize some algorithms a little, and you'd make up the difference. The real point is that they have a technique that scales to 6-max, and whether it's 97% or 99% is kind of immaterial in the grand scheme of things.

FWIW, they did some variance reduction techniques that dramatically reduce the number of hands needed to be confident in your results, so the number of hands may be bigger than you think. e.g. the results of 10k HU hands have much higher variance than the results of 10k HU hands where everyone just collects their EV once they're all in.
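A toy illustration of that all-in EV adjustment (a much simpler relative of the AIVAT method actually used in the study; all numbers here are invented):

```python
# Once players are all in, credit each player with their equity share
# of the pot instead of the lucky/unlucky realized outcome. Same mean,
# far less variance, so far fewer hands are needed for significance.

import random

def allin_results(equity, pot, n_hands, seed=0):
    """Simulate n_hands identical all-ins with the given win equity."""
    rng = random.Random(seed)
    realized, adjusted = [], []
    for _ in range(n_hands):
        won = rng.random() < equity
        realized.append(pot if won else 0.0)   # high-variance raw outcome
        adjusted.append(equity * pot)          # zero-variance EV credit
    return realized, adjusted

realized, adjusted = allin_results(equity=0.6, pot=100, n_hands=1000)
# mean(realized) fluctuates around 60; every adjusted sample is exactly 60.
```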

Jimmy Chou, Jason Les, Dong Kim are affiliated with Doug Polk.

It is an interesting point that these are pros but their specialities are either tournament or heads up. The current 6 max pros are LLinusLove, Otb_RedBaron, TrueTeller.

I'm very late to this post, so not sure if you're still around.

What are your thoughts on a poker tournament for bots? Do you think it could turn into a successful product? I've always wanted to build an online poker/chess game that was designed from the ground up for bots (everything being accessible through an API), but have always worried that someone with more computational resources or the best bot would win consistently. Is it an idea you've thought about?

Congrats on the bot!

I have a few basic questions. I would like to implement my own generic game bot (discrete states). Are there any universal approaches? Is MCMC sampling good enough to start? My initial idea was to do importance sampling on some utility/score function.

Also, I am looking into poker game solvers - what would be a good place to start? What's the simplest algorithm?


Why did you optimize for using less cpus? Was it a happy accident or a goal?

A little bit of both. We didn't think we needed the extra computing power. And we really wanted to convey how cheap it is to make a superstrong poker AI with these latest algorithms.

Knowing when to bluff often depends on the psychology of the opponent, but since it trained playing itself it doesn't seem that knowing when to bluff would be learned. Did it bluff very often?

The bot does bluff, and in fact it learns from self-play that bluffing is (sometimes) the optimal thing to do. At the end of the day, bluffing is simply betting when you have a weak hand. The bot learns from experience that when it bets with a weak hand, the opponent (another copy of itself) sometimes folds and it makes more money than if it hadn't bet. The bot doesn't view it as deceptive or dishonest. It just views it as the action that makes it the most money.

Of course, a key part of bluffing is getting the probabilities right. You can't always bluff and you can't never bluff, because that would make you too predictable. But our self-play and search algorithms are designed to get those probabilities right.
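The standard arithmetic behind "getting the probabilities right" on a river bet is worth sketching. This is textbook poker game theory, not a detail taken from the bot:

```python
# For a bet of size `bet` into a pot of `pot`, bluffing with frequency
# bet / (pot + 2*bet) makes an opponent holding a pure bluff-catcher
# indifferent to calling: calling and folding have identical EV.

def indifferent_bluff_freq(bet, pot):
    # The caller risks `bet` to win `pot + bet`; indifference requires
    # bluff_freq * (pot + bet) == (1 - bluff_freq) * bet.
    return bet / (pot + 2 * bet)

def caller_ev(bluff_freq, bet, pot):
    """EV of calling with a pure bluff-catcher."""
    return bluff_freq * (pot + bet) - (1 - bluff_freq) * bet

f = indifferent_bluff_freq(bet=100, pot=100)  # pot-sized bet -> 1/3 bluffs
# At exactly this frequency, calling has zero EV: no exploit either way.
```

Bluff too often and your opponent profits by always calling; too rarely and they profit by always folding. Self-play pushes the frequencies toward this balance.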

> when it bets with a weak hand, the opponent (another copy of itself) sometimes folds and it makes more money than if it hadn't bet.

This makes no sense. If I am betting for thin value with a weak hand, then I make less money when my opponent folds. Does the bot not know whether it is bluffing or value betting?

It makes complete sense. There’s a component of value and a component of bluff for a given hand in front of you. They’re related.

Value betting and bluffing aren’t defined by the outcome of a hand — action yet to be completed. Poker is a game of hidden information so betting with “thin value” implies that your component of bluffing is larger. You want your opponent to fold more often than not when you have thin value because more often than not you’re actually beat.

QQ can get KK to fold based upon board texture, street, and prior action. But you don’t know the other person is holding KK when you’re betting for “thin value” on the river.

> You want your opponent to fold more often than not when you have thin value because more often than not you’re actually beat.

No, that is simply not true. If I am betting for value, then I want my opponent to call no matter how weak I am or how thin it is.

> But you don’t know the other person is holding KK when you’re betting for “thin value” on the river.

Then it's a value bet. As you said, it's not defined by the outcome.

“Value betting” and “bluffing” are human heuristics to simplify complicated situations.

The bot doesn’t “know” whether it’s value betting or bluffing—it’s not a relevant question. The relevant question is whether to bet, and what amount, in order to maximize value of the particular hand it has, with reference to the board and opponent actions taken.

Right, we agree on that, but the above comment lumps all of what you describe (“betting with a weak hand”) under “bluffing” and says the bot learns that it makes more money when its opponent folds.

Where does your quote say that the bet is a value bet? I read it as saying that the bot learned to bluff (not value bet) by betting when it has a weak hand (i.e. the bot has a weak hand, so it's getting better hands to fold by betting). The phrase "value bet" was not used.

(This, in addition to what the other comments have said about there being spots where a bet can get better hands to fold with some probability AND get worse hands to call with some probability - see the chapter "The grey area between value betting and bluffing" in Applications of No Limit Hold Em)

"At the end of the day, bluffing is simply betting when you have a weak hand."

I was the one who introduced the term "value betting" to the conversation, applied specifically to weak hands.

I mean, unless only those who interpreted it the other way happened to respond, I must be the one reading it wrong, because these responses aren't lining up with how I read it or what I meant.

At the highest levels of play, psychological factors are pretty minimal. Before a showdown, which cards you actually hold aren't particularly material, as the only information you convey is through your bets. This means if you predict that you're more likely to win a hand by betting (and inducing a fold) than by calling and going to a showdown, it makes mathematical sense to "bluff". I'm sure AIs have no trouble learning that fact.

The issue is that you don't know exactly the probability of your opponent folding.

This is psychology.

The probability of the opponent folding doesn't matter. The goal of bluffing in modern games is so that optimal players are indifferent in their decision (no matter how they play, you can't lose money). And because this is a zero sum game, if you can't lose money then you win if the opponent makes mistake.

You only need to know the probability of the opponent folding if you want to deviate from the theoretically optimal strategy to win even more money against a biased player.

I'll have to go back and watch Data playing poker on Star Trek NG -- what do sci fi writers think of this.

Are there any ethical considerations relating to the prospect of use of this bot for cheating in real-money games? Either from your internal team or after public replication?

We're really focused on advancing the fundamental AI aspect. We're not here to kill poker. The popular poker sites have quite sophisticated anti-bot measures, but it's true that this is an arms race.

There are no ethical reasons why a game like poker must exist. In fact, poker gives a false sense of hope to the thousands of gambling addicts that enter casinos. It is a fun game, but there is an unlimited number of other potentially fun games.

1 ethical reason it ‘must exist’ is that it is a man-made game that some people enjoy without causing harm to themselves or others. Not quite sure what you’re suggesting, but “banning poker” is not going to solve the problem of gambling addiction.

I've seen people who went to the casino occasionally without problems, because nothing makes you lose and tilt as much as poker does. I witnessed poker destroying families and people more than other games do. There are people who don't like other casino games but lose heaps on poker, and before they started playing poker their lives had more quality and meaning. I don't play other casino games, but poker had a really bad influence on my life and the lives of people around me. Also, the majority of money in poker comes from the players, not from viewers and sponsors as in other sports like football or baseball.

Very impressive. If my understanding of how the AI works is correct, it is using a pre-computed strategy developed by playing trillions of hands, but it is not dynamically updating that during game play, nor building any kind of profiles of opponents. I wonder if by playing against it many times, human opponents could discern any tendencies they could exploit. Especially if the pre-computed strategy remains static.

We played 10,000 hands of poker over the course of 12 days in the 5 humans + 1 AI experiment, and 5,000 hands per player in the 1 human + 5 AI's experiment. That's a good amount of time for a player to find a weakness in the system. There's no indication that any of the players found any weaknesses.

In fact, the methods we use are designed from the ground up to minimize exploitability. That's a really important property to have for an AI system that is actually deployed in the real world.

A hearty congratulations, Noam, on finishing another chapter of the story i opened in the early 1990s...

Another person asked "What took you so long?", and i had the same question. :) I really thought this milestone would be achieved fairly soon after i left the field in 2007. However, breakthroughs require a researcher with the right amount of reflectiveness, insight, and determination.

Well done.

The progress you have made in this research field is amazing. What do you think the next step will be, and where do you see the future of your research?

Thanks! I think going beyond two-player/team zero-sum games is really important. This was a first step, but it's definitely not the last. I'm hoping to continue in this direction, and maybe start looking at interactions involving the potential for cooperation in addition to competition.

I haven't finished digging through the paper and the supplement yet, but I'm curious about how many hands were multiway to the flop (and whether the percentages differ significantly between 1H/5AI and 5H/1AI). I'd guess that it's a pretty small fraction of the total hands, and I'm wondering what the performance is like in those particular cases.

I don't have the exact percentages but I think it's less than 10%. It's not really possible to measure the bot's performance just in specific situations, but my feeling is the bot performs relatively well in these situations. Multi-way flops were basically impossible to do in a reasonable amount of time for past AI's. Our new search techniques make these situations feasible to figure out in seconds.

Cheers, thanks. One of the reasons I asked about 1H/5AI vs 5H/1AI is that historically the new bots for a given form of poker have played a bit wider than the accepted wisdom of the time, so I was curious if there were relatively more multiway pots with 5AI than with 5H.

The pros described the bot's preflop strategy as very sensible, so I think it's unlikely there were more multiway pots with 5 AI's.

What table information does the bot take into account? Position? Other player's stack size?

>Regardless of which hand Pluribus is actually holding, it will first calculate how it would act with every possible hand .

Is this information used to form an idea of what other players might be holding based on how the other player acts and how closely that action matches Pluribus's 'what if' action?

No, it's to mask actions. If you bet big with monsters and check with air 100% of the time, your opponent knows when to fold and when to bet.

iirc, the frequency of bets in that spot is roughly equivalent to the frequency of times you're definitely in front of your opponent in that particular spot, but not always with the hands that are beating your opponent.

The concept is called Game Theory Optimal (GTO) and it's pretty popular in higher stakes games.

Can you share some about what strategies the bot prefers and how these compare with common professional human strategies?

We talk about this a bit in the paper. Based on the feedback from the pros, the bot seems to "donk bet" (call and then bet on the next round) much more than human pros do. It also randomizes between multiple bet sizes, including very large bet sizes, while humans stick to just one or two sizes depending on the situation.

Is there a way to see the EV the bot is calculating when it's deciding between checking and donk betting? When you place these spots in solvers, they actually advocate for a significant amount of donk betting on certain boards, but pros don't do it because the EV is marginal and it's better for pros to simplify their strategy so they make fewer mistakes. If you have a flop donk-bet strategy, you also have to develop a corresponding turn and river strategy, which makes it extremely difficult.

When human players donk bet it's almost always a weak player employing an extremely exploitable strategy, whereas pros almost never do it because the metagame has evolved around the presumption that nobody ever donk bets. I'd love to see what the bot's balanced GTO donking strategy looks like.

It's basically been true along every step of the poker bot evolution (HU limit, HU NL, and 6-max NL) that the bots donk a lot more than the humans. 10 years ago you could find pros arguing that donking in any situation is always wrong. That's been shifting for years, but still not to the level that the bots do it.

My personal belief is that the "no-donk" strategy is an adaptation by fallible human minds to reduce the branching on the decision tree to something tractable.

Your personal belief is likely correct. Balancing a donking range is incredibly difficult for humans and doing so perfectly likely yields only a very small EV bonus over just always checking. For humans it makes a whole lot of sense to reduce the branching in a case like that whereas for computers it doesn't really matter.

Another good example is varying continuation betting sizes. A true GTO strategy would mix in a number of different sizings (and I'm sure the bots adapted to do this), but you only sacrifice a very tiny amount of EV by basically betting the same size every time. Doing the latter limits humans' risk of making errors, which is far more valuable than squeezing out 0.05bb/100 more by varying the sizes.

If that's true in cash games, it's funny, since donk betting to control pot size is not an uncommon strategy in high-level tournament play.

Donk bets exist in the meta, i.e., when the turn is extremely good for your range but horrible for your opponent's. (If you have a flush draw on the flop and it hits on the turn, you can overbet the pot on the turn with your bluffs and flushes, then just go all in on the river.) If they have top pair, it's pretty hard to play against that.

Oh, sure. I more meant flop donk bets; I guess it doesn't specify which street the donking was happening on.

The same logic can apply to flop donk bets. Some flops favor the donking player's range more than their opponent.

Yea I'm not saying it's impossible to devise an unexploitable flop donking strategy. I think the reason thinking players generally don't is because of the complexity of adding significantly more branches early in the game tree - basically going from 3 (check-{fold,call,raise}) to 6 (those 3 plus donk-{fold,call,raise}).

The other issue is that increasing the number of branches also decreases the number of hands that go into each bucket, to the point where it might not be effective any more without being able to randomize the branch choice for specific threshold hands. Most pros I know just have a hard cutoff for each branch and don't worry too much if they're slightly out of balance, but smaller bucket sizes could magnify errors. If you have 31 combos for one action when you're supposed to have 30.5, then whatever, but if you have 6 when it should be 5.5, that could become a problem faster.

Noam - super interesting stuff. A couple of questions:

1) What were the reasons for choosing 6-handed play (assuming logistical and costs)? It would be interesting to see how the bot’s strategy would differ in a full ring game. 2) Are there any plans to commercialize the bot as a tool for training human players?

1) The goal was to show convincingly that we could handle multi-player poker. The exact number of players was kind of arbitrary. We chose six-player because that's the most common/popular format. Considering training the 6-player bot would cost less than $150 on a cloud computing service, I think it's safe to say these techniques would all work fine in other formats.

2) I'm quite happy working on fundamental AI research and plan to continue in that direction.

6-handed is a very common format online.

Are any papers available yet?

Is the bot going for game-theory-optimal play, or trying to exploit weaknesses in other players?

The paper is here: https://science.sciencemag.org/content/early/2019/07/10/scie...

It's going for game-theory-optimal play. It doesn't adapt to its opponents' observed weaknesses. But I think it's cool to show that you don't need to adapt to opponent weaknesses to win at poker at the highest levels. You just need to not have any weaknesses yourself.

I thought the same myself. However, if players do exploit each other's weaknesses fast enough, it could lead to a chip gain that might be hard to overcome, right? Just in theory, of course. :)

This is a great question. I wonder how this bot would do in a game with a couple of pros and a couple of reasonably skilled amateurs?

Still well, I suspect, since straightforward theoretically-correct poker will take money off the amateurs efficiently. But it seems possible that playing to wipe out weaker or less consistent players could provide enough margin to bully the more stable AI player.

This is true in tournament play. In a cash game it doesn't matter since there is no elimination, you can always rebuy.

And it's not like in the movies where if you don't have the money to call a bet, you lose. You simply are considered all in for the main pot and then sidepots that you aren't eligible to win will be created for any bets you can't cover.

Yeah that is a really interesting insight. I presume that also makes optimization much simpler. The rules are fixed. Opponents are not.

> you don't need to adapt to opponent weaknesses to win at poker at the highest levels

That may be true for limit poker, but in a no-limit tournament the best this bot could do is not lose. As the pressure increases with the blinds and the players are forced to bluff and call bluffs, how does this bot avoid folding itself to death during a run of bad cards?

I could see this bot doing well at cashing but I don't see how it could consistently place 1st the way the top human players do.

Optimal play includes bluffing. It's "optimal" according to game theory.

For example, game theory may tell you that in a particular situation, you can't be exploited if you bluff 10% of the time. If the opponent bluffs less than that, you can come out ahead by more often folding when he bets. If the opponent bluffs more than 10%, you can call or reraise when he bets. But if he bluffs the optimal amount, it doesn't matter either way, you can't take advantage of him.

So this bot would bluff at 10% to avoid getting exploited, but wouldn't try to detect whether the opponent is exploitable. (The latter is risky since a crafty opponent can switch up strategies, manipulating you into playing an exploitable strategy.)
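The indifference point in the parent's example can be worked out with a few lines of arithmetic. Here's a sketch; the pot and bet sizes are made up for illustration (opponent bets 100 into a 100 pot, and we hold a bluff-catcher that beats only their bluffs):

```python
# Toy illustration of the indifference idea above. Numbers are made up,
# not taken from the paper or the thread.

def ev_call(bluff_freq, pot=100, bet=100):
    """EV of calling: win the pot plus the bet against a bluff,
    lose our call against a value hand."""
    return bluff_freq * (pot + bet) - (1 - bluff_freq) * bet

# Setting EV(call) = EV(fold) = 0 and solving gives the optimal
# bluffing frequency: f = bet / (pot + 2 * bet).
indifference = 100 / (100 + 2 * 100)  # = 1/3 for a pot-sized bet

ev_call(0.20)          # bluffs too rarely -> calling loses, fold more
ev_call(indifference)  # optimal frequency -> we're indifferent (EV ~ 0)
ev_call(0.45)          # bluffs too often -> calling profits
```

The exact frequency depends on bet sizing, which is why the 10% in the example above is just a placeholder.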

To add onto this, some players that truly abide by the GTO strategy will use a prop, for example a watch, to determine what play to make.

If you want your perceived range to be balanced and make x play 50% of the time and y play the other 50%, you look at the watch and if the second hand is in the first 30 seconds, you make x play, 30-60 seconds, y play.

That's just an example but your point is 100% accurate.
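A bot, of course, can replace the watch with a random number generator. A minimal sketch of sampling from a mixed strategy; the action names and the 50/50 split are illustrative, matching the x/y example above:

```python
import random

# A PRNG plays the watch's role: take each action with its prescribed
# frequency so the overall range stays balanced.

def choose_action(strategy, rng):
    """Sample one action from a dict of {action: probability}."""
    r = rng.random()
    cumulative = 0.0
    for action, prob in strategy.items():
        cumulative += prob
        if r < cumulative:
            return action
    return action  # guard against floating-point rounding at the top end

rng = random.Random(0)  # seeded here only for reproducibility
strategy = {"x_play": 0.5, "y_play": 0.5}
counts = {"x_play": 0, "y_play": 0}
for _ in range(10_000):
    counts[choose_action(strategy, rng)] += 1
# Over many trials each play shows up roughly 50% of the time.
```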

I think this comes down to ambiguity over what "optimal play" means.

There's a poker strategy we might call 'deterministically' optimal play, which consists of precisely assessing each hand's expected value with little to no bluffing. This is already common in online cash games with both bots and players running multiple games at once. And you're right - it's excellent at running net-positive and not losing, but unlikely to win significant tournaments.

Pluribus, though, is playing something close to game-theoretically optimal poker. In playing against itself, it's attempting to develop a takes-all-comers strategy with no exploitable weaknesses. That includes bluffing and calling bluffs - the goal is simply to find a mixed-strategy equilibrium where those moves are made some percentage of the time, in proportion to their expected payoffs. This can involve doing all of the same basic operations as pro players, like valuing button raises differently than donks or attempting to bluff based on how many players remain in the hand. The distinctive limitation is simply that Pluribus plays 'locally' optimal poker with no conception of opponents' identities or behavior in prior hands.

That's a helpful explanation, thank you! I was misreading the statement about Pluribus not modeling its opponents between hands as meaning between rounds. It definitely is modeling its opponents and detecting bluffs, understanding when a bluff is strategically likely based on each opponent's actions so far in the hand; it just doesn't carry anything it learned into the next hand.

I could see this being an effective strategy in the WSOP; that ability to perfectly forget the previous hand is probably worth more than anything, given the way WSOP champions play. It could come down to whether the ability to exploit a reliable tell during a pivotal hand matters more than 10% of the time.

I couldn't find it confirmed in the primary or secondary article, but I would bet the bot is just playing cash at a fixed stack depth rather than a tournament; just like in the wild, bots are much more of a problem in online cash than online tournaments. Dynamically adjusting strategies by stack depth, number of players, and pay jumps would probably be several orders of magnitude more complex.

Smaller stack sizes reduce possibilities and thus reduce complexity. Pay jumps result in chips having different utility to each player which forces some situational playstyles to be more optimal. I would guess that this also reduces the complexity of the game.

Since tournaments don't often spend much time with stacks much deeper than 100bb, I would guess that tournaments would be more easily solved. Though tournaments are much more frequently run with 9-10 players rather than 6 at a table.


You're right that a single short stack hand in a vacuum has fewer game tree branches, and that factoring in chip utility is also fairly straightforward. But I strongly disagree that it reduces the overall complexity of the game. The model in the article played every single hand with 100bb; to be an effective tournament player it would have to be able to fluidly adjust strategies between big, medium and short stack play, as well as reasoning about the stack sizes of other players at the table. It's basically 4 different games at >100bb, 50-100bb, 25-50bb, and <25bb, so it would have to develop optimal strategies for each. And even if the shallower stacked games are generally simpler in isolation, there's a meta strategy of knowing which one to apply in a given hand with heterogeneous stack sizes. To paraphrase Doug Polk: "If cash game play is a science, tournaments are more of an art."

The bot could likely just be trained on the 4 or so different games. You’re likely increasing the complexity by a constant factor, nothing exponential here.

> There were two formats for the experiment: five humans playing with one AI at the table, and one human playing with five copies of the AI at the table. In each case, there were six players at the table with 10,000 chips at the start of each hand. The small blind was 50 chips, and the big blind was 100 chips.

In the fb article linked above.

Ah thanks. As I suspected, cash game with fixed 100bb stacks.

Isn't this survivorship bias, or can you tell beforehand which player will repeatedly place 1st? Given that poker is pretty popular, there must be quite a few people who repeatedly finish in first place.

Or to turn this around: given enough bots, some bots will place 1st a lot more than others. It’s just unclear which one.

The game actually becomes simpler with fewer blinds, to the point where at 15 big blinds or fewer you just follow a chart and go all in or fold preflop.

The blinds don't increase; it's a cash game, not a tournament.

Why would you choose Chris Ferguson to participate? Don't you know his terrible history?

Congrats! As soon as I saw the title I thought “I wonder if this is the project Noam works on...”


Congratulations on the win! Can you recommend any papers, blog(post)s, or books for the interested layman? (I am currently scanning though the facebook post, which is great, but personally I am looking for something more technical).

Do you want to do a Hearthstone / CCG bot? I have an engine and testers for you.

Very interesting results. From the paper it sounds like the algorithms you used are very similar to Libratus (pre-solved blueprint + subgame solving). What change made it so that the computation requirement is much lower now?

There were several improvements but the most important was the depth-limited search. Libratus would always search to the end of the game. But that's not necessarily feasible in a game as complex as six-player poker. With these new algorithms, we don't need to go to the end of the game. Instead, we can stop at some arbitrary depth limit (as is done in chess and Go AI's). That drastically reduces the amount of compute needed.
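For intuition, here's what depth limiting looks like on a toy perfect-information tree; Pluribus's imperfect-information version is far more involved (its leaf values come from a learned blueprint rather than a simple estimate), so treat this only as a sketch of the general idea:

```python
# Depth-limited search: stop at an arbitrary depth and substitute an
# estimated value instead of always searching to the end of the game.

def search(node, depth, evaluate, maximizing=True):
    """Minimax that stops at `depth` and uses `evaluate` for leaf values."""
    if depth == 0 or not node.get("children"):
        return evaluate(node)  # leaf estimate replaces further search
    values = [search(child, depth - 1, evaluate, not maximizing)
              for child in node["children"]]
    return max(values) if maximizing else min(values)

# Tiny hand-made tree; "est" stands in for a learned value estimate.
tree = {"children": [
    {"est": 3, "children": [{"est": 10}, {"est": -2}]},
    {"est": 5, "children": [{"est": 4}, {"est": 6}]},
]}
evaluate = lambda n: n["est"]

search(tree, 1, evaluate)  # stops early, trusts the estimates
search(tree, 2, evaluate)  # searches one level deeper, may answer differently
```

Note that the two depths give different answers on this tree, which is exactly why the quality of the leaf estimates matters so much.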

Can you share more details about the abstraction? The paper is kind of vague on it. How does it decide if it should use 1 or 14 bet values? Is it a perfect recall abstraction? How many information sets are there?

We give more details on this in the supplementary material.

When do you solve bridge? :)

It is in a way disappointing that this question gets so little attention, and yet it might be the most significant. If a bot can false-card, that is, if it can discern the strategy the opponents have in mind and deliberately mislead them to its own advantage, we have a real-world AI. However, the skills of computer bridge programs remain at club level.

Interesting that the conventional wisdom of never open limping emerged as confirmed through self-play. What other general poker “best practices” were either confirmed or upended through this research?

For someone not in the AI field, can you explain why AI is needed and why elaborate code with conditional blocks is not enough? Where does AI fit into a poker game?

Conditional blocks would work in principle, but it would be an impossibly detailed and granular tree to set up. The AI component helps you arrive at the decisions that would populate that complex tree.
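To make that concrete, here's a sketch of regret matching, the building block of the counterfactual-regret family of algorithms used by poker bots, deriving a strategy from self-play on rock-paper-scissors. Nothing is hand-coded beyond the payoff matrix; this is a toy illustration, not Pluribus's actual algorithm:

```python
import random

ACTIONS = 3  # 0 = rock, 1 = paper, 2 = scissors
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]  # payoff to the row player

def strategy_from_regrets(regrets):
    """Play each action in proportion to its positive accumulated regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total == 0:
        return [1.0 / ACTIONS] * ACTIONS
    return [p / total for p in positive]

def train(iterations, rng):
    regrets = [[0.0] * ACTIONS for _ in range(2)]
    strategy_sum = [[0.0] * ACTIONS for _ in range(2)]
    for _ in range(iterations):
        strats = [strategy_from_regrets(r) for r in regrets]
        moves = [rng.choices(range(ACTIONS), weights=s)[0] for s in strats]
        for p in range(2):
            opp_move = moves[1 - p]
            for a in range(ACTIONS):
                # Regret: how much better action a would have done here.
                regrets[p][a] += PAYOFF[a][opp_move] - PAYOFF[moves[p]][opp_move]
            for a in range(ACTIONS):
                strategy_sum[p][a] += strats[p][a]
    total = sum(strategy_sum[0])
    return [s / total for s in strategy_sum[0]]  # player 0's average strategy

avg = train(20_000, random.Random(0))
# avg approaches the uniform 1/3-1/3-1/3 equilibrium of RPS, without
# anyone writing a single "if opponent played rock" branch.
```

Poker's version is vastly larger (the regrets live on information sets of a huge game tree), but the principle is the same: the strategy is learned, not enumerated.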

This is super interesting! What steps would you recommend a professional poker player take in order to use AI to improve his/her personal poker skills?

Does it beat poker by reaching a Nash equilibrium (where you can't make more profit and no one can profit from you), or does it exploit opponents' weaknesses to seek profit?

It doesn't exploit its opponents' weaknesses. Its focus was on not having any weaknesses that its opponents could exploit. However, the algorithms are not guaranteed to converge to a Nash equilibrium in this setting because it's not a two-player zero-sum game (and in either case, it's not clear that playing a Nash equilibrium would provide much benefit in this setting).

What sort of defense applications could this sort of technology be used for? The last line of the Facebook blog post sparked curiosity.

Do you expect the human players to play at the best of their ability when they're not playing for actual real money?

There was real money at stake in this experiment. The pros were guaranteed $0.40 per hand just for participating, but that could increase to $1.60 per hand depending on how well they did.

To answer your question, no, I don't think human players would play at their best when not playing for actual money.

Sorry, I meant for way less than they typically play.

Any chance you could put Libratus / Pluribus online for people like me to try to beat it?

Unfortunately we don't have any plans to do that currently.

Are all the hands posted online somewhere for analysis? I would be very interested!

How many games did the bot beat the same 5 players? And how many games were played?

We played 10,000 hands of poker in the 5 humans + 1 AI experiment. The number of hands won isn't a useful metric in poker. If you win only 10% of your hands and make $1,000 on those hands, while losing only $1 on the other 90% of hands, then you're a winning player. The bot won at a rate of 4.8 bb/100 ($4.80 per hand if the blinds are $50/$100). This is considered a large win rate by professionals.

> This is considered a large win rate by professionals.

It depends on context. 4.8bb/100 is quite good for high-level online play, but wouldn't be enough to make a living at live poker. The biggest game that runs on a regular basis in most areas is $5/10. At ~33 hands per hour, that's 1.6bb or $16 an hour.
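The conversion from a bb/100 win rate to dollars is simple arithmetic; a quick sketch checking both figures from this thread:

```python
# bb/100 means big blinds won per 100 hands.

def dollars_per_hand(bb_per_100, big_blind):
    return bb_per_100 / 100 * big_blind

def dollars_per_hour(bb_per_100, big_blind, hands_per_hour):
    return dollars_per_hand(bb_per_100, big_blind) * hands_per_hour

# The bot's 4.8 bb/100 at the experiment's $50/$100 blinds:
dollars_per_hand(4.8, 100)     # $4.80 per hand
# The same rate at a live $5/$10 game dealing ~33 hands an hour:
dollars_per_hour(4.8, 10, 33)  # ~$15.84, i.e. roughly $16/hr
```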

And I'd assume there was no rake in your game? That would take a big chunk out of the rate.

For one player...

$16/hr times a fleet of VMs or microservice threads could become astronomical profit.

That's why it's a good rate online where you can play multiple tables. Unlimited VMs don't help you in a live casino.

Any chance you’ll consider releasing the hand history of the session?

They're in the supplementary data section of the Science article. The formatting is terrible for importing into hand-history viewers, so I'm trying to get a friend to reformat them.

What was the most challenging part about implementing this?

Honestly, probably debugging. Training this thing is very cheap, but the variance in poker is huge (even with the best variance-reduction techniques) so it takes a very long time to tell whether one version is better than another version (or better than a human).

When will you test it with 10 total players in a game?

The number of players is kind of arbitrary given the techniques we're using. We chose 6 because that's the most popular/common format for poker. I don't think there's any scientific value in also doing 10.

I am obviously a human, not a bot, but in my experience playing poker, I'm much more likely to be successful in a 6-player game, whereas in a 10-player game I never seem to do well.

Any plans to make money using this in online games?

No, I don't have any plans to do that. This is really about advancing fundamental AI research.

What are the names of the poker pros the AI beat?

Are all the hands available to the public?

The hand logs from the 5 humans + 1 AI experiment are included in the supplementary material of the Science paper.

They are missing the stack sizes of the players. Would love to have logs that include that info!

will you release the source code?

Our goal is to make the research as accessible as possible to the AI community, so we include descriptions of the algorithms and pseudocode in the supplementary material. However, in part due to the potential negative impact this code could have on online poker, we're not releasing the code itself.

While you are not releasing the code to the general public, some people who worked on it obviously have access to it and someone will likely use it in the wild. The potential profits are astronomical - Rob Reitzen solved limit hold 'em and made what is rumored to be over $100 million hiring women to play online poker using his system from his house in Beverly Hills [1].

Did you guys set any rules as to whether or not members of the team that worked on this are allowed to use it?

[1] https://www.cigaraficionado.com/index.php/article/robotic-po...

Is this publicly available? How can I use it?

What's the name of the bot? Please say it's Poker McPokerface

This is literally in the second sentence of the article

>A superhuman poker-playing bot called Pluribus has beaten top human professionals at six-player no-limit Texas hold’em poker...

It was a joke.
