
The paper is here: https://science.sciencemag.org/content/early/2019/07/10/scie...

It's going for game-theory-optimal play. It doesn't adapt to its opponents' observed weaknesses. But I think it's cool to show that you don't need to adapt to opponent weaknesses to win at poker at the highest levels. You just need to not have any weaknesses yourself.

I thought the same myself. However, if the human players exploit each other's weaknesses fast enough, it could lead to a chip lead that might be hard to overcome, right? Just in theory, of course. :)

This is a great question. I wonder how this bot would do in a game with a couple of pros and a couple of reasonably skilled amateurs?

Still well, I suspect, since straightforward theoretically-correct poker will take money off the amateurs efficiently. But it seems possible that playing to wipe out weaker or less consistent players could provide enough margin to bully the more stable AI player.

This is true in tournament play. In a cash game it doesn't matter since there is no elimination, you can always rebuy.

And it's not like in the movies where if you don't have the money to call a bet, you lose. You simply are considered all in for the main pot and then sidepots that you aren't eligible to win will be created for any bets you can't cover.
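The main-pot/side-pot mechanics described above can be sketched in a few lines of Python. This is a simplified model (it only looks at how much each player committed, and doesn't track folds), but it shows why an all-in player can still win the pots they covered:

```python
def build_pots(bets):
    """Split total bets into a main pot and side pots.

    bets: dict of player -> total chips committed this hand.
    Returns a list of (pot_size, eligible_players) tuples; a
    player is only eligible for pots built from bet levels
    they could cover.
    """
    pots = []
    prev_level = 0
    for level in sorted(set(bets.values())):
        eligible = [p for p, b in bets.items() if b >= level]
        pots.append(((level - prev_level) * len(eligible), eligible))
        prev_level = level
    return pots

# B goes all in for 40 while A and C bet 100: B is eligible for
# the 120-chip main pot, but only A and C can win the side pot.
print(build_pots({"A": 100, "B": 40, "C": 100}))
```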

Yeah that is a really interesting insight. I presume that also makes optimization much simpler. The rules are fixed. Opponents are not.

> you don't need to adapt to opponent weaknesses to win at poker at the highest levels

That may be true for limit poker, but in a no-limit tournament the best this bot could do is not lose. As the pressure increases with the blinds, and players are forced to bluff and call bluffs, how does this bot avoid folding itself to death during a run of bad cards?

I could see this bot doing well at cashing but I don't see how it could consistently place 1st the way the top human players do.

Optimal play includes bluffing. It's "optimal" according to game theory.

For example, game theory may tell you that in a particular situation, you can't be exploited if you bluff 10% of the time. If the opponent bluffs less than that, you can come out ahead by more often folding when he bets. If the opponent bluffs more than 10%, you can call or reraise when he bets. But if he bluffs the optimal amount, it doesn't matter either way, you can't take advantage of him.

So this bot would bluff at 10% to avoid getting exploited, but wouldn't try to detect whether the opponent is exploitable. (The latter is risky since a crafty opponent can switch up strategies, manipulating you into playing an exploitable strategy.)
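The 10% above is illustrative; the actual indifference point falls out of the pot and bet sizes. A quick sketch (Python, made-up numbers) of the EV of calling against different bluff frequencies, assuming the caller's hand only beats a bluff:

```python
def call_ev(bluff_freq, pot, bet):
    """EV of calling a bet, assuming our hand only beats a bluff."""
    return bluff_freq * (pot + bet) - (1 - bluff_freq) * bet

def indifference_freq(pot, bet):
    """Bluff frequency at which calling and folding have equal EV."""
    return bet / (pot + 2 * bet)

# Example: a half-pot bet of 50 into a pot of 100.
f = indifference_freq(100, 50)     # 0.25
print(call_ev(0.10, 100, 50))      # bluffs too rarely: folding gains
print(call_ev(f,    100, 50))      # optimal: calling and folding tie
print(call_ev(0.40, 100, 50))      # bluffs too often: calling gains
```

At exactly the indifference frequency the caller can't do better than break even no matter how they respond, which is the "can't take advantage of him" point above.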

To add onto this, some players that truly abide by the GTO strategy will use a prop, for example a watch, to determine what play to make.

If you want your perceived range to be balanced, making play x 50% of the time and play y the other 50%, you look at the watch: if the second hand is in the first 30 seconds, you make play x; if it's in the last 30 seconds, play y.
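The watch is just a physical random number generator; a bot does the same thing in software. A minimal sketch (Python; the 50/50 split and the x/y action names are from the example above):

```python
import random

def balanced_play(rng=random.random):
    """Pick play x or y with equal probability -- the software
    version of the watch trick (first 30 seconds -> x, last 30 -> y)."""
    return "x" if rng() < 0.5 else "y"
```

Over many hands each play shows up about half the time, so an observer can't infer anything about your holding from which action you took.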

That's just an example but your point is 100% accurate.

I think this comes down to ambiguity over what "optimal play" means.

There's a poker strategy we might call 'deterministically' optimal play, which consists of precisely assessing each hand's expected value with little to no bluffing. This is already common in online cash games with both bots and players running multiple games at once. And you're right - it's excellent at running net-positive and not losing, but unlikely to win significant tournaments.

Pluribus, though, is playing something close to game-theoretically optimal poker. In playing against itself, it's attempting to develop a takes-all-comers strategy with no exploitable weaknesses. That includes bluffing and calling bluffs - the goal is simply to find a mixed-strategy equilibrium where those moves are made some percentage of the time, in proportion to their expected payoffs. This can involve doing all of the same basic operations as pro players, like valuing button raises differently than donks or attempting to bluff based on how many players remain in the hand. The distinctive limitation is simply that Pluribus plays 'locally' optimal poker with no conception of opponents' identities or behavior in prior hands.

That's a helpful explanation, thank you! I had misread the statement that Pluribus doesn't model its opponents between hands as meaning between rounds. It definitely models its opponents and detects bluffs by judging when a bluff is strategically likely, based on each opponent's actions so far in the current hand; it just doesn't carry anything it learned into the next hand.

I could see this being an effective strategy in the WSOP; given the way WSOP champions play, the ability to perfectly forget the previous hand is probably more valuable than anything else. It could come down to whether the ability to exploit a reliable tell during a pivotal hand matters more than 10% of the time.

I couldn't find it confirmed in the primary or secondary article, but I would bet the bot is just playing cash at a fixed stack depth rather than a tournament; just like in the wild, bots are much more of a problem in online cash than online tournaments. Dynamically adjusting strategies by stack depth, number of players, and pay jumps, would probably be several orders of magnitude more complex.

Smaller stack sizes reduce possibilities and thus reduce complexity. Pay jumps result in chips having different utility to each player which forces some situational playstyles to be more optimal. I would guess that this also reduces the complexity of the game.
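The "chips have different utility to each player" effect is usually formalized with the Independent Chip Model (ICM). A small recursive sketch of the standard model (Python; this is generic tournament math, nothing specific to Pluribus):

```python
def icm_equity(stacks, payouts, i):
    """Prize-money equity of player i under the ICM.

    Assumes P(player j finishes 1st) = stacks[j] / total chips,
    then recurses on the remaining players for the remaining
    payouts. Exponential in player count -- fine for a sketch.
    """
    total = sum(stacks)
    if not payouts or total == 0:
        return 0.0
    eq = stacks[i] / total * payouts[0]
    for j in range(len(stacks)):
        if j != i:
            rest = stacks[:j] + stacks[j + 1:]
            eq += (stacks[j] / total) * icm_equity(
                rest, payouts[1:], i if i < j else i - 1)
    return eq
```

With payouts of 50/30/20 and a 600/200/200 chip split, the big stack holds 60% of the chips but well under 60% of the prize pool, which is exactly why near pay jumps each additional chip is worth less and playstyles shift.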

Since tournaments don't often spend much time with stacks much deeper than 100bb, I would guess that tournaments would be more easily solved. Though tournaments are much more frequently run with 9-10 players rather than 6 at a table.


You're right that a single short-stack hand in a vacuum has fewer game tree branches, and that factoring in chip utility is also fairly straightforward. But I strongly disagree that it reduces the overall complexity of the game. The model in the article played every single hand with 100bb; to be an effective tournament player it would have to be able to fluidly adjust strategies between big, medium and short stack play, as well as reasoning about the stack sizes of other players at the table. It's basically 4 different games at >100bb, 50-100bb, 25-50bb, and <25bb, so it would have to develop optimal strategies for each. And even if the shallower-stacked games are generally simpler in isolation, there's a meta strategy of knowing which one to apply in a given hand with heterogeneous stack sizes. To paraphrase Doug Polk: "If cash game play is a science, tournaments are more of an art."
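One simple way to glue those regimes together (purely a sketch using the depth cutoffs quoted above; nothing here reflects how Pluribus or any real tournament bot actually works) is a dispatcher keyed on effective stack depth:

```python
def strategy_bucket(effective_stack_bb):
    """Pick which of the four sub-game strategies applies, using
    the >100 / 50-100 / 25-50 / <25 bb split from the comment."""
    if effective_stack_bb > 100:
        return "deep"
    if effective_stack_bb > 50:
        return "medium"
    if effective_stack_bb > 25:
        return "shallow"
    return "short"

# The effective stack is the smallest stack still in the hand,
# which is where "reasoning about other players' stacks" comes in:
bucket = strategy_bucket(min(180, 22))   # deep player vs short stack
```

Even this toy version shows the meta-strategy problem: a 180bb stack facing a 22bb shove is playing the "short" game whether it wants to or not.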

The bot could likely just be trained on the 4 or so different games. You’re likely increasing the complexity by a constant factor, nothing exponential here.

> There were two formats for the experiment: five humans playing with one AI at the table, and one human playing with five copies of the AI at the table. In each case, there were six players at the table with 10,000 chips at the start of each hand. The small blind was 50 chips, and the big blind was 100 chips.

It's in the FB article linked above.

Ah thanks. As I suspected, cash game with fixed 100bb stacks.

Isn’t this survivorship bias? Do you know beforehand which player will repeatedly place 1st? Given how popular poker is, there are bound to be quite a few people who keep finishing first.

Or to turn this around: given enough bots, some bots will place 1st a lot more than others. It’s just unclear which one.

The game actually becomes simpler when you have fewer blinds, to the point where, with 15 big blinds or fewer, you just follow a chart and go all in or fold preflop.
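A push/fold chart really is just a lookup table. A toy sketch (Python; the hands and thresholds below are made up for illustration — real charts, like the Nash push/fold tables, also depend on position and number of players):

```python
# Max stack depth (in big blinds) at which shoving this hand is
# still profitable, per a hypothetical chart. 0 = never shove.
SHOVE_MAX_BB = {"AA": 15, "AKs": 15, "A9o": 12, "KTs": 8, "72o": 0}

def preflop_action(hand, stack_bb):
    """Chart-based short-stack play: all in or fold, nothing else."""
    if stack_bb > 15:
        raise ValueError("chart only applies at 15bb or less")
    return "all-in" if stack_bb <= SHOVE_MAX_BB.get(hand, 0) else "fold"
```

The whole strategy space collapses to two actions, which is why this regime is considered essentially solved.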

The blinds don’t increase; it’s a cash game, not a tournament.
