Is the bot going for game-theory-optimal play, or trying to exploit weaknesses in other players?
It's going for game-theory-optimal play. It doesn't adapt to its opponents' observed weaknesses. But I think it's cool to show that you don't need to adapt to opponent weaknesses to win at poker at the highest levels. You just need to not have any weaknesses yourself.
Still well, I suspect, since straightforward theoretically-correct poker will take money off the amateurs efficiently. But it seems possible that playing to wipe out weaker or less consistent players could provide enough margin to bully the more stable AI player.
And it's not like in the movies where if you don't have the money to call a bet, you lose. You are simply considered all in for the main pot, and side pots that you aren't eligible to win are created for any bets you can't cover.
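To make the side-pot mechanics concrete, here is a minimal sketch of how the pots could be split. The function name build_pots and the input format are made up for illustration, and it assumes nobody has folded, which a real implementation would have to handle.

```python
def build_pots(contributions):
    """Split the chips bet in one hand into a main pot and side pots.

    contributions: dict mapping player name -> total chips put in this hand.
    Simplifying assumption: no player has folded.
    Returns a list of (pot_size, eligible_players), main pot first.
    """
    pots = []
    remaining = dict(contributions)
    while any(chips > 0 for chips in remaining.values()):
        # Players who still have uncounted chips in the middle.
        active = [p for p, chips in remaining.items() if chips > 0]
        # The shortest remaining contribution caps this pot's per-player share.
        level = min(remaining[p] for p in active)
        pots.append((level * len(active), sorted(active)))
        for p in active:
            remaining[p] -= level
    return pots


# A is all in for 50; B and C each put in 100.
print(build_pots({"A": 50, "B": 100, "C": 100}))
```

With those numbers, A can only win the 150-chip main pot, while the extra 100 goes into a side pot contested by B and C. A pot that ends up with a single eligible player is just an uncalled bet that goes back to that player.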
That may be true for limit poker, but in a no-limit tournament the best this bot could do is not lose. As the pressure increases with the blinds and players are forced to bluff and call bluffs, how does this bot avoid folding itself to death during a run of bad cards?
I could see this bot doing well at cashing but I don't see how it could consistently place 1st the way the top human players do.
For example, game theory may tell you that in a particular situation, you can't be exploited if you bluff 10% of the time. If the opponent bluffs less than that, you can come out ahead by folding more often when he bets. If the opponent bluffs more than 10%, you can call or reraise when he bets. But if he bluffs the optimal amount, it doesn't matter either way; you can't take advantage of him.
So this bot would bluff at 10% to avoid getting exploited, but wouldn't try to detect whether the opponent is exploitable. (The latter is risky since a crafty opponent can switch up strategies, manipulating you into playing an exploitable strategy.)
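Here is a rough worked version of that 10% example, under assumptions that are mine and not from the comment above: a single river bet, the caller's hand beats bluffs and loses to value bets, and "bluffs 10% of the time" means 10% of the bets are bluffs. In this toy model a bet of 1 into a pot of 8 happens to make 10% the break-even bluffing frequency.

```python
from fractions import Fraction


def call_ev(bluff_freq, pot, bet):
    """Caller's expected value of calling, relative to folding (which is 0).

    Calling wins pot + bet when the bet was a bluff and loses bet otherwise.
    """
    return bluff_freq * (pot + bet) - (1 - bluff_freq) * bet


def indifference_bluff_freq(pot, bet):
    """Bluff frequency at which calling and folding have the same EV.

    Solves q * (pot + bet) = (1 - q) * bet for q. pot and bet are integers.
    """
    return Fraction(bet, pot + 2 * bet)


def best_response(bluff_freq, pot, bet):
    """What a caller who knows the bettor's bluff frequency should do."""
    ev = call_ev(bluff_freq, pot, bet)
    if ev > 0:
        return "call"
    if ev < 0:
        return "fold"
    return "indifferent"


pot, bet = 8, 1
print(indifference_bluff_freq(pot, bet))          # break-even bluff frequency
print(best_response(Fraction(1, 20), pot, bet))   # opponent under-bluffs
print(best_response(Fraction(1, 5), pot, bet))    # opponent over-bluffs
print(best_response(Fraction(1, 10), pot, bet))   # opponent bluffs "optimally"
```

The exploitative adjustments (fold against the under-bluffer, call against the over-bluffer) only exist when the bettor strays from the break-even frequency, which is the point being made above.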
If you want your perceived range to be balanced and make x play 50% of the time and y play the other 50%, you look at the watch and if the second hand is in the first 30 seconds, you make x play, 30-60 seconds, y play.
That's just an example but your point is 100% accurate.
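The watch trick is easy to write down. This is only an illustrative sketch: pick_play and its x_percent parameter are names I made up, and the clock reading is passed in as an argument so the mapping is easy to check.

```python
from datetime import datetime


def pick_play(second, x_percent=50):
    """Map the position of the second hand (0-59) to play "x" or play "y".

    x_percent is the share of the minute assigned to play "x"; the default
    of 50 gives the 0-29 / 30-59 split described above.
    """
    if not 0 <= second < 60:
        raise ValueError("second must be in the range 0..59")
    # Integer arithmetic avoids floating-point edge cases at the boundary.
    return "x" if second * 100 < 60 * x_percent else "y"


# At the table you would glance at the watch; here the system clock stands in.
print(pick_play(datetime.now().second))
```

Changing x_percent gives other mixes, for example pick_play(second, 10) picks x only during the first 6 seconds of each minute. It only works as a randomizer if the moment you look at the watch is unrelated to the hand you hold.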
There's a poker strategy we might call 'deterministically' optimal play, which consists of precisely assessing each hand's expected value with little to no bluffing. This is already common in online cash games with both bots and players running multiple games at once. And you're right - it's excellent at running net-positive and not losing, but unlikely to win significant tournaments.
Pluribus, though, is playing something close to game-theoretically optimal poker. In playing against itself, it's attempting to develop a takes-all-comers strategy with no exploitable weaknesses. That includes bluffing and calling bluffs - the goal is simply to find a mixed-strategy equilibrium where those moves are made some percentage of the time, in proportion to their expected payoffs. This can involve doing all of the same basic operations as pro players, like valuing button raises differently than donks or attempting to bluff based on how many players remain in the hand. The distinctive limitation is simply that Pluribus plays 'locally' optimal poker with no conception of opponents' identities or behavior in prior hands.
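For a feel of how self-play can arrive at a mixed strategy with no exploitable weakness, here is a toy sketch. It is not Pluribus's actual algorithm; it is plain regret matching on rock-paper-scissors, chosen because the equilibrium (one third each) is easy to check. All names are made up for illustration.

```python
# Payoff to a player choosing action a against action b
# (0 = rock, 1 = paper, 2 = scissors). The game is symmetric,
# so both players can read their payoff from the same table.
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]


def strategy_from_regrets(regrets):
    """Regret matching: play actions in proportion to their positive regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total == 0:
        # Nothing is regretted yet: start from a deliberately bad pure strategy.
        return [1.0, 0.0, 0.0]
    return [p / total for p in positive]


def self_play(iterations):
    """Run two regret-matching players against each other.

    Returns each player's average strategy over all iterations.
    """
    regrets = [[0.0] * 3, [0.0] * 3]
    strategy_sums = [[0.0] * 3, [0.0] * 3]
    for _ in range(iterations):
        strategies = [strategy_from_regrets(r) for r in regrets]
        for player in (0, 1):
            opponent = strategies[1 - player]
            # Expected payoff of each pure action against the opponent's mix.
            values = [sum(PAYOFF[a][b] * opponent[b] for b in range(3))
                      for a in range(3)]
            expected = sum(strategies[player][a] * values[a] for a in range(3))
            for a in range(3):
                regrets[player][a] += values[a] - expected
                strategy_sums[player][a] += strategies[player][a]
    return [[s / iterations for s in sums] for sums in strategy_sums]


def exploitability(strategy):
    """Best payoff any pure action earns against strategy (0 at equilibrium)."""
    return max(sum(PAYOFF[a][b] * strategy[b] for b in range(3))
               for a in range(3))


average = self_play(10000)
print(average[0], exploitability(average[0]))
```

Both players start out always throwing rock, yet the average strategy drifts toward an even mix that no fixed counter-strategy beats by much. Nothing in the loop looks at who the opponent is, only at payoffs, which mirrors the 'no conception of opponents' identities' limitation described above.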
I could see this being an effective strategy in a WSOP. That ability to perfectly forget the previous hand is probably more valuable than anything else, given the way WSOP champions play. It may come down to whether the ability to exploit a reliable tell during a pivotal hand matters more than 10% of the time.
Since tournaments don't often spend much time with stacks much deeper than 100bb, I would guess that tournaments would be more easily solved. Though tournaments are much more frequently run with 9-10 players rather than 6 at a table.
In the fb article linked above.
Or to turn this around: given enough bots, some bots will place 1st a lot more than others. It’s just unclear which one.