In the case of poker, it appears that adaptability is not as good as pure mathematical optimization. Humans can adapt their strategy, but they still lose regardless, because this thing has cracked the code.
I’m surprised that you managed to beat pros without adaptability. It’s pretty impressive and says a lot about how we define strategy. If human adaptability is just not as good as machine optimality across all games, we could imagine discovering that an adaptable poker AI can’t outperform this one. It raises a whole lot of interesting questions, because much of the criticism aimed at something like the Starcraft AI is that it is strategically stupid and doesn’t adapt. Admittedly, the Starcraft AI is kind of stupid right now, but we may hit a wall on its creativity simply because creativity, despite human intuition, is a dumb idea.
Because poker is not subgame-solvable (it is not possible to self-locate within the game tree), this bot's play has to get into its opponents' mindspace, in a sense. To not be exploitable, it essentially has to infer the other players' hidden state and paths through the tree from observed actions. This isn't something I've seen in Dota, Starcraft, chess, or Go bots.
It's true that it doesn't learn online to find exploitable patterns in other players, but doing that without also making yourself exploitable in turn is a very difficult problem in its own right. Low-exploitability, near-optimal play in the game-theoretic sense is itself considered strategy.
While you're correct that online learning is powerful and something machines are not currently good at (in complex spaces), you can avoid being exploited without learning if your experience is rich enough and you know how to infer what your opponent is trying to do and anticipate them. I'd argue this lineage of poker bots is the closest, among the major game-playing bots, to playing that way.
Is this true in any meaningful sense?
For heavily studied games there's usually a theoretically optimal play independent of the opponent's interior state. This is obviously true for all the "solved" games, which include the simpler Heads-Up Limit Hold 'Em poker (solved by Alberta's Cepheus project), but it seems pretty clearly true for as-yet unsolved games like Go and chess too.
I'm very impressed by this achievement because I had expected good multi-player poker AI (as opposed to simple colluding bots found online making money today) to be some years away. But I would not expect "adaptability" to ever be a sensible way forward for winning a single strategy game.
That said, for this bot, I wouldn't say it's playing completely independently of the other players' interior state. Pluribus must infer its opponents' strategy profiles, and according to the paper it maintains a distribution over possible hole cards and updates its beliefs according to observed actions. This is part of playing in a minimally exploitable way in such a large space for an imperfect-information game.
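The belief-updating idea can be sketched with a toy Bayes update over hole cards. This is not Pluribus's actual algorithm; the hand-strength heuristic and the opponent model (stronger hands raise more often) are invented for illustration.

```python
from itertools import combinations

RANKS = "23456789TJQKA"
SUITS = "cdhs"
DECK = [r + s for r in RANKS for s in SUITS]

def hand_strength(hole):
    """Crude strength proxy: sum of rank indices, bonus for pairs."""
    a, b = hole
    s = RANKS.index(a[0]) + RANKS.index(b[0])
    if a[0] == b[0]:
        s += 12  # pairs play much stronger preflop
    return s

def p_raise_given_hand(hole):
    """Invented opponent model: stronger hands raise more often."""
    return min(1.0, hand_strength(hole) / 30.0)

def update_beliefs(beliefs, observed_raise):
    """Bayes rule: P(hand | action) ∝ P(action | hand) * P(hand)."""
    posterior = {}
    for hole, prior in beliefs.items():
        like = p_raise_given_hand(hole)
        posterior[hole] = prior * (like if observed_raise else 1 - like)
    total = sum(posterior.values())
    return {h: p / total for h, p in posterior.items()}

# Uniform prior over all 1326 two-card combinations.
hands = list(combinations(DECK, 2))
beliefs = {h: 1 / len(hands) for h in hands}

# After observing a raise, mass shifts toward strong hands.
beliefs = update_beliefs(beliefs, observed_raise=True)
```

After one observed raise, the posterior on a pair of aces is far larger than on a weak offsuit hand, which is the qualitative behavior described in the paper.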
This is what interests me. It doesn’t do this. In fact, because it trained only against itself, it should be assumed that the only strategy profile it considers is its own.
In an n-player game, a table can be in a (perhaps unstable) equilibrium at which the "optimal" strategy will lose. This has been demonstrated for something as simple as the iterated prisoners' dilemma (tit-for-tat is "best" against most populations, but there are populations to which a tit-for-tat player will lose). I don't play poker, but I've definitely experienced this in (riichi) mahjong: if you play for high-value hands the way you would in a pro league, on a table where the other three players are going for the fastest hands possible, you will likely lose.
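The tit-for-tat claim is easy to check with a tiny simulation: in a population of pure defectors, tit-for-tat gets suckered once per match and so scores below what the defectors earn against each other. Standard payoffs (T=5, R=3, P=1, S=0); the 100-round match length is arbitrary.

```python
# Iterated prisoner's dilemma: (my_move, their_move) -> (my_pay, their_pay)
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def tit_for_tat(opp_history):
    return opp_history[-1] if opp_history else "C"  # copy their last move

def always_defect(opp_history):
    return "D"

def match(p1, p2, rounds=100):
    h1, h2, s1, s2 = [], [], 0, 0
    for _ in range(rounds):
        m1, m2 = p1(h2), p2(h1)   # each player sees the opponent's history
        a, b = PAYOFF[(m1, m2)]
        s1, s2 = s1 + a, s2 + b
        h1.append(m1); h2.append(m2)
    return s1, s2

tft_score, _ = match(tit_for_tat, always_defect)   # TFT vs a defector: 99
dd_score, _ = match(always_defect, always_defect)  # defector vs defector: 100
```

Tit-for-tat loses exactly the first round (cooperates into a defection) and matches the defectors thereafter, so it finishes strictly below the table's "natives."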
I would think that if professional players are utilising this information, a bot could benefit from it too. I don't see how it would ever lose out from this information, even if it only uses situations where the opponent has a history of responding a certain way 100% of the time.
I am impressed by the bot, but I have to laugh a bit, because years ago I joked with a friend about making an "amnesiac bot" that had no recollection of previous hands. It seemed so useless that we obviously didn't make it; we've evidently been proven wrong. (Pointless tangent there.)
The theoretically optimal play just skips that meta and meta-meta play and performs optimally anyway. Because poker involves chance, the optimal play will be stochastic, so you can stare at the noise and think you see a pattern; that just means you'll play worse against it, because you're trying to beat a ghost.
For example, suppose in a certain situation optimally I should raise $50 10% of the time. It so happens, by chance, that I do so twice in a row, and you, the note-taker, record that I "always" raise $50 here. Bzzt, 90% of the time your note will be wrong next time.
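The arithmetic behind that example can be simulated: with a 10% raise frequency, two raises in a row happen about 1% of the time purely by chance, yet the "always raises" note is still wrong on 90% of future hands. The trial count is arbitrary.

```python
import random

random.seed(0)
P_RAISE = 0.1
TRIALS = 100_000

# Count how often two consecutive independent hands both come up "raise".
double_raise = sum(
    random.random() < P_RAISE and random.random() < P_RAISE
    for _ in range(TRIALS)
)
rate = double_raise / TRIALS          # ≈ 0.1 * 0.1 = 0.01
next_hand_error = 1 - P_RAISE          # the "always raises" note fails 90% of the time
```

So the note-taker's "pattern" shows up in roughly 1 table in 100 through pure chance, while the prediction it generates is wrong nine hands out of ten.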
Now say I have viewed thousands of hands against you, and you raise pre-flop 50% of the time. That is pretty significant information about the types of hands you play. If I have observed only 10 hands, that same stat means nothing.
The theoretically optimal play depends on who you're playing, as more value can be extracted in certain situations against certain players.
For example, say I've seen you face a pre-flop 3-bet 1000 times and you've folded 99% of the time. That would be a good opportunity to recognise that 3-bet bluffing this player more often would have value, and would be a more optimal play than some default. Contrast that with someone who calls pre-flop 3-bets 75% of the time; it wouldn't be optimal to 3-bet bluff there. Different opponents, different optimal plays.
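A back-of-envelope EV calculation shows why the fold frequency flips the decision. The pot and bet sizes here are invented, and play after a call is crudely simplified to losing the whole bet.

```python
def bluff_ev(fold_freq, pot=150, risk=300):
    """EV of a 3-bet bluff risking `risk` to win `pot`.

    Simplification: when called, we assume we lose the entire bet.
    The bluff is profitable when fold_freq > risk / (risk + pot).
    """
    return fold_freq * pot + (1 - fold_freq) * (-risk)

vs_folder = bluff_ev(0.99)  # the 99% folder: clearly +EV
vs_caller = bluff_ev(0.25)  # the 75% caller: clearly -EV
```

Against the 99% folder the bluff prints money; against the 75% caller the same bet is a large loser, which is exactly the "different opponents, different optimal plays" point.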
There are broadly two ways to make money at poker:

1. Coming up with an unexploitable strategy, then scaling it up by playing as many hands as you can, earning the slim expected value each time.
2. Picking a good table / card room / 'scene', and then trying to extract as much value from it as possible.
You most often see 1 online, and 2 live, for obvious reasons.
A skilled human would, I believe, be a lot more successful than a bot in case 2. For 2, the important skills are:
1. Be entertaining. You have to play in a way that is entertaining to those playing with you, such that they want to continue playing with you (and losing money to you). Good opponents (i.e. players who are bad at poker but want to play high stakes) are hard to find; it is vital that you retain them.
2. Cultivate a table image, then exploit it. Especially important for tournament play, where you have the concept of "key hands" that you really need to win to potentially win the tournament. With the right table image, you may be able to win hands you otherwise wouldn't have won.
3. Exploit the specifics of the players you are playing against. Yes, that also makes you exploitable, but the idea is to stay one step ahead of your opponents.
Furthermore, you can kind of account for such players by including more random or aggressive profiles in the inference/search stage.
I don't think you play very much, which is fine, but makes this discussion a bit pointless.
Adaptability is beaten by perfect strategic play in games with clear victory conditions.
My familiarity with optimal control theory is nil, but Kydland and Prescott (1977) applied it to monetary policy to show that the right rules dominate discretion. What the right rules are for monetary policy is still an open question, though, because while the victory conditions in economic policy are clearly defined, the surrounding environment is very far from static, so you deal with out-of-training-set data regularly. Once AI can deal with these kinds of out-of-context problems, it seems plausible that general AI is a matter of time.
> Rules Rather than Discretion: The Inconsistency of Optimal Plans
> Even if there is an agreed-upon, fixed social objective function and policymakers know the timing and magnitude of the effects of their actions, discretionary policy, namely, the selection of that decision which is best, given the current situation and a correct evaluation of the end-of-period position, does not result in the social objective function being maximized. The reason for this apparent paradox is that economic planning is not a game against nature but, rather, a game against rational economic agents. We conclude that there is no way control theory can be made applicable to economic planning when expectations are rational.