

Computers Conquer Heads-Up Limit Hold'em - bcn
http://spectrum.ieee.org/tech-talk/computing/software/computers-conquer-texas-holdem-poker-for-first-time

======
Dn_Ab
This uses a particular form of a fundamentally simple yet surprisingly
powerful class of learning algorithms called regret minimization. CFR is
interesting in and of itself as it specializes regret minimization to play
extensive form games. There are also CFR algorithms to play multiplayer and
no-limit games and though the guarantees of optimality are no longer there,
the players are still strong (but for now, far away from experts).

The article states that this algorithm is weak to bad players but that's more
an artifact of resources and training method; one advantage of minimizing
regret on games instead of using linear programming is that online learning
versions can adapt to exploit poor play with payoff larger than the game's
value.

I've also posted here before that RM solves two-player zero-sum games more
efficiently than linear programming, and how it's related to boosting,
portfolio optimization and as an abstraction of natural selection [1].

[1] [http://www.pnas.org/content/111/29/10620.full](http://www.pnas.org/content/111/29/10620.full)
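
To make the idea concrete, here is a minimal sketch of regret matching, one of the simplest regret-minimization rules, run in self-play on rock-paper-scissors. It is a toy under stated assumptions, not CFR and not anything from the Cepheus code: the game, the starting regrets and the function names (`regret_matching`, `solve`, `exploitability`) are all chosen for illustration.

```python
# Regret matching in self-play on rock-paper-scissors (illustrative sketch).
PAYOFF = [  # row player's payoff; the column player receives the negative
    [0, -1, 1],   # rock vs rock, paper, scissors
    [1, 0, -1],   # paper
    [-1, 1, 0],   # scissors
]
N = 3


def regret_matching(regrets):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / len(regrets)] * len(regrets)


def solve(iterations=20000):
    # Asymmetric starting regrets (an assumption) so play does not start at equilibrium.
    regrets = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    strategy_sums = [[0.0] * N, [0.0] * N]
    for _ in range(iterations):
        row = regret_matching(regrets[0])
        col = regret_matching(regrets[1])
        # Expected payoff of each pure action against the opponent's current mix.
        row_values = [sum(PAYOFF[a][b] * col[b] for b in range(N)) for a in range(N)]
        col_values = [sum(-PAYOFF[a][b] * row[a] for a in range(N)) for b in range(N)]
        row_ev = sum(row[a] * row_values[a] for a in range(N))
        col_ev = sum(col[b] * col_values[b] for b in range(N))
        for i in range(N):
            regrets[0][i] += row_values[i] - row_ev
            regrets[1][i] += col_values[i] - col_ev
            strategy_sums[0][i] += row[i]
            strategy_sums[1][i] += col[i]
    # The average strategy, not the last one, is what approaches equilibrium.
    return [[s / iterations for s in sums] for sums in strategy_sums]


def exploitability(row_avg, col_avg):
    """Total amount best-responding opponents would win against both averages."""
    best_vs_col = max(sum(PAYOFF[a][b] * col_avg[b] for b in range(N)) for a in range(N))
    best_vs_row = max(sum(-PAYOFF[a][b] * row_avg[a] for a in range(N)) for b in range(N))
    return best_vs_col + best_vs_row
```

Calling `solve()` should return two average strategies close to (1/3, 1/3, 1/3), and `exploitability` shrinks toward zero as the iteration count grows.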

------
jcr
> _" The algorithm, named CFR+ by its creators, uses an improved version of a
> technique called counterfactual regret minimization (CFR). Past CFR
> algorithms have tried to solve poker by using several steps at each decision
> point: coming up with counterfactual values representing different game
> outcomes; applying a regret minimization approach to figure out the strategy
> leading to the best outcome; and averaging the latest strategy with all past
> strategies."_

Seems like a useful algorithm for dating sites.

(only half-joking)

~~~
javajosh
You're not the only one to think "relationships" when the phrase "regret
minimization" came up!

------
wglb
It would appear that this is some of the team at Alberta that "solved"
checkers:
[http://en.wikipedia.org/wiki/Chinook_(draughts_player)](http://en.wikipedia.org/wiki/Chinook_\(draughts_player\))

The book [http://www.amazon.com/One-Jump-Ahead-Jonathan-Schaeffer-
eboo...](http://www.amazon.com/One-Jump-Ahead-Jonathan-Schaeffer-
ebook/dp/B002C1AN3Y/ref=sr_1_3?ie=UTF8&qid=1420764059&sr=8-3&keywords=one+jump+ahead)
One Jump Ahead details the checkers effort.

I highly recommend this book.

The "solved" aspect resembles enumerating all possible outcomes.

------
ssharp
Abysmal headline. The bot conquered heads-up, limit hold 'em, a game with next
to no popularity. Conquering heads-up may be a good first step, though I'd
reason that making a bot that can beat a full limit table is exponentially
harder than making a bot that can beat heads-up limit.

Full table (9 or 10 person) limit hold 'em has been mostly dead in casinos for
well over a decade and it only had a shelf life of a few years online, even
during the boom.

There are better edges (for the sharks) and more excitement (for the fish) in
no-limit and pot-limit games. I don't see a bot being close to being able to
conquer heads-up no-limit, let alone a full table of no-limit. I suspect most
heads-up limit happens in private games or the end of a limit poker
tournament. It's played in scenarios that will not really be that exploitable
should a person grok this bot's abilities.

The article also mentions that its opponent was another bot that was also
playing a very strong strategy. That suggests it's not really able to adapt to
individual play, which is essential to being profitable at all but the
lowest levels of poker.

~~~
frinxor
Limit hold'em is popular, and can be found in almost every card room as well
as online.

HU limit is probably only online, and there's a fair number of bots prevalent
in the games already today.

> The article also mentions that its opponent was another bot that was also
> playing a very strong strategy. That suggests it's not really able to adapt
> to individual play, which is essential to being profitable at all but the
> lowest levels of poker.

The computer bots attempt to play Game Theory Optimal (GTO) - meaning they play an
(optimal) strategy that does not lose overall, regardless of stake/opponent.
Against another GTO player, such a bot will break even - otherwise it will be
profitable. That is the holy grail of poker (and any game). If you are not
playing GTO, there are leaks/mistakes in your game that can be exploited.

Check out
[https://www.youtube.com/watch?v=VHcrsMPQtgo](https://www.youtube.com/watch?v=VHcrsMPQtgo)
, a poker theory course that talks about solving a simplified poker limit game
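
The "does not lose overall" guarantee can be shown on a toy game. The sketch below uses rock-paper-scissors rather than poker, and the names (`worst_case_value`, the two example strategies) are made up for illustration. One caveat: in this toy game the equilibrium strategy only breaks even against everyone, whereas in poker some mistakes lose money even against an equilibrium player.

```python
# Worst-case value of a mixed strategy in rock-paper-scissors (toy example).
PAYOFF = [  # payoff to the strategy being evaluated: rows = its action
    [0, -1, 1],   # rock vs rock, paper, scissors
    [1, 0, -1],   # paper
    [-1, 1, 0],   # scissors
]


def worst_case_value(strategy):
    """Expected payoff when the opponent picks the best counter-action."""
    return min(
        sum(strategy[a] * PAYOFF[a][b] for a in range(3))
        for b in range(3)
    )


equilibrium = [1 / 3, 1 / 3, 1 / 3]   # cannot be exploited
rock_heavy = [0.5, 0.25, 0.25]        # a "leak": too much rock

print(worst_case_value(equilibrium))  # about 0: never loses on average
print(worst_case_value(rock_heavy))   # negative: an opponent playing paper profits
```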

~~~
cxseven
The concept of Nash equilibrium doesn't apply to a poker player facing two or
more human opponents at the table, since they are likely not playing optimally
or even non-cooperatively, e.g. if there is a communication channel such as
facial expressions (however poker-faced) that a robot player is not privy to.

In the case of a poker tournament where there are at least a few non-optimal
players, it's advantageous to exploit and take chips from them before other
players do. A player who plays as though all of his opponents are playing
optimally probably plays too conservatively in that early stage and no doubt
faces a significant (non-optimal) disadvantage. Your comments still apply to
heads-up (two person) poker, though.

------
ikeboy
> After all, Cepheus honed its strategy by playing the equivalent of a
> near-perfect opponent that made practically no mistakes.

So why don't you f king use _that_ instead of making your own program?

Also, how do they know that play is optimal if poker hadn't been solved
before? They're comparing it to perfect play, but if we can compute that, then
the problem is done anyway.

~~~
olegbl
It's possible they simply created another AI and gave it full information
(including the opponent's hand) - essentially making a "perfect" player (who
cheats). Not a useful program, but a good play-mate.

~~~
ikeboy
Optimal play against a player who knows your hand is not optimal against one
who doesn't; for example you should never bluff in this case.

Your suggestion makes no sense, sorry.

------
bluecalm
It would be nice to have more details on that CFR+ approach. If you just do
what the article claims (substitute averages with regret from recent
iteration) you will end up with oscillating solutions which never reach the
equilibrium (I just tried it with a naive CFR implementation). There is surely
more to that and it would be nice to see what.

As to the claim of "conquering": while there is no reason not to believe them,
let's see how they fare vs other near optimal AIs. There is a lot of scope for
numerical mistakes when you are dealing with solutions that big and it may
well be that they missed something along the way. Some other teams claimed
they solved HU holdem some time ago (and without supercomputers). They compete
in the yearly Alberta championship, so it will be easy to see how it
goes for Cepheus.

~~~
jcr
There are more details on CFR+ in the Science Magazine article [1] and in the
paper [2]. These were posted by hn user 'benktbyte' in another story.

[1]
[http://www.sciencemag.org/content/347/6218/145](http://www.sciencemag.org/content/347/6218/145)

[2] [https://pdf.yt/d/qv-O9AwQuV1Kjb04](https://pdf.yt/d/qv-O9AwQuV1Kjb04)

~~~
bluecalm
Thank you very much. I see that they've changed how regret matching works and
in that context it makes sense to only remember values from the last iteration (or
the average of the last n iterations). Very interesting idea. I am still unable to
make it work faster than naive CFR but I am probably missing some details.
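
A hedged sketch of the difference, assuming the change in question is the "regret matching+" rule, in which cumulative regrets are floored at zero after every update. It shows only that update rule on made-up numbers; CFR+ as published also changes other details (such as how strategies are averaged), which are not modeled here, and the function names are illustrative.

```python
# Plain regret matching vs. the "plus" variant: only the update rule differs.

def update_plain(cumulative, instantaneous):
    """Plain rule: cumulative regret may become arbitrarily negative."""
    return [c + r for c, r in zip(cumulative, instantaneous)]


def update_plus(cumulative, instantaneous):
    """Plus rule: cumulative regret is clipped at zero after each update."""
    return [max(c + r, 0.0) for c, r in zip(cumulative, instantaneous)]


def strategy_from(regrets):
    """Mix actions in proportion to positive regret (uniform if none)."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / len(regrets)] * len(regrets)


# Action 0 looks bad for three iterations, then starts looking good.
history = [[-1.0, 1.0]] * 3 + [[1.0, -1.0]]

plain = [0.0, 0.0]
plus = [0.0, 0.0]
for step in history:
    plain = update_plain(plain, step)
    plus = update_plus(plus, step)

# Plain still carries the old negative regret, so action 0 is not played yet.
# Plus forgot the negative history and already gives action 0 some weight.
print(plain, strategy_from(plain))
print(plus, strategy_from(plus))
```

With the plain rule an action has to "pay back" its negative history before it is played again; with the clipped rule it can re-enter the strategy as soon as it looks good.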

------
mmanfrin
'Conquers' in the sense it always plays +EV hands. This is exactly what
'grinders' do -- you just play the hands that have a long-term positive
expected value. You play that way long enough, your risk goes to zero and you
normalize out at a regular return.

~~~
gweinberg
I don't think that's correct in heads up. I think optimal play includes
"bluffing" a fair portion of the time.

~~~
thret
That's one way to put it. Often nobody has anything that would be playable in
a full ring game. The button is paramount, almost like having the serve in
tennis.

Also, HU LHE is (much) more complicated than full ring. You have less
knowledge about your opponent's range, not more. You have to play more hands, a
wider range of hands, and you frequently need to make marginal decisions.

------
zirkonit
Misleading headline.

This is a complete -solution- for a limited poker version. Really strong, but
stochastic, Hold'em algorithms have existed for a long time.

~~~
logicalmind
Specifically, this is for heads-up (two person) limit texas hold'em.

------
matilde
Best article so far about this: [http://www.parttimepoker.com/heads-up-limit-
holdem-has-been-...](http://www.parttimepoker.com/heads-up-limit-holdem-has-
been-solved)

------
Pinn2
Limited still means having a choice in what to bet. Trivial EV calculations
can only tell you to bet or not, so I assume it's using something more
involved.

~~~
jib
Limit means fixed size betting. Typically 1 big blind preflop and flop, 2 big
blinds on turn and river, with a max of 4 bets total on any street.
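
As a rough sketch, that structure can be written down as data. The names here (`BET_SIZE_BB`, `legal_actions`, `max_commitment_bb`) are hypothetical, and the blinds are ignored to keep it short:

```python
# Fixed-limit betting structure as described above (blinds ignored).
BET_SIZE_BB = {"preflop": 1, "flop": 1, "turn": 2, "river": 2}  # in big blinds
MAX_BETS_PER_STREET = 4  # e.g. a bet plus three raises


def legal_actions(bets_so_far):
    """Actions open to the player to act, given bets already made this street."""
    if bets_so_far == 0:
        return ["check", "bet"]
    if bets_so_far < MAX_BETS_PER_STREET:
        return ["fold", "call", "raise"]
    return ["fold", "call"]  # betting is capped


def max_commitment_bb():
    """Most one player can put in over a whole hand, in big blinds."""
    return sum(size * MAX_BETS_PER_STREET for size in BET_SIZE_BB.values())


print(legal_actions(0))
print(legal_actions(4))
print(max_commitment_bb())
```

There are never more than three options at any decision, which is part of why the limit game tree is so much smaller than no-limit.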

The algorithm sounds like it basically identifies the path with the least
negative EV for any given action, which is essentially how humans play limit
too. "If my opponent plays perfectly, what path will let me lose the least or
break even"

~~~
jchendy
Indeed, "limit" by itself is shorthand for fixed limit. There are other limit
games (such as pot limit and spread limit) where you do have a choice in your
bet size.

------
clintboxe
What language is it written in?

------
vinayp10
Dude, I need one of these for pokerstars

~~~
makerops
They exist, fwiw.

~~~
jchendy
This probably goes without saying, but they're definitely not allowed.

~~~
thret
Yep, you write a marginally profitable bot for Stars and then spend all your
time trying to get around bot detection. I gave up although the exercise was
informative.

Interestingly I get pegged for a bot rather frequently anyway (well, perhaps
once a month). It is EXTREMELY annoying when you have to answer a captcha
while playing 40 tables, you time out everywhere no matter how fast you are.

------
MBCook
> [...] because of the huge amount of memory required; roughly 262 terabytes
> of memory. That’s about 268,288 times as much memory as the 1-gigabyte
> memory available to an iPhone 6.

Wow. I can't believe that sentence made it into an article, let alone one on
IEEE Spectrum.

First, normal people don't know how much RAM is in a phone. Second, the
numbers are reasonably identical (only differing by 2.4%) so what's the point
of specifying it out that way? Why not figure out what an average amount of
RAM in a new computer is (4GB?) and say "That's the RAM of about 70k new
Desktops".
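
The arithmetic behind those figures can be checked in a few lines. This sketch assumes binary prefixes (1 TB = 1024 GB), which is the reading under which the article's 268,288 comes out exactly, and takes the 4GB desktop figure above as given.

```python
# Checking the memory comparison (assumes 1 TB = 1024 GB).
GB_PER_TB = 1024

total_gb = 262 * GB_PER_TB                  # 262 TB expressed in GB
phones = total_gb // 1                      # 1 GB per phone
desktops = total_gb // 4                    # 4 GB per desktop (assumed)
prefix_gap_percent = (1024 / 1000 - 1) * 100  # binary vs decimal prefix

print(phones)                        # 268288
print(desktops)                      # 67072
print(round(prefix_gap_percent, 1))  # 2.4
```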

~~~
coderdude
I think the people you're talking about who don't know how much RAM their
phone has are the same people who don't know how much RAM their desktop has. At
least if you're using phones as a benchmark there is the chance a reader will
be awed.

