
Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm - dennybritz
https://arxiv.org/abs/1712.01815
======
gwern
This is an incredible demonstration that the AG Zero expert iteration method
is a general method. If you go back to the discussions of AG Zero lo a month
ago, there was a lot of skepticism that NNs would ever challenge Stockfish et
al - they are just too good, too close to perfection, and chess is not well
suited for MCTS and NNs. Well, it turns out that AG Zero doesn't work as well
in chess: it works _better_ as it only takes 4 hours of training to beat
Stockfish. This is going to be an impetus for researchers to explore solving
many more MDPs than just chess or Go using expert iteration... ("There is no
fire alarm.")

~~~
Cybiote
Let's break this down and consider things carefully. To informed researchers,
what is most surprising here is not that the AlphaGo Zero algorithm beat
Stockfish but that MCTS managed to outperform alpha-beta search. I'll venture
a hypothesis as to why this was.

Informed skepticism would have discounted MCTS against alpha-beta search but
wouldn't have put much stock into the idea that Neural Networks couldn't learn
better features than what has been painstakingly handcrafted. We know that
given sufficient data and an appropriate architecture, neural nets have
achieved better local minima than humans. This shouldn't be surprising
anymore. A structurally adapted searcher will always do better in the domain
it is adapted to. _A cat is so good at being a cat, it doesn't even have to
think about how to cat_. Choice of optimization method, input pre-processing,
loss function, hyper-parameters and architecture together define a search
space, a structural prior and how to navigate it.

Returning to alpha-beta vs MCTS, my view is that earlier work on the chess
search space being ill-suited to MCTS has not been invalidated once you
account for the synergy between the neural net and search method brought about
by the imitation learning approach. What might be happening here is the neural
net not only learns to correct when it goes out of bounds, it also learns to
account for missteps of MCTS!

The AlphaGo Zero chess program is clearly smarter than Stockfish in its
ability to navigate the search space, but before talking about fire alarms
there are some things to note.

Taking the paper at face value, AlphaGo Zero does well if you hold compute
fixed and adjust time, but how does it do as you vary both compute and time?
This is of relevance to the general community, especially if AlphaGo Zero's
skill degrades gracefully enough to allow it to be a better tutor than
current engines.

Contrary to the no fire alarm claim, we should see sudden improvements
everywhere due to how close joint, structured prediction, reinforcement and
imitation learning are to each other. Unexpected improvement across a broad
class of problems is a fire alarm. Right now, POMDP or games with hidden
information and multiple interacting agents are still very difficult.
Structured prediction is still difficult. Granted, this was before AGZ, but
Neural Nets+MCTS had to be modified to Neural Self-Play before it could work
just ok in poker-like games.

What we should take away is the power of combining searching and learning.
I'll argue that what is now being called expert iteration was presaged in an
_antique_ 2006 paper [1] where Hal Daumé et al. discuss the power of a
learning algorithm trained to imitate a search-computed policy. Even with
limited compute and data, you can use similar ideas under the learning-to-search
framework. The imitation approach is what has consistently yielded great
results, whether applied to neural nets or logistic regression.

[1] [http://www.umiacs.umd.edu/~hal/docs/daume06searn-practice.pdf](http://www.umiacs.umd.edu/~hal/docs/daume06searn-practice.pdf)

[https://link.springer.com/content/pdf/10.1007/s10994-009-5106-x.pdf](https://link.springer.com/content/pdf/10.1007/s10994-009-5106-x.pdf)
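
The expert-iteration loop - a search "expert" produces targets, a cheap learner is trained to imitate them, and the improved learner makes the next round of search stronger - can be sketched on a toy game. This is a minimal illustration; the game, learning rate, and iteration counts are all invented here, not taken from either paper:

```python
# Toy single-player game: start at 0, add 1 or 2 per move.
# Landing exactly on TARGET wins (+1); overshooting loses (-1).
TARGET = 10
ACTIONS = (1, 2)

def step(s, a):
    s2 = s + a
    if s2 == TARGET:
        return s2, 1, True
    if s2 > TARGET:
        return s2, -1, True
    return s2, 0, False

# Learner: a tabular value estimate, initialised to zero.
V = {s: 0.0 for s in range(TARGET + 1)}

def expert(s):
    """One-ply search (the 'expert'), backed by the learner's values."""
    best_a, best_v = None, -float("inf")
    for a in ACTIONS:
        s2, r, done = step(s, a)
        v = r if done else V[s2]
        if v > best_v:
            best_a, best_v = a, v
    return best_a, best_v

# Expert iteration: repeatedly distil the search result back into V,
# which in turn makes the next round of search stronger.
for _ in range(50):
    for s in range(TARGET):
        _, v = expert(s)
        V[s] += 0.5 * (v - V[s])  # move the learner toward the search target

# Greedy play with the distilled values wins from the start.
s, r, done = 0, 0, False
while not done:
    a, _ = expert(s)
    s, r, done = step(s, a)
print(r)  # 1 (a win)
```

The same loop shape underlies both SEARN-style learning-to-search and AlphaZero: only the search (MCTS vs. one-ply lookahead) and the learner (deep net vs. table) are swapped out.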

~~~
Cybiote
Correction to the above: I stated that Deepmind applied Neural Nets+MCTS and
achieved ok results. I was actually misremembering two David Silver
(Deepmind) papers as one. Smooth UCT modified UCT (a popular variant of MCTS)
to handle imperfect-information games. MCTS does not converge under imperfect
information. Smooth UCT is strong at _limit poker_. Limit is much simpler
than no-limit.

Neural Fictitious Self-Play, based on fictitious play (invented in the
1950s), is an approach to reinforcement learning that uses neural nets for
function approximation. Typical RL methods like DQN are highly exploitable.
Against strong programs, NFSP did okay, with a win rate of -50 mbb/h against
the best bot it played against.

Looking beyond Deepmind, there's DeepStack. It's similar to the original
AlphaGo, combining CFR with neural nets. DeepStack did not win convincingly
against humans at two-player no-limit hold 'em.

The general point I'm trying to make here is that Chess and Go are closer to
checkers than to poker, which is itself a constrained game with known rules. I
mention all this and this Deepmind paper:
[https://arxiv.org/pdf/1711.00832.pdf](https://arxiv.org/pdf/1711.00832.pdf),
to provide a sense of scale to those talking about smoke and fire alarms.

~~~
nopinsight
What do you think of Libratus which won quite convincingly against top players
in no-limit Texas hold ‘em poker?

[https://en.m.wikipedia.org/wiki/Libratus](https://en.m.wikipedia.org/wiki/Libratus)

------
soveran
The ten sample games:

Sample game 1 [https://lichess.org/VMe0gfa2](https://lichess.org/VMe0gfa2)

Sample game 2 [https://lichess.org/Zqwn4Gzk](https://lichess.org/Zqwn4Gzk)

Sample game 3 [https://lichess.org/G2fPHci8](https://lichess.org/G2fPHci8)

Sample game 4 [https://lichess.org/LLt8wyYp](https://lichess.org/LLt8wyYp)

Sample game 5 [https://lichess.org/3r6CXx3H](https://lichess.org/3r6CXx3H)

Sample game 6 [https://lichess.org/sbdyUYS4](https://lichess.org/sbdyUYS4)

Sample game 7 [https://lichess.org/88vsAftE](https://lichess.org/88vsAftE)

Sample game 8 [https://lichess.org/1uvCwaeB](https://lichess.org/1uvCwaeB)

Sample game 9 [https://lichess.org/743quCXj](https://lichess.org/743quCXj)

Sample game 10 [https://lichess.org/SkCjxXkb](https://lichess.org/SkCjxXkb)

~~~
sillysaurus3
Uhh... These games are actually broken. From the second link:
[https://imgur.com/a/P5tG6](https://imgur.com/a/P5tG6)

See for yourself:

[https://lichess.org/Zqwn4Gzk#87](https://lichess.org/Zqwn4Gzk#87)

[https://lichess.org/Zqwn4Gzk#88](https://lichess.org/Zqwn4Gzk#88)

EDIT: Nope, I'm just a noob.

~~~
rditooait
[https://en.wikipedia.org/wiki/En_passant](https://en.wikipedia.org/wiki/En_passant)

~~~
sillysaurus3
Ah, thanks.

I'm delighted. Chess seemed so simple. I had no idea there was a special pawn
capture.

------
xianshou
One impressive statistic from the paper: AlphaZero analyzes 80,000 chess
positions per second, while Stockfish looks at 70,000,000 - seventy million,
nearly three orders of magnitude more. Yet AG0 beats Stockfish half the time
as White and never loses with either color.

A stunning demonstration of generality indeed.

~~~
1024core
So ... what if you combined Stockfish and AG0, and let AG0 explore 70M
positions instead of 80K? Would it improve even faster?

~~~
EvgeniyZh
What if you combined a bus that gets you to work in 10 minutes and a plane
that gets you from Paris to Brazil - would it get you from Paris to Brazil
in 10 minutes?

~~~
eternalcode
yes. In an imaginary and hypothetical sense. :)

------
magoghm
"We also analysed the relative performance of AlphaZero’s MCTS search compared
to the state-of-the-art alpha-beta search engines used by Stockfish and Elmo.
AlphaZero searches just 80 thousand positions per second in chess and 40
thousand in shogi, compared to 70 million for Stockfish and 35 million for
Elmo. AlphaZero compensates for the lower number of evaluations by using its
deep neural network to focus much more selectively on the most promising
variations – arguably a more “human-like” approach to search, as originally
proposed by Shannon." <- Amazing!
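
The "selective focus" described in that passage comes from the selection rule in AlphaZero-style MCTS: each simulation descends the tree picking the move that maximises Q + U, where the exploration bonus U is scaled by the network's prior. A simplified sketch follows - the constant and the exact form of the bonus are assumptions for illustration, not the paper's precise variant:

```python
import math

def puct_select(children, c_puct=1.5):
    """Pick the child maximising Q + U. U is large for moves the network's
    prior p likes but that have few visits n so far, so simulation effort
    concentrates on the most promising variations."""
    total_n = sum(ch["n"] for ch in children.values())
    def score(ch):
        q = ch["w"] / ch["n"] if ch["n"] else 0.0  # mean value so far
        u = c_puct * ch["p"] * math.sqrt(total_n + 1) / (1 + ch["n"])
        return q + u
    return max(children, key=lambda a: score(children[a]))

# Two unexplored moves: the prior strongly prefers "e4", so the first
# simulation goes there rather than splitting effort evenly.
children = {
    "e4": {"p": 0.80, "n": 0, "w": 0.0},
    "a4": {"p": 0.05, "n": 0, "w": 0.0},
}
print(puct_select(children))  # e4
```

If "e4" then accumulates many visits with poor results, its Q term drags the score down and the search shifts to alternatives - which is how a few tens of thousands of evaluations per second can still cover the lines that matter.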

~~~
maxander
Meanwhile a human player considers <1 position per second, so there are a
few orders of magnitude left to go in that direction.

But unsettlingly few, nonetheless.

~~~
nopinsight
Humans are also much weaker than AlphaZero in these three games. The
difference in the numbers of positions searched might be responsible for a
substantial part of that.

~~~
yters
It'd be interesting to weaken AZ until it is on par with a human, and then
compare moves evaluated. I'd suspect humans still evaluate significantly fewer
moves.

------
partycoder
If you look at the Stockfish project, you will see many hardcoded weights in
the configuration, found through experimentation. All these adjustments
probably took years to achieve... and now AlphaGo Zero just self-learns
everything and surpasses it.

Would be good to see Deepmind's solution play Arimaa and Stratego, and see
what kind of strategy it comes up with. Or weird variations of Go.

Eventually this tech will make it into military strategy simulators and that's
where things will get really messed up. 4 star generals will be replaced by
bots.

~~~
gamegoblin
I don't think this technique immediately applies to Stratego because it's not
a perfect information game.

I suspect it would exceed the state of the art in Arimaa, since Arimaa is
specifically designed to have a high branching factor (17281 -- compared to 35
for chess), and this technique was designed to work well in high-branching
factor games (since Go is a high-branching factor game, though much lower than
Arimaa).

~~~
partycoder
In that regard, Stratego would share some aspects with StarCraft, another
incomplete-information game.

Deepmind is actively working on a StarCraft bot. It would be interesting to
see if they can put together a superintelligent StarCraft bot and then
translate those results to Stratego.

------
zwischenzug
I smell a rat.

The paper says:

'AlphaZero achieved within 24 hours a superhuman level of play in the games of
chess and shogi'

In the first game, Stockfish's 9. Qe1 is one of the strangest moves I've
ever seen - one that would never be considered by a human, let alone a
superhuman.

11. Kh1 also makes little sense, but is not as bad. My Stockfish sees it as
losing 0.2 pawns, which makes it highly suspect in such a position.

35. Nc4 is also a deeply puzzling move that my Stockfish sees as losing half
a pawn immediately, and a whole pawn soon after.

50. g4 also suspect

52. e5 is insane.

This is bullshit.

Edit: bullshit is too much - see comments below.

Edit: Oh dear. We're doomed.

[https://lichess.org/study/qiwMCyNQ](https://lichess.org/study/qiwMCyNQ)

~~~
zwischenzug
If I leave Stockfish to study for longer then Qe1 comes up in the analysis.
Which makes me wonder whether SF gets weaker in some positions the more it's
left to think.

~~~
zwischenzug
Now I'm really intrigued.

SF plays really odd moves when left to its own devices for a time. As does
this AI. So maybe chess looks really weird with play significantly better than
the best humans.

It's actually really disturbing.

~~~
thom
I think being able to play tactically perfect chess over 20 or so moves will
often look weird to human strategic sensibilities. The computer sees every
tiny exception to the patterns and heuristics you've incorporated into your
gut feel about positions. In a way these moves are right just because they're
right, and that's what's jarring - there's no _principle_ behind them that can
be learned and generalised, which is something humans struggle with in all
walks of life.

~~~
thomasahle
Except AlphaZero doesn't evaluate nearly as many moves as Stockfish (80K nps
vs 70M nps), so in a sense, it has exactly generalized a principle (or likely
whole lot of principles) that allows it to estimate positions much better than
Stockfish.

Of course you are right about perfect play, but the human-like aspect is part
of what is exciting about these new Alpha engines.

------
cdelsolar
I have a question for the authors, but I can't seem to find their contact
info at the moment. I hope some of you might know enough to answer it.

I'm interested in applying this method, or a similar neural-network / tabula
rasa based method to the game of Scrabble. I read the original AlphaGo Zero
paper and they mentioned that this method works best for games of perfect
information. The standard Scrabble AI right now is quite good and can
definitely beat top experts close to 50% of the time, but it uses simple Monte
Carlo simulations to evaluate positions and just picks the ones that perform
better. It doesn't quite account for defensive considerations or other
subtleties of the game. I was wondering if anyone with more insight into
MCTS and NNs could talk me through how to apply this to Scrabble, or whether
it even makes sense. One issue I can foresee is very slow convergence: since
the game has a luck factor, the algorithm could make occasional terrible
moves and still win games, and thus be "wrongly trained".
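
One common way to handle hidden tiles in a search framework - not the AlphaZero method, which assumes perfect information - is determinization: sample the hidden information (opponent rack, bag) many times and average rollout results over the samples. A toy sketch, with a stand-in game and invented payoffs:

```python
import random

def evaluate_move(move, sample_hidden, rollout, n_samples=100):
    """Score a candidate move by averaging rollout results over sampled
    determinizations of the hidden information (e.g. opponent racks and
    the remaining bag in Scrabble)."""
    total = sum(rollout(move, sample_hidden()) for _ in range(n_samples))
    return total / n_samples

# Toy stand-in: the hidden information is a number in [0, 1). A "safe"
# move always scores 0.6; a "risky" move scores 1.0 only when the hidden
# draw is favourable (h < 0.4), i.e. expected value 0.4.
random.seed(0)
sample = lambda: random.random()
rollout = lambda mv, h: 0.6 if mv == "safe" else (1.0 if h < 0.4 else 0.0)

scores = {m: evaluate_move(m, sample, rollout) for m in ("safe", "risky")}
best = max(scores, key=scores.get)
print(best)  # safe
```

Averaging over sampled worlds is also what addresses the "wrongly trained" worry: a lucky win in one sampled world gets washed out by the unlucky ones, so the luck factor mainly slows convergence rather than corrupting the signal.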

~~~
bo1024
Step 1: millions (?) of dollars of hardware.

------
ericand
Two things to note:

1) AlphaZero beats AlphaGo Zero and AlphaGo Lee, and starts tabula rasa

2) "Shogi is a significantly harder game, in terms of computational
complexity, than chess (2, 14): it is played on a larger board, and any
captured opponent piece changes sides and may subsequently be dropped anywhere
on the board. The strongest shogi programs, such as Computer Shogi Association
(CSA) world-champion Elmo, have only recently defeated human champions (5)"

~~~
pacaro
Shogi is a fun game; it always feels a little sad that it doesn't get more
exposure outside of Japan (and my understanding is that, by and large, in
Japan it is considered an "old person's" game).

Because captured pieces change sides, there is less of an "endgame" scenario,
and as a beginner (like me) it is very easy to put too many captured pieces
back into play, which makes it hard to defend everything, so essentially you
end up giving them back to your opponent.

~~~
kaffeemitsahne
I've been interested in learning both shogi and xiangqi for a while. If anyone
knows a nice engine with graphical frontend for either game, I'd love to know.
Wasn't able to find much the last time I looked.

~~~
apetresc
The best place to play Shogi online against others is
[http://81dojo.com/](http://81dojo.com/)

------
Scarblac
As a chess player I find the win rate astonishing.

Given the drawish tendency at top level, among human players, in
correspondence chess and also in the TCEC final, I thought that even
absolutely perfect play wouldn't score so well against a decent Stockfish
setup (which 64 cores and 1 minute per move should be).

------
thom
I can’t see any reference to whether Stockfish was configured with an endgame
tablebase. It’d be interesting to see results then, as you’d expect
AlphaZero’s superior evaluation to give it an advantage out of the opening,
but later in the game Stockfish would have access to perfect evaluations.
Obviously there’s nothing stopping you from plugging a tablebase into
AlphaZero but that feels wrong.

~~~
thomasahle
It's not clear that it had an opening book either. In any case it's not
specified which one.

------
Invictus0
I'm not sure it's really fair to compare Stockfish to AlphaZero; AlphaZero
used 24 hours on 5,000 TPUs of training compute, and still needed 4 TPUs in
real play, while Stockfish ran on just 64 threads and 1GB of RAM.
Nonetheless, it's still an impressive achievement.

~~~
Recursing
Only 1GB RAM? Really?

~~~
bluecalm
Yes, this is really strange. Hash table size is a major contributing factor
to the strength of chess programs. It looks like a very artificial
limitation.

------
Aissen
Serious question: how does one evaluate the reproducibility of this paper's
results?

Maybe I'm missing some things, but:

- Are 1st-gen TPUs even accessible? You have to fill out a form just to
learn more about those second-generation TPUs:
[https://cloud.google.com/tpu/](https://cloud.google.com/tpu/)

- I can't find the source code

This does not look like a scientific paper, but a (_very_ impressive) tech
demo.

~~~
dandermotj
This is definitely a scientific paper. Pretty much no scientific paper comes
with source code, and the majority of scientific papers are not reproducible
without an entire university department of resources anyway.

~~~
Aissen
> Pretty much no scientific paper comes with source code

Are we blindly accepting this as science now?

~~~
voxl
Yes? Sorry you've been out of the loop so long but science doesn't cater to
your idealistic ideas of what it ought to be.

------
thomasahle
Discussion at the Computer Chess Club (CCC) forum:
[http://www.talkchess.com/forum/viewtopic.php?topic_view=thre...](http://www.talkchess.com/forum/viewtopic.php?topic_view=threads&p=741214&t=65910)

and

[http://www.talkchess.com/forum/viewtopic.php?topic_view=thre...](http://www.talkchess.com/forum/viewtopic.php?topic_view=threads&p=741211&t=65909)

------
tboerstad
Stockfish plays like an ambitious amateur in the first game, giving away a
piece for two pawns on move 13.

Perhaps the move was justified, though: later in the same game Stockfish
reaches a position that is at worst drawn and likely winning. Around move 40,
however, Stockfish gets its own knight trapped and the game is over.

This is not the kind of chess we normally see from Stockfish.

~~~
rozim
Yeah, that game was kind of different from the others - in the other games
the feeling I got was that over time AlphaGo's pieces got increasingly
effective while Stockfish's pieces would get bottled up and lose their
mobility.

------
naveen99
Very happy to see this result. It's like a moral victory for humans, as
AlphaGo is more human-like (discounting Monte Carlo search) than Stockfish.
Maybe deep learning will give us the next Euler, Newton, or Einstein.

~~~
visarga
Shogi, chess and Go are "perfect information games", meaning you can see the
whole game state. Being able to solve games where you can't see everything,
and must act under uncertainty, is a whole different thing.

~~~
roenxi
Is it really though?

A big class of imperfect-information games can be modeled by giving the
agent a record of everything it has seen so far. Then it has exactly the
same information available as a human player in the same position, if not
more. We know that with equal information AIs can make better decisions than
humans (see also: AlphaGo :] ), so at that point the AI could reasonably be
expected to achieve superhuman performance.

The "imperfect-information games are harder for AI" crowd are going to be
surprised by just how badly humans deal with imperfect information. AIs have
a much better memory than humans do, and much more potential to use actual
probability, which humans are truly shocking at utilising (although neural
networks don't seem to exploit this edge, so far).

~~~
Cybiote
The difficulty of imperfect information comes from cutting across
information sets and partial observability. With perfect-information games
like chess or Go, one can solve subgames with guarantees that the
equilibrium is the same as for the full game. This is not the case for games
like poker, which is why they have been difficult. In addition, for n > 2
players, there are no longer theoretical guarantees about converging to a
Nash equilibrium, which makes designing theory-guided algorithms harder.
Though CFR's empirical performance with n=3 is encouraging, I know of no
results for n > 3.

Earlier this year, DeepStack, a system combining neural nets with search,
competed live against humans without either side being dominant.
Search-policy-guided training might improve its results, which are
impressive compared to even 5 years ago, but this highlights how much more
demanding imperfect-information games are.
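
For readers unfamiliar with CFR: its core building block is regret matching. The hedged sketch below shows only that rule, learning a best response to a fixed, biased rock-paper-scissors opponent; full CFR additionally traverses the game tree and weights updates by counterfactual reach probabilities, which this toy omits, and the opponent's mix is invented:

```python
ROCK, PAPER, SCISSORS = 0, 1, 2

def payoff(a, b):
    """+1 win, 0 tie, -1 loss for action a against action b."""
    return 0 if a == b else (1 if (a - b) % 3 == 1 else -1)

def strategy(regrets):
    """Regret matching: play actions in proportion to positive regret."""
    pos = [max(r, 0.0) for r in regrets]
    total = sum(pos)
    return [p / total for p in pos] if total else [1 / 3] * 3

opponent = [0.5, 0.3, 0.2]   # a fixed opponent that plays rock too often
regrets = [0.0, 0.0, 0.0]
avg = [0.0, 0.0, 0.0]
T = 10000
for _ in range(T):
    strat = strategy(regrets)
    # expected payoff of each pure action against the opponent's mix
    ev = [sum(q * payoff(a, b) for b, q in enumerate(opponent))
          for a in range(3)]
    cur = sum(p * e for p, e in zip(strat, ev))
    for a in range(3):
        regrets[a] += ev[a] - cur  # regret for not having played a
        avg[a] += strat[a]

avg = [x / T for x in avg]
print(round(avg[PAPER], 3))  # 1.0 - it learns to exploit the rock bias
```

In self-play on a two-player zero-sum game, the *average* strategies of two regret-matching players converge toward a Nash equilibrium; it is exactly this averaging guarantee that breaks down for n > 2 players.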

~~~
sharky6000
Yep, this. Btw there are some encouraging results for n=4 using sequence-form
replicator dynamics (which implement a form of CFR) in Kuhn poker. It's a toy
example, but the game gets large fast with n=4. I don't know of any results
with n > 4.

[http://mlanctot.info/files/papers/aamas14sfrd-cfr-kuhn.pdf](http://mlanctot.info/files/papers/aamas14sfrd-cfr-kuhn.pdf)

------
nl
For those complaining about the TPU resources used during self-training, it
is worth noting that Stockfish has used over 10,000 CPU hours for tuning its
parameters. See
[https://github.com/mcostalba/Stockfish/blob/master/Top%20CPU...](https://github.com/mcostalba/Stockfish/blob/master/Top%20CPU%20Contributors.txt)

~~~
dmurray
This understates it a bit. More like 10 million CPU hours according to that
link.

------
110011
What an amazing result! Evaluating roughly 1,000× fewer positions, AlphaZero
still beats Stockfish.

In the figure on its preferred openings, I find it very interesting that it
doesn't like the Ruy Lopez very much over training time (there is a small
bump, but it is transient). I am hardly a chess expert, but I know it was
very favored at the world championships, so maybe the chess world will be
turned upside down by this result now?

Positing that the chess world is bigger than the Go world (in terms of
interest and finances), there is probably going to be a race to replicate
these results "at home" and train your own before your competitors do :)

------
elcapitan
What would be a good starting point to learn about the AI behind that for a
"normal" programmer? There seem to be so many resources now that it's hard to
choose. Combination of hands-on plus theory would be good.

~~~
thanatropism
The keyword is "reinforcement learning".

~~~
elcapitan
I know the names of the general concepts, I was wondering if someone has
concrete recommendations on where to start and which books/frameworks are sort
of beginner-friendly.

~~~
AlexCoventry
Try _Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts,
Tools, and Techniques to Build Intelligent Systems_ for the fundamentals.

[http://shop.oreilly.com/product/0636920052289.do](http://shop.oreilly.com/product/0636920052289.do)

For reinforcement learning, I hear Sutton and Barto is very readable, but I
haven't read it myself. You can also just pick the concepts up by reading
papers. The introduction in the Deep Q-Learning paper is not great, but it's
how I first learned the concept.

[http://ufal.mff.cuni.cz/~straka/courses/npfl114/2016/sutton-bookdraft2016sep.pdf](http://ufal.mff.cuni.cz/~straka/courses/npfl114/2016/sutton-bookdraft2016sep.pdf)

[https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf](https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf)

------
asdfologist
While this sounds impressive, I'll believe it when AlphaZero wins TCEC.

~~~
computerphage
It beat the winner of TCEC-2016, Stockfish, with a record of 28-72-0. That's
zero losses.

~~~
bluecalm
If I run SF on my desktop computer it will kill SF running on my phone. That
doesn't prove anything. Comparing TPUs and CPUs is hard, but they could at
least have let SF run on what is considered a top-of-the-line setup with
sensible settings (1GB of hash memory is very limited; 8GB is standard for
rapid games on a quad-core CPU, let alone a 64-core one).

~~~
sireat
I can't figure out the reason for this stingy 1GB hash memory limit when
using 64 cores. It pretty much negates the advantage of 64 cores vs, say,
4-6 cores.

A nefarious suggestion would be that the 1GB limit ensures that AlphaZero
would always have the edge in depth, as Stockfish would be forced to prune
long lines to preserve hash memory.

Maybe someone who has read the Stockfish source code can comment on how
Stockfish prunes hash memory.
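
For what it's worth, the usual mechanism is replacement rather than pruning: when the hash table is full, new search results compete with old ones for slots. The sketch below is a generic depth-preferred transposition table, not Stockfish's actual scheme (real engines use fixed-size bucketed tables with aging), but it shows why a small table hurts a deep 64-core search:

```python
class TranspositionTable:
    """Minimal depth-preferred transposition table: cache search results
    keyed by position hash, keeping the more deeply searched entry when
    two positions collide on the same slot."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self.table = {}  # slot -> (depth, score, full key)

    def store(self, key, depth, score):
        slot = key % self.max_entries        # many positions share a slot
        old = self.table.get(slot)
        if old is None or depth >= old[0]:   # keep the deeper search result
            self.table[slot] = (depth, score, key)

    def probe(self, key, depth):
        entry = self.table.get(key % self.max_entries)
        if entry and entry[2] == key and entry[0] >= depth:
            return entry[1]                  # usable cached score
        return None

tt = TranspositionTable(max_entries=4)       # tiny table to force collisions
tt.store(1, depth=10, score=0.3)
tt.store(5, depth=2, score=-0.1)             # collides (5 % 4 == 1), rejected
print(tt.probe(1, depth=8))   # 0.3
print(tt.probe(5, depth=2))   # None: its slot is held by the deeper entry
```

With 64 threads generating positions, a 1GB table thrashes: useful deep entries keep getting contested, so the engine redoes work it would otherwise retrieve from cache.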

~~~
galkk
Well, one explanation is that they wanted to win "convincingly", hence 1
minute per move and such a low memory allowance for hash.

------
gallerdude
I wonder if being an expert at one game makes it easier to be an expert at
another. If so, then maybe the examples are datasets, and convergence would be
able to complete new tasks after a few examples.

~~~
vinchuco
Really interesting question. Some strategic concepts may transfer, say, from
chess to chess variants. However, a simple change in the rules can have a
huge impact on the game mechanics, as anyone who has tried chess variants
[1] knows.

[1]
[https://en.wikipedia.org/wiki/List_of_chess_variants](https://en.wikipedia.org/wiki/List_of_chess_variants)

The answer may be that it is hard enough to become an expert at anything, but
there may be some serendipitous (how to make this precise?) overlap.

------
luckyt
It doesn't seem to like the Sicilian Defense (1.e4 c5), which is the most
popular opening by human players. I wonder if this will change opening theory?

~~~
SubiculumCode
That's stunning. I thought that was one of the strongest openings for black.

~~~
Isofarro
It's not the strongest opening, but it's an asymmetric opening system which
introduces imbalance into the position, so it tends to be less drawish than
a symmetric opening system.

This creates the psychological effect of slightly turning the knob from
"Black is playing for equality" to "Black is playing for counter-play".

------
narrator
So when are they going to apply this to Atari games, or, well, anything? The
next step is to have one AI figure out the rules by training a GAN that
imitates player behavior, and have the other AI be AlphaZero, tweaking the
GAN inputs to generate different moves to win. Voila... an almost
general-purpose AI that can learn to play any game.

~~~
eref
The main problem is that we still lack good generative models and good ways
of interrogating them. GANs are unstable and difficult to apply to time
series, VAEs suffer from posterior collapse, WaveNet/PixelRNN grow with the
input size and overemphasize details, and RNNs are hard to train because we
lack good training algorithms. Generally, small errors tend to compound in
step-wise predictions because NNs do not generalize very well and gradients
tend to vanish and shatter. If you just consider the computation time needed
to roll out the future, domains in which the rules are simple enough to be
hand-coded and evaluated quickly (such as Go and chess) are probably a
million times more suitable for MCTS than domains in which you need a
complex model.

~~~
gwern
To expand on eref's comment a little: you absolutely _could_ apply this or
MCTS to ALE (and Guo et al 2014 did it very nicely). After all, the ALE is
deterministic and simulatable by definition, so of course you can explore
the game tree and reset the simulation as necessary. But people aren't much
interested in this approach because using the ALE as a 'simulator' is
cheating as far as testing full-strength AI techniques goes (we don't have
simulators of the real world, after all), and the ALE games themselves
(unlike Go) are of little intrinsic interest, so there's no real benefit to
engaging in cheating.

------
Sukotto
Is this a library or something I can download and try training myself (on a
small scale)?

I'm not in a position to read the paper right now, so my apologies if that's
covered in there. I want to ask just in case it's not, while this is still on
the front page.

~~~
gwern
No. DM only occasionally releases software. Expert iteration is simple
enough that someone can code it up on their own, and there are already a few
clones, so if anyone cares to train their own, it's doable, although it may
take a while.

~~~
chillee
"a while" is a bit of an understatement.

Leela Zero (the main AlphaGo Zero replication project) is a crowd-sourced
computation effort that's going to take a fairly long time to get anywhere.

And from this paper:

> "Training proceeded for 700,000 steps (mini-batches of size 4,096)
starting from randomly initialised parameters, using 5,000 first-generation
TPUs (15) to generate self-play games and 64 second-generation TPUs to train
the neural networks."

~~~
Houshalter
You don't have to start from zero, though. It's cool that it works with
Google-scale resources, but it seems like it would be faster to initialize
with a neural net first trained to mimic the moves of an existing chess or
Go AI, and then improve it from there.

>"Why is the net wired randomly?", asked Minsky. "I do not want it to have any
preconceptions of how to play", Sussman said. Minsky then shut his eyes. "Why
do you close your eyes?", Sussman asked his teacher. "So that the room will be
empty." At that moment, Sussman was enlightened.

~~~
gcp
The problem is that it isn't entirely clear whether this produces equal
quality results. You might end up on a lower optimization plateau.

------
lern_too_spel
What is its win percentage against itself on each side of the board in each
game? Is chess a draw for its style of play? Is there a first move advantage
for the other games with its play style?

------
hmate9
So AlphaGo Zero used 4 TPUs while AlphaZero used 1500. It’s not immediately
obvious to me why there is this massive difference. Can anyone elaborate?

~~~
sanxiyn
Both used 4 TPUs at playing time. At training time, AlphaGo Zero used an
unspecified amount of computing resources; AlphaZero used 5,000 TPUs for
self-play.

~~~
hmate9
Ah, thanks for clearing that up! Makes sense.

------
skc
I'm only a fairly pedestrian chess player, but I looked at one of these
games between AGZ and SF, and aside from the endgame, AGZ played in a manner
that almost seemed alien. It seemed to completely ignore various little
rules of thumb, which is to be expected in hindsight but fairly mind-blowing
when you actually watch a game.

------
bfirsh
Here's an HTML version of the paper:

[https://www.arxiv-vanity.com/papers/1712.01815/](https://www.arxiv-vanity.com/papers/1712.01815/)

Table 2 is broken, but the rest is much more readable if you're on a phone.

------
wskish
The more interesting metric going forward is performance at a given power
budget (not unlike in motorsports). The TPUs are consuming sooo much power
here! Most interesting real-world problems are power-limited, including in
nature (e.g. metabolic limits).

~~~
orasis
When a lot of money is on the line you can use a lot of resources.

------
k2xl
Chess.com forum thread: [https://www.chess.com/forum/view/general/stockfish-dethroned](https://www.chess.com/forum/view/general/stockfish-dethroned)

------
hyperpape
This paper compares AlphaZero to the 20 block version of AlphaGo Zero that was
trained for 3 days. Am I right in thinking that this version was significantly
less strong than the 40 block version? If so, does it matter?

------
TwoBit
Wasn't Stockfish gimped for this competition? No opening book, no endgame
tablebases, low RAM, etc.? If so, then this AI did not in fact beat the
computer chess champ.

------
naveen99
Is there an SDK or compiler for using Google's TPUs beyond just using
TensorFlow? Is the TPU backend of TensorFlow based on CUDA, OpenCL, plain C,
or something else?

------
imrehg
As a Shogi enthusiast (but complete beginner), I'd have liked to see more
Shogi details in the article. Nevertheless, there are plenty of other things
to geek out on...

------
auggierose
Great result, but without access to source code this is not a scientific
paper.

------
SubiculumCode
There is only one way for a human to win at chess against these computers,
and it involves violence against the chess board.

------
foobaw
Did Magnus play against this? Is there a way we can see the game?

~~~
NegatioN
No, he didn't play it. As far as I know, computers are already far ahead of
humans at chess, so further progress here wouldn't really make a difference.

~~~
thomasahle
It would be interesting in one way though: Magnus says he hates playing
against computers, because "it's like being beaten by an idiot". Modern chess
engines still make moves that are somewhat strategically weak, but they make
up for it with amazing tactics.

It would be interesting to hear if Magnus thought AlphaZero played less like
an idiot.

------
plg
source code?

------
stretchwithme
See, Mom? Self play is a good thing.

------
firebones
A lot of the graphs in the paper seem to level out as they hit the level of
the opponent. It makes me wonder to what extent AlphaGo Zero is merely
optimizing to beat flaws in existing opponents' current implementations (even
if "existing opponents" == all available opponents' data and algorithms today)
rather than finding generalizable insights into the underlying game. Wouldn't
you expect that, unless we are at the theoretical limit of perfect chess, a
tabula rasa approach would exceed existing best practice significantly,
especially with the massive computation advantage it has?

Not that there's anything wrong with that; AlphaGo Zero supposedly optimized
for the "just enough" win rather than the crushing win. It doesn't even mean
Stockfish is doomed--I suspect Stockfish could beat it in a future heads-up
match provided that Zero didn't have time to retrain, but that a retrained
Zero (having the benefit of optimizing against a new Stockfish) would be able
to supersede it once again.

~~~
Houshalter
It's not. It learns entirely through self-play and never learns from playing
its opponent. Diminishing returns aren't unusual; they happen in every domain.
These AIs are probably playing close to the limit of what is possible, just
not quite there yet.

~~~
tlb
Are there popular games where the best human players are not near the limit of
what is possible? Obviously you can construct one to be hard for humans (large
3SAT problems, or even big arithmetic problems), but I wonder if there is one
that people enjoy.

~~~
throw_away_777
Humans are nowhere near the limit of what is possible in chess, as evidenced
by how much better computers are at the game.

~~~
nandemo
Presumably tlb meant what is _humanly_ possible...

------
ericand
Certainly a significant achievement. Also, kind of interesting that the
AlphaGo team spent a lot of energy to convince us Go is much harder than
Chess, only to turn around and tell us that it is amazing that it can also win
at Chess.

~~~
2bitencryption
> only to turn around and tell us that it is amazing that it can also win at
> Chess.

What they're demoing here is a single, general formula for mastering multiple
games. Start with empty AG0, then teach it chess from scratch until it is the
strongest player on the planet.

Go back to an empty slate, with exactly the same "untrained" AG0, and now
teach it Go, to the same result. No fine-tuning for the domain of the game you
are training on -- it is general(ized).

That's the gist I'm getting from this.

question for someone who has time to read the paper: can you train it to
master chess and go _at the same time_? or is it one or the other? I'm
assuming the latter.

edit: check out the graph on the 4th page. AlphaZero, which can master chess
and shogi, can beat AlphaGo Zero, the implementation specifically designed for
Go, at its own game.
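The recipe can be caricatured in a few lines of Python. This is a toy
stand-in, not the paper's algorithm: the real network plus MCTS is reduced
here to a dict and a coin flip, and the "game" is a made-up counter. The
point is just that the loop itself contains no game-specific knowledge:

```python
import random

def self_play(policy, game_steps=10):
    """Play one toy game against itself, recording (state, move) pairs.
    'policy' is a dict mapping states to preferred moves -- a stand-in
    for the real network + MCTS."""
    history, state = [], 0
    for _ in range(game_steps):
        move = policy.get(state, random.choice([-1, 1]))
        history.append((state, move))
        state += move
    outcome = 1 if state > 0 else -1  # toy win condition
    return history, outcome

def train(policy, history, outcome):
    """Nudge the policy toward the moves of the winning side."""
    for state, move in history:
        policy[state] = move if outcome > 0 else -move
    return policy

# The same generic loop -- self-play, then train on the outcome --
# is what gets pointed at chess, shogi, or Go from an empty slate.
policy = {}
for _ in range(50):
    history, outcome = self_play(policy)
    policy = train(policy, history, outcome)
```

Swap in a real game, a deep network, and tree search, and nothing about the
outer loop changes -- which is exactly the "single general formula" claim.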

~~~
gwern
> question for someone who has time to read the paper: can you train it to
> master chess and go at the same time? or is it one or the other? I'm
> assuming the latter.

I'm sure you could with a multi-headed NN. But what would be the point?
There's very little transfer of knowledge between the games, especially once
you get past the very basics.
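The multi-headed idea is easy to picture: one shared trunk, with a separate
output head per game. A toy numpy sketch (hypothetical, not DeepMind's actual
architecture; the 4672-move chess encoding is the one used in the paper, and
362 = 19x19 board points plus pass for Go):

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared trunk: one set of weights reused for every game.
trunk = rng.standard_normal((64, 32))

# One policy head per game; only the head knows the game's move space.
heads = {
    "chess": rng.standard_normal((32, 4672)),  # AlphaZero's chess move encoding
    "go":    rng.standard_normal((32, 362)),   # 19*19 points + pass
}

def forward(board_features, game):
    """Run the shared trunk, then the game-specific head."""
    hidden = np.tanh(board_features @ trunk)
    return hidden @ heads[game]

chess_logits = forward(rng.standard_normal(64), "chess")
go_logits = forward(rng.standard_normal(64), "go")
```

Whether the shared trunk would actually learn anything useful across both
games is the open question -- gwern's point is that it probably wouldn't.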

~~~
thanatropism
The point is that real problem domains are not neatly partitioned and labeled.

I don't know what kind of input the NN itself gets, but computer vision is
enough to translate a photo of a chessboard to a usable symbolic
representation. But it would be nice to already have a black box-ish computer
program that figures out what's the game at hand and how to play it.

The next variation is to have the adversary start playing a chess variant and
have the machine recognize it (assuming honesty) and play it with significant
skill. Then "real life Pong", where the size and aerodynamics of the ball are
unknown to it. This is the gist of human intelligence: answering questions is
significantly easier than figuring out what the question is.

