
How AlphaZero Mastered Its Games - jsomers
https://www.newyorker.com/science/elements/how-the-artificial-intelligence-program-alphazero-mastered-its-games
======
glinscott
James put together a really nice summary of the ideas and the projects!

It was almost a year ago that lc0 was launched; since then, the community (led
by Alexander Lyashuk, author of the current engine) has taken it to a totally
different level. Follow along at [http://lczero.org](http://lczero.org)!

Gcp has also done an amazing job with Leela Zero, with a very active community
on the Go side. [http://zero.sjeng.org](http://zero.sjeng.org)

Of course, DeepMind really did something amazing with AlphaZero. It’s hard to
overstate how dominant minimax search has been in chess. For another approach
(MCTS/NN) to even be competitive with 50+ years of research is amazing. And
all that without any human knowledge!

Still, Stockfish keeps on improving - Stockfish 10 is significantly stronger
than the version AlphaZero played in the paper (no fault of DeepMind; SF just
improves quickly). We need a public exhibition match to settle the score,
ideally with some GM commentary :). To round out the links, you can watch
Stockfish improve here:
[http://tests.stockfishchess.org](http://tests.stockfishchess.org).

~~~
elcomet
I thought AlphaZero was based on minimax and used neural networks to evaluate
moves. Isn't that the case?

~~~
glinscott
MCTS is not a traditional depth-first minimax framework; key concepts like
alpha-beta pruning don't apply. Although MCTS is proven to converge to minimax
in the limit, the game trees are so large that this is not relevant in
practice. You could use the network in a minimax searcher, but it's so much
slower than a conventional evaluation function that it's unlikely to be
competitive.
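For intuition, here's a minimal toy sketch of UCT-style MCTS (my own illustration, nothing to do with lc0's or AlphaZero's actual code) on a trivial take-1-or-2 Nim game, showing the best-first selection that replaces a depth-first minimax traversal:

```python
import math
import random

# Toy game: players alternately take 1 or 2 stones; whoever takes the
# last stone wins. Positions where stones % 3 == 0 lose for the mover.

class Node:
    def __init__(self, stones, parent=None, move=None):
        self.stones, self.parent, self.move = stones, parent, move
        self.children, self.wins, self.visits = [], 0, 0

    def untried_moves(self):
        tried = {c.move for c in self.children}
        return [m for m in (1, 2) if m <= self.stones and m not in tried]

def uct_select(node, c=1.4):
    # Best-first: descend toward the child with the highest UCT score,
    # balancing exploitation (win rate) against exploration.
    return max(node.children,
               key=lambda ch: ch.wins / ch.visits
               + c * math.sqrt(math.log(node.visits) / ch.visits))

def rollout(stones):
    # Random playout; returns 1 if the player to move at `stones` wins.
    player = 0
    while True:
        stones -= random.choice([m for m in (1, 2) if m <= stones])
        if stones == 0:
            return 1 if player == 0 else 0
        player ^= 1

def mcts(root_stones, iters=2000):
    root = Node(root_stones)
    for _ in range(iters):
        node = root
        # 1. Selection: follow UCT through fully expanded nodes.
        while not node.untried_moves() and node.children:
            node = uct_select(node)
        # 2. Expansion: add one untried child, if any remain.
        moves = node.untried_moves()
        if moves:
            m = random.choice(moves)
            child = Node(node.stones - m, node, m)
            node.children.append(child)
            node = child
        # 3. Simulation (a terminal node is a loss for the side to move,
        #    since the opponent just took the last stone).
        result = rollout(node.stones) if node.stones > 0 else 0
        # 4. Backpropagation: credit the player whose move created each
        #    node, flipping the perspective at every ply.
        while node is not None:
            node.visits += 1
            node.wins += 1 - result
            result = 1 - result
            node = node.parent
    return max(root.children, key=lambda ch: ch.visits).move
```

From 4 stones the winning move is to take 1 (leaving a multiple of 3), and the visit counts converge on it without ever enumerating the tree exhaustively. In an AlphaZero-style engine, the rollout and the selection prior are where the network's value and policy heads plug in.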

------
alan_wade
God, what a well-written article! I don't have much to say on the subject, but
this was pure joy to read; it's crazy good. Clear, engaging, to the point,
making a difficult subject accessible without dumbing it down, no fluff or
unnecessary side stories, just awesomeness.

~~~
assblaster
The only gripe I had with the writing was the completely unnecessary injection
of gender pronoun controversy.

I highly recommend the AlphaGo movie as well, it does a great job documenting
the psychology of professional Go in the world of AI.

~~~
cdelsolar
what gender pronoun controversy? did we read the same article?

~~~
assblaster
>An expert human player is an expert precisely because her mind automatically
identifies the essential parts of the tree and focusses its attention there.

Instead of using a gender-neutral pronoun like "they", the author used a
feminine pronoun.

~~~
skybrian
This is nitpicking. I like using singular "they" and I'm guessing it will win,
but this isn't settled yet and some editors make different choices.

~~~
skyyler
I don't like singular "they"; we should try to establish a new gender-neutral
third-person singular pronoun instead.

------
stabbles
What's very interesting is that the Komodo developers have implemented a Monte
Carlo Tree Search version of their engine _without_ neural nets for evaluation
/ move selection. This brand new engine can actually compete at the top level
(still much worse than Stockfish and slightly worse than Lc0) [1] [2]

The exact implementation details are probably kept secret, but the idea is to
do a few steps of minimax / alpha-beta rather than completely random play in
the playout phase of MCTS.
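A rough sketch of that idea (my own illustration; Komodo's actual implementation is not public): replace the random playout with a shallow negamax-style alpha-beta probe whose score is squashed into a win probability for MCTS backpropagation. The game interface (`moves`, `apply_move`, `evaluate`) is hypothetical:

```python
import math

def alphabeta(state, depth, alpha, beta, moves, apply_move, evaluate):
    ms = moves(state)
    if depth == 0 or not ms:
        return evaluate(state)          # static eval, side-to-move view
    best = -math.inf
    for m in ms:
        # Negamax convention: the child's score is negated for the parent.
        score = -alphabeta(apply_move(state, m), depth - 1,
                           -beta, -alpha, moves, apply_move, evaluate)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:
            break                       # alpha-beta cutoff
    return best

def playout_value(state, moves, apply_move, evaluate, depth=2):
    # Instead of playing to the end at random, probe a few plies deep
    # and map the score into [0, 1] for MCTS backpropagation.
    score = alphabeta(state, depth, -math.inf, math.inf,
                      moves, apply_move, evaluate)
    return 1 / (1 + math.exp(-score))
```

The appeal is that a few plies of tactical lookahead catch the short-range blunders that purely random playouts are blind to, which matters far more in chess than in Go.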

This makes me think that the contribution of AlphaZero is not necessarily
neural nets, but rather MCTS as a successful method for searching the game
tree efficiently.

[1] [http://tcec.chessdom.com/](http://tcec.chessdom.com/)

[2] [http://www.chessdom.com/komodo-mcts-monte-carlo-tree-search-...](http://www.chessdom.com/komodo-mcts-monte-carlo-tree-search-is-the-new-star-of-tcec/)

~~~
ggggtez
You missed the point, then. Alpha-beta pruning requires knowledge of the game
rules; neural-network pruning doesn't. The advantage is that it's a
general-purpose technique.

~~~
stabbles
Yes, that's the main contribution of the experiment / paper. But prior to
AlphaZero, the chess community did not even consider investing in MCTS engines
-- alpha-beta pruning was thought to be far superior. I'm thinking that we
might see classical engines exploring this concept more, and maybe it's even a
natural step to go from alpha-beta pruning + iterative deepening to
'best-first' search with MCTS.

------
YeGoblynQueenne
>> In fact, less than two months later, DeepMind published a preprint of a
third paper, showing that the algorithm behind AlphaGo Zero could be
generalized to any two-person, zero-sum game of perfect information (that is,
a game in which there are no hidden elements, such as face-down cards in
poker).

I can't find this claim in the linked paper. What I can find is a statement
that AlphaZero has demonstrated that 'a general-purpose reinforcement learning
algorithm can achieve, _tabula rasa_, superhuman performance across many
challenging domains'.

Personally, and I'm sorry to be so very negative about this, but I don't even
see the "many" domains. AlphaZero plays three games that are very similar to
each other. Indeed, shogi is a variant of chess. There are certainly
two-person, zero-sum, perfect-information games with boards and pieces
radically different from Go, chess, or shogi - say, the Royal Game of Ur [1],
or Mancala [2], etc. - not to mention stochastic games of perfect information,
like backgammon, or asymmetric games like the hnefatafl games [3], and so on.

Most likely, AlphaZero _can_ be trained to play many such games very
powerfully, or at a superhuman level. The point, however, is that, currently,
_it hasn't_. So no "demonstration" of general game-playing has taken place,
and of course there is nothing like a theoretical analysis that would serve as
proof, or indication, of such ability in any of the DeepMind papers.

I was hoping for less rah-rah cheerleading from the New Yorker, to be honest.

________________

[1]
[https://en.wikipedia.org/wiki/Royal_Game_of_Ur](https://en.wikipedia.org/wiki/Royal_Game_of_Ur)

[2]
[https://en.wikipedia.org/wiki/Mancala](https://en.wikipedia.org/wiki/Mancala)

[3]
[https://en.wikipedia.org/wiki/Tafl_games](https://en.wikipedia.org/wiki/Tafl_games)

~~~
dhh2106
Any explanation as to why this should not be used for games without perfect
information? As an example, why couldn't the face-down card in poker be
modeled as part of the MCTS?

~~~
YeGoblynQueenne
MCTS can be applied to imperfect-information games; in fact, it is quite
robust against uncertainty.

It's AlphaZero's deep neural net component, used to learn an evaluation
function and move orderings, that will need a substantial redesign to take
imperfect information into account. The difficulty of this redesign will vary
considerably between games: in some games, information is gained throughout
the game by observing another player's moves (e.g. Poker); in others, an
initial state (e.g. a starting deal in card games) dominates the probability
that a certain possible board state is the real board state (e.g. Bridge) [1].
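One standard workaround, sketched below purely as my own illustration (the names are mine, not from any DeepMind paper), is "determinization": sample plausible completions of the hidden state, score each candidate move with a full-information search on each completed game, and vote:

```python
import random
from collections import Counter

def determinized_choice(candidate_moves, sample_hidden, value_of_move,
                        n_samples=100):
    votes = Counter()
    for _ in range(n_samples):
        hidden = sample_hidden()  # e.g. deal the unseen cards at random
        # With the hidden state fixed, the game is perfect-information,
        # so any full-information search (minimax, MCTS, a learned
        # evaluation) can stand behind value_of_move.
        best = max(candidate_moves, key=lambda m: value_of_move(m, hidden))
        votes[best] += 1
    return votes.most_common(1)[0][0]
```

Determinization has well-known blind spots - it never reasons about what the opponent knows - which is part of why the redesign is nontrivial.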

On top of that, AlphaZero's deep net has the shape of the board and the legal
movements of pieces on it hard-coded, as part of the net's structure. That
would also need a substantial redesign to accommodate a card game, or any
other kind of game without a board and without pieces that move on it. In
fact, different card games will most likely require different architectures.
It's very hard to see how, e.g., the same neural net structure could encode
both Bridge and Poker rules - and _still_ allow learning chess, shogi and Go.

Given the great variety of board games out there (and that's only classical
games; I'm not even considering modern board games, like Settlers, etc.), a
lot of very hard work would be required to even train AlphaZero to play any
game that's not very similar to chess, shogi and Go. Not to mention, training
AlphaZero is very expensive (Wikipedia quotes a cost of $25 million for
AlphaGo Zero, AlphaZero's predecessor, and that's just to buy the hardware
[2]). So I don't see how or when they'll demonstrate the "general"
game-playing power of their system.

Basically, I think all that stuff about "generalized" game playing is just so
much pointless bragging. The way DeepMind designed AlphaZero is exactly how
everyone else has designed their systems: hard-coded with structures
appropriate to the targeted game (e.g. boards and pieces, etc.). DeepMind were
clever in that they chose three very similar games, and then threw an immense
amount of money at the problem of solving them all in tandem. And still they
had to train _different models for each game_. That's just no way to get to
general game playing.

___________

[1] See Chapter 5, Adversarial Search, in AI: A Modern Approach, 3rd ed., for
a discussion of imperfect-information and stochastic games and the
difficulties of designing evaluation functions for them.

[2]
[https://en.wikipedia.org/wiki/AlphaGo_Zero#Hardware_cost](https://en.wikipedia.org/wiki/AlphaGo_Zero#Hardware_cost)

~~~
dhh2106
Thank you!

------
cdelsolar
Awesome article. Does anyone know how to begin applying the AlphaZero
techniques to games where information is NOT perfect? I'm trying to apply them
to Scrabble. There hasn't been much AI research on this game, and right now
the best AI just uses brute-force Monte Carlo with a flawed evaluation
function (which doesn't take into account the state of the board at all, just
points and the tiles remaining on the opponent's rack). It's still good enough
to beat top human experts about half the time, but I want to make something
better.

Is it impossible to apply to these types of games? Every time I read about
AlphaZero the articles mention that the techniques are meant for games of
perfect information.

~~~
sobellian
If you search "UCT imperfect information" on Google, you'll turn up plenty of
articles and slide decks, including one from David Silver that discusses
reinforcement learning. The catch is that they're mostly dated before
AlphaZero's emergence, so there's some original work involved to extend
AlphaZero to this domain. This is likely something that DeepMind is working on
themselves. It's possible that tweaking the search query might turn up more
recent results. Good luck!

EDIT - The David Silver lecture I mentioned actually covers a Scrabble AI,
Maven, which successfully applies MCTS. Here's a link:
[http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/g...](http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/games.pdf)

~~~
cdelsolar
Thanks, I'll take a look. Yup, I'm aware of Maven. The one most commonly in
use now is named Quackle, which uses the same techniques as Maven but has a
slightly better initial evaluation function. The evaluation function is very
simple, though, and only takes into account score and leave, which often
doesn't suffice.
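For concreteness, here's a toy sketch of what "score plus leave" means (the leave values below are made-up numbers for illustration, not Quackle's actual weights): each candidate play is ranked by its immediate score plus a static value for the tiles kept on the rack, with the board ignored entirely:

```python
# Made-up per-tile "leave" adjustments: keeping an S or a blank (?) is
# good for future turns; being stuck with Q or U is bad.
LEAVE_VALUE = {"S": 8.0, "?": 25.0, "Q": -7.0, "U": -1.5}

def leave_value(leave):
    # Sum per-tile adjustments; tiles not listed are treated as neutral.
    return sum(LEAVE_VALUE.get(tile, 0.0) for tile in leave)

def static_eval(play_score, leave):
    # Board state is ignored entirely -- the flaw described above.
    return play_score + leave_value(leave)

def best_play(candidates):
    # candidates: list of (immediate_score, rack_leave) pairs.
    return max(candidates, key=lambda c: static_eval(*c))
```

Here a 28-point play keeping S and the blank would beat a 30-point play stuck with QU, which is the sense in which the function values "leave" while still knowing nothing about board position.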

------
FPGAhacker
They mention a documentary on Netflix about AlphaGo. Any recommendations for
or against?

~~~
sprt
Really good. As other people mentioned, it's not technical, but it's super
interesting to see the emotional impact those games had on Lee Sedol.

------
eismcc
If you are interested in how to build a bot, Manning is having a Go bot
competition:

[https://deals.manning.com/go-comp/](https://deals.manning.com/go-comp/)

It’s been really fun to work through the books.

------
lgeorget
The article is very well written, but that sentence felt a bit weird:

> Before there could be acceptance, there was depression. “I want to apologize
> for being so powerless,” he said in a press conference.

Lee Sedol was clearly upset, especially after the first two matches, but I
think that apology was more out of politeness than depression, really.

------
deegles
Is there a way to play against an AlphaGo or equivalent but with adaptive
difficulty? I know next to nothing about go and think it would be interesting
to learn it just by playing vs. a neural network. Maybe over time the
strategies it uses would be "transferred" over to me!

------
pie_hacker
The match between Stockfish and AlphaZero was played with certain unjustified
parameters (time control, ponder off, different hardware, no opening book or
endgame tablebase for Stockfish, etc.). By "unjustified," I mean that the
authors of the paper did not justify their choice of parameters as being
designed to implement a fair match.

At a glance, the parameters of the match seem unfair to me -- and tilted
heavily towards AlphaZero. If the code were open source, this would not
matter; anyone could run a rematch. As it is, I haven't seen any convincing
evidence that AlphaZero is stronger than Stockfish when Stockfish is allowed
to use its full breadth of knowledge and run on equal hardware.

~~~
yesenadam
There has been a rematch recently vs Stockfish, with a couple of hundred
games. AlphaZero won 155-6! [0] There are fascinating videos with grandmasters
commentating on some of the games. They're played in an exciting, sacrificial,
swashbuckling style, nothing like any other top computer engine, and it seems
that may affect the play of top (human) players for the better. e.g. see

Matthew Sadler on chess24

[https://www.youtube.com/watch?v=JacRX6cKIaY&list=PLAwlxGCJB4...](https://www.youtube.com/watch?v=JacRX6cKIaY&list=PLAwlxGCJB4NchyTBYik8FBbnzLXpCCO79&index=6)

Daniel King

[https://www.youtube.com/watch?v=pFtY7gNRVRI&index=3&list=PLh...](https://www.youtube.com/watch?v=pFtY7gNRVRI&index=3&list=PLhyM8toCZs_qWKCohlxdcRUo3O9h7g61w)

[0]
[https://en.wikipedia.org/wiki/AlphaZero#Chess_2](https://en.wikipedia.org/wiki/AlphaZero#Chess_2)

~~~
fairplay2
The Wikipedia link says this was against Stockfish 8. Could people _please_
stop spreading FUD here?

If there's no public tournament, this might as well not have happened. I do
not understand why Google is always special. Other engines are open: Google
can test against Stockfish, but not vice versa.

All these web companies take, take, take from Open Source and rarely give
back.

I'll write a paper now that I beat Carlsen, but I'll refuse to do so in
public.

~~~
yesenadam
>Could people _please_ stop spreading FUD here?

Do you mean me? If so, why not say so? "Could people _please_ stop X-ing
here?" is very passive-aggressive. I looked up 'FUD': fear, uncertainty and
doubt. Not sure how what I said counts as any of those. Your comment certainly
seems to want to spread FUD, however. (And why is this your account's only
comment on HN?)

I don't know the significance of your first sentence either. I know nothing
about the versions of Stockfish used for this or anything else. Maybe you're
taking too much for granted, i.e. that I know enough of the minutiae to
understand your comments. Could you fill in the dots a bit? And who are 'all
these web companies'?

I'm not super-interested in AlphaZero (or computer chess generally) - haven't
read any of the papers, for example. But there's a lot of talk about the
various Alphas in the online chess world, since before it was playing chess,
and I found these videos very impressive. And it's ridiculous to say "This
might as well not have happened". Maybe true for you, but not for the chess
world, at all.

------
tosser0001
> An expert human player is an expert precisely because her mind automatically
> identifies ...

The "Patronizing 'Her'"

Almost invariably, when an author decides to use the patronizing 'her'
instead of the gender-neutral 'they', it's written by a man.

~~~
meroes
Many style guides, including the New Yorker's, don't accept 'they' as a
singular pronoun. Nice try, though.

