
DeepMind's MuZero teaches itself how to win at Atari, chess, shogi, and Go - jonbaer
https://venturebeat.com/2019/11/20/deepminds-muzero-teaches-itself-how-to-win-at-atari-chess-shogi-and-go/
======
schoen
Recent HN discussion:
[https://news.ycombinator.com/item?id=21589719](https://news.ycombinator.com/item?id=21589719)

------
2bitencryption
After all these iterations of Alpha-[blank], [blank]-Zero, now MuZero, etc,
I'm wondering:

If I'm interested in building a toy version following the DeepMind spec, which
can be trained to reach super-human capabilities on a particular board game
(Reversi, Chess, checkers, possibly even Go given enough compute), which of
these "versions" of the project would be the easiest for me to
understand/implement? (assume I have a basic understanding of the high-level
concepts and lots of enthusiasm, but I'm not an expert).

My understanding is, AlphaZero is not just stronger than AlphaGo, but
architecturally simpler and more efficient. That's what I'm looking for -- the
implementation with the highest result/difficulty ratio.

~~~
codehotter
AlphaGo Master, unsurprisingly, was significantly stronger than AlphaGo Zero.
AlphaZero, although it can play multiple games, was weaker still. In both
cases, the comparison pitted the 40-block version of one network against the
20-block version of the other (they had to double the network size to approach
the predecessor's level).

Recently, KataGo has reached similar levels of strength using a small fraction
of the resources:
[https://arxiv.org/abs/1902.10565](https://arxiv.org/abs/1902.10565)

It depends on what you mean by "more efficient." The significance of AlphaZero
was that you can reach good results in a variety of domains even without human
expert knowledge to provide supervised learning data or engineer features.
It's efficient in terms of engineering resources.

A precisely tailored approach can always get better results.
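The "no human expert knowledge" point can be made concrete with a toy self-play learner. Everything below is invented for illustration (a tabular Q-table standing in for the neural networks, single-pile Nim standing in for Go): the agent improves purely by playing against itself, with no human game records and no hand-crafted features.

```python
import random

# Game: single-pile Nim. Take 1-3 stones; whoever takes the last stone wins.
N, ACTIONS = 21, (1, 2, 3)
Q = {}  # (stones_left, action) -> estimated value for the player to move

def pick(stones, eps=0.1):
    """Epsilon-greedy move selection from the shared value table."""
    legal = [a for a in ACTIONS if a <= stones]
    if random.random() < eps:
        return random.choice(legal)
    return max(legal, key=lambda a: Q.get((stones, a), 0.0))

def self_play_game():
    """Play one game against itself, then back up the result."""
    stones, history = N, []
    while stones > 0:
        a = pick(stones)
        history.append((stones, a))
        stones -= a
    # The player who made the last move won (+1); signs alternate backwards.
    ret = 1.0
    for s, a in reversed(history):
        old = Q.get((s, a), 0.0)
        Q[(s, a)] = old + 0.1 * (ret - old)
        ret = -ret

random.seed(0)
for _ in range(20000):
    self_play_game()

# With enough self-play this tends toward the known optimal opening:
# take 1 stone, leaving the opponent a multiple of 4.
print(max(ACTIONS, key=lambda a: Q.get((N, a), 0.0)))
```

The real systems replace the table with deep networks and the epsilon-greedy choice with MCTS, but the loop (play yourself, back up the outcome, repeat) is the same shape.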

~~~
IanCal
Has it been improved? AlphaZero overtook AlphaGo Master previously
[https://en.wikipedia.org/wiki/AlphaGo_Zero#Comparison_with_p...](https://en.wikipedia.org/wiki/AlphaGo_Zero#Comparison_with_predecessors)

~~~
codehotter
The 40 block version of AlphaGo Zero is stronger than the 20 block version of
AlphaGo Master.

~~~
IanCal
This is a bit outside of my comfort zone so I'm not sure I quite get what
these blocks are. Has any version of AlphaGo Master bested AlphaGo Zero?

------
yters
Do they have any sort of chart showing that the Zero variants are able to
learn more games/state spaces with less domain-specific information and lower
compute and memory requirements? For instance, if we are on an exponential
tradeoff curve (which seems possible given the enormous number of GPUs
involved), it is hard to see how this will scale to human-level intelligence.

These one-off experiments make it hard to know whether AI is truly
progressing. Naively, since the leaves of a decision tree grow exponentially
with depth, we are facing an inherently unscalable problem: current gains come
from advances in hardware, but those gains are only linear against exponential
hardware requirements. Especially with Moore's law giving out, even with
parallel computation we might end up turning the Earth into a giant GPU array
before we can reach parity with human intelligence.
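The exponential-blowup worry is easy to quantify with a back-of-envelope calculation. The branching factors below are rough published estimates (roughly 35 legal moves per chess position, roughly 250 per Go position), not exact values, and a uniform tree is of course an idealization:

```python
def leaves(branching, depth):
    """Leaf count of a uniform game tree with the given branching factor."""
    return branching ** depth

print(leaves(35, 10))            # chess: ~35 moves/position, 10 plies deep
print(f"{leaves(250, 10):.2e}")  # Go: ~250 moves/position, 10 plies deep
```

Ten plies of chess already yields ~2.8 quadrillion leaves, which is why the Alpha/Mu family relies on learned value and policy networks to prune the tree rather than brute-force search.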

------
twohearted
I assume this has been tried but what happens if you give MuZero a goal like
"keep the system/process that spawns me running as long as possible?"

~~~
hervature
Why do you assume this has been tried? It's not even clear what the game is.
In this setting, what state and actions would the algorithm have access to?

------
davidfoster
Just released - walkthrough of the MuZero pseudocode:
[https://link.medium.com/KB3f4RAu51](https://link.medium.com/KB3f4RAu51)

------
chenzikuy
It's unclear to me how MuZero was able to use less compute and still achieve
AlphaZero-level performance on Go.

~~~
gok
From the preprint [1]:

> In Go, MuZero slightly exceeded the performance of AlphaZero, despite using
> less computation per node in the search tree (16 residual blocks per
> evaluation in MuZero compared to 20 blocks in AlphaZero). This suggests that
> MuZero may be caching its computation in the search tree and using each
> additional application of the dynamics model to gain a deeper understanding
> of the position.

It also strikes me as possible that just not giving the system the rules to
start with might have allowed it to explore more efficient strategies.

[1]
[https://arxiv.org/pdf/1911.08265.pdf](https://arxiv.org/pdf/1911.08265.pdf)
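For a concrete picture of what the quote means by "applications of the dynamics model", here is a toy numpy sketch of MuZero's three learned functions. All shapes and the random-matrix "networks" are invented for illustration; the real system uses deep residual networks (see the pseudocode in the preprint):

```python
import numpy as np

rng = np.random.default_rng(0)
OBS, HIDDEN, ACTIONS = 16, 8, 4

# h: representation function -- real observation -> abstract hidden state
W_h = rng.normal(size=(HIDDEN, OBS))
def represent(observation):
    return np.tanh(W_h @ observation)

# g: dynamics function -- (hidden state, action) -> (next state, reward)
W_g = rng.normal(size=(HIDDEN, HIDDEN + ACTIONS))
def dynamics(state, action):
    one_hot = np.eye(ACTIONS)[action]
    nxt = np.tanh(W_g @ np.concatenate([state, one_hot]))
    reward = float(nxt.sum())  # toy stand-in for a learned reward head
    return nxt, reward

# f: prediction function -- hidden state -> (policy logits, value)
W_f = rng.normal(size=(ACTIONS + 1, HIDDEN))
def predict(state):
    out = W_f @ state
    return out[:ACTIONS], float(out[-1])

# Planning happens entirely in hidden-state space: the model is never
# told the environment's rules, it just unrolls its own dynamics.
s = represent(rng.normal(size=OBS))
for a in [0, 2, 1]:                 # an arbitrary imagined action sequence
    policy_logits, value = predict(s)
    s, reward = dynamics(s, a)
print(s.shape)
```

The point the paper makes is that because search reuses these hidden states across the tree, a 16-block MuZero evaluation can compete with a 20-block AlphaZero evaluation.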

------
NicoJuicy
It can play a million times against itself in the virtual world every day.

But applying that in the real world takes years.

~~~
Erlich_Bachman
If we end up following this approach, it is clear that it will be combined
with some sort of virtual-world building: a machine builds an approximate
world from real-world data, runs simulations inside that virtual world for
eons, ends up with the best possible (but still imperfect, since the virtual
world is not real) action model, goes back to the real world, adjusts the
model, and repeats.
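That loop can be sketched in a few lines of toy model-based RL. The environment and all names here are invented for illustration: real-world interaction is treated as expensive, while "rollouts" inside the learned approximation are free.

```python
def real_env(a):
    """Hidden ground truth: reward peaks at action 0.7."""
    return -(a - 0.7) ** 2

model = {}  # learned approximation: action -> observed reward

def plan(candidates):
    """Virtual planning costs nothing: just query the learned model."""
    known = [a for a in candidates if a in model]
    if not known:
        return candidates[0]
    return max(known, key=lambda a: model[a])

candidates = [i / 10 for i in range(11)]
for a in candidates:          # each real-world trial is slow and costly...
    model[a] = real_env(a)    # ...so cache what it taught us in the model
    best = plan(candidates)   # ...and re-plan inside the cheap virtual copy
print(best)  # the plan converges on the true optimum, 0.7
```

The imperfect-but-useful point from the comment holds here too: the model only knows the rewards it has seen, yet that coarse approximation is enough to pick good real-world actions.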

It is even possible that our brains do the same thing, BTW. How many times do
you run that scenario of a job interview in your head before you go there? How
many times does it run in your subconscious virtually? How many times does it
happen in dreams? And more profoundly, how often are those scenarios in our
heads inaccurate and simplified, and yet they still help us act in the real
world nonetheless?

