
Fast implementation of DeepMind's AlphaZero algorithm in Julia - metalwhale
https://github.com/jonathan-laurent/AlphaZero.jl
======
cgreerrun
I've been working on a Python implementation that uses Gradient Boosted
Decision Trees (LightGBM/Treelite) instead of using a neural network for the
value/policy models:

[https://github.com/cgreer/alpha-zero-
boosted](https://github.com/cgreer/alpha-zero-boosted)

It's mostly to understand how AlphaZero & friends work. I'm also curious about
how well a GBDT can do, and whether there are self-play techniques that can
accelerate training.

The nice thing about a GBDT is that, unlike a NN, it can do thousands of
value/policy lookups per second on a single core. So it should be cheaper to
scale self-play and run a lot of self-play experiments (assuming the self-play
lessons learned with the GBDT model transfer to the more powerful NN in these
environments).
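
To illustrate why lookups are so cheap: evaluating a GBDT is just a handful of
comparisons per tree, with no matrix multiplies. A toy sketch of the idea (not
LightGBM/Treelite internals, just illustrative names):

```python
# Toy GBDT evaluation. Each tree is a dict of nodes: a leaf stores
# (value,), an internal node stores (feature_index, threshold, left, right).
def eval_tree(tree, features):
    node = tree[0]  # root is always at index 0
    while len(node) == 4:  # internal node: descend one comparison at a time
        feat, thresh, left, right = node
        node = tree[left] if features[feat] < thresh else tree[right]
    return node[0]  # leaf value

def eval_gbdt(trees, features):
    """Model prediction = sum of per-tree contributions."""
    return sum(eval_tree(t, features) for t in trees)
```

Each prediction touches only depth-many nodes per tree, which is why a single
core can serve thousands of lookups per second.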

If you're curious about accelerating self-play training, check out David Wu's
work
([https://arxiv.org/pdf/1902.10565.pdf](https://arxiv.org/pdf/1902.10565.pdf)).
He's the creator of KataGo. I implemented his "Playout Cap Randomization"
technique in my implementation above and, sure enough, it's much more
efficient: [https://imgur.com/a/epaKtDY](https://imgur.com/a/epaKtDY). It
seems like it's still early days in terms of how efficient self-play training
is.
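
The gist of Playout Cap Randomization is simple: only a random fraction of
moves get a full-strength search (and only those produce training targets),
while the rest use a cheap search just to advance the game. A rough sketch,
with hypothetical `mcts_search` and replay-buffer interfaces, not the actual
code from either repo:

```python
import random

# Illustrative constants; Wu's paper tunes these per game.
FULL_PLAYOUTS = 600    # expensive searches that produce training targets
CHEAP_PLAYOUTS = 100   # fast searches used only to advance the game
FULL_SEARCH_PROB = 0.25

def self_play_move(state, mcts_search, replay_buffer):
    """Pick a move; only record a policy target on full searches."""
    if random.random() < FULL_SEARCH_PROB:
        visit_counts = mcts_search(state, n_playouts=FULL_PLAYOUTS)
        replay_buffer.append((state, visit_counts))  # train on this one
    else:
        visit_counts = mcts_search(state, n_playouts=CHEAP_PLAYOUTS)
    # play the most-visited move (no exploration temperature shown here)
    return max(visit_counts, key=visit_counts.get)
```

The trick is that training targets only come from strong searches, while most
game positions are generated at a fraction of the cost.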

~~~
psandersen
This is really interesting, thanks for sharing!

I've been thinking about extensions to decision tree models that could get the
benefits of NNs and it seems like there are a few ideas floating around.

For example, Probabilistic Random Forests have some really interesting
properties for noisy datasets, e.g. "The PRF accuracy decreased by less than
5% for a dataset with as many as 45% misclassified objects, compared to a
clean dataset." -
[https://arxiv.org/abs/1811.05994](https://arxiv.org/abs/1811.05994)

PRFs might be a natural fit for RL, especially methods using Monte Carlo tree
search.

Speculating here, as I'm not adequately familiar with stochastic calculus, but
intuitively it seems like probabilistic decision trees could be made
differentiable, since the hard decision threshold in a tree could be made
continuous (i.e. every split is a logistic regression), which might enable
some really interesting applications. I personally dream of being able to
cleanly integrate decision trees and tools from NNs in something like Pyro for
a fully Bayesian model.
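
To make the soft-split idea concrete, here's a minimal sketch (illustrative
names, not from the PRF paper): replace the hard test `x < t` with a sigmoid
gate, so a decision stump becomes a differentiable mixture of its two leaf
values.

```python
import math

def soft_stump(x, threshold, left_value, right_value, temperature=1.0):
    """Differentiable one-split 'tree': a sigmoid gate instead of x < threshold.

    As temperature -> 0 this recovers the hard decision stump, so gradients
    can flow through the split parameters during training.
    """
    p_right = 1.0 / (1.0 + math.exp(-(x - threshold) / temperature))
    return (1.0 - p_right) * left_value + p_right * right_value
```

Stacking these gates depth-wise gives a fully differentiable tree, at the cost
of evaluating both branches.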

~~~
cgreerrun
A fast, powerful Bayesian model seems like it would be a game-changer. PUCT
(the heart of the AlphaZero MCTS that decides which action to choose) really
seems set up to model the action choices as a multinomial Bayesian inference
problem (it already updates the action priors with Dirichlet noise).
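
For reference, the PUCT rule I mean is roughly this (a sketch with
illustrative names, not code from any of the implementations discussed; the
Dirichlet noise is sampled as normalized independent gammas):

```python
import math
import random

def puct_choose(priors, visit_counts, q_values, c_puct=1.5, eps=0.25, alpha=0.3):
    """Sketch of AlphaZero's root action selection: Q(a) + exploration bonus.

    priors/visit_counts/q_values are dicts keyed by action. Dirichlet noise
    is mixed into the priors so rarely-explored actions still get tried.
    """
    noise = [random.gammavariate(alpha, 1.0) for _ in priors]
    total_noise = sum(noise)
    n_total = sum(visit_counts.values())
    best_action, best_score = None, -math.inf
    for (action, p), g in zip(priors.items(), noise):
        p_noisy = (1 - eps) * p + eps * g / total_noise
        # exploration bonus shrinks as an action accumulates visits
        u = c_puct * p_noisy * math.sqrt(n_total) / (1 + visit_counts[action])
        score = q_values[action] + u
        if score > best_score:
            best_action, best_score = action, score
    return best_action
```

The prior mixing step is exactly the spot where a Bayesian treatment of the
action distribution would slot in.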

Thanks for the link! I don't really know anything about the world of
probabilistic trees. I'll check it out.

The only Bayesian approach to decision trees I'm familiar with is BART
([https://projecteuclid.org/download/pdfview_1/euclid.aoas/127...](https://projecteuclid.org/download/pdfview_1/euclid.aoas/1273584455)).
I haven't used it, but since it updates the params with MCMC, I'm guessing
it's not super fast. I've seen it used in causality applications for partial
dependence plots, where it's convenient to convey the certainty of a
variable's effect.

~~~
gbrown
Check out the SoftBART method, I think there are interesting optimizations
possible there. XBart also looks promising as an approach.

------
tromp
The implementation includes Connect Four as an example application. While the
standard board size of 7x6 is indeed solved, as they note, and in fact all
sizes up to 8x8 are [1], they could have picked 9x8 or 9x9 which are currently
unsolved. The latter is the new standard size on Little Golem which upgraded
from 8x8 when that was solved.

[1] [https://tromp.github.io/c4/c4.html](https://tromp.github.io/c4/c4.html)

[2]
[http://www.littlegolem.net/jsp/games/gamedetail.jsp?gtid=fir](http://www.littlegolem.net/jsp/games/gamedetail.jsp?gtid=fir)

[3]
[http://www.littlegolem.net/jsp/forum/topic2.jsp?forum=80&top...](http://www.littlegolem.net/jsp/forum/topic2.jsp?forum=80&topic=77)

~~~
jonath_laurent
I completely agree with you. Let me just add two remarks. First, although
picking 9x9 boards does make Connect Four intractable for brute-force search,
I would be surprised if it made the game much more difficult for AlphaZero,
which relies on the generalization capabilities of the network anyway. Second,
using a solved game for the tutorial is a feature, not a bug: it allows
precise benchmarking of the resulting agent, as a ground truth is known.

~~~
tromp
I did not see an evaluation of how close to perfection the agent gets. Did you
compute any sort of error rate (by finding moves that turn a won position into
a non-won one, or a drawn position into a lost one)? And did you look at how
this error rate drops over time as learning advances? That would indeed be
very interesting to see.
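
Concretely, given a perfect solver (a hypothetical `solve(pos)` returning the
game-theoretic value +1/0/-1 for the side to move), counting blunders is
straightforward:

```python
def blunder_rate(moves, solve):
    """Fraction of moves that worsen the game-theoretic value.

    `moves` is a list of (position, position_after_move) pairs;
    `solve(pos)` returns +1/0/-1 from the side to move's perspective.
    """
    blunders = 0
    for before, after in moves:
        # the mover's value after the move is the negation of the
        # opponent's value in the resulting position
        if -solve(after) < solve(before):
            blunders += 1
    return blunders / len(moves)
```

Plotting this over training checkpoints would give exactly the error-rate
curve described above.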

~~~
vishvananda
My team did an implementation of alpha zero connect four a couple of years
ago. Our findings are in a series of blog posts starting at
[https://medium.com/oracledevs/lessons-from-implementing-
alph...](https://medium.com/oracledevs/lessons-from-implementing-
alphazero-7e36e9054191). We didn't manage to get to perfection either on
policy, but got pretty close. You can play against some versions of the
network here: [https://azfour.com](https://azfour.com)

~~~
jonath_laurent
Your series of blog articles has been an important source of inspiration in
writing AlphaZero.jl and I cite it frequently in the documentation. Thanks to
you and your team!

------
jonath_laurent
Author here: I am happy to answer any question you may have about
AlphaZero.jl. :-)

~~~
master_yoda_1
I am confused about the FAST part, it is faster than all the other
implementation (some of them are in c++) or it is just julia implementation
and you think it is fast? I am asking because if julia is faster than c++ for
ml/dl I would prefer to use it for production use cases.

~~~
jonath_laurent
This needs clarification indeed. As I explain in the documentation, the aim of
AlphaZero.jl is not to compete with hyper-specialized and hyper-optimized
implementations such as LC0 or ELF OpenGO. These implementations are written
in C++ with custom CUDA kernels and they are optimized for highly distributed
computing environments. They are also very complex and therefore pretty
inaccessible to students and researchers.

The philosophy of AlphaZero.jl is to provide an implementation of AlphaZero
that is simple enough to be widely accessible for students and researchers,
while also being sufficiently powerful and fast to enable meaningful
experiments on limited computing resources. It has the simplicity of the many
existing Python implementations, while being consistently between one and two
orders of magnitude faster.

More generally, the AlphaZero algorithm is extremely general and I think it
can find applications in many research domains (including automated theorem
proving, which is my own research area). I have been surprised to see that,
despite the general excitement around AlphaZero, very few people actually
tried to build on it. One explanation, I think, is the lack of accessible
open-source implementations. I am trying to bridge this gap with AlphaZero.jl.

~~~
O_H_E
Always great to see someone who finds something broken, and fixes it for
others to move forward.

Thank you.

------
FiberBundle
Does anybody know how long it would take to train an AlphaZero Go version
using one GPU? In [1] they claim that it took 13 hours until the model was
able to beat the original AlphaGo version, but they don't state what hardware
they used.

[1] [https://deepmind.com/blog/article/alphazero-shedding-new-
lig...](https://deepmind.com/blog/article/alphazero-shedding-new-light-grand-
games-chess-shogi-and-go)

~~~
arijun
I can’t find it now but iirc there was a blog post on HN about a month ago
that estimated their training costs at $25 million, using many TPU pods.

~~~
cgreerrun
Here was the guesstimate: [https://www.yuzeh.com/data/agz-
cost.html](https://www.yuzeh.com/data/agz-cost.html)

------
tbenst
First of all this is very cool. Dunno if author is on here, but I’m curious
why both Flux and Knet are used rather than just one of them (Flux seems the
most Julianic?).

Also, is this really faster than PyTorch/TF? Last time I benchmarked Flux for
non-trivial networks, the speed was quite good with small models but memory
usage was ~5x higher than pytorch, and I couldn’t fit my models on the GPU for
flux. For large models, I had to compromise on batch size in Julia, although
maybe with Zygote.jl the memory issues have been resolved?

~~~
jonath_laurent
I suspect Flux/Knet are still slightly slower and less memory-efficient than
PyTorch/TF, although things are moving very fast here!

This is not really relevant to understanding AlphaZero.jl's speed, though. The
reason it is much faster than Python implementations is that tree search is
also a bottleneck, and Julia shines there!

~~~
tbenst
Ah, I hadn’t appreciated this. Thanks for making & sharing your code!

------
metalwhale
Disclaimer: I'm not the author. Just want to share this awesome project.

------
likeaj6
This is awesome! I worked on a similar project in the past for the game Hex

Did a writeup here about it:
[https://notes.jasonljin.com/projects/2018/05/20/Training-
Alp...](https://notes.jasonljin.com/projects/2018/05/20/Training-AlphaZero-To-
Play-Hex.html)

[https://github.com/likeaj6/alphazero-
hex](https://github.com/likeaj6/alphazero-hex)

~~~
jonath_laurent
Actually, I found your blog article when I was reading about AlphaZero and I
found it useful!

------
mtgp1000
I don't know anything about Julia...how hard would this be to port to python
or a c-style language?

Edit: I was mainly asking because I was curious about the relative
expressiveness of Julia...

~~~
ViralBShah
I was going through this project over the weekend. And while I can't recall
where exactly in the docs I read this, I am quite sure the author mentioned
that there are various python projects but they are quite slow. Other
implementations such as leela chess zero have a lot of C++ and are difficult
to follow.

In fact, one of the things we want to do is maximize the performance of the
Julia implementation. We hope to co-develop the compiler and ML stack to
address these issues as they come up.

~~~
doublesCs
Truly truly thank you for your work <3

~~~
newswasboring
Not sure whether you are thanking jonath_laurent (the original author of the
package under discussion) or ViralBShah (co-creator of Julia). But I concur on
both counts :D

~~~
doublesCs
Viral. I thanked Jonath in a different post :-P

When I see projects like this (I mean especially Julia, but also people
sharing their work on packages like this) I feel very fortunate that elements
of the free software movement are still alive.

