
How much did AlphaGo Zero cost? (2018) - hamsterbooster
https://www.yuzeh.com/data/agz-cost.html
======
irjustin
AlphaGo Zero inspired the development of an open-source version, Leela Zero, from which Leela Chess Zero was forked.

Lots of people contribute what I imagine are significant amounts of compute/money to the Leela Chess Zero project[1].

Would love to see AlphaZero vs Leela Chess Zero.

[1] [https://training.lczero.org/](https://training.lczero.org/)

[edit] I've caused terrible confusion by conflating Leela Zero (the Go engine) and Leela Chess Zero. Leela Chess Zero was originally forked from Leela Zero, and that's basically where the similarities end.

Edited for a bit more clarity.

~~~
mikorym
AFAIK Garry Kasparov to this day does computer&human vs. computer&human chess
research, and it's far from a solved problem.

~~~
air7
Is human&computer better than computer only?

~~~
luisvictoria
Not really. Humans rarely provide insight (if any) that chess engines haven't already considered. Deep Blue could evaluate 200 million positions... per second. And that was in 1997.

The rare times an engine gets funky are usually in endgame positions where it can't find the sacrifice that wins the game and evaluates the current position as drawn. These cases are few, and I very much doubt a human would find those moves in an actual match.

Now if you’re talking about the way a chess engine learns, there are two approaches: without human help (learning entirely on its own, given nothing but the rules, which is how AlphaGo Zero works), or with human aid (through chess theory accumulated over centuries of human play, which these engines build into their evaluations). Things get very interesting.

I’d recommend looking up a few games between AlphaZero and Stockfish, which embody these two philosophies and fight tooth and nail. The matches are brilliant. I would say, though, that AlphaZero (which learned the game entirely from scratch, without human help) has triumphed more often than Stockfish, and given the nature of these systems I’d expect that trend to continue.

~~~
marksweston
I'm not sure it's right to characterise Deep Blue or Stockfish as repositories
of human chess theory. Fundamentally they were all based on a relatively
simplistic function for calculating the value of a board position combined
with the ability to evaluate more board positions further into the future than
any human possibly could (plus a database of opening moves). That approach
seems thoroughly non-human, and represents a victory of tactical accuracy over
chess theory or strategy.

However, I agree that the games between AlphaZero and Stockfish are really interesting. It strikes me that AlphaZero's version of chess looks a lot more human; it seems to place value on strategic ideas (activity, tempo, freedom of movement) that any human player would recognise.

~~~
luisvictoria
I think you're right. I meant to say that chess engines usually have opening books built in, which derive from human chess theory, but you're absolutely right that they don't play in a human style.

It's kind of crazy how AlphaZero has managed the success it has. Stockfish evaluates roughly 60 million positions per second, while AlphaZero evaluates only about 60 thousand. Three orders of magnitude fewer, yet its play is mesmerizing, tearing Stockfish apart in certain matches.

------
conistonwater
Are they using the on-demand price instead of the preemptible price? It seems
like the sort of job that can run on preemptible machines, just because it's a
batch job. Also, should the cost really be calculated using public market
prices at all, as opposed to the running costs of the TPUs? It is not
guaranteed at all that the opportunity cost to Google of using all those TPUs
is equal to the price that you or I would pay Google to use them. I understand
it cost a lot, but I'm not convinced by the headline figure of $36M.

~~~
Certhas
The article actually addresses that. The precise number is not the point but
the ballpark is:

"In terms of actual cost to DeepMind (a subsidiary of Google’s parent company)
to run the experiment, there are other factors that need to be taken into
account, such as researcher salaries, or that the quoted TPU rate probably
includes a healthy amount of margin. But for someone outside Google, this
number is a good ballpark estimate of how much it would cost to replicate this
experiment."

~~~
conistonwater
KataGo and Leela Zero and all the other AIs certainly didn't cost that much (the people running them wouldn't have had that kind of money and resources) and are probably stronger than AlphaGo Zero. I don't think it's at all fair to say _this number is a good ballpark estimate of how much it would cost to replicate this experiment_. It's wrong as a calculation of Google's costs, it's wrong per the title _How much did AlphaGo Zero cost_, and it's also wrong as an estimate of the cost of replication.

~~~
qayxc
> KataGo and Leela Zero and all the other AIs certainly didn't cost that much

And you base that claim on what, exactly? Leela Zero is trained by the community, which donates self-play resources. Just because you outsource your costs to volunteers doesn't mean it's free!

To get a realistic estimate, you'd need the average cost of electricity, the hardware cost (proportionate to use), and of course the opportunity costs.

Since you cannot do that, I'd argue that you have no idea what the true training costs of these projects are compared to on-demand cloud prices.

~~~
conistonwater
Go is a small community, and there's just no chance they managed to donate
$36M worth of anything, whatever the electricity and hardware costs, that's
just too much money. KataGo's page says it was developed using resources
donated by Jane Street, the company that its developer worked for; the general
magnitude of numbers is also way off: sure they're a quantitative trading
firm, but it's implausible that they'd donate $36M to develop a go AI.

~~~
qayxc
The cost estimate was based on what you and I would have to pay if we were to train AlphaGo Zero in 40 days on the given hardware, using the reported number of games and resources.

Just replace the self-play TPU resources with commodity hardware or even just
cheaper GPU compute providers and you'd reduce cost 10-fold by just not using
TPUs. Same goes for the number of self-play games.

That still doesn't change the estimate itself. If the other projects had used Google TPUs, they'd likely have cost about the same as the estimate.

I really don't understand what you're trying to argue against here.

------
tech-historian
Achieving a new breakthrough in computing is often very expensive. Deep Blue is estimated to have cost IBM over $100 million over a decade [1].

And in comparison to large tech company R&D budgets, the amount cited in the
article is a drop in the bucket. Consider the fact that Google spent $26
billion in R&D budget in 2019 alone [2]. Microsoft spent almost $17 billion
[3].

[1] [https://www.extremetech.com/computing/76552-project-deep-
bli...](https://www.extremetech.com/computing/76552-project-deep-blitz-chess-
pc-takes-on-deep-blue/2)

[2] [https://www.statista.com/statistics/507858/alphabet-
google-r...](https://www.statista.com/statistics/507858/alphabet-google-rd-
costs/)

[3] [https://www.statista.com/statistics/267806/expenditure-on-
re...](https://www.statista.com/statistics/267806/expenditure-on-research-and-
development-by-the-microsoft-corporation/)

~~~
sukilot
Note that nearly everything in software development is R&D. Building (not operating) Gmail, Android, Office, and Azure is R&D.

~~~
geodel
Right. That's the Development part of Research and Development.

~~~
mehh
And they get tax breaks, so they're motivated to class as much as possible as R&D.

------
hjnilsson
Another way of thinking about how efficient the brain is: By the article’s
numbers, about 5.5 million TPU hours were required to train the machine to
play as well as a Go champion.

A Go champion might have trained for 8 hours a day for 15 years (age 5 to 20). That is about 40,000 hours.

In other words, machines required 137 times longer to learn the game, and at
twice the power consumption! There is still a lot of room for improvement.
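For anyone checking the arithmetic, a minimal sketch using the article's ~5.5M TPU-hours and the (assumed) human training schedule above:

```python
# Ratio of machine training time to a human champion's study time.
tpu_hours = 5_500_000        # from the article's figures, per the comment
human_hours = 8 * 365 * 15   # 8 h/day for 15 years ~= 43,800 h

print(round(tpu_hours / human_hours))   # ~126x with the exact figure
print(tpu_hours / 40_000)               # 137.5x with the rounded 40,000 h
```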

~~~
phreeza
This comparison is not entirely fair because the human brain also benefits
from priors baked in over the entire course of evolution.

~~~
andbberger
That's a pretty big claim. One could argue that the topology of the brain is a
prior, analogous to the architecture of a neural net. But considering that we
really have no idea how learning happens in the brain on a large scale, you
really can't say.

~~~
PeterisP
I believe there's a reasonable (but unprovable) assumption that any game humans usually play - because its ruleset is learnable and interesting for humans - implicitly relies on priors of human brains and behavior.

The space of possible games is huge (infinite?), but only a tiny subset of
these games could reasonably become a popular game for humans.

E.g. it's not an arbitrary coincidence that the scoring rule for every grid intersection in go is the same (it could, in principle, vary in an arbitrary pattern); that uniformity keeps the ruleset small enough for humans to learn.

It's not an arbitrary coincidence that playing go involves pattern recognition on some level; that's what we're good at and find interesting in many games.

It's not an arbitrary coincidence that in a Mario game the sprite falls back down after jumping; that reuses priors from real-world physics.

~~~
andbberger
Games are designed to be fun and playable by humans, so this doesn't seem surprising.

------
sorenbouma
I might be wrong, but I think this cost calculation is way off:

Their running cost estimate for a single TPU in a machine with 4 "TPUs" is based on the price of a Cloud TPU v2-8, but a v2-8 is actually 4 ASICs on one board.

Also, given that publication coincided with the v2 announcement, and that the TPU is only used for inference while GPUs are used for training, I think self-play was likely done on TPU v1s, which use 5x less power per ASIC and so are likely much cheaper.

I also think the way they calculated the number of TPUs required is wrong. It looks like they assume one machine with 4 TPUs makes one move in 0.4 seconds, but since making one move only requires a forward pass through a moderately sized CNN with a 19x19 (tiny) input, one TPU should be able to make thousands of moves per second in parallel.

~~~
brilee
Making one move requires 1600 MCTS playouts to explore the game tree, so it's a 1600-to-1 correspondence between forward passes and moves played.
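A toy sketch of that relationship (hedged: AlphaGo Zero's real MCTS selects nodes via PUCT and expands a search tree; `forward_pass` here is a random stand-in for the policy/value network, and the only point is the evaluation count per move):

```python
import random

def forward_pass(position):
    """Stand-in for the policy/value network: returns a fake value."""
    return random.random()

def choose_move(position, candidate_moves, playouts=1600):
    """Toy move selection: every playout costs one network evaluation."""
    evals = 0
    total = {m: 0.0 for m in candidate_moves}
    visits = {m: 0 for m in candidate_moves}
    for _ in range(playouts):
        m = random.choice(candidate_moves)  # real MCTS selects via PUCT
        total[m] += forward_pass((position, m))
        visits[m] += 1
        evals += 1
    best = max(candidate_moves, key=lambda m: total[m] / max(visits[m], 1))
    return best, evals

move, evals = choose_move("empty board", ["A", "B", "C"])
print(evals)  # 1600: one move played, 1600 forward passes
```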

------
ipsum2
AlphaGo Zero*, which was trained from scratch, without human games.

I've also heard rumors that AlphaStar
([https://deepmind.com/blog/article/alphastar-mastering-
real-t...](https://deepmind.com/blog/article/alphastar-mastering-real-time-
strategy-game-starcraft-ii)) was essentially put on hold because it was too
expensive to improve/train. The bot wasn't able to beat StarCraft champions
and _only_ got to a grandmaster level.

~~~
Symmetry
AlphaGo Zero combined deep learning with a classical AI framework, Monte Carlo tree search. Without a similarly effective framework, it's not surprising that AlphaStar wasn't able to achieve similar success. I've watched a lot of AlphaStar replays, and the lapses in forethought glare through pretty regularly, though other aspects of its judgement (not just reflexes!) seem frankly superhuman.

~~~
embrassingstuff
Why wasn't MCTS (or some search variant) used in AlphaStar?

(Sure, you'd need some way to roll the StarCraft world forward and back, but for Atari, MCTS was shown to be an order of magnitude more efficient.)

I've also seen comments that the search width is too large, or maybe it's an academic-purity consideration?

------
trashburger
The amount was removed from the submission title, which sucks if you're like
me and don't like to visit yet another possibly JS-heavy site and drain your
battery.

For others: It's $36M.

~~~
ramraj07
Are you working off one solar panel on the way to Mars? First time I'm hearing
battery drain as a reason to not visit a js site.

~~~
qayxc
Phones. F'in phones man :D

------
skywhopper
I'll quibble with a little bit of this.

"AlphaGo Zero showed the world that it is possible to build systems to teach
themselves to do complicated tasks."

It didn't do any such thing. The game of go has a huge number of potential moves and outcomes, but the rules themselves are trivial, the board position can be encoded in a handful of bytes, and gameplay always and only progresses in one direction. And judging a good vs. bad outcome is just a matter of comparing two numbers.

Go is challenging and interesting for humans, but it's not remotely as
"complicated" as driving a car or translating a language.

------
jonplackett
It’s a shame that this ‘Next big thing’ is the complete opposite of the
internet. Instead of opening up the world for anyone to create things, letting
small companies compete with large, it is only going to concentrate power with
the richest companies and leave small companies unable to get involved.

~~~
dcolkitt
“Well sure, AlphaZero looks impressive but I predict that within one hundred
years, deep learning systems will be twice as powerful, ten thousand times
larger, and so expensive that only the five richest kings of Europe will own
them.”

------
gridlockd
It's estimated at $36 million for _someone else_ to train AlphaGo Zero, assuming they use Google TPU instances and pay the sticker price.

Google isn't bearing that cost, unless we assume they prioritize AlphaGo to the point of turning away paying customers 100% of the time.

It's far more likely that AlphaGo is trained on spare capacity; the hardware cost is sunk anyway, so only the cost of upkeep is real.

~~~
pixelpoet
> It's way more likely that AlphaGo is trained on spare time, the cost for the
> hardware is sunk anyway, so only the cost for upkeep is real.

Not quite: power is expensive, and basically all modern computers use far less power at idle than when fully saturated with multiply-add instructions and perfect memory streaming.

Having said that, I agree that there is a substantial cost efficiency gain if
they can schedule it during periods of inactivity.

------
zucker42
I wonder if the code and network weights will ever see the light of day. I wonder what the eventual value proposition of working on this sort of thing is. I suppose they'll just try to apply the algorithms to better things.

I've been interested in the application of AlphaZero to chess. It's sad that so many resources were devoted to something we can't even use to play chess right now. Leela (the open-source reimplementation) is really strong, but the crushing results presented in the AlphaZero paper never materialized. And this article just shows how hard they are to replicate.

~~~
RobertoG
>>" I wonder what the eventual value proposition of working on this sort of
stuff is."

It seems to me that, if you only take it as a marketing operation, it has been
already very valuable.

~~~
hobofan
I'd really love to know how big the marketing impact of this was on IBM Watson. Somehow IBM planted the idea that "Watson is the best available AI" so firmly in the heads of the general population that 50%+ of my non-tech friends thought it was Watson that beat the Go pros when I talked with them about it while it was in the news.

------
Lucasoato
> Each move during self-play uses about 0.4 seconds of computer thinking time.

> Over 72 hours, 4.9 million matches were played.

One of these claims must be incorrect or misinterpreted. I highly doubt they used as many TPUs as the article claims; that would be not only impractical but would also raise a lot of other issues: networking, disk speed, etc.

My comment isn't against the article; if anyone can confirm they used that many TPUs in parallel, feel free to post it.

~~~
MauranKilom
72 hours is 259,200 seconds.

Playing 4.9 million matches of ~100 plies each, at 0.4 seconds per ply, is 196,000,000 seconds of compute.

That's < 1000 TPUs. Sounds big, but not too-large-for-Google big. But other comments here say that the 0.4-second number is also wrong (and in fact significantly lower).
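The arithmetic, spelled out (assuming ~100 plies per game, as above, and the article's 0.4 s per move):

```python
seconds_in_72h = 72 * 3600      # 259,200 s
matches = 4_900_000
plies_per_match = 100           # assumed average game length
seconds_per_ply = 0.4           # the article's figure

total_compute = matches * plies_per_match * seconds_per_ply  # 196,000,000 s
machines = total_compute / seconds_in_72h
print(round(machines))  # ~756 machines playing in parallel
```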

------
NVHacker
The (synchronous) 0.4s per move number is misleading (and wrong), that's not
what the paper is saying. The "footnote 1" of the article is wrong.

------
tinco
Our main compute doesn't go towards machine learning, but we do rely heavily
on GPU power. I recently had to come up with the figures for us to invest in
an expansion of our compute power, and it turned out that buying the machines
ourselves would be cheaper than renting them from Google in 3-4 months.

We don't run on those fancy V100 cards though; regular old gaming cards suffice, and I suppose if we bought the "industrial" Nvidia versions it would take a bit longer to recoup, but still definitely within the year.

Anyway, what I'm saying is that it's probably possible to do this a lot cheaper than $36M, though maybe not in such a short time. Our startup is extremely cash-intensive, and I bet machine learning companies are as well (I suppose machine learning experts aren't cheap ;)), so if we can put in some work and shave a big portion off our hardware costs, that really goes the distance.
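A hypothetical break-even sketch of that buy-vs-rent reasoning; the numbers below are made up for illustration (the comment doesn't give its actual figures):

```python
# Hypothetical buy-vs-rent break-even; all figures are assumptions.
gpu_purchase_cost = 1_500      # one consumer GPU, USD (assumed)
cloud_rate_per_hour = 0.60     # assumed cloud GPU rate, USD/hour
hours_per_month = 24 * 30      # running flat-out

months_to_break_even = gpu_purchase_cost / (cloud_rate_per_hour * hours_per_month)
print(f"{months_to_break_even:.1f} months")  # ~3.5 months at these rates
```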

~~~
gridlockd
How Google started out: Run servers from a shack, using commodity PC hardware

How modern startups start out: Spend $50,000/month to run hundreds of
microservices on a managed Kubernetes cluster

~~~
est31
Which of today's startup founders got into Stanford's PhD program as teenagers? Those people are extremely rare, and there is far too much VC money floating around to focus only on such talent. Most founders, while still outstanding, are not from that class of person. Also, with the larger sums of money being thrown around, people can just buy those things now and spend their time growing their business instead of optimizing it. In the VC-funded environment, businesses that optimize lose out to competitors that grow the fastest.

------
vadarvariu
Now, consider that this is the cost of the _final_ model reported in the
paper. This doesn't account for all the iterations of trying out e.g.
different model architectures, hyperparameter sweeps, etc. The true cost of
the experimentation is likely at least an order of magnitude higher.

------
jaekash
> This accomplishment is truly remarkable in that it shows that we can develop
> systems that teach themselves to do non-trivial tasks from a blank slate,
> and eventually become better than humans at doing the task.

"Non-trivial" is a bit of a red herring here. Playing go is pretty trivial compared to something like walking or scratching your face. Winning at go may be non-trivial compared to those in some ways, but it is very trivial in comparison in others.

~~~
PeterisP
Playing go _at that level_ is non-trivial compared to walking because (a) most
humans can walk, but not even the best human go masters can play go at that
level; (b) we had algorithms that allow bipedal robots to walk long before we
had algorithms for playing go at that level.

~~~
simiones
Do we have algorithms that allow bipedal robots to walk _at human level_? Or run at, say, the level of an average 10th-grade student?

~~~
PeterisP
Bipedal walking and running on uneven terrain was demonstrated years ago - Boston Dynamics has a lot of nice videos. Here's a four-year-old video on walking: [https://www.youtube.com/watch?v=rVlhMGQgDkY](https://www.youtube.com/watch?v=rVlhMGQgDkY), and here are two-year-old videos on running: [https://www.youtube.com/watch?v=vjSohj-Iclc](https://www.youtube.com/watch?v=vjSohj-Iclc) and [https://www.youtube.com/watch?v=LikxFZZO2sk](https://www.youtube.com/watch?v=LikxFZZO2sk). Perhaps it's debatable whether it's "full human level", but it seems sufficient to me for most purposes. IMHO any improvements there would mostly be to sensing (seeing what terrain is there even in poor conditions, e.g. fog/rain/snow) and environment modelling (understanding which objects might break or slide if you step on them), not walking/running as such.

The main problem that keeps bipedal robots impractical is hardware expense (wheels are simpler and cheaper) and the power supply required. For most use cases it's more efficient to use something other than a bipedal robot, so there's limited business application and future revenue in scaling research demos of bipedal walking up to practicality. As a result, most people working on walking algorithms do so in simulated virtual environments (where we have algorithms that can learn walking and running "from scratch" through experimentation) rather than building very expensive hardware.

~~~
username90
> it seems sufficient to me for most purposes.

Current self-driving car technology is sufficient for most purposes, except actually driving on roads. So for those walking robots: can they run, or even walk, through a crowd without hitting people? A normal 15-year-old human can, and that's the level you need to reach before releasing it among people.

~~~
tokai
I'll be impressed when they can finish an orienteering course.

------
ggm
Does Sarbanes-Oxley apply to zero-rating ML costs? AlphaGo might have an unfair kyu ranking if Google doesn't have to "pay" to acquire rank. (95% joking)

------
FartyMcFarter
> The power consumption of the experiment is equivalent to 12,760 human brains
> running continuously.

Given that the experiment lasted just days, this actually sounds pretty impressive to me.

Many humans studied the game for a big portion of their lives in order to get
Go knowledge where it is.

------
amelius
I'd like to see an AI play Monopoly (the board game) against CEOs of large
companies.

------
antris
I wonder if there's some kind of software that takes advantage of an AI to teach non-beginner players Go. E.g. you could play against the bot, and the AI would translate your mistakes into things you can improve on.

------
phonebucket
This is a lot of money.

However, if you want to reliably make an AI the best in the world at a range
of complicated tasks, can you reasonably expect this to be cheap?

------
raverbashing
Wondering when researchers will switch from "race to the moon" mode to looking
at better optimization techniques instead of just throwing money at the
problem.

I know some companies are doing that, but I think looking at AlphaGo or AGZ
and making it go faster should be an interesting problem in itself.

~~~
sanxiyn
KataGo optimized AlphaZero and achieved 50x compute reduction.

[https://arxiv.org/abs/1902.10565](https://arxiv.org/abs/1902.10565)

------
ksec
The interesting part to me, rather than cost, is Energy usage.

>The power consumption of the experiment is equivalent to 12,760 human brains
running continuously.

But the problem is that this "brains" unit for AlphaGo Zero doesn't seem to take into account the GPUs, CPUs, and memory involved. It only counts the TPU numbers.

Then there is another problem.

> a TPU consumes about 40 watts,[1]

The TPU referred to was a first-gen TPU built on 28 nm and running at 40 W, more of a proof of concept. Google is currently on Cloud TPU v3 [2]; the latest-generation Cloud TPU v3 Pods are liquid-cooled for maximum performance, and each TPU v3 is actually a four-chip module [3]. If a single chip is 100 W, that's 400 W per TPU.

Edit: It turns out Wikipedia lists TPU v3 at 250 W [4]. Not sure if that's 250 W per chip or 250 W for all 4 chips.

That is on the assumption that they are very high-powered and hence require liquid cooling, although that might not always be the case.

So adding the CPU, GPU, memory, and TPU figures together, the original estimate of 12,760 human brains may be off by a factor of 10, if not more.
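To make that adjustment concrete (assumptions: ~20 W for a human brain, the article's 40 W first-gen TPU figure, and Wikipedia's 250 W TPU v3 number, whose per-chip vs. per-board meaning is unclear):

```python
brain_w = 20          # commonly cited rough power draw of a human brain
tpu_v1_w = 40         # first-gen TPU figure the article used
tpu_v3_w = 250        # Wikipedia's TPU v3 figure (per chip or per board?)

brains = 12_760                        # the article's headline equivalence
implied_tpus = brains * brain_w / tpu_v1_w
print(implied_tpus)                    # 6380.0 TPUs at the 40 W figure

# The same hardware count at the v3 power figure scales the "brains"
# equivalence by 250/40 = 6.25x, before even counting CPU/GPU/RAM:
print(brains * tpu_v3_w / tpu_v1_w)    # 79750.0
```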

Still pretty impressive. Considering we now only get about a 1.8x improvement with each process node, we'd get about 19x by 2030 (assuming the same algorithm). Which means AI is good, but the human brain on its own is still very much magical in its efficiency :)

Correct me If I am wrong on the numbers.

My other question is: that was how much energy it used to learn Go, but how much energy does it use during a game?

How would AlphaGo Zero perform if it was limited to 20W?

[1] [https://cloud.google.com/blog/products/gcp/an-in-depth-
look-...](https://cloud.google.com/blog/products/gcp/an-in-depth-look-at-
googles-first-tensor-processing-unit-tpu)

[2] [https://cloud.google.com/blog/products/ai-machine-
learning/g...](https://cloud.google.com/blog/products/ai-machine-
learning/googles-scalable-supercomputers-for-machine-learning-cloud-tpu-pods-
are-now-publicly-available-in-beta)

[3] [https://techcrunch.com/2019/05/07/googles-newest-cloud-
tpu-p...](https://techcrunch.com/2019/05/07/googles-newest-cloud-tpu-pods-
feature-over-1000-tpus/)

[4]
[https://en.wikipedia.org/wiki/Tensor_processing_unit](https://en.wikipedia.org/wiki/Tensor_processing_unit)

------
Kronen
I would be more interested in how much LCZero cost.

------
seb314
The article doesn't seem to consider the cost of hyperparameter optimization prior to the final training...

------
magwa101
Ahem, "one-time cost"

------
angel_j
A: $400M—to acquire DeepMind

------
lihaciudaniel
You need $36M to beat the best Stockfish engine, which can run on a small computer. Who won?

------
justplay
Sorry if this is off topic, but I want to learn AlphaZero from the beginning. I have a little understanding of machine and deep learning, including vision recognition, but unfortunately I'm not able to understand how Monte Carlo tree search is used for decision making. Where can I start, and what should I learn, so that I can understand AlphaGo (or the OpenAI Five Dota 2 bot)?

Thanks

------
rurban
Misleading numbers and wrong calculations. The TPUs and CPUs cost them almost nothing, as they build and use them anyway; tying them up for this PR stunt just cost them the missed rental time, if customers would really have paid that much. Maybe around $20,000. Energy cost? I don't see much additional cost, since those machines run all the time, whether improving the model or doing something else.

I bet the much higher cost was the PR side: the film team, press support, TV team, travel, inviting the expert Go players, building the stage, and so on. Estimated $100,000.

Not counting the man-hours; they were just doing their normal jobs.

~~~
melbourne_mat
"renting them out for this PR stunt just cost them the missed rental time"

Because renting them out generates no revenue, right?

"Maybe around 20.000"

At least the article used a formula for the calculation. You just picked a
number at random.

~~~
rurban
You need actual customers paying that much, not hypothetical ones. That was my educated guess.

And the title was "How much did it cost", not "How much would it cost".

