How much did AlphaGo Zero cost? (2018) (yuzeh.com)
214 points by hamsterbooster on June 11, 2020 | 175 comments

AlphaGo Zero inspired the development of an open source version, Leela Zero, from which Leela Chess Zero was forked by the same guy who made Stockfish.

Lots of people contribute what I imagine are significant amounts of CPU power/money to the Leela Chess Zero project [1].

Would love to see Alpha Chess vs Leela Chess.

[1] https://training.lczero.org/

[edit] I've caused terrible confusion by melding Leela Go and Leela Chess, when Leela Chess was originally forked from Leela Go and that's basically where the similarities end.

Edited for a bit more clarity.

The great thing about these community driven efforts is that it is indeed feasible to reproduce these super expensive efforts. I'm a bystander now, as new maintainers have taken over, and they are doing a fantastic job pushing things forward.

This is also how Stockfish got to be the #1 engine. By being open source, and having the testing framework (https://tests.stockfishchess.org) use donated computer time from volunteers, it was able to make fast, continuous progress. It flipped what was previously a disadvantage (if you are open source, everyone can copy your ideas), into an advantage - as you can't easily set up a fishtest like system with an engine that isn't already developed in public.

I think KataGo is stronger than Leela Zero.


I was expecting another boring clone, but KataGo looks like a cool project with nice new ideas! Thanks for sharing.

It is indeed independent and has a lot of nice ideas and theory contributions. It turned up after I stopped actively competing in computer Go, so I can't tell how strong it is. It certainly has a lot of potential, and I'd probably pick it as a base to work from nowadays.

KataGo is also cool for those of us who use bots to review our human games, because it has a score estimator. With Leela, a 0.5-point win and a 20.5-point win in the endgame can both amount to a 99.5% chance of winning, but to us (amateur) humans that is not true.

There is a Go client built for that purpose and distributed with KataGo: https://github.com/sanderland/katrain. It ships as a pip package, making it very frictionless to install.

With a tool like this, instead of waiting for a Go pro to visit our local Go club to review our kifu, I can have my game reviewed move by move until the end (not only the first n moves).

I can still have questions for pros, but they would be more specific.

Now, playing a superhumanly intelligent bot can be unfun. No matter how much effort you put into a move, your position will just keep getting worse with every move.

Another important use-case is that the AI can also tell you if a joseki is actually joseki, and how to refute a bad joseki move.

I'd love to see some CPU/GPU/TPU donations to @chewxy's "agogo"[1] which is AlphaGo (AlphaZero) re-implemented in Go as a sort of proof of concept / demo of Gorgonia[2].

[1]: https://github.com/gorgonia/agogo

[2]: https://github.com/gorgonia/gorgonia

As far as I know Gian-Carlo Pascutto is not among the original authors of Stockfish, though he did work on chess engines.

Perhaps you were confused because Leela Chess Zero was forked from Leela Zero (neural network Go engine by Pascutto) but it includes Stockfish's move generation logic.

I think OP was referring to Gary Linscott, who has made major contributions to Stockfish and also created an adaptation of Leela Zero for chess, now living under the Leela Chess Zero GitHub org. Apparently a different adaptation is now the officially sanctioned one; at least, his commits don't show up in the new lc0 repo.



He's also well known for his absolutely amazing work on audio fingerprinting.

There are several people who wrote Stockfish, including Tord Romstad, whose Glaurung was the initial base for Stockfish.

Glaurung was pretty innovative at the time.

AFAIK Garry Kasparov to this day does computer&human vs. computer&human chess research, and it's far from a solved problem.

If by "it's far from a solved problem" you mean that chess isn't a solved game, that's true.

But Kasparov and others have given up on the idea that a human provides any unique insight into chess anymore. Computers are just better.

You are right if by "better" you mean "competitively stronger at tournament or rapid conditions". Humans are still way stronger strategically and competitively if given enough time and resources to avoid tactical mistakes. So yes, humans still provide unique insight into chess every day in correspondence chess or analytic research.

Humans aren't stronger strategically anymore either, under any conditions.

In 2014 a heavily handicapped Stockfish beat the 5th ranked player in the world (Nakamura) under tournament conditions despite no access to its opening or closing books and a one pawn handicap.

The match you are referring to was played under tournament conditions that clearly handicapped the human Grandmaster. I read from the report of the match [0] that "The total time for the match was more than 10 hours [...] The two decisive games lasted 147 and 97 moves." These unfavourable conditions clearly penalized the human, so the result can hardly be taken as meaningful regarding strategic superiority. From the quietness of my room I instead regularly find strategic plans that overcome my and my opponent's computers. Feel free to join the correspondence chess federation [1] to experience the joy and pain of strategic research!

[0] https://www.chess.com/news/view/stockfish-outlasts-nakamura-... [1] www.iccf.com

That's absurd. He works on human+chess research. He obviously hasn't given up on human insight.

What he has given up on is a single human beating a computer.

I haven't seen any recent (last 5 years or so) articles by him showing he's doing any work on it. There's a few recent articles where he talks about how humans should work with machine learning, but nothing specific.

When he recently came out of chess retirement he didn't talk about it at all in either 2017 or 2019:



There's nothing recent about him on https://en.wikipedia.org/wiki/Advanced_chess

Can a human + computer beat just a computer?

I can't imagine a human doing anything besides making things worse, or at best making no difference.

It's much more difficult than it used to be, but I think there is still some value to human guidance, more as a "referee" than anything else.

Right now we have essentially two top-tier engines: traditional brute force with alpha-beta pruning (Stockfish), and ML (Leela). Both alone are incredibly strong, but they are strongest and weakest in different types of positions. A computer chess expert, who knows what kinds of positions favor Stockfish and what kinds favor Leela, could act as a "referee" between the two engines when they disagree, and simply accept the move when they are unanimous.

Ten years ago, a grandmaster driving a single engine could typically beat an equal strength engine. I don't think that's the case anymore.

But I think if you had someone who is an expert at computer chess (not so much a chess grandmaster), and you gave them Leela AND SF, and let them pick which one to use in case of conflicts, they would score positive against either Leela or Stockfish in isolation.

Larry Kaufman designed his new opening repertoire book by doing exactly this -- running Leela on 2 cores + GPU, and stockfish on 6 cores, and doing the conflict resolution with his own judgement.
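The referee workflow described above can be sketched in a few lines. This is a toy illustration only: the two engine functions and the position classifier are stand-in stubs, not real Stockfish/Leela bindings.

```python
# Toy sketch of the "referee" idea: two engines propose moves, and a
# human-style judgment decides whose suggestion to trust when they
# disagree. All three callables here are hypothetical stubs.

def stockfish_move(position):
    # Stub for a tactical, brute-force engine's suggestion.
    return position["tactical_best"]

def leela_move(position):
    # Stub for a neural-network engine's suggestion.
    return position["positional_best"]

def referee_move(position, is_tactical):
    """Accept unanimous suggestions; on disagreement, defer to the
    engine whose style suits the position, per the human referee."""
    sf, lc = stockfish_move(position), leela_move(position)
    if sf == lc:
        return sf  # both engines agree: simply accept the move
    return sf if is_tactical(position) else lc

# A sharp position where the engines disagree: the referee judges it
# tactical and sides with the brute-force engine's suggestion.
pos = {"tactical_best": "Nxf7", "positional_best": "c4", "open_lines": 5}
print(referee_move(pos, lambda p: p["open_lines"] > 3))  # -> Nxf7
```

In a real setup the stubs would be replaced by actual UCI engine processes, and the classifier by the computer-chess expert's judgment; only the conflict-resolution structure is the point here.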

The human can certainly no longer pull his own moves out of thin air, though.

Computer mastery of Go has reached the point where it is a difficult task for an expert (read: grandmaster level) human to even follow what is happening. It is totally implausible that a human could resolve a conflict between top engines in a meaningful way.

It is unlikely that Chess is any different. Any superficial understanding by a human of which move is 'better' is just ignorance of the issues around evaluating a position. If you have statistical evidence that is something. 'But I think' is not evidence.

It might be entertaining to have a human involved. It isn't going to help with winning games.

I can't speak for Go, but in chess the best players in the world still understand the nuances of a position better than the computer engines and, if occasionally proven wrong by the computer analysis, are able to understand the refutation and refine their strategic evaluation. I know this because it's what I've been doing for the past seven years in the realm of correspondence chess to gain the title of international master.

I don't even know the rules of Go, but I am a long-time chess enthusiast with a decent, but not top-level, understanding of chess (I am a FIDE master, I also play correspondence chess, which is human+engine, and I have a keen interest in computer chess).

I can absolutely guarantee you that a human (who is an expert in computer chess, someone like Larry Kaufman) + engines will beat a single engine over the long run. With current tech and computing power, this is ONLY because we have brute-force (with alpha-beta pruning) and ML engines that are at near-equal strength, have strengths and weaknesses in different types of positions, and those strengths and weaknesses are understandable.

If we did not have AlphaZero, I don't think the human would be able to add anything at all currently.

He recently spoke quite dismissively of computer-augmented chess on the Lex Fridman podcast. Essentially, the computer knows best... so computer and human isn't meaningfully different from computer and rubber stamper.

I strongly disagree. The best correspondence chess players often improve on the computer suggestions. It takes time, energy and great strategic knowledge, but it's still possible.

Source: I’m a correspondence chess international master

Maybe the future of computer augmented chess is to form teams of a computer, a person, and a dog. The computer comes up with chess moves. The person feeds the dog. The dog makes sure the person does not touch the board.

Is human&computer better than computer only?

Not really. Humans barely provide any insight that chess engines don't already consider. Deep Blue could evaluate 200 million positions... per second. And that's from 1997.

The few and rare times an engine gets funky are usually in endgame positions, where the engine can't seem to find a sacrifice to win the game and will evaluate the current position as drawn. These cases are few, and I very much doubt that a human would be able to find these moves in an actual match.

Now if you’re talking about the way the chess engine learns, it can learn in two different ways: without human help (learning completely on its own giving it nothing but the rules which is how AlphaGo works), or with human aid (through chess theory accumulated over centuries of human matches that these engines have built in as part of their evaluations). Things get very interesting.

I'd recommend you look up a few games between AlphaGo and Stockfish, which embody these two different philosophies and battle it out tooth and nail. The matches are brilliant. I would say, though, that AlphaGo (learning the game entirely from scratch without human help) seems to have triumphed more times than Stockfish, and given the nature of these systems, I'd expect that trend to continue.

I'm not sure it's right to characterise Deep Blue or Stockfish as repositories of human chess theory. Fundamentally they were all based on a relatively simplistic function for calculating the value of a board position combined with the ability to evaluate more board positions further into the future than any human possibly could (plus a database of opening moves). That approach seems thoroughly non-human, and represents a victory of tactical accuracy over chess theory or strategy.

However I agree that the games between AlphaGo and Stockfish are really interesting. It strikes me that the AlphaGo version of chess looks a lot more human; it seems to place value on strategic ideas (activity, tempo, freedom of movement) that any human player would recognise.

I think you're right. I meant to say that chess engines usually have book openings built into them, which derive from human chess theory, but you're absolutely right that they don't play in a human way.

It's kind of crazy how AlphaZero has managed the success it has. Stockfish calculates roughly 60 million moves per second while AlphaZero calculates only 60 thousand per second. Three orders of magnitude fewer, yet its brilliance is mesmerizing, tearing Stockfish apart in certain matches.

> ...learning completely on its own giving it nothing but the rules which is how AlphaGo works...

Not to be too picky, but it was AlphaGo _Zero_ that learned from the rules alone. AlphaGo learned from a large database of human played games: "...trained by a novel combination of supervised learning from human expert games". [1]

AlphaGo Zero, derived from AlphaGo, was "an algorithm based solely on reinforcement learning, without human data, guidance or domain knowledge beyond game rules". [2]

[1] https://www.nature.com/articles/nature16961

[2] https://pubmed.ncbi.nlm.nih.gov/29052630/

Also AlphaGo Zero never played chess, only go. It was AlphaZero that applied the same framework to other games including chess.

https://en.wikipedia.org/wiki/AlphaGo_Zero https://en.wikipedia.org/wiki/AlphaZero

> I’d recommend you to look up a few games between AlphaGo and Stockfish

Agadmator's youtube channel covers a bunch of those. https://www.youtube.com/watch?v=1yM0D1iZLrg

And some of the most amazing games are when AlphaGo is absolutely breaking chess "wisdom" left and right simply because it can see a forced solution on the horizon.

Pawn structure? BAH! King safety? CHARGE!

And then 75 moves later Stockfish is in zugzwang.

>Deep Blue could evaluate 200 million different moves... per second. And that’s from 1997.

And it still lost against Kasparov. That doesn't happen now; top engines haven't been beaten by humans since ~2006.

Not really. It was tried and it turns out that the best strategy for the human is to just do what the computer suggests.

Could you provide a source for this?

I suppose he can’t because it isn’t true at all. The best correspondence players usually improve significantly over the computer suggestions.

Source: I'm a correspondence chess international master

> The best correspondence players usually improve significantly over the computer suggestions.

I might be misunderstanding your claim, but how can humans playing correspondence chess beat Stockfish or Lc0?

In official correspondence games computer assistance is allowed, so most (if not all) of the players usually start their analysis with the computer suggestions (Stockfish, Lc0 or others). Some players limit themselves to this and play the engine's move; others try to improve on it with their own contribution. If no human contribution were possible, correspondence chess would become a hardware fight, while results show that the best players can defeat "naive" opponents who rely on computer suggestions. In this sense, every correspondence chess win is a win over the opponent's hardware and engine.

Isn't it possible that you're not improving upon the engine's suggestions, but instead, your opponent is choosing suboptimal non-engine lines, and your engine is beating their weakened engine?

Occasionally it is possible. After seven years and more than one hundred games played, I can tell you that I have been surprised by my opponent's reply no more than a handful of times. By "surprised" I mean he didn't play the top choice of the engine. In fact, most of the time the best move in a given position is easily agreed on by any reasonable engine on any decent hardware. At a few critical moments in a game, the best move is not clear and there are two, three or more playable alternatives that lead to very different positions. In these cases the computer, after a long thought (one or more hours), usually converges on one suggestion and sticks to it even if given more time (a sort of "horizon effect"). These are the moments where a human, after a long thought, can overrule the computer suggestion and favor the 2nd or 3rd choice of the engine. So in brief, no: I can't recall a game where I was gifted the win by a "weakened" move from my opponent; most of the time I was confronted with the "engine-approved" suggestion and had to build my win by refuting it.

I assume that when you come across one of these novel moves and plug it into the computer, giving it time to search, it ultimately decides that the move is superior?

Relatedly, can you give some examples of novel non-engine lines that turned out to be better than engine lines?

Sometimes if you play a move and the first plies (i.e. half-moves) of the main variation, the computer starts "understanding" and its score changes accordingly. Those are the cases where more hardware power could be useful and make the engine recognize the change from the starting position. More often, the "non-engine" move relies on some blindness of the engine, so the computer starts understanding its strength only when it's too late. In these cases it is unlikely that more power could bring benefits. Typical cases are

- fortresses [0]. One side has more material, but the position can't be won by the superior side. As the chess rules declare a draw only after 50 moves without captures or pawn pushes, current engines can't look this far ahead and continue maneuvering without realizing the blocked nature of the position. Some engines have been programmed to solve this problem, but their overall strength decreases significantly.

- threefold repetitions [1]. The engine believes the position is equal and moves the pieces in a, let me say, pseudorandom way. Only at some point does it realize the repetition can be avoided favourably by one side. This topic is also frequently discussed in the programming forums, but no clear-cut solution has yet emerged.

If you are looking for positions where human play is still better than the engine's, the opening phase is the most fruitful. Most theoretical lines were born of human creativity, and I doubt a chess engine will ever be able to navigate the intricacies of the Poisoned Pawn Variation of the Sicilian Najdorf [2] or the Marshall Attack of the Ruy Lopez [3]. Neural network engines are strategically stronger than classical AB programs in the opening phase, but they suffer from occasional tactical blindness. Engine-engine competitions often use opening books to force the engines to play a prearranged variation, to increase variability and reduce the draw percentage.

[0] https://en.wikipedia.org/wiki/Fortress_(chess) [1] https://en.wikipedia.org/wiki/Threefold_repetition [2] https://en.wikipedia.org/wiki/Poisoned_Pawn_Variation [3] https://en.wikipedia.org/wiki/Ruy_Lopez#Marshall_Attack

I'm interested because the experience in Go is humans simply can't keep up.

What is the evidence that it isn't a hardware or software differential between the players? I can't think of an easy way to ensure that both players started with computer-suggested moves of the same quality.

There are a lot of engines with ratings on the charts way higher than the best humans, so in theory every suggestion on their part should be enough to overcome any human opponent. In practice most (if not all) players rely on Stockfish and Lc0 (both open source). During a game, most of the time the "best" move is easily agreed on by every reasonable engine on any decent hardware. Only in a few cases does the position offer two, three or more playable choices. In these cases stronger hardware or a longer thought rarely makes the computer change its mind. It's a sort of horizon effect, where more power doesn't translate into a really better analysis.

For example, in a given position you could have three candidate moves: M1, a calm continuation with a good advantage; M2, an exchange sacrifice (a rook for a bishop or a knight) for an attack; M3, a massive exchange of pieces entering a favorable endgame. If the three choices are so different, the computer usually can't dwell long enough to settle on a clear best move. The human, instead, can evaluate the choices until one of them shows up as clearly best (for example, the endgame can be forcibly won). In these cases the computer suggestion becomes almost irrelevant, and only a naive player would make the choice on some minimal score difference (which can vary unpredictably with hardware, software version or duration of analysis). So the quality of the starting suggestion is somewhat irrelevant if you plan to make a thoughtful choice.

I'm not sure about very recent chess engines, but for a long time, it was better. The human suggests several moves that would advance their strategy, and the computer dedicates its search time to evaluating the strength of those potential moves, which cuts down the search space considerably. It's called "advanced chess" or "centaur chess". https://en.wikipedia.org/wiki/Advanced_chess


Computers are now as much better than Magnus Carlsen as he is better than a moderate amateur.

If even the best player overrides a move he's much more likely to be reducing the strength of the move than increasing it.

The ratings you are referring to are typically based on tournament or rapid games, where the limited time induces the human players to make mistakes that the computer capitalizes on. Given enough time, or with a "blunder check" option, the best human players are still strategically stronger. In correspondence chess, where there is much more time at one's disposal, the human players can still improve on the computer suggestions.

Source: I’m a correspondence international chess master

Yeah, I was thinking about classic or standard time controls. In the last big cyborg tournament a few years ago, I remember computers coming in 1st and 2nd.

I wasn't thinking about correspondence but what was the latest large cyborg correspondence tournament?

I don't know the last one, but I recall the matches of the Hydra chess machine [0] in the early 2000s against GM Adams under tournament conditions (5½ to ½ for the machine) and against GM Nickel under correspondence conditions (2 to 0 for the human). Both Grandmasters were top players in their respective fields, so it showed very clearly how the time limitation impacted the competitive results. Nobody in the chess elite would claim that Hydra understood chess better than GM Adams, but he still lost resoundingly due to the inevitable mistakes caused by the relatively fast time control.

[0] https://en.wikipedia.org/wiki/Hydra_(chess)

But wasn't Hydra 2005 ~2800 Elo, whereas the current best chess engines like Leela Chess Zero or Stockfish are ~4000 Elo?

Just realized that correspondence chess is cyborg chess, I didn't know computers were legal in correspondence chess, but it makes sense now. Reading about it, it sounds like it's less about knowing chess, and more about understanding the applications you're using.

Chess engine ratings are not immediately comparable to human ratings, as they are extracted from different pools. Hydra played relatively few games, so its rating estimate was somewhat approximate, but it was clearly "superhuman" (GM Adams was no. 7 in the world and only scored one draw in 6 games). Today Stockfish is awarded a rating of about 3500 [0] on typical PC hardware, but this rating comes from matches between engines, not against humans.

Regarding the argument of "knowing chess", it depends on your definition. I often use this analogy: correspondence chess is to tournament chess what the marathon is to track running. They require different skills and training, but I guarantee you that a lot of understanding is involved in correspondence chess, possibly more than in tournament chess.

[0] https://ccrl.chessdom.com/ccrl/4040/
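To give a feel for what gaps of this size mean, here is the standard Elo expected-score formula; the ratings plugged in below are just the rough figures from this thread, not official numbers.

```python
# Standard Elo expected-score formula: the score player A is expected
# to achieve against player B (1 = win, 0.5 = draw, 0 = loss).
def expected_score(r_a: float, r_b: float) -> float:
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

# A ~3500-rated engine against a ~2800-rated Hydra-era machine:
print(f"{expected_score(3500, 2800):.2f}")  # prints 0.98
```

So even taking the cross-pool comparison with a grain of salt, a 700-point gap implies near-total domination, which matches Hydra's own 5½–½ result against a 2700-level human.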

Oh, I assumed it required quite a bit of chess knowledge and skill. But I assume what differentiates a good player from a great one isn't unassisted chess ability. Basically, I'm wondering how well correspondence ratings track with unassisted ratings. It was my understanding that they don't track very well at the higher levels of correspondence chess.

Are they using the on-demand price instead of the preemptible price? It seems like the sort of job that can run on preemptible machines, just because it's a batch job. Also, should the cost really be calculated using public market prices at all, as opposed to the running costs of the TPUs? It is not guaranteed at all that the opportunity cost to Google of using all those TPUs is equal to the price that you or I would pay Google to use them. I understand it cost a lot, but I'm not convinced by the headline figure of $36M.

The article actually addresses that. The precise number is not the point but the ballpark is:

"In terms of actual cost to DeepMind (a subsidiary of Google’s parent company) to run the experiment, there are other factors that need to be taken into account, such as researcher salaries, or that the quoted TPU rate probably includes a healthy amount of margin. But for someone outside Google, this number is a good ballpark estimate of how much it would cost to replicate this experiment."

KataGo and Leela Zero and all the other AIs certainly didn't cost that much (the people running them wouldn't have had that much money and resources) and are probably stronger than Alpha Go Zero. I don't think it's at all fair to say this number is a good ballpark estimate of how much it would cost to replicate this experiment. It's wrong as a calculation of Google's costs, it's wrong per the title How much did AlphaGo Zero cost, and it's also wrong as an estimate of the cost of replication.

> KataGo and Leela Zero and all the other AIs certainly didn't cost that much

And you base that claim on what exactly? Leela Go is trained by the community, which donates self-play resources. Just because you outsource your cost to volunteers doesn't mean it's free!

In order to get to a realistic estimation, you'd need to get the average cost for electricity, hardware cost (proportionate to use), and of course opportunity costs.

Since you cannot do that, I'd argue that you have no clue what the true training cost of these projects compared to on-demand/cloud costs really are.

Leela Zero used some commercial providers for training. Mostly, we used the free offers for new GCP/AWS/etc members, so it's only a certain fraction of course.

To provide an order of magnitude: There are about 20m training games. Iirc a V100 could complete one game in something like a minute. So that's 300k-ish hours of value. Obviously, while V100 were the fastest, other GPUs were more cost efficient.
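The back-of-envelope arithmetic works out as follows; note the one-minute-per-game figure is a rough recollection, not a measured number.

```python
# ~20 million self-play games at roughly one V100-minute per game.
games = 20_000_000
minutes_per_game = 1  # rough figure, from memory (see above)
v100_hours = games * minutes_per_game / 60
print(f"{v100_hours:,.0f} V100-hours")  # prints 333,333 V100-hours
```

Which is where the "300k-ish hours of value" estimate comes from.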

Go is a small community, and there's just no chance they managed to donate $36M worth of anything, whatever the electricity and hardware costs, that's just too much money. KataGo's page says it was developed using resources donated by Jane Street, the company that its developer worked for; the general magnitude of numbers is also way off: sure they're a quantitative trading firm, but it's implausible that they'd donate $36M to develop a go AI.

The cost estimate was based on what you and I would have to pay if we were to train AlphaGo Zero in 40 days on the given hardware, using the reported number of games and resources.

Just replace the self-play TPU resources with commodity hardware or even just cheaper GPU compute providers and you'd reduce cost 10-fold by just not using TPUs. Same goes for the number of self-play games.

That still doesn't change the estimate itself. IF the other projects had used Google TPUs, they would have cost around the same as the estimate.

I really don't understand what you're trying to argue against here.

Replicating the experiment would normally mean running the same code for the same length of time, which is what the article measures. If you're using different approaches like KataGo, it's not a replication any more.

How is Leela Zero stronger if it's the same calculations and less compute time?

True, but "giving a ballpark estimate to replicate the experiment" is quite different to "How much did it cost"

One could imagine an 'ideal' allocation of TPUs, where compute time is allocated to the project that earns the most dollars per FLOP.

Minor improvements to Google Books OCR might not be worth much, whereas better search result scoring would be worth a lot. An automated system would decide where it was most efficient to spend the TPUs. Management would set how many dollars a 10% performance improvement was worth.

I'm sure the reality is a bunch of middle managers arguing over why their team deserves them more than another.

> One could imagine an 'ideal' allocation of TPUs, where compute time is allocated to the project that earns the most dollars per FLOP.

That's a short-sighted, immediate-benefit-or-bust mentality. Not to mention that projects have a ramp-up time where they are not yet profitable, but still very valuable strategically.

You mean better ad revenue, because search results are getting worse and worse. So search results can't really be google's primary focus anymore.

Like you, I suspect there are substantial bulk discounts available if you're using tens of millions of dollars worth of these.

I don't know how much less, but if you were to do fully pre-emptible at this scale I wouldn't be surprised if you could get it down to one-tenth the price. I wouldn't suspect the same of other more generic resources like CPUs that have a much lower price point to begin with, but the TPU sticker price seems very high with lots of headroom.

For me I think it's very interesting just to get a ballpark order-of-magnitude estimate. I'm sure the cost isn't much less than $3.6M and so the underlying story doesn't change that much, i.e., this is not something accessible to hobbyists.

Achieving a new breakthrough in computing is often very expensive. Deep Blue is estimated to cost IBM over $100 million over a decade [1].

And in comparison to large tech company R&D budgets, the amount cited in the article is a drop in the bucket. Consider the fact that Google spent $26 billion in R&D budget in 2019 alone [2]. Microsoft spent almost $17 billion [3].

[1] https://www.extremetech.com/computing/76552-project-deep-bli...

[2] https://www.statista.com/statistics/507858/alphabet-google-r...

[3] https://www.statista.com/statistics/267806/expenditure-on-re...

Note that everything in software development is R&D. Building (not operating) Gmail and Android and Office and Azure is R&D.

Did you mean to say “not everything”? Much may be, but certainly not everything. As you said, operating or maintaining software is not generally considered R&D. Development without the Research component is just Development, not Research and Development.

Right. That's the Development part of Research and Development.

And they get tax breaks, so we are motivated to class as much as possible as R&D.

I thought R&D was mostly research so I found his comment clarifying.

Another way of thinking about how efficient the brain is: By the article’s numbers, about 5.5 million TPU hours were required to train the machine to play as well as a Go champion.

A Go champion might have trained for 8 hours a day for 15 years (age 5 to 20). That is about 40,000 hours.

In other words, machines required 137 times longer to learn the game, and at twice the power consumption! There is still a lot of room for improvement.
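A quick check of the ratio, using the rounded figures from the comment above:

```python
tpu_hours = 5_500_000   # TPU-hours quoted from the article
human_hours = 40_000    # roughly 8 hours a day for 15 years, as above
ratio = tpu_hours / human_hours
print(ratio)  # prints 137.5, the "137 times" figure above
```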

Go champions don't learn from zero. They learn from teachers, books, and playing against each other. This knowledge is built over hundreds, or thousands of years.

AlphaGo didn't learn from zero either. It has a pre-processor that identifies sets of patterns with known features, and also:

"AlphaGo was initially trained to mimic human play by attempting to match the moves of expert players from recorded historical games, using a database of around 30 million moves".

That's for an earlier system (which also used less compute).

AlphaGo was followed by AlphaGo Zero (the topic of this article), which did not use the process you describe; it used only the rules of the game and the winning condition.

Oops, my mistake. Thanks for the correction.

AlphaGo != AlphaGo Zero

Yes! So perhaps one way to make the machine more efficient is to use pre-programmed "general" models that can be attuned to a particular problem in a much shorter time?

And long collaborative study sessions.

This comparison is not entirely fair because the human brain also benefits from priors baked in over the entire course of evolution.

That's a pretty big claim. One could argue that the topology of the brain is a prior, analogous to the architecture of a neural net. But considering that we really have no idea how learning happens in the brain on a large scale, you really can't say.

I believe that there's a reasonable (but unprovable) assumption that any games which humans usually play - because the rulesets are learnable and interesting for humans - implicitly rely on priors of human brains and behavior.

The space of possible games is huge (infinite?), but only a tiny subset of these games could reasonably become a popular game for humans.

E.g. it's not an arbitrary random coincidence that the scoring rules for each grid intersection in go are the same (I mean, they could vary in an arbitrary pattern); it ensures that the ruleset is small enough for humans to learn.

It's not an arbitrary random coincidence that the playing of go involves pattern recognition on some level, since that's what we're good at and find interesting in many games.

It's not an arbitrary random coincidence that in Mario game after jumping the sprite falls back down eventually; that's reusing the priors from real world physics.

Games are designed to be fun and playable by humans, this doesn't seem surprising

I don't remember the name, but I definitely saw attempts to build a general AI designed to solve 50 different games. There was one long learning phase where the AI learns mechanics common to all 50 games, and then a much shorter learning phase specific to a single game. The same attempt was made with a Minecraft bot. First it just learned how to live and interact in a vanilla world. Then it was fed Twitch PvP livestreams, and finally it was placed on a PvP server. It didn't perform super well, but watching the livestreams was definitely more efficient than learning combat from scratch.

The whole Atari suite has been solved recently by a single algorithm: https://deepmind.com/blog/article/Agent57-Outperforming-the-...

Does anyone have an idea about the advances in research about this topic i.e. human intelligence and how learning happens in the brain?

I know Josh Tenenbaum from MIT [1] works on this, see for example :

- How to Grow a Mind: Statistics, Structure and Abstraction [2]

- Steps towards more human-like learning in machines [3]

Wondering if there are other researchers exploring similar questions.

[1] http://web.mit.edu/cocosci/josh.html

[2] https://www.youtube.com/watch?v=97MYJ7T0xXU

[3] https://www.youtube.com/watch?v=WTK6eaSVTjo

There is very little that we understand about the larger picture of how learning happens in the brain. We have some understanding of how learning happens on very small scales: I'm talking plasticity at a single synapse. But even restricting ourselves to a single synapse, there is much we don't know. At the least, it's clear that synapses and dendrites have impressive computational capacity, but making detailed measurements is currently beyond the reach of our experimental apparatus. We can measure signals in dendrites and synapses, but not at a high enough spatiotemporal resolution to answer the big questions.

And we're starting to bump against the fundamental limits of this apparatus. Most modern neurobiology uses genetically encoded fluorescent sensors read out by rather expensive 2-photon microscopes. The sensors aren't as crisp as one wishes; there is a huge subfield dedicated just to deconvolving these fluorescent sensor readings into what the neurons are actually doing. And there's only so much further the 'scopes can be pushed.

The point being: it's really quite difficult to overstate just how overwhelmingly complex the brain is and how far we are from understanding even little really specific bits of it, let alone the whole thing.

That being said, the redwood center for theoretical neuroscience does some excellent work bridging the cutting edge of theory neuro and machine learning - towards the larger picture of how the brain works. You might be surprised at how 'rudimentary' the questions we're trying to solve in that domain are. Most work focuses on the visual system - far easier to study something when you have a good idea of what it's supposed to do (as opposed to, say, cortex).

I am not aware of anything resembling a grand theory that makes experimentally verifiable predictions. I am pretty sure I would have heard of such a thing if it existed.

True, but a spider doesn't figure out how to build a web all by itself. That's to say, there is a lot that evolution can provide us with as a prior.

Yes it is clearly possible to encode behavioral priors, there are many examples from different species.

But humans aren't spiders. We've got the big brain, it's kind of our thing

Not just the topology of the brain, but the environment is also important. Human life is more diverse than that of AlphaGo, we can borrow concepts gained while doing something else. Should we count those external tasks as part of the learning to play Go?

Yeah, only the last few human layers needed to be trained for the Go expansion pack; all the early layers were frozen during Go training.

OTOH, I expect that avoiding human evolutionary priors is necessary for superhuman performance.

So did the machine, albeit indirectly.

But there are also many other people spending time studying Go who didn't reach that level. We ran all that studying in parallel and then selected the best person by running a world championship. You can't only count his effort alone.

True. But that single brain, in that person, was that efficient, and it represents the theoretical efficiency gap relative to the machine.

There are, for example, other NNs also being trained to play Go. Should all the unsuccessful attempts be counted in the machine's total? The comparison becomes almost impossible then.


>KataGo's latest run used about 29 GPUs, rather than thousands (like AlphaZero and ELF), first reached superhuman levels on that hardware in perhaps just three to six days, and reached strength similar to ELF in about 14 days. With minor adjustments and a few more GPUs, starting around 40 days it roughly began to match or surpass Leela Zero in some tests with different configurations, time controls, and hardware. And finally after about four months of training time, the current run may be wrapping up fairly soon, but we hope to be able to continue it or begin another run in the future.

> In other words, machines required 137 times longer to learn the game, and at twice the power consumption!

This comparison is a bit unfair. Humans are the result of evolution on a grand scale. Human Go is the result of millennia of gameplay. A human does not become grand master in isolation.

AlphaGo is the result of an evolutionary tournament style competition of a much smaller duration and breadth. AG is also a population, not just one agent, and it would be silly to take just one agent and evaluate it on its own as if it could be created without the others.

Should we include the human costs in AG as well? Why just the electricity and CPU?

AlphaGo is the result of human evolution too.

Human evolution is a result of mammal evolution, and so on. The question is how to compare in a fair way?

It'd be really interesting if a research group could work out an entropic bound on how efficiently any given neural network could be trained: what is the thermodynamic limit of optimal NN training, in terms of watts per bit trained? My hunch is that human brains operate close to this limit, at least in our standard environmental conditions. Based on how near-optimal biomaterials are in terms of strength-to-weight ratios, it wouldn't surprise me much.
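For a concrete anchor, the Landauer limit gives the thermodynamic minimum energy to erase one bit. A minimal sketch of that floor (the temperature and the 20 W brain budget are my assumptions, purely illustrative):

```python
import math

k_B = 1.380649e-23                 # Boltzmann constant, J/K
T = 310.0                          # body temperature in kelvin (assumed)
landauer = k_B * T * math.log(2)   # ~2.97e-21 J per bit erased

brain_watts = 20.0                 # rough brain power budget (assumed)
bits_per_second = brain_watts / landauer
# ~6.7e21 bit-erasures/s at the thermodynamic floor; any physical
# learner, brain or TPU, sits many orders of magnitude above this.
```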

I think the problem you'd find is that "bit trained" is probably highly non-trivial.

For example, I expect that the training required to go from 7-year-old child to Go grand master requires a completely different number of bits of information than the training required to go from blank-slate NN to NN Go grand master. I also suspect that the difference in what is being learned may well dominate the difference in training efficiency. Both the prior knowledge and the mechanism of learning are so different that I doubt you could get a meaningful comparison based on current understanding.

You should remember that we have no idea basically how human beings actually learn things, and no idea how much prior knowledge we have encoded. Just for an example, I once saw a documentary that claimed chess grandmasters seem to recognize valid chess positions using the parts of the brain that usually recognize faces. Assuming that was true (I'm not claiming it is) perhaps a part of their chess learning consisted in taking a built-in face recognizing NN and training it to recognize chess boards. How much did the built-in knowledge of recognizing faces help? I don't think it would be possible to calculate.

Agreed, after writing that I realized that "bits" of training is a pretty poor metric. Especially in lossy NN as compared to normal computing. Likely researchers will be busy for decades defining and narrowing down the concepts in the field before useful values could be determined in terms of information theory.

A huge issue I hadn't even realized: bits don't relate very directly to an NN's ability to perform a task.

Can't help but remark that "7-year-old child" is not a valid go rank. Some 7 year olds are surprisingly good at playing go :)

True, I should have probably said something safer, like 1-year old child :)

Rather than Go champion I would rather use the term Go professional. There is a difference between being a professional and winning professional tournaments.

Now, the bot has many advantages. It never sleeps, never gets distracted, never dies and can be copied to another system to obtain a copy of the bot with the same playing performance.

The bot is also more accessible. Any player now can train with a bot, all day if you want, for almost free. You cannot do that with a professional.

Humans don't learn Go from scratch on their own. They use teachers and books, which present compressed knowledge crystallized from many millions of human hours.

If you asked someone to learn Go but only gave them the rules of the game, they'd likely end up a weak player (though probably with some original strategies).

I might be wrong, but I think this cost calculation is way off:

Their running-cost estimate for a single TPU in a machine with 4 "TPUs" is based on the price of a Cloud TPU v2-8, but a v2-8 is actually 4 ASICs on one board.

Also, because the publication date was around the time v2s were announced, and because the TPUs were only used for inference while GPUs were used for training, I think self-play was likely done on TPU v1s, which use 5x less power per ASIC and so are likely much cheaper.

I also think the way they calculated the number of TPUs required is wrong. It looks like they assume one machine with 4 TPUs makes one move in 0.4 seconds, but since making a move only requires a forward pass through a moderately sized CNN with a tiny 19x19 input, one TPU should be able to make thousands of moves per second in parallel.

Making one move requires 1,600 MCTS playouts to explore the game tree, so there's a 1,600-to-1 correspondence between forward passes and moves played.
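For scale, a hedged back-of-envelope count of the network evaluations this implies (the ~200-move average game length is my assumption; the game count and playout count are from the article and paper):

```python
# Total forward passes implied by 1,600 playouts per move.
games = 4.9e6             # self-play games over 72 h (article)
moves_per_game = 200      # assumed average; real game lengths vary
playouts_per_move = 1600  # MCTS playouts per move (paper)

forward_passes = games * moves_per_game * playouts_per_move  # ~1.6e12
per_second = forward_passes / (72 * 3600)  # ~6e6/s across the whole fleet
```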

Alpha Go Zero*, which was trained from scratch, without human games.

I've also heard rumors that AlphaStar (https://deepmind.com/blog/article/alphastar-mastering-real-t...) was essentially put on hold because it was too expensive to improve/train. The bot wasn't able to beat StarCraft champions and _only_ got to a grandmaster level.

AlphaGo Zero combined deep learning with a classical Monte Carlo tree search framework. Without a similarly effective framework, it's not surprising that AlphaStar wasn't able to achieve similar success. I've watched a lot of AlphaStar replays, and the lapses in forethought glare through pretty regularly, though other aspects of its judgment (not just reflexes!) seem frankly superhuman.

Why was MCTS (or some search variant) not used in AlphaStar?

(Sure, you need to somehow roll the StarCraft world forward and back, but for Atari, using MCTS was shown to be an order of magnitude more efficient.)

I have also seen comments that the search width is too large, or maybe it was an academic-purity consideration?


At the last BlizzCon they had it around. The setup wasn't ideal, so Serral (who won the world finals in 2018 and reached the semifinals in 2019) wasn't really happy with how he played, but it won.

This was also a version where they'd worked on preventing its ability to micro at quadruple-digit APM.

That match was still arguably not fair for the human because of the imperfect input and outputs he had to deal with. In the end I'm not sure it makes sense to put artificial handicaps on the machine to leave the human a chance. It's like a swimming race between a fish and some land animal but the fish is not allowed to use its fins. Sure the other guy has a better chance to win, but, what does that measure really?

The main issue is that they need to retrain their bots for each new map, which would mean excessive training every time the map-pool changes (e.g. every 3 months or so IIRC).

I'd even argue that they missed their goal by a long shot if their system isn't able to play arbitrary maps - every human player can do that no problem.

Interesting! I didn't understand why they stopped the AlphaStar project. They claimed they reached their goal, but they clearly haven't.

On a side note, I'm still disappointed that they didn't improve on the EAPM restriction to truly show they can beat humans without machine-y advantages.

Still what they have accomplished is a miracle.

The version that later went on ladder anonymously used human-like E/APM.


playing at grandmaster level is a pretty astounding achievement.

The amount was removed from the submission title, which sucks if you're like me and don't like to visit yet another possibly JS-heavy site and drain your battery.

For others: It's $36M.

Are you working off one solar panel on the way to Mars? First time I'm hearing battery drain as a reason to not visit a js site.

Phones. F'in phones man :D

I was of half a mind to un-upvote because I liked knowing the number but didn't care enough about the article. Now I don't think it's as much worth it for this to be higher up.

Also, nobody mentioned the title being inaccurate, so I guess it's just a pedantic "thou shalt not change the title" rather than "title was misleading/clickbait"...

I'll quibble with a little bit of this.

"AlphaGo Zero showed the world that it is possible to build systems to teach themselves to do complicated tasks."

It didn't do any such thing. The game of go has a huge number of potential moves and outcomes, but the rules themselves are trivial, the board position can be measured in a handful of bytes and gameplay always and only progresses in one direction. And judging a good vs bad outcome is just a matter of comparing two numbers.

Go is challenging and interesting for humans, but it's not remotely as "complicated" as driving a car or translating a language.

It’s a shame that this ‘Next big thing’ is the complete opposite of the internet. Instead of opening up the world for anyone to create things, letting small companies compete with large, it is only going to concentrate power with the richest companies and leave small companies unable to get involved.

“Well sure, AlphaZero looks impressive but I predict that within one hundred years, deep learning systems will be twice as powerful, ten thousand times larger, and so expensive that only the five richest kings of Europe will own them.”

I wonder if the code and network weights will ever see the light of day. I wonder what the eventual value proposition of working on this sort of stuff is. I suppose they are just going to try to apply the algorithms to better things.

I've been interested in the application of AlphaZero to chess. It's sad that this many resources were devoted to something we can't even use to play chess as of now. Leela (the open-source reimplementation) is really strong, but the crushing results presented in the AlphaZero paper never materialized. And this article just shows how hard they are to replicate.

> "I wonder what the eventual value proposition of working on this sort of stuff is."

It seems to me that, even if you only take it as a marketing operation, it has already been very valuable.

I'd really love to know how big the marketing impact on IBM Watson from this was. Somehow they planted the idea that "Watson is the best available AI" so well in the heads of the general population that 50%+ of my non-tech friends somehow thought it was Watson that beat the Go pros when I talked with them about the topic while it was in the news.

> Each move during self-play uses about 0.4 seconds of computer thinking time.

> Over 72 hours, 4.9 million matches were played.

One of these claims must be incorrect or misinterpreted. I highly doubt they used as many TPUs as the article claims; that would be not only impractical but would also raise a lot of other issues like networking, disk speed, etc.

My statement is not against this article, if anyone can confirm they used so many TPUs in parallel feel free to post it

72 hours is 259,200 seconds.

Playing 4.9 million matches of ~100 plies each at 0.4 seconds per ply is 196,000,000 compute-seconds.

That's < 1,000 four-TPU machines. Sounds big, but not too-large-for-Google big. But other comments here say that the 0.4-second number is also wrong (and in fact significantly lower).
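In code, for anyone double-checking (the ~100-ply average is the assumption from above; the 0.4 s/ply figure is the article's, disputed elsewhere in this thread):

```python
seconds_total = 72 * 3600        # 259,200 s of wall-clock time
games = 4.9e6                    # self-play games (article)
plies_per_game = 100             # assumed average
secs_per_ply = 0.4               # the article's (disputed) figure

compute_secs = games * plies_per_game * secs_per_ply  # ~1.96e8 s
machines = compute_secs / seconds_total               # ~756 four-TPU machines
```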

The (synchronous) 0.4s per move number is misleading (and wrong), that's not what the paper is saying. The "footnote 1" of the article is wrong.

Our main compute doesn't go towards machine learning, but we do rely heavily on GPU power. I recently had to come up with the figures for us to invest in an expansion of our compute power, and it turned out that buying the machines ourselves would be cheaper than renting them from Google in 3-4 months.

We don't run on those fancy V100 cards though; just regular old gaming cards suffice. I suppose if we bought the "industrial" Nvidia versions it would take a bit longer to recoup, but still definitely within the year.

Anyway, what I'm saying is that it's probably possible to do this a lot cheaper than $36M, though maybe not in such a short time. Our startup is extremely cash-intensive, and I bet machine learning companies are as well (I suppose machine learning experts aren't cheap ;)), so if we can put in some work and save a big portion of our hardware costs, that really goes the distance.

How Google started out: Run servers from a shack, using commodity PC hardware

How modern startups start out: Spend $50,000/month to run hundreds of microservices on a managed Kubernetes cluster

How else do you attract venture capitalists and convince them to throw millions at you? You need all the buzzwords and the latest tech ;)

I remember my last chat with one of those guys. They insisted that the company wasn't up to date because we didn't run our app inside containers and didn't develop our own AI/ML systems...

probably the memory requirements mean that you do need (multiple) V100s though

Now, consider that this is the cost of the final model reported in the paper. This doesn't account for all the iterations of trying out e.g. different model architectures, hyperparameter sweeps, etc. The true cost of the experimentation is likely at least an order of magnitude higher.

Does Sarbanes-Oxley apply to zero-rating ML costs? AlphaGo might have an unfair kyu ranking if Google doesn't have to "pay" to acquire rank. (95% joking)

> The power consumption of the experiment is equivalent to 12,760 human brains running continuously.

Given the experiment lasts for just days, this actually sounds pretty impressive I think.

Many humans studied the game for a big portion of their lives in order to get Go knowledge where it is.

I'd like to see an AI play Monopoly (the board game) against CEOs of large companies.

I wonder if there's some kind of software that takes advantage of an AI to teach non-beginner players Go. E.g. you could play against the bot and then the AI would translate your mistakes into what you can improve on.

This is a lot of money.

However, if you want to reliably make an AI the best in the world at a range of complicated tasks, can you reasonably expect this to be cheap?

Wondering when researchers will switch from "race to the moon" mode to looking at better optimization techniques instead of just throwing money at the problem.

I know some companies are doing that, but I think looking at AlphaGo or AGZ and making it go faster should be an interesting problem in itself.

KataGo optimized AlphaZero and achieved 50x compute reduction.


Can you share the names of those companies, or some of their projects? At least that way people who think like you can follow those efforts and try to give them more visibility.

The interesting part to me, rather than cost, is energy usage.

>The power consumption of the experiment is equivalent to 12,760 human brains running continuously.

But the problem is that this "brains" unit for AlphaZero doesn't seem to take the GPU, CPU, and memory involved into account. It only counts the TPU numbers.

Then there is another problem.

> a TPU consumes about 40 watts,[1]

The TPU referred to was a first-gen TPU built on 28nm running at 40W, more a proof of concept. Google is now on Cloud TPU v3 [2]; the latest-generation Cloud TPU v3 Pods are liquid-cooled for maximum performance, and each TPU v3 is actually a four-chip module [3]. If a single chip is 100W, that's 400W per TPU.

Edit: Turns out Wikipedia lists TPU v3 at 250W [4]. Not sure if that is 250W per chip or 250W for 4 chips.

That is on the assumption that they are very high-powered and hence require liquid cooling, although that might not always be the case.

So adding up the CPU, GPU, memory, and TPU figures, that original estimate of 12,760 human brains may be off by a factor of 10, if not more.
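One way to see how sensitive the "brains" figure is to the watts-per-device assumption (all the wattage numbers are the guesses discussed above, and this ignores CPU/GPU/memory entirely):

```python
# Scale the article's brain-equivalence estimate with device wattage.
brains = 12_760            # article's estimate, based on ~40 W first-gen TPUs
w_assumed = 40.0           # first-gen TPU wattage used in the article
w_v3 = 250.0               # Wikipedia's TPU v3 figure (per chip or per
                           # four-chip module is unclear)

scaled = brains * (w_v3 / w_assumed)   # ~79,750 brain-equivalents
```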

Still pretty impressive. Considering we now only get about a 1.8x improvement with each generation of process node, we would get about 19x by 2030 (assuming the same algorithm). Which means AI is good, but the human brain on its own is still very much magical in its efficiency :)

Correct me If I am wrong on the numbers.

My other question: that was how much energy it used to learn Go, but what about the energy it uses during a game?

How would AlphaGo Zero perform if it was limited to 20W?

[1] https://cloud.google.com/blog/products/gcp/an-in-depth-look-...

[2] https://cloud.google.com/blog/products/ai-machine-learning/g...

[3] https://techcrunch.com/2019/05/07/googles-newest-cloud-tpu-p...

[4] https://en.wikipedia.org/wiki/Tensor_processing_unit

I would be more interested in how much LCZero cost.

The article doesn't seem to consider the cost of hyperparameter optimization prior to the final training...

Ahem, "one time cost"

A: $400M—to acquire DeepMind

You need $35M to beat the best Stockfish engine, which can run on a small computer. Who won?

It is estimated to be 36 million for someone else to train AlphaGo Zero, assuming they use Google TPU instances and pay the sticker price.

Google isn't operating with that cost, unless we assume that they are prioritizing AlphaGo to the point where they lose such customers 100% of the time.

It's way more likely that AlphaGo is trained on spare time, the cost for the hardware is sunk anyway, so only the cost for upkeep is real.

> It's way more likely that AlphaGo is trained on spare time, the cost for the hardware is sunk anyway, so only the cost for upkeep is real.

Not quite; power is quite expensive, and basically all modern computers use far less power at idle than when going full bore, saturated with multiply-add instructions and perfect memory streaming.

Having said that, I agree that there is a substantial cost efficiency gain if they can schedule it during periods of inactivity.

> This accomplishment is truly remarkable in that it shows that we can develop systems that teach themselves to do non-trivial tasks from a blank slate, and eventually become better than humans at doing the task.

"non-trivial" is a bit of a red herring here. Playing go is pretty trivial compared to something like walking or scratching your face. Winning go may be non-trivial compared to those in some ways but it is very trivial in comparison in other ways.

This is... wrong?

Scratching a face is a matter of fine motor control. [1] is an example from 2011 which did this, as well as face shaving.

Walking is slightly tricky because it's such a dynamic system, but is now human level[2], and there was never really any question that it would be possible.

On the other hand, the state of the art in Go systems before Alpha Go (the one trained off games, not Alpha Zero) couldn't beat competent amateurs. No one had really considered the learn-from-zero-knowledge approach of Alpha Zero even for easier games like chess.

[1] https://www.engadget.com/2011-07-14-robots-for-humanity-help...

[2] https://www.youtube.com/watch?v=_sBBaNYex3E

Playing go at that level is non-trivial compared to walking because (a) most humans can walk, but not even the best human go masters can play go at that level; (b) we had algorithms that allow bipedal robots to walk long before we had algorithms for playing go at that level.

Do we have algorithms that allow bipedal robots to walk at human level? Or run at, let's say, 10th grade standard student level?

Bipedal walking and running on uneven terrain has been demonstrated years ago - Boston Dynamics has a lot of nice videos; e.g. here's a four year old video on walking https://www.youtube.com/watch?v=rVlhMGQgDkY , here's two year old videos on running https://www.youtube.com/watch?v=vjSohj-Iclc and https://www.youtube.com/watch?v=LikxFZZO2sk - perhaps it's debatable if it's "full human level" but it seems sufficient to me for most purposes. IMHO any improvements to that would be mostly for sensors (seeing what terrain is there even in poor conditions e.g. fog/rain/snow) and modelling environment (understanding which objects might break or slide if you step on them), not walking/running as such.

The main problem for bipedal robots that makes them still impractical is the hardware expense (wheels are simpler and cheaper) and the power supply required, so for most use cases it's more efficient to use something other than a bipedal robot and there's limited business application and future revenue in scaling up research demos of bipedal walking to practicality, so most people who are working on walking algorithms are doing so in simulated virtual environments (where we have algorithms that can learn walking and running "from scratch" through experimentation) and not building very expensive hardware.

> it seems sufficient to me for most purposes.

Current self driving car technology is sufficient for most purposes, except to actually drive on roads. So for those walking robots, can they run or even walk through a crowd without hitting people? A normal 15 year old human can do it, and that is the level you need to be to release it among people.

I'll be impressed when they can finish an orienteering course.

I was aware of the Boston Dynamics robots, but it always seemed to me that they move very slowly compared to walking/running humans. I suppose that may just be a precaution on their part, the jumping and gymnastics are certainly impressive otherwise.


Here's Boston Dynamic's robot doing Parkour and gymnastics: https://www.youtube.com/watch?v=_sBBaNYex3E

It's not national level gymnastics but it's better co-ordinated than most humans.


It's not even AI - all the moves are a pre-programmed sequence. That's something Hollywood puppeteers could do 40 years ago...

This is both correct and incorrect.

Boston Dynamics use control-systems style robotic control. This is different to ML-style control where the system learns to perform tasks.

But that's different to "pre-programmed sequence". They don't program the individual servo movements for each movement - instead they give it the motions to perform and the control-systems balance the robot automatically.

(This is what the OP implied by the word "algorithms" anyway right?)

Sorry if this is off topic, but I want to learn AlphaZero from the beginning. I have a little understanding of machine and deep learning, including vision recognition. Unfortunately I'm not able to understand how Monte Carlo tree search is used for decision making. Where can I start, and what should I learn so that I can understand AlphaGo (or the OpenAI Five Dota 2 bot)?


Misleading numbers and wrong calculations. The TPUs and CPUs cost them almost nothing, as they build and use them anyway; running this PR stunt just cost them the missed rental time, if customers would really pay that much. Maybe around $20,000. Energy cost? I don't see much additional cost, as those machines run all the time, regardless of whether they're improving the model or doing something else.

I bet the much higher cost was the PR side: the film team, press support, TV team, travel, inviting the expert Go players, building the stage, and such. Estimated $100,000.

Not counting the man hours, they were just doing their normal job.

The OP talks about how much it would have cost a third party to replicate the experiments. That's what I get from it, anyway.

"renting them out for this PR stunt just cost them the missed rental time"

Because renting them out generates no revenue, right?

"Maybe around 20.000"

At least the article used a formula for the calculation. You just picked a number at random.

You need actual customers paying that much, not hypothetical ones. That was my educated guess.

And the title was "How much did it cost", not how much it would cost.

I think it would be helpful for discourse if you read the article in full.
