Show HN: AlphaZero Science paper (deepmind.com)
200 points by Inufu 76 days ago | 89 comments

It's a shame they still played against 2016 Stockfish (Stockfish 8), when Stockfish 9 or Stockfish Dev were available (Stockfish 10 is out now, but only very recently, so I can understand why they didn't use it).

Their results show that they are only just barely stronger than Stockfish 8, but Stockfish 9 and 10 are stronger than 8 as well.

EDIT: Also meant to include a shout-out to http://www.lczero.org/ which is an open source implementation of AlphaZero chess. Here is their forum post for this paper: https://groups.google.com/forum/#!topic/lczero/TfmaNHI99gk

SECOND EDIT: I was wrong! They did play against a newer SF than 8, specifically, SF at this commit: https://github.com/official-stockfish/Stockfish/commit/b508f... , which was about 2 weeks before SF 9 was released, so maybe it is close in strength to SF 9.

The paper was in peer review for a year, so it couldn't have compared against those.

AFAICT based on dates, they chose nearly the latest version they could have.

The most likely reason why SF9 or SF10 wouldn't do much better is that the Elo model isn't applicable. Rather than two engines doing similar things at different levels of competence, it could be that AlphaZero's evaluation function is sufficiently different that it can exploit mistakes in SF's evaluation function. Then Elo(SF10) = Elo(SF8) + 100 wouldn't matter so much: it would give you SF's "strength", but not its probability of winning against AlphaZero.
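For reference, here is a minimal sketch of the standard Elo expected-score formula that the "+100" remark relies on. The ratings are made-up placeholders, not measured values. The formula only predicts results between two players on the same scale, which is exactly the assumption being questioned above.

```python
def expected_score(elo_a: float, elo_b: float) -> float:
    """Expected score of A against B (win = 1, draw = 0.5) under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((elo_b - elo_a) / 400.0))


# Hypothetical ratings, chosen only to illustrate a 100-point gap.
sf8 = 3400.0
sf10 = sf8 + 100.0

# What Elo predicts for SF10 against SF8 head to head.
print(f"SF10 vs SF8 expected score: {expected_score(sf10, sf8):.3f}")
```

A 100-point gap corresponds to an expected score of roughly 64% in a head-to-head match. It says nothing by itself about how either engine fares against a third opponent with a very different style.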

I believe you are correct and the not-significantly-changing win rates vs SF8 and SF9 imply that.

I'll never forget the last round of games with everyone gushing over AlphaZero's wonderful, human-like play as it reached a drawn position, only for Stockfish to blunder a whole queen with nobody remarking on it. I played along while watching the Danny King video:


Recent Stockfishes recommend many of the quiet, strategic moves that he seems particularly enamoured with.

It said it was winning games with 1:10-100 time control. Why would you say that's only just barely stronger?

Time handicaps are supposed to correspond to handicaps in computational power. But here according to the paper AlphaZero is running on ~100 teraFLOPS, but the stockfish machine has about 1 teraFLOP (my best estimate based on https://www.microway.com/knowledge-center-articles/detailed-...). So 100:1 time control would be fair from a computational perspective (of course comparing the computational power used is more subtle than the rough calculation I did here).
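The rough calculation above can be written out as follows. This is only a back-of-the-envelope sketch: the 100 and 1 teraFLOPS figures are the ones quoted in the comment, and `compute_ratio` is a hypothetical helper that treats raw FLOPS multiplied by thinking time as the compute budget.

```python
def compute_ratio(flops_a: float, flops_b: float, time_a: float, time_b: float) -> float:
    """Ratio of A's to B's compute budget, modelled as FLOPS multiplied by thinking time."""
    return (flops_a * time_a) / (flops_b * time_b)


AZ_TFLOPS = 100.0  # figure quoted in the comment
SF_TFLOPS = 1.0    # the commenter's own rough estimate

for sf_time in (1.0, 10.0, 100.0):
    ratio = compute_ratio(AZ_TFLOPS, SF_TFLOPS, 1.0, sf_time)
    print(f"time odds 1:{sf_time:.0f} -> AlphaZero has {ratio:g}x Stockfish's budget")
```

Under these assumptions the budgets only equalize at 1:100 time odds. At 1:10, AlphaZero would still have ten times the raw budget.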

AlphaZero used a few TPUs (I think 4) on top of the same or roughly the same CPU count, which is not that outrageous. I think the fairest comparison would be to pick a configuration for AZ and Stockfish which uses roughly the same amount of power and is best suited to each one respectively.

As far as I know, FLOPS are mostly meaningless for traditional chess engines. They don't really use floating point operations much or at all.

I know, but it still gives you a rough measure of the amount of silicon used.

I don't think so. For example, GPUs have way more TFLOPS per area than CPUs.

The Elo graph in Figure 1 of the paper suggests it is maybe 30 Elo higher (eyeballing the graph, as they don't provide exact figures).

Performance against SF9 was only marginally worse than vs SF8.

To OP or anyone else at DeepMind: can you comment on why you decided not to release all of the games?

IMHO as a competitive scholastic chess player (former national U16 champion and top 3 world U10) and software engineer, it would significantly increase the credibility of the results. Not to mention it would be fascinating to see the “ugly” games in addition to the ones handpicked by your team.

They've released over 100 new ones here now.


That’s a step, but I still find it weird to release some but not all of the games. I’m trying to come up with a logical reason other than they have something they don’t want people to see in the rest of the data, but so far I’m failing.

It doesn't necessarily come under "logical reasons", but the DeepMind team have pretty strict rules on data retention, chances are there's a debate in the company about it.

There's a book coming out in a few weeks with more details. The authors were provided with all the data.

What would you do with 1000 games that's not possible with 100?

I would be able to accurately gauge AlphaZero’s true chess ability, strengths and weaknesses. Right now it’s impossible to do that given a curated selection of games. It would be like evaluating someone’s coding ability based on a few samples of their absolute finest work rather than the code they actually write day in and day out.

> AlphaZero and AlphaGo Zero used a single machine with 4 first-generation TPUs and 44 CPU cores. A first generation TPU is roughly similar in inference speed to commodity hardware such as an NVIDIA Titan V GPU, although the architectures are not directly comparable.

> The amount of training the network needs depends on the style and complexity of the game, taking approximately 9 hours for chess, 12 hours for shogi, and 13 days for Go.

How much would that much computing power cost on something like AWS? That's a lot of hardware, but if you're only renting it for 9 hours... the beefiest EC2+GPU instance Amazon has currently is p3.16xlarge, which has 8 Tesla V100 GPUs and 64 (virtual) CPUs, for $25/hour on-demand. My understanding is that a V100 is slightly more powerful than a Titan V, so does that mean you could run the chess training (at least the AlphaZero side) for $225? That seems impossible?

EDIT: pacala below pointed out that the hardware listed was just for running AlphaZero against Stockfish, not for training it. Digging through the preprint itself, they say that for training they used:

> During training only, 5,000 first-generation tensor processing units (TPUs) (19) were used to generate self-play games, and 16 second-generation TPUs were used to train the neural networks.

So that would be... a lot more.
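To put numbers on both readings, here is the same arithmetic as a small sketch. The $25/hour price and the 9-hour figure are taken from the comment and the quoted passages. No TPU price is assumed, so the training side is expressed in device-hours only.

```python
P3_16XLARGE_USD_PER_HOUR = 25.0  # on-demand price as quoted in the comment
CHESS_TRAINING_HOURS = 9         # approximate chess training time quoted from the paper


def rental_cost(hours: float, usd_per_hour: float, instances: int = 1) -> float:
    """Cost of renting `instances` machines for `hours` at a flat hourly rate."""
    return hours * usd_per_hour * instances


# First reading: one p3.16xlarge standing in for the 4 TPU + 44 core machine.
single_machine_cost = rental_cost(CHESS_TRAINING_HOURS, P3_16XLARGE_USD_PER_HOUR)

# Corrected reading: 5,000 self-play TPUs plus 16 training TPUs for the same 9 hours.
SELF_PLAY_TPUS = 5000
TRAINING_TPUS = 16
device_hours = (SELF_PLAY_TPUS + TRAINING_TPUS) * CHESS_TRAINING_HOURS

print(f"single machine: ${single_machine_cost:.0f}")
print(f"actual training setup: {device_hours} TPU-hours")
```

The first figure reproduces the $225 estimate. The second shows that the real training run was on the order of tens of thousands of TPU-hours, which no single rented instance comes close to.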

The most expensive part of training AlphaZero is creating the training dataset by self-playing tens of millions of games.

Ah! Okay, that's what I think I misunderstood. The 4 TPU + 44 CPU configuration was only for running AlphaZero against Stockfish, not for training it. Phew! That seemed unbelievable, I was hoping someone would see what I was missing.

And the 9 hours was for training, but I don't think the article linked says on what.

First game is cheap :P

Subsequent games are expensive because scaling becomes an issue

Keep in mind that you don’t train and run the system only once during development. How many resources were used just for hyperparameter optimization?

It would be nice if A0 participated in at least one public computer chess championship, Chess.com's CCC or TCEC. That's a level playing field and all games published.

AlphaZero was a great concept and execution, but if we have to judge its relative strength, it should compete fairly. 4 TPUs (~ 4 Titan V) + 44 cores for AlphaZero vs only 44 cores for Stockfish pre-9 may or may not have put Stockfish at a disadvantage.

BTW, current, presumably balanced, TCEC 14 configurations are:

Non-GPU Server: CPUs: 2 x Intel Xeon E5 2699 v4 @ 2.8 GHz, Cores: 44 physical, RAM: 64 GB DDR4 ECC

GPU Server: GPUs: 1 x 2080 ti + 1 x 2080, CPU: Quad Core i5 2600k, RAM: 16GB DDR3-2133

TCEC GPU server looks more modest than what A0 authors used to "beat" SF.

The TCEC GPU overheated during the games, without anyone knowing until later. Then they underclocked it dramatically, as there was poor to no cooling in the datacenter they rented it from.

For that reason alone, if I were DeepMind, I would not take part in those competitions. It would be horrible press for them, and it would involve a ton of human error outside their control.

I believe it was TCEC season 13 which put GPU engines at a disadvantage due to overheating; it was also the first season with GPUs. Right now TCEC season 14 is in progress. Did I miss anything?

They also beat it with only 1/10th the time that stockfish got. That should more or less negate any advantage of the extra processing power.

On paper, the results are amazing, but researchers are always biased to produce positive results.

BTW, a match between StockFish 10 and LeelaChessZero, an open source implementation of the same idea, will be organized in a couple of days. From the LC0 blog:

Lichess.org will host a match between the mighty Stockfish 10 and Leela. It will be a 6 games match with time control of 5'+2" with ChessNetwork commentary. Games will be played on 15th December at 17:00 UTC.

Stockfish 10 will run on 64 cores 2.3GHz Xeon, while Leela will use the latest v19.1 Lc0 with 11248 network and will run on one GTX 1080 Ti + one RTX 2080 GPU.

Finally! I've been waiting for this for a year. I love that it learned from scratch without human bias and how it plays in a much more captivating style than alpha-beta engines.

Will you be answering questions?

How is it possible that such a high-profile article makes such a doubtful statement?

"Traditional chess engines – including the world computer chess champion Stockfish and IBM’s ground-breaking Deep Blue – rely on thousands of rules and heuristics handcrafted by strong human players that try to account for every eventuality in a game."

Looking at the recent patches[1] for Stockfish it seems like a rather true statement? Most of the patches are related to how it evaluates a position; not how it prunes/searches the tree of possible moves.

Here's one random test which showed improvement: https://github.com/Vizvezdenec/Stockfish/compare/5c2fbcd...5...

    Bitboard b1 = double_pawn_attacks_bb<Them>(nonPawnEnemies) & b & attackedBy[Us][PAWN];
    score += make_score(90, 72) * popcount(b1);

And look at all of the magic numbers and logic in evaluate.cpp: https://github.com/official-stockfish/Stockfish/blob/master/...

[1]: http://tests.stockfishchess.org/tests

Position evaluation is very unlike what the article's statement suggests, which makes it sound as if actual game dynamics are taken into account one after the other in order to make the engine stronger. Instead, traditional engines still mostly rely on brute force. So a reader not aware of those things will have a wrong picture in her/his mind.

So this is... Sort of true? Alpha/Beta pruning relies on heuristics and exploits their power with search. The heuristics that make stockfish so strong are numerous, handcrafted and try to account for many notions of material.
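To make "heuristics plus search" concrete, here is a minimal alpha-beta sketch on a toy game tree. It is not Stockfish's search. The nested-list tree and its leaf numbers are hypothetical stand-ins for the scores a handcrafted evaluation function would return at the search horizon.

```python
import math
from typing import List, Union

# A node is either a leaf score (the heuristic evaluation) or a list of child nodes.
Tree = Union[float, List["Tree"]]


def alphabeta(node: Tree, alpha: float = -math.inf, beta: float = math.inf,
              maximizing: bool = True) -> float:
    """Minimax value of `node`, skipping branches that cannot change the result."""
    if not isinstance(node, list):
        return float(node)  # leaf: value supplied by the (handcrafted) evaluation
    if maximizing:
        best = -math.inf
        for child in node:
            best = max(best, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, best)
            if alpha >= beta:
                break  # the opponent already has a better option elsewhere
        return best
    best = math.inf
    for child in node:
        best = min(best, alphabeta(child, alpha, beta, True))
        beta = min(beta, best)
        if alpha >= beta:
            break  # the maximizer already has a better option elsewhere
    return best


# Toy tree: three candidate moves, each with two replies scored by the heuristic.
toy_tree = [[3, 5], [2, 9], [0, 1]]
best_value = alphabeta(toy_tree)
print(best_value)
```

In this example the second and third moves are abandoned after their first reply, because each reply is already worse than the value guaranteed by the first move. Better heuristics improve both the leaf scores and how early such cutoffs happen.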

In the current computer chess championships, there are Monte Carlo engines that use the same search strategy as AlphaZero with handcrafted heuristics, and they're doing OK. But they're not as strong as AZ, which learnt, by itself, what a good chess position looks like, starting from random play.
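As a rough illustration of that shared search strategy, here is a simplified sketch of the PUCT-style rule used to pick which move to explore next in this family of Monte Carlo tree searches. The `Edge` class, the constant `c_puct = 1.5`, and the example numbers are assumptions made for illustration. The prior could come from a neural network, as in AlphaZero, or from handcrafted heuristics.

```python
import math
from dataclasses import dataclass
from typing import Dict


@dataclass
class Edge:
    prior: float               # P(s, a): how promising the move looks before searching
    visits: int = 0            # N(s, a): how often the search has tried this move
    total_value: float = 0.0   # W(s, a): sum of the values backed up through this move

    @property
    def q(self) -> float:
        """Mean value Q(s, a); 0 for moves that have not been visited yet."""
        return self.total_value / self.visits if self.visits else 0.0


def select_move(edges: Dict[str, Edge], c_puct: float = 1.5) -> str:
    """Pick the move maximizing Q plus an exploration bonus scaled by the prior."""
    parent_visits = sum(e.visits for e in edges.values())

    def score(e: Edge) -> float:
        bonus = c_puct * e.prior * math.sqrt(parent_visits) / (1 + e.visits)
        return e.q + bonus

    return max(edges, key=lambda move: score(edges[move]))


# Hypothetical statistics at one node of the search tree.
edges = {
    "e4": Edge(prior=0.5, visits=10, total_value=6.0),
    "d4": Edge(prior=0.3, visits=2, total_value=1.0),
    "Nf3": Edge(prior=0.2, visits=0),
}
chosen = select_move(edges)
print(chosen)
```

The rule balances moves that have scored well so far against moves that the prior likes but the search has barely tried. The search procedure stays the same whether the prior and values are learned or handcrafted.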

It's not that far from the truth though is it? https://github.com/official-stockfish/Stockfish/blob/master/...

So they finally released more games?! Really looking forward to Kingscrusher or Chessnetwork covering more of these 210 games on YouTube: https://deepmind.com/research/alphago/alphazero-resources/

Daniel King is doing them now (https://www.youtube.com/watch?v=pFtY7gNRVRI), and so is Matthew Sadler for Chess24 (https://www.youtube.com/playlist?list=PLAwlxGCJB4NchyTBYik8F...).

Agadmator covered one already: https://www.youtube.com/watch?v=ZHfumZVPjVA


As a Go player, is there any way I could download and review the game records described in this paper?

I appreciate the link, but there doesn't appear to be a single Go game record at this link.

The paper, however, describes a thousand (or more) games played between AlphaGo Zero and AlphaZero.

You can download very interesting self play games from the open source clone: https://zero.sjeng.org/

My point was that that is the official source for all the games they released so you can see for yourself. If they aren't there, they aren't there. You can download the earlier games (http://www.alphago-games.com/) but apparently not these new ones.

> "If they aren't there, they aren't there"

That's why I've specifically asked the author. Perhaps if Inufu sees there's interest, more game records could be released.

If you've been following the past DM releases, you'd know they pretty much never release additional material when asked and it's pointless asking a junior employee (who hasn't answered any questions to begin with). WYSIWYG.

I have studied the previously released games. They are pretty enough to be worth asking for.

Even if (and here I agree with you) chances seem slim.

What is Deepmind's interest in not releasing the source code and weights for the neural networks?

I'm excited about their work but it seems that it would be much better for everyone if they just released their work openly.

Dunno if I just missed it in the paper, but is there an explanation for why alphazero is better at go than alphago zero?

I don't see anything I can try out? https://news.ycombinator.com/showhn.html

From the profile, it looks like the submitter may be one of the authors. In that case this could be more like an AMA than a Show HN.

Maybe a little borderline, but it seems like there are lots of resources to read and game data to play with, so we can probably spring for a “Show HN” in this case.

Why Show HN? It generally implies a single person or just a few people behind it.

Looks like the author works on the project

If they don't show up in the comment section they might as well not pretend to be part of the community.

Another set of games against an outdated Stockfish which appears to make moves that a recent Stockfish at any reasonable depth disagrees with. I've no doubt at all that AlphaZero has a much stronger evaluation algorithm than Stockfish, but I do wish they'd be a bit more transparent about its actual strength (although presumably they're selling access to it right now if you connect all the dots).

The paper used latest AlphaZero and Stockfish at the time of writing. It's been in peer-review for almost a year.

Well, let me be more blunt: there is zero chance they're playing fair here.

FWIW: Accusations of bad faith with no backup are not generally good commentary.

As someone said, it was in peer review for a year. That means it could not have been compared against stockfish 9 or 10 - they were not released yet. As someone above points out, they used in-development stockfish versions as well (2 weeks before sf9 was released), and from what I can tell, they used the newest version they could have.

If you have some good data that means you can make this statement, can you please cite it?

Otherwise, can you please not make claims of bad faith?

The strength of the language I used was motivated largely by the obviously bad-faith approach employed when they _first_ announced AlphaZero a year ago. They wanted to be able to say they'd created the strongest engine in the world, and they created an environment where it was hard to fail. I will admit the setup used in this paper seems more reasonable on that basis.

That said, I've checked out a few old versions of Stockfish (including the exact commit they used) and analysed the games in Table S6 in the paper. Stockfish still spots multiple blunders in its own play. Obviously these things aren't entirely deterministic, but it seems unlikely there was time trouble.

And again, just to be clear, there's little doubt in my mind that AlphaZero's evaluation of any given position is better than Stockfish's or anyone else's. I'm not even saying it can't reliably beat Stockfish. I just find it sad that the evidence of its overall strength continues to be wobbly.

Each program got 1 minute per move in the original match. Did it find the blunders within 1 minute?

If you really feel like their original setup handicapped stockfish, it seems like the best way to know would be to have that setup play your preferred setup and see what the difference is.

Are you a chess engine developer by any chance?

Yes, at a very amateurish level, and I contribute a lot of CPU/GPU time to LCZero. But I've also played all the published games through Stockfish at the published time controls, and the results make no sense.

Do you often make professionally slanderous statements in public forums?

To be specific, in a written comment on a public forum, it would be "libelous" and not "slanderous". But to be either, it would also have to be false, and at least in the US, known to be false to the author at the time it was written or written with a reckless disregard for the truth.

He says he's analyzed the published games, and found that Stockfish finds obvious errors in the way that Stockfish was said to be playing. Unless he's consciously lying about this (I'd guess unlikely?) it's not libel. And if it's true that a correctly configured Stockfish doesn't play the way it was said to play, this would be a strong indication that either intentionally or accidentally, it was not a fair playing field.

Anyway, if your point was that it might be more productive to be more polite, sure! But one might say the same about accusing someone of slander.

AlphaGo/Zero's success has been a major driver of the recent AI hype. Its actual strength is of no concern compared to the money to be made.

Why would they be selling access to it? No one actually cares about computer chess other than in the chess world. It's a hobby.

Computer chess is a huge part of modern professional chess, someone like Magnus Carlsen would probably pay for access to the best chess engine available.

The first games won are chess and go, the last games won are for the survival of the human species!

It is my moral obligation to express to you the fact that AI, even this kind of AI, is a death sentence for humanity. The progress of automation will eventually meet and surpass the human mind. But even before it does, perhaps long before it does, it will cause massive economic disruption and unemployment. The more complete automation becomes, the less power humans will have, the less influence humans will have over the powerful entities that hold the keys to critical resources such as jobs. The economics of automation leave little doubt that the outcome will be bad for humans. I’m sorry I can’t explain it more effectively here. But I think it’s clear to anyone who thinks it through carefully.

Please stop applying your intelligence to AI.

Edit: substantive counter-arguments would be highly appreciated

The current economic model is by no means a law of nature that humans must follow or else disappear. That model puts power in the hands of the minority that possesses the most capital (as machines or money to buy machines). Up until now that capital has needed workers to create useful stuff. I agree with you that automation means fewer and fewer workers are needed to create value, which widens the inequality gap.

The solution to that problem is not to stop working on AI; it is to rethink how capital and power are shared across society. Indeed, advanced AI and automation may well be our way out of the exploitation of man by man.

You completely misinterpret what I’ve said. The way capital and power are shared is determined by economics, not by us. The presence of AI will cause capital, power, and so on to naturally shift away from humans in general. It is inescapable.

Simple counter argument: If all "good people" stop researching AI, it will eventually end up in the hands of "bad people", whoever you deem those to be.

That has nothing to do with the substance of my argument. Your suggestion is just to walk into it without trying anything? The “bad people” will have access to AI regardless of who creates it, when they create it or what manner they create it in.

If your primary concern is just job loss, my reply does not address it. Also, I don't really share your concern, or at least the strength that you seem to hold it with.

My concern is with superintelligence. A hostile superintelligence would probably end humanity, or at least the part of it that I care about.

Basically I share Elon Musk's view on this. I think superintelligence is quite likely within a few decades or at least a century or two, and if/when it happens, our only hope is that it is imbued with values we share.

Ideally, the first organizations to discover how to do this should be open or under the control of democratic governments, or secondarily, under the control of corporations that can be regulated by such governments.

Don’t you see the logic of what I’m saying? How can you read my comment history and still think that automation is not a problem? But the answer is the same for automation and super-intelligence: you have to prevent it from existing in the first place. People who advocate for containment of SI, including Elon Musk, usually prefer prevention but think it’s too difficult and go to containment as a last resort. Fight as hard as you can for prevention.

I'm undecided on the issue, but the best and most detailed response I've read to your view is this one: https://idlewords.com/talks/superintelligence.htm

That was a huge useless ramble. And anyway, it seems to focus on the danger of hyper-intelligence, not the economics of automation. That guy doesn’t have any idea what he’s talking about.

Can you imagine any ways in which that kind of AI can be used for good (or less bad), however unlikely? If yes, become an AI researcher/policy shaper or donate to groups that might be able to make progress towards the better outcomes.

The individual use-cases don’t matter. What matters are the points of contact between AI as a concept and the fundamental economics of human life as we know it. AI will change the economics of life in very fundamental and very negative ways. Why does it matter or help to be active in some branch of AI in particular? It won’t change this!

The only solution to this problem is the banishment of AI. There is no other way to preserve life as we know it. AI might not provoke these changes within my lifetime. But people are very happy to protest and march for global warming even though it also will not end the world within our lifetime. There is a strange cognitive dissonance there. The logic is very similar: even if there is a small chance that it could end the world, better to err on the side of caution. The consequences of AI will be indescribably worse for humans than global warming, so why not exercise caution?

Because the arguments for global warming eventually ending the world are so far much stronger. The endgame for AI is much less clear and therefore the final outcome much less certain.

I admit, I don't understand people who think that 'Global Warming will eventually end the world'. I'm not denying it exists, nor that it can be devastating but end the world? No credible projections I've seen suggest that it will end up killing every human or blow up the planet or whatever you mean by that.

Is it the case that 'the enemy' is saying Global Warming isn't real, so there is a push to reply that not only is it real, but even worse than we expect? A conscious hyperbole so people will pay more attention to a real problem you perceive as neglected? An actual belief that it will genuinely end the world because you've heard it will be very bad so you conflate it with being the worst thing possible?

You're reading too much into what I said. I didn't mean to imply that global warming will end the world with certainty and definitely not that the planet will blow up. I simply meant that the probability of the world (in the sense of the human civilization in its current form; definitely not the planet itself) ending from global warming is greater than it ending from god-like AGI.

I don't consider this a controversial statement. From my own reading of the relevant studies, I think there is definitely a non-negligible probability that runaway effects may make Earth very harsh or even completely unsuitable for human life, eventually. Are you saying the current studies rule out this scenario entirely or with near certainty?

AI is much more severe. Whether just automation or otherwise. We can come back from global warming. We can never come back from AI.

That depends on a lot of factors which I am unconvinced can be determined beforehand. I agree that the worst-case AGI scenario is worse than the worst-case global warming scenario.

Nothing is determined beforehand; you use what you know to try and guess what will happen. How can you look at it all, and my comments, and not think that the outcomes I’ve described are a very good guess? It doesn’t have to be definitive. GW is not definitively proven. All you need is to show that there seems to be a chance that something really bad will happen. How have I not done that? What link in the chain is ambiguous to you?

Edit: substantive counter-arguments would be highly appreciated

I think the biggest problem with your argument is its lack of originality. Every labor-saving innovation in history has been greeted with "This is a death sentence for humanity" or equivalent sentiments, and the naysayers have never once been correct. You need to flesh out your argument by explaining exactly how It's Different This Time.

Your purpose as a human being is not to do a machine's job poorly. Think more of yourself and your fellow humans.

The lack of critical thought is entirely on your side of the court. Automation has not yet led to negative things, therefore it never will?

Once a job or task is automated, it can never be done by humans again. This is because within any market, the most competitive entity propagates and the others do not. The less competitive entities are starved of resources and stop existing. As AI approaches sentience, it will become the most competitive entity in more and more cases. When AI becomes comparable to humans, it will displace humans in every instance unless there is a conscious effort to subvert the market economy model and intentionally use less competitive options, otherwise known as abstaining from AI. However, some entities don’t care about a consensus to abstain and will use AI anyway. A country for example will always win against its adversaries if it uses AI. At the end of the day, AI will ratchet forward uncontrollably and the only entities left standing will be those who use AI. This is an inescapable and inherent characteristic of AI. Most AI experts concede this, but wave their hands and say it’s a long way off.

The only way for humanity to continue is for AI to be banished. And by the way, this is all assuming that AI never maliciously targets human life on its own or otherwise for any reason whatsoever.

> The lack of critical thought is entirely on your side of the court.

(Shrug) At least I have historical precedent on my side, so I don't have to fall back on a list of bare-assertion fallacies.

A bare-assertion fallacy is a “fallacy that makes no attempt to use logic in order to justify its conclusion.” Did you accidentally not read the whole paragraph of logic that I wrote in my comment? It’s right there dude. Pick one of the links in the chain of my logic to scrutinize. That’s how rational debate works.

Maybe you disagree that the most competitive entity always propagates in a free market? Maybe you disagree that the world and everything in it is a kind of free market? Maybe you disagree that machines can recreate any aspect of the human mind? I literally just don’t understand where I’ve gone wrong. Please point out where I’ve gone wrong in this.
