Alphazero crushes Stockfish in 1,000 game match (chess.com)
48 points by stillsut 10 days ago | 42 comments





"CCC fans will be pleased to see that some of the new AlphaZero games include "fawn pawns," the CCC-chat nickname for lone advanced pawns that cramp an opponent's position. "

The "fawn pawn" naming comes from fans of kingscrusher on youtube who analyzes Leela Chess Zero games. https://www.youtube.com/user/kingscrusher

His accent makes "thorn pawn" sound like "fawn pawn", and the name stuck.

Here is a link to shirts he sells with "fawn pawn" on them. https://teespring.com/fawn-pawn?tsmac=store&tsmic=kingscrush...


I feel compelled to repeat the previous criticisms of the games:

- Stockfish was never designed for, nor tested on, this many cores; running on 44 cores may actually degrade its performance.

- Stockfish was designed to start a game with an opening book. In the games played with an opening book, Stockfish also won significantly more games.

- Stockfish 8 was not a particularly strong version; Stockfish 9 or 10 would have been a better choice.

Nevertheless, the performance of AlphaZero was impressive; in particular, the positional knowledge it has acquired is second to none. In all existing chess engines, positional knowledge is under-represented, captured only through simple heuristics. Acquiring real positional knowledge has been a long-time dream of many generations of chess programmers: to create an engine that plays a more human-like style of chess. AlphaZero has realized this dream and even goes beyond it, extending human knowledge of chess.

I believe the most intriguing question right now is why AlphaZero stopped improving after 9 hours of training. Is it due to an inherent property of chess, or to the limits of ANNs? If the latter, how can we break through and create a new generation of engines that surpass even AlphaZero?


Perhaps at the highest level of chess, almost all games will be draws. So there's not much progress left to be made other than playing a variant with fewer draws, like Shogi?

Do we know what hardware each was using? Aside from time, a criticism of the previous AlphaZero/Stockfish match was that AlphaZero used a tremendous amount of TPU power while Stockfish was running on, essentially, an average laptop.

They used the same config as TCEC.

"Stockfish was configured according to its 2016 TCEC world championship superfinal settings: 44 threads on 44 cores (two 2.2GHz Intel Xeon Broadwell CPUs with 22 cores), a hash size of 32GB, syzygy endgame tablebases, at 3 hour time controls with 15 additional seconds per move"


For comparison, AlphaZero used 4 TPUs and the same number of cores. So more computing power but not absurdly so.

And it's literally true that the TPU has more computing power than a CPU in terms of, e.g., FLOPS, but part of the reason NNs are so powerful is that their operations can be effectively parallelized, whereas Stockfish is optimized for CPUs that can efficiently branch and do smaller, more flexible operations.

How did AlphaZero lose 6 games? Do we have an analysis of those? I'd love to see what happened.

They are available in the appendix but here's one of them: https://lichess.org/IVW0T4Jd

Kinda strange that Stockfish won it, since it looked like Stockfish pushed for a draw via repetition multiple times, with AlphaZero declining.

AlphaZero seems to hate repetitions, whether in won, drawn, or lost positions. I don't understand why, given that it isn't explicitly programmed with "contempt" like most engines.

Most chess engines have "contempt"? What does it do and what purpose does it serve?

It influences how the engine values positions that could lead to draws. If playing against an (assumed) weaker opponent, it's likely good to avoid draws when there's a good chance to win. Conversely, locking in a draw is better than likely losing later.

Since repetition is one way to get a draw, an engine with positive contempt (it assumes the opponent to be weaker than itself) will score repetitive moves lower and is more likely to pick something else.
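As a hypothetical sketch (not actual engine code; the function name and default value are invented for illustration), contempt can be thought of as replacing the draw score of 0 with a small penalty:

```python
# Hypothetical sketch of how a contempt setting biases an engine
# away from draws. Names and the 20-centipawn default are assumptions.

def score_with_contempt(raw_eval_cp, is_draw_line, contempt_cp=20):
    """Return the score (in centipawns) used for choosing between lines.

    raw_eval_cp:  static evaluation of the line (0 would mean equal).
    is_draw_line: True if the line forces a draw (e.g. repetition).
    contempt_cp:  positive -> engine assumes it is the stronger side
                  and treats a draw as slightly bad for itself.
    """
    if is_draw_line:
        # A draw is scored as -contempt instead of 0, so the engine
        # prefers even a slightly worse continuation over repeating.
        return -contempt_cp
    return raw_eval_cp

# With contempt = 20, taking the repetition scores -20, so a
# continuation evaluated at -10 is chosen instead of the draw.
print(score_with_contempt(0, True))     # -20
print(score_with_contempt(-10, False))  # -10
```

With a negative contempt value the logic flips: the engine treats draws as desirable and steers toward repetitions against an assumed stronger opponent.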


This looks like a draw, is it not?

I'm glad that they took a lot of criticisms to heart this time.

The opening book and Syzygy tablebases were enabled, so we're seeing Stockfish go at full power here. The one remaining problem is that Stockfish doesn't scale well to this many cores, but there's not much the admins of the test can do about that.

This test seems fair IMO.


Previous discussion from a submission by a DeepMind engineer here: https://news.ycombinator.com/item?id=18620978


> According to DeepMind, AlphaZero uses a Monte Carlo tree search, and examines about 60,000 positions per second, compared to 60 million for Stockfish.

The previous statement was talking about how much faster and more efficient AlphaZero is, but the interpretation I pick up from that sentence is the opposite. Is this a "golf score" situation where lower is better?


When Stockfish evaluates a position, it explores moves to a greater depth (more plies ahead), estimating the value of the final board positions it reaches with a relatively simple heuristic. AlphaZero evaluates the candidate moves using the neural network, which guides the search with a very complex heuristic that implicitly incorporates a tremendous amount of depth from the prior games baked into the model. Similar to the way an image recognition model takes in a whole image and says "this is an image of a goat," AlphaZero takes in a whole board and says "this is a winning board."

> Similar to the way an image recognition model takes in a whole image and says "this is an image of a goat," AlphaZero takes in a whole board and says "this is a winning board."

IIRC, AlphaZero has two outputs from the neural network. You described the first output.

The 2nd output was absolutely critical to its growth in strength; in effect, this 2nd output is the difference between AlphaGo and AlphaZero. It guides the Monte Carlo tree search.

Naive MCTS looks at board positions randomly. AlphaZero's MCTS looks at board positions the neural network deems "interesting". In effect, the neural network both guides the search (output #2), and evaluates the position (output #1).

MCTS chooses a position based on its "interesting factor" as well as on "how much that position has already been evaluated". Ex: if "Knight to c3" has been evaluated 1 million times, MCTS will try to look at other positions. But if the neural network says "Knight to c3 is really, really interesting", MCTS will still favor looking at that position over others.

Etc. etc. down the hierarchy of moves.
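The selection rule described above is usually written as the PUCT formula: the network's prior ("interesting factor") boosts a move's score, while its visit count shrinks that boost. A rough sketch, where the moves, priors, visit counts, and the c_puct constant are all made up for illustration (not DeepMind's code):

```python
import math

# AlphaZero-style move selection (PUCT rule), illustrative only.
# prior        = policy-head output for the move ("interesting factor")
# child_visits = how often the move has already been explored
# value_avg    = running average of value-head evaluations below it

def puct_score(value_avg, prior, parent_visits, child_visits, c_puct=1.5):
    exploration = c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)
    return value_avg + exploration

def select_move(children, parent_visits):
    # children: dict move -> (value_avg, prior, child_visits)
    return max(
        children,
        key=lambda m: puct_score(children[m][0], children[m][1],
                                 parent_visits, children[m][2]),
    )

children = {
    "Nc3": (0.05, 0.60, 1_000_000),  # very interesting, heavily explored
    "e4":  (0.04, 0.25, 50_000),
    "d4":  (0.03, 0.15, 10_000),
}
# Nc3's huge visit count shrinks its exploration bonus, so the
# less-explored d4 wins the selection here.
print(select_move(children, 1_060_000))  # d4
```

This is how the two outputs cooperate: the value head fills in `value_avg`, the policy head fills in `prior`, and the visit-count term keeps the search from fixating on one line forever.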


AlphaZero effectively does pattern matching on the board, whilst a traditional engine needs to calculate and rate every move to some depth (<15 plies). That search space grows exponentially with depth, whilst AlphaZero's evaluation is constant-time per position.

With deep search trees as in Chess or even more so in Go, a pretrained evaluator function easily beats any tree search. Applied AI.


AlphaZero: Few positions, very thorough evaluation. Stockfish: Lots of positions, more simplistic evaluation.

I don’t think either is better or worse; they’re just different strategies.

A probably somewhat wrong but evocative way to think about it: what AlphaZero does is closer to “intuition”; it can look at a board and immediately tell very well how good it is. Maybe you could argue this is more like how human grandmasters play.

Stockfish, on the other hand, has much simpler guesses for how to evaluate a board, but they can be computed very quickly.

Both of them employ a tree search: “if I do this move then they’ll have three countermoves and I can respond in 16 ways...”. (AlphaZero's search is Monte Carlo because it samples promising lines rather than exploring exhaustively; Stockfish uses a deterministic alpha-beta search.) But Stockfish is able to explore many more possible outcomes, whereas AlphaZero explores far fewer, focusing only on those that really are promising.
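The Stockfish half of that comparison can be sketched as a plain depth-limited negamax over a toy game: search every line to a fixed depth, score the leaves with a cheap heuristic. Everything here (the function names, the toy "game") is invented for illustration; real engines add alpha-beta pruning and much more.

```python
# Toy illustration of "lots of positions, simple evaluation":
# a depth-limited negamax search with a trivial leaf heuristic.

def negamax(state, depth, evaluate, moves, apply_move):
    """Value of `state` for the side to move, searching `depth` plies."""
    legal = moves(state)
    if depth == 0 or not legal:
        return evaluate(state)  # cheap heuristic at the leaves
    # Best reply: maximize the negation of the opponent's best score.
    return max(-negamax(apply_move(state, m), depth - 1,
                        evaluate, moves, apply_move)
               for m in legal)

# A trivial "game": the state is an integer, each move adds or
# subtracts 1, and the cheap evaluation is just the number itself.
value = negamax(
    0, 3,
    evaluate=lambda s: s,
    moves=lambda s: [+1, -1],
    apply_move=lambda s, m: s + m,
)
print(value)  # 1
```

The branching factor here is 2, so depth 3 visits 8 leaves; chess has a branching factor around 35, which is why exhaustive depth is so expensive and why AlphaZero's "explore few, evaluate well" trade-off is interesting at all.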


To all of the people responding, thank you for explaining how it works.

AlphaZero runs on TPUs while Stockfish uses CPUs. Stockfish performs an (almost brute-force) calculation of positions, while AZ has to "guess" which moves are better based on its neural network, so it evaluates far fewer positions per second. If AZ could calculate 60 million positions per second, Stockfish would have no chance against it.

tldr: Higher is better.


"Crushes an old version of Stockfish." The current Stockfish 10 is said to have a >100 Elo advantage over Stockfish 8.

The paper says they had almost the same results against Stockfish 9. That said, it would be interesting to see a fairer competition with Stockfish set up by an opposing team, and maybe the same budget for cloud compute. Given that they run on different hardware, that might be the way to do it.

Hmmm, I think 100 Elo is quite small in the grand scheme of things. But then, AlphaZero's lead isn't really that large either.

+155 -6 =839.

That's 155 wins, 6 losses, and 839 draws.
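For a rough sense of what that result implies in rating terms, the standard logistic Elo model converts the score fraction into a rating gap. Back-of-the-envelope only:

```python
import math

# Elo gap implied by +155 -6 =839 over 1000 games,
# using the standard logistic Elo model.

wins, losses, draws = 155, 6, 839
games = wins + losses + draws
score = (wins + 0.5 * draws) / games           # fraction of points won
elo_gap = 400 * math.log10(score / (1 - score))

print(f"score = {score:.4f}, Elo gap = {elo_gap:.0f}")  # about +52
```

So a 57.5% score over 1000 games corresponds to roughly a 50-Elo edge, which puts the "Stockfish 10 is >100 Elo stronger than 8" objection into perspective.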


Still, Houdini is again leading the CCCC live championship table, with Stockfish at #2 and the open-source AlphaZero clone lc0 at #3:

https://www.chess.com/computer-chess-championship


Will AlphaZero be available to more chess players to play? It would be interesting to find a blind spot in this engine in a format where humans could use their brains and more tools trying to beat it. Or is it really unbeatable?

Chess AI has been "unbeatable" for a long time now.

At this point, Stockfish on an iPhone (set to play at full strength) can crush the best chess player in the world (Magnus Carlsen) like a bug.

I thought AlphaZero only runs on Google's proprietary hardware (TPUs)? If that's the case, then it's not much use to the rest of us, is it?

What would happen if you let AlphaZero compete against AlphaZero?

That's exactly how it's trained: it competes against itself in millions of games and updates its neural network with the heuristics of the winning instance.

Personally, in that case I'd probably bet on AlphaZero to win. (that was a joke)

It might be an interesting question what the appropriate bet would be for win vs draw in this case, and how this would change with greater training. Presumably the more you train both sides the more likely they are to draw?

Also interesting would be to quantify the effect a small hardware handicap has, and how this trades off with training. Is more training always better than more hardware? Vice versa?


> Personally, in that case I'd probably bet on AlphaZero to win. (that was a joke)

Then I would bet on a draw.


Unless you think they are "perfectly" matched, in which case you might bet on White to win because of a hypothesized first player advantage. I think it's still an open theoretical question as to what would happen with two equally matched perfect players: https://en.wikipedia.org/wiki/First-move_advantage_in_chess

That's how it was trained.

Shouldn’t we take into account the computing power used to train AlphaZero?

I feel the comparison is a bit unfair...


Stockfish's evaluation/search heuristics are also tuned with a lot of CPU power:

http://tests.stockfishchess.org/tests

Plus it has all of the knowledge from past human/computer chess research, experimentation and tuning that's been done in other chess engines since the 70s helping it.


That’s a misleading comparison.

Stockfish would still be an extremely strong engine even without that tuning. AlphaZero couldn’t even move a single piece without having been trained extensively.



