Eh. It's still searching many fewer positions than Stockfish is.
So there is a tradeoff between the depth of the search and quality of evaluation. For traditional chess algorithms, better evaluation was rarely worth the cost; it would slow down the search so much that it didn't pay for itself.
But this performance tradeoff (like all optimizations) critically depends on hardware. Change the hardware and you change which optimizations are "worth it".
AlphaZero is clearly good at using TPUs to maximum effect. But what would its performance be in a CPU-only environment? Maybe dumb but deeper searches still win there? This evaluation hasn't been done.
This isn't to say that the AlphaZero evaluation is "unfair". Rather, chess engines evolved to be too dependent on their environment. Getting maximum use out of CPUs is a strength, but not being able to use TPUs or even GPUs is a weakness.
But the fact that a generic algorithm absolutely destroys humans and their human crafted programs is the most interesting.
Yes, the TPU + GPU army is a huge amount of computation power, but I'm sure there'll be research coming out trying to compress the algorithms enough to use the same computation power as Stockfish.
Further, they used 5,000 (first generation) TPUs at 90 INT8 TOPS each, page 4, to run the network during MCTS and 64 (second generation) TPUs to train this thing according to the methods. That's a nice mix of using INT8 for inference and FP16ish for training IMO.
In contrast, I personally own 8 GTX Titan XP class GPUs and 8 more GTX Titan XM GPUs across 4 desktops in my home network. I'd love to experiment with algorithms like this, but I suspect I'd get just about nowhere due to insufficient sampling. These algorithms are insanely inefficient at sampling at the beginning. So I guess I will seed the network with expert training data to see if that speeds things up.
That said, more brilliant work from David Silver's group! But not all of us have 5,000 TPUs/GPUs just sitting around so there's still a lot more work/research to make this more accessible to less sexy problems.
On the other hand, Google will make a shit load of money when they make TPUs available on gcloud. Papers like this are great marketing for them.
And to make things simple, let's do it all in FP16 because INT8 on Volta ~= 1/2 a first generation TPU, but FP16 ~= 3 first generation TPUs at INT8 (sad, right?), an accident that occurred because P100 didn't support INT8, but consumer variants did.
So, 5,064/3 = 1,688 Volta GPUs ~= $5000 per hour, probably half that reserved, a quarter of that in spot.
Say you need a week to train this, so $200K-$800K...
You can buy DGX-1Vs off-label for about $75K. Say they cost $20K annually to host. Say you use them for 3 years, so total TCO is ~$135K per box, which comes down to $0.64 per GPU-hour (a DGX-1V holds 8 GPUs).
Conclusion: p3.8xl spot instances are currently a steal! But I don't have ~$200K burning a hole in my pocket, so I guess I'm out of luck.
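The back-of-the-envelope numbers above can be re-run as a short script. This is only a sketch of the arithmetic in this comment: the $5000/hour figure, the reserved and spot discounts, and the DGX-1V prices are the commenter's estimates, and the variable names are made up for illustration.

```python
# All inputs are the estimates quoted in the comment, not measured prices.
TPUS_TOTAL = 5_000 + 64            # self-play TPUs + training TPUs
TPUS_PER_VOLTA_FP16 = 3            # claimed: one Volta at FP16 ~= 3 first-gen TPUs
ON_DEMAND_PER_HOUR = 5_000         # claimed cloud cost for the whole fleet, USD
HOURS_PER_WEEK = 7 * 24

volta_gpus = TPUS_TOTAL // TPUS_PER_VOLTA_FP16

week_on_demand = ON_DEMAND_PER_HOUR * HOURS_PER_WEEK
week_reserved = week_on_demand / 2     # "probably half that reserved"
week_spot = week_on_demand / 4         # "a quarter of that in spot"

# Owning DGX-1V boxes instead: purchase plus three years of hosting.
DGX_PRICE = 75_000
HOSTING_PER_YEAR = 20_000
YEARS = 3
GPUS_PER_DGX = 8                   # a DGX-1V ships with 8 GPUs

tco = DGX_PRICE + HOSTING_PER_YEAR * YEARS
hours_owned = YEARS * 365 * 24
cost_per_gpu_hour = tco / hours_owned / GPUS_PER_DGX

print(f"Volta GPUs needed: {volta_gpus}")
print(f"One week: ${week_spot:,.0f} (spot) to ${week_on_demand:,.0f} (on demand)")
print(f"DGX-1V TCO: ${tco:,}, i.e. ${cost_per_gpu_hour:.2f} per GPU-hour")
```

Note that the $0.64 figure only works out per GPU; per box it is a little over $5 an hour.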
Computer graphics is still an impressive achievement even when done on a GPU instead of a CPU.
I'll even be charitable in order to simulate the existence of school/teachers/books: training from the start gets 2KW. But gameplay still gets capped to 20W.
Of course, you can hire people, and that has a well-defined cost, so it does all come down to money again.
We are comparing machine intelligence vs human intelligence.
It can be said that with more computational power, you can raise intelligence. Human brains consume more power relative to body size than those of any other animal.
Also, mobile phones have Internet access, so there's no reason the algorithm has to run on the phone itself. It could run on TPUs in the cloud. It's common for many games to have server-side components. Though this isn't even necessary except maybe if Magnus Carlsen wants to play it.
But I don't think a lot of people would pay for that vs. having a program that just runs on their phone and still beats them. So, in practice, without a significant subscription fee you are going to be limited to cellphone hardware.
PS: In practice most games take about as much computing power from a server as a chat app, because companies need to pay for that hardware. Remember, 1,000,000+ times X gets big unless you keep X very low.
It'd be like in a discussion about SpaceX's BFR designs to colonize Mars, someone comes in and questions why they're using retropropulsion since the requisite control systems are infeasibly expensive for amateur model rockets. It's a completely different discussion.
Otherwise the only takeaway is this failed to improve the state of the art.
So, the only case where chess performance per $ matters is if you are only ever going to use that hardware to run chess. In every other case, which is the vast majority of the time, you care about different metrics.
Speed probably made the initial self play training quicker though.
It's not just training. Training used 5,000 TPUs.
It seems that the metric should be compute time, not positions evaluated.
I can do more in the same wall time with a faster cpu(s); I can afford inefficiencies that the opponent cannot, and accomplish just as much.
AlphaZero evaluates 80K positions per second, according to this paper, and the Giraffe paper says that Giraffe averaged 258570 evaluations per second when running STS.
While we can't directly compare the computer power, this implies that AZ has learned a better representation.
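To put the two throughput figures quoted above side by side, here is a rough sketch. The 60-second budget matches the 1 minute per move from the paper's match settings; the uniform branching factor of 35 with no pruning is purely an illustrative assumption, so the resulting depths are a yardstick and not what either engine actually reaches.

```python
import math

AZ_POSITIONS_PER_SEC = 80_000      # AlphaZero, as quoted from the paper
GIRAFFE_EVALS_PER_SEC = 258_570    # Giraffe on STS, as quoted from the Giraffe paper
SECONDS_PER_MOVE = 60              # assumption: 1 minute per move
BRANCHING_FACTOR = 35              # assumption: uniform tree, no pruning


def full_width_depth(positions, branching):
    """Depth in plies of a uniform tree holding `positions` leaf nodes."""
    return math.log(positions) / math.log(branching)


throughput_ratio = GIRAFFE_EVALS_PER_SEC / AZ_POSITIONS_PER_SEC
az_depth = full_width_depth(AZ_POSITIONS_PER_SEC * SECONDS_PER_MOVE, BRANCHING_FACTOR)
giraffe_depth = full_width_depth(GIRAFFE_EVALS_PER_SEC * SECONDS_PER_MOVE, BRANCHING_FACTOR)

print(f"Giraffe evaluates {throughput_ratio:.2f}x as many positions per second")
print(f"Full-width depth per move: AZ {az_depth:.2f} plies, Giraffe {giraffe_depth:.2f} plies")
```

Under these assumptions a 3x throughput advantage buys only about a third of a ply, which is why the quality of the evaluation and of the move selection matters so much more than raw speed here.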
More seriously, it seems Deepmind and the AI community in general are having a streetlight effect problem, i.e. looking for AI in what works now, rather than coming to terms with the hard challenges. This explains why there are so many papers on GANs. People are just doubling down on what works (where the streetlight is), rather than acknowledging that where we need to look for AI is dark. Since it's become such a cut-throat race to be the next one to say "we made a breakthrough!", it makes much more economic sense to solve simple problems and advertise them as huge challenges.
But most people in the research community already know how amazing it would be to make an affordable household robot or a search-and-rescue robot or a self-driving car. Many labs (including mine) are working on it. The streetlight adds a small bias, but the bigger problem is that we have no idea how to build human-level AI.
Do that in real time, on device, without using a crazy amount of power because of batteries.
CNNs and faster GPUs are the biggest breakthrough in that regard, but there's still a long way to go before we get to a human-level visual cortex.
Translating to 3D is low-level and relatively easy. That's not the reason why we don't have household robots/self-driving cars.
Framing vision as "object attributes" and "correlate to prior knowledge" might be a good approach for current research. But humans do more--we understand what we look at. We form concepts and models of the world that allow us to adapt to very novel situations.
The main reason why we haven't solved vision, language, playing chess like a human, etc is that NNs are a poor approximation of human concepts. I agree that we probably need more compute and better compute.
Informed skepticism would have discounted MCTS against alpha-beta search but wouldn't have put much stock into the idea that neural networks couldn't learn better features than what has been painstakingly handcrafted. We know that given sufficient data and an appropriate architecture, neural nets have achieved better local minima than humans. This shouldn't be surprising anymore. A structurally adapted searcher will always do better in the domain it is adapted to. A cat is so good at being a cat, it doesn't even have to think about how to cat. Choice of optimization method, input pre-processing, loss function, hyper-parameters and architecture together define a search space, a structural prior and how to navigate it.
Returning to alpha-beta vs MCTS, my view is that earlier work on the chess search space being ill-suited to MCTS has not been invalidated once you account for the synergy between the neural net and search method brought about by the imitation learning approach. What might be happening here is the neural net not only learns to correct when it goes out of bounds, it also learns to account for missteps of MCTS!
The AlphaGo Zero chess program is clearly smarter than Stockfish from the perspective of its ability to better navigate the search space, but before talking about fire alarms there are some things to note.
Going by the paper, AlphaGo Zero does well if you hold compute fixed and adjust time, but how does it do as you move along both compute and time? This is of relevance to the general community, especially if AlphaGo Zero's skill degrades gracefully enough to allow it to be a better tutor than current engines.
Contrary to the no fire alarm claim, we should see sudden improvements everywhere due to how close joint, structured prediction, reinforcement and imitation learning are to each other. Unexpected improvement across a broad class of problems is a fire alarm. Right now, POMDP or games with hidden information and multiple interacting agents are still very difficult. Structured prediction is still difficult. Granted, this was before AGZ, but Neural Nets+MCTS had to be modified to Neural Self-Play before it could work just ok in poker-like games.
What we should take away is the power of combining searching and learning. I'll argue that what is now being called expert iteration was presaged in an antique 2006 paper where Hal Daume et al. discuss the power of a learning algorithm trained to imitate a search-computed policy. Even with limited compute and data, you can use similar ideas under the learning to search framework. The imitation approach is what's consistently yielded great results, whether applied to neural nets or logistic regression.
Neural Fictitious Self Play, based on fictitious play (invented in the 1950s), is an approach to reinforcement learning using neural nets for function approximation. Typical RL methods like DQN are highly exploitable. Against strong programs, NFSP did okay, with a win rate of -50 mbb/h against the best bot it played against.
Looking not just at Deepmind, there's Deepstack. It's similar to AlphaGo OG, combining CFR+Neural nets. Deepstack did not win convincingly against humans at 2 player no limit hold em.
The general point I'm trying to make here is that Chess and Go are closer to checkers than to poker, which is itself a constrained game with known rules. I mention all this and this Deepmind paper: https://arxiv.org/pdf/1711.00832.pdf, to provide a sense of scale to those talking about smoke and fire alarms.
I know it's complicated, between the hardware differences, search method used, etc. But when claiming that NNs beat hand-crafted evaluation functions, keep in mind that Stockfish is probably not the best choice to compare against, since it has made different tradeoff choices to get more depth (which goes back to search method and hardware choices).
You can't disconnect the search part from the discussion, as the search selectivity is ALSO learned by the neural network.
AlphaZero just tries to be smarter about which branches to evaluate so it can go deeper.
But I would love to see AlphaZero (trained) run side by side with Stockfish on iPhone hardware and defeat it. That would be a more apples-to-apples comparison.
In that time I figure they used the equivalent of about 1000 cpu-years. Imagine the things we'll be able to achieve as we can do more and more computation in less and less time.
Some scientists say the brain has a power of several petaflops, so if you use that as the baseline, I guess the design of Stockfish was way less efficient.
You can't really compare things to cpu years, it doesn't make sense. Power consumption would be a better metric I think.
A fairer comparison would be to use as much computation as needed for training, but at runtime use hardware of equal wattage.
E.g., 20W of CPU in a mobile phone running Stockfish vs. 20W of GPU in an Nvidia TX2 running AlphaZero vs. a 20W human brain.
Are you using some kind of conversion factor from TPUs to CPUs? If so, what is it? And is it valid to do that?
You could convert the amount of time it took to render an hour's worth of gameplay from 1 GPU-hour to 50 CPU-days (or whatever), but is that really meaningful?
Yes, it needs a boatload of very simple compute (8 bit operations), the kind that CPUs are not even close to ideal at providing economically.
That's not how "as well" works.
Sample game 1
Sample game 2
Sample game 3
Sample game 4
Sample game 5
Sample game 6
Sample game 7
Sample game 8
Sample game 9
Sample game 10
Alpha play feels "human" at least to this FM. This is fantastic news! It is what I would imagine a good correspondence GM would play like with engine assistance.
I already commented on Game 1 where Stockfish played extremely aggressively with 13. Ncxe5 ??! and 31. Qxc7 ?!
Game 3 is a positional masterpiece. Alpha is willing to play pawns + exchange down when it correctly evaluates that Black queen and rooks will be tied down.
This kind of long term thinking is beyond what regular engines perform.
Game 10 is also an impressive showing by Alpha. Alpha is willing to play down a piece and a pawn for 15 (30 ply) moves in a middle game beyond the reach of Stockfish's raw calculations.
If one could only get access to Alpha evals :) When do mere mortals get access to TPUs on Google Compute Engine?
There's a project currently that emulates AlphaGo Zero using distributed computing / crowdsourcing: https://github.com/gcp/leela-zero . You can run it on the browser too and it will submit the games after: https://ntt123.github.io/leela-zero/
Hope such a project will be available soon for the chess variant.
Or maybe Deepmind will release this as a SaaS product?
But I don't know whether they'll do it. I hope they follow suit like other researchers who have github repos with code and models besides their papers. Really accelerates research.
That was a bad move for white to play. It's easy to win when your opponent throws the game.
No human player would trade queens in that situation.
See for yourself:
EDIT: Nope, I'm just a noob.
I'm delighted. Chess seemed so simple. I had no idea there was a special pawn capture.
A stunning demonstration of generality indeed.
This has to already exist as it is very obvious.
The first AlphaGo paper had a system that used tons of computation, and was followed up by one that used much less and worked even better. Not speaking for Google, but I think it's a bit of a race to publish great results first. I wouldn't be surprised to see something better than this that uses 1000 times fewer resources published in a year or two, just like what happened with Go. First prove it's possible, then figure out how to make it much more efficient.
But unsettlingly few, nonetheless.
Would be good to see Deepmind's solution play Arimaa and Stratego, and see what kind of strategy it comes up with. Or weird variations of Go.
Eventually this tech will make it into military strategy simulators and that's where things will get really messed up. 4 star generals will be replaced by bots.
I suspect it would exceed the state of the art in Arimaa, since Arimaa is specifically designed to have a high branching factor (17281 -- compared to 35 for chess), and this technique was designed to work well in high-branching factor games (since Go is a high-branching factor game, though much lower than Arimaa).
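To give a feel for what those branching factors mean for search, here is a small sketch that treats both games as uniform trees (a simplifying assumption; real trees are irregular and engines prune heavily). The two branching factors are the ones quoted in this comment.

```python
import math

CHESS_BRANCHING = 35        # per ply, as quoted above
ARIMAA_BRANCHING = 17_281   # per turn, as quoted above


def uniform_tree_leaves(branching, depth):
    """Number of leaf positions in a uniform tree of the given depth."""
    return branching ** depth


def equivalent_depth(branching_a, branching_b):
    """How many levels of a b-ary tree match one level of an a-ary tree."""
    return math.log(branching_a) / math.log(branching_b)


chess_plies_per_arimaa_turn = equivalent_depth(ARIMAA_BRANCHING, CHESS_BRANCHING)

for depth in (1, 2, 4):
    chess = uniform_tree_leaves(CHESS_BRANCHING, depth)
    arimaa = uniform_tree_leaves(ARIMAA_BRANCHING, depth)
    print(f"depth {depth}: chess {chess:.3g} leaves, Arimaa {arimaa:.3g} leaves")

print(f"One Arimaa turn is worth about {chess_plies_per_arimaa_turn:.2f} chess plies")
```

In this simplified picture, looking one Arimaa turn ahead costs about as much as looking two to three plies ahead in chess, which is why exhaustive search struggles there and a learned move prior looks attractive.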
Deepmind is actively working on a StarCraft bot. It would be interesting to see if they can put together a superintelligent StarCraft bot and then translate those results to Stratego.
The paper says:
'AlphaZero achieved within 24 hours a superhuman level of play in the games of chess and shogi'
In the first game, Stockfish's 9. Qe1 is one of the strangest moves I've ever seen, one that would never be considered by a human, let alone a superhuman.
11. Kh1 also makes little sense, but is not as bad. My Stockfish sees it as losing 0.2 pawns, which makes it highly suspect in such a position.
35. Nc4 is also a deeply puzzling move that my Stockfish sees as losing half a pawn immediately, and a whole pawn soon after.
50. g4 is also suspect.
52. e5 is insane.
This is bullshit.
Edit: bullshit is too much - see comments below.
Edit: Oh dear. We're doomed.
Qe1 and Kh1 are fine if the plan is to prepare f4.
35. Nc4 stuck around at the #2 / #3 best move for as long as I ran that position.
Remember the Stockfish in the paper had 64 cores, so you'd have to run your Stockfish for a while to get it to arrive at the same principal variation.
I'd certainly fancy my chances against this AI more than Stockfish on a lower power.
SF plays really odd moves when left to its own devices for a time. As does this AI. So maybe chess looks really weird with play significantly better than the best humans.
It's actually really disturbing.
Of course you are right about perfect play, but the human-like aspect is part of what is exciting about these new Alpha engines.
How would you know?
The whole thread is pretty hilarious. In another part of the same thread there is this comment:
we're in a similar space -- http://www.getdropbox.com (and part of the yc summer 07 program) basically, sync and backup done right (but for windows and os x). i had the same frustrations as you with existing solutions.
let me know if it's something you're interested in, or if you want to chat about it sometime.
drew (at getdropbox.com)
9. Qe1 is a pretty normal maneuvering move
13. Ncxe5??! looks like a major howler.
Ask 100 strong chess players and 99 of them would completely ignore it.
You are giving up a piece for two pawns in an open position and black has no real weaknesses. There is no real basis for a sacrifice.
This shouldn't work. The crazy thing is that Stockfish almost makes it work.
It is the kind of move you play when you absolutely must win and must win now.
The only reason Stockfish considered it is the white pawn on a5, which gives additional tactics for breaking up the black pawn chain with a6 a couple of moves down. With the pawn on a4, Ncxe5 wouldn't be worth attempting.
The crazy thing is that being such a bully almost worked!
At move 28. White looks very solid, with 3 perfect pawns for the piece + black has horrible weaknesses.
29. g3 is a bit suspect but the next super computer move is
31. Qxc7 this has to be losing but it is a typical computer bully move.
Most strong human players would prefer to defend h3 hole with Kg2 (on Qh5 f5 looks fine).
The idea is that black's white square bishop is boxed in with white pawns.
There must be a concrete reason why Stockfish did not play Kg2.
Overall the impression one gets is of very "human" play by Alpha and ultra aggressive play by Stockfish.
EDIT: so extremely impressive play by Alpha but a bit suspicious aggression by Stockfish.
I agree Ncxe5 looks crazy, but the weirder thing to me is that Stockfish offers a repetition the very next move. So it can't be caused by having high contempt (favouring wins over draws).
I'm interested in applying this method, or a similar neural-network / tabula rasa based method to the game of Scrabble. I read the original AlphaGo Zero paper and they mentioned that this method works best for games of perfect information. The standard Scrabble AI right now is quite good and can definitely beat top experts close to 50% of the time, but it uses simple Monte Carlo simulations to evaluate positions and just picks the ones that perform better. It doesn't quite account for defensive considerations or other subtleties of the game. I was wondering if anyone who had more insight into MCTS and NN would be able to talk me through how to apply this to Scrabble, or if it even makes sense. One of the issues I can see currently would be very slow convergence; as it has a luck factor, the algorithm could make occasional terrible moves and still win games, and thus be "wrongly trained".
1) Alpha Zero beats AlphaGo Zero and AlphaGo Lee and starts tabula rasa
2) "Shogi is a significantly harder game, in terms of computational complexity, than chess (2, 14): it is played on a larger board, and any captured opponent piece changes sides and may subsequently be dropped anywhere on the board. The strongest shogi programs, such as Computer Shogi Association (CSA) world-champion Elmo, have only recently defeated human champions"
Because captured pieces change sides, there is less of an "endgame" scenario, and as a beginner (like me) it is very easy to put too many captured pieces back into play, which makes it hard to defend everything and essentially you end up giving them back to your opponent
Given the drawish tendency at top level, among human players, in correspondence chess and also in the TCEC final, I thought that even absolutely perfect play wouldn't score so well against a decent Stockfish setup (which 64 cores and 1 minute per move should be).
Maybe I'm missing some things but:
- Are 1st gen TPUs even accessible? You have to fill out a form to learn more about those second generation TPUs: https://cloud.google.com/tpu/
- I can't find the source code
This does not look like a scientific paper, but a (very impressive) tech demo.
Are we blindly accepting this as science now?
Even at home, you can verify the results by replaying the games against stockfish. You might not be able to replicate the setup at home, but that does not mean it is not science.
Does it even qualify as a tech demo if the result only exists in DeepMind's lab?
A similar problem exists in cosmology. Can you verify the multiverse model if you only have one universe to experiment in?
As data storage requirements in RAM and TPU power requirements increase to run certain models/algorithms, machine learning is becoming more obscure. Not only can we not understand how an AI is reaching its conclusions (inscrutability), we cannot even probe it (by tweaking parameters, etc.) to find weak points (inaccessibility). This is actually a good thing. Where humans cannot tread, there can be no evil?
Hope such a chess project like this will be available in the future.
Perhaps this move was justified though, as later in the same game Stockfish gets a position which is at worst drawn, likely winning. Moves later however, around move 40, Stockfish gets its own knight trapped and the game is over.
This is not the kind of chess we normally see from Stockfish.
A big class of imperfect information games can be modeled by having a record of everything the agent has seen so far. Then it has exactly the same, if not more, information available than a human player in the same position. We know that with equal information AIs can make better decisions than humans (see also, AlphaGo :] ) so at that point the AI could reasonably be expected to achieve superhuman performance.
The "imperfect information games are harder for AI" crowd are going to be surprised by just how badly humans deal with imperfect information. AIs have a much better memory than humans do, and much more potential to use actual probability, which humans are truly shocking at utilising (although neural networks don't seem to utilise this edge so far).
Earlier this year, DeepStack, a system combining neural nets with search, competed live against humans without any side being dominant. Search policy guided training might improve its results, which are impressive compared to even 5 years ago, but this highlights how much more demanding imperfect information games are.
Actually machines can have an even higher advantage in those cases, because they can be much better at estimating probabilities than humans. Think of card counting, for example.
Furthermore, techniques like Monte Carlo tree search used in AlphaGo don't work very well for poker. You can't just try to find the "best move" from the current game state, or you will end up playing a highly exploitable strategy. You essentially have to solve the entire game every time (or completely in advance) to make sure you are playing a balanced strategy.
Only the Counter-Factual Regret Minimization algorithm has been able to achieve this level of play in Heads Up, and right now it looks hard to scale to poker games with more players, like the full-ring games you see at the World Series of Poker, for example. We still have a ways to go in Poker AI.
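For anyone wondering what "playing a balanced strategy" looks like in code, here is a toy sketch. It is not Counter-Factual Regret Minimization itself and has nothing to do with any poker bot's actual code: it is plain regret matching, the per-decision update that CFR is built on, applied to rock-paper-scissors in self-play. The deliberately lopsided starting regrets and the iteration count are arbitrary choices for illustration.

```python
# PAYOFF[a][b]: payoff for playing a against b (0 = rock, 1 = paper, 2 = scissors).
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]


def regret_matching_strategy(regrets):
    """Play each action in proportion to its positive accumulated regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / len(regrets)] * len(regrets)


def action_values(opponent_strategy):
    """Expected payoff of each pure action against a mixed opponent."""
    return [sum(PAYOFF[a][b] * opponent_strategy[b] for b in range(3))
            for a in range(3)]


def solve_rps(iterations=20_000):
    """Self-play with regret matching; returns each player's average strategy."""
    # Player 0 starts out biased towards rock so the dynamics are not trivial.
    regrets = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    strategy_sums = [[0.0] * 3, [0.0] * 3]
    for _ in range(iterations):
        strategies = [regret_matching_strategy(r) for r in regrets]
        for player in (0, 1):
            values = action_values(strategies[1 - player])
            expected = sum(s * v for s, v in zip(strategies[player], values))
            for a in range(3):
                # Regret: how much better action a would have done than our mix.
                regrets[player][a] += values[a] - expected
                strategy_sums[player][a] += strategies[player][a]
    return [[s / iterations for s in sums] for sums in strategy_sums]


average_strategies = solve_rps()
print(average_strategies)
```

The strategy played on any single iteration keeps cycling and stays exploitable; it is the average over all iterations that approaches the balanced one-third mix. That is the sense in which you have to solve the game rather than pick the best move from the current state.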
In the figure on its preferred openings I find it very interesting that it doesn't like the Ruy Lopez very much over training time (there is a small bump but that is transient). I am hardly a chess expert but I know that it was very favored at the world championships so maybe the chess world will be turned upside down by this result now?
Positing that the chess world is bigger than the Go world (in terms of interest and finances) there is probably going to be a race to replicate these results "at home" and train yourself before your competitors :)
From there, Coursera has a paid(?) DL course by Andrew Ng, or there's Fast.ai, which looks good.
For reinforcement learning, I hear Barto and Sutton is very readable, but I haven't read it myself. You can just pick the concepts up by reading papers. The introduction in the Deep Q-Learning paper is not great, but it's how I first learned the concept.
By the way, I believe David Silver was the lead programmer for AlphaZero.
A nefarious suggestion would be that setting 1GB limit ensures that Alpha would always have the edge in depth as Stockfish would be forced to prune long lines to preserve hash memory.
Maybe someone who has read Stockfish source code can comment how Stockfish prunes hash memory.
The interesting thing is TCEC assumes a bit about the structure of the chess program. That is, the TCEC win-adjudication rule says that if both programs agree that one program is 6.5 pawns ahead for 8 turns in a row, they judge that program to be the winner.
But programs like Alpha don't have an evaluation function that operates in conventional units (like centipawns).
> We also measured the head-to-head performance of AlphaZero against each baseline player. Settings were chosen to correspond with computer chess tournament conditions: each player was allowed 1 minute per move, resignation was enabled for all players (-900 centipawns for 10 consecutive moves for Stockfish and Elmo, 5% winrate for AlphaZero). Pondering was disabled for all players.
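The resignation settings in that quote are simple enough to write down. This is a minimal sketch, assuming evaluations are recorded from the engine's own point of view and that "-900 centipawns" means at or below that value; the function names are made up, and the paper does not say whether AlphaZero's 5% threshold had to hold for several moves, so the second check only looks at the latest estimate.

```python
def should_resign_centipawns(evals_cp, threshold_cp=-900, consecutive=10):
    """True once the last `consecutive` evaluations are all at or below the threshold.

    `evals_cp` is the engine's own evaluation after each of its moves, in centipawns.
    """
    if len(evals_cp) < consecutive:
        return False
    return all(e <= threshold_cp for e in evals_cp[-consecutive:])


def should_resign_winrate(winrate, threshold=0.05):
    """True when the estimated win probability drops below the threshold."""
    return winrate < threshold


# Nine hopeless evaluations are not enough; the tenth triggers resignation.
history = [-120, -400] + [-950] * 9
print(should_resign_centipawns(history))             # not yet
print(should_resign_centipawns(history + [-1000]))   # resigns
print(should_resign_winrate(0.04))
```

The two rules are not directly comparable: one is expressed in material-like units and the other in win probability, which is the mismatch with TCEC-style adjudication pointed out above.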
The answer may be that it is hard enough to become an expert at anything, but there may be some serendipitous (how to make this precise?) overlap.
[EDITED to add:] A couple of other remarks:
Playing against Stockfish, the Sicilian seems to give it more wins as white and more losses as black than any of the other openings listed here.
What's shown here are two particular versions of the Sicilian; for all we know there's a lot more 1.e4 c5 in its self-play than the graphs suggest (e.g., maybe as white it prefers 2.c3 or 2.Nc3 or something). Eyeballing those graphs, these 12 openings account for substantially less than half of AlphaZero's self-play games.
Queen's gambit is there.
This creates the psychological effect of slightly turning the knob of "Black is playing for equality", to "Black is playing for counter-play".
I'm not in a position to read the paper right now, so my apologies if that's covered in there. I want to ask just in case it's not, while this is still on the front page.
Leela Zero (the main AlphaGo Zero replication project) is a crowdsourced computation effort that's going to take a fairly long time to get anywhere.
And from this paper:
> "Training proceeded for 700,000 steps (mini-batches of size 4,096) starting from randomly initialised parameters, using 5,000 first-generation TPUs (15) to generate self-play games and 64 second-generation TPUs to train the neural networks."
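For a sense of scale, the figures in that quote multiply out as follows. This is just the arithmetic on the quoted numbers; it counts positions sampled into mini-batches, not distinct self-play games, which the quote does not give.

```python
TRAINING_STEPS = 700_000      # from the quoted passage
MINI_BATCH_SIZE = 4_096       # from the quoted passage
SELF_PLAY_TPUS = 5_000        # first-generation TPUs generating games
TRAINING_TPUS = 64            # second-generation TPUs training the network

positions_sampled = TRAINING_STEPS * MINI_BATCH_SIZE
self_play_to_training_ratio = SELF_PLAY_TPUS / TRAINING_TPUS

print(f"Positions sampled during training: {positions_sampled:,}")
print(f"Self-play TPUs per training TPU: {self_play_to_training_ratio:.1f}")
```

So roughly 2.9 billion training positions, with about 78 self-play devices feeding each training device. Generating the games is where almost all of the hardware goes.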
>"Why is the net wired randomly?", asked Minsky. "I do not want it to have any preconceptions of how to play", Sussman said. Minsky then shut his eyes. "Why do you close your eyes?", Sussman asked his teacher. "So that the room will be empty." At that moment, Sussman was enlightened.
As for that koan, I'm not convinced it's very applicable here. My interpretation of the koan is that the entire setup (training process, structure, etc.) all encode domain knowledge. In this case, I think AlphaZero's domain knowledge is transferable enough that I don't think it's relevant.
Table 2 is broken, but the rest is much more readable if you're on a phone.
It would be interesting to hear if Magnus thought AlphaZero played less like an idiot.