That said, can someone please explain the logic behind DeepMind releasing only 20 games out of a 1000-game match? That is a remarkable degree of cherry-picking.
As a former chess prodigy and software engineer, I want to believe these results but they would be much more credible if DeepMind open sourced the games.
I don’t see how that would expose any valuable proprietary IP, but I’m a total machine learning noob so maybe I’m missing something.
"Data sharing is a common requirement of funding or publication, though this obligation may come as a surprise to some authors—and to their colleagues who have had trouble acquiring data from other laboratories."
Indeed not only is it present, but due to the various ways academia ranks citations and papers, it's fairer to say it's actually incentivised by academic culture.
Agreed that it would be nice to get all 1000 (or maybe there are more, since these seem to have come from two matches played under different conditions, and they mention pitting AZ against Stockfish in many other sets of conditions), and it really shouldn't compromise DeepMind's IP.
It's reaching quite a competitive state, having trained to an Elo of ~3500 or so (based on evaluations against Stockfish et al.).
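For context on what an Elo gap means, the standard Elo model maps a rating difference to an expected score. A minimal sketch, using the usual logistic formula; the specific ratings below are made up purely for illustration:

```python
def elo_expected_score(r_a: float, r_b: float) -> float:
    """Expected score of player A against player B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

# Hypothetical ratings, for illustration only.
print(round(elo_expected_score(3500, 3400), 3))  # -> 0.64
print(elo_expected_score(1500, 1500))            # -> 0.5
```

So even a 100-point gap only predicts about 64% expected score, which is why large match samples matter when comparing engines.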
AlphaZero is undoubtedly impressive, but I would love to see an FPGA version of Stockfish vs AlphaZero.
Even with my rudimentary understanding of machine learning, I'm fairly certain that reverse engineering a model presumably based on millions of training games would be utterly impossible just by looking at a few thousand games.
"Probably just didn't think anyone would want to dig through a 1000+ games."
Actually, I think a lot of people would be interested:
- chess players like myself
- skeptics, academic researchers, and other "data scientists" looking for actual data to back up these impressive claims
The quote from Lee about creativity is haunting. The current debate centers on using human reinforcement for robot meta-learning, particularly one-shot learning from a single video demonstration. Given advances in near-future TPU Cloud performance and AI accelerators, the main competitor to emerge may be end-to-end transfer learning directly from simulation, with the unexpected result of novel strategies beyond the capacity of expert-level humans to devise.
Ilya Sutskever at AI Frontiers 2018: Recent Advances in Deep Learning and AI from OpenAI
These algorithms are meant for stochastic, not entirely predictable environments, so your action might not always work, and you should be able to carry on with the task even after some action fails (e.g., in the case of walking: you could slip, the wind could blow harder, or something could hit you).
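That kind of stochastic action outcome is easy to sketch. This is a toy model I made up for illustration (the action names and slip probability are arbitrary): the intended action executes with probability `p_success`, otherwise a random other action happens, like slipping.

```python
import random

def step(intended: str, actions=("left", "right", "forward", "back"),
         p_success: float = 0.8, rng=random) -> str:
    """Return the action actually executed: the intended one with probability
    p_success, otherwise a uniformly random other action (a slip, a gust of
    wind, etc.)."""
    if rng.random() < p_success:
        return intended
    return rng.choice([a for a in actions if a != intended])

rng = random.Random(0)
outcomes = [step("forward", rng=rng) for _ in range(10000)]
print(round(outcomes.count("forward") / len(outcomes), 2))  # close to 0.8
```

A policy trained against this kind of environment has to keep working when `step` returns something it didn't ask for, which is exactly the robustness being described.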
> What is worthwhile to observe from the AlphaGo dataset, and illustrated as a series in Figures 3 and 4, is that there is clearly an incremental increase in the strength of the AlphaGo algorithm going from version αr to αrvp, building on previous strengths, without any intransitive behaviour occurring, when only considering a strategy space formed by the AlphaGo versions.
Transitivity might be true within AlphaGo versions, but that doesn't give me any confidence that it would also hold when a human is in the equation. If a group of policies more or less occupy the same state space, they are likely to be transitive, but if they occupy disjoint state spaces, I don't think we can be sure of transitivity.
Why would this be true? If we were talking about ai-generated music that would be one thing, but it isn't intuitively obvious to me why a computer would play Chess or Go all that differently from a human.
Consider rock-paper-scissors, and agent X, which plays X every move. Agent rock beats agent scissors every time, and agent scissors beats agent paper every time, but we cannot conclude that agent rock beats agent paper (in fact it loses every time). Chess and Go might have this intransitivity property. The fact that even one intransitive game exists means the burden of proof that Chess and Go are transitive rests on the authors.
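To make the intransitivity concrete, here's a tiny payoff table for those pure-strategy agents (a sketch of the rock-paper-scissors example above, nothing from the paper):

```python
# Payoff for the row agent: 1 = win, -1 = loss, 0 = draw.
BEATS = {"rock": "scissors", "scissors": "paper", "paper": "rock"}

def payoff(a: str, b: str) -> int:
    if a == b:
        return 0
    return 1 if BEATS[a] == b else -1

# rock > scissors and scissors > paper, yet rock does NOT beat paper:
print(payoff("rock", "scissors"), payoff("scissors", "paper"), payoff("rock", "paper"))
# -> 1 1 -1
```

"A beats B" here is not a transitive relation, which is the whole point: pairwise results between engine versions don't automatically order them against a third party like a human.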
My own suspicion is that AlphaZero should be weaker against humans compared to AlphaGo Lee since the former is not trained on human games while the latter is.
AlphaZero does seem to have a style which favors positional development over value. It seems to choose pattern over material.
I wonder what would happen if one set of kids was taught the value of the pieces and another set weren't taught the value. Would one group be better than the other? Is knowing the value of the pieces a help or an impediment to developing your chess skill, especially at such an early age?
> DeepMind didn't find any when training in different ways
Intransitivity may not occur within AG versions but could occur when compared to humans. You need very different types of strategies for intransitivity to occur.
> Are they supposed to run a tournament just to show that a massive improvement that crushes all previous versions can still beat puny humans?
I think so, yes. We do this with Chess. Why should Go be different?
Deepmind is aware of this problem. See gwern's links if you are interested.
It's the other way around no? Stockfish's preference for material is hand-coded by humans... because our best wisdom values material highly.
I got the feeling that AlphaZero cares about material only insofar as it occupies positions and projects power; in this sense it is further into the abstract-strategy realm than grandmasters and human-coded chess programs are.
Human-style players are like quantum theory: they’re concerned with stuff. AlphaZero is like general relativity: it’s concerned with bending the backdrop so mere stuff becomes pliant and goes where it wants... of course you need stuff to bend the backdrop, but beyond that...
I don’t know if I’m making much sense, cockeyed metaphors are my only means of expressing my impressions.
In the (simpler and less studied) game of Shogi, AlphaZero made very startling moves, which increased the number of vulnerabilities (and opportunities). It is better able to balance all these combinatoric paths.
Perhaps as humans, we have limited combinatoric capacity, and hence are biased toward conservative moves to reduce our search space (accepting a lower expected value in order to bound our variance?).
Whether this is less-strategic or simply less-bound-by-human-limitation is up to discussion.
EDIT: I'd be interested to hear your views about larger DRL domains. Drop me a line if you're interested: firstname.lastname@example.org (temp email).
Look at how Blizzard did computer opponent AI for games like Starcraft and Warcraft in the early 2000s.
The "insane" AI builds its base and units well, controls its troops quickly, and can micromanage different types of units.
But its real strength is that it gets 2x the normal amount of resources that a human player gets, allowing it to build structures sooner than human players and build 2x the units.
This is the necessary tradeoff to make up for the fact that a human can recognize and exploit computer weaknesses, and the computer generally can't infer what its opponent is doing through the subtle hints that humans pick up on.
For example if you catch even a slight glimpse of a certain type of unit an enemy has, or see that an enemy has moved into a certain part of the map early on, you can guess with a reasonable confidence level what build or strategy they are going for without seeing the full playing board.
“Those multiplayer games are harder than Go, but not that much higher,” Campbell tells IEEE Spectrum. “A group has already beaten the best players at Dota 2, though it was a restricted version of the game; Starcraft may be a little harder. I think both games are within 2 to 3 years of solution.”
Solving these imperfect-information games within 2-3 years would be amazing, IMO.
Just fyi, there are many caveats for how this relates to progress in AI as a whole (self promotion warning): https://www.skynettoday.com/editorials/is-alphago-zero-overr...
(yes the article is for AlphaGo Zero, but it largely applies to AlphaZero as well).
Isn't this controversial?
As I've been saying for years now: if you can model a problem as a complete-information adversarial game, that problem is solved.
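The basis for that claim is that any finite perfect-information zero-sum game has a well-defined minimax value, computable in principle by exhaustive search. A toy sketch on a hand-rolled game tree (the tree below is made up for illustration):

```python
def minimax(node, maximizing: bool):
    """Exhaustive minimax over a nested-list game tree.
    Leaves are ints (payoff to the maximizer); internal nodes are lists."""
    if isinstance(node, int):
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# A made-up two-ply tree: the maximizer picks a branch, the minimizer replies.
tree = [[3, 5], [2, 9], [0, 7]]
print(minimax(tree, maximizing=True))  # -> 3
```

Of course, for chess or Go the tree is astronomically large, so "solvable in principle" is very different from solved in the formal, exhaustive sense.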
While what they've achieved is super impressive, it has nothing to do with solving chess in a formal sense.