Deepmind Produces a General-Purpose Games-Playing Machine (ieee.org)
133 points by pross356 5 days ago | 45 comments





Props to DeepMind for an impressive result. And for fixing the unfair conditions from the previous match, where they “crushed” a Stockfish that was severely handicapped by the time controls and lack of an opening book. Note that Stockfish's score was significantly improved this time around.

That said, can someone please explain what the logic is for DeepMind only releasing 20 games out of a 1000-game match? That is an extraordinary degree of cherry-picking.

As a former chess prodigy and software engineer, I want to believe these results but they would be much more credible if DeepMind open sourced the games.

I don’t see how that would expose any valuable proprietary IP, but I’m a total machine learning noob so maybe I’m missing something.


I agree. I'm annoyed that companies like Google get to release selected parts of their work, keep the rest secret, and yet get accepted by top journals. I am positive (and have seen reviews saying as much) that academics would never get away with this kind of filtering of results.

Academics regularly get away with just posting results as graphs, without even the actual numbers behind the graphs given out, and certainly not data, code, etc. I've done that for multiple papers in top journals in my field, and lots of others do the same. I give out the data and sometimes the code when people email me, but it's never required during the peer review process. Reviewers are just expected to believe the results in the submitted paper.

I always find it funny when people hold so much faith in the peer review process, as if the reviewers have some special knowledge or insight that lets them legitimize a paper. The majority of the time, the peer reviewers read the exact same thing everyone else reads, provide suggestions for changes, etc., and eventually accept the paper for publication. But in terms of actual verification that the authors of the paper are not fabricating results, there is nothing remotely close to that.

I agree they should do it, but it looks like other researchers do sometimes get away without publishing their data?

"Data sharing is a common requirement of funding or publication, though this obligation may come as a surprise to some authors—and to their colleagues who have had trouble acquiring data from other laboratories."

https://www.nature.com/articles/nn0807-931


I hate to break it to you, but academics certainly do get away with this practice. You simply don't have to report the 1000 results or hypotheses that didn't generate positive findings; see also: publication bias.

Indeed not only is it present, but due to the various ways academia ranks citations and papers, it's fairer to say it's actually incentivised by academic culture.


There are at least 230 games released according to https://deepmind.com/research/alphago/alphazero-resources/

Agreed that it would be nice to get all 1000 (or maybe there are more? Since these seem to have come from two matches played under different conditions, and they mention pitting AZ against Stockfish in many other sets of conditions) and it really shouldn't compromise DeepMind's IP.


Not sure if this is in your wheelhouse, but there's an open-source, community-driven re-implementation of AlphaZero targeting chess: https://github.com/LeelaChessZero/, http://lczero.org/

It's reaching quite a competitive state, having trained to an Elo of ~3500 (based on evaluations against Stockfish et al.).
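
For context on what an Elo gap means in practice, the standard Elo formula maps a rating difference to an expected score. A rough sketch (taking the ~3500 figure above at face value; my own illustration):

    def expected_score(r_a, r_b):
        """Expected score (win = 1, draw = 0.5) for a player rated r_a against one rated r_b."""
        return 1 / (1 + 10 ** ((r_b - r_a) / 400))

    print(round(expected_score(3500, 3400), 2))  # ~0.64: a 100-point edge scores about 64%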


Thanks, I’ve heard about LeelaChess a bit but haven’t gotten a chance to check it out in depth. It sounds like a really interesting project. I’m glad somebody is doing open source work in this area; I just wish Google would join the party a bit more.

Still think there’s a total hardware mismatch here. AlphaZero got 2 TPUs and 44 cores vs Stockfish on just 44 cores.

AlphaZero is undoubtedly impressive, but I would love to see an FPGA version of Stockfish vs AlphaZero.


Anyone else see the first sentence of this, think about passing props to Deepmind, and reflect that they've spent too much time writing JS lately?

Maybe you could partially reverse the model if you had this data + timeline? I'm not sure though. Probably just didn't think anyone would want to dig through 1000+ games.

"Maybe you could partially reverse the model"

Even with my rudimentary understanding of machine learning, I'm fairly certain that reverse engineering a model presumably based on millions of training games would be utterly impossible just by looking at a few thousand games.

"Probably just didn't think anyone would want to dig through a 1000+ games."

Actually, I think a lot of people would be interested:

- chess players like myself

- skeptics, academic researchers, and other "data scientists" looking for actual data to back up these impressive claims


Feels like a historic moment. DeepMimic is a great companion to this: simulated robots using self-play to "discover" the art of Kung Fu ;)

https://www.youtube.com/watch?v=vppFvq2quQ0

The quote from Lee about creativity is haunting. The current debate centers on using human reinforcement for robot meta-learning, particularly one-shot learning from a single video demonstration. But given advances in near-future TPU Cloud performance and AI accelerators, it's possible the main competitor to emerge will be end-to-end transfer learning directly from simulation, with the unexpected result of novel strategies beyond the capacity of expert-level humans to devise.

Ilya Sutskever at AI Frontiers 2018: Recent Advances in Deep Learning and AI from OpenAI

https://www.youtube.com/watch?v=ElyFDUab30A


So what's up with throwing boxes at everything all the time? Is that just an easy way to put varying amounts of external complications in to see how it handles?

Checking robustness to noise I think.

These algorithms are meant for stochastic, not entirely predictable environments, so your action might not always work and you should be able to carry on with the task even after some action fails (e.g. in the case of walking, you could slip, the wind could blow harder, or something could hit you).


Comparing AlphaZero to AlphaGo Lee seems problematic. I don't think transitivity holds in this case, i.e. AlphaZero can beat AlphaGo Lee almost surely, and AlphaGo Lee can beat Lee Sedol almost surely, but it could be possible that AlphaZero is not able to beat Lee Sedol at all. This is because the state space reached in computer-computer games is probably very different from the state space reached in human-computer games. I could be wrong, but at the very least this should be discussed by Deepmind.

https://arxiv.org/abs/1806.02643 and https://arxiv.org/abs/1803.06376 indicate (as does the choice to drop the checkpoints & historical self-plays in general from the training) that AlphaGo versions, at least up until then, tend to be transitive:

> What is worthwhile to observe from the AlphaGo dataset, and illustrated as a series in Figures 3 and 4, is that there is clearly an incremental increase in the strength of the AlphaGo algorithm going from version αr to αrvp, building on previous strengths, without any intransitive behaviour occurring, when only considering a strategy space formed by the AlphaGo versions.


Thanks a lot for the links. They look quite interesting. It does seem that Deepmind is aware of this and is working on evaluating it.

Transitivity might be true within AlphaGo versions, but that doesn't give me any confidence that it would also hold when a human is in the equation. If a group of policies more or less occupy the same state space, they are likely to be transitive, but if they occupy disjoint state spaces, I don't think we can be sure of transitivity.


> This is because the state spaces reached in computer-computer games are probably very different from the state space reached in human-computer games.

Why would this be true? If we were talking about ai-generated music that would be one thing, but it isn't intuitively obvious to me why a computer would play Chess or Go all that differently from a human.


We do know that AlphaZero's chess playing style is different. AlphaZero doesn't care as much about material, while Stockfish and humans (presumably learning from Stockfish) do. This is because Stockfish uses a material heuristic to search for moves.

Consider rock-paper-scissors, and an agent X that plays X every move. Agent rock beats agent scissors all the time, and agent scissors beats agent paper all the time. But we cannot conclude that agent rock will beat agent paper all the time. Chess and Go might have this intransitivity property. The fact that one game exists which is not transitive means the burden of proof that Chess and Go are transitive rests on the authors.
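
A minimal sketch of that intransitivity, using exactly these fixed-strategy agents (my own illustration, not anything from DeepMind):

    # Each "agent" always plays the move it is named after.
    beats = {"rock": "scissors", "scissors": "paper", "paper": "rock"}

    def result(a, b):
        """+1 if agent a beats agent b, -1 if it loses, 0 for a draw."""
        if a == b:
            return 0
        return 1 if beats[a] == b else -1

    for a, b in [("rock", "scissors"), ("scissors", "paper"), ("rock", "paper")]:
        print(a, "vs", b, "->", result(a, b))
    # rock vs scissors -> 1, scissors vs paper -> 1, yet rock vs paper -> -1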

My own suspicion is that AlphaZero should be weaker against humans compared to AlphaGo Lee since the former is not trained on human games while the latter is.


That's an excellent point. When humans are taught chess, we are initially taught the value or power of the pieces: the queen (being the most powerful), then the rook... all the way down to the pawn. Stockfish is programmed to know the value of the pieces. But AlphaZero isn't taught anything but the rules. If it has any values for the pieces, they would be self-generated, though I doubt it assigns explicit piece values, since it doesn't really need to and the value of a piece varies depending on the position.

AlphaZero does seem to have a style which favors positional development over value. It seems to choose pattern over material.

I wonder what would happen if one set of kids was taught the value of the pieces and another set wasn't. Would one group be better than the other? Is knowing the value of the pieces a help or an impediment to developing your chess skill, especially at such an early age?


You can look at piece value as a heuristic for the strength of your position. The problem with purely positional play without regard for material is that you could miss some tactic that forces exchanges and suddenly you're down three pawns and losing. As humans we miss tactics a lot, which is probably why we value material as much as positional advantages.
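
To make the piece-value heuristic concrete, here's a minimal sketch of a hand-coded material count (the 1/3/3/5/9 values are the textbook ones; the toy board encoding is my own assumption, not how Stockfish actually represents positions):

    PIECE_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9}  # king left out of the count

    def material_score(pieces):
        """Material balance from White's point of view.
        `pieces` is a toy encoding: a list of letters, uppercase = White, lowercase = Black."""
        score = 0
        for p in pieces:
            value = PIECE_VALUES.get(p.upper(), 0)
            score += value if p.isupper() else -value
        return score

    print(material_score(["Q", "R", "p", "p", "n"]))  # 9 + 5 - 1 - 1 - 3 = +9 for White

A real engine layers many more terms on top of this (mobility, king safety, pawn structure), but material is the anchor term that AlphaZero never sees explicitly.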

But that isn't true. AlphaGo Zero seemed to be considerably stronger against humans than AlphaGo Lee. Lee managed to win one game against AlphaGo Lee, but AlphaGo Zero won 60 games in a row against top players, as well as all games against Ke Jie.

I think you're wrong. The version that defeated Ke Jie and 60 masters online was AlphaGo Master, which afaik did train on human games.

You're right, I misremembered how Master was trained because they had already developed Zero by then. I still don't think there's any reason to believe Zero would do worse against humans than the Lee version. Zero was much stronger than Master, which was initially trained on human games. If the subsequent self-play learning caused any non-transitive relation to humans, it sure wasn't evident when Master played against humans. So why would it show up in the training of Zero, given how much stronger it is than Master? You would have a better case if there were a reason to believe humans played anywhere near optimally.

Because Zero was not trained on human games and Master was? Do we have any model which was trained only from self-play that can beat humans? I don't think so.

You seriously don't think Zero can beat humans? 6 stone advantage over AG Lee that was trained on human games? If AG Lee behaves like a superstrong human, then Zero handily beating AG Lee is clear evidence that Zero would beat humans. I don't understand how your intransitivity is supposed to work when DeepMind didn't find any when training in different ways. Are they supposed to run a tournament just to show that a massive improvement that crushes all previous versions can still beat puny humans?

Your intuition is strongly assuming transitivity: "A beats B handily and B beats C handily, therefore this is clear evidence that A beats C handily". I already gave an example where this is not true, and I don't know why you're still doubling down on the same wrong intuition.

> DeepMind didn't find any when training in different ways

Intransitivity may not occur within AG versions but could occur when compared to humans. You need very different types of strategies for intransitivity to occur.

> Are they supposed to run a tournament just to show that a massive improvement that crushes all previous versions can still beat puny humans?

I think so, yes. We do this with Chess. Why should Go be different?

Deepmind is aware of this problem. See gwern's links if you are interested.


> AlphaZero doesn't care as much about material, while Stockfish and humans (presumably learning from Stockfish) do.

It's the other way around no? Stockfish's preference for material is hand-coded by humans... because our best wisdom values material highly.


I’m no chess expert but I am a game theorist.

I got the feeling that AlphaZero cares about material only insofar as it occupies positions and projects power; in this sense it is further into the abstract-strategy realm than grandmasters and human-coded chess programs are.

Human-style players are like quantum theory: they’re concerned with stuff. AlphaZero is like general relativity: it’s concerned with bending the backdrop so mere stuff becomes pliant and goes where it wants... of course you need stuff to bend the backdrop, but beyond that...

I don’t know if I’m making much sense, cockeyed metaphors are my only means of expressing my impressions.


Similar sentiment has been expressed by others.

In the (simpler and less studied) game of shogi, AlphaZero made very startling moves, which increase the number of vulnerabilities (and opportunities). AlphaZero is able to better balance all these combinatorial paths.

Perhaps as humans, we have limited combinatorial capacity, and hence are biased toward conservative moves that reduce our search space (so we settle for lower expected value because we have to bound our variance?).

Whether this is less strategic or simply less bound by human limitation is up for discussion.

end-rant

EDIT: Would be interested to hear your views about larger DRL domains. Drop me a line if you're interested: gaxasit@getnada.com (temp email).


Now that perfect-information games have been "solved", it will be interesting to see how these teams move on to exponentially harder (IMO) problems, where limited information gives the cognitive abilities of humans a massive edge over the brute-force skill of computers.

Look at how Blizzard did computer opponent AI for games like Starcraft and Warcraft in the early 2000s.

The "insane" AI builds its base and units well, controls its troops quickly, and can micro manage different types of units.

But its real strength is that it gets 2x the normal amount of resources that a human player gets, allowing it to build structures sooner than human players and build 2x the units.

This is the necessary tradeoff to make up for the fact that a human can recognize and exploit computer weaknesses, and the computer generally can't infer what its opponent is doing through the subtle hints that humans pick up on.

For example if you catch even a slight glimpse of a certain type of unit an enemy has, or see that an enemy has moved into a certain part of the map early on, you can guess with a reasonable confidence level what build or strategy they are going for without seeing the full playing board.

“Those multiplayer games are harder than Go, but not that much harder,” Campbell tells IEEE Spectrum. “A group has already beaten the best players at Dota 2, though it was a restricted version of the game; Starcraft may be a little harder. I think both games are within 2 to 3 years of solution.”

Solving these non-perfect-information games within 2-3 years would be amazing, IMO.


Dota 2 is probably harder to beat than StarCraft 2, and no, the AI didn't actually manage to beat a pro team; it failed.

General purpose only if you have access to a simulator and can do MCTS with it :)
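
For readers who haven't seen it spelled out, "MCTS with a simulator" boils down to something like this sketch: plain UCT with random rollouts against a hypothetical `game` interface (the method names legal_moves, next_state, is_terminal, to_move, and result are my assumptions; AlphaZero replaces the random rollout with a learned policy/value network):

    import math, random

    class Node:
        def __init__(self, state, parent=None, move=None, player_just_moved=None):
            self.state, self.parent, self.move = state, parent, move
            self.player_just_moved = player_just_moved
            self.children, self.untried_moves = [], None
            self.visits, self.wins = 0, 0.0

    def uct_search(game, root_state, n_sims=1000, c=1.4):
        root = Node(root_state)
        root.untried_moves = list(game.legal_moves(root_state))
        for _ in range(n_sims):
            node, state = root, root_state
            # 1. Selection: walk down fully expanded nodes by the UCB1 rule
            while not node.untried_moves and node.children:
                node = max(node.children, key=lambda ch: ch.wins / ch.visits
                           + c * math.sqrt(math.log(node.visits) / ch.visits))
                state = node.state
            # 2. Expansion: add one child for a previously untried move
            if node.untried_moves:
                move = node.untried_moves.pop(random.randrange(len(node.untried_moves)))
                mover = game.to_move(state)
                state = game.next_state(state, move)
                child = Node(state, parent=node, move=move, player_just_moved=mover)
                child.untried_moves = list(game.legal_moves(state))
                node.children.append(child)
                node = child
            # 3. Simulation: random rollout using only the simulator
            while not game.is_terminal(state):
                state = game.next_state(state, random.choice(game.legal_moves(state)))
            # 4. Backpropagation: credit each node from the viewpoint of the player who moved into it
            while node is not None:
                node.visits += 1
                if node.player_just_moved is not None:
                    node.wins += game.result(state, node.player_just_moved)
                node = node.parent
        return max(root.children, key=lambda ch: ch.visits).move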

Just fyi, there are many caveats for how this relates to progress in AI as a whole (self promotion warning): https://www.skynettoday.com/editorials/is-alphago-zero-overr...

(yes the article is for AlphaGo Zero, but it largely applies to AlphaZero as well).


Now just do Magic: The Gathering!

Tough problem space, but I would be excited for this!

Robot chess-boxing would be great.

Apparently the news is that they published a new paper about AlphaZero?

yes. reviews take a lot of time

I wonder when they will start attempting educational games that teach math and language. Those types of skills could lead to general purpose AIs.

They played Stockfish 8; the paper is already out of date. Play against version 10 under fair conditions.

Does anyone know what the progress is on explaining AlphaGo/Zero's moves?

"And Stockfish, in turn, is a piker next to AlphaZero, which crushed it after a mere 24 hours of self-training."

Isn't this controversial?


Why is this surprising? AlphaGo Zero mastered Go without any prior idea of what Go is, so why wouldn't the same kind of system be able to learn and play other sorts of games?

As I've been saying for years now: if you can model a problem as a complete-information adversarial game, that problem is solved.


> that problem is solved.

While what they've achieved is super impressive, it has nothing to do with solving chess in a formal sense.



