
Deepmind Produces a General-Purpose Games-Playing Machine - pross356
https://spectrum.ieee.org/tech-talk/robotics/artificial-intelligence/mb
======
mindgam3
Props to DeepMind for an impressive result. And for fixing the unfair
conditions from the previous match, where they “crushed” a Stockfish that was
severely handicapped by the time controls and the lack of an opening book. Note
that Stockfish's score was significantly improved this time around.

That said, can someone please explain the logic behind DeepMind releasing only
20 games out of a 1,000-game match? That is an extraordinary degree of
cherry-picking.

As a former chess prodigy and software engineer, I want to believe these
results, but they would be much more credible if DeepMind released the full
set of games.

I don’t see how that would expose any valuable proprietary IP, but I’m a total
machine learning noob so maybe I’m missing something.

~~~
CJefferson
I agree. I'm annoyed that companies like Google get to release selected parts
of their work, keep the rest secret, and still get accepted by top journals. I
am positive (and have seen reviews saying as much) that academics would never
get away with this kind of filtering of results.

~~~
newen
Academics regularly get away with just presenting results as graphs, without
even releasing the actual numbers behind the graphs, and certainly not the
data, code, etc. I've done that for multiple papers in top journals in my
field, and lots of others do the same. I give out the data, and sometimes the
code, when people email me, but it's never required during the peer review
process. Reviewers are just expected to believe the results in the submitted
paper. I always find it funny when people put so much faith in the peer review
process, as if the reviewers have some special knowledge or insight that lets
them legitimize a paper. The majority of the time, the peer reviewers read the
exact same thing everyone else reads, then provide suggestions for changes,
and eventually accept the paper for publication. But in terms of actual
verification that the authors are not fabricating results, there is nothing
remotely close to that.

------
ArtWomb
Feels like an historical moment. DeepMimic is a great companion to this:
simulated robots using self-play to "discover" the art of Kung Fu ;)

[https://www.youtube.com/watch?v=vppFvq2quQ0](https://www.youtube.com/watch?v=vppFvq2quQ0)

The quote from Lee about creativity is haunting. The current debate centers on
using human reinforcement for robot meta-learning, particularly one-shot
learning from a single video demonstration. But given advances in near-future
TPU cloud performance and AI accelerators, the main competitor to emerge may
be end-to-end transfer learning directly from simulation, with the unexpected
result of novel strategies beyond the capacity of expert-level humans to
devise.

Ilya Sutskever at AI Frontiers 2018: Recent Advances in Deep Learning and AI
from OpenAI

[https://www.youtube.com/watch?v=ElyFDUab30A](https://www.youtube.com/watch?v=ElyFDUab30A)

~~~
dmoy
So what's up with throwing boxes at everything all the time? Is that just an
easy way to introduce varying amounts of external disturbance and see how the
agent handles it?

~~~
machiaweliczny
Checking robustness to noise, I think.

These algorithms are meant for stochastic, not entirely predictable,
environments, so your action might not always work, and you should be able to
carry on with the task even after some action fails (e.g., in the case of
walking, you could slip, the wind could blow harder, or something could hit
you).
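
As a rough sketch of what I mean (my own toy framing, nothing taken from any
DeepMind or DeepMimic code), one common way to test this kind of robustness is
a wrapper that occasionally replaces the intended action with a random one
before it reaches the simulator:

    import random

    # Toy robustness test: with probability `noise`, the agent's chosen
    # action "slips" and a random legal action is executed instead.
    # `env_step` and `actions` are hypothetical stand-ins for a real
    # simulator's step function and action set.
    def noisy_step(env_step, state, action, actions, noise=0.1):
        if random.random() < noise:
            action = random.choice(actions)  # a slip, a gust of wind, a box
        return env_step(state, action)

An agent trained only on perfectly predictable transitions can fall apart
under even this much perturbation, which is roughly what the thrown boxes are
probing.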

------
backpropaganda
Comparing AlphaZero to AlphaGo Lee seems problematic. I don't think
transitivity holds in this case, i.e. AlphaZero can beat AlphaGo Lee almost
surely, and AlphaGo Lee can beat Lee Sedol almost surely, but it could still
be possible that AlphaZero is not able to beat Lee Sedol at all. This is
because the state spaces reached in computer-computer games are probably very
different from the state spaces reached in human-computer games. I could be
wrong, but at the very least this should be discussed by DeepMind.
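
To make the worry concrete, here is a toy sketch with pairwise win
probabilities I invented purely for illustration (none of these numbers come
from DeepMind); the first two relationships hold "almost surely" while the
third fails, and nothing in probability theory rules that out:

    # Hypothetical head-to-head win probabilities, invented for illustration.
    # "A beats B almost surely" and "B beats C almost surely" do not by
    # themselves force "A beats C": the matchups can exercise disjoint
    # parts of the state space.
    win_prob = {
        ("AlphaZero",   "AlphaGo Lee"): 0.98,  # made-up number
        ("AlphaGo Lee", "Lee Sedol"):   0.95,  # made-up number
        ("AlphaZero",   "Lee Sedol"):   0.45,  # logically possible outcome
    }
    for (a, b), p in win_prob.items():
        print(f"{a} beats {b} with probability {p:.2f}")

Whether a table like that can actually occur with these systems is exactly the
kind of thing the evaluation should address.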

~~~
gwern
[https://arxiv.org/abs/1806.02643](https://arxiv.org/abs/1806.02643) and
[https://arxiv.org/abs/1803.06376](https://arxiv.org/abs/1803.06376) (as does
the choice to drop the checkpoints & historical self-plays in general from the
training) indicate that AlphaGo versions, at least up until then, tend to be
transitive:

> What is worthwhile to observe from the AlphaGo dataset, and illustrated as a
> series in Figures 3 and 4, is that there is clearly an incremental increase
> in the strength of the AlphaGo algorithm going from version αr to αrvp,
> building on previous strengths, without any intransitive behaviour
> occurring, when only considering a strategy space formed by the AlphaGo
> versions.

~~~
backpropaganda
Thanks a lot for the links. They look quite interesting. It does seem that
DeepMind is aware of this and is working on evaluating it.

Transitivity might be true _within_ AlphaGo versions, but that doesn't give me
any confidence that it would also hold when a human is in the equation. If a
group of policies more or less occupy the same state space, they are likely to
be transitive, but if they occupy disjoint state spaces, I don't think we can
be sure of transitivity.

------
roymurdock
Now that perfect-information games have been "solved," it will be interesting
to see how these teams move up to the exponentially harder (IMO) problems
where limited information gives the cognitive abilities of humans a massive
edge over the brute-force skill of computers.

Look at how Blizzard did computer opponent AI for games like Starcraft and
Warcraft in the early 2000s.

The "insane" AI builds its base and units well, controls its troops quickly,
and can micro manage different types of units.

But its real strength is that it gets twice the resources a human player gets,
allowing it to build structures sooner than human players and field twice as
many units.

This is the necessary tradeoff to make up for the fact that a human can
recognize and exploit computer weaknesses, and the computer generally can't
infer what its opponent is doing through the subtle hints that humans pick up
on.

For example, if you catch even a slight glimpse of a certain type of unit an
enemy has, or see that an enemy has moved into a certain part of the map early
on, you can guess with reasonable confidence what build or strategy they are
going for without seeing the full playing board.
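
As a rough illustration of that kind of inference (toy numbers of my own, not
taken from any actual game), a single scouting observation can be folded into
a belief over the opponent's build with a plain Bayesian update:

    # Prior belief over the opponent's build, and how likely we'd be to
    # scout an early barracks under each build (all numbers invented).
    priors     = {"rush": 0.4, "tech": 0.3, "expand": 0.3}
    likelihood = {"rush": 0.8, "tech": 0.2, "expand": 0.3}

    evidence  = sum(priors[s] * likelihood[s] for s in priors)
    posterior = {s: priors[s] * likelihood[s] / evidence for s in priors}
    print(posterior)  # belief shifts sharply toward "rush"

Humans do this kind of weighting of sparse, ambiguous observations intuitively,
and getting an agent to do it well is a big part of what makes
limited-information games hard.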

“Those multiplayer games are harder than Go, but not that much higher,”
Campbell tells IEEE Spectrum. “A group has already beaten the best players at
Dota 2, though it was a restricted version of the game; Starcraft may be a
little harder. I think both games are within 2 to 3 years of solution.”

Solving these non-perfect-information games within 2-3 years would be amazing,
IMO.

~~~
spdionis
Dota 2 is probably harder to beat than StarCraft 2, and no, the AI didn't
actually manage to beat a pro team; it failed.

------
andreyk
General purpose only if you have access to a simulator and can do MCTS with it
:)
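
Even the most stripped-down version of the idea shows the dependency; the
sketch below assumes a simulator exposing `legal_moves`, `step`, and
`terminal_value` (my names, not anything from the paper) and just plays random
rollouts with it:

    import random

    # Minimal Monte Carlo evaluation: estimate a state's value by playing
    # one game forward with random moves. Without a simulator to step
    # hypothetical states, even this much is impossible.
    def rollout_value(state, legal_moves, step, terminal_value, max_depth=500):
        for _ in range(max_depth):
            v = terminal_value(state)
            if v is not None:           # game over: +1 / 0 / -1 result
                return v
            state = step(state, random.choice(legal_moves(state)))
        return 0.0                      # very long games treated as a draw

Real MCTS adds a search tree and (in AlphaZero) a learned policy/value network
on top, but the need to query hypothetical next states never goes away.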

Just fyi, there are many caveats for how this relates to progress in AI as a
whole (self-promotion warning): [https://www.skynettoday.com/editorials/is-alphago-zero-overrated](https://www.skynettoday.com/editorials/is-alphago-zero-overrated)

(Yes, the article is about AlphaGo Zero, but it largely applies to AlphaZero
as well.)

------
j_m_b
Now just do Magic: The Gathering!

~~~
aezed
Tough problem space, but I would be excited for this!

~~~
starbeast
Robot chess-boxing would be great.

------
skybrian
Apparently the news is that they published a new paper about AlphaZero?

~~~
buboard
Yes. Reviews take a lot of time.

------
ilaksh
I wonder when they will start attempting educational games that teach math and
language. Those types of skills could lead to general purpose AIs.

------
xcvl
They played against Stockfish 8. The paper is completely out of date; play
against version 10 under fair conditions.

------
budadre75
Does anyone know what progress has been made on explaining AlphaGo/Zero's
moves?

------
SilverSlash
"And Stockfish, in turn, is a piker next to AlphaZero, which crushed it after
a mere 24 hours of self-training."

Isn't this controversial?

------
quotemstr
Why is this surprising? AlphaGo Zero mastered Go without any prior idea of
what Go _is_, so why wouldn't the same kind of system be able to learn and
play other sorts of games?

As I've been saying for years now: if you can model a problem as a complete-
information adversarial game, that problem is solved.

~~~
dagw
_that problem is solved._

While what they've achieved is super impressive, it has nothing to do with
solving chess in a formal sense.

