

What Google DeepMind Means for A.I. - colinprince
http://www.newyorker.com/tech/elements/deepmind-artificial-intelligence-video-games

======
Animats
This is impressive. The current approach will work only for games where the
whole state is on-screen and planning isn't required. A pure reactive system
will work for that.

I used to say that a key component of AI that was missing was the ability to
get through the next few seconds of life without falling down or bumping into
anything. I went through Stanford CS when the top-down logicians were in
charge of AI. That approach was totally incapable of dealing with the real
world. Now we're seeing the systems needed to deal with the real world in the
short term starting to work.

Once you can deal with the next few seconds, a strategy module can be added to
give goals to the low level system. This is very clear in the video game
context. As the game playing programs advance beyond the 2D full-screen games,
they'll need a low-level system to handle the next moves ("don't fall off
platform", "jump to next platform", "shoot at target" are primitives for the
2D sidescroller era) and some level of planner to handle tactical matters and
strategy.

It's possible to explicitly build hierarchical systems like that now, using
classical planning techniques to modify the goals of a machine learning
system. It's not yet possible to get a hierarchical system to emerge from
machine learning. Medium-term planning as an emergent behavior is a big
near-term challenge for AI.

Beyond such a two-level system, we're going to need intercommunicating
components that do different parts of the problem. The components may be
evolved, while the architecture may be designed. When AI systems can design
such architectures, they're probably ready to take over.

~~~
Houshalter
I remember an AI researcher (I forget who) recently saying something to the
effect that early AI research produced all sorts of planning algorithms, e.g.
the top-down camp of AI. But they weren't capable of working with the real
world because we didn't have very good low-level perception: e.g. a
complicated algorithm for planning the robot's actions that depended on
getting input about where objects are.

Now we have decent low level perception from the bottom-up camp of AI, but
they are limited by a lack of high level stuff like planning and reasoning.

But you are right that there is no obvious way to just combine these wildly
different algorithms without lots of human guidance.

~~~
TylerJay
I think you hit the nail on the head here. That's one of the most interesting
parts of thinking about GAI for me. Which parts will end up being top-down and
which parts will end up being bottom-up? And even if we have evidence that a
certain part is TD or BU in humans, do we even _want_ machine intelligence to
work the same way?

The article says something to the effect of "no matter how much you advance
this strategy, you never get a toddler out of it." And that makes sense
because, presumably, certain parts of the human brain exercise some sort of
top-down control over the sensory-data-processing and other parts.

For example, it seems like the human mind is built to see things as _things_.
Does the human mind _really_ start off seeing "pixels" and then learn by
itself to think of the world as solid, whole objects instead of collections of
similarly-colored photons/pixels or atoms? It seems like this is a universal
use-case and it would make sense if our tendency to see the world in terms of
"things" instead of patches of color is built-in (gestalt psychology seems to
suggest this as well).

It sounds like the AI in the article starts off from pixels and then builds up
_some_ _sort_ of model of blocks, the ball, paddle, game physics, etc, (but
then again, maybe it doesn't have those models at all and is just doing
statistical analysis on patterns of pixels). Either way, it likely doesn't
have any higher, context-independent model of objects/things like humans do. I
suspect this may be one of the hurdles in transfer learning. Humans think of
objects as having certain properties. When other objects in other contexts
appear to have similar properties, we guess that they may have other
properties in common which gives at least a rough model of the new object.

So I guess what I'm trying to say is: Humans have hierarchical models of the
world that let us think separately about patterns of light, atoms/molecules,
whole physical objects/things, systems, etc. They are all first-class citizens
and we ascribe properties to each of them. We already have a rough model of
anything at the same level but in a different context, with similar-enough
properties to something we already know. It seems to me like this is
fundamentally connected to humans' ability to do transfer-learning. Could this
effect be achieved through bottom-up algorithms, or are we going to have to
figure out some top-down way of developing transferable, generalizable,
hierarchical models?

------
iandanforth
"video games" (read "world simulator").

The important thing about their work is that it is deliberately marching down
the path of more and more complex world simulations.

We experience the world at one second per second. To learn to walk we must
first fall, and we fall at 32 feet/second^2. There's a hard limit on how fast
we can make mistakes (like tripping) and so there is a hard limit on how fast
we can learn.

Computers can experience a simulated world at many hours per second. When
they're learning to walk in a simulated world they can fail, and learn,
thousands of times before we've finished our first step.

This ratio of simulated experience to real world time is also going up.
Eventually the minimum amount of time it takes to grow a toddler-like AI in
simulation will be just under the time an AI researcher is willing to wait for
results. When _that_ happens we'll see a real improvement in the quality of
AIs.

~~~
lebek
There are two sides to this, world simulation and AI. As the other replies
already said, current AI isn't close to toddler-level (there's no reasoning
going on in the DeepMind work, just statistical correlation). We're also way
off on the world simulation side - show me a realistic world simulator that
can run close to realtime. Physically-based rendering is indeed impressive but
this only accounts for visual perception and learning to walk involves much
more than that.

~~~
wutbrodo
> (there's no reasoning going on in the DeepMind work, just statistical
> correlation)

I've seen 100 people make this statement and mean 100 different things, so I
just wanted to clarify:

How are you defining "reasoning" here as distinct from statistical
correlation?

~~~
sytelus
Reasoning involves inferring and applying causation, which is different from
correlation [1]. One can possibly define the process of "understanding" as
building a "model" of the system where previously unseen events can be
predicted or justified using the model. The big difference in "human
understanding" seems to be that we can extract a fairly minimal set of laws
that govern the system from our observations, which we can communicate and
apply very efficiently.

1\.
[http://en.wikipedia.org/wiki/Correlation_does_not_imply_causation](http://en.wikipedia.org/wiki/Correlation_does_not_imply_causation)

~~~
wutbrodo
> Reasoning involves inferring and applying causation which is different from
> correlation

A couple points here:

* The way humans model causation is just non-naive statistical correlation (controlling for variables). That technique is still accurately described as "statistical correlation"

* I'm not even convinced that human reasoning _does_ imply generating a model of causation. Let's exclude things like rigorous scientific studies for the purpose of the discussion and focus on day-to-day human reasoning: I think the thought processes of most of the people I know could most accurately be explained by correlating things across time. Modeling causation is often incidental (X often happens after Y is a reasonable enough heuristic for general use).
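
To make "non-naive statistical correlation" concrete, here's a minimal sketch of controlling for a variable: regress the control out of both series and correlate the residuals. The confounder scenario (the classic ice-cream/drownings illustration) is made up for the example, not taken from the article:

```python
import numpy as np

# Spurious correlation induced by a confounder, then removed by
# controlling for it.
rng = np.random.default_rng(0)
n = 10_000
z = rng.normal(size=n)              # confounder (e.g. summer heat)
x = z + rng.normal(size=n)          # e.g. ice cream sales
y = z + rng.normal(size=n)          # e.g. drowning incidents

def corr(a, b):
    return float(np.corrcoef(a, b)[0, 1])

def partial_corr(a, b, control):
    # "controlling for a variable": regress it out, correlate residuals
    ra = a - np.polyval(np.polyfit(control, a, 1), control)
    rb = b - np.polyval(np.polyfit(control, b, 1), control)
    return corr(ra, rb)

naive = corr(x, y)                  # strongly positive, but spurious
controlled = partial_corr(x, y, z)  # near zero once z is held fixed
```

Naive correlation reports x and y as related; the partial correlation shows the relation disappearing once the confounder is held fixed, which is the extra step beyond raw co-occurrence.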

~~~
Padding
> Let's exclude things like rigorous scientific studies for the purpose of the
> discussion and focus on day-to-day human reasoning

I think you'll need to look at the other end of the spectrum to see an
abundance of (wrong?) models of causality: Religion and Law.

There are no "confirmed" cases of anyone actually going to heaven or hell or
purgatory (or whatever else), and yet many of us still conform to some
arbitrary ruleset in the hopes of eventually ending (or not ending) up in one
of those places, because we have constructed some model of how doing this
gets you into hell and doing that gets you into heaven.

Similarly, we have plenty of evidence on how companies spend huge effort on
finding loopholes in tax laws in order to avoid taxes, and yet instead of
simplifying the ruleset (so that there are obviously no holes in it) we still
opt for piling on more laws (so that there are no obvious holes in it) because
we construct (faulty?) models of how those new rules will prevent further
exploits.

------
nsxwolf
"the A.I. has not only become better than any human player but has also
discovered a way to win that its creator never imagined."

That's a pretty standard Breakout/Arkanoid technique - getting the ball behind
the board and letting it do the work for you.

Not knocking the AI, just nitpicking this writer.

~~~
dicroce
That confused me too until I realized the author of the article was talking
about the creator of the AI, not the game.

------
NamTaf
For all of the New Yorker's clout in journalism, sentences like the following
make me wonder where their editors are. The run-on and sea of commas are
atrocious! It's not the first time I've noticed this in the last few days,
either.

"Hassabis, who began working as a game designer in 1994, at the age of
seventeen, and whose first project was the Golden Joystick-winning Theme Park,
in which players got ahead by, among other things, hiring restroom-maintenance
crews and oversalting snacks in order to boost beverage sales, is well aware
that DeepMind’s current system, despite being state of the art, is at least
five years away from being a decade behind the gaming curve."

~~~
dragonwriter
That's not a run-on, and all the commas are properly used. It's
_stylistically_ awful because there are way too many appositive/parenthetical
phrases getting in the way, because they are _nested_ without using an
alternative device (like setting the outer one off with dashes rather than
commas), and because of pointless excess verbosity ("at least five years away
from being a decade behind the gaming curve").

~~~
NamTaf
When I say run-on I guess I mean the sentence just drags on when it could be
two or three separate sentences with no negative impact. It makes it difficult
to read and I, at least, lose track of where I am in it.

It's not _incorrect_ per se, it's just not well-written as far as I'm
concerned. But what do I know, I'm not a journalist.

------
hyperbovine
Another issue that this article sort of touches on but doesn't make explicit:
the real world is not a Markov decision process. There are complex, variable-
order time dependencies which we are barely aware of but which influence our
thinking every second of every day. Trying to model this in software leads to
an exponential increase in storage and time complexity. The curse of
dimensionality has been with us ever since Bellman coined the phrase almost 60
years ago; it's not going away anytime soon. Thus, it's difficult for me to
see how deep Q-learning (or any other MDP-based algorithm) gets us any closer
to human-level understanding.

~~~
Houshalter
Recurrent neural networks with e.g. Long Short Term Memory
([https://en.wikipedia.org/wiki/Long_short_term_memory](https://en.wikipedia.org/wiki/Long_short_term_memory))
can model very long-term dependencies effectively, and there has been some
work lately that gets good results with simpler models.

They don't use it because it's computationally expensive and totally
unnecessary for Atari games, but it's certainly possible.
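
For readers unfamiliar with the mechanism, here's a minimal single-step LSTM cell in plain numpy: a sketch of the standard equations, not DeepMind's code. The additive cell-state update is what lets information (and gradients) survive across long spans:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step: gates decide what to forget, write, and expose.
    The additive update of the cell state c is what lets information
    (and gradients) survive across many timesteps."""
    H = h_prev.size
    z = W @ np.concatenate([x, h_prev]) + b  # all four gate pre-activations
    f = sigmoid(z[0:H])        # forget gate
    i = sigmoid(z[H:2*H])      # input gate
    o = sigmoid(z[2*H:3*H])    # output gate
    g = np.tanh(z[3*H:4*H])    # candidate cell contents
    c = f * c_prev + i * g     # cell state: mostly copied, selectively edited
    h = o * np.tanh(c)         # hidden state exposed to the rest of the net
    return h, c

# Toy run over a random sequence
rng = np.random.default_rng(0)
X, H = 3, 4                    # input size, hidden size
W = rng.normal(0.0, 0.1, (4 * H, X + H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for t in range(10):
    h, c = lstm_step(rng.normal(size=X), h, c, W, b)
```

In a full network these weights would be trained by backpropagation through time; the point here is just the gated, additive state that a plain feed-forward Q-network lacks.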

------
skybrian
"also discovered a way to win [breakout] that its creator never imagined"

I don't understand. We often would bounce balls between the top wall and the
bricks while playing breakout on our Atari 2600 back in the day. And I
wouldn't say we were all that good (it didn't happen right away).

~~~
nebulous1
Yeah. It's a bit odd that the author didn't know this or discuss his thoughts
with somebody who knew this (which would be most people who've played
Breakout, it often happens accidentally).

edit: Having just watched the source video, she may actually be referring to
the creator of the AI and just badly rephrasing what the guy in the video says
(he says that they didn't expect the AI to be able to work that out with the
abilities they had given it).

------
karpathy
I really like this line of work and I expect it will grow quite substantially
over the next few years. Of course, Reinforcement Learning has been around for
a long time. Similarly, Q Learning (the core model in this paper) has been
around a very long time. Normally, though, you see these models applied to
toy MDP problems with simple dynamics, using linear Q-function approximations
for fear of non-convergence etc. What's novel about this work is that they
fully embrace a complex non-linear Q function (a ConvNet) looking at the raw
pixels, and get it to actually work in (relatively speaking) complex
environments (games). This requires several important tricks, as is discussed
at length in their Nature paper (e.g. experience replay, updating the Q
function only once in a while, etc.).

I implemented the DQN algorithm (used in this work) in Javascript a while ago
as well
([http://cs.stanford.edu/people/karpathy/convnetjs/demo/rldemo...](http://cs.stanford.edu/people/karpathy/convnetjs/demo/rldemo.html))
if people are interested in poking around, but my version does not implement
all the bells and whistles.

The results in this work are impressive, but also too easy to anthropomorphise.
If you know what's going on under the hood you can start to easily list off
why this is unlike anything humans/animals do. Some of the limitations
include:

\- Most crucially, the exploration used is random. You button mash random
things and hope to receive a reward at some point or you're completely lost.
If anything at any point requires a precise sequence of actions to get a
reward, exponentially more training time is necessary.

\- Experience replay, which performs the model updates, samples transitions
uniformly at random instead of using some kind of importance sampling. This
one is easier to fix.

\- A discrete set of actions is assumed. Any real-valued output (e.g. torque
on a joint) is a non-obvious problem in the current model.

\- There is no transfer learning between games. The algorithm always starts
from scratch. This is very much unlike what humans do in their own problem
solving.

\- The agent's policy is reactive. It's as if you always forgot what you did 1
second ago. You keep repeatedly "waking up" to the world and get 1 second to
decide what to do.

\- Q Learning is model-free, meaning that the agent builds no internal model
of the world/reward dynamics. Unlike us, it doesn't know what will happen to
the world if it performs some action. This also means that it does not have any
capacity to plan anything.

Of these, the biggest and most insurmountable problem is the first one: Random
exploration of actions. As humans we have complex intuitions and an internal
model of the dynamics of the world. This allows us to plan out actions that
are very likely to yield a reward, without flailing our arms around greedily,
hoping to get rewards at random at some point.

Games like Starcraft will significantly challenge an algorithm like this. You
could expect that the model would develop super-human micro, but have
difficulties with the overall strategy. For example, performing an air drop
on the enemy base would be impossible with the current model: you'd have to
plan it out over many actions: "load the marines into the ship, fly the ship
in stealth around the map, drop them at the precise location of the enemy
base".

Hence, DQN is best at games that provide immediate rewards, and where you can
afford to "live in the moment" without much planning. Shooting things in space
invaders is a good example. Despite all these shortcomings, these are
exciting results!
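
A tabular toy version of the ingredients above (epsilon-greedy random exploration, a uniformly sampled replay buffer, and a periodically frozen target copy of Q) can be sketched in a few lines. The chain MDP is made up for illustration; the real system replaces the table with a ConvNet over pixels:

```python
import random
from collections import deque

# Toy chain MDP: states 0..4, actions 0 (left) / 1 (right); the only
# reward is at the far right, so the agent must stumble onto a whole
# sequence of correct moves by random "button mashing" first.
GOAL = 4

def env_step(s, a):
    s2 = max(0, min(GOAL, s + (1 if a == 1 else -1)))
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

Q = [[0.0, 0.0] for _ in range(GOAL + 1)]
target_Q = [row[:] for row in Q]   # frozen copy: "update Q only once in a while"
replay = deque(maxlen=1000)        # experience replay buffer
alpha, gamma, eps = 0.5, 0.9, 0.2

random.seed(0)
for episode in range(200):
    s, done = 0, False
    while not done:
        # epsilon-greedy: mostly greedy, occasionally a random action
        a = random.randrange(2) if random.random() < eps \
            else max((0, 1), key=lambda k: Q[s][k])
        s2, r, done = env_step(s, a)
        replay.append((s, a, r, s2, done))
        # learn from a uniformly sampled past transition, not the latest one
        ps, pa, pr, ps2, pdone = random.choice(replay)
        target = pr if pdone else pr + gamma * max(target_Q[ps2])
        Q[ps][pa] += alpha * (target - Q[ps][pa])
        s = s2
    if episode % 10 == 0:          # periodically refresh the frozen target
        target_Q = [row[:] for row in Q]
```

Even on this five-state chain, the first successful episode typically takes thousands of random steps before the reward is ever seen, which is the random-exploration problem above in miniature.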

~~~
fitzwatermellow
Thanks, karpathy!

I did submit your JS implementation to HN when I came across it:
[https://news.ycombinator.com/item?id=9108738](https://news.ycombinator.com/item?id=9108738)

Monte Carlo Tree Search could be the missing link. In other words, use DQN to
model the world and map actions to a value function. Then use playouts and
backpropagation of action tree results to find tactics. Of course, it does not
solve the big question: how to model "memories" and "inferences"? Indeed, very
exciting times for AI/ML!

~~~
zhanwei
We need a model to use Monte Carlo Tree Search... which is missing in this
approach, since it uses model-free reinforcement learning. Unless the deep
convolutional net can extract some latent features as state, it would be
impossible to do planning on top of it.
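
To see why the model matters, here's a flat Monte Carlo search (the simplest relative of MCTS, without the tree growth or UCT bias) on a made-up toy game. Everything hinges on `playout`, a simulator of the game's rules, and that simulator is exactly what a model-free learner doesn't give you:

```python
import random

# Toy game: single-pile Nim. A move takes 1 or 2 stones, and whoever
# takes the last stone wins.
def legal_moves(n):
    return [m for m in (1, 2) if m <= n]

def playout(n):
    """Random playout from a position (n > 0) with us to move.
    Returns True if we win under uniformly random play by both sides."""
    our_turn = True
    while True:
        n -= random.choice(legal_moves(n))
        if n == 0:
            return our_turn       # whoever just moved took the last stone
        our_turn = not our_turn

def mc_search(n, n_playouts=2000):
    """Flat Monte Carlo: estimate each legal move's win rate via random
    playouts of the model, then pick the best move."""
    rates = {}
    for m in legal_moves(n):
        if n - m == 0:
            rates[m] = 1.0        # taking the last stone wins outright
        else:
            # opponent moves next; we win when their playout loses
            wins = sum(not playout(n - m) for _ in range(n_playouts))
            rates[m] = wins / n_playouts
    return max(rates, key=rates.get)

random.seed(1)
best = mc_search(4)  # taking 1 leaves 3 stones, a lost position for the opponent
```

Full MCTS would additionally grow a search tree and bias the playouts toward promising branches, but it still needs the simulator.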

------
kriro
I'm only loosely familiar with the "general game playing" literature but I
think learning the structure of the game is the interesting research problem
here. An immediate experiment that I'd like to try would be:

(a) Identify games considered similar (+maybe define what it means to be
similar)...let's take Pong and Breakout as suggested

(b) Trial and error run of one of the games to learn the action-reward
structure

(c) Compare a fresh relearning of the second game to a start where the
similarities/differences are pre-input to the AI "somehow" (some sort of diff
between the game rules etc.)

I'm thinking of a gamer thinking "oh this is just like X,Y,Z except..." when
picking up a new game.
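
Step (c) can be sketched with plain tabular Q-learning on a made-up five-state "walk to the goal" task, where "game B" is "game A" with the action labels swapped and the known rule difference is pre-applied to the learned Q-table. Everything here is illustrative, not from the general-game-playing literature:

```python
import random

GOAL = 4  # states 0..4; reward only for reaching state 4

def env_step(s, moved_right):
    s2 = max(0, min(GOAL, s + (1 if moved_right else -1)))
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

def train(Q, right_action, episodes, alpha=0.5, gamma=0.9, eps=0.3):
    """Plain epsilon-greedy tabular Q-learning."""
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = random.randrange(2) if random.random() < eps \
                else max((0, 1), key=lambda k: Q[s][k])
            s2, r, done = env_step(s, a == right_action)
            target = r if done else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q

random.seed(0)
# Game A: action 1 means "move right". Learn it from scratch.
Q_a = train([[0.0, 0.0] for _ in range(GOAL + 1)], right_action=1, episodes=300)

# Game B is "just like A except" the actions are relabeled. Pre-input
# that diff by swapping A's action columns instead of relearning:
Q_b = [[q1, q0] for q0, q1 in Q_a]
```

The warm-started Q_b is already greedy-correct in game B (action 0 is best everywhere left of the goal) with zero additional play, whereas a fresh table would need the whole trial-and-error run again.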

~~~
chriswarbo
This idea is known as "incremental problem solving". Some external oracle,
e.g. a parent, teacher or programmer, provides a series of problems, where
each builds on ideas from the previous.

An example algorithm is the Optimal Ordered Problem Solver, which tries to
solve each problem by generating simple programs. Successful programs get
stored in read-only memory, then the system moves on to the next problem,
generating programs which may call out to any previously-successful programs:
[http://people.idsia.ch/~juergen/oops.html](http://people.idsia.ch/~juergen/oops.html)
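
A toy sketch of that store-and-reuse loop (the primitives and problems here are made up, and OOPS proper uses a universal language with optimal time allocation, which this ignores):

```python
from itertools import product

# Start with two primitive operations; each solved problem is frozen
# and added as a new primitive, so later searches can call it.
primitives = {"inc": lambda x: x + 1, "double": lambda x: x * 2}

def compose(names):
    def run(x):
        for name in names:
            x = primitives[name](x)
        return x
    return run

def solve(problem, max_len=3):
    """Enumerate compositions of known primitives, shortest first, until
    one fits every (input, output) example of the problem."""
    for length in range(1, max_len + 1):
        for names in product(primitives, repeat=length):
            f = compose(list(names))
            if all(f(i) == o for i, o in problem):
                return list(names)
    return None

sol1 = solve([(1, 4), (3, 8)])    # f(x) = 2x + 2
primitives["p1"] = compose(sol1)  # freeze the solution for reuse

sol2 = solve([(1, 5), (3, 9)])    # f(x) = 2x + 3
```

The second search finds a length-2 program by calling the stored `p1`, where a from-scratch search over the original primitives would have needed length 3.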

------
woodchuck64
> “They can find their way across a room,” Mason said. “They can see stuff,
> and as the light and shadows change they can recognize that it’s still the
> same stuff. They can understand and manipulate objects in space.”

Isn't this just adding extra dimensions to the input space? We have 2D now
(plus time?), we're missing Z, sound, sensation, maybe emotions. Each added
dimension gives the algorithm exponentially more bits to crunch but if
computer speed is doubling every 2 years or so, why is this so obviously a
dead-end to Mason?

~~~
simonbyrne
I think the distinction is that the algorithm gets to observe the 2D space
directly, whereas the whole notion of 3D space has to be learned from 2D
projections.

Also, note that the games they do best on have a nice clear objective
function: Montezuma's Revenge on the other hand does not have such a numeric
objective to optimise.

To be fair, Hassabis does freely concede these limitations in his talks (at
least in the ones targeted at academic audiences), and it will certainly be
interesting to see where they go from here.

~~~
nemo44x
That's a good point about Montezuma's Revenge and similar games. Very rarely
does a greedy algorithm help you in that game, unlike in a breakout-style game
where you simply want to acquire as many points as quickly as possible.

------
msoad
The article says humans can do "transfer learning" while machines can't. It
shouldn't be impossible to implement transfer learning in machines too.

~~~
svachalek
I don't see anywhere it says machines can't; it says DeepMind doesn't. There
are a few philosophers who believe human thought is more or less magic and
can't be replicated in any way, but all of the arguments I've seen are either
deeply flawed logically or are so specific as to more or less read "they can't
_be_ human", which I'll accept as true but isn't terribly interesting.

~~~
one-more-minute
There are plenty of smart people elsewhere who believe the human mind is un-
simulatable. For example, Roger Penrose argues that thought processes are
deterministic but non-algorithmic.

I've never really understood that perspective, though. Surely, in the absolute
worst case, we could just make an atom-for-atom copy of a human brain? Even if
you take seriously the idea that there's some kind of magic consciousness
juice that exists outside the universe, evolution has managed to hook into it
and surely so can we.

~~~
Padding
Except that atoms are not homogeneous entities.

Even assuming you'd somehow manage to produce and combine atoms to a spec,
there's positively no way of obtaining that spec.

~~~
one-more-minute
Even if thinking does somehow depend on quantum effects, it seems hugely
unlikely that it would depend on the _specific_ quantum state of individual
atoms.

If it does, you don't need a spec of that state, since we know that it can
emerge from something simpler (humans start out as a single cell, after all,
and so in fact did all of humanity). You don't need the whole system, just the
right initial conditions.

At that point you're growing a brain rather than engineering one, and maybe it
takes you no closer to understanding the mechanics. But the point stands that
it must be possible to construct a brain _in principle_, because it's already
happened so many times before.

------
danesparza
"In the longer term, after DeepMind has worked its way through Warcraft,
StarCraft, and the rest of the Blizzard Entertainment catalogue, the team’s
goal is to build an A.I. system with the capability of a toddler. "

Wait ... what? You're going to teach this thing using violent video games?
This seems like a bad plan...

~~~
throwawaymsft
The AI doesn't have a notion of "violence". There are goals, and obstacles to
those goals.

Goal: Human health, obstacle: viruses.

Goal: Clean energy, obstacle: friction, entropy, battery limitations

There are concerns that humans might inadvertently become an obstacle to some
greater goal, but training on Warcraft/Starcraft where you are "fighting"
isn't special in this regard. In chess you are battling your opponent too,
"killing" their pieces, etc.

~~~
puzzlingcaptcha
Goal: Make paperclips
([http://wiki.lesswrong.com/wiki/Paperclip_maximizer](http://wiki.lesswrong.com/wiki/Paperclip_maximizer))

------
diziet
I will get excited once it can solve Bongard puzzles:
[http://www.foundalis.com/res/diss_research.html](http://www.foundalis.com/res/diss_research.html)

------
Practicality
The video referenced at the beginning of the article:
[https://www.youtube.com/watch?v=EfGD2qveGdQ](https://www.youtube.com/watch?v=EfGD2qveGdQ)

------
raldi
Freeway isn't a driving game. It's a chicken-crossing-the-road game.

I'm looking forward to the upcoming entry in next week's _New Yorker_
"Corrections" section.

~~~
raldi
Um, holy shit, the article's been updated to refer to Freeway as "a chicken-
crossing-the-road game", and there's a footnote acknowledging the correction.

------
Mobiu5
This is actually pretty scary. It is basically giving AI a human-like form of
will. It "desires" what you program it to desire and goes about achieving it,
learning from its own mistakes and becoming increasingly proficient at
manipulating its environment to achieve its goal(s) along the way. It makes
me excited, but also quite frightened to think what goals people might give
AIs like this in the future...

~~~
astazangasta
This is a really poor rendering of "desire". If the word "desire" is
meaningful, it has to be self-actualized. Otherwise it's just a programming
condition, no different than a fuse box or a dead man's switch.

~~~
Houshalter
It doesn't matter what _word_ you use, the AI is still manipulating its
environment to achieve that "programming condition".

In any case, everyone's desires are programmed into them, just via genetics
and evolution rather than humans and programming.

------
t_fatus
"Hours after encountering its first video game, and without any human
coaching, the A.I. has not only become better than any human player but has
also discovered a way to win that its creator never imagined." This is where I
stopped. A 12-year-old could find this trick, and most of us have used it when
we played.

~~~
glial
Bad writing, sure. But the fact that you're comparing the ingenuity of a
computer to that of a 12-year-old is a sign of profound progress being made in
AI.

------
daronjay
Shall we play a game?....

