
Agents that imagine and plan - interconnector
https://deepmind.com/blog/agents-imagine-and-plan/
======
interfixus
" _This form of deliberative reasoning is essentially ‘imagination’, it is a
distinctly human ability_ "

A completely unfounded supposition, as so often appears to be the case when
some human monopoly is claimed. We didn't magically sprout whole new
categories of ability during a measly few million years of evolution.

Anecdotally, I see crows getting out of the way of my car. Not confused and
haphazard as many birds are, but in calculated, deliberate, unhurried steps
to somewhere _just_ outside my trajectory - steps which clearly take into
account such elements as my speed and the state of other traffic on the road.
Furthermore, when it's season for walnuts and the like, they'll calmly drop
their haul on the asphalt, expecting my tyres to crush it for them. This - in
my rural bit of Northern Europe - appears to be a recent import or invention;
I never saw it done until two years ago.

And there's The Case of the Dog and the Peanut Butter Jars. My dog, my peanut
butter jars, and they were empty, but not cleaned. Alone at home, she found
them one day, and clearly had experimented on the first one, which had
bitemarks aplenty on the lid. The rest she managed to unscrew without damage.
Having licked the jars clean, apparently she got to thinking of the grumpy guy
who would eventually be coming home. I can think of no other explanation why I
found the entire stash of licked-clean jars hidden - although not successfully
- under a rug.

Tell me again about imagination and its distinctly human nature.

~~~
zimpenfish
> I can think of no other explanation

Well, just because you can't think of one, doesn't mean your explanation is
correct, surely. This could easily be explained by an instinctual "hide food
remnants to avoid attracting bigger things".

~~~
interfixus
In some formal scheme, yes. In the actual situation, no, it could not easily.
Or we can reduce the question to a squabble over semantics: alright, the dog's
actions were not conscious and actively planned, but then neither are ours. I
fail to see the fundamental difference, and have never really heard a coherent
case made that there is one. You are of course right that argument from one's
own lack of imagination is no proof of anything.

~~~
randallsquared
> Alright, the dog's actions were not conscious and actively planned, but then
> neither are ours.

Well, _some_ of ours are. At least a few. It's not clear that any of the dog's
actions are consciously planned, is it?

~~~
interfixus
I should think it fairly clear, unless you propose some definite, qualitative
difference between the dog and the rest of us. It's not clear that such a
difference exists, is it?

------
ansgri
[https://en.wikipedia.org/wiki/Model_predictive_control](https://en.wikipedia.org/wiki/Model_predictive_control)

Of course imagining possible outcomes before executing is useful! And it has
many uses outside deep learning. No reason to invent new words, really - at
least not without referring to the established ones.

Maybe there _is_ a serious novel idea, but I've missed it.

Basically, if you need to control a complex process (i.e. bring some future
outcome into accordance with your plan), you can build a forward model of the
system under control (which is simpler than an inverse model), and employ some
optimization technique (combinatorial, i.e. graph-based; numeric
derivative-free, i.e. pattern search; or differential) to find the optimal
current action.
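That recipe fits in a few lines. Here is a toy sketch of the simplest variant, random-shooting MPC - the 1-D system, cost, and all numbers are invented for illustration, not taken from the article:

```python
import random

# Toy 1-D system (invented for this sketch): state is a position, the action
# is a step, and we want to drive the state to 0.
def forward_model(state, action):
    """The assumed-known forward model: next state from current state/action."""
    return state + action

def mpc_action(state, horizon=5, n_candidates=200):
    """Random-shooting MPC: sample candidate action sequences, roll each one
    through the forward model, return the first action of the cheapest one."""
    best_cost, best_first = float("inf"), 0.0
    for _ in range(n_candidates):
        seq = [random.uniform(-1.0, 1.0) for _ in range(horizon)]
        s, cost = state, 0.0
        for a in seq:
            s = forward_model(s, a)
            cost += s * s            # quadratic cost: squared distance to 0
        if cost < best_cost:
            best_cost, best_first = cost, seq[0]
    return best_first

random.seed(0)
state = 3.0
for _ in range(10):                  # receding horizon: replan at every step
    state = forward_model(state, mpc_action(state))
print(round(state, 3))
```

The derivative-free "pattern search" the comment mentions would replace the uniform sampling with something smarter, but the structure - forward model plus an optimizer over candidate actions - stays the same.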

~~~
PeterisP
The link between imagining and deep learning is rather in the opposite
direction - it has always been obvious that imagining possible outcomes before
executing would be useful, but the novelty is that deep learning has allowed
them to actually make "imagination" that works.

MPC is a useful concept _if_ you have a predictive model that's at least
vaguely close to the actual behavior. In some contexts (e.g. modeling of
particular industrial systems) programmers could build such a model, but in
the general case that's absolutely not feasible: the world is full of
problems where, practically speaking, you can _not_ manually build a forward
model of the system under control.

So this article is about initial research on systems that can construct such a
predictive model/imagination from experience, with a proof of concept that
current deep learning approaches allow us to build systems that can learn such
predictive models (which wasn't really possible before). Further development
of this concept seems to be the way we can actually apply things like MPC to
problems where we won't build a forward model ourselves - and in the long run,
that means pretty much all problems.
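The gap being closed here is easy to state in code. A minimal sketch of fitting a forward model from experience instead of hand-coding it - the linear toy dynamics and every number below are invented for illustration, and real learned models use neural nets rather than least squares:

```python
import random

# Toy illustration (dynamics invented for this sketch): rather than hand-code
# a forward model, recover one from observed transitions alone.
def true_step(s, a):
    """The environment's real dynamics - unknown to the agent."""
    return 0.9 * s + 0.5 * a

# Collect transitions by acting randomly in the environment.
random.seed(1)
data = []
s = 0.0
for _ in range(500):
    a = random.uniform(-1.0, 1.0)
    s_next = true_step(s, a)
    data.append((s, a, s_next))
    s = s_next

# Fit s' ~ w_s*s + w_a*a by least squares (normal equations, solved by hand).
sss = sum(s * s for s, a, t in data)
saa = sum(a * a for s, a, t in data)
ssa = sum(s * a for s, a, t in data)
sst = sum(s * t for s, a, t in data)
sat = sum(a * t for s, a, t in data)
det = sss * saa - ssa * ssa
w_s = (sst * saa - ssa * sat) / det
w_a = (sss * sat - ssa * sst) / det

def learned_model(s, a):
    """Forward model recovered purely from experience - usable for lookahead."""
    return w_s * s + w_a * a

print(round(w_s, 3), round(w_a, 3))
```

With noiseless linear dynamics the fit recovers the true coefficients exactly; the hard part the papers tackle is doing the analogous thing for dynamics that are nonlinear and observed only through pixels.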

~~~
habitue
> systems that can construct such a predictive model/imagination from
> experience

I just want to emphasize this point as the crux here. We have many, many
techniques for AI that involve doing roll-outs once a smart human with domain
knowledge hands the system a fully-formed model of the dynamics. Not so many
where the dynamics are learned.

------
gradstudent
I'm not a planning guy, but I work in a closely related community, so I'm at
least somewhat familiar with the area.

Looking at the first paper
([https://arxiv.org/pdf/1707.06170.pdf](https://arxiv.org/pdf/1707.06170.pdf)),
it seems surprisingly shallow and light on details. So they have a learning
system for continuous planning. So what? The AI Planning community has been
doing this for ages with MDPs and POMDPs, solving problems where the planning
domain has some discrete variables and some continuous variables. Here's a
summary tutorial from Scott Sanner at ICAPS 2012:
[http://icaps12.icaps-conference.org/planningschool/slides-Sanner.pdf](http://icaps12.icaps-conference.org/planningschool/slides-Sanner.pdf)

Speaking of ICAPS: this conference is the primary venue for disseminating
scientific results to researchers in the area. Yet the authors here cite
exactly _one_ ICAPS paper. WTF?

My bullshit detector is blaring.

~~~
tnecniv
I agree. Besides (PO)MDPs, the control people also get into neural networks
whenever they come into vogue.

This thesis from 2000 was the first Google hit for "reinforcement learning
control theory":
[http://www.cs.colostate.edu/~anderson/res/rl/matt-diss.pdf](http://www.cs.colostate.edu/~anderson/res/rl/matt-diss.pdf)

BTW, people in related fields may work on similar things but don't always
publish at the same venue -- labels matter. For example, ICRA and RSS are some
of the top robotics venues and people trying to sell themselves as roboticists
will prefer to publish there.

EDIT: In the second paper, they learn the model only from the images, not from
the game state, which is neat. That should be highlighted more than the one
sentence it was given.

------
aqsalose
The obvious caveat: this is quite far from my field of expertise. Doubly so,
because I'm an expert neither in neural net ML nor in cognitive science. So
take this with a spoonful of salt. But _anyhow_, I don't like the word
"imagine" here. It seems to suggest cognitive capabilities that their model
probably does not have.

As far as I do understand the papers, their model builds (in an unsupervised
fashion, which sounds very cool) an internal simulation of the agent's
environment and runs it to evaluate different actions, so I can see why they'd
call it imagination / planning, because that's the obvious inspiration for the
model and so it sort of fits. But in common parlance, "imagination" [1] _also_
means something that relatively conscious agents do, often with originality,
and it does not seem that their models are yet that advanced.

I'm tempted to compare the choice of terminology to DeepDream, which is not
exactly a replication of the mental states associated with human sleep,
either.

[1]
[https://en.wikipedia.org/wiki/Imagination](https://en.wikipedia.org/wiki/Imagination)

~~~
PeterisP
Can you elaborate on what qualitative difference you see between
imagination-as-you-understand-it and an internal simulation of a nonexistent
(maybe future, maybe never happening) state of an agent's environment or
inputs? There's an obvious quantitative difference - their environment is much
simpler than ours, and their "imagination" is bound to the near future (unlike
ours) - but conceptually, where do you see the biggest difference?

Originality seems not to be the boundary, since even this simple model seems
to imagine world states that it never saw, never will see, and which possibly
aren't even possible in its environment - i.e. they are "original" in some sense.

If I look at the common understanding of "imagination" and myself, what can I
imagine? I can imagine 'what-if' scenarios of my future, e.g. what could be
the outcome if I do this or that, or if something particular happens; I can
imagine scenarios of my past, i.e., "replay" memories; I can imagine
counterfactual scenarios that never happened and never will; I can imagine
various senses - i.e. how a particular melody (which I'm "constructing" right
now, iteratively, with the help of this imagination to guide my iterations)
might sound when played in a band, or how something I'm drawing might look
when it's completed - all of this seems like different variations on
essentially the same thing, which is an internal simulation (model) generating
data about various hypothetical states.

This _might_ be used to evaluate different actions, but it might also be used
to simply experience these states (i.e. daydream) or do something else -
that's more of a question on how the agent would want to _use_ the
"imagination module", not a particular property of the imagination/internal
simulation model itself.

~~~
ajarmst
Yeah. I doubt that their machine can "imagine no possessions", or that it
would have much utility even if it could.

~~~
landon32
You're right, but I think this paper is the first step on a (potentially very,
very) very long road to building machines that could "imagine no possessions"

------
jtraffic
Off topic: I posted this exact article four days ago:
[https://news.ycombinator.com/item?id=14813807](https://news.ycombinator.com/item?id=14813807)

In the past, when I post exact duplicates, HN redirects me and automatically
upvotes the original instead. I wonder why this doesn't always happen. (I'm
not bothered, just curious.)

Double off topic: It's very interesting to see how much difference timing
makes. My original had a single upvote, and this hit the front page.

~~~
boulos
The merging is fairly narrowly windowed in time (I think ~hours not >1 day).
Sometimes the mods will send you an email (if you have one stored in your
account profile) and ask you to repost with a front-page bonus attached. But
yeah, timing is everything :).

------
seanwilson
I'm likely completely missing the point but how is this concept of imagination
different from looking ahead in a search tree? Isn't exploring a search tree
like in Chess or Go exploring future possibilities and their consequences
before you decide on what to do next?

~~~
sullyj3
A search tree in something like chess has a small, discrete branching factor. You
can enumerate every possible action, and exploring the tree to a useful depth
is computationally tractable. By contrast, for an agent operating in a complex
environment, like a robot in the real world, even if you somehow came up with
a coherent process for listing every possible action the robot could take, you
might not even be able to store them all, let alone compute their
consequences. Think about the sheer amount of information you'd need to
process. Moreover, the real world is (for practical purposes) continuous. The
robot would have the option of engaging one of its motors for one millisecond,
or two milliseconds, or three milliseconds, etc.

This seems to be tackling the issue of what to do when there are just too many
options, and the depth of exploration necessary to make useful predictions is
too high, for you to just enumerate everything, heuristically prune, and pick
the optimum.
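The blow-up is easy to see in miniature. A toy exhaustive lookahead (the action set, cost, and target below are invented for this sketch) works only because the action set is tiny and discrete:

```python
# Exhaustive lookahead over a tiny discrete action set (invented for this
# sketch). With branching factor b and depth d this visits b**d leaves -
# fine for b = 3, hopeless at millisecond-resolution motor commands where
# b runs into the thousands.
ACTIONS = [-1, 0, 1]

def best_value(state, depth, target=5):
    """Best reachable squared distance to `target` within `depth` moves."""
    if depth == 0:
        return (state - target) ** 2
    return min(best_value(state + a, depth - 1, target) for a in ACTIONS)

print(best_value(0, 5))    # -> 0: target 5 is reachable in exactly 5 steps
print(len(ACTIONS) ** 5)   # -> 243 leaves enumerated at depth 5
```

Replace the three actions with, say, 2000 distinguishable motor commands and the same depth-5 search already needs 2000**5 = 3.2e16 leaves - which is the regime where a learned model that proposes only a handful of promising trajectories earns its keep.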

~~~
seanwilson
> Moreover, the real world is (for practical purposes) continuous. The robot
> would have the option of engaging one of its motors for one millisecond, or
> two milliseconds, or three milliseconds, etc.

Are there not similar techniques to search trees used here? Obviously you
wouldn't enumerate all options, but you'd think you could guess at some
practical ones, then guess at options between the most promising. Either way,
it just feels like "imagination" makes this sound like an entirely new
approach when heuristically pruned search trees could be described in the same
way.

~~~
red75prime
The ability to generate candidate paths in a continuous search space is a bit
more general than pruning a prebuilt search tree.

A search tree is an approximation of a continuous search problem and needs to
be built by someone. This approach builds its own search tree.

~~~
red75prime
It also paves the way to algorithms which can solve problems by repurposing
available actions to achieve unintended goals, and by creating new high-level
actions from low-level ones.

That is, solving problems creatively.

------
GuiA
Why do we need to explicitly design architectures such as the "imagination
encoder" the article describes? A proposed long term goal of deep learning is
to have AI that surpasses human cognition (e.g. DeepMind's About page touts
that they are "developing programs that can learn to solve any complex problem
without needing to be taught how"), which was _not_ explicitly designed in
terms of architectural components such as an "imagination encoder".

Shouldn't imagination and planning be observed spontaneously as emergent
properties of a sufficiently complex neural network? Conversely, if we have to
explicitly account for these properties and come up with specific designs to
emulate them, how do we know that we are on the right track to beyond human
levels of cognition, and not just building "one-trick networks"?

~~~
habitue
The human brain has specialized structures in it, it isn't a homogeneous mass
from which all parts of human cognition emerge once you have enough brain
cells (see elephant brain size vs. human brain size). If you've ever seen
anything else designed by evolution, you'll know it generally tends to be a
grab-bag of weird tricks all combined together in a way that somehow works. We
don't know what all the tricks are, nor which are necessary or sufficient to
create human-like intelligence.

There are also a lot of indications that ultimately you need some tricks (i.e.
specialized portions of the architecture that bias the kinds of solutions the
AI can learn) to be able to learn effectively in the environments we're
interested in. For example, we know that there is a time dimension to agent
tasks, and that objects don't pop in and out of existence, they tend to exist
continuously. These are biases we are free to add to a learning system without
worrying about them limiting the ultimate intelligence of the system.

In the limit, the No Free Lunch theorems indicate that there's no such thing
as a general learning system that doesn't sacrifice performance on some kinds
of tasks. The goal of AI research is to sacrifice performance on tasks that
we'll never encounter in favor of getting good performance on tasks we care
about.

~~~
GuiA
_> If you've ever seen anything else designed by evolution, you'll know it
generally tends to be a grab-bag of weird tricks all combined together in a
way that somehow works. We don't know what all the tricks are, nor which are
necessary or sufficient to create human-like intelligence._

That is precisely the core of my question. The papers mentioned in the
article seem to be about "hand designing" the weird tricks; shouldn't the goal
be to build a system that enables the emergence of these weird tricks without
involving human design?

~~~
PeterisP
> shouldn't the goal be to build a system that enables the emergence of these
> weird tricks without involving human design

It depends on your goals. If your goal is to build a system that can perform
smart actions (e.g. build/simulate something comparable to a brain), then
that's not required (it may happen to be useful, or not); if your goal is to
build a system that can create and build systems that perform smart actions
(e.g. build/simulate something comparable to the evolutionary process of an
intelligent species), then it should be.

------
deepnet
> particularly in programs like AlphaGo, which use an ‘internal model’ to
> analyse how actions lead to future outcomes in order to reason and plan.

I was under the impression that AlphaGo makes no plan but responds to the
current board state with expert move probabilities that prune MCTS random
playouts.

There is no plan (AFAIK) or strategy in the AlphaGo papers, so I find the
statement that AlphaGo is an imaginative planner quite curious.

Perhaps someone can reconcile these statements or correct my knowledge of
AlphaGo ?

Very interesting papers; it will be nice to see the imagination encoder
methods applied to highly stochastic environments, or indeed to a robot in the
real world.

~~~
ebalit
In AlphaGo, MCTS is used to explore many plans and select the best. As far as
I know, it then executes only the first action of the selected plan, and
starts planning afresh for the next action. As such, it doesn't "stick to the
plan", so you could say that it doesn't have a strategy. But the MCTS is
definitely a planner.
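That "commit to the first action, then replan" loop is the receding-horizon pattern, and it fits in a few lines. A toy sketch - the planner, actions, cost, and numbers are all invented here, and nothing is Go-specific:

```python
from itertools import product

# Receding-horizon control in miniature (toy numbers, invented for this
# sketch): a loop around any planner that scores whole action sequences.
ACTIONS = [-1, 0, 1]

def plan_cost(state, seq, target):
    """Cumulative squared distance to `target` along a candidate plan."""
    cost = 0
    for a in seq:
        state += a
        cost += (state - target) ** 2
    return cost

def select_plan(state, horizon, target):
    """Enumerate every action sequence and return the cheapest full plan."""
    return min(product(ACTIONS, repeat=horizon),
               key=lambda seq: plan_cost(state, seq, target))

state, target = 0, 7
for _ in range(10):
    full_plan = select_plan(state, horizon=3, target=target)
    state += full_plan[0]    # commit to the first move only, then replan
print(state)                 # -> 7
```

Replanning every step is what makes the behaviour robust: if the world (or the opponent) moves the state somewhere the old plan didn't anticipate, the next iteration simply plans from wherever it actually is.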

~~~
deepnet
Yes, absolutely - I think your explanation is perfectly correct.

Though (IMHO) MCTS is better characterised as evaluating _moves_ rather than
exploring _plans_.

The MCTS only explores the moves in order of likelihood, using the most basic
of heuristics: random playout.

The net outputs likely moves based only on the current board position; it
formulates no strategy.

No state is stored across moves - each play is independent, relying only on
the current board position.

I still don't see anything anywhere in AlphaGo that is a plan, trajectory or
strategy.

Neither is there an evaluation of the opponent nor any attempt to outwit them.

That it performs so astonishingly well without a plan is very, very
interesting, and should perhaps give us pause - is planning hubris? Do we
undervalue our use of heuristics in our own behaviour?

------
mehh
Painful paper to read because of the inaccurate use of the word 'imagination'.

I'm sure the guys who wrote this are smart enough to know it's not imagination
(perhaps arguably a small subset of the attributes that contribute to what we
know as imagination, but not imagination itself).

Which leads me to assume this hyperbole is there purely for the benefit of PR
and stock price.

~~~
mehh
Trying hard to resist saying this, but all I can think of is "infinite polygon
engine" ...

------
ww520
Evaluating different outcomes far ahead may be very computationally intensive.
One thing that AlphaGo shows is that a simple approach with Monte Carlo tree
search can drastically cut down the search space. The "imagine" part could be
just a guided random walk ahead during planning, with something like Monte
Carlo tree search.
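The "guided random walk" idea in its crudest form is flat Monte Carlo move evaluation. A sketch on an invented toy game (everything here is made up for illustration; real MCTS adds a tree and a selection policy on top of the playouts):

```python
import random

# Toy game (invented for this sketch): state is a counter, action 1
# increments, action 0 decrements; the game ends at |s| == 3, and
# reaching +3 counts as a win.
def step(s, a):
    return s + (1 if a else -1)

def is_terminal(s):
    return abs(s) >= 3

def reward(s):
    return 1.0 if s >= 3 else 0.0

def rollout_value(state, n=200):
    """Flat Monte Carlo: average the outcome of n uniformly random playouts
    from `state` - no tree, just the random-walk heuristic."""
    total = 0.0
    for _ in range(n):
        s = state
        while not is_terminal(s):
            s = step(s, random.choice([0, 1]))
        total += reward(s)
    return total / n

random.seed(0)
# Score both first moves from s = 0 by playout; prefer the higher estimate.
v_inc = rollout_value(step(0, 1))
v_dec = rollout_value(step(0, 0))
print(v_inc, v_dec)
```

Even purely random playouts separate the two moves here (the true win probabilities are 2/3 vs 1/3); guiding the rollouts with a learned policy, as AlphaGo does, makes the same estimates usable in spaces where random play alone tells you nothing.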

------
miguelrochefort
I'm confused. I thought AI was already about this.

Are they introducing something new, or is it just gimmick and buzzwords?

~~~
landon32
Their architecture is definitely novel. What are you referring to that made it
sound like existing AI could already do this?

------
thinkloop
> imagination is a distinctly human ability

Right...

