
Hybrid computing using a neural network with dynamic external memory - idunning
http://www.nature.com/nature/journal/vaop/ncurrent/full/nature20101.html
======
the_decider
Some interesting ideas, sadly locked behind a paywalled journal, all for the
purpose of boosting a researcher's prestige because they now hold a "Nature"
publication. Thankfully, the article is easily accessible via Sci-Hub.
[http://www.nature.com.sci-hub.cc/nature/journal/vaop/ncurren...](http://www.nature.com.sci-hub.cc/nature/journal/vaop/ncurrent/full/nature20101.html)

~~~
superfx
Here's an official, publicly accessible, link to the article:
[http://rdcu.be/kXhV](http://rdcu.be/kXhV)

~~~
jedharris
Not downloadable though. Provided as a distraction from the paywall.

------
nl
This is probably the most important direction in modern neural network
research.

Neural networks are great at pattern recognition. Things like LSTMs allow
pattern recognition through time, so they can develop "memories". This is
useful in things like understanding text (the meaning of one word often
depends on the previous few words).

But how can a neural network know "facts"?

Humans have things like books, or the ability to ask others for things they
don't know. How would we build something analogous to that for neural network-
powered "AIs"?

There's been a strand of research mostly coming out of Jason Weston's Memory
Networks work[1]. This paper extends that with a new form of memory, and
shows how it performs on some pretty difficult tasks, including graph tasks
like London Underground traversal.
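Roughly, the shared trick in these memory models is a content-based ("soft")
lookup over stored vectors. A toy sketch (illustrative only, not the paper's
actual architecture; the real models learn the keys, values and query):

```python
import numpy as np

def memory_read(query, keys, values):
    """Softmax attention over key/value slots: a 'soft' fact lookup."""
    scores = keys @ query                    # similarity of query to each key
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                 # softmax -> addressing weights
    return weights @ values                  # blend of the stored values

# Three memory slots, each holding a 4-dim "fact" vector.
keys = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
values = np.array([[1.0, 0.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0, 0.0],
                   [0.0, 0.0, 1.0, 0.0]])
query = np.array([0.0, 10.0])                # strongly matches slot 1
out = memory_read(query, keys, values)       # ~= values[1]
```

Because the read is a differentiable blend rather than a hard index, the
whole thing can be trained end-to-end with gradient descent.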

One good quote showing how well it works:

 _In this case, the best LSTM network we found in an extensive hyper-parameter
search failed to complete the first level of its training curriculum of even
the easiest task (traversal), reaching an average of only 37% accuracy after
almost two million training examples; DNCs reached an average of 98.8%
accuracy on the final lesson of the same curriculum after around one million
training examples._

[1]
[https://arxiv.org/pdf/1410.3916v11.pdf](https://arxiv.org/pdf/1410.3916v11.pdf)

~~~
petra
If it succeeds and scales, it seems very close to AGI, right?

~~~
nl
Nowhere near it. So far away that it is almost completely nonsensical to talk
about it.

I guess it is unlikely that one could have an AGI without some kind of memory,
so there is that.

~~~
petra
What further key skills will AGI need ?

~~~
visarga
In general, an AGI would be based on a reinforcement learning framework. Its
main skill would be to observe the world, judge the situation and perform
actions, with these three processes run in a continuous loop. It would
receive a reward signal by which it would learn behavior. It would have to
be embedded in a world it can move about in and act upon. If it has all
these ingredients, it can become a general intelligence, as long as the
reward signal is leading it to do that.
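That observe/judge/act/reward loop can be sketched with a tabular Q-learning
toy (everything here, the environment, names and hyper-parameters, is
illustrative, not from the paper):

```python
import random

class LineWorld:
    """Positions 0..3; the agent is rewarded for reaching position 3."""
    def __init__(self):
        self.pos = 0
    def observe(self):
        return self.pos
    def act(self, action):                  # action is -1 (left) or +1 (right)
        self.pos = max(0, min(3, self.pos + action))
        return 1.0 if self.pos == 3 else 0.0

def train(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    random.seed(seed)
    q = {(s, a): 0.0 for s in range(4) for a in (-1, 1)}
    for _ in range(episodes):
        env = LineWorld()
        for _ in range(10):                 # bounded episode length
            s = env.observe()               # observe the world
            a = random.choice((-1, 1))      # explore (a real agent would also
                                            # exploit its value estimates)
            r = env.act(a)                  # act, receive the reward signal
            s2 = env.observe()
            best_next = max(q[(s2, -1)], q[(s2, 1)])
            # judge: update the estimated value of (state, action)
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
            if r > 0:
                break
    return q

q = train()
# The learned values define the behavior: always move right, toward reward.
greedy = {s: max((-1, 1), key=lambda a: q[(s, a)]) for s in range(3)}
```

The reward signal alone shapes the behavior; nothing in the code says "go
right."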

Memorizing is just one of the actions such an agent is able to perform.
Another mental action besides memory would be attention. It would also need to
be able to simulate the world, people and systems it is interacting with (to
know how they behave) in order to be able to do reasoning and planning.

In short, an AGI would need: sensing (deep neural nets for vision, audio and
other modalities), attention, memory, estimating the desirability and effects
of various actions (a kind of imagination), an extensive database of common
known facts, and the ability to act (for example by speech and movement).

Many of these systems have been demonstrated. Sensing, attention and memory
are commonplace in ML papers. Creativity is demonstrated in generative models
that can write text, music and paint. Ability to predict the future and reason
about it was demonstrated in AlphaGo. Speech and motor control are under
development. We have most of the necessary blocks, but nobody has put them
together to form a functioning general AI yet.

------
idunning
Blog post for the paper: [https://deepmind.com/blog/differentiable-neural-computers/](https://deepmind.com/blog/differentiable-neural-computers/)

------
triplefloat
Very exciting extension of Neural Turing Machines. As a side note: Gated Graph
Sequence Neural Networks
([https://arxiv.org/abs/1511.05493](https://arxiv.org/abs/1511.05493)) perform
similarly or better on the bAbI tasks mentioned in the paper. The comparison
to existing graph neural network models apparently didn't make it into the
paper (sadly).

------
gallerdude
Can someone explain what the full implications of this are? This seems really
cool, but I can't really wrap my head around it.

From what I can tell, you can give the DNC simple inputs and it can derive
complex answers.

~~~
AlexCoventry
It separates the concern of memorization from the concerns of training and
processing. In most current neural architectures, patterns in the training
data are implicitly represented in the trained neural weights, and the net is
implicitly forced to develop recall of past events by transmitting them from
each time step to the next via its outputs.

The framework in this paper trains a neural net that interacts with a memory
bank in a manner similar to a CPU. That means it can save and recall data on
request, which could lead to more flexible architectures (you can give a
trained net different data to recall) and easier training (since a
memory-based architecture means the neural weights no longer have to learn
the data along with the processing algorithm).
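As a rough sketch, the read/write interface between controller and memory
looks something like this (greatly simplified and illustrative; the actual
DNC learns the addressing weights, tracks slot usage, and links writes in
time):

```python
import numpy as np

def write(memory, weights, erase_vec, write_vec):
    """Soft write: each row is partially erased, then added to,
    in proportion to its addressing weight."""
    memory = memory * (1 - np.outer(weights, erase_vec))
    return memory + np.outer(weights, write_vec)

def read(memory, weights):
    """Soft read: weighted sum over memory rows."""
    return weights @ memory

M = np.zeros((4, 3))                   # 4 memory slots of width 3
w = np.array([0.0, 1.0, 0.0, 0.0])    # one-hot weighting: address slot 1
M = write(M, w, erase_vec=np.ones(3), write_vec=np.array([7.0, 8.0, 9.0]))
r = read(M, w)                         # recovers what was written
```

Since reads and writes are weighted sums, the whole memory interaction stays
differentiable, which is what lets the controller be trained by gradient
descent.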

------
bra-ket
if you're interested in this check out "Reasoning, Attention, Memory (RAM)"
NIPS Workshop 2015 organized by Jason Weston (Facebook Research):
[http://www.thespermwhale.com/jaseweston/ram/](http://www.thespermwhale.com/jaseweston/ram/)

------
foota
I have a couple of questions that I'm not getting answers to from this: does
the memory persist between each "instance" of a task, or does it get wiped
after each one? Is this something where you might, say, present the model
with some input data (which it might learn to store in memory) and then ask
a question of it?

I.e., in the blog post it discusses using the network to find the shortest
path between two stations; would the steps to do that look like this?

1. Train the NN to navigate any network, presenting the graph data each
time you ask the NN a problem.

2. Take the trained NN and feed it the London Underground, then ask it to
tell you how to get between two stations?

------
zardo
Instead of just saving the data, you could think of using a memory address
as applying the identity function to the data and saving the result.

Could it learn to use addresses that perform more interesting functions than
f(x)=x?

------
kylek
I'm probably totally off base here (neural networks/AI is not my wheelhouse),
but is having "memory" in neural networks a new thing? Isn't this just a
different application of a more typical 'feedback loop' in the network?

~~~
choxi
You're correct, in a way: you can think of neural nets as "remembering" the
data set they're trained on. Recurrent neural nets even explicitly have a
"feedback loop" like the one you're referring to, which allows them to
"remember" previous samples. An example is natural language processing,
where you want to be able to remember the previous words in a sentence to
interpret the current word.
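That feedback loop is literally a few lines: the hidden state produced at
one step is fed back in at the next, and that is all the short-term memory a
plain RNN has (the weights below are random placeholders, not a trained
model):

```python
import numpy as np

def rnn_step(x, h, W_x, W_h, b):
    """One recurrent step: new hidden state from input plus old state."""
    return np.tanh(x @ W_x + h @ W_h + b)

rng = np.random.default_rng(0)
W_x = rng.normal(scale=0.5, size=(3, 4))   # input -> hidden
W_h = rng.normal(scale=0.5, size=(4, 4))   # hidden -> hidden (the loop)
b = np.zeros(4)

h = np.zeros(4)                            # initial hidden state
for x in rng.normal(size=(5, 3)):          # a sequence of 5 inputs
    h = rnn_step(x, h, W_x, W_h, b)        # h carries context forward
```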

Remembering the previous words in a sentence you're currently reading is
more like short-term memory, though, and this paper is talking about
long-term memories stored as data structures outside the neural net itself.
This graphic from the DeepMind blog post might be helpful:
[https://i.imgur.com/KwXXCge.png](https://i.imgur.com/KwXXCge.png).

The blog post from DeepMind is a bit more accessible than the Nature paper:
[https://deepmind.com/blog/differentiable-neural-computers/](https://deepmind.com/blog/differentiable-neural-computers/)

------
gallerdude
Does this mean we could get way better versions of char-rnn?

~~~
bbctol
This could hopefully replace the current char-rnn with something very
different. char-rnn is a long short-term memory (LSTM) system, where
recurrence in the structure of the neural network allows short-term
information to persist and inform future actions. The architecture in this
paper almost mimics the brain's separate long- and short-term memory
structures, and could store long-term memory apart from its main activities
until needed.

------
bluetwo
One of the examples given is a block puzzle (restoring 8 pieces in a 3x3
grid to their original order).

Has this been a hard problem for AI and CNNs?

~~~
cscurmudgeon
That problem was solved by a non-learning AI system decades ago. Current
theorem provers (a related field of AI) solve problems like this in
fractions of a second.

The progress reported in the article is getting a learning system to do it,
which could eventually let us tackle unsolved problems.
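For reference, the classical, non-learning approach is just hand-built
search; a sketch (plain BFS over 3x3 sliding-tile states, all of it written
by a human rather than learned):

```python
from collections import deque

GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)   # 0 is the blank

def neighbors(state):
    """All states reachable by sliding one tile into the blank."""
    i = state.index(0)
    r, c = divmod(i, 3)
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < 3 and 0 <= nc < 3:
            j = nr * 3 + nc
            s = list(state)
            s[i], s[j] = s[j], s[i]
            yield tuple(s)

def solve(start):
    """Return the number of moves in a shortest solution (BFS)."""
    frontier = deque([(start, 0)])
    seen = {start}
    while frontier:
        state, depth = frontier.popleft()
        if state == GOAL:
            return depth
        for nxt in neighbors(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return None                        # unsolvable parity class

moves = solve((1, 2, 3, 4, 5, 6, 0, 7, 8))   # two tiles out of place
```

The interesting part of the paper is that the DNC is never given anything
like this search procedure; it has to learn one.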

~~~
taneq
This seems like a bit of an unfair comparison. The 'decades ago' solution was
a system, built by humans, that can solve this problem (and very closely
related ones), whereas this solution is a system, built by humans, that can
_design a system to solve the problem_.

~~~
bluetwo
I agree with the point taneq makes: those earlier systems did in fact
require a lot of hand-crafting, even if parts were automated. I find it
interesting where the usefulness of these approaches plateaus.

I am very interested in general game playing, and there is a common problem:
while the general systems tend to make interesting progress, it is the
systems finely crafted for a specific game that win competitions.

What I am really, REALLY interested in is what commercial applications exist
for these types of technologies. Solving a puzzle slightly better than a
different tool is fun, but solving a valuable business problem is where the
money is at.

------
0xdeadbeefbabe
> a DNC can complete a moving blocks puzzle in which changing goals are
> specified by sequences of symbols

Is it that a neural network without memory can't do that at all, or perhaps
just can't do it as well?

~~~
AlexCoventry
In fig. 5a, they compare its performance to that of an LSTM trained on the
same problem, and it does seem to do much better.

~~~
prats226
I am guessing that for an LSTM-based neural network to learn memory
sequencing for the purpose of solving these problems, it would need to be
much deeper and wider. A separate memory block provides that capability
ready-made, so a much simpler network can be used that doesn't have to learn
those operations.

------
prats226
Would love to see whether these networks learn concepts of fast retrieval,
e.g. indexing.

------
plg
but why use an ANN for tasks involving symbolic logic? I don't get it. It's
like ANNs are jumping the shark

------
ktamiola
This is remarkable!

~~~
aminorex
^ This is called proof by construction.

