

Learning to Execute and Neural Turing Machines - wojzaremba
https://plus.google.com/app/basic/stream/z12iitmzryy5g3slc23ehf2qdruyv1zjy04

======
radarsat1
The Turing Machine idea makes a lot of sense: the machine is simply a state
machine graph that interacts with the memory, so it's sensible that it could be
"learned", much as in a genetic algorithm approach. The trick of making the
whole system differentiable is pretty cool, though.
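For intuition, here's a tiny illustrative sketch (memory contents and scores are made up, not from the paper) of why that differentiability trick matters: a hard argmax read of memory is piecewise constant in the addressing scores, while a softmax-blended read changes smoothly, so gradients can flow through it:

```python
import numpy as np

# Hypothetical 3-slot memory of 2-dim vectors (illustrative values).
M = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.5, 0.5]])

def hard_read(scores):
    """Pick the single best slot -- not differentiable in `scores`."""
    return M[np.argmax(scores)]

def soft_read(scores):
    """Blend all slots with softmax weights -- smooth in `scores`."""
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ M

s = np.array([2.0, 1.0, 0.0])
nudge = np.array([0.01, 0.0, 0.0])
# hard_read ignores the small nudge; soft_read moves a little with it.
```

A small perturbation of the scores leaves the hard read unchanged but shifts the soft read slightly, which is exactly what gradient-based training needs.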

That said, the biggest challenge here, I imagine, is evaluating the learned
system. It may give right answers, but how often does it give wrong answers?
How can the learned "machine" be tested for correctness? How does overfitting
come into the picture? For instance, halting can be neither proved nor
guaranteed. This strikes me as a fundamental advantage of the more functional
"feed-forward" approach used by most learning systems.

~~~
shawntan
The paper discusses putting the NTM through several tasks, and tests for
"overfitting" (i.e., how well it has generalised the task) by giving it longer
inputs than it saw during training. For example, in the copy task, they
trained it on sequences of length 20, but tested it on sequences of length
100.

Of course, this doesn't guarantee anything, but they also examine some of the
more easily interpreted internals of the learnt system and find that it does
some pretty consistent things.

------
agibsonccc
I would like to add, for those of you not familiar: the Neural Turing Machine
method uses a type of neural network called a recurrent neural net. Recurrent
neural nets are used to model sequential data such as time series, and are
trained with a neat technique called backpropagation through time.
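As a rough sketch of what "backpropagation through time" means (toy sizes and random data here, nothing from the paper): unroll the recurrence over the sequence, then walk the unrolled graph backwards, accumulating gradients for the shared weights at every step:

```python
import numpy as np

# Toy RNN: h_t = tanh(Wx @ x_t + Wh @ h_{t-1}); loss on the final state.
# All sizes and initial values are illustrative assumptions.
rng = np.random.default_rng(0)
Wx, Wh = rng.normal(size=(3, 2)), rng.normal(size=(3, 3))

def forward(xs):
    """Run the recurrence, keeping every hidden state for the backward pass."""
    h = np.zeros(3)
    hs = [h]
    for x in xs:
        h = np.tanh(Wx @ x + Wh @ h)
        hs.append(h)
    return hs

def bptt(xs, hs, dL_dh):
    """Backpropagation through time: walk the unrolled steps in reverse,
    summing each step's contribution to the shared weight gradients."""
    dWx, dWh = np.zeros_like(Wx), np.zeros_like(Wh)
    dh = dL_dh
    for t in reversed(range(len(xs))):
        dz = dh * (1 - hs[t + 1] ** 2)   # derivative of tanh
        dWx += np.outer(dz, xs[t])
        dWh += np.outer(dz, hs[t])
        dh = Wh.T @ dz                   # push gradient one step back in time
    return dWx, dWh

xs = [rng.normal(size=2) for _ in range(5)]
hs = forward(xs)
dWx, dWh = bptt(xs, hs, dL_dh=hs[-1])    # e.g. loss = 0.5 * ||h_T||^2
```

Real systems (including the NTM paper's controller) tend to use LSTM units rather than this plain tanh cell, because gradients pushed through many steps of `Wh.T @ dz` can vanish or explode.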

Here's a neat tutorial that combines a recurrent net with an RBM (an
energy-based generative net), for those who want to just see what a recurrent
net "looks like":

[http://deeplearning.net/tutorial/rnnrbm.html](http://deeplearning.net/tutorial/rnnrbm.html)

~~~
shawntan
Nitpicking here, but while the authors do use a recurrent neural net (RNN),
they do not use it exclusively.

The system consists of a memory element and a controller element. In their
evaluation of the system, they use both a standard feed-forward network and an
RNN with long short-term memory (LSTM) units as the controller element. In
certain tasks, the feed-forward network works better.
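A very stripped-down sketch of that split (shapes and weights here are illustrative assumptions, not the paper's architecture): the controller, here a single feed-forward layer, looks at the input and emits addressing scores, while the memory is a separate matrix read through a soft, differentiable weighting:

```python
import numpy as np

# Illustrative sizes: 8 memory slots of width 4, 5-dim input.
rng = np.random.default_rng(1)
slots, width, d_in = 8, 4, 5
memory = rng.normal(size=(slots, width))
W_ctrl = rng.normal(size=(slots, d_in))  # controller: input -> score per slot

def step(x, memory):
    """One read step: controller scores the slots, memory is read softly."""
    scores = W_ctrl @ x                  # feed-forward controller
    w = np.exp(scores - scores.max())
    w /= w.sum()                         # soft address over the slots
    read = w @ memory                    # differentiable read vector
    return read, w

read, w = step(rng.normal(size=d_in), memory)
```

Swapping the feed-forward controller for an LSTM just means replacing the `W_ctrl @ x` line with a recurrent cell; the memory side stays the same.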

+1 on the deeplearning.net tutorials, and theano. I've learnt a lot from
there.

~~~
agibsonccc
Right. Mainly just low hanging fruit for those who aren't in this stuff day to
day.

In a lot of my talks and day-to-day conversations, I've found people don't
know the difference between a feed-forward architecture vs. recurrent vs.
recursive vs. ... you get the point :P

------
dang
The neural Turing machines article has had significant recent attention here:
[https://hn.algolia.com/?q=neural+turing+machines#!/story/for...](https://hn.algolia.com/?q=neural+turing+machines#!/story/forever/0/neural%20turing%20machines).
If someone wants to post the "Learning to Execute" article, that would be
great.

Originally I thought we could just change the present url to that one, but
since the comments are only about the other paper, it seems better to just
treat this as a dupe.

~~~
javierluraschi
[http://arxiv.org/pdf/1410.4615v1.pdf](http://arxiv.org/pdf/1410.4615v1.pdf)

~~~
dang
I meant that someone should submit it as a story.

