

Neural Turing Machines - willwill100
http://arxiv.org/abs/1410.5401

======
teraflop
Previous discussion:
[https://news.ycombinator.com/item?id=8487807](https://news.ycombinator.com/item?id=8487807)

------
iandanforth
I'm going to be stupid in public on the hope that someone will correct me.

1. I'm not clear on the point of this paper.

There are a lot of buzzwords and an extremely diverse set of references. The
heart of the paper seems to be a comparison between Long Short-Term Memory
(LSTM) recurrent nets and their NTM nets. But they don't expose the networks to
very long sequences, or to sequences broken by arbitrarily long delays, which
are what LSTM nets are particularly good at. They seem to make the jump from
"LSTM nets are theoretically Turing complete" to "LSTM nets are a good
benchmark for any computational task."

2. The number of training examples seems huge

For many of the tasks they trained on hundreds of thousands of sequences.
This seems like very, very slow learning. If I'm meant to interpret these
results as a network learning a computational rule (copying, sorting, etc.),
is it really that impressive if it takes 200k examples before it gets it
right? (Not sarcasm, I really don't know.)
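For what it's worth, a single "training example" in the copy task is just a
short random binary sequence followed by a delimiter, with the target being
the same sequence echoed back. A rough sketch of how I'd generate one (my
reading of the setup; the lengths and widths here are guesses, not the
paper's exact values):

    import numpy as np

    def copy_task_example(max_len=20, width=8):
        # Random binary sequence of random length.
        length = np.random.randint(1, max_len + 1)
        seq = np.random.randint(0, 2, size=(length, width)).astype(float)
        # An extra channel carries the end-of-input delimiter; the target
        # is the same sequence, to be produced after the delimiter.
        delim = np.zeros((1, width + 1))
        delim[0, -1] = 1.0
        inputs = np.concatenate(
            [np.hstack([seq, np.zeros((length, 1))]), delim])
        return inputs, seq

So the question above is really whether 200k draws from something this simple
is a reasonable price for learning the rule.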

~~~
dave_sullivan
Re: the point of the paper, I think it's addressing a current need within
representation learning research, where there's this question of "OK, we can
teach really large neural networks stuff, but how do we compress that
knowledge efficiently?" How can we learn more
compact/efficient/reliable/discrete representations? I've only just finished
reading it through, and this seems to me to be a promising direction, one I'd
like to see more research on.

Re: the number of training examples, I'm taking the chart on p. 11 to be
plotting loss against the number of training examples shown. Based on that,
it looks like the NTM is learning _a lot_ faster than the LSTM. As far as I
can tell, it's getting near zero loss about 20,000 examples in? Whether
learning with 20k examples is impressive depends on the domain; personally, I
think it's comparatively impressive.

Re: cherry picking of tasks to highlight perceived strengths of NTM, fair
enough. Although this is one I'll be playing around with a bit to find out
where that starts and stops...

Any thoughts on how this compares to the approach of HTMs?

------
macrael
Does a "typical" neural network not have any storage to speak of? When I've
seen examples of neural networks working, it's seemed like they work in cycles
in some way, with the states of each "neuron" affecting the state of others.
Is that not potentially storage?

~~~
robert_tweed
Recurrent neural networks work that way, but typically when anyone says
"neural network" they mean a feed-forward network, which is as dumb as a lead
pipe. It has no memory other than the trained synapse weights. After training,
all it does is transform some input data into output data. It's a pure
function. It's also a lot like a matrix transform.
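To make that concrete, here's a minimal sketch (plain numpy, illustrative
names only) of the difference: the feed-forward net is a pure function of its
input, while a recurrent net threads a hidden state from one step to the
next, which is the "storage" the parent comment is asking about.

    import numpy as np

    def feedforward(x, weights, biases):
        # Pure function: the same input always gives the same output,
        # nothing is remembered between calls.
        a = x
        for W, b in zip(weights, biases):
            a = np.tanh(W @ a + b)
        return a

    def rnn_step(x, h_prev, W_in, W_rec, b):
        # The hidden state h is carried from step to step, so earlier
        # inputs can influence later outputs.
        return np.tanh(W_in @ x + W_rec @ h_prev + b)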

I haven't read this paper fully yet, but it seems to be an attempt at
simplifying RNNs by replacing some of the magic internal state, which tends to
make them hard to reason about, with a more direct memory architecture.
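From my skim, the "more direct memory architecture" is an external memory
matrix that the controller reads with content-based addressing: it emits a
key, memory rows are scored by cosine similarity sharpened by a key-strength
parameter, and the read vector is the resulting weighted sum. A rough sketch
of just that read step (my paraphrase of the paper's addressing section, not
code from the paper):

    import numpy as np

    def content_read(memory, key, beta):
        # memory: (N, M) matrix of N slots of width M
        # key:    (M,) vector emitted by the controller
        # beta:   scalar key strength (sharpens the focus)
        sims = memory @ key / (
            np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-8)
        w = np.exp(beta * sims)
        w /= w.sum()               # softmax over memory locations
        return w @ memory          # read vector: weighted sum of rows

Because everything stays differentiable, the whole thing can still be trained
with gradient descent, but the state lives in an explicit, inspectable matrix
rather than being smeared across recurrent activations.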

