
Neural Turing Machines (2014) - rfreytag
https://arxiv.org/abs/1410.5401
======
nickpsecurity
Interesting. Reminds me of old attempts at using NN's or ML techniques to
derive programs from data, merged with fuzzy-logic controllers. Here are
two links on the analog, Turing-complete NN's they cited in the paper:

[http://binds.cs.umass.edu/anna_cp.html](http://binds.cs.umass.edu/anna_cp.html)

[http://research.cs.queensu.ca/home/akl/cisc879/papers/PAPERS...](http://research.cs.queensu.ca/home/akl/cisc879/papers/PAPERS_FROM_MINDS_AND_MACHINES/VOLUME_13_NO_1/J7L1675237505M16.pdf)

It would be interesting to see designs like theirs and others we see posted
trained until they got really good at an application area, then synthesized
into digital or analog NN implementations with plenty of resiliency. The
components are so simple they can run with few gates or other components.
Anyone studying NN's know what the current state of the art is for
synthesizing efficient HW from a given NN model? With or without the ability
to continue to learn/improve.

~~~
petra
There was work done by Doug Burger from Microsoft on splitting an algorithm
between a digital CPU and a neural compute unit (simulated in analog). In
algorithms that were not bottlenecked by the digital CPU, they even saw
gains of 50x in perf/W.

But most algorithms they tested were bottlenecked by the digital part, so the
average benefit was ~3x.

But maybe if you start the design using this "Turing neural network", you
could see consistently large gains.

------
pron
Is it just me or is anyone else bothered by the lack of theory here? The
results are great, but seem so heuristic and ad-hoc, like somebody tried
lots of things and one seemed to work, but there's no clear idea as to why,
and certainly no general theory tying things together.

~~~
zodiac
I can't speak to the RL methods used here, but wrt the architecture they
designed, I think it might seem less ad-hoc if you know that generally
speaking new neural net architectures work well when they

1) can do some useful computation

2) with a good loss function

3) with a small number of parameters

4) in a differentiable way

5) where the gradients can flow

So for instance LSTMs are better than RNNs due to (5), convnets are good due
to (3), etc.
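
Point (4) is also the core trick in the NTM paper: memory access is made
differentiable by replacing hard indexing with a soft, attention-weighted
read, so gradients can flow back through the addressing step. A minimal
NumPy sketch of content-based soft addressing (function and variable names
are my own, not from the paper):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def soft_read(memory, key, beta=1.0):
    """Content-based addressing: cosine similarity between the key and
    each memory row, sharpened by beta, then a softmax-weighted blend.
    Every step is smooth, so gradients can reach the key and memory."""
    sims = memory @ key / (np.linalg.norm(memory, axis=1)
                           * np.linalg.norm(key) + 1e-8)
    w = softmax(beta * sims)  # attention weights over memory rows
    return w @ memory         # blended read vector, not a hard lookup

M = np.array([[1.0, 0.0],     # toy 3-row memory
              [0.0, 1.0],
              [1.0, 1.0]])
r = soft_read(M, np.array([1.0, 0.0]), beta=5.0)
```

With a large `beta` the read concentrates on the best-matching row (here the
first one), while staying differentiable everywhere; a hard `argmax` lookup
would not be.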

~~~
argonaut
You've just numbered 5 different rules of thumb... That hardly supports your
idea that things are not ad-hoc.

~~~
zodiac
Firstly, (1) and (2) are not rules of thumb for building nice architectures,
but basic properties of supervised learning and what kind of problems it
can solve.

Even if it were 5 rules of thumb instead of 3, "design an architecture to
satisfy 5 rules" is a lot less ad-hoc than "somebody tried lots of things and
one seemed to work", which is what I was replying to. We also know why each
of these rules is important.

~~~
argonaut
I'll concede that (1) and (2) are basic properties that an educated layperson
would understand after reading a few ML blog posts. Hardly a working theory of
neural nets.

They're still rules of thumb. You've argued that they are less ad-hoc than if
they didn't exist. That misses the point: it is still enormously ad-hoc.

------
kikill
Funny, I was just reading this article on Medium about NTMs:
[https://medium.com/snips-ai/ntm-lasagne-a-library-for-neural-turing-machines-in-lasagne-2cdce6837315#.owfvji1oy](https://medium.com/snips-ai/ntm-lasagne-a-library-for-neural-turing-machines-in-lasagne-2cdce6837315#.owfvji1oy)

~~~
albi_lander
Thanks for sharing, very nice article indeed. Looks like the author works
here --> [https://snips.ai/](https://snips.ai/). Their product looks
promising and is branded as an "intelligent memory"; it would be interesting
to know if they're using an NTM implementation in their product.

