TLDR: Given any Turing Machine you can build a Recurrent Neural Network that does the same thing.
You can then apply known facts about Turing machines, e.g. the Halting Problem: you can't, in general, predict when the neural network will reach its final state.
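To make the TM-to-RNN direction concrete, here is a minimal sketch (my own toy, not the paper's construction) of the Siegelmann–Sontag-style trick behind such proofs: an unbounded binary stack packed into a single rational-valued "neuron", with push/pop as affine updates of the kind a recurrent node can compute. All names here are mine; the thing to notice is that the one value needs unbounded precision.

```python
from fractions import Fraction

# A binary string b1 b2 ... bk (b1 = top) is stored as the single
# rational s = sum_i (2*b_i + 1) / 4**i, so base-4 digit 1 encodes
# bit 0 and digit 3 encodes bit 1.  Push/pop are affine maps on s.

def push(s, bit):
    return (s + (2 * bit + 1)) / 4

def top(s):
    return 1 if s >= Fraction(1, 2) else 0  # the ranges don't overlap

def pop(s):
    return 4 * s - (2 * top(s) + 1)

s = Fraction(0)                 # empty stack
for b in [1, 0, 1, 1]:          # push 1, 0, 1, 1 (so the top is 1)
    s = push(s, b)
print(top(s))                   # -> 1
s = pop(s)
print(top(s))                   # -> 1
s = pop(s)
print(top(s))                   # -> 0
```

A TM tape is just two such stacks (left of the head and right of the head), which is how the simulation gets unbounded storage out of finitely many nodes.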
The simplest definition of Turing completeness is: a system capable of manipulating symbols with no restriction on recursive depth.
Basically, if you can write "while (true) { doStuff(); }" (or an equivalent TCO version), your system is Turing complete.
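As a toy illustration of that criterion (my own example, not from the thread): an unbounded `while (true)` loop plus a finite lookup table is already a Turing machine. This one runs the 2-state busy beaver and halts after 6 steps with 4 ones written.

```python
from collections import defaultdict

# (state, symbol read) -> (symbol to write, head move, next state)
delta = {
    ('A', 0): (1, +1, 'B'), ('A', 1): (1, -1, 'B'),
    ('B', 0): (1, -1, 'A'), ('B', 1): (1, +1, 'HALT'),
}

tape = defaultdict(int)          # unbounded tape, blank = 0
head, state, steps = 0, 'A', 0
while True:                      # the unbounded loop doing all the work
    if state == 'HALT':
        break
    write, move, state = delta[(state, tape[head])]
    tape[head] = write
    head += move
    steps += 1

print(steps, sum(tape.values()))  # -> 6 4
```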
So, all the "deep learning" fads aren't even turing complete (if we don't consider RNNs "deep learning") because they are feedforward/backprop networks of explicitly limited depth.
Humans are Turing complete, but only over limited time spans. Humans get tied up in a version of the halting problem called "this is dumb, let's do something else." It's difficult for humans to execute "find the highest prime number" without other needs derailing the process over time (boredom, bathroom breaks, biological death).
The example I like to give when explaining algorithms and Turing completeness to laypeople is "Stir until you get a smooth mix", and by extension, cooking recipes.
FYI, RNNs are a form of Deep Learning these days (in particular LSTM and GRU RNNs, very popular for language translation and much more for a few years now). There is also research lately into weirder types of neural nets such as 'Neural Turing Machines' (http://arxiv.org/abs/1410.5401).
Sadly, "deep learning" (much like "AI" itself) has become more of a marketing term and less of a technical term these days.
If I tell someone I made a deep learning system, it conveys no details other than it's something besides a single linear discriminator.
Deep learning is in the "Peak of Inflated Expectations" phase of our never-ending tech hype cycle. Like most buzzwords, it holds meaning for practitioners, but when marketing dweebs with no underlying understanding are hired to write endless clickbait about technical terms, it dilutes the entire vocabulary ecosystem until even the practitioners can't describe what they do using their own terminology anymore.
True, it's impossible to argue "deep learning" is not in the midst of a massive hype boom, and it does just generically convey one is using a deep rather than shallow learning approach. But at least the research going on there is still quite exciting, or so it seems to me.
Given that it's so common to present RNNs as feed-forward nets with a layer for each time step, it doesn't seem at all unreasonable to group them in with deep learning.
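A quick sketch of that unrolling, assuming a plain tanh RNN (sizes and weights here are arbitrary): running the recurrent cell over a sequence computes exactly the same thing as a feedforward stack with one weight-shared layer per time step.

```python
import numpy as np

rng = np.random.default_rng(0)
W_h = 0.5 * rng.normal(size=(4, 4))   # hidden-to-hidden weights
W_x = 0.5 * rng.normal(size=(4, 3))   # input-to-hidden weights
xs = rng.normal(size=(5, 3))          # a sequence of 5 input vectors

# Recurrent view: one cell applied repeatedly.
h = np.zeros(4)
for x in xs:
    h = np.tanh(W_h @ h + W_x @ x)

# Unrolled view: a 5-layer feedforward net whose layers share weights.
layers = [(W_h, W_x)] * len(xs)       # one "layer" per time step
h2 = np.zeros(4)
for (Wh, Wx), x in zip(layers, xs):
    h2 = np.tanh(Wh @ h2 + Wx @ x)

print(np.allclose(h, h2))             # -> True: same computation
```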
Computationally they are the same. The level of difficulty is the same too: not solvable in general. In special cases there are algorithmic solutions, and the connections in an RNN are just as well defined as the statements and function calls in a procedural program.
The paper seems to show that recurrent networks are Turing Machines, not the other way around.
They are finite state machines, unless the variables stored at the nodes can be of unbounded size (so that a pushdown stack or the like of unbounded depth could be represented in a single variable). That goes against the "spirit", if you will, of neural networks, which maintain some small amount of state at each node.
All finite state machines can be simulated by Turing Machines, of course. Just the reverse isn't true.
I don't see the point of the Example, which clearly shows a representation of a computation in the form of a finite sized matrix with bounded values, iterating on a vector. Such a contraption is not a Universal Turing Machine no matter what values you stuff into it and what initial vector you use.
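One way to make that point concrete (a toy of mine, not the paper's Example): quantize each node to 8 bits and the network has finitely many states, so iterating the matrix on the vector must eventually revisit a state and cycle, which is exactly finite-state behavior.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(3, 3))        # arbitrary fixed weight matrix

def step(x_q):
    # decode the 8-bit state, apply the recurrence, re-quantize
    x = x_q.astype(float) / 127.0
    return np.round(np.tanh(W @ x) * 127).astype(np.int8)

x = np.array([127, 0, 0], dtype=np.int8)
seen = {}
while tuple(x) not in seen:        # at most 255**3 states => must repeat
    seen[tuple(x)] = len(seen)
    x = step(x)
print(f"revisited a state after {len(seen)} steps: "
      f"finite state machine, not a Turing Machine")
```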
Ironically, the paper was written on a computer built on sequential logic, which is a network of combinational gates with feedback: essentially a recurrent neural network, and its purpose is evidently to argue that such a thing can compute.
Ah, we have these flights of fancy toward the end:
> The variables can be continuous, not only integer valued.
Come on, there is no such thing. Continuous variables are not constructible (only some reals are). They are only an abstraction in math. A continuous variable effectively consists of infinitely many bits. Even the reals that are called computable, like the number pi, still cannot be computed to completion. What "computable" means is that there is an algorithm which makes progress toward a more and more accurate value: for any specific integer n, the function "find the n-th digit of pi" completes in a finite number of steps. That doesn't mean we can have a continuously valued variable and stick pi in it.
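That distinction is easy to demonstrate. Below is Gibbons' unbounded spigot algorithm for pi (standard exact integer arithmetic, no floats): each digit arrives after finitely many steps, yet the stream never ends, so no variable ever "holds pi".

```python
from itertools import islice

def pi_digits():
    """Yield the decimal digits of pi forever (Gibbons' spigot)."""
    q, r, t, k, m, x = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < m * t:
            yield m               # the next digit is certain; emit it
            q, r, m = 10 * q, 10 * (r - m * t), (10 * (3 * q + r)) // t - 10 * m
        else:
            q, r, t, m, x, k = (q * k, (2 * q + r) * x, t * x,
                                (q * (7 * k + 2) + r * x) // (t * x),
                                x + 2, k + 1)

print(list(islice(pi_digits(), 10)))  # -> [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
# list(pi_digits()) would never return: pi has no last digit.
```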
Moreover, in this fantasy world where we have true continuous variables, why can't we also have them in the original language whose elements are equated with the neural network? After all, "for each variable V in the program, there exists a node Nv". If node Nv is continuous, so must be the variable V in the program.
What the paper really shows is that the syntax of some abstract algorithmic, imperative steps can be transliterated into the syntax of a network with feedback which does the same thing.
> An interesting question arises, for example, whether the class of NP-complete problems could be attacked more efficiently in the network environment!
It is well known that training even a two-layer feed-forward neural network is NP-hard, so I doubt that this is a reasonable direction. This was even known in 1993. [1]