
MOV is Turing Complete [pdf] - AndrewDucker
http://www.cl.cam.ac.uk/~sd601/papers/mov.pdf
======
testbro
On a similar note, Julian Bangert and Sergey Bratus have an implementation
of the Game of Life that performs its computation on the MMU using traps [1],
so that technically zero x86 instructions are executed to run the computation.
There's a video of it running available too [2].

[1] : [https://github.com/jbangert/trapcc](https://github.com/jbangert/trapcc)

[2] :
[http://www.youtube.com/watch?v=eSRcvrVs5ug](http://www.youtube.com/watch?v=eSRcvrVs5ug)

~~~
hannibal5
They win, since the Game of Life is itself Turing complete.

------
networked
This amounts to using your x86 CPU as an OISC [1], more specifically, as one
with a transport triggered architecture [2].

\--

Edit: I wrote a tiny TTA CPU simulator to better understand the concept when I
first encountered it back in 2006. I recently discovered its code in an old
backup, and this story has prompted me to upload it; here it is:
[https://bitbucket.org/networked/oisc-sim](https://bitbucket.org/networked/oisc-sim).
The code is ugly in places and lacks comments, but it still might be easier to
get started with than x86 assembly.

The simulator has the command syntax

    REG1 REG2

where both REG1 and REG2 are register numbers. This transfers the contents of
REG1 to REG2. By default, all registers contain unsigned int values. There are
registers mapped to the sum, difference, product, etc. of registers 0 and 1;
that's how you do arithmetic and comparisons. Flow control is accomplished by
moving the desired instruction number to REG_PC (register 3). Input and output
(numerical only) are done by moving data to and from REG_IO (register 4). To
store a literal, move its value to REG_CONST (register 11); you can then move
it from there to whatever register you like.

If you find yourself short on registers, increase REGS_N to, say, 256 in
mov-rev.c.

Because it's easy to get lost in the register numbers, the sample factorial
program I've included ([https://bitbucket.org/networked/oisc-sim/src/3b3903518033494...](https://bitbucket.org/networked/oisc-sim/src/3b3903518033494476ad10f4f3eb0d0bf59166b6/fact.oisc?at=master))
uses the C preprocessor to assign them names. Download the simulator and type

    make fact

to try it. It'll ask you to input a single number followed by the <enter> key.

There's no documentation to speak of, but you can get a dynamically generated
special register reference by running

    mov-rev -h

[1]
[https://en.wikipedia.org/wiki/One_instruction_set_computer](https://en.wikipedia.org/wiki/One_instruction_set_computer)

[2]
[https://en.wikipedia.org/wiki/Transport_triggered_architectu...](https://en.wikipedia.org/wiki/Transport_triggered_architecture)

~~~
tekacs
From the paper:

> Finding Turing-completeness in unlikely places has long been a pastime of
> bored computer scientists. The number of bizarre machines that have been
> shown Turing-complete is far too great to describe them here, but a few
> resemble what this paper describes.

> ... [several examples] ...

> A single instruction machine using only MOVE was described by Douglas W.
> Jones [1], where Turing-completeness is gained by having memory-mapped
> arithmetic and logic units (so that other operations can be performed by
> moving data to a predefined memory location and collecting the result
> afterwards). Some machines based on this principle have been built, and are
> generally known as “move machines” or “transport-triggered architectures”.

Also:

> Removing all but the mov instruction from future iterations of the x86
> architecture would have many advantages: the instruction format would be
> greatly simplified, the expensive decode unit would become much cheaper, and
> silicon currently used for complex functional units could be repurposed as
> even more cache. As long as someone else implements the compiler.

:)

~~~
ibrahima
> > Removing all but the mov instruction from future iterations of the x86
> architecture would have many advantages: the instruction format would be
> greatly simplified, the expensive decode unit would become much cheaper, and
> silicon currently used for complex functional units could be repurposed as
> even more cache. As long as someone else implements the compiler.

I realize it's tongue-in-cheek, but saying the decode unit in x86 is expensive
is a tired and untrue adage by today's computer-architecture standards. But I
guess it is fair to say that the die is stuffed with as much cache as they can
fit :)

~~~
duaneb
It may not be accurate to say "decode unit" anymore, but who wouldn't welcome
anything that removes the large amounts of cruft from the ISA? Nobody uses
most of the damn opcode space except where required for mode trickery or
legacy code.

------
_quasimodo
From the Discussion section:

      Removing all but the mov instruction from future iterations of
      the x86 architecture would have many advantages: the instruction
      format would be greatly simplified, the expensive decode unit
      would become much cheaper, and silicon currently used for complex
      functional units could be repurposed as even more cache. As long
      as someone else implements the compiler.

:D

------
VLM
I was very familiar with references #1 and #2, those are good papers. Three
minor issues:

1) Both this article and reference #3 claim not to use or need adders to
avoid self-modifying code, yet both implement indexed addressing, where you
take a base address and an offset and er... um... argh... ADD them together.
So you're using an adder but playing word games. Only reference #2 actually
"does it" without an adder. If you're doing it for "fun", then self-modifying
code is plenty of "fun".

2) The claim that the compiler is going to be the icky part is not accurate.
To a crude approximation, a compiler would be about as hard as an FPGA synth
of a microcode decoder for a soft CPU. Not as simple as implementing an HTML
BLINK tag, but it's no deal breaker. The icky part is going to be performance.
Just adding a little cache isn't going to help if your code inherently stalls
it, and you're going to be pretty well memory starved. Good luck doing a fast
floating-point multiply-add for a DSP with this architecture.

3) A minor issue, but it should be possible to get instantaneous power very
low with this design. No one writes about this interesting issue (unless it's
in reference 3, which I don't have access to). Of course, "power per FLOP"
type metrics are (probably? inevitably?) going to suck, unless I'm wrong.

I've done a lot of daydreaming about actually implementing a system outta
reference #1 for fun. It's about the simplest hardware "processor" you can
possibly build. After all, you're going to need (memory-mapped) I/O anyway,
and who cares if one of the memory-mapped I/O devices is a 32-bit-wide full
adder, or whatever other CPU-like functional module you'd like to entertain.

I have two amusing daydreams. One is a 256-bit-wide move machine (why 256
bits wide? Why the heck not? Difficult, but barely inside the limit of
buildability). The other strange daydream is vaguely like an old IBM 1620,
but why not try floating point instead of BCD? Yes, I'm no noob; I know all
about floating-point rounding issues, and I consider them a huge part of the
"fun". Also, I'm well aware a binary adder would be a wee bit quicker for the
program counter than a floating-point adder... that's the whole point: I
think a pure float machine would be hilarious fun. I'm not talking about
merely sticking a floating-point copro on a chip, but literally using IEEE
754 on the address bus (obviously with a bit of rounding). Quad-precision
floating point should be enough digits for an address bus to have a heck of
a lot of fun, with some slack for rounding issues.
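For what it's worth, here's a quick Python sanity check (my own, not from any of the references) of how much exact-integer headroom IEEE 754 floats give an address bus:

```python
# A double's 53-bit significand represents every integer up to 2**53
# exactly; quad precision (113-bit significand) has far more slack.
# So even plain doubles cover a 48-bit address space with no rounding.
assert float(2 ** 48) == 2 ** 48              # 48-bit bus: exact
assert float(2 ** 53 - 1) == 2 ** 53 - 1      # last fully exact region
assert float(2 ** 53 + 1) == float(2 ** 53)   # first rounding casualty
print("doubles are exact up to 2**53")
```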

~~~
mietek
> _1) Both this article and reference #3 make claims about not using or
> needing adders to avoid self modifying code, although both implement indexed
> addressing where you take a base address and a offset and er... um...
> argh... ADD them together. So you're using an adder but playing word
> games._

Your concern is _addressed_ in this paper:

"It would appear that we are cheating slightly by allowing arithmetic in
addresses. However, our "arithmetic" is of a very restricted form: the indexed
instructions may only be used when R_{src} (or R_{dest} for stores) is an
even-numbered memory address. Since offsets are always 0 or 1, then our
"arithmetic" can be implemented as bitwise OR, or even bitstring
concatenation."
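A quick check (my own illustration, not from the paper) that this restricted "arithmetic" really does collapse to a bitwise OR:

```python
# If the base address is even, its low bit is 0, so for offsets 0 and 1
# addition, bitwise OR, and bit-string concatenation all yield the same
# address -- no carry can occur, hence no adder is required.
for base in range(0, 1024, 2):   # even-numbered addresses only
    for offset in (0, 1):
        assert base + offset == (base | offset)
print("no adder needed")
```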

~~~
adsche
Hmm, I still don't see it. Can bitwise OR be implemented with just MOV?

------
turnersr
This submission is most likely a cross-post from a great community of
reversing folks at
[http://www.reddit.com/r/ReverseEngineering/](http://www.reddit.com/r/ReverseEngineering/).
I encourage interested people to watch for other papers and resources related
to reverse engineering software and hardware at that subreddit.

------
adsche
> So, our Turing machine simulator consists of a sequence of `mov`
> instructions, followed by an unconditional branch back to the start

Newbie question: Wouldn't it be possible to just `mov` the start address of
the program into the instruction pointer register at the end of the program?

------
djulius
The conclusion is worth reading.

"As long as someone else implements the compiler." Typical British humor.

------
Geee
If you want to design your own TTA processor, check out:
[http://tce.cs.tut.fi/](http://tce.cs.tut.fi/)

------
venti
[http://www.youtube.com/watch?v=hdcTmpvDO0I](http://www.youtube.com/watch?v=hdcTmpvDO0I)

