Hacker News
MOV is Turing Complete [pdf] (cam.ac.uk)
131 points by AndrewDucker on Sept 1, 2013 | 15 comments



In a similar vein, Julian Bangert and Sergey Bratus have an implementation of the Game of Life that performs its computation on the MMU using traps [1], so it technically uses zero x86 instructions to run the computation. There's a video of it running available too [2].

[1] : https://github.com/jbangert/trapcc

[2] : http://www.youtube.com/watch?v=eSRcvrVs5ug


They win, since the Game of Life is Turing complete.


This amounts to using your x86 CPU as an OISC [1], more specifically, as one with a transport triggered architecture [2].

--

Edit: I wrote a tiny TTA CPU simulator to better understand the concept when I first encountered it back in 2006. I recently discovered its code in an old backup, and this story has prompted me to upload it; here it is: https://bitbucket.org/networked/oisc-sim. The code is ugly in places and lacks comments, but it still might be easier to get started with than x86 assembly.

The simulator has the command syntax of

    REG1 REG2
where both REG1 and REG2 are register numbers. This transfers the contents of REG1 to REG2. By default all registers contain unsigned int values. There are registers that are mapped to the sum, difference, product, etc. of registers 0 and 1; that's how you do arithmetic and comparisons. Flow control is accomplished by moving the desired instruction number to REG_PC (register 3). Input and output (numerical only) are done by moving data to and from REG_IO (register 4). To store a literal, move its value to REG_CONST (register 11); you can then move it from there to whatever register you like.

If you find yourself short on registers, increase REGS_N to, say, 256 in mov-rev.c.
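To make the idea concrete, here's a rough sketch in C of the same kind of machine. This is my own toy, not mov-rev's actual code: the register names (R_A, R_B, R_SUM, R_PC) and the run function are invented for illustration. The only instruction is a move; the "ALU" is just a register whose reads are computed on the fly.

```c
#include <assert.h>

/* A minimal transport-triggered machine in the spirit of the simulator
 * described above. Register numbers are illustrative, not mov-rev's
 * layout: reading R_SUM yields R_A + R_B, and a move into R_PC would
 * act as a jump. */
enum { R_A = 0, R_B = 1, R_SUM = 2, R_PC = 3, REGS_N = 8 };

typedef struct { unsigned src, dst; } Move;

/* Execute a program of moves; returns the final value of R_A. */
static unsigned run(const Move *prog, unsigned len, unsigned a, unsigned b) {
    unsigned reg[REGS_N] = {0};
    reg[R_A] = a;
    reg[R_B] = b;
    while (reg[R_PC] < len) {
        Move m = prog[reg[R_PC]++];
        /* The "ALU" is a memory-mapped register: reads of R_SUM are
         * computed on demand instead of being stored anywhere. */
        unsigned v = (m.src == R_SUM) ? reg[R_A] + reg[R_B] : reg[m.src];
        reg[m.dst] = v;
    }
    return reg[R_A];
}
```

With this, the one-move program "copy R_SUM to R_A" adds two numbers, and a move whose destination is R_PC would be the jump primitive.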

Because it's easy to get lost in the register numbers, the sample factorial program I've included (https://bitbucket.org/networked/oisc-sim/src/3b3903518033494...) uses the C preprocessor to assign them names. Download the simulator and type

    make fact
to try it. It'll ask you to input a single number followed by the <enter> key.

There's no documentation to speak of but you can get a dynamically generated special register reference by running

    mov-rev -h
[1] https://en.wikipedia.org/wiki/One_instruction_set_computer

[2] https://en.wikipedia.org/wiki/Transport_triggered_architectu...


From the paper:

> Finding Turing-completeness in unlikely places has long been a pastime of bored computer scientists. The number of bizarre machines that have been shown Turing-complete is far too great to describe them here, but a few resemble what this paper describes.

> ... [several examples] ...

> A single instruction machine using only MOVE was described by Douglas W. Jones [1], where Turing-completeness is gained by having memory-mapped arithmetic and logic units (so that other operations can be performed by moving data to a predefined memory location and collecting the result afterwards). Some machines based on this principle have been built, and are generally known as “move machines” or “transport-triggered architectures”.

Also:

> Removing all but the mov instruction from future iterations of the x86 architecture would have many advantages: the instruction format would be greatly simplified, the expensive decode unit would become much cheaper, and silicon currently used for complex functional units could be repurposed as even more cache. As long as someone else implements the compiler.

:)


> > Removing all but the mov instruction from future iterations of the x86 architecture would have many advantages: the instruction format would be greatly simplified, the expensive decode unit would become much cheaper, and silicon currently used for complex functional units could be repurposed as even more cache. As long as someone else implements the compiler.

I realize it's tongue in cheek, but saying the decode unit in x86 is expensive is a tired and untrue adage by today's computer architecture standards. But I guess it is fair to say that the die is stuffed with as much cache as they can fit :)


It may not be accurate to say "decode unit" anymore, but who wouldn't welcome anything that removes the large amounts of cruft from the ISA? Nobody uses most of the damn opcode space except where required for mode trickery or legacy code.


From the Discussion section:

  Removing all but the mov instruction from future iterations of
  the x86 architecture would have many advantages: the instruction
  format would be greatly simplified, the expensive decode unit
  would become much cheaper, and silicon currently used for complex
  functional units could be repurposed as even more cache. As long
  as someone else implements the compiler.
:D


I was very familiar with references #1 and #2, those are good papers. Three minor issues:

1) Both this article and reference #3 make claims about not using or needing adders to avoid self-modifying code, although both implement indexed addressing, where you take a base address and an offset and er... um... argh... ADD them together. So you're using an adder but playing word games. Only reference #2 actually "does it" without an adder. If you're doing it for "fun", then self-modifying code is plenty of "fun".

2) The claim that the compiler is going to be the icky part is not accurate. To a crude approximation, a compiler would be about as hard as an FPGA synth of a microcode decoder for a soft CPU. Not as simple as implementing an HTML BLINK tag, but it's no deal breaker. The icky part is going to be performance. Just adding a little cache isn't going to help if your code inherently stalls it, and you're going to be pretty badly memory starved. Good luck doing a fast floating-point multiply-add for a DSP with this architecture.

3) A minor issue, but it should be possible to get instantaneous power very low with this design. No one writes about this interesting issue (unless it's in reference 3, which I don't have access to). Of course, "power per FLOP" type metrics are (probably? inevitably?) going to suck, unless I'm wrong.

I've done a lot of daydreaming about actually implementing a system out of reference #1 for fun. It's about the simplest hardware "processor" you can possibly build. After all, you're going to need (memory-mapped) I/O anyway, and who cares if one of the memory-mapped I/O devices is a 32-bit-wide full adder, or whatever other CPU-like functional module you'd like to entertain.

I have two amusing daydreams. One is a 256-bit-wide move machine (why 256 bits wide? why the heck not? difficult, but barely inside the limit of buildability). The other is vaguely like an old IBM 1620, but why not try floating point instead of BCD? Yes, I'm no noob; I know all about floating-point rounding issues, and I consider them a huge part of the "fun". Also, I'm well aware a binary adder would be a wee bit quicker for the program counter than a floating-point adder... that's the whole point: I think a pure float machine would be hilarious fun. I'm not talking about merely sticking a floating-point copro on a chip, but literally using IEEE 754 on the address bus (obviously with a bit of rounding). Quad-precision floating point should be enough digits for an address bus to have a heck of a lot of fun with, with some slack for rounding issues.


> 1) Both this article and reference #3 make claims about not using or needing adders to avoid self modifying code, although both implement indexed addressing where you take a base address and a offset and er... um... argh... ADD them together. So you're using an adder but playing word games.

Your concern is addressed in this paper:

"It would appear that we are cheating slightly by allowing arithmetic in addresses. However, our "arithmetic" is of a very restricted form: the indexed instructions may only be used when R_{src} (or R_{dest} for stores) is an even-numbered memory address. Since offsets are always 0 or 1, then our "arithmetic" can be implemented as bitwise OR, or even bitstring concatenation."
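The trick is that a carry can never happen. A quick sketch in C (my own illustration, not the paper's code) makes the restricted "arithmetic" concrete:

```c
#include <assert.h>

/* Illustration: when the base is an even address and the offset is
 * 0 or 1, the offset only ever sets the low bit, which is already
 * zero. "base + offset" therefore never carries, so it can be
 * implemented as bitwise OR (equivalently, bit concatenation)
 * with no adder at all. */
static unsigned indexed_addr(unsigned even_base, unsigned offset) {
    assert(even_base % 2 == 0 && offset <= 1);
    return even_base | offset;   /* same result as even_base + offset */
}
```

So the indexed load isn't smuggling in general addition; it's gluing one bit onto an even address.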


Hmm, I still don't see it. Can bitwise OR be implemented with just MOV?


This submission is most likely a cross-post from a great community of reversing folks at http://www.reddit.com/r/ReverseEngineering/ . I encourage interested people to watch that subreddit for other papers and resources related to reverse engineering software and hardware.


> So, our Turing machine simulator consists of a sequence of `mov` instructions, followed by an unconditional branch back to the start

Newbie question: Wouldn't it be possible to just `mov` the start address of the program into the instruction pointer register at the end of the program?


The conclusion is worth reading.

"As long as someone else implements the compiler." Typical British humor.


If you want to design your own TTA processor, check out: http://tce.cs.tut.fi/




