x86 is Turing-complete with no registers (2014) (mainisusuallyafunction.blogspot.com)
99 points by todsacerdoti on Jan 3, 2023 | 49 comments



A similarly funny x86 project compiles programs only to "MOV" instructions. It's very much Turing complete.

https://github.com/xoreaxeaxeax/movfuscator


A few years ago a YouTube video about the movfuscator set me wondering just how minimalist an instruction set could be without hurting code density (unsurprisingly, programming with just MOVs yields spectacularly bad code density!), and it ultimately led to me embarking on my own toy CPU project!


Some one-instruction machines are surprisingly code-efficient.

Consider an instruction with four fields: A B C D. Words and addresses are normal two's-complement integers. Every cycle, the machine takes the value at address A, subtracts the value at address B, and stores the result at address C. If the result, as a signed integer, is 0 or negative, it jumps to address D; otherwise it takes the next instruction. Using temporary values in RAM, self-modifying code, and some constants, everything can be implemented with this single instruction - and a lot of it surprisingly efficiently. Some assembly-style syntax for such a machine:

    A  B  C   D
    #0 #0 tmp destination
Jump to destination: 0 - 0 = 0, so the machine jumps to D. The # is syntactic sugar for an immediate literal; the value is placed at an address and that address is used. tmp is a memory address holding a temporary scratch value. In my made-up syntax, if no 4th field is given, the jump target is implicitly the next instruction:

    // move is simple
    src #0 dst 

    // subtract
    srca srcb dst

    // negate
    #0 src dst 
Addition is an exercise left for the reader :) Let's use * as the current address in the program, known at compile/assembly time. We can do a cheap subroutine call:

    #0 (*+3) retaddress subroutine
The negative of the return address (the address of the next instruction) is now in retaddress. The result will always be negative for sensible addresses, so we jump to subroutine. To return, just write the negated address back into the D field of an instruction:

    #0 retaddress (*+5) 
    #0 #0 tmp tmp  // last field will now have return address, jump
Of course, writing self-modifying code without testing it is guaranteed not to work - I probably got the offsets and such wrong - but hopefully it gives the idea. Doing the same with the address fields lets you do arbitrary indirection, indirect calls, and a stack, often in 2 - 5 instructions. Everything a typical RISC machine can do can be done in 1 - 30 instructions, except multiply, divide, and bitwise boolean operations on a whole word, because you have to test and branch on each bit in an inconvenient way. I'm sure something more efficient is possible, but it's quite an improvement on the classic subleq code density.
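
To make the semantics concrete, here's a minimal interpreter sketch in C for the machine described above (my own pseudocode: word-addressed memory, four-word instructions A B C D, and an assumed halt convention of a negative D, which is not part of the description above):

    #include <stdint.h>
    #include <stdio.h>

    int64_t mem[4096];

    /* Each instruction is four consecutive words: A B C D.
       Every cycle: mem[C] = mem[A] - mem[B]; if the result is <= 0,
       jump to D, otherwise fall through to the next instruction.
       A negative D halts (my assumption, for the demo's sake). */
    void run(int64_t pc) {
        for (;;) {
            int64_t a = mem[pc], b = mem[pc + 1];
            int64_t c = mem[pc + 2], d = mem[pc + 3];
            if (d < 0) return;
            int64_t r = mem[a] - mem[b];
            mem[c] = r;
            pc = (r <= 0) ? d : pc + 4;
        }
    }

    int main(void) {
        /* the "src #0 dst" move from above: copy mem[100] into mem[102];
           D points at the next instruction, so it falls through either way */
        int64_t prog[] = { 100, 101, 102, 4,   /* mem[102] = mem[100] - mem[101] */
                           0,   0,   0,  -1 }; /* D < 0: halt */
        for (int i = 0; i < 8; i++) mem[i] = prog[i];
        mem[100] = 42;  /* src */
        mem[101] = 0;   /* the #0 constant, placed at an address */
        run(0);
        printf("dst = %lld\n", (long long)mem[102]);  /* prints dst = 42 */
        return 0;
    }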

One-instruction set computer: https://esolangs.org/wiki/OISC


That’s a really interesting question (on code density) - can you share your conclusions or even your project?


The project was this: http://retroramblings.net/?page_id=1339

I ended up using a fixed instruction width of 8 bits, with eight classical registers (one of which is the program counter), and also a "temp" register which is used implicitly for opcodes which need a second operand. The code density is comparable to m68k and x86, better than 32-bit RISC ISAs, but not as good as the 16-bit compressed representations of 32-bit RISC ISAs.


Thanks so much! Looks like a really great project.

I'll have a full look through the blog later but just the summary of the code density results is very interesting (at http://retroramblings.net/?p=1414).

Do you have a gut feel for what is happening here and why the compressed 32-bit RISC ISAs do so well compared to yours?


I think there are three reasons. One is that the code my compiler backend produces is "ok" but not fantastic (though it's somewhat better now than it was when that article was written). Another is that I have only 8 registers, and only five are available for the code generator to play with, so I have to spill registers onto the stack sooner than more register-rich ISAs. And finally, my simplistic 8-bit instructions don't do as much as a RISC ISA's 16-bit instructions, so some of the advantage of their being only a single byte is lost.


That makes a lot of sense. Thanks again for sharing such an interesting project.


The coolest proof of it being Turing complete is that there's a port of Doom to it, too.


I think we're going to need a register-free Doom port too.


x86 is Turing-complete with no instructions, too: https://github.com/jbangert/trapcc


fwiw, linked to from the article:

> As others have shown, we can compute using alphanumeric machine code[1] or English sentences[2], using only the mov instruction[3], or using the MMU[4] as it handles a never-ending double-fault. Here is my contribution to this genre of Turing tarpit: x86 is Turing-complete with no registers.

[1] http://www.phrack.org/issues.html?issue=57&id=15#article

[2] http://www.cs.jhu.edu/~sam/ccs243-mason.pdf

[3] http://www.cl.cam.ac.uk/~sd601/papers/mov.pdf

[4] https://github.com/jbangert/trapcc


Ok this is ridiculously entertaining. Thanks!


This is a funny result in the RISC and ARM world, but it makes perfect sense in x86: x86 is a register-memory architecture instead of a load-store architecture, so memory is (effectively) an extremely large register bank.
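
For instance (an illustrative example of mine, not from the article), a read-modify-write of a global needs no architectural data register at all on x86:

    /* On x86 this can compile to a single instruction that reads,
       adds, and writes memory directly - roughly:
           add dword ptr [rip + counter], 1
       (illustrative mnemonics, not guaranteed compiler output).
       A load-store RISC must instead load into a register, add,
       and store back. */
    int counter;
    void bump(void) { counter += 1; }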


Idk ... this is like saying your local grocery store is effectively your extremely large kitchen; it ignores the logistics and cost of access that create planning or out-of-order requirements for best performance.

> so memory is (effectively) an extremely large register bank.

This was true-ish in the late 70s/early 80s, 1 MHz CPU days, however - but even then registers were always better, if only slightly. The 6502, for example, could load the X register with an immediate value in 2 cycles, from zero page (the first 256 bytes of RAM) in 3 cycles, or from an arbitrary 16-bit address in 4 cycles. The few register-to-register operations all work in 2 cycles. (Then you have the TMS9900, which actually did use RAM as registers, having only 3 on-chip: the program counter, the status register, and a "workspace pointer" that told the CPU where the fake 'registers' lived in RAM.)

Of course, x86 has elaborate caching mechanisms to help. Your freezer is still in your kitchen (cache), but you still have limited space you actually use to do work (countertop).

RAM is basically a superfast I/O device (which is why memory-mapped I/O is a thing). It's funny that the IBM mainframe term for RAM - "storage" - made more and more sense as RAM speed diverged from CPU speed.


I believe the parent was talking about the abstraction provided by the ISA, not whether it was a good way to program. None of this experiment cares about execution speed at all; it merely concerns itself with the technical feasibility of the technique in the ISA. This would not be possible at all in a load-store ISA, because you can't ever do an ALU instruction without first loading into a register.


I meant in the computation-theoretic sense: x86 doesn’t distinguish between types of operands in the way that load-store architectures do, which means that this result (TC without register operands) is not a surprising one.

You obviously shouldn’t use RAM as a naive replacement for GPRs in normal code (unless the goal is extremely fast JIT compilation where minimizing spills is not important).


I can't find the exact quote or attribute it, but a good take was "the x86 instruction set isn't particularly complex - it just doesn't make a lot of sense".


This might be a bit of a tangent, but what are the advantages and disadvantages of register-memory vs. load-store? Thanks!


RM machines usually have more addressing modes with which to access memory: you can do a math op while indirectly accessing a memory address computed from a base plus a scaled index and an offset. That makes programming many things easier; it's a little closer to how humans learn and think about math. Think arrays, lists, vectors, hashes, trees. However, it often generates data dependencies, which makes pipelining and ILP require more work to achieve efficiently. It also often makes the code denser.

https://en.m.wikipedia.org/wiki/Register%E2%80%93memory_arch...

LS is generally found in RISC machines and is generally considered far easier to pipeline and extract ILP from: fewer and often simpler addressing modes, and data dependencies that are more obvious or simpler to deduce. But it often takes a few more instructions to complete an algorithm or even a simple expression. Lower code density. See the sketch after the link for a concrete contrast.

https://en.m.wikipedia.org/wiki/Load%E2%80%93store_architect...
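
A rough sketch of my own of that contrast (approximate mnemonics, not exact compiler output): the same one-line array update is a single RM instruction, with the address computed in the addressing mode itself, but a load/add/store sequence on an LS machine.

    void add_at(int *a, long i, int x) {
        a[i] += x;
        /* register-memory (x86-ish), one instruction:
               add [rdi + rsi*4], edx
           load-store (RISC-ish), three instructions through a register:
               lw  t0, (address of a[i])
               add t0, t0, x
               sw  t0, (address of a[i])   */
    }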

There are also stack machines, which map onto math well, especially if you like RPN/postfix notation. They just tend to make logical constructs like loops look unfamiliar compared to math or typical/popular programming languages. But they make programming easy and compiling easy, since it's relatively natural to analyze & compile expressions. Also, instructions can be extremely compact, such as single bytes.

https://en.m.wikipedia.org/wiki/Stack_machine
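
To illustrate the compactness, here's a toy byte-coded stack machine of my own (a sketch, not a real stack ISA): "(2 + 3) * 4" in postfix becomes push 2, push 3, add, push 4, mul, and each operation fits in a single byte.

    #include <stdio.h>

    enum { PUSH, ADD, MUL, HALT };

    int main(void) {
        /* postfix byte stream for (2 + 3) * 4 */
        unsigned char code[] = { PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT };
        int stack[16], sp = 0;
        for (int pc = 0; ; ) {
            switch (code[pc++]) {
            case PUSH: stack[sp++] = code[pc++]; break;
            case ADD:  sp--; stack[sp - 1] += stack[sp]; break;
            case MUL:  sp--; stack[sp - 1] *= stack[sp]; break;
            case HALT: printf("%d\n", stack[sp - 1]); return 0; /* prints 20 */
            }
        }
    }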

Another slightly related take was VLIW. You take snippets of instructions (basic blocks, in PL terms) that are known to fit certain pipelining or data-dependency rules. These blocks of code can then be executed concurrently, since you have well-established criteria. However, making compilers able to predict when blocks should execute, in the presence of a complex single processor, turned out to make the compilers very complex to write.

https://en.m.wikipedia.org/wiki/Very_long_instruction_word


Aside: if you've had an interest in the Itanium processor and "very long instruction word" sounds familiar, that's because Itanium is one of the most famous, widely used examples.

If memory serves, the compilers never really delivered. The assumption was that they would eventually figure out how to leverage the explicit parallelism efficiently, but it never materialized as a performant solution for most problems.


In principle, load-store architectures are simpler to implement, simpler to write compilers for, simpler to write speculative optimizations for, and have denser/simpler instruction encodings. Similarly in principle, register-memory architectures are simpler to write assembly for as a human, and simpler to write different speculative optimizations for.


Register-memory will involve a more complicated encoding (to designate addressing modes; arguably register-vs-memory is only 1 bit, but that halves your register count for the same size encoding, and it gets messy because the addressing encoding itself will want to reference registers, and then you add in offsets/scaling for r1+r2*4). It also screws up cycle-accurate instruction timing, because memory latency is cache-dependent.


I would imagine load-store to be simpler to implement in hardware.


Really cool article, but it's sad to read through all of the spam in the comments section. There appear to be 2 legitimate comments out of a few hundred


Discussed at the time:

x86 is Turing-complete with no registers - https://news.ycombinator.com/item?id=7224061 - Feb 2014 (23 comments)


> Conditional control flow is possible thanks to this gem of an instruction:

    ff242500004000            jmp qword [0x400000]
How is this "conditional"? The article explains that it simply "jumps to whatever address is stored as a 64-bit quantity at address 0x400000", but where's the condition here?


Because by being indirect, you can change the jump target stored at 0x400000 via the results of ALU ops.


I'm guessing:

Create a 2-element array of branch targets. Compute a boolean condition (0 or 1). Index into the array with that 0-or-1 value, write the selected target into 0x400000, then jmp qword [0x400000].
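
Roughly that idea in GNU C, using the labels-as-values extension as a stand-in for the article's jmp qword [0x400000] (a sketch of mine, not the article's code):

    #include <stdio.h>

    int main(void) {
        int x = 7;
        /* 2-element table of branch targets, like a pair of code addresses */
        void *table[2] = { &&is_small, &&is_big };
        /* the ALU produces a 0-or-1 condition that selects the target;
           the jump itself is unconditional and indirect */
        void *target = table[x > 5];
        goto *target;
    is_big:
        puts("big");   /* taken, since 7 > 5 */
        return 0;
    is_small:
        puts("small");
        return 0;
    }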


Or know the difference between the addresses of the true pathway and the false pathway, multiply your 0 or 1 result by that difference and add that to whichever pathway's base is lower. No lookup table needed.
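
The arithmetic variant, sketched the same way (again GNU C labels-as-values; round-tripping label addresses through intptr_t is my assumption and works on typical platforms):

    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        int x = 2;
        intptr_t base  = (intptr_t)&&path_false;
        intptr_t delta = (intptr_t)&&path_true - base;
        /* base + cond * delta: no lookup table, just arithmetic */
        goto *(void *)(base + (x > 5) * delta);
    path_false:
        puts("false path");  /* taken, since 2 <= 5 */
        return 0;
    path_true:
        puts("true path");
        return 0;
    }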


Everyone is finding a roundabout way to say “self modifying code”


No self modifying code needed, simply embracing pointer arithmetic more than is typically allowed without your sanity being brought into question. This technique works just fine with no changes to page permissions and W^X.


Changing and then calling a function pointer stored at some memory address isn’t really insane.


Since it's a jmp rather than a call it's not really a function pointer, but instead a basic block pointer that you're doing arithmetic on.


>"In a RISC architecture, every memory access is a register load or store, and our task would be completely impossible. But x86 does not have this property."

In any architecture, including x86, every memory access is an ALU load or store - by that standard, the task would be "completely impossible" everywhere.

>"But x86 does not have this property."

The x86 does have this property(!) - but it is hidden/abstracted away from the programmer by the microcode that sits between x86 instructions and the ALU...

But an interesting and educational article nonetheless...

All x86 registers are basically just abstracted views of the ALU and/or memory (both possibilities merged into one "view", depending on which CPU operation happened last!), which may now be cached by the CPU (as opposed to being a direct view of the ALU) at a given CPU/ALU state/time...(!)

https://www.joelonsoftware.com/2002/11/11/the-law-of-leaky-a...


So, how much work would it be to tweak LLVM to output code for this hypothetical register-less x86? Is its architecture even flexible enough to support such a CPU?


Add [2014] to title.


Done.


Ask nicely.


Nice is overrated. The GP clearly identifies a problem and suggests a solution.


Nice is never overrated.


The original comment is as nice as it needs to be.


Being nice isn't a need, it's a courtesy. We extend courtesy to others because it is the right thing to do.


This is a misinterpretation of my comment. If you care about being nice, then start by trying to understand what people mean when they make a comment, rather than choosing an interpretation of the comment and responding to your own interpretation.

I see this a lot on HN. Any comment has different ways to be interpreted. It’s very frustrating when someone picks, out of the many different possible interpretations, an interpretation that allows them to argue against it.


It might be helpful for you to expand on what you actually meant and why you felt I misunderstood your position, because I'm not quite sure what I missed. I'd be happy to do the same for my comment: my point was that being nice is not something that really has a floor. It's what you do to seem agreeable and pleasant. It's also something that varies from person to person. I'm generally pretty accepting of not having enough niceness, but I will still note that on the scale of how to phrase a request it's on the ruder end, just a step up from actively demanding things from people. It doesn't hurt to be more polite than this.


> It’s what you do to seem agreeable and pleasant.

It’s also highly cultural and differs depending on context. Something which is nice in one culture or context could be outright offensive in a different culture or context. Even if you are not traveling to another country, you can be shocked by the different ideas about how people should behave if you travel.

For example, someone from California could come to New York. Both are coastal, wealthy states in the US that vote Democratic. The Californian will think, “Geez, New Yorkers are so rude! They don’t say ‘hi‘ or ‘please excuse me’ or ‘thank you’.” Meanwhile, someone from New York goes to California and thinks, “Geez, Californians are so rude! They are passive aggressive, get in your way, and waste your time!”

It’s really just a difference of ideas about how people should behave. New Yorkers aren’t rude, despite the reputation. Californians aren’t rude either. They just have different ideas about what “rude” means. Californians will say all sorts of nice things to you to try and make you feel comfortable. New Yorkers will help you carry your luggage up the subway stairs without saying a word. Different worlds, different ways of behaving.

Any effort to moderate a forum like HN is going to run face-first into the problem of enforcing one set of standards consistently to people who are from not only both New York and California, but every other part of the world. It would be unreasonable to try and take one culture (like SF Bay area culture, since YC is based there) and enforce it on the forum, since you can’t reasonably expect everyone to be familiar with that culture’s norms.

The way you solve it is by using a looser set of standards that a broader set of people can agree on, and letting comments slide even if they don’t match what is seen as polite in your own culture. Postel’s law and all that.

If somebody says something like “Add [2014] to title.” you can only interpret it as “not nice” if you judge it against some specific cultural norm, like a California cultural norm. That’s not a good way to judge comments on HN, because HN gets comments from all over the world—and if you, for example, judge it from a New York cultural norm, you’d say “This person is being polite by getting to the point and not wasting my time.”


I think you're arguing something we already agree on. I personally found the comment to be curt, though I'm not sure I would call it rude. Still, I was more than happy to fix up the title, since I try to accept things that are below the standard of politeness I try to show others.

The part I was objecting to is actually the flip side of Postel's Law, for what you should do. The comment I replied to claimed "nice is overrated". It's not, because people actually like it when you are nice to them. And because everyone has a differing scale of what they personally consider to be polite, it's good to overcompensate a bit to try to appeal to a wider audience. You can't please everyone, but you can definitely make very small actions without descending into "toxic niceness" or whatever and avoid most problems.


It is when there's a demand for toxic niceness. Not every situation requires fluffy words.


When making a request it is traditional to recognize that you are asking a favor, however small, of someone else, and using a "please" can help signal that you are aware of this. Or, if you have an aversion to the p-word, just phrasing it as an appeal, e.g. "Can someone add [2014] to the title?", can help distinguish your ask from an order. I fail to see any toxicity in this phrasing.

Of course, I changed the title regardless, so it's not like it "really mattered"; I wouldn't have replied to this conversation if someone didn't start this thread. I certainly don't intend to demand anything from anyone. But kindness is generally free or very cheap, so I find it to be smarter to err on the side of being nice. And that's on top of whatever moral framework or whatever you may have that guides you to be considerate, hold doors open for other people, whatever, that you do to make the world a better place.



