

Registers vs. stacks for interpreter design (2003) - luu
http://www.sidhe.org/~dan/blog/archives/000189.html

======
kazinator
The instruction stream density comparison seems a little deceptive. For the
stack we have:

    
    
        push "a"
        getvar
        push "b"
        getvar
        add
        push "c"
        storevar
    

whereas for the register interpreter (I'm fixing the example a little bit to
do the store in c):

    
    
        getvar R1, "a"
        getvar R2, "b"
        add R3, R1, R2
        setvar R3, "c"
    

The thing is, if the register machine can have a single getvar instruction
which takes an immediate operand and a register, why can't the stack machine
also have a single instruction which takes an immediate operand and pushes
onto the stack:

    
    
        pushvar "a"
        pushvar "b"
        add
        popvar "c"
    

Now the instruction stream looks denser: just four instructions, and none of
them reference a register, so they can have a smaller encoding, perhaps.
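The denser stack encoding can be sketched as a tiny dispatch loop (a Python sketch; the opcode names come from the example above, everything else — the tuple encoding, the `run` function — is made up for illustration):

```python
# Minimal stack interpreter where pushvar/popvar fold the variable
# lookup into a single opcode, so `c = a + b` is four instructions.

def run(program, variables):
    stack = []
    for op, *args in program:
        if op == "pushvar":                 # push variables[name]
            stack.append(variables[args[0]])
        elif op == "popvar":                # pop into variables[name]
            variables[args[0]] = stack.pop()
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
    return variables

vars = run([("pushvar", "a"), ("pushvar", "b"), ("add",), ("popvar", "c")],
           {"a": 2, "b": 3})
print(vars["c"])  # → 5
```

Note that `add` carries no operands at all, which is where the smaller encoding would come from.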

Vice versa, the register machine's instruction stream density can be puffed up
if we don't have a getvar instruction that takes an immediate operand, but
only a register pair:

    
    
        load R1, "a"
        getvar R1, R1
    

and so on.

~~~
userbinator
It's like the classic RISC vs CISC debate. The example could turn into a
single instruction:

addvar "a", "b", "c"

I think a hybrid stack/register architecture gives the best code density -
something like the x87, where you can choose to access the registers like a
stack or by index directly. Look at some of the sub-256-byte demos for a good
example of x87 code density.
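Taken to the extreme, the fused instruction is just one more opcode in the dispatch loop (a sketch; the `addvar` name is from the comment above, the handler is invented):

```python
# One "superinstruction" that does two loads, an add, and a store,
# so `c = a + b` costs a single dispatch.

def run(program, variables):
    for op, *args in program:
        if op == "addvar":                  # addvar a, b, c  =>  c = a + b
            a, b, c = args
            variables[c] = variables[a] + variables[b]
    return variables

vars = run([("addvar", "a", "b", "c")], {"a": 2, "b": 3})
print(vars["c"])  # → 5
```

The CISC-style trade-off is the usual one: fewer dispatches per expression, but a combinatorial explosion of opcodes if you try to fuse everything.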

------
kazinator
I think where you win with the register machine is in compiling expressions
where you have lots of common subexpressions.

Once you have the value of a repeated subexpression in a register, you can
just refer to that register.

On a stack machine, your "register" is the top of the stack, and each time you
refer to it with a pop, you lose it.

    
    
        a = expr1 + expr2;
        b = expr1 - expr2;
        c = f(expr1, expr2);
    

Register code:

    
    
        # code to evaluate expr1 to R1
        # code to evaluate expr2 to R2
        add R5, R1, R2
        sub R6, R1, R2
        call f  # R1-R4 are also used for parameter passing
        store R5, a
        store R6, b
        store R1, c
    

Stack:

    
    
        # code to evaluate expr1 to top of stack
        # code to evaluate expr2 to top of stack
        # now what? we want to keep expr1 and expr2 in the
        # stack and use these values, too, while
        # also leaving intermediate
        # results in the stack. "dup, rot, swap hell".
    

The obvious thing is to create compiler-generated temporary local variables,
and shunt out of the stack into these. But that's just registers in disguise!

Basically, register files have the undeniable advantage of random access,
which works favorably when expressions are DAGs rather than trees.
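The register version can be sketched as a small interpreter (Python; opcode names follow the example above, with the `call f` step omitted to keep it self-contained — the `load` values standing in for the expr1/expr2 evaluations are arbitrary):

```python
# Register interpreter: expr1 and expr2 land in R1/R2 once and are
# then referenced by both the add and the sub, with no re-evaluation
# and no dup/rot/swap stack shuffling.

def run(program, memory):
    regs = [0] * 8
    for op, *args in program:
        if op == "load":                    # load Rd, value
            regs[args[0]] = args[1]
        elif op == "add":                   # add Rd, Rs1, Rs2
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        elif op == "sub":                   # sub Rd, Rs1, Rs2
            regs[args[0]] = regs[args[1]] - regs[args[2]]
        elif op == "store":                 # store Rs, name
            memory[args[1]] = regs[args[0]]
    return memory

mem = run([("load", 1, 10),                 # expr1 -> R1
           ("load", 2, 4),                  # expr2 -> R2
           ("add", 5, 1, 2),                # R5 = R1 + R2
           ("sub", 6, 1, 2),                # R6 = R1 - R2, reusing R1/R2
           ("store", 5, "a"),
           ("store", 6, "b")], {})
print(mem)  # → {'a': 14, 'b': 6}
```

The point is in the `sub` line: R1 and R2 are still there to be read a second time, which is exactly what the stack machine loses when a pop consumes the value.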

~~~
Dewie
While we're at it, maybe we might relate this to consuming values by using
them, and creating explicit copies when we need to use them more than once.
This avoids having to deal with any kind of garbage collection or reference
counting, but you have to copy the whole data structure instead of copying
the (sometimes/often) smaller _pointers_ to the data itself (random access).

There was some work on this by someone, called something like Linear Lisp. It
looked like Forth with parenthesis.

~~~
DougMerritt
The illustrious Henry Baker, based on the slightly earlier (and extremely
influential) Linear Logic.

[http://www.pipeline.com/~hbaker1/LinearLisp.html](http://www.pipeline.com/~hbaker1/LinearLisp.html)

There's a conceptual relationship to SSA (static single assignment)

[http://en.wikipedia.org/wiki/Static_single_assignment_form](http://en.wikipedia.org/wiki/Static_single_assignment_form)

~~~
agumonkey
This is a wonderful read. Opens your eyes to how differently things can be
done under the hood.

------
overgard
I'm not an expert on interpreters, but from the description it sounds like
register machines are very CPU cache unfriendly, whereas stacks are by their
nature very cache friendly. (It's a complicated subject,
but the basic idea is: contiguous memory is cache friendly, random access is
not). Given how catastrophic cache misses can be for performance (LOTS of CPU
stalling), I imagine the instruction density really doesn't mean much compared
to keeping the CPU fed with instructions.

The part of all this that is hard is that, when you're working at this
abstraction level, you're not writing for an abstract machine, you're writing
for real computers that are highly pipelined and heavily dependent on
relatively small caches.

~~~
gsg
Register machines have no particular problems with cache due to random access.
The typical arrangement is that there are instructions that allocate and
deallocate a block of N contiguous slots from a stack, and "registers" refer
to storage within the currently allocated block. This isn't any worse than a
stack machine.
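That arrangement can be sketched in a few lines (a Python sketch of the idea gsg describes; the `alloc`/`free` names and base-pointer discipline are assumptions, not any real VM's API):

```python
# "Registers" as slots in a frame carved out of one contiguous stack:
# each call allocates a block of N slots, and register r means
# slots[base + r]. Storage stays contiguous, so locality is no worse
# than an operand stack's.

class RegFile:
    def __init__(self):
        self.slots = []     # single contiguous backing store
        self.base = 0       # base of the current frame

    def alloc(self, n):     # enter a call: reserve n register slots
        saved = self.base
        self.base = len(self.slots)
        self.slots.extend([0] * n)
        return saved        # caller's base, to restore on return

    def free(self, saved):  # leave a call: release the frame
        del self.slots[self.base:]
        self.base = saved

    def get(self, r):
        return self.slots[self.base + r]

    def set(self, r, v):
        self.slots[self.base + r] = v

f = RegFile()
outer = f.alloc(3)
f.set(0, 42)
inner = f.alloc(2)          # callee gets its own contiguous block
f.set(0, 7)                 # callee's R0 doesn't clobber caller's R0
f.free(inner)
print(f.get(0))  # → 42
```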

There is a real issue with instruction density, however, in that the
interpreter will touch many cache lines containing instructions in an
unpredictable order. An inefficient encoding will result in overhead due to
cache lines being taken away from program data to serve the interpreter.

------
ufo
Lua also moved to a register-based design around the same time.

The developers talk about that a bit in section 7 of this paper:
[http://www.tecgraf.puc-rio.br/~lhf/ftp/doc/jucs05.pdf](http://www.tecgraf.puc-rio.br/~lhf/ftp/doc/jucs05.pdf)

------
Someone
[https://www.usenix.org/legacy/events/vee05/full_papers/p153-yunhe.pdf](https://www.usenix.org/legacy/events/vee05/full_papers/p153-yunhe.pdf)
(2005) is the best stack vs registers comparison I know of (but I am, at best,
a hobbyist in this field, so corrections are welcome)

I think there are three options for fast VMs:

\- a stack machine that JITs to registers (JVM, C#)

\- a register machine whose registers match that of the target architecture (I
think Lua is or was an example here)

\- a register machine with infinitely many registers that JITs to one whose
registers match those of the target architecture (LLVM). I think the jury is
still out on whether it is possible to make a truly fast JITter here, but I
expect that is just a matter of lack of effort.

Problem with the middle approach is that a portable JITter that wants to use
the full power of a CPU must handle both the case where the hardware has fewer
registers than the VM and the case where it has more. The other two only have
to handle one of these cases. That's why they are more popular for multi-
platform solutions.

And of course, there is the case of JITting to stack CPUs. For practical
purposes, one can mostly ignore that case nowadays.

~~~
chrisseaton
I'm not sure it makes much difference for JIT, as by the time it's gone into
the JIT's IR and had multiple optimisation passes applied to it I would
imagine it'll be in either SSA or PDG form, and what format the byte codes
came in will be long forgotten.

------
marcosdumay
In a VM, be careful about how you define "stack" and "register", because it's
really easy to get definitions where most machines are both. Or, in other
words, that's one area where the details are much more important than the
overall picture.

Anyway, the entire problem comes from the fact that we are currently JITing
the same code again and again. That's the best issue to attack. Save your
JITed code for improving it later, and the entire discussion goes away. (And
yes, that'll probably require some help from the OS)

------
nivertech
Some discussion about Stack vs Register-based VM in the context of Bitcoin
script / Ethereum VM:

[http://www.slideshare.net/nivertech/ethereum-vm-and-dsls-for-smart-contracts](http://www.slideshare.net/nivertech/ethereum-vm-and-dsls-for-smart-contracts)

Another relevant blog post:

My history with Forth and stack machines

[http://yosefk.com/blog/my-history-with-forth-stack-machines.html](http://yosefk.com/blog/my-history-with-forth-stack-machines.html)

