Hacker News new | past | comments | ask | show | jobs | submit login
Stack Based Virtual Machines (2015) (andreabergia.com)
58 points by signa11 on Dec 29, 2020 | hide | past | favorite | 18 comments



Something overlooked in this (and many other articles about stack based VMs): if the stack based VM requires a well balanced stack (as is the case for JVM), the bytecode can be trivially rewritten to a register based VM: number each stack location and trace all basic-blocks to see which stackslot they hit. Then just replace all the push 10, pop with "move 10 to slot/register-1", "load slot/register-1" etc. The converse is much harder, so I really prefer to work with stack based VMs as they are more condense and if speed is needed a linear (in function size) scan can rewrite it to make the mapping to cpu registers much easier.


> a well balanced stack

what is a well-balanced stack?


They mean the stack depth is guaranteed to be statically apparent at each point in the control flow. In the JVM the class loader checks this with an analysis phase during loading.

(The JVM design makes this harder than it really needs to be, since it doesn't insist on reducible control flow, iirc. You could say OTOH that irreducible control flow is more valuable than trivial analyzability, which is a reasonable take, but conflicts with the above comment's design point of a stack machine being both very simple for an interpreter and not an obstacle to quickly compiling to decent register-machine code. Needing a fancier stack-depth analysis would be that sort of obstacle.)


The stack is balanced if everything pushed at a given level of scope is popped as that scope closes.

This precludes variable-stack-effect sequences as one might have in a Forth definition like:

    : ?drop  dup if drop then ;


> Closely related to stack based VM are register based virtual machines. They are also interpreters of bytecode, but their design is quite different, since they don’t use the stack for the operands but rather a set of registers.

This is a common source of confusion but register based VMs also use a stack.

In a stack based VM, the arguments and results to all operations are implicitly pushed and popped from the top of the stack. For example, the instructions to add 10 plus 20 would look similar to this:

    PUSH 10
    PUSH 20
    ADD  
In a register based VM, the arguments and results are stored in virtual registers. For example, to do the same as before the instructions might look more like the following:

    STORE R1 10
    STORE R2 20
    ADD R3 R1 R2
However, those registers are also part of the stack! The way it works is that the interpreter maintains a stack of values, just as in a stack based interpreter. When a function starts running, it grows the stack by a fixed amount, allocating an "activation record". When it executes, the registers in the instructions refer to slots in this activation record, which is part of the stack. It's not like in physical computer, where the registers and the stack are stored in separate places.

The main interpreter loop of a register based VM is actually quite similar to the main loop of a stack based VM. The part that is more different is the bytecode generation.


> However, those registers are also part of the stack!

This is a confusion of two kinds of stacks. There is the call stack, and the operand stack. The call stack stores frames for executing procedures (functions) and the operand stack (if any) is used for local data flow. The operand stack, again, if any, may be nested inside of the storage of the call stack (eliminating one of two stack pointers), but need not necessarily be.


Yes, and Forth had these dual stacks, operand stack and return stack, back in 1970. The abstract SECD machine had dual stacks in 1964.


Follow-up question: Is the difference between a stack + locals and registers just the instruction set? I imagine locals are allocated to and stored on the stack, exactly like you discuss for the registers. But in a stack VM, you can't work on them directly. Instead you have to load them into this special other place (the stack) and work on them there. Instruction set is simpler but at the cost of requiring additional operations.


Yes, the main difference is the instruction set.

One common way to design a stack-based instruction set is to refer to the local variables using stack offsets, just as in a register-based instruction set. The operation for reading a local variable makes a copy of the stack slot at the given offset and pushes it into the top of the stack, where it may be used as the input for subsequent operations. For example, to implement "X = X + Y" it might use the following code:

   LOAD 0 (pushes the contents of X to the stack)
   LOAD 1 (pushes the contents of Y to the stack)
   ADD    (pops two values and pushes the result)
   STORE 0 (pops one value, and stores it into X)
Meanwhile, in a register-based VM the ADD operation manipulates the relevant stack slots directly:

  ADD R0 R0 R1 (compute R0 + R1 and store in R0)
> Instruction set is simpler but at the cost of requiring additional operations.

Well put. This is one of the key differences between a stack VM and a register VM. In register VM the instructions are larger, because they need to specify all their arguments. However, a stack VM may need to generate more instructions to fiddle with the top of the stack.

If you want to read more, one good example of a stack-based VM is the one in the Crafting Interpreters book. For a good example of a register based VM my suggestion would be the paper about the implementation of Lua (section 7 in particular).

https://craftinginterpreters.com/local-variables.html

http://www.jucs.org/jucs_11_7/the_implementation_of_lua


> I imagine locals are allocated to and stored on the stack, exactly like you discuss for the registers. But in a stack VM, you can't work on them directly. Instead you have to load them into this special other place

Another thing you have to take into consideration is how many registers you can index. This is relevant also when you're not constrained by hardware registers when you define a VM, because register references take up space in the instruction encoding, which have consequences in the instruction decoding performance and analysis (e.g. if all your instructions have the same length you are sure you cannot have jumps in the middle of an instruction and thus start executing a corrupt "unaligned" instruction stream)

Once you have more locals than available registers, you have to start emitting instructions to spill and load locals from and into registers which play a bit the role of the dance you have to do in stack machines to access any local.

In a way you can see a stack machine as a degenerate case of a register machine: a 2 register machine with some implicit operations on them every time an instruction is executed.


This reminds me of my teenage years in the 00s. I was a member of this forum called pagemac, and what drew all of us kids together was the love of building small esoteric languages. A lot of brainfuck derivatives (one called stackfuck) and assembly like languages, and bytecodes/VMs.


Beam is a register based VM that's probably used more than LUA.


SQLite implements a register based VM too.


I guess the most popular one is either the JVM or the .Net CLR ?


Those are both stack-based.


right. I was misreading this. So for a register based VM, I would have to refer to the Parrot virtual machine then.



The JVM used an operand stack for local data flow, while the CLR uses registers.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: