
Implementing a Virtual Machine in C - freefouran
http://www.blog.felixangell.com/virtual-machine-in-c/
======
neverartful
Contrary to the naysayers, I like seeing stuff like this. Why? Because it's a
simple, gentle introduction. It's easily digestible for the newcomer. And it
might be easy enough to encourage a newcomer to start building their own VM
that goes on to be something real.

For those that criticize it and find faults with it -- I'm sure the author
would consider pull requests. Or you could provide your own fork with all the
improvements that you believe are necessary.

~~~
aikah
Calling people who criticize that blog post "naysayers" is childish, at best.
Once something is out there on the web it's going to get some criticism,
that's normal. There is nothing wrong with that.

~~~
sdoering
The question (and OP made that clear) is how you criticize. Trolling is not
the right way, I believe.

Neither is cosplaying Captain Obvious.

------
jcoffland
I'd like to see such an article on a register based VM. Pawn and Lua are nice
examples. Most VMs are stack based but this is mainly because they are
conceptually easier to understand. Register based machines have some real
advantages, like requiring far fewer instructions inside tight loops.

~~~
tptacek
Stack VMs aren't used just because they're easier to understand:

* In interpreted environments, registers are stored in memory anyways, so the advantage of simulating them isn't as great

* It is easier to generate code for stack machines, because you don't need to run register allocation

* There's a tradeoff in instruction complexity versus number of instructions between stack and register machines
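
A rough sketch of that tradeoff, assuming two hypothetical toy instruction sets (none of these opcodes come from the article): the same `a = b + c` takes five short dispatches on the stack machine and two wider ones on the register machine. Note that the "registers" are just an array in memory, which is the first point above.

```c
#include <assert.h>

/* Hypothetical opcodes for two toy machines; neither set is from the article. */
enum { S_PUSH, S_ADD, S_STORE, S_HALT };  /* stack machine */
enum { R_ADD, R_HALT };                   /* register machine */

/* a = b + c on each machine; variables/registers are indexed a=0, b=1, c=2. */
static const int stack_prog[] = { S_PUSH, 1, S_PUSH, 2, S_ADD, S_STORE, 0, S_HALT };
static const int reg_prog[]   = { R_ADD, 0, 1, 2, R_HALT };

/* Runs a stack program and returns how many instructions were dispatched. */
static int run_stack(const int *code, int *vars) {
    int stack[16], sp = 0, ip = 0, dispatched = 0;
    for (;;) {
        dispatched++;
        switch (code[ip++]) {
        case S_PUSH:  assert(sp < 16); stack[sp++] = vars[code[ip++]]; break;
        case S_ADD:   assert(sp >= 2); sp--; stack[sp - 1] += stack[sp]; break;
        case S_STORE: assert(sp >= 1); vars[code[ip++]] = stack[--sp]; break;
        case S_HALT:  return dispatched;
        default:      return -1;  /* unknown opcode */
        }
    }
}

/* Runs a register program and returns how many instructions were dispatched. */
static int run_reg(const int *code, int *regs) {
    int ip = 0, dispatched = 0;
    for (;;) {
        dispatched++;
        switch (code[ip]) {
        case R_ADD:  /* three-address form: dst, src1, src2 */
            regs[code[ip + 1]] = regs[code[ip + 2]] + regs[code[ip + 3]];
            ip += 4;
            break;
        case R_HALT: return dispatched;
        default:     return -1;  /* unknown opcode */
        }
    }
}
```

Calling `run_stack(stack_prog, vars)` and `run_reg(reg_prog, regs)` with `{0, 2, 3}` as the initial contents leaves the sum in slot 0 either way; only the dispatch counts differ.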

~~~
rdc12
This paper [1] (though it is a tad old now) shows that the register machine
approach does still outperform the stack machine by a fair margin, largely
because it needs fewer instructions.

I wonder which is a better source representation for a JIT, though.

[1]
[https://www.usenix.org/legacy/events/vee05/full_papers/p153-...](https://www.usenix.org/legacy/events/vee05/full_papers/p153-yunhe.pdf)

~~~
gsg
JIT compilers aren't really an argument either way, as you can do tracing or
transform to an SSA graph from either representation quite easily. All the
real complexity is further down the compilation pipeline.

I (half) wrote a toy trace compiler based on a register bytecode a while back.
Writing the front end was quite pleasant - the difficult parts were back end
stuff like figuring out how to support trace exit information. (The difficulty
there is that you might want to perform stores, materialize objects etc as you
go off trace as part of store and allocation sinking.)

------
aftbit
I'm a bit disappointed - this VM doesn't have instructions for looping or
branching, nor does it really use the registers in any way. I was hoping to
read a writeup that introduced some concepts that were used in real (non-toy)
systems.

~~~
voidiac
Branching would be something like:

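      case JMP: {
        /* The operand is the jump target. Subtract 1 on the assumption
           that the main loop increments ip after every instruction. */
        ip = program[ip + 1] - 1;
        break;
      }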
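
To see a jump like this drive an actual loop, here is a minimal sketch. It assumes a hypothetical instruction set (`PSH`, `DEC`, `JZ`, `JMP`, `HLT`) and a dispatch loop that increments `ip` after every instruction; only the `JMP` case follows the snippet above, the rest is invented for illustration.

```c
#include <assert.h>

/* Hypothetical instruction set; only the JMP case mirrors the snippet above. */
enum { PSH, DEC, JZ, JMP, HLT };

/* Counts down from 3 to 0 using a conditional exit and a backward jump. */
static const int countdown[] = {
    PSH, 3,   /* 0: push the loop counter        */
    DEC,      /* 2: decrement the top of stack   */
    JZ, 7,    /* 3: leave the loop once it is 0  */
    JMP, 2,   /* 5: otherwise go back to DEC     */
    HLT       /* 7: stop                         */
};

/* Runs a program, stores the final top of stack in *top, and returns the
   number of instructions dispatched. */
static int run(const int *program, int *top) {
    int stack[16], sp = 0, steps = 0;
    for (int ip = 0; ; ip++) {          /* ip++ runs after every instruction */
        steps++;
        switch (program[ip]) {
        case PSH: assert(sp < 16); stack[sp++] = program[++ip]; break;
        case DEC: assert(sp > 0); stack[sp - 1]--; break;
        case JZ:
            assert(sp > 0);
            if (stack[sp - 1] == 0) ip = program[ip + 1] - 1;  /* taken */
            else ip++;                                          /* skip operand */
            break;
        case JMP: ip = program[ip + 1] - 1; break;  /* -1 offsets the ip++ */
        case HLT: *top = sp > 0 ? stack[sp - 1] : 0; return steps;
        default:  return -1;  /* unknown opcode */
        }
    }
}
```

`run(countdown, &top)` goes around the loop three times before `JZ` exits to `HLT`, leaving 0 on top of the stack.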

------
donpdonp
New concepts are introduced at a satisfying pace. Each bit of code is
explained thoroughly. Nice writeup.

------
earlz
I'm pretty sure everyone has written their own toy VMs, but I'll go ahead and
throw mine out there (well, the one of the 3 I've written that I like best).
It's called LightVM and is intended to be capable of running on tiny
microcontrollers.

The thing I like most about it is that the opcodes and registers are extremely
general purpose. So, to do a branch, you do `mov IP, label`, or even a
"push.mv" instruction which when used against IP is basically the same as the
usual "call" instruction, but can also be used with data registers to save a
register to the stack and then set it to a value.
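
A minimal sketch of that idea, assuming a made-up encoding rather than LightVM's actual one: every instruction is three words (opcode, register, immediate) and the instruction pointer is simply register 0. `MOV`, `PUSHMV` and `POP` are hypothetical names for illustration. With IP as the target register they act as branch, call and return; with a data register they are ordinary moves and saves.

```c
#include <assert.h>

/* Hypothetical encoding, not LightVM's actual one: every instruction is
   three words (opcode, register, immediate), and IP is register 0. */
enum { IP, R0, R1, NUM_REGS };
enum { MOV, PUSHMV, POP, HLT };

static const int demo[] = {
    PUSHMV, IP, 12,  /*  0: "call": save return address 3, then IP = 12 */
    MOV,    IP, 9,   /*  3: "branch": skip the next instruction         */
    MOV,    R1, 99,  /*  6: never executed                              */
    HLT,    0,  0,   /*  9: stop                                        */
    MOV,    R0, 42,  /* 12: subroutine body                             */
    POP,    IP, 0    /* 15: "return": IP = saved address                */
};

/* Executes code until HLT, using r[] as the register file. */
static void run(const int *code, int *r) {
    int stack[16], sp = 0;
    for (;;) {
        const int *ins = &code[r[IP]];
        r[IP] += 3;  /* advance first, so a pushed IP is a return address */
        switch (ins[0]) {
        case MOV:    r[ins[1]] = ins[2]; break;
        case PUSHMV: assert(sp < 16); stack[sp++] = r[ins[1]]; r[ins[1]] = ins[2]; break;
        case POP:    assert(sp > 0); r[ins[1]] = stack[--sp]; break;
        default:     return;  /* HLT or unknown opcode */
        }
    }
}
```

Running `demo` with a zeroed register file ends with `R0` set to 42 by the subroutine and `R1` untouched, because the `mov IP, 9` jumped over the instruction that would have set it.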

I've found the hardest thing about making a VM isn't making the VM itself, but
rather making the infrastructure around it (assembler, debugger, compilers, etc.).

[https://bitbucket.org/earlz/lightvm/overview](https://bitbucket.org/earlz/lightvm/overview)

~~~
dkersten
_So, to do a branch, you do `mov IP, label`, or even a "push.mv" instruction
which when used against IP is basically the same as the usual "call"
instruction_

I wrote something a little like this once too - there was a register stack and
call, jump, branch were all implemented by pushing or popping the register
stack.

------
vbezhenar
For those who want to implement a VM as an exercise, I recommend implementing
a simple JIT compiler after that. You'll probably be impressed by the
performance improvements, and it's a fun exercise to do. I used GNU lightning
to generate machine code.

~~~
rounak
Any pointers/links/tutorials for this? Thanks.

------
emmanueloga_
I am starting to sound like a broken record, but here it goes. If you want a
more complete tutorial on writing stack based virtual machines, check "The
Elements of Computing Systems" and its accompanying course,
[http://www.nand2tetris.org/](http://www.nand2tetris.org/).

The book teaches you to build:

1) A CPU from basic electronics elements

2) An assembler to generate machine code

3) A bytecode VM that can be simulated and a translator that generates
assembly from the bytecode

4) A basic programming language that generates bytecode

5) An operating system using that language.

I'm midway through building the Assembler and VM myself :-).

------
amelius
This project is nice for educational purposes, but I wouldn't call it a VM,
but instead a "bytecode interpreter".

I think nowadays it is kind of a minimum requirement to have the intermediate
code JIT-compiled (or at least compiled).

I'm also missing a garbage collector, although that is not necessarily part of
a VM (but often is). See NaCl for a counterexample. By the way, a project that
I'd like to see is an efficient garbage collector implemented inside the VM,
instead of as part of the VM.

~~~
karmakaze
It's interesting that VM for some doesn't mean the same as virtual machine. A
bytecode interpreter _is_ a vm with the bytecodes representing opcodes of the
machine. What this isn't is a 'modern VM' complete with JIT and GC.

------
tjscanlon
For everyone who enjoyed this or wants to take it a step further, I recommend
writing a CHIP-8 emulator. I used the following source:
[http://www.multigesture.net/articles/how-to-write-an-emulato...](http://www.multigesture.net/articles/how-to-write-an-emulator-chip-8-interpreter/)
and it was very helpful.

------
ggambetta
For people looking for less "toy" implementations, I've written two emulators,
an 8086 one and a Z80 one.

There's libz80
([https://github.com/ggambetta/libz80](https://github.com/ggambetta/libz80))
which is (AFAIK) quite complete and correct but just a library, and the 8086
one
([https://github.com/ggambetta/emulator-backed-remakes](https://github.com/ggambetta/emulator-backed-remakes))
which is incomplete and buggy but serves a much more interesting purpose :)

------
phodo
While seemingly simple, the non-Turing-complete example is not too far off from
the (simple) Forth-like stack-based programming language found and executed in
bitcoin transactions.

[https://en.bitcoin.it/wiki/Script](https://en.bitcoin.it/wiki/Script)

------
pjonesdotca
C is not my thing, so a few years ago, while trying to sort out how a VM
works, I created a VM in Ruby.

Practical? Not in the least. But, it was a good weekend's worth of fun.

[https://github.com/patrickjonesdotca/carban](https://github.com/patrickjonesdotca/carban)

------
bvanslyke
For a project that goes a bit deeper (branching, I/O, etc.), consider writing a
CHIP-8 simulator. There are lots of games written in CHIP-8 bytecode to test with!

------
ternaryoperator
I find these kinds of very basic intro articles frustrating. They till the
same ground over and over: a tiny instruction set implemented with a switch
statement. None of the more difficult issues are addressed: exception
handling, linking to libraries or other programs written for the same VM,
portability of programs across architectures, accessing the OS for services
like file I/O, time, etc. These are all the things that make a toy not a toy.

Every CS student in the world has written a toy VM just like this one.

~~~
maguirre
I feel the same way. I went to read the post expecting a lot more than what I
found and came away feeling both more knowledgeable than I thought I was and
more ignorant for not knowing that I could get away with calling what I saw a
"Virtual Machine".

~~~
tptacek
It's an instruction set, with an instruction dispatcher, a stack, and a
register file. Why would it surprise you that someone would call it a VM?

Are you maybe getting your signals crossed between the kind of VM this article
is talking about (in the p-code sense of a VM) and virtualization systems?

~~~
maguirre
I was not surprised that it was called a VM. I was surprised that I didn't
know that!

------
jCanvas
I think the title is very misleading. This is not a virtual machine but an
interpreter for a made up assembly language. There is nothing wrong with that
and I am sure a beginner would find it very useful. But reading the title I
was expecting something quite different.

~~~
dalke
Virtual machines include "interpreters for a made up assembly language."
Quoting from
[http://en.wikipedia.org/wiki/Virtual_machine#Process_virtual...](http://en.wikipedia.org/wiki/Virtual_machine#Process_virtual_machines)
:

> A process VM, sometimes called an application virtual machine, or Managed
> Runtime Environment (MRE), runs as a normal application inside a host OS and
> supports a single process. ... Process VMs are implemented using an
> interpreter; performance comparable to compiled programming languages is
> achieved by the use of just-in-time compilation.

It points to several examples of process VMs. One is Parrot. Quoting from
[http://en.wikipedia.org/wiki/Parrot_virtual_machine](http://en.wikipedia.org/wiki/Parrot_virtual_machine)
:

> Parrot is a register-based process virtual machine designed to run dynamic
> languages efficiently. It is possible to compile Parrot assembly language
> and PIR (an intermediate language) to Parrot bytecode and execute it.

(I quoted that one over Java and Python virtual machines because it uses the
phrase "assembly language" in the context of the VM.)

