
Implementing a Virtual Machine in C - fapjacks
http://www.felixangell.com/virtual-machine-in-c/
======
TomNomNom
A good read.

One thing I thought of as a little weird is this:

> But remember, we aren't decoding anything since our instructions are just
> given to us raw. So that means we only have to worry about fetch, and
> evaluate!

The 'decode' step is being done by the switch statement; it's not being
skipped. I think the author may be confusing turning assembly into machine
code (i.e. what an assembler does) with decoding an opcode.
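
To make that concrete, here's a minimal fetch/decode/execute loop in the
article's style (the opcode names and stack layout are invented for
illustration, not the author's exact code); the switch is what maps each raw
opcode to its handler, i.e. the decode step:

```c
/* Hypothetical opcodes, loosely modeled on the article's instruction set. */
enum { OP_PSH, OP_ADD, OP_POP, OP_HLT };

/* Runs a program and returns the value on top of the stack at halt. */
int vm_run(const int *program) {
    int stack[256];
    int sp = -1;                          /* stack pointer */
    int ip = 0;                           /* instruction pointer */

    for (;;) {
        int instr = program[ip++];        /* fetch */
        switch (instr) {                  /* decode: map opcode to handler */
        case OP_PSH:                      /* execute */
            stack[++sp] = program[ip++];  /* operand follows the opcode */
            break;
        case OP_ADD: {
            int b = stack[sp--];
            int a = stack[sp--];
            stack[++sp] = a + b;
            break;
        }
        case OP_POP:
            sp--;
            break;
        case OP_HLT:
            return stack[sp];
        }
    }
}
```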

I built a (very simple) VM in Go live on stage at a local mini-conference last
year. I really enjoyed it. There's a video of it here:
[https://www.youtube.com/watch?v=GjGRhIl0xWs](https://www.youtube.com/watch?v=GjGRhIl0xWs)

~~~
Corun
I'm not sure, but I think the author means that there is no extraction of
fields from the instruction, which is one way of defining the decode step.
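
For the field-extraction sense of "decode", a sketch (the bit layout here is
made up for illustration; real ISAs pack their fields differently):

```c
#include <stdint.h>

/* Decode in the field-extraction sense: pull the opcode and register
   fields out of a packed 32-bit instruction word.  This layout is
   invented: an 8-bit opcode followed by three 8-bit register fields. */
typedef struct {
    uint8_t opcode;
    uint8_t dst, src1, src2;
} decoded_t;

decoded_t decode(uint32_t word) {
    decoded_t d;
    d.opcode = (word >> 24) & 0xff;
    d.dst    = (word >> 16) & 0xff;
    d.src1   = (word >>  8) & 0xff;
    d.src2   =  word        & 0xff;
    return d;
}
```

In the article's VM there is nothing to extract, since each opcode and operand
is already a whole array element, which is presumably what the author meant.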

------
r4pha
There's something really nice about these minimalistic C projects. You can
quickly take a look at the source and immediately grasp what's happening. A
simple and didactic introduction. Super cool.

~~~
tptacek
Yes. Obligatory and tedious re-up for my favorite minimalistic C program of
the last 5 years:

[https://github.com/rswier/c4](https://github.com/rswier/c4)

Take 2 hours to grok this and you can skip compiler books and move straight to
compiler papers.

~~~
Winterflow3r
Challenge accepted.

~~~
tptacek
An even better challenge: add structs to the compiler. :)

~~~
Winterflow3r
haha okay! I don't have any formal CS training though (and the CPU between the
ears isn't too great either) so it's going to take me more than 2 hours. I'm
going through void next() right now. Are you open to questions on GitHub?

~~~
tptacek
It's not my code, so GitHub is a bad place to ask me questions. But you can
ask them here, and I'll respond.

------
kyrre
[https://news.ycombinator.com/item?id=9516656](https://news.ycombinator.com/item?id=9516656)

------
joe_the_user
A nice simple demonstration.

Other things that can be added easily are labels, loop0 (loop while a variable
is greater than zero), and subX (pop the stack and the instruction pointer and
go to location X).

------
jheriko
why not use an array of structs as the program, with a function pointer and
the parameters stored in the struct? the instruction pointer is then an index
into that array... the functions implement the opcodes and the architecture
becomes more instructive, cleaner, faster, smaller and easier to read imo.
executing the program can become a loop that just calls whatever function
pointer is in the struct at the instruction pointer.
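
A sketch of that layout, with invented names: the program is an array of
structs, each carrying a handler pointer and an operand, and execution is just
a loop calling whatever the current struct points at:

```c
/* The program is an array of these: a function pointer plus its operand. */
typedef struct vm vm_t;
typedef struct insn {
    void (*op)(vm_t *vm, const struct insn *i);
    int a;                       /* operand (unused by some opcodes) */
} insn_t;

struct vm {
    int stack[256];
    int sp;
    int ip;                      /* index into the instruction array */
    int running;
};

static void op_push(vm_t *vm, const insn_t *i) { vm->stack[++vm->sp] = i->a; }
static void op_add(vm_t *vm, const insn_t *i) {
    (void)i;
    int b = vm->stack[vm->sp--];
    vm->stack[vm->sp] += b;
}
static void op_halt(vm_t *vm, const insn_t *i) { (void)i; vm->running = 0; }

/* The whole dispatch loop: call whatever function the struct holds. */
int vm_exec(const insn_t *program) {
    vm_t vm = { .sp = -1, .ip = 0, .running = 1 };
    while (vm.running) {
        const insn_t *i = &program[vm.ip++];
        i->op(&vm, i);
    }
    return vm.stack[vm.sp];
}
```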

~~~
stevekemp
Funny you mention that - I used that structure in my simple virtual machine
(which includes a compiler, a decompiler, support for conditionals, and also
an example of embedding):

[https://github.com/skx/simple.vm](https://github.com/skx/simple.vm)

Rather than having an enum of opcodes I have an array of pointers to functions
- one for each opcode I implement. It is a logical/clean way of implementing
things, but I think it is not necessarily better.

(In my case I lose out due to inline strings and variable-length opcodes,
which add complications.)
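
The per-opcode function-pointer table described above might look roughly like
this (names invented here, not simple.vm's actual code): the opcode is simply
an index into the table, so there is no switch at all:

```c
/* VM state shared by all opcode handlers. */
typedef struct {
    int stack[256];
    int sp;
    const int *ip;               /* points into the bytecode stream */
    int running;
} machine_t;

static void do_push(machine_t *m) { m->stack[++m->sp] = *m->ip++; }
static void do_add(machine_t *m)  { int b = m->stack[m->sp--]; m->stack[m->sp] += b; }
static void do_halt(machine_t *m) { m->running = 0; }

/* Opcode N is simply index N in this table. */
static void (*const handlers[])(machine_t *) = { do_push, do_add, do_halt };

int interpret(const int *code) {
    machine_t m = { .sp = -1, .ip = code, .running = 1 };
    while (m.running)
        handlers[*m.ip++](&m);   /* dispatch: index the table by opcode */
    return m.stack[m.sp];
}
```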

------
sklogic
And if you want better performance, you can replace this switch with
computed gotos (supported at least by GCC and Clang).
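
A sketch of computed-goto dispatch. This relies on the GCC/Clang
labels-as-values extension (`&&label` takes a label's address, `goto *p` jumps
to it); it is not standard C. Each handler jumps straight to the next opcode's
handler, with no central switch:

```c
enum { CG_PSH, CG_ADD, CG_HLT };

int run(const int *program) {
    /* Opcodes index directly into this table of label addresses
       (&&label is a GCC/Clang extension). */
    static void *dispatch[] = { &&do_psh, &&do_add, &&do_hlt };
    int stack[256];
    int sp = -1;
    const int *ip = program;

#define NEXT() goto *dispatch[*ip++]   /* fetch + dispatch in one step */

    NEXT();
do_psh:
    stack[++sp] = *ip++;               /* operand follows the opcode */
    NEXT();
do_add: {
    int b = stack[sp--];
    stack[sp] += b;
    NEXT();
}
do_hlt:
    return stack[sp];
#undef NEXT
}
```

Because the dispatch jump is replicated at the end of every handler, the
branch predictor gets one indirect branch per opcode instead of a single
shared one, which is where the speedup over a switch typically comes from.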

~~~
simias
You should first check if the compiler doesn't do it for you. Otherwise you're
just making your code harder to maintain and less portable for zero gain.

In my experience with similar code you can get GCC and LLVM to generate a
proper jump table if you're careful to make your code as regular as possible
in order for the optimizer to understand what you're doing.

Case in point: [https://github.com/simias/psx-rs/blob/master/src/cpu/mod.rs#L325-L400](https://github.com/simias/psx-rs/blob/master/src/cpu/mod.rs#L325-L400)

Here LLVM manages to inline all the functions and generate a proper jump
table. IIRC I tried getting the function address in the "match" and calling it
afterwards but then the compiler wouldn't optimize it properly anymore.

~~~
plq
> You should first check if the compiler doesn't do it for you.

It probably doesn't, because otherwise Intel wouldn't have submitted this
patch that adds computed goto support to CPython:
[http://permalink.gmane.org/gmane.comp.python.devel/153401](http://permalink.gmane.org/gmane.comp.python.devel/153401)

~~~
xigency
The same performance can easily be achieved by simple compiler optimizations
like inlining functions, unrolling loops, peephole optimizations, etc. The
fact that your compiler doesn't achieve the same performance doesn't
necessarily mean you should reimplement things in a confusing way.

Another way of achieving the exact same result would be to use an array of
function pointers.

I think the more interesting dilemma is when you have a more complex VM
possibly operating on an abstract syntax tree instead of just a bytecode
stream, and the ways to optimize code (in C) while maintaining a proper stack.

After figuring out how to make a switch statement work, there are a ton of
more pressing organizational concerns like how to make function trampolining
work without any overhead.

Another way to make it faster would be to have it interpret *two* op-codes
at once, but hey, that wouldn't do anything except save two cycles and explode
the code size to the square of what it was.

~~~
sklogic
Yes, that Python patch is only halfway to a proper indirect threaded dispatch;
it still has a lookup table. See the OCaml bytecode VM for a really
high-performance example of this approach.

> a more complex VM possibly operating on an abstract syntax tree

Don't do it.

~~~
thesz
> Don't do it.

[http://peterdn.com/files/A_JIT_Translator_for_Oberon.pdf](http://peterdn.com/files/A_JIT_Translator_for_Oberon.pdf)

Oberon did it with relative success. Relative to other VMs at the time.

~~~
sklogic
We're not talking about JITs here. Of course a tree-based representation is
better for JITs, and if using a flat VM you may have to promote it to a
higher-level AST (or at least an SSA-based) form anyway.

But for _immediate_ execution, a flat stream of instructions (or, better,
threaded code) is much more efficient than any kind of tree. Yes, I'm aware of
things like SISC, but I've yet to see proof that their approach is more
efficient than a flat bytecode, even when JVM overhead is taken into
consideration.

