
Write Your Own Virtual Machine - vedosity
https://justinmeiners.github.io/lc3-vm/
======
chrisaycock
One of my favorite techniques for implementing VMs is the "computed goto":

[https://eli.thegreenplace.net/2012/07/12/computed-goto-for-efficient-dispatch-tables](https://eli.thegreenplace.net/2012/07/12/computed-goto-for-efficient-dispatch-tables)

Consider this example for dispatching instructions from the article:

    
    
        while (running)
        {
            uint16_t op = mem_read(reg[R_PC]++) >> 12;

            switch (op)
            {
                case OP_ADD:
                    /* {ADD, 6} */
                    break;
                case OP_AND:
                    /* {AND, 7} */
                    break;
                case OP_NOT:
                    /* {NOT, 7} */
                    break;
                case OP_BR:
                    /* {BR, 7} */
                    break;
                ...
            }
        }
    

That code has a lot of branching. The _switch_ statement has to jump to the
corresponding _case_, the _break_ statement branches to the bottom, and then
there is a third branch back to the top of the _while_ loop. Three branches
just to execute one instruction.

Now imagine we had written the above as:

    
    
        static void* dispatch_table[] = { &&OP_ADD, &&OP_AND, &&OP_NOT, &&OP_BR,
                                          ... };
        #define DISPATCH() goto *dispatch_table[memory[reg[R_PC]++] >> 12]

        DISPATCH();

        OP_ADD:
            /* {ADD, 6} */
            DISPATCH();
        OP_AND:
            /* {AND, 7} */
            DISPATCH();
        OP_NOT:
            /* {NOT, 7} */
            DISPATCH();
        OP_BR:
            /* {BR, 7} */
            DISPATCH();
        ...
    

Now there is only one branch per instruction. The handler for each instruction
directly jumps to the next location via the _goto_. There is no need to be in
an explicit loop because the interpreter runs until it hits a halting
instruction.

Many VMs now use this technique, including the canonical Ruby and Python
interpreters.

~~~
sifoobar
For the longest time, I swore by computed goto as well. But it has its share
of problems: it's not very portable, it forces the code into a rigid, non-
extensible format, and it's not as efficient as commonly assumed. I'm far from
the first person to notice [0], so don't bother shooting the messenger.

My latest project [1] simply calls into a struct via a function pointer for
each instruction, with a twist: each operation returns a pointer to the next
one, and the program is stopped with longjmp, which means I can get away
without lookups and without a loop condition. As an added bonus, the code is
much nicer to deal with and more extensible, since the set of operations
isn't hardcoded any more.

Comparing language performance is tricky business, but it mostly runs faster
than Python3 from my tests so far. My point is that computed goto is not the
end of the story.

I'll just add that there are many different ways to write a VM; the one posted
here is very low level, the one linked below reasonably high level. Using
physical CPUs as inspiration is all good, but there's nothing gained from the
imitation in itself unless you're compiling to native code at some point.

[0]
[http://www.emulators.com/docs/nx25_nostradamus.htm](http://www.emulators.com/docs/nx25_nostradamus.htm)

[1] [https://gitlab.com/sifoo/snigl](https://gitlab.com/sifoo/snigl)

~~~
tom_mellior
> so don't bother shooting the messenger

OK, let's shoot the original source: It only found that a certain unnamed
version of GCC ten years ago performed tail merging, which does indeed defeat
the purpose of the optimization. Since the author didn't bother to turn off
tail merging in that case (as Python does), this doesn't mean much.

> Comparing language performance is tricky business, but it mostly runs faster
> than Python3 from my tests so far.

Presumably you not only have a different interpretation approach, but also a
different object model, approach to boxing numbers, and garbage collector.
"Tricky business" is a bit of an understatement ;-)

~~~
sifoobar
Better him than me. I wouldn't be so quick to dismiss relevant coding
experience; it doesn't get old. Much of his reasoning about locality still
holds, to an even greater degree today.

I've implemented more or less the same interpreter using computed goto and
various variations on dispatch loops, and this is the fastest solution I've
managed to come up with, as well as the nicest code to work with. There are
still too many parameters to be certain, but my point is that I have plenty
of experience pointing in that direction, which is better than nothing. It's
not a mystery to me: using computed goto involves some kind of lookup to get
the relevant labels, whereas I'm using pointers to actual instructions as
jump targets. And since I'm longjmp'ing out, I don't need a loop condition;
the loop is reduced to a single regular goto.

Like I said, better than nothing. There is no such thing as perfection in this
world. I've spent quite some time [0] micro benchmarking different features to
get a fair comparison.

[0]
[https://gitlab.com/sifoo/snigl/tree/master/bench](https://gitlab.com/sifoo/snigl/tree/master/bench)

~~~
tom_mellior
Oh, I wasn't dismissing experience in general, nor was I questioning your
experiences. I would love to see your numbers on the different dispatch
methods you tried.

I was only questioning that one blog post you linked that lazily said
(paraphrasing) "computed-goto dispatch will be undone by the compiler, so
don't bother". Others have posted numbers in this thread
([https://news.ycombinator.com/item?id=18679477](https://news.ycombinator.com/item?id=18679477))
showing that this information is at best outdated.

~~~
sifoobar
No harm done; I don't do pride any more, it kept getting in the way of truth.
But there's a clear tendency, especially on HN if you ask me, to blindly trust
research by whatever authority and beat people over the head with so-called
best practices, while completely disregarding personal experience.

Sometimes the reason no one finds a better solution for a long time is that no
one believes it's possible. Even if it's 50/50, believing it's possible is
still the superior alternative.

I have no idea what Python uses for regular dispatch; the linked thread just
says that their computed goto solution is faster, which comes as no surprise
given that was the reason they switched.

I posted the link since I learnt a lot from it, and since it echoes my own
experience.

------
jonathanstrange
I once implemented a VM in Ada and just used a large switch. I extensively
benchmarked it and it was blazingly fast at -O3.

However, it was only fast when I used packages very sparingly. In contrast to
the usual advice given in the Ada community, splitting up the implementation
into several packages slowed down the main loop tremendously. I suspect this
wouldn't happen with whole-program optimization in C, but I believe the
version of GCC I was using didn't support that for Ada. Also, my green threads
were slower than a single thread, no matter which tricks I tried.

It's an abandoned project now, since the accompanying assembler was hacked
together in Racket and at some point I simply lost track of what was going on
where :O

~~~
TimJYoung
Object Pascal (Delphi) also optimizes case statements, depending upon the
nature of the statement:

[https://stackoverflow.com/a/2548425](https://stackoverflow.com/a/2548425)

I tested this recently (Delphi XE6), and it definitely behaves as described:
you get straight jump instructions with enough case branches, and it is
very fast.

Barry Kelly worked on the Delphi compiler, and I believe he comments here on
Hacker News occasionally.

~~~
vardump
> you get straight jump instructions with enough case statement branches, and
> it is very fast

That's what pretty much any half-decent compiler does nowadays. It'd be much
more surprising if it didn't.

~~~
TimJYoung
The interesting part to me was the variations in the emitted instructions,
based upon the source. IOW, it would be easy to mistakenly think that you
_weren't_ going to get jumps if you only used a few case branches to test
things out.

------
tombert
I love these kinds of tutorials, and really any tutorial that demystifies
something that was previously "magic" and makes me feel stupid for not having
understood it sooner (I honestly mean that in a positive way).

This will be a fun weekend project for me... I've had an idea for a lambda-
calculus-based VM that I've wanted to build for a few months now, and I think
this will be a good start for me to understand it.

------
techno_modus
I like this tutorial, but it should be mentioned that before implementing a
virtual machine, one should understand that there are many computing models
and many alternative mechanisms within each of them. Implementing a VM for
sequential program execution is relatively easy. What is more (conceptually)
difficult is concurrency, asynchronous processes, etc.

~~~
JustSomeNobody
> What is more (conceptually) difficult is concurrency, asynchronous processes
> etc.

Know any good resources for these?

~~~
sifoobar
My own baby, Snigl [0], does always-on cooperative concurrency and
asynchronous IO.

Just ask if you need help finding your way around.

[https://gitlab.com/sifoo/snigl](https://gitlab.com/sifoo/snigl)

~~~
JustSomeNobody
Thank you. I’ll check it out over the holidays.

------
gattr
I agree it's a great exercise. Years ago I added a very simple stack-based VM
to my raytracer to allow procedural textures & normal maps. The scene script
would e.g. contain a material description like this:

    
    
      material {
          diffuse rgb = [0, 0, 1] * (1 - (0.5 + 0.5*noise(x*0.000002, y*0.000002, z*0.000002, 0.66, 2)))
                      + [1, 1, 1] * (0.5 + 0.5*noise(x*0.000002, y*0.000002, z*0.000002, 0.66, 2));
      }
    

i.e. an expression with 3-vectors and scalars and a few basic functions
(noise, sin/cos, etc.). This can be easily "compiled" (during script parsing)
for execution on the VM. Then the overhead during actual raytracing was quite
small.

~~~
all2
I don't suppose you have code or examples lying around? I'd love to see what
you were trying to achieve, the results you had, and the code.

If you're comfortable sharing, that is.

~~~
gattr
I still have the code (old and ugly, so not on GitHub; but it still builds and
works) and examples. Let me know where I should send it.

~~~
all2
My email is in my profile.

~~~
gattr
Is it publicly visible? At the moment in your profile I can only see "about",
submissions, comments, favorites.

~~~
all2
Guess not. Huh. I added it to my 'about' block.

------
morazow
Off-topic.

Does anyone know how I can convert an AST into a stream of bytecode? Are there
any good example language implementations to learn from?

~~~
carapace
A little tangential, but check out "Prolog as Description and Implementation
Language in Computer Science Teaching" by Henning Christiansen

[http://www.ep.liu.se/ecp/012/004/ecp012004.pdf](http://www.ep.liu.se/ecp/012/004/ecp012004.pdf)

> ... Definitional interpreters, compilers, and other models of computation
> are defined in a systematic way as Prolog programs, and as a result, formal
> descriptions become running prototypes that can be tested and modified ...
> These programs can be extended in straightforward ways into tools such as
> analyzers, tracers and debuggers.

Also "Logic Programming and Compiler Writing" by David Warren (and the work
that followed).

------
wener
Ok, here is mine:
[https://github.com/wenerme/bbvm](https://github.com/wenerme/bbvm)

Writing a VM is very fun and addictive. Writing it in different languages
especially teaches you a lot!

------
ngcc_hk
Great site

------
elgfare
This was a great piece, but I found it funny that there's a bug in the very
first line of code mentioned:

    
    
        /* 65536 locations */
        uint16_t memory[UINT16_MAX];
    

Spot the leak.

