
Write Your Own Virtual Machine - ChankeyPathak
https://justinmeiners.github.io/lc3-vm/
======
ghj
I was tricked into writing a VM while doing Advent of Code last year and it
really deepened my understanding of programming (e.g., on how to implement
simple coroutines).

Advent of Code is supposed to be 25 daily programming puzzles leading up to
Christmas. I went in thinking they were going to be algo/datastructure tasks
but half of them were actually about implementing and using a VM for a made up
assembly language! The difficulty ramp was gradual enough that I did it with
no background in compilers and it was lots of fun.

If you want to experience it, in last year's version
[https://adventofcode.com/2019](https://adventofcode.com/2019) you write the
VM in Days 2, 5, 7, 9 then apply it to solving problems in Days 11, 13, 15,
17, 19, 21, 23, 25.
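
For a sense of scale, here is a minimal sketch of the Day 2 starting point, assuming only the three opcodes that puzzle defines (1 = add, 2 = multiply, 99 = halt, all operands being memory positions). The function name `run_intcode` is made up for illustration; later days extend the machine well beyond this.

```python
def run_intcode(program):
    """Run a Day 2 style program: opcodes 1 (add), 2 (multiply), 99 (halt)."""
    memory = list(program)  # copy so the caller's list is left untouched
    pc = 0                  # position of the current opcode
    while memory[pc] != 99:
        # Each instruction is four cells: opcode, two source positions, one target.
        opcode, a, b, dest = memory[pc:pc + 4]
        if opcode == 1:
            memory[dest] = memory[a] + memory[b]
        elif opcode == 2:
            memory[dest] = memory[a] * memory[b]
        else:
            raise ValueError(f"unknown opcode {opcode} at position {pc}")
        pc += 4
    return memory


result = run_intcode([1, 9, 10, 3, 2, 3, 11, 0, 99, 30, 40, 50])
print(result[0])  # 3500
```

The program is data in the same memory it modifies, which is what makes the later puzzles (self-modifying code, relative addressing) a natural extension.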

I ended up loving the VM puzzles and went on to do the AoC author's other
challenges, which have a similar theme:
[https://challenge.synacor.com/](https://challenge.synacor.com/)

~~~
p4bl0
Funny that VM related days are all prime numbers :).

EDIT: No they're not, I wasn't fully awake, sorry.

~~~
saagarjha
Not all of them, but a lot seem to be.

~~~
schemy
A mathematician, physicist, and engineer are taking a math test. One question
asks "Are all odd numbers prime?"

The mathematician thinks, "3 is prime, 5 is prime, 7 is prime, 9 is not prime
-- nope, not all odd numbers are prime."

The physicist thinks, " 3 is prime, 5 is prime, 7 is prime, 9 is not prime --
that could be experimental error -- 11 is prime, 13 is prime, yes, they're all
prime."

The engineer thinks, " 3 is prime, 5 is prime, 7 is prime, 9 is prime, 11 is
prime, ..."

[https://users.cs.northwestern.edu/~riesbeck/mathphyseng.html](https://users.cs.northwestern.edu/~riesbeck/mathphyseng.html)

------
cecilpl2
I have written a virtual CPU from very simple building blocks and highly
recommend doing it.

I started with class Wire (which just wraps a bool) and class Transistor
(which accepts two input Wire& and has an output Wire). It has an Update()
function which sets the output state.
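
A rough sketch of that starting point, in Python rather than the original C++. The switching model is an assumption (an ideal switch that passes its source only while the gate is high), and the `and_gate` helper is hypothetical; the real project may well model transistors differently.

```python
class Wire:
    """A wire just wraps a single boolean state."""

    def __init__(self, state=False):
        self.state = state


class Transistor:
    """Assumed model: passes `source` through to `output` only while `gate` is high."""

    def __init__(self, source, gate):
        self.source = source
        self.gate = gate
        self.output = Wire()

    def update(self):
        self.output.state = self.source.state and self.gate.state


def and_gate(a, b):
    """Two switches in series from a constant-high supply: high only if both gates are high."""
    power = Wire(True)
    first = Transistor(power, a)
    second = Transistor(first.output, b)
    return [first, second], second.output


a, b = Wire(), Wire()
parts, out = and_gate(a, b)
for x, y in [(False, False), (False, True), (True, False), (True, True)]:
    a.state, b.state = x, y
    for part in parts:  # update in wiring order so the signal propagates
        part.update()
    print(x, y, out.state)
```

Updating components in wiring order is the simplest scheduling choice; circuits with feedback (flip-flops) need repeated update passes until the wires settle.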

From those I built up gates, then flipflops, registers, mux/demuxes, counters,
an ALU, and memory banks. Then I wrote a machine language, connected the cpu
together, wrote a loader to load machine language files into ROM, then an
assembler to allow me to write assembly code.

I then added VRAM, async buses, and wrote a working implementation of Game of
Life.

It has 32-bit instructions and runs at about 10 kHz with 8 KB of RAM.

~~~
nurettin
I did something similar, but with a LogicGate and a Wire class. I would
recommend Morris Mano for those who would like to try.

~~~
bogomipz
It looks like those authors have a handful of computer engineering books. Is
there one particular title you would recommend?

~~~
nurettin
Sure, I think what I used translates to the title "Digital Design".

------
stingraycharles
I once wrote my own virtual machine in college, complete with compiler and
assembler, and I cannot recommend doing this enough. Especially the virtual
machine part is not nearly as difficult as you would imagine, and to this day
(15 years later) I still rely on the things I learned here.

The knowledge you gain from implementing a virtual machine translates
reasonably well to inner workings of a CPU, and you’ll have a much better
understanding of things like stacks, frame pointers and the overhead of
calling a function. It will be completely obvious to you why “i++” is slower
than “++i”.

Thanks for sharing this article.

~~~
moonchild
> It will be completely obvious to you why “i++” is slower than “++i”

...but it's not!

All you have to do is perform the most basic of optimizations: check, before
generating code for an expression, if that expression is used. If not, then
don't bother generating an intermediate for the result. Source: making a C
compiler, just implemented this optimization today. (In an 'industrial-grade
compiler', you probably want to elide this optimization, but do super complex
control flow analysis on the resulting SSA to see that the intermediate is
dead code. But for a toy compiler/vm, little tricks can save you a lot in
codegen quality for little effort.)
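
A minimal sketch of that check, assuming a made-up register machine with `mov` and `inc` instructions and a code generator that is told whether the expression's value is consumed. The function names and the `tmp` register are illustrative only.

```python
def gen_post_increment(var, result_used):
    """Emit instructions for `var++` on a hypothetical register machine."""
    code = []
    if result_used:
        # The value of `var++` is the OLD value, so it must be saved first.
        code.append(f"mov tmp, {var}")
    code.append(f"inc {var}")
    return code


def gen_pre_increment(var, result_used):
    """Emit instructions for `++var`; its value is just the updated variable."""
    return [f"inc {var}"]


print(gen_post_increment("i", result_used=True))   # ['mov tmp, i', 'inc i']
print(gen_post_increment("i", result_used=False))  # ['inc i']
print(gen_pre_increment("i", result_used=False))   # ['inc i']
```

When the result is unused, as in the increment clause of a typical `for` loop, both forms produce the same code.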

> inner workings of a CPU

...if only. Lots of crazy stuff going on in a CPU that doesn't even start to
come up in most VMs.

~~~
astrobe_
Technically, that's a bit off topic because this is about compilers and code
generation, while a VM defines the basic semantics compilers work with.

For VMs, what is relevant is whether you implement an "inc (reg)" or not, that
is if you choose to take the RISC path (small set of mostly orthogonal
instructions) or the CISC path (lots of microcode, things like "repnz scasb"
in x86 assembly or elaborate addressing schemes).

This is actually somewhat of a false dichotomy, as you can have a RISC-style
instruction set _plus_ a few "high level" instructions for things you expect
to do a lot - like, for instance, array operators _à la_ APL/J.

------
jon-wood
If you want to go really deep on this I highly recommend NAND To Tetris, a
book which starts from the basics of combining NAND gates to build basic
logic. You’ll gradually work through building a CPU from those components,
building an assembler to program it, and finally putting a virtual machine on
top of that.
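
To give a flavor of the first step, here is a small sketch that derives the other basic gates from NAND alone, using plain Python booleans. It is not the book's hardware description language; the function names are chosen for illustration.

```python
def nand(a, b):
    """The only primitive; everything below is built from it."""
    return not (a and b)


def not_(a):
    return nand(a, a)


def and_(a, b):
    return not_(nand(a, b))


def or_(a, b):
    # De Morgan: a OR b == NOT(NOT a AND NOT b)
    return nand(not_(a), not_(b))


def xor(a, b):
    n = nand(a, b)
    return nand(nand(a, n), nand(b, n))


def half_adder(a, b):
    """Return (sum, carry) for two one-bit inputs."""
    return xor(a, b), and_(a, b)


for a in (False, True):
    for b in (False, True):
        print(a, b, half_adder(a, b))
```

Chaining adders like this one is the route from single gates to an ALU.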

~~~
dbrueck
Haha, I came here just to encourage people to check out www.nand2tetris.org
too. It's extremely educational.

It's also realllllly satisfying to play Tetris on something you "built" all
the way up from basic logic gates.

It's a great (and free) course during a pandemic lockdown or a fun project
over a holiday break.

------
nailuj
If you want a really in-depth intro into the topic (with video lectures and
guest appearances from legendary VM implementers!), check out CS294 from UC
Berkeley that has been made available under a Creative Commons license:
[http://www.wolczko.com/CS294/](http://www.wolczko.com/CS294/)

It's great for following along self-paced and has hands-on exercises for each
topic. You should be a bit familiar with compilers already though.

------
danjc
I've just about finished writing a VM for the ZMachine (the VM that Zork was
written for to make it portable). It's been tremendously enjoyable and I've
learnt a lot.

I'm planning a series of how-to videos out of the project and will definitely
be referencing this article to ensure my presentation of concepts is accurate.
Thanks for sharing this.

------
aSplash0fDerp
VMs have so much utility, so +1 to the author on the nerd nugget for
compartmentalizing the use as a game cartridge.

VMs and VEs (environments) have gotten so portable that IOT and 5G will have
to compete with raw storage (TB's) for the best user experience in a post-
filtered world.

Possibly OT (or use case), but I've demoed LTSP [0] to run multiple
environments on a single server (PXE menu) and clustered multiple LTSP servers
(bouncing to other LTSP menus by manually adding a server list to boot
menu(s)) and treated them as books with chapters (each LTSP being a book).

Customizing each chapter by OS needs and media/content/apps has so many uses
(immersive news, learning, entertainment) and paves the way for offnet
productivity/entertainment that currently doesn't exist in the market (ala
Home Library/Encyclopedia Server).

LTSP was not as PnP as it needs to be for consumer use (and external PCIe form
factors were pricey), but anyone writing a VM (or decorating the interior) is
already riding the next wave of physical data subscriptions/refills that
bypass metered networks (SDCs or SSDs in TB capacities with OS agnostic data)
for 8k video, VR and compilation/mixedtape datasets.

Portable VMs are a trend waiting to happen, so anything you can learn about it
(inside-out) is not a waste.

[0] [https://ltsp.org](https://ltsp.org)

------
codr7
For those who haven't read the post yet, I would definitely recommend doing so
if only to understand the striped instruction set implementation using C++
generics at the end.

I rarely run into solutions that fundamentally expand my perspective these
days but this one did.

------
stevekemp
Virtual machines are definitely fun, and can be useful things to know about if
you ever design/implement a scripting language or an emulator.

I wrote a toy system a few years ago, a simple interpreter (C) along with a
compiler/decompiler (Perl) to match it. Unfortunately my system didn't have a
terribly well-designed instruction set. If I were to start over I'd probably
implement the 8086 instruction set, or similar.

That said even a toy system is fun, the biggest issue with writing your own
instruction-set is that you have to write the actual programs too. Which is
often less fun! I rewrote my interpreter in golang recently, keeping the same
instruction-set but adding a better trap-system. Of course re-implementing it
meant that I still have the problem of no real programs being written for the
machine!

[https://github.com/skx/go.vm/](https://github.com/skx/go.vm/)

------
teleforce
FYI, Edouard Bugnion, one of the co-founders of VMware, has written a book on
VM technology, "Hardware and Software Support for Virtualization" [1].

[1] [https://research.vmware.com/publications/hardware-and-softwa...](https://research.vmware.com/publications/hardware-and-software-support-for-virtualization)

------
retrac
Slight tangent. To learn a bit of Zig, I've been doing the usual PDP-8
simulator task I do for systems languages. Zig allows arbitrary fixed width
integers up to 65535 (2^16 - 1) bits wide, signed and unsigned. It is /delightful/ to
have those data types when doing an emulator. I'm convinced more languages
should have them.
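
For contrast, a sketch of what the same arithmetic looks like without such types (Python here, where integers are unbounded): every result has to be masked back to the PDP-8's 12-bit word by hand. The helper name `add12` is made up for illustration; with a native 12-bit type the width would live in the type instead of in a mask.

```python
WORD_MASK = 0o7777  # 12 bits, the PDP-8 word size


def add12(a, b):
    """Add two 12-bit words, returning (wrapped sum, carry out)."""
    total = (a & WORD_MASK) + (b & WORD_MASK)
    return total & WORD_MASK, total >> 12


print(add12(0o7777, 1))  # (0, 1): the sum wraps and the carry is set
print(add12(0o0012, 0o0030))
```

Forgetting one such mask is a classic emulator bug, which is the appeal of having the compiler enforce the width.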

------
Const-me
Never wrote virtual machines, because I know why Sun (now Oracle) and
Microsoft both spent a billion each to create theirs. You can write something
that works over a weekend, but the performance won’t be good without JIT,
generational GC, and many other extremely complicated optimizations.

If you think you need to develop a VM, I recommend reconsidering and thinking
about how you can reuse something that’s already there. For instance, modern .NET VM
is open source with MIT license, the code quality is more or less OK, and it’s
relatively easy to generate .NET bytecode from something else, Reflection.Emit
from the inside, or Mono.Cecil from outside.

~~~
teacpde
The point of writing your own vm isn’t to come up with something that is on
par with Sun’s or Microsoft’s vm, but rather to have a hands-on learning
experience of the inner workings of a vm.

~~~
Const-me
Real-life VMs don’t interpret, they JIT compile. The code in the article has
nothing in common with the inner workings of real-life VMs.

Even VMs which do interpret don’t do it the way the article does, take a
look:
[https://github.com/python/cpython/blob/v3.9.0rc1/Python/ceva...](https://github.com/python/cpython/blob/v3.9.0rc1/Python/ceval.c#L930-L952)

What exactly are you learning, then?

~~~
Jasper_
> Real-life VMs don’t interpret, they JIT compile.

Pretty much every real-life JavaScript VM is tiered, and has an interpreter,
which gathers data about expected usage which the JIT will use to inform its
optimizations when it goes to generate machine code.

Still, you'd be surprised about the performance you can get out of a basic
interpreter. Games have used Lua for years. I've written and reverse
engineered plenty of custom bytecode for various reasons in the games space.
It's a useful tool to have, and there are a lot of situations where
performance either isn't the goal, or the large number of tricks used by
JITting VMs isn't helpful.

The dispatch loop you point to is just about using a C extension (computed
gotos) to gain a few extra performance points. You can learn about it in about
30 minutes after knowing what a VM is.
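
A minimal sketch of such a dispatch loop, assuming a made-up four-opcode stack machine. Python has no computed gotos, so a table of handler functions stands in as a loose analogue: the opcode selects the handler directly instead of walking a chain of comparisons.

```python
PUSH, ADD, MUL, HALT = range(4)  # hypothetical opcodes


def op_push(stack, arg):
    stack.append(arg)


def op_add(stack, arg):
    b, a = stack.pop(), stack.pop()
    stack.append(a + b)


def op_mul(stack, arg):
    b, a = stack.pop(), stack.pop()
    stack.append(a * b)


# Dispatch table: opcode -> handler.
HANDLERS = {PUSH: op_push, ADD: op_add, MUL: op_mul}


def run(code):
    """Fetch, decode, execute until HALT; returns the final stack."""
    stack, pc = [], 0
    while True:
        opcode, arg = code[pc]
        pc += 1
        if opcode == HALT:
            return stack
        HANDLERS[opcode](stack, arg)


program = [(PUSH, 2), (PUSH, 3), (ADD, None), (PUSH, 4), (MUL, None), (HALT, None)]
print(run(program))  # [20]
```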

------
andreareina
Extra credit: implement OP_HCF

------
konjin
Note that this is _just_ a virtual machine.

You can only run the already compiled binary files for it.

If you want to write something new you will need to do it in binary. If you
want to write in assembly you will need to write an assembler, or find one
online.
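
To show how small the first step of an assembler can be, here is a sketch that encodes a single LC-3 instruction form, `ADD DR, SR1, #imm5`, into its 16-bit word. The function name is made up, and a usable assembler would also need parsing, labels, and the remaining opcodes.

```python
def assemble_add_imm(dr, sr1, imm):
    """Encode LC-3 `ADD DR, SR1, #imm` (immediate mode) as a 16-bit word."""
    if not (0 <= dr <= 7 and 0 <= sr1 <= 7):
        raise ValueError("registers are R0..R7")
    if not -16 <= imm <= 15:
        raise ValueError("the immediate must fit in 5 signed bits")
    # Layout: opcode 0001 | DR (3 bits) | SR1 (3 bits) | 1 (immediate flag) | imm5
    return (0b0001 << 12) | (dr << 9) | (sr1 << 6) | (1 << 5) | (imm & 0x1F)


word = assemble_add_imm(1, 2, 3)       # ADD R1, R2, #3
print(hex(word))                       # 0x12a3
print(word.to_bytes(2, "big").hex())   # LC-3 images store words big-endian
```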

