
My Most Important Project Was a Bytecode Interpreter - 10098
http://gpfault.net/posts/most-important-project.txt.html
======
robertelder
One of the moments where I really started to feel like I was 'seeing the
matrix' was when I was working on a regex engine to try to make my
compiler faster (it didn't, but that's another story). The asymptotically fast
way to approach regex processing actually involves writing a parser to process
the regex, so in order to write a fast compiler, you need to write another
fast compiler to process the regexes that will process the actual programs
that you write. But, if your regexes get complex, you should really write a
parser to parse the regexes that will parse the actual program. This is where
you realize that it's parsers all the way down.

When you think more about regexes this way, you realize that a regex is just a
tiny description of a virtual machine (or emulator) that can process the
simplest of instructions (check for 'a', accept '0-9', etc.). Each step in the
regex is just a piece of bytecode that can execute, and if you turn a regex on
its side you can visualize it as just a simple assembly program.
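
A minimal sketch of that view: below is a hand-compiled program for the
regex a+b, in the style of Thompson/Pike regex VMs. The opcode names and
the naive backtracking executor are invented for illustration, not taken
from any particular engine.

    /* regex-as-bytecode: four opcodes are enough for a toy engine */
    #include <stdio.h>

    enum { CHAR, SPLIT, JMP, MATCH };

    struct inst { int op; char c; int x, y; };  /* x, y: jump targets */

    /* naive backtracking executor: pc indexes prog, sp walks the input */
    static int run(const struct inst *prog, int pc, const char *sp)
    {
        for (;;) {
            const struct inst *i = &prog[pc];
            switch (i->op) {
            case CHAR:
                if (*sp != i->c) return 0;  /* wrong byte: this path dies */
                sp++; pc++;
                break;
            case JMP:
                pc = i->x;
                break;
            case SPLIT:                     /* try x first, then y */
                if (run(prog, i->x, sp)) return 1;
                pc = i->y;
                break;
            case MATCH:
                return 1;
            }
        }
    }

    int main(void)
    {
        struct inst prog[] = {       /* "a+b", compiled by hand:    */
            { CHAR,  'a', 0, 0 },    /* 0: consume an 'a'           */
            { SPLIT,  0,  0, 2 },    /* 1: loop back to 0, or go on */
            { CHAR,  'b', 0, 0 },    /* 2: consume a 'b'            */
            { MATCH,  0,  0, 0 },    /* 3: accept                   */
        };
        printf("%d\n", run(prog, 0, "aaab"));   /* prints 1 */
        return 0;
    }

Real engines build the SPLIT/JMP structure from a parsed regex AST and avoid
the exponential backtracking this executor allows, but the "regex as a tiny
assembly program" shape is the same.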

~~~
wyager
I mean, the whole point of regexes (not to be confused with PCREs) is that any
given regex is isomorphic to some canonical finite state machine. It is,
specifically speaking, a tiny description of an FSM over the alphabet of ASCII
characters (or whatever charset you're using).

Interestingly, regexes/FSMs are (IIRC) the most powerful class of machines for
which equivalence is decidable. So if you give me any two regexes, I can tell
you if they match on all the same strings, but this is not true for any more
powerful grammar.
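
For plain DFAs the decision procedure is short enough to sketch: do a BFS
over the product of the two machines and reject as soon as a reachable pair
of states disagrees on acceptance. A toy version in C (the transition
tables are made up; both machines accept strings over {0,1} that end in 0):

    #include <stdbool.h>
    #include <stdio.h>

    #define NSYMS 2
    #define N1 2
    #define N2 3

    /* d*[state][symbol] = next state; a*[state] = accepting? */
    static const int  d1[N1][NSYMS] = {{1,0},{1,0}};
    static const bool a1[N1]        = {false,true};
    static const int  d2[N2][NSYMS] = {{1,2},{1,2},{1,2}};
    static const bool a2[N2]        = {false,true,false};

    static bool equivalent(void)
    {
        bool seen[N1][N2] = {{false}};
        int queue[N1 * N2][2], head = 0, tail = 0;

        queue[tail][0] = 0; queue[tail][1] = 0; tail++;  /* start pair */
        seen[0][0] = true;

        while (head < tail) {
            int p = queue[head][0], q = queue[head][1]; head++;
            if (a1[p] != a2[q])
                return false;  /* some string reaches p and q: a witness */
            for (int s = 0; s < NSYMS; s++) {
                int np = d1[p][s], nq = d2[q][s];
                if (!seen[np][nq]) {
                    seen[np][nq] = true;
                    queue[tail][0] = np; queue[tail][1] = nq; tail++;
                }
            }
        }
        return true;           /* no reachable pair disagrees */
    }

    int main(void)
    {
        puts(equivalent() ? "equivalent" : "not equivalent");
        return 0;
    }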

~~~
halter73
Isn't it possible to decide the equivalence of deterministic pushdown
automata? Wouldn't DPDAs be considered more powerful than FSMs due to the
addition of a stack?

Wikipedia [1] shows there's a paper called "The equivalence problem for
deterministic pushdown automata is decidable" that won the Gödel Prize in
2002. I haven't read the paper, nor do I currently have access to it, though.

[1]
[https://en.wikipedia.org/wiki/Deterministic_pushdown_automat...](https://en.wikipedia.org/wiki/Deterministic_pushdown_automaton#Equivalence_problem)

~~~
kefka
Here you go!

[http://link.springer.com.sci-hub.cc/chapter/10.1007/3-540-63...](http://link.springer.com.sci-hub.cc/chapter/10.1007/3-540-63165-8_221)

It was initially published in 1997.

------
sillysaurus3
Also: a software rasterizer.

Most people refuse to write one because it's so easy not to. Why bother?

It will make you a better coder for the rest of your life.

Let's make a list of "power projects" like this. A bytecode interpreter, a
software rasterizer... What else?

~~~
Jasper_
Tracks I've done and suggested to friends and colleagues as learning
experiences:

* Compression (lossless, lossy, image, audio, texture, video)

* Languages (bytecode interpreter, AST interpreter, parser/lexer for a simple language, simple JIT, understanding instruction scheduling)

* DSP programming (writing programs for fast, branchless math)

* Comfort with binary and binary formats (start with packfiles .zip/.tar, move onto reverse engineering simple formats for e.g. games)

* Understanding the difference between RAM and address spaces (e.g. understanding virtual memory, mmap, memory-mapped IO, dynamic linking, the VDSO, page faulting, shared memory)

* Device drivers (easier on Linux, understanding userspace/kernel interaction, ioctls, how hardware and registers work, how to read spec sheets and hardware manuals)

* Graphics (modern software rasterizer that's not scanline-based, understanding 3D and projective transforms, GPU programming and shaders, basic lighting (reflection and illumination) models, what "GPU memory" is, what scanout is, how full scenes are accumulated all along the stack)

I could go heavily into depth for any one of these. Ask me questions if you're
interested! They're all fun and I always have more to learn.

Also, the longer you go down any given track, the more you realize they all
connect in the end.

~~~
Rexxar
Concerning the last point, do you know of any nice examples of
non-scanline-based 2D renderers?

~~~
nadam
A very informative tutorial on high performance edge-function based
rasterization:

[https://fgiesen.wordpress.com/2013/02/17/optimizing-sw-occlu...](https://fgiesen.wordpress.com/2013/02/17/optimizing-sw-occlusion-culling-index/)

(articles 6, 7 and 8; the others are not closely related)

As far as I know modern hardware has been non-scanline-based for ages btw.
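
The 2D core of the edge-function approach is tiny: a pixel is inside a
triangle iff all three edge functions are non-negative, so you walk a
bounding box instead of tracking scanline spans. A toy sketch in C (ASCII
output; a real rasterizer evaluates the edge functions incrementally and
adds a top-left fill rule):

    #include <stdio.h>

    /* twice the signed area of (a, b, p): positive iff p is to the
     * left of the directed edge a -> b */
    static int edge(int ax, int ay, int bx, int by, int px, int py)
    {
        return (bx - ax) * (py - ay) - (by - ay) * (px - ax);
    }

    int main(void)
    {
        /* a counter-clockwise triangle */
        int x0 = 1, y0 = 1, x1 = 12, y1 = 3, x2 = 5, y2 = 10;

        for (int y = 0; y < 12; y++) {      /* bounding-box walk */
            for (int x = 0; x < 14; x++) {
                int w0 = edge(x1, y1, x2, y2, x, y);
                int w1 = edge(x2, y2, x0, y0, x, y);
                int w2 = edge(x0, y0, x1, y1, x, y);
                putchar(w0 >= 0 && w1 >= 0 && w2 >= 0 ? '#' : '.');
            }
            putchar('\n');
        }
        return 0;
    }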

~~~
Rexxar
Thanks, this looks very interesting. But it seems more 3D-oriented than
2D-oriented.

------
tominous
I love the author's meta-idea of refusing to accept that unfamiliar things are
black boxes full of magic that can't be touched.

A great example of this mindset is the guy who bought a mainframe. [1]

Refuse to be placed in a silo. Work your way up and down the stack and you'll
be much better placed to solve problems and learn from the patterns that
repeat at all levels.

[1]
[https://news.ycombinator.com/item?id=11376711](https://news.ycombinator.com/item?id=11376711)

~~~
stephengillie
Everything is made from smaller components. Understand each of those
components better and you'll understand the entire system better.

Sometimes, you can use the errors that come out the end to tell which
component has the issue. For instance, if a web site gives a 502 error, the
problem is likely with the load balancer or its connection to the web server.
A 404 is often a missing file or a routing issue on the web server. A 500
frequently means an unhandled error in the application code, such as a failed
connection to the database server. And a 400 means the client sent a malformed
request, which often points back at the site's presentation code building bad
URLs or form data.

~~~
dspillett
_> Everything is made from smaller components. Understand each of those
components better and you'll understand the entire system better._

This. No matter how specialised you are (or want to be), always strive to
have at least a basic understanding of the full stack and everything else that
your work touches through a couple of levels of indirection (including the
wetware: in commercial contexts, that means a good understanding of your
client's business even if you aren't close to being client-facing). It will
help you produce much more useful/optimal output, and you can be a lot more
helpful when your colleagues/compatriots/partners/whoever hit a technical
problem. Heck, at the logical extreme, a little cross-discipline understanding
could even lead you to discover a better method of doing _X_ that strips out
the need for _Y_ altogether, revolutionising how we do _Z_.

Of course don't go overboard unless you are truly a genius... Trying to keep
up with everything _in detail_ is a sure-fire route to mental burn-out!

------
wahern
Two approaches are severely underused in the software world:

1) Domain-specific languages (DSLs)

2) Virtual machines (or just explicit state machines more generally)

What I mean is, a lot of problems could be solved cleanly, elegantly, more
safely, and more powerfully by using one (or both) of the above. The problem
is that when people think DSL or VM, they think big (Scheme or JVM) instead of
thinking small (printf). A DSL or VM doesn't need to be complex; it could be
incredibly simple but still be immensely more powerful than coding a solution
directly in an existing language using its constructs and APIs.

Case in point: the BSD hexdump(1) utility. POSIX defines the od(1) utility for
formatting binary data as text, and it takes a long list of complex command-
line arguments. The hexdump utility, by contrast, uses a simple DSL to specify
how to format output. hexdump can implement almost every conceivable output
format of od and then some using its DSL. The DSL is basically printf format
specifiers combined with looping declarations.
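
For a taste of the DSL, this is (close to) the format the BSD hexdump(1)
man page gives as a reimplementation of its own -x option; file.bin is a
placeholder:

    hexdump -e '"%07.7_ax  " 8/2 "%04x " "\n"' file.bin

Each unit is an iteration count, a slash, a byte count, and a printf-style
format: print the input offset (_ax), then loop eight times printing two
bytes at a time as %04x, then emit a newline. The formatter just cycles
through these units until input runs out, which is exactly the structure
that translates naturally to bytecode.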

I got bored one day and decided to implement hexdump as a library (i.e. "one
hexdump to rule them all"), with a thin command-line wrapper that emulates the
BSD utility version. Unlike BSD hexdump(1) or POSIX od(1), which implement
everything in C in the typical manner, I decided to translate the hexdump DSL
into bytecode for a simple virtual machine.

    
    
      http://25thandclement.com/~william/projects/hexdump.c.html
    

The end result was that my implementation was about the same size as either of
those, but 1) could be built as a shared library, command-line utility, or Lua
module, 2) is more performant (formats almost 30% faster for the common
outputs, thanks to a couple of obvious, easy, single-line optimizations the
approach opened up) than either of the others, and 3) is arguably easier to
read and hack on.

Granted, my little hexdump utility doesn't have much value. I still tend to
rewrite a simple dumper in a couple dozen lines of code for different projects
(I'm big on avoiding dependencies), and not many other people use it. But I
really liked the experience and the end result. I've used simple DSLs, VMs,
and especially explicit state machines many times before and after, but this
one was one of the largest and most satisfying.

The only more complex VM I've written was for an asynchronous I/O SPF C
library, but that one is more difficult to explain and justify, though I will
if pressed.

~~~
bachback
Yes, absolutely agree. In LISP it's obvious.

1. Every function is a tiny VM already. Every macro is a layer on top of a
"compiler" that lets you redesign the language. LISP gives you much more
power, precisely because every program is its own DSL, and all the power of
the language is available within that DSL.
[http://www.paulgraham.com/avg.html](http://www.paulgraham.com/avg.html)

2. In SICP they show how to build anything from LISP constructs. The elegant
thing is that LISP actually needs only a very few machine primitives.
[https://mitpress.mit.edu/sicp/full-text/book/book-Z-H-30.htm...](https://mitpress.mit.edu/sicp/full-text/book/book-Z-H-30.html#%_chap_5)

The most powerful construct I've seen is a combination of these two:
Clojure's core.async is a macro which transforms any program into state
machines. I think that kind of power gives you a 20-30x advantage in
business-type applications. In fact, I've seen C++ programmers spend a decade
discovering similar things by themselves. I strongly believe everyone should
know at least a little about LISP.

~~~
wahern
I whole-heartedly agree with your points. The ability to transform programs
programmatically is extremely powerful. But, FWIW, I tend to see that as a
form of generic programming.

When I think DSL I think regular expressions (especially PCREs) or SQL, where
the syntax is tailor-made for the specific task at hand.

The problem with in-program transformations is that you're still largely bound
to the syntax of the host language. That's particularly the case with
s-expressions. That binding has amazing benefits (e.g. when auto-transforming
programs into resumable state machines) but also puts constraints on the
language syntax you can employ. That's not a problem when you're dealing with
experienced engineers and devising technical solutions, but people tend to
stay away from DSLs generally because they fear the burden imposed on others
(including their future selves) having to learn a new language, how to use it,
and how it integrates within a larger framework. You minimize that burden and
make DSLs more cost-effective by maximizing the expressiveness of the DSL in
the context of its purpose, and minimize switching costs by making the
delineation between the host language and the DSL clear and obvious.

So, for example, in concrete terms you'd generally implement arithmetic
expressions using infix notation. You can sorta implement infix notation in
LISP, but you still have the s-expression baggage (such as it is; it's
obviously not baggage from the expert's standpoint), which among other things
makes it difficult to know where the host language ends and the DSL begins.

Lua had a head start on most popular languages with its LPeg PEG parser, which
makes it trivial to parse custom DSLs into ASTs. For all the limitations of
PEGs[1], they're just amazingly powerful. But to drive home my previous point,
while expert LPeg users use the Snowball-inspired pattern where you
instantiate PEG terms as Lua objects and build grammars using Lua's arithmetic
operators (with operator overloading "a * b" means concatenation instead of
multiplication, and "a^1" means repeat 1 or more times), newbies tend to
prefer the specialized syntax which uses a notation similar to the original
PEG paper and to typical regular expression syntax. (LPeg includes a small
module to parse and compile that notation.)

Converting the AST into usable code is another matter, and that's an area
where LISP shines for sure. And now that I think about it, the best of both
worlds might be a PEG parser in LISP. But I'm completely ashamed to say I
haven't used LISP or any LISP derivatives.

[1] Yes, implementing left-recursion can be taxing using a PEG. But the syntax
is so intuitive and powerful that the alternatives just can't compete with it.
Regular expressions are limited, too, but PEGs (or anything more
sophisticated) will never replace them, because their expressiveness is just
too powerful and cost-effective within certain domains. Instead we supplement
regular expressions rather than replace them; and if PEGs continue catching on
I expect PEGs to be supplemented rather than replaced.

------
gopalv
The project that affected my thinking the most was a bytecode interpreter[1].

I've had use for that knowledge nearly fifteen years later - most of the
interesting learnings about building one have been about the inner loop.

The way you build a good interpreter is upside-down compared to most of tech:
the simpler system often runs faster than anything more complicated.

Because of working on that, then writing my final paper about the JVM,
contributing to Perl6/Parrot and then moving onto working on the PHP bytecode
with APC, my career went down a particular funnel (still with the JVM now, but
a logical level above it).

Building interpreters makes you an under-techtitect, if that's a word. It
creates systems from the inner loop outwards rather than leaving the innards
of the system for someone else to build - it produces a sort of double-vision
between the details and the actual goals of the user.

[1] - "Design of the Portable.net interpreter"

~~~
_RPM
Interesting. One thing that I still haven't solved is the "break" and
"continue" statements inside loops. For a break statement, it seems like it
would just be the same as a JMP with an address as the operand, but there would
need to be some sort of registration of "The VM is in the loop now, and the
break address is X", and continue would also be a JMP with an address to the
top of the code for the loop.

I haven't implemented those in my system yet, and also have no idea how Python
or PHP does it.

Is PHP's VM a stack-based one? I do read the Zend/ directory of PHP's source,
but it is really hard to follow and there is virtually no documentation on the
VM.

~~~
wahern
You can implement those as part of a linking phase during bytecode generation:
emit a placeholder value (e.g. 0) for the jump address and when you've
finished compiling the block, go back and fill in the placeholder with the
correct address/offset.

That's relatively easy when implementing an assembler for your opcodes. Just
keep track of symbolic labels and their associated jump points (as a simple
array or linked list) and process (i.e. finalize or "link") the jump points
when the label address becomes known. My "assemblers" often have constructs
like:

    
    
      L0
      ...
      J1
      ...
      J0
      ...
      L1
    

where L? registers a symbolic jump destination (i.e. x.label[0].offset =
label_offset) and J? emits an unconditional jump and registers a link request
(i.e. push(x.label[1].from, jump_opcode_offset)). When a block is finished all
the offsets are known; you just process things like

    
    
      /* for every label, patch every jump that targets it */
      for (i = 0; i < x.nlabel; i++) {
        for (j = 0; j < x.label[i].nfrom; j++) {
          patch_in_offset(x.label[i].from[j], x.label[i].offset);
        }
      }
    
    

Knowing when to emit symbolic label and jump instructions from the AST is a
little more involved, but no more than analyzing the AST for anything else.

Supporting computed gotos would require much more bookkeeping, I'd imagine,
and I'm not surprised few languages support that construct. Or maybe not... I
haven't really thought it through.

One cool thing about this whole exercise is that it helps to demonstrate why
generating some intermediate representation can be easier (conceptually and
mechanically) than directly generating runnable code in a single pass. It
seems more complex but it really makes things easier.
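
To make the emit/patch dance concrete, here is a self-contained toy (a
stripped-down version of the scheme above, with invented opcodes) that
compiles a three-iteration loop: the break's jump is emitted with a
placeholder target and patched once the end of the loop is known.

    #include <stdio.h>

    enum { OP_PUSH, OP_PRINT, OP_DEC_JZ, OP_JMP, OP_HALT };

    static int code[64], ncode;

    /* append opcode + operand; return the index of the operand slot */
    static int emit(int op, int arg)
    {
        code[ncode++] = op;
        code[ncode++] = arg;
        return ncode - 1;
    }

    int main(void)
    {
        /* compile: x = 3; loop { print x; if (--x == 0) break; } */
        emit(OP_PUSH, 3);
        int top = ncode;               /* loop head: "continue" target */
        emit(OP_PRINT, 0);
        int fix = emit(OP_DEC_JZ, 0);  /* break: placeholder target    */
        emit(OP_JMP, top);             /* back edge: target known      */
        code[fix] = ncode;             /* link phase: patch the break  */
        emit(OP_HALT, 0);

        /* execute: prints x=3, x=2, x=1 */
        int pc = 0, x = 0;
        for (;;) {
            int op = code[pc++], arg = code[pc++];
            switch (op) {
            case OP_PUSH:   x = arg; break;
            case OP_PRINT:  printf("x=%d\n", x); break;
            case OP_DEC_JZ: if (--x == 0) pc = arg; break;
            case OP_JMP:    pc = arg; break;
            case OP_HALT:   return 0;
            }
        }
    }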

~~~
rurban
Computed gotos defer the search for the label from compile time to run-time.
So you need to hash the targets, and then jump to it.

It all depends on how your interpreter PC (program counter) works. Some have
just opcode offsets (like a real PC, e.g. %eip; lua, ruby, ...), some have
opcode addresses (perl, lisp, ...).

With offsets you can encode jumps relatively, which is a huge advantage. With
addresses you have to remain absolute, which means the data needs a full
pointer (bad for the cache), and you cannot move the code around for
optimizations afterwards. With offsets you get a cache-friendly op-array; with
addresses you have a full-blown tree (AST) or linked list, with its
cache-unfriendly pointer-chasing overhead.

But with full addresses you can avoid the big switch-loop overhead in an
interpreter: you just pass around the next pointer instead of incrementing the
PC. But then it gets interesting how to keep track of the state of the stack
depth.
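
The "pass around the next pointer" style can be sketched with the GCC/Clang
labels-as-values extension (nonstandard C; the opcode set here is invented):
each instruction carries the address of its handler, so dispatch is one
indirect goto per instruction instead of a switch inside a loop.

    #include <stdio.h>

    int main(void)
    {
        /* program: push 2; push 3; add; print; halt -- each "opcode"
         * is stored directly as the address of its handler */
        struct { void *h; int arg; } prog[] = {
            { &&op_push, 2 }, { &&op_push, 3 },
            { &&op_add, 0 }, { &&op_print, 0 }, { &&op_halt, 0 },
        };
        int stack[16], sp = 0, pc = 0;

    #define NEXT goto *prog[pc].h   /* no central switch */
        NEXT;

    op_push:  stack[sp++] = prog[pc++].arg; NEXT;
    op_add:   sp--; stack[sp - 1] += stack[sp]; pc++; NEXT;
    op_print: printf("%d\n", stack[--sp]); pc++; NEXT;   /* prints 5 */
    op_halt:  return 0;
    }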

------
_RPM
I saw the matrix after I first implemented a virtual machine. I recommend
everyone do it, because it will teach you a lot about how code is executed
and transformed from the syntax to the actual assembly/bytecode. A
stack-based virtual machine is so simple, yet it takes a lot of thinking to
understand how one works (or maybe I'm just not that smart).

It's interesting that he implemented function calls via a jump. In my VM a
function is just mapped to a name (variable), so functions are first class.
When the VM gets to a CALL instruction, it loads the bytecode from the hash
table (via a lookup of the name).

Since this is a procedural language where statements can be executed outside
of a function, implementing the functions as a jump would be difficult because
there would need to be multiple jumps between the function definition and
statements that aren't in a function.

I really wish my CS program had a compilers class, but unfortunately they
don't, so I had to learn everything on my own.

~~~
chii
A CS education is incomplete without a semester on writing a simple compiler,
and a corresponding emulator for the output of said compiler.

~~~
zeveb
I think I agree. I really wish that I'd been walked through The Right Way (or
just A Right Way) to write a compiler and bytecode interpreter by a professor.
Oh well, it's fun to learn on my own!

------
briansteffens
Nice post! I really enjoy playing around with things like this. It's amazing
how little is needed to make a language/interpreter capable of doing virtually
anything, even if not elegantly or safely. As long as you can perform
calculations, jump around, and implement some kind of stack your language can
do just about anything.
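
As a tiny illustration of that point, a complete interpreter with exactly
those ingredients - arithmetic, a conditional jump, and a stack - fits in a
page. The opcodes are invented for the example; this program counts down
from 5:

    #include <stdio.h>

    enum { PUSH, DUP, DEC, PRINT, JNZ, HALT };

    int main(void)
    {
        int prog[] = {
            PUSH, 5,
            DUP,        /* address 2: loop head */
            PRINT,
            DEC,
            DUP,
            JNZ, 2,     /* jump back while nonzero */
            HALT,
        };
        int stack[16], sp = 0, pc = 0;

        for (;;) {
            switch (prog[pc++]) {
            case PUSH:  stack[sp++] = prog[pc++]; break;
            case DUP:   stack[sp] = stack[sp - 1]; sp++; break;
            case DEC:   stack[sp - 1]--; break;
            case PRINT: printf("%d\n", stack[--sp]); break;
            case JNZ:   if (stack[--sp] != 0) pc = prog[pc]; else pc++; break;
            case HALT:  return 0;
            }
        }
    }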

I recently threw something together sort of like this, just for fun (I like
your interpreter's name better though):
[https://github.com/briansteffens/bemu](https://github.com/briansteffens/bemu)

It's crazy how much these little projects can clarify your understanding of
concepts that seem more complicated or magical than they really are.

------
anaccountwow
This is a required homework assignment for a freshman class at CMU.
[https://www.cs.cmu.edu/~fp/courses/15122-s11/lectures/23-c0v...](https://www.cs.cmu.edu/~fp/courses/15122-s11/lectures/23-c0vm.pdf)

Granted, it has some parts already written in the interest of time...

~~~
Arcten
What's even cooler is that after building this VM (for the C0 language) as a
freshman, you can come back as a junior/senior and write a compiler for that
language in 15-411. It's a very cool way of going full circle.

------
oops
Nice read! Reminds me of nand2tetris that was posted not too long ago
[https://news.ycombinator.com/item?id=12333508](https://news.ycombinator.com/item?id=12333508)

(You basically implement every layer starting with the CPU and finishing with
a working Tetris game)

------
douche
This reminds me a little bit of my computer architecture class. We started at
logic gates in a simulator[1], and worked our way up from there to flip-flops
and adders, memory chips, a simple ALU, and eventually a whole 8-bit CPU in
the simulator. I want to think that we were even writing assembly for it,
loading the programs into the simulated memory, and executing them. It was a
great way to get a sense of how everything works, and I think it's when
C-style pointers really clicked for me.

[1] this one, IIRC
[https://sourceforge.net/projects/circuit/](https://sourceforge.net/projects/circuit/)

------
foobarge
I did something similar 21 years ago: a C interpreter targeting a virtual
machine. The runtime had a dynamic equivalent of libffi to call into native
code and use existing native libraries. I added extensions to run code blocks
in threads, so the solution to the dining philosophers problem was very
elegant. Back in the day, not having libffi meant generating assembly on the
fly for SPARC, MIPS, PA-RISC, and i386. Fun times. That C interpreter was used
to extend a CAD package.

------
reidrac
I wrote a VM for the 6502 for fun, and it was one of the most interesting and
satisfying projects I've ever done in my free time.

It is very close to a bytecode interpreter, except that it comes with a
specification that is actually the opcode list of the MOS 6502 (plus a few
details you need to take into account when implementing that CPU).

Besides, there are cross-compilers that allow you to generate 6502 code from C
for your specific VM (see cc65).
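
The fetch-decode-execute loop ends up looking like any other bytecode
interpreter, just with the documented opcode values. A toy step loop using
two real immediate-mode opcodes (LDA #imm = 0xA9, ADC #imm = 0x69), with
BRK abused as a halt and the flag handling cut down to carry:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint8_t mem[] = { 0xA9, 0x20,   /* LDA #$20          */
                          0x69, 0x05,   /* ADC #$05          */
                          0x00 };       /* BRK: stop the toy */
        uint8_t a = 0, carry = 0;
        uint16_t pc = 0;

        for (;;) {
            switch (mem[pc++]) {
            case 0xA9:                  /* LDA immediate */
                a = mem[pc++];
                break;
            case 0x69: {                /* ADC immediate */
                uint16_t r = a + mem[pc++] + carry;
                carry = r > 0xFF;
                a = (uint8_t)r;
                break;
            }
            case 0x00:                  /* BRK */
                printf("A = $%02X\n", a);   /* prints A = $25 */
                return 0;
            }
        }
    }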

------
memsom
I did this in C#. It was a lunchtime project at work a couple of years ago.
It was fun. I still want to do a V2 and remove all of the shortcuts I put in
because I didn't want to write code for the stack and things like that. At the
end of the day, my solution was spookily similar to this one - the 32-bit
instructions - well, yeah, mine was the same! It was just simpler. I did have
a few general-purpose registers (V1, V2 and V3, I think) and I did have
routines to handle bytes, words and such like. So, stuff like this (as a
random example I pulled from the source):

    ORG START

    START:  ST_B 10
    LOOP:   ST_B 10
            ADD_B          ;;value will go back on stack
            LD_B V1
            SM_B V1        ;;value we use next loop
            SM_B V1        ;;value we compare
            SM_B V1        ;;value we echo to console
            TRP 21         ;;writes to the console
            ST_S '',13,10,$
            TRP 21         ;;writes to the console
            CMP_B 50       ;;compares stack to the constant
            JNE LOOP
            ST_S 'The End',13,10,$
            TRP 21         ;;writes to the console
    END

------
pka
I think a lot of the complexity of writing a compiler stems from the use of
inappropriate tools. E.g. I would rather kill myself than write a lexer in C
(without tools like lex or yacc), but using parser combinators it's a rather
trivial task.

Similarly, annotating, transforming, folding, pattern matching on, CPS
transforming etc. the produced AST is pretty trivial in a language that
supports these constructs. And again, a nightmare in C.

That leaves codegen, but using the right abstractions it turns into a very
manageable task as well.

Here's a compiler written in Haskell for LLVM [0].

[0] [http://www.stephendiehl.com/llvm](http://www.stephendiehl.com/llvm)

~~~
TazeTSchnitzel
> I would rather kill myself than write a lexer in C

I've written several lexers in C-like languages; it's not that painful. I
wouldn't dare write a parser, though.

------
philippeback
Parsers made easy and pretty much interactive:

[http://www.lukas-renggli.ch/blog/petitparser-1](http://www.lukas-renggli.ch/blog/petitparser-1)

[http://www.themoosebook.org/book/internals/petit-parser](http://www.themoosebook.org/book/internals/petit-parser)

This includes the dynamic generation of blocks-and-arrows style things...

------
philippeback
Soulmate of yours here:
[https://clementbera.wordpress.com](https://clementbera.wordpress.com)

Lots of optimizations going on for the OpenSmalltalk VM.

[https://github.com/OpenSmalltalk/opensmalltalk-vm](https://github.com/OpenSmalltalk/opensmalltalk-vm)

Interesting bit: the VM is written in Slang, transformed into C, and then
compiled.

So you can livecode your VM. In the VM simulator.

------
elcct
I did something similar in the distant past; that is, I wrote a compiler for
a subset of C (functions, standard types, pointers) targeting an imaginary
assembly language, and then a bytecode interpreter for it. It was awesome fun,
but I also got so into it that my - then - girlfriend started to question my
commitment to the relationship. So be careful, this is a really interesting
thing to do :)

------
rosstex
In this same vein, I recommend coding an emulator! It can be an excellent
experience.

[http://www.multigesture.net/articles/how-to-write-an-emulato...](http://www.multigesture.net/articles/how-to-write-an-emulator-chip-8-interpreter/)

------
curtfoo
Yes, I wrote a parser/compiler and interpreter for a custom domain-specific
language, and it had a similar effect on my career. Lots of fun!

Okay, I guess technically I used a parser generator, which I then modified to
build an AST and convert it into assembly-like code that fed the interpreter.

------
reacweb
Bill Gates also started with an interpreter (a BASIC interpreter). Many parts
of early Windows applications were developed in p-code, and Visual Basic was
an important part of Microsoft's success.

------
loeg
I like implementing emulators, because the toolchain and architecture
specification are all there already. You get to implement what is basically a
little embedded CPU.

------
dpratt
I'd add writing a driver for a non-trivial binary protocol - I ended up
implementing a JVM driver for Cassandra a few years ago, and it was a blast.

~~~
voltagex_
Working with binary data is a good test of your high-level language skills.
When I was playing around with DNS, I wrote terrible code like
[https://github.com/voltagex/junkcode/blob/master/CSharp/DNS/...](https://github.com/voltagex/junkcode/blob/master/CSharp/DNS/BaxterWorks.DNS.Parsers/Query.cs).
A better way to do it is
[https://github.com/kapetan/dns/blob/master/DNS/Protocol/Head...](https://github.com/kapetan/dns/blob/master/DNS/Protocol/Header.cs)
- structs, of course.
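
The same struct-per-header idea sketched in C for the RFC 1035 DNS header
(twelve bytes, six big-endian 16-bit fields; the helper names are made up):
reading the fields explicitly sidesteps the endianness and padding traps
that casting the raw buffer to a packed struct invites.

    #include <stdint.h>
    #include <stdio.h>

    struct dns_header {
        uint16_t id, flags, qdcount, ancount, nscount, arcount;
    };

    /* read a big-endian 16-bit field */
    static uint16_t be16(const uint8_t *p)
    {
        return (uint16_t)(p[0] << 8 | p[1]);
    }

    static struct dns_header parse_header(const uint8_t *p)
    {
        struct dns_header h = {
            .id      = be16(p + 0),
            .flags   = be16(p + 2),
            .qdcount = be16(p + 4),   /* questions          */
            .ancount = be16(p + 6),   /* answer records     */
            .nscount = be16(p + 8),   /* authority records  */
            .arcount = be16(p + 10),  /* additional records */
        };
        return h;
    }

    int main(void)
    {
        /* a typical query header: id 0x1234, RD bit set, one question */
        const uint8_t wire[12] = { 0x12, 0x34, 0x01, 0x00,
                                   0x00, 0x01, 0, 0, 0, 0, 0, 0 };
        struct dns_header h = parse_header(wire);
        printf("id=%04x qdcount=%u\n", h.id, h.qdcount);
        return 0;
    }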

I'd also add reading and implementing a protocol from an RFC - it's a great
way to start thinking about design, especially if you read the original RFCs
and work forward through the revisions to see what was kept vs. deprecated.

