
How to write a toy JVM - oftenwrong
https://zserge.com/posts/jvm/
======
lihaoyi
On the topic of toy JVMs, my Metascala project is a JVM implemented in ~4000
lines of Scala that is complete enough to interpret itself!

- [https://github.com/lihaoyi/Metascala](https://github.com/lihaoyi/Metascala)

Interpreting code in Metascala is about 100x slower than just running it, and
interpreting code in Metascala interpreted by Metascala is about 10,000x
slower than just running it. Not going to win any performance benchmarks, but
it's a cool demonstration of how a JVM works.

All the runtime data structures, memory allocation and garbage collection,
method dispatch logic, stack trace management, exceptions, inheritance, object
layouts, etc. are all implemented in a relatively small amount of relatively
simple code.

For example, here is the implementation of the heap, which allocates the VM's
objects inside a big byte array and has a simple copying semispace garbage
collector to clean them up:

- [https://github.com/lihaoyi/Metascala/blob/master/src/main/sc...](https://github.com/lihaoyi/Metascala/blob/master/src/main/scala/metascala/Heap.scala)
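
For readers who have not seen a copying semispace collector before, here is a rough sketch of the idea in Python. It is not Metascala's code: the class name, the object layout (one header word holding the field count, then that many reference fields) and the use of a flat list of integers in place of a byte array are all assumptions made for illustration.

```python
class SemispaceHeap:
    """Toy heap: objects live in one half of a flat list, and a collection
    copies the reachable ones into the other half (Cheney's algorithm)."""

    NULL = -1  # assumed encoding of a null reference

    def __init__(self, semispace_words):
        self.size = semispace_words
        self.mem = [0] * (2 * semispace_words)
        self.from_start = 0              # start of the active half
        self.to_start = semispace_words  # start of the idle half
        self.free = 0                    # bump pointer inside the active half
        self.roots = []                  # addresses the "VM" still holds

    def alloc(self, num_fields):
        need = 1 + num_fields            # header word + fields
        if self.free + need > self.from_start + self.size:
            self.collect()
            if self.free + need > self.from_start + self.size:
                raise MemoryError("toy heap exhausted")
        addr = self.free
        self.mem[addr] = num_fields      # header: non-negative field count
        for i in range(num_fields):
            self.mem[addr + 1 + i] = self.NULL
        self.free += need
        return addr

    def _copy(self, addr):
        if addr == self.NULL:
            return self.NULL
        header = self.mem[addr]
        if header < 0:                   # already moved: header is a forwarding pointer
            return -header - 1
        new_addr = self.free
        n = 1 + header
        self.mem[new_addr:new_addr + n] = self.mem[addr:addr + n]
        self.free += n
        self.mem[addr] = -(new_addr + 1)  # leave a forwarding pointer behind
        return new_addr

    def collect(self):
        # Swap the halves, then copy everything reachable from the roots.
        self.from_start, self.to_start = self.to_start, self.from_start
        self.free = scan = self.from_start
        self.roots = [self._copy(r) for r in self.roots]
        while scan < self.free:          # scan copied objects, fixing their fields
            n = self.mem[scan]
            for i in range(1, n + 1):
                self.mem[scan + i] = self._copy(self.mem[scan + i])
            scan += 1 + n
```

Anything not reachable from `roots` is simply never copied, which is the whole "cleanup" step. The price is that only half of the memory is usable at any time.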

~~~
rightbyte
I like the commented-out print statements all over the place. They make the code
seem alive somehow.

~~~
z3t4
Most of the performance overhead probably comes from the print messages, as
those are likely synchronous.

------
nathell
Implementing a VM is a unique experience. You take some bytecode that's
initially a meaningless blob, sketch out an execution environment, and start
implementing opcodes, one by one, and the blob actually starts doing real
things.
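
As a minimal sketch of that fetch-decode-execute loop (in Python, using a handful of real JVM opcode numbers rather than the Z-machine's), something like the following is enough to make a blob of bytes "do things". The function name and the decision to ignore 32-bit overflow are assumptions made to keep it short.

```python
def run(code, local_vars):
    """Interpret a tiny subset of JVM bytecode and return the ireturn value."""
    stack, pc = [], 0
    while True:
        op = code[pc]
        pc += 1
        if 0x03 <= op <= 0x08:            # iconst_0 .. iconst_5
            stack.append(op - 0x03)
        elif op == 0x10:                  # bipush <signed byte operand>
            b = code[pc]
            pc += 1
            stack.append(b - 256 if b > 127 else b)
        elif op in (0x1A, 0x1B):          # iload_0, iload_1
            stack.append(local_vars[op - 0x1A])
        elif op == 0x60:                  # iadd
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == 0x68:                  # imul
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == 0xAC:                  # ireturn
            return stack.pop()
        else:
            raise NotImplementedError(f"opcode 0x{op:02x}")


# (a + b) * 2: iload_0, iload_1, iadd, iconst_2, imul, ireturn
program = bytes([0x1A, 0x1B, 0x60, 0x05, 0x68, 0xAC])
result = run(program, [3, 4])
```

Each new `elif` branch is one more opcode "level" cleared; the `NotImplementedError` tells you which one the blob wants next.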

Back at uni, I did this with the Z-machine [1], in (non-idiomatic, newbie)
Haskell [2], using Zork 1 as the blob. Sixteen years later, I still remember the
elation when my interpreter first printed out the familiar message:

 _You are standing in an open field west of a white house, with a boarded
front door._

[1]:
[https://en.wikipedia.org/wiki/Z-machine](https://en.wikipedia.org/wiki/Z-machine)

[2]: [https://github.com/nathell/haze](https://github.com/nathell/haze)

~~~
AndrewDavis
It truly is unique. I wrote a CHIP-8 [1] interpreter shortly after finishing my
first year of university, when I was still a very novice programmer.

My implementation was poor even by novice standards; I knew there was a lot of
spaghetti, but I didn't mind. Implementing each opcode was like beating a level
in a video game, and it worked: not perfectly, and with some unsolved bugs.
Tetris worked flawlessly though.

[1] [https://en.wikipedia.org/wiki/Chip-8](https://en.wikipedia.org/wiki/Chip-8)

------
exabrial
The opening line confuses me... The JVM is one of the fastest, most
well-established, best-documented, and most widely deployed platforms in the
world. Hundreds of languages run on it, quite quickly.

~~~
TedDoesntTalk
I don't think English is his first language. Go easy.

~~~
exabrial
Tone is hard to communicate over the internet. I wasn't attacking him;
apologies. My comment was purely critiquing the statement itself.

------
afandian
I bought the JVM Specification book some years ago. It was fun holiday reading
(seriously), seeing how the bytecode was put together, how try-catch blocks
really work, etc. It's quite readable as general interest, if you're into that
kind of thing.

I don't think it's ever been of much actual use to me in programming, but it was
nice-to-have background knowledge.

~~~
bArray
You mean like this book?
[https://docs.oracle.com/javase/specs/jvms/se8/jvms8.pdf](https://docs.oracle.com/javase/specs/jvms/se8/jvms8.pdf)

I'm very tempted to pick a hard copy up!

~~~
pjmlp
I would advise the up to date version though.

[https://docs.oracle.com/javase/specs/jvms/se14/html/](https://docs.oracle.com/javase/specs/jvms/se14/html/)

~~~
elric
Can't seem to find a dead-tree version of this one :-(

~~~
bArray
Yeah, I'm quite interested in having a dead tree version to read. I guess I
could send the PDF off to a book maker - not entirely sure of the legalities
though...

------
bArray
I fell down the rabbit hole trying to answer "what is the smallest JVM that
could be implemented".

I came across TinyVM [1], made for some Lego system, which uses about 10 kB of
RAM. I was wondering about porting it to other micro-controllers...

[1] [http://tinyvm.sourceforge.net/](http://tinyvm.sourceforge.net/)

------
on_and_off
So why CAFEBABE? Is it a random hex value that happens to look like words, or
is it intentional?

Edit: found it: [https://dzone.com/articles/the-magic-word-in-java-cafebabe](https://dzone.com/articles/the-magic-word-in-java-cafebabe)
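
For context, the magic number is the first four bytes of every class file, followed by the two-byte minor and major version numbers, all big-endian. A small Python sketch of checking it (the function name is made up for illustration):

```python
import struct

CLASS_MAGIC = 0xCAFEBABE


def read_class_header(data):
    """Return (major, minor) version from the first 8 bytes of a class file."""
    magic, minor, major = struct.unpack(">IHH", data[:8])
    if magic != CLASS_MAGIC:
        raise ValueError(f"not a class file: magic is 0x{magic:08X}")
    return major, minor


# Header bytes as a Java 8 compiler would write them (major version 52).
header = bytes.fromhex("cafebabe00000034")
version = read_class_header(header)
```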

~~~
mettamage
I never thought of CAFE as a hex word. I always knew about DEADBEEF and so on.

Now I feel that there should be a website that lists all kinds of fun hex
words.

Like CAFEFACE

~~~
Symbiote
We don't really need a website to do it:

grep -i '^[0-9a-f]*$' /usr/share/dict/words

~~~
raverbashing
I'd say you might want to grep for [0134567] instead (being charitable) and
maybe use sed's y command to find the l33t equivalents:

0 - o

1 - I

3 - E (redundant)

4 - A (redundant)

5 - S

6 - G

7 - T
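
A rough Python take on the same idea, combining the hex-only filter with the substitutions listed above (only the non-redundant ones are needed). The function name is hypothetical, and a word list has to be supplied by the caller:

```python
# Letters that are not hex digits but have a digit look-alike.
LEET = str.maketrans("oisgt", "01567")
HEX_DIGITS = set("0123456789abcdef")


def to_hex_word(word):
    """Return the word spelled with hex digits, or None if it can't be."""
    candidate = word.lower().translate(LEET)
    if candidate and set(candidate) <= HEX_DIGITS:
        return candidate.upper()
    return None


words = ["cafe", "babe", "coffee", "java", "toast"]
hex_words = [h for h in map(to_hex_word, words) if h is not None]
```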

------
m12k
To anyone writing an article like this: Please mention what language you are
using before your first code listing, not four paragraphs after.

~~~
jbreckmckye
I didn't feel I needed to know Go to read the example code.

(Plus, didn't "if err != nil" give it away?)

~~~
afandian
Fun project (probably already been done): an "in which languages is this valid
syntax" machine. On its own, `if err != nil` could parse as Rust and Python.
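
A very small start on such a machine, sketched in Python. Only a Python checker is included, because the standard library ships a Python parser (`ast.parse`); the registry and function names are made up, and checkers for other languages would need external parsers.

```python
import ast


def parses_as_python(snippet):
    """True if the snippet is a complete, syntactically valid Python module."""
    try:
        ast.parse(snippet)
    except SyntaxError:
        return False
    return True


# Hypothetical registry: language name -> checker function.
CHECKERS = {"python": parses_as_python}


def languages_accepting(snippet):
    """List the registered languages whose checker accepts the snippet."""
    return [name for name, check in CHECKERS.items() if check(snippet)]
```

With a whole-module check like this, the bare `if err != nil` is rejected as incomplete; Python accepts it only once a colon and a body follow, as in `if err != nil: pass`.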

------
bogomipz
This was a great read! Do you plan to continue to develop this? I would love
to see more posts if so. I actually think this would be a great book as well.
Cheers.

~~~
suyash
Yes, uncover the magic, good job!

------
giancarlostoro
Outside of the specs for the JVM itself, has anybody studied from alternate
resources to learn more about the JVM? Like any specific videos or
illustrative resources? I love learning about the internals of languages, but
sometimes you need a second person to explain things or some visuals to really
catch it.

~~~
eropple
Pretty much anything Aleksey Shipilev has ever done, tbh. Here's an example:
[https://shipilev.net/blog/2014/jmm-pragmatics/](https://shipilev.net/blog/2014/jmm-pragmatics/)

~~~
jcims
From your link to Shipilev's talk comes this definition of 'nasal demons':

[http://www.catb.org/jargon/html/N/nasal-demons.html](http://www.catb.org/jargon/html/N/nasal-demons.html)

In there is a link to the thread that originated the term from 1992:

[http://groups.google.com/groups?hl=en&selm=10195%40ksr.com](http://groups.google.com/groups?hl=en&selm=10195%40ksr.com)

If you scroll to the bottom you’ll find possibly the most lost soul on the
Internet reviving the most dead thread ever.

------
MuffinFlavored
Include some benchmarks? :) I've always wondered how production/robust JVM
implementations make themselves "faster" after they warm up.

~~~
chrisseaton
> I've always wondered how production/robust JVM implementations make
> themselves "faster" after they warm up.

They compile the bytecode just-in-time to native machine code, using many of
the same techniques as a conventional native code compiler.

~~~
nineteen999
Maybe my knowledge isn't up to date, but I always understood (since HotSpot
anyway) that not all bytecode is necessarily JIT'd to native code.

From the HotSpot Wikipedia page:

"Both VMs compile only often-run methods, using a configurable invocation-
count threshold to decide which methods to compile."[1]

Also see very old discussions at StackOverflow [2][3]. Then of course there are
compilers (e.g. gcj) which compile to native up-front.

[1]
[https://en.wikipedia.org/wiki/HotSpot#Features](https://en.wikipedia.org/wiki/HotSpot#Features)

[2] [https://stackoverflow.com/questions/7100365/why-doesnt-javas...](https://stackoverflow.com/questions/7100365/why-doesnt-javas-jit-compiler-translate-everything-to-native-code)

[3] [https://stackoverflow.com/questions/16568253/difference-betw...](https://stackoverflow.com/questions/16568253/difference-between-jvm-and-hotspot)

~~~
bitcharmer
Only hot code paths get compiled. By default, that's after 10,000 invocations
on a "server" JVM and 1,500 invocations on a "client" JVM.

This is by design, and if you need everything compiled right away you can set
the compilation threshold to 1.
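
A toy model of that invocation-count threshold, sketched in Python. The class and parameter names are invented, "compiling" is simulated by swapping in a different callable, and a real JVM also tracks things this ignores (loop iterations, tiers, deoptimisation).

```python
class TieredMethod:
    """Runs the slow path until the call count reaches the threshold,
    then 'compiles' once and uses the fast path from that call onwards."""

    def __init__(self, interpret, compile_fn, threshold):
        self.interpret = interpret      # slow path: interpret the bytecode
        self.compile_fn = compile_fn    # returns the fast "native" callable
        self.threshold = threshold
        self.calls = 0
        self.compiled = None

    def __call__(self, *args):
        if self.compiled is None:
            self.calls += 1
            if self.calls < self.threshold:
                return self.interpret(*args)
            self.compiled = self.compile_fn()   # method just became hot
        return self.compiled(*args)


method = TieredMethod(
    interpret=lambda x: ("interpreted", x + 1),
    compile_fn=lambda: (lambda x: ("compiled", x + 1)),
    threshold=3,
)
trace = [method(i)[0] for i in range(4)]
```

With a threshold of 1 the very first call already takes the compiled path, which is the "compile everything right away" setting mentioned above.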

I don't see any value in compiling parts of the code that only get executed
during bootstrap.

~~~
nineteen999
> I don't see any value in compiling parts of code that only gets executed
> during bootstrap.

Not disagreeing with you there, since stopping to compile/optimize code at
runtime contributes to sluggish interactive performance.

~~~
bitcharmer
Compilation is performed concurrently with the application in specialised
compilation threads. Only OSR (on-stack replacement) requires stopping
execution.

------
fmakunbound
> what’s missing The other two hundred instructions, the runtime, OOP type
> system, and a few other things.

Also a bytecode verifier.

------
orangepanda
> There are 11 groups of instructions [missing] and most of them are trivial:

> * Conversions (int to short, int to float, …).

float to string should be the most trivial of all

------
praveen9920
I always liked writing VMs.

But we have to agree that the JVM feels like a steam engine running in the age
of electric motors. With virtualisation available cheaply at every level
(hardware, arch, OS and Docker), virtualisation at runtime feels like overhead.

The JVM was originally created for the purpose of 'write once, run anywhere',
which I think can be addressed in alternative ways; look at Go.

~~~
tannhaeuser
"Steam Engine" isn't what I'd call the JVM; it's still a remarkable piece of
software driving hundreds of thousands of business apps with a strong focus on
long-term maintenance, excellent mindshare, and really very good performance
for what it does (though in somewhat of a stasis with Java 9+ deprecation, and
Spring-centric development). However, I get what you mean: the JVM was
originally intended as a portable runtime for set-top boxes at a time when
there were many ISAs (MIPS, PPC, etc.) around, and not just x86 and ARM like
today. I believe Java is still a mandatory part of Blu-ray disc players. It
was also at one time a candidate for running in the browser (also reflected in
the Java/JavaScript naming). Incidentally, Java was rejected in the browser by
browser vendors, and JavaScript became "more like Java" instead, and has
followed a similar path from being a browser language to being used also on
the server-side (something that Netscape started already in 1996 or so).

Edit: I also want to mention that Java was IMO the single piece of tech that
saved the scene from being a Microsoft-only world, and also significantly paved
the way for today's Linux dominance on the server side; Java was picked by many
devs because it helped to keep the door open for migrating to Solaris or Linux
in an increasingly MS-dominated landscape in the mid-'90s.

~~~
praveen9920
I agree with all that about the JVM. It had a considerable impact on the
industry and it is a nice piece of software.

But you missed my point entirely. I did not call the JVM a steam engine in a
derogatory way. On the contrary, the steam engine is an awesome piece of
technology which had a tremendous impact on industry.

But the JVM has issues which more recent runtimes have learnt from and solved
in different ways.

~~~
tannhaeuser
> _But JVM has issues which recent runtimes have learnt and solved in
> different ways._

Such as?

