Interpreting code in Metascala is about 100x slower than just running it, and interpreting code in Metascala interpreted by Metascala is about 10,000x slower than just running it. Not going to win any performance benchmarks, but it's a cool demonstration of how a JVM works.
All the runtime data structures, memory allocation and garbage collection, method dispatch logic, stack trace management, exceptions, inheritance, object layouts, etc. are all implemented in a relatively small amount of relatively simple code.
For example, here is the implementation of the heap, which allocates the VM's objects inside a big byte array and has a simple copying semispace garbage collector to clean them up:
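(This is not the Metascala source itself, just a minimal Python sketch of the semispace idea under some loud assumptions: the heap is a flat list of ints rather than a byte array, and the object layout, the `SemispaceHeap` class, `NULL` and `FORWARDED` are all hypothetical names made up for illustration.)

```python
NULL = -1        # hypothetical null reference
FORWARDED = -2   # hypothetical header marker: object was already copied

class SemispaceHeap:
    """Toy heap. Assumed layout per object: [num_refs, payload, ref0, ref1, ...]."""

    def __init__(self, semi_size):
        self.semi_size = semi_size
        self.mem = [0] * (2 * semi_size)   # both halves in one flat list
        self.from_start = 0                # half currently used for allocation
        self.to_start = semi_size          # idle half, target of the next collection
        self.free = 0                      # bump pointer inside the active half
        self.roots = []                    # addresses the VM holds directly

    def alloc(self, payload, num_refs):
        size = 2 + num_refs
        if self.free + size > self.from_start + self.semi_size:
            self.collect()
            if self.free + size > self.from_start + self.semi_size:
                raise MemoryError("toy heap exhausted")
        addr = self.free
        self.mem[addr] = num_refs
        self.mem[addr + 1] = payload
        for i in range(num_refs):
            self.mem[addr + 2 + i] = NULL
        self.free += size
        return addr

    def _copy(self, addr):
        """Copy one object into the new half, or return its forwarding address."""
        if addr == NULL:
            return NULL
        if self.mem[addr] == FORWARDED:
            return self.mem[addr + 1]
        size = 2 + self.mem[addr]
        new_addr = self.free
        self.mem[new_addr:new_addr + size] = self.mem[addr:addr + size]
        self.free += size
        self.mem[addr] = FORWARDED         # leave a forwarding pointer behind
        self.mem[addr + 1] = new_addr
        return new_addr

    def collect(self):
        # Swap halves, then copy everything reachable from the roots.
        self.from_start, self.to_start = self.to_start, self.from_start
        self.free = self.from_start
        scan = self.free
        self.roots = [self._copy(r) for r in self.roots]
        while scan < self.free:            # scan copied objects, fix their references
            num_refs = self.mem[scan]
            for i in range(num_refs):
                slot = scan + 2 + i
                self.mem[slot] = self._copy(self.mem[slot])
            scan += 2 + num_refs

heap = SemispaceHeap(semi_size=16)
a = heap.alloc(payload=1, num_refs=1)
b = heap.alloc(payload=2, num_refs=0)
heap.alloc(payload=99, num_refs=0)         # never referenced, so it is garbage
heap.mem[a + 2] = b                        # a.ref0 = b
heap.roots.append(a)
heap.collect()
print(heap.free - heap.from_start)         # words still in use after collection
```

Unreachable objects are never visited, so "freeing" them costs nothing; the price is that only half of the memory is usable at any time and every live object moves on each collection.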
This is awesome! I have been working on an interpreter for a toy language, and my next step would be to study the JVM. I'll definitely check your project out.
Implementing a VM is a unique experience. You take some bytecode that's initially a meaningless blob, sketch out an execution environment, and start implementing opcodes, one by one, and the blob actually starts doing real things.
Back at uni, I did this with the ZMachine [1], in (non-idiomatic, newbie) Haskell [2], using Zork 1 as the blob. Sixteen years later, I still remember the elation when my interpreter first printed out the familiar message:
You are standing in an open field west of a white house, with a boarded front door.
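For anyone who hasn't tried it, the "implement opcodes one by one" loop looks roughly like this. It's a minimal Python sketch of a made-up stack machine, not the Z-machine or the JVM; the opcode names and numbering (`PUSH`, `ADD`, `MUL`, `PRINT`, `HALT`) and the `run` function are assumptions invented for illustration.

```python
# Hypothetical opcode numbering for a toy stack machine.
PUSH, ADD, MUL, PRINT, HALT = range(5)

def run(code):
    """Fetch-decode-execute loop; returns the list of values the program printed."""
    stack, out, pc = [], [], 0
    while True:
        op = code[pc]
        pc += 1
        if op == PUSH:                     # operand is the next byte
            stack.append(code[pc])
            pc += 1
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == MUL:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == PRINT:
            out.append(stack.pop())
        elif op == HALT:
            return out
        else:
            # An opcode you have not implemented yet: the next "level" to beat.
            raise ValueError(f"unimplemented opcode {op} at offset {pc - 1}")

# The "meaningless blob": computes (2 + 3) * 4 and prints it.
program = bytes([PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, PRINT, HALT])
print(run(program))
```

The `else` branch is where most of the fun is: run a real program, see which opcode it dies on, implement that one, repeat.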
It truly is unique. I wrote a chip8[1] interpreter shortly after finishing my first year of university, when I was still a very novice programmer.
My implementation was poor even by novice standards. I knew there was a lot of spaghetti, but I didn't mind. Implementing each opcode was like beating a level in a video game, and it worked, though not perfectly and with some unsolved bugs. Tetris worked flawlessly though.
I did this with the Z80 machine code variant that runs in the GameBoy. I defined my goal as: run the boot ROM until it jumps to the game. The boot ROM makes a sound and moves a logo on the screen. I'm not the first person to do this and won't be the last. Interesting projects you can do once you get one going: study why Android doesn't run JVM bytecode, or come up with a better bytecode for Java/Scala.
The opening line confuses me... The JVM is one of the fastest, most well-established, well-documented, and widely deployed platforms in the world. Hundreds of languages run on it, quite quickly.
"Whether we like it or not, but Java is one of the most widely used programming languages. However, since most of the applications in Java are either too boring or too complex - not every Java developer has enough curiosity to look under the hood and see how JVM works."
I don't think the author contradicted what you just said. I guess you may be confused by the "whether we like it or not" part? I feel like the author is commenting on Java the language (vs the JVM).
I bought the JVM Specification book some years ago. It was fun holiday reading (seriously), seeing how the bytecode was put together, how try-catch blocks really work, etc. It's quite readable as general interest, if you're into that kind of thing.
I don't think it's ever been much actual use to me in programming, but was nice-to-have background knowledge.
Yeah, I'm quite interested in having a dead tree version to read. I guess I could send the PDF off to a book maker - not entirely sure of the legalities though...
Apple uses a bunch of hexspeak for error codes in iOS, including 8BADF00D ("ate bad food"), C00010FF ("cool off" - related to thermal events), and DEAD10CC ("deadlocc" - deadlock).
There's also DEFEC8ED ("defecated"), which was used by OpenSolaris for core dumps.
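The trick is that every character has to be a hex digit (0-9, A-F), so the "word" is also a plain 32-bit constant. A quick Python check; the `is_hex_word` helper is a hypothetical name for illustration only:

```python
HEX_DIGITS = set("0123456789ABCDEF")

def is_hex_word(word):
    """True if every character is a hex digit, so the word can be a numeric constant."""
    return len(word) > 0 and all(ch in HEX_DIGITS for ch in word.upper())

for word in ["8BADF00D", "C00010FF", "DEAD10CC", "DEFEC8ED"]:
    assert is_hex_word(word)
    print(f"{word} = {int(word, 16)}")

# The letter O is not a hex digit, so "food" has to be spelled with zeros.
assert not is_hex_word("8BADFOOD")
```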
Fun project (probably already been done): an "in which languages is this valid syntax" machine. On its own `if err != nil` could parse as Rust and Python.
This was a great read! Do you plan to continue to develop this? I would love to see more posts if so. I actually think this would be a great book as well. Cheers.
Outside of the specs for the JVM itself, has anybody studied from alternate resources to learn more about the JVM? Like any specific videos or illustrative resources? I love learning about the interiors of languages, but sometimes you need a second person to explain things or some visuals to really catch it.
While not directly a resource, I found writing agents for the JVM (java.lang.instrument agents and JVMTI agents) to be quite enlightening and rewarding. Depending on what you want to do you'll have to deal with bytecode transformation/instrumentation, JVM events, JNI and many other things. For example, have you ever thought about how a java debugger (JPDA, JDI, JDWP) works or a tool such as OverOps (Takipi)?
Compilation is performed concurrently to the application in specialised compilation threads. Only OSR (on-stack-replacement) requires stopping execution.
But we have to agree that the JVM feels like a steam engine running in the age of electric motors. With virtualisation cheaply available at every level (hardware, arch, OS and Docker), virtualisation at runtime feels like overhead.
The JVM was originally created for the purpose of 'write once, run anywhere', which I think can be addressed in alternative ways; look at golang.
"Steam Engine" isn't what I'd call the JVM; it's still a remarkable piece of software driving hundreds of thousands of business apps with a strong focus on long-term maintenance, excellent mindshare, and really very good performance for what it does (though in somewhat of a stasis with Java9+ deprecation, and Spring-centric development). However, I get what you mean: the JVM was originally intended as a portable runtime for set-top boxes at a time when there were many ISAs (MIPS, PPC, etc.) around, and not just x86 and ARM like today. I believe Java is still a mandatory part of Blu-ray disc players. It was also at one time a candidate for running in the browser (also reflected in the Java/JavaScript naming). Incidentally, Java was rejected in the browser by browser vendors, and JavaScript became "more like Java" instead, and has followed a similar path from being a browser language to also being used on the server side (something that Netscape started already in 1996 or so).
Edit: I also want to mention that Java was IMO the single tech that saved the scene from being a Microsoft-only world, and it also significantly paved the way for today's Linux dominance on the server side; Java was picked by many devs because it helped keep the door open for migrating to Solaris or Linux in an increasingly MS-dominated landscape in the mid-90s.
I agree with all that about the JVM. It had a considerable impact on the industry and it is a nice piece of software.
But you missed my point entirely. I did not call the JVM a steam engine in a derogatory way. On the contrary, the steam engine is an awesome piece of technology which had a tremendous impact on industry.
But the JVM has issues which more recent runtimes have learnt from and solved in different ways.
There's not much difference between Go and the JVM, except that one is compiled just in time and the other ahead of time. There have also been a few ahead-of-time compilers for Java over the years, with SubstrateVM being the latest one, for example.
> Go does have an extensive library, called the runtime, that is part of every Go program. The runtime library implements garbage collection, concurrency, stack management, and other critical features of the Go language
I think people can argue all they want about semantics: what is a virtual machine? There is no "proper definition" for this. Ultimately, if you abstract away the details of the running environment, to me, you've created a virtual machine.
The Go FAQ continues by insinuating it isn't a virtual machine because it doesn't do just in time compilation and only ahead of time. I think that's just word play.
> A process VM, sometimes called an application virtual machine, or Managed Runtime Environment (MRE), runs as a normal application inside a host OS and supports a single process. It is created when that process is started and destroyed when it exits. Its purpose is to provide a platform-independent programming environment that abstracts away details of the underlying hardware or operating system and allows a program to execute in the same way on any platform.
Personally, I think "Managed Runtime Environment" is a better term, and Go would definitely fall into that term.
So I recognize the differences are real: just in time vs ahead of time. But it isn't Docker, or any other layer of virtualization, which magically allows Go to run on many machines. It's because it has an extensive runtime that abstracts away their details. And this runtime has to be bundled in every compiled application. Maybe we should start talking about virtual runtimes?
> Ultimately, if you abstract away the details of the running environment, to me, you've created a virtual machine.
So then are operating systems also virtual machines? Is the C standard library a virtual machine? This seems like a pointless definition to me.
> Maybe we should start talking about virtual runtimes?
Isn't that just called a runtime? Java is "virtual" because the instructions in the Java class files don't run directly on the CPU, unlike Go where the instructions in the binary do run directly on the CPU.
If you want to understand the differences and similarities between Go and Java in any meaningful way, you've got to discuss something more than the vague idea of a virtual machine. That's all I'm really saying.
Here it seems specifically like OP wanted to say that they find ahead-of-time, statically linked compilation with cross-compilation to be a more convenient model of code compilation and distribution.
With that in mind, we're now better equipped to compare and contrast, and discuss the pros/cons.
When we restricted ourselves to VM vs non-VM, I didn't feel like we were in this meaningful zone.
And from that angle, for example, we can see that Java offers ahead-of-time compilation, which can be statically linked as well if one wants to. One example is GraalVM native image. There were also some options before GraalVM that I think were all commercial. That said, I've yet to see a cross-compilation offering.
The JIT approach has benefits as well. The user doesn't need to select the appropriate package for their OS. The runtime distribution is shared between all apps that use it. It's easier to offer debugging/profiling/measuring capabilities for the running application. The user can tweak certain parameters of the runtime or even choose an alternative runtime implementation. If the runtime has any known security vulnerability, the user can upgrade it to a new secure version without waiting for a new update of the app. Better peak performance of hot code paths can in theory be achieved. Just to name a few.
Another difference is that Go has no intermediate representation, while Rust, C# and Java, for example, do. This intermediate representation is quite useful in allowing multiple languages to reuse the Java bytecode runtime. That's not as simple in Go, since Go is a harder target for compilation.
Kind of: ISO C is defined in terms of the C abstract machine.
> The semantic descriptions in this International Standard define a parameterized nondeterministic abstract machine. This International Standard places no requirement on the structure of conforming implementations. In particular, they need not copy or emulate the structure of the abstract machine. Rather, conforming implementations are required to emulate (only) the observable behavior of the abstract machine as explained below.
You've forgotten about different processor architectures? Even today there's Intel and ARM, but at the start there was also Sparc which Sun wanted people to use. So, just like WebAssembly, java bytecode operates as a platform-independent instruction set. (ARM provides processors with some ability to execute bytecode directly - Jazelle - but it seems to be obsolete?)
Virtualisation is also not readily available from within userland. We're not at "each process runs in its own VM" yet. Perhaps we'd want "each browser tab runs in its own VM", which is kind of what Javascript aims to achieve.
Microsoft have also gone in this direction; they don't call the CLR a "VM" but it performs many of the same functions in order to run CIL/MSIL bytecode.
Then your feelings are wrong. No runtime environment provides anything near the JVM's combination of performance, productivity and observability. It's steam engines vs. an electric motor, alright.
Yes, there is a cost in footprint and warmup, but footprint is very often (though not always) the right cost. Of all software resources -- development, maintenance, memory, processing and bandwidth -- it's the cheapest (well, second to storage). By comparison Go is less RAM hungry but it's sluggish and opaque.
But the opinion that Go is sluggish and opaque is purely subjective. For anything but long running server apps, Java's memory consumption makes it feel sluggish, especially on desktop.
> But the opinion that Go is sluggish and opaque is purely subjective
It is not. It's significantly slower than Java and has maybe 5% of its observability tools. Go is technologically more than a decade behind Java in compilation, GC, and observability. I'm not saying it's not good enough for certain things -- sometimes primitive and simple gets the job done, and you might not need a Ferrari to go to the grocery store -- but let's not turn an acceptable compromise into a win.
> Java's memory consumption makes it feel sluggish
Why? If you don't mind it performing as poorly as Go, you can reduce the heap size and pick a low-latency collector like ZGC, but really Java says, give me the cheapest resource, RAM, and in return you'll get something much better.
> especially on desktop
Java desktop applications consume less RAM than Electron ones, and IntelliJ is more snappy than Atom (and recently, unfortunately, vscode, too), although there are perfectly valid reasons to prefer Electron.
Like Java, I can write a Go program, compile it on my computer for 3 different OSes, ship the binaries, and I can be reasonably sure that it works on other OSes.
All of this has been possible for decades (e.g. cross-compiling C++), but Go is the first mainstream compiled-to-native language I know of that makes it easy. Thanks to this, I can run many utilities written in Go on my Windows box, written mostly by people who likely never even tested them on Windows. That's pretty amazing to me.
People just drop a Windows binary on GitHub and think "I'll get the PR if stuff doesn't work". They'd never do that if it cost them effort to cross build a Windows binary. People didn't do that before Go, and many modern OSS command line utilities / server apps only worked on unixes.
The JVM's "write once, run everywhere" doesn't just apply to any OS that happens to have a JVM available... it means any architecture. You don't have to compile one version of your program for MIPS, ARM, x86, etc. The same bytecode will run on any JVM without being touched... something C/C++ (which were popular when Java was being created) couldn't do.
> and many modern OSS command line utilities / server apps only worked on unixes
This, I think, likely has more to do with being POSIX compatible and/or oriented for headless server usage where a shell is "home" for many 'nix admins.
This is a little bit off. The only architecture supported by Java is the JVM. It is a fictional CPU architecture with predefined characteristics, such as a memory model. The implementation of a JVM is a mapping from a real CPU to the JVM's requirements. So no, JVM doesn't mean 'any architecture', it means 'JVM'. It only applies to OSes where a JVM is available.
The developer no longer cares what architecture the program will run on, it just works.
You compile to bytecode, and stop caring. The JVM becomes your only target architecture, regardless of what actual architecture the system has.
That's unlike C or pretty much any (all?) other compiled languages around in the early 90's when Java was still Oak and Gosling was just getting started.
> Like Java, I can write a Go program, compile it on my computer for 3 different OSes, ship the binaries, and I can be reasonably sure that it works on other OSes.
With Java, you compile one “binary” and it works not just on three OSes, but all of them that have a JVM available. The beauty of it is that it’s as portable as an interpreted language, no cross-compilation needed at all.
I first heard about cross-compilation while learning Go. Go surely made it mainstream and easy. Maybe that's the reason so many CLIs are being built in Go. Before that, Python was mostly used for CLIs, but Windows does not have Python installed by default.
Sure, and I'd be surprised if your favorite Go iptables frontend worked on Windows.
But there are plenty of use cases for which the Go standard library offers a sufficiently good abstraction that cross-compiling just works. The same has held for JVM apps for ages (and for nodejs etc.), but not e.g. for C(++).
As a Windows dev, I very much noticed an increase in good CLI tools that just work on my box since Go arrived on the scene. I think that's cool.
- https://github.com/lihaoyi/Metascala
- https://github.com/lihaoyi/Metascala/blob/master/src/main/sc...