Doppio: JVM written in JavaScript (plasma-umass.github.io)
228 points by api on July 12, 2017 | 88 comments

This is cool. Not because it's useful or performant etc. but because it shows how simple java bytecode is.

After seeing this I wanted to know:

- how many opcodes exist in the JVM

- how easy would it be to create another language that compiles to the JVM.

- how does the JVM represent basic data types

The list goes on. All because someone built a JVM in JS.

What happened to the hacker mentality of HN? Where's the curiosity gone?

This. I don't think enough people appreciate how simple and clear the JVM bytecode's design is.

Most platforms in use today are so complex. Take a look at x86, or one of the latest ECMAScript specifications. Even LLVM bitcode is a bit complicated compared to the JVM bytecode.

I think in programming language research, going forward, we need some research into "high-level bytecodes". I.e. bytecodes that capture high-level concepts in a clear and simple way.

I've actually found CIL to be simpler than JVM bytecode, even with additional capabilities. Simple addition is a good example:

JVM: dadd, fadd, iadd, ladd. Addition for double, float, int, and long data, respectively.

CIL: add, add.ovf, add.ovf.un. Addition, signed addition with overflow check, and unsigned addition with overflow check.

Those bytecodes happened because the JVM bytecode was designed to be easily interpreted, whereas CIL was designed for JIT compilation. So for example CIL's `add` opcode is missing info that needs to come from the context in which it is used and the JVM's `iadd` and variations are easier to interpret.
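To make the interpretation point concrete, here is a minimal sketch in Python, not taken from any real VM. `run_typed` mimics the JVM style, where `iadd` and `dadd` themselves say which arithmetic to perform. `run_untyped` mimics a CIL-style `add`, where the loop has to carry a type tag next to every value (or a JIT has to infer it) before it knows what to do. The tuple-based instruction format, the generic `push`, and the `"int32"`/`"float64"` tags are simplifications invented for illustration.

```python
def to_int32(value):
    """Wrap a Python int to a signed 32-bit value, as JVM int arithmetic does."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


def run_typed(program):
    """JVM-style: each add opcode names its operand type."""
    stack = []
    for op, *args in program:
        if op == "push":  # simplification: the real JVM has typed constant opcodes
            stack.append(args[0])
        elif op == "iadd":
            b, a = stack.pop(), stack.pop()
            stack.append(to_int32(a + b))
        elif op == "dadd":
            b, a = stack.pop(), stack.pop()
            stack.append(float(a) + float(b))
        else:
            raise ValueError(f"unknown opcode {op}")
    return stack.pop()


def run_untyped(program):
    """CIL-style: one `add`; the operand type comes from context (here, a tag)."""
    stack = []  # entries are (type_tag, value) pairs
    for op, *args in program:
        if op == "push":
            stack.append((args[0], args[1]))
        elif op == "add":
            (tag_b, b), (tag_a, a) = stack.pop(), stack.pop()
            if tag_a != tag_b:
                raise TypeError("add requires operands of the same type")
            result = to_int32(a + b) if tag_a == "int32" else a + b
            stack.append((tag_a, result))
        else:
            raise ValueError(f"unknown opcode {op}")
    return stack.pop()
```

For example, `run_typed([("push", 2), ("push", 3), ("iadd",)])` needs no type bookkeeping at all, while the untyped loop has to inspect the tags on every `add`.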

You can see this design choice even today in how the JVM and CLR work. The JVM starts execution in an interpreter mode, then gradually compiles pieces of code as it detects bottlenecks. So the compilation that happens at runtime is very gradual and based on runtime measurements.
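The "interpret first, compile what turns out to be hot" strategy can be sketched in a few lines of Python. This is only a toy under loud assumptions: `HOT_THRESHOLD`, the `Method` class, and the two-opcode instruction set are hypothetical, and `exec` on generated source stands in for a real JIT compiler.

```python
HOT_THRESHOLD = 3  # hypothetical value; chosen only to keep the demo short


class Method:
    def __init__(self, name, body):
        self.name = name
        self.body = body      # list of (op, arg) pairs for the interpreter
        self.calls = 0        # how many times the interpreter has run this method
        self.compiled = None  # set once the method is considered hot


def interpret(body, x):
    """Slow path: decode and dispatch every instruction on every call."""
    for op, arg in body:
        if op == "add":
            x += arg
        elif op == "mul":
            x *= arg
    return x


def compile_method(body):
    """Stand-in for a JIT: translate the body into one Python function, once."""
    source = "def compiled(x):\n"
    for op, arg in body:
        source += f"    x = x {'+' if op == 'add' else '*'} {arg!r}\n"
    source += "    return x\n"
    namespace = {}
    exec(source, namespace)
    return namespace["compiled"]


def call(method, x):
    """Run compiled code if available; otherwise interpret and count the call."""
    if method.compiled is not None:
        return method.compiled(x)
    method.calls += 1
    if method.calls >= HOT_THRESHOLD:
        method.compiled = compile_method(method.body)
    return interpret(method.body, x)
```

Calling the same `Method` repeatedly through `call` gives identical results before and after the switch; only the execution path changes.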

The CLR on the other hand has done JIT compilation, with the ability to cache the compiled code for faster startup (e.g. Ngen). So it has been oriented towards ahead of time compilation.

Different trade-offs.

Sure. I tried to find information yesterday when I posted that regarding whether / how the `add` code handles differing types. Now since I'm not on my phone, I looked up the ECMA spec and it looks like CIL still only allows like types, with some minor exceptions involving `native int`. So it's just up to the compiler to make sure that actually happens and extend types as necessary.

I also prefer the CIL and also the C# language to Java (though I really like Java and its ecosystem), but we have to admit that MS had 5-10 years to learn from the Java design decisions and their effects, and still did not manage to overcome every problematic point :)

They also had different purposes driving their design, and we should not forget that.

JVM - Bytecodes only for Java

CIL - Bytecodes for VB.NET, C#, Managed C++ and the 1.0 SDK contained examples for Lisp, Pascal, Eiffel, Ada, ....

Of course, history then took another path for the JVM.

EDIT: Forgot that C++/CLI replaced Managed C++, which was the C++ variant on 1.0.

> JVM - Bytecodes only for Java

Although it should be mentioned that java language semantics are largely (depending on how you measure them :-) absent from the jvm. (Default methods were a very unusual change in that respect)

And, as you say, subsequent history has weirdly inverted the JVM and CIL. The former is a lot less 'J' and the latter is a lot less 'C' ;-)

I would bet that there's a lot higher rate of .NET users running VB.NET than there are JVM users running Scala/Clojure/Kotlin. They just don't tend to be the sort to post to Hacker News.

But there are probably more CPUs running non-java JVM languages ;-) It all depends what you count, as usual.

(I was really referring to the healthy non-java jvm language community. JRuby, Scala, and Clojure have lively communities and commercial backing)

Although this might really be true, I am skeptical that this has something to do with the inherent design of the Java bytecode compared to the CIL.

I think the bane of James Gosling's existence is being incorrectly associated with Java-the-lacklustre-language instead of correctly being associated with the superb JVM.

Personally, I always associate him with a pre-GNU version of Emacs: https://en.wikipedia.org/wiki/Gosling_Emacs

If there is a problem with Java bytecode, it is that it is hard to verify. You need multiple passes over the bytecode until you reach a steady state. There is also the "issue" that Java bytecode allows arbitrary control flow with goto, while Java doesn't.

IMHO WebAssembly solves that better but I also need to admit that they could already learn from Java.
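The "multiple passes until a steady state" remark describes a dataflow fixpoint. Below is a minimal sketch of that idea in Python, assuming a made-up instruction set (`store`, `load`, `goto`, `if`, `return`) and tracking only the types of local variables; it is not the JVM's actual verifier. Because `goto` and `if` can jump backwards, a later instruction can change what is known at an earlier one, so instructions are re-queued until nothing changes.

```python
TOP = "top"  # "unknown or conflicting type": a load of this is rejected


def merge(a, b):
    """Join two local-variable type tuples; disagreeing slots become TOP."""
    return tuple(x if x == y else TOP for x, y in zip(a, b))


def successors(pc, insn):
    op = insn[0]
    if op == "goto":
        return [insn[1]]
    if op == "if":
        return [insn[1], pc + 1]  # branch target and fall-through
    if op == "return":
        return []
    return [pc + 1]


def verify(code, num_locals):
    """Return the number of instruction visits; raise TypeError on rejection."""
    entry = {0: (TOP,) * num_locals}  # types known on entry to each instruction
    worklist = [0]
    visits = 0
    while worklist:
        pc = worklist.pop()
        visits += 1
        insn = code[pc]
        state = list(entry[pc])
        if insn[0] == "store":
            state[insn[1]] = insn[2]
        elif insn[0] == "load" and state[insn[1]] != insn[2]:
            raise TypeError(
                f"pc {pc}: local {insn[1]} is {state[insn[1]]}, expected {insn[2]}"
            )
        out = tuple(state)
        for nxt in successors(pc, insn):
            if nxt >= len(code):
                raise ValueError(f"pc {pc}: control flow leaves the method")
            merged = out if nxt not in entry else merge(entry[nxt], out)
            if entry.get(nxt) != merged:
                entry[nxt] = merged
                worklist.append(nxt)  # state changed, so visit it again
    return visits
```

A program such as `[("store", 0, "int"), ("load", 0, "int"), ("store", 0, "ref"), ("if", 1), ("return",)]` looks fine on a single forward pass, but the back edge carries a `ref` into the `load` at index 1, and the second visit rejects it.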

> IMHO WebAssembly solves that better but I also need to admit that they could already learn from Java.

While that may be true of the language Java, WebAssembly's jump instructions are not without their annoyances too. For example, the JVM bytecode requires your stack to be precise when jumping, WebAssembly just cares about the most recent piece. If you expect your jump targets to have the same stack layout, WebAssembly makes the impl handle it. I had to account for this and other differences in my compiler [0].

0 - https://github.com/cretz/asmble#control-flow-operations

The JVM has a full specification: https://docs.oracle.com/javase/specs/jvms/se8/html/

The specification is dense but still readable. I'd start reading in order, up through the third chapter which explains how to compile various Java snippets to bytecode. Perhaps start there and go back to the second chapter when you need more context.

I've also found this old but still relevant book to be a good guide: http://www.artima.com/insidejvm/ed2/

Here ya go. Java's internals are super well documented/specified

Loader stuff:


Byte Code is simple:


Doesn't look that simple.

Ah friendo you've never read x64 assembly

Back when Java decompilers were much more primitive and easy-to-break than they are now, I wrote some mods for a Java game in a language called "Jade", which was essentially a textual syntax for raw JVM bytecode. Oddly enough I can't easily find any references to this tool online now.

Are you sure it wasn't Jasmin?


Aha, you're right! Thanks.

Beyond the Internet Event Horizon :(

Overtaken by HTML templating languages, gems and a dozen other things with the same name. Eventually these items reach such a density to form a black hole, sucking through the older items, perhaps to another place in spacetime.

I've used Krakatau, a JVM assembler / disassembler written in Python, to good effect in a project: https://github.com/Storyyeller/Krakatau

People have written x86 emulators in javascript. That doesn't make x86 bytecode simple.

If you like Doppio JVM, you may find our project interesting too http://blog.leaningtech.com/2017/06/announcing-cheerpj-java-.... Our approach is to rely heavily on AOT compilation of JARs to JS to achieve higher perf, while still supporting full reflection. Moreover the RT is automatically split to reduce download time and bandwidth. A Swing application starts with ~15MB on our system. http://cheerpjdemos.leaningtech.com/SwingDemo.html

Also, see our follow-on project, Browsix (http://browsix.org), which makes it possible to run Unix applications inside the browser.

So now you can run the JVM in JavaScript running in the JVM....

For those who don't know, the JVM comes with a JavaScript engine by default: http://www.oracle.com/technetwork/articles/java/jf14-nashorn.... And you can even get v8/nodejs compiled into your Java jar binary: https://github.com/eclipsesource/J2V8

This is very cool. You can basically even call all Java classes (and the other way around) from that JavaScript! (And reliably limit the classes that can be used if you want to execute JS in a safe environment.) Why isn't this used more often? (Or is it used more often?)

I can't speak to whether it is used more often, but I would bet the java-based Nashorn vm is significantly slower than the c++-based nodejs. In fact, a cursory google search shows this is the case, and it's not even close.

See: http://blog.jonasbandi.net/2014/03/performance-nashorn-vs-no... http://pieroxy.net/blog/2015/06/29/node_js_vs_java_nashorn.h...

(I can't speak to the quality of either of these tests, but the results seem decisive)

While it is of course going to be slower, "it's not even close" is a judgement call. The tests you linked report a difference of roughly 1.6 to 3 times. In a lot of applications the performance hit is going to be worth gaining access to the whole Java ecosystem of libraries. Also the performance is very likely to increase with newer versions.

Nashorn isn't that slow, but there's another JS-on-the-JVM project called Graal.js which is about as fast as NodeJS/V8.

One reason Nashorn isn't used that much is that it doesn't expose a node.js compatible API. JS people often want Node specifically, not just the ability to run JS. It has some cool features though. The shell mode is neat.

Here's a great testing library that leans heavily on Nashorn. https://github.com/intuit/karate. I find this a beautiful use of JavaScript in Java.

I think it is impolite not to give accurate credit. It is Typescript... and not vanilla javascript.


May as well write Java.

How so?

Use Java to create bytecode which is executed in JS to produce JVM bytecode

Impressive, loaded a clojure.jar, got a repl and wrote/called some silly functions, it worked...

Of course, the Java bytecode has only a limited number of instructions.

I would like to run this in Rhino[1] - Mozilla's Javascript engine written in Java - just because. Turtles all the way down...

[1]: https://developer.mozilla.org/en-US/docs/Mozilla/Projects/Rh...

I investigated trying to use Doppio for Peergos. It is a very cool project. In the end we decided against it because it wasn't fast enough for our use case yet, and it requires the page to download the whole JVM (at least the rt.jar - the runtime) which is >60 MiB. We managed to get around this by manually stripping out the parts of the JVM that we didn't need, which brought it down to a few MiB.

I imagine that once they update it to Java 9 with the modular JDK (and resultant splitting of rt.jar) this reduction will largely happen automatically with it lazily downloading the parts that it needs.
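The lazy-download idea can be sketched as a class loader that only fetches a class the first time something asks for it. This is an illustration in Python, not Doppio's code: `LazyClassLoader`, the `fetch` callable, and the fake in-memory "server" in the usage note are all hypothetical.

```python
class LazyClassLoader:
    """Fetch class bytes on first use and cache them afterwards."""

    def __init__(self, fetch):
        self.fetch = fetch         # hypothetical callable: class name -> bytes
        self.cache = {}            # class name -> bytes already downloaded
        self.bytes_downloaded = 0  # total transferred so far

    def load(self, name):
        if name not in self.cache:
            data = self.fetch(name)
            self.bytes_downloaded += len(data)
            self.cache[name] = data
        return self.cache[name]
```

With a fake runtime such as `server = {"java/lang/Object": b"...", "javax/swing/JFrame": b"..."}` and `loader = LazyClassLoader(server.__getitem__)`, a program that never touches Swing never pays for it, and repeated loads of the same class cost nothing extra.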

A similar project from Mozilla: https://github.com/mozilla/pluotsorbet

It was targeted toward running J2ME apps on FirefoxOS phones. With that project dead it's no longer under active development.

Atwood's Law:

"Any application that can be written in JavaScript, will eventually be written in JavaScript."


"Any module written in JavaScript will eventually be ported to Go."

Done [0] and done [1]. Though, the latter one at least, isn't really in a production state because compiling the JVM stdlib takes hours in Go.

0 - https://github.com/zxh0/jvm.go

1 - https://github.com/cretz/goahead

See also luje. "An experimental (read: toy) Java virtual machine written in pure Lua."


Does it run java applets?

Java applets for JVM written in TypeScript transpiled into JavaScript will be the next big thing. Freeze any browser, anywhere, instantly. Imagine the print out of stack trace with error - so exciting.

It will, just wait for WebAssembly to be more mature.


"Any application that can be written in JavaScript, will eventually be written in JavaScript." - Jeff Atwood

So, every application will be written in JavaScript?

Every application that can

Very few can't

"and any application that can't be written in javascript will soon be compiled to javascript" -- me

There is also TeaVM:


This is actually quite awesome.

Would be cool if it could run GUI programs!

JVM written in Javascript, what could be wrong?

Assuming it worked well enough, you could use it to run legacy Java applets without having the Java plugin installed. That would at least improve the security situation a bit while still allowing for legacy applets.

Your scientists were so preoccupied with whether or not they could, they didn’t stop to think if they should.

Don't worry, it has Nashorn ;)

Like Taco Town :D Run Javascript in the JVM in Javascript

But... why??


> This paper presents DOPPIO, a JavaScript-based runtime system that makes it possible to run unaltered applications written in general-purpose languages directly inside the browser.

Someone should really have told them about webassembly...

The first commit to Doppio was 13 Feb 2012

One year before asm.js appeared, in Mar 2013

Three years before WebAssembly appeared, in June 2015

Five years later, in 2017, WebAssembly still doesn't have a GC and cannot run the Java/Scala/Groovy programs that Doppio was able to run in 2012

Doppio's lineage goes back even further than 2012. A "JVM in JS" was given in Emery Berger's 691ST course in the fall of 2011. My notes show that I submitted my "finished" JVM on October 26, 2011. I say "finished" because it became obvious rather quickly that a lot of things would be difficult if not impossible in JS (like threading and synchronization) and so we negotiated with Emery to come up with a subset of Java that we could reasonably implement for a class project. IIRC, there were 6-8 VMs written that semester.

Doppio proper (the repository you refer to) started during the spring meeting of 691. Those guys went above and beyond the minimal spec we implemented, and they tackled a lot of stuff that we thought was impossible. Thus the research paper.

IIRC, we also had to write a decentralized chat program in JS that semester. That was also "fun".

Yes, I created a HN account just to post this.

Thank you. This is great work from both an engineering and scientific perspective.

The paper was released at least a year before WebAssembly was even announced...

... and a year after WASM's predecessor was announced.

Directly from asm.js FAQ (http://asmjs.org/faq.html):

Q. Can asm.js serve as a VM for managed languages, like the JVM or CLR?

A. Right now, asm.js has no direct access to garbage-collected data; an asm.js program can only interact indirectly with external data via numeric handles. In future versions we intend to introduce garbage collection and structured data based on the ES6 structured binary data API, which will make asm.js an even better target for managed languages.

Just pointing out that the work on what would become WASM was already heavily underway by that point.

This paper was first published June 9–11, 2014 (first commit Feb 11, 2014); WebAssembly was first announced on June 17, 2015.

asm.js, which led to WebAssembly, was announced on March 21, 2013.

Doppio relies heavily on the Javascript object model, the asm.js subset of Javascript isn't very relevant.

Webassembly doesn't have GC (yet)

Does doppio actually rely on the JavaScript garbage collector?

yes -- it implements Java objects on top of Javascript objects.
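Here is a rough sketch of what "guest objects on top of host objects" means, written in Python rather than JavaScript and not based on Doppio's source. `JavaObject` and the three `op_*` helpers are hypothetical names. The point is that the VM never implements its own heap or collector: once the guest drops its last reference, the host runtime reclaims the object.

```python
import gc
import weakref


class JavaObject:
    """A guest object stored directly as a host object; fields live in a host dict."""

    def __init__(self, class_name, field_names):
        self.class_name = class_name
        self.fields = {name: None for name in field_names}


def op_new(stack, class_name, field_names):
    """Allocate by creating a host object; there is no VM-managed heap."""
    stack.append(JavaObject(class_name, field_names))


def op_putfield(stack, name):
    value = stack.pop()
    obj = stack.pop()
    obj.fields[name] = value


def op_getfield(stack, name):
    obj = stack.pop()
    stack.append(obj.fields[name])
```

There is deliberately no `op_free` and no mark-and-sweep code here; a `weakref.ref` to a guest object goes dead as soon as the operand stack no longer holds it, which is the behaviour the host collector provides for free.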

> (Read the academic paper)

I admire the effort, but: doesn't "academic" mean "scientific"? Can there possibly be any "science" in having a well-known VM reimplemented in a well-known programming language?

Please take a look at the contributions in the paper at the end of the introduction. The JVM uses operating system abstractions (like a file system, threads, and synchronous APIs) that were non-existent in the browser when the paper was published, and are still widely unavailable natively. Figuring out how to construct the functionality necessary not to just simply execute Java bytecode but to provide a full Java Virtual Machine, and then show how this could be generalized beyond a JVM, is a CS contribution. Disclosure: I'm a labmate of the author.

Here's one of the scientific questions, and FWIW, it's actually written right on the first page: can you implement a programming language that relies on synchronous primitives in a language that has only asynchronous primitives? The answer to that question is not obvious, but it turns out to be "yes". Doppio is an existence proof that it can be done. Furthermore, that fact is falsifiable (see Karl Popper), which makes the process of learning the answer "science."

You may not think this is an important question to answer--and that is your prerogative--but you can't argue that it isn't scientific. One of those other scientific questions is about threading (again, see page 1). BTW, did you know that System.out.println relies on synchronization primitives? You literally cannot write helloworld in Java without invoking a lot of machinery.
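One way to see why the synchronous-on-asynchronous question has a "yes" answer: if the guest's program counter and operand stack are kept as explicit data rather than on the host's call stack, the VM can stop at a "blocking" instruction, hand control back to the event loop, and pick up where it left off when the callback fires. The Python sketch below illustrates that general technique only; `EventLoop`, `read_async`, `GuestVM`, and the three-opcode guest language are invented for the example and are not Doppio's design.

```python
from collections import deque


class EventLoop:
    """Tiny stand-in for a browser event loop: callbacks only, nothing blocks."""

    def __init__(self):
        self.pending = deque()

    def call_later(self, fn, *args):
        self.pending.append((fn, args))

    def run(self):
        while self.pending:
            fn, args = self.pending.popleft()
            fn(*args)


def read_async(loop, source, callback):
    """Hypothetical async-only host API: delivers the next input via callback."""
    loop.call_later(callback, source.popleft())


class GuestVM:
    def __init__(self, loop, program, source):
        self.loop = loop
        self.program = program
        self.source = source
        self.pc = 0       # explicit guest state, so it survives a suspension
        self.stack = []
        self.output = []

    def resume(self, value=None):
        """Run until the program ends or a 'blocking' instruction suspends it."""
        if value is not None:
            self.stack.append(value)  # result of the read that suspended us
        while self.pc < len(self.program):
            op = self.program[self.pc]
            self.pc += 1
            if op == "read":  # the guest thinks this call blocks
                read_async(self.loop, self.source, self.resume)
                return        # suspend: give control back to the event loop
            elif op == "concat":
                b, a = self.stack.pop(), self.stack.pop()
                self.stack.append(a + b)
            elif op == "print":
                self.output.append(self.stack.pop())
```

From the guest's point of view, the program `["read", "read", "concat", "print"]` is straight-line blocking code; from the host's point of view, `resume` returns promptly every time and nothing ever blocks.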

> I admire the effort, but: doesn't "academic" mean "scientific"?

No, it doesn't. Or did you forget about music, literature, engineering, business, finance, history, philosophy, dance, art, architecture, nursing, etc.?

> Academic: of, relating to, or characteristic of a school, especially one of higher learning.

It appears as though this was a university project, the kind of project one completes for their masters even. Academic seems to be precisely the right term.

Yes, in the sense that it's called "computer science" with little scientific method involved.

I got an error, so I falsified it according to the scientific method:

    Error extracting doppio_home.zip: Error: ENOENT: No such file or directory., '/persist/vendor/java_home/lib/images/cursors/cursors.properties'

File a bug report, please. Doppio is actively maintained, though our current focus is on Browsix (which incorporates and extends some of Doppio's functionality -- see browsix.org).

No, it doesn't, and yes, there can.

relevant paper: https://homes.cs.washington.edu/~jfogarty/publications/works...

This is an example of research as illustrated by Figure 5 in that paper (known functionality, novel techniques)
