

The Emterpreter: Run asm.js code before it can be parsed - Rauchg
https://blog.mozilla.org/research/2015/02/23/the-emterpreter-run-code-before-it-can-be-parsed/

======
Kronopath
I actually really like the sound of this, not because of the stated benefits
(though those are nice), but because it sounds like this would actually create
a really good upgrade path to implementing a _proper bytecode_ into browsers.

I mean, with asm.js and the copious number of compile-to-Javascript languages
available these days, Javascript is already becoming a de facto bytecode for
the web. But it's always been a weird and uncomfortable hack done for the sake
of backwards compatibility: a higher-level language hijacked to work as a
compile target, simply because it's the only thing supported by browsers.

If projects like this Emterpreter catch on, though, they allow for a smooth
path to proper bytecode: for backwards compatibility, you have the Emterpreter
read and execute the bytecode, but in other, more modern browsers you have the
browser execute the bytecode directly. I think this would be an overall better
approach than what we have now.

~~~
estava
The thing for now is that the asm.js "family" is riding on LLVM's back. More
and more, LLVM is becoming that VM people have longed for. It looks like LLVM
is inching towards the JavaScript VM with every year that goes by. I read that
Apple is using it for further JavaScript optimizations in the browser. LLVM
already powers WebKit underneath, right?

So as things stand right now we have 2 popular "VMs" people really use. One is
JavaScript, since it's going nowhere. And the other is LLVM, which is "free as
in beer" for companies all over. JavaScript was confined to the client, and
LLVM was confined to the backend. Now they are going to be marrying and having
lots of children. :-)

~~~
adrianm
LLVM is not a VM by any stretch of the imagination. It is an intermediate
language for compilers whose primary utility is to provide a common target for
code generation and optimization. LLVM's name is a misnomer.

~~~
IshKebab
You're thinking of that email that someone sent on a mailing list a while back
about how LLVM isn't really a bytecode because it includes architecture-
specific codes.

It was a stupid email - you can make LLVM into an architecture-agnostic
bytecode by disallowing those codes.

Don't believe me?... [https://developer.chrome.com/native-client/reference/pnacl-b...](https://developer.chrome.com/native-client/reference/pnacl-bitcode-abi)

~~~
adrianm
"You're thinking of that email that someone sent on a mailing list a while
back about how LLVM isn't really a bytecode because it includes architecture-
specific codes."

No, I'm not. In fact, I have no idea what email you're referencing.

LLVM as an intermediate language (I guess you could call it bytecode if you
really wanted to) for compilers of arbitrary languages (expressly not a
virtual machine!) is the only point I intended to make.

------
flohofwoe
I think this is the first emscripten feature which I don't "get", i.e. I don't
fully understand the motivation behind it :) Firefox's AOT compile time is
great, and negligible compared to the time it takes to download the code. The
real art lies in getting the actual emscripten-compiled executable small, not
because of emscripten, but because of C++'s tendency to bloat the executable
size if one isn't really careful. Also, clang's and emscripten's code
generation passes have very aggressive dead code elimination, but there can be
wrong decisions at the game-engine-architecture level which cause the
inclusion of code that might never be called. A native executable that's
several dozen MBytes big is a bad thing on any platform; it just doesn't
appear as much of a problem when it's part of a 50 GByte game download. In
short, I think an emscripten client of up to 5 MBytes (gzipped) doesn't really
require a fix like the emterpreter for its startup time(?), and it's perfectly
possible to fit a full 3D game client into such a size with a little care
about size optimization.

[edit: fixed a formulation which sounded like code bloat can be blamed on
emscripten instead of user-code]

~~~
andrewchambers
Imo a stack-based bytecode has far better code density, which is probably
pretty good for browsers.

~~~
beagle3
... at a significant cost to execution speed, experience shows. The fastest
interpreted "general purpose" language is apparently Lua, which switched from
a stack to registers a while ago to get that speed.
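For a concrete sense of the trade-off, here is a toy sketch (hypothetical opcodes, not any real bytecode) of the same statement encoded for a stack machine and a register machine. The stack version needs more instructions, but each one is tiny because operands are implicit; the register version is one instruction, but it must name all three operands.

```javascript
// Toy encodings of c = a + b.

// Stack machine: operands are pushed/popped implicitly, so
// instructions carry no operand fields and stay compact.
const stackProgram = [
  ["PUSH", "a"],
  ["PUSH", "b"],
  ["ADD"],          // pops two values, pushes the sum
  ["STORE", "c"],
];

// Register machine: fewer instructions, but each one names its
// destination and sources explicitly, widening the encoding.
const regProgram = [
  ["ADD", "c", "a", "b"], // c = a + b in a single instruction
];

function runStack(program, vars) {
  const stack = [];
  for (const [op, arg] of program) {
    if (op === "PUSH") stack.push(vars[arg]);
    else if (op === "ADD") stack.push(stack.pop() + stack.pop());
    else if (op === "STORE") vars[arg] = stack.pop();
  }
  return vars;
}

function runReg(program, vars) {
  for (const [op, dst, x, y] of program) {
    if (op === "ADD") vars[dst] = vars[x] + vars[y];
  }
  return vars;
}

console.log(runStack(stackProgram, { a: 2, b: 3 }).c); // 5
console.log(runReg(regProgram, { a: 2, b: 3 }).c);     // 5
```

Interpreting the stack version also does more dispatches per expression, which is one way to see where the speed cost beagle3 mentions comes from.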

------
AshleysBrain
This is very cool, and great to read more interesting research from Mozilla.
However, in this article I notice something curious: normal asm.js kicks off
and shortly after reaches top speed, by around 700ms. The emterpreter gets
going by 200ms, but takes another 1200ms to reach top speed, at 1400ms. Why is
it not just another 700ms? Isn't it doing the same work as asm.js, only in the
background while the emterpreter runs?

The non-blacklisted emterpreter looks slow enough (5fps-ish on that graph?) to
simply not be useful for some use cases, like a game engine - it's not going
to be remotely playable like that. Therefore emterpret => asm.js actually
significantly increases the startup time: playable by 1400ms is worse than
playable by 700ms. But I guess this is all preliminary and improvable!

~~~
azakai
There are a few reasons why full speed is reached later when starting up in
the emterpreter and swapping in asm.js later:

* Compiling asm.js can use multiple CPU cores, so doing just that is faster than doing it while the emterpreter is running on (at least) one core.

* I believe, but am not sure, that compiling on a background thread is done at lower priority than stuff on the main thread.

* Swapping asm.js code in can only be done in between frames. At 10fps for example, that means around a 100ms delay just for that, and possibly more depending on the state of the browser's event queue.
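The last point is simple arithmetic, sketched below with hypothetical names: if swaps only happen at frame boundaries, a compile that finishes just after a boundary waits almost a full frame before the faster code runs.

```javascript
// Sketch: at 10fps a frame boundary arrives every 100ms, so a finished
// background compile can sit for up to ~100ms before it is swapped in.
const FRAME_MS = 100; // 10fps

function nextSwapDelay(compileDoneAt, frameStartAt) {
  // The swap happens at the first frame boundary at or after the
  // moment the compile finishes.
  const elapsed = compileDoneAt - frameStartAt;
  const framesWaited = Math.ceil(elapsed / FRAME_MS);
  return frameStartAt + framesWaited * FRAME_MS - compileDoneAt;
}

console.log(nextSwapDelay(1030, 1000)); // 70  (next boundary at 1100)
console.log(nextSwapDelay(1001, 1000)); // 99  (near worst case)
```

Event-queue pressure on the main thread only adds to this, which is the "possibly more" in the comment above.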

------
jerf
As I sometimes see wmf say, "as predicted by prophecy:"
[https://news.ycombinator.com/item?id=6923758](https://news.ycombinator.com/item?id=6923758)

~~~
hobo_mark
Also, [https://www.destroyallsoftware.com/talks/the-birth-and-death...](https://www.destroyallsoftware.com/talks/the-birth-and-death-of-javascript)

~~~
jerf
I actually acknowledged that in the version of this I posted 4 days ago:
[https://news.ycombinator.com/item?id=9071064](https://news.ycombinator.com/item?id=9071064)

However, he has JS hanging on for longer than I bet on... once asm.js gets a
good DOM binding I expect the explosion of language diversity to take about
two years, tops, and for it to rapidly become clear that JS is now just
another way of accessing the DOM. I think there's more pressure built up there
than people realize, because right now there's no point in thinking about it,
but once it's possible, _kablooie_. Node's value proposition, IMHO, is in some
sense correct, but backwards; it's not that we want to write in Javascript on
the server, it's that we want "client language = server language"... and once
there's no longer a technical handcuff pinning the client side of that
equation to Javascript, it will not take that long for it to no longer be
Javascript. It is not an impressive language, even within its own 1990s-style
dynamic language niche.

(I think this is not because it's "bad", but because it has been developed in
this really terrible multiple-vendors-that-actively-don't-want-to-cooperate
way for most of its lifetime. It's gotten past that, I think, but during those
decades all the other scripting languages were marching right along. None of
the other languages could have survived such a process and gotten to where
they are today, either.)

~~~
Touche
GC-ed languages are going to have to ship their own GC, which I doubt can
compete with JavaScript's. No one wants to write CRUD apps and manually manage
memory. I don't see asm.js being used outside of games.

~~~
wmf
Unless someone invents a language where non-GC memory management is easy and
that language can compile to both the client and server...

~~~
iopq
Rust makes it not quite "easy" but at least it makes it automatic and not
error-prone.

------
mmastrac
Hot damn, that's exciting. I prefer the asm.js approach over PNaCl because it
lets browsers gradually move over to the standard rather than forcing a flag
day. This solves the big issue, namely parsing that big, hot mess of raw JS.

I hope we'll eventually see a proper bytecode spec with bidirectional
assembly/disassembly w.r.t. JS (ie: more a transformation than an assembly
spec) evolve from this effort, but it's obviously something that needs to
happen _after_ asm.js has had its time to bake.

This, of course, assumes that browsers don't get AOT/background compilation to
the point where it's no longer necessary to consider a bytecode spec.

------
im3w1l
After we have evicted Java from our browsers and turned Javascript into the
new Java, what have we gained?

~~~
jerf
It isn't the new Java, it's the new JVM, and the answer is, "choice".

~~~
reasonish
Choice? In what sense? The JVM is a ridiculously good VM, with tons of
languages that compile to it.

Why are we reinventing the wheel?

~~~
jerf
Choice of language to compile into the VM.

As for why not the Java VM, my guess is that the browsers are staggeringly
enormous piles of C++ code and trying to integrate a Java VM into it would
probably be insanely difficult, and anything other than pure 100% integration,
too slow to use. It is probably literally easier to continue with the already-
integrated JS VM and improve it up to JVM-esque quality than to try to graft
the JVM into the browsers that exist today.

Or somebody would already have tried, since nothing would have prevented JS
from already running on the JVM if that were feasible; asm.js is actually
independent of this question when it comes down to it.

~~~
icedchai
You know tons of languages already compile and run on the JVM, right?

And you know the JVM ran in the browser almost _20 years_ ago, on hardware
with a fraction of the CPU and memory resources we have today?

~~~
jerf
The JVM does not run _in_ the browser. Its _windows_ can run physically "in"
the browser window, inasmuch as it would appear that the Java app was "in" the
browser, but the Java app itself was a "plugin", and was a separate OS
process. That's not the right kind of "in".

The JVM has never run in the same process as the browser, to the best of my
knowledge, with the exception of the ill-fated "HotJava" browser:
[http://en.wikipedia.org/wiki/HotJava](http://en.wikipedia.org/wiki/HotJava)
which ran Java applets "in" the browser by virtue of being written in Java,
instead of C++. At the time this was too sluggish for general use, though.

So allow me to repeat myself: We can't run the JVM _in_ the browser right now.
The impedance mismatch between the JVM and the C++ world is just too great to
have sufficient performance right now. All the current browsers are enormous
piles of C++ code. There is no way to "just" integrate a JVM directly into
them, and no bridge between the two is going to have enough performance for
the demands we're putting on browsers. It doesn't _matter_ how awesome Java
may or may not be when the browsers aren't in Java. It's not an option today,
short of replicating Sun's feat and rewriting your own browser in Java. But
the only thing stopping you from doing that is the sheer size of the task,
rather than any technical problem. (But make no mistake, it is an _enormous_
undertaking now to write an engine capable of replacing any of the existing
ones for even a single well-chosen use case.)

~~~
icedchai
Oh, I see. I misunderstood your use of the word "in".

------
kelvin0
Yeah, maybe if we could compile to some byte code and simply download a VM to
compile once and run everywhere ... hmmm where have I seen this before?

------
al2o3cr
"Oh wow, wouldn't it be great if browsers had a bytecode interpreter in them?"

Two words: "Java applets".

------
GeorgeHahn
Is it straightforward to reference a separate compiled library from an
Emscripten binary?

Seems that caching commonly shared dependencies could be a good way to cut
down on size and parse/compile time.

------
leeoniya
a bit OT, but one thing i always wondered is, can't we extract inferred type
information from the JIT after running plain JS code through 1M cycles and
then compile that to an asm.js variant?

~~~
munificent
This is effectively what JavaScript VMs already do and have done for years.
They compile to unoptimized machine code that has a bunch of hooks to monitor
the types that flow through the code.

After a while, if a chunk of code is identified as "hot", a second-stage
compilation kicks in. The code is recompiled to optimized machine code that
takes advantage of the types that it previously saw the code use (with
fallbacks in case those assumptions later fail).

~~~
leeoniya
yes i understand that. my question is, can that inferred type info be used to
output the optimized bytecode as asm.js so that it can be included right away.
like statically compressed .gz assets (pre-optimized).

it would effectively make a js -> asm.js compiler
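The idea could be caricatured like this (everything here is hypothetical; a real js-to-asm.js compiler would be far more involved). Given type info recorded from profiling runs, a function whose arguments were always integers could be re-emitted with the `|0` coercions that asm.js-style code uses:

```javascript
// Toy "type-feedback to asm.js" emitter for one function shape.
function emitAdd(typeInfo) {
  if (typeInfo.a === "int" && typeInfo.b === "int") {
    // Types were stable integers: emit asm.js-style |0 coercions.
    return "function add(a, b) { a = a|0; b = b|0; return (a + b)|0; }";
  }
  // Types weren't stable: fall back to plain, untyped JS.
  return "function add(a, b) { return a + b; }";
}

const src = emitAdd({ a: "int", b: "int" });
const add = new Function("return " + src)();
console.log(add(2, 3)); // 5
```

The catch aardvark179 raises below applies directly: the `int` verdict is only as good as the runs that produced it, so the emitted code would still need a deopt path for inputs the profile never saw.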

~~~
aardvark179
Possibly, but the types inferred by a VM are rarely guaranteed to be
completely accurate (some traps will be included in optimised code to capture
the cases where types may not be as expected). They may only be true for the
particular input data, the other libraries used, or stuff like that.

Using previous runs to inform the JIT of expected types is entirely reasonable
though, and I think various JS implementations already do this.

~~~
leeoniya
> some traps will be included in optimised code to capture the cases where
> types may not be as expected

none of that should matter. if the resulting JIT'd code is faster with the
traps than the plain js without them, so be it.

------
rafaelferreira
Quite similar to tiered compilation on the JVM

------
AndrewDucker
So, how long until someone gets the JVM or CLR running inside the browser
using asm.js - and then reopens the world of running Java/C# in the browser?
Only this time without plugins.

~~~
TazeTSchnitzel
Why waste time and bytes on an extra VM, when you can just compile Java or CLR
bytecode to JavaScript?

~~~
AndrewDucker
Because the behaviour of the JS VM and the CLR VM is very different?

