
Firefox’s new streaming and tiering compiler - markdog12
https://hacks.mozilla.org/2018/01/making-webassembly-even-faster-firefoxs-new-streaming-and-tiering-compiler/
======
markdog12
[https://lukewagner.github.io/test-tanks-compile-time/](https://lukewagner.github.io/test-tanks-compile-time/)

Firefox Nightly: WebAssembly.instantiate took 227.6ms (54.4mb/s)

Chrome Canary: WebAssembly.instantiate took 8576ms (1.4mb/s)

Wow.

(Edit: And I believe that's not even using the streaming compilation mentioned
in the article; it's just the new baseline compiler in action.)

~~~
depressedpanda
I ran the tests on my Nexus 5X running stock Android 8.1.

Chrome: WebAssembly.instantiate took 12935.5 ms (1 MB/s)

Firefox Nightly: WebAssembly.instantiate took 1223.1 ms (10.1 MB/s)

Yikes, one order of magnitude in difference.

~~~
alexeldeib
Similar for me, a touch over 10x

FF: WebAssembly.instantiate took 280.3 ms (44.2 MB/s)

Chrome: WebAssembly.instantiate took 3022.4 ms (4.1 MB/s)

------
MaxBarraclough
A two-tier JIT. Interesting to see tiered JIT compilation catch on the way it
has. I seem to remember a few years ago reading that the Java HotSpot team had
given up on tiered JIT compilation as being not worthwhile.

How far we've come. A whirlwind tour of today's JITs (apologies for the million
links):

.Net Core seems not to use tiered compilation. It never interprets the IR;
everything is run through the same JIT compiler.
[https://github.com/dotnet/coreclr/issues/4331](https://github.com/dotnet/coreclr/issues/4331)

HotSpot uses three tiers these days (counting direct interpretation as a tier) -
[https://docs.oracle.com/javase/8/docs/technotes/guides/vm/pe...](https://docs.oracle.com/javase/8/docs/technotes/guides/vm/performance-enhancements-7.html#tieredcompilation)

JavaScriptCore/Nitro seems to use four -
[https://webkit.org/blog/3362/introducing-the-webkit-ftl-jit/](https://webkit.org/blog/3362/introducing-the-webkit-ftl-jit/)

Edge's Chakra engine has two -
[https://blogs.msdn.microsoft.com/ie/2014/10/09/announcing-ke...](https://blogs.msdn.microsoft.com/ie/2014/10/09/announcing-key-advances-to-javascript-performance-in-windows-10-technical-preview/)

V8 seems to use two -
[https://v8project.blogspot.co.uk/2017/05/launching-ignition-...](https://v8project.blogspot.co.uk/2017/05/launching-ignition-and-turbofan.html)

Firefox's SpiderMonkey JS engine uses two -
[https://developer.mozilla.org/en-US/docs/Mozilla/Projects/Sp...](https://developer.mozilla.org/en-US/docs/Mozilla/Projects/SpiderMonkey/Internals)

~~~
rcthompson
Is it actually a JIT? It's just compiling everything unconditionally. I guess
the fact that the second tier replaces previously compiled functions with more
optimized versions makes it a JIT? Or does the definition of JIT require
recompiling in response to information about which code would benefit most?

~~~
MaxBarraclough
> Is it actually a JIT? It's just compiling everything unconditionally.

Still counts as JIT in my book, but you're right that it's a bit subtle.

Unix-style configure/build/install isn't considered JIT.

Installing a .Net application is pretty similar, but we don't consider it JIT.

In the usual .Net model, what's distributed is IR rather than source-code.
Compilation to native code happens at install time. The build-and-install
process is less explicit than the Unix way, and it's less error-prone (fewer
dependency issues and issues with the compiler not liking your source code).

Really it's a very similar model to the Unix one, but we call one JIT and not
the other.

Oracle Java, of course, only ever compiles to native code at runtime, and
never caches native code. 'Proper' JIT. (This may be set to change in the near
future though.)

Interestingly, .Net seems to be moving in the direction of full static
compilation, or they wouldn't be asking devs to rebuild UWP apps to
incorporate framework fixes - [https://aka.ms/sqfj4h/](https://aka.ms/sqfj4h/)

~~~
caf
It might be fun to make a source-based distribution where every binary in
/usr/bin started off as a link to a script that built and installed the
requested executable (over the top of the link), before executing it.

~~~
MaxBarraclough
Source-based distros essentially do that; they just cache the binaries.

Various research OSes are JIT-based, of course. It looks like JX (a Java
operating system) caches its native code, so it's not 'pure JIT'
[https://github.com/mczero80/jx/blob/5fbeae79/libs/compiler_e...](https://github.com/mczero80/jx/blob/5fbeae79/libs/compiler_env/jx/compiler/persistent/CodeFile.java#L343)

It looks like Cosmos (a C# operating system) does the same
[https://en.wikipedia.org/wiki/IL2CPU](https://en.wikipedia.org/wiki/IL2CPU)

------
sjrd
Nice article.

Although, as always with articles on WebAssembly, it keeps repeating that wasm
is faster than JavaScript without ever mentioning the limitations of wasm
w.r.t. JS (no GC, no interaction with the DOM or with JS libraries besides
numbers, etc.). And that means there are zillions of developers who keep being
misled into thinking things like "Why don't you compile to wasm to make your
stuff faster?". That includes absurdities like "We should write a compiler
from JavaScript to wasm to make all our JS faster!"

~~~
flavio81
>JavaScript, without ever mentioning the limitations of wasm wrt. JS (no GC,
no interaction with the DOM or with JS libraries besides numbers, etc.).

The DOM will die as soon as the industry moves to one or two good GUI toolkits
that run under WebAssembly and are way faster to use than the cumbersome
present combination of HTML + CSS + CSS preprocessors + JS libs.

Mark my words.

~~~
pcwalton
I'm nearly certain that this will not be the case. Once you reinvent
everything that the DOM does, it's highly unlikely you'll end up faster than
the DOM.

Everyone thinks that the rendering engines in browsers are easy to beat in
terms of performance. I thought that too, until I implemented one. They are
definitely beatable, but not _easily_, and certainly not with an architecture
like that of Qt or GTK.

~~~
xg15
I'm not so sure. You don't need to reinvent everything that the DOM does, as
the DOM is burdened down with all kinds of backwards-compatibility concerns
and conflicting design philosophies.

E.g., I don't think any sane design of a UI toolkit would include the ability
to read _and modify_ the string representation of the UI code at runtime - yet
it's a critical feature for the DOM.

Likewise, you wouldn't necessarily need the ability to access and mutate
arbitrary nodes of the document tree at any time (including mutations that
might change which CSS selectors apply to a node). E.g., you could expose
only higher-level widgets, or only expose variables that feed into a
template. That would allow optimisations which aren't possible with CSS and
the DOM.

Finally, a WASM toolkit would be shipped with a particular website anyway, so
it wouldn't need to be general-purpose.

On the other hand, there is a great incentive for website operators to make
their site into a single unparseable blob: ad blockers. If every site had its
own internal data representation and internal rendering engine, it would be
almost impossible for ad blockers to modify certain parts of the site
while leaving others intact.

~~~
pcwalton
> You don't need to reinvent everything that the DOM does as the DOM is
> burdened down with all kinds of backwards compatibility concerns and
> conflicting design philosophies.

Those can largely be avoided, and they typically don't cause global
performance impacts.

> E.g., I don't think any sane design of a UI toolkit would include the
> ability to read and modify the string representation of the UI code at
> runtime - yet it's a critical feature for the DOM.

That isn't a problem. innerHTML is lazily computed from the tree structure: if
you don't use it, you don't pay for it.
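
The "if you don't use it, you don't pay for it" idea can be sketched in plain JavaScript (a toy illustration of lazy serialization behind a getter; the `LazyNode` class and its names are invented here, not Gecko's actual implementation):

```javascript
// Toy sketch: the string form of the tree is only computed when
// someone actually reads it, like innerHTML derived on demand.
class LazyNode {
  constructor(tag, children = []) {
    this.tag = tag;
    this.children = children;
    this.serializations = 0; // counts how often we paid the cost
  }
  // Analogous to innerHTML: derived lazily from the tree structure.
  get markup() {
    this.serializations++;
    const inner = this.children.map(c => c.markup).join("");
    return `<${this.tag}>${inner}</${this.tag}>`;
  }
}

const root = new LazyNode("div", [new LazyNode("span")]);
root.children.push(new LazyNode("b")); // mutations cost nothing here...
console.log(root.serializations);      // 0: nobody asked for the string yet
console.log(root.markup);              // ...the cost is paid only on read
```

Mutating the tree never touches the string form; only reading `markup` does, which is the sense in which unused features can be close to free.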

> Likewise, you wouldn't necessarily need the ability to access and mutate
> arbitrary nodes of the document tree at any time. (including mutations that
> might change which CSS selectors apply to a node) E.g., you could only
> expose higher-level widgets instead or only expose variables that feed into
> a template.

The main benefit of this would be to eliminate restyling, but cascading is
really useful from a design point of view. That's why we've seen native
frameworks such as Qt and GTK+ move to style sheets. And if you reinvent
restyling, it'll be a ton of work to do better—remember that Servo and Firefox
Quantum have a parallel work-stealing implementation of it. I've never seen
any native toolkit that even comes close to that amount of performance effort.

~~~
xg15
> _That isn't a problem. innerHTML is lazily computed from the tree
> structure: if you don't use it, you don't pay for it._

I'm not paying for it, the DOM implementation is - with increased complexity.
(E.g., HTML parsing suddenly becomes a time-critical operation because some
wiseguy decided to implement animations for his website using setTimeout and
innerHTML.)

And they can't drop it because a lot of sites rely on it - however, if you
wrote a new, limited-purpose renderer on top of WASM, you could decide to drop
it and simplify the implementation without losing much utility.

> _And if you reinvent restyling, it'll be a ton of work to do better_

But that's kind of my point - if you can control which parts of the tree are
exposed and which mutations are valid, you might not need to implement
restyling at all. (Or in reduced scope)

I'm not talking about cascading in general, but about how you can make
arbitrary changes to the DOM after initial load, which the restyler has to
fully support.

~~~
pcwalton
> I'm not paying for it, the DOM implementation is - with increased
> complexity. (E.g., HTML parsing suddenly becomes a time-critical operation
> because some wiseguy decided to implement animations for his website using
> setTimeout and innerHTML.)

We're talking about performance here, not implementation complexity. Besides,
it's not a win in terms of complexity if sites ship a limited subset of the
Web stack to run on top of the _full_ implementation of the Web stack that's
already there.

> But that's kind of my point - if you can control which parts of the tree are
> exposed and which mutations are valid, you might not need to implement
> restyling at all. (Or in reduced scope)

Sure, you can improve performance by removing useful features. But I think
it'll be a hard sell to front-end developers. Qt and GTK+ didn't add style
sheets and restyling for no reason. They added those features because
developers demanded them.

~~~
xg15
I think we're talking past each other.

My point is that writing custom UI renderers using canvas and WASM might
become a reasonable thing to do. For that you don't need to stick to the web
stack at all, you can invent whatever language, API and data model fits your
needs. Those can be a lot simpler than the DOM and therefore easier to
implement with good performance.

------
Twirrim
Out of curiosity, using just-released versions of browsers on this 2015 Mac
Pro:

Firefox 57: WebAssembly.instantiate took 2990.2ms (4.1mb/s)

Chrome 63: WebAssembly.instantiate took 8736.9ms (1.4mb/s)

Safari 11.0.2: WebAssembly.instantiate took 10341ms (1.2mb/s)

If more speed is about to arrive, wow.

I'm curious what optimisations are needed / valuable for wasm files to improve
streaming performance. I'm assuming if, e.g.:

    def foo(baz):
        bar(baz)

    ...

    def bar(baz):
        baz = baz + 1

Then compilation would start and get stuck until it had a definition for bar?
If so, presumably the next build-time optimisation for a website will be to
shuffle the code into as optimal an order as possible so as to improve
streaming compilation speed?

~~~
kllrnohj
Function declarations are independent from function bodies, so think of the
C/C++ header/source file split. You don't need to know what code is in bar if
you know it takes one argument of type int and returns an int; that's all you
need to call it successfully, so you can compile foo in your example
perfectly fine. You just need to patch up the call site later, when bar is
resolved to an actual address (this is the "link" step in a typical AOT
compilation chain, or done by the loader if it's a dynamic dependency).
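
The "compile against a signature now, patch the call target later" idea can be sketched in plain JavaScript (all names here are invented for the illustration; real wasm engines do this with machine code and a function table):

```javascript
// Sketch of streaming compilation with late binding: signatures stream
// in before bodies, so a caller can be compiled immediately and its
// calls routed through a table slot that is patched later.
const table = {};  // call targets, filled in as bodies arrive
const signatures = { bar: { params: ["i32"], result: "i32" } }; // known up front

// "Compile" foo now: the signature alone tells us how to emit the call.
function compileFoo() {
  return baz => table.bar(baz);  // indirect call via the table
}
const foo = compileFoo();        // valid before bar's body even exists

// ...later in the stream, bar's body arrives: the "link" step.
table.bar = baz => baz + 1;

console.log(foo(41)); // 42
```

Because the call goes through the table slot, nothing compiled earlier has to be revisited when the body finally shows up.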

~~~
jnordwick
Considering that the major optimization in compiling is inlining, knowing the
function body is very important to compilation, but I guess that can be pushed
off until the next tier.

~~~
kllrnohj
WebAssembly is an intermediate language, not a source language. Initial
inlining and other optimizations have already been performed long before it
hits your browser. There could potentially be a JIT or similar doing a
secondary optimization pass in the browser if something is hot, but it's
probably going to be largely considered a codegen issue rather than a runtime
issue.

------
gkya
I'm not familiar with WebAssembly, but the recent trend is that as downloads
become faster, web performance in a vanilla browser becomes slower, because
websites just send more stuff to you. Pages grow toward infinity. Also, if,
like @sjrd mentioned, this code can't manipulate the DOM and can use only a
restricted set of JS objects, then where will the gain be? Is this intended
to be used for number-crunching code in the browser runtime? Helping
bitcoin-miner scripts? What's the purpose then?

~~~
sudhirj
DOM manipulation is already on the cards, should be out soon. This is likely a
separate sub-team of people just working on making it as fast as possible.

~~~
pas
Could you give details on "soon"?

All I was able to find is this issue:
[https://github.com/WebAssembly/design/issues/1079](https://github.com/WebAssembly/design/issues/1079)
with no activity for a long time.

~~~
linclark
With the more recent host bindings proposal[1], direct DOM access was
decoupled from GC. This is expected to move much more quickly.

[1] [https://github.com/WebAssembly/host-bindings/blob/master/pro...](https://github.com/WebAssembly/host-bindings/blob/master/proposals/host-bindings/Overview.md)

------
axelfontaine
Using [https://lukewagner.github.io/test-tanks-compile-time/](https://lukewagner.github.io/test-tanks-compile-time/)

Chrome 63: 3143.7ms (3.9mb/s)

Firefox 57: 1499ms (8.3mb/s)

Edge 41: 97.3ms (127.2mb/s) !!!

~~~
luke_wagner
To wit, as described in their blog post:
[https://blogs.windows.com/msedgedev/2017/04/20/improved-](https://blogs.windows.com/msedgedev/2017/04/20/improved-)
Edge validates and compiles wasm code lazily. Thus, this simplistic benchmark
isn't really measuring compile time on Edge. In contrast, Firefox, Chrome and
Safari are doing some amount of AOT compilation before
WebAssembly.instantiate() resolves.
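
The lazy strategy described here can be sketched with a thunk (a toy JavaScript illustration of the general technique, not Chakra's actual mechanism): instantiation returns immediately, and each function is "compiled" on first call.

```javascript
// Toy lazy compilation: each function starts as a thunk; the (pretend)
// compile step runs on first call, then the slot is swapped for the
// compiled function so later calls pay nothing extra.
let compiles = 0;

function lazily(compile) {
  let fn = (...args) => {
    compiles++;        // pay the compile cost now, on first use
    fn = compile();
    return fn(...args);
  };
  return (...args) => fn(...args);
}

const square = lazily(() => x => x * x); // "instantiate": instant, no compile

console.log(compiles);   // 0: nothing compiled at instantiation time
console.log(square(7));  // 49 (compiled here, on first call)
console.log(square(8));  // 64 (reuses the compiled version)
console.log(compiles);   // 1
```

This is also why a benchmark that only times WebAssembly.instantiate() undercounts an engine that defers compilation: the cost hasn't gone away, it has moved to first call.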

~~~
ehsankia
Here's a stupid question, but is the result of the Firefox and Chrome
"instantiate" exactly the same? Is the compilation doing the same job, or
could one be performing more optimizations? I.e., faster compilation but
slower execution.

~~~
computerphage
They're doing slightly different things, but not to nearly the extent of Edge
which is doing something very different.

------
JepZ
Quoting Yehuda Katz, the co-creator of Ember.js, when it comes to JS sizes is
kinda hilarious (random Google result):

[https://gist.github.com/Restuta/cda69e50a853aa64912d](https://gist.github.com/Restuta/cda69e50a853aa64912d)

No offense to Yehuda in general (he is doing great work), but Ember.js is so
ignorant of any JS-size recommendations that it seems weird to quote Yehuda
in that context.

------
microcolonel
> On a desktop, we compile 30-60 megabytes of WebAssembly code per second.
> That’s faster than the network delivers the packets.

Funny enough, on my workstation it seems to compile at something more like
60-80 MiB/s, keeping up with my network, which was recently upgraded to
gigabit.

Very impressive stuff. I hope workstation CPUs can keep pace with networks.

------
bsimpson
Would be nice if it mentioned WebAssembly in the title - I presumed this was a
new feature for JS in Firefox Quantum.

~~~
markdog12
I originally put WASM in the title, not sure what happened.

------
nothrabannosir
Does someone here who is familiar with WebAssembly semantics know if it's
theoretically possible to start streaming execution of code? I.e., start
running as soon as the “main” (?) function is in, and block on every function
call which is not yet compiled, recursively? Or could the last block of
WebAssembly bytecode potentially change the semantics of the first?

Sooner or later, that’s an avenue people will want to explore, I assume?

~~~
tsavola
Yes, it's possible. No, the last block won't change the semantics of the
first.

------
zach43
Interesting article... I did not realize that WASM needs to be compiled into
machine code on the client system; I just assumed it would be directly
interpreted by the JS engine.

As a side note, it is interesting to see that multithreaded compilation of a
single page provides significant performance benefits here...this is usually
not done with C/C++ code compilation from what I understand about it

~~~
Yoric
Well, the difference between "interpreted" and "compiled" has become very
blurry during the last 20 years. These days, most "interpreted" programming
languages are actually compiled to machine code on the client system.

This includes the JVML, of course, but also JavaScript, Python (with PyPy),
etc. PHP isn't quite there yet, but it's coming.

> As a side note, it is interesting to see that multithreaded compilation of a
> single page provides significant performance benefits here...this is usually
> not done with C/C++ code compilation from what I understand about it

It's slightly different, but native code is typically compiled concurrently,
too. The meat of it is often handled by the build system rather than the
compiler itself, but that's not so different.

~~~
lclarkmichalek
PHP is there with HHVM

~~~
noir_lord
Which, ironically, isn't really any faster than PHP 7 in the real world,
outside of benchmarks.

7 was a phenomenal release; I saw 50% reductions in processing time across
the board, and on old array-heavy systems a 5-10x memory reduction.

~~~
solidr53
Thanks to pressure from HHVM, I assume. Nothing was happening in the PHP
language for three freaking years.

To be fair, the benchmarks usually take a WordPress or Drupal installation
and do a requests-per-second measurement, which IMO is a real-world
benchmark.

No hate, I just don't get why HHVM doesn't get any love for what they did.
Maybe because, from HPHPc to HHVM, they seriously gave PHP competition and
people kind of got mad.

[https://kinsta.com/blog/the-definitive-php-7-final-version-h...](https://kinsta.com/blog/the-definitive-php-7-final-version-hhvm-benchmark/)

~~~
krapp
> No hate, I just don't get why HHVM doesn't get any love for what they did.

I don't know - I expected to see a ton of Hack projects show up here but it's
like no one cared about the language except as a wake-up call to PHP. Maybe
the involvement of Facebook put people off.

------
filereaper
Total aside: as a compiler and runtimes guy, I'm super excited about
streaming compilation. I think stuff like this, and Ethereum for distributed
computation, is really cool! :D

~~~
dfox
Streaming compilation is the way it was always historically done. One reason
is that computers used to not have enough RAM to hold whole non-trivial
programs as an AST or another intermediate form.

A second reason is that this approach matches how the underlying theory of
languages and automata works. One can view a modern AST-producing compiler
frontend as a compiler that compiles its input into a program that builds
the resulting AST.

On the other hand, many modern optimization passes simply cannot be done in
a streaming manner, or even by any pushdown automaton.

~~~
caf
Yes, which is likely why in older languages like C you must declare symbols
before you use them, etc.

------
sehugg
That's great news. On [http://8bitworkshop.com/](http://8bitworkshop.com/) I'd
like to offer some additional WASM modules on demand, but they take 15+
seconds to load. (It seems 50% of the time is parsing and 50% module
instantiation.)

------
Dolores12
Guess what: downloading compiled executable code is even faster. Is that
where we are heading? Flash 2.0? Wouldn't it be great to save all the
electric power used to compile the very same code on millions of computers
every day?

~~~
littlestymaar
Downloading a bigger executable wouldn't necessarily be faster, actually; it
depends on the size difference and the client's bandwidth.[1]

Additionally, it wouldn't be portable (an executable compiled for desktop
wouldn't run on mobile).

[1] See this comment:
[https://news.ycombinator.com/item?id=16171133](https://news.ycombinator.com/item?id=16171133)

~~~
Dolores12
What you are saying is that source code is smaller than compiled native
code, which is nonsense.

~~~
littlestymaar
First of all, I wasn't talking about source code; I was comparing the output
of a C/Rust->wasm compiler to that of a C/Rust->x86 compiler. Since the wasm
virtual machine has a JIT, I believe compilation to wasm isn't too aggressive
with optimizations. And since those make the binary bigger, I assume a wasm
output would be lighter than an x86 one. I didn't benchmark it though.

And if you compare the size of the binary output with the size of the source
code, the binary is bigger in many cases because of optimizations (and
runtime size, for small programs). Additionally, the source code can be
gzipped with a good compression factor, whereas the binary cannot. So 99% of
the time, the source code is lighter to send over the internet than the
compiled binary.

------
breatheoften
Does wasm do runtime code specialization? I wonder if there will end up being
a way to do timing attacks against the optimizing wasm compiler/linker step
... Is it possible to set up code such that the optimization time depends on
the runtime-inferred type of an 'x' that you aren't supposed to have access
to ...?

~~~
yoklov
The term you’re looking for is speculation, not specialization, but no, I
don’t think it does either. C++ and other languages targeting WASM often do
type specialization, but it’s entirely done before the browser sees WASM, and
has nothing to do with what you’re describing. (Which would be speculative
compilation).

I’d imagine that nobody does speculative compilation since the benefit is too
low given how fast the network is. Also, yes, there would be security
concerns.

------
stevemk14ebr
Caching of compiled code! As I read it, they want to cache the compiled code
at the client level. What if servers did the caching instead? Group clients
by the architectures they use and serve the cached compiled code to the right
'groups' of clients.

~~~
sjrd
That would assume you trust the server not to give you malicious machine code
(which you of course cannot!). wasm is specified in such a way that it is
still sandboxed by the VM that compiles it. If you fetch arbitrary machine
code, you cannot verify it and that leads to huge security holes!

~~~
chrisseaton
> which you of course cannot!

Didn't Google's NaCL implement verification of sandboxed machine code?

~~~
sjrd
Maybe, but at what cost? I wouldn't be surprised if the cost of verifying the
machine code was higher than the cost of compiling wasm to machine code.

------
mrmondo
Firefox Beta (58) macOS 10.13.2: WebAssembly.instantiate took 151.9 ms (81.5
MB/s)

------
rmrfrmrf
> But there’s no good reason to keep the compiler waiting. It’s technically
> possible to compile WebAssembly line by line. This means you should be able
> to start as soon as the first chunk comes in.

Maybe they can optimize further by speculating what the next line will be...

------
StreamBright
What are the security implications of wasm?

~~~
markdog12
Good writeup here:
[http://webassembly.org/docs/security/](http://webassembly.org/docs/security/)

It runs in existing browser VMs, which have been pretty battle tested.

Another interesting note is that threads are now on hold for WebAssembly due
to Spectre; that is, SharedArrayBuffer has been disabled. Hopefully it can be
re-enabled in the future.

------
daveheq
Where do I buy the Firefox coin?

------
bobsc123
This cracks me up. Modern web browsers really started to evolve in the '90s,
when security problems really ramped up. You used to just download
executables and run them on your computer, because the functionality wasn't
there otherwise. Flash and Java applets were the initial answer to that,
before JavaScript and HTML evolved. We've come almost full circle to browsers
basically being little VMs that can do anything again, the main reason they
were developed in the first place. Most people's entire computer experience
is now in the browser, and here come executables again, which will require
another internal layer to mitigate problems.

~~~
ajkjk
The end state ends up looking a lot like (the user-facing side of) an
operating system, except that:

* the filesystem is cloud storage (Drive/Dropbox/what have you -- the Unhosted ([https://unhosted.org/](https://unhosted.org/)) architecture)

* the apps are insecure but open-source by requirement (interpreted JS)

* ... running in a controlled sandbox (the browser)

* ... using a standard UI language (HTML/CSS)

* with functionality modifiable/overridable by user preference (extensions)

It's pretty much the ecosystem you would want if you were building this from
scratch! Except you'd want HTML/CSS/JS to be much more intelligently designed
from the start. (I'm waiting so eagerly for the day that browsers natively
run more scripting languages than just JS...)

It never could be done in the 90s because everything ran too slowly, but it's
feasible now.

~~~
pjmlp
> It never could be done in the 90s because everything ran too slowly, but
> it's feasible now.

It used to be called Lisp Machines, Smalltalk, Oberon Juice, Java Jini,
Inferno.

------
jokoon
That's cool, but until I see it coming to smartphones, it won't really be
useful, except for gaming.

In theory this could really be the universal VM for the web everyone needed,
but it's still lacking real sockets and DOM support.

~~~
aidenn0
It's available on android now...

~~~
jokoon
I mean as a universal way to make apps.

------
jnordwick
I guess I don't understand the push to wasm. Why not just embed HotSpot, or a
branch of it? Is there any difference?

Or, going the other way, could HotSpot be replaced with a wasm JIT by
compiling Java to wasm? I know they have slightly different memory models,
but I don't understand why they seem to be treated so separately.

~~~
icebraining
Why HotSpot? There are many bytecode languages with their own execution
environments: .NET/CLR, Parrot, BEAM, etc. wasm is an attempt to design one
specifically for the web, rather than shoehorning in one made for a
different environment.

~~~
jnordwick
HotSpot's optimization and code generation are far superior to any other
VM's.

But that is kind of my point: there are advanced VMs out there. I don't see
why the web needs its own VM apart from them. All the differences I see are
fairly minimal.

~~~
kodablah
But the bytecode and stdlib it works with are the least suitable. I could go
on for a long time about JVM instructions, the class model, and the GC
assumptions that make it bad for the web. Similarly, I could go on forever
about the stdlib that supports this bytecode (strings, threads, class
loaders, etc.) and why it's bad for the web too.

