
Analyzing the Performance of WebAssembly vs. Native Code - furcyd
https://arxiv.org/abs/1901.09056
======
pier25
I never expected WebAssembly to be as fast as native, only to be significantly
faster than JavaScript.

Considering the popularity of Electron, if WebAssembly does get as fast as
native or at least 80% there (and that's a really big if) it would easily
become the cross platform language for the vast majority of desktop
applications.

For mobile we will always be dragged down by Apple and its reticence to
embrace web technologies. It's 2019 and we can't even show a native
add-to-home-screen banner.

~~~
obl
What's the point of using this for native applications?

I mean yeah, it might be a little easier to distribute but it's kind of absurd
to agree to throw away 20% performance for a bit of convenience and then shed
tears on how the latest Intel chips only delivered 5% improvement over the
previous ones.

~~~
cm2187
It’s not “a bit of convenience” when you are the business owner having to
write multiple cheques instead of one, because each platform enforces its own
incompatible technologies.

Plus there is another big argument. Some app written in wasm is not just
compatible with today’s platforms (iOS, Android, macOS, Windows, etc.), but
also with tomorrow’s platforms. The lack of apps is probably what killed
Windows Mobile and resulted in the current duopoly. If all apps become
compatible with any platform as long as they implement a standard, then there
is a chance for a challenger to break the duopoly.

~~~
ghettoimp
Hrmn. Isn't this cross-platform argument the same thing we heard for using
Java instead of C (write once run anywhere) or, for that matter, using C
instead of assembly (portable assembly language)?

Don't get me wrong, this is all to the good. But does WASM have some special
way to prevent the various platforms from implementing their own unique,
special, and (of course) incompatible APIs?

~~~
TheCoelacanth
Java became one of the most heavily used programming languages, so it seems to
me like that supports the cross-platform argument.

~~~
hollerith
The topic under discussion is using webasm to write cross-platform _user-
facing_ code, and Java never caught on for that purpose.

~~~
yellowapple
It caught on plenty for that purpose. See also: Minecraft and Android.

It probably would've caught on more/sooner on the desktop if it had adopted a
native-look-and-feel GUI toolkit earlier rather than later. Hopefully a WASM
reincarnation of Java's cross-platform goals will prioritize that sort of
look-and-feel integration.

~~~
hollerith
Android certainly satisfies the _user-facing_ requirement, but not the _cross-
platform_.

Specifically, if all we knew about user-facing Java was that most Android apps
are written in Java (or something that differs from Java only enough to try to
avoid infringing the intellectual property rights of Java's owner) then we
would expect many cross-platform user-facing apps to be written in Java.

But we have no need for such indirect evidence because we can directly count
the cross-platform user-facing apps that are written in Java. You mentioned
one, Minecraft, and I will add Eclipse, IntelliJ and its derivatives and Jin
(a "client" used to connect to the Free Internet Chess Server and Internet
Chess Club). Since I know how hundreds of user-facing cross-platform apps are
implemented, and since four out of hundreds is not much, I stick to my claim
that Java never caught on for writing cross-platform user-facing code.

~~~
yellowapple
"Android certainly satisfies the user-facing requirement, but not the cross-
platform."

Depends on how you define "platform". Android apps certainly run on multiple
architectures (mostly ARM, but also x86 and MIPS). They also run on non-
Android operating systems (namely: ChromeOS; theoretically-speaking, other
operating systems could run Android apps, too, so long as Dalvik runs on those
operating systems, though unfortunately this is not the case for most
operating systems).

And sure, we can provably claim that Java didn't "catch on" compared to the
sheer volume of user-facing programs written in C or C++, but by that logic
macOS didn't "catch on" compared to the sheer volume of user-facing computers
running Windows (which might indeed be true depending on how you define "catch
on").

------
hackcasual
They disable vectorization on native, so hard to call this a true apples-to-
apples comparison

~~~
Veedrac
It makes sense since WASM's SIMD isn't ready yet, and a paper saying “WASM is
slower because SIMD isn't ready yet” would be rather uninformative to browser
vendors.

~~~
hackcasual
I think the problem when you're doing X vs. native is you're implicitly saying
"X vs. the best possible situation." I think it's fine to compare it against
vector-less operations, but they should also benchmark against more vanilla
native compiler settings.

~~~
vanderZwan
> _when you're doing X vs. native is you're implicitly saying "X vs. the best
> possible situation."_

Not really, any sensible benchmark would be "X vs. the most relevant point of
comparison".

------
mgamache
WASM will be the default for any non-trivial web apps in the next few years
(assuming the DOM access gets addressed). Why would you ship JavaScript over
the internet to then get compiled locally? I know there are arguments for JIT,
but I don't think they make sense for JavaScript. Also devs hate JavaScript!
Okay, I know we don't all hate it and I know it's popular, but really why does
TypeScript exist? It's just a rational way to produce JS that hides the silly
nature of JS. Now you don't need JS (or TypeScript); choose the language that
works best.

~~~
nerdponx
Does WASM reduce (or at least not-increase) payload size? If not, I'm not sure
it's worth it unless the performance improvement is really big.

~~~
markdog12
It reduces size, but perhaps more importantly, reduces parsing time, which is
crucial, particularly for slow mobile devices.

------
xiphias2
Compilation speed should be part of the benchmarks, as it's critical for
WebASM. The best would be to use the Clang optimization setting with the most
similar compilation speed to compare runtime speeds.

As an example, the article mentions suboptimal register allocation, but without
a compile-time comparison there's no way to know if there really is a simple
way to improve the WebAssembly implementation.

~~~
comex
It's true that Clang can't compete with the JITs in question on speed. Even at
the settings with the most comparable quality of generated code, Clang will be
much slower, because its internal design is primarily geared towards maximum
optimization potential at high optimization levels, with lower optimization
levels as more of an afterthought.

But that's beside the point. The paper isn't suggesting switching to Clang or
to the algorithms Clang is using. Rather, it's treating Clang's output as an
approximation of 'optimal' code generation for the given C code. There are
various reasons to compare it to WebAssembly JITs:

- For one, the paper identifies specific reasons the JIT output is slower,
which shows potential areas of improvement they could focus on.

- The comparison also provides a sort of upper bound on how much the JITs
theoretically could be improved. It's only a weak upper bound; the upper bound
of code quality achievable _at the required performance levels_ is lower and
would be more relevant, but there's no way to measure that.

- It also indicates the potential of adding a higher tier to the JIT that
optimizes very hot code using slower algorithms.

- And finally, most entertainingly, the benchmarks help answer age-old
questions like "can WebAssembly replace native code?" :) Or, at least, "for
what applications can WebAssembly replace native code and have acceptable
performance?"...

------
quangio
The benchmark seems to use only browsers' built-in VMs to run WebAssembly
(so it is more like a comparison of the efficiency of different VM JIT
compilers against native). How about running them in a standalone WebAssembly
VM like WAVM (it does more aggressive optimization, uses LLVM for JIT), Asmble
(running in the JVM), or Wasmer (uses Cranelift like Firefox does, maybe a
little bit faster than Firefox)?

(I just skimmed the paper, sorry if I missed something)

~~~
jacobolus
Speed when running in the browser is probably the main practical concern for
most folks using wasm....

~~~
derefr
Some projects are looking forward to using WASM as a general replacement for
their current Turing-complete query language engine (e.g. JavaScript in
CouchDB, Lua in Redis, etc.) WAVM is probably what such systems would embed.

------
theninjagreg
The paper is comparing to native. How does WASM compare to JavaScript?

~~~
markdog12
> We find that while WebAssembly runs faster than JavaScript (on average 1.3×
> faster across SPEC CPU)

------
int_19h
I wonder if wasm is feasible to target in hardware. I mean, there were
hardware Java bytecode implementations, and that's a lot higher level, so it's
definitely possible - but is it worthwhile? Are there any idiosyncrasies in
wasm that make it slower than x86 or ARM, that could be fixed if the
underlying architecture was more accommodating?

~~~
uasm
> "I wonder if wasm is feasible to target in hardware. I mean, there were
> hardware Java bytecode implementations, and that's a lot higher level, so
> it's definitely possible - but is it worthwhile? Are there any
> idiosyncrasies in wasm that make it slower than x86 or ARM, that could be
> fixed if the underlying architecture was more accommodating?"

You're asking for unified, standardized, cross-vendor hardware + instruction
sets.

Why not do this other way around though? "Compile" JS to WASM, then to
x86/x64. Finally just execute on the bare metal.

~~~
int_19h
Let me rephrase that. Is x86/x64 (or ARM, for that matter) architecture ideal
to compile WASM into? Or could a different architecture be designed that would
result in better performance?

~~~
uasm
> "Let me rephrase that. Is x86/x64 (or ARM, for that matter) architecture
> ideal to compile WASM into? Or could a different architecture be designed
> that would result in better performance?"

I'm sure you could compile WASM into Japanese or French if you had to. Maybe
then, our French-native CPUs would finally be able to process "sudo fais moi
un sandwich". My point being, you're asking a very broad, very theoretical
question.

Are CPUs these days not performing well enough for most applications? If so,
is it necessarily due to the complex instruction sets powering those
platforms?

~~~
vanderZwan
I'm sure you aren't doing this consciously, but you're answering completely
different questions that are easier to dismiss than the original one, which is
more difficult to answer. It comes across as using a straw man.

int_19h is not asking a very broad, theoretical question so much as that it is
a difficult question to answer, requiring specific technical know-how about
CPUs and compiler design.

Saying that one can compile WASM into Japanese or French is not an answer to
_"is x86/x64 (or ARM) an ideal target architecture for WASM"_, so irrelevant.

Whether or not _"CPUs these days [are] not performing well enough for most
applications"_ is not relevant for answering GP's question. It's a
justification for you not to _have_ to answer his question. But ultimately,
whether or not there would be a point to targeting WASM in the hardware is
unrelated to whether it is possible, and what the limitations are. Also: did
most websites need the performance of V8 when it came out? No, because no
website used JavaScript intensely due to its slow performance.

------
Veedrac
Summary:

> Root Cause Analysis and Advice for Implementers:

> We conduct a forensic analysis with the aid of performance counter results
> to identify the root causes of this performance gap. We find the following
> results: (1) code compiled to WebAssembly yields more loads and stores than
> native code (2.1× more loads and 2× more stores in Chrome; 1.6× more loads
> and 1.7× more stores in Firefox). We attribute this to reduced availability
> of registers, a sub-optimal register allocator, and a failure to effectively
> exploit a wider range of x86 addressing modes; (2) increased code sizes lead
> to more instructions being executed and more L1 instruction cache misses;
> and (3) generated code has more branches due to safety checks for overflow
> and indirect function calls.

A surprisingly large amount of this boils down to "x86 needs more registers",
but there's quite a lot of detail here. It validates my personal experience
that Chrome doesn't do a very good job relative to Firefox, for example:

> Code generated by Firefox has 1.15× more branch instructions retired and
> 1.21× more conditional branch instructions retired than native code, while
> code generated by Chrome has 4.13× branch instructions retired and 5.06×
> more conditional branch instructions retired.

> Chrome executes 2.9× more instructions and Firefox executes 1.53× more
> instructions on average than native code.

> On average, Chrome suffers from 3.88× more L1 instruction cache misses than
> native code, and Firefox suffers from 1.8× more L1 instruction cache misses
> than native code.

Overall, it was always obvious that Chrome's deficit compared to Firefox was
just engineering work, but what is new is figuring out how difficult the
native code deficit would be to close. Some aspects are quite straightforward;
browsers probably need to develop a more aggressive tier-2 WASM JIT, now that
both listed have a tier-1 baseline that is very fast to compile. This should
include:

1. a better register allocator,

2. better peephole optimizations,

3. sophisticated loop optimizations.

Unavoidable slowdowns might include:

1. register pressure from reserved registers,

2. stack overflow checks,

3. function table bounds checks.
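
To make the last item concrete, here is a minimal Python sketch of the checks the WebAssembly semantics require on every `call_indirect`. It is a toy model under stated assumptions, not how any browser actually implements it: `Trap`, `FuncRef`, `call_indirect`, `BINOP_I32`, and `table` are hypothetical names invented for illustration. The point is only that each indirect call carries several conditional branches that a native indirect call does not need.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# (parameter types, result types); a hypothetical encoding of a wasm signature
Signature = Tuple[Tuple[str, ...], Tuple[str, ...]]


class Trap(Exception):
    """Models a WebAssembly trap: execution of the module is aborted."""


@dataclass
class FuncRef:
    signature: Signature
    code: Callable[..., int]


def call_indirect(table: List[Optional[FuncRef]], index: int,
                  expected: Signature, *args: int) -> int:
    # Branch 1: the index must be inside the function table.
    if index < 0 or index >= len(table):
        raise Trap("undefined element")
    entry = table[index]
    # Branch 2: the table slot must actually hold a function.
    if entry is None:
        raise Trap("uninitialized element")
    # Branch 3: the callee's signature must match the call site's.
    if entry.signature != expected:
        raise Trap("indirect call type mismatch")
    return entry.code(*args)


# Illustrative table: one i32 x i32 -> i32 function and one empty slot.
BINOP_I32: Signature = (("i32", "i32"), ("i32",))
table: List[Optional[FuncRef]] = [FuncRef(BINOP_I32, lambda a, b: a + b), None]
```

A native C call through a function pointer is a single indirect jump; in this model the same call passes three conditional branches first, which is consistent with the extra branch instructions the paper counts.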

———

Is this line a typo? I can't make sense of it; I read 1.5 and 1.9 respectively
from the table. (There's another mistyped sentence starting ‘Clang,’ but it is
not a major issue.)

> On average WebAssembly in Firefox runs at 1.9× over native code and in
> Chrome runs at 1.75× over native code.

~~~
tachyonbeam
This isn't very surprising. C compilers have developed all kinds of low level
optimizations over decades, trying to extract the last bits of performance
they could. In contrast, JIT compilers for JS focused on recovering missing
type information through type inference and other low-hanging fruits.

It's also the case that there's a lot more pressure on JS JITs to produce code
fast, and so running multiple passes of low level optimizations was not
desirable in a web context. This is somewhat bad news for WASM: better low
level optimizations could mean longer compilation times. However, AFAIK,
Chrome has been working on caching compiled code, so you may only have to
compile WASM when it changes (or when you reinstall/update your browser).

------
titzer
Chrome and Firefox performance should be closer, at least in Chrome 72 and 73.
I think the comparisons may have been done without trap handler support in
Chrome (virtual memory tricks for fast memory bounds checks).

~~~
emeryberger
In our experiments with Chrome, we found no difference for SPEC with or
without the trap handler.

------
snek
Seems like the tl;dr is that V8 and SpiderMonkey need to work on adding more
optimization tiers to their wasm pipelines

~~~
maeln
Most performance-related specs for WASM are not even finished (SIMD, threads,
...). Doing a benchmark will be interesting when at least SIMD has been
properly finished and implemented.

------
crabl
Apple has shown and continues to show willingness to embrace web technologies
once they show clear promise or stability[1]. They might not implement
"hemorrhaging edge" technologies immediately, but if the technology supports
it, and it contributes positively to the user experience, they will do it.

[1]: https://twitter.com/mhartington/status/1089292031548145666

~~~
NicoJuicy
They won't add functionality that would compete with the app store.

Add to home screen is a good example

~~~
hpen
Serious question: Do non-iOS users actually add PWAs to their home screen?

~~~
marcosdumay
It's basically equivalent to bookmarks.

Browsers have been working hard to make people not want bookmarks, but
everybody I show how to use them just loves them.

(But well, they would be better if the mobile web didn't suck.)

