
Not So Fast: Analyzing the Performance of WebAssembly vs. Native Code - mpweiher
https://www.usenix.org/conference/atc19/presentation/jangda
======
bastawhiz
So many comments here missing the point. As a developer, I can use wasm to
ship a single binary that works on Windows, Mac, Linux, Android, and iOS. I
can use the language that I want. I don't need to have Xcode, visual studio,
android studio, or any other set of tools: you can reasonably build an app
with your text editor and LLVM.

And then, you can use my app by typing its address into your address bar. No
installation needed. No polluting your home directory. No registry entries.
It's sandboxed, you're never giving the app permissions that you don't intend.
A service worker can be used to make the app available offline. My dear
grandmother could use this without confusion. My dad can use these apps and
feel confident he's not infesting himself with malware or bundled
software.

The complaints about "the web doesn't feel native!" and "it's less secure!"
and "it's just as easy to ship a native app [for my platform of choice]!" are
tired arguments. We've finally got a simple way to distribute apps fully cross
platform in a way that CAN respect privacy and guarantees a reasonable level
of security. And, it's pretty darn fast.

~~~
Merad
> So many comments here missing the point. As a developer, I can use wasm to
> ship a single binary that works on Windows, Mac, Linux, Android, and iOS.

But we've accomplished that by effectively turning web browsers into VMs that
have a complexity on par with a full operating system. I have a sneaking
suspicion that we're going to end up deciding that was a huge mistake.

~~~
jdc
You'd have to go back farther than the intro of WASM to see a time that wasn't
the case.

------
flohofwoe
FWIW: in my 8-bit emulators I'm closer to the 10% performance difference for
code compiled in clang with -O3 (the native version uses latest Xcode's clang,
the WASM the latest LLVM WASM backend via emscripten):

[https://floooh.github.io/tiny8bit/cpc-ui.html](https://floooh.github.io/tiny8bit/cpc-ui.html)

The time in the top-right is the time spent per frame in the emulator, in the
WASM version that's somewhere around 2.5 to 2.8ms on my laptop.

The native version is somewhere between 2.0 and 2.2ms.

From my experiments, it's quite easy to accidentally hit a "slow path" on the
WASM side, for instance registers are spilling earlier (AFAIK because WASM has
one or two registers set aside). Things are getting better though. About a
year ago, the same emulator code seemed to have triggered some specific slow-
path and the WASM version was nearly 4x slower.

Of course it's always possible to put more time into optimizing the native
version of a piece of code when targeting a specific compiler version,
compiler-specific intrinsics and a specific CPU model. This is where WASM will
always lag behind, with a portable ISA you'll always hit a wall where further
optimization simply isn't possible.

So from my experience: somewhere between 10% slower and 3x slower is all
within the "expected" range, the more important part is that performance is
_predictable_ for a given piece of code compiled with a given compiler version
(e.g. you won't have the GC come in and do its thing for a few hundred
milliseconds). And at least the emscripten team is very responsive and
interested in such pathological cases.

~~~
hinkley
But 10% is something I can easily sell to my boss. I only need one or two
compensating qualities for that argument to succeed.

If I'm getting work done faster, prioritizing a couple more perf stories is a
reasonable tax to pay.

~~~
tracker1
It depends on what you are trying to do... you can only spend so much CPU time
on a blinking cursor waiting for a user to click/type something.

------
FpUser
Ok. At some point in the future the browser/wasm will have won. So instead of
Linux/Windows/OSX we will have Firefox/Safari/Chrome/Whatever all with their
own implementation gotchas, missing features and performance loss. All for the
sake of browser vendors trying to displace the existing native deployment
platforms. They sure have a nice financial incentive. Why do we developers
fall into this trap? We've been there quite a few times already, with cross
platform standards. No matter what, vendors will fsck 'em up with proprietary
things.

~~~
flohofwoe
But it's the other way around. Can you distribute a native application on
macOS without Apple's blessing? No, you can't (not when Catalina will be out
anyway). And Windows is closing the "gap" really fast too. Try distributing a
native installer without the browser or SmartScreen bringing up a scare-
dialog-box.

Until OS vendors are willing to provide a safe sandbox to run untrusted code
in, and untangled from a non-technical curating mechanism which has all sorts
of conflicts-of-interests of what's morally acceptable and what's not, WASM is
the next best thing to an open, yet secure, platform.

Desktop Linux aside, the web is currently the only open platform for
distributing software outside of the interests of a single "platform owner".
Google may close this "loop hole" soon, so I'm not too optimistic for the
future, but today, that's what it is.

~~~
danieldk
_But it's the other way around. Can you distribute a native application on
macOS without Apple's blessing? No, you can't (not when Catalina will be out
anyway)._

Yes you can:

[https://forums.macrumors.com/threads/unsigned-apps-
catalyst-...](https://forums.macrumors.com/threads/unsigned-apps-catalyst-app-
signing.2183873/#post-27437170)

 _WASM is the next best thing to an open, yet secure, platform._

So, instead of walled gardens with signed and checked applications (bad). We
have to run untrusted and unchecked code of vendors who run code to track your
movements around the web and in their applications? (Even worse)

 _Desktop Linux aside_

Maybe we should make Linux an acceptable platform for more people rather than
optimizing for proprietary web applications that you don't even own a copy of
anymore.

~~~
AnIdiotOnTheNet
> Maybe we should make Linux an acceptable platform for more people rather
> than optimizing for proprietary web applications that you don't even own a
> copy of anymore.

Won't ever happen for two reasons: 1) The Linux Desktop community would have
to admit that there are better ways to do things, and 2) They'd have to agree
on what those things are. 20 years of Linux Desktop history show these things
to be impossible.

Theoretically, a new community could spring up and put a new userland on top
of the Linux kernel, like Android did, but it would seem like the people who
have the interest in doing such a thing (like myself) lack enough time,
talent, and/or ability to organize sufficiently to pull it off. Not to mention
that we'd all have to agree too.

~~~
ScottFree
I lack the talent, but I'm definitely interested.

Have you taken a look around Casey Muratori's handmade.network? Or interacted
much with the guys who watch Jonathan Blow's livestreams? They make much the
same point you do about taking the linux kernel and creating a new userland
around it.

Anyone who's interested in working on something like this, email
walkinginalinuxuserland@lj3.me. The least we could all do is keep in touch,
right?

~~~
AnIdiotOnTheNet
Yes, I've hung out on Jon's and Casey's streams. I don't bother much with
handmade.network because very little there interests me. Essence is kinda
neat, but it is a long way from practicality and I'm not sure how much my
ideals and Nakst's align.

I also think a forum or IRC channel would be more appropriate, but email is a
start and I appreciate you taking the initiative.

See also Probono's Hello[0], but I'm not sure he hasn't decided that Haiku
already fits for him.

[0] [https://github.com/probonopd/hello](https://github.com/probonopd/hello)

------
danieldk
Slower at an average of 50% looks acceptable to me for many applications.

However, for me personally I wonder: what's the point? I would go from Linux
-> glibc -> native application to Linux -> glibc -> Web Browser -> WebAssembly
application. Yet another layer of abstraction.

Sure, WebAssembly can be run directly on any platform. But precompiling an
application is not a problem (my Linux distribution provides binary builds).
So, this seems to be interesting to (primarily) web applications. However, I
try to avoid web apps, because they are typically proprietary, a moving
target, the owner can randomly decide to charge me for things, or (ab)use my
data, and they feel non-native.

(I do see the benefit for people who use web apps, since WebAssembly is
typically faster than Javascript and opens up web app programming to more
languages.)

~~~
cjfd
People might start baking chips that can natively run web assembly at some
point. If that happens we have found a very expensive path from running native
binaries to running native binaries for the purpose of making this independent
of the platform.

~~~
not_a_cop75
Lol. Well, I mean nothing is outside of the realm of possibility, but remember
those native Java chips! Some thought those were going to be a big win. And
then you had companies like Transmeta trying to do the exact opposite -
interpretively optimize all the things....or something like that.

~~~
CalChris
Transmeta JIT'd x86 into VLIW.

That was right before the Power Wall became an obstacle and CPUs were already
burning up. Transmeta in a sense invented low power computing but then Intel
said _thank you very much; we'll take it from here._ It wasn't that they
didn't deliver. It's just that they were competing against Intel in the x86
space. A lot of the Transmeta team moved to nVidia and work on Denver but
that's different and they got an architecture license from ARM to do that.

------
titzer
Please actually read the paper. As usual, the mean is a poor approximation of
a distribution. This paper states repeatedly that it was able to reproduce the
experimental results from the 2017 PLDI paper, which primarily used PolyBenchC
as the suite. My team wrote that paper. We didn't cherry pick.

The issue is mostly that we suck more on Spec2k6 than we realized. Since
publishing that paper we do track Spec2k6 internally but we don't spend time
tuning for it.

/v8 wasm runtime TL

~~~
zenlibs
Thank you for the clarification. Some questions related to V8 perf:

1. WASM <--> JS/Host calls seem to have a higher cost on Chrome than FF
[1][2]. Is that in your optimization roadmap?

2. Wasmer tracks V8 perf against other backends, and the LLVM backend
consistently outperforms V8 [3]. Can anything be done to address this,
especially for node, where the compile time cost of a super optimizing
compiler might be beneficial in certain applications?

[1]: [https://bnjbvr.github.io/perf-wasm-calls/](https://bnjbvr.github.io/perf-wasm-calls/)

[2]: [https://hacks.mozilla.org/2018/10/calls-between-
javascript-a...](https://hacks.mozilla.org/2018/10/calls-between-javascript-
and-webassembly-are-finally-fast-/)

[3]: [http://speed.wasmer.io/comparison/](http://speed.wasmer.io/comparison/)

------
wiremine
How much of this performance gap is due to the inherent limitations of the
WebAssembly technology, and how much is due to it being relatively new?

I.e., do we expect to see performance improvements similar to what we've seen
in JavaScript over the last two decades? Or are we going to need better
hardware to realize better performance?

~~~
Paul-ish
Section 6.4 tries to answer this question.

> It is worth asking if the performance issues identified here are fundamental.
> We believe that two of the identified issues are not: that is, they could be
> ameliorated by improved implementations. WebAssembly implementations today
> use register allocators (§6.1.2) and code generators (§6.2.1) that perform
> worse than Clang’s counterparts. However, an offline compiler like Clang can
> spend considerably more time to generate better code, whereas WebAssembly
> compilers must be fast enough to run online. Therefore, solutions adopted by
> other JITs, such as further optimizing hot code, are likely applicable here.

> The four other issues that we have identified appear to arise from the
> design constraints of WebAssembly: the stack overflow checks (§6.2.2),
> indirect call checks (§6.2.3), and reserved registers (§6.1.1) have a runtime
> cost and lead to increased code size (§6.3). Unfortunately, these checks
> are necessary for WebAssembly’s safety guarantees. A redesigned
> WebAssembly, with richer types for memory and function pointers [23], might
> be able to perform some of these checks at compile time, but that could
> complicate the implementation of compilers that produce WebAssembly.
> Finally, a WebAssembly implementation in a browser must interoperate with a
> high-performance JavaScript implementation, which may impose its own
> constraints. For example, current JavaScript implementations reserve a few
> registers for their own use, which increases register pressure on
> WebAssembly.
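
To make the indirect call check (§6.2.3) concrete, here is a toy Python model
of what a `call_indirect` has to verify before jumping to the callee. It is
only a sketch: `Trap`, `FuncRef`, and the example table are hypothetical names
made up for illustration, not code from the paper or from any engine, and a
real engine emits these checks as a few machine instructions rather than
Python.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


class Trap(Exception):
    """Raised where a real engine would abort execution with a trap."""


# A function signature, modeled as (parameter types, result types).
Signature = Tuple[Tuple[str, ...], Tuple[str, ...]]


@dataclass
class FuncRef:
    """Hypothetical table entry: a callee plus the signature it was declared with."""
    signature: Signature
    code: Callable[..., int]


def call_indirect(table: List[Optional[FuncRef]], index: int,
                  expected: Signature, *args: int) -> int:
    # Check 1: the index must fall inside the table.
    if not 0 <= index < len(table):
        raise Trap("undefined element")
    entry = table[index]
    # Check 2: the slot must actually hold a function.
    if entry is None:
        raise Trap("uninitialized element")
    # Check 3: the callee's signature must match what the call site expects.
    if entry.signature != expected:
        raise Trap("indirect call type mismatch")
    return entry.code(*args)


I32_I32_TO_I32: Signature = (("i32", "i32"), ("i32",))
I32_TO_I32: Signature = (("i32",), ("i32",))

# Example table (an assumption for illustration): slot 1 is left empty.
table: List[Optional[FuncRef]] = [
    FuncRef(I32_I32_TO_I32, lambda a, b: a + b),
    None,
    FuncRef(I32_TO_I32, lambda a: -a),
]
```

With this table, `call_indirect(table, 0, I32_I32_TO_I32, 2, 3)` returns 5,
while an out-of-range index, the empty slot 1, or a mismatched signature each
raise `Trap`. In this model all three checks run on every indirect call, which
is the runtime and code-size cost the quote refers to.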

~~~
wiremine
> A redesigned WebAssembly, with richer types for memory and function pointers
> [23], might be able to perform some of these checks at compile time, but
> that could complicate the implementation of compilers that produce
> WebAssembly.

Great info, thanks for sharing! So, to summarize:

Performance will get a bit better for WebAssembly 1.0, but we'll need a
WebAssembly 2.0 (for lack of a better term) to realize major performance
improvements. And/Or we'll need advancements in JIT technology.

Beyond that, there will always be some limitations because of the Javascript
interop requirements.

And, as other threads here have pointed out... maybe who cares? It's still
pretty fast!

------
iTokio
> an offline compiler like Clang can spend considerably more time to generate
> better code, whereas WebAssembly compilers must be fast enough to run online

> [security] checks are necessary for WebAssembly’s safety

> current JavaScript implementations reserve a few registers for their own
> use, which increases register pressure on WebAssembly

~~~
PudgePacket
> > an offline compiler like Clang can spend considerably more time to
> generate better code, whereas WebAssembly compilers must be fast enough to
> run online

[https://hacks.mozilla.org/2018/01/making-webassembly-even-
fa...](https://hacks.mozilla.org/2018/01/making-webassembly-even-faster-
firefoxs-new-streaming-and-tiering-compiler/)

> One of these speedups is streaming compilation, where the browser compiles
> the code while the code is still being downloaded. Up until now, this was
> just a potential future speedup. But with the release of Firefox 58 next
> week, it becomes a reality.

> Firefox 58 also includes a new 2-tiered compiler. The new baseline compiler
> compiles code 10–15 times faster than the optimizing compiler.

Seems to deal with the "whereas WebAssembly compilers must be fast enough to
run online"

~~~
azakai
Streaming and tiering do help, and I believe all browsers do that. Still, the
article's point is valid that offline compilers can spend more time compiling,
because even if you have a faster tier before you, running a slower compiler
eats up battery and increases the time before the code runs at full speed.
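
A back-of-the-envelope model of that trade-off, as a sketch with made-up
numbers: the function names and every figure below are assumptions for
illustration, except that the baseline compiler is taken to be 12x faster than
the optimizing one, inside the 10–15x range quoted above. It also assumes the
optimizing compile runs in the background at no cost to the running code,
which flatters tiering.

```python
def run_seconds(work, compile_seconds, speed):
    """Time to finish `work` units when a single compiler runs before any code does."""
    return compile_seconds + work / speed


def tiered_seconds(work, base_compile, base_speed, opt_compile, opt_speed):
    """Baseline code starts first and runs while the optimizing compiler works,
    then the remaining work switches to the optimized code."""
    done_on_baseline = base_speed * opt_compile  # work finished before the switch
    if work <= done_on_baseline:
        # The program ends before the optimized code is ever ready.
        return base_compile + work / base_speed
    return base_compile + opt_compile + (work - done_on_baseline) / opt_speed


# Hypothetical figures, chosen only to make the arithmetic easy to follow.
OPT_COMPILE = 3.0                 # seconds for the optimizing compiler
BASE_COMPILE = OPT_COMPILE / 12   # baseline compiler assumed 12x faster
BASE_SPEED = 50.0                 # work units per second, baseline code
OPT_SPEED = 100.0                 # work units per second, optimized code

long_job = 1000.0
offline = run_seconds(long_job, 0.0, OPT_SPEED)            # compiled ahead of time
opt_only = run_seconds(long_job, OPT_COMPILE, OPT_SPEED)   # no baseline tier
tiered = tiered_seconds(long_job, BASE_COMPILE, BASE_SPEED, OPT_COMPILE, OPT_SPEED)
```

Under these assumptions the long job takes 13.0 s with the optimizing compiler
alone, 11.75 s with tiering, and 10.0 s when the same optimized code was
compiled offline. Tiering narrows the gap but the code still spends
`BASE_COMPILE + OPT_COMPILE` seconds below full speed, and the model does not
count the battery spent compiling.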

------
mikece
So... don't completely discard native desktop apps just yet? That said, I
suspect we'll soon see something akin to Electron for WebAssembly: a toolkit
for making UIs that acts as a permissions context/bridge to making calls into
the host operating system's API. Or maybe WebAssembly won't be the
revolutionary tidal wave of change that many of its proponents hope it will
be?

~~~
jtbayly
Won’t that be a security nightmare?

~~~
kowdermeister
Electron apps can already access the host OS, spawn child processes:

[https://github.com/martinjackson/electron-run-shell-example](https://github.com/martinjackson/electron-run-shell-example)

------
gok
If anything this paper is unfair to the native compiler; they compare V8 and
SpiderMonkey from April 2019 to a clang release from March 2017, and they turn
off autovectorization. That being said, most of the slowdowns they identify
seem to be deficiencies in the WASM engines rather than fundamental problems
with WASM itself.

------
jakobegger
Can someone explain a concrete use case for web assembly? Are there projects
out there using web assembly in production? What kind of application needs
more performance than Javascript can offer, but doesn't need full native
performance?

~~~
GeneralMaximus
Figma ([https://www.figma.com/](https://www.figma.com/)) is a design tool that
runs in the browser and makes heavy use of WASM. It's definitely slower than
using a native app like Sketch, but not slow enough to be completely
impractical. A few design people I work with here in India have switched to
Figma from Sketch, and they seem very pleased with the results.

I can see WASM becoming useful for enabling apps that do real-time
audio/video/image processing in the browser. Games are another obvious use
case.

Maybe one day I'll be able to run an Ableton-lite right in the browser.

~~~
throwawaywindev
Everyone likes Figma because of its real time collaboration. The performance
is terrible but you put up with it because of the features.

------
BrissyCoder
It's early days. Calm down. It will speed up.

~~~
inlined
The reserved registers may always be a problem. It reminds me of the people
who complained about massive slowdown in games when Vista released. The games
were so optimized to fit in a single page that the few new pieces of state in
the window manager caused page thrashing.

~~~
titzer
For single-instance scenarios with a single memory, I think using a 0-based
memory will be a lot more efficient than reserving a register for the memory
base. That doesn't fly when a Wasm engine is embedded into an arbitrary C++
program, but for standalone scenarios this may be feasible.

------
Octoth0rpe
Tl;dr: 2x slower in FF, 2.5x slower in chrome. Depending on the application,
that doesn't seem to be a huge tradeoff. Certainly many developers are
comfortable writing things in python, which is likely hundreds of times slower
than C++ in many problem spaces.

TBH though my big hope for web assembly has nothing to do with the web at all,
but as a distribution format for drivers that can be installed on any cpu
arch/kernel. Finally, my dream of BSD on risc-v is achieved! ....or more
likely, just running linux on arm.

~~~
royjacobs
You don't mind drivers being half as performant as they were before, with the
only benefit being they're easier to distribute? Why? How is that beneficial
to anyone except the person packaging the driver?

~~~
bad_user
The browser is a really good sandbox. The security of browsers is also what
makes web apps easy to distribute.

That said I don't see the point of writing drivers in webassembly. If you want
better security for drivers, there are better endeavors like microkernel
architectures which Linux still isn't using.

------
mises
10% slower is actually not that bad, when compared to the safety and
portability not given by C/C++. Past 10% performance parity is the stuff of
performance optimization challenges; we may have to accept that.

We're all putting up with regular 10% drops in our CPU performance because
Intel was stupid about a lot of things; maybe we could give 10% up for common
applications (let's face it, most aren't CPU-bound) having more safety.

Of course, I don't know if wasm is the answer. Jury's still out for whether
it's great in native or not; honestly, GUI is still the issue.

------
jacknews
I would say even 50% slow down is a fair trade-off for very many applications,
for the benefit of running almost anywhere, without installation etc.

That was Java's claim (but not the reality), what's the slowdown there?

~~~
eitland
In a lot of cases Java is faster than what average developers would write in
C.

That is except startup time.

------
hellofunk
The article claims that this is a revelation, but I’ve been hearing for at
least one year already that webassembly comes at best around half the speed of
native.

------
jedisct1
Relevant: benchmarking WebAssembly using libsodium:
[https://00f.net/2019/04/09/benchmarking-webassembly-using-
li...](https://00f.net/2019/04/09/benchmarking-webassembly-using-libsodium/)

------
devit
It looks like there's another issue that is not quite mentioned: the Chrome
code generator is using %rbx to point to the start of the WebAssembly memory,
which means that the available addressing modes are severely reduced and a
register is used up.

It seems like it could use a %fs or %gs segment override instead (one of
which can be programmed arbitrarily at least on Linux), which should be cost-
free on all x86-64 processors and would solve the issue.

Once this is changed, the only remaining fundamental unfixable differences
would be reduced cache efficiency from the extra bytes for the segment
override, and the need to reserve an extra register for the wasm stack
pointer.

However, running WebAssembly code in a separate address space would allow to
run arbitrary code, although this would have the drawback of higher memory
usage and IPC slowdown.

Is there any article discussing the rationale behind Firefox and Chrome's
current approach and whether they plan to use either segment overrides or
process isolation?

~~~
titzer
> the Chrome code generator is using %rbx

The memory base is actually loaded from the instance pointer, and neither of
these are pinned to specific registers. They're both allocatable by the
register allocator.

> %fs or %gs segment override instead

This is tricky to do for all platforms that V8 supports. I'm not sure it's
free; it's still at the very least another prefix.

~~~
devit
> This is tricky to do for all platforms that V8 supports

It can be done only where supported. Ivy Bridge and later have instructions to
set them in user space (WRFSBASE/WRGSBASE), and although it needs the OS to
enable it, I would expect it can be done on all significant x86-64 OSes.

> I'm not sure it's free; it's still at the very least another prefix.

There are benchmarks that show TLS is as fast as global variable, so it's
probably free (aside from the impact on the I-cache due to increased code
size, but using a general purpose register is probably much worse due to
additional instructions and extra SIB bytes).

------
dmitrygr
Watching wasm fans realize they've reinvented java is priceless

------
leshow
I didn't watch the video, does it say how much of the perf gap is due to the
Unix APIs they tried to replicate and how much is due to wasm actually being
'slower'?

50% the speed of native is really not bad, it's not like wasm is done being
worked on. That's a very promising start.

~~~
emeryberger
(I'm one of the paper authors.) The paper points out that the overhead of
Browsix-WASM is under 1% across all benchmarks.

------
jerp
Coming from a win32 desktop background, how would I do a UI in web assembly?

~~~
ScottFree
Have you seen Blazor?

[https://dotnet.microsoft.com/apps/aspnet/web-apps/client](https://dotnet.microsoft.com/apps/aspnet/web-apps/client)

[https://www.youtube.com/watch?v=uW-Kk7Qpv5U](https://www.youtube.com/watch?v=uW-Kk7Qpv5U)

It packages the entire .NET framework into a single wasm app, so it's a little
bloated, but it works.

------
pulse7
Not So Fast... yet... (implement first, optimize later)

