
Replacing JavaScript Hot Path with WebAssembly - markdog12
https://developers.google.com/web/updates/2019/02/hotpath-with-wasm
======
robko
My guess is that the JS implementation of the worst-performing browser is
having trouble with the non-unit for-loop steps. Doing 90-degree image rotation
with unit steps and some index calculations should work better (0.18 sec vs
1.5 sec for their implementation in node.js):

    
    
        for (var y = 0; y < height; y++)
            for (var x = 0; x < width; x++)
                b[x + y*width] = a[y + (width - 1 - x)*height];
    

Although that's still far from the theoretical maximum throughput because the
cache utilization is really bad. If you apply loop tiling, it should be even
faster. This problem is closely related to matrix transpose, so there is a
great deal of research you can build upon.

EDIT: 0.07 seconds with loop tiling:

    
    
        for (var y0 = 0; y0 < height; y0 += 64){
            for (var x0 = 0; x0 < width; x0 += 64){
                for (var y = y0; y < y0 + 64; y++){
                    for (var x = x0; x < x0 + 64; x++){
                        b[x + y*width] = a[y + (width - 1 - x)*height];
                    }
                }
            }
        }

~~~
acqq
Your 0.18 sec result is (to use the units they used in the article) 180 ms,
and if I understand correctly their best WebAssembly compiled-and-executed
result is 300 ms. Beautiful.

EDIT: But it could also be that your computer is somewhat faster than theirs?
Do you happen to have a very fast CPU? Can you say which? When I run C-like
C++ versions of your code I get the speeds you get with node.js. Still, you
achieved overall much better results than they did; it's great work!

    
    
        #include <stdio.h>
        int main(int argc, char* argv[]) {
            enum { height = 4096, width = 4096 };
            unsigned* a = new unsigned[ height*width ];
            unsigned* b = new unsigned[ height*width ];
            if ( argc < 2 ) { // call with no params
                // to measure overhead when just allocations
                // and no calculations are done
                printf( "%p %p\n", (void*)a, (void*)b );
                return 1;
            }
            if ( argv[1][0] == '1' ) { // call with 1 for the fastest (tiled) version
                for (unsigned y0 = 0; y0 < height; y0 += 64)
                    for (unsigned x0 = 0; x0 < width; x0 += 64)
                        for (unsigned y = y0; y < y0 + 64; y++)
                            for (unsigned x = x0; x < x0 + 64; x++)
                                b[x + y*width] = a[y + (width - 1 - x)*height];
            } else {
                for (unsigned y = 0; y < height; y++)
                    for (unsigned x = 0; x < width; x++)
                        b[x + y*width] = a[y + (width - 1 - x)*height];
            }
            delete[] a;
            delete[] b;
            return 0;
        }

~~~
vijaybritto
I think it's fast because of the L1 cache or something like that. I don't
fully understand it, but this is what I got.

~~~
acqq
The fastest version is the fastest because it's the most cache-friendly one of
all which were presented. See e.g.

[https://stackoverflow.com/questions/5200338/a-cache-efficient-matrix-transpose-program](https://stackoverflow.com/questions/5200338/a-cache-efficient-matrix-transpose-program)

But note that robko made an improvement even before making that.

~~~
acqq
> made an improvement even before

Or maybe not: my short experiments with the simplified version based on their
algorithm and his JavaScript versions gave some conflicting results. I haven't
verified them thoroughly; this note is just to motivate others to try.

------
iamleppert
Did you see the benchmarks? There's almost no difference between JavaScript
and wasm except in a single browser. So you're really going to take on the
maintenance burden to get that better performance?

This is a cool technique, but I can just imagine the looks on my teammates'
faces when I tell them it isn't React... :/

~~~
nicoburns
Weird that they didn't say which browser is which :/

~~~
RivieraKid
Probably they were worried it would look like they're trying to shame their
business partners (Apple and Microsoft, I guess).

~~~
acqq
Actually it seems that the second worst in JavaScript (when executing their
example) is Chrome?

User robko here
[https://news.ycombinator.com/item?id=19167078](https://news.ycombinator.com/item?id=19167078)
measured the code on node.js, which is based on Chrome's V8. He measured 1.5
sec vs the article author's roughly 2.7 s, so robko's CPU seems to be almost
twice as fast. The other two (fast) JavaScript results are under 500 ms and
the slowest is 8 seconds, so Chrome's V8 remains the only candidate for the
second worst performer on their example.

~~~
om2
I wish they had at least posted a browser-runnable version of their test so we
could see for ourselves which browser is which, or compare JS vs WASM on our
own systems. (On this type of code, I'd expect Safari to be the fastest, not
Chrome.)

~~~
acqq
See my "minimal" C++ translation in my other post here. There's not much to
add. For JavaScript, start with their code but add the allocations: just use
var a = new Uint32Array(height * width); and the same for b. Add the timing
(1), put it in an HTML page and you're done. It's easy, just a few minutes
for anybody who works with this (and this site should be filled with
competent developers AFAIK).

1) [https://developer.mozilla.org/en-US/docs/Web/API/Performance](https://developer.mozilla.org/en-US/docs/Web/API/Performance)

------
alangpierce
The "predictable performance" point applies not just to consistency across
browsers but also to not having to pay JIT warm-up costs. A while back, I ran
some benchmarks on the same codebase in TypeScript and AssemblyScript and
found that wasm was much faster than JS for short computations, but often
slower than JS when V8 is given multiple seconds to fully warm up the JIT:

[https://github.com/alangpierce/sucrase/issues/216](https://github.com/alangpierce/sucrase/issues/216)

So really, it depends a lot on the use case. In my case, it's often a short-
lived node process that a user is directly waiting on, so compiling to wasm is
probably useful. It also depends on what you're doing; some types of work
(e.g. where you'd want careful memory management) are a lot harder for V8 to
optimize from JS and can be expressed more nicely in AssemblyScript or another
language that gives more memory flexibility.

~~~
why_only_15
Going by the second speed test there, it looks like WebAssembly will win
unless you're running the same JS on a really huge dataset. Even when
compiling 50 MB of JS with that tool, wasm is only 5% slower than JS, and
when compiling a more typical 500 KB it's 300% faster.

------
gok
Wow, all these numbers seem insanely bad. 500 milliseconds to transpose 16
million pixels (so 64 million bytes)? A modern CPU should be able to do that
at least 10x faster, if not 100x.

~~~
robko
You are correct. The code is using an inefficient cache access pattern, so
most of the time is spent waiting.

You probably won't get 100x faster without SIMD, but 10x is certainly doable.
Unfortunately, SIMD.js support was removed from Chrome and Firefox a while
ago, and SIMD is still not available in wasm to this day.

~~~
kllrnohj
How would SIMD do anything to address the problem's fundamentally
cache-unfriendly access pattern? You'd need to restructure the problem to be
cache-friendly, but SIMD won't really be relevant to that.

~~~
robko
You can use both at once. Usually, you'd have something like 64x64 tiles in
cache and use 4x4 or 8x8 tiles for SIMD.
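
A sketch of that two-level structure in plain JS (tile sizes are assumed to
divide the image dimensions; JS has no SIMD intrinsics, so the inner 8x8
micro-tile is a scalar stand-in for where a vector kernel would go):

```javascript
// Two-level tiling: 64x64 outer tiles sized for the cache, 8x8 inner
// micro-tiles where a SIMD transpose kernel would sit in wasm-with-SIMD or
// C intrinsics. The inner loop here is scalar; the point is the structure.
function rotateTiled(a, b, width, height) {
  const CACHE = 64, MICRO = 8; // assumed to divide width and height
  for (let y0 = 0; y0 < height; y0 += CACHE)
    for (let x0 = 0; x0 < width; x0 += CACHE)
      for (let y1 = y0; y1 < y0 + CACHE; y1 += MICRO)
        for (let x1 = x0; x1 < x0 + CACHE; x1 += MICRO)
          // scalar stand-in for an 8x8 vectorized kernel:
          for (let y = y1; y < y1 + MICRO; y++)
            for (let x = x1; x < x1 + MICRO; x++)
              b[x + y * width] = a[y + (width - 1 - x) * height];
}
```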

------
thefounder
Can we expect a day when WASM will be first class citizen in browsers (i.e
like JavaScript) and not just a sidekick?

~~~
tachyonbeam
Not anytime soon IMO because WASM still has to access browser APIs through the
DOM, which is really built with JS in mind.

~~~
int_19h
HTML DOM is described in terms of IDL interfaces, complete with types. I
wouldn't say that it's optimized for JS - indeed, that's why jQuery and
similar were introduced. When WHATWG took over, they improved it specifically
for better JS interop, but it's still straightforward to map to most
statically typed languages.

[https://dom.spec.whatwg.org/#infrastructure](https://dom.spec.whatwg.org/#infrastructure)

[https://heycam.github.io/webidl/](https://heycam.github.io/webidl/)

~~~
olliej
The problem isn’t exposing the APIs; the problem is that wasm has what is
essentially the C memory model, so you couldn’t trust any pointer/object you
get from wasm land.

That’s why there’s so much work being put into giving wasm a more typical
(for a VM) typed heap. Similar issues occur with object lifetimes: if you get
anything from the DOM, you have to keep it live while wasm references it, but
wasm has no notion of what a managed object or a handle is.

These are solvable problems, but you’re not getting DOM access until after
they’re solved.

~~~
int_19h
Why can't wasm just use opaque handles for DOM objects? It doesn't need them
to be in wasm-accessible memory, after all. It just needs to be able to invoke
methods on them.

~~~
olliej
It’s not just “wasm needs to be able to invoke them”.

Because the wasm memory model doesn’t have typed memory, if you call a DOM
API and get a handle back, you need to store it, and then you need to be able
to pass it back to the host VM.

So now your wasm code needs to make sure the handle stays live. Wasm by
design doesn’t interact with the host GC, so you have to manually keep the
handle alive (refcounting APIs or whatever), and the host VM has to have some
way to deal with you trying to use the handle without having kept it alive.

Similarly, because wasm is designed around storing raw memory in the heap,
the wasm code can treat the handles as integers. E.g. an attacker can just
generate spoofed handles and try to create type-confusion bugs, or manually
over-release things.

So the problem isn’t “how do we let wasm make these calls” but rather “how do
we do that without making it trivially exploitable”.
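
A toy illustration of the handle-table idea under discussion (this is not any
real wasm or browser API, just a sketch of how a host could hand out
forgery-resistant integer handles with explicit refcounting):

```javascript
// Hypothetical host-side handle table: the wasm side only ever sees integer
// handles; the host validates every handle before dereferencing it, so
// spoofed or over-released handles fail instead of causing type confusion.
class HandleTable {
  constructor() {
    this.objects = new Map(); // handle -> { obj, refcount }
    this.next = 1;            // 0 reserved as the "null handle"
  }
  retain(obj) {               // host wraps an object, hands an int to wasm
    const h = this.next++;
    this.objects.set(h, { obj, refcount: 1 });
    return h;
  }
  get(h) {                    // host validates handles coming back from wasm
    const e = this.objects.get(h);
    if (!e) throw new Error(`invalid or stale handle: ${h}`);
    return e.obj;
  }
  release(h) {                // wasm must release manually; no host GC interop
    const e = this.objects.get(h);
    if (!e) throw new Error(`invalid or stale handle: ${h}`);
    if (--e.refcount === 0) this.objects.delete(h);
  }
}
```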

~~~
int_19h
But surely that is also fundamentally a solved problem? I mean, we've had
distributed systems for a long time, and they had to deal with all the same
issues - lifetime, security etc.

~~~
olliej
Distributed systems are designed (for better or worse) around the idea of
non-malicious nodes.

Those that aren't have an extremely limited API, which would be logically not
dissimilar from "untrusted wasm talks to more trusted JS".

------
PhilippGille
I would have loved to see Go included in the comparison as well. It can
compile to wasm since 1.11.

~~~
dassurma
Go was pretty much a non-starter. It (currently) needs a runtime, which makes
the file size non-competitive with the other languages. Also, since only
Chrome has support for threads in WebAssembly (in Canary), we’d not be able
to make any use of Go’s concurrency.

------
kodablah
I'd be tempted to hand-write the WAT for that. It's not that bad, much easier
than dropping into an x86 asm block in C or something.

~~~
dassurma
I did :D Turns out compilers are pretty good at generating WAT.

------
barrystaes
Hmm.. the Google article stipulates:

> WebAssembly on the other hand is built entirely around raw execution speed.
> So if we want fast, predictable performance across browsers for code like
> this, WebAssembly can help.

So I wanted to see how I could use WebAssembly in React webapps. I found this
SO question that observes the opposite:

> When running this [ WebAssembly] code in Chrome, I observe "pauses" that
> cause the app to be a bit jittery. Running the app in Firefox is a lot
> faster and smoother.

[https://stackoverflow.com/questions/53584607/react-app-with-webassembly-is-slow-in-chrome-but-fast-in-firefox-why](https://stackoverflow.com/questions/53584607/react-app-with-webassembly-is-slow-in-chrome-but-fast-in-firefox-why)

Same code, different browser, different performance. I'd love to see a Google
Developer answer that question in depth.

~~~
IshKebab
That looks like a bug, rather than an inherent characteristic of wasm.

------
z3t4
I would try optimizing the JS before dropping down to WebAssembly. For
example, try replacing let and const with var, as let and const in loops have
to create a new variable for each iteration.

~~~
Etheryte
That's not how let and const work at all, where did you get that impression?

~~~
z3t4
Have you ever made a for loop using var only to have the variable point to the
last value in the iteration ? And had to make a closure using forEach,
function or self calling function ? With let you do not have to do that as a
new variable is created for each iteration. Instead or reassigned when you use
var.
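
The classic demonstration of this (runnable as-is):

```javascript
// With `var` every closure captures the same binding, so all of them see the
// final loop value; with `let` each iteration gets a fresh binding.
const withVar = [];
for (var i = 0; i < 3; i++) withVar.push(() => i);

const withLet = [];
for (let j = 0; j < 3; j++) withLet.push(() => j);

console.log(withVar.map(f => f())); // [ 3, 3, 3 ]
console.log(withLet.map(f => f())); // [ 0, 1, 2 ]
```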

------
sisu2019
WebAssembly is a bit underwhelming to be honest. It feels like every week
there is a new language that can come close to C performance meanwhile they've
been working on WebAssembly for years and years and it can barely beat JS.

Shouldn't WA, as a greenfield project with its extremely basic memory model
and lack of a runtime or standard library, be super easy to optimize?

After all, there is no point in having the bad ergonomics of assembly together
with the awful performance of JS, right?

------
amelius
Those are runtimes in the second range. Are they doing that in a separate
thread or do they block the UI? And how long does it take to transfer the data
to that thread?

------
asien
The performance gain are so small , its not worth this setup overhead . The
average user won’t see the difference. Hence , the simplicity of’this module .
Just do it in JS , chrome as 70% market share why would you ever bother ?

V8 has received decades of optimizations and it can easily compete with
compiled languages in terms of speed.

I was hyped to death for WASM , but this is the tenth article I’m reading on
this subject and I still ending on the same conclusion : there is no advantage
for front end developers to use WASM.

Only Rendering Engine ( Unity , Adobe Products, Autodesk ) can really benefits
from this.

~~~
news_to_me
> Chrome has 70% market share, why would you ever bother?

This view seriously needs to die. It's honestly not that hard to test in two
or three browsers, and the differences are minor enough that it isn't a pain.
But the only way that's possible is through Web standardization, which only
happens when there are diverse options.

As web developers, it's our duty to keep the web healthy, and that means not
only optimizing for a single browser.

~~~
outworlder
Agreed. This is IE 6.0 all over again.

~~~
shpx
With the exception that Chrome is a good browser.

~~~
olliej
While msft did abuse their position to solidify an IE-centric world, people
need to realize that when IE 4/5/6 were released they were dramatically
better than the competition. The problem is that post-domination they simply
stagnated, and so the design shortcomings started being a problem.

It needs to be repeated: at the time IE /was/ a good browser. Just like
Chrome today. And, similar to Chrome, it played fast and loose with
web-exposed features. Sometimes for the better (XHR was an IE invention),
sometimes for worse (so was ActiveX).

~~~
cesarb
> Sometimes for the better (XHR was an IE invention), sometimes for worse (so
> was activeX).

Wasn't XMLHttpRequest an ActiveX object?

------
jijji
anyone like to take a guess what "Browser 4" is?

------
Roboprog
Too bad there's no wasm on IE 11, where it would be needed most.

~~~
tracker1
Depends on your use case... a lot of places are just plain deprecating IE
altogether.

~~~
ghayes
Isn't Microsoft one of the founding companies in the initiative? [0]

[0] [https://webassembly.org/](https://webassembly.org/)

~~~
simlevesque
Microsoft !== IE

The logo on the front page you linked is the logo of Microsoft's other
browser, Edge. There is no other mention of Microsoft or Internet Explorer on
it.

------
gcb0
(overly ironic) tl;dr "let's rewrite something in this new browser feature
because the other browser feature we added last week is not supported
anywhere and is buggy in Chrome"

