
The Path to Parallel JavaScript - leo2urlevan
https://blog.mozilla.org/javascript/2015/02/26/the-path-to-parallel-javascript/
======
reissbaker
This is nice for some pretty limited use cases, but the most common use case
for multithreading in app-like programs (which is what Worker-based apps
presumably service: these are not documents) is removing latency from the UI
thread. But as long as only the main thread can touch the UI, and the main
thread also can't access shared memory, this limits its use to scenarios
where copying the entire render state from a Worker is reasonably fast — in
which case, Workers currently already solve the problem. This proposed
implementation of shared memory doesn't actually solve one of the big
remaining needs for shared memory, which is when it's prohibitively expensive
to copy state between a Worker thread and the UI thread at 60fps.

For example, Workers aren't particularly useful for games in their current
iteration: the overhead of copying the state of the world back to the
rendering thread is high. This is exactly the problem that shared memory
_would_ solve, were it not limited to Workers. This puts web export (or even
primary web-based game authorship) at a significant disadvantage as compared
to native apps: native code can share memory, and web-based implementations
can't. In many cases architectures that are optimal for shared-memory
threading are pathological when the rendering thread requires copies, meaning
that threading gets thrown out the window for web. Even with asm.js-compiled
"near-native" performance on the single core, you can only use 25% of the
available CPU if you can't use multithreading. A 4x performance hit is the
difference between 60fps and 15fps... Or 15fps and ~4fps.

The title of the blog post got me pretty excited, but the proposal is fairly
disappointing in terms of unlocking better performance for web apps. The use
cases here are pretty limited to things like CPU-bound number crunching, and I
doubt too many people are running machine learning algorithms in a browser as
compared to the number of people who're using browsers to, y'know, render UIs. By
all means scope the problem down to sharing primitive data in ArrayBuffers —
we can build abstractions on top of that! — but limiting it to Worker threads
makes it near-useless for most web applications. Workers already solve the use
cases for UIs that can tolerate copies between the UI thread and the Worker
threads, and this proposal doesn't allow us to solve needs for UIs that can't.

~~~
azakai
The blog post does mention the main thread as something that is more complex
and in need of further investigation.

Still, even without shared memory being accessible to the main thread, I think
sharing between workers can be extremely useful. Yes, you need to proxy
information to the main thread in order to render, but that doesn't need to be
a big problem. See for example the test here, where a 3D game's rendering was
proxied to the main thread with little loss of performance:

[https://blog.mozilla.org/research/2014/07/22/webgl-in-web-workers-today-and-faster-than-expected/](https://blog.mozilla.org/research/2014/07/22/webgl-in-web-workers-today-and-faster-than-expected/)

That very small overhead could be worth paying if it lets the game run in
multiple workers using shared memory.

Also, things like Canvas 2D and WebGL are APIs that might exist in workers;
there are efforts towards that happening right now. That would eliminate the
need to copy anything to the main thread and avoid a lot of latency.

~~~
reissbaker
I can't speak to the BananaBread codebase, since I haven't read it — although
BananaBread performs poorly as compared to commercial engines, regardless —
but if you look at even the published benchmarks in that blog post, the
Worker-based implementation is slower in all cases except for Firefox with two
bots. Chrome is always slower when using Workers, sometimes massively so, and
Firefox with ten bots is slower multithreaded than single-threaded.

Regardless, shared memory isn't only useful for WebGL. It's useful for any
kind of UI where you don't want to block the main thread, and if you don't
have DOM access it's tough to make that work. If copying is fine then current
Workers are good enough; if copying isn't fine, then this doesn't change that.

~~~
azakai
I agree with

> If copying is fine then current Workers are good enough; if copying isn't
> fine, then this doesn't change that.

I am saying that copying _is_ fine (for most apps).

Copying is fine because BananaBread is indeed less performant than commercial
engines, as you said, and that actually makes it a _better_ test candidate
here. It issues far more GL commands than a perfectly optimized engine would,
which means much more overhead in terms of sending messages to the main
thread.

Despite that extra overhead, it does very well. 2 bots, a realistic workload,
is fast in Firefox, and the slowdown in Chrome (where message-passing overhead
is higher) is almost negligible. 10 bots, as mentioned in the blog post, is a
stress test, not a realistic workload. As expected, things start to get slower
there. (Game is still playable though!)

And the large number of GL commands tested there is much more than what a
typical UI application would need. So for UI applications, which just want to
avoid stalling the main thread, I think a single worker proxying operations to
the main thread could be enough. Copying is fine.

The proposal in the blog post here is for other cases. Workers + copying
already solve quite a lot, very well. What this new proposal focuses on is
computation-intensive applications that can multiply their performance by
using multiple threads. For example, an online photo-editing application needs
this. This might be just a small fraction of websites, but they are important
too.

~~~
reissbaker
Here are some things that BananaBread doesn't test that I suspect would break
larger games in commercial engines:

* High-poly models. This is one area where copying breaks down: that can be a ton of data. BananaBread has a single, low-poly model that it uses for NPCs, and it presumably rarely (if ever?) needs to be re-copied.

* Large, seamless worlds. If you can transfer the entire world into a single static GL buffer, sure, copying isn't a problem since it only happens once on boot. If you need to incrementally load and unload in chunks, you're going to be paying that cost again and again.

* Multi-pass rendering. In fact, the proxying approach makes multi-pass rendering impossible, as noted in the blog post.

By all means, if your application already works single-threaded, or within the
confines of the existing spec, you're going to be fine. But memory copies
aren't free, and UI offloading is one of the biggest reasons to use shared
memory.

Shared memory in Workers is nice — it doesn't make anything worse, and it
makes some things better! — but it's a little disappointing that the main
thread can't access the shared memory buffers. That's all.

~~~
azakai
I understand your disappointment; clearly more opportunities are opened up on
the main thread. As the article says, this is a proposed first step, and the
main thread is trickier, so it can be considered carefully later on.
Meanwhile, for a large set of use cases, the current proposal can provide
massive speedups.

------
amelius
Great developments, since, imho, we really need multi-threading to make decent
user interfaces (ones without hiccups due to blocking of the CPU).

I think what we need is immutable data-structures to be shareable between
threads. This approach should also allow structural sharing between threads,
allowing for efficient and safe data structures.

Also, I could see a use for a mechanism where a thread creates a data-
structure, then marks it as read-only, such that it can become shared.

~~~
coderzach
Anecdotally, I've found that most performance hiccups don't come from
computation blocking the ui, but from rendering one part of the ui blocking
every other part from rendering.

The solution to that being parallel or async paint/layout, which is something
I never hear mentioned (probably because it's a really hard problem).

~~~
Joe8Bit
Servo is actually in the process of implementing a parallel layout
engine [1][2].

1:
[http://en.wikipedia.org/wiki/Servo_(layout_engine)](http://en.wikipedia.org/wiki/Servo_\(layout_engine\))
2: [http://pcwalton.github.io/blog/2014/02/25/revamped-parallel-layout-in-servo/](http://pcwalton.github.io/blog/2014/02/25/revamped-parallel-layout-in-servo/)

~~~
amelius
Interesting. I wonder how they handle incremental rendering though. Only the
forward-path is described in your reference [2].

~~~
sanxiyn
As far as I can tell there is nothing unusual or difficult here. Servo's
incremental layout implementation is here:
[https://github.com/servo/servo/blob/master/components/layout...](https://github.com/servo/servo/blob/master/components/layout/incremental.rs)

------
polskibus
What's wrong with message passing, though? MPI does it, Erlang does it -
surely this paradigm can deliver good performance for data and task
parallelism.

I hope Mozilla will continue experimenting with parallel js. Exciting times!

~~~
odiroot
I think they mostly mean the overhead of (de)serialization from/into JSON.
It's also hard to pass binary data that way.

~~~
jewel
Would it work to have the data structures be copy-on-write? That way, if the
worker only reads, it's O(1): you just pass a reference to the worker.

I imagine it'd be a pain to write a garbage collector for something like that.

~~~
benjaminjackman
Copy on write is already implemented-ish with transferable objects [1] (just
copy then write the copy).

However, copy on write still requires allocation and for some applications
that is a deal breaker.

1: [https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API/Advanced_concepts_and_examples#Passing_data_by_transferring_ownership_%28transferable_objects%29](https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API/Advanced_concepts_and_examples#Passing_data_by_transferring_ownership_%28transferable_objects%29)

------
higherpurpose
Will Spidermonkey be rewritten in Rust eventually?

Sidenote: WebGL 2.0 should come with ASTC support to be relatively future-
proof.

~~~
gsnedders
Note that JITing compilers gain a lot less from Rust's safety guarantees than
most code: you can easily JIT some code that breaks one of the safety
properties that Rust ordinarily defends.

------
thomasfoster96
From what I've seen, SIMD.js does make doing some calculations quicker,
although it's not the 4x increase one might assume. It's more like 30% to 50%.
The latency and overhead associated with moving the SIMD.js calculations into
a Web Worker actually reduces the performance increase to as little as 10% to
20%.

While message passing might make sense for some tasks, it's not going to be
quick enough to do things in 16ms and achieve the magical 60fps. Eventually
shared memory is going to be needed, and if we get that far I think there will
have to be some sort of acknowledgement that these performance-related
features can't possibly be foolproof.

Oh, and an asynchronous thread safe DOM, please and thanks :)

------
tracker1
I've been thinking that something combined with the use of async functions
(ES7) could be used for Shared* locking...

    
    
    sharedObject.lock(^() => {
      /* use object, which is locked until async promise resolves/rejects */
    });
    

where `^` is a short key for an async lambda function... `async` keyword could
be used too... just throwing the `^` out there.

This could lock the object, allowing for an async function to execute... when
the async function promise resolves, the lock is released... it would have to
be limited to async functions though, but that would likely hit the JS engines
around the same time as any shared objects anyway.

~~~
tracker1
For me, the biggest problem with workers is that you can't simply pass the
functions that the worker needs (separated from state) from the main window.
It means you're creating a separate script for workers, which isn't so bad,
just not always easy to reason about.

It also makes isomorphic interfaces (async node & browser) slightly harder.

------
lukasm
Does anyone know what is the origin of async/await? It's very convenient for
simple cases, but it makes it very hard to mix synchronous and async code (C#).

~~~
coderzach
How does it make it difficult?

~~~
sanxiyn
You can only call async functions inside async functions.

See also "What Color Is Your Function?":
[http://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/](http://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/)

~~~
TheCoreh
If that were the case then async functions would be useless, since the global
scope is not an async function. You can call async functions from regular
functions; they will just return a promise. You can then use normal promise
handling behavior (passing a callback function, which is how you'd currently
handle async anyway).

In real-world scenarios though, most of the time you'll be reacting to events,
and not calling async code from sync code.

Edit: To clarify I'm talking about the ES7 async/await feature.

------
Zikes
I would love to see a channel primitive similar to Golang's. They really seem
to have hit the nail on the head with that one.

~~~
noelwelsh
The CSP model is very nice, but I don't think that's what this proposal is
aiming for. CSP is more about concurrency, while the post makes it clear they
are looking at parallelism. This is more addressing "things that appear to be
single threaded but run faster" like image analysis, for instance.

~~~
nkozyra
They're obviously separate concepts, but tangentially related. Even in Go
you'll learn that sometimes introducing multithreading to a concurrent
application will actually _reduce_ your performance due to the cost of
context-switching within the application.

JavaScript already has its own inherent concurrency, obviously, but it's not
outlandish to say that introducing a goroutine/coroutine concept would be a
lot more elegant and manageable.

------
leeoniya
i have always wanted Canvas as well as read-only array buffers (for things
like map-reduce image analysis) to be accessible in web workers.

------
general_failure
Why not implement the goroutine model?

~~~
tlrobinson
Aren't goroutines pretty similar to WebWorkers, but with special syntax for
creating them (and sending/receiving messages over channels) and perhaps
lighter weight (though that may just be an implementation issue with current
JS engines)?

Edit: never mind, it looks like goroutines can share memory (but channels are
the preferred method of synchronization):
[https://golang.org/ref/mem](https://golang.org/ref/mem)

------
spidermantoo
Good Article, Thanks.

------
bhouston
This is really important work!

------
PinnBrain
Oh no, will this become an ego thing? Parallelism must lie somewhere near
abstraction as a premature optimization. Computers are hard, even single
threaded.

