
A Taste of JavaScript's New Parallel Primitives - faide
https://hacks.mozilla.org/2016/05/a-taste-of-javascripts-new-parallel-primitives/
======
pfooti
You know, I'm not entirely sure how I feel about this. On the one hand: yeah,
I get that having really multithreaded stuff is pretty handy, especially for
certain compute-bound tasks.

On the other hand, I quite like the single-threadedness of javascript.
Promise-based systems (or async/await) give us basically cooperative
multitasking anyway to break up long-running (unresponsive) tasks without
worrying about mutexes and semaphores. I understand exactly when and where my
javascript code will be interrupted, and I don't need to wrap blocks in
atomic-operation markers extraneously.

I've written plenty of multithreaded code, starting with old pthreads stuff and
eventually moving on to Java (but my own experience with threaded stuff is
limited mainly to C and Java), and it can be a _real pain_. I guess limiting
shared memory to explicitly named blocks means you don't have as much to worry
about vis-a-vis nonreentrant code messing up your memory space.

That said, it is a pretty useful construct, and I see where this can benefit
browser-based games dev in particular (graphics can be sped up a lot with
multicore rendering, I bet).

~~~
dherman
[I'm a colleague of the OP and a Mozilla/TC39 member, i.e. someone who cares a
lot about the JS programming model :)]

I'm enthusiastic about SharedArrayBuffer because, unlike threads in
traditional languages like C++ or Java, we have two separate sets of tools for
two very separate jobs: workers and shared memory for _parallelism_, and async
functions and promises for _concurrency_.

Not to put too fine a point on it, shared memory primitives are critical
building blocks for unlocking some of the highest performance use cases of the
Web platform, particularly for making full use of multicore and hyperthreaded
hardware. There's real power the Web has so far left on the table, and it's
got the capacity to unleash all sorts of new classes of applications.

At the same time, I _don't_ believe shared memory should, or in practice will,
change JavaScript's model of concurrency, that is, handling simultaneous
events caused by e.g. user interface actions, timers, or I/O. In fact, I'm
extremely excited about where JavaScript is headed with async functions. Async
functions are a sweet spot between, on the one hand, the excessively verbose
and error-prone world of callbacks (or often even hand-written promise-based
control flow) and, on the other hand, the fully implicit and hard-to-manage
world of shared-memory threading.

The async culture of JS is strong and I don't see it being threatened by a
low-level API for shared binary data. But I do see it being a primitive that
the JS ecosystem can use to experiment with parallel programming models.
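
For anyone who hasn't seen the API yet, the sharing side is tiny. A minimal
sketch, assuming a worker.js file alongside the page (not code from the
article):

        // main.js
        var sab = new SharedArrayBuffer(1024);   // raw shared bytes
        var shared = new Int32Array(sab);        // typed view over them
        var worker = new Worker("worker.js");
        worker.postMessage(sab);                 // shares the memory, no copy

        // worker.js
        onmessage = function (e) {
          var shared = new Int32Array(e.data);
          Atomics.store(shared, 0, 42);          // visible to the main thread
        };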

~~~
_yosefk
I'm curious about 2 things:

1\. How is the accidental modification of random JS objects from multiple
threads prevented - that is, how is the communication restricted to explicitly
shared memory? Is it done by using OS processes underneath?

2\. Exposing atomics greatly diminishes the effectiveness of automated race
detection tools. Is there a specific rationale for not exposing an interface
along the lines of Cilk instead - say, a parallel for loop and a parallel
function call that can be waited for? The mandelbrot example looks like it
could be handled just fine (meaning, just as efficiently and with a bit less
code) by a parallel for loop using what OpenMP calls a dynamic scheduling
policy (so, an atomic counter hidden in its guts).

There do exist tasks which can be handled more efficiently using raw atomics
than using a Cilk-like interface, but in my experience they are the exception
rather than the rule; on the other hand parallelism bugs are the rule rather
than the exception, and so effective automated debugging tools are a godsend.

Cilk comes with great race-detection tools, and these can be developed for
any system with a similar interface. The thing enabling this is that a Cilk
program's task-dependency graph is a fork-join graph, whereas with atomics
it's a generic DAG; the number of task orderings an automated debugging tool
has to try with a DAG is potentially very large, whereas with a fork-join
graph it's always just two orderings. I wrote about it here:
[http://yosefk.com/blog/checkedthreads-bug-free-shared-memory...](http://yosefk.com/blog/checkedthreads-bug-free-shared-memory-parallelism.html)
- my point, though, isn't to plug my own Cilk knock-off that I present in
that post but to elaborate on the benefits of a Cilk-like interface relative
to raw atomics.
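
To make the comparison concrete, the "atomic counter hidden in its guts" is
roughly this much code on top of the proposed primitives (a sketch, with
`ctrl` as an Int32Array over a SharedArrayBuffer shared by all workers):

        // Dynamic scheduling: each worker grabs the next unclaimed row.
        function parallelFor(ctrl, numRows, body) {
          for (;;) {
            var row = Atomics.add(ctrl, 0, 1);   // atomically claim a row
            if (row >= numRows) return;          // all rows claimed
            body(row);                           // e.g. one mandelbrot row
          }
        }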

~~~
AgentME
1\. You can't ever get a reference to regular objects that exist in other
threads (workers). Communication with workers is limited to sending strings,
copies of JSON objects, transfers of typed arrays, and references to
SharedArrayBuffers (see the sketch below).

2\. I assume it was done at a low level so that multi-threaded C++ could be
compiled to javascript (asm.js/WebAssembly).
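
A sketch of the difference (hypothetical worker.js, just to illustrate (1)):

        var worker = new Worker("worker.js");

        var obj = {n: 1};
        worker.postMessage(obj);   // structured clone: the worker gets a copy
        obj.n = 2;                 // ...a copy this later write never touches

        var sab = new SharedArrayBuffer(8);
        worker.postMessage(sab);   // the worker sees the *same* memory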

~~~
_yosefk
For (1) does this mean that everything in the global namespace barfs when
called from a worker thread?

(2) sounds like it might need a larger set of primitives, though I'm not sure.

~~~
AgentME
1\. Web workers don't share a javascript namespace or anything with the parent
page. They're like a brand new page (that happens to not have a DOM). Outside
of SharedArrayBuffer, there's no shared memory.

------
_getify
I'm excited about the `SharedArrayBuffer` addition, but quite meh on
`Atomics.wait()` and `Atomics.wake()`.

I think CSP's channel-based message control is a far better fit here,
especially since CSP can quite naturally be modeled inside generators and thus
block only locally.

That means the silliness of "the main thread of a web page is not allowed to
call Atomics.wait" becomes moot, because the main thread can do `yield
CSP.take(..)` and not block the main UI thread, but still simply locally wait
for an atomic operation to hand it data at completion.

I already have a project that implements a bridge for CSP semantics from main
UI thread to other threads, including adapters for web workers, remote web
socket servers, node processes, etc: [https://github.com/getify/remote-csp-channel](https://github.com/getify/remote-csp-channel)

What's exciting, for the web workers part in particular, is the ability to
wire in SharedArrayBuffer so the data interchange across those boundaries is
extremely cheap, while still maintaining the CSP take/put semantics for
atomic-operation control.
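
For reference, the wait/wake handshake I'm objecting to looks roughly like
this (a sketch; `sab` is assumed to have been posted to the worker already):

        // worker.js -- blocks this whole worker until slot 0 stops being 0
        var shared = new Int32Array(sab);
        Atomics.wait(shared, 0, 0);     // sleep while shared[0] === 0

        // another worker (or the main thread, which may wake but not wait)
        Atomics.store(shared, 0, 1);
        Atomics.wake(shared, 0, 1);     // wake at most one waiter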

------
schmichael
> if we want JS applications on the web to continue to be viable alternatives
> to native applications on each platform

This is where I disagree with the direction Mozilla has been going for years.
I don't want the web to be a desktop app replacement with HTTP as the delivery
mechanism. I'm fine with rich single-page web apps, but I don't understand
why web apps need complete feature parity with desktop apps.

Why not let the web be good at some things and native apps be good at others?

~~~
dherman
I don't know if there's a uniform Mozilla position on this, but here's mine!
:) The main reason I care about the Web is that it's the world's biggest
software platform that isn't owned. If someone can deliver their app to the
world without submitting it for review by an app store and without paying a
company a %-age of the revenue, and if they can market it through the viral
power of URLs, then they have a lot more control over their own destiny.
That's why I think it's important for the Web not to give up on hard but
solvable problems.

But also I think there's a false dichotomy between "the Web should just be for
documents" and "the Web should just be for apps." The Web is an application
platform that simultaneously blows all other platforms out of the water for
delivering content. First, there's a reason why so many native apps embed
WebViews -- despite its warts, CSS is the result of hundreds of person-years
of tuning for deploying portable textual content.

But more importantly, you just can't beat the URL. How many more times will we
convince the entirety of humanity to know how to visually parse
"www.zombo.com" on a billboard or in a text message? It's easy to take the Web
for granted, it's fun to snark about its warts, and there's a cottage industry
of premature declarations of its death. But I personally believe that the
humble little hyperlink is at the heart of the Web's power, competitive
strength, and longevity. It was a century-old dream passed on from Vannevar
Bush to Doug Engelbart to Xerox PARC and ultimately to TBL, who made it real.

~~~
schmichael
> without paying a company a %-age of the revenue

It may not be a %-age of revenue, but you definitely can't host a non-trivial
webapp for free either.

You could even argue that in many webapps scaling costs are proportional to
revenue, which makes it awfully similar to an app store.

> But also I think there's a false dichotomy between "the Web should just be
> for documents" and "the Web should just be for apps."

Yeah, I don't have a clear idea of where the web should "end", but wow... web
pages able to eat all my cores and have data races seem like a line to be
crossed with great caution and care.

~~~
btown
> web pages able to eat all my cores

We're already there:

        var code = "while(true){}";                       // worker body: spin forever
        var Blob = window.Blob;
        var URL = window.webkitURL || window.URL;
        var bb = new Blob([code], {type: 'text/javascript'});
        code = URL.createObjectURL(bb);                   // inline script as a worker URL
        for (var i = 0; i < 8; i += 1) new Worker(code);  // one spinning worker per core

------
seibelj
This is the last piece needed to allow multi-threaded code with shared state
to be ported via emscripten [0]. A very good thing indeed.

[0] [http://kripken.github.io/emscripten-site/docs/porting/guidel...](http://kripken.github.io/emscripten-site/docs/porting/guidelines/portability_guidelines.html)

------
ryandvm
The saving grace of JavaScript's everything-is-async, single-threaded model
was that it was just slightly less difficult to reason about than multiple
threads and shared state. (Though I'd say that's debatable...)

My guess is that, despite the sugar coating that JavaScript's async internals
have received of late, writing stable multi-threaded code with JavaScript is
going to be hard.

JavaScript now has the safety of multi-threaded code with the ease of
asynchronicity!

~~~
AgentME
At least SharedArrayBuffer only allows plain typed (byte) arrays to be
shared. Arbitrary javascript objects can't be shared, so there's a very clear
division between what can get affected by other threads and what can't. You
don't have to worry about whether existing libraries are thread-safe, etc.

~~~
Klathmon
Not only that, but if you don't need "Shared" array buffers (meaning more than
one thread using it at once) you can use "Transferable" [1] ArrayBuffers.

It's just a zero-copy transfer to a worker (or from a worker) but it makes
sure the "sender" doesn't have access to the memory any more.

It's incredibly easy to use, avoids all the common issues and pitfalls with
shared memory, and being zero-copy it's stupidly fast.

Obviously it's not a replacement for true shared memory, but I've used it in
the past to do some image processing in the browser (broke the image into
chunks, transferred each chunk to a worker to process, then returned and
stitched them all back together).

[1] [https://developer.mozilla.org/en-US/docs/Web/API/Transferabl...](https://developer.mozilla.org/en-US/docs/Web/API/Transferable)
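
The whole mechanism is a one-liner. A sketch (hypothetical worker.js):

        var buf = new ArrayBuffer(1024 * 1024);
        var worker = new Worker("worker.js");
        worker.postMessage(buf, [buf]);   // 2nd argument is the transfer list
        console.log(buf.byteLength);      // 0 -- the sender's buffer is detached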

~~~
pfooti
That's pretty rad - the vast majority of what I would see myself wanting to do
in a multithreaded javascript world would be limited to something with
transferable ArrayBuffers. Like: "hey, worker, go do some work and lmk when
you're done". Moving big chunks of memory around in ways that only ever allow
one thread access at a time would be plenty.

~~~
yoklov
I thought this too, but the fact that you can't send only a chunk of an
ArrayBuffer is a huge limitation. It basically limits you to using a background
thread to do something, instead of dividing the work among many (well, you
can, but you lose most of the benefit of transferables).

~~~
Klathmon
Well you can still divide the work among many workers, you just need to incur
the cost of copying/splitting the buffer before you start sending them off.

In most cases you know how many workers you want at the start of the program,
so that cost of splitting/merging only happens once (and you can do that
splitting/merging in a worker to avoid hanging the main thread) and then you
can pass those chunks around freely.
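
Something like this sketch, where the copies happen once up front (`workers`
and `imageBuffer` are stand-ins for however you set those up):

        // Split the image into per-worker copies, then hand each one off
        // zero-copy via the transfer list.
        var src = new Uint8Array(imageBuffer);
        var chunkSize = Math.ceil(src.length / workers.length);
        workers.forEach(function (worker, i) {
          var chunk = src.slice(i * chunkSize, (i + 1) * chunkSize); // copy
          worker.postMessage(chunk.buffer, [chunk.buffer]);          // transfer
        });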

~~~
yoklov
For workloads like graphics, the cost of splitting/merging happens each time.

------
bhauer
On my grossly overpowered workstation, I can only crank the number of workers
in the Mandelbrot demo to 20 [1]. Attempting to go beyond 20, the console
reports:

        RangeError: out-of-range index for atomic access

That said, 20 workers is about 11x faster than the single-threaded version.

[1] [https://axis-of-eval.org/blog/mandel3.html?numWorkers=20](https://axis-of-eval.org/blog/mandel3.html?numWorkers=20)

------
CHsurfer
I keep hoping that JS will evolve to support the actor model, a la
Erlang/Elixir, with their process-based persistence, concurrency via message
passing, etc. It just seems so much simpler and more tractable than this
proposal.

~~~
cpeterso
I've seen projects to compile Erlang to JS, but has anyone experimented with a
JS compiler that targets Erlang's BEAM VM the way Elixir does?

JS is an approachable language but Node has problems with scaling and error
handling of non-blocking IO. Erlang solves those problems but the language is
not approachable and has a smaller ecosystem than JS. I'm imagining something
like Node with "micro-workers" so developers could reuse their existing JS
code, but not have to worry about scaling or non-blocking APIs.

~~~
jchrisa
Please implement!

------
kbenson
> This leads to the following situation where the main program and the worker
> both reference the same memory, which doesn’t belong to either of them:

If only Mozilla had some technology that could deal with ownership of
memory...

Seriously, if Rust doesn't have an asm.js-optimized target yet, it really
should.

~~~
seibelj
If Rust can be compiled to LLVM IR, then emscripten can be the backend that
produces asm.js.

~~~
steveklabnik
The issue is that emscripten uses a different LLVM version than we do, so it
can work, but it's got some rough edges.

We want both compile-to-JS and compile-to-wasm to work well; the work just
isn't done yet.

------
rl3
> _Consider synchronization: The new Atomics object has two methods, wait and
> wake, which can be used to send a signal from one worker to another: one
> worker waits for a signal by calling Atomics.wait, and the other worker
> sends that signal using Atomics.wake._

Having not yet played with this myself: is anyone familiar with what kind of
latency overhead is involved with signaling in the Atomics API? I'm not very
familiar with the API yet, so I've no idea how signaling is implemented under
the hood.

The MessageChannel API by contrast (i.e. _postMessage_) can be quite slow,
depending on the payload. While you can use it within a render loop, it
usually pays to be very sparing with it. Typical latency for a virtually empty
_postMessage_ call on an already-established channel is usually 0.05 ms to
0.1 ms. Most serialization operations will balloon that to well over 1 ms
(hence the need for shared memory). Plus transferables suck.

> _Finally, there is clutter that stems from shared memory being a flat array
> of integer values; more complicated data structures in shared memory must be
> managed manually._

This is probably the biggest drawback to the API, at least for plain
Javascript. It really favors asm.js or WebAssembly compile targets for
seamless operation, whereas plain Javascript can't even share native types
without serialization/deserialization operations to and from byte arrays.
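
Concretely, "managed manually" means hand-rolling your own struct layout over
a typed array, something like this (field names are made up):

        // One "particle" = 4 int32 fields; names become array offsets.
        var X = 0, Y = 1, VX = 2, VY = 3, FIELDS = 4, COUNT = 1000;
        var particles = new Int32Array(new SharedArrayBuffer(COUNT * FIELDS * 4));

        function getX(i)    { return Atomics.load(particles, i * FIELDS + X); }
        function setX(i, v) { Atomics.store(particles, i * FIELDS + X, v); }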

------
kgr
I'm excited to see progress in the area of JS concurrency, but I'm not sure
how useful this is going to be. It lets me share ArrayBuffers between workers,
but all of my data is in the form of Objects, not primitive arrays.

One place where I would like to use this is for collision detection, like in
this example:
[http://codepen.io/kgr/pen/GoeeQw](http://codepen.io/kgr/pen/GoeeQw)

But I'm relying on objects with polymorphic intersects() methods to determine
if they intersect with each other, and once I encode everything up as arrays,
I lose the convenience and power of objects.

~~~
phpnode
Here's a typed-objects system for JS which uses ArrayBuffers for backing
storage; in the future it will also support SharedArrayBuffer:
[https://github.com/codemix/reign](https://github.com/codemix/reign)
(disclaimer: I wrote this).

------
hkjgkjy
If only we did not have mutable data structures, there would be few or no
problems to find in this.

Concurrency isn't hard - try Clojure's core.async and you will find out.
Shared mutable state is mind-bogglingly hard.

------
piotrkaminski
If the problem that this is trying to solve is that `postMessage` is slow and
you can't transfer slices of arrays, then perhaps they should solve it by
speeding up `postMessage` and making array slicing cheap? Forcing a shared-
memory concurrency model into JavaScript seems like a bit of an overreaction.

------
cpeterso
Are JavaScript workers implemented using real OS threads or green threads? How
heavyweight is a worker?

~~~
Klathmon
They are real OS threads in all implementations I've seen.

As for how heavy, they are definitely a bit heavier than I'd like. But rather
than me trying to describe it, [1] is a really good benchmark with results,
which you can also run yourself if you want.

[1] [https://github.com/gmarty/web-workers-benchmark](https://github.com/gmarty/web-workers-benchmark)

------
imaginenore
I really want WebCL. Worker threads are just so lame compared to what GPUs can
do.

