
I must be silly but I expected the web worker version to run faster. I must have misunderstood "many threads". It ran nearly 200ms slower :)

That said, not locking the UI thread and showing user feedback is totally worth 200ms extra for most tasks.




A quote from Brian Goetz that reminds us why not all code benefits from being run in parallel:

"A parallel execution always does more work than a sequential one, since — in addition to solving the problem — it also must decompose the problem, create and manage tasks to describe the subproblems, dispatch and wait for those tasks, and merge their results. So the parallel execution always starts out "behind" the sequential one and hopes to make up for the initial deficit through economy of scale."

https://www.ibm.com/developerworks/library/j-java-streams-4-...

But even then, your point stands: being able to keep working while the web worker version is running makes it better even if it is slower.


Not silly at all. It can't run faster, as there is a maximum speed at which your machine can do this. Doing it in the background can't change that, but it is non-blocking. Many threads won't always speed it up, but a worker does get the work off the UI thread. For example, the browser process may be single-threaded or already using all the cores available to it.

Asynchronous behaviours, including synchronising with the UI, aren't free and all add overhead. Of course, the benefit is that the app stays responsive to user input.

Some problems can be performed in parallel, but even when they can, it may slow them down. Orchestration isn't free, and unless you can use more cores it probably won't help. I wrote a simple demo of this in .NET Core [0] for a book [1] (where I've written more on these topics, from a C# perspective).

It may not be intuitive to web developers, as the server is naturally blocking and async has a different effect on HTTP requests/responses. Yet if you've done any desktop/mobile app work then you'll know it's very important not to do anything intensive on the UI thread. It's a pretty common mistake to make.

[0]: https://github.com/PacktPublishing/ASP.NET-Core-1.0-High-Per...

[1]: https://unop.uk/book


> I must be silly but I expected the web worker version to run faster. I must have misunderstood "many threads".

They are only using a single worker thread, not multiple, so the actual sort will take pretty much the same time, as the same engine is interpreting and/or JIT-compiling the code. The extra time you see is what's needed to bring up the worker, manage synchronisation, tear it down when complete, and any other such overheads.
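A rough way to see that overhead on its own (just a sketch, not the demo's code; the worker file name is made up) is to time an empty round trip to a trivial worker:

  // trivial-worker.js (hypothetical file)
  self.onmessage = function (e) {
    self.postMessage(e.data); // echo straight back, no real work
  };

  // main page: measure spawn + round trip + teardown, costs a plain
  // synchronous call never pays
  var t0 = performance.now();
  var worker = new Worker('trivial-worker.js');
  worker.onmessage = function () {
    console.log('overhead ~' + (performance.now() - t0) + 'ms');
    worker.terminate();
  };
  worker.postMessage('ping');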

A quick look at the source shows bubble sort is being used, which on its own is not as obviously parallelisable as, for example, Shell's sort, but multiple threads could still be used here in general.

But most parallel sort algorithms assume efficient shared memory, so the threads can all access the data immediately without needing their own copy, or other fast IPC methods. Neither is possible with web workers, which can currently only communicate via message passing, and that is unlikely to be efficient enough for large-scale sorting in many (most?) cases. There are experimental shared memory options in some JS engines (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe... for example), so this may change in the not too distant future.
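For reference, the shared memory option that link covers looks roughly like this (SharedArrayBuffer, experimental and behind flags in some engines at the moment; the worker file name is made up):

  // main page: allocate shared memory and hand the same buffer to a worker
  var sab = new SharedArrayBuffer(1024 * Int32Array.BYTES_PER_ELEMENT);
  var shared = new Int32Array(sab);
  var worker = new Worker('sort-worker.js'); // hypothetical worker script
  worker.postMessage(sab); // the buffer is shared, not copied or moved

  // sort-worker.js
  self.onmessage = function (e) {
    var view = new Int32Array(e.data);
    view[0] = 42; // writes here are visible to the main thread's view too
  };

Coordinating who writes what and when is still on you (that's what Atomics is for), but at least the data itself doesn't have to be copied per worker.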

In this case, choosing a better single-threaded sort would far outperform bubble sort spread over multiple threads - they presumably picked a slow algorithm to properly illustrate their point.

Edit: Additionally, as well as building the worker, you need to pass it the initial data to work on (or new data if reusing a worker in a more advanced setup), which will consume time. In this simple example the data to be sorted is created in the worker rather than being passed in (and back out sorted), so you'd need to factor that in too. Transferables would help here, though, if what you are doing naturally works with the object types that support them.


If you wanted to go crazy you could use transferables to get around the memory copy cost. It would be a fun experiment to figure out at what point it makes sense to use multiple workers, considering the initial cost of worker distribution.


Having had a quick read (I'd somehow missed transferables and thought less efficient message passing was the only way - thanks for the pointer), that could help, but because the objects are moved to the new context and lost in the original, you do need to create and populate fresh ones for each worker (so extra copy operations that wouldn't be needed if true shared memory were possible).

Still, that might work well for some algorithms over certain data types. For example, a divide-and-conquer sort like merge or quick might benefit, as you can send the first few partitions to different workers and let the first worker do the final merging. It wouldn't work for Shell's sort though, as splitting the data into transferable chunks for that algorithm is not going to be efficient when it touches from the beginning of the array to the end in each iteration. And you'd need to analyse whether the time gained by splitting the actual sort between threads is enough not to be negated (or worse, dwarfed) by the time needed for all the extra data copying.

Outside sorting, number crunching tasks where the operations have a high locality of reference and so are easily partitioned would work well without hitting too much sharing trouble. Fractals for instance: computing one pixel of a Mandelbrot display does not rely on knowing anything about the neighbouring ones (or any others) so partitioning the work is going to be trivial. Compression too, if you don't mind losing a small amount of compression efficiency (due to dictionary resets at the partition boundaries) in exchange for the bump in potential parallelism.
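To sketch that kind of partitioning (illustrative only, not from the demo; the worker script name and the escapeTime helper are made up), each worker gets a disjoint band of rows and never needs to see another worker's pixels:

  // main page: split the image into horizontal bands, one per worker
  var WIDTH = 800, ROWS = 600, WORKERS = 4, band = ROWS / WORKERS;
  for (var i = 0; i < WORKERS; i++) {
    var w = new Worker('mandelbrot-worker.js'); // hypothetical worker script
    w.onmessage = function (e) {
      // e.data: { startRow, pixels } - draw that band as it arrives
    };
    w.postMessage({ startRow: i * band, endRow: (i + 1) * band, width: WIDTH });
  }

  // mandelbrot-worker.js: each pixel depends only on its own coordinates
  self.onmessage = function (e) {
    var d = e.data;
    var pixels = new Uint8ClampedArray(d.width * (d.endRow - d.startRow));
    for (var y = d.startRow; y < d.endRow; y++) {
      for (var x = 0; x < d.width; x++) {
        // escapeTime: hypothetical helper iterating z = z*z + c for this pixel
        pixels[(y - d.startRow) * d.width + x] = escapeTime(x, y);
      }
    }
    // transfer the band back rather than copying it
    self.postMessage({ startRow: d.startRow, pixels: pixels }, [pixels.buffer]);
  };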


There are serialization costs for going back and forth.

In some contexts launching multiple workers is possible if you can split up work, in which case it would be faster. Sorting a single array is not such a context though.


You can bypass the serialisation cost by using the transferList for ArrayBuffer instances, which are what back typed arrays such as Int32Array, Float64Array, etc.

i.e. self.postMessage(myTypedArray, [myTypedArray.buffer])

I used this approach to generate geometry within a web worker for a WebGL project. The only downside is that the web worker that initialised the array loses ownership of the buffer and has to create a new one to pass more information.

https://developer.mozilla.org/en/docs/Web/API/Worker/postMes...
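A minimal sketch of that round trip (names are illustrative, not from the actual project):

  // main page: send the typed array's buffer to the worker without copying
  var positions = new Float32Array(1024 * 3);
  var worker = new Worker('geometry-worker.js'); // hypothetical worker script
  worker.postMessage(positions, [positions.buffer]);
  // positions.buffer is now detached here; positions.length is 0

  // geometry-worker.js: fill the vertices and transfer the buffer back
  self.onmessage = function (e) {
    var verts = e.data;
    for (var i = 0; i < verts.length; i++) {
      verts[i] = Math.random(); // stand-in for real geometry generation
    }
    self.postMessage(verts, [verts.buffer]); // ownership moves back to the page
  };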


Yeah, it's really only useful for game dev and other math-heavy use cases. In normal JavaScript, which could also benefit from threads, you use Objects and Arrays, which can't be transferred.


You can totally split sorting across different threads/cores: just divide the list up and do a final merge pass. It's basically the idea behind external sorting, where the amount of data is too big to fit into main memory, so you have to sort chunk-by-chunk and then merge the chunks.
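A rough outline of that split-and-merge idea (written single-threaded here to keep it short - the per-chunk sorts are the part you would hand to separate workers):

  // split the array into chunks, sort each one (the parallelisable part),
  // then do a final k-way merge
  function chunkedSort(arr, chunks) {
    var size = Math.ceil(arr.length / chunks), sorted = [];
    for (var i = 0; i < chunks; i++) {
      sorted.push(arr.slice(i * size, (i + 1) * size)
                     .sort(function (a, b) { return a - b; }));
    }
    // merge pass: repeatedly take the smallest head element among the chunks
    var out = [], idx = sorted.map(function () { return 0; });
    while (out.length < arr.length) {
      var best = -1;
      for (var c = 0; c < sorted.length; c++) {
        if (idx[c] < sorted[c].length &&
            (best === -1 || sorted[c][idx[c]] < sorted[best][idx[best]])) {
          best = c;
        }
      }
      out.push(sorted[best][idx[best]++]);
    }
    return out;
  }

  chunkedSort([5, 3, 8, 1], 2); // [1, 3, 5, 8]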


The demo doesn't seem to serialize anything; it doesn't transfer the array.

I would guess the extra time is just worker initialization including fetching/parsing worker.js.


Merge sort?


3 more years and maybe HTML 5 will catch up to where Flash was 3 years ago! Here's to hoping!


For me the web worker IS consistently faster. Chrome 54. 3100ms vs 3400ms.


Same here. 7000ms vs 5500ms. Chrome 54.



