Show HN: Sorting 50K Array, HTML5 Web Worker Demo (afshinm.github.io)
88 points by afshinmeh on Oct 28, 2016 | 59 comments



For anyone who is interested in easily using Web Workers for this sort of work, please check out task.js (https://github.com/icodeforlove/task.js).

It allows you to send a pure function with provided arguments to an autoscaling WebWorker pool.

Here's a similar demo (http://s.codepen.io/icodeforlove/debug/ZOjBBB, where you can bring all cores to 100% usage).

But for this specific scenario it could have been coded like this using task.js (vs. http://afshinm.github.io/50k/worker.js, which is not reusable):

    // mentioned sort function (close to exact code from site)
    function bubbleSort(a) {
        var swapped;
        
        do {
            swapped = false;
            for (var i=0; i < a.length-1; i++) {
                if (a[i] > a[i+1]) {
                    var temp = a[i];
                    a[i] = a[i+1];
                    a[i+1] = temp;
                    swapped = true;
                }
            }
        } while (swapped);

        return a;
    }

    // turn the function into a worker task
    const bubbleSortAsync = task.wrap(bubbleSort);

    // this now runs in an autoscaling worker pool (first run includes worker initialization time)
    console.time('BubbleSort Worker');
    bubbleSortAsync([9,1,2,3]).then(sortedArray => {
        console.timeEnd('BubbleSort Worker');
    }, error => { /* this is extra credit, yes we can handle worker errors */ });
Also when dealing with large messages you can use transferables (https://developer.mozilla.org/en-US/docs/Web/API/Transferabl...).
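
For example, something like this on the sending side (a minimal sketch; worker is an already-created Worker, and the typed array's buffer is moved rather than copied, so it becomes unusable in the sender afterwards):

    // main thread: hand a large typed array to the worker without copying it
    const payload = new Float32Array(10 * 1024 * 1024);
    worker.postMessage(payload, [payload.buffer]);
    // payload.buffer is now detached (zero length) in this context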

The web has really come a long way, and this kind of stuff is extremely useful. For instance, you could use it when processing pixel data for a 10MP image, which means an RGBA array of 40M values. In that scenario the work can be spread across all cores, which blows away the alternative approach of just locking the UI. In that case, if you were using task.js, you would just chunk the work and use an async control-flow library like bluebird.

   // in this situation all cores would be at 100% without locking the UI
   let processedPixels = await Promise.map(chunks, processPixelChunkAsync);
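
For context, chunks and processPixelChunkAsync above might be set up roughly like this (a sketch; pixels is the big RGBA array, splitIntoChunks is a hypothetical helper, and processPixelChunk has to be a pure, self-contained function for task.wrap):

   // split the RGBA array into one chunk per logical core (hypothetical helper)
   let chunks = splitIntoChunks(pixels, navigator.hardwareConcurrency || 4);
   // wrap the pure per-chunk function so each call runs in the worker pool
   let processPixelChunkAsync = task.wrap(processPixelChunk);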


Although the web worker version is slower, the fact that it doesn't block UI is a huge gain.

I tried to see if I could make the bubble sort function async, so as not to block the browser UI, using callbacks as opposed to workers. Here's a (messy) fiddle for those who want to check it out: https://jsfiddle.net/tbinetruy/8q1de9jk/ (watch out, this fiddle runs the blocking bubble sort on a 50k array on page load, so it'll freeze your UI for 5 seconds. Then the non-blocking sort kicks in and your UI becomes usable again).

For a 50k array, the blocking code ran in 6.682s and the non-blocking code in 9.297s (when I ran it locally just now anyway). I added a clickable button and update the DOM every 25ms with a random number to show the user what the non blocking code allows for.

The same type of non-blocking code without relying on web workers should also be possible with async functions and promises. I'll have to look into it.
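
Something roughly like this should work (an untested sketch that yields to the event loop every 50 outer passes, a number you'd want to tune):

    // let the browser render and handle input between batches of passes
    const tick = () => new Promise(resolve => setTimeout(resolve, 0));

    async function bubbleSortNonBlocking(a) {
        var swapped;
        var pass = 0;
        do {
            swapped = false;
            for (var i = 0; i < a.length - 1; i++) {
                if (a[i] > a[i + 1]) {
                    var temp = a[i];
                    a[i] = a[i + 1];
                    a[i + 1] = temp;
                    swapped = true;
                }
            }
            if (++pass % 50 === 0) {
                await tick(); // UI gets a chance to update here
            }
        } while (swapped);

        return a;
    }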


I don't get why everyone is focused on the worker version being slower. Microbenchmarks in JS are hard to get right, and I don't think this demo attempts to make an accurate comparison. It's just showing what workers do and why they're useful.

I use workers heavily in a game I'm making, and in my experience they perform equivalently to regular JS, modulo any time you spend copying memory back and forth.

edit: It seems this demo dynamically creates the worker from scratch every time it's used, and destroys it afterwards. That's probably the cause of any performance difference people are seeing.


Good catch.

In my experience, though, there is a small overhead to workers, plus the cost of message passing. But even if it were 300% slower it would still be justified, because it is unacceptable to lock the browser.

Too many commenters here are coming from the backend, or from desktop software where they already have a dedicated UI thread. I ask these commenters a simple question: when was the last time you saw your browser freeze? If that doesn't happen often, it's because devs are doing many things like this to keep the experience enjoyable. The web is its own kind of beast in terms of development; the biggest concern is perceived responsiveness. We get to write off async delays when using web workers (and similar methods) because users are already used to network requests taking time.


In response to your last comment, you should check out https://github.com/icodeforlove/task.js. It will do that and much more.


For me, it's not slower; performance almost exactly matches the blocking version. I'm on an i7 MacBook Pro and Chrome.


On Android Chrome the web worker version took 18 seconds, while the non-worker version took 11 seconds but was utterly unresponsive. A responsive UI with later results beats an unresponsive UI with faster results; that's a trade-off I'll take any day.


You don't need web workers to be "responsive". You just need to cut the job up into bits, and yield to the UI thread every so often.

It's utterly unresponsive because it's not programmed very well. Not because it's not using web workers.

The demo is a great example of why Threads are inefficient for most cases. Better programming is what's needed.


That requires decomposing problems in more unnatural ways and introducing greater overhead.

Using a trivial sort example, you're suggesting something like:

  sort (Array):
    i := 0;
    while(not-sorted):
      sort-step
      i := i + 1
      if i % 100 = 0:
        yield

This requires tuning, and all sorts of fun things. And a redesign of your algorithms.

With something like webworkers, you don't have to alter your algorithms in any fundamental way. They're left alone and the change is how they're plugged into each other. You allow the scheduler to handle yielding and other details.


It requires programming.

Starting a thread is usually lazy inefficient programming. IMHO programmers shouldn't be allowed to try using threads until they're actually competent working in a single thread.


Ok. So they're supposed to manually create all the potential benefits of multithreading with manual yields and coroutines and the like?

I have an application, it does the following things:

  - Presents an interface to the user
  - Loads data files
  - Loads a test description file
  - Processes the data files per what's in the test description
  - Displays all this information
  - Provides custom filtering options for data analysis

You propose that I should do all of this in a single thread? The test processing can take a minute in some cases (couple hundred megs of data, couple hundred test cases to consider, rare but happens). The user shouldn't be able to use the other filtering tools while this is going on?

In some cases, loading the data files can take several minutes (again, hundreds of megs sometimes) if done over the network. I can't stop them from doing that. Should the UI lock up during this time, preventing them from doing other things? (Admittedly, at that point there's only two other things they might do: change a couple options and specify which test description file to use.)

It's a simple app, but it definitely benefits from multi-threading. I don't need to come up with some convoluted method of interleaving different tasks that have different run-time considerations. The scheduler can and should take care of that for me.

EDIT: I also assume you wouldn't want anyone to use languages like go or erlang where multi-threading (with green threads) is the expected use case.


Of course it's far better to start several threads, throw all the instructions at the task scheduler, have it execute them in a random order, deal with synchronization and concurrency issues. Of course that's better...


A less snarky answer, leaving the other alone.

You have a program that needs to run tasks A,B,C (can be in parallel) followed by D (must be sequential, after the others).

You can optionally put everything in a serial process:

  ABCD
But that's only "optimal" if you have a single core, and no asynchronous tasks (like a delay in B for accessing a file over the network, where it doesn't touch the CPU for 10 seconds).

Or you can design it like:

  (A|B|C)D   % where | indicates independence, parallelism

Now, if we have a language like erlang or go, you can do something like this (my erlang is rusty):

  main() ->
    Parent = self(),
    %% a/1, b/1 and c/1 are assumed to send their own pid to Parent when they finish
    Pids = lists:map(fun (F) -> spawn(fun () -> F(Parent) end) end,
                     [fun a/1, fun b/1, fun c/1]),
    lists:foreach(fun (Pid) -> receive
                                 Pid -> ok
                               end
                  end, Pids),
    d().

There, done. The scheduler will now handle the interleaving of each of those independent processes across the available cores.

If you want to do that manually, without preemptive multitasking, have fun tuning it correctly. Do you have each parallel task look like this:

  TaskA:
    while(work-to-be-done)
      do-work
      yield

And just hope for the best? Is the context switching really going to be worth it? You're now running a yield on every iteration of the loop. That's clearly wasteful, as you lose out on things like branch prediction and cache locality (cache speed versus RAM speed) by breaking the work up this finely.

So now you have to tune each one of these if you really want the optimal results. And you have to know precisely what machine your target system is running. Yeah, in the embedded world, we've got that, and we pretty much do exactly what you suggest. Because we control the full stack. We know exactly what software runs on a CPU. We know exactly what the CPU speed will be. Memory access times. We can come up with near optimal, low-level coroutine style programs like this.

But most programs don't run in such a known, quantified, and understood environment.


And I assume, then, that you're only ever using assembly for programming? Because that's the only way you can be sure you're not wasting any cycles.


Show us, won't you?


I must be silly, but I expected the web worker version to run faster. I must have misunderstood "many threads". It ran nearly 200ms slower :)

That said, not locking the UI thread and showing user feedback is totally worth 200ms extra for most tasks.


A quote from Brian Goetz which reminds us of the reasons not all code benefits from being run in parallel:

"A parallel execution always does more work than a sequential one, since — in addition to solving the problem — it also must decompose the problem, create and manage tasks to describe the subproblems, dispatch and wait for those tasks, and merge their results. So the parallel execution always starts out "behind" the sequential one and hopes to make up for the initial deficit through economy of scale."

https://www.ibm.com/developerworks/library/j-java-streams-4-...

But even then, your point stands that being able to work while the web worker version is running makes it better even if it is slower.


Not silly at all. It can't run faster, as there is a maximum speed at which your machine can do this. Doing it in the background can't change that, but it is non-blocking. Many threads won't always help speed it up, but it does get the work off of the UI thread. For example, the browser process may be single-threaded or already using all the available cores it can.

Asynchronous behaviours, including synchronising with the UI, aren't free and all add overhead. Of course the benefit is that the app stays responsive to user input.

Some problems can be performed in parallel, but even when they can, parallelising them may slow them down. Orchestration isn't free, and unless you can use more cores it probably won't help. I wrote a simple demo of this in .NET Core [0] for a book [1] (where I've written more on these topics, from a C# perspective).

It may not be intuitive to web developers, as the server is naturally blocking and async has a different effect on HTTP request/responses. Yet if you've done any desktop/mobile app work then you'll know it's very important to not do anything intensive on the UI thread. It's a pretty common mistake to make.

[0]: https://github.com/PacktPublishing/ASP.NET-Core-1.0-High-Per...

[1]: https://unop.uk/book


> I must be silly, but I expected the web worker version to run faster. I must have misunderstood "many threads".

They are only using a single worker thread, not multiple, so the actual sort will take pretty much the same time, as the same engine is interpreting and/or JIT-compiling the code. The extra time you see is that needed to bring up the worker, manage synchronisation, tear it down when complete, and any other such overheads.

A quick look at the source shows bubble sort is being used, which on its own is not as obviously parallelisable as, for example, Shell's sort, but multiple threads could be used here generally.

But most parallel sort algorithms assume efficient shared memory, so the threads can all access the data immediately without having their own copy, or other fast IPC methods. That is not possible with web workers, which can currently only communicate via message passing, and that is unlikely to be efficient enough for large-scale sorting in many (most?) cases. There are experimental shared memory options in some JS engines (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe... for example), so this may change in the not too distant future.
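
For illustration, the shared-memory version would look roughly like this (a sketch, assuming an engine with the experimental SharedArrayBuffer enabled and worker being an already-created Worker):

    // main thread: allocate memory that can be shared rather than copied
    var shared = new SharedArrayBuffer(50000 * Int32Array.BYTES_PER_ELEMENT);
    var data = new Int32Array(shared); // fill this with the values to sort
    worker.postMessage(shared);        // the worker wraps the same buffer in its own Int32Array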

In this case choosing a better single-threaded sort would far outperform this spread over multiple threads - they presumably picked a slow algorithm to properly illustrate their point.

Edit: Additionally, as well as building the worker you need to pass it the initial data to work on (or new data if reusing a worker in a more advanced algorithm) which will consume time. In this simple example the data to be sorted is created in the worker rather than being passed in (and back out sorted), so you'd need to factor that in too. Though transferables would help here if what you are doing naturally works with the object types that support that.


If you wanted to go crazy you could use transferables to get around the memory-copy cost. It would be a fun experiment to figure out at what point it makes sense to use multiple workers, considering the initial cost of distributing work to them.


Having had a quick read (I'd somehow missed transferables and thought less efficient message passing was the only way, thanks for the pointer), that could help. But because the objects are moved to the new context and lost in the original, you do need to create and populate fresh ones for each worker (so extra copy operations that wouldn't be needed if true shared memory were possible).

Still, that might work well for some algorithms over certain data types. For example, a divide-and-conquer sort like merge or quick might benefit, as you can send the first few partitions to different workers and let the first worker do the final merging. It wouldn't work for Shell's sort, though, as splitting the data into transferable chunks is not going to be efficient for an algorithm that touches the array from beginning to end in each iteration. And you'd need to analyse whether the time gained by splitting the actual sort between threads is enough to not be negated (or worse, dwarfed) by the time needed for all the extra data copying.
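
A rough sketch of that divide-and-conquer idea using the task.js pool mentioned elsewhere in the thread (data is the array to sort and merge is a hypothetical helper that merges two sorted arrays):

    // sort the two halves in parallel workers, then merge on the main thread
    const sortPartAsync = task.wrap(function (part) {
        return part.sort(function (a, b) { return a - b; });
    });
    const mid = data.length >> 1;
    Promise.all([sortPartAsync(data.slice(0, mid)), sortPartAsync(data.slice(mid))])
        .then(function (halves) { return merge(halves[0], halves[1]); });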

Outside sorting, number crunching tasks where the operations have a high locality of reference and so are easily partitioned would work well without hitting too much sharing trouble. Fractals for instance: computing one pixel of a Mandelbrot display does not rely on knowing anything about the neighbouring ones (or any others) so partitioning the work is going to be trivial. Compression too, if you don't mind losing a small amount of compression efficiency (due to dictionary resets at the partition boundaries) in exchange for the bump in potential parallelism.


There are serialization costs for going back and forth.

In some contexts launching multiple workers is possible if you can split up work, in which case it would be faster. Sorting a single array is not such a context though.


You can bypass the serialisation cost by using the transferList for ArrayBuffer instances, which are used by typed arrays, i.e. Int32Array, Float64Array, etc.

i.e. self.postMessage(myTypedArray, [myTypedArray.buffer])

I used this approach to generate geometry within a web worker for a WebGL project. The only downside is that the web worker that initialised the array loses ownership of the buffer and has to create a new one to pass more information.

https://developer.mozilla.org/en/docs/Web/API/Worker/postMes...


Yeah, it's really only useful for game dev and other math-heavy use cases. In normal JavaScript, which could also benefit from threads, you use Objects and Arrays, which can't be transferred.


You can totally split sorting across different threads/cores: just divide the list up and do a final merge pass. It's basically the idea behind external sorting, where the amount of data is too big to fit into main memory, so you have to sort chunk-by-chunk and then merge the chunks.


The demo doesn't seem to serialize anything; it doesn't transfer the array.

I would guess the extra time is just worker initialization including fetching/parsing worker.js.


Merge sort?


3 more years and maybe HTML 5 will catch up to where Flash was 3 years ago! Here's to hoping!


For me the web worker IS consistently faster. Chrome 54. 3100ms vs 3400ms.


Same here. 7000ms vs 5500ms. Chrome 54.


For anyone familiar with the landscape:

Are webworkers likely to improve dramatically as time goes on? For example, the cost of using web workers appears[1] to be far greater than just spinning up another thread in, say, Go[2]. I assume this is mainly due to how the browser is sandboxing JS. So, is there room for improvement here? Or have most things js related been so drastically optimized that we are without much hope here?

[1]: Purely speculative on my part. [2]: Yes, I know Go has green threads; it's purely an example. I was going to say Rust, but that seems like an unfair comparison.


In my experience web workers aren't noticeably slower than regular JS (apart from the cost of copying any shared memory back and forth).

They're slower in this demo (for some people but not others), but microbenchmarks in JS are very hard to do right, and I don't think the author here was trying to make an accurate microbenchmark.


The difference on my machine is not even 50ms - good tradeoff for more responsiveness


300-400 ms on mine.

Of course, not hanging the website beats 400 ms.


Cool demo! Web Workers are nice, and well supported by the evergreen browsers. Lots of apps do CPU-intensive stuff on the server though and don't actually need them.

I am more interested in "Shared Web Workers", since they would allow for synchronization between tabs. They've got pretty poor support though: only Firefox, Chrome desktop, and Opera :(


But why have a server do it when you can use the client!

(We use them to read in semi-large xlsx files and to unzip and resize folders of images client-side before uploading, really because processing them one by one on the client makes it simpler to show progress (feels like we're doing this wrong, but it works), but it also saves resources for us.)


Check out localStorage: its change event is shared instantaneously between tabs.
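
Something like this (a minimal sketch; the storage event only fires in tabs other than the one that made the change):

    // tab A: broadcast by writing a value
    localStorage.setItem('ping', String(Date.now()));

    // tab B: react to changes made by other tabs
    window.addEventListener('storage', function (e) {
        if (e.key === 'ping') {
            console.log('another tab says:', e.newValue);
        }
    });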


That's what I'll have to do. It just seems hacky when all I want to do is pass events and not persist the data. Plus I think localStorage is turned off in some private browsing modes, maybe Safari?


Oh my, I was given the impossible task of writing a hardware resource allocation algorithm to produce encoding XML config from the UI in JavaScript. Think of a hardware device needing to efficiently transcode 80 SD or 12 HD (mix and match) video streams (plus audio - stereo/dolby) on varying encode cards (6 max), all from a drag and drop interface and without disrupting existing encoding allocations to prevent glitches while it reconfigures.

Long story short, this had to work in IE8, not block the UI, and use no plugins. I wrote some crazy way of doing the work in chunks using setTimeout to get it done. The project was a success, but I stated very firmly that in future versions this was a job for the device, not the UI. Hard when the device in question only had a 300MHz CPU.

Those were the days.


Life before WebWorkers was fun :). Another way was to create multiple iframes and use a form of hacky message passing to use more cores, but I think this may have depended on the browser implementation.


Haha, I would never have thought to use iframes!


Btw, webpack's worker-loader makes using web workers incredibly easy.
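
Roughly like this (a sketch, assuming worker-loader is installed; the file name is illustrative):

    // bundle ./sort.worker.js as a separate worker script
    const SortWorker = require('worker-loader!./sort.worker.js');
    const worker = new SortWorker();
    worker.postMessage({cmd: 'sort', size: 50000});
    worker.onmessage = (e) => console.log('sorted', e.data);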


Is this demo supposed to highlight the strength of web workers? It runs 300-400ms slower with the web worker.


Web workers offer several benefits:

1) UI isn't locked up by long running tasks;

2) Potential performance improvement by running multiple processes at once (parallelization);

3) Potential design improvements by decomposing the problem into a CSP-like design (which may or may not relate back to (2)).

(2) is really not the purpose, though; it's just a potential benefit. golang, for instance, doesn't give us inherently better performance just by using goroutines. What it gives us is better design and separation of execution blocks so that they can run concurrently. Better performance is just a perk when things can run in parallel.


Yes. And it succeeds. The browser is responsive during the web worker method and is beachballing without.


"As you can see, without Web Worker, your browser maybe able to sort the 50K Array but you can't work with your browser while sorting and also your browser won't render anything until sorting ends, that's why you can't see the animated progress bar in the page."


2.8 seconds without, 3.3 seconds with the web worker.

Imho not an acceptable trade-off. I hadn't heard the term web workers before, and now I'm not even upset that I hadn't and probably never will...


2.8 seconds is locking the browser

3.3 seconds is not locking browser

When dealing with web it is unacceptable to lock the browser. Also with core / worker distribution you may be able to make the worker approach faster :).


That's something we disagree on then. In my humble opinion, on the web it is unacceptable to make users wait longer than necessary. And it's not a small, 1 or 2 percent difference. And certainly not longer than 1 second.

And why is the example then not set up with core/worker distribution, so that it actually is, as you write, 'potentially' faster?

Nice, I can still use my browser while I wait longer for the stuff I _want_ to see. But what is the use? I want to see the data/page I clicked the link or button for, nothing else.

Or please elaborate on why it is, in _your opinion_, unacceptable to lock the browser but get the data on screen faster?


It's not just something you and I disagree on. I would say it's me, plus a good number of others, and also most browsers.

If the browser is locked for over X seconds it will actually tell you that the script is unresponsive and ask if it should stop it. But you can of course work around this and disable the warning in your browser of choice (http://www.reginout.com/nflp/fix-unresponsive-script-error-i...). Or you could just use a web worker and not lock the UI.


If my opinion differs from you or anybody else, that would be considered a disagreement.

I never see my browser being locked, and if it is locked waiting for information I desperately want to see, I honestly couldn't care less that my UI locks up during that time. I don't care about anything else at that point.

I understand this is not how most people look at it, and to you there is just one way, and that's the responsive-UI way. But I see it as them preferring nice spinning wheels on a car that won't drive anyway...

And if that test used a sleep, which it specifically says it does not, it would have been an even worse showcase. Now it just performs badly (people suggest using multiple workers... lol, for a bubble sort)...

Be critical of a non-locking UI that doesn't show your information anyway. Fix the application itself so it wouldn't need to lock up in the first place.


This demo is not a demo of performance, it's a demo of the UI not being locked up. It could've used a 3 second sleep for all that the sorting itself mattered.


That's not how you should think about this.

It's 2.8 seconds of unresponsiveness vs. 0 seconds of unresponsiveness.


3.5 seconds on Chrome.

Native code does this around 10000% faster.


Sorted with bubble sort. It's just an artificial load to facilitate the demo.


Web Workers are such a disaster of a terrible API. I can't believe actual programmers came up with that syntax.

Passing strings and filenames around, what were they thinking?


This reads like the gratuitous/lazy critique of someone who didn't bother to read the actual API documentation for Web Workers. You can pass more complex data structures to Web Workers than mere strings, ArrayBuffer instances for example.


Oooooo, ArrayBuffers! Really?

I've actually used Web Workers and they're horrible to use. Just even a glance at the syntax will send any sane programmer into a pit of despair.

The 2nd comment at the moment is a library to use Web Workers without having to use the default sucky API.

That says it all.

Why would I make a huge comment explaining how obviously bad the API is when even the laziest programmer can open the link and see how bad it is?


It's basically like a process rather than a thread, and you can share buffers zero-copy.



