Hacker News
Napa.js: A multi-threaded JavaScript runtime (github.com/microsoft)
124 points by metadat on Oct 27, 2022 | hide | past | favorite | 25 comments



This appears to be based on completely isolated runtimes. Note this line from fibonacci-recursive:

    // Broadcast function declaration of 'fibonacci' to napa workers.
    zone.broadcast(fibonacci.toString());
This "serializes" the function `fibonacci` as a string (!) and then `eval`s it in all the workers; this has some obvious limitations. Anyway, it's about the same level as a WebWorker.
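To see why source-string serialization is lossy, here's a minimal sketch (not Napa's actual code, just an illustration): a function revived from its own source text loses its closure.

```javascript
// Sketch: serializing a function via toString() drops its closure.
const base = 10;
function addBase(n) { return n + base; }

// "transport" the function as source text, as zone.broadcast does
const src = addBase.toString();

// revive it in a fresh scope, where 'base' was never declared
const revived = new Function(`return (${src});`)();

try {
  revived(5);
} catch (e) {
  console.log(e instanceof ReferenceError); // true: 'base' is not defined there
}
```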

Filip Pizlo has probably outlined the best attempt at how real multithreaded JS might work: https://webkit.org/blog/7846/concurrent-javascript-it-can-wo...


In https://github.com/microsoft/napajs/blob/master/docs/api/tra... it sounds like the runtimes aren't completely isolated — "JavaScript standard built-in objects in the whitelist can be transported among napa workers transparently" — and it seems to imply they are transported not via serialization but via a sharing mechanism that leverages multiple threads in the same runtime.

(And also it talks about implementing your own Transportable classes)


Yeah, this could have been a small npm package for node. Node already supports workers that communicate with serialization or shared array buffers+locks.


Napa is mostly abandoned and predates Node.js having native multi-threaded support and worker_threads.


This should be the top comment. The last commit to the repo was 5 years ago, and all development seems to have stopped in 2018.


I think JavaScript is at its most useful within the bounds of a single threaded event loop, personally.

Cool project and props for Node module interop, but I personally can't see why I'd use this.

If you want to go more parallel/concurrent, there are already great languages to choose from! Unless you're of the mindset that everything that can be done with JavaScript will be done with JavaScript :)


This! The single-threaded event loop is a great model for UI development, but it's a train wreck for handling HTTP requests. I've spent the last year adding features to and fixing bugs in various microservices written in Java, where the authors decided to use the new shiny reactive frameworks in an attempt to turn Java into NodeJS. On a number of occasions I've been in meetings where the original authors got lost in their own maze of madness.

The worst part is these were "Staff Engineers" who, I know, never did any kind of analysis/engineering to show these frameworks provided any advantage over the more traditional threaded frameworks when considering runtime performance, resource requirements, understandability, and maintainability.


Aahaaa... What about already-existing projects or front-ends? Nobody should use workers, because there are other languages out there, and also because lucasyvas said so with a playful smile.


> but I personally can't see why I'd use this.

For webapps, it means using one language/ecosystem instead of two.


I don't see any new commits in the past 4 years. Any reason this is on top?


Oddly enough, it's not in https://killedbymicrosoft.info/


Oddly neither is F# </sarcasm>


I'm still not over Wunderlist. How effing hard is it to make a shared to-do list that gets the basic UI features right? I'm looking at you, Apple.


Are people still using this, or some successor to this? It looks very cool, but the last commit was in 2018.


The "successor" is the fact that we've added native support to Node.js core.


Perhaps relevant comment from previous discussion (2017) https://news.ycombinator.com/item?id=15499144


Reminds me of the great old RingoJS project https://github.com/ringo/ringojs which does, or did, power some of Austria's biggest websites.


I've been using PM2 for years and it's probably the best I've found. I remember trying this earlier, but it didn't really deliver the kind of performance it was promising.


I've only used pm2 as a task runner. Never thought of it as a "runtime"


It depends on how you have it configured. Using a combination of worker pools and taking advantage of the ecosystem config files, you can distribute work off to multiple threads. In one of my use cases I put a Redis server between all of them as a POC to see the kind of redundancy that could be achieved. Its flexibility is its greatest feature, imo.


Where are the benchmarks?


I'm curious what this means for GC and WeakMap etc.


Each thread or worker is a V8 Isolate, each with its own GC. I don't think WeakMaps could cross the isolate boundary.

The project lists a set of "transportable" types, which can be passed in as an argument when calling a worker function - https://github.com/microsoft/napajs/blob/master/docs/api/tra...
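For comparison, Node's own workers use the structured clone algorithm for the analogous set of built-ins. A quick sketch of what "transportable by copy" means there (a deep copy, not shared state; `structuredClone` is global in Node 17+):

```javascript
// structured clone (what worker postMessage uses) copies built-ins deeply;
// mutations on one side don't show up on the other
const m = new Map([['a', 1]]);
const copy = structuredClone(m);
copy.set('a', 2);
console.log(m.get('a'), copy.get('a')); // 1 2
```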


Wonder why this was built on V8 and not Chakra.


I love this. My hobby is multithreaded programming. I enjoy trying to get high requests per second with parallel programs. I tend to use Java, C and Rust. But were it possible to create shared-memory threading with JavaScript, I would probably do more in JavaScript too.

Most CPUs have multiple cores, and Amdahl's law and the universal scalability law mean there's a very real scalability advantage to running on multiple cores, especially with hyperthreading, big.LITTLE or chiplet designs, or Intel's efficiency and performance cores.
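A quick back-of-the-envelope with Amdahl's law shows both the advantage and its ceiling:

```javascript
// Amdahl's law: speedup(n) = 1 / ((1 - p) + p / n),
// where p is the parallelizable fraction of the program
const amdahl = (p, n) => 1 / ((1 - p) + p / n);

console.log(amdahl(0.95, 8).toFixed(2));        // "5.93" on 8 cores
console.log(amdahl(0.95, Infinity).toFixed(2)); // "20.00": the hard ceiling
```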

One limitation of multithreading that nobody talks about is covered in this whitepaper. It essentially means parallel speed-up is limited to a cubic speed-up.

https://ieeexplore.ieee.org/document/2150

But unfortunately the whitepaper requires payment to read.

I found a summary in this PDF:

https://web.eecs.umich.edu/~imarkov/pubs/jour/dt13-limits.pd...

"To this end, the 1998 IEEE Transactions on Computers paper ''Your Favorite Parallel Algorithms Might Not Be as Fast as You Think'' by David Fisher accounts for the finite density of processing elements in space, the (low) dimension d of the space in which parallel computation is performed, the finite speed of communication, and the linear growth of communication delay with distance. Neglected in most publications, these four factors limit parallel speed-up to power d + 1. Considering matrix multiplication as an example where exponential speed-up is possible in theory, a two-dimensional computing system (a planar circuit, a modern GPU, etc.) can offer at most a cubic speed-up. Given that the general result is asymptotic, it is significant only for large numbers of processing elements that communicate with each other. In particular, for circuits and FPGAs, it limits the benefits of three-dimensional integration to power 4/3 (optimistically assuming a fully isotropic system). For two-dimensional GPUs, at most a cubic speed-up over sequential computation is possible. To this end, a 2012 report by the Oak Ridge Leadership Computing Facility analyzed widely used simulation applications (turbulent combustion; molecular, fluid and plasma dynamics; seismology; atmospheric science; nuclear reactors, etc.). GPU-based speed-ups ranged from 1.4 to 3.3 times for ten applications and 6.1 times for the eleventh (quantum chromodynamics). These mediocre speed-ups likely reflect flaws in prevailing computer organization, where heavy reliance on shared memories dramatically increases communication costs, but alternatives would drastically complicate programming."

I wrote a parallel actor-based multithreaded implementation that can get 100 million requests per second without locks. I also wrote a parallel assembly interpreter which can execute this program on top of the underlying actor implementation. Notice the mailbox command. This assembly program essentially sends integers between 25 threads, and it also sends a method call between threads (it sends a jump instruction to another thread to run on that thread).

   threads 25
   <start>
   mailbox numbers
   mailbox methods
   set running 1
   set current_thread 0
   set received_value 0
   set current 1
   set increment 1
   :while1
   while running :end
   receive numbers received_value :send
   receivecode methods :send :send
   :send
   add received_value current
   addv current_thread 1
   modulo current_thread 25
   send numbers current_thread increment :while1
   sendcode methods current_thread :print
   endwhile :while1
   jump :end
   :print
   println current
   return
   :end

https://GitHub.com/samsquire/multiversion-concurrency-contro...

One problem with a multithreaded JavaScript runtime, which Python also suffers from with its subinterpreter design, is that data must be marshalled between threads. This is slow.

I would like to design an interpreter that can share data with zero-cost abstractions when moving data across a subinterpreter boundary. I think it can be done with zero copies.

Erlang copies data when it is sent between processes, which is what allows garbage collection to be per-process.



