Why does it seem like threading and fibers in Node have been abandoned? (adamcanady.com)
42 points by adamcanady on Dec 2, 2013 | 47 comments



I work with the node source and I have made C/C++ based node modules.

Threading at the JS level in node is really ineffective. V8 has a global lock on all JS objects that prevents them from being passed between threads. The only way to allow threading is to serialize and deserialize the objects into new "isolates" (V8 terminology). V8 does this for some pretty obvious reasons: you wouldn't really want to allow threading at the JS level in a browser, now would you.

You can use threads if you dive down into C++ land, but even then, you should have a really good reason for doing so, because if you're only manipulating objects for processing you're going to have to serialize into a C++ data structure, process, and then serialize back into a JS object. Threads in node are good for doing processing-intensive work that doesn't require any two-way input from the main event loop; otherwise it's very unlikely to be worth your time or effort.

This is why fibers are kind of silly for node, and it's also why processing-intensive programs should most likely be avoided in node (with rare exceptions). Please use node for what it is good at: it's a short-lived-connection proxy maven, and good at ancillary front-end work that requires complex state transitions (i.e. mobile). Working it into a monolithic stack is just asking for trouble and pain. No serious architect should consider it for a large monolithic rendering stack.


Agreed on all counts, with one small, specific clarification:

If computation is local to each of many long-running threads, and the two-way messages being passed around are small enough (so the serialization overhead is negligible), then threads do perform better on multi-core machines than alternative models.

EtherCalc.org is one such use case: http://aosabook.org/en/posa/from-socialcalc-to-ethercalc.htm...


Agreed.


Don't Isolates also allow and support the use of Transferables? Objects can only be in one isolated thread or another, but I thought the purpose of Transferables was to get rid of the need to ser-deser them?

Which mates well with what one of the projects in the article is up to: audreyt's node-webworker-threads. Webworkers extend web messaging, so it's, as you say, about passing objects to one another.

Certainly Webworkers can be and have been implemented without threads. Node-webworkers does just that, but it uses websockets to do inter-worker communication: it cannot leverage Transferables, which means objects have to be copied. Would that there were a common process where postMessage() could transfer an object from one owner to another!

I highly recommend that all developers learn and use webworkers; they are a huge part of the future of the web. Anything that Node can do to nurture and harmonize its existence with this de facto, well-defined means of multi-processing, I cannot but see as a good thing. I also cannot help but think that multi-threading may be a fine and advantageous way to implement webworkers' intercommunicating form of multi-processing: it enables Transferables, where ser-deser can be avoided.


Transferables are fast, but they are still a copy operation.


+1. If you want threads, build a node C++ addon that manages your threads/high-performance work (and try not to pass too much back and forth with your JavaScript, because of the marshaling overhead described above). Which means you are just writing a lot of C++, and your JavaScript simply becomes a convenient interface to start/stop the processing and script actions on values emitted from your C++ addon.

Or, like everyone else recommends: use processes and ipc (e.g., the cluster module).
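
For reference, the cluster version is only a few lines; this is a rough sketch (the port and restart policy are arbitrary):

    var cluster = require('cluster');
    var http = require('http');
    var os = require('os');

    if (cluster.isMaster) {
      // fork one worker per CPU; they all share the same listening socket
      os.cpus().forEach(function () { cluster.fork(); });
      cluster.on('exit', function (worker) {
        console.log('worker ' + worker.process.pid + ' died, restarting');
        cluster.fork();
      });
    } else {
      http.createServer(function (req, res) {
        res.end('handled by pid ' + process.pid + '\n');
      }).listen(8000);
    }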


Note that on Linux, the performance difference between processes and threads is an awful lot smaller than it was when threads first came into the world. This is another case where someone invented a better idea for performance reasons, and the people in charge of the old idea (processes) realized that they could do a lot better. So they did. And nowadays few people have a real need to squeeze every bit of performance out of a server. For most of us it is a question of buying one or two more servers. Only the Googles of the world really need to deal with this kind of performance tweaking, and if you look at what they are doing, it does not include Javascript.


> V8 does this for some pretty obvious reasons, you wouldn't really want to allow threading at the JS level in a browser, now would you.

I'd really like to have Erlang-style actors at the JS level in a browser, which is predicated on having threads.


> you wouldn't really want to allow threading at the JS level in a browser

I was wondering over this same statement. It seems rather...presumptuous?


Because technologies like Q [1] and streams [2] provide very good abstractions for doing long-running tasks asynchronously, which jibes better with node's entire philosophy. Combine that with the fact that running multiple processes is a perfectly viable way to do actual long-running tasks, and that node provides abstractions around that out of the box, and it really starts to look like fibers and webworkers in node are patchy hold-overs for technologists who haven't fully shifted into thinking about things in "the node way."

[1] https://github.com/kriskowal/q [2] https://github.com/substack/stream-handbook
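
To make the Q point concrete, a rough sketch (doSlowAsyncThing is a made-up promise-returning function, and the file name is just an example):

    var Q = require('q');
    var fs = require('fs');

    Q.nfcall(fs.readFile, 'input.json', 'utf8')          // wrap a node-style callback API in a promise
      .then(function (text) { return JSON.parse(text); })
      .then(function (data) { return doSlowAsyncThing(data); })  // hypothetical async step
      .fail(function (err) { console.error('pipeline failed:', err); })
      .done();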


How do promises help with CPU-bound tasks?


I already mentioned child processes, which model scaling up to multiple servers better than threads do, are more powerful than web workers, and are supported out-of-the-box with a powerful arbitrary message passing scheme, including the ability to transfer sockets between processes. It looks like there are some abstractions built around using multiple processes and work queues to help amortize the weight of the additional processes (see node-compute-cluster mentioned elsewhere in this thread), but I've not used anything like this.
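
For anyone who hasn't used it, the out-of-the-box IPC looks roughly like this (worker.js is whatever CPU-heavy script you need; a naive fib() stands in here):

    // parent.js
    var fork = require('child_process').fork;
    var worker = fork(__dirname + '/worker.js');

    worker.on('message', function (msg) {
      console.log('result from worker:', msg.result);
    });
    worker.send({ n: 40 });

    // worker.js
    function fib(n) { return n < 2 ? n : fib(n - 1) + fib(n - 2); }  // deliberately CPU-heavy
    process.on('message', function (msg) {
      process.send({ result: fib(msg.n) });
    });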


> I already mentioned child processes, which model scaling up to multiple servers better than threads do ...

All taking away threads does is eliminate high-value optimization strategies that are available to multi-machine scalable systems that do support threads.

Threads don't dictate shared mutable state as a programming paradigm, they simply make it possible to avoid crossing expensive network and process boundaries where appropriate.

There's nothing stopping you from architecting thread-based systems to use low-cost in-core dispatch of messages within a VM, and saving the high-cost serialization and IPC/network overhead for off-machine message dispatch.


On which OS is crossing the process boundary expensive?

And how expensive is expensive?


> On which OS is crossing the process boundary expensive?

All of them, always. It's either expensive in terms of resource cost (requiring serialization and complex local IPC overhead), or expensive in terms of complexity (requiring shared memory and inter-process coordination).


Node's cluster module is handy, but it's a workaround for the fundamental weakness in Node - the entire thing's built on top of a single-threaded event loop which blocks.

Instead of just communicating across threads to do some async busy work, you now have to communicate across processes... along with heartbeat monitoring of those processes and endless debate over which IPC mechanism to use.


"which model scaling up to multiple servers better than threads do"

And with that I see you are operating in the Node bubble, where "threads" will forever be stuck in 1995.

It isn't 1995 anymore.


There are threads and processes, and both have their pros and cons. Yes, it is great that Node has multi-process support out of the box, but what about when threads are a better solution? In my opinion, there is a place for multi-threading support even if it is not the "Node way".


Short of rewriting the whole system around a threaded model, the only reason to use threads over processes in Node is to make the IPC faster. This means that ultimately supporting threads is an optimization that has to compete for cost-benefit with all the other possible optimizations for Node to support.


Surely you should do things like this in a separate task queue? Use a bunch of node processes for taking HTTP requests and spitting out responses, and a different bunch of processes for handling slow things.


Are there any practical examples (in the wild?) that are in line with what you're saying here? Genuinely curious.


Browserify [1] is huge and is built on the stream principle. Streams do a bit of processing and can be combined. As far as q, I have seen it used in a couple places but don't have anything big-ticket off the top of my head. Here's the official public list: https://npmjs.org/browse/depended/q

[1] http://browserify.org/


I've, for example, used https://github.com/lloyd/node-compute-cluster which is basically a broker between various computation processes in node.
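
If I remember its README right, usage looks roughly like this (worker.js is your own CPU-bound script that listens for messages and replies with process.send):

    var ComputeCluster = require('compute-cluster');

    // spawns ./worker.js processes on demand and queues work to them
    var cc = new ComputeCluster({ module: './worker.js' });

    cc.enqueue({ n: 40 }, function (err, result) {
      if (err) console.error('job failed:', err);
      else console.log('got result:', result);
    });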


Hi, maintainer of https://github.com/audreyt/node-webworker-threads/ here.

We're using it happily in production, and it's proved to be quite stable, and there's quite a few users — certainly more than the average amount of emails I get from other projects. :-)

It's true that I haven't heard from the earliest adopters (like Uber.com), but it might be a sign of its maturity rather than abandonment.

The last commit was 2 months ago, and only to repair compatibility with the then-just-released V8 version. It will likely require another commit for Node 0.12.0 — pull requests welcome!

Edit: Just released v0.4.8 to npm with references to the http://aosabook.org/en/posa/from-socialcalc-to-ethercalc.htm... writeup.
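
For anyone who hasn't tried it, usage mirrors the browser Worker API pretty closely; roughly:

    var Worker = require('webworker-threads').Worker;

    var worker = new Worker(function () {
      // runs in its own thread and isolate
      this.onmessage = function (event) {
        postMessage('echo: ' + event.data);
      };
    });

    worker.onmessage = function (event) {
      console.log(event.data);   // 'echo: hello'
      worker.terminate();
    };
    worker.postMessage('hello');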


I suspect people who dislike the idea of multithreaded programming are over-represented in the Node community. I'd even speculate that Node owes some of its popularity to being a release valve for a general thread-grumpiness that has been simmering in developer circles for years, fueled by the common belief that multithreaded programming is error-prone and difficult to reason about. So there's probably a bit of an ideological hurdle to overcome, but I also think that it's not an insurmountable problem, as the community matures and if a really useful solution presents itself.


It would be tough for Fibers to be totally abandoned, as http://www.meteor.com/ relies on it.

I don't think it's really needed any code updates recently, as it works on Linux / OS X / Windows and all recent versions of node.

I think it's somewhat hard to tell what the adoption of fibers looks like, as it typically wouldn't be used by other modules in npm. You don't want to force people using your module to use fibers. So, it would mostly see its usage in applications.

It will be interesting to see what happens to fibers once generators ship with core node (available now under the --harmony-generators v8 flag). The most compelling reasons for using fibers can also be met by using generators.
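
For those who haven't used fibers, the core trick looks roughly like the sleep() example from the node-fibers README:

    var Fiber = require('fibers');

    // sleep() parks the current fiber without blocking the event loop
    function sleep(ms) {
      var fiber = Fiber.current;
      setTimeout(function () { fiber.run(); }, ms);
      Fiber.yield();
    }

    Fiber(function () {
      console.log('waiting...');
      sleep(1000);              // reads as synchronous code
      console.log('done');
    }).run();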

(I maintain a library built on top of fibers: https://github.com/scriby/asyncblock)


But surely Node.js is not meant for that? It's meant for high-load scenarios where you want to do simple processing for a lot of connections, not to be treated like a silver bullet. It's just an extra layer for GUI purposes. If you want something advanced you will be better off with a full programming language instead of a scripting language anyway (type checking etc.). For example: do a long-running process using a proper technology and feed the status of the process to the GUI via lean Node.js services.


Please explain why Node shouldn't be permitted or wouldn't be appropriate for "advanced" scenarios. What advantage does Node acquire by resisting fibers & threads inclusion? You seem to have a set characterization for where you think Node fits, but I don't understand what basis you are making these assertions from, and I'm curious if there's anything to justify these stated opinions.

As for type-checking, there are options such as TypeScript which can be used on Node if a team finds them helpful.


What advantages _does_ Node bring except for "same language on server side" and good handling of high-load scenarios?

I might be ignorant, but it seems that you have to go through a lot of ceremony for some simple parallelisation. How do I do Parallel.For(...) or Task.Run(...).ContinueWith(...) in Node.js?
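
The closest I've found is callback-based, e.g. the async module (and note these tasks are interleaved on one thread, not spread across cores like Parallel.For):

    var async = require('async');

    async.parallel([
      function (cb) { setTimeout(function () { cb(null, 'first'); }, 100); },
      function (cb) { setTimeout(function () { cb(null, 'second'); }, 50); }
    ], function (err, results) {
      console.log(results);   // [ 'first', 'second' ], order follows the task array
    });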


There's a big difference between a thread implementation that works (Java) and a thread implementation that only works (Python). Even languages close to the metal enough that you'd think they could support threads (C++) have enough problems with libraries and dependencies that you can't entirely depend on threads working right.


The current way, if your workload is CPU-bound, is to use cluster or child_process with a priority queue.

http://nodejs.org/api/cluster.html http://nodejs.org/api/child_process.html


I disagree with the premise that a project is "abandoned" unless it has very recent commits to master. We are living in a weird world if that's the case.


My CPU-intensive work is wrapped in services. This can be direct native calls to libs (C or OpenMP), actors via Akka, Finagle, Storm, Hadoop, Spark or Mesos for that matter. The beauty of node is its single-threadedness. That doesn't mean I cannot orchestrate these single-threaded apps as services, asynchronously, freeing me up from thinking too much about locks and low-level context switching.

Now, that's not to say you can't have it "just because"; a worthy exercise, I'm sure. But in the coming years we will all be asked to use multiple cores, multiple machines, and multiple data centers. For these scenarios, it may make more sense to think at a higher order.

Hopefully I made some sense, I happen to be on my mobile device.


From what I understand, v8 does allow multithreading, but each thread still needs its own isolate scope. That's roughly the equivalent of forking off another node process per thread, which is what cluster does.


Because writing plain javascript is simpler for the majority of people to understand and will run in more places without the need for custom setups. Module authors will not write to these custom implementations since it would greatly limit the audience for their module.


So, basically, because threading is hard.

Lol, keep killing it web bros.

(Full disclosure: I am, in fact, using Node now, and there is nothing quite so painful as the lack of coroutine or fiber or thread support. Cluster is cool, but only if you want to make a bunch of processes.)


I don't necessarily want threading in Node.js, but it would be nice to have coroutines/green threads...something simple that scales across CPUs, at least. Unfortunately with JavaScript I think the best you're going to get is Node's cluster module, because of the threading limitations with V8 and other JavaScript engines. Realistically, it's a trade off - Node.js is easy and quick to work with, but the only way to take full advantage of the machine's processing power is through forking processes.


Seriously, green threads with yield() and autoyield on blocking operations would be great. Sometimes the easiest way to write a program is to present it as a logical flow of events, and let the machine/operating environment hide the async nature of life from you.

Then again, that would mean reinventing (again) core OS functionality in Javascript. Le sigh.


People write modules for node to an audience rather than to scratch their own itch?


Hmm, the last commit for webworker-threads seems to be two months ago instead of six?


I hadn't run into the node-webworker-threads project before. What a delight! I'd used node-webworkers [1] in the past, but that's multiprocess+websockets. Audreyt is killing it: a solid implementation of the standard webworker model.

[1] https://github.com/pgriess/node-webworker


I'm puzzled: what's the use case for threads in the node asynchronous model? Isn't the major use of threads to prevent a long-running task from blocking the app in synchronous programming?


IO is asynchronous, but code execution is not asynchronous. If a block of code is doing intensive math for 100ms, then that blocks the whole process, and potentially other IO, from executing in that 100ms.

Perhaps you should think of node as "event-driven" rather than "asynchronous".
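
A tiny demo of the problem: the interval below should fire every 100ms, but while the busy loop spins nothing else can run:

    setInterval(function () {
      console.log('tick', Date.now());
    }, 100);

    setTimeout(function () {
      var start = Date.now();
      while (Date.now() - start < 3000) { /* 3 seconds of CPU-bound "work" */ }
      console.log('done crunching');   // ticks resume only after this returns
    }, 500);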


CPU-bound loads still block the event loop. Modules take care of this behind the scenes (sometimes?), but if you're rolling your own expensive operation, threads can come in handy.


Sometimes you need to do a lot of CPU work which the async I/O model will never help with.


[deleted]


Not to be pedantic, but the Cluster API uses forked workers, so it's technically a multi-process model, each process with its own single thread.

AFAIK there is no threading support in node.js. Not today and not ever (?).


There's a model in all this that many developers lose sight of: more and more intensive work is being handled on the front end. Obviously not a universal phenomenon, but important to any discussion about performance, threading, and IO blocking.




