A guide to threads in Node.js

tannhaeuser · on March 24, 2019

> It was only in 2009 that Ryan Dahl, creator of Node.js, made it possible for developers to use the language to write back-end code

Server-side JavaScript was a thing long before that. In fact, Netscape used it as early as 1994 [1], predating Java as a backend language. And CommonJS [2] (on which Node.js's modules and many APIs are based) was a community effort towards a common API by 2000s SSJS implementations (helma, v8cgi/teajs, and others).

Apart from that, a nice read for those who need a WebWorker-like API for CPU-bound nodejs tasks.

[1]: https://en.wikipedia.org/wiki/Server-side_scripting

[2]: http://wiki.commonjs.org/wiki/CommonJS

kbenson · on March 24, 2019

And Rhino[1], which is a JavaScript engine written by Netscape in Java, originally for a Java-based browser but used quite a bit on the server when that project was killed.

1: https://en.wikipedia.org/wiki/Rhino_(JavaScript_engine)

lucideer · on March 25, 2019

The CommonJS wiki has unfortunately been updated over time to remove some past info (e.g. dead projects), so it doesn't give as good an overview of what once existed as it once did (unless you dig through MediaWiki edits).

Flusspferd—a Spidermonkey-based alternative that was also released in 2009—has quite a nice overview page[0] that's somewhat frozen in time, including "Supported CommonJS Specifications" and "Related projects" sections.

[0] http://flusspferd.github.io/

egeozcan · on March 24, 2019

I was playing with SilkJS at some point. Not sure if it was there earlier than node but the earliest archive is from 2012 https://web.archive.org/web/20120108050201/http://silkjs.org...

Funny now there's zero references to it on the internet. I had really liked it back then. It was fully synchronous and had support for threading. Cool stuff:

https://web.archive.org/web/20130615124030/http://silkjs.net...

Edit: Repository seems to be alive! https://github.com/mschwartz/SilkJS/

denysonique · on March 24, 2019

There was also Aptana Jaxer around 2008: https://en.wikipedia.org/wiki/Aptana#Aptana_Jaxer http://www.jaxer.org/

bryanrasmussen · on March 25, 2019

I wish Aptana had won the race, or at least if not beat Node, competed reasonably with Node.

rileymat2 · on March 25, 2019

https://en.wikipedia.org/wiki/JScript#JScript

In ~1997 you got a version of JScript (very close to JavaScript) in IIS.

bryanrasmussen · on March 25, 2019

Well JScript was Microsoft's implementation of JavaScript that they were at the time prevented from calling JavaScript IIRC, so I guess I would just say it was JavaScript. Since it was implemented as an ActiveScripting language it meant it could basically be made to work anywhere in Windows.

rileymat2 · on March 25, 2019

Yes, those were wiggle words for the pedantic crowd. There are very minor differences/additions that were included in JScript. But almost enough not to be relevant.

est31 · on March 24, 2019

> Netscape used it as early as 1994

Sad how neither this, nor xulrunner took off. They simply were ahead of their time. Possibly still are.

taf2 · on March 24, 2019

I deployed a consumer desktop application on xulrunner (really pre xulrunner) targeting windoze and OS X. I had a Linux version working too. I still from that experience believe html,css,js is the best choice for user facing applications. It was a health app for medical records so in 2004-5 we felt privacy was the main reason to not do this as a web app. We still had users that I know of until 2012; if I hadn’t lost the code I’d have open sourced it.

crabasa · on March 24, 2019

>     type WorkerCallback = (err: any, result?: any) => any;
>
>     export function runWorker(path: string, cb: WorkerCallback, workerData: object | null = null) {
>       const worker = new Worker(path, { workerData });
>
>       worker.on('message', cb.bind(null, null));
>       worker.on('error', cb);
>
>       worker.on('exit', (exitCode) => {
>         if (exitCode === 0) {
>           return null;
>         }
>         return cb(new Error(`Worker has stopped with code ${exitCode}`));
>       });
>
>       return worker;
>     }

For an article titled "A Guide to Threads in Node.js", I really wish the first example of writing a thread wasn't in TypeScript.
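For readers who would rather not translate the type syntax, here is a sketch of the same helper in plain JavaScript, with JSDoc comments carrying what the TypeScript annotations said. The temporary worker file (`squareWorkerFile`) and the `squared` promise are illustrative assumptions added only so the example runs on its own; they are not from the article.

```javascript
const fs = require('fs');
const os = require('os');
const nodePath = require('path');
const { Worker } = require('worker_threads');

/**
 * Node-style callback: cb(err) on failure, cb(null, result) on a message.
 * @callback WorkerCallback
 * @param {*} err
 * @param {*} [result]
 */

/**
 * Plain-JavaScript rendering of the quoted runWorker helper.
 * @param {string} path - path to the worker script
 * @param {WorkerCallback} cb
 * @param {object|null} [workerData]
 * @returns {Worker}
 */
function runWorker(path, cb, workerData = null) {
  const worker = new Worker(path, { workerData });
  worker.on('message', cb.bind(null, null));
  worker.on('error', cb);
  worker.on('exit', (exitCode) => {
    if (exitCode === 0) {
      return null;
    }
    return cb(new Error(`Worker has stopped with code ${exitCode}`));
  });
  return worker;
}

// Illustrative usage (an assumption, not from the article): write a tiny
// worker script to a temp file so the example is self-contained.
const squareWorkerFile = nodePath.join(os.tmpdir(), `square-worker-${process.pid}.cjs`);
fs.writeFileSync(squareWorkerFile, `
  const { parentPort, workerData } = require('worker_threads');
  parentPort.postMessage(workerData.n * workerData.n);
`);

const squared = new Promise((resolve, reject) => {
  runWorker(squareWorkerFile, (err, result) => (err ? reject(err) : resolve(result)), { n: 7 });
});

squared
  .then((value) => console.log('worker replied:', value)) // 49 for n = 7
  .finally(() => fs.unlinkSync(squareWorkerFile));
```

One thing the callback shape hides: as in the quoted snippet, a worker that crashes can reach `cb` through both the 'error' and the 'exit' handlers, so callers may want to guard against a second call.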

danenania · on March 24, 2019

Why? Just think of it as inline documentation. It in no way changes the semantics of the javascript and efficiently communicates how the API works.

crabasa · on March 24, 2019

When writing a developer guide, especially to something as fundamental as "threads in Node.js", it pays to:

1) Make as few assumptions as possible regarding what the reader knows or has experience with

2) Make code samples as complete and "copy-and-pastable" as possible.

You're assuming that all developers reading this guide will effortlessly translate that TS code to JS.

Using TS creates a possible barrier to understanding, which I'm sure isn't the intent of someone publishing such a guide. Better to simply write some comments or augment the prose in the guide.

danenania · on March 24, 2019

Ok, fair enough. But for anyone who has some familiarity with type systems, the lack of type signatures (or needing to mentally parse them in some other non-standard format) also acts as a barrier to understanding. Perhaps either a side-by-side view or a tab switcher would be ideal.

crabasa · on March 24, 2019

Except this is a guide for Node.js (JavaScript). The one assumption the author can make is that the reader knows and is comfortable with JavaScript.

danenania · on March 24, 2019

Yeah I'm not trying to disagree with you there, just pointing out that TS types are an efficient way to add important context that many (like me) will appreciate, and as a popular well-defined standard I think they are superior to some other kind of ad hoc spec that is included in comments or prose or whatever.

I think it would be great if more JS code examples started including type signatures, since they are useful whether or not you program in TS. But yes, there should probably also be a JS-only example in that case and/or a way to toggle them off.

hackerbob · on March 24, 2019

We've been through this game before: CoffeeScript

Until TS becomes an official standard supported by ECMAScript out of the box, we're just going along with what feels or looks good, and the javascript community has proven that can change from year to year.

danenania · on March 25, 2019

Well, I was a huge fan of CoffeeScript, so you’ve got me there. I don’t regret a single one of the (many) times I used it though. It made me and the teams I was part of significantly more productive and allowed us to produce more concise and maintainable code than we could have with plain JS. I feel the same about TypeScript today except honestly the productivity gains are a lot more dramatic.

Yaggo · on March 25, 2019

As a former coffeescript fan, doesn’t the extra “clutter” in typescript syntax bother you?

I like CS very much myself. While I can see the benefit of having type system (in certain projects), I just can’t get over the aesthetic/syntax issue.

true_religion · on March 25, 2019

For me, not really. If I want to use type annotation, I need to write the types somewhere in order to communicate it to future readers.

I would much rather do it via TypeScript's annotations than JSDoc.

It's not clutter as it was deemed necessary by the writer (me).

If I don't want to use type annotations, I simply don't add them and Typescript does not force you to do so unless you tell it to.

danenania · on March 25, 2019

For me, most of the key CS features that made me reach for it all ended up in ES6. And while I also prefer CS indentation to braces, I don't see that as very important.

jhall1468 · on March 25, 2019

I mean, that requirement (supported by ECMAScript out of the box) means you can't even use babel on the front-end since stage 4 isn't necessarily supported until the next release.

Types aren't coming to ECMAScript anytime soon and with flow and TS it's probably time to get used to the syntax.

kbenson · on March 24, 2019

Probably because it adds extra cognitive overhead for anyone that doesn't know TypeScript as they interpret and translate TypeScript into JavaScript in their head? Or perhaps because it's slightly misleading?

GordonS · on March 24, 2019

As a staunch supporter of static typing, I agree with you - the target audience is very obviously those from a JavaScript background, so it makes sense for the code snippets to be in JavaScript, not Typescript. If the author wanted to add additional info for context, IMO in this instance they should have used a JSDoc comment.

hombre_fatal · on March 25, 2019

Eh, I'd see your point if it was written with ReasonML, but these comments seem ridiculous to me. For example, did you really grapple with the example code? Or are you just assuming everyone is too stupid to understand some inline, ignorable annotations in what is otherwise still JavaScript?

danenania · on March 25, 2019

TypeScript is a more expressive, concise, intuitive, and popular version of jsdoc that can be statically enforced by a compiler, provide autocompletion, and will never get out of date. I just don’t see the downside of using TS in place of jsdoc—the syntax is even pretty comparable.

com2kid · on March 24, 2019

I wasn't a huge fan of the "single thread for everything" paradigm, then I helped design a single threaded cooperatively multitasked embedded OS in C/C++.

No overhead from context switches. Everyone who is running knows how long they have to run, and knows that they are NOT going to get interrupted. No need to worry about locks or how to share data. Want to pass data to another module? Just pass it through well defined interfaces and you darn well know there will never be a read/write conflict, and the next time that module's code runs, it'll have access to that data. (No queue!)

The only exception, and what made it all possible, was the interrupt routines from hardware[1]. Anyone who subscribed to hardware events (entire OS was subscription based, no polling reads ever) had to implement a "thunk pattern" to put data into a receive buffer. The design pattern to do this was the same everywhere in all modules, making code understandable across the entire project.
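One reading of that receive-buffer pattern, transposed to Node.js as a rough sketch: the event handler does nothing except enqueue, and the owning module drains the buffer on its own turn. The `ReceiveBuffer` class, the `radio` emitter, and the drop-when-full policy are all hypothetical stand-ins, not details from the comment.

```javascript
const { EventEmitter } = require('events');

// Hypothetical receive buffer: the event ("interrupt") side only enqueues,
// and the owning module drains the data on its own scheduled turn.
class ReceiveBuffer {
  constructor(capacity) {
    this.capacity = capacity;
    this.packets = [];
    this.dropped = 0;
  }

  // Handler side: do the minimum and return immediately.
  push(packet) {
    if (this.packets.length >= this.capacity) {
      this.dropped += 1; // buffer full: count the loss instead of blocking
      return false;
    }
    this.packets.push(packet);
    return true;
  }

  // Task side: take everything received so far in one step.
  drain() {
    const batch = this.packets;
    this.packets = [];
    return batch;
  }
}

// Stand-in for a hardware event source (an assumption for illustration).
const radio = new EventEmitter();
const rx = new ReceiveBuffer(2);
radio.on('packet', (packet) => rx.push(packet));

radio.emit('packet', 'a');
radio.emit('packet', 'b');
radio.emit('packet', 'c'); // dropped: the buffer holds only two packets

const received = rx.drain(); // ['a', 'b']
```

Because every producer goes through the same `push` and every consumer through the same `drain`, the hand-off looks identical in each module, which is the property the comment highlights.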

It is an incredibly freeing paradigm to write in. It becomes so much easier to prove[2] the correctness of code when you can read all the code straight through and not have to ever worry about someone stomping on your data.

It wasn't an RTOS, but even so, it becomes really easy to start providing performance guarantees.

Internal builds had a watchdog timer[3] that would crash the device if it wasn't 'kicked' every so often. Set that to 3ms, start working with the code, and the stack traces tell you instantly who is over their CPU budget. Rewrite code and break it apart into multiple chunks that are scheduled for later execution, repeat until everyone is under their CPU allotment.
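The same "find who is over budget, then split the work" loop can be sketched in Node.js terms. This is only an analogy under stated assumptions: `sumInSlices`, `budgetMs`, and `onOverBudget` are hypothetical names, and instead of crashing like a real watchdog the sketch merely reports a slice that ran too long.

```javascript
const { performance } = require('perf_hooks');

// Sums `numbers` in slices, yielding to the event loop between slices.
// `budgetMs` plays the role of the watchdog timeout: any slice that runs
// longer is reported through `onOverBudget` (both names are hypothetical).
function sumInSlices(numbers, sliceSize, budgetMs, onOverBudget) {
  return new Promise((resolve) => {
    let index = 0;
    let total = 0;

    function runSlice() {
      const startedAt = performance.now();
      const end = Math.min(index + sliceSize, numbers.length);
      for (; index < end; index += 1) {
        total += numbers[index];
      }
      const elapsedMs = performance.now() - startedAt;
      if (elapsedMs > budgetMs) {
        onOverBudget(elapsedMs);
      }
      if (index < numbers.length) {
        setImmediate(runSlice); // schedule the rest for a later turn
      } else {
        resolve(total);
      }
    }

    runSlice();
  });
}

const sliceTotal = sumInSlices([1, 2, 3, 4, 5], 2, 3, (ms) => {
  console.warn(`slice over budget: ${ms.toFixed(2)}ms`);
});
sliceTotal.then((total) => console.log('total:', total)); // 15
```

Shrinking `sliceSize` until the warning stops firing mirrors the "repeat until everyone is under their CPU allotment" step.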

For many tasks, single threaded code is nice. Getting rid of preemption is even nicer.

Preemptive multithreading is a compromise. It means that no one thread/process can bring down the system by hogging 100% of resources, but it also creates a huge overhead where important work, work that makes for a better user experience, can (will!) get interrupted for work that honestly doesn't need to be done right now.

The solution to this is just throw so much CPU at the problem that everything gets done in a reasonable amount of time. It has often been noted that "reasonable amount of time" means systems today are less responsive than a 486 running DOS from 1992.

Of course it isn't reasonable to have a modern cooperatively multithreaded consumer OS; no way would the hundreds of processes running at any one time all cooperate with each other.

But if you ever get a chance to write code on a single threaded cooperative system, go do it. It is a lot of fun.

Now all this meant that going from embedded C to NodeJS wasn't that large of a mental leap! Not having to directly read bytes off the wire was weird (seriously, took a bit of getting used to), but it turns out that "get data, do work, schedule what needs to be done, return early" ends up being the same paradigm at both the top and the bottom of the programming stacks!

[1] All I/O was done using DMA engines, basically a fancy limited programmable piece of hardware that can read and write to all the different pieces of HW hanging off of the main chip, so for example as Bluetooth packets come in, the DMA engine shoves the packets into a buffer and when the buffer is full it raises an interrupt that lets the CPU know that data is waiting. It looks almost exactly like async I/O in any of the modern programming languages, except you have direct access to all that IO being hardware offloaded. Writing to the bare metal rocks.

[2] For a reasonable enough degree of "prove" that software is reliable and doesn't crash from threading issues

[3] A watchdog timer is a physical timer hooked across the power lines of your chip. If it isn't activated every so often (in the industry this is called "kicking the watchdog"), it fires: in debug builds it does a controlled crash of the CPU, and in retail builds it will reset the entire system. If you've ever had an embedded device reset itself before your very eyes, it is because the CPU got locked up and no one kicked the watchdog, so the entire system emergency reset itself.

caprese · on March 24, 2019

It took a little re-learning for me, since the aughts had so much furor and exuberance for multithreaded possibilities, as a holdover from using the newly abundant computational resources as efficiently as possible.

The reality is, or has become, that the abundance of computational resources has resulted in computers being able to be created in the aether spontaneously, groups of computers even. Making them do one thing in a consistent time span is more important than filling up the theoretical limitation of their allocated resources.

Computer in this context being the elements that allow for computation, as much as an abacus or a professional human has been called a computer. Since many times these compute instances reside solely in the memory of a host machine, somewhere up the chain are the machines made of metal.

A cluster of node processes, each of which has its own external memory store and its own database, communicating via REST, is great. The distinction between programming languages is being moved further into irrelevancy, with people that haven't re-learned this crying discrimination, now that their once-useful elitist gatekeeping only serves to remind everyone else how out of touch they are.

It is fascinating how many computers we actually use now, when factoring in the CDNs over top of our redundant clusters containing single threaded processes with an occasional worker.

tonyarkles · on March 24, 2019

Now and then I get to work in systems like this, and you're right. It's a joy! And thank you, I really like the idea of setting the watchdog to really small timeouts to put upper bounds on execution time in debug builds. That's a fantastically clever use of it!

For the record, I much prefer the term "pet the watchdog" :).

com2kid · on March 24, 2019

>For the record, I much prefer the term "pet the watchdog" :).

I think the EEs who set up the watchdog were far too grizzled. :D

tonyarkles · on March 24, 2019

Hah! I've only got a little bit of grey in my beard so far...

striking · on March 24, 2019

Some others say "feed the watchdog", which to me makes the most sense.

imtringued · on March 25, 2019

>The solution to this is just throw so much CPU at the problem that everything gets done in a reasonable amount of time.

Erlang exploits this fact. It is not possible to use iteration, only recursion. This means it is not possible to block the cooperative scheduler unless you have a single incredibly long function that spans several thousands of lines of code. On the other hand calling into C code directly can still bring the cooperative scheduler down which is why Erlang code generally uses a separate process to run C code.

amelius · on March 25, 2019

> a single threaded cooperatively multitasked embedded OS in C/C++.

It's good to remember that Windows 3.1 was essentially such a system. Apparently, for some reason the paradigm wasn't good enough.

imtringued · on March 25, 2019

If it wasn't good enough why do NodeJS, Golang, Erlang, Ponylang and the commenter above still use it? The reason is OS level process scheduling has different requirements than task scheduling within a single application. The compromise is significantly lower efficiency in exchange for not letting rogue processes lock up the system. The poster child here is Erlang, which does a context switch on every function call because it has to switch to the receiving actor. Can your OS scheduler keep up with that?

amelius · on March 25, 2019

I wouldn't call Erlang's model "cooperative multitasking", since you can't make the assumption that there will be no context switch without giving up calling functions. In other words, it's cooperative, but under the enormous pressure that if you make a function call, you have to give up the thread of execution.

com2kid · on March 25, 2019

If you control everything that runs on the system, it works just fine as an execution model.

It works much less well when random apps are free to take ownership of the CPU and never release it. :)

amelius · on March 25, 2019

> Preemptive multithreading is a compromise. It means that no one thread/process can bring down the system by hogging 100% of resources, but it also creates a huge overhead where important work, work that makes for a better user experience, can (will!) get interrupted for work that honestly doesn't need to be done right now.

This can be addressed by simply giving a higher priority to threads that do UI work. I think my phone works that way, because the responsiveness is pretty amazing.

com2kid · on March 25, 2019

Your phone has 4 processors working at over 1 GHz and a GPU also working at stupid speeds.

A single threaded system going all out, with 0 layers of abstraction, can provide a responsive system with a couple hundred megahertz and a dumb display buffer.

Heck, at lower resolutions you can get by with under 100 MHz just fine.

As an extreme example of this, look at the old video game consoles. Their CPUs ran in the single digit MHz range, but some of them were able to provide frame perfect controls!

This was because they had a code loop that looked like

1. Get inputs
2. Run game logic
3. Render graphics
4. Output graphics to screen
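Those four steps can be sketched as a fixed loop. Everything here is a hypothetical toy (`runFrames`, `update`, `render`, a one-dimensional position, pre-recorded inputs standing in for a controller); the point is only that each frame does the same bounded work in the same order, with nothing able to interrupt it.

```javascript
// Toy game logic: move a position left or right by one step per frame.
function update(state, input) {
  const step = input === 'right' ? 1 : input === 'left' ? -1 : 0;
  return { x: state.x + step };
}

// Toy renderer: a frame is just a string describing the state.
function render(state) {
  return `x=${state.x}`;
}

// Runs one frame per recorded input and returns what reached the "screen".
function runFrames(inputs, initialState) {
  const screen = [];
  let state = initialState;
  for (const input of inputs) {   // 1. get inputs (pre-recorded here)
    state = update(state, input); // 2. run game logic
    const frame = render(state);  // 3. render graphics
    screen.push(frame);           // 4. output graphics to screen
  }
  return screen;
}

const frames = runFrames(['right', 'right', 'left'], { x: 0 });
console.log(frames); // [ 'x=1', 'x=2', 'x=1' ]
```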

Compare that to nowadays, where touch screen latency on some Android phones can be over 100ms! Heck, even with best of class touch on iOS/Android you'll be at 40-60ms.

Of course the problem with fixed game loops like that is they aren't very flexible. :) My phone needs to do a lot more than render low res graphics to the screen and handle input.

Eventually Android threw enough CPU at the problem that their phones are responsive, but... it took a while.

Trisell · on March 24, 2019

Honest question. Are we going to be able to pass functions with data to threads? Or are these only designed to use files? Which makes this feel like a better wrapper on the child process api that already exists.

ryanpetrich · on March 24, 2019

The code to run in the worker is passed as a string or a path, so no data can be captured. Data can be sent to the worker by posting messages, but the data is never shared—only copied. A limited set of types can be transferred to a worker, after which point they will become unusable in the parent.
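A small sketch of the copy-versus-transfer distinction, using a `MessageChannel` inside a single thread so it can run synchronously; the same `postMessage` semantics apply when the other end of the port lives in a worker. The `sendAndReceive` helper and the sample values are assumptions made for illustration.

```javascript
const { MessageChannel, receiveMessageOnPort } = require('worker_threads');

// Hypothetical helper: post a value through a fresh channel and read it back
// synchronously, optionally transferring the objects in `transferList`.
function sendAndReceive(value, transferList = []) {
  const { port1, port2 } = new MessageChannel();
  port1.postMessage(value, transferList);
  const received = receiveMessageOnPort(port2); // { message } or undefined
  port1.close();
  port2.close();
  return received.message;
}

// Copy: the receiver gets its own clone, so later changes are not visible.
const settings = { retries: 3 };
const copiedSettings = sendAndReceive(settings);
settings.retries = 99;
console.log(copiedSettings.retries); // 3

// Transfer: the ArrayBuffer moves, and the sender's handle is detached.
const buffer = new ArrayBuffer(8);
const movedBuffer = sendAndReceive(buffer, [buffer]);
console.log(buffer.byteLength, movedBuffer.byteLength); // 0 8
```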

11235813213455 · on March 24, 2019

For sharing (byte) data, I think there is SharedArrayBuffer https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...

Otherwise, yes, the sent and received data is serialized (when using IPC; else you can also send raw data through streams, and handle the serialization yourself)

rndgermandude · on March 24, 2019

SharedArrayBuffer -> shared

ArrayBuffer -> ownership can be transferred around, but the data can only be owned by one thread at a time. The data itself isn't copied. Passing around ArrayBuffers back and forth is good enough for a lot of stuff. In a worker you can then put a node Buffer, TypedArray, or DataView around the ArrayBuffer again, if you want.
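For the shared case, a minimal sketch of several workers incrementing one counter through a `SharedArrayBuffer` and `Atomics`. The inline worker source (run with `eval: true`), the `countInWorkers` helper, and the counts are illustrative assumptions; each side wraps the shared memory in its own `Int32Array` view.

```javascript
const { Worker } = require('worker_threads');

// Worker body kept inline so the example is self-contained (an assumption;
// a real project would more likely point at a separate file).
const workerSource = `
  const { parentPort, workerData } = require('worker_threads');
  const counter = new Int32Array(workerData.shared);
  for (let i = 0; i < workerData.increments; i += 1) {
    Atomics.add(counter, 0, 1); // atomic read-modify-write on shared memory
  }
  parentPort.postMessage('done');
`;

// Starts `workerCount` workers that each add `increments` to one shared slot.
function countInWorkers(workerCount, increments) {
  const shared = new SharedArrayBuffer(Int32Array.BYTES_PER_ELEMENT);
  const counter = new Int32Array(shared);

  const runs = Array.from({ length: workerCount }, () => new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, {
      eval: true,
      workerData: { shared, increments }, // the buffer is shared, not copied
    });
    worker.once('message', resolve);
    worker.once('error', reject);
  }));

  return Promise.all(runs).then(() => Atomics.load(counter, 0));
}

const sharedCount = countInWorkers(2, 1000);
sharedCount.then((total) => console.log('shared counter:', total)); // 2000
```

Replacing `Atomics.add` with a plain `counter[0] += 1` would make the final count unreliable, which is the kind of read/write conflict the message-passing default avoids.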

ilaksh · on March 25, 2019

Supposedly SharedArrayBuffer and atomics have been in Node.js since version 9.

I wonder if there are any articles where people use them in threads.

primq · on March 24, 2019

memory sharing is explained in the article.

snek · on March 24, 2019

There has been some light discussion in the standards bodies about this, and so far the write-up looks like this: https://github.com/domenic/proposal-blocks

Scarbutt · on March 24, 2019

With the javascript community fixing most of its shortcomings, I suspect in a few years JS/TS is going to be way far ahead of any programming language in terms of users.

GordonS · on March 24, 2019

Interesting that you say JS/TS - while TypeScript of course transpiles to JavaScript, the 2 languages take very different approaches to something that really divides devs - the question of static vs dynamic typing.

mynegation · on March 25, 2019

Exactly. Together, this pair of languages will appeal to a broader range of developers - those preferring dynamic typing, those preferring static typing, and pragmatists who start the quick prototype in JS and move to TS for better maintainability.

Scarbutt · on March 25, 2019

Added TS there because they share the runtime and the ecosystem in an almost seamless way, TS users use JS libs and vice versa. TS is being forced on JS users in indirect ways, so JS users will need to be able to at least read TS at some point. Surprisingly, npm inc states that 46% of npm users are already using TS.

sbhn · on March 25, 2019

A good video on parallel processing of CPU-bound tasks with nodejs. https://youtu.be/ZYfSe9qKaZE

dboreham · on March 24, 2019

Haven't read the article but I'm assuming it's an empty page?

<ducks>

hombre_fatal · on March 24, 2019

benatkin · on March 24, 2019

It is empty of the kind of threads where you don’t have to worry about promises (with or without async/await) or callbacks.