
Node.js: Cluster vs. Async - skazka16
http://synsem.com/SyncNotAsync/
======
onestone
This article is wrong on so many levels...

\- It implies that async execution is equivalent to callback hell. In reality
there are excellent ways to have async code which looks just like sync
(generators, async/await).

\- It benchmarks multi-core (sync) vs single-core (async) and makes claims
based on the results.

\- It presents async execution as an antipode of clustering. In reality it's a
best practice to make use of both.
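On the first point, a tiny sketch of async code that reads just like sync (readA/readB are hypothetical stand-ins for promise-returning I/O calls):

```javascript
// Hypothetical promise-returning reads standing in for real I/O.
const readA = () => Promise.resolve('contents of A');
const readB = () => Promise.resolve('contents of B');

async function main() {
  const a = await readA(); // suspends here without blocking the event loop
  const b = await readB();
  return a + ' / ' + b;
}

main().then(console.log); // prints "contents of A / contents of B"
```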

...and everything that follows is just irrelevant.

~~~
rajeevk
There are so many statements in the article that make me laugh

> async cons: Adds latency with parallel overhead

What parallel overhead? This should be a con of multi-threading on a single-core
CPU

> async pros: Callbacks help enforce error-prone synchronization

How do callbacks help with error-prone synchronization? It is not the
callbacks that do that; it is the single-threaded model.

------
eldude
I don't understand these strawmen. The virtue of the asynchronous programming
model is low memory overhead compared to threads AND low latency for IO-bound
tasks in highly concurrent scenarios.

Request-per-process/thread has all the same memory overhead implications it
has always had. It's almost as if the author is ignorant of the reason for
node.js' success, or of why it was built in the first place.

Also, "callback hell" is just FUD. Nobody who does this for a living and knows
what they're doing really has an issue with this. Promises solve the
unreliability issues, and async/await solves the syntax complexity issues.

I'd like to see this same analysis for 1000 req concurrency measuring memory
overhead and using async/await for code comparison. Cooperative multitasking
will always be capable of lower latency when you know what you're doing, and
async programming is lightyears simpler than multi-threaded programming.

~~~
tracker1
I'd say at least 10K simultaneous requests on a single instance with ease, let
alone several. Just a simple echo web-server...
[http://localhost/foo](http://localhost/foo) => "hello foo" ...

Launching that many threads will quickly hit bottlenecks on most systems...
I've seen this happen in a poorly written simulation server (each actor had
its own worker thread)... the server would freeze up randomly with only a
handful of connections in a test scenario... changing to an event loop using
an async thread pool resolved most of these issues (this was before node).

------
kellros
Edit: The whole idea behind clustering is to run an application instance per
thread/core for better performance and load balance requests between the
application instances. This article seems absurd in its intention to force a
choice between multi-threaded synchronous application instances and a single
application instance using callbacks.

We've been running a Koa.js API server using Cluster in production for over a
year now with no hiccups (on a Windows machine).

I've been thinking about making the switch to iisnode, as it handles
clustering, graceful shutdown and zero-downtime from within IIS (and does a
couple of other things). It uses named pipes to proxy connections and also
supports web sockets among other things.

With the nodeProcessCommandLine configuration setting, you can pass parameters
to node (e.g. --harmony), use babel-node or io.js.

See:
[http://www.hanselman.com/blog/InstallingAndRunningNodejsAppl...](http://www.hanselman.com/blog/InstallingAndRunningNodejsApplicationsWithinIISOnWindowsAreYouMad.aspx)

A blog post I wrote a while ago: [https://shelakel.co.za/hosting-almost-any-
application-on-iis...](https://shelakel.co.za/hosting-almost-any-application-
on-iis-via-iis-node/)

------
crueber
Hogwash. It seems like this person doesn't understand that node is an event-
based, asynchronous platform, and that's one of its big advantages over
languages that force threads, or that don't generally offer parallel execution.

If this had compared Node.js with clustering and async against a synchronous
language like Ruby, it might have been interesting. Maybe. But non-
asynchronous operations in Node are an antipattern that core node contributors
are trying to remove (there is an ongoing effort to deprecate the -Sync
functions in the node stdlib).

Good coding conventions, promises, generators, and async/await are your
friends for making callback hell go away.

~~~
thomasfoster96
Indeed, I've noticed that quite a few -Sync methods aren't documented anymore.
My hope is that node will have a major release which removes all -Sync methods
and replaces callbacks with Promises. Then we'll all be happy _and_ use
cluster when we can.
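A hand-rolled sketch of the direction that implies: wrapping any node-style (err, result) callback API so it returns a Promise instead.

```javascript
// Wrap an (args..., callback) function in a promise-returning one.
function promisify(fn) {
  return (...args) =>
    new Promise((resolve, reject) => {
      fn(...args, (err, result) => (err ? reject(err) : resolve(result)));
    });
}

// e.g. a promise-returning readFile:
// const readFile = promisify(require('fs').readFile);
```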

~~~
greggman
No, please! More sync functions, please!

Yes, if I'm writing a web server I want everything to be async. But, just as I
want to share JS on the server and the client I also want to use JS as my
build language. At least for me, I find it much faster to build using a sync
style.

Maybe I just haven't learned the async way but for example I tried to make a
build system using node. I needed to spawn out to a builder to build some
stuff, copy files, spawn git in various ways to see if repos are dirty or
clean, git add, git commit, and a few other things. I found it a massive
nightmare and after 2 days I switched to synchronous python for my building.
Was done in 2 hours.

If there's some articles or tips that will make me as comfortable at async for
building as I'm as sync in python then please point me at them. But, for
whatever reason, I'm struggling with async for really complex tasks. (and yes,
I'm using promises)

~~~
crueber
If you truly need to do sync programming for whatever reason, you probably
aren't using the right language. It's not just web programming that benefits
from asynchrony, it's pretty much any non-trivial algorithm. Sure, it can
require a bit more code to do right, but you don't want to block all your
other background events from firing just because you have to write a bit more
code.

If you really prefer synchronous style, and don't mind the drawbacks, there
are plenty of languages out there that'll work better for you. PHP, Ruby, and
Python readily come to mind.

~~~
greggman
Yes, but a big draw of using node at all is that it's JavaScript. I write some
function "templateThisString" and I can use it on both the server and the
client. Now I also want to use it in my build process. If I switch to another
language I have double the work.

------
bungle
OpenResty does something similar. Code can be written synchronously, but all
the network io, for example, happens in a non-blocking manner. The code still
looks synchronous, though, without callback hell. This doesn't come without
issues, as you need to change libraries to use OpenResty (Nginx) network
primitives. Overall it is one of the nicest platforms I have worked
with. A great webserver (Nginx) that can be programmed with a great language
(Lua + LuaJIT).

At Nginx conf, the Nginx developers were showing interest in bringing
Javascript to the platform. They said they would take a similar approach to
the one OpenResty uses (i.e. no callback hell).

~~~
vbernat
OpenResty is an evented model too. Coroutines are used to avoid using
callbacks.

------
pmalynin
I recently started programming more using promises and I can say that I am
very satisfied with the way it does away with the callback hell problem.

Instead of taking a callback, a function returns a promise, which can be
daisy-chained to do work.

Ex:

file.read().then(console.log);

Or using the example in the article:

var a = fileA.read();

var b = fileB.read();

Promise.all([a, b]).then((data) => console.log(data[0], data[1]));

~~~
iamstef
Just wait until you try generators + async/await; it can be a lovely
experience. No callbacks, and a working try/catch.

~~~
LePetitDev
+1 for generators. As a former PHP developer, I jumped into using generators
that return promises in io.js (as well as es6 classes), and the code reads
very much like PHP code, except all of the i/o is now asynchronous.

~~~
esailija
There is no benefit to io being async in itself until you have many users. The
immediate and more accessible benefit is speeding up individual requests,
thanks to the ease with which you can perform io in parallel. But if you just
sprinkle await/yield everywhere (which everyone unfortunately does), you don't
even get this benefit.

~~~
brentburgoyne
Sure you can.

    
    
      let [a, b] = await Promise.all([asyncA(), asyncB()])

~~~
esailija
I am referring to the usage of generators without promises (or rather, code
that uses generators in a way where it wouldn't matter if promises or thunks
were used). And even then, I didn't say that you couldn't; even when using
promises and generators together, most people make their code unnecessarily
sequential.
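To make that concrete, a sketch of the difference (delay stands in for any promise-returning io call):

```javascript
const delay = (ms, v) => new Promise((resolve) => setTimeout(() => resolve(v), ms));

async function sequential() {
  const a = await delay(50, 'a'); // second call doesn't start until this resolves
  const b = await delay(50, 'b');
  return a + b; // ~100ms total
}

async function parallel() {
  const pa = delay(50, 'a'); // both calls in flight at once
  const pb = delay(50, 'b');
  return (await pa) + (await pb); // ~50ms total
}
```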

------
DonPellegrino
Why is this even being upvoted? It confuses important concepts, and what isn't
wrong is irrelevant.

------
bhouston
We ([http://Clara.io](http://Clara.io)) run multiple NodeJS instances per
machine and our code base is Async. I believe this gives us the best of both
worlds.

Also Sync versions of calls in NodeJS are likely going to be deprecated, thus
this won't even be possible in NodeJS going forward.

~~~
egeozcan
Sync calls won't be deprecated, IMHO, as not everything utilizing Node.js is
an HTTP server (think of build scripts, background jobs, etc.).

~~~
pokpokpok
I often use it for ad hoc scripts, similar to the way many people use python

------
zjonsson
If you look more closely at the actual code, this exercise compares the
performance of readFileSync (an operation that deliberately blocks) on 1 core
vs 2 cores.

------
andrewmutz
It makes sense to have both event-based and non-event-based options for server
side javascript development.

OS-level multitasking won't be able to achieve the same level of concurrency,
but the simplicity and maintainability of the application will go up. The
right choice depends on the needs of the application, of course.

Both evented and non-evented approaches have their place, and most server-side
languages allow development with either approach: Ruby, Python, C, Java all
have solid options for evented and non-evented solutions.

~~~
bysin
> OS-level multitasking won't be able to achieve the same level of concurrency

Do you have a source for this claim? I've seen it repeated many times,
especially in the node.js community but I've yet to see any evidence to back
it up. From what I've read, a synchronous threaded model can be just as fast
as an event-based system [1].

[1]
[http://www.mailinator.com/tymaPaulMultithreaded.pdf](http://www.mailinator.com/tymaPaulMultithreaded.pdf)

~~~
ggreer
A big problem with one-thread-per-connection is that you open yourself to
slowloris-type DoS attacks.[1] Normal load (and even extreme load) is fine,
but a few malicious clients can use up all of your threads and take down your
server.

This is touched upon in the slides you linked to. On slide 62 (SMTP server) a
point says, "Server spends a lot of time waiting for the next command (like
many milliseconds)." A malicious client could send bytes very slowly, using up
a thread for a much longer period of time. If the client has an async
architecture, it can open multiple slow connections with little overhead. The
asymmetry in resource usage can be quite staggering.

1\.
[http://en.wikipedia.org/wiki/Slowloris_(software)](http://en.wikipedia.org/wiki/Slowloris_\(software\))

~~~
kentonv
You seem to be imagining a case where you only allocate a small fixed thread-
pool and when it runs out you just stop and wait. I think the slide deck is
advocating that you just keep allocating more threads.

~~~
ggreer
I'm talking about hitting OS or resource limits. Let's say a server is
configured to time-out requests after 2 minutes. A malicious client could do
something like...

Every second:

1\. Open 40 connections to the server.

2\. For all open connections, send one byte.

Repeat indefinitely.

Steady state would be reached at 4,800 open connections. At 1 byte of actual
data per second per connection, data plus TCP overhead would use around
200KB/s of bandwidth. The server would have to run 4,800 threads to handle
this load. Depending on memory usage per thread, this could exhaust the
server's RAM.

There are ways to mitigate this simple example attack, but the only way to
defend against more sophisticated variants is to break the one-thread-per-
connection relationship.

~~~
buster
What I am truly missing is a good benchmark comparing async vs. sync. Everybody
seems to say that async is best, but I don't see much evidence. For example,
how would 4,800 threads exhaust the server's RAM when the thread stack size
can be as small as 48kB? That's around 230MB of memory.

I'm not saying that the threaded approach is better, just that almost everyone
comes around with some theoretical statement while nobody seems to care to
find hard evidence.

~~~
kentonv
You are right to distrust these claims. The reality is that threads can be
significantly faster than async -- async code has to do a lot of bookkeeping
and that bookkeeping has overhead. OTOH, threads have their own kind of
overhead that can also be bad.

The slide deck that bysin linked above is pretty good:

[http://www.mailinator.com/tymaPaulMultithreaded.pdf](http://www.mailinator.com/tymaPaulMultithreaded.pdf)

This is by Paul Tyma, who at the time worked on Google's Java infrastructure
team with Josh Bloch and other people who know what they're doing. Apparently
he found threads to be faster in a number of benchmarks.

Ultimately which is actually faster will always depend on your use case.
Unfortunately this means that general benchmarks aren't all that useful; you
need to benchmark _your_ system. And you aren't going to write your whole
system both ways in order to find out which is faster. So probably you should
just choose the style you're more comfortable with.

Async is kind of like libertarianism: It works pretty well in some cases,
pretty poorly in others, but it has a contingent of fans who think they've
discovered some magic solution to all problems and if you disagree then you
must just not understand and you need to be educated.

(Note: The code I've been writing lately is heavily async, FWIW.)

------
martin-adams
I've adapted the node-fibers library to write synchronous style code. It works
really well for my needs, but I do understand that my approach does litter the
function prototype which is not ideal.

Code looks like this:

    
    
      var sync = require('./sync');
      
      sync(function () {
        try {
          var result = someAsyncFunc1.callSync(this, 'param1', 'param2');
          if (result.something === true) {
            var result2 = someAsyncFunc2.callSync(this, 'param1');
          } else {
            var result2 = someAsyncFunc3.callSync(this, 'param1');
          }
          
          console.log(result2.message);      
        } catch (ex) {
          // One of them returned an err param in their callback
        }
      });
    

I haven't tested the performance, so I have no idea whether it's running like a dog.

~~~
btown
Meteor does something similar with Fibers, but you wrap each function ahead of
time, so you don't need to worry about littering Function.prototype.

[https://www.discovermeteor.com/blog/wrapping-npm-
packages/](https://www.discovermeteor.com/blog/wrapping-npm-packages/)

It feels a bit like "get that sync code off my Javascript lawn!" but once you
get used to it, it's pretty great in practice, and good for introducing
newcomers.

------
babby
>Cons: Larger memory footprint

The more annoying con is the lack of shared memory. A single process can be
much less complex when it doesn't have to worry about messaging systems and
off-process caching.

------
bcoates
This is unsuitable for real-world applications, where you will, inevitably,
need at least a little mutable shared state. Async handles this reasonably
well at a decent performance cost; shared-memory threads (near-certain
catastrophic failure) and database-only state (awful performance) do not.

The only real competition is transactional memory but it hasn't become
mainstream yet.

~~~
adamlett
I'm not sure what you're saying here exactly. _What_ is unsuitable for real-
world applications?

I will say this, though, regarding the need for shared, mutable state: you can
communicate by sharing memory, but you can also share memory by communicating.
This is, I guess, what you allude to with _database-only state_, but it doesn't
need to be a database. It could just as easily be a memcached server or some
other fast key-value store.

------
tlrobinson
_" Asynchronous event-driven programming is at the heart of Node.js, however
it is also the root cause of Callback Hell."_

I'd argue the root cause is... callbacks.

Asynchronous programming can be done elegantly, in a synchronous style, using
"async/await": originally (?) in C# [1], likely to be added to the next
version of JavaScript [2], and also in Dart [3], Hack [4], and Python 3.5 [5].
It can also be emulated in languages with coroutines/generators [6][7][8]
(which in turn can be implemented by a fairly simple transpiler [9][10]).

This:

    
    
        function foo(a, callback) {
          bar(a, function(err, b) {
            if (err) {
              callback(err)
            } else {
               baz(b, function(err, c) {
                 if (err) {
                   callback(err)
                 } else {
                   // some more stuff
                   callback(null, d)
                 }
               })
            }
          })
        }
    

Becomes this:

    
    
        async function foo(a) {
          var b = await bar(a)
          var c = await baz(b)
          // some more stuff
          return d;
        }
    

And you'll see even greater improvements when using other constructs like
try/catch, conditionals, loops, etc.
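A sketch of the try/catch point: with await, an async failure propagates like an ordinary exception (mightFail is a stand-in for any promise-returning call):

```javascript
const mightFail = (ok) =>
  ok ? Promise.resolve('value') : Promise.reject(new Error('io failed'));

async function run(ok) {
  try {
    return await mightFail(ok); // a rejection throws here
  } catch (err) {
    return 'recovered: ' + err.message; // no error-first callbacks needed
  }
}
```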

[1] [https://msdn.microsoft.com/en-
us/library/hh191443.aspx](https://msdn.microsoft.com/en-
us/library/hh191443.aspx)

[2] [http://jakearchibald.com/2014/es7-async-
functions/](http://jakearchibald.com/2014/es7-async-functions/)

[3] [https://www.dartlang.org/articles/await-
async/](https://www.dartlang.org/articles/await-async/)

[4]
[http://docs.hhvm.com/manual/en/hack.async.asyncawait.php](http://docs.hhvm.com/manual/en/hack.async.asyncawait.php)

[5] [https://lwn.net/Articles/643786/](https://lwn.net/Articles/643786/)

[6]
[https://github.com/petkaantonov/bluebird/blob/master/API.md#...](https://github.com/petkaantonov/bluebird/blob/master/API.md#promisecoroutinegeneratorfunction-
generatorfunction---function)

[7] [https://github.com/kriskowal/q/tree/v1/examples/async-
genera...](https://github.com/kriskowal/q/tree/v1/examples/async-generators)

[8] [http://taskjs.org/](http://taskjs.org/)

[9] [https://babeljs.io/docs/learn-
es6/#generators](https://babeljs.io/docs/learn-es6/#generators)

[10]
[https://facebook.github.io/regenerator/](https://facebook.github.io/regenerator/)

~~~
tracker1
Not only is the code you write with async/await simpler (I'm using babeljs for
this today)... there are a lot of complex processes that are easier to reason
about in JS than in low-level C.

Not to mention that the article's example is only comparing a small piece of
work... A single node instance can easily manage 10K simultaneous requests.
Launch a thread per request and you're going to hit resource bottlenecks at
the CPU pretty quickly, compared to several node instances approaching a
million simultaneous connections on a single server. Node uses not only an
event loop, but also a shared thread pool, isolating work without blowing out
resource contention.

I've seen issues with even a few thousand threads in poorly written simulation
servers... going to an event loop backed by a thread pool always worked out
better under true load.

------
meritt
This is great. I wonder how long until the node community "discovers" that
using a dedicated httpd and communicating over a standardized middleware
interface (fcgi, wsgi, rack, etc.) is also superior to handling http directly.

~~~
tlrobinson
FastCGI, really?

And Node has the equivalent of WSGI and Rack: Connect/Express middleware.

------
KaiserPro
Now forgive my naivety, but isn't this just threads?

I understand that, because of the funny scoping rules, threading is actually
surprisingly hard? Surely you'd want more control over your threaded event
loops?

------
igl
Do your apps run single-threaded? Why wouldn't you cluster? In-memory state is
easily avoidable. Using in-memory sessions even prints a warning in express by
default.

------
bricss
Just add a pinch of Promises

