
It ain't about the callbacks, it's about the flow control - majke
https://idea.popcount.org/2013-09-05-it-aint-about-the-callbacks/
======
crazygringo
> _It's impossible to slow down the pace of accepting new incoming client
> connections._

> _A noisy client will be able to crash our server, as c.write is non-blocking_

Correct me if I'm wrong (I haven't done Node.js server programming), but what
do these have to do with the callback model? First, what does it mean to "slow
down the pace" of accepting connections? I'm assuming you could code a limit
to the number of connections, and then immediately drop new connections as
they're made? Or put them into a queue?

And what does c.write being non-blocking have to do with crashing the server?
Again, just program it so that we stop writing data if the buffer's gotten too
big.

Sure, it's more code to write, and the flow control is harder to write, but
even if you're doing everything synchronously, you've still got to manage all
that stuff. If every connection winds up on its own thread, you've got to
manage those threads. You've still got to worry about too much data coming in.

I completely fail to see, given the specific kind of flow control and resource
control the author wants, how the asynchronous model is harder than the
synchronous model. Maybe someone can enlighten me?

~~~
majke
> _First, what does it mean to "slow down the pace" of accepting connections?_

In C, if you don't call 'accept', new connections will:

\- first: get queued in a kernel queue of SYN+ACKed connections

\- second: when that queue is full, Linux stops sending SYN+ACK, thus forcing
the client to re-send SYN.

Not calling 'accept' pushes pressure back to the clients, without sending RST,
without introducing any state on the server.

> _and then immediately drop new connections as they're made_

That's not the same as pushing back the pressure. Dropping connections means:
"go away, I'm broken". Not calling 'accept' means: "I'm busy right now, wait".

> _And what does c.write being non-blocking have to do with crashing the
> server?_

If 'write' is non-blocking _and_ always succeeds (as in Node), the node.js
process may be asked to 'write' plenty of data. Whatever can't be handed to
the kernel gets buffered inside the node process, which eventually runs out of
memory. That's what happens when you don't have any flow-control mechanism:
you accept all the requests and buffer all the responses in memory, possibly
leading to a crash.
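
A minimal sketch of that failure mode (assuming the plain `net` echo server
from the article; the port number is made up):

    var net = require('net');

    net.createServer(function (c) {
      c.on('data', function (d) {
        // write() never refuses data: if the client reads slower than it
        // sends, the unsent bytes pile up in this process's memory with
        // no upper bound.
        c.write(d);
      });
    }).listen(8000);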

> _but even if you're doing everything synchronously, you've still got to
> manage all that stuff._

Yes indeed! But at least it's obvious that you're spawning too many threads,
or that all the threads are waiting on a particular slow thing (disk, for
example). The state still needs to be handled, but it's much easier to reason
about! (It's easier to count threads / coroutines than to count how many
times callback X has been called but hasn't yet called the next callback on
the path.)

> _Maybe someone can enlighten me?_

I wanted to avoid the discussion about credit-based flow control. It's
possible to write flow-control aware code using callbacks, but it's generally
harder, and it requires the programmer to be aware of the problem.

Also, IMO flow control in callback style introduces spaghetti, as after the
last thing ('write') succeeds you need to inform the first thing ('read') to
proceed. But that's aesthetics and another discussion.

~~~
crazygringo
Thanks for replying. I still don't ultimately see that much of a difference --
if the Node server wants to accept the connections in a queue instead of
disconnecting them (but not turn on echoing yet), that's still pretty easy to
do with a couple of lines of code.

And looking up socket.write, Node actually tells you via the return value
whether the data was written immediately or queued in user memory, plus
there's a callback for when it finally gets written out. So you certainly do
have flow-control mechanisms.
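
Roughly something like this, I'd assume (a sketch using the write() return
value plus the 'drain' event; the port number is made up):

    var net = require('net');

    net.createServer(function (c) {
      c.on('data', function (d) {
        // write() returns false once the chunk had to be queued in user
        // memory; stop reading until the kernel buffer drains.
        if (!c.write(d)) {
          c.pause();
          c.once('drain', function () { c.resume(); });
        }
      });
    }).listen(8000);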

In the end, maybe it takes a few more lines of code with Node (since you have
to take care of the queues yourself), but you're also getting a lot of
flexibility in handling things exactly the way you want, it seems.

~~~
majke
> _if the Node server wants to accept the connections in a queue instead of
> disconnecting them, that's still pretty easy to do_

But then the queued connections are using file descriptors, and those are a
limited resource, right? In the real world, sometimes not being able to handle
all the traffic is an error condition and disconnecting the client is okay. In
other circumstances you'd like to be able to slow down the clients if the
server can't cope. I claim this particular API in node can't express the
latter.

> _So you certainly do have flow-control mechanisms._

Yes you do! And the Streams API is a great attempt to unify flow-control
mechanisms. The problem is that those mechanisms are second-class citizens,
invented as node matured. And in fact they are pretty complex; see the
documentation of Streams. I think most node.js users can't be bothered with
this - and that's the point: flow control requires explicit thought in node,
while you get it for free with "blocking" coroutines / threads.

> _maybe it takes a few more lines of code with Node_

Agreed. A decent programmer can write a good program in any paradigm. But I
claim that callback APIs require more thought to use correctly (due to flow
control). Thus, most users will write poor programs using callbacks.

~~~
aaronblohowiak
File descriptors and RAM for the buffers.

------
swannodette
If you find this post interesting I strongly recommend looking at how Go,
Clojure's core.async, and similar CSP systems attempt to address these issues.

One big advantage of a good CSP model is that you can express push/pull
sensibly because you have a (buffered) channel abstraction between
communicating processes.

~~~
doublec
"Concepts, Techniques, and Models of Computer Programming" also covers flow
control using similar methods IIRC. They use the dataflow variables of Oz
which act like channels.

------
nonchalance
> ... but it becomes an issue when a platform forces you to use only the
> callback style. I'm talking about you, JavaScript.

JS definitely doesn't force you to use the callback style. NodeJS HTTP module
(the culprit in the discussion) does, but JS as a language doesn't.

~~~
camus
Explain to me how to do any async operation without a callback in ECMAScript
5. You just can't. Callbacks are the only way to do async in JS <= ES5.

~~~
nonchalance
Read the article again. The statement, in context, has nothing to do with
async. The author is claiming that JavaScript forces users to only use the
callback style, but that's not true (especially given the last few paragraphs,
where the author explicitly points out synchronous code).

The reality is that you could build up a C-style flow (loops with
select/poll/epoll/...) in JS if a JavaScript platform exposed those primitives
(you would need something slightly lower-level than libuv).

~~~
majke
If I understand correctly you're saying that JavaScript is a neutral language.
It's only a coincidence that in Browsers and in Node.js it heavily relies on
the callback style. As a language itself, it could be synchronous.

Well, my friend, with this logic I claim that C is purely functional (it's
only a coincidence that some implementations use rw memory) and there's
nothing stopping Lisp from having mutable data.

~~~
kevingadd
Unfortunately such a claim is false. The event loop and callback oriented
asynchrony model is baked into the ES spec itself; future ES features rely on
it explicitly.

~~~
esailija
You are probably confusing the ES spec with the DOM spec. If not, then at
least I cannot find anything in the ES spec even remotely related to how
anything built on top of it would have to be asynchronous.

~~~
kevingadd
The concept of an event loop and event loop 'turns' was baked into ES
promises the last time I read the draft spec and kept up with the spec
discussions. It's not something you can polyfill in older versions of JS;
it's a platform feature designed around how browsers work.

~~~
mook
Hmm, do you recall where the ES promises spec lives? I can't find it in the
ES6 draft (
[http://people.mozilla.org/~jorendorff/es6-draft.html](http://people.mozilla.org/~jorendorff/es6-draft.html)
), but it's in the DOM spec (
[http://dom.spec.whatwg.org/#promises](http://dom.spec.whatwg.org/#promises)
). ES6 does end up having iterators, which does look useful for promises,
though... see task.js.

------
lsb
Interestingly, the ease of reasoning about flow control when pulling data from
a source, versus pushing it to a destination, is one thing that can make
reasoning about Haskell space usage particularly tricky.

~~~
banachtarski
As a person who loves Haskell, I generally agree. Because it's about as
declarative as you can get, you have no insight into the execution model. In
particular, following the lazy/strict chain is hard (it can be difficult,
especially in multithreaded scenarios, to know whether a statement will be
fully evaluated or just left as a thunk). The hope is, of course, that the
compiler will get so good that it can leverage the purity of the code to do
what you want as fast as possible (it's a throughput language after all).

Note, however, that Haskell can easily solve the problem mentioned in this
article, and you have several ways of doing it.

------
olegp
If you want to get some of the benefits mentioned in the article with Node,
check out my Common Node project, which uses fibers to emulate threads and a
synchronous environment:
[https://github.com/olegp/common-node](https://github.com/olegp/common-node)

Here's a slightly more advanced version of a Telnet server written for Common
Node: [https://github.com/olegp/common-
node/blob/master/examples/ch...](https://github.com/olegp/common-
node/blob/master/examples/chat.js) \- it should be pretty straightforward to
add the logic necessary to throttle the rate at which new connections are
accepted.

------
undoware
As a node junkie and LiveScript apologist, I'm surprised I hadn't heard these
complaints against callbacks before -- fascinating stuff. I might be doing
more things in Go from now on. ;)

------
Sidnicious
I'm working on a library for C++ called Team where this kind of flow control
is possible. At its core it's got coroutines, so you can write code in a
blocking style.

The current (early!) socket APIs can all be called with a callback (which act
like Node and call it as fast as possible), or without one (in which case they
block). Here's an echo server that accepts connections as fast as possible,
but does blocking reads and writes so only one chunk per client gets buffered
at a time:

[https://github.com/Sidnicious/team/blob/b157ea2c58177a2587c9...](https://github.com/Sidnicious/team/blob/b157ea2c58177a2587c9735f855cfbd15080d2f2/examples/echo_server.cpp)

The callbacks-as-fast-as-possible version is just implemented on top of the
blocking one, so it'd be trivial to add flow control via a blocking callback
(or by calling `accept()` in blocking style and making your own decision about
when to call it again):

[https://github.com/Sidnicious/team/blob/b157ea2c58177a2587c9...](https://github.com/Sidnicious/team/blob/b157ea2c58177a2587c9735f855cfbd15080d2f2/team/net.h#L119-125)

------
mtdewcmu
> A noisy client will be able to crash our server, as c.write is non-blocking
> and will buffer an infinite amount of data in memory (this can be partially
> solved with a node.js Stream API, more on that later).

This doesn't make any sense at all.

First, c.write would not buffer what the other server sends. It only takes
what you give it. Maybe that was a typo.

Second, non-blocking and buffering are practically in perfect opposition to
each other. For a system call to buffer an indefinite amount, it would have to
block. That's exactly what node was designed to never do. node never buffers
anything -- not when I tried it, at least. If the client sent a long stream of
data, node would call the read callback every time the kernel received a
packet or a kernel buffer filled up. c.write() would return immediately every
time, and a return packet would get sent out. Nothing bad would happen.

------
Yaggo
This is not exactly my area of speciality, but could you use iptables to limit
incoming connections to your Node.js port?

[http://www.cyberciti.biz/tips/howto-limit-linux-syn-
attacks....](http://www.cyberciti.biz/tips/howto-limit-linux-syn-attacks.html)

------
bcoates
So the issue here is that Node needs backpressure on events as well as pipes,
right?

------
benatkin
One attempt to solve the problem, just by being able to accept more
connections: [https://github.com/lloyd/node-
toobusy](https://github.com/lloyd/node-toobusy)

------
dccoolgai
Interesting article... my 2c is that you might want to try a library like
async.js, which handles asynchronous code with explicit, flow-control-based
structure. Async has throttling (
[https://github.com/caolan/async#queue](https://github.com/caolan/async#queue)
) and all kinds of things built into it that can address some of those issues.
For cases where you are worried about buffers, you could use something like
RabbitMQ or AWS SQS to mitigate situations where you have a mismatch in
push/pull.
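
For example, a rough sketch of a bounded-concurrency queue with async (the
worker and the concurrency value are made up):

    var async = require('async');

    // Run at most 4 tasks at a time; anything else waits in the queue.
    var q = async.queue(function (task, done) {
      handleRequest(task, done);   // handleRequest is a placeholder worker
    }, 4);

    q.drain = function () { console.log('all queued work finished'); };

    q.push({ url: '/some/path' });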

~~~
ef4
async.js still suffers from the lack of real flow control that the article is
talking about. In fact, the queue you linked to is a perfect example of a
naive implementation with no backpressure at all.

The question is: how do you guarantee that your queue never grows beyond a
certain size? The library doesn't even attempt to help with that. And anything
that you implement yourself would still suffer the race condition mentioned in
the article.

~~~
andrewvc
Agreed. I want to add that this is why systems engineering is its own thing.
Application developers seldom understand what's going on at the copying bytes
around/syscall level.

~~~
ef4
People sometimes mock the idea of "rockstar" developers, but this is one of
the things that separates the merely good from the great.

A developer who can jump up or down four levels in the technology stack with
ease is an entirely different species, when compared with someone who lives at
one level.

All abstractions leak.

------
susi22
Interesting article. Though I think in most serious production environments
there is an NGINX or Varnish in front of node.js which handles all these cases
(and more).

------
IsaacSchlueter
Marek,

Streams return `false` when a write will buffer past a configurable
highWaterMark. The first hand-rolled `on('data', write)` pipe doesn't take
this into consideration, and so yes, backpressure is not handled. `r.pipe(w)`
does the right thing here.
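
Roughly, the contrast looks like this (the file paths are made up; r and w
stand for any Readable/Writable pair):

    var fs = require('fs');
    var r = fs.createReadStream('/tmp/in');    // any Readable
    var w = fs.createWriteStream('/tmp/out');  // any Writable

    // Hand-rolled pipe, no backpressure: write()'s return value is ignored,
    // so a fast reader piles data up in memory ahead of a slow writer.
    // r.on('data', function (chunk) { w.write(chunk); });

    // Backpressure handled: pipe() pauses r whenever w has buffered past
    // its (configurable) highWaterMark, and resumes it on 'drain'.
    r.pipe(w);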

The single "extra read" that you're seeing is just filling up to that
configureable highWaterMark, which is an intentional feature. In the real
world, connections are often of extremely variable speeds. Consider sending
data from a database on localhost to a client on a 3G or 4G network. The
mobile connection is extremely bursty, but with a high max speed and periods
of very high latency. The database connection is extremely steady, but with a
slower max throughput because of hard disk latency. In that case, you
absolutely don't want to miss a chance to send data to the mobile client
during a burst, so the ideal default approach is for Node to smooth out those
highs and lows by buffering a small amount of data in memory. We don't
consider 64KB to be a large amount for most purposes, but as I mentioned, it
is configurable.

There is no way to pause the accept call, it's true. We've considered adding
that feature, but no one has ever asked for it. Perhaps if you explain your
use case in a github issue, we could do that. You can `server.close()` but
that also unbinds, so clients get an ECONNREFUSED. Except in the cluster use-
case, bind() and accept() are typically very tied to one another. It wouldn't
be too hard to expose them separately, but like I said, no one's ever asked.
If your complaint is that we haven't implemented a feature that no one's ever
asked for, well, ok, that's just not how we do things, so maybe it's just a
cultural difference in our approaches to creating software, I don't know.

    
    
        First, I believe most node.js programmers (including myself)
        don't understand Streams and just don't implement the Stream
        interfaces correctly.
    

Ok, well, there's not really any excuse for that any more. They're thoroughly
documented, base classes are provided, there are blogs and examples all over
the place. Maybe start with
[http://api.nodejs.org/stream.html](http://api.nodejs.org/stream.html) and if
you have questions that aren't answered, complain about it at
[https://github.com/joyent/node/issues](https://github.com/joyent/node/issues)
and mention `@isaacs` in the issue.

It's literally a single method that you have to override to implement a well-
behaved Readable, Writable, or Transform stream.
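
For example, a Transform that upcases its input needs nothing but a
`_transform` method (a streams2-style sketch; the class name is made up):

    var stream = require('stream');
    var util = require('util');

    function Upcase(options) {
      stream.Transform.call(this, options);
    }
    util.inherits(Upcase, stream.Transform);

    // The single method a well-behaved Transform has to supply.
    Upcase.prototype._transform = function (chunk, encoding, done) {
      this.push(chunk.toString().toUpperCase());
      done();
    };

    process.stdin.pipe(new Upcase()).pipe(process.stdout);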

    
    
        But even if Streams were properly implemented everywhere
        the API suffers a race condition: it's possible to get
        plenty of data before the writer reacts and stops the reader.
    

This is not true. The Writable stream object has a highWaterMark. Once that
much data is buffered in memory, it starts returning `false` to its consumers.
If you'd like to set that to 0, go right ahead. It will return `true` only if
the data is immediately consumed by the underlying system. This doesn't happen
"some time in the future". It happens at the first `write()` call that pushes
the buffer over the high water mark. The example you describe is quite easy to
simulate with setTimeout and the like. Perhaps you could post a bug if it
behaves in a way that is problematic?

I have a hard time sussing out what you're actually complaining about in this
article. You certainly seem upset about some things node does, but I can't
figure out exactly what's bugging you. Is it the inability to delay accept()
calls? Is it callbacks? Is it streams? Is it non-blocking IO as such?

Streams aren't really a "callback based" API as much as an event-based one,
and actually, a more strictly callback-based stream API would be quite a bit
_easier_ to get right, in my opinion, with much less ceremony:
[http://lists.w3.org/Archives/Public/public-
webapps/2013JulSe...](http://lists.w3.org/Archives/Public/public-
webapps/2013JulSep/0355.html)

A similar approach could be taken to the listen/accept stuff you write about.
`server.accept(function(conn) {})` and then call accept() again when you're
ready for another one. A convenience method can then be trivially layered on
top of that to call accept() repeatedly:

    
    
        Server.prototype.listen = function(cb) {
          // Keep a reference to the server: inside the callback `this`
          // would not point at it.
          var self = this;
          self.accept(function onconn(conn) {
            cb(conn);
            self.accept(onconn);
          });
        };
    
    

I could be wrong, but I suspect, at the root, the cause of your distaste with
Node is actually EventEmitters, rather than any of that other stuff you
mention. And if so, I agree 100%. The "evented" part of Node is a mistake
which can only be properly appreciated with the benefit of hindsight. It's too
late to easily change now, of course, and so that's the design constraint we
were faced with in building streams2 and streams3. But I think that platforms
of the future should avoid this landmine.

Fair warning: I'm going to be offline first for NodeConf and then for
vacation, for the next several weeks, so this is a bit of a hit-and-run
comment. Feel free to reply to i@izs.me or post issues on the Node.js github
page. I probably won't see replies here.

