

An Introduction to Node's New Streams - calvinfo
http://calv.info/an-introduction-to-nodes-new-streams/

======
rgarcia
The new node streams API is nice, but for processing a lot of data like this
we've found it unsuitable for a few reasons:

1) objectMode is considered an abomination by most of node-core [1], so if
you're putting anything other than a string or a buffer through a stream,
you're "doing it wrong" (I disagree, but I'm also not a node maintainer).

2) If you want to process data in parallel you're out of luck, since Writable
handles writes one at a time. We've created some workarounds, but I'm not
super happy with them [2] (the general idea is sketched after this list).

3) Once you jerry-rig Writable/Transform to run in parallel, you're stuck with
one core. We've also created a workaround for this [3].
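
To make (2) concrete, here is a minimal sketch of the general idea behind that
kind of workaround (ParallelWriter and the work(item, done) signature are
hypothetical, for illustration only, not the actual code in [2]): ack each
write immediately and track in-flight work yourself, up to a limit.

    var Writable = require('stream').Writable;
    var util = require('util');

    // Hypothetical sketch: ack each _write immediately so the next write
    // arrives while work is still in flight, up to a concurrency limit.
    function ParallelWriter(work, concurrency) {
      Writable.call(this, { objectMode: true });
      this._work = work;       // work(item, done): your async function
      this._limit = concurrency;
      this._inFlight = 0;
      this._heldCb = null;     // write ack held back when at the limit
    }
    util.inherits(ParallelWriter, Writable);

    ParallelWriter.prototype._write = function (item, enc, cb) {
      var self = this;
      this._inFlight++;
      this._work(item, function () {
        self._inFlight--;
        if (self._heldCb) {    // a write was waiting for a free slot
          var held = self._heldCb;
          self._heldCb = null;
          held();
        }
      });
      if (this._inFlight < this._limit) {
        cb();                  // ack now: more writes flow in, in parallel
      } else {
        this._heldCb = cb;     // at the limit: hold the ack until one finishes
      }
    };

    // Caveat: 'finish' fires after the last ack, not after the last job,
    // so you still have to drain in-flight work before tearing down.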

We've started moving towards Go's channels because of all these issues.

[1] https://github.com/joyent/node/pull/4835

[2] https://github.com/Clever/writable-stream-parallel
    https://github.com/clever/understream

[3] https://github.com/Clever/async-forkqueue

~~~
casual_slacker
Regarding points 2 and 3: besides node's naturally single-threaded nature, the
problem is that parallel producers require a locking mechanism, and parallel
consumers don't really make sense with these types of streams.

How do you avoid these issues with Go?

~~~
rgarcia
In many stream pipelines you don't care about the order of the data. You just
have a bunch of unordered input data that you need to transform as fast as
possible.

Go's channels [1] let you construct streams that are as loosely or as strictly
typed as you like. The channel's element type lets you be as opinionated as
you want about what should/shouldn't be passed through the "stream"
(interface{}, []byte, string, MyAwesomeType, etc.). Buffered channels give you
"backpressure", since sends block once the buffer is full. And you can use
more than one core by spreading work across goroutines.

[1] http://golangtutorials.blogspot.com/2011/06/channels-in-go.html

------
bcoates
I love the new streams API. A 0.8-based Node project of mine needed a messy,
callback-filled, ad-hoc flow-control system, because advisory I/O pauses
weren't enough to give bounded memory usage (in practice, not just in theory),
and the general lameness of the old streams meant I minimized their use.

When I rewrote the I/O pipeline into a handful of Streams2 transforms, I was
able to ditch a few hundred lines of ugly code, and what's left is vastly
less complicated.
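
For anyone who hasn't seen one, a Streams2 transform really is tiny. A rough
sketch (a hypothetical line splitter, not code from my project):

    var Transform = require('stream').Transform;
    var util = require('util');

    // Hypothetical example: re-chunk an incoming byte stream into lines.
    function LineSplitter() {
      Transform.call(this);
      this._buf = '';
    }
    util.inherits(LineSplitter, Transform);

    LineSplitter.prototype._transform = function (chunk, enc, cb) {
      this._buf += chunk.toString('utf8');
      var lines = this._buf.split('\n');
      this._buf = lines.pop();           // keep the trailing partial line
      for (var i = 0; i < lines.length; i++) this.push(lines[i] + '\n');
      cb(); // the next chunk won't arrive until we call this: backpressure
    };

    LineSplitter.prototype._flush = function (cb) {
      if (this._buf) this.push(this._buf); // emit any leftover text
      cb();
    };

    // usage: fs.createReadStream('in.log').pipe(new LineSplitter()).pipe(dest);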

I agree with the author that it's kind of lame that objectMode streams appear
to be second-class. I guess the Node people think everything should be a raw
data pipeline annotated with events for a data channel, like 'npm tar' does? I
can't quite wrap my brain around the object-hate.

------
mtdewcmu
I came upon node around two years ago, wrote a little project to play with it,
and encountered its streams. I wasn't prepared for how low-level their
behavior was. It was almost like being inside the kernel, except with a veneer
of JavaScript. If you had a fairly large amount of data to write out, and you
wrote it too rapidly with the expectation that the stream would buffer for
you, it was easy to trigger pathological behavior. It appeared that each
write() from the application triggered an immediate system call to write that
data to the kernel. The call was nonblocking, so once the kernel buffer filled
up, the kernel would start rejecting any more data. node would then apparently
set a timer and keep retrying the syscall until all the data was gone. If you
kept sending data too fast, huge numbers of pending I/Os would rapidly build
up, and the system would eventually be getting hammered by syscalls as fast as
node could send them, effectively crashing the app.
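
For what it's worth, the escape hatch that existed even then was write()'s
return value: it returns false when the stream's buffer is backed up, and a
'drain' event fires when it's safe to write again. A rough sketch of the
pattern (writeAll and getChunk are made-up names for illustration):

    // Hypothetical helper: write chunks with bounded memory by honoring
    // write()'s return value and waiting for 'drain' before continuing.
    function writeAll(stream, getChunk) {
      (function next() {
        var chunk;
        while ((chunk = getChunk()) !== null) {
          if (!stream.write(chunk)) {      // false: the buffer is full
            stream.once('drain', next);    // resume once it empties
            return;
          }
        }
        stream.end();
      })();
    }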

From a technical design standpoint, this behavior seemed to me like something
you'd always have to be aware of, but, in practice, it didn't seem to be an
issue with network I/O. The reason, I'm guessing, had to do with the nature of
network I/O and the kind of traffic servers, especially node servers, face.
Interactive low-latency applications would never send enough data downstream
to trigger it. In other network applications, the random ebb and flow of
network traffic would tend to mitigate it.

I haven't really kept up with node, so I don't know how things might have
changed in the past two years. It looks like with this new API they're trying
to make streams easier overall, and that's good.

