
How I want to write Node: Stream all the things - nua
http://caolanmcmahon.com/posts/how_i_want_to_write_node_stream_all_the_things_new/
======
ixmatus
Forgive me for being "That Guy" but I really think Javascript is ill-suited
for this paradigm!

Streams, honestly, are hard to keep straight when the program gets big without
a stronger type system. IMHO.

Some really sharp people have been working on stream computing software in
Haskell for a while - Gabriel's Pipes package is a good example of generalized
stream computing with _strong_ equational reasoning as its foundation.

Maybe if you _really_ want to try and do this in Node you can gain some
inspiration from his journey:
[http://hackage.haskell.org/package/pipes](http://hackage.haskell.org/package/pipes)

~~~
aegiso
Here's the main enlightenment of becoming a node.js guy:

Node follows the unix way. Everything is a stream. It's just Buffers and JS
objects flying around. It's really stupid, and sometimes it's nasty. This
isn't helped by Javascript's warts.

But there's an enormous upside to this: following the stupid Unix way means
that no matter what you need to do with your data, there's an npm module for
it. Just .pipe() your stream in and your code is done. This is amazing. And
it's possible only because of how bare-bones and loose the Buffer stream API
is.

Strong typing has its place, but it would ruin node's biggest selling point.
It's hard to realize this without trying it.
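
A minimal sketch of the `.pipe()` style described above, using only Node's core `stream` module. The `upperCase` and `run` helpers are made up for illustration; in practice an npm module would usually sit where `upperCase()` does.

```javascript
const { Readable, Transform, Writable, pipeline } = require('stream');

// Transform stream: upper-cases each chunk of text passing through it.
function upperCase() {
  return new Transform({
    transform(chunk, encoding, callback) {
      callback(null, chunk.toString().toUpperCase());
    }
  });
}

// Hypothetical helper: pipes `source` through `transforms` and hands the
// concatenated output to `done(err, text)`.
function run(source, transforms, done) {
  const parts = [];
  const sink = new Writable({
    write(chunk, encoding, callback) {
      parts.push(chunk.toString());
      callback();
    }
  });
  pipeline(source, ...transforms, sink, (err) => done(err, parts.join('')));
}

run(Readable.from(['stream ', 'all ', 'the ', 'things']), [upperCase()], (err, text) => {
  if (err) throw err;
  console.log(text); // prints "STREAM ALL THE THINGS"
});
```

Nothing in `upperCase()` knows where its input comes from or where its output goes, which is the looseness the comment is praising.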

~~~
hamburglar
Yes, and "the unix way" isn't just "everything's a stream", but rather
"everything's a _text_ stream." In that regard, node's version of
"everything's a stream" is actually a half-step up in abstraction.

~~~
ixmatus
Isn't it more accurately a _byte_ stream (I don't know, which is why I'm
asking)?

~~~
jkrems
The default built-in streams found in the stdlib are byte streams (byte chunk
streams, to be more exact). But user-level there are object streams as well,
those are also mentioned in the docs:
[http://nodejs.org/api/stream.html#stream_object_mode](http://nodejs.org/api/stream.html#stream_object_mode)
\- so they are kind-of official.
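
For illustration, a small object-mode stream built with Node's core `stream` module. The `withTotal` helper and the shape of the order objects are assumptions made up for this sketch.

```javascript
const { Readable, Transform } = require('stream');

// Object-mode transform: each chunk is a plain JS object rather than a Buffer.
// `withTotal` and the order fields are hypothetical, for illustration only.
function withTotal() {
  return new Transform({
    objectMode: true,
    transform(order, encoding, callback) {
      callback(null, Object.assign({}, order, { total: order.price * order.qty }));
    }
  });
}

Readable.from([{ price: 2, qty: 3 }, { price: 5, qty: 1 }])
  .pipe(withTotal())
  .on('data', (order) => console.log(order.total)); // logs 6, then 5
```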

------
glenjamin
If you're doing Node.js, Caolan's async library is pretty much part of the
standard toolkit.

I know Caolan's been thinking about this and reworking it for a while, so I'll
be interested to see whether it manages to see significant takeup.

~~~
nailer
From Great British Node Conf, maybe two months ago: a speaker talking about
promises asked what people use for flow control:

\- promises: about 20% of the room.

\- async: about 80% of the room

For me async.waterfall([list of functions]) is a little nicer than 10 chained
.thens().
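
For comparison, a rough sketch of both styles side by side. The `waterfall` function here is a tiny stand-in written for this example so that it runs without dependencies; it is not the async library's implementation, although `async.waterfall(tasks, callback)` is called in the same way.

```javascript
// Minimal stand-in for async.waterfall (illustration only):
// runs tasks in order, feeding each task's results to the next one.
function waterfall(tasks, done) {
  function step(index, args) {
    if (index === tasks.length) return done(null, ...args);
    tasks[index](...args, (err, ...results) => {
      if (err) return done(err);      // first error short-circuits the chain
      step(index + 1, results);
    });
  }
  step(0, []);
}

// Callback style: a flat list of functions.
waterfall([
  (next) => next(null, 2),
  (n, next) => next(null, n * 10),
  (n, next) => next(null, n + 1)
], (err, result) => console.log(err || result)); // logs 21

// Promise style: a flat chain of .then() calls.
Promise.resolve(2)
  .then((n) => n * 10)
  .then((n) => n + 1)
  .then((result) => console.log(result)); // logs 21
```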

And people advocating promises _still_ keep saying it keeps things flat. No it
doesn't, we're already flat because we're all using async. Stop pretending
async doesn't exist and isn't massively popular.

And way, way better documented. Q.spawn what? And this is the _best_ promises
library?

Stack Overflow question: Simplest fs.readFile example with generators and Q?

Current only answer:

    
    
      Q.spawn(function* () {
          …
          var data = yield Q.ninvoke(fs, "readFile", somefile);
          …
      });
    

Answer from Highland docs:

    
    
        var data = _.wrapCallback(fs.readFile)('myfile');
    

\- What's Q? (Yes, it's a module, but what does it mean? Is it supposed to be a
misspelt queue or something else?)

\- What does 'ninvoking' something do?

\- Shouldn't I just be able to put the variable declaration outside of the
scope?

\- Why do competing Open Source implementations of the same standard exist?
Can't there just be one reference implementation?

That's not the future.

I might be really ignorant here. I probably am - I could read a shit tonne of
docs to work out what this strange beast does and technically someone can
probably do a better job answering that Stack Overflow question. But nobody
has, because very few people know how to operate the current state of the art
generators/promises setup.

From the Q docs: "If you have a number of promise-producing functions that
need to be run sequentially"

No, I don't have a number of promise producing functions. Nobody in nodeland
has that. I _just have functions_. I could read about turning them into
promise producing functions, and calculate whether this abstraction layer is
adding value, but then again, I could do productive work with async.

And from the looks of it, Highland too.
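
Roughly speaking, `Q.ninvoke` and Highland's `_.wrapCallback` both adapt a plain node-style callback function to another abstraction (a promise and a stream respectively). The sketch below shows only the promise half, using a hand-written adapter and Node core's `util.promisify` rather than Q itself; `double` and `toPromiseFn` are made-up names.

```javascript
const { promisify } = require('util');

// A node-style function: the last argument is callback(err, value).
// `double` is a made-up stand-in for something like fs.readFile.
function double(n, callback) {
  if (typeof n !== 'number') return callback(new TypeError('expected a number'));
  callback(null, n * 2);
}

// Hand-written adapter (hypothetical name) showing the mechanics.
function toPromiseFn(fn) {
  return (...args) => new Promise((resolve, reject) => {
    fn(...args, (err, value) => (err ? reject(err) : resolve(value)));
  });
}

const doubleByHand = toPromiseFn(double);
const doubleByCore = promisify(double); // Node core's built-in adapter

doubleByHand(21).then((value) => console.log(value));       // logs 42
doubleByCore('x').catch((err) => console.log(err.message)); // logs "expected a number"
```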

~~~
gruseom
I have a question about the async library: what does it do if an asynchronous
function or a callback raises an exception? The word "catch" doesn't appear in
[https://github.com/caolan/async/blob/master/lib/async.js](https://github.com/caolan/async/blob/master/lib/async.js).

I ask because I wrote some Lisp macros (I work in a Lisp that compiles to JS)
to implement a few async patterns I need, and making sure that exceptions are
trapped and threaded into the callback chain correctly was the most
complicated part.

~~~
thedufer
It doesn't do anything about thrown exceptions. The correct way to deal with
an error in asynchronous code is to pass an object describing the error as the
first argument to the callback. Any code that takes a callback is expected to
know this and not throw exceptions.

From a practical perspective, it doesn't make sense to try to catch exceptions
in asynchronous code, anyway. Once you do something asynchronous, you lose the
stack and thus the try block. The way to catch thrown exceptions in
asynchronous code is with domains, which something as low level as async would
not be expected to handle.
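
A small sketch of both points, assuming a made-up `loadConfig` function: the failure is reported as the first callback argument, and the surrounding try block has already finished by the time the callback runs.

```javascript
const events = [];

// Hypothetical async operation that follows the error-first convention:
// it never throws to its caller, it reports failure through callback(err).
function loadConfig(text, callback) {
  setImmediate(() => {
    let config;
    try {
      config = JSON.parse(text);
    } catch (err) {
      return callback(err); // failure travels as the first argument
    }
    callback(null, config);
  });
}

try {
  loadConfig('{ not json', (err, config) => {
    events.push(err ? 'callback got error' : 'callback got config');
    console.log(events.join(' -> ')); // try block finished -> callback got error
  });
  events.push('try block finished');
} catch (err) {
  // Never reached: by the time the callback runs, this try block is gone.
  events.push('caught by try');
}
```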

~~~
gruseom
_The correct way to deal with an error in asynchronous code is to pass an
object describing the error as the first argument to the callback._

Sure, but what if the error is thrown at you as an exception in the first
place—which happens a fair amount, because that's how the JS runtime tells you
when something is wrong? How do you get from there to the callback way?

What the Lisp macro I mentioned does is generate a separate try-catch around
each block of code that runs at a different time and thus might throw an
exception that would not otherwise get caught. In that way it catches every
exception that's thrown, converts it to an error object, and passes the error
back through the callback chain. The async library could do the same, albeit
with a lot more code. I'm curious why it doesn't.
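
Written by hand in plain JavaScript, the idea might look something like the sketch below. The `guarded`, `fakeGet` and `fetchJson` names and the URL are made up for illustration, and every callback has to be wrapped explicitly, which is the boilerplate a macro would generate.

```javascript
// Hypothetical helper: wraps a callback body so that anything it throws is
// converted into an error argument for `next` instead of escaping.
function guarded(next, body) {
  return function (err, ...args) {
    if (err) return next(err);   // pass earlier errors straight through
    let result;
    try {
      result = body(...args);
    } catch (thrown) {
      return next(thrown);       // exception becomes a callback error
    }
    next(null, result);
  };
}

// Stand-in for an async API: calls back on a later tick.
function fakeGet(url, callback) {
  setImmediate(() => callback(null, '<html>not json</html>'));
}

function fetchJson(url, next) {
  fakeGet(url, guarded(next, (data) => JSON.parse(data)));
}

fetchJson('http://example.invalid/', (err, value) => {
  console.log(err ? 'error via callback: ' + err.name : value);
});
```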

_From a practical perspective, it doesn't make sense to try to catch
exceptions in asynchronous code, anyway._

I don't think that's right. Asynchronous code is just synchronous code that
runs at different times. Each block of synchronous code can generate
exceptions. I agree that if you don't catch them then, they become useless;
but you _can_ catch them then. The reason this is not a "practical
perspective" in JS is not that it doesn't make sense, it's that the language
doesn't support it. Even the minimum code necessary to catch every exception
involves so many try-catch blocks as to obscure the rest of the program. So no
one writes such code by hand in JS.

Yet it is, I think, code that one wants, because without it you don't have a
consistent error model. You end up having one model for first-class errors—the
ones you detect and pass to callbacks before an exception has a chance to
arise—and a second one for the dregs—the ones that come from any code that
didn't know about or follow the callback convention (which, critically,
includes the language runtime). The latter kind of error either crashes the
server or gets caught by a top-level handler so it "only" crashes the request
it was processing. That's a half-baked system.

~~~
thedufer
> Sure, but what if the error is thrown at you as an exception in the first
> place—which happens a fair amount, because that's how the JS runtime tells
> you when something is wrong? How do you get from there to the callback way?

It's on you to catch that, not your libraries. This shouldn't be terribly
common, though. The only thing I can remember having to wrap in a try/catch in
the codebase I work on is JSON.parse.

> The async library could do the same, albeit with a lot more code. I'm
> curious why it doesn't.

It couldn't, without domains. try-catch wouldn't do it. Domains are something
that is not very well understood, in my experience, and expected to happen at
a higher level than libraries like async.

~~~
gruseom
I don't understand most of this. For example, I don't know why you say that
the async library couldn't try-catch every place that an exception might occur
(mostly its calls to the functions that get passed in to it). It would be
interesting if it couldn't, since then we'd have an example of something
macros can do that functions cannot. But it seems obvious to me that it could;
you'd just need a lot of try-catches. What am I missing?

As for domains, I don't know what you mean by them, but if they're catching
errors at a higher level than the async library, my guess is that they must be
some more sophisticated sort of top-level handler; perhaps something that
keeps track of which async calls are in progress and attempts to bind
exceptions back to their context? Whatever it is, it sounds complicated.

But what I understand least of all is how you guys all seem to write
Javascript code that generates almost no exceptions. To me that sounds almost
like bug-free code. No null references, for example? I get stuff like that all
the time.

~~~
thedufer
> No null references, for example?

We write in CoffeeScript, where a null reference check is so astonishingly
easy to write that you use them everywhere you might get a null. I'm not sure
what other exceptions you're seeing. We do basically no math, so /0 errors
aren't a problem.

> For example, I don't know why you say that the async library couldn't try-
> catch every place that an exception might occur

Let's build a typical function you might pass to async:

    function(next) { request.get(url, function(err, data) { JSON.parse(data); }); }

Let's assume the server doesn't serve JSON like we expect - so JSON.parse
throws an exception. The only thing async could have wrapped in a try/catch is
the main function, but we've fired off a request and then the call stack
wrapped up, including the try/catch. Next, an event occurs that calls our
callbacks, not going through async at all. That's where the exception occurs.
The stack trace generated by that exception doesn't contain any code in the
async lib, so it can't possibly have a try/catch active.

Domains are a way of fixing this. You create a Domain and bind callbacks to it;
if that callback throws an exception, the Domain instead emits an error
event.
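
A minimal sketch of that, using Node's core `domain` module (which the Node docs have since marked as deprecated). The `fakeGet` function is a made-up stand-in for `request.get`, and the URL is a placeholder.

```javascript
const domain = require('domain'); // core module; deprecated in current Node docs

const caught = [];
const d = domain.create();
d.on('error', (err) => {
  caught.push(err.name);
  console.log('domain caught:', err.name);
});

// Stand-in for request.get: calls back later, outside any caller's try/catch.
function fakeGet(url, callback) {
  setImmediate(() => callback(null, '<html>not json</html>'));
}

// The callback is bound to the domain, so its throw becomes an 'error' event.
fakeGet('http://example.invalid/', d.bind((err, data) => {
  JSON.parse(data); // throws SyntaxError because the body is not JSON
}));
```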

~~~
gruseom
Ok, thanks, I get it now. In my case a macro transforms the body of each
callback to catch exceptions and pass them back as error args through the
callback chain. So in your example, there would be a generated try-catch
around the JSON.parse(data). I forgot this detail (sign of a successful
abstraction?) and it does seem an example of something macros can do that
functions cannot.

Re null reference checks, to get behavior analogous to a null exception you
have not only to check for null, but also pass back an explicit error if you
find it. That's a lot more work than adding in an extra question mark. Null
checks that do nothing but not crash are a mixed blessing; 90+% of the time
they do what you want, but when they don't, you get a silent failure and a
debugging goose chase. I'd be surprised if you told me that that never
happens.

I took a look at Node.js domains and they do seem really complicated. If I
were working in Javascript instead of having control over the language, I
doubt I would use them; I would probably just crash-and-restart as one of the
other commenters described. That's not a good solution, but probably the best
tradeoff given the alternatives.

~~~
thedufer
Our use-case for domains is to allow the process to finish serving its other
in-progress reqs before crashing. When an error occurs, we stop accepting new
connections in that process, give them 10-15 seconds to complete, and then do
the crash-and-restart cycle.
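
A rough sketch of that drain-then-crash step, under the assumption that the server exposes a `close(callback)` method as `http.Server` does. `drainAndExit` is a hypothetical helper, and the fake server below exists only so the snippet runs on its own.

```javascript
// Hypothetical helper: stop taking new connections, let in-flight requests
// finish, and exit once they are done or the grace period runs out.
// `server` is anything with a close(callback) method, e.g. an http.Server.
function drainAndExit(server, graceMs, exit) {
  let exited = false;
  const finish = (code) => {
    if (exited) return;
    exited = true;
    clearTimeout(timer);
    exit(code);
  };
  const timer = setTimeout(() => finish(1), graceMs); // give up waiting
  server.close(() => finish(1));                       // all connections ended
  return timer;
}

// Demo with a fake server and a fake exit function.
const demoServer = { close(done) { setImmediate(done); } };
drainAndExit(demoServer, 15000, (code) => console.log('would exit with code', code));

// In a real process this would be called from whatever sees the fatal error
// (a domain 'error' handler in the setup described above), passing the real
// server and process.exit.
```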

That said, we get very thorough testing from our large user base, and we
quickly fix crashers. Our server proc crash rate is almost 0, brought up by
occasional spikes on releases.

------
cjf4
Really can't wait for "all the things" to stop being used as an acceptable
replacement for "everything."

~~~
badman_ting
+1

------
syntern
Most of the things he describes can be done with Dart's async and collections
libraries. Compiles to js, works on server side, used in production. Even if
someone wants to reinvent the wheel, they should take a look at those
libraries and see how it is done there.

------
ilaksh
I use async.map when I need to do a bunch of file operations or something
asynchronously and wait for them to finish. I have largely avoided using
stream specific syntax because the event model was more familiar to me. I have
used ToffeeScript instead of async.waterfall or async.series because
ToffeeScript is cleaner.

With these improvements to streams in Highland making things more convenient
and broadly applicable I expect to be using Highland streams for certain
things.

------
badman_ting
I get that arrays are meant to stand in for more asynchronous sources of data
but those things seem so different to me that I don't understand why you'd
want a library that treats them the same. If you need to map an array that's
taken care of. I know I'm being dense, I just don't get it.

------
eplawless
I'm curious which features highland provides which RxJS doesn't. From what I
understand, composable streams from any data source with backpressure support
is pretty much the definition of Rx.

Sometimes simplicity is a feature, too, though.

~~~
caolanmcmahon
Rx doesn't handle back-pressure or laziness, so it's really only for
handling events.

~~~
caolanmcmahon
RxJS advocates are unhappy with this comment so I'm going to qualify it a
little. Apologies for any misunderstanding...

Rx doesn't handle _automatic_ back-pressure (like Node Streams) but does have
mechanisms to avoid overwhelming slow consumers. Rx also has delayed
subscription which you can call lazy, but not by turning the stream into a
pull-stream (allowing you to sequence actions in the way Highland does).
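
To make the push/pull distinction concrete, here is a generic pull-based (lazy) sequence built from plain JavaScript generators. It is only an illustration of the idea; it does not show how Highland or Rx are implemented, and `naturals`, `map` and `take` are names invented for the sketch.

```javascript
// Pull-based (lazy) sequence: nothing is computed until the consumer asks.
function* naturals() {
  let n = 1;
  while (true) yield n++;
}

function* map(source, fn) {
  for (const value of source) yield fn(value);
}

function* take(source, count) {
  if (count <= 0) return;
  let taken = 0;
  for (const value of source) {
    yield value;
    if (++taken >= count) return;
  }
}

let computed = 0;
const squares = map(naturals(), (n) => { computed++; return n * n; });

console.log([...take(squares, 3)]); // [1, 4, 9]
console.log(computed);              // 3: the consumer's demand set the pace
```

Because the consumer pulls each value, an infinite source is harmless and a slow consumer is never overwhelmed.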

If any of the above needs further qualification or comment please weigh in on
the issue by commenting here... but for now I'll leave it at that. I actually
list RxJS in the blogpost because it's a _good_ example!

~~~
mattpodwysocki
Coming in 2.3, we will have full capabilities for backpressure. We already
have window/buffer/throttle, etc. But, I think it's naive to have only one
style of backpressure because many are valid. Just an example of RxJS, and
what can be done, which includes a style in which you can do several forms of
backpressure, and yes, push to pull based models:
[https://gist.github.com/mattpodwysocki/9010149](https://gist.github.com/mattpodwysocki/9010149)

Still fleshing it out, but pretty close to calling it complete:
[https://github.com/Reactive-Extensions/RxJS/tree/master/src/...](https://github.com/Reactive-Extensions/RxJS/tree/master/src/core/backpressure)

We're more than open to pull requests though if anyone thinks we're missing
something here.

------
nailer
@caolanmcmahon: consider using methods?

\- We have Object.defineProperty() in ES5 to avoid enumeration.

\- You can use user-specified prefixing to avoid future conflicts.

Eg:

    
    
        ({foo: 1, bar: 2}).hlPairs();
    

rather than:

    
    
        _.pairs({foo: 1, bar: 2});
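
A sketch of what that suggestion could look like. `hlPairs` is the hypothetical prefixed name from the comment, not an actual Highland API, and patching `Object.prototype` carries the usual risks even when the property is non-enumerable.

```javascript
// Adds a prefixed, non-enumerable method to Object.prototype.
// `hlPairs` is a hypothetical name, not part of any real library.
Object.defineProperty(Object.prototype, 'hlPairs', {
  value: function () {
    return Object.keys(this).map((key) => [key, this[key]]);
  },
  enumerable: false, // keeps it out of for...in loops
  writable: true,
  configurable: true
});

const obj = { foo: 1, bar: 2 };
console.log(obj.hlPairs());              // [['foo', 1], ['bar', 2]]
for (const key in obj) console.log(key); // foo, bar (hlPairs is not listed)
```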

------
hamburglar
This is highly intriguing, but I must ask why the JS community has this
fascination with obscure identifiers like "_". It decreases readability when
what should be a logically-chosen descriptive identifier for your class is
replaced with a single character that visually recedes into the language
syntax. Edit: I've grudgingly given jquery a pass on this because of its
ubiquity, but come on, a stream library? Not to mention the fact that a
reasonably popular lib already seems to have squatted on underscore.

~~~
Cthulhu_
I agree; what I personally wonder is why Underscore and similar libraries
don't make use of Javascript's prototype business and add methods to the array
and object prototypes? Probably things I'm overlooking here, but, [1, 2,
3].map(stuff) is much nicer than _.map([1, 2, 3], stuff) and the like.

~~~
jashkenas
Ha ha.

You're describing the state of affairs before Underscore existed, back when
functional-ish programming in JavaScript was ruled by Prototype.js:

[http://prototypejs.org/](http://prototypejs.org/)

... which added a lot of useful methods to native prototypes.

While handy in controlled and limited environments, mucking about with native
prototypes quickly becomes extremely dangerous and difficult — once you have
two third-party modules on the page that expect different versions of your
patched prototype method ... once you have a new version of a browser that
implements one of your previously-extended functions, but does it differently
— you're pretty well screwed. Both of those things tended to happen in large
sites.

~~~
hamburglar
Whereas with _.map, you always know exactly which implementation you're
getting, right? :)

~~~
jashkenas
Actually, you do. You've loaded it, and you can lock it down privately to your
library or app with _.noConflict().

You can have ten different versions of Underscore loaded on the page, living
in peace and harmony, in ten different third-party modules. Not that you
should. But that you could.
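
A sketch of how the noConflict pattern works in general. Here `root` stands in for the browser's global object and `makeLib` for a library's setup code; both are illustrative and this is not Underscore's actual source.

```javascript
// Sketch of the noConflict pattern (illustration only).
function makeLib(root, version) {
  const previous = root._;   // whatever owned `_` before this library loaded
  const lib = {
    version,
    noConflict() {
      root._ = previous;     // hand the global name back
      return lib;            // caller keeps a private reference
    }
  };
  root._ = lib;
  return lib;
}

const root = {};
makeLib(root, '1.0');
makeLib(root, '2.0');

const mine = root._.noConflict();          // grab 2.0 privately
console.log(mine.version, root._.version); // prints "2.0 1.0"
```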

~~~
hamburglar
Well then I guess this _ has a bug because it doesn't have a 'noConflict'
method. :)

But to be perfectly fair, you are 100% correct: this is not a technical
complaint and perhaps this entire sub-thread is, as has been claimed,
"bike-shedding." Since this lib is designed to be loaded via an AMD-style mechanism,
users can call it whatever they want, so _ is just as valid an identifier as
any other. Except the obvious issue that readers of the code, examples, and
any code that follows suit, will end up with this completely pointless
ambiguity because _ is ultimately a meaningless name if it evolves to mean
simply "some library I loaded." You may as well write sample code like:

var $ = require('http'); $.createServer(...);

I would think people would call that out as ridiculous and confusing.

------
gfosco
This sounds intriguing. I wish there were some more complex examples. Part of
the greatness of Promises is taking a big chunk of pyramid code and turning it
into a set of simple steps... I'd like to see how this would handle that.

~~~
caolanmcmahon
Good idea, I'll definitely post a follow-up with some real code done using
async/callbacks and highland/streams. The comparisons usually start to look
_more favourable_ with longer examples, due to the Highland API being so
composable.

~~~
qubyte
Nice. I need to compare code written with highland to code with
async.series/.each etc. (but not waterfall, I don't like that dude ;)).

------
lukasm
I really like Node, but I want a language with sane semantics. What's the best
option apart from JS and CoffeeScript? TypeScript is awesome, but I'm afraid
that there will be no good community, because of the MS stigma.

~~~
codygman
You can use Haskell for the same semantics if I understand you correctly.
Actually coming from Node, Go might be a better fit.

~~~
lukasm
In other words, I would like to use a language that runs on Node but isn't JS
or CS.

------
mcgwiz
Along these lines, John Resig recently created
[http://nodestreams.com/](http://nodestreams.com/) as a way of composing and
conceptualizing stream processing. Pretty nifty.

------
albertoleal
This is amazing! And it's something that I wanted for some time.

LazyJS just received stream support, but I'm pretty much sold on
highlandjs.

------
kimjotki2
It looks like this whole notion of 'stream' is just syntactic sugar for
javascript guys who lost themselves in a bunch of nested stupid callbacks,
which is worse than lisp parens. Instead of nesting whole callbacks, create a
stream in the middle; execute first half of callback chain and dump the result
into the stream so that next half of callback chain can be executed later.

Why is this such an enlightenment for node guys? UNIX has done it right since the
epoch: simple programs perform simple tasks and are connected via pipes. Python has
gevent, so you don't even need 'a stream' or other bullshit, you just write
the code as-is and greenlets provide the concurrency needed.

The real enlightenment comes from 'programming properly'; you start with C and
torture your brain with function pointers and realize why it is a good idea to
treat functions as first-class objects. Then you learn some 'proper'
functional programming language like lisp or something to learn how to think
in a functional way, which is the only guaranteed and proven path to prevent
yourself from shooting your own foot by writing 20+ nested callbacks. If you
start with binding an anonymous function to a <button>'s click event and think
you can do this to do real programming, you'll never get it right.

~~~
woah
Wise words of wisdom from kimjotki2, the only real programmer on the internet.
Bow down, bitches.

~~~
coldtea
This is not reddit.

