

Asynchronous Iteration Patterns (in Node.js) - pshapiro
http://metaduck.com/post/2675027550/asynchronous-iteration-patterns-in-node-js

======
jerf
Yes, this is necessary. Now build up a few more of these patterns, watch
them intertwine with each other in ways that require you to interleave them
manually, and come to the realization that building your glorious
asynchronous stack on top of a fundamentally synchronous language was a bad
idea.

In languages that are not fundamentally synchronous, such as Erlang, you do
not leap through hoops to manage this. You simply write a function that
performs your insertions in the straightforward and obvious way, and the
runtime manages it with no blocking at any point.

I actually don't like the frequently-made assertion that "a pattern is
automatically a weakness in the language", but it does apply here. As the
Node.js community laboriously builds up the patterns necessary to work in this
paradigm, recapitulating the work done in numerous other async-on-top-of-sync-
language libraries, it probably is worth keeping the assertion in mind. You
shouldn't even have to _think_ about this, let alone argue which way is the
best way to do it.

(Somebody modded this down. Maybe I should make it clear that I'm actually
speaking from experience on another asynchronous project written in Perl on
top of glib's asynchronousness, which isn't fundamentally different from
Node.js'. I'm not speculating, I've been on the receiving end of this
complexity explosion. It _will_ happen.)

~~~
moe
_I've been on the receiving end of this complexity explosion_

Count me in. Been there with Twisted and (partly) EventMachine.

Imho node should really abstract this away before the eco-system turns into a
giant spaghetti ball. Of course you can't bend javascript into a concurrency
model as elegant as erlang's. But co-routines or a similar abstraction is
urgently needed at least for the code that the users end up writing.

On a related note; I recently switched to coffeescript in an effort to bring
my (still small) node codebase back into a half-way readable state.

The simple act of removing most curly brackets had an astonishing impact on
readability. I still have to reason about callback chains, but at least I no
longer have to wade through half screens full of closing brackets while doing
so.

That observation was a bitter-sweet reminder of what I'm (willfully) getting
myself into with node - all the while keeping my fingers crossed that this
wart on an otherwise beautiful environment will be improved asap.

~~~
strmpnk
A long time ago I convinced people to add, or at least started the ball
rolling on adding, coroutines to node.js. When they finally were added, it was
done in what I would call a very wrong fashion: coroutines could have events
pushed to them without being yielded to (there needs to be a guard). Of course
the reentry issue is just as bad as the one being solved, so I think it was
written off as a horrible idea and never returned to.

Of course, it didn't have to be that way if they had proper event triggering
semantics. I find very little hope of the community returning to this issue
though. Not that it's the only way to solve things... but it would have been a
nice balance IMO. (and I still hold that coroutines can be done cheaply... at
least as cheaply as event sources if not more)

(Update: to clarify on proper coroutines on node, each needs to be able to
iterate on an independent tick loop rather than allow preemption from
EventEmitter#emit)

------
moe
This is a great demonstration of the fundamental issue with node. His code
goes from:

    
    
       function insertCollection(collection) {
         for(var i = 0; i < collection.length; i++) {
           db.insert(collection[i]);
         }
       }
    

To this:

    
    
      function insertCollection(collection, callback) {
        var coll = collection.slice(0); // clone collection
        (function insertOne() {
          var record = coll.splice(0, 1)[0];
          try {
            db.insert(record, function(err) {
              if (err) { callback(err); return }
              if (coll.length == 0) {
                callback();
              } else {
                insertOne();
              }
            });
          } catch (exception) {
            callback(exception);
          }
        })();
      }
    

Hopefully this article will work as an eye-opener for anyone who was not yet
convinced that node badly needs a concurrency abstraction.

~~~
substack
First, why wrap your second example in try {} catch {} but not the first? The
second example (final example from the article) seems like it was
intentionally made more complicated to prove its point, but it could just be
unfamiliarity. Node idioms are just _different_ from other environments and it
takes time to get a good feel for them.

(Edit: noticed that the article does it this way.)

How about just:

    
    
        function insertCollection(collection, cb) {
            if (collection.length === 0) cb(null);
            else db.insert(collection[0], function (err) {
                if (err) cb(err)
                else insertCollection(collection.slice(1), cb)
            });
        }
    

That's hardly much more complex, and as a bonus won't lock up your whole
program while db.insert() is working. You could use something like my own
library Seq() too:

    
    
        var Seq = require('seq');
        function insertCollection (collection, cb) {
            Seq.ap(collection).seqEach(function (c) {
                db.insert(c, this);
            }).seq(cb).catch(cb)
        }

That is, if you actually want the inserts to go sequentially. Usually you want
them to go in parallel with say a maximum of 10 pending requests:

    
    
        var Seq = require('seq');
        function insertCollection (collection, cb) {
            Seq.ap(collection).parEach(10, function (c) {
                db.insert(c, this);
            }).seq(cb).catch(cb)
        }
    

I don't see any "fundamental issue" with node here. It's just a very new
ecosystem and the idioms and libraries are rapidly evolving.
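
For comparison, here is a minimal library-free sketch of the "at most 10 pending requests" idea. The helper name `eachLimit` and its signature are hypothetical, not part of Seq or node, and the sketch assumes the worker calls back asynchronously (a worker that calls back synchronously would make it recurse).

```javascript
// Hypothetical helper: run worker(item, done) over items with at most
// `limit` calls in flight. Calls cb(err) exactly once: on the first error,
// or with null after every item has finished.
function eachLimit(items, limit, worker, cb) {
  var next = 0;         // index of the next item to start
  var running = 0;      // number of workers currently in flight
  var finished = false; // set once cb has been called

  if (items.length === 0) return cb(null);

  function done(err) {
    if (finished) return; // ignore results that arrive after an error
    running--;
    if (err) {
      finished = true;
      return cb(err);
    }
    if (next === items.length && running === 0) {
      finished = true;
      return cb(null);
    }
    launch(); // a slot just freed up, so start the next item if any
  }

  function launch() {
    while (!finished && running < limit && next < items.length) {
      running++;
      worker(items[next++], done);
    }
  }

  launch();
}
```

With this helper, the parallel insert would read roughly as `eachLimit(collection, 10, function (record, done) { db.insert(record, done); }, cb);`.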

~~~
moe
_why wrap your second example_

 _That's hardly much more complex_

Well, sorry to be snarky, but you couldn't have supported my point better.

Both you and the article author (where the code snippets are from) missed this
critical detail during the first iteration. And not on some complex, esoteric
problem but on one of the most basic language constructs.

I blame neither of you, I only ask that we please not discount the extra
complexity.

~~~
substack
Ah, your point makes much more sense now. Yes, the author had some overly
complex examples, but I'm optimistic about using user-space libraries to
handle flow control without having to add new language primitives.

------
codahale
You'd overflow your stack after inserting 600-700 records:

<https://gist.github.com/845874>
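
A minimal sketch of how that can happen, assuming the insert function invokes its callback synchronously (an assumption for illustration; the linked gist is not reproduced here). Rather than actually overflowing, it counts how deeply the recursive version posted earlier in the thread nests. `syncInsert` and the depth counters are hypothetical names.

```javascript
// Hypothetical stand-in for db.insert that calls back synchronously,
// before returning. This is an assumption, not the gist's code.
function syncInsert(record, cb) {
  cb(null);
}

var depth = 0;    // current number of nested insertCollection calls
var maxDepth = 0; // deepest nesting observed so far

// Same recursive shape as the short version in the thread, with depth
// tracking added and the insert function passed in as a parameter.
function insertCollection(insert, collection, cb) {
  depth++;
  maxDepth = Math.max(maxDepth, depth);
  if (collection.length === 0) {
    cb(null);
  } else {
    insert(collection[0], function (err) {
      if (err) cb(err);
      else insertCollection(insert, collection.slice(1), cb);
    });
  }
  depth--;
}

var records = [];
for (var i = 0; i < 100; i++) records.push({ id: i });

insertCollection(syncInsert, records, function (err) {
  if (err) throw err;
});

// With a synchronous callback, nesting grows with the number of records.
console.log('records:', records.length, 'max nesting:', maxDepth);
```

With an insert that really does call back on a later tick, each call starts on a fresh stack and the nesting stays flat.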

------
chapel
I posted this as a comment on the blog post, thought I would share it here.

It's awesome that you wrote a post on this; it is an important building
block to proper node.js coding. I know I had trouble with it.

There is another way to handle asynchronous iteration which, while similar to
what you have done, gives a bit more flexibility and is abstracted out so you
can use it anywhere.

<https://gist.github.com/b5af7369ec9939ab7d94>

This is using code from a script of mine as an example, but it should be
pretty easy to see the pattern.

Here is your serialized example rewritten with the function I just showed
you:

<https://gist.github.com/26bda51358610667f9f3>

---

On a side note, I wanted to say that people hell-bent on turning node.js into
another language should just use that language. There is nothing wrong with
the way node.js handles asynchronous code, and building in stuff to fake
synchronous code, instead of keeping everything asynchronous, is bad for
node.js and the whole ecosystem. It promotes bad habits and bad code. If you
can't handle callbacks or events, then maybe you should go back to using
whatever you are more comfortable with, or write your own abstractions.

~~~
jimmyjazz14
Your last paragraph comes off as rather dismissive. How does abstracting away
the control flow promote bad code? Programmers should only need to worry about
the control flow of their program when they absolutely must (which is pretty
rare); otherwise the compiler (or interpreter) should be able to take care of
those details. It's not that people can't handle callbacks or events, it's that
they should not have to for most of the things node forces you to use them for
(everything, all the time).

~~~
chapel
Granted, it was a bit dismissive, but it is a common issue: people come
into the node.js community and say it needs this or that, a main one
being that you shouldn't have to deal with callbacks/events. Now I think
everyone is entitled to their own opinion on how things work, but the powers
that be and the majority of senior node.js developers agree that the way it is
handled is probably the best for now. Removing that control or mucking up the
core of node.js so people can code like they do in other languages is not a
good thing.

I know people like node.js and want to make it better in their own way, but I
feel the course it is on is the best course for right now. It might add more
cruft for developers, but there are plenty of abstractions that make it simple
if you choose to use them.

------
dlsspy
Here's how this looks in twisted (handling exceptions, capturing (but
ignoring) outputs, etc...):

    
    
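        from twisted.internet import defer

        @defer.inlineCallbacks
        def insertCollection(collection):
            # Each yield waits for the Deferred returned by db.insert to fire.
            for ob in collection:
                yield db.insert(ob)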
    

...as much as people talk about the complexity of twisted with respect to
async programming, they've really figured it out over the many years of the
life of the project.

I agree with substack that you can do this without changing the language, but
you do have to get something usable to people at some point.

I think most people who write any code in node.js run into this issue. I am a
casual user, but ran into it a couple of days ago and did some research to
find that there are many third-party modules I could bring in to do a for
loop, but nothing built-in. The old way of doing this in twisted (manually
managing deferreds) really turned a lot of people off. By the time twisted got
inlineDeferreds and inlineCallbacks, people already had their impressions of
how difficult things are.

I still think node.js is fun and viable, but it does need a for loop.
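
For reference, later versions of JavaScript added async functions, which allow much the same shape as the Twisted snippet. A minimal sketch, assuming a callback-style `db.insert(record, cb)` as used elsewhere in this thread; the wrapper `insertRecord` and the in-memory `fakeDb` are hypothetical names made up for the example.

```javascript
// Hypothetical wrapper: turn a callback-style db.insert into a promise.
function insertRecord(db, record) {
  return new Promise(function (resolve, reject) {
    db.insert(record, function (err) {
      if (err) reject(err);
      else resolve();
    });
  });
}

// Sequential inserts written as a plain for loop. Each await suspends the
// function until the insert finishes; a failed insert rejects the returned
// promise and stops the loop.
async function insertCollection(db, collection) {
  for (const record of collection) {
    await insertRecord(db, record);
  }
}

// Hypothetical in-memory stand-in for a real database client.
const fakeDb = {
  rows: [],
  insert: function (record, cb) {
    const rows = this.rows;
    setImmediate(function () {
      rows.push(record);
      cb(null);
    });
  }
};

insertCollection(fakeDb, ['a', 'b', 'c']).then(function () {
  console.log('inserted', fakeDb.rows.length, 'records');
});
```

The loop never blocks the event loop: between awaits, control returns to node and other callbacks can run.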

------
aurynn
Interesting article - I really liked the updated example at the very end,
demonstrating a new pattern (for me) for handling exceptions that get thrown
by wrapping the DB inserts in their own function.

------
stanleydrew
"Here we are using tail recursion to keep inserting the records."

Correct me if I'm wrong, but I don't think he means tail recursion. It's
recursion, yes. But does tail recursion even make sense in an asynchronous
setting?

~~~
route66
He even mentions that it would blow up the stack ... so, no TCO is applied,
which does not exist in javascript (at least not in V8). But tail recursive it
is, as the recursion is the last statement. What he implements in the
variation is a kind of pseudo-trampoline through setTimeout ...
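
A minimal sketch of that pseudo-trampoline, not the article's exact code: the recursive step is handed to `setTimeout` so the current stack unwinds before the next insert starts. `syncInsert` and the depth counters are hypothetical, added only to make the effect visible.

```javascript
var depth = 0;    // current nesting of insertOne calls
var maxDepth = 0; // deepest nesting observed

function insertCollection(insert, collection, cb) {
  var coll = collection.slice(0); // copy so the caller's array is untouched
  function insertOne() {
    depth++;
    maxDepth = Math.max(maxDepth, depth);
    if (coll.length === 0) {
      cb(null);
    } else {
      insert(coll.shift(), function (err) {
        if (err) return cb(err);
        // Bounce through the event loop instead of recursing directly,
        // so this call returns before the next insert begins.
        setTimeout(insertOne, 0);
      });
    }
    depth--;
  }
  insertOne();
}

// Hypothetical insert that calls back synchronously (the worst case for
// direct recursion).
function syncInsert(record, cb) {
  cb(null);
}

insertCollection(syncInsert, [1, 2, 3, 4, 5], function (err) {
  if (err) throw err;
  console.log('done, max nesting:', maxDepth);
});
```

The cost is one timer per record, so this trades stack depth for latency.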

------
Charuru
Abstractions are absolutely necessary imo. Code like that is too long to be
easily maintainable.

<http://groups.google.com/group/nodejs/browse_thread/thread/c334947643c80968/8d9ab9481199c5d8?show_docid=8d9ab9481199c5d8>

Also, that guy is responsible for nodetuts, which I'm very thankful for!

<http://nodetuts.com/index.html>

