Async Iterators: These Promises Are Killing My Performance (medium.com/netscape)
71 points by akubera on Aug 27, 2017 | hide | past | favorite | 46 comments



I agree that the decision that Promise continuations always run in a new event loop iteration has a serious impact on performance. Nevertheless I think it is a great design, because it provides you with lots of guarantees. I have worked a lot with C# Promises (Tasks), where the implementation allows synchronous completions, and that gives way more room for errors. E.g. whenever you call TaskCompletionSource.SetResult(xyz) you might jump directly into the completion handler, which executes arbitrary code (and may even call recursively back into your code). If you don't know that, don't know when it happens (understanding all the SynchronizationContext stuff) and don't take precautions against it (like checking for state modification after promise resolution), you might run into hard-to-find errors. Similar things can happen on the consuming side. In JS-land a continuation will always run in the next event loop iteration, which guarantees that code after Promise.then() will still get executed before it. In other designs the continuation might sometimes run directly inside .then, which is yet another thing to account for. In my opinion the JS promise spec provides the most sensible and least error-prone behavior, and that's great, because for the average programmer correctness is more important than performance. The smaller amount of code that really needs very high performance can still be hand-optimized, e.g. by falling back to synchronous callbacks where it matters, or, for the use case in the link, by reading bigger parts of the file at once and then performing more of the parsing synchronously.


I appreciate the fact that you've taken the time to offer a comment which I personally think is interesting & valuable, but with all due respect, could you type in paragraphs please… it makes it a lot easier to read :)


Not sure which spec the article is referring to but it's not the one implemented by V8 natively.

> The problem is that Promises have to be resolved in the next turn of the event loop. This is part of the spec.

That statement is false unless I'm missing something major here. Promises aren't resolved in the next turn of the event loop. They should be resolved as part of the microtask queue which can happen at the end of the tick (recursively).

Example (tested on node 8):

    Promise.resolve()
      .then(() => console.log('1'))
      .then(() => console.log('2'))
      .then(() => console.log('still the same tick!'));
    setTimeout(() => console.log('timeout happening in the next tick'), 0);
Result:

    1
    2
    still the same tick!
    timeout happening in the next tick
The same will happen with async functions:

    (async () => {
      console.log(await Promise.resolve('1'));
      console.log(await Promise.resolve('2'));
      console.log(await Promise.resolve('still the same tick!'));
    })();
    setTimeout(() => console.log('timeout happening in the next tick'), 0);


You're correct. The spec only specifies that "onFulfilled or onRejected must not be called until the execution context stack contains only platform code." This doesn't require that the implementation use the same macro-task scheduling mechanism that setTimeout might use.


"Unfortunately, generators and iterators are synchronous. Unless we’re willing to read our entire file into memory, they can’t help us."

This doesn't seem to follow to me. Synchronous reading means you have to read a given chunk in before you can process it, which means that if you're not reading out of cache that you're going to stall on the spinning disk, but synchronous code doesn't require you to read the whole file at once. And you can easily write something that will process it into nice lines for you. The "standard library" may not do that because there isn't much standard library for Javascript, but it's not that hard.

The proof is, that's what the Python snippet is doing. It's built into Python in this case, and I would expect it does some magic internal stuff to accelerate this common case (in particular doing some scanning at the C level to see how long of a Python string to allocate once, instead of using Python-level string manipulation), but the API ought to be one you can do in Node, even synchronously, with generators.
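For illustration, a synchronous line generator over chunked reads might look roughly like this (a sketch using fs.openSync/fs.readSync, not code from the article; it ignores multi-byte characters split across chunk boundaries):

    const fs = require('fs');

    // Yields one line at a time while reading the file in fixed-size
    // chunks, so the whole file never has to sit in memory at once.
    function* readLinesSync(path, chunkSize = 64 * 1024) {
      const fd = fs.openSync(path, 'r');
      const buf = Buffer.alloc(chunkSize);
      let leftover = '';
      try {
        while (true) {
          const bytesRead = fs.readSync(fd, buf, 0, chunkSize, null);
          if (bytesRead === 0) break;
          const lines = (leftover + buf.toString('utf8', 0, bytesRead)).split('\n');
          leftover = lines.pop(); // last piece may be a partial line
          yield* lines;
        }
        if (leftover) yield leftover; // trailing line without a newline
      } finally {
        fs.closeSync(fd);
      }
    }

    for (const line of readLinesSync('./Batting.csv')) {
      // process one line synchronously
    }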


>Synchronous reading means you have to read a given chunk in before you can process it, which means that if you're not reading out of cache that you're going to stall on the spinning disk, but synchronous code doesn't require you to read the whole file at once.

It's implicit that stalling the node.js event loop is not allowed, so a synchronous "line" generator or iterator would block the event loop when it has exhausted the data read so far. The node.js event loop cannot suspend the current task and process other tasks while waiting for more data to be read.

Edit: The point of the author's post is that async iterators generate a Promise (and thus a yield to the event loop) for every element, even when the element is already available synchronously. Even if the generator has read three lines in the latest call to read(), it must still yield three promises of one line each (each involving one yield back to the event loop) before all three lines are processed.

Edit 2: That said, it's not hard to fix this, and the author actually comes close to the solution but doesn't realize it. They say the result of the async iterator in the spec should contain an array of values, but it's just as easy to write the "line" iterator to yield an array of lines as its values without changing the spec.
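A rough sketch of that approach, assuming a hypothetical readChunk(fd) helper that resolves with the next chunk of text or null at EOF (e.g. built from fs.read via util.promisify):

    // Yields an array of lines per iteration: one promise per batch
    // instead of one promise per line.
    async function* readLineBatches(fd) {
      let leftover = '';
      let chunk;
      while ((chunk = await readChunk(fd)) !== null) {
        const lines = (leftover + chunk).split('\n');
        leftover = lines.pop(); // keep any partial trailing line
        if (lines.length > 0) yield lines;
      }
      if (leftover) yield [leftover];
    }

    // Consumers pay one event-loop round trip per batch; the inner
    // loop over the lines stays synchronous.
    async function countLines(fd) {
      let count = 0;
      for await (const batch of readLineBatches(fd)) {
        for (const line of batch) count++;
      }
      return count;
    }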


I think the problem with yielding an array of lines is that all the code that uses the result of the function would have to be aware of this grouping of results all the way up the stack.

What he was proposing was a solution that looked like you were only returning one value per iteration (so none of the code doing the processing had to be aware there are arrays of values) but underneath the covers was actually being processed in batches on each run through the event loop.

It's actually a very elegant solution. The only code that would have to be aware that it's actually dealing with batches of values is the code producing those batches right at the top of the stack of calls.


That's correct, and I understand the author's solution. My point was that changing the spec is not a hard requirement, since it can be accomplished in user code.


Then I suppose we don't need async iterators at all since this can all already be done with callbacks.


Thank you. This is not Promises being slow (though they do have inherent overhead); this is just plain bad code!

I don't know much about the author, Dan, but this could be written to be a LOT faster. Perhaps not as fast as the synchronous example (reading something into memory and then processing it is always going to be very fast if you can do it), but faster than the code example they give.


The article is correct that promises and async iterators are not a good idea for tight innerloops. The author just doesn't seem to realize that it is possible to read parts of a file synchronously, and thus build a sync iterator that way. fs.readSync() should do it: https://nodejs.org/api/fs.html#fs_fs_readsync_fd_buffer_offs...

(I assume there are 100s of npm packages that do this already)


I think you've misunderstood the author's point. He would like a fast solution that also looks as simple as the Python example.

Even if you have a synchronous file read function that does partial file reads, you cannot use it inside a synchronous loop to process the returned data, because it will hold up the entire Node.js event loop until the whole file has been processed.

You can use the synchronous file read function in a way that doesn't hold up the main event loop but to do that you re-introduce all the complexity the author was trying to avoid.
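For example, a callback-style version might look something like this (a sketch, not the author's code: synchronous chunk reads, with a setImmediate between batches so other tasks can run):

    const fs = require('fs');

    function processLines(path, onLine, onDone) {
      const fd = fs.openSync(path, 'r');
      const buf = Buffer.alloc(64 * 1024);
      let leftover = '';
      (function step() {
        const n = fs.readSync(fd, buf, 0, buf.length, null);
        if (n === 0) {
          if (leftover) onLine(leftover);
          fs.closeSync(fd);
          return onDone();
        }
        const lines = (leftover + buf.toString('utf8', 0, n)).split('\n');
        leftover = lines.pop();
        lines.forEach(onLine);
        setImmediate(step); // yield to the event loop between chunks
      })();
    }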


But you CAN make a loop that looks as simple as the Python example; Python just happens to provide one built in, right? The Python code holds the thread for the entire duration just like the Node one will hold the event loop. Wouldn't the Python loop cause the same problems as the Node one in an evented app, say, a Twisted app? (sorry, my Python is a few years rusty)

The ideal would be just as performant as the sync Python code, but effectively async. And his observation that async iterators by design can't do it without severe overhead for such tight inner loops is very good.


Right, exactly. Very well put :)


If you're using Node.js as a stateless web server (the principal use case motivating its design), when are you reading files off the local disk other than in the initialization phase? What's the problem with applying the 'right tool for the job' mindset and defaulting to other platforms for more generalized computing tasks?


> If you're using Node.js as a stateless web server (the principal use case motivating its design), when are you reading files off the local disk other than in the initialization phase? What's the problem with applying the 'right tool for the job' mindset and defaulting to other platforms for more generalized computing tasks?

What's the point of using Node.js if you can't do a simple task such as reading a file? Use something else to begin with. Node.js wasn't designed to be "stateless", any more than JavaScript was. If you want "stateless", use Clojure or Haskell.


Not a scientific test:

  var fs = require('fs'),
      csv = require('csv-parser');
  var start = new Date().getTime();
  fs.createReadStream('./Batting.csv')
    .pipe(csv())
    .on('data', function() {})
    .on('end', function() {
      console.log(new Date().getTime() - start);
    });
This takes about 400 ms on my machine to parse a 6.1 MB CSV (15.25 MB/s). I didn’t do anything useful with the data, but I did verify it exercises the parsing function.

Clearly all the work/time is spent in the actual parsing and in handling CSV corner cases, favoring correctness over high performance/throughput.


> (note: I’m using TypeScript to convert the async iterators and generators into something that node.js can run. See my code here.)

This is your problem, TypeScript's Promises polyfill is really really slow. Switch TypeScript to emit ESNext and run it on Node 8 and you'll see a much much different situation.
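If anyone wants to try this, the relevant compiler options would be roughly (a minimal tsconfig.json sketch, not the author's config):

    {
      "compilerOptions": {
        "target": "es2017",
        "module": "commonjs"
      }
    }

With target es2017 (or esnext), async/await is emitted natively instead of going through the __awaiter/__generator helpers.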


TS does not provide any Promises polyfill.


I've written event-driven, non-blocking style C for a couple of different projects on POSIX platforms. When a file descriptor becomes readable, you read from it until read(2) returns EAGAIN[1] or until a condition is fulfilled, such as the read buffer containing a delimiter. People new to event-driven programming may just invoke read(2) once for each readable event, check for a condition and then yield to the event loop. This sounds like the same mistake, only with more turtles stacked on top of it.

[1] This applies to sockets and pipes (incl. stdin), not regular file I/O which usually ends up being off-loaded to a thread/process pool (if needed)


I don't think it's quite the same thing, but that description reminds me of Ted Unangst's "accidentally nonblocking" post.

https://www.tedunangst.com/flak/post/accidentally-nonblockin...


I think this is a bit different. Even if the file reading code was reading the maximum amount of available data it would still have to store most of it in a local buffer and then yield only one line in a promise which would mean only one line was being processed per event loop tick.

There's one more thing you should be aware of here that can catch a lot of developers out. On most POSIX systems, file reads from disk do not return EAGAIN but will in fact block if the disk is too slow returning the data. EAGAIN is only meaningful on things like pipes, sockets, etc. The only way I'm aware of to do disk access truly asynchronously is using the POSIX AIO interface.


re: "Promises have to be resolved in the next turn of the event loop. This is part of the spec."

This seems imprecise? There are multiple queues and the details seem to be implementation-specific [1]. The Promises/A+ specification doesn't seem to specify a particular implementation [2].

The reason this might matter is that if you use promises to do something CPU-intensive without doing I/O, it might be possible to starve the event queue, similar to what happens if you're just writing a loop.

On the other hand, the example in the article is waiting on reads, so they'll be delivered through the event queue. Can reading larger chunks at a time and buffering be used to speed that up?

[1] https://github.com/nodejs/node/issues/2736#issuecomment-1386...

[2] "This can be implemented with either a 'macro-task' mechanism such as setTimeout or setImmediate, or with a 'micro-task' mechanism such as MutationObserver or process.nextTick. Since the promise implementation is considered platform code, it may itself contain a task-scheduling queue or 'trampoline' in which the handlers are called."


I think the main concern is that the following code runs in the expected order:

console.log("First");

promise.then(_ => console.log("Third"));

console.log("Second");

This means that even if `promise` is fulfilled already the callback must be delayed at least until the next turn of the event loop.


Yes, it does need to be delayed until the stack is unwound, but it doesn't need to be delayed until after any events already on the main event queue. (I'm not sure what "the next turn of the event loop" means in the presence of multiple queues.)


The number of excuses for node's poor performance here is shocking. Talk about high-level architectural concepts and quibble about the likelihood of this implementation being necessary if you like, but if you can't provide a decent implementation for this very basic use case, then node has a hole in it.


Node is not optimized for this use case at all. Try parsing 64 files at once then compare them again.


The biggest problem I had with forced promise asynchronicity was with browser promise polyfills, which use setTimeout to force success and error handlers to be called asynchronously.

This is fine, unless you're:

1. Modifying the DOM and hoping to avoid jankiness.
2. Trying to work with popup windows, in a browser which deprioritizes setTimeout in all but the focused window.

I ended up implementing https://github.com/krakenjs/zalgo-promise to get around this. It intentionally "releases Zalgo" by allowing promises to be resolved/then'd synchronously, but my belief is this doesn't have to cause bugs if it's used consistently.
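The gist of it (a simplified sketch, not zalgo-promise's actual API; chaining and rejection handling omitted) is to call the handler synchronously when the value is already available:

    class SyncResolvable {
      constructor() {
        this.settled = false;
        this.handlers = [];
      }
      resolve(value) {
        this.settled = true;
        this.value = value;
        this.handlers.forEach(h => h(value));
        this.handlers = [];
      }
      then(onFulfilled) {
        if (this.settled) onFulfilled(this.value); // synchronous path: "releases Zalgo"
        else this.handlers.push(onFulfilled);
      }
    }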


You're using poor polyfills if they use setTimeout. Good polyfills [1] use MutationObserver which uses the (a?) microtask queue.

Invoking a Promise's then() callback synchronously is a violation of the spec. IIRC there are some tests to prevent that in the test suite.

[1]: https://github.com/petkaantonov/bluebird/blob/2c9f7a44/src/s...
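For reference, the MutationObserver trick looks roughly like this (a browser-only sketch, not bluebird's actual scheduler):

    // Runs fn on the microtask queue: mutating the text node fires the
    // observer callback as a microtask, not a macrotask like setTimeout.
    function scheduleMicrotask(fn) {
      const node = document.createTextNode('');
      const observer = new MutationObserver(() => {
        observer.disconnect();
        fn();
      });
      observer.observe(node, { characterData: true });
      node.data = 'x';
    }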


Interesting and surprising. Node 8 supports async/await as well as generators, so the target for the TypeScript transpiler can be set to 'es7'. It would be interesting to see whether this has an impact on performance.


Poking around with it, it looks like that gives a 20% or so improvement in node v8.4.0, from 1239ms to 1022ms (vs 56ms for the sync version).


Would be nice to say what version of Node.js.


It doesn't matter which version of node. The fact that promises must be resolved in the next tick is baked into the spec, no upgrade of node will ever improve that situation.


> The fact that promises must be resolved in the next tick is baked into the spec

From all I know, that statement is false. Promises aren't resolved in the next tick. They should be resolved as part of the microtask queue which can happen at the end of the tick (recursively).

    Promise.resolve()
      .then(() => console.log('1'))
      .then(() => console.log('2'))
      .then(() => console.log('still the same tick!'));
    setTimeout(() => console.log('timeout happening in the next tick'), 0);
Result:

    1
    2
    still the same tick!
    timeout happening in the next tick


Right, but there were a lot of specific numbers, so the context for specific benchmarks is the version.

Anyway it sounds like we should try to fix the spec.


My problem with promises is that you can't "uninvoke" them. I.e., if all listeners unsubscribe, the original job they were waiting for could be killed, but the design of Promises doesn't allow that. Instead, the job will continue to be executed (without any listeners), wasting resources.


I won't argue that sometimes you might want to cancel a promise that you no longer need (e.g. the user cancels an ajax operation), but just because there are no listeners does not always mean you don't want it to complete (e.g. fire and forget; not always a great idea, sure, but a thing nonetheless).


True, but you can implement fire and forget by adding a dummy listener. The functionality I'm suggesting is more general.


I think it's less of a design flaw and more a conceptual issue. Since a Promise is just a value, it can be stored in an arbitrary data structure on the heap with no listeners. But the then() method can be called any time later. So, a promise being potentially "listened to" is really just being reachable by the garbage collector.

There's no reason the source of the promise can't implement a cancel() method, though.
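Something along these lines (a sketch; the names are illustrative, not a standard API):

    // The source hands out the promise plus a cancel() that stops the
    // underlying work. Here a cancelled promise simply never settles.
    function cancellableDelay(ms) {
      let timer;
      const promise = new Promise(resolve => {
        timer = setTimeout(resolve, ms);
      });
      return {
        promise,
        cancel() { clearTimeout(timer); }
      };
    }

    const job = cancellableDelay(1000);
    job.promise.then(() => console.log('done'));
    job.cancel(); // 'done' never logs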


But a cancel() method would not be a "clean" solution if there potentially are multiple listeners.

I really want to compute things if and only if there are listeners.


Why are you generating the promise in the first place if there are no listeners? Why don't you return a function which can be used to generate the promise instead?


I'm not sure what would be the advantage.

The number of listeners can go up and down during the computation. When that number reaches zero, the computation needs to be stopped.

When the number becomes nonzero again, the computation needs to be restarted.

Those are the requirements, basically.


For that use case, maybe an event emitter would work better?


For efficient text processing the only language that I'd use is Perl5.


Perl is great, but not that much (if any) faster to run than the equivalent Python, these days.

If you use Perl regularly, the $_ syntax and friends are super fast to type and let you do all kinds of cool stuff, but as an irregular scripter I find I forget it all after 6 months or so. Python's monotonous syntax means stuff is generally still understandable even by non-programmers.


In the documentation, there is a specific example for reading a file line by line, and it is very concise[1]. I'm too lazy to test how fast it is though.

[1]: https://nodejs.org/api/readline.html#readline_example_read_f...
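It's roughly this pattern (a minimal sketch of the readline approach, not the docs example verbatim):

    const fs = require('fs');
    const readline = require('readline');

    const rl = readline.createInterface({
      input: fs.createReadStream('./Batting.csv')
    });

    rl.on('line', line => {
      // process one line at a time
    });

    rl.on('close', () => {
      console.log('done');
    });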



