I'm obviously missing something here; hopefully someone can explain?
I understand the event loop/nextTick/etc. architecture of Node, but I don't understand how, in this case, he was blocking the loop. Shouldn't all the operations to S3 be async (and thus non-blocking, even if waiting for a timeout)? What was the specific part of this scenario that was causing the loop to stall?
If I understand correctly, the problem was that they were making several thousand requests to S3. While the requests to S3 themselves were asynchronous, the callbacks for those requests were queued up for (synchronous) execution on the event loop. Due to the large number of callbacks already in the queue, the callbacks for incoming web requests were queued up behind them, leading to latency in serving responses.
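A minimal sketch of that effect (hypothetical numbers, with `setImmediate` plus a synchronous busy-loop standing in for the completed S3 callbacks): each individual request is non-blocking, but the callbacks all run on the single event-loop thread, so anything scheduled behind a large batch has to wait for the whole batch to drain.

```javascript
// Hypothetical: N "S3 responses" whose callbacks each do a bit of
// synchronous work (parsing, bookkeeping, ...) on the event loop.
const N = 5000;
let done = 0;

const start = Date.now();
for (let i = 0; i < N; i++) {
  setImmediate(() => {
    // stand-in for per-response callback work
    for (let j = 0; j < 1e4; j++) Math.sqrt(j);
    done++;
  });
}

// A latency-sensitive handler scheduled *after* the batch: it cannot run
// until the loop has worked through every callback queued ahead of it.
setImmediate(() => {
  console.log(`handler ran after ${done}/${N} callbacks, ${Date.now() - start}ms in`);
});
```

Nothing here "blocks" in the usual sense; the handler is simply last in a very long queue.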
Ah, OK! Perfect, thanks; that's what I was missing. Makes total sense now. In any language (even in Ruby!) you could have separate thread pools (or EM loops, whatever) for the S3 requests and the web handlers.
But because Node only has one event loop, and Node inter-process communication is awkward, it's tricky.
Gotcha, cheers.
I wonder if using something like async.eachLimit would have helped; it might prevent the S3 batches from flooding the loop and give the web requests a chance to interleave, but probably at a cost to the median response time.
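For the curious, here's a hand-rolled eachLimit-style helper (sketched without the async library so it's self-contained): run a worker over a list of items, but never more than `limit` in flight at once, which caps how many callbacks can pile onto the loop at a time.

```javascript
// eachLimit-style concurrency cap: at most `limit` workers in flight.
// Each completion launches the next pending item, so other event-loop
// work (e.g. web request handlers) gets gaps to interleave into.
function eachLimit(items, limit, worker, done) {
  let next = 0, active = 0, finished = 0;
  function launch() {
    while (active < limit && next < items.length) {
      active++;
      worker(items[next++], () => {
        active--;
        if (++finished === items.length) return done();
        launch();
      });
    }
  }
  launch();
}

// Usage: pretend each item is an S3 fetch (simulated with setImmediate).
const order = [];
eachLimit([1, 2, 3, 4, 5], 2, (item, cb) => {
  order.push(item);
  setImmediate(cb); // hypothetical async completion
}, () => console.log('processed', order.length, 'items'));
```

The trade-off the comment mentions shows up directly: with `limit` small, individual items wait longer to start (higher median latency for the batch), but the queue never grows by more than `limit` callbacks at once.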