// Overly clever way of saying: increment `currentChunk`, then use the incremented value
// (`value` stands in for whatever is being stored; the original right-hand side was elided)
this.chunks[++currentChunk] = value;

// Equivalent, but clearer:
currentChunk++;
this.chunks[currentChunk] = value;
I would really like AWS to have an event that fires when all the lambdas in a set have completed, but that doesn't seem to exist. Any suggestions?
I guess the individual Lambdas could pop an "I'm done" message onto a FIFO queue, with a single Lambda consumer of that queue updating a count in an S3 object. Done when count == 10,000. That's atomic, and no polling, but it feels a bit smelly.
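The counting half of that idea is simple enough to sketch. This is just the completion-tracking logic, with the S3 object and FIFO queue simulated in memory (`TOTAL` and the function names are made up for illustration):

```javascript
// Sketch of the single-consumer completion counter described above.
// In real use the count would live in an S3 object and the "I'm done"
// messages would arrive from a FIFO queue; both are simulated here.
const TOTAL = 10000; // hypothetical job size

function makeCompletionTracker(total) {
  let count = 0; // stand-in for the count stored in S3
  return {
    // Called once per "I'm done" message popped off the FIFO queue.
    recordDone() {
      count += 1;
      return count === total; // true exactly when the whole set has finished
    },
    get count() {
      return count;
    },
  };
}

// Simulate 10,000 Lambdas each reporting completion once.
const tracker = makeCompletionTracker(TOTAL);
let jobDone = false;
for (let i = 0; i < TOTAL; i++) {
  jobDone = tracker.recordDone();
}
console.log(jobDone, tracker.count); // → true 10000
```

The single consumer is what makes the count atomic: only one writer ever touches it, so there's no read-modify-write race between the 10,000 workers.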
Again, I can't imagine why this would be useful, but it's a certain kind of fun to take Amazon's metaphor and run with it.
(It's also very hard to Google for "aws lambda partial application" and get results that match what I'm describing here.)
How is this working for you? Where do you see it going?
It actually works quite well. I use it for processing large amounts of data quickly. I originally built this to process some data that would've taken ~14 days and shortened it to a few minutes (so nice!).
I enjoy using it when I can find a way to fit it into a project. I've only had it internally for a month, so there hasn't been a whole lot of time to see it in a variety of projects.
In terms of where I see it going, I'm not entirely sure yet. It's pretty capable for what I'm using it for currently; it can use node.js libraries, so it fits my current needs (which are pretty basic). I'll see what happens when more people try it out. :)
Do you have any tips on what should be added, since you've been down this road before?
We end up with hundreds or thousands of Lambda invocations and the orchestrator is aware of all of them. If there's a failure in any lambda, I can stop the entire job and retry from scratch. The orchestrator knows exactly when the job is done so I can easily chain it together with some other non-Lambda job in my system.
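The fail-fast-and-retry behavior described above maps naturally onto `Promise.all`, which rejects as soon as any invocation fails. A minimal sketch, where `invokeLambda` is a hypothetical stand-in for a real RequestResponse invocation:

```javascript
// Sketch of an orchestrator that fans out N invocations, fails the
// whole job if any one fails, and retries from scratch.
// `invokeLambda` is a hypothetical stand-in for an actual Lambda call.
async function runJob(payloads, invokeLambda, maxAttempts = 3) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      // Promise.all rejects as soon as any invocation fails, so the
      // orchestrator learns of the failure immediately. Note: already
      // in-flight invocations aren't cancelled here; a real
      // orchestrator would also need to abort or ignore them.
      return await Promise.all(payloads.map((p) => invokeLambda(p)));
    } catch (err) {
      if (attempt === maxAttempts) throw err;
      // otherwise fall through and retry the entire job from scratch
    }
  }
}
```

Because the orchestrator awaits every invocation, it knows exactly when the job is done, which is what makes chaining into a downstream non-Lambda job straightforward.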
Been working fine for us at ~200k invocations and 8k CPU-minutes per day baseline, with the ability to scale up to 10x that just by running more stuff. Our nightly job processes about 50 GB of data in 3 minutes.
I would be interested, though, if anyone has figured out a way to raise the 30-second RequestResponse timeout. Both that and the 300s Lambda timeout are annoyingly low. Google is slightly better; they go to 600s!
Hmm. Are you saying you can queue additional jobs on the same worker to use up remaining execution time?
- the active lambda decides to stop performing work once there's only 30 seconds left
- it switches to perform whatever cleanup is necessary
- it queues up a new lambda to be executed by pushing the lambda name and event payload to redis
- I have a special lambda that's scheduled for once a minute that scans certain keys in redis, pops the data out, and invokes the lambdas
To give an example, I have a job where I need to read ~50 files in S3 line by line and do some work over them. I spin up 50 lambdas, one for each file. Each lambda chugs along until it is about to run out of time. It then queues up a followup lambda with the byte # to start at in the event payload. The followup lambda then picks up where the last one left off.
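The stop-with-30-seconds-left / resume-at-a-byte-offset loop above can be sketched like this. The data source and the per-byte cost are simulated in memory (in a real Lambda the time budget would come from the context's remaining-time value, and the follow-up payload would go to Redis):

```javascript
// Sketch of the checkpoint/resume pattern: process from a byte offset,
// stop once the time budget gets within ~30 seconds of the limit, and
// emit a follow-up payload with the offset to resume at.
function processChunk(data, startByte, timeRemainingMs, msPerByte = 1) {
  let offset = startByte;
  let budget = timeRemainingMs;
  while (offset < data.length && budget > 30000) { // stop with ~30s left
    // ... do the real per-line/per-byte work on data[offset] here ...
    offset += 1;
    budget -= msPerByte; // pretend each byte costs msPerByte of wall time
  }
  if (offset < data.length) {
    // Not finished: this payload would be queued (e.g. into Redis) for
    // the follow-up lambda, which picks up where this one left off.
    return { done: false, next: { startByte: offset } };
  }
  return { done: true };
}
```

Each follow-up invocation just feeds `next.startByte` back in as `startByte`, so a file of any size gets processed as a chain of time-bounded invocations.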
I have some cases where instead of queueing up the followup lambda I just invoke it directly. The value in having a queue step in between is that you can more cleanly handle throttling and you can avoid event payload limits by just putting the payload into Redis.
I originally wanted to name it unity-hivemind after I had watched Rick and Morty that weekend, but I figured the name was both a collision and confusing, due to the Unity graphics engine.