Show HN: Hivemind – Distributed jobs using AWS Lambda functions

posnet · on Sept 28, 2017

A more mature Python-based solution.

defined · on Sept 29, 2017

This would be an entertaining comment when read by a C programmer:

    // Overly clever way of saying: increment `currentChunk` then use the incremented value
    this.chunks[++currentChunk] = []

Pre- and post-increment operators are bread and butter in C, hardly overly clever. Are they a recent addition to JS?

nikkwong · on Sept 29, 2017

Nope. They've been in JS since the early versions of ECMAScript. JS is often used for more web-appy stuff than actual math, so I wouldn't say they're used very abundantly, which may explain the comment.

_pgon · on Sept 29, 2017

On top of that, he could have just written "currentChunk++;" and skipped the comment entirely.

J-dawg · on Sept 29, 2017

Wouldn't that return the value of currentChunk before it was incremented? So it would need an additional line to be equivalent to the example (making it a bit more readable, in my opinion):

  currentChunk++;
  this.chunks[currentChunk] = [];

Which is perhaps what you meant all along. Please excuse the pedantry.
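
For anyone who hasn't met both operators, here is a minimal sketch of the difference. The variable names and the starting index of -1 are made up for illustration and are not taken from the Hivemind source:

```javascript
// Post-increment evaluates to the OLD value, then increments.
let a = 0;
const post = a++; // post is 0, a is now 1

// Pre-increment increments first, then evaluates to the NEW value.
let b = 0;
const pre = ++b; // pre is 1, b is now 1

// One-line form, as in the quoted source (starting index is an assumption).
const chunksA = [];
let currentA = -1;
chunksA[++currentA] = [];

// Two-line form: same result, no operator subtlety to explain.
const chunksB = [];
let currentB = -1;
currentB++;
chunksB[currentB] = [];
```

Both forms leave the index at 0 with a single empty chunk stored there.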

_pgon · on Sept 29, 2017

Yeah, that's what I meant - instead of taking the shortcut with a comment explaining it, just don't take it in the first place.

Kiro · on Sept 29, 2017

Not only when read by a C programmer. I'm a JS programmer and I don't understand what's "overly clever" about something you learn in JS 101.

J-dawg · on Sept 29, 2017

Although the post-increment (x++) operator was part of 'JS 101' for me, I've never knowingly seen the pre-increment (++x) operator before. TIL.

madamelic · on Sept 29, 2017

The comment was a bit tongue-in-cheek, but I'll accept that I oversold it.

ak217 · on Sept 28, 2017

What do folks use for event-driven scatter-gather job control using Lambdas only? For example, imagine I launch 10K Lambda jobs from an orchestrator Lambda. I can't guarantee they'll finish in 300s, so I can't just poll for completion. I could wake up a gather Lambda as a scheduled event, but the schedule granularity is too coarse. Waking up a gather lambda every time a job completes is wasteful and racy.

I would really like AWS to have an event that fires when all the lambdas in a set have completed, but that doesn't seem to exist. Any suggestions?

teej · on Sept 28, 2017

What we do is a separate Python process, not in Lambda, that polls status. Status is persisted in Redis. It supports lambdas calling other lambdas so dynamic workflows just work and can live for longer than 300s. This has been working fine so far but would probably have issues at 10k+.

tyingq · on Sept 29, 2017

There's an SQS/Lambda integration. Push 10k messages in a queue as jobs. The lambdas pull them off and delete them once done.

https://cloudonaut.io/integrate-sqs-and-lambda-serverless-ar...

teej · on Sept 29, 2017

From what I can tell, this doesn't really help you know when your queued up lambdas have all completed, which is what GP wants.

tyingq · on Sept 29, 2017

You're right. I assumed the queue depth API call would be usable as a counter. It appears, though, that it's only an approximate count unless you use FIFO queues.

I guess the individual Lambdas could pop an "I'm done" message onto a FIFO queue. With a single Lambda consumer of that queue that updates an S3 bucket with a count. Done when count == 10,000. That's atomic, and no polling, but feels a bit smelly.
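
Roughly what the counting consumer might look like, as an in-memory sketch only. `CompletionTracker` and `markDone` are hypothetical names, and a real version would need a shared store with atomic updates rather than a local object:

```javascript
// In-memory stand-in for the shared counter (hypothetical names throughout).
class CompletionTracker {
  constructor(expected) {
    this.expected = expected; // total number of jobs that were launched
    this.done = new Set();    // job ids seen so far; a Set ignores duplicates
  }

  // Record one "I'm done" message.
  // Returns true exactly once: on the message that completes the set.
  markDone(jobId) {
    if (this.done.has(jobId)) return false; // duplicate message, ignore
    this.done.add(jobId);
    return this.done.size === this.expected;
  }
}

const tracker = new CompletionTracker(3);
const results = ["a", "b", "a", "c"].map((id) => tracker.markDone(id));
// results: [false, false, false, true]
```

Tracking job ids instead of a bare count is a design choice here: a repeated "done" message for the same job can't push the count past the target.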

joseph · on Sept 28, 2017

Have you tried Step Functions? I've used them to orchestrate Lambdas with good success when I needed something that will run longer than Lambdas can individually.

kondro · on Sept 29, 2017

Yeah, they’re good, but more expensive than you’d expect given the relative pricing compared to other AWS services. $0.025/1000 transitions seems an order of magnitude or two high given it’s just ultimately a central job store with some very, very simple logic for changing state.

tfjaeckel · on Sept 29, 2017

Step Functions are convenient but for the reason you stated we stopped using them. Pricing seems off for what they do, especially when it's in volume.

btashton · on Sept 29, 2017

I wrote a simple Lambda function that takes in a payload and a Step Functions ARN. It then fires up a bunch of Step Functions jobs, maps each execution identifier to its input, and stores that in S3 for later access. This ends up being really nice for creating distributed cron-like jobs by just connecting the Lambda to CloudWatch Events.

zachrose · on Sept 29, 2017

Here's a not-even-half-baked idea I've had for a while: would it be useful to have partial application of AWS Lambdas? Such that if you don't "call" an AWS Lambda with all its "arguments" it will create a new AWS Lambda that's just waiting for the remaining arguments, and return an ARN for the new Lambda?

Again, I can't imagine why this would be useful, but it's a certain kind of fun to take Amazon's metaphor and run with it.

(It's also very hard to Google for "aws lambda partial application" and get results that match what I'm describing here.)
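
As a purely local analogy, this is what partial application looks like with ordinary functions. Nothing here touches AWS: `partial`, `resize`, and the bucket/key names are invented for illustration, and the returned function plays the role of the "new Lambda" waiting on its remaining arguments:

```javascript
// Wrap `fn` so that calling it with too few arguments returns a new function
// that remembers what it was given and waits for the rest.
function partial(fn, arity = fn.length) {
  return function collect(...args) {
    if (args.length >= arity) return fn(...args); // all arguments present: run it
    return (...rest) => collect(...args, ...rest); // otherwise keep collecting
  };
}

const resize = partial((bucket, key, width) => `${bucket}/${key}@${width}`);
const forBucket = resize("photos");       // still waiting for key and width
const result = forBucket("cat.png", 200); // "photos/cat.png@200"
```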

empath75 · on Sept 29, 2017

Why stop there. What if we implemented aws lambda calculus.

snaky · on Sept 30, 2017

Distributed lambda calculus would be fun to implement. Differential distance-based reduction cost to start with?

teej · on Sept 28, 2017

Thanks for sharing! I ended up building a similar featureset in Python but using invocation type Event instead of RequestResponse. This ended up being a pain in the ass because Event type has these awful baked in automatic retries that you cannot configure or disable. It ended up being worth it because we wanted to use up to the full 5 minute runtime limit.

How is this working for you? Where do you see it going?

madamelic · on Sept 28, 2017

>How is this working for you? Where do you see it going?

It actually works quite well. I use it for processing large amounts of data quickly. I originally built this to process some data that would've taken ~14 days and shortened to a few minutes (so nice!).

I enjoy using it when I can find a way to fit it into a project. I've had it internally for a month so not a whole lot of time to see it in a variety of projects.

In terms of where I see it going, I'm not entirely sure yet. It is pretty capable for what I am using it for currently, and it can use Node.js libraries, so it fits the current needs (which are pretty basic). I'll see what happens when more people try it out. :)

Do you have any tips on what should be added, since you've been down this road before?

teej · on Sept 29, 2017

Most of the things I've added are for logging and orchestration. The most important unlock for us was the ability for us to enable chaining of Lambdas that worked seamlessly with our orchestration. So Orchestrator kicks off 25 lambdas (A) that are reading files off S3. Each Lambda A reads 100 lines of that file and invokes a Lambda B to parse it. Lambda B parses and filters each line in its batch and sends it to Lambda C to be persisted or Lambda D if it needs to be discarded.
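
The Lambda A to Lambda B hand-off above could be sketched like this. It is only an illustration of the batching step: `toBatches`, `fanOut`, and `invokeParser` are hypothetical names, and `invokeParser` stands in for whatever actually invokes Lambda B:

```javascript
// Split an array of lines into batches of at most `size` lines each.
function toBatches(lines, size = 100) {
  const batches = [];
  for (let i = 0; i < lines.length; i += size) {
    batches.push(lines.slice(i, i + size));
  }
  return batches;
}

// Hand each batch to the parser stage and wait for all hand-offs to finish.
// Resolves to the number of batches that were sent.
async function fanOut(lines, invokeParser, size = 100) {
  const batches = toBatches(lines, size);
  await Promise.all(
    batches.map((batch, n) => invokeParser({ batchNumber: n, lines: batch }))
  );
  return batches.length;
}
```

A 250-line file would produce three batches of 100, 100, and 50 lines.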

We end up with hundreds or thousands of Lambda invocations and the orchestrator is aware of all of them. If there's a failure in any lambda, I can stop the entire job and retry from scratch. The orchestrator knows exactly when the job is done so I can easily chain it together with some other non-Lambda job in my system.

Been working fine for us for ~200k invocations and 8k CPU minutes per day baseline with the ability to scale up to 10x that just by running more stuff. Our nightly job processes about 50gb of data in 3 min.

ak217 · on Sept 28, 2017

FYI, I believe you can disable retries and replace them with a DLQ (which you can then ignore, if you don't care about failures).

I would be interested, though, if anyone has figured out a way to raise the 30 second RequestResponse timeout. Both that and the 300s Lambda timeout are annoyingly low. Google is slightly better, they go to 600s!

teej · on Sept 28, 2017

At my company we created a queue for Lambdas. The pattern we use is to monitor the execution time left and queue up a follow-up once there's <30s left. Not every job we do can be composed this way but it helps a lot.

madamelic · on Sept 28, 2017

>The pattern we use is to monitor the execution time left and queue up a follow-up once there's <30s left. Not every job we do can be composed this way but it helps a lot.

Hmm. Are you saying you can queue additional jobs on the same worker to use up remaining execution time?

teej · on Sept 28, 2017

Amazon doesn't really give you any direct access to the underlying "worker", so it doesn't work like that. It's really as simple as:

- the active lambda decides to stop performing work once there's only 30 seconds left

- it switches to perform whatever cleanup is necessary

- it queues up a new lambda to be executed by pushing the lambda name and event payload to redis

- I have a special lambda that's scheduled for once a minute that scans certain keys in redis, pops the data out, and invokes the lambdas

To give an example, I have a job where I need to read ~50 files in S3 line by line and do some work over them. I spin up 50 lambdas, one for each file. Each lambda chugs along until it is about to run out of time. It then queues up a followup lambda with the byte # to start at in the event payload. The followup lambda then picks up where the last one left off.

I have some cases where instead of queueing up the followup lambda I just invoke it directly. The value in having a queue step in between is that you can more cleanly handle throttling and you can avoid event payload limits by just putting the payload into Redis.
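
A minimal sketch of that hand-off, under some assumptions. `context.getRemainingTimeInMillis()` is what the Lambda Node.js runtime provides; everything else (`processWithFollowUp`, `handleLine`, `enqueueFollowUp`, `startIndex`) is a hypothetical name, and it resumes from a line index rather than a byte offset to keep the example short:

```javascript
// Stop taking new work once less than 30 seconds of execution time remain.
const SAFETY_MARGIN_MS = 30 * 1000;

// Process lines until the time budget is nearly spent, then queue a follow-up
// that carries the position to resume from in its event payload.
async function processWithFollowUp(lines, event, context, handleLine, enqueueFollowUp) {
  let index = event.startIndex || 0; // resume point from a previous invocation, if any
  while (index < lines.length) {
    if (context.getRemainingTimeInMillis() < SAFETY_MARGIN_MS) {
      // Out of budget: hand the rest to a follow-up invocation.
      await enqueueFollowUp({ ...event, startIndex: index });
      return { finished: false, nextIndex: index };
    }
    await handleLine(lines[index]);
    index += 1;
  }
  return { finished: true, nextIndex: index };
}
```

Whether `enqueueFollowUp` pushes to Redis or invokes the next Lambda directly is left to the caller, which matches the two variants described above.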

madamelic · on Sept 29, 2017

That's... really awesome. I think that may have to be a follow-up feature for this project.

Thanks!

Edmond · on Sept 28, 2017

good thing we changed our product name from Hivemind, a wicked trademark fight has just been averted:

http://blog.codesolvent.com/2017/05/say-hello-to-solvent.htm...

:)

madamelic · on Sept 28, 2017

Haha.

I originally wanted to name it unity-hivemind after I had watched Rick and Morty that weekend, but I figured the name was both a collision and confusing due to the Unity graphics engine.

kitd · on Sept 29, 2017

Back in the day, there was Apache HiveMind too, an early IoC container

https://hivemind.apache.org

dvdhnt · on Sept 28, 2017

I’m sure this is great, but for some of us, being tied to a single vendor is a non-starter.

madamelic · on Sept 28, 2017

Totally understand. :) Just thought I'd share something that made my life a little easier.

dvdhnt · on Sept 28, 2017

I definitely appreciate that, and I didn’t mean to take anything away from your work. Cheers.