Show HN: Hivemind – Distributed jobs using AWS Lambda functions (github.com)
66 points by madamelic on Sept 28, 2017 | 35 comments



More mature python based solution.

http://pywren.io/


This would be an entertaining comment when read by a C programmer:

    // Overly clever way of saying: increment `currentChunk` then use the incremented value
    this.chunks[++currentChunk] = []
Pre- and post-increment operators are bread and butter in C, hardly overly clever. Are they a recent addition to JS?


Nope. They've been in JS since the early versions of ECMAScript. JS is often used for more web-appy stuff than actual math, so I wouldn't say they're used very abundantly, which may explain the comment.


On top of that, he could have just written "currentChunk++;" and skipped the comment entirely.


wouldn't that return the value of currentChunk before it was incremented? So it would need an additional line to be equivalent to the example (making it a bit more readable in my opinion):

  currentChunk++;
  this.chunks[currentChunk] = [];
Which is perhaps what you meant all along. Please excuse the pedantry.
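
For anyone following along, the difference can be checked directly. This is only a minimal sketch; apart from `currentChunk` and `chunks`, the variable names are made up for illustration.

```javascript
// Pre-increment: the variable is incremented first, and the expression
// evaluates to the NEW value.
let currentChunk = 0;
const chunks = [[]];
chunks[++currentChunk] = [];      // writes to index 1
// currentChunk is now 1 and chunks.length is 2

// Post-increment: the expression evaluates to the OLD value, and the
// variable is incremented afterwards.
let postIndex = 0;
const postChunks = [["existing"]];
postChunks[postIndex++] = [];     // overwrites index 0
// postIndex is now 1, but postChunks.length is still 1

// The two-line form matches the pre-increment version.
let twoLine = 0;
const twoLineChunks = [[]];
twoLine++;
twoLineChunks[twoLine] = [];
```

So a bare `currentChunk++` inside the index expression would not be equivalent; it has to be split onto its own line first.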


Yeah, that's what I meant - instead of taking the shortcut with a comment explaining it, just don't take it in the first place.


Not only when read by a C programmer. I'm a JS programmer and I don't understand what's "overly clever" about something you learn in JS 101.


Although the post-increment (x++) operator was part of 'JS 101' for me, I've never knowingly seen the pre-increment (++x) operator before. TIL.


The comment was a bit tongue-in-cheek, but I'll accept that I oversold it.


What do folks use for event-driven scatter-gather job control using Lambdas only? For example, imagine I launch 10K Lambda jobs from an orchestrator Lambda. I can't guarantee they'll finish in 300s, so I can't just poll for completion. I could wake up a gather Lambda as a scheduled event, but the schedule granularity is too coarse. Waking up a gather lambda every time a job completes is wasteful and racy.

I would really like AWS to have an event that fires when all the lambdas in a set have completed, but that doesn't seem to exist. Any suggestions?


What we do is a separate Python process, not in Lambda, that polls status. Status is persisted in Redis. It supports lambdas calling other lambdas so dynamic workflows just work and can live for longer than 300s. This has been working fine so far but would probably have issues at 10k+.


There's an SQS/Lambda integration. Push 10k messages in a queue as jobs. The lambdas pull them off and delete them once done.

https://cloudonaut.io/integrate-sqs-and-lambda-serverless-ar...


From what I can tell, this doesn't really help you know when your queued up lambdas have all completed, which is what GP wants.


You're right. I assumed the queue depth api call would be useable as a counter. It appears though, that it's only an approximate count unless you use FIFO queues.

I guess the individual Lambdas could pop an "I'm done" message onto a FIFO queue. With a single Lambda consumer of that queue that updates an S3 bucket with a count. Done when count == 10,000. That's atomic, and no polling, but feels a bit smelly.
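
Roughly what I have in mind for the single consumer, as an in-memory sketch. The names `createGatherer` and `handleDoneMessage` are hypothetical, and the `Set` stands in for the count that would be persisted in S3; no AWS calls are shown.

```javascript
// Hypothetical gather step: consumes "I'm done" messages one at a time
// (as a single consumer of a FIFO queue would) and reports completion
// exactly once, when every expected job has checked in.
function createGatherer(expectedCount, onAllDone) {
  const finished = new Set(); // job ids seen so far; a Set ignores redelivered messages
  let fired = false;

  return function handleDoneMessage(jobId) {
    finished.add(jobId);
    if (!fired && finished.size === expectedCount) {
      fired = true;
      onAllDone(finished.size);
    }
    return finished.size;
  };
}

// Usage: 10,000 jobs, with one duplicate delivery thrown in.
let completedWith = null;
const handle = createGatherer(10000, (n) => { completedWith = n; });
for (let id = 0; id < 10000; id++) {
  handle(id);
  if (id === 42) handle(id); // duplicate message, must not be double counted
}
```

Counting distinct job ids rather than raw messages is a design choice on my part: it keeps the count correct if a message is ever delivered twice.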


Have you tried Step Functions? I've used them to orchestrate Lambdas with good success when I needed something that will run longer than Lambdas can individually.


Yeah, they’re good, but more expensive than you’d expect given the relative pricing compared to other AWS services. $0.025/1000 transitions seems an order of magnitude or two high given it’s just ultimately a central job store with some very, very simple logic for changing state.


Step Functions are convenient but for the reason you stated we stopped using them. Pricing seems off for what they do, especially when it's in volume.
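
To put a number on it, here is a back-of-the-envelope sketch using the $0.025 per 1,000 transitions figure quoted in this thread. The workload (10,000 jobs, 3 transitions each) is an assumption made up for illustration, not a measurement.

```javascript
// Price quoted in this thread: $0.025 per 1,000 state transitions.
const PRICE_PER_1000_TRANSITIONS = 0.025;

function stepFunctionsCost(jobs, transitionsPerJob) {
  const totalTransitions = jobs * transitionsPerJob;
  return (totalTransitions / 1000) * PRICE_PER_1000_TRANSITIONS;
}

// Assumed workload: 10,000 jobs with 3 transitions each (the 3 is made up).
const costPerRun = stepFunctionsCost(10000, 3);
console.log(costPerRun.toFixed(2)); // prints "0.75"
```

Small per run, but it scales linearly with both job count and transitions per job, which is where volume starts to hurt.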


I wrote a simple lambda function that takes in a payload and a step function arn. It then fires up a bunch of step function jobs and maps the execution identifier to the input and stores that in s3 for later access. This ends up being really nice for creating distributed cron like jobs by just connecting the lambda to cloudwatch events.


Here's a not-even-half-baked idea I've had for a while: would it be useful to have partial application of AWS Lambdas? Such that if you don't "call" an AWS Lambda with all its "arguments" it will create a new AWS Lambda that's just waiting for the remaining arguments, and return an ARN for the new Lambda?

Again, I can't imagine why this would be useful, but it's a certain kind of fun to take Amazon's metaphor and run with it.

(It's also very hard to Google for "aws lambda partial application" and get results that match what I'm describing here.)
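
For reference, this is the ordinary in-process version of the idea, as a minimal sketch. `partial` and `resize` are hypothetical names; nothing here touches AWS, and the returned function is only an analogy for "a new Lambda with an ARN".

```javascript
// In-process illustration only: `partial` returns a new function that waits
// for the remaining arguments, the way the imagined service would return
// an ARN for a new Lambda.
function partial(fn, ...supplied) {
  return function (...rest) {
    const args = [...supplied, ...rest];
    // Not enough arguments yet: hand back another waiting function.
    if (args.length < fn.length) return partial(fn, ...args);
    return fn(...args);
  };
}

// Hypothetical three-argument "handler".
function resize(bucket, key, width) {
  return `${bucket}/${key}@${width}`;
}

const forBucket = partial(resize, "photos"); // still waiting for key and width
const forKey = forBucket("cat.jpg");         // still waiting for width
const result = forKey(640);                  // all arguments present, so resize runs
```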


Why stop there. What if we implemented aws lambda calculus.


Distributed lambda calculus would be fun to implement. Differential distance-based reduction cost to start with?


Thanks for sharing! I ended up building a similar featureset in Python but using invocation type Event instead of RequestResponse. This ended up being a pain in the ass because Event type has these awful baked in automatic retries that you cannot configure or disable. It ended up being worth it because we wanted to use up to the full 5 minute runtime limit.

How is this working for you? Where do you see it going?


>How is this working for you? Where do you see it going?

It actually works quite well. I use it for processing large amounts of data quickly. I originally built this to process some data that would've taken ~14 days and shortened to a few minutes (so nice!).

I enjoy using it when I can find a way to fit it into a project. I've had it internally for a month so not a whole lot of time to see it in a variety of projects.

In terms of where I see it going, I'm not entirely sure yet. It is pretty capable for what I am using it for currently, it can use node.js libraries so it fits the current needs (which are pretty basic). I'll see what happens when more people try it out. :)

Do you have any tips on what should be added, since you've been down this road before?


Most of the things I've added are for logging and orchestration. The most important unlock for us was the ability for us to enable chaining of Lambdas that worked seamlessly with our orchestration. So Orchestrator kicks off 25 lambdas (A) that are reading files off S3. Each Lambda A reads 100 lines of that file and invokes a Lambda B to parse it. Lambda B parses and filters each line in its batch and sends it to Lambda C to be persisted or Lambda D if it needs to be discarded.

We end up with hundreds or thousands of Lambda invocations and the orchestrator is aware of all of them. If there's a failure in any lambda, I can stop the entire job and retry from scratch. The orchestrator knows exactly when the job is done so I can easily chain it together with some other non-Lambda job in my system.
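
The bookkeeping idea can be sketched in a few lines. This is not our actual code: `JobTracker` and the invocation ids are hypothetical, and the in-memory `Set` stands in for whatever shared store the orchestrator reads.

```javascript
// Hypothetical tracker: every invocation is registered before it starts and
// reports back when it finishes. A parent registers its children before
// reporting its own completion, so the pending set cannot empty out early.
class JobTracker {
  constructor() {
    this.pending = new Set();
    this.failed = false;
  }
  register(id) { this.pending.add(id); }
  complete(id) { this.pending.delete(id); }
  fail(id) { this.failed = true; this.pending.delete(id); }
  // Done only if nothing is outstanding and nothing failed.
  get done() { return !this.failed && this.pending.size === 0; }
}

const tracker = new JobTracker();
tracker.register("A-1");
// Lambda A fans out to two B invocations, then reports its own completion.
tracker.register("B-1");
tracker.register("B-2");
tracker.complete("A-1");
const doneAfterA = tracker.done;   // false: B-1 and B-2 are still pending
tracker.complete("B-1");
tracker.complete("B-2");
const doneAfterAll = tracker.done; // true: nothing pending, nothing failed
```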

Been working fine for us for ~200k invocations and 8k CPU minutes per day baseline with the ability to scale up to 10x that just by running more stuff. Our nightly job processes about 50gb of data in 3 min.


FYI, I believe you can disable retries and replace them with a DLQ (which you can then ignore, if you don't care about failures).

I would be interested, though, if anyone has figured out a way to raise the 30 second RequestResponse timeout. Both that and the 300s Lambda timeout are annoyingly low. Google is slightly better, they go to 600s!


At my company we created a queue for Lambdas. The pattern we use is to monitor the execution time left and queue up a follow-up once there's <30s left. Not every job we do can be composed this way but it helps a lot.


>The pattern we use is to monitor the execution time left and queue up a follow-up once there's <30s left. Not every job we do can be composed this way but it helps a lot.

Hmm. Are you saying you can queue additional jobs on the same worker to use up remaining execution time?


Amazon doesn't really give you any direct access to the underlying "worker" so it doesn't work like that. It's really as simple as:

- the active lambda decides to stop performing work once there's only 30 seconds left

- it switches to perform whatever cleanup is necessary

- it queues up a new lambda to be executed by pushing the lambda name and event payload to redis

- I have a special lambda that's scheduled for once a minute that scans certain keys in redis, pops the data out, and invokes the lambdas

To give an example, I have a job where I need to read ~50 files in S3 line by line and do some work over them. I spin up 50 lambdas, one for each file. Each lambda chugs along until it is about to run out of time. It then queues up a followup lambda with the byte # to start at in the event payload. The followup lambda then picks up where the last one left off.

I have some cases where instead of queueing up the followup lambda I just invoke it directly. The value in having a queue step in between is that you can more cleanly handle throttling and you can avoid event payload limits by just putting the payload into Redis.
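
A stripped-down sketch of the pattern, simulated locally. `context.getRemainingTimeInMillis()` is the method the Node.js Lambda runtime provides on the handler context; everything else (`runUntilDeadline`, `fakeContext`, the `processLine` and `enqueueFollowUp` callbacks) is hypothetical, and it resumes by line index rather than byte offset to keep the example short.

```javascript
// Stop when less than this much time remains (30 seconds, as described above).
const SAFETY_MARGIN_MS = 30000;

// `processLine` and `enqueueFollowUp` are hypothetical callbacks; the latter
// stands in for "push the lambda name and event payload to Redis".
function runUntilDeadline(lines, startIndex, context, processLine, enqueueFollowUp) {
  let i = startIndex;
  while (i < lines.length) {
    if (context.getRemainingTimeInMillis() < SAFETY_MARGIN_MS) {
      // Out of budget: hand the resume position to a follow-up invocation.
      enqueueFollowUp({ startIndex: i });
      return { finished: false, nextIndex: i };
    }
    processLine(lines[i]);
    i++;
  }
  return { finished: true, nextIndex: i };
}

// Local simulation: a fake context whose remaining time drops by 10 s per check.
function fakeContext(totalMs) {
  let remaining = totalMs;
  return {
    getRemainingTimeInMillis: () => {
      const current = remaining;
      remaining -= 10000;
      return current;
    },
  };
}

const lines = ["a", "b", "c", "d", "e"];
const processed = [];
const queue = [];
let state = { finished: false, nextIndex: 0 };
let invocations = 0;
while (!state.finished) {
  invocations++; // each loop iteration plays the role of one Lambda invocation
  state = runUntilDeadline(lines, state.nextIndex, fakeContext(50000),
    (line) => processed.push(line), (payload) => queue.push(payload));
}
```

In the simulation the first "invocation" handles three lines before hitting the margin, queues a follow-up with the resume position, and the second one finishes the rest.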


That's... really awesome. I think that may have to be a follow-up feature for this project.

Thanks!


good thing we changed our product name from Hivemind, a wicked trademark fight has just been averted:

http://blog.codesolvent.com/2017/05/say-hello-to-solvent.htm...

:)


Haha.

I originally wanted to name it unity-hivemind after I had watched Rick and Morty that weekend, but I figured the name was both a collision and confusing due to the Unity graphics engine.


Back in the day, there was Apache HiveMind too, an early IoC container

https://hivemind.apache.org


I’m sure this is great, but for some of us, being tied to a single vendor is a non-starter.


Totally understand. :) Just thought I'd share something that made my life a little easier.


I definitely appreciate that, and I didn’t mean to take anything away from your work. Cheers.



