
Show HN: Hivemind – Distributed jobs using AWS Lambda functions - madamelic
https://github.com/littlstar/hivemind
======
posnet
A more mature Python-based solution.

[http://pywren.io/](http://pywren.io/)

------
defined
This would be an entertaining comment when read by a C programmer:

    // Overly clever way of saying: increment `currentChunk` then use the incremented value
    this.chunks[++currentChunk] = []

Pre- and post-increment operators are bread and butter in C, hardly overly
clever. Are they a recent addition to JS?

~~~
Kiro
Not only when read by a C programmer. I'm a JS programmer and I don't
understand what's "overly clever" about something you learn in JS 101.

~~~
J-dawg
Although the post-increment (x++) operator was part of 'JS 101' for me, I've
never knowingly seen the pre-increment (++x) operator before. TIL.
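For anyone in the same boat, a quick illustration of the two: post-increment evaluates to the old value, pre-increment to the new one.

```javascript
// Post-increment (x++) yields the OLD value, then increments.
// Pre-increment (++x) increments first, then yields the NEW value.
let chunks = [];
let current = 0;

chunks[current++] = 'a'; // writes to index 0; current is now 1
chunks[++current] = 'b'; // current becomes 2 first; writes to index 2

console.log(current); // 2
```

So the snippet above is equivalent to `currentChunk += 1; this.chunks[currentChunk] = []`, just compressed into one expression.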

------
ak217
What do folks use for event-driven scatter-gather job control using Lambdas
only? For example, imagine I launch 10K Lambda jobs from an orchestrator
Lambda. I can't guarantee they'll finish in 300s, so I can't just poll for
completion. I could wake up a gather Lambda as a scheduled event, but the
schedule granularity is too coarse. Waking up a gather lambda every time a job
completes is wasteful and racy.

I would really like AWS to have an event that fires when all the lambdas in a
set have completed, but that doesn't seem to exist. Any suggestions?
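One pattern that avoids both polling and the race is a fan-in counter: the orchestrator records the job count somewhere atomic (DynamoDB is a common choice), each worker decrements it when it finishes, and only the worker that sees the count hit zero invokes the gather Lambda. A minimal local sketch of the idea, with the atomic store stubbed out as an in-memory counter (in production `remaining` would be a DynamoDB attribute decremented via an atomic UpdateItem, and `gather` would be a `lambda.invoke()` call):

```javascript
// Fan-in counter sketch: exactly one worker observes the zero-crossing,
// so the gather step fires exactly once, with no polling.
function makeJob(totalWorkers, gather) {
  let remaining = totalWorkers;    // stand-in for the DynamoDB counter
  return function workerDone() {
    remaining -= 1;                // stand-in for the atomic decrement
    if (remaining === 0) gather(); // only the last worker triggers gather
  };
}

let gathered = false;
const workerDone = makeJob(3, () => { gathered = true; });
workerDone();
workerDone(); // two workers done: gather has not fired yet
workerDone(); // last worker done: gather fires
```

Because the decrement returns the new value atomically, the "wake a gather lambda on every completion" waste goes away: all but one completion is a cheap counter update.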

~~~
joseph
Have you tried Step Functions? I've used them to orchestrate Lambdas with good
success when I needed something that runs longer than an individual Lambda
can.
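For reference, the scatter part of this maps onto a Map state in Step Functions' Amazon States Language; a minimal sketch (the function ARN and the `$.chunks` input path are placeholders, not from the project):

```json
{
  "StartAt": "FanOut",
  "States": {
    "FanOut": {
      "Type": "Map",
      "ItemsPath": "$.chunks",
      "MaxConcurrency": 100,
      "Iterator": {
        "StartAt": "ProcessChunk",
        "States": {
          "ProcessChunk": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:process-chunk",
            "End": true
          }
        }
      },
      "End": true
    }
  }
}
```

The state machine only proceeds past `FanOut` when every iteration has finished, which is exactly the "event when all lambdas in a set have completed" asked about above.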

~~~
kondro
Yeah, they’re good, but more expensive than you’d expect given the relative
pricing of other AWS services. $0.025 per 1,000 transitions seems an order of
magnitude or two too high given that it’s ultimately just a central job store
with some very, very simple logic for changing state.

~~~
tfjaeckel
Step Functions are convenient, but for the reason you stated we stopped using
them. The pricing seems off for what they do, especially at volume.

------
zachrose
Here's a not-even-half-baked idea I've had for a while: would it be useful to
have partial application of AWS Lambdas? Such that if you don't "call" an AWS
Lambda with all its "arguments" it will create a new AWS Lambda that's just
waiting for the remaining arguments, and return an ARN for the new Lambda?

Again, I can't imagine why this would be useful, but it's a certain kind of
fun to take Amazon's metaphor and run with it.

(It's also very hard to Google for "aws lambda partial application" and get
results that match what I'm describing here.)
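In plain JavaScript the local analogue of the idea is just currying: "calling" with a partial argument set returns a new function awaiting the rest. The imagined AWS version would deploy that closure as a new Lambda and hand back its ARN instead of a function reference. A toy sketch (all names here are illustrative):

```javascript
// Local stand-in for "partially applying a Lambda": bind some arguments
// now, get back a callable that wants the remainder.
function partial(fn, ...preset) {
  return (...rest) => fn(...preset, ...rest);
}

const resize = (format, width, image) => `${image}@${width}.${format}`;

// The hypothetical AWS version would return an ARN here, not a closure.
const resizeJpeg800 = partial(resize, 'jpeg', 800);

console.log(resizeJpeg800('cat.png')); // cat.png@800.jpeg
```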

~~~
empath75
Why stop there? What if we implemented AWS lambda calculus?

~~~
snaky
Distributed lambda calculus would be fun to implement. Differential distance-
based reduction cost to start with?

------
teej
Thanks for sharing! I ended up building a similar feature set in Python, but
using invocation type Event instead of RequestResponse. That was a pain in the
ass because the Event type has these awful baked-in automatic retries that you
cannot configure or disable. It was still worth it because we wanted to use up
to the full 5-minute runtime limit.

How is this working for you? Where do you see it going?

~~~
madamelic
>How is this working for you? Where do you see it going?

It actually works quite well. I use it for processing large amounts of data
quickly. I originally built this to process some data that would've taken ~14
days and shortened that to a few minutes (so nice!).

I enjoy using it when I can find a way to fit it into a project. I've had it
internally for a month so not a whole lot of time to see it in a variety of
projects.

In terms of where I see it going, I'm not entirely sure yet. It's pretty
capable for what I'm using it for currently, and since it can use Node.js
libraries it fits my current needs (which are pretty basic). I'll see what
happens when more people try it out. :)

Do you have any tips on what should be added, since you've been down this road
before?

~~~
teej
Most of the things I've added are for logging and orchestration. The most
important unlock was the ability to chain Lambdas in a way that worked
seamlessly with our orchestration. So the orchestrator kicks off 25 Lambdas
(A) that read files off S3. Each Lambda A reads 100 lines of its file and
invokes a Lambda B to parse them. Lambda B parses and filters each line in its
batch and sends it to Lambda C to be persisted, or to Lambda D if it needs to
be discarded.

We end up with hundreds or thousands of Lambda invocations and the
orchestrator is aware of all of them. If there's a failure in any lambda, I
can stop the entire job and retry from scratch. The orchestrator knows exactly
when the job is done so I can easily chain it together with some other non-
Lambda job in my system.

Been working fine for us at a baseline of ~200k invocations and 8k CPU minutes
per day, with the ability to scale up to 10x that just by running more stuff.
Our nightly job processes about 50 GB of data in 3 minutes.
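A toy model of that chain, with the Lambda invocations collapsed into plain function calls (the routing predicate and names are illustrative, not from any real codebase):

```javascript
// Toy version of the A -> B -> C/D fan-out: A reads lines, B parses and
// routes, C persists, D discards. Real code would use lambda.invoke().
const persisted = [];
const discarded = [];

const lambdaC = (line) => persisted.push(line); // persist step
const lambdaD = (line) => discarded.push(line); // discard step
const lambdaB = (batch) =>                      // parse/filter + route
  batch.forEach((line) => (line.includes('ok') ? lambdaC(line) : lambdaD(line)));
const lambdaA = (file) => lambdaB(file.split('\n')); // read + fan out

lambdaA('ok 1\nbad 2\nok 3');
console.log(persisted.length, discarded.length); // 2 1
```

The orchestration piece described above would sit around this, recording each invocation so the whole tree can be cancelled or retried as a unit.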

------
Edmond
Good thing we changed our product name from Hivemind; a wicked trademark fight
has just been averted:

[http://blog.codesolvent.com/2017/05/say-hello-to-
solvent.htm...](http://blog.codesolvent.com/2017/05/say-hello-to-solvent.html)

:)

~~~
madamelic
Haha.

I originally wanted to name it unity-hivemind after watching Rick and Morty
that weekend, but I figured the name would be both a collision and confusing
due to the Unity graphics engine.

------
dvdhnt
I’m sure this is great, but for some of us, being tied to a single vendor is a
non-starter.

~~~
madamelic
Totally understand. :) Just thought I'd share something that made my life a
little easier.

~~~
dvdhnt
I definitely appreciate that, and I didn’t mean to take anything away from
your work. Cheers.

