We're two programmers who have worked in core/platform engineering roles for most of our working lives. During that time, one of the main problems we've solved time and time again is to let people run their ad-hoc jobs and scripts on remote compute without hassle.
To solve this once and for everyone, we made Meadowrun, an open source tool that automates the tedious details of running Python code on cloud VMs. It runs in your AWS or Azure account, nothing else required.
No need to mess around with containers, SSH into remote machines, copy code across, set up images or look up instance types that sound like Starbucks orders ("t3.venti.oatmilk.latte") and what they cost.
All with the same experience as you'd have running on your laptop - just change the code or dependencies locally and run - Meadowrun takes care of the rest.
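To give a flavor of the instance-type chore this automates: given a job's CPU/memory requirements, filter a catalog and take the cheapest match. A toy sketch (the types and prices below are illustrative, not real AWS quotes, and this is not Meadowrun's actual logic):

```python
# Toy sketch of instance selection: pick the cheapest instance type that
# satisfies the job's CPU and memory requirements.
# Catalog entries are made up for illustration, not real AWS pricing.

CATALOG = [
    # (instance type, vCPUs, memory GiB, $/hour)
    ("t3.medium", 2, 4.0, 0.0416),
    ("m5.large", 2, 8.0, 0.096),
    ("m5.xlarge", 4, 16.0, 0.192),
    ("r5.xlarge", 4, 32.0, 0.252),
]

def cheapest_instance(cpus_needed, memory_gb_needed):
    candidates = [
        (price, name)
        for name, cpus, mem, price in CATALOG
        if cpus >= cpus_needed and mem >= memory_gb_needed
    ]
    if not candidates:
        raise ValueError("no instance type satisfies the requirements")
    price, name = min(candidates)
    return name

print(cheapest_instance(2, 8))   # m5.large
print(cheapest_instance(4, 20))  # r5.xlarge
```

In practice there are hundreds of instance types and spot prices change constantly, which is exactly why it's nice to have a tool do this lookup for you.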
Yes! (in some cases) The tradeoff is that Lambdas are way faster to start up, but they max out at 10GB of memory and 15 minutes of runtime, you can't use GPUs, and you get 250MB to store your code (more if you build a container image) and 512MB of temp space. We're in the middle of adding this as an option--we want to make it seamless to switch between running your code on Lambda (for short, quick jobs) and EC2 (for longer, more resource-intensive jobs) depending on what kind of workload you have.
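The "seamless switch" described here boils down to checking a job's requirements against those Lambda limits and falling back to EC2 when any is exceeded. A rough sketch based only on the limits stated above (illustrative, not Meadowrun's actual dispatch logic):

```python
# Toy dispatcher: run a job on Lambda if it fits within Lambda's limits
# (as stated above), otherwise fall back to EC2.
# Illustrative sketch only, not Meadowrun's actual logic.

LAMBDA_LIMITS = {
    "memory_gb": 10,
    "runtime_minutes": 15,
    "code_size_mb": 250,  # zip-based deploys; container images get more
}

def pick_backend(memory_gb, runtime_minutes, code_size_mb, needs_gpu=False):
    if needs_gpu:
        return "ec2"  # Lambda has no GPUs
    fits = (
        memory_gb <= LAMBDA_LIMITS["memory_gb"]
        and runtime_minutes <= LAMBDA_LIMITS["runtime_minutes"]
        and code_size_mb <= LAMBDA_LIMITS["code_size_mb"]
    )
    return "lambda" if fits else "ec2"

print(pick_backend(memory_gb=2, runtime_minutes=5, code_size_mb=50))     # lambda
print(pick_backend(memory_gb=64, runtime_minutes=120, code_size_mb=50))  # ec2
```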
Good news. We've found that Lambdas orchestrated with Step Functions, and properly split up by function, can do a lot of data processing within the limits you stated before you need to switch over to AWS Batch or ECS. Nice to have that tool in the belt. I'll check it out.
Any plans to implement Fargate as an option? You mention the limitations of Lambda and Fargate pretty much takes care of all of those, without needing to provision EC2.
Fargate is more of a maybe for us as it doesn't seem to offer a ton of advantages over EC2. It still takes about 30 seconds to launch a Fargate job, and as far as I can tell there's no way to "keep an instance around". With Meadowrun-on-EC2 or Lambda, when you run two jobs one after another with the same libraries and the same code (or even slightly different code), there's almost 0 overhead for running the second job. So Fargate is only slightly better for a cold start (30s compared to 45-60s for an EC2 instance in my experience), and significantly worse for a warm start (still 30s). And that's the core experience we're trying to make amazing--run some code, look at the results/data, tweak it, run it again, repeat.
Meadowrun is taking care of all the messy details of provisioning and managing the EC2 instances, so Meadowrun-on-Fargate won't be any easier to use than Meadowrun-on-EC2, and I don't see a ton of advantages to make up for the inability to get a warm start on Fargate. That said, AWS is super dynamic, so we're definitely keeping an eye on Fargate.
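The warm-start behavior described above amounts to keying already-provisioned workers on a hash of the code and dependencies: a second job with the same key skips launch and setup entirely. A rough sketch (hypothetical helper names, not Meadowrun internals):

```python
import hashlib

# Sketch of warm starts: workers are keyed by a hash of code +
# dependencies; a second job with the same key reuses the already-set-up
# worker instead of paying the cold-start cost again.
# Hypothetical names, not Meadowrun internals.

_warm_workers = {}  # deployment key -> ready worker

def deployment_key(code: bytes, requirements: str) -> str:
    h = hashlib.sha256()
    h.update(code)
    h.update(requirements.encode())
    return h.hexdigest()

def run_job(code: bytes, requirements: str) -> str:
    key = deployment_key(code, requirements)
    if key in _warm_workers:
        return "warm start: ~0s overhead"
    _warm_workers[key] = "worker"  # simulate launching + provisioning
    return "cold start: 45-60s to launch and set up an instance"

print(run_job(b"print('hi')", "numpy==1.23"))  # cold
print(run_job(b"print('hi')", "numpy==1.23"))  # warm
```

This is what makes the run/tweak/run-again loop fast on EC2 or Lambda, and it's exactly the step Fargate can't skip.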
I'm seeing a lot of projects that aim to bridge the usability gap between the real world and the bloated, overcomplicated cloud services.
Another signal of cloud rot: I see myself and my peers migrating away from AWS to smaller, less complicated, cheaper providers like Linode, Hetzner, and Wasabi.
Nowadays the cloud fatigue is higher than the burden of self hosting your services.
Great to hear you mention that. PyWren was one of our inspirations. Other inspirations are its successor NumPyWren [1], and gg [2].
We've looked at the code for PyWren and in our opinion it's not practically usable as-is, even if you only wanted to target Lambda. Also, we initially focused more on the deployment aspect (i.e. getting the environment + code onto the target machines reproducibly) and on EC2, because we figured that to make this general enough, people would need an escape hatch anyway if Lambda didn't cut it for some reason.
One pro is less overhead--I think all of the major options for an autoscaling Kubernetes cluster require at least one node that's always on. Meadowrun uses AWS Lambda/Azure Functions to manage instances, so it gets a lot closer to truly scaling down to zero.
Another pro is if your workflows aren't already container-based, not running on Kubernetes means we can build your containers for you on Meadowrun so you don't need to e.g. install Docker locally to get your libraries/code running on Meadowrun (it's hard to build containers in Kubernetes itself).
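Conceptually, the build-your-containers-for-you step is templating an image definition from a dependency list and building it on the remote side, so the user never touches Docker locally. A loose sketch of the templating half (illustrative only, not how Meadowrun actually builds images):

```python
# Loose sketch: turn a dependency list into a Dockerfile, the kind of
# thing a tool can do server-side so users don't need Docker locally.
# Illustrative only, not Meadowrun's actual image-building code.

def dockerfile_for(python_version, requirements):
    reqs = " ".join(requirements)
    return "\n".join([
        f"FROM python:{python_version}-slim",
        f"RUN pip install --no-cache-dir {reqs}",
        "COPY . /app",
        "WORKDIR /app",
    ]) + "\n"

print(dockerfile_for("3.10", ["numpy==1.23.0", "pandas==1.4.2"]))
```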
I mentioned this in another comment, but this also means we can e.g. use AWS Lambda as the compute layer, or if you have software that's hard to containerize, you can even use a custom AMI. (Both of these are features on the roadmap, so this is a bit theoretical at this point.)
The biggest con is probably that a lot of people already use Kubernetes, especially if they have an on-prem/hybrid deployment, or maybe if they have services with e.g. a load balancer that interact with their ad-hoc/batch jobs.
We are planning on adding the ability for Meadowrun to target Kubernetes as well, so Kubernetes takes care of the resource scheduling, but you still get the benefits of Meadowrun--a really simple API for running ad-hoc/batch jobs.
this is fantastic! simple map/reduce type abstractions over ec2 spot, lambda, and s3 are definitely the play. this is going to help a lot of people use more cheap vcpu billed by the second.
i4i instances recently launched. so much fast local disk. so much bandwidth to s3. needs more data processing.
We welcome any and all feedback!