We're two programmers who have worked in core/platform engineering roles for most of our working lives. During that time, one of the main problems we've solved time and time again is to let people run their ad-hoc jobs and scripts on remote compute without hassle.
To solve this once and for everyone, we made Meadowrun, an open source tool that automates the tedious details of running Python code on cloud VMs. It runs in your AWS or Azure account, nothing else required.
No need to mess around with containers, SSH into remote machines, copy code across, set up images or look up instance types that sound like Starbucks orders ("t3.venti.oatmilk.latte") and what they cost.
All with the same experience as you'd have running on your laptop - just change the code or dependencies locally and run - Meadowrun takes care of the rest.
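To give a flavor of the instance-type chore this automates: given a job's CPU/memory requirements, filter a catalog and take the cheapest match. A toy sketch (the types and prices below are illustrative, not real AWS quotes, and this is not Meadowrun's actual logic):

```python
# Toy sketch of instance selection: pick the cheapest instance type that
# satisfies the job's CPU and memory requirements.
# Catalog entries are made up for illustration, not real AWS pricing.

CATALOG = [
    # (instance type, vCPUs, memory GiB, $/hour)
    ("t3.medium", 2, 4.0, 0.0416),
    ("m5.large", 2, 8.0, 0.096),
    ("m5.xlarge", 4, 16.0, 0.192),
    ("r5.xlarge", 4, 32.0, 0.252),
]

def cheapest_instance(cpus_needed, memory_gb_needed):
    candidates = [
        (price, name)
        for name, cpus, mem, price in CATALOG
        if cpus >= cpus_needed and mem >= memory_gb_needed
    ]
    if not candidates:
        raise ValueError("no instance type satisfies the requirements")
    price, name = min(candidates)
    return name

print(cheapest_instance(2, 8))   # m5.large
print(cheapest_instance(4, 20))  # r5.xlarge
```

In practice there are hundreds of instance types and spot prices change constantly, which is exactly why it's nice to have a tool do this lookup for you.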
Yes! (in some cases) The tradeoff is that Lambdas are way faster to start up, but they max out at 10GB of memory and 15 minutes of runtime, you can't use GPUs, and you get 250MB to store your code (more if you build a container image) and 512MB of temp space. We're in the middle of adding this as an option--we want to make it seamless to switch between running your code on Lambda (for short, quick jobs) and EC2 (for longer, more resource-intensive jobs) depending on what kind of workload you have.
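The "seamless switch" described here boils down to checking a job's requirements against those Lambda limits and falling back to EC2 when any is exceeded. A rough sketch based only on the limits stated above (illustrative, not Meadowrun's actual dispatch logic):

```python
# Toy dispatcher: run a job on Lambda if it fits within Lambda's limits
# (as stated above), otherwise fall back to EC2.
# Illustrative sketch only, not Meadowrun's actual logic.

LAMBDA_LIMITS = {
    "memory_gb": 10,
    "runtime_minutes": 15,
    "code_size_mb": 250,  # zip-based deploys; container images get more
}

def pick_backend(memory_gb, runtime_minutes, code_size_mb, needs_gpu=False):
    if needs_gpu:
        return "ec2"  # Lambda has no GPUs
    fits = (
        memory_gb <= LAMBDA_LIMITS["memory_gb"]
        and runtime_minutes <= LAMBDA_LIMITS["runtime_minutes"]
        and code_size_mb <= LAMBDA_LIMITS["code_size_mb"]
    )
    return "lambda" if fits else "ec2"

print(pick_backend(memory_gb=2, runtime_minutes=5, code_size_mb=50))     # lambda
print(pick_backend(memory_gb=64, runtime_minutes=120, code_size_mb=50))  # ec2
```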
Good news. We've found that Lambdas orchestrated with Step Functions, and properly split up by function, can do a lot of data processing within the limits you stated before you need to switch over to AWS Batch or ECS. Nice to have that tool in the belt. I'll check it out.
Any plans to implement Fargate as an option? You mention the limitations of Lambda and Fargate pretty much takes care of all of those, without needing to provision EC2.
Fargate is more of a maybe for us as it doesn't seem to offer a ton of advantages over EC2. It still takes about 30 seconds to launch a Fargate job, and as far as I can tell there's no way to "keep an instance around". With Meadowrun-on-EC2 or Lambda, when you run two jobs one after another with the same libraries and the same code (or even slightly different code), there's almost 0 overhead for running the second job. So Fargate is only slightly better for a cold start (30s compared to 45-60s for an EC2 instance in my experience), and significantly worse for a warm start (still 30s). And that's the core experience we're trying to make amazing--run some code, look at the results/data, tweak it, run it again, repeat.
Meadowrun is taking care of all the messy details of provisioning and managing the EC2 instances, so Meadowrun-on-Fargate won't be any easier to use than Meadowrun-on-EC2, and I don't see a ton of advantages to make up for the inability to get a warm start on Fargate. That said, AWS is super dynamic, so we're definitely keeping an eye on Fargate.
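The warm-start behavior described above amounts to keying already-provisioned workers on a hash of the code and dependencies: a second job with the same key skips launch and setup entirely. A rough sketch (hypothetical helper names, not Meadowrun internals):

```python
import hashlib

# Sketch of warm starts: workers are keyed by a hash of code +
# dependencies; a second job with the same key reuses the already-set-up
# worker instead of paying the cold-start cost again.
# Hypothetical names, not Meadowrun internals.

_warm_workers = {}  # deployment key -> ready worker

def deployment_key(code: bytes, requirements: str) -> str:
    h = hashlib.sha256()
    h.update(code)
    h.update(requirements.encode())
    return h.hexdigest()

def run_job(code: bytes, requirements: str) -> str:
    key = deployment_key(code, requirements)
    if key in _warm_workers:
        return "warm start: ~0s overhead"
    _warm_workers[key] = "worker"  # simulate launching + provisioning
    return "cold start: 45-60s to launch and set up an instance"

print(run_job(b"print('hi')", "numpy==1.23"))  # cold
print(run_job(b"print('hi')", "numpy==1.23"))  # warm
```

This is what makes the run/tweak/run-again loop fast on EC2 or Lambda, and it's exactly the step Fargate can't skip.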
I'm seeing a lot of projects that aim to bridge the usability gap between the real world and the bloated, overcomplicated cloud services.
Another signal of cloud rot: I see myself and my peers migrating away from AWS to smaller, less complicated, cheaper providers like Linode, Hetzner, and Wasabi.
Nowadays the cloud fatigue is higher than the burden of self hosting your services.
Great to hear you mention that. PyWren was one of our inspirations. Other inspirations are its successor NumPyWren [1], and gg [2].
We've looked at the code for PyWren and in our opinion it's not practically usable as-is, even if you only wanted to target Lambda. Also, we initially focused more on the deployment aspect (i.e. getting the environment + code onto the target machines reproducibly) and on EC2, because we figured that to make this general enough, people would need an escape hatch anyway if Lambda didn't cut it for some reason.
One pro is less overhead--I think all of the major options for an autoscaling Kubernetes cluster require at least one node that's always on. Meadowrun uses AWS Lambda/Azure Functions to manage instances, so it gets a lot closer to truly scaling down to zero.
Another pro is if your workflows aren't already container-based, not running on Kubernetes means we can build your containers for you on Meadowrun so you don't need to e.g. install Docker locally to get your libraries/code running on Meadowrun (it's hard to build containers in Kubernetes itself).
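Conceptually, the build-your-containers-for-you step is templating an image definition from a dependency list and building it on the remote side, so the user never touches Docker locally. A loose sketch of the templating half (illustrative only, not how Meadowrun actually builds images):

```python
# Loose sketch: turn a dependency list into a Dockerfile, the kind of
# thing a tool can do server-side so users don't need Docker locally.
# Illustrative only, not Meadowrun's actual image-building code.

def dockerfile_for(python_version, requirements):
    reqs = " ".join(requirements)
    return "\n".join([
        f"FROM python:{python_version}-slim",
        f"RUN pip install --no-cache-dir {reqs}",
        "COPY . /app",
        "WORKDIR /app",
    ]) + "\n"

print(dockerfile_for("3.10", ["numpy==1.23.0", "pandas==1.4.2"]))
```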
I mentioned this in another comment, but this also means we can e.g. use AWS Lambda as the compute layer, or if you have software that's hard to containerize, you can even use a custom AMI. (Both of these are features on the roadmap, so this is a bit theoretical at this point.)
The biggest con is probably that a lot of people already use Kubernetes, especially if they have an on-prem/hybrid deployment, or maybe if they have services with e.g. a load balancer that interact with their ad-hoc/batch jobs.
We are planning on adding the ability for Meadowrun to target Kubernetes as well, so Kubernetes takes care of the resource scheduling, but you still get the benefits of Meadowrun--a really simple API for running ad-hoc/batch jobs.
this is fantastic! simple map/reduce type abstractions over ec2 spot, lambda, and s3 are definitely the play. this is going to help a lot of people use more cheap vcpu billed by the second.
i4i instances recently launched. so much fast local disk. so much bandwidth to s3. needs more data processing.
We welcome any and all feedback!