Something that turned me off of GPU lambda services is that they don't offer a way to run everything locally. I have an instance with a GPU where I do my dev. I'm running Postgres, the front end, back end (Node), and my GPU worker thread (Python) on that box. Replicate and other offerings do not let me run the full hosted environment on my machine (you can run the service container but it behaves differently as there's no included router/scheduler). It feels wrong to use a magically instantiated box for my GPU worker.
Really all that I want is for Render.com to have GPU instances (I saw fly.io now has GPUs which is great, but I've both heard bad things about their stability and don't care for their rethinking of server architecture). Please will someone give me PaaS web hosting with GPU instances?
I'm a simple man. If there's a fundamental shift in hosting philosophy, I will resist that change. I have loved Docker and PaaS as revolutions in development and hosting experiences because the interface is, at some level, still just running Linux processes like I do on my computer. You can tell me that my code is now hosted in a serverless runtime. But you need to give me that runtime so that I can spin it up on my own computer, on EKS, or wherever else if need be.
Lambda is fundamentally a request/response architecture and is meant to be tied together with several other AWS services. As such, I don't think Modal's offering is really comparable, nor is "lambda on hard mode" a particularly good description for what they've made.
I've deployed lambdas written in Rust, mostly because I needed good C interop and didn't want to mess around with C++ on AWS (been down that road, without native JSON support it's a pain).
The Rust Lambda SDK is just fine. You can write your endpoint with, e.g., Axum, then deploy it straightaway.
I run a few full-fledged APIs with just one lambda in this way. It's not hard mode at all.
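Roughly, the whole thing can look like this (a minimal sketch, assuming recent versions of the axum, tokio, and lambda_http crates; the route and handler names are just placeholders):

    use axum::{routing::get, Router};
    use lambda_http::{run, Error};

    // Plain Axum handler; nothing Lambda-specific about it.
    async fn hello() -> &'static str {
        "hello from lambda"
    }

    #[tokio::main]
    async fn main() -> Result<(), Error> {
        // Build the router exactly as you would for any Axum app...
        let app = Router::new().route("/", get(hello));
        // ...then hand it to the Lambda adapter, which translates
        // API Gateway / Function URL events into HTTP requests for the router.
        run(app).await
    }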
The work that has gone into Rust and AWS Lambda integration makes it a joy to work with: fast developer experience, and a fast, cheap runtime; Rust in Lambda is fast anyway.
Like yourself, I use Axum and pretend AWS Lambda doesn't exist. By default I use a standard HTTP server for local development, and I have an environment variable to toggle lambda mode when running in Lambda. If I wanted to, I could run the app anywhere: Lambda, EC2, EKS, Fargate, or a third-party VPS/server.
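Something like this, as a sketch (same axum/lambda_http assumptions as the snippet above; RUN_IN_LAMBDA is a made-up variable name for illustration, and you could just as well key off AWS_LAMBDA_RUNTIME_API, which the Lambda runtime sets):

    use axum::{routing::get, Router};
    use lambda_http::Error;

    async fn hello() -> &'static str {
        "hello"
    }

    #[tokio::main]
    async fn main() -> Result<(), Error> {
        let app = Router::new().route("/", get(hello));

        // RUN_IN_LAMBDA is a hypothetical toggle; any env var works.
        if std::env::var("RUN_IN_LAMBDA").is_ok() {
            // Lambda mode: let the runtime adapter drive the app.
            lambda_http::run(app).await
        } else {
            // Local / EC2 / EKS / VPS mode: plain HTTP server.
            let listener = tokio::net::TcpListener::bind("0.0.0.0:3000").await?;
            axum::serve(listener, app).await?;
            Ok(())
        }
    }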
Using Cargo Lambda and its associated AWS CDK plugin, it cross-compiles to ARM64, sets up AWS stacks with databases and other resources, and removes a bunch of manual steps.
The only thing that's inconvenient is that you cannot use custom domains with Function URLs. If you need a vanity name, you have to go via API Gateway and its associated costs. That's by design, though.
This, plus the fact that you can dockerize and run on Lambda, means you can run most anything these days (most things I've encountered are reasonably easy to dockerize; I'm sure there are exceptions, but in the main it's easy).
I'm curious about latency, cold and warm, using docker. I have a dockerized number cruncher and it's a breeze to maintain, and I'm thinking of moving everything over. What's your experience?
My understanding is that cold starts on containerized Lambdas are actually better than non-containerized for some workloads, because using containers allows Lambda to do better caching of the code, as well as lazy loading. YMMV of course based on exactly what image you use (e.g. if you're not using a common base like Ubuntu or Amazon Linux, you won't get as much benefit from the caching) and how much custom code you have (like hundreds of MB worth).
I never had a case where cold starts mattered, because either 1) it was the kind of service where cold starts intrinsically didn't matter, or 2) we generally had > 1 req / 15 min, meaning we always had something warm.
3) Also, you can pay for provisioned concurrency[1] if cold starts make it worth the money, though also look into Fargate[2] if that's the case.
There are lots of kinds of containerization too, btw; if I'm not mistaken, AWS has invested a lot in Firecracker: https://firecracker-microvm.github.io/
Docker adds a bit of cold start time over native (zipped) deployment. That said, Rust is so much faster than the scripted languages that it's still much faster than what most are doing.
> been down that road, without native JSON support it's a pain
Did rust get native JSON support in the year since I last used it?
If you need JSON support in C++, nlohmann::json is the de facto standard, just like serde would be for Rust.
Now, if you just aren't adept at C++ build tooling, that's a fine reason to use Rust for this, but "because there is no JSON support" definitely isn't a valid one.
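(For the Rust half of that comparison, a minimal serde/serde_json sketch; the Event struct is just an example type:)

    use serde::{Deserialize, Serialize};

    // Derive macros generate the (de)serialization code for this example type.
    #[derive(Serialize, Deserialize, Debug)]
    struct Event {
        id: u64,
        name: String,
    }

    fn main() -> Result<(), serde_json::Error> {
        // Parse a JSON string into the struct, then serialize it back out.
        let event: Event = serde_json::from_str(r#"{"id": 1, "name": "demo"}"#)?;
        println!("round-trip: {}", serde_json::to_string(&event)?);
        Ok(())
    }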
You are using JSON. Performance stopped being an option about three decisions before choosing nlohmann.
If you cared about performance, a JSON parser wouldn't be on your list, and if it is, it's a relatively minor part of the product stack; so once again, use the thing that works and is popular.
If your primary means of communication is JSON, you are likely optimising a little too hard by looking for the most performant parser implementation; good enough is good enough there. If you want performance, pick a different format.
The single largest contributor to performance degradation on websites is that very industry.
Look, JSON has its advantages and is a fine tradeoff. Performance isn't something it's good at, but that doesn't make it a bad format; it's just that if you want to optimise for performance, I would start by reducing the sheer amount of redundant data being passed around in a JSON blob long before I would hyper-optimise a C++ JSON parser.
Sure, if you are using a JS or Python JSON parser, there are massive gains to be had by calling into a lower-level language to do the parsing, but picking between the available C++ parsers is probably bikeshedding.
Now, if your use case truly needs the absolute most performant JSON parser and you will trade off usability and portability for it, then sure; but another one of those axioms applies: the solution for the 99th percentile is rarely the correct solution for the 50th percentile.
Function URLs aren't part of Lambda; they're just a thin abstraction around API Gateway v2 (HTTP APIs) that allows all calls and has randomly generated domains, so you're not gaining anything and losing some functionality by doing this instead of running an API GW with lambda proxy integration yourself. If setting up API GW is too difficult, you could use SAM or the Serverless Framework to provision it automatically. Then you can have a real domain, SSL, failover, endpoint validation, etc.
>so you’re not gaining anything and losing some functionality by doing this instead of running an API GW
You're gaining the fact that Function URLs are free while APIGW can be pretty costly, as well as the fact that Function URLs are fantastically less complex than APIGW if your use case fits them.
I think they're only an asynchronous invocation in that case, though? The reason they do that is they don't want your connections holding a port open for 15 minutes.
(Author) Modal tackles how to make FaaS work, but for actual _function calls_, and also with containers that have much higher resource caps (see article: 3 CPUs vs 64 CPUs, or 10 GB RAM vs 336 GB RAM).
EC2 isn't the same compute shape. We run fast-booting (think seconds, not minutes), dynamically sandboxed containers on a single host (think gVisor, Firecracker), with optimized file system lookups (FUSE, distributed caching, readahead, profiling). It also means we bill by the CPU cycle, scale rapidly, and charge you only for 100% utilization. You do not manage individual VMs.
This is why scaling the limits of functions-as-a-service is quite different from scaling VMs, and that's what the content of the article focuses on.
I've been transitioning some compute-heavy workloads from Lambda/AWS Batch to Modal recently and have nothing but good things to say about it. One of those technologies where you are shipping the same afternoon as checking it out. "Wow, that just works?" Highly highly recommended, feels like the future IMO.
As the other commenter points out, this offering isn't quite comparable to Lambda directly. This ends up comparing apples to oranges here and there, but overall I was able to get a good idea of the choices made and the tradeoffs involved. Nice work! I do have a complaint about the comparison table that shows 'convert HTTP to function calls' as an alternative to load balancers and reverse proxies: as we see later in the article, there is still a load balancer involved, and that table creates a false impression that there isn't.
(Author) Thanks for the feedback! Yes, there's still a network load balancer involved — we debated whether we should include the last section of the post but decided it's still worth writing out. The idea is that the load balancer doesn't need to directly connect to the running machines / containers, but instead just a much smaller set of `modal-http` service pods that translate HTTP to function calls, which lets us deploy and service big apps a lot faster. :)
Writing a good network load balancer is very hard and we don't intend to suggest that it's our forte. We very much defer to all of the existing systems literature on this (and a couple papers like Maglev are in the post.)
I don’t get it. A lot of this post is describing how they translate HTTP requests into function calls, and deal with issues around the HTTP protocol.
If you're invoking a heavy-duty, long-running function then you're likely doing something bespoke. Why use HTTP at all? Wouldn't gRPC be a better fit? That seems to be what is being reinvented here.
The selling point of HTTP is that it's ubiquitous and simple. But you're coupling it with an offering that is specific and complex. Is using a gRPC library such a burden that it makes this effort worthwhile?
If only there was some kind of long-lived proxy you could put in front of things that would handle connection pooling and would forward requests onto a short-lived backend, scaling up instances as needed to serve those requests.
This is shockingly low, and I wouldn't believe it without data. 16 Mbps (2 MB/s) would be more believable. In my experience you can reliably get 25 MB/s (~200 Mbps) in the network layer of things in AWS.
You're correct, it is 2 MB/s. The actual bandwidth from the AWS Lambda docs is:
>Uncapped for the first 6 MB of your function's response. For responses larger than 6 MB, 2MBps for the remainder of the response
Some of the other numbers in the article are also incorrect. Lambda functions using containers can use a 10 GB container image (the article claims 50 MB), and container images are actually the faster/preferred way to do it these days.
The services compared aren't really equivalents. Cloudflare Workers is more like Lambda@Edge; in particular, I just can't imagine a reason you'd need it for the NN training/video processing/background job type of tasks the article mentions. And I'm less familiar with Google Cloud Run, but isn't it more like App Runner or Fargate?
For workloads that take minutes to run, it's very easy to hit the max socket connection limit (somaxconn). How do you handle that with just a request/response model?