Something that turned me off of GPU lambda services is that they don't offer a way to run everything locally. I have an instance with a GPU where I do my dev. I'm running Postgres, the front end, back end (Node), and my GPU worker thread (Python) on that box. Replicate and other offerings do not let me run the full hosted environment on my machine (you can run the service container but it behaves differently as there's no included router/scheduler). It feels wrong to use a magically instantiated box for my GPU worker.
Really all that I want is for Render.com to have GPU instances (I saw fly.io now has GPUs which is great, but I've both heard bad things about their stability and don't care for their rethinking of server architecture). Please will someone give me PaaS web hosting with GPU instances?
I'm a simple man. If there's a fundamental shift in hosting philosophy, I will resist that change. I have loved Docker and PaaS as revolutions in development and hosting experiences because the interface is, at some level, still just running Linux processes like I do on my computer. You can tell me that my code is now hosted in a serverless runtime. But you need to give me that runtime so that I can spin it up on my own computer, on EKS, or wherever else if need be.
Lambda is fundamentally a request/response architecture and is meant to be tied together with several other AWS services. As such, I don't think Modal's offering is really comparable, nor is "lambda on hard mode" a particularly good description for what they've made.
I've deployed lambdas written in Rust, mostly because I needed good C interop and didn't want to mess around with C++ on AWS (been down that road, without native JSON support it's a pain).
The Rust Lambda SDK is just fine. You can write your endpoint with, e.g., Axum, then deploy it straightaway.
I run a few full-fledged APIs with just one lambda in this way. It's not hard mode at all.
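Roughly, the whole thing can look like this (a minimal sketch, assuming recent versions of the axum, tokio, and lambda_http crates; the route and handler names are just placeholders):

    use axum::{routing::get, Router};
    use lambda_http::{run, Error};

    // Plain Axum handler; nothing Lambda-specific about it.
    async fn hello() -> &'static str {
        "hello from lambda"
    }

    #[tokio::main]
    async fn main() -> Result<(), Error> {
        // Build the router exactly as you would for any Axum app...
        let app = Router::new().route("/", get(hello));
        // ...then hand it to the Lambda adapter, which translates
        // API Gateway / Function URL events into HTTP requests for the router.
        run(app).await
    }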
The work that has gone into Rust and AWS Lambda integration makes it a joy to work with: fast developer experience, and a fast, cheap runtime; Rust in Lambda is fast anyway.
Like yourself, I use Axum and pretend AWS Lambda doesn't exist. By default I use a standard HTTP server for local development, and I have an environment variable to toggle lambda mode when running in Lambda. If I wanted to, I could run the app anywhere: Lambda, EC2, EKS, Fargate, or a third-party VPS/server.
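Something like this, as a sketch (same axum/lambda_http assumptions as the snippet above; RUN_IN_LAMBDA is a made-up variable name for illustration, and you could just as well key off AWS_LAMBDA_RUNTIME_API, which the Lambda runtime sets):

    use axum::{routing::get, Router};
    use lambda_http::Error;

    async fn hello() -> &'static str {
        "hello"
    }

    #[tokio::main]
    async fn main() -> Result<(), Error> {
        let app = Router::new().route("/", get(hello));

        // RUN_IN_LAMBDA is a hypothetical toggle; any env var works.
        if std::env::var("RUN_IN_LAMBDA").is_ok() {
            // Lambda mode: let the runtime adapter drive the app.
            lambda_http::run(app).await
        } else {
            // Local / EC2 / EKS / VPS mode: plain HTTP server.
            let listener = tokio::net::TcpListener::bind("0.0.0.0:3000").await?;
            axum::serve(listener, app).await?;
            Ok(())
        }
    }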
Using Cargo Lambda and its associated AWS CDK plugin, it cross-compiles to ARM64, sets up AWS stacks with databases and other resources, and removes a bunch of manual steps.
The only thing that's inconvenient is that you cannot use custom domains with Function URLs. If you need a vanity name, you have to go via API Gateway and its associated costs. That's by design, though.
This, plus the fact that you can dockerize and run on Lambda, means you can run most anything these days (most things I've encountered are reasonably easy to dockerize; I'm sure there are exceptions, but in the main it's easy).
I'm curious about latency, cold and warm, using docker. I have a dockerized number cruncher and it's a breeze to maintain, and I'm thinking of moving everything over. What's your experience?
My understanding is that cold starts on containerized Lambdas are actually better than non-containerized for some workloads, because using containers allows Lambda to do better caching of the code, as well as lazy loading. YMMV of course based on exactly what image you use (e.g. if you're not using a common base like Ubuntu or Amazon Linux, you won't get as much benefit from the caching) and how much custom code you have (like hundreds of MB worth).
I never had a case where cold starts mattered, because either 1) it was the kind of service where cold starts intrinsically didn't matter, or 2) we generally had > 1 req / 15 min, meaning we always had something warm.
3) Also, you can pay for provisioned concurrency[1] if cold starts make it worth the money, though also look into Fargate[2] if that's the case.
There are lots of kinds of containerization too, btw; if I'm not mistaken, AWS has invested a lot in Firecracker: https://firecracker-microvm.github.io/
Docker adds a bit of cold start time over native (zipped) deployment. That said, Rust is so much faster than the scripted languages that it's still much faster than what most are doing.
> been down that road, without native JSON support it's a pain
Did rust get native JSON support in the year since I last used it?
If you need JSON support in C++, nlohmann::json is the de facto standard, just like serde would be for Rust.
Now, if you just aren't adept at C++ build tooling, that's a fine reason to use Rust for this, but "because there is no JSON support" definitely isn't a valid one.
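(For the Rust half of that comparison, a minimal serde/serde_json sketch; the Event struct is just an example type:)

    use serde::{Deserialize, Serialize};

    // Derive macros generate the (de)serialization code for this example type.
    #[derive(Serialize, Deserialize, Debug)]
    struct Event {
        id: u64,
        name: String,
    }

    fn main() -> Result<(), serde_json::Error> {
        // Parse a JSON string into the struct, then serialize it back out.
        let event: Event = serde_json::from_str(r#"{"id": 1, "name": "demo"}"#)?;
        println!("round-trip: {}", serde_json::to_string(&event)?);
        Ok(())
    }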
You are using JSON. Performance stopped being an option about three decisions before choosing nlohmann.
If you cared about performance, a JSON parser wouldn't be on your list, and if it is, it's a relatively minor part of the product stack; so once again, use the thing that works and is popular.
If your primary means of communication is JSON, you are likely optimising a little too hard by looking for the most performant parser implementation; good enough is good enough there. If you want performance, pick a different format.
The single largest contributor to performance degradation on websites is that very industry.
Look, JSON has its advantages and is a fine tradeoff. Performance isn't something it's good at, but that doesn't make it a bad format; it's just that if you want to optimise for performance, I would start by reducing the sheer amount of redundant data being passed around in a JSON blob long before I would hyper-optimise a C++ JSON parser.
Sure, if you are using a JS or Python JSON parser, there are massive gains to be had by calling into a lower-level language to do the parsing, but picking between the available C++ parsers is probably bikeshedding.
Now, if your use case truly needs the absolute most performant JSON parser and you will trade off usability and portability for it, then sure; but another one of those axioms applies: the solution for the 99th percentile is rarely the correct solution for the 50th percentile.
Function URLs aren't part of Lambda; they're just a thin abstraction around API Gateway v2 (HTTP APIs) that allows all calls and has randomly generated domains, so you're not gaining anything and losing some functionality by doing this instead of running an API GW with lambda proxy integration yourself. If setting up API GW is too difficult, you could use SAM or the Serverless Framework to provision it automatically. Then you can have a real domain, SSL, failover, endpoint validation, etc.
>so you’re not gaining anything and losing some functionality by doing this instead of running an API GW
You're gaining the fact that Function URLs are free while APIGW can be pretty costly, as well as the fact that Function URLs are fantastically less complex than APIGW if your use case fits them.
I think they're only an asynchronous invocation in that case, though? The reason they do that is they don't want your connections holding a port open for 15 minutes.
(Author) Modal tackles how to make FaaS work, but for actual _function calls_, and also with containers that have much higher resource caps (see article: 3 CPUs vs 64 CPUs, or 10 GB RAM vs 336 GB RAM).
EC2 isn't the same compute shape. We run fast-booting (think seconds, not minutes), dynamically sandboxed containers on a single host (think gVisor, Firecracker), with optimized file system lookups (FUSE, distributed caching, readahead, profiling). It also means we bill by the CPU cycle, scale rapidly, and charge you only for 100% utilization. You do not manage individual VMs.
This is why scaling the limits of functions-as-a-service is quite different from scaling VMs, and that's what the content of the article focuses on.
I've been transitioning some compute-heavy workloads from Lambda/AWS Batch to Modal recently and have nothing but good things to say about it. One of those technologies where you are shipping the same afternoon as checking it out. "Wow, that just works?" Highly highly recommended, feels like the future IMO.
As the other commenter points out, this offering isn't quite comparable to Lambda directly. This ends up comparing apples to oranges here and there, but overall I was able to get a good idea of the choices made and the tradeoffs involved. Nice work! I do have a complaint about the comparison table that shows 'convert HTTP to function calls' as an alternative to load balancers and reverse proxies: as we see later in the article, there is still a load balancer involved, and that table creates a false impression that there isn't.
(Author) Thanks for the feedback! Yes, there's still a network load balancer involved — we debated whether we should include the last section of the post but decided it's still worth writing out. The idea is that the load balancer doesn't need to directly connect to the running machines / containers, but instead just a much smaller set of `modal-http` service pods that translate HTTP to function calls, which lets us deploy and service big apps a lot faster. :)
Writing a good network load balancer is very hard and we don't intend to suggest that it's our forte. We very much defer to all of the existing systems literature on this (and a couple papers like Maglev are in the post.)
I don’t get it. A lot of this post is describing how they translate HTTP requests into function calls, and deal with issues around the HTTP protocol.
If you're invoking a heavy-duty, long-running function then you're likely doing something bespoke. Why use HTTP at all? Wouldn't gRPC be a better fit? That seems to be what is being reinvented here.
The selling point of HTTP is that it's ubiquitous and simple. But you're coupling it with an offering that is specific and complex. Is using a gRPC library such a burden that it makes this effort worthwhile?
If only there was some kind of long-lived proxy you could put in front of things that would handle connection pooling and would forward requests onto a short-lived backend, scaling up instances as needed to serve those requests.
This is shockingly low, and I wouldn't believe it without data. 16 Mbps (2 MB/s) would be more believable. In my experience you can reliably get 25 MB/s (~200 Mbps) in the network layer of things in AWS.
You're correct, it is 2 MB/s. The actual bandwidth from the AWS Lambda docs is:
>Uncapped for the first 6 MB of your function's response. For responses larger than 6 MB, 2MBps for the remainder of the response
Some of the other numbers in the article are also incorrect. Lambda functions using containers can use a 10 GB container image (the article claims 50 MB), and container images are actually the faster/preferred way to do it these days.
The services compared aren't really equivalents. Cloudflare Workers is more like Lambda@Edge; in particular, I just can't imagine a reason you'd need it for the NN training/video processing/background job type of tasks the article mentions. And I'm less familiar with Google Cloud Run, but isn't it more like App Runner or Fargate?
For workloads that take minutes to run, it's very easy to hit the max socket connection limit (somaxconn). How do you handle that with just a request/response model?