Vercel Fluid Compute

rpicard · 2025-02-16T13:57:50 1739714270

This isn’t my area of expertise, but it sounds like the core innovation is that one compute worker can handle multiple requests like a traditional server would, but they’ll scale down a lot faster than you’d get with an EC2 instance for example.

Is that right?

cramforce · 2025-02-16T18:16:56 1739729816

That's 1 part, yes!

Part 2 is that you can also use an actual server if your workloads happens to be predictable (or is partly predictable). That gives you better cost efficiency for that part of the workload

Displaimer: CTO of Vercel here

rpicard · 2025-02-16T20:58:16 1739739496

Love it! When I first learned about serverless as a vibe years ago it didn’t click, but it’s so clear to me that abstracting compute fully is a huge win as long as you can safely assume things about the way your code will run.

It’s interesting to see so many companies coming at it from different angles. Fly, Vercel, Cloudflare, Northflank, Temporal? etc.

Pretty much all the code I need for my work can run in GitHub Actions, so I’m not so much the target, but still enjoy watching the development.

If you can answer, what’s the overall initiative here for Vercel? Own all the compute, as opposed to just the front end ish things?

cramforce · 2025-02-17T18:49:10 1739818150

The initiative is really focused on our current workloads like APIs, SSR, etc.

It definitely makes Vercel suitable for a broader set of workloads, though!

tuananh · 2025-02-16T14:14:19 1739715259

i'm curious how it work.

because provider like aws lambda wont give you vertical scaling since they can't guarantee resources request.

so i guess they are mixing it to a pool (serverless + microvm) ?

cramforce · 2025-02-16T18:18:32 1739729912

The big difference is how the microvm is utilized. Lambda reserves the entire VM to handle a request end to end. Fluid can use a VM for multiple concurrent requests. Since most workloads are often idle waiting for IO, this ends up being much more efficient.

Displaimer: CTO of Vercel here

Rutledge · 2025-02-17T09:10:43 1739783443

The concurrent request handling seems great for our AI eval workloads, where we're waiting for LLM API calls and DB operations but curious how Vercel handles potential noisy neighbor issues when one request consumes excessive CPU/memory?

Disclosure: CEO of Scorecard- AI eval platform, current Vercel customer. Intrigued since most of our time serverless time is spent waiting for model responses, but cautious about 'magic' solutions.

schniz · 2025-02-17T18:42:06 1739817726

We built Fluid with noisy neighbors(=requests to the same instance) in mind. So because we are a data-driven team, we

1. track metrics and have our own dashboards to ensure we proactively understand and act whenever something like that happens 2. also use these metrics in our routing to smartly know when to scale up. we have tested a lot of variations of all the metrics we gather and things are looking good

anyway, the more workload types we will host with this system, the more we know and the better/performant it will get. we're running this for a while now, and it shows great results.

there's no magic, just data coming from a complex system, fed into a fairly complex system!

hope that answers the question, and thanks for trusting us

alex12de · 2025-02-19T04:13:09 1739938389

So if undertood 1. correctly I could use this solution to potencially save money, but it could turn into a nigthmare very quickly if you guys aren't watching?

Rutledge · 2025-02-19T05:42:36 1739943756

Yes quite helpful- thanks for explaining and will try it out!

tuananh · 2025-02-17T13:08:28 1739797708

i think the majority of Vercel customers are doing web site hosting & most of the web requests are IO bound so it makes sense to handle multiple requests per microvm.

can't say the same if customer is doing CPU bound workload.

tuananh · 2025-02-17T02:29:37 1739759377

but then fluid will break that resources request requirement right?