The best cold starts are those which aren't noticed by the user. For my blog search (which runs on Lambda), I found a nice way of achieving that [1]: as soon as a user puts the focus on the search input field, the page already submits a "ping" request to Lambda. Then, when they submit the actual query, they hit an already running Lambda most of the time.
And, as others said, assigning more RAM to your Lambda than it actually needs will also help with cold start times, as this increases the assigned CPU share, too.
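For anyone curious, a rough sketch of that focus-ping pattern (the endpoint name and wiring are made up here, not the author's actual setup): the browser fires a throwaway request as soon as the search box gains focus, and the handler returns immediately when it sees the ping flag, so the billed time is negligible.

    // Browser side: warm the function as soon as the user focuses the search box.
    const searchInput = document.querySelector("#search") as HTMLInputElement;
    let pinged = false;
    searchInput.addEventListener("focus", () => {
      if (pinged) return;                          // one ping per page view is enough
      pinged = true;
      fetch("/api/search?ping=1").catch(() => {}); // fire-and-forget
    });

    // Lambda side (Node.js handler behind API Gateway or a function URL):
    export const handler = async (event: { queryStringParameters?: Record<string, string> }) => {
      if (event.queryStringParameters?.ping) {
        return { statusCode: 204, body: "" };      // cold start paid for; exit immediately
      }
      // ... real search logic ...
      return { statusCode: 200, body: JSON.stringify({ results: [] }) };
    };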
PC (provisioned concurrency) doesn't make that much sense for a use case like mine, where absolute cost minimization is the goal. My request volume (including ping requests) easily fits into the Lambda free tier, whereas with PC I'd be looking at some cost.
More generally, I feel PC kinda defeats the purpose of Lambda to begin with. If I'm in that range of continuous load, I'd rather look at more traditional means of provisioning and hosting this application.
I think you are thinking of reserved concurrency - with reserved concurrency you can only run as many instances as you have reserved. With provisioned concurrency you can run as many as your quota allows, but you are guaranteed to be able to handle as many concurrent executions as you have provisioned. Neither of these defines the number of instances running; it just reserves a portion of quota so that they can run, so it's not going to help you with cold start times. Where provisioned/reserved concurrency comes in useful is keeping a runaway Lambda function from starving out other functions, either by guaranteeing a portion of the quota for the other functions or by capping a function's concurrent executions.
The "ping" technique you mentioned is one way to keep a function warm but if lambda decides to start a second instance of the function because the hot one is handling a request, then that person is going to take a warm up hit and nothing you can do about that.
If you are really latency sensitive then Lambda might not be the right choice for you. You can get a t4g.nano spot instance for about $3.50/month and keep that warm, but that is probably a whole lot more than you are paying for Lambda.
Reserved concurrency both guarantees that a portion of your concurrency limit is allocated to a Lambda and caps that Lambda's concurrency at that portion. Reserved concurrency has no cost associated with it.
Provisioned concurrency keeps a certain number of execution environments warm for your use. Provisioned concurrency costs money.
PC is a good way to get certain performance guarantees. It seems like your solution fits into the free tier, so it's probably a bad idea for you.
Provisioned concurrency is a bit of a non-starter for cold start latency reduction in user-facing applications. It's a tool in the toolbox, but the problem is: let's say you set the provisioned concurrency at 10, then you have 11 concurrent requests come in. That 11th request will still get a cold start. PC doesn't scale automatically.
The ping architecture of warming up functions does scale (better) in this setup. It's not perfect, but nothing on Lambda is; it's possible that the ping request for User2 gets routed to a live function, which finishes, then starts handling a real request for User1. User2's real request comes in, but this function is busy, so Lambda cold starts another one.
That being said, these ping functions can be cheap relative to PC. PC is not cheap; it's ~$10/month per GB per function.
With the ping architecture, you'd generally just invoke then immediately exit, so there's very little billed execution time. For sparsely executed functions, the ping architecture is better because PC is billed 24/7/365, whereas ping is billed per invoke (that being said: PC is a much cleaner solution). For high volume functions, PC may be cheaper, but it also won't work as well; if your concurrent executions are pretty stable, then it's a great choice, but if it's spiky high volume (as most setups are!) then it's hard to find a number which balances both cost and efficacy.
A setup which also includes Application Auto Scaling, to automatically adjust the PC level with time of day or system load or whatever, would be a better setup. But, of course, more work.
The other frustrating thing about PC is its price relative to a CloudWatch Events cron warming setup. Most sources say Lambdas stay warm for ~5 minutes, which means you only need ~15,000 invocations/month per "faux provisioned concurrency" level. Theorycrafting at 100ms/invocation and 512MB: that's ~$0.012/month. PC is ~500 times more expensive than this. This kind of cron-based ping setup is very easy when PC=1; above 1 it gets harder, but not impossible: you have a cron configured to execute a "proxy" Lambda, which then concurrently invokes the real Lambda N times, depending on what level of PC you more-or-less want.
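For what it's worth, a rough sketch of that proxy idea with the JavaScript SDK v3 (the function name, warm level, and payload shape are all made up): a scheduled rule triggers the proxy, which invokes the real function N times in parallel; the target has to hold each warm ping for a moment so the invocations actually overlap and force N separate instances.

    import { LambdaClient, InvokeCommand } from "@aws-sdk/client-lambda";

    const lambda = new LambdaClient({});
    const TARGET = process.env.TARGET_FUNCTION ?? "my-real-function"; // hypothetical
    const LEVEL = Number(process.env.WARM_LEVEL ?? "3");              // "faux PC" level

    // Proxy function, triggered by a CloudWatch Events / EventBridge schedule.
    export const handler = async () => {
      const pings = Array.from({ length: LEVEL }, () =>
        lambda.send(new InvokeCommand({
          FunctionName: TARGET,
          InvocationType: "RequestResponse",    // synchronous, so the calls overlap
          Payload: Buffer.from(JSON.stringify({ warm: true })),
        })),
      );
      await Promise.all(pings);
    };

    // In the real function, short-circuit warm pings but linger briefly so that
    // concurrent pings can't all be served by a single instance, e.g.:
    //   if (event.warm) { await new Promise(r => setTimeout(r, 250)); return "warm"; }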
It's also worth noting that PC cannot be maintained on the $LATEST function version. It can only be maintained on an explicit function version or alias (which does not point to $LATEST). So there's just more management overhead there; you can't just have a CI deploy functions and let AWS do everything else. You have to configure the CI to publish new versions, delete the existing PC config, and create a new PC config, and maybe adjust autoscaling settings as well if you configured them. All automatable, but definitely in the domain of "CloudOps Engineer Job Security"; anyone who tells you AWS will make your life easier is probably working for AWS.
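As a hedged sketch of that CI step with the JavaScript SDK v3 (names, the PC level, and the version bookkeeping are placeholders; the exact flow depends on whether you attach PC to a version or an alias): publish the newly deployed code as a version, drop the old PC config, and create one against the new version.

    import {
      LambdaClient, PublishVersionCommand,
      DeleteProvisionedConcurrencyConfigCommand, PutProvisionedConcurrencyConfigCommand,
    } from "@aws-sdk/client-lambda";

    const lambda = new LambdaClient({});
    const FunctionName = "my-function";                    // hypothetical
    const previousVersion = process.env.PREVIOUS_VERSION;  // tracked by your CI, if any

    // 1. Publish what was just deployed to $LATEST as an immutable version.
    const { Version } = await lambda.send(new PublishVersionCommand({ FunctionName }));

    // 2. Remove the PC config pointing at the old version (throws if none exists).
    if (previousVersion) {
      await lambda.send(new DeleteProvisionedConcurrencyConfigCommand({
        FunctionName, Qualifier: previousVersion,
      }));
    }

    // 3. Attach provisioned concurrency to the new version (never $LATEST).
    await lambda.send(new PutProvisionedConcurrencyConfigCommand({
      FunctionName, Qualifier: Version, ProvisionedConcurrentExecutions: 5,
    }));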
At least in the OP's case it seems the requests are infrequent. I understand your point about the 11th request, but at that point, what % of your customers are impacted? Tail latency is always an issue; PC is a good way to solve that.
PC is definitely "better". It does require you to opt in to function versioning, but if your ops automation is good enough to make that easy, it's just better. The biggest advantage over warming pings is, really, that it keeps your metrics and logs clean and focused just on user invocations, not on weird faux-invokes that are extremely difficult to filter out.
But its downside is that it's ~100x more expensive than warming pings for infrequently invoked functions (defined as: average concurrent invocations over a month between 0 and 1).
If your company has the money, go for PC. But don't feel bad about going for warming pings; it ain't a bad setup at all.
You increase the usage of the lambda, sure, but not by double unless all this lambda does is respond to the search. There's a point where this is all semantics when running cloud services with regard to how you've deployed your code behind gateways and lambdas, but most applications have obvious triggers that could warm up a lambda at 10% or less overhead.
Lambdas can be dirt cheap if you use them well. But that's not always what happens. I recently saw a report about a contractor who used them for crawling websites. Their client saw a surprise $12k bill for Lambda, when a simple Scrapy crawler on a low-end instance would have cost them < $100/month for the same load.
Why? Because if you spin up a ton of Lambda invocations and have them sit around waiting for the network, you pay for each CPU to sit idle, rather than having just one CPU stay busy managing a bunch of async IO.
The same result could be achieved by paying tons of $$ for a high-end machine with dozens of cores, just to scrape one URL per core.
They could have implemented async IO within just a few Lambdas - heck, even one Lambda could certainly handle hundreds, if not thousands, of async jobs at once, just like a cheap machine, as you pointed out.
They treated Lambda functions as threads or async jobs, when they should be thought of more like CPU cores.
I'm not sure I understand the point about asyncIO: yeah, the Lambda CPU isn't spinning, but don't they count actual runtime? So even if things are async, the Lambda still needs to be up to respond to the event when the async IO is finished.
Or do you mean handling things in an async way and Lambda passing the info on to another computing environment?
I think what they're talking about is using one Lambda that persists and handles many crawl requests. That doesn't make a ton of sense, as each Lambda has to make sure to push all its results and shut down before the max timeout is up, but it would technically work and be more efficient than what I described. Much better, though, to just use long-running spot instances, or a regular instance running something like Scrapy.
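For illustration, a minimal sketch of the "one invocation, many crawl jobs" idea (the event shape is invented): the function fans its network waits out with Promise.all instead of spending one invocation per URL.

    // One invocation crawls a whole batch; the single CPU mostly waits on IO.
    export const handler = async (event: { urls: string[] }) => {
      const results = await Promise.all(
        event.urls.map(async (url) => {
          try {
            const res = await fetch(url);   // global fetch on the Node 18+ runtimes
            return { url, status: res.status, bytes: (await res.text()).length };
          } catch (err) {
            return { url, error: String(err) };
          }
        }),
      );
      // Push results somewhere durable (S3, SQS, a DB) well before the timeout.
      return { crawled: results.length };
    };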
Exactly. These people thought they were using the tech well. They weren't. I like stories of tech gone wrong as examples to learn from. I thought others might as well.
Couldn't this also have been handled efficiently in Lambda had it been configured to use batching? (Assuming it was implemented as async invocations driven by a queue of some sort.)
It does seem unfortunate that Lambda cannot be configured to support concurrent synchronous invocations for serving HTTP requests concurrently.
I guess you're paying a 5x premium for Lambda in the constant usage scenario, a similar premium as on-demand vs spot instances.
4GB m6g.medium $0.0385/hour
4GB Lambda works out at $0.1920/hour
It's been a while since I last used queue-driven autoscaling groups, but https://aws.amazon.com/blogs/compute/scaling-your-applicatio... shows a startup time of about 4 minutes from scratch, or 36s from a paused instance in a 'warm pool', vs less than a second for Lambda in the original article. So the decision ends up coming down to responsiveness vs cost. Presumably Fargate comes somewhere in between.
Thanks, will take a look. Can I run Java, or any kind of binary, there? That's why I'm not running this with CloudFlare workers, for instance. An alternative I'm actually considering is fly.io.
Velo runs only JavaScript, but unlike Cloudflare, Velo runs a full Node.js.
With that in mind, you can do a lot of things, from accessing your database or any other asset on AWS (or GCP or Azure). You can also use other Wix APIs, like the payments API or the Wix Data database.
If you need to access a DB in AWS, isn't the networking overhead on every single call going to dominate all cold start performance gains in a few occasional requests in Velo?
My experience with cold starts in Azure Functions Serverless is pretty awful. Like most other Azure services, their affordable consumer grade offerings are designed from the ground up not to be good enough for "serious" use.
Cold start times compared to Lambda are worse, and in addition, we would get random 404s which do not appear in any logs; inspecting these 404s indicated they were emitted by nginx, leading me to believe that the ultimate container endpoint was killed for whatever reason but that fact didn't make it back to the router, which attempted and failed to reach the function.
Of course the cold starts and 404s are mitigated if you pay for the premium serverless plan, or just host the middleware on a regular App Service plan (basically VMs).
Same experience with Firebase. I just joined a team that has been using it. I've never worked with serverless before, and it boggles my mind how anyone thought it would be a good idea.
The cold starts are horrendous. In one case, it's consistently taking about 7 seconds to return ~10K of data. I investigated the actual runtime of the function and it completes in about 20ms, so the only real bottleneck is the fucking cold start.
Why didn't the Firebase minInstances work for you? I found amazing performance benefits, but at a $ cost. I actually forgot[1] to set minInstances on one function, and a user complained that this particular functionality was slow (compared to the rest of the site). However, it isn't cheap.
You also want to be sure your code is optimized[2]. For example, don't require module packages unless the function needs them, else you're loading unnecessary packages. I usually set a global variable `let fetch;` and in the function that requires fetch initialize it with `fetch = fetch ?? require("node-fetch");`.
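Fleshing that out a little (a generic sketch using the first-gen functions API, with made-up function names): keep heavyweight requires out of the global scope so functions that never use them don't pay the load cost on every cold start.

    // Loaded at cold start by every function in this file - keep this list small.
    const functions = require("firebase-functions");

    // Loaded lazily, only by the function that actually needs it.
    let fetch;

    exports.callWebhook = functions.https.onRequest(async (req, res) => {
      fetch = fetch ?? require("node-fetch");   // first invocation pays the require cost
      const r = await fetch("https://example.com/hook", { method: "POST" });
      res.status(r.status).send("ok");
    });

    exports.ping = functions.https.onRequest((req, res) => {
      res.send("pong");                         // never touches node-fetch, never loads it
    });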
I'm assuming by Firebase you mean Firebase Functions? We have a fairly complex infrastructure running off a combination of Firebase's RTDB, Firestore, and Google Cloud Functions and have never seen anywhere near what you're describing. Are you sure you're experiencing a 7 second "cold start", or is the invocation simply taking 7 seconds to run? Because the latter is far more easily explained.
I've confirmed that the actual execution of the function itself takes ~30ms, and that the time to download is standard (~200ms). That only leaves the cold start; nothing else makes sense.
edit: I even set up a node server on AppEngine itself, copied over the exact code to an endpoint, and it was taking 300-500ms tops (after the initial cold start of the AppEngine server).
I've seen 30s on AWS, so it's not that surprising. They have now improved it greatly though.
And yet I still believe it's a great technology; as always, it's a matter of putting it on the right use case. Message consumption from a queue or topic, and low-traffic, low-criticality APIs, are two great use cases.
No, you've seen 30s on a random implementation running on AWS.
If you write your lambdas without knowing what you're doing then you can't blame the technology for being grossly misused by you.
Case in point: developing AWS Lambdas for the JDK runtime. The bulk of the startup time is not the Lambda at all but the way the Lambda code is initialized. This means clients and auth and stuff. I've worked on JDK Lambdas where cold starts were close to 20s due to their choice of dependency injection framework (Guice, the bane of JDK Lambdas), which was shaved down to half that number by simply migrating to Dagger. I've worked on other JDK Lambdas which saw similar reductions in cold starts just by paying attention to how a Redis client was configured.
Just keep in mind that cold start times represent the time it takes for your own code to initialize. This is not the lambda, but your own code. If you tell the lambda to needlessly run a lot of crap, you can't blame the lambda for actually doing what you set it to do.
That sort of time could easily be reached with Lambdas that require VPC access, where a new ENI needed to be provisioned for each Lambda container. I don't think I've seen 30s, but I could easily see 5-10s for this case. And since VPC access is required to reach an isolated DB that isn't some other AWS service, it isn't that uncommon. I believe they have since improved start times in this scenario significantly.
And yet it magically came down to 10s when amazon improved their system. Specifically it became much faster to join a VPC.
And don't get me wrong: yes I was running some init code but not that much: load config from ssm, connect to a DB. I did bundle a lot of libs that didn't need to be there. But:
- fact is, it took 30s
- the use case didn't need it to be faster so I didn't care much
30s is probably an edge case. Did this use Java/JVM runtime without AOT/GraalVM? I cannot imagine any other runtime that would cause 30s cold start. Care to share more details on this?
That's what I thought. We've spent weeks investigating this one function and tried everything that Firebase recommends in their docs. Nothing has worked.
I'm extremely surprised to hear that. I know that there can be implementation differences, but on the level of application-code, this stuff is super simple. Create a javascript function then upload it. Not really much else to it, so I can't fathom what the difference is between your project and my own.
We're not doing anything crazy, it's just a basic CRUD application with minimal data (entire DB is less than 100MB at this point). And yet we're seeing constant, constant lags of several seconds for almost every single request. I can't explain it.
Never use Basic SKU, plan your network carefully before you even create the vnets, monitor NAT capacity, beware undocumented ARM rate limits. Good luck
> consistent and pervasive security model with Azure AD
Wait, this is the first time I hear this about Azure. Could you elaborate? It is possible that things have improved significantly since I last worked with Azure but lack of a consistent security model (like IAM on AWS) to control human and service (Azure Functions, App Service apps etc) access to specific resources (Cosmos databases, EventHubs etc) especially painful.
some of it is wonky, such as the login model for Postgres on Azure (you create login-capable postgres groups that exactly mirror the name of an Azure AD group, and then the "password" you pass in is actually a JWT proving YOU are in fact a member of that AD group -- so you have to hit a funky endpoint to get a time-limited "password")
I like Azure in general, but Function cold start times are really awful.
I regularly see start up times exceeding 10s for small, dotnet based functions. One is an auth endpoint for a self-hosted Docker registry, and the Docker CLI often times out when logging in if there is a cold start. I'm planning on moving these functions to Docker containers hosted in a VM.
I have other issues with Functions too. If you enable client certificates, the portal UI becomes pretty useless, with lots of stuff inaccessible. I have one such endpoint in production just now, and it's even worse than that, as every now and then it just... stops working until I manually restart it. Nothing useful in the logs either.
Azure Functions cold start times also depend on the underlying tech stack. I was using Python on a Linux host for Slack-related Azure Functions and they ran into timeouts sometimes (which for the Slack API is 3s, I think). After I switched to Node.js on Windows I never got a timeout again.
For the Azure Functions consumption plan this can be mitigated to an extent by just having a keep alive function run inside the same function app (set to say a 3-5 minute timer trigger).
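A minimal sketch of such a keep-alive (classic Node.js programming model; the schedule and folder name are placeholders): a timer-triggered function living in the same function app that does nothing except keep the workers warm.

    // keepAlive/function.json - timer trigger, every 4 minutes:
    // {
    //   "bindings": [
    //     { "name": "timer", "type": "timerTrigger", "direction": "in", "schedule": "0 */4 * * * *" }
    //   ]
    // }

    // keepAlive/index.js
    module.exports = async function (context, timer) {
      // No real work; the invocation itself keeps the consumption plan warm.
      context.log("keep-alive ping at", new Date().toISOString());
    };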
Azure Functions, in my opinion, should mostly be used in cases when you want to do some work over some time every now and then. It will also probably be cheaper to use something else in your case. In later versions of AF you can use a real Startup file to mitigate some life cycle related issues.
The way Azure Functions scale out is different and not entirely suited to the same goal as Lambdas. Lambdas happily scale from 1 to 1000 instances in seconds* (EDIT: not A second), whereas Azure Functions just won't do that.
Last time I tried this was a few years ago, but it seems like it's still the case.
> For an initial burst of traffic, your functions' cumulative concurrency in a Region can reach an initial level of between 500 and 3000, which varies per Region. After the initial burst, your functions' concurrency can scale by an additional 500 instances each minute. This continues until there are enough instances to serve all requests, or until a concurrency limit is reached. When requests come in faster than your function can scale, or when your function is at maximum concurrency, additional requests fail with a throttling error (429 status code).
That's just burst concurrency. It takes a lot longer than 1 second to actually reach the peak burstable limit. So while the Lambdas are scaling up, your clients are either waiting several seconds or getting 429s.
At my last job we built an entire API on top of serverless. One of the things we had to figure out was this cold start time. If a user hit an endpoint for the first time, it would take 2x as long as it normally would. To combat this we wrote a "runWarm" function that kept the API alive at all times.
Sure, it kind of defeats the purpose of serverless, but hey, enterprise software.
That's one of the things that always threw me with complaints about cold starts - how many apps/etc do I use daily where I interact, and there's a 10 second delay before something happens? The answer: quite a lot.
Yeah, we can do better. And in fact, with Serverless, -most users will experience better-. It's only when load is increasing that you see those delays, and then it's still only a delay. Not shed load.
The fact I can experience that delay easily in dev makes people think it's going to be a huge problem, but, A. In real use it probably isn't as common as it is in dev (since you have way more traffic) B. You can design to minimize it (different API endpoints can hit the same lambda and be routed to the right handler there, making it more likely to be hot), C. It forces you to test and plan for worst case from the beginning (rather than at the end where you've built something, and now have to load test it).
Not to say to use it all the time, of course; there are plenty of scenarios where the cost, the delay, etc., are non-starters. But there are also plenty of scenarios where an engineer's instinctual reaction would be "too slow", but in reality it's fine: your p95 is going to look great and only your p99 is going to look bad (on that note, a slow API response accompanied by a spinner is a very different thing, from a UX perspective, than a slow page load with no indication of progress). Even then it's predictable when it happens, and it forces function scale-out rather than tanking a service. Of course, it's often not obvious upfront which scenarios those are until/unless you try it, and that's definitely a barrier.
There is actually a really awesome middle-ground that AWS offers that no one seems to talk about.
That is using ECS + Fargate. This gives you (IMHO) the best of both worlds between Lambda and EC2.
ECS is Elastic Container Service. Think docker/podman containers. You can even pull from Docker Hub or ECR (Elastic Container Registry - Amazon's version of Docker Hub). ECS can then deploy to either a traditional EC2 compute instance (giving you a standard containerization deployment) or to "Fargate".
Fargate is a serverless container compute service; it is like serverless EC2. You get the "serverless" benefits of Lambda, but it is always-on. It has automatic scaling, so it can scale up and down with traffic (all of which is configured in ECS). You don't need to manage security updates of the underlying compute instance or manage the system. You get high availability and fault tolerance "for free". But at the end of the day, it's basically an EC2 instance that you don't have to manage yourself. You can choose the RAM/CPU options that you need for your Fargate tasks just like any other compute instance. My recommendation is to go as small as possible and rely on horizontal scaling instead of vertical. This keeps costs as low as possible.
When I hear people trying to keep Lambdas running indefinitely, it really defeats the purpose of Lambda. Lambda has plenty of benefits, but it is best used for functions that run intermittently and are isolated. If you want the serverless benefits of Lambda, but want to have the benefits of a traditional server too, then you need to look at Fargate.
And of course there is a world where you combine the two. Maybe you have an authentication service that needs to run 24/7. Run it via ECS+Fargate. Maybe your primary API should also run on Fargate. But then when you need to boot up a bunch of batch processing at midnight each night to send out invoices, those can use Lambdas. They do their job and then go to sleep until the next day.
I should also add that the developer experience is far superior going the ECS+Fargate route over Lambda. I have built extensive APIs in Lambda and they are so difficult to debug and you always feel like you are coding with one hand tied behind your back. But with ECS+Fargate you just build projects as you normally would, with your traditional environment. You can do live testing locally just like any other container project. Run docker or podman on your system using an Amazon Linux, Alpine Linux, CentOS base. And that same environment will match your Fargate deployment. It makes the developer experience much better.
>It has automatic scaling, so it can scale up and down with traffic (all of which is configured in ECS)
Doesn't scaling take time, though? Doesn't downloading a new docker container definition and starting it take at least as long as initializing a new lambda function?
Also with lambda there's no configuring to do for scaling. If anything lambda gives you tools to limit the concurrency.
Thanks for pointing that out. I should have clarified because I agree that "Automatic" is a relative term.
Lambda is entirely automatic like you point out. You literally don't need to think about it. You upload your function and it scales to meet demand (within limits).
ECS however still requires configuration, but it is extremely simple to do. They actually call it "Service Auto-Scaling". Within there you choose a scaling strategy and set a few parameters. That is it. After that, it really is "automatic".
Most of the time you will be selecting the "Target Tracking" strategy. Then you select a CloudWatch metric and it will deploy and terminate Fargate instances (called "tasks" in the docs) to stay within your specified range. So a good example would be selecting a CPUUsage metric and keeping the average CPUUsage between 40-70%. If the average CPU usage starts to get above 70% across your tasks (Fargate instances), then ECS will deploy more automatically. If it falls below 40% then it will terminate them until you get within your desired range. You get all this magic from a simple configuration in ECS. So that's what I mean by automatic. It's pretty easy. Depending on what you are doing, you can set scaling to any other metric. It could be bandwidth, users, memory usage, etc. Some of these (like memory) require you to configure a custom metric, but again it isn't bad.
You can also scale according to other strategies, like scheduled scaling. So if you get lots of traffic during business hours you can scale up during business hours and scale down during the night. Again, just set your schedule in ECS. It is pretty simple.
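For reference, a sketch of that Target Tracking setup done through the Application Auto Scaling API instead of the console (cluster, service, and thresholds are placeholders): register the service's DesiredCount as a scalable target, then attach a policy that tracks average CPU.

    import {
      ApplicationAutoScalingClient, RegisterScalableTargetCommand, PutScalingPolicyCommand,
    } from "@aws-sdk/client-application-auto-scaling";

    const aas = new ApplicationAutoScalingClient({});
    const resourceId = "service/my-cluster/my-service";   // hypothetical cluster/service

    // Let Application Auto Scaling manage the task count between 1 and 10.
    await aas.send(new RegisterScalableTargetCommand({
      ServiceNamespace: "ecs",
      ResourceId: resourceId,
      ScalableDimension: "ecs:service:DesiredCount",
      MinCapacity: 1,
      MaxCapacity: 10,
    }));

    // Keep average CPU around 70%; ECS adds tasks above it, removes them below.
    await aas.send(new PutScalingPolicyCommand({
      PolicyName: "cpu-target-tracking",
      ServiceNamespace: "ecs",
      ResourceId: resourceId,
      ScalableDimension: "ecs:service:DesiredCount",
      PolicyType: "TargetTrackingScaling",
      TargetTrackingScalingPolicyConfiguration: {
        TargetValue: 70,
        PredefinedMetricSpecification: { PredefinedMetricType: "ECSServiceAverageCPUUtilization" },
        ScaleOutCooldown: 60,
        ScaleInCooldown: 120,
      },
    }));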
The difference in scaling is more subtle than that. The thing that makes Lambda so nice from a scalability point of view is that you don't need to worry about the scalability of your application. You don't need any awkward async stuff or tuning of application server flags or anything like that. Your only concern with Lambda code is to respond to one request as fast as possible. You can write something that burns 100% CPU in a busy loop per request in a Lambda if you want and it'll scale all the same. In Fargate, making sure that the application is able to handle some economical amount of concurrency is your responsibility, and it can in some cases be a very non-trivial problem.
Scaling does take time, but you would normally scale based on resource utilization (like if CPU or RAM usage exceeded 70%). So unless you had a really large and abrupt spike in traffic, the new container would be up before it's actually needed.
It's definitely not apples to apples with Lambda though--if you do have a very bursty workload, the cold start would be slower with Fargate, and you'd probably drop some requests too while scaling up.
If your app allows for it, a pattern I like is Fargate for the main server with a Lambda failover. That way you avoid cold starts with normal traffic patterns, and can also absorb a big spike if needed.
I think it's just the trade off between these two scenarios.
- Relatively poor amortized scale out time with good guarantees in the worst case.
- Good amortized scale out time with dropped requests / timeouts in the worst case.
With lambda, it doesn't really matter how spiky the traffic is. Users will see the cold start latency, albeit more often. With Fargate, users won't run into the cold start latencies - until they do, and the whole request may timeout waiting for that new server to spin up.
At least that seems to be the case to me. I have personally never run a Docker image in Fargate, but I'd be surprised if it could spin up, initialize and serve a request in two seconds.
> With Fargate, users won't run into the cold start latencies - until they do, and the whole request may timeout waiting for that new server to spin up.
In practice that sort of setup is not trivial to accomplish with Fargate; normally, while you are scaling up, requests get sent to the currently running tasks. There is no built-in ability to queue requests with Fargate (+ELB) so that they would then be routed to a new task. This is especially problematic if your application doesn't handle overload very gracefully.
> Doesn't scaling take time, though? Doesn't downloading a new docker container definition and starting it take at least as long as initializing a new lambda function?
Yes, especially because they still don't support caching the image locally for Fargate. If you start a new instance with autoscaling, or restart one, you have to download the full image again. Depending on its size, start times can be minutes...
The big issue with ECS+Fargate is the lack of CPU bursting capability. This means that if you want to run a small service that doesn't consume much, you have two options:
1. Use a 0.25 vCPU + 0.5GB RAM configuration and accept that your responses are now 4 times slower, because the 25% CPU share is strictly enforced.
2. Use a 1 vCPU + 2GB RAM configuration (costing 4 times more) even though it is very under-utilized.
AWS is definitely in no rush to fix this, as they keep saying they are aware of the issue and are "thinking about it". No commitment or solution in sight though.
Agreed - I'm in the process of moving hundreds of Java Lambdas into a Spring application running in ECS. It costs more to run, but I get flexibility with the scaling parameters and I can more easily run my application locally too. I'm still stuck on AWS but less so than with Lambda.
> To combat this we wrote a "runWarm" function that kept the API alive at all times.
This doesn't really work like you'd expect and isn't recommended, as it only helps a particular use-case. The reason is that AWS Lambda will only keep a single instance of your function alive. That means if two requests come in at the same time, you'd see a cold start on one of those invocations.
Instead, you want to look at something like provisioned concurrency.
Provisioned concurrency is insanely expensive. If you have any kind of a thundering herd access pattern then Lambda is a complete non-starter because of the warm-up and scaling characteristics. We eventually just put an nginx/openresty server on a regular medium EC2 instance and got rid of Lambda from our stack completely and now we're paying about 1/300th the cost we were previously and the performance is infinitely better.
I'm sure it has some use-cases in some kind of backoffice task queue scenario, but Lambda is nearly unusable in a web context unless you have a very trivial amount of traffic.
This is another example of AWS over marketing Lambda. Lambda is horrendously expensive when requests pass a certain level per second. You can graph it against ECS / EC2 to see the point at which it stops being economical.
Taking all of this into account, Lambda is then useful for a very small niche:
- Tasks that don't care about low P99 latency. These tend to be asynchronous processing workflows, as APIs in the customer request path tend to care about low P99 latency.
- Tasks that have a low requests-per-second rate. Again, these tend to be asynchronous processing workflows.
You talk to anyone on the AWS serverless team and the conversation eventually focuses on toil. If you can quantify engineering toil for your organization, and give it a number, the point at which Lambda stops being economical shifts right, but it doesn't change the overall shape of the graph.
> This is another example of AWS over marketing Lambda. Lambda is horrendously expensive when requests pass a certain level per second.
I feel this is a gross misrepresentation of AWS Lambdas.
AWS lambdas are primarily tailored for background processes, event handlers, and infrequent invocations. This is how they are sold, including in AWS' serverless tutorials.
Even though they can scale like crazy, and even though you can put together an API with API Gateway or even an Application Load Balancer, it's widely known that if your API handles more traffic than a few requests per second then you're better off putting together your own service.
The rationale is that if you don't need to do much in a handler, or you don't expect to handle a lot of traffic on a small number of endpoints, AWS Lambdas offer a cheaper solution to develop and operate. In some cases (most happy-path cases?), they are actually free to use. Beyond a certain threshold, you're better off getting your own service to run on EC2/Fargate/ECS/whatever, especially given that once you have a service up and running, adding a controller is trivial.
> I feel this is a gross misrepresentation of AWS Lambdas.
> AWS Lambda is a serverless compute service that lets you run code without provisioning or managing servers, creating workload-aware cluster scaling logic, maintaining event integrations, or managing runtimes. With Lambda, you can run code for virtually any type of application or backend service - all with zero administration. Just upload your code as a ZIP file or container image, and Lambda automatically and precisely allocates compute execution power and runs your code based on the incoming request or event, for any scale of traffic. You can set up your code to automatically trigger from over 200 AWS services and SaaS applications or call it directly from any web or mobile app. You can write Lambda functions in your favorite language (Node.js, Python, Go, Java, and more) and use both serverless and container tools, such as AWS SAM or Docker CLI, to build, test, and deploy your functions.
> it's widely known that if your API handles more more traffic than a few requests per second then you're better off putting together your own service.
How is it widely known? Is it on their documentation clearly or in their marketing materials to use another AWS product?
That's what I mean by over marketing here. Requiring inside-baseball knowledge, because using it as described footguns your company at inflection points, isn't a great customer experience.
> AWS Lambda is a serverless compute service that lets you run code (...)
So? It can run your code the way you tell it to run, but you still need to have your head on your shoulders and know what you're doing, right?
> How is it widely known?
It's quite literally covered at the start of AWS's intro to serverless courses. Unless someone started hammering code without spending a minute learning about the technology or doing any reading at all whatsoever on the topic, this is immediately clear to everyone.
Let's put it differently: have you actually looked into AWS's docs on typical Lambda use cases, Lambda's pricing, and Lambda quotas?
> That's what's I mean by over marketing here. Requiring insider baseball knowledge (...)
This sort of stuff is covered quite literally in their marketing brochures. You would need to be completely detached from their marketing to not be aware of this. Let me be clear: you would need to not have the faintest idea of what you are doing at all to be oblivious to this.
There's plenty of things to criticize AWS over, but I'm sorry, this requires complete ignorance and a complete lack of even the most cursory research to not be aware of.
You've been going on and on. I linked you the AWS marketing page on Lambda that says it scales with no infrastructure and can be used for any use case.
You've had two chances to cite something in their vast marketing and documentation other than marketing brochures (are you serious?) and AWS-specific training, paid or otherwise.
You even quoted the wrong part of the marketing spiel.
> Just upload your code as a ZIP file or container image, and Lambda automatically and precisely allocates compute execution power and runs your code based on the incoming request or event, for any scale of traffic
ANY scale of traffic, requests or events. Just upload a ZIP or image and you're done. We know that isn't the case, don't we? Even without AWS sales people showing up personally to provide us marketing brochures they wouldn't put on their website.
I use Netlify serverless functions (which is just a wrapper around AWS Lambda) because it basically fits the criteria for me. I have a low but bursty access pattern that fits into the free tier, and there's a static SPA page that can serve up instantly while the XHR triggers to do the cold start fetch. I don't think I would use it for anything consumer facing though. This is just a backend where an extra 300ms isn't going to make a big difference to the admins.
In my experience cold starts don't affect the p99 if you have substantial traffic, because you have enough lambdas consistently running that cold start rate is ~0.1%. P99.9 also matters though!
Insanely expensive is definitely a flexible term. I think numbers help here.
Provisioned Concurrency: $8.64 / GB / month
256 MB per Lambda (Assuming Python, Ruby, NodeJS, or Rust)
$2.16 per Lambda per month
A lot of organizations can probably make a good business case for keeping 100s or even 1000s of Lambdas warm. You also don't need to keep them warm 24x7, you can get an additional 12% discount using savings plans, and if you're a big guy you get your EDP discount.
You should look at offering this as a service, perhaps. 2,500 250MB Lambdas for $250/month with all AWS guarantees (ie, Multi-AZ, permissioning on every call, etc.) would be pretty compelling, I think, for folks running intermediate Lambda workloads (ie, 5-10K Lambdas at a time).
I'm not trying to offer it as a service. I'm trying to run my workload in a way that can scale from 0 -> 10,000 request/second in an instant and doesn't cost my company $5,000/month to do so.
It's pretty easy if you know what you're doing (or care to figure it out).
If you can do $250/month with all ops costs and features of lambda for 5,000 or 10,000 requests per second - you would be silly not to offer a service.
There are plenty of us who can run a system that scales to 10k rps. That's relatively easy. I personally can't stand Lambda and don't use it, FWIW. I like EC2, and I actually like Fargate a lot for all sorts of things, including Lambda-like services without a separate Lambda for each request.
But for folks with a payload that want the Lambda-like experience - if you have a solution, all ops costs included (ie, no well-paid developer or ops person needed for the customer), for $250/month at the scale we are talking about here (2,500 x 250MB = 625GB, etc.), then you have an amazing solution going, especially if you can do the networking, IAM controls, etc. that AWS provides.
The problem I've seen: when folks say Amazon is "insanely expensive" they are usually not actually comparing the AWS offering to a similar offering. If your cheap solution is not Lambda-like, you need to compare it to EC2 or similar (with perhaps a good programmer doing something a bit more monolithic than AWS).
I'll never understand how we got to this point of learned helplessness where people think hosted services like Lambda are the only ones capable of being secure and robust. It's madness..
No, but I think it's super common to discount to $0 all the maintenance and operations work that using Lambda saves you.
And if you can do any of that at scale for $250/mo you're lightyears ahead of nearly everyone.
It ends up being cheaper overall if you have high utilization of your provisioning, since the per-second fee while a function is running is cheaper. Using https://calculator.aws/#/createCalculator/Lambda, if you have a steady 1 request/s and each request takes 1 second, that's 2,592,000 seconds in a month. At 1024 MB, I get $36.52 for provisioned and $43.72 for on-demand. With autoscaling you won't get 100% utilization, but it probably ends up being close enough to a wash.
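The arithmetic roughly checks out against the published us-east-1 rates at the time (rates differ by region and change over time, so treat the constants below as an example, not gospel):

    // Steady 1 request/s, 1 s per request, 1024 MB, for a 30-day month (2,592,000 s).
    const gbSeconds = 2_592_000;                       // 1 GB * 2,592,000 s of duration
    const requests = 2_592_000;
    const requestCost = (requests / 1_000_000) * 0.20; // $0.20 per million requests

    const onDemand = gbSeconds * 0.0000166667 + requestCost;
    // ≈ 43.20 + 0.52 ≈ $43.72

    const provisioned =
      gbSeconds * 0.0000041667 +   // keeping 1 GB of provisioned concurrency warm all month
      gbSeconds * 0.0000097222 +   // the cheaper duration rate while running on PC
      requestCost;
    // ≈ 10.80 + 25.20 + 0.52 ≈ $36.52

    console.log(onDemand.toFixed(2), provisioned.toFixed(2));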
> I'm sure it has some use-cases in some kind of backoffice task queue scenario, but Lambda is nearly unusable in a web context unless you have a very trivial amount of traffic.
This has been the outcome for me on several projects too. Just use loadbalanced EC2 (or EB, for simplification) and pay for a few instances running 24/7. It's actually cheaper than having a busy lambda in all my cases.
The only other case (other than occasional backoffice jobs) would be long-tail stuff: an API endpoint that is used in rare situations, for example "POST /datatakeout" or "DELETE /subscription/1337" or such. Things that might be heavy, require offbeat tools, and so on.
We've had them for building PDFs and .docx from reports; a feature used by <2% of the users, yet requiring all sorts of tools, from latex to pandoc.
Yeah the caveats, gotchas, and workarounds you have to do to get something reasonable running on Lambda are just goofy.
At some point we just stopped and wondered why we were punishing ourselves with this stuff. We switched to a traditional webserver on regular EC2 instances and haven't looked back.
Have you run into issues with Lambda with complex tasks? I thought there was a 15 minute limit to tasks, plus a maximum storage size when importing large dependencies, etc?
The latex example did not run entirely on Lambda. Lambda would write a job into a queue (just Postgres), trigger a launch of a beefy ec2 instance, after which a worker on that ec2 picked up the job. Another lambda function would be called by the server itself to shut down the worker when all jobs were done.
Kludgy and slow. But it worked and did save some money, because the instance running this latex worker was big and chunky yet utilized maybe 10 hours a month.
Lambda was mostly acting as a kludgy load balancer, really.
Here is a little AWS doc describing what parent is talking about. Personally, I had confused "provisioned concurrency" with "concurrency limit" since I don't work with cloud stuff outside of hobbying.
Have your runWarm sleep for 500ms and execute 50 of them concurrently. As long as none of the running invocations has finished when you start a new one, you get a new instance - at least that's what I think.
You can get 50 hot instances that way no?
I'd rather scale per connection. Have a Lambda instance handle 50 concurrent requests. Something like https://fly.io but cheaper.
That reminds me of a custom Linux device driver that I worked with in the past. It implemented "mmap" so that a user application could map a ring buffer into userspace for zero-copy transfers.
It used lazy mapping in the sense that it relied on the page fault handler to get triggered to map each page in as they were touched.
This resulted in a latency increase for the very first accesses, but then it was fast after that since the pages stayed mapped in.
The solution?
Read the entire ring buffer one time during startup to force all pages to get mapped in.
I eventually changed the driver to just map them all in at once.
When we notice someone is using a form, we fire a no-op request to the function that will handle the data from the form so that it is less likely to be cold when the user is ready to proceed.
(We could get better results by switching to a different implementation language; but we have a body of code already working correctly aside from the extra second or two of cold start.)
What you and others are doing is attempting to predict your peak traffic when you take this approach. It may work for some companies, but more commonly in my experience, it hides P99+ tail latency from companies that may not instrument deeply (and they think the problem is solved).
The rate at which you execute `runWarm` is the peak traffic you're expecting. A request comes in over that threshold and you'll still experience cold start latency.
Provisioned concurrency doesn't change this, but it does move the complexity of `runWarm` to the Lambda team and gives you more control (give me a pool of 50 warmed Lambdas vs. me trying to run `runWarm` often enough to keep 50 warmed myself). That's valuable in a lot of use cases. At the end of the day you're still in the game of predicting peak traffic and paying (a lot) for it.
We're almost always trying to predict peak traffic though! The difference is that using a coarse-grained computing platform, like EC2 for example, where a single box can handle hundreds++ of requests per second, gives you more room for error, and is cheaper.
There are a lot of other trade-offs to consider. My biggest issue is this isn't enumerated clearly by AWS, and I run into way too many people who have footgunned themselves unnecessarily with Lambda.
Lambda@Edge helps with latency, definitely not with cold start times. You also can't buy provisioned Lambda@Edge, so for low traffic scenarios it's even worse than typical Lambda (where you can easily provision capacity, or keep on-demand capacity warm, which is not so cheap or easy when that must be done across every CloudFront cache region). For a low traffic environment, running e.g. 3-5 regular provisioned Lambda functions in different regions will produce a much more sensible latency distribution for end users than Edge would.
CloudFront Functions have no cold start, but their execution time is severely restricted (1ms IIRC). You can't do much with them except origin selection, header tweaks or generating redirects, and there is no network or filesystem IO whatsoever.
Nor does it actually work. If you have a synthetic "runWarm" event, you'll keep one concurrent Lambda warm. This helps if your cold start time is long and your average invoke time is short, but you're just levying the cold start tax on the second concurrent user.
There's no reasonable way to keep a concurrency > 1 warm with synthetic events without negatively impacting your cold start percentage for users.
Provisioned concurrency is the correct solution, and I'll remind everyone here that you can put provisioned concurrency behind Application Auto Scaling, since the comments here seem to be saying keeping 100 Lambdas warm is worse than a server that can handle 100 concurrent users (DUH!)
To be fair, all you'd need to accomplish that, without adding more parts than necessary to production, is to ensure that the code path invoking the function is hit by an external monitoring probe, with an adjustment to your SLA or SLO to account for the cold start time. Obviously not going to work for many systems, but it's easy to forget all the side effects of the observability plane when writing applications.
Not that I ever saw. They have made many improvements. But a cold start time of 2 minutes wasn't considered a bug or an issue before they fixed the VPC/Lambda interconnect.
Something I discovered recently: for my tiny Go Lambda functions it is basically always worth it to run them with at least 256MB of memory, even if they don't need more than 128MB. This is because most of my functions run twice as fast at 256MB as they do at 128MB. Since Lambda pricing is memory_limit times execution time, you get the better performance for free.
Test your Lambda functions in different configurations to see if the optimal setting is different from the minimal setting.
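A quick-and-dirty way to run that comparison with the JavaScript SDK v3 (function name, memory sizes and the crude propagation wait are placeholders; reading billed duration from the REPORT lines in CloudWatch Logs is more precise than timing client-side):

    import { LambdaClient, UpdateFunctionConfigurationCommand, InvokeCommand } from "@aws-sdk/client-lambda";

    const lambda = new LambdaClient({});
    const FunctionName = "my-go-function";   // hypothetical

    for (const MemorySize of [128, 256, 512]) {
      await lambda.send(new UpdateFunctionConfigurationCommand({ FunctionName, MemorySize }));
      await new Promise((r) => setTimeout(r, 10_000));   // crude wait for the config update to apply

      const start = Date.now();
      for (let i = 0; i < 10; i++) {
        await lambda.send(new InvokeCommand({ FunctionName, Payload: Buffer.from("{}") }));
      }
      console.log(`${MemorySize} MB: ~${(Date.now() - start) / 10} ms per invoke (client-side)`);
    }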
We run a few .NET Core Lambdas, and a few things make a big difference for latency. 1. Pre-JIT the package; this reduces cold start times as the JIT doesn't need to run on most items (it still does later to optimize some items). 2. Stick to the new .NET JSON serializer. The reference code uses both the new serializer and the old Newtonsoft package; the old package has higher memory allocations as it doesn't make use of the Span type.
The test code is quite small and might not benefit from R2R that much, libs it relies on are already jitted. Ditching Newtonsoft would affect response time though.
I have similar code where it takes some JSON input, and sends it off to SQS with a small bit of formatting. It impacts cold starts even for these smaller functions.
After about 256MB it is less of an impact when using ReadyToRun. Some of the Lambdas I have are webhooks, so latency isn't as important, and when it's user-facing, 512MB seems to be a sweet spot.
Like someone else said, Lambdas are priced by memory * execution time - if you cut execution time in half by doubling the memory and using AOT, you get faster Lambdas for free (and even if the speedup isn't a full 1/2, you'll probably still not be paying 2x).
AWS Lambda is pretty cool, it just gets used a lot for applications that it was never really designed for. While I wish that Amazon would address the cold start times, if you try to grill your burgers with a cordless drill, you can’t really blame the drill manufacturer when the meat doesn’t cook.
The main downside of Lambda, in particular for user facing applications is that the incentives of the cloud provider and you are completely opposed. You (the developer) want a bunch of warm lambdas ready to serve user requests and the cloud provider is looking to minimize costs by keeping the number of running lambdas as low as possible. It's the incentive model that fundamentally makes Lambda a poor choice for these types of applications.
Other downsides include the fact that Lambdas have fixed memory sizes. If you have units of work that vary in the amount of memory required, you're basically stuck paying the cost of the largest units of work unless you can implement some sort of routing logic somewhere else. My company ran into this issue using Lambdas to process some data where 99% of requests were fine running in 256MB but a few required more. There was no way to know ahead of time how much memory the computation would require. We ended up finding a way to deal with it, but in the short term we had to bump the Lambda memory limits.
That doesn't even get into the problems with testing.
In my experience, Lambdas are best used as glue between AWS components, message processors and cron style tasks.
> the incentives of the cloud provider and you are completely opposed
I think this is a little overstated. The cloud provider wants their customers to be happy while minimizing costs (and therefore costs to the customer). It's not truly a perverse incentive scenario.
Disagree with "completely opposed". Cloud providers want to make money, sure, but in general everyone in the ecosystem benefits if every CPU cycle is used efficiently. Any overhead goes out of both AWS's and your pockets and instead to the electricity provider, server manufacturer, cooling service.
I just want to appreciate the article. Starting with non-clickbait title, upfront summary, detailed numbers, code for reruns, great graphs, no dreamy story and no advertisement of any kind.
It is hosted on Medium but the author has done a banging great job, so gets a pass. If he is reading, excellent work!
I recently discovered that uWSGI has a "cheap mode" that will hold the socket open but only actually spawn workers when a connection comes in (and kill them automatically after a timeout without any requests).
If you already have 24/7 compute instances going and can spare the CPU/RAM headroom, you can co-host your "lambdas" there, and make them even cheaper :)
It makes Lambda look like a product with a much narrower niche than what AWS wants to sell it as. For many people knowing beforehand that cold start times are > 500ms with 256MB (quite extravagant for serving a single web request) would disqualify Lambda for any customer-serving endpoint. As it stands many get tricked into that choice if they don't perform these tests themselves.
In my experience, if you can't say "get over it" to your customers when they complain about performance then Lambda is not the right tool. Just use EC2.
It's an excellent product for glue code between the various AWS services. Just about every AWS product can trigger Lambda functions, so if you want to run image recognition whenever a new image is uploaded to S3, Lambda is the way to do that. They also make great cron jobs. But for some reason Amazon likes to sell it as a way to run any web application backend, as if that was a good use case.
It can be. We run dynamic image resizing (we have a couple million high-quality original images in S3, and customers request sizes based on their screen). Each request is handled by a Lambda, and even though these are memory-intensive operations, we never need to worry about servers or running out of RAM or circuit breakers or anything. Whatever the load, it just works. The actual operations take on the order of 100ms, so the cold start is negligible to us. And the end product is cached on a CDN anyway. It costs less than one m5.large, but at peak loads it does 100 times the work that's possible on the m5.large.
Say you open a page with 100 images on it, for example. With Lambda, all the images are resized for you in parallel, so 100ms total. If this were servers, you'd have to run 100 servers to give you the same performance. A single server could resize images in sequence all day and might be cheaper than running a Lambda repeatedly all day, but that's not the requirement. The requirement is to suddenly do 100 things in parallel within 100ms, just once, when you open the app.
> Say you open a page with 100 images on it, for example. With Lambda, all the images are resized for you in parallel, so 100ms total. If this were servers, you'd have to run 100 servers to give you the same performance
You're probably just simplifying, but to clarify servers can totally do multiple things at once. That's how Amazon runs multiple lambdas on one physical server.
They can, and these are already multi-core operations. If it takes 4 cores 100ms to do this operation, then on a server with 4 cores doing 100 of them takes 10 seconds, while on Lambda it takes only 0.1 seconds to do them all in parallel.
AFAIK AWS doesn't publish benchmarks on runtimes; but if they did, I am sure it'd result in a lot of finger-pointing and wasted energy if they didn't normalize the process first (something like acidtests.org).
They don't want to provide it themselves because then they'd have to admit that the performance is abysmal. Instead they let random bloggers provide this data so they can just sit back and say "you're doing it wrong."
They have never cared about this cold-start metric or the devs who do. The hope is that the first user's degraded experience helps the next 1,000 users that minute have a perfect experience.
To AWS it's like complaining about the end bit of crust in an endless loaf of sliced white bread that was baked in under 2 seconds.
This is incorrect. They've made optimizations in this space, but it's a hard problem with a lot of variables. Examples include mostly solving VPC + elastic network interface provisioning, which used to take much longer and made Lambdas within VPCs unusable for customer-facing APIs.
The size of the individual Lambda matters quite a bit. It has to be downloaded, processed, and initialized. Latency then varies by language. They can optimize things like their own per-language Lambda runtime that executes the code, but the rest are hard problems and/or require educating customers.
Their biggest problem is they oversold Lambda and serverless in my opinion, and now walk it back very slowly, buried deep in their documentation.
A pattern I have implemented is to have my API code on both ECS/Fargate and Lambda at the same time, and send traffic to the appropriate one using an Elastic Load Balancer. I flag specific endpoints as "cpu intensive" and have them run on lambda.
Implemented by
- Duplicating all routes in the API with the "/sls/" prefix (this is a couple of lines in FastAPI)
- Setting up a rule in ELB to route to Lambda if the route starts with /sls, or to ECS otherwise.
- Setting up the CPU intensive routes to automatically respond with a 307 to the same route but prefixed with /sls.
Boom, with that the system can handle bursts of CPU intensive traffic (e.g. data exports) while remaining responsive to the simple 99% of requests all on one vCPU.
And the same dockerfile, with just a tiny change, can be used both in ECS and Lambda.
If anyone is running into cold start problems on Firebase, I recently discovered you can add .runWith({minInstances: 1}) to your cloud functions.
It keeps 1 instance running at all times, and for the most part completely gets rid of cold starts. You have to pay a small cost each month (a few dollars), but it's worth it on valuable functions that result in conversions, e.g. loading a Stripe checkout.
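For anyone looking for it, it's a one-liner per function with the first-gen functions API (the function name here is invented):

    const functions = require("firebase-functions");

    // Keep one instance warm for the checkout flow; everything else stays on-demand.
    exports.createCheckoutSession = functions
      .runWith({ minInstances: 1 })
      .https.onRequest(async (req, res) => {
        // ... create and return the Stripe checkout session ...
        res.json({ ok: true });
      });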
I'm surprised Node has cold-start issues. I had it in my mind that JS was Lambda's "native" language and wouldn't have cold start issues at all. Did it used to be like that? Didn't Lambda launch with only support for JS, and maybe a couple other languages that could compile to it?
I thought nodejs/v8 or any javascript runtime would have some kind of startup cost since it has to parse and compile the javascript code first. See a simple hello world execution time comparison:
# a Go hello world
$ time ./hello
hi
real 0m0.002s
$ time echo 'console.log("hello")' | node -
hello
real 0m0.039s
The ~25ms of cold start noted in this article feels acceptable and impressive to me, given what node is doing under the hood.
Cold start has been a problem with Lambda since day 1, and in fact has massively improved in recent years.
Node.js is optimized for request throughput rather than startup time. The assumption is that you will have a "hot" server running indefinitely. The Lambda pattern is, in general, a very recent invention, and not something that languages/runtimes have specifically considered in their design yet.
With node.js, the cold start problem is caused by how node loads files. For each file it does about 10 IO operations (to resolve the file from the module name), then loads, parses and compiles it.
If using any file system that is not super fast, this amounts to long delays.
There are ways to get around that, but those are not available on lambda.
I wonder how much time was spent requiring all of aws-sdk. The v3 SDK is modular and should be quicker to load. Bundlers like esbuild save space and reduce parsing time.
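For example, a handler along these lines loads only the DynamoDB client instead of the whole SDK (the table name is hypothetical):

    // v2 pulls in the entire SDK: const AWS = require("aws-sdk");
    // v3 lets you import just the client you actually use:
    import { DynamoDBClient, PutItemCommand } from "@aws-sdk/client-dynamodb";

    const client = new DynamoDBClient({});

    export const handler = async () => {
      await client.send(
        new PutItemCommand({
          TableName: "benchmark-table", // hypothetical table name
          Item: { pk: { S: "demo" } },
        })
      );
      return { statusCode: 200 };
    };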
Container-based Lambda image configurations (vs zip-based) would be a good addition to this comparison. People use them e.g. to get over the zip-based Lambda size limit.
Also, maybe mention provisioned concurrency (where you pay AWS to keep one or more instances of your Lambda warm).
Both of these are supported by Serverless framework btw.
Definitely. In my experience, Docker-image-based Lambdas had consistently poor (>3s) cold starts regardless of memory. I hope it will eventually improve, as it is a much nicer packaging approach than ZIP files.
Also, it would have been nice to include ARM vs x86 now that ARM is available.
Slightly off topic, but what's the deal with Azure Functions cold start times in the Consumption (i.e. serverless) plan? I get cold start times in the multi seconds range (sometimes huge values, like 20s). Am I doing something wrong? Or is this expected?
I've experienced this as well. I gave up on optimizing it.
I get around it by using a load balancer (Cloudflare currently) that does periodic health checks. Keeps it alive and the charges are minimal (still well within the free tier).
Speaking of Cloudflare it's on my "nice to-do" list to move this to Workers as a primary anyway.
I also use a completely separate uptime monitor and alerting platform (Uptime Robot) so one way or another I'd be keeping at least one instance warm no matter what.
I think if you get to this point with lambda you're probably overthinking it. I think language runtime choice is important because some choices do have a cost, but likewise, choosing lambda is a tradeoff -- you don't have to manage servers, but some of the startup and runtime operations will be hidden from you. If you're okay with the possible additional latency and don't want to manage servers, it's fine. If you do, and want to eke out performance, it might not be.
Larger lambdas mean a higher likelihood of concurrent access, which will result in cold starts when there is contention. Your cold starts will be slower with more code (It's not clear how much the size of your image affects start time, but it does have SOME impact).
It's best to just not worry about these kinds of optimizations -- that's what lambda is for. If you *want* to worry about optimizing, the best optimization is running a server that is actively listening.
Scope your lambda codebase in a way that makes sense. It's fine if your lambda takes multiple event types or does routing, but you're making the test surface more complex. Just like subnets, VPCs and everything else in AWS, you can scope them pretty much however you want; there's no hard and fast rule saying "put more code in one" or "put less code in one". But there are patterns that make sense, and generally lots of individual transactions are easier to track and manage, unless you have an explicit use case that requires scoping things to one lambda, in which case do that.
There are a few cases where I've advocated for bigger lambdas vs smaller ones:
* GraphQL (there still isn't a very good GraphQL router and data aggregator, so just handling the whole /graphql route makes the most sense)
* Limited concurrency lambdas. If you have a downstream that can only handle 10 concurrent transactions but you have multiple lambda interactions that hit that service, it might be better to at least bundle all of the downstream interactions into one lambda to limit the concurrency on it.
> NodeJs is the slowest runtime, after some time it becomes better(JIT?) but still is not good enough. In addition, we see the NodeJS has the worst maximum duration.
The conclusion drawn about NodeJS performance is flawed due to a quirk of the default settings in the AWS SDK for JS compared to other languages. By default, it opens and closes a TCP connection for each request. That overhead can be greater than the time actually needed to interact with DDB.
I submitted a pull request to fix that configuration[0]. I expect the performance of NodeJS warm starts to look quite a bit better after that.
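For anyone who wants the same effect today, connection reuse can be switched on in the v2 JS SDK roughly like this (or by setting the AWS_NODEJS_CONNECTION_REUSE_ENABLED=1 environment variable):

    import https from "https";
    import DynamoDB from "aws-sdk/clients/dynamodb";

    // Reuse TCP connections across requests handled by a warm execution
    // environment instead of opening and closing one per call.
    const agent = new https.Agent({ keepAlive: true });
    const ddb = new DynamoDB.DocumentClient({ httpOptions: { agent } });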
In addition, the NodeJS cold start time can be further optimized by bundling into a single file artifact to reduce the amount of disk IO needed when requiring dependencies. Webpack, Parcel, ESBuild, and other bundlers could achieve that, I'm sure.
EDIT: That may already be happening here in the build.sh file. I see it runs `sam build --use-container NodeJsFunction -b nodejs`.
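Where the build doesn't already do this, a small esbuild script can produce a single-file bundle; a sketch, with made-up paths:

    // build.mjs - bundle the handler into one file to cut module-resolution IO
    import { build } from "esbuild";

    await build({
      entryPoints: ["src/handler.js"], // hypothetical entry point
      bundle: true,
      platform: "node",
      target: "node14",
      outfile: "dist/handler.js",
      external: ["aws-sdk"], // the v2 SDK already ships in the Node.js runtime
    });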
I was surprised by the quality of this one. That said...
Cold starts are a FaaS learning subject, but they almost never matter much in practice. What workloads are intermittent and also need extremely low latencies? Usually when I see people worrying about this, it is because they have architected their system with call chains, and the use case, if it really matters, can be re-architected so that the query result is pre-prepared. This is much like search results: search engines certainly don't process the entire web to service your queries. Instead, they pre-calculate the result for each query and update those results as content CRUD happens.
I've been getting around 20ms cold starts, 1ms warm exec on the 128MB ARM Graviton2 using Rust for the most basic test cases. Graviton2 was slightly slower on cold starts than X86 for me (1-2ms) but who doesn't want to save $0.0000000004 per execution? Adding calls to parameter store/dynamo DB bumps it up a little but still < 120ms cold, and any added latency comes from waiting on the external service calls.
Memory usage is 20-30MB, and I haven't done anything to optimise memory. I know I can get rid of a few allocations I'm doing for simplicity if I want to.
I've not always been the greatest fan of Lambda, given the hidden complexity of orchestration and the black box it presents for debugging. Re-visiting it a few years on, and with Rust, you get an excellent language, excellent runtime characteristics and substantial cost savings, unless you really need more than 128MB of memory, i.e. for processing large volumes of data per execution in memory or transcoding. Any asynchronous/event-driven service I write, I'll just package as a Rust lambda going forward and pay fractions of a cent per month. I'm still on the fence about HTTP-exposed services, as that's a big plumbing exercise with hidden gateway costs, but I'm not as averse to it as I was.
This is great work - thanks for putting this together. We recently got a request to add Rust support to ServerlessStack and looks like there's good reason to :)
Is that build tag necessary? I should check out what it does, because I've been deploying Go functions to custom runtimes without it. Does it skip some kind of "determine if Go-specific RPC or custom runtime" check?
The article states that it takes approximately 1,000 requests to optimize / warm up. I suspect that they had concurrency set at the default 999, so the first 999 requests would spin up new instances.
Does that mean their 15,000 requests were actually 15 requests spread over 1000 instances?
Surprised to see such mediocre performance from Node. It was an engineering decision on our team to develop one of our Lambdas with Node and we were deciding between Python and Node. Looks like Go and Rust look very promising.
The event-driven architecture of NodeJS is a good fit for simple requests like the one in this test. I've seen past cold-start tests like this (admittedly it's been a while), and Node and Python are often the leaders of the pack, with Java being the worst.
Not exactly, the heavy lifting is done by v8 on both sides. Deno can do lots of things around the ergonomics, and switch around the event loop (though libuv is already pretty good), but outside of that they are mostly equivalent.
Honestly shocked that rust is about 4 times faster than node for the DynamoDB insert workload in the average case. I knew it'd be faster but I would have expected maybe 50% faster since most of the time is probably spent simply sending the data to dynamoDB and awaiting a response.
Also, what is up with Python being faster than Node in the beginning and then getting slower over time? The other languages (apart from graal) get faster over time. I'm referring to the average MS latency at 128MB graph at the bottom.
Nice to have some updated data and comparisons. This article doesn't include the effect of the Lambda having to connect to a VPC, though, which adds time for the ENI. That was greatly improved in 2019-2020: https://aws.amazon.com/blogs/compute/announcing-improved-vpc...
Lambda only has default support for the LTS releases of .NET, which is 2.1, 3.1 and the upcoming (November 2021) .NET 6.
The only way to run .NET 5 in a Lambda that I know of would be custom runtimes or containers. Is that what you mean by “switched to using Docker images”?
This article describes the containerized performance of .NET in Lambda and the cold starts are dramatically (~4x) worse.
>Even though Lambda’s policy has always been to support LTS versions of language runtimes for managed runtimes, the new container image support makes .NET 5 a first class platform for Lambda functions.
The 128MB case is really strange: why do Go and Rust take so much more time to start there than with higher memory allocations, and even more than Python? Do they get run inside a wasm runtime or something, where that runtime has to go back and forth requesting memory which a native Python runtime gets “for free”?
From the looks of it, Go/Rust cold starts are almost completely CPU bound, so you see the near perfect scaling there. Meanwhile Python I'd guess is mostly io bound, which doesn't scale. That kinda makes sense as Go/Rust compile down to single executable while Python loads lots of libraries from disk.
For Rust, I suspect spinning up Tokio and the rest of async runtime might be making cold starts worse. But that's purely speculation. The python lambda is good old-fashioned sync io.
Another difference is that Rust lambda initializes a logger, while Python one doesn't. That might add some milliseconds too to the startup.
As far as I'm aware, AWS Lambda scales various other resources with the requested memory. I assume Rust scales mostly with the assigned CPU share, which increases with memory?
The article states that 600ms is a low cold start figure. However, a 600ms cold start is still unacceptable for web apps.
For a web app, the figure should be around 100ms. The only platform that meets that figure (that I know of) is Velo by Wix, with around a 50ms cold start for node.js.
I would have liked to see more values along the lambda "breakpoints" between 1GB and 10GB of memory. Unless things have changed recently, my understanding is that CPU and IO scale up specifically at those breakpoints rather than being continuous.
I would love to see languages like OCaml, D, Nim benchmarked here as well. They sit sort of in between Go and Rust, where I don't have to deal with manual memory management but get enough expressiveness to write a nice Lambda.
Not sure about the others, but OCaml tends to perform in the same ballpark as Go, and a bit slower for programs that benefit from shared memory parallelism.
I wish the author had done a comparison for Java apps using JLink that generates a custom Java runtime image that contains only the platform modules that are required for a given application, and if that makes a difference.
jlink usage won't make any significant difference typically. What does make a difference is (App)CDS though, as available in newer Java versions. Memory-mapping a file with the class metadata from previous runs can easily shave off a second or more from time-to-first-response [1], depending on number and size of classes required for that.
Started with a .NET Core API about a year ago. Monolith-first. Mix of clients across mobile and React. Async/await is one of the better things about C# (the language used for ASP.NET Core) and as a result, we were able to do things you'd never consider doing in-process on a system like Ruby on Rails (right on the thread serving the HTTP request), like transcoding a 12 megapixel HEIC upload into JPEG. We just did it, left the connection open, and when it was done, returned an HTTP 200 OK.
That worked well for a while and let us serve tons of clients on a single Heroku dyno. The problem: memory. Resizing images takes tens/hundreds of MB when you're doing it into three different formats.
Over the last two weeks, I extracted the HEIC->JPEG transcode/resize out of our monolith into a Lambda. I'm extremely happy with how it turned out. We went with C++ because the whole idea was performance, we're going to be doing point cloud processing and other heavyweight stuff, and wanted fine-grained control of memory. Our process has 28MB of dynamic libraries (.so files), starts in 200ms, and runs comfortably on a 512MB instance. We moved to 1024MB to provide a margin of safety just in case we get a really large image. The system has progressed into "I don't even think about it"-level reliability. It just works and I pay something like $1 for 40-50k transcode operations. No EC2 instances to manage, no queues, no task runners, no Ruby OOM, no running RabbitMQ, none of that (former ops engineer at a very high-scale analytics company).
As a general comment, I don't see many cloud services written in C/C++. This is no doubt partly because those skills just aren't widespread. But I think the bigger lesson is that it might be worth adding a little bit of development complexity to save 10x as much ops overhead. When I explained this setup to my friend, his first reaction was, "Why didn't you just put ImageMagick (the binary) into a container?" Once I explained that actually, I need to get images from S3, and write them into several formats, and manipulate their S3 keys in somewhat complex ways, fire off an HTTP request to a server, and pass a JWT around...sure, I could write this in a shell script, with wget, and curl, and everything else. But at some point you just have to write the right code for the job using the right tools.
I think hybrid approaches like this make the most sense. .NET and Java are great high-productivity tools for running server apps where memory is relatively abundant. I wouldn't try to move a system like that onto Lambda any more than I'd try to do something that more naturally fits with a queue/worker pattern on a webserver. This seems kind of obvious but if I'm being honest, it's probably experience talking a bit.
It's also neat to just get back to the metal a bit. Drop the containers, runtime environments, multi-hundred-MB deployment packages, just ship a xx MB package up to the cloud, deploy it, and have it run as a standalone linux binary with all the speed and simplicity that brings. Modern C++ is a totally different animal than 90s C++, I'd encourage giving it a try if you haven't in a while.
Agreed, it's always funny to see people using runtimes they are not used to, so they don't know how to configure them properly, like setting how much memory they can use. I wouldn't trust these results too much as a consequence.
If you can run your entire function in V8 and don't need node, Cloudflare Workers is MUCH faster, more affordable, more manageable, and more reliable. They get cloned all over the world and you're not region-locked.
We've moved from AWS to Cloudflare and I can confirm Workers are awesome. The only downside is the amount of work required to get npm packages running properly: due to the lack of a Node.js runtime, bundling, shims and stubbing are required to get stuff like "fs" and "net" compiling. AWS is more mature, has better community support around it, and generally the tooling is more stable. Workers are quickly catching up, though, with great new additions such as ES modules, custom builds, Miniflare, etc.
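For reference, a minimal Worker in the ES modules format mentioned above looks roughly like this:

    export default {
      async fetch(request: Request): Promise<Response> {
        const url = new URL(request.url);
        return new Response(`Hello from ${url.pathname}`, {
          headers: { "content-type": "text/plain" },
        });
      },
    };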
How does a single metric from a highly specialized runtime environment indicate a tech stack is dead?
There are things you can do right now [1] to mitigate these cold start issues.
Going forward, ahead-of-time compilation will be an option. [2]
Aside from cold starts, note that the improvements in .NET make ASP.NET Core one of the fastest web frameworks. [3]
The article:
> “.Net has almost the same performance as Golang and Rust, but only after 1k iterations(after JIT).”
Additions like async/await and nullable reference types make it easier to write bug-free code, which for a lot of folks is a better trade off than “speaking to the hardware directly”.
.NET also runs natively on a bunch of platforms now, including ARM.
I’d call all of that continuous improvement. Perhaps even reinvention?
[1] https://www.morling.dev/blog/how-i-built-a-serverless-search...