With Fargate Savings Plans and Fargate Spot, running workloads on Fargate is getting substantially cheaper and, with the exception of extremely bursty workloads, more consistently performant than Lambda. Having to pay both to provision Lambda capacity and for the compute time on that capacity makes Fargate even more appealing for high-volume workloads.
The new pricing page for Lambda ("Example 2") shows a 100M invocation/month workload with provisioned concurrency costing $542/month. For that same cost you could run ~61 Fargate instances (0.25 vCPU, 0.5GB RAM) 24/7, or ~160 instances with Spot. For context, I have run a simple Node.js workload on both Lambda and Fargate and was able to handle 100M events/month with just 3 instances.
Serverless developers take note: it's time to learn Docker and how to write a task-definition.json.
Dollar-for-dollar comparisons are one way to compare these two technologies, but they leave a lot uncovered. The application programming model varies greatly (socket/port vs. event). There's also a lot that Lambda brings to the table in terms of monitoring, logging, etc. that you'd have to do the work to enable yourself.
Fargate is a great product, but it doesn't completely remove all operational work to the degree that Lambda does.
Agreed. For the vast majority of cases, a Lambda function is easier to ship and maintain, and most likely dramatically cheaper. I really only think the value of using Fargate kicks in compared to Lambda at around 5M+ invocations/month. YMMV based on workflow and workload.
You are comparing only the costs of running them; what about the cost of the developers who build/debug/troubleshoot the containers? As someone who runs workloads on both Lambda and Fargate, it's way harder to make things tick on Fargate.
It also depends a lot on the nature of the workload. For IO bound workloads (like most web services), Fargate can easily be an order of magnitude cheaper than Lambda. But if your task is soaking the CPU then you can really get your money's worth out of Lambda.
I agree, but you don’t need to learn how to use a task-definition file. I would actually advise against it. You can create your entire Fargate environment with CloudFormation.
You can also create your entire Fargate environment in a couple lines of TypeScript / Python / Java code using the AWS Cloud Development Kit. The AWS CDK is a declarative SDK that generates and deploys CloudFormation on your behalf, while offering you prebuilt patterns for many common deployment architectures: https://docs.aws.amazon.com/cdk/latest/guide/ecs_example.htm...
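For example, here's a minimal sketch roughly along the lines of that guide (construct names are placeholders and the image is just the sample from the AWS docs), standing up a load-balanced Fargate service:

    // CDK v1 style, using the ecs-patterns module
    import * as cdk from '@aws-cdk/core';
    import * as ecs from '@aws-cdk/aws-ecs';
    import * as ecs_patterns from '@aws-cdk/aws-ecs-patterns';

    const app = new cdk.App();
    const stack = new cdk.Stack(app, 'FargateDemoStack');

    // The cluster (and its VPC) are created with sensible defaults.
    const cluster = new ecs.Cluster(stack, 'Cluster');

    // One construct wires up the task definition, service, and ALB.
    new ecs_patterns.ApplicationLoadBalancedFargateService(stack, 'Service', {
      cluster,
      cpu: 256,            // 0.25 vCPU
      memoryLimitMiB: 512, // 0.5 GB
      taskImageOptions: {
        image: ecs.ContainerImage.fromRegistry('amazon/amazon-ecs-sample'),
      },
    });

    app.synth();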
That’s cool. I’ve never played with CDK. One issue I had with CloudFormation was that when I built the Docker image with a tag of :latest and then ran the CF template, CloudFormation didn’t perform any updates because the template hadn’t changed.
Luckily we use CodeBuild and Octopus Deploy. I was able to use the CodeBuild build number environment variable to tag the Docker image, pass that along as the Octopus build number, and use an Octopus Deploy variable in the CF template to force unique, consistent tags.
Lambda’s success was never about bursty workloads; it’s about the elastic, event-driven compute model it enables.
Yes, only paying for the compute you actually use is great, but so is having basically limitless compute power (wallet willing) without the ops overhead and system maintenance.
Cold starts have been a problem for a while, and while there may be a better way than this long term, to some degree the solution will always be keeping a function warm. That’s ultimately compute, and AWS is not likely to give it away.
Honest question, not being sarcastic: if cold start latency is so important, why choose functions over Elastic Beanstalk or another auto-scaling type of system?
An answer could help us try something new. We currently use large Google App Engine 'apps' after failing to get functions to scale quickly enough (and hitting limits). We have SUPER bursty traffic that needs to scale up to hundreds of instances very fast.
> if cold start latency is so important, why choose functions over Elastic Beanstalk or another auto-scaling type of system?
Pricing! It depends on the application, but there are some use cases where Lambda is way cheaper, so if we can also partially solve cold start latency, why not?
Way cheaper given a particular workload and access pattern*; Lambda is not magical, and its pricing is tricky to figure out until you have something running.
You can continue to pay only for what you use and handle super bursty workloads. But there is no free solution to cold starts unless you can see the future. People needing consistent latency were already doing something similar to keep containers alive, or they just avoided Lambda altogether. So I think it's a step forward.
"Click a checkbox and we'll run your code for you, take care of OS security updates, compliance requirements, autoscaling, load balancing, AZ resiliency, getting logs of your box, restarting unhealthy processes, ..."
You wouldn’t use provisioned capacity for “workloads” where you don’t care about latency - like processing events.
It would only be used for user impacting APIs.
There are a few types of processes that I have had to create.
1. A Windows service that processed a queue. We have 20x more messages at peak. Of course, since it was tied to Windows, Lambda wasn’t an option. I had to create an autoscaling group based on queue length. That also involves CloudWatch alarms to trigger scaling, and now we either have one instance running all the time (production) or a minimum of zero, launching an instance only when there is a message in the queue (non-prod). Not only is the process slower to scale, but because it’s Windows, AWS bills hourly.
Of course, the deployment process and CloudFormation template are a lot more complicated than with Lambda.
2. The same sort of process on Lambda. The CloudFormation template using SAM is much simpler, and the process is faster to scale in and out.
Also, you can configure everything in the web console and export the template.
3. A Node/Express API using Lambda proxy integration behind API Gateway.
Again, this was easy to set up, but cold start times were killing us, and we knew we were going to have to move it off of Lambda because of the 6MB request/response limit.
4. The same API as above running in Fargate.
Since we knew in advance that this was the direction we wanted to go, I opted to use Node/Express for the Lambda, so we didn’t require any code changes (see the sketch below). But creating a registry, Docker containers, services, clusters, load balancers, autoscaling groups, etc. took a lot longer to get right, and then automating everything with CloudFormation was more complicated.
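For anyone curious, the trick that let the same code run in both places was wrapping the Express app for Lambda rather than rewriting it. A rough sketch (using the serverless-http package as one option; the file and route names here are just examples):

    // app.ts — the plain Express app, shared by both targets
    import express from 'express';
    export const app = express();
    app.get('/health', (_req, res) => res.json({ ok: true }));

    // lambda.ts — entry point for API Gateway proxy integration
    import serverless from 'serverless-http';
    import { app } from './app';
    export const handler = serverless(app);

    // server.ts — entry point for the Fargate container: same app, bound to a port
    import { app } from './app';
    app.listen(process.env.PORT || 3000);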
Hey all, I lead developer advocacy for serverless at AWS and was part of this product launch since we started thinking about it (quite some time ago, I should say). I'm running around re:Invent this week, but will try and pop in and answer any questions I can.
Provisioned Concurrency (PC) is an interesting feature for us, as we've gotten so much feedback over the years about the pain point of the service overhead leading up to your code execution (the cold start). With PC we basically remove most of that service overhead by pre-spinning up execution environments.
This feature is really for folks with interactive, super latency-sensitive workloads. It will bring any overhead from our side down to sub-100ms. Realistically not every workload needs this, so don't feel like you need it to have well-performing functions. There are still a lot of things you need to do in your code, as well as knobs like memory, that impact function perf.
Fwiw, the new VPC networking is completely rolled out to all public regions now. (#2) (And thank you, it's called "Attached to a VPC" and not "in" :) )
This covers you straight through 4.
Now, it's possible that your execution environment could be sitting for some time waiting for an invoke, so pre-handler DB connections and things like that might need to be tweaked in this model.
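To illustrate (just a sketch of one way to handle it; the getDb helper and env var are placeholders): create the connection lazily on first use instead of unconditionally at init time, so an environment that sat idle for a while isn't holding a connection it never used.

    import { Client } from 'pg';

    let client: Client | undefined;

    // Connect on first use rather than at init time; a real implementation
    // would also re-create the client if the connection has gone stale.
    async function getDb(): Promise<Client> {
      if (!client) {
        client = new Client({ connectionString: process.env.DATABASE_URL });
        await client.connect();
      }
      return client;
    }

    export const handler = async (event: unknown) => {
      const db = await getDb();
      const { rows } = await db.query('SELECT 1');
      return { statusCode: 200, body: JSON.stringify(rows) };
    };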
So I had to convert three Lambda APIs using proxy integration to Fargate, mostly because of the 6MB request/response limit, but the cold starts caused us to make a rule that we weren’t going to convert our EC2-hosted APIs to Lambda; we were going to host them on Fargate.
But since the APIs that I moved over to Fargate are now automatically being deployed to both Lambda and Fargate with separate URLs, we can A/B test both and see if we will move to Lambda in cases where the request/response limit isn’t a problem.
Btw, I didn’t think using a NAT instead of an ENI had rolled out completely. I tried to delete a stack recently and it still took a while to “clean up” resources, which I thought was caused by deleting the ENI. I’ll be on the lookout for it next time I need to delete a stack.
I'm wondering how feasible it'd be to vary PC through the day.
The pricing examples include using PC on a limited duty cycle, and billing is defined to start from the moment it's enabled (rather than from when it's ready), so it'd be reasonable to expect there's some level of certainty that the concurrency level is ready within a defined timeframe. What might that timeframe be, and to what level of certainty?
Sure, but there are no actionable metrics re. response time.
The closest is a suggestion from prior documentation[1] that Lambda "can scale by an additional 500 instances each minute" for a given function, but that's phrased like a promotional claim, not a commitment or even an objective, and it's even unclear whether that's a floor, a ceiling, or some average measure. I wouldn't doubt that PC lives on the same control plane as Lambda's regular scaling, but assuming identical behaviour is unwise unless documented.
AFAIK it's not a solution, it's just a workaround.
As others have said, the previous workaround was a cron event that would invoke a function every few minutes to keep it warm. This is a lot better than that.
They're still working to get cold starts as fast as possible, but this helps a lot in the meantime.
Your function is frozen if there is no active invoke in progress. So no, an SSH or Minecraft server will not work, unless you make them communicate over Lambda invocations.
Am I misunderstanding something here? Based on the AWS calculations on the Lambda pricing page, a single 256MB Lambda would incur a cost of $2.7902232 per month using "provisionedConcurrency: 1". Pushing it to 3008MB, to get access to more processing power, makes that go up to $32.78 per month (EU London region).
Compare that to the standard way of warming it up by hitting the endpoint once every 5 minutes, which comes out to 8,640 calls per month and costs next to nothing.
Unless I am terribly mistaken, it doesn't seem like allowing AWS to handle this and not doing it in code (warmup plugin, cron job, etc.) is worth the cost.
Pinging the lambda to keep it warm doesn’t do the same thing.
When you do that, it only keeps one instance warm. If you have 10 concurrent requests, even if one is warm, the other 9 requests will still experience a cold start.
The only way around this is to send a request that holds the connection open long enough to make sure concurrent requests start a new lambda instance. While you are keeping the request open, that lambda instance isn’t available for a real call.
If the entire purpose of lambda is to make things easier, once you start down the Rube Goldberg path of trying to keep enough instances warm, it kind of defeats the purpose. Just spend the money and the time to set up an autoscaling group of the smallest instances of EC2 or use Fargate if you don’t want the cold start times.
It really depends on how much a fast request is worth to you. With a manual ping, you can keep only one function alive. In the worst case, you may actually block your existing function with that ping, causing another instance to be spawned for real requests (which then experience the slow startup).
The timed pings are just a hack and don't solve all the issues.
Apparently you get a discount on the execution time for provisioned Lambdas; not sure how much this would offset it (the more actual utilization you get, the better, I guess):
> @ben11kehoe @kondro @mwarkentin You pay for the configured Provisioned Concurrency with a flat hourly charge. Lambda usage gets billed the same, but with a discount on unit pricing ($0.035/GB-hour vs $0.06 on "on demand").
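To put rough numbers on that discount, using only the figures quoted above: a function billing 1,000 GB-hours of duration in a month would pay about $35 for that duration under Provisioned Concurrency versus $60 on demand, and that ~$25 saving then has to cover the flat hourly charge for the configured concurrency before it nets out positive.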
I am curious how, without actually invoking the function, lambda knows when your application has been initialized.
Does it just run your application for a few seconds and then freeze it? Or does it not even run the application code, merely ensure the virtual machine is loaded into memory ready to be run?
It seems a periodic warmer invocation at least has the advantage of ensuring your app is fully initialized and ready to respond to requests.
As a seasoned AWS developer, I love this feature. However, I wonder how the increasing complexity of AWS affects new devs as they try to grok the offered services. AWS typically does a pretty good job hiding advanced features from beginners, but I wonder how long they can do that.
Lambda has always been the most expensive compute you can buy on AWS -- you could think of that as the premium for being the most "elastic". So this feature is about giving away some of that elasticity for (a) performance predictability and (b) a bit of total cost savings. Note that you can still happily "burst" into exactly as much concurrency as you could before, you'll just have cold starts.
People used to write cron jobs to keep their functions warm, which besides being ugly didn't even work well -- you could at best keep one instance warm with infrequent pinging, i.e. a provisioned concurrency of 1. So this feature addresses that use case in a much more systematic way.
There's some precedent for features like this; provisioned IOPS and reserved instances come to mind. In both cases you trade off elasticity and get some predictability in return (performance in one case, cost in the other).
I doubt there are that many people that want a provisioned concurrency of greater than one.
If you have a reliable base load of a few requests a second and you don't have some constraint that forces you to use Lambda, you are going to get much better value running your application on ECS/EC2.
So excited for this. Between this and the recent removal of the VPC cold start issues, avoiding Lambda for APIs because of latency seems to be a thing of the past.
Sorry for the stupid question, I genuinely want to know: how does this differ from firing up your function with an additional call every, idk, 5 mins? Wouldn’t it be cheaper and easier?
This works beyond a scale of 1; e.g. if you want to be ready for 100 concurrent invocations, that's quite difficult to do with pings, I imagine.
Also presumably more reliable.
With the 5-minute ping, the underlying container will still be reprovisioned every few hours, at which point it’s a race to see whether the next ping or a real user request arrives first to swallow the cold start.
And what if I fire 500 concurrent keep-alives (or maybe a few more, for overhead)? Then the 500 Lambda instances will stay alive for a couple of minutes again, but I'm still only charged for the 500 calls, not the GB-seconds for the rest of the time the Lambdas stay alive, right?
How do you ensure that the 500 concurrent keep-alives land on different Lambda instances? I.e. requests 220 onwards might hit instances that were already warmed up by requests 0-219. I just made those numbers up, of course.
That's why I mentioned 'with maybe some overhead'. You can also have the keep-alive handling take a second or two extra to complete, so the Lambda is blocked for that time and the burst of calls gets spread across more instances.
It's not going to be a precise solution, but paying for 2-3 seconds every minute instead of the whole minute is still a lot cheaper.
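For what it's worth, a rough sketch of that kind of warm-up handler (the { warmup: true } event shape and the 2-second delay are just an illustrative convention; plugins like serverless-plugin-warmup do something similar):

    const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

    export const handler = async (event: { warmup?: boolean }) => {
      if (event.warmup) {
        // Hold this instance busy briefly so concurrent warm-up pings
        // fan out to separate instances instead of reusing this one.
        await sleep(2000);
        return { warmed: true };
      }

      // ... real request handling goes here ...
      return { statusCode: 200, body: 'hello' };
    };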
Request for anyone on the Lambda team who happens to read this: your API doesn’t appear to offer a way to retrieve the “last modified by” user when grabbing function metadata.
Huh. Turns out fewer of the APIs I use provide that level of detail than I thought. That just happens to be the closest approximation of what I'm currently trying to automate.
Anyway, there are several API endpoints in Lambda which supply "LastModified" but none that I can find supply "LastModifiedUser".
Layer Name and Layer Arn (the Layer Version Arn without a version suffix) are interchangeable in APIs that require a Layer Name parameter. I understand that you're trying to extract the "LayerName" field returned in the API response in ListLayers, but you can do it more concisely.
If you just need the Layer Arn to call other APIs:
a) You could eliminate steps 1 and 6, and use the Layer Arn value from 5 to call APIs that require a layer name.
b) Alternatively, you could chop the version number off the Layer Version Arn string(s) yourself in step 4, and skip the GetLayerVersionByArn call in step 5 altogether.
If you explicitly require the name, not the Layer Arn:
c) You could parse it right out of the Layer Version Arn yourself.
When it comes to doing your own string manipulation for (b) or (c), there are many ways to skin a cat... but you could use a regex (the pattern is in the API documentation), or split on colons and index the second-to-last element in the array.
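For example, a quick sketch of the split-on-colons approach (the sample ARN below is made up):

    // A Layer Version Arn looks like:
    //   arn:aws:lambda:<region>:<account>:layer:<layer-name>:<version>
    const layerVersionArn = 'arn:aws:lambda:us-east-1:123456789012:layer:my-layer:42';

    const parts = layerVersionArn.split(':');

    // The layer name is the second-to-last element...
    const layerName = parts[parts.length - 2]; // 'my-layer'

    // ...and dropping the trailing version gives the Layer Arn.
    const layerArn = parts.slice(0, -1).join(':'); // '...:layer:my-layer'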
Is it useful to return the Layer Name in more APIs? What is your use case?
Sorry, been away from this for a while. This is for a dashboard (we have many accounts and not all stakeholders have permissions across all accounts, so we need ways to manage the information outside the web console).
I am reluctant to treat ARNs as anything but opaque blobs; I do so when necessary, but I know their format has changed in the past, and the lack of a central resource to track breaking changes across AWS means that we would likely not know that the ARN format is changing until our tools break.
I appreciate the ideas, however, and I'll look more closely at my flow to see what shortcuts I can live with.
I think this is a really good feature and has many use cases.
I also anticipate that many developers who shouldn't use Lambda are going to use it because of provisioned concurrency.
I'm still frustrated that Lambda can't have alias-specific environment variables. Aren't aliases supposed to be used for staging function versions through a release pipeline?
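One workaround I've seen (just a sketch, assuming Node.js and that the function is always invoked through an alias; the config values are placeholders) is to derive the stage from the invoked ARN at runtime instead of from environment variables:

    import { Context } from 'aws-lambda';

    // Per-alias settings, keyed by alias name (placeholders).
    const configByAlias: Record<string, { tableName: string }> = {
      dev:  { tableName: 'orders-dev' },
      prod: { tableName: 'orders-prod' },
    };

    export const handler = async (event: unknown, context: Context) => {
      // When invoked through an alias, the ARN ends in ':<alias>';
      // fall back to 'dev' for unqualified invocations.
      const alias = context.invokedFunctionArn.split(':')[7];
      const config = configByAlias[alias] ?? configByAlias['dev'];
      return { statusCode: 200, body: config.tableName };
    };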