With Fargate Savings Plans and Fargate Spot, running workloads on Fargate is getting substantially cheaper and, with the exception of extremely bursty workloads, more consistently performant than Lambda. Having to pay both to provision Lambda capacity and for the compute time on that capacity makes Fargate even more appealing for high-volume workloads.
The new pricing page for Lambda ("Example 2") shows a 100M invocation/month workload with provisioned concurrency costing $542/month. For that same cost you could run ~61 Fargate instances (0.25 vCPU, 0.5GB RAM) 24/7, or ~160 instances with Spot. For context, I have run a simple Node.js workload on both Lambda and Fargate and was able to handle 100M events/month with just 3 instances.
Serverless developers take note: it's time to learn Docker and how to write a task-definition.json.
Dollar-for-dollar comparisons are one way to compare these two technologies, but they leave a lot uncovered. The application programming model varies greatly (socket/port vs. event). There's also a lot that Lambda brings to the table in terms of monitoring, logging, etc. that you'd have to do the work to enable yourself.
Fargate is a great product, but it doesn't completely remove all operational work to the degree that Lambda does.
Agreed. For the vast majority of cases, a Lambda function is easier to ship and maintain, and most likely dramatically cheaper. I really only think the value of using Fargate kicks in compared to Lambda at around 5M+ invocations/month. YMMV based on workflow and workload.
You are comparing only the costs of running them; what about the cost of the developers who build/debug/troubleshoot the containers? As someone who runs workloads on both Lambda and Fargate, it's way harder to make things tick on Fargate.
It also depends a lot on the nature of the workload. For IO bound workloads (like most web services), Fargate can easily be an order of magnitude cheaper than Lambda. But if your task is soaking the CPU then you can really get your money's worth out of Lambda.
I agree, but you don’t need to learn how to use a task-definition file. I would actually advise against it. You can create your entire Fargate environment with CloudFormation.
You can also create your entire Fargate environment in a couple lines of TypeScript / Python / Java code using the AWS Cloud Development Kit. The AWS CDK is a declarative SDK that generates and deploys CloudFormation on your behalf, while offering you prebuilt patterns for many common deployment architectures: https://docs.aws.amazon.com/cdk/latest/guide/ecs_example.htm...
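For example, here's a minimal sketch roughly along the lines of that guide (construct names are placeholders and the image is just the sample from the AWS docs), standing up a load-balanced Fargate service:

    // CDK v1 style, using the ecs-patterns module
    import * as cdk from '@aws-cdk/core';
    import * as ecs from '@aws-cdk/aws-ecs';
    import * as ecs_patterns from '@aws-cdk/aws-ecs-patterns';

    const app = new cdk.App();
    const stack = new cdk.Stack(app, 'FargateDemoStack');

    // The cluster (and its VPC) are created with sensible defaults.
    const cluster = new ecs.Cluster(stack, 'Cluster');

    // One construct wires up the task definition, service, and ALB.
    new ecs_patterns.ApplicationLoadBalancedFargateService(stack, 'Service', {
      cluster,
      cpu: 256,            // 0.25 vCPU
      memoryLimitMiB: 512, // 0.5 GB
      taskImageOptions: {
        image: ecs.ContainerImage.fromRegistry('amazon/amazon-ecs-sample'),
      },
    });

    app.synth();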
That’s cool. I’ve never played with CDK. One issue I had with CloudFormation was that when I built the Docker image with a tag of :latest and then ran the CF template, CloudFormation didn’t perform any updates because the template hadn’t changed.
Luckily we use CodeBuild and Octopus Deploy. I was able to use the CodeBuild build number environment variable to tag the Docker image, pass that along as the Octopus build number, and use an Octopus Deploy variable in the CF template to force unique, consistent tags.
Lambda’s success was never about bursty workloads; it’s about the elastic, event-driven compute model it enables.
Yes, only paying for the compute you actually use is great, but so is having basically limitless compute power (wallet willing) without the ops overhead and system maintenance.
Cold starts have been a problem for a while, and while there may be a better way than this long term, to some degree the solution will always be keeping a function warm. That’s ultimately compute, and AWS is not likely to give it away.
Honest question, not being sarcastic: if cold start latency is so important, why choose functions over Elastic Beanstalk or another auto-scaling type of system?
An answer could help us try something new. We currently use large Google App Engine 'apps' after failing to get functions to scale quickly enough (and hitting limits). We have SUPER bursty traffic that needs to scale up to hundreds of instances very fast.
> if cold start latency is so important, why choose functions over Elastic Beanstalk or another auto-scaling type of system?
Pricing! It depends on the application, but there are some use cases where Lambda is way cheaper, so if we can also partially solve cold start latency, why not?
Way cheaper given a particular workload and access pattern*; Lambda is not magical, and its pricing is tricky to figure out until you have something running.
You can continue to pay only for what you use and handle super bursty workloads. But there is no free solution to cold starts unless you can see the future. People needing consistent latency were already doing something similar to keep containers alive, or they just avoided Lambda altogether. So I think it's a step forward.
"Click a checkbox and we'll run your code for you, take care of OS security updates, compliance requirements, autoscaling, load balancing, AZ resiliency, getting logs of your box, restarting unhealthy processes, ..."
You wouldn’t use provisioned capacity for “workloads” where you don’t care about latency - like processing events.
It would only be used for user impacting APIs.
There are a few types of processes that I have had to create.
1. A Windows service that processed a queue. We have 20x more messages at peak. Of course, since it was tied to Windows, Lambda wasn’t an option. I had to create an autoscaling group based on queue length. That also involves CloudWatch alarms to trigger scaling, and now we either have one instance running all the time (production) or a minimum of zero, launching an instance only when there is a message in the queue (non-prod). Not only is the process slower to scale, but because it’s Windows, AWS bills hourly.
Of course, the deployment process and CloudFormation template are a lot more complicated than with Lambda.
2. The same sort of process on Lambda. The CloudFormation template using SAM is much simpler, and the process is faster to scale in and out.
Also, you can configure everything in the web console and export the template.
3. A Node/Express API using Lambda proxy integration behind API Gateway.
Again, this was easy to set up, but cold start times were killing us, and we knew we were going to have to move it off of Lambda because of the 6MB request/response limit.
4. The same API as above running in Fargate.
Since we knew in advance that this was the direction we wanted to go, I opted to use Node/Express for the Lambda, so we didn’t require any code changes (see the sketch below). But creating a registry, Docker containers, services, clusters, load balancers, autoscaling groups, etc. took a lot longer to get right, and then automating everything with CloudFormation was more complicated.
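For anyone curious, the trick that let the same code run in both places was wrapping the Express app for Lambda rather than rewriting it. A rough sketch (using the serverless-http package as one option; the file and route names here are just examples):

    // app.ts — the plain Express app, shared by both targets
    import express from 'express';
    export const app = express();
    app.get('/health', (_req, res) => res.json({ ok: true }));

    // lambda.ts — entry point for API Gateway proxy integration
    import serverless from 'serverless-http';
    import { app } from './app';
    export const handler = serverless(app);

    // server.ts — entry point for the Fargate container: same app, bound to a port
    import { app } from './app';
    app.listen(process.env.PORT || 3000);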
Hey all, I lead developer advocacy for serverless at AWS and was part of this product launch since we started thinking about it (quite some time ago, I should say). I'm running around re:Invent this week, but will try and pop in and answer any questions I can.
Provisioned Concurrency (PC) is an interesting feature for us, as we've gotten so much feedback over the years about the pain point of the service overhead leading up to your code execution (the cold start). With PC we basically remove most of that service overhead by pre-spinning up execution environments.
This feature is really for folks with interactive, super latency-sensitive workloads. It will bring any overhead from our side down to sub-100ms. Realistically not every workload needs this, so don't feel like you need it to have well-performing functions. There are still a lot of things you need to do in your code, as well as knobs like memory, that impact function perf.
Fwiw, the new VPC networking is completely rolled out to all public regions now. (#2) (And thank you, it's called "Attached to a VPC" and not "in" :) )
This covers you straight through 4.
Now, it's possible that your execution environment could be sitting for some time waiting for an invoke, so pre-handler DB connections and things like that might need to be tweaked in this model.
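To illustrate (just a sketch of one way to handle it; the getDb helper and env var are placeholders): create the connection lazily on first use instead of unconditionally at init time, so an environment that sat idle for a while isn't holding a connection it never used.

    import { Client } from 'pg';

    let client: Client | undefined;

    // Connect on first use rather than at init time; a real implementation
    // would also re-create the client if the connection has gone stale.
    async function getDb(): Promise<Client> {
      if (!client) {
        client = new Client({ connectionString: process.env.DATABASE_URL });
        await client.connect();
      }
      return client;
    }

    export const handler = async (event: unknown) => {
      const db = await getDb();
      const { rows } = await db.query('SELECT 1');
      return { statusCode: 200, body: JSON.stringify(rows) };
    };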
So I had to convert three Lambda APIs using proxy integration to Fargate, mostly because of the 6MB request/response limit, but the cold starts caused us to make a rule that we weren’t going to convert our EC2-hosted APIs to Lambda; we were going to host them on Fargate.
But since the APIs that I moved over to Fargate are now automatically being deployed to both Lambda and Fargate with separate URLs, we can A/B test both and see if we will move to Lambda in cases where the request/response limit isn’t a problem.
Btw, I didn’t think using a NAT instead of an ENI had rolled out completely. I tried to delete a stack recently and it still took a while to “clean up” resources, which I thought was caused by deleting the ENI. I’ll be on the lookout for it next time I need to delete a stack.
I'm wondering how feasible it'd be to vary PC through the day.
The pricing examples include using PC on a limited duty cycle, and billing is defined to start from the moment it's enabled (rather than from when it's ready), so it'd be reasonable to expect there's some level of certainty that the concurrency level is ready within a defined timeframe. What might that timeframe be, and to what level of certainty?
Sure, but there are no actionable metrics re. response time.
The closest is a suggestion from prior documentation[1] that Lambda "can scale by an additional 500 instances each minute" for a given function, but that's phrased like a promotional claim, not a commitment or even an objective, and it's even unclear whether that's a floor, a ceiling, or some average measure. I wouldn't doubt that PC lives on the same control plane as Lambda's regular scaling, but assuming identical behaviour is unwise unless documented.
AFAIK it's not a solution, it's just a workaround.
As others have said, the previous workaround was a cron event that would invoke a function every few minutes to keep it warm. This is a lot better than that.
They're still working to get cold starts as fast as possible, but this helps a lot in the meantime.
Your function is frozen if there is no active invoke in progress. So no, an SSH or Minecraft server will not work, unless you make them communicate over Lambda invocations.
Am I misunderstanding something here? Based on the AWS calculations on the Lambda pricing page, a single 256MB Lambda would incur a cost of $2.7902232 per month using "provisionedConcurrency: 1". Pushing it to 3008MB, to get access to more processing power, makes that go up to $32.78 per month (EU London region).
Compare that to the standard way of warming it up by hitting the endpoint once every 5 minutes, which comes out to 8,640 calls per month and costs next to nothing.
Unless I am terribly mistaken, it doesn't seem like allowing AWS to handle this and not doing it in code (warmup plugin, cron job, etc.) is worth the cost.
Pinging the lambda to keep it warm doesn’t do the same thing.
When you do that, it only keeps one instance warm. If you have 10 concurrent requests, even if one is warm, the other 9 requests will still experience a cold start.
The only way around this is to send a request that holds the connection open long enough to make sure concurrent requests start a new lambda instance. While you are keeping the request open, that lambda instance isn’t available for a real call.
If the entire purpose of lambda is to make things easier, once you start down the Rube Goldberg path of trying to keep enough instances warm, it kind of defeats the purpose. Just spend the money and the time to set up an autoscaling group of the smallest instances of EC2 or use Fargate if you don’t want the cold start times.
It really depends on how much a fast request is worth to you. With a manual ping, you can keep only one function alive. In the worst case, you may actually block your existing function with that ping, causing another instance to be spawned for real requests (which then experience the slow startup).
The timed pings are just a hack and don't solve all the issues.
Apparently you get a discount on the execution time for provisioned Lambdas; not sure how much this would offset it (the more actual utilization you get, the better, I guess):
> @ben11kehoe @kondro @mwarkentin You pay for the configured Provisioned Concurrency with a flat hourly charge. Lambda usage gets billed the same, but with a discount on unit pricing ($0.035/GB-hour vs $0.06 on "on demand").
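To put rough numbers on that discount, using only the figures quoted above: a function billing 1,000 GB-hours of duration in a month would pay about $35 for that duration under Provisioned Concurrency versus $60 on demand, and that ~$25 saving then has to cover the flat hourly charge for the configured concurrency before it nets out positive.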
I am curious how, without actually invoking the function, lambda knows when your application has been initialized.
Does it just run your application for a few seconds and then freeze it? Or does it not even run the application code, merely ensure the virtual machine is loaded into memory ready to be run?
It seems a periodic warmer invocation at least has the advantage of ensuring your app is fully initialized and ready to respond to requests.
As a seasoned AWS developer, I love this feature. However, I wonder how the increasing complexity of AWS affects new devs as they try to grok the offered services. AWS typically does a pretty good job hiding advanced features from beginners, but I wonder how long they can do that.
Lambda has always been the most expensive compute you can buy on AWS -- you could think of that as the premium for being the most "elastic". So this feature is about giving away some of that elasticity for (a) performance predictability and (b) a bit of total cost savings. Note that you can still happily "burst" into exactly as much concurrency as you could before, you'll just have cold starts.
People used to write cron jobs to keep their functions warm, which besides being ugly didn't even work well -- you could at best keep one instance warm with infrequent pinging, i.e. a provisioned concurrency of 1. So this feature addresses that use case in a much more systematic way.
There's some precedent for features like this; provisioned IOPS and reserved instances come to mind. In both cases you trade off elasticity and get some predictability in return (performance in one case, cost in the other).
I doubt there are that many people that want a provisioned concurrency of greater than one.
If you have a reliable base load of a few requests a second and you don't have some constraint that forces you to use Lambda, you are going to get much better value running your application on ECS/EC2.
So excited for this. Between this and the recent removal of the VPC cold start issues, avoiding Lambda for APIs because of latency seems to be a thing of the past.
Sorry for the stupid question, I genuinely want to know: how does this differ from firing up your function with an additional call every, idk, 5 mins? Wouldn’t it be cheaper and easier?
This works beyond a scale of 1; e.g. if you want to be ready for 100 concurrent invocations, that's quite difficult to do with pings, I imagine.
Also presumably more reliable.
With the 5-minute ping, the underlying container will still be reprovisioned every few hours, at which point it’s a race to see whether the next ping or a real user request arrives first to swallow the cold start.
And what if I fire 500 concurrent keep-alives (or maybe a few more, for overhead)? Then the 500 Lambda instances will stay alive for a couple of minutes again, but I'm still only charged for the 500 calls, not the GB-seconds for the rest of the time the Lambdas stay alive, right?
How do you ensure that the 500 concurrent keep-alives land on different Lambda instances? I.e. requests 220 onwards might hit instances that were already warmed up by requests 0-219. I just made those numbers up, of course.
That's why I mentioned 'with maybe some overhead'. You can also have the keep-alive handling take a second or two extra to complete, so the Lambda is blocked for that time and the burst of calls gets spread across more instances.
It's not going to be a precise solution, but paying for 2-3 seconds every minute instead of the whole minute is still a lot cheaper.
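For what it's worth, a rough sketch of that kind of warm-up handler (the { warmup: true } event shape and the 2-second delay are just an illustrative convention; plugins like serverless-plugin-warmup do something similar):

    const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

    export const handler = async (event: { warmup?: boolean }) => {
      if (event.warmup) {
        // Hold this instance busy briefly so concurrent warm-up pings
        // fan out to separate instances instead of reusing this one.
        await sleep(2000);
        return { warmed: true };
      }

      // ... real request handling goes here ...
      return { statusCode: 200, body: 'hello' };
    };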
Request for anyone on the Lambda team who happens to read this: your API doesn’t appear to offer a way to retrieve the “last modified by” user when grabbing function metadata.
Huh. Turns out fewer of the APIs I use provide that level of detail than I thought. That just happens to be the closest approximation of what I'm currently trying to automate.
Anyway, there are several API endpoints in Lambda which supply "LastModified" but none that I can find supply "LastModifiedUser".
Layer Name and Layer Arn (the Layer Version Arn without a version suffix) are interchangeable in APIs that require a Layer Name parameter. I understand that you're trying to extract the "LayerName" field returned in the API response in ListLayers, but you can do it more concisely.
If you just need the Layer Arn to call other APIs:
a) You could eliminate steps 1 and 6, and use the Layer Arn value from 5 to call APIs that require a layer name.
b) Alternatively, you could chop the version number off the Layer Version Arn string(s) yourself in step 4, and skip the GetLayerVersionByArn call in step 5 altogether.
If you explicitly require the name, not the Layer Arn:
c) You could parse it right out of the Layer Version Arn yourself.
When it comes to doing your own string manipulation for (b) or (c), there are many ways to skin a cat... but you could use a regex (the pattern is in the API documentation), or split on colons and index the second-to-last element in the array.
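For example, a quick sketch of the split-on-colons approach (the sample ARN below is made up):

    // A Layer Version Arn looks like:
    //   arn:aws:lambda:<region>:<account>:layer:<layer-name>:<version>
    const layerVersionArn = 'arn:aws:lambda:us-east-1:123456789012:layer:my-layer:42';

    const parts = layerVersionArn.split(':');

    // The layer name is the second-to-last element...
    const layerName = parts[parts.length - 2]; // 'my-layer'

    // ...and dropping the trailing version gives the Layer Arn.
    const layerArn = parts.slice(0, -1).join(':'); // '...:layer:my-layer'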
Is it useful to return the Layer Name in more APIs? What is your use case?
Sorry, been away from this for a while. This is for a dashboard (we have many accounts and not all stakeholders have permissions across all accounts, so we need ways to manage the information outside the web console).
I am reluctant to treat ARNs as anything but opaque blobs; I do so when necessary, but I know their format has changed in the past, and the lack of a central resource to track breaking changes across AWS means that we would likely not know that the ARN format is changing until our tools break.
I appreciate the ideas, however, and I'll look more closely at my flow to see what shortcuts I can live with.
I think this is a really good feature and has many use cases.
I also anticipate that many developers who shouldn't use Lambda are going to use it because of provisioned concurrency.
I'm still frustrated that Lambda can't have alias-specific environment variables. Aren't aliases supposed to be used for staging function versions through a release pipeline?
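One workaround I've seen (just a sketch, assuming Node.js and that the function is always invoked through an alias; the config values are placeholders) is to derive the stage from the invoked ARN at runtime instead of from environment variables:

    import { Context } from 'aws-lambda';

    // Per-alias settings, keyed by alias name (placeholders).
    const configByAlias: Record<string, { tableName: string }> = {
      dev:  { tableName: 'orders-dev' },
      prod: { tableName: 'orders-prod' },
    };

    export const handler = async (event: unknown, context: Context) => {
      // When invoked through an alias, the ARN ends in ':<alias>';
      // fall back to 'dev' for unqualified invocations.
      const alias = context.invokedFunctionArn.split(':')[7];
      const config = configByAlias[alias] ?? configByAlias['dev'];
      return { statusCode: 200, body: config.tableName };
    };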