Comparing Serverless Performance for CPU Bound Tasks

cremp · on July 10, 2018

Each and every one of these posts from Cloudflare is direct targeting and completely biased. Should be noted that their response times are indicative of them being far fewer hops away. Unless they can run in the same DC, or even make the RTT fair, their 'Webpage response' metric is utterly useless.

Notice how they admit that they don't know how lambda really works. They switch between lambda@edge and Region-based lambdas, and don't seem to be able to be consistent with it.

Java Lambdas have horrible cold start times, and I'm not seeing any of this reflected anywhere in their report.

> Our Lambda is deployed to with the default 128MB of memory behind an API Gateway in us-east-1

Well duh the lambda is slower; it's going through API Gateway, and that does authentication processing as well.

All in all, these blog posts from Cloudflare are turning me off from them entirely, because they aren't even saying 'yeah, AWS got us beat in this case here.'

zackbloom · on July 10, 2018

Hi, I'm the author of the post, thanks for sharing your thoughts!

You're absolutely right we don't know how Lambda works. We have read what we could find that's publicly available, and done a bunch of testing, but Amazon doesn't share all that much about their architecture.

I agree that the cold-start times of Lambda are slow, particularly with languages like Java and with VPCs. My plan at the moment is to write a blog post focused on cold-start times specifically, when I can figure out how to accurately test that around the world.

I'm not entirely sure why API Gateway would add hundreds of ms of latency. We also do authentication processing with our Access product, for example, and it certainly doesn't add anywhere near that. I also don't have any of those API Gateway features enabled to begin with. If you would very much like it, I'm happy to test a Lambda by hitting the Invoke API directly, but I doubt you'll see much of a difference. As the post says, Lambda is granting us a much smaller quantity of CPU time, there's not much you can do to get around that.

I apologize if the transitions between global and region-specific tests are unclear. The majority of the tests are being done from DC, specifically to focus the comparison around execution time, not global latency. I did my best in the post to specify where I was running the test from. If you have an idea of how that can be better expressed please share and I'll do my best to incorporate it in the future.

cremp · on July 10, 2018

Thanks for being forthcoming; and I do appreciate the post. I'm partially venting, and I've been using AWS for a while. Vendor lock-in can be a pain in the rear sometimes.

A lot of the gripe I've got with these posts is that they seem somewhat incomplete. Hard to say things if you've only been messing with it for a week, and don't have a lot of the ins-and-outs of the services. That being said, I've spent at least 2 years diving into lambda and weird issues with it.

I'm not an employee of Amazon, and so my understanding can be off-base as well.

Lambdas are just managed EC2 instances. Each lambda code is stored in an AWS controlled S3 bucket; and on initial execution (cold start) pulled down, and run in their own chroot jail. I've dealt with java lambdas the most, and I can say that they take your zip, and run the jail inside the zip. They keep your java process open, and just call the handler on each call (warm start.) Each concurrent call can start another jail on another managed instance; getting the cold start time again.

You can get cold starts by uploading a new zip, or changing any of the lambda compute parameters.

The Golang works in a similar way, jails the zip, and keeps the program running, calling the handler as invokes come in.

I haven't done the python, node, or .NET enough to know if those are the same principles; I'd assume they are.

Interestingly, API Gateway really is just Cloudfront. Cloudfront is just the AWS managed API Gateway.

js2 · on July 11, 2018

On the Python bits, see this blog post - "Reverse engineering AWS Lambda":

https://www.denialof.services/lambda/

_ivvf · on July 10, 2018

Assuming the author's tests are single-threaded, I'm pretty sure 1024 MB doesn't give you a full CPU core on Lambda. I could be wrong though; I haven't paid attention to Lambda in a long time. Last I remember it was 1.5 GB that gave you a full core. This alone makes the comparison between a mid-range server and Lambda unfair, not to mention the differences between language runtimes.

That said, if you are using Lambda and expecting to not pay extra you have somehow been misled. Lambda is definitely more expensive per cycle than managing your own instances, and I doubt that will change any time soon.

hobls · on July 10, 2018

Anyone using Lambda should _absolutely_ do load testing with different memory configurations. You will get different results, and should analyze what is best for your application.

When calculating the overall cost of managing your own instances you should also include time spent by your engineering team. There are particular tipping points in terms of overall requests per second at which point you'd save money by moving from Lambda to something like Fargate, and then even farther above that, you're better off using EC2. And then even above that, you should be running your own instances in a colo space. (And then at some point you should probably be building your own datacenters, and then at some point you should start colonizing the moon, and then... you get the idea.)

chx · on July 10, 2018

> And then even above that, you should be running your own instances in a colo space

Why are people jumping from EC2 to colo and skipping dedicated servers? Mystery of my life. We were running the 75th largest site in the US some years ago (as measured by Quantcast), ran the numbers and colo was ridiculously expensive and way more troublesome.

mmt · on July 11, 2018

> colo was ridiculously expensive

That's slightly surprising to me, unless you were limiting yourself to a particular geographic market for colo that was unusually supply-constrained at the time. (The SFBA during certain years comes to mind).

Sometimes rented servers, for a specific use case (or if the provider is overstocked on a particular model) are a great deal, but I've never seen that at the high-performance end of the spectrum, if they even offer such a model in the first place.

For the average and middle-performance cases, though, for truly comparable servers and connectivity (internal and external, which can be tough to find in the first place), I found rented servers to be moderately more expensive than colo plus buying hardware amortized over 3 years [1].

> and way more troublesome

This one, is the mystery of my life. You mention "magic smoke" downthread, but I've only experienced that once in my entire career and that was with proprietary hardware 2 decades ago.

Conversely, my experience with rented servers is that when there is a hardware problem, other than obvious failure of a replaceable part, "troublesome" doesn't begin to describe it.

[1] yes, including all the costs like installation/rack/stack, network ports, spares, etc. They're not de minimis, but it's maybe a few extra percentage points on the overall cost.

hobls · on July 10, 2018

Oh, I was defining "EC2" as everything from small shared virtual boxes to EC2 bare metal. I suppose on-site dedicated servers is definitely a point I left out of my hypothetical list though! I used to work at an unnamed company that had a server rack plugged in between the kitchen and the ping pong table. Though I never witnessed such a thing, I'd be shocked if there was never an outage due to a particularly enthusiastic game.

chx · on July 10, 2018

Not on site, no, renting them from some big provider. The advantages are huge: if the magic smoke gets released, getting a replacement online is Someone Else's (TM) problem. If you find you need to grow, you release the one you rented into the pool and the provider will be able to rent it to the next schmuck. EC2 bare metal is very expensive. In general, I am very much a cloud skeptic. I have found bare metal to be a much better price/value for almost all websites. People are writing these complex apps that scale across database clusters and I am like... YAGNI. One database server (or two, for a hot spare) and maybe a few separate web frontends is almost always enough.

hobls · on July 10, 2018

I’m sure that makes plenty of sense for many applications! I’m very fond of some of the benefits of cloud infrastructure, but I’m sure you’ve heard the pitch.

philipodonnell · on July 10, 2018

Are there any tools that allow you to load test Lambda services with different memory configurations?

hobls · on July 10, 2018

Not to my knowledge. I'd just do something simple by changing your CloudFormation (or Terraform or whatever) templates, deploying the new config, and then running another load test. Could also just spin multiple different versions of the Lambda up, but at some point you need to start getting creative to actually generate enough traffic and running multiple load tests at the same time becomes a pain.

When it comes to load testing tools I like Vegeta[1], personally. (Though I've also used some much more complicated proprietary tools when testing at great scale.)

1: https://github.com/tsenart/vegeta

reilly3000 · on July 10, 2018

In Serverless Framework you can specify memory per function so it stands to reason making multiple copies at different memory levels would be pretty easy to test, even concurrently.

hobls · on July 10, 2018

The real fun starts when you need to start spinning up a distributed load testing fleet to actually generate enough load. :P

zlynx · on July 10, 2018

Has anyone done research on cooling datacenters on the Moon?

NathanKP · on July 10, 2018

It's not quite that straightforward. Lambda is more expensive per cycle if you are capable of keeping an instance fully utilized and getting every cycle out of that machine. After all if your Node.js or Go code takes say 20ms per request and can handle many concurrent requests you can squeeze a LOT of requests per second from a single instance.

But many workloads don't have that high of a request volume and can't actually make full use of an instance. If you have a small API or service that gets one or two requests every few seconds then paying for a 100ms chunk of Lambda execution time every couple seconds is going to be much cheaper than reserving an entire instance and then not being able to get good utilization out of it.

The tipping point is whether or not you have enough workload volume to keep an instance busy at all times. So for example password hashing in the article above. Because password hashing is deliberately CPU intensive it is very easy to keep an EC2 instance busy with even a low request volume. For a good hashing algorithm with lots of rounds it's not uncommon to only get 10 authentications per second per core, because the algorithm is deliberately designed to be CPU heavy. So if you process more than 10 auths/sec then it's probably cheaper to put the workload in a container that runs on an instance because you can keep that instance busy.

But if the same service is only handling one or two password hashes every minute, then you can save money by only paying for 100ms increments when an auth request arrives, and stop paying when there is nothing to do.
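
The tipping point described above can be sketched as a small calculation. This is a minimal sketch, not a pricing tool: every price in the usage note below is a hypothetical placeholder, the function names are made up for illustration, and the model ignores instance capacity limits (such as the roughly 10 auths/sec per core mentioned above).

```python
import math

SECONDS_PER_MONTH = 30 * 24 * 3600  # simplifying assumption: 30-day month


def cost_per_request(duration_ms, price_per_100ms, price_per_request):
    """Cost of one invocation, billed in whole 100 ms increments."""
    increments = math.ceil(duration_ms / 100)
    return increments * price_per_100ms + price_per_request


def lambda_monthly_cost(requests_per_sec, duration_ms,
                        price_per_100ms, price_per_request):
    """Monthly cost when you only pay while requests are being handled."""
    requests = requests_per_sec * SECONDS_PER_MONTH
    return requests * cost_per_request(duration_ms, price_per_100ms,
                                       price_per_request)


def break_even_rps(instance_monthly_cost, duration_ms,
                   price_per_100ms, price_per_request):
    """Request rate above which an always-on instance becomes cheaper."""
    per_request = cost_per_request(duration_ms, price_per_100ms,
                                   price_per_request)
    return instance_monthly_cost / (per_request * SECONDS_PER_MONTH)
```

For example, `break_even_rps(30.0, 100, 0.000002, 0.0000002)` gives the request rate at which a hypothetical $30/month instance and per-invocation billing cost the same; below that rate the pay-per-use model wins, above it the instance does, provided the instance can actually sustain the load.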

sfeng · on July 10, 2018

How does that compare to the cost of cloudflare workers?

NathanKP · on July 10, 2018

No idea. Given that a secure password hash will probably take about 100ms-200ms of execution time to calculate, that would fall under Cloudflare's "custom pricing" tier, where you need to call them and negotiate.

The baseline Cloudflare worker tiers are limited to less than 5ms, less than 10ms, and less than 50ms, which isn't going to be enough time to calculate a 12 round bcrypt for example.

zackbloom · on July 10, 2018

I have a Worker running bcrypt here: https://cloudflareworkers.com/#4addaef33b10b6a58954ffbb310e7...

Based on this code: https://gist.github.com/zackbloom/c0064838cbf85e7b81df9d4690...

That means it would cost you $0.50 / million requests. AWS Lambda would be $1.84 / million, $3.50 / million for API Gateway, $0.40 / million for AWS Route 53, and various other charges.
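
As a rough check of the figures above, here is the arithmetic written out. It uses only the per-million prices quoted in this comment and leaves out the "various other charges", so treat it as a sketch of the comparison rather than a full bill.

```python
# Per-million-request prices as quoted in the comment above (USD).
workers = 0.50
lambda_compute = 1.84
api_gateway = 3.50
route53 = 0.40

# Total for the Lambda-based stack, excluding any other charges.
lambda_stack = lambda_compute + api_gateway + route53
ratio = lambda_stack / workers

print(f"Lambda stack: ${lambda_stack:.2f} per million requests")
print(f"Ratio to Workers: {ratio:.1f}x")
```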

NathanKP · on July 10, 2018

Technically CloudFlare workers is better compared with Lambda @ Edge no? I don't think you'd be using API Gateway, instead CloudFront right?

zackbloom · on July 10, 2018

Originally we were thinking about it in those terms (as an alternative to Lambda@Edge), but based on our recent results I am happy to have people compare it to Lambda as well. Lambda with API Gateway is at least eight times more expensive than Workers, and only runs your code in one location instead of 151. Unless you're using Lambda to do something ultra-Amazon-specific (like process S3 changes), I don't see why it would be the better choice.

NathanKP · on July 10, 2018

Gotcha. Thanks for explaining!

eximius · on July 10, 2018

Is it just me, or does anyone else find the documentation of AWS and related services nearly incomprehensible? Maybe it's just too 'enterprise-y' and I haven't spent enough time in that environment, but it feels like all the information is squirreled away in 10,000 different pages and that I'd have to read all of it to just get the basics.

Also, does anyone know if there is an API for AWS to dynamically create, load, and launch EC2 and/or Lambda instances (i.e., boto - though I'm open to suggestions for something else) AND, preferably, have separate billing for each thing? Do I need multiple accounts to do separate billing? Something about IAM roles...?

deklerk · on July 10, 2018

YES. Absolutely yes. I _hated_ this aspect of working with AWS. Also, documentation always came off as "generic" and "wordy" without being "useful" - a huge, huge article that might have the tiny sliver you need in it, or perhaps one of its dozens of sister articles would have the answer, who knows.

lainga · on July 10, 2018

You're only supposed to grab the tiny sliver so you can mention it to $consultant in desperation.

PretzelFisch · on July 11, 2018

I find some of the documentation good and some frustratingly vague, depending on the service. In AWS everything can be done via code, including starting and launching EC2/Lambda. You'll need to look in their SDK documentation and experiment a little to figure out what everything means and in what order you need to make the API requests.

djhworld · on July 10, 2018

I remember a few years ago we tried to implement a scheduled Lambda that needed to download a bunch of files from an S3 prefix, perform some aggregation on the data and then write the result to a database.

Our EC2 prototype of this on one of the m3 class instances could do the work in about 2 minutes which seemed a perfect opportunity to port to Lambda.

Even on the top memory instance at the time (1536MB), the job just couldn't finish, timing out after 5 minutes. The code was multi-threaded, to parallelise the downloads, but no matter how much we tweaked this the Lambda would just never complete in time.

As you don't have visibility of the internals, we didn't know whether this was due to CPU constraints (decompressing lots of GZIP streams), network saturation (downloading files from S3) or what.

In the end we gave up. We didn't have the time or resources to keep digging, and just pinned the problem on our use case being a poor fit for what Lambda is designed for.

Not saying this is an indictment of Lambda, we use it in lots of places, with a lot of critical path code (ETL Pipelines).

alanning · on July 10, 2018

We’ve found lambda’s x-ray feature to be very helpful wrt finding the source of slowdowns. I know it wasn’t available during the project you were writing about but wanted to mention it for others.

gleenn · on July 10, 2018

I thought the use case for things like Lambda was more along the lines of rarely used web requests that you'd save money on by not running a full box. I do remember them being slow too.

djhworld · on July 10, 2018

Nah, I think the scope is wider than that.

In my case we use lambda to perform ETL based on S3 events, so when a file drops into S3, Lambda is invoked to process it.

That works very well for us and is cheaper than running a box 24x7, as the file drops arrive sporadically throughout the day and Lambda can scale to meet the demand.

RhodesianHunter · on July 10, 2018

If your job is easily parallelizable then you can run multiple lambdas in parallel. For the above use case they probably should have kicked off one lambda per prefix or similar.

djhworld · on July 10, 2018

That's exactly what we were doing. 1 Lambda to download and aggregate all files under a prefix.

The problem was the task just couldn't complete in < 5 minutes.

manigandham · on July 10, 2018

You have to fan out further then, process each file separately and aggregate the aggregates, using SQS or something else to queue up the processing.
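
The "aggregate the aggregates" idea can be sketched locally like this. It is only an illustration of the shape of the approach: a thread pool stands in for the queue plus one function invocation per file, and `process_file`, `merge` and `fan_out` are hypothetical names, not anything from SQS or Lambda.

```python
from concurrent.futures import ThreadPoolExecutor


def process_file(lines):
    """Map step: reduce one file to a small partial aggregate."""
    values = [float(x) for x in lines]
    return {"count": len(values), "total": sum(values)}


def merge(partials):
    """Reduce step: combine the partial aggregates into the final result."""
    count = sum(p["count"] for p in partials)
    total = sum(p["total"] for p in partials)
    mean = total / count if count else 0.0
    return {"count": count, "total": total, "mean": mean}


def fan_out(files, max_workers=8):
    # Each mapped call stands in for one queued message handled by one
    # short-lived function invocation.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(process_file, files))
    return merge(partials)
```

Because each invocation only touches one file and returns a tiny partial result, no single invocation has to fit the whole prefix inside the time limit; the trade-off is the extra moving parts mentioned in the replies.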

Azure's Durable Functions have an advantage here in making extreme fan-out situations easy.

djhworld · on July 10, 2018

We considered it, but at the time we just felt implementing map/reduce over Lambda would just introduce a more complex architecture for such a simple problem.

Maybe the recently introduced SQS->Lambda support might make it a bit cleaner, but in the end we opted for EC2.

tetha · on July 10, 2018

This is a fight I'm currently having with our dev team. Situational awareness is a real concern, especially if the code is misbehaving under production load. If a solution doesn't give us situational awareness when things go wrong, I object to that solution.

wolf550e · on July 10, 2018

I'll copy from Twitter[1]:

@zackbloom @jgrahamc I can't find it in the docs on AWS site, but I've read that AWS Lambda scales CPU linearly until 1.5GB, then gives you 2nd thread/core and again scales linearly until 3GB. If your PBKDF2 was single threaded, Lambda bigger than 1.5GB is wasted.

11:12 AM - 9 Jul 2018

reply by blog post author[2]:

Replying to @ZTarantov @Cloudflare @jgrahamc I can't think of a way to test that within the Node code. The only option seems to be to update the C++ version (or some other language) to use multiple threads.

5:16 PM - 9 Jul 2018

1 - https://twitter.com/ZTarantov/status/1016384547364229120

2 - https://twitter.com/zackbloom/status/1016476314864312321

Dunedan · on July 10, 2018

Yes, Lambda functions use a second core above 1536MB of memory. Back in the past they had it in their documentation, but removed it at a certain point. Also see: https://stackoverflow.com/questions/34135359/whats-the-maxim...
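
The scaling rule claimed in this subthread can be written down as a tiny model. This is a sketch of the claim only: the 1536 MB threshold comes from the comments above, is not something AWS currently documents, and `cpu_share` is a hypothetical helper.

```python
FULL_CORE_MB = 1536  # threshold claimed in the thread; unverified assumption


def cpu_share(memory_mb, threads=1):
    """Cores' worth of CPU a workload can use under the claimed rule."""
    if memory_mb <= 0 or threads < 1:
        raise ValueError("memory_mb must be positive and threads >= 1")
    allocated = memory_mb / FULL_CORE_MB  # allocation grows linearly with memory
    # A single thread can never use more than one core's worth.
    return min(allocated, float(threads))
```

Under this model a single-threaded job gets the same `cpu_share` at 1536 MB and 3072 MB, which is the "wasted" memory the tweet refers to, while a two-threaded job would keep scaling up to 3072 MB.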

manigandham · on July 10, 2018

It might not be true any longer if they removed it from the documentation.

handruin · on July 10, 2018

I've recently been exploring AWS Lambda in a stack which contained API Gateway + Python Flask under Lambda for a task I was working on. I deployed it using Zappa and its purpose was to be a simple REST frontend for transferring files to S3.

After experimenting with uploads from Lambda to S3 I was noticing that the time to upload a tiny 4MB file changed dramatically when I reconfigured the Lambda function's memory size. At 500MB it took 16 seconds to upload the file which is pretty slow. Once I got past roughly 1500MB of memory, the performance no longer improved and the best I could get was about 8 seconds for the same payload.

None of my tests were controlled or rigorous in any way so take them with a grain of salt... they were just surprising to me in that the speed changed dramatically with memory size allocation. I'm new to Lambda so I wasn't aware that memory size is tied to other resource performance. I'm curious if this goes beyond CPU and also changes network bandwidth/performance? The Lambda I deployed did not write data to the temp location that is provided; it streamed directly to S3.

I've since moved on from this implementation and now my Lambda function performs a much simpler task of generating pre-signed S3 URLs. I have noticed something else about Lambda that bothers me a little. If my function remains idle for some period of time and then I invoke it, the amount of time it takes to execute is around 800ms-1000ms. If I perform numerous calls right after, I get billed the minimum of 100ms because the execution time is under that. The part that bothers me is I'm being charged a one-time cost that's about 8x-10x the normal amount because my function has gone idle and cold. I'll have to continue reading to see if this is expected. It's not a huge amount in terms of cost but surprising that I'm paying for AWS to wake up from whatever state it is in.
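
The 8x-10x figure follows from billing in 100 ms increments, which can be sketched as below. The rounding rule is taken from the description in this comment; `billed_ms` is a hypothetical helper and the 40 ms warm duration in the usage note is an assumed example value.

```python
import math


def billed_ms(duration_ms, increment_ms=100):
    """Round an execution time up to the next billing increment."""
    increments = max(1, math.ceil(duration_ms / increment_ms))
    return increments * increment_ms


def cold_start_premium(cold_ms, warm_ms):
    """How many times more a cold invocation is billed than a warm one."""
    return billed_ms(cold_ms) / billed_ms(warm_ms)
```

With a warm call of, say, 40 ms and cold calls of 800 ms to 1000 ms, `cold_start_premium` returns values between 8 and 10, matching the one-off cost described above.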

thinkmassive · on July 10, 2018

This is why people using Lambda at scale are concerned with keeping the containers “warm” https://aws.amazon.com/blogs/compute/container-reuse-in-lamb...

alanning · on July 10, 2018

Keep in mind that if you have any kind of fanout at scale, keeping a few lambda instances “warm” probably won’t improve your throughput much.

Update: found a nice article with metrics re: lambda-backed api gateway but the premise applies to any fan-out.

https://hackernoon.com/im-afraid-you-re-thinking-about-aws-l...

zackbloom · on July 10, 2018

It's also worth pointing out that if your Lambda is in a VPC its cold start time can be over 10s.

lucb1e · on July 10, 2018

This headline is weird. I thought it was going to be about doing computations client side since it says "serverless", but what they mean is "without a dedicated instance running all the time" (about halfway through the article, I figured out what "lambdas" are in this context).

So if so much effort goes into calculating costs for PBKDF2 on servers (ahem, "serverless"), why not move it to the client side? I like client side hashing a lot because it transparently shows what security you apply, and any passive or after-the-fact attacks (think breaking 1024 bit encryption, which will slowly move from 'impossible for small governments' to 'just very slow' soon) are instantly mitigated. The server should still apply a single round of their favorite hash function (like SHA-2) with a secret value, so an attacker will not be able to log in with stolen database credentials.

But that's probably too cheap and transparent when you can also do it with a Lambda™.
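
The client-side scheme described in this comment could look roughly like the sketch below. It is written in Python for illustration only (in a browser the expensive step would run in JavaScript), the iteration count is a hypothetical value, and it swaps the "single round of SHA-2 with a secret value" for HMAC-SHA256, which is the standard keyed construction rather than literally one hash pass.

```python
import hashlib
import hmac

ITERATIONS = 100_000  # hypothetical work factor, tune for your clients


def client_hash(password: str, salt: bytes) -> bytes:
    """Expensive step, intended to run on the client."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                               salt, ITERATIONS)


def server_tag(client_digest: bytes, server_secret: bytes) -> bytes:
    """Cheap step on the server: one keyed SHA-256 pass (HMAC)."""
    return hmac.new(server_secret, client_digest, hashlib.sha256).digest()


def verify(client_digest: bytes, server_secret: bytes,
           stored_tag: bytes) -> bool:
    """Compare in constant time against the tag stored in the database."""
    candidate = server_tag(client_digest, server_secret)
    return hmac.compare_digest(candidate, stored_tag)
```

The database stores only the output of `server_tag`, so a stolen database entry cannot be replayed as a login without also knowing the server-side secret.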

kentonv · on July 10, 2018

"Serverless" is a recent industry buzzword which roughly means: "Server hosting environment where you upload code representing some sort of event handler and let the host decide where and when to run it. You are billed per event rather than per server instance."

This article is comparing the raw CPU power provided by two different serverless products. PBKDF2 is used only as an example of a computation requiring a lot of CPU.

lucb1e · on July 11, 2018

> PBKDF2 is used only as an example of a computation requiring a lot of CPU.

Oh wow, I completely missed the point here. Having worked on strong client-side hashing in browsers and being into crypto generally, I saw this problem being presented and completely mistook it. Thanks!

com2kid · on July 10, 2018

I'd love to see an honest comparison across other providers, throwing in Google's Firebase Functions and Azure Cloud Functions.

chrisco255 · on July 10, 2018

Here's a good comparison of Lambda vs Azure Functions performance and scaling up to 400 concurrent requests: https://www.azurefromthetrenches.com/azure-functions-vs-aws-...

doczoidberg · on July 10, 2018

"Since I published this piece Microsoft have made significant improvements to HTTP scaling on Azure Functions and the below is out of date. Please see this post for a revised comparison." https://www.azurefromthetrenches.com/azure-functions-signifi...

com2kid · on July 10, 2018

Excellent, though 200ms seems like a long time for a service to respond to something, especially after being warmed up.

I'm only so-so happy about GCF's response time. I honestly wonder why these cloud functions take so long to execute after being warmed up.

doczoidberg · on July 10, 2018

hmm firebase (google) cloud functions take 2,3s on average for me for a response (simple json output out of firebase)

com2kid · on July 11, 2018

My stupid JSON returning functions average out to around 50ms - 250ms.

I'm querying across a couple hundred rows. I'm reasonably certain that calling out to Perl and a regex would be faster for so little data. :/

sudhirj · on July 11, 2018

zackbloom you’ve made your point already, but remember that these posts represent a moving target. AWS could crush CF performance on pretty much all these numbers with a few configuration changes, which they might well do. And you’re not acknowledging the rest of the Lambda moat, like SQS integration, free S3 bandwidth etc.

Workers has a clear advantage over Lambda@Edge, but not because of the current resource configuration differences across the two products - the advantage is your choice of V8 and adoption of the Service Worker API standard, which brilliantly outshines the L@Edge API choices. Harp on that, most of what you’re talking about now will likely be invalidated by the next reinvent, and they’ll make it a point to tell the world.

kentonv · on July 11, 2018

> AWS could crush CF performance on pretty much all these numbers with a few configuration changes

Eh? I can see how they could match Workers' raw CPU throughput by simply turning off throttling. But how would they "crush" it? And how can they easily improve other performance measures like network latency, cold start time, or deploy time? Honestly curious what you're getting at here.

> the advantage is your choice of V8 and adoption of the Service Worker API standard, which brilliantly outshines the L@Edge API choices.

Thanks for the kind words.

sudhirj · on July 11, 2018

Because the pure geographic latency numbers are quite comparable. Workers latency when compared to Lambda@Edge/CloudFront isn't that different, and is also a moving target because both of you are adding locations all over the world continuously and buying more and more local racks. There's no clear winner here.

Cold start time is fixable, there have already been large improvements, and this is more a VPC problem. The easiest way would be to overprovision aggressively / make sure Lambdas tied to APIGW always have a lambda running, or even use some variation of ML predictions to keep things warm. But again, this isn't comparable - Lambda is a truck compared to Workers/Lambda@Edge being bikes. Parallel scalability is more important there than speed. There are enough ways to keep a few warm ones ready.

Deploy time is really download time from S3, AWS could cache more aggressively on the local cloudfront caches. I'm not seeing deployment time as being a big factor, though.

By "crush" I mean make claims about performance irrelevant. To claim that AWS cannot equalize performance between Lambda@Edge and Workers doesn't make sense, they can. And they can improve Lambda price-performance as well, and are already doing so. I'm saying this cannot be and is not the Workers USP - no one in the AWS ecosystem is going to jump to Workers based on this because it lacks the rest of the AWS ecosystem.

> > the advantage is your choice of V8 and adoption of the Service Worker API standard, which brilliantly outshines the L@Edge API choices.

That's really the big differentiator for Workers. I think you should blow that trumpet a lot more. If you only publicize performance numbers, what happens to the Workers story when that advantage is lost?

> Thanks for the kind words.

You made a good decision and built something great, you're welcome.

penagwin · on July 11, 2018

Hello, I know you're the tech lead for the web workers at cloudflare so pardon my ignorance if I'm wrong.

At least for pure compute speed, I think he means that if you (cloudflare) and AWS got into an arms race in terms of allocated CPU/Memory to the webworkers/lambda, they have more raw resources to do so. They also have a global presence, though obviously not necessarily to the degree that you do.

I highly doubt they would do this, and I think you have the superior product. I'm just a student/hobbyist so I admittedly don't have a ton of experience. I'm very biased towards CF, you guys are great! :D

zackbloom · on July 11, 2018

Honestly, if that's what happens and Lambda ends up priced more reasonably because of us I would consider that a victory, and a lovely outcome for the world.

That said, I also want people to build applications which run all around the world. I can only imagine what it's like for people in Australia to browse the modern Internet, but I doubt it's particularly fun and I'd like to help fix it if I can.

CupOfJava · on July 10, 2018

The x-axis is percentile of requests with that latency or lower. You need to read the article to figure that out. Label all your axes!

dsl · on July 10, 2018

The question is, what does Amazon know now, that Cloudflare will figure out in a year or so?

ryanworl · on July 10, 2018

Hopefully Amazon will learn from Cloudflare that your JavaScript runtime can use V8 directly instead of running Node in a container per instance, and then Lambda@Edge can be cheaper!

microcolonel · on July 10, 2018

Good work at CloudFlare. I personally figure that Amazon ought to be doing more interesting things with Lambda, like maybe starting the workers from a memory snapshot.

cremp · on July 10, 2018

They do. After the cold start, things are just run in memory.

Java for example will keep your static variables and such in memory, and keep /tmp until you haven't called the lambda for a while.

With GoLang, they start your program, and just call the handler method as required.