
Comparing Serverless Performance for CPU Bound Tasks - bosdev
https://blog.cloudflare.com/serverless-performance-with-cpu-bound-tasks/?h
======
cremp
Each and every one of these posts from Cloudflare directly targets AWS and is
completely biased. It should be noted that their response times are indicative
of them being far fewer hops away. Unless they can run in the same DC, or at
least make the RTT fair, their 'Webpage response' metric is utterly useless.

Notice how they admit that they don't know how Lambda really works. They
switch between Lambda@Edge and region-based Lambdas, and don't manage to stay
consistent about it.

Java Lambdas have horrible cold start times, and I'm not seeing any of this
reflected anywhere in their report.

> Our Lambda is deployed with the default 128MB of memory behind an API
> Gateway in us-east-1

Well, duh, the Lambda is slower; it's going through API Gateway, which does
authentication processing as well.

All in all, these blog posts from Cloudflare are turning me off them entirely,
because they never once say 'yeah, AWS has us beat in this case.'

~~~
zackbloom
Hi, I'm the author of the post, thanks for sharing your thoughts!

You're absolutely right we don't know how Lambda works. We have read what we
could find that's publicly available, and done a bunch of testing, but Amazon
doesn't share all that much about their architecture.

I agree that the cold-start times of Lambda are slow, particularly with
languages like Java and with VPCs. My plan at the moment is to write a blog
post focused on cold-start times specifically, when I can figure out how to
accurately test that around the world.

I'm not entirely sure why API Gateway would add hundreds of ms of latency. We
also do authentication processing with our Access product, for example, and it
certainly doesn't add anywhere near that. I also don't have any of those API
Gateway features enabled to begin with. If you'd like, I'm happy to test a
Lambda by hitting the Invoke API directly, but I doubt you'll see much of a
difference. As the post says, Lambda grants us a much smaller quantity of CPU
time; there's not much you can do to get around that.

I apologize if the transitions between global and region-specific tests are
unclear. The majority of the tests are being done from DC, specifically to
focus the comparison around execution time, not global latency. I did my best
in the post to specify where I was running the test from. If you have an idea
of how that can be better expressed please share and I'll do my best to
incorporate it in the future.

~~~
cremp
Thanks for being forthcoming; I do appreciate the post. I'm partially venting,
and I've been using AWS for a while. Vendor lock-in can be a pain in the rear
sometimes.

A lot of my gripe with these posts is that they seem somewhat incomplete. It's
hard to say much when you've only been messing with a service for a week and
don't know its ins and outs. That said, I've spent at least 2 years diving
into Lambda and its weird issues.

I'm not an employee of Amazon, and so my understanding can be off-base as
well.

Lambdas are just managed EC2 instances. Each Lambda's code is stored in an
AWS-controlled S3 bucket; on initial execution (a cold start) it's pulled
down and run in its own chroot jail. I've dealt with Java Lambdas the most,
and I can say that they take your zip and run the jail around its contents.
They keep your Java process open and just call the handler on each invocation
(a warm start). Each concurrent call can start another jail on another managed
instance, incurring the cold-start time again.
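
In handler terms, the lifecycle looks like this (a Python sketch of the
observable behavior, not AWS's actual internals; the same pattern holds for
Java and Go):

    import time

    # Module scope runs once per container, at cold start.
    COLD_START_AT = time.time()
    INVOCATION_COUNT = 0

    def handler(event, context):
        # Only the handler runs on warm starts; module state persists.
        global INVOCATION_COUNT
        INVOCATION_COUNT += 1
        return {
            "container_age_s": round(time.time() - COLD_START_AT, 1),
            "invocations_in_this_container": INVOCATION_COUNT,
        }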

You can trigger cold starts by uploading a new zip or by changing any of the
Lambda's compute parameters.

The Go runtime works in a similar way: it jails the zip and keeps the program
running, calling the handler as invocations come in.

I haven't done enough Python, Node, or .NET to know whether those follow the
same principles; I'd assume they do.

Interestingly, API Gateway really is just CloudFront under the hood; an API
Gateway deployment is essentially an AWS-managed CloudFront distribution.

~~~
js2
On the Python bits, see this blog post - "Reverse engineering AWS Lambda":

[https://www.denialof.services/lambda/](https://www.denialof.services/lambda/)

------
sreque
Assuming the author's tests are single-threaded, I'm pretty sure 1024 MB
doesn't give you a full CPU core on Lambda. I could be wrong though; I haven't
paid
attention to Lambda in a long time. Last I remember it was 1.5 GB that gave
you a full core. This alone makes the comparison between a mid-range server
and Lambda unfair, not to mention the differences between language runtimes.

That said, if you are using Lambda and expecting not to pay extra, you have
somehow been misled. Lambda is definitely more expensive per cycle than
managing your own instances, and I doubt that will change any time soon.
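
Rough arithmetic with 2018 list prices (ballpark numbers, my assumptions;
check current pricing):

    # Lambda compute: ~$0.00001667 per GB-second; t2.micro (1GB): ~$0.0116/hr
    GB_SECOND = 0.00001667
    HOURS_PER_MONTH = 24 * 30

    lambda_cost = GB_SECOND * HOURS_PER_MONTH * 3600   # 1GB busy 24/7
    ec2_cost = 0.0116 * HOURS_PER_MONTH                # t2.micro 24/7

    print(f"Lambda, 1GB busy all month: ${lambda_cost:.2f}")  # ~$43
    print(f"t2.micro (1GB) all month:   ${ec2_cost:.2f}")     # ~$8

That's roughly 5x before request charges even enter into it, so the
break-even depends entirely on your duty cycle.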

~~~
hobls
Anyone using Lambda should _absolutely_ do load testing with different memory
configurations. You will get different results, and should analyze what is
best for your application.
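
Even a crude sweep shows the effect; a boto3 sketch (the function name is a
placeholder, and a real test should drive sustained concurrent load rather
than one timed invoke):

    import time
    import boto3

    client = boto3.client("lambda")

    for memory_mb in (128, 256, 512, 1024, 1536, 2048):
        # Changing the config forces a fresh container on the next invoke.
        client.update_function_configuration(
            FunctionName="my-function", MemorySize=memory_mb)
        client.get_waiter("function_updated").wait(FunctionName="my-function")
        start = time.time()
        client.invoke(FunctionName="my-function", Payload=b"{}")
        print(f"{memory_mb}MB: {time.time() - start:.3f}s")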

When calculating the overall cost of managing your own instances you should
also include time spent by your engineering team. There are tipping points in
overall requests per second at which you'd save money by moving from Lambda to
something like Fargate, and then even further above that, you're better off
using EC2. And then even above that, you should be
running your own instances in a colo space. (And then at some point you should
probably be building your own datacenters, and then at some point you should
start colonizing the moon, and then... you get the idea.)

~~~
chx
> And then even above that, you should be running your own instances in a colo
> space

Why are people jumping from EC2 to colo and skipping dedicated servers?
Mystery of my life. We were running the 75th largest site in the US some years
ago (as measured by Quantcast); we ran the numbers, and colo was ridiculously
expensive and way more troublesome.

~~~
hobls
Oh, I was defining "EC2" as everything from small shared virtual boxes to EC2
bare metal. I suppose on-site dedicated servers are definitely a point I left
out of my hypothetical list though! I used to work at an unnamed company that
had a server rack plugged in between the kitchen and the ping pong table.
Though I never witnessed such a thing, I'd be shocked if there was never an
outage due to a particularly enthusiastic game.

~~~
chx
Not on site, no, renting them from some big provider. The advantages are huge:
if the magic smoke gets released, getting a replacement online is Someone
Else's (TM) problem. If you find you need to grow, you release the one you
rented back into the pool and the provider will rent it to the next schmuck.
EC2 bare metal is very expensive. In general, I am very cloud skeptic. I've
found bare metal to be a much better price/value for almost all websites.
People are writing these complex apps that scale across database clusters and
I am like... YAGNI. One database server (or two, for a hot spare) and maybe a
few separate web frontends is almost always enough.

~~~
hobls
I’m sure that makes plenty of sense for many applications! I’m very fond of
some of the benefits of cloud infrastructure, but I’m sure you’ve heard the
pitch.

------
eximius
Is it just me, or does anyone else find the documentation of AWS and related
services nearly incomprehensible? Maybe it's just too 'enterprise-y' and I
haven't spent enough time in that environment, but it feels like all the
information is squirreled away in 10,000 different pages and that I'd have to
read _all_ of it to just get the basics.

Also, does anyone know if there is an API for AWS to dynamically create, load,
and launch EC2 and/or Lambda instances (e.g., boto, though I'm open to
suggestions for something else) AND, preferably, have separate billing for
each thing? Do I need multiple accounts to do separate billing? Something
about IAM roles...?
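
To be concrete, here's the kind of thing I mean in boto3 terms (a sketch; the
AMI, role ARN, and names are made up), with tags as my guess at how per-thing
billing might work:

    import boto3

    ec2 = boto3.client("ec2")
    ec2.run_instances(
        ImageId="ami-12345678", InstanceType="t2.micro",
        MinCount=1, MaxCount=1,
        TagSpecifications=[{
            "ResourceType": "instance",
            "Tags": [{"Key": "project", "Value": "experiment-a"}],
        }],
    )

    awslambda = boto3.client("lambda")
    awslambda.create_function(
        FunctionName="experiment-a-fn",
        Runtime="python3.6",
        Role="arn:aws:iam::123456789012:role/lambda-exec",  # placeholder
        Handler="main.handler",
        Code={"ZipFile": open("bundle.zip", "rb").read()},
        Tags={"project": "experiment-a"},  # cost-allocation tag, maybe?
    )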

~~~
deklerk
YES. Absolutely yes. I _hated_ this aspect of working with AWS. Also,
documentation always came off as "generic" and "wordy" without being "useful":
a huge, huge article that might have the tiny sliver you need in it, or
perhaps one of its dozens of sister articles has the answer, who knows.

~~~
lainga
You're only supposed to grab the tiny sliver so you can mention it to
$consultant in desperation.

------
djhworld
I remember a few years ago we tried to implement a scheduled Lambda that
needed to download a bunch of files from an S3 prefix, perform some
aggregation on the data and then write the result to a database.

Our EC2 prototype of this on one of the m3 class instances could do the work
in about 2 minutes which seemed a perfect opportunity to port to Lambda.

Even on the top memory setting at the time (1536MB), the job just couldn't
finish, timing out after 5 minutes. The code was multi-threaded, to
parallelise the downloads, but no matter how much we tweaked it the Lambda
would just never complete in time.

As you don't have visibility into the internals, we didn't know whether this
was due to CPU constraints (decompressing lots of GZIP streams), network
saturation (downloading files from S3), or what.

In the end we gave up. We didn't have the time or resources to keep digging,
and just pinned the problem on the use case being a poor fit for what Lambda
is designed for.

Not saying this is an indictment of Lambda; we use it in lots of places, with
a lot of critical-path code (ETL pipelines).

~~~
gleenn
I thought the use case for things like Lambda was more along the lines of
rarely used web requests that you'd save money on by not running a full box. I
do remember them being slow too.

~~~
RhodesianHunter
If your job is easily parallelizable then you can run multiple lambdas in
parallel. For the above use case they probably should have kicked off one
lambda per prefix or similar.

~~~
djhworld
That's exactly what we were doing. 1 Lambda to download and aggregate all
files under a prefix.

The problem was the task just couldn't complete in < 5 minutes.

~~~
manigandham
You have to fan out further then, process each file separately and aggregate
the aggregates, using SQS or something else to queue up the processing.
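
A sketch of that shape (bucket, queue URL, and message format are
placeholders): one dispatcher enumerates the prefix and queues one message per
key, and a worker Lambda subscribed to the queue aggregates per file:

    import json
    import boto3

    s3 = boto3.client("s3")
    sqs = boto3.client("sqs")
    QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/work-items"

    def dispatch(bucket, prefix):
        # One cheap pass to enumerate; the heavy lifting happens per message.
        paginator = s3.get_paginator("list_objects_v2")
        for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
            for obj in page.get("Contents", []):
                sqs.send_message(
                    QueueUrl=QUEUE_URL,
                    MessageBody=json.dumps(
                        {"bucket": bucket, "key": obj["Key"]}),
                )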

Azure's Durable Functions have an advantage here in making extreme fan-out
situations easy.

~~~
djhworld
We considered it, but at the time we felt that implementing map/reduce over
Lambda would introduce a more complex architecture for such a simple problem.

The recently introduced SQS->Lambda support might make it a bit cleaner, but
in the end we opted for EC2.

------
wolf550e
I'll copy from Twitter[1]:

@zackbloom @jgrahamc I can't find it in the docs on AWS site, but I've read
that AWS Lambda scales CPU linearly until 1.5GB, then gives you 2nd
thread/core and again scales linearly until 3GB. If your PBKDF2 was single
threaded, Lambda bigger than 1.5GB is wasted.

11:12 AM - 9 Jul 2018

reply by blog post author[2]:

Replying to @ZTarantov @Cloudflare @jgrahamc I can't think of a way to test
that within the Node code. The only option seems to be to update the C++
version (or some other language) to use multiple threads.

5:16 PM - 9 Jul 2018

1 -
[https://twitter.com/ZTarantov/status/1016384547364229120](https://twitter.com/ZTarantov/status/1016384547364229120)

2 -
[https://twitter.com/zackbloom/status/1016476314864312321](https://twitter.com/zackbloom/status/1016476314864312321)
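
For what it's worth, you could probe for the second core without touching the
C++ version by using bare processes from Python (a sketch; the iteration count
is arbitrary, and note multiprocessing.Pool won't work on Lambda because
/dev/shm is missing, but bare Process does):

    import hashlib
    import time
    from multiprocessing import Process

    def burn():
        hashlib.pbkdf2_hmac("sha256", b"password", b"salt", 1_000_000)

    def timed(n):
        procs = [Process(target=burn) for _ in range(n)]
        start = time.time()
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        return time.time() - start

    def handler(event, context):
        # Two workers in ~the time of one means a second core is present;
        # roughly double the time means you're still on a single core.
        return {"one_worker_s": timed(1), "two_workers_s": timed(2)}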

~~~
Dunedan
Yes, Lambda functions use a second core above 1536MB of memory. They used to
have it in their documentation, but removed it at some point.
Also see: [https://stackoverflow.com/questions/34135359/whats-the-
maxim...](https://stackoverflow.com/questions/34135359/whats-the-maximum-
number-of-virtual-processor-cores-available-in-aws-lambda/47582392#47582392)

~~~
manigandham
It might not be true any longer if they removed it from the documentation.

------
handruin
I've recently been exploring AWS Lambda in a stack which contained API Gateway
+ Python Flask under Lambda for a task I was working on. I deployed it using
Zappa and its purpose was to be a simple REST frontend for transferring files
to S3.

After experimenting with uploads from Lambda to S3, I noticed that the time
to upload a tiny 4MB file changed dramatically when I reconfigured the
Lambda function's memory size. At 500MB it took 16 seconds to upload the file
which is pretty slow. Once I got past roughly 1500MB of memory, the
performance no longer improved and the best I could get was about 8 seconds
for the same payload.

None of my tests were controlled or rigorous in any way, so take them with a
grain of salt... I was just surprised that the speed changed so dramatically
with the memory allocation. I'm new to Lambda, so I wasn't aware that memory
size is tied to other resource performance. I'm curious whether this
goes beyond CPU and also changes network bandwidth/performance? The Lambda I
deployed did not write data to the temp location that is provided, it streamed
directly to S3.

I've since moved on from this implementation and now my Lambda function
performs a much simpler task of generating pre-signed S3 URLs. I have noticed
something else about Lambda that bothers me a little. If my function remains
idle for some period of time and then I invoke it, the amount of time it takes
to execute is around 800ms-1000ms. If I perform numerous calls right after, I
get billed the minimum of 100ms because the execution time is under that. The
part that bothers me is I'm being charged a one-time cost that's about 8x-10x
the normal amount because my function has gone idle and cold. I'll have to
continue reading to see if this is expected. It's not a huge amount in terms
of cost but surprising that I'm paying for AWS to wake up from whatever state
it is in.
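
For reference, the presigned-URL function ended up tiny; a boto3 sketch
(bucket and key handling are placeholders):

    import boto3

    s3 = boto3.client("s3")

    def handler(event, context):
        # Client uploads directly to S3; the file never passes through Lambda.
        url = s3.generate_presigned_url(
            "put_object",
            Params={"Bucket": "my-bucket", "Key": event["key"]},
            ExpiresIn=300,
        )
        return {"upload_url": url}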

~~~
thinkmassive
This is why people using Lambda at scale are concerned with keeping the
containers “warm” [https://aws.amazon.com/blogs/compute/container-reuse-in-
lamb...](https://aws.amazon.com/blogs/compute/container-reuse-in-lambda/)
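
The usual pattern is a scheduled CloudWatch Events rule that pings the
function every few minutes with a marker payload the handler short-circuits on
(a sketch; the "warmer" key is just a convention, not an AWS feature):

    def handler(event, context):
        if event.get("warmer"):
            return {"warmed": True}  # keep the container alive, do no work
        # ... real work here ...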

~~~
alanning
Keep in mind that if you have any kind of fanout at scale, keeping a few
lambda instances “warm” probably won’t improve your throughput much.

Update: found a nice article with metrics re: Lambda-backed API Gateway, but
the premise applies to any fan-out.

[https://hackernoon.com/im-afraid-you-re-thinking-about-
aws-l...](https://hackernoon.com/im-afraid-you-re-thinking-about-aws-lambda-
cold-starts-all-wrong-7d907f278a4f)

------
lucb1e
This headline is weird. I thought it was going to be about doing computations
client side since it says "serverless", but what they mean is "without a
dedicated instance running all the time" (about halfway through the article, I
figured out what "lambdas" are in this context).

So if so much effort goes into calculating costs for PBKDF2 on servers
(ahem, "serverless"), why not move it to the client side? I like client-side
hashing a lot because it transparently shows what security you apply, and any
passive or after-the-fact attacks (think 1024-bit decryption, which will
slowly move from 'impossible for small governments' to 'just very slow') are
instantly mitigated. The server should still apply a single round of its
favorite hash function (like SHA-2) with a secret value, so an attacker with
a stolen database still won't be able to log in.
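
Concretely, the single server-side round I mean is one keyed hash over the
client's PBKDF2 output (a sketch; the secret lives only on the server, so a
stolen database alone doesn't let you log in):

    import hashlib
    import hmac

    SERVER_SECRET = b"keep-me-out-of-the-database"

    def server_side_hash(client_pbkdf2_hex: str) -> str:
        # The client already paid for the expensive PBKDF2;
        # this single keyed round is cheap for the server.
        return hmac.new(SERVER_SECRET, client_pbkdf2_hex.encode(),
                        hashlib.sha256).hexdigest()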

But that's probably too cheap and transparent when you can also do it with a
Lambda™.

~~~
kentonv
"Serverless" is a recent industry buzzword which roughly means: "Server
hosting environment where you upload code representing some sort of event
handler and let the host decide where and when to run it. You are billed per
event rather than per server instance."

This article is comparing the raw CPU power provided by two different
serverless products. PBKDF2 is used only as an example of a computation
requiring a lot of CPU.

~~~
lucb1e
> PBKDF2 is used only as an example of a computation requiring a lot of CPU.

Oh wow, I completely missed the point here. Having worked on strong client-
side hashing in browsers and being into crypto generally, I saw this problem
being presented and completely mistook it. Thanks!

------
com2kid
I'd love to see an honest comparison across other providers, throwing in
Google's Firebase Functions and Azure Functions.

~~~
chrisco255
Here's a good comparison of Lambda vs Azure Functions performance and scaling
up to 400 concurrent requests: [https://www.azurefromthetrenches.com/azure-
functions-vs-aws-...](https://www.azurefromthetrenches.com/azure-functions-vs-
aws-lambda-scaling-face-off/)

~~~
doczoidberg
"Since I published this piece Microsoft have made significant improvements to
HTTP scaling on Azure Functions and the below is out of date. Please see this
post for a revised comparison: [https://www.azurefromthetrenches.com/azure-
functions-signifi...](https://www.azurefromthetrenches.com/azure-functions-
significant-improvements-in-http-trigger-scaling/)"

~~~
com2kid
Excellent, though 200ms seems like a long time for a service to respond to
something, especially after being warmed up.

I'm only so-so happy about GCF's response time. I honestly wonder why these
cloud functions take so long to execute after being warmed up.

~~~
doczoidberg
Hmm, Firebase (Google) Cloud Functions take 2-3s on average to respond for me
(simple JSON output out of Firebase).

~~~
com2kid
My stupid JSON-returning functions average out to around 50ms-250ms.

I'm querying across a couple hundred rows. I'm reasonably certain that calling
out to Perl and a regex would be faster for so little data. :/

------
sudhirj
zackbloom you’ve made your point already, but remember that these posts
represent a moving target. AWS could crush CF performance on pretty much all
these numbers with a few configuration changes, which they might well do. And
you’re not acknowledging the rest of the Lambda moat, like SQS integration,
free S3 bandwidth, etc.

Workers has a clear advantage over Lambda@Edge, but not because of the current
resource configuration differences across the two products: the advantage is
your choice of V8 and adoption of the Service Worker API standard, which
brilliantly outshines the Lambda@Edge API choices. Harp on that; most of what
you're talking about now will likely be invalidated by the next re:Invent, and
they'll make it a point to tell the world.

~~~
kentonv
> AWS could crush CF performance on pretty much all these numbers with a few
> configuration changes

Eh? I can see how they could match Workers' raw CPU throughput by simply
turning off throttling. But how would they "crush" it? And how can they easily
improve other performance measures like network latency, cold start time, or
deploy time? Honestly curious what you're getting at here.

> the advantage is your choice of V8 and adoption of the Service Worker API
> standard, which brilliantly outshines the L@Edge API choices.

Thanks for the kind words.

~~~
penagwin
Hello, I know you're the tech lead for Workers at Cloudflare, so pardon my
ignorance if I'm wrong.

At least for pure compute speed, I think he means that if you (Cloudflare) and
AWS got into an arms race over the CPU/memory allocated to Workers/Lambda,
they have more raw resources to throw at it. They also have a global presence,
though obviously not to the degree that you do.

I highly doubt they would do this, and I think you have the superior product.
I'm just a student/hobbyist so I admittedly don't have a ton of experience.
I'm very biased towards CF, you guys are great! :D

~~~
zackbloom
Honestly, if that's what happens and Lambda ends up priced more reasonably
because of us, I would consider that a victory, and a lovely outcome for the
world.

That said, I also want people to build applications which run all around the
world. I can only imagine what it's like for people in Australia to browse the
modern Internet, but I doubt it's particularly fun and I'd like to help fix it
if I can.

------
CupOfJava
The x-axis is percentile of requests with that latency or lower. You need to
read the article to figure that out. Label all your axes!

------
dsl
The question is, what does Amazon know now, that Cloudflare will figure out in
a year or so?

~~~
ryanworl
Hopefully Amazon will learn from Cloudflare that your JavaScript runtime can
use V8 directly instead of running Node in a container per instance, and then
Lambda@Edge can be cheaper!

------
microcolonel
Good work at CloudFlare. I personally figure that Amazon ought to be doing
more interesting things with Lambda, like maybe starting the workers from a
memory snapshot.

~~~
cremp
They do. After the cold start, things just stay resident in memory.

Java, for example, will keep your static variables and such in memory, and
keeps /tmp around until you haven't called the Lambda for a while.

With Go, they start your program once and just call the handler method as
required.
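
Easy to see for yourself (a Python sketch; the same behavior holds for Java
and Go): module state and /tmp survive warm starts.

    import os

    MARKER = "/tmp/seen-this-container"

    def handler(event, context):
        warm = os.path.exists(MARKER)
        open(MARKER, "w").close()
        return {"warm_start": warm}  # False only on a cold start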

