Notice how they admit that they don't know how lambda really works.
They switch between lambda@edge and Region-based lambdas, and don't seem to be able to be consistent with it.
Java Lambdas have horrible cold start times, and I'm not seeing any of this reflected anywhere in their report.
> Our Lambda is deployed to with the default 128MB of memory behind an API Gateway in us-east-1
Well duh the lambda is slower; it's going through API Gateway, and that does authentication processing as well.
All in all, these blog posts from Cloudflare are turning me off from them entirely, because they aren't even saying 'yeah, AWS got us beat in this case here.'
You're absolutely right we don't know how Lambda works. We have read what we could find that's publicly available, and done a bunch of testing, but Amazon doesn't share all that much about their architecture.
I agree that the cold-start times of Lambda are slow, particularly with languages like Java and with VPCs. My plan at the moment is to write a blog post focused on cold-start times specifically, when I can figure out how to accurately test that around the world.
I'm not entirely sure why API Gateway would add hundreds of ms of latency. We also do authentication processing with our Access product, for example, and it certainly doesn't add anywhere near that. I also don't have any of those API Gateway features enabled to begin with. If you would very much like it, I'm happy to test a Lambda by hitting the Invoke API directly, but I doubt you'll see much of a difference. As the post says, Lambda is granting us a much smaller quantity of CPU time, and there's not much you can do to get around that.
I apologize if the transitions between global and region-specific tests are unclear. The majority of the tests are being done from DC, specifically to focus the comparison around execution time, not global latency. I did my best in the post to specify where I was running the test from. If you have an idea of how that can be better expressed please share and I'll do my best to incorporate it in the future.
A lot of the gripe I've got with these posts is that they seem somewhat incomplete. Hard to say things if you've only been messing with it for a week, and don't have a lot of the ins-and-outs of the services. That being said, I've spent at least 2 years diving into lambda and weird issues with it.
I'm not an employee of Amazon, and so my understanding can be off-base as well.
Lambdas are just managed EC2 instances. Each Lambda's code is stored in an AWS-controlled S3 bucket; on initial execution (cold start) it is pulled down and run in its own chroot jail.
I've dealt with java lambdas the most, and I can say that they take your zip, and run the jail inside the zip. They keep your java process open, and just call the handler on each call (warm start.)
Each concurrent call can start another jail on another managed instance; getting the cold start time again.
You can get cold starts by uploading a new zip, or changing any of the lambda compute parameters.
The Golang works in a similar way, jails the zip, and keeps the program running, calling the handler as invokes come in.
I haven't done the python, node, or .NET enough to know if those are the same principles; I'd assume they are.
Interestingly, API Gateway really is just Cloudfront. Cloudfront is just the AWS managed API Gateway.
That said, if you are using Lambda and expecting to not pay extra you have somehow been misled. Lambda is definitely more expensive per cycle than managing your own instances, and I doubt that will change any time soon.
When calculating the overall cost of managing your own instances you should also include time spent by your engineering team. There are particular tipping points in terms of overall requests per second at which point you'd save money by moving from Lambda to something like Fargate, and then even farther above that, you're better off using EC2. And then even above that, you should be running your own instances in a colo space. (And then at some point you should probably be building your own datacenters, and then at some point you should start colonizing the moon, and then... you get the idea.)
Why are people jumping from EC2 to colo and skipping dedicated servers? Mystery of my life. We were running the 75th largest site in the US some years ago (as measured by Quantcast), ran the numbers and colo was ridiculously expensive and way more troublesome.
That's slightly surprising to me, unless you were limiting yourself to a particular geographic market for colo that was unusually supply-constrained at the time. (The SFBA during certain years comes to mind).
Sometimes rented servers, for a specific use case (or if the provider is overstocked on a particular model) are a great deal, but I've never seen that at the high-performance end of the spectrum, if they even offer such a model in the first place.
For the average and middle-performance cases, though, for truly comparable servers and connectivity (internal and external, which can be tough to find in the first place), I found rented servers to be moderately more expensive than colo plus buying hardware amortized over 3 years.
> and way more troublesome
This one is the mystery of my life. You mention "magic smoke" downthread, but I've only experienced that once in my entire career and that was with proprietary hardware 2 decades ago.
Conversely, my experience with rented servers is that when there is a hardware problem, other than obvious failure of a replaceable part, "troublesome" doesn't begin to describe it.
 yes, including all the costs like installation/rack/stack, network ports, spares, etc. They're not de minimis, but it's maybe a few extra percentage points on the overall cost.
When it comes to load testing tools I like Vegeta, personally. (Though I've also used some much more complicated proprietary tools when testing at great scale.)
But many workloads don't have that high of a request volume and can't actually make full use of an instance. If you have a small API or service that gets one or two requests every few seconds then paying for a 100ms chunk of Lambda execution time every couple seconds is going to be much cheaper than reserving an entire instance and then not being able to get good utilization out of it.
The tipping point is whether or not you have enough workload volume to keep an instance busy at all times. Take the password hashing in the article above, for example. Because password hashing is deliberately CPU intensive, it is very easy to keep an EC2 instance busy with even a low request volume. For a good hashing algorithm with lots of rounds it's not uncommon to only get 10 authentications per second per core, because the algorithm is deliberately designed to be CPU heavy. So if you process more than 10 auths/sec then it's probably cheaper to put the workload in a container that runs on an instance because you can keep that instance busy.
But if the same service is only handling one or two password hashes every minute, then you can save money by only paying for 100ms increments when an auth request arrives, and stop paying when there is nothing to do.
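A rough sketch of the tipping point described above. Every price in it is an assumed placeholder rather than a figure from this thread, and the function names are made up for illustration; plug in current numbers before trusting the result.

```python
# Back-of-the-envelope break-even between per-invocation billing and an
# always-on instance. Every price used below is an assumed placeholder.

SECONDS_PER_MONTH = 30 * 24 * 3600


def lambda_monthly_cost(requests_per_second, billed_ms, memory_gb,
                        price_per_gb_second, price_per_million_requests):
    """Monthly cost when you pay only for billed execution time."""
    requests = requests_per_second * SECONDS_PER_MONTH
    compute = requests * (billed_ms / 1000.0) * memory_gb * price_per_gb_second
    request_fee = requests / 1_000_000 * price_per_million_requests
    return compute + request_fee


def break_even_rps(instance_monthly_cost, billed_ms, memory_gb,
                   price_per_gb_second, price_per_million_requests):
    """Request rate above which the always-on instance is cheaper."""
    cost_at_one_rps = lambda_monthly_cost(
        1.0, billed_ms, memory_gb, price_per_gb_second,
        price_per_million_requests)
    return instance_monthly_cost / cost_at_one_rps


if __name__ == "__main__":
    # Assumed inputs: 100 ms billed at 1 GB, $0.00001667 per GB-second,
    # $0.20 per million requests, and a $70/month instance.
    rps = break_even_rps(70.0, 100, 1.0, 0.00001667, 0.20)
    print(f"break-even at roughly {rps:.1f} requests/second")
```

This ignores whether a single instance could actually serve the break-even rate, which for CPU-heavy work like password hashing is the more important limit.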
The baseline Cloudflare worker tiers are limited to less than 5ms, less than 10ms, and less than 50ms, which isn't going to be enough time to calculate a 12 round bcrypt for example.
Based on this code: https://gist.github.com/zackbloom/c0064838cbf85e7b81df9d4690...
That means it would cost you $0.50 / million requests. AWS Lambda would be $1.84 / million, $3.50 / million for API Gateway, $0.40 / million for AWS Route 53, and various other charges.
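Adding up the per-million figures quoted above makes the gap easier to see. This only sums the three AWS line items named in the comment and leaves out the "various other charges", so treat the total as a lower bound.

```python
# Per-million-request figures as quoted in the comment above.
workers_per_million = 0.50
aws_per_million = {
    "Lambda": 1.84,
    "API Gateway": 3.50,
    "Route 53": 0.40,
}

aws_total = sum(aws_per_million.values())
ratio = aws_total / workers_per_million

print(f"AWS subtotal: ${aws_total:.2f} per million requests")
print(f"ratio vs Workers: {ratio:.1f}x")
```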
Also, does anyone know if there is an API for AWS to dynamically create, load, and launch EC2 and/or Lambda instances (i.e., boto - though I'm open to suggestions for something else) AND, preferably, have separate billing for each thing? Do I need multiple accounts to do separate billing? Something about IAM roles...?
Our EC2 prototype of this on one of the m3 class instances could do the work in about 2 minutes which seemed a perfect opportunity to port to Lambda.
Even on the top memory instance at the time (1536MB), the job just couldn't finish, timing out after 5 minutes. The code was multi-threaded, to parallelise the downloads, but no matter how much we tweaked this the Lambda would just never complete in time.
As you don't have visibility into the internals, we didn't know whether this was due to CPU constraints (decompressing lots of GZIP streams), network saturation (downloading files from S3), or something else.
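One way to narrow down the CPU-versus-network question is to time the two phases separately inside each worker. This is a minimal sketch, not the original job: `fetch` is a hypothetical callable standing in for the S3 download, and the in-memory store exists only so the example runs on its own.

```python
import gzip
import time
from concurrent.futures import ThreadPoolExecutor


def process_one(key, fetch):
    """Fetch one gzip object and decompress it, timing each phase."""
    t0 = time.perf_counter()
    compressed = fetch(key)             # network-bound phase
    t1 = time.perf_counter()
    data = gzip.decompress(compressed)  # CPU-bound phase
    t2 = time.perf_counter()
    return {"key": key, "bytes": len(data),
            "fetch_s": t1 - t0, "gunzip_s": t2 - t1}


def process_all(keys, fetch, workers=8):
    """Run process_one over all keys on a thread pool and total the timings."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda k: process_one(k, fetch), keys))
    totals = {
        "bytes": sum(r["bytes"] for r in results),
        "fetch_s": sum(r["fetch_s"] for r in results),
        "gunzip_s": sum(r["gunzip_s"] for r in results),
    }
    return results, totals


if __name__ == "__main__":
    # Stand-in for S3: an in-memory dict of gzip blobs (hypothetical data).
    store = {f"part-{i}.gz": gzip.compress(b"row\n" * 10_000) for i in range(4)}
    _, totals = process_all(list(store), store.__getitem__)
    print(totals)
```

If `gunzip_s` dominates, more memory (and therefore more CPU) might help; if `fetch_s` dominates, the bottleneck is more likely on the network side.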
In the end we gave up. We didn't have the time or resources to keep digging, and just put the problem down to our use case being at odds with what Lambda is designed for.
Not saying this is an indictment of Lambda, we use it in lots of places, with a lot of critical path code (ETL Pipelines).
In my case we use lambda to perform ETL based on S3 events, so when a file drops into S3, Lambda is invoked to process it.
That works very well for us and is cheaper than running a box 24x7, as the file drops arrive sporadically throughout the day and Lambda can scale to meet the demand.
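For anyone who hasn't seen the pattern, an S3-triggered handler looks roughly like this. It is a sketch of the general shape, not the commenter's code: `transform` is a hypothetical placeholder for the real ETL step, which the thread doesn't describe.

```python
from urllib.parse import unquote_plus


def objects_from_s3_event(event):
    """Return (bucket, key) pairs from an S3 notification event."""
    pairs = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        bucket = s3["bucket"]["name"]
        # Object keys arrive URL-encoded in the notification payload.
        key = unquote_plus(s3["object"]["key"])
        pairs.append((bucket, key))
    return pairs


def transform(bucket, key):
    # Placeholder for the real ETL step (hypothetical).
    return f"s3://{bucket}/{key}"


def handler(event, context):
    """Lambda entry point: run the transform once per dropped object."""
    processed = []
    for bucket, key in objects_from_s3_event(event):
        processed.append(transform(bucket, key))
    return {"processed": processed}
```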
The problem was the task just couldn't complete in < 5 minutes.
Azure's Durable Functions have an advantage here in making extreme fan-out situations easy.
Maybe the recently introduced SQS->Lambda support might make it a bit cleaner, but in the end we opted for EC2.
@zackbloom @jgrahamc I can't find it in the docs on AWS site, but I've read that AWS Lambda scales CPU linearly until 1.5GB, then gives you 2nd thread/core and again scales linearly until 3GB. If your PBKDF2 was single threaded, Lambda bigger than 1.5GB is wasted.
11:12 AM - 9 Jul 2018
reply by blog post author:
Replying to @ZTarantov @Cloudflare @jgrahamc
I can't think of a way to test that within the Node code. The only option seems to be to update the C++ version (or some other language) to use multiple threads.
5:16 PM - 9 Jul 2018
1 - https://twitter.com/ZTarantov/status/1016384547364229120
2 - https://twitter.com/zackbloom/status/1016476314864312321
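A possible way to probe the second-core claim from Python rather than Node or C++ is to run the same PBKDF2 batch on one thread and then on two, and compare wall-clock time. This sketch assumes CPython's OpenSSL-backed `hashlib.pbkdf2_hmac` releases the GIL while it runs; if that doesn't hold for your build, swap the thread pool for processes. The task and iteration counts are arbitrary.

```python
import hashlib
import time
from concurrent.futures import ThreadPoolExecutor


def one_hash(i, iterations=100_000):
    """One CPU-heavy PBKDF2 derivation; the salt varies per task."""
    return hashlib.pbkdf2_hmac("sha256", b"password", b"salt%d" % i, iterations)


def time_batch(threads, tasks=8, iterations=100_000):
    """Wall-clock seconds to run `tasks` derivations on `threads` threads."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        list(pool.map(lambda i: one_hash(i, iterations), range(tasks)))
    return time.perf_counter() - start


if __name__ == "__main__":
    single = time_batch(1)
    double = time_batch(2)
    # A speedup near 2 suggests a second core is usable; near 1 suggests not.
    print(f"1 thread: {single:.2f}s, 2 threads: {double:.2f}s, "
          f"speedup: {single / double:.2f}x")
```

Running it at a memory size below and above 1.5GB would show whether the speedup changes.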
After experimenting with uploads from Lambda to S3 I was noticing that the time to upload a tiny 4MB file changed dramatically when I reconfigured the Lambda function's memory size. At 500MB it took 16 seconds to upload the file which is pretty slow. Once I got past roughly 1500MB of memory, the performance no longer improved and the best I could get was about 8 seconds for the same payload.
None of my tests were controlled or rigorous in any way so take them with a grain of salt...they were just surprising to me that the speed changed dramatically with memory size allocation. I'm new to Lambda so I wasn't aware that memory size is tied to other resource performance. I'm curious if this goes beyond CPU and also changes network bandwidth/performance? The Lambda I deployed did not write data to the temp location that is provided, it streamed directly to S3.
I've since moved on from this implementation and now my Lambda function performs a much simpler task of generating pre-signed S3 URLs. I have noticed something else about Lambda that bothers me a little. If my function remains idle for some period of time and then I invoke it, the amount of time it takes to execute is around 800ms-1000ms. If I perform numerous calls right after, I get billed the minimum of 100ms because the execution time is under that. The part that bothers me is I'm being charged a one-time cost that's about 8x-10x the normal amount because my function has gone idle and cold. I'll have to continue reading to see if this is expected. It's not a huge amount in terms of cost but surprising that I'm paying for AWS to wake up from whatever state it is in.
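The 8x-10x figure follows directly from rounding up to the 100ms billing increment mentioned above. A small worked example; the 850ms and 40ms durations are illustrative values picked from the ranges in the comment.

```python
import math


def billed_ms(duration_ms, increment_ms=100):
    """Round a duration up to the next billing increment."""
    return max(1, math.ceil(duration_ms / increment_ms)) * increment_ms


cold = billed_ms(850)  # a cold invocation in the 800-1000 ms range
warm = billed_ms(40)   # a warm invocation that finishes under 100 ms
print(cold, warm, cold / warm)  # 900 100 9.0
```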
Update: found a nice article with metrics re: lambda-backed api gateway but the premise applies to any fan-out.
So if there goes so much effort into calculating costs for PBKDF2 on servers (ahem, "serverless"), why not move it to the client side? I like client side hashing a lot because it transparently shows what security you apply, and any passive or after-the-fact attacks (think 1024 bit encryption decryption which will slowly move from 'impossible for small governments' to 'just very slow' soon) are instantly mitigated. The server should still apply a single round of their favorite hash function (like SHA-2) with a secret value, so an attacker will not be able to log in with stolen database credentials.
But that's probably too cheap and transparent when you can also do it with a Lambda™.
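The scheme proposed above can be sketched as follows. The names are hypothetical, and HMAC-SHA-256 is swapped in here as the "hash with a secret value" step; the comment only says a single round of something like SHA-2 with a secret.

```python
import hashlib
import hmac
import os

SERVER_SECRET = os.urandom(32)  # hypothetical server-side secret value


def client_hash(password, salt, iterations=100_000):
    """Expensive step, done on the client."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)


def server_store(client_digest):
    """Cheap step on the server: one HMAC-SHA-256 over the client digest."""
    return hmac.new(SERVER_SECRET, client_digest, hashlib.sha256).digest()


def server_verify(client_digest, stored):
    """Constant-time comparison against the stored value."""
    return hmac.compare_digest(server_store(client_digest), stored)
```

Because the database holds only the keyed hash, a stolen row can't be replayed as a client digest without also knowing the server secret.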
This article is comparing the raw CPU power provided by two different serverless products. PBKDF2 is used only as an example of a computation requiring a lot of CPU.
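To get a feel for how CPU-bound that computation is, you can time it locally. A minimal sketch using Python's standard library; the iteration count and sample size are arbitrary assumptions, not the settings used in the article.

```python
import hashlib
import time


def hashes_per_second(iterations=100_000, samples=5):
    """Measure single-threaded PBKDF2-HMAC-SHA256 derivations per second."""
    start = time.perf_counter()
    for i in range(samples):
        hashlib.pbkdf2_hmac("sha256", b"password", b"salt%d" % i, iterations)
    elapsed = time.perf_counter() - start
    return samples / elapsed


if __name__ == "__main__":
    print(f"{hashes_per_second():.1f} derivations/second on this machine")
```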
Oh wow, I completely missed the point here. Having worked on strong client-side hashing in browsers and being into crypto generally, I saw this problem being presented and completely mistook it. Thanks!
I'm only so-so happy about GCF's response time. I honestly wonder why these cloud functions take so long to execute after being warmed up.
I'm querying across a couple hundred rows. I'm reasonably certain that calling out to Perl and a regex would be faster for so little data. :/
Workers has a clear advantage over Lambda@Edge, but not because of the current resource configuration differences across the two products - the advantage is your choice of V8 and adoption of the Service Worker API standard, which brilliantly outshines the L@Edge API choices. Harp on that; most of what you’re talking about now will likely be invalidated by the next re:Invent, and they’ll make it a point to tell the world.
Eh? I can see how they could match Workers' raw CPU throughput by simply turning off throttling. But how would they "crush" it? And how can they easily improve other performance measures like network latency, cold start time, or deploy time? Honestly curious what you're getting at here.
> the advantage is your choice of V8 and adoption of the Service Worker API standard, which brilliantly outshines the L@Edge API choices.
Thanks for the kind words.
Cold start time is fixable, there have already been large improvements, and this is more a VPC problem. Easiest way would be to overprovision aggressively / make sure Lambdas tied to APIGW always have a lambda running, or even use some variation of ML predictions to keep things warm. But again, this isn't comparable - Lambda is a truck compared to Workers/Lambda@Edge being bikes. Parallel scalability is more important there than speed. There are enough ways to keep a few warm ones ready.
Deploy time is really download time from S3, AWS could cache more aggressively on the local cloudfront caches. I'm not seeing deployment time as being a big factor, though.
By "crush" I mean make claims about performance irrelevant. To claim that AWS cannot equalize performance between Lambda@Edge and Workers doesn't make sense, they can. And they can improve Lambda price-performance as well, and are already doing so. I'm saying this cannot and is not the Workers USP - no one in the AWS ecosystem is going to jump to Workers based on this because it lacks the rest of the AWS ecosystem.
> > the advantage is your choice of V8 and adoption of the Service Worker API standard, which brilliantly outshines the L@Edge API choices.
That's really the big differentiator for Workers. I think you should blow that trumpet a lot more. If you only publicize performance numbers, what happens to the Workers story when that advantage is lost?
> Thanks for the kind words.
You made a good decision and built something great, you're welcome.
At least for pure compute speed, I think he means that if you (Cloudflare) and AWS got into an arms race in terms of allocated CPU/memory to the Workers/Lambda, they have more raw resources to do so. They also have a global presence, though obviously not to the degree that you do.
I highly doubt they would do this, and I think you have the superior product. I'm just a student/hobbyist so I admittedly don't have a ton of experience. I'm very biased towards CF, you guys are great! :D
That said, I also want people to build applications which run all around the world. I can only imagine what it's like for people in Australia to browse the modern Internet, but I doubt it's particularly fun and I'd like to help fix it if I can.
Java for example will keep your static variables and such in memory, and keep /tmp until you haven't called the lambda for a while.
With GoLang, they start your program, and just call the handler method as required.
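A Python analogue of the behaviour described above, assuming the Python runtime keeps the process alive between invocations the same way the Java and Go runtimes are said to (the thread doesn't confirm this). Module-level state plays the role of Java's static variables.

```python
import time

# Module-level state is initialised once per process, i.e. once per cold start.
_COLD_START_AT = time.time()
_invocations = 0


def handler(event, context):
    """Report whether this call reused an already-running process."""
    global _invocations
    _invocations += 1
    return {
        "cold_start": _invocations == 1,
        "invocations_in_this_process": _invocations,
        "process_age_s": time.time() - _COLD_START_AT,
    }
```

Logging `cold_start` from each invocation is a cheap way to see how often a function is actually being started fresh.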