
Running a site the size of the BBC on Lambda is nothing short of an exorbitant waste of a government-subsidized budget; it's absolutely crazy.

Lambda VM time carries a massive markup compared to regular compute. It only makes sense where usage stays below some threshold, which the BBC almost certainly exceed.

There are plenty of alternative options, even on AWS, that don't suffer from such a huge markup and don't require any additional ops input. The thing that runs in Lambda is practically a container image already. Does it really cost tangible budget to have CI build a full image rather than a ZIP file that already contains a few million lines of third-party JS/Python deps?

IMHO this is the epitome of serverless gone wrong.




AWS Lambda: $0.20 per 1M requests; $0.0000166667 per GB-second; SLA 99.95%

Let's assume:

- 2,000 calls/sec
- each call is 1 sec duration
- 0.128 GB-seconds per call (128 MB for 1 sec)
- db and storage IOPS will be the same if deployed on K8s
- 5x9s SLA (implies a three-region deployment)

requests per year = 2000 * 3600 * 24 * 365 = 63,072,000,000
request costs = 0.2 * 63,072,000,000 / 1,000,000 = 12,614 USD

GB-seconds per year = 0.128 * 2000 * 3600 * 24 * 365 = 8,073,216,000
GB-second costs = 8,073,216,000 * 0.0000166667 = 134,553.87 USD

Total cost = 12,614 + 134,553.87 = 147,167.87 USD
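The arithmetic above can be sanity-checked in a few lines (a sketch using only the rates and assumptions quoted above):

```python
# Annual Lambda cost estimate from the assumptions above.
REQ_PRICE_PER_MILLION = 0.20     # USD per 1M requests
GB_SECOND_PRICE = 0.0000166667   # USD per GB-second

calls_per_sec = 2000
duration_sec = 1.0
memory_gb = 0.128

requests_per_year = calls_per_sec * 3600 * 24 * 365
request_cost = requests_per_year / 1_000_000 * REQ_PRICE_PER_MILLION

gb_seconds_per_year = memory_gb * duration_sec * requests_per_year
compute_cost = gb_seconds_per_year * GB_SECOND_PRICE

total = request_cost + compute_cost
```

Note the request fee is almost a rounding error next to the GB-second charge; memory size and duration dominate the bill.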

The equivalent K8s deployment would be:

- three clusters
- 2,000 cores (more likely 10% more = 2,200)
- 256 GB memory

Three clusters will require 3,000 cores to cater for region loss. 3,000 cores on 32-core machines => 94 machines; round up to 99 machines to give VM-level redundancy => 33 machines per cluster.

Azure D32a_v4 (32 core, 128 GiB, 800 GiB storage) = $1.84/hour PAYG; $0.5704/hour Spot

D32a_v4 at spot pricing = 99 * 0.5704 * 24 * 365 = 494,673.70 USD

Plus FTE support (e.g. n FTEs @ 100k USD)

With 2 FTEs total is 694,673 USD
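And the K8s side, as a sketch (spot price and FTE figure as assumed above; the FTE count is the variable to argue about):

```python
# Annual K8s cost estimate from the assumptions above.
SPOT_PRICE_PER_HOUR = 0.5704   # Azure D32a_v4 spot, USD/hour
FTE_COST_PER_YEAR = 100_000    # USD, assumed per engineer

machines = 99                  # 33 per cluster across three regions
ftes = 2

vm_cost = machines * SPOT_PRICE_PER_HOUR * 24 * 365
total = vm_cost + ftes * FTE_COST_PER_YEAR
```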

Summary: AWS Lambda is 4.7x cheaper than a Kubernetes solution


I responded further down, but dude! 2,000 requests a second is hardly anything at all, unless the application server is doing some seriously heavy lifting, in which case the architecture is wrong.

You should redo the calculations with 1 GB of memory for Lambda, and even 30 machines would be a generous estimate for the servers.

Concurrency is key. Requests don't cost much when they're just waiting for other things, but Lambda continues to pile costs on for every increase in concurrency.

APIs should use only a tiny fraction of actual CPU time per request. Perhaps the BBC's are different. To make a truly fair comparison and properly predict what they would need in servers, you'd need more detail than you have available, but I think your estimates are off by a significant amount.


I stopped reading at "3000 cores"; there is a lot of money to be made mopping up disasters like that, it's clearly even something of a growth industry. We had one machine push 2,400 requests/sec average over election night without even touching 30% capacity, costing around $600/mo including bandwidth. Its mirror in another region costs slightly more at $800/mo. As a side note, it's always the case that those folk invent new employees to top up their estimates, employees that supposedly wouldn't be required in the serverless world; yet in every serverless project I've ever seen, those people absolutely still existed, because they had to.

The price-perf ratio between Lambda and EC2 is obscene, even before accounting for Lambda's 100 ms billing granularity, per-request fees, provisioned capacity, or API Gateway. Assuming one request to a 1 vCPU, 1,792 MB worker that lasted all month (impossible, I know), this comes to around $76, compared to (for example) a 1.7 GB, 1 vCPU m1.small at $32/mo, or $17.50/mo partial-upfront reserved.

Let's say we have a "50% partial-reserved" autoscaling group that never scales down, this gives us a $24.75/mo blended equivalent VM cost for a single $76 Lambda worker, or around 3x markup, rising to 6x if the ASG did scale down to 50% its size the entire month. That's totally fine if you're running an idle Lambda load where no billing occurs, but we're talking about the BBC, one of the largest sites in the world...
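A sketch of that blended-markup arithmetic (the $76 worker-month and the VM prices are the estimates from the paragraphs above):

```python
# Lambda vs blended-ASG VM cost per always-busy worker (figures above).
LAMBDA_WORKER_MONTH = 76.00  # 1 vCPU / 1,792 MB worker busy all month
ON_DEMAND_MONTH = 32.00      # m1.small on-demand, USD/month
RESERVED_MONTH = 17.50       # m1.small partial-upfront reserved

# 50% reserved / 50% on-demand ASG that never scales down:
blended = 0.5 * ON_DEMAND_MONTH + 0.5 * RESERVED_MONTH
markup = LAMBDA_WORKER_MONTH / blended  # roughly 3x
```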

The BBC actually publish some stats: for 2020, their peak month was 1.5e9 page views. Counting just the News home page, this translates to what looks like 4 dynamic requests per page view, or 2,280 requests/sec.

Assuming those 4 dynamic requests took 250 ms each and were pegging 100% of a VM's CPU, that still only works out to 570 VMs, or $14,107/mo. Let's assume the app is not insane and on average we expect 30 requests/sec per VM (probably switching out the m1.small for a larger size taking proportionally increased load); now we're looking at something much more representative of a typical app deployment on EC2: $1,881/mo in VM time. Multiply by 1.5x to account for a 100% idle backup ASG in another region and we have a final sane figure: $2,821/mo.
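Those sizing numbers can be reproduced as follows (the 30.45-day month and $24.75/mo blended VM price are the figures used above; 30 req/s per VM is, as stated, an assumption about a sane app):

```python
# EC2 sizing estimate for the BBC's peak dynamic traffic (figures above).
BLENDED_VM_MONTH = 24.75             # USD/month per 1-vCPU VM
page_views = 1.5e9                   # peak month
dynamic_per_view = 4
seconds_per_month = 3600 * 24 * 30.45

req_per_sec = page_views * dynamic_per_view / seconds_per_month  # ~2,280

# Worst case: each request pegs a vCPU for 250 ms => 4 req/s per VM.
worst_cost = (req_per_sec / 4) * BLENDED_VM_MONTH

# Sane case: 30 req/s per VM, times 1.5 for an idle backup region.
sane_cost = (req_per_sec / 30) * BLENDED_VM_MONTH * 1.5
```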

As an aside, I don't know anyone using 128 MB workers for anything interactive, not because of memory requirements, but because the CPU timeslice scales with memory. For almost every load I've worked with, we ended up using 1,536 MB slices as a good latency/cost tradeoff.


Just for completeness, here are the parent comment's Lambda estimates updated, not counting provisioned worker costs, and assuming no request takes more than 100 ms:

    Lambda requests: ((1.5e9 * 4) / 1e6) * .20     = $   1,200 
    Lambda CPU (1536 MB): 0.0000025000 * 1.5e9 * 4 = $  15,000
    API Gateway HTTP reqs:
        (count): 1.5e9 * 4 = (6 billion)
        (first 300m): 300 * 1.0                    = $     300
        (next 5700m): 5700 * 0.9                   = $   5,130

    LAMBDA MONTHLY TOTAL                           = $  21,630
    LAMBDA YEARLY TOTAL                            = $ 259,560
And for comparison:

     NLB (2x)
        (NLB hours 1 month):
           2 * 0.0225 * 24 * 30.45                 = $      33
        (NLCU hours):
           2 * (2280/50) * 0.006 * 24 * 30.45      = $     399

     NLB MONTHLY TOTAL                             = $     432
     NLB YEARLY TOTAL                              = $   5,184

     EC2 YEARLY
        (if  1 req/vCPU)                           = $ 253,926
        (if 15 reqs/vCPU)                          = $  67,704
        (if 30 reqs/vCPU)                          = $  33,852
Note the "1 req/vCPU" case would require requests to burn 250ms of pure CPU (i.e. not sleeping on IO) each -- which in an equivalent scenario would inflate the Lambda CPU usage by 3x due to the 100ms billing granularity, i.e. an extra $30,000/month.

That's an 87% reduction in operational costs in the ideal (and not uncommon!) case, and a minimum of a 59% reduction in the case of a web app from hell burning 250 ms CPU per request.
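For anyone checking the table, the monthly Lambda-side total reduces to (rates as listed above):

```python
# Reproduce the monthly Lambda + API Gateway total from the table above.
reqs = 1.5e9 * 4                        # dynamic requests per month

lambda_requests = reqs / 1e6 * 0.20     # $0.20 per 1M invocations
lambda_cpu = reqs * 0.0000025           # 1536 MB tier, one 100 ms slice each
api_gateway = 300 * 1.00 + 5700 * 0.90  # per-million HTTP API tiers

monthly = lambda_requests + lambda_cpu + api_gateway  # $21,630
```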


Totally agree. Lambda needs to cut its prices to a tenth, or start billing for real CPU time and drop the per-invocation overhead, to really compete at these scales.

Now, I have dozens of serverless projects for smaller things, because there is still a point where the gross costs just don't matter (as in, if my employer were worried about Lambda vs EC2 efficiency, there are probably a few meetings we could cancel or trim the audience of that would make up the difference).

But not at this scale.


Lambda has huge potential for well-defined workloads. In this case, I don't get it. As you mentioned, the idea of having 3 regions with 3,000 cores? Are you doing ML on K8s? Another aspect is caching, both with a CDN and internally; I don't get that either.


What you are missing is the Amazon API Gateway cost. I really liked my own Lambda cost calculations, similar to yours, until I figured out I'd need to use API Gateway too.

Edit: another thing is the amount of RAM given to functions. The CPU speed you get is proportional to RAM, so if your code fits in the RAM but has poor performance, doubling the RAM is what you have to do. Another hidden cost.
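That hidden cost has a flip side worth noting: since CPU scales with RAM, a fully CPU-bound function (an assumption; real functions also wait on IO) costs roughly the same per invocation after doubling its memory, because it finishes in half the time. A sketch using the GB-second rate from upthread, ignoring the 100 ms billing granularity:

```python
# Per-invocation cost: memory (GB) x billed duration (s) x rate.
GB_SECOND_PRICE = 0.0000166667  # USD per GB-second

def invocation_cost(memory_gb, duration_sec):
    return memory_gb * duration_sec * GB_SECOND_PRICE

slow = invocation_cost(0.128, 1.0)  # 128 MB, 1 s
fast = invocation_cost(0.256, 0.5)  # 256 MB, ~0.5 s if fully CPU-bound
# slow and fast come out equal: same cost, but latency halves.
```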


Thanks, peer review is what I'm after.

The GB-second calc is there. My assumption was that even with K8s there'd still be an API gateway, GTM, etc.

Comparing AWS Lambda = 147,167.87 USD to K8s with autoscaling:

D32a_v4 at spot pricing = 66 * 0.5704 * 24 * 365 = 329,782.46 USD

With 2 FTEs @ 100k, the total is 529,782.46 USD

Caveat: the application must be able to tolerate autoscaling delays

Summary: AWS Lambda is 3.5x cheaper than a Kubernetes solution


The calculations are still a little more complicated than that. I think serverless is the future, but I also think we need to continue to put pressure on AWS to lower costs.

Lambda and servers are not equal; you can't just calculate the number of servers one would need from an equivalent Lambda load. It's entirely possible that they could get away with significantly fewer servers than you think.

Your cost calculation assumes 128 MB provisioned. You cannot run an API with 128 MB Lambdas; try 1 GB or even 1.5 GB. It's not that you need that much memory, of course, but if you want p98 execution and initialization times that are palatable, you need the proportional CPU speed that comes with the additional memory.

And no, you won't need API Gateway, because you'd likely be running your own gateway in the cluster, and it will handle far more load without needing nearly as much autoscaling as the app servers.

Lambda autoscales too - it's not instant, and there are steps it goes through as it ramps up.

If Lambda removed the per-invocation overhead and billed for actual CPU time used, rather than "executing" (wall-clock) time, I think that would be fantastic. Again, I still think it's the future, but it has a ways to go before it's appropriate for certain use cases and load profiles.

Edit: oh, and I think the managed ROI is also a case by case basis. Do you have people who know how to run a cluster for you already? Completely different conversation.

I will also say that Lambda is still not maintenance-free, either.


API Gateway is optional, though - it's poorly documented, but most workloads are just fine without it.


How does one invoke a Lambda function via HTTP without an AWS account (i.e. a public API call)? I take it you're not including that in "most workloads"?


Most if not all AWS services are really just HTTP APIs. A Lambda invocation is really just a POST to a public AWS endpoint. You can absolutely come up with login flows that obtain a set of temporary STS credentials that are only allowed to invoke your "API" function. (Agreed, this is not most workloads.)


Completely agree. What also irks me is the corpspeak sleight of hand, where the problem of maintaining your infrastructure is “solved” by using AWS Lambda. It’s not solved, you’re just paying some AWS contractors to do it for you and you assume it’ll be alright.


> It’s not solved, you’re just paying some AWS contractors to do it for you and you assume it’ll be alright.

Inasmuch as maintaining infrastructure can ever be 'solved', doesn't paying someone else who does a good job of it to provide you with the infra count as solving the problem? Otherwise you'd be down the mines picking ore out of the ground so you can build your own chips, rather than relying on Intel/AMD to do it for you and assuming it'll be alright.


The problem arises when the infra providers’ setup stops being aligned with yours. Which, given a small organization and a long enough period of time, is guaranteed to happen.

The question is then: is it cheaper in the long run to deal with the hassle of your own infra (we are still talking about the cloud, btw, the thing that supposedly already solved it), or is it OK to follow the practices and changes in the provider's offering?


There wasn't any mention of cost, so you're making a big assumption here. I imagine the caching layer would significantly reduce the number of Lambda calls.



