My understanding is that this is a difficult problem to solve "perfectly" due to lag between incurring a cost and recording the cost.
I believe GCP currently has the best feature for this. You can set both billing alerts as well as caps. However, I also believe that it can take up to 24 hours between incurring a cost and it showing up on your billing report. So even on GCP (which is the most forward-thinking cloud service for this feature), you can incur up to 24 hours of charges over your maximum billing threshold cutoff. I'm also not 100% sure if GCP's billing threshold is really designed to be a "hard cap" per se.
The real question is whether AWS should let perfect be the enemy of good; and/or whether providing a somewhat "broken" service like GCP's would mislead customers into feeling more protected than they actually are.
See here where someone set a Firebase billing budget of $7 but an infinite recursion generated $72,000 in charges. When the founders started seeing the charges come in, all they could do was watch as it grew and grew... because their screen was merely reflecting what had already happened in hours past.
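To put the lag argument in numbers, here is a rough sketch. The burn rate is a made-up figure for illustration, not taken from the Firebase incident; the 24-hour lag is the upper bound mentioned above.

```python
def worst_case_overshoot(burn_rate_per_hour: float, reporting_lag_hours: float) -> float:
    """Spend that can accrue after a cap is crossed but before billing data shows it."""
    return burn_rate_per_hour * reporting_lag_hours


# Hypothetical runaway job burning $50/hour, with a 24-hour reporting lag.
print(worst_case_overshoot(50.0, 24.0))  # 1200.0
```

So a cap enforced from lagging billing data is only as "hard" as burn rate times lag, whatever threshold you typed in.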
That is an argument for why these overages occur. It isn't an argument for why customers should eat that cost rather than Amazon. In fact Amazon is in a much better position than customers to absorb those costs. Sure, they'd have to increase rates slightly to cover it. But it would give customers peace of mind.
And if the costs become exorbitant, Amazon is in a better position to improve their own systems to reduce the amount of overages that people run into in practice.
In theory, Amazon's first leadership principle is Customer Obsession. (See https://www.amazon.jobs/en/principles for the full list.) If they took that seriously, then setting this issue to rest for their customers would be a no-brainer.
FWIW I know of a startup whose video sharing app was used to reshare a pay-per-view football match, and they incurred a $30k bandwidth bill that AWS did not cancel. That killed the startup. It was largely their own fault for not securing the platform well enough, or moderating popular streams, but being able to cap their AWS bill would have kept them in business.
I am not going to risk runaway costs in hope that Amazon "might" cancel it.
Though apparently the population that both cares about this and avoids Amazon as a result would pay less than the cost of implementing it plus the income Amazon currently keeps (does not refund) from catastrophic runaways.
I would argue that AWS does not have a "Customer Obsession", and that's exactly why it's Amazon's most profitable business (by far) and the underwriter of all of Bezos's ambitions.
Disclosure: AWS employee. Support specifically. There are good things about my employer. There are bad things about my employer.
AWS does indeed obsess about the customer. Every step along the chain there is someone there advocating for the customer. There are mechanisms to keep the customer in mind even for the developers who actually code the service and don't talk to customers on a daily basis.
I've had many, many service team members shadow me as I worked their service's tickets. This is explicitly so they can see in real time customer pain-points. If a customer has a question about a unique use case, the service team will proactively reach out to support engineers to set up a call to discuss the use case further. There are monthly (or twice-monthly) meetings between support service owners (i.e. those people in support who 'own' a service) and service teams to identify the top issues customers are having with the service. AWS is constantly looking for ways to better assist customers, make support less difficult for customers, increase self-service options for customers, etc.
I'm really, really curious what the basis behind your argument is. Because from everything I've seen and been a part of, it's simply untrue.
Customer obsession would be things like implementing bill caps on new account creation.
Customer obsession would be NOT shipping buggy, unreliable software like AWS Amplify.
Customer obsession would be CloudFormation-first.
Customer obsession would be not forcing me to upgrade to a paid Support account to report a bug.
The list goes on, unfortunately. I do believe AWS employees mean what they say, but the external reality (IMO) is it takes a lot of time and effort to get AWS to notice their customers unless you're one of the big boys.
Bill caps sound great until you leave one on on a production system and your whole business comes crashing down during a spike in customer traffic.
Customer obsession means not shipping bugs? OK, Bob, let's see your code.
CloudFormation is wonderful and essential. But AWS clearly optimizes for delivery speed. SOME service customers want CloudFormation from day 1. Others would rather have the API first, and have CFN a few months later.
> Bill caps sound great until you leave one on on a production system and your whole business comes crashing down during a spike in customer traffic.
Hence the "ask on account creation". If I want a dev account, I can choose to cap it. The number of SMBs that would benefit from this is staggering.
> Customer obsession means not shipping bugs? OK, Bob, let's see your code.
A major difference being that my company isn't worth $1T+.
I've used AWS for a long time and spoken at length with many wonderful, intelligent people in the company; and I didn't mean to tread on anyone's toes, I just wanted to express how it feels as a customer who spent >£150k/month for half a decade.
Amazon is in the news right now for employees making tone-deaf dishonest public statements trying to deflect legitimate criticism. Out of respect for your employer, please stop.
I am an AWS employee as well, and I'm definitely one of the biggest critics of the company. That being said, AWS is definitely still customer obsessed.
I feel dirty defending AWS, but this is one case I'd give them the benefit of the doubt. There must be _a_ reason they haven't implemented this yet and that reason must be somehow protecting the customer. "Customers want this" ends the discussion around here. You must have a really good reason to disagree.
Should they simply automatically eat the costs then it would undoubtedly result in abuse. Just look at GitHub Actions.
Oh I didn't think I was starting 1000 instances of miners in the EC2 GPU cloud, it certainly exceeded my configured budget, please give my money back..
"You can only launch 1 concurrent EC2 instance per $100 on your max cap, if you want to launch 1000 instances of EC2 your max cap needs to be set to at least $100,000, or to $unlimited".
There are many solutions to this problem to prevent abuse. All of which are way better for consumers than the status quo of everybody having an $unlimited spending limit.
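A minimal sketch of that rule, just to show how little logic it needs. The function names are made up for illustration and nothing here is an actual AWS API; `None` stands in for an `$unlimited` cap.

```python
from typing import Optional

USD_PER_INSTANCE = 100  # from the rule above: 1 concurrent instance per $100 of cap


def max_concurrent_instances(max_cap_usd: Optional[float]) -> Optional[int]:
    """Allowed concurrent instances for a given cap; None means no limit."""
    if max_cap_usd is None:
        return None
    return int(max_cap_usd // USD_PER_INSTANCE)


def can_launch(running: int, requested: int, max_cap_usd: Optional[float]) -> bool:
    """True if launching `requested` more instances stays within the cap-derived limit."""
    limit = max_concurrent_instances(max_cap_usd)
    return limit is None or running + requested <= limit


print(can_launch(0, 1000, 100_000))  # True
print(can_launch(0, 1000, 99_999))   # False
```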
That's where "if the costs become exorbitant, Amazon is in a better position to improve their own systems to reduce the amount of overages that people run into in practice" comes in.
If starting 1000 instances exceeds your configured budget, they could simply not start them, and shut down whatever number of instances you managed to start as soon as their cost exceeds the budget.
I guess the question is how precise a monitoring and reaction system Amazon wants to build for this, for an arguably marginal use case anyway; they do already provide AWS Budgets for automatic actions when exceeding budgets, but it's not quite as real-time. And then making the niche cases favor the customer is an invitation to abuse.
But at least all Amazon accounts are tied to a credit card, so abuse on a scale similar to e.g. the GitHub case is not that easy.
> Oh I didn't think I was starting 1000 instances of miners in the EC2 GPU cloud, it certainly exceeded my configured budget, please give my money back..
All AWS accounts start off without the ability to do this (via the quota system), and being able to start 1000 EC2 instances of any type is a setting that needs to be unlocked via a support request (which can never be done by that support person, but always needs to be escalated to the "service team" and takes about 1 business day).
This is a billing question, not a technical question, and looked at through that lens it's easy to put a hard limit on a monthly bill: just don't ever issue bills greater than that amount.
If I say I only want to pay a maximum of $1000 a month, and I hit that limit but it takes a bit for the provider to shut everything down so really $1100 of resources were consumed, then the provider eats the $100 overrun and I get a bill for $1000.
With an actual hard limit you create a financial incentive for the provider to minimize this overrun. Yes it might be difficult to fix but I assure you, if hard limits existed, the technical issues would be solved soon enough because now there's a reason to invest in a solution.
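The arithmetic of that billing rule, as a small sketch using the $1000 / $1100 example above. The type and function names are invented for illustration.

```python
from typing import NamedTuple


class Invoice(NamedTuple):
    billed: float                # what the customer pays
    absorbed_by_provider: float  # overrun the provider eats


def capped_invoice(consumed: float, hard_cap: float) -> Invoice:
    """Never bill more than the cap; anything consumed beyond it is the provider's loss."""
    billed = min(consumed, hard_cap)
    return Invoice(billed=billed, absorbed_by_provider=consumed - billed)


print(capped_invoice(1100.0, 1000.0))
# Invoice(billed=1000.0, absorbed_by_provider=100.0)
```

The `absorbed_by_provider` number is exactly the incentive being described: the slower the shutdown, the bigger it gets.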
GCP doesn't support a cap, only alerts. You can use those alerts to implement your own cap mechanism, same as how it works on AWS, except AWS billing is only on a one-hour delay, I believe, so I'd say AWS wins here.
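A rough sketch of what such a do-it-yourself cap could look like. The `costAmount` and `budgetAmount` field names are modeled on GCP's budget notification payload as I understand it, so treat them as an assumption and check the docs; `shut_down` is a hypothetical callback standing in for whatever actually disables billing or stops resources.

```python
import json
from typing import Callable


def handle_budget_alert(message: str, shut_down: Callable[[], None]) -> bool:
    """Call shut_down() once reported cost reaches the budget. Returns True if it fired."""
    alert = json.loads(message)
    # Assumed field names; adjust to the real notification format.
    over_budget = alert["costAmount"] >= alert["budgetAmount"]
    if over_budget:
        shut_down()
    return over_budget


# Usage with a fake shutdown action that just records the call.
calls = []
fired = handle_budget_alert(
    '{"costAmount": 120.0, "budgetAmount": 100.0}',
    lambda: calls.append("stopped"),
)
print(fired, calls)  # True ['stopped']
```

Note this only reacts to what the alert reports, so it inherits whatever delay the billing data has.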
App Engine used to support caps. They're no longer supported, because for every customer pleasantly surprised, there were five customers incandescent with rage that their service had gone down at the worst possible moment due to a spike in actual demand.
I could equally well believe that they got rid of it because it affected the quarterly earnings report and there was maybe one customer who was "incandescent with rage" that the caps they put in place worked exactly as advertised.
Well... most cloudy limits only affect current operations. If you add a limit to the number of VMs running you might experience service degradation for a while, until you learn to cope with new peak demand by increasing your quota or being more efficient.
That raging customer might well assume that because almost all limits are like that, all are, including the new S3 limit, but the S3 limit causes service degradation forever, not during peak load. The writes that failed for a while map to reads that'll fail forever, because that data isn't there.
We can come up with possibilities that sound more or less plausible. I'd love to hear something more factual.
We talked about caps with the Google reps at my day job.
The short answer was 'we can, but don't want to' (note: this may be completely unrelated to what Google thinks internally, and is just what the fairly high up the food chain rep told us).
I'm using caps right now, they work: it shuts down all resources attached to the billing account if it goes over. There are many levels of alerts before it hits that though.
It's at the "total spend" of a billing account level though, and obviously you'd have to be a billing administrator, so working with it is awkward; many billing accounts across disparate projects is basically the only way.
> My understanding is that this is a difficult problem to solve "perfectly" due to lag between incurring a cost and recording the cost.
It's impossible to solve perfectly with tech.
Amazon saying "If you put a number in this form we won't charge you more than that, but your account will be limited by <list of limitations> and if you go over those limits then <long list of conditions that will apply if you go over the limits up to and including removing you from the platform if AWS think it was fraudulent>" is a perfect solution.
The only reason why there would need to be a perfect tech solution is if AWS are concerned about giving people a small amount of service that they're not able to charge for. AWS clearly believe protecting themselves from overages is more important than giving customers peace of mind that they're not going to be hit with a huge bill. That's a reasonable position for a business to take, and they're completely free to do it, but you can't also argue that customers are wrong to avoid deploying to AWS because they're scared of a surprise bill.
This argument ("it's not implemented because it's too difficult to do properly") was debated ad nauseam in the past. Well, look how others implemented it and you will see how to do it.
A simple example: S3. Currently, what happens when you exceed your credit card limit is that they send you an email saying they weren't able to charge you, but they continue to provide the service for the next two months, and during that time everything works fine: you can access your buckets and you are charged for storage and transfer.
Now, what would happen with hard caps implemented? You didn't pay, so you're locked out of your account. Nobody can access your S3 objects, including yourself. If you care enough about them, you need to make a payment and settle your account. If you don't do it within one month, the whole content will be deleted.
But does it need to be solved perfectly? Change my mind, but I don't believe anybody, garage nerd or small startup, would be affected if they overshot their planned costs by 10%. I really need a solution when my bill gets 100x bigger by (my) mistake. And I'm convinced AWS could so easily solve this - if they cared to.
There ought to be an "emergency shutoff" threshold, period. And there's just no customer-centric excuse for not implementing it after this many years.
Here's how to implement it:
"Amazon, what do you do today if my credit card fails and all the retries fail?"
Do THAT if billing hits <my emergency off switch threshold>.
Will it disrupt the heck out of all my AWS services? Of course. That's the point, if something went so seriously wrong that my billing hits an absurd level that will put me out of business, I'd rather have downtime.
At one point, I owed a balance of $0.57 to AWS and started to get warning emails about my account being suspended. Just out of morbid curiosity, I waited to see what would happen.
2.5 years later, after dozens of automated mails, they finally suspended it.
Just offer to switch existing usage to flat rate services instead of consumption based services. This isn’t rocket science. We know how to control costs.
There are no billing limits, but there are resource limits set by AWS upfront. I had to create many support cases to raise this or that limit (for example we had a limit of 100 concurrent Lambda workers, then 2.5k if I remember correctly). Some number of active EC2 instances, some total TB of storage etc. We were hitting those limits pretty frequently despite spending over £50k per month with them (mostly dev and test services).
I'm pretty sure that AWS' service quotas exist more as a guardrail to prevent customers from accidentally spinning up 1000 instances instead of 10, which would not only leave you with an eye-watering bill, but affect resource availability for other customers.
They're usually quite happy to increase the quota if you contact support.
Most limits exist within what AWS has determined is 'normal' usage. Once you pass that, you can request a service limit increase.
Service limit increases are typically only denied when raising the limit would negatively impact the availability of the service (noisy neighbor issues, for example), or if the customer is needing a limit increase because they're trying to use the service in a way it wasn't designed.
i looked into this a year ago and there was no such thing available (there is https://docs.microsoft.com/en-us/azure/cost-management-billi... but that is not really a spending limit, it's more like in some cases you get credits from microsoft and when you have spent all the credits they stop things for you). i mean a system where you pay monthly what you consumed, and you can set a limit, and the provider guarantees that you do not have to pay more than the limit.
but maybe i overlooked something, so if you know more about this, please tell.
https://www.theregister.com/2020/12/10/google_cloud_over_run...
Discussed here: https://news.ycombinator.com/item?id=25398148