It's really not that complicated... to watch your own spend. But yet everyone here keeps running into issues, and that's just with your own projects. I'm sure you can at least appreciate the complexities involved at the scale of AWS where even the minority use-cases matter.
"Everything except for persistent storage" is nowhere near useful enough to work and can cause catastrophic losses. Wipe local disks? What about bandwidth? Shutdown Cloudfront and Lambda? What about queues and SNS topics? What about costs that are inseparable from storage like Kinesis, Redshift, and RDS? Delete all those too? And as I said before, what happens if you set a budget and AWS takes your service down which affects your customers?
It's easy to say it's simple in an HN comment. It's entirely different when you need to implement it at massive scale and that's before even talking about legal and accounting issues. There's a reason why AWS doesn't offer it.
Just shut down everything, but don't delete existing data written to disks. That can cover a wide array of budget problems. If you set a budget like that you really do not want to go over it and any potential loss from customers is not as huge as going over that budget. At least have that option.
I sometimes for example fiddle with Google APIs. I do not even have customers so don't really care if things will stop working, but I have accidentally spent 100 euros or more. I have alerts, but those alerts arrived way too late.
I make a loop mistake in my code and now I suddenly owe 100 euros...
> "Just shut down everything, but don't delete existing data written to disks."
I literally just explained why this doesn't work with AWS services. You will have data loss.
And it creates a whole new class of mistakes. If people mistakenly overspend then they'll mistakenly delete their resources too. All these complaints that AWS should cover their billing will then be multiplied by complaints that AWS should recover their infrastructure. No cloud vendor wants that liability.
It's not an unreasonable use case to just nuke everything if your spend exceeds some level. (I'm just playing around and want to set some minimal budget.) But, yes, implement that and you will see a post on here at some point about how my startup had a usage spike/we made a simple mistake and AWS wiped out everything so we had to close up shop.
ADDED: A lot of people seem to think it's a simple matter of a spending limit. Which implies that a cloud provider can easily decide:
1.) How badly you care about not exceeding a spending threshold at all
2.) How much you care about persistent storage and services directly related to persistent storage
3.) What is reasonable from a user's perspective to simply shutdown on short notice
Don't let the perfect be the enemy of the good. In so many use cases, shutting off everything except storage would do a good job. And the cloud provider doesn't have to decide anything. It's a simple matter of setting a spending limit with specified semantics. A magic "do what I want" spending limit is not necessary.
> "shutting off everything except storage would do a good job"
Except it wouldn't. This is the 3rd time in this thread explaining that. Edge cases matter, especially when creating leading to new mistakes like setting a budget and deleting data or shutting off service when customers need it most.
If it's not a hard budget but a complex set of rules to disable services... then you already have that today. Use the alarms and APIs to turn off what you don't need.
Edge cases are the difference between good job and perfect job. It makes no sense to use edge cases to say it qualifies as neither.
> If it's not a hard budget but a complex set of rules to disable services... then you already have that today. Use the alarms and APIs to turn off what you don't need.
I have been describing a simple set of rules, not a complex one.
It used to be extremely difficult to get accurate usage data on all their services. Has that been fixed? If not, then the alarms aren't good enough. If the alarms can automate enough right now, in a non-buggy way, then that should be the answer to people "hey, the alarms do more than alarm, use them to trigger shutdowns". Don't say "it can't be done, sorry". If the alarms aren't good enough for that automation, then the argument stands.
And using the APIs means that each company that wants safety is duplicating effort in an almost untested way, a recipe for so many bugs it makes the problem worse. No, this needs to be a feature of AWS itself.
"Everything except for persistent storage" is nowhere near useful enough to work and can cause catastrophic losses. Wipe local disks? What about bandwidth? Shutdown Cloudfront and Lambda? What about queues and SNS topics? What about costs that are inseparable from storage like Kinesis, Redshift, and RDS? Delete all those too? And as I said before, what happens if you set a budget and AWS takes your service down which affects your customers?
It's easy to say it's simple in an HN comment. It's entirely different when you need to implement it at massive scale and that's before even talking about legal and accounting issues. There's a reason why AWS doesn't offer it.