The lesson for me was don't trust your internally-hacked-together instance management system. The AWS interface to storage and instances is the base truth. And perhaps more importantly - I'm never getting into another startup which has financial risk like that without being a core expert in that risk/tech. I was focused on the business + client code - and had very little clue about the nitty-gritty of AWS. I should have been more involved with the code on that side, or at least the data-flow architecture.
If you don't want to pay for PD, you can patch together any number of ways to get your phone to scream and holler when it gets an email from firstname.lastname@example.org. It's also good to have clear expectations as to whose responsibility it is to deal with problem x between the hours of y and z and exactly what they are supposed to do.
Keep the alerts restricted to the really important stuff, because if your team becomes overloaded with useless alerts they will 1) dislike you and 2) be more prone to accidentally mistaking a five alarm fire for a burnt casserole.
There are more complex systems you could build, but that's a start.
I remember PagerDuty was advertising (a lot) on Leo Laporte's podcasts a few years back.
A clause in the contract: if monthly bill reaches $Xk amount then:
(a) seek written approval by client, and
(b) continue only until $Yk, or until approval is given with a new ceiling price.
I imagine AWS would have 0 problems suspending all my services if I can't pay, so why can't it do the same thing when it reaches my arbitrary cap?
This may be something that is 'unstated', but unless you actually had access to fix something that was wrong, being an expert in that wouldn't really help all that much. I've been in situations where I had explicit/expert knowledge of XYZ, but when the people responsible for XYZ do not take your input and/or don't give you the ability to fix a problem, expert knowledge is useless (or worse, it's like having to watch a train wreck happen when you know you could have stopped it).
In theory ;), you shouldn't have to be a core expert in everything. But yeah... in the real world, things aren't so cut and dry. :/
Easier for both sides to just ask AWS for a refund if there's a reasonable case.
This wouldn't be an issue if it was configurable.
Would you rather make a mistake leading to a big bill with the possibility of a refund or set your max budget and have your resources permanently deleted?
EC2 instances, EBS volumes, S3 data... should AWS delete those when you hit your budget? How do you stop the billing otherwise?
With prioritisation, so the non-steady state services are stopped/killed with plenty of time to leave the needed foundations still running. :)
2) If it's a soft budget then it's no different than the alarms you already have.
3) If you want to stop it before it hits the budget, then you're asking for a forecasted model with a non-deterministic point in time where things will be shutdown.
This just leads to neverending complexity and AWS doesn't want this liability. That's why they provide billing alarms and APIs so you can control what you spend.
Not if I'm busy, or away from work, or asleep. There is a massive difference between getting an alarm (which is probably delayed because AWS is so bad at reporting spent money) versus having low priority servers immediately cut.
Even without a priority system, shutting down all active servers would be a huge improvement over just a warning in many situations.
You want it to selectively turn off only EC2? Does it matter which instance and in which order? What if you're not running EC2 and it's other services? Is there a global priority list of all AWS services? Is it ranked by what's costing you the most? Do you want to maintain your own priority of services?
And what if the budget was a mistake and now you lost customers because your service went down? Do you still blame AWS for that? Or would you rather have the extra bill?
There is no easy solution.
"Everything except for persistent storage" is nowhere near useful enough to work and can cause catastrophic losses. Wipe local disks? What about bandwidth? Shutdown Cloudfront and Lambda? What about queues and SNS topics? What about costs that are inseparable from storage like Kinesis, Redshift, and RDS? Delete all those too? And as I said before, what happens if you set a budget and AWS takes your service down which affects your customers?
It's easy to say it's simple in an HN comment. It's entirely different when you need to implement it at massive scale and that's before even talking about legal and accounting issues. There's a reason why AWS doesn't offer it.
For example, I sometimes fiddle with Google APIs. I don't even have customers, so I don't really care if things stop working, but I have still accidentally spent 100 euros or more. I have alerts, but those alerts arrived way too late.
I make a loop mistake in my code and now I suddenly owe 100 euros...
I literally just explained why this doesn't work with AWS services. You will have data loss.
And it creates a whole new class of mistakes. If people mistakenly overspend then they'll mistakenly delete their resources too. All these complaints that AWS should cover their billing will then be multiplied by complaints that AWS should recover their infrastructure. No cloud vendor wants that liability.
ADDED: A lot of people seem to think it's a simple matter of a spending limit. Which implies that a cloud provider can easily decide:
1.) How badly you care about not exceeding a spending threshold at all
2.) How much you care about persistent storage and services directly related to persistent storage
3.) What is reasonable from a user's perspective to simply shut down on short notice
Not so simple.
That's not a hard cap, since turning off services isn't instant and costs continue to accrue. But, yes, there are ways to mitigate the risk of uncapped costs and they can be automated.
AWS would rather lose some billings than deal with the fallout of losing data or critical service for customers (and in turn their customers).
In theory I could build something using budget alarms, APIs, and IAM permissions to make sure everything gets shut down if a developer exceeds their budget, but if I made a mistake it could end up being very expensive. Not that I don't trust developers at my company to use such an account responsibly, but it is very easy to accidentally spend a lot of money on AWS, especially if you aren't an expert in it.
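For example, the AWS Budgets API can create a per-account budget with a notification, which you could then wire to whatever shutdown automation you trust. A rough sketch with boto3; the account ID, budget amount and SNS topic ARN are placeholders:

    import boto3

    budgets = boto3.client("budgets")

    # Hypothetical monthly cap for a sandbox account; notify an SNS topic at 80%.
    budgets.create_budget(
        AccountId="123456789012",
        Budget={
            "BudgetName": "dev-sandbox-monthly",
            "BudgetType": "COST",
            "TimeUnit": "MONTHLY",
            "BudgetLimit": {"Amount": "200", "Unit": "USD"},
        },
        NotificationsWithSubscribers=[{
            "Notification": {
                "NotificationType": "ACTUAL",
                "ComparisonOperator": "GREATER_THAN",
                "Threshold": 80,               # percent of the budget
                "ThresholdType": "PERCENTAGE",
            },
            "Subscribers": [{
                "SubscriptionType": "SNS",
                "Address": "arn:aws:sns:us-east-1:123456789012:budget-alerts",
            }],
        }],
    )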
It's impossible for AWS to know how to handle hard caps because there are too many ways to alter what's running and it's too contextual to your business at that moment. That's why they give you tools and calculators and pricing tables so that it's your responsibility (or a potential startup opportunity).
Money is easy to deal with. Alarms work. Bills can be negotiated. But you can't get back lost data, lost service, or lost customers.
Right. In my experience, if you don't understand what's going on beneath your abstractions, you're always in for a world of hurt as soon as something goes sideways.
They have a good track record of cancelling huge bills the first time they happen
Scenario 1: Amazon will ask for the payment (if using a CC); the bank will respond that there are no funds in the account; Amazon then deals with the company directly, not with the bank, eventually getting a payment order from the court. If the company went bankrupt in the meantime, Amazon might not get their money.
Scenario 2: Amazon will send the invoice; the invoice will not get paid. After the due date, Amazon will contact the company directly; the bank doesn't even enter the picture until a collection order comes from the court. If the company went bankrupt in the meantime, Amazon might not get their money.
There's no scenario where some hypothetical loan would go straight to Amazon, unless Amazon holds some instrument that instructs the bank to pay them, something like a bank guarantee or a promissory note, and uses it before bankruptcy is declared.
So for Amazon to drain the loaned money, the company would have to transfer it to a normal account and pay with a debit card paired to that account, with no limit set.
It is not wise to transfer it to a normal account anyway; you pay interest on the balance of the loan account, so if you move the money to your normal account, you are paying interest on money that is just sitting there.
That's why you don't pay large sums with a CC, but with invoice + bank transfer, and you have a limit set on your cards when you do.
- control: you are in control of when you make the payment. You can plan your cash flow.
- additional advantages: you also get payment terms, and some vendors offer discounts for earlier payment; if your cash flow can handle it, why would you give that up?
- liability: with a CC, you are extending credit that is drawn at the other party's leisure. It is you who is liable for that credit line, even if the other party made a mistake. You are always liable to the bank, never to the vendor. With bank transfers, every single payment was authorized by you (where by 'you' I mean an authorized person at your company) and the liability is towards the vendor, who is not likely to have such a strong position (see Porter's five forces).
- leverage: if the other party makes a mistake, they have motivation to correct it. Every company in existence has received invoices that are incorrect. Withholding payment until they are corrected is a strong motivator. Without that, you could be left without invoices that can be put into accounting AND without the money that you have to account for.
- setting up processes: when you grow beyond a certain size, you are going to want to formalize procurement, accounts payable and treasury. Having purchasing and payment discipline compatible with that already in place will mean less pain from the growth, fewer things to change.
When we need people in the field purchasing small supplies, we don't want them handling cash, so they get debit (not credit) cards with relatively small limits. It is enough for them to get by, but not enough to do any damage of significance. (The exception is fuel, and that's what fuel cards are for - it has the form factor of a credit or debit card, but works only for fuel, is paired to a license plate, and the vendor sends an invoice at the end of the month.)
Another scenario where CCs are useful is when you need to pay for something right now and can't or don't want to wait for the order -> delivery + invoice -> payment cycle. That's fine for consumer impulse purchases, but it should not be the normal way for company purchases.
Of course, when you start a new business relationship, some companies will not trust that you are going to pay the invoice; sending an advance invoice and paying it is fine. In practice, it is quite a rare occurrence.
The real world is filled with barbershops, daycares, bars, clinics, PVC manufacturers etc
None of them get VC money.
When they need money, they go to a bank and usually have to place a PG in order to get funds.
Tech startups have it easy. It's all equity. You are not pledging your lifetime earnings on a business idea.
Once tech startups lose their upside potential (prob not anytime soon if ever), you will be sitting with the regular folk, those that pledge their skin and life to their business.
Either way it might be nice to keep your options open, depending on your plans.
I know several other companies that had expensive mistakes without refunds. There's probably a complex decision tree for these issues and I doubt anyone really knows outside of AWS.
Really? When I was working in Southern California a few years ago, refund requests were refused ALL THE TIME. This is why there's a common belief that you simply owe whatever you are charged, period.
It may be more progressive now, but let's not be revisionist.
Being an entitled jerk who blames other people for your own negligence is bad, and you shouldn't be one. But openly giving companies the opportunity to be kind (while admitting that it was entirely your fault) potentially helps both them and you.
That might make sense for some particular services (e.g., capping the cost on active EC2 instances), but lots of AWS costs are data storage costs, and you probably don't want all your data deleted because you ran too many EC2 instances and hit your budget cap.
Where exactly you are willing to shut off to avoid excess spend and what you don't want to sacrifice automatically varies from customer to customer, so there's no good one-size-fits-all automated solution.
It's the very very first thing I set when setting up my GCloud hobby project. I was like, this is fun and all, but I don't care about this enough so I limited it to $3 per day and $50 per month. If it goes above, I'm very happy to let it die, and it also gives me a warning so I know something is up. The 2 times it triggered, there was something I managed to fix so the tool is still up and running costing pennies.
The account I did it on was tied to my "junk" email, so I didn't catch Amazon banging on my door saying my payment info needed to be updated. Well, until I did happen upon one of the emails. Nearly had a heart attack.
Talked to AWS support and they fully refunded me. Very very kind of them, but now I'm terrified to touch anything AWS.
I also don't understand why everyone is assuming
"if I hit threshold X do A, if I hit threshold Y do B" where A and B are some combination of shutting down and deleting resources,
is as difficult as solving an NP-complete problem.
Greed, I'm assuming.
The problem with billing is that often these charges are not calculated instantly, and others are not trivial to deal with. For example what happens if you go over budget on bandwidth or bucket storage, but still within quota? What do you kill? Do you immediately shut down everything? Do you lose data? There are lots of edge cases.
You can normally write your own hooks to monitor billing alerts and take action appropriately.
In my experience AWS has very stringent limits on the number of active instances of each type (starting around 10 for new accounts, 2 for the more expensive instances). It takes a ticket to support and then days of waiting to raise these limits.
That should have prevented your company from creating tens of instances, let alone hundreds, unless that's already your typical daily usage.
Does AWS update the billing console per day or upon request? I get charged per month, but I should add a habit in my habit tracker to learn more about my expenses...
It's horrifying how many places treat writing tests for services as critical, but then completely fail to write tests for their operational tooling. Including tools responsible for scaling up and down infrastructure, deleting objects etc.
Now I have learned, _always_ refresh the page and instance list prior to shutting anything down and _always_ confirm the shutdown was successful.
I set the lifecycle rule on all objects in the bucket, to transition as soon as possible (24 hours).
About 2 days later first thing in the morning I get a bunch of frantic messages from my manager that whatever script I was running, please stop it, before I'd even done anything for the day.
The lifecycle rule had taken effect near the end of the previous day, and he was just getting all the billing alerts from overnight, it was all done.
I read about glacier pricing, but didn't realize there was a lifecycle transfer fee per 1000 objects (I forget the exact price, maybe $0.05 per 1000 objects). That section was a lot further down the pricing page.
The bucket contained over 700 million small files.
I'd just blown $42,000.
That was over a month's AWS budget for us, in the end AWS gave us 10% back.
On the plus side, I didn't get in too much trouble, and given we'd break even in 4 years on S3 costs, upper management was gracious enough to see it as an unplanned investment.
TLDR: My company spent 42k for me to learn to read to the bottom of every AWS pricing page.
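For anyone doing the mental math, a quick sanity check, assuming the transition request fee really was around $0.05 per 1,000 objects:

    objects = 700_000_000
    fee_per_thousand = 0.05                    # rough S3 -> Glacier transition request fee
    print(objects / 1000 * fee_per_thousand)   # 35000.0, i.e. the same ballpark as the $42k bill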
When there is a lot of money involved, people self-select into your company who view their jobs as basically to extract as much money as possible. This is especially true at the higher rungs. VP of marketing? Nope, professional money extractor. VP of engineering? Nope, professional money extractor too. You might think -- don't hire them. You can't! It doesn't matter how good the founders are, these people have spent their entire lifetimes perfecting their veneer. At that level they're the best in the world at it. Doesn't matter how good the founders are, they'll self-select some of these people who will slip past their psychology. You might think -- fire them. Not so easy! They're good at embedding themselves into the org, they're good at slipping past the founders' radars, and they're high up so half their job is recruiting. They'll have dozens of cronies running around your company within a month or two.
I'm guessing something like the dynamic described here was involved.
The silver lining here may be that he outed himself (literally) before he was able to build an empire of such incompetence.
My company is service-based and just over 1000 people. Timesheets equal billable hours. It's occasionally very pressurized and we lose people pretty quickly when there's a lull in work, but it also means that useless people have absolutely nowhere to hide.
But with a fire-fast approach, it sounds like your company can move fast on hires and be ready to contain the damage.
My own thoughts about this:
Disclaimer: Not a founder myself, but have observed one at close range.
I have been bitten by colleagues and it still hurts, because they weren't that great with IT.
I'd rather show off what I can do and what I need to work on than rely on somebody else. (Again, I have been bitten by that.)
You know, it happens to everyone, however good or experienced; what matters for a company's (and an individual's) sake is how we respond to mistakes.
You guys responded well, that was resilient. The next step would maybe be antifragility. Did something change afterwards, because of this bad experience?
You should have a "global" CloudTrail turned on in all your AWS accounts, with the integrity checksumming turned on, either feeding directly to an S3 bucket in yet another account that you don't give anybody access to, or at least feeding to a bucket that has replication set up to a bucket in another locked-down account.
The CloudWatch Events console can find some CloudTrail events for you, but you might have to set up Athena or something to dig through every event.
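If it helps, here is a minimal sketch of that trail setup with boto3 (names are placeholders, and the target bucket needs a bucket policy that allows CloudTrail to write to it):

    import boto3

    cloudtrail = boto3.client("cloudtrail")

    # Multi-region trail with log file validation (the integrity checksumming
    # mentioned above), delivered to a locked-down bucket in another account.
    cloudtrail.create_trail(
        Name="org-audit-trail",
        S3BucketName="locked-down-audit-logs",
        IsMultiRegionTrail=True,
        EnableLogFileValidation=True,
    )
    cloudtrail.start_logging(Name="org-audit-trail")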
I would rather my whole system shut down and be unusable while I investigate vs. auto-scale and charge me a bill I can't cover.
However, searching around it seems like I can only get alerts when a $$$ threshold is passed, but AWS won't take any action to stop computing or anything. Please prove me wrong.
The counterargument is that you get a usage spike (which is often a good thing for a company), and AWS shuts down everything connected to your AWS account without warning.
I'm not necessarily sure that optional/non-default hard circuit breakers would be a bad thing. But it certainly appears not to be a heavily demanded customer feature and, honestly, if it's not the default--which it shouldn't be--I wonder how many customers, or at least customers the cloud providers really care about, would use them.
Nearly every customer (i.e. all of them with a budget) would make use of circuit breakers and it would make Amazon absolutely $0 while costing them untold amounts. Are you really surprised Amazon hasn’t implemented them?
For example, Vultr can give you a "bare metal" 8 vCPU / 32 GB box for $120 a month (not sure if this is contract or on-demand) vs Amazon's m5.2xlarge for $205 reserved. $85 might not sound like much, but that's 70% more. Who wouldn't love to save ~42% on their cloud costs?
It becomes harmful to them though. At a certain point people feel the hit and avoid the service. Having people spend a little more accidentally and go ‘oh well, oops’ is the sweet spot. An unexpected $80k which kills the company is bad for everyone.
The same could be done with cloud doo-dads.
Sadly, some services take as long as 24 hours to report billing.
Sure, but most large companies (the kind which AWS gets a lot more revenue from and cares about a lot more) want the exact opposite. Most large companies have the extra cash to spend in the case that a runaway auto-scale was in error, but on the other hand, completely shutting down operations whenever a traffic spike happens could result in millions of lost revenue.
>However, searching around it seems like I can only get alerts when a $$$ threshold is passed, but AWS won't take any action to stop computing or anything. Please prove me wrong.
The general advice is to use the various kinds of usage alerts (billing alerts, forecasts, etc) to trigger Lambda functions that shut down your instances as you desire. It takes a little configuration on your part, but again, AWS intentionally errs on the side of not automatically turning off your instances unless you specifically tell it to.
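A minimal sketch of such a Lambda, assuming you only care about EC2 in one region and are happy with a blunt "stop everything" policy (you would still want to think about ASGs relaunching instances, termination protection, and so on):

    import boto3

    def handler(event, context):
        """Invoked via SNS when a billing alarm fires; stop every running instance."""
        ec2 = boto3.client("ec2")
        instance_ids = []
        paginator = ec2.get_paginator("describe_instances")
        for page in paginator.paginate(
            Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
        ):
            for reservation in page["Reservations"]:
                instance_ids += [i["InstanceId"] for i in reservation["Instances"]]
        if instance_ids:
            ec2.stop_instances(InstanceIds=instance_ids)
        return {"stopped": instance_ids}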
It does not have to be all or nothing. You could for example setup separate account per department and/or purpose and impose hard cap on spending for experimentation, but not on production.
Great companies find ways to help their customers thrive.
People make mistakes with transferring money in the millions of dollars all the time, and it's not uncommon for people to be just like "oops, back that out". It's obviously going to be in the news when that doesn't happen though.
What some people are asking for--and it's a reasonable use case but one that AWS, somewhat understandably, isn't really focused on--is: "Burn it all down if you have to, including deleting databases, but no way no how charge me a penny more than $100/month (or whatever my set limit is)."
That means any data getting lost as a result of that limit is data that they weren't guaranteeing in the first place. You might not be able to actually read your EBS volumes, S3 buckets or Aurora tables without increasing the spending limit or otherwise committing more funds, but it won't go away that second, and you would have enough time to fix it (worst case - wait until next month; you did budget that already).
Alternatively: assign each resource to a pool, and monthly spending limits to each pool. Give your EBS/S3 $1000/month, and your R&D-pet-project-that-may-accidentally-spawn-a-billion-machines $50/month.
And, as you say, it makes sense to have a pool--whether it's your whole AWS account or not--where you can turn on the "burn it all down" option if you want to.
For example, you can easily shut down the ability to launch anything in a particular region - but suppose you specifically want to exceed a default limit: you can speak with your rep and have any of your limits set to whatever you want.
The proposed solution is usually to set up billing alerts so you can detect the issue early and fix the problem in a way that makes sense.
I'd suggest further: new AWS (azure, etc) accounts should have billing alarms at $0, $10, $100, $1000, etc. by default on account creation. Users can delete if they don't want them or want something different. Getting an alert at $100 as it happens instead of getting a >$1k bill at the end of the month is a much better customer experience.
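You can already create those alarms yourself today; a hedged sketch with boto3 (the SNS topic ARN is a placeholder, billing alerts have to be enabled in the account's billing preferences first, and the billing metrics only live in us-east-1):

    import boto3

    cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

    for threshold in (0, 10, 100, 1000):
        cloudwatch.put_metric_alarm(
            AlarmName=f"estimated-charges-over-{threshold}-usd",
            Namespace="AWS/Billing",
            MetricName="EstimatedCharges",
            Dimensions=[{"Name": "Currency", "Value": "USD"}],
            Statistic="Maximum",
            Period=21600,                      # the metric only updates every few hours
            EvaluationPeriods=1,
            Threshold=threshold,
            ComparisonOperator="GreaterThanThreshold",
            AlarmActions=["arn:aws:sns:us-east-1:123456789012:billing-alerts"],
        )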
Depending on what you're doing, data loss might not be nearly as big of a threat as a massive bill.
I could also imagine it being configurable on a service by service basis to mitigate against the data loss downside - e.g. maybe you have a hard cap on your lambdas but not your database snapshots.
Someone could build a cost circuit-breaker Lambda function fairly easily: wire a billing alert to the Lambda, use the AWS API to terminate all EC2 instances (or other resources). Someone could open source that fairly easily.
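The wiring is a handful of API calls; a sketch, assuming an alarm like the ones discussed above and an already-deployed Lambda (the function ARN is a placeholder):

    import boto3

    sns = boto3.client("sns", region_name="us-east-1")
    lambda_client = boto3.client("lambda", region_name="us-east-1")

    topic_arn = sns.create_topic(Name="billing-circuit-breaker")["TopicArn"]
    function_arn = "arn:aws:lambda:us-east-1:123456789012:function:stop-all-ec2"

    # Allow SNS to invoke the function, then subscribe the function to the topic.
    lambda_client.add_permission(
        FunctionName=function_arn,
        StatementId="allow-billing-sns",
        Action="lambda:InvokeFunction",
        Principal="sns.amazonaws.com",
        SourceArn=topic_arn,
    )
    sns.subscribe(TopicArn=topic_arn, Protocol="lambda", Endpoint=function_arn)

Point the billing alarm's AlarmActions at that topic and you have a crude circuit breaker.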
I think it's about reasonable defaults. You can recommend that customers configure their accounts with the behaviors they choose, but clearly many aren't using the tools they currently have. So we have these horror stories.
I will mention that I don't think deleting resources is a good default behavior as that can cause data loss (itself a potentially business ending event). But people are certainly able to evaluate their own risk and can implement that with the tools that exist today.
There may be a maximum threshold for this, though.
We also asked them to summarize what happened so we can think about how to help other users avoid making the same mistake in the future.
I had a personal project, where I wanted to occasionally do short but highly parallel jobs. Once my scripts didn't close everything down correctly and a week later I had spent £600. That's a lot of money to me personally. I asked politely and it was never refunded.
So we switched to Google Cloud, which has a better UI for telling you how much you're about to spend. As we grew, we ended up spending way more money on GCP than we ever did on AWS.
yeah I know I shouldn't +1 here but the upvote just didn't cut it.
- Not stopping AWS Glue with an insane amount of DPUs attached when not using it. Quote "I don't know I just attached as many as possible so it would go faster when I needed".
- Bad queueing of jobs that got deployed to production before the end of month reports came out. Ticket quote "sometimes jobs freeze and stall, can kick up 2 retry instances then fails", not a big problem in the middle of the month when there was only a job once a week. End of the month comes along and 100+ users auto upload data to be processed, spinning up 100+ c5.12xlarge instances which should finish in ~2 mins but hang overnight and spin up 2 retry instances each
- Bad queueing of data transfer (I am sensing a queueing problem) that led to high db usage so autoscaling r5.24xlarge (one big db for everything) to the limit of 40 instances
I had a new employee dev include AWS creds in his github which was pulled instantly by hacker bots that launched a SHITTON of instances globally and cost $100K in a matter of hours...
It took my team like 6 hours to get it all under control... but AWS dropped all charges and we didn't have to pay any hosting costs for it....
So why didn't you work with AWS to kill such charges?
We've heard the horror stories from the Google data egress pricing "surprises" (like the one the GPT adventure game guy incurred a few months ago https://news.ycombinator.com/item?id=21739879).
We've heard the AWS and Azure horror stories.
It seems crazy that the only hope of correcting a mistaken overspend is a helpful support desk. The first one is free, right?
At least AWS does have such a support desk; Azure may have one, and with GCP you are better off just shutting down the company.
How about lesser providers such as Digital Ocean?
Let's say your code mistakenly provisions 1000 droplets instead of 100. Is this a scenario you can prevent at an admin level?
If their reputation isn’t enough to keep you from depending on GCP, their hiring of a lot of people from Oracle to beef up their sales staff should be a major warning.
That was one of the new Oracle execs firing the entire Seattle-based cloud marketing team.
The reason for firing those people was solely to open up headcount to get more salespeople. The marketing team was not happy, for obvious reasons, and company PR worked pretty hard to spin this one. This is not how Googlers expect the company to work. But it is exactly what Oracle refugees expect.
My take: GOOG is in for some Ballmer doldrum years of its own. They've well and truly arrived for the employees, but Wall Street hasn't quite figured it out yet.
Here is my referral code for the one I use:
(I previously asked Dan, the mod here, if I can share in this way and he said it's okay. I don't have other affiliation with that company and have found it good.)
Claiming an unfounded dispute or transferring funds to a new company is fraud, and you'll probably end up with both AWS and your bank coming after you for collections. With $80k on the line, it's enough to file legal claims.
The best plan is to negotiate directly with AWS and ask for forgiveness and a payment plan. Do not try to run away from your financial obligations or you will make it far worse.
EDIT: you've rewritten your comment so this doesn't apply, but please don't recommend that people avoid their debts.
Yes, in civil court. So no, don't pay it. This isn't London in 1689. Debtors' prisons do not exist.
"Claiming an unfounded dispute or transferring funds to a new company is fraud"
Explaining to the credit card company you were tricked or confused in this purchase is not fraud.
2) AWS did not take advantage of you and making a mistake does not absolve you of responsibility. There's nothing to dispute.
3) Bankruptcy is allowed, and is also exactly what happened. You stated other things like filing fake disputes and transferring funds to a new company, which is fraud. And that does come with criminal charges.
EDIT: to your completely revised comment - Bills are still owed, even if it's only civil, and judgements can result in wages, taxes and other assets being garnished. Saying you were "tricked or confused" when you weren't is fraudulent, and credit card companies are not going to defend you from that. Unless AWS forced those charges or failed to deliver services, there's no dispute.
How do you know? That is what a court system is for.
"like filing fake disputes and transferring funds to a new company"
Ah the old straw man. Nope. I didn't say file fake disputes.
"please don't recommend running away from debts"
Having the ability to not pay debts is the entire point of Limited Liability Companies. People, out of human dignity, should have the right to NOT pay debts. Please don't recommend paying whatever a creditor demands.
Since you're revising your comments, there's no point to further discussion but please don't recommend running away from debts. That's not going to end well.
Familiar with The Merchant of Venice?
That's very strange.
HN allows people to respond to things an hour or two after they are posted. I don't think that is extreme behavior and we've both done it in this thread.
By the time you said this, there were already 3 post/reply loops between that person and myself. I don't see the purpose of telling me to ignore them, especially when you don't have the original context of their edited comments and are downstream of the conversation that already happened on the exact topic you say should be ignored.
Filing a dispute when you knowingly made a mistake is a bad move, and your bank will quickly figure this out when AWS provides the billing statement, API logs and signed TOS. You're going to have a very tough time if you try to litigate this in court.
Debts (or at least payment plans) can be negotiated. Disputing to weasel out of them will only make things worse. A little communication can go a long way.
If cryptocurrency and smart contracts make sense to you, you might not be aware that forgiveness for human error really does happen in normal business.
What's wrong with this approach? It's not like they can collect on you personally, or go after the new company. (I wonder how they would even figure out what legal entity is behind the new company/website.)
I don’t think those apply here. If by sheer accident you were hit with a giant AWS bill, and you were facing potentially having to shut down your company, and you conducted the maneuver that I described, what’s wrong with it? Your company was facing a life-or-death situation, and decided to be reborn.
Maybe there needs to be a form of corporate bankruptcy where the company can retain its core/key IP assets...
That is not a safe assumption to make, especially if you are deliberately (AKA fraudulently) dodging debts (IANAL).
Kind of makes me annoyed. I'm sure enterprises either don't care or actively want it unlimited. But solo practitioners and people new to the platform would love a default e.g. $5K/month limit (or less).
Feels like these services just want people to "gotcha" into spending a bunch of money without simple safety nets.
PS - No, alerts do not accomplish the same thing, by the time you get the alert you could have spent tens of thousands.
I strongly suspect opaque pricing and high/nonexistent limits are more about getting large organizations to transition to the cloud seamlessly (i.e. not completely caring/realizing what they're getting into for any particular migration/deployment).
Tricking personal users into spending thousands by accident probably doesn't net much money compared to enterprise spend and runs the risk of alienating people who then can go into work and recommend against using a particular platform, having been burned by it on their personal accounts.
As a counter-datapoint, we accidentally left a Redshift cluster up idling for two weeks before we started getting alerts, and after numerous attempts have failed to get compensated in any way. The reasoning was that, well, it was what we requested and they had to allocate compute power to it (which we didn’t use).
All in all a very frustrating experience and it makes me fairly cynical of all these “I got my money back without problems!” comments.
(For what it’s worth, it was about $4k of costs which was a lot for us at the time)
Isn't AWS supposed to focus on the virtuous cycle of saving customers money (or at least reducing AWS support's need to write off customer mistakes)?
AWS is a large organization; I believe this type of stuff highly depends upon your “entrance” into the organization, i.e. the account manager. We were probably just unlucky with our Redshift troubles, but it did eventually trigger a move to Google Cloud / Bigquery, as the pay-as-you-go method seemed a bit safer (although it’s still too difficult imho to accurately estimate the costs of queries).
nobody wants to work on the billing code because it’s a mess and the penalty for a mistake is very high.
When I got charged an extremely large amount, I was contacted by customer support. They explained to me what happened (I had leaked a SECRET in my repo) and I then got refunded the total amount.
It was quite an anecdotal experience, because I wasn't expecting any of it, as it was my mistake.
Someone had sent 70,000 emails (which is the default daily limit at my tier). Luckily only cost ~$8.
The access keys would be in your user directory, and all of the SDKs would know how to find them. When running on AWS, the SDKs get the credentials from the instance metadata.
* Firebase Hosting with Firestore
* Cloudflare Workers Sites (using KV)
* Netlify (possibly w/ FaunaDB)
Done. For 98% of hobbyist projects, a single Vultr $5/month node is probably far more than enough. For 99%, three Vultr $10/month instances (web, DB, cache) is probably enough.
So add an interface that will let you specify that somehow for common scenarios? There must be something better than zero help they can offer. Not everyone needs something that can autoscale to Google levels.
I've actually had my payments on my personal account bounce once or twice and no, they did not.
This by definition is deploying something you don't fully understand. If there's a problem in any of those templates you won't know. You won't really know if they even do what they say they do.
Using one to do something as important as this would be crazy.
Enterprise companies do not want infinite billing. They want fixed and reliable billing, more than anything else. With on-prem equipment they know a few years in advance what their expenditure is going to be at any time, and will have a budgeted amount over the top of that that they're on-board with. Bring the idea of autoscaling with limits, and they're very happy indeed, particularly with the idea of automatically scaling down.
> Azure has, built in, hard price/cost limits but doesn't allow the public to use them. For example if you have MSDN subscription credit you get a hard limit of up to $150/month, but you yourself cannot pick a bespoke limit to use the service more safely.
I would be willing to bet that that is something enterprise customers can get access to, particularly if their annual expenditure is high enough under normal operation. Microsoft knows the enterprise market very well, just like Oracle does, and like Amazon doesn't (historically speaking, at least).
Yes, some of them have gift card programs already, but they probably don't want the expanded regulations that come with handling large sums of money.
Because it is just like that. Nine years ago Amazon said (about the same issue raised 14 years ago): "We’ve received similar requests from many of our customers and we definitely understand how important this is to your business; We hear you loud and clear. As it stands right now, we are planning to implement some features that are requested on this thread, though we don’t yet have a timeline to share."  In other words, they know people need it, but they prefer not to implement it.
They may not necessarily enforce spending limits - but it's possible to restrict provisioning of costly resources, or even whitelist resources that can be provisioned.
Almost every Cloud Foundation project nowadays involves setting up these guard-rails.
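As one example of such a guard-rail (a sketch, not a complete policy; the group name and allowed types are made up), an IAM or SCP statement can deny ec2:RunInstances for anything outside a whitelist of cheap instance types:

    import json
    import boto3

    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyExpensiveInstanceTypes",
            "Effect": "Deny",
            "Action": "ec2:RunInstances",
            "Resource": "arn:aws:ec2:*:*:instance/*",
            "Condition": {
                "StringNotEquals": {
                    "ec2:InstanceType": ["t3.micro", "t3.small", "t3.medium"]
                }
            },
        }],
    }

    # Attach it to a sandbox group so experiments can't launch anything pricey.
    boto3.client("iam").put_group_policy(
        GroupName="sandbox-developers",
        PolicyName="limit-instance-types",
        PolicyDocument=json.dumps(policy),
    )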
There can also be many reasons for a budget overrun. It's not always a user error. It could be an issue with the platform itself, such as an error in the billing system or faulty autoscale logic. Or it could be caused by an external event, such as a denial-of-service attack.
(Not sure how things work with the MSDN subscription credit, but at least you are not supposed to be running production workloads with those)
I ran into that issue when I wanted to play with AWS EC2 (a few years ago, maybe it has changed since then, or maybe I didn't look hard enough). The free VMs were too slow to be usable. Considering my usage, I was unlikely to run into unexpected spending, but I didn't want to take any risk. Can anyone recommend a similar service with a simpler customer interface where you can set up a simple safety spending limit?
Though amongst those service types, I can't really recommend beyond the fact that Linode & DO didn't give me any headaches for the one month I used them.
Which means you won't learn AWS/Azure/etc instead, and they lose mind-share. This is actually an argument for why they SHOULD offer hard limits, not an argument against.
If their goal is to push startups/newbies/hobbyists to other platform, they're definitely on the right path. If the goal is to make their cloud services safe to learn/start using, then they could do much better.
Your logic is a self-contradiction:
- You need an expert to use AWS/Azure
- It is unsafe to even learn AWS/Azure without already being an expert.
Where do these experts come from? Osmosis? If there's no safe way to learn them, and being an expert is a prerequisite to using them, then you've created an artificial self-limiting supply shortage.
This is another argument that defeats itself and shows that these limits are absolutely needed to stop a mindshare loss/lack of expertise.
In my case, working for a company that gave me admin access from day one with no practical experience with AWS.
Even though I haven’t done anything stupid (yet) and think I know enough not to now, I wouldn’t recommend that....
My own mistakes are probably a greater risk, but still. Turn on that 2FA.
I do realize you can get closer to a hard limit while possibly exempting some services that would let you get over the limit--I suppose. Though then people would doubtless complain that the hard and fast limit is not, in fact, a hard and fast limit.
Intuitively, if you're capping out your S3 storage, the hard cutoff should look like "don't allow me to store any additional data".
If you're capping out retrieval, then "don't serve the data any more".
But if the data is stored, the clock keeps ticking until you delete it. If I have a TB of data stored, and I hit my $1K (or whatever limit) on April 15, the only way that I don't get hit with a >$1K bill for the month is if AWS deletes anything I have stored on the service. (Or at least holds it hostage until I pay up for the overage.)
There's enough room there for workflows where I know I'm going to delete data later that allowing configuration would be valuable. (Maybe I can set a timed expiration at the moment of storage, instead of having to store first and separately delete later? That would keep end-of-month predictions accurate.) But it isn't difficult to set the hard cutoff.
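One way to approximate "expiration set at the moment of storage" with what S3 offers today (a sketch; bucket, tag and key names are made up) is a lifecycle rule keyed off a tag that you attach at upload time:

    import boto3

    s3 = boto3.client("s3")

    # Any object tagged ttl=7d is deleted 7 days after it was created.
    s3.put_bucket_lifecycle_configuration(
        Bucket="my-bucket",
        LifecycleConfiguration={
            "Rules": [{
                "ID": "expire-tagged-scratch-data",
                "Status": "Enabled",
                "Filter": {"Tag": {"Key": "ttl", "Value": "7d"}},
                "Expiration": {"Days": 7},
            }]
        },
    )

    s3.put_object(
        Bucket="my-bucket",
        Key="scratch/run-output.csv",
        Body=b"...",
        Tagging="ttl=7d",
    )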
What you're asking for is not possible and will have unintended consequences. Guaranteed not to meet every customer's expectation of how it works.
Yes, that's the idea. Compare https://news.ycombinator.com/item?id=22719015
>> My pet project cannot involve the risk of possibly costing me thousands or more because I made a mistake or it got super popular over night. I'd rather my site just 503.
> What you're asking for is not possible
No, for services like these it should cap at the cost of keeping the data indefinitely. If your budget limit for S3 was $1000 per month, and you tried to add an object which if not deleted would make you use $1010 next month (and every month after that), it should reject adding that object.
So now we've created a situation where everything's running fine, our bill is consistently $500/mo, I go casually turn on a $1k/mo spending limit... aaaaand suddenly everything starts failing in totally non-obvious ways.
That said, I am pretty sure the TOS for MSDN funds say they are not to be used for production systems.