
How to burn the most money with a single click in Azure - Metalnem
https://mijailovic.net/2020/03/28/azure-money-burning/
======
appstorelottery
A few years ago my startup was killed by an AWS mistake that ran overnight. The
irony: my AWS expert at the time had made exactly the same provisioning
mistake at his previous job - so I figured he'd never make an $80k mistake
again. It turns out - his mistake with my startup was even more impressive.
More positively - he did help shell out with me to cover the cost & overnight
we were out of money. The mistake shocked me so much, and I've since heard
_so_ many stories of similar mistakes. The event hit me so hard I went back in
time to PHP and shared hosting. Not kidding.

~~~
_bxg1
They should give you the option to set a hard limit across your entire
account, to prevent you from accidentally spending more money than you have.
"If I try to spend more than $5k in a month, something has gone wrong, don't
let me do that."

~~~
umvi
Seems like circuit breakers should be a standard safety feature for
automatically infinitely scaling computers.

I would rather my whole system shut down and be unusable while I investigate
vs. auto-scale and charge me a bill I can't cover.

However, searching around it seems like I can only get alerts when a $$$
threshold is passed, but AWS won't take any action to stop computing or
anything. Please prove me wrong.

~~~
ghaff
>I would rather my whole system shut down and be unusable while I investigate
vs. auto-scale and charge me a bill I can't cover.

The counterargument is that you get a usage spike (which is often a _good_
thing for a company), and AWS shuts down _everything_ connected to your AWS
account without warning.

I'm not necessarily sure that optional/non-default hard circuit breakers would
be a bad thing. But it certainly appears not to be a heavily demanded customer
feature and, honestly, if it's not the default--which it shouldn't be--I
wonder how many customers, or at least customers the cloud providers really
care about, would use them.

~~~
kortilla
The usage spike is very very rarely worth the cost. That’s a pipe dream the
cloud providers sell to cover up the fact that these scenarios are sweet sweet
profit for them and nothing more. There are very few businesses where making
more money is just a matter of throwing some more compute at it.

Nearly every customer (i.e. all of them with a budget) would make use of
circuit breakers and it would make Amazon absolutely $0 while costing them
untold amounts. Are you really surprised Amazon hasn’t implemented them?

~~~
lostlogin
> Are you really surprised Amazon hasn’t implemented them?

It becomes harmful to them though. At a certain point people feel the hit and
avoid the service. Having people spend a little more accidentally and go ‘oh
well, oops’ is the sweet spot. An unexpected $80k which kills the company is
bad for everyone.

~~~
runawaybottle
This almost feels like banking fees. A dollar here, a dollar there. In this
case it’s a couple of thousand here and there until you can’t afford it
anymore lol.

------
Someone1234
Azure has, built in, hard price/cost limits but doesn't allow the public to
use them. For example if you have MSDN subscription credit you get a hard
limit of up to $150/month, but you yourself cannot pick a bespoke limit to use
the service more safely.

Kind of makes me annoyed. I'm sure enterprises don't care/want unlimited. But
solo practitioners, and people new to the platform would love a default e.g.
$5K/month limit (or less).

Feels like these services just want people to "gotcha" into spending a bunch
of money without simple safety nets.

PS - No, alerts do not accomplish the same thing, by the time you get the
alert you could have spent tens of thousands.

~~~
wrkronmiller
This is only anecdotal, not personal experience, but I've read online and have
had friends "oops" away large sums of money on AWS, and for the most part they
seem to have at least gotten a partial discount when they contacted customer
support.

I strongly suspect opaque pricing and high/nonexistent limits are more about
getting large organizations to transition to the cloud seamlessly (i.e. not
completely caring/realizing what they're getting into for any particular
migration/deployment).

Tricking personal users into spending thousands by accident probably doesn't
net much money compared to enterprise spend and runs the risk of alienating
people who then can go into work and recommend against using a particular
platform, having been burned by it on their personal accounts.

~~~
stingraycharles
> “oops" away large sums of money on AWS, and for the most part they seem to
> have at least gotten a partial discount when they contacted customer support

As a counter-datapoint, we accidentally left a Redshift cluster up idling for
two weeks before we started getting alerts, and after numerous attempts have
failed to get compensated in any way. The reasoning was that, well, it was
what we requested and they had to allocate compute power to it (which we
didn’t use).

All in all a very frustrating experience and it makes me fairly cynical of all
these “I got my money back without problems!” comments.

(For what it’s worth, it was about $4k of costs which was a lot for us at the
time)

~~~
thanksforfish
It's also unnecessary. Give users cost controls, then you won't need the
current mess of hoping support will write off $$$ of mistakes. With a risk of
bankrupting a small shop if support doesn't help, it drives risk-averse users
towards less dynamic offerings.

Isn't AWS supposed to focus on the virtuous cycle of saving customers money
(or at least reducing AWS support's need to write off customer mistakes)?

~~~
stingraycharles
I always have the feeling of a bit of “randomness” with these kinds of
compensations. It makes sense, as it’s difficult to “codify” these types of
things, lest they get abused and you might as well just lower your prices at
that point.

AWS is a large organization; I believe this type of stuff highly depends upon
your “entrance” into the organization, i.e. the account manager. We were
probably just unlucky with our Redshift troubles, but it did eventually
trigger a move to Google Cloud / Bigquery, as the pay-as-you-go method seemed
a bit safer (although it’s still too difficult imho to accurately estimate the
costs of queries).

------
slrainka
Just a couple of months ago, we were blindsided by a massive AWS bill after
turning on encryption for logging (an ask from the security team). The encryption
relied on KMS, and because it was a serverless setup, each lambda
initialization would grab the KMS keys. Later we found out that while that many
KMS calls don't cost much by themselves, they do generate
CloudTrail logs, which are quite costly. Sometimes it's hard to model something
like that. Following this experience KMS team made some changes to how their
service/pricing works given its tight coupling to CloudTrail. We also stopped
using KMS and simplified the log encryption approach by writing logs directly
into an encrypted S3 bucket.

~~~
sim_card_map
We are using two $3.5 AWS lightsail instances to handle hundreds of thousands
of customers :)

No hidden or unexpected costs.

~~~
slrainka
Can you share a bit more about the architecture and stack? And a few more
details, such as roughly how many TPS you serve from this setup? Does it
scale well? Are response times consistent?

~~~
sim_card_map
It's all written in Go + Postgres. Nothing else. Not sure what TPS is.

No problem with scaling.

Stack Overflow runs on a single machine. Use a fast compiled language and a
simple stack. No need for the $6k/mo AWS bills that are the fashion these days.

~~~
haggy
Your response gives me zero confidence that you know what you're talking
about. Having "hundreds of thousands" of users tells us nothing (which is why
TPS was brought up). Also if you do have a high concurrent user count then
there's no way there have been "no issues scaling". There are always scale
points in systems that actually grow.

~~~
sim_card_map
Since 2015, no issues. You can choose not to believe me.

~~~
ajkjk
You're choosing not to try to convince anyone that what you're saying is true.
Why bother posting at all, in that case?

------
s1k3s
I see a lot of people here saying how they "get burnt" by unexpected increase
in their bills, because some guy who was supposed to manage the cloud made a
mistake. That's not the real burn, the real burn is when you pay $8K a month
on cloud services for apps that could run on a laptop in a basement. And you
pay that monthly, for years. And before devops guys slash me with "scaling",
"fault tolerance", "redundancy" and what not, remember that your little
website isn't a multi million user app and all the issues that you think are
fixed by simply moving to the cloud can also be fixed by running your app on a
laptop (for free).

I used to handle infra for a client a few years ago (the stack was built by
another employee before I got there) and it always amazed me that these people
paid $10-20K every month to AWS for their app with 500 users a day. Needless to
say, AWS was also causing a lot of headaches with their instance
management, network maintenance and service unavailability. I know cloud is
cool, but businesses should probably think twice before going for it.

~~~
gitgud
Very true, some companies boast about 10,000 registered users when in reality
there are only 100 daily active users...

I think the problem is that companies want to _appear_ bigger than they are.
Saying they use _AWS Cloud Stuff_ is a selling point to investors who have a
distorted understanding of how scaling works...

The cloud service companies are marketed to entice people into buying services
they don't need... yet/ever

------
bane
These basic gaps in cloud hosting providers' tooling have created a rather
large cottage industry of companies that exist only to make the tools to fill
in the gaps -- like VPC cluster management, security compliance, whatever.

The perverse side of it is, it costs to host the third party tooling as well,
so cloud providers get more money from you setting up these tools so that you
don't burn away all your money. However, they don't get more money from
plugging the holes in their own tooling. So they have no incentive to fix it
on their own.

~~~
hinkley
Perverse is the right word here.

Perverse incentives are why things are the way they are. This will only change
when a large enough competitor defects.

------
leoedin
I got burned by AWS billing. I played around with some tutorials and before I
knew it I had a $100 credit card bill.

Even working out what I was paying for proved really tricky. Turning it all
off involved crawling through a bunch of opaque control panels.

I'm definitely never considering AWS again for a personal project. It's too
dangerous. I'm not a company and I don't have corporate level budgets.

~~~
farisjarrah
I get around this in GCP by having a script that spins up a brand new project
whenever I want to play around with stuff. When I am done playing around with
things, I have a 2nd script that deletes the project and thus all the
associated resources that are in use in that project.

~~~
scarface74
Why do you need a script? With CloudFormation (and I assume terraform), you
create a file containing all of your resources in a “stack” and you delete the
stack when you’re done.

------
praveenpenumaka
We had an instance where some consultant dev wrote a lambda function that gets
triggered when a new image is saved in S3, resizes it, and re-saves it
in the same S3 bucket.

For non-technical guys, that is recursion. We were surprised to see a $5000 bill
in 10 days.
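
A minimal sketch of one way to break that loop, assuming resized images are written under a dedicated key prefix. The `resized/` prefix and both function names are hypothetical; only the `Records[].s3.object.key` layout comes from the S3 event notification format.

```python
RESIZED_PREFIX = "resized/"  # hypothetical prefix for the function's own output


def keys_to_process(event):
    """Return the object keys in an S3 event that still need resizing."""
    keys = []
    for record in event.get("Records", []):
        key = record["s3"]["object"]["key"]
        if key.startswith(RESIZED_PREFIX):
            # This object was written by the resizer itself: skip it,
            # otherwise every output would trigger another invocation.
            continue
        keys.append(key)
    return keys


def output_key(key):
    """Key under which the resized copy of `key` would be stored."""
    return RESIZED_PREFIX + key
```

Writing the output to a separate bucket, or restricting the trigger itself to an input prefix, avoids the loop without relying on handler code at all.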

~~~
williamdclt
You got lucky, if it saved _two_ different sizes for each image...

~~~
bspammer
I'm actually interested in how AWS handles exponential cases like this. They must
throttle it somehow, otherwise something dumb like this would be causing
availability issues every day.

~~~
praveenpenumaka
Unfortunately, there is nothing built into AWS to mitigate these conditions,
except for monitoring for "anomalous" behaviour.

~~~
bspammer
I mean there must be something, because otherwise taking down AWS would be as
simple as the above. After 50 steps, you've created a quadrillion images.
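
The "quadrillion" figure checks out if every object written triggers two more writes, as in the two-sizes case mentioned earlier. A quick sketch of the arithmetic (the function names and the `fanout` parameter are illustrative only):

```python
def new_objects_at_step(step, fanout=2):
    """Objects written at a given step when every object triggers `fanout` more."""
    return fanout ** step


def total_objects(steps, fanout=2):
    """Total objects written over steps 1..steps."""
    return sum(new_objects_at_step(s, fanout) for s in range(1, steps + 1))


print(new_objects_at_step(50))     # 1125899906842624, about 1.1 quadrillion
print(new_objects_at_step(50, 1))  # 1: a single re-save loops but does not fan out
```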

~~~
praveenpenumaka
They surely have very high upper limits on S3 files.

Lambda functions have a limit on the number of concurrent executions you can run.

------
Razengan
Takeaways from this and other people’s experiences in the comments (apart from
making sure such mistakes don’t happen in the first place):

• Providers should let users set hard spending limits.

• Providers should offer a channel for investigating and waiving charges for
honest mistakes if reported soon enough.

• Perhaps you should sign up with throwaway billing details so you can
continue on another account till you sort it out. Morally and legally unsound
but probably the better alternative to “killing your startup” I guess.

------
reilly3000
About a year ago I spent $80 making a single query using BigQuery. There was a
large public dataset and I did a query that spanned several months of data. My
query was something like SELECT fields FROM table.* WHERE date = ... the
problem was that WHERE still scans data from ALL date partitioned tables then
returns a filtered result. What I should have done is FROM table.date - anyhow
I am glad I checked the cost before I hit it with more queries. $5/TB
scanned is remarkably cheap for small data but not for large data. In my case
it was 16TB.

Still, it was pretty amazing that the query returned data within ~2 seconds.

So be careful kids. Public datasets don’t cost anything to store, but your
cloud account is going to get pummeled for your exploratory data analysis and
if you haven’t set any billing controls or alerts you’re in for a nasty
surprise.
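
The arithmetic behind that bill, as a rough sketch: on-demand cost is bytes scanned times the per-TB rate. The $5/TB rate and the 16 TB figure are from the comment above; the function name is made up, and treating a TB as 2^40 bytes is an assumption.

```python
PRICE_PER_TB_USD = 5.0  # rate quoted in the comment above
BYTES_PER_TB = 2 ** 40  # assumption: binary terabytes


def scan_cost_usd(bytes_scanned, price_per_tb=PRICE_PER_TB_USD):
    """Estimated on-demand cost for a query that scans `bytes_scanned` bytes."""
    return bytes_scanned / BYTES_PER_TB * price_per_tb


print(scan_cost_usd(16 * BYTES_PER_TB))  # 80.0
```

BigQuery can report the bytes a query would process through a dry run before anything is billed, which gives the input for an estimate like this.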

------
patrec
Which cloud or physical server hosting providers provide either user
configurable hard spending limits or the option to (only) pre-pay? The big 3
cloud providers obviously don't, and neither does DO, I believe (you can
pre-pay but not as your only payment option).

For personal hobby projects taking your chances on customer support goodwill
in case you rack up a bill that would bankrupt you due to a wrong click seems
kind of insane.

So what are good alternatives?

~~~
why-el
Can't think of any, but managed hosting tends to do better. For instance if
you are doing your testing in Ruby, I'd suggest hosting on Heroku while you do
all your preliminary work, where it's clear(er) what the cost will be, then
move to AWS later when the bank account is, well, beefier. :)

~~~
patrec
Bonkers! Surely there must be a market for this? Also, I get that the selling
point of Heroku is kind of AWS for dummies, so you are probably slightly less
likely to mess up because of that, but it's still some auto-scale-this,
click-together-that kind of service. As far as hosting models are concerned,
getting a dedicated box and running the stuff you want seems far safer, no?

~~~
why-el
Agreed, it's a matter of degree. You can auto-scale up to a certain amount, at
least in Heroku, so in practice you can limit it, say up to 8 servers or
something, but nothing beats your own box if you can afford the opportunity
cost.

------
seanwilson
Does AWS still not have a way to stop accidentally huge bills? How is this not
intentionally negligent on Amazon's part at this stage?

~~~
Terretta
The fewer system parts running "if" statements in delivering your service, the
better.

The use case of delivering the service to a consumer of the service happens
all the time, the use case of "oh, I didn't understand how this works and
foot-gunned" is relatively rare.

AWS can eat that cost less expensively than maintaining the "if" statements
inline on every request.

The most reliable code is code you don't write at all.

~~~
hinkley
You can run an awful lot of if statements in a microsecond these days.

~~~
Terretta
At scale, every microsecond matters.

But it's less the CPU time, and more the complexity.

~~~
hinkley
That depends entirely on if the microsecond is in the embarrassingly parallel
part of the workload or in the sequential part.

------
croh
Oh boy, this is why I love DigitalOcean! There is never any surprise. Simple and
intuitive UI. Free alerts, free firewall. You can see all resources on a single
dashboard. Not like AWS, where you forget to terminate an instance in a different
region and get screwed in one night.

------
brassattax
I wonder if reserving all available services on a hacked account is the next
DOS attack.

~~~
redis_mlc
Already been done for a Defcon talk. Limits were added.

------
thoraway1010
Not that I'd want production infra to be wiped because I went over a spending
limit by $1 - but can't you set up a billing alert that goes to an alarm action
to terminate all your EC2 instances as soon as the billing alert triggers? Or
a lambda function that iterates through your account and deletes everything?

I'm not sure why AWS would build something like this though themselves. To
stop spending money EVERYTHING must be deleted (s3 / glacier / etc). If
someone in accounting loads the wrong budget amount you lose all your data.

Amazon's focus seems more on making sure your data is kept and available.
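
A rough sketch of the alarm-triggered handler idea, using stop rather than terminate so that EBS volumes survive. It assumes a boto3-style EC2 client is passed in (`describe_instances` and `stop_instances` are real EC2 API calls); the function name is hypothetical, pagination is ignored, and it does nothing about non-EC2 spend.

```python
def stop_running_instances(ec2):
    """Stop every running EC2 instance visible to `ec2` and return their IDs.

    `ec2` is expected to behave like boto3.client("ec2") for one region.
    """
    response = ec2.describe_instances(
        Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
    )
    instance_ids = [
        instance["InstanceId"]
        for reservation in response["Reservations"]
        for instance in reservation["Instances"]
    ]
    if instance_ids:
        # Skip the API call when there is nothing to stop.
        ec2.stop_instances(InstanceIds=instance_ids)
    return instance_ids
```

Stopped instances still accrue EBS storage charges, and the alarm only fires once billing data catches up, so this limits damage rather than capping spend.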

~~~
maest
Billing alerts are not real time.

- Some guy

~~~
thoraway1010
Good point - I think they are on a 6 hour average delay from what I've seen.
An area for improvement it would seem for sure.

------
runawaybottle
At what point would a VPS no longer meet your needs? Let’s say a standard
startup, not something that will reach TikTok traffic, or something like
Slack.

------
samstave
The easiest way I can think of is to go commit your AWS keys to a public
repo.

(This actually happened to me a while back - a new employee created a 201st
repo, and since our paid limit for private repos was 200, the new repo was
automatically made public and he had keys in his code. We had thousands of
bitcoin mining instances launched using the keys, which cost $75,000 super fast.)

We caught it really quickly and AWS dropped the cost.

------
new_here
It’s worth mentioning here that if you find yourself accidentally running up
an AWS bill you can get in touch with their customer support, explain the
mistake and ask for amnesty.

I once accidentally ran up a $2.8k bill and after explaining the situation
they added a credit to my account to cancel out the charge at the end of the
month. They obviously review it case by case but it’s definitely worth a shot.

------
londons_explore
Why do none of these cloud services have spend caps?

Why not have a simple "budget" setting and a setting for what to do when
it's exceeded. Options could be "shut down most recently started resources" or
"shut down everything but don't delete any data", and "delete everything".

~~~
maklu
Azure has this in their Cost Management page, available for each subscription.
You can set a budget and get notifications about when you are reaching its
limit. Not sure about the power it has with regard to shutting things down.

~~~
topkai22
I believe you can hook into apis to shut things down, but it’s not super
straight forward.

There are many Azure services that can’t be set to zero dollar billing without
data loss, so I’m not sure how Azure could deal with those in a unified
manner.

------
batoure
I turned on a POC of a platform on AWS last month that stood up $30,000 worth
of AWS spend for itself... nowhere near this, but the ease with which it
came online was terrifying

------
anvarik
this part made me laugh:

> Our next candidate is Azure Databricks. I have no idea what it is, and I
> don’t even care! All I know is that it’s pretty expensive, and that’s
> exactly what I need

------
crazygringo
This is hilarious.

But serious question: the original tweet is about a single instance that costs
over 3 million dollars.

Is that genuinely a single _physical_ instance? Like is it even technically
possible to build a traditional single physical server of CPU+RAM+disk that
costs that much?

Or, being a database server, is it some kind of clever abstraction that
actually splits it up physically (e.g. relying on the fact that different
database threads might be able to get away without sharing memory)?

~~~
judge2020
I'm not sure exactly what processor AWS uses, but assuming you match the
db.r5.24xlarge's 48 cores, the Xeon Platinum 8160 is the closest match for the
processor - ~$5k. The ram is likely ddr3 16gb ecc memory, so 768gb at
$50/stick (you might be able to find a better price) is $2400. As for disk,
the price of the instance doesn't include disk size since it's an EBS-only
server.

The actual price comes from SQL server enterprise edition. A mysql
db.r5.24xlarge multi-az prorated for 3 years is $219,551, which is ~$600 less
per month than on-demand pricing. However, sqlserver-ee is $2,782,588. I
believe this comes from per-core pricing for sql server.
[https://docs.google.com/spreadsheets/d/e/2PACX-1vQZT7wl1yvav...](https://docs.google.com/spreadsheets/d/e/2PACX-1vQZT7wl1yvavctIYchaxHAI_BEct1lKRnl3A_Z8D9xBK-u1eqxy__S9w2kB9BucnLq8iZIYb-YGUZiN/pubhtml?gid=0&single=true)

Note - all prices were North Virginia, the original tweet was regarding
servers in Bahrain.
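
Working through the numbers quoted above, as a sketch: every figure is taken from this comment, both database prices are assumed to be 3-year totals, and the variable names are only for illustration.

```python
RAM_GB, STICK_GB, STICK_USD = 768, 16, 50
MYSQL_3YR_USD = 219_551
SQLSERVER_EE_3YR_USD = 2_782_588

sticks = RAM_GB // STICK_GB                             # number of 16 GB sticks
ram_cost = sticks * STICK_USD                           # hardware RAM estimate
license_premium = SQLSERVER_EE_3YR_USD - MYSQL_3YR_USD  # extra cost of SQL Server EE
ratio = SQLSERVER_EE_3YR_USD / MYSQL_3YR_USD

print(sticks, ram_cost)  # 48 2400
print(license_premium)   # 2563037
print(round(ratio, 1))   # 12.7
```

On these figures the CPU and RAM estimates together come to under $10k, so nearly all of the gap is the database edition rather than the hardware.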

~~~
crazygringo
Oh wow. That makes much more sense then, that it's mostly SQL Server
licensing.

I was vaguely aware that some server licensing is per-core, but I never
realized it could add up to costs at that level.

Thanks!

------
mcv
A co-worker created a website for a charity and got some free Azure budget for
it that would have been plenty for a year. After half a year, though, it was
gone because something innocuous that he'd accidentally activated ate up all
his budget.

Sorry I don't have more detailed data than that, but it certainly alerted me
to the fact that you need to be really careful with these cloud services.

------
ablekh
Could people familiar with the "Cost Management and Billing" functionality on
both Azure and AWS, share here relevant coherent thoughts (i.e., mini-review)
on feature comparison, pros/cons, etc. for these two platforms? I'm especially
interested in this from the perspective of a multi-tenant SaaS
provider/vendor.

------
ineedasername
Well, not sure if this counts or if it's a cheat, but you could do the
following: Find the single most expensive click, and then a consulting company
to implement it, and click _their_ button to sign the contract; Still just a
single click and Boom: 20% to 200% increase over Azure cost alone.

------
igammarays
Existential question about cloud usage: at what point does the risk + cost +
knowledge/consultation + DevOps staff required to efficiently manage a cloud
provider outweigh the risk + cost + staff of running your own data centre?

------
moondev
If your startup infrastructure scales "too well" then the new attack is not DoS
but a CoS - cost of service attack. Trigger a cloud bill so high it bankrupts
them!

------
zengid
I wonder if they have started courses in 'cloud accounting for engineers' at
universities yet? Snark aside, it would be useful for business oriented IT
degrees.

------
tly_alex
I just quickly glanced over some comments on this thread and found it a bit funny
that most comments are about AWS, while the original article is about
Azure.

------
29athrowaway
I burnt $5,000 by putting some servers in the wrong region. Network traffic
within the same region was free, but across regions was not.

------
ertucetin
This is the reason why billing alerts exist. Just use them. You can't joke
around with money.

~~~
hinkley
I know this is common advice, but I expect history will very justifiably tag
this kind of talk as victim-blaming.

I think fundamentally small and middle-sized companies are building websites
improperly (cargo culting companies that can afford million dollar mistakes),
which surely exacerbates the situation.

But we used to be able to count on a person providing a service not to
overcharge you because it was bad for business. When they didn't, it was news.

------
alfianHac
Magic

