
The ominous opacity of the AWS bill – a cautionary tale - usr1106
https://www.taloflow.ai/blog/ominous-aws-bill
======
jjoonathan
Here's my $700 surprise bill story. There are many like it, but this one is
mine.

* An example CloudFormation template from a re:Invent (AWS conference) session silently failed to tear down some resources.

* Not trusting CloudFormation, I looked through each (known service, region) manually to make sure resources had been torn down. This failed to identify the running resources because a tutorial div opened in regions with no running resources and remained open if you switched to a region with running resources, hiding them.

* Not trusting my manual service tour, I kept a close eye on my daily costs until I saw several days pass with $0 spend. This failed because free tier credits were hiding substantial service usage.

* Not trusting any of the above, I had billing alerts set as a catch-all. They correctly triggered on an unrelated usage surge, but with such high latency that I incorrectly attributed their failure to reset to high latency rather than to a genuine underlying charge.

Bam, $700 charge next month. Amazon was quick to refund half of it. I was
eventually able to get them to refund the other half by making waves in the
support system of a high-spend business account.

At the last re:Invent session I went to, I surveyed a table of 6 people. After
sharing my $700 figure, 3 of the 6 came forward with even bigger numbers, 1 of
the 6 with a smaller number, and the remaining person was a newbie.

~~~
benatkin
Among the cloud providers, this seems to be unique to Amazon. There's a lot of
malfeasance that's unique to Amazon, like HQ2 and the counterfeit and
offensive products on Amazon.com. I don't feel like giving any part of their
company money. For Twitch I donate through Streamlabs. It sucks not being counted
as a subscriber or being able to use the emotes, but I prefer that over Amazon
taking a cut.

~~~
maest
What is the cost reporting situation on Azure and GCP?

We've gone with AWS because they're supposed to have good customer support, but
opaque cost reporting and the inability to impose spending limits are a
concern.

Does anyone know how Azure/GCP:

1\. Handle cost reporting

2\. Handle spending limits (e.g. can I impose a hard spending limit per
service/per user/globally?)

~~~
webpaymentsguy
Used all 3 for complicated things, preferred AWS for better overall
security/compliance support and more features. Azure is largely comparable to
AWS for most shops and some services can be slightly cheaper. GCP seems to be
way behind both but I'm sure lots of places could get away with using them.

1\. Azure basically has cost reporting comparable to Amazon's, though it has a
cost aggregator if you want to use both Azure and AWS. I personally thought it
didn't bring all the nice features of AWS billing into Azure very well, so I
wouldn't recommend it if your AWS usage is large and varied. I found GCP to
have fewer billing features than either Azure or AWS.

2\. None of the 3 providers has a hard spending limit feature, though Google
App Engine (as distinct from GCP in general) lets you shut it down. Other than
that, permission roles are generally the same; AWS wins slightly on features
again, but Azure had a slightly nicer UI.

Anyways, you should do your own research on what cloud seems sane to you, and
not let randos on HN make your business decisions.

------
brodouevencode
My job is cost optimizations at a very large corporation. We have been given
the order to go all in on AWS. Some things I've found to be particularly
annoying:

* Data transfer will bite you in the ass if you let it, especially over a NAT gateway on very high-traffic sites. So you do the right thing: put your application in private subnets, route traffic in over the load balancers and out over the NATGW. Then you get a $20k/mo bill for your microserviced application that has hundreds of requests per second during peak hours. Pro-tip: the pooh-poohed NAT instances are actually a cheaper solution, but you're on the hook for maintaining them.

* The CUR can get huge. I mean millions and millions of lines. AWS says you can throw it into S3, query it with Athena, etc. etc. But if that data set is huge, even _that_ will cost you a lot of money to run reporting, analysis, etc. Especially after you build that dashboard for the refresh-happy VP.

* The Cost Explorer is admittedly getting better, but it still lacks a lot of necessary detail. You have to pair it with CloudWatch to get actual cost and usage in a usable way. The value-add services like EMR/Elasticsearch Service/all the ML stuff do a hideously good job of hiding actual usage. You gotta dig hard.

* The third-party cost tracking tools (CloudHealth/Metricly/Cloudability/Cloudyn) are just wrappers around what you can get out of the CUR. Their value-add is reporting and advisement, and giving recommendations on right-sizing, reserved instances, and savings plans. If your cloud team is sufficiently savvy, though, they can do this themselves.

* No matter how you do your analysis, tagging will make your life so much easier. Can't emphasize this enough.
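To get a feel for how the NAT gateway charge in the first point scales, here is a rough Python sketch. The rates are assumptions, not from the comment: roughly the us-east-1 list prices of $0.045 per gateway-hour and $0.045 per GB processed. Check current pricing for your region; ordinary data transfer charges come on top of the processing fee.

```python
# Rough monthly NAT gateway estimate. Rates are ASSUMED us-east-1 list prices
# (per gateway-hour and per GB processed); check current pricing.
HOURS_PER_MONTH = 730
ASSUMED_HOURLY_RATE = 0.045   # USD per NAT gateway per hour
ASSUMED_PER_GB_RATE = 0.045   # USD per GB processed by the gateway


def nat_gateway_monthly_cost(gateways, gb_processed,
                             hourly_rate=ASSUMED_HOURLY_RATE,
                             per_gb_rate=ASSUMED_PER_GB_RATE):
    """Hourly charge for each gateway plus the per-GB processing charge.

    Ordinary data transfer charges (e.g. out to the internet) are billed
    separately and are not included here.
    """
    return gateways * HOURS_PER_MONTH * hourly_rate + gb_processed * per_gb_rate


def gb_for_monthly_bill(bill, gateways,
                        hourly_rate=ASSUMED_HOURLY_RATE,
                        per_gb_rate=ASSUMED_PER_GB_RATE):
    """Invert the estimate: how many GB processed would produce `bill`."""
    fixed = gateways * HOURS_PER_MONTH * hourly_rate
    return max(0.0, bill - fixed) / per_gb_rate


# Hypothetical setup: three gateways (one per AZ) and a $20k month.
print(round(gb_for_monthly_bill(20_000, gateways=3)))
```

Under these assumed rates, a $20k month with three gateways corresponds to roughly 442,000 GB (about 440 TB) processed. A NAT instance has no per-GB processing fee, which is where its savings come from, at the cost of running and maintaining the instance yourself.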

~~~
ersii
> * No matter how you do your analysis, tagging will make your life so much
> easier. Can't emphasize this enough.

Do you have any recommended reading or tips on what to tag, and how, to make
one's AWS life easier?

~~~
ldoughty
Tagging is simple*. The real killer is the things you can't tag, like
bandwidth. Be sure to use multiple AWS accounts if you want to split bills (or
at least track bills) by sub-group, e.g. per department. Give each of these
groups its own account (or, preferably, a set of accounts if we're talking
about a business; maybe even different accounts for dev/preprod/prod, if this
is a major cost and the project is worth it).

For tags, you can make any tag you want and summarize bills by tags, so
anything that can be tagged is trackable. But things like bandwidth are not.

It's also hard to enforce tagging when you can't automatically destroy non-
compliant objects, so again, separate accounts help here. If the sub-
department wants to know its spend better, THEY are more likely to enforce
the rule than a top-down policy from a disconnected IT group would. And you
can't simply apply a global "all things must be tagged" rule enforced at the
AWS level, because some items can't be tagged, or the tagging has to happen
after creation (for instance, via the SDK/CLI you can't create an EC2 instance
with tags; you make the instance, then tag it. The GUI does this behind the
scenes so it looks like one step).

So again, for major billing boundaries, use different accounts. After that
point, it's on the delegated entities to use tags appropriately, and it's
often different for each group anyway.

~~~
dragonwriter
> It's also hard to enforce tagging when you can't automatically destroy non-
> compliant objects

You can automatically destroy non-compliant (with your tagging policy)
objects, by querying objects that exist and examining their tags through the
API (heck, you could even script the CLI to do this), and, if you use AWS
Organizations, you can prevent noncompliant resources with a combination of
service control policies (to require tagging) and tag policies (to specify use
of tags).
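The querying approach described above can be sketched in Python without any AWS calls. The input list mirrors the ResourceTagMappingList shape (a ResourceARN plus Key/Value Tags) returned by the Resource Groups Tagging API's GetResources operation; the required tag keys and the ARNs are made-up examples. GetResources is documented as returning tagged or previously tagged resources, so resources that were never tagged may need to be found through each service's own list calls. What to do with a non-compliant resource (report it, alert, or delete it) is left to you.

```python
# Sketch: find resources missing required tags. The input mirrors the
# ResourceTagMappingList shape from the Resource Groups Tagging API;
# REQUIRED_TAG_KEYS and the ARNs below are hypothetical examples.
REQUIRED_TAG_KEYS = {"team", "environment", "app"}


def missing_tags(resource, required=REQUIRED_TAG_KEYS):
    """Return the required tag keys this resource lacks (empty set if compliant)."""
    present = {tag["Key"] for tag in resource.get("Tags", [])}
    return set(required) - present


def non_compliant(resources, required=REQUIRED_TAG_KEYS):
    """Map ResourceARN -> sorted missing tag keys, for non-compliant resources only."""
    report = {}
    for resource in resources:
        missing = missing_tags(resource, required)
        if missing:
            report[resource["ResourceARN"]] = sorted(missing)
    return report


resources = [
    {"ResourceARN": "arn:aws:ec2:us-east-1:111122223333:instance/i-0abc",
     "Tags": [{"Key": "team", "Value": "search"},
              {"Key": "environment", "Value": "prod"},
              {"Key": "app", "Value": "indexer"}]},
    {"ResourceARN": "arn:aws:s3:::scratch-bucket",
     "Tags": [{"Key": "team", "Value": "data"}]},
]
print(non_compliant(resources))
```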

> (for instance, by SDK/cli, you can't create an ec2 instance with tags.. you
> make the instance, then tag it.

That's...not true. The RunInstances call in the SDK, which creates one or more
instances from an AMI, takes an optional set of tag specifications
(TagSpecifications) for tags that can be applied to the instances and/or any
of a wide variety of associated resources.

(python)
[https://boto3.amazonaws.com/v1/documentation/api/latest/refe...](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/ec2.html#EC2.Client.run_instances)

(Java)
[https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/am...](https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/ec2/model/RunInstancesRequest.html)
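For illustration, here are keyword arguments for boto3's `run_instances` with `TagSpecifications`, built as a plain dictionary so the sketch runs without AWS access. The AMI ID, instance type, and tag values are placeholders, and tagging both the instance and its EBS volumes is just one choice of resource types.

```python
# Sketch: keyword arguments for EC2 RunInstances that tag the instance and
# its EBS volumes at launch. The AMI ID and tag values are placeholders.
def run_instances_kwargs(image_id, instance_type, tags, count=1):
    tag_list = [{"Key": key, "Value": value} for key, value in sorted(tags.items())]
    return {
        "ImageId": image_id,
        "InstanceType": instance_type,
        "MinCount": count,
        "MaxCount": count,
        # One entry per resource type to tag at creation time.
        "TagSpecifications": [
            {"ResourceType": "instance", "Tags": tag_list},
            {"ResourceType": "volume", "Tags": tag_list},
        ],
    }


kwargs = run_instances_kwargs("ami-0123456789abcdef0", "t3.micro",
                              {"team": "search", "environment": "dev"})
# With boto3 installed and credentials configured, this would be passed as:
#   boto3.client("ec2").run_instances(**kwargs)
print(kwargs["TagSpecifications"])
```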

------
ajb
We've got a last-ditch alert set when we spend more than $X in a day. But the
way you have to do this is baroque and it is a bit unreliable. We set a metric
on Max(EstimatedCharges, over 1 day) - Min(EstimatedCharges, over 1 day).

Unfortunately EstimatedCharges only updates once a day and sometimes the Max
updates before the Min, triggering a false alarm. Obviously we could make it
more reliable by using a 48 hour period, but then we'd only find out if
something went haywire when it had been going for 2 days.

Really, how much would it cost them to run the cron job for EstimatedCharges
once an hour? Even more often would be good (you can spend _a lot_ on AWS in an
hour).

It also stops working if you have a credit (until it runs out) so good luck if
something goes wrong during that period.

We even asked a consultant (recommended by Amazon) if there was a more
fine-grained method, and they thought the only way would be using a third-party
service that ingests all the events and does its own estimate. This is nuts!
If your charging is as fine-grained as AWS's is, so should your reporting be.
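Here is a rough local model of that alarm, assuming samples of the month-to-date EstimatedCharges metric as (hour, dollars) pairs. `max_minus_min` mirrors the Max - Min approach described above. One extra caveat, not raised in the comment: assuming the metric resets at the start of each month, a window spanning the rollover also produces a false alarm. `spend_in_window` is a hypothetical alternative that sums the increases between consecutive samples and treats a drop as a reset.

```python
# Local model of the daily-spend check, using (hour, month_to_date_usd) samples.
# Assumption: EstimatedCharges is cumulative for the month and resets at month start.

def max_minus_min(samples, end, window=24.0):
    """Max - Min over samples with end - window <= t <= end (the alarm's approach)."""
    values = [v for t, v in samples if end - window <= t <= end]
    return max(values) - min(values) if values else 0.0


def spend_in_window(samples, end, window=24.0):
    """Sum of increases between consecutive samples in the window.

    A drop is treated as a month rollover: the new value is counted as fresh
    spend instead of producing a large swing.
    """
    values = [v for t, v in sorted(samples) if end - window <= t <= end]
    total = 0.0
    for prev, curr in zip(values, values[1:]):
        total += curr - prev if curr >= prev else curr
    return total


# Daily samples: $480, $500, then $3 after the month rolls over.
samples = [(0.0, 480.0), (24.0, 500.0), (48.0, 3.0)]
print(max_minus_min(samples, end=48.0), spend_in_window(samples, end=48.0))
```

For the last window, `max_minus_min` reports $497 while `spend_in_window` reports $3. Neither fixes the once-a-day update cadence, which is the deeper problem described above.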

~~~
herostratus101
Wondering if this would be a good Lambda application. Assuming boto3 allows
it, you could have a Lambda function poll for this up to every minute. It
actually might be a trivial function to write.
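A minimal sketch of that idea, assuming boto3's CloudWatch `get_metric_statistics` call, billing metrics enabled for the account, and a client pointed at us-east-1 (where AWS publishes billing metrics). `THRESHOLD_USD` is a hypothetical limit, and the client is injectable so the logic can be exercised without AWS. Polling every minute only helps as far as the metric itself updates, which the comment above says is once a day.

```python
# Sketch of a polling Lambda handler. Assumptions: billing metrics are enabled,
# they are read from us-east-1, and THRESHOLD_USD is a hypothetical limit.
import datetime

THRESHOLD_USD = 50.0


def recent_charges(cloudwatch, hours=24):
    """Month-to-date EstimatedCharges maxima for the last `hours`, oldest first."""
    end = datetime.datetime.now(datetime.timezone.utc)
    start = end - datetime.timedelta(hours=hours)
    resp = cloudwatch.get_metric_statistics(
        Namespace="AWS/Billing",
        MetricName="EstimatedCharges",
        Dimensions=[{"Name": "Currency", "Value": "USD"}],
        StartTime=start,
        EndTime=end,
        Period=3600,
        Statistics=["Maximum"],
    )
    points = sorted(resp["Datapoints"], key=lambda p: p["Timestamp"])
    return [p["Maximum"] for p in points]


def handler(event, context, cloudwatch=None):
    if cloudwatch is None:
        import boto3  # only needed when running against AWS
        cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")
    values = recent_charges(cloudwatch)
    # Same Max - Min idea as the alarm above, so it shares its month-rollover caveat.
    spent = max(values) - min(values) if values else 0.0
    return {"spent_last_24h": spent, "over_threshold": spent > THRESHOLD_USD}
```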

~~~
haimez
Wonder how much that lambda will cost you if it goes into an infinite loop

~~~
herostratus101
Why would it even involve a loop? And even if your Lambda function were:

while True: pass

Lambda functions time out after a period of time that you specify (limited to
15m max).
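The timeout does put a ceiling on what one runaway invocation can cost. A sketch of that arithmetic, assuming an on-demand price of $0.0000166667 per GB-second (an assumed x86 list price; check current pricing) and ignoring per-request charges:

```python
# Upper bound on the compute charge for one invocation that runs until its
# timeout. ASSUMED_PRICE_PER_GB_SECOND is an assumed list price.
ASSUMED_PRICE_PER_GB_SECOND = 0.0000166667
MAX_TIMEOUT_SECONDS = 15 * 60  # the 15-minute limit mentioned above


def worst_case_invocation_cost(memory_mb, timeout_s=MAX_TIMEOUT_SECONDS,
                               price=ASSUMED_PRICE_PER_GB_SECOND):
    timeout_s = min(timeout_s, MAX_TIMEOUT_SECONDS)  # Lambda caps the timeout
    return (memory_mb / 1024) * timeout_s * price


# A 1 GB function stuck in `while True: pass` for the full 15 minutes:
print(f"${worst_case_invocation_cost(1024):.4f} per invocation")
```

The cap bounds a single invocation, not the number of invocations. If the poller above ran every minute and every run hit the cap, a 30-day month's 43,200 runs would come to roughly $648 under the same assumption, and recursive triggers can multiply invocations far faster.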

------
usr1106
Looked at my AWS bill today. On the positive side, the bill is zero, because I
have a voucher I got at a conference. But it expires at the end of the year, so
I had better understand what I will be paying for.

I spent quite some time in their Cost Explorer, but I don't understand a lot of
it. Most days I have some positive cost, which is probably my usage. Some days
I have a negative cost; that's probably when they transfer credits from the
voucher. They do this at the end of every month, but also irregularly on some
days when I use particular services and/or have higher-than-usual usage.

It appears to me that the negative balances combine the costs of several days
with the credit from the voucher. As a simplified example, I might see +1, +1,
+1, -6. So I "reverse engineer" this as: I "spent" 1, 1, 1, 3, and on the 4th
day they credited 6 from my voucher. Too bad the 3 is not visible; I need to
dig it out myself. In reality I use several services and they seem to credit
them on different days, so the reverse engineering is not really possible, at
least not without a major effort.
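The reverse engineering described above can be written down as a small Python sketch. It assumes each credit exactly covers all usage since the previous credit, and that on a credit day only the credit is visible, so that day's usage is the credit minus the visible usage before it. Both assumptions come from the example, not from AWS documentation.

```python
# Sketch: recover hidden usage on credit days, assuming each credit exactly
# covers all usage since the previous credit and hides that day's usage.
def reconstruct_usage(daily):
    """Turn visible daily amounts into (usage, credit) pairs per day."""
    result = []
    pending = 0.0  # visible usage since the last credit
    for amount in daily:
        if amount >= 0:
            result.append((amount, 0.0))
            pending += amount
        else:
            credit = -amount
            hidden_usage = credit - pending
            result.append((hidden_usage, credit))
            pending = 0.0
    return result


print(reconstruct_usage([1, 1, 1, -6]))
```

As the comment notes, this falls apart once several services are credited on different days.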

I remember many years ago it was possible to download hourly (probably also
daily) usage reports, i.e. usage in hours, KB, requests, etc., not in money.
I can't find them anymore. Does anybody know whether they still exist?

Also, to my surprise, I was billed for 135 SQS requests last month. Well, I
wasn't billed, because 1,000,000 are free. But my point is that I don't even
know what SQS is and I am sure I haven't used it directly. It appears to me
that they are "billing" me for their implementation details, because they
might use SQS queues internally. Is that how it works? If basically not using
the services at all causes 135 requests, how many would there be if I really
ran some production workload there?

All in all, very opaque. Thank you AWS for the voucher, but I am not impressed
with the billing transparency.

~~~
murphy214
I would love to have tooling for what AWS considers an I/O operation. I've
read the documentation quite a few times and I think I get it: basically every
read of 256 KB, or any read smaller than that, counts as one. But at what level
of the stack is that counted? I tried to find metrics/tools to count it the way
AWS counts but couldn't really find much.

The reason is that I have a library that reads a file iteratively into an
x-size buffer using bufio in Go, and I'm not exactly sure what optimizations
are happening that I can't see. At some points I'm advancing through a file a
byte at a time, which is by definition an I/O operation, I think (super
inefficient). Unfortunately, a lot of the cloud metrics don't give you enough
granularity or quick enough feedback to optimize.

~~~
wmf
I would bet an EBS IOP is the same as a Linux block device IOP which you can
monitor with iostat.
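To make the two comments above concrete, here is a rough Python sketch, with Python's `io.BufferedReader` standing in for Go's bufio. It rests on two assumptions: that SSD-backed EBS volumes count each I/O of up to 256 KiB as one operation and split larger ones into 256 KiB units (as described above and in AWS's EBS documentation), and that reads reaching the raw stream map one-to-one onto device I/O, which ignores the kernel page cache, readahead, and request merging. What it shows is that byte-at-a-time reads through a buffer reach the underlying stream only once per buffer fill.

```python
import io
import math

# Assumption: SSD-backed EBS volumes count each I/O of up to 256 KiB as one
# operation and split larger I/Os into 256 KiB units. Page cache, readahead,
# and request merging are ignored.
EBS_IO_UNIT = 256 * 1024


def estimated_ebs_ops(io_sizes):
    """Rough operation count for a sequence of device-level I/O sizes in bytes."""
    return sum(max(1, math.ceil(size / EBS_IO_UNIT)) for size in io_sizes)


class CountingRaw(io.RawIOBase):
    """In-memory raw stream that records the size of every read reaching it."""

    def __init__(self, data):
        super().__init__()
        self._data = data
        self._pos = 0
        self.read_sizes = []

    def readable(self):
        return True

    def readinto(self, buf):
        chunk = self._data[self._pos:self._pos + len(buf)]
        n = len(chunk)
        buf[:n] = chunk
        self._pos += n
        if n:
            self.read_sizes.append(n)
        return n


# Read a 1 MB "file" one byte at a time through a 64 KiB buffer.
raw = CountingRaw(b"x" * 1_000_000)
reader = io.BufferedReader(raw, buffer_size=64 * 1024)
bytes_read = 0
while reader.read(1):
    bytes_read += 1

print(bytes_read, len(raw.read_sizes), estimated_ebs_ops(raw.read_sizes))
```

With a 64 KiB buffer, the million one-byte reads reach the raw stream only a handful of times (about 16 with CPython's buffered reader), so the estimate is a handful of operations rather than a million. On a real instance, iostat, as suggested above, shows what actually reaches the block device.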

------
jackgill
Understanding the AWS bill is far harder than it should be. That being said,
there are some resources that weren't mentioned in this blog post. The
ultimate source of truth is the AWS Cost & Usage Report [1] which can be
delivered in Parquet and queried with SQL via Athena.

Although the Cost & Usage Report alone can solve many billing mysteries, in
some cases it's also necessary to go to CloudTrail logs to determine exactly
which user or application incurred charges.

[1]
[https://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2...](https://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/billing-reports-costusage.html)
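As an illustration of what such queries compute, here is a Python sketch that totals cost per service and tag value over Cost & Usage Report rows loaded as dictionaries. The column names (`line_item_product_code`, `line_item_unblended_cost`, `resource_tags_user_<key>`) are assumed to follow the snake_case style used when the report is queried from Athena; check your own report's schema. In Athena itself, the same result would be a GROUP BY over those columns.

```python
# Sketch: total unblended cost per (service, tag value) from CUR rows.
# Column names are ASSUMED Athena/Parquet-style names; check your schema.
from collections import defaultdict


def cost_by_service_and_tag(rows, tag_key="team"):
    tag_column = f"resource_tags_user_{tag_key}"
    totals = defaultdict(float)
    for row in rows:
        service = row.get("line_item_product_code", "")
        tag_value = row.get(tag_column) or "(untagged)"
        totals[(service, tag_value)] += float(row.get("line_item_unblended_cost") or 0)
    return dict(totals)


# Hypothetical rows, e.g. as read with csv.DictReader from an exported query result.
rows = [
    {"line_item_product_code": "AmazonEC2", "line_item_unblended_cost": "1.50",
     "resource_tags_user_team": "search"},
    {"line_item_product_code": "AmazonEC2", "line_item_unblended_cost": "2.25",
     "resource_tags_user_team": "search"},
    {"line_item_product_code": "AmazonS3", "line_item_unblended_cost": "0.75",
     "resource_tags_user_team": ""},
]
print(cost_by_service_and_tag(rows))
```

The "(untagged)" bucket is where untaggable spend such as bandwidth tends to collect, which is the gap discussed further up the thread.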

------
jasonkimtech
I'm one of the founders of [https://taloflow.ai](https://taloflow.ai) (the
company tied to the blog post above). We built Tim (taloflow infra monitor) to
save endless hours going through spreadsheets or Cost Explorer. We built it as
real-time dataflow with visualizations on Grafana so you can correlate events
such as deployments to how your costs change. We also built a model that
predicts your real-time costs inferred from infrastructure metrics that aren't
available through Cost Explorer.

Our tool is free for any devs spending less than $60K a year on AWS. Let me
know if you wanna test it out!

------
wazoox
A friend of mine makes a living by helping big companies navigate AWS
pricing and guessing what the bill will look like. He even built a complete
software suite for that:
[https://github.com/trackit/trackit](https://github.com/trackit/trackit)

~~~
scribu
From the screenshots in the README, I couldn't tell how it's different from
AWS' built-in cost explorer.

~~~
mring33621
I would guess that the human driving it is the value-add. I'm sure s/he uses
the custom built software to produce useful, actionable results for their
clients.

------
FpUser
My company was at some point hosting a boatload of large media files on Amazon
S3 for our clients to download. It was a financial disaster. As soon as
bandwidth from regular hosting services became available, I switched and
laughed all the way to the bank. Now switching to dedicated servers for even
cheaper.

~~~
Scoundreller
+1.

When it comes to web-serving, S3 is great for really bursty loads or lots of
tiny ones.

When you have a lot of throughput all the time, S3 will be an expensive
choice.

------
peterburkimsher
Is there a cloud computing service that provides a free tier that doesn't need
a credit card? I like to maintain a personal website, and I've had to
periodically hop from one platform to another.

I started on free.fr, my parents' ISP, with a web page built in iWeb. But then
I wanted a .com domain.

I moved to WordPress, but that didn't let me customise the layout.

Then I moved to Google App Engine (appspot), which is good, but it's blocked
in China.

Then I moved to OpenShift (rhcloud), which was great while it lasted. It
wasn't just for hosting a static web page - I could SSH into the server, at
last! But it shut down the free tier.

I tried Heroku, but I'm getting warnings about 80% of monthly usage even when
there's no content there, just a redirecting page.

Currently I'm using GitHub Pages, but I worry about how Microsoft will try
monetising that. It's also a pity not to have SSH, FTP, or SQL - all my apps
(e.g. Pingtype) have to run on the client in localstorage.

The company I'm working for spends over $3000 per month on AWS.

Whenever I read about these "free giveaway" AWS coupons that require
registering with a credit card, I just think that there's going to be a nasty
fee like this. So in practice I just run things locally on my laptop. If
there's a better provider, please tell me though! It's been a few years since
I last checked the options and moved everything over to GitHub.

~~~
Sohcahtoa82
Are you specifically looking for something free? Or are you just trying to
avoid waking up to a massive credit card charge?

If you don't get a lot of traffic, I'd go for an Amazon Lightsail instance.
$5/month includes 1 TB of data transfer. If you really don't like AWS,
DigitalOcean has similar offerings.

Does AWS allow you to pay with pre-paid Visa/MC cards? Could use one of those
to pay for the account to avoid a surprise bill from draining your bank
account.

~~~
peterburkimsher
Ideally something that doesn't need a credit card to register, yet has paid
tiers to allow scaling up if there's demand. A prepaid credit card sounds like
a decent idea; if I had less money in my TransferWise then I could probably
use that.

------
segmondy
I got $1000 AWS credit, I refused to use it for fear of ending up with a
phantom bill after I was done.

~~~
why-el
Genuinely curious if this has been done, but considering that $1000 is not a
lot of money, which suggests you are not running a big operation, why not use
it and tie the rest to a card with a limit, such as one from privacy.com?
Granted, Amazon might deny such cards, in which case, if you really want it,
you can set up a new debit card whose account has, say, a maximum you can spend?

~~~
Bootwizard
Just because you can't actually pay it doesn't mean they won't bill you for
it. That's just going to generate a debt to Amazon once the money runs out. It
won't magically turn off your AWS charges...unfortunately.

~~~
OkGoDoIt
I’ve had the card I use for AWS expire in the past, so I’ve got the billing
notifications from them. Basically they start emailing you as soon as the
charge fails and give you a couple months before they shut off your services.
As long as you pay the past due charges by then, there are no complications.
Based on the wording in the emails, it seems like they will just shut down the
account if you don’t pay the past due charges. Which feels like the right
outcome to me. I suppose they could send you to collections but companies that
do that tend to be more upfront about it.

------
k-ian
AWS is great about dogfooding, most all their services run on top of native
AWS (ec2, lambda, dynamo)... but they don't do that for billing. It's all just
fake money being thrown around internally.

~~~
lostlogin
Internal billing is such a joke everywhere I have had the misfortune of
stumbling across it.

------
tyingq
Not surprising. There's little benefit to AWS from it being clear. We are
pretty happy with Cloudability. Especially if you apply tags for app name, app
version, portfolio group, environment, etc.

~~~
ablekh
What's the Cloudability pricing? I can't find it on their website.

~~~
tyingq
Here's the list pricing. Pretty expensive, but in our case it saves a lot more
than it costs.

[https://aws.amazon.com/marketplace/pp/Cloudability-Apptio-Cl...](https://aws.amazon.com/marketplace/pp/Cloudability-Apptio-Cloudability/B075PYPH14)

Edit: Have never tried it, but this open source "Ice" tool looks like it might
be useful for smaller shops:
[https://github.com/Teevity/ice](https://github.com/Teevity/ice)

~~~
ablekh
Much appreciate all the info as well as your comments.

------
paulie_a
I still get charged a buck a month when I have zero services running. I tore
everything down a couple months ago and verified recently. I haven't had a
chance to complain because it is such little money but aws billing is complete
shit and their alerts are a dark pattern.

------
mst
They have their own solution for this. Which is fine, and may turn out to be
great.

But you might want to talk to people like @QuinnyPig from the Duckbill Group
before you assume the fix to your AWS issues is a third party vendor's
product.

------
buzzdenver
Their billing tools are notoriously poor. Just yesterday they confirmed for me
that Cost Explorer only has access to data going back 12 months, so good luck
trying to do year over year comparisons.

~~~
webpaymentsguy
I agree Cost Explorer leaves a lot to be desired compared to what third party
vendors offer, but CUR can give you more granularity.

Options seem limited from providers in general since Azure and GCP haven't
done much better in this regard - GCP cloud billing in particular felt less
far along than the other two providers.

------
rob-olmos
Found this issue with other SaaS-like providers too. Eg, a popular email
relay/delivery service has a per-email price if you go over your rate plan,
with no ability to set a hard limit for the account or per sub-user.

Compromised account or server? It'd be interesting for their spam filters to
catch most of it. But an accidental loop or issue in your code? (like another
commenter mentioned with a $30k bill). Yikes.

Incredible to lack such a basic feature to better protect an account
especially when money is involved.

------
pkphilip
I have also had instances of Amazon raising large bills against my account
when the stated services were not being used. I was able to fight the payment
and get it reversed because it just so happened that during this period my
account was locked, which also means that none of the Amazon resources
allotted to my account would have been in an active state.

------
dustingetz
Also see [https://www.cloudzero.com/](https://www.cloudzero.com/)

------
coleifer
Imagine buying a server and colocating.

~~~
jotm
This isn't 2005!

------
partiallypro
One thing I hate about AWS billing is that it doesn't separate out disk costs
from the VM compute. Also, is it just me or is RDS ridiculously expensive? I'm
fairly new to AWS, having mostly used (and continue to use) Azure.

~~~
inferiorhuman
_One thing I hate about AWS billing is that it doesn't separate out disk
costs from the VM compute._

Doesn't it? EBS stuff (storage and IOPS) is a separate line item, last I
checked. Ephemeral storage (if applicable) is included with the compute price.

_Also, is it just me or is RDS ridiculously expensive? I'm fairly new to
AWS, having mostly used (and continue to use) Azure._

I seem to remember it being around what an EC2 instance cost until you go to
multi-AZ and then you're paying for an extra instance. But I've only used RDS
with postgres and mysql type engines, none of the proprietary stuff that would
add on extra licensing fees.

~~~
jrockway
Nope, the RDS instances cost quite a bit more. For example, in us-east-1, an
unreserved t3.micro costs $91.104 a year, but a Postgres db.t3.micro costs
$157.68 a year. That particular RDS instance does not include any storage;
you pay as you go with EBS.

For the m4 series, RDS is almost double the instance cost.
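The t3.micro figures above work out as follows. The hourly rates are simply the comment's annual numbers divided by 8,760 hours, and storage is excluded, as noted.

```python
# Reproduce the t3.micro comparison from hourly on-demand rates
# (derived from the annual us-east-1 figures in the comment above).
HOURS_PER_YEAR = 24 * 365
ec2_t3_micro_hourly = 0.0104
rds_pg_t3_micro_hourly = 0.018

ec2_annual = ec2_t3_micro_hourly * HOURS_PER_YEAR
rds_annual = rds_pg_t3_micro_hourly * HOURS_PER_YEAR
premium = rds_annual / ec2_annual - 1  # fraction extra paid for RDS

print(f"EC2 ${ec2_annual:.2f}/yr, RDS ${rds_annual:.2f}/yr, premium {premium:.0%}")
```

That is roughly a 73% premium for the RDS instance in this example, before storage.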

~~~
WrtCdEvrydy
Yeah, RDS is pretty expensive but unlike raw compute you do get guaranteed
performance.

The issue with AWS is that it's easy to add new services without finding new
vendors so companies just spend more and more on AWS as features are not as
important as 'cost savings'.

------
codazoda
"There's money in confusion"

~~~
lostlogin
The CEO of a big (by NZ standards) telco, Theresa Gattung, was famously quoted
as saying “What has every telco in the world done in the past? It's used
confusion as its chief marketing tool. And that's fine”. She was rightly
pilloried for this, but the honesty is admirable.

[https://www.nzherald.co.nz/business/news/article.cfm?c_id=3&...](https://www.nzherald.co.nz/business/news/article.cfm?c_id=3&objectid=10380894)

------
dustingetz
Blockchain for AWS billing. Money streaming, in real-time!

------
usr1106
Not really a billing opaqueness issue, but endless Lambda loops are a nasty
spending risk. One idea for at least partially protecting against them:
https://theburningmonk.com/2019/06/aws-lambda-how-to-detect-and-stop-accidental-infinite-recursions/

Disclaimer: I have not personally used Lambda for anything serious, so my
experience is limited.
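One simple guard against accidental recursion, sketched in Python: every event the function emits carries a hop counter, and the function stops once the counter passes a limit. The `hop_count` field, `MAX_HOPS`, and the `send` callback are hypothetical names, and this is not necessarily the approach taken in the linked article.

```python
# Sketch of a hop-count guard against accidental recursion. The "hop_count"
# field, MAX_HOPS, and the `send` callback are hypothetical.
MAX_HOPS = 10


class RecursionLimitExceeded(Exception):
    pass


def next_event(incoming, payload):
    """Build the event to send downstream, incrementing the hop counter."""
    hops = int(incoming.get("hop_count", 0)) + 1
    if hops > MAX_HOPS:
        raise RecursionLimitExceeded(f"stopping after {MAX_HOPS} hops")
    return {**payload, "hop_count": hops}


def handler(event, context, send=None):
    """Do some work, then pass the result on via `send` (e.g. a queue or another function)."""
    out = next_event(event, {"result": "processed"})
    if send is not None:
        send(out)
    return out
```

Raising instead of sending ends the chain, and the error shows up in the function's logs and metrics rather than as a growing bill.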

