You can offload some of that to paid services like CloudHealth, but it still takes engineering resources to act on the costs you find out about. E.g. you may want a 'fully immutable' data pipeline (DPL), with each job running on a fresh instance every time, but that usually means getting hit with a full billing hour even if the job takes <1 minute. So you have to use schedulers/containers, but then you're working outside the 'instance' paradigm and run into the problems Segment talks about.
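The hourly-billing penalty is easy to put numbers on. A rough sketch; the $0.25/hr rate and job sizes are assumed, purely illustrative figures, not real AWS prices:

```python
# Rough cost comparison: per-job fresh instances under hourly billing vs.
# packing jobs back-to-back onto one instance. The $0.25/hr rate is an
# assumed, illustrative price, not a real AWS quote.
import math

HOURLY_RATE = 0.25  # assumed $/hour for the instance type

def cost_fresh_instances(num_jobs, job_minutes):
    """Each job gets a fresh instance and is billed a full hour minimum."""
    billed_hours_per_job = max(1, math.ceil(job_minutes / 60))
    return num_jobs * billed_hours_per_job * HOURLY_RATE

def cost_packed(num_jobs, job_minutes):
    """Jobs run back-to-back on one long-lived instance; pay total runtime."""
    total_hours = math.ceil(num_jobs * job_minutes / 60)
    return total_hours * HOURLY_RATE

# 100 one-minute jobs: 100 billed hours vs. 2 billed hours.
print(cost_fresh_instances(100, 1))  # 25.0
print(cost_packed(100, 1))           # 0.5
```

The 50x gap for short jobs is exactly why people fall back to schedulers and containers despite the operational downsides.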
To control costs and provide a good UX for end users, we eventually started regressing to mainframe-style computing with the ridiculous X1 RIs, which was an interesting experiment. These cost about as much as a house down payment, but on the plus side users are really pleased when they see 128 CPUs in htop.
It'd be real nice if the customer didn't have to do this, but AWS would rather spend their engineering resources competing with Google at beagle-identification APIs or trying to vertically integrate with poor clones of GitHub. (I realize this is slightly unfair.)
I know, I know - you're saying 'lambda' is the answer - won't get into that here.
This is opposed to the 'cloud' thing of get an instance just big enough to run your job, run lots of them if you need to scale.
Dealing with Amazon billing is absolutely infuriating. We are also on their EDP (Enterprise Discount Program), and all of it is done MANUALLY. There is literally NO way I can check hundreds if not thousands of instance hours to ensure there was no error in their processing. It's complete madness. We use a dozen services, and EBS, bandwidth, and inter-AZ bandwidth are impossible to audit.
That being said, we have Looker sitting on top of Redshift. We make sure each "team" has a tag (infra won't spin up without it), and then we can easily set budgets, watch trends in Looker, and track spend by team and by product name. Our finance team loves it.
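The enforcement side of that is straightforward to sketch. A minimal, hypothetical version of the check; the dict shape imitates EC2's `Tags` format, and in practice the input would come from something like boto3's `describe_instances`:

```python
# Sketch of a "won't spin up without a tag" check: flag resources missing a
# required cost-allocation tag. The dict shape imitates EC2's Tags format
# ([{"Key": ..., "Value": ...}]); the fleet below is entirely made up.

def untagged(resources, required_key="team"):
    """Return IDs of resources that lack the required tag key."""
    missing = []
    for r in resources:
        keys = {t["Key"] for t in r.get("Tags", [])}
        if required_key not in keys:
            missing.append(r["InstanceId"])
    return missing

fleet = [
    {"InstanceId": "i-aaa", "Tags": [{"Key": "team", "Value": "data"}]},
    {"InstanceId": "i-bbb", "Tags": [{"Key": "env", "Value": "prod"}]},
    {"InstanceId": "i-ccc"},  # no tags at all
]
print(untagged(fleet))  # ['i-bbb', 'i-ccc']
```

Run as a periodic audit it catches stragglers; wired into provisioning it becomes the hard gate described above.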
The data is there, but Amazon REALLY makes you do the work. Their default billing reporting is useless.
Will publish on HN when our tool is open sourced.
It's been very helpful for noticing and tracking down extra costs as they come in.
Unfortunately, the project has been largely abandoned and the current version in the Netflix GitHub repository won't work out of the box for many companies due to lack of support for new AWS regions, instances, reservation types, and services, as well as showstopping bugs (such as https://github.com/Netflix/ice/issues/210). In order to effectively use ICE today most users will need to maintain their own patched fork.
I'm currently campaigning to create a community-maintained version of ICE, with committers from multiple organizations, in order to revive the project: https://github.com/Netflix/ice/issues/240#issuecomment-29960...
If you want good information by tag, by product, by detailed date range, etc., you need it in a DB.
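In miniature, that's just grouping billing line items by an arbitrary dimension. A sketch with made-up field names that loosely resemble a Cost & Usage Report row (not the real schema):

```python
# Grouping billing line items by an arbitrary dimension, in miniature.
# Field names are made up to loosely resemble a Cost & Usage Report row,
# not the real schema.
from collections import defaultdict

def rollup(line_items, key):
    """Sum cost per distinct value of `key` (missing values -> 'untagged')."""
    totals = defaultdict(float)
    for item in line_items:
        totals[item.get(key, "untagged")] += item["cost"]
    return dict(totals)

items = [
    {"product": "EC2", "team": "data", "date": "2017-08-01", "cost": 120.0},
    {"product": "S3",  "team": "data", "date": "2017-08-01", "cost": 30.0},
    {"product": "EC2", "team": "web",  "date": "2017-08-02", "cost": 45.0},
]
print(rollup(items, "product"))  # {'EC2': 165.0, 'S3': 30.0}
print(rollup(items, "team"))     # {'data': 150.0, 'web': 45.0}
```

Once the raw line items are in a real database, the same grouping is a one-line GROUP BY instead of a script.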
We just migrated our data infrastructure to GCP. One of the big motivators was experiences like this with AWS. We've got near-realtime GCP cost dashboards in BigQuery, and the only meaningful work on our end to make that happen was writing the SQL queries.
The company posting this recently managed to save $1,000,000 annually on their AWS bill.
Having confusing billing makes it harder to spot that you're paying too much.
It is the same reason Dropbox doesn't include anything in the web interface to find large files.
Edit: tossing in some options to accomplish this; not really related to the current discussion:
Space usage analyser for Dropbox? | https://webapps.stackexchange.com/questions/47440/space-usag...
Unclouded - Cloud Manager | https://play.google.com/store/apps/details?id=com.cgollner.u...
Imagine you lead the product team at AWS.
- The team is reviewing what to build for the next quarter.
- You have a long list of revenue-generating features
- At the end of the list there's one more item that'll help your customers spend LESS money on your product.
- You can only build so many things and you know that you, your department, and Amazon the company will get pats on the back in $$$ form if you focus on the revenue generating features
- Sure, it'd be really good to help longterm customers understand their costs better, but your biggest ones have the resources to build that infrastructure themselves anyways.
And that is why this is so hard.
How many customers quit Amazon due to hard-to-read billing?
I suspect those are the big questions that explain why this feature sits lower on the priority list...
"That lightbulb is broken!"
"I was going to replace it, but I couldn't find a customer to pay for it, and nobody has threatened to change to a brighter competitor"
It's about focus. There are only so many hours in a day, so many dollars, so many developers. If AWS got a nicer billing dashboard but Lambda was delayed by six months, how would that have changed AWS's place in the market?
Yes, you could take this too far and never change a light bulb. But I would answer, "I will be more productive with a nice light bulb, so it is worth the five minutes to change it."
It's a complicated tool for sure, but once it was all set up we finally had visibility into a complex multi-account AWS spend, and could start generating automated cost reports for each company business unit and major customer.
I wouldn't recommend going to the effort of building a custom setup... AWS billing is just too complicated, and it changes frequently, adding even more layers of complexity. As one example, the recent change to add RI size flexibility completely changed the calculations for RI costs and recommendations.
I've also used cloudability and cloudcheckr in the past, but both systems had serious drawbacks. In my opinion cloudhealth is a much more advanced/professional system at this point.
We don't yet introspect ECS clusters to assign a portion of spend to tasks, but we do breakdowns of services, tags, instances, ELBs, etc. across 1-N accounts. For S3 we can actually introspect buckets and produce rollups by object metadata, as well as heat maps (which objects are being accessed a lot).
1. Obviously shut down and remove anything you're not using -- instances, EBS volumes, etc.
2. ELB -- paying per-request adds up with many small requests, and in some cases you pay double on the bandwidth. We switched to terminating SSL with nginx on the frontends and using DNS load balancing to save a pretty good chunk of money.
3. VPC endpoints for S3 -- can be a significant saving if you're doing a lot of I/O on S3 from private instances in a VPC over a NAT gateway.
4. New instance types -- the newer instance types typically have more compute/RAM/disk for less money, so migrate to them.
5. Consolidate services onto a smaller number of VMs where it makes sense.
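Item 3 is the easiest to size up front: S3 traffic from private instances through a NAT gateway pays a per-GB data-processing charge that a (free) S3 gateway endpoint avoids. A back-of-the-envelope sketch; the $0.045/GB rate and the traffic volume are assumed figures, not current AWS pricing:

```python
# Back-of-the-envelope saving from adding an S3 VPC gateway endpoint and
# skipping the NAT gateway's per-GB data-processing charge. The $0.045/GB
# rate and the 500 GB/day volume are assumed figures, not current pricing.

NAT_PER_GB = 0.045  # assumed NAT gateway data-processing rate, $/GB

def monthly_nat_savings(gb_per_day):
    """Approximate monthly charge avoided by bypassing the NAT gateway."""
    return gb_per_day * 30 * NAT_PER_GB

print(round(monthly_nat_savings(500), 2))  # 675.0
```

At heavy S3 volumes this one route-table change can be worth hundreds of dollars a month per VPC.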
Once you learn some of these tricks, you just sort of do them that way from the start on new products. Best practices in terms of spend on these platforms.
We tried initially building it on our own, but the engineering cost was way too high, especially given all the quirks and changes in AWS.
Sure, there are several pros and cons to weigh, but if the application isn't locked in, a migration to metal could make sense. Has anyone gone through this?
The higher-ups were lured in with promises of easy elasticity that would let them save money compared to our relatively overprovisioned DCs, but the reality is that only some projects benefit from this, and certain teams that need large numbers of heavy-duty servers for baseline usage are much more expensive.
We have an internal team of 5-6 who are responsible just for building infrastructure on AWS for other teams and prodding teams to get into AWS.
Probably looking at $500k in engineering, staffing, and training costs at the very least, just to get back to where you started, not to mention the progress you didn't make on the product.
Source: 6 figure/mo AWS spend. We use pretty much every infrastructure piece in AWS in multiple regions and to replicate that flexibility in our own data centers would be an incredible cost.
And even in your case, I wonder if it would have gotten off the ground if you had to deal with infrastructure up front rather than as an efficiency move after the basic application was built, up, running, and stable. Even if the cloud were mostly useful just for ramping new systems up to the point where they stabilized, and the infrastructure and ops needs were clear and could be assessed at leisure and efficiently provided with dedicated resources, that would still be a huge role.
tl;dr: they like the IaaS (ECS, Dynamo, ...), they need a lot of resources, and they're moving too fast to build it out in-house.
It picks up your AWS tags and then lets you allocate them to projects/departments/cost centres. Reporting on untagged services is available OOTB.
The business model is a 30 day trial and then you're charged a percentage of your AWS monthly spend. It worked out cheaper than building something ourselves.
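Whether the percentage fee beats building is a quick back-of-the-envelope calculation. A sketch where every number (spend, fee percentage, build and maintenance costs) is an assumption for illustration, not a real quote:

```python
# Break-even between a %-of-spend billing tool and building in-house.
# Every number here (spend, fee, build and maintenance costs) is an
# assumption for illustration, not a real quote.
import math

def months_to_break_even(build_cost, monthly_spend, fee_pct, maintain_per_month):
    """Months until the in-house build is cheaper overall than the vendor fee.
    Returns None if the vendor fee never exceeds the maintenance cost."""
    monthly_saving = monthly_spend * fee_pct - maintain_per_month
    if monthly_saving <= 0:
        return None
    return math.ceil(build_cost / monthly_saving)

# $200k/mo spend at a 2% fee ($4k/mo) vs. a $150k build plus $5k/mo upkeep:
print(months_to_break_even(150_000, 200_000, 0.02, 5_000))  # None: building never pays off

# Same build, but only $1k/mo upkeep: saving $3k/mo, so 50 months.
print(months_to_break_even(150_000, 200_000, 0.02, 1_000))  # 50
```

With plausible maintenance costs the break-even horizon is years out, which is why the percentage fee can genuinely be the cheaper option.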
For example, we switched to purchasing reserved instances up front to try to reduce our monthly bills (albeit at a higher up-front cost). But trying to match up reserved instances with EC2 or Elastic Beanstalk instances that are actually running is a real headache.
Several times, we've been caught with useless reserved instances sitting there not being utilised - because we had shut down a project or similar. Had we been able to see "Oh, we have 10 t1.micro reserved instances, but only 7 are being utilised by active EC2 instances at the moment" then we could easily know that we could provision another 3 t1.micro instances at no extra cost to experiment with something or for a client project. Or we could convert those t1.micro instances to another region or instance type to meet demand elsewhere.
It would also be handy on their billing reports if they delineated or showed the breakdown of the total usage hours against 'on demand' and 'reserved' instances. I know they _sort of_ do that now, but I think it could be better laid out so we can see at a glance whether we are getting best use from our reserved instances.
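That "reserved vs. actually running" comparison is mechanical once you have both counts. A sketch of the matching, keyed on (instance type, region); the counts below are hypothetical and would really come from the billing report or the EC2 API:

```python
# Matching reserved instance counts against what's actually running, keyed
# on (instance type, region). The counts below are hypothetical; real data
# would come from the billing report or the EC2 API.

def ri_gaps(reserved, running):
    """reserved/running: {(instance_type, region): count}.
    Returns (unused RIs, running instances not covered by an RI)."""
    unused, uncovered = {}, {}
    for key in set(reserved) | set(running):
        diff = reserved.get(key, 0) - running.get(key, 0)
        if diff > 0:
            unused[key] = diff
        elif diff < 0:
            uncovered[key] = -diff
    return unused, uncovered

reserved = {("t1.micro", "us-east-1"): 10, ("m4.large", "us-east-1"): 2}
running  = {("t1.micro", "us-east-1"): 7, ("m4.large", "eu-west-1"): 2}
unused, uncovered = ri_gaps(reserved, running)
print(sorted(unused.items()))
print(sorted(uncovered.items()))
```

The unused side tells you what you can launch "for free" (or convert to another type or region); the uncovered side tells you where on-demand hours are quietly piling up.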
AWS also published a quick guide on how to get the billing data into Redshift and use QuickSight to analyze it: https://aws.amazon.com/de/blogs/aws/new-upload-aws-cost-usag...
Edit: Sample invoice: https://www.microsoft.com/en-us/download/details.aspx?id=388...
If you have an EA agreement, half (or more?) of the billing report interfaces don't work. If you have to use the EA portal due to an EA agreement... let's just say that within the past year they have been through some rough periods: random undocumented format changes, magically disappearing billing reports for a few hours at a time, random intermixed Windows- and Linux-style line endings in the same file, periods where they are DAYS behind your actual usage, etc.
Source: I wrote an infrastructure inventory and billing/usage aggregation system that normalizes data from Azure, AWS, and vCenter to help our company easily track spending by service, tags, etc. across all of the platforms where we have a presence.
That still doesn't solve the problem of making sense of each item.
So what's the advantage of the Azure invoices?
We've got a prepaid service that lets you ensure your AWS bill never goes above a certain amount, as well as flat discounts on your monthly spend.
Installing ICE to see costs in detail https://github.com/Netflix/ice
EDIT: lies, first use is 1688: