As AWS Use Soars, Companies Surprised by Cloud Bills (theinformation.com)
155 points by kaboro 28 days ago | 161 comments

When I joined a particular startup 4 years ago I was surprised that they owned their own servers mounted in racks in a rented colo. I was all, how retro! But then the head of IT showed me the cost savings over AWS and how he orchestrated it all using VMware images, and I saw he had a point. He saved more than enough money to pay for his full-time salary to keep it all going, even dealing with hardware maintenance. What was unique was that the company had a predictable load that didn't change much. If we had needed to change capacity level drastically over a regular cycle then AWS would have started to make sense.

I have worked at a startup that was paying > 20k per month for AWS. I am certain the whole system could run on one dedicated blade. There was just this culture (made easy with docker containers) of "spin up another box" for every silly little tool. I have met self-taught dev ops people who did not understand that you could create more than one user on an AWS Ubuntu machine. I was like: "if cost is a big issue, then let's spin up one beefy machine and create users for each dev", because that's what everyone using Unix did back in the day (it works fine). The reply was "that's not possible", so every dev got their own dedicated instance that was 99% idle. We are paying >3k per month to AWS and the business team is complaining about waste. sigh. old guy rant over.

Did you weigh these costs against the value of the developers not stepping on each other's toes using the same instance? Separate users with sudo privileges provide zero isolation.

I'm unsure what the use case was, but if it was a testing or staging environment dedicated to each developer, having a separate right-sized instance for each is quite sane and reasonable.

EC2 instances such as the T2/T3 series are cheap enough that there's normally not a need to be so miserly in my experience.

>> I am certain the whole system could run on one dedicated blade. There was just this culture (made easy with docker containers) of "spin up another box" for every silly little tool.

> Did you weigh these costs against the value of the developers not stepping on each other's toes using the same instance? Separate users with sudo privileges provides zero isolation.

Each developer's machine is a separate instance. If the whole system could run on "one dedicated blade," then the developers could easily run a local instance on their own machines (either directly or in a VM) without stepping on anyone's toes.

Using AWS for local development seems wasteful and over-complicated to me. The only good reason I can see for doing it is if you're really vendor-locked into AWS-proprietary services, and it's impractical to run your code anywhere except on AWS (but that's arguably a mistake in itself).

> Each developer's machine is a separate instance. If the whole system could run on "one dedicated blade," then the developers could easily run a local instance on their own machines (either directly or in a VM) without stepping on anyone's toes.

What the parent you're responding to likely meant was that, by running everything on one system ("one dedicated blade"), you are forcing everything on that system to share certain things: the set of system-installed packages; the nginx configuration, if nginx needs to be installed and running for all of the services on that machine; and even the OS itself (e.g., if it needs to go down for updates, or needs, say, a dist-upgrade, then all of the associated services on that instance have to dist-upgrade together, and one laggard can hold everyone back).

Even in the case of developers using an instance on AWS, the same applies, really: a dev, legitimately developing some component, could have cross-talk with others, in all of the same ways listed above. I do agree that giving devs dedicated instances might not be appropriate in many instances, and it perhaps wasn't appropriate in yours. But there are two points here: whether the devs needed those instances, or could develop on local laptops, and whether putting all instances or users on a single instance works well.

Even for development, given that most corps seem to love giving OS X laptops to devs who code primarily for Linux servers, there might be advantages to allowing them to develop on the actual target hardware.

> There was just this culture (made easy with docker containers) of "spin up another box" for every silly little tool.

Docker here should make it rather trivial to combine everything onto a single instance, since that's what isolation gets you: a well defined unit that doesn't have endless unknown dependencies on the host system.

Sounds cheaper to just give them VMware Fusion and a dev OS image which is mostly identical to the target architecture.

I really don't see why many run their environment on macOS through Homebrew etc., which is surely going to require modifications here and there when deploying to Linux servers.

An old job had that setup, with about 15 devs per instance. It worked surprisingly well, where each developer had sudo access but was asked to 'use it with care'. It helped that there was a mix in the type of work, so not everyone was doing resource-intensive work, etc.

Can you not run a docker container for each dev who would log into it, so they're isolated to begin with?

Of course ports can't be shared between the app servers, but you can get around that anyhow.

If the development environment is comprised of multiple container images that would likely justify dedicated small EC2 instances.

Knew a place that turned off the developers' test environment because it was costing 10x that per month. Which was as popular and successful as you can imagine. I'm.... skeptical about the benefits of the cloud architecture they actually had, given those costs.

Our org just implemented this. A script turns off all services in lower environments (dev, qa) at night; another one turns 'em on in the morning. Of course if you need to, you can run your own script to turn on your services "if you need to".
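For what it's worth, the decision logic behind that kind of nightly shutdown is tiny. Here is a minimal sketch in Python, assuming a 07:00 to 20:00 working window (the comment only says "at night" and "in the morning") and a hypothetical `manual_override` flag standing in for the "run your own script" escape hatch. It only decides whether an environment should be up; the actual start/stop calls against the cloud provider are left out.

```python
from datetime import datetime, time

# Hypothetical values: the comment only says "at night" and "in the morning".
WORKDAY_START = time(7, 0)
WORKDAY_END = time(20, 0)
LOWER_ENVIRONMENTS = {"dev", "qa"}


def should_be_running(env: str, now: datetime, manual_override: bool = False) -> bool:
    """Decide whether services in `env` should be up at `now`."""
    if env not in LOWER_ENVIRONMENTS:
        return True  # anything outside dev/qa is left alone
    if manual_override:
        return True  # someone ran their own "turn it on" script
    # Lower environments are only up inside the working window.
    return WORKDAY_START <= now.time() < WORKDAY_END
```

A scheduler would call this every so often and start or stop services when the answer changes. Note that it says nothing about the payroll cost of people waiting on a cold environment.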

This is being sold as a "step in the right direction", where "right direction" is anything that reduces cloud costs. (Payroll costs are apparently invisible, so the time people waste dealing with this baloney doesn't count. Not to mention the annoyance.)

I'm tempted to point out that they could save a lot more money by merging microservices into macroservices....

I've worked for a certain IoT startup that at one point paid $850k/month (eight hundred and fifty thousand USD per month) for AWS usage. After about a year of full time work by a team of SREs and programmers, the bill was made to go down by half.

Any breakdowns on cost?

That's like half of one engineer. Less if you include all of the overhead of having an employee. That can easily be the right choice.

> I have met self taught dev ops people who did not understand that you could create more than one user on an AWS Ubuntu machine.

Wait, you're saying there's a Bachelor of Science DevOps degree?

They are very young, nice kids, who have started from a pure AWS standpoint. They started with AWS tutorials. These tutorials did not teach the full Unix environment. The gaps in their knowledge are like the Grand Canyon. If they just read the O'Reilly book on Linux (http://shop.oreilly.com/product/9780596154493.do), they would be miles ahead. But their only interest is docker/kubernetes. Those technologies are great, but they simply have massive gaps in their knowledge of the fundamentals. They don't seem to like reading technical books from cover to cover. So they just read tutorials far enough to get the system started. Then they cut/paste from Stack Overflow when problems arise. If that does not work, they blame the tools ("couched is too slow!"). Again, it's not their fault, just the new dev culture. I freely admit I am an old fogey.

We agree.

I interviewed at (a fast growing real estate startup) and as an old guy, I highlighted my vast enterprise Linux engineering experience. The response from management was, oh, that's nice but doesn't matter here. I can teach people the command line in one day. This is a big AWS and exclusively Linux shop.

What do you say to that?

From my experience, the hardcore Unix folks should stick to jobs that highlight that type of knowledge. If you join an AWS shop, you're never going to dig into low level things like the proc filesystem, and the arcane knowledge of knowing every single tool just isn't used. I've gone from remoting into machines on the daily, to not using SSH for almost a year now. Just depends on the job I suppose. Went from having knowledge of all the different signals, and getting really good at tcpdump/gdb, to just writing Terraform/CloudFormation scripts, and nuking an instance from orbit if there were any issues with it. Such a different workflow.

Yes, BUT:

Grey-haired Linux folks have hammered on complex, changing, sometimes esoteric systems in different companies with unique challenges, COMBINED with having had to learn a programming language. To view these people as obsolete is the ultimate folly of youth here in IT.

To suggest that it's all different now, that your Shell/Perl/Webserver/database/caching/networking experience somehow has nothing on Python/Terraform/AWS/Docker, really shows the inexperience of hiring managers.

I learned both, and I can tell you AWS+someyaml+Python are MUCH easier than any of the stuff I listed above. Yet, there's a new generation of hiring managers that believes otherwise.

Today at work I had to explain absolutely basic DNS to someone. I dunno, call me skeptical about this new paradigm business. The paradigms look awfully familiar.

I'm on the business side, and am relatively young. Have dabbled with computer stuff since middle school (built desktops with my Dad, internet privileges revoked for a year in middle school for hacking the Oracle system, have run linux on my non-work laptop for the past ~5 years, etc.), but never pursued serious study whether solo or in a formal setting.

Both with tech and non-tech related learning, my autodidact process (largely determined by growing up through the late 90's and early aughts) was the following:

(1) Google search or Amazon search

(2) Blitz-read some website I thought might be relevant from the Google search results, or blitz-read for author and subject matter info I thought might be relevant from the Amazon search results

(3) Stop at (2) because I had pieced together a solution or had gleaned a sufficient amount of information; or repeat (1) and (2) until piecing together a solution or gleaning...

This rather clunky but speedy (notice I did not say efficient) process likely accounts for far more of my knowledge and skill base than I'd care to admit to friends and colleagues.

All this to say: management's response to you is representative of a default mode of thinking determined (to a non-trivial degree) by the environment in which they grew up.

This default mode of thinking has a number of short-term benefits, but often results in many long-term headaches. Indeed, you really can teach someone the command line in a day. However, this someone will end up doing stuff I do with the command line, such as using ls 100x more than is necessary or almost always changing directories one at a time.

“Thanks for your time. Have a nice day.”

Documentation is no replacement for experience.

EDIT: I agree that OP's choice of words makes the idea expressed ambiguous.

I just don't understand what a "self-taught" DevOps person is -- as opposed to a someone-taught? Self-taught usually means "not educated formally."

All DevOps / SysAdmins / etc. are "self-taught." So it's unclear what the OP meant.

It can, but if that blade or the enclosure dies, the system is SOL. Also, procurement can be a pain.

I agree completely.

I consult with multiple customers and they all want cloud. It's "Cloud First" and any discussion to the contrary results in lost business. If you do that you are considered a Dinosaur.

On the issue of predictable load, most businesses not only have predictable load but a lot of capacity is just allocated and never used. VMware serves such scenarios even better by going for heavy over-allocation. Everyone will demand 10 servers with 8 vCPUs and 64 GB of RAM without actually using even 10% of the allocated capacity.

But hey Cloud is cool so I have to be on it.

You can be cloud first and still not squander money on idle resources...

That requires culture change and it's hard for big boys with checkbooks.

If all you’re doing with AWS is a bunch of EC2 instances - the equivalent of a bunch of VMs - yes you will spend more and not get any operational experiences.

That’s like the old school Netops people who got one AWS certificate, did a lift and shift of a bunch of VMs and then proclaimed their client has “moved to the cloud”. The client still has the same amount of staff, same processes, etc. Of course it isn’t going to save any money.

That's the reality of it. Any proper move requires longer term planning (> 2 years) and currently CIOs themselves don't usually last 2 years. Any project however complex needs to be executed in a quarter which means "Lift and Shift".

Sad reality.

This. If you can predict your workload with some accuracy, and have someone who knows how to design systems- you can save a lot of money. AWS' value is that "we can do everything except write your code for you" which you end up paying a lot for.

If you have someone who knows how to design systems and your workload is constant most of the time with occasional spikes, you could run your base load on your own machines and offload the spikes to AWS.
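To make the base-load-plus-burst idea concrete, here is a rough sketch in Python. The function name `split_load`, the capacity figure, and the demand numbers are all made up for illustration; the point is only that anything up to your own capacity stays on your machines and the excess goes to the cloud.

```python
def split_load(demand, base_capacity):
    """Split per-period demand into what own hardware serves and what bursts out.

    `demand` is a list of load figures (any unit, e.g. requests/s) and
    `base_capacity` is what the owned machines can handle in the same unit.
    """
    own = [min(d, base_capacity) for d in demand]
    burst = [max(d - base_capacity, 0) for d in demand]
    return own, burst


# Hypothetical week: mostly flat, one spike.
own, burst = split_load([40, 45, 120, 50], base_capacity=50)
# own   -> [40, 45, 50, 50]
# burst -> [0, 0, 70, 0]
```

Only the `burst` part would be billed at cloud rates, which is why this only pays off when the spikes are occasional.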

Yep. There is a 99.99% chance one of the following will happen:

- Your Colo has direct connect to AWS, so you can deploy EC2 under times of load

- Your Colo offers some sort of virtualized infrastructure that you could wire into your network and offload under times of load

- Your Colo has a virtual server provider running that you can offload to in times of load.

- Your Colo is close enough to one of the big 3 that a Site to Site will work fine

... isn't that the hardest part, predicting workload, for many companies? Scaling up within a colo can be troublesome due to procuring hw, having remote hands rack it, setup, etc.

If you can predict workload then you could also reduce your cloud bills, in theory, by spinning instances up and down.

Many are going the cloud route because the cost of entry is much lower when you start developing, but eventually, yes, the cost skyrockets and you do end up paying for putting little forethought into the real workload requirements.

> Scaling up within a colo can be troublesome due to procuring hw, having remote hands rack it, setup, etc.

One senior sysadmin/infrastructure engineer with on-prem experience is enough to scale up your colo presence. You can either assemble your own servers or custom order and have them shipped directly to the colo. Remote console and power management (PDUs) is a solved problem. Out of band management, also solved. Virtualization, solved. Push your VM snapshots, content, artifacts, and database backups into your object store of choice for offsite backups. Ideally, your colo facility is within driving distance of your office if you absolutely need physical access. Otherwise, you pay someone at the colo a few hundred bucks a month for remote hands services.

EDIT: k8s arguably makes this easier than in the past. You want to run on your own cluster? You can. You need to burst your containers to spot instances in someone else's cloud? You can.

All the problems you consider solved are only solved if you have the tooling from AWS/Google or half solved if you have VMware.

With physical hardware, the sysadmin will spend a lot of his time fighting the hardware supplier and the colo to get hardware in place and running, then reinvent the tooling. Any trip to the facility is half a day lost.

Many organizations run their own hardware successfully without the tooling I mention. There are problems, just different problems from cloud with significant cost savings realized. Why would you be fighting your hardware supplier or your colo? I have never had a hardware vendor "fight me", and only once had challenging negotiations with a colo provider.

Have you never lost half a day of getting work done because Amazon's EC2 API was failing to launch new instances? I have! [1]

To simply hand-wave away hosting your own hardware as infeasible and more costly (especially when it's been proven to be less costly) is unproductive at best.

[1] https://hn.algolia.com/?query=AWS%20outage&sort=byPopularity... (HN search of "AWS Outage")

The cost savings have never been proven. If anything, they're proven to be miscalculated every time someone gives his numbers.

Self hosting is doable up to 2 or 3 servers with a very few services. Beyond that it's really suffering from the lack of tooling, the lack of isolation and the lead time of weeks to get anything more in place.

DELL wants to re verify by phone 3 times before they bill or ship anything. Don't ask me why, I don't know, I keep begging them to stop doing that.

I've had the API down about 4 times in a year, always for specific types of instance, if I remember the count well. Never more than a few hours. It's nothing in comparison to the weeks it takes to ship and set up physical hardware.

Funny, I managed to run a globally-dispersed on-prem system with over 300 individual physical machines (well, blades in racks) with one other guy no problem for many years. I was dealing with four different hosting providers doing remote hands and never had any issues whatsoever.

About the most complicated tooling we had was Anaconda.

I think I lost three hard drives in that time and all three were hotswapped with zero downtime under warranty. No other failures. And that's not an outlier - I've been doing this 25+ years and the hardware failure rate has been pretty consistently low.

Similar experience here roughly, albeit mine is at a smaller scale.

The key phrase in your post is 'doing this 25+ years'.

What you're providing is the /confidence/ that you can do it. Most outfits nowadays probably have no one who has that.

I think there's a sliding scale with cloud services.

At the ultra-small scale (1 person company), it's arguably cheaper to run from a single server you bought.

Mid-sized companies (~2-1000 people) probably benefit the most from cloud computing because they need fewer dedicated salaried employees to manage them.

Beyond ~1000 employees however, you're going to need that expertise regardless of whether you use cloud or on-prem. And the markup of those thousands of cloud machines will start to become significant. You'll need to do a cost/benefit analysis and I can see it going in either direction.

The cloud's cost savings are not in your 300 individual machines. In fact, I'd bet that your 300 machines are much cheaper than the equivalent on AWS.

Rather, the cost savings come from the salaries of you and the "one other guy" you mentioned. If cloud hardware can bring your staff of 2 down to 1, then the cost savings are huge. More-so if they can bring it down to 0 and have devs manage the hosted infrastructure. Each of your salaries is likely near or higher than the cost of running all those machines.

In my experience with cloud - pushing 8 years now - it's a total fallacy that you can just run without admins. Every time I've come in to clean up a giant mess left by "devops" who were actually just devs who knew nothing about administration of servers I've found massive amounts of waste, duplication, zero optimizations and poorly operating systems that even an intermediate admin would have helped to mitigate. One place, just doing a bit of tuning and moving a few services on-prem resulted in a $200K/year savings.

Developers and Administrators have different priorities and I can count on one hand the number of true "DevOps" people I know - the rest are either decent admins and poor devs or decent devs but poor admins. The idea is nice but the reality isn't quite what it's made out to be.

It's a fallacy to save money on personnel. The time saved by not managing low level hardware will be assigned to other tasks. The savings are actually done on instance costs and flexibility.

Physical instances have too much over provisioning and zero flexibility. You can easily be paying for an order of magnitude above the actually used capacity. (This can be helped with VMware virtualization instead of bare metal, but the number of new companies using VMware is shrinking and licenses are expensive.)

Whereas with the cloud, you can halve any instance's CPU/memory to save half the money. There is even built-in monitoring to analyze usage. There is little waste.

> Physical instances have too much over provisioning and zero flexibility. You can easily be paying for an order of magnitude above the actually used capacity.

The cost of the hardware is not the issue.

Paying for 600 machines when you actually need 300 probably "wastes" $50-80k/year in over-provisioned hardware. Hiring a skilled admin to figure out what's going on, optimize the network, and reduce those 600 machines down to 300 probably costs at least $150k/yr in their salary+benefits. The cloud does not decrease the cost of the hardware (it's actually more expensive), but it allows you to reduce the number of admins managing it. That's the cost savings.

Given the above example, what would any reasonable general manager do: hire someone to make things more efficient, or just pay for the over-provisioning?
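Spelling out the arithmetic in that example, as a small Python sketch. All inputs are the figures quoted in the comment above ($50-80k/year of waste, 300 surplus machines, $150k/year for the admin); the variable names are mine. Note the implied cost per surplus machine, which comes out very low.

```python
# Figures quoted in the comment above (USD per year).
surplus_machines = 600 - 300
waste_low, waste_high = 50_000, 80_000
admin_cost = 150_000  # "at least", per the comment

# Implied yearly cost of each over-provisioned machine.
implied_per_machine_low = round(waste_low / surplus_machines, 2)    # 166.67
implied_per_machine_high = round(waste_high / surplus_machines, 2)  # 266.67

# How much cheaper it is to just eat the waste than to hire the admin.
saving_if_not_hiring_min = admin_cost - waste_high  # 70_000
saving_if_not_hiring_max = admin_cost - waste_low   # 100_000
```

On these numbers, not hiring wins by $70k-100k a year, but the whole conclusion rests on a surplus machine costing only about $167-267 per year.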

You're hopefully kidding when you say that the cost of the hardware is not an issue. Server grade hardware costs an arm and a leg.

600 servers at $10k each, that's $6M upfront. Then another $6M within 3-5 years to renew them.

Of course in that example you'd pay someone, even multiple people in fact, to make things more efficient.
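The same numbers as a quick Python sketch, using only the $10k-per-server price and the 3-5 year renewal cycle from this comment. Spreading the purchase evenly over the renewal period is my simplification (no financing, power, or colo fees included).

```python
servers = 600
price_per_server = 10_000  # USD, from the comment
upfront = servers * price_per_server  # 6_000_000


def annualized(total: float, years: int) -> float:
    """Spread a purchase evenly over its renewal cycle (simplifying assumption)."""
    return total / years


fleet_per_year_fast = annualized(upfront, 3)  # 2_000_000.0 on a 3-year cycle
fleet_per_year_slow = annualized(upfront, 5)  # 1_200_000.0 on a 5-year cycle

# If half of the 600 servers are surplus, this is what the surplus costs per year.
surplus_per_year_fast = fleet_per_year_fast / 2  # 1_000_000.0
surplus_per_year_slow = fleet_per_year_slow / 2  # 600_000.0
```

At $600k to $1M per year for the surplus half, paying one or more people to shrink the fleet looks cheap, which is the opposite conclusion from the $50-80k estimate upthread.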

Well, you're either incredibly lucky or you have zero monitoring and failures have no impact.

I couldn't imagine having to manage 300 physical servers in a team of two. The hardware, OS and networking alone are more than a full time job with 24/7 oncall expectations. In fact that's why I don't do this anymore.

Pretty sure my current company had more failed hard drives this month than you all these years. I remember a shipment of servers once with more dead motherboards than that.

I bought all brand-new Dell blade servers so that's probably the main reason it was fine. IP-based kickstart boot, anaconda provisioning, it only took one person to monitor a server's setup.

The point is that managing on-prem equipment is the easiest it's ever been. When I started the ratio of admin to box was 1:50, now it's easily 1:300.

"Self hosting is doable up to 2 or 3 servers with a very few services. Beyond that it's really suffering from the lack of tooling, the lack of isolation and the lead time of weeks to get anything more in place."

There are plenty of companies with their own datacenters of hundreds or even thousands of machines. Yes, these companies usually have deep pockets and their own dedicated staff to take care of these servers. This is how it was done before AWS existed, and is likely to continue long after AWS ceases to exist.

And that’s just the point. AWS lets you not only cut down on “dedicated” staff. You can also share a lot of the staff you do need by using a Managed Service Provider.

> DELL wants to re verify by phone 3 times before they bill or ship anything. Don't ask me why, I don't know, I keep begging them to stop doing that.

You need to get a Dell sales rep you deal with every time you order. You'll get better pricing too.

Computing power on AWS costs 3x+ as much as the naively calculated cost of buying and running servers. Of course there are situations where running your own servers actually is cheaper. You are arguing the impossibility of something people are doing right now, so...?!

There is a lot more that AWS does for you than host a bunch of VMs...

Many use cases don't require what AWS provides besides storage and compute primitives. If your use case must use Amazon, use Amazon. If not, consider more efficient alternatives.

I don't mention AWS in my comment, not sure why you thought I was comparing the two.

Even then, if you can just get rid of one support person, you can save enough to buy a lot from AWS, just taking into account the management you don’t have to do - if you automate.

"Ideally, your colo facility is within driving distance of your office if you absolutely need physical access. Otherwise, you pay someone at the colo a few hundred bucks a month for remote hands services."

That remote hands service is usually very limited. Datacenter staff usually don't have the time or the skills to do detailed troubleshooting, setup, or maintenance. Maybe they'll swap out a disk for you, power cycle a server, or turn a knob on a KVM switch. If you want more, you'll probably have to buy into a managed service, which they'll do on their own hardware and with their own setup, where you'll be paying closer to AWS prices.

Now if you want the kind of availability and redundancy that AWS offers you, you'll have to have a presence in multiple datacenters around the world, where these sorts of issues and others will multiply.

> Now if you want the kind of availability and redundancy that AWS offers you, you'll have to have a presence in multiple datacenters around the world, where these sorts of issues and others will multiply.

Wikimedia is serving Wikipedia (ranked #5 for Internet traffic) with ~550 servers out of Dallas, TX and Ashburn, VA for about $4 million/year in tech costs. OpenStreetMap infra is ~$120k-$150k/year (albeit with volunteers and some donated hosting capacity). Do you need Amazon to do better?

I am not trying to be a "stick in the mud". I am trying to save you money.

AWS is only a few cents an hour; I don't get why Wikipedia doesn't switch over unless they are pocketing the difference.

Because Wikipedia doesn’t want to waste money.

“A few cents an hour”. It’s a 30% markup over physical hardware, not to mention extortionate bandwidth pricing.

You might be surprised at what offering them (the “hands”) a packet of well-written runbooks and a bottle of scotch for the holidays will get you.

I work at a company that runs out of 11 datacenters around the world. We don't have issues with availability and redundancy

Servers are cheap enough that buying a few and keeping them on standby is a no-brainer, and they are generally good for 5 years.

Even if you can't predict the workload too well, you can go hybrid cloud - colo for the baseline and then AWS for the unpredictable stuff. ManageIQ is great for this.

I think VMware has expensive licensing. At least for me when I was just trying to set up a home Linux lab environment. And then I discovered Xen Project.

I played around with that, and then AWS/DO came along.

I've been tinkering with bhyve lately at home. there are definitely a few kinks to work out but I am _very_ happy with it and hope the community around it grows.

You were lucky to have a guy with VMware experience. Most startups who avoid the cloud go bare metal and that's completely unmanageable. Basically a pair of high memory servers running every service with no isolation whatsoever.

By the way, VMware is not exactly cheap either. I recall when I was selling VMware licenses 10 years ago, it was starting around $5000 per server.

I personally feel many things in AWS are purposely made in order to hide the real costs from you.

For instance how you are charged for S3 traffic your buckets receive even if you don't have any files there. Or how traffic between zones that is outside of your control is still charged. Or how things like AZ replication cost a ton and you have no metric to show if it even makes sense to enable them. Heck, even usage alerts cost money.

S3 traffic cost is a funny topic. I remember ~3 years ago we had an issue with AWS, where they kept charging us for web traffic to an S3 bucket we deleted. When we complained, their response was like: Well, then stop sending traffic to that (non-existing) bucket. That of course wasn't going to happen as the URL was out in the wild and got requested quite often automatically by third-party services.

If you cancelled your AWS account entirely, but people were still trying to reach that deleted S3 bucket, who would pay for that?

Did you not control the DNS for that hostname? Not supporting AWS' stance, but you should always maintain control of your traffic flow using DNS. Reverse proxying is the latest hype for SEO juice (blog.somewhere.com->somewhere.com/blog), but subdomains are the bees knees when you need to point traffic somewhere else on demand.

Back then we already had the bucket for a few years and the person who set it up probably didn't think about the implications and simply chose to use the AWS-provided bucket URL.

And that person isn't alone. If you look around, you'll find a lot of services on the internet which directly link to S3 buckets using the AWS-provided endpoints.

A tool my company always pulled data from was deprecated and the web UI and API got shut down. They used the AWS bucket URL though for the web UI downloads, and those had predictable URL patterns. Turns out they didn’t stop generating the data we needed, so we still just go straight to their URLs.

You don't know what you don't know. That's the danger.

If someone has the aws bucket name they only need to form the s3.amazonaws.com url and still request it.

You can restrict access to a bucket so it only goes through a CloudFront url that you control.


Yeah, when we needed to move our download servers from Rackspace (~15TB/month) after they canceled sponsorship for OSS projects... S3 costs didn't look reasonable.

Went with a small cluster of cheapo ARM64 scaleway servers (no bandwidth billing), which costs us ~15 Euro a month.

Much better than the likely ~US$1.5k with S3 just for transfer costs alone.
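Backing out the per-GB rate implied by those figures, as a small Python sketch. The inputs are the ~15 TB/month and ~US$1.5k estimate from the comments above; treating 1 TB as 1000 GB is my assumption, and the result is the implied rate, not a quoted AWS price.

```python
monthly_transfer_tb = 15
estimated_s3_transfer_usd = 1_500  # the commenter's rough estimate
scaleway_monthly_eur = 15

GB_PER_TB = 1000  # assumption: decimal terabytes

# Egress rate implied by the estimate (USD per GB).
implied_usd_per_gb = estimated_s3_transfer_usd / (monthly_transfer_tb * GB_PER_TB)

# Rough cost ratio, ignoring the USD/EUR difference.
rough_ratio = estimated_s3_transfer_usd / scaleway_monthly_eur
```

So the estimate corresponds to about ten cents per GB transferred, and a bill roughly 100 times the flat-rate servers if you ignore the currency difference.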

Yeah S3 traffic bills are always surprising. It’s probably 2x or even 3x more than the actual storage cost for our use case. And we have caching mechanisms on our end.

> For instance how you are charged for S3 traffic your buckets receive even if you don't have any files there. Or how traffic between zones that is outside of your control is still charged.

Can you say a bit more about these two things? When do you get charged for s3 traffic on a bucket with no files? What do you define as uncontrollable cross zone traffic here?

Maybe they mean something else but I expect a GET request is $0.0004 per 1,000 requests (i.e. $0.40 per million requests) regardless of whether it returns a file (200) or not (404). Sounds more like a nuisance than a financial risk.
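A quick check of that per-request math in Python, using only the $0.0004 per 1,000 GET requests figure quoted above (I have not verified it against current pricing). The 100 requests/second scenario is a hypothetical of mine to show when it stops being pocket change.

```python
PRICE_PER_1000_GETS_USD = 0.0004  # figure quoted in the comment above


def get_request_cost(requests: int) -> float:
    """Request charges only; data transfer is billed separately."""
    return requests / 1000 * PRICE_PER_1000_GETS_USD


one_million = get_request_cost(1_000_000)  # about 0.40 USD

# Hypothetical: third parties hammering a dead URL at 100 requests/second for 30 days.
requests_in_30_days = 100 * 60 * 60 * 24 * 30  # 259_200_000
sustained = get_request_cost(requests_in_30_days)  # about 103.68 USD
```

So it matches the "$0.40 per million" reading, and it takes sustained automated traffic before the request charge alone becomes noticeable.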

After years of working with AWS this is not surprising. You can have developers with minimal sysadmin/devops experience building instances and running jobs. Back in the early days of my startup I recall my team launching a big model run and leaving for the weekend. We came back on Monday to a 10k bill and thousands of compute hours... That did not go well, 10k on a few days of compute for a small startup haha! Still to this day we've got tons of EC2 instances lying around racking up charges. We don't have the devops support to build proper auto-scaling and make things really fit AWS. We just needed a server to run 24/7 and this was easy.

Can only imagine the insanity that goes on at huge Pinterest level scale. How many EC2 instances do they have? Thousands? Hundreds of thousands?

Fender showcased how much they use AWS at the recent re:Invent. They moved more services to AWS, while still increasing usage, and still reduced their AWS spend. Because it sounds like they did it the right way and used AWS services effectively.

I've spent a lot of time helping web developers who don't have a ton of infrastructure experience get into AWS, and they almost always do it the wrong way (the most expensive way). As in, they take whatever they were running on-premises and just throw it on EC2.

Would that be the quick keynote excerpt? https://www.youtube.com/watch?v=F61GtueelP4

Or their IoT talk? https://www.youtube.com/watch?v=v7oqSTmrfVc

It’s the quick keynote. Starts around 6:30. AWS spend down 15% even though traffic is 21x what it was a year ago.

If you just have webservers, API Gateway + Lambda is the way to go. Edit: to make myself clear, I meant the backend; the frontend, of course, should be served like below.

Cloudfront + S3 for static sites.

I've seen so many people just create an EC2 instance for serving static content.

If you have a banging new React app that generates static files, you don't need anything more.

of course, I meant it backend-wise

lambda gets incredibly expensive for anything that approaches even modest usage levels. Once you exceed about 10 requests/second, you'd be better off with the lowest tier virtualized instances ($2 - $5/mo from most providers including AWS)
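
A rough sketch of where that break-even sits. Everything here is an assumption rather than something from the thread: the prices are the Lambda list prices as I remember them ($0.20 per million requests, $0.0000166667 per GB-second), the function is 128 MB and runs for 100 ms, traffic is perfectly steady, and the free tier and any API Gateway charges are left out.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600

# Assumed list prices, not taken from the thread; check the current pricing page.
PRICE_PER_MILLION_REQUESTS = 0.20
PRICE_PER_GB_SECOND = 0.0000166667


def lambda_monthly_cost(requests_per_second: float,
                        duration_s: float = 0.1,
                        memory_gb: float = 0.125) -> float:
    """Estimate the monthly Lambda bill for a constant request rate."""
    requests = requests_per_second * SECONDS_PER_MONTH
    request_cost = requests / 1_000_000 * PRICE_PER_MILLION_REQUESTS
    compute_cost = requests * duration_s * memory_gb * PRICE_PER_GB_SECOND
    return request_cost + compute_cost


# Compare a few steady request rates.
for rps in (1, 10, 100):
    print(f"{rps:>3} req/s -> ${lambda_monthly_cost(rps):,.2f}/month")
```

Under those assumptions a steady 10 req/s works out to somewhere between $10 and $11 a month, which is already above a $2 - $5 instance. Longer or larger functions make the gap wider.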

Not if your usage patterns spike dramatically.

Once worked at a place that needed to do rapid crawls of websites on short notice and we saved a lot of money by moving from dedicated instances to lambdas that we could scale out and down as needed.

How you use them is important.

Pay-for-what-you-use pricing has the ability to escalate. You don't buy a server for $30k, and go through whatever procurement paperwork is set up for that. You just increase average usage a bit here and there. There's no procurement hurdle, so it can be a blank check.

..just like it's easier to get into debt with a credit card than a personal loan. Credit card debt mounts one purchase at a time. Personal loans require you to decide to borrow €X.

Reminds me of reading how US Armed Forces in Iraq were forced to buy NEW printers because their procurement system didn't allow buying toner to replace used-up toner in practically new printers.


I just joined a new company that does nothing but help companies figure out how to save more money on their cloud costs. This is such a timely article.

The same costs that are being used to justify the value of the cloud are now being used to justify going back to buying servers. Which is the wrong solution, by the way.

With anything it just comes down to education and best practices. People need to utilize the incredible tools out there along with understanding better how the cost structures work.

Servers the wrong solution? Maybe in a few situations. AWS is just too expensive.

E.g. 600 TB data out and 100 TB in is $37300 on AWS, and by going with colocation a price of < $500 is very achievable.

That's ~1/75th of the cost!!!

There is nothing to understand here, except that AWS is insanely expensive and colocation is a way better option in this situation.
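
Spelling out that arithmetic as a small sketch. The two dollar figures are the ones quoted above; treating the whole $37,300 as egress (i.e. inbound being free) and using decimal terabytes are my assumptions.

```python
aws_cost = 37_300       # quoted AWS bill for 600 TB out / 100 TB in
colo_cost = 500         # quoted upper bound for colocation
egress_gb = 600 * 1000  # 600 TB out, decimal units

# How many times more expensive AWS is than the colo figure.
ratio = aws_cost / colo_cost

# Implied blended price per GB, assuming only the outbound traffic is billed.
implied_rate_per_gb = aws_cost / egress_gb

print(f"AWS is about {ratio:.1f}x the colo price")
print(f"Implied blended egress rate: ${implied_rate_per_gb:.4f}/GB")
```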

I agree that the price is high but you're not comparing the same things. Is the $500 a month DDOS protected bandwidth with as much redundancy as AWS is providing? I definitely shift as much bandwidth away from AWS as I can with any solution I make, but they do provide more than a dumb pipe.

> Is the $500 a month DDOS protected bandwidth with as much redundancy as AWS is providing?

Sometimes you don't need that.

AWS S3 provides reduced-redundancy options for less cost. It'd be nice if the same applied to other services.

Ditto. The other point to make here is that AWS is not the only game in town. I agree they're not an inexpensive option, but there are even some small "mom and pop" cloud providers that are cheaper and easier to use than AWS. However, until they stop gobbling up market share there will always be ancillary markets that will work to support those that invest in their services.

600 TB/month is an average of 1850 Mbps. Expect at least twice that sustained during business hours.

You're not gonna get a dedicated 10 Gbps link for a few hundred bucks with a free server.
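
For anyone checking the conversion, a minimal sketch. It assumes decimal terabytes and a 30-day month; the function name is just for illustration.

```python
def tb_per_month_to_mbps(tb: float, days: int = 30) -> float:
    """Convert a monthly transfer volume in TB to an average rate in Mbps."""
    bits = tb * 1e12 * 8        # decimal TB -> bits
    seconds = days * 24 * 3600  # length of the billing month
    return bits / seconds / 1e6


average = tb_per_month_to_mbps(600)
print(f"average: {average:.0f} Mbps")
print(f"2x during business hours: {2 * average:.0f} Mbps")
```

That gives an average a little over 1850 Mbps, so the doubled business-hours figure already takes up a good chunk of a 10 Gbps link.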

While that's true, other quality providers still smash AWS on that issue.

Let's say you spread that across 10 dedicated instances ("optimized droplets") on DigitalOcean, at a cost of $800. They pool droplet transfer across the account, so that's 50tb with the instances. The remaining 550tb would cost you $5500.

$6,300 vs $37,300 and you get ten servers with DO.

And if you really wanted to be creative, spin up 100 $20 instances, which would cost you $2,000 and give you 400tb of pooled transfer. Then spin up the 10 $80 optimized droplets to use for actual work (maybe there would also be some use case for the 100 standard droplets with 2vCPUs and 4gb ram; otherwise idle them). That gets you down to $4,300, almost a tenth of what AWS is charging just for the transfer.

When it comes to transfer, AWS is a minimum of 5x more expensive than it should be even if you assume a premium for the service. It's a big fat money maker for them.
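
The two DigitalOcean scenarios above can be reproduced with a short sketch. The per-droplet allowances (5 TB on the $80 droplet, 4 TB on the $20 one) and the $10/TB overage rate are inferred from the figures in the comment, not looked up, so treat them as assumptions.

```python
OVERAGE_PER_TB = 10.0  # implied by "550tb would cost you $5500"


def plan_cost(total_tb, fleets):
    """fleets is a list of (count, monthly_price, pooled_tb_per_droplet)."""
    base = sum(count * price for count, price, _ in fleets)
    pooled_tb = sum(count * tb for count, _, tb in fleets)
    overage_tb = max(0, total_tb - pooled_tb)
    return base + overage_tb * OVERAGE_PER_TB


# Scenario 1: ten $80 optimized droplets, pay overage on the rest.
ten_optimized = plan_cost(600, [(10, 80, 5)])

# Scenario 2: add a hundred $20 droplets purely for their pooled transfer.
with_idle_pool = plan_cost(600, [(100, 20, 4), (10, 80, 5)])

print(ten_optimized, with_idle_pool)
```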

Agreed. The bandwidth is overpriced. None of this should be served from the cloud, but from a CDN instead.

That being said, I wouldn't try to push the limits of Digital Ocean too much. I have an ex-coworker who looked into moving a compute grid to Digital Ocean. After pinning a number of instances to full CPU, he quickly got contacted by support and told to stop before they shut down the account.

That's odd that he had the ability to max the CPUs in the first place, right? Was he violating some terms or agreement?

Not at all. Just running a normal computing grid consuming any CPU resources that are available. It was for scientific simulations for a research center, think of something like weather forecasting.

Clearly, Digital Ocean didn't want to serve that use case. They don't want customers to use all the resources that they are paying for. They're probably under provisioning power and cooling by a large factor.

Or they thought he was running cryptocurrency mining and the bills might bounce.

Do their terms tell you not to max out capacity?

Otherwise DO could get sued for making unfair threats.

Lightsail on AWS gives 1 TB of transfer on a $3.50 instance. $2,000/m would get you 571 TB/m :-)

Just wanted to make a shoutout to Corey Quinn, who has a consultancy which literally works on lowering your excessive AWS bills. He also runs the most excellent Last Week in AWS newsletter https://lastweekinaws.com/ and podcast Screaming in the Cloud: https://www.screaminginthecloud.com

I've personally known Corey for 10+ years and he's a good dude, so if anyone is looking to lower their AWS bills, talk to Corey.

> literally works on lowering your excessive AWS bills

these companies are a dime a dozen, no personal offense intended to yourself or this company.

The fabulous level of trolling and entertainment he offers on his twitter account however is priceless.

And both the newsletter + podcast are quite fantastic. He really is unrivaled in the space. If you use AWS, you should be keeping tabs on those things. Bonus points that they're free and amusing.

I know him from back when I was helping maintain the Saltstack python config management tool[1]. He was a user and we were doing training on contributing code to salt. It turned out that I did teach him a thing or two and he did several contributions to salt[2]. That's literally it.

[1] https://github.com/saltstack/salt

[2] https://github.com/saltstack/salt/commits?author=quinnypig

I figured I'd see Quinn's name mentioned in the comments. He had a good Twitter thread a bit ago about the easiest things you can do to lower your bill:

https://threadreaderapp.com/thread/1091041507342086144.html (original Twitter link: https://twitter.com/QuinnyPig/status/1091041507342086144)

Your comment doesn't contribute to the discussion. It's more of an advertisement than a comment.

I'm not affiliated with him whatsoever either fiscally or professionally. If someone is reading this article, they might suffer from the same problem. I figured it was worth sharing.

Where I work there's a monthly report sent out that shows what each team is spending. Periodically teams poke around their infrastructure and remove anything they don't need any more. A few times there have been company-wide "spring cleaning" days.

That's all it has taken to save a lot of money.

Same. It's pretty amazing to be able to see where the money is going and adjust anytime.

i honestly cannot believe that this is still a thing! well, actually, i can. :(

i run a large AWS consolidated billing family for a sizeable computer science graduate research lab and tracking our AWS spend is an absolute nightmare.

if you think about it for a minute, why would it be in the best interests of amazon/AWS to allow us granular, easy to parse and (most importantly) easy to customize and use billing reports? they're in the business of making money, and providing mechanisms for folks to limit their spends will limit profits.

another thing to consider is that the billing subsystems for all of these cloud providers are literally one of the first things to be engineered, and after release, one of the last to ever be updated.

for instance, it took amazon two years to release OUs (organizational units): one year in closed beta, one in open. while these are great for organizing accounts, you still can't see how much an OU spent w/o a bunch of python/boto gymnastics!

i was on a call w/some of the leads of the aws billing subsystem about a year back, and asked about the roadmap for billing features, and the response was "2020 at the earliest".

What are you having trouble with? I find the billing page to be pretty good, once the tags are configured for billing.

Far better than any internal tool I had the chance to work with.

we have a lot of projects, and i use OUs to track the N-ary tree of lab->project->people and their spends.

since the aws billing subsystem can't map OU->$$$, i wrote a thing: https://github.com/ucbrise/aws-audit

it'd also be super awesome to have OU-based budgets.

we use teevity ice when we really need to dig deep (https://github.com/Teevity/ice) but it's pretty much abandonware at this point.

i've also explored a couple of commercial offerings (ronin & teevity) but they don't work well in our AWS research credit-based model.
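
For readers wondering what the OU->$$$ mapping involves, here is a minimal sketch of the roll-up step only. It is not how aws-audit works; the OU names, account IDs and dollar amounts are made up, and in practice the tree would presumably come from the Organizations API and the per-account spend from the billing data.

```python
from typing import Dict, List


def rollup_ou_spend(children: Dict[str, List[str]],
                    accounts_in_ou: Dict[str, List[str]],
                    spend_by_account: Dict[str, float],
                    root: str) -> Dict[str, float]:
    """Return total spend per OU, including everything nested below it."""
    totals: Dict[str, float] = {}

    def visit(ou: str) -> float:
        # Spend of accounts attached directly to this OU.
        total = sum(spend_by_account.get(a, 0.0) for a in accounts_in_ou.get(ou, []))
        # Plus the spend of every child OU, computed recursively.
        for child in children.get(ou, []):
            total += visit(child)
        totals[ou] = total
        return total

    visit(root)
    return totals


# Hypothetical lab -> project -> people layout.
children = {"lab": ["project-a", "project-b"]}
accounts_in_ou = {"lab": ["111"], "project-a": ["222", "333"], "project-b": ["444"]}
spend_by_account = {"111": 10.0, "222": 250.0, "333": 40.0, "444": 5.0}

totals = rollup_ou_spend(children, accounts_in_ou, spend_by_account, "lab")
print(totals)
```

The same totals could be compared against a per-OU limit to approximate OU-based budgets.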

I'm not sure about OU, didn't use them much.

In commercial tools, I would have a look at https://www.cloudhealthtech.com/ and https://cloudcheckr.com/ and there was a third whose name I can't remember.

They should have good reporting since that's the only point of the tool, but they charge an arm and a leg.

those look just like the other services i looked into, but w/o diving too deeply into their whitepapers i can guarantee they use instance/resource tagging as the shim between AWS and their system(s).

with consolidated billing, and a couple of hundred linked accounts (with plenty of turnover), enforcing sane tagging is pretty much out of the question.

thanks for the links tho! when i have a spare few minutes i'll definitely take a closer look.

Consider revisiting your account strategy. Hundreds of accounts is way too many and it's unmanageable. Even a company with tens of thousands of employees couldn't have anywhere near that.

Enforce some tags on purpose, team, business unit, dev or prod. It shouldn't be allowed or possible to create instances without any information. Enforce the rules: if an instance has no information it will be shut down next week and deleted next month. People will take tagging seriously very soon.

I've seen businesses who adopted AWS without any strategy and with revolving contractors as employees. A few months later, it's a hundred instances without a single name or tag, with the more expensive ones costing thousands of dollars a month each.

If the company cares about costs and resources management, it's easy to achieve. If it doesn't care, then it doesn't matter, not your money.
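
A sketch of what that rule could look like as code. The tag names, the 7 and 30 day thresholds, and the function itself are illustrative assumptions; wiring it up to real instances (listing them, recording when they were first seen untagged, actually stopping or deleting them) is left out.

```python
from datetime import date

# Hypothetical required tags, mirroring "purpose, team, business unit, dev or prod".
REQUIRED_TAGS = {"purpose", "team", "business_unit", "environment"}


def enforcement_action(tags: dict, first_seen_untagged: date, today: date) -> str:
    """Decide what to do with an instance based on its tags and grace period."""
    present = {key for key, value in tags.items() if value}
    if REQUIRED_TAGS <= present:
        return "ok"
    days_untagged = (today - first_seen_untagged).days
    if days_untagged >= 30:
        return "delete"  # "deleted next month"
    if days_untagged >= 7:
        return "stop"    # "shut down next week"
    return "warn"


today = date(2019, 3, 1)
print(enforcement_action({"team": "search"}, date(2019, 2, 27), today))
print(enforcement_action({"team": "search"}, date(2019, 2, 20), today))
print(enforcement_action({}, date(2019, 1, 15), today))
```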

With the rise of lambda+managed services, I think we’ll start seeing finance and development start to blend and merge.

Besides giving businesses more legibility into what specific parts of their business logic cost to operate vs. the value they generate, you can start building higher-order financial systems based on flows of capital+information within businesses. From there you can implement all sorts of financial engineering like insurance and options+derivatives that could allow businesses to do things like dial up leverage against these flows. Certainly half-baked ideas, but fun to think about the possibilities.

I'm moderately hopeful that the basic idea (close association between activity and cost) will improve, but I have some caveats.

One is that not everything will fit. Lots and lots and lots of workloads exist in their current form and are not going to switch overnight. Whatever bookkeeping method that gets developed needs to deal with the mix of costing models.

Another is that you still need to assign costs. In a lot of companies this is the subject of intense corporate politics. Whether such a method is adopted will depend on who is wielding what cudgel.

Related to which: there will be cases of premature optimisation on the visible. Optimising for cost is fine, but costs often include estimates that can be overlooked. It's one thing to optimise for "least dollars spent per invocation". Another thing to optimise for "fewest pissed off customers". The latter is harder to measure but in many cases more important.

But overall? Yes. I think it could be a step forward.

I was always incredulous at how many free credits AWS hands out to startups (up to $100K per startup), but then every time I see an article like this I'm reminded how large the recurring infrastructure bill can be for a successful business. Such a great business for Amazon.

Hmm, it's like early stage funding and then reaping the benefits from the small number who make it to unicorn level ...

It's quite clever, too, as it likely encourages setups that are pretty inefficient on long-term AWS spend.

It's like the apocryphal drug dealer who gives out free doses to unwitting kids until they are hooked.

This article is one of the first shots fired in the 'War on Clouds'.

I thought they make more than the shopping division.

Part of this is that AWS is often more expensive than on-prem.

The other part, though, is that dev teams in large companies often aren't held accountable for on-prem costs.

The on-prem costs are often in a big baseline number with little visibility into how much of that big number belongs to each team.

I find it interesting that while the price of hard drives has decreased by about 50 percent over the past five years, the price of cloud storage has remained roughly flat.

As margin continues to expand, the need for alternative models and competition in 'Cloud' is becoming increasingly apparent.


I think the end game is probably something resembling electrical grid markets. Electrical power is bought and sold in real-time by traders, as the supply and load fluctuate. There are many players in a competitive market, like nuclear plants, gas plants, solar, wind, etc. There are even complex derivatives (like “swing options”) traded in the electrical markets. Someday computing resources will be bought and sold in this way too, with a large number of players in a competitive market, which will drive down the currently inflated cloud computing costs. We can already see an embryonic version of such a market in the form of AWS “spot instances” and the AWS reserved instance secondary market. AWS is fighting tooth and nail to avoid or delay such a commoditization of computing utilities, which is why they announce 50 new vendor lock-in software products every week instead of focusing only on the hardware resource.

“The Cloud” is a euphemism for extreme centralization of web and other hosting.

I joined this company to build out a software stack for being able to do disaster recovery of sites off site. This resulted in building a system that takes incremental backups of client systems and uploads them to our servers. In a DR event we would spin up VMs backed by the images we had copied to our servers.

Throughout the process the word "cloud" was brought up. Whose cloud would we use? Over and over again I tried to explain we were actually building a "cloud" service. And by that I mean we are just offering a service that runs on our servers.

Due to the feature set and cost we ended up settling on bare metal servers from SoftLayer (IBM). The entire solution has been made to run on commodity hardware.

For the longest time the company kept marketing that we were using "IBM's Cloud".

Every now and then I get a request to ask what it would take to move to AWS. My response is always the same. More operating budget, and fewer features.

We spin up DR environments for many servers in seconds -- and can do so because we have full control over the hypervisor, so moving to anything like AWS, Azure, or whatnot means we give up the ability for near-instant restore. Owning the entire stack has its own set of problems, but in the long run we should be able to move much faster.

"The Cloud" -> "The Internet"

The two most insidious things I've found in AWS billing:

- If you want the AWS dashboard metrics to have 1 minute instead of 5 minute granularity, that's $2.10 per instance per month. 5 minutes is pretty useless, so either you set up other monitoring or you pay a tax on the number of instances you have.

- If you use AMIs (which you probably should) to launch EC2 instances with all your software baked in already, you will probably end up with dozens or hundreds of old, unused AMIs. Furthermore each of these AMIs is linked to a snapshot, which is stored on S3. S3 pricing is very cheap but it's a significant amount of work to determine which AMIs are no longer in use and to delete both the AMI and the corresponding snapshot. Every 100 AMIs you have at the standard 8GB root volume size costs you $18/month.
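
As a rough sketch of how those two line items add up: the $2.10 figure and the "$18 per 100 AMIs at 8GB" figure are taken from the list above, and the per-GB rate is simply what those numbers imply, not a looked-up price. It also assumes each snapshot is billed at the full volume size, as the comment does.

```python
DETAILED_MONITORING_PER_INSTANCE = 2.10  # $/instance/month, from the comment

# $/GB-month implied by "$18/month per 100 AMIs at 8GB" (about $0.0225).
IMPLIED_SNAPSHOT_RATE = 18 / (100 * 8)


def monthly_overhead(instances: int, old_amis: int, ami_size_gb: int = 8):
    """Return (monitoring_cost, snapshot_cost) in dollars per month."""
    monitoring = instances * DETAILED_MONITORING_PER_INSTANCE
    snapshots = old_amis * ami_size_gb * IMPLIED_SNAPSHOT_RATE
    return monitoring, snapshots


monitoring, snapshots = monthly_overhead(instances=50, old_amis=100)
print(f"1-minute metrics: ${monitoring:.2f}/month")
print(f"stale AMI snapshots: ${snapshots:.2f}/month")
```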

> If you use AMIs (which you probably should) to launch EC2 instances with all your software baked in already, you will probably end up with dozens or hundreds of old, unused AMIs.

We previously had this issue. What we do now is instead of leaving old ones on S3, we download and archive them. We just tag them on the dashboard and a nightly script does the rest.

In general tagging is an extremely powerful way to organize your AMIs. This article[0] has good examples of tagging you can follow. We use something akin to the "Business Tag" strategy, with an Excel document. Definitely requires some internal organisation but the cost savings speak for themselves.

[0] https://aws.amazon.com/answers/configuration-management/aws-...

I think that everyone who's used AWS or a competitor knows that it's very easy to rack up a large bill by accident. For companies, the main issue isn't the expense but the unpredictability.

The issue is that most people use AWS as if it were a datacenter they don't own.

If you're going for AWS, you should consider rewriting a lot of your services to use AWS features well.

Auto Scaling Groups and Fargate EC2 are some common components I see few companies using.

If your instances are the same size through the day, you are doing it wrong!*

*Exception: If you provide the same level of service and traffic 24/7.

This is [unfortunately] correct. If you marry your company's destiny to AWS, their managed products are far more cost-effective than using them as a datacenter [no doubt this is intentional by Amazon]. If you're just using the EC2 services, much like owning the hardware, unused CPU and RAM is wasted money.

I recently gave Fargate a try and was very unimpressed with the costs.

Anecdata is terrible, but my experience running 1 Fargate container 24x7 on the lowest specs (0.25 CPU and 512 MB RAM) to handle baseline was going to cost as much as a no-contract T3.micro EC2 instance with much more capacity (2 CPU and 1 GB RAM). AWS Kubernetes was also a bust at $120/mo just to get started (that's the cost before an EC2 server is provisioned).

Fargate has two flavors, ECS or EC2.

EC2 is just EC2 with scaling groups while ECS is a fully non-managed solution that's for small items.

The ECS one is weirdly difficult to set up. If you just want Docker Repo (ECR) -> deploy with some config -> expose a single ingress point it sounds perfect, but I had a TON of trouble with it recently

If it helps, I had the exact same experience. Setup was possible but involved way too many decisions for the obvious use case.

ECS is best done through Terraform; there are existing templates that will allow you to do this fairly easily.

This is one of those predictable things, that is conveniently ignored by certain layers of a business. The more entertaining situation (from my position anyway) is the drama that the rapid Azure adoption by otherwise very non-cloud companies will cause in a year or two, when the free credits Microsoft stuffed into their ELA's go away.

I love running stuff in the cloud, but without a conscious decision in an organization to prepare for, and avoid, the various pitfalls, it's just a giant cost sink.

A lot of people don't realize that Reserved Instances don't show up on your monthly invoice. So once every 3 years you're hit by a large bill for RI renewal.

"Reserved Instances" is a poison that every AWS user must be aware of.

Figuring out how/if it's used is really convoluted. And getting out early is a pain.

You can choose how to setup your billing for reserved instances.

> You can choose to pay for your Reserved Instance upfront, partially upfront, or monthly

source: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/concepts...

You get the best discount by paying fully upfront. If you do this, the RI doesn't show up on your monthly AWS bill whatsoever, leading to a surprise 3 years down the road.

And more importantly you're committed to paying for the full year no matter which option you pick, with (almost) no possibility to cancel.

Long story short, only go for 1 year, full upfront.

One of the basic tenets of engineering is "why is this feature appropriate"; much of AWS can be sliced away with this razor, much of the time. Appropriate tool for the appropriate time. A lot of the AWS systems simply are special purpose and not good for general application.

I'm working on a Kubernetes configuration at home that provides the common & popular features of AWS in a consistent & portable manner: blob store, database, etc.

AWS is really expensive. Even for running small personal websites. I thought I could gain AWS experience AND run a few personal websites on them. What a mistake.

I've moved all (well, like 2) of my personal sites to another VPS provider that starts at $5/month for a 512MB VPS.

The worst annoyance is 'Reserved instances'. You are basically signing up for a long term contract that you can't get out of easily.

Lightsail is even cheaper, AND AWS


S3 website hosting for static pages will be cheaper than VPS unless you are running an app.

I'm reminded of this article - http://highscalability.com/blog/2017/12/4/the-eternal-cost-s...

Netflix run a spot market for instances to drive down costs for exactly the reasons mentioned here

companies that think that they can treat aws like their own colo and don't have any semblance of budgeting will always be surprised by their bills. the cloud can be very expensive...it can also be significantly cheaper (think: a retailer with mostly static websites with some light javascript here and there that peaks on black friday). doing an analysis and being responsible can help here.

Sometimes I wonder whether AWS makes more money than GCP because average GCP users develop better optimized systems.

Missed opportunity:

As AWS Use Soars, Companies Surprised by Soaring Bills

Paywall, cannot read article past third paragraph.

Seriously, anything that is not fully readable without going through additional hoops should be removed, or not even posted in the first place.

Gotta recoup those cloud hosting costs somehow!

I'm curious if all of these comments here are from people who paid to read it or are just extrapolating off the headline.

This comment breaks the site guidelines, which ask you not to insinuate astroturfing without evidence. Please don't post like that here; it's a toxic trope that people are nearly always imagining and projecting onto one another. Mind you, your post was far from the worst example of this—but it still needs nipping in the bud.

In this case, the history of the users commenting is enough to make it overwhelmingly unlikely that they were paid, and the well-known propensity of users to comment just on a title clinches it.


Sorry, I didn't mean to imply astroturfing. My comment is asking how many people "paid to read the article" (because it's behind a paywall), and not "were paid to comment here".

Still not a great comment I admit, but with the amount of discussion going on I would be surprised if everyone is a subscriber, so I wonder whether they are going off of the headline alone.

