AWS costs every programmer should know (2019) (hatanian.com)
221 points by ddtaylor on June 9, 2022 | 158 comments



It's interesting how AWS can keep prices so high on these. But it's just the beginning; the real money comes when they convince you to run over a dozen VMs/containers (all needing storage etc., of course).

You need to be triply redundant across 3 availability zones (3x), for both the RDS DB cluster and the app containers (2x). And then have separate dev/staging/prod envs (3x). That's 18x.

You can then get a pat on the head ("pass AWS well-architected review"). Then they innovate "serverless" stuff, where you can pour developer man-months and years into saving on these overpriced things by making everything more complex and hard to monitor/debug, and by learning new AWS-specific techs, so you can get faux savings. Here they're buying your long-term lock-in by getting AWS specialists and advocates on your long-term payroll. They're doing certifications and investing in AWS careers that can carry over to the next employer too.

And don't even get me started on how much work by this time has gone into building the infra-as-code to manage the Rube Goldberg machine. You'll (seriously) have more lines of CDK code than app logic, and per line it was slower to develop & harder to debug than your actual app code.

Just about now, someone in your newly grown cloud-herding engineering org champions the idea: "I know what will help, let's start using Kubernetes". They've probably never costed or self-administered a server in production use, or if they have, they know to keep quiet.


As an "onPrem" & "Big Data" guy I've been astonished at how fast the bill can run up. We are in the middle of a cloud adoption where the big proponent started with "its just money, devs time costs more.. do what you gotta do" to "hey so we need to consolidate these 3 DBs, and X is on vacation so lets spin this service down, and lets not worry about multi AZ/region yet" .. all before we even have a PROD launch, lol.

In any case, I was recently amazed at the cost of an RDS instance to host a meager 2B total records of vendor data. For the annual price I could buy one of the beefy servers I use to host 20% of my entire on-prem estate, with 100s of vendor data sets. For my on-prem plant, 2B records is table stakes; we have datasets where we receive that much data every day. Probably something like 10T records of data across datasets, if I had to speculate.

Similarly on the ops/infra cost savings: for every on-prem infra guy they think they can RIF, they've hired a cloud infra guy who for sure costs more.

Likewise for the redundancy/backups/etc. being "easy/free" in the cloud: we've already lost data in object store because of a poorly executed combination of configuration & actions while trying to do some permission changes / copy migration. It was completely unintentional and not noticed for weeks. No one actually executed a literal rm command at the time. Just because the object store advertises zero loss / versioning / etc., you really do still need to do backups.

Clearly there's a lot of ephemeral & burst compute / variable-cost stuff that totally belongs in the cloud. However, for internal apps at a firm where usage largely scales with staffing levels, the amount of fixed compute makes it hard to argue every app and every use case belongs in the cloud.


"Dev or IT admin time costs more."

That is the core argument. It can be true, but it's not always true. The larger your cloud footprint gets, the less true it often becomes. It's also not true if you are doing something that really slams one of the cloud's high cost areas like outbound bandwidth or tons of sustained compute.

If you are running a business or department it's your job to run these numbers properly and not listen to mindless cloud salespeople or developers brainwashed by them.

The cloud is great for prototyping. As you scale there usually comes a point at which the cloud cost line crosses the "hire one or two people" line. If you think you have to hire five or ten people, your architecture is probably too complex.

Of course the cloud industry really pushes complex architectures hard because they know this. Complexity works in their favor on both ends. It makes it harder to get off managed services by adding labor costs and it also means you have to run more stuff in their cloud.


It's interesting to think about the process that led to this: "its just money, devs time costs more.. do what you gotta do"

Are cloud vendor salesmen doing Jedi mind tricks? Or are these decisions just made by incompetent people? Who researches this kind of stuff? It's the kind of subject for a history of management trends.


There’s also a lot of politics around OpEx vs CapEx.

In prior firms we’d have to go hat-in-hand for $3M of hardware every 3 years on an upgrade cycle. Of course few were around long enough to be on the requesting or approving side of this upgrade cycle and it would drag out painfully. Sometimes we’d try to get creative and go more just-in-time and come back for $500K every 6 months but the pain would just be more frequent.

On the other hand, $150k/mo slowly growing adds up to more in the long term, but no senior manager ever has to approve a single $500K-$3M purchase request.


Hardware need not be CapEx. There are plenty of leasing options available for just about everything. There are also some wild Section 179 options:

https://www.section179.org/section_179_leases/


Depends on where the budgets go - salary goes into something they have to answer for - "cloud infrastructure" goes elsewhere in the budget they don't have to answer for.


Right - CapEx vs OpEx

Writing big checks for big servers every few years vs. a recurring monthly spend out the door, slowly growing, but maybe 2x higher in total.

Also flexibility in not having to do capacity (or much of any other) planning.


There's also the death by a thousand cuts - if you want a big server, you will have to do lots of discussing and arguing about the cost thereof, but if instead you're adding a small monthly cost, there isn't as much arguing.

You keep doing that and suddenly someone notices that half the budget is AWS, at which point the "move onsite" dance begins (until the next time the big server argument happens).


Nobody got fired buying IBM^H^H^H Amazon.


I think Elon Musk got technically demoted, or at any rate shuffled around, for wanting to use Microsoft for the servers; PayPal ended up using Unix instead.


If I was a CEO, and my CTO did such a poor job, firing would certainly be a consideration...


> "its just money, devs time costs more.. do what you gotta do"

Old boss said the same thing, but with administrators instead of devs.

So we go and start up $n worth of AWS instances, and then spend $n/2 to have Rackspace of all people administer it.


> cost of an RDS instance

If RDS is expensive, you can always spin up your own DB deployment: use an EC2 instance, or just run it via ECS or EKS.

You buy those beefy servers, and where do you put them? You need a data center, with data-center-grade internet lines, and you need staff to maintain all that.

> Similarly on the ops/infra cost savings: for every on-prem infra guy they think they can RIF, they've hired a cloud infra guy who for sure costs more.

I disagree: you will need fewer cloud infra people, who will likely cost more; however, everything will end up being more reliable than having a legion manually maintain everything.

> Likewise for the redundancy/backups/etc. being "easy/free" in the cloud: we've already lost data in object store because of a poorly executed combination of configuration & actions while trying to do some permission changes / copy migration. It was completely unintentional and not noticed for weeks. No one actually executed a literal rm command at the time. Just because the object store advertises zero loss / versioning / etc., you really do still need to do backups.

Did you ever lose data by properly putting something in S3 or Glacier and not doing anything with it?

> However, for internal apps at a firm where usage largely scales with staffing levels, the amount of fixed compute makes it hard to argue every app and every use case belongs in the cloud.

It's actually easy to argue:
- the world is not going to stay stagnant; if I need 1 more server tomorrow, I can get it in minutes on the cloud, use it, and get rid of it whenever I no longer have a need for it
- the on-prem costs are downplayed; people usually just talk about the cost of the metal and forget the upkeep and extra staff required
- the on-prem reliability can be very questionable; hardware failures can cause disasters
- cloud stacks can be easier to maintain and hand over; you have less hardware and fewer operating systems to worry about


This is the playbook though. It's used by everybody.

Microsoft, for example, has all of their certified developer programs. Companies that use Microsoft development infrastructure like SQL Server end up getting huge discounts based on how many developers they have on staff with MS certs. This biases companies toward hiring people with those certs, which reinforces the perceived value developers get from earning the certs in the first place. Then when you have a ton of people on staff with MS certs you're NEVER going away from MS technology, because at that point the company is so heavily invested in it.

It's very much an evil genius situation.

A few years back I worked for one of these companies and even though I wasn't doing much with the Microsoft stack in the company (they acquired a Rails company) I was still asked to go get an MS certification just to help with the discounts. I'm now a Microsoft certified HTML5/CSS3 developer, which after going through the IE6 years felt like the most ironic cert I could get.

The lock in mentality is very real though. As part of an architecture committee at that company, we went through many months of analysis for some of the issues that were being faced at the company. The root cause was simply limitations of SQL Server combined with horizontal scale out limitations due to licensing costs. There was absolutely nothing that could be done though, because the company wouldn't move away from SQL Server for any reason.

It was hard to watch.


You'll be hard-pressed to match the business protections an MS contract gets your organization by using someone else.


What type of protections?


> And don't even get me started on how much work by this time has gone into building the infra-as-code to manage the Rube Goldberg machine. You'll (seriously) have more lines of CDK code than app logic, and per line it was slower to develop & harder to debug than your actual app code.

Of all the AWS complaints, CDK is one of yours? I absolutely love CDK and its take on infra-as-code where you construct OO constructs imperatively in a sane language (please, please no more YAML-based DSLs...). I've found that CDK is the only one to have given us the code reusability that all of these solutions always promise while still being overtly hackable.

Debugging CDK code? What kind of wacky stuff are you trying to do? I don't think I've ever had to debug CDK outside of typical "what IAM action am I missing?" or "how do I click some checkbox in CDK?".

I'm curious what you would consider to be better alternatives to CDK.
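
For what it's worth, the reusability I mean looks roughly like this; a minimal sketch assuming aws-cdk-lib v2 in TypeScript, with a made-up construct name:

    import { Construct } from 'constructs';
    import * as sqs from 'aws-cdk-lib/aws-sqs';
    import * as lambda from 'aws-cdk-lib/aws-lambda';
    import { SqsEventSource } from 'aws-cdk-lib/aws-lambda-event-sources';

    // Hypothetical reusable construct: a queue-backed worker. Teams can
    // instantiate this instead of re-declaring the queue, wiring, and IAM.
    export class QueueWorker extends Construct {
      public readonly queue: sqs.Queue;

      constructor(scope: Construct, id: string, handler: lambda.IFunction) {
        super(scope, id);
        this.queue = new sqs.Queue(this, 'Queue');
        // Wires the trigger and grants the consume permissions in one call.
        handler.addEventSource(new SqsEventSource(this.queue));
      }
    }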


CDK code also isn't executed at runtime. Even if it somehow is slower to develop and harder to debug (maybe if you're new to it), it's fine to have more of it than actual app code. That implies you have substantially less app code than you otherwise would, which (of course) is the entire purpose of these cloud-based abstractions.

The amount of app code involved to stitch Kinesis, Lambda, and S3 together is minimal. The amount of app code to solve the same problems without them is orders of magnitude more plentiful and complex, which is why we avoid it.


> The amount of app code to solve the same problems without them is orders of magnitude more plentiful and complex, which is why we avoid it.

The amount of application code to write to a filesystem is... zero.

The amount of application code required to setup a webserver in Java Spring is... zero.

Not sure what Kinesis does, but the amount of application code required to connect to a database is zero (Spring), and to connect to RabbitMQ/Kafka is zero (Spring).

And the best part is, it is less complex, mostly open source, and locally runnable, compared to the Amazon Rube Goldberg machine.


There will be substantial amounts of code you are writing to provision, set up, maintain, scale, and troubleshoot all of those things. Whether you call it app code or not, it's something that you (and your team) will be on the hook for building and maintaining over time.

Getting your Java Spring web server to scale to thousands of concurrent instances will require a lot of undifferentiated work outside the scope of Java Spring. Same with the care and feeding of RabbitMQ and Kafka. And to suggest that's the approach that's less of a Rube Goldberg machine is... questionable.


What sort of application are you building where you need a thousand instances of a spring application?

Even hosted on low-powered hardware, Java services can easily deal with on the order of 100 requests/second. This is home turf; this is what Java is really good at. That 100 rps may be lowballing it. So you're what, processing 100k+ API requests per second? I can't imagine that would translate to any less than a billion users.

At that point you are either making good headway toward becoming a new letter in the FAANG acronym and ought to have the budget to run your own data center, or your attention is better put toward optimizing your comically inefficient application.


What sort of app code are you writing where you can keep data, authentication and user data all on the desktop?

Just because you can write a docker-compose.yml (or do without) that describes a system that holds together doesn't mean the system you've produced stands a chance at SOC 2 or whatever standard you hold dear.

Having just finished an ISO 27001 audit, I promise, "we rely on AWS for X" went a lot farther than "we have a homegrown system that does X", every single time. AWS: box checked. Homegrown? Let's dig in for two more hours.

Thousands of instances is easy to need if you have an active CI/CD lifecycle; you just might not need them all at once. Even just one per PR adds up super fast if you've got a competent developer team.


Strong "forgotten how to count that low" vibes from this one.


Or this Java-based Spring application is one of dozens or hundreds built by different teams, for different customers, or for different purposes. Some of them may even be applications not built in Java or Spring.

You're missing the point, though.


> There will be substantial amounts of code you are writing to provision, set up, maintain, scale, and troubleshoot all of those things.

And the amount of infra code you have to write to do that in AWS is any less? No, it's just as much, except more arcane.

> Getting your Java Spring web server to scale to thousands of concurrent instances will require a lot of undifferentiated work outside the scope of Java Spring.

Scaling a RESTful webserver is an exercise in slapping a load balancer in front of it and running a few more instances. Not exactly rocket science.

> Same with the care and feeding of RabbitMQ and Kafka

Same on AWS, except now you're at the mercy of someone else's whims to get it working again when it fails.


> And the amount of infra code you have to write to do that in AWS is any less? No, it's just as much, except more arcane.

It's substantially less, and declarative. It also includes stuff commonly pushed to phase 2 in on-prem deployments, like TLS between entities, proper IAM, and built-in metrics/logging.
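
For a flavor of the declarative IAM/TLS point, a sketch (aws-cdk-lib v2; names and the asset path are made up):

    import * as cdk from 'aws-cdk-lib';
    import * as s3 from 'aws-cdk-lib/aws-s3';
    import * as lambda from 'aws-cdk-lib/aws-lambda';

    const app = new cdk.App();
    const stack = new cdk.Stack(app, 'Demo');

    const bucket = new s3.Bucket(stack, 'Data', {
      enforceSSL: true, // adds a bucket policy denying non-TLS access
    });
    const reader = new lambda.Function(stack, 'Reader', {
      runtime: lambda.Runtime.NODEJS_18_X,
      handler: 'index.handler',
      code: lambda.Code.fromAsset('reader'), // hypothetical
    });
    // Synthesizes a least-privilege IAM policy scoped to this one bucket.
    bucket.grantRead(reader);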

> Scaling a RESTful webserver is an exercise in slapping a load balancer in front of it and running a few more instances. Not exactly rocket science.

If it's stateless, yes, though I was just responding to an example. Not all technical problems are so simple.

> Same on AWS, except now you're at the mercy of someone else's whims to get it working again when it fails.

The business is always at the mercy of someone "else," whether it's you, the person who replaced you, or AWS. And AWS is way better at keeping S3 running at scale and maintaining this stuff over time than your average IT / DevOps employee. Same with Lambda, Kinesis, DynamoDB, SQS, etc. Unless you've got special requirements, you're wasting time and resources reinventing the wheel.


> It's substantially less, and declarative

There are declarative ways to manage bare-metal machines, such as Terraform, Nix, and Puppet.

> And AWS is way better at keeping S3 running at scale and maintaining this stuff over time than your average IT / DevOps employee

Yeah, no. The outages we faced _because_ of AWS were way, way worse than what we faced with our average IT/DevOps employees at my old jobs. This keeps getting repeated and it's going to keep being wrong.


I would also be interested in what the alternative outside the cloud would be. AWS deployments can get complicated due to IAM rules, networking rules, and security groups. But if I told someone to bring up a stack like that on bare metal, and make it repeatable, what would that look like?

For those of us who have worked on systems pre-cloud, it does sometimes feel like it was easier in the past. But if you had to explain to someone new what to do, you'd realize how much of the knowledge and effort involved you are taking for granted. Also the long-term maintenance.


I love CDK but it does have issues. If you wanted to launch your load balancer in a new AZ, you would need to structure the code to carefully create the new infrastructure, pivot your DNS to point to the new load balancer, and only then delete the old infra. It's more complicated than just adding more subnets, unfortunately.


> You need to be triply redundant across 3 availability zones (3x), for both the RDS DB cluster and the app containers (2x). And then have separate dev/staging/prod envs (3x). That's 18x.

Why would you care about the redundancy on staging / dev?

Just making up things to inflate AWS costs now.


If you're not testing AZ failovers you're probably just wasting money, like untested backups... But it's true that most people aren't testing, because they don't know that AZ failovers don't automatically just work(tm). And the second downside, of course, would be asymmetry between the envs and divergence in the IaC etc., resulting in more complexity and engineering work.

(Of course you're probably wasting money anyway, since business-wise you don't actually need better uptime than a single AZ, and your complexity-induced human fumbles will cause many more outages anyway; but this has been a main selling point of the decision to go to AWS, so the requirement needs to be defended.)

Yes, you can build automation to have the redundant stuff up only some of the time, if you eat the engineering effort and complexity in your IaC and build automation... in the general vein of justifying engineering spend to offset AWS operating costs, where running containers is very expensive!

TLDR: either way you end up paying the very high markup on compute prices; it'll just be easier to excuse jumping through expensive hoops to "save money" on it.


If you’re building your own infrastructure in a data center then sure you absolutely want to test your redundancy.

But with AWS it’s a checkbox. It’s transparent to you and your applications. The infrastructure to host in multiple AZs is already in place. The only real issue with MultiAZ is the failover in RDS depending on the database you use could be seconds or 10s of seconds.


IME not so in the real world: you'll have accidental state in your distributed system outside the DBs. You'll have some stuff that actually always runs in one AZ in normal circumstances, and your integration partner in another org has whitelisted only its IP. Etc., etc.; everything that is untested will find ways to conspire to rot. Especially if you haven't learned by seeing these bugs so you can avoid them.

Also, you won't have clear experience and understanding of what happens in the failover, and you won't know to avoid failover-breaking mistakes in your VPC configs, security groups, frontend-backend shared state, etc. (And by "you" I mean "your dev team"; it's not enough that one guy gets it.)

Also^2, if you read the news about all the outages, it's very common for failover systems to fail in general, not just on AWS. The general engineering wisdom is: always test your failovers. And there's no substitute for end-to-end testing, instead of individually testing each layer/module. (Bad: "we can skip testing db failover"; good: "let's test that the whole system works when there's an AZ failure".)


Dealing with this now for a client. Can't test the Redshift AZ relocation feature because there's no way to simulate AZ failure. The only safe bet is full multi-region with a DNS switcheroo.


Back in colo days, I saw a lot of post-mortems that read “the thing we thought was redundant wasn’t”, leading me to call them “redundant like an appendix [rather than like a kidney]”.

We instituted quarterly “game day testing” where we forcibly turned off one of the redundant items in all of our systems. It took us about 6 such cycles before these tests didn’t turn up outages that were just waiting for us.

Thinking back on those, it’s hard for me to believe that most cloud hosted companies are prepared by checking a box without actually testing.



> Thinking back on those, it’s hard for me to believe that most cloud hosted companies are prepared by checking a box without actually testing.

We are talking about MultiAZ, Availability Zones, not different regions. Setting up redundancy across regions is not easy. But for the majority of people using AWS, a single region with MultiAZ is good enough.


It's hard to simulate an AZ failing, and when it does, half the internet goes offline, so people don't really complain that much.


About 5 or 6 years ago we had an alert in the middle of the night that our RDS instance died. It failed over in about 15 seconds (SQL Server, so it’s a bit slow compared to PostgreSQL), but the MultiAZ worked as advertised. The downside is AWS never told us why it occurred.


They don't tell you because you're not supposed to care, and there's no human involved in the process to do a post-mortem.

Something on the instance host died, most likely.


I’ve seen a few AWS instance hardware failures, they happen with some regularity. You can handle single instance failure without being multi AZ. Testing an actual AZ failure, as in the whole AZ going offline or getting partitioned from the other AZs, is pretty much impossible.


AZs are connected via normal user-visible networks; you can just break those. They even provide examples: https://github.com/awslabs/aws-well-architected-labs/tree/ma...

Those are basic (they don't cover flapping or glacial-speed slowdown degradation modes, cover only some services, etc.), but at least a starting point that can be extended.
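
As a sketch of the "just break the network" approach (aws-cdk-lib v2; the account, VPC id, and AZ are placeholders): a custom NACL starts with no allow entries, so associating it with one AZ's subnets roughly approximates that AZ going dark:

    import * as cdk from 'aws-cdk-lib';
    import * as ec2 from 'aws-cdk-lib/aws-ec2';

    const app = new cdk.App();
    // Vpc.fromLookup needs a concrete account/region; placeholders here.
    const stack = new cdk.Stack(app, 'AzFailureDrill', {
      env: { account: '111111111111', region: 'us-east-1' },
    });
    const vpc = ec2.Vpc.fromLookup(stack, 'Vpc', {
      vpcId: 'vpc-0abc1234def567890', // hypothetical
    });

    // No allow entries at all: a custom NACL denies everything by default,
    // black-holing traffic in the chosen AZ's subnets while associated.
    new ec2.NetworkAcl(stack, 'BlackholeOneAz', {
      vpc,
      subnetSelection: { availabilityZones: ['us-east-1a'] },
    });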


Huh, didn't know about aws rds reboot-db-instance --force-failover


>But with AWS it’s a checkbox. It’s transparent to you and your applications. The infrastructure to host in multiple AZs is already in place. The only real issue with MultiAZ is the failover in RDS depending on the database you use could be seconds or 10s of seconds.

Have you actually seen this work on your project in practice? Like a region going down, another region picking up automatically, and everything kept working just by ticking a checkbox?


Multi AZ is multiple availability zones. Not multi region. Distribution over multiple regions is obviously harder than within the same region and different zones.


> But with AWS it’s a checkbox.

And I'd test the checkbox all the same. We learned just this week that one of our setups, which checks the cloud-provider provided box to have its VMs distributed across 3 AZs, is susceptible to the loss of a single AZ. Why? Because the resulting VMs … aren't actually distributed across 3 AZs as requested. (The provider has "reasons" for this, but they're dumb, IMO. It should have been as easy as checking the box.)


Are AZ failovers not done automatically? Where can I read more about that?


From the user's POV they're just separate networks where you can deploy separate copies of services, replicated DB cluster nodes & whatnot. Each service handles (knock on wood) an AZ becoming unreachable/slow/crazy independently. Which can become fun with service interdependencies and a mix of your self-implemented services + AWS-provided ones.

There's a high-level description at https://aws.amazon.com/about-aws/global-infrastructure/regio...


Even if you don't want it, sometimes you're forced to run multiple AZs (e.g. EKS requires 2x). But that 18x figure is nuts. VPCs, AZs, subnets, IAM, etc. are free; the cost comes from what you deploy into them. So separate environments don't have to be as expensive as production. You can scale them to zero, use smaller compute instances, run self-managed versions of expensive stuff (like DBs), or simply run small single instances of RDS instead of large redundant clusters. Non-prod environments are a great place to experiment with aggressively scaling down cost while observing performance.
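
As one hedged example of the scale-down idea, a dev-environment ASG can be parked outside working hours; a sketch assuming aws-cdk-lib v2, with arbitrary instance sizes and cron hours:

    import * as cdk from 'aws-cdk-lib';
    import * as ec2 from 'aws-cdk-lib/aws-ec2';
    import * as autoscaling from 'aws-cdk-lib/aws-autoscaling';

    const app = new cdk.App();
    const stack = new cdk.Stack(app, 'DevEnv');
    const vpc = new ec2.Vpc(stack, 'Vpc', { natGateways: 1 }); // single NAT is fine for dev

    const asg = new autoscaling.AutoScalingGroup(stack, 'App', {
      vpc,
      instanceType: ec2.InstanceType.of(ec2.InstanceClass.T3, ec2.InstanceSize.SMALL),
      machineImage: ec2.MachineImage.latestAmazonLinux2(),
      minCapacity: 0,
      maxCapacity: 2,
    });
    // Park the environment at night (UTC) and wake it for the workday.
    asg.scaleOnSchedule('NightsOff', {
      schedule: autoscaling.Schedule.cron({ hour: '20', minute: '0' }),
      desiredCapacity: 0,
    });
    asg.scaleOnSchedule('MorningsOn', {
      schedule: autoscaling.Schedule.cron({ hour: '6', minute: '0' }),
      desiredCapacity: 1,
    });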


Whilst the VPC itself is free, they get you on the NATGW - which you probably need along with the VPC in most cases.


Had to use a NATGW temporarily with AWS Lambdas to access the database and make remote HTTP calls. But now it all works without the NATGW. Haven't had any other need for one.


IIRC, you want one per AZ, or you'll get billed for extra cross-AZ traffic when instances in zone B's private subnet use the NAT gateway in zone A.
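
In CDK terms (a sketch, aws-cdk-lib v2; one NAT per AZ is already its default, the knob just makes the tradeoff explicit):

    import * as cdk from 'aws-cdk-lib';
    import * as ec2 from 'aws-cdk-lib/aws-ec2';

    const app = new cdk.App();
    const stack = new cdk.Stack(app, 'Network');
    // One NAT gateway per AZ: zone B's private subnets don't pay cross-AZ
    // data charges routing through zone A's gateway, and egress survives
    // the loss of a single AZ. Drop to natGateways: 1 to trade that for cost.
    new ec2.Vpc(stack, 'Vpc', {
      maxAzs: 3,
      natGateways: 3,
    });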


You don't need a NATGW at all if you use an IPv6 egress-only gateway (which is free, I believe).


> Why would you care about the redundancy on staging / dev?

If you deploy to multiple regions, then it would make no sense at all to have a single preprod stage, especially for running integration tests.

Also, keep in mind that professional apps support localization, and localization has a significant affinity with regional deployments. I mean, if you support Japanese and have a prod stage in Japan, would you think it was a good idea to run localization tests in a US deployment?


Currently have a deployment in Oregon and CloudFront servicing most of the world. With localisation. No need to deploy in Japan to support Japan. The only tricky one is China, because of the firewall. Though that hasn't been an issue for the last few years, as they haven't been blocking CloudFront completely. (This has been running for 10 years on AWS without issue.)


> Currently have a deployment in Oregon and CloudFront servicing most of the world. With localisation.

So you're serving static assets through CloudFront with a single backing service. Congrats, you managed to have a service that doesn't offer regional services.

Also, you definitely don't support clients in China, or enjoy shooting yourself in the foot.

Most professional applications with a global deployment and paying customers don't have the benefit of uploading HTML and calling it done.


So 10+ years of supporting broadcasters and creative agencies around the world from 1 region is not supporting those countries. Got it. I mean, there are services in those regions for the tasks performed, but even then it doesn't require the level of testing you're assuming it does.


If you're deploying to multiple regions in AWS, then presumably you'd have to roll out the infrastructure for 3 regions yourself if you weren't using AWS? In which case I gotta assume using AWS is probably a lot more straightforward right off the bat than rolling your own solution in three different data centers.


This isn't the only way, though.

Cloudfront can distribute static content and backends running on Lambda@Edge.

DynamoDB supports global distribution (eventually consistent).

If you don't have massive scale, they're quite a bit cheaper than self-managing (the infra itself and the ops people involved).

Yes, these services require adapting your old habits of freely controlling your own servers.

As for vendor lock-in, even on EC2 you'll end up with some. It's difficult to argue how much; there are no reproducible studies on that.

There are programming practices to minimize the costs of a vendor switch as well.


The original machine has a base-plate of prefabulated amulite, surmounted by a malleable logarithmic casing in such a way that the two spurving bearings were in a direct line with the pentametric fan. The latter consisted simply of six hydrocoptic marzelvanes, so fitted to the ambifacient lunar waneshaft that side fumbling was effectively prevented. The main winding was of the normal lotus-o-delta type placed in panendermic semiboloid slots in the stator, every seventh conductor being connected by a non-reversible tremie pipe to the differential girdlespring on the "up" end of the grammeters.


Not really sure this adheres to HN guidelines [1].

What's the substance here? What respectful idea are you adding to the discussion? Maybe this kind of comment would fit better a Facebook group?

If you like this community and benefit from "thoughtful" and "substantive" ideas shared here, I'd suggest reading these guidelines.

They are meant to avoid HN from descending to shallowness and disrespect, like many other communities around the web ended up.

[1] https://news.ycombinator.com/newsguidelines.html#comments


I think this is the line that triggered the parent:

"Cloudfront can distribute static content and backends running on Lambda@Edge."

The really scary thing is that the AWS products being lampooned are the mainstream ones. Amazon has, what, another couple hundred "offerings"?

Here on HN, there are people who say if you're using AWS only for basic EC2 you're doing it wrong. They can be correct in the tactical sense, but in the strategic sense, AWS is cackling with glee.

If I were building an IT org, I'd have an Infra as code foundation that from the foundation supported both AWS and other options that are far cheaper. AWS for prototyping and DR/redundancy but it "seamlessly" (I can hear the eyeballs roll and you're all not wrong) will deploy to non-AWS.

It's crazy that the basic tooling for EC2 / Google Compute / Azure / etc. to do this is only at the quarter-state. Hey HN, isn't there a massive opportunity for a company to meta-manage this?


For anyone reading this for the first time, well worth the watch: https://m.youtube.com/watch?v=Ac7G7xOG2Ag


Thanks for sharing, gave me context.

But the parent comment and video were not worth the time at all.

Maybe I'm failing to see the value? Happy to reconsider if you have more perspectives.

It seems to me such video will only serve as fuel for cynic behavior. Which I wouldn't like to see more on HN.


it's an older code sir, but it checks out


This seems overwhelmingly pessimistic. VMs, serverless, and Kubernetes are all panned without a suggestion of an alternative. The theme is knocking AWS, so is your suggestion to go back to colos and self-hosting? Does that really sound better than infra-as-code?


Self-hosting and infrastructure-as-code aren't mutually exclusive.


This doesn't feel like a meaningful distinction.

You can go from an empty cloud account to a fully-provisioned application stack in minutes with the right infra-as-code. Provisioning infrastructure at that level is mutually exclusive with self-hosting.

But true, even with self-hosting, you can use infra-as-code at various levels (Ansible, OpenStack, self-hosted Kubernetes). It's just not really comparable to having your entire stack defined and provisioned that way.


Who cares when you can accidentally bankrupt your company with the wrong keystroke?

Time to spin up is nice for prototypes, it can be very freeing.

The mistake that people make is assuming AWS can host things better than they can, since they’re only mere mortals: it leads to a sort of learned helplessness where you assume the cost of a thing is its true value and not an egregious markup.

My biggest gripe with these cloud providers (except maybe GCP) is that they promise less ops work and it can be true in the beginning; but over time the ops work becomes basically the same burden except esoteric to the provider.

It’s IBM mainframes again, with specialists on IBM pushing more IBM because it’s their bread.

But overall, if you know your problem then time to create an instance isn’t very valuable, believe it or not: the majority of workloads are not excessively elastic, at least the upper bound is not unlimited like many people seem to claim.

Apart from that !cloud != self-hosted. There’s plenty of hardware providers that can get you a dozen machines in under an hour; even discounting virtual host providers like vultr and Tulsa.


The "bankrupt your company with the wrong keystroke" is not entirely accurate. AWS does work with companies (or even individuals) if they genuinely made an error that wracked up a huge bill. Personally they have dropped bills of $1000's when I made a bone headed mistake and have seen companies get $100 000's of bills credited due to the same issue. They are not in the business of ripping people off in the short time who would spend a lot more than that in the long term.


> Who cares when you can accidentally bankrupt your company with the wrong keystroke?

I think it's fair to say millions of AWS customers manage to avoid bankruptcy quite successfully. If you're incapable of following basic best practices, setting billing alarms, and using IaC, I agree AWS is probably not the best choice for you.
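
For reference, a billing alarm is a few lines of IaC; a sketch assuming aws-cdk-lib v2 (the threshold is a hypothetical budget, and billing metrics must be enabled on the account and only live in us-east-1):

    import * as cdk from 'aws-cdk-lib';
    import * as cloudwatch from 'aws-cdk-lib/aws-cloudwatch';

    const app = new cdk.App();
    const stack = new cdk.Stack(app, 'Billing', { env: { region: 'us-east-1' } });

    const estimatedCharges = new cloudwatch.Metric({
      namespace: 'AWS/Billing',
      metricName: 'EstimatedCharges',
      dimensionsMap: { Currency: 'USD' },
      statistic: 'Maximum',
      period: cdk.Duration.hours(6),
    });
    new cloudwatch.Alarm(stack, 'BillTooHigh', {
      metric: estimatedCharges,
      threshold: 500, // hypothetical monthly budget in USD
      evaluationPeriods: 1,
    });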

> The mistake that people make is assuming AWS can host things better than they can, since they’re only mere mortals: it leads to a sort of learned helplessness where you assume the cost of a thing is its true value and not an egregious markup.

Of course they can--they do it all day every day at Internet scale. I don't want to build S3 everywhere I go. I just want to use it. You could call it a "learned helplessness" in the same way that I don't want to build my own RDBMS, either. I'd rather just use Postgres. This is just the next level of abstraction.

> My biggest gripe with these cloud providers (except maybe GCP) is that they promise less ops work and it can be true in the beginning; but over time the ops work becomes basically the same burden except esoteric to the provider.

It really depends on your team and architecture. If you try to be as cloud-agnostic as possible and abstract away AWS from your devs, then absolutely: you're going to be churning through a lot of ops work that feels like it could be done elsewhere. The more cloud-agnostic you try to be, the less value you'll get from AWS. I've seen numerous companies learning this with EKS (hosted Kubernetes).

That's not the only approach, though. The managed services (like S3, Lambda, DynamoDB, and Kinesis) really add a whole lot of value with substantially less code if your devs are willing to use them. They can even cost less, and especially so if you factor in time spent toward development and building/maintaining alternatives.

> But overall, if you know your problem then time to create an instance isn’t very valuable, believe it or not: the majority of workloads are not excessively elastic, at least the upper bound is not unlimited like many people seem to claim.

A lot of the time you don't know your problem, and the minimum instance size for a dev environment in a cloud-agnostic architecture can be significant. Scale-to-zero is a big help, not just in deployment time, but also in developer productivity and autonomy.

> Apart from that !cloud != self-hosted. There’s plenty of hardware providers that can get you a dozen machines in under an hour; even discounting virtual host providers like vultr and Tulsa.

I acknowledge they exist, but I do think they're a fraction of the market for good reason. The value adds of cloud providers often outweigh the costs and complexity of the undifferentiated heavy lifting needed for necessary feature parity.


I think one of the major factors often not mentioned with cloud providers is the actual setup/human maintenance savings. As is, the premium added to the 'managed' services is usually much less than what a DevOps engineer would cost to maintain/run things. I.e., it is safe to assume (at least in the market I'm in) that 1 hour of DevOps services will cost 150 USD; 10 hours would be 1.5K USD. That is actually enough to host a PHP-based medium-sized ecommerce solution using purely managed services for about a month (<500 orders/day, using AWS ECS, RDS, ALB, OpenSearch, ElastiCache). Sure, there is still a DevOps cost attached to such solutions, but it is drastically lower than on-prem. I've been using these arguments with my customers to migrate them to AWS for the past few years (since wide Fargate adoption) and it's been great so far.

I do not think that saving money should be the primary reason to migrate to the major cloud providers. The primary reasons should be the increased robustness, ease of maintenance, and the additional tools that you get.

PS: This is written entirely from the perspective of running multiple small/medium ecommerce solutions (500-2000 USD/mo AWS bill) using the stack mentioned above. I have no real-world experience doing grand-scale setups.


Perhaps I am confused or we are talking about different things and consequently talking past each other, but I can literally open a new AWS/GCP/Hetzner/DO account, plug the credentials into my local configuration (in code), and then run a command with NixOps to provision an entire network of machines with custom specifications, and to automatically install all the software I need on those machines.

Perhaps you aren't familiar with what NixOps and similar tools can give you?


It sounds like we're using different definitions of the word "self-hosted." If you have your own on-premises lab or colo rental, you're not using AWS/GCP/Hetzner/DO, and you've got a lot of undifferentiated heavy lifting before your NixOps kick in (including maintenance going forward).

If your point is that you can avoid a serverless architecture, still use cloud, and still use infra as code: of course you can. We've got to be disagreeing on what "self-hosted" means. OP criticized the cost and complexity of deploying EC2 instances and RDS databases across AZs, so presumably infra-as-code wouldn't help him here. OP didn't present an alternative solution, but reading between the lines, it's to not use cloud infrastructure (e.g. on-prem or colo).


Hetzner offers colo rental ;)

That said, if you just rent dedicated servers from them, you don’t have to worry about maintenance, but don’t have to pay the ridiculous cloud markups either.


I didn't knock on VMs or infra as code generally, or suggest you go all the way to self hosting. Doing IaC of course is orthogonal to AWS. Your best alternatives depend on your needs. It might be some higher level app platform too, like various current Heroku style platforms etc. Just know what you're getting into.


You've got it backwards - which is to some extent why you have this problem:

If you're fully utilizing your instances, you're working from the bottom of the pricing up, and all these things (3x redundancy, etc.) are marginal costs of doing business because you're successful. What you're opting out of is the data-center workers, the sysadmins, the operators, and the expertise that comes with them, if that offered you value.

Your hundreds of lines of CDK stand in for the people you're not employing to write them (it's your job I guess?), applying AWS systems and services to do that work for you. Given you've chosen CDK, I have to assume you've bought into AWS anyhow, and someone in the chain sees the value extracted or the savings approach.

Where AWS burns you is when you don't know what you're doing - you have a PHP app, so you put in on an instance, you need it reliable so you have two, you need shared storage so you put it on EFS, and so on and so forth.

If you're at the other end of the spectrum, and you want to buy the level of reliability and redundancy you get from three of the smallest instances in AWS spread across three AZs, you're looking at a six-figure investment. So yes, it's pricey if you use it like you use the desktop under you (8 hours a day, usually at 20% load), but you can also serve millions of hits for pennies if you learn that most of what you're doing is overhead that gets built into, well, serverless.


At scale, perhaps they are costly. But I just cannot imagine deploying a webapp without cloud services.

For the indie dev, cloud services have made side projects and experiments extremely cheap to spin up and scale.


The idea of deploying an app to a machine sitting next to me, and serving the few requests per second just from that, is highly appealing. The only thing I'd always want is offsite backups which are best done to the cloud. You've inspired me!


Cloud Architecture: tech's version of a record label contract


Hire a better engineer/architect.


Indeed, this is often the next step. The new architect must have lots of AWS certifications. With the new architecture, the long-term cost savings will be exponential compared to the current projected cost curve!

This can optionally combine with the Kubernetes scenario.


Exactly, then you need at least 1 architect per app. I wrote an article about this problem: http://blog.bytester.net/posts/cloud-talent/


This was a great read, thank you. I’m just now removing my application-focused blinders, but I already see the same technical, organizational, and financial issues.


> This can optionally combine with the Kubernetes scenario.

Some people like to poke fun at the expense of Kubernetes, but if they did any professional work, at the first failed deployment they would sell their firstborn to have something similar to Kubernetes's deployment rollback feature.


We don't use kubes. Deployments sometimes fail and we deploy the old versions easily. No firstborns sold.

I have used Kubernetes blue/green deployments at an old job and it was beautiful. But to say Kubernetes is that much better compared to rsyncing your executables to the server and restarting the service is plain wrong. It's a bit easier and a bit more declarative, sure, but manual rollbacks were a thing fifty years ago and still are.


> You need to be triply redundant across 3 availability zones (3x), for both the RDS DB cluster and the app containers (2x). And then have separate dev/staging/prod envs (3x). That's 18x.

Don't believe this has anything to do with the cloud; if you want the same thing with your own data center, it's going to be 18x there as well.

> You can then get a pat on the head ("pass AWS well-architected review").

Or you can be pragmatic, the cloud will allow you to do whatever you want, nobody is stopping you.

> Then they innovate "serverless" stuff

Serverless does not replace everything, there are some things it's good for, but not everything, if you use it for something that it's not useful for then that's an issue with your judgement, not with the fact that it exists as an option.

> And don't even get me started on how much work by this time has gone into building the infra-as-code to manage the Rube Goldberg machine. You'll (seriously) have more lines of CDK code than app logic, and per line it was slower to develop & harder to debug than your actual app code.

I have thousands of lines of YAML and MD that I can give to anyone who is skilled enough to copy and paste and that person can 100% replicate the environment without me helping them once.

If a component fails, I can just rebuild it very quickly and 100% accurately by re-importing the CloudFormation code; if I need an environment that is 100% the same as the one I already have, I just re-run a few scripts and I have it.

What credible alternative can you propose that is better than infrastructure as code?

> They've probably never costed or self-administered a server in production use, or if they have, they know to keep quiet.

I could make a similar snarky remark about sys-admins not understanding that the world is not static; however, I would instead focus on the fact that acquisition of hardware is nowhere near the total cost of the infrastructure.

According to a 3-second Google search, sys-admins in the US can cost around 75k-140k. Sys-admins can also slow down development by having to perform more manual actions and by being overprotective; that adds additional costs, because you will still have to pay your idle developers while they wait for the sys-admin to action the ticket where they requested write access to their own home folder.


You could spend less than 18x the AWS cost, because AWS compute is so expensive, depending on what hosting you choose. A self-owned DC isn't the best option for most. You might also skip the overkill (for most cases) AZ redundancy because it wasn't pushed on you. You might even go with a much more managed platform, like the Heroku-like ones. Depends on what you build and for whom.

I'm not knocking IaC, but you end up with unnecessarily huge amounts of IaC managing unnecessarily complex AWS service infra once you can't afford the monolith + DB model and get roped into DynamoDB, Lambda@Edge, API Gateway, Step Functions, etc., all requiring IAM roles, security groups, observability tooling, CI/CD & version control complexity, config services, etc.: all the downsides you read about in microservice horror stories. And you can't even ssh in and strace or tcpdump the stuff like you could with your own microservices, they're black boxes.

I also don't want separate sys admins, at least the bad kind you describe. But having your own servers, or knowing what they cost, doesn't mean you would.


> You could spend less than 18x the AWS cost

You can, but I'm sure you are making a tradeoff to achieve that. Sure, that tradeoff might be worth it for your use case, but I seriously doubt you could get exactly the same thing you get from AWS for substantially less. You need to factor in the reliability and flexibility aspects: if an EC2 instance fails, I can just get a new one; if your own server fails, hopefully you have another one on hand.

There are other things that make the AWS offering compelling: you get everything in one place and you can expect to find IT people who know how to operate AWS, whereas that is probably not true for smaller competitors who might be cheap but also have less to offer in terms of services and mindshare.

> AWS compute is so expensive

I do auto scaling, which means that if nobody is using the application, I'm paying for a single small instance and then I use ECS tasks for offline computation, which means I end up paying exactly for what I use.

If I were hosting my own on-prem solution, I would still end up building something similar.

> AZ redundancy because it wasn't pushed on you.

If you work for a client that wants AZ redundancy, likely they are a client that is more than happy to pay the premium. For some clients, reliability matters more than infrastructure cost. For those clients infrastructure cost might even look like a rounding error compared to all the other costs.

I generally have no problem implementing a requirement that I don't believe is technically needed, if the business wants it, if they believe 3x the cost is worth it for the additional resiliency.

> you can't afford the monolith + DB model and get roped into DynamoDB, Lambda@Edge, API Gateway, Step Functions, etc., all requiring IAM roles, security groups, observability tooling, CI/CD & version control complexity, config services, etc.: all the downsides you read about in microservice horror stories.

1 instance x 2.32 USD hourly x 730 hours in a month = 1,693.60 USD (Aurora PostgreSQL-Compatible DB)

A single developer will cost you at minimum 10k per month.

From a business perspective, I'm more than happy to pay thousands more for RDS and potentially overpowered EC2 instances than wasting far more expensive developer time on going nuts with serverless. For small things where it makes sense, sure I'm happy to go with serverless and save a few pennies, but not at massive developer time costs.

> And you can't even ssh in and strace or tcpdump the stuff like you could with your own microservices

Then you should just stick to EC2 instances if that's your cup of tea, though the idea of debugging individual instances becomes distant when you start auto scaling or start using containers.


It is strange that bandwidth/traffic/egress seems to be an afterthought in this post and I think the numbers in the linked blog post are wrong anyway.

Bandwidth out of AWS is at best 9 cents per gigabyte, which is HUGE and for many applications the primary cost.

Indeed when I saw the title of the post I thought "there's only one number you need to know with AWS, it's 9 cents per gigabyte".

Please correct me if I am wrong. I went looking to find the correct number but as always it is hard to find the exact prices.


I've moved probably thousands? of workloads to AWS in the last decade. Not a single one of them has been 'internet facing' or pushed much/any data out of the environment. Average bandwidth bills are <$1k/yr.

I'd guess most of AWS's customers aren't....that kind of shop.

If you are, that's definitely something to be aware of though, no doubt.


In every single AWS setup I've worked with (~10), bandwidth has been a not-insignificant portion of the cost. And it's been surprising each time, for anyone looking at the bill, that bandwidth was as much as it was.


I still maintain bandwidth costs are the most poorly understood in the current CTO/CIO/Dev generation.

The "cloud native" generation has (of course) never purchased bandwidth (as in circuits, cross connects, etc) and the result is "Well, AWS charges $0.09/GB so that's what it costs".

I wouldn't be surprised if AWS is capitalizing on this blind spot and offering many services as loss-leaders knowing their 100x markup (or whatever) on bandwidth more than makes up for it.


Pricing is different for large customers, they can get special discounted rates.

For the small guys, that's the ballpark as I understand it also.

And I agree with you. That's the #1 item when building online infrastructure.


The complaints about cloud costs vs on-prem often compare apples with pears. Complaining that a machine "I could buy for $1000" costs like $4000/year on cloud misses an enormous number of costs that people don't consider, as well as the reliability/availability that you are paying for. Do it the same way on-prem and you need to include the costs of:

* An air-conditioned room + maintenance (this could be very large if you are a big company)
* How many staff at $50-100K per year to look after it
* How to deal with internet reliability - do we need multiple high-bandwidth broadband links at $1000+/month?
* Ongoing op-ex costs for failed components
* Cap-ex replacement of hardware after 2-3 years

The point of the cloud is that these costs are wrapped up in your rental cost so of course it looks high.

If you have the people/skills and need control over exactly how your hardware is provisioned, go on-prem and good luck but plenty of us would rather concentrate on what we are good at and are happy to pay 50-100K per year to a cloud provider to do it for us.


The complaints about the complaints about cloud costs vs on-prem often deliberately ignore that there is a spectrum of options between "cloud" and "on-prem". This is not a binary choice between these two extremes.

Your air-conditioned room plus networking can easily be rented from a multitude of providers, for example, in different variants from "el cheapo" to "ultra-high-professional with lots of fancy certificates". You can then place your own hardware in rented racks, you can buy your own racks and place them in rented space, or you can rent everything including the hardware.


Our colo even offers managed Kubernetes on top of the rented hardware.


> If you have the people/skills and need control over exactly how your hardware is provisioned, go on-prem and good luck but plenty of us would rather concentrate on what we are good at and are happy to pay 50-100K per year to a cloud provider to do it for us.

This is the only thing that matters when you talk about AWS. All your other points are not mutual exclusive.

I love the "good luck". As if it's some magic that we do to get applications run on the internet. It's not as hard as you think. Even scaling. These are solved problems and just abstracted away by AWS behind a paywall.


> I love the "good luck". As if it's some magic that we do to get applications run on the internet.

For those of us that haven't gone through the internet of the late 1990s or early 2000s as a software developer, where it was the norm for everyone to self-host their hobbyist or semi-professionally-operated online services on rented or bought linux boxes, it actually often is some kind of black magic.


For a single box, the extra cost is roughly as much as you'd pay for your personal computer.


This is a tiny bit out of date, but the costs have not changed much from what I can see. Also, for a point of comparison:

                        Azure      AWS
    CPU core / month    $40.32   $34.56
    GB ram / month      $10.00    $8.64
    GB logs              $2.30    $0.53
So it seems that for the "latest gen" VMs currently available, Azure does charge a little bit of a price premium over AWS, but it's not a massive difference. Also, costs have gone up but it's roughly in line with inflation. Also note that 2022 era CPUs are somewhat faster than 2019 era CPUs, so you get more for your money.

The biggest difference between the two biggest cloud providers I've noticed is Azure Log Analytics versus AWS CloudWatch logs.

Azure's logs are nearly 5x as expensive as AWS to ingest, and AWS is already overpriced in my opinion. Charging $542 to keep 1TB of logs for a month is insane, because they're stored in a compressed columnar format and aren't actually that big on disk. Azure's $2,350 for a TB of logs is highway robbery and makes it impossibly expensive to use many of their dependent services in the way they are supposed to be used. For example, ingesting all web logs can cost more than the web server being monitored! Similarly, their SIEM security product Sentinel is stupidly expensive if you enable all of the data feeds it supports. Think 5 or 6 figure sums per month.

PS: Another thing that catches people out when designing cloud solutions is disks are typically "fully allocated". If you pre-provision an empty 1 TB disk, you're charged for the full terabyte, not your current usage of it. This is not what VMware vSphere does (typically), so many people simply ask for 1, 2, or even 4 TB disks, put 20 megabytes of application code into them, and then they're unaware that they're overpaying for storage by a factor of 50,000x or more.


Network Egress always gets a bad rap but recently it's the SSD cost on AWS/GCP that has me upset. IMO storage needs to take on some of network egress's heat :)

$0.10 per GB-month for a basic SSD is outrageous compared to Hetzner et al. A 4TB SSD for a month costs $400 on the big guys and like $50 on the commodity providers. It's not even a managed service!


EBS is a managed service with very high redundancy (3/5 copies of your data, I don't recall). Hetzner's SSD is just that, an SSD.


I've always been bad at math but isn't 4TB for $400 approximately $0.10/GB?

As another commenter mentioned EBS is quite a bit better than just a SSD in a number of ways. You can resize it, it has published resiliency numbers, quick and easy snapshots, etc etc

Also worth noting that the newest generation (gp3) is $0.08/GB-month, although like all things Amazon there are other charges to take into account.


> I’ve always been bad at math but isn't 4TB for $400 approximately $0.10/GB?

Yes, I was trying to say that Amazon and Google charge $400 for 4TB (a lot!). And yes, EBS is better, but what about when all you need is actually just an SSD?


The one AWS service that I pay for is S3 Glacier Deep Archive.

It is very cheap insurance for knowing with virtual certainty that your data exists. Egress is expensive ($100/TB last I checked), but in the target use - backups of last resort - I will be very glad to pay that to have my data back. And if all goes right I'll never pay it.

Personally I find ordinary S3 far too expensive to justify. Not so with Deep Archive.
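Back-of-envelope on why it's cheap insurance; the storage rate here is an assumption based on us-east-1 list pricing (~$0.00099/GB-month):

    # A year of 1 TB in Deep Archive vs a single full restore
    storage_per_year = 1024 * 0.00099 * 12   # ~$12.17/year
    restore_once = 100                        # ~$100/TB egress, as above
    print(storage_per_year, restore_once)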

Note, I work for AWS. Opinions are my own.


It might be cheap now, but there is absolutely no guarantee that they won't one day send out a fairly nondescript terms-of-service update email and increase the prices without you noticing. And because all these ever-so-helpful cloud services don't allow you to create payment caps, the costs can explode any day. It's all one executive decision away.


Historically speaking, in the ~15 years AWS has been around, have they ever increased prices for any service? In my experience, they've only ever decreased prices of their services as advances in tech have been made. Sure, some start out expensive (cough, Macie and Fargate, cough), but they didn't increase prices after being launched, as far as I'm aware.

Assuming they never have increased a price in 15 years, that doesn't prevent them from doing so, but it's a confidence-inspiring track record.

I have about as much confidence that AWS won't raise prices on existing services/features as I do that Google will eventually kill another good project - https://killedbygoogle.com/


The fact that they don't have a simple price cap setting proves to me that they have no intention of being on my side. Promises from businesses that they won't increase prices... well, they mean almost nothing to me. It's "nice".


I feel like AWS has a pretty good track record at this point for not increasing prices - certainly to the point that I trust them not to pull an Oracle. (Hypothetically, my volume of data stored with them is low enough that even a 10x gouge would just make me angry, not wound me financially. There's not a mechanism for these archives to incur explosive costs, but your point is why I avoid other less-predictable cloud services.)


Don’t be silly. It’s a game theoretic certainty that they will indeed pull an Oracle.


There's no need to act condescending or call people names ("silly"). A 15 year track record and a public commitment aren't things you can hand wave away.

I'd also point out there's no substance behind your "game theoretic" jargon.


The parent put it too strongly, but it's equally naive to assume that track records and public commitments mean much to a company with a commitment to profit shareholders, run by Jeff Bezos.


I can’t believe this needs to be spelled out.

Of course AWS needs to make money.

Of course they make more money in the short term if they raise prices.

Of course this affects their long term revenue and profitability because customers are afraid that prices of any of the products they use could rise.

Say whatever you want about Amazon, but they have never been a company that cares about the short term. They care about profits for the next 20 years, not Q3 2022.

You’re calling me/my views naive? Seriously? Just after I excoriated someone for name calling?


Out of curiosity, what do you think a fair price for s3 would be?

I haven't had a problem with the storage prices; it's just the egress fees that can add up.

(I don't work for aws or a cloud provider, just curious)


A competitor of AWS claimed that AWS' egress fees are excessive and by how much - https://blog.cloudflare.com/aws-egregious-egress/

As for a fair price, I think Cloudflare and Backblaze provide S3 API-compatible services with better pricing. Cloudflare doesn't charge for egress at all, which lends substance to their claim that AWS is overcharging for egress.


Egress fees on AWS are high, but I'm not sure free from Cloudflare is the best evidence given they still haven't had a profitable year (getting close though in recent quarters).


To be clear, I have no special insight into underlying S3 costs. Only the perspective of a customer.

My opinion is that Backblaze has done an awesome job of driving their storage costs down, scaling that up, and then reflecting that thriftiness in their pricing. Their technical communication (e.g. hard drive reliability reports, software engineering blogs) makes me believe in their competence. So when Backblaze sets a price, I expect that's pretty close to the minimum sustainable price, because they've spent a lot of effort on that.

B2 currently charges $5/TB/mo with lots of options for bandwidth that's too cheap to meter. This strikes me as pretty fair overall.
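That ratio is easy to make explicit, using the ~$23/TB-month S3 Standard list price mentioned downthread:

    # S3 Standard vs B2, $/TB-month
    print(23 / 5)   # ~4.6x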


B2 is definitely a lower tier of storage and I don't begrudge amazon wanting $23 for their top tier.

But the bandwidth is a huge issue.


In what sense? The durability is 11 9s either way [1]. If you value all the different AWS regions, then it's true that B2 can't match that. I'd still argue the 4.6x cost difference isn't worth that for a lot of uses, though.

[1]: https://help.backblaze.com/hc/en-us/articles/218485257-B2-Re...


The data is only stored in one data center, and the 99.9th and 99.99th percentile latency on requests is not good, which can cause issues depending on use case.

And if durability is your main concern you'd be on a cheaper S3 plan anyway.

> I'd still argue the 4.6x cost difference isn't worth that for a lot of uses, though.

Sure. S3 and B2 have different strengths, and B2 is better for many uses. But looking at the full stack and the performance at each layer suggests to me that S3 charges a reasonable/fair price for storage. And that's what the question was about.


r2 is in public beta! it’s great.


>AWS represents the computing power of its machines in Elastic Compute Units, and 4 ECUs represent more or less the power of a modern CPU.

What does the author mean by the "power of a modern CPU"? Which microarchitecture? Which generation? How many cores? In 2019 I had a Ryzen 9 3900X with 12 cores. Was that a modern CPU? How many Amazon ECUs would it take to be equivalent?

It seems to me he assumed all the CPUs were equally powerful at the time he wrote the article.


There is also the cost of supporting a global monopolist that exploits populations of workers, undercuts prices, and destroys local production and trade in order to spread its empire.


Over the years, I've migrated from AWS -> Digital Ocean -> Wasabi/Hetzner for various compute/storage needs and I'm still surprised how AWS price hasn't come down considering many other options in the market. It feels like cloud computing & storage have really become commoditized with many viable alternatives but the price charged by Azure/AWS/GCP doesn't seem to reflect this.


aws can be so cheap. aside from egress bandwidth, its pricing is fantastic.

crazy aws cost is always gonna be a case of "you're holding it wrong", either intentionally because it's more fun, or accidentally.

how to use aws well and cheaply (rough break-even math below the list):

- scale to zero whenever possible (lambda)

- use minimal infrastructure (lambda managing ec2 with route53 health check)

- don’t use rds (s3+dynamo ideally, sql on i4i nvme if you must)

- automate everything, and use the automation constantly.

- egress heavy components get pushed out to cloudflare. r2 is in public beta!
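a rough lambda-vs-ec2 break-even, assuming us-east-1 list prices (~$0.0000166667 per GB-second plus $0.20 per 1M requests for lambda; a t3.micro at ~$0.0104/hour):

    # when does an always-on t3.micro beat a scale-to-zero lambda?
    lambda_gb_s = 0.0000166667                # $/GB-second (assumed list price)
    per_req = 0.20 / 1_000_000                # $/request
    cost_per_call = 1.0 * 0.1 * lambda_gb_s + per_req   # 1 GB memory, 100 ms
    t3_micro_month = 0.0104 * 730             # ~$7.59/month
    print(t3_micro_month / cost_per_call)     # ~4M requests/month break-even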


> scale to zero whenever possible (lambda)

AWS pricing is great because you aren't price-gouged if you don't use it?

And your statement isn't really true. If you use AWS Lambdas to implement something like an HTTP endpoint, you still need to pay for stuff like AWS KMS and API Gateway even if no client hits you with a request.

- use minimal infra (lambda managing ec2 with route53 health check)

Above the free tier, AWS Lambdas are terribly expensive when compared with EC2 alone. You're charged per RAM*time, you're charged per concurrent execution, and if your Lambda times out, you have to pay for the re-execution.

- don’t use rds (s3+dynamo ideally, sql on i4i nvme if you must)

AWS pricing is great because you are not price-gouged if you do not use them?


> AWS pricing is great because you aren't price-gouged if you don't use it?

Precisely. If you spin up instances with a per-time cost and they aren't being used, then you are wasting those instances. The engineering to automatically scale down instances when they are not being used, and to scale up (vertically and/or horizontally) when needed, is non-trivial to do correctly and, quite frankly, is now undifferentiated heavy-lifting. Paying a higher price when the infrastructure is being used and then not paying when it is not being used can and does result in lower overall costs when correctly architected.

Personally speaking, my AWS bill last month for my serverless side-project was $2.74. Sure, this includes stuff in the free-forever tier. Does that really matter? Nobody rents servers out that cheaply.

> you still need to pay for stuff like AWS KMS and API Gateway even if no client hits you with a request.

For KMS, true; for API Gateway, wrong. API Gateway pricing is per-request, not per-time. If there are no requests, there are no charges. In most cases, you don't really need to manage your own KMS keys; there is no added value compared to using AWS-managed keys. Even if you do want to manage your own key, the pricing is a few bucks a month per key, including usage, unless your project is highly scaled up, in which case you're in a completely different class of cost-optimization.


API Gateway might not be necessary anymore:

https://aws.amazon.com/blogs/aws/announcing-aws-lambda-funct...


aws pricing is literally a catalog. what and how much you buy is kind of the whole thing. a subset of the catalog is good.

aws core primitives are:

- flexible

- reliable

- robust

if your service succeeds, you will definitely want to move some components out of aws, but not all of it. aws can always remain the control plane.

if you don’t need what aws offers, don’t use it. they offer it at a price, and it is what it is. thanks to other providers, it likely will remain fair over time.

what i outlined was how to make aws cheap(er). it’s a simple strategy:

- use the primitives: s3, ec2, nvme, route53, lambda, apigateway

- avoid the services: rds and all the rest

- avoid heavy egress: use cloudflare workers and r2

scaling to zero means avoiding things like kms keys.

not sure where you got apigateway pricing from, it definitely scales to zero without fixed minimum cost.

lambda will always be the most reliable part of your stack. when you have enough traffic that it’s worth it, use ec2 instead and let lambda manage those machines.


I don't agree about avoiding the services. If you're mega successful down the line, you can likely afford to do the migration when it's expensive, but the services are what save you _time_, and initially money too. I can spin up an RDS server that is secure and production-ready in less than 5 minutes and that, quite frankly, I don't need to touch for a very, very long time. If we need more power, I can restart the instance with a bigger type, or if we've overspecced, I can use a smaller one. Compared to running your own instances on EC2, there's a few hours of poking around to do things like password handling, backups, and logs, not to mention you're now responsible for managing updates and the like. It's a false economy to avoid them, IMO.

That's not to say you should immediately pivot to all in AWS all the time but like any engineering be aware of the price you're paying to build Vs buy


definitely. some of the services are great. they are just a lot more expensive. in the context of making aws cheaper, avoiding services is the play.


> if your service succeeds, you will definitely want to move some components out of aws, but not all of it. aws can always remain the control plane.

Do you factor in the costs of dev time in a cross-cloud environment at all? Never mind all the AWS-specific tooling and code you write, which needs to be thrown away and re-done to accomplish this?

Don't get me wrong, I love AWS when I'm not paying the bill, it makes my job of shipping features quickly so much easier. It's good and it works well. But it's a little crazy to think it's "cheap" and to ignore the costs of vendor lock in.


Do you factor programming language, libraries, operating system, and CPU architecture into your vendor lock-in and the specific tooling you need to write?


vendor lockin is fine. it's like having a great employee, but not leveraging them fully because of bus factor and cogs. you should be sad if a great employee leaves, or your vendor shuts down. choose both carefully, and trust them.

my only experience expanding out of aws is to cloudflare for egress. i’m sure it can be rough, but so can anything solved with dev time.

expanding out of aws with individual tightly scoped components should be easier than general cloud migration. just bandwidth egress. just cheaper cpus for background processing. just whatever.

keep the complicated stuff on main cloud where it belongs.


> Above free tier AWS Lambdas are terribly expensive when compared with EC2 alone. You're charged per RAM*time, you're charged per concurrent execution, and if your Lambda times out you have to pay for the re-execution.

They are dirt cheap for many use-cases where your application only runs a few times a day/hour and you need ad-hoc scalable performance. For example, processing/creation of CSV reports, which is a common task in my organization.
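A minimal sketch of that pattern, for flavor: an S3-triggered function that summarizes uploaded CSVs (the event wiring and names here are hypothetical):

    import csv
    import io

    import boto3  # bundled with the AWS Lambda Python runtime

    s3 = boto3.client("s3")

    def handler(event, context):
        # Invoked by S3 put events; counts data rows in each uploaded CSV.
        for record in event["Records"]:
            bucket = record["s3"]["bucket"]["name"]
            key = record["s3"]["object"]["key"]
            body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
            rows = list(csv.reader(io.StringIO(body.decode("utf-8"))))
            print(f"{key}: {len(rows) - 1} data rows")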


> They are dirt cheap for many use-cases where you application only runs a few times a day/hour and you need ad-hoc scalable performance.

If your app consists of running a quick cron job a couple of times per day, you do not have much of an app to begin with.

If you are already paying for EC2, running those short-lived cron jobs in any one of those instances already adds zero cost.

The main beneficiary of AWS Lambdas is AWS itself. It charges a huge premium for what essentially amounts to spare cycles in its infrastructure, and in the process ties clients to a proprietary solution.


> If you are already paying for EC2, running those short-lived cron jobs in any one of those instances already adds zero cost.

Now you have a scheduling/orchestration concern. Kubernetes was built to address that kind of concern. The thread is filled with people bemoaning the cost of running Kubernetes.

> The main beneficiary of AWS Lambdas is AWS itself. It charges a huge premium for what essentially amounts to spare cycles in its infrastructure, and in the process ties up clients to a proprietary solution.

And as a result, we (the customers) don't have to pay for infrastructure that would otherwise be idle. Why can't it be a win-win?


License costs should also be included. It's easy to forget that a special feature you are using in SQL Server or Oracle can bump you into a much higher license tier. The main areas of concern for license costs are databases and Windows Server.

Also, DevOps people are not cheap. You need to factor in implementation and maintenance. I could be using that money on a bigger server with less complex infra.


Regarding bandwidth costs, there are new AWS services with much better pricing. I just happened to write about this here, https://www.vantage.sh/blog/nat-gateway-vpc-endpoint-savings


Inter-zone traffic is $0.01/GB EACH WAY. So in effect it's $0.02/GB https://www.duckbillgroup.com/blog/aws-cross-az-data-transfe...
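For a concrete sense of scale (rough math, with a hypothetical 10 TB/month of cross-AZ traffic):

    # Pushing 10 TB/month between AZs at $0.01/GB each way
    print(10 * 1024 * 0.02)   # ~$204.80/month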


The two data visualizations in the article are poorly designed.

They show line graphs where the X-axis is the type of EC2 instance and the Y-axis is the price. Firstly, the X-axis of any line graph should be time. Or perhaps some other variable that "progresses". It's bizarre to use a category to label the X-axis of a line graph. Secondly, the X-axis is not ordered alphabetically, but seemingly randomly.

I don't mean to be harsh, but a better data visualization would be a bar chart, not a line graph. Bar charts are better suited for categorical dimensions. Also, the type of EC2 instances should be grouped and/or sorted, but not randomized. This way, the information from the data visualization can be parsed more quickly and easily.
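A sketch of the suggested fix, with assumed us-east-1 on-demand prices for illustration; sorting the categories by value makes the comparison immediate:

    import matplotlib.pyplot as plt

    # On-demand $/hour (assumed us-east-1 list prices, for illustration)
    prices = {"t3.micro": 0.0104, "m5.large": 0.096,
              "c5.xlarge": 0.17, "r5.2xlarge": 0.504}
    names, vals = zip(*sorted(prices.items(), key=lambda kv: kv[1]))

    plt.bar(names, vals)   # bars, not lines, for categorical data
    plt.ylabel("$ / hour")
    plt.title("EC2 on-demand price by instance type")
    plt.show()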


The point at which you turn a bar chart or scatterplot into a line chart is when, and only when, intermediate values can be linearly interpolated between adjacent values.

This is often the case when the X axis is time, but far from always, and there are other types of X axis that satisfy that property too.

That's the strict way of looking at it.

Slightly more loosely speaking, a line also helps comparing whether adjacent values are higher or lower, because humans are fairly good at judging angles relative to each other. So under this looser requirement, the X axis values have to be such that adjacent ones are more comparable than further apart ones.

Under the looser requirement, time is actually a surprisingly bad X axis because values further apart are normally just as valid comparisons as those close together!

There's a third, very loose reason to connect the points of a scatterplot with lines: it helps the eye judge the sequence of points, and thus get a sense of the internal variation in the series.

This is a very loose requirement, but also a powerful tool.


I refuse to learn these numbers; they become outdated so fast. Are they still relevant today? One would spend quite some time figuring that out instead of optimizing their own code/architecture, for which there are always good discounts to be had!



Great little article, thank you. One thing: can you add the unit-cost-per-month details for storage and bandwidth? It makes a difference in calculating the cost; it's not the same to assume $0.10 TiB/month versus $0.10 GiB/month. I think that data would be most useful without having to go to AWS for verification.


This title is toxic. Don't push AWS on me. AWS is for huge corporations or for devs who don't know how an internet application works.


> for devs who don't know how an internet application works.

I'd argue AWS is even riskier than any other solution for people who don't know how applications work. There are about 3,000,000 footguns loaded into AWS.


Why would you make line plots, with a category variable on the x-axis? Bar charts would be appropriate here.

Also, the lines seem to have way more line segments than there are labels visible on the x-axis, so what do the graphs even mean? Are they based on a larger number of instance types than what the labels show? What are those non-labeled instances, then?


It seems that the article is almost 2 years old and the script is missing from GitHub.

Here is the original website the information is taken from [1]. It provides the latest data with every possible instance price.

[1] https://instances.vantage.sh/


AWS is overrated. Most of the engineers and managers choose it because it's a safe choice for them. But in the long term the costs rise and you can't get away from it. I would even argue it's only for those who are afraid of the command line.


> I would even argue it's only for those who are afraid of the command line.

This makes no sense. Using AWS beyond spinning up a VM requires extensive knowledge and comfort at the 'command line'. In fact, I think the vast knowledge required is one of the negatives of all the current cloud providers. Back when I could walk down the hall into the DC and sit down at whatever machine, it was easier, and arguably required less knowledge.

> But in the long term the costs rise and you can't get away from it.

With AWS at least, our long-term costs have only gone up as the business has grown. If we scaled back to the size we were many years ago, costs would have gone down.


So you could easily switch away from AWS if needed?

I've heard a few horror stories of vendor lock-in and I can easily understand that (but no experience of my own, luckily).


Cloud should just be a commodity, where you can switch between providers without changing a line of code. Until this is the case, I will keep building my own solutions.


What would you use instead?


Cloud services will eventually shift the entire market back to on-prem strictly because of costs... costs that are supposed to go down over time.


Until AWS lets you set hard limits on your spend ahead of time, I'll take a hard pass.



