I don't mean this case to be universal. In particular, I think cloud services force applications to have a particularly good, modular design (which is a cost in itself), whereas with metal, as you wrote, you can overprovision relatively cheaply.
I think the analysis you're making overlooks some important characteristics of the infrastructure engineering aspect.
Some typical network/infrastructure elements, in particular firewalling, load balancing, and network management don't necessarily belong to the "rocket science" type of application; they are easy to overlook in "type A" services, which then end up with a "kind-of-HA-but-not-really" infrastructure. That is OK, but it makes the cloud vs. metal comparison not really meaningful, because in the cloud those features are baked in ("almost" for free).
I'm very skeptical for example, that the 5x figure includes hardware for the above network equipment and management.
To summarize, it's perfectly fine not to have an "advanced" infrastructure, but it must be highlighted that such conditions make a direct comparison incorrect.
What did you transition from, and to what? 20% sounds like a very small markup. From my experience, and from other people's experiences documented on the web, the markup is much, much larger (around 10x?) than what you saw.
So I would be rather interested in more details as many people ask me about Amazon transitioning, and 20% markup would be killer.
"Some typical network/infrastructure elements, in particular firewalling, load balancing, and network management don't necessarily belong to the "rocket science" type of application;"
I surely do not know your demands, but firewalling, load balancing etc. look rather easy to me today for everyone except Google, Amazon, LinkedIn and AirBnB, and 99% of startups are not one of these.
"I'm very skeptical for example, that the 5x figure includes hardware for the above network equipment and management."
Not sure what you are using, but the hardware would be around $40 per month for that kind of network architecture (FW, HAProxy, Nginx, ...).
In my last job we had large NetScalers which were much more powerful than HAProxy/Nginx on a rented server, and I assume AWS is as powerful, but for most of my clients this would be huge overkill.
The LBs you're talking about are software. I was referring to hardware solutions; a soft load balancer is still a good solution, but it brings us back to the point I made before: unit granularity.
Do you refer to two (two is the minimum required for HA purposes) dedicated load balancing machines, or mixed services?
In the former case, metal is not very convenient, as the minimal unit, even for a pure LB machine, is still expensive.
In the latter case, it's hard to say, but I think there is plenty of middle ground between a small startup and Amazon, where the cloud granularity is helpful and cost-effective (I'm not implying that it's generally cheaper than metal, though).
Our base environment (we have several) has 4 servers, 2 of which are used for the app servers and load balancers, and 2 for data stores and queue processors.
Each server is the typical "8k" server (as someone in this thread called it), or a bit more costly, actually.
Generally speaking, the servers are significantly overprovisioned.
There are a couple of factors that made the conversion to AWS cheap (roughly +20%).
The first is that the base unit of a metal server is very large (1 server). Although it's cheap to scale vertically, scaling horizontally, for HA purposes, is expensive, because it costs at least 2 units.
For example, the total power of our app servers is overprovisioned in the 20x range (CPU and memory are cheap, right?). Even with minimal speccing, a metal server still costs around 3-4k, and you need 2, so that's 6-8k.
In AWS, we can work with very small units. So maybe we end up paying the same amount, but we don't need that excessive power, and, crucially, we have networking for free (or almost).
The second factor is networking and hosting costs, which are not trivial. We rent managed firewalls and load balancers, which in AWS are free, or almost.
Also, it's important to spread the server cost over time, since no metal server lasts forever. If you buy one for 9.6k, it's 100$ a month for 8 years. When it breaks, HA goes temporarily out of the window until it gets fixed (or you buy a new one, or you move services around).
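To make the amortization arithmetic explicit, here is a minimal sketch. The function name is made up for illustration, and the 4-year lifetime in the second call is a hypothetical value, not a figure from this thread.

```python
def monthly_amortized_cost(purchase_price: float, lifetime_years: float) -> float:
    """Spread a one-off purchase price evenly over the expected lifetime, in months."""
    return purchase_price / (lifetime_years * 12)


# Figures from the comment above: a 9.6k server kept for 8 years.
print(monthly_amortized_cost(9600, 8))  # 100.0

# Hypothetical: the same server replaced after 4 years costs twice as much per month.
print(monthly_amortized_cost(9600, 4))  # 200.0
```

The result is quite sensitive to the assumed lifetime, which is part of why this kind of comparison is hard to make in the abstract.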
The big pain point of AWS [for us] is RDS, which is madly expensive. It accounts for something like 50% of our AWS costs.
A very rough estimate of the monthly costs of each metal server could be:
- 80$: server
- 80$: hosting
- 80$: managed networking
For 4 servers, that's almost 1000$. Adding 20% gives a budget of about 1200$, and for that, with AWS, we have a less powerful but also correspondingly less wasteful infrastructure, with lots of baked-in functionality (including more flexible HA).
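The estimate above can be checked with a few lines. This is only a sketch of the arithmetic; the variable and function names are invented for illustration, and the 20% is the markup quoted in this thread, not a general AWS figure.

```python
# Rough per-server monthly costs from the list above, in dollars.
PER_SERVER_MONTHLY = {"server": 80, "hosting": 80, "managed networking": 80}
SERVER_COUNT = 4
AWS_MARKUP_PERCENT = 20  # markup reported for this particular migration


def metal_monthly_total(per_server: dict, count: int) -> int:
    """Sum the per-server line items and multiply by the number of servers."""
    return sum(per_server.values()) * count


metal = metal_monthly_total(PER_SERVER_MONTHLY, SERVER_COUNT)
aws_budget = metal * (100 + AWS_MARKUP_PERCENT) / 100

print(metal)       # 960
print(aws_budget)  # 1152.0
```

So "almost 1000$" and "1200$" are the rounded versions of 960$ and 1152$.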
E.g. I had a setup that spanned on-demand instances, rented managed servers, racks in two separate colos and racks on premises. We expanded resources whenever it was cost effective at the time. Generally the colos won out, with on-demand instances handling traffic spikes, and managed servers primarily used for locations we did not have staff.
When your infrastructure is designed so that adding a new one of any of those is just a matter of assigning IP space to the new satellite network and deploying the first instances, whatever they're physically on, your utilisation of all the resources can be far higher.
E.g. in this setup we currently move containers seamlessly between the UK, New Zealand and Germany depending on load, available resources, and which instances need low latency. Germany vs. UK makes a roughly 8ms latency difference despite going over an encrypted VPN connection, so we've even had times where client traffic hit load-balancers in the UK while the web servers were temporarily in Germany, because it happened to be cheaper to expand there for a while (and unlike with AWS, our bandwidth costs in both locations are trivial).
If you're comparing to "let's throw a bunch of servers somewhere", then, yes, AWS probably won't be that much more expensive, and presumably that is a big part of why so many people get caught out by AWS costs once they start scaling up.
Rearchitecting may be necessary anyway, but doing this kind of work involves hiring and training better senior devs and retaining them for a couple of years at least. That's not cheap. It's a lot more expensive than those ops guys you laid off.
I think what both of these scenarios share is that they're not about saving money. They're about empire building for the Dev manager.
Also, an 8k server is a big unit. Cloud services are much more granular. This is a problem of bare metal: it's easy to overprovision because the base unit is large, and one ends up being happy to have an "overprovisioned" system, when in reality it's money down the drain.
Also, you don't count the management of the 8k server. It may (but not necessarily) be a click away; if it is, the management hardware (e.g. one from a very famous server manufacturer) may have poor software.
There are reasons why, in some cases (not in all, of course, and perhaps not in many), cloud may be more advantageous than an "8k" bare metal server.
All in all, I think without numbers, talking about metal vs. cloud in abstract, generic terms, makes a poor argument.
My point was that it's peanuts compared to the labor costs it saves. I've worked quite a few places where a single $2-3k server would have saved every employee around an hour a day. In some cases several machines would have saved an hour each.
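As a back-of-the-envelope illustration of that point, the sketch below computes how many working days it takes for a server to pay for itself. The headcount and the hourly labor cost are purely hypothetical assumptions; only the $2-3k price range and the "an hour a day" figure come from the comment.

```python
def break_even_days(server_cost: float, employees: int,
                    hours_saved_per_day: float, hourly_labor_cost: float) -> float:
    """Working days until the labor savings cover the server's purchase price."""
    daily_saving = employees * hours_saved_per_day * hourly_labor_cost
    return server_cost / daily_saving


# Hypothetical team: 10 employees at a fully loaded cost of $50/hour,
# each saving one hour a day thanks to a $3k server.
print(break_even_days(3000, employees=10, hours_saved_per_day=1, hourly_labor_cost=50))  # 6.0
```

With different assumptions the number changes, but it stays small as long as the time saved is anywhere near an hour per person per day.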
Every person you don't need gets you more than one person's worth of increased productivity due to scaling limits (see also Fred Brooks, IT and HR: more employees, more support staff).
(That doesn't happen when you're building from the jump for AWS or another cloud, though, unless you're messing up on a deeper level.)