This is like a double trap many try to sell to startups: a) you need to scale across many machines and b) the way to scale is the cloud. My take: a single machine (or two for HA) will be enough; if you really want to go big, separate the web server from the database, but that's it. And yes, I am in the website performance business: I worked on the video purchase platform of one of the largest British television stations, and even that didn't require more than a single database server and a single Redis server as a caching layer. Harken to https://stackoverflow.com/questions/5131266/increase-postgre... a question from 2011 about speeding up to around a thousand inserts per second at the cost of data loss; today you will write to an SSD and don't need to risk data loss. Does your website / app really get a thousand writes every second? I thought not. Does it even get a thousand reads? If not, then why are you building a complex database cluster...?
The other day I saw quad E7-4870 (yeah, it won't win any single-thread contest, but it has 40 cores and 80 threads) 512GB RAM servers for $299 a month, with 1TB RAM for $499. They had a low-end 2TB SSD for boot, and you could add 8x1TB HDD with HW RAID for $40...
Anyway, the funny thing is, I'm running it on a $2.50/month Vultr VPS. I got so worried when it crossed 30,000 that my site would crash. But it didn't. Then when its views got higher, I optimized further. I learned more about how to make things faster. I refactored my code to squeeze that much more juice out of $2.50 - not because I'm cheap (well, perhaps), but because I wanted to see how far I could push it.
I'll probably cross 200,000 soon and I think by then I'll need to increase it to (yikes heaven forbid!) $5/month plan.
Recently, just because of all the hype about AWS (in general and also at my work) I wanted to get started with AWS. Then I looked at their ridiculous pricing calculation page and I just closed the browser. I thought my time was better spent working on my project than learning about how to deploy a simple PHP application to AWS.
Edit: sorry, didn't mean to imply that your site isn't successful, only that in terms of traffic, it doesn't make AWS worth it.
I think the problem here is that a lot of people use AWS when there's no value in using AWS. It's commonly reached for as a first choice when it's often not a good option if you're looking to run lean.
AWS is extremely popular, but probably only efficient for a small percent of companies who have wildly variable traffic patterns.
I bill between $200 and $250 an hour. (If I had a full-time job I'd be salaried around $90-$100/hour.) Being able to pay me less because AWS's tooling makes life a hell of a lot easier once set up makes a lot of sense even for fairly small companies.
I've watched someone spend days learning to configure a performant DB server on AWS because of its poor disk I/O, then spend an enormous amount on a high-RAM instance, when a simple SSD-based server, where more IOPS were trivially available, would have had it working out of the box.
Everything has a learning cost though, so perhaps that's a somewhat unfair example, as having a deep knowledge of data centers and backbone providers doesn't come free; but I can crank out an ansible and docker script and have pretty much all the advantages of AWS, deployed on my own hardware at a fraction of the cost. So I'm not sure it's fair to say that AWS tooling offers anything particularly unique to merit its market position.
And AWS reduces business risk, which also needs to be understood and respected. Your VPS universe is not actually repeatable. It takes one line to roll out a full environment for more than one of my clients. Dev environment? Here ya go: self-serve bootstrapping and management. Prod environment? Infra-as-code, guaranteed to be what you pushed to test at the infrastructure level as well as at the deployable level.
This is why we are replacing system administrators with developers and why we are replacing hands-on system creation with cloud stacks: because it's a difference of kind and the fears of higher operational expenses are trivialized by being able to use that endless inventory to replace the expensive part of your operation--the people.
- Zero network transparency
- Abysmal IOPS performance
- Very limited configuration options
- Horrible uptime (US East has worse uptime as a region than the vast majority of quality DCs)
- Zero access to people who can really help you (unless you are at several million per month in spend)
Endless inventory and minimized administration but higher upfront and ongoing server investment is not a benefit when you need to write a website used by under 10k people a day. Nor is it a benefit for internal services moved to the cloud for nonsense reasons. Both of which I've seen done far too frequently. Simple tooling and small virtual machines or single dedicated servers are easily enough for these applications.
I can definitely understand the need for a small team working on a very large, high hit count application though. I'm not saying there are no uses for it, there totally are, and you make valid points; just that the uses are more limited than one might expect from the hype.
And, unlike the still-really-weird-and-ad-hoc VPS world, stuff like DR is a solved problem when-not-if you need it. The vig for AWS is consistently between 20% and 25% and you are able to leverage implicitly all the incredibly useful tooling and systems around you. If 20% to 25% of $26 is going to materially damage your business, you do not have a business.
- Networking is usually real bad. SDNs are your friend. Yeah, learn you an iptables and all, but this is the future, we can do better. (DigitalOcean is almost to the point AWS was at, like, eight years ago with EC2 Classic? Something like that.)
- Geographically-centralized but independent systems are hard to come by and so fault tolerance is a Big Problem. AWS loses an availability zone, my stuff keeps rolling. Can't say the same elsewhere.
- Value-add services. The sibling comment's complaining about RDS, but RDS configuration hits probably the 98% case and You Don't Have To Learn It. I'm a little more hesitant about lock-in services like SQS, SNS, etc., but moral equivalents exist elsewhere for the most part, you can use them pretty effectively.
(And are they not better served by buying and operating their own datacenters?)
I'm working at another service company expecting to clear 5M requests/day, with bursts close to what you're talking about several times a year. We've had so much pain managing colocated servers, and we can't justify the cost of 3 full data centers for just our load; that doesn't make sense. We're currently moving to a cloud provider to be able to scale out better when peaks really spike.
We had a customer that wanted to do several million requests in a 15 minute window, and can't currently handle that... We're restructuring/refactoring so that we can.
It may not be sustained, but handling 330k requests/second in bursts is a different way to think about problems than anything less than 1k/second, which many servers can hit without breaking a sweat; that's why I'd be more inclined to push for a mid-level VPS like DO or Linode in those cases. Depends on need and expected growth.
So at that point it was 2.6M views per day, which means HN itself was getting about 30 views per second. If you look at the 200k uniques instead (which might make more sense since any individual person will probably only click through once), that's about 2 unique visitors per second. So even if HN has grown a ton in the time since that post, I'd be surprised if it sent more than about 5 hits/second to anything.
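Those figures are easy to sanity-check; a quick sketch of the division (the 2.6M views/day and 200k uniques are the numbers from the comment above):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

views_per_day = 2_600_000
uniques_per_day = 200_000

views_per_sec = views_per_day / SECONDS_PER_DAY      # ~30 views/s
uniques_per_sec = uniques_per_day / SECONDS_PER_DAY  # ~2.3 uniques/s

print(f"{views_per_sec:.1f} views/s, {uniques_per_sec:.1f} uniques/s")
```

Even a single cheap VPS idles at those rates.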
I used to run a university web hosting platform that currently has, I think, five web server VMs on about as many physical machines, two physical machines for load balancing, and two physical MySQL servers in active/passive replication (i.e., only one gets either reads or writes). We hit the front page of HN fairly frequently—for instance, we host mosh.org—and it hasn't really been a problem. I remember getting paged in ... 2009 or so? ... when a particular website in WordPress got to the front page of Reddit, but we had fewer machines then, and also I think we had not deployed FastCGI for PHP at that point (for complicated shared-hosting reasons), so each WordPress page load was its own PHP process via CGI. If you're optimizing for performance, even if you want to stay on WordPress, step one is to not use plain CGI and step two is to do one of the myriad things you're supposed to do for WordPress caching.
In any case, a handful of physical machines will handle being on the front page of HN just fine. If you're doing something where you have an extremely computationally-intensive process on the first page load and you're worried you might hit HN but you might not, put it on cloud and set up autoscaling, but other than that it probably doesn't make sense. If you know you won't scale too much—and a static site on the front page of HN isn't too much—chances are that your usage is so low that you're paying a premium for the unused ability to scale and you should just pay for two cheap VPSes, and if you know you will scale (e.g., you have a large fixed workload), again you're paying a premium for the unused ability to scale down, and you should just invest in a datacenter and save in the long term.
All that said, if you've got a static site, by all means stick it on a CDN, which I think is a perfectly defensible use of cloud for sites of all sizes.
150000 hits per month is an average of a hit every 86400 * 30 / 150000 = 17.28 seconds.
Yes, that's nothing. With a hit every 17 seconds, there is no performance optimization to be done. Therefore I don't understand the concern about performance in the parent comment by 'sideproject'.
Sure, there could be peak times when there is a hit every 100 microseconds and that's what forced the parent commenter to focus on performance optimization but nothing about this was mentioned in the comment.
Details about traffic in such peak times would have made the parent comment by 'sideproject' interesting. But with the details currently in the comment, it is going to leave readers confused about why one needs to discuss performance optimization for a hit every 17 seconds on average.
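For reference, the conversion the thread keeps doing, written out (assuming a 30-day month, as above):

```python
def avg_seconds_between_hits(hits_per_month: int, days: int = 30) -> float:
    """Average gap between requests; ignores peaks entirely."""
    return days * 86_400 / hits_per_month

print(avg_seconds_between_hits(150_000))  # 17.28 s, as computed above
```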
Why not have a CDN serve static pages? Or better yet, an app on a mobile phone?
Then the hits are just APIs.
If you serve a page in 10s, then you can serve 259,200 pages/month.
This is obviously skipping over a ton of details, but it's a good rule of thumb.
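The rule of thumb as a one-liner (sequential serving only; concurrency and multiple requests per pageview are exactly the details being skipped):

```python
def pages_per_month(seconds_per_page: float, days: int = 30) -> int:
    """Upper bound on pages served back-to-back in a month."""
    return round(days * 86_400 / seconds_per_page)

print(pages_per_month(10))  # 259200, the figure above
print(pages_per_month(1))   # 2592000 for a 1-second page
```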
In addition to this, one pageview may produce many requests. Both of these need to be profiled before you can reasonably estimate how much traffic a webserver can handle given its current resources.
There are some good benchmarking tools that will load the entire page, including all its resources, and produce a more accurate load measure in terms of r/s.
As a side note, those $5 vultr instances can handle a surprising amount of static requests per second using nginx.
We would be investing far more time into system operations than I do now thanks to AWS's automation of standard stuff. My hosting bill would increase - partly because we can run on t2.micro instances, but that awfully glib advice about overprovisioning is most definitely asking for trouble; and we'd lose curated services like RDS and OpsWorks - which, by the way, are PostgreSQL and Chef, hardly an "ugly vendor lock" but simply well-designed infrastructure services based on standard parts. Oh, and I'd have to spend more money and time on auditors, because bare-metal providers don't have compliance programmes like AWS/Azure/GCP do. And I haven't even started to think about securing the resulting systems to the same level I get for minimal effort from a major public cloud.
You're not a dinosaur, no, but these claims don't match my reality, nor that of many other projects besides. This is compounded in my view because you've claimed specific expertise to promote your opinion. Any manager bringing me such a poorly considered and bombastically argued business case would be sent away with an ear bashing about TCO and opportunity cost.
Some startups (A) I work with have basically no ops costs beyond setup, integrating Docker deployments, and getting automatic backups working. Most simple technology just works, and devs can easily do operations. The largest pain point is still VPN. Machines today are very, very fast, and the load of many startups is very low. These are mostly simple marketplace startups etc. without any rocket science, aka "web UI frontend to a database". They often have <10 servers and are still overprovisioned, mostly due to HA requirements.
Some startups (B) I work with have high OPS costs due to demanding technology needs for throughput, load peaks, amount of data with innovative technology at their core, aka "rocket science".
I have not seen any correlation between AWS usage and A/B types.
From my experience with startups, the only way to successfully use AWS is deep integration and using lots of services. If you use AWS and do everything on your own, you're doing it wrong.
I don't mean this case to be universal, in particular, I think cloud services force applications to have a particularly good/modular design (which is a cost in itself) - where, with metal, as you wrote, you can relatively cheaply overprovision.
I think the analysis you're making overlooks some important characteristics of the infrastructure engineering aspect.
Some typical network/infrastructure elements, in particular firewalling, load balancing, and network management, don't necessarily belong to the "rocket science" type of application. They are easy to overlook in "type A" services, which end up with a "kind-of-HA-but-not-really" infrastructure. That's OK, but it makes the cloud <> metal comparison not really meaningful, as in the cloud those features are baked in ("almost" for free).
I'm very skeptical for example, that the 5x figure includes hardware for the above network equipment and management.
To summarize, it's perfectly fine not to have an "advanced" infrastructure but it must be highlighted that such conditions make a direct comparison incorrect.
What did you transition from, and to what? 20% sounds like a very small markup; from my experience, and from other people's experiences documented on the web, it's much, much larger (more like 10x?) than yours.
So I would be rather interested in more details, as many people ask me about transitioning to Amazon, and a 20% markup would be killer.
"Some typical network/infrastructure elements, in particular firewalling, load balancing, and network management don't necessarily belong to the "rocket science" type of application;"
I surely do not know your demands, but firewalling, load balancing, etc. look rather easy to me today for everyone except Google, Amazon, LinkedIn, AirBnB, and 99% of startups are not one of these.
"I'm very skeptical for example, that the 5x figure includes hardware for the above network equipment and management."
Not sure what you are using, the hardware would be around $40 per month for that kind of network architecture (FW,HAProxy,Nginx,...).
In my last job we had large NetScalers, which were much more powerful than HAProxy/Nginx on a rented server, and I assume AWS is as powerful, but for most of my clients this would be huge overkill.
The LBs you're talking about are software. I was referring to hardware solutions; a soft load balancer is still a good solution, but it brings us back to the problem I mentioned before - unit granularity.
Do you refer to two (two is the minimum required for HA purposes) dedicated load balancing machines, or mixed services?
In the former case, metal is not very convenient, as the minimal unit even for a pure LB machine, is still expensive.
In the latter case, it's hard to say, but I think there is plenty of middle ground between a small startup and Amazon, where the cloud granularity is helpful and cost-effective (I'm not implying that it's generally cheaper than metal, though).
Our base environment (we have several) has 4 servers, 2 of which are used for the app servers and load balancers, and 2 for data stores and queue processors.
Each server is the typical (as someone in this thread named) "8k" server (a bit more costly, actually).
Generally speaking, the servers are significantly overprovisioned.
There are a couple of factors that made the conversion to AWS cheap (~20% markup).
The first is that the base unit of a metal server is very large (1 server). Although it's cheap to scale vertically, scaling horizontally, for HA purposes, is expensive, because it costs at least 2 units.
For example, the total power of our app servers is overprovisioned in the 20x range (CPU and memory are cheap, right?). Even with minimal speccing, a metal server still costs around 3-4k; you need 2, so that's 6-8k.
In AWS, we can work with very small units. So maybe we end up paying the same amount, but we don't need that excessive power, and, crucially, we have networking for free (or almost).
The second factor is networking and hosting costs, which are not trivial. We rent managed firewalls and load balancers, which in AWS are free (or almost).
Also, it's important to spread the server's cost over time: no metal server lasts forever. If you buy one for 9.6k, that's $100 a month over 8 years. When it breaks, HA temporarily goes out the window until it gets fixed (or you buy a new one, or you move services around).
The big pain point of AWS [for us] is RDS, which is madly expensive. It accounts for something like 50% of our AWS costs.
A very gross estimation of the monthly costs of each metal server could be:
- 80$: server
- 80$: hosting
- 80$: managed networking
For 4 servers, that's almost $1000. Adding 20%, for a budget of $1200, with AWS we have a less powerful but also correspondingly less wasteful infrastructure, with lots of baked-in functionality (including more flexible HA).
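Putting this comment's numbers together (the ~20% markup and the $9.6k amortization example are both from the text above; the rounding just mirrors the "almost 1000$" step):

```python
servers = 4
per_server = 80 + 80 + 80  # server + hosting + managed networking, $/month

metal_monthly = servers * per_server         # $960, "almost 1000$"
aws_budget = round(metal_monthly, -2) * 1.2  # ~$1000 plus 20% -> $1200

# amortization from the earlier paragraph: a $9.6k server over 8 years
amortized = 9_600 / (8 * 12)                 # $100/month

print(metal_monthly, aws_budget, amortized)
```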
E.g. I had a setup that spanned on-demand instances, rented managed servers, racks in two separate colos and racks on premises. We expanded resources whenever it was cost effective at the time. Generally the colos won out, with on-demand instances handling traffic spikes, and managed servers primarily used for locations we did not have staff.
When your infrastructure is designed so that adding a new one of any of those is just a matter of assigning IP space to the new satellite network and deploying the first instances - whatever they're physically on - your utilisation of all the resources can be far higher.
E.g. in this setup we have instances where we move containers seamlessly between the UK, New Zealand and Germany, depending on load, available resources, and which instances need low latency. (Germany vs. the UK makes roughly an 8ms latency difference, despite going over an encrypted VPN connection.) We've even had times where client traffic hit load balancers in the UK while the web servers were temporarily in Germany, because it happened to be cheaper to expand there for a while (and contrary to AWS, our bandwidth costs in both locations are trivial).
If you're comparing to "let's throw a bunch of servers somewhere", then, yes, AWS probably won't be that much more expensive, and presumably that is a big part of why so many people get caught out by AWS costs once they start scaling up.
Rearchitecting may be necessary anyway, but doing this kind of work involves hiring and training better senior devs and retaining them for a couple of years at least. That's not cheap; it's a lot more expensive than those ops guys you laid off.
I think what both of these scenarios share is that they're not about saving money. They're about empire building for the Dev manager.
Also, an 8k server is a big unit. Cloud services are much more granular. This is a problem of bare metal - it's easy to overprovision because the base unit is large, and one ends up being happy about having an "overprovisioned" system when in reality it's money down the drain.
Also, you don't count the management of the 8k server. It may (but not necessarily) be at a click's distance; if it is, the management hardware (e.g. one from a very famous server manufacturer) may have poor software.
There are reasons why, in some cases (of course, not in all or not in many), cloud may be more advantageous than an "8k" bare metal server.
All in all, I think without numbers, talking about metal vs. cloud in abstract, generic terms, makes a poor argument.
My point was that it's peanuts compared to the labor costs it saves. I've worked quite a few places where a single $2-3k server would have saved every employee around an hour a day. In some cases several machines would have saved an hour each.
Every person you don't need gets you more than one person's worth of increased productivity, due to scaling limits (see also Fred Brooks; IT and HR: more employees, more support staff).
(That doesn't happen when you're building from the jump for AWS or another cloud, though, unless you're messing up on a deeper level.)
I love it when customers pick AWS (though I usually advise against it, unless they have very specific needs), as my billable hours are way higher for those clients, though it is annoying having to deal with the inevitable "why is my AWS bill so big?" after I'd told them exactly why it'd be expensive in the first place. This is particularly true with bandwidth-heavy setups, where AWS charges tens of times more per TB transferred than e.g. Hetzner.
Often I even end up getting paid to help them get off AWS again down the line when they realise how expensive it is.
Overall, the idea that managed servers and even bare-metal colocated servers take up so much more operations time is just not what I experience. Even if they did, if you're even moderately successful, it'd need to save you a crazy amount of operational time per server to make up for the price differences in hosting.
Further, my experience is that you don't need to over-provision much, exactly because hybrid setups where you spin up AWS instances (or any other cloud) to handle spikes work well. The trick is to treat managed servers exactly like cloud instances, apart from at most the initial hardware provisioning (though many hosting companies provide APIs for this, so you can abstract that away too), so that it doesn't matter where you provision.
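A minimal sketch of that trick, assuming nothing beyond the idea itself: hide provisioning behind one interface so the rest of your tooling doesn't care whether a node is rented metal or a cloud instance (all class names and host addresses here are illustrative, not from any real library):

```python
from abc import ABC, abstractmethod

class Provisioner(ABC):
    """Anything that can hand back a reachable host."""
    @abstractmethod
    def acquire(self) -> str: ...

class ManagedServerPool(Provisioner):
    """Pre-rented bare-metal hosts that carry the base load."""
    def __init__(self, inventory):
        self.inventory = list(inventory)
    def acquire(self) -> str:
        return self.inventory.pop(0)

class CloudProvisioner(Provisioner):
    """Stand-in for a cloud API call; returns the new instance's address."""
    def acquire(self) -> str:
        return "10.0.1.17"  # placeholder for a freshly booted instance

def provision(base: ManagedServerPool, burst: Provisioner) -> str:
    """Metal first; spill over to the cloud only for spikes."""
    return base.acquire() if base.inventory else burst.acquire()

pool = ManagedServerPool(["192.0.2.10", "192.0.2.11"])
print(provision(pool, CloudProvisioner()))  # 192.0.2.10
```

The deployment scripts only ever see `provision()`, so adding another hosting company is a matter of writing one more `Provisioner`.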
As for services like RDS etc., they're a very mixed bag. When they work for you, great, though they're expensive, but very often I end up with clients having to move off them because they need some plugin or other that isn't available. Very few of my clients in the end stay on them for very long. Their biggest benefit is to defer the initial setup of a self-managed cluster.
That doesn't mean there are no cases where AWS is the right choice - for starters having it there for traffic spikes is great (though we usually end up needing it very rarely), and if you need large batch jobs or other environments you spin up/down frequently, it may be cost effective. But it's extremely rare I see cases where it's cost effective for base load.
The typical argument then is that big companies wouldn't have picked them if it wasn't. But big companies don't pay the prices mere mortals pay. I know concrete examples of large negotiated discounts for a couple of larger companies, and they are steep.
> and I'd have to spend more money and time on auditors
That might be true, but most companies are not in a position where they need their IT infrastructure audited. This might very well be a niche that makes it worth using AWS.
> And I've haven't even started to think about securing the resulting systems to the same level I get for minimal effort from a major public cloud.
My experience is that getting security right on the public clouds is harder than on bare metal. If you take the effort to do it properly, the end results can be good. But a lot of that is simplified in a colo'ed environment by simply physically separating resources into different network segments and the like.
For people with very complex requirements, you might even get better results, but I could never agree with "minimal effort" - it's very common to see people badly misconfiguring their IAM setup for example, because it was too hard for them to figure out how to open up just the specific things they needed to open.
> Their biggest benefit is to defer the initial setup of a self-managed cluster
And that, I think, is why (nearly) every startup, or indeed every new project, should begin on a public cloud. The capital waste of an incorrect server purchase can ruin a project and makes "fast failure" hard to stomach.
I don't move services away until an alternative is clearly both cheaper in 3yr NPV and will not constrain future business opportunities. Except in cash-cow operations where innovation has ceased, I rate the second criterion as more important than the first and a compelling reason to stay on a public cloud.
As for the CDN, we absolutely agree. I noted elsewhere that even if you host on AWS, if you have any kind of volume of outbound bandwidth use, you should probably get a CDN elsewhere whether or not you think you need one. A good caching CDN can cut many people's AWS bill dramatically without making you move anything else off AWS.
If cost is your reason for looking at a CDN, or a big part of it, CloudFront will do very little for you unless your pageviews are extremely costly in terms of compute relative to the amount of data returned, and said data is very cache-friendly.
That's very much a niche requirement. To date I've not come across a setup where CloudFront made sense cost-wise.
To be honest, I rarely use external CDNs and instead "roll my own" with a variety of providers; with cloud providers plus geoip-enabled DNS you can get 90% of the benefit at very low rates. But in terms of "brand name recognition", MaxCDN is worth checking. It's not nearly the maximum saving you can get, but it can be substantially cheaper than CloudFront.
Or if you want to go old school, cheaper, and less sexy/api driven: https://www.delimiter.com/
If you shop around, for $30-40 a month you can get 16 cores, 32GB RAM, and a 120GB SSD or a 1-2TB spinning disk.
I can't stress this enough. The cost of rolling your own / administering a logging system, database, load balancers, autoscaling, and file storage far exceeds the added cost of the cloud. Now that GCE has sustained-usage discounts of 39%, it makes no sense to go bare metal right now.
Packet - $292/mo
- 32 GB memory
- 128 GB SSD

Google Compute Engine - $176/mo
- $156 - custom-4-32-extended (4 cores / 32 GB memory)
- $20 - 120 GB SSD storage
> For the n1 series of machine types, a virtual CPU is implemented as a single hardware hyper-thread on a 2.6 GHz Intel Xeon E5 (Sandy Bridge), 2.5 GHz Intel Xeon E5 v2 (Ivy Bridge), 2.3 GHz Intel Xeon E5 v3 (Haswell), 2.2 GHz Intel Xeon E5 v4 (Broadwell), or 2.0 GHz Intel Skylake (Skylake).
I'd have one (or more) already if I didn't live on the opposite side of the world to the data center.
IMO, docker and container orchestration spell a bright future for bare-metal boxes like these, as you won't need CloudFormation, etc.
But I still see few alternatives to S3; many vendors offer block devices, but only the big clouds offer blob storage.
Backup and restore from blob storage makes recovery from crash pretty easy.
There has to be something that adds more bare-metal for your docker containers to run on when the existing bare-metal reaches capacity, right?
K8s is getting pretty easy to set up these days.
Cloudformation, to me at least, is the power to expand resources for large traffic events. Most of the time you can get by with a small number of instances, but it's nice when it scales up to hundreds of instances in minutes.
The bare metal equivalent would be to buy something that can handle the peak load from the start, right?
This is not what CloudFormation does. CloudFormation allows a declarative way to express a group of AWS resources to be created and coupled together. There's nothing that's quite exactly the same as CloudFormation, but stock Kubernetes is quite close, since you effectively describe what resources you want in individual declarative YAML/JSON files. However, there's nothing standard in Kubernetes for coupling them together into one thing equivalent to a Stack in CloudFormation.
The not-my-problem aspect is really powerful.
Note: I'm still hopeful that things like k8s will make running stateful containers on random metal easy without huge investments in configuration for backup and reliability.
I pay about $3.50/mo total to run these services.
I would be concerned with how good the connection is...
I.e. if you use Packet or Scaleway, how stable/fast is the connection to B2?
S3 being on the internal network at EC2 is a killer feature too :)
$50/month 24GB ram, 16 threads, 2TB HD, gigabit uplink (20TB/month free).
It's not hard to wait until they have a sale and/or coupons and/or pay upfront yearly to get a similar config for 30-40.
(disclaimer: these servers are pretty "unmanaged")
Or packet gives you a completely "cloud/api-driven" experience that's still on bare metal and reasonably priced compared to AWS or DO.
What is that $70/mo server with 16 cores?
The cheapest I see is Kimsufi KS-5 8-core for $36/month. There's also an EU hoster Netcup that offers RS 4000 8-core for around $33.
What about the larger upfront investment in hardware, especially overprovisioned, relative to a spread-out payment? What about the 5000 credits you get upfront from AWS for startups, and the free tier?
As for vendor lock in, this isn't a problem most companies face - they're fine buying into AWS, and you're underselling the many, many other services they provide beyond EC2.
Sadly their support quality has decreased and our own account team has ignored us for weeks so maybe stick with AWS instead.
If you're a larger business, you can probably absorb that risk if it goes wrong. If you're a startup, it could be catastrophic.
However, I also know larger businesses that just simply do not have good ops teams and a cloud provider would outperform all of them.
> Good ops people can cost as much as good developers.
And this is a surprise? They bring a ton of domain specific expertise, good automation experience, and they lift the burden of managing your systems from your developers, so they can work on features and not scaling.
Also, when your web service scales to millions of requests per second from millions of requests a month or week or whatever, having elastic compute that can scale with it is a blessing, because nobody wants to wait for page loads.
First of all, if you have anything mission-critical, you need to run it in a high-availability config. This is easy for stateless microservices, but when it comes to running your DB, you start renting three boxes instead of one or two and configuring them accordingly.
And then you set up your backup infrastructure for disaster recovery; Glacier needs a replacement, after all. No problem, just more disks(?) on a few more boxes(?) and Bacula(?), better in a different datacenter just to be on the safe side; it would be nasty if your whole rack got fried and your data with it.
Don't forget to back up your configuration, all of it: load balancers, server environment variables, network (do you have an internal DNS?), crontabs; some businesses need their audit logs stored, etc...
On the infrastructure level there is lots and lots of stuff you can do, and you won't ever really need AWS; you'll just spend significantly more time finding and administering the right solutions than you would just using the AWS solutions, where you'll find a treasure trove of great tutorials and can pay relatively cheaply for support.
If you then pay someone on top for 24/7 management/monitoring of your dedicated stack, so that your team doesn't have to get up at 3 am because one of your VMs' disks fills up with some stray logfile, many of the savings from setting it up on a dedicated server go out the window, because the management partner needs to train their people on your infrastructure. AWS-only management partners are just light-years cheaper because they can streamline their processes much better.
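To be fair, the stray-logfile scenario is the kind of thing a few lines of self-hosted monitoring covers; a minimal sketch (the threshold and the alerting hook are placeholders you would wire to your pager):

```python
import shutil

def disk_usage_pct(path: str = "/") -> float:
    """Percentage of the filesystem at `path` currently in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total

def check_disk(path: str = "/", threshold: float = 90.0) -> bool:
    """True if usage is below the threshold; otherwise fire an alert."""
    pct = disk_usage_pct(path)
    if pct >= threshold:
        # placeholder: page someone / post to chat instead of printing
        print(f"ALERT: {path} at {pct:.1f}% capacity")
        return False
    return True

check_disk("/")
```

Run from cron every few minutes, this covers the 3 am disk-full page without a management contract (log rotation is still the real fix).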
You could also hire your own team of admins...
Sure, AWS is a beast with its own surprises, but overall the cost/benefit ratio is still very fair even if you factor in all the "surprises" (many of which your management partner will probably know about). Having layered support is really beneficial as well.
If something is wonky with RDS, you get to call your management partner (if they didn't detect it before you), and if they can't tackle it themselves, they can call AWS technicians. This gets you much, much further than you would get elsewhere. The outside world is paying for (for example) Percona's consultants or someone similar once the problems grow over their team's head.
Sure, at some point in a company's growth, depending on how technical the operation is, there might be a time when an admin team and colocation/dedicated boxes make sense, where AWS technicians will scratch their heads, etc., especially if you have some very specific tasks you need to do.
But for most people this is far off if ever.
Another big thing that rarely gets mentioned is the research into exactly what hardware to purchase and how to configure it. Do you know what compute hardware you should purchase? There are tens of vendors and thousands of options. What about network hardware? Do you know which switch is best for your stack?
With a cloud vendor all of these questions disappear.
Even if you make a mistake in your cloud provisioning, it's easy to correct; just shut it down and start over. Make a mistake buying your own hardware and you have to live with it for 3 years or pay to purchase more hardware.
I think people tend to forget or ignore all of these costs when evaluating cloud providers. You look at the total bill each month and are surprised that it costs that much. However, the costs for your custom-built architecture are likely just as high or higher; they are merely spread out over more time and more projects.
The 2 extremes you cite might be good for some people in a narrow set of contexts; it makes sense to include other possibilities when evaluating your hosting options.
Did I stutter? I said no colo. You need to be ridiculously large for colo to be worth considering.
If you have customers with enterprise needs paying enterprise prices, you need to keep your SLA. Think finance, e-commerce, etc.
If your infra provider has the same SLA as you do, you have a problem and should do the math and think about backup infrastructure or whatever else is necessary to have a chance of upholding that SLA.
Edit: I just now see that you misunderstood me. With "backup infrastructure" I meant an infrastructure to do backups, not another new infrastructure to sit there and collect dust awaiting disaster. That's mostly not necessary.
Or, in a more realistic scenario, you forget to dial back and are paying $200/month extra for unused provisioned IO for years before anyone notices. This happened, even though I was looking for improvements after every bill.
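Forgotten provisioned IOPS are at least easy to audit. Assuming configured AWS CLI credentials, something like this lists every classic provisioned-IOPS (`io1`) volume so the unused ones stand out:

```shell
# List each provisioned-IOPS volume with its size, IOPS setting, and state,
# so you can spot the ones nobody dialed back.
aws ec2 describe-volumes \
  --filters Name=volume-type,Values=io1 \
  --query 'Volumes[].[VolumeId,Size,Iops,State]' \
  --output table
```

Put it in a monthly cron mail and the $200/month surprise at least can't hide for years.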
> "My take: a single machine (or two for HA) will be enough"
2 bare metal instances isn't HA. Not even close.
> "if you really want to go big separate the web server from the database but that's it."
I would always recommend separating the web server from the database server on anything professional. It gives an easy, clear path for scaling sideways (since you've already separated your back end from your application), it allows you to tighten security (e.g. only allow access to the DB server from the web servers via the unprivileged DB user connection), and it makes maintenance easier. Even if you're only running on one physical box, put the web server and DB in their own VM or LXC/Zone/Jail container.
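As a concrete example of the tightened security described above, and assuming the database is PostgreSQL, a `pg_hba.conf` on the DB box (or container) can restrict access to an unprivileged app user connecting only from the web servers. The addresses, database, and role names here are hypothetical:

```
# pg_hba.conf on the database server
# TYPE  DATABASE  USER     ADDRESS        METHOD
local   all       postgres                peer          # admin only via local socket
host    appdb     app_ro   10.0.1.10/32   scram-sha-256 # web server 1
host    appdb     app_ro   10.0.1.11/32   scram-sha-256 # web server 2
# anything not matched by a line above is rejected
```

The same idea applies to MySQL grants or plain firewall rules; the point is that the split gives you a natural place to draw the line.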
> The other day I saw quad E7-4870 (yeah won't win any single thread contest but has 40 cores and 80 threads) 512GB RAM servers for $299 a month, with 1TB RAM for $499. Had a low end 2TB SSD for boot and you could add 8x1TB HDD w/ HW RAID for $40...
I work with both bare metal servers matching your description and both self-hosted and private clouds. Frankly, I think your rant misses one of the most important points of working with AWS, and that's the convenience and redundancy the tooling offers. AWS isn't just about single instances; it's about having redundant availability zones with redundant networking hardware, about being able to have disaster recovery zones in whole other data centres, and about having all of the above work automatically. Getting our self-hosted stuff to even get close to the level of tooling that AWS offers took months of man hours and considerably higher initial setup costs. Having to buy at least two of every piece of kit for redundancy, having to have BT lay two dedicated internet links (we have 3 now) just in case a builder accidentally cuts one of our lines, and having core infrastructure replicated off site all add considerably to both the setup time and cost. So yeah, for small businesses and personal blogs AWS is a bit overkill. But you cannot use the "high availability" argument and say "2 physical machines is enough".
Disclaimer: I've worked for clients such as Sony, UEFA and News International as well as many smaller but still sizable national publications. Our infrastructure has consisted of both scaled up physical hardware and scaled sideways virtual machines and frankly I/we wouldn't be able to offer the kinds of services we do nor with the kind of uptime we do without running a fleet of virtualized web servers.
But once again it comes back to SLIs and client expectations.
Two machines with two internet connections and a good UPS easily match the availability of AWS.
This is all in one physical location as well, so you'd need to double this spec again.
Then once you've built all of that, you'd probably want to put it behind a CDN as leased lines are expensive.
Only then you're starting to reach feature parity with what I've described in my first post and there will be lots of kit I've not even touched on.
However, even if you do just run 2 VMs (web and db) on each of the 2 physical boxes and don't need Redis etc., you still need to double your spec just for the multi-region point I raised earlier.
Even then the answer is: it depends what you need. I run blogs on S3 + CloudFront. That's effectively content versioning plus a geo-distributed caching CDN for pennies. AWS is not just EC2.
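For context, the entire deploy for that kind of S3 + CloudFront setup is a couple of AWS CLI commands; the bucket name and distribution ID below are placeholders:

```shell
# Push the built static site to S3, then invalidate the CDN cache so
# edge locations pick up the new content.
aws s3 sync ./public s3://my-blog-bucket --delete
aws cloudfront create-invalidation \
  --distribution-id EDFDVBD6EXAMPLE \
  --paths "/*"
```

That's the whole operational surface for a static blog, which is why it undercuts even a cheap VPS on maintenance effort.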
"I know that hardware is cheap, and programmers are expensive."
The way people here are talking makes it sound like they are spending man-years setting up a couple of servers. From my experience, the time-consuming part is messing with the software stack, a problem none of these vendors of machines/VMs/etc. really solves (ignoring way overpriced shared databases etc.). Buying a server, installing an OS on it, tossing it in the rack, and providing an IP/VLAN/etc. is less than 1% of the time/effort spent on spinning up an application. Plus, if you buy used, it's possible to get 3-year-old machines for less money than AWS charges for a month... To which I always hear about the "reliability" of old hardware, despite the fact that a large part of this rented hardware is just as old, and outside of disks, SFPs, and batteries, none of it really dies of old age.
As others have said, the effort to "cloudify" your application is probably more than the effort to buy some massively over-provisioned machine. Sure, all the big boys need all this fancy management, but your startup with a few hundred thousand hits a month can probably be run on a 5-year-old machine, if any effort at all is spent ensuring it's not doing something stupid that results in second-long page responses.
You can be anywhere in the world and have ridiculously small latency.
But the cost you pay sure is high.
Running "a few" dedicated servers is not some magic catch all for everyone.
I too have run networks for companies that have been in business 10+ years, with as many servers, and I've done that on AWS, colo'd servers, managed servers, and hybrid setups of all of those, and I've yet to see an instance where AWS was cost effective at published prices for base load.
I have seen AWS be cost effectively used to handle spikes or batch jobs, and I have recommended it for clients that care more about the brand name (to tell their customers for example) than cost, or have very specific needs. I have also seen it used cost effectively once you get big enough to secure steep discounts.
AWS is more than just compute cores, and you're hand-waving away what it offers.
I agree that AWS offers much more than hosting, but when talking about hosting specifically the numbers are clearly better with 'bare metal' (rented or otherwise) -- it's just substantially more power per dollar. You'll need to do a cost-benefit for your own situation so YMMV of course, just as it would with any other provider or any other service your business may choose to subcontract.
If you're talking about actually running a global application then that would be a scenario where a network like Google's or Softlayer's does help by having 1 giant VLAN.
Any CDN can be stood up in front of a small fleet of dedicated hosts, and you'd still be saving 80%.
The bandwidth prices are high enough that I at one point mulled over setting up an "S3 compatible" storage service using S3 as the backend for durability, but storing a single local copy to avoid hitting S3 most of the time.
S3 bandwidth prices are high enough that there are huge cost savings to be had if objects are retrieved reasonably regularly.
Just do the whole Docker thing, the more dependencies and lock-in the better, update your CV and move on.