> If you want low cost, you do your own servers for anything under 20 nodes.

This seems unlikely to be true unless your software engineers love doing ops (and the business doesn't need them working on software).

The salaries alone for an operations team that can replace failed hardware 24/7 (with cover for vacations) are more than a company like Etsy should be spending on ops. Then there is the cost of the pipe, power, and redundancy. Further up, managing all of the depreciation in accounting also has a cost compared to a single expense line item.

Here is the dirty little secret: most businesses run fine without high availability. If you are small, your customers will not leave because your service is down for a few hours during the year.

On top of that, Amazon's single-region EC2 SLA kicks in at 99.95%. So unless you are using multiple regions in AWS, going by their SLA you're already in the range of a few hours of downtime per year.
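To put a number on "a few hours", here is a rough sketch of the arithmetic, taking the 99.95% figure above at face value and assuming a non-leap year of 8,760 hours. The function name `downtime_hours_per_year` is just an illustration, not anything from AWS.

```python
# Assumption: a year is 365 days (8,760 hours); leap years are ignored.
HOURS_PER_YEAR = 365 * 24


def downtime_hours_per_year(availability_percent: float) -> float:
    """Return the hours per year a service can be down at a given availability."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    # Compare a few common availability targets, including the 99.95% quoted above.
    for pct in (99.0, 99.9, 99.95, 99.99):
        print(f"{pct}% -> {downtime_hours_per_year(pct):.2f} hours/year")
```

At 99.95% this works out to roughly 4.4 hours per year, which is the same ballpark as the occasional outage a small self-hosted setup might see.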

Don't believe the lie that you don't need ops people in the cloud. You're still running servers. So since you have the staff anyway, why not save 3-4x the cost?

Hardware failures are far less common than you believe. I lost one hard drive in 10 years, and because it was in a RAID array, nothing even went down.

You still need software ops, yes, but not hardware ops.

> Hardware failures are far less common than you believe.

Your experience may vary. I once rented metal from Softlayer (running ~15 shards, about 60 boxes). Over a 3-year period we had a number of drives* fail, along with a couple of the rack controllers, some RAID controllers, and one power supply. On the worst RAID failure, we sent one of our employees across the country to manage the recovery directly.

*Some hard drive failures were related to the 2012 Seagate 3TB drive issue. One failed within a week of being replaced.

Softlayer had a team monitoring our servers and working through issues with us; other than supplier issues, I blame them for nothing.

In that worst scenario, we ran off our geo-redundant slave for the better part of a week.

I think you overstate the overhead for hardware operations. I had a team of five that managed over 1,000 physical servers in four global locations, and they had to visit a datacentre maybe twice in three years, other than for hardware installs, which were always easily planned out.

