Disclaimer: I work for Backblaze. The answer to almost every question put to us is "total cost of ownership". Facebook stores a lot less data per customer than Backblaze and makes more per customer than we charge. Facebook can afford to waste more money than we can, so they do.
In my humble opinion, there is A LOT of wasted money in making datacenter machines easy to service. We create a little spreadsheet (it isn't rocket science) which includes the employee salaries, taking into account different designs and how long it takes to open up and service a pod. For example, to access the Backblaze Pod's hard drives you must pull the pod out like a drawer and open the top. On many servers you can access all the hard drives from the front, without moving the server. Accessing drives from the front is much faster -> but you lose 2/3 of the density!! We pay $600 per month for a cabinet, just for the physical space. We can stack 8 pods in that cabinet, or about $75 per pod per month in space rental. So buying servers that are 1/3 as "dense" but easier to access drives from the front costs Backblaze $50/pod/month or $600/pod/year. We open a particular pod and replace a failed drive or fix some other problem maybe once every 6 months -> so you are paying $300 PER INCIDENT if you have front-mounted drives that save the technician maybe 10 minutes of time. The math simply doesn't make any sense at scale.
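For anyone who wants to redo that spreadsheet math, here is a rough Python sketch. It only uses the figures quoted above; the function and constant names are made up for illustration, and the $50/pod/month density premium is taken as stated rather than derived.

```python
# Back-of-envelope version of the spreadsheet math in the comment above.
# All names are hypothetical; the dollar figures are the ones quoted there.

CABINET_RENT_PER_MONTH = 600         # dollars, physical space only
PODS_PER_CABINET = 8
DENSITY_PREMIUM_PER_POD_MONTH = 50   # stated extra cost of front-access servers (assumed as given)
MONTHS_BETWEEN_SERVICE = 6           # a given pod is opened roughly twice a year


def space_cost_per_pod(cabinet_rent: float, pods_per_cabinet: int) -> float:
    """Monthly space rental attributed to one pod."""
    return cabinet_rent / pods_per_cabinet


def premium_per_incident(premium_per_month: float, months_between_service: float) -> float:
    """Extra space cost that accumulates between two service visits to one pod."""
    return premium_per_month * months_between_service


if __name__ == "__main__":
    print(space_cost_per_pod(CABINET_RENT_PER_MONTH, PODS_PER_CABINET))
    print(DENSITY_PREMIUM_PER_POD_MONTH * 12)
    print(premium_per_incident(DENSITY_PREMIUM_PER_POD_MONTH, MONTHS_BETWEEN_SERVICE))
```

Plug in your own cabinet rent and service interval to see how the per-incident number moves; the technician's hourly cost would go on the other side of the comparison.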
I think a lot of the inefficiency is because the datacenter employees recommend these "nice to haves" to their managers to make the datacenter employees' lives easier. But their company pays dearly for not doing the math. This isn't important if your company is massively profitable or if it isn't a "datacenter centric" company like Backblaze. But for us, it's the difference between life and death. Remember, we only charge $5/month and we don't have any deep pockets, so we have to stay profitable -> there isn't a lot of margin in there to be wasteful.
I totally agree with you in principle. But you're being too generous with your density advantage as compared to hotswap. I can't tell if your pods are 4U or 5U, but your motherboard vendor sells a 4U front & back chassis that has 36 3.5" SAS trays.
Obviously that setup will run you more than yours, but I'd be surprised if the delta was more than 1k once you subbed in your consumer mb, controllers and backplanes.
That said, one small nitpick: I'm fairly sure Facebook does less revenue/user than you do: They have a billion users, and did (order of magnitude) 5 billion last year: That's $5/user/year. Which you do per month.
Still, it's amazing you can provide this unlimited service for that small fee!
I work for Backblaze -> typically we replace a drive once it fails, but not "as soon as it fails". Because the storage pods go into "read only" mode when one drive goes down, we have some time before we need to take action; sometimes it can be a few days before the drive is replaced. All incoming data is rerouted to a different pod, but the data that was on the pod is still readable and available for restore.
How do you roll up your old pods? By this I mean, do you still have all of your 1.0 pods running, or do you start migrating them to 3.0 pods, and cannibalizing the older pods for parts, until full failure/replacement, or...?
We have migrated the data from smaller pods to larger pods, then re-enabled backups on the new half-filled pod to fill it to the brim. But we did not do this because the pod was necessarily old; we do this if a pod is unstable for some reason (usually it is a brand of drive we ended up not liking).
We have done exactly what you mention -> migrated off a pod, then disassembled the pod and used the older drives as replacements. Hard drives often come with a 3 year warranty, so for the first 3 years it is free to get a replacement from the manufacturer if they fail. But after 3 years we have to pay out of our own pocket to replace the drives, which can change the cost/math a little.