
Do you replace a drive as soon as failure occurs? Or do you wait until X number of drives in a chassis go bad to make it worth the time for the tech to pull the chassis out of the rack?

I work for Backblaze -> typically we replace a drive once it fails, but not "as soon as it fails". Because a storage pod goes into "read only" mode when one drive goes down, we have some time before we need to take action; sometimes it can be a few days before the drive is replaced. All incoming data is rerouted to a different pod, but the data that was already on the pod is still readable and available for restore.
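
A rough sketch of the read-only behaviour described above, for anyone who finds code easier to follow. This is only an illustration under assumptions, not Backblaze's actual software: the `Pod` class, `route_write`, `read`, and the pod names are all hypothetical, and real pods obviously do far more than hold a dict.

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    """Hypothetical stand-in for a storage pod."""
    name: str
    failed_drives: int = 0
    files: dict = field(default_factory=dict)

    @property
    def read_only(self) -> bool:
        # Assumption taken from the comment: one failed drive is enough
        # to put the whole pod into read-only mode.
        return self.failed_drives > 0


def route_write(pods, key, data):
    """Store data on the first pod that still accepts writes."""
    for pod in pods:
        if not pod.read_only:
            pod.files[key] = data
            return pod.name
    raise RuntimeError("no writable pod available")


def read(pods, key):
    """Existing data stays readable, even on a read-only pod."""
    for pod in pods:
        if key in pod.files:
            return pod.files[key]
    raise KeyError(key)


pods = [Pod("pod-a"), Pod("pod-b")]
first = route_write(pods, "backup-1", b"old data")   # lands on pod-a
pods[0].failed_drives = 1                            # a drive in pod-a dies
second = route_write(pods, "backup-2", b"new data")  # rerouted to pod-b
restored = read(pods, "backup-1")                    # still served by pod-a
```

The point is that a failed drive only changes where new writes go, which is why the replacement can wait a few days without affecting restores.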

How do you roll up your old pods? By this I mean, do you still have all of your 1.0 pods running, or do you start migrating them to 3.0 pods, and cannibalizing the older pods for parts, until full failure/replacement, or...?

We have migrated the data from smaller pods to larger pods, then re-enabled backups on the new half-filled pod to fill it to the brim. But we did not do this because the pod was necessarily old; we do this when a pod is unstable for some reason (usually it is a brand of drive we ended up not liking).

We have done exactly what you mention -> migrated off a pod, then disassembled the pod and used the older drives as replacements. Hard drives often come with a 3 year warranty, so for the first 3 years it is free to get a replacement from the manufacturer if a drive fails. But after 3 years we have to pay out of our own pocket to replace the drives, which can change the cost/math a little.
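
To make the warranty point concrete, here is a tiny sketch of that cost/math. The 3 year window comes from the comment above; the function name `replacement_cost` and the drive price are hypothetical placeholders, not real Backblaze figures.

```python
def replacement_cost(drive_age_years: float,
                     drive_price: float,
                     warranty_years: float = 3.0) -> float:
    """Out-of-pocket cost of replacing one failed drive.

    Assumes the manufacturer replaces the drive for free while it is
    inside the warranty window, and that we pay full price afterwards.
    """
    if drive_age_years < warranty_years:
        return 0.0
    return drive_price


# Hypothetical price, for illustration only.
price = 100.0
in_warranty = replacement_cost(1.5, price)
out_of_warranty = replacement_cost(4.0, price)
```

With these made-up numbers, a failure at 1.5 years costs nothing, while the same failure at 4 years costs the full price of a drive.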
