I wonder what the reliability stats on this setup are, though. Is it really cheaper to jam all those drives into one unit without redundant PSUs, motherboards, or a boot drive?
I'd guess you'd have to build at least two of these units and mirror them to get any sort of reliability. And at that point, how long does it take to copy 58 TB over HTTPS?
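For ballpark numbers (the sustained line rates are my own assumptions, nothing from the article), here's what moving 58 TB works out to:

    # Transfer-time arithmetic for 58 TB at a few assumed sustained speeds.
    TB = 10**12  # decimal terabyte, in bytes
    data_bytes = 58 * TB

    for label, mbit_per_s in [("100 Mbit/s", 100), ("1 Gbit/s", 1_000), ("10 Gbit/s", 10_000)]:
        bytes_per_s = mbit_per_s * 10**6 / 8
        hours = data_bytes / bytes_per_s / 3600
        print(f"{label}: {hours:,.0f} hours ({hours / 24:.1f} days)")

Even at a sustained gigabit that's roughly five and a half days before any protocol overhead, so mirroring a full pod over the network is not a quick operation.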
I would assume they don't care about the hardware, just the data. If you look at the setup and how the drives are arranged, the 45 drives are sub-divided into 15-drive RAID arrays, and it would take 3 drives in the same array dying before they lost data. Essentially, 20% of the drives in a single array would need to die simultaneously for them to lose anything.
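To put rough numbers on that, here's a quick sketch. The 15-drive arrays and the "3 dead drives in one array" threshold come from the parent comment; the 4% annual failure rate and the 48-hour replace-and-rebuild window are figures I made up purely for illustration:

    from math import comb

    drives_per_array = 15
    annual_failure_rate = 0.04   # assumed per-drive AFR, not a real figure
    window_hours = 48            # assumed time to swap a failed drive and rebuild

    # Per-drive probability of failing inside one rebuild window.
    p = annual_failure_rate * window_hours / (365 * 24)

    # Probability that 3 or more of the 15 drives in one array fail within
    # the same window, i.e. the array actually loses data.
    p_array_loss = sum(
        comb(drives_per_array, k) * p**k * (1 - p)**(drives_per_array - k)
        for k in range(3, drives_per_array + 1)
    )
    print(f"chance of losing one array per window: {p_array_loss:.2e}")

It's a crude independent-failure model (real drive failures correlate, especially under rebuild stress), but it shows why needing three near-simultaneous deaths in one array makes data loss far less likely than any single- or double-drive failure.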
Now, the rest of the hardware isn't that important if it fails. If one of the other components dies, you're only looking at some downtime (and possibly a dead hard drive or two from a dying PSU, which I assume they monitor for regularly). As long as the data is secure and in one piece, it doesn't really matter whether the pod is up or down until someone actually needs the data. Just send out your repair guy to replace the part and reboot it, and it's fine.
In sysadmin terms, they have aimed for data integrity over availability.
If a system goes down and they have to replace a disk or a power supply or motherboard, the data is still safe, and if by chance there are any $5/month users who need to restore data held on that one unit, well, they can wait a few hours :-)
data is hard.