Hacker News
I come not to praise RAID-5... (Why RAID-5 isn't evil) (standalone-sysadmin.com)
2 points by MPSimmons on Aug 3, 2012 | hide | past | favorite | 2 comments


The problem is the stupid way RAID systems (or at least Linux md) handle these inevitable UREs (unrecoverable read errors). Make a RAID-5 of 2TB disks, use it for a bit, and it's virtually guaranteed there's at least one bad sector on each disk (or at least, you'd get one URE per disk if you read every sector). Now have a drive fail. No problem, you think, I'll replace that drive and rebuild. Put in the replacement, kick off the rebuild process. Linux will hit a URE on one of the surviving drives, kick that drive out of the array, and refuse to rebuild any further. You can't even mitigate this by doing a weekly verify of your disks, because if that verify happens to run into UREs on two separate disks, bam, bye bye data.
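To make the failure mode above concrete: RAID-5 parity is just the XOR of the data blocks in a stripe, so any one missing block can be recomputed from the others. A minimal sketch (hypothetical helper names, not the actual md code) showing why a single URE on a surviving disk dooms that stripe during a degraded rebuild:

```python
# Hypothetical sketch, not the md implementation: a RAID-5 stripe stores
# parity = XOR of the data blocks, so any ONE missing block is recoverable.
# During a degraded rebuild there is no redundancy left, so a URE on a
# surviving disk leaves that stripe unrecoverable.

URE = None  # marker for an unreadable sector


def xor_blocks(blocks):
    """XOR equal-length byte blocks; propagate URE if any is unreadable."""
    if any(b is URE for b in blocks):
        return URE
    out = bytes(len(blocks[0]))
    for b in blocks:
        out = bytes(x ^ y for x, y in zip(out, b))
    return out


def rebuild_stripe(surviving):
    """Recompute the failed disk's block from the surviving members."""
    return xor_blocks(surviving)


data = [b"\x01\x02", b"\x03\x04", b"\x05\x06"]
parity = xor_blocks(data)

# Degraded rebuild, all surviving members readable: disk 0's block comes back.
assert rebuild_stripe([data[1], data[2], parity]) == data[0]

# Same rebuild, but disk 1 has a URE in this stripe: the block is simply lost.
assert rebuild_stripe([URE, data[2], parity]) is URE
```

The point of the sketch: the math only tolerates one missing block per stripe, so the policy question is what the RAID layer does next. md kicks the whole disk out; ZFS (below) loses only the affected blocks and tells you which files they were in.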

This is not theoretical, it happened to me; maybe there are some magic parameters that get around it, but I read the available tutorials; if I made a mistake, others have probably made it. I asked a kernel dev how to solve this problem and he suggested a cronjob that runs md5sum on each of the underlying disk devices. I wish I was joking.

Fortunately, there is a simple solution: ZFS. A raidz1 will recover from the same scenario (again, this is not theoretical, I've done it); you will lose the particular blocks that suffered the UREs, but no others (and ZFS can tell you which files were affected). And you can run "zpool scrub" regularly to catch any sectors that've decayed before you lose a drive and can no longer use parity to restore them.


You're supposed to run md scrubs as well; if you don't, it's entirely possible you have latent bad sectors sitting undetected on your disks.

See http://en.gentoo-wiki.com/wiki/RAID/Software#Data_Scrubbing



