
This cannot be stressed strongly enough. There is never a case where RAID5 is the best choice, ever [1]. There are cases where RAID0 is provably more reliable than RAID5 [2]. RAID5 should never be used for anything where you value keeping your data. I am not exaggerating when I say that, very often, your data is safer on a single hard drive than it is on a RAID5 array. Please let that sink in.

The problem is that once a drive fails, if any of the surviving drives hits an unrecoverable read error (URE) during the rebuild, the entire array fails. On consumer-grade SATA drives that have a URE rate of 1 in 10^14, that means if the data on the surviving drives totals 12TB, the probability of the array failing rebuild is close to 100%. Enterprise SAS drives are typically rated at 1 URE in 10^15, so you improve your chances ten-fold. Still an avoidable risk.
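
To put rough numbers on that, here is a quick back-of-the-envelope sketch (Python; the 4x 4TB RAID5 layout is just an illustrative example, and the rates are the published per-bit figures above):

    # How much data a rebuild must read, and how many UREs to expect,
    # using the published per-bit URE rates.
    def expected_ures(data_tb, ure_rate_per_bit):
        bits_read = data_tb * 1e12 * 8        # TB -> bytes -> bits
        return bits_read * ure_rate_per_bit

    rebuild_read_tb = (4 - 1) * 4             # e.g. 4x 4TB RAID5, one drive lost: 12TB to read
    print(expected_ures(rebuild_read_tb, 1e-14))  # ~0.96 UREs expected (consumer SATA)
    print(expected_ures(rebuild_read_tb, 1e-15))  # ~0.096 UREs expected (enterprise SAS)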

RAID6 suffers from the same fundamental flaw as RAID5, but the probability of complete array failure is pushed back one level, making RAID6 with enterprise SAS drives possibly acceptable in some cases, for now (until hard drive capacities get larger).

I no longer use parity RAID. Always RAID10 [3]. If a customer insists on RAID5, I tell them they can hire someone else, and I am prepared to walk away.

I haven't even touched on the ridiculous cases where it takes RAID5 arrays weeks or months to rebuild, while an entire company limps inefficiently along. When productivity suffers company-wide, the decision makers wish they had paid the tiny price for a few extra disks to do RAID10.

In the article, he has 12x 4TB drives. Once two drives failed, assuming he is using enterprise drives (Dell calls them "near-line SAS"; essentially enterprise-grade SATA), there is a 33% chance the entire array fails if he tries to rebuild. If the drives are plain SATA, there is almost no chance the array completes a rebuild.

[1] http://www.smbitjournal.com/2012/11/choosing-a-raid-level-by...

[2] http://www.smbitjournal.com/2012/05/when-no-redundancy-is-mo...

[3] http://www.smbitjournal.com/2012/11/one-big-raid-10-a-new-st...




Note that the 10^14 figure is only what the HDD manufacturers publish, and it has been the same for something like a decade. It's a nice, safe, conservative figure that seems impressively high to IT Directors and accountants, and yet it's low enough that the manufacturers can easily fall back on it as a reason why a drive failed to meet a customer's expectations.

In reality you'll find that drives perform significantly better than that, arguably orders of magnitude better.

That said, I'm still not a fan of RAID5. Both the slow rebuilds and the probability of batch failures (if one drive fails, the probability of another drive failing is significantly higher when they came from the same batch) make it too risky a prospect for my comfort.


A 10^14 bit error rate has to be false, or routine ZFS scrubs would produce documented read errors. I'm inclined to believe the rest of the math here is wrong as well.


What I don't understand about the URE math is from [2]. Say you have a 12TB RAID5 array and you need to rebuild. The article says that at 10^14 you approach a URE at around 12TB of data, but what causes the array to hit that 12TB? Each disk has 10^14, which is 100TB. If the 12TB comes from 4x 3TB disks, each disk should still have a lot to go through.


10^14 is bits. When you divide 10^14 by 8 you get 12.5 trillion bytes, or 12.5 TB.

If you have 4x 4TB drives in RAID5, and one fails, then in order to rebuild with a replacement drive, you have to read all data from all surviving drives (3 x 4TB = 12TB).

Here is an example from a manufacturer:

http://www.seagate.com/files/www-content/product-content/con...

They call it "Non-recoverable Read Errors per Bits Read" and then list "1 sector per 10^15". So for every 10^15 bits read they expect 1 sector to be unreadable.
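
As a sanity check on the units, a small sketch converting those spec-sheet rates into "data read per expected error" (the two rates are the ones discussed above):

    # Convert a per-bit URE rate into the amount of data read per expected error.
    for ure_rate_per_bit in (1e-14, 1e-15):   # consumer SATA vs the Seagate spec above
        bits_per_ure = 1 / ure_rate_per_bit
        tb_per_ure = bits_per_ure / 8 / 1e12  # bits -> bytes -> TB
        print(ure_rate_per_bit, tb_per_ure)   # 12.5 TB and 125 TB respectively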


OK, I see now, but if it needs to read 12TB from all the drives together, that's still far from the 12.5TB-per-drive limit. That's where I'm confused.


Say we have a department of a company with 36 employees, and one pair of dice. We decide that if any person out of the entire department rolls a 12, then everyone in the department will be fired. The chance of rolling a 12 is 1/36. It doesn't matter whether one person rolls the dice 36 times or everyone takes one turn each; the chance that somebody rolls a 12 and the whole department gets fired is the same either way, and after 36 rolls it is already about 64% (1 - (35/36)^36).

The same is true for a disk array. Each read operation is an independent event (for the purpose of doing this math). The chance of a URE is 1/(10^14) for every bit read, and it doesn't matter which disk it happens on. When it happens, the rebuild fails and the entire array is lost.

Also 12.5 TB is not a hard limit, just an average. The URE could happen on the very first read operation, or you might read 100 TB without a URE.
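
To illustrate the "average, not a hard limit" point, here's a minimal sketch assuming each bit read is an independent trial at the consumer 1-in-10^14 rate:

    import math

    # Probability of reading `data_tb` terabytes with zero UREs,
    # treating each bit read as an independent trial.
    def p_clean_read(data_tb, ure_rate_per_bit=1e-14):
        bits = data_tb * 1e12 * 8
        # (1 - p)**bits, computed in a numerically stable way
        return math.exp(bits * math.log1p(-ure_rate_per_bit))

    for tb in (1, 6, 12.5, 25, 100):
        print(tb, round(p_clean_read(tb), 3))
    # ~0.92, ~0.62, ~0.37, ~0.14, ~0.0 -- no hard cliff at 12.5TB,
    # just a chance of getting through cleanly that shrinks as you read more.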


I think your calculation on failing an array rebuild is wrong. Can you show how you got those numbers?


Sure. There were two statements I made.

>On consumer-grade SATA drives that have a URE rate of 1 in 10^14, that means if the data on the surviving drives totals 12TB, the probability of the array failing rebuild is close to 100%.

10^14 bits is 12.5 TB, so on average you expect about one URE per 12.5 TB read, and the chance of reading 12TB without a single URE is low; the rebuild will most likely fail. I was estimating 10^14 bits to be about 12TB, so as a rough linear estimate the probability of failure is 12/12.5 = 96%.

>...he has 12x 4TB drives. Once two drives failed, assuming he is using enterprise drives...there is a 33% chance the entire array fails if he tries to rebuild. If the drives are plain SATA, there is almost no chance the array completes a rebuild.

A RAID6 with two failed drives is effectively in the same situation as a RAID5 with one failed drive. In order to rebuild one failed drive, the RAID controller must read all of the data from every surviving drive to recreate it. In this case there are 10x 4TB surviving drives, meaning 40TB of data must be read to rebuild. Because these drives are presumably enterprise quality, I am assuming they are rated to fail reading one sector for every 10^15 bits read (10^15 bits = 125 TB). So by the same linear estimate it's a 40/125 = 32% chance of failure if you try to rebuild.
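
For completeness, the linear ratios above are a slightly pessimistic shortcut; treating each bit read as an independent trial (the usual simplification) gives somewhat lower numbers, but the conclusion doesn't change. A quick sketch:

    import math

    # P(at least one URE) while reading `data_tb`, with per-bit URE rate p,
    # treating each bit read as an independent trial.
    def p_rebuild_fails(data_tb, p):
        bits = data_tb * 1e12 * 8
        # 1 - (1 - p)**bits, computed in a numerically stable way
        return -math.expm1(bits * math.log1p(-p))

    # 4x 4TB RAID5, one drive lost: 12TB read at the consumer 1e-14 rate.
    print(p_rebuild_fails(12, 1e-14))   # ~0.62 (vs ~0.96 from the linear 12/12.5 estimate)

    # 12x 4TB RAID6, two drives lost: 40TB read at the enterprise 1e-15 rate.
    print(p_rebuild_fails(40, 1e-15))   # ~0.27 (vs ~0.32 from the linear 40/125 estimate)

    # The same 40TB rebuild at the consumer 1e-14 rate.
    print(p_rebuild_fails(40, 1e-14))   # ~0.96 -- very unlikely to complete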




