

Your SATA RAID has a 56% chance of data loss - oxyona
http://alumnit.ca/~apenwarr/log/?m=200809#08

======
patrickg-zill
ZFS, however, would protect against this, in that it checksums each chunk of
data. And "zpool scrub poolname" will go through and verify the checksums for
you, in the event you are storing a lot of data and not necessarily
reading/writing it often.

~~~
spydez
Well, maybe I should look into transitioning my home server from Ubuntu and
XFS to Nexenta or OpenSolaris and ZFS...

I'd hate to lose some of the data on that thing, though I'm currently not
doing any RAID. Just a disk of important stuff rsnapshotted every 4 hours to
another disk.

~~~
lallysingh
I've been running OpenSolaris/ZFS for almost a year now. Zero problems, good
performance & reliability. My research is on two mirrored 160GB SATA drives.
The most important stuff is on a filesystem on there with copies=2, making 2
copies on each drive.

Also, backing up zfs is really, really easy. 48 lines of shell scripts do
daily incrementals and weekly snapshot backups.

But, do yourself a favor and put /usr/gnu/bin first in your path. Oh, and
alias tar=gtar. Old Solarisisms will otherwise be irritating.
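
The snapshot-plus-incremental workflow described above can be sketched roughly as follows. This is a hypothetical sketch, not the author's actual script; the dataset name `tank/data` and the date-based snapshot naming scheme are invented for illustration:

```python
# Hedged sketch of a daily incremental ZFS backup driver.
# "tank/data" and the snapshot naming are illustrative assumptions;
# on a real box the send stream is piped into `zfs recv` on the backup pool.
import datetime

DATASET = "tank/data"

def snapshot_name(day: datetime.date) -> str:
    """Name snapshots by date, e.g. tank/data@2008-09-08."""
    return f"{DATASET}@{day.isoformat()}"

def incremental_send_cmds(prev: datetime.date, today: datetime.date):
    """Commands to snapshot today, then send only the delta since prev."""
    return [
        ["zfs", "snapshot", snapshot_name(today)],
        ["zfs", "send", "-i", snapshot_name(prev), snapshot_name(today)],
    ]

if __name__ == "__main__":
    today = datetime.date.today()
    prev = today - datetime.timedelta(days=1)
    for cmd in incremental_send_cmds(prev, today):
        print(" ".join(cmd))   # or subprocess.run(cmd, check=True) for real
```

The weekly full backup is the same idea without the `-i` flag: `zfs send` of the whole snapshot.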

------
ars
Wow. If you have an 8TB RAID, that's 7TB of data, so 61,572,651,155,456 bits.

A hard disk has an unrecoverable bit error rate of about 1 in 10^15 bits read:

1,000,000,000,000,000

That's only about 16 times the number of bits in the array.

That's horrible.

Applied to a single 750GB disk, that means you can read the entire disk an
average of only 155 times before you'll encounter an error! That's shockingly
bad.
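
The arithmetic above checks out with a few lines of Python, assuming the 1-in-10^15 unrecoverable bit error rate and binary units (1TB = 2^40 bytes) for the disk sizes:

```python
# Expected-error arithmetic for an unrecoverable bit error rate (BER)
# of 1 error per 10**15 bits read (a common spec for consumer SATA disks).

BER_BITS = 10**15                 # bits read per unrecoverable error, on average

# 7 TB of data in an 8 TB array (binary units: 1 TB = 2**40 bytes here)
array_bits = 7 * 2**40 * 8
print(array_bits)                 # 61572651155456

# The error threshold is only ~16x the size of the array
print(BER_BITS / array_bits)      # ~16.2

# A single 750 GB disk (binary units) can be read about 155 times on
# average before hitting an unrecoverable error
disk_bits = 750 * 2**30 * 8
print(BER_BITS // disk_bits)      # 155
```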

~~~
Andys
I am, among other things, a sysadmin who babysits several RAIDs, including a
10TB one.

I can confirm that the large capacity SATA disks have a noticeably high error
rate, which I have never seen before. It used to be that RAID protected you
from disk failures, but we aren't seeing that nearly as much now as we are
unrecoverable bit errors on otherwise healthy disks.

I have set all our RAID controllers to scrub the data and compare the mirrors
every weekend. In the past couple of years we've only had a couple of
unrecoverable read errors, but many recoverable read errors.

Recoverable read errors are still bad: the disk gets taken offline and out of
the RAID set for up to a minute while the controller retries repeatedly until
the data can be read. Then the sector is remapped to another part of the disk.

So yes, it is a real problem, but it isn't too hard to manage in this era of
filesystem snapshotting plus high speed RAID6 controllers. In return we've
been given huge amounts of storage space.

------
lallysingh
Note that it mentions that SATA has a higher bit-error rate than SAS/SCSI.
Apparently through the use of fewer ECC bits.

But yeah, ZFS rocks.

------
ars
You could also set smartmontools to run a short self-test every day, and a
long one (which reads the entire disk) every week.

At least you'll know if there was an error.
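
With smartd (the smartmontools daemon), that schedule might look something like this in /etc/smartd.conf; the device name and mail recipient are placeholders:

```
# Monitor /dev/sda: short self-test daily at 02:00, long self-test
# (full surface read) every Saturday at 03:00, mail root on trouble.
/dev/sda -a -o on -S on -s (S/../.././02|L/../../6/03) -m root
```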

------
JoelSutherland
I keep waiting for a product that solves the failing hard drive problem at the
consumer level. For most people I know, the most likely cause of data loss is
hardware failure.

Are any manufacturers offering a package that contains multiple hard drives
but behaves as one?

If performance isn't a factor, is there something I don't understand that
makes this impossible?

~~~
briansmith
* Most hard disks are actually multiple hard disk platters packaged together. Even with today's systems, the filesystem can multiplex data across the platters so that the platters mirror each other. Even on a single-platter drive, the file system could just mirror the data on the same platter.

* People want increasingly-higher-capacity, smaller, cheaper, thinner, lighter, and cooler laptops. Adding a second magnetic hard drive into a laptop runs counter to all those goals, as does the kind of mirroring described above. The consumer market demands higher capacity over safety.

* Solid state hard drives seem to be the future, at least for laptops. It might be practical to install dual SSDs into a laptop since they are smaller and lighter. It might even be practical to have a rotating set of three as a backup system. Eject hard disk #1, then insert hard disk #3. Hard disk #1 becomes the backup. Hard disk #3 gets synced with hard disk #2 until they mirror each other. Then, relabel the disks 1->2, 2->3, 3->1.

* People are being encouraged to save (or at least back up) their data on the network if it is really important.

~~~
JoelSutherland
"The consumer market demands higher capacity over safety."

My contention is that this is beginning to change. Short of video (and at the
risk of being wrong in 5 years), there is nothing to do with a 500GB drive.

I think given the trade-off, consumers would choose safety.

~~~
briansmith
For the same price point, maybe. But, hard drives are one of the most
expensive components of a laptop. Choosing between $500 and $600 is a big
deal. Network backup like Mozy is competitively priced, especially when you
consider the high chance of catastrophic loss that laptops have (stolen, lost,
dropped, flooded).

Plus, if you were going to trade 50% of capacity for safety, there are a lot
of other kinds of safety precautions to consider. For example, file history
versioning (like a local, always-on Mozy or Dropbox or Subversion) would
probably protect the user from more data loss than RAID, because most data
loss is due to human error, not mechanical failure.

That said, I would really like my laptop to have RAID-1. Actually, I have two
100GB hard drives and an Ultrabay...maybe I will give it a shot on my Linux
partition.

------
keyes
Any decent hardware RAID controller has a background "patrol read" which takes
care of this problem.

------
aparrie
Anyone use ZFS on Linux?

<http://www.linuxworld.com/news/2007/061807-zfs-on-linux.html>

------
attack
This is what keeps encryption impractical, especially for laptops.

~~~
lonestar
Why? Assuming you're using CBC mode within each sector, you wouldn't lose any
additional data due to a sector error compared to an unencrypted disk.
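
That containment property can be demonstrated directly. In the sketch below, a toy 4-round Feistel cipher built on SHA-256 stands in for a real block cipher (the point is CBC's structure, not the cipher): flipping a single ciphertext bit garbles only the block it lands in, plus exactly one bit of the next block.

```python
# CBC error propagation demo with a toy 16-byte Feistel block cipher.
# Flipping one ciphertext bit corrupts one plaintext block entirely and
# exactly one bit of the following block; all other blocks are untouched.
import hashlib

BS = 16  # block size in bytes

def _round(half: bytes, key: bytes, i: int) -> bytes:
    return hashlib.sha256(key + bytes([i]) + half).digest()[: BS // 2]

def encrypt_block(block: bytes, key: bytes) -> bytes:
    l, r = block[: BS // 2], block[BS // 2 :]
    for i in range(4):                       # 4 Feistel rounds for diffusion
        l, r = r, bytes(a ^ b for a, b in zip(l, _round(r, key, i)))
    return l + r

def decrypt_block(block: bytes, key: bytes) -> bytes:
    l, r = block[: BS // 2], block[BS // 2 :]
    for i in reversed(range(4)):             # invert the rounds
        l, r = bytes(a ^ b for a, b in zip(r, _round(l, key, i))), l
    return l + r

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def cbc_encrypt(pt: bytes, key: bytes, iv: bytes) -> bytes:
    out, prev = b"", iv
    for i in range(0, len(pt), BS):
        prev = encrypt_block(xor(pt[i : i + BS], prev), key)
        out += prev
    return out

def cbc_decrypt(ct: bytes, key: bytes, iv: bytes) -> bytes:
    out, prev = b"", iv
    for i in range(0, len(ct), BS):
        block = ct[i : i + BS]
        out += xor(decrypt_block(block, key), prev)
        prev = block
    return out

key, iv = b"toy-key", bytes(BS)
pt = bytes(range(64))                        # 4 blocks of plaintext
ct = bytearray(cbc_encrypt(pt, key, iv))
ct[20] ^= 0x01                               # flip one bit in ciphertext block 1
bad = cbc_decrypt(bytes(ct), key, iv)

damaged = [i for i in range(4) if bad[i*BS:(i+1)*BS] != pt[i*BS:(i+1)*BS]]
print(damaged)                               # [1, 2]: only blocks 1 and 2 differ
```

Block 1 is fully scrambled (the flipped ciphertext decrypts to garbage), but block 2 differs from the original in exactly one bit, because in CBC decryption each ciphertext block is only XORed into the next plaintext block.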

~~~
attack
These are bit-level errors. So instead of losing a bit or two, you lose a
massive chunk. That's the difference between a strange character in your Doc
file and losing the whole thing.

I think the block size is 128 bits for TrueCrypt, although internet searches
point to many people using larger units (1024 and 4096). And if the bit-level
error rates are as high as he says, it looks bad.

~~~
wmf
If the hard disk detects the bit error, it will return an I/O error instead of
the corrupt data. Likewise, ZFS will not return corrupt data. So there are
plenty of cases where bit errors are promoted to block errors. I can
understand the desire to minimize propagation of corruption, but I'd rather
just use RAID.

------
sabat
I don't have any hard data, but my BS meter is going off.

