HDDs are also prone to silent bitrot, where the drive simply returns incorrect bytes for a sector, even without any SMART errors. (Optical discs also bitrot, but so do HDDs.)

This is usually a precursor to SMART errors appearing in the near future, but unfortunately it can still result in corrupted replication and corrupted backups, since your backups would be copying the rotten (corrupt) data.

I've witnessed this happen on both Seagate and WD drives, on systems with ECC memory. I can only suspect this is due to HDD manufacturers wanting to reduce their error rates and RMA rates: it may happen when the ECC bits in a sector are corrupt, making bitrot undetectable. Instead of giving an error (and being grounds for an RMA replacement), the HDD firmware may choose to return non-integrity-checked data, which would usually be correct but could also be corrupt.

It's why filesystems like ZFS and btrfs are so important.

My rough estimate, based on my own experience and reports on r/DataHoarder, is that 1 hardware sector (4KB for most drives made after 2011) will silently corrupt per 10TB-year. Such corruption can be detected by checksumming filesystems like ZFS.
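To put that estimate in concrete terms, here is a small sketch of the arithmetic. It takes the commenter's anecdotal figure (1 sector per 10TB-year, 4KB sectors) at face value and assumes decimal terabytes; the function names are made up for illustration, and none of this is a measured failure rate.

```python
SECTOR_BYTES = 4096          # 4KB hardware sector, as stated in the comment
TB_YEARS_PER_SECTOR = 10     # anecdotal estimate: 1 corrupt sector per 10TB-year
BYTES_PER_TB = 10 ** 12      # assumption: decimal terabytes


def expected_corrupt_sectors(capacity_tb: float, years: float) -> float:
    """Expected count of silently corrupted sectors under the rough estimate."""
    return capacity_tb * years / TB_YEARS_PER_SECTOR


def expected_corrupt_fraction(years: float) -> float:
    """Expected fraction of stored bytes that are corrupt after `years`."""
    return years * SECTOR_BYTES / (TB_YEARS_PER_SECTOR * BYTES_PER_TB)


if __name__ == "__main__":
    # Hypothetical example: 12TB of data kept for 5 years.
    print(expected_corrupt_sectors(12, 5))
    print(expected_corrupt_fraction(5))
```

Under these assumptions, 12TB kept for 5 years works out to roughly 6 bad sectors: a tiny fraction of the data, but enough to damage several files without any error being reported.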

Usually, the whole sector is garbage, which is not indicative of cosmic ray bitflips.

External flash storage like USB sticks and SD cards fares far worse. In my own experience, silent corruption occurs at more like 1 file per device every 2-3 years, irrespective of capacity. I've had USB sticks and SD cards return bogus data without errors many times. I only know because I checksum everything; otherwise I would have thought the artefacts in my videos or photos came from the source.
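For anyone who wants to "checksum everything" on a filesystem that does not do it for them, a minimal sketch might look like the following. It uses SHA-256 from Python's standard library; the function names and the in-memory manifest format are assumptions for illustration, not what the commenter actually uses.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large videos do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def find_mismatches(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return relative paths that are missing or whose digest has changed."""
    bad = []
    for rel, expected in manifest.items():
        p = root / rel
        if not p.is_file() or sha256_of(p) != expected:
            bad.append(rel)
    return bad
```

Build the manifest once when the files are known to be good, store it somewhere other than the device being checked, and rerun `find_mismatches` periodically. Unlike a ZFS scrub, this only detects corruption; it cannot repair anything without a second good copy.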

If, in 2020, you are not using ZFS or btrfs for long term archival, you are doing something wrong.

ext4, NTFS, APFS, etc. may be tried and tested, but they do not checksum file data (ext4 and APFS checksum metadata only), and that is a problem.

Interestingly, on my home ZFS raidz with three 4TB hard drives, I have had to replace a drive a couple of times because ZFS scrub was reporting silent corruption. They were consumer-grade SATA drives.

However, at work, I have backed up ~200TB of data to a large server with RAID-6 and ext4, storing the backups as large .tar files with par2 checksums and recovery data, and regularly scrubbing the par2 data. I have yet to see any corruption whatsoever. These are enterprise-grade hard drives. This is the strongest evidence I have yet seen that the enterprise-grade drives are actually better than the consumer-grade ones, rather than just being re-badged.

Enterprise drives have different firmware, especially from an ECC and integrity perspective. From a price/perf standpoint tho, shucking consumer-grade drives with ECC wins.

Thanks. What are the drives at your workplace?

I actually have no idea. I didn't have any part in purchasing that particular system, I don't have root, and all the drives are hidden behind a RAID controller. Sorry.

How do you know they are enterprise drives then?

I have a home "NAS" (opensuse server) where my main /data partition is xfs, but it mounts a btrfs backup partition, rsyncs, and takes a snapshot.

I should really get around to converting the main drive to btrfs, but this works well.
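A rough sketch of that rsync-then-snapshot routine, for anyone wanting to copy it. The paths are hypothetical, and it assumes the rsync target is itself a btrfs subvolume so it can be snapshotted. The function only builds the two commands; actually running them needs root and a mounted btrfs filesystem.

```python
from __future__ import annotations

import subprocess
from datetime import datetime, timezone


def backup_commands(source: str, subvolume: str, snapshot_dir: str,
                    now: datetime | None = None) -> list[list[str]]:
    """Build the rsync and read-only btrfs snapshot commands for one backup run."""
    stamp = (now or datetime.now(timezone.utc)).strftime("%Y%m%d-%H%M%S")
    return [
        # Mirror the source into the backup subvolume (trailing slash = copy contents).
        ["rsync", "-a", "--delete",
         source.rstrip("/") + "/", subvolume.rstrip("/") + "/"],
        # Freeze the result as a read-only, timestamped snapshot.
        ["btrfs", "subvolume", "snapshot", "-r",
         subvolume, f"{snapshot_dir.rstrip('/')}/{stamp}"],
    ]


if __name__ == "__main__":
    # Hypothetical paths; adjust to the real mount points before running.
    for cmd in backup_commands("/data", "/mnt/backup/current", "/mnt/backup/snapshots"):
        subprocess.run(cmd, check=True)
```

The snapshots mean that if rsync copies a rotten file over a good one, the earlier good version still exists in an older snapshot. rsync on its own will not notice the rot on the xfs side.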

Does proper use of ZFS also require ECC memory?

ZFS protects you from disk errors. ECC protects you from memory errors. Using one or the other is safer than using neither. Using both together is even safer.

100% yes. With non-ECC you will always have bad RAM bits eventually. With ZFS this is especially bad because it can corrupt your checksums or your ZFS metadata, which means either silently corrupting your data, or corrupting ZFS itself and losing your entire zpool (akin to losing a RAID array).

Maybe not: that ZFS needs ECC is "common wisdom", but the disaster scenario appears not so likely. See



This is FUD perpetuated by a certain individual on the FreeNAS forum.
