
Yes it does, or else I don't understand it: http://louwrentius.com/please-use-zfs-with-ecc-memory.html https://pthree.org/2013/12/10/zfs-administration-appendix-c-...

Always fun to post on HN and get downvoted ;o




The first article doesn't offer any useful metrics - it simply asserts that it would be bad. Here's the thing: have you run filesystem recovery tools lately? Because if you're at that point, your data is toast.

It's toast if you were running mdadm or RAID. It's toast if a file gets deleted and then recovered on ext3/4 (good luck figuring out which chunk in lost+found you're looking at).

The second article simply stresses my original point: ZFS cannot protect you from bad RAM. No filesystem can. If data is corrupted in RAM, ZFS will checksum the corrupted data and write it to disk - which is exactly what any other filesystem would do.
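To make that concrete, here's a toy sketch (plain Python, not ZFS code) of why no checksumming filesystem can catch corruption that happens in RAM before the write path sees it - the checksum is computed over the already-bad buffer, so it verifies cleanly on every later read:

    import hashlib

    original = b"important payload"
    corrupted_in_ram = bytearray(original)
    corrupted_in_ram[0] ^= 0x01  # a bit flip before the write path ever sees the data

    # "write": the checksum is computed over the already-corrupted buffer
    stored_data = bytes(corrupted_in_ram)
    stored_checksum = hashlib.sha256(stored_data).digest()

    # "read": the checksum matches, so the corruption is invisible to the filesystem
    assert hashlib.sha256(stored_data).digest() == stored_checksum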

ZFS storage pools can recover from fairly dramatic failures. You can lose a disk out of a non-redundant stripe and still have complete metadata (just incomplete data, but checksums tell you exactly what you lost). Unimportable pools don't happen easily without severe damage - a corrupted uberblock can be recovered from by using zpool import -T to rewind to an earlier txg and get back a verifiably complete copy of your data.

This idea that the failure mode of other filesystems under faulty RAM is somehow better is destructive fiction. You have no idea if your data is valid on disk. You have no way to verify whether the data was valid in the first place. And the degree and number of failures required to wipe out a pool would also wipe out any other storage system. The idea that recovery tools will save you in a catastrophic failure (of what kind?) is laughable. This can be trivially explored by trying to recover a deleted file on ext4. Sure, it can technically be done. Good luck doing it more than once.


As far as I've understood, the gist of the "ZFS without ECC is scary" mantra is that one bad memory location (say, a flaw that results in a word always getting zeroed) might corrupt all your on-disk data during scrubbing ("read data, compute checksum, write data, repeat"). Is this in fact a non-issue with ZFS?


The scrub loop is actually: read data -> compute checksum -> check against the on-disk checksum -> rebuild from parity if they don't match -> repeat.
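Roughly, in Python-ish pseudocode (the helper names here are placeholders, not ZFS APIs - this is just the shape of the loop, assuming the checksum lives in the parent block as ZFS stores it):

    import hashlib

    def checksum_of(data):
        return hashlib.sha256(data).digest()

    def scrub_block(block_id, read_block, read_expected_checksum,
                    rebuild_from_parity, repair_on_disk):
        data = read_block(block_id)
        expected = read_expected_checksum(block_id)  # stored in the parent block
        if checksum_of(data) == expected:
            return data                              # block verifies, move on
        data = rebuild_from_parity(block_id)         # reconstruct from parity/mirror
        if checksum_of(data) != expected:
            raise IOError("block %r unrecoverable - but at least you know" % block_id)
        repair_on_disk(block_id, data)               # write the verified copy back
        return data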

But this also wouldn't be a random failure. ECC RAM with a consistent defect would give you the same issue. It would also proceed to destroy your regular filesystem/data/disk by messing up inode numbers, pointers, etc.

Your scenario would require an absurd number of precision defects: ZFS would have to always be handed the exact same block of memory, always store only the data currently being compared in it, and then only ever use that same location to store the rebuilt data. And it would probably also need something to wipe out the parity blocks in a specific order.

This is a weird precision scenario. That's not a random defect (a bit flip, which is what ECC fixes) - that's a systematic error. And I'd be amazed if the odds of it were higher than the odds of a hash collision, given the number of things you're requiring to go exactly right to produce it.

EDIT: In fact, I'm looking at the RAID-Z code right now. This scenario would be literally impossible, because the code keeps everything read from disk in separate in-memory buffers - i.e. reconstructed data and bad data do not occupy or reuse the same memory space; they are allocated concurrently. The parity data is itself checksummed, as ZFS assumes by default that it might be reading bad parity.
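To illustrate the separate-buffers point, here's a sketch of single-parity XOR reconstruction (illustrative only, not the actual RAID-Z code): the rebuilt column is a fresh allocation, and the buffers read from disk - including the suspect one - stay alive next to it, so one stuck memory location can't simultaneously be the data being verified and the data being rewritten.

    def reconstruct_column(data_columns, parity_column, missing_index):
        # fresh allocation; the buffers read from disk are left untouched
        rebuilt = bytearray(parity_column)
        for i, column in enumerate(data_columns):
            if i == missing_index:
                continue
            for j, byte in enumerate(column):
                rebuilt[j] ^= byte  # XOR out the surviving columns to recover the missing one
        return bytes(rebuilt)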


Thank you! I was always a bit puzzled why the Sun engineers would not verify against the on-disk checksum, but it makes some sense given that in a server setting ECC can be assumed.

By the way, the above "insane" scenario comes from the FreeNAS forums:

http://forums.freenas.org/index.php?threads/ecc-vs-non-ecc-r...

and is used to motivate why ZFS should have ECC RAM. I am glad to hear it is much less of an issue than it's made out to be.


The mechanism that you describe should be impossible to achieve with one bit-flip. I have no idea how you came to think this.


There are a lot of misconceptions spread by the FreeNAS forums. I suspect that's where he read it:

http://forums.freenas.org/index.php?threads/ecc-vs-non-ecc-r...

The author there makes some crazy assumptions like this: "But, now things get messy. What happens when you read that data back? Let's pretend that the bad RAM location is moved relative to the file being loaded into RAM. Its off by 3 bits and move it to position 5."


This is indeed where I read it. When googling to find it again I found a similar story at:

https://pthree.org/2013/12/10/zfs-administration-appendix-c-...

Any good ideas on how one might fix this incorrect "common wisdom"?


Honestly, no idea, except to just point it out whenever someone mentions it.

The whole ECC recommendation is due to ZFS (unlike other filesystems) providing guarantees about data correctness. But as you know, while ZFS can discover data corruption on disk thanks to checksums, it can't guarantee data correctness in RAM, because like any program it is bound to trust it. That's why ECC RAM is highly recommended.


I think there is some confusion about the impact of bit flips on reads and on writes. Those are separate cases. That blog post is absolutely correct, but it talks about writes. People here are talking about reads.


The forum post and the blog both state that data also gets corrupted even more during reads. They also rely on the strange assumption that data would be written to RAM ignoring byte boundaries (i.e. shifted by 4 bits).



