I'm probably biased, but I got screwed using ReiserFS. It was fast and great, but one day something went wrong and I found that the reiserfsck program was practically useless. Very little work had been done on it, so any small inconsistency in the FS meant your data was toast.
Sure, keep backups and all that, but just be aware how important a good fsck tool is.
On writes ZFS also checks that each written block is valid before committing the new block to the filesystem. In the event of a bad write the new block is not committed and the old block remains; new blocks are always written to a separate area of the disk first, so the old data is never overwritten in place.
I suspect (but I have no data!) that the majority of times that users have file system corruption and need to run fsck is due to bad software behaviour, and not failing disks.
If you have a bad software write, ZFS should be able to detect it the same way it would a hardware error, since the checksums won't match. Also worth noting that the checksums are themselves checksummed in a Merkle tree. It also has a "scrub" command which can check data integrity without unmounting the disk.
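For what it's worth, checking a live pool is a one-liner; roughly like this, with "tank" standing in for whatever your pool is called:

  # walk every block in the pool and verify it against its checksum;
  # the pool stays mounted and usable the whole time
  zpool scrub tank
  # check progress and see whether anything failed to verify
  zpool status -v tank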
Of course if your filesystem has buggy code for checksumming/repair then you are boned whatever.
One thing to keep in mind is that ZFS is more complex than your typical filesystem, but it's also one of the few filesystems that's effectively a cryptographically self-validating tree. You're a lot more likely to have a disk lie to you than ZFS flake out.
Yes ZFS is quite resilient against corruption but when it bites, which has happened plenty of times in the past, you are truly screwed.
The philosophy behind ZFS and file system recovery has always been that it is too inconvenient. It is much better to restore from tape backups so why make any effort on recovery?
Thing is, most consumers don't have tape backups...
Because of that, ZFS is in the vast majority of cases a bad choice for home users, especially under Linux where it can hardly be considered mature.
Btrfs just isn't ready yet.
As a home user you can't even get a decent COW FS with decent snapshotting or checksums. That is the sad state we are in and will be in for many years to come.
Actually the whole point of RAID is to combat a MISSING drive, not a corrupt drive. Parity works when I'm lacking one bit ... not when any of my bits might be lying.
Half of the reason for ZFS is to protect against the scenario where a drive's internal checksum should catch a data error, but the drive returns bogus data and says 'oh, it's good'. RAID5 can't catch that (mathematically impossible); RAID6 can, but most implementations don't check because it's expensive (extra IO and calculations).
This is the reason most of your high end SAN vendors don't just deploy RAID5/6/10 but also checksum the data themselves under the covers. They don't trust even high end SAS/FC drives.
I've been using ZFS for years now; I've had disks go bad and I've replaced them. ZFS has protected me from data loss better than anything else. In the past when a disk lied to me, ZFS was able to tell me which files were corrupted and thus had invalid data.
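(The way it tells you, if I remember the syntax right, is just zpool status with the verbose flag; pool name here is only an example:)

  # -v lists the individual files affected by unrecoverable errors,
  # so you know exactly what to restore instead of guessing
  zpool status -v tank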
I'm going to need to throw a [citation needed] on your post.
UFS, ext3, and ext4 would have just kept giving me bad data back. Are there going to be cases where a file system fails completely? Yes, but writing tools to fix issues that happen once in a million times is almost impossible because of the difficulty of replicating those failure scenarios and then writing code to deal with disks carrying that corruption.
I still trust ZFS with more data than any other file system.
I'm getting "we need an fsck that does magic" vibes here.
Most data on consumers' computers is one disk failure away from loss anyway, hence the popularity of cloud syncing services.
I got screwed by ZFS, and I was able to recover the filesystem by learning its on-disk structure and fixing it directly on-disk. Something that a decent fsck tool could do. But no, ZFS never breaks. Go figure.
How? ZFS refuses to mount/import a corrupt filesystem.
In my case, the latest superblock (or some internal bookkeeping structures that the superblock points to) was corrupted in a way that made ZFS completely give up. So what I ended up doing was manually invalidating the latest superblocks until ZFS was able to mount the filesystem. I may have lost the changes written in the few minutes before the corruption, but that's still way better than losing everything.
Before I decided to poke around the raw disk with dd (to invalidate the superblocks by overwriting them with zeros), I googled around and I wasn't the only one with that problem. One other guy asked on the ZFS mailing list and the response was along the lines of 'Your filesystem is FUBAR, restore from backup'.
You may argue that ZFS itself should do what I did (dropping a few minutes of transaction history and roll back to the latest non-corrupt state) upon mounting. Fair enough. I don't really care if that functionality is built into ZFS or an external fsck binary. The fact is that ZFS wasn't able to recover from the situation. One that I would argue is very trivial to recover from if you know the internal ZFS on-disk structure.
zpool clear -F $POOLNAME
You may have found a corner case in the fs and perhaps this sort of thing should be added to the import command, but I'm not sure simply having an "fsck" fixes this. I just think the import command appears to have a bug/needs a feature.
That's answered very well in the article connected to this other currently active HN discussion:
In short, fsck simply checks to see that the metadata makes sense, and that all inodes belong to files, and that all files belong to directories, and if it finds any that don't, it attaches them to a file with a number for a name in lost+found.
It's pretty crude compared to a filesystem debugger.
If you want to compare apples-to-apples, you'd be better off asking how zdb compares to debugfs (for ext2/3/4) as both are filesystem debuggers.
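If you've never poked at it, zdb can dump pool structures much like debugfs dumps ext structures. A couple of read-only examples, pool name and device purely hypothetical:

  # print the pool configuration as zdb sees it
  zdb -C tank
  # dump the on-disk label of a member device; this is where the
  # vdev config and uberblock pointers live
  zdb -l /dev/sda1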
You could also ask "How's zfs scrub different from fsck?" and the answer to that would be: zfs scrub checks every bit of data and metadata against saved checksums to ensure integrity of everything on-disk. In comparison, fsck cannot detect data corruption at all, and can only detect metadata corruption when an inode points to an illegal disk region (for example).
Even that comparison shows fsck is crude when compared to scrub.
The tool to recover from corruption is a rollback:
clear [-nF] <pool> [device]
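So in practice a recovery attempt looks something like this (pool name is just an example):

  # -n: check whether discarding the last few transactions would make
  # the pool usable again, without actually discarding anything
  zpool clear -nF tank
  # actually discard them and roll back to the last good transaction group
  zpool clear -F tank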
Maybe we've just been really lucky.
I won't discuss the nature of the business, but it's unlikely that actual corruption that isn't automatically repaired would go undetected for any amount of time.
Especially for those of us who don't have thousands of machines and can therefore be badly screwed by one issue.
What I find scary is silently corrupt data, something which is a problem for most other filesystems. I've seen ZFS catch and fix that error orders of magnitude more often than I've seen ZFS flake out. If we're talking risk analysis, I feel you're worried about a mouse in the corner, while a starved tiger is hungrily licking its chops while staring at you.
The thing about fsck-like recovery tools is that you need to have a failure mode in mind when you write them. ZFS can fix most of those types of errors on the fly, thanks to the checksums and on-disk redundancy, or at least tell you that something is going wrong and which files are affected.
However, scrub does not make ZFS perfect. There are still ways the filesystem can become corrupted without scrub noticing, or corrupted in a way that ZFS fails to recover from even though recovering would be dead simple.
The attitude of the ZFS developers only works in the enterprise market: Your data is safe (checksummed, scrubbed, replicated using RAID-Z), but if a bit flips in the superblock just restore from your backup, because we won't provide tools to recover from that.
Suppose bits flip in all four uberblocks, though; then ZFS will use the previous valid commit, which also has four uberblocks (ZFS is a CoW FS), and so on backwards for quite a few transactions. All these uberblocks are spread out at the beginning and end of each disk.
ZFS has a policy that the more important the data (and the top of the tree is most important), the more copies there are, although a user can also define how many duplicates there should be at the leaves of the tree.
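That user-facing knob is the copies property; setting it is a one-liner, e.g. on a hypothetical dataset:

  # keep two copies of every data block in this dataset, on top of
  # whatever redundancy the pool's mirror/RAID-Z already provides
  zfs set copies=2 tank/important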
Basically, you'd need a very trashed disk to render an entire pool dead. You're not going to recover from that, regardless of filesystem.
The fact that the vast majority of other filesystems have no way to detect silent corruption of data (only metadata inconsistencies) is far more frightening to me.
Here is a nice article written by someone who discovered just how unreliable disks are, after switching to ZFS (because other filesystems couldn't detect the corruption). Quite an eye-opener.
If you use Chrome and are getting the same error as I am, here's the Google cache link (Firefox will load it):
The article references sources of studies of hard disk corruption, if you'd want something with even more detail and statistics:
Unless you assume that all hardware is perfect and that no bugs exist in ZFS (there have been plenty of those in the past that rendered the whole FS completely worthless).
But no, ZFS doesn't need fsck. Because we don't want it to need one. Oh, and we don't want to spend the resources developing one.
The only reason ZFS doesn't have a fsck tool is because in the enterprise world it doesn't need one. When it is needed you just restore from tape instead. It is that simple.
At first glance, it certainly seems to depend upon some high-level ZFS data in order to start. A command like 'zpool scrub pool-name' still needs to navigate the ZFS pool data on disk in order to locate the named pool.
They're validated as well. Everything has a cryptographic checksum that's stored in the parent block, starting at the data and working all the way up to the top of the tree.
Furthermore, the higher up the tree you go, the more redundant copies there are. The top of the tree has four copies, if I recall. This ignores support for mirroring and striping, which further improves data redundancy.
But wait, there's more! Those four blocks aren't overwritten. ZFS is a copy-on-write filesystem (data and metadata) which behaves a lot like the persistent data structures that Clojure hackers are so fond of, so if the newest writes do not validate, it'll roll back to the newest valid commit of the tree.
That's a nice way of saying that you're guessing given your experience with other filesystems. ZFS was a genuinely revolutionary filesystem, and doesn't behave like other filesystems, the sole OSS exception being BTRFS. Read up on it a bit, you'll find something interesting. :)
It then reads the data about which zpool a disk belongs to from the disk itself. That data is stored in multiple locations, so it is unlikely that all of them are corrupted. After importing the disks you can run a scrub.
If the pool metadata itself is corrupted, you can generally roll back to a previous point where the data was not corrupted.
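On reasonably recent ZFS that roll-back is exposed at import time; something along these lines, with the pool name again hypothetical:

  # check whether discarding the last few transaction groups would
  # let the pool import cleanly, without committing to anything
  zpool import -F -n tank
  # do it for real
  zpool import -F tank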
This document: http://docs.oracle.com/cd/E19082-01/817-2271/6mhupg6qg/index... describes fairly well what all the options along the way are.
None of the failure modes described would be any better if there were a fsck tool available... with any filesystem it is going to cause data loss.