Best comment on that thread (from the bug reporter):
"Those following along at home is probably half the human race, now we have posts on Phoronix, Slashdot and Heise. Who the hell submits things like this to random-terrified-user media outlets before we've even characterized the bloody problem? Every one of those posts is inaccurate, of course, through no fault of their own but merely because we didn't yet know what the problem was ourselves, merely that I and one other person were seeing corruption: we obviously started by assuming that it was something obvious and thus fairly serious, but that didn't mean we expected that to be true: I certainly expected the final problem to be more subtle, if still capable of causing serious disk corruption (my warning here was just in case it was not).
But now there's a wave of self-sustaining accidental lies spreading across the net, damaging the reputation of ext4 unwarrantedly, and I started it without wanting to.
It's times like this when I start to understand why some companies have closed bug trackers."
This is symptomatic of one of my two big problems with journalling-oriented file systems.
My problems with journalling are twofold:
1) They are very slow:
1a) You have a nice big sequential write into the journal, which is OK.
1b) A flush track cache to make sure it is actually in the journal. This can sync whatever has accumulated in the track cache which might not just be journal data.
1c) The actual overwrites that spew data randomly over the drive.
1d) Writes to update the journal header/terminate the transaction.
1e) A final flush track cache which will sync who-knows-what onto the platters/flash.
2) Replay behavior of the journal log is _very_ fragile code. You need to handle lots of terrible cases, the most awful of which is ensuring that you don't replay older transactions on top of newer transactions. You might say "hey, that shouldn't happen!" but it happens. It happens because the code is not trivial to write and detecting these bad cases isn't trivial either. Even if you do get the code right, you are still screwed. Why? Because if your drive does not support the flush track cache mechanism you are in for a world of pain. You can have a journal and journal header that are ancient if the latest writes are just stuck in some cache...
The ext* family of filesystems do not appear to have natural resiliency to this sort of problem. Instead, that resiliency appears to come from a coordinated, concerted effort between various parts of the journaling code.
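For anyone who wants to see steps 1a through 1e laid out concretely, here is a rough Python sketch that follows the order given in the list. It is a toy model under loud assumptions: the record layout, `BLOCK_SIZE`, and the names `flush`, `journaled_write` and `demo` are all made up for illustration and are not taken from ext4/jbd2, and `os.fsync` only stands in for the flush track cache command (whether the data really reaches the platters still depends on the drive honouring the flush).

```python
import os
import struct
import tempfile
import zlib

BLOCK_SIZE = 8  # toy block size for the sketch, not a real filesystem value


def flush(f):
    """Push Python's buffer to the OS, then ask the OS to push it to the device."""
    f.flush()
    os.fsync(f.fileno())


def journaled_write(journal, data, txn_id, updates):
    """Apply `updates`, a list of (block_number, payload) pairs, via a journal."""
    # 1a) one sequential append of the whole transaction into the journal
    body = b"".join(struct.pack("<I", blk) + payload for blk, payload in updates)
    journal.seek(0, os.SEEK_END)
    journal.write(struct.pack("<II", txn_id, len(updates)) + body)
    # 1b) first flush, so the journal copy is durable before any overwrite
    flush(journal)
    # 1c) the actual in-place overwrites, scattered across the data file
    for blk, payload in updates:
        data.seek(blk * BLOCK_SIZE)
        data.write(payload)
    # 1d) terminate the transaction with a small commit record
    journal.write(struct.pack("<II", txn_id, zlib.crc32(body)))
    # 1e) final flush of both files
    flush(data)
    flush(journal)


def demo():
    """Run one transaction against temporary files and return the data file."""
    with tempfile.TemporaryFile() as journal, tempfile.TemporaryFile() as data:
        data.write(b"\0" * BLOCK_SIZE * 4)  # four empty blocks
        journaled_write(journal, data, 1,
                        [(2, b"A" * BLOCK_SIZE), (0, b"B" * BLOCK_SIZE)])
        data.seek(0)
        return data.read()
```

Even in this toy version every transaction pays for two flush points and a set of non-sequential writes, which is the cost the list is complaining about.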
>Because if your drive does not support the flush track cache mechanism you are in for a world of pain.
If your drive does not support the CACHE_FLUSH mechanism, then pretty much any file system that is supposed to handle unclean shutdowns (ZFS, btrfs, etc.) is going to be screwed.
More generally, any time you have a file system, there will always be very delicate code. Fundamentally, file systems are _hard_. They have to handle a huge number of concurrent operations, and users want speed, so we use fine-grained locking; there's a reason why file systems take 100 man-years or so to become production ready. The btrfs folks started in 2007, and I warned them it would probably be at least 5-7 years before it would be ready for the enterprise. And now here it is five years later, and the community distributions (not the enterprise distros) are just starting to adopt btrfs. ZFS took seven years to develop before Sun announced it, and then it took a few more years before people trusted it on production servers.
No one believes how hard it is when they start, but the history is pretty clear.
Note that this is dated after the "Wed, 24 Oct 2012 17:31:29 -0400" post and talks about changing code related to a truncated journal.
My post does not say anything about the likelihood of data loss due to such bugs; I describe what happens in every design of a WAL journaling system.
Talking about the impact of a particular ext bug is up to the people who post on the LKML. To be honest, I don't really care how severe the bug is. I care about why there are bugs, and journal truncation is a source of them.
> The ext* family of filesystems do not appear to have natural resiliency to this sort of problem.
I believe the fact that data corruption on ext4 is rare shows it's not really a huge problem. The only time I lost data on an ext4 filesystem was when I mistyped a wildcard for rm.
I would rather not have byzantine relations between fragments of code make up the policy of my file system's metadata resiliency, thank you very much. I would prefer that stale journal replays be no-ops, even at the expense of making journal replay slower.
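To make "stale journal replays be no-ops" concrete: one way to get that property is to tag every transaction with a monotonically increasing sequence number plus a checksum, and have replay skip anything at or below the last sequence number recorded as applied. The sketch below is a toy in-memory model under that assumption. The names (`Txn`, `make_txn`, `replay`, `last_applied`) are hypothetical and nothing here is taken from ext4/jbd2 or any other real implementation.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Txn:
    seq: int        # monotonically increasing transaction number
    updates: dict   # block number -> new block contents (bytes)
    checksum: int   # covers seq and updates, to detect torn records


def checksum(seq, updates):
    """CRC over the sequence number and every (block, payload) pair."""
    h = zlib.crc32(seq.to_bytes(8, "little"))
    for blk in sorted(updates):
        h = zlib.crc32(blk.to_bytes(8, "little") + updates[blk], h)
    return h


def make_txn(seq, updates):
    return Txn(seq, dict(updates), checksum(seq, updates))


def replay(journal, blocks, last_applied):
    """Replay `journal` onto `blocks`; return the new last-applied sequence.

    Transactions at or below `last_applied` are skipped, so replaying an
    ancient journal leaves `blocks` untouched.
    """
    for txn in sorted(journal, key=lambda t: t.seq):
        if txn.seq <= last_applied:
            continue  # stale: already reflected on disk, so a no-op
        if txn.checksum != checksum(txn.seq, txn.updates):
            break     # torn or corrupt record: stop, apply nothing after it
        blocks.update(txn.updates)
        last_applied = txn.seq
    return last_applied
```

The price is the one already conceded above: every record has to be checksummed and compared on replay, and the last-applied number has to be stored somewhere that is itself updated safely.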
Unfortunately the open source ones kinda blow due to interesting performance penalties (I'm looking at you, btrfs back ref management) or crazy memory usage/non-extent based systems/read-modify-write induced mania (I'm looking at you, ZFS block tree).
I've been playing with btrfs under the 3.7 prerelease kernel on an Arch KVM with virtio disks, and with lzo compression btrfs is kicking some serious read/write butt vs every other major filesystem I can think of. Given sufficient hardware, of course. Haven't tried it on an SSD though, only on 7200rpm turntables.
I still use reiserfs for all my Linux machines (ext2 for /boot); ZFS on FreeBSD. I haven't experienced the corruption I've seen under ext3 with either of them.
Ubuntu 12.04 Server crashes randomly due to some obscure bug in 3.2.
I upgraded the kernel to 3.6.3 specifically to stop the machine from crashing. Now let's hope that INDEED it doesn't crash, else I might get hit by this newfangled Ext4 bug.
Sounds like the accelerated kernel development is hitting various limits; they should go back to a more stodgy stable series like before.
If you have reasonable resiliency and redundancy in your backups then it largely doesn't matter what filesystem is used. I do and consequently use btrfs everywhere across several different machines in several different configurations. The crucial functionality which pushed me over was:
* snapshots (I already did this using rsync and hard links)
* compression (Save some space)
* checksums and online scrubbing (I had a disk very slowly go bad with ext4. The only scrubbing available required the system being offline for 19 hours.)
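For readers who have not seen the rsync-and-hard-links trick from the first bullet, the idea can be approximated in a few lines: files unchanged since the previous snapshot are hard-linked into the new one, and changed or new files are copied. This is a rough sketch, not the poster's actual setup; the function name and arguments are hypothetical, it ignores symlinks and special files, and all snapshots must live on the same filesystem for the hard links to work.

```python
import filecmp
import os
import shutil


def snapshot(source, previous, target):
    """Copy `source` into `target`, hard-linking files unchanged since `previous`.

    `previous` is the path of the last snapshot, or None for the first one.
    """
    for root, _dirs, files in os.walk(source):
        rel = os.path.relpath(root, source)
        os.makedirs(os.path.normpath(os.path.join(target, rel)), exist_ok=True)
        for name in files:
            src = os.path.join(root, name)
            new = os.path.normpath(os.path.join(target, rel, name))
            old = os.path.join(previous, rel, name) if previous else None
            if old and os.path.isfile(old) and filecmp.cmp(src, old, shallow=False):
                os.link(old, new)        # unchanged: share the data, no extra space
            else:
                shutil.copy2(src, new)   # new or modified: store a fresh copy
```

rsync's `--link-dest` option does the same job far more thoroughly. Unlike a btrfs snapshot, this still has to walk and compare every file each time.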
I use Ubuntu on all my machines and use the standard kernel that comes with it. For a brief while I used kernel 3.4 on my laptop due to Ivy Bridge issues but could also boot into the Ubuntu 3.2 kernel just fine. (And on the laptop I'm also using encryption.)
Note that the kernel has updates of the code, not updates of the filesystem format. Supposedly the last filesystem format change was in 2.6.31 (from 2009).
Compression is a format "difference". For example lzo was added about 18 months ago, so in theory an older kernel won't understand your filesystem that is using it. There was some talk about snappy being added earlier this year, but I don't know if it ever was.
I haven't encountered any compatibility issues at all with btrfs. In various places I am using compression, RAID 0, RAID 1, ext4 to btrfs conversion, encryption (dmcrypt/LUKS), SATA, USB and who knows what I have forgotten. Except for one system all are on UPS and in general do not experience unclean shutdowns or similar adversity.
The only trouble I have ever had is when a filesystem filled up. It gets quite challenging freeing up disk space because deleting a file requires an append to the btree which needs space.
btrfs has been an absolute joy for me. I can just throw devices at it. I can do scrubbing without taking systems offline. I could convert existing filesystems. I don't have to perform system administration with it. And I know it will tell me about bitrot. And worst case I have backups, and backups of backups, on and offsite.
What kind of performance do you get out of it? My understanding was that it's still slow compared to, e.g., ext4, but it might not be very visible to normal users.
Two things are slow. One is random rewriting of large file contents like a database does. It isn't that bad, but you wouldn't want to run something like MongoDB on btrfs, although I did so for a while. You may see similar issues with virtualization disk images, although that will depend on how much writing you do. I still do this, but don't do much writing and performance is acceptable.
The other part is fsync calls. Doing a package install on Debian/Ubuntu is an absolute blizzard of fsync calls, several per file. Kernel 3.5 is better than 3.2 and supposedly 3.7 will be even better again. There is also the 'eatmydata' LD_PRELOAD wrapper which nulls out the fsync calls. It was worthwhile using on 3.2, and with 3.5 I don't bother.
Those are the write issues. I have no noticeable difference in read speeds, and they are limited to the speed of the underlying media, which is pretty much the case for any filesystem. In theory using compression will help for compressible files.