

Ext4 data corruption trouble - tshtf
https://lwn.net/Articles/521022/

======
sciurus
Best comment on that thread (from the bug reporter)

"Those following along at home is probably half the human race, now we have
posts on Phoronix, Slashdot _and_ Heise. Who the hell submits things like this
to random-terrified-user media outlets before we've even characterized the
bloody problem? Every one of those posts is inaccurate, of course, through no
fault of their own but merely because we didn't yet know what the problem was
ourselves, merely that I and one other person were seeing corruption: we
obviously started by assuming that it was something obvious and thus fairly
serious, but that didn't mean we _expected_ that to be true: I certainly
expected the final problem to be more subtle, if still capable of causing
serious disk corruption (my warning here was just in case it was not).

But now there's a wave of self-sustaining accidental lies spreading across the
net, damaging the reputation of ext4 unwarrantedly, and I started it without
wanting to.

It's times like this when I start to understand why some companies have closed
bug trackers."

~~~
saurik
Also (same overall gist but with more technical detail),
<https://lwn.net/Articles/521090/>

------
cokernel_hacker
This is symptomatic of one of my two big problems with journalling-oriented
file-systems.

My problems with journalling are twofold:

1) They are very slow:

1a) You have a nice big sequential write into the journal, which is OK.

1b) A flush track cache to make sure it is actually in the journal. This can
sync whatever has accumulated in the track cache which might not just be
journal data.

1c) The actual overwrites that spew data randomly over the drive.

1d) Writes to update the journal header/terminate the transaction.

1e) A final flush track cache which will sync who-knows-what onto the
platters/flash.

2) Replay behavior of the journal log is _very_ fragile code. You need to
handle lots of terrible cases, the most awful of which is to ensure that you
don't play older transactions on top of newer transactions. You might say "hey,
that shouldn't happen!" but it happens. It happens because the code is not
trivial to write and detecting these bad cases isn't trivial. Even if you do
get the code right, you are still screwed. Why? Because if your drive does not
support the flush track cache mechanism you are in for a world of pain. You
can have a journal and journal header that is ancient if it is just stuck in
some cache...

The ext* family of filesystems does not appear to have natural resiliency to
this sort of problem. Instead, correctness appears to depend on a coordinated,
concerted effort between various parts of the journaling code.
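
To make the write ordering in (1) and the stale-replay hazard in (2) concrete,
here is a minimal user-space sketch in Python. It is a toy, not ext4's actual
journal format or code: the record layout (sequence number, length, CRC32) and
the names append_transaction, read_transactions and replay are all hypothetical
and chosen only for illustration. It also assumes that os.fsync really reaches
stable storage, which is exactly the assumption that fails when a drive
ignores cache flushes.

```python
import os
import struct
import zlib

# Hypothetical record header: sequence number, payload length, CRC32 of payload.
HEADER = struct.Struct("<QII")


def append_transaction(journal_path, seq, payload):
    """Steps 1a/1b: sequential journal append, then a flush to stable storage."""
    record = HEADER.pack(seq, len(payload), zlib.crc32(payload)) + payload
    with open(journal_path, "ab") as journal:
        journal.write(record)       # 1a: sequential write into the journal
        journal.flush()             # push Python's buffer down to the OS
        os.fsync(journal.fileno())  # 1b: ask the OS to flush to the device


def read_transactions(journal_path):
    """Yield (seq, payload) for each intact record; stop at the first bad one."""
    with open(journal_path, "rb") as journal:
        data = journal.read()
    offset = 0
    while offset + HEADER.size <= len(data):
        seq, length, crc = HEADER.unpack_from(data, offset)
        start = offset + HEADER.size
        payload = data[start:start + length]
        if len(payload) != length or zlib.crc32(payload) != crc:
            break  # torn or corrupt record: nothing after it can be trusted
        yield seq, payload
        offset = start + length


def replay(journal_path, last_applied_seq):
    """Collect payloads to apply, skipping anything not newer than what is applied."""
    to_apply = []
    for seq, payload in read_transactions(journal_path):
        if seq <= last_applied_seq:
            continue  # stale transaction: replaying it would clobber newer data
        to_apply.append(payload)
        last_applied_seq = seq
    return to_apply, last_applied_seq
```

The sequence check in replay is the whole defence against playing an old
transaction over a new one, and it only works if last_applied_seq itself was
durably recorded. If that value or the journal tail was sitting in a cache that
never got flushed, the check passes on stale data.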

~~~
riledhel
What is your main use case? What is your choice instead of ext*?

~~~
cokernel_hacker
I like write-anywhere/COW filesystems.

Unfortunately the open source ones kinda blow due to interesting performance
penalties (I'm looking at you, btrfs back ref management) or crazy memory
usage/non-extent based systems/read-modify-write induced mania (I'm looking at
you, ZFS block tree)

~~~
mcpherrinm
What non-open source ones do you like, then?

~~~
sliverstorm
Oh, you've probably never heard of them.

------
bcl
Update here: <https://lwn.net/Articles/521090/>

------
Florin_Andrei
Oh great.

Ubuntu 12.04 Server crashes randomly due to some obscure bug in 3.2

I upgraded the kernel to 3.6.3 specifically to stop the machine from crashing.
Now let's hope that INDEED it doesn't crash, else I might get hit by this
newfangled Ext4 bug.

Sounds like the accelerated kernel development is hitting various limits; they
should go back to more stodgy stable series like before.

~~~
rbanffy
> Ubuntu 12.04 Server crashes randomly due to some obscure bug in 3.2

Must be really obscure. With more than 100 servers under my watch, I never saw
anything like that.

~~~
Florin_Andrei
I'm guessing all those 100 servers are the same hardware, or a few variations.

------
batgaijin
What's the status with BTRFS? Also, is ZFS Linux performance still terrible,
or did they find a way to fix that?

~~~
meaty
btrfs is shipping with OpenSuSE and Oracle Linux apparently as stable. Not
sure I trust it yet. If they ship it with Debian stable I will consider it.

ZFS still uses FUSE AFAIK so is going to suck.

~~~
batgaijin
I thought they had a kernel module or something bypassing the need for fuse.

~~~
dmpk2k
Indeed there is: <http://zfsonlinux.org/>

There's more than one ZFS port to Linux, which might be confusing some people.

There's also the more dramatic option of using FreeBSD or one of the illumos
derivatives, which work fine in my experience too.

