
ZFS on Linux: Unlistable and disappearing files - heinrichhartman
https://github.com/zfsonlinux/zfs/issues/7401
======
ryao
We are working on it. We know what patch introduced the regression and 0.7.8
is going out soon to revert it. Until then, users should downgrade to 0.7.6 if
they have not already. The Gentoo and EPEL maintainers have pulled the
affected releases from the repositories (technically masked on Gentoo). Ubuntu
was never affected.

The regression makes it so that creating a new file could fail with ENOSPC,
after which files created in that directory could become orphaned. Existing
files seem okay, but I have yet to confirm that myself and I cannot speak for
what others know. It is incredibly difficult to reproduce on systems running
coreutils 8.23 or later; so far, reports have only come from people using
coreutils 8.22 or older. The directory size actually gets incremented for each
orphaned file, which leaves it wrong once orphans have occurred.
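
To make the symptom concrete, here is a rough Python sketch (an illustration
only; it is not the detection method mentioned below, and names and counts are
made up): it creates a batch of files and then reports any creation that
appeared to succeed but is missing from the directory listing.

    import os
    import sys

    # Illustration of the orphaning symptom only, not a pool-damage check.
    # Run inside a scratch directory on the dataset you want to exercise.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    created = []
    for i in range(2000):
        name = "testfile-%04d" % i
        path = os.path.join(target, name)
        try:
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            os.close(fd)
            created.append(name)
        except OSError as e:
            print("create failed for %s: %s" % (name, e))

    # On a healthy system nothing should be missing; on an affected 0.7.7
    # system, some successfully created files may never show up in the listing.
    missing = sorted(set(created) - set(os.listdir(target)))
    print("created %d files, missing from listing: %s" % (len(created), missing))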

We will likely have some way to recover the orphaned files (like ext4’s
lost+found) and fix the directory sizes in the very near future. Snapshots of
the damaged datasets are problematic, though. Until we have a subcommand to fix
it (not including the snapshots, which we would have to list), the damage can
be removed from an affected system either by rolling back to a snapshot taken
before it happened, or by creating a new dataset with 0.7.6 (or any release
other than 0.7.7), moving everything to the new dataset, and destroying the
old one. That will restore things to pristine condition.
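
Concretely, the two routes look roughly like this (a sketch only; the dataset
and snapshot names are invented for illustration, and rsync is just one way to
do the move):

    import subprocess

    def run(*cmd):
        subprocess.run(cmd, check=True)

    USE_ROLLBACK = True  # pick whichever of the two routes applies

    if USE_ROLLBACK:
        # Route 1: roll back to a snapshot taken before the pool ever saw 0.7.7.
        # Note that -r also discards any snapshots newer than the target.
        run("zfs", "rollback", "-r", "tank/data@pre-0.7.7")
    else:
        # Route 2: create a fresh dataset under 0.7.6, move everything over,
        # then destroy the damaged dataset.
        run("zfs", "create", "tank/data-new")
        run("rsync", "-a", "/tank/data/", "/tank/data-new/")
        run("zfs", "destroy", "-r", "tank/data")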

It should also be possible to check for pools that are affected, but I have
yet to finish my analysis to be certain that no false negatives occur when
checking, so I will avoid saying how for now.

~~~
exikyut
> _We will likely have some way to recover the orphaned files (like ext4’s
> lost+found) and fix the directory sizes in the very near future._

How should people behave right now?

Will normal usage of production filesystems erase data, or will read/write
activity leave the potentially-orphaned files in place?

You've also mentioned snapshots being tricky in the thread. Should people stop
creating snapshots in case orphaned files are not included in the snapshots?

--

> _It is incredibly difficult to reproduce on systems running coreutils 8.23
> or later._

IIUC:

- This is specifically due to the fact that `cp` in 8.23 is optimized (8.22
created files in {0..2000} order; 8.23+ randomized the order, though I don't
quite understand why)

- The script in
[https://gist.github.com/trisk/9966159914d9d5cd5772e44885112d...](https://gist.github.com/trisk/9966159914d9d5cd5772e44885112d30)
uses `touch` to create files in random order, and some people reported that
this triggered the bug
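
A rough Python analogue of that approach (the linked gist's details may
differ; the file names and count here are invented): create a pile of files in
shuffled rather than sequential order.

    import os
    import random

    # Create a few thousand files in randomised (non-sequential) order, the
    # access pattern reported to trigger the bug on 0.7.7. Run in a scratch
    # directory; names and count are made up.
    names = ["f%04d" % i for i in range(2000)]
    random.shuffle(names)
    for name in names:
        open(name, "w").close()
    print("created", len(os.listdir(".")), "entries visible in listing")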

~~~
jwilk
> 8.23+ randomized the order (I don't quite understand why)

It's using inode order, which speeds up things significantly on some
filesystems:

[https://lists.gnu.org/archive/html/bug-gnulib/2014-02/msg000...](https://lists.gnu.org/archive/html/bug-gnulib/2014-02/msg00007.html)
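
In other words (a minimal sketch of the idea, not gnulib's actual code, and
the directory name is a placeholder): read the directory first, then process
entries sorted by inode number so the work roughly follows on-disk metadata
order.

    import os

    # Process directory entries in inode order rather than readdir order,
    # which on many filesystems is close to on-disk layout and avoids
    # seek-heavy metadata access. "some-directory" is a placeholder.
    with os.scandir("some-directory") as it:
        entries = sorted(it, key=lambda e: e.inode())

    for entry in entries:
        if entry.is_file():
            with open(entry.path, "rb") as f:
                f.read()  # stand-in for copying/processing the file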

~~~
exikyut
Ah, thanks! I knew there was a rationale but hadn't quite gotten it.

Nice.

------
herogreen
Wow, that's an old kernel!
[https://linux.slashdot.org/story/16/01/31/0424221/linux-kern...](https://linux.slashdot.org/story/16/01/31/0424221/linux-kernel-2632-lts-reaches-end-of-life-in-february-2016)

~~~
mrmondo
My thoughts exactly. While the RHEL kernel has lots of backported patches,
it's by no means complete or near current.

~~~
jlgaddis
Of course, it also isn't intended to be.

------
spindle
Probably obvious, but hooray for open source software! What a fantastic
response to the bug.

~~~
tinus_hn
Imagine if this had happened to Apple; everyone and his dog would be rolling
over each other vilifying them.

~~~
Annatar
I paid good money to Apple so that something like that would not happen, so
yeah, I’d be vilifying them. You bet!

~~~
tinus_hn
As opposed to the free RedHat Enterprise Linux operating system

~~~
Annatar
No, I love to hate Red Hat because they are the most incompetent company in
the computer industry’s history; just take a look at their priority 1 bugs on
bugzilla.redhat.com and the picture becomes crystal clear. (I’m forced to do a
lot of systems engineering on RHEL every day, so my hate for it grows daily.)

------
drewg123
Is this ZOL tip or Linux specific? FWIW, the bug does not seem to reproduce on
FreeBSD-current (r332158).

However, it also does not reproduce on Ubuntu 4.4.0-116-generic running the
ZFS stuff from Ubuntu.

~~~
ryao
Ubuntu was never affected. The regression started in 0.7.7 and Ubuntu is on
0.7.5. HEAD was affected until earlier today when the patch was reverted.

I am not sure if the bad patch was ported to the other OpenZFS platforms.

~~~
dmm
Debian stable (jessie) is on 0.6.5.9 and Debian unstable (sid) is on 0.7.6, so
they shouldn't be affected.

~~~
simcop2387
Debian stable is stretch, released in June 2017. It's on 0.7.5 currently.

~~~
simcop2387
Correction: I hadn't updated my server in a while. Stretch itself is on
0.6.5.9 as well, but stretch-backports is on 0.7.6 (not sure when that
happened exactly).

~~~
TimWolla
> but stretch-backports is on 0.7.6 (not sure when that happened exactly)

[2018-03-09] Accepted zfs-linux 0.7.6-1~bpo9+1 (source amd64 all) into
stretch-backports (Aron Xu)

source: [https://tracker.debian.org/pkg/zfs-linux](https://tracker.debian.org/pkg/zfs-linux)

------
cmurf
Looks like they're on top of the causes and solution. But there's a caveat:
possible orphans.

[https://github.com/zfsonlinux/zfs/issues/7401#issuecomment-3...](https://github.com/zfsonlinux/zfs/issues/7401#issuecomment-379895691)

~~~
ryao
We will work out a solution for people affected so that they can get those
orphaned files back.

------
JetSpiegel
Regression introduced by
[https://github.com/zfsonlinux/zfs/commit/cc63068e95ee725cce0...](https://github.com/zfsonlinux/zfs/commit/cc63068e95ee725cce03b1b7ce50179825a6cda5);
it will be reverted and a new version tagged.

------
hippich
From the title my first thought was "Wow, this is an awesome security/privacy
feature of the file system!", and then from reading the comments I realised it
is a bug.

------
mafro
Great to hear the ZoL guys are right on this. Bravo.

It also reminds me why my NAS runs Debian..

~~~
sureaboutthis
It also reminds me why my NAS runs FreeBSD.

~~~
ryao
The bad patch passed review by developer(s) from other platforms. Matthew
Ahrens was a reviewer. This was not merged based on unilateral review by the
ZFSOnLinux developers. There was also nothing Linux specific about it.

That said, bugs happen. We should be putting new test cases in place to help
catch such regressions in the future. If we find more ways to harden the code
against regressions of this nature as we continue our analysis, we will
certainly do them too.

~~~
jclulow
Reviewers will help catch bugs, but the engineer who writes the code and seeks
to integrate it is ultimately responsible for sufficient testing to avoid
issues like this one.

~~~
ryao
The author is a fairly new contributor:

[https://github.com/zfsonlinux/zfs/commits?author=sanjeevbage...](https://github.com/zfsonlinux/zfs/commits?author=sanjeevbagewadi)

What do you suggest that we should have done?

~~~
jclulow
I'm not sure, but I'm not deeply familiar with this part of the code. In
general, I think it's good to be able to induce all of the failure cases for
all of the error handling code that's being added.

This can be time-consuming work, but I would argue that this is the file
system -- anything less is an unacceptable risk. This quote from my boss comes
to mind:

_Remember: you are (or should be!) always empowered as an engineer to take
more time to test your work._ --
[http://dtrace.org/blogs/bmc/2015/09/03/software-immaculate-f...](http://dtrace.org/blogs/bmc/2015/09/03/software-immaculate-fetid-and-grimy/)
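
For what it's worth, inducing a specific failure like ENOSPC in a test doesn't
have to be exotic. A hypothetical pytest-style sketch of the fault-injection
idea (the function and test are invented, and the real ZFS test suite
obviously operates at a different layer):

    import errno
    import os
    from unittest import mock

    def append_record(path, data):
        """Toy function under test: report failure cleanly instead of crashing."""
        fd = os.open(path, os.O_CREAT | os.O_APPEND | os.O_WRONLY)
        try:
            os.write(fd, data)
            return True
        except OSError as e:
            if e.errno == errno.ENOSPC:
                return False
            raise
        finally:
            os.close(fd)

    def test_enospc_path_is_handled(tmp_path):
        # Induce ENOSPC deterministically instead of hoping a full disk shows
        # up in CI, and check that the error-handling branch actually runs.
        enospc = OSError(errno.ENOSPC, "No space left on device")
        with mock.patch("os.write", side_effect=enospc):
            assert append_record(str(tmp_path / "rec"), b"x") is False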

------
DimitarIbra9
I stick with good old ext4.

------
rhabarba
So this is not a "ZFS bug", but a "ZFS on Linux" bug. The actual ZFS on
systems which have had ZFS for decades is not affected at all.

~~~
Fnoord
> The actual ZFS on systems which have had ZFS for decades is not affected at
> all.

ZFS hasn't existed "for decades".

~~~
Annatar
It has existed since 2002, was officially put back into onnv on 31 October
2005, and was made available in Solaris 10 Update 2 at the end of June 2006.
So it has existed for at least a decade and then some.

~~~
Fnoord
It hasn't existed since 10 April 1998, therefore "decades" is inaccurate.
Furthermore, since it wasn't publicly available from 10 April 2003 (who cares
how long it was in development), if you round fairly then it's accurate to say
"more than a decade", but "decades" is inaccurate, because "decades" means
more than one decade; it denotes at least 20 years.

That's regardless of the subject we're discussing or how good it is, etc.

Anyway, the original parent admitted their mistake (thank you), so I'm done
with that discussion.

------
beedogs
At least it's not btrfs. What a disaster that filesystem's been.

~~~
clircle
Is there a stable (loaded word, I know) versioning file system available for
Linux?

~~~
topspin
The simple, unqualified answer is no.

BTRFS is officially in Linux and apparently "ok" for simple use cases, but
there is no end of reports of failures when non-trivial RAID modes are used,
demanding workloads are applied, device replacement is attempted, etc., and
there are performance problems under a variety of conditions. There are enough
qualifications on the BTRFS status page[1] that I, for one, do not consider it
'stable.'

ZFS is a thing on Linux, but it's not in the kernel, and _when_ it breaks the
kernel developers don't officially care, except when they happen to have a
foot in both camps. This situation naturally limits the size of the ZFS on
Linux user base; you're one of the few if you're doing it, and that's not
where most production users want to be.

LVM can snapshot logical volumes and produce 'crash consistent' volumes
independent of the type of file system. That's been my go-to solution given no
other alternative.
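
For the record, that LVM route boils down to one command per snapshot; a
sketch with invented volume-group/LV names (wrapped in subprocess purely for
illustration):

    import subprocess

    # Take a point-in-time, crash-consistent snapshot of vg0/data that can be
    # mounted or backed up while the origin stays online; names and size are
    # invented for this sketch.
    subprocess.run(
        ["lvcreate", "--snapshot", "--name", "data-snap", "--size", "2G", "vg0/data"],
        check=True,
    )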

The fact is Linux has trailed far behind its contemporaries in advanced file
systems for... 10+ years now? Not terribly flattering.

I suspect the reason is that most production use of Linux occurs in
environments that provide many enterprise storage functions independently of
the operating system, so there isn't much pressure to, for instance, harden
BTRFS until it has no major deficiencies. I can snapshot/clone/restore/whatever
my EBS volumes any time I wish, and I can do similar with my private-cloud
PowerVault volumes as well. I trust either of these mechanisms far more than
_anything_ Linux has ever provided, including LVM.

[1]
[https://btrfs.wiki.kernel.org/index.php/Status](https://btrfs.wiki.kernel.org/index.php/Status)

------
newnewpdro
Why is the thread title hyphenated? It made me expect there's a utility called
"zfs-bug" causing data loss.

~~~
xxpor
The OP appears to speak German natively. In German they are much more
hyphen-happy than in English. It's probably the most common error I see among
German speakers typing English.

~~~
Conan_Kudo
This. German merges words together as their modifier mechanism, whereas
English just uses word order to indicate modification. Since it's technically
_not_ illegal in English to hyphenate to do the same thing, German speakers
tend to do this for English.

It's a very hard habit for them to break. :)

~~~
xxpor
One could argue that technically nothing's illegal in English, since there's
no governing body, just what's in common use.

Kind of like civil vs common law too, now that I think about it :)

------
zorkw4rg
Basically, this is the reason I like my filesystems a few decades old and
mature; I would not trust ZFS or btrfs with anything critical.

~~~
xoa
I have been running ZFS on macOS (OS X), illumos and Solaris for a good 7
years or so now. A major part of the reason I switched over fully, despite
some warts, was that I experienced actual and significant data rot in stuff I
was carrying forward under XFS (IRIX), HFS, etc. I don't consider my personal
stuff to go back _that_ long, but I still have things from 1993 or so that
matter to me. I did a review around 2009ish and found that a number of old
files had at some point or another become corrupted, including ones I know for
sure weren't corrupted in 2004. I'd been following ZFS somewhat since Sun had
demoed it, including a depressing spell after Apple almost moved to it and
then quit, but it was my own personal, actual losses that pushed me to move
over.

End-to-end checksumming and other integrity features just plain should be
universal at this point; they should have been a decade or more ago, in fact.
We have so much incredibly important data now that is digital only and nowhere
else, and memory and storage have both become very cheap at the general
population level. It's shameful that anyone should still be losing data or
experiencing anxiety years or even decades down the line. Integrity, basic
levels of security/privacy, and flexible, high-integrity replication should
all be native-level features of any data storage system. "Decades old"
filesystems just plain absolutely do not cut it, nor does anything newer that
doesn't include those promises at least as options. Bugs are unfortunate, and
I hope this prompts ZoL and associated projects under the OpenZFS umbrella to
double-check their automated unit and stress tests. Sun rightly made a big
deal of that on release. Even so, I wholeheartedly believe that ZFS or the
like are far better primitives for a data storage scheme than older
filesystems (or many newer ones, for that matter).

~~~
Arbalest
I am still somewhat surprised that there are no relatively simple filesystems
that do this. Make it the one feature the filesystem does, and does properly.
Adding enterprise features will just increase development time (and cost) and
add bugs. Basically an online archival filesystem; that's how most people use
their home computers.

Requirements:

1. Checksumming on all files

2. Minimise assumptions of RAM correctness

3. Disc replication (soft RAID 1)

Out of scope:

* Snapshotting

* Subvolumes

* Deduplication

As much as I love subvolumes and snapshotting, I feel like CoW probably adds
too much complexity to make it (easily) reliable. Honestly I don't know why
turning off CoW on BTRFS disables checksumming, so if anyone can shed some
light as to why this is, feel free to point out that my requirements are far
more complex than I think.
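
Requirement 1 can at least be approximated in userspace today; a minimal
sketch (manifest path and format invented) that records SHA-256 hashes and
later flags files whose content changed even though the mtime did not:

    import hashlib
    import json
    import os
    import sys

    def sha256(path):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    def scan(root):
        # Hash every regular file under root.
        out = {}
        for dirpath, _, files in os.walk(root):
            for name in files:
                p = os.path.join(dirpath, name)
                rel = os.path.relpath(p, root)
                out[rel] = {"sha256": sha256(p), "mtime": os.stat(p).st_mtime}
        return out

    if __name__ == "__main__":
        root, manifest = sys.argv[1], sys.argv[2]
        current = scan(root)
        if os.path.exists(manifest):
            with open(manifest) as f:
                previous = json.load(f)
            for rel, old in previous.items():
                now = current.get(rel)
                # Same mtime but different content is a crude bit-rot signal.
                if now and now["mtime"] == old["mtime"] and now["sha256"] != old["sha256"]:
                    print("possible bit rot:", rel)
        with open(manifest, "w") as f:
            json.dump(current, f)

Detection only, of course; without redundancy there is nothing to heal from.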

~~~
zzzcpan
Checksumming has become something of a myth on HN already. No, it's not useful
on its own to the user or to the filesystem, but it is useful to the
filesystem if it can transparently heal the data. Whole-disk replication
doesn't address your reliability concerns as much as you think either; it
doesn't work all that well in general and is a burden to maintain. Doing it
properly requires treating the filesystem as a distributed system where disks
can join and leave, where everything is sharded, and where rebalancing,
self-healing, syncing, and resyncing are all automatic and blazing fast. And
so on.

Nowadays most of the work in storage systems is in distributed systems, and
local filesystems are treated as just another unreliable layer.

Hope this gives you some ideas why there are no filesystems like that.

~~~
Arbalest
Not particularly. I'm thinking for the home user who just copies the photos
from their camera to their home computer. Distributed systems would be great,
were it not for the fact that by and large, home connections have abysmal
upload speeds.

