
OpenZFS vs. Btrfs and other file systems (2017) - stargrave
https://www.ixsystems.com/blog/open-zfs-vs-btrfs/
======
elfchief
Is it just me or is it absolutely insane that zfs and btrfs are the only
common filesystems out there that do data checksumming? I don't want or need
the extra complexity of either of them in a lot of cases, but I'd sure as hell
like to know if my data is corrupt...

~~~
z3t4
There is, however, no point in having data checksums if there is no way of
recovering, so running ZFS only makes sense if you have multiple disks. And
don't forget to scrub your pool!
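For anyone unfamiliar, kicking off a scrub is a one-liner; a sketch assuming a
hypothetical pool named "tank":

```shell
# Walk every block in the pool and verify checksums, repairing from
# redundancy (mirror/RAID-Z) where possible. "tank" is a hypothetical pool.
zpool scrub tank

# Check progress and any checksum errors found or repaired.
zpool status -v tank
```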

~~~
elfchief
This is like saying "there's no point in ECC if it can't correct every error"
... which simply isn't true. It's still far better to know that corruption
exists -- so you can know something is invalid and potentially take action --
than to have corruption silently hanging around doing corrupted data things.

~~~
z3t4
That is not a fair comparison, as ECC _is_ error-correcting, while
checksumming alone is not. If you have a mirror or RAID-Z, however, the file
system will be able to correct errors.

------
Mister_Snuggles
I've been burned by Btrfs enough times to avoid it.

I had a VM running OpenSuse Tumbleweed - I know, rolling release, don't expect
stability, etc - and it would consistently freeze after being up and unused
for a few days. I rebuilt the VM using XFS instead of Btrfs, and it was rock
solid. This happened within the last four months.

At work, we have a number of machines using Btrfs. At some point they get into
a state where the weekly Btrfs scrub process hangs the machine for hours at a
time. The solution in every case was to move that filesystem onto a non-Btrfs
volume. This is an ongoing issue, but I'm not in the team responsible so I
don't know what they're doing with this. Anecdotally, this seems to be a
problem when there is a lot of writing going on with the volumes. Machines
that use Btrfs root but do most of their work on NFS volumes don't seem to
experience this issue.

I also have a laptop running OpenSuse Tumbleweed with Btrfs. One day it
somehow got itself into a state where it would hang when mounting the
filesystem read/write, but it would be fine if it was mounted read-only. I
can't recall exactly what happened to cause it, but I'm pretty sure I didn't
do anything exceptional with the computer - I think I rebooted it from the GUI
while it was doing a scrub in the background, but that's just a guess. I think
this happened about a year ago.

This is all disappointing because some of the things Btrfs enables are
awesome. In particular, OpenSuse's snapper has saved me countless hours of
work. I haven't played with the send/receive stuff, but it seems like a great
solution for backups.

In the meantime, I'm using ZFS on a variety of FreeBSD boxes with a variety of
workloads (including one running more VMs than its RAM should support) and
it's been rock solid.

At this point it will be a long time before I give Btrfs another look. It has
the potential to be really good, and I want it to be good, but I don't believe
it's there yet.

~~~
cookiecaper
Yes, this is similar to my fairly recent experience with btrfs. While it's
competitive on paper and there are several meta-arguments in its favor (being
in mainline, for example), the cold hard facts just don't stack up. btrfs
fares far worse and requires much more tuning than a generic ZFS installation.

A great example is that if you ever try to run a database or a large virtual
machine on a btrfs mount, you'll quickly become acquainted with chattr +C for
"nocow". Any write-intensive application grinds the filesystem to a halt
until you disable the copy-on-write functionality for that file specifically,
which makes snapshots, etc., ineffective because the FS stops tracking the
changes that would make them possible.

Whereas on ZFS, virtualization and databases (like any applications that use
fixed block sizes) _may_ require some alignment tuning to avoid a
read-modify-write cycle, and ZFS works without fuss from that moment on.
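A rough sketch of both workarounds; paths and dataset names here are
illustrative, not from the original comment:

```shell
# Btrfs: disable copy-on-write for a database directory.
# NOTE: +C only takes effect for files created after the flag is set,
# so set it on the (empty) directory before loading any data.
mkdir -p /var/lib/mysql
chattr +C /var/lib/mysql
lsattr -d /var/lib/mysql   # should show the 'C' attribute

# ZFS: align the dataset's record size with the database page size
# (16K matches InnoDB's default page; "tank/db" is a hypothetical dataset).
zfs create -o recordsize=16K tank/db
```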

I know that Facebook is investing heavily in btrfs right now, but to be frank,
I don't see the point. btrfs doesn't have its youth as an excuse anymore and
the core issues that exist seem like they're pretty fundamental.

It's a gargantuan waste of effort when people could just take the tack that
Ubuntu has taken: decide that all the FUD around the CDDL, kicked up by fears
that the commercial Unices were plotting to take their revenge on Linux,
isn't really relevant anymore, and just start shipping ZoL.

~~~
hiyer
Facebook uses btrfs extensively [1]. I was tempted to try it out again (burnt
my fingers on it a couple of years back) but from the experiences mentioned
here it looks like it's not really production-ready yet.

1\. [https://code.fb.com/open-source/linux/](https://code.fb.com/open-source/linux/)

------
blinkingled
HAMMER2/DragonFly is already looking great with CoW/snapshots, compression,
live dedup, etc., and one of its design goals is to work well on low-memory
systems. Once it is feature-complete with clustering support it'll be a great
choice to have.

Although it looks like it'll be DragonFly-only.

~~~
xte
This.

Also, the author does not mention that the LLNL ZFS Linux port is not as
production-ready as the FreeBSD one.

Personally, I use nilfs2 on top of LVM. It has been in the Linux mainline for
many years; it is a log-structured FS like HAMMER's volumes, so it protects
effectively against accidental overwrites/deletions and can be resized live,
both grown and shrunk, so coupled with LVM it can be a poor man's pooled
storage... But HAMMER on Linux would be a success like OpenSSH!

~~~
blinkingled
I was running DragonFly/HAMMER1 on ESXi with physical disks attached. This is
my NAS Backup with Samba.

Got it migrated over to HAMMER 2 recently and it's pretty sweet.

------
jplayer01
It's a travesty that so few people care about moving to more reliable
filesystems. A decade later and we're still stuck with ext4 and now btrfs,
which I do use nowadays, though I'm acutely aware of the high risk of data
corruption because the underlying code is terrible.

Windows is even worse with NTFS, which is a complete pile of garbage. I've
fixed hundreds of computers with NTFS corruption over my lifetime and it's
mind-boggling that Microsoft is fine with this shit.

~~~
zaarn
The thing about ext4 is that it's sufficiently mediocre at what it does.

It's fairly crash-resistant, has well-worn and battle-tested codepaths, and
generally seems not to lose data outside of bitrot. So why pick btrfs with
its data-loss stories, or ZFS with its complex management?

It's an effort to switch to a new FS as default, especially if either itself
or the integration is not as well tested or completed yet.

So everyone just sticks to the thing whose failure modes they already know.

~~~
Asmod4n
ZFS and "complex management"? In my opinion, nothing in the open-source world
comes close to ZFS and its easy, straightforward management tools.

You don't need a handful of tools just to do one simple task; just hack away
at the zfs and zpool commands and everything gets done. No need for parted,
mdadm, LVM, or support for your needs compiled into the kernel. No matter
which operating system you are currently on, you only need those two tools
and they work the same everywhere.
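For example, a typical from-scratch setup with replication might look
something like this (device, pool, and host names are all illustrative):

```shell
# Pool, redundancy, and filesystem from one tool chain.
zpool create tank mirror /dev/sda /dev/sdb   # replaces parted + mdadm + lvm
zfs create -o compression=lz4 tank/data      # mounted filesystem, no mkfs step
zfs snapshot tank/data@before-upgrade        # instant snapshot

# Replicate the snapshot to another machine ("backup" is a hypothetical host).
zfs send tank/data@before-upgrade | ssh backup zfs receive pool/data
```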

~~~
zaarn
It comes down to when you need to do anything with ZFS beyond simply using it
for data storage.

Expanding a RAID-Z vdev is not possible; instead, the recommendation is to
waste 50% of your storage on a pool of mirrored devices to get the same
protection as a RAID1 of two drives, which isn't much for some people.

For the average home user, such a scaling option is simply not viable.

This is topped with ZFS' poor performance under low RAM conditions. FreeNAS
barely managed 10MB/s when running with 2GB of RAM and an old Pentium 4 Dual
Core. Not everyone can afford to throw hardware at the problem.

On the other hand, Linux performs under the same conditions with 8 times the
speed for simply copying files and mdadm/lvm/unraid allow me to dynamically
grow the pool of harddrive space a single drive at a time.

(plus mdadm/lvm is included in basically every larger distro by now, no need
to install it or compile support into the kernel)
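For the record, the grow-one-disk-at-a-time workflow looks roughly like this
(device and volume-group names are illustrative; ext4 on LVM assumed):

```shell
# Grow a 3-disk md RAID5 to 4 disks with a single new drive.
mdadm --add /dev/md0 /dev/sdd             # add the new disk as a spare
mdadm --grow /dev/md0 --raid-devices=4    # reshape to use it as a member
                                          # (older kernels may need --backup-file)

# Then grow the LVM and filesystem stack sitting on top of the array.
pvresize /dev/md0
lvextend -l +100%FREE /dev/vg0/data
resize2fs /dev/vg0/data                   # for ext4
```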

~~~
boomboomsubban
>Expanding a RAID vdev is not possible, instead the recommendation is to waste
50% of storage for a pool of mirrored devices to get the same protection as a
RAID1 of two drives, which isn't much for some people

Or make a new vdev. It takes a bit of planning, but it's not some massive
hurdle.

>This is topped with ZFS' poor performance under low RAM conditions. FreeNAS
barely managed 10MB/s when running with 2GB of RAM and an old Pentium 4 Dual
Core. Not everyone can afford to throw hardware at the problem.

FreeNAS is enterprise software, the default settings expect you to read and
follow the system requirements. The developers aren't spending their time
supporting it on less powerful systems, but ZFS (and likely FreeNAS) can run
fine if you configure it for that hardware.

~~~
luke0016
>>Expanding a RAID vdev is not possible, instead the recommendation is to
waste 50% of storage for a pool of mirrored devices to get the same protection
as a RAID1 of two drives, which isn't much for some people

>Or make a new vdev. It takes a bit of planning, but it's not some massive
hurdle.

I had a 3 disk, 6TB RAIDZ (3x 3TB drives). I bought a fourth 3TB disk when I
started running low on space, not knowing that this limitation existed. I
ended up using an intermediate, 8TB external drive to copy data off, re-create
the RAIDZ with 4 disks, and then copy the data back.

This sucked. What should I have done instead?

~~~
Mister_Snuggles
Sadly enough, the current answers are to either do what you did, or replace
disks one by one.

RAIDZ expansion is coming though[0]. This is the right answer to your problem,
but an answer at some point in the future doesn't help you with your problem
today.

[0]
[https://www.bsdcan.org/2018/schedule/events/960.en.html](https://www.bsdcan.org/2018/schedule/events/960.en.html)
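For completeness, the replace-disks-one-by-one route mentioned above is
roughly this (pool and device names illustrative):

```shell
# Let the pool grow automatically once every member has been upgraded.
zpool set autoexpand=on tank

# Swap each disk for a larger one, waiting for the resilver to finish
# before touching the next member.
zpool replace tank /dev/sda /dev/sdd
zpool status tank    # repeat replace/status for each remaining disk
```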

~~~
luke0016
Oh, this is going to be great. Thank you for the link!

And, replacing disks one by one would require me to buy 3 disks instead of 1.
I did that the last time I grew my array :-/

------
hawski
I was always hoping for more adoption of NILFS [0] - a log structured FS with
continuous snapshots. In theory it should never lose data, and that's the
reasoning behind it not having an fsck. I've seen a blog post debating
this choice and presenting a few situations when fsck would be necessary, but
I can't find it.

It would be great for /home for many normal users - they could always go back.
That could also be a problem when one wants to make sure that a sensitive file
is really removed. It could be also used at least for some kind of archive.
Possibly also for development.

[0] [https://en.wikipedia.org/wiki/NILFS](https://en.wikipedia.org/wiki/NILFS)

~~~
exikyut
I was curious how "never lose data" works given that HDDs do not provide
infinite capacity: the log is circular. This makes NILFS good for NAND flash.

The continuous per-fwrite snapshotting functionality sounds awesome - I've had
a few (minor but annoying) accidents with rm of late, so I'd definitely
appreciate an rm-proof undelete.
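A sketch of how the continuous checkpoints enable that kind of undelete,
assuming a nilfs2 volume on a hypothetical /dev/sdb1:

```shell
# List the checkpoints nilfs2 created automatically around the accident.
lscp /dev/sdb1

# Promote a checkpoint to a persistent snapshot so the cleaner (GC)
# won't reclaim it; 1234 is a checkpoint number taken from lscp output.
chcp ss /dev/sdb1 1234

# Mount the snapshot read-only and copy the deleted file back out.
mount -t nilfs2 -o ro,cp=1234 /dev/sdb1 /mnt/snap
```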

I was trying to figure out if NILFS had any sort of COW functionality, then
realized, d'oh, of course it effectively does, via the log.

To address the elephant in the room - checksumming - unfortunately NILFS
doesn't (yet?) do this. The WP article links to
[https://www.spinics.net/lists/linux-nilfs/msg01063.html](https://www.spinics.net/lists/linux-nilfs/msg01063.html),
which states that the CRC32Cs that are written to disk are used only for the
remount/recovery stage, and are apparently not suitable for realtime
verification.

------
throwaway12iii
bcachefs is a really interesting FS in the linux space.

Checkout some of the writing about it here:
[https://www.patreon.com/bcachefs](https://www.patreon.com/bcachefs)

~~~
ansible
Well, that is interesting.

I'm curious that they're talking about implementing encryption at the
filesystem layer. This has typically been done elsewhere. And anyway, key
management is an issue. Seems like it would be beyond the scope for a
filesystem. But I had similar thoughts right before I started learning about
the write anywhere layout used by NetApp all those years ago, where mixing
layers had enormous benefits (ZFS too).

The idea of incorporating a flash translation layer (FTL) is interesting, but
there is the hardware support issue. Meaning that yes, you can still buy raw
NAND flash memory, but what are you going to connect it to? NAND flash
controllers which can present a useful interface to the host processor (like
USB) already incorporate a FTL.

Similarly, eMMC memory has a FTL layer baked in, and presents itself as a
block-addressable device.

Raw NAND flash controllers are increasingly rare on modern microcontrollers.

~~~
cyphar
ZFS encryption works in a similar fashion, and ext4 also has encryption (which
is used by default in Android).

Personally I'm not a huge fan of this because unlike the clear benefits of
giving filesystems control over raw devices, full disk encryption has
requirements that can't really be provided partially. You want to ensure no
metadata about the filesystem structure or files will be known without knowing
the encryption key but this is clearly not possible without having block
device layer encryption. ZFS encryption allows for all sorts of useful
operations on encrypted filesystems without having the key. This is cool, but
also obviously provides a lot of information about the filesystem structure.
Also dedup tables aren't encrypted in this setup. ext4 only encrypts filenames
and contents, which is worse in some respects.

~~~
koverstreet
The only metadata stored in the clear in bcachefs is the superblock and the
very first part of the btree node and journal entry headers - just the
checksum, magic number (identifying it as a btree node/journal header), and a
flags field which mainly contains the checksum/encryption type.

So an attacker can tell roughly how much metadata you have, but literally
nothing about the contents.

------
aidenn0
ZoL and Btrfs are the only two filesystems that I've ever had rendered
unmountable without any reported disk error. In both cases, ECC ram was in
use, and mirrors were used.

Btrfs was a known bug, and the kind folks on #btrfs were able to help me
recover (though it was a quite involved process).

For ZFS, the consensus in #zfs and #zfsonlinux was that, given the error
message I was getting, "I hope you have backups."[1]

I see XFS mentioned here, but in the admittedly unusual use case of editing a
file that, when compiled and modprobed, may hang the kernel, I found that vim
was insufficiently fsync-ing to ensure I had valid data (either the old or
new copy would have been acceptable).

I don't know what my point was with this comment except "all filesystems are
buggy, so back up your data."

1: It should be noted that in the case of ZFS, I was on vacation and it was my
laptop that had no critical data, but I absolutely needed it working for other
things, so I only spent a couple hours trying to recover; The btrfs issue
happened at home where I had plenty other systems to use and where I was more
motivated to recover the data.

------
madhadron
Quick note on btrfs: unless you've been dealing with the latest kernels in the
last few months, reserve judgement. Facebook has put a truly ridiculous number
of hours on that filesystem in the last year or so and fixed a concomitant
number of issues.

~~~
walrus01
From what I'm seeing in this thread, it basically comes down to this: people
have been running btrfs in production (reckless, imho) on kernels that are
2.5+ years old or worse, and then they're surprised and disappointed when it
loses data.

For everyone who's posted here with a btrfs data loss story, it would be
helpful to quantify your anecdote with what kernel you were running.

~~~
deno
The top post in this thread is about btrfs on Tumbleweed, which is not an old
kernel. My experiences have been similar. Btrfs is a great stress test for
your backups. I have no idea how Facebook is running it in production…

~~~
rb2k_
The "how" consists mostly of reasonably up-to-date kernels that track
mainline as closely as possible.

As far as why, [https://code.fb.com/open-source/linux/](https://code.fb.com/open-source/linux/) has a nice quote:

"Btrfs has played a role in increasing efficiency and resource utilization in
Facebook’s data centers in a number of different applications. Recently, Btrfs
helped eliminate priority inversions caused by the journaling behavior of the
previous filesystem, when used for I/O control with cgroup2 (described below).
Btrfs is the only filesystem implementation that currently works with resource
isolation, and it’s now deployed on millions of servers, driving significant
efficiency gains."

~~~
deno
Is there any reason cgroup2 I/O control can't be implemented in a non-CoW fs?

~~~
rb2k_
I don't think copy-on-write is related

(I don't know though)

------
lrem
I'm not sure I agree with the idea that having a single open source project to
rule them all is an inherently bad thing. In proprietary software, if Banana
corp says that the next version of their Cavendish OS drops x86 support, you
might find yourself doomed. But if Debian decides to drop x86 tomorrow, you go
"Cool, good luck, I'll just fork." The project itself is never a single point
of failure.

~~~
ken
We need to disambiguate the word "you" here. When "you" = an IBM or Apple or
Microsoft, then forking is a great option, and a great reason to pick a
program with a license that allows it. When "you" = some guy sitting at home
in his PJs on Sunday morning who reads that his favorite filesystem is moving
in a direction he doesn't like, you're basically SOL unless you decide to
devote your life to maintaining that FS now.

Maintaining an entire Debian architecture is probably not a one-person job.

~~~
Wowfunhappy
But, because Debian has such a broad array of contributors, I'm _extremely_
confident they won't drop x86 support unless/until basically everyone else in
the world has already moved on, and quite possibly not even then.

Whereas Apple really could just up and drop x86 any day now, and I'll just be
stuck.

------
linsomniac
I've been super happy with ZFS for over a decade. All the pain I've had with
it has been related to deduplication requiring massive amounts of RAM. When
you don't have the RAM, it tends to be flaky. But I've never run into data
loss with it, despite my best attempts.

My current backup server would need ~50GB of RAM just for the in-RAM
deduplication tables.
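For anyone sizing this: ZFS can estimate the dedup table before you commit to
enabling it. A sketch, with a hypothetical pool name and the commonly cited
per-block memory cost:

```shell
# Simulate deduplication on an existing pool without enabling it;
# prints a DDT histogram and the expected dedup ratio.
zdb -S tank

# Rough RAM estimate: each unique block costs on the order of ~320 bytes
# of core when dedup is live, so unique_blocks x 320 bytes ~= table size.
```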

I've been thinking of trying btrfs for the backup server, but the
deduplication story is not obvious; it all seems to be third-party, which
makes me skeptical. But it's worth trying. I might just try a test DragonFly
setup, because my simple testing of HAMMER seemed very promising, years ago.

I've had pretty bad experiences with btrfs in the distant past. Around a
decade ago I had all my company laptops using btrfs and it worked great, as
long as we didn't get >80% usage or something. It worked well for around a
year. But then the reliability of btrfs dropped, and we had repeated data loss
and switched back to ext.

~~~
mbrumlow
You really can't blame ZFS for needing RAM for dedup. The raw math around
looking up checksums just demands it. There are some other systems that can
achieve fast dedup, but instead of RAM they would require a ton of super-fast
SSDs to store the tables.

There are other types of dedup that don't guarantee 100% deduplication, but
then the dedup ratio would vary depending on the data set -- this may be fine
for VMs and user data. They work by exploiting data locality, predicting
which dedup table will be needed to dedup the next N bytes. But it is not
likely we are going to see this any time soon, because the patents on this
sort of dedup are fairly recent.

~~~
LeoPanthera
> You really can't blame ZFS for needing ram for dedup. The raw math around
> looking up checksum just demands it. There are some other systems that can
> achieve fast dedup, but instead of ram they would require a ton of super
> fast SSDs to store the tables.

This is only true for real-time, "live" dedup, which is an essential feature
for almost nothing.

btrfs' offline dedup (which, despite the name, can be done on a mounted
filesystem) is far more rational. You fill the disk with data, and then dedup
it later, at a schedule that works for you. No extra RAM needed.
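As an example of that third-party route, duperemove is one such tool; a
sketch (paths are illustrative):

```shell
# Out-of-band dedup on a mounted btrfs filesystem: hash file extents,
# then submit duplicate ranges to the kernel's dedupe ioctl.
#   -d            actually deduplicate (default is a dry run)
#   -r            recurse into subdirectories
#   --hashfile    keep hashes on disk instead of in RAM
duperemove -dr --hashfile=/var/tmp/dedup.db /mnt/data
```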

~~~
mbrumlow
> btrfs' offline dedup (which, despite the name, can be done on a mounted
> filesystem) is far more rational.

You still have to process it, and you are still going to use a lot of RAM --
and you're going to want a lot of RAM once you get into larger data sets;
otherwise the process will take days.

------
unhammer
On that note, are there any fancy file systems that have acceptable
Windows/Mac support? Or are we all doomed to using NTFS or exfat for USB
drives that need to work on Linux/Mac/Windows?

~~~
ubercow
Windows has ReFS. From the looks of it, it has some modern features. But I
doubt that’ll ever be readable on a Mac.

[https://en.wikipedia.org/wiki/ReFS](https://en.wikipedia.org/wiki/ReFS)

~~~
riffraff
> The ability to create ReFS volumes was removed in Windows 10's 2017 Fall
> Creators Update for all editions except Enterprise and Pro for
> Workstations,[4] which would seem to indicate Microsoft is no longer
> intending ReFS as a general replacement for NTFS, at least in the near
> future.

------
nimbius
Anecdotal report: I'm running a medium-sized MariaDB database of appointments
and invoice backups on a 4-disk Btrfs filesystem with deduplication.
Snapshots work, it's stable, and it performs well.

I feel like so much of Btrfs, though, was death by committee. It was trying
to do every single thing ZFS did, regardless of whether the specific feature
from ZFS was mediocre to begin with.

Focusing on a release cycle and communicating would have helped it a lot,
instead of vague statements like "it is ready enough" and "some features
aren't ready".

~~~
eeperson
BTRFS does provide a status for all the major features:
[https://btrfs.wiki.kernel.org/index.php/Status](https://btrfs.wiki.kernel.org/index.php/Status)

------
CathyWest
I get it during the Sun years, but why the hell does Oracle continue
developing Btrfs when they can just release ZFS under a GPL-compatible license
and be done with it? It really seems like they are working against their own
interests here.

~~~
coredog64
They’re still selling Solaris licenses to enterprises. If they keep the FUD
churning they can continue to suck what is essentially free money from those
customers.

~~~
CathyWest
Does a GPL-compatible ZFS really detract from their FUD churning Solaris
business, though?

Those customers can't redistribute Solaris any more than they can redistribute
Linux with a CDDL-licensed ZFS. If they really wanted to switch to Linux they
could do so today and be in the exact same legal position license-wise. Not to
mention they could switch to FreeBSD and be able to redistribute, but again, I
doubt this is something very many Solaris users care about. Perhaps the cost
of porting their applications away from Solaris is too high, or their middle
managers are Oracle fanboys; whatever the reason, a GPL-compatible ZFS does
not threaten that business model.

I think a more likely reason why they pour money into the Btrfs death march,
while a turnkey solution sits around in the next office over waiting to be
signed off by the legal department as CDDL+GPL or Apache or whatever, is that
each project is the pet of a different pointy-haired boss and they refuse to
cooperate internally.

------
zmix
Hot title. Boring article.

------
zmix
Is ZFS still incapable of enlarging/extending a filesystem by adding new disks
to it without rebuilding the whole FS infrastructure?

~~~
snuxoll
You’re confusing the expansion of a ZFS pool with the expansion of a RAID-Z
vdev. Throwing new disks into a pool has always been possible, adding them to
an existing RAID-Z vdev isn’t (yet, block pointer rewrite and RAID-Z expansion
are coming).

I run exclusively with mirrored vdev’s and have never run into issues growing
my pool.

~~~
Wowfunhappy
_Is_ block pointer rewrite actually coming? It seems like it has been stalled
for years and years...

~~~
snuxoll
There was just a talk on it this year, it’s coming but between the time it
takes to go stable in OpenZFS and the time it takes to hit downstream
(FreeBSD, ZoL) it might not be for a while yet.

------
paulie_a
Is btrfs still a thing? Given other people's experiences, at least a couple
of distros dropping it as a default, and my personal experience of it
crashing completely, I'm surprised anyone would even consider it as an
option. It simply isn't a reliable file system.

~~~
zlynx
As personal experiences go, btrfs works great for me.

I've been using it on a 6 drive NAS since 2012, and on two laptops with SSDs.

In my experience "crashes" are symptoms of people who didn't read the
documentation and don't understand how to recover.

For example, on the laptops I've often run out of disk space. And btrfs can
run into a situation where it can't clean up partly used blocks without more
disk space. To recover you have to add more space. I usually used the swap
partition, but a USB stick also works. Or a RAM drive, but that's risky.
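The recovery dance described above is roughly this (device and mount point
are illustrative):

```shell
# Temporarily add a second device so the allocator has room to work
# (e.g. a USB stick at /dev/sdx; filesystem mounted at /mnt).
btrfs device add /dev/sdx /mnt

# Compact mostly-empty data chunks to reclaim unallocated space.
btrfs balance start -dusage=5 /mnt

# Remove the temporary device again, migrating its chunks back.
btrfs device delete /dev/sdx /mnt
```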

Another issue I've had to help people with is snapshot deletion. Deleting a
btrfs snapshot appears instant, but none of the used space comes back. That's
because the deletion happens in the background and can take a really long
time. OTOH, ext4 does exactly the same thing with background deletion of very
large files.

~~~
paulie_a
I've lost multiple terabytes with btrfs on multiple different systems. It was
essentially a random occurrence. These were all cases of simply using it as a
regular filesystem. I read the docs and attempted recovery; it didn't work.
And quite frankly, I shouldn't need to take those steps anyway. As for your
example, it seems insane that someone should have to jump through hoops like
that to utilize a hard drive. Btrfs is junk and I am glad it has been removed
as the default on a couple of distros.

~~~
zlynx
"hoops like that"

Yeah that's every filesystem ever. I helped a guy out just a couple of months
ago because his Debian virtual machine had plenty of disk space but he
couldn't install anything.

Because it was out of inodes. What "regular user" aka Nodejs / React developer
knows about inodes? Not that guy.

So I guess ext4 is junk too.

------
metildaa
Is ZFS worth considering at this point? Btrfs seems to be quite featureful and
stable, with a plethora of grocers and retailers all over the world using it.
Wal-Mart, Kroger (and subsidiaries), all of IBM's old point of sale customers
(now Toshiba SuperPOS iirc) all use it on things as small as deli scales all
the way to running their point of sale systems and backends.

~~~
zielmicha
\- fsync on btrfs is extremely slow. The situation has improved in the last 3
years, but it is still much slower than on ZFS. I just did a simple test and
Btrfs is 12x slower than ZFS (on my consumer SSD, for small writes).

\- RAID5/6 mode is only experimental in btrfs, while it is really stable in
ZFS.

\- I don't have concrete data for that, but in my experience, BTRFS has high
latency (>1 second) even for small file operations when under load.

\- While that should not be a problem for production systems, I have some
crappy hardware where BTRFS oopses or corrupts data, while other filesystems
(Ext4, ZFS) work fine.

~~~
jstimpfle
Yes, the 12x-slower figure seems about right. I remember Debian installations
taking >1h while ext4 took <10min. (Installations and package operations are
particularly bad since dpkg does many fsync()s when unpacking.) I think that
was on a spinning drive, though. Anyway, at my old job (where we had mainly
spinning drives) we used apt and dpkg only with eatmydata, a command which
uses LD_PRELOAD hackery to remove fsync() system calls.
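Usage is as simple as prefixing the command; dpkg also has a built-in switch
with the same effect (the package name is illustrative):

```shell
# Run a package operation with fsync() neutered via LD_PRELOAD.
eatmydata apt-get install -y some-package

# dpkg-native equivalent: disable its safe-I/O fsyncs permanently.
echo force-unsafe-io > /etc/dpkg/dpkg.cfg.d/force-unsafe-io
```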

~~~
andoma
I believe the correct solution to this problem would be for the installer to
snapshot the system, install all packages without any fsync()ing at all, then
finally issue one sync() and remove the snapshot -- optionally keeping
snapshots in case the user wants to roll back from a broken upgrade (for
whatever reason). Again, as others have written here, btrfs is great if your
software plays along with it; otherwise it might not be so great.

~~~
rleigh
While this is a good idea in theory (and is possible on e.g. FreeBSD with ZFS
with boot environments), dpkg can't do this easily. The main problem is that
it's possible and supported for the managed files to be placed upon multiple
filesystems. Separate /usr, separate /var, separate /usr/share, whatever
combination you choose. This means that dpkg needs to force file
synchronisation across all mounted filesystems and it can only do this
robustly by issuing fsyncs.

When there's only a single filesystem, and that filesystem is btrfs (or ZFS),
it should however be possible to optimise this away and delegate everything to
the filesystem. But even here, maintainer scripts may issue their own fsyncs
as they update their own databases, kernel images or whatever.

~~~
zbentley
> dpkg needs to force file synchronisation across all mounted filesystems and
> it can only do this robustly by issuing fsyncs

Not if file-change notifications were supported robustly by dpkg and the
kernel (to a lesser extent). Getting to that would, however, require massively
restricting the compatible-kernel-versions set of dpkg, and would also
probably require undoing some of the more . . . misguided pieces of history
with regard to file-change notification systems in Linux.

~~~
rleigh
I don't think this is correct. File-change notifications wouldn't provide any
information which isn't already known. dpkg, after all, is entirely
responsible for unpacking the .deb files and doing the file modifications.
It's fully aware of what was written, in what order, and when.

The problem is that the system state needs checkpointing for every package
state change. It must allow for recovery on failure, termination, abortion or
power loss, amongst other scenarios. And the package database must remain in
sync with the filesystem state.

When every managed file is on one snapshot-able filesystem, this could be
rolled back atomically, and the fsyncs skipped. But as soon as you have a non-
snapshot-able filesystem or multiple filesystems in use, the fsyncs can't be
skipped.

------
InTheArena
The challenge here is that ZFS is dominated by Oracle right now, and if
anyone thinks that Oracle is going to act in the best interest of the
community, rather than killing the golden goose to peel the meat off its
bones, take a look at what Oracle has done with Java recently. Or what they
did with Solaris. Or what they do with Oracle Cloud.

In addition, we know that there are patent landmines out there because NetApp
and Oracle have an (undisclosed) settlement on the issue. Using ZFS is
probably a good way to get a visit from your legendary friendly Oracle sales
agent and auditor. Oracle could fix all of this with a quick re-license or
patent indemnification, but they haven't and almost certainly won't.

My Synology (and all modern Synologies) uses Btrfs, and it seems pretty solid
so far. I would have preferred that ZFS had matured and become the dominant
FS, but there are some really risky things that come with ZFS.

~~~
snuxoll
ZFS is _not_ dominated by Oracle, Open-ZFS is the source of record for
anything other than Solaris and critical features are being added that Oracle
cannot use without opening their branch back up under the CDDL.

The CDDL also includes a patent agreement, Oracle can’t retroactively waive
it.

~~~
InTheArena
Every single one of your statements was also true of Java.

Go look at what the Java community is going through right now.

~~~
simion314
>Go look at what the Java community is going through right now.

You did not understand what happened with the latest Oracle Java news and you
are spreading FUD.

People have a choice what Java implementation to use, they can pay for latest
Oracle implementations, use Open JDK (it is included in Debian and most
distributions), you can get support from RedHat and other big vendors. There
is no risk of getting in trouble by using Open JDK, MS and Google had trouble
because they "forked" Java

------
sirmike_
Opinions are good, bad and ugly. Sometimes all at once.

Thanks for letting us know one nerd's take on openzfs.

The only thing standing in ZFS's way (to first-tier Linux support) is a
company springing up and heading the charge. I'm 75 percent sure this will
happen eventually with ZoL, at some point within a decade.

However, another point of measure is money. Synology is huge and seems to
think Btrfs is good enough for business. They compete with iXsystems directly.
We could go all day and into 2020 with points and counterpoints. The
licensing issue is weak but will eventually be moot with enough money and time
behind the right project. I'm not in any position to judge that last point for
sure; I'm just offering my IT pro/business take on it.

To be fair, BSD-ish-ness has a role to play on the backend and is bulletproof
and battle-tested for the right mission. But the same can be said for Linux
too. The last point I'll make is that today I can run the Linux kernel from
my MSFT Windows 10 machine natively. No one here would have seen that coming
in 2000. Does BSD/ZFS have a similar anecdotal Hero Epic like this, where the
enemy capitulated so thoroughly?

~~~
rleigh
Don't we already have the beginnings of such support with Canonical's support
of ZFS in the Ubuntu Linux kernel? I'm writing this reply on a system running
ZFS as the rootfs.

    
    
        % findmnt /
        TARGET SOURCE             FSTYPE OPTIONS
        /      rpool/ROOT/default zfs    rw,relatime,xattr,noacl
    

Regarding the Linux kernel on Windows 10: if you're referring to the Windows
Subsystem for Linux, there's no Linux kernel there; it's a Windows subsystem
implementing the Linux system calls directly. Unless you're referring to full
Hyper-V virtualisation, where you can run FreeBSD just as you can run Linux.

~~~
sirmike_
I also didn't think the Canonical effort went anywhere? Maybe I'm wrong.
Re: WSL, you are correct, thank you for the catch. I was missing a few key
words.

~~~
rleigh
I'm not sure what you mean by "went anywhere". It's perfectly functional as it
stands.

The main missing piece is proper support in the installer for using it as a
root filesystem. Currently this requires manual setup, which is tedious, but
even this is functional once done.
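For reference, the manual setup looks roughly like this. The device path and dataset names below are illustrative assumptions, not the exact steps any distribution documents:

```shell
# Hypothetical sketch of manual ZFS-on-root preparation; /dev/sda2 and
# the pool/dataset names are placeholders.
zpool create -o ashift=12 -O compression=lz4 \
    -O mountpoint=none -R /mnt rpool /dev/sda2
zfs create -o mountpoint=/ rpool/ROOT/default
zpool set bootfs=rpool/ROOT/default rpool
# ...then install the OS into /mnt and configure the bootloader
# before rebooting into the new root.
```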

------
em-bee
ZFS is not suitable for desktop computers. as for example that one anecdote in
the comments of the article shows, ZFS doesn't work well on a single disk. the
author of the comment is confused by the behavior, and doesn't even consider
that this might be by design.

another point that shows that ZFS is unsuitable for the desktop is its
reliance on ECC RAM: [https://forums.freenas.org/index.php?threads/ecc-vs-non-
ecc-...](https://forums.freenas.org/index.php?threads/ecc-vs-non-ecc-ram-and-
zfs.15449/)

i considered getting ECC RAM for my machine just because of ZFS, but i could
not find any 16GB sticks to fill my board.

that's two strikes against desktop use, and enough to let me prefer btrfs.

EDIT: i am very happy to see so many replies proving me wrong. when i last
researched this topic i was not able to find any counter-arguments to needing
ECC.

greetings, eMBee.

~~~
binaryapparatus
I have been using ZFS for over two years, on any desktop that I control, as
both the root file system for FreeBSD setups and a backup file system in many
configurations. In almost all cases I don't have ECC RAM and I don't need it
really. Your comment is either outdated or wrong; ZFS is THE go-to filesystem
on desktop for me.

Edit: oh and it works beautifully with single disk setups too.

~~~
amelius
> In almost all cases I don't have ECC RAM and I don't need it really.

That's like saying you don't need health insurance because you haven't got
ill.

~~~
binaryapparatus
ECC is nice to have but absolutely not a show stopper if you don't have it.
Plenty of articles and discussions online, example:
[http://jrs-s.net/2015/02/03/will-zfs-and-non-ecc-ram-kill-
yo...](http://jrs-s.net/2015/02/03/will-zfs-and-non-ecc-ram-kill-your-data/)

The claim that you can only use ZFS safely with ECC memory is false.

~~~
agapon
Wait until you get a 1-bit flip in a spacemap record. Then you just won't be
able to import your pool (insta-panic). And good luck with finding and fixing
that flipped bit.
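For what it's worth, the usual last-resort options in that situation look like this. These flags exist in OpenZFS, but none of them is guaranteed to work against corrupted spacemap metadata, `-F` discards recent transactions, and "tank" is a placeholder pool name:

```shell
# Last-resort sketches for a pool that panics on import.
zpool import -o readonly=on tank   # try a read-only import first
zpool import -F tank               # attempt a rewind to an earlier txg
zdb -e -bcsvL tank                 # offline metadata inspection with zdb
```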

