Bcachefs – A New COW Filesystem (kernel.org)
292 points by jlpcsl on May 11, 2023 | 237 comments



I would like to see filesystems benchmarked for robustness.

Specifically, robustness to everything around them not performing as required. For example, imagine an evil SSD which had a 1% chance of rolling a sector back to a previous version, a 1% chance of saying a write failed when it didn't, a 1% chance of writing data to the wrong sector number, a 1% chance of flipping some bits in the written data, and a 1% chance of disconnecting and reconnecting a few seconds later.

Real SSDs have bugs that make them do all of these things.

Given this evil SSD, I want to know how long the filesystem can keep going, serving the user's use case.


  nbdkit memory 10G --filter=error error-rate=1%
... and then nbd-loop-mount that as a block device and create your filesystem on top.
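
Roughly, the whole flow might look like this; the nbd-client invocation and device names are illustrative and may need adjusting for your setup:

  nbdkit memory 10G --filter=error error-rate=1%
  modprobe nbd
  nbd-client localhost /dev/nbd0     # attach the NBD export as a block device
  mkfs.xfs /dev/nbd0                 # any filesystem under test
  mount /dev/nbd0 /mnt/test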

Notes:

We're working on making a ublk interface to nbdkit plugins so the loop mount wouldn't be needed.

There are actually better ways to use the error filter, such as triggering it from a file, see the manual: https://www.libguestfs.org/nbdkit-error-filter.1.html

It's an interesting idea to have an "evil" filter that flips bits at random. I might write that!


Suddenly I'm reminded of the time someone made a Bad Internet simulator (causing packet loss or other problems) and named the program "Comcast".


I love it.


How does this compare with dm-flakey [0] ?

[0]: https://www.kernel.org/doc/html/latest/admin-guide/device-ma...
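
(For context, a minimal dm-flakey setup looks roughly like this; the device path and the up/down intervals are placeholders. It passes I/O through for 5 seconds, then fails it for 5 seconds, repeating.)

  SIZE=$(blockdev --getsz /dev/sdX)
  dmsetup create flaky --table "0 $SIZE flakey /dev/sdX 0 5 5"
  mkfs.ext4 /dev/mapper/flaky        # build the filesystem under test on top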


dm-flakey runs inside the kernel, so it's going to be faster. With nbdkit + NBD loop mounting there's a bit of overhead going through a Unix socket, which is avoided by using ublk + nbdkit plugins, but you still have to go out to userspace for each device access.

The advantage is you can write plugins as regular code in lots of different programming languages, with a stable API and ABI: https://www.libguestfs.org/nbdkit.1.html So it's a lot easier to experiment or write plugins for odd devices.

Also our sparse RAM disk supports huge disks (up to 2^63-1 bytes, which works out to roughly 8 EiB).


> For example, imagine an evil SSD which had a 1% chance of rolling a sector back to a previous version, a 1% chance of saying a write failed when it didn't, a 1% chance of writing data to the wrong sector number, a 1% chance of flipping some bits in the written data, and a 1% chance of disconnecting and reconnecting a few seconds later.

There are stories from the ZFS folks of dealing with these issues and things ran just fine.

While not directly involved with ZFS development (IIRC), Bryan Cantrill was very 'ZFS-adjacent' since he used Solaris-based systems for a lot of his career, and he has several rants about firmware that you can find online.

A video that went viral many years ago, with Cantrill and Brendan Gregg, is "Shouting in the Datacenter":

* https://www.youtube.com/watch?v=tDacjrSCeq4

* https://www.youtube.com/watch?v=lMPozJFC8g0 (making of)


A 1% error rate for corrupting other blocks is prohibitive. A file system would have to do extensive forward error correction in addition to checksumming to have a chance of working with this. It would also have to perform a lot of background scrubbing to stay ahead of the rot. While it is interesting to model, and maybe even relevant as a research problem given the steadily worsening bandwidth-to-capacity ratio of affordable bulk storage, I don't expect there are many users willing to accept the overhead required to come even close to a usable file system on a device as bad as the one you described.


Well, it’s sort of redundant. According to [1], the raw bit error rate of the flash memory inside today’s SSDs is already in the 0.1%-1% range. And so the controllers inside the SSDs already do forward error correction, more efficiently than the host CPU could do it since they have dedicated hardware for it. Adding another layer of error correction at the filesystem level could help with some of the remaining failure modes, but you would still have to worry about RAM bitflips after the data has already been read into RAM and validated.

[1] https://ieeexplore.ieee.org/document/9251942


ZFS will do this. Give it a RAIDz-{1..3} setup and you've got the FEC/parity calculations that happen. Every read has its checksum checked, and if a read finds issues it'll start resilvering them ASAP. You are of course right that performance will eventually get worse and worse, since it has to do much more rewriting and full-on scrubbing if errors keep happening constantly, but it can generally handle things pretty well.
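
A rough sketch of the kind of setup being described (pool and device names are examples):

  zpool create tank raidz2 sda sdb sdc sdd sde sdf   # 6-wide, tolerates 2 failed disks
  zpool status -v tank      # per-device read/write/checksum error counters
  zpool replace tank sdc sdg   # swap out a failing disk; ZFS resilvers onto the new one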


I don't know...

Let's say you have 6 drives in raidz2. If you have a 1% silent failure chance per block, then writing a set of 6 blocks has a 0.002% silent failure rate. And ZFS doesn't immediately verify writes, so it won't try again.

If that's applied to 4KB blocks, then we have a 0.002% failure rate per 16KB of data. It will take about 36 thousand sets of blocks to reach 50% odds of losing data, which is only half a gigabyte. If we look at the larger block ZFS uses internally then it's a handful of gigabytes.

And that's without even adding the feature where writing one block will corrupt other blocks.
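
As a rough sketch of that arithmetic (treating block failures as independent; the three-failure term dominates, so higher-order terms are ignored):

  P(stripe silently lost) ≈ C(6,3) × 0.01^3 × 0.99^3 ≈ 1.9 × 10^-5   (≈ 0.002%)
  stripes for 50% odds of some loss: n ≈ ln(2) / 1.9×10^-5 ≈ 36,000
  36,000 stripes × 16 KB of data each ≈ 0.55 GB, i.e. roughly half a gigabyte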


I've had ZFS pools survive (at different times over the span of years):

- A RAIDz1 (RAID5) pool with a second disk starting to fail while rebuilding from an earlier disk failure (data was fine)

- A water to air CPU cooler leaking, CPU overheated and the water shorted and killed the HBA running a pool (data was fine)

- An SFF-8088 cable half plugged in for months, pool would sometimes hiccup, throw off errors, take a while to list files, but worked fine after plugging it in properly (data was fine after)

Then the usual disk failures which are a non-event with ZFS.


I recovered from a 3-disk RAID 6 failure (which itself was driven by organizational failure...) in Linux's mdadm... ddrescue to the rescue. I guess I got "lucky" that the bad blocks didn't happen in the same place on all drives (one died, the other started returning bad blocks), but the chance of that happening is infinitesimally small in the first place.

So shrug


How do you know you got lucky with corrupted blocks?

mdadm doesn’t checksum data, and just trusts the HDD to either return correct data, or an error. But HDDs return incorrect data all the time, their specs even tell you how much incorrect data they’ll return, and for anything over about 8TB you’re basically guaranteed some silent corruption if you read every byte.
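
For what it's worth, md can at least be asked to compare its copies/parity, but with no checksums it cannot tell which side is correct when they disagree. A rough sketch (the array name is an example):

  echo check > /sys/block/md0/md/sync_action   # read and compare all copies/parity
  cat /sys/block/md0/md/mismatch_cnt           # nonzero means the copies/parity disagreed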


> and for anything over about 8TB you’re basically guaranteed some silent corruption if you read every byte

This is simply not true. I've been running bi-weekly ZFS scrubs on my file server which has grown from 80 TB to 200 TB of data over three years now with zero failed checksums. That is petabytes of reads with zero corruption. The oldest drives are nearing 5 years old (they were in a Btrfs Synology NAS before, again, zero failed checksums).

The wildly pessimistic numbers on the spec sheets are probably just there so they can deny warranty replacements by saying that some errors are in-spec.
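
For reference, the scrub routine described above boils down to something like this (the pool name is an example):

  zpool scrub tank     # read every allocated block and verify it against its checksum
  zpool status tank    # shows scrub progress plus any CKSUM error counts found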


Those specs aren't true, or at least the distribution is very far from uniform and they're failing to explain it even a fraction of the way. You don't get corruption that often.


Yes, I also had a "1 disk failed, second disk failed during rebuild" event (like the parent, not your story) with mdadm & RAID 6 with no issues.

People seem to love ZFS but I had no issues running mdadm. I'm now running a ZFS pool and so far it's been more work (and things to learn), requires a lot more RAM, and the benefits are... escaping me.


Are you sure the data survived? ZFS is sure and proves it with checksums over metadata and data. I don't know mdadm well enough to know if it does this too.


It doesn't


My son has a ZFS pool where he just realised one of the drives has half a billion errors on it and the data is still fine. New drive has been ordered!


Perhaps you ought to run to a Best Buy and buy external drives and do backups ASAP.

RAID is not backup, always remember that!


It's actually all replaceable data and he's planning to recreate the pool anyway. Just insane that ZFS handles it though.


This is why I always opt for ZFS.


Please, where can I read more about this? I remember bricking an OCZ drive by setting an ATA password, as was fashionable to do back then, but 1% of writes going to the wrong sector - what are these drives, fake SD cards from AliExpress?

Like, which manufacturer goes, like, "tests show we cannot write more than 400 kB without corrupting the drive, let us ship this!"?


> Like, which manufacturer goes, like, "tests show we cannot write more than 400 kB without corrupting the drive, let us ship this!"?

According to https://www.sqlite.org/howtocorrupt.html there are such drives:

4.2. Fake capacity USB sticks

There are many fraudulent USB sticks in circulation that report to have a high capacity (ex: 8GB) but are really only capable of storing a much smaller amount (ex: 1GB). Attempts to write on these devices will often result in unrelated files being overwritten. Any use of a fraudulent flash memory device can easily lead to database corruption, therefore. Internet searches such as "fake capacity usb" will turn up lots of disturbing information about this problem.

Bit flips and wrong sectors being overwritten when unrelated files are written are also mentioned. You might think this sort of thing is limited to cheap USB flash drives, but I've been told about NVMe SSDs violating their guarantees and causing very strange corruption patterns too. Unfortunately, when the cause is a bug in the storage device's own algorithms, the rare corruption event is not necessarily limited to a few bytes or just 1 sector here or there, nor to just the sectors being written.

I don't know how prevalent any of these things really are. The sqlite.org page says "most" consumer HDDs lie about committing data to the platter before reporting they've done so, but when I worked with ext3 barriers back in the mid 2000s, the HDDs I tested had timing consistent with flushing the write cache correctly, and turning off barriers did in fact lead to observable filesystem corruption on power loss, which was prevented by turning barriers on. The barriers were so important that they made the difference between embedded devices that could be reliably power cycled and those which didn't reliably recover on boot.


> Fake capacity USB sticks

Those drives have a sudden flip from 0% corruption to 95-100% corruption when you hit their limits. I wouldn't count that as the same thing. And you can't reasonably expect anything to work on those.

> The sqlite.org page says "most" consumer HDDs lie about committing data to the platter before reporting they've done so, but when I worked with ext3 barriers back in the mid 2000s, the HDDs I tested had timing consistent with flushing the write cache correctly

Losing a burst of writes every once in a while also manifests extremely differently from steady 1% loss and needs to be handled in a very different way. And if it's at power loss it might be as simple as rolling back the last checkpoint during mount if verification fails.


Writes going to the wrong sector are usually wear levelling algorithms gone wrong. Specifically, it normally means the information about which logical sector maps to which physical sector was updated not in sync with the actual writing of the data. This is a common performance 'trick' - by delaying and aggregating these bookkeeping writes, and taking them off the critical path, you avoid writing so much data and the user sees lower latency.

However, if something like a power failure or firmware crash happens, and the bookkeeping writes never happen, then the end result that the user sees after a reboot is their data written to the wrong sector.


With this level of adversarial problems, you'd better formulate your whole IO stack as a two-player minimax game.


On the whole, I'm not sure that a FS can work around such a byzantine drive, at least not if it's the only such drive in the system. I'd rather FSes not try to pave over these: these disks are faulty, and we need to demand, with our wallets, better quality hardware.

> 1% chance of disconnecting and reconnecting a few seconds later

I actually have such an SSD. It's unusable when it's in that state. The FS doesn't corrupt the data, but it's hard for the OS to make forward progress, and obviously a lot of writes fail at the application level. (It's a shitty USB implementation on the disk: it disconnects if it's on a USB-3-capable port and too much transfer occurs. It's USB-2, though; connecting it to a USB-2-only port makes it work just fine.)


At the point where it disappears for seconds it's a distributed system not an attached disk. At this point you have to start applying the CAP theorem.

At least in Unix the assumption is that disks are always attached (and reliable...) so write errors don't typically bubble up to the application layer. This is why losing an NFS mount typically just hangs the system until it recovers.


> At least in Unix the assumption is that disks are always attached (and reliable...)

I want to say a physical disk being yanked from the system (essentially what was happening, as far as the OS could tell) does cause I/O errors in Linux? I could be wrong though, this isn't exactly something I try to exercise.

As for it being a distributed system … I suppose? But that's what an FS's log is for: when the drive reconnects, there will either be a pending WAL entry, or not. If there is, the write can be persisted, otherwise, it is lost. But consistency should still happen.

Now, an app might not be ready for that, but that's an app bug.

But it can always happen that the power goes out, which in my situation is equivalent to a disk yank. There's also small children, the SO tripping over a cable, etc.

But these are different failure modes from some of what the above post listed, such as disks undoing acknowledged writes, or lying about having persisted the write. (Some of the examples are byzantine, some are not.)


The OS notices when disks or NFS servers ("NFS server not responding") go missing, but there's no way to generally broadcast this information to every process. And then you would have to build in error handling everywhere. What does `cat` do when it's informed that the file it's reading is temporarily unavailable? You can either block the next read() call and hang the process until the disk comes back, or return a hard error that it's gone forever. Otherwise each application has to have its own wait/timeout code. (Which is what applications dealing with HTTP have to deal with but at least when you make an HTTP connection you're aware that something could go wrong.)


NFS is a bit of a pathological case.

> but there's no way to generally broadcast this information to every process.

Sure there is: pending I/O returns EIO.

> What does `cat` do when it's informed that the file it's reading is temporarily unavailable?

Let's test it. I inserted a USB disk, mounted it, started `cat`, and yanked the disk. `cat`'s read(2) call returns EIO, and cat dies with an error.
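
Roughly the test, for anyone who wants to reproduce it (device and paths are examples):

  mount /dev/sdX1 /mnt/usb
  cat /mnt/usb/bigfile > /dev/null &   # start a long read
  # physically pull the drive mid-read: cat's read(2) returns EIO and it
  # exits with "Input/output error"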

I'd expect similar results from most processes: for applications, there is no sane, in-process way to recover from such a situation; either you can rely on the FS's capabilities, if any, or you need to build those into your files (e.g., how SQLite does).

> And then you would have to build in error handling everywhere.

Apps that presume I/O isn't fallible are broken. Disconnection is hardly the only possible error during I/O.

NFS is, again, rather pathological: the FS driver there I think just keeps the mount alive, and pends the writes until a connection is re-established. I'd argue this is terrible behavior, too: the probability with which an SRE has to debug a hung, nigh unkillable process when NFS is involved in system design is 100%.


Ackchually, the journal is not a WAL (a journal is a log of intended and completed actions, while the WAL actually stores the data before committing it to the data store; it's more advanced and can recover all data).

But yeah, I agree that is what the journal is for. However, I don't know if the journal in "modern" filesystems works the same way when a drive reconnects as when the power is cut (the OS still being alive and all in the first case).


I would expect that it must: the OS cannot assume that the newly connected drive wasn't altered by another system in the meantime. (Or that it's the same FS, even.)

(although I grant that, a.) the FS probably has a UUID that idents it, and b.) it's only gone for a few ms — but those aren't things that, reasonably, the OS is going to look at to go "eh, we don't need to recover this newly attached disk".)

(I'm not sure I'm still grokking the differences between a WAL & a journal, to you.)


I have basically the opposite problem. I've been looking for a filesystem that maximizes performance (and minimizes actual disk writes) at the cost of reliability. As long as it loses all my data less than once a week, I can live with it.


Don't you want a RAM disk for this? It'll lose all your data (reliably!) when you reboot.

You could also look at this: https://rwmj.wordpress.com/2020/03/21/new-nbdkit-remote-tmpf... We use it for Koji builds, where we actually don't care about keeping the build tree around (we persist only the built objects and artifacts elsewhere). This plugin is pretty fast for this use case because it ignores FUA requests from the filesystem. Obviously don't use it where you care about your data.


> Don't you want a RAM disk for this? It'll lose all your data (reliably!) when you reboot.

Uhh hello, pricing ?


Depends on how much space you need.


Is there RAM that can interface with an M.2 slot rather than the standard ram slots?


CXL is kind of like that. I don't think there's specifically an M.2 form factor but it's (in hand-waving terms) RAM over PCIe.

https://en.wikipedia.org/wiki/Compute_Express_Link


Have you tried allowing ext4 to ignore all safety? data=writeback, barrier=0, bump up dirty_ratio, tune ^has_journal, maybe disable flushes with https://github.com/stewartsmith/libeatmydata
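
A sketch of how those knobs might be combined (device and mount point are examples; barrier=0 is deprecated or removed on newer kernels):

  mount -o data=writeback,barrier=0,noatime /dev/sdX1 /mnt/scratch
  # or go further and drop the journal entirely (filesystem must be unmounted;
  # data= is then moot since there is no journal left to configure):
  umount /mnt/scratch && tune2fs -O ^has_journal /dev/sdX1
  # libeatmydata wraps a command so its fsync()/fdatasync() calls become no-ops:
  eatmydata some-build-command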


You can also add journal_async_commit,noauto_da_alloc

> maybe disable flushes with https://github.com/stewartsmith/libeatmydata

overlayfs has a volatile mount option that has that effect. So stacking a volatile overlayfs with the upper and lower on the same ext4 could provide that behavior even for applications that can't be intercepted with LD_PRELOAD
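
A sketch of that stacking (paths are examples; the volatile option needs a reasonably recent kernel, roughly 5.10+):

  mkdir -p /data/lower /data/upper /data/work /mnt/scratch   # all on the same ext4
  mount -t overlay overlay \
      -o lowerdir=/data/lower,upperdir=/data/upper,workdir=/data/work,volatile \
      /mnt/scratch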


Thanks, this looks promising


Cue MongoDB memes


Probably just using it for cache of some kind


You should be able to do this with basically any file system by using the mount options `async` (the default) and `noatime`, disabling journalling, and massively increasing vm.dirty_background_ratio, vm.dirty_ratio, and vm.dirty_expire_centisecs.
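
Something like the following, as a sketch (the values are arbitrary examples, not recommendations):

  mount -o async,noatime /dev/sdX1 /mnt/scratch
  sysctl -w vm.dirty_background_ratio=50      # start background writeback much later
  sysctl -w vm.dirty_ratio=90                 # allow most of RAM to hold dirty pages
  sysctl -w vm.dirty_expire_centisecs=360000  # keep dirty pages ~1 hour before flushing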


Tongue-in-cheek solution: use a ramdisk[1] for dm-writecache in writeback mode[2]?

[1]: https://www.kernel.org/doc/Documentation/blockdev/ramdisk.tx...

[2]: https://blog.delouw.ch/2020/01/29/using-lvm-cache-for-storag...


It all depends on how much reliability you are willing to give up for performance.

Because I have the best storage performance you'll ever find anywhere, 100% money-back guaranteed: write to /dev/null. It comes with the downside of 0% reliability.

You can write to a disk without a file-system, sequentially, until space ends. Quite fast actually, and reliable, until you reach the end, then reliability drops dramatically.


Trouble is you can't use /dev/null as a filesystem, even for testing.

On a related note, though, I've considered the idea of creating a "minimally POSIX-compliant" filesystem that randomly reorders and delays I/O operations whenever standards permit it to do so, along with any other odd behavior I can find that remains within the letter of published standards (unusual path limitations, support for exactly two hard links per file, sparse files that require holes to be aligned on 4,099-byte boundaries in spite of the filesystem's reported 509-byte block size, etc., all properly reported by applicable APIs).


Yeah, I've had good experience with bypassing the fs layer in the past; especially on an HDD the gains can be insane. But it won't help, as I still need a more-or-less POSIXy read/write API.

P.S. I'm fairly certain that /dev/null would lose my data a bit more often than once a week.


Not sure if this is feasible, but have you considered dumping binary data onto the raw disk, like is done with tapes?


You could even use partitions as files. You could only have 128 files, but maybe that's enough for OP?


How would you cope with losing all your data once a week?


"once a week" was maybe a too extreme example. For my case specifically: lost data can be recomputed. Basically a bunch of compiler outputs, indexes and analysis results on the input files, typically an order of magnitude larger than the original files themselves.

Any files that are important would go to a separate, more reliable filesystem (or uploaded elsewhere).


On top of other suggestions I've seen you get already, raid0 might be worth looking at. That has some good speed vs reliability tradeoffs (in the direction you want).


Some video production workflows are run on 4xraid0 just for the speed - it fails rarely enough and intermediate output is just re-created.


Can confirm. When I can't work off my internal MacBook storage, my working drive is a RAID0 NVME array over Thunderbolt. Jobs set up in Carbon Copy Cloner make incremental hourly backups to a NAS on site as well as a locally-attached RAID6 HDD array.

If the stripe dies, worst case is I lose up to one hour of work, plus let's say another hour copying assets back to a rebuilt stripe.

There are so many MASSIVE files created in intermediate stages of audiovisual [post] production.


The file system is tmpfs. Just don’t reboot :)


Kent uses md-faulty to test such scenarios; you can also see the test CI here: https://evilpiepirate.org/~testdashboard/ci


If you have an evil SSD which screws up 5% (1 in 20) of your writes, well... You don't need a better FS. You need a new disk right now.

This is not the kind of frequency you need to protect against. You just need to detect it and replace the drive ASAP.


ISTM one could design a filesystem as a Byzantine-fault-tolerant distributed system that happens to have many nodes (disks) partially sharing hardware (CPU, memory, etc). The result would not look that much like RAID, but would look quite a bit like Ceph and its relatives.

Bonus points for making the result efficiently support multiple nodes, each with multiple disks.


[flagged]


Isn't Linux the most deployed OS in industry (by practitioners)? Are the Linux hyperscalers hiding their FS secret-sauce, or perhaps the "aesthetic ideal" filesystems available to Linux good enough?


I imagine the hyperscalers are all handling integrity at a higher level where individual filesystems on single hosts are irrelevant to the outcome. In such applications, any old filesystem will do.

For people who do not have application-level integrity, the systems that offer robustness in the face of imperfect storage devices are sold by companies like NetApp, which a lot of people would sneer at but they've done the math.


I've seen NetApp's boot messages; it's FreeBSD under the hood.


They have their own filesystem with all manner of integrity protection. https://en.wikipedia.org/wiki/Write_Anywhere_File_Layout


As is EMC Isilon.


If you're interested in more detailed information about bcachefs, I highly recommend checking out bcachefs: Principles of Operation.[0]

Also, the original developer of bcachefs (as well as bcache), Kent Overstreet posts status updates from time to time on his Patreon page.[1]

[0] https://bcachefs.org/bcachefs-principles-of-operation.pdf

[1] https://www.patreon.com/bcachefs


Thanks for the links!

I was wondering if bcachefs is architected with NAND-flash SSD hardware in mind (as recently highlighted on HN in the "Is Sequential IO Dead In The Era Of The NVMe Drive" article [1] [2]), to optimize IO and hardware lifecycle.

Skimming through the "bcachefs: Principles Of Operation" PDF, it appears the answer is no.

[1] https://jack-vanlightly.com/blog/2023/5/9/is-sequential-io-d...

[2] https://news.ycombinator.com/item?id=35878961


It is. There are also plans for ZNS SSD support.


I would listen to what this guy has to say. It looks like he knows something.


Interesting that internally it’s much closer to a database than file systems generally are.


What I really miss when compared to ZFS is the ability to create datasets. I really like using ZFS datasets for LXC containers. That way I can have a separate sub-btree for each container with its own size limit, without having to create partitions or LVs, format the filesystem, and then resize everything when I need to grow the partition, or even defragment the fs before shrinking it. With ZFS I can easily give and take disk capacity to and from my containers without having to do any multi-step operation that requires close attention to prevent accidental data loss.

Basically I just state what size I want that subtree to be and it happens, without having to touch the underlying block devices. Also I can change it anytime during runtime, extremely easily. E.g.:

  zfs set quota=42G tank/vps/my_vps
  zfs set quota=32G tank/vps/my_vps
  zfs set quota=23G tank/vps/my_other_vps

btrfs can kinda do this as well, but the commands are not as straightforward as in ZFS.
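
For comparison, the rough btrfs equivalent (paths are examples) is a subvolume plus a qgroup limit:

  btrfs subvolume create /tank/vps/my_vps
  btrfs quota enable /tank                    # turn on qgroup accounting for the fs
  btrfs qgroup limit 42G /tank/vps/my_vps     # cap that subvolume at 42G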

Update: My bad, bcachefs seems to have subvolumes now. There is also some quota support, but so far the documentation is a bit lacking, so I'm not yet sure how to use it or whether it can be configured per dataset.


Oh! This is very exciting. Bcachefs could be the next gen filesystem that Linux needs[1].

Advantages over other filesystems:

* ext4 or xfs — these two don't checksum your data, only the filesystem metadata

* zfs — zfs is technically great, but binary distribution of the zfs code is tricky, because the CDDL is GPL incompatible

* btrfs — btrfs still doesn't have reliable RAID5

[1] It's been in development for a number of years. It now being proposed for inclusion in the mainline kernel is a major milestone.


> * zfs — zfs is technically great, but binary distribution of the zfs code is tricky, because the CDDL is GPL incompatible

Building your own ZFS module is easy enough, for example on Arch with zfs-dkms.

But there's also the issue of compatibility. Sometimes kernel updates will break ZFS. Even minor ones, 6.2.13 IIRC broke it, whereas 6.2.12 was fine.

Right now, 6.3 seems to introduce major compatibility problems.

---

edit: looking through the openzfs issues, I was likely thinking of 6.2.8 breaking it, where 6.2.7 was fine. Point stands, though. https://github.com/openzfs/zfs/issues/14658

Regarding 6.3 support, it apparently is merged in the master branch, but no release as of yet. https://github.com/openzfs/zfs/issues/14622


It might help someone: NixOS can be configured to always use the latest kernel version that is compatible with ZFS; I believe it's config.boot.zfs.package.latestCompatibleLinuxPackages.


What is the legal situation of doing that? If I had a company I wouldn't want to get in trouble with any litigious companies.


The ArchZFS project distributes binary kernel images with ZFS integrated. I don't know what the legal situation is for that.

In my case, the Arch package is more of a "recipe maker". It fetches the Linux headers and the zfs source code and compiles this for local use. As far as they are concerned, there is no distribution of the resulting artifact. IANAL, but I think if there's an issue with that, then OpenZFS is basically never usable under Linux.

Other companies distributed kernels with zfs support directly, such as Ubuntu. I don't recall there being news of them being sued over this, but maybe they managed to work something out.


archzfs does not distribute any kernel images, they only provide pre-built modules for the officially supported kernels.


IANAL.

Oracle is very litigious. However, OpenZFS has been releasing code for more than a decade. Ubuntu shipped integrated ZFS/Linux in 2016. It's certain that Oracle knows all about it and has decided that being vague is more in their interests than actually settling the matter.

On my list of potential legal worries, this is not a priority for me.


I would add to this "IANAL But" list

https://aws.amazon.com/fsx/openzfs/

So -- AWS / Amazon are certainly big enough to have reviewed the licenses and have some understanding of potential legal risks of this.


Unless you're distributing, I don't see how anybody could do anything. Personal (or company wide) use has always allowed the mixing of basically any licenses.

The worst case scenarios would be something like Ubuntu being unable to provide compiled modules, but dkms would still be fine. Or the very unlikely ZFS on Linux getting sued, but that would involve a lengthy trial that would allow you to move away from Open ZFS.


The danger is specifically to the copyright holders of Linux - the authors who have code in the kernel. If they do not defend their copyright, then it is not strong and can be broken in certain scenarios.

"Linux copyright holders in the GPL Compliance Project for Linux Developers believe that distribution of ZFS binaries is a GPL violation and infringes Linux's copyright."

Linux bundling ZFS code would bring this text against the GPL: "You may not offer or impose any terms on any Covered Software in Source Code form that alters or restricts the applicable version of [the CDDL]."

Ubuntu distributes ZFS as an out-of-tree module, which taints the kernel immediately at installation. Hopefully, this is enough to prevent a great legal challenge.

https://sfconservancy.org/blog/2016/feb/25/zfs-and-linux/


> If they do not defend their copyright, then it is not strong and can be broken in certain scenarios.

This is not true. Copyrights (like patents) do not have to be actively defended to remain enforceable.

This is partially true of trademarks, however.


Yes, distribution has legal risks. Use does not, it only has the risk that they are unable to get ZFS distributed.


Worth noting Oracle could clear it up very easily; the fact that they won't is a worrying tale.


I don't think anything Oracle can do would change the usability of OpenZFS.

I'm also not positive how easily they could fix the distribution problems given OpenZFS has over a decade of work after the split, but as a whole I'm only discussing use, not distribution.


Oracle could do what they did with DTrace in 2017, and release it under the GPL in addition to the CDDL. Then the OpenZFS folks would have to track down every contributor to OpenZFS, and get their permission to re-license their contribution. Contributions from folks who refused or couldn't be found would have to be re-implemented or dropped.

It is a pain in the ass, but it has been done before.


You're right that DKMS is fairly easy (at least until you enable secure boot).

> Even minor ones, 6.2.13 IIRC broke it, whereas 6.2.12 was fine.

Interesting!

It's just a shame the license has hindered adoption. Ubuntu were shipping binary ZFS modules at one point, but they have walked back from that.


> You're right that DKMS is fairly easy (at least until you enable secure boot).

Still easy. Under Arch, the kernel image isn't signed, so if you enable secure boot you need to fiddle with signing on your own. At that point, you can just sign the kernel once the module is built. Works fine for me.


> Ubuntu were shipping binary ZFS modules at one point, but they have walked back from that.

This is incorrect? Ubuntu is still shipping binary modules.


Right, but various things point to ZFS being de facto deprecated: https://www.omgubuntu.co.uk/2023/01/ubuntu-zfs-support-statu...


> various things point to ZFS being de facto deprecated

I'm not sure that's the case? Your link points to the ZFS on root install being deprecated on the desktop. I'm not sure what inference you/we can draw from that considering ZFS is a major component to LXD, and Ubuntu and Linux's sweet spot is as a server OS.

> Ubuntu were shipping binary ZFS modules at one point, but they have walked back from that.

Not to be persnickety, but this was your claim, and Ubuntu is still shipping ZFS binary modules on all its current releases.


Yeah, my wording was clumsy, but thanks for assuming good faith. I essentially meant their enthusiasm had waned.

It's good you can give reasons ZFS is still important to Ubuntu on the server, although as a desktop user I'm sad nobody wants to ship ZFS for the desktop.


> zfs is technically great

It's only great due to the lack of competitors in the checksummed-CoW-raid category. It lacks a bunch of things: Defrag, Reflinks, On-Demand Dedup, Rebalance (online raid geometry change, device removal, device shrink). It also wastes RAM due to page cache + ARC.


> It's only great due to the lack of competitors in the checksummed-CoW-raid category.

blinks eyes, shakes head

"It's only great because it's the only thing that's figured out how to do a hard thing really well" may be peak FOSS entitlement syndrome.

Meanwhile, btrfs has rapidly gone nowhere, and, if you read the comments to this PR, bcachefs would love to get to simply nowhere/btrfs status, but is still years away.

ZFS fulfills the core requirement of a filesystem, which is to store your data, such that when you read it back you can be assured it was the data you stored. It's amazing we continue to countenance systems that don't do this, simply because not fulfilling this core requirement was once considered acceptable.


> Meanwhile, btrfs has rapidly gone nowhere […]

A reminder that it came out in 2009:

* https://en.wikipedia.org/wiki/Btrfs

(ext4 was declared stable in 2008.)


Yes! File systems are hard. My prediction is that it will be *at least* 10 years before this newfangled FS gains both feature- and stability parity with BTRFS and ZFS.

Also, BTRFS (albeit a modified version) has been used successfully in at least one commercial NAS (Synology), for many years. I don't see how that counts as "gone nowhere".


> Also, BTRFS (albeit a modified version) has been used successfully in at least one commercial NAS (Synology), for many years. I don't see how that counts as "gone nowhere".

Excuse me for sounding glib. My point was btrfs isn't considered a serious competitor to ZFS in many of the spaces ZFS operates. Moreover, its inability to do RAID5/6 after years of effort is just weird now.


Years of effort is a stretch; nobody serious (read: who's willing to pay for it) has been working on raid5/6 pretty much since its inception (since nobody serious needs raid 5/6 at all). Western Digital promised to fix it a couple of years ago, but there doesn't seem to be much progress since then.


raid1c2, raid1c3 and raid1c4 will get you close to RAID5/6 on btrfs (in terms of redundancy), albeit with a tad less disk space, but still more than normal raid 1.

> ZFS in many of the spaces ZFS operates

not a lot.


>> My point was btrfs isn't considered a serious competitor to ZFS in many of the spaces ZFS operates.

> raid1c2, raid1c3 and raid1c4 will get you close

> not a lot.

I guess I just don't understand this take. btrfs doesn't do what ZFS does, and still isn't as reliable as ZFS is. When it is, maybe I'll take another look. But this is really the problem with btrfs stans -- they've been saying it's ready, when it's not, for years.

Fix the small stuff. Make it reliable. Quit making promises about how it's as good as ZFS, when it's clear it doesn't do all the things ZFS does, just some of the things most of the time.


Have all the footguns described in 2021 been fixed?

* https://arstechnica.com/gadgets/2021/09/examining-btrfs-linu...


Not sure about "all", but apart from that article being more pissy than strictly necessary, RAID1 can now, in fact survive losing ore than one disk. That is, provided you use RAID1C3 or C4 (which keeps 3 or 4 copies, rather than the default 2). Also, not really sure how RAID1 not surviving >1 disk failure is a slight against btrfs, I think most filesystems would have issues there...

As for the rest of the article — the tone rubs me the wrong way, and somehow considering a FS shit because you couldn't be bothered to use the correct commands (the scrub vs balance ranty bit) doesn't instill confidence in me that the article is written in good faith.

I believe the writer's biggest hangup/footgunnage with btrfs is still there: it's not zfs. Ymmv.


The author is a Canonical fanboy, a company that put all their effort into ZFS, which now seems to have been in vain.


They put no real effort into ZFS, their own userspace tooling was only half-baked and then thrown aside. Continuing to build and ship the kernel module doesn't cost them much, the hard work of ZFS development is done by others. Quite interesting how you blame others for being fanboys while being a fanboy yourself.


> The author is a Canonical fanboy, a company whom put all their effort into ZFS, which now seems to have been in vain.

ZFS is in pretty heavy use?


I don't see what's entitled about the idea that "it fulfills the core requirements" is enough to get it "good" status but not "great" status. Even if that's really rare among filesystems.


> I don't see what's entitled about the idea that "it fulfills the core requirements" is enough to get it "good" status but not "great" status. Even if that's really rare among filesystems.

You don't see? Well. Uh, I think this review would make more sense coming from someone who wrote a "great" filesystem, or at the very least understood how hard it was to write ZFS. "Big whoop", or "I don't understand what the big deal is" is what is entitled about it.


If "good" is an accurate assessment, then "It's only great because of lack of competitors" seems like a fair statement to me, and far from "big whoop". The list of problems they put in the post is real and meaningful, and they didn't say it was bad, they implied something more like big fish in a small pond.

> this review would make more sense coming from someone who wrote a "great" filesystem

That's not a reasonable standard for reviewers.


> That's not a reasonable standard for reviewers.

It is when the attitude is "What's the big deal?" ZFS is two decades on, and, is by many metrics, still the state of the art in the traditional filesystem space. What ZFS does is extremely hard, and the reason we know is because every open source competitor can't touch it, so I'm saying -- have a little respect.

You don't like it? You prioritize reflinks (ZFS just merged block cloning BTW, so hello reflinks: https://github.com/openzfs/zfs/pull/13392) or offline dedup over RAIDZ? Fine. But make sure your favorite filesystem (or your new filesystem) can do what ZFS does, day in and day out, before you throw that shade. If it does half the things, or breaks sometimes, it's still a toy compared to ZFS.


I think you read the comment as a lot harsher than it actually was.


> [ZFS is] only great due to the lack of competitors in the checksummed-CoW-raid category.

You forgot robust native encryption, network transparent dump/restore (ZFS send/receive) - and broad platform support (not so much anymore).

For a while you could have a solid FS with encryption support for your USB hd that could be safely used with Linux, *BSD, Windows, Open/FOSS Solaris and MacOS.


Is it just the implementation of zfs which is owned by oracle now? I wonder how hard it would be to write a compatible clean room reimplementation of zfs in rust or something, from the spec.

Even if it doesn’t implement every feature from the real zfs, it would still be handy for OS compatibility reasons.


I would suppose it would take years of effort, and a lot of testing in search of performance enhancements and elimination of corner cases. Even if the code of the FS itself is created in a provably correct manner (a very tall order even with Rust), real hardware has a lot of quirks which need to be addressed.


I wish the btrfs (and perhaps bcachefs) projects would collaborate with OpenZFS to rewrite equivalent code that they all used.

It might take years, but washing Sun out of OpenZFS is the only thing that will free it.


OpenZFS is already free and open source. Linux kernel developers should just stop punching themselves in face.

One way to solve the ZFS issue, Linus Torvalds could call a meeting of project leadership, and say, "Can we all agree that OpenZFS is not a derived work of Linux? It seems pretty obvious to anyone who understands the meaning of copyright term of art 'derived work' and the origin of ZFS ... Good. We shall add a commit which indicates such to the COPYING file [0], like we have for programs that interface at the syscall boundary to clear up any further confusion."

Can you imagine trying to bring a copyright infringement suit (with no damages!) in such an instance?

The ZFS hair shirt is self-imposed by semi-religious Linux wackadoos.

[0]: See, https://github.com/torvalds/linux/blob/master/LICENSES/excep...


Linus has some words on this matter:

> And honestly, there is no way I can merge any of the ZFS efforts until I get an official letter from Oracle that is signed by their main legal counsel or preferably by Larry Ellison himself that says that yes, it's ok to do so and treat the end result as GPL'd.

> Other people think it can be ok to merge ZFS code into the kernel and that the module interface makes it ok, and that's their decision. But considering Oracle's litigious nature, and the questions over licensing, there's no way I can feel safe in ever doing so.

> And I'm not at all interested in some "ZFS shim layer" thing either that some people seem to think would isolate the two projects. That adds no value to our side, and given Oracle's interface copyright suits (see Java), I don't think it's any real licensing win either.

https://www.realworldtech.com/forum/?threadid=189711&curpost...


> Linus has some words on this matter:

I hate to point this out, but this only demonstrates Linus Torvalds doesn't know much about copyright law. Linus could just as easily say "I was wrong. Sorry! As you all know -- IANAL. It's time we remedied this stupid chapter in our history. After all, I gave similar assurances to the AFS module when it was open sourced under a GPL-incompatible license in 2003."

Linus's other words on the matter[0]:

> But one gray area in particular is something like a driver that was originally written for another operating system (ie clearly not a derived work of Linux in origin). At exactly what point does it become a derived work of the kernel (and thus fall under the GPL)?

> THAT is a gray area, and _that_ is the area where I personally believe that some modules may be considered to not be derived works simply because they weren't designed for Linux and don't depend on any special Linux behaviour.

[0]: https://lkml.org/lkml/2003/12/3/228


> I hate to point this out, but this only demonstrates Linus Torvalds doesn't know much about copyright law.

Maybe.

When I was young I had an honestly awful employment contract waved under my nose that I was expected to sign. It included waivers of "moral rights" - like the company was allowed to give credit for my work to someone else and lie and say I never contributed to a project I worked on. I felt weird about it, so I talked to some senior people I respected.

Some of the advice I got was that the existence of a signed contract only gave the employer cover to sue me if they wanted to. But if a company starts suing ex-employees over things that sound capricious and unfair, even if they win the court case it's an incredibly bad look. Doing so would probably cost them employees and customers. So in a very real sense, particularly awful terms would never be enforced anyway.

This cuts the other way when it comes to Linux, ZFS and Oracle. Imagine Linux includes ZFS in the kernel. Oracle decides that maybe they can claim that linux is thus a derived work of ZFS. Ridiculous, but that might be enough cover to start suing companies who use linux. If it went to court they might eventually lose. So they don't go after Google. They sue smaller companies. They sue Notion. They sue banks. They sue random YC companies right after a raise. And then they graciously offer to settle each time for a mere hundreds of thousands of dollars. Much less than the court case would cost.

It doesn't matter that they're legally in the wrong. Without a court case to demonstrate that they're wrong, they get to play mafia boss and make a killing. This really hurts Linux - which gets a reputation as a business liability. And that's what Linus wants to avoid.

I'm sure Apple has fantastic lawyers. It might surprise you to learn that Apple's lawyers came to the same decision as Linus. Apple was in the process of transitioning macOS to ZFS when Oracle bought Sun (and by extension acquired ZFS). They'd done all the technical work to make that happen - and they were set to announce it at WWDC, launching ZFS as a headlining feature of the next version of macOS. But after Oracle got involved, they pulled the plug on the project and threw out all their work. We can only assume Apple's lawyers considered it too big of a legal liability. Even if they might have won the court case, they didn't want to take the risk. Cheaper in the long run to make their own ZFS-like filesystem (APFS) instead. So that's what they did.


> Oracle decides that maybe they can claim that linux is thus a derived work of ZFS.

Whosiwhatsit? That's a non-sequitur.

I get it -- "Be scared of Oracle." But this is fever dreams from r/linux stuff.


It doesn’t need to make sense for oracle to use the threat of lawsuits to bully small companies. It doesn’t matter if it’s crazy if your pockets aren’t deep enough to survive the legal challenge.

I don’t blame people for deciding zfs isn’t worth the risk.


It's a shame some legalese is holding back this great filesystem in the Linux world. I've used ZFS on FreeBSD for 20 years or so and it's amazing, especially since it's been possible to use it as a root FS since about 10 years ago.

I wish the Sun legacy had been sold to a more ethical company, but at least the original ZFS was fully open source and there's nothing Oracle can do about it.


It's not that simple because the compatibility is from both ends; the CDDL states:

> Any Covered Software that You distribute or otherwise make available in Executable form must also be made available in Source Code form and that Source Code form must be distributed only under the terms of this License. …

> You may not offer or impose any terms on any Covered Software in Source Code form that alters or restricts the applicable version of this License

Now, you can argue up and down and left and right whether this applies on including the CDDL code in a GPL project (with or without exception), but the fact remains that anyone can sue anyone for any reason, and as long as the complaint is not so ludicrous that a judge will throw it out – which this complain isn't – you're going to end up helping out lawyers with their retirement fund.

Linus in general is far more pragmatic about these sort of license issues than, say, the FSF or Stallman. The problem isn't on the Linux end, the problem is "someone may sue our pants off 10 or 20 years down the road". Remember that BSD lawsuit in the 90s? That kind of stuff. Even if Oracle wasn't the Oracle we know and love today it would still be a risk: things can change in the future, companies get taken over.


> It's not that simple because the compatibility is from both ends; the CDDL states:

After reading this and thinking about it, I don't understand any argument a CDDL copyright holder would have against Linux? It frankly doesn't make any sense. You're going to have to explain the "Why?" of this. Start with -- "The facts are ... so the claim is ..."

> Linus in general is far more pragmatic about these sort of license issues than, say, the FSF or Stallman.

I wish they both, Linus and Stallman, would stop living in the 1980s. Neither seems to understand modern copyright jurisprudence, from and including Computer Associates International, Inc. v. Altai, Inc. I'm not sure it's intentional, but they certainly have both misled the public about what the GPLv2 entails, and their misinformation re: ZFS has been particularly egregious.

> Even if Oracle wasn't the Oracle we know and love today it would still be a risk: things can change in the future, companies get taken over.

AFAICT there is nothing special about Oracle as a litigant in this instance. AFAICT any Linux copyright holder would have standing to bring suit.


> I don't understand any argument a CDDL copyright holder would have against Linux?

If you ship ZFS with Linux then those clauses I mentioned may apply. Whether that will hold up in court? I don't know. But do you want to run the risk of trying? I wouldn't, and I certainly can't blame Linux for not wanting to either.

I don't really know much about Computer Associates International v. Altai, and I freely admit my ignorance on the finer points of US copyright law – it's not a topic I find especially interesting.

However, my point is that this doesn't really matter. Even if we assume you're 100% correct in this regard, that still doesn't mean Oracle can't and won't sue Linux. Anyone can sue anyone, and in the US the defendant is expected to carry their own legal costs regardless of the outcome of that suit unless the suit was filed in spectacularly bad faith, which doesn't really apply here.

That is the issue; if ZFS were MIT or GPL then there obviously wouldn't be an issue, and any lawsuit would be completely baseless and any judge would throw it out in an instant. But with the CDDL this is a lot less clear; even if we assume you're 100% correct on copyright, there is still a real dispute here with enough legal ambiguity that a judge will have to make a ruling (i.e. they're unlikely to throw out the suit).

> AFAICT there is nothing special about Oracle as a litigant in this instance.

The Oracle-Google Java lawsuit stands out here. They argued that all the way to the Supreme Court with a novel and (IMO) creative view on copyright.

But like I said: it doesn't really matter. Even companies with good reputations can change, through change in management, change in owner, or just change of mind.


> If you ship ZFS with Linux then those clauses I mentioned may apply. Whether that will hold up in court? I don't know.

I'm sorry, but this is FUD by another name. This is a "Maybe there is something wrong with the CDDL too?" take, not a "This. See, this here is the problem with the CDDL" take.

FOSS people are supposed to be against this stuff.

> That is the issue; if ZFS were MIT or GPL then there obviously wouldn't be an issue, and any lawsuit would be completely baseless and any judge would throw it out in an instant. But with the CDDL this is a lot less clear

Let me get this straight -- the law is complex, but it would be less complex if we had a different license, also we don't like this license and Oracle, so we won't even try with this license, even though it may not be an issue.

All of this is a giant turn off for me. I don't find it hard to believe that Canonical made the determination it did, because they're not beholden to a view that we should be scared of Oracle for indeterminate reasons, all the time.

And since you brought it up -- when Oracle sued Google, the basis of the suit was a claim that Google had violated the GPL because of, yes, another ridiculously broad interpretation of copyright law centered around the GPL. These over-broad interpretations, and attendant scared straight FUD campaigns, are not a service to the Linux/FOSS community, and Linux and the FSF and their minions should stop now. We should pick a side and the side should be in the majority of Oracle v. Google, and a long line of cases that say roughly similar things, not FSF, et. al., craziness.


Even if you were to be able to say that OpenZFS is not a derived work of Linux, all it would allow you to do is to distribute OpenZFS. You would _still_ not be able to distribute OpenZFS + Linux as a combined work.

(I am one of those guys who thinks what Ubuntu is doing is crossing the line. To package two pieces of software whose license forbids you from distributing their combination in a way that "they are not combined but can be combined with a single click" is stretching it too much.)

It would be much simpler for Oracle to simply relicense older versions of ZFS under another license.


> Even if you were to be able to say that OpenZFS is not a derived work of Linux, all it would allow you to do is to distribute OpenZFS. You would _still_ not be able to distribute OpenZFS + Linux as a combined work.

Why? Linus said such modules and distribution were acceptable re: AFS, an instance which is directly on point. See: https://lkml.org/lkml/2003/12/3/228


Where is he saying that you can distribute the combined work? That would not only violate the GPL, it would also violate AFS's license...

The only thing he's saying there is that he's not even 100% sure whether the AFS module is a derived work or not (if it were, it would be a violation _just to distribute the module by itself_!). Go imagine what his opinion would be on someone distributing a kernel already almost pre-linked with ZFS.

Not that it matters, since he's not the license author, nor even the copyright holder these days...


> Where is he saying that you can distribute the combined work?

What's your reasoning as to why one couldn't, if we grant Linus's reasoning re: AFS as it applies to ZFS?

> Not that it matters, since he's not the license author not even the copyright holder these days...

Linux kernel community has seen fit to give its assurances re: other clarifications/exceptions. See the COPYING file.


> What's your reasoning as to why one couldn't

Simply put, because it's a license incompatible with the GPL, as it literally says so on the page of the creators of the GPL, Wikipedia, etc.

> if we grant Linus's reasoning re: AFS as it applies to ZFS?

You are the one claiming that Linus' reasoning implies the AFS code combined with the Linux kernel _can be distributed_. Linus is not actually saying that in the post you quoted. I am asking for where he says that.

> Linux kernel community has seen fit to give its assurances re: other clarifications/exceptions. See the COPYING file.

It doesn't really matter; not one individual can grant "an exception" unless it was already allowed to begin with (the GPL _explicitly_ grants the system library exception). Linus even says so in the very beginning of the post you quoted ("No such exception exists. There's a clarification [...]").


> Simply put, because it's a license incompatible with the GPL, as it literally says so on the page of the creators of the GPL, Wikipedia, etc.

> not one individual can grant "an exception"

"The creators of the GPL"? You mean the FSF? If you don't think Linus and the kernel devs have standing, what standing or authority do the FSF and Wikipedia have? Neither are a licensor of the Linux kernel either.

I'm saying the Linux kernel devs have seen fit to give their view, and they think it carries weight, and seems to govern the actual practice and use, and this would seem to be another analogous instance where they could give their view again.

> You are the one claiming that Linus' reasoning implies the AFS code combined with the Linux kernel _can be distributed_. Linus is not actually saying that in the post you quoted. I am asking for where he says that.

Linus doesn't explicitly say this in that post. I thought that was clear enough from ... reading the linked post. However, I do feel it naturally follows from what he says. Otherwise, this discussion would have an angels on the head of a pin quality (which we're talking about the GPL on the internet, so...).

Perhaps we simply disagree about the underlying copyright jurisprudence? After all, you might say, the GPLv2 states: "But when you distribute the same sections as part of a whole which is a work based on the Program, the distribution of the whole must be on the terms of this License,..." and "a 'work based on the Program' means either the Program or any derivative work under copyright law". Maybe, you say, ZFS is a "derived work" under copyright law, if not a "derived work" given the plain meaning of derived work?

I happen to believe that courts would not be sympathetic to broad claims of what is "derived" software/infringement, like those made by the FSF, and have not been since Computer Associates v. Altai (and a long line of other cases). For example, you might see Sega v. Accolade, where a court of appeals held that Accolade could reverse engineer Sega’s video game system to create games that ran on the system, even though it involved copying Sega’s code. And I happen to believe the FSF has misled the FOSS community as to the state of the underlying copyright law with their (mis)interpretations of their licenses.


> "The creators of the GPL"? You mean the FSF? If you don't think Linus and the kernel devs have standing, what standing or authority do the FSF and Wikipedia have? Neither are a licensor of the Linux kernel either.

They are, at least, actual lawyers.

> I thought that was clear enough from ... reading the linked post. However, I do feel it naturally follows from what he says.

No, it doesn't.

If OpenAFS is a derived work, then it is a violation of the GPL to distribute it at all, i.e. _it cannot exist_ for all practical purposes.

If OpenAFS is not a derived work, then you can distribute it on its own: you can keep working on it, keep distributing it, and individual users may combine it with Linux on their own machines.

Under no circumstances can you distribute a Linux kernel combined with a non-GPL-compatible module and call it a day. Note that this does not depend _at all_ on whether OpenAFS is a derived work or not. The kernel combined with OpenAFS _for sure_ is a derived work of the kernel (do you really need the courts to determine that?), and you can't distribute that without violating the GPL. AND the CDDL.

What Ubuntu does is stretching it (they never distribute binaries, only aggregate sources, which supposedly the user can one-click-combine into a derived work).


> They are, at least, actual lawyers.

Are they?

> Under no circumstances you can distribute a Linux kernel combined with a non-GPL-compatible module and call it a day.

That's exactly the question presented -- whether the ZFS module(s) licensed under the CDDL are GPL-compatible. I am arguing they are.

> From GPLv2, Section 2: "If identifiable sections of that work are not derived from the Program, and can be reasonably considered independent and separate works in themselves, then this License, and its terms, do not apply to those sections when you distribute them as separate works. But when you distribute the same sections as part of a whole which is a work based on the Program, the distribution of the whole must be on the terms of this License, whose permissions for other licensees extend to the entire whole, and thus to each and every part regardless of who wrote it."

If the module is not a derived work, then my understanding is you can distribute it with "the Program". See also GPLv2, Section 3.

> What Ubuntu does is stretching it (they never distribute binaries, only aggregate sources, which supposedly the user can one-click-combine into a derived work).

My understanding is that this is incorrect. Canonical ships a fully built and linked zfs.ko in a separate package from the main kernel image. See: https://packages.ubuntu.com/kinetic/amd64/linux-modules-5.19...


> Are they?

What's this, ELIZA?

> I am arguing it is.

That's new: none of the arguments you've presented has helped make that case yet.

I don't understand why you keep trying to argue whether the "ZFS module" _by itself_ is a derived work or not. It is irrelevant. You are distributing _the entire work_, which is obviously derived from the Linux kernel since it literally _contains it_ (or an almost verbatim copy of it). The paragraph you yourself quoted literally says the entire distribution must then be on the terms of this license (GPL), which the CDDL _forbids_.

The GPL has an exception for "mere aggregation", but as I said, in my opinion what Ubuntu is doing crosses the line. It can hardly be claimed to be aggregation when the module they ship is designed strictly to work with the kernel they ship, and is absolutely useless otherwise. Such an interpretation would basically make the LGPL meaningless.


> What's this, ELIZA?

Do you know if any of these FSF "interpretations" are written by attorneys? I'm curious. Here[0], re: ZFS, Stallman says he solicited advice from others, at least one an attorney, but AFAIK Stallman alone is the named author. And I generally don't trust second-hand legal opinions from attorneys who won't publish under their own names and don't work for me.

> I don't understand why you keep trying to argue whether the "ZFS module" _by itself_ is a derived work or not. You are distributing _the entire work_, which is obviously derived from the Linux kernel since it literally _contains it_ (or an almost verbatim copy of it).

This is obviously a point of disagreement. You might see [1], written by an actual attorney and expert in these issues: "With the ambiguous definitions in the GPL, and rather paltry protections provided to software under the Copyright Act, kernel module code likely falls outside of the definition of 'derivative works.'"

> The paragraph you yourself quoted literally says the entire distribution must then be on the terms of this license (GPL), which the CDDL _forbids_.

Again [1], re: the paragraph I quoted and Section 0, it notes: "...[T]he GPL fails to make the distinction between a work containing the Program and a work based on the Program, or collective and derivative works as Congress defined them under the act. Combining these terms into a single all-encompassing definition is illogical, especially given the GPL’s reference in the same sentence to copyright law and the importance of those legal terms of art under the act."

Which I have to agree with! The GPL says "The "Program", below, refers to any such program or work, and a 1) "work based on the Program" means either the Program or any derivative work under copyright law:" and then says 2) "that is to say, a work containing the Program or a portion of it, either verbatim or with modifications and/or translated into another language." The correct interpretation is not to conflate two distinct copyright terms and say the GPL applies to both; it's to read out the clearly illogical, contradictory language.

[0]: https://www.fsf.org/licensing/zfs-and-linux

[1]: https://www.networkworld.com/article/2301697/encouraging-clo...


You are quoting someone who tries to argue about whether the module by itself is a derivative work, but I have literally said I find that irrelevant. You literally removed the part where I say "it is irrelevant" when quoting me...

You miss the point. I'm not claiming that the GPL applies to ZFS code, most definitely not when it is obviously not derived from any GPL software. Whether the ZFS Linux module is derivative or not, I don't care. I am claiming that the GPL applies when you _distribute Linux itself_. What possible point of contention could you have? Linux is obviously GPL...

If you are trying to distribute Linux you have to do it under the terms of the GPL, whatever your opinion of those terms, in the same way that when you want to _use_ Windows you have to use it under the terms of the Windows EULA, whatever your opinion of Windows is. Unlike the Windows EULA, the GPL doesn't put any limitations on use, so at your own home you can do whatever you want (including combining it with ZFS, if the CDDL were to allow you), but you can't distribute the combined work (because the GPL forbids you to do so)!


> Unlike the Windows EULA, the GPL doesn't put any limitations on use, so at your own home you can do whatever you want (including combining it with ZFS, if the CDDL were to allow you), but you can't distribute the combined work (because the GPL forbids you to do so)!

This is what I'm arguing is wrong. The GPL explicitly allows distributing a collective/combined work, in contrast to a derivative work. See, again, GPLv2, Section 2.


Which part exactly do you claim allows you to do that? The only one which even remotely allows it is the "mere aggregation" clause, and mere aggregation I already addressed two days ago.

The only part you mentioned so far:

> From GPLv2, Section 2: "If identifiable sections of that work are not derived from the Program, and can be reasonably considered independent and separate works in themselves, then this License, and its terms, do not apply to those sections when you distribute them as separate works. But when you distribute the same sections as part of a whole which is a work based on the Program, the distribution of the whole must be on the terms of this License, whose permissions for other licensees extend to the entire whole, and thus to each and every part regardless of who wrote it."

is literally saying that the whole must be licensed under the GPL (note: even when there are sections that may be considered independent and/or separate and/or derivative and/or whatever); otherwise you just can't distribute it.

Just think about it. The LGPL would make no sense if this wasn't the case.


> is literally saying that the whole must be licensed under the GPL (note: even when there are sections that may be considered independent and/or separate and/or derivative and/or whatever); otherwise you just can't distribute it.

You seem to think it's very clear what the "whole work" is and that it must include ZFS.

See my comment of 2 days ago.[0] Section 2 states "...a whole which is a work based on the Program..." which is, as I've already mentioned, defined at Section 0. I'll say again -- at Section 0, it says "a work based on the Program" is either 1) the licensed program (Linux only), or 2) "any derivative work under copyright law", which ZFS is not, so the combination is such a "mere aggregation."

> Just think about it. The LGPL would make no sense if this wasn't the case.

Whether the LGPL makes sense or not is not really a concern of mine, but the LGPL likewise only applies to "derivative works".

[0]: https://news.ycombinator.com/item?id=35914449


Not exactly ZFS in Rust, but more like a replacement for ZFS in Rust: https://github.com/redox-os/tfs

Work stalled, though. Not comparable, but I was working on an overlayfs for FreeBSD in Rust, and it was not pleasant at all. Can't imagine making an entire "real" filesystem in Rust.


> wonder how hard it would be to write a compatible clean room reimplementation of zfs in rust or something, from the spec

As with every non-trivial application: almost impossible.


If I understand things correctly, the only thing Oracle owns that could impact Open ZFS is the patents the CDDL permits the usage of. Would a clean room implementation even matter?


Reflinks and copy_file_range() are just landing in OpenZFS now I think? (Block cloning)


Block cloning support has indeed recently landed in git and already allows for reflinks under FreeBSD. Still has to be wired up for Linux though.
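
For anyone unfamiliar with reflinks: from userspace a reflink copy is just a copy that shares extents instead of duplicating them. A small sketch, with made-up filenames:

    # duplicates only metadata; data blocks are shared until either copy is modified
    cp --reflink=always big.img big-clone.img
    # applications can get the same effect via the FICLONE ioctl or copy_file_range()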


Really excited about this.

Once support hits in Linux, a little app of mine[0] will support block cloning for its "roll forward" operation, where all previous snapshots are preserved, but a particular snapshot is rolled forward to the live dataset. Right now, data is simply diff copied in chunks. When this support hits, there will be no need to copy any data. Blocks written to the live dataset can just be references to the underlying snapshot blocks, and no extra space will need to be used.

[0]: https://github.com/kimono-koans/httm


What does it mean to roll forward? I read the linked GitHub page and I don't get what is happening.

> Roll forward to a previous ZFS snapshot, instead of rolling back (this avoids destroying interstitial snapshots):

     sudo httm --roll-forward=rpool/scratch@snap_2023-04-01-15:26:06_httmSnapFileMount
    [sudo] password for kimono:
    httm took a pre-execution snapshot named: rpool/scratch@snap_pre_2023-04-01-15:27:38_httmSnapRollForward
    ...
    httm roll forward completed successfully.
    httm took a post-execution snapshot named: rpool/scratch@snap_post_2023-04-01-15:28:40_:snap_2023-04-01-15:26:06_httmSnapFileMount:_httmSnapRollForward


From the help and man page[0]:

    --roll-forward="snap_name"

    traditionally 'zfs rollback' is a destructive operation, whereas httm roll-forward is non-destructive.  httm will copy only files and their attributes that have changed since a specified snapshot, from that snapshot, to its live dataset.  httm will also take two precautionary snapshots, one before and one after the copy.  Should the roll forward fail for any reason, httm will roll back to the pre-execution state.  Note: This is a ZFS only option which requires super user privileges.

I might also add that 'zfs rollback' is a destructive operation because it destroys the snapshots between the current live version of the filesystem and the rollback snapshot target (the 'interstitial' snapshots). Imagine you have ransomware installed and you need to roll back, but you also want to view the ransomware's operations through snapshots for forensic purposes. With roll-forward, you can do that.

It's also faster than a checksummed rsync, because it makes its determination based on the underlying ZFS checksums, and more accurate than a non-checksummed rsync.

This is a relatively minor feature re: httm. I recommend installing and playing around with it a bit.

[0]: https://github.com/kimono-koans/httm/blob/master/httm.1


What I don't understand is: aren't zfs snapshots writable, like in btrfs?

If I wanted to roll back the live filesystem to a previous snapshot, why couldn't I just start writing into the snapshot instead? (Or create another snapshot that is a clone of the old one, and write into it.)


> What I don't understand is: aren't zfs snapshots writable, like in btrfs?

ZFS snapshots, following the historic meaning of "snapshot", are read-only. ZFS supports cloning of a read-only snapshot to a writable volume/file system.

* https://openzfs.github.io/openzfs-docs/man/8/zfs-clone.8.htm...
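
A quick sketch with made-up dataset names:

    zfs snapshot tank/home@pre-upgrade             # read-only snapshot
    zfs clone tank/home@pre-upgrade tank/home-rw   # writable clone of that snapshot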

Btrfs is actually the one 'corrupting' the already-accepted nomenclature of snapshots meaning a read-only copy of the data.

I would assume the etymology of the file system concept of a "snapshot" derives from photography, where something is frozen at a particular moment of time:

> In computer systems, a snapshot is the state of a system at a particular point in time. The term was coined as an analogy to that in photography. […] To avoid downtime, high-availability systems may instead perform the backup on a snapshot—a read-only copy of the data set frozen at a point in time—and allow applications to continue writing to their data. Most snapshot implementations are efficient and can create snapshots in O(1).

* https://en.wikipedia.org/wiki/Snapshot_(computer_storage)

* https://en.wikipedia.org/wiki/Snapshot_(photography)


Sure, there's lots of room for improvement. IIRC, rebalancing might be a WIP, finally?

But credit where credit is due: for a long time, ZFS has been the only fit for purpose filesystem, if you care about the integrity of your data.


Afaik true rebalancing isn't in the works. Some limited add-device and remove-vdev features are in progress but AIUI they come with additional overhead and aren't as flexible.

btrfs and bcachefs rebalance leave your pool as if you had created it from scratch with the existing data and the new layout.


Yeah, the world decided that just replicating data somewhere is far preferable if you want resilience, instead of making the separate nodes more resilient.


“Wastes” RAM? That’s a tunable, my friend.


https://github.com/openzfs/zfs/issues/10516

The data goes through two caches instead of just the page cache or just the ARC, as far as I understand it.


Can I totally disable ARC yet?


    zfs set primarycache=none foo/bar
?

Though this will amplify reads as even metadata will need to be fetched from disk, so perhaps "=metadata" may be better.

* https://openzfs.github.io/openzfs-docs/man/7/zfsprops.7.html...


I'm curious what your workflow is that not having any disk caching would have acceptable performance.


A workflow where the person doesn't understand that the RAM isn't wasted and it's just their usage-reporting utility that's wrong. Imagine being mad at the filesystem cache being stored in RAM.


The problem with ARC in ZFS on Linux is the double caching. Linux already has a page cache. It doesn't need ZFS to provide a second page cache. I want to store things in the Linux page cache once, not once in the page cache and once in ZFS's special-sauce cache.

If ARC is so good, it should be the general Linux page cache algorithm.


I may be wrong, but I remember (circa 2011) that any access to ZFS on Linux entirely ignored the page cache unless you used mmap, so by "default" only binaries and libraries get double-cached.

ARC is better than the Linux page cache right now. It's not used by Linux because:

1) Linus's irrational hatred of ZFS (every rant I read shows clearly that he has next to zero knowledge about ZFS)

2) Patents and licensing

Also, it's Linux's problem that it's doing that, not ZFS's - nothing is cached twice on other platforms that run ZFS. Why? See reason #1.


Are you really saying: “the design of userland code Y should be in the kernel”?

ZFS has been run just fine on systems with 1 GB of RAM. RAM issues with ZFS are just FUD.


>If ARC is so good, it should be the general Linux page cache algorithm

Not possible until IBM's patent expires


Well patent 6,996,676 was filed in November 2002, which should mean it's expired now?

I guess there's a few others listed in various places that were filed up to 2006-2008, I'm not sure how important they are.


> btrfs still doesn't have reliable RAID5

Synology offers btrfs + RAID5 without warning the user. I wonder why they’re so confident with it.



Thanks for the link!


Synology doesn't use the btrfs raid - AIUI they layer non-raid btrfs over raid LVM
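
Roughly that style of layering, sketched with hypothetical device names (not Synology's actual tooling):

    mdadm --create /dev/md0 --level=5 --raid-devices=4 /dev/sd[abcd]1
    pvcreate /dev/md0
    vgcreate vg0 /dev/md0
    lvcreate -l 100%FREE -n volume1 vg0
    mkfs.btrfs -d single -m dup /dev/vg0/volume1   # btrfs does no RAID of its own here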


Thanks, I wonder what the advantage of using FS-based RAID is.


Btrfs still highly recommends raid1 for metadata, but for the data itself, raid5 is fine.

I somewhat recall there being a little progress on trying to fix the remaining "write hole" issues in the past year or two. But in general, I think there's very little pressure to do so because so very many people run raid5 for data already & it works great. Getting metadata off raid1 is low priority, a nice-to-have.
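
For reference, that layout is just a choice of mkfs/balance options (device names hypothetical):

    mkfs.btrfs -m raid1 -d raid5 /dev/sd[bcde]
    # or convert an existing filesystem in place
    btrfs balance start -mconvert=raid1 -dconvert=raid5 /mnt/pool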


Still, even with raid1 for metadata and raid5 for data, the kernel still shouts at you about it being EXPERIMENTAL every time you mount such a filesystem. I understand that it's best to err on the side of caution, but that notice does a good job of persisting the idea that btrfs isn't ready for prime-time use.

I use btrfs on most of my Linux systems now (though only one with raid5), except for backup disks and backups volumes: those I intend to keep on ext4 indefinitely.


Raid5 works ok until you scrub. Even scrubbing one device at a time is a barrage of random reads sustained for days at a time

I’ll very happily move back from MD raid 5 when linear scrub for parity raid lands


Great point. I need to test this more & find out.

I have seen a bunch of changes in the past year go by about how scrubbing is prioritized & what capacity it is allocated. On the one hand, unsettling. But also, good to see folks are prodding at this area.


> [1] It's been in development for a number of years. It now being proposed for inclusion in the mainline kernel is a major milestone.

not a measure of quality in the slightest. btrfs had some serious bugs over the years despite being in mainline


True, but bcachefs gives the impression of being better designed and not being rushed upstream. I think it helps that bcachefs evolved from bcache.


The feature that caught my eye is the concept of having different targets.

A fast SSD can be set as the target for foreground writes, but that data will be transparently copied in the background to a "background" target, i.e. a large/slow disk.

If this works, it will be awesome.
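
From the bcachefs docs, a tiered multi-device setup looks roughly like this (option names from memory, so double-check the manual; device names made up):

    bcachefs format \
        --label=ssd.ssd1 /dev/nvme0n1 \
        --label=hdd.hdd1 /dev/sda \
        --foreground_target=ssd \
        --promote_target=ssd \
        --background_target=hdd
    mount -t bcachefs /dev/nvme0n1:/dev/sda /mnt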


You can also have that at block level (which is where bcache itself comes from). Facebook used it years ago and I had it on an SSD+HDD laptop... a decade ago at least? Unless you want the filesystem to know about it, it's ready to go now.
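
A sketch of the classic block-level bcache setup, with hypothetical device names:

    make-bcache -C /dev/nvme0n1 -B /dev/sda            # -C cache device, -B backing device
    echo writeback > /sys/block/bcache0/bcache/cache_mode
    mkfs.ext4 /dev/bcache0                             # any filesystem goes on top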


Look up --write-mostly and --write-behind options in mdadm(8) man page.

I can't recommend such a setup though. It works very poorly for me.
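
For reference, such a setup looks roughly like this (device names hypothetical):

    mdadm --create /dev/md0 --level=1 --raid-devices=2 --bitmap=internal \
        /dev/nvme0n1p1 --write-mostly /dev/sda1
    # --write-behind=N additionally lets writes to the write-mostly device lag (needs the bitmap)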


See the lvmcache(7) manpage, which I think may be what the earlier poster was thinking of. It isn't an asymmetric RAID mode, but a tiered caching scheme where you can, for example, put a faster and smaller enterprise SSD in front of a larger and slower bulk store. So you can have a large bulk volume but the recently/frequently used blocks get the performance of the fast cache volume.

I set it up in the past with an mdadm RAID1 array over SSDs as a caching layer in front of another mdadm array over HDDs. It performed quite well in a developer/compute workstation environment.
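
A minimal sketch along the lines of lvmcache(7), with made-up volume group and device names:

    vgcreate vg0 /dev/md_hdd /dev/md_ssd        # slow and fast PVs in one VG
    lvcreate -n bulk -L 4T vg0 /dev/md_hdd      # big volume on the HDD array
    lvcreate -n fast -L 200G vg0 /dev/md_ssd    # small volume on the SSD array
    lvconvert --type cache --cachevol fast vg0/bulk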



> A fast SSD can be set as the target for foreground writes, but that data will be transparently copied in the background to a "background" target, i.e. a large/slow disk.

This is very similar in concept to (or an evolution of?) ZFS's ZIL:

* https://www.servethehome.com/what-is-the-zfs-zil-slog-and-wh...

* https://www.truenas.com/docs/references/zilandslog/

* https://www.45drives.com/community/articles/zfs-caching/

When this feature was first introduced to ZFS in the Solaris 10 days there was an interesting demo from a person at Sun that I ran across: he was based in a Sun office on the US East Coast where he did stuff, but had access to Sun lab equipment across the US. He mounted iSCSI drives that were based in (IIRC) Colorado as a ZFS pool, and was using them for Postgres stuff: the performance was unsurprisingly not good. He then added a local ZIL to the ZFS pool and got I/O that was not too far off from some local (near-LAN) disks he was using for another pool.
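
Adding a separate log device (SLOG) to hold the ZIL is a one-liner (pool and device names hypothetical):

    zpool add tank log /dev/nvme0n1
    # or mirrored: zpool add tank log mirror /dev/nvme0n1 /dev/nvme1n1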


ZIL is just a fast place to write the data for sync operations. If everything is working then the ZIL is never read from, ZFS uses RAM as that foreground bit.

Async writes on a default configuration don't hit the ZIL, only RAM for a few seconds then disk. Sync writes are RAM to ZIL, confirm write, then RAM to pool.


But ZIL is a cache, and not usable for long-term storage. If I combine a 1TB SSD with a 1TB HDD, I get 1TB of usable space. In bcachefs, that's 2TB of usable space.

Bcache (not bcachefs) is more equivalent to ZIL.


If I recall correctly, Bcachefs grew out of the author's work on bcache, which was explicitly designed for exactly that goal.

I used it extensively back in the day and it was remarkably solid. https://wiki.archlinux.org/title/bcache


Here's a link to the Bcachefs site https://bcachefs.org/

I think it summarizes its features and strengths pretty well, and it has a lot of good technical information.


Does anyone know if there are any good links to current benchmarks between the different filesystems? My google-fu is only finding stuff from 2019.


I can't help reading this name as Bca-chefs

(...I realise it must be B-cache-fs)


Can you elaborate on how you pronounce "Bca"?


I was mentally pronouncing it ‘beaker-chefs’


Maybe we could call it b$fs


Another filesystem I am interested in is GEFS - "good enough fs" (or rather "great experimental file shredder" until it's stable ;-). It's based on B-epsilon trees, a data structure which wasn't around when ZFS was designed. The idea is to build a ZFS-like fs without the size and complexity of ZFS. So far it's Plan 9 only and not production-ready, though there is a chance it could be ported to OpenBSD; a talk was given at NYC*BUG: https://www.nycbug.org/index?action=view&id=10688

Code: http://shithub.us/ori/gefs/HEAD/info.html


huh, this is fun: https://lore.kernel.org/lkml/ZFrBEsjrfseCUzqV@moria.home.lan...

There's a little x86-64 code generator in bcachefs to generate some sort of btree unpacking code.


This is also the point which is most likely to cause problems for this patch series (which is only fixes and utils added to the kernel) and for bcachefs in general.

When you have an entry like "bring back a function which could make developing viruses easier (though not a vulnerability by itself) related to memory management and code execution", the default answer is nope .. nooope .. never. (Which doesn't mean that it won't come back.)

It seems that while it's not necessary to have this, it makes a non-negligible performance difference.


It would be really nice if he posted the difference with/without the optimisation for context. I hope it's going to be included in the explanation post he's planning.


It looks like the code generator is only available for x86 anyway, so it seems niche that way. I am all about baseline being good performance, not the special case.


He mentions he wants to make the same type of optimization for ARM, so ARM+x86 certainly wouldn't be niche.

I wouldn't even call x86 alone niche...


With eyes on portability, any single-arch-specific code is niche


I'll be eagerly waiting for the upcoming optimization writeup mentioned here: https://lore.kernel.org/lkml/ZFyAr%2F9L3neIWpF8@moria.home.l...


Please post it on HN because I won't remember to go looking for it.


It's bad enough that the kernel includes a JIT for eBPF. Adding more of them without hardware constraints and/or formal verification seems like a bad idea to me.


yeah, most of the kernel maintainers in that thread seem to be against it. bcachefs does seem to also have a non-code-generating implementation of this, as it runs on architectures other than x86-64.


I really hope Linux can get a modern FS into common usage (as in the default FS for most distros). After more than a decade, ZFS and btrfs haven't really gone anywhere. Something that's just there as a default, is stable, performs decently (at least at ext4's level) and brings modern features like snapshots. Bcachefs seems to have a decent shot.

What I'd like to see even more, though, would be a switch from the existing POSIX-based filesystem APIs to a transaction-based system. It is way too complicated to do filesystem operations in a way that isn't prone to data corruption should anything go wrong.


Btrfs is the default on a few distros already, like Fedora, SUSE, Garuda, EasyNAS, Rockstor, and some others. It's not the default in Ubuntu or Debian, but I wouldn't say it hasn't gone anywhere either.


It looks like Fedora's adoption of btrfs unearthed another data corruption bug recently: https://bugzilla.redhat.com/show_bug.cgi?id=2169947


Wow, that's funny - almost looks like bcachefs explaining a similar issue here https://lore.kernel.org/lkml/20230509165657.1735798-7-kent.o...


I'm using btrfs on Fedora (the default install) and it's been great over the last year.

The only thing to be aware of is to disable CoW/hashing on database stores or streaming-download folders. Otherwise it'll rehash on each file update, which isn't needed.
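
That's typically done with chattr on the directory, so new files inherit the No_COW flag; a sketch with a made-up path:

    mkdir -p /var/lib/postgresql
    chattr +C /var/lib/postgresql    # only affects files created after the flag is set
    lsattr -d /var/lib/postgresql    # should show the 'C' attribute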


I've been running it on my NAS-slash-homeserver for... 5 or 6 years now, I think. Root on a single SSD, data on a few HDDs in RAID1. It's been great so far. My desktops are all btrfs too, and the integration between OpenSUSE's package manager and btrfs snapshots has been useful more than once.


Why is it hashing files and not blocks? If a block is hashed and written there's no need to touch it again.


It isn't, and the problem is not in "hashing files" (or blocks for that matter). Checksum calculation is very cheap, especially if you use default settings (which prefer simple & fast checksum functions like crc32/xxhash instead of strong cryptographic functions like sha256/blake2 that can be enabled manually).
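
For example (if memory serves on the exact flags; dataset/device names made up):

    mkfs.btrfs --csum xxhash /dev/sdb      # btrfs: checksum chosen at mkfs time
    zfs set checksum=sha256 tank/data      # ZFS: per-dataset property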

The real issue is that modifying large files on a COW filesystem causes it to spread that file's guts all over the disk (i.e. fragmentation), and results in serious performance loss over time.

For some reason that doesn't happen nearly as much on ZFS and it's totally fine for VM and database workloads.


HAMMER2 supports snapshots. I do not have any experience with it though.


Is HAMMER2 supported on Linux? I thought it was Dragonfly only.



For some reason VM and DB workloads are btrfs's Achilles heel but ZFS seems to handle them pretty well (provided that a suitable recordsize is set). How do they perform on bcachefs?


I've never had a problem with these on BTRFS with COW disabled on their directories...


The issue is that this also disables many of the interesting features of BTRFS for those files: no checksumming, no snapshots, and no compression. In comparison, ZFS handles these features just fine for those kinds of files without the enormous performance/fragmentation issues of BTRFS (without nodatacow).


chattr +C does not disable snapshots. It switches the writes from CoW-everything to CoW-only-after-snapshot. Snapshots work just fine, with only the first write (per extent) after a snapshot paying the CoW cost.


I was not aware this was the case, TIL!


Erasure coding at the filesystem level? Finally!

I've not dared try bcachefs out though, I'm quite wary of data loss, even on my laptop. Does anyone have experience to share?


Had (have) a laptop that crashed reproducibly when you touched it wrong. Had a few btrfs corruptions on it and after a while I'd had enough. It's been running bcachefs as the root fs for a few years now and I've had no issues whatsoever with it. Home is still btrfs (for reasons) and I've had no data loss on that either. The only problems I had were fixed by booting and mounting through a rescue system (no fsck necessary); that happened twice in two years or so. Was too lazy to check what the bcachefs hook (AUR package) does wrong.

Edit: Reasons for home being btrfs: I set this up a long fucking time ago and it was more or less meant as a stress test for bcachefs. Since I didn't want data loss on important data (like my home), I left my home as btrfs.


Is there an optimal filesystem, or is it all just trade-offs? And how far have we come since we first created filesystems, Plan 9 or whatever, to now? Has there been any sort of technological leap, like a killer algorithm that really improved things?


An optimal filesystem... for what?

There is no single filesystem which is optimized for everything, so you need to specify things like

cross-platform transportability, network transparency, hardware interfaces, hardware capability, reliability requirements, cost-effectiveness, required features, expected workload, licensing

and what the track record is in the real world.


It's all tradeoffs.


> These are RW btrfs-style snapshots

There's a word for 'RW snapshots': clones. E.g.

* https://docs.netapp.com/us-en/ontap/task_admin_clone_data.ht...

* http://doc.isilon.com/onefs/9.4.0/help/en-us/ifs_t_clone_a_f...

* https://openzfs.github.io/openzfs-docs/man/8/zfs-clone.8.htm...

* http://www.voleg.info/lvm2-clone-logical-volume.html

In every other implementation I've come across the word "snapshot" is about read-only copies. I'm not sure why btrfs (and now bcachefs?) thinks it needs to muddy the nomenclature waters.


Cloning can also mean simple duplication. I think calling it a RW snapshot is clearer because a snapshot generally doesn't mean simple duplication.


> I think calling it a RW snapshot […]

So what do you call a RO snapshot? Or do you now need to write the prefix "RO" and "RW" everywhere when referring to a "snapshot"?

How do CLI commands work? Will you have "btrfs snapshot" and then have to always define whether you want RO or RW on every invocation? This smells like git's bad front-end CLI porcelain all over again (regardless of how nice the back-end plumbing may be).

This is a solved problem with an established nomenclature IMHO: just use the already-existing nouns/CLI-verbs of "snapshot" and "clone".
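
For comparison, this is how the two CLIs express it today (paths and datasets made up):

    btrfs subvolume snapshot /data /snaps/data-rw       # writable by default
    btrfs subvolume snapshot -r /data /snaps/data-ro    # -r for read-only
    zfs snapshot tank/data@now                          # snapshots are always read-only
    zfs clone tank/data@now tank/data-rw                # writable clone of a snapshot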

> […] is clearer because a snapshot generally doesn't mean simple duplication.

A snapshot generally means a static copy of the data; with bcachefs (and ZFS and btrfs) being CoW, that implies new copies are not needed unless/until the source is altered.

If you want deduplication use "dedupe" in your CLI.


> So what do you call a RO snapshot

There should be no read-only snapshot: it's just a writable snapshot where you don't happen to perform a write


> There should be no read-only snapshot: it's just a writable snapshot where you don't happen to perform a write

So when malware comes along and goes after the live copy, happens to find the 'snapshot', and is able to hose that snapshot data as well, the solution is to go to tape?

As opposed to any other file system that implements read-only snapshots, if the live copy is hosed, one can simply clone/revert to the read-only copy. (This is not a hypothetical: I've done this personally.)

(Certainly one should have off-device/site backups, but being able to do a quick revert is great for MTTR.)


"The COW filesystem for Linux that won't eat your data"

LOL, they know what the problem is at least. I will try it out on some old hard disks. The others (esp. looking at you btrfs) are not good at not losing your entire volumes when disks start to go bad.


Is there a high-performance in-kernel FS that acts as a hierarchical cache that I can export over NFS? Presently I use `catfs` over `goofys` and then I export the `catfs` mount.


Not sure I understand your use case, but if you have to use NFS, cachefilesd is very effective for read-heavy workloads: https://access.redhat.com/documentation/en-us/red_hat_enterp...
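
For completeness, on the client side that's just the fsc mount option plus a running cachefilesd (server and path made up):

    mount -t nfs -o fsc server:/export /mnt/export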


Ah, I'm hoping for the other way around:

slow storage ----> cache ----> export over nfs ---> client


Cool to see it approaching upstreaming.

Does it support mountable subvolumes like btrfs?


I parsed this as “BCA chefs” at first.


How does this compare with ZFS?



