The potential risk to ZFS created by the shift in its userbase (utoronto.ca)
185 points by zdw on Jan 28, 2019 | 111 comments

The nail in the coffin was Delphix switching away from Illumos, for reasons similar to those the author cites in another article. Delphix drives most of the complex feature development directly with staff developers and, by virtue of having Matt Ahrens on staff, drives a lot of the leadership and organization of ZFS, so it makes sense to follow their lead as the repo of truth and lower the barrier to entry (illumos-gate wasn't working out).

Since FreeBSD support is going to be merged directly into the ZoL repo, ZoL devs will see the FreeBSD CI build results, and it will be a lot closer to a unified upstream than in the past. Being a downstream instead of integrated into the source of truth made feature development a laborious two-way sync, and communications were spread over many mailing lists and bug trackers. I think this move will be good for both communities; companies like iXsystems will help get TRIM across the finish line in ZoL.

In terms of shifting userbase, FreeBSD might even win in terms of ZFS revenue or other metrics like shipping systems due to FreeNAS (I would have to spend a few hours researching the ZoL companies' business models, since a lot of the work is subsidized by the US Government via LLNL). I don't think there will be any tectonic shifts in userbase; there are still pros and cons to both Linux and FreeBSD in different areas, and being behind in ZFS won't be a con for either now.

LLNL interest in ZFS is, AFAICS, motivated by using it in their Lustre clusters. And Lustre is pretty solidly Linux-only.

>I don't think there will be any tectonic shifts in userbase

Unrelated to this post, I'm quite interested in knowing how Canonical's move has worked out so far. Does anyone know if they've got customers at any meaningful scale who are using ZoL? Or is Canonical contributing to ZoL? There may not be an immediate shift in userbase, but Canonical can certainly make a more comprehensive bid to future customers with their overall offering that also includes ZoL, compared to a specialty storage provider that supports ZFS on FreeBSD.

I think it has mostly lowered the barrier to entry for Linux users already interested in ZFS, which is a good thing for user experience, but I haven't seen Canonical driving any feature dev and they aren't doing extensive backports or uplifts into their LTS distros to stay current with ZoL AFAIK. If they wanted to monetize it better, they could do a spin for the SOHO market that either helped replace proprietary filers the same way FreeNAS does, or highly integrate it into their private cloud stack.

Ask iXsystems. They run FreeNAS to sell hardware and support.


How real is the risk?

ZFS on Linux is loaded via DKMS. DKMS is basically the compile-from-source alternative to binary blob drivers. Is there a reason to believe that ZFS via DKMS is materially different, legally, to Nvidia drivers being loaded via binary modules?

Indeed it appears that Nvidia proprietary drivers these days go via the DKMS route. Are they also at risk?

The risk is in trying to keep compatible with the Linux Kernel, when not part of the Linux Kernel dev team.

If ZFS was a module within the Linux kernel source tree, Linux kernel devs would keep it in mind when making changes to the Linux kernel's APIs. They wouldn't make potentially breaking changes to that API, or replace APIs with ones that can't be made compatible with ZFS at all, without considering the effect it would have on ZFS. The Kernel devs have to take the whole tree into account, including all the modules.

But since ZoL is not part of the kernel, and is instead a separate independent module that gets loaded into the kernel, the ZoL developers have to play catch-up. Instead of kernel devs working with them to make sure things stay compatible, or giving them a heads up when something needs to be changed for a new version, the ZoL team is on the outside, trying to make sure they don't get boxed out of the kernel API.

It can be a tenuous position to be in. For a more everyday example, look at the Firefox and Chromium browser updates. For Firefox Quantum they completely revamped their extensions system, making all existing extensions worthless. Either the extension devs rewrote them with the new API, or the extension is just gone. Similarly, Chromium is changing its API for how extensions can interact with the DOM and networking, which would make uBlock unusable. If these features were officially part of the browser or official extensions, the browser devs would work with the extension devs to make sure things stayed working; instead the extension developers need to keep up or watch their product die.

If the Linux kernel completely changed its module API, and ZoL was caught flat footed and needed to make substantial rewrites, what would the project's backers do? Would they commit the dev time and resources needed to make those changes, or would they compare the benefits of ZFS vs the competition -- solutions based on btrfs or LVM, both built into the kernel -- and decide to jump ship before committing more time? That's where the danger really lies. If the Linux kernel makes a serious breaking change, ZoL's supporters might double down to fix it, or they might just cut their losses.

And while there is an official way to integrate external filesystems into the Linux kernel, FUSE, it's extremely slow compared to Kernel native modules. There was a userland FUSE implementation of ZFS, and there's a reason it's fallen out of favor. ZoL is significantly faster because of its direct kernel extension, but it's playing with fire by not being part of the kernel tree itself.

>ZoL was caught flat footed

Which would essentially be impossible.

OpenZFS is designed around a shim layer between ZFS's internal Solaris-style API usage and the native OS's APIs. This allows the same ZFS code to run essentially unmodified on many *nixes.

In ZoL, this is called SPL (Solaris Porting Layer), and is one of the two kernel modules required to use ZoL. Linux does not export the proper APIs that ZFS requires to work here, and SPL fills that gap.

ZFS modules do "break" often due to internal API changes, but the fix is usually shipped in ZFS stable before the kernel itself is shipped. Only people who follow ZoL development closely ever see the sausage being made.

It is highly unlikely that Linux can ever do anything that ends up with a situation that the SPL and ZFS kernel modules can't easily #ifdef their way out of. It wouldn't make sense for Linux to, either.
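
As a rough illustration of the shim idea described above, here is a toy model in Python. It is not real SPL code and every name in it is hypothetical. The filesystem code calls one stable interface, and the porting layer picks an implementation based on the host kernel version, which is roughly the role `#ifdef` plays in the C modules.

```python
# Toy model of a porting layer. All names are hypothetical; the real SPL is C
# and selects code paths at build time with #ifdef, not at runtime.

def _alloc_old_api(size):
    # Stand-in for an interface that only older kernels provide.
    return ("old_api", size)

def _alloc_new_api(size):
    # Stand-in for the replacement interface in newer kernels.
    return ("new_api", size)

def make_shim(kernel_version, changed_in=(5, 0)):
    """Return the stable entry point the filesystem code would call."""
    if kernel_version >= changed_in:
        return _alloc_new_api
    return _alloc_old_api

def filesystem_code(alloc):
    # The "ZFS side" only knows the stable interface, never the host API.
    return alloc(4096)
```

The sketch only covers the case where a replacement interface exists. It says nothing about the case raised elsewhere in the thread, where an interface is withdrawn from out-of-tree modules altogether.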

> ZFS modules do "break" often due to internal API changes, but the fix is usually shipped in ZFS stable before the kernel itself is shipped. Only people who follow ZoL development closely ever see the sausage being made.

Eh, it depends on the distribution. I use Fedora, and there have been quite a few times when a kernel update has resulted in ZoL becoming useless for a while until they push an updated version. The solution, of course, is just booting the earlier kernel until ZoL catches up.

In fact, this is happening right now. The 4.20 kernel was pushed to updates on Fedora (at least my system) on 1/23. The latest stable ZoL doesn't work with it. They're on RC3 for the next release, though, and that does. Hopefully it comes out soon.

In the mean time, I'm running the last 4.19 kernel. shrug

> I use Fedora, and there have been quite a few times when a kernel update has resulted in ZoL becoming useless for a while until they push an updated version.

Alternatively, you can use the variation of a distro, if it exists, that provides a more stable kernel and package version set. In this case, that would be RHEL/CentOS.

For a workstation, that can be annoying, but as you said, just use an older kernel for a bit longer. Perhaps mark the kernel package to be skipped in normal package operations, and have a cron job that checks the kernel specifically and emails you when updated versions exist.
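
A minimal sketch of the check such a cron job might run, in Python. It only compares version strings; how the candidate kernel version is obtained from the package manager and how the email is sent are left out. `MAX_SUPPORTED` is a hypothetical value you would have to maintain by hand from the ZoL release notes.

```python
import re

# Hypothetical: newest kernel series the installed ZoL release is known to
# build against. Maintained by hand; not read from any real ZoL file.
MAX_SUPPORTED = (4, 19)

def major_minor(release):
    """Extract (major, minor) from a string like '4.20.3-200.fc29.x86_64'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))

def is_supported(release, max_supported=MAX_SUPPORTED):
    """True if the kernel series is not newer than the supported one."""
    return major_minor(release) <= max_supported

def report(candidate_release):
    """Build the one-line message the cron job would mail out."""
    status = "safe to install" if is_supported(candidate_release) else "hold back"
    return f"kernel {candidate_release}: {status}"
```

On Linux, `platform.release()` from the standard library returns the running kernel's release string, so `report(platform.release())` would describe the kernel currently booted.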

For a server, I imagine it's rarely a problem since those should be running more stable distros anyway, since 99% of the time the kernel is older and back-patched (it may be the current kernel, but still weeks after it was released right at point release updates), which should result in a stable API for ZoL.

The trade-offs there don't seem too onerous to me: a little hand configuration for a workstation (of which there are likely one or two for a person to deal with), as long as there's smooth server support. Server support is generally more important to have solid, both because a server can be harder to fix if there's a problem and because servers can scale from none to many per person.

Sure, it doesn't bother me a whole bunch, I just thought I'd point out that, at least with Fedora, ZoL is occasionally behind tracking the most recent kernel release.

How does RHEL help there? They ship kernel updates. Those kernel updates once broke the HP b120i proprietary driver (HP releases a new version of this driver for every minor RHEL release). I don't see how a fakeraid driver is fundamentally different from ZFS.

Kernel updates are back patched. That means that in-between point release updates (which generally happen every 6-12 months) the kernel version stays the same, and any bug-fixes are ported into the older kernel that is shipped. Point releases may update the kernel version (I think?), but generally keep it the same as well, but they will back port some features into the older kernel as well, not just bug/security fixes. You can see here[1] for RHEL versions and the kernels they ship with.

If a security fix breaks your ZoL integration, my guess is you're actually better off waiting for that to play out and resolve itself than to expect it to work. If a feature back port breaks it, that might be a little more annoying, but I imagine it will be fixed in short order, and you only have to worry about that once every 6-12 months (and it's well publicized).

> Those kernel updates once broke the HP b120i proprietary driver (HP releases a new version of this driver for every minor RHEL release).

If HP is releasing closed source drivers for RHEL, I imagine they would want to be on the certified hardware list and test, or at a minimum seek access to the beta spins of the point releases (which I think is where it broke) so they can test before it comes out. I'm not sure I blame Redhat for HP trying to specifically support RHEL and failing to do so, given the systems I know they have in place to help companies in just that situation (because it helps RHEL users).

In any case, all I'm really noting is that between Fedora, which ships a new kernel version with every kernel update (AFAIK), and RHEL/CentOS, which ship largely the same kernel with only the specific changes needed the majority of the time, keeping ZoL working should be vastly easier on RHEL systems (and in fact, on any OS which does back patching of kernels, which I believe includes SuSE and the LTS releases of Ubuntu).

1: https://en.wikipedia.org/wiki/Red_Hat_Enterprise_Linux#RHEL_...

It's not really the same kernel. They backport a lot of features with every minor release. They still call it 2.6.x or whatever, but it really is different. I know that RHEL has some subset of internal kernel API that they promised to keep stable within major release, so if HP failed to rely on those API, it's their problem, but it might happen.

On this tangent, other out of tree patchsets like OpenVZ have had similar issues where the kernel has massively changed between versions, and forward porting their changes is challenging at best, even with a massive userbase.

That allows ZoL to run on many different *nixes, which means that if Linux made a drastically breaking change, you could use it on another OS, sure.

And the Linux kernel has made numerous breaking changes to their APIs that ZoL has been able to work around. So it has happened, they've just been able to deal with it.

Even if a breaking change that ZoL can't work around is improbable, it is still possible. The Linux kernel could majorly overhaul an API in a major version release, in a way that ZoL can't handle. Given ZoL's status as a separate kernel module not under a GPL license, it's entirely possible that no amount of yelling gets the Linux maintainers to change their mind. In fact, Wowfunhappy notes a discussion along those lines is happening currently, based on function symbols being removed for an API required for Mac hardware support.

And sure, that compatibility layer means that users could switch to FreeBSD, or Solaris, etc, and keep using ZFS. If that happens, does Delphix change their target platform again to move to FreeBSD? Or do they come up with another solution, and stop supporting ZoL?

Except that only recently we have seen access to an API being removed in a way that broke the ZoL build. While this one is at worst a performance regression, it does show that the mainline kernel is very much able to break ZoL with an API change.

The last paragraph is pretty telling too.


Example of this happening with ZFS on Linux, right now: https://lore.kernel.org/lkml/20190111180617.2k5uundov6hf4m7h...

From Greg Kroah-Hartman at https://lore.kernel.org/lkml/20190110182413.GA6932@kroah.com

> My tolerance for ZFS is pretty non-existant. Sun explicitly did not want their code to work on Linux, so why would we do extra work to get their code to work properly?

Ouch. It hurts to see Greg still holding onto that old grudge.

Wariness around commercial Unix vendors may've made some sense in 2005 when ZFS was released, but not only does it not make sense 14 years later, but the company the community viewed with suspicion has since entirely ceased to exist.

I spent a long time trying to finagle btrfs because it was the blessed, in-tree copy-on-write filesystem. It was a nightmare. It doesn't take long for ZFS to prove itself the massively superior solution.

Canonical's adoption of ZFS is a welcome relief, along with SFLC guidance that there is no incompatibility with the CDDL and GPL.

We need to bring the rest of the crew along and stop reinventing the wheel here. Linux will be so much better off once it accepts ZFS.

The biggest wtf here is that only some exports are GPL only. One has to wonder if there aren't any ulterior motives for this cherry picking.

Seriously that's crazy.

It's why I've never been a fan of the GPL. It's restrictive instead of permissive.

I feel like there's an 800lbs gorilla in the room that people are either forgetting or intentionally ignoring. Ubuntu promoted ZFS to being a first class citizen over a year ago. They have plenty of devs involved in the Linux kernel. What makes you think they won't continue to advocate for ZoL as well as proactively work on fixing any integration issues they find when new kernels are being developed?

Ubuntu is my daily driver, but don't expect Ubuntu promoting something to mean they're sticking with it for the long haul. See Wayland[1], Unity[2], and NetworkManager (EDIT apparently still supported with Netplan). I once backported a bug in Ubuntu's version of a Torrent Client (Deluge), and got to inform the software's lead developer that Ubuntu had actually stopped using his software by default in the next release in favor of Transmission.

I've been using Ubuntu since 8.04, ran a LoCo team and even did some package maintenance, and I've watched Ubuntu embrace tech whole heartedly, and then drop it like a rock 2 releases later, over and over again. Ubuntu moves fast and breaks things, and them promoting something should not indicate to you that they'll promote it for the long term. Long Term Support does not mean long term new development, and LTS support does not mean helping to implement support in new major Kernel versions.

1. https://blog.ubuntu.com/2018/01/26/bionic-beaver-18-04-lts-t...

2. https://arstechnica.com/information-technology/2018/05/ubunt...

They haven't given up Wayland though, it's just not ready for primetime quite yet - especially not for an LTS. They stuck with Unity for almost eight years as well, that's pretty long haul. I'm unaware of them dumping NetworkManager?

I'd agree their support of ZFS seems to be in an odd state. Neil McGovern had to deal with that weirdness as DPL [1], which was very annoying as the DKMS was a "good enough" solution.

[1] https://debconf16.debconf.org/talks/9/

Ah, I saw they had moved to Netplan, I hadn't realized NetworkManager supported a Netplan interface.

Canonical has a reputation for not contributing to the Linux kernel, although they do contribute (1% of contributions from kernel 4.8 to 4.13 [1]). They do promote it, but their power within the kernel developer community is minimal.

[1] https://go.pardot.com/l/6342/2017-10-24/3xr3f2/6342/188781/P...

For a non-pdf source of development stats:


> The risk is in trying to keep compatible with the Linux Kernel, when not part of the Linux Kernel dev team.

But... that was already the case.

Yeah, even if there _weren't_ (arguable, debatable) license incompatibilities, the ZFS team doesn't _want_ it to be just a Linux submodule, they want it to be an independent thing that can be used with other OSs too, no? So it's really just part of the challenge of doing _that_, regardless of legal license issues. No?

I don't see how ZoL being the canonical ZFS repository means ZFS is now "just a linux submodule". No, it's a Git repository that easily could (and will, no doubt) support building for multiple target systems.

Any idea why they don't maintain a stable kernel mode counterpart to FUSE? (FKEE?)


> Yeah, HELL NO!

Guess what? You're wrong. YOU ARE MISSING THE #1 KERNEL RULE.

We do not regress, and we do not regress exactly because your are 100% wrong.

And the reason you state for your opinion is in fact exactly WHY you are wrong.

Your "good reasons" are pure and utter garbage.

The whole point of "we do not regress" is so that people can upgrade the kernel and never have to worry about it.

-Linus Torvalds

The Linux kernel has made breaking changes to [1] and deprecated [2] internal APIs before.

Specifically, your example brings up the danger that ZoL is in. The Linux kernel does everything in their power to not break User Land APIs. ZoL isn't User Land, and it replaced the previously extremely slow User Land ZFS Linux implementation. It is a Linux kernel module, and as [2] notes, Linux kernel devs are absolutely allowed to make breaking changes to internal APIs. The cost of the increased speed of a Kernel module is the risk of internal API breaking changes. I highly doubt that Linus would scream anywhere near as much about an internal API change that broke a separately maintained linux module with non-GPL code.

1. https://stackoverflow.com/questions/24897511/what-is-the-rep...

2. https://lwn.net/Articles/769365/

Pretty sure that only applies to userland apps, not kernel modules.

That's about the ABI to user-land, not about internal-to-the-kernel interfaces.

My understanding is that Nvidia's non-open source drivers are a pre-compiled binary blob that uses DKMS to compile a wrapper whose only purpose is to re-export existing kernel functionality in a questionably legal attempt to sidestep the GPL requirement. So that's completely different from any module that's actually compiled from source via DKMS.

There's no questionable legality; an end user can do whatever they want with GPL software. As long as the modules aren't bundled with the kernel or on the same media, the GPL is perfectly happy with the arrangement.

End users aren't the ones accused of violating the GPL. The issue is the non-GPL binary NVidia drivers linking against the GPL kernel without releasing the source code.

The post isn't talking about legal risk?

That's exactly what I'm asking. How real is the risk to Nvidia of using DKMS?

It seems like an exactly parallel situation to me.

(PS: further, I don't think you can separate out the legal risk from attitude / API risk, because of things like this, mentioned elsewhere in this conversation: https://marc.info/?l=linux-kernel&m=154722999728768&w=2 - specifically marking symbols GPL to make them non-available.)

The risk to Nvidia is much lower than ZoL -- Nvidia has the resources to keep up with the kernel. It's maintained in house by the company selling the cards, and they have the resources directly to do development. But just like ZoL, if the Kernel makes more and more breaking changes, Nvidia would eventually make the business decision to cut the proprietary driver loose. However that threshold is likely much higher for a profitable company than a community funded project.

But the real price of being an external module is very clear for Nvidia proprietary driver issues. If you look at distro and kernel bug reports, you'll see piles of reports for Nvidia driver users, which devs won't look into because of the tainted kernel. Some Linux software even blacklists Nvidia driver users from using graphics hardware acceleration for their apps, because it causes so many bugs. Nvidia is operating outside of the open source ecosystem, which means they don't benefit from that ecosystem like open source implementations do. When an Nvidia driver user runs into a bug, they're usually just told to shove off and not use the driver.

It's more compatibility.

I've had the DKMS modules break on what were supposed to be minor API compliant kernel upgrades on a major distro. The code was then not fixed for a month (our fix was to rollback kernel versions and just tag it there, it's since been removed entirely).

Soured me on using ZFS for Linux at all in production.

Speaking of all this, am I the only one where NFS clients with executables on NFS (such as PXE booted machines) get bus errors when the server runs recent 4.19/4.20 Linux kernels with recent ZoL code?

As a reminder, those bus errors are supposed to happen when a client-side mapped file has been unlinked on the server. So the ZFS export does not provide pages to the client on a client page fault the same way NFS would on deletion. Tried dedup and compression on and off, no change. A reboot of the server clears it up for a couple of days, but then it starts the same way. Something seems very broken in caching there.

This popped up during maybe the last 4-5 months.

Interesting article. So is the “unfriendliness” of Linux kernel developers towards ZFS that is mentioned strictly due to licensing issues or are there other points of contention?

The recent Linux 5.0 GPL symbol incident (https://marc.info/?l=linux-kernel&m=154722999728768&w=2) probably didn't help that perception.

In the context of Linus's stance that the kernel must never, ever break userland, I find it strange that he has not slapped down the change of a function to GPL-only callers as incorrect and reverted it.

You have a policy that you can't break users' programs, but borking their whole file-system is fine? Keep feeding end-users fears of open source petty feuds making people miserable. That works.

Because... That's not userspace, that's kernelspace. Linux's policy on kernelspace API breakage is famously, "we do what we like, and if you're not in-kernel, then you get to keep both pieces when it breaks".

If ZFS wanted to live in userspace (and get Linux's userspace API guarantees) then they could have made a FUSE filesystem instead.

ZFS-FUSE existed before ZoL did. The reason they moved to their SPL-based solution is because FUSE didn't want to make the required changes to support high performance file systems.

SPL now handles the guarantee to separate ZFS's userland components from Linux's constant internal kernelspace API churn. ZFS, internally, doesn't really know what a Linux is beyond a few little bits here and there.

> ZFS-FUSE existed before ZoL did. The reason they moved to their SPL-based solution is because FUSE didn't want to make the required changes to support high performance file systems.

These changes were made years later, but the developer of ZFS-FUSE disappeared, so they were never used.

The FUSE v3 API further builds on that, and today, it's definitely possible to build a high-performance filesystem using FUSE. In fact, there are many examples of this in the wild: GlusterFS, Ceph, NTFS-3G, and so on.

Today, there's absolutely _no_ reason ZFS would not sensibly work as high performance filesystem entirely from userspace. In fact, it still remains an open issue and a desirable target for ZoL: https://github.com/zfsonlinux/zfs/issues/8

That sounds like a missed opportunity, who doesn't want higher performance fuse filesystems?

Glad you brought this up. It feels, at least to me, like FUSE is only for...hmm...gimmick or science-fair projects. Maybe something useful on your development machine. Indeed, at a previous employer, several problems were addressed using a handful of FUSE filesystems, quite elegantly. Of course the caveat was "this ain't prod, duh, it's FUSE". But this doesn't NEED to be the case. Tangentially, microkernels pop up here often, and arguably, moving some filesystems to user space makes sense in a related vein.

I'll admit, I'm all talk and if it really mattered, pull requests are probably welcome.

You're confirming what I just said.

End users don't care about splitting hairs. I was using Linux in the mid-90s and that kind of stance is what turned me off. The sound card would break, the networking would break, configuring the modem would be hell after each upgrade.

The attitude that breaking LibreOffice is not okay but breaking the file system is okay 'because it's a driver!' is just untenable.

Making people's computers fail will turn users away. It's not just a Linux thing. When Windows breaks stuff, all hell breaks loose on the Internet. The problem with OSS is that its maintainers have no direct, immediate monetary incentive to make amends. The long-term loss of confidence is hard to measure, but it's real.

Using internal kernel APIs is a pretty long way from being userland. Internal APIs are guaranteed to be broken, and for in-tree modules, the breakage is fixed by whoever breaks it. External modules have to do the fixes themselves, which motivates them to become in-tree.

"v5.0 removes the ability from non-GPL modules to use the FPU or SIMD instructions"

Any reason why?

Well you can read the thread on the mailing list but to put it bluntly - because they wanted to stop ZoL from working. They view non-GPL kernel modules as a violation of the GPL.

"My tolerance for ZFS is pretty non-existant" Greg Kroah-Hartman

"please switch to FreeBSD instead of advocating to violate the copyright and licensing rule on my and others work." Christoph Hellwig

For context, Hellwig is the guy that tried to sue vmware in Germany and had the case dismissed because the evidence submitted summed up to references of stuff on the internet and basically copy and pasted git output. It didn't even include details on which lines of code were allegedly used by vmware, or details on authorship in any of the code comparisons that were present.

Which is a shame, because it would have been nice to see if the shim usage pattern is actually a violation of the GPL - a lot of us would like a clear ruling there. I'm sympathetic to his viewpoint, but that whole ordeal was a bit of a "wtf?" because of how it was handled, and I can't imagine him being sympathetic to anything else in an even somewhat similar situation.

AFAIK, the VMware stuff is beyond shim usage. They straight up replace the kernel and run GPLed drivers inside their kernel.

> They view non-GPL kernel modules as a violation of the GPL.

But then why allow them?

Users demand it. If the kernel devs pushed too hard legally, a lot of users would switch to BSD (probably FreeBSD, but there are other good candidates). Linux is slightly better for desktop users, but if you cannot use your graphics hardware for legal reasons, BSD will still work and is the lesser evil.

So basically NVIDIA gets a monopoly on the ability to release non-GPL linux kernel modules so that Linux can have a better reputation among desktop users (which is only a small fraction of Linux use in the first place)?

I don't get it -- do they care more about what the users want, or about enforcing copyleft? If it's the latter, then they should be happy if those users who don't care about the GPL move to BSD. If it's the former then they should stop playing games and implement a fair policy for all non-GPL module authors.

"They" is several thousand people with different motivations and desires. To expect them all to have a common motive would be a mistake.

People not associated with the leadership of the project shouldn't be making decisions about what licensing models are acceptable or not in the first place.

Because the kernel devs decided that interfacing with the kernel through the API would change the kernel too much for you to be in compliance with the GPL if your license isn't.

It's not non-GPL. It's non-GPL-compatible.

Which is proooobably a lie. It's pretty hard to imagine how such a generic function as enabling and disabling the fpu could mean that you're combining the two works. But there's not much process to call someone on technically inappropriate use of _GPL. It's a mostly-political tool pretending to be a technical tool.
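
For readers unfamiliar with the mechanism, here is a toy model in Python of how GPL-only exports behave. It is a simplification, not kernel code. In the kernel, a symbol exported with `EXPORT_SYMBOL_GPL` can only be resolved by modules whose declared license the kernel treats as GPL-compatible. The symbol names and the short license list below are illustrative assumptions.

```python
# Toy model of GPL-only symbol resolution. The table contents and the license
# set are illustrative; the real check lives in the kernel's module loader.

GPL_COMPATIBLE = {"GPL", "GPL v2", "Dual BSD/GPL", "Dual MIT/GPL"}

# symbol name -> True if exported GPL-only (names are examples only)
EXPORTS = {
    "example_plain_symbol": False,
    "example_gpl_only_symbol": True,
}

def can_resolve(symbol, module_license):
    """Return True if a module with this license may link to the symbol."""
    if symbol not in EXPORTS:
        return False  # not exported at all
    gpl_only = EXPORTS[symbol]
    return (not gpl_only) or module_license in GPL_COMPATIBLE
```

In this model, moving a function behind a GPL-only export turns a successful lookup into a failed one for a module declaring a license such as CDDL, while in-tree users see no difference. That is the effect commenters here are describing.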

BTRFS is being positioned as an alternative to ZFS (amongst others) without the licensing issues (and less rational concerns like the current/past politics of it all), so perhaps there is a "why do we X it when Y is close" with a bit of extra NIH syndrome mixed in?

Oracle discontinued BTRFS development when they bought Sun, merged the ZFS and BTRFS team, and then fired anyone on the BTRFS team that didn't have ZFS experience.

BTRFS is basically on maintenance mode, and no one at Redhat/IBM or Ubuntu has any interest in keeping it alive. Anyone that requires HA services will not be using BTRFS.

Redhat ships with XFS as the default file system, Ubuntu ships with Ext4 as default with heavy install-time ZFS support for enterprise storage, and BTRFS is no longer being spoken of by anyone as a potential Ext5 candidate.

90% of this is wrong. Oracle bought Sun in 2009, and continued to make substantial contributions for years after that, and they still have developers actively contributing to Btrfs. The idea they merged the ZFS and Btrfs teams is absurd, the two file systems are nothing alike either in terms of design, on disk format, or code base.

Btrfs is on maintenance mode? Based on what evidence? There's 1k-2k commits per kernel cycle. There are dozens of developers actively working on it. Facebook is certainly using it in production for high availability in concert with Gluster FS, they're also using Btrfs almost exclusively with Tupperware, their containerization service.

Red Hat has quite a lot of device mapper, LVM, and XFS developers so it makes sense for them to work on storage solutions that continue to build on that. And Red Hat hasn't been significantly contributing to Btrfs for many years now, so their lack of interest isn't new and hasn't affected Btrfs development.

>The idea they merged the ZFS and Btrfs teams is absurd, the two file systems are nothing alike either in terms of design, on disk format, or code base.

You've never seen the same team working on two very separate code bases or products?

I have no idea if the btrfs and zfs teams at Oracle were merged, but I don't know that "they are two separate products" is actually a real argument that they weren't. Product teams working on separate things get merged all the time.

I've seen a tiny handful of developers juggle more than one file system at a time. They're that complicated. I'm familiar with most of the Btrfs developers and can't think of a single one who works on ZFS; the most active developers when asked about it quite a few years ago on the Btrfs development list said they were unfamiliar with ZFS and it wasn't used as a guide for Btrfs.

You can track developers through their git commits, and you'll see that even when they change companies, they almost invariably keep working on the same filesystem they worked on before the move. That is why Red Hat having no one working on Btrfs doesn't mean they want to see it go away. They have a lot of other developers already, who'd lose years becoming familiar with Btrfs (or ZFS for that matter). So you're going to see them build on what they already know, rather than moving laterally to a filesystem technology they're not familiar with.
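
A rough sketch of how one might do that tracking, assuming the author's email domain is a usable proxy for their employer (it often isn't, since many kernel developers commit from personal addresses). The function names and the example addresses are made up for illustration; the only git features used are `git log --format` with the `%an` and `%ae` placeholders.

```python
import subprocess


def domains_by_author(log_lines):
    """Map each author name to the email domains they committed from, oldest first.

    Each line is expected to look like "Author Name<TAB>author@example.com".
    """
    history = {}
    # git log prints the newest commit first, so walk the lines in reverse
    # to record domains in chronological order.
    for line in reversed(list(log_lines)):
        line = line.strip()
        if not line:
            continue
        name, email = line.split("\t", 1)
        domain = email.rsplit("@", 1)[-1].lower()
        seen = history.setdefault(name, [])
        if domain not in seen:
            seen.append(domain)
    return history


def read_git_log(repo, path):
    """Return "name<TAB>email" lines for commits touching `path` in `repo`."""
    out = subprocess.run(
        ["git", "-C", repo, "log", "--format=%an%x09%ae", "--", path],
        check=True, capture_output=True, text=True,
    ).stdout
    return out.splitlines()
```

Against a kernel checkout this would be called as `domains_by_author(read_git_log("/path/to/linux", "fs/btrfs"))`, where the checkout path is whatever you have locally. An author who shows more than one domain while still committing under the same directory is the pattern described above.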

Is it in maintenance mode? Chris Mason (the main btrfs developer) works for Facebook now and Facebook uses btrfs on "millions of servers" (source: https://code.fb.com/open-source/linux/).

Facebook's usage of it is very specific, but as you mentioned, Chris Mason (who used to also work at Oracle on their BTRFS team before Oracle dumped BTRFS) is there in-house to make sure it is smooth.

However, I'm pretty sure if I boiled the commit history down for the kernel, most of Facebook's commits would be Flashcache, and not BTRFS.

What I forgot, though, is SuSE supports it as their enterprise FS of choice, which is maybe more important than Facebook.

Yes. It's important to recognize that there's a big difference in one's willingness to adopt a technology if its creator is on staff. Chris Mason's full-time job is to make sure the thing he built works well for his employer.

If you work some place that doesn't have that kind of luxury, perhaps a little extra conservatism is warranted before you go deploying across "millions of servers".

Isn't btrfs also at risk? I was under the impression that it lost some steam in the recent past and was on its way to become the next Hurd.

btrfs has a problem with its image as being unreliable (some of it well deserved). Red Hat obsoleted it because they use kernel 3.10 in RHEL7 and were doing nothing but backporting, without having any btrfs expertise in house anyway. Btrfs continues to be developed by Oracle, SUSE, Facebook, Fujitsu and the rest of the usual suspects.

RHEL is famous for using kernels that are ancient, even by the notoriously conservative and risk averse debian-stable standards.

I can't even imagine the hassle of backporting stuff from much newer kernels into 3.10 to make btrfs work.

If you want to use btrfs in production (not the best idea, IMHO, but orgs such as facebook do it) you absolutely need to be running a 4.9.x or better kernel.

Calling them ancient is a bit unfair, they regularly backport hardware support and select other features from newer kernels. Unlike other distributions, however, they have a stable kernel ABI for an entire release, so they can’t bring in changes that break white listed symbols.

The reason it hasn't taken off is that it's much younger than ZFS and, more crucially, it's got terrible admin tools and documentation.

It might be a brilliant file system, but I'm never going to use it because it's such an arse to learn how to use properly.

The Oracle Linux SAG is a reasonable place to start for documentation on it:


However, I often end up at the Arch Linux wiki, too:


Not that I've seen. It's there and works. I think a lot of that impression is driven by the fact that... let's be honest: filesystem features are stale, dinosaur stuff. All the cool kids are worrying about cloud scale storage backends.

At the end of the day btrfs and ZFS both are just ways to help single-system administrators do better and more reliable backups. I know that's not how they're positioned, but really that's all they do.

btrfs still has a lot of issues unfortunately. I tried using buildroot once and the entire filesystem kinda died with the infamous "out of space" error even though there was plenty free. Several RAID levels are also unsafe to use. After experiencing both, I trust ZFS a lot more.

I've only used BTRFS a handful of times and I've still had three filesystems die on me. Two became cripplingly slow, probably because of automatic snapshots (even though there was a small retention limit). One was fine until it suddenly got into a state where it can only be mounted read-only or mounting never finishes, despite a lot of parameters thrown at it trying to fix that. The first two were not terribly long ago, and the last one was 2018.

I had the same experience with btrfs. It killed my machine multiple times.

ZFS on my laptop took me an evening to set up, but it's been completely hassle free. ZFS is really easy to use.

Agreed. However, Synology uses it in their NAS products along with md RAID. Also, SUSE ships it in all their products. openSUSE Leap 15, which was released two weeks ago, has a transactional updates feature that uses btrfs snapshots.

Honestly, I've never really understood the appeal of trying to shoehorn RAID into the FS itself. A more traditional softraid has always felt "good enough" to me (especially in this day and age of affordable SSDs).

ZFS in particular seems to push more toward RAIDZ, whereas I'm a bit averse to striping in general nowadays for data recovery/reliability reasons (RAID1 + JBOD is dead-simple to recover, more flexible with heterogeneous disk sizes, and much easier to reason about IMO).
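
To put rough numbers on the heterogeneous-disk point, here is a back-of-the-envelope sketch. It assumes a RAIDZ vdev treats every member as if it were the size of its smallest disk, that each mirror pair yields the size of its smaller disk, and it ignores metadata, padding and reserved space entirely. The function names are made up for illustration.

```python
def raidz_usable(sizes, parity=1):
    """Approximate usable capacity of one RAIDZ vdev built from `sizes`."""
    if len(sizes) <= parity:
        raise ValueError("need more disks than parity devices")
    # Every member contributes only as much as the smallest disk.
    return (len(sizes) - parity) * min(sizes)


def mirror_pairs_usable(sizes):
    """Approximate usable capacity when disks are paired off into two-way mirrors."""
    ordered = sorted(sizes)
    # Pair neighbours after sorting; each pair yields its smaller disk.
    # With an odd number of disks the largest one is left unused.
    return sum(ordered[i] for i in range(0, len(ordered) - 1, 2))
```

With two 2 TB and two 8 TB disks, `mirror_pairs_usable([2, 2, 8, 8])` gives 10 while `raidz_usable([2, 2, 8, 8])` gives 6, which is the flexibility argument in numbers. With four equal disks the comparison goes the other way.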

Not sure where you are getting the push to use RAIDZ over mirroring. Definitely not from ZFS proper. It's all about what's best for the situation. Most of my things are ZFS mirrors. But given a choice between RAID 3/5/6 or RAIDZ, I don't have to think long.

I've had the opposite experience. I've been using btrfs for years with no issues (pro-tip: don't create 10,000 snapshots and then try to rebalance the fs)

Having snapshots has saved me occasionally and find-new is really useful.

Maybe, but with btrfs your data is definitely at risk. Don't ever use it for anything important. Backups are important but even with those, btrfs is a filesystem that should be uninvented.

It's installed and works in many places, so it's not quite Hurd.

A critical issue is that btrfs raid5/6 is officially considered unstable.

ZFS is very modular. Most of the platform-specific code is in SPL. I don't think this would affect FreeBSD.

I would speculate part of Java's success is due to being available but not deeply integrated on Linux-based OSes, for similar legal reasons. As with ZFS, it became very easy to upgrade your version of Java without having to go through an upgrade of the whole baseline OS (which was often the natural path if you wanted to use a new version of, say, Python).

Of course ZFS has a lot more potential to be broken by kernel updates than Java, but the parallel is interesting.

Why was CDDL chosen for ZoL?

Because ZFS itself is under CDDL and no one but Oracle can change that.

Theoretically you can rewrite all the Oracle parts and then relicense it. That is how BSD got out from under AT&T years ago. This is hard to pull off legally and takes a long time. It is possible, but generally isn't practical.

Surviving a visit from Oracle’s lawyers isn’t practical.

You skipped the part about the lawsuit too.

And if the ongoing mudfight that is Oracle v. Google is any indicator, it's unclear that it would indeed even be legal (it should be, but God only knows how the courts are gonna end up ruling in the end).

ZoL was forked from Oracle ZFS. If you want the real answer you need to ask Oracle; presumably it was so that ZFS would be incompatible with their competition (Linux GPL-based work).

Even before Oracle, Sun had licensed it as CDDL. So Oracle itself may not be able to answer that question anymore.

CDDL includes patent grants, which GPL (et cetera) does not. CDDL also allows file-based licensing, which allows projects with combined free/nonfree sources to be distributed, which was a requirement for Solaris. It's not that they chose CDDL to make it not work on Linux; they created CDDL because they didn't have a choice.

Sun created CDDL in order to be Linux GPLv2 incompatible, even the person who was tasked by Sun management to write CDDL (Danese Cooper) said so.

And it's not as if that admission was needed, as anything else would have been insane from Sun.

They were losing badly to Linux, so in the eleventh hour they decided to go open source in an attempt at getting more mindshare = marketshare.

Solaris had some real technical advantages over Linux like ZFS and DTrace, however if they licensed Solaris with a GPLv2 compatible license, Linux could just incorporate their coveted tech, which would be business suicide. Hence CDDL.

You can't blame them given the situation they were in.

Additionally, the FSF and Sun weren't able to agree on compatible copyleft clauses.

Oracle (presumably) acquired the copyright when they acquired Sun, so they actually could (presumably) answer that question with, say, a GPL'd release.

That'd mean Oracle actually using their army of litigators for good, though, and it'll be a cold day in Hell when that happens.

Got me! Right you are, I should have made that distinction, thanks for clarifying.

Ah I didn't realize it was a fork, I had thought it was a clean room implementation.

Speaking of filesystems: is there one that allows me to add tags? Or alternate streams (like NTFS does), so I can add my own tagging system on top?

As someone who has spent a lot of time on this problem: you don't want to store tags as attributes/resource forks/streams/etc in a traditional filesystem. The single most important characteristic for a tag-based organization system is that search operations must be extremely fast and cheap. Walking a hierarchical filesystem and stat'ing a bunch of files to discover their tag information is anything but cheap.

What you want is to store the tag metadata heap in an easily indexed structure (i.e., a search tree). That heap then just contains pointers into content-addressable storage. You can kind of build such a thing on top of existing filesystems which support hardlinks; however, I'd advise against that for the metadata itself. Once you have the power of tagged content, a natural evolution is querying that data using set operators. You will inevitably reinvent (some approximation of) SQL once you arrive at this discovery.
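
A minimal in-memory sketch of that layout, to make the idea concrete. It swaps the search tree for a plain dict-based inverted index, keys content by its SHA-256 digest, and keeps everything in memory; the class and method names are made up for illustration and nothing here is persistent.

```python
import hashlib
from collections import defaultdict


class TagIndex:
    """Tags point at content hashes; content lives in a hash-addressed store."""

    def __init__(self):
        self.blobs = {}                  # content hash -> bytes
        self.by_tag = defaultdict(set)   # tag -> set of content hashes

    def add(self, data, tags):
        """Store `data` under its SHA-256 digest and attach `tags` to it."""
        key = hashlib.sha256(data).hexdigest()
        self.blobs[key] = data
        for tag in tags:
            self.by_tag[tag].add(key)
        return key

    def all_of(self, *tags):
        """Hashes carrying every given tag (set intersection)."""
        sets = [self.by_tag.get(t, set()) for t in tags]
        return set.intersection(*sets) if sets else set()

    def any_of(self, *tags):
        """Hashes carrying at least one given tag (set union)."""
        return set().union(*(self.by_tag.get(t, set()) for t in tags))

    def without(self, keys, *tags):
        """Drop from `keys` anything carrying one of `tags` (set difference)."""
        return keys - self.any_of(*tags)
```

A query such as `idx.without(idx.all_of("2018"), "photo")` reads as "tagged 2018 AND NOT photo", which is already most of the way to a WHERE clause.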

It might still make sense to keep the canonical metadata in xattrs so that tools can interoperate on it and build a search index from it when you actually need to do search.

On an NVMe drive, rebuilding the search index shouldn't be too costly, even with hundreds of thousands of files.
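
A sketch of that rebuild step, assuming tags are stored as a comma-separated list in a `user.tags` extended attribute. That attribute name and format are my own convention for the example, not any standard. `os.getxattr` is Linux-only, and files without the attribute (or on filesystems without xattr support) are simply treated as untagged.

```python
import os
from collections import defaultdict

TAG_ATTR = "user.tags"  # hypothetical attribute name, not a standard


def read_xattr_tags(path):
    """Return the tags stored on `path`, or an empty list if there are none."""
    try:
        raw = os.getxattr(path, TAG_ATTR)  # available on Linux only
    except OSError:
        return []
    return [t for t in raw.decode("utf-8", "replace").split(",") if t]


def build_index(root, read_tags=read_xattr_tags):
    """Walk `root` and build a tag -> set-of-paths index from per-file tags."""
    index = defaultdict(set)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            for tag in read_tags(path):
                index[tag].add(path)
    return index
```

The xattrs stay the source of truth, so the index can be thrown away and rebuilt at any time. The `read_tags` parameter is only there so the walk can be exercised without xattr support.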

I think there were also some ioctl additions for btrfs that let you sequentially walk the filesystem by extents or inodes instead of by directory tree, which makes indexing a lot easier.

That's a great point. Which rules out using alternate streams as a tagging system.

But not tags themselves, so I'd still like a filesystem that supports tags natively.

I think extended attributes is what you're looking for (at least in Linux land).
