"honestly, there is no way I can merge any of the ZFS efforts until I get an official letter from Oracle that is signed by their main legal counsel or preferably by Larry Ellison himself that says that yes, it's ok to do so and treat the end result as GPL'd.
Other people think it can be ok to merge ZFS code into the kernel and that the module interface makes it ok, and that's their decision. But considering Oracle's litigious nature, and the questions over licensing, there's no way I can feel safe in ever doing so.
And I'm not at all interested in some "ZFS shim layer" thing either that some people seem to think would isolate the two projects. That adds no value to our side, and given Oracle's interface copyright suits (see Java), I don't think it's any real licensing win either."
I understand Linus' reasoning, but there is just no way I will ever install btrfs. I would rather not update the kernel at all (I run ZFS on a Fedora root with regular kernel updates, plus scripts that verify everything is in order with the kernel modules before rebooting) than use a filesystem that crashed twice in two years.
Yes, it is very annoying if an update breaks the fs, but currently:
- in 2 years, btrfs crashed on its own twice
- in those same 2 years, a kernel update never broke zfs
As far as I am concerned, the case for zfs is clear.
This might be helpful to someone: https://www.csparks.com/BootFedoraZFS/index.md
Anyway, Linus is going too far with his GPL agenda. The MODULE_LICENSE mechanism for kernel modules explains why hardware is less supported on Linux: instead of the devs focusing on getting more support from third-party companies, they try to force those companies to GPL. Once you set MODULE_LICENSE to non-GPL, you quickly figure out that you can't use most kernel calls. Not the code. The calls.
Relaxing on anything more permissive than GPL2 would instead mean the end of Linux as we know it. A more permissive license means that nothing would prevent Google or Microsoft from releasing their own closed-source Linux, or replacing the source code of most of the modules with hex bloats.
I believe that GPL2 is a good trade-off for a project like Linux, and it's good that we don't compromise on anything less than that.
Even though I agree on the superiority of ZFS for many applications, I think that the blame for the missed inclusion in the kernel is on Oracle's side. The lesson learned from NTFS should be that if a filesystem is good and people want to use it, then you should make sure that the drivers for that filesystem are as widely available as possible. If you don't do it, then someone sooner or later will reverse engineer the filesystem anyway. The success of a filesystem is measured by the number of servers that use it, not by the amount of money that you can make out of it. For once Oracle should act more like a tech company and less like a legal firm specialised in patent exploitation.
> or replacing the source code of most of the modules with hex bloats.
Ok, good point. I am no longer pissed off about MODULE_LICENSE; I hadn't even thought about that.
I did use it as well many years ago (probably around 2012-2015) in a RAID5 configuration, after reading a lot of positive comments about this next-gen fs => after a few weeks my RAID started falling apart (while performing normal operations!) as I got all kinds of weird problems => my conclusion was that the RAID was corrupt and couldn't be fixed => no big problem as I did have a backup, but that definitely ruined my initial BTRFS experience. Even though the fs was new at the time, and even with warnings about it (being new), everybody was very optimistic/positive about it, but in my case that experiment was a disaster.
That event held me back until today from trying it again. I admit that today it might be a lot better than in the past, but since people were already very positive about it back then (and in my case it broke), it's difficult for me now to say "aha - now the general positive opinion is probably more realistic than in the past". Take, for example, the bug that can potentially still destroy a RAID (the "write hole" bug): personally, I think that if BTRFS still makes that RAID functionality available while it has such a big bug, while at the same time advertising it as a great feature of the fs, the "unrealistically positive" behaviour is still present, and therefore I still cannot trust it. Additionally, that bug being open since forever makes me think it's really hard to fix, which in turn makes me think that the foundation and/or code of BTRFS is bad (which would be why the bug cannot be fixed quickly), and that potentially even more complicated bugs might show up in the future.
For a long time now I have been writing and testing a program that builds a big database (using "Yandex ClickHouse" for the main DB) distributed across multiple hosts, each using multiple HDDs to save the data, and that at the same time is able to fight potential "bitrot" ( https://en.wikipedia.org/wiki/Data_degradation ) without having to resync the whole local storage each time a byte on some HDD loses its value. Besides BTRFS, the only other candidate I found that performs checksums on data is ZFSoL (both XFS and NILFS2 do checksums, but only on metadata).
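The "repair only what rotted" idea described above can be sketched in a few lines. This is a toy illustration, not the parent's actual design: the chunk size, the function names, and the use of CRC32 as the checksum are all my assumptions.

```python
import zlib

CHUNK = 4  # toy chunk size; a real system would use e.g. 128 KiB records

def checksums(data: bytes) -> list:
    """Per-chunk CRCs, computed while the data is known-good."""
    return [zlib.crc32(data[i:i + CHUNK]) for i in range(0, len(data), CHUNK)]

def scrub(local: bytearray, replica: bytes, sums: list) -> list:
    """Repair only the chunks whose checksum no longer matches,
    instead of re-syncing the whole device from the replica."""
    repaired = []
    for i, s in enumerate(sums):
        lo = i * CHUNK
        if zlib.crc32(bytes(local[lo:lo + CHUNK])) != s:
            local[lo:lo + CHUNK] = replica[lo:lo + CHUNK]
            repaired.append(i)
    return repaired

good = b"abcdefghijkl"
sums = checksums(good)
local = bytearray(good)
local[5] = 0                            # one byte of bitrot, inside chunk 1
assert scrub(local, good, sums) == [1]  # only chunk 1 was re-synced
assert bytes(local) == good
```

The point is the asymmetry: detecting the bad byte costs one checksum pass, while the repair traffic is one chunk, not the whole disk.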
Excluding BTRFS because of the reasons mentioned above, I was left only with ZFS.
I've now been using ZFSoL for a couple of months and so far everything has gone very well (a bit difficult to understand & deal with at the beginning, but extremely flexible), and performance is good as well (though to be fair, that's easy in combination with the ClickHouse DB, as the DB itself already writes data in a CoW way, so blocks of a table stored on ZFS are very likely to be contiguous).
On one hand, technically, I'm now happy. On the other hand, I do admit that the licensing problems and the non-integration of ZFSoL into the kernel carry risks. Unluckily, I just don't see any alternative.
I do donate monthly to https://www.patreon.com/bcachefs but I don't have high hopes - not much is happening, and BCACHE (even though it is currently integrated in the kernel) hasn't been very good in my experience (https://github.com/akiradeveloper/dm-writeboost worked A LOT better, but I'm not using it anymore as I no longer have a use case for it, and it was a risk as well, not being included in the kernel), so BCACHEFS might end up the same.
When it comes to predicting your future, though, your personal anecdotes may not hold up against more substantial data.
As for it needing a lot of memory, that is also not true. The ARC will use your memory if it's available, because it's available! You paid good money for it, so why not actually use it to make things faster?
But there was nothing like that on the market at that time anyway.
- The kernel team may break it at any time, and won't care if they do.
- It doesn't seem to be well-maintained.
- Performance is not that great compared to the alternatives.
- Using it opens you up to the threat of lawsuits from Oracle. Given history, this is a real threat. (This is one that should be high for Linus but not for me - there is no conceivable reason that Oracle would want to threaten me with a lawsuit.)
> It doesn't seem to be well-maintained.
The last commit is from 3 hours ago: https://github.com/zfsonlinux/zfs/commits/master. They have dozens of commits per month. The last minor release, 0.8, brought significant improvements (my favorite: FS-level encryption).
Or maybe this refers to the (initial) 5.0 kernel incompatibility? That wasn't the ZFS dev team's fault.
> Performance is not that great compared to the alternatives.
There are no (stable) alternatives. BTRFS certainly not, as it's "under heavy development"¹ (since... forever).
> The kernel team may break it at any time, and won't care if they do.
That's true; however, the amount of breakage is no different from any other out-of-tree module, and it is unlikely to happen with a patch version of a working kernel (in fact, it happened with the 5.0 release).
> Using it opens you up to the threat of lawsuits from Oracle. Given history, this is a real threat. (This is one that should be high for Linus but not for me - there is no conceivable reason that Oracle would want to threaten me with a lawsuit.)
"Using" it won't open you up to lawsuits; ZFS has a CDDL license, which is a free and open-source software license.
The problem is (taking Ubuntu as representative) shipping the compiled module along with the kernel, which is an entirely different matter.
Java is GPLv2+CPE. That didn't stop Oracle because, as Linus pointed out in the email, Oracle regards their APIs as a separate entity to their code.
So it's not comparable to GCC; rather, it's comparable to forking clang and keeping clang's license. I doubt RMS would be able to say anything.
Note that they don't mean "it's unstable," just "there are significant improvements between versions." Most importantly:
> The filesystem disk format is stable; this means it is not expected to change unless there are very strong reasons to do so. If there is a format change, filesystems which implement the previous disk format will continue to be mountable and usable by newer kernels.
...and only _new features_ are expected to stabilise:
> As with all software, newly added features may need a few releases to stabilize.
So overall, at least as far as their own claims go, this is not "heavy development" as in "don't use."
What makes you say that? I've seen plenty of people make this claim based on URE rates, but I've also not seen any evidence that it is a real problem for a 3-4 drive setup. Modern drives are specced at 1 URE per 10^15 bits read (or better), so less than 1 URE in 125 TB read. Even if a rebuild did fail, you could just start over from a backup. Sure, if the array is mission critical and you have the money, use something with more redundancy, but I don't think RAID5 is infeasible in general.
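The URE arithmetic behind that claim is easy to check. A quick sketch, assuming independent bit errors and the 1-per-10^15 spec figure quoted above (`rebuild_success_prob` is just an illustrative name):

```python
# Probability that a RAID5 rebuild completes without hitting an
# unrecoverable read error (URE), assuming independent bit errors
# at the commonly specced rate of 1 URE per 1e15 bits read.

def rebuild_success_prob(drive_tb: float, n_surviving: int,
                         ure_per_bit: float = 1e-15) -> float:
    # A rebuild must read every surviving drive in full.
    bits_read = drive_tb * 1e12 * 8 * n_surviving
    return (1 - ure_per_bit) ** bits_read

# 4-drive array of 4 TB drives, one failed: read the 3 survivors in full.
p = rebuild_success_prob(4, 3)  # roughly 0.91
```

So for a small array the rebuild succeeds the large majority of the time, which matches the parent's point: URE rates make RAID5 risky at tens of drives or tens of TB per drive, not at 3-4 consumer-sized disks.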
* All localized errors are correctable, unless they overlap across different disks or result in drive ejection. This precisely fixes the UREs of non-RAID drives.
* If a complete drive fails, then you have a chance of losing some data from the UREs / localized errors. This is approximately the same as if you used no RAID.
As for URE incidence rate - people use multi-TB drives without RAID, yet data loss does not seem prevalent. I'd say it depends .. a lot.
If you use a crappy RAID5, that ejects a drive on a drive partial/transient/read failure, then yes, it's bad, even worse than no RAID.
That being said, I have no idea whether a good RAID5 implementation is available, one that is well interfaced or integrated into filesystem.
It has terrible performance problems under many typical usage scenarios. This is a direct consequence of the choice of core on-disk data structures. There's no workaround short of a complete redesign.
It can become unbalanced and cease functioning entirely. Some workloads can trigger this in a matter of hours. Unheard of for any other filesystem.
It suffers from critical data-loss bugs in setups other than RAID5. They have solved a number of these, but when reliability is its key selling point, many of us are concerned that there is still a high chance many remain, particularly in poorly-exercised codepaths which are run in rare circumstances, such as when critical faults occur.
And that's only getting started...
 which I used for nearly two years on a small desktop machine on a daily basis; ended up with (minor?) errors on the file system that could not be repaired and decided to switch to ZFS. No regrets, nor similar errors since.
Is bcachefs more-or-less ready for some use cases now? Does it still support caching layers like bcache did?
Regarding caching: "Bcachefs allows you to specify disks (or groups thereof) to be used for three categories of I/O: foreground, background, and promote. Foreground devices accept writes, whose data is copied to background devices asynchronously, and the hot subset of which is copied to the promote devices for performance."
If all you need is a simple root FS that is CoW and checksummed, bcachefs works pretty well, in my experience. I've been using it productively as a root and home FS for about two years or so.
Same for encryption, there are already existing crypto layers both on the block and filesystem (as an overlay) level.
Are you sure about that? Always reading both doubles read I/O, and benchmarks show no such effect.
> there's no way for the fs to tell which is correct
This is not an immutable fact that precludes keeping the RAID implementation separate. If the FS reads data and gets a checksum mismatch, it should be able to use ioctls (or equivalent) to select specific copies/shards and figure out which ones are good. I work on one of the four or five largest storage systems in the world, and have written code to do exactly this (except that it's Reed-Solomon rather than RAID). I've seen it detect and fix bad blocks, many times. It works, even with separate layers.
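The layered approach described above can be sketched as follows. This is a minimal illustration of the idea, not the poster's actual code: CRC32 stands in for the real checksum, and a plain list of copies stands in for the ioctl that selects a specific replica from the lower layer.

```python
import zlib

def read_with_repair(copies: list, stored_crc: int) -> bytes:
    """Read one logical block stored as multiple redundant copies.

    The filesystem layer keeps a checksum; on mismatch it asks the
    lower (RAID) layer for each copy individually and rewrites the
    bad ones from a good copy - healing across separate layers.
    """
    for data in copies:
        if zlib.crc32(data) == stored_crc:
            # Found a good copy; heal any bad siblings in place.
            for j in range(len(copies)):
                if zlib.crc32(copies[j]) != stored_crc:
                    copies[j] = data
            return data
    raise IOError("all copies failed checksum")

block = b"important data"
crc = zlib.crc32(block)
copies = [b"important dat\xff", block]   # copy 0 suffered bitrot
assert read_with_repair(copies, crc) == block
assert copies[0] == block                # the corrupt copy was repaired
```

Nothing here requires the checksum layer and the redundancy layer to live in the same codebase; all that's needed is an interface for addressing individual copies.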
This supposed need for ZFS to absorb all RAID/LVM/page-cache behavior into itself is a myth; what really happened is good old-fashioned NIH. Understanding other complex subsystems is hard, and it's more fun to write new code instead.
This is all great, and I assume it works well. But it is in no way generalizable to all the filesystems Linux has to support (at least at the moment). I can only see this working in a few specific instances with a particular set of FS setups. Even more complicating is the fact that most RAIDs are hardware-based, so just using ioctls to pull individual blocks wouldn't work for many (all?) drivers. Convincing everyone to switch over to software RAID would take a lot of effort.
There is a legitimate need for these types of tools in the sub-PB, non-clustered storage arena. If you're working on a sufficiently large storage system, these tools and techniques are probably par for the course. That said, I have definitely lost 100 GBs of data to bit rot on a multi-PB storage system in a top-500 HPC system. (One bad byte in a compressed data file left the data after the bad byte unrecoverable.) This would not have happened on ZFS.
ZFS was/is a good effort to bring this functionality lower down the storage hierarchy. And it worked because it had knowledge about all of the storage layers. Checksumming files/chunks helps best if you know about the file system and which files are still present. And it only makes a difference if you can access the lower level storage devices to identify and fix problems.
Why not? If it's a standard LVM API then it's far more general than sucking everything into one filesystem like ZFS did. Much of this block-mapping interface already exists, though I'm not sure whether it covers this specific use case.
At the time that ZFS was written (early 2000s) and released to the public (2006), this was not a thing and the idea was somewhat novel / 'controversial'. Jeff Bonwick, ZFS co-creator, lays out their thinking:
Remember: this was a time when Veritas Volume Manager (VxVM) and other software still ruled the enterprise world.
When Sun added ZFS to Solaris, they did not get rid of UFS and/or SVM, nor prevent Veritas from being installed. When FreeBSD added ZFS, they did not get rid of UFS or GEOM either.
If an admin wanted or wants (or needs) to use the 'old' way of doing things they can.
It may be a good thing, and it may not. Linux has a bajillion file systems, some more useful than others, and that is unique in some ways.
Solaris and other enterprise-y Unixes at the time only had one. Even the BSDs generally only have a few that they run on instead of ext2/3/4, XFS, ReiserFS (remember when that was going to take over?), btrfs, bcachefs, etc, etc, etc.
At most, a company may have purchased a license for Veritas:
By rolling everything together, you get ACID writes, atomic space-efficient low-overhead snapshots, storage pools, etc. All this just by removing one layer of indirection and doing some telescoping:
It's not "modularity bad", but that to achieve the same result someone would have had to write/expand a layer-to-layer API, and no one did. Also, as a first-order estimate of complexity: how many lines of code (LoC) are there in mdraid/LVM/ext4 versus ZFS (or UFS+SVM on Solaris)?
What would you recommend over zfs for small-scale storage servers? XFS with mdraid?
I'd also love to hear your opinion on the Reiser5 paper.
That's a problem only with RAID1, only when copies=2 (granted, the most common case), and only when the underlying device cannot report which sector has gone bad.
There are valid reasons, most having to do with filesystem usage and optimization. Off the top of my head:
- more efficient re-syncs after failure (don't need to re-sync every block, only the blocks that were in use on the failed disk)
- can reconstruct data not only on disk self-reporting, but also on filesystem metadata errors (CRC errors, inconsistent dentries)
- different RAID profiles for different parts of the filesystem (think: parity raid for large files, raid10 for database files, no raid for tmp, N raid1 copies for filesystem metadata)
and for filesystem encryption:
- CBC ciphers have a common weakness: the block size is constant. If you use FS-object encryption instead of whole-FS encryption, the block size, offset and even the encryption keys can be varied across the disk.
Volume management is just a hack. We had all of these single-disk filesystems, but single disks were too small. So volume management was invented to present the illusion (in other words, lie) that they were still on single disks.
If you replace "disk" with "DIMM", it's immediately obvious that volume management is ridiculous. When you add a DIMM to a machine, it just works. There's no volume management for DIMMs.
Storage is at the bottom of the caching hierarchy where people get inventive to avoid rebuilding. Rebuilding would be really costly there. Hence we use volume management to spare us the cost of rebuilding.
RAM also tends to have uniform performance. Which is not true for disk storage. So while you don't usually want to control data placement in RAM, you very much want to control what data goes on what disk. So the analogy confuses concepts rather than illuminating commonalities.
One of the creators of ZFS, Jeff Bonwick, explained it in 2007:
> While designing ZFS we observed that the standard layering of the storage stack induces a surprising amount of unnecessary complexity and duplicated logic. We found that by refactoring the problem a bit -- that is, changing where the boundaries are between layers -- we could make the whole thing much simpler.
RAIDZ is part of the VDEV (Virtual Device) layer. Layered on top of this is the ZIO (ZFS I/O layer). Together, these form the SPA (Storage Pool Allocator).
On top of this layer we have the ARC, L2ARC and ZIL. (Adaptive Replacement Caches and ZFS Intent Log).
Then on top of this layer we have the DMU (Data Management Unit), and then on top of that we have the DSL (Dataset and Snapshot Layer). Together, the SPA and DSL layers implement the Meta-Object Set layer, which in turn provides the Object Set layer. These implement the primitives for building a filesystem and the various file types it can store (directories, files, symlinks, devices etc.) along with the ZPL and ZAP layers (ZFS POSIX Layer and ZFS Attribute Processor), which hook into the VFS.
ZFS isn't just a filesystem. It contains as many, if not more, levels of layering than any RAID and volume management setup composed of separate parts like mdraid+LVM or similar, but much better integrated with each other.
It can also store stuff that isn't a filesystem. ZVOLs are fixed size storage presented as block devices. You could potentially write additional storage facilities yourself as extensions, e.g. an object storage layer.
Which was precisely Sun/Oracle's goal when they released ZFS under the purposely GPL-incompatible CDDL. Sun was hoping to make OpenSolaris the next Linux while ensuring that no code from OpenSolaris could be moved back into Linux. I can't think of another plausible reason why they would write a new open-source license for their open-source operating system and make it incompatible with the GPL.
Some people argue that Sun (or the Sun engineer) as creator of the license made the CDDL intentionally GPL-incompatible. According to Danese Cooper, one of the reasons for basing the CDDL on the Mozilla license was that the Mozilla license is GPL-incompatible. Cooper stated, at the 6th annual Debian conference, that the engineers who had written the Solaris kernel requested that the license of OpenSolaris be GPL-incompatible.
> Mozilla was selected partially because it is GPL incompatible. That was part of the design when they released OpenSolaris. ... the engineers who wrote Solaris ... had some biases about how it should be released, and you have to respect that.
> Simon Phipps (Sun's Chief Open Source Officer at the time), who had introduced Cooper as "the one who actually wrote the CDDL", did not immediately comment, but later in the same video, he says, referring back to the license issue, "I actually disagree with Danese to some degree", while describing the strong preference among the engineers who wrote the code for a BSD-like license, which was in conflict with Sun's preference for something copyleft, and that waiting for legal clearance to release some parts of the code under the then unreleased GNU GPL v3 would have taken several years, and would probably also have involved mass resignations from engineers (unhappy with either the delay, the GPL, or both—this is not clear from the video). Later, in September 2006, Phipps rejected Cooper's assertion in even stronger terms.
So of the available licenses at the time, Engineering wanted BSD and Legal wanted GPLv3, so the compromise was CDDL.
Edit: Nevermind, debunked by Bryan Cantrill. It was to allow for proprietary drivers.
So of the available licenses at the time, Engineering wanted BSD and Legal wanted (to wait for) GPLv3, so the compromise was CDDL.
Lovely except it really was decided to explicitly make OpenSolaris incompatible with GPL. That was one of the design points of the CDDL. I was in that room, Bryan and you were not, but I know its fun to re-write history to suit your current politics. I pleaded with Sun to use a BSD family license or the GPL itself and they would consider neither because that would have allowed D-Trace to end up in Linux. You can claim otherwise all you want...this was the truth in 2005.
(In many ways, it still is a time of pirates, we just moved a bit higher in the stack...)
I wouldn't say McNealy was that different than any of those, though others like Joy and Bechtolsheim had a more salutary influence. To the extent that there was any overall difference, it seemed small. Working on protocol interop with DEC products and Sun products was no different at all. Sun went less-commodity with SPARC and SBus, they got in bed with AT&T to make their version of UNIX seem more standard than competitors' even though it was more "unique" in many ways, there were the licensing games, etc. Better than Oracle, yeah, but I wouldn't go too much further than that.
For (the lack of) openness, I agree, but the claim that they were not innovative needs stronger evidence.
The whole open-source steer by Sun was a very disingenuous strategy, forced by the changed landscape, in order to try and salvage some semblance of relevance. Most people saw right through it, which is why Sun ended up as it did shortly thereafter: broke, acquired, and dismantled.
I am willing to bet that Google had the same thought. And I am also willing to bet that Google is regretting that thought now.
That last one is likely to get some kind of hacky workaround. But nobody wants to do the invasive changes necessary for actual BPR to enable that entire list.
Unless you are living in 2012 on a RHEL/CentOS 6/7 machine, btrfs has been stable for a long time now. I have been using btrfs as the sole filesystem on my laptop in standard mode, on my desktop as RAID0, and on my NAS as RAID1 for more than two years. I have experienced absolutely zero data loss. In fact, btrfs recovered my laptop and desktop from broken package updates many times.
You might have had some issues when you tried btrfs on distros like RHEL that did not backport the patches to their stable versions because they don't support btrfs commercially. Try something like openSUSE, which backports btrfs patches to stable versions, or use something like Arch.
> That's true; however, the amount of breakage is no different from any other out-of-tree module, and it is unlikely to happen with a patch version of a working kernel (in fact, it happened with the 5.0 release).
This is a filesystem we are talking about. Under no circumstances will any self-respecting sysadmin use a filesystem that has even a small chance of breaking with a system update.
Meanwhile ZFS has survived disk failures, removing 2 disks from an 8 disk RAIDZ3 array and then putting them back, random SATA interface connection issues that were resolved by reseating the HDD, and will probably survive anything else that I throw at it.
RAIDZ/Z2 avoids the issue by having slightly different semantics. That's why it is Z/Z2, not 5/6.
VMWare Fusion, on the other hand, powers the desktop environment I've used as a daily work machine for the last 6 months, and I've had absolutely zero problems other than trackpad scrolling getting emulated as mouse wheel events (making pixel-perfect scroll impossible).
Despite that one annoyance, it's definitely worth paying for if you're using it for any serious or professional purpose.
VirtualBox itself is GPL. There is no lawsuit risk.
What requires "commercial considerations" is the extension pack.
The extension pack is required for:
> USB 2.0 and USB 3.0 devices, VirtualBox RDP, disk encryption, NVMe and PXE boot for Intel cards
If licensing needs to be considered (ie. in a corporate environment), but one doesn't need the functionalities above, then there's no issue.
It might be, but let's just say that Oracle aren't big fans of $WORK, and our founders are big fans of them. Thus our legal department are rather tetchy about anything that could give them even the slightest chance of doing anything.
> What requires "commercial considerations" is the extension pack.
And our legal department are nervous about that being installed, even by accident, so they prefer to minimise the possibility.
Is the expectation here that firms offering software under non-commercial-use-is-free licenses just run it entirely on the honour system? And isn't it true that many firms use unlicensed software, hence the need for audits?
They can also apply stronger heuristics, like popping up a dialogue box if the computer is centrally-managed (e.g.: Mac MDM, Windows domain, Windows Pro/Enterprise, etc.).
The (commercially licensed) Extension pack provides "Support for USB 2.0 and USB 3.0 devices, VirtualBox RDP, disk encryption, NVMe and PXE boot for Intel cards" and some other functionality, e.g. webcam passthrough. There may be additional functionality enabled by the Extension pack that I cannot find at a glance, but those are the main things.
With my MBP as host and Ubuntu as guest, I found that VirtualBox (with and without guest extensions installed) had a lot of graphical performance issues that Fusion did not.
I don't think it has to be conceivable with Oracle...
Unfortunately I have to agree with Linus on this one. Messing with Oracle's stuff is dangerous if you can't afford a comparable legal team.
Money. Anecdotally, that's the primary reason Oracle does anything.
I worked for a tiny startup (>2 devs full time) where Oracle tried to extract money from us because we used MariaDB on AWS.
If you think this sounds ridiculous you probably got it right.
(Why? Because someone inexperienced with Oracle had filled out the form while downloading the MySQL client.)
Don't be so sure about this.
However the person he is replying to was not actually asking to have ZFS included in the mainline kernel. As noted above, that could never happen, and I believe that Linus is only bringing it up to deflect from the real issue. What they were actually asking is for Linux to revert a change that was made for no other reason than to hinder the use of ZFS.
Linux includes a system which restricts what APIs are available to each module based on the license of the module. GPL modules get the full set of APIs whereas non-GPL modules get a reduced set. This is done strictly for political reasons and has no known legal basis as far as I'm aware.
Not too long ago a change was made to reduce the visibility of a certain API required by ZFS so only GPL modules could use it. It's not clear why the change was made, but it was certainly not to improve the functionality of the kernel in any way. So the only plausible explanation to me is that it was done just to hinder the use of ZFS with Linux, which has been a hot political issue for some time now.
Saving and restoring registers is an astoundingly generic function. If you list all the kernel exports and sort by how much they make your work derivative, it should be near the very bottom.
It was always frowned upon:
> In other words: it's still very much a special case, and if the question was "can I just use FP in the kernel" then the answer is still a resounding NO, since other architectures may not support it AT ALL.
> Linus Torvalds, 2003
and these specific functions, which were marked as GPL, had already been deprecated for well over a decade.
> It was always frowned upon
Whether it's frowned upon is a completely different issue from whether it intertwines your data so deeply with the kernel that it makes your code a derivative work subject to the GPL license. Which it doesn't.
> if the question was "can I just use FP in the kernel" then the answer is still a resounding NO, since other architectures may not support it AT ALL.
It's not actually using floating point, it's using faster instructions for integer math, and it has a perfectly viable fallback for architectures that don't have those instructions. But why use the slower version when there's no real reason to?
> and these specific functions, that were marked as GPL were already deprecated for well over a decade.
But the GPL export is still there, isn't it? It's not that functionality is being removed, it's that functionality is being shifted to only have a GPL export with no license-based justification for doing so.
He is not going to beg people outside the kernel for permission to change something that may break their module. On the contrary, they must live with any breakage that is thrown at them.
Again, that symbol was deprecated for well over a decade. How long does it take to be allowed to remove it?
> Again, that symbol was deprecated for well over a decade.
But not the GPL equivalent of the symbol. That symbol is not deprecated.
One can interpret this as something legally significant, or an embarrassing private anecdote, or nothing substantial at all - maybe even just talk. However, I'd give them the benefit of the doubt, not least since they could be the ones up against Oracle's legal dept...
If you don’t like that don’t use it.
Let me stop you right there. This being "Oracle," with its litigious nature, how can you truly be aware or sure?
Linus is literally saying there is a legal basis.
The functionality I'm describing has absolutely nothing to do with ZFS or Oracle in any way. If you really think the reach of Oracle is so great, then why not block all Oracle code from ever running on the OS? That seems to me to be just as justified as this change.
...to be fair, I would probably run that module.
I can't form an informed opinion, but my uninformed gut feeling is that Oracle has done what they are suing Google for having done.
CoW filesystems do trade performance for data safety. Or did you mean there are other _stable/production_ CoW filesystems with better performance? If so, please do point them out!
Just ask yourself what happens when a thin pool runs out of actual, physical disk blocks?
I have actually managed to run out of blocks on XFS on a thin LV and it's an interesting experience. XFS always survived just fine, but some files basically vanished. Looks like mostly those that were open and being written to at exhaustion time, like for example a mariadb database backing store. Files that were just sitting there were perfectly fine as far as I could tell.
Still, you definitely should never put data on a volume where a pool can be exhausted, without a backup as I don't think there is really a bulletproof way for a filesystem to handle that happening suddenly.
ZFS doesn't over-provision anything by default. The only case I'm aware of where you can over-provision with ZFS is when you explicitly choose to thin provision zvols (virtual block devices with a fixed size). This can't be done with regular file systems which grow as needed, though you can reserve space for them.
File systems do handle running out of space (for a loose definition of handle) but they never expect the underlying block device to run out of space, which is what happens with over-provisioning. That's a problem common to any volume manager that allows you to over provision.
With ZFS, if I take a snapshot and then delete 10GB of data my file system will appear to have shrunk by 10GB. If I compare the output of df before and after deleting the data, df will tell me that "size" and "used" have decreased by 10GB while "available" remained constant. Once the snapshot is deleted that 10GB will be made available again and the "size" and "available" columns in df will increase. It avoids over-provisioning by never promising more available space than it can guarantee you're able to write.
I think you're trying to relate ZFS too much to how LVM works, where LVM is just a volume manager that exposes virtual devices. The analogue to thin provisioned LVM volumes is thin-provisioned zvols, not regular ZFS file systems. I can choose to use ZFS in place of LVM as a volume manager with XFS as my file system. Over-provisioned zvols+XFS will have functionally equivalent problems as over-provisioned LVM+XFS.
ZFS has quotas and reservations. The former is a maximum allocation for a dataset. The latter is a minimum guaranteed allocation. Neither actually allocate blocks from the pool. These don't relate in any comparable way to how LVM works. They are just numbers to check when allocating blocks.
If I have a 10GB pool and I create 10 empty file systems, the sizes reported in df will sum to 100GB. It's not quite a lie either, because each of those 10 file systems does in fact have 10GB of space available: I could write 10GB to any one of them. If I write 1GB to one of those file systems, the "size" and "available" columns for the other nine will all shrink despite not having a single byte of data written to them.
With ZFS and df the "size" column is really only measuring the maximum possible size (at this point in time, assuming nothing else is written) so it isn't very meaningful, but the "used" and "available" columns do measure something useful.
You could build a pool-aware version of df that reflects this, by grouping file systems in a pool together and reporting that the pool has 10GB available. But frankly there's not enough benefit to doing that, because people with storage pools already understand that summing up all the available space from df's output is not meaningful. Tools like zpool list and BTRFS's df equivalent already correctly report the total free space in the pool.
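The space accounting described above can be sketched as a toy model (illustrative only, not ZFS code; all names and numbers here are made up):

```python
# Toy model of ZFS-style space reporting: every dataset draws from one
# shared pool, so a dataset's reported "size" is its own usage plus
# whatever the pool can still guarantee. Nothing is over-promised.

class Pool:
    def __init__(self, capacity_gb):
        self.capacity = capacity_gb
        self.datasets = {}          # name -> GB used

    def available(self):
        return self.capacity - sum(self.datasets.values())

    def df(self, name):
        """Mimic the size/used/avail columns df would show."""
        used = self.datasets[name]
        return {"size": used + self.available(),
                "used": used,
                "avail": self.available()}

pool = Pool(10)                     # a 10GB pool
for i in range(10):
    pool.datasets[f"fs{i}"] = 0     # ten empty file systems

print(pool.df("fs0"))               # size=10, avail=10 for each of them
pool.datasets["fs0"] = 1            # write 1GB to one of them
print(pool.df("fs1"))               # the other nine shrink: size=9, avail=9
print(pool.df("fs0"))               # size=10, used=1, avail=9
```

Summed over the ten empty file systems, "size" reports 100GB against a 10GB pool, which matches the df behavior described above.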
Supporting reflinks is actually more than can be said for ZoL (see zfsonlinux#405).
No. Distributing (ie. precompiled distro with ZFS) will. You are free to run any software on your machine as you so desire.
> as far as I can tell, it has no real maintenance behind it either any more
Which simply isn't true. They just released a new ZFS version with encryption built in (no more ZFS + LUKS) and they removed the SPL dependency (which didn't support Linux 5.0+ anyway).
I use ZFS on my Linux machines for my storage and I've been rather happy with it.
All of the snapshot functionality is based upon simple transaction number comparisons plus the deadlist of blocks owned by the snapshot. Only the recycling of blocks should have a bit of overhead, and that's done by a background worker: you see the free space increase for a few minutes after a gargantuan snapshot or dataset deletion, but the actual deletion completes immediately.
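A minimal sketch of that transaction-number comparison, assuming the birth-time rule as I understand it from the OpenZFS design (a freed block needs snapshot bookkeeping only if it was born at or before the latest snapshot's transaction group; `free_block` is a made-up name):

```python
# Each block records the transaction group (txg) it was born in. When
# the live dataset frees a block, one comparison decides its fate.
latest_snapshot_txg = 100

def free_block(birth_txg):
    if birth_txg <= latest_snapshot_txg:
        return "deadlist"   # a snapshot may still reference it
    return "free"           # born after the last snapshot; reclaim now

print(free_block(90))    # deadlist
print(free_block(150))   # free
```

No per-snapshot scan is needed, which is why snapshot cost stays flat as the snapshot count grows.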
I should point out again that I don't have enough direct experience with ZFS to say if this is the case, my experience was with an enterprise NetApp server at a large company that was filling the disk up (>95%) in addition to doing hourly snapshots.
Linux project maintains compatibility with userspace software but it does not maintain compatibility with 3rd party modules and for a good reason.
Since modules have access to any internal kernel API it is not possible to change anything within kernel without considering 3rd party code, if you want to keep that code working.
For this reason the decision was made that if you want your module to work you need to make it part of Linux kernel and then if anybody refactors anything they need to consider modules they would be affecting by the change.
Not allowing the module to be part of the kernel is a disservice to your user base. While there are modules like that that are maintained moderately successfully (Nvidia, vmware, etc.) this is all at the cost of the user and userspace maintainers who have to deal with it.
Use FreeBSD where there's a stable ABI and you don't have these problems.
Having a stable ABI for two years is vastly easier to support than an ABI which changes every two weeks. This is reflected by the number of binary modules which are packaged for FreeBSD in the ports tree, and provided by third-party vendors. This stability makes it possible to properly support for a reasonable timeframe, and vendors are doing so.
It is enough that almost all devices around me have a bunch of running code that I have absolutely no control over. I need at least one computer I can trust to do MY bidding.
If you look at FreeBSD, the majority of third-party modules are free software. It's stuff like graphics drivers, newer ZFS modules, esoteric HBAs etc. Proprietary modules, like nVidia's graphics driver, are the minority.
I can see and understand why things are the way they are, and indeed I agreed with the approach for many years. Today, I see it being as short sighted as the GCC vs LLVM approach to modular architecture.
Linux is nearly 30 years old now. To not have stable internal interfaces seems to me to be indicative of either bad initial design or ill discipline on the part of its maintainers. Every other major kernel seems to manage to have a stable ABI for third-party functionality, and Linux is an outlier in its approach. Having to upgrade the kernel for a new GPU driver is painful. Not only do I have to wait for a new kernel release, I have to hope that none of the other changes in that release cause breakage or change the behaviour in unexpected ways. Upgrading a third-party module is much less risky.
That's going to be plenty of reason not to use ZFS for most people. The licensing by itself is also certainly a showstopper for many.
But I'm not sure his other comments are really fair and, had Oracle relicensed ZFS n years back, ZFS would almost certainly be shipping with Linux, whether or not as the typical default I can't say. It certainly wasn't just a buzzword and there were a number of interesting aspects to its approach.
> It was always more of a buzzword than anything else, I feel, and the licensing issues just make it a non-starter for me.
So presumably the licensing problem mentioned by your parent's comment is weighing heavily here. I think this "don't use ZFS" statement is most accurately targeted at distro maintainers. Anyone not actually redistributing Linux and ZFS in a way that would (maybe) violate the GPL is not at any risk. That means even large enterprises can get away with using ZoL.
See also https://www.kernel.org/doc/html/latest/process/stable-api-no...
(fuse is a stable user-space API if you want one ... it won't have the same performance and capabilities of course ...)
Maybe, but the complaints seem to be less about technical changes accidentally breaking ZFS and more about changes of a political nature, with speculation that they might have been meant to _intentionally_ break ZFS and then be passed off as an accident, since ZFS isn't (and can never be) maintained in tree. Basically along the lines of "we don't like out-of-tree kernel modules, so we make life hard for them". No idea if this is actually the case or people are just spinning things together. Even if it is the case, I'm not sure what to think of it, because it's at least partially somewhat understandable.
This has come up many times in the past. Keep in mind that linux has always been GPLv2-only, it is not LGPL or anything like that.
When he says that, I think on the $500 million Sun spent on advertising java.
"Don't use ZFS. It's that simple. It was always more of a buzzword than anything else, I feel, and the licensing issues just make it a non-starter for me.
The benchmarks I've seen do not make ZFS look all that great. And as far as I can tell, it has no real maintenance behind it either any more, so from a long-term stability standpoint, why would you ever want to use it in the first place?"
The thing about ZFS that actually appeals to me is how much error-checking it does. Checksums/hashes are kept of both data and metadata, and those checksums are regularly checked to detect and fix corruption. As far as I know it (and filesystems with similar architectures) are the only ones that can actually protect against bit rot.
> And as far as I can tell, it has no real maintenance behind it either any more, so from a long-term stability standpoint, why would you ever want to use it in the first place?"
It has as much maintenance as any open source project: http://open-zfs.org/. IIRC, it has more development momentum behind it than the competing btrfs project.
I don't believe that's true. They are checked on access, but if left alone, nothing will verify them. From what I've read, you need to set up a cron job that runs scrubbing on some regular schedule.
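The idea behind scrubbing can be illustrated with a toy model (not ZFS internals; `write_block` and `scrub` are made-up names, and real ZFS checksums block trees, not a flat dict):

```python
# Keeping a checksum alongside each block lets a periodic scrub detect
# silent corruption ("bit rot") that unverified storage would miss.
import hashlib

store = {}   # block id -> (data, checksum)

def write_block(bid, data):
    store[bid] = (data, hashlib.sha256(data).hexdigest())

def scrub():
    """Re-hash every block and report mismatches, like a scheduled
    `zpool scrub` would."""
    return [bid for bid, (data, chk) in store.items()
            if hashlib.sha256(data).hexdigest() != chk]

write_block(1, b"important data")
write_block(2, b"other data")
print(scrub())                     # [] -- everything verifies

# Simulate bit rot: flip a byte behind the filesystem's back.
data, chk = store[2]
store[2] = (b"0" + data[1:], chk)
print(scrub())                     # [2] -- corruption detected
```

With redundancy (mirrors or RAIDZ), the real filesystem can then rewrite the bad block from a good copy instead of merely reporting it.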
From my perspective, it has no real competitor under linux, which is why I use it. I don't consider brtfs mature enough for critical data. (Others can reasonably disagree, I have intentionally high standards for data durability.)
Aside from legal issues, he's talking out of his ass.
After EoL a colleague installed Linux with dmraid, LVM and xfs on the same hardware: much faster, more robust. Sorry, don't have numbers around anymore, stuff has been trashed since.
Oh, and btw., snapshots and larger numbers of filesystems (which Sun recommended instead of the missing Quota support) also slow things down to a crawl. ZFS is nice on paper and maybe nice to play with. Definitely simpler to use than anything else. But performance-wise it sucked big time, at least on Solaris.
ZFS for “play”?!
This... is just plain uninformed.
Not just me and my employer, but many (many) others rely on ZFS for critical production storage, and have done so for many years.
It’s actually very robust on Linux as well; the fact that FreeBSD has started to use the ZoL code base is quite telling.
Would freeBSD also be in the “play” and “not robust” category as well, hanging out together with Solaris?
Will it perform better than all in terms of writes/s? Most likely not, although by staying away from de-dup, having enough RAM, and adhering to the pretty much general recommendation to use only mirror vdevs in your pools, it can be competitive.
Something solid with data integrity guarantees? You can’t beat ZFS, imo.
This reminds me. We had one file server used mostly for package installs that used ZFS for storage. One day our java package stops installing. The package had become corrupt. So I force a manual ZFS scrub. No dice. Ok fine I’ll just replace the package. It seems to work but the next day it’s corrupt again. Weird. Ok I’ll download the package directly from Oracle again. The next day again it’s corrupt. I download a slightly different version. No problems. I grab the previous problematic package and put it in a different directory (with no other copies on the file system) - again it becomes corrupt.
There was something specific about the java package that ZFS just thought it needed to “fix”. If I had to guess it was getting the file hash confused. I’m pretty sure we had dedupe turned on so that may have factored into it.
Anyway that’s the first and only time I’ve seen a file system munge up a regular file for no reason - and it was on ZFS.
"play" comes from my distinct impression that the most vocal ZFS proponents are hobbyists and admins herding their pet servers (as opposed to cattle). ZFS comes at low/no cost nowadays and is easy to use, therefore ideal in this world.
I’ve only used zfs in two or three way mirror setup, on beefy boxes, where the issues you describe are minimal.
Also JBOD only.
The thing is that without checksumming you’ve actually no idea if you lose data.
I’ve had several pools over the years report automatic resilvering on checksum mismatches.
Usually it’s been disks acting up well before smart can tell, and reporting this has been invaluable.
Our backup servers (45 disks, 6-wide Z2 stripes) easily handle wire-speed 10G with a 32G ARC.
And you're just wrong about snapshots and filesystem counts.
ZFS is no speed demon, but it performs just fine if you set it up correctly and tune it.
Maybe you also don't see as massive an impact because your hardware is a lot faster. X4200s were predominantly meant to be cheap, not fast. No cache, insufficient RAM, slow controllers, etc.
The BMC controller couldn't speak to the disk controller so you had no out-of-band storage management.
I had to run a fleet of 300 of them; truly an awful time.
Wanna use LVM for snapshots? 33% performance hit for the entire LV per snapshot, by implementation.
ZFS? ~1% hit. I've never been able to see any difference at the workloads I run, whereas with LVM it was pervasive and inescapable.
Asking for a friend who uses XFS on LVM for disk heavy applications like database, file server, etc.
Essentially it comes down to this: a snapshot LV contains copies of old blocks which have been modified in the source LV. Whenever a block is updated in the source LV, LVM needs to check whether that block has previously been copied into each corresponding snapshot LV. For each snapshot LV where this is not the case, it needs to copy the block to that snapshot LV.
This means that there is O(n) complexity in the checking and copying. And in the case of "thin" LVs, it will also need to allocate the block to copy to, potentially for every snapshot LV in existence, making the process even slower. The effect is write amplification effectively proportional to the total number of snapshots.
ZFS snapshots, in comparison, cost essentially the same no matter how many you create, because the old blocks are put onto a "deadlist" of the most recent snapshot, and it doesn't need repeating for every other snapshot in existence. Older snapshots can reference them when needed, and if a snapshot is deleted, any blocks still referenced are moved to the next oldest snapshot. Blocks are never copied and only have a single direct owner. This makes the operations cheap.
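The difference in bookkeeping cost can be sketched with a toy model (illustrative only; real LVM and ZFS are far more involved, and the class names here are made up):

```python
# Compare copy-on-write bookkeeping: classic LVM snapshots copy an old
# block into every snapshot that lacks it (O(snapshots) per write);
# ZFS puts it on the newest snapshot's deadlist once (O(1) per write).

class LvmVolume:
    def __init__(self):
        self.snapshots = []   # each: dict of block -> old data
        self.blocks = {}
        self.copies = 0       # old-block copies performed

    def snapshot(self):
        self.snapshots.append({})

    def write(self, block, data):
        old = self.blocks.get(block)
        for snap in self.snapshots:      # check EVERY snapshot
            if block not in snap:
                snap[block] = old        # copy old block into it
                self.copies += 1
        self.blocks[block] = data

class ZfsDataset:
    def __init__(self):
        self.deadlists = []
        self.blocks = {}
        self.copies = 0

    def snapshot(self):
        self.deadlists.append([])

    def write(self, block, data):
        if self.deadlists and block in self.blocks:
            # single ownership transfer to the newest deadlist
            self.deadlists[-1].append((block, self.blocks[block]))
            self.copies += 1
        self.blocks[block] = data

lvm, zfs = LvmVolume(), ZfsDataset()
for fs in (lvm, zfs):
    fs.write(0, "v0")
    for _ in range(10):                  # take ten snapshots
        fs.snapshot()
    fs.write(0, "v1")                    # one overwrite

print(lvm.copies)  # 10 -- one copy per snapshot
print(zfs.copies)  # 1  -- one deadlist entry
```

One overwrite under ten snapshots costs ten copies in the LVM model but a single deadlist entry in the ZFS model, which is the write amplification difference described above.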
The overlying filesystem also lacks knowledge of the underlying storage. The snapshot must be able to accommodate writes up to and including the full size of the parent block device in order to remain readable, just like the old-style snapshots did. That's the fundamental problem with LVM snapshots; they can go read-only at any point in time if the space is exhausted, due to the implicit over-commit which occurs every time you create a snapshot.
The overheads with ZFS snapshots are completely explicit and all space is fully and transparently accounted for. You know exactly what is using space from the pool, and why, with a single command. With LVM separating the block storage from the filesystem, the cause of space usage is almost completely opaque. Just modifying files on the parent LV can kill a snapshot LV, while with ZFS this can never occur.
Please let me know which company this is, so I can ensure that I never end up working there by accident. Much obliged in advance, thank you kindly.
what? no. why would that be the case? You lose a single disk's performance due to the checksumming.
just from my personal NAS I can tell you that I can do transfers from my scratch drive (NVMe SSD) to the storage array at more than twice the speed of any individual drive in the array... and that's in rsync which is notably slower than a "native" mv or cp.
The one thing I will say is that it does struggle to keep up with NVMe SSDs, otherwise I've always seen it run at drive speed on anything spinning, no matter how many drives.
I think they are probably referring to the write performance of a RAIDZ VDEV being constrained by the performance of the slowest disc within the VDEV.
But yes, lots of RAM
And as another thread pointed out, stripe size is also an important parameter.
If there is no "approved" method for creating Linux drivers under licenses other than the GPL, that seems like a major problem that Linux should be working to address.
Expecting all Linux drivers to be GPL-licensed is unrealistic and just leads to crappy user experiences. nVidia is never going to release full-featured GPL'd drivers, and even cooperative vendors sometimes have NDAs which preclude releasing open source drivers.
Linux is able to run proprietary userspace software. Even most open source zealots agree that this is necessary. Why are all drivers expected to use the GPL?
P.S. Never mind the fact that ZFS is open source, just not GPL compatible.
P.P.S. There's a lot of technical underpinnings here that I'll readily admit I don't understand. If I speak out of ignorance, please feel free to correct me.
There is little incentive for Nvidia to maintain a linux specific driver, but because it is closed source the community cannot improve/fix it.
> Why are all drivers expected to use the GPL?
I think the answer to this is: drivers are expected to use the GPL if they want to be mainlined and maintained - as Linus said: other than that you are "on your own".
But it means I can't use Wayland. Wayland isn't critical for me, but since NVidia is refusing to implement GBM and using EGLStream instead, there's nothing I can do about it. It simply isn't worth NVidia's time to make Wayland work, so I'm stuck using X. If the driver were open-source someone would have submitted a GBM patch and I wouldn't be stuck in this predicament.
I can't wait for NVidia to have real competition in the ML space so I can ditch them.
Now you can't use something like Sway but their lead developer is too evangelical for my taste so even if I had an AMD/Intel card I would never use it.
You can do that on Intel and AMD drivers and other open source graphics drivers, which due to being open source allow 3rd parties like redhat to patch in GBM support in drivers and mesa when required.
Nvidia driver does not support GBM code paths. Therefore wayland does not work on nvidia. And because nvidia driver is not open source, someone else cannot patch GBM in.
What you cannot use is applications that use OpenGL or Vulkan acceleration. GBM is used for sharing buffers across APIs handled by GPU. If your Wayland clients use just shm to communicate with compositor, it will work.
Why is Intel not a competition? In laptops, I want only Intel, nothing else. It is the smoothest/most reliable/least buggy thing you may have.
For most uses, Intel GPU is fine.
No doubt someone more knowledgeable about Linux could fix this issue, but I never had any issues with my nVidia blobs. That's not to say nVidia don't have their own issues.
I think parent comment wasn't asking for third party, non-GPL drivers to be mainlined, but for a stable interface for out-of-tree drivers.
The idea that Linux needs better support for out of tree drivers is like someone going to church and saying to the priest "I don't care about this Jesus stuff but can I have some free wine and cookies please".
Full disclosure my day job is to write out of tree drivers for Linux :)
How do the Linux and Windows drivers compare on matters related to CUDA?
The _desktop_ situation is worse, though perfectly functional. But I boot into Windows when I want battery life and quiet fans on my laptop.
Which is really kinda of hilarious considering that so much modern hardware requires proprietary firmware blobs to run.
Nvidia is pretty much the only remaining holdout here on the hardware driver front. I don't see why they should get special treatment when the 100%-GPL model works for everyone else.
But it is a problem that you can't reliably have out-of-tree modules.
Also, Linus is wrong: there's no reason that the ZoL project can't keep the ZFS module in working order, with some lag relative to updates to the Linux mainline, so as long as you stay on supported kernels and the ZoL project remains alive, then of course you can use ZFS. And you should use ZFS because it's awesome.
That is the bit I'm trying to get at. Yes it would be best if ZFS was just part of Linux, and maybe some day it can be after Oracle is dead and gone (or under a new leadership and strategy). But it's almost beside the point.
Every other OS supports installing drivers that aren't "part" of the OS. I don't understand why Linux is so hostile to this very real use case. Sure it's not ideal, but the world is full of compromises.
But that is also not without reason. Linux balances in a field where they are, and want to stay, open source, while a lot of users (and sometimes the companies paying some "contributors", too) are companies which are not always that happy about open source. So if it were easy to keep drivers proprietary and still get a good experience out of it, they would have very little incentive to ever make any in-tree GPL drivers, and Linux would run the risk of becoming a skeleton you can't use without accepting/buying drivers from multiple 3rd parties.
Though take that argument with a (large) grain of salt; there are counter-arguments too, e.g. the LLVM project, which is much more permissively licensed and still maintained well, but then it is also a very different kind of software.
That shouldn't actually matter; it should just depend on the license. But millions in legal fees says otherwise.
As a Linux user and an ex android user, I absolutely disagree and would add that the GPL requirement for drivers is probably the biggest feature Linux has!
Android did start making this less of problem with HAL and stuff, but it's still a problem, just a less big one.
The issue here is that which parts of the kernel API are available to non-GPL modules has been made a moving target from version to version, which might as well be interpreted as "just don't bother anymore".
The problem is already addressed: if someone wants to contribute code to the project then its license must be compatible with the prior work contributed to the project. That's it.
With that, they do expose a userspace filesystem driver interface, FUSE. There used to be a FUSE ZFS driver, though I believe it's mostly dead now (but I never used it, so I don't know for sure). While it's not the same as an actual kernel FS driver (performance in particular), it effectively allows what you're asking for by exposing an API you can write a filesystem driver against without it being part of the kernel code.
Presumably, such a thing would just be a set of kernel APIs that would parallel the FUSE APIs, but would exist for (DKMS) kernel modules to use, rather than for userland processes to use. Due to the parallel, it would only be the work of a couple hours to port any existing FUSE server over into being such a kernel module.
And, given how much code could be shared with FUSE support, adding support for this wouldn't even require much of a patch.
Seems like an "obvious win", really.
That's why Windows moved WSL2 to being a kernel running on hyper-v rather than in kernel. Their IFS (installable filesystem driver) stack screws up where the buffer cache manager is, and it was pretty much impossible to change. At that point, the real apples to apples comparison left NT lacking. Running a full kernel in another VM ended up being faster because of this.
With that, I doubt the performance issues are directly because it runs in userspace, they're likely due to the marshaling/transferring from the in-kernel APIs into the FUSE API (And the complexity that comes with talking to userspace for something like a filesystem), as well as the fact that the FUSE program has to call back into the kernel via the syscall interface. Both of those things are not easily fixable - FKSE would still effectively be using the FUSE APIs, and syscalls don't translate directly into callable kernel functions (and definitely not the ones you should be using).
There are ways and means to do this. It would be perfectly possible to have a versioned VFS interface and permit filesystems to provide multiple implementations to interoperate with different kernel versions.
I can understand the desire to be unconstrained by legacy technical debt and be able to change code at will. I would find that liberating. However, this is no longer a project run by dedicated amateurs. It made it to the top, and at this point in time, it seems undisciplined and anachronistic.
Yes, which Linus has also poo-pooed:
"People who think that userspace filesystems are realistic for anything but toys are just misguided."
Linus is just plain wrong on this one.
> fuse works fine if the thing being exported is some random low-use interface to a fundamentally slow device. But for something like your root filesystem? Nope. Not going to happen.
His point is that FUSE is useful and fine for things that aren't performance critical, but it's fundamentally too slow for cases where performance is relevant.
It is the policy of linux development at work. Linux kernel doesn't break userspace, you could safely upgrade kernel, your userspace would work nice. But Linux kernel breaks inner APIs easily, and kernel developers take responsibility for all the code. So if a patch in memory management subsystem broke some drivers, kernel developers would find breakages and fix them.
> We don't consider Windows drivers part of Windows.
Yeah, because the Windows kernel breaks backward compatibility in kernel space less frequently, and because hardware vendors are ready to maintain drivers for Windows.
It seems that some people are oblivious to the actual problem, which is some people want their code to be mixed into the source code of a software project without having to comply with the rightholder's wishes, as if their will shouldn't be respected.
I'm not sure you can commit your source code to the windows kernel project.
The problem is the nature of changes, and people questioning if there is any good _technical_ reason why some of the changes need to be the way they are done.
It's less a thing Linux can work on than a thing lawmakers/courts would have to make binding decisions on, which would make it clear whether this usage is OK or not. But in practice this can only be decided on a case-by-case basis.
The only way Linux could work on this is by:
1. Adding an exception to their GPL license that excludes kernel modules from GPL constraints (which obviously won't happen, for a bunch of reasons).
2. Turning Linux into a microkernel with userland drivers, and driver interfaces that are not license-encumbered (which again won't happen, because that would be a completely different system).
3. Oracle re-licensing ZFS under a permissive open source license (e.g. dual-licensing it; it doesn't need to be GPL, just GPL-compatible, e.g. Apache v2). Guess what, that won't happen either, or at least I would be very surprised. I mean, Oracle is running out of products people _want_ to buy from them and increasingly operates in an area where they (ab)use the license/copyright/patent system to earn their money and force people to buy their products (or at least somehow pay license fees to them).
A shim layer is a poor legal bet. It assumes that a judge who might not have much technical knowledge will agree that putting this little technical trickery between the two incompatible works somehow turns it from being a single combined work into two cleanly separated works. It could work, but it could also very easily be seen as meaningless obfuscation.
> Why are all drivers expected to use the GPL
Because a driver is tightly dependent on the kernel. It is this relationship that distinguishes two works from a single work. An easy way to see this is how a music video works. If I create a file with a video part and an audio part, and distribute it, legally this will be seen as me distributing a single work. I also need additional copyright permission in order to create such a derivative work, rights that go beyond just distributing the different parts. If I were to argue in court that I am just distributing two different works, then the relationship between the video and the music would be put into question.
Userspace software is generally seen as an independent work. One reason is that such software can run on multiple platforms, but the primary reason is that people simply don't see it as an extension of the kernel.
It's a feature, not a bug. Linux is intentionally hostile to binary-blob drivers. Torvalds described his decision to go with the GPLv2 licence as "the best thing I ever did".
This licensing decision sets Linux apart from BSD, and is probably the reason Linux has taken over the world. It's not that Linux is technically superior to FreeBSD or OpenSolaris.
> Expecting all Linux drivers to be GPL-licensed is unrealistic and just leads to crappy user experiences
'Unrealistic'? Again, Linux took over the world!
As for nVidia's proprietary graphics drivers, they're an unusual case. To quote Linus: "I personally believe that some modules may be considered to not be derived works simply because they weren't designed for Linux and don't depend on any special Linux behaviour."
Because of the 'derived works' concept.
The GPL wasn't intended to overreach to the point that a GPL web server would require that only GPL-compatible web browsers could connect to it, but it was intended to block the creation of a non-free fork of a GPL codebase. There are edge-cases, as there are with everything, such as the nVidia driver situation I mentioned above.
Vendors are expected to merge their drivers in mainline because that is the path to getting a well-supported and well-tested driver. Drivers that get merged are expected to use a GPL2-compatible license because that is the license of the Linux kernel. If you're wondering why the kernel community does not care about supporting an API for use in closed-source drivers, it's because it's fundamentally incompatible with the way kernel development actually works, and the resulting experience is even more crappy anyway. Variations of this question get asked so often that there are multiple pages of documentation about it.
The tl;dr is that closed-source drivers get pinned to the kernel version they're built for and lag behind. When the vendor decides to stop supporting the hardware, the drivers stop being built for new kernel versions and you can basically never upgrade your kernel after that. In practice it means you are forced to use that vendor's distro if you want things to work properly.
>[...] nVidia is never going to release full-featured GPL'd drivers.
All that says to me is that if you want your hardware to be future-proof, never buy nvidia. All the other Linux vendors have figured out that it's nonsensical to sell someone a piece of hardware that can't be operated without secret bits of code. If you ever wondered why Linus was flipping nvidia the bird in that video that was going around a few years ago... well now you know.
To answer your excellent question (and ignore the somewhat unfortunate slam on people who seem to differ with your way of thinking): it is an intentional goal of software freedom. The idea of a free software license is to let people use the software provided they agree not to distribute changes to it in a way that leaves downstream users with fewer options than they would have had with the original software.
Some people are at odds with the options available with licenses like the GPL. Some think they are too restrictive. Some think they are too permissive. Some think they are just right. With respect to your question, whether the GPL hits a sweet spot is neither here nor there. What's important is that the original author decided that it did and chose the license. I don't imagine you intend to argue that a person should not be able to choose the license that is best for them, so I'll just leave it at that.
The root of the question is "What constitutes a change to the software?" Is it if we modify the original code? What if we add code? What if we add a completely new file to the code? What if we add a completely new library and simply link it to the code? What if we interact with a module system at runtime and link to the code that way?
The answers to these questions are not well defined. Some of them have been tested in court, while others have not. There are many opinions on which of these constitutes changing of the original software. These opinions vary wildly, but we won't get a definitive answer until the issues are brought up in court.
Until then, as a third party who wishes to interact with the software, you have a few choices. You can simply take your chances and do whatever you want. You might be sued by someone who has standing to sue. You might win the case even if you are sued. It's a risk, and in some cases the risk is higher than others (probably roughly in the order I listed the questions above).
Another possibility is to follow the intent of the original author. You can ask them, "How do you define a change to the software?" You may agree with their answer or not, but it is a completely valid course of action to follow their intent regardless of your opinion.
Your question is: why are all drivers expected to use the GPL? The answer is because drivers are considered by the author to be an extension of the software and hence to be covered by the same license. You are absolutely free to disagree, but it will not change the original author's opinion. You are also able to decide not to abide by the author's opinion. This may open you up to the risk of being sued. Or it may not.
Now, the question unasked is probably the more interesting question. Why does Linus want the drivers to be considered an extension of the original software? I think the answer is that he sees more advantages in the way people interact in that system than disadvantages. There are certainly disadvantages and things that we currently can't use, but for many people this is not a massive hardship. I think the question you might want to put to him is, what advantages have you realised over the years from maintaining the license boundaries as they are? I don't actually know the answer to this question, but would be very interested to hear Linus's opinion.
> The root of the question is "What determines a change to the software". [...] The answers to these questions are not well defined.
And that's fair, but what confuses me is that I never see this question raised on non-Linux platforms. No one considers Windows drivers a derivative of Windows, or Mac kernel extensions a derivative of Darwin.
Should the currently-in-development Windows ZFS port reach maturity and gain widespread adoption (which feels possible!), do you foresee a possibility of Oracle suing? If not, why is Linux different?
Perhaps they do, but the difference is that on those platforms the licensing doesn't make the drivers' status as derivative works matter. Those platforms have their own restrictions on what drivers they allow: Mac doesn't even allow unsigned drivers anymore, and signed drivers have to go through a manual approval process. And don't forget iOS, which doesn't support user-loadable drivers at all.
>Should the currently-in-development Windows ZFS port reach maturity and gain widespread adoption (which feels possible!), do you foresee a possibility of Oracle suing? If not, why is Linux different?
I'm not sure; I haven't used Windows in many years and I don't know their policies. But see what I said earlier: the simple answer is that the license is different from the license of Linux. For more detail, the question you should be asking is: is the CDDL incompatible with Windows licensing?
Just to clarify one little thing, because it appears to be something of a common misconception:
> Mac doesn't even allow unsigned drivers anymore
You can absolutely still install unsigned drivers (kernel extensions) on macOS, the user just needs to run a Terminal command from recovery mode. This is a one-time process that takes all of five minutes if you know what you're doing.
You can theoretically replace the Darwin kernel with your own version too. macOS is not iOS, you can completely open it up if you want.
He is claiming that it comes down to the user's choice, which would be just fine if that were true. The problem is that Linux has purposely taken steps to hinder that choice.