Linus: Don't Use ZFS (realworldtech.com)
572 points by rbanffy 8 days ago | 555 comments





Here's his reasoning:

"honestly, there is no way I can merge any of the ZFS efforts until I get an official letter from Oracle that is signed by their main legal counsel or preferably by Larry Ellison himself that says that yes, it's ok to do so and treat the end result as GPL'd.

Other people think it can be ok to merge ZFS code into the kernel and that the module interface makes it ok, and that's their decision. But considering Oracle's litigious nature, and the questions over licensing, there's no way I can feel safe in ever doing so.

And I'm not at all interested in some "ZFS shim layer" thing either that some people seem to think would isolate the two projects. That adds no value to our side, and given Oracle's interface copyright suits (see Java), I don't think it's any real licensing win either."


Btrfs crashed for me on two occasions. After the last time, around 2 years back, I installed ZFS (which I have been using for ~10 years on a FreeBSD server), and it has worked like a charm since then.

I understand Linus's reasoning, but there is just no way I will install btrfs, like ever. I would rather not update the kernel (I am running ZFS on a Fedora root with regular kernel updates and scripts which verify that everything is fine with the kernel modules prior to reboot) than use a file system that crashed twice in two years.
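The gist of that pre-reboot check is something like this (a rough sketch, not my actual script; it just assumes the zfs module ends up somewhere under /lib/modules/<version>/, e.g. via DKMS):

    # Rough sketch (not the actual script): before rebooting into a new kernel,
    # check that a zfs module exists for every installed kernel version.
    # Assumes modules land somewhere under /lib/modules/<version>/, e.g. via DKMS.
    from pathlib import Path

    def kernels_missing_zfs(modules_root="/lib/modules"):
        missing = []
        for kdir in sorted(Path(modules_root).iterdir()):
            if not kdir.is_dir():
                continue
            # the module may be compressed (zfs.ko.xz / zfs.ko.zst) on some distros
            if not any(kdir.rglob("zfs.ko*")):
                missing.append(kdir.name)
        return missing

    if __name__ == "__main__":
        broken = kernels_missing_zfs()
        if broken:
            print("Do NOT reboot - no zfs module for:", ", ".join(broken))
        else:
            print("All installed kernels have a zfs module.")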

Yes, it is very annoying if an update breaks the fs, but currently:

- in 2 years, btrfs crashed on its own twice

- in the next 2 years, an update never broke ZFS

As far as I am concerned, the case for zfs is clear.

This might be helpful to someone: https://www.csparks.com/BootFedoraZFS/index.md

Anyway, Linus is going too far with his GPL agenda. The MODULE_LICENSE handling for kernel modules explains why hardware is less supported on Linux - instead of the devs focusing on getting more support from 3rd-party companies, they try to force them to go GPL. Once you set MODULE_LICENSE to non-GPL, you quickly figure out that you can't use most of the kernel calls. Not the code. The calls.


The Linux kernel has been released under GPL2 license since day 1, and I don't think that's ever going to change. Linus is more pragmatic than many of his detractors think - he thankfully refused to migrate to GPL3 because the stricter clauses would have scared away a lot of for-profit users and contributors.

Relaxing to anything more permissive than GPL2 would instead mean the end of Linux as we know it. A more permissive license means that nothing would prevent Google or Microsoft from releasing their own closed-source Linux, or replacing the source code of most of the modules with hex blobs.

I believe that GPL2 is a good trade-off for a project like Linux, and it's good that we don't compromise on anything less than that.

Even though I agree on the superiority of ZFS for many applications, I think that the blame for the missed inclusion in the kernel is on Oracle's side. The lesson learned from NTFS should be that if a filesystem is good and people want to use it, then you should make sure that the drivers for that filesystem are as widely available as possible. If you don't do it, then someone sooner or later will reverse engineer the filesystem anyway. The success of a filesystem is measured by the number of servers that use it, not by the amount of money that you can make out of it. For once Oracle should act more like a tech company and less like a legal firm specialised in patent exploitation.


The blame is on Oracle's side for sure. No question about it.

> or replacing the source code of most of the modules with hex blobs.

Ok, good point. I am no longer pissed off about MODULE_LICENSE; I hadn't even thought about that.


I agree with the stance on btrfs. Around the same time (2 years back), it crashed on me while I was trying to use it for an external hard disk attached to a Raspberry Pi. Nothing fancy. Since then, I can't tolerate fs crashes; for a user, it's supposed to be one of the most reliable layers.

Concerning the BTRFS fs:

I did use it as well many years ago (probably around 2012-2015) in a RAID5 configuration after reading a lot of positive comments about this next-gen fs => after a few weeks my RAID started falling apart (while performing normal operations!) as I got all kinds of weird problems => my conclusion was that the RAID was corrupt and couldn't be fixed => no big problem, as I did have a backup, but that definitely ruined my initial BTRFS experience. During those times, even though the fs was new and even though there were warnings about it (being new), everybody was very optimistic/positive about it, but in my case that experiment was a disaster.

That event has held me back until today from trying to use it again. I admit that today it might be a lot better than in the past, but as people were already positive about it in the past (and in my case it still broke), it's difficult for me now to say "aha - now the general positive opinion is probably more realistic than in the past", due e.g. to that bug that can potentially still destroy a RAID (the "write hole" bug): personally I think that if BTRFS still makes that RAID functionality available while it has such a big bug, while at the same time advertising it as a great feature of the fs, the "unrealistically positive" behaviour is still present, and therefore I still cannot trust it. Additionally, that bug being open since forever makes me think that it's really hard to fix, which in turn makes me think that the foundation and/or code of BTRFS is bad (which would be the reason why that bug cannot be fixed quickly) and that therefore even more complicated bugs might potentially show up in the future.

Concerning alternatives:

For a looong time now I have been writing and testing a program which ends up creating a big database (using "Yandex ClickHouse" for the main DB) distributed across multiple hosts, where each one uses multiple HDDs to save the data, and which at the same time is able to fight against potential "bitrot" ( https://en.wikipedia.org/wiki/Data_degradation ) without having to resync the whole local storage each time a byte on some HDD loses its value. Excluding BTRFS, the only other candidate I found that performs checksums on data is ZFSoL (both XFS and NILFS2 do checksums, but only on metadata).
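The flavour of per-file verification I mean is roughly this (a toy sketch only; the manifest file and paths are made up for illustration):

    # Toy sketch of per-file bitrot detection (paths and manifest format are made
    # up). Build a checksum manifest once, then re-verify periodically so only the
    # files whose checksum changed need to be re-synced from another replica.
    import hashlib, json
    from pathlib import Path

    def sha256(path, bufsize=1 << 20):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            while chunk := f.read(bufsize):
                h.update(chunk)
        return h.hexdigest()

    def build_manifest(root, manifest_file="manifest.json"):
        manifest = {str(p): sha256(p) for p in Path(root).rglob("*") if p.is_file()}
        Path(manifest_file).write_text(json.dumps(manifest))

    def verify(manifest_file="manifest.json"):
        old = json.loads(Path(manifest_file).read_text())
        # every returned path is a candidate for repair from another host's copy
        return [p for p, digest in old.items()
                if Path(p).is_file() and sha256(p) != digest]

A checksumming filesystem gives you the same effect below the application, plus automatic repair when a redundant copy exists.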

Excluding BTRFS because of the reasons mentioned above, I was left only with ZFS.

I've now been using ZFSoL for a couple of months and so far everything has gone very well (a bit difficult to understand & deal with at the beginning, but extremely flexible), and performance is good as well (but to be fair that's easy in combination with the ClickHouse DB, as the DB itself already writes data in a CoW way, therefore blocks of a table stored on ZFS are always very likely to be contiguous).

On one hand, technically, I'm now happy. On the other hand, I do admit that the licensing problems and the non-integration of ZFSoL into the kernel do carry risks. Unluckily, I just don't see any alternative.

I do donate something monthly to https://www.patreon.com/bcachefs but I don't have high hopes - not much is happening, and BCACHE (even though it is currently integrated in the kernel) hasn't been very good in my experience (https://github.com/akiradeveloper/dm-writeboost worked A LOT better, but I'm not using it anymore as I no longer have a use case for it, and it was a risk as well, not being included in the kernel), therefore BCACHEFS might end up the same.

Bah :(


I'd avoid making an argument for or against a filesystem on the basis of anecdotal evidence.

For your own personal use, your own personal anecdotes are really all that matter.

Your personal anecdotes are indeed all that matter when it comes to describing your past.

When it comes to predicting your future, though, your personal anecdotes may not hold up against more substantial data.


Btrfs, like OCFS, is pretty much junk. You can do everything you need to on a local disk with XFS, and if you need clever features, buy a NetApp.

Both ZFS and BTRFS are essentially Oracle now. BTRFS was largely an effort by Oracle to copy Sun's ZFS advantages in a crappy way, which became moot once they acquired Sun. ZFS also requires (a lot of) ECC memory for reliable operation. It's a great tech; pity it's dying a slow death.

I'd argue that other file systems also require ECC RAM to maximize reliability. ZFS just makes it much more explicit in its docs and surfaces errors rather than silently handing back memory-corrupted data.

ZFS needs ECC just as much as any other file system. That is, it has no way of detecting in-memory errors. So if you want your data to actually be written correctly, it's a good idea to use ECC. But the myth that you "need" ECC with ZFS is completely wrong. It would be better if you did have ECC, but don't let that stop you from using ZFS.

As for it needing a lot of memory, that is also not true. The ARC will use your memory if it's available, because it's available! You paid good money for it, so why not actually use it to make things faster?


I worked at SUN when ZFS was "invented" and the emphasis on a large amount of proper ECC memory was strong, especially in conjunction with Solaris Zones. I can't recall if it was 1GB of RAM per 1TB of storage or something similar due to how it performed deduplication and stored indices in hot memory. And that was also the reason for insisting on ECC, in order to make sure you won't get your stored indices and shared blocks messed up, leading to major uncorrectable errors.

I can see how a (perhaps, less than competitive) hardware company would want you to think that :)

Sure, all about internal marketing, right? :D

But there was nothing like that on the market at that time anyway.


I have examined all the counterarguments against ZFS myself and none of them have been confirmed. ZFS is stable and not RAM-hungry, as is constantly claimed. It has sensible defaults, namely to use all RAM that is available and to release it quickly when it is needed elsewhere. ZFS on a Raspberry Pi? No problem. I myself have a dual-socket, 24-core Intel server with 128 GB RAM and a virtual Windows SQL Server instance running on it. For fun, I limited the amount of RAM for ZFS to 40 MB. Runs without problems.
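For reference, on ZFS on Linux the ARC ceiling is the zfs_arc_max module parameter, and the current ARC state is visible in /proc/spl/kstat/zfs/arcstats. A rough sketch (assuming the ZoL/OpenZFS module is loaded) of checking both:

    # Rough sketch, assuming the ZFS-on-Linux module is loaded. The ARC cap is the
    # zfs_arc_max module parameter (0 means "use the built-in default"); current
    # ARC usage is reported in /proc/spl/kstat/zfs/arcstats.
    from pathlib import Path

    def arc_stats():
        stats = {}
        for line in Path("/proc/spl/kstat/zfs/arcstats").read_text().splitlines()[2:]:
            fields = line.split()
            if len(fields) == 3:          # data lines look like: "size  4  123456789"
                name, _, value = fields
                stats[name] = int(value)
        return stats

    s = arc_stats()
    print("ARC size:  %d MiB" % (s["size"] // 2**20))
    print("ARC limit: %d MiB" % (s["c_max"] // 2**20))
    print("zfs_arc_max:", Path("/sys/module/zfs/parameters/zfs_arc_max").read_text().strip())
    # A persistent cap is typically set with a line like
    # "options zfs zfs_arc_max=4294967296" (4 GiB) in /etc/modprobe.d/zfs.conf.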

That's his reasoning for not merging ZFS code, not for generally avoiding ZFS.

Here are his reasons for generally avoiding ZFS from what I consider most important to least.

- The kernel team may break it at any time, and won't care if they do.

- It doesn't seem to be well-maintained.

- Performance is not that great compared to the alternatives.

- Using it opens you up to the threat of lawsuits from Oracle. Given history, this is a real threat. (This is one that should be high for Linus but not for me - there is no conceivable reason that Oracle would want to threaten me with a lawsuit.)


I'm baffled by such arguments.

> It doesn't seem to be well-maintained.

The last commit is from 3 hours ago: https://github.com/zfsonlinux/zfs/commits/master. They have dozens of commits per month. The last minor release, 0.8, brought significant improvements (my favorite: FS-level encryption).

Or maybe this refers to the (initial) 5.0 kernel incompatibility? That wasn't the ZFS dev team's fault.

> Performance is not that great compared to the alternatives.

There are no (stable) alternatives. BTRFS certainly not, as it's "under heavy development"¹ (since... forever).

> The kernel team may break it at any time, and won't care if they do.

That's true; however, the amount of breakage is no different from any other out-of-tree module, and it is unlikely to happen with a patch version of a working kernel (in fact, it happened with the 5.0 release).

> Using it opens you up to the threat of lawsuits from Oracle. Given history, this is a real threat. (This is one that should be high for Linus but not for me - there is no conceivable reason that Oracle would want to threaten me with a lawsuit.)

"Using" it won't open to lawsuits; ZFS has a CDDL license, which is a free and open-source software license.

The problem is (taking Ubuntu as representative) shipping the compiled module along with the kernel, which is an entirely different matter.

---

[¹] https://btrfs.wiki.kernel.org/index.php/Main_Page#Stability_...


> ZFS has a CDDL license

Java is GPLv2+CPE. That didn't stop Oracle because, as Linus pointed out in the email, Oracle regards their APIs as a separate entity to their code.


Google's Java implementation wasn't GPL-licensed, so neither its implementation nor its interface could have been covered by OpenJDK being GPLv2. I don't think RMS would sit idly by either if someone took GCC and forked it under the Apache license.

But Google didn't fork the OpenJDK; they forked Apache Harmony, which was already Apache-licensed.

So it's not comparable to GCC, but rather to forking Clang and keeping Clang's license. I doubt RMS would be able to say anything.


> There are no (stable) alternatives. BTRFS certainly not, as it's "under heavy development"¹ (since... forever).

Note that they don't mean "it's unstable," just "there are significant improvements between versions." Most importantly:

> The filesystem disk format is stable; this means it is not expected to change unless there are very strong reasons to do so. If there is a format change, filesystems which implement the previous disk format will continue to be mountable and usable by newer kernels.

...and only _new features_ are expected to stabilise:

> As with all software, newly added features may need a few releases to stabilize.

So overall, at least as far as their own claims go, this is not "heavy development" as in "don't use."


Some features, such as RAID5, were still firmly in "don't use if you value your data" territory last I looked. So it is important to be informed about what can be used and what might be more dangerous with btrfs.

Keep in mind that RAID5 isn’t feasible with multi-TB disks (the probability of failed blocks when rebuilding the array is far too high). That said, RAID6 also suffers the same write-hole problem with Btrfs. Personally I choose RAIDZ2 instead.

> Keep in mind that RAID5 isn’t feasible with multi-TB disks (the probability of failed blocks when rebuilding the array is far too high).

What makes you say that? I've seen plenty of people make this claim based on URE rates, but I've also not seen any evidence that it is a real problem for a 3-4 drive setup. Modern drives are specced at 1 URE per 10^15 bits read (or better), so less than 1 URE in 125 TB read. Even if a rebuild did fail, you could just start over from a backup. Sure, if the array is mission critical and you have the money, use something with more redundancy, but I don't think RAID5 is infeasible in general.


Last time I checked (a few years ago, I must say), a 10^15 URE rate was only for enterprise-grade drives and not for consumer-level ones, where most drives have a 10^14 URE rate. Which means your rebuild is almost guaranteed to fail on a large-ish RAID setup. So yeah, RAID is still feasible with multi-TB disks if you have the money to buy disks with the appropriate reliability. For the common folk, RAID is effectively dead with today's disk sizes.
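Back-of-the-envelope, treating UREs as independent per-bit events at the datasheet rate (a big simplification) for a hypothetical 4 x 8 TB RAID5 rebuild:

    # Back-of-the-envelope only: probability of finishing a RAID5 rebuild with no
    # URE, treating UREs as independent per-bit events at the datasheet rate.
    # Hypothetical array: 4 x 8 TB drives, so a rebuild reads the 3 surviving disks.
    import math

    bits_read = 3 * 8e12 * 8              # ~24 TB read during the rebuild, in bits

    for rate in (1e-15, 1e-14):           # enterprise-ish spec vs. typical consumer spec
        p_clean = math.exp(-rate * bits_read)   # ~= (1 - rate) ** bits_read
        print("1 URE per %.0e bits -> P(rebuild sees no URE) ~ %.0f%%"
              % (1 / rate, 100 * p_clean))

    # Prints roughly 83% for the 10^15 spec and 15% for the 10^14 spec, which is
    # why the two comments above land on such different conclusions.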

Theoretically, if you have a good RAID5, without serious write-hole and similar issues, then it is strictly better than no RAID and worse than RAID6 and RAID1.

* All localized errors are correctable, unless they overlap on different disks or result in drive ejection. This precisely fixes the UREs of non-RAID drives.

* If a complete drive fails, then you have a chance of losing some data from the UREs / localized errors. This is approximately the same as if you used no RAID.

As for URE incidence rate - people use multi-TB drives without RAID, yet data loss does not seem prevalent. I'd say it depends .. a lot.

If you use a crappy RAID5, that ejects a drive on a drive partial/transient/read failure, then yes, it's bad, even worse than no RAID.

That being said, I have no idea whether a good RAID5 implementation is available, one that is well interfaced or integrated into filesystem.


I have a couple of Seagate IronWolf drives that are rated at 1 URE per 10^15 bits read and, sure, depending on the capacity you want (basically 8 TB and smaller desktop drives are super cheap), they do cost up to 40% more than their Barracuda cousins, but we're still well within the realm of cheap SATA storage.

Manufacturer-specified URE rates are extremely conservative. If UREs at those rates were really a thing, then you'd notice transient errors during ZFS scrubs, which are effectively a "rebuild" that doesn't rebuild anything.

To be sure, it's entirely feasible, just not prudent with today's typical disk capacities.

Feasible is different than possible, and carries a strong connotation of being suitable/able to be done successfully. Many things are possible, many of those things are not feasible.

Btrfs has many more problems than dataloss with RAID5.

It has terrible performance problems under many typical usage scenarios. This is a direct consequence of the choice of core on-disk data structures. There's no workaround without a complete redesign.

It can become unbalanced and cease functioning entirely. Some workloads can trigger this in a matter of hours. Unheard of for any other filesystem.

It suffers from critical dataloss bugs in setups other than RAID5. They have solved a number of these, but when reliability is its key selling point many of us have concerns that there is still a high chance that many still exist, particularly in poorly-exercised codepaths which are run in rare circumstances such as when critical faults occur.

And that's only getting started...


There are differing opinions on BTRFS's suitability in production - on one hand it's the default filesystem of SUSE; on the other, Red Hat has deprecated BTRFS support because they see it as not being production-ready, and they don't see it becoming production-ready in the near future. They also feel that the more legacy Linux filesystems have added features to compete.

It's also the default file system of millions of Synology NASes running in consumer hands (although Synology shimmed on their own RAID5/6 support)

Kroger (and their subsidiaries like QFC, Fred Meyer, Fry's Marketplace, etc), Walmart, Safeway (and Albertsons/Randalls) all use Suse with BTRFS for their point of sale systems.

Synology uses standard linux md (for btrfs too). Even SHR (Synology Hybrid RAID) is just different partitions on the drive allocated to different volumes, so you can use mixed-capacity drives effectively.

Right, instead of BTRFS RAID5/6, they use Linux md raid, but I believe they have custom patches to BTRFS to "punch through" information from md, so that when BTRFS has a checksum mismatch it can use the md raid mirror disk for repair.


But then, your personal requirements/use cases might not be the same as Facebook's. (And this does not only apply to Btrfs[1]/ZFS, it also applies to GlusterFS, use of specific hardware, ...)

[1] which I used for nearly two years on a small desktop machine on a daily basis; ended up with (minor?) errors on the file system that could not be repaired and decided to switch to ZFS. No regrets, nor similar errors since.


Check what features of BTRFS SUSE actually uses and considers supported/supportable.

bcachefs deserves to be heavily supported; it doesn't get nearly enough support for what it sets out to do: https://www.patreon.com/bcachefs

I've been looking forward to using bcachefs as I had a few bad experiences with btrfs.

Is bcachefs more-or-less ready for some use cases now? Does it still support caching layers like bcache did?


It's quite usable, but of course, do not trust it with your unique unbacked-up data yet. I use it as a main FS for a desktop workstation and I'm pretty happy with it. Waiting impatiently for EC to be implemented for efficient pooling of multiple devices.

Regarding caching: "Bcachefs allows you to specify disks (or groups thereof) to be used for three categories of I/O: foreground, background, and promote. Foreground devices accept writes, whose data is copied to background devices asynchronously, and the hot subset of which is copied to the promote devices for performance."


To my knowledge, caching layers are supported, but they require some setup and don't have much documentation for that setup right now.

If all you need is a simple root FS that is CoW and checksummed, bcachefs works pretty well, in my experience. I've been using it productively as a root and home FS for about two years or so.


Many of the advanced features aren't implemented yet though, like compression, encryption, snapshots, RAID5/6....

Compression and encryption have been implemented, but not snapshots and RAID5/6.

why would you want to embed raid5/6 in the filesystem layer? Linux has battle-tested mdraid for this, I'm not going to trust a new filesystem's own implementation over it.

Same for encryption, there are already existing crypto layers both on the block and filesystem (as an overlay) level.


Because the FS can be deeply integrated with the RAID implementation. With a normal RAID, if the data at some address is different between the two disks, there's no way for the fs to tell which is correct, because the RAID code essentially just picks one; it can't even see the other. With ZFS, for example, there is a checksum stored with the data, so when you read, zfs will check the data on both and pick the correct one. It will also overwrite the incorrect version with the correct one, and log the error. It's the same kind of story with encryption: if it's built in, you can do things like incremental backups of an encrypted drive without ever decrypting it on the target.
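Conceptually (not ZFS's actual code; the device API here is made up), a checksum-aware mirror read looks something like this:

    # Conceptual sketch only (not ZFS code); the device objects and their
    # read_block/write_block/log_checksum_error methods are made up. The point is
    # that the FS stores a checksum with the block pointer, so it can tell which
    # mirror copy is good, return it, and repair the bad copy in place.
    import hashlib

    def checksum(data: bytes) -> bytes:
        return hashlib.sha256(data).digest()

    def mirror_read(devices, block_no, expected_checksum):
        good, bad = None, []
        for dev in devices:                       # try one copy at a time
            data = dev.read_block(block_no)
            if checksum(data) == expected_checksum:
                good = data
                break
            bad.append(dev)                       # remember copies that failed
        if good is None:
            raise IOError("all copies of block %d failed checksum" % block_no)
        for dev in bad:                           # self-heal only the bad copies
            dev.write_block(block_no, good)
            dev.log_checksum_error(block_no)
        return good

Note that a real implementation only falls back to the other copies on a checksum mismatch, so the common-case read cost stays the same as a plain mirror.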

> when you read, zfs will check the data on both and pick the correct one.

Are you sure about that? Always reading both doubles read I/O, and benchmarks show no such effect.

> there's no way for the fs to tell which is correct

This is not an immutable fact that precludes keeping the RAID implementation separate. If the FS reads data and gets a checksum mismatch, it should be able to use ioctls (or equivalent) to select specific copies/shards and figure out which ones are good. I work on one of the four or five largest storage systems in the world, and have written code to do exactly this (except that it's Reed-Solomon rather than RAID). I've seen it detect and fix bad blocks, many times. It works, even with separate layers.

This supposed need for ZFS to absorb all RAID/LVM/page-cache behavior into itself is a myth; what really happened is good old-fashioned NIH. Understanding other complex subsystems is hard, and it's more fun to write new code instead.


> If the FS reads data and gets a checksum mismatch, it should be able to use ioctls (or equivalent) to select specific copies/shards and figure out which ones are good. I work on one of the four or five largest storage systems in the world, and have written code to do exactly this (except that it's Reed-Solomon rather than RAID).

This is all great, and I assume it works great. But it is in no way generalizable to all the filesystems Linux has to support (at least at the moment). I could only see this working in a few specific instances with a particular set of FS setups. Even more complicating is the fact that most RAIDs are hardware-based, so just using ioctls to pull individual blocks wouldn't work for many (all?) drivers. Convincing everyone to switch over to software RAID would take a lot of effort.

There is a legitimate need for these types of tools in the sub-PB, non-clustered storage arena. If you're working on a sufficiently large storage system, these tools and techniques are probably par for the course. That said, I have definitely lost 100 GB of data to bit rot on a multi-PB storage system in a top-500 HPC system. (One bad byte in a compressed data file left the data after the bad byte unrecoverable.) This would not have happened on ZFS.

ZFS was/is a good effort to bring this functionality lower down the storage hierarchy. And it worked because it had knowledge about all of the storage layers. Checksumming files/chunks helps best if you know about the file system and which files are still present. And it only makes a difference if you can access the lower level storage devices to identify and fix problems.


> it is no way generalizable to all the filesystems Linux has to support

Why not? If it's a standard LVM API then it's far more general than sucking everything into one filesystem like ZFS did. Much of this block-mapping interface already exists, though I'm not sure whether it covers this specific use case.


> This supposed need for ZFS to absorb all RAID/LVM/page-cache behavior into itself is a myth; what really happened is good old-fashioned NIH.

At the time that ZFS was written (early 2000s) and released to the public (2006), this was not a thing and the idea was somewhat novel / 'controversial'. Jeff Bonwick, ZFS co-creator, lays out their thinking:

* https://blogs.oracle.com/bonwick/rampant-layering-violation

Remember: this was a time when Veritas Volume Manager (VxVM) and other software still ruled the enterprise world.

* https://en.wikipedia.org/wiki/Veritas_Storage_Foundation


I debated some of this with Bonwick (and Cantrill who really had no business being involved but he's pernicious that way) at the time. That blog post is, frankly, a bit misleading. The storage "stack" isn't really a stack. It's a DAG. Multiple kinds of devices, multiple filesystems plus raw block users (yes they still exist and sometimes even have reason to), multiple kinds of functionality in between. An LVM API allows some of this to have M users above and N providers below, for M+N total connections instead of M*N. To borrow Bonwick's own condescending turn of phrase, that's math. The "telescoping" he mentions works fine when your storage stack really is a stack, which might have made sense in a not-so-open Sun context, but in the broader world where multiple options are available at every level it's still bad engineering.

> ... but in the broader world where multiple options are available at every level it's still bad engineering.

When Sun added ZFS to Solaris, they did not get rid of UFS and/or SVM, nor prevent Veritas from being installed. When FreeBSD added ZFS, they did not get rid of UFS or GEOM either.

If an admin wanted or wants (or needs) to use the 'old' way of doing things they can.


Sorry, I'm pernicious in what way, exactly?

Heh. I was wondering if you were following (perhaps participating in) this thread. "Pernicious" was perhaps a meaner word than I meant. How about "ubiquitous"?

The fact that traditionally RAID, LVM, etc. are not part of the filesystem is just an accident of history. It's just that no one wanted to rewrite their single disk filesystems now that they needed to support multiple disks. And the fact that administering storage is so uniquely hard is a direct result of that.

However it happened, modularity is still a good thing. It allows multiple filesystems (and other things that aren't quite filesystems) to take advantage of the same functionality, even concurrently, instead of each reinventing a slightly different and likely inferior wheel. It should not be abandoned lightly. Is "modularity bad" really the hill you want to defend?

> However it happened, modularity is still a good thing.

It may be a good thing, and it may not. Linux has a bajillion file systems, some more useful than others, and that is unique in some ways.

Solaris and other enterprise-y Unixes at the time only had one. Even the BSDs generally only have a few that they run on instead of ext2/3/4, XFS, ReiserFS (remember when that was going to take over?), btrfs, bcachefs, etc, etc, etc.

At most, a company may have purchased a license for Veritas:

* https://en.wikipedia.org/wiki/Veritas_Storage_Foundation

By rolling everything together, you get ACID writes, atomic space-efficient low-overhead snapshots, storage pools, etc. All this just by removing one layer of indirection and doing some telescoping:

* https://blogs.oracle.com/bonwick/rampant-layering-violation

It's not "modularity bad", but that to achieve the same result someone would have had to write/expand a layer-to-layer API to achieve the same results, and no one did. Also, as a first-order estimate of complexity: how many lines of code (LoC) are there in mdraid/LVM/ext4 versus ZFS (or UFS+SVM on Solaris).


Other than esoteric high performance use cases, I'm not really sure why you would really need a plethora of filesystems. And the list of them that can be actually trusted is very short.

I'd like to agree, but I don't think the exceptions are all that esoteric. Like most people I'd consider XFS to be the default choice on Linux. It's a solid choice all around, and also has some features like project quota and realtime that others don't. OTOH, even in this thread there's plenty of sentiment around btrfs and bcachefs because of their own unique features (e.g. snapshots). Log-structured filesystems still have a lot of promise to do better on NVM, though that promise has been achingly slow to materialize. Most importantly, having generic functionality implemented in a generic subsystem instead of in a specific filesystem allows multiple approaches to be developed and compared on a level playing field, which is better for innovation overall. Glomming everything together stifles innovation on any specific piece, as network/peripheral-bus vendors discovered to their chagrin long ago.

> I work on one of the four or five largest storage systems in the world

What would you recommend over zfs for small-scale storage servers? XFS with mdraid?

I'd also love to hear your opinion on the Reiser5 paper.


> With a normal RAID, if the data at some address is different between the two disks, there's no way for the fs to tell which is correct, because the RAID code essentially just picks one, it can't even see the other.

That's a problem only with RAID1, only when copies=2 (granted, the most often used case), and only when the underlying device cannot report which sector has gone bad.


why would you want to embed raid5/6 in the filesystem layer?

There are valid reasons, most having to do with filesystem usage and optimization. Off the top of my head:

- more efficient re-syncs after failure (don't need to re-sync every block, only the blocks that were in use on the failed disk)

- can reconstruct data not only on disk self-reporting, but also on filesystem metadata errors (CRC errors, inconsistent dentries)

- different RAID profiles for different parts of the filesystem (think: parity raid for large files, raid10 for database files, no raid for tmp, N raid1 copies for filesystem metadata)

and for filesystem encryption:

- CBC ciphers have a common weakness: the block size is constant. If you use FS-object encryption instead of whole-FS encryption, the block size, offset and even the encryption keys can be varied across the disk.


I think to even call volume management a "layer" as though traditional storage was designed from first principles, is a mistake.

Volume management is just a hack. We had all of these single-disk filesystems, but single disks were too small. So volume management was invented to present the illusion (in other words, lie) that they were still on single disks.

If you replace "disk" with "DIMM", it's immediately obvious that volume management is ridiculous. When you add a DIMM to a machine, it just works. There's no volume management for DIMMs.


Indeed there is no volume management for RAM. You have to reboot to rebuild the memory layout! RAM is higher in the caching hierarchy and can be rebuilt at smaller cost. You can't resize RAM while keeping data because nobody bothered to introduce volume management for RAM.

Storage is at the bottom of the caching hierarchy where people get inventive to avoid rebuilding. Rebuilding would be really costly there. Hence we use volume management to spare us the cost of rebuilding.

RAM also tends to have uniform performance. Which is not true for disk storage. So while you don't usually want to control data placement in RAM, you very much want to control what data goes on what disk. So the analogy confuses concepts rather than illuminating commonalities.


One of my old co-workers said that one of the most impressive things he's seen in his career was a traveling IBM tech demo in the back of a semi truck where they would physically remove memory, CPUs, and disks from the machine without impacting the live computation being executed apart from making it slower, and then adding those resources back to the machine and watching them get recognized and utilized again.

> why would you want to embed raid5/6 in the filesystem layer?

One of the creators of ZFS, Jeff Bonwick, explained it in 2007:

> While designing ZFS we observed that the standard layering of the storage stack induces a surprising amount of unnecessary complexity and duplicated logic. We found that by refactoring the problem a bit -- that is, changing where the boundaries are between layers -- we could make the whole thing much simpler.

* https://blogs.oracle.com/bonwick/rampant-layering-violation


It's not about ZFS. It's about CoW filesystems in general; since they offer functionalities beyond the FS layer, they are both filesystems and logical volume managers.

Why does ZFS do RAIDZ in the filesystem layer?

It doesn't.

RAIDZ is part of the VDEV (Virtual Device) layer. Layered on top of this is the ZIO (ZFS I/O layer). Together, these form the SPA (Storage Pool Allocator).

On top of this layer we have the ARC, L2ARC and ZIL. (Adaptive Replacement Caches and ZFS Intent Log).

Then on top of this layer we have the DMU (Data Management Unit), and then on top of that we have the DSL (Dataset and Snapshot Layer). Together, the SPA and DSL layers implement the Meta-Object Set layer, which in turn provides the Object Set layer. These implement the primitives for building a filesystem and the various file types it can store (directories, files, symlinks, devices etc.) along with the ZPL and ZAP layers (ZFS POSIX Layer and ZFS Attribute Processor), which hook into the VFS.

ZFS isn't just a filesystem. It contains as many, if not more, levels of layering than any RAID and volume management setup composed of separate parts like mdraid+LVM or similar, but much better integrated with each other.

It can also store stuff that isn't a filesystem. ZVOLs are fixed size storage presented as block devices. You could potentially write additional storage facilities yourself as extensions, e.g. an object storage layer.


Honestly just use ZFS. We've wasted enough effort over obscure licensing minutia.

> We've wasted enough effort over obscure licensing minutia.

Which was precisely Sun/Oracle's goal when they released ZFS under the purposefully GPL-incompatible CDDL. Sun was hoping to make OpenSolaris the next Linux whilst ensuring that no code from OpenSolaris could be moved back to Linux. I can't think of another plausible reason why they would write a new open source license for their open source operating system and make such a license incompatible with the GPL.


https://en.wikipedia.org/wiki/Common_Development_and_Distrib...

Some people argue that Sun (or the Sun engineer) as creator of the license made the CDDL intentionally GPL incompatible.[13] According to Danese Cooper one of the reasons for basing the CDDL on the Mozilla license was that the Mozilla license is GPL-incompatible. Cooper stated, at the 6th annual Debian conference, that the engineers who had written the Solaris kernel requested that the license of OpenSolaris be GPL-incompatible.[18]

    Mozilla was selected partially because it is GPL incompatible. That was part
    of the design when they released OpenSolaris. ... the engineers who wrote Solaris 
    ... had some biases about how it should be released, and you have to respect that.

And the very next paragraph states:

> Simon Phipps (Sun's Chief Open Source Officer at the time), who had introduced Cooper as "the one who actually wrote the CDDL",[19] did not immediately comment, but later in the same video, he says, referring back to the license issue, "I actually disagree with Danese to some degree",[20] while describing the strong preference among the engineers who wrote the code for a BSD-like license, which was in conflict with Sun's preference for something copyleft, and that waiting for legal clearance to release some parts of the code under the then unreleased GNU GPL v3 would have taken several years, and would probably also have involved mass resignations from engineers (unhappy with either the delay, the GPL, or both—this is not clear from the video). Later, in September 2006, Phipps rejected Cooper's assertion in even stronger terms.[21]

So of the available licenses at the time, Engineering wanted BSD and Legal wanted GPLv3, so the compromise was CDDL.


Wow... talk about cutting off your nose to spite your face. Oracle ended up abandoning OpenSolaris within a year or so.

Edit: Nevermind, debunked by Bryan Cantrill. It was to allow for proprietary drivers.


Not at all really. Danese Cooper says that Cantrill is not a reliable witness and one can say he also has an agenda to distort the facts in this way [1].

[1] https://news.ycombinator.com/item?id=22008921


And Cooper's boss:

> Simon Phipps (Sun's Chief Open Source Officer at the time), who had introduced Cooper as "the one who actually wrote the CDDL",[19] did not immediately comment, but later in the same video, he says, referring back to the license issue, "I actually disagree with Danese to some degree",[20] while describing the strong preference among the engineers who wrote the code for a BSD-like license, which was in conflict with Sun's preference for something copyleft, and that waiting for legal clearance to release some parts of the code under the then unreleased GNU GPL v3 would have taken several years, and would probably also have involved mass resignations from engineers (unhappy with either the delay, the GPL, or both—this is not clear from the video). Later, in September 2006, Phipps rejected Cooper's assertion in even stronger terms.[21]

* https://en.wikipedia.org/wiki/Common_Development_and_Distrib...

So of the available licenses at the time, Engineering wanted BSD and Legal wanted (to wait for) GPLv3, so the compromise was CDDL.


There were genuine reasons for the CDDL - it wasn't an anti-gpl thing. https://www.youtube.com/watch?v=-zRN7XLCRhc&feature=youtu.be...

Danese Cooper, one of the people at Sun who helped create the CDDL, responded in the comment section of that very video:

Lovely except it really was decided to explicitly make OpenSolaris incompatible with GPL. That was one of the design points of the CDDL. I was in that room, Bryan and you were not, but I know its fun to re-write history to suit your current politics. I pleaded with Sun to use a BSD family license or the GPL itself and they would consider neither because that would have allowed D-Trace to end up in Linux. You can claim otherwise all you want...this was the truth in 2005.


This needs to be more widely known. Sun was never as open or innovative as its engineer/advertisers claim, and the revisionism is irksome. I saw what they had copied from earlier competitors like Apollo and then claimed as their own ideas. I saw the protocol fingerprinting their clients used to make non-Sun servers appear slower than they really were. They did some really good things, and they did some really awful things, but to hear proponents talk it was all sunshine and roses except for a few misguided execs. Nope. It was all up and down the organization.

The thing is - it was a time of pirates. In an environment defined by the ruthlessness of characters like Gates, Jobs, and Ellison, they were among the best-behaved of the bunch. Hence the reputation for being nice: they were markedly nicer than the hive of scum and villainy that the sector was at the time. And they did some interesting things that arguably changed the landscape (Java etc), even if they failed to fully capitalize on them.

(In many ways, it still is a time of pirates, we just moved a bit higher in the stack...)


> In an environment ... they were among the best-behaved

I wouldn't say McNealy was that different than any of those, though others like Joy and Bechtolsheim had a more salutary influence. To the extent that there was any overall difference, it seemed small. Working on protocol interop with DEC products and Sun products was no different at all. Sun went less-commodity with SPARC and SBus, they got in bed with AT&T to make their version of UNIX seem more standard than competitors' even though it was more "unique" in many ways, there were the licensing games, etc. Better than Oracle, yeah, but I wouldn't go too much further than that.


> Sun was never as open or innovative as its engineer/advertisers claim, and the revisionism is irksome.

For (the lack of) openness, I agree, but the claim that they were not innovative needs stronger evidence.


Just to be clear, I'm not saying they weren't innovative. I'm saying they weren't as innovative as they claim. Apollo, Masscomp, Pyramid, Sequent, Encore, Stellar, Ardent, Elxsi, Cydrome, and others were also innovating plenty during Sun's heyday, as were DEC and even HP. To hear ex-Sun engimarketers talk, you'd think they were the only ones. Reality is that they were in the mix. Their fleetingly greater success had more to do with making some smart (or lucky?) strategic choices than with any overall level of innovation or quality, and mistaking one for the other is a large part of why that success didn't last.

Java was pretty innovative. The world's most advanced virtual machine, a JIT that often outperforms C in long-running server scenarios, and the foundation of probably 95% of enterprise software.

ANDF had already done (or at least tried to do) the "write once, run anywhere" thing. The JVM followed in the footsteps of similar longstanding efforts at UCSD, IBM and elsewhere. There was some innovation, but "world's most advanced virtual machine" took thousands of people (many of them not at Sun) decades to achieve. Sun's contribution was primarily in popularizing these ideas. Technically, it was just one more step on an established path.

Sure plenty of the ideas in Java were invented before, standing on the shoulders of giants and all that. The JIT came from Self, the Object system from Smalltalk, but Java was the first implementation that put all those together into a coherent platform.

Yeah, it's hard to understand this without context. Sun saw D-Trace and ZFS as the differentiators of Solaris from Linux, a massive competitive advantage that they simply could not (and would not) relinquish. Opensourcing was a tactical move, they were not going to give away their crown jewels with it.

The whole open-source steer by SUN was a very disingenuous strategy, forced by the changed landscape in order to try and salvage some semblance of relevance. Most people saw right through it, which is why SUN ended up as it did shortly thereafter: broke, acquired, and dismantled.


And Cooper's boss:

> Simon Phipps (Sun's Chief Open Source Officer at the time), who had introduced Cooper as "the one who actually wrote the CDDL",[19] did not immediately comment, but later in the same video, he says, referring back to the license issue, "I actually disagree with Danese to some degree",[20] while describing the strong preference among the engineers who wrote the code for a BSD-like license, which was in conflict with Sun's preference for something copyleft, and that waiting for legal clearance to release some parts of the code under the then unreleased GNU GPL v3 would have taken several years, and would probably also have involved mass resignations from engineers (unhappy with either the delay, the GPL, or both—this is not clear from the video). Later, in September 2006, Phipps rejected Cooper's assertion in even stronger terms.[21]

So of the available licenses at the time, Engineering wanted BSD and Legal wanted GPLv3, so the compromise was CDDL.


I stand corrected!

I don't think something that is the subject of an ongoing multi-billion-dollar lawsuit can rightly be called "obscure licensing minutia." It is high-profile and its actual effects have proven pretty significant.

> Honestly just use ZFS. We've wasted enough effort over obscure licensing minutia.

I am willing to bet that Google had the same thought. And I am also willing to bet that Google is regretting that thought now.


It's not just licensing. ZFS has some deep-rooted flaws that can only be solved by block pointer rewrite, something that has an ETA of "maybe eventually".

Care to elaborate?

You can't make a copy-on-write copy of a file. You can't deduplicate existing files, or existing snapshots. You can't defragment. You can't remove devices from a pool.

That last one is likely to get some kind of hacky workaround. But nobody wants to do the invasive changes necessary for actual BPR to enable that entire list.


Wow. As a casual user - someone who at one point was trying to choose between RAID, LVM and ZFS for an old NAS - some of those limitations of ZFS seem pretty basic. I would have taken it for granted that I could remove a device from a pool or defragment.

> There are no (stable) alternatives. BTRFS certainly not, as it's "under heavy development"¹ (since... forever).

Unless you are living in 2012 on a RHEL/CentOS 6/7 machine, btrfs has been stable for a long time now. I have been using btrfs as the sole filesystem on my laptop in standard mode, on my desktop as RAID0, and on my NAS as RAID1 for more than two years. I have experienced absolutely zero data loss. In fact, btrfs recovered my laptop and desktop from broken package updates many times.

You might have had some issues when you tried btrfs on distros like RHEL that did not backport the patches to their stable versions because they don't support btrfs commercially. Try something like openSUSE that backports btrfs patches to stable versions or use something like arch.

> That's true; however, the amount of breakage is no different from any other out-of-tree module, and it is unlikely to happen with a patch version of a working kernel (in fact, it happened with the 5.0 release).

This is a filesystem that we are talking about. Under no circumstances will any self-respecting sysadmin use a file system that has even a small chance of breaking with a system update.


I also used btrfs not too long ago in RAID1. I had a disk failure and voila, the array would be read-only from now on and I would have to recreate it from scratch and copy data over. I even utilized the different data recovery methods (at some point the array would not be mountable no matter what) and in the end that resulted in around 5% of the data being corrupt. I won't rule out my own stupidity in the recovery steps, but after this and the two other times when my RAID1 array went read-only _again_ I just can't trust btrfs for anything other than single device DUP mode operation.

Meanwhile ZFS has survived disk failures, removing 2 disks from an 8 disk RAIDZ3 array and then putting them back, random SATA interface connection issues that were resolved by reseating the HDD, and will probably survive anything else that I throw at it.


I believe he's referring to the raid 5/6 issues

The RAID 5/6 issue is the write hole, which is common to all software RAID 5/6 implementations. If it is a problem for you, use either a BBU or a UPS.

RAIDZ/Z2 avoids the issue by having slightly different semantics. That's why it is Z/Z2, not 5/6.


A former employer was threatened by Oracle because some downloads for the (only free for noncommercial use) VirtualBox Extension Pack came from an IP block owned by the organization. Home users are probably safe, but Oracle's harassment engine has incredible reach.

My employer straight up banned the use of VirtualBox entirely _just in case_. They'd rather pay for VMWare Fusion licenses than deal with any potential crap from Oracle.

Anecdotal, but VirtualBox has always been a bit flaky for me.

VMWare Fusion, on the other hand, powers the desktop environment I've used as a daily work machine for the last 6 months, and I've had absolutely zero problems other than trackpad scrolling getting emulated as mouse wheel events (making pixel-perfect scroll impossible).

Despite that one annoyance, it's definitely worth paying for if you're using it for any serious or professional purpose.


On the other hand, the VMware Fusion kernel extension is the only reason I've ever seen a kernel panic on a Mac.

This is throwing the baby out with the bathwater.

VirtualBox itself is GPL. There is no lawsuit risk.

What requires "commercial considerations" is the extension pack.

The extension pack is required for:

> USB 2.0 and USB 3.0 devices, VirtualBox RDP, disk encryption, NVMe and PXE boot for Intel cards

If licensing needs to be considered (ie. in a corporate environment), but one doesn't need the functionalities above, then there's no issue.


> This is throwing the baby out with the bathwater.

It might be, but let's just say that Oracle aren't big fans of $WORK, and our founders are big fans of them. Thus our legal department are rather tetchy about anything that could give them even the slightest chance of doing anything.

> What requires "commercial considerations" is the extension pack.

And our legal department are nervous about that being installed, even by accident, so they prefer to minimise the possibility.


Well ... that sounds initially unreasonable, but then if I think about it a bit more I'm not sure how you'd actually enforce a non-commercial use only license without some basic heuristic like "companies are commercial".

Is the expectation here that firms offering software under non-commercial-use-is-free licenses just run it entirely on the honour system? And isn't it true that many firms use unlicensed software, hence the need for audits?


IIRC VirtualBox offers to download the Extension Pack without stating it's not free for commercial use. There isn't even a link to the EULA in the download dialog as far as I can tell (from Google Images, at least). Conversely, VirtualBox itself is free for commercial use. Feels more like a honeypot than license auditing.

They can also apply stronger heuristics, like popping up a dialogue box if the computer is centrally-managed (e.g.: Mac MDM, Windows domain, Windows Pro/Enterprise, etc.).


Wait is this the pack that gets screen resizing and copy/paste working?

You're thinking of the Guest Additions which is part of the base Virtualbox package and free for commercial use.

The (commercially licensed) Extensions pack provide "Support for USB 2.0 and USB 3.0 devices, VirtualBox RDP, disk encryption, NVMe and PXE boot for Intel cards"[1] and some other functionality e.g. webcam passthrough [2]. There may be additional functionality enabled by the Extension pack I cannot find at a glance, but those are the main things.

[1] https://www.virtualbox.org/wiki/Downloads [2] https://www.virtualbox.org/manual/ch01.html#intro-installing


A tad offtopic, but on my 2017 Macbook Pro the "pack" was called VMWare Fusion.

With my MBP as host and Ubuntu as guest, I found that VirtualBox (with and without guest extensions installed) had a lot of graphical performance issues that Fusion did not.


They harass universities about it too. Which is ludicrous, because universities often have residence halls, and people who live there often download VirtualBox extensions.

Their PUEL license even has a grant specifically for educational use.

It does, but it's not 100% clear if administrative employees of universities count as educational. Sure, if you are teaching a class with it, go for it; but running a VM in it for the university accounting office is not as clear.

Education might not be the same as research in this license's terms. And there are even software vendors picking nits about writing a thesis being either research or education, depending on their mood and purse fill level...

> There is no conceivable reason that Oracle would want to threaten me with a lawsuit.

I don't think it has to be conceivable with Oracle...

Unfortunately I have to agree with Linus on this one. Messing with Oracle's stuff is dangerous if you can't afford a comparable legal team.


"Oracle's stuff" can most often be described more accurately as "what Oracle considers its stuff".

Linus is distributing the kernel, a very different beast from using a kernel module. I can't imagine Oracle targeting someone for using ZFS on Linux without first establishing that the distribution of ZFS on Linux is illegal.

> there is no conceivable reason that Oracle would want to threaten me with a lawsuit.

Money. Anecdotally that's the primary reason Oracle do anything.


If anyone thinks this is hyperbole:

I worked for a tiny startup (>2 devs full time) where Oracle tried to extract money from us because we used MariaDB on AWS.

If you think this sounds ridiculous you probably got it right.

(Why? Because someone inexperienced with Oracle had filled out the form while downloading the mySQL client.)


Re-reading my comment in daylight I realize I got one detail almost exactly wrong: we were always <= 2 developers, but it seems everyone understood the point anyway - we were tiny, but not too tiny for Oracles licensing department.

Well... Serves you about right for choosing MySQL over PostgreSQL :)

In my defense it wasn't my choice ;-)

"there is no conceivable reason that Oracle would want to threaten me with a lawsuit."

Don't be so sure about this.


None of these are good reasons to purposely hinder the optional use of ZFS as a third party module by users, which is what Linux is doing.

Can you expand? I'm no expert - use linux daily but have always just used distro default file system. Linus' reasons for not integrating seems pretty sensible to me. Oracle certainly has form on the litigation front.

Linus' reasons for not integrating ZFS are absolutely valid and it's no doubt that ZFS can never be included in the mainline kernel. There's absolutely no debate there.

However the person he is replying to was not actually asking to have ZFS included in the mainline kernel. As noted above, that could never happen, and I believe that Linus is only bringing it up to deflect from the real issue. What they were actually asking is for Linux to revert a change that was made for no other reason than to hinder the use of ZFS.

Linux includes a system which restricts what APIs are available to each module based on the license of the module. GPL modules get the full set of APIs whereas non-GPL modules get a reduced set. This is done strictly for political reasons and has no known legal basis as far as I'm aware.

Not too long ago a change was made to reduce the visibility of a certain API required by ZFS so only GPL modules could use it. It's not clear why the change was made, but it was certainly not to improve the functionality of the kernel in any way. So the only plausible explanation to me is that it was done just to hinder the use of ZFS with Linux, which has been a hot political issue for some time now.


If I remember correctly, the reasoning for the GPL module stuff was/is that if kernel modules integrate deeply with the kernel, they fall under the GPL. So the GPL flag is basically a guideline for what kernel developers believe is safe to use from non-GPL-compatible modules.

But from what I can see, marking the "save SIMD registers" function as GPL-only is a blatant lie by a kernel developer who wanted to spite certain modules.

Saving and restoring registers is an astoundingly generic function. If you list all the kernel exports and sort by how much they make your work derivative, it should be near the very bottom.


You are not supposed to use FP/SSE in kernel mode.

It was always frowned upon:

> In other words: it's still very much a special case, and if the question was "can I just use FP in the kernel" then the answer is still a resounding NO, since other architectures may not support it AT ALL.

> Linus Torvalds, 2003

and these specific functions that were marked as GPL-only had already been deprecated for well over a decade.


> You are not supposed to use FP/SSE in kernel mode.

> It was always frowned upon

Whether it's frowned upon is a completely different issue from whether it intertwines your data so deeply with the kernel that it makes your code a derivative work subject to the GPL license. Which it doesn't.

> if the question was "can I just use FP in the kernel" then the answer is still a resounding NO, since other architectures may not support it AT ALL.

It's not actually using floating point, it's using faster instructions for integer math, and it has a perfectly viable fallback for architectures that don't have those instructions. But why use the slower version when there's no real reason to?

> and these specific functions, that were marked as GPL were already deprecated for well over a decade.

But the GPL export is still there, isn't it? It's not that functionality is being removed, it's that functionality is being shifted to only have a GPL export with no license-based justification for doing so.
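
To make it concrete, the usual pattern for using SIMD in kernel code is to bracket it with kernel_fpu_begin()/kernel_fpu_end() so the interrupted context's FPU/vector state is saved and restored, and to keep a scalar fallback anyway. A rough sketch (not the actual ZFS code; checksum_simd() and checksum_scalar() are invented stand-ins):

    /* Sketch only; checksum_simd()/checksum_scalar() are hypothetical. */
    #include <linux/kernel.h>
    #include <asm/fpu/api.h>   /* kernel_fpu_begin()/kernel_fpu_end() on x86 */

    u32 checksum_simd(const void *buf, size_t len);   /* vectorized path  */
    u32 checksum_scalar(const void *buf, size_t len); /* integer fallback */

    u32 checksum(const void *buf, size_t len)
    {
        u32 sum;

        if (irq_fpu_usable()) {
            kernel_fpu_begin();   /* save the interrupted task's FPU state */
            sum = checksum_simd(buf, len);
            kernel_fpu_end();     /* restore it */
        } else {
            sum = checksum_scalar(buf, len);
        }
        return sum;
    }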


So what meets the criteria of being a "special case" and what doesn't? One of the examples that Linus gives is RAID checksumming. How come RAID checksumming is a special case but ZFS checksumming isn't? I don't think it has anything to do with the nature of the usage, the only problem is that the user is ZFS.

RAID checksumming is in the kernel, and when Linus says jump, the RAID folks ask back how high.

He is not going to beg people outside the kernel for permission to change something that may break their module. On the contrary, they must live with any breakage that is thrown at them.

Again, that symbol was deprecated for well over a decade. How long does it take to be allowed to remove it?


Sometimes in life we do things even though we are not explicitly obligated to do them. Nobody is asking for ZFS to get explicitly maintained support in the Linux kernel. They are simply asking for this one small inconsequential change to be reverted just this one time, since it would literally be no harm to the kernel developers to do so, and it would provide substantial benefits to any user wanting to use ZFS. Furthermore the amount of time that kernel developers have spent arguing in favour of this change has been significantly greater than the time it would have taken to just revert it.

> Again, that symbol was deprecated for well over a decade.

But not the GPL equivalent of the symbol. That symbol is not deprecated.


This is the commonly recited argument but I don't believe it was ever proven to be legally necessary. Furthermore, even if it was, it's not clear what level of integration is "too deep". So in practice, it's just a way for kernel developers to add political restrictions as they see fit.

Proven legally necessary, as in, a court telling them to stop doing something? I'm pretty sure they don't want it to get to that point.

Proven legally necessary, as in, a court ever telling anyone in that situation to stop doing it. Or even to start doing it in the first place. There's just no legal justification behind it whatsoever.

"Proven" is a maybe impossible standard: Kernel devs hint at the GPLonly exports having been useful in certain cases they prefer not to discuss on a ML. https://lore.kernel.org/lkml/20190110131132.GC20217@kroah.co...

One can interpret this as something legally significant, or an embarrassing private anecdote, or nothing substantial at all, maybe even just talk. However, I'd give them the benefit of the doubt. Not least since they could be the ones up against Oracle's legal dept...


What he is referring to is the use of the GPL export restriction to strong-arm companies into releasing their code as GPL. It's nothing to do with a legal requirement, he is just an open source licensing hardhead. See: https://lwn.net/Articles/603145/

Surely the kernel developers can do whatever the hell they like.

If you don’t like that don’t use it.


>This is done strictly for political reasons and has no known legal basis as far as I'm aware.

let me stop you right there. This being "Oracle," and its litigious nature, how can you truly be aware or sure?

Linus is literally saying there is a legal basis.


> This being "Oracle," and its litigious nature, how can you truly be aware or sure?

The functionality I'm describing has absolutely nothing to do with ZFS or Oracle in any way. If you really think the reach of Oracle is so great, then why not block all Oracle code from ever running on the OS? That seems to me to be just as justified as this change.


> why not block all Oracle code from ever running on the OS?

...to be fair, I would probably run that module.


I think it would be a module mandated by many companies.

Oracle sued Google for copying the names of the functions.

And I believe Oracle copied the Amazon S3 API.

I can't form an informed opinion, but my uninformed gut feeling is that Oracle has done what they are suing Google for having done.


This wasn't a case of "purposely hinder"; rather, the ZFS module broke because of some kernel changes. The kernel is careful to never break userspace and never break its own merged modules. But if you're a third-party module then you're on your own. The kernel developers can't be responsible for maintaining compatibility with your stuff.

The changes conveniently accomplished nothing except for breaking ZFS. Furthermore, just because they don't officially support ZFS doesn't mean they must stonewall all the users who desire the improved compatibility. Reverting this small change would not be a declaration that ZFS is officially supported.

> - Performance is not that great compared to the alternatives.

CoW filesystems do trade performance for data safety. Or did you mean there are other _stable/production_ CoW filesystems with better performance? If so, please do point them out!


XFS on LVM thin pool LV. Stable and performant as far as I can tell.

My terrible experiences with thin pools make me see btrfs as a pool of wonderful, trouble-free and perfect code.

Just ask yourself what happens when a thin pool runs out of actual, physical disk blocks?


Isn't this a problem for any over-provisioned storage pool? You can avoid that if you want by not over-provisioning and by checking the space consumed by CoW snapshots. Also, what does ZFS do if you run out of blocks?

I have actually managed to run out of blocks on XFS on a thin LV and it's an interesting experience. XFS always survived just fine, but some files basically vanished. Looks like mostly those that were open and being written to at exhaustion time, like for example a mariadb database backing store. Files that were just sitting there were perfectly fine as far as I could tell.

Still, you definitely should never put data on a volume where a pool can be exhausted without a backup, as I don't think there is really a bulletproof way for a filesystem to handle that happening suddenly.


>Isn't this a problem for any over provisioned storage pool ?

ZFS doesn't over-provision anything by default. The only case I'm aware of where you can over-provision with ZFS is when you explicitly choose to thin provision zvols (virtual block devices with a fixed size). This can't be done with regular file systems which grow as needed, though you can reserve space for them.

File systems do handle running out of space (for a loose definition of handle) but they never expect the underlying block device to run out of space, which is what happens with over-provisioning. That's a problem common to any volume manager that allows you to over provision.


Can't you over-provision even just by creating too many snapshots? Even if you never make the filesystems bigger than the backing pool, the snapshots will allocate some blocks from the pool and over time, boom.

Snapshots can't cause over-provisioning, not for file systems. If I mutate my data and keep snapshots forever, eventually my pool will run out of free space. But that's not a problem of over-provisioning, that's just running out of space.

With ZFS, if I take a snapshot and then delete 10GB of data my file system will appear to have shrunk by 10GB. If I compare the output of df before and after deleting the data, df will tell me that "size" and "used" have decreased by 10GB while "available" remained constant. Once the snapshot is deleted that 10GB will be made available again and the "size" and "available" columns in df will increase. It avoids over-provisioning by never promising more available space than it can guarantee you're able to write.

I think you're trying to relate ZFS too much to how LVM works, where LVM is just a volume manager that exposes virtual devices. The analogue to thin provisioned LVM volumes is thin-provisioned zvols, not regular ZFS file systems. I can choose to use ZFS in place of LVM as a volume manager with XFS as my file system. Over-provisioned zvols+XFS will have functionally equivalent problems as over-provisioned LVM+XFS.


ZFS doesn't work this way. The free blocks in the ZFS pool are available to all datasets (filesystems). The datasets themselves don't take up any space up front until you add data to them. Snapshots don't take up any space initially. They only take up space when the original dataset is modified, and altered blocks are moved onto a "deadlist". Since the modification allocates new blocks, if the pool runs out of space it will simply return ENOSPC at some point. There's no possibility of over-provisioning.

ZFS has quotas and reservations. The former is a maximum allocation for a dataset. The latter is a minimum guaranteed allocation. Neither actually allocate blocks from the pool. These don't relate in any comparable way to how LVM works. They are just numbers to check when allocating blocks.


LVM thin pools had (maybe still have - I haven't used them recently) another issue though, where running out of metadata space caused the volumes in the thinpool to become corrupt and unreadable.

ZFS does overprovision all filesystems in a zpool by default. Create 10 new filesystems and 'df' will now display 10x the space of the parent fs. A full fs is handled differently than your volume manager running out of blocks. But the normal case is overprovisioning.

That's not really overprovisioning. That's just a factor of the space belonging to a zpool, but 'df' not really having a sensible way of representing that.

That is not over-provisioning, it's just that 'df' doesn't have the concept of pooled storage. With pools it's possible for different file systems to share their "available" space. BTRFS also has its own problems with output when using df and getting strange results.

If I have a 10GB pool and I create 10 empty file systems, the sizes reported in df will be 100GB. It's not quite a lie either, because each of those 10 file systems does in fact have 10GB of space available; I could write 10GB to any one of them. If I write 1GB to one of those file systems, the "size" and "available" spaces for the other nine will all shrink despite not having a single byte of data written to them.

With ZFS and df the "size" column is really only measuring the maximum possible size (at this point in time, assuming nothing else is written) so it isn't very meaningful, but the "used" and "available" columns do measure something useful.


This is exactly what overprovisioning is: The sum of possible future allocations is greater than available space.

In my example the sum of possible future allocations for ZFS is still only 10GB total. Each of the ten file systems, considered individually, does truthfully have 10GB available to it before any data is written. The difference is that with over-provisioning (like LVM+XFS), if I write 10GB of data to one file system the others will still report 10GB of free space, but with ZFS or BTRFS they'll report 0GB available, so I can never actually attempt to allocate 100GB of data.

You could build a pool-aware version of DF that reflects this, by grouping file systems in a pool together and reporting that the pool has 10GB available. But frankly there's not enough benefit to doing that because people with storage pools already understand summing up all the available space from df's output is not meaningful. Tools like zpool list and BTRFS's df equivalent already correctly report the total free space in the pool.


XFS is not copy on write.

XFS has supported reflinks for some time already, just the deduplication is kind of experimental.

Supporting reflinks is actually more than can be said about ZoL (see zfsonlinux#405).


I think that you're both right - under normal conditions XFS is not CoW, but when using the reflink option it does use CoW, so it's kind of a mix.

>- Using it opens you up to the threat of lawsuits from Oracle. Given history, this is a real threat. (This is one that should be high for Linus but not for me - there is no conceivable reason that Oracle would want to threaten me with a lawsuit.)

No. Distributing it (i.e. a precompiled distro with ZFS) will. You are free to run any software on your machine as you so desire.


This reminds me of the adaptation of a Churchill quote that "ZFS is the worst of the file systems, except for all others."

Well he had this:

> as far as I can tell, it has no real maintenance behind it either any more

Which simply isn't true. They just released a new ZFS version with encryption built in (no more ZFS + LUKS) and they removed the SPL dependency (which didn't support Linux 5.0+ anyway).

I use ZFS on my Linux machines for my storage and I've been rather happy with it.


Same, for at least 6 years in a 4 drive zraid array. It always reads and writes at full gigabit ethernet speeds and I haven't had any downtime other than maintaining FreeBSD updates which are trivial even when going from 10.x to 11 to 12.

"Same" for the last ~4 years, starting with 8 disks and as of 2018, the 24-bay enclosure is full. Each vdev is a mirrored pair split across HBAs to sedate my paranoia. I've replaced a few drives after watching unreadable sector count slowly increase over a few months. I've also switched out most of the original 3TB pairs to 8TB and 10TB pairs. ~42TB usable and the box only has 16GB of RAM (because I can't get the used 32GB sticks to work, it's a picky mainboard and difficult to find matching ECC memory here in Europe). I haven't powered down much except to attempt to replace the RAM or during extremely hot days. Read/write speed is more or less max gigabit, even during rebuild after hot-swapping drives.

Same here (4-drive raidz for many years), though I do have an issue where deleting large files (~1 GB) takes around a minute and nobody seems to know why (I have plenty free space and RAM)...

do you have lots of snapshots? every snapshotting FS I've worked with has really slow deletes, especially when the volume is near capacity.

Snapshots are one thing ZFS is fast at. All the blocks for a given snapshot are placed on a "deadlist". Snapshot deletion is essentially just returning this list of blocks back to the free pool. A terabyte snapshot will take a short while (in the background) to recycle those blocks. But the deletion itself is near instantaneous.

I think you misunderstand: file deletions are what is slow (I don't use ZFS, my reference is WAFL, but my understanding is that all snapshotting file systems have this problem).

Even this should have minimal overhead. If the file is present in the snapshot, then it's simply moving the blocks over to the deadlist which is a very cheap operation. If it's not in the snapshot then the blocks will get recycled in the background. In both cases you should have the unlink complete almost immediately.

All of the snapshot functionality is based upon simple transaction number comparisons plus the deadlist of blocks owned by the snapshot. Only the recycling of blocks should have a bit of overhead, and that's done by a background worker--you see the free space increase for a few minutes after a gargantuan snapshot or dataset deletion, but the actual deletion completed immediately.


I've been promised many things by vendors and they always fall back to "hey! look! cool CS file system theory". I test my systems carefully and report the results back; they often don't agree.

I should point out again that I don't have enough direct experience with ZFS to say if this is the case, my experience was with an enterprise NetApp server at a large company that was filling the disk up (>95%) in addition to doing hourly snapshots.


I have 400 in total, though none on the slow volume :/ That shouldn't affect it, right?

A single 5400 rpm drive (the like of wd red) should be able to saturate gigabit ethernet. 4 drive array should be basically idling.

The problem with ZFS is that it isn't part of Linux kernel.

Linux project maintains compatibility with userspace software but it does not maintain compatibility with 3rd party modules and for a good reason.

Since modules have access to any internal kernel API it is not possible to change anything within kernel without considering 3rd party code, if you want to keep that code working.

For this reason the decision was made that if you want your module to work you need to make it part of Linux kernel and then if anybody refactors anything they need to consider modules they would be affecting by the change.

Not allowing the module to be part of the kernel is a disservice to your user base. While there are modules like that that are maintained moderately successfully (Nvidia, vmware, etc.) this is all at the cost of the user and userspace maintainers who have to deal with it.


It isn't just ZFS. All sorts of drivers get broken because Linux refuses to offer a stable API, saying your code should be in the kernel, but also often refuses to accept drivers into the kernel, even open-source code with no particular quality issues (e.g. quickcam, reiserfsv4).

Use FreeBSD where there's a stable ABI and you don't have these problems.


Plenty of drivers get rejected because the kernel developers have no confidence that they will be maintained going forward, which would mean the driver would be removed fairly quickly again.

FreeBSD does not really have a stable ABI; every major release breaks the ABI, so it's only stable for 2 years.

https://wiki.freebsd.org/VendorInformation


Stable for each major and minor release is still a vast step up on Linux.

Having a stable ABI for two years is vastly easier to support than an ABI which changes every two weeks. This is reflected by the number of binary modules which are packaged for FreeBSD in the ports tree, and provided by third-party vendors. This stability makes it possible to properly support for a reasonable timeframe, and vendors are doing so.


Honestly, I don't like binary modules and I am happy with a policy that lets me have a functional operating system on modern hardware with source code that I have access to (well... except the firmware, which even Linux can't do anything about until open-source hardware projects get more traction).

It is enough that almost all devices around me have a bunch of running code that I have absolutely no control over. I need at least one computer I can trust to do MY bidding.


The problem I have with this is that Linux shoots itself in the foot here. It's conflating two different problems: (1) supporting third-party modules and (2) supporting proprietary modules. All modules are ultimately binary; only a small subset are both proprietary and binary-only.

If you look at FreeBSD, the majority of third-party modules are free software. It's stuff like graphics drivers, newer ZFS modules, esoteric HBAs etc. Proprietary modules, like nVidia's graphics driver, are the minority.

I can see and understand why things are the way they are, and indeed I agreed with the approach for many years. Today, I see it being as short sighted as the GCC vs LLVM approach to modular architecture.

Linux is nearly 30 years old now. To not have stable internal interfaces seems to me to be indicative of either bad initial design or ill discipline on the part of its maintainers. Every other major kernel seems to manage to have a stable ABI for third-party functionality, and Linux is an outlier in its approach. Having to upgrade the kernel for a new GPU driver is painful. Not only do I have to wait for a new kernel release, I have to hope that none of the other changes in that release cause breakage or change the behaviour in unexpected ways. Upgrading a third-party module is much less risky.


I don't see how it shoots itself in the foot given that these rules have been in place basically since forever and it is currently the most popular open source operating system by a huge margin.

Well, I left Linux in part because a lot of my hardware stopped working - FreeBSD probably has a fraction of the developers that Linux does, yet I actually have more faith in its hardware support because of this issue. YMMV I guess.

Parent updated their post and my comment is no longer relevant.

I don't see how it's an insult to the users. It's saying that not allowing ZFS code to be distributed under the GPL and be maintained as part of the Linux kernel, is a disservice to ZFSonLinux users. Which I think is clearly right.

I edited it out before I saw your comment.

And he was doing fine up to that point. For IMO good reasons, ZFS will likely never be merged into Linux. And filesystem kernel modules from third parties have a pretty long history of breakage issues going back to some older Unixes.

That's going to be plenty of reason not to use ZFS for most people. The licensing by itself is also certainly a showstopper for many.

But I'm not sure his other comments are really fair and, had Oracle relicensed ZFS n years back, ZFS would almost certainly be shipping with Linux, whether or not as the typical default I can't say. It certainly wasn't just a buzzword and there were a number of interesting aspects to its approach.


Well, he says

> It was always more of a buzzword than anything else, I feel, and the licensing issues just make it a non-starter for me.

So presumably the licensing problem mentioned by your parent's comment is weighing heavily here. I think this "don't use ZFS" statement is most accurately targeted at distro maintainers. Anyone not actually redistributing Linux and ZFS in a way that would (maybe) violate the GPL is not at any risk. That means even large enterprises can get away with using ZoL.


It's exactly that, when combined with the longstanding practice of maintaining compatibility with userspace, but reserving the right to refactor kernel-space code whenever and wherever needed. If ZFS-on-linux breaks in a subtle or obvious way due to a change in linux, he can't afford to care about that - keeping the linux kernel codebase sane while adding new features, supported hardware, optimizations, and fixes at an honestly scary rate, is not that easy.

See also https://www.kernel.org/doc/html/latest/process/stable-api-no...

(fuse is a stable user-space API if you want one ... it won't have the same performance and capabilities of course ...)


> he can't afford to care about that - keeping the linux kernel codebase sane while adding new features, supported hardware, optimizations, and fixes at an honestly scary rate, is not that easy.

Maybe, but the complaints seem to be less about the (problematic) changes being technical in nature and accidentally breaking ZFS, and more about them being political in nature. With speculation that they might have been meant to _intentionally_ break ZFS and then pretend it was an accident, because ZFS isn't (and can never be) maintained in tree. Basically along the lines of "we don't like out-of-tree kernel modules so we make life hard for them". No idea if this is actually the case or people are just spinning things together. Even if it is the case I'm not sure what I should think about it, because it's at least partially somewhat understandable.


Linus is rather tolerant (or apathetic) about non-GPL modules, but what he doesn't care to do is ensure that there is an appropriate set of non-GPL-marked exports available for external modules. If some other developer happens to mark some export GPL and it happens to be one key export needed by a non-GPL external module, Linus doesn't care, because he doesn't care about external modules.

This has come up many times in the past. Keep in mind that linux has always been GPLv2-only, it is not LGPL or anything like that.

https://lwn.net/Articles/769471/

https://lwn.net/Articles/603131/

https://lkml.org/lkml/2012/2/7/451


"Don't use ZFS. It's that simple. It was always more of a buzzword than anything else, I feel, and the licensing issues just make it a non-starter for me."

When he says that, I think of the $500 million Sun spent on advertising Java.


Sun isn't going to sue anyone into oblivion any time soon, but Oracle sure will

Sun is all but defunct, I don't think I would characterize it as a subsidiary of Oracle.

That's kinda nonsensical IMO. If Oracle, the parent company, is trigger-happy, there are no guarantees they won't go deeper to protect their child companies' IP if they feel it's being infringed.

I was thinking more of "the buzzword" bit, and how it got to be such a well known technology.

Relevant bits:

"Don't use ZFS. It's that simple. It was always more of a buzzword than anything else, I feel, and the licensing issues just make it a non-starter for me.

The benchmarks I've seen do not make ZFS look all that great. And as far as I can tell, it has no real maintenance behind it either any more, so from a long-term stability standpoint, why would you ever want to use it in the first place?"


> The benchmarks I've seen do not make ZFS look all that great.

The thing about ZFS that actually appeals to me is how much error-checking it does. Checksums/hashes are kept of both data and metadata, and those checksums are regularly checked to detect and fix corruption. As far as I know it (and filesystems with similar architectures) are the only ones that can actually protect against bit rot.

https://github.com/zfsonlinux/zfs/wiki/Checksums

> And as far as I can tell, it has no real maintenance behind it either any more, so from a long-term stability standpoint, why would you ever want to use it in the first place?"

It has as much maintenance as any open source project: http://open-zfs.org/. IIRC, it has more development momentum behind it than the competing btrfs project.


> those checksums are regularly checked to detect and fix corruption.

I don't believe that's true. They are checked on access, but if left alone, nothing will verify them. From what I've read, you need to set up a cron job that runs scrubbing on some regular schedule.


Yes. Those cron jobs are installed by default by all major vendors that supply/support ZFS.

The setup instructions for ZFS always include the "how to setup regular scrubs" step.

Linus is just wrong as far as maintenance, as a look at the linux-zfs lists would show.

From my perspective, it has no real competitor under Linux, which is why I use it. I don't consider btrfs mature enough for critical data. (Others can reasonably disagree, I have intentionally high standards for data durability.)

Aside from legal issues, he's talking out of his ass.


I don't care about my data, so I use ext4, and like most non-ZFS peasants I lose files every other day.

Bitrot is a real thing and deduplication is actually very useful for many usecases, so your sarcasm is ill-advised. ZFS has legitimate useful features that ext4 does not.

Not sure where that belief comes from. But it might be that many benchmarks are naive and compare it against other filesystems in single-disc setups with zero tuning. Since its metadata overheads are higher, it's definitely slower in this scenario. However, put a pool onto an array of discs and tune it a little, and the performance scales up and up, leaving all Linux-native filesystems, and LVM/dm/mdraid, well behind. It's a shame that Linux has nothing compelling that does better than this.

Last time I used ZFS, write performance was terrible compared to an ordinary RAID5. IIRC writes in a raidz are always limited to a single disk's performance. The only way to get better write speed is to combine multiple raidzs - which means you need a boatload of disks.

We had a bunch of Thumpers (SunFire X4200) with 48 disks at work, running ZFS on Solaris. It was dog slow and awful, tuning performance was complicated and took ages. One had to use just the right disks in just the right order in RaidZs with striping over them. Swap in a hotspare: things slow to a crawl (i.e. not even Gbit/s).

After EoL a colleague installed Linux with dmraid, LVM and xfs on the same hardware: much faster, more robust. Sorry, don't have numbers around anymore, stuff has been trashed since.

Oh, and btw., snapshots and larger numbers of filesystems (which Sun recommended instead of the missing Quota support) also slow things down to a crawl. ZFS is nice on paper and maybe nice to play with. Definitely simpler to use than anything else. But performance-wise it sucked big time, at least on Solaris.


ZFS, on Solaris, not robust?

ZFS for “play”?!

This... is just plain uninformed.

Not just me and my employer, but many (many) others rely on ZFS for critical production storage, and have done so for many years.

It’s actually very robust on Linux as well - the fact that FreeBSD has started to use the ZoL code base is quite telling.

Would FreeBSD also be in the “play” and “not robust” category, hanging out together with Solaris?

Will it perform better than all in terms of writes/s? Most likely not - although by staying away from dedup, having enough RAM and adhering to the pretty much general recommendation to use mirror vdevs only in your pools, it can be competitive.

Something solid with data integrity guarantees? You can’t beat ZFS, imo.


> Something solid with data integrity guarantees? You can’t beat ZFS, imo.

This reminds me. We had one file server used mostly for package installs that used ZFS for storage. One day our java package stops installing. The package had become corrupt. So I force a manual ZFS scrub. No dice. Ok fine I’ll just replace the package. It seems to work but the next day it’s corrupt again. Weird. Ok I’ll download the package directly from Oracle again. The next day again it’s corrupt. I download a slightly different version. No problems. I grab the previous problematic package and put it in a different directory (with no other copies on the file system) - again it becomes corrupt.

There was something specific about the java package that ZFS just thought it needed to “fix”. If I had to guess it was getting the file hash confused. I’m pretty sure we had dedupe turned on so that may have factored into it.

Anyway that’s the first and only time I’ve seen a file system munge up a regular file for no reason - and it was on ZFS.


Performance wasn't robust, especially on dead disks and rebuilds, but also on pools with many (>100) filesystems or snapshots. Performance would often degrade heavily and unpredictably on such occasions. We didn't lose data more often than with other systems.

"play" comes from my distinct impression that the most vocal ZFS proponents are hobbyists and admins herding their pet servers (as opposed to cattle). ZFS comes at low/no cost nowadays and is easy to use, therefore ideal in this world.


Fair enough, I can’t argue with your personal experience, but I can assure you that ZFS is used ”for real” at many shops.

I’ve only used zfs in two or three way mirror setup, on beefy boxes, where the issues you describe are minimal. Also JBOD only.

The thing is that without checksumming you’ve actually no idea if you lose data. I’ve had several pools over the years report automatic resilvering on checksum mismatches. Usually it’s been disks acting up well before smart can tell, and reporting this has been invaluable.


Sounds like you turned on dedupe, or had an absurdly wide stripe size. You do need to match your array structure to your needs as well as tune ZFS.

Our backup servers (45 disks, 6-wide Z2 stripes) easily handle wire-speed 10G with a 32G ARC.

And you're just wrong about snapshots and filesystem counts.

ZFS is no speed demon, but it performs just fine if you set it up correctly and tune it.


Stripe size could have been a problem, though we just went with the default there AFAIR. Most of the first tries just followed the Sun docs; we later only changed things until performance was sufficient. Dedupe wasn't even implemented back then.

Maybe you also don't see as massive an impact because your hardware is a lot faster. X4200s were predominantly meant to be cheap, not fast. No cache, insufficient RAM, slow controllers, etc.


X4200s were the devil's work. Terrible BMC, raid controller, even the disk caddies were poorly designed.

The BMC controller couldn't speak to the disk controller so you had no out-of-band storage management.

I had to run a fleet of 300 of them, truly an awful time.


ZFS performs quite well if you give it boatloads of RAM. It uses its own cache layer, and eats RAM like hotcakes. XFS OTOH is as fast as the hardware can go with any amount of RAM.

Sort of. But no snapshots.

Wanna use LVM for snapshots? 33% performance hit for the entire LV per snapshot, by implementation.

ZFS? ~1% hit. I've never been able to see any difference at the workloads I run, whereas with LVM it was pervasive and inescapable.


That was with the old LVM snapshots. Modern CoW snapshots have a much smaller impact. Plus XFS developers are working on internal snapshots, multi-volume management, and live fsck (live check already works, live repair to come).

I don't doubt this but do you have any documentation?

Asking for a friend who uses XFS on LVM for disk heavy applications like database, file server, etc.


You would have to look at the implementation directly. The user documentation isn't great for documenting performance considerations, sadly.

Essentially it comes down to this: a snapshot LV contains copies of old blocks which have been modified in the source LV. Whenever a block is updated in the source LV, LVM will need to check whether that block has previously been copied into each corresponding snapshot LV. For each snapshot LV where this is not the case, it will need to copy the block over.

This means that there is O(n) complexity in the checking and copying. And in the case of "thin" LVs, it will also need to allocate the block to copy to, potentially for every snapshot LV in existence, making the process even slower. The effect is write amplification effectively proportional to the total number of snapshots.

ZFS snapshots, in comparison, cost essentially the same no matter how many you create, because the old blocks are put onto a "deadlist" of the most recent snapshot, and it doesn't need repeating for every other snapshot in existence. Older snapshots can reference them when needed, and if a snapshot is deleted, any blocks still referenced are moved to the next oldest snapshot. Blocks are never copied and only have a single direct owner. This makes the operations cheap.
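
A toy model (not LVM or ZFS code, just the bookkeeping) shows why the extra write work scales with the snapshot count in one design but not in the other:

    /* Toy model: count the copy work that N snapshots add to a stream of
     * block overwrites under the two designs sketched above. */
    #include <stdbool.h>
    #include <stdio.h>

    enum { BLOCKS = 1000, SNAPS = 16, WRITES = 5000 };

    int main(void)
    {
        static bool preserved[SNAPS][BLOCKS]; /* LVM-style per-snapshot CoW state */
        long lvm_copies = 0, zfs_deadlist_appends = 0;

        for (int w = 0; w < WRITES; w++) {
            int blk = (w * 37) % BLOCKS;      /* arbitrary overwrite pattern */

            /* LVM-style: every snapshot that hasn't saved this block yet
             * gets its own copy of the old contents. */
            for (int s = 0; s < SNAPS; s++) {
                if (!preserved[s][blk]) {
                    preserved[s][blk] = true;
                    lvm_copies++;
                }
            }

            /* ZFS-style: the superseded block pointer is appended to one
             * deadlist; nothing is copied. */
            zfs_deadlist_appends++;
        }

        printf("LVM-style block copies:     %ld\n", lvm_copies);
        printf("ZFS-style deadlist appends: %ld\n", zfs_deadlist_appends);
        return 0;
    }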


FreeNAS has good documentation on which hardware to pick and how to set up ZFS.

That's for the old "fat" LVM snapshots, right? No way the new CoW thin LVs have such a big overhead for snapshots.

There will be a much bigger overhead in accounting for all of the allocations from the "thin pool".

The overlying filesystem also lacks knowledge of the underlying storage. The snapshot must be able to accommodate writes up to and including the full size of the parent block device in order to remain readable, just like the old-style snapshots did. That's the fundamental problem with LVM snapshots; they can go read-only at any point in time if the space is exhausted, due to the implicit over-commit which occurs every time you create a snapshot.

The overheads with ZFS snapshots are completely explicit and all space is fully and transparently accounted for. You know exactly what is using space from the pool, and why, with a single command. With LVM separating the block storage from the filesystem, the cause of space usage is almost completely opaque. Just modifying files on the parent LV can kill a snapshot LV, while with ZFS this can never occur.


"After EoL a colleague installed Linux with dmraid, LVM and xfs on the same hardware: much faster, more robust."

Please let me know which company this is, so I can ensure that I never end up working there by accident. Much obliged in advance, thank you kindly.


Why? What is bad about playing around with leftover hardware?

Nothing at all; it's what was done to that hardware that's the travesty here. It takes an extraordinary level of incompetence and ignorance to even get the idea to slap Linux with dmraid and LVM on that hardware, and then claim that it was faster and more robust without understanding how unreliable and fragile that constellation is, and that it was faster because all the reliability was gone.

dmraid raid5/6 lose data, sometimes catastrophically, in normal failure scenarios that the ZFS equivalent handles just fine. If a sector goes bad between the time when you last scrubbed and the time when you get a disk failure (which is pretty much inevitable with modern disk sizes), you're screwed.

> Writes in a raidz are always limited to a single disk’s performance

what? no. why would that be the case? You lose a single disk's performance due to the checksumming.

just from my personal NAS I can tell you that I can do transfers from my scratch drive (NVMe SSD) to the storage array at more than twice the speed of any individual drive in the array... and that's in rsync which is notably slower than a "native" mv or cp.

The one thing I will say is that it does struggle to keep up with NVMe SSDs, otherwise I've always seen it run at drive speed on anything spinning, no matter how many drives.


> what? no. why would that be the case? You lose a single disk's performance due to the checksumming.

I think they are probably referring to the write performance of a RAIDZ VDEV being constrained by the performance of the slowest disc within the VDEV.


true, if you have 7 fast disks and one slow disk in a raidz, you get 7 x slow disk performance.

Have you seen any benchmarks for the scenario you've described?

Have you got any info on how to do the required tuning that's geared towards a home NAS?

Group your disks in bunches of 4 or 5 per Raidz, no more. And have them on the same controller or SAS-expander per bunch. Use striping over the bunches. Don't use hotspares, for performance maybe avoid RAIDz6. Try out and benchmark a lot. Get more RAM, lots more RAM.

Back when I set up my last ZFS pool running on OmniOS, 5 disks was not optimal, though I am running RAIDZ2.

But yes, lots of RAM


I think the optimal number of RAIDz5 disks is 3, if you just want performance. But this wastes lots of space of course. Also, the number of SAS/SATA channels per controller and the topology of expanders is important. That's why I don't think there is a recipe; you have to try it out for each new kind of hardware.

And as another thread pointed out, stripe size is also an important parameter.


I think you mean RAIDZ1, not 5.

Yes. RAIDz (without the "1") was the original RAID5 equivalent, RAIDz2 is equivalent to RAID6. However, since nobody really knows what the hell z1 and z2 are, and z1 is easy to mix up with RAID1 for non-ZFS people, calling it z5 and z6 is far less confusing.

It's the number of parity disks, pretty simple. There has been occasional talk of making the number arbitrary, though presently only raidz1, raidz2, and raidz3 exist.

I think speed is not the primary reason many (most?) people use ZFS; I think it's mostly about stability, reliability and maintainability.

> And I'm not at all interested in some "ZFS shim layer" thing either

If there is no "approved" method for creating Linux drivers under licenses other than the GPL, that seems like a major problem that Linux should be working to address.

Expecting all Linux drivers to be GPL-licensed is unrealistic and just leads to crappy user experiences. nVidia is never going to release full-featured GPL'd drivers, and even cooperative vendors sometimes have NDAs which preclude releasing open source drivers.

Linux is able to run proprietary userspace software. Even most open source zealots agree that this is necessary. Why are all drivers expected to use the GPL?

---

P.S. Never mind the fact that ZFS is open source, just not GPL compatible.

P.P.S. There's a lot of technical underpinnings here that I'll readily admit I don't understand. If I speak out of ignorance, please feel free to correct me.


I am also not an expert in this space - but if I understand correctly the reason the linux Nvidia driver sucks so much is that it is not GPL'd (or open source at all).

There is little incentive for Nvidia to maintain a linux specific driver, but because it is closed source the community cannot improve/fix it.

> Why are all drivers expected to use the GPL?

I think the answer to this is: drivers are expected to use the GPL if they want to be mainlined and maintained - as Linus said: other than that you are "on your own".


My experience is that the Linux Nvidia drivers are better than the competitors' open source drivers.

Nvidia proprietary drivers work OK for me, mostly (I needed to spoof the video card ID so KVM could lie to the Windows drivers in my home VFIO setup, but it wasn't hard.)

But it means I can't use Wayland. Wayland isn't critical for me, but since NVidia is refusing to implement GBM and using EGLStream instead, there's nothing I can do about it. It simply isn't worth NVidia's time to make Wayland work, so I'm stuck using X. If the driver were open-source someone would have submitted a GBM patch and I wouldn't be stuck in this predicament.

I can't wait for NVidia to have real competition in the ML space so I can ditch them.


No you can use Wayland as long as your window manager/environment supports GBM. Gnome and KDE both do (Which for most Linux users is all that is needed).

Now you can't use something like Sway but their lead developer is too evangelical for my taste so even if I had an AMD/Intel card I would never use it.


> No you can use Wayland as long as your window manager/environment supports GBM.

You can do that on Intel and AMD drivers and other open source graphics drivers, which due to being open source allow 3rd parties like redhat to patch in GBM support in drivers and mesa when required.

Nvidia driver does not support GBM code paths. Therefore wayland does not work on nvidia. And because nvidia driver is not open source, someone else cannot patch GBM in.


I'm fairly sure parent meant 'EGLStream', not GBM. KDE and GNOME's Wayland compositors both support EGLStream.

Technically, you can use Wayland.

What you cannot use is applications that use OpenGL or Vulkan acceleration. GBM is used for sharing buffers across APIs handled by GPU. If your Wayland clients use just shm to communicate with compositor, it will work.


Is that experience recent? AMD drivers used to be terrible and Intel isn't even competition.

Depends also on the AMD GPU. Vega is fine, Raven Ridge had weird bugs last time I looked, with rx590 I couldn't even boot the proxmox 6.1 installer (it worked when I swapped in rx580 instead).

Why is Intel not a competition? In laptops, I want only Intel, nothing else. It is the smoothest/most reliable/least buggy thing you may have.


Performance wise, Intel is streets behind.

I know. But do you need that performance for what you do on the computer?

For most uses, Intel GPU is fine.


But if you do need that performance, Intel isn't an option. If you don't, there is no reason to even consider Nvidia. They serve different needs.

I'm currently running a AMD card because I thought the drivers were better. I was mistaken, I still have screen tearing that I can't fix.

No doubt someone more knowledgeable about Linux could fix this issue, but I never had any issues with my nVidia blobs. That's not to say nVidia don't have their own issues.


this was my experience as well. I eventually bought an NVidia card to replace it so I could stop having problems. It's been smooth ever since.

I have both an Nvidia and an AMD card. AMDGPU is the gold standard.

This was true until relatively recently, but no longer.

> drivers are expect to use the GPL if they want to be mainlined and maintained

I think parent comment wasn't asking for third party, non-GPL drivers to be mainlined, but for a stable interface for out-of-tree drivers.


There is just no incentive for this that I can see. Linux is an open source effort. Linus had said that he considers open source "the only right way to do software". Out of tree drivers are tolerated, but the preferred outcome is for drivers to be open sourced and merged to the main Linux tree.

The idea that Linux needs better support for out of tree drivers is like someone going to church and saying to the priest "I don't care about this Jesus stuff but can I have some free wine and cookies please".

Full disclosure my day job is to write out of tree drivers for Linux :)


I would expect a large fraction of Nvidia's GPU sales to be from customers wanting to do machine learning. What platform do these customers typically use? Windows?

How do the Linux and Windows drivers compare on matters related to CUDA?


Nvidia has a proprietary Linux driver that works just fine for GPGPU purposes. But because it's not GPLed, it will never be mainlined into the kernel, so you have to install it separately. This is in contrast to AMD GPUs, for which the driver lives in the Linux kernel itself.

Critically, Nvidia has a GPL'd shim. In the kernel code, which lets them keep a stable ABI. The kind of shim Linus isn't interested in for ZFS.

CUDA works fine, and I have found (completely non-rigorously) that a lot of the time where the workload is somewhat mixed between GPU and CPU you'll get better performance on Linux.

The _desktop_ situation is worse, though perfectly functional. But I boot into Windows when I want battery life and quiet fans on my laptop.


You make it sound like the idea is "if you GPL your driver, we'll maintain it for you", which is kinda bullshit. For one, kernel devs only really maintain what they want to maintain. They'll do enough work to make it compile but they aren't going to go out of their way to test it. Regressions do happen. More importantly though, they very purposefully do not maintain any stability in the driver ABI. The policy is actively hostile to the concept of proprietary drivers.

Which is really kind of hilarious considering that so much modern hardware requires proprietary firmware blobs to run.


> Expecting all Linux drivers to be GPL-licensed is unrealistic and just leads to crappy user experiences. nVidia is never going to release full-featured GPL'd drivers, and even corporative vendors sometimes have NDAs which preclude releasing open source drivers.

Nvidia is pretty much the only remaining holdout here on the hardware driver front. I don't see why they should get special treatment when the 100%-GPL model works for everyone else.


ZFS is not really GPL-incompatible either, but it doesn't matter. Between FUD and Oracle's litigiousness, the end result is that there is no way to overcome the impression that it is GPL-incompatible.

But it is a problem that you can't reliably have out-of-tree modules.

Also, Linus is wrong: there's no reason that the ZoL project can't keep the ZFS module in working order, with some lag relative to updates to the Linux mainline, so as long as you stay on supported kernels and the ZoL project remains alive, then of course you can use ZFS. And you should use ZFS because it's awesome.


> But it is a problem that you can't reliably have out-of-tree modules.

That is the bit I'm trying to get at. Yes it would be best if ZFS was just part of Linux, and maybe some day it can be after Oracle is dead and gone (or under a new leadership and strategy). But it's almost beside the point.

Every other OS supports installing drivers that aren't "part" of the OS. I don't understand why Linux is so hostile to this very real use case. Sure it's not ideal, but the world is full of compromises.


I'm not sure Linux is especially hostile. A new OS version of, say, Windows can absolutely break drivers from a previous version.

Linux absolutely is especially hostile. Windows will generally try to support existing drivers, even binary-only ones, and give plenty of notice for API changes. FreeBSD has dedicated compatibility with previous ABIs going several versions back. Linux explicitly refuses to offer any kind of stability for its API (i.e. they can and will break APIs even in minor patches), let alone its ABI.

Linux is generally not happy about seeing any out of tree drivers.

But that is also not without reason; in a certain way Linux balances in a field where they are and want to stay open source. But a lot of users (and sometimes the companies paying some "contributors", too) are companies which are not always that happy about open source. So if it's easy to not put drivers under permissive licenses and still get a good experience out of it, they will have very little incentive to ever make any in-tree GPL drivers, and Linux would run the risk of becoming a skeleton you can't use without accepting/buying drivers from multiple 3rd parties.

Though take that argument with a (large) grain of salt; there are counterarguments to it, too. E.g. the LLVM project, which is much more permissive and still maintained well, but then it is also a very different kind of software.


There's a unique variable here and that's Oracle.

That shouldn't actually matter; it should just depend on the license. But millions in legal fees says otherwise.


>If there is no "approved" method for creating Linux drivers under licenses other than the GPL, that seems like a major problem that Linux should be working to address.

As a Linux user and an ex android user, I absolutely disagree and would add that the GPL requirement for drivers is probably the biggest feature Linux has!


Yes, the oftentimes proprietary Android Linux drivers are such a pain. Not only do they make it harder to reuse the hardware outside of Android (e.g. in a laptop or similar), but they also tend to cause delays with Android updates and sometimes make it impossible to update a phone to a newer Android version even if the phone producer wants to do so.

Android did start making this less of a problem with HAL and stuff, but it's still a problem, just a smaller one.


There is an "approved" method - write an publish your own kernel module. However if your module is not GPL licensed it cannot be published in the linux kernel itself, and you must keep up with the maintenance of the code. This is a relatively fair requirement imo.

...which is what the ZFS on Linux team are doing?

The issue here is that which parts of the kernel API are allowed for non-GPL modules has been decided to be a moving target from version to version, which might as well be interpreted as "just don't bother anymore".


I wonder if this was exactly what they intended, i.e. "just don't bother writing out-of-tree drivers anymore; put them under the GPL and into the tree". And ZFS might just have been accidentally hit by this but is in a situation where it can't put things into the tree...

> If there is no "approved" method for creating Linux drivers under licenses other than the GPL, that seems like a major problem that Linux should be working to address.

The problem is already addressed: if someone wants to contribute code to the project then its licensing must be compatible with the prior work contributed to the project. That's it.


But why are all drivers expected to be "part of the project"? We don't treat userspace Linux software that way. We don't consider Windows drivers part of Windows.

It's pretty simple, once they expose such an API they'd have to support it forever, hindering options for refactoring (that happens all the time). With all the drivers in the tree, they can simply update every driver at the same time to whatever new in-kernel API they're rolling out or removing. And being that the majority of drivers would arguably have to be GPL anyway, and thus open-source, the advantages of keeping all the drivers in tree are high, and the disadvantages low.

With that, they do expose a userspace filesystem driver interface, FUSE. There used to be a FUSE ZFS driver, though I believe it's mostly dead now (But I never used it, so I don't know for sure). While it's not the same as an actual kernel FS driver (performance in particular), it effectively allows what you're asking for by exposing an API you can write a filesystem driver against without it being part of the kernel code.
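
For reference, a FUSE filesystem is just an ordinary userspace program registering callbacks. Something along the lines of libfuse's classic "hello" example (FUSE 2.x API here; the hello_* names and the build command are just this sketch's own assumptions):

    /* A minimal read-only FUSE filesystem, modeled on libfuse's "hello"
     * example. Build roughly with:
     *   gcc hello.c -o hello `pkg-config fuse --cflags --libs`
     * then mount with: ./hello /some/empty/mountpoint
     */
    #define FUSE_USE_VERSION 26
    #include <fuse.h>
    #include <errno.h>
    #include <string.h>
    #include <sys/stat.h>

    static const char *greeting = "Hello from userspace\n";

    static int hello_getattr(const char *path, struct stat *st)
    {
        memset(st, 0, sizeof(*st));
        if (strcmp(path, "/") == 0) {
            st->st_mode = S_IFDIR | 0755;
            st->st_nlink = 2;
            return 0;
        }
        if (strcmp(path, "/hello") == 0) {
            st->st_mode = S_IFREG | 0444;
            st->st_nlink = 1;
            st->st_size = strlen(greeting);
            return 0;
        }
        return -ENOENT;
    }

    static int hello_readdir(const char *path, void *buf, fuse_fill_dir_t fill,
                             off_t off, struct fuse_file_info *fi)
    {
        if (strcmp(path, "/") != 0)
            return -ENOENT;
        fill(buf, ".", NULL, 0);
        fill(buf, "..", NULL, 0);
        fill(buf, "hello", NULL, 0);
        return 0;
    }

    static int hello_read(const char *path, char *buf, size_t size, off_t off,
                          struct fuse_file_info *fi)
    {
        size_t len = strlen(greeting);
        if (strcmp(path, "/hello") != 0)
            return -ENOENT;
        if ((size_t)off >= len)
            return 0;
        if (off + size > len)
            size = len - off;
        memcpy(buf, greeting + off, size);
        return size;
    }

    static struct fuse_operations hello_ops = {
        .getattr = hello_getattr,
        .readdir = hello_readdir,
        .read    = hello_read,
    };

    int main(int argc, char *argv[])
    {
        return fuse_main(argc, argv, &hello_ops, NULL);
    }

Every read on the mountpoint round-trips from the kernel's FUSE driver into this process and back, which is exactly where the performance cost comes from.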


You know, come to think of it, is there anything stopping Linux from having a... FKSE (Filesystem in Kernel SpacE) standard API?

Presumably, such a thing would just be a set of kernel APIs that would parallel the FUSE APIs, but would exist for (DKMS) kernel modules to use, rather than for userland processes to use. Due to the parallel, it would only be the work of a couple hours to port any existing FUSE server over into being such a kernel module.

And, given how much code could be shared with FUSE support, adding support for this wouldn't even require much of a patch.

Seems like an "obvious win", really.


It's not the context switch that kills you for the most part, but the nature of the API and its lack of direct access to the buffer cache and VMM layer. Making a stable FKSE leads to the same issues.

That's why Windows moved WSL2 to being a kernel running on hyper-v rather than in kernel. Their IFS (installable filesystem driver) stack screws up where the buffer cache manager is, and it was pretty much impossible to change. At that point, the real apples to apples comparison left NT lacking. Running a full kernel in another VM ended up being faster because of this.


I mean, it doesn't really work that way, you can't just port a userspace program into a kernel module. For starters, there's no libc in the kernel - what do you do when you want to call `malloc`? ;)

With that, I doubt the performance issues are directly because it runs in userspace, they're likely due to the marshaling/transferring from the in-kernel APIs into the FUSE API (And the complexity that comes with talking to userspace for something like a filesystem), as well as the fact that the FUSE program has to call back into the kernel via the syscall interface. Both of those things are not easily fixable - FKSE would still effectively be using the FUSE APIs, and syscalls don't translate directly into callable kernel functions (and definitely not the ones you should be using).
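
For instance, where a userspace filesystem would just call malloc(), in-kernel code has to go through the kernel's own allocator and pick an allocation context. A tiny sketch (copy_name() is an invented helper, not a real kernel function):

    /* Sketch only: kernel code has no libc, so allocation goes through the
     * slab allocator and the caller chooses a context flag (GFP_KERNEL may
     * sleep, GFP_ATOMIC may not). */
    #include <linux/slab.h>
    #include <linux/string.h>

    static char *copy_name(const char *src, size_t len)
    {
        char *dst = kmalloc(len + 1, GFP_KERNEL);  /* not malloc() */

        if (!dst)
            return NULL;
        memcpy(dst, src, len);
        dst[len] = '\0';
        return dst;                                /* freed later with kfree() */
    }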


The hard part isn't the "FKSE API", the hard part is for the "FKSE driver" to be able to do anything other than talk to that API. Like, scheduling, talking to storage, the network, whatever is needed to actually implement a useful filesystem.

The problem is that nobody is interested in doing that and that's why we are in this situation in the first place. If Oracle wanted to integrate ZFS into Linux they would just relicense it.

Given that the kernel is nearly 30 years old, do you not find it slightly incredible that there has been no effort to stabilise the internal ABI while every other major kernel has managed it, including FreeBSD?

There are ways and means to do this. It would be perfectly possible to have a versioned VFS interface and permit filesystems to provide multiple implementations to interoperate with different kernel versions.

I can understand the desire to be unconstrained by legacy technical debt and be able to change code at will. I would find that liberating. However, this is no longer a project run by dedicated amateurs. It made it to the top, and at this point in time, it seems undisciplined and anachronistic.


> With that, they do expose a userspace filesystem driver interface, FUSE.

Yes, which Linus has also poo-pooed:

"People who think that userspace filesystems are realistic for anything but toys are just misguided."


I mean, he's right. VFS, VMM, and buffer cache are all three sides of the same coin. Nearly every system that puts the FS in user space has abysmal performance; the one exception I can think of off the top of my head is XOK's native FS which is very very very different than traditional filesystems at every layer in the stack, and has abysmal performance again once two processes are accessing the same files.

Oh, I totally agree. But between that statement and this one about ZFS, the takeaway seems to be: for filesystems on Linux, go GPL or go home. Which is fine if that's his attitude, but if so I do wish he'd be more direct about it rather than making claims that are questionable at best (e.g. "ZFS is not maintained"--wtf?).

And yet people use them all the damn time, because they're incredibly useful and, even more importantly, relatively easy to put together compared to kernel modules.

Linus is just plain wrong on this one.


You should read the full quote; he really doesn't disagree with you:

> fuse works fine if the thing being exported is some random low-use interface to a fundamentally slow device. But for something like your root filesystem? Nope. Not going to happen.

His point is that FUSE is useful and fine for things that aren't performance critical, but it's fundamentally too slow for cases where performance is relevant.


The problem with FUSE filesystems is not that they aren't part of the kernel's VCS repo, but that they require a context switch to userspace.

> But why are all drivers expected to be "part of the project"? We don't treat userspace Linux software that way.

That's the policy of Linux development at work. The kernel doesn't break userspace: you can safely upgrade the kernel and your userspace will keep working. But the kernel breaks internal APIs freely, and kernel developers take responsibility for all of the in-tree code. So if a patch to the memory-management subsystem breaks some drivers, kernel developers will find the breakage and fix it.

> We don't consider Windows drivers part of Windows.

Yeah, because the Windows kernel breaks backward compatibility in kernel space far less often, and because hardware vendors are willing to maintain drivers for Windows.


You can license a kernel module or a FUSE implementation any way you see fit. That's a non-issue.

https://www.kernel.org/doc/html/latest/process/license-rules...
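
For reference, the declaration itself is just the MODULE_LICENSE() macro; here is a minimal sketch of a hypothetical out-of-tree module declaring a dual license (the license strings are the ones the document linked above recognises):

    #include <linux/init.h>
    #include <linux/module.h>
    #include <linux/printk.h>

    /* MODULE_LICENSE() is how a module tells the kernel what license it is
     * under. A string the kernel does not treat as GPL-compatible taints
     * the kernel and cuts the module off from GPL-only exported symbols. */
    static int __init hello_init(void)
    {
        pr_info("hello: loaded\n");
        return 0;
    }

    static void __exit hello_exit(void)
    {
        pr_info("hello: unloaded\n");
    }

    module_init(hello_init);
    module_exit(hello_exit);

    MODULE_LICENSE("Dual BSD/GPL");  /* or "GPL", "GPL v2", "Proprietary", ... */
    MODULE_DESCRIPTION("Minimal example of declaring a module license");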

It seems that some people are oblivious to the actual problem, which is that they want their code mixed into the source of a software project without having to comply with the rightsholder's wishes, as if that will shouldn't be respected.

> We don't consider Windows drivers part of Windows.

I'm not sure you can commit your source code to the Windows kernel project.


No, no one wants to force ZFS into the Linux kernel. I think everyone agrees that it needs to be out-of-tree the way things currently stand.

The problem is the nature of the changes, and people questioning whether there is any good _technical_ reason for some of them to be done the way they are.


Because running proprietary binaries in kernel space is not a good idea, nor is it compatible with the vision of Linux?

ZFS isn't proprietary; it's merely incompatible with the GPL.

> If there is no "approved" method for creating Linux drivers under licenses other than the GPL, that seems like a major problem that Linux should be working to address.

It's less a thing Linux can work on than a thing lawmakers/courts would have to make binding decisions on, which would make it clear whether this usage is OK or not. But in practice this can only be decided on a case-by-case basis.

The only way Linux could work on this is by:

1. Adding an exception to their GPL license to exclude kernel modules from GPL constraints (which obviously won't happen, for a bunch of reasons).

2. Turning Linux into a microkernel with userland drivers and driver interfaces that are not license-encumbered (which again won't happen, because it would be a completely different system).

3. Oracle re-licensing ZFS under a permissive open-source license (e.g. dual-licensing it; it doesn't need to be GPL, just GPL-compatible, e.g. Apache v2). Guess what, that won't happen either, or at least I would be very surprised. Oracle is running out of products people _want_ to buy from them and increasingly (ab-)uses the license/copyright/patent system to earn its money and force people to buy its products (or at least pay license fees to it).


There is a big difference between a company distributing a proprietary Linux driver and the Linux project merging software under a GPL-incompatible license. In the first case it is the Linux developers who can raise the issue of copyright infringement, and it is the company that has to defend its right to distribute. In the latter the roles are reversed, and it is the Linux developers who have to argue that they are in compliance with the copyright license.

A shim layer is a poor legal bet. It assumes that a judge who might not have much technical knowledge will agree that putting this little piece of technical trickery between two incompatible works somehow turns a single combined work into two cleanly separated works. It could work, but it could also very easily be seen as meaningless obfuscation.

> Why are all drivers expected to use the GPL

Because a driver is tightly dependent on the kernel. It is this relationship that distinguishes two works from a single work. An easy way to see this is how a music video works. If I create a file with a video part and an audio part and distribute it, legally this will be seen as me distributing a single work. I also need additional copyright permission in order to create such a derivative work, rights that go beyond just distributing the separate parts. If I argued in court that I was merely distributing two different works, the relationship between the video and the music would be put into question.

Userspace software is generally seen as an independent work. One reason is that such software can run on multiple platforms, but the primary reason is that people simply don't see it as an extension of the kernel.


> If there is no "approved" method for creating Linux drivers under licenses other than the GPL, that seems like a major problem that Linux should be working to address.

It's a feature, not a bug. Linux is intentionally hostile to binary-blob drivers. Torvalds described his decision to go with the GPLv2 licence as "the best thing I ever did". [0]

This licensing decision sets Linux apart from BSD, and is probably the reason Linux has taken over the world. It's not that Linux is technically superior to FreeBSD or OpenSolaris.

> Expecting all Linux drivers to be GPL-licensed is unrealistic and just leads to crappy user experiences

'Unrealistic'? Again, Linux took over the world!

As for nVidia's proprietary graphics drivers, they're an unusual case. To quote Linus: "I personally believe that some modules may be considered to not be derived works simply because they weren't designed for Linux and don't depend on any special Linux behaviour" [1]

> Why are all drivers expected to use the GPL?

Because of the 'derived works' concept.

The GPL wasn't intended to overreach to the point that a GPL web server would require that only GPL-compatible web browsers could connect to it, but it was intended to block the creation of a non-free fork of a GPL codebase. There are edge-cases, as there are with everything, such as the nVidia driver situation I mentioned above.

[0] https://en.wikipedia.org/w/index.php?title=History_of_Linux&...

[1] https://en.wikipedia.org/w/index.php?title=Linux_kernel&oldi...


>[...] that seems like a major problem that Linux should be working to address [...] Why are all drivers expected to use the GPL?

Vendors are expected to merge their drivers in mainline because that is the path to getting a well-supported and well-tested driver. Drivers that get merged are expected to use a GPL2-compatible license because that is the license of the Linux kernel. If you're wondering why the kernel community does not care about supporting an API for use in closed-source drivers, it's because it's fundamentally incompatible with the way kernel development actually works, and the resulting experience is even more crappy anyway. Variations of this question get asked so often that there are multiple pages of documentation about it [0] [1].

The tl;dr is that closed-source drivers get pinned to the kernel version they're built for and lag behind. When the vendor decides to stop supporting the hardware, the drivers stop being built for new kernel versions and you can basically never upgrade your kernel after that. In practice it means you are forced to use that vendor's distro if you want things to work properly.

>[...] nVidia is never going to release full-featured GPL'd drivers.

All that says to me is that if you want your hardware to be future-proof, never buy nvidia. All the other Linux vendors have figured out that it's nonsensical to sell someone a piece of hardware that can't be operated without secret bits of code. If you ever wondered why Linus was flipping nvidia the bird in that video that was going around a few years ago... well now you know.

[0]: https://www.kernel.org/doc/html/latest/process/kernel-driver...

[1]: https://www.kernel.org/doc/html/latest/process/stable-api-no...


> Linux is able to run proprietary userspace software. Even most open source zealots agree that this is necessary. Why are all drivers expected to use the GPL?

To answer your excellent question (and ignore the somewhat unfortunate slam on people who seem to differ with your way of thinking), it is an intentional goal of software freedom. The idea of a free software license is to allow people to obtain a license to the software if they agree not to distribute changes to that software in a way that leaves downstream users with fewer options than they would have with the original software.

Some people are at odds with the options available with licenses like the GPL. Some think they are too restrictive. Some think they are too permissive. Some think they are just right. With respect to your question, it's neither here nor there whether the GPL is hitting a sweet spot or not. What's important is that the original author has decided that it does and has chosen the license. I don't imagine that you intend to argue that a person should not be able to choose the license that is best for them, so I'll just leave it at that.

The root of the question is "What determines a change to the software". Is it if we modify the original code? What if we add code? What if we add a completely new file to the code? What if we add a completely new library and simply link it to the code? What if we interact with a module system at runtime and link to the code that way?

The answers to these questions are not well defined. Some of them have been tested in court, while others have not. There are many opinions on which of these constitutes changing of the original software. These opinions vary wildly, but we won't get a definitive answer until the issues are brought up in court.

Before that time period, as a third party who wishes to interact with the software, you have a few choices. You can simply take your chances and do whatever you want. You might be sued by someone who has standing to sue. You might win the case even if you are sued. It's a risk. In some cases the risk is higher than others (probably roughly ordered in the way I ordered the questions).

Another possibility is that you can follow the intent of the original author. You can ask them, "How do you define changing of the software". You may agree with their ideas or not, but it is a completely valid course of action to choose to follow their intent regardless of your opinion.

Your question is: why are all drivers expected to use the GPL? The answer is because drivers are considered by the author to be an extension of the software and hence to be covered by the same license. You are absolutely free to disagree, but it will not change the original author's opinion. You are also able to decide not to abide by the author's opinion. This may open you up to the risk of being sued. Or it may not.

Now, the question unasked is probably the more interesting question. Why does Linus want the drivers to be considered an extension of the original software? I think the answer is that he sees more advantages in the way people interact in that system than disadvantages. There are certainly disadvantages and things that we currently can't use, but for many people this is not a massive hardship. I think the question you might want to put to him is, what advantages have you realised over the years from maintaining the license boundaries as they are? I don't actually know the answer to this question, but would be very interested to hear Linus's opinion.


Sorry for using the term "zealots", I didn't intend it as a pejorative. I should probably have said "hardliners". I meant only to refer to people at the extreme end of the spectrum on this issue.

> The root of the question is "What determines a change to the software". [...] The answers to these questions are not well defined.

And that's fair, but what confuses me is that I never see this question raised on non-Linux platforms. No one considers Windows drivers a derivative of Windows, or Mac kernel extensions a derivative of Darwin.

Should the currently-in-development Windows ZFS port reach maturity and gain widespread adoption (which feels possible!), do you foresee a possibility of Oracle suing? If not, why is Linux different?


>No one considers Windows drivers a derivative of Windows, or Mac kernel extensions a derivative of Darwin.

Perhaps they do, but the difference is that their licensing does not treat their status as derivative works as important. Those platforms have their own restrictions on what drivers they want to allow. In particular, Mac doesn't even allow unsigned drivers anymore, and any signed drivers have to go through a manual approval process. And don't forget iOS, which doesn't even support user-loadable drivers at all.

>Should the currently-in-development Windows ZFS port reach maturity and gain widespread adoption (which feels possible!), do you foresee a possibility of Oracle suing? If not, why is Linux different?

I'm not sure, I haven't used Windows in many years and I don't know their policies. But see what I said earlier: the simple answer is that the license is different from the license of Linux. For more details, the question you should be asking is: Is the CDDL incompatible with Windows licensing?


Thank you!

Just to clarify one little thing, because it appears to be something of a common misconception:

> Mac doesn't even allow unsigned drivers anymore

You can absolutely still install unsigned drivers (kernel extensions) on macOS; the user just needs to run a Terminal command from recovery mode. This is a one-time process that takes all of five minutes if you know what you're doing.

You can theoretically replace the Darwin kernel with your own version too. macOS is not iOS, you can completely open it up if you want.


This is nonsense. The problem is not getting ZFS bundled with Linux, as he implies here. The problem is that Linux artificially restricts which APIs your module can access based on its license, so you wouldn't be able to use ZFS even at your own prerogative, as he suggests.

He is claiming that it comes down to the user's choice, which would be just fine if that were true. The only problem here is that Linux has purposely taken steps to hinder that choice.
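
What's being alluded to is the split between EXPORT_SYMBOL() and EXPORT_SYMBOL_GPL() in the kernel source. A sketch of how that looks from the kernel side; the two helper functions are made-up placeholders, while the export macros are real:

    #include <linux/export.h>

    /* Placeholder in-kernel functions, names invented for illustration. */
    int example_generic_helper(void)
    {
        return 0;
    }
    /* Callable from any module, regardless of its MODULE_LICENSE() string. */
    EXPORT_SYMBOL(example_generic_helper);

    int example_core_helper(void)
    {
        return 0;
    }
    /* Only resolvable by modules whose license string the kernel treats as
     * GPL-compatible; a CDDL or proprietary module that references it fails
     * to load with an "unknown symbol" error. */
    EXPORT_SYMBOL_GPL(example_core_helper);

That per-symbol gating, decided export by export by the subsystem maintainers, is the restriction the comment above is objecting to.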

