The incompatibility claim seems to rest on three arguments:
1) Statements from the FSF website, but those have no legal bearing on the actual text of the licenses
2) Claims that the CDDL was engineered to be incompatible with the GPL; again, an interesting hypothesis, but it has little bearing on the actual text of the licenses
3) The derivative-work argument from the GPL. This seems to be the only one that could hold water; however, I doubt that a court would find ZFS to be a derivative work of the Linux kernel.
Furthermore, even if you argue that the "derivative work" clause makes them incompatible, there's no practical way to prosecute such a violation. Copyright infringement is what the law calls a tort, which means you have to show that someone violated the rules AND caused quantifiable harm. What would our theory of harm be? (https://blog.hansenpartnership.com/are-gplv2-and-cddl-incomp...)
The main debate over the incompatibility isn't about GPL vs. CDDL; even Canonical isn't arguing that the licenses aren't incompatible. It's specifically about the Linux kernel vs. ZFS, i.e., does linking the ZFS module with the Linux kernel make the ZFS module a derivative work of the Linux kernel?
If yes, it has to be distributable under the GPLv2, which it can't be. If no, then the incompatibility is irrelevant, and the kernel and ZFS can be distributed together, each under its own license. This is where opinions (of legal experts, not people on the internet) differ: the FSF's and SFC's lawyers say this makes ZFS a derivative work and you can't distribute them together; Canonical's lawyers say it doesn't, and therefore you can.
The Software Freedom Conservancy argues that not only is zfs.ko a derivative work of Linux, but that any dynamically loaded Linux kernel module is a derivative work of Linux. To me that feels like a bit of a stretch, and one could make the opposite argument.
By virtue of the fact that zfs.ko is an optional module, rather than an integral part of Linux, it's not a derivative work. zfs.ko must be a separate entity from Linux in the first place: since it's a module, it must be logically discrete, interacting with Linux only through the standard kernel interfaces. Further, it's likely that ZFS could only be a derivative work of Linux if it were distributed as a changeset against the Linux source code, rather than as a separately buildable module.
The statically vs. dynamically compiled (linked) distinction doesn't hold water: it's an arbitrary technical one that only concerns how things are laid out on the filesystem and/or loaded into memory. Modules, plugins, add-ons, and similar things can very often be either statically compiled in or dynamically loaded; there are many examples of this in existing software.
As long as I don't distribute the resulting binary, there should be no legal implications.
The problem is that neither your install CD nor your recovery CD will have it enabled by default, so if you have a problem, your hard drive won't mount.
While the current generation of "make me a magic shiny debian image" is:
(I went looking for some tool I've used in the past, which may, or may not, have been "live-build").
In general, though, just pulling down a DKMS ZFS module into any working recovery image via apt should work.
But not if you don't have net access [ed: or a LAN apt mirror...].
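Roughly like this, assuming a Debian-based recovery image with the contrib repo enabled (package names may differ by release):
# install headers for the running kernel, then the DKMS-built ZFS module
apt update
apt install linux-headers-$(uname -r) zfs-dkms zfsutils-linux
modprobe zfs
zpool import -a    # scan attached disks and import any pools found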
edit: including my referral link here will probably result in negative karma, but I'm desperate for points!
(edit: When I wrote this comment, the person I was responding to had only left a relatively short comment, without the argument or the subsequent bullet points.)
For example, the GPL talks a lot about linking. Now, how does that work with languages where there is no linking at build time (like Java)? The GPL's architects tell us that doesn't matter because it's all in the same spirit. But that's just an opinion; you need a court to rule on it to be sure.
So the creators of the CDDL didn't think they were compatible.
What do you mean when you say "have come to the conclusion that they're unlikely to be incompatible"?
Do you believe one could combine the Linux kernel and ZFS into a single source tree and distribute binaries under the terms of the GPLv2?
Risks are sky high and no sane company should invest in it. It's worse than being just incompatible.
Who would the suit be between? Oracle and Canonical?
>Risks are sky high and no sane company should invest in it. It's worse than being just incompatible.
Companies don't seem to have a problem using Nvidia's binary kernel driver, so I doubt it.
Canonical is the smallest company, it has the least to lose.
As an aside, are there any independent implementations?
OpenZFS is a fork, and it is the upstream for Illumos, FreeBSD, Linux, and macOS; all share the same on-disk format. As for the exact hierarchy, I think Illumos and FreeBSD are essentially co-equal upstreams, then ZoL (ZFS on Linux), with macOS downstream of ZoL. I wouldn't call them different implementations.
zfs-fuse I think is based on the original.
RAID/filesystem expansion is probably the #1 thing that in the past was a huge PITA. These days it's possible to reshape a Linux array with a single add/grow invocation of mdadm after installing a new disk, as sketched below.
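Something like this, for an existing md RAID5 (device names here are hypothetical):
mdadm --add /dev/md0 /dev/sdd1            # add the new disk as a spare
mdadm --grow /dev/md0 --raid-devices=4    # reshape the array to include it
resize2fs /dev/md0                        # once the reshape finishes, grow the filesystem (ext4 here)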
b. Near as I can tell, upstream GRUB supports ZFS. The article cites a 2011 FreeBSD thread.
Is that a reasonable idea? What would it take for someone to go about it today? I would expect that it would be possible if every copyright holder (I'd guess mainly developers and their current/past employers, next of kin, estates, etc.) were asked to sign a document collectively consenting to having ZFS's license changed.
Or is there a reason that some current copyright holders might not want that? I get why Sun would've refused since it was a competitive advantage for Solaris, but what's stopping it from happening now?
This whole situation just seems really silly as I'm not sure what anyone has to gain from it.
With Solaris dead, I'd generally agree that it's a little pointless to keep it incompatible. They might as well just place it under something like BSD or Apache 2, which should keep the FreeBSD and Mac OS X ports happy as well as opening up the possibility of inclusion in the Linux kernel. With Oracle Solaris dead, I think there's even a case for Oracle's bottom line: they could start including ZFS with Oracle Enterprise Linux and boast about all the advantages of using it there.
As the copyright holder, can't they do that anyway?
By not relicensing it, they keep others -- Canonical, Red Hat, etc. -- from being able to do the same thing.
However, having zfs outside the mainline kernel is a bit of a pain.
There is frequent breakage in kernel APIs, and the ZFS on Linux project is always playing catch-up with these changes. They also have to maintain compatibility with various versions of the kernel APIs, which is probably not an easy job.
I know it's asking for trouble, but at home, I'm running a server with ZFS under Debian Sid (so basically a rolling release), and each kernel update is a little risky.
Each time I upgrade, I pray to Saint Linus, but that's not always enough; quite often the zfs or spl module fails to build.
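Roughly what I check after each upgrade (assuming a DKMS-based install like mine):
dkms status                          # did zfs/spl build against the new kernel?
dkms autoinstall -k "$(uname -r)"    # retry the build for the running kernel if not
modprobe zfs && zpool status         # confirm the module loads and the pool is healthy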
Not my brightest idea.
I ended up having a small XFS root partition and putting everything else on ZFS. Works great, but I don't look forward to the day when my root filesystem fails.
This seems to be the biggest problem with ZFS. Even Apple ditched ZFS on macOS.
ZFS is slowing down and hurting the real alternative: btrfs. IMHO, Red Hat should be ashamed for abandoning btrfs.
Both ZFS and btrfs are owned/driven by Oracle. When Oracle bought Sun (and ZFS with it), the pace of btrfs development seemed to slow significantly. I'm not sure what Oracle really wants to do in this situation, but they are clearly in the driver's seat.
(This is also alluded to in the article)
The biggest problem btrfs has, really, is the user interface. Nothing is as easy to use as the zfs command!
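For instance, a mirrored pool with a compressed dataset and a snapshot is just (pool, dataset, and device names made up):
zpool create tank mirror /dev/sda /dev/sdb     # create a mirrored pool
zfs create -o compression=lz4 tank/data        # create a compressed dataset
zfs snapshot tank/data@before-upgrade          # take a snapshot
zfs list -t snapshot                           # list snapshots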
This is still a pretty good summary of the differences: https://rudd-o.com/linux-and-free-software/ways-in-which-zfs...
The only part that's outdated now is that btrfs finally has raid5/6 support... kind of. It's still marked experimental and shouldn't be trusted for production data.
Mind you, btrfs does have one advantage: online shrink support, which gives it much better resizing capabilities for its pools. This has always been on the back burner for ZFS; Sun once had a very slow implementation back around 2005-2006, but the Oracle acquisition pretty much killed any planned work on that front, and I think most ZFS users don't consider it a high priority either. Even in the btrfs case, it only works well when it's in the mood to. It's easy to get into situations where it will absolutely refuse to remove a device despite there being plenty of space for it.
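The shrink/remove workflow being described is roughly this (a sketch; mount point and device names are hypothetical):
btrfs filesystem resize -100g /mnt/pool    # shrink the filesystem online by 100 GiB
btrfs device remove /dev/sdc /mnt/pool     # migrate data off a device and drop it
btrfs filesystem usage /mnt/pool           # check the remaining allocation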
While ZFS is owned by Oracle, they certainly don't drive development. Oracle has their closed-source version of ZFS, used in the now-dead Solaris. Then there is OpenZFS, which is where the majority of research and development takes place and where the Illumos, FreeBSD, macOS, and Linux developers upstream their work.
We've been running btrfs in production for our small shop for a couple years. We've had one incident, but not too many problems, really.
Though next time I might not use btrfs RAID-1, and instead use regular MD. I'd still want the on-disk checksums for data, and the snapshots, which are awesome.
SuSE just doesn't have the same level of resources that Oracle can bring to bear. At the time, btrfs was a crucial feature for the future of Linux precisely because of the competition from ZFS. When Oracle bought Sun, that competitive pressure was gone.
I recently needed a large and fast SQL database. I decided to start by doing some simple tests with dd and some more advanced ones with fio.
On a very decent AMD EPYC 7401P 24-core with 128 GB of RAM and 6 NVMe drives, I found that performance started declining once I added the 6th drive.
Based on my benchmarks, the most I can get out of this machine is about 4GB/s in read/write performance. This makes me suspect it is the speed limit from the CPU to the PCH.
So instead of this one server, I will deploy a cluster of smaller servers, each equipped with 2 NVMe drives, and hope the network drivers do a decent job (just checked the dmesg: eth2: (PCIe:5.0Gb/s:Width x4)).
There’s simply no way a PCIe x4 NIC is going to be able to compete with the bandwidth available within that server.
Would you have some suggestions to diagnose the issue better?
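For reference, the fio job behind the results further down looks roughly like this (a sketch; the libaio engine and direct-I/O flag are my reconstruction from the output, not a recommendation):
fio --name=randwrite --filename=/dev/md6 --rw=randwrite --bs=4k \
    --ioengine=libaio --direct=1 --iodepth=16 --numjobs=8 \
    --runtime=240 --time_based --group_reporting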
What does that output suggest? Unless you have a 10GbE card, this won't matter.
I also want to get my hands on a few Intel Optanes to test how well VROC performs in practice.
FYI, here are some previous results where I configured partitions of similar sizes, then assembled them like:
mdadm --create --verbose /dev/md6 --level=stripe --raid-devices=6 /dev/nvme[0-5]n1p6
mdadm --create --verbose /dev/md5 --level=stripe --raid-devices=5 /dev/nvme[0-4]n1p5
On repeated benchmarks, the best results were consistently obtained with 5 drives:
randwrite: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K,
Starting 8 processes
Jobs: 8 (f=8): [w(8)] [100.0% done] [0KB/4397MB/0KB /s] [0/1126K/0 iops] [eta 00m:00s]
randwrite: (groupid=0, jobs=8): err= 0: pid=76554: Mon Feb 12 01:47:56 2018
write: io=966497MB, bw=4027.4MB/s, iops=1030.1K, runt=240002msec
slat (usec): min=1, max=3852, avg= 3.45, stdev= 1.49
clat (usec): min=0, max=17533, avg=119.88, stdev=258.75
lat (usec): min=10, max=17535, avg=123.40, stdev=258.76
clat percentiles (usec):
| 1.00th=[ 12], 5.00th=[ 13], 10.00th=[ 15], 20.00th=[ 20],
| 30.00th=[ 27], 40.00th=[ 36], 50.00th=[ 50], 60.00th=[ 71],
| 70.00th=[ 106], 80.00th=[ 167], 90.00th=[ 270], 95.00th=[ 366],
| 99.00th=[ 1020], 99.50th=[ 1992], 99.90th=[ 3536], 99.95th=[ 3952],
| 99.99th=[ 4960]
bw (KB /s): min=402136, max=748520, per=12.50%, avg=515423.89, stdev=44403.70
lat (usec) : 2=0.01%, 4=0.01%, 10=0.02%, 20=19.43%, 50=30.18%
lat (usec) : 100=18.95%, 250=19.84%, 500=9.14%, 750=1.12%, 1000=0.32%
lat (msec) : 2=0.52%, 4=0.45%, 10=0.05%, 20=0.01%
cpu : usr=17.59%, sys=50.20%, ctx=50843447, majf=0, minf=71
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=0/w=247423104/d=0, short=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=16
WRITE: io=966497MB, aggrb=4027.4MB/s, minb=4027.4MB/s, maxb=4027.4MB/s, mint=240002msec, ma
Disk stats (read/write):
md5: ios=0/247193645, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=0/49484620, a
nvme0n1: ios=0/49483811, merge=0/0, ticks=0/5518696, in_queue=5613760, util=90.41%
nvme4n1: ios=0/49480722, merge=0/0, ticks=0/5573820, in_queue=5663004, util=90.29%
nvme1n1: ios=0/49485862, merge=0/0, ticks=0/5390388, in_queue=5483560, util=90.07%
nvme2n1: ios=0/49488566, merge=0/0, ticks=0/5395484, in_queue=5496916, util=90.12%
nvme3n1: ios=0/49484143, merge=0/0, ticks=0/5506276, in_queue=5600852, util=90.30%
Run status group 0 (all jobs):
WRITE: io=963949MB, aggrb=4016.5MB/s, minb=4016.5MB/s, maxb=4016.5MB/s, mint=240001msec, maxt=240001msec
md6: ios=0/246593949, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=0/41128473, aggrmerge=0/0, aggrticks=0/4307426, aggrin_queue=4365219, aggrutil=89.21%
nvme2n1: ios=0/41137801, merge=0/0, ticks=0/2860992, in_queue=2909124, util=81.31%
nvme3n1: ios=0/41125375, merge=0/0, ticks=0/2861640, in_queue=2896800, util=80.94%
nvme0n1: ios=0/41136785, merge=0/0, ticks=0/7713176, in_queue=7809532, util=89.21%
nvme4n1: ios=0/41142237, merge=0/0, ticks=0/3141888, in_queue=3185880, util=81.73%
nvme1n1: ios=0/41116666, merge=0/0, ticks=0/6495928, in_queue=6578204, util=87.12%
nvme5n1: ios=0/41111977, merge=0/0, ticks=0/2770936, in_queue=2811776, util=81.20%
Try increasing the queue from 16 to 64.
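In fio terms that's the --iodepth parameter, e.g. (same job as above otherwise, a sketch):
fio --name=randwrite --filename=/dev/md6 --rw=randwrite --bs=4k \
    --ioengine=libaio --direct=1 --iodepth=64 --numjobs=8 \
    --runtime=240 --time_based --group_reporting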
RAID5 is long dead. It's too sensitive to data loss and it has performance issues. Use RAID 1 or RAID 10.
The article was interesting for presenting simple tweaks and a nice table of the results.
What happens when you test each drive individually? Do they all perform identically or is the 6th device slower? An array will be limited by its slowest member even if there is no bus contention.
Also, there are many tunable parameters and I am not sure the defaults are equivalent for RAID5 or RAID6 in the Linux-MD layer. What happens if you use all 6 drives in a RAID5 array? What happens if you use 5 drives in a RAID6 array? Did you try changing the chunk size to match the SSD page size? Are you sure the partitions are all aligned properly to avoid any fragmentation during SSD writes? There are countless variables here...
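For instance, something along these lines would show whether one device is the straggler (a sketch; it assumes the drives enumerate as nvme0..nvme5 and uses random reads so it's non-destructive):
for dev in /dev/nvme{0..5}n1; do
  fio --name="$(basename "$dev")" --filename="$dev" --rw=randread --bs=4k \
      --ioengine=libaio --direct=1 --iodepth=16 --numjobs=1 \
      --runtime=60 --time_based
done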
This is a RAID 0, as I am only concerned with performance here. The chunk size is the default 512K. I could try 4K (see the sketch after the partition table below), but I haven't found the page size of the Samsung SM961 NVMe drives. The partitions are of course 4K aligned:
Number Start (sector) End (sector) Size Code Name
1 4096 209719295 100.0 GiB FD00 Linux RAID
2 209719296 419434495 100.0 GiB FD00 Linux RAID
3 419434496 629149695 100.0 GiB FD00 Linux RAID
4 629149696 838864895 100.0 GiB FD00 Linux RAID
5 838864896 1048580095 100.0 GiB FD00 Linux RAID
6 1048580096 1258295295 100.0 GiB FD00 Linux RAID
7 1258295296 1468010495 100.0 GiB FD00 Linux RAID
8 1468010496 1677725695 100.0 GiB FD00 Linux RAID
9 1677725696 1875384974 94.3 GiB 8300 Linux filesystem
10 34 4095 2.0 MiB EF02 BIOS boot partition
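Recreating the stripe with an explicit chunk size would look like this (a sketch; the 4 is in KiB and is only a guess at the page size):
mdadm --create --verbose /dev/md6 --level=stripe --chunk=4 --raid-devices=6 /dev/nvme[0-5]n1p6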
If you want to try, give me your email and I'll send you the root password so you can play over SSH. I was setting up md5 for psql tests, but there's nothing on the server yet; you can also play as much as you want with /dev/md6.
You can also get an AX160-NVMe at Hetzner.de. If you cancel before 14 days, there is no charge.
It's possible that by placing the cards in different slots you could gain more speed. There may also be some sort of setting to configure the PCIe lanes between eight 16-lane slots (accelerator cards) and thirty-two 4-lane slots (NVMe drives).
If you're running a single-threaded application you may be hitting the limits of the DMA and memory controllers.
I posted some benchmarks above using fio -- I think it is properly multithreaded.
Not to mention the GRUB driver. It's GPLv3 but I wouldn't be surprised if the GRUB project is willing to relicense it for a Linux kernel port.