ZFS is the FS for Containers in Ubuntu 16.04 (dustinkirkland.com)
281 points by doener on Feb 18, 2016 | 278 comments



This is great news, and we're already using ZFS in production on Ubuntu in a few areas at Netflix (not widespread yet).

Ubuntu 16.04 also comes with enhanced BPF, the new Linux tracing & programming framework that is built into the kernel and is a huge leap forward for Linux tracing. E.g., we can start using tools like these: https://github.com/iovisor/bcc#tracing


What about FreeBSD?

Does that imply Netflix is transitioning away from FreeBSD? If so, why?


Netflix cloud: AWS EC2, tens of thousands of cloud instances, mostly Ubuntu.

Netflix CDN (Open Connect Appliance): lots of physical boxes, FreeBSD.


Any reason why you don't use FreeBSD everywhere (or Ubuntu everywhere)?

Why two different OSes?


It's really two questions: why choose Ubuntu for the cloud, and why choose FreeBSD for the CDN. We believe each is the best choice for its environment. I was trying to type an explanation here, but it's really something that will take a lot to explain (maybe a Netflix tech blog post).


If you do write that blog post, it would be cool if you not only covered the FreeBSD vs. Ubuntu aspects of the choice, but also the Ubuntu vs. other Linux aspects (particularly Debian).


That would be a great post.


How does BPF compare to DTrace?


If you browse some of the _example.txt files in https://github.com/iovisor/bcc/tree/master/tools , you'll see it's solving the same problems we used to solve with DTrace, plus a few extra. Here are a couple of the ZFS examples (since we're talking ZFS):

  # ./zfsslower 
  Tracing ZFS operations slower than 10 ms
  TIME     COMM           PID    T BYTES   OFF_KB   LAT(ms) FILENAME
  06:31:28 dd             25570  W 131072  38784     303.92 data1
  06:31:34 dd             25686  W 131072  38784     388.28 data1
  06:31:35 dd             25686  W 131072  78720     519.66 data1
  06:31:35 dd             25686  W 131072  116992    405.94 data1
  06:31:35 dd             25686  W 131072  153600    433.52 data1
  [...]

  # ./zfsdist 
  Tracing ZFS operation latency... Hit Ctrl-C to end.
  ^C

  operation = 'read'
     usecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 0        |                                        |
         4 -> 7          : 4479     |****************************************|
         8 -> 15         : 1028     |*********                               |
        16 -> 31         : 14       |                                        |
        32 -> 63         : 1        |                                        |
        64 -> 127        : 2        |                                        |
       128 -> 255        : 6        |                                        |
       256 -> 511        : 1        |                                        |
       512 -> 1023       : 1256     |***********                             |
      1024 -> 2047       : 9        |                                        |
      2048 -> 4095       : 1        |                                        |
      4096 -> 8191       : 2        |                                        |
  [...]
The current BPF interface we're using (bcc) is Python for the frontend, and C for the backend. It's currently much more verbose than DTrace, and involves writing 10x the lines of code. For some immediate use cases at Netflix, that's not a big problem, as staff will be using BPF via a GUI (Vector), not writing these tools directly.

There are also high-level features it's still missing (like tracepoints and sampling), so what will be in Ubuntu 16.04 won't do everything, but it will do a fair amount: most of those _example.txt's. Some use a newer BPF interface (Linux 4.5), and we've been putting the legacy versions in an /old directory specifically for Ubuntu 16.04 users.


Does the current publicly released version of Vector support BPF? Or is there perhaps a PMDA that allows BPF support?

I'm following along with all of this pretty excitedly, and crossing my fingers for a Linux tracing book with BPF, ftrace, perf, etc. to read through and keep on my shelf next to your performance and dtrace books ;)


Vector doesn't have BPF support yet. When it does we should be open sourcing it. As for a book, I'll try to get it done. :)


So good to see ZOL has come so far!


ZFS is nice, but as far as I understand the Linux version does not yet support copy-on-write file clones via e.g. "cp --reflink=always", which to me was reason enough to choose BTRFS instead. Apart from this the two systems seem to be quite comparable (from my limited user perspective), with BTRFS having quite good Linux support as well. Maybe someone more experienced with the COW functionality could comment on that, as it would be very interesting to hear how other people deal with this.
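For anyone unfamiliar, the feature in question looks like this (paths and dataset names are made up); ZFS gets a similar effect at dataset granularity via snapshots and clones rather than per-file reflinks:

```shell
# btrfs: per-file COW clone; instant, shares extents until either file is modified
cp --reflink=always /mnt/btrfs/vm.img /mnt/btrfs/vm-clone.img

# ZFS equivalent works at the dataset level, not per file:
zfs snapshot tank/vms@base
zfs clone tank/vms@base tank/vms-clone
```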


I've used ZFS on both BSD and OpenIndiana, and I've used btrfs on Linux as recently as 4.0.5.

As of 4.0.5 btrfs was IMO completely unusable as a daily file system. Some examples of issues I ran into:

1) System became unbootable with the version of btrfs I had installed and I had to use either an older or newer kernel to recover

2) I have a periodic backup of my mailbox that runs, and when it runs my system becomes completely unusable until it completes. The same script running on zfs on bsd and with ext4 or reiser3 on linux would show I/O slowdowns, but I could still use my machine.

3) In general I would run into other minor issues and the consensus in #btrfs was that since my kernel was more than 3 months old, it was probably fixed in the latest version, and why would somebody using an experimental filesystem not be tracking mainline more closely?

[edit]

To be fair, here's some issues with ZFS:

1) Do not ever accidentally enable dedupe on commodity hardware; it will slowly consume all your RAM unless you are on a Sun server (where 64GB of RAM is a resource-constrained environment), and there aren't effective ways to undo dedupe other than copying all the data onto a different pool.

2) You can't shrink a pool. Hugely annoying, apparently non-trivial to solve.
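A rough sketch of how to assess dedup before (or after) enabling it, assuming a pool named "tank":

```shell
# Simulate the dedup table and estimate the dedup ratio without enabling anything:
zdb -S tank

# If dedup is already on, inspect the on-disk dedup table (DDT) statistics:
zdb -DD tank

# Turning the property off only affects new writes; existing DDT entries
# persist until the deduped data is rewritten or copied to another pool:
zfs set dedup=off tank
```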


"To be fair, here's some issues with ZFS:"

Let me add:

3) Do not allow a pool to exceed 90% capacity ... and probably don't let it exceed 85%.

ZFS does not have a defrag utility and it badly needs one. You can permanently wreck zpool performance by running it up past 90% capacity - even if you later reduce capacity back down to 75-80%. You can sort of fix it if you add additional top level vdevs to the zpool, thus farming out some IO to the new set of disks, but it's still going to be performance constrained forever. The only solution is to create a new pool and export the data to it.

This is unacceptable, by the way.

It is not at all reasonable to require a filesystem to stay below 80% capacity (our target "full" number at rsync.net) nor is it acceptable that hitting 90% is a (performance) death sentence.

When you consider that you might have already sacrificed 25% of your physical disk just for the raidz2/raidz3 overhead, being constrained to 80% means you're only using 60% or so of your physical disks that you bought.

ZFS needs defrag. Badly.
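Newer OpenZFS releases at least expose free-space fragmentation as a pool property, which makes the problem visible before it becomes a death sentence (pool names here are hypothetical):

```shell
# Watch capacity and free-space fragmentation:
zpool list -o name,size,alloc,cap,frag,health tank

# There is no defrag; the only full fix is replicating to a fresh pool:
zfs snapshot -r tank@migrate
zfs send -R tank@migrate | zfs recv -F newtank
```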


If gang blocks are generated, you get more I/Os than necessary and an extra level of indirection, but the ZFS code base tries very hard to avoid that situation by switching the allocator behavior at the metaslab level to best-fit, such that most data written to the pool would not use gang blocks at all. Gang blocks are understandably terrible for IOPS, and they would be a likely source of the performance degradation that you saw. You can probe for the zio_gang_* functions during a scrub to see if any gang blocks exist. On my pool, which has exceeded 90% on multiple occasions, there are zero gang blocks and consequently no permanent degradation from them. The only other problem that you might have (which tends to be caused by the order of writes rather than the fullness of the pool) is lower sequential performance from nonlinear block placement (one ZoL user measured this as cutting sequential reads in half on a pool filled with files made by BitTorrent), but that is a much less severe problem, especially on solid state drives. If you want to fix placement, you can do a file copy or send/recv; the new locations should have blocks picked in sequence whenever possible.
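One way to do that probing with the bcc tools mentioned elsewhere in this thread (a sketch; funccount counts calls to kernel functions matching a pattern, and the pool name is hypothetical):

```shell
# Kick off a scrub so ZFS reads every allocated block:
zpool scrub tank

# Count hits on the gang-block code paths while the scrub runs;
# all-zero counts across zio_gang_* means no gang blocks exist:
./funccount 'zio_gang_*'
```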


For ZFS to get defrag, someone with enough ability and knowledge would need to bite the bullet and go through the fairly massive and difficult task of adding in block pointer rewrite, which doesn't seem to be something anyone has been willing to do, and I've seen a lot of concerns about the actual feasibility of it from some very smart people that are knowledgeable with the codebase.


Maybe off topic, but Alex Reece from Delphix had a blog post about a year ago about the ability to remove a disk from a pool.

http://blog.delphix.com/alex/2015/01/15/openzfs-device-remov...

This would imply some kind of rebalancing, if I understand correctly. Which maybe could be considered a form of indirect defragmentation.


Which also incidentally would solve the pool shrinking problem as well.


I wonder how much money you'd need to pay a very talented engineer to do the work. This is yet another thing I didn't read about before building a home NAS on ZFS.


I had this happen to me... I accidentally allowed a pool to fill up to capacity and couldn't do anything with it because deleting files wouldn't free space due to snapshots and the commands to delete snapshots wouldn't work.

Then I added a disk to it to try to recover. That worked, but only after adding the disk did I realize that I couldn't shrink the pool down again. I ended up moving the whole thing off to a new disk cluster and back again. Really painful.


That's a good point about the 90% capacity.

Also the same "technically challenging" problem that needs to be solved for shrinking pools would also make defrag possible.


The Reiser guys were talking about this too. I think they needed defrag and were trying to get DARPA money to implement it, but it never happened.


Not all of the performance issues encountered when a pool exceeds 90% capacity are related to fragmentation.

Some implementers have been able to significantly reduce the pain of those situations.


The main factor seems to be the best-fit allocator, which tends to go into action sooner because individual metaslabs cross the 94% threshold earlier than the pool itself, yet still get selected due to LBA weighting, a trick to increase throughput on rotational media. This ought to help prevent best-fit allocation from occurring earlier than necessary on SSDs:

https://github.com/zfsonlinux/zfs/commit/fb40095f5f0853946f8...

That said, Delphix made changes to consider fragmentation rather than just space usage when the spacemap histogram is in use, so performance before the best-fit behavior kicks in is better than with outright selection of metaslabs by free space.


I've been using btrfs for two years (Fedora) and the only problem I've ever run into was "no space left on device", solvable with a rebalance. btrfs has also survived many hard poweroffs.
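For reference, the rebalance in question, with the usage filters that keep it from rewriting every chunk (mount point is illustrative):

```shell
# Compact only data/metadata chunks that are less than half full:
btrfs balance start -dusage=50 -musage=50 /mnt
btrfs balance status /mnt
```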


I'll pitch in with a neutral position. I've been using btrfs for four years, and in that time I've had unrecoverable fs corruption probably three times. This is on Arch, on bleeding edge kernels, where new releases are prone to regressions that break the filesystem.

But there was a tangible progression from instability towards increasing stability. I haven't had a lick of issues with btrfs in about a dozen kernel releases. I'm close to saying I'd trust it in a production environment, since I use it everywhere else as a daily driver; I would just use an LTS release kernel to be safe.

It is not all sunshine and roses, though. While Facebook employs several major btrfs developers, a lot of features that have been talked about for years still have not seen the light of day or any development whatsoever: lz4 compression, better checksum algorithms, per-subvolume encryption, online filesystem checking; and the RAID 5/6 support is still kind of garbage a year later. I worry that btrfs is suffering from a lack of interest in making the last legitimate pushes it needs (code audits, integration testing) to make it truly trustworthy.

But at the end of the day checksum integrity and COW are basically a game changer for me in terms of data integrity.


I tried to use the mirror functionality when it was new. I tested booting with one disk missing. Errors all the way. I went into IRC and chatted with btrfs folk about the "bug". Their response?

"Booting without all members of your mirror is unsupported."


It's certainly supported now. mount -o degraded. It ought to be the default, though.


If it's not the default yet, it's a serious bug. A server should still come up completely with one disk in a mirror.


If you use ZFS, you only have snapshots for this. cp --reflink also has some gotchas on btrfs: if you do a balance it's not preserved, and there are some odd problems with snapshots (I have no link at hand; take a look at the mailing list).

ZFS is not comparable to btrfs at the moment. Everything device related is missing on btrfs. No detection of missing or broken devices in btrfs at the moment, no hotspare functionality, and btrfs RAID1 uses the pid to decide which disk to read from. RAID5/6 is still experimental and there are some odd behaviours.

Using btrfs for production is a risky bet and may very well bite you. The tooling is terrible at the moment (IMHO) and benchmarks favour ZFS most of the time.


No ZFS version has support for that. It has been discussed and it might be implemented at some point.

That said, I would like to point out that ZFS' dataset level operations are more powerful than reflinks. ZFS' dataset level operations give separate independent snapshot and clone capabilities. They also provide the ability to rollback without killing things on top (which is useful in some cases). You cannot do that with reflinks. I suppose the immutable bit could be used to fix a reflink so that it retains the state at creation, but that is racy. In the case of virtual machines which seems to be a major application of reflinks, zvols are lower overhead and support incremental send/recv.

One benefit of reflinks would be that regular users can use it, but regular users should be able to snapshot, clone and rollback when delegation support is implemented.
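A sketch of those dataset-level operations (dataset and user names made up; `zfs allow` is the delegation mechanism from Solaris/FreeBSD that, as noted, ZFS on Linux had not yet implemented at the time):

```shell
zfs snapshot tank/home@before-upgrade                    # point-in-time state
zfs clone tank/home@before-upgrade tank/home-experiment  # writable clone of it
zfs rollback tank/home@before-upgrade                    # discard changes since the snapshot

# With delegation, a regular user can be granted the same operations:
zfs allow alice snapshot,clone,rollback tank/home
```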


Running hourly, daily, weekly and monthly snapshots is reason enough to choose ZFS on Linux for me. And ironically, it runs better on Linux than it did for me on Solaris - I used to get occasional pauses every few minutes when streaming media. Memory utilization for cache purposes isn't fantastic since it doesn't interact well with the rest of the kernel's logic, but everything else is pretty good.


I don't understand why BTRFS isn't gaining more support in Linux. I prefer BTRFS to ZFS with Fuse.


Probably rhymes with "btrfs balance".

I've had the misfortune of using btrfs in production with a few hundred machines on Ubuntu 14.04. It's one of the most finicky filesystems I've ever used. It's probably better in newer kernels, but if you have a lot of churn it requires constant care and feeding and tends to cause kernel softlockups fairly commonly.

Also this: http://blog.pgaddict.com/posts/friends-dont-let-friends-use-... and this: https://www.phoronix.com/scan.php?page=news_item&px=CoreOS-B... and this (old but a few of the points still apply): http://coldattic.info/shvedsky/pro/blogs/a-foo-walks-into-a-...

That being said I'm sure there are plenty of use cases for which btrfs in its current capacity is more than sufficient.


At CoreOS we tried really hard to make btrfs happen, but it really came down to how differently it operated from other file systems. It was mainly a UX issue and thus fell into my lap.

The major issue is that regular debugging tools that folks have been using forever, like `df -h`, aren't just non-functional; they actively misrepresent the state of the file system. The most common example is indicating that you have plenty of free space when in fact you're out. We had to write a lot of documentation to teach people how btrfs works and how to debug it: https://coreos.com/os/docs/latest/btrfs-troubleshooting.html

The second major issue is that rebalancing requires free space, which is the problem that most folks are trying to fix with a rebalance operation. Catch-22 in the worst way. Containers vary in size and can restart frequently, churning through the btrfs chunks without filling them up, leaving around a lot of empty space that needs to be rebalanced.
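The btrfs-specific commands that replace `df -h` for an accurate picture (the path is illustrative):

```shell
# Generic df may show free space even when btrfs is out of chunks:
df -h /var/lib/docker

# The real allocation picture, split by data/metadata/system chunks:
btrfs filesystem df /var/lib/docker
btrfs filesystem usage /var/lib/docker
```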


I hit that rebalancing needs space (and therefore can ENOSPC) issue at work when trying to compile ZFS on CoreOS on Digital Ocean before CoreOS switched from btrfs to ext4 and overlayfs. Getting ENOSPC on btrfs rebalance when you are seeing regular writes return ENOSPC is a really annoying problem.


> I've had the misfortune of using btrfs in production with a few hundred machines on Ubuntu 14.04

You are not alone. btrfs seems to be kind of stable - as in does not corrupt itself anymore - with 4.2 but it's been a nasty ride.

It's an experimental filesystem that is neither complete nor stable yet. I wish this were better communicated.

It's needlessly frustrating: if you search for btrfs you come across a few slide decks that tell you it's fine, you can use it... then after the first strange problems you'll subscribe to the mailing list, and every other day there is some post that shines light on strange behavior and stuff that is not implemented.

If you want checksumming on your single-HDD backup disk, btrfs is fine. For everything else you are in for some surprises... basically everything volume-management and RAID related is pretty much experimental and has strange behavior.

Performance is not even a topic. I remember the ML discussion on this OLTP blog post, and the majority of responses was: don't run databases on btrfs, you stupid! I'd rather read a technical discussion of the problems, but from reading the ML it seems like it's too complex and few understand the complexity.

@bcantrill called it a shit-show in some podcast, and while that may technically not be true, it sure does look like it.

If you want peace of mind use mdraid+ext4 (or xfs) - ZFS on Linux has a lot of problems for heavy usage but the community is IMHO more invested in making it a good Linux citizen.

On the other hand: This stuff is complicated and everyone expects miracles. I'm just looking at it from sysadmin perspective and on Linux both suck at the moment. But ZFS won't eat your data and has far better tooling.

If you need something that works for high load on Linux I'd use neither.


This isn't ZFS in Fuse - it is a native kernel module.

BTRFS has yet to reach the stability of ZFS.


I wonder if the licensing of the ZFS kernel module will cause any issues with its inclusion in Ubuntu?


Canonical's lawyers OK'd it which is why they are including it.


Weird. What changed in the last 10+ years?


Nothing. It's the same as nvidia's non-gpl kernel modules. The simple fact is they will never be accepted upstream, but that matters little to distributors of ubuntu's scale.


In the case of Nvidia's modules, Nvidia's proprietary licensing disallows distribution of a prebuilt nvidia.ko (as that implies distributing a modified version). Coincidentally, their license terms for the OpenSolaris driver have no such restriction and the OpenSolaris descendants distribute the prebuilt module without potentially violating Nvidia's license terms.

Amazingly, their Linux licensing used to be worse. They used to claim you were only permitted to install the driver on one computer within an organization.


Someone hired better lawyers. All these ridiculous EULA and click through licences and idiotic mandatory registration systems we see, I can't help think many companies would benefit from hiring better lawyers. Get rid of the timid who default to 'no' in order to protect their own arse, hire people who help you get where you want to go.

In this case, I can't even see any real liability issues - even if Canonical did get taken to court there are no damages since the software is free of charge.


> no damages since the software is free of charge

Many people can and do charge a lot of money for OSS products.


People decided to listen to lawyers who read the licenses instead of listening to statements by people claiming to know how things work without actually reading either license or asking a lawyer about it. That is quite literally the only change.


Didn't Sun use lawyers to design the CDDL to precisely prevent this situation?


No, I've seen talks by the engineers behind Solaris (I don't recall who at the moment) that strongly indicated the Sun lawyers didn't go out of their way to be incompatible with the GPL -- they just wanted a license that allowed them to split proprietary and open code (as they didn't have the right to open up all of Solaris due to licensing agreements with third parties etc) -- and still being able to distribute both open and traditional closed Solaris. This led to the "per file" license nature of the CDDL -- and unfortunately to the "additional limitation"-bit that makes it incompatible with the GPL.

If it was done again today, they might have gone for the Apache license as I recall -- and avoided some of the unfortunate issues.


Pretty sure Canonical saw an opportunity with container management, plus interest in ZFS, and decided to get over the unfortunate licensing issue and support the module.


I check in on btrfs every 6-12 months or so. To date it has always seemed too unreliable compared to ZFS. Lack of decent RAID 5/6 support is another major difference.


Ubuntu 16.04 will have ZFS support in the kernel, via a kernel module. They are not using FUSE.


I have been using btrfs for a couple of years. The nicest thing I can say about it is that it is both conceptually elegant and has made my backup practises bulletproof.


ZFS on Linux runs natively though. This is not the fuse version.


Granted, it's been a couple of years, but I tried BTRFS when openSUSE made it their default and I had filesystem issues that I've never had on anything else. I'm sure it's progressed a lot since then, but I expect it will be behind ZFS for a long time in terms of stability.


SUSE seems to recommend XFS for production data and btrfs for the rootfs that is a read mostly workload that is unlikely to trigger ENOSPC. It is not what I would consider a great endorsement of btrfs.


ZFS on Linux (i.e. kernel module, not with fuse) is very stable in my experience. I have not heard the same about btrfs. I tried ZFS on fuse once, but the performance was abysmal.


ZFSOnLinux does not use FUSE.


Apple should replace the abomination that is HFS+. They almost did a few years ago, but pulled the code at the last moment.


Apple's choice to drop ZFS may have been influenced by the lawsuit Sun was dealing with at the time.

http://www.computerworld.com/article/2539287/data-center/app...

I'd wager they wouldn't want to adopt ZFS without an explicit license and legal indemnity from Oracle.


I heard from a former Sun executive that Apple wanted indemnification following NetApp's lawsuit. They had spent a long time negotiating over that before they had something mutually acceptable, and it was supposed to be signed the day Oracle's acquisition of Sun closed. That left it up to Larry Ellison, who refused to sign it, and Apple decided to try its luck improving HFS+.


They could adopt btrfs. Darwin is open source, right?


BTRFS is still not mature, and there's a license incompatibility as well. And it's controlled by Oracle all the same. If Apple replaces the filesystem they'll likely roll their own.


btrfs is not controlled by Oracle for one (its principal developers are employed by Facebook, but it's still regular Linux GPLv2 code), but I did check and the APSL is incompatible with the GPL.

And they obviously aren't going to make a new filesystem. That doesn't get them sales like higher resolution screens or changing the color theme... again.


> And they obviously aren't going to make a new filesystem. That doesn't get them sales like higher resolution screens or changing the color theme... again.

Introducing a new filesystem would be a big decision for Apple. There would doubtless be all sorts of migration and compatibility issues, even aside from the work it would take. Especially given where we are in the maturity of desktop clients, it makes a lot more sense to incrementally improve the current filesystem. I'm not sure how snarky you intended to be, but no, there aren't many sales in a complex undertaking that is far more likely to cause data corruption and migration issues than concrete benefits for 99.9% of users.


Looks like you're right. Oracle used to control it (as evidenced by (c) Oracle all over the place), but that doesn't seem to be the case anymore.


I'm not sure it's accurate to say it was "controlled" by Oracle but a lot of--though certainly not all--active development came out of Oracle at one point, notably by Chris Mason (who is now with Facebook).


Whoever employs the core team, controls the project. For a while that was Oracle.


Dragonfly BSD's HAMMER2, when it is even half done (that is, stable for one node), is probably a much better (technical) choice than BTRFS or improving HFS+, and probably a much better legal choice than ZFS.


Sorry in advance if this is a stupid question: my main Linux system is a laptop with a small SSD drive. I would like to organize my entire digital life on a 2 TB external USB drive, and be able to maintain a clone of everything on at least one other 2 TB USB drive.

Is ZFS the right tool for this?


I read some time ago that ZFS is definitely NOT the right tool for laptop/external storage, unless you actually have a zpool with mirroring/raidz (which means you have to always keep the devices connected).

The reason is that when ZFS detects corruption, it'll lock down the whole fs... and prevent reading/recovering data from it, as recovering data from raidz is the expected solution in that case.

I tried to google again for the description of this issue, but I couldn't find it... I found this otoh:

https://groups.google.com/a/zfsonlinux.org/forum/#!topic/zfs...

And things aren't that obvious, apparently:

> Even without redundancy and "zfs set copies", ZFS stores two copies of all metadata far apart on the disk, and three copies of really important zpool wide metadata.

Which means that this might not actually be a problem after all.


> The reason is that when ZFS detects corruption, it'll lock down the whole fs... and prevent reading/recovering data from it, as recovering data from raidz is the expected solution in that case.

ZFS has duplicate metadata by default, so it can recover from corrupted metadata blocks unless too much is gone. If the data blocks are corrupted and there is no redundancy, you should get EIO. There is no code to "lock down the FS", although if you have severe damage (like zeroing all copies of important metadata or losing all members of a mirror), it will die and you will see faulted status. That is a situation no storage stack can survive, and it is why people should have backups.
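For completeness, the per-dataset knob that adds data (not just metadata) redundancy on a single device (dataset name hypothetical):

```shell
# Store two copies of every data block for this dataset; metadata already
# gets two or more copies by default. Only affects blocks written afterwards:
zfs set copies=2 tank/important
zfs get copies tank/important
```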


> The reason is that when ZFS detects corruption, it'll lock down the whole fs... and prevent reading/recovering data from it

Depends on what exactly is corrupt, but for file corruption it's generally just a case of warnings in logs/zpool status (which will suggest restoring the file from backup), and IO errors trying to access that specific file. The pool itself should remain intact and online.

It's less clear cut if it's important metadata that's damaged, but as you mention, ZFS is quite aggressive about maintaining multiple copies even on standalone devices.


I have backups stored on a double mirror of USB drives. The USB interface is fragile, but it does work. I cannot say that I recommend USB drives though, but if you are using USB, ZFS is not at any disadvantage versus other filesystems.

For what it is worth, I am this ryao:

https://github.com/zfsonlinux/zfs/graphs/contributors


If you're talking about a "static" setup where you attach both at the same time or not at all, yes. ZFS export before unplugging, ZFS import when plugged in, I can see it working very nicely.
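The attach/detach cycle sketched above, with a hypothetical pool named "backup":

```shell
zpool export backup   # flush everything and mark the pool clean before unplugging
# ...unplug, move, replug the drive...
zpool import backup   # scan attached devices and bring the pool back online
```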

If you're talking about using one of them most of the time and syncing occasionally then any filesystem will do, you'll want a user-level tool for doing the sync (probably - sibling did mention zfs send which I don't have any experience with).


It's a tool that could work. You can use zfs send/receive to clone a zfs filesystem or snapshot.

You could also use rsync, duplicity, or a bunch of other tools.

The major zfs advantage here is that all your files get integrity checking.
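A minimal send/receive cycle for the two-drive setup described (pool and snapshot names are made up):

```shell
# Initial full copy to the second drive's pool:
zfs snapshot tank/life@2016-02-11
zfs send tank/life@2016-02-11 | zfs recv backup/life

# Later, send only the delta between two snapshots:
zfs snapshot tank/life@2016-02-18
zfs send -i tank/life@2016-02-11 tank/life@2016-02-18 | zfs recv backup/life
```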


Let's see how this works out. It's probably better and more stable than btrfs but this is not complicated...

ZFS on Linux had issues with ARC (especially fast reclaim) and some deadlocks and AFAIK cgroups are not really supported - e.g. blkio throttling does not work.

Would be great if they got this ironed out, but I would be wary. Still, great news!


An additional problem is that the in-kernel latencies of both btrfs and ZFS are on the high end. Essentially a show stopper for professional audio work and maybe some kinds of video streaming. Trying to completely escape disk IO in those uses is very limiting.

A comparable solution using LVM and/or mdraid with ext4 on top has much better latency behavior.

Sorry for no benches for you, but feel free to run a quick check using latencytop and ftrace. Phoronix has some performance comparisons if you want them.


> Essentially a show stopper for professional audio work (...). Trying to completely escape disk IO in those uses is very limiting.

Could you expand on that?

I mean, an hour of mono uncompressed 192 kHz/24bit audio is almost exactly 2 GB. Compared to professional audio equipment, 128 GB of RAM isn't very expensive ( < $2000), and that would let you keep 64 one-hour maximum-def tracks in memory. Why do you need to read from the disk with any frequency?


OpenZFS, not ZFS. Those are different beasts now.


Worth pointing out the licence is the same for both.


Not at all. ZFS is not even FOSS anymore.


Fair enough.


This is great news. Among other incentives, ZFS has some truly excellent features for improving reliability. ZFS's built-in checksums, for example, can result in much happier stories during the onset of disk failures: where a RAID array can quietly return incorrect sector contents without noticing, and be unable to correctly differentiate between the correct and not-so-correct sectors in the event of disk loss followed by disagreements discovered during rebuilds, ZFS simply does the right thing by making checks during normal operations, and uses the same checks to confidently do the right thing during recovery. And snapshotting. Oh, snapshotting.

On the other hand, I've always wished we could get a modern re-take on ZFS. As anyone who's tried it will tell you: dedup in ZFS essentially doesn't work. ZFS, internally, is not built on content-addressable storage (or, it is, but since splitting of large files into blocks doesn't take any special actions to make similar blocks align perfectly, it doesn't have anywhere near the punch that it should). As a result, dedup operations that should be constant-time and zero memory overhead... aren't. Amazing though ZFS is, we've learned a lot about designing distributed and CAS storage since that groundwork was laid in ZFS. A new system that gets this right at heart would be monumental.

Transporting snapshots (e.g. to other systems for backups... or to "resume" them (think pairing with CRIU containers)) could similarly be so much more powerful if only ZFS (or subsequent systems) could get content-addressable storage right on the same level that e.g. git does. `zfs send` can transport snapshots across the network to other storage pools -- amazing, right? It even has an incremental mode -- magic! In theory, this should be just like `git push` and `git fetch`: I should even be able to have, say, n=3 machines, and have them all push snapshots of their filesystems to each other, and it should all dedup, right? And yet... as far as I can tell [1], the entire system is a footgun. Many operations break the ability to receive incremental updates; if you thought you could make things topology-agnostic... Mmm, may the force be with you.

[1] https://gist.github.com/heavenlyhash/109b0b18df65579b498b -- These were my research notes on what kind of snapshot operations work, how they transport, etc. If you try to build anything using zfs send/recv, you may find these useful... and if anyone can find a shuffle of these commands with better outcomes I'd love to hear about it of course :)
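For readers who haven't tried it, the basic send/recv flow looks roughly like this (pool names `tank`/`backup`, the dataset, snapshot names, and the `backuphost` remote are all made up for illustration):

```shell
# Snapshot a dataset and replicate it in full to another pool.
zfs snapshot tank/data@monday
zfs send tank/data@monday | ssh backuphost zfs recv backup/data

# Later, send only the delta between two snapshots (-i = incremental).
zfs snapshot tank/data@tuesday
zfs send -i tank/data@monday tank/data@tuesday | ssh backuphost zfs recv backup/data
```

The fragility described above shows up once you roll back, rename, or prune snapshots on either end: the incremental stream requires the receiving side to still have the base snapshot intact.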


The deduplication code works, but each deduplication operation requires 3 serial IOs to look up the information needed to check whether deduplication is possible, and if those lookups are not already in caches, that becomes painful fast on storage with low IOPS. On my workstation, where I have enough memory that the results of all of the lookups naturally fit and high-IOPS storage, the deduplication code runs well. You would have a similar problem with any system you designed to perfectly deduplicate data at the record level.


I was thinking about this. To reduce both the huge RAM usage and the serial IOs, you could use something similar to a Bloom filter to quickly test whether you should attempt to dedup a new block. If the filter says it's not a duplicate, then completely skip the standard (slow) dedup path.

Bloom filters specifically have issues: they don't permit removing entries for one, and they're not really that efficient. But there's a paper about Cuckoo Filters [0] which seems to solve both of these problems. For example:

    The "semi-sort" variant of the cuckoo filter benchmarked in the paper has a size of 192 MB and holds 128M items.
    So for 8kb blocks, it can dedup 1TB of blocks. More if you increase the block size or the size of the table.
    It has a 0.09% false-positive rate (!). I.e. unique blocks would use the slow path to test for duplication in vain only once in 1111 writes.
    The algorithm can perform 6 million lookups per second on the benchmark hardware. (2x Xeons at 2.27GHz, 12MB L3, 32 GB DRAM)
This is assuming that the majority of writes are actually unique, and dedup is more of a "it would be nice" thing than essential. But for that case something like this would be a lot easier to implement and use far fewer resources. Just stick it in front of the existing dedup lookup and early-exit if the filter says it's not duplicate.

[0]: https://www.cs.cmu.edu/~dga/papers/cuckoo-conext2014.pdf
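To make the idea concrete, here is a minimal sketch (pure Python, purely illustrative -- the real ZFS DDT and any production filter implementation look nothing like this): a small Bloom filter sits in front of the slow dedup lookup, so unique blocks skip the lookup entirely.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: m bits, k hash functions (illustrative sizes)."""
    def __init__(self, m=1 << 20, k=4):
        self.m, self.k = m, k
        self.bits = bytearray(m // 8)

    def _positions(self, data: bytes):
        # Derive k independent bit positions from salted SHA-256 digests.
        for i in range(self.k):
            digest = hashlib.sha256(i.to_bytes(2, "big") + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, data: bytes):
        for p in self._positions(data):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, data: bytes) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(data))

def write_block(block: bytes, bloom: BloomFilter, dedup_table: set) -> str:
    # Fast path: only pay for the (slow, on-disk) dedup lookup when the
    # filter says the block *might* already exist. False positives just
    # take the slow path in vain; false negatives cannot happen.
    if bloom.might_contain(block) and block in dedup_table:
        return "deduped"
    dedup_table.add(block)   # stands in for updating the on-disk DDT
    bloom.add(block)
    return "written"
```

Here the `set` stands in for the deduplication table whose lookups are the expensive part; the point is only that a negative answer from the filter avoids touching it at all.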


Note that ZFS isn't magic. Even with ZFS's checksums on read, you should still be doing regular scrubs, just like you should be with LVM or btrfs. And once you have regular scrubs, checksums on read don't really add much.


Agreed, turning on more safety features will always make you... safer :)

But it's worth noting that I've debugged corruptions in prod systems where:

- corrupted data was read from disk -- a bit flip, with no error code at the time -- by an application

- the application operated on it

- and the application then wrote the result -- still carrying the bit flip -- back to a new file on disk.

Ouch. The bitflip is now baked in and even looks like a legit block as far as the disk is concerned. The disk failed not long after, of course -- SMART status caught up, etc. But that was days later.

Checksums on read address this. I never want to run a system without them again.


I don't understand. If the bit was flipped before or during the read, the scrub would catch it. If it was flipped by the application, then no filesystem can help. How do read checksums help you?


There is still the chance that data gets corrupted between the time the scrub is performed and the time you read the data, so I don't consider scrubs sufficient. In any case you are right, they should be performed even with ZFS, especially to test data that is rarely or never read back.


Sure, there can be one error between scrub and read. But assuming RAID, you need errors on two disks. That can happen in a week or however often you scrub, but that's going to be pretty low probability.


You assume that your RAID implementation is going to actually read all of its parity bits from each of the disks, and check them for agreement, before returning a value to you.

You may want to validate that assumption. :)

(Prepare for an unpleasant surprise.)


Anyone know if Red Hat has plans to include it in default installations?


Opinion: Red Hat won't because it has better lawyers.


... and billions more dollars at stake.


And what about machines without ECC RAM? I thought this was the idea behind using ZFS in the first place. Or is the ECC "requirement" only important for raidz?


The whole "ECC is required by ZFS" thing is a bit of a misunderstanding.

ZFS guarantees that your data will be safe on disk, but it has no power to help you if your data gets corrupted in memory. ECC is the last piece needed to guarantee data safety.

So even if you don't have ECC, your data is still safer with ZFS than with traditional filesystems; ECC just increases the safety further.


How does corrupted memory affect ZFS's performance? Much of the replication state is stored in memory; is it possible you could lose data from a single bit being flipped?


AFAIUI, yes. If a single bit in the root's checksum is flipped just before being written to disk you'll probably lose the whole file system.



Could you be more specific? You seem to just be linking to random posts about ECC vs. non-ECC. I don't see anything specifically there about the root of the file system.

(I'll happily grant that this scenario is so unlikely as to be impossible for all practical purposes, but having skimmed the stuff you linked to I don't see why it couldn't happen theoretically.)


If I were creative, I could probably kill any filesystem with 1 bit flip if I could control the bit.


Without ECC RAM, you're far more likely to get uncorrectable / unnoticeable corruption.

This is not unique to ZFS, and it doesn't make ZFS worse than other filesystems. But since the reason you'd use ZFS is often to avoid any corruption, it's tradition to advise the use of ECC.


I learned this the hard way on an old server that did not have ECC.

I had a file server happily ticking away using ext4.

Converted it to ZFS - and a week later got file system corruption reported. Ran a very extensive memory test - and sure enough I had bad RAM (but it took 2-3 days for the errors to show up).

In the wild there has to be a ton of corruption that just never gets discovered without end to end checking.

If you have large JPGs or MKVs - a flipped bit here or there is not going to be apparent.


Why would ZFS be worse than ext4 or anything in this way if you don't have ECC?

Genuine question, I don't understand this claim. As far as I can see, ZFS provides protection against some types of failures on disk, which ext4 doesn't. ECC has no impact on that, it protects another dimension.


I read this as they've been having corruption happening all along, but only ZFS detected it?


Yes - exactly.

To be clear, ext4 did not cause the corruption - but neither did it detect it or correct it. It happily sent corrupt blocks out to disk.

I think this happens a lot more frequently than people realize.


One reason is that you generally perform scrubbing, thus you are potentially rewriting data which would otherwise be at rest. If your memory is bad, this could replace good data with bad. FS that doesn't scrub doesn't have this issue.


No, it is worse with ZFS because it doesn't have a fsck tool. If you get a bit flip in the ZFS metadata, you have to export and re-import your whole pool to get it back to a writable state.


Meanwhile, fsck on a traditional filesystem will gleefully mangle data that's actually fine in the face of transient corruptions.

I've had this happen more than once, both with bad RAM and bad IO controllers - previously fine static data suddenly being detached from the filesystem and appearing in little bits in lost+found, because bit flips effectively cause it to hallucinate problems to "fix".


This is wrong. Bit flips can do many different things:

http://open-zfs.org/w/index.php?title=Hardware&mobileaction=...

Export/import is not a solution to most of those. In the cases where it does work, umount/mount is likely all that is needed.


http://open-zfs.org/wiki/Hardware#Background

This would seem to indicate that this sort of thing is exceedingly rare and similar issues would have similar effects on other filesystems.


Resilvering (which is basically a global data verify, similar to fsck) will fix bits flipped the wrong way via error correction, assuming you've set ZFS up that way. Are you saying this doesn't apply to the metadata?


A simple scrub will repair blocks that have checksum failures, but there is no guarantee that the checksum was calculated before the bit flipped, if the flip occurred in a buffer being written.


What is scrubbing and resilvering if not a fsck equivalent?


Scrubbing corrects on-disk bit flips. An in-memory bit flip (which is rarer than an on-disk bit flip, even with non-ECC memory) can corrupt an in-memory data structure which is later written to disk to all replicas, i.e. a scrub will not detect it. If this corrupted data structure is later loaded and used, this may cause all kinds of problems, and there is no tooling to correct it.


Scrub?


So is ZFS more prone to defects without ECC than let's say EXT4 or NTFS?

I would like to buy a notebook with ECC ram, but Intel doesn't care.


No, no one is saying that ZFS is more prone to defects without ECC. Lack of ECC increases the risk of corruption for any filesystem. The reason you hear more about ECC in the context of ZFS is that data integrity is a key feature for many who choose to use ZFS.


I don't understand why this is so often misunderstood.

The parent poster already stated the opposite.

ECC and ZFS are orthogonal. ECC ensures that data in your RAM is not corrupted (or rather detects corruption) it helps whether you use ZFS, EXT4, NTFS etc.

ZFS increases your data safety whether you use ECC or not, but if you have to have maximum assurance that data is fine you should use ZFS and ECC.


As far as I can tell, the ECC/ZFS paranoia traces back to "cyberjock" on the FreeNAS forums. https://forums.freenas.org/index.php?threads/ecc-vs-non-ecc-...


This is correct. He considers ECC so important that he is willing to spread FUD about non-ECC behavior to try to scare people into using ECC. I think the truth is scary enough to convince people, but he does not agree.


No, all are equally prone to errors if a bit in RAM is unexpectedly flipped. My understanding is that ZFS requires more RAM and possibly more CPU than other filesystems and those costs aren't worth it if you're going to use RAM that can't detect errors anyway.


To answer your question: No.

They are all just as prone to defects.


They do. There are now Xeon E3 notebooks that have ECC. It's something new they added with Skylake for that reason. They're designed as mobile workstations, so you're gonna be looking at at least $1700 for a laptop, but good quality's always expensive.


ECC is just as important for EXT4 and NTFS as it is for ZFS.


http://blog.codinghorror.com/to-ecc-or-not-to-ecc/

At least Jeff doesn't believe much in ECC anymore.


ECC is a prerequisite for any filesystem to behave as intended. ZFS needs ECC just as much as any other filesystem does, e.g. ext4, XFS, etc.


Oracle could put this whole legal wrangling to rest by changing the license to GPL.

I am dreaming I suppose.


They don't own the copyright on the patches, so it wouldn't exactly be effective.


They might not own the openzfs patches but they definitely own all of the zfs code. They had all contributors sign CLAs to reassign copyrights. That's how they were able to end opensolaris.


I'm pretty sure ZFS as last released is not as valuable without the OpenZFS contributions.


This is correct.


From my understanding, Sun back in the day, and Oracle now, cannot release everything under the GPL due to contractual obligations. There's a reason they wrote the CDDL in the first place.

I can't find the talks now, but I believe Cantrill and others have spoken about this previously.

My memory is somewhat fuzzy, so I might be wrong on this.


That was Solaris and that was regarding putting all of it under an OSS license, not necessarily the GPL. There were a few tiny bits that they just could not open source.

Oracle could release their fork of ZFS under any license they wish.


I'm pretty surprised we haven't seen an official Oracle version of ZFS released with Oracle Linux.


I think there was an internal schism at Oracle regarding BTRFS vs. ZFS.

The Solaris fans (generally from Sun) preferring / promoting ZFS, and hard core Linux fans (mostly Oracle) - who tended towards BTRFS.


That's right, I forgot, btrfs began as an Oracle project didn't it?


I'm a conservative user, so I don't change my filesystem until my preferred distribution (Ubuntu) supports installing an FS to the root with a provided, supported kernel module. This is a huge deal for me; I will probably install a new FS on my main file server and move from ext4 to zfs.


In case you ever feel less conservative, here's how I set up my MacBook Pro to dual boot Mac OS X/FileVault and Ubuntu/ZFS/LUKS:

https://gist.github.com/xenophonf/2e2d1a1550b0fb8dae98


ZFS is great; happy it's included in Ubuntu 16.04 by default.


Anyone know if this applies to Lubuntu as well? I use Lubuntu on my recently bought desktop for the default LXDE. I intend to upgrade that desktop, on which I put Lubuntu 15.10 (I bought a computer without an OS so as not to pay the Windows tax), to Lubuntu 16.04 because I understand it'll be based on LXQt, the successor to LXDE (and it's not a matter of newer is better -- I am a fan of Qt), and also because I think Lubuntu 16.04 will be an LTS release. I've been very happy with the stability of Lubuntu 14.04 LTS, which my previous main computer was and is running.


Yes, all Ubuntu derivatives use the same repository so you should be set.


Thanks :)


Does this include a fix for this bug?

https://github.com/zfsonlinux/zfs/issues/1548


Check if this works for your case:

    echo > myfile

P.S. Not the "solution", but it may help in case you fill the disk by accident and need to make some room before a remount cycle. P.P.S. This can also help when your disk is already 100% full, with not enough space even to delete files (no room for new inodes). I tested that case on a ZFS NAS with no space left at all, and it worked.
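The trick works because truncating a file in place frees its blocks without allocating anything new, whereas on a copy-on-write filesystem the delete itself needs a little free space to record the transaction. A self-contained demonstration with an ordinary file (filenames are made up):

```shell
echo "lots of data" > /tmp/bigfile    # pretend this filled the disk
: > /tmp/bigfile                      # truncate in place; frees the blocks
                                      # without allocating anything new
[ ! -s /tmp/bigfile ] && echo "empty" # the space is back
rm /tmp/bigfile                       # now the delete can proceed
```

Note that `echo > myfile` leaves a one-byte file (the newline), while `: > myfile` truncates to zero bytes; either is enough to free the bulk of the space.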


Huh, well that's interesting news.

I'm currently setting up a couple servers using LXC with btrfs.

I ending up choosing LXC (as opposed to LXD, docker, rkt, etc.) because I wanted something relatively straight-forward. I just wanted some containers I could create, log in to and configure.

If this was a bigger deployment, I'd take the time to use docker or something else. But for now, just being able to get going quickly is nice. For backup / failover, I can btrfs send / receive the containers to another host and start them there.


Yeah, that's all fine and good. Nothing to announce with lxc/lxd + btrfs because it already works fine :) I do like the wizard to easily set up the ZFS backend, rather than you needing to manually replace /var/lib/lxc with a btrfs partition or however you are doing it.

I've been using lxc + btrfs daily for quite a while, setting up and tearing down hundreds of containers on a busy day. I stopped using lxc snapshots after I had a btrfs subvolume that would crash the system when I mounted it. After that, no problems.


> I stopped using lxc snapshots after I had a btrfs subvolume that would crash the system when I mounted it. After that, no problems.

That's unfortunate. What operating system version were you using at the time?

I've actually switched to using Ubuntu 15.10 for the container hosts so that I can get a more recent version of the btrfs tools. The intention is to upgrade to 16.04 as soon as is reasonable, and leave them there for a long time.


BTRFS uses less memory so I prefer BTRFS to ZFS for containers.


How did you determine that? The two should use about the same amount of memory. The only difference is that ZFS uses ARC and btrfs uses the page cache. ARC is not reported the same way as the page cache though, which might give the appearance of requiring more.


This post only mentions LXD. Will ZFS be used for Docker?


Whether the docker package in the archive will be configured to use ZFS by default is impossible to guess. It is, however, possible to say that ZFS is already supported by Docker, e.g.: https://docs.docker.com/engine/userguide/storagedriver/zfs-d...
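Per the storage-driver docs linked above, opting in looks roughly like this (the pool name `tank` is hypothetical, and this is done before Docker has created any image data):

```shell
# Mount a ZFS dataset where Docker keeps its data...
zfs create -o mountpoint=/var/lib/docker tank/docker

# ...then tell the daemon to use the zfs graph driver.
echo '{ "storage-driver": "zfs" }' > /etc/docker/daemon.json
systemctl restart docker

docker info | grep "Storage Driver"    # should now report zfs
```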


Docker switched to Alpine Linux by default, so there's that hurdle... Not to mention the giant legal question mark of loading CDDL code into a GPLv2 kernel.


One of the ideas of containers / virtualization is that the host operating system (Ubuntu, in this case) has as little to do as possible with the VM / container (Alpine, in this case).

Running an Alpine container using Docker on an Ubuntu host will work just fine.


That's for the container package management system, not the host storage system.


I wonder how this would work under shared storage model for container clusters (eg. Mesos). NFS does not cut it for all use cases (eg. DBs).


For those who missed it, Debian Project Leader Neil McGovern gives details [1] about how licensing issues were resolved so that ZFS can be in Debian now. It is distributed as a source-only DKMS module.

[1]: http://blog.halon.org.uk/2016/01/on-zfs-in-debian/


Though that's not relevant to this news -- Ubuntu isn't using DKMS, they put it directly into their kernel tree, violating GPLv2.


This appears to be inaccurate. They appear to be distributing it as a binary module, outside of the kernel tree, just like many other binary-only kernel modules.

See zfs.ko here: http://blog.dustinkirkland.com/2016/02/zfs-is-fs-for-contain...

"However, there is nothing in either license that prevents distributing it in the form of a binary module or in the form of source code.":

http://zfsonlinux.org/faq.html#WhatAboutTheLicensingIssue

The binary kernel image is a separate issue from the kernel packages (deb); they could include multiple files in a kernel package (deb) that are licensed under different licenses.


Regardless of how the binary is shipped (which could be legal), their aggregation of the source is likely not, since it almost certainly is a derivative work at that point. The fact they have a public git repo where the two codebases touch is probably enough to bait a lawsuit from Oracle, in that they're distributing CDDL code in a way that is against its license.

There's a reason LLNL developed their branch out-of-tree - it's just not worth the legal headaches to aggregate the source like Canonical just did.


The CDDL does not prevent ZFS from being used in Linux. It's the GPL that prevents using CDDL code in the kernel. Oracle doesn't have any grounds to sue based on their IP. They would only be able to sue on behalf of Linux, and violation of the GPL. Although I wouldn't rule that out, it's somewhat less likely to happen.

On the other hand, I suspect RMS isn't too happy with this turn of events. The sfconservancy may be the more likely party to bring a lawsuit. I'm curious to see either of them comment on the situation.


The GPL doesn't even really prevent it; the only relevant clause concerns a "derived work", and it's quite a stretch to think a court would find a module to be such a thing.


How would you explain to a non-technical person that a kernel module is not a derived work?

Can you remove the Linux kernel and still have a complete and working program? What happens if one removed all function calls to the Linux kernel or uses of internal kernel variables? As a module, does it work with any other kernels, like Windows or Apple's, and what was the programmer's intention when writing it?

There are some arguments in favor of fair use with regard to compatibility, where derived works are infringements but still deemed legal. The courts have historically been rather split on this subject when it comes to software, in particular with several cases ruling in favor of unlicensed modules for consoles. It would be quite a big bet either way it goes.


If I take a chapter of a textbook, modify it to be a standalone volume in a collection of books and start distributing it, I am distributing a derived work of the original book, not a derived work of the collection of books. The latter constitutes an aggregation and unless there is some license (superseding doctrine of first sale in the case of books) that prevents it from being redistributed with such things, it is perfectly okay to do that.

Similarly, the original code was taken from OpenSolaris and was adapted for Linux. No matter how we change it, it is a derived work of Solaris. Furthermore, it is distributed as part of a mere aggregation, which is okay with OSS under the OSD and also okay with the GPL under the GPL FAQ. The only time you can claim a combined work is formed is when the module is loaded into a running kernel, but the GPL does not restrict non-distribution and the kernel with the module loaded into it is not being distributed.

As for removing it from the Linux kernel, given that it is an entire storage stack between the block layer and VFS, you would need to replace everything there (including the disk format), but yes, you would have a working system.

As for the calls to Linux kernel symbols, those are provided to LKMs so that they can function, and they cannot function without them. There are symbols not provided at all, symbols provided only to GPL software, and symbols provided to everyone. ZFS only uses the last group, which is intended for use by non-GPL software.

You can design software to load an LKM built for an arbitrary kernel. FreeBSD had done that with Windows kernel modules for wireless drivers at one point. Wine does that for certain Windows drivers that do copy protection. There is nothing stopping you from creating a kernel under a different license that loads modules in the LKM format of a given Linux kernel. Although the usual case is to port the code to another kernel's own LKM implementation. Attorneys with whom I (and apparently Canonical too) have spoken think this is okay.


> Can you remove the Linux kernel and still have a complete and working program? What happen if one removed all function calls to the Linux kernel or use of internal kernel variables? As a module, does it work with any other kernels like windows or apple, and what was the programmers intention when writing it?

ZFS was developed on another operating system, Solaris back in the early 2000's and continues to be actively developed on illumos, FreeBSD, OSX and Linux today. However the bulk of new code seems to come from the illumos and FreeBSD communities. ZFS also runs in userspace to allow for easier testing and development. So if you remove Linux you still have a working program, ie it's a working kernel module for illumos, FreeBSD and Mac OSX, as well as a userspace program.

As for the intentions of Jeff Bonwick and Matt Ahrens, it was to make administration of file systems much easier. The video posted below is about the history of ZFS and is presented by one of the creators. The first person talking is the other founder of ZFS.

Birth of ZFS. https://www.youtube.com/watch?v=dcV2PaMTAJ4


> How would you explain to a non-technical person that a kernel module is not a derived work?

GPLv2 does not use the term "derived work" anywhere. It uses "work [...] derived from the Program", and does not define this term [1].

I'd start out by explaining that before we even get to the question of whether or not the module is a "work [...] derived from the Program", we have to ask the question of whether or not the license even applies. GPLv2 only applies if the module does something that requires permission under copyright law. The copyright law question that needs to be asked is whether or not the module is a "derivative work" of the kernel.

> Can you remove the Linux kernel and still have a complete and working program? What happen if one removed all function calls to the Linux kernel or use of internal kernel variables? As a module, does it work with any other kernels like windows or apple, and what was the programmers intention when writing it?

None of these questions are actually relevant to the copyright law question of whether or not it is a derivative work. They are relevant to the question of whether or not it is useful when not used in conjunction with a Linux kernel but that's not a copyright law question.

To answer the copyright law question of whether or not some program P [2] is a derivative work of some other program Q, you only need to look at the source code to P and Q. If P and Q interact with each other (unilaterally or bilaterally, directly or indirectly) some people get hung up on the mechanism of that interaction, but that's not relevant to the question of whether or not P is a derivative work.

Whether or not a program P that uses function names, function argument ordering, and data structures of program Q, but does not copy algorithmic code from Q, is a derivative work of Q is going to essentially come down to whether or not the interface (I'm including data structures as part of the interface) of Q is copyrightable.

If program interfaces are copyrightable, then programs that interact with other programs will be derivative works of those programs, regardless of whether they interface by static linking, dynamic linking, system calls from a user process P to kernel code Q, IPC from process P to process Q, RPC from process P across a network to process Q on another machine and so on.

If program interfaces are not copyrightable, then as long as all P incorporates from Q are interfaces P won't be a derivative work.

Generally, courts have held that program interfaces are not copyrightable (with the notable exception of the Court of Appeals for the Federal Circuit in the Oracle vs. Google case, which does not set copyright precedent).

Thus we arrive at the major question for kernel modules: what copyrightable kernel elements do they incorporate?

If they just incorporate non-copyrightable interfaces then a kernel module would not be a derivative work of the kernel.

That's not the end of the inquiry though. It would be if some third party were making and distributing the module. E.g., if I were to write a kernel module that does not incorporate any copyrightable kernel elements and distribute it stand alone, for others to download if they want and use it with their kernels, we'd be done.

In the case of a distribution vendor distributing a kernel module along with a kernel, then even though the module itself might not be a derivative work their distribution as a whole is. Questions might arise as to just what constitutes a "work". If they statically link the module to the kernel, the resulting binary is clearly a work, and it is a derivative work of both the kernel and the module, and so the module would have to be GPL. It is important to note in this case that this is because the combined work is a derivative work of the kernel...the module itself is still not a derivative work of the kernel.

How about if the module is dynamically linked, but the configuration they ship automatically loads it at boot time? Might one argue that the kernel, init scripts, and dynamic modules together are all one work that the vendor is distributing?

[1] For completeness, GPLv3 does not use "derive" or "derived" or any similar terms it all. It uses the term "covered work", which is defined as the original program or a "work based on the Program", and it defines that as basically a work that requires copyright permission.

[2] I'm going to use the term "program" expansively to include modules, applications, plug-ins, and so on.


The CDDL was crafted with the GPL already in existence and was, according to the person responsible for creating it, deliberately made incompatible with GPLv2. That's not hard to understand, given that Sun had no reason whatsoever to hand over its prized technology (ZFS, DTrace) to the competitor that was killing it in the market.

>On the other hand, I suspect RMS isn't too happy with this turn of events.

Why not ?

>The sfconservancy may be the more likely party to bring a lawsuit.

That would require that a Linux copyright holder want to sue, and why would they? OpenZFS is open source, and previous suits have been about source-code compliance.


It is probably more accurate to claim the GPL was designed to be incompatible with an entire class of licenses that includes the CDDL, and the MPL on which it was based and any future licenses similar to or based on licenses in that class (of which the CDDL was given that it was made after the GPL).

There is no clause in the CDDL that places restrictions on other files in a combined work, but there is one in the GPL. There are people out there who dislike the GPL for that, there are some people who explicitly go out of their way to avoid GPL compatibility because of that, and I am sure that some of those people existed at Sun, but I really doubt that the design of a license by a huge organization with many people giving input can be simplified to one guy thinking GPL incompatibility is a good feature.

I also think this happened years ago and there really is no point to living in the past. People cannot distribute a vmlinux file with ZFS linked into it (i.e. not a kernel module, but part of the binary itself) because of that, but that does not stop people from distributing it as a kernel module and that is how filesystem code is loaded these days, so it is a non-issue.


>It is probably more accurate to claim the GPL was designed to be incompatible with an entire class of licenses that includes the CDDL,

It was designed to give and preserve rights for end users, it's not really a big mystery, and the actual rights which are given and preserved perfectly mirror that.

I don't see anything that would substantiate your claim of them being 'deliberately' incompatible with any other licenses (anything you can point to?); in fact they've fixed incompatibility problems with other licenses in GPLv3.

And of course both MPL and CDDL came along much later than GPLv2, with which they were incompatible (MPL 2.0 in turn rectified this).

>can be simplified to one guy thinking GPL incompatibility is a good feature.

No, I don't think for a second that it was 'one guy', again Sun management had absolutely zero reason to allow Linux to incorporate ZFS and DTrace and every business reason not to, in fact from a business standpoint it would have been crazy to hand over ZFS and DTrace to their main competitor.

>but that does not stop people from distributing it as a kernel module and that is how filesystem code is loaded these days, so it is a non-issue.

I'm not at all sure it's a non-issue. This is a Linux kernel module running in Linux kernel space, and I'm pretty sure there is a strong case for it being considered a derivative. That said, I hope it won't be an issue, since having ZFS in a native capacity with minimal effort is a boon for Linux.


Do you have a bar number? If not, you being "pretty sure" does not mean much.


> It is probably more accurate to claim the GPL was designed to be incompatible with an entire class of licenses that includes the CDDL, and the MPL on which it was based and any future licenses similar to or based on licenses in that class (of which the CDDL was given that it was made after the GPL).

Given that work was done to make GPLv3 more compatible with other open source licenses and that GPLv2 predates both of the licenses you mention by quite a bit I'm inclined to think that's nonsense.


If being compatible with anything were the goal, the FSF would have opted for the CC0 license. Since the GPL is not compatible with things on that level, it is designed to be incompatible with certain things. Some subset of possible open source licenses definitely were excluded as part of that.


No, Ubuntu may have put the ZFS source in the kernel tree, but they still ship it to endusers as a separate kernel module and separate Ubuntu package (edit: the separate package is "zfsutils-linux" for the userspace code)

AFAIK to violate the GPL they would have to ship ZFS compiled code in the kernel image, but this is not what they are doing.
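On a running system this claim is easy to check. A quick sketch, assuming an Ubuntu box with the ZFS packages installed (the exact package layout here is an assumption):

```shell
# ZFS arrives as a loadable module (zfs.ko), not inside the vmlinuz image.
modinfo zfs | grep -i '^license'   # the module declares its own CDDL license
lsmod | grep '^zfs'                # only listed once the module is loaded
dpkg -S "$(modinfo -n zfs)"        # which package ships the .ko file
```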


Whether something is a derivative work or not is incredibly unlikely to be determined by whether it's distributed as a module or linked in.


For at least a decade people have been using that route and have had (at least internal) legal opinions supporting it.


If I take a chapter of a textbook, modify it to be a standalone volume in a collection of books and start distributing it, I am distributing a derived work of the original book, not a derived work of the collection of books. The latter constitutes an aggregation and unless there is some license (superseding doctrine of first sale in the case of books) that prevents it from being redistributed with such things, it is perfectly okay to do that.

Similarly, the original code was taken from OpenSolaris and was adapted for Linux. No matter how we change it, it is a derived work of Solaris. Furthermore, it is distributed as part of a mere aggregation, which is okay with OSS under the OSD and also okay with the GPL under the GPL FAQ. The only time you can claim a combined work is formed is when the module is loaded into a running kernel, but the GPL does not restrict non-distribution and the kernel with the module loaded into it is not being distributed.

You can argue that GPL advocates did not intend to support a license that allows any of this. However, I expect that you would have trouble finding an attorney that will interpret what the copyright holder thought the terms said to supersede the legal meaning of the terms unless explicitly stated.

If you make a license for the kernel that does not allow derived works of other platforms' software to be distributed as ports, you would violate #9 of the OSD and could not call it an open source license:

https://opensource.org/osd-annotated


If you take the plot from an episode of Star Trek and modify it such that it fits into the Dr Who storyline, you've created a work that's derivative of both Star Trek and Dr Who. Similarly, if you take code from Solaris and modify it such that it tightly integrates with Linux, you've created a work that's derivative of both Solaris and Linux. Since ZFS can only be distributed under the CDDL and since GPLv2 requires all derived works to be distributed under the GPL, you can't satisfy the license.


> If you take the plot from an episode of Star Trek and modify it such that it fits into the Dr Who storyline, you've created a work that's derivative of both Star Trek and Dr Who. Similarly, if you take code from Solaris and modify it such that it tightly integrates with Linux, you've created a work that's derivative of both Solaris and Linux. Since ZFS can only be distributed under the CDDL and since GPLv2 requires all derived works to be distributed under the GPL, you can't satisfy the license.

That is analogous to writing a new piece of software intended to be similar to an existing piece of software rather than a port of software under license. Examples of the former include the Linux kernel (meant to be similar to UNIX SVR4) and the wine project (meant to be similar to Windows). If that argument is valid:

1. Oracle is in an excellent position to sue every Linux user not using Oracle Linux, because they own rights to UNIX SVR4, which they inherited from Sun.

2. Microsoft is in an excellent position to sue wine users.

3. James Cameron and 20th Century Fox would also be in trouble with Disney for Avatar's similarities to Pocahontas.

4. Probably plenty of other bad things.

However, this argument does not apply to ZoL because the code originated in OpenSolaris and is under license and exists as a discrete module, rather than a whole program.

So far, the only thing that you have concretely stated is that you met some attorneys who were unwilling to make a decision on legality. You are not an attorney (unless you have obtained a bar number since I last asked) and I have yet to hear that anyone with a bar number that agrees with you.

If you want to prohibit people from using software you write with things that you consider to be derivatives when the law does not recognize them as such, you need a license that makes that explicit. Such a license could not be called an open source license under clause #9 of the open source definition:

https://opensource.org/osd-annotated

Consequently, the GPL is definitely the wrong license for that.


> That is analogous to writing a new piece of software intended to be similar to an existing piece of software rather than a port of software under license.

I take ZFS from Solaris. I rewrite it to work with Linux. In which sense is this not equivalent to my analogy? The examples you're giving are not equivalent, because in each case the work was written without deriving from the other copyrighted work.

> However, this argument does not apply to ZoL because the code originated in OpenSolaris and is under license and exists as a discrete module, rather than a whole program.

That's an entirely arbitrary distinction.

> So far, the only thing that you have concretely stated is that you met some attorneys who were unwilling to make a decision on legality.

No, I said that lawyers had told me that ZoL was an infringing work but that we wouldn't know for sure unless a court decided on it: http://www.phoronix.com/forums/forum/software/general-linux-...

> If you want to prohibit people from using software you write with things that you consider to be derivatives when the law does not recognize them as such

Nobody wants that.


> I take ZFS from Solaris. I rewrite it to work with Linux. In which sense is this not equivalent to my analogy? The examples you're giving are not equivalent, because in each case the work was written without deriving from the other copyrighted work.

I take it that you never actually read the ZFSOnLinux source code.

It is not really rewritten. There is a compatibility layer in place to avoid rewriting much of the code, and only a very small percentage of the original kernel code actually changed to support Linux. What did change was written to use interfaces that the kernel provides to allow proprietary modules to be ported, which suggests any license is fine.

However, the claim that writing a brand new TV show script inspired by another forms a derivative work is to claim that writing things from scratch forms a derivative work.

> That's an entirely arbitrary distinction.

It is the distinction lawyers are making.

> No, I said that lawyers had told me that ZoL was an infringing work but that we wouldn't know for sure unless a court decided on it: http://www.phoronix.com/forums/forum/software/general-linux-....

Do you have bar numbers for these lawyers? Is there any reason, beyond taking your word for it, to think that they weren't assuming zfs.ko somehow uses GPL-exported symbols or something else that is not actually true? I did have one person going to law school tell me that it was a derivative work because of that. He did not think he could claim otherwise after an explanation that the code does not do that.

Given that your legal views are so incredibly divorced from those of actual lawyers with whom I have talked, I am not inclined to believe you when you say that they had no misunderstanding, especially when it seems that you have never actually read the code to be able to be sure of that.

> Nobody wants that.

Your claims are inconsistent with that.


> It is not really rewritten.

It has several direct calls into Linux functionality that don't go via SPL, but it's also unclear that simply adding an abstraction layer is a meaningful mechanism to avoid derivation.

> what did change was meant to use for interfaces that are provided by the kernel to allow proprietary modules to be ported

There are no such interfaces in Linux.

> the claim that writing a brand new TV show script inspired by another forms a derivative work is to claim that writing things from scratch forms a derivative work.

I didn't make that claim. The analogy in question involves taking an existing work and modifying it such that it includes components of another work.

> It is the distinction lawyers are making.

It's the distinction a lawyer that you've spoken to is making.

> Do you have bar numbers of these lawyers?

Yes.

> Is there any reason to think that they were thinking that zfs.ko somehow used GPL exported symbols or some other thing that is not actually true that does not involve taking your word for it?

No.

> Your claims are inconsistent with that.

My claim is that I have reason to believe that, under copyright law, ZoL is a derivative work of Linux and as such is subject to the terms of the GPL. If the final legal determination is that it's not a derivative work then the GPL is irrelevant.


> If I take a chapter of a textbook, modify it to be a standalone volume in a collection of books and start distributing it, I am distributing a derived work of the original book, not a derived work of the collection of books. The latter constitutes an aggregation and unless there is some license (superseding doctrine of first sale in the case of books) that prevents it from being redistributed with such things, it is perfectly okay to do that.

I should elaborate that you need the original to be under license. Otherwise, you have a problem.


> Well, Ubuntu may have put the ZFS source in the kernel tree, but they still ship it to endusers as a separate kernel module and separate Ubuntu package.

No, they aren't using a separate Ubuntu package, it's gone straight into the main kernel repo.

> AFAIK to violate the GPL they would have to ship ZFS compiled code in the kernel image, but this is not what they are doing.

You can violate the GPL inside a kernel module that you distribute.


> No, they aren't using a separate Ubuntu package, it's gone straight into the main kernel repo.

How Ubuntu packages is irrelevant. What matters under the GPL is how the module is linked into the kernel.

> You can violate the GPL inside a kernel module that you distribute.

Of course, but they're not doing that. For example, you could violate the GPL by including GPL'ed code in a kernel module under a more restrictive license.

There's a lot more detail here: http://www.tldp.org/HOWTO/Module-HOWTO/copyright.html


What matters under the Copyright law (and thus the GPL) is whether the module is a derivative work of the Linux kernel or not.

ZFS was originally created for Solaris, and works on multiple operating systems. So ZFS itself is obviously not a Linux derivative. If the original ZFS could be directly linked with the Linux kernel without modifications, it still wouldn't be a Linux derivative.

But ZFS had to be modified to work with Linux. It can be argued that those modifications are Linux derivatives. We haven't had a definitive ruling on this yet.

ZFS from Solaris / BSD --> not a Linux derivative, even if it was directly linked into Linux.

ZFS with trivial modifications to work with Linux --> not a Linux derivative

ZFS with extensive modifications to work with Linux --> judge's ruling required

The only reason that linking matters is because Linus's statement that binary modules are OK would have some weight with the judge. However, Linus is not the only copyright holder of the Linux kernel, and other copyright holders have disagreed with Linus on this statement.


Using the defined interfaces of the kernel does not constitute a derivative work.


It's a Linux kernel module running in Linux kernel address space, so I'd say there is reason to assume it can be considered a derivative work, and thus a license incompatibility.


Do you think that there is reason to assume that every program that ran on MS-DOS on an 8086 was a derivative work of MS-DOS? The programs and MS-DOS all ran in the same address space on the 8086.


Who holds the copyright on the kernel interfaces that the ZFS module uses?


The GPLv2 does not restrict placing things under GPLv2 incompatible in the same tree. It only restricts distribution of binaries that are derivative works under copyright law.


OpenZFS! I can't see why this has blown up as a violation when people didn't actually read the announcement.

EDIT:

ZFS is licensed under the Common Development and Distribution License (CDDL), and the Linux kernel is licensed under the GNU General Public License Version 2 (GPLv2). While both are free open source licenses they are restrictive licenses. The combination of them causes problems because it prevents using pieces of code exclusively available under one license with pieces of code exclusively available under the other in the same binary. In the case of the kernel, this prevents us from distributing ZFS as part of the kernel binary. However, there is nothing in either license that prevents distributing it in the form of a binary module or in the form of source code. http://open-zfs.org/wiki/Main_Page


OpenZFS is CDDL licensed and GPL incompatible.


Here is another post from the parent about this issue specifically:

- http://blog.dustinkirkland.com/2016/02/zfs-licensing-and-lin...

"We at Canonical have conducted a legal review, including discussion with the industry's leading software freedom legal counsel, of the licenses that apply to the Linux kernel and to ZFS.

And in doing so, we have concluded that we are acting within the rights granted and in compliance with their terms of both of those licenses."


> However, there is nothing in either license (CDDL, GPL) that prevents distributing it in the form of a binary module or in the form of source code.

http://zfsonlinux.org/faq.html#WhatAboutTheLicensingIss


"And zfs.ko, as a self-contained file system module, is clearly not a derivative work of the Linux kernel but rather quite obviously a derivative work of OpenZFS and OpenSolaris. Equivalent exceptions have existed for many years, for various other stand alone, self-contained, non-GPL and even proprietary (hi, nvidia.ko) kernel modules."


This would be true if the resulting work were not a derivative work of the GPLed kernel. There's plenty of solid legal opinion that it is, and if you accept that then the GPL absolutely prevents distributing it in the form of a binary module.


Sorry, I downvoted you by accident, while focusing my browser window.


Only if you build it into the kernel binary image loaded by the bootloader. Otherwise, it is an independent module.


Shame how most of the conversation devolved into licensing rubbish. Almost none of us are qualified to speak on that; leave it to the lawyers, which I assure you Canonical did too.

With that out of the way, ZFS is by far and away the best filesystem for container workloads. Hopefully we will get deeper quota and I/O throttling support soon.

I have been using ZoL in production for many years now, thanks mostly to the work of Brian Behlendorf and Richard Yao. So if you find yourselves here, thanks for all the work you have put into making ZoL awesome.


> Shame how most of the conversation devolved into licensing rubbish. Almost none of us are qualified to speak on that, leave it to the lawyers - which I assure you Canonical did too.

This, a million times. It will be nice to have the illumos community, the FreeBSD community, and now the Linux community contributing to one piece of core software. It's especially amazing considering most open source operating system projects don't share major kernel subsystems.


You are very welcome. I am happy to hear that it has been working well for you. :)


This can be game-changing for the NAS/SAN industry.

I'm surprised their lawyers gave an OK, where FSF, SFLC and friends have given a thumbs down. If their interpretation is good, suddenly the large AIX/Solaris dominated storage boxes open up to a LOT of ubuntu-based/ubuntu-derived competition.

Exciting times..


> I'm surprised their lawyers gave an OK, where FSF, SFLC and friends have given a thumbs down.

I'm not. FSF and SFLC have institutional incentives to support the maximum remotely defensible interpretation of the scope of copyright holders rights, since they are ideological organizations who rely on the maximum amount of code possible being subject to the restrictions of the GPL.

They are among the least likely organizations on Earth to publicly present a balanced view of the scope of copyright law particularly as it addresses coverage of derivative works.


They certainly have reasons to be biased but saying they are among the least likely is unnecessary hyperbole. I'd say they're just as likely, at most, as the lawyer of any copyright holder is when discussing whether something is a derived work of their property.

The other party here has their own interests and biases here as well, of course. Let's not forget how many companies in the mobile and embedded space have repeatedly chosen to violate the GPL even when their noncompliance has been obvious.


I'm curious exactly what the quality of Canonical's legal advice is and how much those lawyers understand open source licensing and IP law in general. It took "two years of negotiations" for them to state that, for packages under the GPL, their GPL-incompatible license on Ubuntu as a whole did not apply.

https://www.fsf.org/news/canonical-updated-licensing-terms

https://sfconservancy.org/news/2015/jul/15/ubuntu-ip-policy/

(It's still the case that non-GPL binary packages in Ubuntu, that is, stuff under MIT, BSD, etc. licenses, may not be redistributed. This is legal for the same reason that using that code in proprietary software is legal.)


BTW, if ZFS is used for the root filesystem, say hello to insanely simple OS upgrades and roll-backs :)
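A minimal sketch of that workflow, assuming a root pool named rpool with the root dataset at rpool/ROOT/ubuntu (names are hypothetical; layouts vary):

```shell
# Snapshot the root dataset before an upgrade; roll back if it goes wrong.
sudo zfs snapshot rpool/ROOT/ubuntu@pre-upgrade
sudo apt-get dist-upgrade

# Upgrade broke something? Revert the root filesystem and reboot:
sudo zfs rollback rpool/ROOT/ubuntu@pre-upgrade
sudo reboot
```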


This is my #1 question. Can we use it for the root FS? If so, that's amazing, as there are already btrfs-based tools for snapshotting every time you run apt, etc.


There are a couple of open bugs[1][2] related to grub and ZFS that make it a little tricky to get ZFS on your root volume, but it is doable: https://github.com/zfsonlinux/pkg-zfs/wiki/HOWTO-install-Ubu....

I expect those issues to be resolved before 16.04 is released. Even with those fixes, the interactive installer doesn't support ZFS yet so you will still need to drop to a shell to actually setup your zpool and your partitions.

[1]: https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1527727...

[2]: https://bugs.launchpad.net/ubuntu/+source/zfs-initramfs/+bug...
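Roughly, the manual step looks like this; the disk name, pool options, and dataset layout below are assumptions, so follow the linked HOWTO for the real procedure:

```shell
# Illustrative only: create a root pool on a prepared partition and a
# boot-environment-style root dataset for Ubuntu. /dev/sda2 is hypothetical.
sudo zpool create -o ashift=12 -O mountpoint=none rpool /dev/sda2
sudo zfs create rpool/ROOT
sudo zfs create -o mountpoint=/ rpool/ROOT/ubuntu
```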


You could also use NixOS, which is even better at this. :)

And also has native support for ZFS, btw.


Or indeed PC-BSD. (-:

PC-BSD's installer understands ZFS. It creates an all-ZFS system, including the root volume. The boot manager uses ZFS for "boot environments".

* http://download.pcbsd.org/iso/10.2-RELEASE/amd64/docs/html/i...

* https://www.ixsystems.com/whats-new/2013/10/31/the-revamped-...

* https://blog.pcbsd.org/2013/06/pc-bsd-status-update/


I'm curious, because there's no mention made specifically... since you mention FSF, SFLC...

and Ubuntu describes "industry's leading software freedom legal counsel" as giving the thumbs up.


How is this possible, legally? Based on my basic understanding of the ZFS license, it's not possible to legally distribute ZFS and GPL code (linux kernel) together.


http://zfsonlinux.org/faq.html#WhatAboutTheLicensingIssue

It is fine if it is a kernel module.


That's what they say - others claim that even a dynamically loaded module is producing a derivative and thus you're not allowed to distribute a non-GPL-ed binary module.

Matthew Garrett (a kernel developer and thus a shared copyright holder of the kernel) is of the opinion that linking a binary ZFS module is not legal:

https://twitter.com/mjg59/status/700021939708915712 https://twitter.com/mjg59/status/700073945064611841 https://twitter.com/mjg59/status/700074164435091456

As a copyright holder, he is potentially in a position to sue Canonical over this (and he doesn't like them very much, so he just might).


I know zealots are necessary to keep a balance. I've typically appreciated the utility of people like RMS to the free software movement.

That said, Matt Garrett's Captain Ahab-like zeal for keeping one of the most useful pieces of open-source code away from Linux, while taking potshots at Ubuntu, is really off-putting. I guess I'm not so pure.

Which is why I run my file server with BSD.

I'm really excited to see ZFS functional in 16.04, and in fact, that got me to install the pre-beta just to mess with it.


I can understand how you disagree with Matthew (and I also would prefer for ZFS to be universally available under Linux), but that's not the point here.

The GPL states in clear terms what's allowed and what isn't.

It doesn't matter whether you believe a specific use case should make it ok to violate the license or not.

It's like laws. Whether you personally believe they are just or not is not a reason why they should or should not apply to you.


> The GPL states in clear terms what's allowed and what isn't.

The GPL does not trump copyright law in determining what is a derivative work.

> It's like laws.

A little bit. But its less like laws than actual laws are, and it depends on the actual laws; it doesn't have the power to redefine them.


The GPL does not place restrictions on non-derivative works. If it did, it would not be an OSS license under #9:

https://opensource.org/osd-annotated


In my heart, I know. As a user, it's just frustrating to see so much awesome technology artificially limited by silly licenses on both sides of this debate.


>That said, Matt Garrett's Captain Ahab-like zeal of keeping one of the most useful pieces of open-source code away from Linux, while taking potshots at Ubuntu, is really off-putting.

Your response to a large company violating the license that Linux is distributed under is to blame Matthew Garrett for pointing it out?


That's a little presumptuous. Violating? On one hand we have some opinions from people, some lawyers, some not, saying they think this could be a violation.

On the other, there's equal opinions that this (or the way the did this) is -not- a violation.

So I don't think it's particularly fair to reach for your pitchfork, either.

Besides, his response had almost nothing to do with Ubuntu. It was "Garrett's zeal in trying to keep ZFS off Linux" regardless of distro (which is true), "while taking potshots at Ubuntu" (which is also true, and on issues far wider than "including ZFS").


Can you provide bar numbers of lawyers who would make that claim?

So far, Matthew Garrett has yet to claim that any attorney said this is a problem. The only claim he has made (after I got him to clarify what was said) is that he met some attorneys who said that they were not absolutely sure that there is no problem. There are likely attorneys out there that make similar claims about the GPL software in general, so I really am not that concerned that he found a few attorneys that said that they were not sure.


Indeed. ZFS, supported by Canonical. It's Canonical's considered legal opinion that there isn't a problem with ZFS, and if you disagree you can sue them. They put their balls on the table. Dare you to try cutting them off. You need a real lawyer to try that, not an armchair lawyer.


I like how he points out here

https://twitter.com/mjg59/status/700073945064611841

that zfs was "merged into the kernel tree," but so far as I know the GPL doesn't dictate that things can't be stored in the same location together. There are no official GPL certified directory structures, etc.


Those comments are very different from Matthew's previous comments regarding Oracle's DTrace LKM for Linux, where the only definitive remark he had was that bypassing the GPL symbol export like they did was not okay:

https://mjg59.dreamwidth.org/31357.html

His argument about CDDL Linux kernel modules using non-GPL exported symbols being a problem is clearly FUD. Specifically, Fear of a violation; Uncertainty of a violation; and Doubt that there is no violation.


Does Garrett hold the copyright on anything that the ZFS module could be considered a derivative work of? I mean presumably the notion that the ZFS module is a derivative work is based on the ZFS module containing code that was written to work with particular parts of the Linux kernel - but it's not going to touch a lot of the kernel API. So who owns the copyrights on the parts that it does touch.


Is Garret asserting that the ZoL /source/ is a derived work of ZFS and linux and thus non-distributable?

Or is he saying a binary module compiled from that source is non-distributable?

If the source is redistributable then a DKMS kernel module could be used because it distributes source that is built by the end user.
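That is essentially what the Debian-style zfs-dkms packaging does; a sketch of the flow (the package name and behavior are typical, but treat the details as assumptions):

```shell
# DKMS ships CDDL source; the binary zfs.ko is compiled on the end user's
# machine against the running kernel, so no combined binary is distributed.
sudo apt-get install zfs-dkms   # installs source plus build hooks, no .ko
sudo dkms status                # shows zfs built for the current kernel
sudo modprobe zfs               # load the locally built module
```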


On twitter[0], he is suggesting that since the binary module is a derivative work (of the linux kernel due to linking to it), according to the GPL, the source code to ZFS must be licensed under a GPL compatible license.

However, since Canonical cannot relicense the ZFS code to a GPL-compatible license (since they are not the copyright holder), if they distribute the ZFS module, they would be in violation of the GPL (and thus lose their rights under the GPL to the kernel code).

Whether that's actually true appears to be up for a debate depending upon whether or not a binary module is distribution or not, which is why he's suggesting he'll talk to the FSF about possible recourse.

[0] - https://twitter.com/mjg59/status/700074164435091456


I'm not sure it's a good idea for him to try to win this argument.

If he wins, binary video card and wifi drivers will all vaporize, too.

nVidia would probably love to actually have a good excuse to not support free versions of Linux and only have to support closed, commercial ones.


Given his track record trying to get Oracle to stop using a GPL-exported symbol in their CDDL DTrace module, it is unlikely he is going to do anything here. Oracle has committed an actual potential violation as far as the lawyers with whom I have talked are concerned. Canonical has not.


ZFS has been available in Linux for a long time, the licence restrictions hold it back from being included in the kernel, but it's available as a FUSE filesystem.

https://en.wikipedia.org/wiki/Filesystem_in_Userspace

EDIT: Looking through the comments in the article, it's being suggested this isn't using the FUSE implementation of ZFS, and that somehow it's part of the kernel. Not sure how they've legally managed to do that!

EDIT 2: Looks like it's a kernel module, as other comments here suggest.


> Not sure how they've legally managed to do that!

Answer: They haven't. :) They've just created a GPLv2 violation.


It looks like they got around it by distributing OpenZFS as a kernel module. If that's permitted for closed source kernel blobs, it's probably fine for this code too.


It is not permitted for closed source kernel blobs; those are violations too.


> It is not permitted for closed source kernel blobs; those are violations too.

They are not permitted by the license; for them to be violations, they would have to be derivative works that require a license.

If you have a reference to copyright case law in any jurisdiction holding that to be the case, that would be interesting.

I think it's clear that certain parties (including the FSF) would like this to be perceived as a violation. It also seems fairly clear that this type of act has been a fairly established practice around Linux, but that those holding that view have not taken action to vindicate it in court. Perhaps that is because, while they'd like it to be perceived to be the case, they have little confidence that courts would agree with them, and the one thing they'd like less than active disagreement from some people engaging in the practice they don't like is a black-and-white ruling vindicating the ongoing practice and rejecting the FSF view on the legal requirements.


Does a single patch to any part of the Linux kernel make you eligible for suing over these or do you have to have copyright over a part that is clearly relevant?


Matter of opinion


So using a kernel module for the closed source NVIDIA driver for Linux is a GPL violation?

As far as I understood, the GPL only applies for code that is directly linked to it. A call to an external library (e.g. a kernel module) wouldn't necessarily apply. The code is being delivered as separate binaries.


> So using a kernel module for the closed source NVIDIA driver for Linux is a GPL violation?

Yes. Your understanding isn't correct.

With NVIDIA, there's this complicated dance where you download source code from nVidia for the kernel module, then compile it on your own machine and use it -- you aren't violating copyright because you don't distribute the kernel module that you compiled for yourself on your own machine.

But that kernel module does indeed violate GPLv2, and you can't distribute it legally, and neither could Canonical or nVidia (which is why they do the dance above instead).


> "But that kernel module does indeed violate GPLv2, and you can't distribute it legally, and neither could Canonical or nVidia (which is why they do the dance above instead)."

If that's the case, then why doesn't the FSF sue Linux kernel developers over licence violations? There are clearly pre-compiled binary blobs distributed along with the mainline kernel (otherwise there would be no need for the Linux-libre project to exist: https://en.wikipedia.org/wiki/Linux-libre ). There's little point having a licence if there's no consequences for breaking it.

I suspect they don't because it's not a simple case, and that such a measure would be somewhat counterproductive for their cause.


> If that's the case, then why doesn't the FSF sue Linux kernel developers over licence violations?

I don't understand. The Linux kernel developers hold the copyright on the kernel. If someone sues someone else, it's them, the kernel developers, who have standing to do the suing. They didn't give their copyright to the FSF merely as a result of choosing to use the FSF's license.


So if none of the Linux kernel developers sues Canonical for using a OpenZFS kernel module, Canonical can carry on with it and nothing of value was lost.


If anyone wants more companies to comply with the GPL, you might want to consider supporting the compliance efforts of the SFC:

https://sfconservancy.org/supporter/


This is true, but most companies would want better assurance of their work's legality than "well, as long as none of the tens of thousands of people we just gave grounds to sue us actually do so, we'll be fine".

See also, from a kernel developer: https://twitter.com/mjg59/status/700074164435091456


> How is this possible, legally?

My understanding is that it isn't, under the theory espoused by the FSF of what counts as a derivative work requiring a license (and thus what is subject to GPLv2 in the first place). However, I believe the accuracy of that view under US copyright law (at least) has been hotly disputed for about as long as the GPL has existed, and it has never been tested in court.

The FSF, in general -- as is unsurprising for an entity that relies on maximally leveraging copyright protections to achieve its ends -- holds to a fairly maximalist view of the legal rights of copyright owners.


It says that it is OpenZFS specifically.


Which is still CDDL licensed and thus GPLv2 incompatible.

Here's the copyright file from the relevant commit to the Xenial kernel tree:

http://kernel.ubuntu.com/git/ubuntu/ubuntu-xenial.git/diff/z...

    The majority of the code in the ZFS on Linux port comes from
    OpenSolaris which has been released under the terms of the CDDL
    open source license.  This includes the core ZFS code, libavl,
    libnvpair, libefi, libunicode, and libutil.


OpenZFS was forked from the original Sun/Solaris ZFS code base, which Sun made available regularly. It's still CDDL.


>How is this possible, legally?

It's not. Canonical has blatantly violated the GPL before and gotten away with it.
