Ubuntu 16.04 also comes with enhanced BPF, the new Linux tracing & programming framework that is built into the kernel, and is a huge leap forward for Linux tracing. E.g., we can start using tools like these: https://github.com/iovisor/bcc#tracing
Does that imply Netflix is transitioning away from FreeBSD? If so, why?
Netflix CDN (Open Connect Appliance): lots of physical boxes, FreeBSD.
Why two different OS'es?
Tracing ZFS operations slower than 10 ms
TIME     COMM  PID    T BYTES   OFF_KB  LAT(ms) FILENAME
06:31:28 dd    25570  W 131072  38784   303.92  data1
06:31:34 dd    25686  W 131072  38784   388.28  data1
06:31:35 dd    25686  W 131072  78720   519.66  data1
06:31:35 dd    25686  W 131072  116992  405.94  data1
06:31:35 dd    25686  W 131072  153600  433.52  data1
Tracing ZFS operation latency... Hit Ctrl-C to end.
operation = 'read'
     usecs          : count    distribution
        0 -> 1      : 0        |                                        |
        2 -> 3      : 0        |                                        |
        4 -> 7      : 4479     |****************************************|
        8 -> 15     : 1028     |*********                               |
       16 -> 31     : 14       |                                        |
       32 -> 63     : 1        |                                        |
       64 -> 127    : 2        |                                        |
      128 -> 255    : 6        |                                        |
      256 -> 511    : 1        |                                        |
      512 -> 1023   : 1256     |***********                             |
     1024 -> 2047   : 9        |                                        |
     2048 -> 4095   : 1        |                                        |
     4096 -> 8191   : 2        |                                        |
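The power-of-two bucketing in that output is easy to reproduce. A rough sketch in plain Python (not the actual BPF tool, which aggregates in-kernel with BPF maps; the latency samples here are made up):

```python
# Sketch of how a zfsdist-style log2 latency histogram is bucketed and
# printed. The real bcc tool counts in-kernel; this only illustrates
# the output format, using invented microsecond latency samples.

def log2_histogram(usecs_samples, width=40):
    buckets = {}
    for us in usecs_samples:
        b = max(us, 1).bit_length() - 1      # floor(log2(us))
        buckets[b] = buckets.get(b, 0) + 1
    top = max(buckets.values())
    lines = []
    for b in range(max(buckets) + 1):
        count = buckets.get(b, 0)
        bar = "*" * int(width * count / top)  # scale bar to busiest bucket
        lo, hi = 2 ** b, 2 ** (b + 1) - 1
        lines.append(f"{lo:>9} -> {hi:<9}: {count:<8} |{bar:<{width}}|")
    return lines

for line in log2_histogram([5, 6, 7, 7, 12, 700, 700, 900]):
    print(line)
```

Most I/O clusters into one or two buckets (cache hits vs. disk reads), which is exactly the bimodal shape in the output above.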
There are also high-level features it's still missing (like tracepoints and sampling), so what will be in Ubuntu 16.04 won't do everything, but it will do a fair amount: most of those _example.txt's. Some tools use a newer BPF interface (Linux 4.5), and we've been putting the legacy versions in an /old directory specifically for Ubuntu 16.04 users.
I'm following along with all of this pretty excitedly, and crossing my fingers for a Linux tracing book with BPF, ftrace, perf, etc. to read through and keep on my shelf next to your performance and dtrace books ;)
As of 4.0.5 btrfs was IMO completely unusable as a daily file system. Some examples of issues I ran into:
1) System became unbootable with the version of btrfs I had installed and I had to use either an older or newer kernel to recover
2) I have a periodic backup of my mailbox that runs, and when it runs my system becomes completely unusable until it completes. The same script running on zfs on bsd and with ext4 or reiser3 on linux would show I/O slowdowns, but I could still use my machine.
3) In general I would run into other minor issues and the consensus in #btrfs was that since my kernel was more than 3 months old, it was probably fixed in the latest version, and why would somebody using an experimental filesystem not be tracking mainline more closely?
To be fair, here's some issues with ZFS:
1) Do not ever accidentally enable dedupe on commodity hardware; it will slowly consume all your RAM unless you're on a Sun server (where 64GB of RAM is a resource-constrained environment), and there aren't effective ways to undo dedupe, other than copying all the data onto a different pool.
2) You can't shrink a pool. Hugely annoying, apparently non-trivial to solve.
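To put a rough number on the dedupe RAM cost: each in-core DDT entry is commonly quoted at around 320 bytes (an approximation; the exact size varies by implementation and version). A back-of-the-envelope estimate assuming that figure:

```python
# Back-of-the-envelope ZFS dedup table (DDT) RAM estimate.
# The ~320 bytes per in-core entry is a commonly cited approximation,
# not an exact figure from any particular ZFS release.

DDT_ENTRY_BYTES = 320          # assumed per-block in-core cost

def ddt_ram_gib(pool_tib, avg_block_kib):
    blocks = pool_tib * 2**40 / (avg_block_kib * 2**10)
    return blocks * DDT_ENTRY_BYTES / 2**30

# Unique data at the default 128 KiB recordsize:
print(f"{ddt_ram_gib(1, 128):.1f} GiB per TiB")   # 2.5 GiB
# The same data dominated by 8 KiB blocks (databases, zvols):
print(f"{ddt_ram_gib(1, 8):.1f} GiB per TiB")     # 40.0 GiB
```

With small blocks the table dwarfs the RAM of any commodity box, which is why "do not ever accidentally enable dedupe" is sound advice.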
Let me add:
3) Do not allow a pool to exceed 90% capacity ... and probably don't let it exceed 85%.
ZFS does not have a defrag utility and it badly needs one. You can permanently wreck zpool performance by running it up past 90% capacity - even if you later reduce capacity back down to 75-80%. You can sort of fix it if you add additional top level vdevs to the zpool, thus farming out some IO to the new set of disks, but it's still going to be performance constrained forever. The only solution is to create a new pool and export the data to it.
This is unacceptable, by the way.
It is not at all reasonable to require a filesystem to stay below 80% capacity (our target "full" number at rsync.net) nor is it acceptable that hitting 90% is a (performance) death sentence.
When you consider that you might have already sacrificed 25% of your physical disk just for the raidz2/raidz3 overhead, being constrained to 80% means you're only using 60% or so of the physical disk space you actually bought.
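The arithmetic, for concreteness (the 25% parity figure assumes something like an 8-disk raidz2; the exact fraction depends on vdev width):

```python
# Usable fraction of raw capacity after parity overhead plus the
# free-space headroom. The 8-disk raidz2 layout is an assumed example.

disks, parity = 8, 2                    # raidz2 across 8 disks
parity_overhead = parity / disks        # 0.25
capacity_target = 0.80                  # stay below 80% full

usable = (1 - parity_overhead) * capacity_target
print(f"{usable:.0%} of raw capacity is actually usable")  # 60%
```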
ZFS needs defrag. Badly.
This would imply some kind of rebalancing, if I understand correctly. Which maybe could be considered a form of indirect defragmentation.
I then added a disk to it to try to recover. That worked, but only after adding the disk did I realize that I couldn't shrink the pool down again. I ended up moving the whole thing off to a new disk cluster and back again. Really painful.
Also the same "technically challenging" problem that needs to be solved for shrinking pools would also make defrag possible.
Some implementers have been able to significantly reduce the pain of those situations.
That said, Delphix made changes so that metaslab selection considers fragmentation rather than just space usage when the spacemap histogram is in use, so performance before the best-fit behavior goes into effect is better than with outright selection of metaslabs by free space.
But there was a tangible progression from instability towards increasing stability. I haven't had one lick of an issue with btrfs in about a dozen kernel releases. I'm close to saying I'd trust it in a production environment, since I use it everywhere else as a daily driver; I would just use an LTS release version to be safe.
It is not all sunshine and roses, though. While Facebook employs several major btrfs developers, a lot of features that have been talked about for years still have not seen the light of day or any development whatsoever: lz4 compression, better checksum algorithms, per-subvolume encryption, online filesystem checking. And the RAID 5/6 support is still kind of garbage a year later. I worry that btrfs is suffering from a lack of interest in making the last legitimate pushes it needs (code audits, integration testing) to make it truly trustworthy.
But at the end of the day checksum integrity and COW are basically a game changer for me in terms of data integrity.
"Booting without all members of your mirror is unsupported."
ZFS is not comparable to btrfs at the moment. Everything device-related is missing in btrfs: no detection of missing or broken devices, no hot-spare functionality, and btrfs RAID1 uses the pid to decide which disk to read from. RAID5/6 is still experimental and there are some odd behaviours.
Using btrfs for production is a risky bet and may very well bite you. The tooling is terrible at the moment (IMHO) and benchmarks favour ZFS most of the time.
That said, I would like to point out that ZFS's dataset-level operations are more powerful than reflinks. They give separate, independent snapshot and clone capabilities, and they provide the ability to roll back without killing things on top (which is useful in some cases). You cannot do that with reflinks. I suppose the immutable bit could be used to fix a reflink so that it retains its state at creation, but that is racy. In the case of virtual machines, which seem to be a major application of reflinks, zvols are lower overhead and support incremental send/recv.
One benefit of reflinks is that regular users can use them, but regular users should also be able to snapshot, clone and roll back once delegation support is implemented.
I've had the misfortune of using btrfs in production with a few hundred machines on Ubuntu 14.04. It's one of the most finicky filesystems I've ever used. It's probably better in newer kernels, but if you have a lot of churn it requires constant care and feeding and tends to cause kernel soft lockups fairly commonly.
Also this: http://blog.pgaddict.com/posts/friends-dont-let-friends-use-...
and this: https://www.phoronix.com/scan.php?page=news_item&px=CoreOS-B...
and this (old but a few of the points still apply): http://coldattic.info/shvedsky/pro/blogs/a-foo-walks-into-a-...
That being said I'm sure there are plenty of use cases for which btrfs in its current capacity is more than sufficient.
The major issue is that regular debugging tools that folks have been using forever like `df -h` aren't just non-functional, they actively misrepresent the state of the file system. The most common example is indicating that you have plenty of free space when in fact you're out. We had to write a lot of documentation to teach people how btrfs works and how to debug it: https://coreos.com/os/docs/latest/btrfs-troubleshooting.html
The second major issue is that rebalancing requires free space, which is the problem that most folks are trying to fix with a rebalance operation. Catch-22 in the worst way. Containers vary in size and can restart frequently, churning through the btrfs chunks without filling them up, leaving around a lot of empty space that needs to be rebalanced.
You are not alone. btrfs seems to be kind of stable - as in does not corrupt itself anymore - with 4.2 but it's been a nasty ride.
It's an experimental filesystem that is neither complete nor stable yet. I wish this would be better communicated.
It's needlessly frustrating: if you search for btrfs you come across a few slide decks that tell you it's fine, you can use it... after the first strange problems you'll subscribe to the mailing list, and every other day there is some post that shines light on strange behavior and stuff that is not implemented.
If you want checksumming on your single-HDD backup disk, btrfs is fine. For everything else you are in for some surprises... basically everything volume-management and RAID related is pretty much experimental and has strange behavior.
Performance is not even a topic. I remember the ML discussion on that OLTP blog post, and the majority of responses were: don't run databases on btrfs, stupid! I'd rather read a technical discussion of the problems, but from reading the ML it seems like it's too complex and few understand the complexity.
@bcantrill called it a shit-show in some podcast, and while that may not be technically accurate, it sure does look like it.
If you want peace of mind use mdraid+ext4 (or xfs) - ZFS on Linux has a lot of problems for heavy usage but the community is IMHO more invested in making it a good Linux citizen.
On the other hand: This stuff is complicated and everyone expects miracles. I'm just looking at it from sysadmin perspective and on Linux both suck at the moment. But ZFS won't eat your data and has far better tooling.
If you need something that works for high load on Linux I'd use neither.
BTRFS has yet to reach the stability of ZFS.
Amazingly, their Linux licensing used to be worse. They used to claim you were only permitted to install the driver on one computer within an organization.
In this case, I can't even see any real liability issues - even if Canonical did get taken to court there are no damages since the software is free of charge.
Many people can and do charge a lot of money for OSS products.
If it was done again today, they might have gone for the Apache license as I recall -- and avoided some of the unfortunate issues.
I'd wager they wouldn't want to adopt ZFS without an explicit license and legal indemnity from Oracle.
And they obviously aren't going to make a new filesystem. That doesn't get them sales like higher resolution screens or changing the color theme... again.
Introducing a new filesystem would be a big decision for Apple. There would doubtless be all sorts of migration and compatibility issues, even aside from the work it would take. Especially given where we are in the maturity of desktop clients, it makes a lot more sense to incrementally improve the current filesystem. I'm not sure how snarky you intended to be, but no, there aren't many sales in a complex undertaking that is far more likely to cause data corruption and migration issues than concrete benefits for 99.9% of users.
Is ZFS the right tool for this?
The reason is that when ZFS detects corruption, it'll lock down the whole fs... and prevent reading/recovering data from it, as recovering the data from raidz redundancy is the expected solution in that case.
I tried to google again for the description of this issue, but I couldn't find it... I found this otoh:
And things aren't that obvious, apparently:
> Even without redundancy and "zfs set copies", ZFS stores two copies of all metadata far apart on the disk, and three copies of really important zpool wide metadata.
Which means that this might not actually be a problem after all.
ZFS has duplicate metadata by default, so it can recover from corrupted metadata blocks unless too much is gone. If the data blocks are corrupted and there is no redundancy, you should get EIO. There is no code to "lock down the FS", although if you have severe damage (like zeroing all copies of important metadata or losing all members of a mirror), it will die and you will see faulted status. That is a situation no storage stack can survive and is why people should have backups.
Depends on what exactly is corrupt, but for file corruption it's generally just a case of warnings in logs/zpool status (which will suggest restoring the file from backup), and IO errors trying to access that specific file. The pool itself should remain intact and online.
It's less clear cut if it's important metadata that's damaged, but as you mention, ZFS is quite aggressive about maintaining multiple copies even on standalone devices.
For what it is worth, I am this ryao:
If you're talking about using one of them most of the time and syncing occasionally then any filesystem will do, you'll want a user-level tool for doing the sync (probably - sibling did mention zfs send which I don't have any experience with).
You could also use rsync, duplicity, or a bunch of other tools.
The major zfs advantage here is that all your files get integrity checking.
ZFS on Linux had issues with ARC (especially fast reclaim) and some deadlocks and AFAIK cgroups are not really supported - e.g. blkio throttling does not work.
Would be great if they got this ironed out, but I would be wary. Still great news!
A comparable solution using LVM and/or mdraid with ext4 on top has much better latency behavior.
Sorry for no benches for you, but feel free to run a quick check using latencytop and ftrace. Phoronix has some performance comparisons if you want them.
Could you expand on that?
I mean, an hour of mono uncompressed 192 kHz/24bit audio is almost exactly 2 GB. Compared to professional audio equipment, 128 GB of RAM isn't very expensive ( < $2000), and that would let you keep 64 one-hour maximum-def tracks in memory. Why do you need to read from the disk with any frequency?
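The arithmetic checks out, assuming 24-bit samples are stored packed (3 bytes each, no padding to 32 bits):

```python
# One hour of mono, uncompressed 192 kHz / 24-bit audio, assuming
# packed 3-byte samples.

sample_rate = 192_000        # samples per second
bytes_per_sample = 3         # 24-bit
seconds = 3600

size = sample_rate * bytes_per_sample * seconds
print(f"{size / 10**9:.2f} GB")   # 2.07 GB - "almost exactly 2 GB"
```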
On the other hand, I've always wished we could get a modern re-take on ZFS. As anyone who's tried it will tell you: dedup in ZFS essentially doesn't work. ZFS, internally, is not built on content-addressable storage (or, it is, but since splitting of large files into blocks doesn't take any special actions to make similar blocks align perfectly, it doesn't have anywhere near the punch that it should). As a result, dedup operations that should be constant-time and zero memory overhead... aren't. Amazing though ZFS is, we've learned a lot about designing distributed and CAS storage since that groundwork was laid in ZFS. A new system that gets this right at heart would be monumental.
Transporting snapshots (e.g. to other systems for backups... or to "resume" them (think pairing with CRIU containers)) could similarly be so much more powerful if only ZFS (or subsequent systems) could get content-addressable right on the same level that e.g. git does. `zfs send` can transport snapshots across the network to other storage pools -- amazing, right? It even has an incremental mode -- magic! In theory, this should be just like `git push` and `git fetch`: I should even be able to have, say, n=3 machines, and have them all push snapshots of their filesystems to each other, and it should all dedup, right? And yet... as far as I can tell, the entire system is a footgun. Many operations break the ability to receive incremental updates; if you thought you could make things topology agnostic... Mmm, may the force be with you.
 https://gist.github.com/heavenlyhash/109b0b18df65579b498b -- These were my research notes on what kind of snapshot operations work, how they transport, etc. If you try to build anything using zfs send/recv, you may find these useful... and if anyone can find a shuffle of these commands with better outcomes I'd love to hear about it of course :)
Bloom filters specifically have issues: they don't permit removing entries for one, and they're not really that efficient. But there's a paper about Cuckoo Filters which seems to solve both of these problems. For example:
The "semi-sort" variant of the cuckoo filter benchmarked in the paper has a size of 192 MB and holds 128M items.
So for 8kb blocks, it can dedup 1TB of blocks. More if you increase the block size or the size of the table.
It has a 0.09% false-positive rate (!). I.e. unique blocks would use the slow path to test for duplication in vain only once in 1111 writes.
The algorithm can perform 6 million lookups per second on the benchmark hardware. (2x Xeons at 2.27GHz, 12MB L3, 32 GB DRAM)
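For illustration, a minimal partial-key cuckoo filter can be sketched in a few dozen lines. This is a toy, not the paper's optimized semi-sort variant; the parameters (1024 buckets, 4 slots, 12-bit fingerprints) are arbitrary:

```python
# Toy partial-key cuckoo filter: supports insert, lookup, and (unlike
# a Bloom filter) delete. Bucket count must be a power of two so the
# XOR trick for computing the alternate bucket is reversible.
import random
import hashlib

class CuckooFilter:
    def __init__(self, num_buckets=1024, bucket_size=4, fp_bits=12, max_kicks=500):
        self.n = num_buckets                  # power of two
        self.bucket_size = bucket_size
        self.fp_mask = (1 << fp_bits) - 1
        self.max_kicks = max_kicks
        self.buckets = [[] for _ in range(num_buckets)]

    def _hash(self, data):
        return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

    def _indices(self, item):
        fp = (self._hash(b"fp" + item) & self.fp_mask) or 1  # reserve 0
        i1 = self._hash(item) % self.n
        i2 = (i1 ^ self._hash(fp.to_bytes(2, "big"))) % self.n
        return fp, i1, i2

    def insert(self, item):
        fp, i1, i2 = self._indices(item)
        for i in (i1, i2):
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        # both buckets full: evict and relocate, up to max_kicks times
        i = random.choice((i1, i2))
        for _ in range(self.max_kicks):
            j = random.randrange(len(self.buckets[i]))
            fp, self.buckets[i][j] = self.buckets[i][j], fp
            i = (i ^ self._hash(fp.to_bytes(2, "big"))) % self.n
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        return False                          # table too full

    def contains(self, item):
        fp, i1, i2 = self._indices(item)
        return fp in self.buckets[i1] or fp in self.buckets[i2]

    def delete(self, item):
        fp, i1, i2 = self._indices(item)
        for i in (i1, i2):
            if fp in self.buckets[i]:
                self.buckets[i].remove(fp)
                return True
        return False
```

Lookups are still probabilistic (a short fingerprint can collide, giving false positives), but unlike a Bloom filter, entries can be removed, which matters when deduped blocks are freed.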
But it's worth noting that I've debugged corruptions in prod systems where:
- corrupted data was read from disk -- a bit flip, with no error code at the time -- by an application
- the application operated on it
- and the application then wrote the result -- still carrying the bit flip -- back to a new file on disk.
Ouch. The bitflip is now baked in and even looks like a legit block as far as the disk is concerned. The disk failed not long after, of course -- SMART status caught up, etc. But that was days later.
Checksums on read address this. I never want to run a system without them again.
You may want to validate that assumption. :)
(Prepare for an unpleasant surprise.)
ZFS provides guarantees that your data will be safe on disk, but it has no power to help you if your data gets corrupted in memory.
ECC is the last piece needed to guarantee data safety.
So if you don't have ECC your data is still safer with ZFS than traditional file-systems, ECC just increases the safety further.
(I'll happily grant that this scenario is so unlikely as to be impossible for all practical purposes, but having skimmed the stuff you linked to I don't see why it couldn't happen theoretically.)
This is not unique to ZFS, and it doesn't make ZFS worse than other filesystems. But since the reason you'd use ZFS is often to avoid any corruption, it's tradition to advise the use of ECC.
I had a file server happily ticking away using ext4.
Converted it to ZFS - and a week later got file system corruption reported. Ran a very extensive memory test - and sure enough I had bad RAM (but it took 2-3 days for the errors to show up).
In the wild there has to be a ton of corruption that just never gets discovered without end to end checking.
If you have large JPGs or MKVs, a flipped bit here or there is not going to be apparent.
Genuine question, I don't understand this claim. As far as I can see, ZFS provides protection against some types of failures on disk, which ext4 doesn't. ECC has no impact on that, it protects another dimension.
To be clear, ext4 did not cause the corruption - but neither did it detect it or correct it. It happily sent corrupt blocks out to disk.
I think this happens a lot more frequently than people realize.
I've had this happen more than once, both with bad RAM and bad IO controllers: previously fine static data suddenly being detached from the filesystem and appearing in little bits in lost+found, because bit flips effectively cause it to hallucinate problems to "fix".
Export/import is not a solution to most of those. In the cases where it does work, umount/mount is likely all that is needed.
This would seem to indicate that this sort of thing is exceedingly rare and similar issues would have similar effects on other filesystems.
I would like to buy a notebook with ECC ram, but Intel doesn't care.
The parent poster already stated the opposite.
ECC and ZFS are orthogonal. ECC ensures that data in your RAM is not corrupted (or rather detects corruption) it helps whether you use ZFS, EXT4, NTFS etc.
ZFS increases your data safety whether you use ECC or not, but if you have to have maximum assurance that data is fine you should use ZFS and ECC.
They are all just as prone to defects.
At least Jeff doesn't believe much in ECC anymore.
I am dreaming I suppose.
I can't find the talks now, but I believe Cantrill and others have spoken about this previously.
My memory is somewhat fuzzy, so I might be wrong on this.
Oracle could release their fork of ZFS under any license they wish.
The Solaris fans (generally from Sun) preferred and promoted ZFS, while the hard-core Linux fans (mostly at Oracle) tended towards btrfs.
echo > myfile
P.S. Not the "solution", but it may help in case you fill the disk by accident and need to make some room before a remount cycle.
P.S.2. This could also help when your disk is already 100% full, without enough space for deleting files (not enough space for new inodes). I tested that case on a ZFS NAS with no space left at all, and it worked.
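A quick illustration of why truncating in place works when deletion doesn't (generic shell; on a COW filesystem the freed blocks may take a moment to become available):

```shell
# Truncating a file in place frees its blocks without allocating the
# new metadata that 'rm' can need on a 100%-full COW filesystem.
dd if=/dev/zero of=bigfile bs=1M count=8 2>/dev/null  # stand-in large file
stat -c %s bigfile    # 8388608
: > bigfile           # truncate ('echo > bigfile' leaves a 1-byte newline)
stat -c %s bigfile    # 0
```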
I'm currently setting up a couple servers using LXC with btrfs.
I ended up choosing LXC (as opposed to LXD, docker, rkt, etc.) because I wanted something relatively straightforward. I just wanted some containers I could create, log in to, and configure.
If this was a bigger deployment, I'd take the time to use docker or something else. But for now, just being able to get going quickly is nice. For backup / failover, I can btrfs send / receive the containers to another host and start them there.
I've been using lxc + btrfs daily for quite a while, setting up and tearing down hundreds of containers on a busy day. I stopped using lxc snapshots after I had a btrfs subvolume that would crash the system when I mounted it. After that, no problems.
That's unfortunate. What operating system version were you using at the time?
I've actually switched to using Ubuntu 15.10 for the container hosts so that I can get a more recent version of the btrfs tools. The intention is to upgrade to 16.04 as soon as is reasonable, and leave them there for a long time.
Running an Alpine container using Docker on an Ubuntu host will work just fine.
See zfs.ko here: http://blog.dustinkirkland.com/2016/02/zfs-is-fs-for-contain...
"However, there is nothing in either license that prevents distributing it in the form of a binary module or in the form of source code.":
The binary kernel image is a separate issue from the kernel packages (deb); they could include multiple files in a kernel package (deb) that are licensed under different licenses.
There's a reason LLNL developed their branch out-of-tree - it's just not worth the legal headaches to aggregate the source like Canonical just did.
On the other hand, I suspect RMS isn't too happy with this turn of events. The sfconservancy may be the more likely party to bring a lawsuit. I'm curious to see either of them comment on the situation.
Can you remove the Linux kernel and still have a complete and working program? What happen if one removed all function calls to the Linux kernel or use of internal kernel variables? As a module, does it work with any other kernels like windows or apple, and what was the programmers intention when writing it?
There are some arguments in favor of fair use in regard to compatibility, where derived works are infringements but are still deemed legal. The courts have historically been rather split on this subject when it comes to software, in particular with several cases ruling in favor of unlicensed modules for consoles. It would be quite a big bet either way you vote.
Similarly, the original code was taken from OpenSolaris and was adapted for Linux. No matter how we change it, it is a derived work of Solaris. Furthermore, it is distributed as part of a mere aggregation, which is okay with OSS under the OSD and also okay with the GPL under the GPL FAQ. The only time you can claim a combined work is formed is when the module is loaded into a running kernel, but the GPL does not restrict non-distribution and the kernel with the module loaded into it is not being distributed.
As for removing it from the Linux kernel, given that it is an entire storage stack between the block layer and VFS, you would need to replace everything there (including the disk format), but yes, you would have a working system.
As for all calls to Linux kernel symbols, those are provided to LKM so that they can function and they cannot function without it. There are symbols not provided at all, symbols provided only to GPL software and symbols provided to everyone. ZFS only uses the last group, which is intended for use by non-GPL software.
You can design software to load an LKM from an arbitrary kernel. FreeBSD had done that with Windows kernel modules for wireless drivers at one point. Wine does that for certain Windows drivers that do copy protection. There is nothing stopping you from creating a kernel under a different license that loads modules in the LKM format of a given Linux kernel, although the usual case is to port the code to another kernel's own LKM implementation. Attorneys with whom I (and apparently Canonical too) have spoken think this is okay.
ZFS was developed on another operating system, Solaris, back in the early 2000s, and continues to be actively developed on illumos, FreeBSD, OS X and Linux today. However, the bulk of new code seems to come from the illumos and FreeBSD communities. ZFS also runs in userspace to allow for easier testing and development. So if you remove Linux you still have a working program; i.e., it's a working kernel module for illumos, FreeBSD and OS X, as well as a userspace program.
As for the intentions of Jeff Bonwick and Matt Ahrens, it was to make administration of file systems much easier. The video posted below is about the history of ZFS and is presented by one of the creators. The first person talking is the other founder of ZFS.
Birth of ZFS.
GPLv2 does not use the term "derived work" anywhere. It uses "work [...] derived from the Program", and does not define this term.
I'd start out by explaining that before we even get to the question of whether or not the module is a "work [...] derived from the Program", we have to ask the question of whether or not the license even applies. GPLv2 only applies if the module does something that requires permission under copyright law. The copyright law question that needs to be asked is whether or not the module is a "derivative work" of the kernel.
> Can you remove the Linux kernel and still have a complete and working program? What happen if one removed all function calls to the Linux kernel or use of internal kernel variables? As a module, does it work with any other kernels like windows or apple, and what was the programmers intention when writing it?
None of these questions are actually relevant to the copyright law question of whether or not it is a derivative work. They are relevant to the question of whether or not it is useful when not used in conjunction with a Linux kernel but that's not a copyright law question.
To answer the copyright law question of whether or not some program P is a derivative work of some other program Q, you only need to look at the source code to P and Q. If P and Q interact with each other (unilaterally or bilaterally, directly or indirectly) some people get hung up on the mechanism of that interaction, but that's not relevant to the question of whether or not P is a derivative work.
Whether or not a program P that uses function names, function argument ordering, and data structures of program Q, but does not copy algorithmic code from Q, is a derivative work of Q is going to essentially come down to whether or not the interface (I'm including data structures as part of the interface) of Q is copyrightable.
If program interfaces are copyrightable, then programs that interact with other programs will be derivative works of those programs, regardless of whether they interface by static linking, dynamic linking, system calls from a user process P to kernel code Q, IPC from process P to process Q, RPC from process P across a network to process Q on another machine and so on.
If program interfaces are not copyrightable, then as long as all P incorporates from Q are interfaces P won't be a derivative work.
Generally, courts have held that program interfaces are not copyrightable (with the notable exception of the Court of Appeals for the Federal Circuit in the Oracle vs. Google case, which does not set copyright precedent).
Thus we arrive at the major question for kernel modules: what copyrightable kernel elements do they incorporate?
If they just incorporate non-copyrightable interfaces then a kernel module would not be a derivative work of the kernel.
That's not the end of the inquiry though. It would be if some third party were making and distributing the module. E.g., if I were to write a kernel module that does not incorporate any copyrightable kernel elements and distribute it stand alone, for others to download if they want and use it with their kernels, we'd be done.
In the case of a distribution vendor distributing a kernel module along with a kernel, then even though the module itself might not be a derivative work their distribution as a whole is. Questions might arise as to just what constitutes a "work". If they statically link the module to the kernel, the resulting binary is clearly a work, and it is a derivative work of both the kernel and the module, and so the module would have to be GPL. It is important to note in this case that this is because the combined work is a derivative work of the kernel...the module itself is still not a derivative work of the kernel.
How about if the module is dynamically linked, but the configuration they ship automatically loads it at boot time? Might one argue that the kernel, init scripts, and dynamic modules together are all one work that the vendor is distributing?
For completeness, GPLv3 does not use "derive" or "derived" or any similar terms at all. It uses the term "covered work", which is defined as the original program or a "work based on the Program", and it defines that as basically a work that requires copyright permission.
I'm going to use the term "program" expansively to include modules, applications, plug-ins, and so on.
>On the other hand, I suspect RMS isn't too happy with this turn of events.
Why not ?
>The sfconservancy may be the more likely party to bring a lawsuit.
That would require that a Linux copyright holder would want to sue, and why would they? OpenZFS is open source, and previous suits have been about source code compliance.
There is no clause in the CDDL that places restrictions on other files in a combined work, but there is one in the GPL. There are people out there who dislike the GPL for that, and some who explicitly go out of their way to avoid GPL compatibility because of it. I am sure that some of those people existed at Sun, but I really doubt that the design of a license by a huge organization with many people giving input can be simplified to one guy thinking GPL incompatibility is a good feature.
I also think this happened years ago and there really is no point to living in the past. People cannot distribute a vmlinux file with ZFS linked into it (i.e. not a kernel module, but part of the binary itself) because of that, but that does not stop people from distributing it as a kernel module and that is how filesystem code is loaded these days, so it is a non-issue.
It was designed to give and preserve rights for end users, it's not really a big mystery, and the actual rights which are given and preserved perfectly mirror that.
I don't see anything that would substantiate your claim of them being 'deliberately' incompatible with any other licenses (anything you can point to ?), in fact they've fixed incompability problems in GPLv3 with other licenses.
And of course both MPL and CDDL came along much later than GPLv2, with which they were incompatible (MPL 2.0 in turn rectified this).
>can be simplified to one guy thinking GPL incompatibility is a good feature.
No, I don't think for a second that it was 'one guy'. Again, Sun management had absolutely zero reason to allow Linux to incorporate ZFS and DTrace, and every business reason not to; from a business standpoint it would have been crazy to hand ZFS and DTrace over to their main competitor.
>but that does not stop people from distributing it as a kernel module and that is how filesystem code is loaded these days, so it is a non-issue.
I'm not at all sure it's a non-issue. This is a Linux kernel module running in Linux kernel space, and I'm pretty sure there is a strong case for considering it a derivative work. That said, I hope it won't be an issue, since having ZFS in a native capacity with minimal effort is a boon for Linux.
Given that work was done to make GPLv3 more compatible with other open source licenses and that GPLv2 predates both of the licenses you mention by quite a bit I'm inclined to think that's nonsense.
AFAIK to violate the GPL they would have to ship ZFS compiled code in the kernel image, but this is not what they are doing.
You can argue that GPL advocates did not intend to support a license that allows any of this. However, I expect that you would have trouble finding an attorney that will interpret what the copyright holder thought the terms said to supersede the legal meaning of the terms unless explicitly stated.
If you make a license for the kernel that does not allow derived works of other platforms' software to be distributed as ports, you would violate #9 of the OSD and could not call it an open source license:
That is analogous to writing a new piece of software intended to be similar to an existing piece of software rather than a port of software under license. Examples of the former include the Linux kernel (meant to be similar to UNIX SVR4) and the wine project (meant to be similar to Windows). If that argument is valid:
1. Oracle is in an excellent position to sue every Linux user not using Oracle Linux, because they own rights to UNIX SVR4, which they inherited from Sun.
2. Microsoft is in an excellent position to sue wine users.
3. James Cameron and 20th Century Fox would also be in trouble with Disney for Avatar's similarities to Pocahontas.
4. Probably plenty of other bad things.
However, this argument does not apply to ZoL because the code originated in OpenSolaris and is under license and exists as a discrete module, rather than a whole program.
So far, the only thing that you have concretely stated is that you met some attorneys who were unwilling to make a decision on legality. You are not an attorney (unless you have obtained a bar number since I last asked) and I have yet to hear of anyone with a bar number who agrees with you.
If you want to prohibit people from using software you write with things that you consider to be derivatives when the law does not recognize them as such, you need a license that makes that explicit. Such a license could not be called an open source license under clause #9 of the open source definition:
Consequently, the GPL is definitely the wrong license for that.
I take ZFS from Solaris. I rewrite it to work with Linux. In which sense is this not equivalent to my analogy? The examples you're giving are not equivalent, because in each case the work was written without deriving from the other copyrighted work.
> However, this argument does not apply to ZoL because the code originated in OpenSolaris and is under license and exists as a discrete module, rather than a whole program.
That's an entirely arbitrary distinction.
> So far, the only thing that you have concretely stated is that you met some attorneys who were unwilling to make a decision on legality.
No, I said that lawyers had told me that ZoL was an infringing work but that we wouldn't know for sure unless a court decided on it: http://www.phoronix.com/forums/forum/software/general-linux-...
> If you want to prohibit people from using software you write with things that you consider to be derivatives when the law does not recognize them as such
Nobody wants that.
I take it that you never actually read the ZFSOnLinux source code.
It is not really rewritten. There is a compatibility layer in place to prevent the need to rewrite much of the code, and a very small percentage of the original kernel code actually changed to support Linux. What did change was meant to use interfaces that are provided by the kernel to allow proprietary modules to be ported, which suggests any license is fine.
However, to claim that writing a brand-new TV show script inspired by another one forms a derivative work is to claim that writing things from scratch forms a derivative work.
> That's an entirely arbitrary distinction.
It is the distinction lawyers are making.
> No, I said that lawyers had told me that ZoL was an infringing work but that we wouldn't know for sure unless a court decided on it: http://www.phoronix.com/forums/forum/software/general-linux-....
Do you have bar numbers for these lawyers? Is there any reason, beyond taking your word for it, to think they were not assuming something that is not actually true, such as zfs.ko using GPL-only exported symbols? I did have one person in law school tell me that it was a derivative work for that reason; he did not think he could maintain that claim after an explanation that the code does not do that.
Given that your legal views are so incredibly divorced from those of actual lawyers with whom I have talked, I am not inclined to believe you when you say that they had no misunderstanding, especially when it seems that you have never actually read the code to be able to be sure of that.
> Nobody wants that.
Your claims are inconsistent with that.
It has several direct calls into Linux functionality that don't go via SPL, but it's also unclear that simply adding an abstraction layer is a meaningful mechanism to avoid derivation.
> what did change was meant to use for interfaces that are provided by the kernel to allow proprietary modules to be ported
There are no such interfaces in Linux.
> the claim that writing a brand new TV show script inspired by another forms a derivative work is to claim that writing things from scratch forms a derivative work.
I didn't make that claim. The analogy in question involves taking an existing work and modifying it such that it includes components of another work.
> It is the distinction lawyers are making.
It's the distinction a lawyer that you've spoken to is making.
> Do you have bar numbers of these lawyers?
> Is there any reason to think that they were thinking that zfs.ko somehow used GPL exported symbols or some other thing that is not actually true that does not involve taking your word for it?
> Your claims are inconsistent with that.
My claim is that I have reason to believe that, under copyright law, ZoL is a derivative work of Linux and as such is subject to the terms of the GPL. If the final legal determination is that it's not a derivative work then the GPL is irrelevant.
I should elaborate that you need the original to be under license. Otherwise, you have a problem.
No, they aren't using a separate Ubuntu package, it's gone straight into the main kernel repo.
> AFAIK to violate the GPL they would have to ship ZFS compiled code in the kernel image, but this is not what they are doing.
You can violate the GPL inside a kernel module that you distribute.
How Ubuntu packages it is irrelevant. What matters under the GPL is how the module is linked into the kernel.
> You can violate the GPL inside a kernel module that you distribute.
Of course, but they're not doing that. For example, you could violate the GPL by including GPL'ed code in a kernel module under a more restrictive license.
There's a lot more detail here:
ZFS was originally created for Solaris, and works on multiple operating systems. So ZFS itself is obviously not a Linux derivative. If the original ZFS could be directly linked with the Linux kernel without modifications, it still wouldn't be a Linux derivative.
But ZFS had to be modified to work with Linux. It can be argued that those modifications are Linux derivatives. We haven't had a definitive ruling on this yet.
ZFS from Solaris / BSD --> not a Linux derivative, even if it was directly linked into Linux.
ZFS with trivial modifications to work with Linux --> not a Linux derivative
ZFS with extensive modifications to work with Linux --> judge's ruling required
The only reason that linking matters is because Linus's statement that binary modules are OK would have some weight with the judge. However, Linus is not the only copyright holder of the Linux kernel, and other copyright holders have disagreed with Linus on this statement.
ZFS is licensed under the Common Development and Distribution License (CDDL), and the Linux kernel is licensed under the GNU General Public License Version 2 (GPLv2). While both are free open source licenses, they are restrictive licenses. The combination of them causes problems because it prevents using pieces of code exclusively available under one license with pieces of code exclusively available under the other in the same binary. In the case of the kernel, this prevents us from distributing ZFS as part of the kernel binary. However, there is nothing in either license that prevents distributing it in the form of a binary module or in the form of source code. http://open-zfs.org/wiki/Main_Page
"We at Canonical have conducted a legal review, including discussion with the industry's leading software freedom legal counsel, of the licenses that apply to the Linux kernel and to ZFS.
And in doing so, we have concluded that we are acting within the rights granted and in compliance with their terms of both of those licenses."
With that out of the way, ZFS is far and away the best filesystem for container workloads. Hopefully we will get deeper quota and I/O throttling support soon.
I have been using ZoL in production for many years now, thanks mostly to the work of Brian Behlendorf and Richard Yao. So if you find yourselves here, thanks for all the work you have put into making ZoL awesome.
This a million times. It will be nice to have the illumos community, the FreeBSD community, and now the Linux community contributing to one piece of core software. It's especially amazing considering most open source operating system projects don't share major kernel subsystems.
I'm surprised their lawyers gave an OK, where FSF, SFLC and friends have given a thumbs down. If their interpretation is good, suddenly the large AIX/Solaris dominated storage boxes open up to a LOT of ubuntu-based/ubuntu-derived competition.
I'm not. FSF and SFLC have institutional incentives to support the maximum remotely defensible interpretation of the scope of copyright holders' rights, since they are ideological organizations who rely on the maximum amount of code possible being subject to the restrictions of the GPL.
They are among the least likely organizations on Earth to publicly present a balanced view of the scope of copyright law particularly as it addresses coverage of derivative works.
The other party here has their own interests and biases here as well, of course. Let's not forget how many companies in the mobile and embedded space have repeatedly chosen to violate the GPL even when their noncompliance has been obvious.
(It's still the case that non-GPL binary packages in Ubuntu, that is, stuff under MIT, BSD, etc. licenses, may not be redistributed. This is legal for the same reason that using that code in proprietary software is legal.)
I expect those issues to be resolved before 16.04 is released. Even with those fixes, the interactive installer doesn't support ZFS yet so you will still need to drop to a shell to actually setup your zpool and your partitions.
And also has native support for ZFS, btw.
PC-BSD's installer understands ZFS. It creates an all-ZFS system, including the root volume. The boot manager uses ZFS for "boot environments".
And Ubuntu describes the "industry's leading software freedom legal counsel" as giving the thumbs up.
It is fine if it is a kernel module.
Matthew Garrett (a kernel developer and thus a shared copyright holder of the kernel) is of the opinion that linking a binary ZFS module is not legal:
As a copyright holder, he is potentially in the position of suing Canonical over this (and he doesn't like them very much, so he just might).
That said, Matthew Garrett's Captain Ahab-like zeal for keeping one of the most useful pieces of open-source code away from Linux, while taking potshots at Ubuntu, is really off-putting. I guess I'm not so pure.
Which is why I run my file server with BSD.
I'm really excited to see ZFS functional in 16.04, and in fact, that got me to install the pre-beta just to mess with it.
The GPL states in clear terms what's allowed and what isn't.
It doesn't matter whether you believe a specific use case should make it ok to violate the license or not.
It's like laws. Whether you personally believe they are just or not is not a reason why they should or should not apply to you.
The GPL does not trump copyright law in determining what is a derivative work.
> It's like laws.
A little bit. But it's less like laws than actual laws are, and it depends on the actual laws; it doesn't have the power to redefine them.
Your response to a large company violating the license that Linux is distributed under is to blame Matthew Garret for pointing it out?
On the other hand, there are equally weighty opinions that this (or the way they did this) is -not- a violation.
So I don't think it's particularly fair to reach for your pitchfork, either.
Besides, his response had almost nothing to do with Ubuntu. It was about Garrett's zeal in trying to keep ZFS off Linux regardless of distro (which is true), "while taking potshots at Ubuntu" (which is also true, and on issues far wider than including ZFS).
So far, Matthew Garrett has yet to claim that any attorney said this is a problem. The only claim he has made (after I got him to clarify what was said) is that he met some attorneys who said that they were not absolutely sure that there is no problem. There are likely attorneys out there that make similar claims about the GPL software in general, so I really am not that concerned that he found a few attorneys that said that they were not sure.
The objection was that ZFS was "merged into the kernel tree," but so far as I know the GPL doesn't dictate that things can't be stored in the same location together. There are no official GPL-certified directory structures, etc.
His argument about CDDL Linux kernel modules using non-GPL exported symbols being a problem is clearly FUD. Specifically, Fear of a violation; Uncertainty of a violation; and Doubt that there is no violation.
Or is he saying a binary module compiled from that source is non-distributable?
If the source is redistributable then a DKMS kernel module could be used because it distributes source that is built by the end user.
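For anyone unfamiliar with it, the DKMS flow being described looks roughly like the following (package names and the module version are illustrative, not taken from this thread): the distributor ships only CDDL-licensed source, and each user's machine compiles the binary module locally against its own kernel headers, so no combined CDDL/GPL binary is ever distributed.

```shell
# Illustrative DKMS flow (run as root; package names and version are examples).
# Only source is shipped; zfs.ko is built on the end user's machine.
apt-get install dkms spl-dkms zfs-dkms

# DKMS builds and installs the module against the running kernel's headers:
dkms build zfs/0.6.5
dkms install zfs/0.6.5

# The resulting module was compiled locally and is never redistributed:
modinfo zfs | grep -E '^(filename|license)'
```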
However, since Canonical cannot relicense the ZFS code to a GPL-compatible license (since they are not the copyright holder), if they distribute the ZFS module, they would be in violation of the GPL (and thus lose their rights under the GPL to the kernel code).
Whether that's actually true appears to be up for debate, depending upon whether or not distributing a binary module counts as distribution, which is why he's suggesting he'll talk to the FSF about possible recourse.
 - https://twitter.com/mjg59/status/700074164435091456
If he wins, binary video card and wifi drivers will all vaporize, too.
nVidia would probably love to actually have a good excuse to not support free versions of Linux and only have to support closed, commercial ones.
EDIT: Looking through the comments in the article, it's being suggested this isn't using the FUSE implementation of ZFS, and that somehow it's part of the kernel. Not sure how they've legally managed to do that!
EDIT 2: Looks like it's a kernel module, as other comments here suggest.
Answer: They haven't. :) They've just created a GPLv2 violation.
They are not permitted by the license; for them to be violations, they would have to be derivative works that require a license.
If you have a reference to copyright case law in any jurisdiction holding that to be the case, that would be interesting.
I think it's clear that certain parties (including the FSF) would like this to be perceived as a violation. It also seems fairly clear that this type of act has been an established practice around Linux, but that those holding that view have not taken action to vindicate it in court. Perhaps that is because, while they'd like it to be perceived as a violation, they have little confidence that courts would agree with them, and the one thing they'd like less than active disagreement from some people engaging in the practice is a black-and-white ruling vindicating the ongoing practice and rejecting the FSF's view of the legal requirements.
As far as I understood, the GPL only applies to code that is directly linked with it. A call to an external library (e.g. a kernel module) wouldn't necessarily be covered. The code is being delivered as separate binaries.
Yes. Your understanding isn't correct.
With NVIDIA, there's this complicated dance where you download source code from NVIDIA for the kernel module, then compile it on your own machine and use it -- you aren't violating copyright because you don't distribute the kernel module that you compiled for yourself on your own machine.
But that kernel module does indeed violate GPLv2, and you can't distribute it legally, and neither could Canonical nor NVIDIA (which is why they do the dance above instead).
If that's the case, then why doesn't the FSF sue Linux kernel developers over licence violations? There are clearly pre-compiled binary blobs distributed along with the mainline kernel (otherwise there would be no need for the Linux-libre project to exist: https://en.wikipedia.org/wiki/Linux-libre ). There's little point having a licence if there's no consequences for breaking it.
I suspect they don't because it's not a simple case, and that such a measure would be somewhat counterproductive for their cause.
I don't understand. The Linux kernel developers hold the copyright on the kernel. If anyone sues, it's they, the kernel developers, who have standing to do so. They didn't assign their copyright to the FSF merely by choosing to use the FSF's license.
See also, from a kernel developer: https://twitter.com/mjg59/status/700074164435091456
My understanding is that it isn't, under the theory espoused by the FSF of what counts as a derivative work requiring a license (and thus of what is subject to GPLv2 in the first place). However, I believe the accuracy of that view under US copyright law (at least) has been hotly disputed for about as long as the GPL has existed, and it has never been tested in court.
The FSF, in general -- as is unsurprising for an entity that relies on maximally leveraging copyright protections to achieve its ends -- holds to a fairly maximalist view of the legal rights of copyright owners.
Here's the copyright file from the relevant commit to the Xenial kernel tree:
The majority of the code in the ZFS on Linux port comes from
OpenSolaris which has been released under the terms of the CDDL
open source license. This includes the core ZFS code, libavl,
libnvpair, libefi, libunicode, and libutil.
It's not. Canonical has blatantly violated the GPL before and gotten away with it.
... there is no legal issue preventing the sources from being combined because neither the CDDL nor the GPL place restrictions on aggregations of source code, which is what putting ZFS into the same tree as Linux would be. Binary modules built from such a tree could be distributed with the kernel's GPL modules under what the GPL considers to be an aggregate. These concepts have passed legal review by many parties.
Whether any loadable kernel module can be licensed under a different license than the GPL is still up for debate (even Linus says "sometimes"):
(Of course, Ubuntu et al have been doing this for years e.g. with Nvidia or HBA binary blob drivers, so it does appear to be a mostly-settled issue..)
That said, I'd be interested to see what the solution was.
Then could you explain what this commit is:
Canonical cannot ship ZFS in binary form. Kernel modules are derivative works of the kernel, and thus they must be distributed under GPLv2.
It doesn't have the power to define a binary module using an internal interface as a derivative work; that can only be done by a court interpreting copyright law in a particular jurisdiction. In the United States, different Federal Circuit courts have different views of what constitutes a derivative work in software.
There is no legal precedent suggesting that was actually necessary either. Or are you aware of a court case?
That being said, would anyone who believes this "linking to the kernel" argument please explain what linking actually means and how it is related to the GPL when the term "link" is not even present in the GPL?
The only case where you cannot distribute ZFS is if you link it into the vmlinux binary that your bootloader loads. In that case, it is no longer a LKM and you can claim the binary is a derived work of Linux. That is what I believe shmeri meant.
Building kernel module on the fly when deploying ZFS is an acceptable workaround. I think that's exactly how it was planned to be used in Debian. I really have no clue what Canonical are planning though.
I'm not going to comment further on any implications of Debian/Ubuntu's decisions, since IANAL.
If ZFS kernel module uses only those permitted symbols, it's likely fine from legal standpoint. If not, there is a problem.
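For the curious, this claim can be spot-checked with standard kernel tooling (the paths below are illustrative and distro-dependent): the kernel tags each exported symbol as either plain `EXPORT_SYMBOL` or GPL-only `EXPORT_SYMBOL_GPL`, and a module whose `MODULE_LICENSE` is not GPL-compatible cannot resolve the latter at load time.

```shell
# Which license does the ZFS module declare? (ZFS on Linux declares "CDDL")
modinfo -F license zfs

# Module.symvers in a kernel build/headers tree records the export class
# of every exported symbol (path varies by distro):
grep -c EXPORT_SYMBOL_GPL "/usr/src/linux-headers-$(uname -r)/Module.symvers"

# Which symbols does the built module actually reference? (path illustrative)
nm -u "/lib/modules/$(uname -r)/extra/zfs.ko" | head
```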
Debian uses the compile-at-install-time workaround but Ubuntu ships binaries, which is a GPL violation.