
Summary of ZFS on Linux for Debian - ferrantim
http://article.gmane.org/gmane.linux.file-systems.zfs.user/18418
======
fiatmoney
I use ZFS on Linux for my file server. One of the nicest things about it is
that the caching algos actually work - they're resilient to scans, so I can,
for instance, have an intensively used database file that remains in RAM, even
while & after doing a linear scan over a large number of files (e.g. during
rsync). With the standard Linux page replacement algos, linear reads will
flush the stuff you're actually using out of the page cache.
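The scan-resistant behavior described here comes from ZFS's ARC keeping separate recency and frequency lists. A toy sketch of that two-list idea (this is not the real ARC algorithm, which also tracks ghost lists and adaptively resizes its lists):

```python
from collections import OrderedDict

class ScanResistantCache:
    """Toy two-list cache in the spirit of ZFS's ARC (not the real thing).

    Blocks seen once live in the 'recent' list; blocks touched again are
    promoted to the 'frequent' list. A linear scan only churns 'recent',
    so frequently used blocks survive it.
    """
    def __init__(self, capacity):
        self.capacity = capacity          # total blocks cached
        self.recent = OrderedDict()       # seen exactly once
        self.frequent = OrderedDict()     # seen more than once

    def access(self, key):
        if key in self.frequent:
            self.frequent.move_to_end(key)    # refresh LRU position
        elif key in self.recent:
            del self.recent[key]              # promote on second access
            self.frequent[key] = True
        else:
            self.recent[key] = True           # first access
            # evict from 'recent' first, so scans cannot flush 'frequent'
            while len(self.recent) + len(self.frequent) > self.capacity:
                if self.recent:
                    self.recent.popitem(last=False)
                else:
                    self.frequent.popitem(last=False)

    def __contains__(self, key):
        return key in self.recent or key in self.frequent

cache = ScanResistantCache(capacity=8)
for block in ("db0", "db1", "db2"):       # hot database blocks, touched twice
    cache.access(block)
    cache.access(block)
for block in range(1000):                 # rsync-style linear scan
    cache.access(f"scan{block}")
print(all(b in cache for b in ("db0", "db1", "db2")))  # -> True
```

A plain LRU cache run through the same sequence would have evicted the database blocks long before the scan finished; here they sit untouched in the frequency list while the scan churns only the recency list.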

The fact that the caching algos are so good at keeping things in memory is why
everyone gets hung up on using ECC RAM with it.

~~~
mrb
This is not the reason why people recommend ECC. ZFS is so good at detecting
corruption (CKSUM errors) that when it happens it is hard to tell what caused
it: faulty RAM corrupting data before being written to disk or after being
read from disk, or data corrupted on disk? ECC simply helps reduce a good
chunk of corruption errors caused by faulty RAM. That said, in my experience
there are a few tell-tale signs that RAM is the issue: CKSUM errors popping
up infrequently, and not consistently attributed to the same drive(s).
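The point that a CKSUM error tells you corruption happened but not where can be illustrated with a toy sketch (sha256 standing in for ZFS's per-block checksums; none of this is ZFS code):

```python
import hashlib

def checksum(block: bytes) -> str:
    # ZFS checksums every block (fletcher4 or sha256); sha256 stands in here.
    return hashlib.sha256(block).hexdigest()

def flip_bit(block: bytes, bit: int) -> bytes:
    """Simulate a single bit flip, wherever it happens to strike."""
    b = bytearray(block)
    b[bit // 8] ^= 1 << (bit % 8)
    return bytes(b)

data = b"some application data"

# Case 1: faulty RAM corrupts the block BEFORE the write. The checksum is
# computed over the already-bad data, so later reads verify cleanly: the
# corruption is invisible.
bad_before_write = flip_bit(data, 3)
stored_block, stored_sum = bad_before_write, checksum(bad_before_write)
print(checksum(stored_block) == stored_sum)    # -> True (undetected)

# Case 2: the block is corrupted on disk (or in RAM after the read). Now
# the stored checksum no longer matches and a CKSUM error is reported -
# but the error alone cannot say whether disk or RAM was at fault.
stored_block, stored_sum = data, checksum(data)
bad_after_read = flip_bit(stored_block, 3)
print(checksum(bad_after_read) == stored_sum)  # -> False (CKSUM error)
```

In both cases the damage is a single flipped bit; only the timing relative to checksum computation determines whether ZFS can even see it.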

~~~
byuu
I haven't had any issues _so far_ using four drives mirrored on ZFS, but the
ECC issue certainly worries me.

I'd love to run ECC RAM, but I'd have to buy a much more expensive processor,
a much more expensive mainboard, and all of my RAM would have to be swapped
for moderately more expensive replacements. And DDR3+ is not cheap.

I'm curious though, if ECC consists of a ninth parity bit, is there any reason
why a memory controller can't be designed that would, worst case, use every
other (identical) stick just to get parity bit(s), as a BIOS configurable
option? Sure it'd halve your RAM, or you'd pay a lot more in RAM costs, but
not having to buy Xeon processors and mainboards, and getting to reuse your
existing RAM, would be worth buying an extra stick of RAM in my opinion.

It seems as processes keep getting smaller, and RAM sizes keep getting larger,
that the effects of cosmic radiation are just going to keep getting worse. If
we can't get desktop CPUs and mainboards to just switch to ECC, surely this
would at least be a better-than-nothing option.
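For what it's worth, real ECC DIMMs don't use a bare ninth parity bit: they store extra check bits for a SECDED (single-error-correct, double-error-detect) extended Hamming code. A lone parity bit, as on classic 9-bit parity RAM, can only detect a single-bit flip, not correct it, as this toy sketch shows:

```python
def parity_bit(byte: int) -> int:
    """Even parity over 8 data bits, as on classic 9-bit parity RAM."""
    return bin(byte).count("1") % 2

def store(byte: int) -> tuple[int, int]:
    """Write a byte along with its computed parity bit."""
    return byte, parity_bit(byte)

def load(byte: int, stored_parity: int) -> int:
    """Read a byte back, checking it against the stored parity bit."""
    if parity_bit(byte) != stored_parity:
        # Parity says *something* flipped, but not which bit: detection
        # only. SECDED ECC adds enough check bits to locate (and thus
        # correct) a single flipped bit.
        raise ValueError("single-bit error detected (cannot correct)")
    return byte

word, p = store(0b1011_0010)
print(load(word, p) == 0b1011_0010)       # clean read -> True

flipped = word ^ 0b0000_1000              # cosmic-ray bit flip
try:
    load(flipped, p)
except ValueError as e:
    print(e)                              # single-bit error detected
```

Note also that an even number of flips in the same byte cancels out under parity, which is part of why ECC uses the stronger code.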

~~~
mng2
I went ECC a couple months ago and managed to find some parts that didn't
break the bank, mostly because I held back and stayed a generation behind:
GA-6UASL3, E3-1230V2. There are other sacrifices: the motherboard is rather
no-frills, and the memory bandwidth isn't that great.

Besides price, another issue against server hardware is that it's generally
designed for a server case and therefore doesn't play well with enthusiast
cases. The board I got is a little odd in that it uses a layout more typical
of consumer boards, though the socket placement is still not ideal.

~~~
simoncion
One of the reasons I went with AMD processors and mobo chipsets for my non-
laptop system (and recommend them for anyone who's not concerned about
performance per watt) is because AMD doesn't segment the market by processor
features.

You choose based on power budget, thermal budget, and presence of on-die
graphics. Regardless of how you choose, all processors in a given generation
(back to _at least_ the Phenom series and associated mobo chipsets) have ECC
support, virtualization support, and whatever other fancy features were
slapped in in that generation.

~~~
tracker1
Most MB vendors (other than ASUS) disable ECC for non-server CPUs... Also,
it's Unbuffered ECC, not Registered ECC, for those looking and unsure.

------
ryao
Installing / on ZFS is fairly easy on most major Linux distributions, but
Debian has been the main exception due to its initramfs generator lacking ZFS
support. I am cautiously optimistic that will change.

If not, then this issue should go away when I publish ZFS support patches for
syslinux later this year. syslinux is capable of generating initramfs archives
on the fly, so adding ZFS support to it should largely eliminate the need for
distribution-specific initramfs generators.

~~~
LaikaF
I am not a particularly good Linux user ( I have to look up most commands/
where things are) and I had no problem getting ZFS set up on Debian. I'm
worried now that you say this, because I feel I may have messed something up.

~~~
StavrosK
He's talking about the root partition being on ZFS, not just setting up an
array.

~~~
RexRollman
Don't most people just use a /boot partition anyway?

------
aidenn0
It's interesting that the Debian people feel that ZoL is not a derivative
work.

I remember a thread where RMS claimed to Bruno Haible that clisp was a
derivative work of readline, since it had _optional_ readline support.

I always thought that position was untenable, but since Haible was open to
licensing clisp under GPL anyway, there wasn't a whole lot of pushback.

~~~
ryao
In that case, readline was not a loadable module. A comparison that does use
loadable modules is the FSF's GCC project. The FSF resisted implementing
support for loadable modules in GCC for a long time under the belief that it
would allow the use of GPL-incompatible modules. It was not until LLVM made
it a moot point, because GCC itself could be replaced entirely by
non-copyleft code, that GCC gained support for this. Linux kernel module
support analogously
permits loading modules that are under GPL-incompatible licenses.

Note that I am not associated with the Debian project and therefore I was not
involved in the discussion referenced here.

~~~
mikepurvis
Another one I've been wondering about recently is the inverse— loading at
runtime a GPL module into an otherwise BSD codebase.

ROS (robot operating system) runs into this with nodelets, which are shared
objects that are loaded into a nodelet manager. Is it a GPL violation to
supply a launchfile which specs the loading of BSD and GPL nodelets into a
single running process?

~~~
yebyen
BSD and GPL are actually compatible and can be distributed together. It's my
understanding that the old advertising clause is the only bit that makes them
incompatible, and that regular BSD and GPL code can be bundled and
distributed together.

What else I got from the OP thread is that if you do this (let's just assume
the two licenses are incompatible) then you are not the violator, since you
haven't distributed these as one binary package, but the users might be (only
if they go on to redistribute the pre-built confabulation of
binaries/processes as one package, or even just in uploading them together to,
say, a hosting provider.)

------
WestCoastJustin
If you are looking at playing around with ZFS on Linux, be sure to check out
Aaron Toponce's awesome series of articles, entitled "Install ZFS on Debian
GNU/Linux" [1]. I have also done a two part screencast about using ZFS on
Linux [2], part two will be released later today.

[1] https://pthree.org/2012/04/17/install-zfs-on-debian-gnulinux/

[2] https://sysadmincasts.com/episodes/35-zfs-on-linux-part-1-of-2

------
jeffdavis
Why doesn't Oracle just change ZFS to dual-license GPL/CDDL, and scrap btrfs?

My experiences with ZFS have been quite good, and with btrfs quite bad.

~~~
orkoden
Because Oracle wants you to buy Solaris if you do serious business.

~~~
riffraff
I guess the question remains: why do they keep developing btrfs? (Do they?)

~~~
ryao
Sun received dozens if not hundreds of patents for techniques used in ZFS and
since Oracle purchased Sun, it now has those patents. btrfs uses many of the
ideas that ZFS uses, so it is highly unlikely that btrfs does not infringe on
at least some of those patents. People who use ZFS have a patent grant through
the CDDL, but people who use btrfs have no such protection from the GPLv2.

So far, I am not aware of anyone who has gotten any legally binding assurance
from Oracle that shipping btrfs will not be a problem. I am also not aware of
anyone in the btrfs community asking Oracle to do something about it. If btrfs
takes off, the ZFS patent portfolio could ensure that Oracle is the only
company that can legally distribute btrfs. Consequently, Oracle's legal
department would likely be able to have a field day with any company
distributing btrfs. In the meantime, their competition will have to develop
workarounds and Oracle would be ahead because they will have a better
filesystem than they would have had they been upfront about this issue.

What Oracle might or might not do with the ZFS patent portfolio in the future
is speculation, but the fact is that Oracle appears to have reserved the
option to use it in future lawsuits.

~~~
rodgerd
> What Oracle might or might not do with the ZFS patent portfolio in the
> future is speculation,

This whole post blows straight through "speculation" and into "unpaid
marketing for Oracle" in the worst possible way.

~~~
PhantomGremlin
I don't think that's fair at all.

I don't like Oracle, I think they're one of the most evil companies in
existence. But that doesn't automatically make anyone who attempts to explain
Oracle's behavior a shill or "unpaid marketing for Oracle".

------
byuu
So it sounds like they are okay with binary kernel modules, just not built-in
to the base kernel. FreeBSD manages to do ZFS as a kernel module (plus another
module for Open Solaris abstractions) quite successfully. Although it has
somewhat of an ugly ZFS-on-root shim loader for booting from a ZFS partition,
it does certainly get the job done.

Here's hoping Debian can develop something similar so that users can create
and boot from a ZFS partition during their installer.

> CCDL is an Open Source License that is DFSG compliant

I don't mean to nitpick, but if you're going to discuss the legalities of a
license, at least spell it correctly. It's not CCDL, it's CDDL, or Common
Development and Distribution License.

------
acd
ZFS on Linux is very good!

~~~
aruggirello
But is it, performance-wise? How will ZFS compare to btrfs?

~~~
pizza234
At this point in time, they still can't be compared, because btrfs is still
not [considered] production-ready, and subject to significant changes (also
performance-related).

This is especially important because in use cases where performance
differences are significant (that is, not general desktop usage), the
maturity of the FS is fundamental, and btrfs is discouraged right now.

~~~
derefr
There are many situations where performance is important, and durability is...
not. For example, ephemeral CoreOS cloud instances running Docker containers:
they use lots of copy-on-write layers, but they don't actually need to persist
any state across reboots (the layers may as well be stored in volatile
memory.) Btrfs is perfectly "production-ready" for this _particular_ use-case,
so a [current] performance comparison would be pretty useful.

~~~
ryao
I am working on ZFS support for CoreOS. A snapshot of my WIP proof of concept
was posted by my employer yesterday:

https://github.com/ClusterHQ/flocker/blob/zfs-on-coreos-tutorial-667/docs/experimental/zfs-on-coreos.rst

CoreOS uses btrfs as its rootfs. I imagine that the CoreOS developers managed
to avoid issues like ENOSPC by virtue of not writing to their rootfs very
much. I did not have that luxury since I compiled a Gentoo GNU userland on top
of it during the course of development. I encountered numerous ENOSPC errors
on btrfs when developing the ZFS port to CoreOS and even hit ENOSPC errors
when trying to correct the btrfs ENOSPC errors with `btrfs balance /`. I would
not consider btrfs ready for production use, but your mileage will vary.

~~~
simoncion
When were you hitting these ENOSPC errors, and what kernel were you using when
you hit them?

    
    
      $ df -h .
      Filesystem      Size  Used Avail Use% Mounted on
      /dev/dm-1        42G   41G  280M 100% /home
      $ btrfs fi df .
      Data, single: total=40.48GiB, used=40.21GiB
      System, single: total=4.00MiB, used=12.00KiB
      Metadata, single: total=1.01GiB, used=649.18MiB
      $ uname -r
      3.15.2-hardened
    

As you can see, I've been working with a pretty much full btrfs volume. It
used to be _TERRIBLE_ to deal with btrfs in such a situation, but I haven't
had an ENOSPC issue in ages.

~~~
ryao
The Gentoo Prefix bootstrap on a developer image:

https://www.gentoo.org/proj/en/gentoo-alt/prefix/bootstrap.xml

CoreOS uses Linux 3.15.y.

------
_delirium
> Debian maintainers vote to ship ZFSonLinux in Debian

I don't believe that's what the linked post is saying. I may be missing
additional context that's posted elsewhere, but at least what I read in _this_
thread is: 1) the Debian ftpmasters rejected the binary ZFS module upload; and
2) the Debian ZFS-on-Linux team met at Debconf 14 and agreed on this
summary/response, arguing why it should be accepted.

But has that response itself been accepted? Where is the mentioned vote? The
only other post I see in the linked thread is from Lucas Nussbaum (Debian
project leader), which sounds inconclusive,

 _I think that adding an actual question to our legal counsel would help focus
their work. ... I'll wait for comments or ACK from ftpmasters before
forwarding your mail to SFLC._

~~~
ryao
You are correct. There is no sign of a vote. That inaccuracy aside, the
outline of the general understanding of the licensing situation is a step in
the right direction. The email itself suggests that there was tentative
agreement over this at DebConf 14, which is promising.

~~~
_delirium
My read was that there was agreement at DebConf 14 among the ZFS-on-Linux
maintainers specifically, not necessarily that they'd gotten buy-in from the
wider Debian community. It's somewhat unclear though.

------
Thaxll
I stopped using ZFS when I learned that using non-ECC memory was dangerous
and could corrupt sane data.

~~~
ChuckMcM
I'm confused by this statement, using 'non ECC' memory _is_ dangerous and
_does_ corrupt otherwise sane data. Using or not using ZFS doesn't change this
danger.

~~~
Thaxll
Other filesystems won't try to "correct" data that is sane on the HDD but
was corrupted when loaded into memory.

~~~
XorNot
ZFS does not do this. Your assumptions about what other FSs do are inaccurate.

Consider: you load data from ext4 into RAM, it gets corrupted. You change some
bytes, then save it. Corrupted data is then written to the disk. Could be
anything inside the allocation unit size, which is 4K generally - not
insubstantial.

ZFS's worst case behavior is exactly the same: it can't protect you from
corruption in RAM after checksum validation if you then ask it to save that
data to disk.

There is no difference: if you do not have ECC RAM, you are vulnerable to
bitflips corrupting your data - and some of it likely already has been.

ZFS won't corrupt data which gets altered in memory due to a bitflip but is
only ever read. It _can't_ - because in-memory bitflips can't be detected.
Even if at some point during checksum validation there is a mismatch, the
restore is done from checksum-protected parity/mirror data. And if the
restore block were corrupted, its checksum won't match either, and ZFS will
rebuild the block correctly next time.
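That self-healing read path can be sketched as a toy model (this is a simplified two-way mirror for illustration, not ZFS code):

```python
import hashlib

def csum(block: bytes) -> bytes:
    return hashlib.sha256(block).digest()

class Mirror:
    """Toy two-way mirror with a per-block checksum, sketching ZFS's
    self-healing read path."""
    def __init__(self, block: bytes):
        self.copies = [block, block]      # one copy per mirror side
        self.checksum = csum(block)       # stored in the block pointer

    def read(self) -> bytes:
        for copy in self.copies:
            if csum(copy) == self.checksum:
                # Self-heal: rewrite any sibling copy that fails
                # verification using the known-good copy.
                self.copies = [copy if csum(c) != self.checksum else c
                               for c in self.copies]
                return copy
        raise IOError("all copies failed checksum: unrecoverable CKSUM error")

m = Mirror(b"important block")
m.copies[0] = b"importBnt block"          # on-disk corruption of one copy
print(m.read() == b"important block")     # served from the good copy -> True
print(m.copies[0] == b"important block")  # corrupted copy repaired -> True
```

The key property is that the repair source is itself verified against the stored checksum before being used, so a bad copy can never overwrite a good one.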

If you do not have ECC RAM, _every_ filesystem will potentially corrupt your
data. ZFS is _more_ resilient than pretty much all of them, even in this case.

~~~
j00lz
The worst case with ZFS and RAM corruption is that you can lose your ENTIRE
zpool. As there are no ZFS recovery tools available, this means your data is
as good as gone, or a minimum 15k spend to get it re-assembled.

This makes it far more risky to run ZFS with non server grade parts.

~~~
ryao
It is possible for this to happen with other filesystems too. The only thing
is that when it happens with another filesystem, it is not news.

~~~
j00lz
It becomes news because there are no recovery tools available for ZFS, as
there are for most other file systems. That means the possibility of losing
ALL data due to RAM corruption becomes a real threat.

