
Who Needs Git When You Have ZFS? - urza
http://zef.me/blog/6023/who-needs-git-when-you-got-zfs
======
olavgg
ZFS is excellent for database development! Create a snapshot before you try
something that might mess up your data, and instantly restore it if it does!
If your development database is a few gigabytes, this will save you a lot of
time.
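
A rough sketch of that workflow, not taken from the article: the helpers below only build the `zfs snapshot` and `zfs rollback` command lines and print them. The dataset name `tank/devdb` and the label `before-migration` are hypothetical, and running the commands for real would need an existing pool and suitable privileges.

```python
# Hypothetical dataset name; substitute your own pool/dataset.
DATASET = "tank/devdb"


def snapshot_cmd(dataset: str, label: str) -> list[str]:
    """Build the command that records the dataset's current state."""
    return ["zfs", "snapshot", f"{dataset}@{label}"]


def rollback_cmd(dataset: str, label: str) -> list[str]:
    """Build the command that reverts the dataset to that snapshot.

    Plain `zfs rollback` only accepts the most recent snapshot; going
    back past newer snapshots needs -r, which destroys them.
    """
    return ["zfs", "rollback", f"{dataset}@{label}"]


before = snapshot_cmd(DATASET, "before-migration")
undo = rollback_cmd(DATASET, "before-migration")

# Dry run: show the commands instead of executing them.
print(" ".join(before))
print(" ".join(undo))
```

To execute rather than print, each list could be passed to `subprocess.run(cmd, check=True)`. Anything written to the dataset after the snapshot is discarded by the rollback.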

~~~
heisnotanalien
Except you'll lose all the data written since you created the snapshot...
You've been able to do this for years on Linux with LVM anyway.

~~~
JustSomeNobody
I don't think you read the comment thoroughly. This is the point.

Also, the discussion is centered around ZFS, there's no reason to bring up
LVM. Everyone knows there's many ways to skin this cat and it doesn't make
anyone appear any smarter because they know of one way.

~~~
heisnotanalien
My point is that it's not really a great way of 'backing' up a database
because typically you want the transactions after you've made the change as
well.

------
raverbashing
> Who needs Git?

> Notably missing is support for merging

So, it's almost like a car except it's missing its wheels

No, it's not like Git

~~~
pmarreck
I really don't think the article was seriously suggesting what you think it
was; showing similarities between different types of tools (usually a more
familiar one and a less familiar one) is meant to be illustrative

~~~
Kurtz79
"I really don't think the article was seriously suggesting what you think it
was"

Then maybe it shouldn't have a purposefully click-baiting title suggesting
it does...

~~~
combatentropy
>> I really don't think the article was seriously suggesting what you think it
>> was

> Then maybe it shouldn't have a purposefully click-baiting title suggesting
> it does

I normally hate clickbait, and yet I liked this article and its headline.
Maybe because the tone was self-deprecating, like me getting on a bicycle and
saying, "Look out, Tour de France!"

Actually, it's like a fish getting on a bicycle and jokingly saying, "Look
out, Tour de France." Maybe the fish isn't anywhere near good enough on a bike
to compete in the Tour de France, but still, it's a fish on a bicycle.

In the same way, Git will run circles around ZFS as a source-code version-
control system. On the other hand, it's way cool that ZFS can even do some of
these things, because after all it's just a filesystem.

------
devonkim
I've been working with virtual machines in production and development for well
over a decade now and a lot of what works via ZFS in this article works very
similarly to how it would work if we used AWS or VMware style snapshots. In fact,
ZFS snapshots are only so helpful if you make complicated changes across
multiple ZFS volumes, for example, that require some more transactional style
rollbacks. In such scenarios, it may be easier to just perform an instance-
level rollback.

One problem that didn't really occur to me before I became more ops-side was
that administrators would want to heavily restrict / remove users' (read:
developers and even other sysadmins) abilities to create snapshots in the
first place. Why would you remove self-service temporary backups and avoid a
lot of backup restoration requests? I didn't realize that other engineers
could be careless and keep dozens or even hundreds of snapshots over time that
gobble up expensive storage resources (SANs are _not_ cheap regardless of
manufacturer) and slow down I/O transactions over time. That was why so many
of my customers deploying stuff like VMware Lab Manager and vCloud Director
demanded the ability to remove access to snapshot features.

Because users abused a very powerful feature, and because of the typical
organizational structure and siloization of these environments, SAN-side LUN
snapshots are nowadays used more often than snapshots from the VM layer (the
same administrators that manage VMware environments typically have rights to
the SANs). Using ZFS like this looks like a developer-side reaction to me, but
duplicating effort to solve a problem that already has viable technical
solutions is exasperating.

~~~
tw04
VMware snapshots are _NOT_ like ZFS snapshots. VMWare snapshots are extremely
heavy and will have a massive performance impact if left around for an extended
period of time for any I/O sensitive VM. ZFS snapshots are essentially
0-overhead from a performance perspective, and function in a very different
manner.
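
One simplified way to picture the difference being argued here is a toy model, under two assumptions that are mine rather than the thread's: a delta-style snapshot adds a layer that reads must search newest-first, while a copy-on-write snapshot just keeps the old block map and leaves the live read path alone. The class names are hypothetical and neither class describes any product's real on-disk format.

```python
class DeltaChainDisk:
    """Toy model: each snapshot freezes a layer; writes go to a new delta."""

    def __init__(self, base):
        self.layers = [dict(base)]  # oldest layer first

    def snapshot(self):
        # Start a fresh, empty delta on top of the frozen layers.
        self.layers.append({})

    def write(self, block, data):
        self.layers[-1][block] = data

    def read(self, block):
        # Search from the newest delta back toward the base layer.
        lookups = 0
        for layer in reversed(self.layers):
            lookups += 1
            if block in layer:
                return layer[block], lookups
        raise KeyError(block)


class CowDisk:
    """Toy model: a snapshot keeps the old block map; live reads are direct."""

    def __init__(self, base):
        self.live = dict(base)
        self.snapshots = []

    def snapshot(self):
        # A real copy-on-write filesystem would keep a reference to the old
        # tree root; copying the dict is only a stand-in for that.
        self.snapshots.append(dict(self.live))

    def write(self, block, data):
        self.live[block] = data

    def read(self, block):
        return self.live[block], 1


base = {0: "a", 1: "b"}
chain, cow = DeltaChainDisk(base), CowDisk(base)
for i in range(5):
    for disk in (chain, cow):
        disk.snapshot()
        disk.write(1, f"v{i}")

# Block 0 was never rewritten, so the chain has to fall through every delta.
print(chain.read(0))
print(cow.read(0))
```

In this model the cost of reading an untouched block grows with the number of snapshots only for the delta chain, which is the shape of the complaint about long-lived snapshots on I/O-sensitive VMs.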

~~~
devonkim
I'm aware of how both work (I had a VCP410 cert and happily run ZFS with many
snapshots on rotation at home), but IMO the uses are similar enough for the
purposes explored in the article despite the significant differences in
practice and felt that a traditional alternative should be mentioned.
Additionally, VMware-based snapshots can be performed SAN-side and integrated
with VAAI, which can be expected to have performance overhead similar to a ZFS
snapshot while still supporting a lot of options with trade-offs and features.
The compulsory indirections in the traditional VMware snapshot implementation,
with its on-disk block map, appear designed primarily for storage
compatibility (much like the design of VMFS itself, including that whole
file-based mutex system) rather than performance, not to mention the support
for memory persistence / quiescing with the guest OS, which ZFS snapshots do
not support. In contrast, ZFS has _all_ the metadata to quickly examine block
time or other rich metadata to accelerate read and write indirection of blocks
optimally. It's not clear whether a SAN can accelerate ZFS-using OSes in the
way that it can accelerate VMware storage, but I'm curious about the potential.

On the other hand, recursive snapshots in ZFS are really slick and can achieve
90%+ of what people typically want from VMware-based snapshots when it comes
to any non-Windows OS if there's a little room for not needing to care about
indirect changes required in rollbacks.

~~~
tw04
...just no. VAAI snapshot integration with VMWare is only for NAS platforms,
not SAN - VMFS doesn't even come into the picture. Not to mention it's
intended for View for rapid cloning, and not as a backup feature.

------
erikb
> Notably missing is support for merging, which ZFS does not have direct
> support for as far as I'm aware.

So, no replacement. Merge and Branch are the major features of git. The bigger
the project the bigger the need for these. I know some projects where people
work full time as merge conflict resolvers. Without git that would require a
whole team instead of a person.

~~~
ChristianGeek
"Of course, I'm not seriously suggesting you'd ditch a 'proper' version
control system, but it gives a good sense of what's possible at the file
system level."

~~~
erikb
That's what the title says in big letters at the top though.

------
espadrine
> _Notably missing is support for merging, which ZFS does not have direct
> support for as far as I'm aware._

That insight is critical, especially now that we are heading towards Dropbox
Infinite, IPFS, Ceph and so many other non-centralized file systems.

Syncing is not a solved problem at all yet. Advances can bring the same
revolution as 3-way merging did to SCMs, previously dominated by file-locking
mechanisms.
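
For readers who have not seen it spelled out, here is a minimal sketch of the 3-way idea at whole-file granularity (real SCMs also merge within files). The function name and the `{path: content}` representation are illustrative assumptions, not any tool's API.

```python
def three_way_merge(base, ours, theirs):
    """Merge two edited {path: content} trees against their common ancestor.

    Returns (merged, conflicts). A missing key means the file does not
    exist in that version. Conflicting paths are left out of `merged`.
    """
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:        # both sides agree (including both deleting)
            result = o
        elif o == b:      # only theirs changed it
            result = t
        elif t == b:      # only ours changed it
            result = o
        else:             # both changed it, in different ways
            conflicts.append(path)
            continue
        if result is not None:
            merged[path] = result
    return merged, conflicts


base = {"a.txt": "1", "b.txt": "1", "c.txt": "1"}
ours = {"a.txt": "2", "b.txt": "1", "c.txt": "2"}
theirs = {"a.txt": "1", "b.txt": "3", "c.txt": "4", "d.txt": "new"}

merged, conflicts = three_way_merge(base, ours, theirs)
print(merged)
print(conflicts)
```

The common ancestor is what lets non-overlapping changes combine automatically. With only the two edited copies, every difference would look like a conflict.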

~~~
linsomniac
What was that replicated filesystem back in the 90s that promised to be the
next NFS, but with disconnected operation and merging when you came back
online? Coda, that's right. I had such high hopes for it in my move to using a
laptop as my primary machine, but never got it working despite several
attempts at it. Admittedly they weren't more than a couple hours at a given
time...

------
estrabd
ZFS doesn't have history in that way; snapshots are far too coarse. HAMMER,
for example, is closer to what you want if you're going to replace git with a
FS.

[https://www.dragonflybsd.org/hammer/](https://www.dragonflybsd.org/hammer/)

------
athenot
If you look past the attention-grabbing headline, this shows some pretty cool
stuff with ZFS, using source versioning as an analogy to explain what would
otherwise be quite abstract when it comes to filesystems. The only thing I
knew about ZFS was the name, this helped me see some of its impressive
features.

Of course, I'll take reliability & stability over fancy features any day, but
I'm glad there's active development in this area, and look forward to seeing
these features mature. They could contribute to some interesting
simplifications.

~~~
protomyth
> I'll take reliability & stability over fancy features any day

That is why people use ZFS and OpenZFS, they are battle tested and very
mature.

------
Finnucane
"Oracle's (previously Sun's) next-generation file system"

That's why you need to stick with Git.

~~~
xenophonf
What does Oracle have to do with anything? This is FUD.

~~~
ntlve
It is certainly not FUD, see
[https://www.eff.org/cases/oracle-v-google](https://www.eff.org/cases/oracle-v-google)

The point is, Oracle have decided to sue once over usage of something they
bought and may do so again in the future.

~~~
espadrine
Exactly.

The legal situation of Google's use of Java APIs was always a tiny bit murky,
but it was super-clear-cut compared to the use of ZFS-on-Linux.

~~~
ryao
Oracle sued over Google reimplementing code that was under the GPL (OpenJDK)
under the Apache 2.0 license (Apache Harmony). Google recently switched to the
GPL code because of it. In the case of ZoL, the code is derived from the
original code and is under the original license. The idea that ZoL is somehow
more at risk is pure FUD.

~~~
espadrine
My understanding is that the risk is mostly the other way around: it infringes
Linux' GPL license (see
[https://sfconservancy.org/blog/2016/feb/25/zfs-and-linux/](https://sfconservancy.org/blog/2016/feb/25/zfs-and-linux/)).
Would Oracle have grounds to sue as a Linux copyright holder? They have
successfully made APIs copyrightable, so they clearly have competent lawyers.

~~~
ryao
There are a few people who think that. There are plenty of others who do not.
Those in charge of reviewing licensing in Linux distributions that have
adopted it are generally of the opinion that it is okay, especially those in
charge of Sabayon and Ubuntu. Gentoo came to similar conclusions long before
them (>4 years before Ubuntu) and ships it in binary form on the LiveDVD. At
this point, there are at least a few dozen distributions that have adopted ZFS
in some form or another.

Here is a legal opinion on the matter:

[https://www.softwarefreedom.org/resources/2016/linux-kernel-...](https://www.softwarefreedom.org/resources/2016/linux-kernel-cddl.html)

~~~
xenophonf
I don't see how anyone could read that and come away thinking that
ZFS-on-Linux isn't kosher. Given how simple it is to compile zfs.ko in the Ubuntu
live environment, I'd bet money that's Canonical's backup plan should a
literal interpretation of the kernel's license become a legal reality.

And here's to hoping that ZFS becomes so widespread that Oracle themselves end
up redistributing it in their version of Linux, like they did with DTrace.

------
claytonaalves
"Using ZFS as a replacement of Git for is probably not a good idea..."

Of course not.

------
parenthephobia
To play devil's advocate, sometimes Git is abused to store things that aren't
code, and ZFS (or similar) might actually be a better fit in those situations.

Somebody further down mentions art assets. Also potentially relevant are
document repositories.

ZFS isn't very easy to use for a lay person, but that's really only a "Time
Machine"-esque front-end away.

------
delonia
Ubuntu 16.04 comes with ZFS preinstalled and ready to use.

Well, you need to install the user package 'zfsutils-linux' and you are ready
to go.

The kernel modules for ZFS are already in the default kernel and are loaded
automatically when you create a volume.

------
zippergz
I have a FreeBSD server in my home office that has a big ZFS raidz volume that
I use for shared file storage, and it's really, really awesome. The snapshots
are especially great because they greatly reduce the fear of screwing
something up. I once wanted to run a de-dupe script on several hundred
gigabytes of photos, but I was afraid it would go awry, so I snapshotted
first, knowing I could roll back in a few keystrokes. (Granted, this wouldn't
help much if the script had caused some problem that wasn't immediately
apparent. And yes, I have backups of all of that stuff elsewhere, but doing a
restore from backup is a lot more work than rolling back a snapshot.)

------
mnd999
Strange there's no mention of FreeBSD's excellent ZFS implementation

~~~
veidr
Yes, _and_ the article casually implies that ZFS totally works not only on
Linux but also on OS X[1], which is... a stretch.

I check on it every year or so, but my impression is that ZFS on Mac is still
so far behind where ZFS is on FreeBSD and (more recently) Linux, that it isn't
really clear whether ZFS will _ever_ work reasonably on the Mac.

(I would love to hear experiences of people actually using it on OS X, though!
I may be out of date.)

[1]: [https://openzfsonosx.org](https://openzfsonosx.org)

------
chmike
How production-ready is ZFS on Ubuntu (Debian?)? Is there any risk of Oracle
fighting back against open-source ZFS, as it did over Java with Google and
Android?

~~~
delonia
ZFS is under the CDDL while Java is not under the CDDL.

CDDL is an open-source licence just as the GPL.

~~~
liotier
> CDDL is an open-source licence just as the GPL

But incompatible with the GPL, which mostly explains the current situation of
ZFS on Linux.

~~~
hhw
It's the GPL that's incompatible with the CDDL and not the other way around,
which is an important distinction. Other than other GPL variants and BSD and
MIT style licenses, the GPL has restrictions which make it pretty much
incompatible with everything.

~~~
belorn
I don't get how the word "incompatible" can be understood as anything other
than that both licenses have aspects which make combining them impossible. But
oh well, let's see what the license text actually says:

 _Any Covered Software that You distribute or otherwise make available in
Executable form must also be made available in Source Code form and that
Source Code form must be distributed only under the terms of this License. The
Modifications that You create or to which You contribute are governed by the
terms of this License._

If you distribute source code, _this license_ means that you must use _this
license_. You can't use _some other license_, as it's not compatible with the
distributor deciding what license to use. The CDDL requires that any source
code distribution use the CDDL as its license, and it is incompatible with any
other license that has a similar condition. The CDDL is incompatible with the
GPL because the GPL has conditions similar to the CDDL's.

If we were to make an identical copy of the CDDL license and call it CDDLv2,
those two identical twins would be incompatible with each other. Software
under CDDLv1 would not be permitted to be combined with software under CDDLv2
and distributed as source code.

~~~
cyphar
Actually, the CDDL allows the person who wrote the License (Sun, now Oracle)
to release a new version of the license that implicitly updates the license
for all projects using the old license (It's like the "or any later version"
thing with the GPL, but I'm fairly sure it's not optional with CDDL). Which is
why people are asking why Oracle doesn't just release CDDL 2.0 that is GPL
compatible.

~~~
ryao
New CDDL code outside of Oracle is under CDDLv1 only to prevent them from
doing as they please with the license terms.

~~~
cyphar
But ZFS is not under that other license (which might be called CDDLv1, but
that's confusing). I don't get why new software uses the CDDL over the MPL if
you want "file based copyleft". If you want copyleft, just use the GPL.

~~~
ryao
Not much new software uses the CDDL. Fans of CDDL software typically opt for
BSD or Apache licenses.

------
egwynn
Three-year-old article teaches slightly outdated techniques and basic ZFS
usage on Linux by way of specious comparisons to Git.

I guess it wasn’t a half bad resource when it came out, but these days there’s
got to be better blog posts about this, right?

EDIT: Not that I’m bitter; I love zfs. I mainly wonder how something so old
ended up here.

~~~
somebehemoth
I very much appreciated this article and shared it with coworkers who also
found it helpful. I can understand the desire to only read the latest news,
but I've learned a lot of stuff in my life that was useful but only new to me.
The key benefit to this article was the comparison of ZFS to something I am
familiar with: Git. My wild guess is that other non-filesystem geeks like me
liked this article for similar reasons and upvoted despite its age.

Edit: Perhaps the ZFS to Git comparison is itself novel? I would like to read
about other filesystem features (any file system) from this perspective
because I think I would learn a lot.

------
linsomniac
I'm tempted to write an article: Who Needs ZFS When You Got Git, but the title
IS the article.

Don't get me wrong, I looove ZFS, and from a quick scan this article looks
like a good intro to ZFS, particularly for someone familiar with git.

~~~
kennywinker
Author is not actually suggesting replacing your version control with ZFS.
From the first paragraph:

> Of course, I'm not seriously suggesting you'd ditch a "proper" version
> control system, but it gives a good sense of what's possible at the file
> system level.

~~~
mikestew
So the title is click-bait, then? I almost read the article, but in the 342
milliseconds between thinking that and reaching for the mouse, I came up with
three reasons the topic alluded to in the title would be a dumb idea. And it
turns out that's not what the article is about anyway.

~~~
kennywinker
Good work, you avoided learning a thing! :hooray emoji:

------
jaz46
This blog post was one of the most inspirational pieces that led us to
founding Pachyderm (pachyderm.io). We offer git-like semantics for data,
including branches, and in a distributed system so it scales to petabytes.

------
exDM69
ZFS and other advanced versioned/copy-on-write filesystems are really cool and
useful but it's not a source code management tool.

E.g. the examples show branching but not merging.

On the other hand, this might be useful for storing binary large files that
are not really diffable and mergeable, and thus a bad fit for Git. However,
the typical use case for this is art assets in games, which means that it's
the artists and designers who are the target audience. Git is often said to be
too difficult for non-programmers, and ZFS or BTRFS is definitely not easier.

------
badmadrad
Weird title as Git and ZFS are used for completely different reasons but I
love ZFS as well and the article highlighted some of the finer details which
was great.

------
Ericson2314
An important difference is that while both rely on Merkle DAGs, git uses
content addressing while ZFS doesn't (physical addresses impact the hash too).

Relatedly, ZFS's memory management relies on the assumption that a forked
filesystem and its sibling will monotonically alias less data as either is
modified. This allows it to avoid tracing and ref-counting alike, but makes
implementing merging or `cp --reflink` difficult.
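
The content-addressing point can be sketched as follows. `git_blob_id` follows git's documented blob hashing (SHA-1 over a `blob <size>\0` header plus the content). The two `toy_*` functions are hypothetical stand-ins, not ZFS's real block-pointer layout or checksum algorithm. They only illustrate what happens when a parent's checksum covers its children's addresses as well as their contents.

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Object ID git gives a blob: SHA-1 of a short header plus the content."""
    header = b"blob " + str(len(data)).encode() + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def toy_block_pointer(address: int, data: bytes) -> bytes:
    """Hypothetical pointer: where the block lives, plus a checksum of it."""
    checksum = hashlib.sha256(data).digest()
    return address.to_bytes(8, "big") + checksum


def toy_parent_checksum(pointers: list[bytes]) -> str:
    """Checksum of a parent block whose payload is its children's pointers."""
    return hashlib.sha256(b"".join(pointers)).hexdigest()


data = b"hello\n"

# Content addressing: the ID depends on the bytes alone.
print(git_blob_id(data))

# Same bytes stored at two different (made-up) addresses.
parent_a = toy_parent_checksum([toy_block_pointer(0x1000, data)])
parent_b = toy_parent_checksum([toy_block_pointer(0x2000, data)])
print(parent_a == parent_b)
```

Under this model two trees holding identical data do not hash to the same root unless they also share physical blocks, so the root hash cannot serve as an identity for the content the way a git tree ID can.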

------
Wonnk13
Are there any beginner resources for learning about file systems? Would it be
smart / harmful to reformat my whole MacBook SSD to ZFS?

~~~
pmarreck
I don't think you can use ZFS as a boot volume on Macs (and only on Linux with
caveats).

~~~
cmurf
The limiting factor is the bootloader. Mac firmware can read HFS+ natively,
find the bootloader, which in turn recognizes HFSX, Apple RAID, and Apple Core
Storage. I doubt any of the previous ZFS support exists in the current
bootloader.

Unlike linux where there's a separate /boot, OS X doesn't really separate boot
from root file systems. So how you'd boot XNU from HFS+ and transition to a
separate ZFS root fs isn't obvious to me from the existing documentation.

------
QuantumRoar
This article gave me some confidence in finally trying to boot Arch Linux on
ZFS. This will either make my life very easy or very hard.

------
bechampion
Well... I get the funny side of the comparison; they're two different things.

------
lazyant
I've been reading that ZFS was "almost ready for use" in Linux for years, how
stable is it for production now, anybody here using it, any good/bad comments
or tips? (using CentOS 7 at the moment)

~~~
ElijahLynn
To give you an idea, it just shipped baked into Ubuntu 16.04, the latest Long
Term Support (LTS) version.

[https://insights.ubuntu.com/2016/02/16/zfs-is-the-fs-for-con...](https://insights.ubuntu.com/2016/02/16/zfs-is-the-fs-for-containers-in-ubuntu-16-04/)

------
pmarreck
Anyone here think that a "ZFS in the cloud" that anyone could inexpensively
export a ZFS snapshot (or volume) to would potentially be a business model
(offsite backups, etc.)?

~~~
Freaky
rsync.net does:
[http://www.rsync.net/products/zfsintro.html](http://www.rsync.net/products/zfsintro.html)

~~~
pmarreck
Ah, I think I've even come across this in the past and forgot about it!

Is $60/TB/month competitive?

------
nabla9
More like git-annex and git LFS.

ZFS can be a good cms choice for large binary files.

------
CommanderNyx
ZFS is very powerful indeed; I've been using ZFS to share a data partition
between OS X and Ubuntu. The versioning capabilities the OP mentions have a lot
of potential.

------
rlpb
Branches and tags are just snapshots and clones? This actually looks more like
Subversion than git, together with the challenges with managing merges that
this model brings.
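
As a hedged illustration of that mapping: the helpers below build the `zfs snapshot` and `zfs clone` command lines that play the roles of "tag" and "branch". The dataset names are hypothetical and nothing is executed.

```python
def tag(dataset: str, name: str) -> list[str]:
    """A 'tag': a read-only, named point in the dataset's history."""
    return ["zfs", "snapshot", f"{dataset}@{name}"]


def branch(dataset: str, snapshot: str, new_dataset: str) -> list[str]:
    """A 'branch': a writable clone that starts from an existing snapshot."""
    return ["zfs", "clone", f"{dataset}@{snapshot}", new_dataset]


# Hypothetical names, for illustration only.
commands = [
    tag("tank/project", "v1.0"),
    branch("tank/project", "v1.0", "tank/project-feature"),
]
for cmd in commands:
    print(" ".join(cmd))
```

There is deliberately no `merge` helper: `zfs` has no subcommand that reconciles a clone's changes with its origin, which is the gap this comment and several others point at.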

------
db48x
I like ZFS a lot, but it's no replacement for Git. Snapshots are really handy
for a lot of things, but not for a group of developers collaborating over some
source code.

~~~
emmelaich
Did you read the article?

> Of course, I'm not seriously suggesting you'd ditch a "proper" version
> control system

------
skrowl
Almost all of what they're describing as a benefit here isn't ZFS-specific and
could be done (for example) with Volume Shadow Copies on Windows servers as
well.

------
mnd999
Strange he doesn't mention FreeBSD's ZFS implementation.

------
akerro
It looks like ZFS which came from BSD systems will be/is the next big thing in
Linux, not Ubuntu on Windows which gets more hype...

~~~
4ad
ZFS (not ZSF) came from Solaris, not BSD.

------
cyphar
I personally feel that we should all hop off the ZFS (on Linux) hype train and
consider a few points. Currently ZoL is being developed by two people, and it
works by creating a translation layer into the Linux kernel API. The bug list
is enormous (much bigger than btrfs), and there isn't enough experience with
the codebase by Linux kernel developers. That's ignoring the potential legal
issues. No other ZFS port has these problems.

On the other hand, if you want a supported filesystem with many of the same
features as ZFS, there's btrfs. It alleviates all of the problems with the ZoL
port. And there's no fear of Oracle lawsuits.

~~~
ryao
There are far more than 2 people here:

[https://github.com/zfsonlinux/zfs/commits/master](https://github.com/zfsonlinux/zfs/commits/master)

I will admit that my contribution activity has been low as of late, but it
will pick up again soon. As for the bug list, that is because basically all
distribution problems get sent there and duplicates are not closed until it is
proven that they are duplicates. There is no such transparency with other
filesystems' bug tracking. Redhat receives plenty of bug reports through
Fedora that they happily close if not solved by EOL.

As for btrfs, one of the btrfs developers claimed that ZFS has 5 times the
development resources of btrfs:

[https://news.ycombinator.com/item?id=11749477](https://news.ycombinator.com/item?id=11749477)

As for lawsuit fears, the fact is that using any software at all puts you at
risk of a lawsuit. Whether or not the plaintiff has a legitimate case is a
separate matter. However, if we consider the prospect of Oracle having a case
against those using btrfs or ZFS, btrfs would be at higher risk. The CDDL
provides an implicit patent grant while the GPLv2 does not. Any ZFS patents
that are applicable to btrfs that Oracle acquired from Sun could be used
against those using btrfs. The repercussions for Oracle would be huge, but if
we are discussing the potential for legal issues with Oracle, then btrfs
is at the greatest risk.

From what I know of the internals of the two, btrfs has far more problems.
This shows some of the problems on an enterprise distribution:

[https://news.ycombinator.com/item?id=11749010](https://news.ycombinator.com/item?id=11749010)

The idea that performance can be so horrible that the system might as well
have deadlocked is a severe problem. The ENOSPC issues from internal
fragmentation are another severe problem. Then there is the lack of backports.
I could continue, but I see no need.

