
Who Needs Git When You Got ZFS? (2013) - matt42
http://zef.me/6023/git-zfs
======
mrb

      $ dd if=/dev/zero of=/tmp/disk1.img bs=1024 count=10485760
    

is a terribly inefficient way of creating a test image. It wastes 10GB of disk
space. It takes dozens of seconds to run. Etc. Instead you can create a
_sparse_ file of 10GB that takes 0 bytes on disk. The creation of such a file
is instantaneous:

    
    
      $ dd if=/dev/zero of=/tmp/disk1.img bs=1024 seek=10485760 count=0
    

The key is seek=XXX. The filesystem will not allocate blocks before this
offset. An application reading before this offset will just read virtual zero
bytes.
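You can confirm the file is sparse by comparing its apparent size to its
on-disk allocation:

```shell
# Create the 10GB sparse file instantly (no blocks allocated).
dd if=/dev/zero of=/tmp/disk1.img bs=1024 seek=10485760 count=0
# Equivalently, with GNU coreutils: truncate -s 10G /tmp/disk1.img

ls -lh /tmp/disk1.img   # apparent size: ~10GB
du -h /tmp/disk1.img    # actual disk usage: 0
```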

Otherwise, this is a nice succinct article showcasing the snapshot/clone
feature and send/receive snapshot deltas between 2 systems.
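For anyone who hasn't used it, that workflow looks roughly like this (the
pool/dataset names here are made up):

```shell
# Snapshot a dataset:
zfs snapshot tank/project@v1

# Clone the snapshot into a writable filesystem - a cheap "branch":
zfs clone tank/project@v1 tank/project-experiment

# Full send to another machine:
zfs send tank/project@v1 | ssh otherhost zfs receive backup/project

# Later, send only the delta between two snapshots:
zfs snapshot tank/project@v2
zfs send -i tank/project@v1 tank/project@v2 | ssh otherhost zfs receive backup/project
```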

Edit: on Mac OS X, UFS supports sparse file, but not HFS+. On Linux, all major
fs support sparse files.

~~~
LeoPanthera
Huh. I didn't know that. Does it work on any filesystem?

~~~
panzi
I think just about anything newer than FAT32 supports sparse files.

~~~
dasmoth
The one FS people today are likely to come across which doesn't support them
is HFS+.

~~~
e12e
AFAIK neither exFAT nor FAT32 supports it -- and they are still(?) popular for
flash sticks and/or drives that need to be shared between OSes.

------
transfire
While the author demonstrates the capabilities of ZFS to act in many respects
like a version control system -- one that you would never actually use -- I
think it foretells a future for file systems. It only makes sense that
versioning should eventually become a standard part of all file systems.
There are definite limitations born of version control and the file system
not "living in the same world", so to speak. The most obvious of these is the
notorious `rm foo`, oops... I mean `git rm foo`!

~~~
GFK_of_xmaspast
I was using a versioned file system in 1992, and you didn't even have to be a
privileged user to take advantage of it.

~~~
snogglethorpe
... and versioned file systems in production use have been around since _at
least_ the early '70s... (tenex/twenex, etc)

~~~
wglb
and Tops10, Tops20 as well.

------
frik
We should think more _out of the box_. BeOS BFS, (ReiserFS4) and early
versions of NTFS supported extensive object-oriented metadata and search
capabilities ("Cairo") - something like WinFS but directly in the filesystem
driver. Data could be organized like in relational databases today; you could
find your data in various ways, not just in the old and proven way of a
hierarchical directory tree.

A modern filesystem that is available for common operating systems would be a
good starting point.

I am not sure if ZFS is the answer, there might be better FSes around, but we
definitely need rock-solid file system drivers that we can rely on for WinNT
5+ and current OSX (as well as Linux and *BSD, but at least ZFS covers the
latter).

At the moment the common denominator of all filesystem drivers is some
edition of the Microsoft FAT filesystem - a pretty basic and old filesystem by
today's standards. NTFS as a filesystem is very rock-solid, and read & often
also write support is available in all common OSes.

We would definitely benefit if solid r/w filesystem drivers for ext4, XFS,
ReFS, BFS/BeFS, Btrfs, etc. were available on more OS platforms.

~~~
optimiz3
This ignores the massive body of research and evidence that suggests putting a
database in a file system is a Bad Idea.

File systems deal with organizing unstructured data (i.e. blocks of bytes);
databases deal with organizing structured data (i.e. typed records).

Efficiency and scalability come from decoupling the FS and the DB and letting
them specialize.

Examples: GFS + Bigtable, Azure extent/partition manager + Table store,
Amazon's various storage elements.

Pushing the DB into the filesystem doesn't really buy you anything - you still
have to solve the unstructured page management/allocation problem.

Counter-examples: WinFS, Cairo, Windows Registry (which I'd argue was a large
failure).

It's an idea that sounds good on paper, but fails on the theoretical
(unstructured vs structured) and practical aspects (distributing structured
data is MUCH harder than distributing unstructured data).

~~~
frik
I would argue that Nepomuk, Cairo and WinFS failed because the project
management failed to meet the milestones, not because a filesystem with an
index and query interface is a bad idea. Please point me to research
documents.

The Cairo project documents never specified the query-language and UI parts of
Cairo. And this was basically what never got implemented; everything else made
it. WinFS was doomed to fail because it ran in user mode on .NET (in the
Longhorn era PCs were slower), instead of adding the query part to the NTFS
driver in kernel mode. The shell integration with only UNC paths and a
.NET-only API was bad. And the object-oriented metadata scheme was way too
complicated, especially if used on an SQL database.

WinFS beta 1 worked okay, it was just very slow (.NET services + SQL Server
in the background, stored in a hidden directory on NTFS). WinFS never made it
because it was way behind schedule and too slow.

NTFS and similar modern file systems are modular enough that it would be
possible to add the missing feature - a query interface - directly into the
kernel driver. The operating system would of course need to expose the API
too, so that C functions like fwrite() and WinAPI WriteFile(), etc. could be
used to access files via the directory tree as well as via a query language
(e.g. what Windows Search exposes in the Explorer address bar).

~~~
optimiz3
The problem with Database-as-a-File-System is that people are really asking
for The-One-Unified schema. The problem with The-One-Unified schema is that
it must anticipate all future requirements, which is impossible.

At best you can add a few more structured primitives, but that's not much
better than SQLite or whatever you prefer running on top of a block store,
since you don't know or really care about the domain of every application.

~~~
frik
Microsoft SharePoint does everything WinFS promised; it acts as a WinFS-like
file server for office documents. It comes with a default schema (columns)
and the administrator can add company-specific metadata fields. You can
group, filter, search, and create custom views based on metadata. It all
works great, but at the end of the day it's just a website, and managing more
than one file at a time is cumbersome (it's a website, not Explorer/shell).
Even if one can open directories in Explorer using the built-in WebDAV
protocol, the WebDAV integration is about as featureless as the zip-file
support in the Windows shell (no right-click menu entries, no new file, etc.).

With a native OS integration other applications could take advantage of the
new possible features.

------
seryoiupfurds
Something similar is done for Plan 9 development using the Fossil+Venti
filesystems.

You need access to the Fossil fileserver console to force a snapshot ``right
now'' rather than waiting for the next scheduled time, but since Fossil is a
user-space program, you could trivially run your own separate Fossil
filesystem against the Venti store without needing any sort of root access.

~~~
4ad
Not only fossil+venti: the old standalone cached-WORM file server and its
user-space port cwfs were also built around versioning.

By default, the cached-WORM systems are dumped at 5am every day, meaning you
can access your files as they were at 5am each day. You can manually dump
them whenever you please, of course. These dumps are cheap, but not as cheap
as fossil or ZFS (or git) snapshots. Unlike ZFS (not sure about git), these
dumps are immutable; you can't delete them.

With fossil+venti, you still get daily dumps, but you also get finer-grained
ephemeral snapshots - by default at 15-minute intervals; I used to set them
at 5 minutes. You can control how long you want to keep these in fossil; I
kept mine for 3 months. The dumps in venti are immutable. These are very,
very cheap, and these systems also do deduplication by default.

I don't run fossil anymore; I like the features, but reliability was less
than stellar for some people, and the performance is much lower than cwfs,
which is what I use now. Cwfs is very fast, rock solid, and very easy to
recover in case of a catastrophe. I miss ephemeral snapshots, but cwfs dumps
are cheap enough that I can run them for what you'd do "commits" for in a
normal system.

I miss all these features when I am forced to use Unix. Git does two things,
history preservation and patch management. For history preservation, nothing
beats the Plan 9 system. What git does better is patch management. I think
it's valuable not to conflate these two concepts and create tools that can
solve each one well, and work well together.

------
jimmcslim
Given the article is from August 2013, does anyone have any updated feedback
on the stability of ZEVO's ZFS-for-Mac product? I'm using ZFS quite happily
on an HP MicroServer (although I have yet to have to recover from any disk
failures, and do weekly automated scrubs) and would consider using it on my
Mac.
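(For the curious: the weekly scrub is just a cron entry; the pool name below
is a hypothetical "tank".)

```shell
# /etc/crontab - scrub the pool every Sunday at 02:00
0 2 * * 0 root /sbin/zpool scrub tank
```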

~~~
runlevel1
ZEVO doesn't currently work in OSX 10.9.

Despite its name, ZEVO Community Edition is a closed source project run by
GreenBytes (recently acquired by Oracle).

Back in Nov. 2013, GreenBytes announced that they don't have plans to continue
development of ZEVO.[^1]

So unfortunately, ZEVO is a dead project.

[^1]:
[http://zevo.getgreenbytes.com/forum/viewtopic.php?f=5&t=2244](http://zevo.getgreenbytes.com/forum/viewtopic.php?f=5&t=2244)

~~~
anders
ZFS on OS X has been picked up by a new project:
[https://openzfsonosx.org/](https://openzfsonosx.org/)

------
pan69
I was actually interested in running a ZFS setup, but after talking to some
people they advised me not to run ZFS on a system without ECC memory. The
reasoning: if corrupted memory were written into your ZFS setup, you wouldn't
be able to recover from it.

I'm curious about any further thoughts/experience on this.

~~~
thecabinet
One of you missed the point. The reason ECC is suggested with ZFS is because
all of the other features (RAIDZ and checksumming in particular) don't do any
good if your data gets corrupted in memory. This is true for all filesystems,
but most of them don't try as hard to protect your data in the first place.

~~~
pan69
I could very well have got it wrong, but the way I understood it was that in
a file system such as ext4, if a file is somehow corrupted in memory and
written to disk, you might not be able to recover and the file might become
useless. However, with ZFS you would lose "all" your files in the file
system. Again, I might have misunderstood.

~~~
makomk
Wouldn't surprise me, ZFS is pretty bad at recovering from metadata
corruption.

~~~
XorNot
Disagree: ZFS's show-stopper is unimportable pools, which are probably
_usually_ recoverable if someone could do the work to fix that use case
(still iffy).

I've had a ZFS volume get very thoroughly trashed by bad writes, though, and
still been able to recover data from it (back in the foolish days before
ZFS-on-Linux, when I had OpenSolaris in a VirtualBox VM with real disks, and
_something_ happened that randomly nuked blocks all over one of the
filesystems. Interestingly, just the one FS; I had a whole bunch and the
others were untouched).

------
jackyb
One thing that Git does not do is track files, for a good reason. Git was
designed primarily to track content, which means it can track code movement
between files and be smarter about compression. It is not limited by the
architecture of a file system.

~~~
dsturnbull2049
ZFS is not limited to your concept of a file system either. Any change is
represented by the addition of a series of one or more blocks, which can be
seen as direct equivalents to blobs in git. ZFS is copy-on-write, so you can
always reference any particular content change regardless of what file it was
called when you made it. Compression and deduplication are handled almost
exactly the way you'd expect.

------
cryptonector
git is not reliable in the face of power failures. I believe this is due to
insufficient care taken in writing to reflogs and packs when receiving.
(Reflogs are appended to, for example, while objects are first written to a
temporary file, fsync'ed, then renamed into place - so object writing is
atomic, while everything else... not so much.)
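That write/fsync/rename pattern, as a simplified shell sketch (hypothetical
object name, not git's actual code):

```shell
# Write to a temporary file, flush it to disk, then rename() it into place.
# rename() is atomic within a filesystem, so readers never see a
# half-written object.
mkdir -p objects/ab
tmp=$(mktemp obj.XXXXXX)
printf 'blob data' > "$tmp"
sync "$tmp"                      # flush just this file (coreutils >= 8.24)
mv "$tmp" objects/ab/cdef0123    # atomic rename into the object store
```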

Fossil uses SQLite3, but it doesn't do what I want -- only git gives me the
power of the index, rebase, and light-weight branches. (OK, Mercurial has
those too, but it's too late and it sucks. Sorry.)

Fossil does get several important things from using SQLite3: SQL for history
(do NOT underestimate this), extensibility (one file format, trivially
extensible schema), well-tested ACID semantics support. Oh, one more thing:
using SQLite3 minimizes fsyncs per-transaction. There's probably more
benefits, actually.

And ZFS? Well, ZFS snapshots give you ACID. If you have a persistent ZIL then
fsync() gives you barriers (nice!). But a) snapshots are slow by comparison to
SQLite3 COMMITs, and b) you don't get the benefit of SQL.

Oh: and not everyone can haz ZFS, but everyone can haz git.

Everyone also can has SQLite3.

My proposal: backend git's I/O abstraction layer (which is pretty nice) with
SQLite3. And if you have ZFS, then always take a snapshot in a post-receive
hook; destroy older snapshots later in a cronjob.
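The hook could be as simple as this (the dataset name is hypothetical):

```shell
#!/bin/sh
# .git/hooks/post-receive - snapshot the repo's dataset after every push.
zfs snapshot "tank/repos/myrepo@push-$(date +%Y%m%d-%H%M%S)"
```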

(As for zfs destroy, you really want the async zfs destroy feature, which IIRC
not every implementation has!)

~~~
cryptonector
I'd really like reflogs and branches to themselves be objects.

Only packs (maybe) and tags (also maybe) should not be objects.

If reflogs were objects then... they could be pushed and pulled, which would
be really nice indeed.

If branches were objects then they could record rebase history. I would really
like to be able to capture rebase history: what a branch's HEAD was before and
after a rebase, as well as the picks/squashes/edits/rewords/drops done in an
interactive rebase, and even the merge pre-/post-base (--onto).

------
amckinlay
We need some way for these snapshotting/logging file systems to integrate with
application-level versioning to reduce duplication.

I had this problem with git and nilfs2.

~~~
rakoo
As far as I know, ZFS does "automatic" deduplication, meaning that your
application doesn't have to care about specifying it to the filesystem; ZFS
will automatically detect duplicate regions and store them only once.

~~~
chousuke
ZFS can do dedup if you enable it, but IIRC the recommended amount of memory
to safely use it is ~5GB per terabyte of storage.
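For anyone who wants to experiment anyway, dedup is just a per-dataset
property (pool/dataset names hypothetical):

```shell
zfs set dedup=on tank/data    # enable deduplication on one dataset
zfs get dedup tank/data       # check the current setting
zpool status -D tank          # -D prints dedup table (DDT) statistics
```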

------
yamadapc
As the article seems to be down, here's the text-only (google) cached version:

[http://webcache.googleusercontent.com/search?q=cache:CTMc7ZQ...](http://webcache.googleusercontent.com/search?q=cache:CTMc7ZQuOB4J:zef.me/6023/who-
needs-git-when-you-got-zfs&strip=1)

------
songco
Where is github for zfs?

------
mantrax5
> "Using ZFS as a replacement of Git for is probably not a good idea."

Uhm.

> "I’m not sure of the stability on Mac at this time. Using ZFS as a root file
> system on Linux is still slightly problematic at this moment."

And Windows? Never mind.

Ok, so the author wrote an article called "Who Needs Git When you Got ZFS",
and then in the body of the article the conclusion is more like "Who needs ZFS
When You Got Git".

ZFS does a bunch of cool stuff in a generic way as a file system, sure. But
we already have apps doing it better in ways specific to real-world needs
(i.e. Git), on more platforms, and I don't have to fear corrupting my boot
volume if Git is buggy.

~~~
mrb
In reply to your (edited out) comment that "ZFS saw poor adoption":

You might not know it, but out of all the new filesystems designed after 2005
(a few dozen?), ZFS appears to be the one that has seen the fastest
adoption/growth [1].

Source: my anecdotal experience in the industry, the experience of many of my
colleagues, random consultants reporting "I've seen 1000's of large ZFS
deployments" [2], etc.

[1] Outside of ext4, which was introduced in 2006 and was a simple refresh of
ext3, not really a "new" filesystem.

[2]
[http://nex7.blogspot.com/2013/03/readme1st.html](http://nex7.blogspot.com/2013/03/readme1st.html)

~~~
mantrax5
There was a joke when Microsoft was pushing the Zune that it had the fastest
sales growth "in the brown MP3 players market".

The category you define may be seen as arbitrary, first because existing
filesystems have not stopped evolving. HFS, NTFS and so on have been
significantly improved over time, including after 2005.

ZFS is a categorically different concept than said examples, in that it's
distributed.

But the file system has not been shown to be the best (or let alone only) way
to have distributed data, you can easily put that at a higher level, and
retain the flexibility of defining your own protocol with the consistency,
consensus algorithm and other properties you precisely need.

Why solve this at the file system and lock yourself to a single system-wide
way of doing it, versus have the best solution for each part of the system? I
feel like it causes more troubles than it solves.

~~~
dmpk2k
> ZFS is a categorically different concept than said examples, in that it's
> distributed.

What? No, ZFS is _not_ a distributed filesystem. It never has been, it almost
certainly never will be, and it has little in common with distributed
filesystems.

What makes ZFS different is that it is a production-grade copy-on-write self-
validating merkle tree. Most of its properties fall out from that. There's
nothing distributed there.

I'm saying this in the kindest way possible: please don't write about things
that you have zero idea about. You cannot possibly be more fundamentally wrong
about ZFS, and nothing you wrote makes any sense. :(

------
jrockway
Let me know when I can host my project on zfshub.com.

~~~
croggle
"Using ZFS as a replacement of Git for is probably not a good idea"

That's from the article... You know... that thing you didn't read?

I don't know what the correct term is for the kind of title here but it's not
supposed to be literal.

EDIT: the title is tongue-in-cheek... maybe even linkbait, but in a clever
way rather than a negative way.

~~~
cryptonector
Yeah, but the title is still interesting on its own.

