Who Needs Git When You Have ZFS? (zef.me)
206 points by urza on May 25, 2016 | 146 comments



ZFS is excellent for database development! Create a snapshot before you try something that might mess up your data, and instantly restore it if it does! If your development database is a few gigabytes, this will save you a lot of time.


ZFS is also great for teams that work on ginormous code projects.

When I was a contractor at IBM in the '90s we had something similar so we could work on the OS/2 source code. The entire build was made up of hundreds of sub-projects and was way too big to fit on a single developer machine of the time. Plus, the reality was that individual developers only edited small portions of any part, even though they needed the whole in order to build and test.

In later years, I've found myself thinking about ZFS for other big-project scenarios. Like legacy enterprise websites that have gigs of static files and scripts that integrate with parts of webapps, etc. It's a pain to sync an 80 GB mirror to every dev and a horrible waste of space. Much more efficient to simply mount the production mirror once and then store small edits locally.

Note that these kinds of legacy sites are usually not checked into source control because they are too big.


Absolutely. It's also good for production. This feature saved my butt once when I upgraded a PostgreSQL 8.3 server to 8.4 on a Sun 4500, not realizing that the new server was compiled with a different timestamp format than the old one (8-byte integers vs. double-precision floats). Thanks to ZFS I was able to trivially roll back the update.


That's what docker would help you do too, right?

What are the pros/cons of ZFS compared to Docker?


Usually data / persistence layer is not stored in Docker images directly. Frequently the data directory is marked in containers as a "volume" to bypass the usual CoW filesystem and write directly to the backing filesystem. So, it's uncommon to snapshot DBs directly in Docker and likely (this is pure speculation) 'zfs snapshot' is more efficient than 'docker commit' (on disk space usage and/or speed of snapshot) for this workload due to different use cases.

I'd be really curious to see some actual numbers on this though.
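The volume pattern can be seen in a minimal Dockerfile sketch (the image tag and data path here are just illustrative; the official postgres image already declares a similar VOLUME):

```dockerfile
# Hypothetical sketch: VOLUME tells Docker to keep this path out of the
# image's copy-on-write layers and on the backing filesystem instead.
FROM postgres:9.5
VOLUME /var/lib/postgresql/data
```

Anything written under that path lives in a volume, so `docker commit` never captures it.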


Just to clarify if I'm understanding correctly here: the data itself is often outside Docker for performance and other reasons, but the DB software itself (PostgreSQL / MySQL / whatever) can still easily be within Docker.


Sounds pretty correct, only nitpick is that the data is not "outside of Docker", but rather outside of the Docker image.


You can use docker on top of ZFS. My employer has something called flocker that can use it:

https://clusterhq.com/flocker/introduction/

The ZFS version is not yet considered a production release due to delays in stabilizing the /dev/zfs API though.


ZFS is the filesystem your volume would be using. You should not store ("commit" really) the database data in a Docker image.


Not ZFS, but filesystem snapshot technology in general. Ten years ago we were forced to move from a NetApp filer (which included snapshots) to a SAN with no snapshots, and that was our biggest impact. Database changes went from taking minutes to taking hours because the DBAs now wanted to do full backups before any change.


True, but you can/could easily do that with LVM snapshots and merges too. We snapshot the databases (copy-on-write), which is instant; then if we want to roll back, we lvm merge and remount. Everything is pretty much instant and it doesn't matter what filesystem you want to use.


Do LVM snapshots still cause massive performance degradation?

https://www.nikhef.nl/~dennisvd/lvmcrap.html


I think it's wise to have a healthy housekeeping policy if you're going to use LVM snapshots. Any point-in-time copy-on-write snapshot is going to cause performance degradation. However, it really depends on the volume, and you can't speak universally for everything/everybody. For example (for simplicity), say you have a drive with 100 files but only write to one of them. The first time you rewrite that file's blocks, they endure the hit for the copy-on-write (dual writes). After that first rewrite, you can't really tell the snapshot exists. This obviously assumes you don't have a ton of additional snapshots compounding writes.
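The first-rewrite penalty described above can be sketched with a toy model (an illustration of the copy-on-write accounting only, not LVM's actual implementation):

```python
class CowVolume:
    """Toy copy-on-write volume: counts physical writes per logical write."""

    def __init__(self, nblocks):
        self.blocks = [0] * nblocks   # data "generation" per block
        self.snapshotted = set()      # blocks still shared with a snapshot

    def snapshot(self):
        # Taking a snapshot is instant: just mark every block as shared.
        self.snapshotted = set(range(len(self.blocks)))

    def write(self, block):
        """Return how many physical writes this logical write costs."""
        cost = 1                      # the write itself
        if block in self.snapshotted:
            cost += 1                 # first rewrite: copy old data aside
            self.snapshotted.discard(block)
        self.blocks[block] += 1
        return cost

vol = CowVolume(100)
vol.snapshot()
print(vol.write(0))  # first overwrite after the snapshot: 2 physical writes
print(vol.write(0))  # subsequent overwrite of the same block: 1 physical write
```

Only the first rewrite of a shared block pays the dual-write cost, which is why the degradation fades once the working set has been rewritten.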


Depends. Thick provisioned (conventional) LV snapshots take a hit. Thin provisioned LV snapshots take a much smaller hit, almost negligible.


Not for me.


I have read that COW filesystems tend to perform terribly on HDD with database workloads because overwrites mid-file are common and increase external fragmentation.

Is ZFS any better than Btrfs in this regard?


The answer to that is: use an SSD. With an SSD, random seek vs. sequential access doesn't matter. The second part of the answer is that ZFS supports cache devices: you get a fast disk as cache and a set of larger disks as the actual storage. All writes go to the cache device first, to be safe, and are then written to the actual storage; thanks to CoW this can be done in larger sequential runs (unless the disks become too full; ZFS has issues when disks are ~80% full, because finding free space then takes too long). What's left are longer "sequential" read operations. There ZFS tries to predict the read; this can succeed, but the seeks may also take too much time for the highest performance.


> ZFS is excellent for database development! Create a snapshot before you try something that might mess up your data and instantly restore it! If your development database is a few gigabytes, this will save you a lot of time.

The proper way of doing this is to clone the snapshot and then spin up a new container using the clone. That way you do not need to risk downtime on your production code. I believe that Delphix has a ZFS-based solution for doing this.


> The proper way of doing this is to clone the snapshot

Might not be really feasible. In order to create a consistent snapshot you have to stop the database server (or, if running a multi-master setup, take one master out of replication, but MM setups are rare and even more so is experience with them).

Then, you have to actually do the clone - with a couple-GB-sized database, that's easy and fast but once you hit triple-digits GB sizes, you'll end up with a massive disk performance loss during the copy.

Also, in case your DB upgrade went through, you have to (magically) apply it to the other servers because in the time between clone-begin, clone-end and db-upgrade-end data will have changed on the prod system...

It's easier to make a short maintenance window, do a snapshot, hope for the best and if it blows up, restore service with a single click (by removing the maintenance page).


I think you are confused about ZFS clones. You never stop a production application for a clone, because the clone is made from a snapshot. It is also copy-on-write, so there is no massive data copy. If your application is not crash-consistent, you do need to get it to pause, flush a consistent state and do the snapshot, but the snapshot takes less than a second, so such a thing is probably fine.

Also, if you need to stop the database for a consistent snapshot, then your database is not ACID compliant.


Traditionally, DB engines avoid FS specific features like the plague.


That is why this works.


Sounds great. Can you also duplicate versions of a database to coworkers on the same machine without much cost?


Yes, if they have ZFS too, it is basically snapshot, send & receive. Or you can clone it and rsync it to them.

Do not try to snapshot a running database though. Before you execute the snapshot you should make sure everything is flushed to disk. For example, with PostgreSQL you need to create a checkpoint before the snapshot, so you write:

    SELECT pg_start_backup('prepare_my_snapshot');

And when the snapshot is done:

    SELECT pg_stop_backup();


It's not so critical as that. Any transactions in flight when you take the snapshot won't be finalized and will be discarded when you start it back up from the snapshot, but that won't affect the consistency of the database.


If it is the same pool, you can just do a clone. Send/recv is completely unnecessary there.

As for flushing the database first, it is unnecessary as long as the database is ACID compliant. If you use MySQL, you will probably want to freeze and flush the database first to be absolutely certain of safety. DDL statements and a few minor other things in MySQL are known not to be ACID compliant.


> Do not try to snapshot a running database though. Before you execute the snapshot you should make sure everything is flushed to disk.

Is a database (or anything else) really supposed to behave that way? Shouldn't every state along the way be possible to resume from? In particular, shouldn't database transactions take care of this?


Yes, it is. One of the fundamental principles of practical RDBMS is durability (the D in https://en.wikipedia.org/wiki/ACID), meaning that when the DB acknowledges a commit to the client, it must be guaranteed that the transaction has been reflected in permanent storage (i.e., on disk).

The fine print: Due to the architecture of file system access in contemporary kernels (esp. file system caches), it's next to impossible for a user-space application to actually guarantee that writes are durable (i.e. synced to disk at a defined point in time), but most RDBMS manage well enough in practice.


> The fine print: Due to the architecture of file system access in contemporary kernels (esp. file system caches), it's next to impossible for a user-space application to actually guarantee that writes are durable (i.e. synced to disk at a defined point in time), but most RDBMS manage well enough in practice.

While this doesn't relate to a specific file, you can always look at the size of Dirty and Writeback in /proc/meminfo. They correspond to the amount of dirty file page caches that have yet to be synced.
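For instance, a small Python snippet (Linux-only; it just parses /proc/meminfo) to pull out those two values:

```python
def dirty_writeback_kib():
    """Return (Dirty, Writeback) from /proc/meminfo, in KiB (Linux only)."""
    values = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, rest = line.split(":", 1)
            if key in ("Dirty", "Writeback"):
                values[key] = int(rest.strip().split()[0])  # "1234 kB" -> 1234
    return values["Dirty"], values["Writeback"]

dirty, writeback = dirty_writeback_kib()
print(f"Dirty: {dirty} KiB, Writeback: {writeback} KiB")
```

Watching Dirty drain to near zero after a `sync` gives a rough system-wide picture, but as noted it can't tell you whether a specific file's pages are on disk.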


If the snapshot is atomic as it should be, then yes.


A database should handle a snapshot gracefully. I think you should only expect corruption if you are taking a non-atomic backup with something like tar.


Maybe. RDBMS should be able to recover from any crash consistent copy of storage, where crash consistency means the storage represents a state of the system at run time. FS snapshots provide this property for single volumes.

The problem is that production RDBMS tend to span multiple volumes, e.g., for logs vs. data to pick a simple case. FS snapshots in this case may not be consistent which would cause recovery to fail.

AFAIK this is why even robust RDBMS like MS SQL server use mechanisms like VSS to ensure data are flushed, which guarantees not just crash consistency but application consistency.


It's too bad Unix/Linux doesn't offer something like Windows' VSS API, where the operating system can inform registered applications or services of an impending backup operation.


It's too bad that VSS is unreliable. If I had a nickel for every time one of my backups failed due to a VSS writer issue, I'd own Symantec AND EMC. This has been across Win2k/2003/2008. We've been using 2012 for the last year (yeah, old stodgy corporations move slowly), and I haven't seen the same number of errors, but we're also not doing as many client-based backups either.


Have you guys thought about giving Windows ZFS volumes over iSCSI and then taking ZFS snapshots? For example with SQL Server, it should be possible to talk to the SQL Writer Service and create a consistent snapshot of a live SQL Server database with ZFS.

When this snapshot is taken you can send & recv the diff to a remote backup server.


Our experience with iSCSI hasn't been very good; its use is limited to some dev ESX systems. Our DBAs prefer to perform cold backups that we backup with traditional tools (Avamar).


I believe that SoftNAS is able to do that when Windows is using Samba shares hosted on ZFS.


You're right. Here's more info about it: https://www.softnas.com/docs/softnas/v3/html-reference-guide...


Do a snapshot and clone. Then you will have an exact copy of the data mounted elsewhere that you can modify independently.


It is better to do a clone and do experiments on that before doing anything to the production copy.


Except you'll lose all the data since you created the snapshot... You've been able to do this for years on Linux with LVM anyway.


I don't think you read the comment thoroughly. This is the point.

Also, the discussion is centered around ZFS; there's no reason to bring up LVM. Everyone knows there are many ways to skin this cat, and knowing one of them doesn't make anyone appear any smarter.


My point is that it's not really a great way of 'backing' up a database because typically you want the transactions after you've made the change as well.


Bringing up alternatives is the norm around here. Some people don't know LVM.


That's the point. If you mess up your DB doing something, you want to lose all the data since you created the snapshot.

>You've been able to do this for years on Linux with LVM anyway.

What's the point of saying this in that tone? I could achieve the same goal in a number of different ways - that doesn't make any of those methods invalid because another one exists.


> Except you'll lose all the data since you created the snapshot...

Yes, but that's the point

You're right about LVM though


The original poster should have advised people to clone the snapshot. Then you instantly have a copy elsewhere that you can use for experiments without disturbing the original.


> Who needs Git?

> Notably missing is support for merging

So, it's almost like a car except it's missing its wheels

No, it's not like Git


> Of course, I'm not seriously suggesting you'd ditch a "proper" version control system, but it gives a good sense of what's possible at the file system level.

> Using ZFS as a replacement of Git for is probably not a good idea, but just to give you a sense of what ZFS supports at the file system level, let me go through a few typical git-like operations:


I really don't think the article was seriously suggesting what you think it was; showing similarities between different types of tools (usually a more familiar one and a less familiar one) is meant to be illustrative


"I really don't think the article was seriously suggesting what you think it was"

Then maybe it shouldn't have a purposely clickbaity title suggesting it does...


>> I really don't think the article was seriously suggesting what you think it was

> Then maybe it shouldn't have a purposely clickbaity title suggesting it does

I normally hate clickbait, and yet I liked this article and its headline. Maybe because the tone was self-deprecating, like me getting on a bicycle and saying, "Look out, Tour de France!"

Actually, it's like a fish getting on a bicycle and jokingly saying, "Look out, Tour de France." Maybe the fish isn't anywhere near good enough on a bike to compete in the Tour de France, but still, it's a fish on a bicycle.

In the same way, Git will run circles around ZFS as a source-code version-control system. On the other hand, it's way cool that ZFS can even do some of these things, because after all it's just a filesystem.


Considering the audience I think it's fine. I don't think anyone on HN is going to see this title and think "oh the author is serious; this whatever thing that appears to be a file system by name can certainly replace git in its entirety!"


That and the rather obvious 'sudo' in front of all the commands.

I mean, it's a cool tool; I use it on my NAS (particularly like the ECC mode instead of dumb RAID), but the git-equivalence aspect looks a little bit like a flamebait ;-)


    zfs allow zef mount,create,destroy,snapshot,rollback,diff,clone,send,receive mypool/projects
Now he doesn't need sudo in front of most of them.


Unfortunately, it remains a feature only on Solaris/illumos and FreeBSD. The Linux and Mac ports still lack support for delegation.


Maybe read the article before adding a snarky comment?


Yeah, I think a better title would have been "ZFS for Git users".


I've been working with virtual machines in production and development for well over a decade now, and a lot of what works via ZFS in this article works very similarly to how it would if we used AWS- or VMware-style snapshots. In fact, ZFS snapshots are only so helpful if you make complicated changes across multiple ZFS volumes, for example, that require more transactional-style rollbacks. In such scenarios, it may be easier to just perform an instance-level rollback.

One problem that didn't really occur to me before I became more ops-side was that administrators would want to heavily restrict or remove users' (read: developers and even other sysadmins) ability to create snapshots in the first place. Why would you remove self-service temporary backups, which avoid a lot of backup-restoration requests? I didn't realize that other engineers could be careless and keep dozens or even hundreds of snapshots over time, which gobble up expensive storage resources (SANs are not cheap regardless of manufacturer) and slow down I/O over time. That was why so many of my customers deploying stuff like VMware Lab Manager and vCloud Director demanded the ability to remove access to snapshot features.

As a result of this typical usage, where user abuse of a very powerful feature threw things for a loop, and given the typical organizational structure and siloization of these environments, SAN-side LUN snapshots are nowadays used more often than snapshots at the VM layer (the same administrators that manage VMware environments typically have rights to the SANs). Using ZFS like this is a developer-side reaction to me, but duplicating effort to solve a problem for which viable technical solutions have long existed is exasperating.


VMware snapshots are NOT like ZFS snapshots. VMware snapshots are extremely heavy and will have a massive performance impact if left around for an extended period on any I/O-sensitive VM. ZFS snapshots are essentially zero-overhead from a performance perspective, and function in a very different manner.


I'm aware of how both work (I had a VCP410 cert and happily run ZFS with many snapshots on rotation at home), but IMO the uses are similar enough for the purposes explored in the article, despite the significant differences in practice, and I felt a traditional alternative should be mentioned. Additionally, VMware-based snapshots can be performed SAN-side and integrated with VAAI, with performance overhead similar to a ZFS snapshot, while still supporting a lot of options with trade-offs and features. The compulsory indirections in the traditional VMware snapshot implementation, with its on-disk block map, appear designed primarily for storage compatibility (much like the design of VMFS itself, including that whole file-based mutex system) rather than performance, not to mention the support for memory persistence / quiescing with the guest OS, which ZFS snapshots do not have. In contrast, ZFS has all the metadata to quickly examine block age and other rich metadata to accelerate read and write indirection of blocks optimally. It's not clear whether a SAN can accelerate ZFS-using OSes the way it can accelerate VMware storage, but I'm curious about the potential.

On the other hand, recursive snapshots in ZFS are really slick and can achieve 90%+ of what people typically want from VMware-based snapshots when it comes to any non-Windows OS if there's a little room for not needing to care about indirect changes required in rollbacks.


...just no. VAAI snapshot integration with VMWare is only for NAS platforms, not SAN - VMFS doesn't even come into the picture. Not to mention it's intended for View for rapid cloning, and not as a backup feature.


Actually, perhaps this doesn't address all your points, but I share the sentiment about VM snapshots "feeling" quite similar; in fact I experience it every day, as I run ZFS as a root filesystem within VMware. The real gain for me is that all of these tools are in-band with my working OS. If I were provided VMs on AWS or some SAN by my employer, that could in theory give us a mutual benefit: ops could avoid giving me access via IAM (or whatever) and expanding beyond some set limit, but still allow me to perform almost identical "ops-like" tasks using ZFS (which I also find to be fairly user-friendly).

Incidentally, I'm also a fan of nested virtualization and using git on top of ZFS... I believe all of these tools complement each other nicely as opposed to being mere repetitions.


> Notably missing is support for merging, which ZFS does not have direct support for as far as I'm aware.

So, it's no replacement. Merging and branching are the major features of Git; the bigger the project, the bigger the need for them. I know some projects where people work full time as merge-conflict resolvers. Without Git that would require a whole team instead of one person.


"Of course, I'm not seriously suggesting you'd ditch a 'proper' version control system, but it gives a good sense of what's possible at the file system level."


That's what the title says in big letters at the top though.


ZFS could not be a replacement for all of Git, but (if you wanted to) you could use it to replace Git's storage layer, which is basically independent of operations like merge. I.e., you could merge ZFS datasets (without any actual Git repos) by applying git-merge-file recursively.
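A sketch of that idea (assuming git is installed; git merge-file works on plain files, no repository required, so it could in principle walk two clones plus their common snapshot):

```python
import os
import subprocess
import tempfile

def three_way_merge(mine, base, theirs):
    """Three-way merge of file contents via `git merge-file -p` (no repo needed)."""
    with tempfile.TemporaryDirectory() as d:
        paths = []
        for name, text in [("mine", mine), ("base", base), ("theirs", theirs)]:
            p = os.path.join(d, name)
            with open(p, "w") as f:
                f.write(text)
            paths.append(p)
        # -p prints the merged result to stdout instead of rewriting "mine";
        # the exit status is the number of conflicts, so we don't use check=True.
        out = subprocess.run(["git", "merge-file", "-p", *paths],
                             capture_output=True, text=True)
        return out.stdout

# mine and theirs each changed a different line of the same base file
merged = three_way_merge("A\nb\nc\n", "a\nb\nc\n", "a\nb\nC\n")
print(merged)
```

Here the "base" file would come from the common snapshot and "mine"/"theirs" from two diverged clones; conflicting hunks come back with the usual conflict markers.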


That is a very interesting idea, IMO. I don't really like the object storage backend. E.g., I'm not sure it's really the smartest to split the hash into directories and filenames. And the whole idea of tree files seems flawed (you don't need them if you consider the checked-out file path as part of the file name and keep that entirely in commit files, as if there were a single global tree file).


The object store is done in the way it's done for two reasons. The first being that using individual files doesn't require opening some database file to parse the state of the repo -- and pushes and pulls are very simple to implement (you just download the objects to .git/objects and then update any refs in .git/refs). Secondly, the splitting of the filename into a directory is because filesystems have very bad performance when you put >500000 files in one directory. Remember that git was created to deal with the Linux kernel source, so they have very large numbers of git objects and operations need to be fast.
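Concretely, the layout can be computed by hand (a sketch of git's documented blob hashing, not using git itself):

```python
import hashlib

def blob_object_path(content: bytes) -> str:
    """Where git stores a blob: SHA-1 over a "blob <size>" header, a NUL byte,
    then the content; the hex digest splits into a 2-char directory
    and a 38-char filename."""
    header = b"blob %d\x00" % len(content)
    digest = hashlib.sha1(header + content).hexdigest()
    return ".git/objects/%s/%s" % (digest[:2], digest[2:])

# Matches `echo hello | git hash-object --stdin`
print(blob_object_path(b"hello\n"))
# -> .git/objects/ce/013625030ba8dba906f756967f9e9ca394464a
```

The 2-character fan-out gives 256 subdirectories, keeping any single directory's entry count manageable even for kernel-sized object counts.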

I actually think the git object store is very clever. ;)


full time?? wow


One unknown person called Linus Torvalds is one of them ;).


And there was me thinking Mr Torvalds mostly provoked conflicts.. :p


Yes, and these can't be merged with git yet.


> Notably missing is support for merging, which ZFS does not have direct support for as far as I'm aware.

That insight is critical, especially now that we are heading towards Dropbox Infinite, IPFS, Ceph and so many other non-centralized file systems.

Syncing is not a solved problem at all yet. Advances can bring the same revolution as 3-way merging did to SCMs, previously dominated by file-locking mechanisms.


What was that replicated filesystem back in the 90s that promised to be the next NFS, but with disconnected operation and merging when you came back online? Coda, that's right. I had such high hopes for it in my move to using a laptop as my primary machine, but never got it working despite several attempts at it. Admittedly they weren't more than a couple hours at a given time...


ZFS doesn't have history in that way; snapshots are far too coarse. HAMMER, for example, is closer to what you want if you're going to replace git with a FS.

https://www.dragonflybsd.org/hammer/


If you look past the attention-grabbing headline, this shows some pretty cool stuff with ZFS, using source versioning as an analogy to explain what would otherwise be quite abstract when it comes to filesystems. The only thing I knew about ZFS was the name, this helped me see some of its impressive features.

Of course, I'll take reliability & stability over fancy features any day, but I'm glad there's active development in this area, and look forward to seeing these features mature. They could contribute to some interesting simplifications.


> I'll take reliability & stability over fancy features any day

That is why people use ZFS and OpenZFS: they are battle-tested and very mature.


>Of course, I'll take reliability & stability over fancy features any day, but I'm glad there's active development in this area, and look forward to seeing these features mature. They could contribute to some interesting simplifications.

ZFS is in use in production systems all over the world, with a very good reputation for reliability and stability. Nexenta and others have made successful businesses working with large enterprise customers selling ZFS based storage solutions.


"Oracle's (previously Sun's) next-generation file system"

That's why you need to stick with Git.


What does Oracle have to do with anything? This is FUD.


It is certainly not FUD, see https://www.eff.org/cases/oracle-v-google

The point is, Oracle have decided to sue once over usage of something they bought and may do so again in the future.


Exactly.

The legal situation of Google's use of Java APIs was always a tiny bit murky, but it was super-clear-cut compared to the use of ZFS-on-Linux.


Oracle sued over Google reimplementing code that was under the GPL (OpenJDK) under the Apache 2.0 license (Apache Harmony). Google recently switched to the GPL code because of it. In the case of ZoL, the code is derived from the original code and is under the original license. The idea that ZoL is somehow more at risk is pure FUD.


My understanding is that the risk is mostly the other way around: it infringes Linux's GPL license (see https://sfconservancy.org/blog/2016/feb/25/zfs-and-linux/). Would Oracle have grounds to sue as a Linux copyright holder? They have successfully made APIs copyrightable, so they clearly have competent lawyers.


There are a few people who think that. There are plenty of others who do not. Those in charge of reviewing licensing in Linux distributions that have adopted it are generally of the opinion that it is okay, especially those in charge of Sabayon and Ubuntu. Gentoo came to similar conclusions long before them (>4 years before Ubuntu) and ships it in binary form on the LiveDVD. At this point, there are at least a few dozen distributions that have adopted ZFS in some form or another.

Here is a legal opinion on the matter:

https://www.softwarefreedom.org/resources/2016/linux-kernel-...


I don't see how anyone could read that and come away thinking that ZFS-on-Linux isn't kosher. Given as simple as it is to compile zfs.ko in the Ubuntu live environment, I'd bet money that's Canonical's backup plan should a literal interpretation of the kernel's license become a legal reality.

And here's to hoping that ZFS becomes so widespread that Oracle themselves end up redistributing it in their version of Linux, like they did with DTrace.


For which open-source projects owned by Oracle has Oracle's stewardship been a benefit to users?


"Using ZFS as a replacement of Git for is probably not a good idea..."

Of course not.


To play devil's advocate, sometimes Git is abused to store things that aren't code, and ZFS (or similar) might actually be a better fit in those situations.

Somebody further down mentions art assets. Also potentially relevant are document repositories.

ZFS isn't very easy to use for a lay person, but that's really only a "Time Machine"-esque front-end away.


Ubuntu 16.04 comes with ZFS preinstalled and ready to use.

Well, you need to install the user package 'zfsutils-linux' and you are ready to go.

The kernel modules for ZFS are already in the default kernel and are loaded automatically when you create a volume.


I have a FreeBSD server in my home office that has a big ZFS raidz volume that I use for shared file storage, and it's really, really awesome. The snapshots are especially great because they greatly reduce the fear of screwing something up. I once wanted to run a de-dupe script on several hundred gigabytes of photos, but I was afraid it would go awry, so I snapshotted first, knowing I could roll back in a few keystrokes. (Granted, this wouldn't help much if the script had caused some problem that wasn't immediately apparent. And yes, I have backups of all of that stuff elsewhere, but doing a restore from backup is a lot more work than rolling back a snapshot.)


Strange there's no mention of FreeBSD's excellent ZFS implementation


Yes, and the article casually implies that ZFS totally works not only on Linux but also on OS X[1], which is... a stretch.

I check on it every year or so, but my impression is that ZFS on Mac is still so far behind where ZFS is on FreeBSD and (more recently) Linux, that it isn't really clear whether ZFS will ever work reasonably on the Mac.

(I would love to hear experiences of people actually using it on OS X, though! I may be out of date.)

[1]: https://openzfsonosx.org


How production-ready is ZFS on Ubuntu (or Debian)? Is there any risk of Oracle fighting back against open-source ZFS, as with Java in the Google/Android case?


ZFS is actually one of those features that are officially supported on Ubuntu but not on Debian. As far as I know, Ubuntu 16.04 is the first popular Linux OS to support ZFS out of the box.[1]

As the article notes, ZFS on Linux has been production-ready and stable for over three years. For in-depth info see [2].

[1] https://wiki.ubuntu.com/ZFS

[2] https://clusterhq.com/2014/09/11/state-zfs-on-linux/


If you have been following the case, Google switched from Apache Harmony to OpenJDK saying that Oracle had no basis for claims against future versions because the code is now under license from Oracle. The original ZFS code has always been under the CDDL from Sun and later Oracle. The ZoL code is derived from that, so it should be in the same situation where Oracle cannot make claims against their own licensee.


ZFS is under the CDDL while Java is not under the CDDL.

The CDDL is an open-source licence, just like the GPL.


> CDDL is an open-source licence just as the GPL

But incompatible with the GPL, which mostly explains the current situation of ZFS on Linux.


It's the GPL that's incompatible with the CDDL, not the other way around, which is an important distinction. Apart from other GPL variants and BSD- and MIT-style licenses, the GPL has restrictions that make it pretty much incompatible with everything.


I don't get how the word "incompatible" can be understood as anything other than: both licenses have aspects that make combining them impossible. But oh well, let's see what the license text actually says:

> Any Covered Software that You distribute or otherwise make available in Executable form must also be made available in Source Code form and that Source Code form must be distributed only under the terms of this License. The Modifications that You create or to which You contribute are governed by the terms of this License.

If you distribute the source code, you must distribute it under this license. You can't use some other license; the CDDL is incompatible with the distributor deciding what license to use. The CDDL requires that any source-code distribution use the CDDL as its license, so it is incompatible with any other license that has a similar condition. The CDDL is incompatible with the GPL because the GPL has similar conditions to the CDDL.

If we were to make an identical copy of the CDDL license and call it CDDLv2, those two identical twins would be incompatible with each other. Software under CDDLv1 would not be permitted to be combined with software under CDDLv2 and distributed as source code.


Actually, the CDDL allows the person who wrote the License (Sun, now Oracle) to release a new version of the license that implicitly updates the license for all projects using the old license (It's like the "or any later version" thing with the GPL, but I'm fairly sure it's not optional with CDDL). Which is why people are asking why Oracle doesn't just release CDDL 2.0 that is GPL compatible.


New CDDL code outside of Oracle is under CDDLv1 only to prevent them from doing as they please with the license terms.


But ZFS is not under that other license (which might be called CDDLv1, but that's confusing). I don't get why new software uses the CDDL over the MPL if you want "file based copyleft". If you want copyleft, just use the GPL.


Not much new software uses the CDDL. Fans of CDDL software typically opt for BSD or Apache licenses.


> If we were to make an identical copy of the CDDL license and call it CDDLv2, those two identical twins would be incompatible with each other. Software under CDDLv1 would not be permitted to be combined with software under CDDLv2 and distributed as source code.

There is a CDDL v1.1 that is effectively 's/Sun/Oracle/'. The CDDL has an optional clause saying any later version is allowed and the CDDL only applies at the file level. I have yet to see a problem in mixing CDDL code under v1 and v1.1 with each other.


Okay, correction then: if you are not Sun/Oracle (and most people aren't) and create an identical copy of the CDDL called "Bob's License", then BL and the CDDL are incompatible because of the conditions in the otherwise identical licenses. The point being, the condition that creates the incompatibility between the CDDL and the GPL is that both require their own license to be used when distributing. Remove that condition from either license, the CDDL or the GPL, and the incompatibility goes away. It would also turn that license into a permissive license.


The CDDL's restriction is only in the files containing the code. The GPL's restriction is on all files within a derived work, whether they contain GPL code or not.


Is your argument that the CDDL's condition is somehow narrower in scope than the GPL's? GPLv3 also covers anti-DRM clauses, but that doesn't make it any more or less compatible with the CDDL's requirement that source code be made available only under the CDDL.

If the CDDL did not have its condition that source code be made available only under the CDDL, the incompatibility would not exist. If the GPL had a loophole similar to the CDDL's and permitted modifications to be under a different license as long as they live in separate files, then the two permissions together would let a single project include software under both licenses. I doubt the CDDL will receive such an update, and on the GPL side there is already the LGPL.

The loophole is quite interesting from a legal point of view. Would a filesystem based on a database be fine? The computer would be storing all the code in a single file, but the presentation would look like that file is actually two different ones. Same goes with a gz file and "virtual" file systems, that present data inside as if it were separated files on the disk. What about a container image where a single work has source code under CDDL and source code under some other license?


The CDDL is designed to be a per-file license. All other files in a project can be proprietary as far as it is concerned as long as the files containing CDDL code are under the CDDL. Sun wanted it that way. The GPL's insistence that CDDL covered files be under it when they are used as part of a derived work (in the legal sense) is why incompatibility can occur.


Three-year-old article teaches slightly outdated techniques and basic ZFS usage on Linux by way of specious comparisons to Git.

I guess it wasn’t a half bad resource when it came out, but these days there’s got to be better blog posts about this, right?

EDIT: Not that I’m bitter; I love zfs. I mainly wonder how something so old ended up here.


I very much appreciated this article and shared it with coworkers who also found it helpful. I can understand the desire to only read the latest news, but I've learned a lot of stuff in my life that was useful but only new to me. The key benefit to this article was the comparison of ZFS to something I am familiar with: Git. My wild guess is that other non-filesystem geeks like me liked this article for similar reasons and upvoted despite its age.

Edit: Perhaps the ZFS to Git comparison is itself novel? I would like to read about other filesystem features (any file system) from this perspective because I think I would learn a lot.


I'm tempted to write an article: Who Needs ZFS When You Got Git, but the title IS the article.

Don't get me wrong, I looove ZFS, and from a quick scan this article looks like a good intro to ZFS, particularly for someone familiar with git.


Author is not actually suggesting replacing your version control with ZFS. From the first paragraph:

> Of course, I'm not seriously suggesting you'd ditch a "proper" version control system, but it gives a good sense of what's possible at the file system level.


So the title is click-bait, then? I almost read the article, but in the 342 milliseconds between thinking that and reaching for the mouse, I came up with three reasons the topic alluded to in the title would be a dumb idea. And it turns out that's not what the article is about anyway.


Good work, you avoided learning a thing! :hooray emoji:


I like how he uses "of course". "Of course," the title of this article is misleading. The article is about something different, but this was the most clickbaity combination of words including "git" and "zfs."


Clickbait? Yes. Interesting article? also yes. The problem with clickbait is it's often used to bait you into reading BAD content or content irrelevant to your interests. Clickbait + good/related content is just clever marketing.


When KDE's git repositories had been corrupted, one server being out of sync was the only thing that saved it. ZFS could have saved it and as far as I know, one mirror administrator is using ZFS to protect KDE's git repositories against recurrences.

ZFS and git are complementary technologies. One does not replace the other.
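As a concrete sketch of that complementarity, a git server's dataset can be snapshotted nightly and replicated to a second machine with `zfs send`/`zfs recv`. The pool/dataset names (`tank/git`, `backup/git`) and the host `backuphost` are hypothetical:

```shell
# Hypothetical layout: git repositories live on tank/git; a second
# host holds the pool "backup". Run as root or with zfs delegation.
TODAY=$(date +%F)
YESTERDAY=$(date -d yesterday +%F)   # GNU date syntax

# Take an atomic, read-only snapshot of the repositories.
zfs snapshot tank/git@"$TODAY"

# First replication: send the full snapshot to the backup pool.
zfs send tank/git@"$TODAY" | ssh backuphost zfs recv backup/git

# Subsequent nights: send only the blocks changed since yesterday.
zfs send -i tank/git@"$YESTERDAY" tank/git@"$TODAY" \
    | ssh backuphost zfs recv backup/git
```

If a repository is later corrupted, `zfs rollback tank/git@<date>` restores the whole dataset to that snapshot, independently of git's own integrity checks.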


This blog post was one of the most inspirational pieces that led us to founding Pachyderm (pachyderm.io). We offer git-like semantics for data, including branches, and in a distributed system so it scales to petabytes.


ZFS and other advanced versioned/copy-on-write filesystems are really cool and useful, but they are not source code management tools.

E.g. the examples show branching but not merging.

On the other hand, this might be useful for storing binary large files that are not really diffable and mergeable, and thus a bad fit for Git. However, the typical use case for this is art assets in games, which means that it's the artists and designers who are the target audience. Git is often said to be too difficult for non-programmers, and ZFS or BTRFS is definitely not easier.


Weird title as Git and ZFS are used for completely different reasons but I love ZFS as well and the article highlighted some of the finer details which was great.


An important difference: while both rely on Merkle DAGs, git uses content addressing while ZFS doesn't (physical addresses affect the hash too).

Relatedly, ZFS's memory management relies on the assumption that a forked filesystem and its sibling will monotonically alias less data as either is modified. This allows it to avoid tracing and ref-counting alike, but makes implementing merging or `cp --reflink` difficult.
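A quick way to see git's content addressing in action (assuming only a POSIX shell and coreutils `sha1sum`): git names a blob by hashing a `blob <size>\0` header plus the file's bytes, so the ID depends on content alone, never on where the bytes sit on disk:

```shell
# git's object ID for the 6-byte file "hello\n" is
# sha1("blob 6" + NUL + "hello\n") -- content only, no physical location.
printf 'blob 6\0hello\n' | sha1sum
# -> ce013625030ba8dba906f756967f9e9ca394464a
# This matches what git itself computes:
#   echo 'hello' | git hash-object --stdin
```

ZFS's block checksums, by contrast, hash block pointers that include on-disk location, so two pools storing identical data do not share hashes.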


Are there any beginner resources for learning about file systems? Would it be smart or harmful to reformat my whole MacBook SSD to ZFS?


I don't think you can use ZFS as a boot volume on Macs (and only on Linux with caveats).


The limiting factor is the bootloader. Mac firmware can read HFS+ natively, find the bootloader, which in turn recognizes HFSX, Apple RAID, and Apple Core Storage. I doubt any of the previous ZFS support exists in the current bootloader.

Unlike linux where there's a separate /boot, OS X doesn't really separate boot from root file systems. So how you'd boot XNU from HFS+ and transition to a separate ZFS root fs isn't obvious to me from the existing documentation.


XNU hard codes HFS+ as the rootfs, but people have managed to make Mac OS X boot using a ZFS zvol to store an HFS+ rootfs. Unfortunately, there are no guides for that at this time.


This article gave me some confidence in finally trying to boot Arch Linux on ZFS. This will either make my life very easy or very hard.


Well... I get the funny side of the comparison; they're two different things.


I've been reading that ZFS was "almost ready for use" on Linux for years. How stable is it in production now? Is anybody here using it? Any good/bad comments or tips? (Using CentOS 7 at the moment.)


To give you an idea, it just shipped baked into Ubuntu 16.04, the latest Long Term Support (LTS) version.

https://insights.ubuntu.com/2016/02/16/zfs-is-the-fs-for-con...


Anyone here think that a "ZFS in the cloud" that anyone could inexpensively export a ZFS snapshot (or volume) to would potentially be a business model (offsite backups, etc.)?



Ah, I think I've even come across this in the past and forgot about it!

Is $60/TB/month competitive?


More like git-annex and git LFS.

ZFS can be a good content-management choice for large binary files.


ZFS is very powerful indeed; I've been using ZFS to share a data partition between OS X and Ubuntu. The versioning capabilities the OP mentions have a lot of potential.


Branches and tags are just snapshots and clones? This actually looks more like Subversion than git, along with the merge-management challenges that model brings.
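For reference, the mapping the article draws works out roughly as follows (the dataset name `tank/proj` is hypothetical); note that nothing in it provides a merge operation, which is the Subversion-like limitation:

```shell
# "Tag": an immutable, named snapshot of the dataset.
zfs snapshot tank/proj@v1.0

# "Branch": a writable clone descended from that snapshot.
zfs clone tank/proj@v1.0 tank/proj-experiment

# "Checkout" of an old tag: roll the dataset back.
# (-r destroys any snapshots newer than the target!)
zfs rollback -r tank/proj@v1.0

# Make the branch the new mainline (swaps the clone/origin relationship).
zfs promote tank/proj-experiment

# There is no "zfs merge" -- reconciling two diverged clones is manual.
```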


I like ZFS a lot, but it's no replacement for Git. Snapshots are really handy for a lot of things, but not for a group of developers collaborating over some source code.


Did you read the article?

    "Of course, I'm not seriously suggesting you'd ditch a "proper" version control system


Almost all of the benefits they're describing aren't ZFS-specific and could be achieved (for example) with Volume Shadow Copies on Windows servers as well.


Strange he doesn't mention FreeBSD's ZFS implementation.


It looks like ZFS which came from BSD systems will be/is the next big thing in Linux, not Ubuntu on Windows which gets more hype...


ZFS (not ZSF) came from Solaris, not BSD.


ZFS has been around and stable on Linux for years; I doubt it's going to be the next big thing. Its usefulness is pretty limited for VMs. Btrfs on VMs is at least useful for Docker.


I personally feel that we should all hop off the ZFS (on Linux) hype train and consider a few points. Currently ZoL is being developed by two people, and it works by creating a translation layer into the Linux kernel API. The bug list is enormous (much bigger than btrfs), and there isn't enough experience with the codebase by Linux kernel developers. That's ignoring the potential legal issues. No other ZFS port has these problems.

On the other hand, if you want a supported filesystem with many of the same features as ZFS, there's btrfs. It alleviates all of the problems with the ZoL port. And there's no fear of Oracle lawsuits.


There are far more than 2 people here:

https://github.com/zfsonlinux/zfs/commits/master

I will admit that my contribution activity has been low as of late, but it will pick up again soon. As for the bug list, that is because basically all distribution problems get sent there and duplicates are not closed until it is proven that they are duplicates. There is no such transparency with other filesystems' bug tracking. Redhat receives plenty of bug reports through Fedora that they happily close if not solved by EOL.

As for btrfs, one of the btrfs developers claimed that ZFS has 5 times the development resources of btrfs:

https://news.ycombinator.com/item?id=11749477

As for lawsuit fears, the fact is that using any software at all puts you at risk of a lawsuit. Whether or not the plaintiff has a legitimate case is a separate matter. However, if we consider the prospect of Oracle having a case against those using btrfs or ZFS, btrfs would be at higher risk. The CDDL provides an implicit patent grant while the GPLv2 does not. Any ZFS patents applicable to btrfs that Oracle acquired from Sun could be used against those using btrfs. The repercussions for Oracle would be huge, but if we are discussing the potential for legal issues with Oracle, then btrfs is at the greatest risk.

From what I know of the internals of the two, btrfs has far more problems. This shows some of the problems on an enterprise distribution:

https://news.ycombinator.com/item?id=11749010

The idea that performance can be so horrible that the system might as well have deadlocked is a severe problem. The ENOSPC issues from internal fragmentation is another severe problem. Then there is the lack of backports. I could continue, but I see no need.



