$ dd if=/dev/zero of=/tmp/disk1.img bs=1024 count=10485760
$ dd if=/dev/zero of=/tmp/disk1.img bs=1024 seek=10485760 count=0
Otherwise, this is a nice succinct article showcasing the snapshot/clone feature and send/receive snapshot deltas between 2 systems.
Edit: on Mac OS X, UFS supports sparse files, but HFS+ does not. On Linux, all major filesystems support sparse files.
$ truncate -s 10G /tmp/disk1.img
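You can confirm a file is actually sparse by comparing its apparent size with the blocks really allocated (a quick sketch using GNU stat on Linux; a 10 MiB demo file stands in for the big image):

```shell
# Create a sparse file: full apparent size, (almost) no blocks written.
truncate -s 10485760 /tmp/sparse-demo.img

# Apparent size in bytes (what ls -l reports):
stat -c '%s' /tmp/sparse-demo.img   # prints 10485760

# Blocks actually allocated on disk (512-byte units; near zero here,
# versus 20480 for a fully written 10 MiB file):
stat -c '%b' /tmp/sparse-demo.img
```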
Often, a sparse file doesn't really emulate a real file well, and you do want a big file full of zeroes...
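If what you want is the blocks really reserved (no holes) without waiting for dd to write gigabytes of zeroes, fallocate does that on Linux, assuming the filesystem supports it (ext4, XFS and btrfs do):

```shell
# Reserve the full 10 GiB of extents up front, without writing data:
# unlike truncate, the blocks are genuinely allocated (no holes, no
# later ENOSPC surprises); unlike dd, this returns almost instantly.
fallocate -l 10G /tmp/disk1.img
```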
At which point you have filesystem soup.
Comments like this which are blissfully ignorant of VMS only make their statement ring more true.
A modern filesystem that is available for common operating systems would be a good starting point.
I am not sure if ZFS is the answer; there might be better filesystems around. But we definitely need rock-solid filesystem drivers that we can rely on for WinNT 5+ and current OS X (as well as Linux and *BSD, though at least ZFS covers the latter).
At the moment the common denominator of all filesystem drivers is some edition of Microsoft's FAT filesystem - a pretty basic and old filesystem by today's standards. NTFS is very rock-solid as a filesystem, and read (and often also write) support is available on all common OSes.
We would definitely benefit if solid r/w filesystem drivers for ext4, XFS, ReFS, BFS/BeFS, Btrfs, etc. were available on more OS platforms.
The second you open that up to application layers, you are facing a wide range of problems. Specifically:
* Common schema: each application considers itself a unique snowflake, and might want different columns to mean different sorts of things. We've been here before (semantic web, RDF, etc.), and the correct solution was letting the apps just manage their own database of stuff.
* Selfish apps: there is nothing stopping apps from overwriting or manipulating meta-info in ways that are detrimental to the user experience. Consider the current case of "Set <x> as default browser" and "Set <x> as default media player" at each launch, then multiply it across every schema column.
* Interop: BeOS attempted to solve this by dumping meta-info into archives. The problem is that when a file gets outside the original system/OS, these attributes get truncated. Any workaround would require full agreement on changing all of the file transfer protocols and storage methods on every OS.
Also note that 90% of the use case (locating files instantly on multi-terabyte consumer drives) is already handled by Windows Search and Finder. Personal experience shows that you can safely drop the hierarchical madness in favor of filename-based search for most media/music/document cases, and there are ways to apply this to development as well.
Applications have been able to do this forever, but there is no correct way per se. The question is: should applications sit on a lot of data for themselves? (Walled garden, vendor lock-in, no interop.) Example: think of music ratings in iTunes - they're all lost/inaccessible if you decide to additionally use non-Apple software.
I would argue metadata in user-mode applications is already a solved problem - most applications adhere to common metadata format standards, with a few outliers.
WinFS, NEPOMUK and semantic web failed or haven't gained traction.
A practical common schema is being developed for search engines on schema.org by Bing, Google, Yahoo!, Yandex & co: https://schema.org/docs/full.html
> Selfish apps
Metadata access would be part of the operating system API. If a piece of software intentionally renames or moves files to different directories (for no good reason), it's a virus/worm.
Most common file formats support metadata anyway, just keep them up-to-date. And Adobe created the XMP sidecar format especially for this use-case: http://en.wikipedia.org/wiki/Extensible_Metadata_Platform , http://en.wikipedia.org/wiki/Sidecar_file
 mp3 ID3, jpg IPTC/EXIF/XMP, office formats, pdf, epub, etc.
 Windows Explorer, Windows Media Player, Windows Photo Gallery, foobar2000, Winamp, Photoshop, Acrobat, etc. (and Linux applications as well) usually read/write file metadata for common formats just fine.
 iTunes, iPhoto, Aperture and Photoshop Lightroom store their metadata in a per-app SQLite database.
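For illustration, an XMP sidecar is nothing exotic: just an XML file stored next to the original. A minimal sketch (the filename photo.xmp is hypothetical; the structure follows the usual rdf/Dublin Core conventions):

```shell
# Write a minimal XMP sidecar for a hypothetical photo.jpg,
# tagging it with two keywords via Dublin Core dc:subject.
cat > /tmp/photo.xmp <<'EOF'
<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about=""
      xmlns:dc="http://purl.org/dc/elements/1.1/">
   <dc:subject>
    <rdf:Bag>
     <rdf:li>vacation</rdf:li>
     <rdf:li>beach</rdf:li>
    </rdf:Bag>
   </dc:subject>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>
EOF
```

Any application that understands XMP can read and update the tags without touching the image file itself.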
File systems deal with organizing unstructured data (i.e. blocks of bytes); databases deal with organizing structured data (i.e. typed records).
Efficiency and scalability come from decoupling the FS and the DB and letting them specialize.
Examples: GFS + Bigtable, Azure extent/partition manager + Table store, Amazon's various storage elements.
Pushing the DB into the filesystem doesn't really buy you anything - you still have to solve the unstructured page management/allocation problem.
Counter-examples: WinFS, Cairo, Windows Registry (which I'd argue was a large failure).
It's an idea that sounds good on paper, but fails on the theoretical (unstructured vs structured) and practical aspects (distributing structured data is MUCH harder than distributing unstructured data).
The Cairo project documents never specified the query-language and UI part of Cairo, and that was basically the only part that never got implemented; everything else made it. WinFS was doomed to fail because it ran in user mode on .NET (and Longhorn-era PCs were slower), instead of adding the query part to the NTFS driver in kernel mode. The shell integration, with only a UNC path and a .NET-only API, was bad. And the object-oriented metadata schema was way too complicated, especially when used on top of an SQL database.
WinFS beta 1 worked okay; it was just very slow (.NET services plus SQL Server in the background, stored in a hidden directory on NTFS). WinFS never made it because it was way behind schedule and too slow.
NTFS and similar modern filesystems are modular enough that it would be possible to add the missing feature, a query interface, directly into the kernel driver. The operating system would of course need to expose the API too, so that C functions like fwrite(), WinAPI WriteFile(), etc. could access files via a query language (e.g. what Windows Search exposes in the Explorer address bar) as well as via the directory tree.
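In user space you can already approximate this today; the proposal above would push the same idea below the filesystem API (an illustrative sketch only - the query path at the end is made-up syntax, not a real kernel interface):

```shell
# Today's approximation: queries run in user space, walking the tree.
mkdir -p /tmp/qdemo
touch /tmp/qdemo/report.pdf /tmp/qdemo/notes.txt /tmp/qdemo/photo.jpg

# "every PDF under this tree", expressed as a user-space query:
find /tmp/qdemo -name '*.pdf'    # prints /tmp/qdemo/report.pdf

# The proposal: the kernel driver answers such queries itself, so any
# program could fopen() a (hypothetical) query path like
#   /query/name='*.pdf'/report.pdf
# with no changes to fwrite()/WriteFile().
```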
At best you can add a few more structured primitives, but that's not much better than SQL-lite or whatever you prefer running on top of a block store, since you don't know or really care about the domain of every application.
With native OS integration, other applications could take advantage of the new features.
What value would a "filesystem" that understands contacts add to a web server or load balancer? All it does is couple application-specific domains and complexity into layers of the system where they don't belong.
No one is debating that it would be nice for all computers to have a unified understanding of what a document and what a contact is, but those are orthogonal requirements to what a filesystem needs to do.
The first is labor. I'm cool with "mysql as my filesystem", but most people have no DBA-fu and will be horribly lost. Who gets paid more, a "filesystem-oriented administrator" aka generic sysadmin, or a "database-oriented administrator" aka DBA? It's going to be much harder to use, not easier.
The second is "trends and fads" in persistent data. The calls for nosql as the universal cure for all ills have quieted a little. Fundamentally, 99.9% of the time I just want something to quickly persist a binary blob, say a video file of a movie. I don't want to reimplement git in my filesystem because if I did I'd use git; I don't want a DB as my filesystem because if I did I'd use a DB. Ditto spreadsheets or VRML (remember that?) files. So the "nosql" analogy for a database-driven filesystem is that all the cool kids would "upgrade" to ext3 for performance reasons anyway, and then come on HN to lecture everyone about how ext3 is the only way to solve all storage problems, instead of the old-fashioned and obsolete database-filesystem being proposed.
The benefit: users and application developers could access files in various ways. (A directory tree is so limited and outdated. See my other comment about SharePoint for what is already possible and successful, just at a higher level - an intranet website.)
You need access to the Fossil fileserver console to force a snapshot "right now" rather than waiting for the next scheduled time, but since Fossil is a user-space program, you could trivially run your own separate Fossil filesystem against the Venti store without needing any sort of root access.
By default, the cached-worm systems are dumped at 5am every day, meaning you can access your files as they were at 5am on any given day. You can manually dump them whenever you please, of course. These dumps are cheap, but not as cheap as fossil or ZFS (or git) snapshots. Unlike ZFS snapshots (not sure about git), these dumps are immutable: you can't delete them.
With fossil+venti, you still get daily dumps, but you also get finer-grained ephemeral snapshots - by default at 15-minute intervals; I used to set them at 5 minutes. You can control how long fossil keeps these; I kept mine for 3 months. The dumps in venti are immutable. These are very, very cheap, and these systems also do deduplication by default.
I don't run fossil anymore, I like the features, but reliability was less than stellar for some people, and the performance is much lower than cwfs, which is what I use now. Cwfs is very fast, rock solid, and very easy to recover in case of a catastrophe. I miss ephemeral snapshots, but cwfs dumps are cheap enough I can run them for what in a normal system you'd do "commits" for.
I miss all these features when I am forced to use Unix. Git does two things, history preservation and patch management. For history preservation, nothing beats the Plan 9 system. What git does better is patch management. I think it's valuable not to conflate these two concepts and create tools that can solve each one well, and work well together.
Despite its name, ZEVO Community Edition is a closed source project run by GreenBytes (recently acquired by Oracle).
Back in Nov. 2013, GreenBytes announced that they don't have plans to continue development of ZEVO.[^1]
So unfortunately, ZEVO is a dead project.
I'm curious about any further thoughts/experience on this.
On the other hand, disk electronics failures, disk controller failures, and backplane failures all do happen. Frequently. To me. (Well, more frequently than I'd like.) When they occur intermittently, these classes of failures produce on-disk corruption that ZFS can withstand, even without ECC. So I would gladly take ZFS's protection against these, even at the risk of whatever a memory error might produce.
Lastly, consider that if the data is important enough to warrant worrying about memory errors, it's important enough to have backups.
I've had a ZFS volume get very thoroughly trashed by bad writes, though, and still been able to recover data from it (back in the foolish days before ZFS-on-Linux, when I had OpenSolaris in a VirtualBox VM with real disks, and something happened that randomly nuked blocks all over one of the filesystems. Interestingly, just the one FS - I had a whole bunch, and the others were untouched).
You might want to recover at least some files from a badly damaged FS if it's your home server with baby photos on it, but would you really do that if it held your company's accounting, rather than restoring from a backup?
I would recommend anyone new to ZFS read it if they care about the data they are storing.
Fossil uses SQLite3, but it doesn't do what I want -- only git gives me the power of the index, rebase, and light-weight branches. (OK, Mercurial has those too, but it's too late and it sucks. Sorry.)
Fossil does get several important things from using SQLite3: SQL for history (do NOT underestimate this), extensibility (one file format, trivially extensible schema), well-tested ACID semantics support. Oh, one more thing: using SQLite3 minimizes fsyncs per-transaction. There's probably more benefits, actually.
And ZFS? Well, ZFS snapshots give you ACID. If you have a persistent ZIL then fsync() gives you barriers (nice!). But a) snapshots are slow by comparison to SQLite3 COMMITs, and b) you don't get the benefit of SQL.
Oh: and not everyone can haz ZFS, but everyone can haz git.
Everyone also can has SQLite3.
My proposal: backend git's I/O abstraction layer (which is pretty nice) with SQLite3. And if you have ZFS, then always take a snapshot in a post-receive hook; destroy older snapshots later in a cronjob.
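The ZFS half of that could be as small as this (a sketch; the dataset name tank/git is an assumption, and the hook degrades to a no-op on non-ZFS systems so it never blocks a push):

```shell
#!/bin/sh
# Hypothetical git post-receive hook: snapshot the ZFS dataset that
# holds the repositories after every push. "tank/git" is an assumed
# dataset name; adjust for your pool layout.
DATASET=tank/git
SNAP="${DATASET}@push-$(date +%Y%m%d-%H%M%S)"

if command -v zfs >/dev/null 2>&1; then
    # Don't fail the push if the snapshot fails.
    zfs snapshot "$SNAP" || echo "snapshot failed: $SNAP" >&2
else
    echo "zfs not found; would have created $SNAP" >&2
fi
```

Because the snapshots are named by timestamp, the cronjob that destroys the older ones only has to sort and compare names.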
(As for zfs destroy, you really want the async zfs destroy feature, which IIRC not every implementation has!)
Only packs (maybe) and tags (also maybe) should not be objects.
If reflogs were objects then... they could be pushed and pulled, which would be really nice indeed.
If branches were objects then they could record rebase history. I would really like to be able to capture rebase history: what a branch's HEAD was before and after a rebase, as well as the picks/squashes/edits/rewords/drops done in an interactive rebase, and even the merge pre-/post-base (--onto).
I had this problem with git and nilfs2.
> "I’m not sure of the stability on Mac at this time. Using ZFS as a root file system on Linux is still slightly problematic at this moment."
And Windows? Never mind.
Ok, so the author wrote an article called "Who Needs Git When you Got ZFS", and then in the body of the article the conclusion is more like "Who needs ZFS When You Got Git".
ZFS does a bunch of cool stuff in a generic way as a file system, sure. But we already have apps doing it better in a way specific to real-world needs (i.e. Git), on more platforms, and I don't have to fear if I'll corrupt my boot volume if Git is buggy.
You might not know it, but out of all the new filesystems designed after 2005 (a few dozen?), ZFS appears to be the one that has seen the fastest adoption/growth.
Source: my anecdotal experience in the industry, the experience of many of my colleagues, random consultants reporting "I've seen 1000s of large ZFS deployments", etc.
Outside of ext4, which was introduced in 2006 and was a simple refresh of ext3 - not really a "new" filesystem.
To turn your question around: what new serious filesystems have been designed since 2005 (or 2001) at all, regardless of adoption?
I think the answer is actually far fewer than a few dozen. I can think of four: ext4 (as mentioned), F2FS (announced 2012, started ?), btrfs (started 2007), and ZFS (2001).
Maybe there are others, I don't know, but I don't really think there are even that many contenders for adoption. (If there are, I'd be interested to find out).
By "serious" I mean in a sense similar to what is outlined here:
in particular "In specific, none of these alternate init systems did the hard work to actually become a replacement init system for anything much", but for filesystems instead of init systems. For example, I would not consider HAMMER to be a 'serious filesystem' for the purposes of this list, whatever its technical merits.
The category you define may be seen as arbitrary, first because existing filesystems have not stopped evolving. HFS, NTFS and so on have been significantly improved over time, including after 2005.
ZFS is a categorically different concept than said examples, in that it's distributed.
But the filesystem has not been shown to be the best (let alone the only) way to have distributed data; you can easily put that at a higher level and retain the flexibility of defining your own protocol with the consistency, consensus algorithm, and other properties you precisely need.
Why solve this at the filesystem level and lock yourself into a single system-wide way of doing it, versus having the best solution for each part of the system? I feel like it causes more trouble than it solves.
What? No, ZFS is not a distributed filesystem. It never has been, it almost certainly never will be, and it has little in common with distributed filesystems.
What makes ZFS different is that it is a production-grade, copy-on-write, self-validating Merkle tree. Most of its properties fall out from that. There's nothing distributed there.
I'm saying this in the kindest way possible: please don't write about things that you have zero idea about. You cannot possibly be more fundamentally wrong about ZFS, and nothing you wrote makes any sense. :(
People who would've found ZFS interesting on its own merit won't read about it, and those who are interested in Git would.
I wasn't disappointed.
Windows is notorious for supporting fewer filesystems than its Unix-like peers; I'm not sure why you seem surprised that ZFS would be an exception.
Edit: It's not the state of the art of ZFS on Windows either, which would probably be a virtual machine.
But if you're going to give a link to something that doesn't meet basic requirements, say so. Otherwise it looks like you're disagreeing with the claimed lack of windows support.
That's from the article... You know... that thing you didn't read?
I don't know what the correct term is for the kind of title here but it's not supposed to be literal.
EDIT: the title is tongue-in-cheek... maybe even linkbait, but in a clever way rather than a negative one.
I in fact started actually reading the article only to be let down that it is not about an alternative to git. Remember the times when there was a correlation? Welcome to the tricking-each-other-into-reading-stuff age.
I want to keep the right to believe that an article is a long version of the title, thus keep the right to comment on the idea held in the title as if it was the content. Don't forget that there could always be a longer version.
And considering the title only, the comment does have its place. Now go downvote me too.