Gitfs: Version Controlled File System (presslabs.com)
189 points by pabs3 on Aug 22, 2021 | 58 comments



To be clear on what this is, it allows you to mount a git repo and use writes to the filesystem to autocommit and push to the remote, effectively using git and a server to synchronize directories between remote hosts.

That is not the same thing I would think of as a version-controlled filesystem. That would be something more like the ClearCase MultiVersion Filesystem, which allows you to define versioned views of an entire filesystem. This gives you the holy grail of versioning, snapshotting, backup, restore, synchronizing entire systems, with equal treatment of text and binary files, in a way that is totally transparent to any higher-level tooling that reads and writes to these files, that developers have been trying to reinvent for 30 years because ClearCase is proprietary and very expensive.


I've done something that sounds similar with bash scripts and systemd path units. Every time I modify a configuration file from any of a number of paths I'm interested in versioning, I get a prompt to supply an optional commit message, and the change is automatically committed and pushed to remote. It's not the whole file system, but the commit messages on configuration file changes have come in handy at least once.


I expected it to be closer to Microsoft's GVFS: https://devblogs.microsoft.com/devops/announcing-gvfs-git-vi...

There, they used a filesystem interface to speed up checkouts for large repositories.


Since mvfs is a Linux kernel driver, it must be free software. Here is a git mirror of it: https://github.com/msteinert/mvfs

I'm not sure how easy it is to use with anything that isn't the rest of the ClearCase suite.


ClearCase is the devil in software form!

Unintuitive, hard to maintain, absolutely unloved in industry, extremely expensive.


I hear Opportunity knocking...


I used ClearCase for many years. It was a good version control system. It had a certain amount of complexity associated with it, but I thought it was fine and worked well.

I assume it’s still sold and supported, but I haven’t checked on it in ten years.


I used it in a global telecoms company about 15 years ago or so. We had all the bells and whistles (dynamic snapshots, FS integration, etc.) and I found the experience very underwhelming. It was clunky and brittle, it never felt fully transparent, and for all that we had to do a whole heap of training to learn to use it anyway. Subversion, when I came to it, was a step up, but it still felt a little brittle and they kept on changing things between releases. For all its warts, git is simply the best tool in common use. Fast, reliable and probably about as hard to learn as any of the other options if you’re just sticking to the basic use cases.


Performance tends to be very poor. Seems very much in "cash cow" mode as in not being invested in to improve.


Similar to making a ZFS snapshot, maybe?


I think this is a great idea, but I would want to see it supported by the main Git maintainers (integrated into the Git mainline; that probably wouldn’t happen. It’s written in Python, and Git is C, so there are a lot of structural issues).

I really like the idea of adding the work of non-tech team members (Graphic Design, Localization, Marketing, etc.) directly into Git (as opposed to requiring the devs to act as “gatekeepers”).

My main concern would be that this could end up “ballooning” a Git repo, and I’m not sure if this addresses the existing LFS issues, which would reduce its utility for some use cases (like media assets).

There’s a reason that some game studios still use Perforce (DISCLAIMER: I was a teenage Perforcer; I don’t miss it).


Git ships (usually, but technically depending on package I suppose) with 'contrib' stuff in Perl at least (e.g. diff-highlight).

I think more of a barrier if they even considered it would be the FUSE dependency. I mean, of course it uses FUSE, but it makes it Linux and (with a bit of pain) macOS only.



I wouldn't want this in git mainline.

git is a clear, simple file format (with a horrible user space) which makes sense as the back end of far more things than it is used for. I feel like integrating this into mainline is a little bit like integrating a visual database design tool into a database server. They're complementary, but different.


Past related threads:

Gitfs - https://news.ycombinator.com/item?id=10053176 - Aug 2015 (31 comments)

Show HN: Gitfs – mount Git repos as local folders - https://news.ycombinator.com/item?id=8735937 - Dec 2014 (62 comments)

More loosely related:

Show HN: A versioned filesystem inspired by Git - https://news.ycombinator.com/item?id=4443321 - Aug 2012 (46 comments)

Ask HN: Random idea ("gitfs") - https://news.ycombinator.com/item?id=3897817 - April 2012 (1 comment)

Git is an acceptable filesystem - https://news.ycombinator.com/item?id=3617072 - Feb 2012 (2 comments)

PhoenixFS - a versioning filesystem inspired by Git - https://news.ycombinator.com/item?id=2353162 - March 2011 (2 comments)


Is the intent to turn Git into something like JetBrains's Local History, an unbranched log of all changes on disk without commit messages? I don't think it can replace human-curated Git histories, but it could be useful as a safety net to avoid losing uncommitted code. You probably wouldn't want to put a build directory in gitfs without a .gitignore rule though.


I doubt it's useful for code.

On the other hand, for syncing notes across devices, or things like that, I'd say it's very handy.


I don't know why most people are assuming this is mostly for enabling non-technical users to write to Git repositories.

A popular setup (so popular it's (was?) the recommended way to set up Saltstack's configuration) is to use it to basically mount an autosynced folder from a central location (with the benefits of revision history at that location).

IMHO it's a much better use case since you can't make conflicts and it won't result in poor (autocommitted) git history.


I'd be curious how well this performs with files that are changed often, and if the disk space needed to store snapshots of all the changes would grow a lot over time.


IIRC git stores copies at first - so interactive time performance is great. It later compresses a set of files into a pack - so long-term space performance is great.

It's like the perfect partner - at least, from afar.


Minor nitpick: The files are initially stored with zlib compression. Each "commit" stores the entire file.

The compression into a packfile is where the diffs between files are computed and stored.
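As a rough illustration of the point above, here is a minimal Python sketch of how a loose blob is named and stored, assuming a repository in the default SHA-1 object format. The helper names `git_blob_id` and `loose_object_bytes` are made up for this example; they are not part of Git or gitfs.

```python
import hashlib
import zlib


def git_blob_id(content: bytes) -> str:
    """Return the object ID Git (SHA-1 format) assigns to a blob with this content."""
    # A blob is hashed as: "blob <size in bytes>\0" followed by the raw content.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def loose_object_bytes(content: bytes) -> bytes:
    """Return the zlib-compressed bytes that would be written as a loose object."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return zlib.compress(header + content)


if __name__ == "__main__":
    v1 = b"hello world\n"
    v2 = b"hello world!\n"
    # Two versions of the same file become two complete, independent objects;
    # no diff between them is stored until objects are packed.
    for version in (v1, v2):
        oid = git_blob_id(version)
        stored = loose_object_bytes(version)
        print(oid, len(stored), "bytes ->", ".git/objects/%s/%s" % (oid[:2], oid[2:]))
```

Each saved version costs a full compressed copy of the file until a repack happens, which is why frequently rewritten large binaries are the worrying case.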


Sure but this will still run into issues e.g. if you are editing a Photoshop file with autosave enabled.


I like the idea - does it support .gitignore though? I wasn't able to find anything in the documentation and GitHub issues seem to be confusing in that regard. If it doesn't, it might be hard to use with tools that create temporary or output files in the directory, such as compilers, LaTeX, some editors, etc.
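To make the concern concrete, here is a hedged sketch of the kind of filter an autocommitting tool would need before staging changed paths. It says nothing about what gitfs actually does. The pattern list and the function names are hypothetical, and `fnmatch` only approximates real .gitignore semantics (no negation, no anchoring, no `**`).

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Hypothetical ignore patterns for compiler, LaTeX and editor droppings.
# Simplified matching: NOT the full .gitignore pattern language.
IGNORE_PATTERNS = ("*.aux", "*.log", "*.o", "*.swp", "build")


def is_ignored(path, patterns=IGNORE_PATTERNS):
    """True if any component of the path matches any ignore pattern."""
    parts = PurePosixPath(path).parts
    return any(fnmatch(part, pattern) for part in parts for pattern in patterns)


def paths_to_commit(changed_paths):
    """Keep only the changed paths that should end up in an automatic commit."""
    return [path for path in changed_paths if not is_ignored(path)]


if __name__ == "__main__":
    changed = ["thesis.tex", "thesis.aux", "build/out.bin", "src/main.c", "src/main.o"]
    print(paths_to_commit(changed))
```

Without a step like this, every temporary file written into the mount would become part of the history.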


I wonder how conflicts are resolved, especially binary files.


Issue #333: Simultaneous changes on different instances cause incorrect merge

https://github.com/presslabs/gitfs/issues/333

Seems like conflicts are being resolved poorly.


Well, I'm guessing nobody would be merging two different heads, and if they do it manually from the CLI, it gets resolved however git would.


"however git would" is by asking the user what to do. This automatically pushes, even if the push isn't clean it sounds like. It's unclear to me exactly what that means, `push -f` ? A pull with some options first if upstream can't be fast-forwarded? Doesn't sound very...safe.
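For reference, whether a push is "clean" comes down to whether the remote head is an ancestor of the local head. The sketch below classifies the situation on a toy commit graph. It is an illustration only, not gitfs's logic, and the names `is_ancestor` and `push_situation` as well as the graph are invented for the example.

```python
def is_ancestor(parents, ancestor, descendant):
    """True if `ancestor` is reachable from `descendant` by following parent links."""
    stack, seen = [descendant], set()
    while stack:
        commit = stack.pop()
        if commit == ancestor:
            return True
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, ()))
    return False


def push_situation(parents, local_head, remote_head):
    """Classify what a push from local_head onto remote_head would involve."""
    if local_head == remote_head:
        return "up-to-date"
    if is_ancestor(parents, remote_head, local_head):
        return "fast-forward"   # a plain push succeeds
    if is_ancestor(parents, local_head, remote_head):
        return "behind"         # nothing to push; a pull would fast-forward
    return "diverged"           # needs a merge or rebase, or a forced push that drops remote work


if __name__ == "__main__":
    # Toy history (commit -> parents): A <- B <- C on one host, A <- B <- D on another.
    parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"]}
    print(push_situation(parents, "C", "B"))
    print(push_situation(parents, "C", "D"))
```

The "diverged" case is the one in question: an automatic tool has to either merge without asking or overwrite someone's commits.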


This looks great! But since a filesystem doesn't need most of the git features (decentralization, merging, etc.), how does this compare to ZFS or btrfs, which allow "snapshots" of the filesystem to be stored and restored quickly, similar to how git works?


It’s not meant for version controlling whole existing file systems. It’s for mounting git repos in the file system so that people can modify files in the repo and have them automatically committed, from what I understand from reading the readme.


Neat, and creative, but..

> You can mount a remote repository’s branch locally, and any subsequent changes made to the files will be automatically committed to the remote.

This is not a “feature” I would ever want. That alone actively dissuades me from wanting to try this.


I'm curious what behavior you'd expect from a git file system then? For devs and commits, understood, but a Time Machine-like history of directories, etc. seems useful for many folks.


> I'm curious what behavior you'd expect from a git file system then?

Explicit “commits” and pushes, like zfs?

Automatic commits are all noise.

If it’s a git client for non-technical users then it’s even less acceptable, that’s a guaranteed way to fuck up your repository.


> I'm curious what behavior you'd expect from a git file system then?

I never thought about or considered a “git file system” before today. While a neat thing conceptually and even technically, I can’t imagine a situation where I would want a git file system. I never want to auto commit to the remote repo. I want to make sure my local changes aren’t complete shit before committing to the remote. For me this is a solution to a problem that I don’t have.


Tell us how you feel about network file systems?


> Automatically commits changes: create, delete, update files and their metadata

Does this mean that every file write is a commit (which with the right tooling actually sounds very pleasant), or?


When I clicked on "arguments" I was half-expecting it to be arguments like "--verbose" and "--noconfirm" and half-expecting it to be arguments like "why WOULDN'T I do this?" and "no this is actually a GREAT idea because...".

That said, this is pretty neat and I can see a couple of use cases for it. If this ran on Windows, there are more than a few programs' worth of config directories that I'd love to use it to handle.


Always thought modern OSes should have an on-machine document management system. If vendors would implement this in their office applications, we would nearly be there.


Apple has had a similar feature for years in its OS. Its office suite makes use of it. https://www.makeuseof.com/tag/recover-word-pages-mac-documen...: “Every time you save changes to a document, iWork archives a copy that you can recover at a later date.”

Time Machine and Dropbox (only in the paid version, I think) also keep copies, but (important for documents that consist of many files on disk) can’t really know which files ‘belong together’ in a series of file writes.

Both, of course, also potentially are data leaks waiting to happen. Not only can’t you be sure that a Save overwrites an old document, you can’t even be sure that the program you’re using is giving the system that opportunity. On the plus side, these make it impossible to accidentally send out a file with (partial) content from an old version.


Webconverger uses https://github.com/webconverger/git-fs to manage OS upgrades, to roll back or even branch for particular client needs or testing.

I don't know of any other systems that can roll back as easily and at as fine a grain as Webconverger can.


NixOS can also move between system generations seamlessly. The approach is entirely different though, the system is fully defined by its config files and will be recreated from scratch each time it is changed (except /home and /var, mostly).


I've had this idea for years and always wanted to try building it. Glad you did so I didn't have to :D

Well done!


How does this compare with Microsoft's "VFS for Git"?

https://github.com/microsoft/VFSForGit


VFS for Git was superseded by https://github.com/microsoft/scalar and then many of the features were merged into mainline git, so what is left now is a thin shell around git features in the form of MS's forked git binary: https://github.com/microsoft/git


VFS for Git solves the issue of having gigantic bloated monorepos used by thousands of devs, making sure each user efficiently downloads only what they need.

This is basically Git checkout with an autocommit feature, making sure your grandma will be able to check grammar in your thesis without teaching her how to Git.


Clicking the link, I was kinda hoping it was some version of VFS for Git, but for operating systems other than Windows. Kinda bummed that it wasn't, this seems kinda pointless. Microsoft is (allegedly) working on macOS and Linux drivers for that thing, but it's been a while now. I hope it's going to become an actual thing one day, it's such a cool idea.


I think the main difference is that this is for Linux and macOS and not for Windows.


The sentence at the very top of the GH repo reads

> Virtual File System for Git: Enable Git at Enterprise Scale

Does it sound at all similar to what this particular project tries to tackle?


No. VFS is for /the underlying git/ and not for the user (aka, the .git folder). The linked project is for the end user.


That was a rhetorical question.

I usually understand that people read mostly titles. It was a bit funny though that someone likely managed not to read both the posted site and a link in a comment they replied to.


Can it handle files being renamed?

Or will the file's history become unlinked?


Looks like it does a remove and an add in the same commit:

https://github.com/presslabs/gitfs/blob/cf92acc1fdb0bf93d599...

called by: https://github.com/presslabs/gitfs/blob/cf92acc1fdb0bf93d599...

So it'll be tracked as a rename (renames in git are tracked heuristically anyway, as long as they're part of the same commit).
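To show what "tracked heuristically" means, the sketch below pairs removed and added files whose contents are similar enough. It uses Python's `difflib` as a stand-in similarity measure, so it is not Git's actual algorithm. The 0.5 threshold mirrors Git's default 50% similarity for rename detection, and `detect_renames` is a name invented for this example.

```python
from difflib import SequenceMatcher


def detect_renames(removed, added, threshold=0.5):
    """Pair removed and added paths whose contents are similar enough.

    `removed` and `added` map path -> file content (str) for a single diff.
    Returns a list of (old_path, new_path, similarity) tuples.
    """
    renames = []
    unmatched = dict(added)
    for old_path, old_content in removed.items():
        best_path, best_score = None, threshold
        for new_path, new_content in unmatched.items():
            # Stand-in similarity score in [0, 1]; Git computes its own measure.
            score = SequenceMatcher(None, old_content, new_content).ratio()
            if score >= best_score:
                best_path, best_score = new_path, score
        if best_path is not None:
            renames.append((old_path, best_path, best_score))
            del unmatched[best_path]
    return renames


if __name__ == "__main__":
    text = "line one\nline two\nline three\n"
    removed = {"notes.txt": text}
    added = {"docs/notes.txt": text, "other.txt": "zzz"}
    for old, new, score in detect_renames(removed, added):
        print("rename %s -> %s (similarity %.0f%%)" % (old, new, score * 100))
```

Nothing in the commit itself records the rename; it is inferred each time two trees are compared, which is why the remove and the add need to show up in the same diff.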


How is it different from git-annex? https://git-annex.branchable.com/


I'm not sure why you're getting downvoted.

It's different from git-annex in that it's using git itself (git-annex just uses git to track metadata/hash to facilitate large files), but there's a similarity to git-annex's 'Git-annex Assistant' in that "any subsequent changes made to the files will be automatically committed to the remote".

From a brief experience with git-annex assistant, I've been finding the experience of having things automatically 'synced' to be confusing and prone to issues when doing things manually somewhere else. I think its real power may be in evolving its user interface to be pervasive within file browsers; projects like https://github.com/andrewringler/git-annex-turtle are an example.


Thanks for your answer!


This feels like a new version of mounting WebDAV shares.


Love the idea, but you'll have to be careful not to put files that are too big on it, I suppose?


Sounds like it's intended for non-git-users to work on an existing git repo - editing docs, graphics, whatever and pretending it's just a bunch of files with no git - rather than particularly for a general purpose filesystem with git as automated backup / version history.


Can this be used to time machine arbitrary folders? Or keep config folders synced across devices?



