The other day, I had 60 minutes of free time between events, so I brainstormed a few ideas for improving Fossil while sitting in a Starbucks, and those unedited, spur-of-the-moment notes triggered a big discussion on HN... Yikes! I do appreciate the feedback. Seriously. Your comments are very, very helpful. But let's not attach too much weight to my musings over coffee.
Should I interpret the response here to mean that there is latent demand for a new-and-improved VCS in the world? Does this mean that Git is ripe for disruption?
Some Issues I Have With Git
(1) A Git repository is a pile-of-files and/or a bespoke key/value store (packfiles). The format of the repository is underdocumented. (Proof sketch: try to write a utility that reads content out of a git repository without first studying the git source code.) The repository format is also brittle, as evidenced by the difficulty the Git developers have had trying to add support for hash algorithms other than SHA1.
(2) The key/value design of Git limits the information you can extract from the repository. Example: It is difficult to find the descendants of a check-in in Git, so difficult that nobody ever does it. You can find ancestors easily, but finding all the descendants of a check-in is very hard. In addition to depriving the user of useful information, the inability to find descendants of a check-in leads directly to the "disconnected head" problem. That one deficiency is a show-stopper for me. And this is but one example of the limitations imposed by the key/value design of Git.
(3) For people who don't want to put their trust in GitHub, setting up a Git server is way too difficult.
(4) Git requires the user to remember too much state information. Git users should be cognizant of (a) the current check-out, (b) the "index" or staging area, (c) the local head, (d) the local copy of the remote head, and (e) the actual remote head. The more mental power users must devote to keeping track of Git, the less there is available to work on their own code.
(5) Git only allows one check-out per repository. (I am told there are recent extensions to git to try to address this deficiency, but I am also told they do not work very well.)
(6) Git does not do a good job of remembering branch history. In particular, branches are unnamed in Git.
(7) Git is for file versioning only. Other important project information, such as bug tracking, must be handled separately.
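Point (2) is easier to see with a concrete query. The following is a minimal sketch, assuming a hypothetical SQLite table `checkin_link(parent, child)` that records every parent/child edge of the check-in graph (a simplified illustration, not Fossil's actual schema). Once the edges live in a relational table that can be searched in either direction, finding descendants is a single recursive query. In Git's object store each commit only points at its parents, so the same question means scanning commits and inverting those links yourself.

```python
import sqlite3


def descendants(conn: sqlite3.Connection, start: str) -> list[str]:
    """Return every check-in reachable from `start` by following parent->child edges."""
    rows = conn.execute(
        """
        WITH RECURSIVE descendant(id) AS (
            -- direct children of the starting check-in
            SELECT child FROM checkin_link WHERE parent = ?
            UNION  -- UNION (not UNION ALL) drops duplicates reached via merges
            -- children of anything already found
            SELECT l.child
            FROM checkin_link AS l
            JOIN descendant AS d ON l.parent = d.id
        )
        SELECT id FROM descendant ORDER BY id
        """,
        (start,),
    ).fetchall()
    return [row[0] for row in rows]


def build_demo_repo() -> sqlite3.Connection:
    """Build an in-memory graph with hypothetical check-ins A..E."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE checkin_link(parent TEXT, child TEXT)")
    # A -> B -> C, B -> D (a fork), then C and D merge into E
    edges = [("A", "B"), ("B", "C"), ("B", "D"), ("C", "E"), ("D", "E")]
    conn.executemany("INSERT INTO checkin_link VALUES (?, ?)", edges)
    # Index on parent so each recursive step is a lookup rather than a full scan
    conn.execute("CREATE INDEX link_parent ON checkin_link(parent)")
    return conn


conn = build_demo_repo()
print(descendants(conn, "B"))  # ['C', 'D', 'E']
```

The merge check-in E is reachable along two paths but is reported once, because the recursive CTE uses `UNION`.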
Fossil is an effort to address the problems above. I do not claim that Fossil is perfect, just that it is better than Git. I am keen to make Fossil even better. Your feedback is appreciated.
I was very active in the Mercurial community for a long time, and I'm still using it for personal stuff today. Mercurial solves issues 1, 2, 4, 5, and 6 in your list. I'm mentioning that for only one reason:
That was not enough.
My personal theory on why this happened largely centers around network effects. Git got big in Ruby at the same time GitHub came into existence, and I think that drove the adoption as much as anything. Ignoring whether people here feel that Git was better or not, the simple fact is that it did not matter: GitHub used Git, and the community used GitHub, so you used Git. The network effects in that kind of situation can be really hard to defeat. And while it's possible that solving items 3 and 7 will make the difference, I don't think that'll do the trick unless you can come up with a strong way to defeat GitHub's (and Git's) incumbent network effects.
(1) Good UX -- steal more from Mercurial here. No "index" or staging area by default, or at least a way to disable it. Being smoother about branch handling between local and remotes would also help significantly IMO.
(2) Network protocol and probably on-disk compatibility with Git and maybe Mercurial -- essential to gaining adoption.
(3) Better support for binary files, perhaps through using a modern content-defined chunking mechanism like FastCDC.
(4) Good support for narrow and shallow clones, maybe even making new clones (somewhat) shallow by default.
(5) Write it in Rust with a good modular architecture: good performance (including startup), safety/robustness, and key parts that are easier to reuse (see also Facebook's Mercurial server stuff in Rust). The lack of a coherent/reusable architecture in Git has also been a key reason AFAICT that Google and Facebook are investing in Mercurial-based tooling.
(6) Use the inotify/FSEvents APIs (or whatever the Windows equivalent is) to keep a daemon running and give much faster status responses, plus potentially record every change to the files on disk -- sort of like Git's reflog, on steroids. Could also do smart build system integration here?
(7) Better support for very large trees (a la bup's midx extension to Git's storage format). It should really support Google/Facebook monorepo scale.
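Item (3) mentions content-defined chunking. As a rough illustration of the idea, here is a minimal gear-hash chunker in Python. It is not FastCDC itself: FastCDC builds on the same gear rolling hash but adds refinements such as normalized chunking, which this sketch omits. The gear table, size limits, and boundary rule below are illustrative assumptions, not published defaults.

```python
import hashlib
import random

# Illustrative parameters (assumptions, not FastCDC's published defaults).
MIN_SIZE = 2 * 1024       # never cut a chunk shorter than this
MAX_SIZE = 64 * 1024      # always cut once a chunk reaches this size
BOUNDARY_BITS = 13        # cut when the top 13 hash bits are zero (~1 in 8192 bytes)
MASK64 = (1 << 64) - 1

# Gear table: one fixed pseudo-random 64-bit value per possible byte value.
_rng = random.Random(0)
GEAR = [_rng.getrandbits(64) for _ in range(256)]


def chunk(data: bytes) -> list[bytes]:
    """Split `data` into content-defined chunks using a gear rolling hash."""
    chunks = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        # Shifting left once per byte means a byte's contribution leaves the
        # 64-bit hash after 64 steps, so h depends only on the last 64 bytes.
        h = ((h << 1) + GEAR[byte]) & MASK64
        size = i - start + 1
        if size < MIN_SIZE:
            continue
        if (h >> (64 - BOUNDARY_BITS)) == 0 or size >= MAX_SIZE:
            chunks.append(data[start:i + 1])
            start = i + 1
            h = 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def digests(chunks: list[bytes]) -> set[str]:
    """Content hashes of chunks, as a store would use to deduplicate them."""
    return {hashlib.sha256(c).hexdigest() for c in chunks}


# Usage: insert a few bytes near the front of a "binary file" and compare.
original = random.Random(1).randbytes(200_000)
edited = original[:100] + b"INSERTED" + original[100:]
orig_chunks = chunk(original)
edited_chunks = chunk(edited)
shared = digests(orig_chunks) & digests(edited_chunks)
print(f"{len(shared)} of {len(orig_chunks)} chunks unchanged")
```

Because cut points depend on content rather than on fixed offsets, an insertion only disturbs the chunks around it, and later boundaries line up again; with fixed-size blocks every block after the insertion would change. A store keyed by chunk hash can then keep and transfer only the chunks that actually differ.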
I have thought about starting something like this -- but I'm not sure this is the kind of thing where crowdfunding would allow me to build it out; and also not sure the big tech players would be willing to fund something like this (if they are, let me know though!).
I think the items that really have potential to address pain-points & be killer-features that motivate people to move beyond git are (3) and (7) (big files and big trees).
The rest are probably nice, but are more incremental improvements than something that's going to justify the huge leaving-the-standard-platform cost.
Regarding good UX, I very much agree that git's command line is insane, and makes it unnecessarily hard for people to understand.
However, I would disagree that the staging area is a bad idea! It's one of my favorite git features. I think it's just git's extremely confusing command-line terminology and shitty information display that make it seem confusing.
As a small piece of evidence in favor of this view, I'll say that I recently introduced a tech artist to git using the (extremely good!) GitUp app. He had no problem picking it up right away, including staging, committing, pushing, making branches & merging.
I think this is because the excellent GitUp UI represents the concepts in such a nice visual way that they become very clear.
I am not as familiar with Fossil, but it seems that one use case Fossil may be well positioned to address is the storage and revision of data sets along with data analyses.
For people working with any significant quantity of data, git was a backslide as compared to svn. For people wanting to do reproducible research, git does a great job storing the code, but the data assets must be stored out-of-band if they aren’t tiny.
Any thoughts as to how Fossil might or might not address that use case?
EDIT: I should add that, in addition to data uses, I’ve heard tell that git isn’t great for storing art assets along with code (e.g. video game development)
One thing I think would be a killer feature (though I'm not even sure if it's possible):
Supporting large binary assets in the repo without slowing down terribly.
I'm pretty sure that would be enough to get many people to switch. From there, you might see a network effect as people who didn't even care about that grew to like your other points of improvement, and eventually Fossil could overthrow git and become the reigning VCS champion.
My 2¢. ;-)
This is interesting. In my use of git I haven't ever needed to look at descendants. Perhaps that's just because the limitation is deeply established in my mind (git is really the only VCS I've used). How might this information be helpful in normal usage?
That's my theory, too. I never needed to run "bisect" until I had the capability to do so. Now I can't seem to live without it.