Fossil vs Git (fossil-scm.org)
301 points by afiori 21 days ago | 247 comments

I'm a big, huge fan of recording what should have happened instead of recording every typo and forgotten semicolon in your history. There's a difference between draft commits and published commits. When I'm reading published commits, i.e. history, I just want to know your intent, not your typos. So what are the tools that Fossil offers to make sure I don't have to see your typos in the history?

I'm also a big fan of not deleting data. I don't like squashing commits, for example. But I also want to be able to see high-level intent.

If instead of "squashing", it were "grouping", I'd be happy. I could encapsulate a bunch of messy commits that I made while I didn't know what I was trying to do. The intent would be clear at a higher level, but if you want to dig in to see what it actually took me to achieve that, you can see all my experimentation by looking at the commits inside the group.

Groups should be able to be nested, of course.

The only way I know how to achieve this in git is by relying on no-fast-forward merges. There's a post detailing this approach [1], but unfortunately a lot of git tools don't support this workflow that well. I haven't gotten it to work well in gitlab, for example.

[1] Git First-Parent-- Have your messy history and eat it too : http://www.davidchudzicki.com/posts/first-parent/
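For anyone who hasn't seen the no-fast-forward approach, here is a minimal sketch of it in a throwaway repository. The branch names and commit messages are made up for illustration, and this is only one way to set it up, not necessarily what the linked post does step by step.

```shell
set -eu
# Throwaway repository; every name below is hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"

git commit -q --allow-empty -m "Initial commit"

# The messy work happens on a topic branch.
git checkout -q -b feature
git commit -q --allow-empty -m "WIP: try approach A"
git commit -q --allow-empty -m "Fix typo"
git commit -q --allow-empty -m "Approach B works"

# --no-ff forces a merge commit even though a fast-forward is possible,
# so the merge commit acts as the "group" for the three commits above.
git checkout -q main
git merge -q --no-ff -m "Add feature (group of 3 commits)" feature

# High-level view: follow only the first parent of each merge.
summary=$(git log --first-parent --format=%s main)
# Detailed view: every commit, including the ones inside the group.
full=$(git log --format=%s main)
printf 'first-parent:\n%s\n\nfull:\n%s\n' "$summary" "$full"
```

The first listing shows only the merge and the initial commit; the second also shows the three commits inside the group. Nesting falls out naturally if a topic branch itself contains no-ff merges.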

You misunderstand me. Almost everyone does. I do not think you should squash all of your changes into a giant hairball commit, and I don't think first parent (which is effectively the same thing) solves this problem either.

I think each of your commits should be individually rewritten until each commit makes sense and tells a single, individual story that makes sense on its own, while at the same time be completely atomic and as small as possible.

You created a new function? That's one commit. Take a moment to explain why this function is going to be useful in future commits.

You called that new function from several new spots? That's another commit. Explain why each of these calling sites requires this function.

You decided that there had to be some style and whitespace changes? That's another commit. This one can go without explanation.

You found an old bug along the way that can be fixed with a one-line change? That's another commit. Hey, nice catch. Perhaps a few lines about this bug and why your small change fixes it?

Together, these are all individual scenes of a larger story. But the larger story doesn't need the behind-the-scenes of how you came up with these scenes. I don't need to see all your drafts of each scene. I just want the final scenes that make up the final story.

Ideally it should be this way, but it's impractical in reality.

It requires that you either stop your development workflow to commit as you go along, or that you untangle all the pieces after they're already entangled.

If you commit as you go, it's an expensive mental switch to fire up git and also run all the tests (since surely part of this workflow is to apply the principle that no commit should ever break the build). You also take an extra productivity hit every time you change your mind about something a little later (e.g. you added the function getFoo() but realize it should have been called findFoo()).

If you work for a while and then try to bundle up small, atomic changes, that can also be very difficult. Tools like git group together contiguous chunks of changes when committing, and prying them apart later can be difficult. I often do this with a combination of "add -p" and then "stash save -k" to temporarily get rid of things unrelated to what I'm committing, but it's a chore. During a selective "add -p" session you have to mentally keep track of what belongs together, thus what dependencies are between every chunk you're adding.
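A rough sketch of that stash-based untangling, for readers who haven't tried it. It stages whole files instead of using the interactive `add -p`, so that it can run unattended, and it uses `git stash push --keep-index`, the current spelling of `stash save -k`. File names and contents are invented.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

printf 'one\n' > a.txt
printf 'one\n' > b.txt
git add a.txt b.txt
git commit -q -m "Initial commit"

# Two unrelated edits end up in the working tree at the same time.
printf 'two\n' >> a.txt
printf 'two\n' >> b.txt

# Stage only what belongs in the next commit. (`git add -p` would do this
# interactively, hunk by hunk.)
git add a.txt

# Hide everything that is not staged, so the tree matches the pending commit.
git stash push -q --keep-index

# At this point the tests could run against exactly what will be committed.
during=$(cat b.txt)

git commit -q -m "Change a.txt only"

# Bring the unrelated edit back into the working tree.
git stash pop -q
after=$(cat b.txt)
```

With partial hunks inside the same file the final `stash pop` can conflict, which is part of why this feels like a chore in practice.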

Committing as you go is easier, but it's slow, and doesn't work well when you're working across many files with a big change that introduces new semantics in a lot of places. Both techniques require that you keep track mentally of which parts are related, of course.

Untangling is what I mostly do. I consider the untangling my own internal code review. I need to read my own diff and figure out what goes where and what each part does and why it's necessary. My commit messages are then my own code review comments.

I figure if I don't carefully read my own diff, why would anyone else? And once it's untangled, I am hoping others will find it easier to read too.

Git doesn't provide as many tools as I would like to make this process easier. It's partly why I don't use git. Mercurial's absorb command helps a lot: it absorbs changes from your working directory into the appropriate draft commit that corresponds to the same context:


Wait, it appears someone finally ported it to git:


That's a cool script. I will definitely try that. Augmenting commits by doing partial commits then fixing with "git rebase -i" and squashing with "fixup" takes so much time and mental effort just to not make a mistake.
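For reference, the manual fixup route can be sketched like this. It is a toy repository with invented file names, and `GIT_SEQUENCE_EDITOR=true` is used only so the generated todo list is accepted without opening an editor.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

printf 'project\n' > README
git add README
git commit -q -m "Initial commit"
base=$(git rev-parse HEAD)

printf 'getFoo\n' > api.txt
git add api.txt
git commit -q -m "Add Foo lookup function"
target=$(git rev-parse HEAD)

printf 'notes\n' > notes.txt
git add notes.txt
git commit -q -m "Add notes"

# Later: a correction that really belongs in the earlier commit.
printf 'findFoo\n' > api.txt
git add api.txt
git commit -q --fixup "$target"

# --autosquash moves the "fixup!" commit next to its target and folds it in.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash "$base"
subjects=$(git log --format=%s)
printf '%s\n' "$subjects"
```

After the rebase there are three commits again, and the rename lives inside the commit that introduced the function. The mental effort is in picking the right target for each fixup, which is the part absorb automates.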

It still doesn't solve how to disentangle changes that have become interdependent. For that you have to concentrate on committing atomically and planning ahead a lot.

I don't think this is impractical.

I've been using this approach successfully for 8 years now on tens of open source projects and various company code bases of all sizes.

It does take a small amount of overhead (I measure this, and for me it's around 5%). But that pays off immediately as soon as you or someone else reads it a few weeks later when investigating an issue.

To toss in a counter-point: I do commits as I go and occasionally go back and make changes so it's a coherent sequence. For the most part, once you are fluent with Git[1], I've found it to be a productivity improvement, and code reviews have been both faster and more useful.

If you're doing two semantically different things, put it in two different commits. If it's one, put it in one (merge commits work too). That's just good change-hygiene, for the same reasons you try to isolate behavior in code, rather than mashing it all together into a single func just because you happened to be doing it all around the same time.

tl;dr we don't name our funcs "june_27_through_29", don't name your commits like that.

[1]: a huge investment, so I totally get why this isn't an early-coder practice, and it's rather painful. but IMO worthwhile, usually I see people spending far more time fighting it than it would take to learn it.

I use gerrit for everything and this workflow is exactly what it gets you (well, to be clear, my workflow is commit-per-issue resolved, not commit-per-function added, though you could use it that way also). I highly recommend it or a similar tool.

I wrote a post three years ago about my switch:


While I am following this conversation closely, I wanted to politely make a suggestion about something you said:

"You misunderstand me. Almost everyone does."

That sounds really frustrating. It's not clear from your post whether you mean this as "everyone who reads this comment does not understand me" or "people frequently misunderstand me"... but typically, someone who feels this way experiences this in the latter, general sense.

Good news: it is possible to dramatically improve the ratio of people who understand you. It will, however, require that you change how you communicate some kinds of information.

What I'm saying is that if people don't understand you, the first place you need to look is your own tools for communicating with people. Of course, what you are saying makes sense to you. But there's an intellectual fallacy at work if you assume that "almost everyone" is the problem and you are the solution. :)

Have you ever heard the saying, "The only thing all of your crazy ex-partners have in common is you?"

There are people who are tasked with explaining far more complicated concepts than source control who people claim to love learning from. That means there is hope for us all.

Yes, I know I'm not communicating effectively. I shouldn't have implied that everyone is wrong but me. I know that there are people who understand me, but in a world of github and pull requests, it's really hard for me to explain what the world looks like otherwise.

There is a name for what you are describing, it's called "atomic commits". Every commit can stand alone and at every commit the software should be in a working condition.

I mostly agree with you, but I think this might be going a little too far:

> You created a new function? That's one commit. Take a moment to explain why this function is going to be useful in future commits.

> You called that new function from several new spots? That's another commit. Explain why each of these calling sites requires this function.

In my opinion each commit should make sense on its own. It doesn't really make sense to create an unused function, so these two changes should really be one commit.

Maybe. I think both ways can make sense, and if you insist on having the function and the calling sites in the same commit, that can make sense.

I think my proposal can also make sense because (1) it splits up the commits into two units that still keeps the codebase in a stable state [this is my rough metric for what "atomic" means] and (2) defining a function requires some independent contemplation about why that function is defined that way. Inserting that function into calling sites can be a logically distinct operation if the calling sites are varied and distinct enough to each have a different reason to now require this function.

Thus the two commits can be semantically distinct.

I see this the same way. Like the notion that purely cosmetic changes should be committed distinctly in order to not clutter up commits that change behavior (so they are simpler to read), it makes sense to me that one would commit entirely new units of code separately before re-wiring the logic of the existing code accordingly. That way the first commit basically states "I've built this new thing", and I don't have to consider 300 random existing codepoints in understanding what it does.

that doesn't scale. I personally prefer a single squashed commit linking to a full discussion in a PR/MR: do a git blame, even if you get a giant hairball commit, you should be able to trace it to a review process (PR/MR) where it was discussed and thoroughly reviewed.

It scales quite well. Linux itself is developed in this way. Or perhaps you think Linux isn't at a large enough scale? (No sarcasm, I know that there are projects out there much bigger than Linux.)

I'm not sure about "scale" but Linux, as an open-source project where external actors want their code included, can push arbitrary amounts of work onto those actors with no additional cost to itself.

They can say "to get your code upstream you have to do twice as much work" or 3x or 4x or whatever. It's not their cost to bear.

An internal team pays that cost. They have to consider whether the trade off of having a pristine commit history is worth the additional overhead of doing it.

I personally care more about PR size and am happy to squash all commits in a PR. If that is too big I'd rather see multiple smaller PRs.

I think the extra work for a single developer to perform atomic commits is justifiable.

How does this work with multiple developers working on the same repo? I'm assuming everyone should work on their own feature branch and send PRs once their branch is done? Should the commits also be tagged by the feature branch they're on? Should the CI approval workflow be run against any combination of commits on the feature branch or against the final HEAD?

With git, I think it helps to think of development in terms of patch series.

An individual commit is a single patch, intended to do one thing (and hopefully do it well), and a feature branch is a patch series.

A pull request is then a request to review the series. If you need to change things, git allows you to rewrite your commits to send in a revised set of patches.

Before merging, your CI would create a temporary branch off the current master, merge your feature branch to that, and run tests against the result. I don't think testing individual commits (fully, at least) in a series makes much sense if you're going to merge all of them to master anyway.
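As a sketch of that CI step, assuming a scratch branch called `ci-tmp` and a stand-in `run_tests` function (both hypothetical; a real pipeline would call its own test suite):

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

printf 'base\n' > app.txt
git add app.txt
git commit -q -m "Initial commit"

# The patch series under review.
git checkout -q -b feature
printf 'feature\n' > feature.txt
git add feature.txt
git commit -q -m "Add feature"

# Meanwhile master has moved on.
git checkout -q master
printf 'hotfix\n' > hotfix.txt
git add hotfix.txt
git commit -q -m "Hotfix on master"

# CI: merge the series into a scratch branch cut from current master.
git checkout -q -b ci-tmp master
git merge -q --no-ff -m "CI trial merge of feature" feature

# Stand-in for the real test suite.
run_tests() { [ -f feature.txt ] && [ -f hotfix.txt ]; }
if run_tests; then result=pass; else result=fail; fi

# Discard the scratch branch; master itself was never touched.
git checkout -q master
git branch -q -D ci-tmp
printf 'trial merge: %s\n' "$result"
```

The point is that the tests run against what master would look like after the merge, not against the tip of the feature branch in isolation.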

Not sure why this was downvoted. Squash’n’Merge on Pull Request keeps master history clean while tolerating shenanigans on feature branches.

If important things are lost then perhaps the PR was too large.

To do this effectively, we have to change the way we edit code, so it's an editor/workflow feature rather than an scm one. Trying to solve it downstream in scm is not productive.

How often do you actually look at this historic detail you seek to maintain? Daily, weekly, monthly? Is it more to satisfy a feeling than an actual need? I mean if some junior dev wallows on some branch for 40 commits, I don’t want to see any of that, I just want to see what was finally merged.

I look at git blame (using 'Annotate' as IntelliJ calls it) quite often to figure out reasons for some certain change/implementation logic. It irks me when the result is just some giant squashed commit with 40 lines. Which of these explains this specific line? _History_ itself, yeah, not that much.

Would it not be significantly more frustrating if you use git blame and you see:

> Revert: Some WIP didn't work out.

Git blame again from prior to that commit.

> Added missing semicolon.

and again:

> Fixed spelling.

and again:

> Stupid typo, wrong method call.

and again:

> WIP, going to see if X can work.

Before running git blame once more, finally getting to the commit message that actually pertains to the current line of code you're seeing. Including the explanation (commit message) for why it is the way it is, and very importantly when this line of code actually made it into the software?

I would prefer not to have these errors in the first place. They should be caught during review, and then fixed in the commits that introduced them with an interactive rebase. If this type of error is consistently getting through your review process, you should probably consider revising that process.

...and maybe consider adopting atomic commits -- the main reason I like them is not actually because they make it easy to look at history, but because they're easy to review and catch these types of errors. If each commit stands on its own, it's obvious when one doesn't.

> Would it not be significantly more frustrating if you use git blame and you see: [...]

Yes, it would be very frustrating. But you're presenting this as if it were the only alternative, and it isn't: I wouldn't approve a pull request that has commits like the ones you mention. I would ask for the history of the PR to be reworked into a logical sequence, just as laid out in this comment: https://news.ycombinator.com/item?id=19007171

"I think each of your commits should be individually rewritten until each commit makes sense and tells a single, individual story that makes sense on its own, while at the same time be completely atomic and as small as possible."

fwiw this is largely what `git log -L` (and maybe `--follow`) solves. You can log changes to a file or lines in the file and have it follow through moves.
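A small sketch of `git log -L` on a toy file, in case it helps. The file contents are invented, and the `subject:` prefix is only there so the commit subjects can be picked out of the patch output that `-L` prints alongside them.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

printf 'alpha\nbeta\ngamma\n' > file.txt
git add file.txt
git commit -q -m "Add file"

printf 'alpha\nBETA\ngamma\n' > file.txt
git commit -q -am "Uppercase beta"

printf 'alpha\nBETA\ngamma\ndelta\n' > file.txt
git commit -q -am "Append delta"

# -L <start>,<end>:<file> walks the history of just that line range.
# Here: line 2 of file.txt as it exists in HEAD.
touched=$(git log -L2,2:file.txt --format='subject: %s' | grep '^subject: ')
printf '%s\n' "$touched"
```

Only the two commits that touched line 2 are listed; "Append delta" is skipped because it changed a different line.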

Granted, most tools don't make that easy to do. But most Git tools are rather blind mimics of simple CLI commands with a nice UI (which can be a huge help), rather than being value-adds in terms of behavior or understanding.

Can you explain this a bit more?

There is a difference between squashing together thirty commits of someone working on one concrete thing (I don't need to see all the mistakes and reworks you made, I just want to see the result in a nice, easy to read diff), and thirty commits of someone working on thirty things.

The latter is, of course, wrong as it makes the repo history harder to read, while on the other hand the former improves readability.

It would be nice if in the first case one could annotate those thirty commits in a narrative "here I started another attempt", "this solution could not work for X,Y and Z reasons", "these 8 commits are just typos" maybe with also the ability to only select a subset of a commit.

I've recently added the ability to associate a wiki page in Fossil with an individual check-in or with a branch - as additional documentation about that branch or check-in. This is similar to your concept, if I understand you correctly. The Fossil changes have worked well enough so far. But only time will tell if this ends up being a good idea or not, I suppose.

An example is the "About branch begin-concurrent-pnu-wal2" at the top of the page https://www.sqlite.org/src/timeline?r=begin-concurrent-pnu-w... - the page shows all check-ins for the branch, and the Wiki at the top gives a summary of what that branch is about.

Another example is the detailed discussion in the "About" section for check-in https://www.sqlite.org/src/info/718ead555b09892f - important information that records the thinking about this commit but which seems too prolix for a check-in comment.

Let me know your thoughts on this idea.

If I understand it right, it is pretty much what I had in mind (assuming that you can always create new branches on old commits).

In the last week I have been quite charmed by many of Fossil's ideas, so I will for sure try this feature too.

Do you require this functionality and for people to write meaningful comments? I just question if version control is where any of this should happen.

Personally, how I tend to work is if there’s some link between commit and ticketing system I can refer to, it’s about the best you can expect.

I like to write tests (in a separate repo) which iterate over each commit and mark the point in history where they start passing--which is usually when the feature was implemented--and (more importantly) the point in history when they start failing again.
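A very reduced sketch of that idea, assuming the test is a single shell function called `check` (hypothetical; in the setup described above the tests live in a separate repo) and the history is linear:

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"

printf 'project\n' > README
git add README
git commit -q -m "Initial commit"
printf 'ok\n' > feature.txt
git add feature.txt
git commit -q -m "Implement feature"
printf 'more docs\n' >> README
git commit -q -am "Update README"
printf 'broken\n' > feature.txt
git commit -q -am "Break feature"

# Stand-in test: passes only while feature.txt contains the line "ok".
check() { grep -qx 'ok' feature.txt 2>/dev/null; }

# Walk the history oldest-first and report every pass/fail transition.
report=$(
  prev=fail
  for rev in $(git rev-list --reverse main); do
    git checkout -q "$rev"
    if check; then now=pass; else now=fail; fi
    [ "$now" = "$prev" ] || printf '%s at: %s\n' "$now" "$(git log -1 --format=%s)"
    prev=$now
  done
)
git checkout -q main
printf '%s\n' "$report"
```

The two reported commits are where the feature started working and where it stopped. The smaller those commits are, the shorter the conversation about what went wrong.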

These points usually indicate communication/comprehension errors involving two developers. When I bring the problem to the developers' attention, their reactions differ based on their commit style. If they have atomic commits it's usually a five minute conversation because the nature of the problem is immediately apparent. If they have large squashed commits, I usually have to bring both developers together and have them fight over whose problem it is.

So I would say... a couple times a week, but the overall time savings of finer granularity is significant because it limits the number of parties that end up huddled around a single screen.

I'd rather have it and not need it than need it and not have it.

Besides, squashing means the identity of the commits changes, doesn't it? So you can't merge the same branch into 2 different branches (like merging a bugfix into both the release branch and the trunk) while keeping the identity of the commits - then when you merge your release branch into your device branch you get wonky duplication of commits in the history.

I notice that move detection seems to get messed up by squashing sometimes...

But maybe I'm using Git wrong, which imho is the biggest flaw of Git - it's so flexible that there are so many ways to "use it wrong".

> I'd rather have it and not need it than need it and not have it.

Yeah but again, have you ever needed it? Because you can say the exact same thing about preserving every sequence of backspaces, deletes, and key typings into an editor. But when was the last time you or anyone needed that level of granularity?

> I'd rather have it and not need it than need it and not have it.

But squash commits do let you locate the identity of the authors. You just have to look at the PR, where all the original commits are listed

Often? When you're trying to see when a specific change occurred you often have to go down to the specific commit, a high level group isn't good enough (particularly when the group could be shared by multiple people).

Then don't squash atomic commits representing single, logical changes. The point is that if you have 3 commits in a row correcting typos, squash those 3 into a single atomic, logical, commit.

>If instead of "squashing", it were "grouping", I'd be happy. I could encapsulate a bunch of messy commits that I made while I didn't know what I was trying to do. The intent would be clear at a higher level, but if you want to dig in to see what it actually took me to achieve that, you can see all my experimentation by looking at the commits inside the group.

That oddly sounds like a feature available in mercurial.

I don't care about how many typos you made while developing a thing. I only care about your upstream's history not getting re-written, but I don't care to see your dev history -- it's a distraction at best.

this reply is way too late, but yeah, I get that you don't care. I don't care either.

But if the developer stored that information in a commit, and there is a cheap (computationally and cognitively) way to keep that information, then I don't want to delete it.

It's not charitable for you to call it a distraction at best. There might be useful information available in the dev history. You shouldn't be forced to see it (that would be distracting), but you shouldn't be precluded from seeing it either.

I sometimes go back and “group” commits with a ‘git rebase -i’ and squash commits that are related.

Is that similar to what you’re talking about?

No, I believe what the author is asking for is to keep those changes in git, but have git manage grouping of history logs for you.

E.G. you have worked in a feature branch, and committed 10 times - let those commits be kept in the log, but when running git log there must be a flag that allows for filtering based on how granular the output must be.

That can be done with rebase, but then you lose history.

I am pretty sure you can implement something like that in git based purely on commit message content.

There is no theoretical reason you can’t maintain two histories, e.g. when rebasing have a “rebase-merge” commit that has the hash of the other tree, and optionally keep that history around in the git repo. Then you could do a ‘git blame --orig’ or whatever to switch between immutable and cleaned up history.

No VCS I’m aware of supports this. But they COULD.

Your suggestion is more flexible, but a simple way to group commits is to preserve the original branch structure like Mercurial.

I tend to commit, read my commit, and then find all kinds of mistakes and have to amend or redo the changes on a separate branch. If those options aren't available I just have a messed up commit history. This is all because git makes modifying your commit history very difficult to do. I think this immutable feature makes git worse because I have no intention of lying to myself or my team about my commit history. I would just like easier tools for pruning and organizing it. Instead my commit history is always a mess that better represents my knowledge of git than the progress of whatever I'm working on.

Are you pushing the commits with errors? Are you merging the commits with errors? If both of those are true then this sounds like a process issue.

Git makes it extremely easy to edit history, with the ability to amend any commit; even several commits back with simple CLI tools like `git rebase -i <ref>`.

However, what it doesn't like you doing is ripping the rug out from other people i.e. editing history team members are basing their work on.

The entire purpose of distributed version control is that development should always happen on a branch. Whether that's a local branch, or some temporary pushed branch (pull/merge request branch etc.) In both cases you can safely rewrite history.

However, `master` (or whatever mainline branches you have) should never contain simple mistakes (e.g. non-compiling code), because the code should have been reviewed before being merged. Of course bugs (non-simple mistakes) will happen, and these ought to be fixed in future commits. Bugs that make it into pre-release or release builds (i.e. `master`) shouldn't be edited out of history or forgotten.

Hmm, okay, I should learn how to rebase then.

The biggest part of my problem is a total lack of commit discipline but there are times when I'm working on a branch where my commits don't tell a clear story (changed something then changed it back because I decided to do it a different way). That's when I most wish for better ways to tell that story.

I feel like an idiot for not knowing rebase could solve some of this for me. ...will definitely try it next time.

That back and forth is the most important part of the story! It shows that you thought through multiple approaches to the problem, (hopefully) why they didn't pan out, and they give someone else a starting point for returning to that approach in the future.

It isn't exactly rare that I go through the blame history on some project to find out why something was done in a way that seems stupid at first glance, just to get stuck on a giant squash commit saying "Implemented X".

"back and forth" is not the same thing as "all kinds of mistakes".

No-one cares about stray keystrokes other developers make, it's just noise.

Yes, we absolutely care about the design of the software we're working on, and that's what commit messages, self-documenting code, comments, issue trackers and project management (planning session etc.) are all for.

When you squash commits in Git, the default generated commit message even combines all your previous commit messages. Now is your chance to look at those old messages and change "Did X" to "Attempted X, but didn't work because Y".

When I'm investigating when and why some code was implemented the way it is; I don't want to look at a Git blame trying to find when something was changed, just to see that the most recent change was reverting some earlier messing around. Just to git blame again starting from just prior to said messing around, just to see the same thing again - noise is bad!

> No-one cares about stray keystrokes other developers make, it's just noise.

Sure, and `git commit --amend` is fine for those cases.

> When I'm investigating when and why some code was implemented the way it is; I don't want to look at a Git blame trying to find when something was changed, just to see that the most recent change was reverting some earlier messing around. Just to git blame again starting from just prior to said messing around, just to see the same thing again - noise is bad!

I guess that depends on your setup. My Emacs is set up so that `b` is "reblame from before this change". GitHub's blame UI has a similar button (though that, sadly, doesn't preserve in-file context).

At that point the cost of the "noise" is more or less zero.

What if my repository is linked to a CI process that deploys the live code? And the change needs to be done “now”?

If you are working for a larger company where processes are clearly defined, then it’s good for you and that feature is not needed. But you are losing all of the agile features of git in the first place.

In my situation squashing history takes away my other ability to use git as a wiki of “things that didn’t work out”. It’s important to keep that.

> git as a wiki of “things that didn’t work out”

Wouldn't a wiki be the best solution for that? Git is a software development tool, not a design or project management tool.

In git you can do `git commit --amend --no-edit` to update your last commit, in case anyone is wondering.

I don't think it updates your last commit, it just deletes it and creates a new one with the same changes (along with any new changes being amended). It's an important distinction because if you push to a remote branch that has the old commit, you need to overwrite it with the new one. Commits themselves are immutable.
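A quick way to see this for yourself, in a throwaway repository with made-up contents:

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

printf 'hello\n' > file.txt
git add file.txt
git commit -q -m "Add file"
before=$(git rev-parse HEAD)

# Amend the commit with an additional change, keeping the message.
printf 'hello world\n' > file.txt
git add file.txt
git commit -q --amend --no-edit
after=$(git rev-parse HEAD)

# The old commit object is still in the object store, untouched;
# the branch simply points at a brand-new commit now.
old_type=$(git cat-file -t "$before")
printf 'before: %s\nafter:  %s\nold object is still a: %s\n' "$before" "$after" "$old_type"
```

The two hashes differ even though the history still shows a single commit, so a remote that already has the old one would need a forced push (for example `git push --force-with-lease`) to accept the replacement.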

I don’t squash commits very often, but I do re-order them. When i have a pile of changes to commit that aren’t always perfectly related, I will sometimes check in the cart before the horse. Rebase lets me correct that order.

So perhaps fossil really “shows what actually happened” but perhaps the more accurate statement is it shows the actual commit order. And that may not reflect the actual order of what was done, leading to an inversion of the logical progression of the code.

That said, I’m going to try it anyway for the ticket system and wiki. I use a two tier vcs system. “Official” work goes into svn and I have no control over that. Interim work for me is in git. But I’d like tickets and wiki docs to help me track issues that I can’t get to right away and sometimes forget about after a month or two.

I want clean, linear history in my upstreams. Always. You can leave pointers to your work branch(es), and even use a second parent commit in commits for this, but no actual merging, always rebasing, and always preserving clean, well-broken-up commit history.

I can't even remember the last time I was looking at a commit in git or any other source control system. I usually look at pull requests and what has changed.

Example files changed view in Github:


I chose Fossil at a time when it wasn't clear which one of Fossil, Mercurial, or git would "win".

I chose it because it had a very clear and easy UI, guided you towards a way of working that is suitable for small teams (ie 99.99% of all projects), and had an approach to history which made it unlikely you'd ever accidentally lose work.

The only thing which has ever made me sad about my choice is pressure from people who think everyone should use git.

It also has some very nice features, like being able to give you a full web UI just by running it as a CGI process on a web server.

It is worth trying (and it can import and export git history). Just remember it's not git.

For what it's worth, Git has a command for an instant web server https://git-scm.com/docs/gitweb

Been there, but with Mercurial. We chose it because it has sane CLI interface (as opposed to git), but eventually had to switch because we wanted to use GitLab. Not much difference all in all, except with git I still need to search the net to find a proper command, while with hg I could usually guess it and just checked help for confirmation. It saddens me that git won... Awful UX. But they had GitHub and now GitLab.

Git: One check-out per repository

Fossil: Many check-outs per repository

git allows multiple checkouts per repo.

Official docs (good luck): https://git-scm.com/docs/git-worktree

Random person's blog that explains it more clearly: https://www.saltycrane.com/blog/2017/05/git-worktree-notes/

I have no idea what use case is satisfied by git worktree, based on that blog post. In the case that you desperately needed to have two branches checked out, why not just clone twice?

With multiple clones, you would need to remember to update each clone regularly not only from remote, but also push/pull locally in case you want to compare your state with other branches.

A 'git checkout' on a large repository such as the Linux kernel may take a while (bound by I/O performance). Regularly switching between branches with many changed files becomes annoyingly slow.

If you keep multiple clones, you actually keep multiple copies of the full history, which seems like a waste of disk space (yes, deduplicating filesystems exist, but are rarely used).

git worktrees are also very useful if you have unfinished work on a release branch. You do not have to make temporary commits or stashes you forget about, you can just leave the modified files as they are and switch to a different worktree to continue your work on another branch.
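A minimal sketch of that situation, with invented directory and branch names:

```shell
set -eu
base=$(mktemp -d)
cd "$base"
git init -q project
cd project
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"

printf 'v1\n' > app.txt
git add app.txt
git commit -q -m "Initial commit"
git branch release

# A second checkout of the same repository, on the release branch.
git worktree add ../project-release release >/dev/null 2>&1

# Unfinished work simply stays where it is in the release worktree...
printf 'half-done fix\n' >> ../project-release/app.txt

# ...while work continues on main in the original directory.
printf 'new\n' > feature.txt
git add feature.txt
git commit -q -m "Work on main"

release_branch=$(git -C ../project-release rev-parse --abbrev-ref HEAD)
release_dirty=$(git -C ../project-release status --porcelain)
trees=$(git worktree list --porcelain | grep -c '^worktree ')
printf 'worktrees: %s, release branch: %s\n' "$trees" "$release_branch"
```

Both directories share one object store, so there is no second copy of the history and nothing to push or pull between them.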

`git clone --local` does hardlinks, space usage is quite low, so I'm not really sure that part holds up. Maybe in Windows? Switching branches can definitely be slow tho, yeah. Do worktrees change that somehow tho?

From all I've read so far (quite limited!) they just sound like replacements for `git stash` or making a temporary commit / branch, but with a new set of commands and rules to learn. I don't find `git commit -am tmp` (literally, I just do that) to be particularly worth optimizing further, and worktrees so far sound like substantially more work.


edit: ah, yeah, updating multiple clones is definitely annoying / easy to forget, totally agreed there. that alone might make it worthwhile. I only need it like once a year so I probably won't, but I do know some coworkers who do it a lot.

git worktrees can also be useful if your build system needs different arguments per branch. For example if you keep multiple release branches installed side-by-side on your machine, it is easier to just run `./configure --prefix=/opt/release-X.Y` only once on a worktree instead of repeating this procedure every time you switch branches. That way you can even keep all your object files around and save the time to compile them again.

The same also applies to languages such as Python or Node.js, where you might have a different set of dependencies depending on the branch and don't want to regenerate your virtualenv or node_modules on every branch switch.

aaahh, so it can track un-tracked files too? that I can definitely see being useful - the lack of a "local" / "remote" git ignore split makes this kind of thing hard :|

(yea, there's .git/info/exclude, but you can't add it to a branch and have it only exist locally. and it has weird interactions when something becomes tracked later.)
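For anyone who hasn't used it, a quick sketch of the `.git/info/exclude` mechanism mentioned above. The `build/` directory is a hypothetical example; the file lives only in your clone and is never committed or pushed.

```shell
set -eu

# Throwaway repository; the build/ directory is an illustrative assumption.
demo=$(mktemp -d)
git init -q "$demo/repo"
cd "$demo/repo"
mkdir -p build
echo "object code" > build/out.o

# Before: the untracked directory shows up in status.
before=$(git status --porcelain)

# Ignore it locally only; unlike .gitignore, this file is never versioned.
mkdir -p .git/info
echo "build/" >> .git/info/exclude

# After: status no longer mentions it.
after=$(git status --porcelain)
echo "before: '${before}' after: '${after}'"
```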

> why not just clone twice?

Sometimes this is fine, but if the history is large enough (like in the case of the Linux kernel), it can start to get out of hand space-wise. If you have a deduplicating filesystem, that can help, but only so much.

Of course, there is a way to use clone which hardlinks the objects.

I use it to build multiple versions of Go from source locally. When a new release branch appears, I just `git worktree add release-branch-1.12 ../go1.12`, then `cd ../go1.12/src` and `GOROOT_BOOTSTRAP=../go1.11 ./make.bash`. It makes it very easy to try out betas while staying in sync with patch releases. It’s also great for keeping an old version of Go around if you have some project that requires it.

I usually use it to keep an unrelated branch easily accessible. For example:

- website generator (JS stuff) on "master" branch checked out at $PROJECT_ROOT, and website content (Markdown files) on "content" branch checked out using git-worktree at $PROJECT_ROOT/content.

- Project source code on "master" branch checked out at $PROJECT_ROOT, and images needed for GitHub README file in "media" branch checked out using git-worktree at $PROJECT_ROOT/media.

I never use it for quickly switching topics (that's the git-stash use case, or you can just make a temporary commit).

A workflow in a few projects I work with:

Have a "develop" branch that does not contain any generated files, and a "master" branch that does. Have a post-commit hook that uses a worktree to pull every commit from "develop" to "master" and generate any files.

That type of workflow is possible by cloning the local repo, but it is kludgey and brittle enough that, prior to worktrees, I had never considered that workflow viable.

I usually leave release engineering tasks such as denormalizing data (generating derived files) to release time and Fossil has a particular place to keep these bits, as unversioned artifacts (which are optionally synced, on a per repository basis).

I usually agree with that. The projects I was referring to are either Go (where the norm is to check in code-gen'ed files, so that things are `go get`able), or static websites (where every commit will get rolled out as soon as you push it).

The main reason I'm using worktree instead of two clones is that you can commit in one and cherry-pick in the other immediately, without pushing/fetching. You can rebase the branch in development in worktree 2 on top of something else in worktree 1.
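A small sketch of that commit-here, cherry-pick-there flow, in a throwaway repository. The branch name `feature` and the file names are assumptions for the demo; the point is that no push or fetch happens between the two checkouts.

```shell
set -eu

# Throwaway repository; branch and file names are illustrative assumptions.
demo=$(mktemp -d)
git init -q "$demo/repo"
cd "$demo/repo"
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"
git config user.email "demo@example.com"
echo "base" > base.txt
git add base.txt
git commit -q -m "Base"

# Second worktree on a new branch.
git worktree add -b feature ../feature >/dev/null 2>&1

# Commit in the second worktree.
cd ../feature
echo "fix" > fix.txt
git add fix.txt
git commit -q -m "Fix something"
fix_commit=$(git rev-parse HEAD)

# Cherry-pick it in the first worktree right away: both share one object store.
cd ../repo
git cherry-pick "$fix_commit" >/dev/null
picked_subject=$(git log -1 --format=%s)
echo "main now ends with: ${picked_subject}"
```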

Shared set of local branches, they might be many

Disk space?

git clone --local will hardlink the existing git objects from the first checkout so no extra space is required.
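You can check the hardlinking yourself with something like the sketch below. It assumes both clones sit on the same filesystem (otherwise git falls back to copying) and counts object files that have more than one hard link.

```shell
set -eu

# Throwaway repositories; paths are illustrative assumptions.
demo=$(mktemp -d)
git init -q "$demo/first"
cd "$demo/first"
git config user.name "Demo"
git config user.email "demo@example.com"
echo "some content" > file.txt
git add file.txt
git commit -q -m "Initial commit"

# Object files with more than one hard link: none before cloning.
linked_before=$(find .git/objects -type f -links +1 | wc -l | tr -d ' ')

git clone -q --local "$demo/first" "$demo/second"

# After a local clone, the object files are shared between both repositories.
linked_after=$(find .git/objects -type f -links +1 | wc -l | tr -d ' ')
echo "hard-linked objects before: ${linked_before}, after: ${linked_after}"
```

Note that only the objects present at clone time are shared; anything either clone creates afterwards is stored separately.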

Shitty monorepos that measure hundreds of GB.

One of the bullet point advantages listed for Fossil of allowing multiple checkouts per repository is also supported by Git. There has always been a way to do it, but since Git 2.5 or so, the "worktree" sub command was added for this purpose.

Huh. I've just used another git clone of the repo, but the worktree command seems to fit my uses very well. Thanks.

Git clone is still a fine solution, and the one I prefer.

Worktree adds the ability to share config and save disk space.

The other advantages of worktree are: you can update the remotes of all the checkouts at once with git fetch and you can see all the places your repository is checked out with a git command.

From the features in Fossil and not in git:

> Branches in Fossil have persistent names that are propagated to collaborators via push and pull. All developers see the same name on the same branch. Git, in contrast, uses only local branch names, so developers working on the same project can (and frequently do) use a different name for the same branch.

Since the default is to use the name from upstream when checking out a branch, the vast majority of the time devs working in git also use the same name for a branch. Given that, this seems better stated as a feature for git (Easily Renamed Branches, perhaps).
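A rough illustration of both behaviours, using a throwaway "upstream" and a clone of it. The names `feature-x` and `my-experiment` are made up for the example.

```shell
set -eu

# Throwaway upstream repository; all names are illustrative assumptions.
demo=$(mktemp -d)
git init -q "$demo/upstream"
cd "$demo/upstream"
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"
git config user.email "demo@example.com"
echo "hello" > README
git add README
git commit -q -m "Initial commit"
git branch feature-x

git clone -q "$demo/upstream" "$demo/dev"
cd "$demo/dev"

# Default: checking out an upstream branch reuses the upstream name.
git checkout -q feature-x
default_name=$(git rev-parse --abbrev-ref HEAD)

# Opt-in: a differently named local branch tracking the same upstream branch.
git checkout -q -b my-experiment --track origin/feature-x
renamed_upstream=$(git rev-parse --abbrev-ref 'my-experiment@{upstream}')
echo "default: ${default_name}; my-experiment tracks: ${renamed_upstream}"
```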

That is very silly of the Fossil community. On the surface Fossil branching is a lot like Mercurial branching, but under the covers it's just like Git's branching: branches are just bookmarks for commits. And Git very much has local tracking branches (that track upstream branches), so it's as Fossil-like as you want, but doesn't pretend to enforce a merge-only workflow.

That's really the bottom line when it comes to Git vs. all the others: Git doesn't enforce a merge-only workflow, not even as a default, while all the others generally do.

Mercurial, for example, resisted adding cherry-pick and rebase for the longest time (just like Fossil), and when they finally added it they had to do it in gratuitously different ways from Git (e.g., you can rebase, or rewrite history, but not both at once) because they are so damned opinionated about how we all should do our work. So now Mercurial has rebase, and it has "bookmarks" (i.e., Git-style light-weight branching), but you still can't really use it without heavy-weight branching, and you can still get into trouble where you end up with multiple tips on a branch, and it still tries to hide the machinery underneath, which makes it difficult to reason about.

Fossil also tries to hide the machinery underneath it, and that's its big sin. Just like Mercurial. The irony is that under the covers Fossil is a lot like Git but with a fabulous SQL store.

I believe the Fossil maintainers simply do not understand rebase. Or why it's necessary when you have thousands of people working on a codebase.

At Sun we did rebasing long long before that term existed, way back in the 90s (well before I went to work for them), and we did it using ancient tech known as Teamware. That's where Larry McVoy got some of his ideas, and thence Linus Torvalds. Later, when we selected Mercurial, we ended up doing rebase in Mercurial long before Mercurial adopted the feature -- we still didn't call it that, and it was all scripted. Indeed, you can rebase on Fossil just fine, if you're willing to script it around Fossil's primitive cherry-pick (which, incidentally, is pretty much how rebase is implemented in Git, though nowadays it's not scripted).
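To show what "rebase scripted around cherry-pick" means, here is a minimal sketch done with Git's own `cherry-pick` rather than Fossil's (so it says nothing about Fossil's exact commands). Branch names are assumptions; the loop replays each topic commit, oldest first, onto the new base.

```shell
set -eu

# Throwaway repository; branch and file names are illustrative assumptions.
demo=$(mktemp -d)
git init -q "$demo/repo"
cd "$demo/repo"
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"
git config user.email "demo@example.com"
echo "base" > base.txt
git add base.txt
git commit -q -m "Base"

# A topic branch with two commits.
git checkout -q -b topic
echo "a" > a.txt
git add a.txt
git commit -q -m "Topic: add a"
echo "b" > b.txt
git add b.txt
git commit -q -m "Topic: add b"

# Meanwhile the upstream branch moves on.
git checkout -q main
echo "up" > upstream.txt
git add upstream.txt
git commit -q -m "Upstream change"

# "Rebase" by hand: branch from the new base, then replay the topic commits.
git checkout -q -b topic-rebased main
for c in $(git rev-list --reverse main..topic); do
    git cherry-pick "$c" >/dev/null
done

history=$(git log --format=%s | tr '\n' '|')
merge_count=$(git rev-list --merges HEAD | wc -l | tr -d ' ')
echo "history: ${history} merges: ${merge_count}"
```

The result is a linear branch on top of the latest upstream commit, with no merge commits, which is the outcome `git rebase main topic` would give for this simple case.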

My theory is that Git -or any VCS- is only difficult to understand if you insist on abstractions that hide its mechanics completely. Abstractions are absolutely critical to what we do, so this might seem a bit incongruous, but I'm not asking for no abstractions here, just that their being leaky is actually a very useful thing in the context of VCS.


I also would rephrase this as a missing feature from Fossil rather than a feature that is missing from git.

> Fossil, in contrast, puts more emphasis on recording exactly what happened, including all of the messy errors, dead-ends, experimental branches, and so forth.

I think this is an interesting distinction. The question is: what is the function of the history? Is it to document what happened - and if so, shouldn't every keystroke be committed? Or is it to document which changes relate together, and e.g. should be reverted together? Or perhaps something else entirely?

One thing that isn't mentioned is whether Fossil supports working offline as well as Git does. My impression from this page is that it emphasises creating every branch on a server as well, which implies you need to be connected? If so, I'd consider that a "missing feature" as well, as it's something I use regularly enough and that's important enough that it'd be something I'd miss if it weren't there.

> One thing that isn't mentioned is whether Fossil supports working offline as well as Git does. My impression from this page is that it emphasises creating every branch on a server as well, which implies you need to be connected? If so, I'd consider that a "missing feature" as well, as it's something I use regularly enough and that's important enough that it'd be something I'd miss if it weren't there.

The way fossil works is that by default autosync is turned on, which means that if you are connected to the internet all of your commits, wiki and issue tracker changes are synced to the server.

In the case that you're not connected, everything still works as expected, and will sync back up next time you're online and try to commit or just run `fossil update`.

Thanks, that makes sense then.

One should not have to choose - the history of complete units of work should be an abstract view of the detailed history.

So the abstract view should be useful and free from noise, but you want to be able to get at the noise anyways. But to what end? What's it to you if I typo and make other silly errors during development? What if I never commit until I'm done?

If I never commit until I'm done you will not see my mistakes, but if there's nothing like `git add -e`, then I won't be able to split up my commits logically and so the upstream history when I push will be devoid of useful detail. Even if I do commit early and often, I will almost never do all my work in the order and logical splits that makes most sense for others to see after I'm done... unless you give me a powerful rebase facility.

No rebase -> no clean history upstream, only lots and lots of merges with pointers to messy history (where authors commit early and often) or shallow history (where they commit only when done) in branches you'll almost never want to see. This is the worst possible outcome! The only logical history organization here is "the large merge commit". You get far too little abstraction on the actual history and far too much noise if you want more detail.

There's also the risk that, because you cannot separate bits of work as "done" and push them separately, your branch will accumulate ever more change and reach a point where you cannot continue without doing much more work, and then you won't be able to easily salvage any of the work done to that point in that branch. This one is a big deal to me too.

What I ended up doing before git is having multiple workspaces and copying completed changes from my messy workspace to the clean workspace and committing from there. Git's partial commit feature just made it a lot easier to do what I was already doing.

I haven't looked into Fossil in enough detail to know if your objections are valid complaints about the way it does things, but they are not objections to the principle that I set out in my original post.

If you have to rewrite the actual history to get what you call a 'clean' history, rather than just overlay the detailed history with a sparse abstract lattice joining the key points, then it is not a history at all. This could be a problem if you need to do a post-mortem analysis, or if you make a mistake in the rewriting of the history.

I tend to regard frequent extensive rebasing as a process smell - not necessarily a problem, but a warning that there might be one.

When I worked with VCS that couldn't rebase, my strategy was simply to not commit until everything was perfect. I had a local branch. It was just not version controlled.

When I later started using git, my workflows simply became safer and easier.

Could you give an example of where it is necessary (in principle, not just because that's the way the tool works) to rewrite the detailed history in order to get the history you want?

I realize that we generally have to work with the available tools, but it is also useful to think about how things would optimally work. When I wrote that frequent rebasing looks like a process smell, that could be because the tool is not optimal.

I rewrite history (not in the upstream) every day, multiple times a day. I do it to split commits. I do it to squash commits. I do it to reorder commits. I do it to make my commits easier to review by whoever is doing code reviews for me. I do it to make my commits logical: bug fixes get their own commits, features get their own commits, tests get their own commits if that's what the upstream wants, ...

It's the only way to do things that yields useful history in the upstream. What is useful history in the upstream? It's history that others can read (linearly!) that is informative and makes sense and makes it easy to bisect, git blame, and so on, to find bugs, to understand changes in the smallest logical units.

This is impossible to do without rebasing.

If you find yourself copying changes to another clone to commit them one at a time, then you are rebasing, and you just didn't know it.

History in the upstream is sacred. Unpushed history is absolutely not.

Do you do code review using a formal tool, like Gerrit or Phabricator? If so you already have a "code review history", separate from the repo history. The code review history is at times interesting to review, because it contains discussions, tradeoffs, etc.

Given that we have this secondary history, why require a completely different tool to track and access it? That's just pointless duplication. We should track all the history using the same tool.

Now to your point, it's still useful to call out "final" commits for bisecting, blame, etc. So it would be good to group commits, and hide the detailed history by default.

I very much like features like built-in wiki (which is trivial to do in Git anyways, using either a separate branch or a separate repo with a name derived from the base repo), built-in issue tracker (this is harder to do in Git, though there exist projects that do it), built-in code review, ...

Still, I've worked with codebases sized in the hundreds of millions of lines of code. To deal with that level of complexity one needs things like OpenGrok, cscope, and so on, to find one's way around. And when it comes to history, I could not care less about past code reviews or history internal to a feature branch. When I need to do `git blame` or look through the commit history of a large codebase, I want to see clean history with a high signal-to-noise ratio. The more noise, the slower I'll make progress on understanding whatever code/history I'm trying to understand, therefore the slower I'll make progress on bug fixing or feature development -- I might even give up on history, and lose a lot of important information, if the noise level is too high.

For me the ability to rebase, and to require clean history, trumps all the great things in Fossil -- each and every one -- that Git lacks. And this even though I love Fossil's design.

Is it really true you never want to refer to a code review history? It can provide important context missing from even a well-commented commit.

Regardless it's possible to have both. An example is hg's changeset evolution. With changeset evolution, each commit has two histories: the repo history and the changeset history. Commands like `blame`, `log`, etc. show only the repo history; a separate set of commands accesses the changeset history.

An example where this is useful: sometimes rebasing can inadvertently produce bugs, such as collapsing two identical lines which ought to have been duplicated. `git blame` cannot check if that happened. But the changeset history, by tracking the rebase, can tell you that.

Yes, it is. In my many years in this business I have never gone back to any code review -- of my code, my reviews of others' code, or anyone's review of anyone's code.

EDIT: I suppose I might look at past code reviews when evaluating a candidate for employment. Still, there is no need to store those along with code. And if a code review comment needs to be recorded for posterity, it gets recorded in the code or in the commit comment.

No. I worked for years with software that had no way of changing history. It was never required.

It was required at Sun, in Solaris engineering. Merge commits were absolutely verboten (which prohibition was enforced by tooling) -- therefore pushes had to be linear. Commit commentary had a very specific required format.

Clean, linear history has never been required anywhere else I've worked, but I've done it ever since Sun taught me to. Just because it's not required doesn't mean it's not a good idea, and it can't be forbidden ("what happens on my dev instances, stays in my dev instances", and the only thing seen in the end is what I choose to publish, and it's going to be clean and linear).

Large projects at Sun used a rebase-heavy / rebase-only workflow like so:

                +-----------------+
                | Upstream "gate" |<------+
                +-----------------+        \
                  /     \                   \
                 /       \                   \
                v         v                   \
    +--------------+    +--------------+     +--------------+
    | Project gate |    | Project gate |     | Project gate |
    | (re)based on |    | (re)based on | ... | (re)based on |
    | build N of   |    | build N+1 of |     | build N+M of |
    | upstream     |    | upstream     |     | upstream     |
    +--------------+    +--------------+     +--------------+
            ^             ^
            |            /
            |           /       ...
            v          v
      +-----------------+
      |   dev clones    |
      |                 |--+
      | periodically    |  |
      | rebased --onto  |  |--+
      | next rebasing   |  |  |
      | of project gate |  |  |
      +-----------------+  |  |
        | ...              |  |
        +------------------+  |
           | ...              |
           +------------------+

In large projects individuals tracked a project fork of the upstream, which forks were periodically rebased onto the latest "build" of the real upstream, and the individuals' forks of the project "gates" were rebased onto the latest project fork as needed. At the end, when all the i's were dotted and t's crossed, the tech lead would push the project gate's linear history additions to the upstream.

We did this with early 90s tech known as Teamware, with lots of scripting on top. We later did it with Mercurial (again, with lots of scripting on top). Mercurial was a mistake. Git is much, much easier to use this way than any other VCS I've ever worked with, which for me includes: CVS, Clearcase, PRCS, Subversion, Mercurial, Git.

> Mercurial was a mistake.

Anecdotally, as a fellow Sun alumnus, I strongly disagree (and I know many Sun alumni would as well). I greatly miss using Mercurial instead of git now that I work elsewhere. The cadmium extension we had in-house at Sun more than made up for any plausible deficiency and Mercurial phases let me mostly drop use of cadmium.

Yup, and then you had no real history. And one big commit. Super useful, not.

You're just assuming that. And, despite your snark, you are completely incorrect about what my commits looked like.

Ay, I meant the general you, and the snark was directed to VCSes that lack an index and rebase.

You wrote this:

> > > When I worked with VCS that couldn't rebase, my strategy was simply to not commit until everything was perfect. I had a local branch. It was just not version controlled.

If I were to do that (and I have) with anything other than Git, I'd have a hard time splitting up the commits in the end. Mercurial has `hg record`, which is akin to an atomic `git add -p && git commit`. I don't think Fossil has anything even like Mercurial's `hg record`, and it famously lacks an index/staging area.

(So Mercurial has an index! but as always with Git features belatedly adopted by Mercurial, it's a pain to use in Mercurial. If you want to stop in the middle your choices are: say 'N' and commit what hunks you've accepted so far, or quit and abandon the hunk selection work you've done so far. And you don't get to edit hunks.)

> > > When I later started using git, my workflows simply became safer and easier.

Mine too.

Ah. I see what you were saying now, but for me it was just more work to split things manually. The end result didn't change. At least, not much.

This highlights one of the benefits to framing comments in a positive manner: they tend to be inoffensive even when misunderstood.

`hg record` has been supplanted by `hg commit -i` (for `--interactive`), which has an improved UI that is certainly more flexible than the old `hg record`.

Ugh. Certainly there are things that git could do better but reading this makes me want to run far away from Fossil and never look back.

You can argue about whether “the Unix philosophy” is best for specific tools like version control, and maybe you can argue that “the Unix philosophy” isn’t really about simple vs complex tools. But I don’t think I’ve ever heard nor would I think you could find evidence that the Unix philosophy is “it just works”. That claim makes me pretty certain that this is a tool I want to avoid at all costs. And the tone of the rest of this article confirms it.

My feeling as well. It's sad because Fossil underneath the covers is a lot like Git, but with a SQL RDBMS as the store. Fossil's design is much more powerful than Git's for that reason alone. But that super-opinionated attitude they have is a disaster. Run away.

Your commit history should read like a published novel, not a first draft.

I think I might’ve read something like that in the Pro Git book and it rings true with me.

The philosophy of Fossil is to preserve the full history, which I think is a mistake.

Both approaches have value. And you'll find books advocating both ways.

Fossil takes a stance, git doesn't. It makes fossil less versatile but it is not a mistake. They never claimed Fossil to be the perfect solution for every project.

A nice feature would be the ability to group and split commits afterwards; like adding a "novel branch". So that you could keep both the exact history and a parallel high level description.

Your high-level description resembles the cover letter used on some open-source projects to introduce a patch series, justifying the changes.

You could achieve the same by using an empty commit, or tracking those in the bug tracker.

DVCSs are already complex enough, this just seems like scope creep, trying to emulate a feature that is actually part of another tool (the bug tracker or the mailing list, depending on the structure).

Another issue is that it would foster a mentality where people simply wouldn't care about the exact history and would push really badly structured commits, relying on the second-level history to explain their work. Except that if someone publishes something, it's not for this person, it's to be read by other people. The dev could keep a local branch with all their commits if they are so inclined, but that should never be sent to a shared server.

Most people are already bad enough at writing proper commits and dividing their work, I think your feature would make it worse.

Not after the fact. It's too late then, and it won't get done. History has to be pretty when you push it. The idea of being able to group commits is very nice, and IIUC BitKeeper had that.

> Git puts a lot of emphasis on maintaining a "clean" check-in history.

This seems like an inaccurate phrasing of this. Git (the tool) doesn't specifically emphasize this, but it does support this as a possible approach without excluding the other approach. Many teams have settled on development approaches with git that do indeed emphasize this, but not every team does. If you want to show your work, you sure can, but unlike fossil, git does not force this upon developers. Like the article, I don't see either approach as right or wrong, but know that I prefer the clean history approach rather than every typo commit ever.

See, if you get a choice, at least you got the choice. When you get no choice in the matter, then the tool is opinionated, and forces its developers' choices on you. Sometimes that's nice, but mostly that's when you can live with those choices. Clean, linear history is absolutely the best choice for any sufficiently large project, and never the wrong choice for small projects, so being forced to do merge workflows is really suboptimal.

Git's linear presentation of history encourages rebasing or squashing. Mercurial keeps the branch structure, which has the advantages of both squashed and unsquashed commits.

I use the following fairly regularly (aliased):

    git log --graph --oneline

I can understand how you could see this as an advantage, but I see forcing this on the developer as more of a disadvantage. I literally never want to see in any history commits with junk attempts to fix something. After the fact, these are the least interesting things to me.

That doesn't keep the branch names. It also doesn't help bisect complex merges. Git may be better at that now, to be fair. So many projects rebase or squash that I haven't had to do it in a while.

Mercurial can rewrite history in the same ways as Git.

I never want to see junk commits, but I can't force other people to spend time cleaning them up. Squashing cleans them up quickly but loses useful information.

Merge commits stop being interesting when you have thousands of people working on a codebase and you have tens or hundreds of thousands of branches. At that point clean, linear history is much, much simpler to understand.

If you mean a squashed history, I agree. At that scale I think the advantages outweigh the disadvantages.

Not squashed, not exactly, more like: clean and linear. Linear means: no merge commits. Clean means that a sequence of commits like this:

  Add feature XYZ
  Fix bug I just found around that code
  Fix typos in XYZ
  Fix bug in XYZ
  Add tests for XYZ
  Fix another bug I found around that code
  Fix bug in XYZ
  fixup! Add feature XYZ
  fixup! Fix bug I just found around that code

gets rewritten into this:

  Fix bug ... (not related to XYZ)
  Fix second bug ... (not related to XYZ)
  Add feature XYZ

You would do this by doing something like `git rebase --autosquash origin/master`, then `git rebase -i origin/master` to reorder the remaining commits as above.

Reword the commits to add ticket IDs to the commit subjects for all of these, if you have this as a requirement or have an issue tracker.

Force push your branch, do one last round of build, test, review, and push when all looks good (else rinse, repeat).
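For anyone who hasn't tried the fixup/autosquash part, here is a minimal non-interactive sketch in a throwaway repository. It rebases onto `HEAD~3` instead of `origin/master` because the demo has no remote, and it sets `GIT_SEQUENCE_EDITOR=true` so the generated todo list is accepted as-is; both are choices made for this example, and the commit subjects are made up.

```shell
set -eu

# Throwaway repository; commit subjects and file names are illustrative assumptions.
demo=$(mktemp -d)
git init -q "$demo/repo"
cd "$demo/repo"
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"
git config user.email "demo@example.com"
echo "base" > base.txt
git add base.txt
git commit -q -m "Base"

echo "feature" > xyz.txt
git add xyz.txt
git commit -q -m "Add feature XYZ"
feature_commit=$(git rev-parse HEAD)

echo "other" > other.txt
git add other.txt
git commit -q -m "Fix unrelated bug"

# Later: a small correction that belongs to the feature commit.
echo "feature, typo fixed" > xyz.txt
git add xyz.txt
git commit -q --fixup "$feature_commit"

# Fold the fixup into its target; "true" accepts the generated todo list unchanged.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash HEAD~3

history=$(git log --format=%s | tr '\n' '|')
echo "history: ${history}"
```

The `fixup!` commit disappears from the log and its change ends up inside "Add feature XYZ". Reordering the remaining commits would still be a manual `git rebase -i` step.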

Maybe you'll realize that XYZ is two features, not one, and then you might split that up into two commits (again, using `git rebase -i`: mark that commit for "edit", `git reset HEAD^`, `git add -e`, `git commit -m 'Feature WXY'`, then `git add` and commit the remainder, then `git rebase --continue`).
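A simplified sketch of the split, covering only the easiest case where the commit to split is the branch tip (so no `git rebase -i` / "edit" step is needed). It also stages whole files with `git add <file>` instead of the interactive `git add -e`, purely so the example can run unattended; file names and subjects are made up.

```shell
set -eu

# Throwaway repository; file names and commit subjects are illustrative assumptions.
demo=$(mktemp -d)
git init -q "$demo/repo"
cd "$demo/repo"
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"
git config user.email "demo@example.com"
echo "base" > base.txt
git add base.txt
git commit -q -m "Base"

# One commit that really contains two features.
echo "w" > wxy.txt
echo "x" > xyz.txt
git add wxy.txt xyz.txt
git commit -q -m "Add feature XYZ"

# Undo the commit but keep its changes in the working tree.
git reset -q HEAD^

# Re-commit the pieces one at a time.
git add wxy.txt
git commit -q -m "Add feature WXY"
git add xyz.txt
git commit -q -m "Add feature XYZ"

history=$(git log --format=%s | tr '\n' '|')
echo "history: ${history}"
```

For a commit further back in the branch, the same `reset`/`add`/`commit` sequence happens while the rebase is stopped at the commit marked "edit", followed by `git rebase --continue`.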

Some commits got squashed, maybe some got split, and all got reordered.

The resulting history is easy to read.

This takes a bit more work. But once you get into this workflow, it begins to become second nature, and you get good at it and it doesn't slow you down. The benefits are well worth the learning curve, and it's not that steep.

In my experience, feature sets tend to converge within product categories, but I'm happiest in the long term with the program whose original design decisions I'm most sympathetic with.

In fossil's case this was architectural simplicity, an easy-to-use basic set of commands with a discoverable set of expanded commands that always seemed to meet my needs as they cropped up, and collaboration features that were killer in the pre-GitHub era. There are a few discrete layers of hackability for customization with minimal risk of blowing your project away.

Fossil still meets my needs admirably, and I stick with it even though it's harder to make the case for it in the current era of feature convergence.

Fossil's design and implementation are indeed admirable. Its insistence on merges is not.

As someone currently working in a project where parts are managed with git and others with fossil, I have a strong preference towards git. One major issue with fossil is the non-existent ecosystem. It simply does not integrate well with other frameworks and tools (e.g. Yocto/OE). Migrating to git entirely would be much better for the team and workflow as a whole, but unfortunately the seniors who are using fossil are quite hostile towards git, for mostly political reasons. The irony is that they have big performance issues with fossil, precisely due to the fact that fossil records everything and the push/pull commands sync the entire repo, not just one branch.

Fossil also lacks an equivalent to git submodules, which should be added to the list of features found in git, but not in fossil.

I'm surprised git caught on despite mercurial being much superior (hadn't heard of fossil before). Git has the following shortcomings which are major (some shared by other VCS too)

1) UI- terrible, terrible UI

2) Unnecessarily complex data model

3) Doesn't scale well to large repos (until Microsoft's VFS- windows only)

(and many others...)

I dread every time I have to use hg (and when I can use the git bindings for it). And I was using hg before I was using git.

git has a terrible UI, granted, but I find the hg UI pretty terrible too. I really hate their approach to branching (tho that is remedied these days with e.g. bookmarks to some extent).

Speaking of data model, I find git's model to be a lot saner. I REALLY hate that hg spams my disk with tons and tons of files within the .hg directory. Ever cloned e.g. the mozilla hg repos? ugh. Bonus points for their fancy name escaping mechanism in those files, which had me run into "path too long" issues on windows boxes a couple of times already.

Mercurial's bookmarks tend to get into a bad state for me. With Git it's trivial to fix things. With Mercurial I just can't figure anything out, and I think that's because Mercurial tries so hard to hide how things work and to tell me how I should do my work.

> 2) Unnecessarily complex data model

Hmm? It’s just a directed graph of SHA1s under the hood. Seems pretty simple to me once you understand that. My understanding was that Hg’s data model is actually way more complicated with more pointers.

As everyone else is chiming in, the reason git won was speed. I haven’t used mercurial in a few years but at the time when I was looking to replace SVN, git did everything seconds faster than Hg which made it the clear winner.

"the reason git won was speed"

It wasn't just speed, it was also that the Linux kernel was developed under git, that Linus created git, and the existence of github, which a lot of people liked.

After a while, it also just developed a critical mass where the attitude of people/companies that weren't using it almost inevitably became one of "everybody else is using git, so we should too".

All these concepts are extremely useful in practice. I do not see them in any way as “needlessly” complex. Difficult to learn... maybe, but not just for the hell of it.

The Git index is one of its most powerful features. Try doing a `git add -e` sometime. You get to do a bunch of work in the workspace then split it up usefully at commit time.

Other than the staging area, hg has all of these.

1) It is not beginner-friendly, but for day-to-day dev you only need to understand a few commands.

2) The data model brings speed that was not possible before. git won the VCS space because of sheer performance.

3) It scales just fine. Windows just has a really bad filesystem, and the Windows codebase is the pathological case.

It has nothing to do with the Windows filesystem; Git simply cannot support a 5 GB working tree on any filesystem. You can call this "pathological" but this throws a lot of shade on monorepos without much critical examination of how or when they might be useful.

To be precise, git's scaling issues mostly relate to centralization and file count rather than raw size.

* Pushing changes to a central repo requires including upstream commits. With 1 commit/s to that central repo, all developers are stuck in a loop until their push succeeds. It is a human spinlock with high contention.

* Some algorithms scale linearly with the number of server branches, such as pull-without-specifying-a-branch, which becomes too slow with 100K branches (a consequence of central repos).

* Some algorithms are linear with the number of files, like git status.

* Binary files neither compress nor deduplicate well, slowing pull and clone.

Those issues also apply to Fossil and Mercurial.

cf. https://docs.microsoft.com/en-us/azure/devops/learn/git/tech...

> It has nothing to do with the Windows filesystem; Git simply cannot support a 5 GB working tree on any filesystem.

Can you provide a reference? I searched a bit and the only things I found were bugs in Windows [1] for git lfs.

> You can call this "pathological" but this throws a lot of shade on monorepos without much critical examination of how or when they might be useful.

The Windows codebase has 3.5 million files and its repo is 300GB in size. That is not normal. This is a Google- or MS-type problem, not an average git user's. Instead of changing its workflow, MS decided to create GVFS [2].

[1] https://github.com/git-lfs/git-lfs/issues/2434

[2] https://blogs.msdn.microsoft.com/bharry/2017/05/24/the-large...

> Can you provide a reference? I searched a bit and the only things I found were bugs in Windows [1] for git lfs.

Apologies, I hastily mistyped, I meant 500 GB, not 5. (5 GB is about the size of my repository, which is not really so big at all and certainly something git can cope with on its own).

This series of articles should illustrate some of the issues that VFS for Git tries to address. ("GVFS" is now called "VFS for Git".)


And this is a series of articles from an engineer who's been working on improving perf in large repositories in general, not strictly related to the Windows repository:


> The Windows codebase has 3.5 million files and its repo is 300GB in size. That is not normal. This is a Google- or MS-type problem, not an average git user's. Instead of changing its workflow, MS decided to create GVFS [2].

I didn't say it was normal. Indeed it's uncommon. I said it wasn't pathological.

If you're versioning line-based text, and you have 80 ASCII characters per line, stored as ASCII/UTF-8, a worktree of 500GB has 6.7 billion lines.

That's 3x the source line count of Google's entire monorepo. [1]

So if you're using git for source code, 500GB is beyond pathological.

If you're using git for other purposes, then yes you might need something like Annex/LFS/GVFS.

[1] https://m-cacm.acm.org/magazines/2016/7/204032-why-google-st...
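Spelling out that arithmetic as a quick check. The assumptions here are mine: 500GB is read as 500 GiB, newline bytes are ignored, and the Google figure is taken as the roughly 2 billion lines reported in [1]. With decimal gigabytes the count comes out nearer 6.25 billion.

```python
GIB = 2**30

worktree_bytes = 500 * GIB      # assumed: "500GB" means 500 GiB
bytes_per_line = 80             # 80 ASCII characters per line, newline ignored
google_lines = 2e9              # rough figure from the cited article

lines = worktree_bytes / bytes_per_line
ratio = lines / google_lines

print(f"{lines / 1e9:.1f} billion lines")
print(f"about {ratio:.1f}x the Google monorepo line count")
```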

Plastic SCM claims that 5TB works and that 50GB is the average size in their cloud offering. It seems that the Free Software world does not care about such use cases.


Git lacking a functionality has nothing to do with monorepos not being useful.

I am not a beginner and, not to boast, I'm pretty smart and experienced.

The problem is that the edge cases that come up have solutions which need to be looked up, not derived from understanding. And when you're scared of data loss, it's a very frustrating situation.

Interesting... One sane thing about git is that it is very difficult to lose data. You have to go out of your way to lose data, e.g. by deleting your local and remote histories. Even in that case, if someone else has branched in the meantime, it can be restored without any fuss.

Git makes it difficult to lose committed data. It's easy to lose uncommitted changes. Also, someone can know that changes are in the reflog but not how to recover them without making a bigger mess.

That's a really interesting criticism to make.

Can you specify a version control system which doesn't make it easy to lose uncommitted changes?

A versioning file system tracks all changes to each file. On an old VMS machine, you might have X;1, X;2, and X;3 where the number after the ';' is the version number. Normally directory listings only show the 'X', which refers to the most recent version, but you could have it display all the files.

This makes it easy to compare, say, the state of the file now with the state of the save from 3 hours previous.

https://en.wikipedia.org/wiki/Versioning_file_system points out "Subversion has a feature called "autoversioning" where a WebDAV source with a subversion backend can be mounted as a file system on systems that support this kind of mount (Linux, Windows and others do) and saves to that file system generate new revisions on the revision control system."

Quoting http://svnbook.red-bean.com/en/1.4/svn.webdav.autoversioning... :

> the use case for this feature can be incredibly appealing to administrators working with non-technical users: imagine an office of ordinary users running Microsoft Windows or Mac OS. Each user “mounts” the Subversion repository, which appears to be an ordinary network folder. They use the shared folder as they always do: open files, edit them, save them. Meanwhile, the server is automatically versioning everything. Any administrator (or knowledgeable user) can still use a Subversion client to search history and retrieve older versions of data. ...

> however, understand what you're getting into. WebDAV clients tend to do many write requests, resulting in a huge number of automatically committed revisions. For example, when saving data, many clients will do a PUT of a 0-byte file (as a way of reserving a name) followed by another PUT with the real file data. The single file-write results in two separate commits. Also consider that many applications auto-save every few minutes, resulting in even more commits.

It adds that Clearcase supported a similar feature.

I have never used that combination.

The other half of the story is that git also makes it easy to commit.

I think git and hg are both pretty bad in different ways.

Both have pretty terrible UI but so long as one uses magit, git comes out way on top.

The data models are different and suffer different problems. A main issue with git is that it is stupid about file copies and renames. An issue with hg is that it doesn’t work well with long running forked histories (i.e. like git branches) because it stores the set of revisions of a file as a list of blocks of “complete file” or “diff from previous version in this list”

Both have scaling problems with large repos and algorithm/data structure problems which cause too many operations to be e.g. at least O(size of history). I suppose this is better than Darcs’ model of “commit on Friday and hopefully it will be done by Monday.” If hg were naturally good at scaling, then e.g. Facebook wouldn’t be putting so much effort into trying to make it scale (e.g. using inotify instead of looking throughout the tree for changes (which I think shouldn’t count, as any vcs gains from this), having a mergeless history, rewriting a ton of hg in rust (git was always partly in C and there is now also libgit2)).

The thing that makes me most sad about hg is the lack of a really good (ie good and emacs-based) ui.

I’m all for different vc systems being developed and I think it would be good to see some real innovation potentially break up the current git-hg hegemony.

I think there are lots of good things about fossil (e.g. using an actual database that is going to scale well and avoid data corruption instead of using a specialised data structure that is hard to change and likely not so corruption resistant or battle tested or scalable but maybe lets your data structure be “faster” for certain operations).

Another interesting vc system being developed at the moment is pijul, which can be simply described as “like darcs but fast and more likely to be correct”. It feels a bit like it’s fitting in with the current trend of CRDTs, although its core data structure is not a CRDT, as that would imply that all merges have some deterministic resolution (ie merge conflicts do not happen), and that is not the case; instead, files are allowed to be merged into a first-class conflicted state which can then be resolved by later patches.

> A main issue with git is that it is stupid about file copies and renames.

Could you elaborate on this? As far as I know, file copies and renames will still use the same blob, but the tree referencing the blob can reference it using a different path in the case of a rename or reference it more than once in the case of a copy.

If you tell hg to rename a file, e.g. hg mv foo bar, it will generate a patch which essentially just says “foo was renamed to bar”, and when you look at the diff the only thing that has changed is the name.

If you merge this with a patch that changed foo then hg will do something sensible (ie either merge the changes into bar or give a merge conflict).

Git has no first-class concept of file name changes. Instead it tries to use heuristics to spot renames and sometimes they work and sometimes they won’t. Maybe if you merge a patch renaming foo to bar with one that changes foo the second patch will be applied to bar, but maybe it will behave as if you are merging changes to foo with deletion of foo.

Merging is already hard, dangerous, and non associative. The danger is less that you get lots of annoying merge conflicts than that you don’t get a merge conflict when you should (and therefore you risk accidentally changing the meaning of the merged files without knowing), e.g. if you merge “rename foo to bar” with “delete foo” and git didn’t spot the move then the merge might leave bar untouched when really there should be a conflict between keeping/deleting bar. Having wrong merges happen automatically can be a big risk when software is supposed to be very reliable.

"Git has no first-class concept of file name changes. Instead it tries to use heuristics to spot renames and sometimes they work and sometimes they won't."

Git has the "mv" command. If you "git mv" a file, why would git have to guess or use heuristics to figure out that the file was renamed?

As the Git FAQ [0] says:

> Git has a rename command git mv, but that is just for convenience. The effect is indistinguishable from removing the file and adding another with different name and the same content.

git diff, merge, and related tools have heuristics for detecting file moves (for git diff these were off by default before Git 2.9 and can be requested explicitly with e.g. git diff -M) but they tend to break if a file is both moved and modified in the same commit.

[0] https://git.wiki.kernel.org/index.php/GitFaq#Why_does_Git_no...

Yeah, that's exactly why I always commit renames right away and atomically. I mean it's a _relatively_ rare thing and when I have to do it I just want to make and record the change and then move on. Renaming and then modifying a file in the same commit is slightly sloppy imo. Not to say that the tool couldn't be doing a better job.

Because git stores sets of files. If you move a file and make a new commit, it's just a new set of files which says "this old set is my parent". There is nothing in there about the renamed file.

Still, at the point you do a "git mv", it can't be said that git doesn't know you renamed a file. It knows.

In fact, after a "git mv foo bar", if you do a "git status", you'll see:

  On branch master
  Changes to be committed:
    (use "git reset HEAD <file>..." to unstage)
          renamed:    foo -> bar

If git chooses afterwards to discard that information and do nothing with that knowledge, that's a separate problem.

I don't know enough about Git internals in this regard, but it's possible that in fact the index is just being compared to HEAD to infer that information. Index has a file called "bar" and none called "foo". HEAD is vice versa.

Yes, Git has "mv". Git also detects file renames. I'm not aware of the method, though; I'm speaking from experience. I renamed a file normally, without Git, and when I checked `git status`, Git said it was renamed.
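The "compare the index to HEAD" guess above can be sketched as follows. This is a simplified illustration, not git's actual implementation: each snapshot is modelled as a plain mapping from path to content id, only byte-identical content is matched, and `detect_exact_renames` is a hypothetical name.

```python
def detect_exact_renames(old_tree: dict, new_tree: dict) -> list:
    """Pair paths that vanished with paths that appeared carrying the same content id."""
    deleted = {p: oid for p, oid in old_tree.items() if p not in new_tree}
    added = {p: oid for p, oid in new_tree.items() if p not in old_tree}

    # Index the deleted paths by content id so each added path can look up a source.
    by_id = {}
    for path, oid in sorted(deleted.items()):
        by_id.setdefault(oid, []).append(path)

    renames = []
    for path, oid in sorted(added.items()):
        sources = by_id.get(oid)
        if sources:
            renames.append((sources.pop(0), path))
    return renames

head = {"foo": "a1", "README": "b2"}     # snapshot before the move
index = {"bar": "a1", "README": "b2"}    # snapshot after the move, same content id

renames = detect_exact_renames(head, index)

# If the file is also edited, its content id changes and exact matching finds nothing.
moved_and_edited = detect_exact_renames(head, {"bar": "c3", "README": "b2"})
```

Nothing in either snapshot records that a move happened; the rename is inferred after the fact, which is why a move combined with an edit needs a fuzzier similarity heuristic than this sketch has.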

> Unnecessarily complex data model

The git data model is certainly complex, but I'd be curious to hear why you think it's 'unnecessarily' so. I often try and drum up a new SCM in my head and the data model gets pretty complex every time.

Git has the advantage of being the right kind of terrible.

If you only push commits to a branch and use GUI tools like Github or Gitea for merging, chances are you're never going to be exposed to anything more complicated than 'add; commit; push'.

Even merging isn't that terrible and while still being painful, git makes it somewhat clear what you want.

The problem is: if you want people to use something else, you need to improve on git in the areas that matter to most people (ie, 'add; commit; push').

Git is terrible but just good enough that improvement over that terrible will be hard to accept for the mainstream.

Fossil is made by the same gentleman (D. Richard Hipp) that gave us SQLite.

    $ fossil init wibble
    project-id: c8d025508afd55c31cebedfe244a3b62f39fb6eb
    server-id:  8b340295f8268f742f91c5e972195efc0c6e578f
    admin-user: gjvc (initial password is "c5d7ba")

    $ file wibble 
    wibble: SQLite 3.x database (Fossil repository), last written using SQLite version 3026000

Yes indeed, and he comments here as "SQLite".

>Unnecessarily complex data model

I have been using git since it was released and have never needed to think about its underlying data model in order to accomplish dev tasks. Also I suspect that any complexity in the data model was quite necessary to implement its api and features in a performant way.

Git won over Mercurial simply due to GitHub. There were some other minor contributing factors - association with Linus, speed - but they are insignificant compared to how popular GitHub was (for good reason) and therefore how many people were exposed to git.

The Mercurial alternatives like bitbucket just didn't have the same spread, and we got stuck with year after year of teaching new people a difficult interface.

Git won the war before GitHub existed. More projects were switching to git than hg.

GitHub was then mirroring open source projects git repos without asking. (Which I think is fine, but ruffled some feathers back then)

GitHub could have easily been hghub, but they targeted git because it was already winning.

Git was more popular for C, Perl, and Ruby. Mercurial was more popular for Java and Python, and it was far ahead on Windows. Google and Atlassian bet on Mercurial well after GitHub existed. The idea that only one could win would have been strange to a lot of people at the time.

It’s even more specific than that: git won because GitHub made public repos free & private repos paid. Bitbucket did the reverse.

If not for that difference we’d all think of git as that bizarre source control Linus makes the Linux devs use.

Linus definitely helped and he pretty much killed cvs (not the drug chain).

You mean git killed SVN right? SVN killed CVS in my understanding.

I still push for SVN when we're doing work where centralization makes a great deal of sense. For example, I look at SVN+puppet to be an exceptional combination... and I really don't need that puppet repo to be distributed.

Then again, if I'm working with people in different areas, and want them to have a full reproducible copy of the repo, git it is.

I maintain a "use the right tool for the right job" stance. Sometimes the Cathedral wins out, and other times the Bazaar wins out (NO! not the Bazaar source control!).

People used SVN, but I'm not sure enough people switched that it killed CVS.

SVN seemed like it didn't go far enough to be honest. It wanted to be "atomic CVS", but there were many long-standing issues with SVN.

SVN is probably the most prolific in the "Enterprise" arena.

To give a personal, subjective point of view of why I switched from hg to git:

  - hg was horribly slow compared to git;
  - I love the branch model used in git to let several people work on different parts of the same project, and I could never find a satisfactory equivalent using hg idioms.

It is true that hg has a far better UI in general, but Magit fixes this problem for me.

The UI thing is more than superficial though. Yes, the command line is horribly inconsistent (and that can be fixed with Magit), but the real issue is that if you want to do anything non-trivial you have to understand how git works - what the object model is, how refs work etc.

I used mercurial successfully, quite heavily, and I couldn't tell you much about how it's implemented.

To add more praise for Magit: I would really love to use other DVCSes, especially Fossil because it has a built-in issue tracker and wiki and all, but being unable to use Magit with Fossil is too big a drawback for me to make the switch.

At this point, I can basically do most Git commands in Magit with just muscle memory. Say, to stage everything in the tree, amend the commit, reset author and dates, then force push (I know) to the remote, I would just press, with evil-magit: <SPC>gsSc-Racp-fpy (<SPC>gs for invoking Magit, S to stage everything, c to enter commit mode, -R set the reset author flag, a amend commit, p enter push mode, -f set the force flag, p push to origin, y confirm force push)

It may sound complicated, but the Magit UI is discoverable, and once you're used to it you can do anything without even looking at the UI...

I'm not sure any VCS scales well to huge repos, and the MSFT work in this space is truly amazing. I love, for example, their use of a Bloom filter to make git blame fast!

I actually like the git UI. Like many git power users, I've come to terms with a subset of the UI that I know how to use very well. The thinness of Git's abstractions lets me think of complex VCS operations as I do when reading or writing code. If the cognitive load is too high, you can always just use a merge-heavy workflow and stop thinking about the mechanics, but I recommend instead to understand the mechanics.

The data model in Git is hardly more complex than Fossil's or Mercurial's, and it's copy-on-write all the way, which makes it very safe (think ZFS).

The simple data model is the best part of git so I have no idea what you are referring to. I do not see how one could make it any simpler and still use it to implement a DVCS. The main reasons I picked Git over Mercurial were the data model and the performance.

Re 2: HG doesn't have a published data model. I believe they have created 2 or 3 models so far, but it happens under the hood. They can do this as there is only 1 implementation of HG, so there is no need to worry about compatibility as much.

It is rare that adoption has anything to do with the strength of the technology involved. Usually the winners are those that are early to market, are adopted early by industry leaders and/or have better marketing.

In this case the reason is most definitely the fact that it is used for Linux, easily one of the biggest open source projects ever. I would wager a guess that if Mercurial had been created earlier, Git would probably never have been created, let alone been adopted for Linux.

Mercurial and Git started around the same time. Linus was concerned that Mercurial was similar enough to BitKeeper that BitMover might threaten anyone who worked on it or at least anyone who had used BitKeeper.

4) lots of foot guns

I would rather like a new option which was designed as a VCS from day one, that is user friendly and fast. Fossil is actually nearly there (I used it for a bit for some private projects)

"git caught on despite mercurial being much superior"

mercurial wasn't superior.

I agree with those who said they prefer git's "clean" commit history.

The comparison table in section 2 could just as easily live in the git docs, under a page called "why use git instead of fossil".

git-worktree seems to give git the multiple-checkouts feature.

Also, this page seems to do the common but frustrating thing of advertising how something is implemented as a feature. I'm sure it's very interesting to the developers that it uses a database in the backend or that your side-project is written in a niche language but, as an end user, I just don't care.

Here is an attempt to clarify why I think implementation does matter: https://www.sqlite.org/whynotgit.html#git_makes_it_difficult...

When the design of a system makes something difficult, people tend to avoid doing that thing. The Git data design makes extracting some kinds of historical information difficult, with the result that many common tools (ex: GitHub) don't bother. This reduces the information available to users, resulting in reduced situational awareness.

I make no claim that Fossil is perfect in this regard. My only claim is that Fossil is better.

Whether you use Fossil or not, I don't much care. But I really would like you to understand some of the features of Fossil that (I think) make it better than Git, and perhaps start adding some of those features to your favorite VCS, be it Git or something else. Understand the general concepts, then port those concepts to other systems.
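To make the point about data design concrete: once parent/child links sit in a relational table, "all descendants of a check-in" becomes one recursive query. The sketch below uses an invented table (`link` with `parent` and `child` columns) purely for illustration; it is not Fossil's actual schema.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE link(parent TEXT, child TEXT)")  # hypothetical schema
db.executemany(
    "INSERT INTO link VALUES (?, ?)",
    [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("C", "E")],
)

def descendants_sql(conn, checkin):
    """Walk the links forward from one check-in with a recursive CTE."""
    rows = conn.execute(
        """
        WITH RECURSIVE d(id) AS (
            SELECT child FROM link WHERE parent = ?
            UNION
            SELECT link.child FROM link JOIN d ON link.parent = d.id
        )
        SELECT id FROM d ORDER BY id
        """,
        (checkin,),
    )
    return [row[0] for row in rows]

from_c = descendants_sql(db, "C")
from_a = descendants_sql(db, "A")
```

The same table answers the ancestor question by swapping `parent` and `child` in the query, so neither direction is privileged by the storage format.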

Fossil is better. Because it has a better design. Because it uses SQL (which makes it possible to write queries on the fly in ways that cannot be done in other VCSes). The use of SQL, and Fossil's use of fairly generic SQL, allowing the use of RDBMSes other than SQLite3, is brilliant, and would make it trivial to make Fossil scale better than MSFT is making Git scale, though with the caveat that you'd probably still end up wanting things like a Bloom filter for speeding up file-level log/blame, but you could probably make Bloom filters a transparent, RDBMS-level query optimization.

Fossil is worse. Because it doesn't have a CoW design in the format of its DB, though it does have a CoW design in its SQL schema. However, because SQLite3 is so good at handling power failures and such, this has not been a big deal (and indeed, Git has had more problems there owing to its ad-hoc storage model). This, of course, could be fixed by adopting a CoW backend for Fossil. This problem is hardly fatal.

Fossil is worse. Because of its insistence on merge workflows and not implementing rebase.

Fossil is worse. Because it has minimal mindshare. Mindshare is absolutely critical. The lack of rebase support is part of the reason that Fossil cannot get the kind of mindshare that would allow it to replace Git. And it may well be too late, but you never know.

It turns out that rebase is the more critical thing for me and many others. I think the Fossil community doesn't know it and doesn't get it because the projects it deals with don't have thousands of developers active every day. And, of course, rebase is really just a feature that can be built by using scripting on top of a cherry-pick primitive (which Fossil does have), so there's no real excuse for not having it.
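As a rough illustration of "rebase is scripting on top of cherry-pick", here is a toy model. It uses neither Fossil's nor Git's real commands: a commit is just a message, a small patch, and a parent pointer, conflicts are ignored entirely, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Commit:
    message: str
    changes: dict                       # toy "patch": path -> new content
    parent: Optional["Commit"] = None

def cherry_pick(commit: Commit, onto: Commit) -> Commit:
    """Copy one commit's patch onto a different parent."""
    return Commit(commit.message, dict(commit.changes), onto)

def rebase(tip: Commit, old_base: Commit, new_base: Commit) -> Commit:
    """Replay every commit after old_base, oldest first, on top of new_base."""
    todo = []
    node = tip
    while node is not old_base:
        if node is None:
            raise ValueError("old_base is not an ancestor of tip")
        todo.append(node)
        node = node.parent
    head = new_base
    for commit in reversed(todo):
        head = cherry_pick(commit, head)
    return head

def log(tip: Optional[Commit]) -> list:
    """List messages from tip back to the root."""
    out = []
    while tip is not None:
        out.append(tip.message)
        tip = tip.parent
    return out

root = Commit("root", {"a": "1"})
upstream = Commit("upstream fix", {"a": "2"}, root)
f1 = Commit("feature 1", {"b": "x"}, root)
f2 = Commit("feature 2", {"c": "y"}, f1)

rebased = rebase(f2, root, upstream)
```

Note that the original `f1` and `f2` are left untouched; the rebased branch consists of new commits, which is the same property a scripted cherry-pick loop would have.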

Incidentally, GitHub used to also insist on merges in its browser UI, but eventually they added the "rebase and merge" feature.

Mercurial too resisted rebase. Boy did they. And they gave in and implemented it. Ironically many people rebased in Mercurial long before the feature was adopted -- we did, in Solaris engineering, a dozen years ago.

My prediction is that Fossil too will eventually support rebase (out of the box) for the same sorts of reasons that all other VCSes have tended to get it.

If I had to use Fossil, the first thing I'd do is script a rebase around cherry-picking. Because I could, because the market demands this feature, because it's a built-in feature of competing, more-successful VCSes, Fossil ought to adopt it. You wouldn't have to use it. You might have a way to turn it off on a project-by-project basis. But the rest of us should get to use it on our projects.

All in all, I neither use nor recommend Fossil. But I am envious, and I would use it, if it had rebase.

Funny how they say that Git uses a "Bazaar-style development" process when Bazaar literally is another VCS.

I'm also not convinced it's a good idea to merge your bug tracker and version control.

There's a famous comparison of development styles that in turn predates the Bazaar software by years if not a decade.

Here's the wikipedia article about this work: https://en.wikipedia.org/wiki/The_Cathedral_and_the_Bazaar

They don't mean bzr the VCS, they mean Cathedral/Bazaar.

Have you actually tried their model of bug tracking and code revision integrated?

I have, and the built-in tools are weak or cumbersome to the point of being useless (or were when I moved off to bitbucket).

The wiki had its own weird markup at the time, so people who might have helped me write documentation were faced with learning a weird new markup (they were volunteers, that's a tough ask.)

The bug tracker had no way to send emails, and you were expected to rig up some external systems to do that. Maybe it's changed now?

Just lots of little usability things too.

These issues had such ardent defenders on the mailing lists, too, I became convinced the situation would likely never improve.

Aside from the cute idea of having your version control in one self-contained file with a single-binary server, I can't think of anything fossil does better than any other system.


I guess the ability to use the bug tracker while offline is a nice advantage, but for that to be worth it, it needs to be competitive with centralized bug trackers.

It is just like distributed code review tools: I wish there were any that were good enough, but the ones that exist are too primitive to be worth the small gain of being able to seamlessly review code while on a flight.

The parent I replied to had an issue with the (assumed) model. You have an issue with features. All your points are valid; the parent had no points whose validity could be debated.

> Push or pull a single branch

> The fossil push, fossil pull, and fossil sync commands do not provide the capability to push or pull individual branches. Pushing and pulling in Fossil is all or nothing. This is in keeping with Fossil's emphasis on maintaining a complete record and on sharing everything between all developers.

This is a kind of deal breaker if it just pushes all of my private WIP branches to remote. Is there any genuine use case for this kind of functionality? Apart from that, this looks interesting.

Fossil does have private branches[0] and the recommended way to publish private work is to integrate it into a non-private branch and sync as usual.

[0] https://www.fossil-scm.org/xfer/doc/trunk/www/private.wiki

Always surprises me how Fossil is popular on HN but very much not outside a few projects (e.g. sqlite)

Some data:

The fossil-scm.org server gets between 1500 and 2000 distinct human visitors per day on weekdays. (Lower traffic on weekends. Robots are excluded from the count. "distinct" means visitors having different IP addresses.) This is perhaps orders of magnitude less than git-scm.org (I'm guessing - anybody have stats?) but it is also non-trivial. With ~1750 visitors per day somebody must be using it. And I would guess that most actual users do not visit the site daily. (When was the last time you visited git-scm.org?)

FWIW, today has already seen in excess of 10,000 distinct human IPs, likely due to this HN post.

> When was the last time you visited git-scm.org?

Like most people, I go to Google, which most often sends me to the right Stack Overflow question/answer. I think git-scm.org is mostly a guide to get started and the reference documentation.

How do you estimate Fossil's general popularity?

Most projects are invisible to you, like in-house projects that never see the light of day. If Fossil is used by 1% of those projects, that's still a lot of projects.

How can you tell if Fossil is popular on HN? By my reading, only two people in this thread use it. More than that mention having never heard of it before, or make statements where it's clear they don't really understand what it is.

Fossil is posted to and discussed on HN all the time. It's a novelty, it differs in design from the popular tools that people actually use in their day-to-day jobs.

There's really no reason to think that Fossil is used internally at a rate higher than its public-facing usage. And I've literally never seen a project that uses Fossil aside from Fossil examples.

I use it for lots of projects, spanning years of work, including commercial projects with ~20 devs. Those projects aren’t public, and I’m sure lots of other people do use it that way. It’s still not going to rival the install base of git.

I use it too. I like to self-host my commercial projects, and when I decided to move away from Subversion about seven years ago I did consider Git, but the ease of self-hosting Fossil made the choice for me. Just plug it into apache httpd and off you go. Some time later GitLab became available, so now I'm also running a GitLab instance. But the Fossil server has a much smaller footprint and is easier to manage.

The ease of use/management is a big deal, and to an extent I think it falls out of technical design decisions (proper database for repo storage that fully, easily, obviously allows first class multiple checkouts of a single repo instance). Others are philosophical design decisions, most of which I think I like, some I think I don’t, and others that just aren’t in place yet. Key is it’s nice to work with, and if I’m doing development/management, as much as possible I want it to be an enjoyable experience.

"All the time"? I counted about one HN posting per month.

From https://news.ycombinator.com/from?site=fossil-scm.org there are 20 links to fossil-scm.org in the last 2 years. I manually looked at the last 2 years of submissions with "fossil" in the name and found an additional 4 which were not to fossil-scm.org.

It's over 50% higher for Mercurial. I counted about 30 postings with the name 'Mercurial' over the last two years, of which about 7 were from mercurial-scm.org. I left out some of the obvious duplicates.

You write "I've literally never seen a project that uses Fossil aside from Fossil examples".

How much effort did you put into looking, and would you have recognized one if you did?

After SQLite, the most widely used project which uses Fossil is likely Tcl. http://core.tcl.tk/tcl/wiki?name=Index says it uses Fossil 2.7. For obvious reasons, there is an affinity between Tcl projects and Fossil.

A search for '"This page was generated in" "Fossil"' in DDG finds some non-trivial active projects: https://duckduckgo.com/?q=%22This+page+was+generated+in%22+%... . "Active" means "commits in the last few weeks." For example:

"MySQL++ is a C++ wrapper for MySQL’s C API" - https://tangentsoft.com/mysqlpp/home

"Jsi is a C (+/-) embeddable JavaScript interpreter" - https://jsish.org/fossil/jsi/doc/tip/www/home.wiki

"Cxxomfort (cxx as in C++, comfort as in comfort) is a small, header-only library that backports various facilities from more recent C++ Standards" - http://ryan.gulix.cl/fossil.cgi/cxxomfort/index

"SquirrelJME is intended to be a Java ME 8 compatible environment for strange and many other devices" - http://multiphasicapps.net/doc/ckout/readme.mkd

Yeah, it's probably not "widely" used. I know the Tcl dev team uses it for Tcl and Tk. That's pretty much the only one I know.

There is an online hosting service: http://chiselapp.com

A list of public projects (I was kind of surprised there were so many): http://chiselapp.com/repositories/

I run ChiselApp.com if there are any questions about it. I inherited it from James Turner, the creator of it.

Chisel looks nice. Thanks for doing it

I would be happy to use the site for publishing software. But to judge whether I can rely on it for backup as well, and out of general interest, I would enjoy reading (maybe on a page on the site) about where it came from, what your plans are for it, its longevity, and your intent.

i.e. I want to hear more about you ;)

Are all the user projects running version 1.34 (2.7 is the latest) or just the fossil hosting itself?

It's using Fossil 2.7 for everything.

Ah, sorry then. I was just reading the footer of the home page. That's why I asked.

> SQLite uses cathedral-style development. 95% of the code in SQLite comes from just three programmers, 64% from just the lead developer. And all SQLite developers know each other well and interact daily. Fossil is designed for this development model.

Is this not terrifying? I've always thought the distributed nature of git is part of why it's so incredible. Not relying on a handful of individuals.

Previous discussion 9 months ago about the "Why SQLite Does Not Use Git" page at https://sqlite.org/whynotgit.html which has much of the same info: https://news.ycombinator.com/item?id=16806114

> The ability to show descendents of a check-in.

This is interesting. I wonder if the same thing could be achieved fairly simply in git though by keeping an extra area in the .git folder that handles reversing / denormalizing the DAG for some of the things that these reports show.
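
A rough sketch of what that reversed index could look like, assuming you feed it the "commit parent1 parent2 ..." lines that `git rev-list --all --parents` prints. The function names `reverse_edges` and `descendants_of` are made up for illustration; git has no such cache built in, and this recomputes everything on each run rather than storing anything under .git.

```shell
# Invert "commit parent1 parent2 ..." lines into "parent child" pairs.
# This is the denormalized (reversed) form of the DAG edges.
reverse_edges() {
    awk '{ for (i = 2; i <= NF; i++) print $i, $1 }'
}

# Print every descendant of commit $1, reading "parent child" pairs on stdin.
# Breadth-first walk over the reversed edges; each commit is printed once.
descendants_of() {
    awk -v start="$1" '
        { kids[$1] = kids[$1] " " $2 }
        END {
            queue[1] = start; head = 1; tail = 1
            while (head <= tail) {
                n = split(kids[queue[head++]], c, " ")
                for (i = 1; i <= n; i++)
                    if (!(c[i] in seen)) {
                        seen[c[i]] = 1
                        print c[i]
                        queue[++tail] = c[i]
                    }
            }
        }'
}
```

Usage would be something like `git rev-list --all --parents | reverse_edges | descendants_of "$(git rev-parse HEAD~3)"`. Note the start commit has to be a full hash, since that is what rev-list prints.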

Microsoft are innovating in this area (bringing the improvements that they have in VSTS to the client). There's a good series of blog posts on this at https://blogs.msdn.microsoft.com/devops/tag/git/ (first article https://blogs.msdn.microsoft.com/devops/2018/06/25/superchar...)

I think it really comes down to preference. Both work well and for different reasons and styles. Including tech notes and docs into the VCS could be nice, if you use it, but like most project management, it comes down to how you work-- that's one reason why there isn't a one-size-fits-all project management product that is infinitely better than any other offering. There are plenty of times I wished I had something like the embedded tech notes to explain design decisions. I use git myself currently, but I've wanted to dip my toes into fossil for a while to see if it matched my workflow. I think storing the whole repo in an SQLite database was brilliant-- self contained and you don't have a problem slicing the data to get things like all parent commits or descendant commits.

As someone not versed in either, how much does not being able to track changes forward through the history matter? It sounds like that might be pretty important, but git’s popularity would seem to indicate that it doesn’t really matter in practice?

I don't know what you mean.

In general upstreams in Git do not allow history rewriting. It's only the forks (local branches) that allow history rewriting. This is just fine.

Also, when upstreams do accept non-fast-forward pushes, there's `git rebase --onto` for recovery by downstreams.

I have to say, I’ve used Git for many years now and I’ve never, ever, ever wished for this, nor even knew it wasn’t possible. So this seems like a silly thing to claim as an advantage for Fossil.

I never thought I needed bisect, until I had it. Now I can't live without it.

Yeah, totally fair point. I figured there’s probably some use case I haven’t run into for this feature. Still, I think it says something that I’ve never missed it despite some fairly advanced Git usage.

> The fossil all command

In git, this is pretty much a one-liner script dropped in your path and named `git-all`.

Something like (typing from memory on my non-work computer):

    find . -mindepth 1 -maxdepth 1 -type d -exec git -C {} "$@" \;

Which allows any of the following to just work (I usually alias these commands):

    git all status
    git all status --short --branch
    git all pull --rebase
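
For anyone who wants something slightly more defensive, here is a sketch of the same idea as a shell function that skips subdirectories that are not git checkouts and prints a header per repo. The `git_all` name and the `GIT` override variable are my own additions for illustration (the override is only there so the loop can be tried with a stand-in command); they are not part of git.

```shell
#!/bin/sh
# Run the given git subcommand in each immediate subdirectory that
# looks like a git checkout.
git_all() {
    for dir in */; do
        # Skip plain directories. .git is a directory in normal clones
        # and a file in linked worktrees and submodules, so test with -e.
        [ -e "${dir}.git" ] || continue
        printf '== %s\n' "${dir%/}"
        # GIT is a hypothetical override; it defaults to the real git.
        "${GIT:-git}" -C "${dir%/}" "$@"
    done
}

# When saved as an executable file named git-all on your PATH,
# end the file with:  git_all "$@"
```

Git looks up `git-<name>` executables on the PATH, which is why `git all status` ends up running the script.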

I’ll pull what I consider an important quote into this thread:

> ”SQLite uses cathedral-style development. 95% of the code in SQLite comes from just three programmers, 64% from just the lead developer. And all SQLite developers know each other well and interact daily. Fossil is designed for this development model.”

FWIW, I consider this an extraordinarily productive development model.

Agile may be about People and Interactions over Processes and Tools, but (and it’s too often forgotten) all dev must first be about effective output in the problem space. So, how do you optimize ratio of Outcomes to People/Interactions/Processes/Tools? This model.

FWIW, I also consider Linus’ and Guido’s “BDFL” approach a way the bazaar is channeled back into a cathedral.

fossil is also the name of a file system server on Plan 9 that dealt with hashes on the venti archival storage server.


The comparison on GPL vs BSD is heinous. The only disadvantage is that you lose the opportunity to be selfish.

I use Fossil for my own projects; I find it is less confusing than Git. However, I use cathedral style for my own projects, so that helps; I manage all of the code by myself but nevertheless will allow anyone to look at the code, make their own version with their own modifications (a fork of the project) (and do whatever they want with their own copy of the code), make bug reports, and submit patches for me to review, but not to directly write to the code repository (only I do that).

Would you recommend using fossil for Unity3D projects? It would need to exclude certain files and be able to deal with binary files.

Just giving my experience - I work in graphics rendering and use fossil for version control. Some of my repos are more than 100GiB in size and fossil has no problem with it. Fossil handles binary files adequately; the only catch is that file size is limited by SQLite's 'Blob' size limit (currently 2GiB). So if you have asset files larger than 2GiB, you'll have to split them into multiple files.
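
One way to do the splitting with standard tools, as a sketch only: the function names, the `.part-` suffix and the 1GiB default chunk size are arbitrary choices of mine, and the pieces have to be joined again before the engine can load the asset.

```shell
# Split a large file into chunks of at most chunk_bytes each
# (default 1 GiB), named <file>.part-aa, <file>.part-ab, ...
split_asset() {
    file=$1
    chunk_bytes=${2:-1073741824}
    split -b "$chunk_bytes" "$file" "$file.part-"
}

# Reassemble the chunks into the original file name.
# The shell expands the glob in sorted order, which matches split's suffixes.
join_asset() {
    cat "$1".part-* > "$1"
}
```

You would commit the `.part-*` files instead of the original, and run `join_asset path/to/asset` after checkout.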

Or Unreal. I don't really like Perforce.

The comparison lacks a big item: distributed-ness: the whole git-remote push & pull symmetry. It allows not only offline bare-minimum operation, but full offline development, then later online merging. I don't see any signs that fossil can do the same.

The comparison re. licenses is factually mistaken in several ways, but that's probably not worth a great deal of discussion - it's too religious a topic.

Most comparisons point out the major differences, rather than all the ways they are the same.

As you can see from bullet point #5 from the fossil home page, it works offline:

> CGI/SCGI Enabled - No server is required, but if you want to set one up, Fossil supports four easy server configurations.

and #6 concerns later online merging:

> Autosync - Fossil supports "autosync" mode which helps to keep projects moving forward by reducing the amount of needless forking and merging often associated with distributed projects.

There's also the quick start guide with an overview of distributed-ness - https://fossil-scm.org/fossil/doc/trunk/www/quickstart.wiki .

In fact, due to the integration, Fossil supports offline work not only for code, but also for changes in the wiki and the issue tracker. You can do full offline development on code, update documentation in the wiki, comment on issues and change their state and merge all these changes later when you are back online.

Some of those features are not features.

That comparison table made me dislike fossil even more. Fossil took something that’s not broken, and broke it.

I disagree. I would suggest you at least research it more before coming to such strong conclusions.

You're right; did more research and it makes more sense considering the problem niche it's solving. I jumped to assumptions too soon.

svn again?
