SQLite Doesn't Use Git (matt-rickard.com)
181 points by bkq on Sept 12, 2022 | 171 comments



I really like the notion of having the whole repo including issue tracking in a single file that can be easily copied around. But Fossil's stance on history - specifically, the deliberate refusal to implement history-editing functionality like squash and rebase - is a showstopper, and their rationale for it reads very much along the lines of "you're holding it wrong".


> their rationale for it reads very much along the lines of "you're holding it wrong".

Keep in mind that Fossil was written specifically to support the SQLite team's workflow. It was never designed for _anyone_ else's needs. They don't care if it doesn't work for other people's work styles because they didn't write it for other people.


...and Git was designed specifically for the Linux kernel development workflow. I think this is a small but important detail that's lost on many people who think that Git is the only version-control system in town. A lot of other version-control systems didn't or don't allow rewriting history either, and they do their job just fine.


> ...and Git was designed specifically for the Linux kernel development workflow

Yep. One thing I like to ask is, does your project have millions of lines of code, being worked on by thousands of developers scattered around the globe? If not, you might want to think about whether Git is really the best fit for your needs.


What other VCS has the same tooling, free-as-in-free-beer online services, and broad base of surface knowledge as git? When I started studying my TAs brought out a cool new tool called Mercurial which was my introduction to version control. Guess how often I’ve used it since. Git is the software equivalent of the piano: easy enough to wield but difficult to master, mechanically simple but highly intricate, standardised in many different shapes and sizes with incredible variation, and truly ubiquitous.

Apart from its ubiquitousness, these are all neutral features: there are lots of crappy piano players out there who might be better off learning a different instrument, but if you don’t have any other preference it’s hard to go wrong with the default choice. Git is one of those tools where it’s really not worth the trouble looking elsewhere unless you know what you’re looking for, because there’s value sometimes in going with the flow. Whether it was for the best that git was the tool we all settled on, I think there’s some good in the fact that we settled on something.


Well, it takes like 10 seconds to learn Subversion to a usable level. So you could argue that it is a superior system … everywhere it’s viable.


What’s the GitHub of Subversion?



I mean, GitHub supports svn clients but it’s git all the way down. Granted I haven’t used svn on GitHub but I doubt it’s treated as a first class citizen. This is really just a stepping stone to migrating your VCS to git.


This is the wrong question.

The right question is: why did git need GitHub in the first place (while svn didn't)?


svn had forges (SourceForge, RubyForge, etc.) that filled many of the same roles that GitHub does.


Mercurial


Which has 90% the same features as git, with 10% of the tooling and knowhow around it.

It legit may be a few % better for a few workflows, but why die on that hill?


It’s a better tool. Never seen someone complain about how to use mercurial. Complaints about how to use git are ubiquitous.


I used both, and I would complain about HG.


Complain “about” it sure. Complain about how hard it is to use? I wouldn’t believe you.


Sigh


Are you still trying to use HG? How goes that fight :)


Gave up. But not a day goes by I don’t get annoyed with git.


Weird. I've used it 20x a day for years, and haven't been even slightly annoyed in the last 5? years, since I learned all the syntax.


> Mercurial

Is that the same Mercurial that didn't even support stashing local changes?

Exactly what does anyone stand to gain by switching from Git to Mercurial?


Shelve shipped with mercurial 2.8 in 2013. Prior to that there were several third party extensions that did similar.

But it’s a good example of why mercurial was better. It’s more ergonomic. All of the parameters on shelve match what you’d see on other hg commands and to shelve and unshelve you use those commands, not pop and push. You can also give short names to shelves for use in the ui, not just the message or sha.
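To illustrate the ergonomics being described, a rough command sketch (spellings per the Mercurial shelve docs as I recall them; "api-tweak" is an invented shelve name):

```
# Mercurial: shelve/unshelve are ordinary commands with nameable shelves
$ hg shelve --name api-tweak
$ hg shelve --list
$ hg unshelve api-tweak

# Git: stash borrows push/pop vocabulary and stash@{N} references
$ git stash push -m "api tweak"
$ git stash list
$ git stash pop
```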


> Shelve shipped with mercurial 2.8 in 2013.

Not really. Mercurial's shelve extension had existed for a while, but it only made its way into Mercurial core, enabled by default, with the release of Mercurial 5.1.

https://www.mercurial-scm.org/wiki/ShelveExtension

I feel it's disingenuous to claim that a feature was provided by an application if it's only provided as an add-on that the user has to explicitly enable.


Why? It was a widely anticipated and announced feature.

The standard way mercurial added new features was to gate them behind configs and only turn them on by default after they were widely tested. This is a good thing!

Prior to shelve being included with the mercurial distribution, a third party extension was available going back to 1.x (or perhaps earlier; I started using it around 1.8). And get this: it also prioritized ergonomics and was largely just adapted directly.


> I feel it's disingenuous to claim that a feature was provided by an application if it's only provided as an add-on that the user has to explicitly enable.

Would you consider it disingenuous to claim that there's visual UIs for git? It seems weird to ignore extensions that are extremely well known within the community. Plenty of people recommend using Firefox because it supports better ad-blocking extensions, even though those are also "sold separately".


It's been a while since I used it, but as best I recall, Mercurial was a very close competitor to git, and in many cases had arguably better commands for accomplishing the same tasks. That is to say, less memorization of arcane patterns and more "oh, that makes sense". Git has gotten a lot better in this area since then.

Git stash however, is one place where it's just crap. `git stash apply stash@{3}` ?!?!? No. omg no. Sure, for "hey it works" but not for "hey let's have this be the standard interface that we give to everyone and never improve" After all these years we still can't say `git stash apply 3` ??? really?

also git stash is wildly divergent in terms of conventions relative to all the other commands.
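For what it's worth, a throwaway-repo sketch of the two addressing forms (and, if I recall correctly, Git 2.11+ does accept a bare index, even though the stash@{N} form remains the documented spelling):

```shell
# Scratch repo demonstrating the two ways of addressing stashes.
set -e
dir=$(mktemp -d)
cd "$dir"
git init -q
git config user.email demo@example.com
git config user.name Demo
echo base > file.txt
git add file.txt
git commit -qm "base"

# create two stashes
echo one >> file.txt
git stash push -q -m "first change"
echo two >> file.txt
git stash push -q -m "second change"

git stash list
git stash show 'stash@{1}'   # the verbose, documented form
git stash show 1             # bare index; accepted by newer git
```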


You could say the same thing about UNIX, and yet here we are.

Sometimes decent software accidentally becomes a ridiculously broad standard. You can't resist that force by dismissing it as what it was originally meant for.

It doesn't matter if git is the best fit. ~Nobody under the age of thirty is picking between git and anything else, any more than they're picking between Linux and anything else to run their single node/java/postgres process on their prod servers.


> One thing I like to ask is, does your project have millions of lines of code, being worked on by thousands of developers scattered around the globe? If not, you might want to think about whether Git is really the best fit for your needs.

This assertion makes no sense. Why would an application scaling beautifully automatically mean it doesn't fit your needs?

I mean, you don't even bother trying to argue that feature X is disappointing or feature Y is missing. Your whole point is that git scales, so if you don't need scale then... then what?

The truth of the matter is that Git works fantastically well both with personal static website projects with a dozen files and with million loc behemoths, whether in local repo only or with multiple remote repositories.


I'm not sure you can have this conversation in many medium to large sized organizations anymore. We retired our last SVN, Mercurial and Perforce repo years ago and migrated them to git; there is a new generation of software engineer in the industry who often literally has only ever known git. Rightly or wrongly, git has overwhelmingly won out in the VCS space for now.

I also agree "distributed VCS" like git may be more complexity than many teams need. I can't remember the last time I worked on a repo offline (arguably the signature feature of distributed VCS), although I can understand how this can be hugely helpful for some teams.


You're completely ignoring the massive adoption and extension of git by the industry. It's not like git is still developed exclusively in service of kernel dev.


Regardless of whether Git is the absolute best fit (personally I’d rather use BitKeeper), the ecosystem and tooling around git makes it the best choice for almost everything.


It does, but in a roundabout way. You can create a private branch, keep committing to it and when you're ready, merge it to the public branch with a single commit message.

IMHO, this solves all issues in a very elegant way - my local repository shows all the commits (both private and public) but everyone else's shows all the commits as intended by the developers. Additionally, the public history is maintained consistently without wondering if someone messed it up.

https://www.fossil-scm.org/home/doc/4d8aecdf/www/private.wik...
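As a sketch of that workflow in commands (based on the linked private-branch doc; flags from memory, so double-check against `fossil help commit` and `fossil help merge`):

```
# commits made with --private land on a private branch that never syncs
$ fossil commit --private -m "wip: experiment"
$ fossil commit -m "wip: more experimenting"   # stays private

# when ready, merge into the public branch as a single public commit
$ fossil update trunk
$ fossil merge private
$ fossil commit -m "Implement feature X"
```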


Very interesting, thank you! Unfortunately, the doc page doesn't make it clear that the merge would produce a single commit; I wish this were clearer.

Also, do I understand it correctly that it's a single branch named "private" per repo?


Squashing the whole development into a single commit will lose granularity though, and make it hard to review the changes later. You could just use quilt (patch queues) for development, though it's not optimal to have two tools.


How would you do code reviews then?

Git style mailing of patches?


Agreed. I wrote up some thoughts about the "Rebase considered harmful" article in one called "Rebase considered essential" <https://www.earthli.com/news/view_article.php?id=3864>

"Who decides where one level of granularity stops and the next begins? I think it’s the author of the commits. My workflow over the last ten years is based heavily on being able to massage commits so that I can prepare what I share to the server repository, where it can no longer be changed. I agree that there should be an unalterable history, but disagree with the author on where that history begins."

"My workflow is considerably different than it was before I used Git or had access to rebase. I would now be much less efficient if I didn’t have rebase. It would make me constantly focus on cleaning up commits before I really care to. You could make the argument that cleaning up afterward takes more time, but I haven’t experienced that to be the case. Instead, I want to be able to set the priorities rather than worry about committing something that I cannot undo."


Cleaning up commits can also make it easier for reviewers to audit your changes. For example by breaking up a larger feature into steps that don't warrant their own pull request because they are too granular. Or by separating changes that I maybe did only to make things look more neat.


re "Rebase considered essential" article: thanks, and I agree completely.


It works for SQLite because SQLite is tiny. A quick check[1] shows that they have just 1360 commits over the last year, 76% of them from just one author (and just five authors total!).

You don't need complicated branch refactoring tools when you're only sharing with a handful of people. It's like working with RCS back in the day: history is still linear and no one needs to worry about cross-ports between feature branches.

[1] Ironically done using a github mirror, because those are the tools I know and they're fast.


It’s not true that SQLite is tiny. I don’t know what portion of the test harnesses you’re seeing in the GitHub repo you looked at, but a portion of SQLite’s tests are proprietary. [0] The size of the SQLite test code is legendary, and exceeds that of the program code by an order of magnitude.

[0] https://www.sqlite.org/testing.html


OP was clearly talking about team size and change velocity, not code size.

5 contributors, with over 70% from a single person is indeed small.


That’s totally fair, but insofar as the number of commits matters at all, the number cited in the grandparent post for total commits is deceptive. It’s a fraction of the number of commits that team makes.


They're actually right, from my perspective.

It so happens that I usually commit squashed changes touching many files in a code base with a long and heavy legacy. I'm afraid those squashed changes lose their utility for bisection search for a bug.

Yet I'm required to do so. Mainly because I can.


The whole point of squash commits is to maintain bisectability while having breaking commits on your private branches.


You could also just filter down to merge commits when you look at history. I think if GitLab and GitHub supported this, many people would change their minds about squashing or rebasing.

I don't have a problem with squashing per se, especially when it's a private branch for a single developer. The problem is when you have multiple developers working on a feature branch together, or when you have to merge in a branch with a fix or feature that has not yet been merged up higher.

My bottom line is I want as few merge conflicts as possible, and I want to see the real author and a relevant commit message if I'm digging into changes to figure out what went wrong with something.
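The "filter down to merge commits" view is already available on the command line; a scratch-repo sketch (branch and commit names invented for the demo):

```shell
# Scratch repo: a noisy feature branch merged with --no-ff, then the
# mainline-only views of history.
set -e
dir=$(mktemp -d)
cd "$dir"
git init -q
git config user.email demo@example.com
git config user.name Demo
echo a > a.txt
git add a.txt
git commit -qm "C1: initial"

git checkout -qb feature
echo b > b.txt
git add b.txt
git commit -qm "WIP: try this"
echo c >> b.txt
git commit -qam "WIP: try that"

git checkout -q -
git merge -q --no-ff -m "Merge feature X" feature

git log --first-parent --oneline   # mainline view; WIP noise hidden
git log --merges --oneline         # only the merge commits
```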


Squashing or rebasing a shared branch would break it for basically everybody else, so who even does that? Realistically, squash & rebase is used only to clean up history in those private branches before it gets in the main repo. Fossil doesn't support it on the basis that those local commits are valuable to other developers who work on the code, and I disagree with this as a general claim.


> Squashing or rebasing a shared branch would break it for basically everybody else, so who even does that?

It tends to be developers that are used to the open-source workflow of forking before submitting a patch or pull request, who are now on a team where everyone's working on branches in the same repo. I understand wanting to combine a bunch of commits that say "trying it like this, trying it like that", but I find it's better to have the noise than to lose something important or cause a messy conflict later on.

I also don't like multiple changes in a single commit. I hate when I look into the commit history of a file and it's "added feature X" which touches 15 different files making changes to multiple code blocks in each. If I want that, I'll look at the merge commits.


Open source or not, you still have your local fork of the repo, and that's where the squash ought to be happening.


How do squash commits maintain bisectability? They do compress history, and granularity is lost, so one can bisect, but only over large batches of code (to be specific, as large as merges; if a team adopts small merges, the granularity is retained, but the number of branches may be untenable). One can keep the private (pre-squash) branches, but they won't be in the main branch history, so one can't bisect.


The sad reality is that testing and reviewing will only get us to a defect density of about one defect per 1000 lines of code changed (a nice round number for clarity of discussion, and we are lucky here, see below). E.g., if a patch changes or adds 300 lines of code, which is not an unusual amount, the probability of a defect is about 1/3. My changeset right now contains 320 files and 3200 lines added or changed (a fix for a long-standing defect in design). It is not green in testing yet, but you get the idea: it is quite probable that I will introduce some defect into the main branch. Three to one that I will, and that it'll pass review.

Usually, private branches get deleted after the merge happens.

In the end, I will have a situation where there is a defect in the main branch and no way to bisect it any narrower than to the squashed commit.

What should I do with 3200 changed lines in 320 files when there is a defect?

[1] https://stackoverflow.com/questions/4049958/embedded-softwar...

The page above hints that pre-review defect density for C/C++ code is 4.5 defects per 100 lines of code. After review it is 0.82 defects per 100 LOC, or 8.2 defects per 1000 lines. My big patch will introduce some 25 defects that pass review. Even a 10-fold decrease in defect density after testing (haha) will not significantly reduce the probability of introducing at least one defect.


I don't follow.

Either you can break your big commit down into pieces that can be applied one by one; in that case, do exactly that using history rewriting (and test each piece!).

Or that's not possible, e.g. because it's a global architecture change or something that is already atomic. Then there is nothing to bisect anyway; nothing can be done, no matter which tool you use and no matter whether you squash or not.
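A minimal sketch of the first case, splitting one mixed commit into separately applicable (and testable) pieces; the repo and file names are invented:

```shell
# Scratch repo: one big mixed commit split into two reviewable pieces
# via history rewriting (reset, then selective re-commits).
set -e
dir=$(mktemp -d)
cd "$dir"
git init -q
git config user.email demo@example.com
git config user.name Demo
echo base > base.txt
git add base.txt
git commit -qm "base"

# one big commit touching two unrelated files
echo feat > feature.c
echo fix > bugfix.c
git add feature.c bugfix.c
git commit -qm "big mixed commit"

# undo the commit but keep the changes, then re-commit piecewise
git reset -q HEAD~1
git add bugfix.c
git commit -qm "fix: narrow bug fix"   # this piece can be tested alone
git add feature.c
git commit -qm "feat: new feature"

git log --oneline
```

For intra-file splits, `git add -p` lets you stage individual hunks the same way.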


When a commit is squashed and rebased onto the main branch, there are no pieces to bisect.

Having private branches also doesn't always work. For example, there was a purge of unnecessary branches in our main repository. And if I use a fork of the main repository (and I do), my private branches are likewise not readily available to the team.


You don't have to squash it all down to one commit. Many people do, but it's a choice they're making.


There are merge protocols, and squashing was required in two of the four GitHub-based teams I've worked with.

The choice is made for you. Often enough it is not yours or theirs.


I've seen that, and even successfully fought against it once. I've also been on teams that don't treat it as a religious issue in the first place, and people rebase or merge from their private forks as they see fit.

Personally, I think that the only bar that is necessary is that commit history should remain readable - which is subjective, but that's why we have code reviews. In any case, this is not an issue with the tool, but rather with the religion that formed around it, which is often the case with software tooling.

One thing that helps a lot with a pragmatic consensus is to ensure that all members of the team have personal experience debugging old code that they didn't write or previously review.


This isn't a stylistic choice for Fossil; it's a technical one. In Fossil, all clones have identical artifacts (modulo private ones), so rewriting history by changing the set of artifacts creates a much harder synchronization problem.

Fossil DOES allow you to modify history, by adding new artifacts which retroactively change how previous artifacts are interpreted. They are called Control Artifacts and the set of things they can change are limited, but there's no reason additional ones cannot be added.
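A sketch of what that looks like in practice (the `fossil amend` flags as I remember them; `abc123` is a placeholder check-in id, so verify against `fossil help amend`):

```
# retroactively change how past check-ins are displayed, by adding
# new control artifacts; nothing is deleted or rewritten
$ fossil amend abc123 --comment "Clearer check-in message"
$ fossil amend abc123 --branch experimental
$ fossil amend abc123 --hide
```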


Apparently it does support squashing.

> Fossil does not support rebase. Here's an article from the author titled "Rebase Considered Harmful". While Fossil has the ability to squash merges, the primary workflow supported is merging.


Rebase Considered Harmful article is here: https://fossil-scm.org/home/doc/trunk/www/rebaseharm.md


In my opinion, the worst part of the article is that it only shows the simple cases of feature-branch history.

In particular, they show the following comparison:

Merge:

            -> C3 -> C5 -> C7
           /           /
  C1 -> C2 -> C4  -> C6
Rebase:

            -> C3 -> C5    -> C3' -> C5'
           /             /
  C1 -> C2 -> C4  ->   C6

However, a more realistic example of a feature branch that lives for even a few days looks like this:

Merge:

            -> C3 -> C5 -> C7 -> C9 -> C11 -> C13 --------> C15
           /           /           /                    /       \
  C1 -> C2 -> C4  -> C6 -> C8 -> C10 -> C12 -> C14 -> C16 -> C17 -> C18

The final history after rebase would be:

                                                                -> C3' -> C5' -> C9' -> C13' 
                                                              /                          \
  C1 -> C2 -> C4  -> C6 -> C8 -> C10 -> C12 -> C14 -> C16 -> C17 --------------------------> C18

A much, much simpler history to follow, and obviously shorter.

And the complexity without rebase grows even more if more people are doing local merges and pushing to the same remote branch.


I feel like you inadvertently confirmed the author's point. In your "complicated" example, in order to visually show the baseline after the rebase, you removed the earlier C3, C5, C7 and C9 commits. But the author's point is that merge commits just need better tooling that can make it clearer what the baseline is!


It's git that removed them, not me: rebase re-writes history, that's the whole point. They simply do not exist in the repo anymore after the rebase operation is committed.

In fact, following the same merge history as in the first example, I should have named the final ones C3''', C5''', and C9', since they get re-written several times. But, most of the time, this is entirely irrelevant to their history (especially when they don't even touch the same files as the changes being merged in).

Note that C7, C11 and C15 completely disappeared, since they are unnecessary. Instead of being extra commits clogging up the log, they are the history-rewrite events that don't need to be recorded.


> Note that C7, C11 and C15 completely disappeared, since they are unnecessary. Instead of being extra commits clogging up the log, they are the history-rewrite events that don't need to be recorded.

In my experience, there's no such thing as an "unnecessary" merge. Every merge commit encodes tree changes from the various merge strategies. Explicit merge commits keep those (mostly) separate from deliberate developer changes and make it easy to trace back a subtle merge bug or the wrong merge strategy.

Git's modern merge strategies today feel almost like magic and "never mismerge", so I don't blame anyone for never having had to trace through merge commit history for subtle mismerges today, and assuming in all cases the merge strategies "just do the right thing". But it still happens, those subtle, terrifying moments when the magic breaks down in subtle ways and the merge tree output isn't what you expect or need.

From my perspective, trusting rebases and squashes and cherry picks is putting sometimes way too much trust in git's various merge strategies. I do trust them, and rely on them for a lot. But I've also seen the dark sides of them: the need to find mismerge needles in large haystack branches, the gut wrenching low level micro-management of rerere caches to avoid similar mistakes in the future. Personally, I'd much rather have a million "unnecessary" merge commits than ever again need to mentally "bisect" a mismerge from a deliberate choice in some long gone developer's hand cherry picked and squashed commit where all the merge details are mixed in with all the other code changes and no clues where the dividing lines were.


Git can trivially rebase out those merges, but that raises the question of why they were done in the first place.

I can imagine that sometimes a feature depends on changes from develop after the start of the feature branch. This means that development started too early.

Cherry-picking my changes on to a new branch will hide that fact. Rebasing will conserve the authoring timestamp. I guess it depends on the specific situation what's best.

In a trunk-based development workflow, I'd have to merge asap and maybe hide things behind feature flags or compile-time switches to avoid such merges.

Things get even more interesting when the merges from develop affected files that were changed on the feature branch. You will likely get conflicts if you attempt to rebase them away.


Multiple features are normally developed in parallel. It's good practice to keep up to date with the latest develop/master to avoid surprises at later stages.

Rebasing your work onto the head of master before merging it in, versus cherry-picking the relevant changes from your branch onto the head of master, are perfectly identical in terms of end result - whichever workflow is easier for you will achieve the same thing.

And of course you can get conflicts if multiple people are developing over the same files - this will happen regardless of using merge, rebase, or cherry pick (or even patches over email). You fix the conflicts when merging / rebasing / cherry picking / accepting a patch.


> They simply do not exist in the repo anymore after the rebase operation is committed.

They do if yet another tag or branch still links to those commits. And in practice they'll stick around for a while anyway, accessible through the reflog.


Sure - I should have said instead that they do not exist in the main branch's history anymore.


IMHO, both Git and Fossil are wrong here.

Fossil requires users to either commit stuff they don't want to see later, or hack away until finished, cherry-picking changes as necessary. This isn't a good workflow for everybody.

Git allows users to do anything they want, but your Linux master branch may not be the same as Linus Torvalds' master branch, and history can be changed at any time, while merge commits can be completely rewritten, losing the actual merge information.

Both of these are wrong.

There should be a way to make WIP changes without committing it nor committing to it. (Yes, those are different things.) But there comes a point at which a change should be committed and committed to, to never change again. (With the exception of the nuclear option of erasing accidentally committed secrets and the like.)

Fossil is too rigid. Git is too flexible.

There is a middle ground. My design for such a VCS would have two kinds of commits: saves and full commits. Saves would be WIP stuff; anything goes. They would also live on auto-generated branches. Once the WIP stuff is done, it could be rebased, squashed, merged, and/or cherry-picked into actual full commits on the branch that the saves branched off of.

But once those commits have been created, they are there forever. They should be a commitment. (I love the double meaning of the word "commit" here.)

With this design, people could have their cake and eat it too: they could have WIP stuff that they can still massage into a history they like, but then they would also have solid, reliable history with proper merge commits.
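A rough approximation of this "saves vs. full commits" idea with stock git (branch and message names invented; a purpose-built VCS would automate the scratch branch):

```shell
# Throwaway save-commits on a scratch branch, then one squash-merge
# back onto the base branch as the commit you actually commit to.
set -e
dir=$(mktemp -d)
cd "$dir"
git init -q
git config user.email demo@example.com
git config user.name Demo
echo base > f.txt
git add f.txt
git commit -qm "base"

git checkout -qb wip-scratch      # the "saves" branch
echo s1 >> f.txt
git commit -qam "save: try approach A"
echo s2 >> f.txt
git commit -qam "save: fix typo"
echo s3 >> f.txt
git commit -qam "save: works now"

git checkout -q -                 # back to the base branch
git merge -q --squash wip-scratch
git commit -qm "Add feature, fully described"
git log --oneline
```

The difference, of course, is that git lets you keep rewriting the "full" commits afterward, which is exactly what this design would forbid.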


There will always be a need for fully rewriting history, e.g. secrets that were accidentally pushed, code that was included in violation of copyright, etc.


My post explicitly called out a nuclear option for such cases. Even Fossil has such an option; it's called "shun."


Squash and rebase are doing it wrong.


why do you care about my 10 "wip" commits on a feature branch? Why am I not allowed to package a change in a nice set of commits once it's done/reviewable?


Not them, but I would hope the messages are better than "WIP" and expose context, vs a commit that simply says "Implemented X feature". I usually leave those for a tag.


If you don't have commits called WIP you're not committing enough. You're doing it wrong.

I kid, I kid! But I do commit just to push a back up before getting on a train, or a convenient point to diff against as part of a refactor, etc.

They should all be a single commit by the end.


I like seeing small commits, mine and other peoples. It means when something's broken I can easily narrow down the cause to a single small change and either unbreak it myself or point the appropriate team/person at it, making it a quick and easy fix.

If squashing happens then all I can immediately say is "something in this 600 line changeset over 12 files for Feature X broke it". The bug report for that one is going to be more vague, get allocated more story points, and maybe stay in the backlog for several sprints (or forever).

If people are pushing their feature branches and not deleting them that makes life a bit better, but generally people who want their git history to be "clean" and squash commits also want to get rid of old branches.

Someone might say code review should have caught it. Code review almost never catches actual bugs. Or tests? Tests only test stuff that the author expected to break.

The main things I try to ensure are that every commit is small, every commit has a useful message, and no commits should break the build.


Squashing and rebasing doesn't have to produce one gigantic commit. It can take, say, 20 "WIP" commits (many of which might not even build), and organize them into 5 commits that constitute logical units with descriptive messages. But raw WIP commits themselves are rarely so organized, unless you invest a lot of time and effort into that while coding.

Personally, I prefer my coding to be free of such distractions, and to use SCM facilities as a scratchpad for fast iteration on the code (the ability to easily reverse changes etc); this results in many commits that don't make sense in the final pull request / code review.


I've heard of people doing this, but never seen it in real life. Do each of your "logical unit" commits build and run properly on their own? I'm interested in the concept, if there's a tool/process which makes that easy to achieve - but it sounds like possibly more work than just making good small commits to begin with.


I also do this. I prefer the various checks all pass, although I am likely to temporarily break lint when "make the change" and "make it pass lint" are independently obviously correct but look messy when composed. It is very rare (and I'll loudly call it out in the commit message) that a similar thing applies to types or tests and I'm more likely to actually disable the relevant test than suppress lint I'm fixing in the next commit in the same PR (although on reflection maybe I should be doing that).

The bulk of my PRs are still one small commit, but when a feature gets larger I find that "what's the next thing to try to solve the problem" isn't always the same question as "what will make this most readable for a reviewer." I tend to address presentation of the commits in the PR at about the same time I'm addressing readability of the resulting code, as they wind up being related concerns (though not exactly the same thing).


It really depends on how you choose to define the granularity. I prefer my commits to all be buildable at least, and ideally also pass the tests (so new code + new tests for it should be in the same commit); but that is subjective and there are valid arguments for smaller commits that are not necessarily isolated like that.

I can't think of any special tools you'd need for this? A git history visualizer with the ability to easily diff and squash ranges helps, but this is nothing special for pretty much any modern IDE. At the end of the day, you just look through the WIP branch history and identify chunks that span multiple commits but logically represent a single change towards some goal.


"I prefer my commits to all be buildable at least, and ideally also pass the tests (so new code + new tests for it should be in the same commit); but that is subjective"

In my view, it is not subjective. All commits locally must pass unit, integration and e2e tests, and be locally tested by the developer, every...single...one.


Right, OK - pretty manual then. I was envisaging some sort of n-way diffing tool which simulates several staging areas and changes could be moved around line by line, maybe a button to run a build + tests for each in parallel. Gets difficult as soon as code in one of the middle commits needs a bit of extra tweaking to work though, I suppose.

I'll have to try it sometime, I have a feeling it wouldn't normally turn out too differently to what I end up with anyway. I do think all commits should build and run properly - otherwise git bisect and similar processes stop being useful.
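For the "run a build + tests for each" part, git can already re-run a check against every replayed commit in a range; a scratch sketch (the check here is a trivial stand-in for a real build/test command):

```shell
# Scratch repo: re-run a check for every commit in a range, which
# automates "does each logical commit pass on its own".
set -e
dir=$(mktemp -d)
cd "$dir"
git init -q
git config user.email demo@example.com
git config user.name Demo
echo 1 > n.txt
git add n.txt
git commit -qm "c1"
echo 2 > n.txt
git commit -qam "c2"
echo 3 > n.txt
git commit -qam "c3"

# Replay the last two commits, running the check after each one;
# the rebase stops at the first commit where the check fails.
GIT_SEQUENCE_EDITOR=: git rebase -i --exec 'test -s n.txt' HEAD~2
git log --oneline
```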


I think usually my issue is with context. I know the benefit of squashing, but I feel it gets abused and you lose a lot of context.

WIP commits on a function/method could be squashed into a good commit that explains what and why. But some people implement a full feature and squash all of it into a single commit that simply says 'implemented Y'. Then you lose context on why they modified the different functions/methods.

Yes, you can try to find it, but there's no context so you're just dealing with a huge block of code that modifies other things, doesn't just add new code.


I second the "you're doing it wrong" note. The use of a tool that permits easy squashing and rebasing opens up new ways of doing development where you can commit and checkpoint at any moment, cheaply. So e.g. when you inevitably break something, you can bisect to where it happened, or when you find you've added a mismatched assumption you can see where the thought process went wrong.

You don't need to have things tidy, you know you can recover anything you do. So likewise when you're "done" you can squash and split them into changes that make more sense logically before pushing to some shared tree that someone else is going to have to reason about.

People who develop without tools like this tend to do it in a big flat directory and think about "commits" as something done every day or so. Once you get beyond that style, it feels really clumsy.
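The "commit cheaply, tidy up before sharing" workflow described above can be sketched with plain git. This is a minimal, self-contained demo in a throwaway temp repo; the file names and commit messages are made up, and `git reset --soft` is used here as the scriptable equivalent of an interactive-rebase squash:

```shell
# Sketch of "checkpoint constantly, squash before sharing".
# Runs entirely in a throwaway temp repo; names and messages are made up.
set -e
tmp=$(mktemp -d); cd "$tmp"
git init -q repo && cd repo
git config user.email me@example.com
git config user.name me

echo base > f; git add f; git commit -qm "base"

# Cheap checkpoint commits while hacking:
for i in 1 2 3; do echo "step $i" >> f; git commit -qam "checkpoint $i"; done

# Before pushing, collapse the checkpoints into one logical commit.
# (git rebase -i does this interactively; reset --soft is the scripted way.)
git reset --soft HEAD~3
git commit -qm "feature: add steps 1-3 as one logical change"

git log --oneline    # two commits remain: the squashed feature and "base"
```

The end result is the same history a reviewer would want to see, while the checkpoints still protected you from losing work along the way.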


I think that preferences along these lines are inherently subjective, and some people may well really be better served by the more traditional workflow where every commit is something you commit to. I just wish this wasn't presented as the one and only right way to do things, to the point where software actively resists any other.


Isn't that backwards though? Git supports "rare commits that you commit to" just fine. It's Fossil that refuses to support the "commit rapidly and frequently on a whim and fix up a submission later" idiom.


That's exactly my problem with Fossil.


> So likewise when you're "done" you can squash and split them into changes that make more sense logically before pushing to some shared tree that someone else is going to have to reason about.

Which keeps context which is my point.

The other case that I see is squashing it all into one that simply says 'Implemented feature Y' which doesn't provide any context into why something was changed.


--amend


You can't --amend remote, sorry.

If you don't back up your code to a remote, even if it's not finished, you're taking a serious risk.


What does remote storage have to do with branch management? Push to your own branch! That's the core idea behind the git development paradigm, and on sites like GitHub it's as easy as clicking a "fork" button.


Because remote storage is either my fork of the code I work on, or my private repo in which I develop my own software.

Of course it's a branch other than master/main, even if it's on "remote".

The thing is, I do not always continue developing on a single/same system, so I do "transfer" commits. I push unfinished code to its own branch on remote, pull from other system and continue developing.

When the code is complete and passes all the tests, I merge either to development to prepare for the next version, or to master if the utility is small enough.

The place where the development branch lives doesn't matter, and --amend ing a single commit during a feature development is not the most correct way either.

Oh, and I don't use GitHub for my own software. That part is over.


Many companies like the idea of working with a single remote where individual developers prefix their branches with their username. The main reason is that the workflow is much simpler with only one remote endpoint. Having only one remote makes continuous synchronisation much easier (you don't need both "upstream" and "origin"). There are other benefits to this: If someone gets sick, you can find their work in the same place, and you can more easily pass over a branch to someone, since everyone shares remote workspace.

The drawback of one remote endpoint is that it becomes less obvious whose rules apply. You could argue that other people can mess up your remotes, but it isn't until you get to very large organisations or public development (e.g. FOSS) that you should need to deal with adversarial behavior.

I've had bosses who got angry that I force-pushed to my own remote branches because they liked to review code without being requested by pulling the branch, and after a force-push that doesn't work. But my defense was always: Let's establish a protocol on how to cooperate; there is surely some way we both get what we want, and that git supports.


Well... you can force push to a remote with sufficient privs, but you really don't want to do that if you can help it.


A remote can be your personal remote endpoint, or it can be on a shared remote endpoint, but prefixed with your username. As long as you `--force-with-lease` to a feature branch you control and agree on a protocol with potential collaborators, the harm is minimal.
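The `--force-with-lease` guard can be shown in a minimal runnable sketch. Everything below is hypothetical: a throwaway bare repo stands in for the shared remote, and the point is that the push succeeds only because the remote ref is still where we last saw it (a collaborator's intervening push would make it refuse, where plain `--force` would clobber them):

```shell
# Sketch: --force-with-lease on a branch you control.
# "shared.git" is a stand-in for the shared remote endpoint.
set -e
tmp=$(mktemp -d); cd "$tmp"
git init -q --bare shared.git
git clone -q shared.git me && cd me
git config user.email me@example.com
git config user.name me

echo v1 > file; git add file; git commit -qm "wip"
git push -q origin HEAD              # publish the unpolished commit

git commit -q --amend -m "feature: tidy final message"   # rewrite history

# Succeeds: origin's branch matches our remote-tracking ref.
# If someone had pushed meanwhile, this would refuse instead of clobbering:
git push -q --force-with-lease origin HEAD
```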

I mostly avoid creating these kinds of conflicts when people's limited git experience would cause stress or unnecessary use of people's time fiddling with the history. Or if more than one person is doing stuff independent of one another.

I've had colleagues that frown on rebase, and people who can't not.

One of the beauties (and complexities) of git is that it allows for this diversity of workflow.


Why should I abuse git while I can commit my developments part by part with nice descriptions?


Because you may want to back up your work remotely before you have the time to write something nice. I had to leave halfway through writing a unit test today. My colleague asked me to back up the code by pushing it. And now I'm back, fixing that test, writing that commit message. Most likely we'll merge right after, so my colleague probably won't need to add more, so the branch is in my control.


I generally work on a single feature over many commits. Sometimes I need to leave the thing half finished, so commit as is with a nice commit message detailing the state.

At other times I make "transfer commits", since I'll continue development on another machine and need the latest snapshot of the code to continue.

Not all features fit into a single new function, we need to move mountains to rearrange stuff, and to keep code tidy.

As long as the commit messages are clear, and the code works at the end, it's alright.


>Not all features fit into a single new function, we need to move mountains to rearrange stuff, and to keep code tidy.

Of course. But when modifying another, you can commit and say,

Implemented function X. Modified Y to accommodate this new argument that's used on X... and so on

>As long as the commit messages are clear, and the code works at the end, it's alright.

I'm fine if the commit messages are clear. The problem is that when squashing, some people don't keep the commit messages clear, or even squash everything into one that doesn't provide any insight or clarity, 'Implemented feature Y', and you get a diff of thousands of lines that touches everything.


> Implemented function X. Modified Y to accommodate this new argument that's used on X... and so on

Sometimes I need to write "Implemented function X, but evaluates the result wrong possibly because of this. Fix this first, then continue".

> The problem is when squashing, some people squash...

A single commit that touches the whole codebase and only says "Bug fix" (or similar) is bad. I concur.

Also, the commit history should have enough granularity to allow bisection and partial rewinds to understand problems and other side effects.


What message do you use for the commit you make after finishing for the day? Me going home usually does not coincide with any feature or change being completed.


Completed function <A>. Started function <B>

Code for function <B> has the skeleton. Still need to add <...>


> why do you care about my 10 "wip" commits on a feature branch?

If that's how a team develops PRs, then the commits are structurally suboptimal, and the problem is in the development practices.

Such a team won't be able to efficiently bisect, independently of the repository structure.

I personally work with granular, self-standing commits, and with this workflow, non-squashed commits makes sense.

Obviously it's not possible to always have self-standing commits, but it is possible the vast majority of the time.

It takes a lot of practice and discipline, though, and if a team is not willing to put those in, then of course squashing is the only way that makes sense.


Why would I want to bisect on a feature branch?

Edit: I strive to keep my feature branches both short (# of commits) and light (# of lines changed).


After a feature branch is merged, its content is candidate for any bisect.

If the branch was squashed, bisecting will be less effective compared to the same branch merged without squashing (on the condition that the commits are self-standing).


Do you find that you often have authoritarian takes, or just in matters pertaining to software development? If the latter, I suggest you pause and reflect on that. There is no one right way to do things, different workflows work for different people. Tools which accommodate the most number of people will be the most popular.


SQLite also doesn't use CVS, SVN, or Perforce.

If this title were "SQLite uses Fossil", it'd be more apparent that it's just another ad for Fossil.


There is an element of clickbait, but for many young developers Git is the only VCS they have ever known, so Git in the title gives more hints about the article's content.


As a younger developer, I agree with you. I've heard of Subversion & Fossil before but know practically nothing about them. By the time I hit the stage of learning version control in my development as a programmer, there was no questioning git would be what I would learn, any discussion I saw of other version control systems was written with a tone that suggested they were antiques to no longer be used. Sort of how psych majors learn about Freud for the historical significance but don't actually use his theories anymore.


Also because the mention of "fossil" would have meant nothing to these young developers. The use of "git" immediately brings to mind version control and the question: what does sqlite use then?


On the other hand, the internet isn't just for young developers, and even if there are a billion other VCS options to pick from, the ones with the most traction are the most relevant. If you put "I can do Fossil and Subversion" on your CV but everyone is hiring for Git, you're not a great match.


The author likes to have this slightly edgy tone, as far as I understood by reading a couple of his posts.

He likes to "engage" his audience in a way.


Which is not something I hold in high regard.


I personally don't like this style of writing and/or discussions either, but wanted to point out the fact in the most neutral way I can manage.

From my observations, some people think that they can extract more arguments or have a more fruitful discussion by being slightly spiky like that. I do not share this view.


Agreed. Still, Git is too popular, and Fossil is very interesting.


Richard Hipp (SQLite guy) spoke about the thinking behind writing Fossil instead of using Git on Corecursive episode 66.

If I remember correctly, it wasn't that they had any real problem with git, but that git didn't really fit with how they wanted to work, so they created their own thing.

https://corecursive.com/066-sqlite-with-richard-hipp/


Fun fact: Fossil uses sqlite. And sqlite uses fossil. Think about that.

SQLite is what I consider a well-engineered product.

And Richard Hipp is an excellent engineer. Listen to his podcast episodes on the changelog. It's fun. His mentality is interesting.

https://changelog.com/person/drh/podcasts#feed


It has bootstrapped its VCS!


The single file repository is super convenient for one person projects, but just a word of caution to fossil users - backup the repo file often. A while back I downgraded the fossil binary (by accident) and committed a change to a local repo which caused it to be corrupted. It was a repeatable thing, and was annoying at the time, but not a showstopper. I only lost a day's worth of work. I wish fossil had done a version check between the binary and the repo to refuse to alter the repo if the binary version was incompatible.


I use Fossil at home since it's just me. The single file approach takes up less space than the comparable git tree.

Fossil has the ability to push to git, which I do periodically just for curiosity.


Your dev environment is so resource constrained that the size of the repo is a factor?


There are a lot of things which make Fossil attractive for solo and small developers.

An example would be the excellent diffing tools accessible through localhost, as well as things like issue tracking, which git doesn't understand as a concept.

There were some deal-breakers for me (kind of esoteric) and I switched back to git, but not without some annoyance.


The web interface is one of the main benefits in my opinion, especially if you develop on wsl2, on a server, or inside a container.


Some of us like to optimize for footprint, and that's OK.

I run an OrangePi Zero with 512MB of RAM as a home server since it both works and is a fun experience.


On Windows git performance isn't the greatest due to the sheer number of tiny little files (NTFS is absolutely awful). If you have a lot of small commits the number of tiny files grows and grows and grows and git can feel really slow compared to something like fossil. Especially if you're regularly copying your repo from one filesystem/machine to the next (without compressing it into something like a .zip first).


You can't run "git gc" sporadically on your repo?


The issues with git performance on Windows are a lot more complicated than the number of files.

The gc command doesn't help, and git automatically runs it after some operations.


I have 11809 files in my repo. Opening a git tool (GittyUp) on this repo takes minutes but seconds on Fossil (on a Pi4 with USB3 NVMe).

I use Windows + Linux machines + Pi and share between them so don't use the Pi4 exclusively but have noticed a difference in its ability to handle and open the repo.


At times a single large file is easier to handle than many small files.

Ever try and copy the .git part of a git repo? When it gets big, the copy gets bogged down by the sheer number of files.


If you do a git clone locally (yes, git clone works with local paths) you can get all git objects packed into a single .pack file. The pack file may also deduplicate unchanged portions of tracked files. You can even do a shallow clone to reduce the size (and history) of the pack file.

All of this is also possible to do on a repository locally, without cloning.
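Both points can be seen in a short runnable sketch (throwaway temp repo; file names made up). One caveat worth noting: a plain local-path clone hardlinks objects by default, so `--no-local` is used here to force the clone through the normal transport, which is what produces a pack:

```shell
# Sketch: loose objects vs. a single pack file. Throwaway temp repo.
set -e
tmp=$(mktemp -d); cd "$tmp"
git init -q src && cd src
git config user.email me@example.com
git config user.name me
for i in 1 2 3; do echo "$i" > "f$i"; git add "f$i"; git commit -qm "add f$i"; done

# Fresh commits are stored as individual loose files under .git/objects/.
# A local clone with --no-local writes them all into a pack instead:
cd "$tmp"
git clone -q --no-local src packed
ls packed/.git/objects/pack/   # a *.pack file plus its *.idx index

# The same consolidation can be done in place, no clone needed:
cd src && git gc -q
```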


With regard to rebase, and Fossil vs. Git: when you have a small team of trusted developers, rebase is harmful because you lose information that could help you understand the _why_ of something. In contrast, if you are working with thousands of untrusted developers, you need a way to increase the signal-to-noise ratio at the cost of some granularity.

This comes about from their development styles (see: http://www.catb.org/~esr/writings/cathedral-bazaar/).

- Use Fossil if you prefer the cathedral style of development. SQLite is open source but does not accept contributions.

- Use Git if you prefer the bazaar style of development. You want to accept hit-and-run style contributions from anyone.


I don’t understand the dislike of rebase — do you want hundreds of commits that say “linter fix” and “reformatting”?

If there are commits like “fixed off by one issue in refactor” then it should be a comment in the code so that future explorers won’t make the same mistake.

Rebase by itself isn’t bad, it just has to contain a good description of what was done and why.

Edit: after posting I realize that I confounded rebase and squash — which is kind of ingrained in my head as that’s how I use GitHub to merge PRs.


They are better than one single commit with 100s of changed files. Especially when you are debugging things and want to know why exactly some line was changed in that way and by whom.


To be honest, I hate Git. The commands don't really make sense outside some basic stuff. For certain organizational setups it seems like overkill. I am not a full-time programmer but I do programming for my job and for personal projects sometimes and I never enjoy using Git.


It's not intuitive, but it's an amazing tool that has stood the test of time for decades. I used to struggle with Git, but after putting in a little extra time and studying it more, now I feel like it's a crucial tool in my toolbox and I can use it for much more than _just_ storing my code. Git helps me understand what my colleagues wrote, gives me incredible control over breaking my work apart, and gives me tools to troubleshoot issues (bisect).


After moving to Git from Perforce ~5 years ago, I still miss P4V’s Timeline view so much when trying to understand when and why some code changes happened.

It's like a kind of blame + log in a single UI, in Git terms, and is just wonderful for quickly moving through the entire history of a file.


This doesn't really address any of the GP's points.

There is no consistency in the language or syntax of the commands when taken as a whole. Even individually some of the commands don't really make any sense and harken back to seemingly random unix CLI flags. "I wasted enough time to memorize them" doesn't mean they couldn't be cleaned up and made a little better.

If you're a single developer git is indeed overkill (not that there's a much better alternative).


I keep hearing that Mercurial does all of this and is more ergonomic, which begs the question as to why it never captured mindshare the way git did.


> why it never captured mindshare the way git did.

If nothing else, network effects. Linux uses git, github is popular.


How much it matters for the average repo is debatable, but git is considerably faster than hg.


Git is part of the official doctrine of the Cult of Torvalds, anything else is heresy.


It should reduce its features and collapse options to one way to do each thing. Undoing a mistake should be a first-class citizen; I've seen lots of gymnastic commands here, and they won't work for all cases.


Important note mentioned here is that sqlite doesn't accept external contributions and has a very small team. The Fossil folks and the sqlite folks are also essentially the same team. Fossil is a boutique VCS which essentially only exists for sqlite.


...and that also has certain consequences for Fossil's design and workflow; namely, an aversion to the "rebase"-style development [1]. A project with a handful of known contributors could reasonably use Fossil; for a project with dozens of contributors, which also wanted to be able to accept "drive-by" contributions, Fossil would be unworkable.

[1] https://fossil-scm.org/home/doc/trunk/www/rebaseharm.md


Exactly. The Tcl Core Team (TCT) uses Fossil.


> I still believe that a squash+merge (or sometimes, rebase) workflow pairs better with pull requests

Unpopular opinion, but I find the value of having the actual surrounding context in which the code was developed more valuable than having a flat history.


It depends on that context, if they're the end of day "work in progress" or "typo" commits, I don't see the value in retaining those.

A non-linear commit history is much more difficult to run a bisect on which was enough to sell linear for me.


The bisect might be easier to do, but it's vastly less useful if you're looking for the why and not the what, since the commits are not actually the author's originals and the state of the project you find is not the state the author authored them within.


For us only the original authors squash and tidy up their own commits so they can add as much context as they want.

That's not to say each PR is one commit, if there are multiple distinct parts of a PR that could be cherry picked in a way that they make sense on their own they'd still be separate commits.


> That's not to say each PR is one commit

This thread is about squash merge, which converts the whole branch into a single commit.


`git bisect --first-parent` is just as easy as a bisect on linear history. (It just follows the "linear" line down merge commits via their first parent, which by convention should be their "base" parent.) Plus it gives you the option to bisect "inside" a branch if you need to inspect something "low level" once you've narrowed down the "branch" based on merge commit. I think if `--first-parent` were the default to git bisect a lot more people would care a lot less about "linear commit history". But it is still an easy flag to apply yourself.


Typo commits might be really useful, especially if you can bisect a bug to one of those commits. It happened to me.


> I find the value of having the actual surrounding context in which the code was developed more valuable

But only if those commits are real atomic changes of code which compiles, and not a series of 'oops', 'tests pass once again!', 'forgot to commit this file' inner-monologue history.

And then there are shops where less technical authors contribute solely via the GitHub web UI. So you get commits of the ilk 'Updated <Filename>' or 'Files added via upload'.


I'm not sure this is too unpopular. In our groups not rebasing seems to be winning.


Author confuses SQLite's development model with Fossil's at the end, the latter welcomes contributions: https://www.fossil-scm.org/home/doc/trunk/www/contribute.wik...


I think it's just ambiguously written. The way I first read it, I thought the author was indeed referring to SQLite.


I’d use Fossil more for personal work if I wasn’t hopelessly addicted to magit.


I feel like the (outside-in) discussions of magit/orgmode are a stealth campaign for emacs adoption. Whenever I experiment with a vim/neovim migration it always seems... underwhelming, and I chalk it up to 'bad port'.


pretty much :)

There are lots of people who use emacs for those two things and other editors/IDEs for other things.

I use emacs for more than those two, but certainly not for everything.


Anyone use Fossil for professional / team development?

I'd love to hear some real-world experiences.


I use it for payment MSP software (Python based) that I license to a couple of companies. I'm mostly solo on this project, and switched off GitHub and git about two years ago. I prefer a "merge only, if possible, please" workflow in git, so Fossil was pretty easy. Fossil does not have the ubiquitous IDE support Git does, so if you are an IDE lover, it may not fit well. Also, for solo use autosync is nice - commits are automatically pushed.

Like SQLite, Fossil is very nice software.


The sound of it from the article reminds me somewhat of https://trac.edgewall.org/


The SQLite people not using git or hg is like the Bell Labs people and three-button-mouse heavy sam/acme: when you’re right about everything 99/100 times, that 100th time you’re really going to double down on some dumb-ass thing.

In what possible world is telling the most successful set of design decisions about document management in the history of documents or management to get fucked not just “I’ve got goodwill to burn, let’s ride.”? SQLite is indeed badass, but cool it djb, I’m already stressing about the zero bugs ever thing.


What might look like a most successful set of design decisions for you, can be viewed as a set of unfavorable tradeoffs by somebody else.

To squash or not to squash is a never ending debate precisely because of that. There isn't a right way to do it, each approach has a set of tradeoffs.

Maybe they are reinventing the wheel, but at their scale it's a tiny decision that doesn't mean that much. If they find out they are wrong, they can choose to migrate to Git instantly.


They aren't burning any "goodwill" by choosing to use something different or odd. If you feel goodwill being burnt, I think you're taking their choice in VCS too personally.

SQLite doesn't solicit contributions from programmers outside their organization, so it's really no skin off your back.


Programming existed before git. Programming will exist after git.


Maybe? Technically we don’t know anything about a free software world without Linus in charge other than TMI about Stallman right?


Do you know SQLite is a few years older than Git?


Indeed. I started paying serious attention to SQL in 1994 once I had read Stroustrup so many times I could recite it aloud, when did you get into databases?

Edit: I’d like to apologize for mouthing off like that. I’ve earned some amount of cranky old-timer cred, but not enough to just be a dick, which that was.


Assuming by "the most successful set of design decisions about document management in the history of documents or management" you meant Git, I see what Fossil does as an iteration on those ideas, not so much telling them to "get fucked." Recognizing that things like issue queues are commonly used together with version control repos, and then integrating the former more tightly with the latter, is a step forward rather than a complete refutation.


> three-button-mouse heavy sam/acme

Acme is great. I haven't been able to find another editor that integrates as tightly with the operating system. The closest I've gotten is Kakoune. And I wish mouse chording were a thing more generally. If we're going to use mice, we should optimize their utility.


https://www.nbcnews.com/technolog/how-fast-fast-some-pro-gam... claims that effective APM can hit 600, that sounds high to me but not by a lot. 10hz on the keyboard is probably not a realistic sustained rate for a hardcore vi/emacs user.

I easily hit 4hz at the end of a 20 hour coding bender, and I’m old: the hot shit kids on modafinil must be doing 1.5x my decrepit ass.

If you can really move on the keyboard? Touch the mouse? I’ll grab a coffee while I’m at it.

Edit: It turns out that “fast emacs” doesn’t turn up a video of a real pro easily on Google. You can see a code God like Russ Cox being slow and clumsy as fuck if you want: https://youtu.be/dP1xVpMPn8M.

It’s a good thing he gets everything right on the first try (not sarcasm, Cox is an alien life form optimized for doing the impossible) because if he didn’t he’d finish pound-defining everything by about next March.


Alt title - SQLite doesn't Git it

Great read :D



