
Linus’s rules on keeping Git history clean (2009) - jjuliano
https://www.mail-archive.com/dri-devel@lists.sourceforge.net/msg39091.html
======
stinos
When I first read the often-repeated advice _NEVER change public history_ or
something along those lines I was like 'yeah, duh'. My opinion on that
changed a bit though, and I reckon this advice follows the standard pattern of
good guidelines being falsely wrapped as strict rules which must never be
broken.

In practice: I only work in smaller teams and we use feature branches which
get merged into master after review, the net effect being that pretty much
no one is ever working directly on master. So if it happens that someone
pushes a commit only to figure out 2 days later that it has a typo, or needs a
small code change, or anything else that would improve the commit with trivial
changes not really worth another commit, we just go ahead and
fixup/rebase/force push, i.e. rewrite public history. Since the rest of the
team always does pull --rebase of master anyway and/or rebases feature
branches, this is not a problem at all.
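
A minimal sketch of that fixup-and-force-push flow, run in a throwaway sandbox. The branch name `feature`, the file `notes.txt` and the local bare repository standing in for the shared remote are all hypothetical, and `--force-with-lease` is swapped in here as a more cautious variant of a plain force push; the comment above does not say which form the team uses.

```shell
set -eu

# Throwaway sandbox: a bare repo stands in for the team's shared remote.
work=$(mktemp -d)
git init -q --bare "$work/origin.git"
git init -q "$work/dev"
cd "$work/dev"
git symbolic-ref HEAD refs/heads/feature   # hypothetical branch name
git config user.name "Example Dev"
git config user.email "dev@example.invalid"
git config commit.gpgsign false
git remote add origin "$work/origin.git"

# Publish a commit that contains a typo.
echo "teh quick fix" > notes.txt
git add notes.txt
git commit -q -m "Add notes"
git push -q origin feature

# Two days later: fix the typo by amending, which rewrites the pushed commit.
echo "the quick fix" > notes.txt
git commit -q -a --amend --no-edit

# A plain push is refused because the remote branch cannot fast-forward.
if git push -q origin feature 2>/dev/null; then
  plain_push=accepted
else
  plain_push=rejected
fi

# --force-with-lease overwrites the remote branch only if it still points at
# the commit we last saw there, so a teammate's newer push is not clobbered.
git push -q --force-with-lease origin feature

local_tip=$(git rev-parse HEAD)
remote_tip=$(git ls-remote origin refs/heads/feature | cut -f1)
echo "plain push: $plain_push"
echo "local tip:  $local_tip"
echo "remote tip: $remote_tip"
```

Anyone else who already fetched the old commit still has to rebase or reset onto the rewritten branch, which is the part the team convention above takes care of.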

~~~
davnicwil
The question is though, why would you even bother changing public history,
even if you can work around the practical problems?

In my view, the concept of a commit mapping exactly to a functional change,
and therefore being able to be correct or incorrect, improved, etc, is going
against the grain of what revision control is. A commit just is what it is. If
it contains a typo, a bug, etc, you notice and fix it 2 days later and that's
another commit. Git just describes what happens. What is the utility in
pretending that didn't happen and rewriting the history of changes as if you
never made that mistake? Who benefits?

If you are concerned about keeping master 'stable' so that checking out any
commit will result in a clean, working codebase, you can use abstractions on
top such as tags to point out to people which commits are good and/or bad.

I get the idea of a stable, neat git history as though you were all knowing
and perfect is comforting, but it's also nonsense and trying to attain that is
just wasted effort. Just let git describe what actually happened, yes it's
chaotic, yes there is constant rapid iteration, mistakes made and corrected
etc, but that's just the process of building stuff. That's the reason you
shouldn't rewrite history. There are pragmatic exceptions, though, like
writing out egregious errors like committing security keys that can't be
quickly changed.

~~~
doubleunplussed
I'm convinced that the obsession with rewriting history is solely due to
inadequate tools. Git doesn't keep the name of a branch after it's merged, so
people want to make merges look like a single commit on top so that they don't
face this ambiguity. GitHub doesn't even display the branching structure in its
commit log, which also shows a woefully small number of commits per page,
further incentivising squashing/editing. Many tools (some are better than
others) display commit history in a similarly non-dense way or in a way that
implicitly discourages branching in some way, e.g. gitk doesn't even display
commits from other branches by default. Large numbers of commits are also
unwieldy when commits are hashes that cannot be ordered mentally just by
looking at them.

Over in mercurial land people are more likely to keep history, even though
history rewriting there is not only equally powerful but also safer than in
git, thanks to the 'evolve' extension. We can limit our bisecting to a single
branch, such as a
stable branch or the default (mercurial parlance for 'master') branch,
skipping over commits in feature branches that have been merged in. We can do
this because the branches retain their identities post-merge. The most widely-
used tool, tortoisehg, displays large numbers of commits densely, with the
full tree structure and branch names on display by default. Commits can be
referred to via their hash or by a simple incrementing integer (which is only
valid on your local clone, but still, this makes things easier for local
work).

So we keep all those typo commits - they're usually in feature branches anyway
since we don't merge until features are done and we try to keep the default
branch functioning. If a merge breaks something, we bisect on the default
branch only, which will tell us which merge commit broke it.

I'm still sad that git won the VCS wars over mercurial.

~~~
dahart
> I'm convinced that the obsession with rewriting history is solely due to
> inadequate tools.

The article at the top literally said explicitly never rewrite public history,
so what obsession are you talking about exactly? Git has what you want as long
as you don’t mistake local operations before push as “history”, and instead
only consider history to be commits that have been shared with other people.
That makes more sense anyway, there’s nothing sacred to preserve in the
arbitrary, noisy sequence of things I did while I was bumbling around on my
machine _before_ I push.

Git was designed with a toolset that shows every commit and lets you clean up
your own work before you contribute it to public history. Its tools work well
when you understand git’s design and use it the way it was intended. Git is
not Mercurial, though, that’s true. Perforce isn’t Mercurial either.

Git can limit bisect to a single branch, and normally does skip branches until
you want to descend into them. Don’t confuse losing the branch name with
losing the branch, git doesn’t lose the branches, only the names, and only if
you delete the names.
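
A small sketch of what "skipping branches" looks like in practice, using `git log --first-parent` in a scratch repository. The branch names, file names and the `commit_file` helper are made up for illustration. Newer Git releases (2.29 and later, to my knowledge) also accept `git bisect start --first-parent`, which walks the same mainline-only path.

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/trunk   # hypothetical mainline name
git config user.name "Example Dev"
git config user.email "dev@example.invalid"
git config commit.gpgsign false

# Hypothetical helper: write one file and commit it with the given subject.
commit_file() {
  echo "$2" > "$1"
  git add "$1"
  git commit -q -m "$2"
}

commit_file base.txt "trunk: initial"

# A feature branch with a noisy "fix typo" commit.
git checkout -q -b feature
commit_file feature.txt "feature: first attempt"
commit_file feature.txt "feature: fix typo"

# Mainline moves on, then the feature is merged with an explicit merge commit.
git checkout -q trunk
commit_file other.txt "trunk: unrelated change"
git merge -q --no-ff -m "Merge feature" feature

# Deleting the branch name does not delete the branch's commits.
git branch -q -d feature

total=$(git rev-list --count HEAD)
mainline=$(git rev-list --count --first-parent HEAD)
echo "all reachable commits: $total"
echo "first-parent commits:  $mainline"

# Mainline-only view: the merge appears as one step, feature commits are hidden.
git log --first-parent --format=%s
```

The feature commits are still reachable through the merge's second parent, so a plain `git log` shows all of them even though the name `feature` is gone.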

~~~
doubleunplussed
I'm talking about attitudes in general and not disagreeing with the post.

I agree with the advice never to rewrite public history, and I totally agree
with Linus's approach. He is in the minority with this attitude though, since
never rewriting public history means never doing a squash merge and never
rebasing a merge/pull request at merge time (both of which are common
practice). I suspect even people who endorse the idea of never rewriting
public history kind of don't think of the fork from which a pull request is
coming as 'public' even if it literally is.

I love the kernel's "keep-all" approach and want more people to use it, I bet
if they did the tools would improve to actually work better with that style -
whereas right now I think the tools are driving the workflow instead.

~~~
dahart
> right now I think the tools are driving the workflow instead.

Okay that's fair, I think that's true. To some degree that has to be true for
it to matter which tools you use, right? Even if it's Mercurial.

I haven't personally seen squash merges and rebasing pull requests being used
on pull requests of large multi-person branches very commonly, are you saying
that's common? I agree that there's common practice of using squash merges and
rebasing on private branches, or branches that contain commits by only a
single person and contain only code commits.

I'm looking for clarity, not disagreeing with you. The 'principled' argument
for never using rebase is almost always attacking the branching practices of
individuals and not teams. There is definitely a fuzzy line around pushing to
your own branch that is visible to others but that nobody else touches. I'd
normally consider that case private, not public, even if it's "literally"
public.

I don't feel like I'm hearing what the tangible advantages of never modifying
history are. Why is history considered more sacred than clarity of semantic
intent? People make mistakes and noise, a _lot_, why shouldn't the tooling
allow fixing mistakes and cleaning up irrelevant noise after the fact, as
long as it doesn't affect others?

Edit: I'm realizing another conceptual line to draw beyond what makes history
"public": the question is one of whether you're going to rewrite history out
from underneath other people. If not, and you're the only person affected,
then you made the local history in the first place, there's no principled
reason to prevent you from updating your own work, because it's equivalent to
making the same change before committing. If your rewrite is modifying commits
that other people already have, then you're inflicting damage on other people.
You may cause them to have merge conflicts, you may be modifying code
dependencies they're working on but haven't pushed, it's bad for very
practical reasons. Using this lens of what other people depend on, does that
help clarify your examples of squash merges and rebased pull requests?

~~~
regularfry
> I haven't personally seen squash merges and rebasing pull requests being
> used on pull requests of large multi-person branches very commonly, are you
> saying that's common?

I have. Github makes it quite easy to fall into this.

> Why is history considered more sacred than clarity of semantic intent?
> People make mistakes and noise, a lot, why shouldn't the tooling allow
> fixing mistakes and cleaning up irrelevant noise after the fact, as long as
> it doesn't affect others?

I've got a concrete example of where it causes problems: code reviews. If
you've reviewed a branch at a specific commit, and standard practice is to
squash merge into master, or to otherwise allow rebases after the review
point, you lose the confidence that what's on master is actually what was
reviewed. I've seen cases where people got into the habit of getting reviews
done, then doing a squash rebase _locally_, and including tidy-up commits
which had never been seen by anyone else before merging straight into master.

If you're in an environment where the rule is that Everything Must Be
Reviewed, that's a problem: it's far too easy for an accidental bug to end up
on master despite the code reviews and the automated tests on the preceding
branch being green.

With the example above, I never would have seen the problem unless I'd been
trying to use the git history to measure some statistics about how long it was
taking us to get code reviews done. It was only because I was looking at the
history commit by commit that it jumped out.

~~~
dahart
That's a good example, IMO, and yeah it should be very much frowned on (or
outright disallowed) to modify an approved code review before pushing without
further review. That is kind-of a code review workflow problem, more than a
discussion of whether rebase should "never" be used though, right?

The company I work for now has both notifications for commits in code reviews,
so everyone sees if you modify something after it has been approved, and some
repos also have lockdown features where the approved review is tagged and
cannot be checked in if modified. So this can be solved with some tooling
around code reviews, and git itself doesn't exactly add up to a modern code
review toolset. This may be as much or more of a Github problem than a git
problem... acknowledging that there's a large swath of developers that doesn't
really know the difference between them.

~~~
regularfry
It's a bit of both. If you don't have a strong "thou shalt not rebase"
culture, it can be difficult to get people to accept the inconvenience of
getting re-reviews on the branch they've just committed a typo-fix to, so you
end up leaning on more complex tooling to force the issue.

------
ptsneves
My workflow consists of working on some topic locally and committing in the
smallest topic units possible. Then as I make some fixes due to typos or plain
wrong logic I just amend or fix up. The longer I stay branched out, the bigger
the likelihood of having a lot of fixup commits. Then I use this awesome magic:
git rebase somehash^ -i --autosquash. It reorders all my fixups automatically,
so it is very painless to consolidate everything before pushing. Maybe it is
well known, but this fixup/autosquash thing changed my productivity a lot.
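
For reference, a minimal sketch of the fixup/autosquash flow described above, in a scratch repository. The branch name, file names and commit subjects are hypothetical. `GIT_SEQUENCE_EDITOR=true` is used only so the example runs unattended: it accepts the generated todo list as-is instead of opening an editor.

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/topic   # hypothetical branch name
git config user.name "Example Dev"
git config user.email "dev@example.invalid"
git config commit.gpgsign false

echo "base" > base.txt
git add base.txt
git commit -q -m "base"

# First topic commit, containing a typo.
echo "helo" > a.txt
git add a.txt
git commit -q -m "add a"
target=$(git rev-parse HEAD)

# More work lands on top before the typo is noticed.
echo "b" > b.txt
git add b.txt
git commit -q -m "add b"

# Record the typo fix as a fixup of the earlier commit.
# This creates a commit whose subject is "fixup! add a".
echo "hello" > a.txt
git commit -q -a --fixup="$target"

# --autosquash moves the fixup next to its target and marks it to be folded in.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash "$target^"

subjects=$(git log --format=%s)
echo "$subjects"
```

After the rebase the branch holds three commits, with the typo fix folded into "add a"; the separate "fixup!" commit no longer appears in the log.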

~~~
yebyen
I don't think this feature is very well-known, in fact I've seen it mentioned
before, thought "how useful! I will absorb this into my brain now" – and
promptly forgot about it.

I think there is a critical mass of advanced git features required to be
really fluent in git, and there is a sizeable fraction of everyday git users
who simply haven't made it all the way there yet. Some teams have at least one
person who has enough git features under their belt...

To recognize when merge conflicts are being haphazardly made unnecessarily, to
be really particular about the shape of the git commit history and be aware of
what merges can make those conflicts, to occasionally show the rest of the
team some of these tricks or bail someone out when things go haywire, but at
least in my experience spreading all of this knowledge to the rest of the team
is a long slow process.

------
kazinator
If you want clean git history, you must burn "git merge".

No commit should ever have more than one parent.

~~~
favorited
That might be clean, but it's not history.

------
ahaferburg
I'm not very experienced with git. I use `git rebase` to pull changes from a
remote into my private repo in order to prepare pushing my changes. My
understanding of rebase is that it takes a commit history with one parent and
replays them all onto a different parent. So I don't understand how rebasing
would destroy history. Could someone explain what the issue is?
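
A small sketch that may help with this question, using a scratch repository with hypothetical branch and file names. It shows that a rebase does not move commits: it creates new commits with new IDs, and the old ones drop out of the branch's history. That is harmless for commits only you have, and it is the source of trouble when someone else already built on the old IDs.

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/trunk   # hypothetical mainline name
git config user.name "Example Dev"
git config user.email "dev@example.invalid"
git config commit.gpgsign false

echo "base" > base.txt
git add base.txt
git commit -q -m "base"

# Local work on a topic branch.
git checkout -q -b topic
echo "topic" > topic.txt
git add topic.txt
git commit -q -m "topic work"
before=$(git rev-parse topic)

# Meanwhile the mainline gains a commit, as if pulled from a remote.
git checkout -q trunk
echo "upstream" > upstream.txt
git add upstream.txt
git commit -q -m "upstream work"

# Replay the topic commit onto the new mainline tip.
git checkout -q topic
git rebase -q trunk
after=$(git rev-parse topic)

echo "topic before rebase: $before"
echo "topic after rebase:  $after"

# Same change, different commit: the old ID is no longer part of the branch.
if git merge-base --is-ancestor "$before" topic; then
  old_commit="still in branch history"
else
  old_commit="no longer in branch history"
fi
echo "old commit is $old_commit"
```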

------
d_burfoot
In my opinion source control history is much overrated. Instead of documenting
your diffs, you should just document the code itself.

Every developer has some limited time budget for writing documentation. If you
spend lots of time and frustration trying to get the git history correct, you
have less time left over for actual code documentation.

> See? All the rules really are pretty simple.

Hmmmm.

~~~
viklove
I'm guessing you've never had to find the source of a bug via git bisect.

------
throwamay1241
'Commit to master, that way merging is somebody else's problem'

------
sgt
I bet a lot of people will find this advice completely confusing. Emailing a
patch?

~~~
Gorgor
Even if not many developers use an email-based workflow for their own work, I
think it’s pretty well-known that the Linux kernel does.

~~~
Tharkun
I suspect you're overestimating the number of developers who know or use
Linux. Many developers use Git on Windows and don't know anything about Linux,
let alone about the workflow of the kernel team.

~~~
u801e
The git project uses the same workflow for development that the Linux kernel
does. That is, using a mailing list to submit and discuss commits before
merging them into the maintainer's repository.

~~~
Tharkun
Sure. But many users of git don't know (or care) about the workflow of the git
team. They know github. Maybe gitlab. Or bitbucket. They could go their entire
development careers without ever sending a patch as an email, instead relying
on feature branches and pull requests.

This could be completely alien to them, and it seems wrong to downvote sgt for
pointing that out.

------
yters
Is this all about making it possible to do delta debugging with git commits?

------
AstralStorm
Please add (2009) next time.

------
deepsun
Isn't that obvious? How else could it be?

~~~
nas
Look at the cpython history. There is a major change from Mercurial to Git.
It's not that Mercurial doesn't support the "rebase" workflow. It does
discourage it though. Personally, I much prefer the "rebase your WIP changes,
avoid merges" style. It is much easier to work with the resulting history.

~~~
Svoka
I never got this argument. What's the point of "clean history"? History is a
log of work, not something meant to be a nice read.

~~~
rtpg
"Commits ordered by when I figured out a bug" has a lot less value than
"Commits where changes are grouped by semantic logic".

Stuff like git rebase doesn't work so well if you have a bunch of WIP busted
commits as well.

Maybe you don't commit that often, but people I work with (and myself) commit
pretty often so it's easy to have just outright mistaken git commit messages
like "fix X" followed by "actually fix X". Going through a rebase to have "fix
X" mean "fix X" will be great for the future debugging session with a git
blame.

~~~
stonogo
On the other hand, "fix x" followed by a couple lines of rationale, then
"actually fix x" with a discussion of why the previous fix was wrong is more
useful to someone trying to understand x.

~~~
rgoulter
I think the discussion is valuable. I think adding those insights to a wiki
'Pitfalls' page or whatever is valuable.

Off the top of my head, the cases where I'm looking at Git logs are:

1. Code Review. Most of my time reviewing code is spent looking at a diff. But
obviously one of "fix" and "actually fix" is redundant. A clean history also
helps if I want to focus in on one of the commits.

2. Annotation/Blame. If I'm debugging through some issue and looking at older
changes, it's nice if coupled changes are in the same commit.

A warts-and-all history has some advantages over rewriting git history (e.g.
you could find patterns of where "actually fix" happens and try and improve
those), but rewriting history makes the log a better communication tool.

------
hanniabu
If you're working with a group of people you should never git pull prior to a
commit and always git push -f

~~~
juliusmusseau
Joking aside, "git pull -r" is pure magic in the face of upstream rewrites. It
always does the right thing:

https://mergebase.com/doing-git-wrong/2018/03/07/fun-with-git-pull-rebase/

After I wrote that blog post someone told me the reason for the magic is the
use of "git merge-base --fork-point" under the hood.

~~~
yebyen
`pull --rebase` is nothing, go back and read the history of `rebase
--preserve-merges` and now `rebase --rebase-merges`

I have worked with both, and the new one is much better. Of course this only
matters if the branch you are rebasing has merges in it (so in all likelihood,
you must be a release manager to need features like this)

Between --rebase-merges and --onto, I hardly spend any time fixing up bad
merges anymore.

