In practice: I only work on smaller teams, and we use feature branches that get merged into master after review, with the net effect that pretty much no-one ever works directly on master. So if it happens that someone pushes a commit only to figure out 2 days later that it has a typo, or needs a small code change, or anything that would improve the commit with trivial changes not really worth a separate commit, we just go ahead and fixup/rebase/force-push, i.e. rewrite public history. Since the rest of the team always does pull --rebase on master anyway and/or rebases feature branches, this is not a problem at all.
In my view, the concept of a commit mapping exactly to a functional change, and therefore being something that can be correct or incorrect, improved, etc., goes against the grain of what revision control is. A commit just is what it is. If it contains a typo, a bug, etc., you notice and fix it 2 days later and that's another commit. Git just describes what happens. What is the utility in pretending that didn't happen and rewriting the history of changes as if you never made that mistake? Who benefits?
If you are concerned about keeping master 'stable' so that checking out any commit will result in a clean, working codebase, you can use abstractions on top such as tags to point out to people which commits are good and/or bad.
I get that the idea of a stable, neat git history, as though you were all-knowing and perfect, is comforting, but it's also nonsense, and trying to attain it is just wasted effort. Just let git describe what actually happened: yes it's chaotic, yes there is constant rapid iteration, mistakes made and corrected, etc., but that's just the process of building stuff. That's the reason you shouldn't rewrite history. There are pragmatic exceptions, though, like wiping out egregious errors such as committed security keys that can't be quickly changed.
Over in mercurial land people are more likely to keep history, even though history rewriting there is not only equally powerful but, via the 'evolve' extension, safer than in git. We can limit our bisecting to a single branch, such as a stable branch or the default branch (mercurial parlance for 'master'), skipping over commits in feature branches that have been merged in. We can do this because branches retain their identities post-merge. The most widely-used tool, tortoisehg, displays large numbers of commits densely, with the full tree structure and branch names on display by default. Commits can be referred to by their hash or by a simple incrementing integer (which is only valid on your local clone, but still, this makes local work easier).
So we keep all those typo commits - they're usually in feature branches anyway since we don't merge until features are done and we try to keep the default branch functioning. If a merge breaks something, we bisect on the default branch only, which will tell us which merge commit broke it.
I'm still sad that git won the VCS wars over mercurial.
GitLab and Fisheye usually display the graph of branches & merges very well.
Also, I have this wonderful git alias:
lg = log --color --graph --pretty=format:'%Cred%h%Creset -%C(yellow)%d%Creset %s %Cgreen(%cr) %C(bold blue)<%an>%Creset' --abbrev-commit --date-order
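In case anyone wants to try it, a sketch of registering that alias from the shell (repo-local here so nothing leaks into your global config; the throwaway repo and demo identity are just for illustration):

```shell
set -e
# throwaway repo just to demo the alias
tmp=$(mktemp -d) && cd "$tmp" && git init -q .
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial commit"
# register the alias (use `git config --global` to make it permanent)
git config alias.lg "log --color --graph --pretty=format:'%Cred%h%Creset -%C(yellow)%d%Creset %s %Cgreen(%cr) %C(bold blue)<%an>%Creset' --abbrev-commit --date-order"
git lg
```

Git splits alias values with shell-style quoting, so the single quotes around the format string survive the double-quoted shell argument intact.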
The article at the top literally said explicitly never rewrite public history, so what obsession are you talking about exactly? Git has what you want as long as you don’t mistake local operations before push as “history”, and instead only consider history to be commits that have been shared with other people. That makes more sense anyway, there’s nothing sacred to preserve in the arbitrary, noisy sequence of things I did while I was bumbling around on my machine before I push.
Git was designed with a toolset that shows every commit and lets you clean up your own work before you contribute it to public history. Its tools work well when you understand git’s design and use it the way it was intended. Git is not Mercurial, though, that’s true. Perforce isn’t Mercurial either.
Git can limit bisect to a single branch, and normally does skip branches until you want to descend into them. Don't confuse losing the branch name with losing the branch: git doesn't lose branches, only the names, and only if you delete the names.
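For what it's worth, the mainline-only view looks like this (throwaway repo and names below are just for the demo; `git bisect start --first-parent` is a real option since git 2.29):

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp" && git init -q .
g() { git -c user.name=demo -c user.email=demo@example.com "$@"; }
echo base > base.txt && git add base.txt && g commit -qm "base"
git checkout -q -b feature
echo f > feature.txt && git add feature.txt && g commit -qm "feature work"
git checkout -q -                        # back to the default branch
g merge -q --no-ff --no-edit feature     # merge commit keeps the side branch intact
# mainline only: the feature's individual commits hide behind the merge
git log --first-parent --oneline
# bisect can follow the same rule (git >= 2.29):
#   git bisect start --first-parent
```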
I agree with the advice never to rewrite public history, and I totally agree with Linus's approach. He is in the minority with this attitude though, since never rewriting public history means never doing a squash merge and never rebasing a merge/pull request at merge time (both of which are common practice). I suspect even people who endorse the idea of never rewriting public history kind of don't think of the fork from which a pull request is coming as 'public' even if it literally is.
I love the kernel's "keep-all" approach and want more people to use it, I bet if they did the tools would improve to actually work better with that style - whereas right now I think the tools are driving the workflow instead.
Okay, that's fair; I think that's true. To some degree it has to be true for it to matter which tools you use, right? Even if it's Mercurial.
I haven't personally seen squash merges or rebases used very commonly on pull requests of large multi-person branches; are you saying that's common? I agree that it's common practice to use squash merges and rebasing on private branches, or on branches that contain commits by only a single person and contain only code commits.
I'm looking for clarity, not disagreeing with you. The 'principled' argument for never using rebase almost always attacks the branching practices of individuals, not teams. There is definitely a fuzzy case: pushing to your own branch, which is visible to others but which nobody else touches. I'd normally consider that case private, not public, even if it's "literally" public.
I don't feel like I'm hearing what the tangible advantages of never modifying history are. Why is history considered more sacred than clarity of semantic intent? People make mistakes and noise, a lot, why shouldn't the tooling allow fixing mistakes and cleaning up irrelevant noise after the fact, as long as it doesn't affect others?
Edit: I'm realizing another conceptual line to draw beyond what makes history "public": the question is one of whether you're going to rewrite history out from underneath other people. If not, and you're the only person affected, then you made the local history in the first place, there's no principled reason to prevent you from updating your own work, because it's equivalent to making the same change before committing. If your rewrite is modifying commits that other people already have, then you're inflicting damage on other people. You may cause them to have merge conflicts, you may be modifying code dependencies they're working on but haven't pushed, it's bad for very practical reasons. Using this lens of what other people depend on, does that help clarify your examples of squash merges and rebased pull requests?
I have. Github makes it quite easy to fall into this.
> Why is history considered more sacred than clarity of semantic intent? People make mistakes and noise, a lot, why shouldn't the tooling allow fixing mistakes and cleaning up irrelevant noise after the fact, as long as it doesn't affect others?
I've got a concrete example of where it causes problems: code reviews. If you've reviewed a branch at a specific commit, and standard practice is to squash merge into master, or to otherwise allow rebases after the review point, you lose the confidence that what's on master is actually what was reviewed. I've seen cases where people got into the habit of getting reviews done, then doing a squash rebase locally, and including tidy-up commits which had never been seen by anyone else before merging straight into master.
If you're in an environment where the rule is that Everything Must Be Reviewed, that's a problem: it's far too easy for an accidental bug to end up on master despite the code reviews and the automated tests on the preceding branch being green.
With the example above, I never would have seen the problem unless I'd been trying to use the git history to measure some statistics about how long it was taking us to get code reviews done. It was only because I was looking at the history commit by commit that it jumped out.
The company I work for now has both notifications for commits in code reviews, so everyone sees if you modify something after it being approved, and some repos also have lockdown features where the approved review is tagged and cannot be checked in if modified. So this can be solved with some tooling around code reviews, and git itself doesn't exactly add up to a modern code review toolset. This may be as much or more of a Github problem than a git problem... acknowledging that there's a large swath of developers that doesn't really know the difference between them.
Eh? This is trivial to change by specifying the "--no-ff" option to `git merge`, or by setting the config option "merge.ff" to false.
You can use convention to store this information, e.g. the first parent is always master, or you can put the info in the merge commit message. But it's hacky.
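A rough sketch of that convention (`merge.ff` is a real config key; the throwaway repo and branch names are just for the demo):

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp" && git init -q .
git config merge.ff false      # every merge records a merge commit, even fast-forwards
g() { git -c user.name=demo -c user.email=demo@example.com "$@"; }
echo base > base.txt && git add base.txt && g commit -qm "base"
git checkout -q -b feature
echo f > f.txt && git add f.txt && g commit -qm "feature work"
git checkout -q -
g merge -q --no-edit feature
# the default merge message preserves the branch name, and the first parent
# is always the mainline, so the convention is queryable later:
git log -1 --format=%s
```

Without `merge.ff false` this merge would have been a fast-forward and the branch shape (and its name) would have vanished from history.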
In mercurial the branch name lives on forever, attached to those commits whether they are merged or not. "Closing" a branch in mercurial is just a hint that it isn't going to be used for the time being and so shouldn't be listed in tools that list branches, but doesn't actually remove the label from previous commits. So the commit history has the branch names still after a merge. This way you can say "show me all commits in the master branch" (as distinct from "show me all ancestors of the tip of the master branch") and this will exclude feature branches, and is ideal for bisecting.
In our repos, we allow neither, so all non-merge commits are, by definition, on a feature branch on the right-hand-side.
I guess one person's hacky convention is another's primary workflow. ¯\_(ツ)_/¯
And what about maintaining a release branch? There are good reasons to be merging in both directions sometimes, even if you never commit directly to master.
We do, simply because it's less cognitive overhead to read log/blame output when it's less chaotic. This doesn't mean that there's zero chaos in the commit history of course. But less than when we'd just never fix simple mistakes right away.
Rebasing on master/stable/release branches is another story altogether.
It’s probably fine on a very small team, but pull with rebase doesn’t protect you from merge conflicts when someone rebases master underneath you. This is why rebasing public history is, and should generally be, considered an anti-pattern.
It was the best example of an exception to the rule I could think of, but personally, my threshold for whether rebasing master is ever allowed might be closer to: 'no if the team is larger than 3 people; otherwise try not to, think twice, and ask everyone first.'
In Linux, iirc, there's a lot more sharing code via email and the like, peer-to-peer. In the rest of the world, it goes through a central repository.
Anyway, a slightly more flexible rule: it's fine to rebase history and force-push as long as it's on your own branch, or as long as everyone working on the same branch knows it's about to happen and acts accordingly. If they don't, though, you're going to have a bad time.
The most important thing is to be conscious of what effect your actions will have on others. And of course, never do it on master - and if for some reason you have to, take a good look at what caused the problem in the first place. It's good practice in most projects to lock down master in whatever repository host you use (e.g. github) and only allow changes via pull requests.
(That said, it's still a good rule, and unless you know what you are doing, you should follow it.)
When you really understand the implications of changing public history, you will also know when you can break the rule.
Are you using git in an open source project, or is this proprietary software within a single company?
I think there is a critical mass of advanced git features required to be really fluent in git, and there is a sizeable fraction of everyday git users who simply haven't made it all the way there yet. Some teams have at least one person with enough git features under their belt to recognize when merge conflicts are being created haphazardly and unnecessarily, to be really particular about the shape of the git commit history and aware of which merges can cause those conflicts, and to occasionally show the rest of the team some of these tricks or bail someone out when things go haywire. But at least in my experience, spreading all of this knowledge to the rest of the team is a long, slow process.
No commit should ever have more than one parent.
Every developer has some limited time budget for writing documentation. If you spend lots of time and frustration trying to get the git history correct, you have less time left over for actual code documentation.
> See? All the rules really are pretty simple.
This could be completely alien to them, and it seems wrong to downvote srg for pointing that out.
Basically you can serialize your git history as patches, transmit them any way you like (including email), and seamlessly re-integrate them as fully complete git commits.
I sometimes use this even when working only with myself, i.e. if I want to test a change on multiple machines before I'm ready to push. Then I can use normal git workflow to rebase, etc.
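A minimal sketch of that round-trip (`format-patch` and `am` are the real commands; the repo names, file, and identities here are made up for the demo):

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q sender && cd sender
g() { git -c user.name=alice -c user.email=alice@example.com "$@"; }
echo one > notes.txt && git add notes.txt && g commit -qm "first change"
echo two >> notes.txt && g commit -qam "second change"
# serialize the last two commits as mailbox-format patch files
git format-patch -2 -o ../patches >/dev/null
# ...transmit them however you like (email, scp)... then on the other machine:
cd .. && git init -q receiver && cd receiver
git config user.name bob && git config user.email bob@example.com
git commit -q --allow-empty -m "receiver init"
git am -q ../patches/*.patch     # re-integrated as full commits, author intact
git log --format="%s (%an)"
```

Note that `git am` preserves the original author and message; only the committer identity is the receiver's.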
It might not be useful for you or your projects, but that's up to you. Linux is the other side of the coin though - they NEED to be able to read a commit from ten years ago, see what was changed (the diff), why it was changed, and by whom it was changed and signed off. Take a look at their repository, e.g. via github for ease of access. Here's a random commit: https://github.com/torvalds/linux/commit/e1e54ec7fb55501c33b.... Even with my complete lack of knowledge of kernel development, I can tell that the minor code change is backed up by a lot of reasoning and intent.
But, it's a wholly different use case. Most applications I've worked with are basically webapps that will be thrown away within five years, where it's not as important.
For me, it's all about understanding context of a change, the "why" (commit message) not "what" (code comments).
I can run regression tests to find where things broke and, with the messages, have a good understanding of the context and mindset the developer was working in.
In my editor, I can select some code and click "show history for selection" to get a complete log of what happened on those lines. If the commit messages are good, I'll have an understanding of the context of the change.
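Git's command-line analogue of that editor feature is `git log -L` (a real option; the file and commits below are a throwaway example):

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp" && git init -q .
g() { git -c user.name=demo -c user.email=demo@example.com "$@"; }
printf 'alpha\nbeta\ngamma\n' > notes.txt && git add notes.txt && g commit -qm "add notes"
printf 'alpha\nBETA\ngamma\n' > notes.txt && g commit -qam "shout beta"
# full history of just line 2 of notes.txt, diff included for each commit
git log -L 2,2:notes.txt --format=%s
```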
Missing commit messages usually result in "I don't remember" type emails from the author when I inevitably ask them why they made a change.
Stuff like git rebase also doesn't work so well if you have a bunch of busted WIP commits.
Maybe you don't commit that often, but the people I work with (and I) commit pretty often, so it's easy to end up with outright mistaken commit messages like "fix X" followed by "actually fix X". Going through a rebase so that "fix X" actually means "fix X" will be great for the future debugging session with git blame.
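That tidy-up before pushing can look roughly like this (`--fixup` and `--autosquash` are real flags; the repo is a throwaway, and the sequence editor is stubbed out so the demo runs non-interactively):

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp" && git init -q .
g() { git -c user.name=demo -c user.email=demo@example.com "$@"; }
echo base > base.txt && git add base.txt && g commit -qm "initial"
echo one > f.txt && git add f.txt && g commit -qm "fix X"
fix=$(git rev-parse HEAD)
# "actually fix X" becomes a fixup commit with the message "fixup! fix X"
echo two > f.txt && g commit -qa --fixup "$fix"
# fold it into "fix X" before pushing; GIT_SEQUENCE_EDITOR=: accepts the
# generated todo list unchanged, mimicking what you'd do interactively
export GIT_SEQUENCE_EDITOR=:
g rebase -q -i --autosquash "$fix~1"
git log --format=%s   # just "fix X" and "initial", with both changes folded in
```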
From the top of my head, the cases where I'm looking at Git logs are:
1. Code Review. Most of the time when I'm reviewing code I'm looking at a diff. But obviously one of "fix" and "actually fix" is redundant. A clean history also helps if I want to focus in on one of the commits.
2. Annotation/Blame. If I'm debugging through some issue and looking at older changes, it's nice if coupled changes are in the same commit.
A warts-and-all history has some advantages over rewriting git history (e.g. you could find patterns of where "actually fix" happens and try and improve those), but rewriting history makes the log a better communication tool.
I think it’s maybe the difference between emotional truth and literal truth. The emotional truth is the valuable one so rebasing (which doesn’t mean squashing to one commit!) can mean you can clean out the noise and get something valuable.
A pre-condition for git bisect working is that each commit must run well enough to successfully test for whatever behavior change is being tested for. Otherwise, you'll identify some commit where the code went from working to not working, but if it's just some idiotic typo instead of actually the change that you are looking for, you've lost.
I'd rather have a git bisectable history that correctly reflects a steady progression of the product than one that records every typo some developer made for posterity. That I typo'ed a variable name in a version that never shipped to anybody and then had to commit "FIXUP" is not useful information. A clean commit that changes behavior, but is subsequently revealed to cause a regression against a test that won't be written for another two years, is incredibly valuable.
Some of you say you don't get the appeal of a "clean" history; I say back I don't get the appeal of a pedantically historical view of history. I have never gone digging through some old history to figure out whether or not some particular piece of documentation was at some point in the past misspelled. Completely uninteresting. I have never cared about the process of how a particular thing was arrived at, with all the false starts, not to mention that if I did, trying to read a series of patches isn't how I'd want to do it. I have cared about being able to cherry-pick a single clean commit to backport some feature, I have cared about git bisect, and I have cared about the ability to revert a particular feature via "git revert" without having to figure out which discontinuous set of half-a-dozen patches need to be reverted because almost no commits from the past can be reverted without making fundamental breaks to the build, not for fundamental reasons, but because they introduce typos and break variable names and re-do accidental file deletions, etc.
History as a log of work is way less interesting than history as a queryable and manipulable data structure representing the various mostly-valid states of your project, and the ability to manipulate those mostly-valid states at a project level. Composing two valid states of the project together to get a third is incredibly powerful, and when two valid states compose together to create an invalid state, there's real information of some kind there. This doesn't work if your history mostly consists of invalid states. Composing two invalid states of the project together to get another invalid state isn't a surprise, it produces and teaches nothing.
This is why, whenever I can, I have a git pre-commit hook that checks the compile of everything and runs all my test cases. It's better for that to be the habit, and to have to occasionally bypass it for some reason, than for the default to be allowing any ol' commit to fundamentally break whatever. Doing it in a CI system is fine too; I do what I can to keep the local tests working but it's not always possible. The key is just that something is done to ensure validity is maintained.
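A minimal version of that hook, sketched in a throwaway repo (`run-tests.sh` is a hypothetical stand-in for whatever your build/test command is):

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp" && git init -q .
g() { git -c user.name=demo -c user.email=demo@example.com "$@"; }
# stand-in test suite that currently fails
printf '#!/bin/sh\nexit 1\n' > run-tests.sh && chmod +x run-tests.sh
# the hook: run the tests, refuse the commit if they fail
cat > .git/hooks/pre-commit <<'EOF'
#!/bin/sh
./run-tests.sh || {
    echo "pre-commit: tests failed, commit aborted (bypass with --no-verify)" >&2
    exit 1
}
EOF
chmod +x .git/hooks/pre-commit
git add run-tests.sh
if g commit -qm "broken state"; then echo "commit went through"; else echo "commit blocked"; fi
```

The deliberate bypass the comment mentions is `git commit --no-verify`.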
> but if it's just some idiotic typo instead of actually the change that you are looking for, you've lost
I’ve used `git bisect skip` in this situation loads of times without issue
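The skip dance can even be automated: exit code 125 from a `git bisect run` script means the same thing as `git bisect skip`. A toy sketch (the "bug" here is just a number in a file, and "doesn't build" is faked):

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp" && git init -q .
g() { git -c user.name=demo -c user.email=demo@example.com "$@"; }
for i in 1 2 3 4 5; do
    echo "$i" > version && git add version && g commit -qm "commit $i"
done
# pretend commit 2 introduced the bug, and commit 3 doesn't even build
git bisect start HEAD HEAD~4 >/dev/null
git bisect run sh -c '
    v=$(cat version)
    [ "$v" = 3 ] && exit 125    # untestable commit: same as git bisect skip
    [ "$v" -lt 2 ]              # exit 0 = good, anything else = bad
' >/dev/null
git show -s --format=%s refs/bisect/bad   # the first bad commit ("commit 2")
```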
After I wrote that blog post someone told me the reason for the magic is the use of "git merge-base --fork-point" under the hood.
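A sketch of what that computes (in this throwaway repo the fork point and the plain merge-base coincide; they diverge once the upstream branch has been rewound or rebased, because --fork-point also consults the upstream's reflog):

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp" && git init -q .
g() { git -c user.name=demo -c user.email=demo@example.com "$@"; }
echo base > base.txt && git add base.txt && g commit -qm "base"
main=$(git symbolic-ref --short HEAD)   # master or main, depending on config
git checkout -q -b topic
echo t > t.txt && git add t.txt && g commit -qm "topic work"
git checkout -q "$main"
echo m > m.txt && git add m.txt && g commit -qm "mainline work"
# where did topic fork off the mainline? prints the hash of "base"
git merge-base --fork-point "$main" topic
```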
I have worked with both, and the new one is much better. Of course this only matters if the branch you are rebasing has merges in it (so in all likelihood, you must be a release manager to need features like this)
Between --rebase-merges and --onto, I hardly spend any time fixing up bad merges anymore.
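For reference, a sketch of the --onto form (branch names are made up; adding --rebase-merges would additionally preserve any merge commits inside the transplanted range):

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp" && git init -q .
g() { git -c user.name=demo -c user.email=demo@example.com "$@"; }
echo base > base.txt && git add base.txt && g commit -qm "old-base"
git branch -q old-base
git checkout -q -b topic
echo t > topic.txt && git add topic.txt && g commit -qm "topic work"
git checkout -q -
echo n > new.txt && git add new.txt && g commit -qm "new-base"
git branch -q new-base
# transplant only the commits in old-base..topic onto new-base
g rebase -q --onto new-base old-base topic
git log --format=%s    # topic work / new-base / old-base
```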
My personal rule: never type -f or --force on the same line as git. If someone feels it needs to be done, I try to stop them; failing that, I make sure they're the ones doing it. I don't need that kind of risk.