1. Projects like the Linux kernel that frequently use 'git bisect' to perform a binary search on the project's history (to find when a bug was introduced), or where the patch series tell a story to code reviewers.
2. Open source projects, where some contributors have terrible git habits that the maintainers don't want to merge.
Editing git history makes less sense in other circumstances:
1. When dealing with less sophisticated git users who don't understand git's data model. About 90% of git nightmares begin when a novice git user tries to rebase something.
2. When it adds a whole layer of unnecessary process for zero payoff. If your developers all work in short-lived branches with clean histories, then just go ahead and merge normally without forcing everybody to jump through a bunch of hoops to beautify a history that nobody ever looks at anyway. Git was designed to handle branches.
I always twitch a bit when I see a 10-page blog post describing a "git workflow", with all sorts of complicated branching rules and heavy use of rebasing. That can make sense in certain specialized situations, but it shouldn't be considered a "best practice" that everybody needs to emulate.
I tend to justify this based on a simple observation: writing clean code that is easy for others to understand is also hard. If you could write a document that teaches others how to write clean code, what would it contain? How long would it be? How complex would it be? I'm betting you've built up a repertoire of wisdom on writing easy to read code, and I'm betting that document would be quite long, especially if you furnished it with examples.
I think it's the same for commit history. If you're rebasing, then you're probably trying to shoot for a clean history that other humans can understand. Maintaining a clean history can be just as hard (and just as rewarding) as maintaining clean code.
Why would you want to understand what was going on? Maybe for fixing a bug, maybe for working out why a seemingly useless change was necessary, maybe to understand the context of a code review building on previously merged branches, or maybe just because you need to catch up on what happened while you were gone.
If that's not important to you, then don't bother, but it can be hard to understand what you're missing until you've tried both ways.
Really? To me the need for checking history is extremely common. Many projects have multiple delivery, maintenance and other branches, and patches need to be applied here and there. Bugs are found and the affected versions need to be identified.
Not maintaining a clean history would cause severe pain.
Do you automate it?
Or do you provide feedback by mail, IM, ... to the author when a commit mistake is detected?
Obviously, for any old arbitrary project, there's going to be limits to what you can ask your contributors to conform to before they just leave, but we're discussing "Best Practices" here.
I think these articles are useful because they spread the knowledge of "How To Be A Good Contributor"... for free.
 Ugh. Not a term I like, but really this is almost always(!) the best way to do it.
Then you wouldn't see value in rewriting. Unless you're not reading the history because it's a pain to read because it's messy.
I use `git blame` relatively frequently to understand the rationale and history behind the code I'm working on, or to track down bugs. Of course, this also requires writing decent commit messages---something that I encourage developers to do so that those maintaining their code in the future have some direction. Commit messages are a bit different than software documentation, not a substitute.
I bisect looking for bugs, where a clean history with small commits _that actually build_ is essential to quickly finding a bug. That can even be automated if you can create a script that can run a test suite or some other test to check for success.
This becomes increasingly important the more code and complexity you're dealing with. There are other cases where I look at history, but those are significant examples.
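The automated bisect described above can be sketched with `git bisect run`. This is only an illustration in a throwaway repository: the file name `app.txt`, the commit messages, and the `grep` one-liner standing in for a real test suite are all assumptions.

```shell
#!/bin/sh
set -e

# Throwaway repository; file names and commit messages are invented for the demo.
repo=$(mktemp -d)
cd "$repo"
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Ten small commits; the seventh introduces the "bug" (a line reading BUG).
i=1
while [ "$i" -le 10 ]; do
    if [ "$i" -eq 7 ]; then
        echo "BUG" >> app.txt
    else
        echo "line $i" >> app.txt
    fi
    git add app.txt
    git commit -q -m "change $i"
    i=$((i + 1))
done

# Mark the tip as bad and the very first commit as good.
good=$(git rev-list --max-parents=0 HEAD)
git bisect start HEAD "$good" >/dev/null

# Stand-in for a real test suite: exit 0 means good, non-zero means bad.
git bisect run sh -c '! grep -q BUG app.txt' >/dev/null

# refs/bisect/bad now points at the first bad commit.
first_bad=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset >/dev/null 2>&1
echo "first bad commit: $first_bad"
```

In a real project the `sh -c ...` part would be replaced by whatever builds and tests a checkout. A script can exit with 125 to tell bisect to skip a commit that cannot be tested.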
This seems like an instance of adapting development practices to bad tooling, whereas we could be fixing the tool itself. Shouldn't there simply be a "tree aware" git-bisect that can intelligently handle branches and merges? That would resolve this sticking point.
> 2. Open source projects, where some contributors have terrible git habits that the maintainers don't want to merge.
Then tell them to go away and come back with a cleaned up PR.
Afaik, git-bisect is aware of branches and merges.
But there are two drawbacks to not rebasing/squashing: 1) the number of extra commits. Although bisecting is O(log N), it still adds up if it takes a long time to run the tests. 2) if there are any commits that don't build or have to be skipped (git bisect skip) because of other reasons, the number of steps to bisect increases.
It's a good idea to ensure that any commit that goes to master branch will pass all tests or at least compile. Failure to do so will make git-bisect a lot less useful.
That also breaks the ability to do automated bisects.
I don't think it's at all unreasonable, at a minimum, to expect every individual commit to build and to pass tests.
(From my blog post about why history matters: https://vincenttunru.com/Spend-effort-on-your-Git-commits/ )
Even with full-time staff, in the almost 6 years I've been here the rest of the dev and design staff completely changed (and grew).
In my opinion, if you're not using history/blame your code is relatively new, or you're the only one touching code. Or you have regular code reviews (paired programming, whatever).
Seven years at my last job was just full-time staff, but we had the same issues, once a version control system was actually implemented.
But, again, this depends on the situation, the code, the history, the patches, etc. It's just a useful tool to keep in mind come merge time.
(Disclaimer: I made that :)
Once you get even a small understanding of it, there are plenty of places you can use it. For example, in the past month I've used it to:
1. Rewrite the history on a coding test I was given for an interview. The recruiter was oddly specific about how long to spend on it, and when it had to be in by, even though they said no one would look at it for another week. I worked on it throughout the extra week, and modified the history to make it look like I had done it in the allotted time-slot.
2. Rewrite history in feature branches where it doesn't accurately reflect what I had to do to get the code working. I work in an agency environment and (whether I like it or not) requirements change, sometimes mid-way through delivering something. If I've written something that won't ever see the light of day, or if I've done something that won't help the next person to read that code I'll rebase where needed.
3. Saving time for pull requests. I tend to commit early and often, and when I'm in "the zone" on a fairly chunky bit of work that can mean quite a few commits! When it comes to peer review, some people like to do it by commit, rather than the finished output, so to help these guys out I squash commits where possible, since we use our pull requests to illustrate the problem we're solving anyway. I think an untarnished history is important, but sometimes a PR audit trail is more useful than what you'd get from pure commits.
AuthorDate: Wed Aug 2 11:02:03 2017 +0100
CommitDate: Wed Aug 9 12:08:52 2017 +0100
To do no. 1, perform your rebase and then change the dates with filter-branch. I have a couple of helper functions for resetting to either commit or author dates; use at your own risk! :)
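The commenter's helper functions are not shown, so the following is only a guess at what one of them might look like. The function name `reset_commit_dates_to_author` is hypothetical, and the demo repository and its dates exist only to make the sketch runnable. Note that the Git documentation itself steers people away from filter-branch for new work.

```shell
#!/bin/sh
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q

# A commit whose author and committer dates differ, as they do after a rebase.
echo "work" > f.txt
git add f.txt
GIT_AUTHOR_DATE='2017-08-02 11:02:03 +0100' \
GIT_COMMITTER_DATE='2017-08-09 12:08:52 +0100' \
    git commit -q -m "work"

# Hypothetical helper: copy each commit's author date onto its committer date.
# Arguments are passed through to filter-branch as the revision range.
reset_commit_dates_to_author() {
    FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch -f \
        --env-filter 'export GIT_COMMITTER_DATE="$GIT_AUTHOR_DATE"' \
        -- "$@" >/dev/null
}

reset_commit_dates_to_author HEAD

author_date=$(git log -1 --format=%ad --date=iso)
commit_date=$(git log -1 --format=%cd --date=iso)
echo "AuthorDate: $author_date"
echo "CommitDate: $commit_date"
```

This rewrites every commit in the given range, so it belongs on unpublished branches only.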
I think this is the most important one. Your commits are documentation for people reading your code, so whatever helps them is good.
But I understand that a screen like the one shown in the article is not immensely useful. If it comes down to how to present git history information to users, maybe an additional approach would be to come up with a better presentation layer (one that, for example, could hide merge commits, "squash" branches, etc.).
but the history showed that it was tried (and perhaps the commit message can have a short note on why it wasn't selected).
Without the git history, you'd have to rely on human memory to know this, or on separately maintained documentation (which, let's face it, is never going to get updated).
One says that the history should be preserved exactly, because it's important we keep a record of exactly what happened.
The second says that it's ok to rewrite history a little if that makes it more understandable.
I think you would put yourself in the first, and that's ok. I would put myself in the second because at the end of the day I value understanding over precision in git histories. I'm of the opinion that no one cares about the commit I made to correct a missing semicolon.
Also remember git push --force-with-lease!
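For anyone who hasn't seen why `--force-with-lease` is the safer force push, here is a minimal sketch. The two clones named `alice` and `bob`, the branch name `feature`, and the commit messages are all made up for the demo.

```shell
#!/bin/sh
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"
git init -q --bare remote.git

# Alice publishes a feature branch.
git clone -q remote.git alice 2>/dev/null
cd alice
echo "draft" > f.txt
git add f.txt
git commit -q -m "alice: first draft"
git branch -m feature
git push -q origin feature
cd ..

# Bob adds a commit to the same branch.
git clone -q -b feature remote.git bob
cd bob
echo "fix" >> f.txt
git commit -q -am "bob: fix typo"
git push -q origin feature
cd ..

# Alice rewrites her commit without fetching Bob's work first.
cd alice
git commit -q --amend -m "alice: polished draft"
if git push -q --force-with-lease origin feature 2>/dev/null; then
    lease_result="overwrote"
else
    lease_result="rejected"
fi

remote_tip=$(git -C ../remote.git log -1 --format=%s feature)
echo "push was $lease_result; remote tip is '$remote_tip'"
```

A plain `--force` in the same spot would have silently discarded Bob's commit. The lease only compares against the local remote-tracking ref, so it stops protecting you once you have fetched without looking at what came in.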
I missed a semi-colon or a file? I probably noticed right after I made the commit (and before I pushed it to the remote branch).
Question: Do you have a problem with rebase-before-push?
Regardless, there's a happy medium here: Just commit whenever you feel like it, then rebase, squash or whatever just before you're reasonably(!) confident in your approach... and push. Obviously you don't want to force-push, so at the point that you're "reasonably confident" you're committed. If you need to retrace your steps, you just do that through normal "git revert" or whatever.
The point of the above being: This way of working preserves the useful parts of history and you still avoid the absurd noise of seeing everybody's "I changed a thing" commits and non-compiling-commits as well.
what do you need that in the git history for?
MERGE_COMMIT=$(git rev-parse ":/Merge .*$branchname")
git diff "$MERGE_COMMIT^1..$MERGE_COMMIT^2"
> Clean up style
What was this code committed with? What other changes came along with it?
Git, for me, has 2 functions. When I'm developing, it's to save my work in an incremental way that I can reverse. It's to make notes about what I'm doing. After I'm done, it serves as a way to lay blame on me for what I changed and to track those changes in Github. "Who wrote this? What was it committed with?"
I want my PR to be encapsulated in 1 commit after it's been merged. At that point, there is no reason to read the internals of what I changed.
because gerrit forces every change to be a single commit, you have to rebase, which ensures that the history is nice and linear.
I know there are some people who prefer the "git-flow" style history graph (and I was one of those people) - but there are a lot of advantages to a clean history.
And, yes gerrit can be painful at first, but is a lot better with something like git-review, or repo
0 - https://docs.openstack.org/infra/git-review/
1 - https://source.android.com/source/using-repo
If you don't believe me, clone git itself ($ git clone https://github.com/git/git) and open up the repo in tig(1)
To be honest, I only in the past 2 years even bothered to view ($ git log --graph). Regardless of --graph getting wide now and then, I always visualize my git history as a straight line.
Also, sometimes having a non-linear history is inevitable. Especially in large open source projects where you're pulling in patches to a branch, and patches on top of that. You're not always going to be merging a branch with a single author straight onto master.
Despite posts like this (http://www.bitsnbites.eu/a-tidy-linear-git-history/) encouraging good git hygiene, I've had multiple open source projects merge in code via GitHub and never had a negative consequence for it :P
Maybe there are corner cases where git bisect wouldn't work? Though I never used git bisect even once. Most I do is scroll through tig and view diffs. Also used to play with a cool git plugin in vim (https://github.com/tpope/vim-fugitive).
Also, GitHub has (since that linear git history post) introduced Rebase + Merge https://github.com/blog/2243-rebase-and-merge-pull-requests. I think that'll get you what you want.
I do keep branches ("pull requests" if you're using GH lingo) up to date with ($ git pull --rebase). That does mean a force push ($ git push --force), but it's ok if it's your personal branch. I also use interactive mode ($ git rebase -i <sha>) to edit/blend multiple commits.
Also, when I do merge, if I go through CLI, I'll preserve the history of the branch by not doing fast forwards ($ git merge <branch> --no-ff).
Git rebase can create large sequences of commits that no one has ever checked out and that often don't build. After a git rebase most people check that their new head builds and passes tests, but I've never seen anyone bother to check that their new history works.
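One way to check the rewritten history rather than just its tip is `git rebase --exec`, which runs a command after each replayed commit and stops at the first failure. A minimal sketch, assuming a stand-in `build.sh` script and invented commit messages:

```shell
#!/bin/sh
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q

# build.sh stands in for the real build and test command.
echo 'exit 0' > build.sh
git add build.sh
git commit -q -m "base"
git tag base

echo a > a.txt
git add a.txt
git commit -q -m "add a"

echo 'exit 1' > build.sh
git commit -q -am "break the build"

echo 'exit 0' > build.sh
git commit -q -am "fix the build"

# The tip builds fine, but replaying with --exec tests every commit in turn.
if git rebase --exec 'sh ./build.sh' base >/dev/null 2>&1; then
    result="every commit builds"
else
    result="stopped at: $(git log -1 --format=%s)"
    git rebase --abort
fi
echo "$result"
```

In practice the `--exec` argument would be something like the project's own test command, and instead of aborting you would fix the offending commit and run `git rebase --continue`.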
Otherwise you may lose the "why" unless code is extremely well commented and that never happens.
I don't buy that every change should be a separate commit - if changes are related and the commit message covers the changes it makes sense to squash.
Not sure if OP is talking about squashing all commits into one. I have seen people do this and I'm always a bit confused why, similar to your point.
Also, what's the point of preserving the history of a branch that is used for the development of a requested feature? For each commit before the final one, the feature is incomplete, possibly not working at all.
Splitting per feature is indeed ok if it is small including effects on dependencies. But in my practice there are rarely small enough features - these map to something akin to user stories which have to be further broken down.
"Commit C's revision number has changed." - NO IT DID NOT. The original commit (f4ba6b) is still there, it still points to B, but a totally new commit has been created with a different content. To avoid confusion, it's better to name it C'. C is now a dangling commit (has no branch or tag pointing to it, and will eventually be garbage collected).
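The point is easy to verify in a throwaway repository. In this sketch the topology (a `topic` branch forked from A and rebased onto B) is an assumption made for the demo and may not match the article's diagram; the commit letters are reused only for readability.

```shell
#!/bin/sh
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q

echo a > a.txt
git add a.txt
git commit -q -m "A"
trunk=$(git rev-parse --abbrev-ref HEAD)   # "master" or "main", depending on configuration
git branch topic                           # topic starts at A

echo b > b.txt
git add b.txt
git commit -q -m "B"

git checkout -q topic
echo c > c.txt
git add c.txt
git commit -q -m "C"
old_c=$(git rev-parse HEAD)

git rebase -q "$trunk"                     # replays C on top of B, creating C'
new_c=$(git rev-parse HEAD)

old_type=$(git cat-file -t "$old_c")                # the old object still exists
old_parent=$(git log -1 --format=%s "$old_c^")      # and still has its old parent
new_parent=$(git log -1 --format=%s "$new_c^")
branches_with_old=$(git branch --contains "$old_c") # but no branch reaches it

echo "C  = $old_c (parent $old_parent)"
echo "C' = $new_c (parent $new_parent)"
```

The old commit stays reachable through the reflog for a while, which is why `git reset --hard` to the old hash can undo a rebase gone wrong.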
git config --global rebase.autoSquash true
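A minimal sketch of what that setting buys you, paired with `git commit --fixup`. The repository, file names and commit messages are invented, the option is set per-repository here so the demo doesn't touch your global config, and `GIT_SEQUENCE_EDITOR=true` simply accepts the generated todo list without opening an editor.

```shell
#!/bin/sh
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git config rebase.autoSquash true          # local equivalent of the --global setting

echo base > a.txt
git add a.txt
git commit -q -m "base"
git tag base

echo "feature" > f.txt
git add f.txt
git commit -q -m "add feature"

echo "other" > o.txt
git add o.txt
git commit -q -m "add other thing"

# Oops, the feature commit missed a semicolon. Record the fix as "fixup! add feature".
echo "feature;" > f.txt
git commit -q -a --fixup=HEAD~1

# With autoSquash on, the interactive todo list already has the fixup moved
# next to its target and marked "fixup", so accepting it unchanged is enough.
GIT_SEQUENCE_EDITOR=true git rebase -q -i base

commit_count=$(git rev-list --count base..HEAD)
git log --format=%s base..HEAD
```

After the rebase only the two real commits remain, and the semicolon fix is folded into "add feature".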
We use GitHub Pull Requests extensively for code reviews, and most of the time we squash our PRs into a single commit, but I would love to have a way to merge into multiple smaller squashed commits (as OP did) instead of one big one.
As a sole developer, I am pulled in two different directions when using git feature-branch style. I want to make sure I never lose my work so I check in frequently- any time I start a change that might ultimately fail, I commit my current code so I can recover it. But I also want a clean, concise and useful history.
After reading this article I tried 'git rebase -i master' on my feature branch even though the master had no changes since starting the feature branch. This seems to work and allows me to clean up my feature branch before I merge it to master.
Is there a better way to do this or are there problems with this?