And that’s why PR squashing is counter-productive. Good luck trying to track down an issue when bisect leads you to a 1500 line commit containing an entire feature.
Repo commit history is an artifact the team produces, as much as the code it contains.
Yep. I actually think PR squashing should be something you ask of someone else when they don’t take the time to clean up their commit history. If they don’t have the discipline to do it then one big sloppy commit is better than 25... I rarely squash my PRs because I always take the time to provide meaningful context surrounding my work.
I like to remind people that pull requests are an abstraction over a set of commits. I show people that they can click on each individual commit in a PR and see the granular delta. Surprisingly often I’ve learned that people really don't know you can do this! If they argue that you cant revert entire features remind them that merge commits are a thing. If they don't like lots of merge commits littering the history suggest that most people should be FF merging (rebase and push) anyway. In my experience git has exactly tue right tooling for many diverse workflow scenarios and the problem more often than not is people don't understand their tools. Help educate them.
If your team insists on 1 PR == 1 commit (or if you use an authoritarian tool that allows enforcing it like gh or gerrit) then be abundantly pedantic about making sure that only changes related to a single unit of work are being introduced in new PRs. I find that calling people out when they make sloppy changes starts to get them thinking that it would be really nice to just keep this typo fix in an isolated commit and not require a new PR for it. When I have seen that model be successful, it’s usually the case that the team agrees that it is more important to ask someone to break out an unrelated change than to merge a sloppy PR.
Yeah that's the scenario I'm thinking about. The trouble I have with this is two things:
- People inherently don't make the PRs as granular as a commit would be, because it puts load on other processes (like CI, review overhead, etc.) and slows things down that way. So you end up with PRs that are hundreds of lines instead of self-contained commits that are dozens of lines long (or less). They still fit under the same goal (and hence PR), but aren't as granular as they could/should be, which makes them harder to revert.
- It's far easier for people not to care about their commit messages and commit granularity than to care, and honestly, with CLI tooling, I can't claim it's easy. I use TortoiseGit and rebase all the time to keep my commits organized, and it's far easier for me to shuffle around commits that way, though it's still non-negligible overhead.
On the other hand, CI is precisely one argument I get for squashing: because rebasing/merging would imply some commits are untested, others specifically don't want them in the tree (they find it misleading when they bisect). I don't really find this compelling—I find it harmful to lose so much granularity information just because you didn't run a test against it (especially when it's already not the case that every commit passes all the tests) because it makes it impossible to look back and revert a tiny commit later, but I can't really convince others the trade-off is worth it. And to make this work well and keep a good history, you need to rebase to specifically avoid spurious commits like "fixed lint" or "removed whitespace", which generates additional overhead (and makes for harder reviewing when you rebase).
My position is it should be up to the author to decide whether a PR should be merged/squashed/rebased, because the answer can be different in different situations and the author would know best, but I have a hard time convincing anybody... people would rather have a uniform workflow instead.
That seems immensely more painful for a nontrivial project though. It'll clog the CI for everyone and nobody will go back and rebase to make sure every single commit is a self-contained passing unit.
In practice, one should be doing -all of these things- in different circumstances.
Rebase and push -- along with some squashing of redundant work-- is great. There's some degree of annoyance if intermediate states don't build, so it requires care to know that all the individual between states are probably safe (because CI, etc, generally don't help you here).
Merge commits are fine, though they do require a higher degree of acumen for people exploring the history.
And there's plenty of times when a developer submits a moderate sized 3 commit PR that whoever looks at the pull can decide that it's better to squash and make into one commit.
If you consistently have 1500 LOC PRs you are really doing PRs wrong. Our median PR is 150 LOC, altough we do rarely (a couple of times a year) have "jumbo" PRs of 1K-2K LOC, but making better individual commits is not going to help with understanding those. Those are the PRs that have extensive documentation and design consideration in the PR description.
I may exaggerate for effect, but 150LOC developed over two weeks is still vastly more surface area than a localized 5LOC change developed over the course of a few hours.
Of course circumstances vary and it depends on verbosity and idioms of your preferred language, framework and codebase. Maybe 150LOC is the minimum it takes to express a logical change unit.
My point is not about specific “maximum line count in a commit” but about a commit expressing a single logical change concept, and not a whole chapter of a book.
> making better individual commits is not going to help with understanding those … "jumbo" PRs of 1K-2K LOC.
This is where we disagree, it seems. You appear very categorical here, could you clarify why you think this is the case?
I share this opinion, however teams I've worked with are fond of squashing and drop words like "cleaner" or "more maintainable" without having proper evidence and I usually can't convince people to avoid this practice.
By default a squash would collect all commit messages.
Also, I agree with aeontech that artificially mandating PRs to be squashed is very counter productive and often hinders bisect. Large features should be implemented in atomic units as a series of commits.
I (and my team) always run an interactive rebase before opening PR. This gives us a chance to reorder, squash and cleanup individual “work in progress” commits into a logical set of atomic, meaningful commits, without turning the entire PR into a single commit.
On my team, we will open a PR with a logical set of commits. As reviews are posted, we will address those comments by using fixup! or squash! commits (using the title of the commit we want to amend). fixup commits are used for code changes and squash commits are used for commit message changes.
In the end before merging, we'll run
git fetch orgiin
git rebase -i --autosquash --keep-empty origin/master
git log -p --reverse origin/master..
git diff @{u}..
The first command will update the remote tracking branches. The second one will start the interactive rebase and order the commits based on the fixup! and squash! tags we used during the review. The third one will show the commit messages and associated diffs for the branch so that we can check that things make sense. The last one is to verify that we didn't inadvertently introduce a change during the rebase (though we will see a diff if someone else merged a PR into master before this rebase).
will verify that no diff was introduced during the rebase compared to the upstream branch as referenced by the PR.
No problem :) One thing I forgot to mention about squash! commits was to use the --allow-empty flag when running git commit, so that we can make a commit without having to actually change any files.
Then, when doing the interactive rebase with the --autosquash and --keep-empty flags, you'll get to a point where your editor opens up with text saying that "this is a combination of X commits". At that point, you can delete the text down to the text of the last squash commit, exit the editor, and it will replace the commit message for the corresponding commit with the text that was in the squash commit message.