I've had to dig through the history of a 5-year-old project - it was much nicer when the commits were focused and the PRs had nice descriptions, and even better when they were associated with a ticket.
I will defend merge squash forever - low effort, good results. You do whatever you want in your feature branch, then merge squash. Everyone can do it. You want to split it in multiple commits? Then it probably should have been split in multiple PRs.
I need to write an article about this so I can just link to it instead of writing the same comment every time :)
Same. I'll also add that this company's policy of prefixing commit messages with ticket IDs was a godsend. I could trace confusing business logic to a meeting and specific people. It made a big refactor possible. Not easier, possible.
Even as a solo worker, descriptive and atomic commits are valuable. They make merging or reverting changes a lot easier.
Just the other day I wanted to find an old ticket which I couldn’t find via search. But I knew what changed in the code so I went to pull up the commit and… no ticket ID. And git does not provide any way to see which branch a commit was originally made to (our branches all have ticket ids in them).
I finally found it by searching PRs but it was painful.
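If your branch names already carry the ticket ID, a small helper can pull it out so it ends up in the commit message as well. A minimal sketch, assuming IDs look like `PROJ-482`; the function name `ticket_from_branch` and the ID pattern are my own assumptions, not something git provides:

```shell
#!/bin/sh
# Hypothetical helper: print the first Jira-style ticket ID (e.g. PROJ-482)
# found in a branch name. The ID pattern is an assumption; adjust it to
# whatever your tracker uses. LC_ALL=C keeps [A-Z] strictly uppercase.
ticket_from_branch() {
    printf '%s\n' "$1" | LC_ALL=C grep -oE '[A-Z][A-Z0-9]*-[0-9]+' | head -n 1
}

# Example: build a commit subject from a (made-up) branch name.
branch="feature/PROJ-482-fix-login"
ticket=$(ticket_from_branch "$branch")
echo "${ticket}: fix login redirect"
```

In a real `prepare-commit-msg` hook you would probably feed it the output of `git rev-parse --abbrev-ref HEAD` instead of a hard-coded branch name.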
My IDE leaves my previous commit message in the box which is awesome for me. Once I’ve committed once I just have to delete everything after the ticket ID and add my new commit message.
Ticket IDs are useful and low effort, especially if the previous developer entirely failed to document business logic in tests.
However, those hyper-detailed commit messages that some people seem to like? High cost, very low payback. Better to put all that effort into other docs instead.
I don't mind ticket IDs, but "Fixes #125" is not a very helpful message. Now I need to cross-reference an issue tracker just to understand what changed. I've been at places that required the ticket title be posted as well and that worked well. Humans could quickly see what changed and tooling could get triggered.
> However, those hyper-detailed commit messages that some people seem to like? High cost, very low payback. Better to put all that effort into other docs instead.
I guess I'm one of those people? I don't think the cost is very high. It's certainly no higher than putting them into another document. I don't get the emphasis on "very low payback" though. I've had to dig into projects where those detailed comments saved me weeks of effort. I love reading the rationale and any associated pitfalls with a nice code diff contextualizing the whole thing. And it's right there, easily accessible in my SCM log, where it integrates with the CLI and IDE. No 404s or issues because the startup that created the project died or was sold and the domain is no longer in use.
I thought it was an absolute standard to put the ticket number in the commit. Are there projects which don’t use it? How does Jira find the commits associated to the issue?
I do it even on small personal projects, as soon as it’s evident it will go to production, I set up an issue tracker and commit messages. As common as setting the linter files in TS by default, even when just drafting a POC.
We also name all branches after our Jira tickets, and our merge commit into the main branch is always: `TICKET-XXX - TICKET SUMMARY`. When we close the ticket in Jira, we also link back to the merge commit.
The Merge commit also updates the ChangeLog with the `TICKET-XXX - TICKET SUMMARY`.
That makes it really easy to trace things, as you can:
- Look at the ChangeLog
- Do a `git log --first-parent`
- Look at a Jira Ticket and jump straight to the merge commit
We also have policies that every commit to master must be a merge commit and associated with a Jira Ticket. Also, each branch is tied to one and only one Jira Ticket.
I prefer to have every commit into the main branch be a merge. Work in your feature branch, do whatever you want, then use a single merge commit when it comes into the main branch. Then you can do a `git log --first-parent` and only see the merge commits. It gives a nice clean history and IF you need to dig down into the feature branch that was merged in, you still can!
This is my preference as well. Gives you the option of looking at the nice manicured "this is how I wish it happened" view, and when needed/helpful you can dig into the nitty gritty of what really happened with all its false starts and mistakes.
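For anyone who hasn't tried it, here is a rough sketch of the two views in a throwaway repository. The branch name, ticket ID, and commit messages are invented for the demo, and the commits are empty to keep it short:

```shell
#!/bin/sh
set -e
# Throwaway repository; every name below is made up for the demo.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # call the initial branch "main" regardless of defaults
git config user.name "Demo"
git config user.email "demo@example.com"

git commit -q --allow-empty -m "Initial commit"

# Messy feature branch: WIP commits are fine here.
git checkout -q -b TICKET-1-add-widget
git commit -q --allow-empty -m "wip"
git commit -q --allow-empty -m "fix typo"
git commit -q --allow-empty -m "wip again"

# One real merge commit on main carries the ticket ID and summary.
git checkout -q main
git merge -q --no-ff -m "TICKET-1 - Add widget" TICKET-1-add-widget

echo "--- full history ---"
git log --oneline
echo "--- main line only ---"
git log --oneline --first-parent

# Count the commits visible in each view.
full_count=$(git rev-list --count HEAD)
mainline_count=$(git rev-list --count --first-parent HEAD)
echo "full: $full_count, first-parent: $mainline_count"
```

The first-parent view shows only the initial commit and the merge, while the full log still has every WIP commit if you need to dig.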
It's definitely better than having a bunch of garbage "wip" commits, but it's still far from ideal in my mind. In your world, you'd git bisect and eventually end up in a 300+ line diff.
> You want to split it in multiple commits? Then it probably should have been split in multiple PRs .
This can be inconvenient and slow down development. I still prefer to just interactively rebase and squash commits locally before making a PR. Then ideally I'd like a rebase + merge commit (which unfortunately Github doesn't support in the UI).
Then you can git bisect with --first-parent to find which PR caused the bug. Maybe it was small enough that you don't need to dig deeper, but if you do, you can bisect the individual commits (something that would be lost if you squashed).
But like everything in our business, it's a matter of trade-offs, team experience, priorities etc. When I was younger, I was very opinionated on this stuff, now I just go with the flow.
Regarding squash-merge: If you're talking about using GitHub's "Squash and merge" button, then as soon as you do this, you lose the SHA1 connection between what you committed locally and what winds up in the GitHub repo. This can result in unnecessary confusion later, in my experience. Better to squash-merge locally before pushing IMHO.
But better yet is to not squash at all! git log has --first-parent, which gives you essentially the same tidy view as if you'd always squash-merged (or always rebased) -- but the gory detail is still there if you need it. (That's provided that you either specify -m to git merge, or have moderately descriptive branch names so that the default "Merge branch 'add-cool-feature' into main" message suffices.)
I couldn't agree more. I've worked on multiple projects where PRs must be 'squash-merged' and I've never understood why people want to throw away history and make their bisects harder.
If you use proper merges you can always see a 'squash-merge' view of the repo. If you squash you can never see an individual commit view.
Also if you're required to tie changes to some ticket system—especially if you have to put that ticket info in the subject—then with merges you can put that ticket info just in the merge commit and the rest of the commits can have proper subjects with actual useful information in them.
I'm with you but when a feature branch is merged everyone at the time got the benefit of the git history of the changes in a nice logical way. The PR is approved and it's merged in but now the feature branch is deleted and the individual rational list of changes is lost when that feature branch is rebased on main.
I guess it's a lot of extra storage but if those branches were kept it would help illuminate the individual steps that benefited the reviewers at the time but now take additional archeology.
>I will defend merge squash forever - low effort, good results. You do whatever you want in your feature branch, then merge squash.
I completely agree. Trying to read through the git history of a non-squashed repo is painful, and I actively try to avoid it. Seeing a bunch of commits that just say "fixed typo" is not helpful to me, and I really don't care about seeing someone's workflow or thought process in minute detail as they were developing something.
This is a false dilemma. Seeing garbage "fixed typo" commits has nothing to do with MRs being squashed or not. Squash merging may be used as a really poor way to clean them up (which happens at the wrong place in the process, leading to loss of actually valuable context), but that's pretty much the whole connection.
Don't present garbage commits to review in the first place and you're golden. It improves not just the history, it makes reviews easier and more effective too. The problem with it: it requires some basic Git skills, which is too much for some professionals out there.
Same. One side-effect I really like with it is really the complete freedom left to branch authors to do whatever the hell they want with their commits. Want to push your WIPs often? Do it. Want to experiment, commit, rollback, go someplace else, whatever? Do it. No one will care about the process and only the outcome of the end result will matter. And I think it removes one (possibly tiny but still) source of anxiety about how work is perceived by others.
The squashed merge commit is the only one that should be properly formatted, have all the checks and validation.
> You want to split it in multiple commits? Then it probably should have been split in multiple PRs .
Sure, but GitHub/GitLab/Gitea don't let you write multiple PRs that depend on each other easily, like Phabricator/Gerrit/email do. Either it's a mess of branches merging into each other, multiple commits in the same PR, or wait for one PR to be merged before writing the next.
I think GitHub is pretty good here: If you make PR #1 from branch foo with target main, and PR #2 from branch bar (a descendant of foo) with target foo (that is, a "sub-PR" of #1), then if #1 gets merged first (into main), #2's target branch will auto-update to main, which is exactly what you want.
If #2 gets merged first (into foo), then branch foo of course updates to include the commits in bar, which is again what you want.
>Note: If you delete a head branch after its pull request has been merged, GitHub checks for any open pull requests in the same repository that specify the deleted branch as their base branch. GitHub automatically updates any such pull requests, changing their base branch to the merged pull request's base branch.
(This makes more sense, since if you keep the original branch around, it must be because you want to do more with it.)
Ah, I didn't know that, thanks. Does it work if branch "foo" is not in the same repository as "main" (e.g. because you don't have write access to the repository)?
I haven't tried that specific case, but there's no reason for it not to work -- if #1 (you/foo) gets merged first, the only automatic modification that needs to happen is to change #2's target branch from you/foo to them/main, which is no problem since that target branch is just metadata that lives in your repo. (Merging #2 afterwards would again need to be done by someone with write access to them/main, as you'd expect.)
I mean, if both branches "foo" and "bar" are in your fork, you can't open a PR in the upstream repo, can you? As soon as I try, GitHub redirects me to my fork to create the PR there
bar is a descendant of foo, correct? That's the situation I'm talking about.
Assuming that's the case, you want to make #2 a local PR anyway -- that is, its source is you/bar and its target is you/foo. Once #1 gets merged to them/main (by someone with the access to the upstream repo, "them"), #2 will (I predict, based on reasons I've already given) automatically become a PR "in" the repo them. That is, its target branch will automatically change to them/main.
>Note: If you delete a head branch after its pull request has been merged, GitHub checks for any open pull requests in the same repository that specify the deleted branch as their base branch. GitHub automatically updates any such pull requests, changing their base branch to the merged pull request's base branch.
(This makes more sense, since if you keep the original branch around, it must be because you want to do more with it.)
So try deleting the branch underlying your PR #1. I think #2's target should then auto-update.
It auto-closed PR #2 and did not open a new one. Also, interestingly, there was no button on PR #1 to delete the branch from there as usual, I had to go to https://github.com/progval/testrepo/branches
If your PR is that big and has that many interconnected pieces, then it should probably be broken up and serially merged anyway. Otherwise it becomes a pain for the reviewers to read, and a pain to make any large changes to.
Either the second one includes the first one's commits so the second one's diff is unreadable, or it doesn't and the CI fails because it's missing the features introduced by the first one (or merge-conflicts).
> You want to split it in multiple commits? Then it probably should have been split in multiple PRs.
I've seen this comment repeated in most discussions about this topic and it doesn't align with my experience. I'm trying to better understand that. Are you talking solely about web apps with continuous deployment, maybe with things blocked by feature flags? On many projects I work on splitting a PR by commit would grossly complicate things. And not using multiple commits makes it harder to track changes. I rely quite a bit on being able to read through source history and piece together what changed and why. But, I also work on compilers and runtimes which have a different deployment model and don't generally have a high amount of code churn.
I find source control rather unhelpful on any project I've participated in using squash merges. It helps distribute changes, but little more than that. I can't learn anything looking at the history. It requires access to GitHub because I can't access the original individual commits otherwise, making "git log" useless. It makes the mechanics of bisecting easier, but actually doing anything about it much harder, unless I want to revert the entire changeset. I find there isn't often an appetite for that, so we end up with follow-up commits that undermine the benefit of a single commit in the first place. You can't cherry-pick. Etc.
I can see some domains where you never really rollback and where things get replaced frequently enough that having access to the source history may not be valuable. But, I don't think it's a one size fits all situation. We have decades of experience prior to the squash merge feature in GitHub that demonstrates the utility of smaller commits.
Individual, logical commits are an incredibly powerful tool when stepping into a large codebase where the original author is no longer around. It's an opportunity for the author to document the rationale or design for a scoped change. Squash merges often add too much noise. Sure, you can roll up all the commit messages into one, but without access to the original commits that's of limited value.
>I will defend merge squash forever - low effort, good results.
Squash is fine on balance, but it's not zero effort. If you branch a new feature off a branch that gets squashed it will usually cause annoying conflicts.
This is usually fine if you merge frequently (ideal, but not always possible) but if you don't then it can quickly become very irritating.
If I joined a project which was fearful of deployments and frequently had backed up branches then I wouldn't be in a hurry to force them to start using squashes until after that problem was solved.
>You want to split it in multiple commits? Then it probably should have been split in multiple PRs .
Should is absolutely the wrong way to think about pretty much all software engineering practices. Add up the costs and the benefits.
> Squash is fine on balance, but it's not zero effort. If you branch a new feature off a branch that gets squashed it will usually cause annoying conflicts.
Conflicts happen when multiple people work on code, squash or no-squash.
The benefit of squash is when (not if) you want to solve a conflict, you only do it once. You get to compare the final snapshot of your code and the final snapshot of their code, and decide on a single, new best version.
This is in contrast to solving a conflict where you replay 10 of your commits against 10 of their commits, and try to work out what 'theirs' and 'mine' means as you change the same constructor from 2 args to 3 args 10 times.
I believe in a squash and merge too but if someone thinks a PR shouldn’t be squashed, I think they should merge as-is.
It doesn’t make a difference to me whether you squash or not, to be honest. My Git client can show Git history however I want it.
I mostly support squashing because I want the Git history to contain absolutely zero “WIP” commits and you can’t realistically ask people to groom their commits.
Squash merge is a tool to use when the contributor did their job poorly and you're hopeless about getting them to do it right. There's barely any other use for it when accepting merge requests.
> You want to split it in multiple commits? Then it probably should have been split in multiple PRs.
No, I disagree. A PR/MR/feature branch should correspond to one unit of work/ticket/issue/story/bug that you use for project management. That simplifies coordinating work with project managers, stakeholders, etc. But sometimes there are other reasons why you would want to split the resulting changes into multiple commits, for example to simplify code reviews.
E.g. consider a larger code base. In order to implement a new feature, I have to move some functionality from one module to another, so that it can be reused between some old usage and the new feature. Such a change might be conceptually trivial, but it might touch many files, because lots of imports change. Putting this into a separate commit allows the reviewer to look at the commit message "moved foo from a to b", glance at the diff, run the tests, and be done. When those same changes are mixed into many other changes, it easily becomes a mess that is much harder to decipher.
In general, I would say most textbook refactorings that touch more than one or two files should be moved into a separate commit for that reason. Why is that? Because those refactorings should not break any tests. In order to be able to check that, I need them as separate changes during the review.
Additionally, in a non-ideal code base -- everything with significant history or that is not the hobby of a perfectionist -- you will come across stuff that could or should be cleaned up while you are working on that file anyway. Again, mixing those changes with the changes implementing a new feature is a bad idea. It complicates code review. It makes it only possible to revert the new feature together with the code cleanup.
But how do we get a feature branch/MR/PR that is clean in such a way? Neither by rebasing the branch as it is on top of master, nor by squashing the whole branch into one commit. Instead you can rebase your branch as you go. When you realize you should have done a refactoring first, you can interactively rebase your branch to insert a new commit earlier. If I realize I would like to change these two lines in order to clean up the code, I stage them and use magit to commit them directly into a preexisting commit that I can select interactively. That feature is called "instant fixup" [1], and it is really useful. I haven't seen it in any other git frontend so far.
Keeping the history of the feature branch mostly clean while I am working on it [2] reduces the number of spurious merge conflicts when I rebase the branch on top of the current master branch.
[1]: This is like git commit --amend where you are free to select interactively which commit you would like to amend.
[2]: I still use wip commits at the end of my work day or before an experiment. But I tend to clean them up immediately.
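For people not using magit, plain git gets reasonably close with `git commit --fixup` followed by an autosquash rebase. This is a sketch of that substitute workflow, not of magit's own implementation; the repository, file names, and commit messages are invented:

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"
git config user.email "demo@example.com"

echo base > base.txt; git add base.txt; git commit -q -m "Initial commit"

# Feature branch with a refactoring commit followed by the feature itself.
git checkout -q -b feature
echo "moved foo" > refactor.txt; git add refactor.txt
git commit -q -m "Move foo from a to b"
target=$(git rev-parse HEAD)
echo "new feature" > feature.txt; git add feature.txt
git commit -q -m "Add new feature"

# Later: a small cleanup that really belongs in the refactoring commit.
echo "moved foo, tidied" > refactor.txt
git commit -q -a --fixup="$target"

# Fold the fixup into its target. GIT_SEQUENCE_EDITOR=true accepts the
# generated todo list as-is, so no editor opens.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash main

git log --oneline main..feature
commit_count=$(git rev-list --count main..feature)
refactor_content=$(git show HEAD~1:refactor.txt)
```

The branch ends up with two commits again, and the cleanup lives inside the refactoring commit where a reviewer would expect it.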
The article seems to present a dichotomy between what the author terms a "clean" git history, which he seems to think is a history where multiple commits are squashed into single commits that contain, I guess, "one feature", and the unnamed "other" way of doing it. The author doesn't really elaborate on what exactly that is, but he appears to mean willy-nilly uncurated commits of whatever? To me, both ways he talks about are insane.
With something like stgit[1], it is dead easy to maintain a stack of curated, small un-squashed git-bisectable commits, and your commit history looks like the work of a supernatural genius who knows exactly what he's doing and rarely makes mistakes, and if you have to port your patches (commits) across multiple variants of the same source (think linux drivers ported to multiple distro kernels) that's easy too.
I think it helps to think about when people read the commit messages, and when commits impact workflows. Having tons of commits that are partially broken ("saving work") can make it harder to git bisect. I sometimes read commits to understand what's being worked on, again "saving work" is just noise.
People on the Ops side, doing packaging, or deployment often care more about the commit messages. A good commit message can help them avoid looking at a diff, or understanding which commits need scrutiny.
I spent most of my career in ops, specifically packaging and deployment, and one of the most important things I learned early is to never trust commit messages. This has been just as true at places that spent an ungodly amount of time grooming their commit messages. It's pretty easy now to work with diffs directly, and diffs don't lie.
> ...and one of the most important things I learned early is to never trust commit messages. This has been just as true at places that spent an ungodly amount of time grooming their commit messages. It's pretty easy now to work with diffs directly, and diffs don't lie.
That's a sentiment that's very easy to take too far, and it can scare people away from valuable commit message practices out of misplaced perfectionism. Diffs definitely can lie. Not about the way it is, but about the way it should be. Commit messages are insanely valuable, and often convey important information that's impossible or unreasonable to communicate in any other way (for instance, things like intent and rationales).
Diffs can say ABCD, commit messages can say CDEF. They all should be considered, together, but with a reasonable dose of skepticism, when you're figuring something out.
Some things that are true (if your system lives long enough):
1. There's a good chance your documentation wiki will get purged or severely corrupted in some migration before you lose your commit messages.
2. There's a good chance your ticket system will get purged or severely corrupted in some migration before you lose your commit messages.
For some codebases, if you are lucky, you can trust commit messages from specific trustworthy people. Diffs might not lie, but they take much more space and time to read than 'git log --oneline'.
The challenge is that it's hard to see the value of good commits if you've never done a `git blame` or `git bisect`, but you can't really use those until you have good commits. A bit of a chicken and egg problem.
I use git rebase all of the time.
We rebase all features onto develop.
Develop is then merged --ff-only into stage/release, etc.
Which is then merged --ff-only into main/master, and then tagged.
I use squash less, as it can be helpful for blame of course, so rebase with minimal squash (like, squash that typo fix).
The article makes it sound like a 'rebase' workflow is so complex and time-consuming. Poppycock. It's easy, and force pushing a branch causes practically zero issues for developers.
If I have to rebase develop, for example, and then force push it, then remote users only have to do this:
git fetch origin
git checkout develop
git reset --hard origin/develop
Since a user would never have any local changes in develop, it would never be an issue.
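To make that concrete, here is a rough end-to-end sketch with a local bare repository standing in for the remote. The directory names and commits are invented. Note the `git fetch` on the teammate's side: `origin/develop` only moves locally once you fetch.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
cd "$work"
git init -q --bare remote.git

# Maintainer's clone: create develop and push two commits.
git clone -q remote.git maintainer 2>/dev/null
cd maintainer
git symbolic-ref HEAD refs/heads/develop
git config user.name "Maintainer"
git config user.email "maintainer@example.com"
echo v1 > file.txt; git add file.txt; git commit -q -m "First"
echo v2 > file.txt; git commit -q -am "Second (typo)"
git push -q origin develop
cd ..

# Teammate clones before the history is rewritten.
git clone -q -b develop remote.git teammate

# Maintainer rewrites the last commit and force pushes.
cd maintainer
git commit -q --amend -m "Second"
git push -q --force origin develop
rewritten=$(git rev-parse HEAD)

# Teammate catches up: fetch, then point the local branch at the new history.
cd ../teammate
git fetch -q origin
git checkout -q develop
git reset -q --hard origin/develop
teammate_head=$(git rev-parse HEAD)

echo "maintainer: $rewritten"
echo "teammate:   $teammate_head"
```

This is only painless because the teammate has no local commits on develop; anything committed there would be discarded by the hard reset.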
The "fanaticism" about and against rebasing is overblown in my opinion.
Oh, and the "time wasted" dealing with merge issues.. same, exactly the same in fact.
Granted, I don't go around rebasing everything all of the time, but a developer working on a feature just rebases onto develop before they submit a PR. Sure, if there are multiple PRs then they also need a rebase merge, but since nobody would "continue" working on a feature once it is submitted, this isn't an issue.
Oh, and IF they did, well, again, they can just pull develop and rebase their feature.
Now, what I don't do is have multiple developers on a single feature branch. If my team was much larger, and that was a "normal process" then, it may well make rebasing more of an issue..
Maybe you've only worked with average to good devs who communicate well, like I do now, but I guarantee that in our field, we have people who can at the same time be bad to average AND bad communicators (or worse, toxic liars).
Luckily, now our worst communicator is also our most competent dev, but I've been in teams where learning basic git would be a month-long challenge for half the people, and for half of those (two people, really, we were 9) communicating their difficulties, or worse, their mistakes was impossible.
I don't understand why this causes so much confusion and conversation.
If you are working with others then a force push is off the table. The only time you force push is to fix a catastrophic error. Forget it exists.
When working locally please rebase LOCALLY and clean up your commits before pushing an MR. No one wants to see fix 1, fix 3, rollback etc. Plus team environments usually have strict commit templates linked to Jira tasks etc.
Rebase -i is your friend.
I think people for some reason struggle to understand their local git repo versus the actual shared git repo everyone is using.
I only skimmed the article. But my stance is, anyone who has ever used cherry-pick or reverted multiple commits is affected by how clean the git history is. An unclean git history can turn a simple 2-minute cherry-pick or revert into a 30+ minute job.
History should be at the keystroke-by-keystroke level, which aggregates into small, focused commits, which aggregates into large merges.
In other words, good written records have layers and organizations. A history book will have chapters, sections, sentences, and footnotes. A database might let me organize my corporate history by day and then by division, or flip it around, and organize by division and then by month.
git has one linear history, which means it is navigable in one way, and you need to pick:
I do not agree. Commits in my patch series have no link whatsoever with the chronology of my work. I wouldn't call it "Git history" as long as it is the branch I'm working on. It becomes history once it is merged inside a more persistent branch.
> git has one linear history
> Don't make me pick!
I'm not understanding what you mean... to me it doesn't.
For example you can browse by 'chapters' by adding `--first-parent` to your `git log`. You have commits, which are paragraphs, then you have merges (ie chapters), and you can view from either view, or many others.
Git blame is an essential part of my daily job to understand why a certain line of code was introduced or how it evolved over time. Git history is important.
I've been using commit-centric reviews and I've never encountered the issues he talked about. I try to make sure that each commit is complete (like a vertical) and use commit messages to provide context.
What about a system where each "save" (ctrl+s) would be like a commit? But then only those saves which pass tests ensuring the application still runs get pushed.