I'm sure quite a few commenters on here only started writing code in the last 15 years. What's interesting is that git is (more or less) the de facto version control system and has been for those 15 years, and that likely means that what you put in your commit messages will stay around forever (for some definition of forever).
As a greybeard I've gone through file-based backups, then SCCS, CVS, SVN, and now git for what seems like forever (git dates from 2005 according to Wikipedia, though for me it was probably the late 00s), migrating from one to the next each time (yes, quant code from the 90s is still running in 2022, and we have the commit messages to prove it).
Some people commenting above/below talk about the PR being the source of in-depth information. As far as PRs are concerned, it's just another issue tracker (and I've been through the lot, including various git hosting services). The commits travel with the repository. The PR discussions don't.
100 times this. I’ve worked at a company with a commit history going back to 1979 (literal paper logs at that point, plus in-file headers), and another company where we lost ~2 years worth of historical context/info to WIP commits because “our internal GitHub instance has that info” turned out to not be true when we migrated to a new instance. Nowadays, the relevant info from our PRs gets copied to git, and I hope in 40 years if someone has a question they can still see that history if needed.
I like the conventional commit style [0]. You may have seen these in open source repos or used them at work. They look like
feat: support new line chart
fix: update props for new line chart
chore: bump dependency version
What's also cool is that there are tools (semantic release) that will then automatically handle the versioning and publishing of your module based on these commits, using the commit type (feat, fix, chore etc) to determine the next appropriate version [1].
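For anyone wondering what "determine the next appropriate version" boils down to, here is a rough Python sketch. It is not semantic-release's actual code: the mapping (feat bumps minor, fix bumps patch, a `!` or a `BREAKING CHANGE:` footer bumps major, everything else bumps nothing) is a simplified assumption, and `bump_for` and `next_version` are made-up names.

```python
import re

# Assumed header shape: type, optional (scope), optional "!", then ": summary".
HEADER = re.compile(r"^(?P<type>[a-z]+)(\([^)]*\))?(?P<bang>!)?: .+")

# Bump levels from weakest to strongest.
ORDER = ["none", "patch", "minor", "major"]


def bump_for(message: str) -> str:
    """Return the bump level implied by a single commit message."""
    lines = message.splitlines()
    match = HEADER.match(lines[0]) if lines else None
    if match is None:
        return "none"  # not a conventional commit, so it is ignored
    breaking_footer = any(line.startswith("BREAKING CHANGE:") for line in lines[1:])
    if match["bang"] or breaking_footer:
        return "major"
    # Hypothetical mapping; other types (chore, docs, ...) do not bump anything.
    return {"feat": "minor", "fix": "patch"}.get(match["type"], "none")


def next_version(current: str, messages: list[str]) -> str:
    """Apply the strongest bump found in the messages to a MAJOR.MINOR.PATCH string."""
    major, minor, patch = (int(part) for part in current.split("."))
    level = max((bump_for(m) for m in messages), key=ORDER.index, default="none")
    if level == "major":
        return f"{major + 1}.0.0"
    if level == "minor":
        return f"{major}.{minor + 1}.0"
    if level == "patch":
        return f"{major}.{minor}.{patch + 1}"
    return current
```

Fed the three example commits above with a current version of 1.2.3, `next_version` gives 1.3.0: the `feat` outranks the `fix`, and the `chore` is ignored.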
I built a full pipeline around this. Basically allowed for a paved road for devs to automatically semantically version and deploy application and module artifacts for npm with semantic release by spinning up a repo with some boilerplate generators I made available to the team.
Since then I've been obsessed with the conventional commit format everywhere for my own projects. Even if the commits aren't parsed for versioning, it just gets you into the habit of thinking about the scope of the commits.
The laughable part about building that full paved road pipeline is that the biggest friction point was always when devs didn't use semantic commits. At the time when I implemented it I didn't have checks for the commit format at the pr phase. I DID have a commitizen cli option as well but that was "Too much friction". Oh boy did I get bit. As someone who was working on DX I was constantly surprised by how many questions I got about something as seemingly simple as the conventional commit format from a crowd of enginerds.
> I was constantly surprised by how many questions I got about something as seemingly simple as the conventional commit format from a crowd of enginerds.
I'm not. Writing code and writing git commit messages take entirely different types of thinking. With code you are having to think logically and problem solve to tell the computer what to do. Git messages are more analogous to writing a term paper (though shorter). You have to think about what you are trying to communicate, how best to describe what this commit is all about, maybe trying to figure out if this counts as a fix or something else, etc. Writing code is fairly easy for me. Writing git commit messages is extremely difficult for me. I sometimes go days without committing because I'm dreading having to write a commit message.
One feature I wanted to add was for it to parse your source code for comments with a specific format (e.g. `# git-plan feat xyz` or `# git-plan fix xyz`) and then stitch all the hunks together into commits for you. So all you'd have to do is comment your code and then run `git plan commit` and it would generate commits for you to confirm with y/n.
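The comment-scanning half of that idea is easy to sketch in Python. Everything here is hypothetical: the `# git-plan <type> <summary>` marker is taken from the description above, `planned_commits` is an invented name, and the hard part (splitting the working tree into hunks and staging them per commit) is left out entirely.

```python
import re

# Matches a trailing or standalone comment such as "# git-plan feat xyz".
# The set of types (feat, fix, chore) is an assumption for this sketch.
MARKER = re.compile(r"#\s*git-plan\s+(feat|fix|chore)\s+(.+?)\s*$")


def planned_commits(source: str) -> list[tuple[int, str]]:
    """Return (line number, proposed commit subject) for each marker comment."""
    plans = []
    for number, line in enumerate(source.splitlines(), start=1):
        match = MARKER.search(line)
        if match:
            plans.append((number, f"{match.group(1)}: {match.group(2)}"))
    return plans
```

A real tool would then have to map each marker to the surrounding diff hunk before asking for the y/n confirmation.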
I did this at one company too, using commitizen to provide a CLI wizard for helping people get started (exposed through an npm script, `npm commit` I think), and also setting up a commit linter that would run as a commit-msg hook and reject the commit if it was incorrectly formatted.
It quickly gets everyone familiar with the format, despite being annoying while you get used to it.
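A hand-rolled version of such a linter fits in a few lines of Python. This is only a sketch, not what commitizen or any existing linter does: the list of allowed types is an assumption, and `check_subject` and `lint_file` are invented names. Git's `commit-msg` hook receives the path to the message file as its first argument and aborts the commit when the hook exits non-zero.

```python
import re
import sys

# Assumed list of accepted types; adjust to whatever the team agrees on.
ALLOWED_TYPES = ("feat", "fix", "chore", "docs", "refactor", "test")
PATTERN = re.compile(r"^(%s)(\([\w./-]+\))?!?: \S.*$" % "|".join(ALLOWED_TYPES))


def check_subject(message: str) -> bool:
    """Return True if the first line looks like 'type(scope): summary'."""
    first_line = message.splitlines()[0] if message.strip() else ""
    return bool(PATTERN.match(first_line))


def lint_file(path: str) -> int:
    """Read a commit message file and return a process exit code (0 = accept)."""
    with open(path, encoding="utf-8") as handle:
        message = handle.read()
    if check_subject(message):
        return 0
    print("commit message must look like 'feat: support new line chart'", file=sys.stderr)
    return 1
```

Saved as an executable `.git/hooks/commit-msg` script that ends with `sys.exit(lint_file(sys.argv[1]))`, this would reject badly formatted messages before the commit is created.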
I prefer issue/card numbers as it makes it easier to visually group and search for common commits of a single task in shortlog. feat/fix/chore have no functional use in shortlog (for me!) and the mixed length moves the start of the actual comment and makes shortlog messy. That information can just as easily be put into the commit message. 2c.
This is probably more appropriate for private projects than for publicly contributed ones, where the benefit of the conventional style is being able to easily create changelogs from a large number of commits.
Your commit topic shrinks to only 40-45 characters if you try to do both (card numbers get quite large too if you're in a Jira context; with their issue prefixes, PROD-001 is 8 characters already), so you pick and choose.
The other thing is that in an internal context, what does knowing whether it's a feature or a bugfix do for you? If it's such a big issue, then it'd be integrated into the issue prefix.
BUG-001
FEA-002 < less likely as each feature epic will probably have its own code, so it'll be more XXXX-001 XXXX-002
Now you have a nice short log that aligns nicely, and you can figure out mostly where your merge points were anyway, so no need for merge commits, rebase works mostly fine.
- BUG-001 fix for this thing
- FEA-005 initial commit
- FEA-005 unit tests
- FEA-005 implement this thing
- FEA-005 implement that thing
- FEA-005 fixes for code review
- BUG-002 fix for another thing
- BUG-002 revert fix did not work
- BUG-003 fix for that annoying thing
Of course, just my 2c and my limited personal experience.
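To show why the prefix makes grouping trivial, here is a small Python sketch that buckets commit subjects by their leading issue key. The key pattern (capital letters, a dash, digits) and the name `group_by_issue` are assumptions for illustration, not part of any tool mentioned here.

```python
import re

# Assumed issue key shape: PROJECT-123 at the very start of the subject.
ISSUE_KEY = re.compile(r"^([A-Z]+-\d+)\s+(.*)$")


def group_by_issue(subjects: list[str]) -> dict[str, list[str]]:
    """Group commit subjects by issue key, keeping first-seen order."""
    groups: dict[str, list[str]] = {}
    for subject in subjects:
        match = ISSUE_KEY.match(subject)
        if match:
            key, text = match.group(1), match.group(2)
        else:
            key, text = "(no issue)", subject
        groups.setdefault(key, []).append(text)
    return groups
```

Run over the log above, the five FEA-005 subjects end up under one key, which is roughly the "visually group and search" effect described.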
I've been using github-changelog-generator [1] for (you guessed it) automatic changelog generation, which adds bug fixes/features to the changelog based on issues and PRs, but semantic-release looks like it might be even more useful.
These "changelog generators" look to me like barking up the wrong tree: automation for the sake of automation, and an attempt to skip the "boring" work of communicating changes by replacing it with the "interesting" work of massaging git history, inventing microformats, etc. Why bother with separate changelogs at all then, when you can see changes in `git log`? Just write good commit messages! /s
Well rather than manually enter the bug fixes/features introduced in each release on the GitHub releases, you can just paste a link to the automatically generated changelog.
This is less error prone and isn't meant to replace writing good commit messages.
Yes, but you might present some feature differently internally compared to your users. E.g. "Update libfoo" is a totally fine commit message, as you can just inspect the diff to see the version change, but in the changelog, I'd prefer "Update libfoo to 0.3" because dependency versions are something that's exposed to users of your component. Conversely, you might have a bunch of refactoring commits that each are important as their own unit but your changelog is maybe only interested in a summary.
Personally I don't see much value in these half-automated changelogs over just giving git history. It also raises the bar for contributions because contributors have to figure out a non-standard commit message format.
These are good points. Just to clarify, I can't speak for semantic release, but the github-changelog-generator I'm talking about will only add PRs (and issues if you want) to the changelog, not commits (so no non-standard commit message format needed). So you can put anything worth mentioning in a changelog into a PR, and it distinguishes between bug fix vs feature based on the tags added to the PR.
After working on a number of projects that tried to apply conventional commits properly, I found something like this to be more effective and sustainable.
It doesn't literally mean chore. It means a code change that you do regularly but is not major, and usually doesn't require much work (update dependencies being a good example).
Updating dependencies can be a major change. It can also be a security fix.
Marking it as a simple chore sounds like the impact has not been fully analyzed.
I understand the practice, and if done meticulously by all contributors it should at least signal a well organized project, but most of the time it just shovels your entire commit log into the release notes. It's OK to have it, but I much prefer projects that take the time to summarize the impact on their users.
I don’t particularly enjoy cleaning my house but I love the aftermath.
Same with bumping versions on dependencies. If I didn’t have to do it I’d love to just get the new features in a non-breaking way. But I can’t, so I do, and it’s a chore.
The example using chore is a part of the conventional commit types to mean this commit is less significant. I often use it for things like
chore: resolve a merge conflict
chore: fix code formatting
chore: update dependency type definitions library
Each of these things is nice to have tracked as their own commit so they don’t show up in other more impactful or isolated commits. They don’t add a lot to the code but are rather little house keeping actions that help keep a codebase clean and up to date.
This should never be a standalone commit! A merge conflict is solved by the merge commit, and if you were to override the automatic commit message, this seems like a very information-poor text to replace it with.
Most people don't have the luxury of picking and choosing exactly what they work on. I don't WANT to work on keeping some legacy tooling from falling apart, and I would consider maintaining it a chore. That doesn't mean that it can just be abandoned though, so my boss will tell me (or someone else) to go work on it when needed and that's a perfectly normal and acceptable thing to happen in a workplace.
Yup, you should re-evaluate. There are a jillion reasons to do things that are chores. The word chore should not be implicitly perceived as a negative thing.
Do I want to clean my house? Yes. Is it still a chore? Do some thinking for yourself.
I honestly have no idea what the heck you are talking about; hence my comment. Do you seriously perceive any chores as not worth doing? That sounds unhealthy.
> There are a jillion reasons to do things that are chores.
> Do I want to clean my house? Yes. Is it still a chore?
> Do you seriously perceive any chores as not worth doing?
None of these are relevant as to whether 'chore' has negative connotations.
It clearly does to me and I'm baffled at the opposition. Do I want to clean my house? No of course not, and you don't either. If your house cleaned itself automatically would you still wish to do it? No, you do it because you need to, not because you want to.
Just about every dictionary definition mentions chores as at least possibly considered negative:
> a hard or unpleasant task: Solving the problem was quite a chore.
([1])
> A chore is a task that you must do but that you find unpleasant or boring.
> She sees exercise primarily as an unavoidable chore.
> Making pasta by hand with a rolling pin can be a real chore.
([2])
An entire article that wouldn't exist if chores were fun activities: [3].
> “Chore” can have a negative connotation and feel like a burden to a child.
I only got a few sentences into this but I can assure you that I love cleaning my house, and performing various other chores. Work related and otherwise.
If I didn’t do them, my life would be worse than if I did! It’s pretty easy stuff! Embrace the chore, clean up after yourself! It’s worth doing.
Interesting topic. From my experience Headline + Bullet Points are far quicker to convey useful information, in a form that is terse yet easy to read.
For example:
----------------
improve Buffer Cache Management & logging
- change 'tryDrop()' to skip immediately, if lock unavailable
- move BufferCache logging to a separate logger
- attach BufferTrim.Unsuccessful -> Preemptive Flush of oldest buffers
--------------------------------
Having worked in many codebases & internal docs, I've found narrative significantly less efficient to read & write.
In commit messages, far too many developers skip almost any detail as to what they're changing & why -- the slowness of narrative formats may substantially contribute to this. In docs (Wiki etc), narrative encourages developers to ramble about minor technical details while omitting all high-level context.
Headline + bullet-points bypass a lot of struggles with narrative by allowing a simple headline and then, well, cutting to the point.
This also illustrates an informed personal preference -- studies find lowercase English words faster to scan and comprehend.
I don't really have a problem with bullet points, conveying relevant information is after all the most important thing the message should do, but if that example is an actual one then it would raise some flags during review. Mainly because it's not super readable (mix of styles, super terse requires extra interpreting) but also because it mostly explains what has changed (most often unneeded information, as that should be clear from the diff) and not really why it is done that way (ok, the code might have comments, but still I expect the message to explain that). In my experience, using bullet points like this is correlated with this style of messages which just list 'I did x, y, and z' whereas a more narrative style, which can still go in bullet points, is correlated with easily digestible messages explaining properly the reasons for changes. Anecdotal of course.
> it mostly explains what has changed (most often unneeded information, as that should be clear from the diff) and not really why it is done that way
Ideally, commit messages should at least briefly explain what was changed (e.g., Add feature X, Fix bug Y) in addition to why it was done. When people look at git log output, they won't always show the associated diff for each log message. Also, when running git blame, they'll see the commit id and title, but not the entire change.
That doesn't mean that one can't run other commands to see the associated diff, or even get the overall branch diff from the commit ids recorded in the merge commit, but having what was done in the commit message allows one to get an understanding before looking into it in more detail.
> Headline + Bullet Points are far quicker to convey useful information
Not sure why the quicker qualification, here. Your example follows the article's guidelines exactly. (Except, as a minor point, the initial character is upper-case).
In fact, if each of the bullets were its own commit, this would be the default message of the squash-commit to master.
Just thinking out loud, but wouldn't this be best handled with separate commits for each of these atomic changes, with a merge commit describing the intent in narrative form?
Good question. The example message is adapted from a fast-moving innovation stream; it's an example of moving quickly and getting a lot of major work done.
In a more mature area, pace of work could often be slower and commits more granular. Until it's necessary to refactor/ reengineer things -- then commits are often larger again.
Bullet points are fine, but again the message should contain the context of the change and the "why" of it, not the what and how (that's what the diff shows).
I hate that rule about 50 characters. IIRC, it started because someone noticed that the average commit message in the linux kernel is about 50 characters. Then, for whatever reason it morphed into this widely propagated mantra saying that the maximum should be 50 characters.
I know the article calls it a rule, but really it's more a guideline. It's so that `git log --oneline` and `git shortlog` produce succinct output. It's also the email subject when using `git format-patch`. With all of these, the idea is to have a quick-to-read summary of the commit. That's all, really.
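If you want the guideline as a gentle nudge rather than a hard rule, a warning-only check is one way to do it. The following Python sketch is just that: `subject_warnings` is an invented name, the 50-character default is the guideline discussed here, and the blank-second-line check is included because one-line views only use the subject as the summary.

```python
def subject_warnings(message: str, limit: int = 50) -> list[str]:
    """Return human-readable warnings for a commit message; an empty list means OK."""
    lines = message.splitlines()
    if not lines or not lines[0].strip():
        return ["empty subject line"]
    warnings = []
    if len(lines[0]) > limit:
        warnings.append(f"subject is {len(lines[0])} characters (guideline: {limit})")
    # A blank line keeps the subject separate from the body.
    if len(lines) > 1 and lines[1].strip():
        warnings.append("second line should be blank")
    return warnings
```

Because it only returns warnings, the caller decides whether an over-long but genuinely descriptive subject is acceptable.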
Indeed, I follow these rules but I routinely violate the 50 character limit. It's far more important for subject lines to be useful than to be maximally succinct; I'm not out here trying to write an essay in the subject line, but I do need it to be appropriately descriptive.
Writing a good commit message within those constraints is kind of like writing a haiku. It's kind of entertaining sometimes, though I think it should be a suggestion rather than a rule.
It's usually fine unless your change concerns a specific function/module whose name is 20+ characters. Often I am renaming a function/module/package and can't fit its name in the first line which I find really annoying. Trying to refer to it in a different way or abbreviating it is so awkward.
I try to subscribe to the idea that if you cannot adequately describe your commit in a small number of characters (~50 is plenty I feel), your commit is likely too complicated. One size does not fit all, and if a workflow ain't broken there's usually no need to fix it. However most workflows are imperfect, there is almost always room for improvement.
But in the general case I find that if commits are not themselves concise and simple changes, they often should be broken up so that the moving pieces can be better tracked when looking back at the history.
It's often difficult to find a summary in 50 characters, and it should be! Anything that's longer than 50 characters is probably too verbose to be viewed as a log.
I think Linus himself recommended at some point that 50 character subject restriction (in a time when 80 character wide code lines were also still a recommendation).
That wouldn't be important in itself, but Github trims your message at 50 characters as well. Because of that "feature" alone it's annoying to pass that limit.
Honestly, the best guidance is just "put more than 2 seconds effort into your commits". No one really believes "More code" is a good commit message, it's just laziness. So yeah, put a little effort in, it's literally one of the few parts of your work that will remain once you are gone. Most code I've worked on gets replaced/updated/modified after I'm done working on it - but the one thing that I know will stay is the git history that some poor bastard will have to search through to figure out why this random bit of code ended up this way.
The only other thing I'll say is don't be weird. I knew a guy who would prepend his commits with a [tag] saying what part of the repo his commit was touching. This was a mixed sw/hw project and every single commit of his started with [sw]. It's like, ok thanks, firstly, you're a software engineer working in the ../sw/ folder, secondly no one was confused that your 17 changes to a C++ file were going to be a hardware change.
I like this. Any advice other than "putting in effort for more than 2 seconds" has a lot of hidden assumptions about the context and organization around your code. Some days I only do a commit once at the end of the day as a backup. Other days I have 10 commits in 1 hour because every time I thought I was finished I found something else that needed to be addressed.
In practice, there often is not a very strict process around commits, so specific advice is often not very useful either.
My opinion about Git commit messages has always been the same: if you can't describe it in English, you don't know what you're committing. If your commit message is too long and messy, you're committing unrelated things that should be split into multiple commits. The commit message is a direct indicator of the quality of your commit.
I like this post. I know people are offering their alternative conventions which they have various rationales for preferring, but I've followed the conventions this post advocates for years, and they haven't done me wrong.
This post describes a widespread standard practice. People never complain about my commit messages and they don't have trouble understanding them.
I rather prefer the first paragraph to be a problem statement and the second paragraph to explain the change in detail. This gives the rationale upfront as the focus, and helps the reader understand the problem and check whether the problem/goal was fully understood when making this change and whether the change actually covers what it should achieve. Overall, the change was/is just a means to an end and other variations are possible.
A junior dev turned me onto this several years ago, and I've gone by it ever since.
I was skeptical at first. For instance, the given reason for using the imperative mood is not strong, in my opinion (that the default git messages on merge and revert are imperative is not a good reason, because other default git messages do not use the imperative). But there are other good reasons for using it, not least of which is that even arbitrary consistency lowers cognitive load.
Now, it's my go-to if there isn't a good reason otherwise.
Every time I read something about "how to write good commit messages" it always seems to focus on reading the messages back as a log. That's great if that's how you use commit messages, but I tend to use them more for searching for a specific change so it doesn't really work for me. A more narrative style in commits I might need to go back to makes it easier to find them later.
How you write your commit messages should be driven by whatever you use them for afterwards.
But the PRs aren't part of your git repository? IMO a git repository should be self-contained and not require a hosted provider to give context. It lets you manage your work with superior local tooling and without a browser running. Basically I take the exact opposite approach where my PRs are always just a short summary of the commit messages and provide a place for me to put the GitHub-specific things like the "Fixes #..". But I make sure my commits are a clean artifact of the changeset by themselves (i.e. I do any squashing, etc., on them outside of the PR).
GitHub (by default) uses the name of the PR as the merge commit message and also includes the commit message of each commit in the log. Having whitespace-altering "Dummy commit to trigger CI, ugh!" commits in a git history isn't good but it still clutters the `git log` with stock squash+merge GitHub use.
I can't speak for everybody, but if GitHub goes down completely and I only had access to my git logs, I'd struggle to recreate ~20% of the information scattered across issues and PRs. This issue is external to merging preferences, but it's definitely not solved by squash-merges and descriptive merge messages.
> Having whitespace-altering "Dummy commit to trigger CI, ugh!" commits in a git history isn't good but it still clutters the `git log` with stock squash+merge GitHub use.
The frustrating thing about this is that this "omg minor commits on a merged branch clutter up the log!" is entirely a UI problem created by github's naive view of history where it shows things in a bafflingly obtuse linear order instead of letting you do something like `--first-parent` like the command line client lets you do.
Git itself has more than enough tools to give you that 'squashed' view without actually squashing anything, github just has no interest in providing it to you for whatever reason.
Also yes to the sibling comment that if you want to make something happen with a commit use `--allow-empty` and not "bump number" or "add random whitespace". Please.
It would also help if people were better about keeping a clean commit history for PRs. Ideally, a new commit should only get pushed to a branch per change relevant for reviewers. If the CI is causing the error, people should work on a temporary branch and resolve it there first before cherry picking things over into the PR branch (after rebasing the intermediate commits on the temp branch).
Rebasing is a really nice tool, though I think the UI is really lacking. A simple GUI for interactive rebasing would help a lot. Most clients I've used (which isn't a ton since I generally prefer the CLI) don't even have an option for rebasing at all.
> github's naive view of history where it shows things in a bafflingly obtuse linear
This hits home, it pretty much describes how I visualize logs in my head (compared to the visualizations I see that are more 2-D, branching off and merging together, etc.). I have a hard time working with some of the more advanced features because of this, and it'll probably always be an uphill battle to shift my thinking from linear to not-so-linear ...
The obtuse part about github's linear view isn't that it's linear, it's that it's interleaved by time in ways that form a chaotic view even for a linear one. Like, picking a random large project for an example, take a look at swift's history on github[1].
Because there's a bunch of PRs that were being operated on in parallel, and some probably that are even kind of old, the view you wind up with is likely a large streak of "PR merged" commits and then all the commits from those PRs jumbled together in an incoherent mess. Likely you'd have to scroll a few pages to even find some of those PR's commits (Note: git-log also has this as its default order, but you have some choices like --topo-order).
That said, I really really recommend you look at git's native --first-parent output, as I mentioned. That is likely the linear history you really want and it's right there in the client. It's the exact same thing as you get from a squashing strategy, except the history isn't gone it's just hidden.
I agree that the interactive tree views are chaotic and incoherent in their own ways. I don't use them either. I use --first-parent usually to find what I want and then I might dig in deeper if I need to.
But leaving that history there underneath means tools like git bisect can actually work, or if you need to narrow down onto a small change you actually can.
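For scripting that view, a thin wrapper around the command line is enough. This Python sketch assumes `git` is on the PATH; the function names and the default limit of 20 are made up for illustration, while `--first-parent`, `--oneline` and `--max-count` are ordinary `git log` options.

```python
import subprocess


def first_parent_log_command(ref: str = "HEAD", limit: int = 20) -> list[str]:
    """Build the git command for a mainline-only, one-line-per-commit history."""
    return ["git", "log", "--first-parent", "--oneline", f"--max-count={limit}", ref]


def first_parent_log(repo_path: str, ref: str = "HEAD", limit: int = 20) -> list[str]:
    """Run the command inside repo_path and return the output lines."""
    result = subprocess.run(
        first_parent_log_command(ref, limit),
        cwd=repo_path,
        capture_output=True,
        text=True,
        check=True,  # raise if git exits with an error
    )
    return result.stdout.splitlines()
```

On a merge-based workflow each returned line is typically a merge of one branch, and the individual commits stay available underneath for anyone who needs to dig.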
> The obtuse part about github's linear view isn't that it's linear, it's that it's interleaved by time in ways that form a chaotic view even for a linear one. Like, picking a random large project for an example, take a look at swift's history on github[1].
> Because there's a bunch of PRs that were being operated on in parallel, and some probably that are even kind of old, the view you wind up with is likely a large streak of "PR merged" commits and then all the commits from those PRs jumbled together in an incoherent mess.
Oh yes, I've just run into this problem myself: I had to backport a large set of commits. While they're of course roughly chronological, they're not strictly so, and so the result in Github's UI is a giant mess.
I sometimes hack author dates just so that commits show up in the right order in the stupid Github linear view.
It doesn't always work (on a big project like Swift it would be a lost cause), but because I care a lot about presenting my work as a sequence of commits optimized for reviewability, I try.
Seeing the --graph output really helps in understanding the branch history. The lack of a view similar to --graph on Github drives me bananas.
I don't like typing "git log --oneline --graph" all the time, so in my profile I have a `git slog` alias which is similar but adds date and author, plus truncates each line at 100 columns so it doesn't wrap:
Wait, can people really not just go into their CI systems and click a “build again” button? People actually insert ‘dummy’ commits to trigger builds?
I’ve been using Concourse to run my CI for years and years and just sort of assumed that “build again” was such basic functionality that every other CI system would also have it.
> Having whitespace-altering "Dummy commit to trigger CI, ugh!"
`git commit --allow-empty` may be sufficient for that "there is a new commit" trigger in many cases. If so, that may be preferable to whitespace changes as those clutter up the blame.
As an aside, my initial commit on a repo is an empty one so that I can branch from a completely empty repo to do radical rewrites and yet maintain a history relationship with that initial empty commit (which I feel is preferable to an orphan branch and then a merge with unrelated histories ... though those tell slightly different stories in the log).
> As an aside, my initial commit on a repo is an empty one so that I can branch from a completely empty repo to do radical rewrites and yet maintain a history relationship with that initial empty commit (which I feel is preferable to an orphan branch and then a merge with unrelated histories ... though those tell slightly different stories in the log).
Oh cool, I thought I was literally the only person on the planet to do this lol. I'd do it for branchpoints too except git rebase by default acts very poorly with empty commits in the edited history (deletes them). I wish this was normalized (ie. there was a flag to `git init` to add a commit message for a root commit).
Starting a repo with an empty commit is a cool idea. My first commit has been "Add empty README" since forever, but I like your way better and I'm going to start doing that.
I do 99% of my coding at work. At work we have Gitlab, Confluence and Jira and these all already contain mountains of context for all of our work. Pretending this context doesn't exist and storing it all in a Git repository isn't helpful.
I also do my coding at work, with Gitlab/Github, Confluence and Jira. And yet, due to various leadership decisions and technical migrations over the years, we have lost huge piles of records due to migrating work trackers, document stores, and VCS hosts (e.g. Unknown -> Jira -> Pivotal Tracker -> Jira). Due to these crappy migrations, we don't have old merge requests, old wiki documents, and even some old repositories. Sure, it's possible to find those in old archives sitting in an S3 bucket if I were to take a couple days, but they're not 'at my fingertips'.
What is at my fingertips though? All the git commits of every repository that we still have. Which means that the only thing that's actually endured has been the commits. So, on behalf of folks 5-20 years from now who'll be scrutinizing your work, please put the context into the repo directly since it's the only thing that'll stick around.
I'm speaking of coding in a larger context then... Where it can exist outside of the limits of a single company and a single team. In those sort of cases no, it doesn't really matter where you keep the history. It is a business decision and it is up to the business to decide to switch providers and absorb the loss.
I'm speaking of code that others will read. Maybe it is free software, maybe it is proprietary but shared by a larger team, maybe you are writing for customers... any way about it, if the code is to be maintained and read again over time the development history should be captured by a revision management tool like git.
There's a few reasons why this is a lossy workflow compared to commits with proper messages that are later merged, not squashed into one.
1. The PR itself becomes a single giant commit. To adequately explain the diff in a single message would require writing an essay instead of a few paragraphs. It is also now hard to tell which part of the PR message is associated with which part of the massive diff.
2. If something breaks, you can't bisect to find which part of the diff caused the problem.
3. It's harder to document your work as you go. I may be working on a PR for days or weeks. Commits give me an opportunity to document each part of my change _at the time I make the change_. It's a chance for me to record how and why at the time it's fresh in my head. If I wait to do this until the PR is ready, I likely will have forgotten some bit of context I wanted to record about the commit.
4. It encourages sloppy commits, making a large PR harder to review. Ideally, I should be able to review a large PR by looking through each commit.
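To make point 2 concrete: `git bisect` binary-searches the history for the first commit where a check fails, so it can only narrow a problem down as far as the commits go. Here is a rough Python sketch of that search. The commit names and the `is_bad` check are made up for illustration, and this is not a claim about how git implements it internally.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest commit for which is_bad() is true.

    Assumes commits are ordered oldest to newest, the last one is bad,
    and once a commit is bad every later commit is bad too.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[hi] is bad
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid      # the first bad commit is mid or earlier
        else:
            lo = mid + 1  # the first bad commit is after mid
    return commits[lo]


# Hypothetical branch history: five small commits, the bug arrives in "c4".
history = ["c1", "c2", "c3", "c4", "c5"]
broken = {"c4", "c5"}
culprit = first_bad_commit(history, lambda c: c in broken)
print(culprit)  # prints c4
```

With a squashed PR the list has a single entry, so the best answer the search can give is "somewhere in the whole PR".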
If your reply to all this is to say you should just submit smaller and more frequent PRs, fine, but then why ever have PRs that consist of more than one commit?
I abhor “Squash and Merge”, because I do take time to make good commits. They each pass tests, they have messages, and I rebase fixup commits away. For me, a merge commit is semantically meaningful (it even gives me a place to put who reviewed it in the merge commit message).
This probably seems like a whole lot of work for no benefit. However, I can’t tell you the number of times I’ve been on crappy plane wifi, hit `git blame` on some line and been able to comprehend my state of mind making a change years in the past.
For the code review step, sure, commit messages don't really matter unless your team reviews PRs commit-by-commit.
How many times do you actually change the default squashed message? If you write a series of garbage commit messages, I don't particularly trust that you'll write a very good squashed message, either. How many times do people skip updating the PR description with new information or features from comments? If your commit messages are good, the auto-squash message will be good and one won't have a network dependency on GitHub to figure out what decisions went into that change.
In general I agree with your goal of a great commit log: 1 PR = 1 commit in the main branch. But I feel like GitHub is just the wrong tool to use if you want that. I used to use Gerrit, where commit messages _are_ your PR description. Sure, it makes you interact with git in some unfortunate ways, but the tradeoff is enforced commit cleanliness.
Everybody I've seen modifies the squash message in a professional setting. After all, that's the one that everybody else will see. Commits on your feature branch don't really matter and the entire branch can be deleted afterwards anyways
> I prefer Github's method of "git commit messages don't matter, pull requests do".
This is a bad idea unless it works well for your specific company workflows and you don't care about the future possibility of changing platforms.
Git repos are designed to be self contained, decentralised, and offline-first. If you only care about how things look on github, then the repo will have poor usability outside of github - ie. on your workstation, in your local git tools, on a repo mirror, etc.
Git commits can be a powerful tool for understanding code if the messages are useful. They are immediately accessible through local tools and can quickly add context to a block of code without breaking immersion. But that immersion is broken as soon as you hit a commit message like "Merge pull request #123" or "fix bugs".
> I prefer Github's method of "git commit messages don't matter, pull requests do".
By doing that, you lock yourself into relying on Github in order to get the context behind a change rather than looking at the commit messages for a particular branch. That means, you cannot easily get that information just using git on the command line. On the other hand, if you put the context in a series of well formatted commit messages, you can get that context by reading through the commit log, either on the command line, or in Github by clicking on each commit.
> 1. Make it so the only merge strategy allowed on a repo is "Squash and Merge", so each PR = 1 commit in main branch
This leads to very large commits which cannot easily be reverted once any other commits that update the same files are added to the base branch. Also, in this case, the merge commits are essentially redundant, so why have them? There's nothing that prevents one from amending their commit to reference the PR number and eliminate the merge commit entirely.
> It's easier to be more expressive in a pull request, and intermediate changes while working on a PR aren't super interesting to me.
It really comes down to how those changes are presented. For example, if the change is one commit that adds a new method, and a second commit that adds calls to that method, that makes the change easier to review. Also, if a bug is found in the method, you can make a revert commit to remove all the calls to the new method, another commit to demonstrate the bug with one or more tests, another commit to fix the bug and update the tests to reflect that the fix works as expected.
If the commit was just a single PR commit, and another PR was merged, then one would have to craft a commit to undo the changes pertaining to that bug and then make a new commit in another PR to add the updated implementation. This makes it harder to see what the fix was since you essentially remove the entire feature and then commit a new version of that feature.
That's such a different approach than I've grown used to, but different platforms encourage different flows.
I've been using bitbucket (not the shiny nice new bitbucket) at work for years and its PR search is so abysmally bad that anything in the PR message but not in the commit message may as well not exist once it's merged. `git log` is forever, bitbucket search is /dev/null.
I'm curious what other affordances or lack of affordances encourage what git behavior.
I agree with this. Oftentimes I am experimenting around (obviously on a separate branch) and use commits as checkpoints. Not every commit is a perfectly polished state. Using separate branches and then squashing when it is ready, gives me the freedom to experiment without constraint, and then clean things up when it is ready
This happens sometimes when private repos change hands. I've worked at places where we've been given zips of git repos – commit messages reference PR and issue #'s from long distant organizations.
I've since started policing PRs whose commit messages reference things inside of our GitHub. At least provide a summary of what you're linking, for the worst case future.
If you really want just one commit on your PR you can reset your branch to its target before you merge:
git reset --soft <target>
Which will undo all commits and leave all modified files in the staging area. Then you can make one commit and force push it to replace your branch @ remote.
Throwing away the commit and doing it again can easily go wrong. It is easy to commit unintended changes by mistake.
The interactive rebase is a completely normal operation and intended for exactly this situation. It is also much easier to craft more than one commit, and last minute fixes of spelling errors and such things.
Careful here… if <target> is not in the ancestry of HEAD, resetting to it may have the effect of undoing any changes that have happened on the <target> branch that aren’t in yours.
To be safe, you can do:
git reset $(git merge-base <target> HEAD)
Which puts HEAD at the last commit in common between your branch and <target>… which, if target is already a parent/ancestor of your commit, is the same thing. But if target has changed since you branched from it, this prevents you from undoing any recent commits.
Not really, I recently merged in a two commit branch where one commit was me changing all the vendor configuration for the framework we were using to a new version and the other was all the changes needed to support the change. That PR affected thousands of files in total but the need to frequently rebase the branch to avoid killer merge conflicts encouraged a low commit count. rebase -i can be your friend if you've got a long branch history that adds essentially no value (i.e. "Tried this thing/Didn't work reverting/Tried this other thing/Still no dice/Switching workstations").
An arbitrarily large number of file changes can be packed into a single commit, sometimes for review purposes it makes sense to purposefully isolate different groups of changes in a manner that doesn't mesh with how the dev work was actually done - sometimes I just don't want to have an ugly commit history. I'm allowed to be OCD about my work and sweep the commit where I added print __LINE_NUM__ between each LOC to track down a bug one time that I was too lazy to use gdb under the rug.
Yeah, I meant my comment more as a criticism of universal squash-merging as a policy, not so much an endorsement of it in general. I run into cases like you describe pretty often, and I doubt I'm alone. Switching to squash-merging has some benefits but it's also brought out a fresh form of hell when too many changes are happening in too many branches at once.
Oh absolutely - we have a pretty modest sized company and we do tend to "always create merge commit" because it makes some of our deployment tooling easier but otherwise git preferences are left up to the dev and the particulars of the situation.
That's where the long-form back and forth arguments happen over the edge conditions.
Periodically a few days after a PR has been merged I'll have a discussion with someone and realize some context was never captured, so I'll just add it at the bottom of the closed+merged PR for posterity.
Yeah that means all the important information is in GH but if you migrate to GitLab or bitbucket or whatever they've got tools to pull all that info along with you when you migrate.
I don't really care about individual commits other than to find the gitsha to find the PR that they were merged into in order to find the full context.
I'm an agnostic who keeps my distance from self-proclaimed heretics inhabiting the modern-day Church of Github (which formed as an offshoot of the Cult of Git, which itself emerged after launching a savage religious war with the iconoclastic Sect of Subversion), who prefers to have both a logical, reviewable commit history AND informational pull requests.
Smashing commit history via squash makes debugging harder. Provided you have the interactive rebase chops to pull off crafting a decent granular history, you'll reap significant dividends down the road if you keep that history intact. Because it's right there for you to use with `git blame`, etc.
And then why not have proper discussions in the pull request with reviewers? Sometimes important stuff comes up in review. Sometimes you might close the PR and start anew. Sometimes the PR gets incredibly lengthy. Maybe the result is a commit history which condenses some abominably lengthy PR.
There's no need to choose between them. You can have both!
It may be heretical but I tend to agree with this. Also at my workplace we tend to be standardized around PRs doing squash merges so there’s not really much point in having detailed interim commit messages anyway. As long as the PR has good information (and of course is linked to useful work items with good information) that’s usually good enough for me.
(Of course more often than not people have empty/useless linked work items, and PRs with useless titles and descriptions but that’s a whole different kind of problem.)
The PR commit can have sooo much good information, that can be used by so many teams. Loom videos for a quick demo of the feature. A concise description of the impact on customers, support teams. Deployment and feature rollout considerations.
I do try to make the individual commits somewhat presentable within the PR... a few extra minutes polishing the subject lines. But nowhere near what this article asks for, which seems to be suited for non-PR workflows such as mailing lists+patches, or workflows where every PR is squashed?
There's even a pitfall in putting too much "why" in the commit messages -- some programmers will be tempted to keep it out of the code, ie never comment their code.
I used to do that as well. The problem here is that PR belongs to a particular service provider (GitHub, GitLab and whatnot), and thus can be easily lost if you move providers.
Or even if you simply move commits to a new repository (for instance you need to clean out some sensitive information from git history).
In that sense having as much context in commit messages themselves saves the day.
I'm fully on board with the idea that the main branch should always pass.
I also like the approach where tests are performed in a temporary integration branch, then the main branch gets fast-forwarded to the tip of that integration branch only after the tests pass.
But the idea that intermediate commits must always pass tests forces you into some nasty contortions.
Say that you have to move files around, after which the tests fail, and then make a few minor changes to accommodate the new locations so the tests pass once again. If you combine those two operations into a single step, troubleshooting that commit is a pain because the diff is gigantic.
For this reason, I prefer a less strict rule: all PRs get a merge commit and merge commits must always pass tests.
I totally agree with this, I'm not a fan of the written-in-stone nature of commit messages. Just like code, it's nice if log history is editable too in case you forgot something, or if you want the message to mention future changes that back reference it.
> Periodically a few days after a PR has been merged I'll have a discussion with someone and realize some context was never captured, so I'll just add it at the bottom of the closed+merged PR for posterity.
I saw this article early in my software development career and found it very valuable. Nostalgic and a good reminder. Send this to the still-learning software devs you care about.
We also have a rule to prepend every commit message with its issue number in our issue tracker (we don't use GitHub). That way in Git Blame/Log you can always quickly find where the change came from and why - the issue tracker usually has more detailed information.
Curious, I've run into this approach in the past and learned to hate it with passion.
That place used JIRA where the typical ticket ID was around 10 characters. That wasted a lot of prime real estate, e.g. in my email inbox, in history views (tig, gitk, and so on).
I also don't see the claimed benefit for git blame. In a real code base with significant history, it happens often enough that the first blame is an unrelated refactoring and you need to dig deeper. Therefore, I find that best practice is to look at the blamed commit first instead of the ticket. That gives you the full commit message, which can still contain a reference to an issue.
Putting footer lines like "Fixes: <issue>" into commit messages works much better. It's just as easily visible in git log etc. and doesn't waste space.
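For anyone who wants tooling on top of footer lines, they are easy to pull out of a message. Below is a minimal Python sketch, assuming the footer is the last blank-line-separated paragraph and every line in it looks like `Key: value`. The sample message, the `PROJ-1234` ID and the reviewer name are invented for illustration. Git ships its own `git interpret-trailers` command for this, which is the thing to use in real scripts.

```python
import re
from typing import Dict, List

# Matches lines such as "Fixes: PROJ-1234" or "Reviewed-by: Jane Doe".
TRAILER_RE = re.compile(r"^([A-Za-z][A-Za-z0-9-]*):\s+(.+)$")


def parse_trailers(message: str) -> Dict[str, List[str]]:
    """Collect key/value footer lines from the last paragraph of a message."""
    paragraphs = [p for p in message.strip().split("\n\n") if p.strip()]
    if len(paragraphs) < 2:
        return {}  # a lone subject line has no footer
    trailers: Dict[str, List[str]] = {}
    for line in paragraphs[-1].splitlines():
        match = TRAILER_RE.match(line.strip())
        if not match:
            return {}  # the last paragraph is prose, not a footer block
        trailers.setdefault(match.group(1), []).append(match.group(2))
    return trailers


# Hypothetical commit message with two footer lines.
msg = """Reject empty usernames in signup form

The validator treated the empty string as a valid username,
which let blank accounts through.

Fixes: PROJ-1234
Reviewed-by: A. Reviewer
"""
print(parse_trailers(msg))
```

The subject line stays free for the component and the summary, and the issue reference is still machine-readable.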
I view the commit message first, of course, and only if it's not enough, I go to the issue tracker.
>That place used JIRA where the typical ticket ID was around 10 characters. That wasted a lot of prime real estate
Interesting, just for the sake of it, I measured how much screen space the issue ID takes up on my monitor with my current font settings - around 5% (of monitor width, of course). Never been a problem for me.
If you view the commit message first, then I don't see how having the issue ID as the first thing helps your git blame use case. You can see the issue ID just as easily in a footer line of the commit message.
As for screen space, one of the standard email client layouts has a vertical split, with message titles on the left and bodies on the right. This generally fits well with 16:9 screens, but the line width for titles is limited. Having e.g. the affected component or subsystem up front is much more useful when skimming.
As long as the issue ID is somewhere in the message, I'm fine with it :) But, for example, yesterday I was cherrypicking many small commits pushed by a junior dev to wrong branches and it greatly helped that it comes first, there's less mental burden, it's easier to spot. We also have a tool which checks issue IDs in commit messages against the issue tracker (various custom states in our flow) and, again, it's more robust and simpler to implement if there's a standardized place where to put your issue ID.
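To illustrate why a fixed position keeps such a checker simple, here is a small Python sketch. The Jira-style `PROJ-123` ID format and the `known_ids` set (standing in for a lookup against the issue tracker) are assumptions for the example, not a description of our actual tool.

```python
import re
from typing import Optional, Set

# Assumed format: an uppercase project key, a dash, a number, then ":" or a
# space, followed by the actual summary. Example: "PROJ-42: Fix crash".
ISSUE_PREFIX_RE = re.compile(r"^([A-Z][A-Z0-9]*-\d+)[: ]\s*\S")


def leading_issue_id(subject: str) -> Optional[str]:
    """Return the issue ID at the start of the subject line, if there is one."""
    match = ISSUE_PREFIX_RE.match(subject)
    return match.group(1) if match else None


def subject_is_valid(subject: str, known_ids: Set[str]) -> bool:
    """True if the subject starts with an ID that exists in the tracker."""
    issue_id = leading_issue_id(subject)
    return issue_id is not None and issue_id in known_ids


# Hypothetical tracker contents.
known_ids = {"PROJ-42", "PROJ-43"}
print(subject_is_valid("PROJ-42: Fix crash on empty input", known_ids))
print(subject_is_valid("Fix crash on empty input (PROJ-42)", known_ids))
```

One anchored regex is enough when the ID always comes first. Accepting it anywhere in the message means scanning the whole text and deciding what to do with multiple or incidental matches.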
We do the same and it works pretty well. The problem with that approach, though, is that the codebase may live for decades but issue trackers change and old tickets usually don't get migrated to the new system.
>old tickets usually don't get migrated to the new system
Well, our current issue tracker was migrated from Jira and I can still see all the ancient Jira tickets just fine (I mean what's important: the comments).
In any case, something is better than nothing at all.
I think the most important thing is that the commit message should capture the high level intent of the change. I write "intent", because the actual effect and "intent" can differ, and commit messages are immutable. It's fine, as long as you recognize that the commit message is there to capture original intent. You often see "Fix bug #1234", but you rarely see "Introduce bug #1234".
The other stuff about using consistent tense, capitalization, punctuation, etc... is important, but secondary.
The only rationales I've seen for using imperative ("fix bug") over the indicative ("fixed bug") are that it's what git does by default anyways.
Is it really that big of a deal to use the indicative sometimes and imperative other times?
For what it's worth, I always write commit messages in the imperative, out of habit. When others write commit messages in the indicative tense, part of me notices, but I move on because I can still understand the message if it adheres to the other guidelines.
None of this is really worth obsessing over I think, but one point in favor of imperative is it tends to also be the tersest phrasing, which is convenient if you're also aiming for 50 characters max.
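If you want the length conventions checked mechanically rather than argued about, a few lines of Python will do. This is only a sketch: it checks the 50-character subject, the blank second line and the 72-column body limit, and says nothing about mood or tense. The function name and the wording of the reports are my own.

```python
from typing import List


def lint_message(message: str, subject_max: int = 50, body_max: int = 72) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems: List[str] = []
    lines = message.rstrip("\n").split("\n")
    subject = lines[0]
    if len(subject) > subject_max:
        problems.append(f"subject is {len(subject)} chars (max {subject_max})")
    if len(lines) > 1 and lines[1] != "":
        problems.append("second line should be blank")
    # Body lines start at line 3 of the message.
    for number, line in enumerate(lines[2:], start=3):
        if len(line) > body_max:
            problems.append(f"line {number} is {len(line)} chars (max {body_max})")
    return problems


print(lint_message("Fix off-by-one in pagination\n\nThe last page was skipped."))
```

Something like this could run from a `commit-msg` hook, though whether a hard failure is worth the friction is a team decision.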
So long as the meaning is clear, it really doesn't matter IMO. Just write it so that when viewed in isolation the meaning of the message is clear and matches the changes held within the commit.
Anyone who gets up in arms over "fixed" vs "fix" needs to get a life.
The imperative mood/future tense ("this commit will...") makes the commit the subject of the message, the past tense makes the developer the subject of the message. In my (limited, personal) experience, the people who insist on past tense commit messages have more trouble killing their darlings, are more likely to take review comments personally, and are more likely to view (part of) the codebase as theirs rather than the team's.
So for me, it's not about the commit message itself, but the communication style says something about the developer and how I best approach them.
My take on this requirement is that it helps developers let go of their attachment to the code they've written.
I use imperative, but for other projects which don't I switch to indicative if that is the style. Personally like imperative more, not sure why, just preference.
> Is it really that big of a deal to use the indicative sometimes and imperative other times?
'big', don't know, but wouldn't be surprised if there's an objectively provable higher cognitive load for having to read a mix of both (or, god forbid, other styles thrown in there as well) vs one consistent style. Just like there is for code etc. And some people are affected by striving for consistency more than others, so for them lack of consistency might cause some extra friction.
I will play the devil's advocate: why would anyone outside of the realm of die-hard FOSS on Unix-like systems wrap text to 72 columns? Look, I'm not doing it now. This entire paragraph I'm entering into HN is one giant line of text that gets wrapped.
Suppose you're working on software for which patches will not be sent to FOSS mailing lists that are anti-HTML, anti-MIME, anti-long-line, ... why would you wrap the body of commit messages to 72 columns?
> Why would anyone outside of the realm of die-hard FOSS on Unix-like systems wrap text to 72 columns?
Because if the commit is performance related, including regression test results may be mandatory. Test results will be in some fixed format e.g. tabular, commonly exceed 72 columns, and become gibberish if wrapped.
If you don’t supply `-m`, `git` will pop open your `$EDITOR` with the new commit message. You can use whatever tools you like to wrangle text in there. I use vim, so it’s just a quick `gq<motion>` to format everything.
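If you'd rather script it than rely on the editor, the same reflow is a few lines of Python with `textwrap`. The sketch below also addresses the test-results objection by leaving indented paragraphs alone. Treating "any indented line" as a marker for preformatted text (tables, logs) is my own assumption here, not a git convention.

```python
import textwrap


def wrap_body(body: str, width: int = 72) -> str:
    """Re-wrap prose paragraphs to `width`; leave indented blocks untouched."""
    out = []
    for para in body.split("\n\n"):
        lines = para.splitlines()
        if any(line.startswith((" ", "\t")) for line in lines):
            # Indented text is assumed to be a table or log excerpt.
            out.append(para)
        else:
            joined = " ".join(line.strip() for line in lines)
            out.append(textwrap.fill(joined, width=width))
    return "\n\n".join(out)


# Hypothetical body: one long prose line, then a wide benchmark row.
table = "    bench_a  1200 ops/s   bench_b   900 ops/s   bench_c  1500 ops/s   bench_d  1100 ops/s"
body = "word " * 30 + "\n\n" + table
print(wrap_body(body))
```

The prose comes back wrapped at 72 columns while the wide results row is passed through byte for byte.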
In my experience, when possible, the commit messages should read as part of a spec, describing the implemented behavior (mostly imperative). Even when fixing a bug, the commit would state what the code is intended to do, just as its resp. test exercises.
This way it's somewhat easier to follow the reasoning of the committed changes.
Another detail is to prefix the affected subsystem, in the case of a modular or complex application. That also helps focus attention.
My preference in general is to write single line messages (shorter ok, longer - ok too), unless it's something very non-trivial and source-code policy discourages lengthy comments.
Sometimes I wonder about repos that do not follow any guideline. I use an open source service[0] that never describes what the commit is about, so just know that any description of the commit is better than none at all.
My philosophy tends to be that your first commit should convey the task you are trying to achieve and additional commits beyond that point should be fixups, to be rebased into the single first commit before merging. This removes the back and forth of changing and reverting the same files, changes from PR comments etc. It's much more coherent.
Is there a reason why GitHub don’t make it easy in 2022 to write nice commit messages? Larger boxes, visual cues for character length, non default commit messages such as ‘Updated doc blah’ which are useless.
Lots at our place solely use the GitHub UI to edit Markdown files mainly and their messages are typically the least useful for that reason.
Though I don’t think I am really the best coder out there, I was taught by a senior to write proper Git commit messages, and I usually stick to that. I am however quite frustrated by my team mates who end up re-using essentially the same commit message for bugfixes.
It matters to the same degree as being consistent about variable naming. It doesn’t change whether the system works, but if everyone is consistent then it’s easier to read and understand later.
> It matters to the same degree as being consistent about variable naming. It doesn’t change whether the system works, but if everyone is consistent then it’s easier to read and understand later.
I must admit I don't like that PEP rule. Camel case in some places, no capitals in other places. I just do whatever I want really.
It's simple and worked well for us so far. Less rules, more information, and developers are free to use their words to express intention of changes instead of nonsense (feat, fix,....)
Have you gone through a ticketing system change yet with a developer team? I've seen several, and not once was "we must import the old PR's/issues/features into the new system" given any serious consideration. Conversely, every source control system migration I've seen took great pains to preserve the committer, the date and the full commit message of each separate commit.
Point being, in a few years all your [external references] will be useless. Can your commit messages explain the commit context on their own?
OK, my question is, no effective commit message system exists without a consistent ticket system.
A ticket reference gives you all you need. You see it in every line of code, from there you know the reason for that code to be written, it could include a reference to slack discussion url,...
Honest question, do you really expect your history to be meaningful long-term? Or are you simply taking the approach that the commit message is meaningless and a developer should instead use GitHub search to find a PR relevant to a change they're investigating?
You mean the entire developer's thought process behind a change will be represented by a single 12x16 picture? Even hieroglyphs were more expressive than that.
> You mean the entire developer's thought process behind a change will be represented by a single 12x16 picture?
No, look at the log of FastAPI (or even gitmoji itself: https://github.com/carloscuesta/gitmoji) and you'll see that e.g. the bug emoji serves *only* to replace "bugfix: " at the beginning of the commit message, not the entirety.