I use the following convention to start the subject line of a commit (posted by someone in a similar HN thread):
Add = Create a capability, e.g. feature, test, dependency.
Cut = Remove a capability, e.g. feature, test, dependency.
Fix = Fix an issue, e.g. bug, typo, accident, misstatement.
Bump = Increase the version of something, e.g. dependency.
Make = Change the build process, or tooling, or infra.
Start = Begin doing something, e.g. create a feature flag.
Stop = End doing something, e.g. remove a feature flag.
Refactor = A code change that MUST be just a refactoring.
Reformat = Refactor of formatting, e.g. omit whitespace.
Optimize = Refactor of performance, e.g. speed up code.
Document = Refactor of documentation, e.g. help files.
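As a rough sketch, a team adopting this list could check the first word of the subject in a `commit-msg` hook (Git passes that hook the path of a file containing the message). The names `ALLOWED_VERBS` and `leading_verb_ok` are hypothetical, and the verbs are simply copied from the convention above:

```python
# Verbs taken from the convention above; extend the set to taste.
ALLOWED_VERBS = {
    "Add", "Cut", "Fix", "Bump", "Make", "Start", "Stop",
    "Refactor", "Reformat", "Optimize", "Document",
}


def leading_verb_ok(message: str) -> bool:
    """Return True if the subject line starts with one of the allowed verbs."""
    lines = message.splitlines()
    if not lines or not lines[0].strip():
        return False  # an empty subject never passes
    first_word = lines[0].split()[0]
    return first_word in ALLOWED_VERBS
```

With this sketch, `leading_verb_ok("Fix typo in README")` passes while `leading_verb_ok("Prioritise shipping route a over route b")` fails, which is exactly the kind of forced bucketing objected to later in the thread.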
Yes. By convention, a commit message should be a one-line header, then an empty line, and a body (if necessary).
The whole thing should be width-limited to 80 or 100 characters.
And the subject line should complete the sentence "If this commit is applied, it will...". It should start with a capital letter, continue in lowercase, and will necessarily start with a verb.
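The layout rules above (subject, blank line, body, a width limit, a capitalized subject) are mechanical enough to lint. A minimal sketch, assuming an 80-column limit; `lint_commit_message` is a hypothetical helper, and it cannot check that the subject really completes the "it will..." sentence:

```python
def lint_commit_message(message: str, width: int = 80) -> list:
    """Return a list of layout problems; an empty list means the message is OK."""
    problems = []
    lines = message.rstrip("\n").split("\n")
    subject = lines[0]
    if not subject.strip():
        return ["subject line is empty"]
    if not subject[0].isupper():
        problems.append("subject should start with a capital letter")
    # Header, then an empty line, then the (optional) body.
    if len(lines) > 1 and lines[1].strip():
        problems.append("second line should be blank")
    for number, line in enumerate(lines, start=1):
        if len(line) > width:
            problems.append(f"line {number} is longer than {width} characters")
    return problems
```

Passing `width=100` covers the other limit mentioned above; checking the imperative mood would need a word list or a human reviewer.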
At my work we almost encourage the idea of putting a blog post in there.
It's not a hard and fast rule and it's ok to ignore it when it makes sense. But we also don't mind if your commit message takes longer to write than the code took to change and debug.
A lot of context is assumed in commits, and almost all of it is temporal. Capturing as much of that as possible pays off down the line.
The commit in the article is a good example where the context explains much more than the change.
(On the flip side, the payoff has an expiry date, so I'm not extremely fussed when people get lax, but it's still good to check in basic assumptions with your code.)
Well, I think if you start with this kind of bureaucracy, you're doing it wrong. It's not funny. Also, say your app is down, clients are calling, and you need to fix it fast. If I need to remember these rules to fix it, people will kill me, or I will kill myself :p
After a few days practicing this, it becomes second nature, and doesn't hold you up at all. Commit histories also become much quicker to parse through, especially with "log --oneline".
Writing good commit messages doesn't slow me down noticeably, and I always value them when I come back to them.
> And the subject line should complete the sentence "If this commit is applied, it will...".
I kind of do it like this. Others use subjects that complete the sentence "This commit...", so their subjects will start with "adds", "fixes", etc. Though that adds one or two extra characters!
I think that’s comparing the wrong things in this case. I think the spirit of The Law should be clear and kept alive. Whether to write “[If applied, this commit will] <commit msg here>” or “[This commit] <verb> <work description>” is a tiny difference when the point is clear, straightforward, complete commit msgs for logically discrete-and-coherent commits. Both of the above formats would fit the bill, and I’d be thrilled if my biggest issue with code commits amongst my team were only these two slightly different formats.
The other thing I do is add a "MODULE:" prefix to my commit message. It makes them way more readable. Lots of other people do this too; you can see this in Redis, Linux, and Go codebases, for example.
So your message might look like "router: add support for path vars". Much easier to read than "Add support for path vars to router"
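Because the prefix has a fixed shape, tools can recover it, for instance to group a log by area of the code. A sketch, assuming the `module: summary` form shown above; the regular expression's allowed characters are a guess, and the function names are hypothetical:

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# "<module>: <summary>", where the module looks like "router" or "net/http".
# The character class is an assumption; adjust it to your project's layout.
PREFIX_RE = re.compile(r"^(?P<module>[A-Za-z0-9_./-]+): (?P<summary>.+)$")


def split_module_prefix(subject: str) -> Tuple[Optional[str], str]:
    """Return (module, summary); module is None when there is no prefix."""
    match = PREFIX_RE.match(subject)
    if match is None:
        return None, subject
    return match.group("module"), match.group("summary")


def group_by_module(subjects: List[str]) -> Dict[Optional[str], List[str]]:
    """Group commit subjects by their module prefix, keeping log order."""
    groups: Dict[Optional[str], List[str]] = defaultdict(list)
    for subject in subjects:
        module, summary = split_module_prefix(subject)
        groups[module].append(summary)
    return dict(groups)
```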
I don't stick to it as a rule, but if it's clearly grouped in some way, it gives people in the wider org a high-level FYI about the area of code you're messing with.
It also makes it easy to skip over the commit when bisecting for regressions or hastily looking for the buggy commit to revert.
I also use something like this: a variation of Angular's commit convention[1]. There are tools like commitizen[2] that can help new adopters build commit messages.
I work at a place that used to prescribe this. I dislike it because it adds too much repetitive noise to the abridged git log. (It occurs to me that you may be making this point in a tongue-in-cheek way.)
Originally I suggested we put the JIRA number/link in the commit body, but then I learned about git-notes to add metadata to commits and now I kinda want to do this with the semantic labels suggested by the thread OP, too (I currently use the schema suggested at https://seesparkbox.com/foundry/semantic_commit_messages).
Notes kinda bloat the repository with additional objects, and they can be removed independently of the commits, so I'm a bit iffy on using them for this.
The branch is called feature/<Ticket ID>-<short-description>
The commits contain the usual commit messages. When merging, changes are squashed. The final commit message is similar to the branch name and includes the ticket and a short description, which is similar to the mentioned pattern (bump/add/change/whatever).
I find that awful, because now half your subject line is burned by completely useless information: the JIRA ticket number doesn't provide any useful information when reading a git log / summary.
Yeah, and if you no longer use JIRA, it's not useful. Whatever bug tracking software you use should allow commits to be attached to bugs. But that should stay in the bug tracking software, not the VCS.
I see pretty consistent vocab used across orgs anyway, so given there is shared domain knowledge / language at play, I'm not sure what the goal of standardization is.
Don't take this the wrong way, I use a lot of these when appropriate, but I don't think I could agree that refactoring must be just a refactor, and I don't think I want to limit anyone's commits to this list of change types either.
Forcing changes into these words means a lot of stuff is pointlessly bucketed when more appropriate wording could be used.
Say I want to push a commit "prioritise shipping route a over route b".
So I have to put it under start? Optimise? Fix? Why? Prioritise is the right word, why not just use that? Why play mind games when we have a whole dictionary to draw from?
Mercurial inherited this style of commit messages from Linux email-based code review (the Mercurial originator was a kernel hacker), because in that workflow your commit messages are kind of a persuasive essay for why your commit should be accepted. I believe that writing commit messages with that kind of goal in mind, thinking “why should you take this commit?” is a good motivator for writing something good and useful.
I think my favorite (in terms of humor) is a commit from mpv complaining about locales and encodings. You can practically feel the committer's sheer frustration.
> All in all, I believe this proves that software developers as a whole and as a culture produce worse results than drug addicted butt fucked monkeys randomly hacking on typewriters while inhaling the fumes of a radioactive dumpster fire fueled by chinese platsic toys for children and Elton John/Justin Bieber crossover CDs for all eternity.
It's where the commit message comes in the process that makes them noteworthy. They're the final chance for a developer to vent their spleen or blow their trumpet before signing off on the results of possibly days of work.
Back when I was working on a Japanese outsourced project, the code wouldn't compile unless the computer's locale was set to Japanese, because the C code had comments in Shift JIS.
My favorite (in terms of dark humor, if we’re honest) is YOLO, one of the more interesting deep learning object detectors. [1] It is the exact opposite of yours in every way. The code is brilliant however.
> But maybe a better question is: “What are we going to do with these detectors now that we have them?” A lot of the people doing this research are at Google and Facebook. I guess at least we know the technology is in good hands and definitely won’t be used to harvest your personal information and sell it to.... wait, you’re saying that’s exactly what it will be used for?? Oh.
> Well the other people heavily funding vision research are the military and they’ve never done anything horrible like killing lots of people with new technology oh wait.....
> The author is funded by the Office of Naval Research and Google.
Yeah, my own resume isn't (or wasn't; it has been years since I updated it) that flair-y, but it does have a sidebar with screenshots of stuff I worked on (with my face doing a weird look at the top) and is mostly in prose style, with bold text for each of the projects I worked on (bold helps when scanning for stuff, at least that was the idea). I've been told a few times (by friends) that this isn't how resumes are supposed to look, but my line of thinking is that if you are so stuck up that you want a specific format for a resume, then you're most likely too stuck up for me to want to work with you :-P.
I was reading it and kind of interested, and thought this line was funny:
> I have a lot of hope that most of the people using computer vision are just doing happy, good stuff with it, like
[...] tracking their cat as it wanders around their house
And then suddenly realized I've wanted to do exactly that for a long time. Well... specifically install a camera that can detect my cat on the counter (and not my hands doing stuff) and sound an alarm/puff air to get him off.
Could this work for that? I think it could! I know my winter project...
On the topic of cats vs automation, I'd recommend reading this post [1] describing the arms race created by the writer's cat attempting to break into an automated feeding machine. HN discussion here [2]
We had an automated feeder with two chambers, each held down by a rotating, ticking timer switch. You rotate the switch to, say, 12 hours in the future, and 12 hours later the slot lines up and the lid pops open.
After many months, my cat learned to stand on top of the lid and use both paws to rotate the switch forward until the lid popped open.
This blew my mind. The switch was separated from the lid. I could imagine the cat attacking the lid itself, but this separate mechanism, requiring a motion completely distinct from the motion of an opening lid, and requiring patience without instantaneous reward or even evidence that it was going to work.... I just couldn't believe it.
There are few tech stories I enjoy more than the back-and-forth of breaking and improving a thing. A story where one side of the conflict is a cat means this might be my new favorite!
I've had good luck using low tech puzzle feeders with my cats. Each one is a bowl with a maze inside or a tower with various windows and trap doors they need to work the food out of to be able to eat. As the food levels get lower the pieces get harder to reach, which allows their laziness to take over and stop them from eating too much.
I love how he just randomly carries on his own internal dialog in his writing as if he really just doesn't give a fuck who's reading it. Totally brilliant!
I love seeing a section like this when reviewing a paper. I really wish more authors would include one. (Goodness knows I've chased down enough dead-ends in some of my own research efforts.)
Yeah, it's fun, but, seriously, the git log is messy AF. I wouldn't appreciate it one bit if somebody did that to a project I'm involved in.
What's funny, though, is that the paper (written in pretty much the same "fuck you" manner) is much more readable and informative than average. Which says a lot about the science papers out there.
Save a file in Notepad. Open it in vi. See that it is different. Find data in the database, with no clue what the weird characters were originally supposed to be. And so on and so forth.
I once wrote a reasonable program and sent it as a bug report to the maintainer of the Perl module DBD::File. He sent it as a bug report to BerkeleyDB. They said they never thought about it but yes, that would be silent data corruption with no way to recover. The program? Maintain an address book in a BTree sorted in the current locale. Enter names. Change locale and insert something else. Voila! Lost data with no way to recover!
I had to write the following just this year because SQL Server still defaults to using CP 1252 for text. The culprit? One of those damned stylized quotes that Office loves to insert for you. The code:
Linux is not entirely UTF-8, though plenty of people treat it as if it is. Even on Linux, you might need to consume files from other OSes or from other Linux systems with different locales. I once had issues with systems configured with the "C" locale vs. "en-US": they should have been near identical, but there were enough slight differences to cause failures. It's been 10+ years, so I don't remember the details.
Windows, the OS, is UTF-16 (or UCS-2; I forget the difference between the two), while SQL Server has, for historical reasons, defaulted to CP1252, probably for compatibility with Office components.
But it's not really a Windows problem per se, because you have to deal with this issue whenever you handle data originating from numerous Windows apps, even on Linux. Yes, you can insert a byte order mark (BOM) to indicate UTF-8, but most tools expecting UTF-8 don't actually check for the BOM, and they blow up in interesting ways if it is present. I've seen this far too many times. Enough that any time I see an encoding error from the likes of Python or Ruby, I recognize it instantly (I do a lot of ETL work with files from a number of vendors, so I see a lot of different file "types").
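Both failure modes described here are easy to reproduce with Python's built-in codecs. This is an illustration, not the SQL Server workaround mentioned earlier (that code isn't shown): it takes U+2019, the curly apostrophe Office likes to insert, plus a UTF-8 BOM, and shows what happens when the wrong codec reads them.

```python
# U+2019 RIGHT SINGLE QUOTATION MARK, the curly apostrophe Office inserts.
smart_quote = "\u2019"

utf8_bytes = smart_quote.encode("utf-8")     # three bytes: E2 80 99
cp1252_bytes = smart_quote.encode("cp1252")  # one byte: 92

# CP1252 bytes read as UTF-8: 0x92 is not a valid start byte, so decoding fails.
try:
    cp1252_bytes.decode("utf-8")
except UnicodeDecodeError as err:
    print("utf-8 decode failed:", err.reason)  # invalid start byte

# UTF-8 bytes read as CP1252: no error at all, just mojibake ("â€™").
mojibake = utf8_bytes.decode("cp1252")
print(ascii(mojibake))  # '\xe2\u20ac\u2122'

# A UTF-8 BOM survives a plain "utf-8" decode as U+FEFF glued to the first
# field (e.g. a CSV header); the "utf-8-sig" codec strips it.
with_bom = "\ufeffid,name\n".encode("utf-8")
print(repr(with_bom.decode("utf-8")))      # '\ufeffid,name\n'
print(repr(with_bom.decode("utf-8-sig")))  # 'id,name\n'
```

The asymmetry is the nasty part: one direction fails loudly, the other silently corrupts the text.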
This needs to be REQUIRED READING at the Open Group and the ISO C standards committees.
I'll quibble just a bit and say that:
a) the C locale should be a UTF-8 locale that tolerates invalid sequences (because the C locale historically is a just-use-8 locale),
b) even with new functions that take a locale handle, we need functions that use a global one, however that global one should be set once and NEVER changed in the life of the process, and it should be set either before main() starts, or before main() does anything that needs a locale, or starts any threads.
Really? I would say 90% of the commit message is technical, and the rest just emphasizes his frustration, which is pretty much justified. Yes, he is a person who seemingly gets pissed off by bullshit design decisions and whatnot. So?
I disagree. Words are just words and we give them meaning. Being derogatory and unkind to mentally deficient folks is ethically wrong. Using that word in a different context to communicate frustration is, IMO, fine.
Try telling that to my boss, who has an autistic child. I held your opinion until I stuffed my foot in my mouth in a meeting. Now I don't use that word.
Wait until a derogatory word being used affects you directly. It's really easy to just not use words flippantly that you know some people have an issue with. You are clearly aware of how it can be offensive, and for a lot of people your definition is still just a callback to people being unkind about mental illness. History matters too, and if you choose to ignore that and use it anyway, you're just being unnecessarily inconsiderate.
That is... wonderful. I've spent some time dealing with locales in C and other places that depend on the things being discussed in the commit. Just reading it brings back some of the rage I felt.
I do disagree with the assertion that it takes a lot of code to convert between the various UTF variants, 3 pages is an overestimate. https://stackoverflow.com/a/148766/5987
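For a sense of scale, here is a hand-written sketch of the core of UTF-16 to UTF-8 conversion: combine surrogate pairs into code points, then emit UTF-8 bytes per RFC 3629. It is an illustration, not the code behind the linked answer, and it rejects unpaired surrogates rather than trying to repair them:

```python
from typing import List


def utf16_to_code_points(units: List[int]) -> List[int]:
    """Combine 16-bit UTF-16 code units into Unicode code points."""
    points = []
    i = 0
    while i < len(units):
        unit = units[i]
        if 0xD800 <= unit <= 0xDBFF:  # high surrogate: a low one must follow
            if i + 1 >= len(units) or not 0xDC00 <= units[i + 1] <= 0xDFFF:
                raise ValueError("unpaired high surrogate")
            points.append(0x10000 + ((unit - 0xD800) << 10) + (units[i + 1] - 0xDC00))
            i += 2
        elif 0xDC00 <= unit <= 0xDFFF:
            raise ValueError("unpaired low surrogate")
        else:
            points.append(unit)
            i += 1
    return points


def utf8_encode(cp: int) -> bytes:
    """Encode one valid code point (never a surrogate) as 1 to 4 UTF-8 bytes."""
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])


def utf16_to_utf8(units: List[int]) -> bytes:
    return b"".join(utf8_encode(cp) for cp in utf16_to_code_points(units))
```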
For anybody wondering, the likely origin of the invalid character is somebody using an Apple Ireland/UK keyboard layout where # is Option-3 (AltGr-3), and non-breaking space is Option-Space (AltGr-Space).
I recently added a Rake task to one of our builds which checks for the exact problem mentioned in the GP, after three separate occasions in the last 6 months where OS X "smart" characters changed the encoding of a file consumed by things expecting pure ASCII.
Unfortunately it is a bit of a hack that shells out to "file -i", but I'll take it over hours of frustration.
I don't know how many times these non-breaking spaces have caused problems. I think linters should prevent commits that contain non-breaking spaces. And if one is really needed, it should be encoded as `&nbsp;` or in whatever encoding is relevant.
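A check like this doesn't have to shell out to `file -i`. A minimal sketch in Python, assuming the checked files are meant to be plain ASCII; the function names are hypothetical, and a pre-commit hook would call `sys.exit(main(sys.argv[1:]))` with the staged file paths:

```python
import unicodedata
from typing import List, Tuple


def find_non_ascii(text: str) -> List[Tuple[int, int, str]]:
    """Return (line, column, character name) for every non-ASCII character."""
    hits = []
    # split("\n") rather than splitlines(): splitlines() also breaks on
    # U+0085 and U+2028, which would hide the very characters we report.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 0x7F:
                hits.append((line_no, col, unicodedata.name(ch, f"U+{ord(ch):04X}")))
    return hits


def main(paths: List[str]) -> int:
    """Print one line per offending character; return 1 if any were found."""
    status = 0
    for path in paths:
        # errors="replace" turns undecodable bytes into U+FFFD, which is then
        # reported as REPLACEMENT CHARACTER instead of crashing the check.
        with open(path, encoding="utf-8", errors="replace") as handle:
            text = handle.read()
        for line_no, col, name in find_non_ascii(text):
            print(f"{path}:{line_no}:{col}: {name}")
            status = 1
    return status
```

A stray non-breaking space then shows up as something like `README.md:2:4: NO-BREAK SPACE`, which is far easier to act on than a downstream "invalid byte sequence" error.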
…or fix the non-Unicode compatible systems that are consuming the commit messages and breaking? If they fail with an nbsp then they’re probably also going to fail with more obviously useful non-ASCII characters.
How often are non-breaking spaces purposely inserted vs. accidentally? The tools might handle them, but they can produce strange results or errors. An example is inserting a non-breaking space in a document or string: it will prevent word wrap, which might not have been the user's intent. A linter that requires these spaces to be explicitly written in encoded form would avoid these issues.
Non-breaking spaces are required to typeset French correctly, at least: https://en.wikipedia.org/wiki/Question_mark#Stylistic_varian.... The accidental insertion of non-breaking spaces is a possible issue, and it's a bit harder to detect than other typos, but it's also probably not as bad as other typos. Overall I think it's a bit hard to make the case that they should be disallowed.
I was forced to gain very intimate knowledge of a web based rich text editor that would use non-breaking space characters as markers to monitor current user selection.
My commits are usually short and sweet - to the point.
I document my code very well, however.
One of my strengths in a previous life as a Master Automobile Technician was the ability to document the entire process -- from duplication of a concern, to troubleshooting, to correction, to verification... it's literally how I got paid. (I never understood why so many automotive techs took shortcuts while documenting, especially for warranty concerns, where you deserve to be paid by the manufacturer for everything you did that was necessary to fix the concern the first time.)
I could be mistaken (life-long coder, former network engineer / architect for nearly a decade, but I'm currently in my first-ever role as an actual backend developer). I think I was told to keep commits to one line unless absolutely necessary. I'll have to bring this up though. I like the idea of searching through the git logs for specifics, as opposed to having outlook search through the git commits for actual pieces of code changed, or error codes which might not actually be there.
In either event, I'd want at least more descriptive, yet still short, messages, such as the one strictfp suggested: "Replace invalid ASCII char. Fixes rake error 'invalid byte sequence in US-ASCII'". That said, I really like having some reasoning and logic (the how and why) in those easily searchable logs as well.
I think some people remember everything and some people don't (although we're all probably roughly similar logically). I have a hard time remembering what I ate for lunch yesterday - that's why I count on good documentation to function as efficiently as possible in the future. I've worked with people who have an absolute uncanny ability to remember 'stuff'. That's impressive, but I do not have that ability myself.
> I think I was told to keep commits to one line unless absolutely necessary.
The advice I've heard: Your first line should be a concise summary of the commit. This is because a lot of UIs only show the first line up front. (GitHub, git log --pretty=oneline, etc.) However, it's okay (and often encouraged) to go into further detail on subsequent lines.
Commit messages should describe why you're doing something ("X asked", or "[reams of supporting evidence why this needed to be made faster but more confusing]"). It provides context to current reviewers, and future archeologists who wonder what you were drinking at the time. Perhaps you had a good reason for doing [insane thing X]! Perhaps you didn't. If you didn't write it down, they might change or leave it, and break something or prevent something from getting a proper fix.
Code comments should be notes to code-readers that are relevant at all times until changed or deleted. "How to use this", "beware changing X", "Z is hot garbage and should be replaced if used for Q". Ideally you'll have asserts or tests or something that actually enforce this, but of course that's not always a realistic option. Comments in code will follow the code around, and don't require chasing code history through N layers of refactoring and indentation-wars, which is what makes commit messages mostly inappropriate for needs like this.
I do feel like Git commit descriptions are severely underutilised, for sure, but I believe there is a reason for that which, until fixed, will prevent rich and contentful commit descriptions from flourishing.
Taking the article in order: the screenshot is from a commit detail page. How often do you land on that page? You need to specifically click through. In a commit list, the only thing that sets title-only commits apart from commits with a description is an ellipsis link, which practically blends in with the background. It is neither well integrated nor discoverable. Also, I don't believe commit descriptions render as Markdown (unlike issues), which is a shame, as it makes them feel less like documentation. But I might be wrong on this. Even outside of GitHub, how many other UIs, IDE plugins, and other Git tools restrict commit display to just the title and put the description on the sideline? Most of them. I think this further explains why commit messages currently have so little value as something to search: since good commit messages are exceedingly rare, no one thinks to search them. People default to Google, when their own project's codebase/knowledge base could hold the answer to their query. I don't have much of an opinion on the commit message telling a story / having a human touch. I mean, it doesn't hurt, I guess, but until _full_ commit messages become more "mainstream" (for lack of a better word), they can be as human as they like, but they will live in solitude.
In the same vein, I wish there would be a standard workflow for putting code inside commit logs.
The typical use case would be database migration scripts: IMO they are always a pain to version properly, because fundamentally Git and all the other software versioning tools let you describe the "nodes" (in the graph theory sense of the word) of a codebase ("at commit A the code was in this state, at commit B it was in the other state"), but are severely lacking when it comes to describing the edges between nodes ("in order to go from state A to state B, you need to execute this code").
I think the temporal dimension of software engineering is still poorly understood, and severely undertooled.
Strongly agree. I run into this a lot with stuff like style commits. You want to ensure that a change hasn't slipped under the radar.
I tend to go for something like:
This commit was generated.
<shell script here>
It would be super awesome to have a tool that easily verifies that A->B can actually be reproducibly accomplished by performing the actions in the commit message.
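A rough sketch of such a verifier, assuming the convention just described: the message contains the line `This commit was generated.` and everything after it is a shell script to run from the parent commit. The helper names are hypothetical; the Git commands used (`log --format=%B`, `worktree add --detach`, `diff --cached --quiet`, `worktree remove`) are standard ones. Note that it executes whatever script the message contains, so it is only suitable for trusted history:

```python
import os
import subprocess
import tempfile
from typing import Optional

MARKER = "This commit was generated."  # convention from the comment above


def extract_generation_script(message: str) -> Optional[str]:
    """Return the script that follows the marker line, or None if absent."""
    lines = message.splitlines()
    for index, line in enumerate(lines):
        if line.strip() == MARKER:
            script = "\n".join(lines[index + 1:]).strip()
            return script or None
    return None


def git(*args: str) -> str:
    return subprocess.run(["git", *args], check=True,
                          capture_output=True, text=True).stdout


def verify_generated_commit(repo: str, commit: str) -> bool:
    """Re-run the commit's script on its parent and compare the resulting tree.

    Caution: this executes arbitrary code taken from the commit message.
    """
    script = extract_generation_script(git("-C", repo, "log", "-1", "--format=%B", commit))
    if script is None:
        return False
    with tempfile.TemporaryDirectory() as tmp:
        worktree = os.path.join(tmp, "worktree")
        git("-C", repo, "worktree", "add", "--detach", worktree, f"{commit}^")
        try:
            subprocess.run(["sh", "-c", script], cwd=worktree, check=True)
            git("-C", worktree, "add", "-A")
            # Exit status 0 means the staged result matches the commit exactly.
            same = subprocess.run(["git", "-C", worktree, "diff", "--cached",
                                   "--quiet", commit])
            return same.returncode == 0
        finally:
            git("-C", repo, "worktree", "remove", "--force", worktree)
```

Working in a throwaway worktree keeps the check away from the developer's checkout; a CI job could run it on every commit whose message carries the marker.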
Sometimes no. For instance a database migration script is not equivalent to the diff between two database creation scripts.
While it should be possible to take a database creation script (state A) and a database migration script (edge A->B) and infer the new resulting database creation script (state B), the reverse is not true.
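A toy model makes the asymmetry concrete. Treat a schema as a mapping from column names to types (a deliberate simplification; real migrations also carry data, constraints, and ordering). Applying a rename migration to state A yields state B, but diffing A against B can only recover a drop plus an add, which would destroy the column's data. The names here are illustrative only:

```python
from typing import Dict, List

Schema = Dict[str, str]  # column name -> column type


def apply_rename(schema: Schema, old: str, new: str) -> Schema:
    """The edge A -> B: rename a column, which keeps its data."""
    result = dict(schema)
    result[new] = result.pop(old)
    return result


def diff_states(before: Schema, after: Schema) -> List[str]:
    """All a naive diff of two creation scripts can infer (type changes ignored)."""
    ops = [f"DROP COLUMN {name}" for name in before if name not in after]
    ops += [f"ADD COLUMN {name} {kind}" for name, kind in after.items() if name not in before]
    return ops


state_a: Schema = {"id": "INTEGER", "name": "TEXT"}
state_b = apply_rename(state_a, "name", "full_name")
print(diff_states(state_a, state_b))
# ['DROP COLUMN name', 'ADD COLUMN full_name TEXT']
```

The rename intent lives only in the edge; once you keep just the two states, it is gone.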
How do DB migrations relate to the source control graph?
It seems you're implying the codebase changed as a result of a script that itself is not source controlled. I can think of style commits falling in this category, like one of the children of this comment mentions, but DB migrations don't seem to be related.
Where they're really valuable, IMO, is when you're tracking down when/why a change was made with `git blame`. When you're looking for the reasoning behind a change, it's extremely helpful if there's a detailed commit message going along with it.
If only important/tricky commits have long messages, it's fine. Most IDEs will show you the full text if you hover over the line, so when working on something it's easy enough to check them. I'm a huge fan of the Time Machine-like "Show Git History for Selection" view that IntelliJ has, so usually there's no reason to go to a GitHub/GitLab commit page.
> I don't want your entire life story in my commit log.
I[1] want enough debug information in the commit log to be able to reproduce the issue without having to go on web hunts to understand the problem. Especially when the change appears to be trivial on the surface, because these are the ones that can turn out to be rabbit holes.
I don't want to have to interrupt you to get this information because you didn't write a good enough commit message, and you probably don't remember anyway. I don't want to go look at an external issue tracker that I may not have access to, or that may not even exist anymore.
[1] Where "I" is: me, your future self, a future maintainer, a junior dev, an open source contributor.
To me, at least, the issue with that commit message is the signal-to-noise ratio.
There is a lot of exposition for each piece of information. I prefer a more declarative commit message.
However, from the writing, I suspect this is just due to the committer not being a native English speaker.
e.g. the first paragraph doesn't lose any important information trimming it down to:
"After adding a test matching the contents of router_routes.conf,
`bundle exec rake` fails with:
ArgumentError:
invalid byte sequence in US-ASCII
"
Realistically, it would have been a better commit message if they'd given the shortlog SHA where the test was added that exposed the bug rather than an explanation of what the test does.
"After adding test <testname> (08c3e17), `bundle exec rake` fails with:"
Hmmm, I've always believed that no commit should break a build, even if you're committing the fix right after. Otherwise you're going to cause problems for `git bisect` or other practices of going through the history to find where a problem may have started.
Do other people commit breaking tests and then fixes?
I think this depends quite a bit on what other contributors are doing - it's one of those cases where several approaches are acceptable, but inconsistency is not.
"Commits should always build" is one doctrine I've seen. As you say, it makes bisecting and other error-analysis approaches easy. On the other hand, it risks either having large, opaque commits, or adding overhead to make intermediate commits build - possibly with flawed/meaningless behavior when they do.
Another is "the trunk should always build". In that case, you'd just squash branch commits down to logical groupings that are easy to analyze, whether or not things build. You can bisect on the trunk, but lose all guarantees about state on branches.
Finally, I've seen variations on "no commits that break the product", "no commits that make things worse", or "no committing failing tests without subsequent fixes". In this case, you can't generally commit broken builds, but can specifically add failing tests. The first rule just means "adding failing tests is ok", the second means "converting runtime bugs to failing tests is ok", and the third means "write your test and fix, but split (and ideally tag) the commits". All of these break bisect, but they guarantee the project itself won't become more broken from commit to commit, and they can help with other forms of reasoning about where bugs first occurred.
Every approach there seems viable if you stick to it. If there's no established practice, I suppose the best choice would be based on what sort of work and debugging is most likely to apply.
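To see why broken intermediate commits hurt bisecting, here is a simplified model of the binary search `git bisect` performs, including the "skip" verdict Git offers for untestable commits (`git bisect skip`, or exit code 125 under `git bisect run`). It assumes a linear history where every commit after the first bad one is also bad; it is a sketch, not Git's actual algorithm:

```python
from typing import Callable, List, Optional, Set

GOOD, BAD, SKIP = "good", "bad", "skip"


def bisect(commits: List[str], test: Callable[[str], str]) -> Optional[str]:
    """Return the first bad commit, or None if skipped commits hide it.

    Assumes commits[0] is good, commits[-1] is bad, and that every commit
    after the first bad one is also bad. `test` returns GOOD, BAD or SKIP
    (SKIP: the commit cannot be tested, e.g. because it does not build).
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    skipped: Set[int] = set()
    while hi - lo > 1:
        candidates = [i for i in range(lo + 1, hi) if i not in skipped]
        if not candidates:
            return None  # the first bad commit is a skipped one or commits[hi]
        mid = candidates[len(candidates) // 2]
        verdict = test(commits[mid])
        if verdict == SKIP:
            skipped.add(mid)
        elif verdict == GOOD:
            lo = mid
        else:
            hi = mid
    return commits[hi]


history = [f"c{i}" for i in range(6)]  # regression introduced in c3


def clean(commit: str) -> str:
    return BAD if int(commit[1:]) >= 3 else GOOD


def with_broken_build(commit: str) -> str:
    # Same history, but c2 does not build, so it cannot be tested.
    return SKIP if commit == "c2" else clean(commit)


print(bisect(history, clean))              # c3
print(bisect(history, with_broken_build))  # None: could be c2 or c3
```

One unbuildable commit next to the regression is enough to turn a precise answer into "one of these", which is the cost the "commits should always build" doctrine avoids.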
I might be missing something basic here. Isn't the "no commit should break a build" impossible to enforce on a codebase where you need to push a commit to run the tests?
Something where you can't test locally, like when testing on multiple architectures or when the tests just take too long for a laptop.
In my workflow, that would be in a feature branch, and exploratory branches can certainly break, but before I made a PR I would rebase my changes such that none of the commits broke the build.
It's also a different situation. The original one is "I've made a test that shows a problem." Your example is a surprise "I don't know whether this will pass my cloud-based tests." I would edit my branch if I had a surprise failure, since my initial code clearly wasn't correct.
I do this, but not in a way that ends up on master. It's a driving force behind my preference for squash-and-rebase merge patterns.
A good bugfix PR is often two commits then: one with a test to catch the breakage, another to fix it so the tests pass. Reviewers can see the failing-then-passing CI job logs, so if they agree your test catches the bug, they have additional CI-automated validation your fix worked.
Then as long as you squash when completing the merge, you get the best of both worlds.
It goes without saying that commits should include tests that cover the change, where possible.
> immutably
That's what makes this modus operandi so powerful IMO - comments in code may go unmaintained, tests may start failing for other reasons, issue trackers come and go, developers leave the company, documentation rots.
The commit message is (unless you have a bad actor) immutably linked to the original change, and that's exactly why you should be thorough in expressing its reason for being. I can git checkout the point in time (perhaps having bisected) and have the information to allow me to reproduce the issue.
> Especially when the change appears to be trivial on the surface
Comments about the code should be in the code, where the next dev will see them. The more trivial a change that has far-reaching implications, the more important this is.
Doing so has heaps of benefits: future devs understand the ramifications, it shows that this code has been scrutinized, and it makes refactoring/yanking or porting code easier.
That said, leaving the life story out will always be a good idea.
Pagure [0], Fedora's git forge, hosts code, issues, docs, and pull requests as four separate git repositories under the hood [1]. However, only project administrators can clone most of those repos.
I can imagine that working to a degree: Make a fork of a commit at an issue, then merge that fork back in with master at point of fix. Bit of a mess in the tree though.
I don't want to read what the code does (I can read that myself, thanks!), I want to know WHY it does it the way it does it - especially, if there is a more obvious, better way.
Also: People leave companies. Or die. At some point in time, you won't be able to ask the original author.
No thanks. If I had a dime for every function that's "so obviously self-documenting" about what it does... etc. If you tell others what it does and why, concisely and thoughtfully, no one has to try to mentally parse the "what" of some clever, undescriptive block of code.
On the other end of the spectrum you get ImageMagick useless commit messages[0].
That extreme aside, I'd rather have commit messages that delve into why and how the commit alters the behavior for the better than a cryptic message such as 'Replace invalid ASCII char'. That way we have documented reasoning and a thought process that can aid future debugging. They can also benefit new devs hacking on the project, or students learning how to implement and improve systems.
Personally, I enjoy reading these. The Go commits often have commit messages like these, and they are shared on HN often for a reason. They're learning material. They can't go on a wiki because they're tied to particular set of changes in a particular point in history. They also can't be comments on the code because they're tied to particular lines in different files, and code comments can only cover a set of consecutive lines in one file.
One recent example I could find is this[1]. Yeah, it fixes ^Z, but why didn't the old approach work? Why did it work for some time then didn't? How did it change? Why is this commit optimal, if it is? All of this along with scenarios to reproduce the issue.
Give me your life story over a cryptic message any time.
Agreed. If the website they point to changes at some point in the future, all context on why a change was made will be lost.
I believe in the "plane flying across the ocean without WiFi" test, or basically anywhere without Internet access: if I am on a plane flying across the ocean without WiFi, does the git commit contain the information I need to understand what happened? A git message that consists entirely of a link to a website is useless in that case.
You can write the brief summary in the first 80 characters, like OP did. Then write details in the body below, in case someone needs the context. Most tools display only the first 80 characters unless you expand the body.
This case is probably longer than necessary, but I've saved a day of debugging on multiple occasions thanks to someone (sometimes myself) leaving a few lines of context, reasons, and reasoning after the high-level description.
> I don't want your entire life story in my commit log.
Why not? Where else do you want it? Is something forcing you to read the full commit log?
There's no length limit on commit messages, and commit messages are mostly out of the way. Most VCSes have a way to show you only the first line. So if you want summaries, that's what the first line is for. If you want the full story, that's what the body is for.
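The subject/body split is simple enough to check mechanically. Below is a minimal Python sketch, assuming the convention from this thread (a subject on the first line, then a blank line, then an optional body) and the 80-character limit mentioned above; `split_commit_message` and `check_commit_message` are hypothetical helpers, not part of git. The `subject` they extract is roughly what one-line views such as `git log --oneline` display.

```python
def split_commit_message(message: str) -> tuple[str, str]:
    """Split a commit message into (subject, body).

    Assumes the convention discussed here: the first line is the subject,
    optionally followed by a blank line and a free-form body.
    """
    lines = message.strip("\n").splitlines()
    if not lines:
        return "", ""
    subject = lines[0].strip()
    body = "\n".join(lines[1:]).strip("\n")
    return subject, body


def check_commit_message(message: str, max_subject: int = 80) -> list[str]:
    """Return convention violations; an empty list means the message passes.

    The 80-character default is taken from this thread (an assumption);
    many projects prefer 50 or 72 for the subject line.
    """
    lines = message.strip("\n").splitlines()
    if not lines or not lines[0].strip():
        return ["empty subject line"]
    problems = []
    if len(lines[0]) > max_subject:
        problems.append(f"subject is {len(lines[0])} chars (limit {max_subject})")
    if len(lines) > 1 and lines[1].strip():
        problems.append("missing blank line between subject and body")
    return problems


msg = (
    "Convert template to US-ASCII to fix error\n"
    "\n"
    "The $foo template contains a non-ASCII space.\n"
)
print(split_commit_message(msg)[0])  # the one-line summary
print(check_commit_message(msg))     # []
```

A hook or CI job could run a check like this on every message; whether to enforce it strictly is a team decision.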
Combined with annotate/blame, commit messages can be very helpful source-level documentation. Nobody has ever complained about too much documentation, and commit messages are the perfect time to document what happened because it's one of the few times where our tools actually force us to write something in order to proceed. As long as we're being forced to write something, write something good and informative.
I think the problem isn't the length or content of the commit message, but its organization. It needs to have the most important information first. It reads as an "entire life story" because it is written in a narrative, sequential form. Better organization would make it skimmable, so later coders could read only as far as they need to.
If I'm searching commits, I'm trying to find record of what changed and when. I only want clues, and quick skimming is paramount. I want no personality. I want concise descriptive commit messages.
That said, we reference an ID from our project management software with every commit, so once I find the commit I'm looking for, I can reference it back to external documentation. I still discourage personality there as well because it can get out of hand and clutter the comments, but it's more forgivable than being on the commit itself.
The pull request is a good place to put such a large amount of information. That would also be a good way to make sure it is seen by the broader team instead of burying it in commit history. You could make the argument that then it would not be part of the git history and therefore could be lost if you change hosts.
I never understood this philosophy. What makes Git more eternal than any other technology? Why is putting all of your data in one monolithic tool a good solution?
You might change your issue tracking solution. You might change your host solution. You might change your review platform. You might also change your VCS solution. Nothing is eternal.
The commit message itself is way more eternal than GitHub ephemera. There's plenty of old codebases in git that were imported from SVN (or even older RCSes) with all commits intact. What's likely not intact is data in ancient issue trackers from decades past. git is a DVCS, so anyone can clone the repo and get all the commit information. Cloning the issues and such is not nearly so trivial, and isn't a part of the git protocol itself so there's no guarantee it's in any kind of interchangeable format.
Important information should not just be in PR comments. It should be added into the commit information itself so that it'll be maximally available going forward. A good, fully explanatory commit message is a huge asset, and those commit messages will exist for the entire lifetime of the codebase. Anything else, not so much.
I didn't say anything about git. I said the commit messages are eternal, and none of those changes will change the commit messages (except, perhaps, changing the VCS, but usually that will preserve commit messages too).
You'd be foolish to do a VCS migration that discards the commit messages. I've never seen it happen personally, as people tend not to be that foolish. I've worked with legacy codebases that went from CVS -> SVN -> git and all of the commit messages going back to the very beginning are intact, because why would you ever do a migration that doesn't maintain them?
This is a really convincing argument. I was with the parent commenter until I read this; I was thinking, this is totally PR stuff! But I hadn't considered offline situations or host switches. Thanks, OP!
> I don't want your entire life story in my commit log.
I agree with this, but I think yours is too short.
Scientific papers typically introduce enough information such that a person familiar with the field but not an expert in that particular area can understand generally what's going on.
That's my ideal for a commit message as well: someone generally familiar with the codebase but who hasn't looked at this specific code (or perhaps not in a few months) should be able to understand what's going on; then the job of the reviewer is basically just verification.
My "template" is normally something like: 1) What's the current situation 2) Why that's a problem 3) How this patch fixes it. So in this case, it might look something like this:
---
Convert template to US-ASCII to fix error
$functions use `.with_content(//)` matchers to do X. These matchers require ASCII content. The $foo template contains a non-ASCII space; this results in the following error:
ArgumentError:
invalid byte sequence in US-ASCII
Fix this by replacing the non-ASCII space with an ASCII space.
---
No need for a life story, but still searchable, and has enough information for even a casual contributor to do a useful review.
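The fix that example message describes ("replacing the non-ASCII space with an ASCII space") can be sketched in a few lines of Python. This is a hypothetical helper, not the code from the original commit; it assumes the template is UTF-8 text and that every Unicode space separator should become a plain ASCII space.

```python
import unicodedata


def replace_unicode_spaces(text: str) -> str:
    """Replace every Unicode space separator (category "Zs") with an ASCII space.

    U+00A0 NO-BREAK SPACE is in this category, as are U+2000 to U+200A and
    others; all other characters, including legitimate non-ASCII letters,
    are left untouched.
    """
    return "".join(" " if unicodedata.category(ch) == "Zs" else ch for ch in text)


line = "name:\u00a0value"
fixed = replace_unicode_spaces(line)
print(fixed.isascii())  # True
```

To apply it to a file, read it with `Path.read_text(encoding="utf-8")`, pass the text through the function, and write it back the same way. Zero-width characters such as U+200B are not in "Zs" and would need separate handling.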
I, too, would rate this a substandard git comment.
Dave basically vomited up a bug ticket's worth of information, which is highly contextual and irrelevant: the lines he was faced with tell us nothing for the future, nor anything we could not see in the change. The error is already known from the ticket being addressed. Documenting in git which error a bundler throws during application deployment seems... silly, since it will likely not apply at all points in time. That's why we have separate issue tracking.
There was a whitespace encoding issue, AND the developer didn't really understand the issue, since they ended with "One hour of my life I won't get back.". Over my 20 years, I've seen this EXACT scenario multiple times across multiple companies: some junior engineer gets stuck on a troublesome, weird corner-case error that ends up being non-standard whitespace. It's a learning opportunity, but he lamented it because it was different and nobody told him, "We could stop this from happening again; file a new issue."
The commit would benefit from some salient improvements, both to the message and in additional code:
1. Link a (new) feature ticket to this issue, to create a process that doesn't allow this to happen again (e.g. fix a linter).
2. Include the name of the bug ticket being addressed (Convert template to US-ASCII to fix error) in the commit title.
3. Create a test that specifically enforces the US-ASCII encoding, or add the necessary rules to a linter.
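A test of the kind point 3 suggests could look like the following Python sketch. `find_non_ascii` and `check_files` are hypothetical names; the sketch assumes the checked files are UTF-8 and that any character above U+007F should fail the build.

```python
from pathlib import Path


def find_non_ascii(text: str) -> list[tuple[int, int, str]]:
    """Return (line, column, code point) for every non-ASCII character.

    Lines and columns are 1-based. Splitting only on newline characters
    (rather than str.splitlines) keeps separators such as U+2028 reportable.
    """
    hits = []
    for lineno, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                hits.append((lineno, col, f"U+{ord(ch):04X}"))
    return hits


def check_files(paths: list[Path]) -> list[str]:
    """Lint-style messages ("path:line:col: ...") for each non-ASCII character."""
    errors = []
    for path in paths:
        text = path.read_text(encoding="utf-8")
        for lineno, col, codepoint in find_non_ascii(text):
            errors.append(f"{path}:{lineno}:{col}: non-ASCII character {codepoint}")
    return errors


print(find_non_ascii("ok\na\u00a0b"))  # [(2, 2, 'U+00A0')]
```

In a test suite, `assert check_files(template_paths) == []` would fail with a precise location, where `template_paths` is a placeholder for whatever files your project ships.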
For critical applications, I for one would like to know the story behind a commit, preferably in the commit itself and not a reference to an external system like idk, Jira.
My favorite examples of commit messages are the Linux kernel, where you can tell that they're being specifically crafted instead of just used as a work log to be ignored. This means that ten years down the line, people can still see when a change was made and why, who was involved, who signed off on it, etc. Have a look at the commits at https://github.com/torvalds/linux/commits/master
This works when you can link the commit to an issue. Then, seeing the simple commit message, you can decide whether to dig into what happened by reading the comments on the issue.
On the other hand, it really gets on my nerves when people don't use the task/issue/whatever manager system appropriately. Recently, I lost a couple of days trying to figure out how to compile a c++ framework because the other guy didn't document his pipeline. In general, I'm really disappointed in the majority of my colleagues for their lack of comments inside and outside our codebase, and this has been a persistent issue at every company I've worked for. Along with other similarly irritated people, I always ask for documentation if it isn't given.
> Recently, I lost a couple of days trying to figure out how to compile a c++ framework because the other guy didn't document his pipeline
Some people do it for job security. The logic is that if you don't document things and the knowledge is only in your head, then you are more valuable and they can't get rid of you easily. If you document everything meticulously, then you are easier to replace.
> Some people do it for job security. The logic is ...
Has anyone actually seen this logic work out well for the person that invokes it? Generally the type of person that uses it is one that you probably don't want on your team.
In instances like that, though, my next step would be to start working on contingency plans: if someone has proven themselves to be indispensable, that very fact is a risk that needs to be managed.
> Recently, I lost a couple of days trying to figure out how to compile a c++ framework because the other guy didn't document his pipeline.
That assumes documenting the pipeline would have helped! You might instead have spent a few days figuring out why your seemingly identical setup couldn't reproduce the build...
Not that I'm bitter about build systems or anything.
Talented coworkers don't need documentation very often... If someone can't figure out how to compile something, it's likely they are missing knowledge about the language in general...
Git greps work if you're trying to search all the logs. I would think this detailed documentation would be most important when you're trying to understand a specific file or line of code. In that case, it's:
Generally speaking sure, there's no need to make things more complicated than they are, but the author even found some evidence in the history that indicates other people found this message useful.
The powerful thing about this is having everyone put this kind of info in the same place IF they think it might be useful to the next person.
Still, you've left out the detail that you confirmed there are no other instances of this in our codebase. I'm also firmly in the "all commit messages should include a test plan" camp, so you should at least say how you found the error ("bundle exec rake was run before and after").
I get you're being terse for demonstrative purposes, but even eschewing verbosity we should still convey all the pertinent information.
One downside to that approach: it requires your issue tracker to be stable for long periods of time. I've worked in a number of places where that's not true and you end up needing to figure out that the #123 linked by the system you're using now was actually #123 in the old system and was migrated as #456 in the current one.
There's a balance here and I especially like that this commit message has enough information to make searches really easy should you need to do something like that.
That's why most guidelines for commit messages prescribe a short description and an optional long description. The message in the article does not have a short description, which would have been easy to include. For that reason, it's not "My favorite Git commit message" either.
I agree with the sentiment; this message is quite long for an invalid character in a file.
House style at the companies I've worked for is to include a link to a bug report and/or code review that provides more context for those who want it. Even without that added context, I'd rather know
That's great for you. You don't want that. However, if you code in a team, doing everything for your own wants rather than considering the needs of the team (present and future) is just bad software engineering.
One of my favorite pranks is to put control characters in the commit message (like the bell) and then you get an auditory notification anytime anyone nearby opens your commit messages.
Also, if you haven't seen it before, read the Linux kernel Changelog. The latest Changelog can be found at [0]. Almost every commit tells a story, unless it's a trivial fix. If there's a bug, the commit often contains detailed analysis and rationale, and it's a form of important documentation.
It's not always practical to follow them in personal or work projects, though. Linux commits are the result of multiple rounds of review, and the commit log is their justification; in personal or work projects, commits are made in real time, as soon as you've debugged or refactored something. Still, I use the Linux kernel as a guideline for my own commit logs, at least for new features and bugfixes.
I was working with a research team at UCLA and we used Wolfram Mathematica to process our results.
In Mathematica, pretty much any object can be a variable name. You can drag a JPEG of Kim Jong-un into Mathematica and integrate an expression with respect to Kim Jong-un. We'd sometimes get a kick out of that.
Near the end of the program, our whole team needed to process the last 3 months of results, but our Mathematica notebooks consistently gave results that were off by a constant factor. Five hours later, someone discovered that one of the variables contained a stray Unicode whitespace or null character (or maybe a non-Unicode blank Mathematica object, such as a Graphics object with zero area) that someone must have accidentally inserted before saving the notebook and distributing it to the rest of the team. Since Mathematica didn't recognize it as spacing but as part of the variable name, making it a different variable, the results of our integrals were incorrect. E.g. the integral of x^2 is x^3/3, but the integral of xx' (where x' is the look-alike variable) is x^2x'/2, so the multiplier would be off by a factor of 3/2.
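Worked out explicitly: because x' is, to Mathematica, a different symbol from x, it is a constant with respect to x. Assuming x' ends up evaluated to the same value as x, the two antiderivatives differ only in their coefficients, which gives the factor of 3/2:

```latex
\int x^2 \, dx = \frac{x^3}{3}
\qquad\text{vs.}\qquad
\int x \, x' \, dx = \frac{x^2 \, x'}{2} \;\xrightarrow{\;x' = x\;}\; \frac{x^3}{2},
\qquad
\frac{1/2}{1/3} = \frac{3}{2}
```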
After discovering and selecting it, we "cut" it into the clipboard, pasted it into another Mathematica notebook, saved it, and it was never opened again.
This reminds me of a Markdown issue I've had many, many, many times: sometimes (and only in some engines), headings would not render and I'd only get '# foobar' instead of '<hx>...'.
It took me far too long to track down the issue. When I type '#' using alt-3, I then type a space, and often I don't release alt soon enough, so alt-space produces a non-breaking space (on macOS). And some/most Markdown engines don't recognise '#' followed by a non-breaking space as a heading.
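CommonMark requires the run of '#' characters that opens an ATX heading to be followed by spaces or tabs, or by the end of the line, and a no-break space (U+00A0) is neither. The Python sketch below encodes a simplified version of that rule (it ignores closing '#' sequences and other details of the spec, and real engines differ) to show why '#' plus a non-breaking space falls through as plain text:

```python
import re

# Simplified CommonMark ATX heading rule: up to 3 spaces of indentation,
# 1-6 '#' characters, then spaces/tabs or the end of the line.
# [ \t] matches only U+0020 and U+0009, never U+00A0 (NO-BREAK SPACE).
ATX_HEADING = re.compile(r"^ {0,3}(#{1,6})(?:[ \t]+(.*?))?[ \t]*$")


def parse_heading(line: str):
    """Return (level, text) if `line` is an ATX heading, else None.

    Closing '#' sequences and other details of the spec are ignored.
    """
    match = ATX_HEADING.match(line)
    if match is None:
        return None
    return len(match.group(1)), match.group(2) or ""


print(parse_heading("# foobar"))       # (1, 'foobar')
print(parse_heading("#\u00a0foobar"))  # None: rendered as a paragraph
```

An editor setting that highlights U+00A0 makes this kind of bug visible before any renderer sees it.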
I suspect something like this happened in the commit linked here.
This happened to me all the time, especially with Python 2 code that had no encoding declared (the file failed to parse because of the non-ASCII character in a comment).
I’ve since switched my editors to highlight such characters.
I love these commits. They don't have to be this verbose, but they have to tell the story of why things were done. I can sort of deduce the what from the code itself, but the why is sometimes shrouded in mystery.
I started with these explanatory git commits a few months ago and they are super useful, even if you're just reading your own commits from some time ago.
Putting this in the commit is not easily searchable, not universally accepted (and thus not expected), not practical, and it certainly can't easily involve discussion. It can be replaced with a ticket number, as most ticketing systems will actually read commit logs for those numbers in order to associate tickets with code.
Now, there is the problem of decoupling code and story, but that is a technical problem: nothing stops GitLab and friends from storing issues and the like in the repository itself.
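The ticket-number approach depends on tooling scanning commit messages for IDs. A minimal Python sketch of such a scan, assuming Jira-style keys like PROJ-123; the pattern, the `tickets_in` name, and the `project_keys` filter are illustrative, not any tracker's actual implementation:

```python
import re
from typing import Optional

# Assumed Jira-style key: uppercase project key, hyphen, number (e.g. PROJ-123).
TICKET_ID = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")


def tickets_in(message: str, project_keys: Optional[set[str]] = None) -> list[str]:
    """Return ticket IDs mentioned in a commit message, in first-seen order.

    Without `project_keys`, look-alikes such as "UTF-8" also match, so
    restricting the scan to known project keys is usually necessary.
    """
    found: list[str] = []
    for ticket in TICKET_ID.findall(message):
        key = ticket.split("-", 1)[0]
        if project_keys is not None and key not in project_keys:
            continue
        if ticket not in found:
            found.append(ticket)
    return found


print(tickets_in("Fix UTF-8 handling (PROJ-3)", {"PROJ"}))  # ['PROJ-3']
```

Note that the link only stays useful as long as the tracker keeps those IDs stable, which is the migration concern raised elsewhere in this thread.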
I use these commits, but also use a proper issue tracking system. So I'm not quite sure your comment applies. The reason I'm doing this is:
1. If I'm looking at some code, I want to see its history without having to switch between git(lab|hub) and jira or whatever system I'm using.
2. The issue tracking system doesn't necessarily have some kind of resolution, a summary of what happened and why. It does have a description and a series of comments, but a summary is usually lacking.
3. I believe my commit history will far outlive any issue tracking system I use. So I'm safer putting information into both.
To me it's beautiful, because it does what an issue tracking system does not do: it explains everything. Who, why, how, what, when. It is beautiful and simple documentation.
Issue tracking typically revolves around the who, what, and when, not why it happened or how it was resolved.
This is why I believe that code can never be fully self-documenting. I can't understand why the code exists from reading it. All the floofy contextual stuff is missing, and commits like this help to explain the floofiness.
I think you are not using the issue tracker properly.
You absolutely need, at a bare minimum, the why and the reporter of the issue.
When you do a blame you can easily see the issue id and open it with 1 click to understand why that code exists and who requested the change.