I use the following convention to start the subject of a commit (posted by someone in a similar HN thread):
Add = Create a capability e.g. feature, test, dependency.
Cut = Remove a capability e.g. feature, test, dependency.
Fix = Fix an issue e.g. bug, typo, accident, misstatement.
Bump = Increase the version of something e.g. dependency.
Make = Change the build process, or tooling, or infra.
Start = Begin doing something; e.g. create a feature flag.
Stop = End doing something; e.g. remove a feature flag.
Refactor = A code change that MUST be just a refactoring.
Reformat = Refactor of formatting, e.g. omit whitespace.
Optimize = Refactor of performance, e.g. speed up code.
Document = Refactor of documentation, e.g. help files.
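A minimal sketch of how this list could be checked mechanically, assuming the verb is always the first word of the subject. The names `SUBJECT_VERBS` and `leading_verb` are hypothetical, not part of any existing tool:

```python
from typing import Optional

# Verbs copied from the convention above; everything else here is illustrative.
SUBJECT_VERBS = {
    "Add", "Cut", "Fix", "Bump", "Make", "Start", "Stop",
    "Refactor", "Reformat", "Optimize", "Document",
}


def leading_verb(subject: str) -> Optional[str]:
    """Return the convention verb that starts the subject, or None."""
    words = subject.split()
    if words and words[0] in SUBJECT_VERBS:
        return words[0]
    return None
```

A subject such as "Fix typo in README" yields "Fix", while a subject that opens with any other word yields None.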
Yes. As is convention, commit messages should be a one-line header, then an empty line and a body (if necessary).
The whole thing should be width limited to 80 or 100 characters.
And the subject line should complete the sentence "If this commit is applied, it will...". It should start with a capital letter, then move to lowercase, and necessarily will start with a verb.
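Those rules are simple enough to check in a hook. A rough sketch, assuming an 80-character limit by default; `check_commit_message` is a made-up helper, and it does not try to verify that the first word is a verb:

```python
from typing import List


def check_commit_message(message: str, max_width: int = 80) -> List[str]:
    """Return a list of problems found in a commit message (empty if none)."""
    lines = message.splitlines()
    if not lines or not lines[0].strip():
        return ["missing subject line"]

    problems = []
    # The subject should start with a capital letter.
    if not lines[0][0].isupper():
        problems.append("subject should start with a capital letter")
    # A body, if present, must be separated from the subject by an empty line.
    if len(lines) > 1 and lines[1].strip():
        problems.append("second line should be empty")
    # The whole message is width limited.
    for number, line in enumerate(lines, start=1):
        if len(line) > max_width:
            problems.append(f"line {number} is longer than {max_width} characters")
    return problems
```

Passing `max_width=100` covers the looser variant of the width rule.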
At my work we almost encourage the "blog post in there" idea.
It's not a hard and fast rule and it's ok to ignore it when it makes sense. But we also don't mind if your commit message takes longer to write than the code took to change and debug.
A lot of context is assumed in commits, and almost all of it is temporal. Capturing as much of that as possible pays off down the line.
Commit in the article is a good example where the context explains much more than the change.
(On the flip side, the payoff has an expiry date so I'm not extremely fussed when people are lax, but it's still good to check in basic assumptions with your code.)
Well, I think if you start with this kind of bureaucracy, you're doing it wrong. It's not funny. Also, what if your app is down, clients are calling, and you need to fix it fast? If I need to remember these rules to fix this, people will kill me, or I will kill myself :p
After a few days practicing this, it becomes second nature, and doesn't hold you up at all. Commit histories also become much quicker to parse through, especially with "log --oneline".
I notice no hindrance to my speed from writing good commit messages, and I always value them when I come back to them.
> And the subject line should complete the sentence "If this commit is applied, it will...".
I kind of do it like this. Others use subjects that complete the sentence "This commit...", so their subjects will start with "adds", "fixes", etc. Though that adds one or two extra characters!
I think that’s opposing incorrect things, in this case. I think the spirit of The Law should be clear and kept alive. Whether to “[If applied, this commit will] <commit msg here>” or “[This commit] <verb> <work description>” is a tiny matter of difference when the point is clear, straightforward, complete commit msgs of logically discrete-and-coherent commits. Both the above formats would fit the bill, and I think I’d be thrilled if my biggest issue w code commits amongst my team were only these two slightly different formats.
The other thing I do is add a "MODULE:" prefix to my commit message. It makes them way more readable. Lots of other people do this too; you can see this in Redis, Linux, and Go codebases, for example.
So your message might look like "router: add support for path vars". Much easier to read than "Add support for path vars to router"
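A prefix like this is also easy for tooling to pick apart. A small sketch, assuming the prefix is a single word followed by ": "; `split_prefix` is a hypothetical helper:

```python
from typing import Optional, Tuple


def split_prefix(subject: str) -> Tuple[Optional[str], str]:
    """Split 'module: summary' into (module, summary); module is None if absent."""
    module, separator, summary = subject.partition(": ")
    # Treat it as a prefix only if it is one word, so that a subject like
    # "Fix bug: crash on start" is not read as module "Fix bug".
    if separator and module and " " not in module:
        return module, summary
    return None, subject
```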
I don't stick to it as a rule but if it's clearly grouped in some way it gives people in the wider org a high level fyi about the area of code you're messing with.
Makes it easy to jump over the commit when bisecting for regressions or hastily looking for the buggy commit to revert.
I also use something like this. A variation of Angular's commit convention[1]. There are tools like commitizen[2] that can help new adopters to build commit messages.
I work at a place that used to prescribe this. I dislike it because it adds too much repetitive noise to the abridged git log. (It occurs to me that you may be making this point in a tongue-in-cheek way.)
Originally I suggested we put the JIRA number/link in the commit body, but then I learned about git-notes to add metadata to commits and now I kinda want to do this with the semantic labels suggested by the thread OP, too (I currently use the schema suggested at https://seesparkbox.com/foundry/semantic_commit_messages).
Notes kinda bloat the repository with additional objects, and they can be removed independent from the commits, so I'm a bit iffy on using them for this.
The branch is called feature/<Ticket ID>-<short-description>
The commits contain the usual commit messages. When merging, changes are squashed. The final commit message is similar to the branch name and includes the ticket and a short description, which is similar to the mentioned pattern (bump/add/change/whatever).
I find that awful, because now half your subject line is burned by completely useless information: the JIRA ticket number doesn't provide any useful information when reading a git log / summary.
Yeah, and if you no longer use JIRA it's not useful. Whatever bug tracking software you use should allow for commits to be attached to bugs. But that should stay in the bug tracking software, not the VCS.
I see pretty consistent vocab used across orgs anyway, so given there is shared domain knowledge / language at play, I'm not sure what the goal of standardization is.
Don't take this the wrong way, I use a lot of these when appropriate, but I don't think I could agree that refactoring must be just a refactor, and I don't think I want to limit anyone's commits to this list of change types either.
Forcing changes into these words means a lot of stuff is pointlessly bucketed when more appropriate wording could be used.
Say I want to push a commit "prioritise shipping route a over route b".
So I have to put it under start? Optimise? Fix? Why? Prioritise is the right word, why not just use that? Why play mind games when we have a whole dictionary to draw from?
Mercurial inherited this style of commit messages from Linux email-based code review (the Mercurial originator was a kernel hacker), because in that workflow your commit messages are kind of a persuasive essay for why your commit should be accepted. I believe that writing commit messages with that kind of goal in mind, thinking “why should you take this commit?” is a good motivator for writing something good and useful.
I think my favorite (in terms of humor) is a commit from mpv complaining about locales and encodings. You can practically feel the committer's sheer frustration.
> All in all, I believe this proves that software developers as a whole and as a culture produce worse results than drug addicted butt fucked monkeys randomly hacking on typewriters while inhaling the fumes of a radioactive dumpster fire fueled by chinese platsic toys for children and Elton John/Justin Bieber crossover CDs for all eternity.
It's where the commit message comes in the process that makes them noteworthy. They're the final chance for a developer to vent their spleen or blow their trumpet before signing off on the results of possibly days of work.
Back when I was working on an outsourced Japanese project, the code wouldn't compile unless the computer's locale was set to Japanese, because the C code had comments in Shift JIS.
My favorite (in terms of dark humor, if we’re honest) is YOLO, one of the more interesting deep learning object detectors. [1] It is the exact opposite of yours in every way. The code is brilliant however.
> But maybe a better question is: “What are we going to do with these detectors now that we have them?” A lot of the people doing this research are at Google and Facebook.I guess at least we know the technology is in good hands and definitely won’t be used to harvest your personal information and sell it to.... wait, you’re saying that’s exactly what it will be used for?? Oh.
> Well the other people heavily funding vision research are the military and they’ve never done anything horrible like killing lots of people with new technology oh wait.....
> The author is funded by the Office of Naval Research and Google.
Yeah, my own resume isn't (or wasn't, it has been years since i updated it) that flair-y, but it does have a sidebar with screenshots of stuff i worked on (with my face doing a weird look at the top) and is in mostly prose style with bold text for each of the projects i worked on (bold helps to scan for stuff, at least that was the idea). I've been told a few times (by friends) that this isn't how resumes are supposed to look like but my line of thinking is that if you are so stuck up that you want a specific format for a resume, then you're most likely too stuck up for me to want to work with you :-P.
I was reading it and kind of interested, and thought this line was funny:
> I have a lot of hope that most of the people using computer vision are just doing happy, good stuff with it, like
[...] tracking their cat as it wanders around their house
And then suddenly realized I've wanted to do exactly that for a long time. Well... specifically install a camera that can detect my cat on the counter (and not my hands doing stuff) and sound an alarm/puff air to get him off.
Could this work for that? I think it could! I know my winter project...
On the topic of cats vs automation, I'd recommend reading this post [1] describing the arms race created by the writer's cat attempting to break into an automated feeding machine. HN discussion here [2]
We had an automated feeder with two chambers, each held down by a rotating, ticking timer switch. You rotate the switch to, say, 12 hours in the future, and 12 hours later the slot lines up and the lid pops over.
After many months, my cat learned to stand on top of the lid and use both paws to rotate the switch forward until the lid popped open.
This blew my mind. The switch was separated from the lid. I could imagine the cat attacking the lid itself, but this separate mechanism, requiring a motion completely distinct from the motion of an opening lid, and requiring patience without instantaneous reward or even evidence that it was going to work.... I just couldn't believe it.
There are few tech stories I enjoy more than the back-and-forth of breaking and improving a thing. A story where one side of the conflict is a cat means this might be my new favorite!
I've had good luck using low tech puzzle feeders with my cats. Each one is a bowl with a maze inside or a tower with various windows and trap doors they need to work the food out of to be able to eat. As the food levels get lower the pieces get harder to reach, which allows their laziness to take over and stop them from eating too much.
I love how he just randomly carries on his own internal dialog in his writing as if he really just doesn't give a fuck who's reading it. Totally brilliant!
I love seeing a section like this when reviewing a paper. I really wish more authors would include one. (Goodness knows I've chased down enough dead-ends in some of my own research efforts.)
Yeah, it's fun, but, seriously, git log is messy AF. I wouldn't appreciate it a bit, if somebody would do that to a project I'm involved in.
What's funny, though, is that the paper (written in pretty much the same "fuck you" manner) is much more readable and informative than the average. Which says a lot about science papers out there.
Save a file in Notepad. Open in vi. See that it is different. Find data in the database, with no clue what the weird characters were originally supposed to be. And so on and so forth.
I once wrote a reasonable program and sent it as a bug report to the maintainer of the Perl module DBD::File. He sent it as a bug report to BerkeleyDB. They said they never thought about it but yes, that would be silent data corruption with no way to recover. The program? Maintain an address book in a BTree sorted in the current locale. Enter names. Change locale and insert something else. Voila! Lost data with no way to recover!
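The failure mode can be sketched without a real BTree or real locales. Below, a sorted list and `bisect` stand in for the BTree, and two key functions stand in for two locales' collation rules. All names are illustrative, and this is not the original bug report program:

```python
import bisect


def contains(sorted_names, name, key):
    """Binary-search for name, trusting that sorted_names is ordered by key."""
    keys = [key(n) for n in sorted_names]
    i = bisect.bisect_left(keys, key(name))
    return i < len(sorted_names) and sorted_names[i] == name


def collate_a(s):
    # Stand-in for one locale's collation: case-insensitive.
    return s.lower()


def collate_b(s):
    # Stand-in for another locale's collation: plain code point order.
    return s


# The data is stored in the order given by collation A.
names = sorted(["alice", "Bob", "carol", "Dave"], key=collate_a)

found_before = contains(names, "Bob", collate_a)  # lookup under the original rules
found_after = contains(names, "Bob", collate_b)   # lookup after the rules changed
```

Here `found_before` is true and `found_after` is false: the entry is still stored, but the search can no longer reach it because the structure's ordering invariant no longer holds. Inserts made under the new rules land in the wrong place for the same reason.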
I had to write the following just this year because SQL Server still defaults to using CP 1252 for text. The culprit? One of those damned stylized quotes that Office loves to insert for you. The code:
Linux is not entirely UTF-8, though plenty of people treat it as if it is. Even on Linux, you might need to consume files from other OSes or other Linux systems with different locales. I once had issues with systems configured with the "C" locale vs "en-US": they should have been near identical, but there were enough slight differences to cause failures. It's been 10+ years, so I don't remember the details.
Windows, the OS, is UTF-16 (or UCS-2 - I forget the details between the two), SQL Server has, for historical reasons, defaulted to CP1252, probably for compatibility with Office components.
But it's not really a Windows problem, per se, because you have to deal with this issue if you deal with data originating from numerous Windows apps, even on Linux. Yeah, you can insert a byte order mark (BOM) to indicate UTF-8, but most tools expecting UTF-8 don't actually check for the BOM and blow up in interesting ways if one is present. I've seen this far too many times. Enough that anytime I see an encoding error from the likes of Python or Ruby, it's an instant recognition (I do a lot of ETL work with files from a number of vendors, so I see a lot of different file "types").
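Both problems are easy to reproduce in Python. This is a small illustration, not the SQL Server code mentioned above; the sample strings are made up, and `decodes_as_utf8` is a hypothetical helper:

```python
import codecs

# A UTF-8 file with a BOM in front, as some Windows tools write it.
data = codecs.BOM_UTF8 + "caf\u00e9".encode("utf-8")

plain = data.decode("utf-8")         # the BOM survives as a leading U+FEFF
stripped = data.decode("utf-8-sig")  # this codec drops a leading BOM if present

# One of those stylized quotes, as encoded in CP1252.
smart_quote = "\u201c".encode("cp1252")


def decodes_as_utf8(raw: bytes) -> bool:
    """Report whether raw is valid UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```

`plain` starts with an invisible U+FEFF, which is what trips up consumers that do not expect a BOM. The CP1252 quote is the single byte 0x93, which is not valid UTF-8 on its own.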
This needs to be REQUIRED READING at the Open Group and the ISO C standards committees.
I'll quibble just a bit and say that:
a) the C locale should be a UTF-8 locale...
that tolerates invalid sequences (because
the C locale historically is a just-use-8
locale),
b) even with new functions that take a locale
handle, we need functions that use a global
one, however that global one should be set
once and NEVER changed in the life of the
process, and it should be set either before
main() starts, or before main() does anything
that needs a locale, or starts any threads.
Really? I would say 90% of the commit message is technical, rest is just emphasizing his frustration, which is pretty much justified. Yes, he is a person that seemingly gets pissed off by bullshit design decisions and whatnot. So?
I disagree. Words are just words and we give them meaning. Being derogatory and unkind to mentally deficient folks is ethically wrong. Using that word in a different context to communicate frustration is fine, imo.
Try telling that to my boss, who has an autistic child. I held your opinion until I stuffed my foot in my mouth in a meeting. Now I don't use that word.
Wait until a derogatory word being used affects you directly. It's really easy to just not flippantly use words that you know some people have an issue with. You are clearly aware of how it can be offensive, and for a lot of people your definition is still just a callback to people being unkind about mental illness. History matters too, and if you choose to ignore that and use it anyway you're just being unnecessarily inconsiderate.
That is... wonderful. I've spent some time dealing with locales in C and other places that depend on the things being discussed in the commit. Just reading it brings back some of the rage I felt.
I do disagree with the assertion that it takes a lot of code to convert between the various UTF variants, 3 pages is an overestimate. https://stackoverflow.com/a/148766/5987
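For a sense of scale, here is a sketch of one direction (code point to UTF-8) written by hand in Python. It is not the code from the linked answer; `encode_utf8` is a name chosen for illustration:

```python
def encode_utf8(code_point: int) -> bytes:
    """Encode a single Unicode code point as UTF-8 by hand."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if code_point < 0x80:  # 1 byte: 0xxxxxxx
        return bytes([code_point])
    if code_point < 0x800:  # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6), 0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:  # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xE0 | (code_point >> 12),
            0x80 | ((code_point >> 6) & 0x3F),
            0x80 | (code_point & 0x3F),
        ])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xF0 | (code_point >> 18),
        0x80 | ((code_point >> 12) & 0x3F),
        0x80 | ((code_point >> 6) & 0x3F),
        0x80 | (code_point & 0x3F),
    ])
```

The result can be compared against the built-in codec, for example `encode_utf8(0x20AC) == chr(0x20AC).encode("utf-8")`.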
For anybody wondering, the likely origin of the invalid character is somebody using an Apple Ireland/UK keyboard layout where # is Option-3 (AltGr-3), and non-breaking space is Option-Space (AltGr-Space).
I recently added a Rake task to one of our builds which checks for the exact problem mentioned in GP, after having 3 separate occasions in the last 6 months where OSX "smart" characters have changed the encoding of a file consumed by things expecting pure ascii.
Unfortunately it is a bit of a hack that shells out to "file -i", but I'll take it over hours of frustration.
I don't know how many times these non-breaking spaces caused problems. I think linters should prevent commits that contain non-breaking spaces. And if one is really needed, it should be encoded as `&nbsp;` or with whatever encoding is relevant.
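The check itself is tiny. A sketch of what such a lint rule might do, assuming it only looks for U+00A0; `find_nbsp` is a hypothetical name, not an existing linter API:

```python
from typing import List, Tuple

NBSP = "\u00a0"  # NO-BREAK SPACE


def find_nbsp(text: str) -> List[Tuple[int, int]]:
    """Return the 1-based (line, column) of every non-breaking space in text."""
    return [
        (line_number, column)
        for line_number, line in enumerate(text.splitlines(), start=1)
        for column, character in enumerate(line, start=1)
        if character == NBSP
    ]
```

A hook could fail the commit whenever the returned list is non-empty and print the positions.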
…or fix the non-Unicode compatible systems that are consuming the commit messages and breaking? If they fail with an nbsp then they’re probably also going to fail with more obviously useful non-ASCII characters.
How often are non-breaking spaces purposely inserted vs accidentally? And the tools might handle them fine but will produce strange results or errors. An example is inserting a non-breaking space in a document or string. It will prevent word wrap, which might not have been the user intent. A linter that requires these spaces to be explicitly set in encoded form would avoid these issues.
Non-breaking spaces are required to typeset French correctly, at least: https://en.wikipedia.org/wiki/Question_mark#Stylistic_varian.... The accidental insertion of non-breaking spaces is a possible issue, and it's a bit harder to detect than other typos, but it's also probably not as bad as other typos. Overall I think it's a bit hard to make the case that they should be disallowed.
I was forced to gain very intimate knowledge of a web based rich text editor that would use non-breaking space characters as markers to monitor current user selection.
My commits are usually short and sweet - to the point.
I document my code very well, however.
One of my strengths in a previous life as a Master Automobile Technician was the ability to document the entire process -- from duplication of a concern, to troubleshooting, to correction, to verification...it's literally how I got paid (which I never understood why so many automotive techs took short cuts while documenting, especially for warranty concerns where you deserve to be paid (by the manufacturer) for everything you did that was necessary to fix the concern the first time).
I could be mistaken (life-long coder, former network engineer / architect for nearly a decade, but I'm currently in my first-ever role as an actual backend developer). I think I was told to keep commits to one line unless absolutely necessary. I'll have to bring this up though. I like the idea of searching through the git logs for specifics, as opposed to having outlook search through the git commits for actual pieces of code changed, or error codes which might not actually be there.
In either event, I'd at least want more descriptive, yet still short, messages such as what strictfp suggested: "Replace invalid ASCII char. Fixes rake error 'invalid byte sequence in US-ASCII'". Although I really like having some reasoning and logic, or the how/why, in those easily-searchable logs as well.
I think some people remember everything and some people don't (although we're all probably roughly similar logically). I have a hard time remembering what I ate for lunch yesterday - that's why I count on good documentation to function as efficiently as possible in the future. I've worked with people who have an absolute uncanny ability to remember 'stuff'. That's impressive, but I do not have that ability myself.
> I think I was told to keep commits to one line unless absolutely necessary.
The advice I've heard: Your first line should be a concise summary of the commit. This is because a lot of UIs only show the first line up front. (GitHub, git log --pretty=oneline, etc.) However, it's okay (and often encouraged) to go into further detail on subsequent lines.
Commit messages should describe why you're doing something ("X asked", or "[reams of supporting evidence why this needed to be made faster but more confusing]"). It provides context to current reviewers, and future archeologists who wonder what you were drinking at the time. Perhaps you had a good reason for doing [insane thing X]! Perhaps you didn't. If you didn't write it down, they might change or leave it, and break something or prevent something from getting a proper fix.
Code comments should be notes to code-readers that are relevant at all times until changed or deleted. "How to use this", "beware changing X", "Z is hot garbage and should be replaced if used for Q". Ideally you'll have asserts or tests or something that actually enforce this, but of course that's not always a realistic option. Comments in code will follow the code around, and don't require chasing code history through N layers of refactoring and indentation-wars, which is what makes commit messages mostly inappropriate for needs like this.
I do feel like Git commit descriptions are severely under-utilised for sure, but I believe there is a reason for that which, until fixed, will prevent rich and contentful commit descriptions from flourishing.
In the article order: the screenshot is from a commit detail page. How often do you land on this page? You need to specifically click through. If you are in a commit list, the only thing that sets title-only commits and commits with a description apart is an ellipsis link which practically blends in with the background. It is not very well integrated nor discoverable. Also I don't believe the commit descriptions render as Markdown (unlike issues), which is also a shame as it feels less like a doc then. But I might be wrong on this.
But even outside of GitHub, how many other UI/IDE plugins and other kinds of Git tools restrict commit display to just the title and put the description on the sideline? Most of them. I think this further leads to the currently low value of the commit message being searchable. Since it is exceedingly rare for good commit messages to exist, no one thinks to search them. People default to Google, when their own project's codebase/knowledge base could hold the answer to their query.
I don't have much of an opinion on the commit message telling a story / having a human touch. I mean, it doesn't hurt I guess, but until _full_ commit messages become more "mainstream" (for lack of a better word), they can be as human as they can, but they will live in solitude.
In the same vein, I wish there would be a standard workflow for putting code inside commit logs.
The typical use case would be database migration scripts : IMO they are always a pain to version properly because fundamentally Git and all the other software versioning tools let you describe the "nodes" (in the graph theory sense of the word) of a codebase ("at commit A the code was in this state, at commit B it was in the other state"), but severely lack when it comes to describe the edges between nodes ("in order to go from state A to state B, you need to execute this code")
I think the temporal dimension of software engineering is still poorly understood, and severely undertooled.
Strongly agree. I run into this a lot with stuff like style commits. You want to ensure that a change hasn't slipped under the radar.
I tend to go for something like:
This commit was generated.
<shell script here>
It would be super awesome to have a tool that easily verifies that A->B can actually be reproducibly accomplished by performing the actions in the commit message.
Sometimes no. For instance a database migration script is not equivalent to the diff between two database creation scripts.
While it should be possible to take a database creation script (state A) and a database migration script (edge A->B) and infer the new resulting database creation script (state B), the reverse is not true.
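A toy illustration of the state-versus-edge distinction, using an in-memory SQLite database. The `users` table and the three SQL strings are invented for the example:

```python
import sqlite3

# State A, the edge A->B, and state B (all hypothetical).
CREATE_A = "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)"
MIGRATE_A_TO_B = "ALTER TABLE users ADD COLUMN email TEXT"
CREATE_B = "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)"


def columns(conn):
    """Return (name, type) for each column of the users table."""
    return [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(users)")]


# State A plus the edge: existing rows are carried along.
migrated = sqlite3.connect(":memory:")
migrated.execute(CREATE_A)
migrated.execute("INSERT INTO users (name) VALUES ('alice')")
migrated.execute(MIGRATE_A_TO_B)

# State B created from scratch: same schema, but no data.
fresh = sqlite3.connect(":memory:")
fresh.execute(CREATE_B)

same_schema = columns(migrated) == columns(fresh)
rows = migrated.execute("SELECT name, email FROM users").fetchall()
```

Both databases end up with the same columns, so the resulting state can be derived from A and the edge. Going the other way, nothing in the two creation scripts says what should happen to the existing row; that information lives only in the migration.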
How do DB migrations relate to the source control graph?
It seems you're implying the codebase changed as a result of a script that itself is not source controlled. I can think of style commits falling in this category, like one of the children of this comment mentions, but DB migrations don't seem to be related.
Where they're really valuable, IMO, is when you're tracking down when/why a change was made with `git blame`. When you're looking for the reasoning behind a change, it's extremely helpful if there's a detailed commit message going along with it.
If only important/tricky commits have long messages it's fine. Most IDEs will show you the full text if you hover over the line, so when working on something it's easy enough to check them. I'm huge fan of the timemachine-like "Show Git History for Selection" view that IntelliJ has, so usually there's no reason to go to a Github/Gitlab commit page.
> I don't want your entire life story in my commit log.
I[1] want enough debug information in the commit log to be able to reproduce the issue without having to go on web hunts to understand the problem. Especially when the change appears to be trivial on the surface, because these are the ones that can turn out to be rabbit holes.
I don't want to have to interrupt you to get this information because you didn't write a good enough commit message, and you probably don't remember anyway. I don't want to go look at an external issue tracker that I may not have access to, or that may not even exist anymore.
[1] Where "I" is: me, your future self, a future maintainer, a junior dev, an open source contributor.
To me, at least, the issue with that commit message is the signal-to-noise ratio.
There is a lot of exposition for each piece of information. I prefer a more declarative commit message.
However, from the writing, I suspect this is just due to the committer not being a native English speaker.
e.g. the first paragraph doesn't lose any important information trimming it down to:
"After adding a test matching the contents of router_routes.conf,
`bundle exec rake` fails with:
ArgumentError:
invalid byte sequence in US-ASCII
"
Realistically, it would have been a better commit message if they'd given the shortlog SHA where the test was added that exposed the bug rather than an explanation of what the test does.
"After adding test <testname> (08c3e17), `bundle exec rake` fails with:"
Hmmm, I've always believed that no commit should break a build, even if you're committing the fix right after. Otherwise you're going to cause problems for `git bisect` or other practices of going through the history to find where a problem may have started.
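To see why a deliberately broken commit is a problem, here is a simplified model of the binary search that bisecting performs. It is a sketch of the idea, not git's implementation; `first_bad` and the commit names are made up:

```python
from typing import Callable, Optional, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> Optional[str]:
    """Binary-search for the first bad commit.

    Assumes that once a commit is bad, every later commit is bad as well.
    """
    low, high = 0, len(commits)
    while low < high:
        middle = (low + high) // 2
        if is_bad(commits[middle]):
            high = middle      # the first bad commit is at middle or earlier
        else:
            low = middle + 1   # the first bad commit is after middle
    return commits[low] if low < len(commits) else None


history = ["a", "b", "c", "d", "e"]

# Clean history: the regression lands in "d" and stays.
clean_result = first_bad(history, lambda c: c in {"d", "e"})

# "c" only adds a failing test that is fixed in "d"; the real regression is in "e".
misled_result = first_bad(history, lambda c: c in {"c", "e"})
```

With the clean history the search lands on "d". In the second case it lands on "c", the commit that merely added the failing test, because the search's assumption no longer holds.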
Do other people commit breaking tests and then fixes?
I think this depends quite a bit on what other contributors are doing - it's one of those cases where several approaches are acceptable, but inconsistency is not.
"Commits should always build" is one doctrine I've seen. As you say, it makes bisecting and other error-analysis approaches easy. On the other hand, it risks either having large, opaque commits, or adding overhead to make intermediate commits build - possibly with flawed/meaningless behavior when they do.
Another is "the trunk should always build". In that case, you'd just squash branch commits down to logical groupings that are easy to analyze, whether or not things build. You can bisect on the trunk, but lose all guarantees about state on branches.
Finally, I've seen variations on "no commits that break the product", "no commits that make things worse", or "no committing failing tests without subsequent fixes". In this case, you can't generally commit broken builds, but can specifically add failing tests. The first rule just means "adding failing tests is ok", the second means "converting runtime bugs to failing tests is ok", and the third means "write your test and fix, but split (and ideally tag) the commits". All of these break bisect, but they guarantee the project itself won't become more broken from commit to commit, and they can help with other forms of reasoning about where bugs first occurred.
Every approach there seems viable if you stick to it. If there's no established practice, I suppose the best choice would be based on what sort of work and debugging is most likely to apply.
I might be missing something basic here. Isn't the "no commit should break a build" impossible to enforce on a codebase where you need to push a commit to run the tests?
Something where you can't test locally, like when testing on multiple architectures or when the tests just take too long for a laptop.
In my workflow, that would be in a feature branch, and exploratory branches can certainly break, but before I made a PR I would rebase my changes such that none of the commits broke the build.
It's also a different situation. The original one is "I've made a test that shows a problem." Your example is a surprise "I don't know whether this will pass my cloud-based tests." I would edit my branch if I had a surprise failure, since my initial code clearly wasn't correct.
I do this, but not in a way that ends up on master. It's a driving force behind my preference for squash-and-rebase merge patterns.
A good bugfix PR is often two commits then: one with a test to catch the breakage, another to fix it so the tests pass. Reviewers can see the failing-then-passing CI job logs, so if they agree your test catches the bug, they have additional CI-automated validation your fix worked.
Then as long as you squash when completing the merge, you get the best of both worlds.
It goes without saying that commits should include tests that cover the change, where possible.
> immutably
That's what makes this modus operandi so powerful IMO - comments in code may go unmaintained, tests may start failing for other reasons, issue trackers come and go, developers leave the company, documentation rots.
The commit message is (unless you have a bad actor) immutably linked to the original change, and that's exactly why you should be thorough in expressing its reason for being. I can git checkout the point in time (perhaps having bisected) and have the information to allow me to reproduce the issue.
> Especially when the change appears to be trivial on the surface
Comments about the code should be in the code, where the next dev will see it. The more trivial a change, with far-reaching implications, the more important this is.
Doing so has heaps of benefits: future devs understand ramifications, shows that this code has been scrutinized, makes it easier when doing refactoring/yanking, or porting code.
That said, leaving the life story out will always be a good idea.
Pagure [0], Fedora's git forge, hosts code, issues, docs, and pull requests as four separate git repositories under the hood [1]. However, only project administrators can clone most of those repos.
I can imagine that working to a degree: Make a fork of a commit at an issue, then merge that fork back in with master at point of fix. Bit of a mess in the tree though.
I don't want to read what the code does (I can read that myself, thanks!), I want to know WHY it does it the way it does it - especially, if there is a more obvious, better way.
Also: People leave companies. Or die. At some point in time, you won't be able to ask the original author.
No thanks. If I had a dime for every function that's so obviously self-documenting to what it does.. etc. If you tell others what it does and why, concisely and thoughtfully, no one has to try and mentally parse the what of some clever undescriptive block of code.
On the other end of the spectrum you get ImageMagick useless commit messages[0].
That extreme aside, I'd rather have commit messages that delve into why and how the commit alters the behavior for the better than a cryptic message such as 'Replace invalid ASCII char'. Now we have documented reasoning and thought process that can aid future debugging. They can also be beneficial for new devs hacking on the project, or students learning how to implement and improve systems.
Personally, I enjoy reading these. The Go commits often have commit messages like these, and they are shared on HN often for a reason. They're learning material. They can't go on a wiki because they're tied to particular set of changes in a particular point in history. They also can't be comments on the code because they're tied to particular lines in different files, and code comments can only cover a set of consecutive lines in one file.
One recent example I could find is this[1]. Yeah, it fixes ^Z, but why didn't the old approach work? Why did it work for some time then didn't? How did it change? Why is this commit optimal, if it is? All of this along with scenarios to reproduce the issue.
Give me your life story anytime over a cryptic message.
Agreed. When at some point the website that they are pointing to changes in the future they will lose all context on why a change was made.
I believe in the "plane flying across the ocean without WiFi test", or basically anywhere without Internet access. If I am on a plane flying across the ocean without WiFi, do I have the information in the git commit to understand what happened? A git message that consists entirely of a link to a website is useless in that case.
You can write the brief summary in the first 80 characters, like OP did. Then write details in the body below, in case someone needs the context. Most tools display only the first 80 characters unless you expand the body.
This case is probably longer than necessary, but I've saved a day of debugging on multiple occasions due to someone (also myself) leaving some lines of context, reasons and reasoning after the high-level description.
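A rough sketch of the "summary first, details below" split, in Python. The 80-character limit is the one mentioned above; the function names are made up for illustration and are not part of git or any tool.

```python
from typing import Tuple


def split_commit_message(message: str) -> Tuple[str, str]:
    """Split a commit message into (subject, body) at the first newline."""
    subject, _, rest = message.partition("\n")
    return subject.strip(), rest.strip()


def subject_fits(message: str, limit: int = 80) -> bool:
    """Return True if the subject line is non-empty and at most `limit` characters."""
    subject, _ = split_commit_message(message)
    return 0 < len(subject) <= limit
```

Something like this could run in a commit-msg hook, but that wiring is left out here.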
> I don't want your entire life story in my commit log.
Why not? Where else do you want it? Is something forcing you to read the full commit log?
There's no length limit on commit messages and commit messages are mostly out of the way. Most VCSes have a way to only show you the first line. So if you want summaries, that's what the first line is for. If you want the full story, that's what the body is for.
Combined with annotate/blame, commit messages can be very helpful source-level documentation. Nobody has ever complained about too much documentation, and commit messages are the perfect time to document what happened because it's one of the few times where our tools actually force us to write something in order to proceed. As long as we're being forced to write something, write something good and informative.
I think the problem isn't the length or content of the commit message, but its organization. It needs to have the most important information first. It reads as an "entire life story" because it is written in a narrative, sequential form. Better organization would make it skimmable, and later coders could read only as far as they need to.
If I'm searching commits, I'm trying to find record of what changed and when. I only want clues, and quick skimming is paramount. I want no personality. I want concise descriptive commit messages.
That said, we reference an ID from our project management software with every commit, so once I find the commit I'm looking for, I can reference it back to external documentation. I still discourage personality there as well because it can get out of hand and clutter the comments, but it's more forgivable than being on the commit itself.
The pull request is a good place to put such a large amount of information. That would also be a good way to make sure it is seen by the broader team instead of burying it in commit history. You could make the argument that then it would not be part of the git history and therefore could be lost if you change hosts.
I never understood this philosophy. What makes Git more eternal than any other technology? Why is putting all of your data in one monolithic tool a good solution?
You might change your issue tracking solution. You might change your host solution. You might change your review platform. You might also change your VCS solution. Nothing is eternal.
The commit message itself is way more eternal than GitHub ephemera. There's plenty of old codebases in git that were imported from SVN (or even older RCSes) with all commits intact. What's likely not intact is data in ancient issue trackers from decades past. git is a DVCS, so anyone can clone the repo and get all the commit information. Cloning the issues and such is not nearly so trivial, and isn't a part of the git protocol itself so there's no guarantee it's in any kind of interchangeable format.
Important information should not just be in PR comments. It should be added into the commit information itself so that it'll be maximally available going forward. A good, fully explanatory commit message is a huge asset, and those commit messages will exist for the entire lifetime of the codebase. Anything else, not so much.
I didn't say anything about git. I said the commit messages are eternal, and none of those changes will change the commit messages (except, perhaps, changing the VCS, but usually that will preserve commit messages too).
You'd be foolish to do a VCS migration that discards the commit messages. I've never seen it happen personally, as people tend not to be that foolish. I've worked with legacy codebases that went from CVS -> SVN -> git and all of the commit messages going back to the very beginning are intact, because why would you ever do a migration that doesn't maintain them?
This is a really convincing argument. I was with the parent commenter until I read this; I was like, this is totally PR stuff! But hadn't considered offline situations, or host switches. Thanks op!
> I don't want your entire life story in my commit log.
I agree with this, but I think yours is too short.
Scientific papers typically introduce enough information such that a person familiar with the field but not an expert in that particular area can understand generally what's going on.
That's my ideal for a commit message as well: someone generally familiar with the codebase but who hasn't looked at this specific code (or perhaps not in a few months) should be able to understand what's going on; then the job of the reviewer is basically just verification.
My "template" is normally something like: 1) What's the current situation 2) Why that's a problem 3) How this patch fixes it. So in this case, it might look something like this:
---
Convert template to US-ASCII to fix error
$functions use `.with_content(//)` matchers to do X. These matchers require ASCII content. The $foo template contains a non-ASCII space; this results in the following error:
ArgumentError:
invalid byte sequence in US-ASCII
Fix this by replacing the non-ASCII space with an ASCII space.
---
No need for a life story, but still searchable, and has enough information for even a casual contributor to do a useful review.
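For what it's worth, the three-part template above (current situation, why it's a problem, how the patch fixes it) is easy to sketch as a small Python helper. The function name and the 80-column wrap are my own assumptions, not an established tool.

```python
import textwrap


def build_commit_message(subject, situation, problem, fix, width=80):
    """Assemble subject, a blank line, then three wrapped body paragraphs."""
    paragraphs = [situation, problem, fix]
    # Wrap each paragraph separately so the paragraph breaks survive.
    body = "\n\n".join(textwrap.fill(p, width=width) for p in paragraphs)
    return f"{subject}\n\n{body}\n"
```

Calling it with the subject "Convert template to US-ASCII to fix error" and one sentence per part gives a message in the same shape as the example above.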
I, too, would rate this a substandard git comment.
Dave basically vomited a bug ticket of information, which is highly contextual and irrelevant ... like the lines he was faced with, which tell us nothing in the future nor anything we could not see in the change. The error is known, from the ticket being addressed. Documenting what error a bundler throws in the application deployment, within git seems...silly, since it will likely not apply to all points in time. That's why we have separate issue tracking.
There was a whitespace encoding issue AND the developer didn't really understand the issue, since they ended with "One hour of my life I won't get back.". Over my 20 years, I've seen this EXACT scenario multiple times across multiple companies. Some jr engineer gets stuck with some troublesome weird error in a corner-case that ends up being a non-standard whitespace. It's a learning opportunity and he lamented it because it was different and nobody told him "we could stop this from happening again, generate a new issue".
The commit would benefit from some salient improvements, both in the message and in additional code:
1. Include a (new) feature ticket that is linked to this issue - to create a process that doesn't allow for this again (eg fix a linter)
2. Include the name of the bug ticket (Convert template to US-ASCII to fix error) in the commit title, that was being addressed.
3. Create a test to specifically enforce the us-ascii encoding or add necessary rules to a linter.
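A minimal sketch of what the check in point 3 might look like, in Python rather than the project's own Ruby tooling (an assumption on my part, purely for illustration). The function names are hypothetical.

```python
from pathlib import Path


def non_ascii_positions(data: bytes):
    """Return (line_number, column, byte_value) for every byte above 0x7F."""
    hits = []
    for line_no, line in enumerate(data.split(b"\n"), start=1):
        for column, byte in enumerate(line, start=1):
            if byte > 0x7F:
                hits.append((line_no, column, byte))
    return hits


def file_is_ascii(path) -> bool:
    """True if the file at `path` contains only ASCII bytes."""
    return not non_ascii_positions(Path(path).read_bytes())
```

A test suite or linter step could then assert `file_is_ascii(...)` for each template it cares about; which files to cover is up to the project.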
For critical applications, I for one would like to know the story behind a commit, preferably in the commit itself and not a reference to an external system like idk, Jira.
My favorite examples of commit messages are the Linux kernel, where you can tell that they're being specifically crafted instead of just used as a work log to be ignored. This means that ten years down the line, people can still see when a change was made and why, who was involved, who signed off on it, etc. Have a look at the commits at https://github.com/torvalds/linux/commits/master
This is true when you can link the commit to an issue. Then, seeing the simple commit message, you can decide whether you want to dig into what happened by reading the comments on the issue.
On the other hand it really gets on my nerves when people don't use the task/issue/whatever manager system appropriately. Recently, I lost a couple of days trying to figure out how to compile a c++ framework because the other guy didn't document his pipeline. In general I'm really disappointed by the majority of my colleagues for the lack of comments inside and outside of our codebase, and this is a persistent issue at all the companies I've worked for. I, along with other similarly irritated people, always ask for documentation if it is not given.
> Recently, I lost a couple of days trying to figure out how to compile a c++ framework because the other guy didn't document his pipeline
Some people do it for job safety. The logic is if you don't document things and the knowledge is only in your head then you are more valuable, they can't get rid of you easily. If you document everything meticulously, then you are easier to replace.
> Some people do it for job safety. The logic is ...
Has anyone actually seen this logic work out well for the person that invokes it? Generally the type of person that uses it is one that you probably don't want on your team.
I know in instances like that, though, my next step would be to start working on contingency plans, as if someone has proven themselves to be indispensable, then that very fact is a risk that needs to be managed.
> Recently, I lost a couple of days trying to figure out how to compile a c++ framework because the other guy didn't document his pipeline.
This is assuming documenting the pipeline would have helped! You may have spent a few days instead figuring out why your seemingly identical setup couldn't reproduce the build...
Not that I'm bitter about build systems or anything.
Talented coworkers don't need documentation very often... If someone can't figure out how to compile something, it's likely they are missing knowledge about the language in general...
Git greps work if you're trying to search all the logs. I would think this detailed documentation would be most important when you're trying to understand a specific file or line of code. In that case, it's:
Generally speaking sure, there's no need to make things more complicated than they are, but the author even found some evidence in the history that indicates other people found this message useful.
The powerful thing about this is having everyone put this kind of info in the same place IF they think it might be useful to the next person.
Still, you've left out the detail that you've confirmed there are no other instances of this in our codebase. I'm also firmly in the "all commit messages should include a test plan" camp, so you should at least say how you found the error ("bundle exec rake was run before and after").
I get you're being terse for demonstrative purposes, but even eschewing verbosity we should still convey all the pertinent information.
One downside to that approach: it requires your issue tracker to be stable for long periods of time. I've worked in a number of places where that's not true and you end up needing to figure out that the #123 linked by the system you're using now was actually #123 in the old system and was migrated as #456 in the current one.
There's a balance here and I especially like that this commit message has enough information to make searches really easy should you need to do something like that.
That's why most guidelines for commit messages prescribe a short description and an optional long description. The message in the article does not have a short description, which would have been easy to include. For that reason, it's not "My favorite Git commit message" either.
I agree with the sentiment; this message is quite long for an invalid character in a file.
House style in the companies I've worked for is to include a link to a bug report and/or code review that provides more context for those who want it. Even without that added context, I'd rather know
That's great for you. You don't want that. However, if you code in a team, doing everything for your own wants rather than considering the needs of the team (present and future) is just bad software engineering.
One of my favorite pranks is to put control characters in the commit message (like the bell) and then you get an auditory notification anytime anyone nearby opens your commit messages.
Also, if you haven't seen it before, read the Linux kernel Changelog. The latest Changelog can be found at [0]. Almost every commit tells a story, unless it's a trivial fix. If there's a bug, it often contains detailed analysis and rationale, and it's a form of important documentation.
It's not always practical to follow them in personal/work projects, though: Linux commits are the result of multiple rounds of review, and the commit log is the justification for the change, whereas in personal/work projects commits are made in real time, as soon as you've debugged or refactored something. But I still use the Linux kernel as a guideline for my own commit log, at least for new features or bugfixes.
I was working with a research team at UCLA and we used Wolfram Mathematica to process our results.
In Mathematica, pretty much any object can be a variable name. You can drag a JPEG of Kim Jong-un into Mathematica and integrate an expression with respect to Kim Jong-un. We'd sometimes get a kick out of that.
Near the end of the program, our whole team needed to process the last 3 months of results, but we were all getting results that were consistently off by incorrect factors when running our Mathematica notebooks. Five hours later someone discovered that one of the variables contained a stray Unicode whitespace or null character (or maybe a non-Unicode blank Mathematica object, such as a Graphics object with 0 area) that someone must have accidentally spawned somehow before saving and distributing the notebook to the rest of the team. Since Mathematica didn't recognize it as spacing but as part of the variable name, making it a different variable, the results of our integrals were incorrect. E.g. the integral of x^2 is x^3/3, but the integral of xx' is x^2x'/2, so the multiplier would be off by a factor of 3/2.
After discovering and selecting it, we "cut" it into the clipboard, pasted it into another Mathematica notebook, saved it, and it was never opened again.
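The 3/2 factor mentioned above can be spelled out. Here $\tilde{x}$ stands for the look-alike variable (the notation is mine), treated as a constant during integration and assumed to take the same numeric value as $x$ afterwards:

```latex
\int x^2 \, dx = \frac{x^3}{3},
\qquad
\int x \, \tilde{x} \, dx = \frac{x^2 \, \tilde{x}}{2}
\;\xrightarrow{\;\tilde{x} = x\;}\; \frac{x^3}{2},
\qquad
\frac{x^3 / 2}{x^3 / 3} = \frac{3}{2}.
```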
This reminds me of a Markdown issue I've had many many many times - sometimes (and only in some engines), headings would not render and I'd only get '# foobar' instead of '<hx>...'
It took too long for me to track down the issue. When I write '#' using alt-3, I then write a space and oftentimes I don't lift alt soon enough and alt-space creates a non-breaking space (on macOS). And some/most Markdown engines don't recognise '#&nbsp;text' as a heading.
I suspect something like this happened in the commit linked here.
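A small Python sketch of how the heading problem described above could be caught and repaired. It only looks at a non-breaking space (U+00A0) directly after one to six leading '#' marks. That is my assumption about the failure mode, not a full Markdown parser.

```python
import re

NBSP = "\u00a0"

# One to six '#' at the start of a line, followed by a non-breaking space.
BROKEN_HEADING = re.compile(r"^(#{1,6})" + NBSP)


def fix_nbsp_headings(text):
    """Replace a non-breaking space right after leading '#' marks with a plain space."""
    return "\n".join(BROKEN_HEADING.sub(r"\1 ", line) for line in text.split("\n"))
```

Non-breaking spaces elsewhere in the text are left alone, since they may be intentional.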
This happened to me all the time, especially with python2 code without encoding declared (which caused a failure to parse the file because of the comment).
I’ve since switched my editors to highlight such characters.
I love these commits. They don't have to be this verbose, but they have to tell a story of why things were done. I can sort of deduce the what from the code itself, but the why is sometimes shrouded in mystery.
I started with these explanatory git commits a few months ago and they are super useful, even if you're just reading your own commits from some time ago.
Putting this in the commit is not easily searchable, not universally accepted and thus not expected, not practical, and certainly can't involve discussion easily. This can be replaced with a ticket number, as most ticketing systems will actually read commit logs for those in order to associate the ticket with code.
Now, there is that problem with decoupling code and story, but this is a technical problem; nothing stops Gitlab and friends from storing issues and friends in the repository itself.
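The association described above (a ticketing system scanning commit logs for ticket numbers) boils down to pattern matching. A minimal Python sketch, assuming a hypothetical "PROJ-123" style of ID; real trackers each have their own format and rules.

```python
import re

# Hypothetical ID format: uppercase project key, dash, number (e.g. "PROJ-123").
# Naive on purpose: strings such as "UTF-8" would match this pattern too.
TICKET_ID = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")


def ticket_ids(commit_message):
    """Return the ticket IDs found in a commit message, in order, without duplicates."""
    seen = []
    for match in TICKET_ID.findall(commit_message):
        if match not in seen:
            seen.append(match)
    return seen
```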
I use these commits, but also use a proper issue tracking system. So I'm not quite sure your comment applies. The reason I'm doing this is:
1. If I'm looking at some code, I want to see its history without having to switch between git(lab|hub) and jira or whatever system I'm using.
2. The issue tracking system doesn't necessarily have some kind of resolve, a summary of what and why happened. It does have a description and a series of comments, but a summary is usually lacking.
3. I believe my commit history will far outlive any issue tracking system I use. So I'm safer putting information into both.
To me it's beautiful, because it does what an issue tracking system does not do: it explains everything. Who, why, how, what, when. It is beautiful and simple documentation.
Issue tracking typically revolves around the who, what, when - not why it happened, or how it was resolved.
This is why I believe that code can never be fully self-documenting. I can't understand why the code exists from reading it. All the floofy contextual stuff is missing, and commits like this help to explain the floofiness.
I think that you are not using the issue tracker properly.
You absolutely need as a bare minimum the why and the reporter of the issue.
When you do a blame you can easily see the issue id and open it with 1 click to understand why that code exists and who requested the change.
- Whenever someone asks a question in a pull request, answer it by putting a comment in code.
- Include a ticket number (for whatever ticketing system you use) in the commit
Why? I find that commit messages are black boxes. They only come out when doing a git blame, but they don't show up in my IDE. Instead, I'm more likely to run across messages like this when they're comments in code or discussions in our ticketing system.
I think if we had tighter integration among our IDE, git, and the ticketing system, detailed commit messages like this would be extremely useful.
I'm glad my near-exact pain has been experienced by others. I had an undefined function call of ' ' in a ruby script years ago. Finally, I turned to a hex editor at the suggestion of a colleague. The culprit was non-ascii whitespace that ruby decided should be a function declaration. Copy pasta error out of a hipchat code snippet.
Basically, ASCII maps each character to an integer; 32 represents a space. However, ASCII is quite limited and if you want non-US-centric characters, you need to use other character sets. For a myriad of reasons, once you go into Unicode (the Universal Character Set) there are lots more options for characters. For example, there are multiple whitespace characters.
These exist to give different widths or other adjustments to text that is non-ascii. What likely happened in the post (and what did happen to me) is copy-pasting from some document that changed a normal ascii space (that Ruby would expect and know how to deal with) into a unicode character that Ruby interprets as any other character. It would be like having a stray 'g' in the line, but you, as a developer, don't see it. Fun :)
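To make the point about multiple whitespace characters concrete, here is a small Python sketch that lists whitespace-like characters outside ASCII. The function name is hypothetical, and it only covers characters Python treats as whitespace or that are in the Unicode "Zs" category, so it is not an exhaustive check for invisible characters.

```python
import unicodedata


def suspicious_whitespace(text):
    """Return (index, codepoint, name) for whitespace-like characters outside ASCII."""
    hits = []
    for index, char in enumerate(text):
        is_space = char.isspace() or unicodedata.category(char) == "Zs"
        if ord(char) > 0x7F and is_space:
            name = unicodedata.name(char, "UNKNOWN")
            hits.append((index, f"U+{ord(char):04X}", name))
    return hits
```

Run over a source file's contents, this reports where the look-alike spaces are, which is roughly what the hex editor showed me back then.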
I'm always intrigued when developers complain that it was "[surprisingly short amount of time] of my life I won't get back."
Maybe I'm not as clever, but I'm lucky if I fix an issue like that within a few hours. It can sometimes derail a workday. In fact, fixing in a few hours would be something worth celebrating!
When reading code, you want something simple and to the point most of the time (if that). Yes, it's nice to be able to tunnel down into the details - but this was fixing a bug, the code should just have a short description and, more so, the reference number of the bug; if you need to, you can look that up on your bug tracking solution and get that detail.
Big problem with detailed comments: things change, and comments (like code) can become obsolete/redundant and not reflect the code, as the code got updated but the comment did not.
No solid solution really and gets down to preference and also mindfulness of the life of code/comments.
Would be great to have code that you could right-click and get the documentation; some would even prefer being able to write documentation that gets turned into code, others would love code that could be auto-documented. It gets down to taste, preferences and, more so, experience. See, every programmer over time will eventually encounter a situation on somebody else's code that they are maintaining, fixing or replacing and find that the comments do not reflect the code. You eventually get down to the stage that you almost actively ignore comments based upon such experiences.
So whilst a detailed description in the form of a comment is good, it can and should be elsewhere, either the initial spec and program documentation or in this instance - bug tracking software system and just simple short line with bug reference or indeed just bug reference.
Contrary to the author's point, I don't think a git commit becomes more beneficial to the readers by adding a human context.
Building "compassion and trust" actually distracts readers from the essence of the commit: what happened and why. I am not discarding the importance of human element in collaborative endeavors but maybe such area should be pursued outside of a version control system.
There's not really any case where another source control system works better, but lots of cases where they work just as well.
Mercurial, for example, is functionally identical to git, but some people prefer its interface.
For a highly centralized organization where people only ever work on the organization's intranet, a centralized source-control system like SVN works well enough, and may have some advantages for the organization.
Actually, I found one just by reading this thread :
Fossil has integrated Bug Tracking, Wiki, Forum, and Technotes.
Which is great since I would prefer not having to worry about backing up these in the first place !
(Then git also has git-bug for at least some of this functionality...)
Git does not handle really large repos. You can search the internet for the term monorepo and see what organizations like Facebook, Google, and Microsoft are doing about that. None of them are using plain vanilla git.
Git is particularly poor in scenarios such as two people working on the same branch. SVN handles this with ease, but with Git it takes a lot of coordination amongst both contributors to keep things working.
I do not believe that is a good pattern. First, two devs probably should not be touching the same functionalities (if they are, they ought to be basically pair programming). So diffs should be orthogonal. If diffs are landing in the same branch, each dev should be using their own feature branch, and ideally PRing them back to the branch, but for small hacks, merge is fine.
Think fractally. The farther you get from master, the smaller and more atomic each commit should be.
If your merges are taking lots of coordination or failing to auto-merge, you probably have some poor engineering hygiene at play. Every time I've had merge fails, it's due to haste/sloppiness or a dev branch diverging too much from a mainline.
My favourite Github commit was someone removing their password from a test list in a penetration testing tool, because they didn't want anyone to know their password. I just tried, but couldn't track it down. The subsequent comment trail was hilarious.
> One thing Dan did here that I really appreciate was to document the commands he ran at each stage. This can be a great lightweight way to spread knowledge around a team. By reading this commit message, someone can learn quite a few useful tips about the Unix toolset:
> [..]
In the spirit of making everyone smarter: simplicity matters. Using the combination of find -print0 and piping that to xargs -0 is much easier than the mentioned abracadabra of characters.
From the xargs(1) manual:
> The options are as follows:
> -0 Change xargs to expect NUL (``\0'') characters as separators,
> instead of spaces and newlines. This is expected to be used in
> concert with the -print0 function in find(1).
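Why the NUL separator matters is easy to show with a toy example. This is a pure-Python illustration of the splitting behaviour only; it does not invoke find or xargs, and the file names are made up.

```python
# Hypothetical file names; the first one contains a space.
names = ["my file.txt", "notes.txt"]


def split_on_whitespace(stream):
    """Split on spaces and newlines, which is how a whitespace-separated list gets read."""
    return stream.split()


def split_on_nul(stream):
    """Split on NUL characters, the way -print0 output is meant to be read."""
    return [name for name in stream.split("\0") if name]


newline_separated = "\n".join(names) + "\n"
nul_separated = "\0".join(names) + "\0"
```

With whitespace splitting, "my file.txt" falls apart into two bogus names; with NUL splitting, both names come through intact.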
I don't like all this information shoved into commit logs. This should have been filed as an issue and then linked to the commit. The issues are much more searchable, readable, commentable, archivable and interactable in many different ways.
I figure that so few people read commit messages that most of the time this is kind of useless, especially in the early stage of a project. Things will change as fast as I can type a message such as this.
Code is simple. Humans overthink.
I would prefer a commit message such as: "Fix invalid byte sequence in US-ASCII when running bundle exec rspec." to a dev who gets stuck trying to write a cool message and never fixes the issue.
That very much varies. In a project where 90% of commits are “fix”, “foo”, “commit”, etc. then yes. Nobody will ever read that (or do pretty much anything else useful with a VCS).
On the other hand, when every commit message is on the level, then yes, people do read them. Actually, the first step when investigating any problem or trying to understand some code is to look at the commit log.
See e.g. Linux kernel or some of the Google-related open source projects (chromium, webrtc, etc.) for examples of good, long-form commit messages.
In a professional context (and not only), one would think that this should be the normal expected good practice. But unfortunately, that is not the case.
It is so surprising (well, not really) to see how, in most cases, developers put little to no effort into writing proper commit messages and, more generally, into keeping a clean commit history.
They simply don't care, and you keep seeing garbage commits with nonsensical or nearly empty messages and descriptions. Sadly enough, this is seen as normal and just accepted.
On every single team I have worked with, from small to large organizations, I have had to pick up the "write proper commit history" fight. And even after extensive explanations of why you should do that, people simply don't care and they keep pushing stuff like: "fix", "updated class z" and so on.
Commit history does not seem to be part of the review process.
Sometimes it is just so depressing to see how unprofessional software engineers are.
If it makes you feel better, biologists are often no better. Our equivalent to git commits is labelling tubes and keeping little excel databases of what has gone where. Often databases stop being updated or people give their tubes esoteric labels that are meaningless to those who look at them a year later. As a research assistant in a large lab, I discovered blood tubes with literally no labelling, and often spent hours searching for samples in the labyrinth of freezers in that lab. It is also not uncommon for papers to be retracted because the original authors lost the raw data!
In an organization I've worked at we used to write good commit messages, about 9 years ago, before we started using Github. Then our good commit messages turned into good PR intros. I think Github and Gitlab etc are fantastic, but I'm a little sad that so much valuable information has been divorced from the git repository itself (and of course, the ultimate fate of those PR intros across the open source world depends on the companies hosting the repo.)
Personally, rightly or wrongly, the fact that I can't use Github/Gitlab to contribute to Django and Emacs prevents me from trying to make contributions to those projects. Similarly I find the insistence on using email to send patches, frustrating, when I know PRs (MRs) work so well. However, I guess Emacs and the linux kernel are keeping their good commit messages in their git repo and not losing them to a hosting company.
Maybe there should be tooling for automatically converting a PR intro to a commit message.
I really like this commit message. I've found that switching to git from more traditional version control systems requires a lot more discipline in some ways. A lot of people just commit, commit, commit lots of incremental changes with no context or story to them. I've seen pull requests with dozens of tiny commits that together make up a cohesive effort, but individually are just useless. I've really been having to push my team to spend time cleaning up their commit history before getting their pull requests merged.
I think it's really important to capture the context and intent behind changes. I may be weird, but when I'm fixing an issue, I often try to find when it was introduced, which often provides really useful information for the fix. That's much harder to do if the commits aren't cohesive and the messages aren't descriptive.
I'm one of those people. I make lots of little commits, it gives me space to really make a mess of coding going down some rabbit hole and performing 'reset --hard' when I get too away from myself, and track what I'm doing locally. As long as each commit isn't causing a problem with CI/CD, and my pull request to master is well documented what is the value added of cleaning up commits?
> I make lots of little commits, it gives me space to really make a mess of coding going down some rabbit hole and performing 'reset --hard' when I get too away from myself, and track what I'm doing locally.
I think this is totally OK, just as long as you squash those all down before someone has to merge your PR.
> As long as each commit isn't causing a problem with CI/CD, and my pull request to master is well documented what is the value added of cleaning up commits?
Because it's hard to make sense of all those little commits later, so why keep them around? They're just noise with a very limited future value, and I don't want to have to sift through them in the future. It's basically impossible to clean up those kinds of messes once they get established in master, but it's very easy to contain them at pull request time.
> I like Git commit messages. Used well, I think they’re one of the most powerful tools available to document a codebase over its lifetime.
1000000% agree!
One of my co-workers in my previous job, I miss reading his PR and git message. It's such a joy reading his PR. I still remember reading his PR on introducing Babel to our big, old Rails 4 app before webpacker, Ruby Babel Transpiler came to life. It's like taking a journey with him. You can see his smile, struggle, surprise and all the emotional moments in his commits. He put his findings, why he made this decision, and where he found this solution in the commit msg. I learned a lot just by reading his PR.
I think reading a well organized PR, clean git commits and descriptive commit messages (even the code review comments are very useful) is one of the best ways to learn at work, especially for new hires.
This really just depends on your team/company/culture.
Lengthy commit messages are not really required if you have associated tickets in a bug-tracking or project-management system. More often than not, you'll just be duplicating info.
This is true until the company changes the bug-tracking and project-management software without properly porting over everything because they use different identifiers.
I tend to put a link to the external system with a very brief explanation, allowing someone to quickly assess the what and why with the ability to dig elsewhere for more detail.
Some of this background decision-making information can be included as developer documentation, whatever form that takes, e.g. as comments (usually for low-level) or sibling README file (usually higher-level).
Commit logs will have the greatest detail, but they also are the costliest to dig up, often requiring multiple rounds of `blame`. They are therefore most appropriate to include information pertinent at integration-time, namely code review context/justifications.
Merge commits (such as those created during typical PR/MR merges) have similar potential to include explanatory background, but at a coarser granularity, e.g. feature level.
Adding a test to prevent this error in the future would also be nice to see, something to mention in code review... Which the commit message makes much easier (without explanation this diff is confusing).
I say yes. I like to keep information about the code as close to the code as possible. Issue trackers come and go, and even if you keep the same issue tracker around, how are you going to relate the change in the code to the particular issue down the road?
Let’s hope the SaaS issue tracker you’re currently using never goes out of business or changes the product in a way that makes it worse for you. Or, if you host your own, that it keeps you satisfied in perpetuity.
Referring to the issue ID in the commit message is a fine practice in addition to writing good, comprehensive commit messages. Commit messages that consist only of an issue ID are – in my experience – utterly frustrating to deal with. They tell you nothing more than this change is somehow related to this or that bug or feature, but not how or why.
So the solution to an unreliable issue tracking solution is dumping that responsibility on your VCS? Why not fix the concerns you have with your issue tracker?
> writing good, comprehensive commit messages. Commit messages that consist only of an issue ID
Who said anything about only including a tracker id? The issue here is the extra verbosity in the commit message. What will the tracker tickets contain that isn't in your "comprehensive" commit message?
> So the solution to an unreliable issue tracking solution is dumping that responsibility on your VCS? Why not fix the concerns you have with your issue tracker?
… Or, and hear me out, how about not worrying about that, and just use your VCS to accomplish something it’s eminently well suited for?
Also, how do you propose I solve the issue of the issue tracking service maybe going out of business or that of a more compelling product coming along?
To me, the primary purpose of an issue tracker is to collaborate on and track work in progress, and that’s what I use them for. I don’t find that they are particularly valuable as historical records of the source code.
But then you switch out your issue tracker service from something like Jira to something else, and suddenly that ID or URL means squat. A git repository can easily be pushed to any git based service be it GitHub, GitLab, Gitea, or something else, and the commit log and commit hashes stay the same.
Porting Jira issues to a different system would probably not preserve those IDs that you entered into your commit message. By all means, refer to your issue tracker in commit messages, but be aware that those references may not be valid in a few years.
Just put the old ID in the new ticket, then search. Or put commit hashes into the ticket and search by that.
> A git repository .. and the commit log and commit hashes stays the same
If issue trackers are so transient and flaky, and VCSs are so solid, then back up your old issue tracker and put it in git. What if your issue tracker stays, but your VCS changes?
Since this is describing the commit and what was done and why, the commit seems like a better place.
In tools like GitHub, if you make a PR with this commit, it will also automatically put the text in the PR description.
I would much prefer this at work over what I usually see with inconsistent commit message styles and not explaining properly what was done, and not following the recommended max length per line.
But "why" includes lots of detail about the steps they went through. Why can't "there was utf-8/non-ascii whitespace in the file" cover all of that? Why detail all the steps to reproduce?
As a developer, you can also search the issue tracker, and don't have to go looking for commit in the VCS and figuring out how to search it effectively.
In a professional setting, companies usually want this information to live in the issue tracker. Mainly to provide insight to managers/other teams without looking at commit messages.
But it removes the information from the code: you now need to look at the issue tracker to make sense of changes, eg when looking at the history of a file, or with git-blame.
I'd argue that all relevant information that affects the code and architecture of the specific repo should live in commits. They should not only tell you what was changed, but also why; and provide enough context to understand the change in the scope of the repo.
All information that is not directly tied to the code, eg cross-repo/product/etc concerns can go in the issue tracker.
Of course this only works well with a `git commit --fixup` and squash + rebase / squash + merge workflow.
And with monorepos it also becomes a tooling problem.
The commit message still logs in the usual way, but it carries the whole set of information with it, in a way that a centralised ticket system doesn't.
When a developer is looking at logs for solving some problem, they can easily review the rationale for changes.
I'd much prefer the log explaining everything, rather than having to look to a ticket that may no longer exist.
> Why doesn't it? Are closed issues not searchable?
Everywhere I have ever worked has changed ticket systems at one point, and even when they are transferred, the transfer is "lossy".
Heck, just moving from one JIRA version to another can be "lossy".
> How do you provide comments or updates on an issue "relevant to the commit" without arbitrary commits?
In the ticket system. I was not advocating that a ticket system is pointless, because they are very useful.
But a decent commit message about why a change is necessary, especially when it may not be straight forward, can save you plenty of development time further down the line.
A great question! I’m still wondering why we all mostly use separate tools for tickets and version control (and knowledge base, and collaboration, and management/hiring, the list goes on), given we already know a lot about software development.
I think it's more than that. Sometimes I feel there is a disconnect between code history and repo contents, e.g. the current repo often contains a folder with the entire history of migrations, changelog etc.
In my opinion absolutely no.
It should go in a Jira issue and the commit should have the Jira id in the message.
In this way when blaming you are just one click away to the full issue description, with properly formatted text and screenshot if needed.
It will also be visible to all the team in the Jira board and they don’t have to click on the specific commit to notice that problem.
WHY > WHAT... really helped my documentation, comments, commit messages. I think it’s an under utilized part of coding.
I just can’t wait for bigger companies to get onboard. At least in EE/CE it’s tough trying to figure out INTENT sometimes. I see the code, service, headers, docs but you’ve never told me how you intend for me to use this! Maybe your plans were awesome but would change how I was planning to work your thing in. A single page on INTENT would go a long way sometimes.
I really like Google's guidelines for commit messages because they enforce a style like this. It really makes dealing with legacy code much easier when you can look at past commits and see what your predecessors were thinking.
https://google.github.io/eng-practices/review/developer/cl-d...
I intentionally add non-ASCII characters to our code, so that an incorrectly configured IDE or bad tool fails.
75% of the development team has at least one non-ASCII character in their name, so it would be pretty rude otherwise.
It's much better to knowingly reject a tool at the start, since it can't handle ordinary characters, than find out a year later with the first e.g. British customer that it can't handle "£".
I tested that on a text file I was working on ... and I discovered that the file contained a BOM (U+FEFF), not at the start of the file, but at a random point in the middle of the file. I've deleted it. Who knows what problems it might have caused for me later?
You could have a pre-commit git hook that refers to a whitelist of allowed non-ascii characters, or also allow all alphabetic characters, or something like that.
It should be checked as part of the CI too, if you're doing that, some people might never install the hooks, and some people might git commit --no-verify.
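For what it's worth, the check itself can be tiny. Here is a rough Python sketch of the allowlist idea, not a finished hook: the `ALLOWED` set and the `find_disallowed` name are made up for illustration, and the choice to accept every Unicode letter is just one of the options mentioned above.

```python
import unicodedata

# Hypothetical allowlist: characters the team has explicitly agreed to permit.
ALLOWED = set("£§±²")


def find_disallowed(text, allowed=ALLOWED, allow_letters=True):
    """Return (line_no, column, char) for each non-ASCII character not permitted."""
    problems = []
    # Split on "\n" only, so unusual line separators are inspected, not swallowed.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) < 128 or ch in allowed:
                continue
            # Optionally accept any Unicode letter (general categories L*).
            if allow_letters and unicodedata.category(ch).startswith("L"):
                continue
            problems.append((line_no, col, ch))
    return problems
```

A hook or CI job would presumably run this over the changed files and exit non-zero when the list is not empty. With letters allowed, names like Péter pass, while a stray non-breaking space or a U+FEFF in the middle of a file gets reported with its line and column.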
This issue would never have happened using modern software. And if your text (including source code) still isn't in UTF-8, you're doing it wrong.
(I guess unless you're using some specific, very performance-conscious system that has to use 7/8-bit characters and is never going to connect to the Internet anyway.)
Documentation stored in your repo, unit tests, and comments. If your README.md includes code bracketed with backticks, you've got non-ASCII characters in your repo.
Things like Péter Rózsa or Kg.m² or ±0.5 PPM or C11 Standard §6.3.1.3/2 all work with my toolchain. Why mangle them into ASCII when there's no need to?
Maybe it works in your English codebase, but I'd be very hesitant about rolling it out everywhere. For example, Perl 6 supports unicode in identifiers so it's perfectly valid to write your code in Japanese. Another increasingly common example I've seen in Python is the use of emoji in command line tools.
I like spending some extra effort on my Git commits (really, typing a message like this doesn't take that long, compared to the amount of time spent doing the change in the commit). It's just a shame that both GitLab and GitHub do not render Markdown in the body of the commit messages, and present them like actual prose rather than a wall of monospaced text.
...where ticket 340295 (wherever, not necessarily Github Issues) goes into more detail about the cause, investigation, and resolution process, as a history of the evolution of said process across a conversation.
IMHO you should pick the right medium for each kind of message, and I don't think a git commit message is the right medium in OP's example. Most companies have a ticketing tool to keep track of additional information like: how long did it take to solve that issue, was there any input from other people, long "marked up" message explaining what was wrong, how it was fixed and tested (markup is a lot easier to read than console log).
Using ticket number already forces you to use "the right tool" to view the message.
IMHO I would consider OP example as a bad practice and your example as a better solution.
The other side of this argument—why I’m torn—is that the git commit message isn’t necessarily written by the developer who developed the solution, but might instead be written by a project maintainer who received the solution as a patch, or is copying a fix from a downstream or sibling-fork project, as a standalone patch.
In other words: if you have a ticket tracker with this information captured in it, it makes some sense to just link to it. But if you have a mailing list with this information captured in it—in only the loosest amalgamation, where there’s no clear “thread” that contains the whole discussion, and the original developer of the patch might not even be a part of that discussion—then it seems like it’d be important for the final committer to write their own summary of the events that led to this commit, such that people can understand what went on if they weren’t following the list. And where do you put that summary? The commit message.
I would argue that maybe this is a core part of git’s design, given that it was developed specifically for the LKML style of “patches first, sent to a list, and then discussed in the context of what they solve”, rather than the GitHub style of “discussion first [on open issues] sparks PRs that attempt to solve [i.e. close] the issue.” Git assumes that you, as an “editor”, are going to be summarizing an otherwise-illegible discussion history for the benefit of the people viewing the commit; and so it provides a multi-line commit message as a place to stow that editorial summary.
This is awesome until you can't access the ticketing system, or there is context outside that system, or even better the company decides to change tools and loses the mapping to old issues (true story. joy of startups).
By all means put the ticket number in the header line, along with type of patch.
But do yourself a favor and put enough context in too. Doesn't have to be all the detail.
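If you want to nudge people toward that, a commit-msg style check is easy to sketch. The following Python is only an illustration under assumptions: the ticket pattern (`PROJ-123` style), the 80-column limit, and the `check_message` name are all hypothetical and would need adjusting to your tracker and conventions.

```python
import re

# Assumed ticket pattern, e.g. "PROJ-123"; adjust to whatever the tracker uses.
TICKET_RE = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")


def check_message(message, max_width=80):
    """Return a list of complaints about a commit message; empty means it passes."""
    lines = message.rstrip("\n").split("\n")
    header = lines[0]
    complaints = []
    if not TICKET_RE.search(header):
        complaints.append("header has no ticket reference")
    # Convention: one-line header, then an empty line, then the body.
    if len(lines) > 1 and lines[1].strip():
        complaints.append("second line should be blank")
    body = [line for line in lines[2:] if line.strip()]
    if not body:
        complaints.append("no body explaining why the change was made")
    if any(len(line) > max_width for line in lines):
        complaints.append("line longer than %d characters" % max_width)
    return complaints
```

It cannot judge whether the body is any good, of course. It only catches the case where the ticket number is all there is.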
The problem with this approach is when the team changes its ticketing system or the codebase goes to a different team (that has a different ticketing system), meaning all that context is lost. Happens all the time.
I ran into a similar issue a few years back. There was the same non-space whitespace character spread throughout the codebase. I tracked it down to a single developer, who had no idea why that was happening. My guess was that they were copy/pasting a lot from MS Word documents, but we never found out for sure.
The worst thing about Gitlab and Github is that commit messages are not immediately editable in a PR/MR conversation. You can find conversations like this all over {gitlab,github}-dot-com but you cannot find them in commit messages anymore.
Should have been in a doc or wiki instead of commit message.
I have never seen any dev searching for error messages in commit messages.
For the rest of the points (makes smarter, builds trust and compassion), if it's so worthy, put it on a blog (like this blog post itself) so it has the potential to reach some audience.
> I have never seen any dev searching for error messages in commit messages.
I've seen many devs search for error messages in Github search. That often turns up results in people's comments in issue threads, but the search also includes commit messages.
I have had github issues come up for search results frequently, but never a commit message.
But then again, I am not saying "don't write the story". If you think you found something worthy, just write a blog post or even a pastebin/gist would be better in terms of the number of people it reaches.
Agreed. My team used to like extensive commit messages, so that if you had trouble with a piece of code you could just git blame it and take a look at the referred commit.
The problem with that approach is that it doesn't survive as well as you'd like, because fixing a typo in a line would get you "ownership" of the line (since only the last person to change it is blamed). It's even worse in semi-major refactors due to moved/renamed files being treated as new....
> fixing a typo in a line would get you "ownership" of the line (since only the last person to change it is blamed).
That's a UI/UX/usability problem with blame, not an inherent one with the practice. Github's blame UI solves this very elegantly (blame history can be traversed easily), as do some others.
> It's even worse in semi-major refactors due to moved/renamed files being treated as new....
This on the other hand is a real problem with Git, but I don't see that it's strictly related to putting context in commit messages. This issue occurs either way.
I think this should have been an issue. Open an issue with the original error, document the troubleshooting in the comments of the issue, then close the issue with the commit.
“Convert utf-8 to us-ascii to fix error”. Says right there in the commit title, short and to the point.
Absent some automated summarising functionality that produces just the right level of detail for you personally, a concise title and a very detailed commit message that you can skim through to find relevant bits is an eminently reasonable compromise.
One of my favorites, and I can't find it now. Someone's last commit message on the deprecated perl project was in 3D ascii art; something along the lines of "No More Perl." I have a few projects I'd like to write that to :)
General education in a commit is questionable... OTOH find -exec and escaping the ';' (or '+' for xargs-like one line) was helpful (hard to parse --help, though manpage is clear). Now I don't know what to think.
The Holy Order of Git Commit Log Bikeshedding and Overengineering have their field day. And they all have it wrong. Any commit log that doesn't compile to an automated build script isn't worth the bytes it's made of.
What I gather from this is that commit messages are a better place for documenting stuff like this than polluting every line of code with often redundant comments.
I think the git log is an interesting place to document this kind of thing. Also, there should never be redundant comments. I do think though that lots of folks put things in the git log that _should_ be a code comment. For myself (as most of the people I can think of that I work with), we don't reference git commits unless we are actively investigating a previous change. You should not hide reasons for code being the way it is in the git commit; it should be exposed for direct observation to someone who might edit the code. Ex: '// the following delete call is required to remove the item from the DNS cache to ensure the test validates non cached dns items'. That should be in the code.
I often review commit logs of my teams, especially while we are tracking down problems or I'm making sure the release notes capture everything.
There has to be a balance between this and "WIP"; I'm imagining trying to page through the commit log to see what changed when every 1 line change has a 35 line commit associated with it.
I love descriptive git commit messages. For those complaining about the length:
1) Just read the title
2) Maybe have TL;DR section for a summary, and details for who cares.
3) Use pretty printing for git log
Having a rich commit message is extremely important for capturing code review discussions and design decisions and glue things together for the set of changes in a single place. The advantages far outweigh any negatives from a short commit message that will not give you the "why" of the solution.
You've repeatedly crossed into personal attack on HN recently. That's not cool. I've banned this account until we get some indication that you've read https://news.ycombinator.com/newsguidelines.html and sincerely want to use HN in its intended spirit. That spirit is intellectual curiosity and kind, thoughtful conversation.
>> I don't want your entire life story in my commit log.
>Then I never want to work with you ever (or any code base you ever touched) because of your laziness...
I love coaching smart junior engineers, and the first thing I teach them is using Git properly and the importance of good commit messages! :)
The problem is not someone who doesn't know how important it is but wants to learn. The problem is people with this kind of (lazy?) attitude who are NOT WILLING to do what is absolutely necessary in the long term.
The commit was complaining specifically about the way locales were designed, not C as a whole. While C was very successful, I don't think you could argue that C locales ever reached the same level of popularity.
That being said I do agree with you that this complaining is not massively productive. Dealing with localization and non-ASCII text is notoriously difficult. Look at Java, Python 3, Windows, PHP 6 and how many "misconceptions about Unicode" articles you can find online. You could spend hours pointing out how each approach has tons of drawbacks but clearly the perfect solution doesn't seem to exist so a compromise had to be found.
In particular I'm not sure I agree with his complaints about locales being global state. How else would you handle them? You need to have some kind of global config flag somewhere to decide what locale the user wants to use. Having explicit versions of the stdlib taking locales as parameters could be nice I suppose.
This bit in particular seems to completely miss the point:
>Idiotically, locales were not just used to define the current character encoding, but the concept was used for a whole lot of things, like e. g. whether numbers should use "," or "." as decimal separaror.
Of course if this programmer assumes that locales and charsets should be the same thing they'll end up frustrated.
When locales were invented, it was reasonable to assume that the locale would determine the character set. With the subsequent invention of Unicode that no longer needs to be the case, but code standards live forever.
It wasn't really that reasonable as soon as it was made global. It's exactly the kind of assumption that mostly works but has dark corners from the beginning.
Some imaginable version of locales was probably a good approach at the time, but that wasn't what got standardized.
I don't disagree, I've run into those dark corners myself. But the ease of use of globals is undeniable - there's a reason the Singleton pattern is still popular after all the ridicule it attracts.
Sure, it's easy to use. But that isn't the job of standardization bodies, or the right test to use on something as low level as this. It's harder to get right, but it is exactly these sort of failures in standardization that cause the most global pain, because they wend their way through everything.
In this case I don't think the locale functions were designed by committee, I think they were accepted as-is from a popular implementation. And they were implemented that way because it was the simplest most straight-forward way (I'll admit I'm just guessing now). Were they part of K&R C?
>When locales were invented, it was reasonable to assume that the locale would determine the character set
Sure, but that's not what the commit is saying. Instead it's saying that it should only determine the charset and that Unicode effectively makes locales pointless. That's absolutely not the case, there's a lot more to locales than character encoding.
I think we're in agreement. There are lots of aspects to a locale, and character sets are only a small part of that. Possibly the most visible part though.
Unicode doesn't make character sets pointless, it only makes them deprecated. It's still useful to have a way to convert from one set to another, and it's a shame the standard library doesn't have an easy way to do that. The deficiencies of locales are visible only in hindsight.
Locales being global state only makes sense for single-user applications. That assumption is no longer true once you have a server where every request may be from a different user. A better way to handle this is something like Go's context object, which gets passed explicitly.
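To make the "passed explicitly" idea concrete, here is a minimal sketch in Python rather than Go. `NumberFormat` and `format_amount` are invented names for illustration, not part of any real library; the point is only that the formatting settings travel with the call instead of living in process-wide state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NumberFormat:
    """Hypothetical per-request formatting settings (not a real library type)."""
    decimal_sep: str = "."
    thousands_sep: str = ","


def format_amount(value, fmt):
    """Format a number with two decimals using only the settings passed in."""
    # This format spec always yields "," for grouping and "." as decimal point.
    text = f"{value:,.2f}"
    whole, frac = text.split(".")
    return whole.replace(",", fmt.thousands_sep) + fmt.decimal_sep + frac


ENGLISH = NumberFormat()
GERMAN = NumberFormat(decimal_sep=",", thousands_sep=".")
```

Two requests from different users can then call `format_amount(total, ENGLISH)` and `format_amount(total, GERMAN)` side by side, and neither has to touch something global like `locale.setlocale` that the other might trip over.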
That's not a very common use case for C, either now or back then (arguably you probably had more user-facing C cgi back then, but it was still one invocation per user so arguably you could set the locale for each call). Some webapps use C in the backend but generally don't deal with the localization at this level.
I'm not saying that C locales aren't bad and limited, I'm saying that it's a compromise that makes some sense. In particular when you're trying to bolt something into an already extremely popular language instead of designing a new thing from the ground up.
Can you imagine the churn if the C standard suddenly introduced a whole new set of string functions just to deal with the locales? Well, you don't really have to imagine, just look at the way it works on Windows with their wide strings.
This comment comes across really mean and judgmental. I didn’t read all of the original commit, but it read to me as informative and passionate with all of the frustrations solving a difficult problem comes with.
Honestly, this comment is so over the top I can’t even detect whether you’re serious or not.
HN is a community. Users needn't use their real name, but do need some identity for others to relate to. Otherwise we may as well have no usernames and no community, and that would be a different kind of forum. https://hn.algolia.com/?sort=byDate&dateRange=all&type=comme...
> My over the topness is on purpose. The tone of the commit message resonates with exactly the same tone as my post (to me).
No, you see. The author of the commit was being humorous. You're just being an asshole.
> by a person who added a media player, a gold cup holder, on top of their foundations, walls, & plumbing?
C would be worthless if it wasn't for the programs that were built on it. To argue that the passion of the C developers is worth more than the passion of a media-player developer makes zero sense.
> A true craftsman would build a replacement without the perceived flaws.
Great thing the author never claimed to be "a true craftsman."
> I’ll give them props when they release their general purpose language
Ah, the "you can't criticize art unless you're an artist" argument.
It’s a shame you see it only as complaint, the brilliant thing about the text you’re replying to is that it educates the reader as to the bigger problem, as well as being very funny. Does your complaint do either? Understanding why C locales are broken may very well contribute to the lasting growth of humanity. Did you note that Eric Raymond fully agreed?
I hope you can in the future find the value of using humor to make your comments enjoyable to read, like the author wm4 did.
> I don’t really care about social media norms and use throwaways liberally
Did you mean that to be funny? It’s quite ironic following your “rant by someone with no real responsibility” comment. That seems pretty funny to me, but not clear that was the intent.