My Favourite Git Commit (fatbusinessman.com)
1313 points by robin_reala on Oct 18, 2019 | 370 comments

I use the following convention to start the subject line of a commit (posted by someone in a similar HN thread):

    Add = Create a capability e.g. feature, test, dependency.

    Cut = Remove a capability e.g. feature, test, dependency.

    Fix = Fix an issue e.g. bug, typo, accident, misstatement.

    Bump = Increase the version of something e.g. dependency.

    Make = Change the build process, or tooling, or infra.

    Start = Begin doing something; e.g. create a feature flag.

    Stop = End doing something; e.g. remove a feature flag.

    Refactor = A code change that MUST be just a refactoring.

    Reformat = Refactor of formatting, e.g. omit whitespace.

    Optimize = Refactor of performance, e.g. speed up code.

    Document = Refactor of documentation, e.g. help files.

Formatted for my fellow HN-disenfranchised mobile sufferers:

Add = Create a capability e.g. feature, test, dependency.

Cut = Remove a capability e.g. feature, test, dependency.

Fix = Fix an issue e.g. bug, typo, accident, misstatement.

Bump = Increase the version of something e.g. dependency.

Make = Change the build process, or tooling, or infra.

Start = Begin doing something; e.g. create a feature flag.

Stop = End doing something; e.g. remove a feature flag.

Refactor = A code change that MUST be just a refactoring.

Reformat = Refactor of formatting, e.g. omit whitespace.

Optimize = Refactor of performance, e.g. speed up code.

Document = Refactor of documentation, e.g. help files.

Wonderful. Thanks.

Yes. As is convention, commit messages should be a one-line header, then an empty line and a body (if necessary).

The whole thing should be width limited to 80 or 100 characters.

And the subject line should complete the sentence "If this commit is applied, it will...". It should start with a capital letter, then move to lowercase, and necessarily will start with a verb.
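A rough sketch of how those three rules could be checked mechanically. The function name `check_message` and the wording of the problem strings are made up for illustration, and the "starts with a verb" part is not checked because that would need a word list or a tagger.

```python
def check_message(message: str, max_width: int = 80) -> list[str]:
    """Return a list of problems; an empty list means the message passes."""
    lines = message.splitlines()
    if not lines or not lines[0].strip():
        return ["missing subject line"]
    problems = []
    first_word = lines[0].split()[0]
    # Subject starts with a capital letter, then moves to lowercase.
    if not (first_word[0].isupper() and first_word == first_word.capitalize()):
        problems.append("subject should start with a capitalized word")
    # A body, if present, is separated from the header by one empty line.
    if len(lines) > 1 and lines[1].strip():
        problems.append("second line should be empty")
    # The whole message is width limited (80 here, 100 is the other option).
    for number, line in enumerate(lines, start=1):
        if len(line) > max_width:
            problems.append(f"line {number} is longer than {max_width} characters")
    return problems


print(check_message("Add retry logic\n\nExplain why the retry is needed."))
print(check_message("add retry logic\nno blank line here"))
```

The first call prints an empty list; the second reports both the lowercase subject and the missing blank line.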

"It should start with a capital letter, then move to lowercase, and necessarily will start with a verb.", are you seriously?

Seems like they are. Also seems like a solid practice that far too many people ignore.

or you can say something equivalent yet unambiguous

A good commit message isn't about convention, and no convention makes a commit message good.

When I review a commit, I need only the information I won't get from the diff that I need to understand the context and the behavior.

My brainpower is a limited resource and extra noise in my signal is extra work.

I'm totally on board with the idea that `ISSUE_123456 fixes defect` and `wip` are insufficient, but I'm not writing a code blog post in there.

At my work we almost encourage the blog post in there idea.

It's not a hard and fast rule and it's ok to ignore it when it makes sense. But we also don't mind if your commit message takes longer to write than the code took to change and debug.

A lot of context is assumed in commits, and almost all of it is temporal. Capturing as much of that as possible pays off down the line.

Commit in the article is a good example where the context explains much more than the change.

(On the flip side, the payoff has an expiry date, so I'm not extremely fussed when people are lax, but it's still good to check in basic assumptions with your code.)

I fix about 2~4 issues a day, with 20~40 commits a day. If I wrote a blog post for every commit, my productivity would drop by at least 70%.

Well that's the argument about immediate productivity and good documentation.

Can you elaborate on your surprise or objection? It may not be obvious to onlookers what you're reacting to.

well, I think if you start with this kind of bureaucracy, you're doing it wrong. It's not funny. Also, imagine your app is down, clients are calling, and you need to fix it fast. If I need to remember these rules to fix this, people will kill me, or I will kill myself :p

After a few days practicing this, it becomes second nature, and doesn't hold you up at all. Commit histories also become much quicker to parse through, especially with "log --oneline".

I notice no hindrance to my speed from writing good commit messages, and I always value them when I come back to them.

If you follow GP's convention then yes, and it's not a bad convention.

> And the subject line should complete the sentence "If this commit is applied, it will...".

I kind of do it like this. Others use subjects that complete the sentence "This commit...", so their subjects will start with "adds", "fixes", etc. Though that adds one or two extra characters!

Like many of the pedantic things we like to argue about, I think teams/projects should choose consistency over prescription.

I think that’s opposing incorrect things, in this case. I think the spirit of The Law should be clear and kept alive. Whether to “[If applied, this commit will] <commit msg here>” or “[This commit] <verb> <work description>” is a tiny matter of difference when the point is clear, straightforward, complete commit msgs of logically discrete-and-coherent commits. Both the above formats would fit the bill, and I think I’d be thrilled if my biggest issue w code commits amongst my team were only these two slightly different formats.

Consistency is just prescription by precedent

Is the "subject line" the same as the "header" in this schema?

Yeah, first line, w/e

Yes! I thought this was a great idea.

The other thing I do is add a "MODULE:" prefix to my commit message. It makes them way more readable. Lots of other people do this too; you can see this in Redis, Linux, and Go codebases, for example.

So your message might look like "router: add support for path vars". Much easier to read than "Add support for path vars to router"
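As a small illustration of why the prefix helps tooling as well as readers, here is a sketch that groups commit subjects by their module prefix. The regular expression and the `(none)` bucket are assumptions made for the example, not something Redis, Linux, or Go prescribe, and the third subject is invented.

```python
import re
from collections import defaultdict

# Hypothetical pattern: a short module name, a colon, a space, then the summary.
PREFIX = re.compile(r"^(?P<module>[\w./-]+): (?P<summary>.+)$")


def group_by_module(subjects):
    """Map each module prefix to the summaries of the commits that carry it."""
    groups = defaultdict(list)
    for subject in subjects:
        match = PREFIX.match(subject)
        module = match["module"] if match else "(none)"
        summary = match["summary"] if match else subject
        groups[module].append(summary)
    return dict(groups)


log = [
    "router: add support for path vars",
    "Add support for path vars to router",
    "router: fix trailing slash",  # invented subject
]
print(group_by_module(log))
```

Subjects without a prefix all land in one bucket, which is roughly what a reader skimming the log has to do by eye.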

Wanted to mention this.

I don't stick to it as a rule but if it's clearly grouped in some way it gives people in the wider org a high level fyi about the area of code you're messing with.

Makes it easy to jump over the commit when bisecting for regressions or hastily looking for the buggy commit to revert.

If someone made enforcing this convention a git hook I would use it in a heartbeat.

I.e. Love using husky to enforce code conventions through linters...

Only issue is it would be hard to completely enforce in code and obviously require some review but still nice
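For what it's worth, a minimal sketch of such a hook in Python, assuming the verb list from the top of the thread. Git runs an executable named `commit-msg` in the hooks directory with the path of the message file as its first argument and aborts the commit if the hook exits non-zero; everything else here (function names, the error text) is made up. As noted above, it can only check the first word, not whether the verb was chosen honestly.

```python
import sys

# Verbs from the convention listed at the top of the thread.
ALLOWED = {"Add", "Cut", "Fix", "Bump", "Make", "Start", "Stop",
           "Refactor", "Reformat", "Optimize", "Document"}


def subject_is_valid(message: str) -> bool:
    """True if the first word of the first line is one of the allowed verbs."""
    lines = message.splitlines()
    words = lines[0].split() if lines else []
    return bool(words) and words[0] in ALLOWED


def main(argv) -> int:
    # Git passes the path of the file holding the message as the first argument.
    with open(argv[1], encoding="utf-8") as handle:
        message = handle.read()
    if subject_is_valid(message):
        return 0
    print("Subject must start with one of: " + ", ".join(sorted(ALLOWED)),
          file=sys.stderr)
    return 1  # a non-zero exit status makes git abort the commit


if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(main(sys.argv))
```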

I also use something like this. A variation of Angular's commit convention[1]. There are tools like commitizen[2] that can help new adopters to build commit messages.

[1]: https://gist.github.com/stephenparish/9941e89d80e2bc58a153

[2]: https://github.com/commitizen/cz-cli

Every commit message must start with JIRA ticket number

I work at a place that used to prescribe this. I dislike it because it adds too much repetitive noise to the abridged git log. (It occurs to me that you may be making this point in a tongue-in-cheek way.)

Originally I suggested we put the JIRA number/link in the commit body, but then I learned about git-notes to add metadata to commits and now I kinda want to do this with the semantic labels suggested by the thread OP, too (I currently use the schema suggested at https://seesparkbox.com/foundry/semantic_commit_messages).

And I've used git nearly every day for more than a decade and written a git client, and this is the first time I've heard of notes. Thanks!

Notes kinda bloat the repository with additional objects, and they can be removed independently of the commits, so I'm a bit iffy on using them for this.

Good points to consider, thanks! It’s good to know all the tradeoffs.

We do this, and I don't find it that bad:

The branch is called feature/<Ticket ID>-<short-description>

The commits contain the usual commit messages. When merging, changes are squashed. The final commit message is similar to the branch name and includes the ticket and a short description, which is similar to the mentioned pattern (bump/add/change/whatever).
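A sketch of deriving that final commit subject from the branch name. The ticket ID format (`ABC-123`) and the exact output layout are guesses for illustration; the comment above only gives the `feature/<Ticket ID>-<short-description>` shape.

```python
import re

# Assumed branch shape: feature/<Ticket ID>-<short-description>, with a
# JIRA-style ticket ID such as ABC-123 (the ID format is a guess).
BRANCH = re.compile(r"^feature/(?P<ticket>[A-Z]+-\d+)-(?P<description>.+)$")


def squash_subject(branch: str) -> str:
    """Build a squash-commit subject from a feature branch name."""
    match = BRANCH.match(branch)
    if match is None:
        raise ValueError(f"unexpected branch name: {branch!r}")
    words = match["description"].replace("-", " ")
    return f"{match['ticket']} {words[0].upper()}{words[1:]}"


print(squash_subject("feature/ABC-123-add-path-vars"))
```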

This is a good way to complicate development, slow down development, and add an unnecessary dependency on JIRA...

I find that awful, because now half your subject line is burned by completely useless information: the JIRA ticket number doesn't provide any useful information when reading a git log / summary.

Yeah, and if you no longer use JIRA it's not useful. Whatever bug tracking software you use should allow for commits to be attached to bugs. But that should stay in the bug tracking software, not the VCS.

That sounds awful, but I/we do name feature branches as team-<TICKET #>(TICKET TITLE), with the title usually optional

Some additions I personally use:

- Change: change functionality or behavior.

- Cleanup: more than formatting, less than refactoring. No functional changes.

- Rename: refactoring that changes a name but does nothing else.

- Extract: refactoring that creates a new module or class by moving code out of an existing module or class.

Isn't that convention just the English language?

I like the convention, but I challenge the idea that optimization can be just a refactoring.

Why? (Honest question, hear me out)

Is it a vocab consistency thing?

I see pretty consistent vocab used across orgs anyway, so given there is shared domain knowledge / language at play, I'm not sure what the goal for standardization is?

Don't take this the wrong way, I use a lot of these when appropriate, but I don't think I could agree that refactoring must be just a refactor, and I don't think I want to limit anyone's commits to this list of change types either.

Forcing changes into these words means a lot of stuff is pointlessly bucketed when more appropriate wording could be used.

Say I want to push a commit "prioritise shipping route a over route b".

So I have to put it under start? Optimise? Fix? Why? Prioritise is the right word, why not just use that? Why play mind games when we have a whole dictionary to draw from?

I really appreciate this kind of commit message. There are some very good ones in the Mercurial logs too:




https://www.mercurial-scm.org/repo/hg/rev/4a0d0616c47d (all modesty aside)

Long commit messages aren’t that atypical. Have a stroll through the logs:


Mercurial inherited this style of commit messages from Linux email-based code review (the Mercurial originator was a kernel hacker), because in that workflow your commit messages are kind of a persuasive essay for why your commit should be accepted. I believe that writing commit messages with that kind of goal in mind, thinking “why should you take this commit?” is a good motivator for writing something good and useful.

I think my favorite (in terms of humor) is a commit from mpv complaining about locales and encodings. You can practically feel the committer's sheer frustration.

[1] https://github.com/mpv-player/mpv/commit/1e70e82baa9193f6f02...

Final paragraph:

> All in all, I believe this proves that software developers as a whole and as a culture produce worse results than drug addicted butt fucked monkeys randomly hacking on typewriters while inhaling the fumes of a radioactive dumpster fire fueled by chinese platsic toys for children and Elton John/Justin Bieber crossover CDs for all eternity.

I also nominate this commit.

Pure, rage-filled, git commits are possibly the most honest form of art.

It's where the commit message comes in the process that makes them noteworthy. They're the final chance for a developer to vent their spleen or blow their trumpet before signing off on the results of possibly days of work.

George Carlin would probably approve and applaud.

Also from the commit message:

> like Shift JIS (sometimes called SHIT JIZZ)

Back when I was working on an outsourced Japanese project, the code wouldn't compile unless the computer's locale was set to Japanese, because the C code had comments in Shift JIS.

My favorite (in terms of dark humor, if we’re honest) is YOLO, one of the more interesting deep learning object detectors. [1] It is the exact opposite of yours in every way. The code is brilliant however.

Even the papers are snarky. [2]

[1] https://github.com/pjreddie/darknet/commits/master

[2] https://arxiv.org/pdf/1804.02767.pdf

> But maybe a better question is: “What are we going to do with these detectors now that we have them?” A lot of the people doing this research are at Google and Facebook. I guess at least we know the technology is in good hands and definitely won’t be used to harvest your personal information and sell it to.... wait, you’re saying that’s exactly what it will be used for?? Oh.

> Well the other people heavily funding vision research are the military and they’ve never done anything horrible like killing lots of people with new technology oh wait.....

> The author is funded by the Office of Naval Research and Google.

He's being funded for this. This is fantastic.



This is the kind of guy I'd love to hang out with.

That resume is something else.

i'm all for adding an artistic flair to set your resume apart from others... but that's a bold move, cotton.

It's not about trying to catch attention, it's saying "I'm so good that even though this is totally unprofessional you're still going to want me"

It's the resume equivalent of Culture ships with silly names/AI personas.

At a certain level, it's boring to just be good. The interesting challenge becomes to stay good while being silly as fuck.

And that's why I'm serious and professional.

...you're going to want me...kept away from others due to HR cringing at my presence.

You're not filtering him out of the hiring pool, he's filtering you. ;)

Yeah, my own resume isn't (or wasn't, it has been years since i updated it) that flair-y, but it does have a sidebar with screenshots of stuff i worked on (with my face doing a weird look at the top) and is in mostly prose style with bold text for each of the projects i worked on (bold helps to scan for stuff, at least that was the idea). I've been told a few times (by friends) that this isn't how resumes are supposed to look, but my line of thinking is that if you are so stuck up that you want a specific format for a resume, then you're most likely too stuck up for me to want to work with you :-P.

My team is in a full-on hiring spree right now. I've opted out of it because there's no process.

Ultimately, it boils down to people's gut feelings, which is so disgustingly random.

The real WTF is: Apparently there is a place called "Unalaska" in Alaska!

My understanding is that in the local language "Alaska" means "Peninsula," and "Unalaska" means "Near the peninsula."

I was reading it and kind of interested, and thought this line was funny:

> I have a lot of hope that most of the people using computer vision are just doing happy, good stuff with it, like [...] tracking their cat as it wanders around their house

And then suddenly realized I've wanted to do exactly that for a long time. Well... specifically install a camera that can detect my cat on the counter (and not my hands doing stuff) and sound an alarm/puff air to get him off.

Could this work for that? I think it could! I know my winter project...

On the topic of cats vs automation, I'd recommend reading this post [1] describing the arms race created by the writer's cat attempting to break into an automated feeding machine. HN discussion here [2]

[1] http://quinndunki.com/blondihacks/?p=3023 [2] https://news.ycombinator.com/item?id=13230904

We had an automated feeder with two chambers, each held down by a rotating, ticking timer switch. You rotate the switch to, say, 12 hours in the future, and 12 hours later the slot lines up and the lid pops over.

After many months, my cat learned to stand on top of the lid and use both paws to rotate the switch forward until the lid popped open.

This blew my mind. The switch was separated from the lid. I could imagine the cat attacking the lid itself, but this separate mechanism, requiring a motion completely distinct from the motion of an opening lid, and requiring patience without instantaneous reward or even evidence that it was going to work.... I just couldn't believe it.

Never underestimate captive animals (or humans). They have nothing but time to observe your patterns of behavior and learn from them...

There are few tech stories I enjoy more than the back-and-forth of breaking and improving a thing. A story where one side of the conflict is a cat means this might be my new favorite!

I've had good luck using low tech puzzle feeders with my cats. Each one is a bowl with a maze inside or a tower with various windows and trap doors they need to work the food out of to be able to eat. As the food levels get lower the pieces get harder to reach, which allows their laziness to take over and stop them from eating too much.

One example: https://sep.yimg.com/ay/entirelypets/kyjen-dog-games-slo-bow...

I love how he just randomly carries on his own internal dialog in his writing as if he really just doesn't give a fuck who's reading it. Totally brilliant!

I really like his licensing model, especially the META license: https://github.com/pjreddie/darknet/blob/master/LICENSE.meta

> 4. Things We Tried That Didn’t Work

I love seeing a section like this when reviewing a paper. I really wish more authors would include one. (Goodness knows I've chased down enough dead-ends in some of my own research efforts.)

Yeah, it's fun, but, seriously, git log is messy AF. I wouldn't appreciate it one bit if somebody did that to a project I'm involved in.

What's funny, though, is that the paper (written in pretty much the same "fuck you" manner) is much more readable and informative than the average one. Which says a lot about science papers out there.

Number 2 might be the greatest technical paper I've ever read. Brilliant :D

Oh. Locales. The remembered pain.

Save a file in Notepad. Open in vi. See that it is different. Find data in the database, with no clue what the weird characters were originally supposed to be. And so on and so forth.

I once wrote a reasonable program and sent it as a bug report to the maintainer of the Perl module DBD::File. He sent it as a bug report to BerkeleyDB. They said they never thought about it but yes, that would be silent data corruption with no way to recover. The program? Maintain an address book in a BTree sorted in the current locale. Enter names. Change locale and insert something else. Voila! Lost data with no way to recover!
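To make the failure mode concrete, here is a toy sketch. It is not the commenter's Perl program and does not touch BerkeleyDB: a plain sorted list stands in for the BTree, and two made-up key functions stand in for two locales' collation orders. A binary search that trusts the stored order stops finding entries once the collation it compares with differs from the one the data was sorted under.

```python
def find(entries, name, collate):
    """Binary search that trusts `entries` to be sorted under `collate`."""
    low, high = 0, len(entries)
    target = collate(name)
    while low < high:
        middle = (low + high) // 2
        if collate(entries[middle]) < target:
            low = middle + 1
        else:
            high = middle
    return low < len(entries) and entries[low] == name


# Two stand-in collations (hypothetical, in place of real OS locales).
def case_folded(text):
    return text.casefold()  # roughly how many language locales order names


def byte_order(text):
    return text  # code point order, roughly what the "C" locale does


# The address book is built while the first collation is active.
names = sorted(["alice", "Bob", "carol", "Dave"], key=case_folded)

print(find(names, "Bob", case_folded))  # lookup under the original collation
print(find(names, "Bob", byte_order))   # same data after a collation change
```

The second lookup misses even though "Bob" is still stored, and an insert performed under the new collation would put entries in places the old order never looks.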

> Oh. Locales. The remembered pain.

More like the ongoing pain.

I had to write the following just this year because SQL Server still defaults to using CP 1252 for text. The culprit? One of those damned stylized quotes that Office loves to insert for you. The code:

    import logging

    def _wrap_str(value: str):
        try:
            # Arguments elided in the original comment.
            return SqlVarChar(...)
        except UnicodeEncodeError:
            logging.getLogger("bulk copy").exception(f"value causing error: {value}")
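The snippet above does not say which character was the culprit, so the following is only a general sketch of how CP 1252 problems show up in Python. It leaves `SqlVarChar` out entirely; `fits_cp1252` and the sample strings are invented for the example.

```python
def fits_cp1252(text: str) -> bool:
    """True if every character in `text` has a CP 1252 encoding."""
    try:
        text.encode("cp1252")
        return True
    except UnicodeEncodeError:
        return False


curly = "\u201cquoted\u201d"  # curly double quotes: CP 1252 has these
prime = "5\u2032"             # U+2032 PRIME: looks like a quote, not in CP 1252

print(fits_cp1252(curly))
print(fits_cp1252(prime))

# The reverse mistake: UTF-8 bytes read back as CP 1252 give mojibake
# instead of an exception.
mojibake = "\u2019".encode("utf-8").decode("cp1252")
print(mojibake)
```

The encode direction fails loudly, which is what the logging in the snippet above catches; the decode direction fails silently, which is usually the worse case.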

I would call that Windows pain at this point, not locales.

In the Linux/Postgres world, everything is UTF-8. Which is the default for all internet protocols. Do that and the pain is gone.

Of course Windows doesn't do that...

Linux is not entirely UTF-8, though plenty of people treat it as if it is. Even on Linux, you might need to consume files from other OSes or from other Linux systems with different locales. I once had issues with systems configured with the "C" locale vs "en-US": they should have been near identical, but there were enough slight differences to cause failures. It's been 10+ years, so I don't remember the details.

Windows, the OS, is UTF-16 (or UCS-2 - I forget the details between the two); SQL Server has, for historical reasons, defaulted to CP1252, probably for compatibility with Office components.

But it's not really a Windows problem, per se, because you have to deal with this issue if you deal with data originating from numerous Windows apps, even on Linux. Yeah, you can insert a byte order mark (BOM) to indicate UTF-8, but most tools expecting UTF-8 don't actually check for the BOM and blow up in interesting ways if it is present. I've seen this far too many times. Enough that anytime I see an encoding error from the likes of Python or Ruby, it's instantly recognizable (I do a lot of ETL work with files from a number of vendors, so I see a lot of different file "types").
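A short sketch of the BOM problem in Python terms, with made-up file contents. The `utf-8` codec keeps a leading BOM as the character U+FEFF, while the `utf-8-sig` codec drops it if it is there.

```python
# Made-up file contents that start with a UTF-8 byte order mark.
data = b"\xef\xbb\xbfid,name\n1,Ada\n"

plain = data.decode("utf-8")         # keeps the BOM as U+FEFF
stripped = data.decode("utf-8-sig")  # drops a leading BOM if present

first_header_plain = plain.split("\n")[0].split(",")[0]
first_header_stripped = stripped.split("\n")[0].split(",")[0]

print(repr(first_header_plain))
print(repr(first_header_stripped))
```

With the plain codec the first column header is no longer equal to `id`, which is the sort of thing that surfaces later as a missing-column error rather than an encoding error.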

SQL Server prior to 2019 stores Unicode data in UCS-2 (a UTF-16 analog, roughly). SQL Server 2019 supports UTF-8.

Bugs in the C library string localisation have previously caused a problem in PostgreSQL as well:


Since Postgres 10 they use ICU instead of relying on the C library string routines to give more control.


I have an Untangle game on my phone... is there an SVG editor that could do something similar?

I've found Graphviz / DOT to be surprisingly handy at times:

* https://www.graphviz.org/gallery/

* https://en.wikipedia.org/wiki/DOT_(graph_description_languag...

"this is a simplified representation" :D

Wonder if there's some more ideal way to lay out those so they are less of a tangled web. This graph makes me want an adjacency list.

This needs to be REQUIRED READING at the Open Group and the ISO C standards committees.

I'll quibble just a bit and say that:

  a) the C locale should be a UTF-8 locale...
     that tolerates invalid sequences (because
     the C locale historically is a just-use-8
  b) even with new functions that take a locale
     handle, we need functions that use a global
     one, however that global one should be set
     once and NEVER changed in the life of the
     process, and it should be set either before
     main() starts, or before main() does anything
     that needs a locale, or starts any threads.

> “Those not comfortable with toxic language should pretend this is a religious text.”

Not a commit but from the same author, so you may enjoy: https://github.com/wm4/dingleberry-os/blob/master/README.rst

It does go off on a few tangents, but it's an interesting read.

> Descent 2 custom level reviews

You weren't kidding about the tangents.

This message says much more about the author than it does about the commit.

Really? I would say 90% of the commit message is technical, rest is just emphasizing his frustration, which is pretty much justified. Yes, he is a person that seemingly gets pissed off by bullshit design decisions and whatnot. So?

Yeah. People who use the term "retarded" that way are stupid, no matter how smart they are.

I disagree. Words are just words and we give them meaning. Being derogatory and unkind to mentally deficient folks is ethically wrong. Using that word in a different context to communicate frustration is, imo, fine.

Try telling that to my boss, who has an autistic child. I held your opinion until I stuffed my foot in my mouth in a meeting. Now I don't use that word.

There it is.

"When you omit courtesy you're throwing sand in the gears of a machine that doesn't work too well in the first place."

~Heinlein (I'm paraphrasing.)

There's no glory in being a boor, and no shame in being courteous.

None of this is news: https://en.wikipedia.org/wiki/Etiquette#History

    A fool never learns,
    A man learns from his mistakes,
    A wise man learns from the mistakes of others.

Wait until a derogatory word being used affects you directly. It's really easy to just not use words flippantly that you know some people have an issue with. You are clearly aware of how it can be offensive, and for a lot of people your definition is still just a callback to people being unkind about mental illness. History matters too, and if you choose to ignore that and use it anyway you're just being unnecessarily inconsiderate.

I don't recommend you to join chats in competitive online games.

That may be so, but its active usage in one social group does not justify the discourtesy of its usage, especially in a professional setting.

Yeah, people are idiots. Incremental progress is better than nothing, though.

That is... wonderful. I've spent some time dealing with locales in C and other places that depend on the things being discussed in the commit. Just reading it brings back some of the rage I felt.

Agreed, that's an over the top commit message.

I do disagree with the assertion that it takes a lot of code to convert between the various UTF variants, 3 pages is an overestimate. https://stackoverflow.com/a/148766/5987
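To give a feel for the size of the job, here is a sketch of the two core pieces in Python: turning a code point into UTF-8 bytes and into UTF-16 code units. It is not the code from the linked answer, the function names are invented, and decoding plus validation of malformed input (where most of the real-world length comes from) is left out.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one Unicode scalar value as UTF-8."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if code_point < 0x80:
        return bytes([code_point])
    if code_point < 0x800:
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


def utf16_units(code_point: int) -> list[int]:
    """Return the UTF-16 code units (one, or a surrogate pair) for a scalar value."""
    if code_point < 0x10000:
        return [code_point]
    offset = code_point - 0x10000
    return [0xD800 | (offset >> 10), 0xDC00 | (offset & 0x3FF)]


print(utf8_encode(0x20AC).hex())
print([hex(unit) for unit in utf16_units(0x1F600)])
```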

“mpv”, “sheer frustration”... I didn’t have to click the link to know it was authored by wm4.

This is why i stick with ASCII :-P.

For anybody wondering, the likely origin of the invalid character is somebody using an Apple Ireland/UK keyboard layout where # is Option-3 (AltGr-3), and non-breaking space is Option-Space (AltGr-Space).

I recently added a Rake task to one of our builds which checks for the exact problem mentioned in GP, after having 3 separate occasions in the last 6 months where OSX "smart" characters have changed the encoding of a file consumed by things expecting pure ascii.

Unfortunately it is a bit of a hack that shells out to "file -i", but I'll take it over hours of frustration.

I don't know how many times these non-breaking spaces caused problems. I think linters should prevent commits that contain non-breaking spaces. And if really one is needed, it should be encoded as `&nbsp;` or with whatever encoding is relevant.
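A sketch of what such a lint check could look like, in Python rather than Rake and without shelling out to `file`. The function name and the report format are made up; it flags every non-ASCII character, so a project that legitimately needs some of them would have to add an allow-list.

```python
import unicodedata


def find_non_ascii(text: str):
    """Yield (line, column, description) for every non-ASCII character."""
    # Split on "\n" only: str.splitlines() would also split on some
    # non-ASCII separators and hide them from the check.
    for line_number, line in enumerate(text.split("\n"), start=1):
        for column, char in enumerate(line, start=1):
            if ord(char) > 127:
                name = unicodedata.name(char, "UNKNOWN")
                yield line_number, column, f"U+{ord(char):04X} {name}"


source = "task :build\u00a0do\nend\n"  # invented snippet with a no-break space
for hit in find_non_ascii(source):
    print(hit)
```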

…or fix the non-Unicode compatible systems that are consuming the commit messages and breaking? If they fail with an nbsp then they’re probably also going to fail with more obviously useful non-ASCII characters.

How often are non-breaking spaces purposely inserted vs accidentally? And the tools might handle them fine but will produce strange results or errors. An example is inserting a non-breaking space in a document or string. It will prevent word wrap, which might not have been the user intent. A linter that requires these spaces to be explicitly set in encoded form would avoid these issues.

Non-breaking spaces are required to typeset French correctly, at least: https://en.wikipedia.org/wiki/Question_mark#Stylistic_varian.... The accidental insertion of non-breaking spaces is a possible issue, and it's a bit harder to detect than other typos, but it's also probably not as bad as other typos. Overall I think it's a bit hard to make the case that they should be disallowed.

I was forced to gain very intimate knowledge of a web based rich text editor that would use non-breaking space characters as markers to monitor current user selection.

> whatever encoding is relevant

Such as UTF-8?

Ah I would have bet good money on an apple product being involved.

This gives me ideas.

My commits are usually short and sweet - to the point. I document my code very well, however.

One of my strengths in a previous life as a Master Automobile Technician was the ability to document the entire process -- from duplication of a concern, to troubleshooting, to correction, to verification...it's literally how I got paid (which I never understood why so many automotive techs took short cuts while documenting, especially for warranty concerns where you deserve to be paid (by the manufacturer) for everything you did that was necessary to fix the concern the first time).

I could be mistaken (life-long coder, former network engineer / architect for nearly a decade, but I'm currently in my first-ever role as an actual backend developer). I think I was told to keep commits to one line unless absolutely necessary. I'll have to bring this up though. I like the idea of searching through the git logs for specifics, as opposed to having outlook search through the git commits for actual pieces of code changed, or error codes which might not actually be there.

In either event, I'd at least like more descriptive, yet still short, messages such as what strictfp suggested: "Replace invalid ASCII char. Fixes rake error 'invalid byte sequence in US-ASCII'". That said, I really like having some reasoning and logic - or the how/why - in those easily-searchable logs as well.

I think some people remember everything and some people don't (although we're all probably roughly similar logically). I have a hard time remembering what I ate for lunch yesterday - that's why I count on good documentation to function as efficiently as possible in the future. I've worked with people who have an absolute uncanny ability to remember 'stuff'. That's impressive, but I do not have that ability myself.

> I think I was told to keep commits to one line unless absolutely necessary.

The advice I've heard: Your first line should be a concise summary of the commit. This is because a lot of UIs only show the first line up front. (GitHub, git log --pretty=oneline, etc.) However, it's okay (and often encouraged) to go into further detail on subsequent lines.

> I think I was told to keep commits to one line unless absolutely necessary.

I think I would phrase something more usefully as: keep it as short as possible, but no shorter.

I find it to be fairly rare that a commit is so self-evident that only the summary line can do.

My standard has been:

Commit messages should describe why you're doing something ("X asked", or "[reams of supporting evidence why this needed to be made faster but more confusing]"). It provides context to current reviewers, and future archeologists who wonder what you were drinking at the time. Perhaps you had a good reason for doing [insane thing X]! Perhaps you didn't. If you didn't write it down, they might change or leave it, and break something or prevent something from getting a proper fix.

Code comments should be notes to code-readers that are relevant at all times until changed or deleted. "How to use this", "beware changing X", "Z is hot garbage and should be replaced if used for Q". Ideally you'll have asserts or tests or something that actually enforce this, but of course that's not always a realistic option. Comments in code will follow the code around, and don't require chasing code history through N layers of refactoring and indentation-wars, which is what makes commit messages mostly inappropriate for needs like this.

I do feel like Git commit descriptions are severely under-utilised for sure, but I believe there is a reason for that which, until fixed, will prevent rich and contentful commit descriptions from flourishing.

In the article order: the screenshot is from a commit detail page. How often do you land on this page? You need to specifically click through. If you are in a commit list, the only thing that sets title-only commits and commits with a description apart is an ellipsis link which practically blends in with the background. It is neither well integrated nor discoverable. Also, I don't believe the commit descriptions render as Markdown (unlike issues), which is also a shame as it feels less like a doc then. But I might be wrong on this. But even outside of GitHub, how many other UI/IDE plugins and other kinds of Git tools restrict commit display to just the title and put the description on the sideline? Most of them.

I think this further leads to the currently low value of the commit message being searchable. Since it is exceedingly rare for good commit messages to exist, no one thinks to search them. People default to Google, when their own project's codebase/knowledge base could hold the answer to their query.

I don't have much of an opinion on the commit message telling a story / having a human touch. I mean, it doesn't hurt I guess, but until _full_ commit messages become more "mainstream" (for lack of a better word), they can be as human as they like, but they will live in solitude.

In the same vein, I wish there would be a standard workflow for putting code inside commit logs.

The typical use case would be database migration scripts: IMO they are always a pain to version properly because fundamentally Git and all the other software versioning tools let you describe the "nodes" (in the graph theory sense of the word) of a codebase ("at commit A the code was in this state, at commit B it was in the other state"), but fall severely short when it comes to describing the edges between nodes ("in order to go from state A to state B, you need to execute this code").

I think the temporal dimension of software engineering is still poorly understood, and severely undertooled.

Strongly agree. I run into this a lot with stuff like style commits. You want to ensure that a change hasn't slipped under the radar.

I tend to go for something like:

This commit was generated.

<shell script here>

It would be super awesome to have a tool that easily verifies that A->B can actually be reproducibly accomplished by performing the actions in the commit message.
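As a rough starting point for such a tool, here is a minimal Python sketch of the first step only: pulling the script back out of a commit message that follows the "This commit was generated." convention above. The marker string and the function name are assumptions made for illustration; actually replaying the script against the parent commit and diffing the result is left out.

```python
from typing import Optional

# Marker line taken from the convention in the comment above.
MARKER = "This commit was generated."


def extract_generated_script(message: str) -> Optional[str]:
    """Return the script that follows the marker line, or None if absent."""
    lines = message.splitlines()
    for index, line in enumerate(lines):
        if line.strip() == MARKER:
            # Everything after the marker is treated as the script.
            script = "\n".join(lines[index + 1:]).strip()
            return script or None
    return None
```

A verifier could then check out the parent commit, run the extracted script, and compare the working tree against the commit's tree.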

A few version control systems are change-based rather than snapshot-based — darcs¹ and pijul² that I know of.

¹ http://darcs.net/

² https://pijul.org/

> when it comes to describe the edges between nodes ("in order to go from state A to state B, you need to execute this code")

Isn't that what a patch file gives?

Sometimes no. For instance a database migration script is not equivalent to the diff between two database creation scripts.

While it should be possible to take a database creation script (state A) and a database migration script (edge A->B) and infer the new resulting database creation script (state B), the reverse is not true.

This is the tables vs events duality described in this great article: https://engineering.linkedin.com/distributed-systems/log-wha...
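To make the node/edge distinction concrete, here is a small sketch using Python's built-in sqlite3 module. The `users` table, its columns, and the migration are all hypothetical, invented for this example. Both paths end with the same schema, but only the migration says how existing rows get from state A to state B, and that is exactly what a diff of the two creation scripts cannot tell you.

```python
import sqlite3

# Hypothetical creation scripts for state A and state B.
CREATE_A = "CREATE TABLE users (id INTEGER PRIMARY KEY, full_name TEXT);"
CREATE_B = (
    "CREATE TABLE users "
    "(id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);"
)

# Hypothetical edge A -> B: it carries knowledge (split on the first space)
# that appears in neither creation script.
MIGRATE_A_TO_B = """
CREATE TABLE users_new (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
INSERT INTO users_new (id, first_name, last_name)
    SELECT id,
           substr(full_name, 1, instr(full_name, ' ') - 1),
           substr(full_name, instr(full_name, ' ') + 1)
    FROM users;
DROP TABLE users;
ALTER TABLE users_new RENAME TO users;
"""


def columns(conn):
    """Column names of the users table, in declaration order."""
    return [row[1] for row in conn.execute("PRAGMA table_info(users)")]


def rows(conn):
    return conn.execute("SELECT * FROM users ORDER BY id").fetchall()


def state_via_migration():
    """State A plus one row, then the edge A -> B applied on top."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(CREATE_A)
    conn.execute("INSERT INTO users (id, full_name) VALUES (1, 'Ada Lovelace')")
    conn.executescript(MIGRATE_A_TO_B)
    return conn


def state_via_creation_script():
    """State B built from scratch: same schema, no history, no data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(CREATE_B)
    return conn
```

Comparing `columns()` for the two connections shows identical schemas, while `rows()` differs: only the migrated database still has the user, now split into first and last name.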

The patch file gives the output of the execution of the code.

A basic example would be: run `find . -name '*.py' -execdir sed -i 's/foo/bar/g' {} +` on a repo, and commit the result.

For those not familiar with POSIX shell stuff, that will find and replace 'foo' with 'bar' in every .py file across the repo.

The command is far more understandable at a glance than the patch (commit) and is far more likely to be reviewed properly.

How do DB migrations relate to the source control graph?

It seems you're implying the codebase changed as a result of a script that itself is not source controlled. I can think of style commits falling in this category, like one of the children of this comment mentions, but DB migrations don't seem to be related.

Where they're really valuable, IMO, is when you're tracking down when/why a change was made with `git blame`. When you're looking for the reasoning behind a change, it's extremely helpful if there's a detailed commit message going along with it.

If only important/tricky commits have long messages it's fine. Most IDEs will show you the full text if you hover over the line, so when working on something it's easy enough to check them. I'm a huge fan of the timemachine-like "Show Git History for Selection" view that IntelliJ has, so usually there's no reason to go to a GitHub/GitLab commit page.

OR you could just write

Replace invalid ASCII char. Fixes rake error 'invalid byte sequence in US-ASCII'.

I don't want your entire life story in my commit log.

> I don't want your entire life story in my commit log.

I[1] want enough debug information in the commit log to be able to reproduce the issue without having to go on web hunts to understand the problem. Especially when the change appears to be trivial on the surface, because these are the ones that can turn out to be rabbit holes.

I don't want to have to interrupt you to get this information because you didn't write a good enough commit message, and you probably don't remember anyway. I don't want to go look at an external issue tracker that I may not have access to, or that may not even exist anymore.

[1] Where "I" is: me, your future self, a future maintainer, a junior dev, an open source contributor.

To me, at least, the issue with that commit message is the signal-to-noise ratio. There is a lot of exposition for each piece of information. I prefer a more declarative commit message. However, from the writing, I suspect this is just due to the committer not being a native English speaker.

e.g. the first paragraph doesn't lose any important information trimming it down to:

"After adding a test matching the contents of router_routes.conf, `bundle exec rake` fails with:

        invalid byte sequence in US-ASCII

Realistically, it would have been a better commit message if they'd given the shortlog SHA where the test was added that exposed the bug rather than an explanation of what the test does.

"After adding test <testname> (08c3e17), `bundle exec rake` fails with:"

Hmmm, I've always believed that no commit should break a build, even if you're committing the fix right after. Otherwise you're going to cause problems for `git bisect` or other practices of going through the history to find where a problem may have started.

Do other people commit breaking tests and then fixes?

I think this depends quite a bit on what other contributors are doing - it's one of those cases where several approaches are acceptable, but inconsistency is not.

"Commits should always build" is one doctrine I've seen. As you say, it makes bisecting and other error-analysis approaches easy. On the other hand, it risks either having large, opaque commits, or adding overhead to make intermediate commits build - possibly with flawed/meaningless behavior when they do.

Another is "the trunk should always build". In that case, you'd just squash branch commits down to logical groupings that are easy to analyze, whether or not things build. You can bisect on the trunk, but lose all guarantees about state on branches.

Finally, I've seen variations on "no commits that break the product", "no commits that make things worse", or "no committing failing tests without subsequent fixes". In this case, you can't generally commit broken builds, but can specifically add failing tests. The first rule just means "adding failing tests is ok", the second means "converting runtime bugs to failing tests is ok", and the third means "write your test and fix, but split (and ideally tag) the commits". All of these break bisect, but they guarantee the project itself won't become more broken from commit to commit, and they can help with other forms of reasoning about where bugs first occurred.

Every approach there seems viable if you stick to it. If there's no established practice, I suppose the best choice would be based on what sort of work and debugging is most likely to apply.

I might be missing something basic here. Isn't the "no commit should break a build" impossible to enforce on a codebase where you need to push a commit to run the tests?

Something where you can't test locally, like when testing on multiple architectures or when the tests just take too long for a laptop.

In my workflow, that would be in a feature branch, and exploratory branches can certainly break, but before I made a PR I would rebase my changes such that none of the commits broke the build.

It's also a different situation. The original one is "I've made a test that shows a problem." Your example is a surprise "I don't know whether this will pass my cloud-based tests." I would edit my branch if I had a surprise failure, since my initial code clearly wasn't correct.

I do this, but not in a way that ends up on master. It's a driving force behind my preference for squash-and-rebase merge patterns.

A good bugfix PR is often two commits then: one with a test to catch the breakage, another to fix it so the tests pass. Reviewers can see the failing-then-passing CI job logs, so if they agree your test catches the bug, they have additional CI-automated validation your fix worked.

Then as long as you squash when completing the merge, you get the best of both worlds.

Having worked with the committer, I can tell you that he’s definitely a native English speaker.

I prefer that the commit includes the addition of a test (or a few) in the test suite that gets fixed. This is good because:

* It ensures that the bug is real. [1]

* It ensures that the bug is fixed. [1]

* It prevents regressions (assuming the tests are run automatically).

* The test may prevent regressions in other related code, or discover other hidden bugs.

* It brings you closer to a 100% test coverage.

* You don't have to guess how to reproduce the bug, reading the comment.

* If the bug depends on subtle configurations, they should be set in the test. [2]

From time to time there are bugs that are obvious in the code, but it is too difficult to write a test for them.

[1] Been there, done that.

[2] Once I found a bug that depended on the local timezone.

100% test coverage is such an overrated stat.

Write a regression test (including its documentation) instead of just documenting the issue in human interpreted language, immutably.

Your future maintainer will thank you for not having to dig through repository history.

It goes without saying that commits should include tests that cover the change, where possible.

> immutably

That's what makes this modus operandi so powerful IMO - comments in code may go unmaintained, tests may start failing for other reasons, issue trackers come and go, developers leave the company, documentation rots.

The commit message is (unless you have a bad actor) immutably linked to the original change, and that's exactly why you should be thorough in expressing its reason for being. I can git checkout the point in time (perhaps having bisected) and have the information to allow me to reproduce the issue.

> Especially when the change appears to be trivial on the surface

Comments about the code should be in the code, where the next dev will see them. The more trivial a change, with far-reaching implications, the more important this is.

Doing so has heaps of benefits: future devs understand ramifications, it shows that this code has been scrutinized, and it makes refactoring/yanking or porting code easier.

That said, leaving the life story out will always be a good idea.

Agreed, but in this case, it was an encoding/whitespace change so there isn't really anywhere else to put this info.

IMO the repo is the code

However, I would have done a simpler commit and linked to an issue where I explained the problem/solution in more detail

This assumes your issue tracker doesn't change. I've been at my current position 8 years, and in that time we've had 3, and the first 2 are shut down.

We use Gitlab so it has both in the same project - but like you said that could change

> I don't want to go look at an external issue tracker

Related question: are there projects that use git itself as issue tracker?

Pagure [0], Fedora's git forge, hosts code, issues, docs, and pull requests as four separate git repositories under the hood [1]. However, only project administrators can clone most of those repos.

[0]: https://pagure.io/pagure

[1]: https://docs.pagure.org/pagure/usage.html

I can imagine that working to a degree: Make a fork of a commit at an issue, then merge that fork back in with master at point of fix. Bit of a mess in the tree though.

This. It's the same with comments in code:

I don't want to read what the code does (I can read that myself, thanks!), I want to know WHY it does it the way it does it - especially, if there is a more obvious, better way.

Also: People leave companies. Or die. At some point in time, you won't be able to ask the original author.

No thanks. If I had a dime for every function that's so obviously self-documenting to what it does.. etc. If you tell others what it does and why, concisely and thoughtfully, no one has to try and mentally parse the what of some clever undescriptive block of code.

"You spent an enormous amount of time learning X, which is encoded in this three-letter bugfix. Don't make the next person go through that too."

On the other end of the spectrum you get ImageMagick useless commit messages[0].

That extreme aside, I'd rather have commit messages that delve into why and how the commit alters the behavior for the better, rather than a cryptic message such as 'Replace invalid ASCII char'. Now we have documented reasoning and thought process that can aid future debugging. They can also be beneficial for new devs hacking on the project, or students learning how to implement and improve systems.

Personally, I enjoy reading these. The Go commits often have commit messages like these, and they are shared on HN often for a reason. They're learning material. They can't go on a wiki because they're tied to a particular set of changes at a particular point in history. They also can't be comments on the code because they're tied to particular lines in different files, and code comments can only cover a set of consecutive lines in one file.

One recent example I could find is this[1]. Yeah, it fixes ^Z, but why didn't the old approach work? Why did it work for some time then didn't? How did it change? Why is this commit optimal, if it is? All of this along with scenarios to reproduce the issue.

Give me your life story anytime over a cryptic message.

[0] https://github.com/ImageMagick/ImageMagick/commits/master

[1] https://github.com/golang/go/commit/610d522189ed3fcf0d298609...

Agreed. When at some point the website that they are pointing to changes in the future they will lose all context on why a change was made.

I believe in the "plane flying across the ocean without WiFi test", or basically anywhere without Internet access. If I am on a plane flying across the ocean without WiFi, do I have the information in the git commit to understand what happened? A git message that consists entirely of a link to a website is useless in that case.

You can write the brief summary in the first 80 characters, like OP did. Then write details in the body below, in case someone needs the context. Most tools display only the first 80 characters unless you expand the body.
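If you want to nudge a team toward that shape, a check along these lines could run from a commit-msg hook. This is only a sketch in Python: the 80-character limit comes from the comment above, while the function name and the wording of the problems are made up for illustration.

```python
from typing import List

# Limit taken from the comment above; adjust to your own house style.
MAX_SUBJECT_LENGTH = 80


def check_commit_message(message: str) -> List[str]:
    """Return a list of problems with the message layout (empty if fine)."""
    problems = []
    lines = message.splitlines()
    subject = lines[0] if lines else ""
    if not subject.strip():
        problems.append("subject line is empty")
    if len(subject) > MAX_SUBJECT_LENGTH:
        problems.append(
            f"subject is longer than {MAX_SUBJECT_LENGTH} characters"
        )
    # A blank second line keeps the summary separate from the body.
    if len(lines) > 1 and lines[1].strip():
        problems.append("second line should be blank")
    return problems
```

A body is deliberately not required here, since trivial commits may not need one.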

This case is probably longer than necessary, but I've saved a day of debugging on multiple occasions due to someone (also myself) leaving some lines of context, reasons and reasoning after the high-level description.

> I don't want your entire life story in my commit log.

Why not? Where else do you want it? Is something forcing you to read the full commit log?

There's no length limit on commit messages and commit messages are mostly out of the way. Most VCSes have a way to only show you the first line. So if you want summaries, that's what the first line is for. If you want the full story, that's what the body is for.

Combined with annotate/blame, commit messages can be very helpful source-level documentation. Nobody has ever complained about too much documentation, and commit messages are the perfect time to document what happened because it's one of the few times where our tools actually force us to write something in order to proceed. As long as we're being forced to write something, write something good and informative.

I think the problem isn't the length or content of the commit message, but its organization. It needs to have the most important information first. It reads as an "entire life story" because it is written in a narrative, sequential form. Better organization would make it skimmable, and later coders could only read as far as they need to.

If I'm searching commits, I'm trying to find a record of what changed and when. I only want clues, and quick skimming is paramount. I want no personality. I want concise, descriptive commit messages.

That said, we reference an ID from our project management software with every commit, so once I find the commit I'm looking for, I can reference it back to external documentation. I still discourage personality there as well because it can get out of hand and clutter the comments, but it's more forgivable than being on the commit itself.

The pull request is a good place to put such a large amount of information. That would also be a good way to make sure it is seen by the broader team instead of burying it in commit history. You could make the argument that then it would not be part of the git history and therefore could be lost if you change hosts.

I will make that argument. The hosting is ephemeral, the commit message is eternal.

Plus, what if you want to know what happened and you're simply offline? Let's not unnecessarily break the D in DVCS.

I never understood this philosophy. What makes Git more eternal than any other technology? Why is putting all of your data in one monolithic tool a good solution?

You might change your issue tracking solution. You might change your host solution. You might change your review platform. You might also change your VCS solution. Nothing is eternal.

The commit message itself is way more eternal than GitHub ephemera. There's plenty of old codebases in git that were imported from SVN (or even older RCSes) with all commits intact. What's likely not intact is data in ancient issue trackers from decades past. git is a DVCS, so anyone can clone the repo and get all the commit information. Cloning the issues and such is not nearly so trivial, and isn't a part of the git protocol itself so there's no guarantee it's in any kind of interchangeable format.

Important information should not just be in PR comments. It should be added into the commit information itself so that it'll be maximally available going forward. A good, fully explanatory commit message is a huge asset, and those commit messages will exist for the entire lifetime of the codebase. Anything else, not so much.

I didn't say anything about git. I said the commit messages are eternal, and none of those changes will change the commit messages (except, perhaps, changing the VCS, but usually that will preserve commit messages too).

You'd be foolish to do a VCS migration that discards the commit messages. I've never seen it happen personally, as people tend not to be that foolish. I've worked with legacy codebases that went from CVS -> SVN -> git and all of the commit messages going back to the very beginning are intact, because why would you ever do a migration that doesn't maintain them?

This is a really convincing argument. I was with the parent commenter until I read this; I was like, this is totally PR stuff! But hadn't considered offline situations, or host switches. Thanks op!

> I don't want your entire life story in my commit log.

I agree with this, but I think yours is too short.

Scientific papers typically introduce enough information such that a person familiar with the field but not an expert in that particular area can understand generally what's going on.

That's my ideal for a commit message as well: someone generally familiar with the codebase but who hasn't looked at this specific code (or perhaps not in a few months) should be able to understand what's going on; then the job of the reviewer is basically just verification.

My "template" is normally something like: 1) What's the current situation 2) Why that's a problem 3) How this patch fixes it. So in this case, it might look something like this:


Convert template to US-ASCII to fix error

$functions use `.with_content(//)` matchers to do X. These matchers require ASCII content. The $foo template contains a non-ASCII space; this results in the following error:

ArgumentError: invalid byte sequence in US-ASCII

Fix this by replacing the non-ASCII space with an ASCII space.


No need for a life story, but still searchable, and has enough information for even a casual contributor to do a useful review.

I really like that commit message - though it'd be nice to link to any sort of issue/task tracking ID that's relevant to that piece of work.

“I didn't have time to write a short commit message, so I wrote a long one instead.”

Pascal! My favorite language

I, too, would rate this a substandard git comment. Dave basically vomited a bug ticket of information, which is highly contextual and irrelevant ... like the lines he was faced with, which tell us nothing in the future nor anything we could not see in the change. The error is known, from the ticket being addressed. Documenting what error a bundler throws in the application deployment, within git seems...silly, since it will likely not apply to all points in time. That's why we have separate issue tracking.

There was a whitespace encoding issue AND the developer didn't really understand the issue, since they ended with "One hour of my life I won't get back.". Over my 20 years, I've seen this EXACT scenario multiple times across multiple companies. Some jr engineer gets stuck with some troublesome weird error in a corner-case that ends up being a non-standard whitespace. It's a learning opportunity and he lamented it because it was different and nobody told him "we could stop this from happening again, generate a new issue".

There are salient improvements that the git commit would benefit from, in both comment changes and additional code:

1. Include a (new) feature ticket that is linked to this issue - to create a process that doesn't allow for this again (eg fix a linter)

2. Include the name of the bug ticket (Convert template to US-ASCII to fix error) in the commit title, that was being addressed.

3. Create a test to specifically enforce the us-ascii encoding or add necessary rules to a linter.
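For point 3, the core of such a check is small. Here is a hedged Python sketch (the original project is Ruby, so this shows the idea, not their test suite); the function name and the sample config line in the usage note are invented for illustration.

```python
import unicodedata
from typing import List, Tuple


def find_non_ascii(text: str) -> List[Tuple[int, int, str]]:
    """Return (line, column, Unicode name) for every non-ASCII character.

    Lines and columns are 1-based. Lines are split on "\n" only, so
    exotic Unicode line separators are reported rather than swallowed.
    """
    findings = []
    for line_number, line in enumerate(text.split("\n"), start=1):
        for column, char in enumerate(line, start=1):
            if ord(char) > 127:
                name = unicodedata.name(char, "UNKNOWN")
                findings.append((line_number, column, name))
    return findings
```

A test or linter rule would read each template, call `find_non_ascii`, and fail with the reported positions if the list is not empty. For example, a line such as `proxy_pass<NBSP>http://example;` would be flagged as NO-BREAK SPACE at the exact column, which is a lot quicker than an hour with a hex editor.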

For critical applications, I for one would like to know the story behind a commit, preferably in the commit itself and not a reference to an external system like idk, Jira.

My favorite examples of commit messages are the Linux kernel, where you can tell that they're being specifically crafted instead of just used as a work log to be ignored. This means that ten years down the line, people can still see when a change was made and why, who was involved, who signed off on it, etc. Have a look at the commits at https://github.com/torvalds/linux/commits/master

This is true when you can reference the commit to an issue. Then, seeing the simple commit message you can select if you want to dig up what happened by reading up the comments at the issue.

On the other hand it really gets on my nerves when people don't use the task/issue/whatever manager system appropriately. Recently, I lost a couple of days trying to figure out how to compile a c++ framework because the other guy didn't document his pipeline. In general I'm really disappointed by the majority of my colleagues for the lack of comments inside and outside of our codebase, and this is a persistent issue at all the companies I've worked for. Along with other similarly irritated people, I always ask for documentation if it is not given.

> Recently, I lost a couple of days trying to figure out how to compile a c++ framework because the other guy didn't document his pipeline

Some people do it for job safety. The logic is if you don't document things and the knowledge is only in your head then you are more valuable, they can't get rid of you easily. If you document everything meticulously, then you are easier to replace.

> Some people do it for job safety. The logic is ...

Has anyone actually seen this logic work out well for the person that invokes it? Generally the type of person that uses it is one that you probably don't want on your team.

I have!

The company promoted the guy and raised his salary because he had planned to leave the company.

I know in instances like that, though, my next step would be to start working on contingency plans, as if someone has proven themselves to be indispensable, then that very fact is a risk that needs to be managed.

Like having a new guy learn the material, taught by the old guy who doesn't want anyone else knowing it!

> Recently, I lost a couple of days trying to figure out how to compile a c++ framework because the other guy didn't document his pipeline.

This is assuming documenting the pipeline would have helped! You may have spent a few days instead figuring out why your seemingly identical setup couldn't reproduce the build...

Not that I'm bitter about build systems or anything.

Talented coworkers don't need documentation very often... If someone can't figure out how to compile something, it's likely they are missing knowledge about the language in general...

Then use `git log --oneline` and you don't have to see the lengthy details, until the inevitable day when you find you need them.

How do you surface them when you need them, though? git grep?

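Grepping works if you're trying to search all the logs (that would be `git log --grep=<pattern>`; plain `git grep` searches file contents, not commit messages). I would think this detailed documentation would be most important when you're trying to understand a specific file or line of code. In that case, it's: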

- git blame (who wrote this?)

- git show (look at the commit surfaced by blame)

Sure? Or 'git log', then use the pager to search, or pipe into something with better fuzzy search, etc?

Generally speaking sure, there's no need to make things more complicated than they are, but the author even found some evidence in the history that indicates other people found this message useful.

The powerful thing about this is having everyone put this kind of info in the same place IF they think it might be useful to the next person.

Still, you've left out the detail that you've confirmed there are no other instances of this in our codebase. I'm also firmly in the "all commit messages should include a test plan" camp, so you should at least say how you found the error ("bundle exec rake was run before and after").

I get you're being terse for demonstrative purposes, but even eschewing verbosity we should still convey all the pertinent information.

This is good, I'd add:

> Replace invalid ASCII char. Fixes rake error 'invalid byte sequence in US-ASCII'. See #123

So people can get the life story if they want it.

One downside to that approach: it requires your issue tracker to be stable for long periods of time. I've worked in a number of places where that's not true and you end up needing to figure out that the #123 linked by the system you're using now was actually #123 in the old system and was migrated as #456 in the current one.

There's a balance here and I especially like that this commit message has enough information to make searches really easy should you need to do something like that.

That's why most guidelines for commit messages prescribe a short description and an optional long description. The message in the article does not have a short description, which would have been easy to include. For that reason, it's not "My favorite Git commit message" either.

I agree with the sentiment; this message is quite long for an invalid character in a file.

House style in the companies I've worked for is to include a link to a bug report and/or code review that provides more context for those who want it. Even without that added context, I'd rather know

Do you think a junior developer, or maybe somebody not very familiar with Linux, would not learn anything or benefit from reading those comments?

The obvious point is that, besides recording what was done, commit messages can serve as a form of documentation and a teaching tool (why, how).

That's great for you. You don't want that. However, if you code in a team, doing everything for your own wants rather than considering the needs of the team (present and future) is just bad software engineering.

That belies the effort that went into the fix

One of my favorite pranks is to put control characters in the commit message (like the bell) and then you get an auditory notification anytime anyone nearby opens your commit messages.

Does this actually work on modern editors / browsers?

Only in the terminal. That said, the first thing I put in my .xsession is `xset -b` to disable the audible bell.

Also, if you haven't seen it before, read the Linux kernel Changelog. The latest Changelog can be found at [0]. Almost every commit tells a story, unless it's a trivial fix. If there's a bug, it often contains detailed analysis and rationales, and it's a form of important documentation.

Although it's not always practical to follow them in personal/work projects - Linux commits are the results of multiple rounds of reviews, and the commit log is its justification - but in personal/work projects, commits are made in real-time as soon as you debugged/refactored something. But I still use Linux kernel as a guideline for my own commit log, at least for new features or bugfixes.

[0] https://cdn.kernel.org/pub/linux/kernel/v5.x/ChangeLog-5.3.7

I was working with a research team at UCLA and we used Wolfram Mathematica to process our results.

In Mathematica, pretty much any object can be a variable name. You can drag a JPEG of Kim Jong-un into Mathematica and integrate an expression with respect to Kim Jong-un. We'd sometimes get a kick out of that.

Near the end of the program, our whole team needed to process the last 3 months of results, but we were all getting results that were consistently off by incorrect factors when running our Mathematica notebooks. Five hours later someone discovered that one of the variables contained a stray Unicode whitespace or null character (or maybe a non-Unicode blank Mathematica object, such as a Graphics object with 0 area) that someone must have accidentally spawned somehow before saving and distributing the notebook to the rest of the team. Since Mathematica didn't recognize it as spacing but as part of the variable name, making it a different variable, the results of our integrals were incorrect. E.g. the integral of x^2 is x^3/3, but the integral of xx' is x^2x'/2, so the multiplier would be off by a factor of 3/2.

After discovering and selecting it, we "cut" it into the clipboard, pasted it into another Mathematica notebook, saved it, and it was never opened again.

This reminds me of a Markdown issue I've had many many many times - sometimes (and only in some engines), headings would not render and I'd only get '# foobar' instead of '<hx>...'

It took too long for me to track down the issue. When I write '#' using alt-3, I then write a space and oftentimes I don't lift alt soon enough and alt-space creates a non-breaking space (on macOS). And some/most Markdown engines don't recognise '#&nbsp;text' as a heading.

I suspect something like this happened in the commit linked here.
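For the Markdown case specifically, a pre-processing pass can quietly repair the heading markers. A minimal Python sketch, assuming ATX-style headings (one to six '#' at the start of a line); the function name is made up:

```python
import re

NBSP = "\u00a0"

# One to six '#' at the start of a line, directly followed by a
# non-breaking space where a regular space was intended.
_BROKEN_HEADING = re.compile(r"^(#{1,6})" + NBSP)


def fix_nbsp_headings(markdown: str) -> str:
    """Replace the NBSP after a heading marker with a regular space."""
    return "\n".join(
        _BROKEN_HEADING.sub(r"\1 ", line) for line in markdown.split("\n")
    )
```

Non-breaking spaces elsewhere in the text are left alone on purpose, since those are sometimes intentional.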

This happened to me all the time, especially with python2 code without encoding declared (which caused a failure to parse the file because of the comment).

I’ve since switched my editors to highlight such characters.

Happened to me on linux all the time when writing the pipe operator in a terminal. Thankfully the error message led me to a fix.

I love these commits. They don't have to be this verbose, but they have to tell a story of why things were done. I can sort of deduce the what from the code itself, but the why is sometimes shrouded in mystery.

I started with these explanatory git commits a few months ago and they are super useful, even if you're just reading your own commits from some time ago.

To me they are red flags because it means that very likely whoever writes this kind of commit message doesn’t use a proper issue tracking system.

Totally agree with you.

Putting this in the commit is not easily searchable, not universally accepted and thus not expected, not practical and certainly can't involve discussion easily. This can be replaced with a ticket number, as most ticketing systems will actually read commit logs for those in order to associate the ticket with code.

Now, there is that problem with decoupling code and story, but this is a technical problem; nothing stops GitLab and friends from storing issues and friends in the repository itself.

I use these commits, but also use a proper issue tracking system. So I'm not quite sure your comment applies. The reason I'm doing this is:

1. If I'm looking at some code, I want to see its history without having to switch between git(lab|hub) and jira or whatever system I'm using.

2. The issue tracking system doesn't necessarily have some kind of resolve, a summary of what and why happened. It does have a description and a series of comments, but a summary is usually lacking.

3. I believe my commit history will far outlive any issue tracking system I use. So I'm safer putting information into both.

To me it's beautiful, because it does what an issue tracking system does not do: it explains everything. Who, why, how, what, when. It is beautiful and simple documentation.

Issue tracking typically revolves about the who, what, when - not why it happened, or how it was resolved.

This is why I believe that code can never be fully self-documenting. I can't understand why the code exists from reading it. All the floofy contextual stuff is missing, and commits like this help to explain the floofiness.

I think that you are not using the issue tracker properly. You absolutely need as a bare minimum the why and the reporter of the issue. When you do a blame you can easily see the issue id and open it with 1 click to understand why that code exists and who requested the change.

But now I have to go find the issue in a completely separate system, rather than just look at the commit message.

It’s added complexity, which is something we strive to remove from our code.

It's better to have these in with the source control system.

When you change or switch to a different ticketing system, you will bring these with you.

If you are switching ticketing systems without migrating all the issues, you are doing it wrong.

Honestly, I think better rules are:

- Whenever someone asks a question in a pull request, answer it by putting a comment in code.
- Include a ticket number (for whatever ticketing system you use) in the commit.

Why? I find that commit messages are black boxes. They only come out when doing a git blame, but they don't show up in my IDE. Instead, I'm more likely to run across messages like this when they're comments in code or discussions in our ticketing system.

I think if we had tighter integration among our IDE, git, and the ticketing system, detailed commit messages like this would be extremely useful.

Code comments have a habit of getting lost during future tidy ups or refactors.

If the code is that unclear it needs a comment, refactor it into a named method, where the method name describes exactly what it's doing.

I'm glad my near-exact pain has been experienced by others. I had an undefined function call of ' ' in a ruby script years ago. Finally, I turned to a hex editor at the suggestion of a colleague. The culprit was non-ascii whitespace that ruby decided should be a function declaration. Copy pasta error out of a hipchat code snippet.

Sorry, beginner here, can you explain more simply the difference in the whitespaces? Are they just encoded differently?

There are full on character encoding introductions. I think an easy to approach one is:


Basically, ascii is encoded as integers. '32' represents a space. However, ascii is quite limited and if you want non-US-centric characters, you need to use other character sets. For a myriad of reasons, once you go into unicode (the Universal Character Set) there are lots more options for characters. For example, there are multiple whitespace characters.


These exist to give different widths or other adjustments to text that is non-ascii. What likely happened in the post (and what did happen to me) is copy-pasting from some document that changed a normal ascii space (that Ruby would expect and know how to deal with) into a unicode character that Ruby interprets as any other character. It would be like having a stray 'g' in the line, but you, as a developer, don't see it. Fun :)
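
To make that concrete, here is a minimal Python sketch (not from the original post) that flags whitespace-like characters outside ASCII. The function name `find_odd_whitespace` and the sample line are made up for illustration. It only catches characters that Python's `str.isspace()` or the Unicode "Zs" category treat as spaces, so take it as a rough check rather than a complete one.

```python
import unicodedata


def find_odd_whitespace(text):
    """Return (index, code point, Unicode name) for each non-ASCII whitespace character."""
    hits = []
    for index, char in enumerate(text):
        if ord(char) < 128:
            continue  # plain ASCII, including the ordinary space (code 32)
        # Category "Zs" covers space separators such as U+00A0 NO-BREAK SPACE;
        # str.isspace() additionally catches line and paragraph separators.
        if char.isspace() or unicodedata.category(char) == "Zs":
            name = unicodedata.name(char, "UNKNOWN")
            hits.append((index, f"U+{ord(char):04X}", name))
    return hits


# Hypothetical line with a pasted no-break space between "to" and "have_content".
line = "expect(page).to\u00a0have_content"
print(find_odd_whitespace(line))  # [(15, 'U+00A0', 'NO-BREAK SPACE')]
```

On screen the two kinds of space look identical, which is why printing the code point is more useful than printing the character itself.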

This explanation of where the non-breaking space came from feels really likely: https://news.ycombinator.com/item?id=21290159

Pasta and computers are always a bad mix...

Especially spaghetti code...

I'm always intrigued when developers complain that it was "[surprisingly short amount of time] of my life I won't get back."

Maybe I'm not as clever, but I'm lucky if I fix an issue like that within a few hours. It can sometimes derail a workday. In fact, fixing in a few hours would be something worth celebrating!

Seriously. I'd count my lucky stars if I squished that one in under 4.

When reading code, you want something simple and to the point most of the time (if that). Yes, it's nice to be able to tunnel down into the details, but this was fixing a bug: the code should just have a short description and, more importantly, the reference number of the bug. If you need to, you can look that up in your bug tracking solution and get that detail.

The big problem with detailed comments is that things change, and comments (like code) can become obsolete or redundant and no longer reflect the code, because the code got updated but the comment did not.

There's no solid solution really; it comes down to preference and to mindfulness of the life of code and comments.

It would be great to have code that you could right-click to get the documentation. Some would even prefer being able to write documentation that gets turned into code; others would love code that could be auto-documented. It comes down to taste, preferences and, more so, experience. See, every programmer will eventually encounter a situation in somebody else's code that they are maintaining, fixing or replacing and find that the comments do not reflect the code. You eventually get to the stage where you almost actively ignore comments because of such experiences.

So whilst a detailed description in the form of a comment is good, it can and should live elsewhere: either in the initial spec and program documentation or, in this instance, in the bug tracking system, leaving just a short line with the bug reference, or indeed just the bug reference.

Contrary to the author's point, I don't think a git commit becomes more beneficial to the readers by adding a human context.

Building "compassion and trust" actually distracts readers from the essence of the commit: what happened and why. I am not discarding the importance of the human element in collaborative endeavors, but maybe such an area should be pursued outside of a version control system.

I wish Git wasn't like some generic term for source control. A little diversity is good!

Good point - but what is the use case where another source control system (like SVN?) works better?

There's not really any case where another source control system works better, but lots of cases where they work just as well.

Mercurial, for example, is functionally identical to git, but some people prefer its interface.

For a highly centralized organization where people only ever work on the organization's intranet, a centralized source-control system like SVN works well enough, and may have some advantages for the organization.

Actually, I found one just by reading this thread: Fossil has integrated Bug Tracking, Wiki, Forum, and Technotes. Which is great since I would prefer not having to worry about backing up these in the first place! (Then git also has git-bug for at least some of this functionality...)

Git does not handle really large repos. You can search the internet for the term monorepo and see what organizations like Facebook, Google, and Microsoft are doing about that. None of them are using plain vanilla git.

Virtually all use cases in my experience.

You'll have to give some specific examples...

Git is particularly poor in scenarios such as two people working on the same branch. SVN handles this with ease, but with Git it takes a lot of coordination amongst both contributors to keep things working.

I do not believe that is a good pattern. First, two devs probably should not be touching the same functionalities (if they are, they ought to be basically pair programming). So diffs should be orthogonal. If diffs are landing in the same branch, each dev should be using their own feature branch, and ideally PRing them back to the branch, but for small hacks, merge is fine.

Think fractally. The farther you get from master, the smaller and more atomic each commit should be.

If your merges are taking lots of coordination or failing to auto-merge, you probably have some poor engineering hygiene at play. Every time I've had merge fails, it's due to haste/sloppiness or a dev branch diverging too much from a mainline.

This is my favourite way to write commit messages. Like a blog post on the thing I’m doing.

I wish it were easier to gather, annotate and contextualise (perhaps with images) the contents of a commit message.

My favourite Github commit was someone removing their password from a test list in a penetration testing tool, because they didn't want anyone to know their password. I just tried, but couldn't track it down. The subsequent comment trail was hilarious.

This is the best PR thread on the internet. Thank you for sharing this.

> It makes everyone a little smarter

> One thing Dan did here that I really appreciate was to document the commands he ran at each stage. This can be a great lightweight way to spread knowledge around a team. By reading this commit message, someone can learn quite a few useful tips about the Unix toolset:

> [..]

In the spirit of making everyone smarter: simplicity matters. Using the combination of find -print0 and piping that to xargs -0 is much easier than the mentioned abracadabra of characters.

From the xargs(1) manual:

> The options are as follows:

> -0 Change xargs to expect NUL (``\0'') characters as separators, instead of spaces and newlines. This is expected to be used in concert with the -print0 function in find(1).

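For anyone wondering why NUL separators are the robust choice, here is a small Python sketch of the idea. It is an illustration only: the helper name `split_nul_delimited` and the file names are invented, and it mimics the parsing step rather than calling find or xargs.

```python
def split_nul_delimited(data):
    """Split a stream of NUL-terminated file names, the shape `find -print0` emits."""
    return [name for name in data.split(b"\0") if name]


# Hypothetical file names: one contains a space, the other a newline.
stream = b"notes/todo list.txt\0notes/odd\nname.txt\0"

names = split_nul_delimited(stream)
print(len(names))  # 2

# The same two names, if they were separated by newlines and split on whitespace.
naive = b"\n".join(names).split()
print(len(naive))  # 4
```

A NUL byte cannot appear inside a file name on Unix-like systems, so it is the one separator that never collides with the data.
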
I don't like all this information shoved into commit logs. This should have been filed as an issue and then linked to the commit. Issues are much more searchable, readable, commentable, archivable and interactable in many different ways.

I figure that so few people read commit messages that most of the time this is kind of useless, especially in the early stage of a project. Things will change as fast as I can type a message such as this.

Code is simple. Humans overthink. I would prefer a commit message such as "Fix invalid byte sequence in US-ASCII when running bundle exec rspec." to a dev who stays stuck trying to write a cool message and never fixes the issue.

> few people read commit messages

That very much varies. In a project where 90% of commits are “fix”, “foo”, “commit”, etc. then yes. Nobody will ever read that (or do pretty much anything else useful with a VCS).

On the other hand, when every commit message is on the level, then yes, people do read them. Actually, the first step when investigating any problem or trying to understand some code is to look at the commit log.

See e.g. Linux kernel or some of the Google-related open source projects (chromium, webrtc, etc.) for examples of good, long-form commit messages.

In a professional context (and not only), one would think that this should be the normal expected good practice. But unfortunately, that is not the case.

It is so surprising (well, not really) to see how, in most cases, developers put little to no effort into writing proper commit messages and, more generally, into keeping a clean commit history. They simply don't care, and you keep seeing garbage commits with nonsensical or nearly empty messages and descriptions. Sadly enough, this is seen as normal and just accepted.

On every single team I have worked with, from small to large organizations, I have had to pick up the "write proper commit history" fight. And even after extensive explanations of why you should do that, people simply don't care and keep pushing stuff like "fix", "updated class z" and so on.

Commit history does not seem to be part of the review process.

Sometimes it is just so depressing to see how unprofessional software engineers are.

If it makes you feel better, biologists are often no better. Our equivalent to git commits is labelling tubes and keeping little excel databases of what has gone where. Often databases stop being updated or people give their tubes esoteric labels that are meaningless to those who look at them a year later. As a research assistant in a large lab, I discovered blood tubes with literally no labelling, and often spent hours searching for samples in the labyrinth of freezers in that lab. It is also not uncommon for papers to be retracted because the original authors lost the raw data!

In an organization I've worked at we used to write good commit messages, about 9 years ago, before we started using Github. Then our good commit messages turned into good PR intros. I think Github and Gitlab etc are fantastic, but I'm a little sad that so much valuable information has been divorced from the git repository itself (and of course, the ultimate fate of those PR intros across the open source world depends on the companies hosting the repo.)

Personally, rightly or wrongly, the fact that I can't use Github/Gitlab to contribute to Django and Emacs prevents me from trying to make contributions to those projects. Similarly, I find the insistence on using email to send patches frustrating when I know PRs (MRs) work so well. However, I guess Emacs and the Linux kernel are keeping their good commit messages in their git repo and not losing them to a hosting company.

Maybe there should be tooling for automatically converting a PR intro to a commit message.

I really like this commit message. I've found that switching to git from more traditional version control systems requires a lot more discipline in some ways. A lot of people just commit, commit, commit lots of incremental changes with no context or story to them. I've seen pull requests with dozens of tiny commits that together make up a cohesive effort, but individually are just useless. I've really been having to push my team to spend time cleaning up their commit history before getting their pull requests merged.

I think it's really important to capture the context and intent behind changes. I may be weird, but when I'm fixing an issue, I often try to find when it was introduced, which often provides really useful information for the fix. That's much harder to do if the commits aren't cohesive and the messages aren't descriptive.

Implement and enforce conventional commits: https://www.conventionalcommits.org/en/v1.0.0/

I'm one of those people. I make lots of little commits, it gives me space to really make a mess of coding going down some rabbit hole and performing 'reset --hard' when I get too away from myself, and track what I'm doing locally. As long as each commit isn't causing a problem with CI/CD, and my pull request to master is well documented what is the value added of cleaning up commits?

(Junior developer here, looking to be convinced!)

Depends how you handle your PRs. If you squash and rebase within Github or similar, no problem.

But ideally interactive rebase before you push your PR and tidy up all those commits into larger topical ones.

  "DEV-1 - Write tests for widget X calculator"
  "DEV-1 - Implement widget X calculator"
  "DEV-1 - Refactor widget X factory service"

> I make lots of little commits, it gives me space to really make a mess of coding going down some rabbit hole and performing 'reset --hard' when I get too away from myself, and track what I'm doing locally.

I think this is totally OK, just as long as you squash those all down before someone has to merge your PR.

> As long as each commit isn't causing a problem with CI/CD, and my pull request to master is well documented what is the value added of cleaning up commits?

Because it's hard to make sense of all those little commits later, so why keep them around? They're just noise with a very limited future value, and I don't want to have to sift through them in the future. It's basically impossible to clean up those kinds of messes once they get established in master, but it's very easy to contain them at pull request time.

> I like Git commit messages. Used well, I think they’re one of the most powerful tools available to document a codebase over its lifetime.

1000000% agree!

I miss reading the PRs and git messages of one of my co-workers at my previous job. It was such a joy reading his PRs. I still remember reading his PR on introducing Babel to our big, old Rails 4 app before webpacker, Ruby Babel Transpiler came to life. It's like taking a journey with him. You can see his smile, struggle, surprise and all the emotional moments in his commits. He put his findings, why he made this decision, and where he found this solution in the commit msg. I learned a lot just by reading his PR. I think reading a well organized PR, clean git commits and descriptive commit messages (even the code review comments are very useful) is one of the best ways to learn at work, especially for new hires.

20 years after UTF8's creation[1], still having to spend an hour of your life "fixing" your content for tools which only handle US-ASCII.

[1] https://en.wikipedia.org/wiki/UTF-8#History

While humorous, this comment is way too verbose. We prefer the following template:

Issue: What is the problem we are attempting to fix

Cause: What is the root cause of the issue, since with bug reports this is usually much different than the random musings of the reporter

Fix: How does the commit address the cause.

Each of these should be 1-3 brief lines
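
A template like this is easy to check mechanically. The following is only a sketch of how that might look in Python: the section names come from the template above, while the function name `check_template` and the sample message are my own assumptions, not anything the poster described using.

```python
import re

SECTIONS = ("Issue", "Cause", "Fix")  # section names from the template above


def check_template(message, max_lines=3):
    """Return a list of problems; an empty list means the message fits the template."""
    pattern = r"^(" + "|".join(SECTIONS) + r"):"
    # re.split with a capture group yields [preamble, name, text, name, text, ...].
    parts = re.split(pattern, message, flags=re.MULTILINE)
    found = dict(zip(parts[1::2], parts[2::2]))

    problems = []
    for name in SECTIONS:
        if name not in found:
            problems.append(f"missing section: {name}")
            continue
        lines = [ln for ln in found[name].splitlines() if ln.strip()]
        if not 1 <= len(lines) <= max_lines:
            problems.append(f"{name} should be 1-{max_lines} lines, got {len(lines)}")
    return problems


# Hypothetical commit message body written against the template.
example = (
    "Issue: rspec aborts with an invalid byte sequence error\n"
    "Cause: a non-breaking space was pasted into a template\n"
    "Fix: replace it with an ordinary ASCII space\n"
)
print(check_template(example))  # []
```

Returning a list of problems instead of a boolean means a hook could print all of them at once.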

This really just depends on your team/company/culture.

Lengthy commit messages are not really required if you have associated tickets in a bug-tracking or project-management system. More often than not, you'll just be duplicating info.

This is true until the company changes the bug-tracking and project-management software without properly porting over everything because they use different identifiers.

I tend to put a link to the external system with a very brief explanation, allowing someone to quickly assess the what and why with the ability to dig elsewhere for more detail.

You could say the same for improperly porting commit messages when switching version control systems.

Some of this background decision-making information can be included as developer documentation, whatever form that takes, e.g. as comments (usually for low-level) or sibling README file (usually higher-level).

Commit logs will have the greatest detail, but they also are the costliest to dig up, often requiring multiple rounds of `blame`. They are therefore most appropriate to include information pertinent at integration-time, namely code review context/justifications.

Merge commits (such as those created during typical PR/MR merges) have similar potential to include explanatory background, but at a coarser granularity, e.g. feature level.

Adding a test to prevent this error in the future would also be nice to see, something to mention in code review... Which the commit message makes much easier (without explanation this diff is confusing).

Should this go into a commit message, instead of an issue/ticket?

I say yes. I like to keep information about the code as close to the code as possible. Issue trackers come and go, and even if you keep the same issue tracker around, how are you going to relate the change in the code to the particular issue down the road?

FWIW, I also prefer READMEs to Wikis.

> Issue trackers come and go

Unless you use Fossil :-P. Or for git/hg/svn/cvs/Folder - Copy(43), one of those ticket trackers that work with files inside the repository itself.

> Issue trackers come and go

What do you mean? Just put an issue ID in the code and/or commit.

Let’s hope the SaaS issue tracker you’re currently using never goes out of business or changes the product in a way that makes it worse for you. Or, if you host your own, that it keeps you satisfied in perpetuity.

Referring to the issue ID in the commit message is a fine practice in addition to writing good, comprehensive commit messages. Commit messages that consist only of an issue ID are – in my experience – utterly frustrating to deal with. They tell you nothing more than this change is somehow related to this or that bug or feature, but not how or why.

> Let’s hope the SaaS issue tracker

So the solution to an unreliable issue tracking solution is dumping that responsibility on your VCS? Why not fix the concerns you have with your issue tracker?

> writing good, comprehensive commit messages. Commit messages that consists only of an issue ID

Who said anything about only including a tracker id? The issue here is the extra verbosity in the commit message. What will the tracker tickets contain that isn't in your "comprehensive" commit message?

> So the solution to an unreliable issue tracking solution is dumping that responsibility on your VCS? Why not fix the concerns you have with your issue tracker?

… Or, and hear me out, how about not worrying about that, and just use your VCS to accomplish something it’s eminently well suited for?

Also, how do you propose I solve the issue of the issue tracking service maybe going out of business or that of a more compelling product coming along?

To me, the primary purpose of an issue tracker is to collaborate on and track work in progress, and that’s what I use them for. I don’t find that they are particularly valuable as historical records of the source code.

But then you switch out your issue tracker service from something like Jira to something else, and suddenly that ID or URL means squat. A git repository can easily be pushed to any git based service be it GitHub, GitLab, Gitea, or something else, and the commit log and commit hashes stay the same.

Porting Jira issues to a different system would probably not preserve those IDs that you entered into your commit message. By all means, refer to your issue tracker in commit messages, but be aware that those references may not be valid in a few years.

Just put the old ID in the new ticket, then search. Or put commit hashes into the ticket and search by that.

> A git repository .. and the commit log and commit hashes stays the same

If issue trackers are so transient and flaky, and VCSs are so solid, then back up your old issue tracker and put it in git. What if your issue tracker stays, but your VCS changes?

As close to the code as possible means a code comment, not a commit message.

Since this is describing the commit and what was done and why, the commit seems like a better place.

In tools like GitHub, if you make a PR with this commit, it will also automatically put the text in the PR description.

I would much prefer this at work over what I usually see with inconsistent commit message styles and not explaining properly what was done, and not following the recommended max length per line.

But "why" includes lots of detail about the steps they went through. Why can't "there was utf-8/non-ascii whitespace in the file" cover all of that? Why detail all the steps to reproduce?

Absolutely yes. As a developer I can:

  git log --grep='<error message>'
  git show <commit id>

I don't have to go looking for an issue tracker and figuring out how to search it effectively.

As a developer, you can also search the issue tracker, and don't have to go looking for commit in the VCS and figuring out how to search it effectively.

This is a big discussion.

In a professional setting, companies usually want this information to live in the issue tracker. Mainly to provide insight to managers/other teams without looking at commit messages.

But it removes the information from the code: you now need to look at the issue tracker to make sense of changes, eg when looking at the history of a file, or with git-blame.

I'd argue that all relevant information that affects the code and architecture of the specific repo should live in commits. They should not only tell you what was changed, but also why; and provide enough context to understand the change in the scope of the repo.

All information that is not directly tied to the code, eg cross-repo/product/etc concerns can go in the issue tracker.

Of course this only works well with a `git commit --fixup` and squash + rebase / squash + merge workflow.

And with monorepos it also becomes a tooling problem.

> you now need to look at the issue tracker to make sense of changes

Not really, the commit message can be informative without being this verbose.

> you now need to look at the issue tracker

Doesn't seem like a bad thing to me. Issue trackers are designed to search through.

Yes, because it lives with the code.

The commit message still logs in the usual way, but it carries the whole set of information with it, in a way that a centralised ticket system doesn't.

When a developer is looking at logs for solving some problem, they can easily review the rationale for changes.

I'd much prefer the log explaining everything, rather than having to look to a ticket that may no longer exist.

> in a way that a centralised ticket system doesn't

Why doesn't it? Are closed issues not searchable?

It seems to me you doubt the ticket retention, but instead of fixing that, use commits to store issues instead.

How do you provide comments or updates on an issue "relevant to the commit" without arbitrary commits?

> Why doesn't it? Are closed issues not searchable?

Everywhere I have ever worked has changed ticket systems at one point, and even when they are transferred, the transfer is "lossy".

Heck, just moving from one JIRA version to another can be "lossy".

> How do you provide comments or updates on an issue "relevant to the commit" without arbitrary commits?

In the ticket system. I was not advocating that a ticket system is pointless, because they are very useful.

But a decent commit message about why a change is necessary, especially when it may not be straightforward, can save you plenty of development time further down the line.

I have mixed feelings about JIRA, but if you move to a centralised ticket system you have to plan migration.

If you really need to, put more detailed notes into an "Issues" file for commits/minor releases, and clear it on every major release.

IMO the given example is more verbose than needed. It should give enough information about what a fix does and why, but no more.

A great question! I’m still wondering why we all mostly use separate tools for tickets and version control (and knowledge base, and collaboration, and management/hiring, the list goes on), given we already know a lot about software development.

I think it's more than that. Sometimes I feel there is a disconnect between code history and repo contents, e.g. the current repo often contains a folder with the entire history of migrations, changelog etc.

Fossil - https://www.fossil-scm.org/

Code, commit logs, tickets, and project management are all part of the same repo with Fossil.

If you put it in a pull request, that'll just put it in the merge commit instead.

I'd prefer it being on the commit itself.

In my opinion, absolutely not. It should go in a Jira issue and the commit should have the Jira id in the message. That way, when blaming, you are just one click away from the full issue description, with properly formatted text and screenshots if needed. It will also be visible to the whole team on the Jira board, and they don’t have to click on the specific commit to notice that problem.

WHY > WHAT... really helped my documentation, comments, commit messages. I think it’s an underutilized part of coding.

I just can’t wait for bigger companies to get onboard. At least in EE/CE it’s tough trying to figure out INTENT sometimes. I see the code, service, headers, docs but you’ve never told me how you intend for me to use this! Maybe your plans were awesome but would change how I was planning to work your thing in. A single page on INTENT would go a long way sometimes.

I really like Google's guidelines for commit messages because they enforce a style like this. It really makes dealing with legacy code much easier when you can look at past commits and see what your predecessors were thinking. https://google.github.io/eng-practices/review/developer/cl-d...

I feel kinda dumb. I've made many 1,000s of commits since 2011, and I never realized until now that commit messages can have a "body."

To avoid those kinds of issues, non-ascii characters are forbidden in our code base. They are automatically verified in a pre-commit git hook.

I intentionally add non-ASCII characters to our code, so that an incorrectly configured IDE or bad tool fails.

75% of the development team has at least one non-ASCII character in their name, so it would be pretty rude otherwise.

It's much better to knowingly reject a tool at the start, since it can't handle ordinary characters, than find out a year later with the first e.g. British customer that it can't handle "£".

Gmail ignores anything past the + in username+foo@gmail.com. I wonder how much bobby-tables-esque havoc I can wreak with name+£€ϵ@gmail.com.

Edit: already found one. Leet is supposed to end with “” (U+2608, Thunderstorm).

Doesn't show up on my comment.

If you don't allow non-ascii then you may have to mangle some people's names in the copyright header.

It's good to check files for unexpected characters, though. Here's some Perl to do it:

  perl -e 'binmode(STDIN, ":utf8"); binmode(STDOUT, ":utf8");
  foreach (split(//, join("", <STDIN>))) { ++$c{$_}; }
  foreach (sort(keys %c)) { printf "%8d %s\n", $c{$_}, $_; }'

I tested that on a text file I was working on ... and I discovered that the file contained a BOM (U+FEFF), not at the start of the file, but at a random point in the middle of the file. I've deleted it. Who knows what problems it might have caused for me later?

You could have a pre-commit git hook that refers to a whitelist of allowed non-ascii characters, or also allow all alphabetic characters, or something like that.
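
As a rough illustration of that whitelist idea, here is a Python sketch of the checking step only. Everything named here is hypothetical: the `ALLOWED_NON_ASCII` set, the `disallowed_characters` function and the sample strings are placeholders, and wiring it into an actual git hook (reading the staged files, exiting non-zero on failure) is left out.

```python
ALLOWED_NON_ASCII = set("é²±§£")  # hypothetical whitelist; adjust per project


def disallowed_characters(text, allowed=ALLOWED_NON_ASCII, allow_letters=False):
    """Return (line number, column, character) for each non-ASCII character not permitted."""
    offenders = []
    # Split on "\n" only, so unusual Unicode line separators are still inspected.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, char in enumerate(line, start=1):
            if ord(char) < 128 or char in allowed:
                continue
            if allow_letters and char.isalpha():
                continue  # optionally accept any alphabetic character, e.g. in names
            offenders.append((line_no, col, char))
    return offenders


# Hypothetical file content: an allowed accent on line 1, a pasted no-break space on line 2.
sample = "name = 'Péter'\nwidth = 10\u00a0# pasted\n"
print(disallowed_characters(sample))  # [(2, 11, '\xa0')]
```

The `allow_letters` switch corresponds to the "allow all alphabetic characters" variant: names like Rózsa pass, while invisible spaces and stray symbols are still reported.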

It should be checked as part of the CI too, if you're doing that; some people might never install the hooks, and some people might git commit --no-verify.

Indeed but in our case it is for a private repo with only 5 developers. The hooks are installed automatically when running make.

This issue would never have happened using modern software. And if your text (including source code) still isn't in UTF-8, you're doing it wrong.

(I guess unless you're using some specific, very performance-conscious system that has to use 7/8-bit characters and is never going to connect to the Internet anyway.)

It's like preventing the headache with a guillotine.

Why? Usually non-ascii characters belong to translations, and translations usually don't belong in the codebase.

Documentation stored in your repo, unit tests, and comments. If your README.md includes code bracketed with backticks, you've got non-ASCII characters in your repo.

Things like Péter Rózsa or Kg.m² or ±0.5 PPM or C11 Standard § all work with my toolchain. Why mangle them into ASCII when there's no need to?

Maybe it works in your English codebase, but I'd be very hesitant about rolling it out everywhere. For example, Perl 6 supports unicode in identifiers so it's perfectly valid to write your code in Japanese. Another increasingly common example I've seen in Python is the use of emoji in command line tools.

> For example, Perl 6 supports unicode in identifiers so it's perfectly valid to write your code in Japanese.

Raku/Perl6 also has non-ASCII Unicode operators (they all have multicharacter ASCII aliases, but the Unicode characters are usually more readable.)

They definitely belong in comments though. Names, non-English languages and the like.

This (primarily American) attitude to encodings is why we're in this situation.

Just off the top of my head, test cases would be a valid reason to have non-ASCII in your codebase.

I like spending some extra effort on my Git commits (really, typing a message like this doesn't take that long, compared to the amount of time spent doing the change in the commit). It's just a shame that both GitLab and GitHub do not render Markdown in the body of the commit messages, and present them like actual prose rather than a wall of monospaced text.

This is a lovely commit, thanks for sharing it.

Over the years, I got into the habit of asking my coworkers to write semantic commit messages, like https://www.conventionalcommits.org/

You can use git hook libraries like the Python pre-commit or JavaScript husky to check your git commit message format.
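
For a feel of what such a check does, here is a standalone Python sketch of validating a subject line. It does not use pre-commit or husky. The list of types is an assumption on my part (the Conventional Commits spec names "feat" and "fix" and permits others), and `is_conventional` is a made-up helper name.

```python
import re

# Assumed set of types; projects usually agree on their own list.
TYPES = ("feat", "fix", "docs", "refactor", "test", "chore")

SUBJECT_RE = re.compile(
    r"^(?P<type>" + "|".join(TYPES) + r")"
    r"(\((?P<scope>[^()]+)\))?"  # optional scope in parentheses
    r"(?P<breaking>!)?"          # optional breaking-change marker
    r": (?P<description>\S.*)$"  # colon, one space, then the description
)


def is_conventional(subject):
    """Return True if the subject line matches the simplified pattern above."""
    return SUBJECT_RE.match(subject) is not None


print(is_conventional("fix: remove non-breaking space from template"))  # True
print(is_conventional("Fixed stuff"))                                   # False
```

Git's commit-msg hook receives the path of the file holding the proposed message as its first argument, so a hook script could read the first line of that file and pass it to a check like this.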

"1 hour of my life I won't get back".

I wish. You know when this kind of bug happens to you it blows away half a day at least.

I’m torn about whether I’d like to see this kind of information in a commit message, vs. something more like:

“Remove parser-unsupported character. Closes #340295.”

...where ticket 340295 (wherever, not necessarily Github Issues) goes into more detail about the cause, investigation, and resolution process, as a history of the evolution of said process across a conversation.

IMHO you should pick the right medium for each kind of message, and I don't think a git commit message is the right medium in OP's example. Most companies have a ticketing tool to keep track of additional information like: how long it took to solve the issue, whether there was any input from other people, and a long "marked up" message explaining what was wrong, how it was fixed and how it was tested (markup is a lot easier to read than a console log).

Using a ticket number already forces you to use "the right tool" to view the message.

IMHO OP's example is bad practice and your example is the better solution.

The other side of this argument—why I’m torn—is that the git commit message isn’t necessarily written by the developer who developed the solution, but might instead be written by a project maintainer who received the solution as a patch, or is copying a fix from a downstream or sibling-fork project, as a standalone patch.

In other words: if you have a ticket tracker with this information captured in it, it makes some sense to just link to it. But if you have a mailing list with this information captured in it—in only the loosest amalgamation, where there’s no clear “thread” that contains the whole discussion, and the original developer of the patch might not even be a part of that discussion—then it seems like it’d be important for the final committer to write their own summary of the events that led to this commit, such that people can understand what went on if they weren’t following the list. And where do you put that summary? The commit message.

I would argue that maybe this is a core part of git’s design, given that it was developed specifically for the LKML style of “patches first, sent to a list, and then discussed in the context of what they solve”, rather than the GitHub style of “discussion first [on open issues] sparks PRs that attempt to solve [i.e. close] the issue.” Git assumes that you, as an “editor”, are going to be summarizing an otherwise-illegible discussion history for the benefit of the people viewing the commit; and so it provides a multi-line commit message as a place to stow that editorial summary.

This is awesome until you can't access the ticketing system, or there is context outside that system, or even better the company decides to change tools and loses the mapping to old issues (true story. joy of startups).

By all means put the ticket number in the header line, along with type of patch.

But do yourself a favor and put enough context in too. Doesn't have to be all the detail.

The problem with this approach is when the team changes its ticketing system or the codebase goes to a different team (that has a different ticketing system), meaning all that context is lost. Happens all the time.

I ran into a similar issue a few years back. There was the same non-space whitespace character spread throughout the codebase. I tracked it down to a single developer, who had no idea why that was happening. My guess was that they were copy/pasting a lot from MS Word documents, but we never found out for sure.
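If you want to hunt for these yourself, something along these lines works as a rough sketch. The `find_suspicious_chars` name and the `ZERO_WIDTH` set are my own choices for illustration, and the set is certainly not exhaustive. It flags non-ASCII whitespace (such as the no-break space) plus a few zero-width characters, and deliberately leaves ordinary non-ASCII letters and symbols alone.

```python
import unicodedata

# Zero-width characters are format characters, not whitespace,
# so str.isspace() misses them and they are listed explicitly.
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\ufeff"}


def find_suspicious_chars(text):
    """Yield (line, column, codepoint, name) for invisible non-ASCII characters."""
    # Split on "\n" only: str.splitlines() would also split on some of the
    # Unicode separators we are trying to report.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127 and (ch.isspace() or ch in ZERO_WIDTH):
                yield (
                    line_no,
                    col,
                    f"U+{ord(ch):04X}",
                    unicodedata.name(ch, "UNKNOWN"),
                )
```

Run it over each file's decoded text and print the hits; `git blame` on the reported lines will then tell you where they came from.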

The worst thing about Gitlab and Github is that commit messages are not immediately editable in a PR/MR conversation. You can find conversations like this all over {gitlab,github}-dot-com but you cannot find them in commit messages anymore.

    Closes #123456
instead of actual background.

It's bad for your codebase.

Hidden characters again. The Stack Overflow answer I am most proud of: https://stackoverflow.com/questions/41061400/perl-join-strin...

My personal favorite is one where I actually fixed a bug by flipping one single bit.


Gah... About a third of my commit messages are just "minor cleanup" or a variation. Should I be ashamed?

I think shame is an awful motivator. Instead I would praise you for identifying an opportunity to improve :)

Should have been in a doc or wiki instead of commit message.

I have never seen any dev searching for error messages in commit messages.

For the rest of the points (makes you smarter, builds trust and compassion), if it's so worthy, put it on a blog (like this blog post itself) so it has the potential to reach an audience.

> I have never seen any dev searching for error messages in commit messages.

I've seen many devs search for error messages in Github search. That often turns up results in people's comments in issue threads, but the search also includes commit messages.

I have had github issues come up for search results frequently, but never a commit message.

But then again, I am not saying "don't write the story". If you think you found something worthy, just write a blog post or even a pastebin/gist would be better in terms of the number of people it reaches.

Agreed. My team used to like extensive commit messages, so that if you had trouble with a piece of code you could just git blame it and take a look at the referenced commit.

The problem with that approach is that it doesn't survive as well as you'd like, because fixing a typo in a line would get you "ownership" of the line (since only the last person to change it is blamed). It's even worse in semi-major refactors due to moved/renamed files being treated as new....

Far better to have docs apart.

> fixing a typo in a line would get you "ownership" of the line (since only the last person to change it is blamed).

That's a UI/UX/usability problem with blame, not an inherent one with the practice. Github's blame UI solves this very elegantly (blame history can be traversed easily), as do some others.

> It's even worse in semi-major refactors due to moved/renamed files being treated as new....

This on the other hand is a real problem with Git, but I don't see that it's strictly related to putting context in commit messages. This issue occurs either way.

I think this should have been an issue. Open an issue with the original error, document the troubleshooting in the comments of the issue, then close the issue with the commit.

Why not put it in both?

Commit messages should be short and to the point. I don't want to read a story to understand what this commit did.

“Convert utf-8 to us-ascii to fix error”. Says right there in the commit title, short and to the point.

Absent some automated summarising functionality that produces just the right level of detail for you personally, a concise title and a very detailed commit message that you can skim through to find relevant bits is an eminently reasonable compromise.

Who says this?

I follow the Linux kernel guidelines, which have large descriptions in commits. After all, these guys wrote git.

Here's Linus's take: https://gist.github.com/matthewhudson/1475276

This is one thing the Linux kernel does comparatively well compared to most open source projects.

Related: Greg Ward - Documenting history, or How to write great commit messages: say what, not why. https://www.youtube.com/watch?v=Jb6ij4eRu6c#t=378

One of my favorites, and I can't find it now: someone's last commit message on a deprecated Perl project was in 3D ASCII art, something along the lines of "No More Perl." I have a few projects I'd like to write that to :)

General education in a commit is questionable... OTOH find -exec and escaping the ';' (or '+' for xargs-like one line) was helpful (hard to parse --help, though manpage is clear). Now I don't know what to think.

The Holy Order of Git Commit Log Bikeshedding and Overengineering have their field day. And they all have it wrong. Any commit log that doesn't compile to an automated build script isn't worth the bytes it's made of.

What I gather from this is that commit messages are a better place for documenting stuff like this than polluting every line of code with often redundant comments.

I like it.

I think the git log is an interesting place to document this kind of thing. Also, there should never be redundant comments. I do think, though, that lots of folks put things in the git log that _should_ be a code comment. For me (and for most of the people I work with), we don't look at git commits unless we are actively investigating a previous change. You should not hide the reasons for code being the way it is in the git commit; they should be exposed for direct observation by someone who might edit the code. Ex: '// the following delete call is required to remove the item from the DNS cache to ensure the test validates non cached dns items'. That should be in the code.

A very descriptive commit that doesn't actually fix the bug, only removes the offending character that was triggering it.

But why was this a problem in the first place? Pretending non-ASCII characters don't exist isn't the right solution.

Why are they using us-ascii instead of UTF-8?

I had a similar issue with zero-width space recently. Why in the world does that character even exist?

I've used it to hint good line-break positions in a text body where the soft-hyphen does not apply.

I often review commit logs of my teams, especially while we are tracking down problems or I'm making sure the release notes capture everything.

There has to be a balance between this and "WIP"; I'm imagining trying to page through the commit log to see what changed when every 1 line change has a 35 line commit associated with it.

That’s the beauty of it, the first 50 characters of the first line (IIRC) will be shown as a summary, and this one summarizes 35 lines in seven words.
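As a rough sketch of that split: the helper below is hypothetical, and the 50-character limit is just the rule of thumb recalled above, not something git itself enforces. The subject is the first line and everything after the blank separator is the body.

```python
def split_commit_message(message, subject_limit=50):
    """Return (subject, body, fits) where fits is True if the subject is short enough."""
    lines = message.strip("\n").split("\n")
    subject = lines[0].strip()
    # Everything after the first line is the body, minus the blank separator.
    body = "\n".join(lines[1:]).strip("\n")
    return subject, body, len(subject) <= subject_limit


subject, body, fits = split_commit_message(
    "Convert utf-8 to us-ascii to fix error\n\nLong story here.\nMore."
)
```

Tools that show one line per commit only need `subject`; the body stays out of the way until someone asks for it.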

I believe there's a standard format for this.

The first line is the title, then there's a description.

These heavy commit descriptions are used quite a lot in Linux kernel development.

Would `git log --oneline` not help?

This would be terrible without a good subject (the first line).

seems like the commit message's lines are in reverse order.

tell me what you did.

then tell me why you did it.

leave the saga for the end of the text so I don't have to read through your musings to get to the meat.

I'm just told to not write so much bullshit.

I love descriptive git commit messages. For those complaining about the length: 1) Just read the title 2) Maybe have a TL;DR section for a summary, and details for those who care. 3) Use pretty printing for git log

Having a rich commit message is extremely important for capturing code review discussions and design decisions, and for gluing things together for the set of changes in a single place. The advantages far outweigh any benefit of a short commit message that will not give you the "why" of the solution.

Just link to the bug

TL;DR: Written language can be used to communicate information.


You've repeatedly crossed into personal attack on HN recently. That's not cool. I've banned this account until we get some indication that you've read https://news.ycombinator.com/newsguidelines.html and sincerely want to use HN in its intended spirit. That spirit is intellectual curiosity and kind, thoughtful conversation.

Comments like this one, and https://news.ycombinator.com/item?id=21276666,

and https://news.ycombinator.com/item?id=21204403,

and https://news.ycombinator.com/item?id=21149400,

and https://news.ycombinator.com/item?id=21123836,

are not ok here. It's a pity, because you've also posted quite good comments. But they are not worth the cost of the worst ones.

We detached this subthread from https://news.ycombinator.com/item?id=21290076 and marked it off-topic.

>> I don't want your entire life story in my commit log.

> Then I never want to work with you ever (or any code base you ever touched) because of your laziness...

What an incredibly aggressive response.

I think it's an appropriate response to such a blunt dismissal of an entire way of being

The comment it responded to wasn't exactly friendly either.

Ironically, I'm now curious about kissgyorgy's life story. How did they get so touchy?

So you never want to work with... any junior dev ever? I can't think of a single person we have hired out of college who used Git well.

I love coaching smart junior engineers, and the first thing I teach them is using Git properly and the importance of good commit messages! :)

The problem is not someone who doesn't know how important it is but wants to learn. The problem is people with this kind of (lazy?) attitude who are NOT WILLING to do what is absolutely necessary in the long term.


We detached this subthread from https://news.ycombinator.com/item?id=21290517.

The commit was complaining specifically about the way locales were designed, not C as a whole. While C was very successful, I don't think you could argue that C locales ever reached the same level of popularity.

That being said I do agree with you that this complaining is not massively productive. Dealing with localization and non-ASCII text is notoriously difficult. Look at Java, Python 3, Windows, PHP 6 and how many "misconceptions about Unicode" articles you can find online. You could spend hours pointing out how each approach has tons of drawbacks but clearly the perfect solution doesn't seem to exist so a compromise had to be found.

In particular I'm not sure I agree with his complaints about locales being global state. How else would you handle them? You need to have some kind of global config flag somewhere to decide what locale the user wants to use. Having explicit versions of the stdlib functions that take locales as parameters could be nice, I suppose.

This bit in particular seems to completely miss the point:

>Idiotically, locales were not just used to define the current character encoding, but the concept was used for a whole lot of things, like e. g. whether numbers should use "," or "." as decimal separator.

Of course if this programmer assumes that locales and charsets should be the same thing they'll end up frustrated.

When locales were invented, it was reasonable to assume that the locale would determine the character set. With the subsequent invention of Unicode that no longer needs to be the case, but code standards live forever.

It wasn't really that reasonable as soon as it was made global. It's exactly the kind of assumption that mostly works but has dark corners from the beginning.

Some imaginable version of locales was probably a good approach at the time, but that wasn't what got standardized.

I don't disagree, I've run into those dark corners myself. But the ease of use of globals is undeniable - there's a reason the Singleton pattern is still popular after all the ridicule it attracts.

Sure, it's easy to use. But that isn't the job of standardization bodies, or the right test to use on something as low level as this. It's harder to get right, but it is exactly these sort of failures in standardization that cause the most global pain, because they wend their way through everything.

In this case I don't think the locale functions were designed by committee; I think they were accepted as-is from a popular implementation. And they were implemented that way because it was the simplest, most straightforward way (I'll admit I'm just guessing now). Were they part of K&R C?

No, of course not.

>When locales were invented, it was reasonable to assume that the locale would determine the character set

Sure, but that's not what the commit is saying. Instead it's saying that it should only determine the charset and that Unicode effectively makes locales pointless. That's absolutely not the case, there's a lot more to locales than character encoding.

I think we're in agreement. There are lots of aspects to a locale, and character sets are only a small part of that. Possibly the most visible part though.

Unicode doesn't make character sets pointless, it only makes them deprecated. It's still useful to have a way to convert from one set to another, and it's a shame the standard library doesn't have an easy way to do that. The deficiencies of locales are visible only in hindsight.

Locales being global state only makes sense for single-user applications. That assumption is no longer true once you have a server where every request may be from a different user. A better way to handle this is something like Go's context object, which gets passed explicitly.
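To make the contrast concrete, here is a minimal sketch in Python (not Go or C) of what passing the locale explicitly looks like. `NumberFormat`, `EN_US`, `DE_DE` and `format_number` are hypothetical names for illustration only, and a real locale carries far more than two separators. The point is just that each request supplies its own settings, so nothing process-wide has to be switched.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NumberFormat:
    """Hypothetical per-request formatting settings, passed explicitly."""
    decimal_sep: str
    thousands_sep: str


EN_US = NumberFormat(decimal_sep=".", thousands_sep=",")
DE_DE = NumberFormat(decimal_sep=",", thousands_sep=".")


def format_number(value: float, fmt: NumberFormat, places: int = 2) -> str:
    # The "," and "f" format specifiers always use "," and "." regardless of
    # the process locale, so this output is a stable starting point.
    raw = f"{value:,.{places}f}"
    integer, _, fraction = raw.partition(".")
    integer = integer.replace(",", fmt.thousands_sep)
    return integer + (fmt.decimal_sep + fraction if fraction else "")


# Two "requests" handled in the same process, each with its own settings.
us = format_number(1234567.891, EN_US)
de = format_number(1234567.891, DE_DE)
```

With a global locale, the second call would either inherit the first user's settings or have to flip shared state that every other thread can see.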

That's not a very common use case for C, either now or back then (you probably had more user-facing C CGI back then, but it was still one invocation per user, so arguably you could set the locale for each call). Some webapps use C in the backend but generally don't deal with localization at this level.

I'm not saying that C locales aren't bad and limited, I'm saying that it's a compromise that makes some sense. In particular when you're trying to bolt something onto an already extremely popular language instead of designing a new thing from the ground up.

Can you imagine the churn if the C standard suddenly introduced a whole new set of string functions just to deal with the locales? Well, you don't really have to imagine, just look at the way it works on Windows with their wide strings.

This comment comes across really mean and judgmental. I didn’t read all of the original commit, but it read to me as informative and passionate with all of the frustrations solving a difficult problem comes with.

Honestly, this comment is so over the top I can’t even detect whether you’re serious or not.

It’s a shame you see it only as complaint, the brilliant thing about the text you’re replying to is that it educates the reader as to the bigger problem, as well as being very funny. Does your complaint do either? Understanding why C locales are broken may very well contribute to the lasting growth of humanity. Did you note that Eric Raymond fully agreed?

RIP git log

Someone typed and committed that whitespace character on purpose.

Yeah but because git is a command line tool, it doesn’t really encourage you to do this in the first place...

The title of the commit was enough for me. I wouldn’t have read the content anyways

When taxpayers foot the bill it's easy to spend an entire morning on a single-line commit.

For many or most guys out there, reality is way different. For one, it's an actual reality, not VC or government funded Utopia.
