
My favourite Git commit (2019) - neo2006
https://dhwthompson.com/2019/my-favourite-git-commit
======
bassdigit
The author spent all the time documenting his journey but totally fails to
explain the actual reason of the problem: If the file contains a UTF-8 byte
sequence, why is there an attempt to parse it as US-ASCII? I get that it's
just a wrong space, but if some day somebody actually want to put a UTF-8 char
there for some reason, this commit essay wouldn't help at all whereas googling
stackoverflow seems to deliver (unverified):
[https://stackoverflow.com/questions/15947425/rake-tasks-
fail...](https://stackoverflow.com/questions/15947425/rake-tasks-fail-with-
invalid-byte-sequence-in-us-ascii)

~~~
bassdigit
Following this thought, it could be argued that the author did not even fix
the problem, only eliminated a symptom of the underlying, serious
configuration mistake that restricts file contents to an outdated character
encoding.

~~~
mar77i
I like your approach. And the blogpost's author does not even hint that the
story told could mask an underlying issue at all. Which I find mind boggling
somehow, but can we call this symptom whack-a-mole and what can be interpreted
as anxiety-driven reasoning a cultural issue? Or is there a different, broader
context that can wrap this up better that I'm not seeing?

~~~
bassdigit
The commit message succeeds in justifying the effort but lacks an in-depth
understanding of the related mechanics. I can imagine the developer felt bad
about 'only' committing a one character change at the end of the day and wrote
the message to make up for it. Because of this un-economic outcome, he
hesitated to dig deeper.

There is no reason to feel bad.

The cultural issue is bad management that still evaluates developers by lines
of code / commit activity.

------
jspash
I'm just curious but... did this actually fix anything other than the spec?

The reason I ask is that I find myself (with Rails particularly) spending an
awful lot of time on maintaining tests that were written by a much younger and
much less experienced developer (me, a few years ago).

And I've come to the slow realisation that a lot of what I was testing wasn't
really important.

I'm not saying the author shouldn't have written the spec, or that the spec
wasn't valuable. But maybe the true bug here was that rspec matchers don't
honour utf-8 spaces and perhaps they should? I dunno. Just thinking out loud
here.

~~~
hinkley
Getting people to write tests is hard. Getting them to write matchers... it’s
like you’re asking them to fly. Which is a shame because you can avoid so, so
much test boilerplate by writing a high quality custom matcher.

~~~
jfkebwjsbx
What do you mean by matcher?

~~~
allover
> "matchers" [...] let you test values in different ways

Example docs for Jest's matchers: [https://jestjs.io/docs/en/using-
matchers](https://jestjs.io/docs/en/using-matchers)

Generally speaking, if you use a specific matcher, you get a better (more
informative) error message when the test fails.

~~~
jfkebwjsbx
Ah, the common testing functions like `assert_equal`. Thanks!

~~~
hinkley
Right, and some APIs let you write your own. Getting a custom one to hand out
actionable error messages is a bit tricky but quite worth it.

------
pvorb
I, too, use git commit messages to document the reasoning behind my changes.
But I'm unsure if my colleagues even notice, since they regularly write bad
commit messages like "Fix bug" and things like that. Also I regularly find my
commit messages got lost, since the reviewer squashed several commits into
one.

Do you have any tips for establishing a culture of good commit messages?

~~~
optimuspaul
I usually start with being aggressive about people not squashing commits. I
think it's bad form. I also find shame works, as well as blocking merges when
reviewing when commit messages provide no value. Could also start a swear jar
of sorts, the offender puts a dollar in whenever they push a commit with a
useless message... and use the proceeds for charity rather than something that
benefits the team.

~~~
l0b0
This requires an existing buy-in from the team. How do you establish that buy-
in in the first place? I know trying to be a good citizen isn't enough,
because the worst offenders simply aren't interested in the revision history.

~~~
optimuspaul
Great question I don't have a solid answer to. Due to my seniority I tend to
either be the lead or act as the lead for teams. It is easier for me to reason
with the team to buy into these kind of things. The worst offenders will tend
to be weeded out over time either by changing their ways or leaving. I don't
have much sympathy for them if they can't see how their actions affect others.
A team has no place for a rogue agent.

------
judge2020
Previously:
[https://news.ycombinator.com/item?id=21289827](https://news.ycombinator.com/item?id=21289827)
(domain has changed since the last post)

~~~
saagarjha
I knew I had seen it earlier and was wondering why I couldn't find it…

------
ahh
I do love the social history of good commit messages, and what they teach us
about the process of what we do.

I will offer one nit here, which is that I _always_ start my commit messages
with one-line summaries; this makes it really easy to scan commits (and git is
tooled to do this!)

~~~
sixstringtheory
I don't understand your nit: is it that you don't see a one-line summary
(which exists: "Convert template to US-ASCII to fix error") or that you don't
feel it correctly summarizes the change?

~~~
BoorishBears
It doesn’t summarize the intention of the changes, but it does summarize a
bunch of stuff already in front of you when you’re looking at a diff...

-

Remove non-ascii character to fix error from `bundle exec rake`

-

That commit message would add external context (what was broken/where the
error was coming from) and intent

------
rachelbythebay
I wonder: what is the bad character? It's apparently not 0x20, but what is it?
How did it get there? What keeps this from happening again to someone else?

~~~
robert-boehnke
If I had to guess, the no break space that macOS has on [alt]-[space]: " ".

EDIT: Hmm, that doesn't seem to be it, here's the commit in question
[https://github.com/alphagov/govuk-
puppet/commit/63b36f93bf75...](https://github.com/alphagov/govuk-
puppet/commit/63b36f93bf75a848e2125008aa1e880c5861cf46#diff-18df4bf9d3ebe0f2e326ef9a94f3146c)

~~~
bassdigit
It IS the no-break space. The hex UTF-8 representation of it is C2A0; you can
find that sequence in original version of the file.

------
c3534l
What's the latest on how long a commit message should be? Used to be everyone
said a message should never be longer than two sentences and that they should
be as short as possible so that people actually read them. This thing posted
is an essay. Not the haikus I thought was considered best practice.

~~~
chucksmash
I've always just gone with

52 ~line~ [edit: character] summary

Then write a book, just being sure to wrap at 72 chars. Never really
considered what an optimal commit message length was.

Maybe it's something along the lines of "if you are playing poker and you
can't figure out who the easy money is, the easy money is you." I've never
seen a commit message, PR description, etc and thought "well that was
excessive," so maybe I'm the one...

This one stood out to me: 157 lines of rationale, discussion of alternatives,
etc for 22 lines of uncontroversial changes[1]. It's much more likely to be
useful in and of itself as a piece of documentation than a one-liner, "Changes
to prevent strcpy" though.

[1]:
[https://github.com/git/git/commit/c8af66ab8ad7cd78557f0f9f5e...](https://github.com/git/git/commit/c8af66ab8ad7cd78557f0f9f5ef6a52fd46ee6dd)

~~~
temac
While I'm perfectly fine with this commit-essay, and would probably be with
any other, in _this_ particular case I would also be perfectly fine with a
terse "ban strcpy" \+ actually a few more comments in the source code.

Commit messages are not going to serve as a knowledge base for state of the
art and best practices, and strcpy is universally recognized as problematic.
Plus, about the how, while it is fine to see the dev thought a lot about it,
did some testing, etc., the way he did the ban is ultra classic in the end.
Also, it might have been even better to put some short explanations in
comments instead of in just the commit message, in particular the usefulness
of the BANNED macro to preserve some line numbers with gcc. So in this
particular case I'm not really convinced that the commit message serves any
strong purpose.

But it does not hurt. And it participates to a culture where useful commit
messages are important. So I still (weakly) prefer it to my hypothetical "ban
strcpy".

------
mroche
Linking to the mpv commit[0] message brought up in the last discussion[1].
It’s absolutely hilarious, and a great example of someone rage typing after
dealing with an issue.

[0] [https://github.com/mpv-
player/mpv/commit/1e70e82baa9193f6f02...](https://github.com/mpv-
player/mpv/commit/1e70e82baa9193f6f027338b0fab0f5078971fbe)

[1]
[https://news.ycombinator.com/item?id=21290517](https://news.ycombinator.com/item?id=21290517)

------
s3graham
My personal favourite commit (as opposed to commit message)
[https://chromium.googlesource.com/crashpad/crashpad/+/badfac...](https://chromium.googlesource.com/crashpad/crashpad/+/badfacccee)

(Sadly, the commit message isn't great, but clearly it's worth that hash.)

------
sytelus
Please do not do this. Long essay style commit messages are not really
searchable, have no real way of discussions, do not support formatting, images
etc and are hard to even read in many tools. A better way is to create an
issue first and attach it to your commit.

------
kimusan
This is my favorite git commit: [https://android-
review.googlesource.com/c/platform/system/bt...](https://android-
review.googlesource.com/c/platform/system/bt/+/1241366)

result:

[https://android.googlesource.com/platform/system/bt/+/refs/h...](https://android.googlesource.com/platform/system/bt/+/refs/heads/master/gd/README.md)

------
dirtydroog
That's just way too verbose though. There should have been a ticket with a
description of the problem. Name the branch after the ticket and jira will
automatically link the ticket to the commits that fixed it.

I've encountered bad utf-8 when data is copied from excel or outlook.

~~~
gwright
I prefer the engineering details to be in the commit message and not an
external issue tracking system. That system might not be accessible 3 or 5 or
10 years later.

~~~
dirtydroog
That is true, and it might not work for Open Source development, but I don't
see managers trawling through git commits to see what was done / is in
progress.

~~~
couchand
Then find better managers.

------
jariel
Can someone comment on a best practice whereby the commit might merely refer
to the problem ticket (in whatever system), which should have all of the
relevant problem histories, and the articulation of the solution? As opposed
to all of this documentation in git?

~~~
saagarjha
If you company has a bug tracker like this internally and you maintain the
project "in the open", please don't do this. I'm looking at you, Google and
Apple…

------
Jefro118
Does anyone here try to enforce good commit message practices within their
engineering teams?

------
juped
All that text and it doesn't mention that the offending character is a U+00A0
NO-BREAK SPACE with UTF-8 bytes C2 A0.

------
codegeek
Tl;Dr: Sometimes, Why is more important than What.

~~~
cborenstein
+1 that Why matters most.

My teammate recently wrote a blog post on how we document Why in git commits
and PR descriptions.

[https://medium.com/better-programming/daily-habits-to-
turn-y...](https://medium.com/better-programming/daily-habits-to-turn-your-
git-history-into-valuable-documentation-15113e1bf312)

Would also recommend checking out a related talk by a tech lead at StitchFix:
[https://confreaks.tv/videos/rubyconf2018-documentation-
trade...](https://confreaks.tv/videos/rubyconf2018-documentation-tradeoffs-
and-why-good-commits-matter)

------
neo2006
It's not only about the fact that the commit message for a one char change is
about a page long. It's about that all the information in that page long
commit message is a useful information

~~~
BoorishBears
Strongly disagree.

There is so much superfluous cruft in the message.

Listing the error, and the fix, would give exactly as much context for the
next person to run into this issue.

If your goal to explain the process to find similar cases, just a couple more
lines would have done just as well:

-

Fixes case where `bundle exec rake` would fail with error message:

ArgumentError: invalid byte sequence in US-ASCII

This is caused by the presence of non-ASCII characters in files.

The following command was used to find files with non-ASCII characters:

`find modules -type f -exec file --mime {} \\+ | grep utf`

Running `iconv -f UTF8 -t US-ASCII` on listed files will show the exact
location of offending characters.

-

No long winded exposition, no extremely case specific program output
(including current machine's name...)

I’m not saying this is the worst commit message ever, and I’d appreciate the
intention if I came across it, but if you’re caring enough to give this much
context, you can save both yourself, and the next person to look for your
message, some cognitive load by sticking to what’s needed.

-

And if you’re wondering how to decide “what is needed”, it varies, but I think
a good rule of thumb is: ask yourself how one would find this commit message

What would someone grep for in git's history if they came across this? And if
they found it, what information answers their query?

They’re likely to search the command that is failing, and the error.

The answer to that query is, what causes the command to error out, and how to
find cases of that cause.

They’re not going to search for the program output of find, or iconv, not
going to search for the exact file you were modifying when this happened, or
the fixes you tried that didn’t work.

~~~
skrebbel
I disagree.

I mean, you're right, but I don't think that's time well spent. This is a
commit message, not a blog post. Writing concisely is hard and time consuming.

This was probably written in stream-of-consciousness fashion which strikes a
nice balance between time invested and possible future usefulness (if any). I
mean, the commit might be useful in the future, but there's also a reasonable
chance that nobody except the reviewer of the PR is ever going to read that
commit message. (If the OP hadn't blogged about it)

~~~
BoorishBears
How is writing concisely more time consuming than this long winded stream of
consciousness with multiple quoted command outputs and explanations of non-
fixes?

If I come across an error and it takes two commands to fix it, my immediate
intuition is a commit message with... the error and the two commands. And
that’s extremely quick to write...

It’s definitely not “let me write a detailed account of the last hour of my
life”... that immediately feels like it will take much longer

Also to be clear, the rule of thumb I mention is not about “what should you
think about, then write down”

It’s a filter on what you were already about to invest effort in writing down.

If you feel like an even shorter message is good enough, that’s great.

Just realize sometimes less is more...

~~~
JoshTriplett
It's the exact opposite of "If it was hard to write, it should be hard to
understand". This is "This was hard to debug, so let me make it easy to
understand and give you (or future me) the hard-won information I just
discovered so that you don't have to go through what I did".

If the information took you a great deal of effort to get, that suggests it
may be valuable to share. That may not be the right time to attempt to cut out
some of that information for brevity, especially if you don't know what the
future reader of the message might need to know.

~~~
BoorishBears
What information in the concise version I gave is missing, that causes you to
go through additional effort?

I think you edited this in after my reply:

> If the information took you a great deal of effort to get, that suggests it
> may be valuable to share.

I think this is the core of where we disagree. I often spend insane amounts of
time debugging something, only to find the solution was quite simple and
unrelated to what I tried.

Often the amount of time you spent debugging is because the search space for
your solution was unbounded, not because each step you took was particularly
relevant to the actual solution.

To me a commit message should be able hopefully narrowing that search space,
not documenting the original unbounded one.

~~~
JoshTriplett
I'm not suggesting that the version you gave would be a problem in practice;
I'm also not suggesting that the commit message in the article is the best
possible commit message for that change. I'm suggesting that after a long
debugging adventure, I'd prefer to err on the side of including any
information that feels hard-won, rather than trying to make the message as
concise as possible at the _possible_ expense of missing an important detail.

I've read many commit messages with too _little_ detail, and zero with too
_much_ detail. (I have read code with too many comments, but it's rare; I've
never once read a commit message that made me think "too much detail".) I'm
sure it's possible, but on balance I prefer to cultivate instincts of
"document anything that was hard to learn" and "in the moment, prefer to over-
document rather than under-document".

~~~
BoorishBears
Isn’t this exactly what I said in my original comment...

I appreciate the intention and wouldn’t complain, it’s just if you care enough
to write so much, save yourself sometime and the next person to read it some
effort by being a little more concise.

Commit messages are like naming things, there’s no right way, but the more you
try to do it “right” the more “right” you get.

If you just throw your hands up and say “I’m goin to vomit our every thought I
had working on this”, sure it won’t kill anyone and it’s preferable to always
writing _nothing_ (that’s kind of obvious...), but you’ll never get to the
point where writing even somewhat balanced messages is comfortable

------
bassdigit
Paraphrasing the first comment of the commit[1]:

A commit message with such a long explanation helps nobody, because it is not
searchable by Google, not even by GitHub[2]. [...] If the commit message
requires a blog post to be good (searchable) – it is a poor commit message.

[1]: [https://github.com/alphagov/govuk-
puppet/commit/63b36f93bf75...](https://github.com/alphagov/govuk-
puppet/commit/63b36f93bf75a848e2125008aa1e880c5861cf46#commitcomment-35571322)

[2]: [https://github.com/search?q=invalid+byte+sequence+in+US-
ASCI...](https://github.com/search?q=invalid+byte+sequence+in+US-
ASCII&type=Commits)

~~~
emj
I must say our private git repos are a treasure trove of information, not
using git for help is immensely stupid. In the same say it's sad that Jira and
all do not help you find commit messages in a better way.

