
A Git Horror Story: Repository Integrity With Signed Commits - mikegerwitz
http://mikegerwitz.com/docs/git-horror-story.html
======
dkarl
Code gets reviewed for production at 6am by a guy freshly rousted out of bed
who committed some of it four hours earlier, and who is doing this at the
insistence of a "frantic" colleague, the fix involves "a bunch of commits"
from more than one person, and his main worry is that he might be held
responsible for a hidden back door maliciously inserted by his coworker?
That's like driving home drunk with your headlights off and worrying that your
car could get blown up by a drone because of eroding limits on the use of
military force against civilians. I realize it's hypothetical, but man, fix
your glaring problems first before you worry about unlikely scenarios.

~~~
heretohelp
This does a great deal to explain why Facebook's release engineering ignores
most attempts to push something out-of-band unless they can justify on the
basis of real breakage for the users. (Cf. HotFix Bar)

------
roel_v
I'm not sure what this 31 a4-pages story is saying beyond 'webs of trust are
hard'. Nor am I sure what the first several long-winded pages add - they don't
demonstrate a concrete problem, yet they are referenced to several times with
the words 'as demonstrated by the story in the beginning, ...'. The summary at
the bottom doesn't have a whole lot less information than all the content
before it, and doesn't lead the reader to believe for the first 10 minutes
that some ground-shaking vulnerability disclosure is imminent.

~~~
nknight
It's a detailed exploration of git object signing.

It's not trying to describe a "ground-shaking vulnerability". There is a
current, known issue with display of git signature verifications, and he's
including it in his article. That's _not_ the _point_ of the article.

The point is an exploration of options for securing your repository history.
That's it.

~~~
roel_v
Well then maybe (actually, not "maybe", but "certainly") the title should be
"a detailed exploration of git object signing", and not link bait like "a git
horror story". But of course the first title, while much more accurate and
informative, would not draw nearly as many website hits, so the author
deliberately set out to mislead people with the title, in order to get more
hits/reads. This sort of misrepresentation of content is what makes the web
much harder to explore and the continuous stream of content much harder to
keep on top of for the information consumer, and therefore should be called
out and chastised.

But then again, I'm just an old grumpy man who is much faster to criticize and
be negative than he should be, so who am I to cast judgement? Do I ameliorate
or exacerbate the situation with posts like my previous one or this one? I
don't know, maybe I'm part of the problem in that way. Either way, I much
preferred (so are my motivations selfish? Is that good or bad? I think good,
but how do I _know_?) the old HN when submissions like this one would be
labeled for what they were.

~~~
mikegerwitz
I'm sorry that you feel that way. You have understandably misinterpreted my
intent. The title was not intended as bait, nor was the story. Rather, I enjoy
writing and was hoping to appeal to readers that have difficulty reading dry
material (which is the material I tend to write), while at the same time
providing something to relate to in order to put the article into context. The
title itself was a play on a television series --- American Horror Story ---
that my girlfriend once fell asleep to as I was hacking late one night. The
story was intended to augment the actual useful material (code samples and
explanations), not obscure it, which it appears to have done.

Please notice that this article is segregated from the rest of my blog with no
glaring/obvious links to it. This was intentional. I posted this article in
the hope that it would be useful to others, not because I would gain any
benefit from doing so.

I will take your comment into consideration for future articles.

~~~
lincolnq
FWIW, I read the article because of the attractive title you gave it. I found
the beginning exciting and the rest quite informative. If you had titled it
more boringly, I wouldn't have chosen to read it.

------
mdehaan
Warning -- This article posts a longwinded first person account and finally
ends with "this scenario is purely hypothetical". So it sounds like it is
talking about a security vulnerability with git that really happened (signed
commits or tags being broken, perhaps?), and it's actually an article saying
you should use signed commits if you can't control access to your repo.

------
crazydoggers
`git tag -s` is all that's needed. The primary problem in your horror story is
that you GPG signed a commit with a tag when you shouldn't have. When you tag
with GPG, you are saying you _do_ trust the history. You didn't review the
commits you were signing with the tag and that's were you made your mistake.

Linus himself is against GPG signing commits.

[http://git.661346.n2.nabble.com/GPG-signing-for-git-
commit-t...](http://git.661346.n2.nabble.com/GPG-signing-for-git-commit-
td2582986.html#a2583316)

Ultimately a human being has to be responsible for reviewing code. Once it's
reviewed, you can tag and sign it. No GPG signature is going to make up for
not reviewing your code.

------
eykanal
Interesting solution, but is this really a problem? The only situations I can
think this applies to are large-scale open-source projects with many
maintainers, and even more people submitting patches. For closed-source
projects without external patch submissions, this is really only avoids the
security threat of someone sitting at your computer and committing malicious
code to your repository, which (to me) doesn't sound like much of a threat.
(At least, if that _is_ a threat, rogue commits to your git repo are only one
part of your problem, and door locks would probably be a better, more broadly
beneficial solution than implementing this.)

~~~
wyck
Yes of course, creating a backdoor or sabotage via revision control is a
really good attack vector (like really). Recently several large open source
projects had shady forged commits, so it's safe that say that anything private
is also being targeted.

~~~
MattJ100
Apart from Ruby, which?

~~~
marshray
Wordpress <http://wordpress.org/news/2011/06/passwords-reset/>

------
Roritharr
This describes pretty exactly why security questions are hard.

It's not as easy as using tool x and generate key y... its more than using
software, if you want security, you have to think about the whole stack, down
to the access rights that each part of your hardware has.

There are good and best practices, but with IT Security, using the best
practices in one place and just plain forgetting to unlock you PC when you go
to the toilet at the office is more than enough...

People are allowed to be afraid when asked security questions.

~~~
nknight
> _just plain forgetting to [lock] you PC when you go to the toilet_

You're not even trying. Physical access, game over, the lock is irrelevant.

You fear what you don't understand, and you don't understand because you're
afraid to learn. You have simply crafted a vicious cycle for yourself, and are
making the world that much more unsafe by spreading your fear and acting as if
everyone should share in your fear.

~~~
Roritharr
I'm not spreading fear, since i consider HN a place where people are educated
enough understand the problems in it security.

Yes, once someone has hardware access, its devilishly hard to secure an
environment, but just stating that its impossible is as wrong as security
systems that ask their users too many questions to which they most likely have
no good answer/no doubtfree answer.

There are systems which work pretty securely, i have worked with several
educated security experts to create security infrastructures in companies.

I'm not saying that i don't understand the problems or don't want to learn
solutions to them, i'm trying to defend the position of the user that can't be
asked to learn about all the caveats of it security, since it is one of the
most complex problems in computer science.

No, not every front-end developer that has to check-in his html und js files
can be asked to understand all principles of it security.

------
rmc
tl;dr : You can GPG sign git commits with -S option. This makes things more
secure

~~~
natep
But it's not a cure-all

~~~
rmc
Yes. "more secure" ≠ "100% secure"

~~~
natep
And sometimes, "more secure" == "less secure than before due to complacency"

------
tytso
The article puts forward a false dilemma by implying your only choices are (a)
squashing all of the commits into a single patch, (b) examining the end point
and then only signing the end point (with the risk that there might be some
bad stuff hidden in some of the intermediate commits), and (c) examining and
signing every single commit.

In fact what we do in the kernel community is most patches are in fact
reviewed via e-mail. This increases the likelihood more people will look at
the patch and notice any potential problem, and more importantly, the e-mail
archive introduces an additional external record of the proposed change. After
each patch in a patch series is reviewed, in isolation, with each change small
enough that it can be easily reviewed in a mail reader, and with especial
attention made to maintaining bisectability, the maintainer who will
ultimately by signing the git pull request will apply it into his or her tree.

For larger changes and for larger subsystems where we have submaintainers, it
is the maintainer's responsibility to make sure the submaintainer sends a
signed pull request. At that point the submaintainer is assuming full
responsibility for the set of patches which he or she is sending to the upper-
level maintainer, just as the maintainer assumes responsibility for the
patches he or she sends to Linus.

In the case where the submaintainer does not send a signed pull, or the
maintainer does not have full trust in the submaintainer (either for social or
technical reasons), it is the responsibility for the maintainer to check each
and every commit for correctness. If he or she can't do that, he shouldn't
accept the pull request; instead, he might request that the patch series be
sent to the mailing list so everyone can review the patch series.

One of the things which follows from this is that for me as a kernel or
e2fsprogs maintainer, github pull requests are not particularly useful. Most
of the time I have no idea who the person is that initiated the pull request,
because the primary community is on the mailing list. Worse, the pull request
only gets sent to maintainer, which makes the maintainer the review
bottleneck. If the patch series is sent to the mailing list, as individual
commits, with no requirement to jump to a browser, it allows the review
workload to be distributed to other developers on the mailing list.

As far as I'm concerned, a github pull request is the same thing as a patch
sent via private, personal e-mail. I'm going to review it very carefully, and
only apply it if I _know_ that it is clean. Typically the review involves
extracting the commit as a diff, and as has been discussed in other places,
github doesn't encourage properly formatted commit descriptions, so I end up
needing to extract the diff, and then reapply it with an appropriate commit
description. So for me github pull requests generally represent negative, not
positive value, because of the quality control and quality assurance
requirements of the code bases in which I work.

~~~
mikegerwitz
The style of review you mentioned (which I made a brief mention of in the
article) still falls within the three merge options. In the case above,
maintainers seem to be doing option b or c, depending on trust. Instead of
signing the merge commit, the pull request is signed. One of the downsides to
this is that it does not extend the signature to the repository and requires
the users to consult the mailing list. That said, I'm rather fond of the Linux
and Git style of code review and would recommend it in addition to what was
recommended in the article (with the former only being necessary if its
benefits are desirable).

Thank you for the detailed explanation; it did help to clarify details I
assumed, but was uncertain of.

~~~
tytso
No, there's no need to consult the mailing list, except if you want to see the
"legislative history" of the review process.

The person requesting the pull signs the pull request by signing the git tag.
The GPG signature on the git tag covers the complete history of up to that
point in the tag, so there's no need to sign each and every git commit, as you
propose. When the upstream maintainer (or Linus, at the top level) accepts the
pull request, git will automatically create a merge request that contains a
copy of the git tag's GPG signature in the merge request. (After this point,
the git tag can be discarded). When Linus releases something, he signs a git
tag for the overall git repository, and everyone can check that.

So for example, when I am satisfied that everything in the ext4 tree that I
want to push to Linus is OK, I create a signed tag named "ext4_for_linus", and
send him a git pull request, which is created using the "git request-pull"
command. This creates an e-mail message contains the URL of my git repository
and the name of the signed tag. When Linus accepts my pull request, the
resulting merge commit will contain a copy of my GPG signature from the git
tag. This signature signs my name to the commit id of the git tag, and the
chain of SHA-1 hashes inherent in git guarantees that the entire commits back
to the beginning of time came from me. Of course, my changes will branch of
something like v3.3-rc2, which is in Linus's tree, so what is being signed
that is of interest is my new changes between v3.3-rc2 and the tip of the ext4
tree.

Once Linus has accepted my pull, I'll reuse the ext4_for_linus name for the
next GPG-signed tag for the next 2-3 month development cycle, since the
important bit (the GPG signature) is preserved for long-term posterity in the
merge commit. But what's important to note here is that the signature in the
merge commit is signed by _me_ , since it contains the contents of what I
pushed to Linus, even though the merge commit was created by Linus when he
accepted the pull request by merging in my changes into the repository. What
Linus signs is the GPG tag for the actual released kernel (i.e., v3.3).

See? So there's no need to refer to the mailing list to validate the pull
request or to establish after-the-fact accountability. The mailing list is
just used for review purposes. But of course the review is a critical part of
the acceptance process.

~~~
mikegerwitz
> the resulting merge commit will contain a copy of my GPG signature from the
> git tag.

Ah, I see. Thank you for the clarification.

It was not overall integrity of the trees that I was bringing into question.
While the code review process may be a problem for certain organizations, I am
not doubting that process for the kernel. With such a process in place, signed
tags are likely to be sufficient, yes.

Note, however, that one of the points I was trying to make with the article
was proof of authorship (to prevent someone from committing as you, much the
same way as you would want to prevent someone from fraudulently sending an
e-mail or message through any other means). It is for this reason that the
workflow you described is not a complete alternative, but would be an
excellent means of augmenting the review process for any project.

That said, author identity is not a concern for most projects, in which case,
based on the quoted portion of your comment above, my arguments do not
necessarily apply (assuming that all commits that are not yet ancestors of a
tag _are_ ancestors of a signed merge, as mentioned in your comment).

Might I ask how the maintainers deal with issues of identity? If they
encounter a commit in your pull request that appears to be from a well-known
contributor, do they verify his/her identity before merging, or do they not
care?

~~~
tytso
We do have a ring of trust which we established recently, which is rooted in
cross-signed keys belonging to Linus Torvalds, Dirk Hohndel, Peter Anvin, and
myself. We all know each other quite well, both in person and via conference
calls, so that was easy to set up after the kernel.org security breach. All
keys in the key ring is signed by at least two, and preferably more, other
trusted keys, all leading up to the root keys.

But that's only important from the perspective of knowing that a particular
public key belongs to a person, which in turn is only really important when a
maintainer wants to know whether or not the person who signed the git tag
really is the submaintainer who is claiming to have signed the git tag.

Most of the time, we don't really worry about identity; what we worry about is
the correctness of the commit. Especially for patches being sent via e-mail, I
really couldn't care less whether the identity of the person sending the
e-mail was Christoph Hellwig, or his evil twin. I'm still going to review the
patch very carefully, because we all make mistakes. That's my job as the
maintainer; to ultimately be a crap filter, and to reject crap, no matter
whether it's written by someone famous or some nobody.

So when I send a pull request to my upstream maintainer (which in the case of
ext4 is Linus), he really doesn't care about the identities of the authors in
my pull request, because it is _my_ reputation which is on the line; I'm the
one who signed the git tag pointed to by the pull request. Just if it gets by
me, and then it gets by Linus, it's ultimately Linus's reputation which is on
the line.

So that's why your question is a bit ill-formed. Someone who is above me in
the maintainership hierarchy isn't really going to care much about the
identities of those who are below me in the maintainership hierarchy. The
trust relationship is fundamentally between me and Linus, and then I have
trust relationships between me and those developers below me. So Linus doesn't
care about the identities of those which are two levels below him in the
maintainership hierarchy, so long as he trusts me to have done my job
properly. Of course, if Linus discovered that I had been blindly accepting
pull requests from my downstream without doing proper quality control, then
he'll yell at me and he might refuse to take future pull request from me (or
at least threaten to do so). But normally, there's no need for him to verify
the identity of the authors of commits in my pull request; he only needs to
verify _me_ , and to verify that he trusts my technical judgement and quality
control checks and processes. If decides he doesn't trust me or my technical
judgement then he won't accept pull requests from me; and in that case he
would be absolutely right to do so.

------
raverbashing
So what does this means is if someone compromise their GPG key (more or less
easy) and get his passphrase (not difficult) they can sign all kinds of
commits and make it look like he did it. Great

Security is only as good as the weakest link.

Also, don't make it look like it's more secure when it may not be

------
asharp
From what i've read, the problem is that an attacker can add commits with
forced Author information into a central repository to frame somebody else.

Wouldn't the signing of all commits as they are committed solve this problem?
(ie. rather then trusting Author information from the commit, trust the
signed-by information to give author information?)

~~~
mikegerwitz
Note that Signed-off-by (as added by -s) is different than -S (GPG-sign). The
-s option simply appends the "Signed-off-by" line to the commit message, so
this can also be forged.

The GPG signature cannot be forged (access to the private key is needed).

------
jentulman
Thank you for taking the time to write this up in such detail and for making
it an enjoyable read too.

------
kevinpet
I can't read this. If there's a vulnerability in the sequence of events
described at the beginning, it needs to be called out more explicitly. If I
wanted a mystery, I'd pick up some Agatha Christie.

------
Natsu
This part bothers me:

git log --pretty="format:%H$t%aN$t%s$t%G?" "${chkafter:-HEAD}" --first-parent
\ | grep -v "${t}G$"

What of the case where some smartass decides to

$ git commit -am 'Yet another foo G'

So long as it doesn't have a bad signature, it will give an empty string,
right? So how do I know where the G came from?

~~~
mikegerwitz
I do appreciate the attention to detail. Fortunately, even if "G" were
appended to the commit message, grep would fail to match (and remove) it: a
tab is output directly before %G? (which, as you mentioned, could be an empty
string). This ensures that there will always be characters after the subject.

That said, it could be terribly confusing when viewing the output. For that
reason, perhaps another character after the tab (before the %G?) would be a
good addition.

~~~
Natsu
Is there no way to sneak a tab into the message?

~~~
mikegerwitz
Indeed there is, but there will always be a tab after that tab. Consider this
commit message:

    
    
      Foo\tG
    

This would result in the following output:

    
    
      ... Foo\tG\t

~~~
Natsu
Ok, that makes more sense. It's hard to see whitespace, so I didn't notice a
tab lurking just before the newline.

~~~
Natsu
Wait... if I can sneak a tab in there, what if someone hid a _newline_ in
there, too?

Would you not end up with

    
    
      foo\tG
      \t
    

And break the regex? Or is there absolutely no way to smuggle a \n into the
message?

~~~
mikegerwitz
%s inserts the "subject", not the entire commit message. The subject is
considered to be the first line of the commit message (I forget if there is a
character limit), so a newline would simply be considered the delimiter (and
would not appear in the output).

Otherwise, you would be correct --- that would have rendered the check
useless.

~~~
Natsu
Ahh, you're right. I get a little paranoid when dealing with user-supplied
strings, but usually that's a good thing :)

~~~
mikegerwitz
Absolutely; your critical thought and inquiry is a very positive thing.

------
rmccue
Wouldn't this be solved with a merge commit? If I can see where a commit
merged in, I could prove that I didn't author it, as I don't have access to
the other author's machine.

------
hdmi
Very good and long read; should wake up a few people

------
bp2070
An interesting article, but I would prefer the author presented the issues
first in a VCS agnostic way (as they are not specific to git) and then
introduce the hypothetical scenario using git.

Actually these issues are not even specific to VCS, but security of most
digital information.

