Hacker News new | past | comments | ask | show | jobs | submit login
A Git Horror Story: Repository Integrity With Signed Commits (mikegerwitz.com)
163 points by mikegerwitz on May 22, 2012 | hide | past | favorite | 44 comments

Code gets reviewed for production at 6am by a guy freshly rousted out of bed who committed some of it four hours earlier, and who is doing this at the insistence of a "frantic" colleague, the fix involves "a bunch of commits" from more than one person, and his main worry is that he might be held responsible for a hidden back door maliciously inserted by his coworker? That's like driving home drunk with your headlights off and worrying that your car could get blown up by a drone because of eroding limits on the use of military force against civilians. I realize it's hypothetical, but man, fix your glaring problems first before you worry about unlikely scenarios.

This does a great deal to explain why Facebook's release engineering ignores most attempts to push something out-of-band unless they can justify on the basis of real breakage for the users. (Cf. HotFix Bar)

I'm not sure what this 31 a4-pages story is saying beyond 'webs of trust are hard'. Nor am I sure what the first several long-winded pages add - they don't demonstrate a concrete problem, yet they are referenced to several times with the words 'as demonstrated by the story in the beginning, ...'. The summary at the bottom doesn't have a whole lot less information than all the content before it, and doesn't lead the reader to believe for the first 10 minutes that some ground-shaking vulnerability disclosure is imminent.

It's a detailed exploration of git object signing.

It's not trying to describe a "ground-shaking vulnerability". There is a current, known issue with display of git signature verifications, and he's including it in his article. That's not the point of the article.

The point is an exploration of options for securing your repository history. That's it.

Well then maybe (actually, not "maybe", but "certainly") the title should be "a detailed exploration of git object signing", and not link bait like "a git horror story". But of course the first title, while much more accurate and informative, would not draw nearly as many website hits, so the author deliberately set out to mislead people with the title, in order to get more hits/reads. This sort of misrepresentation of content is what makes the web much harder to explore and the continuous stream of content much harder to keep on top of for the information consumer, and therefore should be called out and chastised.

But then again, I'm just an old grumpy man who is much faster to criticize and be negative than he should be, so who am I to cast judgement? Do I ameliorate or exacerbate the situation with posts like my previous one or this one? I don't know, maybe I'm part of the problem in that way. Either way, I much preferred (so are my motivations selfish? Is that good or bad? I think good, but how do I know?) the old HN when submissions like this one would be labeled for what they were.

I'm sorry that you feel that way. You have understandably misinterpreted my intent. The title was not intended as bait, nor was the story. Rather, I enjoy writing and was hoping to appeal to readers that have difficulty reading dry material (which is the material I tend to write), while at the same time providing something to relate to in order to put the article into context. The title itself was a play on a television series --- American Horror Story --- that my girlfriend once fell asleep to as I was hacking late one night. The story was intended to augment the actual useful material (code samples and explanations), not obscure it, which it appears to have done.

Please notice that this article is segregated from the rest of my blog with no glaring/obvious links to it. This was intentional. I posted this article in the hope that it would be useful to others, not because I would gain any benefit from doing so.

I will take your comment into consideration for future articles.

FWIW, I read the article because of the attractive title you gave it. I found the beginning exciting and the rest quite informative. If you had titled it more boringly, I wouldn't have chosen to read it.

Warning -- This article posts a longwinded first person account and finally ends with "this scenario is purely hypothetical". So it sounds like it is talking about a security vulnerability with git that really happened (signed commits or tags being broken, perhaps?), and it's actually an article saying you should use signed commits if you can't control access to your repo.

`git tag -s` is all that's needed. The primary problem in your horror story is that you GPG signed a commit with a tag when you shouldn't have. When you tag with GPG, you are saying you do trust the history. You didn't review the commits you were signing with the tag and that's were you made your mistake.

Linus himself is against GPG signing commits.


Ultimately a human being has to be responsible for reviewing code. Once it's reviewed, you can tag and sign it. No GPG signature is going to make up for not reviewing your code.

Interesting solution, but is this really a problem? The only situations I can think this applies to are large-scale open-source projects with many maintainers, and even more people submitting patches. For closed-source projects without external patch submissions, this is really only avoids the security threat of someone sitting at your computer and committing malicious code to your repository, which (to me) doesn't sound like much of a threat. (At least, if that is a threat, rogue commits to your git repo are only one part of your problem, and door locks would probably be a better, more broadly beneficial solution than implementing this.)

Yes of course, creating a backdoor or sabotage via revision control is a really good attack vector (like really). Recently several large open source projects had shady forged commits, so it's safe that say that anything private is also being targeted.

Apart from Ruby, which?

I presume your talking about the github incident. As far as I know that was just a proof of concept. There wasn't an actual, malicious, black hat forged commit, was there?

Corporate networks get broken into all the time. Any machine that a copy of the repository is stored on is a potential attack vector.

Various strategies are used to specifically protect all kinds of sensitive systems far beyond simple network security. This article is a detailed exploration of a/some strategy/ies for protecting your git repository against malicious modification -- a scenario that is rare but far from unprecedented.

This describes pretty exactly why security questions are hard.

It's not as easy as using tool x and generate key y... its more than using software, if you want security, you have to think about the whole stack, down to the access rights that each part of your hardware has.

There are good and best practices, but with IT Security, using the best practices in one place and just plain forgetting to unlock you PC when you go to the toilet at the office is more than enough...

People are allowed to be afraid when asked security questions.

> just plain forgetting to [lock] you PC when you go to the toilet

You're not even trying. Physical access, game over, the lock is irrelevant.

You fear what you don't understand, and you don't understand because you're afraid to learn. You have simply crafted a vicious cycle for yourself, and are making the world that much more unsafe by spreading your fear and acting as if everyone should share in your fear.

I'm not spreading fear, since i consider HN a place where people are educated enough understand the problems in it security.

Yes, once someone has hardware access, its devilishly hard to secure an environment, but just stating that its impossible is as wrong as security systems that ask their users too many questions to which they most likely have no good answer/no doubtfree answer.

There are systems which work pretty securely, i have worked with several educated security experts to create security infrastructures in companies.

I'm not saying that i don't understand the problems or don't want to learn solutions to them, i'm trying to defend the position of the user that can't be asked to learn about all the caveats of it security, since it is one of the most complex problems in computer science.

No, not every front-end developer that has to check-in his html und js files can be asked to understand all principles of it security.

tl;dr : You can GPG sign git commits with -S option. This makes things more secure

But it's not a cure-all

Yes. "more secure" ≠ "100% secure"

And sometimes, "more secure" == "less secure than before due to complacency"

The article puts forward a false dilemma by implying your only choices are (a) squashing all of the commits into a single patch, (b) examining the end point and then only signing the end point (with the risk that there might be some bad stuff hidden in some of the intermediate commits), and (c) examining and signing every single commit.

In fact what we do in the kernel community is most patches are in fact reviewed via e-mail. This increases the likelihood more people will look at the patch and notice any potential problem, and more importantly, the e-mail archive introduces an additional external record of the proposed change. After each patch in a patch series is reviewed, in isolation, with each change small enough that it can be easily reviewed in a mail reader, and with especial attention made to maintaining bisectability, the maintainer who will ultimately by signing the git pull request will apply it into his or her tree.

For larger changes and for larger subsystems where we have submaintainers, it is the maintainer's responsibility to make sure the submaintainer sends a signed pull request. At that point the submaintainer is assuming full responsibility for the set of patches which he or she is sending to the upper-level maintainer, just as the maintainer assumes responsibility for the patches he or she sends to Linus.

In the case where the submaintainer does not send a signed pull, or the maintainer does not have full trust in the submaintainer (either for social or technical reasons), it is the responsibility for the maintainer to check each and every commit for correctness. If he or she can't do that, he shouldn't accept the pull request; instead, he might request that the patch series be sent to the mailing list so everyone can review the patch series.

One of the things which follows from this is that for me as a kernel or e2fsprogs maintainer, github pull requests are not particularly useful. Most of the time I have no idea who the person is that initiated the pull request, because the primary community is on the mailing list. Worse, the pull request only gets sent to maintainer, which makes the maintainer the review bottleneck. If the patch series is sent to the mailing list, as individual commits, with no requirement to jump to a browser, it allows the review workload to be distributed to other developers on the mailing list.

As far as I'm concerned, a github pull request is the same thing as a patch sent via private, personal e-mail. I'm going to review it very carefully, and only apply it if I know that it is clean. Typically the review involves extracting the commit as a diff, and as has been discussed in other places, github doesn't encourage properly formatted commit descriptions, so I end up needing to extract the diff, and then reapply it with an appropriate commit description. So for me github pull requests generally represent negative, not positive value, because of the quality control and quality assurance requirements of the code bases in which I work.

The style of review you mentioned (which I made a brief mention of in the article) still falls within the three merge options. In the case above, maintainers seem to be doing option b or c, depending on trust. Instead of signing the merge commit, the pull request is signed. One of the downsides to this is that it does not extend the signature to the repository and requires the users to consult the mailing list. That said, I'm rather fond of the Linux and Git style of code review and would recommend it in addition to what was recommended in the article (with the former only being necessary if its benefits are desirable).

Thank you for the detailed explanation; it did help to clarify details I assumed, but was uncertain of.

No, there's no need to consult the mailing list, except if you want to see the "legislative history" of the review process.

The person requesting the pull signs the pull request by signing the git tag. The GPG signature on the git tag covers the complete history of up to that point in the tag, so there's no need to sign each and every git commit, as you propose. When the upstream maintainer (or Linus, at the top level) accepts the pull request, git will automatically create a merge request that contains a copy of the git tag's GPG signature in the merge request. (After this point, the git tag can be discarded). When Linus releases something, he signs a git tag for the overall git repository, and everyone can check that.

So for example, when I am satisfied that everything in the ext4 tree that I want to push to Linus is OK, I create a signed tag named "ext4_for_linus", and send him a git pull request, which is created using the "git request-pull" command. This creates an e-mail message contains the URL of my git repository and the name of the signed tag. When Linus accepts my pull request, the resulting merge commit will contain a copy of my GPG signature from the git tag. This signature signs my name to the commit id of the git tag, and the chain of SHA-1 hashes inherent in git guarantees that the entire commits back to the beginning of time came from me. Of course, my changes will branch of something like v3.3-rc2, which is in Linus's tree, so what is being signed that is of interest is my new changes between v3.3-rc2 and the tip of the ext4 tree.

Once Linus has accepted my pull, I'll reuse the ext4_for_linus name for the next GPG-signed tag for the next 2-3 month development cycle, since the important bit (the GPG signature) is preserved for long-term posterity in the merge commit. But what's important to note here is that the signature in the merge commit is signed by me, since it contains the contents of what I pushed to Linus, even though the merge commit was created by Linus when he accepted the pull request by merging in my changes into the repository. What Linus signs is the GPG tag for the actual released kernel (i.e., v3.3).

See? So there's no need to refer to the mailing list to validate the pull request or to establish after-the-fact accountability. The mailing list is just used for review purposes. But of course the review is a critical part of the acceptance process.

> the resulting merge commit will contain a copy of my GPG signature from the git tag.

Ah, I see. Thank you for the clarification.

It was not overall integrity of the trees that I was bringing into question. While the code review process may be a problem for certain organizations, I am not doubting that process for the kernel. With such a process in place, signed tags are likely to be sufficient, yes.

Note, however, that one of the points I was trying to make with the article was proof of authorship (to prevent someone from committing as you, much the same way as you would want to prevent someone from fraudulently sending an e-mail or message through any other means). It is for this reason that the workflow you described is not a complete alternative, but would be an excellent means of augmenting the review process for any project.

That said, author identity is not a concern for most projects, in which case, based on the quoted portion of your comment above, my arguments do not necessarily apply (assuming that all commits that are not yet ancestors of a tag are ancestors of a signed merge, as mentioned in your comment).

Might I ask how the maintainers deal with issues of identity? If they encounter a commit in your pull request that appears to be from a well-known contributor, do they verify his/her identity before merging, or do they not care?

We do have a ring of trust which we established recently, which is rooted in cross-signed keys belonging to Linus Torvalds, Dirk Hohndel, Peter Anvin, and myself. We all know each other quite well, both in person and via conference calls, so that was easy to set up after the kernel.org security breach. All keys in the key ring is signed by at least two, and preferably more, other trusted keys, all leading up to the root keys.

But that's only important from the perspective of knowing that a particular public key belongs to a person, which in turn is only really important when a maintainer wants to know whether or not the person who signed the git tag really is the submaintainer who is claiming to have signed the git tag.

Most of the time, we don't really worry about identity; what we worry about is the correctness of the commit. Especially for patches being sent via e-mail, I really couldn't care less whether the identity of the person sending the e-mail was Christoph Hellwig, or his evil twin. I'm still going to review the patch very carefully, because we all make mistakes. That's my job as the maintainer; to ultimately be a crap filter, and to reject crap, no matter whether it's written by someone famous or some nobody.

So when I send a pull request to my upstream maintainer (which in the case of ext4 is Linus), he really doesn't care about the identities of the authors in my pull request, because it is my reputation which is on the line; I'm the one who signed the git tag pointed to by the pull request. Just if it gets by me, and then it gets by Linus, it's ultimately Linus's reputation which is on the line.

So that's why your question is a bit ill-formed. Someone who is above me in the maintainership hierarchy isn't really going to care much about the identities of those who are below me in the maintainership hierarchy. The trust relationship is fundamentally between me and Linus, and then I have trust relationships between me and those developers below me. So Linus doesn't care about the identities of those which are two levels below him in the maintainership hierarchy, so long as he trusts me to have done my job properly. Of course, if Linus discovered that I had been blindly accepting pull requests from my downstream without doing proper quality control, then he'll yell at me and he might refuse to take future pull request from me (or at least threaten to do so). But normally, there's no need for him to verify the identity of the authors of commits in my pull request; he only needs to verify me, and to verify that he trusts my technical judgement and quality control checks and processes. If decides he doesn't trust me or my technical judgement then he won't accept pull requests from me; and in that case he would be absolutely right to do so.

So what does this means is if someone compromise their GPG key (more or less easy) and get his passphrase (not difficult) they can sign all kinds of commits and make it look like he did it. Great

Security is only as good as the weakest link.

Also, don't make it look like it's more secure when it may not be

From what i've read, the problem is that an attacker can add commits with forced Author information into a central repository to frame somebody else.

Wouldn't the signing of all commits as they are committed solve this problem? (ie. rather then trusting Author information from the commit, trust the signed-by information to give author information?)

Note that Signed-off-by (as added by -s) is different than -S (GPG-sign). The -s option simply appends the "Signed-off-by" line to the commit message, so this can also be forged.

The GPG signature cannot be forged (access to the private key is needed).

Thank you for taking the time to write this up in such detail and for making it an enjoyable read too.

I can't read this. If there's a vulnerability in the sequence of events described at the beginning, it needs to be called out more explicitly. If I wanted a mystery, I'd pick up some Agatha Christie.

This part bothers me:

git log --pretty="format:%H$t%aN$t%s$t%G?" "${chkafter:-HEAD}" --first-parent \ | grep -v "${t}G$"

What of the case where some smartass decides to

$ git commit -am 'Yet another foo G'

So long as it doesn't have a bad signature, it will give an empty string, right? So how do I know where the G came from?

I do appreciate the attention to detail. Fortunately, even if "G" were appended to the commit message, grep would fail to match (and remove) it: a tab is output directly before %G? (which, as you mentioned, could be an empty string). This ensures that there will always be characters after the subject.

That said, it could be terribly confusing when viewing the output. For that reason, perhaps another character after the tab (before the %G?) would be a good addition.

Is there no way to sneak a tab into the message?

Indeed there is, but there will always be a tab after that tab. Consider this commit message:

This would result in the following output:

  ... Foo\tG\t

Ok, that makes more sense. It's hard to see whitespace, so I didn't notice a tab lurking just before the newline.

Wait... if I can sneak a tab in there, what if someone hid a newline in there, too?

Would you not end up with

And break the regex? Or is there absolutely no way to smuggle a \n into the message?

%s inserts the "subject", not the entire commit message. The subject is considered to be the first line of the commit message (I forget if there is a character limit), so a newline would simply be considered the delimiter (and would not appear in the output).

Otherwise, you would be correct --- that would have rendered the check useless.

Ahh, you're right. I get a little paranoid when dealing with user-supplied strings, but usually that's a good thing :)

Absolutely; your critical thought and inquiry is a very positive thing.

Wouldn't this be solved with a merge commit? If I can see where a commit merged in, I could prove that I didn't author it, as I don't have access to the other author's machine.

Very good and long read; should wake up a few people

An interesting article, but I would prefer the author presented the issues first in a VCS agnostic way (as they are not specific to git) and then introduce the hypothetical scenario using git.

Actually these issues are not even specific to VCS, but security of most digital information.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact