Hacker News new | past | comments | ask | show | jobs | submit login
Forging signed commits on GitHub (iter.ca)
308 points by rokkitmensch 7 months ago | hide | past | favorite | 117 comments



GitHub closing any security issue by default sure is a thing.

I reported an issue that was not fun to reproduce in their SAML implementation, where they would include organization membership in OAuth tokens even if you didn't have a SAML session, so if you did device posture in your SAML provider to protect from access to code, installing apps that allow you to get code (e.g. SonarCloud) would allow bypass.

The weirdest thing is that they had an endpoint that did check if there was a SAML session (membership API would 403 without), but I still got told this wasn't a bug and within expected behavior (not for our CISO it wasn't, and it's still not documented).

Shoutout to Tailscale for sending me some stuff despite it not being an issue in their implementation.


> Since the first author name is zero characters long, the regex skips that line, and the fake second author line is used instead. Git ignores extra author lines after the first, so Codespaces looks at the second author line but Git looks at the first. This means we can create GitHub-signed commits with any author name+email.

Classic weird machine exploit of looking at two independent ad hoc parsers and finding inputs where they desynchronize and enter different states.


One crucial thing that enabled this work is being able to decompile the enterprise version of the GitHub binary. It makes you think that distributing a binary of your app is either a good idea or bad idea, depending on how you view security. If your app is important though then it’s definitely a good idea.


At the opposite, thanks to that they now have a safer platform.

At the opposite, there could still be nefarious insiders that discover that and abuse of it or resell the exploit for years without anyone noticing.

Especially for a proprietary service, as a customer, for me it is an added bonus point to know that the thing can and is battle tested by outsiders...


> [was] being able to decompile the enterprise version of the GitHub binary.

> It makes you think that distributing a binary of your app is either a good idea or bad idea,

> If your app is important though then it’s definitely a good idea.

I don’t follow. What are you saying would have been the good idea here?

It also isn’t clear to me that they required the code except to answer why it works, certainly this parsing defect could be discovered simply by testing the endpoint.


That distributing a binary app was a good idea. The attacker found and fixed the vulnerability.

Of course it wasn't required to find the vulnerability, however if you make it then it's more likely that good guys will find and fix stuff. The contrary would be it's difficult to find this stuff, so then only highly motivated bad guys do it, because at that point 10 grand might not be worth it.

Again, this was all prefaced with it depends. Some might see it as making it easier to find problems, and others might say 'yeah, exactly.". It's just an opinion. A fact is however that a binary makes it easier. Source code would be even easier.


But what alternative are you suggesting? Not having apps?


if you don't offer a self hosted version you don't have to distribute anything. just force people to use your cloud product


This looks like a classic case of a parser mismatch vulnerability [0].

[0] https://www.brainonfire.net/blog/2022/04/11/what-is-parser-m...


Discussed on HN:

https://news.ycombinator.com/item?id=38743029 (22 comments)


I treat "signed by GitHub's verified signature" equivalent to unsigned. It's not the author's signature, just some third party's.


I think it adds value. It means that the commit was created using the author's credentials (to the extent that you trust GitHub and its lack of bugs). So the author can be trusted to some extent on these commits.

Although extending this signing ability to arbitrary commits via the codespaces API seems weird as the commit is no longer generated by GitHub. It seems now that you could generate some fake merge that changed data in a surprising way. Previously you knew that merges generated by GitHub would be "clean". Not that this changes much in practice.


In particular it means "this was committed via the GitHub Web UI" and "the author was authenticated to GitHub". But the latter part is not really any different from who pushed the commit. And clearly there is no value in this as long as GitHub doesn't make this feature more secure. Using regexp to parse the author line then ignoring author lines that don't match... yikes.


I realized there is one benefit that this offers: Github attests to the time that it happened.

Supposing you're looking at a popular repository, one where a malicious commit would likely be noticed eventually. The last commit was "one month ago". What's to say someone didn't compromise the developer's computer, sign a malicious commit backdated by a month, and push it to Github? If the last commit was made via the Github UI, you have pretty good assurance (i.e. as much as you trust Github not to get hacked) that this didn't happen.

Even better if the previous commit was done by the author, and the Github UI commit is trivially confirmed as safe. That way, you can confirm the author's commit locally, in case Github is the one that got hacked.

If both the author and Github got hacked, :shrug: I guess that's a pretty skilled adversary.

Caveat: All of the above is my own analysis. I'm curious if there are flaws in my thinking here.


But how do you know who pushed a given commit? I don't think it is recorded in the Git repository.

I agree that "to the extent that you trust GitHub and its lack of bugs" is a big caveat. But it sill seems better to have this information than not to have it.


Who pushed the commit, IIRC, is metadata that's not on the Merkle hash tree -- it can't be on the Merkle hash tree without there being a commit for the push of the other commit, since anyone can push, but an authored commit is immutable.


The badge is also displayed if the commit was signed locally by a GPG key whose public component is uploaded to GitHub by the the claimed author and committer.


Clicking on the badge shows a different message, however:

> This commit was signed with the committer's verified signature.

vs

> This commit was created on GitHub.com and signed with GitHub's verified signature.


> It means that the commit was created using the author's credentials (to the extent that you trust GitHub and its lack of bugs).

This is incorrect. If I create a merge request, and then the maintainer clicks the "squash and merge" button, that commit is associated with me, even though I'm not actually the creator of that commit at that point: the maintainer is. I believe this happens even if someone else (e.g. the maintainer) pushed commits to that merge request before squashing them together.


One should treat git itself as insecure. When you sign a commit, the signature is based on the commit message and commit metadata including tree object_id and parent object_id. But if sha1 is used to generate commit hashes, one can theoretically forge a commit. This means that in an elaborate supply chain attack, one could spoof a commit with a valid signature. That signature would then still appear valid for the spoofed commit and probably make it seem more legitimate than it is.


For people who might be wondering why git hasn't moved to sha256 yet, here's a lwn article on it: https://lwn.net/Articles/898522/


For a more recent bit of info on it (I remembered this thread from the latter half of last year on the mailing list):

https://lore.kernel.org/git/2f5de416-04ba-c23d-1e0b-83bb6558...

The important snippet from this thread is:

> On 6/29/23 07:59, Junio C Hamano wrote:

> > Adam Majer <adamm@zombino.com> writes:

> > > Is sha256 still considered experimental or can it be assumed to be stable?

> > I do not think we would officially label SHA-256 support as "stable" until we have good interoperability with SHA-1 repositories, but the expectation is that we will make reasonable effort to keep migration path for the current SHA-256 repositories, even if it turns out that its on-disk format need to be updated, to keep the end-user data safe.

> That could be a different definition of stable. But I'm satisfied that current sha256 repositories will not end up incompatible with some future version of git without migration path (talking about on-disk format).

> So maybe my question should be reworded to "is sha256 still considered early stage, for testing purposes only with possible data-loss or can it be relied on for actual long lived repositories?"

> > So while "no-longer-experimental" patch is probably a bit premature, the warning in flashing red letters to caution against any use other than testing may want to be toned down.

> Agreed. I think it should be clear that SHA256 and SHA1 repositories cannot share data at this point. The scary wording should be removed though, as currently it sounds like "data loss incoming and it's your fault" if one chooses sha256

> - Adam

TLDR: SHA256 works well and can be considered "beta". It's viable but you can't really share data between a SHA1 and SHA256 repo. So if your repo doesn't need to merge to/from existing sha1 repos and you aren't dealing with subtrees or anything fancy like that, you should be able to use SHA256. It is not expected to break your repo, it's reasonably well supported, and in the event a migration needs to occur, there will be support for that as well.

So if you want to jump on the SHA256 train you absolutely can provided you are following a pretty normal/boring use case.


Note that while SHA1 itself is broken, git specifically has a code to detect attempts at collisions and prevent them, see all the sha1dc stuff [0]

[0] https://news.ycombinator.com/item?id=17825441


Yeah I've always found that to be frustrating. Someone squashes and merges MY commits, and github still shows them as validly signed, even though they aren't the commits I created. Sadly it's hard to actually filter those out: the UI makes them all look the same.


I don’t like how GitHub changes my committer to them in the name of being Verified. I don’t care about it being Verified by GitHub (yes, even if it had always worked and not allowed spoofing).

It’s weird that GitHub has such a long history of cooperation with the Git project and yet just rewriting the committer is considered okay to them.


It's common practice to have the committer not match the author (especially in environments where only patches/diffs are sent). I can't imagine the git project having any issue with that, given that the author field stays retained (ignoring this bug ofc.).


I tried to do some more research on this. Unfortunately the search combination

> rewrite commit github verified

is so insanely SEO-poisoned by (1) StackOverflow questions about how to rewrite commits and (2) GitHub’s docs about regular (built-in Git commit verification) commit verification.

But I thought about it some more. And I guess them hijacking the committer on pull request merges is not that insanely bothersome. It’s their platform after all.

Just yet another reason to not interact with forges in a way that affects Git itself.

But another thing I’ve noticed is that sometimes my committer becomes the stupid initial (legacy) email I used to sign up there. Why in the hell? My current email is already registered there. But I tested this now and at least I got the option to merge as one of my addresses. So I guess it was fixed?

> It's common practice to have the committer not match the author

When the committer and author are different people. And apparently when GitHub tries to insert itself as a pseudo-person.

I expected that my own actions would be tied to my own identity on GitHub. Not for them to shoehorn GitHub Verified™ into commit objects that I create through clickity-clacking around their UI. But it is after all their UI and they can poison commits however they like since they create it (i.e. I didn’t create it and send it to them; then I would have noticed if they tried to rewrite it). So shame on me.

> especially in environments where only patches/diffs are sent

Clearly doesn’t apply here.

> I can't imagine the git project having any issue with that, given that the author field stays retained (ignoring this bug ofc.).

Given that they don’t use GitHub for anything directly related to Git (PRs and such) and that they have their own way of indicating code provenance: No, I guess they have no reason to care. (Other than in spirit.)

But it matters in general (in spirit) that if I pick up some patch by Jack Cooper and apply it then the committer is me. Not outlook.com. Or whatever program did it for me.

The commit object has a few metadata fields. I’ve never seen anyone say, “Oh yeah sure, just use that field for whatever it’s not like it matters anyway”.


If you contribute to the Git project itself, or other major git-using projects like the Linux kernel, you’ll get completely accustomed to having your commits committed by someone else.

Having a long history of cooperation with the Git project should only increase your confidence in doing that.

Also, don’t use the web UI to commit if you want to sign with your key, simple.


I know that Junio C Hamano applies all patches himself to git.git and thus is the committer on all commits.[1] That is how Git works when picking out patches from a mailbox.

The are all committed by people, not some middle man program.

Just like I can click the big merge button myself and then it is committed by... oh by mr noreply?

Try to find some committer or author in the git.git project that has a name like “Verified By Yahoo! <jibber jabber hash nonense>”? Good luck. (Almost like that kind of forge silliness was never accepted into any of their workflows... yet they are somehow very comparable, in your mind.)

This is like trying to explain to some Outlook representative that no one cares if they “verify” emails that go through them with some proprietary DKIM headers that only they know and care about.[2] “Well actually, if you understood how email headers work then you would see that it is not at all unlike......”

> Also, don’t use the web UI to commit if you want to sign with your key, simple.

Thank you for the tip.

[1] With the few pull request by way of email exceptions

[2] Made up but not implausible example


You may have pressed the button, but the actual commit action was done by GitHub's infrastructure, hence they are listed as committer. Think of it as an acknowledgment that maybe they were hacked and made to do the commit even if you didn't want it.

If you want to do the commit yourself, you can run `git merge` locally and push the result. They won't touch the commit (including the committer) in that case, because the commit hash must never change. I'm not sure if they track who did the push (or even what the sensible value would be if two people pushed the same commit to two different forks).


You might have pressed in the command, but the actual commit action was done by `git am`, hence why “Git Am” is listed as the committer.

Now let me tediously explain how commit hashes work even though you already alluded to it in a previous comment.


The web-flow signing system is for users’ convenience in places where it’s not feasible to sign the commit with their own private key: commits made in the web interface or on an ephemeral GH-provisioned VM (codespace). For the latter, you are free to send your own private key to your codespace so you can sign your own commits but GitHub cannot because they don’t have your private key and don’t want to have it. Defaults matter and signed commits are important.

As a sibling notes, this use case and similar ones is the reason the committer field exists as distinct from the author field. I think a $10K bounty for this bug speaks to how seriously they stand behind the fact that they will only sign and mark as verified commits whose author field matches an authenticated user.

(Disclaimer: former GH employee)


> The web-flow signing system is for users’ convenience in places where it’s not feasible to sign the commit with their own private key:

Who signs all their commits? Joey Hess maybe? There are certainly others. But I’ve never seen anyone make a case for this. In fact only negative cases since it just encourages you to automate your signing process, which many are not comfortable with.[1]

I’m not important enough to sign anything.

On Bitbucket we push the big merge button and out comes a commit with the correct person attributed to it.[2] Even Atlassian manages to do this the correct way.

> For the latter, you are free to send your own private key to your codespace so you can sign your own commits but

Yeah GPG/SSH sign commits... who cares. Most people don’t.

> Defaults matter and signed commits are important.

I don’t care about your opinion.

I wouldn’t mind if this was an option that I could opt out of. (I’m wondering out loud, not asking you or anyone else.) I just haven’t heard of it yet.

I’m a Git user after all so I’m used to changing bad defaults.

> As a sibling notes, this use case and similar ones is the reason the committer field exists as distinct from the author field.

Quite a leap to go from attributing emailed-around patches to the correct author while also maintaining the committer (like the maintainer) to what looks equivalent to Norton Antivirus junk output stuffed 40 lines into someone’s email signature.

> I think a $10K bounty for this bug speaks to how seriously they stand behind the fact that they will only sign and mark as verified commits whose author field matches an authenticated user.

“I think the price they put on this SPOOFING vulnerability speaks to how serious they are about verified commits”, they said without irony.

“Sent from my GitHub”, ah they all felt at-ease immediately... wait the same platform that had a spoofing vulnerability?

[1] Well, allegedly. I have never signed anything so I don’t know.

[2] They committed it too. Or wait. Was that the merge button?


It's wild that we're in 2024, and there's a security issue that is essentially "we parse data in an ad-hoc format with an ad-hoc parser based on regexps, and we got that wrong". And the solution is "we tweaked the regexp", and not "we properly specified/documented the format (or replaced it with something properly specified), and wrote a parser that correctly implements the specification".


Even if I had to use regex, I'd at least do it in 2 stages:

- Extract all "author" lines, fail-fast if there is more than one

- Parse the line, again fail-fast if there is anything strange

this would just stop this, and similar, attacks completely. Postel's law has no place in modern development if we want things to be secure.


> In computing, the robustness principle is a design guideline for software that states: "be conservative in what you do, be liberal in what you accept from others". It is often reworded as: "be conservative in what you send, be liberal in what you accept". The principle is also known as Postel's law, after Jon Postel, who used the wording in an early specification of TCP.

I didn't know, so if anyone wants to know, here's the summary from Wikipedia.

https://en.wikipedia.org/wiki/Robustness_principle


It's a tough balance. Postel's law is from a world before security mattered. It is incompatible with correctness.

The early Internet definitely wouldn't have worked without a thick layer of "you set this flag wrong in the header but I know what you meant". But these liberal formats have been the death of many a "0 remote root holes in the default install" OS slogan ;)

Git, meanwhile, would be something like:

    $ git clone example.com/cool-code
    FATAL: commit abc123: 2 errors: 
        header.author: too many authors in header
        header.author[0]: empty friendly name
    checkout aborted, contact the upstream git repo and tell 'em it's broke  
People would be mad and would stop using Git, so "git clone" has to accept whatever.

But the signature generating-code doesn't need to be this liberal; it can run this validation and say "I'm not going to sign that", and nobody would be mad. Github's implementation is just a shortcut; some engineer tried their flow against a test commit they made, it worked, they checked it in, they shipped the feature. Then someone thought "what if there are two author lines", and broke the signing code. (Maybe write some fuzz tests that emit headers and try signing them? You might not find this bug, but it can help.)

The problem is the missing spec, but specs are useless if not enforced, because people won't know that they accidentally violated the spec. That's what's tough. Postel's Law guarantees working software. But sometimes working means failing open. (Failing closed can definitely suck. Ever been late to work because of a "signal problem"? The signalling system failed closed and prevented your train from moving even though there wasn't any actual reason to prevent movement. Postel's Law would say "it's just a burned out light bulb; power ahead at line speed, not expecting a train randomly misrouted and barreling into you head on". 99.9% of the time, it's right! But there are a lot of dead 0.1%-ers, which is after the early days, we decided to make the system fail closed.)


>Postel's law

Also known as "make other people's incompetence and inability to conform to a spec your problem."


Yeah that “law” makes absolutely zero sense to me. It’s like “be nice” but in a self-sacrificial way which then indirectly sacrifices other people in the long run.


It's from an era when interoperability in a then-very-small universe was very important to moving the Internet forward. Back then there were no spammers, and very few bad actors. It's possible that the "Postel law" helped for a while -- hard to say for sure though.


You're not wrong, but sometimes you have to be pragmatic. I've done a lot of work building interfaces to acquire clinical data from healthcare provider organizations for care quality improvement. There are clear industry standards from HL7 (V2 Messaging, CDA, FHIR) but many provider organizations don't implement them correctly, so we had to accept a lot of junk and just find a way to deal with it. Many of those provider organizations lack IT resources and are at the mercy of their vendors or consultants, so trying to hold them to strict standards results in an impasse.


It makes sense for a lot of situations purely out of practicality. If a good portion of your customers use tools that break spec, your choices are to accept the brokenness or lose the customer.

Take browsers, for example. A lot of people complain that browsers try to work around broken certificate setups on servers, but if they didn’t, users would just switch to browsers that did.


> It makes sense for a lot of situations purely out of practicality. If a good portion of your customers use tools that break spec, your choices are to accept the brokenness or lose the customer.

Practical? So the adage is just “be liberal because you have to out of practical necessity”? Are you sure that’s the history behind it?


Idk it makes sense for UX sometimes. Really it’s a case by case basis and I don’t think you should be adhering to any law for this.


This still has the same problem. Your solution requires them to program the first "Extract all author lines" step to accept both lines, namely they would have had to make sure their extractor considered author lines with a zero-length author name to be "an author line". But if they had the foresight to do that they could just as well have done it with their original regex to begin with.


I don't see how?

As I said, actually parsing the line is the 2nd step. The first step is to extract all lines which start from "author ", without checking what the rest of the line looks like. That could be something like starts_with?("author "); or split by first space and check if first word is "author"; or verify against "^author " regex; or take first 7 chars and compare against "author ". All of those methods would easily catch the duplicate line in the attack and reject it.

(you might be curious if simple "author", without any argument, would be bad. First, even if it was, the impact would be pretty low as there is no second name to inject. Second, there is no need to worry: git actually checks for space after "author" [0], simple "author\n" by itself would be rejected)

[0] https://github.com/git/git/blob/e02ecfcc534e2021aae29077a958...


>The first step is to extract all lines which start from "author ", without checking what the rest of the line looks like.

I'm saying that this already requires knowing that "without checking what the rest of the line looks like" is an important consideration. It would be just as valid to define "extract all author lines" as "extract all lines that have `author <value>`, as they did.

Only by knowing that the latter definition causes the bug that the TFA found (either by thinking about it or by testing against the canonical git implementation) would it become known that that definition is wrong.

The problem is not the structure of the parser. The problem is that they didn't check that their implementation matches the behavior of the system it speaks to (git).

Edit: Also covered in https://news.ycombinator.com/item?id=39101419 / https://news.ycombinator.com/item?id=39101736 / https://news.ycombinator.com/item?id=39107945


Well, yeah, you have to think about security to write secure code.

Let's say you are working in Microsoft and you need to implement commit parsing, and all you have is few examples. Which of the statement below matches your thoughts the best?

1. I am sure no one will ever pass anything unusual or try to hack this, after all this is just an publicly accessible endpoint which protects important user data. I am going to go with solution that is fastest to implement, single regexp.

2. It's important that this function does not fail, so I am going to do my best to try to find a valid user email. If there are multiple lines or some lines are malformed, I'll just keep searching. I'll use a single regexp to make sure my code is robust to frontend changes.

3. Looks like relevant line is "author " followed by name and email in some format followed by datestamp... I know email parsing is extremely complex, so it's important to get this right. I wonder how they handle all the details - national characters in email, quotes in real name.... Oh well, I'll just grab the whole "author" line and implement the strict regexp check, and if anything is weird I'll fail the request. This may reject a user with weird name, but we can fix it when if the user complains. Oh, and I know git only allows single author per commit, so I am going to enforce this too.

4. All of the data samples are in the exact same format: "tree", "author", "committer", "gpgsig", space-prefixed gpg signature, empty line, commit message. And they all come from same origin (codespace javascript), so I don't they will change too much. I am going to hardcode this specific field order, and if there are any discrepancies, I will fail the request. The frontend team may get annoyed at me if they update git library and tests start to fail, but at least the system would be secure.

---

So I assume no one is going to argue for option 1 (although all those links you have posted are making me doubt)

Option 2 is what Postel's law is about. It was really popular at early internet, as it allowed much more compatibility between systems. I'd argue it is not a good idea in today's internet at all.

Option 3 is what I would do myself. Note I am not using any knowledge about git internals, this is all just generic defensive thinking about network protocols.

Option 4 is what I'd do in really secure systems where I control all parts. Could be a bit of overkill for the github case though...


It makes sense when you want rapid adoption, as was the case with the Internet.


There's nothing inherently wrong with using a regular expression here.

The security issue is that there are two different parsers and they interpret invalid input differently. This is a parsing differential vulnerability.

This security issue can (and does) occur whenever there is more than one parser implementation.


git treats objects as a generic structure ("list of space-separated name-value pairs, maybe with space continuations, followed by double newline, followed my message") followed by additional rules for individual values. Something very common in older protocols -- see HTTP, deb control files, email, etc...

And github code decided to short-circuit the process and combine two steps into one, which introduced weird inter-layer interactions where lines with good name but invalid content would be silently rejected.


Especially wild for GitHub, I would have thought they had the entirety of 'git foo' codified up with actual parsers by now, but I guess a regex will always do..


This doesn’t surprise me at all coming from github. I’ve tried using their public api’s, and they’re all shockingly bad. In some cases I thought it was my mistake, but then I checked their website, and found it was bug for bug compatible with the thing I’d built.

My usual rule of thumb for choosing products is to go with whatever rising alternative there is to the entrenched monopoly. The little one needs to be much better to overcome monopoly effects.

I don’t sign my own paycheck, but would be moving to gitlab or something if I did.


Well the user-git-facing parts of GitHub are built on libgit2 and libssh.


I must be missing something because the corrected regex (/\Aauthor ([^<]*)[ ]{0,1}<(.+)>/) seems to behave the same as the problematic one:

  https://regexr.com/7qrum
I wonder whether github may not be using PCRE.


The problem wasn't that the regexp matched the second "author" line, but that it didn't match the first one.


This is the correct take.


I wrote a chintzy Preact app that pulls the source for a Pokemon Red hack from Github and makes Pokemon stats, moves, type matchups, etc. searchable [1] because enough has changed from the original that I wanted a quick reference for my current playthrough. I literally wrote in the TODOs in the README

  Generate proper AST from asm source and ditch janky Regex parsing?
because I felt guilty about doing things like

   const match = l.match(/\s*move\s*(\w+),\s*(\w+),\s*(\d+),\s*(\w+),\s*(\d+),\s*(\d+)\s*(.*)/)
Maybe I shouldn't be so hard on myself.

[1] https://github.com/alexjplant/rgbdex


A proper AST would "read" the source code programmatically, as opposed to brittle regex? Is there any benefit to building that when your regex works and what you're parsing is static? (Genuine questions, my compsci background is not strong).


Really nice and educational exploit. Regexes should almost never be used in parsing structured stuff in production. While it is developer friendly, it's very hard to think all the edge cases.


Disagree here, the problem is their implementation was slightly different to Git's. There is more chance of stuffing up like this with a handrolled parser than with a regex. The only viable alternative i'd accept here to actually reduce risk is to call into libgit itself. Absent that, a regex is appropriate for this problem.

There are many problems where regexes are succinct and appropriate, (of course there are many more where they are neither).


Unless git uses regex, it is almost impossible to verify that the regex is same as their validation. What if git's validation changes. Even looking at the corrected regex I am not sure what would happen for strings like '<a@a.com>>' or '<' or 'x <a>'. Not saying the github's corrected regex is wrong, but it is a mental challenge to verify it.


This isn't really a regex problem though. The regex probably did exactly what the dev intended, it's just that the dev failed to take into account that <author> could also be a 0 length string.

The same logic bug would have occured with a parser.


This is all speculation of course but a parser is more likely to recognize the author: prefix and then error out on anything unexpected in the rest of the line. Meanwhile regular expressions don't have any way to report errors.


Personally I’d have separated finding the Author line from pulling out the contents of that line - but still used Regex


When the data itself is just plain text, are there any other (better) ways?


Parsers?


I don't see how writing a parser from scratch would mitigate bugs vs using a regex parser. Parsers are just a hot spot for security bugs that should get extra scrutiny.


Regex is the parser for regular languages.


What I tought, they should have used some more official library to manage git instead of doing the work themselves, but perhaps there was none?


We have libgit2, but github have their own ruby implementation of everything.

(java and rust have their own "native" implementation too.


libgit2 was written after github. So that might explain.


libgit2's frontpage claims that GitHub does use them: https://libgit2.org/


Depends which part of github uses them. Might even be their GUI client…


Are there alternatives that make it easier to think about all the edge cases?


I think re-implementing the functionality is the mistake here. A big counter-argument to “rewrite in rust” is usually that by rewriting you introduce new bugs.

Especially for security critical things one should re-use the implementation to avoid issues like the above. Proving both the original and re-implemented parser to be equivalent would probably also work, but not sure how practical.

And in any case have competent people audit what you did.


How did we get from criticizing regexes to mocking “rewrite in rust” in the span of three comments?


I’m not mocking it. I’m just applying the argument I see many people on HN make in discussions about rust to this case, where I suspect many will lean much more on the side of using regexes to reimplement parsers. Don’t know if that’ll be the same people. Either way, I didn’t want to have a discussion about rust.


> I’m not mocking it. I’m just applying the argument I see many people on HN make in discussions about rust to this case, where I suspect many will lean much more on the side of using regexes to reimplement parsers.

Hmm. Okay.

The part that bewilders me is that Rust showed up. Which would have made sense if someone was complaining about C++ or something, or maybe unsafe memory management. But regexes?

> Either way, I didn’t want to have a discussion about rust.

Which is why you brought it up unprompted... I give up.


Okay, I’ll give you one more thing: I love using rust. I also use zoxide, ripgrep, dua, etc.

I don’t hate the language. Quite the opposite. I hope you can now go back and just see the argument for what it is and not for what you thought I was insinuating.


Hey, watch what you’re saying buddy, ‘rewrite in rust’ is what pays the bills in this family!


Keep using regular expressions, just stop using . and instead, define what is allowed.

You're expecting an input to be a UUID? Check it with ^[a-fA-F0-9\-]{30,40}$ and you know you're not getting any apostrophes or script tags or enormous inputs or empty inputs or newlines or lookalike characters or emojis.


The fault is still present in that, you shouldn't just regexp "look for needle in haystack" in a single phase when you're supposed to extract key-value pairs and operate on the author key.

What happened here was that the Github employee programming this didn't bother to read the spec and/or think.


Relying on regexes instead of thoroughly tested parser AST's encourages vulnerabilities.

Are they really scraping logs for credentials instead of using an API?


I’ve always avoided blind signing endpoints for this reason. You’re reducing the authenticity guarantee by a thousandfold, and increasing so much complexity.

I don’t think there’s an easier way though short of not supporting signatures in codespaces, and this definitely feels like a feature where the Codespaces product team’s roadmap was prioritised over the Security team.


It would be better to generate a user-and-codespaces-specific private key that would be safe to give to the user by checking out in the codespace so that you can use normal GPG signing flows.


Now that SSH key signing is a thing, you might be able to use SSH agent forwarding to sign commits without having to hand over your private key.


...GPG had such a feature for ages: it's AFAIK largely a result from very modular/flexible smartcard/HSM support.


> Since the first author name is zero characters long, the regex skips that line, and the fake second author line is used instead. Git ignores extra author lines after the first, so Codespaces looks at the second author line but Git looks at the first.

How common is it to have multiple author lines? I haven't seen that before, only co-author footer[1].

[1]: https://docs.github.com/en/pull-requests/committing-changes-...


Apparently multiple authors is invalid.[1]

[1] https://stackoverflow.com/q/65445147/1725151


Why does Github use the same GPG key for all users? Feels like it makes these kind of confused deputy attacks endemic to the entire platform rather than being contained to a single user account.


Signing commits is such a pain to set up. I'm convinced its such a boring area with so little profit that no one has spent enough time to make it more straight forward.


I tend to agree with Linus Torvalds' statement about signing tags rather than commits.

https://web.archive.org/web/20201111174438/http://git.661346...


What's hard about it? You just define `user.signingkey = $key_id` and `commit.gpgsign = true` in `~/.config/git/config` once and be done with it. Or only the first one if you want to choose what you sign instead of signing everything.

Or are you talking about setting up GPG in general?


1Password makes it easy to do with ssh keys and biometrics. It’s actually pretty slick.


you can sign w ssh key w 2 git configs set. im not sure this qualifies as pain


It's meaningless unless your keys are known to others to speak for you. I.e., you have to participate in the PGP trust mesh, and that is what is a pain.


https://datatracker.ietf.org/doc/draft-koch-openpgp-webkey-s...

Serve the key at a certain URL from a webserver on your email domain. Clients that trust the domain can trust the key that it serves.


or github.com/<username>.keys


isn't this what keybase is for?


Yup. This.


If you use a regex in your application you need to write lots of unit tests for it that include many negative test cases. Perhaps even look for a test case generation tool/library that helps automate this.


The better solution is to have a formal specification of the syntax, that the regular expression can straightforwardly be derived from and be compared with.


> April 24 2023: GitHub closes the issue, saying that being able to impersonate your own account is not an issue

I mean, fear & curiosity would have me testing it before closing. Wha?


As I understand things, everyone who runs a bug bounty gets an endless stream of useless reports telling them things like "your website sends the server header which makes you vulnerable to hacking, bounty please"

Therefore, they have very junior people assigned to triage bug reports. And those very junior people are in the habit of closing 99.9% of reports as not-really-a-security-problem.


Anecdote time, I discovered a bug where a bank app would continue receiving 2FA notification prompts even after the user terminated the session from another device (e.g. after realizing their device was stolen). The app was apparently still holding onto some kind of a valid token that could be used to approve 2FA requests despite being "logged out".

I submitted detailed reproduction steps, with a video clearly showing the bug. Triage claimed that this was "intended behavior" and as far as I'm aware, the bank didn't even see my report.


Haha yes, that's the norm. They don't care. Report to the regulator. Then they care. Find out who the regulator is in your country, and report to them. Kinda annoying, but when the regulator tells them they need to do it, it magically gets done.

Note that the regulator doesn't threaten with childish things like a fine of 1M USD, that's kids' playground stuff. They threaten to revoke their banking license, which will cost them 100x more per day :)


I've found that responding "Well, if that's intended behavior you don't mind if I blog about it then? The post will go up in a week." tends to help them take it more seriously.


Sorry, if that's your security process, it's broken. If you're microsoft and that's your security process, you have no excuse and should be embarrassed. However, I heard such stories about Microsoft's security process plenty of times before, so I guess that's what it is.


> "your website sends the server header which makes you vulnerable to hacking, bounty please"

Getting nightmare flashbacks at my last job.

My CISO insisted everything on our penetration test report get remediated. Even "Information" level items, like the server header. And port 443 being open. facepalm


It’s a soul-sucking type of work. At my company, engineers tend to blindly follow security tooling recommendations to the T without ever considering context. Quite frustrating. One example is hardening an internal-only, ten-monthly-active-users app as if it were public. Lots of misdirected engineering hours if you ask me, but go to tick that checkbox the auditor asked for!


Is there anything more pointless than signed commits? If your security team ever enforced it you're probably already pwned but may as well replace them just in case.


Why do you say that? It seems like they would be more important in the context of supply chain attacks.


If they were able to get your keys to push a commit or sneak code into your branch we are waaaaay past the territory where a signed commit can be trusted.


the fact they haven't published an analysis with every single case this was abused, even if zero, in the past is proof enough this was a feature to someone.


> GitHub fixed the issue by changing the problematic regex

Why am I not surprised.


(2023)




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: