- you have to tell git to use submodules for this to trigger (so `clone --recurse-submodules` or a manual `git submodule update --init`)
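The distinction can be demonstrated locally (throwaway directories, no network; `protocol.file.allow=always` is only needed on newer Git versions that restrict file-protocol submodules):

```shell
set -e
cd "$(mktemp -d)"

# build a tiny upstream repo that declares a submodule
git init -q sub
git -C sub -c user.email=a@b -c user.name=demo commit -q --allow-empty -m sub-init
git init -q upstream
git -c protocol.file.allow=always -C upstream submodule add "$(pwd)/sub" lib
git -C upstream -c user.email=a@b -c user.name=demo commit -q -m "add submodule"

# plain clone: .gitmodules is present but lib/ stays empty --
# the vulnerable submodule code path is never reached
git clone -q upstream plain
test ! -e plain/lib/.git && echo "plain clone: submodule not checked out"

# recursive clone: the submodule is fetched and checked out too,
# which is the step that triggered the bug on unpatched clients
git -c protocol.file.allow=always clone -q --recurse-submodules upstream full
test -e full/lib/.git && echo "recursive clone: submodule checked out"
```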
- credit for discovery goes to Etienne Stalmans, who reported it to GitHub's bug bounty program
- most major hosters should prevent malicious repositories from being pushed up. This is actually where most of the work went. The fix itself was pretty trivial, but detection during push required a lot of refactoring and involved many projects: I wrote the patches for Git itself, but others worked on libgit2, JGit, and VSTS.
unpatched hosting site -> in-house (patched) v2.17.1 --bare mirror ->
The protection in v2.17.1 only gets enabled by default if you're checking out a repository yourself, not if you're merely fetching and re-serving git objects.
Turning on receive.fsckObjects as the official v2.17.1 release notes suggest is not sufficient to protect against this attack. It needs to be transfer.fsckObjects, which also turns on fetch.fsckObjects, which is what's needed here.
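For a mirror like the one described above, the relevant setting can be sketched like this (using the standard `git config` names):

```shell
# On the fetching mirror: receive.fsckObjects alone only covers pushes
# *into* this machine, not the objects it fetches from the unpatched
# upstream. transfer.fsckObjects covers both directions.
git config --global transfer.fsckObjects true

# transfer.fsckObjects acts as the default for both of these:
#   fetch.fsckObjects    (objects coming in via fetch/clone)
#   receive.fsckObjects  (objects coming in via push)
```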
I should have clarified above, too: there were folks from GitHub, Microsoft, and Google working on the various fixes.
In addition to our recently implemented monthly non-critical security release process (we already had a critical release process before), we are making a number of changes in how we secure GitLab.com, which includes expanding our HackerOne program this year to be a public bounty program. As always, we appreciate the contributions of security researchers.
Not sure why the erroneous releases haven't been removed? Seems a bit confusing.
Chocolatey has the original 2.17.1, the 2.17.1-2 update is not yet approved.
Cygwin still only has 2.17.0-1. I wonder whether they were even part of the embargo.
I usually just track the repository's atom feed and download urgent updates directly from there (git-scm.com eventually links to the releases published on GitHub anyway).
Edit: The Git 2.17.1.2 Chocolatey package has now been approved https://chocolatey.org/packages/git/2.17.1.2
Storing config data outside the repo would not be a foolproof solution, but it would probably make things a little safer. (Having the <repo_root>/.git folder has always felt a little bit "in-band" to me, and I don't like it.)
Too many of us are so used to git clone'ing a repo and building the software with make or its descendants that we overlook the security considerations.
The issue here is that "git clone ..." allows for arbitrary code execution, so the flow of 1) clone, 2) analyze, 3) make breaks at step 1.
Looks like it's back to tarballs for me.
We've been totally conditioned to just wget some archive, unpack it and build it, and even if git clone takes it one step further, in day-to-day practice there is no difference between the two.
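The conditioned workflow being described looks roughly like this (local stand-in archive here; normally the first step would be a `wget` from some project's site):

```shell
set -e
cd "$(mktemp -d)"

# stand-in for the downloaded archive
# (normally: wget https://example.com/project-1.0.tar.gz)
mkdir project-1.0
printf 'all:\n\t@echo built\n' > project-1.0/Makefile
tar czf project-1.0.tar.gz project-1.0 && rm -r project-1.0

# unpack and build, with no inspection step in between --
# exactly the habit the comment is describing
tar xzf project-1.0.tar.gz
make -C project-1.0
```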
And depending on the sophistication you might want to do that in a chrooted environment or a VM.
It does not, actually. It's the submodule steps that create the vulnerability, and those have to be done manually. There are no standard or automated ways of pulling submodules, every project that uses them has its own scheme and provides its own build instructions. Frankly they're pretty obscure and in most communities replaced by tricks like npm's dependency management instead. It's a goof, and fixed, but even for the most naive users the exposure is fairly low.
It's true that it happens early in the process, but it's not true that a simple git clone command is a vector.
I’m pretty sure it would be trivial to hide malicious code in a tarball in a way you wouldn’t notice (especially if you’re not expecting it).
npm from a security standpoint feels a little like a house of cards.
Regardless, it still highlights the fact that everything is built on a rather fragile layer of trust.
The left-pad issue highlights how one developer unpublishing their code can break every package that depended on it. And "unpublish" can be substituted with anything from "publish broken code" to "publish actively malicious code."
Further, issues like the is-odd/is-even package popularity spike show how developers can develop minimally beneficial encapsulation packages and then insert them into other packages as dependencies to pump up their numbers. Well, what if someone's motivation wasn't to make their packages look important but to instead give them a way to inject code into all the locations (or maybe just one!) that run `npm update` or similar automatically on a daily basis.
Now, neither of these events (both of which actually happened) was particularly malicious. On a scale of "excessive use of service" to "full network worm ransomware" they're somewhere around "suspiciously sketchy." But the same problem can be exploited to cause real damage. Yeah, it very much reveals how fragile the web of trust in NPM is.
I really do hate to keep posting these links, but people keep bringing NPM up where they're relevant!
How they could unpublish their code. left-pad's specific issue is no longer allowed by npm and thus could not arise today. That's not to say other issues could not crop up, of course.
Of course, then the goal just becomes attacking that whitelist, and all the complexity that comes with that. Security is hard.
None of the use-cases I read are convincing enough to allow `git clone` to do anything but what its short man description says.
I'm not even thinking about security, just basic separation of concerns. If `git clone` leaves a script-hooked repo in an unusable state for building, I want to know up front so I can complain to the maintainer and get that problem fixed.
It can also be used for other cases where you'd like to amend what git does by default when updating the tree. See this recent thread where some users want to have mtime behavior on files that's different from git's defaults, and one way to do that is via a post-checkout hook.
git submodule sync && git submodule update --init --recursive
Also put it into `post-rewrite` so that the same works for `git rebase`.
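Putting the two together, installing that hook could look like this (demoed in a throwaway repo; in practice you'd run the `cat`/`chmod`/`cp` steps inside your own checkout):

```shell
set -e
cd "$(mktemp -d)" && git init -q

# hook that re-syncs submodules after every checkout
cat > .git/hooks/post-checkout <<'EOF'
#!/bin/sh
git submodule sync && git submodule update --init --recursive
EOF
chmod +x .git/hooks/post-checkout

# the same hook body works for rebases via post-rewrite
cp .git/hooks/post-checkout .git/hooks/post-rewrite
```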
Although I'm not sure those tools could 'find' and build a vuln, but there could be ways to analyze an algorithm and detect that it can do dangerous things it's not supposed to do. A little like how static analysis works.
I'm sure those tools are already built by the NSA at least, so they just have to peek into github repos, point out what code is vulnerable, give it to some developer to make an exploit. Done.
That way the NSA would clearly win the cyber arms race, and those pairs of eyes Torvalds was quoted about would surely be obsolete.
If you run code without trusting the author, you're likely going to have a bad time.
This is, of course, unexpected. And while you should perhaps raise an eyebrow if somebody you don't know asks you to recursively clone a repository that you're not interested in - this is indeed a problem and you should upgrade your Git client.
Given that running the code therein (if the owner of the repo is malicious) will hurt you, this doesn't do too much apart from having it happen earlier on :)
I guess one scenario where it could be a problem is if you were planning to clone untrusted code and read it all carefully before running it.
Sure, in theory, "unexpected and dangerous behavior" is par for the course in security research, and you isolate even data that you don't intend to execute if you suspect it is malicious. But, in practice, this is an easy mistake to make.
As another example, consider an automatic git mirror, or whatever the internal GitHub/GitLab/Bitbucket infra might do to move repos around, without intending to execute the code.
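A minimal sketch of that mirror pattern (local stand-in repos here; a real setup would point at the upstream host's URL, e.g. from cron):

```shell
set -e
cd "$(mktemp -d)"

# stand-in for the upstream hosting site
git init -q upstream
git -C upstream -c user.email=a@b -c user.name=demo commit -q --allow-empty -m init

# the in-house mirror: bare, never checks out a working tree, so nothing
# here looks like "running" the repo's code -- yet it still needs
# fetch-time object checking to be safe to re-serve
git -c transfer.fsckObjects=true clone -q --mirror upstream mirror.git
git -C mirror.git -c transfer.fsckObjects=true fetch -q --prune
```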
I've never seen this. It smacks of terrible practice that would not last long in a daily CI system. Once you have a good version, locking on to that version is one thing. Just randomly downloading new code versions for a daily build is impractical.
I'm not saying it's trivial, but more that people execute code from GH and similar all the time without reading/evaluating, and this won't do any worse than that.
(1) could be any shared service like gitlab.com, github.com, bitbucket.org, etc
I guess this will make some bug bounties for researchers who can find services that haven't patched quickly...
I don't think I've ever done that, even on the few places which have used submodules, but I guess some people do.
While I try not to run code I don't trust, I have much more liberal point of view when it comes to cloning it. I assume others do too.
I'd be interested in hearing how you establish trust in the software you run. Assuming you're using git for cloning software code, do you include libraries that are dependencies of the code you're running in your trust calculations?
echo "deb http://ftp.debian.org/debian stretch-backports main" | tee /etc/apt/sources.list.d/stretch-backports.list
apt-get -t stretch-backports install git
Guess when it's not your direct product this is OK.