What a garbage clickbait thread. From scary words like "attack", "infected", etc. you would think projects are compromised. But nothing is compromised. From wayyyyy down in the thread:
> The attacker creates FAKE orgs/repos and pushes clones of LEGIT projects to github.
Yeah, anyone can push anything to their own GitHub accounts/orgs, including malware. We know that.
From what I can see, he wrote the tweet after organically finding one of these via Google and then searching and finding that there were many many more.
It's absolutely true that the wording is wrong, but I think it's reasonable to accept a "jumped the gun" explanation rather than a clickbait one.
The presence of large volumes of project copies on typosquats and synonym squats is still a problem: they'll get indexed by tools, the tools boost their page rank, and eventually some make it to users. Given that the Go init payload contains an RCE and not just data collection, there is still something of note there. Yes, it's not 35k compromised projects, but it is a broad deployment of malicious code.
Yes, the scope is not "35k existing GitHub repos are infected", since AFAICT all the infected repos are forks, so the title is misleading.
However:
1. The scale is pretty worrying. Given the total number of repos on GitHub (> 100M) it's a drop in the ocean, but still huge.
2. Typo-squatting on, say, PyPI or npmjs is certainly noteworthy, and this is a very similar attack.
3. At least some of the infected forks had several stars, some from ~ 5 year old accounts, so apparently some people were using them.
4. The original Twitter thread did note that infected forks were being created — it just didn't emphasise that this was the only attack surface, probably because the author didn't realise.
And these aren't compromised projects, they are repos created by the "attacker" if you can even call them that. Of course anyone can push malware to their own account. The author admits this in the thread:
> The attacker creates FAKE orgs/repos and pushes clones of LEGIT projects to github.
Pure scaremongering and/or attention seeking.
Edit: Sorry, I posted two similar comments because my first top level one was immediately downvoted to the bottom. It has since come back up.
This is a consequence of centralization. The canonical project sites and repositories should not be on GitHub.
Fanatics who believe otherwise will still clone those projects so that they are on sacred ground, but the practice should be frowned upon and fought against.
Another detrimental effect of GitHub is that they have trained users to accept public "forks" (a misnomer) as the usual way to contribute even trivial patches.
This lowers the bar for accepting and trusting non-official repositories.
GitHub has devalued the brand of large projects and has introduced the age of industrialized software development by creating an addictive environment where software politicians thrive by manipulating their social networks and working on their personal brand.
This is also a problem for enterprises. I’ve seen commits from root, ec2-user, etc: GitHub knows who’s pushing a commit even if git doesn’t, and it’s maddening that at least for enterprise accounts they don’t carry that identity into the metadata.
That would change the commit hash, at least if you want it to survive a clone of the repo. If you stored it externally so that it could only be shown in the web UI, it would be of limited use, but maybe better than nothing.
I feel the commit data could be extended to include some metadata that isn’t used to compute the hash. GitHub could then make use of this data to populate whatever.
(Not sure if such a field already exists in the commit blob)
I’m somewhat familiar with how git works. In my understanding, a commit is just a blob combining the commit information and a tree blob, hashing them together to create a commit id.
This design doesn’t preclude the usage of additional information in the commit blob that isn’t used to compute the hash.
(Think for example how file access times do not affect its hash)
Git is a content-addressed object store, the address of any stored object is the hash of the object itself. So you actually can't stuff extra data into an object and not change its ID; this auxiliary data would need to go in a separate store indexed by object ID or a similar solution. The reason why file access times don't affect git hashes is because git does not store them.
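To make the point above concrete, here is a minimal sketch of how a SHA-1 git repository derives an object ID: it hashes a small header (`<type> <length>\0`) followed by the object's bytes. The author line, timestamp, and the `x-pushed-by` header below are made up for illustration; only the hashing scheme and the well-known empty-tree ID come from git itself.

```python
import hashlib

def git_object_id(obj_type: str, body: bytes) -> str:
    """Return the ID a SHA-1 git repo assigns to an object: sha1("<type> <len>\\0" + body)."""
    header = f"{obj_type} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()

# ID of the empty tree, which is the same in every SHA-1 git repository.
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"

def commit_body(extra_header: str = "") -> bytes:
    """Build a minimal commit object body; author and timestamp are illustrative."""
    lines = [
        f"tree {EMPTY_TREE}",
        "author Alice <alice@example.com> 1659500000 +0000",
        "committer Alice <alice@example.com> 1659500000 +0000",
    ]
    if extra_header:
        # Hypothetical extra metadata line, not a header git or GitHub defines.
        lines.append(extra_header)
    return ("\n".join(lines) + "\n\nInitial commit\n").encode()

plain = git_object_id("commit", commit_body())
tagged = git_object_id("commit", commit_body("x-pushed-by octocat"))

print(plain)
print(tagged)
print(plain != tagged)  # the extra line is part of the hashed bytes, so the ID changes
```

Because every byte of the commit object feeds the hash, any identity GitHub wanted to attach after the fact would have to live outside the object, keyed by its ID.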
Correct. My suggestion for a solution is for github to add a "reject-unsigned" feature. Only allow commits signed by <my gpg key> and <my email> to be pushed to github, under any projects/org.
1. What happens when someone needs to resolve a merge conflict involving your commit? Let's say I maintain a fork of an open source repo to add some feature, and I periodically merge back in upstream changes... that necessarily involves resolving conflicts. By default, git retains author ownership, and now the commit is unsigned, but it's really your work. What do we do? Do I have to use a custom merge flow that also rewrites authorship from "Alice <alice@gmail.com>" to "Alice <alice@gmail.com.fake.unsigned.suffix>"?
2. What happens if your gpg key is compromised or expires? Are all your previous repos now invalid? I can't fork it because it contains a commit authored by you, but with a revoked or expired gpg key?
3. What happens to previous commits if I enable this feature? Can all my unsigned commits no longer be pushed to github? I made a commit in a project at work 5 years ago with my email, but didn't sign it.. if that company wants to open source that project on github, do they now have to rewrite history to change the author on my unsigned commits?
4. What does the "squash merge" button on github PRs do for your PRs?
We've adopted this policy internally. It largely requires using the proper git workflow, rather than (in our case) gitlab's GUI flow, though we make an exception for a simple merge (I think it's verifiable that no code is added).
The check is that commits at the point of merge have a valid signature. Historical commits are part of the history and as such cannot be changed without an additional commit (with a valid signature). Previous unsigned commits are deemed trusted at the point you begin signing and checking.
Squash merge breaks stuff and shouldn't be used. To complete things, the restricted set of operations exposed through the GUI should sign using gitlab's or github's key (or some accepted bot key we've set), with the check happening on the input commits, but AFAICT that's not supported yet.
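A rough sketch of what such a check at the point of merge could look like, assuming it runs somewhere with git and the relevant public keys available. It relies on git's `%G?` log placeholder, which prints a one-letter signature status per commit (`G` good, `N` no signature, `B` bad, and so on). The function names and the policy of trusting only `G` are assumptions for illustration, not the setup described above.

```python
import subprocess
from typing import Dict, List

# Assumed policy: only a good, fully valid signature counts as trusted.
TRUSTED_STATUSES = {"G"}

def parse_signature_log(log_output: str) -> Dict[str, str]:
    """Parse lines of '<commit> <status>' as produced by --format='%H %G?'."""
    statuses = {}
    for line in log_output.splitlines():
        line = line.strip()
        if not line:
            continue
        commit, _, status = line.partition(" ")
        statuses[commit] = status
    return statuses

def untrusted_commits(log_output: str) -> List[str]:
    """Return the commits whose signature status is not in the trusted set."""
    parsed = parse_signature_log(log_output)
    return [commit for commit, status in parsed.items() if status not in TRUSTED_STATUSES]

def signature_log(rev_range: str, repo: str = ".") -> str:
    """Ask git for the signature status of every commit in rev_range."""
    result = subprocess.run(
        ["git", "-C", repo, "log", "--format=%H %G?", rev_range],
        check=True, capture_output=True, text=True,
    )
    return result.stdout

# Made-up sample output: one good signature, one unsigned, one from a revoked key.
sample = "aaa G\nbbb N\nccc R\n"
print(untrusted_commits(sample))  # ['bbb', 'ccc']
```

Passing a range such as `main..feature` to `signature_log` limits the check to the incoming commits, which matches the idea of treating older history as trusted from the point signing began.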
A blockchain could be useful for proving a commit was not done significantly before or after the claimed timestamp, and (depending on how you do it) non-repudiation.
IMO I don't see how this is desirable enough to be worth it.
Wider adoption of GPG signatures would be much more impactful.
It could make sense to use a blockchain for key distribution/key discovery/revocation, effectively as a replacement of PKI, though.
They have something under Settings > SSH and GPG keys where you can enable Vigilant mode.
While that still allows pushing unsigned commits, it will flag them with a warning badge.
I had this on for a while, but unfortunately, as some open source projects tend to rebase commits before pushing them, this was causing warnings to be shown (as the rebase breaks my signature). I turned it off again so as not to scare people who look at a project's commit history and see warnings after my contributions were merged in.
Good to know, I was not aware.
The squash/rebase issue is definitely problematic, though a tree of signatures could be appended to each commit. Now... this does break how commits are currently signed.
A git repo is practically a blockchain. Fixing this will require changing how git treats signatures, but no additional parallel architecture needs to be created.
I lost a Yubikey, so I revoked the subkeys on it. Github didn't let me update the existing pubkey, so I had to remove and re-enter it. Now all things signed by the revoked key, despite being before the revoke date, come up as unverified.
I'm not sure if it's a me issue or a PGP issue or a GitHub issue, but it's a pretty broken system.
How would that work for past commits? Would people be forbidden to mirror a project to git just because it contains an unsigned commit of mine from 2007?
It would only allow commits signed by me to be pushed under my email.
Github uses the email as the "proof" of commit ownership. By only accepting signed commits a user would not be able to push a commit impersonating me.
That solves impersonation, but that is not a related problem here.
These repos were not taken over but cloned and made to look like another repo via similar naming.
I think what you're looking for is more like "all accounts must be verified via payment/identity", so that you really know who is making "random clones" and "look-a-likes" w/ malware.
But you've got a whole host of other problems in the process.
While browsing the nanobox repo linked in the twitter thread I started to get 404s, so it looks like GitHub is on it. Edit: other repos have vanished as well now.
The author is being obtuse. They mean that clones have been made of those projects that include malicious code.
It's like if I make a copy of the New York Times website but replace the cover image with nudity and put it on a different URL and someone tweets "omg NYT has nudity on the front page" and clarifies, vaguely, 10 tweets down that it was actually not the real NYT but a clone.
I'm not sure whether the author is spinning it this way on purpose (i.e. for maximum emotional effect / retweets / internet points) or whether it just comes from being too close to the subject matter, but it's pretty misleading either way.
Riiight, that makes a LOT more sense. This would have been HUGE if the actual repos were infected and it wasn’t even at the top here at HN. I was very worried for a minute there, your comment has calmed me right down. Thank you!
How would code like this make it into so many repos? People accepting pull requests and not properly reviewing them? Or is there something even worse about this attack?
Considering that only clones are affected, your original tweet is downright wrong. None of the listed projects (python, js, bash, docker, k8s) are affected. Anybody can fork a repository to introduce malware.
TL;DR: These are forks by unknown people containing malware. I see no indication in the linked thread of even a single successful compromise actually occurring, or malicious code making it into legitimate upstream projects.
This is interesting. If you go to that user's profile, and look at the "contributions", there are none in July / August. Yet the commit is from two days ago.
I admire the desire to help, but looking at flow logs to that IP address is how people are going to determine if they have compromises in their environments. Excess traffic to the IP will just muddy the water.
If lots of software released today hasn't been pinning its versions on release (especially Electron apps) or signing its commits if it is open source, then this is a chaotic supply chain attack waiting to happen and is worse than I thought.
But really, it is yet another reason to avoid GitHub entirely and just self-host using GitLab or Gitea.
You may have misunderstood (understandably, because the tweets seem to be deliberately misleading). These are malicious commits in forks of repositories. There is no supply chain attack unless you make a habit of taking random forks of popular projects from GitHub and inserting them into your supply chain.
Spam is not a problem GitHub has ever had to seriously face so far but this sort of attack does seem like it could catch some users casually googling for libraries.
If you impersonated all these real repos, made npm, pypi packages for them etc and also updated the readme I think you could catch some people off guard.
The supply-chain attack is a self-inflicted attack if you're Googling a library and copy-pasting it as a Git dependency without so much as a glance at any of the numerous indicators that are screaming at you that it's untrustworthy.
It seemed pretty clear to me that GGP misunderstood this as malicious code being inserted into existing trusted repositories, which is a common misunderstanding in the rest of the comments, and seems to be encouraged by the poor wording of the tweets.
> The supply-chain attack is a self-inflicted attack
It is an attack regardless.
Someone has made something malicious which affects the process by which the end user acquires the final software.
> it seemed pretty clear to me that GGP misunderstood this as malicious code being inserted into existing trusted repositories, which is a common misunderstanding in the rest of the comments, and seems to be encouraged by the poor wording of the tweets.
I think the author just wanted to get attention and be sensational.
He deliberately did not mention that they are forks.
Just rushed to report findings.
> The attacker creates FAKE orgs/repos and pushes clones of LEGIT projects to github.
Yeah, anyone can push anything to their own GitHub accounts/orgs, including malware. We know that.
Save yourself some time. Flagged.