[flagged] Malicious code added to 35k GitHub repos, leaking user environments (twitter.com/stephenlacy)
215 points by pcmonk on Aug 3, 2022 | 76 comments



What a garbage clickbait thread. From scary words like "attack", "infected", etc. you would think projects are compromised. But nothing is compromised. From wayyyyy down in the thread:

> The attacker creates FAKE orgs/repos and pushes clones of LEGIT projects to github.

Yeah, anyone can push anything to their own GitHub accounts/orgs, including malware. We know that.

Save yourself some time. Flagged.


From what I can see, he wrote the tweet after organically finding one of these via Google and then searching and finding that there were many many more.

It's absolutely true that the wording is wrong, but I think it's reasonable to accept a "jumped the gun" explanation rather than a "clickbait" one.

The presence of large volumes of project copies on typosquats and synonym squats is still a problem: they'll still get indexed by tools, the tools then boost their page rank, and eventually some make it to users. Given that the Go init payload contains an RCE and not just data collection, there is still something of note there. Yes, it's not 35k compromised projects, but it is a broad deployment of malicious code.


Yes, the scope is not "35k existing GitHub repos are infected", since AFAICT all the infected repos are forks, so the title is misleading.

However:

1. The scale is pretty worrying. Given the total number of repos on GitHub (> 100M) it's a drop in the ocean, but still huge.

2. Typo-squatting on, say, PyPI or npmjs is certainly note-worthy, and this is a very similar attack.

3. At least some of the infected forks had several stars, some from ~5-year-old accounts, so apparently some people were using them.

4. The original Twitter thread did note that infected forks were being created — it just didn't emphasise that this was the only attack surface, probably because the author didn't realise.


The risk here is that somebody might download the fake repo, mistaking it for the real one.


Especially considering how easy it is for duplicate content to reach the top results on Google these days.


https://news.ycombinator.com/newsguidelines.html

> If you flag, please don't also comment that you did.


This code does more than leak environments. The go code pulls down arbitrary text and passes it to sh -c, example: https://github.com/zerops-io/zcli/commit/0396ee57bc0e5e0b123...
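For anyone auditing a checkout, here is a rough Python heuristic in the spirit of the commit described above. It is a sketch under assumptions: the comment only says the Go code fetches text and passes it to `sh -c`, so the exact pattern matched here (a Go `init()` function plus `exec.Command("sh", "-c", ...)` or `"/bin/sh"`) is a guess. Legitimate tools shell out too, so a hit is a lead, not proof.

```python
import re
from pathlib import Path
from typing import List

# Assumed pattern, part 1: a top-level Go init() function (runs automatically when the package loads).
INIT_FUNC = re.compile(r"^func init\(\)", re.MULTILINE)
# Assumed pattern, part 2: a shell-out of the form exec.Command("sh", "-c", ...) or "/bin/sh".
SHELL_EXEC = re.compile(r'exec\.Command\(\s*"(?:/bin/)?sh"\s*,\s*"-c"')


def suspicious_go_source(source: str) -> bool:
    """True if a Go file both defines init() and calls sh -c via exec.Command."""
    return bool(INIT_FUNC.search(source)) and bool(SHELL_EXEC.search(source))


def scan_tree(root: str) -> List[str]:
    """Return the .go files under root that match the heuristic, sorted."""
    hits = []
    for path in Path(root).rglob("*.go"):
        try:
            text = path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            continue  # unreadable file: skip it rather than abort the whole scan
        if suspicious_go_source(text):
            hits.append(str(path))
    return sorted(hits)
```

Usage would be something like `scan_tree("path/to/checkout")`. A payload that builds its command string differently would slip past a regex like this.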


note that it's 35,613 code results, not 35k repos

and 13K of the search results come from this org

https://github.com/redhat-operator-ecosystem


And these aren't compromised projects, they are repos created by the "attacker" if you can even call them that. Of course anyone can push malware to their own account. The author admits this in the thread:

> The attacker creates FAKE orgs/repos and pushes clones of LEGIT projects to github.

Pure scaremongering and/or attention seeking.

Edit: Sorry, I posted two similar comments because my first top level one was immediately downvoted to the bottom. It has since come back up.



This is a consequence of centralization. The canonical project sites and repositories should not be on GitHub.

Fanatics who believe otherwise will still clone those projects so that they are on sacred ground, but the practice should be frowned upon and fought against.

Another detrimental effect of GitHub is that they have trained users to accept public "forks" (a misnomer) as the usual way to contribute even trivial patches. This lowers the bar for accepting and trusting non-official repositories.

GitHub has devalued the brand of large projects and has introduced the age of industrialized software development by creating an addictive environment where software politicians thrive by manipulating their social networks and working on their personal brand.


I'm upvoting just for the hot take.


This is that thing where people can put anyone in as the commit author, thus impersonating the original creator, right?

Seems like the solution is "don't just copy random github urls into your code" ?


This is also a problem for enterprises. I’ve seen commits from root, ec2-user, etc: GitHub knows who’s pushing a commit even if git doesn’t, and it’s maddening that at least for enterprise accounts they don’t carry that identity into the metadata.


That would change the commit hash, at least if you want it to survive a clone of the repo. If you stored it externally, so that it could only be shown in the web UI, it would be of limited use, but maybe better than nothing.


Specifically it could be exposed through the GitHub REST API without impacting the commit itself.


I feel the commit data could be extended to include some metadata that isn’t used to compute the hash. GitHub could then make use of this data to populate whatever.

(Not sure if such a field already exists in the commit blob)


> I feel the commit data could be extended to include some metadata that isn’t used to compute the hash.

That's not how git works.


I’m somewhat familiar with how git works. In my understanding, a commit is just a blob combining the commit information and a tree blob, hashing them together to create a commit id.

This design doesn’t preclude the usage of additional information in the commit blob that isn’t used to compute the hash.

(Think for example how file access times do not affect its hash)


If it's not part of the information that's hashed to create the commit id, it's not part of the commit. By definition.


Git is a content-addressed object store, the address of any stored object is the hash of the object itself. So you actually can't stuff extra data into an object and not change its ID; this auxiliary data would need to go in a separate store indexed by object ID or a similar solution. The reason why file access times don't affect git hashes is because git does not store them.
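A small Python sketch of the content addressing described above, assuming the default SHA-1 object format: git hashes a header of `<type> <size>` plus a NUL byte, followed by the raw object body. The commit body below is made up for illustration (its tree ID is git's well-known empty tree), and the `pusher` line is a hypothetical extra field. Adding it yields a different commit ID, which is exactly why the metadata can't ride along inside the object.

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    """Object ID as git computes it in SHA-1 repos: sha1(b"<kind> <size>\\0" + body)."""
    header = f"{kind} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


# Hypothetical commit object; the tree is git's well-known empty tree.
commit = (
    b"tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    b"author Alice <alice@example.com> 1659484800 +0000\n"
    b"committer Alice <alice@example.com> 1659484800 +0000\n"
    b"\n"
    b"Initial commit\n"
)

# The same commit plus one hypothetical "pusher" header line before the message.
extended = commit.replace(b"\n\n", b"\npusher bob\n\n", 1)

print(git_object_id("commit", commit))
print(git_object_id("commit", extended))  # differs: the extra field is part of the hashed object
```

As a sanity check, `git hash-object` on a file containing `hello\n` should print the same ID as `git_object_id("blob", b"hello\n")`.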


I think you want git-notes (or something similar):

https://git-scm.com/docs/git-notes

"Adds, removes, or reads notes attached to objects, without touching the objects themselves."


What is the difference between a 'random' and 'non random' repo?

The whole point of 'Open Source' is that we can use code which might otherwise be a bit 'random'.

It's not 'Institutionalized Open Source' it's just 'Open Source' i.e. we're not all Torvalds.

So credibility etc. is a very fickle thing; beyond that, this is a serious security issue and we really don't have answers.

We used to think about code as 'logic that works', but now we have other criteria. I wonder if our FOSS models need to adapt a bit.


It's a good point actually.

I suppose the message is "read the code you're using" but that is hard for big libraries and frameworks.

Obviously, using someone's code when they are impersonating someone else is a big red flag.


Reading the code for functional integrity is already a big deal, but having to sleuth around for sneaky hacks? No way.

I don't know what the answer is, but the model has to be changed.


Correct. My suggestion for a solution is for github to add a "reject-unsigned" feature. Only allow commits signed by <my gpg key> and <my email> to be pushed to github, under any projects/org.


Let me ask a few questions about this scheme:

1. What happens when someone needs to resolve a merge conflict involving your commit? Let's say I maintain a fork of an open source repo to add some feature, and I periodically merge back in upstream changes... that necessarily involves resolving conflicts. By default, git retains author ownership, and now the commit is unsigned, but it's really your work. What do we do? Do I have to use a custom merge flow that also rewrites authorship from "Alice <alice@gmail.com>" to "Alice <alice@gmail.com.fake.unsigned.suffix>"?

2. What happens if your gpg key is compromised or expires? Are all your previous repos now invalid? I can't fork it because it contains a commit authored by you, but with a revoked or expired gpg key?

3. What happens to previous commits if I enable this feature? Can all my unsigned commits no longer be pushed to github? I made a commit in a project at work 5 years ago with my email, but didn't sign it. If that company wants to open source that project on github, do they now have to rewrite history to change the author on my unsigned commits?

4. What does the "squash merge" button on github PRs do for your PRs?


We've adopted this policy internally. It largely requires using the proper git workflow, rather than (in our case) gitlab's GUI flow, though we make an exception for a simple merge (I think it's verifiable that no code is added).

The check is that commits at the point of merge have a valid signature. Historical commits are part of the history and as such cannot be changed without an additional commit (with a valid signature). Previous unsigned commits are deemed trusted at the point you begin signing and checking.

Squash merge breaks stuff and shouldn't be used. To complete things, the restricted set of operations exposed through the GUI should sign using gitlab's or github's key (or some accepted bot key we've set), with the check happening on the input commits, but AFAICT that's not supported yet.
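A rough Python sketch of the merge-time check described above, assuming a CI job with `git` on the PATH. It relies on git's `%G?` log placeholder (G = good signature, N = no signature, and other letters for bad, expired, revoked, or uncheckable signatures). Treating only `G` as acceptable is a policy assumption, and the simple-merge exception is not modelled. Checking only `base..tip` matches the idea that commits before signing began are trusted as history.

```python
import subprocess
from typing import Dict, Iterable, List

# Policy assumption: only "G" (good, valid signature) passes. Other %G? codes include
# B (bad), U (good, unknown validity), X/Y (expired), R (revoked key), E (cannot check), N (none).
ACCEPTED = frozenset({"G"})


def parse_signature_status(log_output: str) -> Dict[str, str]:
    """Parse lines of '<sha> <%G? code>' into {sha: code}, skipping blank lines."""
    statuses = {}
    for line in log_output.splitlines():
        if line.strip():
            sha, code = line.split()
            statuses[sha] = code
    return statuses


def rejected_commits(statuses: Dict[str, str], accepted: Iterable[str] = ACCEPTED) -> List[str]:
    """Return the SHAs whose signature status is not in the accepted set."""
    allowed = set(accepted)
    return sorted(sha for sha, code in statuses.items() if code not in allowed)


def check_range(rev_range: str, repo: str = ".") -> List[str]:
    """Check commits reachable from the tip but not the base, e.g. 'main..feature'."""
    out = subprocess.run(
        ["git", "-C", repo, "log", "--format=%H %G?", rev_range],
        check=True, capture_output=True, text=True,
    ).stdout
    return rejected_commits(parse_signature_status(out))
```

A job might call `check_range("origin/main..HEAD")` and fail the merge if the returned list is non-empty.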


And now we’re on the fast track to adopt a blockchain as a tamper evident mechanism.


git is effectively a blockchain. Trying to use a blockchain for this has many of the same problems as described in GP's comment.


A blockchain could be useful for proving a commit was not done significantly before or after the claimed timestamp, and (depending on how you do it) non-repudiation.

IMO I don't see how this is desirable enough to be worth it.

Wider adoption of GPG signatures would be much more impactful.

It could make sense to use a blockchain for key distribution/key discovery/revocation, effectively as a replacement of PKI, though.


They have something under Settings > SSH and GPG keys where you can enable Vigilant mode.

While that still allows pushing unsigned commits, it will flag them with a warning badge.

I had this on for a while, but unfortunately some open source projects rebase commits before pushing them, and this caused warnings to be shown (the rebase breaks my signature). So I turned it off again, so as not to scare people who look at a project's commit history and see warnings after my contributions were merged in.


Good to know, I was not aware. The squash/rebase issue is definitely problematic, though a tree of signatures could be appended to each commit. Now... this does break how commits are currently signed.


Would this be a problem if commits were stored in a blockchain? Rebase would effectively fork the chain.


A git repo is practically a blockchain. Fixing this will require changing how git treats signatures, but no additional parallel architecture needs to be created.


I lost a Yubikey, so I revoked the subkeys on it. Github didn't let me update the existing pubkey, so I had to remove and re-enter it. Now all things signed by the revoked key, despite being before the revoke date, come up as unverified.

I'm not sure if it's a me issue or a PGP issue or a GitHub issue, but it's a pretty broken system.


How would that work for past commits? Would people be forbidden to mirror a project to GitHub just because it contains an unsigned commit of mine from 2007?


I like it, although it raises the bar for contributors to join.

It also doesn't help unless a large enough percentage of repo owners accept only signed PRs to their projects.

The fake accounts can be created with gpg signed fake commits too.


How does this solution solve the problem?

You're just adding an extra step that's hardly going to stop someone.


It would only allow commits signed by me to be pushed under my email. Github uses the email as the "proof" of commit ownership. By only accepting signed commits a user would not be able to push a commit impersonating me.


That solves impersonation, but impersonation is not the problem here.

These repos were not taken over but cloned and made to look like another repo via similar naming.

I think what you're looking for is more like "all accounts must be verified via payment/identity", so that you really know who is making "random clones" and "look-a-likes" with malware.

But you've got a whole host of other problems in the process.


They have attacks for different programming languages and environments. So not just a single target (e.g. npm) attack.


While browsing the nanobox repo linked in the twitter thread I started to get 404s, so it looks like GitHub is on it. Edit: other repos have vanished as well now.


> So far found in projects including: crypto, golang, python, js, bash, docker, k8s

Huh? What does that mean?


The author is being obtuse. They mean that clones have been made of those projects that include malicious code.

It's like if I make a copy of the New York Times website but replace the cover image with nudity and put it on a different URL and someone tweets "omg NYT has nudity on the front page" and clarifies, vaguely, 10 tweets down that it was actually not the real NYT but a clone.

I'm not sure whether the author is spinning it this way on purpose (i.e. for maximum emotional effect / retweets / internet points) or whether it just comes from being too close to the subject matter, but it's pretty misleading either way.


Riiight, that makes a LOT more sense. This would have been HUGE if the actual repos were infected and it wasn’t even at the top here at HN. I was very worried for a minute there, your comment has calmed me right down. Thank you!


This should be the top comment on this thread.


It appears what is infected are forks of those repos but not the originals.


That would make sense for golang, python, docker, k8s; maybe even bash if you squint a bit.

But what's the repo for "crypto" or "js"?


How would code like this make it into so many repos? People accepting pull requests and not properly reviewing them? Or is there something even worse about this attack?


Many of the repos I found were clones of valid projects with the same names under new orgs and new users. For instance, this project is valid: https://github.com/scala-network/GUI-miner and this is its infected clone: https://github.com/stellitecoin/gui-miner

GPG-signed commits by the legitimate users do not contain the malware.
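A minimal Python sketch of how a tool might flag the pattern in that example: a repo whose name matches a known project but whose owner differs, or whose name is a near miss. The allowlist, its single entry, and the 0.85 similarity threshold are hypothetical choices for illustration, and `difflib.SequenceMatcher` is just one simple string-similarity measure.

```python
from difflib import SequenceMatcher
from typing import Dict, Tuple
from urllib.parse import urlparse

# Hypothetical allowlist: lowercased repo name -> canonical "owner/repo".
CANONICAL: Dict[str, str] = {
    "gui-miner": "scala-network/GUI-miner",
}


def owner_repo(url: str) -> Tuple[str, str]:
    """Split 'https://github.com/<owner>/<repo>[.git]' into (owner, repo)."""
    owner, repo = urlparse(url).path.strip("/").split("/")[:2]
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return owner, repo


def classify(url: str, canonical: Dict[str, str] = CANONICAL, threshold: float = 0.85) -> str:
    owner, repo = owner_repo(url)
    name, full = repo.lower(), f"{owner}/{repo}".lower()
    if name in canonical:
        # Same project name: only the listed owner is the real one.
        return "canonical" if canonical[name].lower() == full else "same-name clone"
    for known in canonical:
        # Near-miss names (e.g. one extra letter) are a typosquatting signal.
        if SequenceMatcher(None, name, known).ratio() >= threshold:
            return "lookalike name"
    return "unknown"
```

With the allowlist above, the stellitecoin URL comes back as a "same-name clone" while the scala-network URL is "canonical".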


Considering that only clones are affected, your original tweet is downright wrong. None of the listed projects (python, js, bash, docker, k8s) are affected. Anybody can fork a repository to introduce malware.


js is a project?


You're right. It's not. I just copy-pasted the list from the tweet. I assume that the author meant to write jq.


Most of them don't seem to come from pull requests, I wonder if it's paired with a bunch of compromised github accounts?


Not compromised, just created by the attacker.


TL;DR: These are forks by unknown people containing malware. I see no indication in the linked thread of even a single successful compromise actually occurring, or malicious code making it into legitimate upstream projects.


Here is a commit with malicious code from a Microsoft employee:

https://github.com/promonlogicalis/asn1/commit/7bdca06d0edf8...


That commit was rewritten from https://github.com/Logicalis/asn1/commit/d60463189a563e49f19... which was signed, but is not in the fork.


Damn, github should show some big visible warning about this.


As long as the commit is not signed (marked green), that means nothing.


This is interesting. If you go to that user's profile, and look at the "contributions", there are none in July / August. Yet the commit is from two days ago.


Do we need verified orgs on GitHub now?


> Correction, 35k+ "code hits" on github, not infected repositories.

Source: https://mobile.twitter.com/stephenlacy/status/15547180866572...


Somebody should DDoS ovz1.j19544519.pr46m.vps.myjino.ru... (mostly kidding)


I admire the desire to help, but looking at flow logs to that IP address is how people are going to determine if they have compromises in their environments. Excess traffic to the IP will just muddy the water.
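A hedged Python sketch of that kind of log check, assuming AWS VPC Flow Logs in the default (version 2) space-separated format. The indicator IP `203.0.113.7` used in the usage note is a placeholder from a documentation range, not the attacker's address; you would have to resolve the hostname above yourself and substitute what you find.

```python
import ipaddress
from typing import Dict, Iterable, Iterator, Optional

# Field order of the AWS VPC Flow Logs default format (version 2).
FIELDS = (
    "version account_id interface_id srcaddr dstaddr srcport dstport "
    "protocol packets bytes start end action log_status"
).split()


def parse_record(line: str) -> Optional[Dict[str, str]]:
    """Parse one space-separated record; return None for header or malformed lines."""
    parts = line.split()
    if len(parts) != len(FIELDS) or parts[0] == "version":
        return None
    return dict(zip(FIELDS, parts))


def matching_records(lines: Iterable[str], indicators: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield records whose source or destination address is one of the indicator IPs."""
    wanted = {ipaddress.ip_address(ip) for ip in indicators}
    for line in lines:
        record = parse_record(line)
        if record is None:
            continue
        for field in ("srcaddr", "dstaddr"):
            try:
                addr = ipaddress.ip_address(record[field])
            except ValueError:
                continue  # NODATA/SKIPDATA records use "-" for addresses
            if addr in wanted:
                yield record
                break
```

For example, `list(matching_records(open("flows.log"), ["203.0.113.7"]))` would list every flow to or from the placeholder address in a hypothetical exported log file.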


myjino-ru is just a virtual hosting provider.

j19544519 seems to be the username on this hosting.


how is this affecting people if the clone does not open PRs to the original one?

so this will send data to the hacker's network if we clone and build the wrong repo right?


Oh dear. This is a gigantic disaster.

If lots of software released today hasn't been pinning its versions on release (especially Electron apps), or signing its commits if it's open source, then this is a chaotic supply chain attack waiting to happen, and it's worse than I thought.

But really, it is yet another reason to avoid GitHub entirely and just self-host using GitLab or Gitea.


You may have misunderstood (understandably, because the tweets seem to be deliberately misleading). These are malicious commits in forks of repositories. There is no supply chain attack unless you make a habit of taking random forks of popular projects from GitHub and inserting them into your supply chain.


> There is no supply chain attack

Actually yes, this is all about supply chain attacks. Typosquatting is one of the most common methods. It goes under this category.


Spam is not a problem GitHub has ever had to seriously face so far but this sort of attack does seem like it could catch some users casually googling for libraries.

If you impersonated all these real repos, made npm, pypi packages for them etc and also updated the readme I think you could catch some people off guard.



The supply-chain attack is a self-inflicted attack if you're Googling a library and copy-pasting it as a Git dependency without so much as a glance at any of the numerous indicators that are screaming at you that it's untrustworthy.

It seemed pretty clear to me that GGP misunderstood this as malicious code being inserted into existing trusted repositories, which is a common misunderstanding in the rest of the comments, and seems to be encouraged by the poor wording of the tweets.


> The supply-chain attack is a self-inflicted attack

It is an attack regardless. Someone has made something malicious that affects the process by which the end user acquires the final software.

> it seemed pretty clear to me that GGP misunderstood this as malicious code being inserted into existing trusted repositories, which is a common misunderstanding in the rest of the comments, and seems to be encouraged by the poor wording of the tweets.

I think the author just wanted to get attention and be sensational. He deliberately did not mention that they are forks. Just rushed to report findings.


The last paragraph is orthogonal to the problem that an npm install poses here, wherever your repo is.



