Tell HN: GitHub leaked names of private repos with pages
149 points by cillian64 on Feb 15, 2022 | 22 comments
I just received the following email from GitHub:

  Hi <username>,
  
  We're writing to let you know that between January 2021 and September 2021, the following information about your repository was inadvertently made publicly viewable after being sent to a third-party vendor as part of metadata analysis of GitHub Pages sites; the name of the private repository and the GitHub username with ownership of the repository. No repository content or other private data was exposed as part of this incident.
  
  User privacy and security are essential for maintaining trust, and we want to remain as transparent as possible about events like these. GitHub itself did not experience a compromise or data breach as a result of this event, nor did unauthorized users gain access to repositories. Read on for more information.
  
  * What happened? *
  
  GitHub learned from an internal discovery by a GitHub employee, that GitHub Pages sites published from private repositories on GitHub were being sent to urlscan.io for metadata analysis as part of an automated process. This internal process was implemented before the private GitHub Pages feature was released and provides metadata that is used during human review of potentially malicious or abusive GitHub Pages sites.

  To view the name of the private repository on urlscan.io, you would need to have been looking at the front page of urlscan.io within approximately 30 seconds of the analysis being performed or have specifically searched using a query that would return the analysis in the search results. 
  
  * What information was involved? *
  
  The following URLs, but no content, were made publicly viewable:
  
  GitHub Pages URLs
  <redacted - URL to private github page>
  
  * What GitHub is doing *
  
  GitHub immediately began work on fixing the automated process that sends GitHub Pages sites for metadata analysis so that only public GitHub Pages sites are sent for analysis. Future analysis of public GitHub Pages sites will be unlisted from public view as an additional protection.
  
  We also worked with the third-party vendor, urlscan.io, to delete all existing public records of private GitHub Pages sites generated from this situation.
  
  * What you can do *
  
  No action is required on your end; we have updated our systems and worked with our third-party vendor to ensure this data is no longer publicly viewable. 
  
  Please feel free to reach out to us with any additional questions or concerns through the following contact form: 
  
  <redacted>
  
  Thanks, 
  GitHub Support


This email actually increases rather than decreases my trust in GitHub - it explains exactly what happened and answers all of the questions I might have about the incident. It reassures me that they take this kind of thing extremely seriously.


Yeah. Mistakes are going to happen and what really separates the good companies from the bad ones is how they deal with those mistakes. This seems to be a good example (for the reasons you stated). I have seen plenty of incident reports from other services that don't really tell me what happened or what exactly was done about it. That's when I feel unhappy. Within reason. I'd still be pretty unhappy if a company was really negligent, even if they had a really great incident report :P


My thoughts: I appreciate that they are very open about exactly what information was public and how this happened. Compared to many disclosures this is refreshingly straightforward. However, it looks like they are only telling me about this 4-5 months after it was discovered internally, which is disappointing.


> it looks like they are only telling me about this 4-5 months after it was discovered internally

My guess is that it took them that long to figure out exactly what happened and then work with the vendor to scrub the data and verify it was scrubbed. They probably also asked the vendor to analyze their logs.

So I'd give them the benefit of the doubt and say this was probably the quickest they could go.


Kudos to GitHub on this specific case, too. It's possible that they did take 4-5 months to figure out the whole picture, either due to the complexity of the incident or simply the bureaucracy (needing review by PR or even lawyers, etc.).


I assume all private stuff on Github would be leaked at some stage and change my behavior accordingly. You have to code as if someone you don't want will be reading it. Same goes for any service advertising their service as 'privacy aware'. How do you audit those claims?

I mean Whatsapp is closed source for example and could be intercepted without you knowing, despite all their claims of 'privacy by design using E2E encryption'. Whatsapp also leaks metadata which can be even more detrimental to privacy instead of mere content leaking out.

You have to assume all secrets will be leaked (by whatever means) and change your behavior.


Honestly, any illusion of WhatsApp being secure went out the window when the original founders left, leaving billions on the table, and then donated millions to Signal. I think that was a pretty clear indicator that privacy/security of WhatsApp was over.


> You have to code as if someone you don't want will be reading it.

That should be standard practice no matter where your code is, Github or not.


The only reason I have private stuff on GitHub is because, last I remember, they don't have an "unlisted" option. If I needed a repo for anything seriously proprietary, I wouldn't use GitHub for that.


> I assume all private stuff on Github would be leaked at some stage and change my behavior accordingly.

This is good, but should be expanded to all online information. If you have something on the Internet, you should assume it can become public at any time. There is no such thing as computer security, and we need to stop pretending that there is.


> You have to code as if someone you don't want will be reading it.

This reminded me of "Always code as if the person who ends up maintaining your code is a violent psychopath who knows where you live."


Sorry, have we worked together?


On the off chance that's not sarcastic, I doubt we have.


Don't worry, it was merely a lighthearted comment about some of the people I have worked with in the past.


I find it bizarre that urlscan.io displays recent scans from paying customers. I assume GitHub is large enough that they have to pay, anyway. If they're not, who is?


From the URLscan pricing page [0], it looks like each plan has a tier of "private", "unlisted", and "public" scans. It looks like you're somewhat incentivized to just publicize all scans because that's the most economical. Based on what GitHub's email said, they've opted to scan things in public, probably assuming that the repos are public anyways. It looks like this assumption was a poor one to make, in this case.

[0]: https://urlscan.io/pricing/
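
To make the fix described in GitHub's email concrete, here is a minimal sketch of that kind of filter: skip sites published from private repos and mark everything else as unlisted. This is not GitHub's actual code. The helper `build_scan_request` and the `repo_is_private` flag are hypothetical, and the `url` / `visibility` field names are an assumption about what a scan submission body looks like, so check the vendor's API docs before relying on them.

```python
import json

# Visibility levels mentioned in this thread.
ALLOWED_VISIBILITY = {"public", "unlisted", "private"}


def build_scan_request(site_url, repo_is_private, visibility="unlisted"):
    """Return a JSON request body for a scan, or None if the site must not be sent.

    Hypothetical helper: `repo_is_private` stands in for whatever check
    the real internal process performs. Field names are assumed.
    """
    if visibility not in ALLOWED_VISIBILITY:
        raise ValueError(f"unknown visibility: {visibility!r}")
    if repo_is_private:
        # Sites published from private repositories are never submitted.
        return None
    return json.dumps({"url": site_url, "visibility": visibility})
```

The point of defaulting to "unlisted" is that a wrong assumption about a repo's visibility then fails closed instead of landing on a public front page.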


Oh man I thought this was gonna be just showing the TLD or something. There is a scrolling list of scans, down to the exact HTTP transactions. Just watched an OAuth grant roll by in plaintext. Yikes.


I saw an authenticated (!) zoom invitation, yikes indeed.


There are also screenshots on every scan page, for example unsubscribe links where the email is visible.

This site must be a treasure trove for spam harvesters.


When you run a scan you specify whether it’s public, unlisted, or private. Can someone here explain the utility of non-private scans? (The urlscan.io folks apparently think it’s too obvious to explain.)


> To view the name of the private repository on urlscan.io, you would need to have been looking at the front page of urlscan.io within approximately 30 seconds of the analysis being performed

Oh okay, risk is relatively low. Then:

> or have specifically searched using a query that would return the analysis in the search results.

Well, this is not good. Of course, it depends on what you'd publish on a private page, but it's discomforting.


> Future analysis of public GitHub Pages sites will be unlisted from public view as an additional protection.

Having just checked their website, I'm pretty certain this hasn't happened yet.



