
8M GitHub profiles were leaked from GeekedIn's MongoDB - jashkenas
https://www.troyhunt.com/8-million-github-profiles-were-leaked-from-geekedins-mongodb-heres-how-to-see-yours/
======
r3bl
I'm one of the affected users.

My initial reaction was that I had absolutely no idea this site even
existed. While this is public-facing data, it might explain some things I've
experienced, such as people sending generic recruitment emails to my old
email address (the one leaked here, which hasn't been on my GitHub page for
months now). Those emails name my GitHub repository with the most stars and
tell me how much they liked my code in it, even though the most popular
repository on my profile is not software-related at all.

Mini ask HN to those whose email address is public on GitHub: How frequently
do you get these kinds of recruitment emails?

~~~
avar
Maybe one a month. Usually "We saw your profile on GitHub which matches <our
vague search criteria> and would you be interested in
<hackathon/conference/somesuchthing>?".

To hijack your comment with a follow-up question of my own: I'm in the EU
(Netherlands). Many of these emails are clearly mass mailings, templated with
just my name and sent by some US-based company. They include an unsubscribe
link for future emails, but aren't unsolicited mass-marketing emails like
these illegal in the EU? If so, would it be worth reporting them, and to
whom?

~~~
chris_7
Flagging them as spam with your email provider would be a good start.

------
daw___
From GeekedIn's announcement [0]:

        It is a site that crawls open-source code hosting sites (e.g. github and bitbucket)
        and creates profiles of open-source projects and open-source developers.
        Those profiles include things like technologies used by a developer on open source
        projects (e.g. Scala/Java, .NET, Clojure, Python, etc), libraries and frameworks
        (e.g. Hibernate, Spring, JQuery, bootstraps, etc.), on many cases locations of
        the developer, and even when a developer used a particular
        technology (for Open Source projects)

[0] "I'm creating www.geekedin.net" [https://www.linkedin.com/pulse/im-creating-wwwgeekedinnet-er...](https://www.linkedin.com/pulse/im-creating-wwwgeekedinnet-ernesto-reinaldo-barreiro)

------
Symbiote
If they're based in Spain, isn't this a clear breach of EU data protection
legislation?

Is it even possible for the service to comply with the data protection
directive? It's necessary to obtain consent before processing personal data,
but that's not the case here.

[https://en.wikipedia.org/wiki/Data_Protection_Directive](https://en.wikipedia.org/wiki/Data_Protection_Directive)

------
wyldfire
Boy, I gotta say I was really nervous when I saw "Have I been pwned" in my
inbox. Took a while to convince myself that it's just my public-facing Github
profile.

~~~
kaoD
Then imagine how hard my heart dropped when I was taking a summer nap and
suddenly Google notified me (through my Android phone) that some suspicious
login activity was taking place in my main mail account. The panic!

Yes, I reused my Gmail password in $pwnd_service. Bad idea.

Fortunately Google managed to detect the unusual activity and lock them out,
but maybe others aren't that lucky. That event led me to finally use a
password manager and stop reusing passwords everywhere.

What an afternoon, trying to change my password on every service I could
remember. Can you remember all services you signed up for with your email?
Cause I definitely didn't.

Quite a humbling experience.

~~~
DCoder
> _Can you remember all services you signed up for with your email? Cause I
> definitely didn't._

You might also want to check your browser settings to see which domains have
saved cookies, saved passwords, or the "never save passwords on this domain"
flag.

------
Sephr
How is this notable? It's just public data scraped from GitHub. I wouldn't
care even if they _intentionally_ redistributed this database to everyone.

The more in-depth metrics are also things you can scrape from GitHub. If you
aren't comfortable with these metrics being calculated about your public
actions, then you probably shouldn't have signed up for a "social coding"
site.

~~~
runeks
I agree. Redistribution of public knowledge can hardly be described as a
"leak".

I received an email from haveibeenpwned.com about this, but I don't see how
this makes sense, given that no information was revealed that users hadn't
consented to reveal.

------
chadscira
Doesn't GitHub already make the majority of this data public, with the
exception of emails? Previously they used Gravatar hashes, which allowed
email brute-forcing (since removed). They also weren't omitting emails from
commit messages.

[https://cloud.google.com/bigquery/public-data/github](https://cloud.google.com/bigquery/public-data/github)
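
To make the Gravatar point concrete: Gravatar looks up an avatar by the MD5 hash of the trimmed, lowercased email address. Anyone holding such a hash can therefore test guessed addresses offline. The sketch below assumes that hashing scheme. The address and the guess list are hypothetical.

```python
from __future__ import annotations

import hashlib
from typing import List, Optional


def gravatar_hash(email: str) -> str:
    """MD5 hex digest of the trimmed, lowercased address (Gravatar's scheme)."""
    return hashlib.md5(email.strip().lower().encode("utf-8")).hexdigest()


def find_email_for_hash(target_hash: str, candidates: List[str]) -> Optional[str]:
    """Return the first guessed address whose hash matches, or None."""
    for candidate in candidates:
        if gravatar_hash(candidate) == target_hash.lower():
            return candidate
    return None


if __name__ == "__main__":
    # Hypothetical profile: the page exposes only the hash, never the address.
    leaked = gravatar_hash("jane.doe@example.com")
    # Illustrative guesses built from a login name and common domains.
    guesses = [f"janedoe@{d}" for d in ("gmail.com", "example.com")]
    guesses.append("jane.doe@example.com")
    print(find_email_for_hash(leaked, guesses))  # jane.doe@example.com
```

The hash is unsalted and deterministic, so one hash can be checked against millions of candidate addresses cheaply.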

~~~
a3n
It's like DMV records being public. You used to have to go there, or phone,
and you were only looking for information about a specific person.

But if it's available on the internet, it becomes a different thing entirely.
Rather than being track-downable for specific purposes, you become harvestable
for mass purposes.

Merely being known is sometimes the first step in being victimized, and these
kinds of things make being known easier and more frequent.

~~~
username223
> You used to have to go there, or phone, and you were only looking for
> information about a specific person.

Exactly. If $INTERNET_DATA_HARVESTER had to pay someone to watch whenever it
wanted to see what I did on the internet, it would cost them a lot of money,
so they would only watch if they had a reasonable suspicion of profit. Even at
starvation wages of $2/day, that's $700/yr/person, which would bankrupt the
major data harvesters. Even if Facebook only paid each starving person to
stumble over to the DMV one day a year, it would cost them $42 billion to
monitor their users.

------
danielpatrick
Two days ago I received an email from a recruiter at
[http://sourced.tech/](http://sourced.tech/) mentioning they had analyzed my
open source contributions and they had a position that matched me. It looks
like source{d} brags about scraping personal data from sites for recruitment
purposes.

The problem is the email they used is not publicly facing in my Github
account, but it _is_ the login email that I use for Github. I never got the
"Have I Been Pwned" email, but this is very concerning to me, I keep this
email very private.

Coincidence?

~~~
pkill17
Are you sure none of the commits on any of your repos accidentally have the
author email as the private one you login with?

~~~
throwwwwwwwwww
I was about to say the same thing. For example, go to someone's GitHub
profile where they have a public e-mail listed, then go to their
repositories, filter by "source" (repos created by them, not forks), pick
the oldest one of that kind, git clone it, and then run

    git shortlog -s -e

You might find that they were using their school e-mail back then for example.

Also if you clone all their repositories and everything else they have
contributed to you might find an occasional commit in which they accidentally
used for example the e-mail they have at work.
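
The same check can be scripted across many clones. This is a minimal sketch. It assumes the default `git shortlog -s -e` line format: a commit count, a tab, then `Name <email>`. The function names are illustrative.

```python
import re
import subprocess
from typing import List, Tuple

# One line of `git shortlog -s -e` output: "<count>\t<name> <email>"
SHORTLOG_LINE = re.compile(r"^\s*(\d+)\t(.*?)\s*<([^>]*)>\s*$")


def parse_shortlog(output: str) -> List[Tuple[int, str, str]]:
    """Turn shortlog output into (commit_count, name, email) tuples."""
    entries = []
    for line in output.splitlines():
        match = SHORTLOG_LINE.match(line)
        if match:
            count, name, email = match.groups()
            entries.append((int(count), name, email))
    return entries


def authors_in_repo(repo_path: str) -> List[Tuple[int, str, str]]:
    """Run shortlog in a local clone.

    HEAD is passed explicitly: without a revision, `git shortlog` reads a log
    from standard input when stdin is not a terminal, as under subprocess.
    """
    result = subprocess.run(
        ["git", "-C", repo_path, "shortlog", "-s", "-e", "HEAD"],
        capture_output=True, text=True, check=True,
    )
    return parse_shortlog(result.stdout)
```

Collecting the distinct third elements across every cloned repository gives the set of addresses a person has committed under.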

~~~
dom0
You can just retrieve the Git patch directly from the commit page (add .patch
to the URL), which has all the metadata of the commit, including author info.
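
This trick is easy to automate. Per the comment, the `.patch` download carries the commit's metadata. Here it is assumed to be a `git format-patch` style message whose `From:` header holds the author's name and email. The function names are hypothetical.

```python
import urllib.request
from email.utils import parseaddr
from typing import Optional, Tuple


def patch_author(patch_text: str) -> Optional[Tuple[str, str]]:
    """Return (name, email) from the From: header of a format-patch style patch."""
    for line in patch_text.splitlines():
        if not line.strip():
            break  # a blank line ends the header block
        # The leading mbox line "From <sha> <date>" has no colon, so it is skipped.
        if line.startswith("From: "):
            return parseaddr(line[len("From: "):])
    return None


def fetch_patch(commit_url: str) -> str:
    """Download a commit as a patch by appending `.patch` to its page URL."""
    with urllib.request.urlopen(commit_url + ".patch") as response:
        return response.read().decode("utf-8", errors="replace")
```

Non-ASCII author names may arrive RFC 2047-encoded (`=?UTF-8?q?...?=`). If needed, `email.header.decode_header` can decode them.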

------
thejosh
So it's just public info you can already get from Github?

~~~
prplhaz4
Well, kinda, but based on the examples given, the data also includes
assessments and assumptions about you presumably based on the aggregated
information that is available.

They appear to be building recruiting profiles of github users based on their
public GH profile and commit info.
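
This description can be illustrated with a toy aggregation over public repository metadata. It is a hypothetical sketch, not GeekedIn's actual method. It assumes records shaped like GitHub's repository API responses, with the fields `name`, `fork`, `language` and `stargazers_count`.

```python
from collections import Counter
from typing import Dict, List, Optional


def summarize_profile(repos: List[dict]) -> Dict[str, object]:
    """Aggregate public repo metadata into a crude recruiter-style summary.

    Forks are ignored. Languages are ranked by how many of the user's own
    repos report them, and the most-starred repo is picked naively.
    """
    own = [r for r in repos if not r.get("fork")]
    languages = Counter(r["language"] for r in own if r.get("language"))
    top_repo: Optional[dict] = max(
        own, key=lambda r: r.get("stargazers_count", 0), default=None
    )
    return {
        "languages": [lang for lang, _ in languages.most_common()],
        "most_starred": top_repo["name"] if top_repo else None,
    }
```

Picking the most-starred repository without looking at its contents is how a recruiting email can end up praising "your code" in a repository that isn't software at all.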

~~~
yeukhon
But you just said it's all public data; someone just happened to spend the
time building a more "digestible" profile. So in the end, this really is
public data.

------
michaelmior
For anyone who is not aware, Have I been pwned[0] is a great way to keep tabs
on data breaches involving your data. You just give them your email and you'll
be notified if it shows up in data dumps from any major breach.

[0] [https://haveibeenpwned.com/](https://haveibeenpwned.com/)
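
For scripted checks rather than email notifications, HIBP also offers an HTTP API. This is a minimal sketch assuming the v3 `breachedaccount` endpoint. That endpoint requires an API key in a `hibp-api-key` header plus a User-Agent, and answers 404 for an account found in no breach. The helper names and the User-Agent string are made up for illustration.

```python
import json
import urllib.error
import urllib.parse
import urllib.request
from typing import List

HIBP_BASE = "https://haveibeenpwned.com/api/v3/breachedaccount/"


def breach_request(account: str, api_key: str) -> urllib.request.Request:
    """Build a request listing the breaches an account appears in."""
    url = HIBP_BASE + urllib.parse.quote(account)  # '@' becomes %40
    return urllib.request.Request(url, headers={
        "hibp-api-key": api_key,
        "User-Agent": "hibp-lookup-sketch",  # hypothetical client name
    })


def breach_names(response_body: str) -> List[str]:
    """Extract breach names from the JSON array the endpoint returns."""
    return [breach["Name"] for breach in json.loads(response_body)]


def lookup(account: str, api_key: str) -> List[str]:
    """Query HIBP (network access); an empty list means no breach was found."""
    try:
        with urllib.request.urlopen(breach_request(account, api_key)) as resp:
            return breach_names(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        if err.code == 404:  # account not present in any loaded breach
            return []
        raise
```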

------
a3n
I just signed up to HIBP because of this.

But I'm still nervous about it, because now I'm known by another party, _and_
they know where I've been compromised and with what email. HIBP is probably
good people, but they can be breached too. It's why it's taken me this long to
convince myself to try it.

~~~
Gaelan
All of HIBP's data is breaches that are already floating around the web, I
believe.

~~~
a3n
And here they all are in one convenient place.

~~~
Gaelan
I'm responding to your worry about HIBP being breached: there is no new data
to leak.

------
captn3m0
First breach for my current email. There goes my clean track record, although
I guess this one doesn't count for much.

_Edit_: Just checked my leaked info. Either Troy's service or GeekedIn didn't
scrape the data properly, because it shows my very tiny secondary account,
which I made long ago for some GitHub-specific testing that required a second
account.

------
toyg
I've long subscribed to HIBP, so I got the relevant email about this today;
however, when I check the site, I don't get the "raw geekedin data" button
in either Safari or Firefox. What's up?

EDIT: I basically re-subscribed, and once I re-verified the address, I got the
button.

------
Cub3
Grr, I'm affected by both this and the recent Donate Blood leak. Is there
anything I should be doing to increase my security against these kinds of
leaks?

Or even to protect things like my bank and utility accounts against
spear-phishing attacks?

------
plorntus
[http://i.imgur.com/fWT8usT.png](http://i.imgur.com/fWT8usT.png)

I really should have abused the github email bug further to do something more
fun with scrapers.

------
kristianp
I was a little surprised to see Mr. Hunt publishing the IPs of the Mongo
servers. That's not something he usually does.

------
specialp
[https://api.github.com/users/1](https://api.github.com/users/1) This isn't
hard to do. All they did was iterate over user IDs from 1 to n. It is the Rails
way to assign ids incrementally.
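
Sequential IDs make a full crawl straightforward. A crawler need not request one ID at a time; it could page through GitHub's documented "list users" endpoint. That endpoint returns accounts in sign-up order, starting after a `since` user ID. This is a minimal sketch under that assumption. The function names are hypothetical, and real use is subject to GitHub's rate limits and terms.

```python
import json
import urllib.request
from typing import Dict, List, Optional

GITHUB_API = "https://api.github.com"


def users_page_url(since: int, per_page: int = 100) -> str:
    """URL for the "list users" endpoint: accounts with an ID greater than `since`."""
    return f"{GITHUB_API}/users?since={since}&per_page={per_page}"


def next_since(page: List[dict]) -> Optional[int]:
    """Highest ID on a page, used as the next cursor; None for an empty page."""
    return max((user["id"] for user in page), default=None)


def crawl_user_logins(max_pages: int, token: Optional[str] = None) -> List[str]:
    """Walk the ID space page by page (network access; rate limits apply)."""
    logins: List[str] = []
    since = 0
    for _ in range(max_pages):
        headers: Dict[str, str] = {"Accept": "application/vnd.github+json"}
        if token:
            headers["Authorization"] = f"token {token}"
        request = urllib.request.Request(users_page_url(since), headers=headers)
        with urllib.request.urlopen(request) as response:
            page = json.load(response)
        cursor = next_since(page)
        if cursor is None:  # no more users
            break
        logins.extend(user["login"] for user in page)
        since = cursor
    return logins
```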

~~~
orf
> It is the Rails way to assign ids incrementally.

Well... it's their database's way. It's not 'the Rails way'.

~~~
kristianp
It's also the Rails way unless things have changed since I last used Rails.

------
cdevs
Can GeekedIn help us hire developers who have some security awareness?
Maybe make a site that lists those who work for departments that get breached
in the future, so there will be some accountability for the first time.

------
algesten
my leak says I only have 2 years of experience?!?! puleaaze! how do I correct
them?

~~~
Ziomislaw
mine says 47 years. I AM NOT THAT OLD ;p

------
username223
It seems like GeekedIn is straightforward web scraping by a greedy creep.
This is why I just create a new throwaway account whenever I need to deal with
GitHub in a logged-in way. Thank God for Mailinator.

------
SCdF
Interestingly, I don't have my GH location set, and so their data defaults to
ZA, and my location to 28.16256, -23.41612. Weird they couldn't just leave
those values blank.

------
bbcbasic
You can find some GitHub profiles with old information in the Wayback
Machine. Not troyhunt's (unfortunately, as that would have been a nice way to
illustrate my point!), but I found some others.

------
Halienja
This demystifies the recruiter spam: "We've analyzed your GitHub
contributions". Thanks, Troy, or else I would have kept wondering about these
emails!

------
libeclipse
A friend of mine's github profile returned a 404 for a while today. I'm
thinking it wasn't a coincidence.

------
sriehl
I was wondering why I started getting recruitment emails. I'm glad I changed
my email address on github recently.

------
ErikAugust
The "data trading scene" referenced by the author. What/where is this? Dark
web?

------
necessity
Public data LEAKED!!!

------
addadandan
Yeah I agree. Full of shit. Seems like a guerrilla marketing tactic for Have I
been pwned. I definitely feel as though I have been pawned now.

The data is not hacked, and therefore you are legally entitled to expose it via
your HIBP service. When I mentioned that the IPs were exposed, they were taken
down in less than a couple of minutes. Yet 18 minutes later you replied saying
that you had contacted "someone" to take them down. Clever way to get on the
front page of Hacker News. Right out of the politics of fear playbook :) Have we
been pwned?? Yes we have.

Traffic whore.

"One of the key projects I'm involved in today is Have I been pwned? (HIBP), a
free service that aggregates data breaches and helps people establish if
they've been impacted by malicious activity on the web. As well as being a
useful service for the community, HIBP has given me an avenue to ship code
that runs at scale on Microsoft's Azure cloud platform, one of the best ways
we have of standing up services on the web today."

