I am conflicted with this because I wrote a similar tool in Go five years ago [1], which attempts to find relevant email addresses in the Git commit history, GitHub API, and GitHub profile page. Paul Irish, a Google Chrome Software Engineer, inspired me to create that program. His version is written in Bash [2].
Many people started using my program, and GitHub ended up banning many of them for misbehaving (spamming) other users. I have not updated the code because I do not want to incentivize people to spam, especially IT recruiters, who are not ashamed to use the same tactics to recruit people.
I would encourage @jemmaissroff and everyone else to be mindful and ethical with this type of project.
I also wrote this same program for my work as a product manager working on OSS projects, but ultimately decided not to release it publicly due to my own ethical concerns. I use it exclusively to contact folks who "touch" my projects' repos, in order to get better feedback on the problems they're encountering, and better understand their use cases. But I also understood this tool would be very valuable to spammers and would generally make the quality of life on GitHub worse.
I'm also of the opinion that grabbing people's email through the commit history is at least shady if not outright unethical, even if it's public. Most folks I talk to don't realize that their email is public this way.
I tried recently to productize my version of this tool [1], with the caveat that you can only use it to contact folks that touch repos you or your organization owns. It hasn't seen much success so far though.
An acquaintance just recently got a spam email about some "network scanner as a service" (sken[dot]ai) because he had starred the bandit network scanner repo. Shady as fuck.
Related: you can set a noreply email address (e.g., kuyan@users.noreply.github.com) for commits, so that your actual email address stays out of the commit history. You can also ask GitHub to block pushes containing commits that expose your private email address.
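A minimal sketch of how one might check a local clone for commits that still carry a real address, assuming noreply addresses end in `@users.noreply.github.com` as in the example above. The function names are hypothetical; the only external call is `git log --format=%ae`, which prints one author email per commit.

```python
import subprocess

# Assumption: GitHub noreply addresses always end with this suffix.
NOREPLY_SUFFIX = "@users.noreply.github.com"


def exposed_addresses(author_emails):
    """Return the sorted, de-duplicated addresses that are not noreply ones."""
    cleaned = (e.strip().lower() for e in author_emails)
    return sorted({e for e in cleaned if e and not e.endswith(NOREPLY_SUFFIX)})


def author_emails_in_repo(path="."):
    """List the author email of every commit reachable from HEAD in `path`."""
    result = subprocess.run(
        ["git", "-C", path, "log", "--format=%ae"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.splitlines()
```

Running `exposed_addresses(author_emails_in_repo())` inside your own clone lists the addresses anyone with that clone can read. For future commits, `git config user.email` is where the noreply address would be set.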
I'm unsure what the etiquette is with email addresses in commits: if someone doesn't list it on their profile, but they use it in their commits, should you really be using it to contact them? Does using an automated tool that finds the one time they forgot to use a noreply address make it any worse?
(Notably, the author of this leaves their address off their profile, but leaves it in their commits. Would emailing them be reasonable?)
Roughly summarizing my personal take on this: "Am I the intended audience for this instance of the email address being public?"
- If there's an "obviously" better forum, prefer that (e.g. a public issue tracker for non-security bugs)
- If I'm likely to have my own git clone (e.g. I'm another developer or security researcher actively interested in the project), go for it! I tend to be fine with inquiries from this audience even if they're not necessarily project related, although bonus points if they are.
- If you were specifically trying to hunt down a point of contact for me and me alone, that's probably fine too. No untargeted and untailored recruiter spam! And likely not as part of a wider recruitment drive, even if you're "tailoring" your recruitment pitches. ...if you're offering me a job specifically to continue working on one of my many abandoned FOSS side projects that you, personally, as another developer have been using - or because we've worked together in the past and want to again - that might be OK ;)
- The recipient might redirect you to other channels they see as better fitting - obviously, respect that if they do.
I’ve used that in the past to reach out to developers, and had fruitful discussions. (And they didn’t seem to mind; it’s great to find fellows interested in the same niche/obscure topic!)
I was under the impression that it was a very deliberate choice for git to store/show email, and part of participating publicly in a software project involves attaching a name/email to commits. If one doesn’t want that exposed, then it’s okay to have some junk/null value.
The worst thing would be for some automated software to weaponize this by mining email addresses from git repos and spamming them or selling them to marketers. <this is why we can’t have nice things>
I assume the author of this tool could have no reasonable objection to being contacted this way. They created it, so they should expect people are going to use it, and, if someone contacts them this way, I don't see where they have any standing to complain.
The other question is tougher, because I don't think people put their email addresses in their git identities for this purpose. Yet, it's still technically public information. I am not sure what the right answer is here.
The purpose of email addresses in Git commits is certainly to allow contact from others looking at the code, but that is more relevant for a distributed project like the Linux kernel (coordinated with mailing lists) than for a single-committer project on GitHub.
Occasionally, those email addresses are used to try to contact contributors to an open source project to check copyright, licenses, etc. Sometimes many years later.
The README's and blog post's use of "cold mail" screams "spamming" to me, which is not okay. I could see using the address in a commit for a purely technical contact relating to that software.
Doesn't this person's stated purpose violate GitHub's TOS?
"If you would like to compile GitHub data, you must comply with our Terms of Service regarding scraping and privacy, and you may only use any public-facing User Personal Information you gather for the purpose for which our user authorized it. For example, where a GitHub user has made an email address public-facing for the purpose of identification and attribution, do not use that email address for commercial advertising."
Making an email address public-facing for the purpose of identification and attribution != cold email me. Using an attribution email for the purposes of contacting someone is not an authorized use.
You might be right, and if you open an abuse report on the repository linked in this post (it's in the hamburger menu on iOS, dunno about web), you could simply paste your comment into the text area and be done.
That’s certainly a possibility for GitHub to consider in response to my complaint, but only GitHub is qualified to evaluate the accuracy of that statement, and they’ll only do so if someone reports it. (They can also update the terms of service, if they deem something to be unacceptable but not yet prohibited.)
I remember when Gmail came out the spam filtering was so good people commented about how they finally felt comfortable sharing their email in public again. Not anymore, I guess.
It makes me sad that email as a system has gotten so wildly out of control that users feel the only way to protect themselves is to hide their address. Clearly the spammers have won.
I'm actually surprised there hasn't been more email innovation over the last decade. Surely we can do better here so our reaction to something like this isn't anxiety.
I don't know if this is my personal experience alone, but I feel that open source spam filters like SpamAssassin and rspamd do a much better job than Gmail's filter. Gmail's filter has too many false positives and false negatives. We shouldn't give up on email just because Gmail is bad at it.
Besides, I feel that Gmail and most other webmail providers have created a bad usage pattern for email. Email is much more pleasant when used with an indexing tool (notmuch/mu), custom filters (sieve) and a proper spam filter. A custom domain with a dedicated account (or alias) for development also improves the situation.
I was using Gmail as an example. I have no idea what people are using for spam filtering, but whatever it is, it's clearly not so good that they don't care whether their address is public.
Another fun trick: curl https://github.com/USER.keys to list a user's public SSH keys. Iterate over every GitHub user, then correlate. Now you know which accounts are actually the same user (or share SSH keys).
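The "correlate" step could look roughly like the sketch below. It assumes each `.keys` response has already been fetched and is plain text with one `type base64 [comment]` key per line; the function names and the shape of the input dict (user name to response body) are hypothetical, and the fetching itself is left out.

```python
from collections import defaultdict


def parse_keys(text):
    """Parse a `.keys` response body into a set of (type, base64) pairs."""
    keys = set()
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            # Ignore any trailing comment field; type + key data identify the key.
            keys.add((parts[0], parts[1]))
    return keys


def accounts_sharing_keys(keys_by_user):
    """Map each key seen on more than one account to those accounts, sorted."""
    users_by_key = defaultdict(set)
    for user, text in keys_by_user.items():
        for key in parse_keys(text):
            users_by_key[key].add(user)
    return {k: sorted(u) for k, u in users_by_key.items() if len(u) > 1}
```

Keys that appear under only one account are dropped, so the result contains just the overlaps, which is also a quick way to see what your own accounts reveal about each other.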
[1] https://github.com/cixtor/emailgetter
[2] https://github.com/paulirish/github-email