
Characterizing secret leakage in public GitHub repositories - feross
https://blog.acolyer.org/2019/04/08/how-bad-can-it-git-characterizing-secret-leakage-in-public-github-repositories/
======
dguo
I accidentally published[1] my AWS secret key last year because I pushed an
old project from college. At the time, I was very new to using source control
and had little idea how to distinguish between what should and shouldn't be
committed. I hope colleges and code boot camps go over that sort of info
nowadays. The ratio of usefulness to learning effort seems exceptionally high.

[1]: [https://www.dannyguo.com/blog/i-published-my-aws-secret-key-to-github/](https://www.dannyguo.com/blog/i-published-my-aws-secret-key-to-github/)

~~~
aiddun
My sophomore year of high school, I was trying to write a Discord (chat
platform) bot for a server I shared with friends and unknowingly included the
private key in a public repo I hoped to show them. A crawler written
specifically to find Discord keys found the key and started spamming the
server with images of very undesirable things from the far corners of the
internet at a rate of hundreds per second. Needless to say, I learned my
lesson the hard way.

~~~
sbmthakur
Thanks for sharing! How do people write such crawlers? Do they specifically
point them at GitHub repos?

~~~
yorwba
The paper discussed in the article describes writing such a crawler. They
simply use the GitHub search API.

------
pmc
An increased use of SSH keys is also found in our SSH honeypot:
[https://pmcao.github.io/caudit/](https://pmcao.github.io/caudit/) ~ plugging
our paper.

------
rsmolinski
Looks like a great methodology and good results. Looking forward to reading
the paper because I've been working around the GitHub API restrictions for the
same purpose.

Specifically, I'm building a SaaS
([https://www.locktower.com/](https://www.locktower.com/)) for organizations
(or security teams) looking to have a managed solution for detecting leaked
secrets in GitHub/BitBucket/etc. I'm in the process of building an on-prem
version as well. Overall, I really hope to help drive down the number of
unresolved leaks that the authors found.

------
7ewis
I wrote a tool that scans all the new commits to our Org for
passwords/secrets.

Webhook > AWS API Gateway > Lambda

The Lambda uses the new(ish) Layers feature so it can use Git. I then use the
truffleHog[0] library to scan for entropy/regexes inside the commit.

If something is detected, it posts to an SNS topic, which is currently
subscribed to by another Lambda that posts an alert to my team and the
Security team's Slack channel.

It then calls the GitHub API to make the repo private to limit the exposure.

[0] -
[https://github.com/dxa4481/truffleHog](https://github.com/dxa4481/truffleHog)
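
A minimal sketch of the "entropy/regexes" idea mentioned above, assuming
nothing about truffleHog's actual implementation. The function names, the
token pattern, and the 4.5 bits-per-character threshold are illustrative
assumptions; the only pattern taken as given is the widely documented
`AKIA` prefix of AWS access key IDs.

```python
import math
import re
from collections import Counter

# Commonly used pattern for AWS access key IDs: "AKIA" plus 16 uppercase
# letters or digits.
AWS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")
# Candidate tokens: long runs of base64-like characters (length is an assumption).
TOKEN = re.compile(r"[A-Za-z0-9+/_-]{20,}")
# Assumed cutoff in bits per character; tune it against your own false positives.
ENTROPY_THRESHOLD = 4.5


def shannon_entropy(s):
    """Return the Shannon entropy of s in bits per character."""
    if not s:
        return 0.0
    counts = Counter(s)
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def scan_text(text):
    """Return (line number, rule name, matched text) for each suspicious token."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for m in AWS_KEY_ID.finditer(line):
            findings.append((lineno, "aws-access-key-id", m.group()))
        for m in TOKEN.finditer(line):
            if shannon_entropy(m.group()) >= ENTROPY_THRESHOLD:
                findings.append((lineno, "high-entropy", m.group()))
    return findings
```

The regex catches keys with a known shape, while the entropy check is a
fallback for random-looking strings that match no known pattern. The two
tend to fail in different ways, which is presumably why scanners combine
them.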

~~~
semi-extrinsic
Why not have a pre-commit hook clientside that runs truffleHog AND if
successful generates some form of file indicating it was run, then have a
serverside hook checking for that file? This should be doable even with plain
Github/etc, no?
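
A rough sketch of the client-side half of this idea, assuming a Python
script used as the pre-commit hook. It scans only the lines added in the
staged diff, and uses a single illustrative regex rather than truffleHog.
The function names are hypothetical, and the marker file and server-side
check are not shown.

```python
import re
import subprocess
import sys

# Illustrative rule only; a real hook would delegate to a proper scanner.
SECRET_PATTERN = re.compile(r"AKIA[0-9A-Z]{16}")


def added_lines(diff_text):
    """Yield lines added by a unified diff, skipping the '+++' file headers.

    Simplification: an added line whose own content starts with '++' is
    also skipped.
    """
    for line in diff_text.splitlines():
        if line.startswith("+") and not line.startswith("+++"):
            yield line[1:]


def find_secrets(diff_text):
    """Return the added lines that match the secret pattern."""
    return [line for line in added_lines(diff_text) if SECRET_PATTERN.search(line)]


def main():
    """Scan the staged changes; a non-zero return value should abort the commit."""
    diff = subprocess.run(
        ["git", "diff", "--cached", "--unified=0"],
        capture_output=True, text=True, check=True,
    ).stdout
    hits = find_secrets(diff)
    for line in hits:
        print("possible secret in staged change: " + line.strip(), file=sys.stderr)
    return 1 if hits else 0
```

To use it as a hook, the script would be saved as an executable
`.git/hooks/pre-commit` with a Python shebang line and a final
`sys.exit(main())` call. Git aborts the commit when the hook exits
non-zero.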

~~~
waffleguy
That was too simple to think of ;)

~~~
ithkuil
How do you install serverside hooks in GitHub?

~~~
waffleguy
I’m sorry! I didn’t realize my screen name said google

~~~
ithkuil
I'm sorry! I realized my question sounded as if I was asking instructions
about how to do something that was possible but I didn't know how to do it.

Rephrasing:

What if you use a git server like GitHub that doesn't allow you to install
server side hooks?

------
novaleaf
This highlights the difficulty of sharing secrets with your production code.
How can you get secrets into production in a secure way?

Cloud providers have proprietary solutions, but those don't work on other
providers (or your local dev env).

Rolling your own secrets server seems like an expensive centralized disaster
waiting to happen.

It seems like putting a secret into source code is one of the least risky
options. Just make sure it's not in a public git repo.

------
pr0tocol_7
[https://github.com/zricethezav/gitleaks](https://github.com/zricethezav/gitleaks)
plugging my own tool. You can enforce custom rules like entropy ranges +
custom regexes to get fewer false positives, similar to what is described
under "Validity Filters" in this article.

------
torbjorn
The article quotes someone making this claim:

> we discovered that even if commit histories are rewritten, secrets can
> still be recovered…. we discovered we could recover the full contents of
> deleted commits from GitHub with only the commit’s SHA-1 ID.

Do repo cleaning tools such as
[https://rtyley.github.io/bfg-repo-cleaner/](https://rtyley.github.io/bfg-repo-cleaner/)
leave the original commit's SHA-1 ID intact?
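
Some background on the SHA-1 question: a Git object ID is the SHA-1 of a
short header plus the object's content. A commit whose tree changes
therefore gets a new ID, while the original object keeps its original ID
wherever a copy still exists. The sketch below shows that computation. The
helper name, the commit bodies, and the second tree hash are made up for
illustration.

```python
import hashlib


def git_object_id(obj_type, body):
    """Compute a Git object ID: SHA-1 over b'<type> <size>\\0' followed by the body."""
    header = obj_type.encode() + b" " + str(len(body)).encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


# Two hypothetical commits that differ only in the tree they point at, as
# would happen after a file containing a secret is rewritten out of history.
original = (
    b"tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    b"author A <a@example.com> 0 +0000\n"
    b"committer A <a@example.com> 0 +0000\n\nadd config\n"
)
rewritten = original.replace(
    b"4b825dc642cb6eb9a060e54bf8d69288fbee4904",
    b"0000000000000000000000000000000000000001",  # made-up tree ID
)

print(git_object_id("commit", original))
print(git_object_id("commit", rewritten))  # differs from the line above
```

This only shows why rewritten commits get new IDs. Whether a host still
serves the old object under its old ID is a separate matter, which is what
the quoted claim is about.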

------
aflag
Shouldn't GitHub put in some sort of warning for potential leaks when they
happen?

~~~
gcommer
From the paper: "GitHub recently introduced a beta version of Token Scanning"

[https://help.github.com/en/articles/about-token-scanning](https://help.github.com/en/articles/about-token-scanning)

[https://github.blog/2018-10-17-behind-the-scenes-of-github-token-scanning/](https://github.blog/2018-10-17-behind-the-scenes-of-github-token-scanning/)

~~~
aflag
That's ok, I guess. But I was thinking more along the lines of blocking the
push unless you disabled the feature for your repo.

------
herohamp
Wait, do people not use .env files? I've aliased "gitinit" to make a .env
file and a .gitignore that ignores .env, node_modules, etc., and then run
git init.
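
A sketch of what such a "gitinit" helper might do, written as a Python
function rather than a shell alias. The function name, the `run_git`
flag, and the exact ignore entries are assumptions and not the
commenter's actual setup.

```python
import subprocess
from pathlib import Path

# Entries to keep out of version control (assumed list).
IGNORED = [".env", "node_modules/"]


def scaffold(project_dir, run_git=True):
    """Create an empty .env, make sure .gitignore covers it, then run git init."""
    root = Path(project_dir)
    root.mkdir(parents=True, exist_ok=True)

    env_file = root / ".env"
    if not env_file.exists():
        env_file.write_text("")

    gitignore = root / ".gitignore"
    existing = gitignore.read_text().splitlines() if gitignore.exists() else []
    missing = [entry for entry in IGNORED if entry not in existing]
    if missing:
        # Append only the entries that are not already listed.
        gitignore.write_text("\n".join(existing + missing) + "\n")

    if run_git:
        subprocess.run(["git", "init"], cwd=root, check=True)
    return root
```

Writing the .gitignore before the first commit is the point: once a
.env file has been committed, ignoring it later does not remove it from
history.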

~~~
scarface74
There is no excuse to ever have AWS secret keys anywhere in your code or your
settings.

If you are running locally, you should be using your own secret keys that are
configured in your user directory with

    aws configure

If you are running on anything within AWS, you should be using an IAM role
attached to your EC2 instance or Lambda function, and the SDK can retrieve
your keys automatically.

Unfortunately, every single third party code sample on the internet has you
including the secret keys in your code.
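
To illustrate keeping keys out of code: the sketch below looks for
credentials in the environment first and then in the credentials file
under the user's home directory, which is where `aws configure` stores
them. It is a simplified stand-in and not how any AWS SDK is implemented.
Real SDKs perform this lookup themselves and also handle instance and
Lambda roles. The function name and parameters are hypothetical.

```python
import configparser
import os
from pathlib import Path


def load_aws_credentials(env=None, credentials_file=None, profile="default"):
    """Return (access_key_id, secret_access_key) from outside the code base, or None."""
    env = os.environ if env is None else env

    # 1. Environment variables.
    key_id = env.get("AWS_ACCESS_KEY_ID")
    secret = env.get("AWS_SECRET_ACCESS_KEY")
    if key_id and secret:
        return key_id, secret

    # 2. The shared credentials file in the user's home directory.
    path = Path(credentials_file) if credentials_file else Path.home() / ".aws" / "credentials"
    parser = configparser.ConfigParser()
    parser.read(path)  # a missing file is silently skipped
    if parser.has_section(profile):
        section = parser[profile]
        if "aws_access_key_id" in section and "aws_secret_access_key" in section:
            return section["aws_access_key_id"], section["aws_secret_access_key"]
    return None
```

Either way, nothing secret lives in the repository, so a stray
`git push` has nothing to leak.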

~~~
i_am_nomad
An employee of mine once committed a keypair for our company GSuite, clearly
labeled, in a Python script. I asked her to remove it from the repo, and she
simply pushed a new version of the file with the keypair gone. Plus, she
hadn’t configured .gitignore, so all the binaries were there too.

~~~
inimino
The right request would have been to revoke it, not to try to remove it from
the repo.

------
lallysingh
The title's a great example of Pun? Description. The pun gets the attention,
the description tells you why you should click.

