GitHub commit search: “remove password” (github.com/search)
861 points by rsc-dev on Feb 15, 2017 | 257 comments



Too many comments here recommend cleaning up the commit and just sweeping the mistake under the rug. This is wrong.

If you leak a password to any public location, there is only one reasonable course of action: CHANGE IT!

Don't even bother rewriting the commit. Focus on changing that password right away, and while you're at it, figure out a better way to manage your secrets outside of your source code in the future. Mistakes happen, but they shouldn't be repeated.


The solution is to store the password and any other sensitive information in a text file that your program reads at startup. And don't forget to add that file's name to .gitignore so git will ignore it. As simple as that. :)
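In case it helps, a minimal sketch of that pattern in Python (the file name `secrets.txt` is just an example - use whatever you like, as long as the same name appears in your .gitignore):

```python
from pathlib import Path

def load_secret(path="secrets.txt"):
    """Read a secret from a file kept out of version control.

    The matching .gitignore entry is a single line:
        secrets.txt
    """
    return Path(path).read_text().strip()
```

Then at startup it's just `db_password = load_secret()`, and the secret never touches the repo.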

If you leaked the password in the git repository, change it as @jvehent just commented.


I think it's as important to make it "hard to do the wrong thing" as it is to make it "easy to do the right thing". In this case, having to explicitly exclude a file containing passwords from being deployed fails that rule of thumb.

Azure Key Vault is a good solution that so far seems easy to work with (I've only just started using it, though), and it can make the storage of secrets easier to secure. But you still have the issue of storing the credentials to the Key Vault itself, so it's difficult to get rid of the problem entirely. You can definitely isolate access to your passwords that way, though, and end up with a smaller number of credentials to secure.


Somehow I doubt Azure Key Vault is easier than a gitignore line and a text file.


Probably not, but it's a lot easier to accidentally commit (or otherwise expose) a text file.


It's not easier, for sure, but it is more reliable and harder to mess up with a single-character mistake.


.gitignore is easy but the pw is still naked. Probably a good practice to hash it.


Probably not, as the result would be useless for authenticating against another service.


As you say, Azure Key Vault helps make things more secure (by letting you control, log, and revoke key usage), but it doesn't help at all with the problem of API keys in the source code - it's just another set of keys that you need in your config.


The way we do it for Azure App Services is to store most keys in the "application settings" section of the portal, so most deployed connection strings, API keys, OAuth2 credentials, etc. only exist in config files for local development; for everything else they are defined in the portal. You can lock down access to the portal and App Services to only the people who will be managing them (in a larger shop than ours, dev ops), so developers wouldn't even have the keys to the kingdom. There are probably better ways to do this, but it has gotten us away from keys in config files.


Sometimes! (Full disclosure: MS employee, big-time Azure user in a different group)

An AAD Application can act on behalf of a logged-in user (using OAuth2 or openid) with the correct delegated permissions. This means you can grant key/secret/certificate CRUD privileges to an AAD user or group, and then use OAuth to obtain a token granting access to the KeyVault resource. All activity is performed by the client (read: application) on behalf of the user (read: human) against the resource (read: key) without having to store any secrets at all.

I use the keyvault pretty extensively, and have really grown to like it.


It can be slightly more complex than that if you use a package manager to publish your work. I once accidentally leaked a password due to having both a .gitignore and an .npmignore, and forgetting to include my .env file in the latter. Fortunately, I realized what had happened almost immediately and was able to change the password. Now I tend to `tar tf` everything before publishing.


Because of issues like this I tend to set up my deploy scripts to first `git clone` into a temporary directory, then do the rest of the work from there.


I believe it's better not to make that kind of sensitive file a dotfile.


Shameless plug: SecureStore, our .NET secrets manager: https://neosmart.net/blog/2017/securestore-a-net-secrets-man...

I'm drafting a writeup and will post it to HN when that's ready. Other secrets managers I've seen posted to HN seem far too overcomplicated, at least for our company's needs. This is a step up from reading secrets from a plain text file, but not so complicated that you need a separate docker image running a service dealing out passwords to your webapps or similar.

EDIT: rationale for use, as requested:

Using this approach, you can better manage your secrets since you can actually commit the passwords file in your code base, and the API lends itself to easily switching between dev and production ids/secrets. You can track revisions to the secrets file, rolling back your commits rolls back the secrets as well. You can also include the deployment of secrets in your deployment script - typically, the encryption key for your vault is only generated and distributed to the production servers once, while secrets may be added, changed, and removed continuously during the development and product lifecycles.

Using a simple secrets manager like NeoSmart's SecureStore lets you embrace the benefits of deployment automation, revision control, and more, without sacrificing safety and security in the name of productivity or ease-of-use.

EDIT2: what the heck, just took 10 minutes off to write it up and publish it: https://news.ycombinator.com/item?id=13654005


What hashing algorithms are you using? Are you updating those algorithms as cipher suites are broken? Are we as users required to set a random seed? Just because it is encrypted doesn't mean it is impossible to decrypt, especially if the defaults are used and someone picks a poor password found in another breach. It's a great idea, but I would still worry about security issues around publishing a password hash.


High level APIs are used for all crypto. The user does not have to generate their own seed. You can use your own key file or stretch a secret phrase once to create one with the library. Passwords are otherwise not hashed, only encrypted.


I think you need to add a sentence giving one good reason to use anything other than a plain text file.


Thanks, good idea.

I just did, though I may have gone overboard as it is more of a paragraph than a sentence. We developed SecureStore out of necessity, believe me, KISS all the way.


The fundamental trade-off for any new tool is that devs have finite capacity to learn new tools. That's why we like to learn a few tools really well and then reuse the heck out of them. However, we are open to cautionary tales - that if you do it the naive way, at some point you'll get bitten by X, Y, and Z. And even then we might not care until we actually get bitten!


Agreed that you absolutely cannot store sensitive passwords in your source code repo.

Your proposed solution, however, has its own share of problems for some deployment scenarios. Where do you get this file from? Assuming you are meant to place it by hand each time you deploy your application... what about autoscaling? What if you want unattended deployment of apps?


There are lots of ways to handle those scenarios.

Deciding where to store your secrets is extra easy if you're in the cloud. In AWS you can use KMS to store it if it's 4kb or less. A cli command or API call can decrypt it for you. If it's larger, you can use a tool such as credstash which lets KMS manage the keys.

If you're in an environment that's using Chef, it can handle them. Ansible has a solution as well.

Or you can use something like Hashicorp Vault, though it requires setting up servers for that purpose.

Once you've decided on one of these tools, it's no problem to script the retrieval of a secret into your deployment mechanism. It will work fine for autoscaling or any other unattended deployment.


Oh, yes, I didn't mean there wasn't a solution. We use Hashicorp Vault, for example. I simply meant that "store passwords in a file" (as mentioned in the post I replied to) is too simplistic to cover all scenarios.


Files are good because virtually anything can read and write them, so they are very portable. I used to work with a really great security guy who created a good system for handling secrets. He had us write them to files, but only to a tmpfs mount, so that they were written to memory and not disk. We also didn't write them to regular files, we used named pipes. This way the application on initialization would read a secret from the pipe and would block waiting for it to be available if it hadn't been written yet. There was a separate process in place to handle the retrieval of the secret and the writing of it to the pipe. As soon as it was written, the application would finish reading and continue its initialization. This ensured things happened in the correct order, and also made the file to be one-time-use.
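For the curious, that named-pipe handoff can be sketched in a few lines of Python (POSIX only; the writer is a thread here for demo purposes, where in the real setup it was a separate privileged process writing to a tmpfs mount):

```python
import os
import tempfile
import threading

def serve_secret(fifo_path, secret):
    # Writer side: open() on a FIFO blocks until a reader shows up,
    # then the secret is handed over exactly once.
    with open(fifo_path, "w") as pipe:
        pipe.write(secret)

def read_secret(fifo_path):
    # Application side: open() blocks until the writer is there, so
    # initialization naturally waits for the secret to be available.
    with open(fifo_path) as pipe:
        return pipe.read()

# Demo rendezvous; in real use the FIFO would live on a tmpfs mount
# so the secret is only ever written to memory, never to disk.
workdir = tempfile.mkdtemp()
fifo = os.path.join(workdir, "secret.pipe")
os.mkfifo(fifo)

writer = threading.Thread(target=serve_secret, args=(fifo, "s3cr3t"))
writer.start()
secret = read_secret(fifo)
writer.join()
os.unlink(fifo)
os.rmdir(workdir)
```

The blocking semantics of open() on a FIFO are what enforce the ordering described above.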


KMS doesn't have a size limit if used right. You should use KMS to store a key, and store the data encrypted on S3.


Sure, which is why I said to use e.g. credstash in such cases. It stores the secrets in DynamoDB while using KMS to handle the keys. I guess you are talking about using S3 server side encryption, which is another approach.


No, I'm literally talking about taking the row key and using it as the lookup into KMS for the crypto key. Then you take the plaintext and the crypto key, encrypt the plaintext, and store it wherever. It's one additional AWS API call over storing the stuff unencrypted, and about 5 lines of code in Java that can be turned into a one-line library call. Not sure why you need credstash.
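A sketch of that envelope-encryption pattern. Note the heavy hedging: `fetch_data_key` is a made-up stand-in for the single KMS call, and the SHA-256 keystream is a toy stand-in for a real cipher - in production you'd use actual KMS GenerateDataKey/Decrypt calls and a real AEAD cipher.

```python
import hashlib

def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher: XOR data against a SHA-256-derived keystream.
    Illustrative only - encryption and decryption are the same operation."""
    out = bytearray()
    counter = 0
    while len(out) < len(data):
        block = hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        out.extend(block)
        counter += 1
    return bytes(a ^ b for a, b in zip(data, out))

def fetch_data_key(row_key: str) -> bytes:
    """Stand-in for the one extra KMS API call: look up the crypto key
    for this row key. Here it's just derived deterministically."""
    return hashlib.sha256(b"pretend-kms-master:" + row_key.encode()).digest()

# Encrypt locally, then store the ciphertext wherever (S3, DynamoDB, ...).
row_key = "user-42"
ciphertext = keystream_xor(fetch_data_key(row_key), b"top secret payload")

# Decryption is the same key lookup plus the inverse (identical) XOR.
plaintext = keystream_xor(fetch_data_key(row_key), ciphertext)
```

That's the whole pattern: one key-lookup call, local crypto, and the encrypted blob can live anywhere.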


+1 Upvote for credstash


I'm curious about the cases where people are running into size limitations for storing secrets... what type of secret is > 4kb? I could imagine some examples, but I'm wondering about real-world ones...


Every file or message you want to send encrypted through AWS or store permanently in S3. I often use crypto as signing as well, since it mostly comes for free code wise at my job.


You totally can store sensitive passwords in your source code repo, just so long as they're encrypted. Here's how we do it: https://neosmart.net/blog/2017/securestore-a-net-secrets-man...

SecureStore is designed to be repo-friendly, it purposely avoids needless IV/payload regeneration, is based in plain text, and preserves element order to avoid driving source code managers crazy.


Or environment variables loaded at provisioning, though this just moves the target. But these options do not themselves have the "in motion" characteristics that provide the best reliability, scalability, and security.

If your needs are more sophisticated, this means key servers, where a hardware security module (HSM) might be part of the equation.


Agreed that is good practice. Oftentimes though when you're prototyping something you aren't really thinking that much about the structure of the project, and then you end up accidentally committing passwords. But yeah, would be a good practice to have this deeply ingrained and just do it automatically for every similar situation.


Completely agree. There are forks, mirrors, and crawlers on GitHub; even if you rewrite the commit and force-push to the GitHub server, the original commit data still exists in the forks and mirrors, and in fact anyone can view the original commit in your own repo if they know its commit hash.


Very much this.

The first thing you should be doing is making the password useless by changing it. Doing anything else is entirely irresponsible. Sure, remove the file in question after that... but you can't treat the old password as anything other than public knowledge at that point.


I've only ever leaked a webhook, realised minutes later, and then changed the webhook URL on the backend. It's not hard to do, and doing anything else is simply really crappy security through obscurity while hoping for the best.


Why would a webhook URL be a secret? Wouldn't it be more like internal API if anything?

I would assume that the parameters sent to the webhook, an auth token or something of the sort would take care of the security bit. Obscuring the URL seems like security-by-obscurity no?


For purposes of security, there's no difference between example.com/api/my-webhook?auth-token=[some-uuid] and example.com/api/my-webhook/[some-uuid].


Not if you treat the secret URL like a password. Plus, not all webhook callers allow you to authenticate them without supplying them a special URL.


But what if your codebase is used in thousands of places that you don't control? You can't always change it.

The real lesson is - don't put passwords in your code.


Don't use passwords/secrets/credentials that you can't rotate. If you've created a product in such a way that you can't rotate secrets, you have a large security issue that you should fix ASAP.

It's like someone responding to the suggestion to "use strong/unique passwords" with "but what if I don't have any authentication?"


Using the same password in thousands of places isn't good either. Use unique random passwords.


Blackbox is one good way to store secrets: https://github.com/StackExchange/blackbox


Reminds me of the Bitcoinica fiasco.

Stuff happens, especially under pressure. But yeah, if that happens to you, there is no more reasonable course of action than changing it right away.


100% agree with this. Not worth the hassle to do anything else.


I liked this one:

https://github.com/squared-one/omniauth-unsplash/commit/072b...

"... It's not really removing any password, is it? But hey, why not use the momentum ... wheeeeeeeeeeeeeeeeee!"


    -    protected $password = '12root34';
    +    protected $password = '';
"I'm a bit disappointed now that putting 'protected' in front of the password doesn't protect it ;)"


Another less 'relevant' result:

  -    acceptHandshake = params.pass == PASSWORD
  +    acceptHandshake = true//params.pass == PASSWORD


That just seems like a guy testing his authorization code. I would expect the next commit to put it back to its functional state.


Like 'Revert "remove password"'? ;-)


just growth hacking


free advertising. genius!


Right after my "remove secrets" post: https://news.ycombinator.com/item?id=13650614

There are just so many of those it's crazy:

    remove .env
    YOURFAVORITEAPI_SECRETKEY
    YOURFAVORITEAPI_PASSWORD
Also replace "remove" with delete/rm/replace/etc.

And replace "YOURFAVORITEAPI" with CircleCI, Travis, Mailchimp, Trello, Stripe, etc, etc.

Also, the companies I contacted consider it the customer's fault and basically don't care.


I once pushed my Amazon S3 key to GitHub accidentally. Realized instantly what I'd done, and while in the process of feverishly regenerating a new key, my cell phone rings. It's Amazon telling me I pushed my S3 key to GH.


It happens to us as well.

The interesting thing is that there is also an evil crawler that will automatically launch thousands of Windows VMs to mine bitcoins (that's all it does). Amazon told us that we had leaked our account ID and secret, but also that they had noticed the crawler launching a lot of VMs, and they gave us a refund. Yes, we love Amazon.

Lesson learned: never put the account ID and secret in your code. Not only should you not hardcode them; there's no need for your code to even read them from the environment itself.

Don't do something like `new S3({accountKey: ..., accountSecret: ...})`; instead just do `new S3()` and that's it. Every AWS SDK is smart enough to find the keys by following a series of steps:

- environment variables

- ~/.aws/credentials

- IAM roles, when your code runs on EC2, Lambda, etc.

So, in addition to not hardcoding an AWS secret, your code should not even pass the secret to the SDK.
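The chain above can be sketched as follows - a simplified illustration of what the SDKs do internally, not the actual SDK code (the IAM-role step is only stubbed out, since it needs an HTTP call to the instance metadata endpoint):

```python
import configparser
import os
from pathlib import Path

def resolve_credentials(credentials_file="~/.aws/credentials"):
    """Walk the lookup chain in order: environment variables, then the
    shared credentials file, then (not shown) the instance's IAM role."""
    # 1. Environment variables win if both are present.
    key = os.environ.get("AWS_ACCESS_KEY_ID")
    secret = os.environ.get("AWS_SECRET_ACCESS_KEY")
    if key and secret:
        return key, secret

    # 2. Fall back to the shared credentials file.
    path = Path(credentials_file).expanduser()
    if path.exists():
        config = configparser.ConfigParser()
        config.read(path)
        if "default" in config:
            section = config["default"]
            return (section.get("aws_access_key_id"),
                    section.get("aws_secret_access_key"))

    # 3. On EC2/Lambda, the SDK would now ask the metadata endpoint
    # for temporary role credentials; omitted in this sketch.
    return None
```

The point is that application code never handles the secret at all - it just constructs the client and lets the chain resolve.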

Consider also enabling CloudTrail and setting up alerts on it.

There is also a way to avoid keeping ~/.aws/credentials on your machine at all, using something that requires MFA. I'm not familiar with how this works yet, but we've started to use it.


Whoa, that's actually amazing. Wonder how they got alerted and reacted so fast.


Github provides a public firehose for events[0]. So it's possible to hook a process to read from the firehose, and look for commit events and then match file contents against the list of API keys.

[0] - https://developer.github.com/v3/activity/events/


Yikes. So this is where the evil crawlers are sitting.

Reminds me of the water pipeline in Finding Nemo with the crabs above it.


Alexa probably overheard the developer swearing…


It's cheaper for them to give a few engineers a web crawler project that's this specific than it is to refund people. I'm just surprised they don't have an "auto-revoke access key if found on the interwebz" setting in the AWS account settings, actually.


It's not surprising, consider the failure modes:

- a key is made public, and we have to call a user or refund them (for retention purposes)

- a key is made public, and we revoke the key, potentially breaking the customer's builds/deploys and potentially knocking the customer's stuff out (if, for example, a key is disabled during a push to production).


I heard AWS has a crawler for that specifically. Not sure if it's true, but makes sense based on the anecdata.


You pushed your secret key, and they recognised it?

Does that imply that they are not hashing secret keys, or did you also push the account key (allowing for a single auth test on their side)?


It's also possible that they just scrape the Github firehose for common patterns like

  AWS_SECRET_KEY="FOOBAR"
and send a message to the committer's email (since you presumably used a correct/valid email in the git commit).
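A guess at what such a scraper might look like. The patterns below match the widely documented AWS key formats (access key IDs are 20 uppercase alphanumerics starting with AKIA; secrets are 40 base64-ish characters) - a real scanner would be much more thorough:

```python
import re

# Suspected access key IDs: AKIA followed by 16 uppercase alphanumerics.
ACCESS_KEY_RE = re.compile(r"\bAKIA[0-9A-Z]{16}\b")
# Suspected secret keys: 40 base64-ish chars quoted near an aws/secret/key hint.
SECRET_KEY_RE = re.compile(
    r"""(?i)aws.{0,20}?(?:secret|key).{0,5}?['"]([A-Za-z0-9/+=]{40})['"]"""
)

def find_aws_keys(text):
    """Return (suspected access key IDs, suspected secret keys) found in text."""
    return ACCESS_KEY_RE.findall(text), SECRET_KEY_RE.findall(text)

# AWS's own documentation example credentials, as they might appear in a commit:
commit = ('AWS_SECRET_KEY="wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"\n'
          'aws_access_key_id = AKIAIOSFODNN7EXAMPLE')
ids, secrets = find_aws_keys(commit)
```

Hooked up to the events firehose, matches like these would just need an email to the committer's address.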


The secret key can probably be used to generate the account key.


It is the customer's fault.

However, it should be pretty easy for them to set up a script that searches GitHub for this kind of stuff and automatically invalidates keys.


At my work one of my coworkers accidentally put a secret token in a GitHub issue. Couple hours later he got an email from the sysadmin at the parent company saying his token finding script went off. He probably wouldn't have noticed for a long while if that script wasn't running.


Wouldn't the token-finding script be even more of a risk?

If the token is XYZ and the script is searching https://github.com/search?utf8=%E2%9C%93&q=XYZ&type=Commits&...:

1. It's sharing the token with GitHub.

2. It's embedding the token as query-string parameter in a GET request, which is much more likely to be logged (than sending it as data in a POST request), and more likely to be available to less-privileged/less-trusted staff.

3. If the request is sent to a non-HTTPS endpoint, the query can be MITMd, revealing the token.

I'd be very wary of setting up a token-finding script, it feels like it adds more risk than it saves.


You can scrape the issues without exposing the token. You could probably do it just by subscribing to all of them and parsing the emails. No one (especially in security) should be using a third-party search to match sensitive data. It's like searching Google for your social security number.


It was a pattern based script, all the tokens had the same length.


You just search for whether some token was uploaded by your people, not for your token specifically.


Maybe they search for the token's public key and not the token itself. Then, if the public key is found, they download the repo and scan for the private key.


A group at BigCorp Inc. was sharing a tool they'd written to ThirdParty Ltd. As part of this, they transferred documentation, including how to configure the tool. Including an example. With a real AWS key. For a dev too, so the key had no restrictions.


And this would be a cool feature for GitHub too: a note saying "we found something in your code that looks like a secret; please know people will use it."


They do this for all of their own API keys already. They not only notify you but instantly invalidate a key pushed to a public repo.

Annoyingly, there is no way to turn it off even when you explicitly want to share an API key knowingly. But I'm more than fine with needing to "obfuscate" an API key or manage secrets correctly, knowing it saves TONS of people.


Why would you ever want to share a valid Github API key publicly?


It's been a while, but IIRC it was a key with no permissions used on a CI server to get around github's API usage limits.

It probably wasn't the best idea, but it was the only "secret" needed in the whole project and I didn't want to maintain a way of managing secrets in a public project for a pointless key.

In the end I did just that, and looking back it was the better choice, but at the time it was annoying.


"... used ... to get around github's API usage limits."

I wonder why they'd want to invalidate that. :)


Continuous development, e.g. Jenkins? (Please don’t do this)


Why not?


Out of curiosity, why not a deploy user? https://developer.github.com/guides/managing-deploy-keys/


Split it to parts and concatenate it, then?

$key = "BAAD" + "F00D" + "CAFE" + "BABE";


> But i'm more than fine with needing to "obfuscate" an API key or manage secrets correctly


GitLab has this: https://docs.gitlab.com/ee/push_rules/push_rules.html#preven... (enterprise edition, admittedly)


1) Search for common pattern where the key can be stored

2) Search if found keys are actual valid keys

3) Expire key, send explanatory email, issue new key

(I think that's what AWS does)


Heroku's official Python templates include `.env` in the repo: https://github.com/heroku/python-getting-started https://github.com/heroku/heroku-django-template (Although, to be fair, they do include `.env` in the `.gitignore` file.)


MailChimp has something like this. A few years ago I accidentally committed and pushed an API key, and I got an email from them a few minutes later saying that they had found the key and already invalidated it, so it couldn't cause any damage. Very proactive and smart, especially for an email service which is likely a huge target for abuse around this sort of thing.


Someone should create an app where you select an API service and it gives you a key that's been pushed to a public GitHub repo.


For anyone wondering, if you want to remove a file or secret you've already committed, you can use BFG Repo-Cleaner to go through your commit history and completely remove any trace of it.

https://rtyley.github.io/bfg-repo-cleaner/


Just note that if it's a public repo, it may not help you, due to attackers scraping Github's API and mirrors like GHTorrent. From "Why Deleting Sensitive Information from Github Doesn't Save You":

http://jordan-wright.com/blog/2014/12/30/why-deleting-sensit...

The top HN comment on the article details their experiences with getting hacked this way:

https://news.ycombinator.com/item?id=8818035


Warning - in the linked HN comment, don't click the link; it's browser popup spam that is actually hard to close (the URL was dropped and picked up by a spammer?)


Sounds like a better idea to just change the secret.


That is always the best course of action, no? Once it's out, assume it's compromised.


But then what do you do when you accidentally commit someone's private medical records or get a "right to be forgotten" order?

Edit: I had similar objections to "why not rework databases as an immutable diff history?" https://news.ycombinator.com/item?id=13581096


Why not both? :)


That's time you could be spending on adding a new feature or fixing a bug :) Just change the secret and be done with it!


And when someone new thinks "that password's wrong, I'll update it!"? Do both; it gets rid of the issue on both sides and really doesn't take long :)


Why would they, if the tip doesn't have any passwords in it? It's not as if a potential contributor will search the commit log to see if there were once passwords around. Besides, rewriting public changesets is rude, to say the least.


What I mean is, lots of folks seem to be saying to ignore the presence of the file and just change the password where it's used.

Removing the file, or the password and adding a comment, as well as changing the password where it's used is much less likely to end up with a re-added password later.

Of course, removing the file, adding it to .gitignore and changing the password makes it even harder as a contributor would have to work to add the password back, which is even less likely to happen.


Oh sorry, I thought you were saying to remove it from the VCS history (as suggested many times in the thread). I totally agree with you.


+1. Requires Java, but BFG Repo-Cleaner is the only app I've ever felt was worth installing the JVM for.


There are so many of these.

It gets a little scary when it veers from professional security to individual personal privacy https://github.com/search?p=2&q=smtp.gmail.com+pass&ref=sear...


META

I notice that in many instances GitHub shows an error.

We could not perform this search: Must include at least one user, organization, or repository

But if you change anything in the URL, it works again - such as adding &p=2, which lucideer tried. But I got the error on his link, so I changed it to p=3 and it worked.

So I'm guessing Github has an autodetection for a particular global code search getting high hits (for something like this, I assume) that locks people out.

Seems more like a band-aid on a broken leg, though.


Wow very interesting indeed, same behavior for me. Definitely makes me curious about the implementation details of Github's search feature.


I should be amazed at how prevalent this is, but after almost two decades in IT/IS, it's no more than the equivalent of the Post-it on a monitor, only more accessible. Dumb, but business as usual.


> the equivalent of the Post-it on a monitor

Writing passwords down on a piece of paper, and keeping that in your wallet or locked desk drawer is actually one of the more secure ways of storing passwords these days.

No risk of electronic compromise, and it's highly unlikely that people who would steal your wallet or break into your home are also interested in your online accounts.


To be honest, and not that I ever would, but, if I stole your laptop, the moment I would become interested in your online accounts would be the moment I found your paper with all of your passwords on it.


This still has many flaws.

1. New guy gets hired.

2. You work for a large corporation and, well, you can't say you can trust everyone there.

3. Small company of 10 (company I work at for example) could be compromised by the weekly janitor.

4. Someone could break in and make it look like a robbery all while stealing your critical infrastructure.

I recently helped research a bit about internal security for our office and sticky notes are still a very common place for credentials to be compromised.


Wow, that is scary.



Some of these commits are PGP signed...


What do you mean by that?

That people took the trouble to use PGP but then go and do something this silly?


It's a good reminder that the human is usually the weakest link.


Yes.



Computers are supposed to be good at what imperfect humans are not. This only proves how primitive the tool is.

That is, for example, if Gmail can ask "it looks like you forgot the attachment" why can't Git say "this is a public repo and you're about to commit and push passwords. Are you sure?"

It's going to be easier to fix the tool than it is to make humans be perfect.


Gmail can do a simple keyword search for a handful of phrases in something that's known to be text.

Git would have to first decide whether a file is a text file or a binary file - a decision that can be made reasonably well heuristically, but that is undecidable in the general case. Then it would have to parse text files for a long, curated list of known keywords that are only used for storing API keys and are not (usually) used in normal code. I'm not sure that's even feasible.

And then of course git has no concept of "public" and "private" repos, so the entire task can't be handled well by git.


Exactly! So we agree that Git - as it is today - is not up to the task(s)? :)

That's all I'm trying to say

We keep expecting a cat to bark, and then we're shocked and disappointed that it doesn't. So let's stop asking and find / build a better tool.


At least parsing and checking the commit message wouldn't be too hard, right?


I don't see a need for Git or Github to do this. Would be simple enough to set up your own Git precommit hook, that can be shared if you like.


While there may well be common trends, Git is a tool for arbitrary content - it's going to be pretty hard to accurately find passwords/secrets being committed. There are tools out there for more specific sets of stuff, but expecting git to catch anything is a little much.


Regardless. The broader point is, Git is a screwdriver and what is needed at this point is a hammer. Sure, we can keep trying to pound nails with a screwdriver but that's harder work and is far less productive.

We. Need. A. New. Tool.

p.s. But there are how many CSS pre-processors? And how many JS frameworks? Etc. Things we don't need. Go figure.


I mean, I'd argue it's a screwdriver and you want an electric screwdriver. Sure, I applaud that effort, but it doesn't mean the original thing is bad, just that it could be improved.


How would git know that it's a password/key/whatever?


I believe Django's error reporting will automatically replace values in your settings.py file (basically a dict) with '*' if the key "looks like" a secret (contains the word 'secret' or 'key' or 'passw', etc).
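If it's useful, here's a sketch in that spirit - mask any settings value whose key looks secret-ish. (Django's actual filter lives in django.views.debug; the regex and names here are illustrative, not its exact implementation.)

```python
import re

# Keys matching any of these hints get their values masked.
HIDDEN = re.compile(r"(?i)secret|pass|key|token|signature")

def cleanse_settings(settings):
    """Return a copy of a settings dict with secret-looking values starred out."""
    return {
        k: "********" if HIDDEN.search(k) else v
        for k, v in settings.items()
    }
```

Safe to run over a settings dict before logging or displaying it, since non-matching keys pass through untouched.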


What would I do? I'd take ALL those incidents on GitHub and I'd run them thru some sort of AI pattern recognition algorithm. That would become my identification "engine" (?).

It might not catch everything all the time - since humans are pretty creative when it comes to fucking things up - but I bet it would be pretty effective. Certainly more effective than what we have now. Then if it can keep learning going forward, all the better, eh.


Just make it search for files or variable assignments named "password" or "secret". That will catch the majority.

In comparison, Gmail doesn't catch all cases either; if you say something like "here are" instead of "I've attached", it misses it.


Keys are easy - the high entropy should tip you off.

For passwords, look for variables named password or passwd being assigned strings.

Like Gmail's attachment check, it'll get stuff wrong; just make it easy to continue on.
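A rough sketch of the entropy heuristic (the 4-bits-per-character threshold and the length cutoff are guesses, and tuning them is exactly where the false positives come from):

```python
import math
from collections import Counter

def shannon_entropy(s):
    """Bits of entropy per character, estimated from the string's own
    character frequencies. Random base64 material scores near 6 bits;
    English words and ordinary identifiers sit much lower."""
    counts = Counter(s)
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def looks_like_key(token, threshold=4.0):
    """Flag long, high-entropy tokens as possible keys."""
    return len(token) >= 20 and shannon_entropy(token) > threshold
```

Run over each token in a diff, this would flag key-shaped strings while letting normal code through.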


This. However, it would only work with secure passwords. Setting the entropy threshold too low would result in a bunch of false positives.


I'd say that it's hard to implement this effectively. Maybe as a language / framework-specific hook.


Good idea! Shouldn't be so hard to implement a simple prototype.

if "PASSWORD=xxx" in text => prompt alert or ask confirm

Next step would be to take this list (the search results), make a curated list of 100-1000 unencrypted passwords (text/line + files + repo info), and then hard-code some rules to detect 80%+ of cases.
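A toy version of that prototype as a pre-commit-style scanner (the variable-name list and the placeholder set are just examples, not curated from real data):

```python
import re

# Flag lines that assign a non-empty literal to a suspicious variable name.
SUSPICIOUS = re.compile(
    r"""(?i)\b(password|passwd|secret|api_?key|token)\b\s*[:=]\s*['"]?([^'"\s]+)"""
)

# Obvious placeholders that shouldn't trigger a prompt.
PLACEHOLDERS = {"xxx", "changeme", "todo"}

def flag_lines(text):
    """Return (line_number, stripped_line) pairs worth prompting about."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        m = SUSPICIOUS.search(line)
        if m and m.group(2).lower() not in PLACEHOLDERS:
            hits.append((lineno, line.strip()))
    return hits
```

Wired into a pre-commit hook, anything returned by flag_lines would trigger the "are you sure?" prompt.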


As good a time as ever to mention AWS's provider-agnostic, secrets-aware git hook that attempts to prevent this at the repository level [1]

[1] https://github.com/awslabs/git-secrets


Throw the word 'oops' in for good measure.

https://github.com/search?p=2&q=remove+password+oops&ref=sea...


I actually compiled a list of these from the last time this subject was mentioned on HN. http://gitoops.xyz/


This is a good example of the increased risk of doing your development out in the open: any mistakes are exposed to a much wider group of potential adversaries.

On an internal VCS, this would still be a problem, but a bit less visible/exploitable...


On an internal VCS this may be a deliberate decision: Secrets need to be stored somewhere and a cost-risk analysis can result in "this is the best place that we currently have at our disposal". That obviously won't fly if your threat model includes "adversary may attack our github account from within GH" or if you ever plan on opening up that repo, but if neither applies this may be the best place to store some sorts of secrets.


I've gone through the process of open-sourcing previously closed codebases, and in virtually all of them a decision is made to make a single "genesis" commit to start the public exposure because there's just not enough manpower (or I don't know git well enough) to go through and ensure there not only aren't any secrets now (meaning passwords, or info the company doesn't want to release), but also there weren't at any point in the past.


Genesis commit, that's a catchy name for it. We've done the same thing, after some discussion this always ends up making the most sense.

Also, you can hide your crimes and not show off all your "TODO: put more stuff here" commits to the world.


Sure there's always a cost / benefit balance to take into account.

That said I'd say putting secrets in a git repo is a pretty risky thing to do. By the nature of the tool that means that the secret ends up on the device(s) of every developer who checks out the codebase, so the security of the secrets is equal to the security of the worst secured device in question.


> That said I'd say putting secrets in a git repo is a pretty risky thing to do

Storing and keeping secrets is a pretty risky thing in general. Think: small team, small app, everybody has the secrets anyway for deployment purposes. Sure, setting up Vault is superior, but how much effort does that cost that could be invested in a better solution? Or a puppet repo that you use to provision your machines, shared in the ops team: small team, everybody has root, and on each machine there might be an ssh key that gives away all your secrets. So better to invest in solid FDE and maybe tie that to a two-factor device, a YubiKey that is required to decrypt the disk, etc. Not perfect by any means, but there's limited time to go around and you really should think about which threats you want to (and can) defend against. (For example, for most projects I'm not wasting any thought on defenses against a nation-state actor; that's a threat I won't be able to meaningfully counter anyway.)


FDE and YubiKeys are nice controls for some classes of risk, but distributing your passwords onto dev laptops via a git repo opens you up to a wide range of risks that those won't help you with.

Unless you have super-corporate lockdown of the endpoint devices, you have risks like "a user with access to the repo installs software which turns out to be malware", "a user with access to the repo leaves their laptop in a coffee shop unlocked", "a user with access to the repo puts it on a USB key and loses the key". None of these are nation-state-level concerns; they're things that could impact the project purely by accident, or at the hands of low-skill attackers.

The point is once you've allowed secrets to be in a distributed system like this you have very little control over what happens to them, which is why I'd recommend using a secrets management system where there's more control (e.g. vault from hashicorp) in almost all circumstances.


Also, there's a surprising number of websites out there with a .git directory in the root...


Don't hardcode things, .gitignore your production config files, check in conf.example if needed.
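A minimal sketch of that pattern (the file names here are just examples): commit an example config with placeholder values, keep the real one local and ignored.

```shell
# Commit the example with placeholder values; keep real secrets local.
cat > conf.example <<'EOF'
db_host = localhost
db_password = CHANGE_ME
EOF

cp conf.example conf        # real credentials go in this copy only
echo 'conf' >> .gitignore   # so git never offers to stage it
```

New contributors then copy `conf.example` to `conf` and fill in their own values, and the real file never appears in `git status`.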


We've published internally developed projects on github after removing anything sensitive and initializing a new repo from the latest version of the code base.

You lose your development history, but you ensure you won't get bitten by stuff like this.


I pretty much stick to private repos to avoid this being a problem. I'm still generally pretty careful about it, but in case I slip up it's nice to know the info isn't just floating around out there for anyone to grab.


A while ago I discovered similarly that there are several searches which lead you to active database logins.

https://github.com/search?&q=mysqli_connect+http&type=Code

https://github.com/search?q="rds.amazonaws.com"&type=Code

etc...


Wow. People would really store production passwords on GitHub for everyone to see?

I wonder if GitHub blocked those searches "We could not perform this search Must include at least one user, organization, or repository"

Edit: If I click on PHP for the language in the sidebar they show up. But hmm, I wonder if maybe GitHub tries to block leaks like that from being searched.


Yeah, I just started getting those too. But adding "&p=2" to the URL shows results for the next page...

shrug


You could probably also just use Google to find them with site:github.com.


When I was a teenager I liked to use search engines to find PHP upload tests. People almost always served uploaded files from a directory and made no distinction between .jpg or .php files.

(No, I didn't exploit this, I just enjoyed finding them)


To people like me who have done this many times in the past and want to add the file to gitignore

http://stackoverflow.com/questions/1139762/ignore-files-that...

The other alternative I can think of is to hide sensitive values in environment variables


Careful. If you have already committed it, and you ever push the repo to a public GH repo, your key is compromised. Just because some benevolent slacker-attackers on HN aren't sniffing the PSHB event queue, doesn't mean no one is. If you ever send the secret to Github, criminals have it. If you ever committed it, and then you ever push, then you've sent it to GitHub.

So yes, add the file to gitignore and git rm it, but also invalidate your keys and get new ones.
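For reference, the usual sequence, demonstrated in a throwaway repo (file names are examples). Note that `git rm --cached` only stops future commits from tracking the file; the old blob stays in history, which is exactly why the keys still need rotating.

```shell
git init -q demo && cd demo
git config user.email you@example.com && git config user.name you
echo 'hunter2' > secrets.txt
git add secrets.txt && git commit -q -m 'oops, committed a secret'

echo 'secrets.txt' >> .gitignore   # ignore it from now on
git rm -q --cached secrets.txt     # untrack it, but keep the local copy
git add .gitignore && git commit -q -m 'Stop tracking secrets.txt'

git ls-files                       # secrets.txt no longer listed...
git show HEAD~1:secrets.txt        # ...but still recoverable from history
```

That last command printing the secret is the whole point: cleanup without revocation is cosmetic.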



Wow private keys just sitting in plain sight o.O


From a former HN discussion http://gitoops.xyz/



Search for "update password" and you'll see almost as many results (268,000), many presumably with active passwords.

"Add password" finds 792,000 results, of which at least some (on the first page) are actual passwords.


I'd hope that all those people promptly changed their passwords after realising. But with 200k commits, I'm sure a good percentage of them didn't.


Related topic: "Production AWS keys on GitHub" ~ 3 years ago

https://news.ycombinator.com/item?id=7411927

By just looking quickly, it seems that you can still find many recent live keys...


Amazon scans GitHub and revokes valid keys (can't find the source, but as evidence, it's been a while since AWS keys were stolen from GitHub)


I have it on good authority ( >_> ) that this is indeed true.


I'm embarrassed to say that it's happened to me. I deliberately created an IAM user that was 'public', for purposes of the application. Amazon shut down that user and everything (not much) that it had access to. This was in 2014 or 2015.



Searching filename:id_rsa also yields rather interesting results, alongside "BEGIN OPENSSH PRIVATE KEY". I wonder how many of these also contain an ssh_config.


People...seriously...

I get it...you like GitHub but you don't want to pay for private repos. That's when you use GitLab or Bitbucket, and then this problem goes away.


Well, we suppose that's a solution of some kind. How about never committing passwords or hardcoding them in your codebase in the first place?


Also that


After this was posted, there has been some serious trolling on GitHub, with "fake" commits matching this search.


Another for the list, JSON or YAML containing 'password' - https://github.com/search?utf8=%E2%9C%93&q=password+extensio...

Reminds me of the eye opening experience available at https://www.exploit-db.com/google-hacking-database/


I set up a honeypot and made this commit: https://github.com/teaearlgraycold/honey/commit/7c4289717979...

Already had a couple of sassy individuals telling me my honeypot is shit via the tty logging.


Fell for it, this is the first time I connected to a honeypot (or so I think). I especially liked the part where you type 'exit' and it just keeps you connected but the command line changed to 'root@localhost' at the beginning. Had a good laugh there :)

What software are you using?



How do you guys, handle this problem?

I use either `git-crypt` [1] or `ansible-vault` [2].

1: https://github.com/AGWA/git-crypt

2: http://docs.ansible.com/ansible/playbooks_vault.html


I follow the 12 factor app methodology (https://12factor.net/), everything in ENV.
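A tiny sketch of what "everything in ENV" looks like at the shell level (the variable name is made up). The `:?` parameter expansion makes the app fail fast when the secret wasn't provided by the environment, instead of limping along with an empty value.

```shell
export DB_PASSWORD='hunter2'  # in real deployments, set by the platform, not a script

# Fail loudly if the variable is missing:
: "${DB_PASSWORD:?DB_PASSWORD is not set}"

echo "got a ${#DB_PASSWORD}-character password from the environment"
```

The config then lives in the deploy environment (systemd units, container env, PaaS settings), and nothing secret ever touches the repo.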


For puppet users: https://github.com/TomPoulton/hiera-eyaml

Advantage of this approach is it encrypts the values individually instead of per file. This way the secrets files are git/review friendly.


Dotenv and Ansible vault, depending on the project. I also want to look into Hashicorp's Vault https://www.vaultproject.io


ENV variables. I use `dotenv` in Ruby projects.


And it actually uses GPG : )


Yep, same here. I prefer git-crypt, but we use Ansible vault too.


.env or ENV with AWS KMS


This made me realize an unexpected (to me) search behavior on GitHub. Basic/default search will search commit history, but if I try to add advanced options I don't appear to get commit history results.

https://github.com/search?utf8=%E2%9C%93&q=remove+password+u...

Here I was trying to search for "remove password" just on repos for nicksagona (just happened to be one of the first users to display when you go to this thread's search).

That comes up with zero results. This leaves me wondering how I would run similar searches on repos that I'm involved with, as a way of auditing to make sure none of them have compromised passwords that would need to be changed.

I would love to hear suggestions on how to do this.


Good thing all my commit messages are "xxx"


If you found a similar mistake in your repository, you can delete a commit from history using `git rebase --onto <commit-id>^ <commit-id>`. Or, if you want to actually rewrite it, see the `git rebase -i` documentation.


A single person who checked out your repo before the force push will still have the credentials. Once this has been pushed to a public repo, assume that the credentials are burned and revoke them.


Indeed; but these actions complement each other.


if you revoke the credentials, removing them from the git history serves no purpose but disrupts everybody that has a clone of the repo. So you're doing harm for little benefit other than covering up the incident. A net loss, if you ask me.


It serves the purpose of removing a hint on your password patterns from public availability.

E: Oh, and just to preempt this, even saying "i use only random passwords with no pattern" is useful information, as is having a ballpark password length.


Don't have password patterns, problem solved. Knowing that my password is 20 random characters of all possible symbols will not reduce your search space by any significant amount.


Still useful: it means there's no point in checking anything < 20 characters, which halves the search space. Or, on the outside, it can be useful to abort any attempt at brute force by way of cost evaluation and move on to another target.


Halves the search space, so in other words it only reveals 1 bit of information about the password, a completely insignificant amount. For a 64-symbol alphabet, a 20 character password is 120 bits long, so you'll still have to brute force 2^119 passwords on average. The sun will swallow the Earth before then.
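The arithmetic here checks out and is easy to sanity-check in the shell: a 64-symbol alphabet carries log2(64) = 6 bits per character.

```shell
# 6 bits per character times 20 characters:
echo $(( 20 * 6 ))       # total entropy: 120 bits
# learning "length is at least 20" halves the space, i.e. costs 1 bit:
echo $(( 20 * 6 - 1 ))   # 119 bits still to brute-force
```

So revealing the generation scheme of a strong random password costs almost nothing, which is the point being made above.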


My goal when choosing a password (generator) typically isn't "what will tie up an adversary for the longest, preventing them from moving on to attack someone else".

Call me selfish, but if my password is known to be too tough to bother, so Eve moves on to someone else's weak password, great.


If you know your password is too tough to bother with, then having people bash away at it is no cost to you, and a benefit to everyone else.


Too little, too late. Plenty of people watch the stream of recent commits to github, and can snatch an API key as soon as it's pushed. Removing the compromised, revoked key from your git history is like making sure your front door is closed properly after coming home to find you've been burgled.


I can think of a legitimate use case for that: cleaning up a repo that's about to be opened to the public (so no one who shouldn't have access to sensitive info has checked it out yet).


`git filter-branch` is a tool intended to remove sensitive material from git history. It requires a force push, of course.

If the keys/passwords were already pushed to GitHub or other public hosting, they should also be revoked.
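A runnable sketch of that workflow in a throwaway repo (paths are examples; `git filter-branch` is slow on large histories, and BFG repo-cleaner is the usual faster alternative). The index-filter drops the file from every rewritten commit:

```shell
git init -q fb-demo && cd fb-demo
git config user.email you@example.com && git config user.name you
echo 'code' > app.txt && echo 'hunter2' > secrets.txt
git add . && git commit -q -m 'initial commit (oops, includes a secret)'
echo 'more' >> app.txt && git add . && git commit -q -m 'more code'

# Rewrite every commit, removing secrets.txt from each tree:
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch -f --index-filter \
  'git rm --cached --ignore-unmatch secrets.txt' -- --all

git log --oneline -- secrets.txt   # empty: no commit touches it anymore
```

After this you'd still need the force push, and, since anything already pushed publicly is burned, the revocation.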


Doesn't that require a force push? Force pushes are acceptable for private repositories with a single user, but typically not in larger projects.

Just revoke the password/secret/whatever.


I find force pushes acceptable for topic branches of public repos. In fact, I use them a lot to leave behind clean history. Same as with squash merges, which technically also lose history.


It's better to push under a similar branch name and let people decide if and when to rebase --onto the new upstream.

Squash merges are just bad. They destroy all the info that makes git handle branching and conflicts better than svn.


Or do both? Better safe than sorry.


It's better to just revoke and not re-write git history in a public repository. Re-writing history is pointless after the credentials are revoked, and causes a headache to others using your repository.


You'll have to revoke committed credentials regardless, as GitHub is so frequently scraped for such content.


Just a tiny tweak - a handy shortcut to

<commit-id>^ <commit-id>

is

<commit-id>^!

I use it all the time with:

git diff <commit-id>^!


This is the main reason I use GitLab: they have free private repos. When you're trying to smash out code as fast as possible in a startup, you don't want to have to worry about accidentally checking a secret in.


Bitbucket is another alternative for free private repos.


GitHub has become the best place to find all sorts of sensitive information: root passwords, access to company networks, API keys. Everything is available from a search box; you don't need to be a genius to do serious damage, spying, or all sorts of black-hat stuff.

Sure, people should clean up their work, but in fact not everybody does, and it won't change tomorrow. You'll simply hear on the news that some Russian hackers are behind the attack, or some other bad excuse.


This is pretty much how I found a couple exposed Stripe API keys. You just need to look through code of people who use the example implementation, and then dig in the history/config a bit. If you send stripe a key or two, they'll give you a free shirt.

https://adamlaycock.ca/blog/2016/05/23/Stop-Posting-Keys.htm...


Not only passwords, but api keys as well. I can't tell you how many times I've come across public repos that have full api credentials in them. Boggles the mind..


You'd better change the password instead of removing it.


Just using a generic commit name like 'minor bug fix' or 'updated version' for these kinds of commits will save a lot of headaches like this.

One can do better by adding random lines/logs in a lot of files, sneakily removing the password from one of them, and then giving it a generic commit name.

But then it all boils down to your mindset at that particular moment when you are committing.


> Just using a random commit name like 'minor bug fix', 'updated version' for these kind of commits will save a lot of headaches like this.

Just change the leaked passwords, don't try to hide the commits.


Precisely: just change the password/key and don't do anything else. People might think you were stupid, or think you used random text; either way you are safe.


Security by obscurity is no security at all. Revoke the creds and then either just remove them or run BFG as a secondary measure.


Don't commit passwords. Put them in a config file and .gitignore it.

You could upload an example file... but please don't put real passwords in the example file.


Git-crypt is your friend!


Cool. I did not know this existed. I did not know that I wanted it. Now I want it.


Then .gitignore and .env are your best friends :)


git-crypt suits a significantly larger set of use cases such as storing private keys, etc.


Someone here should make a bot and leave comments on all those commits warning people to change their passwords.


Looks like a lot of people read Hacker News... https://github.com/doutchnugget/awesomevim/commit/a7292962ce...


The worst part is that it provides tons of passwords to analyze for recurring words or schemes. This will probably also hurt people who never committed their passwords to public repos. GitHub should probably filter out such searches.


I think the ship of "easily analyzed password dumps" has already sailed, e.g. https://xato.net/today-i-am-releasing-ten-million-passwords-... <-- 10 million passwords


Yeah, indeed. I guess we can expect password schemes to change over time, so it's still a good idea to prevent it. Not sure in which proportion it helps, though.


Years ago, there were proggit posts of Google searches for open phpMyAdmin consoles. I think people then went and deleted/messed with the databases. My search-fu is failing me though; I can't find any of those threads.


Or try "remove aws"


I tried "fuck", but it's nowhere near as prevalent as "remove password"


Give "oops" a try


"stuff" isn't far behind


Always been a fan of "fix" ...LOL


Fixed? At least "wip" for "work in progress" doesn't raise your expectations too much.


One of the things I've done after making this mistake was creating an example config file for this information, committing that, and then dropping a gitignore on the real config file.


Thank you for highlighting that this is such a common mistake. I hope developers of all skill levels look at this, realize how easy it is to make this mistake, and learn from it.


I would recommend using torus.sh or another secret manager instead of an env or text file. I've forgotten to include them in .gitignore too many times.


some random guy spammed a ton of commits with a bitcoin address saying "please send me money"

jesus christ, how low can you stoop


Maybe a confidential-linter or git pre-commit hook would be nice to prevent leaking confidential information.
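A minimal sketch of such a pre-commit hook, demonstrated in a throwaway repo (the patterns are illustrative; tools like git-secrets, linked elsewhere in the thread, do this far more thoroughly):

```shell
git init -q hook-demo && cd hook-demo
git config user.email you@example.com && git config user.name you

# Install a hook that aborts the commit if any staged line
# looks like a secret assignment.
cat > .git/hooks/pre-commit <<'EOF'
#!/bin/sh
if git diff --cached -U0 | grep -E -i -q \
    '^\+.*(password|passwd|secret|api_?key)[[:space:]]*[:=]'; then
  echo 'possible secret in staged changes; commit aborted' >&2
  exit 1
fi
EOF
chmod +x .git/hooks/pre-commit

echo 'API_KEY=abc123' > config.txt
git add config.txt
git commit -m 'add config' || echo 'commit blocked'
```

Hooks aren't copied on clone, so each developer has to install it (or the team can ship it via a setup script), which is the usual weakness of this approach.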


Good find. Reminds me of a Google search query that exposes sites vulnerable to SQL injection.



Heh the first result of this query has a password there as well https://github.com/budworth/cs313-php/commit/9113053776bb834...


Leaks of this type have been known for AGES, and still people are unable to keep private things private.



Once it's on the internet, it's out there for good. Sites like GitHub have become an amazing OSINT tool.


There are tools that automatically rewrite your git history and can fix this.


Rotate your keys and passwords in production, everyone.


And now I got goatse'd, thanks Hacker News, lol.


can someone please give me an ELI5 breakdown of what I'm missing... did these people attempt to change their password via git commits?


The question is: how many are left in the wild?


Another one: Search for Root password :D


`git commit --amend` can clean up the mess. Also, better to change the password if it's already been pushed.


Wow!!!

This could be a big thing. It's time to write: "How to write code without exposing yourself".


Holy shit!


use quotes. only about 19k


some of these seem legit


Encryption, do you speak it?


imagine private repos :)


Not nice


Git is a great tool; unfortunately there's no standard for storing sensitive info (like passwords). Some store it in ENV variables, some copy the files to servers using custom scripts, etc. I'd love to see an easy tool that developers can use to manage this info the way people manage files in git.


When you put your dotfiles in a repository online, be sure to commit all the public keys and none of the private ones.

Github, like SSH, uses an asymmetric authentication scheme. They even publish everyone's public keys. It's much more secure than passwords.


Almost as many results on 'remove credentials' :-)



