
Keeping Passwords in Source Control - creativityhurts
http://ejohn.org/blog/keeping-passwords-in-source-control/
======
agwa
To solve this problem, I wrote git-crypt[1], which uses git's smudge/clean
filters to transparently encrypt/decrypt files when you check them in/out. So
it's a lot like this solution except you don't need the manual makefile steps.
As an added bonus, git diff/blame still work on the encrypted file.

[1] <https://github.com/AGWA/git-crypt> and
<http://www.agwa.name/projects/git-crypt/>

~~~
jeresig
git-crypt is incredibly cool - great work!

~~~
agwa
Thanks!

------
AngryParsley
First: I completely agree. Keeping plaintext secrets in source control is a
bad idea. Encrypting them is a good idea. If you have plaintext secrets,
encrypt them _now_ using this makefile or git-crypt. Then rotate them.

That said, this solution has a couple of issues:

1\. It encrypts the entire file instead of individual secrets in the settings
file. Encrypted files can't take advantage of many version control features. A
small change in plaintext creates a huge diff in ciphertext. Git blame doesn't
work anymore. Git diff gets a lot more spammy, since you'll see a diff for the
entire settings file if there's the slightest change in it.

2\. It uses symmetric key encryption. If a developer knows the password to
encrypt a secret setting, they can decrypt all the other secret settings. This
is true until someone rotates the passphrase and re-encrypts the file.

To fix both of these problems, I recommend using Keyczar
(<http://code.google.com/p/keyczar/>). If you write the right wrappers, it
allows you to encrypt individual settings with a public key. Decrypting them
requires a private key that exists only on production servers.

At a past job (Cloudkick), sensitive things in our settings.py looked like
this:

    
    
      from cloudkick.crypto_wrappers import kz_decrypt
      ...
      BORING_THING = "whatever"
      SECRET_THING = kz_decrypt("kz::xxxx....", "/path/to/private/key")
    

kz_decrypt did exactly what you'd think: given an encrypted string and a
private key, return the decrypted string. The private key was only on
production servers, so the risk of leaking a secret was minimal. The public
key was in source control, so anyone could encrypt a secret. For debugging or
testing, one could also replace the call to kz_decrypt with a plaintext
string. I wish the code had been released. It was only 100 lines or so.

This set-up would require some extra work for settings files that don't
allow code execution. Still, once you've set it up, it's pretty close to the
most secure and convenient way to store secrets.

~~~
blake8086
Doesn't this mean that for a sufficiently short secret, someone could run an
offline attack to guess it?

~~~
AngryParsley
Good catch. My comment was already rather long, so I didn't mention that the
public key actually encrypts an AES key that encrypts the secret. A different
AES key is used for each secret. Also if the secret is < 1000 bytes (I forget
the exact value), it's padded with random bytes. The encrypted format is
something like kz::[AES key]:[encrypted padded secret]. Both the AES key and
secret bytes are base64 encoded so they don't screw up parsing or break Python
string quoting/escaping.
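
(Editor's sketch: the format above is only described as "something like" `kz::[AES key]:[encrypted padded secret]`, so the layout below is a guess. It shows just the packing and parsing of the two base64 fields; the actual public-key and AES steps are left out, and `pack`/`unpack` are hypothetical names.)

```python
import base64
from typing import Tuple

PREFIX = "kz::"  # prefix taken from the comment; the rest of the layout is assumed


def pack(wrapped_key: bytes, ciphertext: bytes) -> str:
    """Join the (already encrypted) key and secret into one settings-safe string."""
    fields = [base64.b64encode(part).decode("ascii") for part in (wrapped_key, ciphertext)]
    return PREFIX + ":".join(fields)


def unpack(token: str) -> Tuple[bytes, bytes]:
    """Split a token back into (wrapped_key, ciphertext)."""
    if not token.startswith(PREFIX):
        raise ValueError("not a kz:: token")
    # Standard base64 never emits ':', so the first colon is the field separator.
    key_b64, sep, secret_b64 = token[len(PREFIX):].partition(":")
    if not sep:
        raise ValueError("expected two colon-separated fields")
    return base64.b64decode(key_b64), base64.b64decode(secret_b64)
```

Because both fields are base64, the token contains no quotes or backslashes and can sit inside an ordinary Python string literal.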

~~~
hamburglar
Presumably it's padded if it's not a multiple of 16 bytes, because that's the
AES blocksize, and not just some off-the-wall requirement that the data be
1000 bytes long. I'm also hoping that your encrypted format has one more
field, which is an IV that changes each time the data is encrypted.

~~~
AngryParsley
There's no IV, but the AES key changes each time you encrypt the secret. The
key is random.

------
wuster
I don't think storing passwords in any format in source control is good. Like
someone else said, it's mixing app logic with deployment.

We use a combination of Google's open source Keyczar [1] and a relatively new
Python keyring [2] library which uses the Keyczar crypter to read/write keys
to a keyring storage backend, where the backend interface can be implemented
with local crypted files or a cloud service.

[1] <http://www.keyczar.org/> [2] <http://pypi.python.org/pypi/keyring>

We built this Python wrapper called appauth around the concept of a Keyring
service by application domain.

e.g. pseudo code:

    
    
        import appauth
        auth_service = appauth.AuthService('my-web-app')
        db_creds_cfg = auth_service.get('primary-db')
    

Inside of db_creds_cfg, it can be a free-form dictionary that provides
whatever details are needed to get into a resource:

    
    
        db_creds_cfg['db_host']
        db_creds_cfg['db_port']
        db_creds_cfg['username']
        db_creds_cfg['password']
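
(Editor's sketch: appauth was not published, so this is only a guess at the shape of the interface above. The class body, the `put` method, and the dict-based backend are all assumptions; a real backend would encrypt the stored blob, for example through Keyczar, instead of keeping plain JSON.)

```python
import json
from typing import Any, Dict


class AuthService:
    """Hypothetical stand-in for the appauth wrapper: one keyring per app domain."""

    def __init__(self, app_domain: str, backend: Dict[str, str]):
        self.app_domain = app_domain
        self._backend = backend  # assumed interface: any mapping of key -> stored blob

    def _key(self, name: str) -> str:
        return f"{self.app_domain}/{name}"

    def put(self, name: str, creds: Dict[str, Any]) -> None:
        # A real implementation would encrypt this JSON before storing it.
        self._backend[self._key(name)] = json.dumps(creds)

    def get(self, name: str) -> Dict[str, Any]:
        return json.loads(self._backend[self._key(name)])


storage: Dict[str, str] = {}
auth_service = AuthService("my-web-app", storage)
auth_service.put("primary-db", {"db_host": "localhost", "db_port": 5432,
                                "username": "app", "password": "example-only"})
db_creds_cfg = auth_service.get("primary-db")
```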
    

I put in some honest effort to find an open source solution to this, but
failed to find anything with a simple install process AND programming
interface. Is there any interest from HN if we choose to open source this?

Furthermore, we use Google Authenticator on our servers to require two-factor
auth: <http://code.google.com/p/google-authenticator/>, on top of disabling
password auth in favor of signing in with ssh keys. All log files are then
set to permission 600 just to be super paranoid.

~~~
RegEx
> Is there any interest from HN if we choose to open source this?

I personally don't understand this type of comment. Just open source it!

~~~
wuster
gotta go through the motions of putting up good readmes and documentation, not
a trivial amount of effort, only worth it if I think enough people want it.
open sourcing something isn't exactly free effort.

~~~
RegEx
A project is worth open sourcing if it's useful enough for other people to
use. I personally do not believe there's much more to it than that.

------
meaty
Actually, having no passwords and using a platform which supports integrated
authentication (like Windows) is probably the best approach with respect to
handling this. The authentication requirements are handled at an
infrastructure level, meaning no credentials are kept in source control or on
your production web servers.

In fact, none of our web servers carry ANY credentials at all. Our IIS
processes run as a specific user and are granted access to resources (message
queues, databases etc) as required.

I'm not sure stuff like this is entirely possible on Linux (I haven't tried to
be honest), but I assume you can do the equivalent with OpenLDAP / pam_ldap
and SELinux.

~~~
griffindy
on a unix system you could put them inside `/etc/profile.d/user.sh` as
environment variables so that whenever that user is running something those
variables exist. then if you're using chef (not familiar with puppet, etc.)
you could keep those passwords/keys in an encrypted data bag and set them
during provisioning.

~~~
wilmoore
Good idea...and, yes, you can do the same with puppet.

------
drone
In a previous job, we had a lot of components with configuration files
containing credentials for databases, etc. What we had done instead, was to
put placeholder tokens (imagine %DB_ROLE1_PASSWORD%) in the configuration
files, and then puppet (chef later) would be used to deploy the packages and
replace the tokens. In this way, no developer ever knew of the production
passwords, and only the system admins had access to the source control for the
puppet scripts. There really shouldn't be any access to production credentials
by developers if you have separate roles for developer and admin. (Some
companies are too small, I know =)
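
(Editor's sketch: the token replacement step described above, written in Python rather than as a puppet or chef template. The `%NAME%` pattern follows the `%DB_ROLE1_PASSWORD%` example; the `render` function and the config line are hypothetical.)

```python
import re
from typing import Mapping

# Placeholders look like %DB_ROLE1_PASSWORD%: upper-case letters, digits, underscores.
TOKEN = re.compile(r"%([A-Z0-9_]+)%")


def render(template: str, secrets: Mapping[str, str]) -> str:
    """Replace every %NAME% token, failing loudly if a value is missing."""

    def substitute(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in secrets:
            raise KeyError(f"no value for placeholder %{name}%")
        return secrets[name]

    # A function replacement keeps backslashes in passwords from being
    # interpreted as regex escapes.
    return TOKEN.sub(substitute, template)


config = render("db.password = %DB_ROLE1_PASSWORD%\n",
                {"DB_ROLE1_PASSWORD": "example-only"})
```

Raising on an unknown token means a deploy stops instead of shipping a config file that still contains a literal placeholder.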

------
rcoh
This seems pretty reasonable. The obvious risk is that if the encryption keys
leak, all of your credentials may be retroactively compromised...

Still, much better than keeping the passwords in VC unprotected, of course.

My preference still is using environment variables so that the secure bits can
be fully decoupled from your code, however..

~~~
bradleybuda
Storing secrets in the environment is an _excellent_ idea. Heroku highly
encourages this approach, and it makes it much easier to give different levels
of access to code and secure data. It also makes it easier to avoid
accidentally copying secrets across environments (i.e. using a production API
key in staging or dev).

If you're not familiar with the practice, I'd encourage you to read the
"twelve-factor" section on configuration: <http://www.12factor.net/config> .
The advice applies even if you're not using Heroku for hosting.
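
(Editor's sketch: one way to read configuration from the environment in Python, roughly in the twelve-factor spirit. The variable names `DATABASE_URL` and `API_KEY` and the `Settings` class are made up for illustration.)

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    database_url: str
    api_key: str

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "Settings":
        # Indexing (rather than .get) raises KeyError at startup if a variable is unset.
        return cls(database_url=env["DATABASE_URL"], api_key=env["API_KEY"])


# Each environment supplies its own values; the code never changes.
staging = Settings.from_env({"DATABASE_URL": "postgres://localhost/staging",
                             "API_KEY": "staging-key"})
```

Passing the mapping in explicitly also makes it easy to test without touching the real process environment.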

~~~
voltagex_
It did lead to an interesting exploit though -
<http://titanous.com/posts/vulnerabilities-in-heroku-build-system>

~~~
nikcub
that vulnerability still would have applied if config directives were stored
in files

------
WestCoastJustin
I like this idea. I would add one tip though, use a group password safe,
rather than contacting person X. As a sysadmin we have passwords all over the
place (root, network, wifi, desktop, remote sites, etc, etc). There are five
people on our team, and we use Password Safe (windows) [1] and/or KeePassX
(linux/mac) [2] to manage lots of passwords. You do not have to contact person
X, for a password, if they are away for some reason, just check the safe.

[1] <http://passwordsafe.sourceforge.net/>

[2] <http://www.keepassx.org/>

~~~
jeresig
Great tip. This is what I've used in a number of organizations now, including
a version of this at Khan Academy. I've amended the blog post to mention this.

------
nisa
Lots of GitHub users don't agree:
<https://github.com/search?q=.netrc+password&type=Code&ref=searchresults>

------
moonboots
I recommend the scrypt command line utility [1] instead of openssl. Openssl
uses md5 as a key derivation function [2], and the cost of recovering a reasonable
length, randomly generated password is surprisingly low [3]. The costs in the
presentation are from 2009, and I can only imagine how they've dropped thanks
to a few years of bitcoin-driven gpu/hardware developments. If you trust your
code host, e.g. github or bitbucket, this isn't a concern, but neither are
plaintext passwords in version control. If you're using a very long, randomly
generated password, you're safe as well.

The disadvantage is that you'll need to compile scrypt from source.

[1] <http://www.tarsnap.com/scrypt.html>

[2] slide 20: <http://www.tarsnap.com/scrypt/scrypt-slides.pdf>

[3] slide 19: <http://www.tarsnap.com/scrypt/scrypt-slides.pdf>

~~~
helper
You seem to be confused. scrypt is a key derivation function. This blog is
suggesting you use openssl (using cast5-cbc cipher) to encrypt/decrypt text
that happens to contain passwords. The two actions (key derivation vs
encryption/decryption) are orthogonal.

Replacing an encryption algorithm with a key derivation function doesn't make
sense.

~~~
moonboots
The scrypt command line utility uses the scrypt kdf to generate a 256 bit key
for aes.

Both kdf and cipher are used during single file encryption with openssl and
the scrypt command line utility. Openssl implicitly uses md5 as a kdf during
encryption [1.1][1.2]. Cast5 requires a 128bit key, and the kdf helps stretch
the user's password to fit this key requirement.
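
(Editor's sketch: the two key-stretching steps side by side, using Python's hashlib. The first function mirrors the first block of EVP_BytesToKey with MD5 and a count of 1, which is what the comment describes; the scrypt parameters are illustrative assumptions, not the values the scrypt utility picks. `hashlib.scrypt` needs a Python built against OpenSSL 1.1 or newer.)

```python
import hashlib
import os


def openssl_style_key(password: bytes, salt: bytes) -> bytes:
    """A single MD5 pass over password + salt: 16 bytes, enough for a 128-bit key."""
    return hashlib.md5(password + salt).digest()


def scrypt_key(password: bytes, salt: bytes) -> bytes:
    """Derive a 256-bit key with scrypt; n, r, p are example work factors."""
    return hashlib.scrypt(password, salt=salt, n=2**14, r=8, p=1,
                          maxmem=64 * 1024 * 1024, dklen=32)


salt = os.urandom(8)
fast_key = openssl_style_key(b"correct horse", salt)
slow_key = scrypt_key(b"correct horse", salt)
```

Both functions turn a password into key bytes for a cipher. The difference is cost per guess: one MD5 call versus a deliberately memory-hard computation.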

I can understand the confusion, as scrypt is typically referenced in kdf
discussions. It's actually somewhat difficult to extract the kdf functionality
from the scrypt source code because the code is geared towards single file
encryption. See this post for a confused q&a with scrypt's author [2].
Wrappers around scrypt like this python package[3] have made the "mistake" of
using the entire encryption pipeline when they just wanted the kdf. Using
scrypt in this manner should still be safe, but it will waste some cpu cycles
on aes.

[1.1] <http://www.openssl.org/docs/apps/enc.html>

[1.2] <http://www.openssl.org/docs/crypto/EVP_BytesToKey.html>

[2] <https://news.ycombinator.com/item?id=1350392>

[3] <http://pypi.python.org/pypi/scrypt/>

~~~
helper
I see. In that case I take back everything I said.

------
_gm
I'm sorry but storing sensitive information (e.g. passwords) in SCM is a
terrible idea even if they are encrypted.

Why are you mixing deployment with development? They should be two different
things IMHO.

~~~
hamburglar
Wat. SCM doesn't necessarily mean development. Many, many people version
control their deployment configurations (e.g. /etc/puppet/master/*)

------
morganpyne
Since we manage our servers with Puppet, we use hiera-gpg to securely store
sensitive information in encrypted form in git. Puppet then safely deploys
these files to our servers and our application deployment process (Capistrano)
symlinks/copies these config files in to the application as part of the
deployment process. The sensitive config files themselves are excluded from
our application's git repository and developers keep local copies of these
files (containing local dev. credentials only) for development purposes.

More info on hiera-gpg here:
<http://www.craigdunn.org/2011/10/secret-variables-in-puppet-with-hiera-and-gpg/>

------
marklit
I find only keeping boilerplate configs in repos helps decouple application
code from any one installation. Software can sometimes be used in more than
one environment, it's not always a one-to-one relationship.

------
epynonymous
you're still exposing yourself by putting your settings and credentials, albeit
encrypted, out there. i don't like this approach at all, i'd prefer either
environment variables or a more ruby way of doing things like using a rake
command to convert an erb to yaml file. make sure you then encrypt or at least
obfuscate credentials in the config file (base64 or encryption), though
hackable if you can read ruby, but at least you're adding another layer of
indirection.

------
DeepDuh
> console.error("Did you forget to run `make decrypt_conf`?");

> console.error("You need to run `make decrypt_conf` to update it.");

Couldn't you just make the decrypt_conf target depend on the encrypted
configuration file and make the standard build command depend on the decrypted
file? This way it would get enforced with every 'make'.
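
(Editor's sketch: the check that make would perform for such a dependency, written out in Python for projects that don't build with make. A target is out of date when it is missing or older than its prerequisite. The function name and file paths are hypothetical.)

```python
import os


def needs_decrypt(encrypted_path: str, decrypted_path: str) -> bool:
    """True if the decrypted file is missing or older than the encrypted one."""
    if not os.path.exists(decrypted_path):
        return True
    return os.path.getmtime(encrypted_path) > os.path.getmtime(decrypted_path)
```

A start-up script could call this and run the decrypt step itself instead of printing a reminder.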

In case you don't use Makefiles for building at all (because you only use
script languages for example) I don't get why you use a Makefile instead of
just two shell scripts.

------
chimeracoder
I prefer using environment variables, and then enforcing this in the Makefile:
<https://gist.github.com/ChimeraCoder/4728823>

If this is the first target (or a prerequisite of the first target), then
running 'make' will ensure that those variables are set to non-empty strings.
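
(Editor's sketch: the same "set and non-empty" guard expressed in Python, for projects without a Makefile. This does not reproduce the linked gist; `missing_vars` and the variable names are assumptions.)

```python
import os
from typing import Iterable, List, Mapping


def missing_vars(names: Iterable[str], env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names that are unset or set to an empty string."""
    return [name for name in names if not env.get(name)]


# Example with an explicit mapping instead of the real environment.
problems = missing_vars(["DB_PASS", "API_KEY"], {"DB_PASS": "example-only", "API_KEY": ""})
```

A start-up script can print the returned names and exit non-zero when the list is not empty.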

------
jimktrains2
<https://github.com/jimktrains/polygonus> I wrote this to allow us to encrypt
and search passwords. The encrypted file may be kept under version control,
though I don't know what types of attacks that could aid.

------
epynonymous
and i would also like to comment that you should be very careful not to commit
code in source control that hardcodes credentials because there's a history
that could be exploited.

------
camus
the fix for this problem is easy: use freaking environment variables! there
is not one language/framework whatever that doesn't support them, so for
example in node:

    var db_password = process.env.DB_PASS;

You don't have to keep any password-sensitive file whatsoever inside an
open source project. that is what environment variables are fucking made for!

