
Where is the right place to store DB passwords, API keys, etc.? What is best practice in this area?

I use AWS for a number of applications, so I've started doing the following:

1. Create a JSON file containing encrypted secrets (DB pass, etc.)

2. Upload the file to a secure S3 bucket with fine-tuned permissions and server-side encryption

3. For the instance that is launching, include a permission in its IAM role that allows "s3:GetObject" on the specific JSON file you uploaded.

4. Deliver decryption keys to the app in some manner (Chef, Ansible, etc.)

5. When the app starts, it downloads the JSON file, decrypts the secrets, and loads them as environment variables.
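The five steps above can be sketched in Python roughly as follows. This is a minimal sketch, not the blog post's NPM module: it assumes boto3 is installed for the download, and it leaves decryption as a caller-supplied function because the steps don't pin down a cipher. The bucket, key, and `my_decrypt` names in the usage line are hypothetical.

```python
import json
import os
from typing import Callable, Dict, MutableMapping


def fetch_from_s3(bucket: str, key: str) -> bytes:
    """Download the secrets file using the instance's IAM role (needs s3:GetObject)."""
    import boto3  # imported here so the rest of the sketch runs without AWS libraries

    response = boto3.client("s3").get_object(Bucket=bucket, Key=key)
    return response["Body"].read()


def load_secrets(
    raw: bytes,
    decrypt: Callable[[str], str],
    environ: MutableMapping[str, str] = os.environ,
) -> Dict[str, str]:
    """Parse the JSON file, decrypt each value, and export it as an environment variable."""
    encrypted: Dict[str, str] = json.loads(raw)
    secrets = {name: decrypt(value) for name, value in encrypted.items()}
    environ.update(secrets)
    return secrets
```

At startup you'd call something like `load_secrets(fetch_from_s3("my-secrets-bucket", "myapp/secrets.json"), decrypt=my_decrypt)`, where `my_decrypt` uses whatever key was delivered in step 4.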

I wrote a blog post[1] about this with more detailed info and an NPM module if you use Node.js.

[1] http://blog.matthewdfuller.com/2015/01/using-iam-roles-and-s...

This system is truly beautiful. I think it's one of the best suggestions in the thread (definitely the best self-rolled solution that doesn't use other tools).

I have a question about redundancy, i.e. "What happens if your gatekeeper EC2 instance goes down?" If you have multiple gatekeepers, could they be set up this way:

- let's say you have five different web apps using a gatekeeper to hold their secrets

- let's say you have n gatekeepers (let's say 3) and each of the apps knows the address of all three gatekeepers.

- If the primary gatekeeper is unreachable, all five apps would try to contact the secondary gatekeeper, but the secondary would only ever respond if it also found the primary unreachable.

It's like a sleeper cell - at any given moment you have multiple replacement gatekeepers ready and waiting to serve, but each of them is unable to respond unless the one above it in the list stops responding. In this way you could lose gatekeepers (even permanently) and build a little bit of resilience into the apps depending on it while you're able to sort out what happened and restore normal behaviour.
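The proposed scheme could be sketched like this. It's a hypothetical illustration of the idea, not something from the thread: `probe(source, target)` stands in for a real health check with a timeout, and the gatekeeper and app names are made up. What it shows is that a standby refuses to answer while it can still see a higher-ranked gatekeeper, so a client that merely has trouble reaching the primary doesn't end up with two gatekeepers serving at once.

```python
from typing import Callable, List, Optional

# probe(source, target) -> True if `target` answers a health check sent from `source`.
# Hypothetical stand-in for a real network check with a timeout.
Probe = Callable[[str, str], bool]


def gatekeeper_will_serve(me: str, ranking: List[str], probe: Probe) -> bool:
    """A standby answers only if every higher-ranked gatekeeper is unreachable from it."""
    higher_ranked = ranking[: ranking.index(me)]
    return all(not probe(me, peer) for peer in higher_ranked)


def choose_gatekeeper(client: str, ranking: List[str], probe: Probe) -> Optional[str]:
    """Walk the ranking in order and use the first gatekeeper the client can reach
    that also agrees to serve. Returns None if nobody will answer."""
    for candidate in ranking:
        if probe(client, candidate) and gatekeeper_will_serve(candidate, ranking, probe):
            return candidate
    return None
```

With `ranking = ["gk1", "gk2", "gk3"]`: if everything is up the client gets `gk1`; if `gk1` is down for everyone it gets `gk2`; but if only the client's path to `gk1` is broken, every standby still sees `gk1` and refuses, so the client gets `None`. That last case is the trade-off of the design.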

Is this a good idea?

I'm confused by your reference to "gatekeeper EC2 instances." In the scenario I described, the secrets are housed on S3, not a separate EC2 instance. So, theoretically, as long as the underlying instance running the application code can access S3, and S3 doesn't go down (very unlikely), there shouldn't be any issues.

Yeah, we take this same approach. I wrote an Ansible module that does this.

We (Shopify) use https://github.com/Shopify/ejson -- we store encrypted secrets in the repository, relying on the production server to have the decryption key.

It's relatively common to provision secrets with configuration management software like Chef/puppet/ansible/etc using, e.g. Chef's encrypted data bags.

Another slightly heavier-weight solution with some nice properties is to use a credential broker such as Vault: https://www.vaultproject.io/
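To give a feel for the Vault approach, here's a minimal sketch that reads a secret over Vault's HTTP API using only the Python standard library. It assumes the KV version 2 secrets engine mounted at `secret/` and a token already available to the app; the address and secret path are hypothetical, and in practice you'd more likely use a client library and a proper auth method than a raw token.

```python
import json
import urllib.request
from typing import Dict


def build_kv2_request(vault_addr: str, token: str, mount: str, path: str) -> urllib.request.Request:
    """KV v2 reads go to /v1/<mount>/data/<path>, authenticated with X-Vault-Token."""
    url = f"{vault_addr.rstrip('/')}/v1/{mount}/data/{path}"
    return urllib.request.Request(url, headers={"X-Vault-Token": token})


def parse_kv2_response(body: bytes) -> Dict[str, str]:
    """KV v2 nests the secret's key/value pairs under data.data (metadata sits alongside)."""
    return json.loads(body)["data"]["data"]


def read_secret(vault_addr: str, token: str, mount: str, path: str, timeout: float = 5.0) -> Dict[str, str]:
    request = build_kv2_request(vault_addr, token, mount, path)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_kv2_response(response.read())
```

For example, `read_secret("https://vault.example.com:8200", token, "secret", "myapp/database")["password"]`.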

For Ansible, the built-in solution is: http://docs.ansible.com/ansible/playbooks_vault.html

Just wanted to +1 the suggestion for Vault - I've found it to be a really nice balance between usability and security.

Environment variables are the best and easiest way that I know of. You can supply them any way you want, and any programming language can easily read their values.
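Reading a secret from the environment is a one-liner in most languages; a small wrapper that fails fast with a clear message when the variable is missing is a common refinement. A sketch in Python, where the variable name is just an example:

```python
import os
from typing import Mapping


class MissingSecretError(RuntimeError):
    """Raised at startup so a missing secret fails loudly instead of mid-request."""


def require_env(name: str, environ: Mapping[str, str] = os.environ) -> str:
    value = environ.get(name)
    if not value:  # treat an empty string the same as unset
        raise MissingSecretError(f"required environment variable {name} is not set")
    return value
```

Calling `db_password = require_env("DB_PASSWORD")` once during startup keeps the failure close to the cause.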

Glad I don't work at your shop then. Environment variables are a terrible way to give your app secure information. There are well over a dozen reasons why you shouldn't do this in your apps, but one super obvious one is that there are way too many frameworks that expose environment variables in their debug output if not properly configured. Think you'll never misconfigure a server? Guess again: pretty much every major site (Google, FB, Twitter, Yahoo, eBay, Microsoft, etc.) has done it at some point.

Please review HN's guidelines on civility.

Fair point, I potentially should've left off the first sentence. I stand behind the rest of the post, but the first sentence is a bit on the edge and I apologize.

Alright, well, I've never seen an application/framework spit out environment variables when it was misconfigured. But then again, I barely work with web-related stuff so maybe I just don't use the kind of software that does this. Could you provide some examples?

Many web frameworks do this when in "debug" mode.

Your comment sounded more colloquial than uncivil to me, but thanks for responding so respectfully.


I'm glad you made a new account named 'shutupbitch' just to tell me this. Thank you for your contribution.

Please don't feed trolls.

The "dump environment" problem is an issue for novice developers, but mature shops should have security-conscious frameworks for secrets handling that do things like clear the variable from the environment at initialization time.
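Clearing secrets out of the environment at initialization could look roughly like this in Python. It's a sketch, not any particular framework's mechanism: once popped from `os.environ`, the values are no longer inherited by child processes or visible in a debug dump of `os.environ`. One caveat: on Linux, `/proc/<pid>/environ` reflects the environment the process was started with, so this may not hide the values from someone who can read that file.

```python
import os
from typing import Dict, Iterable, MutableMapping


def take_secrets_from_env(
    names: Iterable[str],
    environ: MutableMapping[str, str] = os.environ,
) -> Dict[str, str]:
    """Move secrets out of the environment into a private dict at startup."""
    secrets: Dict[str, str] = {}
    for name in names:
        if name not in environ:
            raise KeyError(f"expected secret {name} in the environment")
        # pop() on os.environ also calls unsetenv(), so children won't inherit it
        secrets[name] = environ.pop(name)
    return secrets
```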

What are your other 11 objections?

I'm surprised this isn't higher up. Are there any arguments against ENV variables in favor of something else?

Here's my caution about this. If low-privileged processes can run "ps aux" and they see something like:

    DB_USER=scott DB_PASSWORD=b3withm3pl3aze /usr/bin/python webapp.py

That could be troublesome if an attacker figured out a way to run remote commands on your server even as an unprivileged user.

When I test that, the command line does not include the variable-setting. Is this a problem that depends on version or are you mistaken?

It would be kind of weird to include that considering how argv works.

Reading the environment of another process is a privileged operation.

If an attacker can get the process that's running webapp.py to exec some arbitrary bash command, that process has the ability to read its own /proc/$PID/environ. In general, you can read /proc/$PID/environ on processes that you own. At least I can do that on my Debian system:

    pikachu@POKEMONGYM ~ $ sleep 99 &
    [1] 21340
    pikachu@POKEMONGYM ~ $ cat /proc/21340/environ
    XDG_SESSION_ID=5COMP_WORDBREAKS= "'><;|&(:TERM=screenSHELL=/bin/bashXDG_SESSION_COOKIE=8571b679eed8952dd96ad28a54...<etc>

(I actually gave the wrong example in my previous comment. While it is true that giving the ENV on cmdline will show up in ps eaux, the more appropriate example is what I just explained in this comment.)
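The reason the cat output above runs together is that /proc/<pid>/environ separates entries with NUL bytes, which the terminal doesn't display. A small Linux-only Python sketch that splits them back out:

```python
from pathlib import Path
from typing import Dict, Union


def parse_environ(raw: bytes) -> Dict[str, str]:
    """Split a NUL-separated KEY=VALUE block into a dict."""
    env: Dict[str, str] = {}
    for entry in raw.split(b"\0"):
        if not entry:
            continue  # trailing NUL produces an empty final chunk
        key, sep, value = entry.partition(b"=")  # values may themselves contain '='
        if sep:
            env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def read_process_environ(pid: Union[int, str]) -> Dict[str, str]:
    # Only works for processes you own (or as root), as described above.
    return parse_environ(Path(f"/proc/{pid}/environ").read_bytes())
```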

If you can get it to exec some arbitrary bash command (or otherwise access the environ of a process) you can also have it cat any file on the server, and even the memory of the running processes that belong to the same user as the exploited process, and also execute network requests. So if you get that far, pretty much nothing will protect you.

Sure, but there are some shops that do their security from a point-of-view of "Attacker can run commands on your server as the user that started whatever-public-service/webapp/api", and go from there. I happen to think that's the best way to think about it.

Now, if an attacker manages to get root access then it's game over[1]. That just shouldn't happen. But nobody should be running their webserver as root. So whatever that user is should be low-powered, with only enough privileges to start the webserver and bind port 8080 (and use iptables or whatever to redirect connections from port 80 to 8080), and the whole setup should be designed so that this account won't be able to escalate things further if someone got a bash shell on it.


1. You should at least have some way of detecting that it happened and consider all data & files compromised and just wipe the whole machine & start over. Or take that machine offline for investigation into what happened and put a fresh new one in its place.

If an attacker can run an arbitrary command on your server, it's already time to rotate all the credentials in your system and let any data subjects whose data you hold know that you fucked up, big time. That's just the Linux model.

The example above is someone who has stupidly started a process with the environment variables exposed on the command line.

Ok, but that's not a problem caused by the exposure of the environment, it's caused by the exposure of the command line.

I agree; I was just explaining the issue the above commenter raised. It just means you should use a saner way of initializing your environment with sensitive values.

ENV variables are inherited by child processes by default, so please use care when using this approach.
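One way to take that care in Python is to build an explicit environment for any child process with the secret variables removed. A sketch; the secret names are hypothetical, so list whatever your app actually uses:

```python
import os
import subprocess
from typing import Dict, Iterable, List, Mapping

# Hypothetical names; list whatever secrets your app actually receives.
SECRET_NAMES = ("DB_PASSWORD", "API_KEY")


def scrubbed_env(secret_names: Iterable[str], base: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Copy the environment minus the secrets, so children don't inherit them."""
    blocked = set(secret_names)
    return {key: value for key, value in base.items() if key not in blocked}


def run_without_secrets(cmd: List[str], secret_names: Iterable[str] = SECRET_NAMES) -> subprocess.CompletedProcess:
    # Passing env= replaces the inherited environment entirely.
    return subprocess.run(cmd, env=scrubbed_env(secret_names), capture_output=True, text=True, check=True)
```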

My preferred solution currently is to use encrypted strings in config files that are not stored in VCS. The host machine encrypts and decrypts them using host-specific keys, so if the file is copied off the server, it is not fully compromised immediately. This is usually done via a Python script that rewrites the file. (BTW, this is pretty easy to do on Windows boxes with the MS API.) I've considered using encrypted folders on Windows in addition, but I'm not sure that really makes a difference.

Usually the base config is in VCS but without user/password/db strings. We then manually configure the file with the encrypted strings on the server (usually with the machine name in the filename, so that we can use the hostname in code to find it, and it makes clear that the file is machine-specific). Not all tools make this easy, though, and it only works if you can add your own code in between. I also prefer files to environment variables, as in my opinion files are easier to lock down and it's more obvious what is going on.
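Finding the machine-specific file by hostname might look like the following. The `app.<hostname>.config` naming pattern and the directory are hypothetical, and the decryption step (the MS API on Windows, in this case) is left out, so the returned text still contains the encrypted strings.

```python
import socket
from pathlib import Path
from typing import Optional


def machine_config_path(config_dir: str, hostname: Optional[str] = None) -> Path:
    """Return the path of this host's config file, e.g. app.web01.config."""
    host = hostname or socket.gethostname()
    return Path(config_dir) / f"app.{host}.config"


def read_machine_config(config_dir: str, hostname: Optional[str] = None) -> str:
    path = machine_config_path(config_dir, hostname)
    if not path.is_file():
        # Name the expected file so it's obvious which machine is unconfigured
        raise FileNotFoundError(f"no machine-specific config at {path}")
    return path.read_text()
```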

I like some of the other solutions that use encrypted strings with a keystore server, and may consider them in the future if they support both Windows and Linux.

Stack Exchange's blackbox [1] is one solution. I haven't played with it personally and I'd love to hear other people's take on what's worked for them.

[1] https://github.com/StackExchange/blackbox

FWIW I have a /private directory in the root of all vhosts, so it looks like:

Anything stored in /private/ is not publicly accessible by the web server process, but can be read or written by anything running under the user's username.

It's specifically for storing things like configuration files.

I think this should be standard practice.
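A deploy-time sanity check for this layout could look like the following: it refuses a secrets file that sits inside the web root or that other users can read or write. This is a sketch for POSIX permissions with hypothetical paths, and it doesn't replace configuring the web server itself to serve only the public directory.

```python
import stat
from pathlib import Path


def check_private_file(path: str, web_root: str) -> None:
    """Raise PermissionError if a secrets file is web-exposed or accessible to others."""
    file_path = Path(path).resolve()
    root = Path(web_root).resolve()
    if file_path == root or root in file_path.parents:
        raise PermissionError(f"{file_path} is inside the web root {root}")
    mode = file_path.stat().st_mode
    if mode & (stat.S_IROTH | stat.S_IWOTH):
        raise PermissionError(f"{file_path} is accessible to other users ({stat.filemode(mode)})")
```

For example, `check_private_file("/srv/www/domain.com/private/config.ini", "/srv/www/domain.com/public_html")`.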

Thanks for the tip. How do you keep passwords and keys in sync amongst team members safely?

I only just recently had to figure that out. I opted for setting up a .kdb KeePass file in a private git repo and giving everyone ("everyone" = myself + one other) access to that. I'm pretty sure that's not a very good solution.

Why do you store such stuff under /public_html anyway? One level higher would be more appropriate I think.

It's not under public_html. It's under /srv/www/domain.com. It is one level higher.

Oh, that makes more sense. Sorry I got confused by the ascii tree.

http://12factor.net provides good high-level guidance for this. In reality, all config should be separated from code.

There's a variety of mechanisms for loading this into your environment.

You can use a datastore like HashiCorp's Vault: https://vaultproject.io

Windows has something similar in DPAPI - https://msdn.microsoft.com/en-us/library/ms995355.aspx

Config files that are not version controlled, or environment variables. I prefer config files because it's easier for me to communicate to other team members what needs to be present in their local development environment.

I typically handle this by versioning a `config.example` file, which includes all the necessary config keys an application expects. The example file defaults these attrs to various strings meant to show they are examples only. I include instructions to copy the `config.example` to a `config.yml` (or some other appropriate extension) and replace the values as necessary. The `config.yml` file is specifically excluded in the `.gitignore` file. The application will only load the `config.yml` file when started, so I also make sure it raises a descriptive error informing team members when they are missing a local `config.yml`.

This allows the `config.example` to also serve as a self-documenting config for the application, as comments can be included that identify and explain each of the config keys and their purposes.
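A sketch of that startup check in Python. To stay within the standard library it uses INI files via `configparser` (which, like YAML, allows comments in the example file) instead of YAML, so the file names are adapted from the comment; checking that every example key is present goes a step beyond what the comment describes.

```python
import configparser
from pathlib import Path


class ConfigError(RuntimeError):
    """Raised with a message telling the developer how to fix their local setup."""


def load_config(config_path: str = "config.ini", example_path: str = "config.example.ini") -> configparser.ConfigParser:
    if not Path(config_path).is_file():
        raise ConfigError(
            f"{config_path} not found: copy {example_path} to {config_path} and replace the example values"
        )
    example = configparser.ConfigParser()
    example.read(example_path)
    config = configparser.ConfigParser()
    config.read(config_path)
    # Every key documented in the example file must exist in the real config.
    missing = [
        f"{section}.{key}"
        for section in example.sections()
        for key in example[section]
        if not config.has_option(section, key)
    ]
    if missing:
        raise ConfigError(f"{config_path} is missing keys: {', '.join(missing)}")
    return config
```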

Keywhiz is a good solution. See here for some background info: https://square.github.io/keywhiz/

Disclaimer: I worked on it.

I store dummy values in VC, then edit in the real data on the production server. (And I obviously never check anything in from production; if you can, set the production VC user to read-only.) This has a nice side effect: if I edit the configuration file, the new stuff gets merged in without causing a mess.

Another way is a second file that overrides settings as needed. Although I have found that to be less maintainable if the configuration file changes. That file should be somewhere entirely out of the VC tree.

Either way, the file must be placed in a directory that is not served by the web server.
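The second approach (a base file in version control plus an override file kept outside the VC tree) maps neatly onto Python's `configparser`, where values from files read later replace earlier ones. A sketch with hypothetical paths:

```python
import configparser
import os


def load_layered(base_path: str, override_path: str) -> configparser.ConfigParser:
    parser = configparser.ConfigParser()
    # read() silently skips files that don't exist and returns the ones it parsed;
    # later files win when the same section/key appears in both.
    parsed = parser.read([base_path, override_path])
    if os.fspath(base_path) not in parsed:
        raise FileNotFoundError(f"base config {base_path} could not be read")
    return parser
```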




are traditional. Only /public is exposed by the web server.

For me, I have the connection details for a configuration database in environment variables... the config library then connects to the configuration server with those credentials and gets everything the application needs to connect to other services... I'd considered using etcd for this, but it was unstable for me at the time... I keep settings cached for 5 minutes, then the library re-fetches them, in case they've changed.
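The five-minute refresh described here is essentially a time-to-live cache. A minimal Python sketch, where `fetch` is a hypothetical stand-in for whatever call loads settings from the configuration server:

```python
import time
from typing import Callable, Dict, Optional


class CachedSettings:
    """Serve settings from memory, re-fetching once they are older than the TTL."""

    def __init__(
        self,
        fetch: Callable[[], Dict[str, str]],
        ttl_seconds: float = 300.0,  # five minutes, as in the comment
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._settings: Optional[Dict[str, str]] = None
        self._fetched_at = 0.0

    def get(self, key: str) -> str:
        now = self._clock()
        if self._settings is None or now - self._fetched_at >= self._ttl:
            self._settings = self._fetch()
            self._fetched_at = now
        return self._settings[key]
```

Injecting `clock` makes the expiry testable without waiting five minutes; `time.monotonic` avoids surprises if the wall clock jumps.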

I think that if you're using git,

    use git crypt[1][2]
    use git-dir and work-tree options/env vars

[1] https://www.agwa.name/projects/git-crypt/

[2] though you have to remember to git crypt lock

In a configuration file that is not version controlled, or even in environment variables, so that your application starts with the right values without them sitting in any config file at all.

How do you communicate that data amongst team members then?

As I detailed in my other response to your original question, use an example config file that is version controlled. It includes all the necessary config keys, but example-only values. All team members would then be able to easily create a local config file based on the example that works. You can even document the config with comments in the example file so devs know what is needed and what it's for.

I think at some point, if you have a shared password for a development DB, production DB, etc., then just keeping those in a pen-and-paper notebook is your best solution. Usually, for shared environments like that (although I hope the team can set up their own DBs for development!), the number of shared "secrets" is relatively small. Some secrets are best not stored electronically, especially if they can give away user data.

You can also try maintaining a separate repository of passwords and pulling it onto the server during deployment.
