Keywhiz: Square's system for distributing and managing secrets (square.github.io)
232 points by bascule on Apr 21, 2015 | 31 comments

Coincidentally Coda Hale's sneaker (https://github.com/codahale/sneaker) just popped up in my Twitter feed earlier. Sneaker stores the secrets on S3. Keywhiz stores the secrets in a central database and then ephemerally on the client servers. I guess if you started with something like sneaker on AWS using Amazon's KMS you could then move to Keywhiz if you eventually moved out of AWS.

I appreciate that you posted this. My first thought when seeing this was "Why all the fancywork when we can just use S3?"

Credstash uses KMS to encrypt secrets and then puts them in DynamoDB. This means that access to Credstash-managed keys is ultimately mediated by access rules around who has access to the KMS key used to encrypt the secret.

With an S3-based solution, either Amazon can manage the keys for you and you use IAM Roles to mediate access (which I find to be a cleaner solution, personally, though open to other perspectives), or you can use KMS keys in S3 and then we're back to access being mediated by KMS keys, plus IAM roles, but without a DynamoDB table to manage or pay for (i.e. S3 will be super cheap b/c secrets occupy so little space).
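For the curious, the KMS pattern both approaches lean on is envelope encryption, which you can sketch locally with nothing but openssl. All names and values below are illustrative, not credstash's actual storage format; the "master key" is just a file here, whereas in AWS it never leaves KMS:

```shell
# Local sketch of envelope encryption, the pattern KMS implements.
work=$(mktemp -d)
openssl rand -hex 32 > "$work/master.key"   # stand-in for the KMS master key

# 1. Generate a fresh data key for this one secret.
openssl rand -hex 32 > "$work/data.key"

# 2. Encrypt the secret with the data key.
printf 'db-password-hunter2' > "$work/secret.txt"
openssl enc -aes-256-cbc -pbkdf2 -pass "file:$work/data.key" \
  -in "$work/secret.txt" -out "$work/secret.enc"

# 3. Wrap the data key with the master key; only the wrapped key and the
#    ciphertext get stored (in DynamoDB, in credstash's case).
openssl enc -aes-256-cbc -pbkdf2 -pass "file:$work/master.key" \
  -in "$work/data.key" -out "$work/data.key.enc"
rm "$work/data.key" "$work/secret.txt"

# Reading a secret: unwrap the data key, then decrypt the ciphertext.
openssl enc -d -aes-256-cbc -pbkdf2 -pass "file:$work/master.key" \
  -in "$work/data.key.enc" -out "$work/data.key"
decrypted=$(openssl enc -d -aes-256-cbc -pbkdf2 -pass "file:$work/data.key" \
  -in "$work/secret.enc")
echo "$decrypted"   # db-password-hunter2
```

The nice property is that revoking access to the master key revokes access to every secret wrapped under it, without re-encrypting the secrets themselves.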

I still appreciated the OP's intro to KMS, which I found insightful.

You'd probably still want to use Sneaker until you were massive, even if you moved out of AWS. S3 provides tremendous value for its cost (3 cents/GB/month for storage; requests are cheap as well) compared to EC2.

Though the sneaker page makes it very clear that it's not ready for production use.

I don't think "Keywhiz should be considered alpha at this point" really screams production ready, either. For me, the Sneaker README's detailed enumeration of which threat models had been thought over really helped inspire confidence, as did the acknowledgement that no professional cryptographers had evaluated its soundness (most people just ignore this idea and rampage onwards unencumbered by reality). I believe that both are probably better than storing plaintext keys and passwords at rest in Git or on developer machines.

Wow this looks great! If the whole online payment system thing doesn't work out for square the engineers there should fire the entire management chain and pivot into a developer services company, because the projects and libraries they publish are just stellar (particularly the Android stuff).

Maybe I'm missing something, but how do you get the client certificate private key on the e.g. new webserver which allows it to connect and retrieve its server certificate private key? Isn't that the same key distribution problem, just one step removed?

Each client would have its own client key in this case, compartmentalizing access. The loss of a single client key would not expose your entire infrastructure.

I had been planning to do this with keys stored in S3 and IAM roles to retrieve the keys at instance boot time, stored in a ramdisk, but this saves all of that trouble.

Instance profile credentials are accessible to all users on the instance for its entire life. Until AWS provides a way to lock down portions of instance metadata and/or delete them early in the boot process, it's much safer to bake your bootstrap key directly into the AMI, visible only to root (or a specific user).
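A sketch of what that bake step looks like, assuming an install(1)-style provisioner; the paths are illustrative and made temporary here so the snippet stands alone:

```shell
# Sketch: bake a bootstrap key into an image with owner-only permissions,
# roughly what a Packer provisioner step would run.
img_root=$(mktemp -d)                 # stand-in for the image's filesystem root
mkdir -p "$img_root/etc/bootstrap"
key_src=$(mktemp)
openssl rand -hex 32 > "$key_src"     # stand-in for the key fetched at bake time

# install(1) copies and sets the mode in one step; add -o root in a real bake.
install -m 0400 "$key_src" "$img_root/etc/bootstrap/bootstrap.key"
rm "$key_src"

stat -c '%a' "$img_root/etc/bootstrap/bootstrap.key"   # prints 400
```

Unlike instance metadata, a 0400 root-owned file is at least gated by the OS's own permission model.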

Good point. I've been using IAM roles to bake the AMIs with Packer, where the key is fetched during the bake process; perhaps I'll investigate sneaker if there's time tonight.

Thanks for rubber duckying this with me!

Yeah, perhaps as an additional block device you can mount with whatever permissions you want.

right on, like the FUSE in question.. but securely mounting something remotely means storing a bootstrap key somewhere anyway. Turtles all the way down :)

Fundamentally, you don't. Or I've never seen an actual solution in practice. Physical hardware comes down to trusting the layer one/two domain and (usually) the MAC address. With virts, you trust your infrastructure to pass through some blob like a pre-shared key. Which again is just punting on the actual problem. In either case you might inject manual authn/authz by requiring a human in the process.

Hypothetically, the problem of bootstrapping trust is solved by the TPM. However, that's predicated on functional hardware attestation, and thus the hypothetical bit.

I saw a reference to that PKI package in the linked article, but it doesn't really answer the question.

A newly brought-up server can generate its own private key and issue a certificate signing request to a certificate authority, but how does the CA server authenticate the request? You still need a shared secret (perhaps implicitly, through the username/password of an administrator doing things manually).
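The mechanics of that flow are easy to sketch with openssl (names illustrative); note that nothing in it authenticates the requester, which is exactly the gap in question:

```shell
ca=$(mktemp -d)
# One-time: the CA's own key and self-signed certificate.
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -subj '/CN=internal-ca' \
  -keyout "$ca/ca.key" -out "$ca/ca.crt" 2>/dev/null

# New server: generate a private key and a signing request locally.
openssl req -newkey rsa:2048 -nodes \
  -subj '/CN=web01.internal' \
  -keyout "$ca/web01.key" -out "$ca/web01.csr" 2>/dev/null

# CA: sign the request. This is the step that needs authentication --
# the CSR proves possession of the key, not that web01 is who it claims.
openssl x509 -req -in "$ca/web01.csr" -days 1 \
  -CA "$ca/ca.crt" -CAkey "$ca/ca.key" -CAcreateserial \
  -out "$ca/web01.crt" 2>/dev/null

openssl verify -CAfile "$ca/ca.crt" "$ca/web01.crt"
```

The private key never leaves the new server; only the signing decision needs an out-of-band trust anchor.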

Any way you cut it, I don't see how you avoid this problem outlined in the article: "Common practices include putting secrets in config files next to code or copying files to servers out-of-band. The former is likely to be leaked and the latter difficult to track."

I do take the point made by some of the sibling comments that this allows for better mitigation of the damage in the case of the loss of the bootstrap shared secret, and I can certainly see the value in that.

Perhaps similar to how newer versions of Puppet let you provide custom attributes (like a pre-shared key) as part of the request, so you can do verification on your issuer and build the secret into your bootstrap process. A key compromise would simply require a rotation on your issuer and in your bootstrap process.

You can solve this by having a bootstrapping process that issues the appropriate credentials when bringing up a new server.

And how do you trust the identity of the new server/instance during bootstrapping?

You could leverage the TPM and some version of remote attestation and only permit key-requests from attested machines. Alternatively (or concurrently), you could PXE boot all devices with a parameterized shared-secret individualized for each node.
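One way to use that per-node shared secret without ever sending it raw: have the node prove possession of it with an HMAC over its identity. A sketch with openssl (values illustrative; a real protocol would also include a server-supplied nonce to prevent replay):

```shell
# Sketch: a node proves knowledge of its individualized pre-shared key
# by sending HMAC(psk, node_id) instead of the key itself.
psk='per-node-secret-from-pxe-params'   # illustrative value
node_id='web01.internal'

# Node side: compute the proof over its identity.
proof=$(printf '%s' "$node_id" | openssl dgst -sha256 -hmac "$psk" | awk '{print $NF}')

# Server side: recompute from its own copy of the PSK and compare.
expected=$(printf '%s' "$node_id" | openssl dgst -sha256 -hmac "$psk" | awk '{print $NF}')
[ "$proof" = "$expected" ] && echo 'node authenticated'
```

This still punts on how the PSK gets onto the node in the first place, of course; it just keeps the PSK itself off the wire.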

I'm getting the feeling they don't like something about etcd or raft or both. Is that accurate? I noticed certstrap was adapted from etcd-ca.

It certainly requires some bootstrapping, but it seems like once you've got the certs set up, keywhiz should be able to manage updates to those and all your other keys. So you do the annoying work once, with what are essentially ephemeral secrets, and then you get to use keywhiz for all your management thereafter.

I get the impression that you're going to need a certificate creation process to go with it, probably with an in-house authority, to allow the components to talk to each other in the first place.

> KeywhizFs is a FUSE-based file system, providing secrets as if they are files in a directory. Transparently, secrets are retrieved from a Keywhiz Server using mTLS with a client certificate.

I hope this doesn't bring their entire infrastructure tumbling down when a network problem causes processes to block while reading from the mount point.

There's an in-memory cache that keeps the secrets locally even when the server disappears either due to downtime or network issues.

Also coincidentally, this was just released today -- Credstash: a utility for managing secrets using AWS KMS and DynamoDB. Written in Python. https://github.com/LuminalOSS/credstash

Oh thank goodness! I was about to have to implement something like this using S3 signed requests, ramdisks, etc. I owe somebody a beer! (or coffee)

I want this in a smartwatch.

mm it says "java" :(

People downvote this, but there are a couple of good reasons it shouldn't be in Java.

This doesn't need to be a complicated service. You could write an alternative to Keywhiz in bash in a weekend. This is a tremendous win over Java because 1) it's interpreted, so modification is trivial and fast, 2) it's a commonly known language by nearly everyone on *NIX platforms, 3) it's simple to troubleshoot, 4) highly extensible and 5) fast to develop with.

In terms of how to do this securely, you could of course still write it in bash and use FUSE to distribute your secrets, but why? Personally, I'm more of a fan of push services when it comes to opening up what is literally all the keys to your kingdom. Design a host-and-user-and-service-specific trigger that can request the key server push the proper credentials to the machine, and you avoid opening up your server to attacks on open services.

If you want to keep your secrets from getting on disk (and honestly, this isn't a concern for most servers, as 99% of them are on all the time and will more readily leak secrets from memory than from disk), just push the secret to a tmpfs mount. The only annoyance is that when your machine reboots, each service's init script needs to request the secret be pushed again, but I can't imagine that takes any more time than doing the same thing through Keywhiz.
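A minimal sketch of the tmpfs landing spot, assuming a Linux box where /dev/shm is tmpfs (a dedicated mount like /run/secrets would be tidier in practice; paths and values illustrative):

```shell
# Sketch: land a pushed secret on tmpfs so it never touches disk.
# Fall back to the default temp dir on systems without /dev/shm.
secret_dir=$(mktemp -d /dev/shm/secrets.XXXXXX 2>/dev/null || mktemp -d)

umask 077                                  # new files created 0600
printf 'db-password' > "$secret_dir/db_password"
chmod 0400 "$secret_dir/db_password"       # read-only for the owning user

cat "$secret_dir/db_password"              # db-password
```

Since tmpfs is backed by RAM (and swap, if you have it -- worth disabling or encrypting), the secret vanishes on reboot, which is exactly the re-push behavior described above.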

I know the draw of writing everything imaginable in your favorite language. I used to write all my tools in Perl, but it turns out that's annoying for the people who have to admin my services later and don't write Perl. For really simple admin tools like this, Bash is really the best choice. (Though I'll admit, some web app language more secure than Bash to handle the client->server requests would be handy.)

The downvotes are likely the result of not adding anything to the conversation.

More specifically to your point, I strongly disagree. People should build systems in their favorite language if the language fits the use case. And usually, the language chosen isn't as important as being familiar enough with the language that your solution is clear and well written.

However, contradicting myself for a second, bash is probably not a great choice for building complex systems. The problem with bash is that it's easy to write something that works, but much harder to change it and keep it working. Bash is very powerful for one-liners, but the second your program grows, you start to miss all the nice things that most general-purpose languages have. Another problem I've run into with bash is that not all *nix systems actually have bash, and even those that do have many different versions, which makes it hard to write a large code base that works without issue across different systems. Even more so when people use non-bash binaries: then you have to write a script to install the dependent binaries and make it work for different package managers, and the complications go on and on.
