Confidant: an open-source secret management service (lyft.com)
245 points by woodrow on Nov 4, 2015 | 67 comments

This is the first time I've seen a nicely documented requirements.txt, and I like it! https://github.com/lyft/confidant/blob/master/requirements.t...

No more "pip freeze | sort -f > requirements.txt", explicit dependencies are way more maintainable!

It looks like most of that could be generated. Now I want a tool that looks up that info and outputs an annotated requirements file.
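A minimal sketch of such a generator, assuming Python 3.8+ and that the packages are installed in the current environment. It reads each package's one-line `Summary` field through the standard `importlib.metadata` module and writes it as a comment above the pin. The function name `annotate` and the comment format are illustrative; this is not how Lyft's file was produced.

```python
from importlib import metadata


def annotate(pins, lookup=metadata.metadata):
    """Return requirements text where each pin is preceded by a comment
    holding the package's one-line summary from its installed metadata."""
    lines = []
    for name, version in pins:
        try:
            summary = lookup(name).get("Summary") or "no summary available"
        except metadata.PackageNotFoundError:
            summary = "not installed, no metadata found"
        lines.append(f"# {summary}")
        lines.append(f"{name}=={version}")
    return "\n".join(lines)
```

For example, `print(annotate([("Flask", "0.10.1")]))` prints Flask's summary as a comment followed by `Flask==0.10.1`. The `lookup` parameter only exists so a stub can replace the real metadata lookup in tests.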

Shameless plug, I wrote this small being-bored-at-a-conference tool to show you the licenses of packages installed in your Python environment: https://pypi.python.org/pypi/license-info/0.8.7

It's pretty rough but gets the job done for separating free and open source requirements.

And then writes the code, including tests to show that all the requirements are met.

Great observation. One of the best docs on dependencies I've seen.

+ for readability, - for precise versions. Good luck receiving critical updates this way...

We watch CVEs and update accordingly. Assuming you're using the latest stable release of the Docker image or the latest stable release of Confidant (we're still working on making releases through GitHub), you should be using a version with secure dependencies.

Using >= doesn't ensure security, but it does ensure less stability, and part of security is availability.

Pinning precise versions is the best practice for requirements.txt (as opposed to setup.py in a package) (http://nvie.com/posts/pin-your-packages/). Since we have no guarantee that every dependency uses, e.g., semantic versioning, it's the safest way to get reproducible builds across machines.
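To keep a requirements.txt honest about exact pins, a small check like the following could run in CI. It is a hypothetical sketch built on a simplified regex (not a full PEP 508 parser) that flags any requirement not of the form `name==version`; comments, blank lines, and pip options such as `-r` are skipped.

```python
import re

# Simplified pattern: "name==version", optionally with extras like "name[extra]==1.0".
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[A-Za-z0-9._,-]+\])?==[^=\s;]+$")


def unpinned(requirements_text):
    """Return requirement lines that are not pinned to an exact version."""
    problems = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments and whitespace
        if not line or line.startswith("-"):  # skip blank lines and pip options like -r
            continue
        if not PINNED.match(line):
            problems.append(line)
    return problems
```

A CI job could fail the build whenever `unpinned(open("requirements.txt").read())` returns a non-empty list.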

You can also now list outdated packages with pip if you want to upgrade them yourself or test compatibility with new versions. Example from a side project I have lying around:

  $ pip list --outdated
  Django (Current: 1.8.5 Latest: 1.8.6)

  $ pip install --upgrade django

From a packager's point of view: enforcing specific versions means I have to patch and repackage the service for deployment just because one of its dependencies changed. I understand that keeping track of which project uses what kind of versioning strategy (to apply the proper range) is hard, but it's either that or tracking every single point release they do.

I fully agree that deployment should be reproducible and stick to tested versions. But requirements.txt is how you build the software, not how you deploy it. (Unless it is, but then no sympathy from me ;) )

> + for readability, - for precise versions. Good luck receiving critical updates this way...

A real lesson: authentication failed and we had no clue why the app couldn't start. A developer spent almost two days before realizing that a new gem version had been released and was not compatible with the current release. To quote the developer: "fool me once, shame on you; fool me twice, shame on me."

Imagine auto-scaling (no image) or installing packages during Docker container initialization in production: the word "fuck" will fill up your mailbox/IRC/Slack/HipChat/ChatBot from your dear SRE/DevOps team because of the lack of regression testing.

The same argument goes for automatic system updates in Ubuntu. To receive critical updates, you had better have a process to review them and raise warnings about the critical ones. One change in a system API can screw up your entire system.

If you want the fail-hard model, which is really useful, the best way is to keep two sets of requirements.txt: one for random Saturday testing (just launch a Docker instance that runs the tests with the latest versions of packages installed), and one for everything from development all the way to production.
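One way to maintain the second, unpinned file is to derive it from the pinned one, so the Saturday job installs whatever is newest. A rough sketch, assuming plain `name<op>version` lines without environment markers; `latest_variant` is a hypothetical helper.

```python
import re

# Matches a version specifier (===, ==, ~=, !=, >=, <=, >, <) and everything after it.
SPECIFIER = re.compile(r"\s*(===|==|~=|!=|>=|<=|>|<).*$")


def latest_variant(pinned_text):
    """Strip version specifiers so pip resolves the newest release of each package."""
    out = []
    for raw in pinned_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            out.append(line)  # keep blank lines and comments as they are
            continue
        out.append(SPECIFIER.sub("", line))
    return "\n".join(out)
```

The output could be written to a separate file (for example a hypothetical `requirements-latest.txt`) that a throwaway container installs with `pip install -r` before running the test suite.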

But any critical updates should be API compatible and promoted ASAP. Sure, they need to be verified, but if someone breaks their API with a critical update, that's a separate issue with their release model. If that happens, system packages sometimes fix such incompatibilities (you'll get a new package with a patched version instead of a completely new release).

From some brief exposure to Plone (and through Plone, Zope) -- I think basing your deployment strategy on "they should" is a bad idea.

Zope originated much of Python packaging out of necessity (those tools have since continued to evolve). Plone still (AFAIK) has massive lists of "known good sets" of dependencies, often down to the minor version, because sometimes what is a bugfix for most consumers of a library triggers latent bugs in other (sets of) libraries.

Yes, you want to fix all the bugs. But sometimes you can't - and then you might need to be explicit in upgrading from 3.2.1 to 3.2.1-bugfix9 rather than making the "leap" to 3.2.2.

> Sure, they need to be verified, but if someone breaks their API with a critical update, that's a separate issue with their release model.

Actually, it goes both ways, and if your production is down, it is your problem. You can't expect everyone to be nice to you with guarantees. It is a shared responsibility.

I mean, aren't specified versions crucial for compatibility? What am I missing?

On second thought, I guess you are suggesting pinning the major version and leaving the minor version open? But doesn't this require that no package introduces a breaking change in a minor version? Not sure you can rely on that assumption.

Yes, and yes. It's best practice to allow flexible version ranges for dependencies (or dependencies of dependencies) when you're shipping a package with setup.py (http://python-packaging-user-guide.readthedocs.org/en/latest...). However, in requirements.txt for a standalone application, their approach is the best practice.
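For the "pin the major, float the minor" middle ground, PEP 440 defines the compatible-release operator `~=`, which pip understands directly: `Django~=1.8.5` means `>=1.8.5, ==1.8.*`, so it would accept the 1.8.5 to 1.8.6 upgrade listed earlier but not 1.9. The sketch below only illustrates those semantics for purely numeric versions; it is not how pip implements them.

```python
def parse(version):
    """Parse a purely numeric dotted version such as '1.8.5' into a tuple."""
    return tuple(int(part) for part in version.split("."))


def compatible(candidate, spec):
    """Simplified PEP 440 '~=' check: '~=1.8.5' means '>=1.8.5, ==1.8.*'.
    Only numeric release segments are handled (no pre/post/dev releases)."""
    base = parse(spec)
    if len(base) < 2:
        raise ValueError("'~=' needs at least two release segments")
    cand = parse(candidate)
    # Pad with zeros so that e.g. 1.8 and 1.8.0 compare as equal.
    width = max(len(base), len(cand))
    base_padded = base + (0,) * (width - len(base))
    cand_padded = cand + (0,) * (width - len(cand))
    # Every segment of the spec except the last must match exactly.
    prefix_len = len(base) - 1
    return cand_padded >= base_padded and cand_padded[:prefix_len] == base[:prefix_len]
```

In a requirements file you would simply write `Django~=1.8.5`; whether that is safe still depends on the dependency actually following the versioning scheme, which is exactly the concern raised above.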

An alternative approach, if you don't want to be 100% tied to KMS, is https://github.com/mozilla/sops. It uses KMS, but also PGP, and potentially anything that comes up in the future.

A big +1 for SOPS. We've used it with great success.

That's very nice. If you're using Microsoft Azure though, you might as well use Azure Key Vault[0].

[0] https://azure.microsoft.com/en-us/services/key-vault/

How does this compare to Vault? [1]

Looks like Confidant is tied to AWS whereas Vault can use various backends..?

[1] https://www.vaultproject.io/

Vault is a nice piece of engineering (we use it), but it has what I call serious "backend-itis". Everything is pluggable, which makes it a bit of a nightmare to understand and use. For example, "secret backends" and "storage backends" are entirely separate things, but the docs aren't super clear about it (not to mention auth backends, audit backends, listeners ...).

Unsurprisingly it buys completely into the Hashicorp ecosystem (Consul, weirdo HCL configs, etc) which is a plus or minus depending upon your perspective I guess. Vault servers are also stateful, which complicates deployment (you can't just stick N Vault servers behind a load balancer).

If I had infinite time I'd consider creating a "SimpleVault" fork consisting of exactly one secret and storage backend, one way to auth, one curated workflow and make it run in a stateless manner. I'd probably also remove all the clever secret generation stuff, since 90% of applications just require a secure place to store secret blobs of data, as opposed to creating dynamic time-limited IAM credentials on the fly or whatever.
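For the plain "store a secret blob" case, Vault's HTTP API is fairly small. Here is a sketch that builds (without sending) a request writing a blob under `secret/`, assuming the generic key/value (version 1) backend is mounted there and the client authenticates with the `X-Vault-Token` header. Newer Vault releases may mount version 2 of the KV engine instead, which uses `secret/data/<path>` and wraps the payload in a `data` object. The helper name and the example address are hypothetical.

```python
import json
import urllib.request


def build_write_request(vault_addr, token, path, data):
    """Build (but do not send) a request storing `data` at secret/<path>,
    assuming a KV version 1 backend mounted at secret/."""
    url = f"{vault_addr.rstrip('/')}/v1/secret/{path.lstrip('/')}"
    body = json.dumps(data).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"X-Vault-Token": token, "Content-Type": "application/json"},
    )
```

Passing the result to `urllib.request.urlopen(req)` would send it; reading the blob back is a GET on the same path with the same token header.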

I just want to correct some of your points, I don't want to detract or compare Vault to Confidant here, as this is their time to shine!

"Buys into the HashiCorp ecosystem": Consul is completely optional. We also support ZooKeeper, SQL, etcd, and more. There is zero forced tie to Consul. HCL is correct though!

"you can't just stick N Vault servers behind an LB": Actually, that is exactly what we recommend! (https://vaultproject.io/docs/concepts/ha.html) Vault servers are stateful in that there can only be one leader, but we use leader election and all data is stored in the storage backend, so the leader can die without issue.

"90% of applications just require a secure place to store secret blobs": Just use Vault out of the box. You'll have to configure only storage, after that you can use the token you get from initialization (one form of auth) and the `/secret` path to store arbitrary blobs that is preconfigured (one form of backend).

Hi, quick question since you seem to be on top of Vault.

My team and I were investigating possible configuration options and secret management, and we couldn't figure out a good reason as to why the tool doesn't use S3 as one of the backends, which would essentially eliminate the need to maintain our own etcd/consul/sql whatever - plus, since the entire thing is path based, it seems incredibly well suited for that backend.

I couldn't think of one, but I might be missing something.

S3 is one of the available storage backends. You only need consul/etcd/zookeeper if you want to have a high availability setup.

Not trying to be snarky, but you do understand who posted the response above (i.e., mitchellh), right?

Your fork description is pretty close to what Confidant is. You should give it a try if you're in AWS!

We also wanted things to be simple & wrote a custom cli and ansible module to wrap around https://github.com/oleiade/trousseau, but have slowly found the need for more of the features that tools like these provide.

We're planning on switching to Vault or something similar in the near future & will be testing out both of these. I definitely agree with the "backend" confusion surrounding Vault but I'm interested in where Confidant may be lacking. We host services on AWS and Heroku, so in some places AWS' KMS may not be the best option.

Regardless, I do have to say it's so refreshing to see security tools with decent documentation and a clean user experience. I'm almost stunned to see a Swiftype search box, a one-page threat model description and a preconfigured turn-key container all in one place. Props to Lyft for opening up a great tool to the masses!

FAQ in the post discussed this:

"The main difference between Confidant and the others is that Confidant is purposely not cloud agnostic, choosing to use AWS’s features to deliver a more integrated experience. By leveraging AWS’s KMS service, Confidant is able to ensure the master encryption key can’t be stolen, that authentication credentials for Confidant don’t need to be distributed to clients, and that authentication credentials for Confidant clients don’t need to [be] generated and trusted through quasi-trustable metadata."

Confidant is purposely tied to AWS so that it can rely on KMS for its master key and for authentication.

Indeed, thanks. I just found a little more on Vault's take on this: https://www.vaultproject.io/intro/vs/kms.html

I'd love for someone to explain what you get from using a secret management service other than encrypted-at-rest blobs.

For example: you store your AWS master key in a config file, and Microservice A reads that key from the file. Microservice A (or its VM) is compromised. How does having a secret store help you here? Couldn't the attacker just inspect the code of Microservice A and see that you are just reading from disk/reading from Vault?

In short, what do services like this protect me from (other than accidentally checking in my code to a public repo)?

They let you not check secrets into repos, they let you update and rotate them in a centralized place, they let you easily share them between services, and they let you store them encrypted at rest.

Also, in this particular case, thanks to KMS you can keep the secrets stored encrypted at rest on microservice A and decrypt them only in memory.

Note that you don't store a master key in a config file, but instead you use the KMS master key to encrypt/decrypt things. You never get direct access to the master key.

I have a genuine question: why not use S3 alone for secret management?

One selling point of Confidant is using IAM roles to bootstrap authentication to the secret store. You can also do that with S3: put each secret into an individual text file and give each IAM role permission to access the secrets it needs. Set the S3 bucket to encrypt the data at rest; it uses KMS behind the scenes and automatically rotates encryption keys.

Rotation of the secrets themselves could be scripted or manual; that part would be basically the same process as with Confidant or any other tool. And I believe S3 access can even be audited with CloudWatch logs.

Also, S3 now offers either eventual consistency or read-after-write consistency. EDIT: actually, it looks like new object PUTs can be read-after-write consistent but updates are not. So this could be a downside: if you rotate a key, getting the new one is only eventually consistent. In practice this might not be a big deal though; there's already going to be a gap between when you activate the new key and when your app gets reconfigured to start using it.
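The per-role S3 setup described here boils down to one IAM policy per role. A minimal sketch that generates such a policy, with hypothetical bucket and object names; if the bucket uses SSE-KMS with a customer managed key, the role would also need `kms:Decrypt` on that key.

```python
import json


def s3_read_policy(bucket, secret_keys):
    """Return an IAM policy document allowing reads of only the listed
    secret objects in one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{key}" for key in secret_keys],
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical bucket and secret files for one service role.
    policy = s3_read_policy("example-secrets", ["db_password.txt", "api_key.txt"])
    print(json.dumps(policy, indent=2))
```

Each role gets its own generated policy listing exactly the files it needs, which is the whole access model in this approach.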

I'm very curious what the downsides might be of doing this. For all the various secret management tools that have been released in the past year or two, I'm kind of surprised I've never heard anyone talk about using raw S3.

Encrypted S3 is definitely an option. The difficulty there is how to manage secrets that may be shared across services and how to ensure that there's no race conditions between updates of secrets.

Confidant provides access to credentials by IAM role. The key here is that you can have credentials A, B, and C, and IAM roles example1-production, example2-production and example3-production, then map credentials A and B to example1-production, A and C to example2-production and A to example3-production. Confidant just stores mapping info for this. If you update A, you don't need to fan out an update to all three services. Race conditions can come in when A and B or C are updated at the same time. S3 is very eventually consistent and there's no locking. Read-after-write consistency is only available in one region at this point.
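The mapping idea can be sketched with plain dictionaries: each credential is stored once and roles hold only references, so updating A is a single write that all three roles see on their next read. This is a hypothetical in-memory model with made-up secret values, not Confidant's actual DynamoDB schema.

```python
# Each credential is stored exactly once, keyed by ID (values are made up).
credentials = {
    "A": {"shared_api_key": "old-value"},
    "B": {"db_password": "b-secret"},
    "C": {"queue_token": "c-secret"},
}

# Roles only reference credential IDs; nothing is copied per service.
role_mappings = {
    "example1-production": ["A", "B"],
    "example2-production": ["A", "C"],
    "example3-production": ["A"],
}


def credentials_for(role):
    """Resolve a role's credentials at read time from the single stored copy."""
    merged = {}
    for cred_id in role_mappings.get(role, []):
        merged.update(credentials[cred_id])
    return merged
```

After `credentials["A"]["shared_api_key"] = "new-value"`, every role mapped to A sees the new value without any fan-out step.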

Of course, this is all based on your assumptions and how you design your system. Based on our needs KMS + DynamoDB made more sense for us. Something like sneaker (https://github.com/codahale/sneaker) may fit your needs better.

Another interesting bit about Confidant is that it manages the KMS grants on the auth KMS key, which lets you use the key from any service to any service to pass secrets, or to do general service to service auth.

Also, there's a bit more coming down the road for automatic management of other common secrets that are annoying to manage in AWS, like SSH host keys for autoscale groups: https://github.com/lyft/confidant/issues/1

So they basically reinvented this?


We wrote Confidant in the beginning of 2015, so it's not a reinvention of sneaker. That said, sneaker is very similar in design, as is credstash: https://github.com/fugue/credstash

Both of those are really great projects and I recommend them if they fit your use-case.

One of the reasons we originally avoided S3 was because it can occasionally be very eventually consistent, which we wanted to avoid. We also wanted to avoid fanning out secrets to services when the secret needed to be shared between multiple services.

We also wanted to have a UI that was simple and easy to use for everyone.

Another alternative developed for AWS deployments, written in Python and using KMS: Credstash https://github.com/fugue/credstash

The only downside of credstash is that it doesn't have the ability to restrict sets of credentials to different IAM roles. The access is all-or-nothing, per dynamo table.

Otherwise the general design of credstash is very similar to Confidant.

It is possible to use fine-grained access control with DynamoDB to restrict access within a DynamoDB table.
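A hedged sketch of what that could look like for a credstash-style table: an IAM policy that allows `GetItem` and `Query` only on items whose partition key starts with a per-service prefix, using the `dynamodb:LeadingKeys` condition key. The naming convention, table ARN, and choice of actions are assumptions. `Scan` is left out on purpose, since a scan request carries no partition key for the condition to check, and the role would still need `kms:Decrypt` on the KMS key used to encrypt the secrets.

```python
def credstash_read_policy(table_arn, name_prefix):
    """Return an IAM policy limiting reads to items whose partition key
    starts with name_prefix (hypothetical per-service naming convention)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # No dynamodb:Scan: a scan has no partition key to match against.
                "Action": ["dynamodb:GetItem", "dynamodb:Query"],
                "Resource": [table_arn],
                "Condition": {
                    "ForAllValues:StringLike": {
                        "dynamodb:LeadingKeys": [f"{name_prefix}*"]
                    }
                },
            }
        ],
    }
```

With secrets named like `example1.db_password`, a role given `credstash_read_policy(arn, "example1.")` could only read the `example1.*` items.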

Please correct me if I am wrong, but I think there is no secure way to store stuff in a virtual environment.

I wish I were wrong, because my heart always bleeds when I see DB passwords in configuration files! But as long as there is a hypervisor you do not control access to, you must trust the owner of the bare metal to (1) honor your privacy and (2) be competent enough to secure their system. Trust is nice, but it is not security.

Granted, Confidant and KMS seem like a better solution than most. Will look into it in more detail. Thx for open sourcing it and moving the solution forward.

Security is a spectrum. There's definitely a path for someone with full local access to the hypervisor and system memory to do some careful reconstruction or other malicious injection, but that attack vector is amazingly rare compared to attacks based on bad protocols for network traffic, bad app-level auth, and insecure storage practices (mixing code and creds, for example).

There's immense value in defending against the kind of attacks where an attacker gets partial access, even if an attacker with omnipotence can compromise you.

Indeed! After all, we live in a world where many SSL private keys are protected only by OS file system permissions on an internet-facing server. And clear-text database passwords in config files are common in all content management systems, and even in many customer relationship management systems!

mysqldump has a CLI argument for providing the password! As far as I know, it is visible to anyone with some access to the system. The documentation warns about that and suggests creating a config file to store the password.

Security is a spectrum, but when it comes to password storage, modern web applications live on the lower end.

I am increasingly frustrated by that, and if I raise concerns, many admins stick to binary security: "storing stuff unencrypted on disk is okay, because the attacker is inside already", followed by "you must store the key somewhere". It's wrong and it's true at the same time. It's also sad, because not only the users but also the people paying us trust us to keep them safe. And we don't. :( We can say "it's a spectrum" and be truthful, but we just can't keep them safe. I think it's important to recognize that simple fact.
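Following that documentation advice from a script might look like this: write the credentials to a private option file and pass it with `--defaults-extra-file`, which must be the first option on the mysqldump command line. The helper names and the choice of the `[client]` group are illustrative assumptions, and passwords containing special characters may need quoting in the option file.

```python
import os
import tempfile


def write_client_option_file(user, password):
    """Write a MySQL option file and return its path.
    tempfile.mkstemp creates the file readable and writable only by the current user."""
    fd, path = tempfile.mkstemp(suffix=".cnf")
    with os.fdopen(fd, "w") as f:
        f.write(f"[client]\nuser={user}\npassword={password}\n")
    return path


def mysqldump_command(option_file, database):
    """Build the argument list; the password never appears on the command line."""
    # --defaults-extra-file has to come before any other option.
    return ["mysqldump", f"--defaults-extra-file={option_file}", database]
```

The list can be passed to `subprocess.run(cmd, stdout=backup_file, check=True)`, after which the option file should be deleted.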

...or is it me who is just meticulous?

I did not know such a thing was possible. Very cool! Thx for pointing it out! :)

But still, some rhetorical questions: Is KMS based on this concept? Is its implementation open source? Is it verified to run unchanged on the instance I pay for? Who is verifying it? Who is paying the one verifying it? Can I meet him or her for a coffee and build trust between us?

Please don't misunderstand me. I have more trust in the Amazon techs than in my own ability to administer a complex Linux system on bare metal. But while all that is great progress toward security, we still need to make a leap to find a solution, don't we?

> I did not know such thing is possible.

It's theoretically possible, but right now it's too slow to be practical in most situations. Storage and single operations may be feasible, but homomorphic AES, for example, takes multiple seconds per block. And that's only AES-128.

Wait, are you saying HE is feasible for 128 bytes of data, if you can allow seconds for de/encryption? That'd be plenty usable for read-at-startup 128-bit API keys for microservices, disk encryption keys, etc.?

(Granted, in this context, having a decrypted disk at run time under a VM would be considered insecure, but still better (for some use cases) than many alternatives.)

AES has a block size of 128 bits, not bytes. Using the fastest reported method, that would be 2s/block, so 16s/128bytes.
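Spelled out, with 8 bits per byte and the roughly 2 s per block figure above:

$$
\frac{128\ \text{bytes} \times 8\ \text{bits/byte}}{128\ \text{bits/block}} = 8\ \text{blocks},\qquad 8\ \text{blocks} \times 2\ \text{s/block} = 16\ \text{s}
$$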

So in practice, probably that's the case - you could use homomorphic encryption for very infrequent operations on KEKs for example.

Well, actually I was wrong twice, so it cancels out ;-) 128 bits is enough for many things. Eg. 128 bit keys. Too little for an ECC secret key, but not by much.

Nice way for Lyft to fire back after that iOS reverse-engineering video [1] revealed that they were showing off one of their keys in a production client. I don't know if this was intentional, and I believe whatever exploit they had was mild, but it restores (at first glance) my faith in them a bit :).

[1]: https://realm.io/news/conrad-kramer-reverse-engineering-ios-...

"KMS provides access to master encryption keys,... but doesn’t provide direct access to the master key itself, so it can’t be stolen."

Doesn't Amazon KMS have access to the master key? And therefore, it can be stolen from them?


"AWS KMS is designed so that no one has access to your master keys."

"...never storing plaintext master keys on disk, not persisting them in memory, and limiting which systems can connect to the device..."

Translation: We promise to not look at your keys.

Granted, this is most likely better than nothing, especially if you trust AWS, but ideally, you want only the client to have the key material.

If you're running in AWS with any secret management service that isn't using CloudHSM, you're trusting AWS with your key material.

There surely is some amount of trust there, though.

My understanding was that the Customer Master Key was stored in an HSM, but your customer-generated keys were not. I might be wrong about that. If true, AWS employees would not have access to your root key material, but would definitely have access to the intermediary key material. It's a cost trade-off.

Either way, you have to bootstrap secret material onto your instances somehow, so you've got to trust Amazon somewhere.

Not to diminish your point -- the how/where/when really does matter. Locking your house key in your car is still better than leaving it on your front step, but also not as good as in your pocket.

We use ZeroTier to encrypt our AWS microservices traffic. Way easier to set up, and it just ... works.

This is a bit different from network encryption (which is really valuable). This is a centralized location for services to store and retrieve secrets, like passwords to external services, SSL keys, etc.

Another one of those problems every company seems to try to solve on their own. :P

I totally agree (we did), though Confidant is in some ways trying to be an extension of existing AWS tools. I'm curious what you're using, though, since I've felt for a while now that every one of these secret stores has faults or is a giant pain to work with. Confidant may be the least offender in that regard, and anyone trying to make being secure easier or trying to build better tools gets an A in my book.

My company is using Vault as far as I know. I think HashiCorp is doing a decent job of identifying the sort of things every company is trying to reinvent.

It's unfortunate that this is tied to AWS, while the industry is moving towards a cloud agnostic approach to hosting.

That being said, it seems like an interesting project to keep an eye on.

I'm in agreement with mpdehaan2 here.

By sticking with a single cloud provider you get a lot of benefits. One of the bigger benefits is cost, since you can do things like buy Reserved Instances in EC2. Another major benefit of sticking with a single provider is that you get to use solutions only that provider has, which in this case means KMS, DynamoDB and IAM.

The biggest downside of being cloud agnostic is that you're stuck with the smallest combination of features that all of the clouds provide.

It seems like Confidant can and will be great for customers using AWS, and frankly it's nice to see open source projects that are built on AWS services but are not complete black boxes (like many AWS services themselves). I also totally agree that utilizing the tools provided by your hosting provider can be majorly beneficial; however, I think teams sometimes employ hacky solutions just so they can use a hosted service from AWS (or similar), in which case they end up with technical debt and the inability to change direction in the future.

As you said, until tools exist that offer operability between clouds, users will be stuck with the smallest combination of tools. I do however believe such a day is coming.

I'd say the opposite: the industry in general is becoming a lot more comfortable investing in AWS, to the point that lock-in is now acceptable. More people are treating cloud providers as more than a big pool of IP addresses.

Not saying it's a good or bad thing, but attend AWS re:Invent or something and it's pretty easy to perceive huge uptake.

If you adopt extra management stacks running on or against your cloud you lose a lot of the benefits.

That's an interesting perspective and observation. I've always found lock-in to be a bit scary, and I've heard this echoed by a number of colleagues and companies. No one can deny the usefulness of tools provided by hosting providers, namely AWS; however, to me, architecting a system in such a way that you are literally unable to jump ship when the need arises (compliance, cost, etc.) is unacceptable. With hybrid/multi-cloud becoming more of a reality thanks to tools like Docker (and others), there need to be tools capable of handling such variability. I digress; this really belongs in an entirely different thread.

We tried running on three cloud service providers while connecting back to our own data center (corporate stuff) at the same time, for a couple of years, because of the whole "cloud agnostic" thing. The code required us to be compatible with both OpenStack and AWS. Trust me, no matter what people say about the compatibility of some components between OpenStack and AWS, you cannot do things the AWS way 100% in OpenStack. In the end I had a pile of shit code full of workarounds and conditionals.

Sure, Heat is a comparable version of CloudFormation, but you can't take what you did in AWS and translate it to Heat. OpenStack doesn't have all the services that AWS has. Not every product (and not every API in an AWS product) works 100% with CloudFormation. If you are running an old release of OpenStack, you are even more fucked.

We are now moving our cloud infrastructure to AWS entirely so no more "it works in this environment", or "they sort of work and sort of different because of X and Y tool not comparable."

The right attitude should be "do one thing well, and improve." Architect "right" and then unlock yourself whenever possible. The same goes for people migrating to Docker or any container technology. A Dockerfile is a lock-in: you still have to move back to regular shell scripts later if you drop Docker.

> Dockerfile is a lock-in. You still have to move back to regular shell script later if you drop Docker.

Not really a big problem there, since Dockerfiles are pretty much shell scripts :)

I would tend to agree here to some extent. I think you have to be careful what you lock into. We use RDS and ElastiCache, but if push came to shove, we could easily spin up our own DB instances elsewhere or use some other managed service. That being said, locking into something as integral as how you handle your secrets is a big decision. When you make it, you must be sure you are going to be in AWS for the long haul, because how you handle your secrets touches how everything in your infrastructure is deployed. Migrating to a new system is not easy (trust me, I am trying to migrate to Vault). For this, I would prefer something less rigid like Vault, but this tool seems pretty cool.
