
Ask HN: Securely store sensitive data in the DB? - provetza
I am designing an information system, and one of the requirements is to keep sensitive data encrypted in the database so that a possible intruder is unable to decrypt it. Encrypting everything in the application with a key and then storing it in the database is unacceptable, since all it does is add a little difficulty for an intruder: once he gets the key, he gets the data too.

Passwords are kept hashed, so the password provided at login gets hashed, and if it matches the stored hashed password the user is authenticated, otherwise not. The password is not stored in cleartext and cannot be retrieved, but of course it can be reset if the user forgets it. So far so good, but what happens with other sensitive data that I need to store, such as API keys, CC data, etc.? These cannot be encrypted with the user password, because if the user forgets the password they become useless.

What are some best practices for keeping sensitive data encrypted in the database, and for ensuring that after a system break-in the attackers won't be able to get the data unencrypted? I want to design and implement a solution as secure as it can be and would like to hear thoughts, ideas and experience from other startups and engineers. I have not found anything really useful in this direction, apart from references to proprietary solutions that promise to do anything in some magical way (no comments!)
======
patio11
Any system which allows you to locally decrypt information, for the purpose of
doing anything for the user, should be assumed to allow an attacker who roots
the box to locally decrypt information. That's the unfortunate harsh technical
reality.

If you have compliance reasons motivating this need for encryption, you'll
find that e.g. HIPAA and PCI-DSS ignore technical reality, in favor of
requiring that you encrypt information stored at rest and imposing substantial
penalties on you if it leaks. There are a variety of ways to do this. One
fairly common one for HIPAA-compliant applications is putting the e.g. MySQL
data files on a partition which is block-level encrypted. You then issue
decryption keys to folks who need them, such as e.g. the application.

If your host is totally compromised, the host holds both the decryption key
and the ciphertext, which means "Sucks to be you." However, this does provide
a non-zero increase in security (e.g. if an old copy of the DB drive ends up
floating off to eBay because of poor physical control on your part, and you
can document that it doesn't include the encryption key, you just avoided a
reportable information breach), and it does check the appropriate boxes on
e.g. HIPAA.

~~~
orofino
To expand on this a bit, absolute security for encryption just doesn't exist.
If you wanted your data 100% secure, put it in a database, disconnect the DB
from your network, put it in a locked room, guarded by biometric locks and
security guards. Even in that scenario, the data is vulnerable, but why even
bother discussing that point, as the data is worthless if you can't access it.

With that reality in mind, I was responsible for PCI for a large part of the
infrastructure at a Level 1 Merchant, meaning a yearly audit had to be passed.
Ultimately, our solutions boiled down to restricting access to an external
(read different machine/network segment), firewalled host that did the
decryption. In some cases this was an appliance that was purchased (this helps
with compliance, but they're expensive, and they're a nightmare if they become
a performance bottleneck as they're a black box you know little about). In
other cases we used a web service we built that worked similarly (auditors
will pick this apart because it isn't a "standard" solution).

In all cases here is a high level of how they work: encrypted data is passed
to the service, which pulls the encryption key out of memory, decrypts the
data, and sends it back to the requesting host. The encryption key is stored
in (at least) two pieces, and each piece is encrypted with a key encrypting
key. Key encrypting keys are known to very few employees, and no single
employee holds both key encrypting keys. The encryption key is only assembled
in its entirety while in memory.

Again, there are problems with this. As patio11 intimates, compliance includes
much theater at times, but this is reality, and it does provide a benefit over
other solutions: in this case, at least three layers of security must be
compromised before you could decrypt everything.
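
A minimal sketch of the "two pieces, assembled only in memory" idea described above. It assumes a simple XOR split, which is only one possible way to produce the pieces; the function names are hypothetical, and the step where each piece is encrypted under its own key encrypting key is left out.

```python
import secrets
from typing import Tuple


def split_key(key: bytes) -> Tuple[bytes, bytes]:
    """Split a key into two shares; either share alone is just random bytes."""
    share_a = secrets.token_bytes(len(key))
    share_b = bytes(k ^ a for k, a in zip(key, share_a))
    return share_a, share_b


def assemble_key(share_a: bytes, share_b: bytes) -> bytes:
    """Recombine the shares. Intended to happen only in process memory."""
    if len(share_a) != len(share_b):
        raise ValueError("shares must have equal length")
    return bytes(a ^ b for a, b in zip(share_a, share_b))


# Hypothetical usage: each share would be stored separately, encrypted under
# a different key encrypting key held by a different custodian.
data_key = secrets.token_bytes(32)
share_a, share_b = split_key(data_key)
```

With this kind of split, a single custodian who leaks one share reveals nothing about the data key; both shares are needed to rebuild it.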

~~~
liquidcool
I'm in ecommerce as well and I've seen PCI/DSS auditors require vendors/hosts
to rearchitect using an encryption/key management appliance. You wrote you
built your own solution - are there no known, trusted open source
alternatives? As you mention, the appliances are almost astronomically priced,
so it seems like an area OSS (or a disruptive startup) would help.

~~~
lifeisstillgood
Love to follow up on that too

------
michaelt
1\. If at all possible, don't store credit card numbers in your database. A
payment gateway will take care of this for you - you have an iframe the user
uses to submit their credit card details straight to the payment gateway, and
the payment gateway gives you back a token you can use to charge and refund at
your convenience (locked to your merchant account so not useful to attackers).
DataCash and Chase Paymentech are two companies that provide this service, and
I'm sure there are others too.

2\. If the user forgets their password and resets it, ask them to re-enter
their credit card details in case their e-mail has been hacked. (Also ask them
to re-enter their details for deliveries to new postal addresses, if
applicable)

So if you can't access CC data after a customer resets their password, that's
no problem.

3\. Use database-level security; set up roles and accounts in your database so
tables containing sensitive data only have select grants to apps and users
that really need them. When a table has some columns that are sensitive and
others that aren't, set up a view with the sensitive columns replaced with
placeholder data and give them access to that instead.
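
A rough sketch of the view idea in point 3, using an in-memory SQLite database so it is self-contained. The table, column and view names are made up for illustration. SQLite has no roles or grants, so only the view itself is shown; in a server database the less-privileged account would be given select access to the view and not to the underlying table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        api_key TEXT NOT NULL
    );
    INSERT INTO customers (name, api_key) VALUES ('Alice', 'example-secret-key');

    -- Same shape as the table, but the sensitive column is a placeholder.
    CREATE VIEW customers_public AS
        SELECT id, name, '[redacted]' AS api_key FROM customers;
    """
)

# Code that only needs non-sensitive columns queries the view instead.
rows = conn.execute("SELECT id, name, api_key FROM customers_public").fetchall()
```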

~~~
bndr
How do payment gateways store CC information? For example if I am a payment
gateway, how do I securely store CC data?

~~~
stevekemp
You have a dedicated box that stores details and is remotely contacted through
an XML-RPC/JSON-HTTP API of some sort.

The API should have two methods:

* Add a new card to account.
* Make payment of £xx from card NN.

The machine is locked down, runs no other services, and so cards cannot be
exported/stolen from this system. You'd encrypt the filesystem and prompt for
a key/passphrase at boot. Ideally you'd only login via the serial console so
the only service exposed is your "add/charge" methods.

(Even allowing the remote-deletion of cards could be a security issue;
obviously.)
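
To make the shape of that API concrete, here is a small sketch. The class and method names are hypothetical, the in-memory dict stands in for the encrypted storage, and the actual call to a bank or processor is omitted. The point is only that nothing in the interface ever returns a stored card number.

```python
import secrets
from typing import Dict


class CardVault:
    """Hypothetical vault exposing only 'add card' and 'charge' operations."""

    def __init__(self) -> None:
        # token -> card number; never exposed through the public methods.
        self._cards: Dict[str, str] = {}

    def add_card(self, card_number: str) -> str:
        """Store a card and hand back an opaque token for later charges."""
        token = secrets.token_urlsafe(16)
        self._cards[token] = card_number
        return token

    def charge(self, token: str, amount_pence: int) -> bool:
        """Charge the card behind the token; returns False for unknown tokens."""
        if amount_pence <= 0:
            raise ValueError("amount must be positive")
        card_number = self._cards.get(token)
        if card_number is None:
            return False
        # A real vault would submit card_number to the payment network here.
        # This sketch only reports that the card was found.
        return True
```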

~~~
bndr
Exposing only the "Add new card" and "Charge Amount XX" methods actually makes
sense, Thanks for the info!

------
brasetvik
There's a book series "Translucent Databases" with a lot of interesting use
cases, where the assumption is that an adversary has gained access to the
entire database.

I've only read the first edition, and it's some years ago, but I'd recommend
giving it a quick read. :)

------
bradleybuda
As many commenters have pointed out, there are theoretical problems with what
you're asking for (if you can decrypt the data, then your attacker can too).
But, there are some practical things you can do to make the attacker's life
harder:

\- Don't write your own cryptographic code or design your own crypto systems;
use existing libraries as much as possible.

\- Separate your reads from your writes. Using a public/private key pair, you
can give one set of systems the ability to write encrypted data but not read
it, and a different set of systems the ability to read the data. The systems
that can decrypt / read data should be isolated as much as possible - don't
expose them to public networks, limit which operators have access to them,
etc. The separation also forces you to encapsulate your secure data and define
an API over it; rather than arbitrary reads, hosts that don't have the
decryption keys will have to ask the hosts that do to perform specific
operations. If you're writing a Rails app, the Strongbox gem [1] enforces this
pattern for you.

\- Rotate your encryption keys.

\- Don't store keys in code. Follow the Heroku pattern [2] of storing any
sensitive data (i.e. private keys) in the environment, where it is bound at
runtime to your code and encrypted data.

\- Store as little sensitive data as possible. Make sure data you don't need
any more is periodically purged.

\- Human processes are just as important as code; keep track of who has access
to sensitive data, make that access opt-in, and remove it when those people
change jobs or are terminated. Do everything in your power to keep those
operators from being phished (user education, two-factor auth, etc).

\- Don't store credit cards if at all possible. Find a payment processor [3]
to do it for you. It's not worth the headache, it makes you a more attractive
target, and it may come with additional legal overhead (depending on your
jurisdiction).

[1] [https://github.com/spikex/strongbox](https://github.com/spikex/strongbox)

[2] [http://12factor.net/config](http://12factor.net/config)

[3] [https://stripe.com/](https://stripe.com/)
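
As an illustration of the "don't store keys in code" point above, a minimal sketch of reading a key from the environment at startup. The variable name `DATA_ENCRYPTION_KEY`, the base64 encoding and the 32-byte length are assumptions made for the example, not something the linked pattern prescribes.

```python
import base64
import os


def load_data_key(env_var: str = "DATA_ENCRYPTION_KEY") -> bytes:
    """Read a base64-encoded 32-byte key from the environment, or fail loudly."""
    encoded = os.environ.get(env_var)
    if not encoded:
        raise RuntimeError(f"{env_var} is not set; refusing to start")
    key = base64.b64decode(encoded, validate=True)
    if len(key) != 32:
        raise RuntimeError(f"{env_var} must decode to exactly 32 bytes")
    return key
```

Failing at startup when the variable is missing avoids silently falling back to a default key baked into the code.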

------
RyanZAG
I'd say the first thing to understand here is that absolute safety is
impossible in this case. If the hosting server is compromised, password
loggers can be installed and even the login page itself can be altered to
remove any form of security. With access to emails, an attacker could send an
official email asking everyone to reset their passwords, etc.

So your question is actually: How can I make my system divulge as little data
as possible over time to someone who has compromised the service?

To hamper someone from changing your service to remove security you could set
up daily checks from a server hosted in a different location to download your
static resources and check them against a pre-validated hash.
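
A sketch of what such a check might look like. The function name and the idea of passing in a `fetch` callable are assumptions for the example; on the monitoring server, `fetch` could be something that downloads the resource over HTTPS.

```python
import hashlib
from typing import Callable, Dict, List


def find_tampered(
    expected_sha256: Dict[str, str],
    fetch: Callable[[str], bytes],
) -> List[str]:
    """Return the paths whose current content no longer matches its known hash."""
    tampered = []
    for path, expected in expected_sha256.items():
        actual = hashlib.sha256(fetch(path)).hexdigest()
        if actual != expected:
            tampered.append(path)
    return tampered
```

The table of expected hashes has to live on the checking server, not on the server being checked, otherwise an attacker can simply update both.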

For storing data - as others have mentioned - the key is to store that data in
a way that it cannot be accessed from that one server alone. A simple solution
for this is to set up an internal service that will provide the data when given
the correct login details. This gives the attacker an additional server he
would need to hack. If you keep this layer as simple as possible it can add a
lot of security. Of course, if the hacker is able to compromise your server
for a long period, he can record anything passing through here anyway.

In the end though, the web-server itself is a lynchpin through which all
customer data has to flow at some point, and if that key server is compromised for a
long enough period, eventually all data can be extracted regardless of
precautions. That means that designing your web service with security in mind
from day 1 is very important. Regardless of what people try to sell you here,
there are no silver bullets that will prevent data theft - only mitigate the
impact or delay it.

~~~
patio11
_This gives the attacker an additional server he would need to hack._

If they root your web tier, and your web tier knows how to ask your internal
service layer for sensitive data, then the attacker knows how to ask your
internal service layer for sensitive data.

I really hate repeating "If you lose any one box in your deployment then you
can assume you will lose all data, regardless of whether you encrypt things or
not" because it makes me feel like Debbie Downer, but that is, in fact, the
threat environment.

~~~
RyanZAG
If you don't store your user passwords/hashes in a way that your web layer can
access directly, then you can slow the attacker down by requiring them to wait
for that user to actually log in and send the password. ie. If your web layer
passes the authentication tokens through to the data layer, and the data layer
handles storage/authentication of those tokens, then hacking the web layer
only allows you to log future requests.

To achieve a layout such as this, you would prevent your web layer from
talking to the database itself directly, and force all data requests through a
different service layer.

Obviously, this makes your whole architecture much more complicated and you
only really gain any security if you are able to detect the attacker before he
can sniff all passing user data anyway.

Your assumption is still spot on though - one box down really does mean game
over. All of the tactics above and in the rest of this thread only slow down
an attacker or make the attack more complicated. None of them will ever
prevent it entirely.

------
raverbashing
\- Keep passwords in memory (so that when you start the service it prompts for
the password)

\- Asymmetrical crypto. So for example, you encrypt your CC data upon sign-up
but then to run the charges you need the private key (and this is somewhere
else)

\- Enable SSL communication with your DB. Postgres has this, because being
defeated by network sniffing is bad.

~~~
ra
Memory is not secure. It's quite a common attack to grab keys / passwords from
the memory of an executing program.

~~~
raverbashing
No, it is one of the safest places.

If your attacker has access to arbitrary memory of a process, you're using an
insecure OS/version. Or they dumped your process memory using a vulnerability
(in your system)

Yes, there are some possible attacks (page file, cache, etc)

Attacking memory after a reboot requires physical access (unless you
hibernated without an encrypted file, in this case...)

It certainly beats the security of file/network

~~~
ra
Just because it's more secure than network / file doesn't mean it's secure.
That's why we have smartcards / HSMs.

~~~
raverbashing
Smartcards are vulnerable as well, and there have been successful attacks
against some smartcards (google it)

And not every system has an HSM available

~~~
ra
Sure, but smartcards and HSMs, while slow, are much safer than memory in a
properly managed environment.

Whether or not the OP has an HSM is moot. The OP said, "I want to design and
implement a solution as secure as it can be" ... and that means (among many
other things) keys on HSM.

------
adamlj
If you have to store CC data in-house, I would suggest storing it on a
completely separate machine which only stores and charges cards. The only
communication allowed with this box would then be "store this card", "charge
the card with this token", etc.

~~~
stevekemp
I wrote a similar comment too. In practice you find you might want to allow
"delete card" or "update card" which are complications to the simple-model.

------
dankohn1
I'm a fan of using client-side encryption so that the database only ever
stores encrypted content, and therefore can be treated as out-of-scope for PCI
compliance purposes.

Take a look at
[https://github.com/braintree/braintree.js](https://github.com/braintree/braintree.js)
which is a nice library for encrypting data with a public key before being
uploaded.

This is a specific exception to the generally correct concept that Javascript
cryptography is bad and should be avoided.
[http://www.matasano.com/articles/javascript-cryptography/](http://www.matasano.com/articles/javascript-cryptography/)
Of course, it's essential that the whole transaction take place over SSL.

And even then, you still need to have a set of machines that can read from the
database and access the private key, and those machines must be highly
secured, as well as supporting requirements like key revocation and key
rotation.

------
adg001
If statistics are considered sensitive, you can use cryptographic counters in
lieu of their plaintext counterparts.

A cryptographic counter is a public string representing an encryption of a
quantity, satisfying the following properties:

1\. Subjects with access to the _public-key_ can update the encrypted counter
by an arbitrary amount, by means of increment or decrement operations and
without first decrypting the value (i.e., the operation is performed over
encrypted data);

2\. The plaintext value is hidden from all participants except the entity
holding some secret key;

3\. The adversary can only learn if the cryptographic counter was updated
(i.e., information about whether the counter was incremented or decremented is
kept hidden to all participants except the secret-key holder and the updating
entity -- honest-but-curious threat model).

An implementation is available at
[https://github.com/secYOUre/Encounter](https://github.com/secYOUre/Encounter)
.
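
To illustrate the properties listed above, here is a toy counter built on textbook Paillier encryption, which is additively homomorphic. This is an assumption chosen for illustration and not necessarily what the linked implementation does. The primes are small, publicly known Mersenne primes, so the sketch has no real security; it also needs Python 3.8+ for the modular inverse via `pow`.

```python
import math
import secrets

# Toy parameters: two well-known Mersenne primes. Far too small (and far too
# public) for real use; a real key would use large secret random primes.
P = 2**31 - 1
Q = 2**61 - 1
N = P * Q          # public key
N_SQ = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # secret: lcm(P-1, Q-1)
MU = pow(LAM, -1, N)  # secret; this form is valid because the generator is N + 1


def encrypt(value: int) -> int:
    """Public-key operation: encrypt value (mod N) with fresh randomness."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (pow(N + 1, value % N, N_SQ) * pow(r, N, N_SQ)) % N_SQ


def add(counter: int, delta: int) -> int:
    """Public-key operation: add delta (may be negative) without decrypting."""
    return (counter * encrypt(delta)) % N_SQ


def decrypt(counter: int) -> int:
    """Secret-key operation: recover the plaintext count (mod N)."""
    return ((pow(counter, LAM, N_SQ) - 1) // N * MU) % N
```

Because every update multiplies in a freshly randomized ciphertext, an observer sees that the counter changed but not whether it went up or down, which corresponds to property 3.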

------
jevinskie
This is an ideal use case for homomorphic encryption, whenever it becomes
useable.

[https://en.wikipedia.org/wiki/Homomorphic_encryption](https://en.wikipedia.org/wiki/Homomorphic_encryption)

------
halayli
Use asymmetric encryption and store the private key in an HSM. The private key
never leaves the HSM device.

~~~
ra
This, and other layers of security. There is no magic formula to keep your
data safe.

If you are serious you need network security (i.e. firewalls physically
separating networks), proxies, IDS etc. You also need to build your app in a
security conscious way (read owasp.org).

You simply can't do this sort of thing well if you do not have infosec
experience. If this data is truly sensitive you should hire someone who lives
and breathes security. There are established methods and approaches. You can't
get away with "use this algorithm" or "use that library".

After you've built, pen tested and deployed your app; security depends on key
management and good change management practices that you simply can not skimp
on.

We once built an app that uses an HSM; even though that app is in a secure and
private (single occupant) data centre in the organisation's own basement, they
decided it was necessary to get a "shark cage" built, just so that they could
tell if the server had been physically compromised.

~~~
RyanZAG
You can't ever really tell if a server has been physically compromised.
IDS/HSM/bla are only a chance at working out if it's happened. A perfect
attacker could obtain access to any system and never trigger any alarms if
they understand the triggers for any alarms that are in place.

Much the same as you can never tell if someone has broken into your apartment:
you could tell a novice has broken in by looking for papers that are out of
place or footprints/fingerprints. An expert burglar would make sure not to
leave anything obvious like that. You could tell if an expert has broken in
using something like IDS: set up a special trap or webcam that will detect it.

However, a perfect burglar would replace the webcam tapes, find and
disable/re-enable any traps, etc. Since most web hosting environments are so
standard, it's actually a MUCH easier prospect to be a perfect hacker than a
perfect burglar too.

Also, no amount of perfect security skills can keep you absolutely safe. An
unknown exploit in your OS is simply out of scope for even the greatest
security expert, and no amount of best practices can help if your
OS/CPU/RAM/Network Card will give the intruder full access through some
unknown flaw outside of your control.

------
harrytuttle
We keep encryption keys for sensitive data in active directory and have a
front end firewall, web servers, midplane application firewall, back end
service layer cluster, internal firewall before anyone front facing can get at
the info. The decrypted data is never passed to the web layer.

To gain access, someone will have to root two separate active directory
domains after breaking into multiple low privilege accounts and a database
cluster.

Possible always, but we make it a hard target.

~~~
kintamanimatt
After all, the primary objective isn't to create an impenetrable system, but
one that's exceptionally difficult to penetrate.

------
jahewson
The best practice for storing sensitive data: don't. You have to design your
system presuming that someone has hacked your server - if the encryption key
is stored there, they can find it.

\- As you're aware, password hashes are a way to avoid storing sensitive data
(though they should still be treated as sensitive). You're using a strong hash
function and a salt, right?

\- Re-use the principle of password hashes for API keys to simply _avoid_
having to store hyper-sensitive data: generate a long (say 512-bit) secure
random number (using OpenSSL) as the user's secret API key. Then hash the key
as if it were a password and store only the hash. Now if someone steals your
API key database they can't use it to authenticate as your users.

Note: for API keys a strong hash such as bcrypt will probably be too slow and
resource-intensive. However, because API keys are (long) random data, unlike
passwords, you can use a faster hash function like SHA-1.

\- As for credit card data: don't. You probably can't afford the PCI audits
and dedicated hardware and the same principle applies: just don't store
sensitive data. Instead, many payment gateways offer 'tokens' for recurring
payments in which you pass the payment information to their API without
storing it (or use their hosted page in iframe, if acceptable to you) and they
return a token which can be used to charge against that card in the future.
Not all payment gateways offer this, and some charge (too much) so take a look
at [https://spreedly.com/](https://spreedly.com/) which offers a middle-man
gateway service which adds tokens and other API features to pretty much all of
the payment gateways.

As you can see, in both cases it's possible to simply avoid storing the most
sensitive data.
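
The API key idea above can be sketched roughly as follows. Two substitutions are made and should be read as assumptions: Python's `secrets` module generates the random key instead of OpenSSL, and SHA-256 is used as the fast hash instead of SHA-1. The function names are hypothetical.

```python
import hashlib
import hmac
import secrets
from typing import Tuple


def issue_api_key() -> Tuple[str, str]:
    """Create a new API key; return (key shown once to the user, hash to store)."""
    api_key = secrets.token_urlsafe(64)  # 64 random bytes, i.e. 512 bits
    stored_hash = hashlib.sha256(api_key.encode("utf-8")).hexdigest()
    return api_key, stored_hash


def verify_api_key(presented: str, stored_hash: str) -> bool:
    """Hash the presented key and compare against the stored hash."""
    presented_hash = hashlib.sha256(presented.encode("utf-8")).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(presented_hash, stored_hash)
```

Only `stored_hash` goes into the database; the key itself is shown to the user once and cannot be recovered afterwards, only replaced.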

------
rythie
Here's an idea:

Use the user's password to decrypt a key, that then decrypts the data - which
I know you can't do because of password resets...

So to deal with password resets, create another password which decrypts the
same key. Store that other password in a physical safe, possibly in a bank
safety deposit box. This will slow down password resets to a manual process of
course.

For additional security you can split this password into two or more pieces
and store them in different banks. For convenience you could allow users
from the same organisation to reset each other's passwords (since they all
have access to the same key).

Also, use an IDS so you know as soon as you've been hacked - because
people logging in at that time are still at risk.
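
A toy sketch of the "two passwords decrypt the same key" idea. The data key is wrapped twice, once under the user's password and once under the recovery password. To stay within the standard library, the wrap here is a PBKDF2-derived pad XORed with the key, which is an assumption for illustration only: it has no integrity check, so a wrong password silently yields garbage, and a real system would use an authenticated key-wrapping scheme from a vetted library. Names and the iteration count are hypothetical.

```python
import hashlib
import secrets
from typing import Tuple

ITERATIONS = 100_000  # assumed work factor for the sketch


def _pad(password: str, salt: bytes, length: int) -> bytes:
    """Derive a pad of the requested length from a password and salt."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS, dklen=length
    )


def wrap_key(data_key: bytes, password: str) -> Tuple[bytes, bytes]:
    """Return (salt, wrapped key). A fresh salt is used for every wrap."""
    salt = secrets.token_bytes(16)
    pad = _pad(password, salt, len(data_key))
    return salt, bytes(k ^ p for k, p in zip(data_key, pad))


def unwrap_key(salt: bytes, wrapped: bytes, password: str) -> bytes:
    """Recover the data key. A wrong password yields garbage, not an error."""
    pad = _pad(password, salt, len(wrapped))
    return bytes(w ^ p for w, p in zip(wrapped, pad))
```

On a password reset, the key is unwrapped with the recovery password from the safe and wrapped again under the user's new password, so previously encrypted data stays readable.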

~~~
stan_rogers
Trivia note: this is, in a nutshell, how the Lotus/IBM Notes ID works. The
password is used in a KDF to generate a key, which in turn decrypts the user's
private key (and certain other credentials, along with symmetric secret keys
for shared encrypted documents). Success/failure is determined solely by the
successful decryption of known bytes in the encrypted package. Other info (the
user's public key, identity and certifier, all signed) are maintained in the
clear and can be easily and safely exported and may be "trusted" for
authentication with remote machines. There is a "password recovery" system as
well (it doesn't actually recover the password, but allows a reset), requiring
cooperation of two or more admins¹ (in a Shamir-type arrangement) so that
previously-encrypted user data will not be lost.

¹ There is the option to use a single admin, but there are great big warning
signs and scary red boxes all over that section of the doco. It's something
you'd only use in a solo shop (as a Notes ISV or a Domino web dev).

------
EGreg
Let's say you are able to perfectly encrypt something so that only the people
who have "authorized access" can get the data.

You still have a problem of determining who these people are. See
[http://en.wikipedia.org/wiki/Confused_deputy_problem](http://en.wikipedia.org/wiki/Confused_deputy_problem)

Your best bet is to have three-factor authentication (something they are,
something they have, something they know) generate a key to encrypt the data.
Then their user agent still has to be trustworthy (no viruses on their
computer, etc.) In addition it has to not be tricked by various exploits (such
as
[https://www.owasp.org/index.php/Session_fixation](https://www.owasp.org/index.php/Session_fixation),
[https://www.owasp.org/index.php/Session_hijacking_attack](https://www.owasp.org/index.php/Session_hijacking_attack),
[http://en.wikipedia.org/wiki/Cross-site_request_forgery](http://en.wikipedia.org/wiki/Cross-site_request_forgery),
[https://en.wikipedia.org/wiki/Cross-site_scripting](https://en.wikipedia.org/wiki/Cross-site_scripting)). You also
have to secure the channel, preferably with TLS using Diffie-Hellman Key
Exchange or some other way that won't be compromised just by stealing keys on
the server. Even with all this, the Web (HTTP) is not a good way to access the
data on the server, because the client usually loads all the code it runs from
the server, and thus has to trust the web server not to serve malicious code.
Otherwise the server can later do a
[http://en.wikipedia.org/wiki/Man-in-the-middle_attack](http://en.wikipedia.org/wiki/Man-in-the-middle_attack)
such as a
[http://en.wikipedia.org/wiki/Replay_attack](http://en.wikipedia.org/wiki/Replay_attack),
to get the same data. And when I say the server, I mean the server compromised
by some hackers who got root access credentials. And so forth.

In short, you will never have perfect security, just approximations. (Unless
possibly if you make use of
[http://en.wikipedia.org/wiki/Quantum_cryptography](http://en.wikipedia.org/wiki/Quantum_cryptography))

------
anywherenotes
I was just at an Oracle 12c presentation, and I'm not pushing for Oracle (and
not associated with them), but I'm just going to mention something they have
that maybe your database provider also has.

Oracle (I believe starting from 12c) has masking for data. They were saying
how you can mask everything except the last 4 digits of credit cards. So if
someone gains the credentials of a regular employee, they will be able to query
data, but sensitive data will be masked by the database itself.
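
For a sense of what "everything except the last 4 digits" looks like, a small application-level sketch. This only illustrates the masking rule; it is not Oracle's feature, which applies the mask inside the database. The function name is made up.

```python
def mask_card_number(card_number: str, visible: int = 4) -> str:
    """Replace all but the last `visible` digits with asterisks."""
    digits = [ch for ch in card_number if ch.isdigit()]
    if len(digits) <= visible:
        # Too short to show anything safely; mask it all.
        return "*" * len(digits)
    return "*" * (len(digits) - visible) + "".join(digits[-visible:])
```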

------
a_smith
I've recently been looking into the same issue. I need a way to encrypt data
before inserting it into a database in such a way that the person inserting
the record can read it, their supervisor can read it, but their colleague can't.
It needs to survive a password reset and I don't want to store any keys on the
server unencrypted.

This led me to attribute-based encryption and the libbswabe library. The idea
is you generate a master keypair and from these you generate private keys for
each of your users. Your user's private keys can only decrypt data that was
encrypted with attributes that were also applied to their key.

For example, let's say we have 2 users Alice and Bob. Alice is a supervisor
for the IT department. Her key was generated with the attributes "alice" (her
username) and "itdepartment". Bob is a normal employee in the IT department,
the only attribute applied to his private key was his username "bob"

Now let's say we use the master public key to encrypt each of the fields in the
user table (Firstname, Lastname, Email, etc). If each field for a user record
is encrypted with the attributes: [current_username] and "itdepartment", then
Bob can decrypt his fields because they are tagged with "bob" and "bob" is an
attribute in his key and Alice can decrypt her record through the same logic
AND every record whose fields were encrypted with the attribute "itdepartment".

If users' private keys are encrypted with their password and stored in the
database, then the only way you can get Bob's key is to break his password. An
attacker now has access to the data that Bob's key can decrypt, but
importantly, not everything. If Bob forgets his password (and therefore can't
access his private key) then a new one can be generated and all it needs to do
is have the "bob" attribute in order for him to have access to all his old
data.

Now this is by no means a complete description of a solution, you have to
securely store the master private key (you only need this to generate private
keys for your users though, not for every put/get request), there's issues
around key revocation and lots of gaps in my description, but these issues are
present for any crypto system. Attribute based encryption though seems to me
like it overcomes a lot of the issues that plague other solutions, the biggest
single one being that other solutions require the master private key to either
be on disk, or in memory at all times, this solution doesn't need that.
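
The Alice and Bob example can be modelled in a few lines. This sketch contains no cryptography at all; it only models the access rule as described above (a key can read a record if they share at least one attribute), which is one reading of the description. In real attribute-based encryption the rule is enforced by the mathematics of the scheme, not by an `if` statement, and the names here are hypothetical.

```python
from typing import FrozenSet


def can_decrypt(key_attributes: FrozenSet[str], record_attributes: FrozenSet[str]) -> bool:
    """Model of the rule: a key can read a record if any attribute matches."""
    return bool(key_attributes & record_attributes)


# Keys as described: Alice is the IT supervisor, Bob a normal employee.
alice_key = frozenset({"alice", "itdepartment"})
bob_key = frozenset({"bob"})

# Each user's record is tagged with their username and the department.
alice_record = frozenset({"alice", "itdepartment"})
bob_record = frozenset({"bob", "itdepartment"})
```

Under this model Bob reads only his own record, while Alice reads both. A replacement key for Bob only needs the "bob" attribute to regain access to his old data.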

------
buro9
patio11 got this right: if your application is able to decrypt it, then
nothing you can do will secure this data. Encryption is not the tool that you are
looking for.

You can persist with encryption, but only if the user holds the key, ideally
via 2-factor auth.

Instead of this, I'd go for whitelist of access, audit logs, monitors, rate
limiting and alerts.

If you hold all the encrypted data and the keys, you only need your
application server to fail. My personal view is that worse than thinking you
have security is not responding (or even noticing) when the inevitable
happens.

Configure your systems to be as secure as possible without going down the
obscurity path, and then tripwire everything and know what unusual patterns of
activity look like and who did what.

------
Eduard
What is "CC data"?

~~~
patio11
Credit card data (numbers and the like, though probably not including CVVs --
the security codes -- because one should not be storing that in any form
anywhere).

~~~
bobsoap
How are sites that store your CC data for later purchases able to charge your
card without storing the CVV? Amazon, for example, and pretty much every
subscription/SaaS product.

