
CryptDB – A database system that can process SQL queries over encrypted data - monort
https://github.com/CryptDB/cryptdb
======
cvwright
CryptDB is cool, but be _very_ careful if you're thinking about actually using
it for anything.

Muhammad Naveed, Seny Kamara, and I have a paper appearing at ACM CCS next
month where we show that the approach used in CryptDB and other similar
systems is vulnerable to a raft of statistical attacks.

[http://research.microsoft.com/en-us/um/people/senyk/pubs/edb...](http://research.microsoft.com/en-us/um/people/senyk/pubs/edb.pdf)

The problem happens when you have a sensitive field in the DB, and you
actually need to run queries on that field for your app to function. CryptDB
will "peel the onion back" so that your column is protected with a weaker
deterministic or order-preserving encryption primitive in order to let the
queries succeed. But then you're vulnerable to the statistical attacks that we
describe in our paper.
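
To make the risk concrete, here is a minimal sketch of a frequency attack on a deterministically encrypted column. It is not the method from the paper, only an illustration of the general idea. The names `det_encrypt` and `frequency_attack`, the keyed-HMAC stand-in for a deterministic cipher, and the toy diagnosis data are all assumptions made for the example.

```python
import hashlib
import hmac
from collections import Counter


def det_encrypt(key: bytes, value: str) -> str:
    # Hypothetical stand-in for a deterministic cipher: the same plaintext
    # always maps to the same token, which is all the attack relies on.
    return hmac.new(key, value.encode(), hashlib.sha256).hexdigest()[:16]


def frequency_attack(ciphertexts, auxiliary_plaintexts):
    """Guess a token -> plaintext mapping by matching frequency ranks."""
    ct_ranked = [c for c, _ in Counter(ciphertexts).most_common()]
    pt_ranked = [p for p, _ in Counter(auxiliary_plaintexts).most_common()]
    return dict(zip(ct_ranked, pt_ranked))


# The database owner's side (toy data, invented for this sketch).
key = b"secret key the attacker never sees"
column = ["flu"] * 50 + ["asthma"] * 30 + ["diabetes"] * 15 + ["rare-condition"] * 5
encrypted_column = [det_encrypt(key, v) for v in column]

# The attacker's side: no key, only a public dataset with a similar distribution.
auxiliary = ["flu"] * 48 + ["asthma"] * 33 + ["diabetes"] * 14 + ["rare-condition"] * 5

guess = frequency_attack(encrypted_column, auxiliary)
recovered = [guess[c] for c in encrypted_column]
accuracy = sum(r == v for r, v in zip(recovered, column)) / len(column)
print(f"fraction of rows recovered: {accuracy:.2f}")
```

The attacker never touches the key. Matching the most common token to the most common plaintext is enough whenever the auxiliary distribution is close to the real one.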

Of course, the CryptDB authors are smart people, so they foresaw some of this.
They describe functionality in their tool for marking certain columns as
"sensitive", so CryptDB will never peel the onion back too far. But then you
can't use that column for most queries.

Like I said, be very careful.

~~~
kerkeslager
Thanks for working on this. I think what the CryptDB people are doing is
important, and I think it can't succeed without people like you finding
gaps in their security.

~~~
cvwright
Thanks. It's definitely been a fun project. And yes, our goal is to keep this
area moving forward towards better and more secure systems. CryptDB was a big
step, but there's still lots to be done.

If you're interested in other recent work in this area, check out the Secure
Anonymous Database Search project from Mariana Raykova and others at Columbia
[1], and the secure database work from David Cash and a big team of other
smart people [2].

[1]
[http://www1.cs.columbia.edu/~mariana/search.htm](http://www1.cs.columbia.edu/~mariana/search.htm)

[2] [http://eprint.iacr.org/2014/853](http://eprint.iacr.org/2014/853)

~~~
kerkeslager
Thanks, I'll definitely read those.

------
A_Beer_Clinked
This is a fantastic project from the MIT CSAIL team that brings together
many cryptographic components into an integrated system. The academic
papers on this are available here:
[https://css.csail.mit.edu/cryptdb/](https://css.csail.mit.edu/cryptdb/)

Conceptually it's a SQL proxy that decides how to maximally encrypt data based
on the operations that are required. It can then weaken and re-encrypt
portions of the data on the fly so that more powerful operations can be
performed.
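
A rough sketch of that layering idea, assuming a two-layer "onion" with a randomized outer layer over a deterministic inner layer. This is a toy built from SHA-256 and HMAC in the standard library, not CryptDB's actual construction and not safe for real use. All function names here are made up for the illustration.

```python
import hashlib
import hmac
import os


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy counter-mode keystream derived from SHA-256 (illustration only).
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def _xor(data: bytes, stream: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, stream))


def det_encrypt(key: bytes, plaintext: bytes) -> bytes:
    # Inner layer: the nonce is derived from the plaintext itself, so equal
    # plaintexts always give equal ciphertexts (supports equality checks).
    siv = hmac.new(key, plaintext, hashlib.sha256).digest()[:16]
    return siv + _xor(plaintext, _keystream(key, siv, len(plaintext)))


def rnd_encrypt(key: bytes, data: bytes) -> bytes:
    # Outer layer: fresh random nonce, so equal inputs look unrelated.
    nonce = os.urandom(16)
    return nonce + _xor(data, _keystream(key, nonce, len(data)))


def rnd_decrypt(key: bytes, blob: bytes) -> bytes:
    nonce, body = blob[:16], blob[16:]
    return _xor(body, _keystream(key, nonce, len(body)))


det_key, rnd_key = os.urandom(32), os.urandom(32)
rows = [b"alice", b"bob", b"alice"]

# Stored form: deterministic layer wrapped in the randomized layer.
stored = [rnd_encrypt(rnd_key, det_encrypt(det_key, r)) for r in rows]
rnd_hides_equality = stored[0] != stored[2]

# "Peeling" the outer layer so that an equality query can run.
peeled = [rnd_decrypt(rnd_key, s) for s in stored]
det_reveals_equality = peeled[0] == peeled[2] and peeled[0] != peeled[1]

print(rnd_hides_equality, det_reveals_equality)
```

Once the outer layer is stripped, whoever holds the column can see which rows are equal, even though the values themselves stay encrypted. That is the trade-off between functionality and leakage.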

You can chain the encryption to the user's login to prevent cross-user
leakage.

This model allows you to have a database that is protected from adversaries
who have access to the database.

The encryption techniques include a partially homomorphic scheme and other
non-homomorphic schemes.
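
As an illustration of what "partially homomorphic" buys you, here is a toy version of Paillier, a standard additively homomorphic scheme. Its use here is an assumption for the example, since the scheme is not named above. The primes are tiny and the helper names are invented, so treat it as a sketch of the property and nothing more.

```python
import math
import secrets


def keygen(p: int, q: int):
    # Toy key generation from two small primes (real keys use very large ones).
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    g = n + 1                  # common simplification for the generator
    mu = pow(lam, -1, n)       # modular inverse, needs Python 3.8+
    return (n, g), (lam, mu)


def encrypt(pub, m: int) -> int:
    n, g = pub
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2


def decrypt(pub, priv, c: int) -> int:
    n, _ = pub
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n


def add_encrypted(pub, c1: int, c2: int) -> int:
    # Multiplying ciphertexts adds the underlying plaintexts.
    n, _ = pub
    return (c1 * c2) % (n * n)


pub, priv = keygen(10007, 10009)
c_a, c_b = encrypt(pub, 12), encrypt(pub, 30)

# A server could compute this without ever seeing 12 or 30.
c_sum = add_encrypted(pub, c_a, c_b)
total = decrypt(pub, priv, c_sum)
print(total)
```

This is the kind of property that would let a server compute something like SUM() over an encrypted column, with only the key holder able to decrypt the total.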

I believe the team behind it has moved on to a startup using similar
technology. The code is no longer actively maintained. I was able to build
it successfully on Ubuntu 12.04 with Bison 2.x (the code requires a MySQL
build, which chokes on Bison 3).

They claim only a ~15%-25% performance hit, although I've not yet been able to
replicate that myself.

------
zzalpha
[http://arstechnica.com/information-technology/2015/09/resear...](http://arstechnica.com/information-technology/2015/09/researchers-respond-to-developers-accusation-that-they-used-crypto-wrong/)

------
mkiraz
Please note that the attack was already shown in
[http://www.ijiss.org/ijiss/index.php/ijiss/article/view/58/p...](http://www.ijiss.org/ijiss/index.php/ijiss/article/view/58/pdf_17)
in “Section 4 Security & Efficiency of CryptDB” in March 2014. It basically
says that “First of all, CryptDB is open to frequency attack where the
adversary knows the frequency of the plaintext. Namely, if the RND layer is
decrypted to the DET layer in EQ onion, then the frequency attack is possible
to apply because of deterministic encryption in the DET layer. In this attack,
the adversary that observes the queries can determine the ciphertext simply by
looking at the results’ row count. This attack can only be fixed by the RND
layer, which has no usable functionality in practice. For example, assume that
we have left part of Table 11, and its encrypted form is the right part in
Table 11. By using the knowledge of the frequency, one can learn the
corresponding plaintexts from the right encrypted part in Table 11. This issue
can be solved easily by using random IV based symmetric encryption, however,
this will prevent executing all queries.” You only apply it to real medical
data to show its practicality.

------
michaelmior
Related:

[http://research.microsoft.com/en-us/projects/cipherbase/](http://research.microsoft.com/en-us/projects/cipherbase/)

[http://www.cs.hku.hk/research/techreps/document/TR-2014-03.p...](http://www.cs.hku.hk/research/techreps/document/TR-2014-03.pdf)

------
jdimov9
I'd like to see this built into PostgreSQL in a way where I can optionally
enable it for certain tables only (e.g. accounts, user profiles, sessions).

~~~
A_Beer_Clinked
The papers I mentioned above indicate that it can be used with PostgreSQL.
You can annotate your schema to configure how you want the tables to be
handled.

------
jamies888888
Do you have any examples of this being used on a live website?

~~~
wsxcde
Raluca gave a job talk at Princeton a year or so ago and she said CryptDB has
been adopted by Google, among others.

~~~
tedunangst
What does that mean? Is the search index encrypted? Is gmail? Some internal
site counting down until the cafeteria serves fish tacos again?

~~~
cvwright
They have a tech demo showing how you can store encrypted data in BigQuery,
only touching the plaintext on the client.

[https://github.com/google/encrypted-bigquery-client](https://github.com/google/encrypted-bigquery-client)

