We already have a way of checking equality and indexing data safely using digests.
They have come up with similar techniques for ordering encrypted data, performing calculations on encrypted data and doing full text search on encrypted data.
These are all quite amazingly useful to me even though there are definite drawbacks. E.g., an encrypted value that you want to perform calculations on is stored in a 2048-bit field. But there are definitely great applications for it where it would be worth it.
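To make the "calculations on encrypted data" part concrete, here is a toy additive scheme in the style of Paillier, which I believe is the kind of construction the paper relies on for sums. This is a minimal sketch under my own assumptions: the function names are made up, the primes are far too small to be secure, and it is not CryptDB's code. It also shows where the storage cost comes from: ciphertexts live modulo n squared, so a 1024-bit modulus gives roughly 2048-bit values.

```python
import math
import secrets


def keygen(p, q):
    """Build a toy key pair from two primes (hypothetical helper, not CryptDB's API)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid because the generator is fixed to g = n + 1
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt an integer 0 <= m < n under the public modulus n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random r in [1, n - 1]
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, priv, c):
    """Recover the plaintext integer from ciphertext c."""
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n


def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % (n * n)


# Tiny primes for illustration only; a real deployment would use ~512-bit primes.
n, priv = keygen(1009, 1013)
total = decrypt(n, priv, add_encrypted(n, encrypt(n, 1200), encrypt(n, 345)))  # 1545
```

The server only ever multiplies ciphertexts; it never sees 1200, 345 or the sum. Encryption is randomized, so two encryptions of the same number look different, which is exactly why this layer can't also be used for equality checks.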
I am still trying to understand what benefit their DET and JOIN constructs have over just using, say, a SHA-256 digest. But I have only skimmed the paper so far.
It would be interesting to see if this can be set up on an EC2 instance proxying to an RDS instance. Offhand, I don't see why not.
The DET construct (and this might apply to the JOIN as well, I don't remember) is most useful with symmetric key encryption, since then you can use it inside one of their "onions". You couldn't peel off the DET layer to get more functionality if it was stored as a digest, since those are only one-way.
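A rough way to see the difference: a deterministic cipher gives you the same equality property as a digest, but it can be undone with the key. The sketch below is a toy synthetic-IV construction built from HMAC, chosen only so it runs with the standard library. It is not the cipher the paper uses, and the names det_encrypt and det_decrypt are my own.

```python
import hashlib
import hmac


def _keystream(key, iv, length):
    """Expand (key, iv) into `length` pseudorandom bytes using HMAC in counter mode."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hmac.new(key, iv + counter.to_bytes(4, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:length]


def det_encrypt(mac_key, enc_key, plaintext):
    """Deterministic: the IV is derived from the plaintext, so equal inputs match."""
    iv = hmac.new(mac_key, plaintext, hashlib.sha256).digest()[:16]
    stream = _keystream(enc_key, iv, len(plaintext))
    return iv + bytes(a ^ b for a, b in zip(plaintext, stream))


def det_decrypt(mac_key, enc_key, ciphertext):
    """Reversible with the keys, which a plain digest never is."""
    iv, body = ciphertext[:16], ciphertext[16:]
    stream = _keystream(enc_key, iv, len(body))
    plaintext = bytes(a ^ b for a, b in zip(body, stream))
    expected = hmac.new(mac_key, plaintext, hashlib.sha256).digest()[:16]
    if not hmac.compare_digest(iv, expected):
        raise ValueError("wrong key or modified ciphertext")
    return plaintext


mac_key, enc_key = b"m" * 32, b"e" * 32  # placeholder keys for the example
a = det_encrypt(mac_key, enc_key, b"alice@example.com")
b = det_encrypt(mac_key, enc_key, b"alice@example.com")
same = a == b                                   # equality works on ciphertexts
back = det_decrypt(mac_key, enc_key, a)         # the layer can be removed again
digest = hashlib.sha256(b"alice@example.com").digest()  # no way back from here
```

Because the DET value can be decrypted, it can sit underneath a stronger randomized layer and be exposed only when a query actually needs equality. A stored digest gives you the equality check and nothing else.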
It has been a fascinating evolution personally to write software that creeps on the border of sqlite not being quite enough. (Which probably speaks to the design, but still, it seems like a rite of passage).
Yes. Three detailed papers in PDF format. The front page needs a summary of what it actually is though, and the README file provides that information quickly and efficiently, whilst the website doesn't.
You're not going to escape having to trust someone along the way; the goal is to minimize this trust. Presumably the win with CryptDB is that you don't want to trust Joe Developer who is writing SQL queries, and you don't want to trust Evil Steve who is walking around the cloud datacenter with a bolt-cutter and a USB drive.
Right now, if you want to store sensitive data, you basically have to do it all in-house, which costs big money (think of all the PCI regulations you have to satisfy and auditors that you need to pacify).
I think the idea is that a user's data is encrypted with their password, or a key derived from it. So the key doesn't really sit anywhere that the admin can access it; it only exists in memory for a short period of time when the user enters it into the website and the web app is decrypting the data using it.
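For anyone unfamiliar with "a key derived from the password", the usual building block is a password-based key derivation function. A minimal sketch using PBKDF2 from Python's standard library follows; the iteration count, salt size and function name are my assumptions, not values from the paper.

```python
import hashlib
import os


def derive_user_key(password, salt, iterations=200_000):
    """Stretch a password into a 32-byte key; only the salt needs to be stored."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations, dklen=32
    )


salt = os.urandom(16)  # stored alongside the user record; not secret
key = derive_user_key("correct horse battery staple", salt)
# `key` exists only in memory while the user is logged in. The database and its
# admin see the salt and the ciphertext, but never the password or the key.
```

The same password and salt always give the same key, so nothing secret has to be persisted between logins.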
This doesn't just defend the data from unscrupulous sysadmins, it also defends data from hackers who manage to gain access and run a mysqldump.
Actually, your first statement is not true. They also present a multi-user mode, where the keys are generated from a user's password when they log in. The keys only remain active while the user is logged in, so if somebody gets hold of your proxy, only those users' data is vulnerable. Although, I will admit, the paper seems to assume an attacker only gets a small attack window (I believe) and hasn't just installed something that monitors the proxy indefinitely.
The point you raise about attackers monitoring the proxy for a long time is important.
My understanding is that CryptDB offers no protection from attackers who sit between the proxy and the web server. Obviously the web server (or other client) must deal with plaintext, otherwise application software would require changes. The authors of CryptDB assert in no uncertain terms that this is a drop-in solution that requires zero application changes; therefore the proxy must do all the work.
The idea is to run the proxy on a different machine than the database, thus allowing the maintenance of the database server's hardware, OS, and RDBMS software to be outsourced without providing access to your data. No amount of monitoring of traffic between the proxy and the RDBMS should matter.
The weakest part of this system is that it appears to store the data in the database with different types of encryption that allow for various operations to be performed on the ciphertext. I think that anyone who controls the database system can obtain some of the weaker ciphertexts of the data and possibly break them.
I really can't be sure until I test it out... I'm kinda disappointed that it doesn't come with quick instructions to get it going on postgres.
Nice try MIT, but I don't really see this being useful in the real world. Typically data that needs to be encrypted must be accessed by system processes to make any application useful. For example, you want to encrypt the contact email, but you want to send automatic alerts to that email, or a system needs to make an automatic credit card payment. Those are hard to do when you need the user's password to decrypt the data.
You're thinking at the wrong level. The password that encrypts a user's credit card data is almost certainly not the password a customer logs in with. It's some highly controlled password that only some privileged authentication server knows.
The goal is to reduce the attack surface, and prevent incidental discovery of data. For example, I could set up a server that manages high security passwords and only grant access to a select few trusted people. I have to carefully audit how that server gets used, and who can use it, but I can let any old DBA mess around with all my encrypted data. I can throw it on any old server, I could outsource it to some cloud hosting company, it doesn't matter. That's a huge win in some industries. The only things I need to trust now are the servers with credentials and the CryptDB software itself; I don't need to care about where the data itself lives.
There are two modes presented in the paper: single-user and multi-user. In single-user mode, the proxy has a master key from which it derives all other necessary encryption keys. In multi-user mode, keys are generated from a user's password and the database schema is annotated to identify whose keys can decrypt which data.
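The single-user idea of deriving everything from one master key can be sketched like this. I'm using HMAC-SHA256 as the derivation function purely as an assumption, and the table, column, onion and layer labels are illustrative; check the paper for the exact primitive and inputs.

```python
import hashlib
import hmac


def derive_layer_key(master_key, table, column, onion, layer):
    """Derive a per-(table, column, onion, layer) key from the proxy's master key."""
    # NUL-separated label so ("ab", "c") and ("a", "bc") cannot collide.
    label = b"\x00".join(
        part.encode("utf-8") for part in (table, column, onion, layer)
    )
    return hmac.new(master_key, label, hashlib.sha256).digest()


master_key = b"\x01" * 32  # placeholder; in practice a random secret held by the proxy
k_det = derive_layer_key(master_key, "users", "email", "eq", "DET")
k_rnd = derive_layer_key(master_key, "users", "email", "eq", "RND")
# Nothing but the master key needs to be stored: every layer key can be
# recomputed on demand, and each layer gets an independent-looking key.
```

This is why the proxy can be nearly stateless in single-user mode: lose the master key and everything is gone, but there is only one secret to protect.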
So for your proposed example, you would likely run in single-user mode and it would work just fine.