
Actually, there are encryption schemes that allow deduplication. They leak information (that the file you have already exists), but the encrypted bits themselves are secure.

The keyword is "convergent" encryption. We used something like this at Iron Mountain Digital many years ago (they still do, AFAIK), and it is used in BitCasa today.

You should read the papers, but essentially the concept can be boiled down to encrypting the plaintext with a hash of the plaintext.

Since there is no way to derive the hash of the plaintext from an encrypted block, there is no way to recover the key other than regular old brute force. But if the same data is uploaded twice, the same hash is computed, and thus the same encryption key is used, and thus the resulting ciphertext is identical.
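The core idea can be sketched in a few lines. This is a toy, stdlib-only illustration: the SHA-256 counter-mode keystream stands in for a real block cipher (a deployed system would use something like AES), and the function names are mine, not from any real product. What matters is the determinism: the key is the hash of the plaintext, so identical plaintexts always yield identical ciphertexts.

```python
import hashlib

def keystream(key: bytes, length: int) -> bytes:
    # Counter-mode keystream derived from SHA-256 -- a stand-in for a
    # real cipher. Deterministic given the key, which is exactly what
    # convergent encryption requires.
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def convergent_encrypt(plaintext: bytes) -> tuple[bytes, bytes]:
    # The key *is* the hash of the plaintext.
    key = hashlib.sha256(plaintext).digest()
    ct = bytes(p ^ k for p, k in zip(plaintext, keystream(key, len(plaintext))))
    return key, ct

def convergent_decrypt(key: bytes, ct: bytes) -> bytes:
    # XOR with the same keystream inverts the encryption.
    return bytes(c ^ k for c, k in zip(ct, keystream(key, len(ct))))

# Same plaintext encrypted by two different users -> identical
# ciphertext, so the server can dedupe without ever seeing a key.
k1, c1 = convergent_encrypt(b"the same backup block")
k2, c2 = convergent_encrypt(b"the same backup block")
assert c1 == c2
assert convergent_decrypt(k1, c1) == b"the same backup block"
```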

The encryption keys can be stored separately from the cipher text. In particular, the user who uploaded the data would store the hashes (this would already happen in most backup applications anyway). Then, for retrieval, they give the hash and the block location to the server, who is now able to decrypt it. By stealing the server, you gain zero access to plaintext data.
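The client/server split described above can be mocked up with two dictionaries. Everything here is a hypothetical sketch (the XOR "cipher", the store layout, the function names are all illustrative): the server holds only ciphertext blocks addressed by their own hash, while the client keeps the plaintext-derived keys in its index, so stealing the server yields no plaintext.

```python
import hashlib

server_store = {}   # block_id -> ciphertext (all the server ever sees)
client_index = {}   # filename -> (key, block_id) (kept by the uploader)

def toy_encrypt(key: bytes, data: bytes) -> bytes:
    # XOR against a hash-derived pad; a stand-in for a real cipher.
    pad = b""
    ctr = 0
    while len(pad) < len(data):
        pad += hashlib.sha256(key + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return bytes(a ^ b for a, b in zip(data, pad))

def upload(name: str, data: bytes) -> None:
    key = hashlib.sha256(data).digest()       # key = hash of plaintext
    ct = toy_encrypt(key, data)
    block_id = hashlib.sha256(ct).hexdigest() # server-side block address
    server_store[block_id] = ct               # identical data -> same id: dedup
    client_index[name] = (key, block_id)

def download(name: str) -> bytes:
    key, block_id = client_index[name]
    return toy_encrypt(key, server_store[block_id])  # XOR is its own inverse

upload("a.txt", b"hello convergent world")
upload("b.txt", b"hello convergent world")
assert len(server_store) == 1   # the duplicate upload is stored only once
assert download("a.txt") == b"hello convergent world"
```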

Very cool stuff :)



Knowing the mapping between a hash of some plaintext and its de-duplicated ciphertext means a person can just provide a list of hashes and ask Mega to delete the corresponding ciphertexts, even if they can't break the encryption. At least if they maintain their ignorance, they can truthfully say they don't have the power to track down the ciphertext for any given plaintext hash. Hopefully they will, and will just provide bulk cloud storage, with people holding onto their little key files. It's much easier to back up a 1KB key file (or whatever form it comes in) than the encrypted 250GB blob it protects.


Derive your encryption key from the contents of the file and a "convergence key". The "convergence key" can then be null for global convergence, a shared secret for a privately shared convergence, or a random nonce for no convergence. The derived encryption key is stored the same in every case. When encrypting a file, clients trade off using space versus a file getting deleted if the server is required to remove the ciphertext. The server never knows the difference.
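The three modes can be shown with one key-derivation function. This is a minimal sketch under my own naming (real designs such as Tahoe-LAFS use an HMAC construction rather than bare concatenation): the derived key is stored identically in every case, so the server cannot tell which convergence mode a client chose.

```python
import hashlib
import os

def derive_key(contents: bytes, convergence_key: bytes = b"") -> bytes:
    # Empty convergence key -> global convergence (everyone dedupes).
    # Shared secret         -> dedupe only within the sharing group.
    # Random nonce          -> no convergence at all.
    # The stored key material looks the same to the server either way.
    return hashlib.sha256(convergence_key + contents).digest()

data = b"some file contents"
global_a = derive_key(data)                  # two independent clients,
global_b = derive_key(data)                  # no convergence key
group_a  = derive_key(data, b"shared secret")
private  = derive_key(data, os.urandom(32))  # random nonce: never dedupes

assert global_a == global_b   # global convergence: identical keys
assert group_a != global_a    # different convergence key: no cross-dedupe
assert private != global_a and private != group_a
```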


This could even actually be done by the user before storing it on the cloud service, and finding duplicates would be trivial server-side. (Though I don't see much incentive for a person to do this, since it only benefits the hoster.) For example, in the Mega interface, a user could specify the length of the convergence key (a random salt whose length inversely affects the likelihood of de-duplication on the host) with a default length of 0. This would then be part of the "key" proper, as those bits are required to access the original file.


And it should be done such that the server treats everything the same. The incentive comes from deduped files counting less against storage quotas, and no time spent uploading the file. I'm just commenting on the general approach here, not the applicability to any particular type of service.

But your 'random salt' idea suffers from the attacker just generating all possible encryptions of the plaintext, due to the small number of possibilities. The "convergence key" is instead a full security-parameter-length key that you can pass around to your friends so that your files will dedupe with theirs while not being susceptible to confirmation attacks by others.
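The confirmation attack on a short salt is cheap to demonstrate. A hypothetical sketch: if the salt is only 2 bytes, an attacker who suspects a particular plaintext just enumerates all 65,536 salts and checks which derived key matches the one observed on the server, confirming the guess. (All names and the toy derivation are illustrative.)

```python
import hashlib
from itertools import product

def derived_key(plaintext: bytes, salt: bytes) -> bytes:
    # Toy stand-in: this value identifies the ciphertext for dedup.
    return hashlib.sha256(salt + plaintext).digest()

# Victim uses a short (2-byte) salt, as in the proposal above.
victim_salt = b"\x3a\x7f"
suspected = b"leaked-document contents"
observed = derived_key(suspected, victim_salt)

# Attacker enumerates every possible 2-byte salt -- only 65,536 tries.
found = None
for salt in (bytes(t) for t in product(range(256), repeat=2)):
    if derived_key(suspected, salt) == observed:
        found = salt
        break

assert found == victim_salt   # guess confirmed despite the salt
```

A full-length (e.g. 128-bit) convergence key makes this enumeration infeasible, which is the point of the distinction above.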



