
Actually, there are encryption schemes that allow deduplication. They leak information (that the file you have already exists), but the encrypted bits themselves are secure.

The keyword is "convergent" encryption. We used something like this at Iron Mountain Digital many years ago (they still do, AFAIK), and it is used in BitCasa today.

You should read the papers, but essentially the concept can be boiled down to encrypting the plaintext with a hash of the plaintext.

Since there is no way to derive the hash of the plaintext from an encrypted block, there is no way to recover the key other than regular old brute force. But if the same data is uploaded twice, the same hash is computed, and thus the same encryption key is used, and thus the resulting ciphertext is identical.
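The core idea can be sketched in a few lines. This is a toy, stdlib-only illustration: the SHA-256 counter-mode keystream stands in for a real block cipher (a deployed system would use something like AES), and the function names are mine, not from any real product. What matters is the determinism: the key is the hash of the plaintext, so identical plaintexts always yield identical ciphertexts.

```python
import hashlib

def keystream(key: bytes, length: int) -> bytes:
    # Counter-mode keystream derived from SHA-256 -- a stand-in for a
    # real cipher. Deterministic given the key, which is exactly what
    # convergent encryption requires.
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def convergent_encrypt(plaintext: bytes) -> tuple[bytes, bytes]:
    # The key *is* the hash of the plaintext.
    key = hashlib.sha256(plaintext).digest()
    ct = bytes(p ^ k for p, k in zip(plaintext, keystream(key, len(plaintext))))
    return key, ct

def convergent_decrypt(key: bytes, ct: bytes) -> bytes:
    # XOR with the same keystream inverts the encryption.
    return bytes(c ^ k for c, k in zip(ct, keystream(key, len(ct))))

# Same plaintext encrypted by two different users -> identical
# ciphertext, so the server can dedupe without ever seeing a key.
k1, c1 = convergent_encrypt(b"the same backup block")
k2, c2 = convergent_encrypt(b"the same backup block")
assert c1 == c2
assert convergent_decrypt(k1, c1) == b"the same backup block"
```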

The encryption keys can be stored separately from the cipher text. In particular, the user who uploaded the data would store the hashes (this would already happen in most backup applications anyway). Then, for retrieval, they give the hash and the block location to the server, who is now able to decrypt it. By stealing the server, you gain zero access to plaintext data.
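The client/server split described above can be mocked up with two dictionaries. Everything here is a hypothetical sketch (the XOR "cipher", the store layout, the function names are all illustrative): the server holds only ciphertext blocks addressed by their own hash, while the client keeps the plaintext-derived keys in its index, so stealing the server yields no plaintext.

```python
import hashlib

server_store = {}   # block_id -> ciphertext (all the server ever sees)
client_index = {}   # filename -> (key, block_id) (kept by the uploader)

def toy_encrypt(key: bytes, data: bytes) -> bytes:
    # XOR against a hash-derived pad; a stand-in for a real cipher.
    pad = b""
    ctr = 0
    while len(pad) < len(data):
        pad += hashlib.sha256(key + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return bytes(a ^ b for a, b in zip(data, pad))

def upload(name: str, data: bytes) -> None:
    key = hashlib.sha256(data).digest()       # key = hash of plaintext
    ct = toy_encrypt(key, data)
    block_id = hashlib.sha256(ct).hexdigest() # server-side block address
    server_store[block_id] = ct               # identical data -> same id: dedup
    client_index[name] = (key, block_id)

def download(name: str) -> bytes:
    key, block_id = client_index[name]
    return toy_encrypt(key, server_store[block_id])  # XOR is its own inverse

upload("a.txt", b"hello convergent world")
upload("b.txt", b"hello convergent world")
assert len(server_store) == 1   # the duplicate upload is stored only once
assert download("a.txt") == b"hello convergent world"
```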

Very cool stuff :)



Knowing the mapping between a hash of some plaintext and its de-duplicated ciphertext means a person can just provide a list of hashes and ask Mega to delete the corresponding ciphertexts, even if they can't break the encryption. At least if they maintain their ignorance, they can truthfully say they don't have the power to track down the ciphertext for any given plaintext hash. Hopefully they will, and will just provide bulk cloud storage, with people holding onto their little key files. It's much easier to back up a 1KB key file (or whatever form it comes in) than the encrypted 250GB blob it protects.


Derive your encryption key from the contents of the file and a "convergence key". The "convergence key" can then be null for global convergence, a shared secret for a privately shared convergence, or a random nonce for no convergence. The derived encryption key is stored the same in every case. When encrypting a file, clients trade off using space versus a file getting deleted if the server is required to remove the ciphertext. The server never knows the difference.
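The three modes can be shown with one key-derivation function. This is a minimal sketch under my own naming (real designs such as Tahoe-LAFS use an HMAC construction rather than bare concatenation): the derived key is stored identically in every case, so the server cannot tell which convergence mode a client chose.

```python
import hashlib
import os

def derive_key(contents: bytes, convergence_key: bytes = b"") -> bytes:
    # Empty convergence key -> global convergence (everyone dedupes).
    # Shared secret         -> dedupe only within the sharing group.
    # Random nonce          -> no convergence at all.
    # The stored key material looks the same to the server either way.
    return hashlib.sha256(convergence_key + contents).digest()

data = b"some file contents"
global_a = derive_key(data)                  # two independent clients,
global_b = derive_key(data)                  # no convergence key
group_a  = derive_key(data, b"shared secret")
private  = derive_key(data, os.urandom(32))  # random nonce: never dedupes

assert global_a == global_b   # global convergence: identical keys
assert group_a != global_a    # different convergence key: no cross-dedupe
assert private != global_a and private != group_a
```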


This could even actually be done by the user before storing it on the cloud service, and finding duplicates would be trivial server-side. (Though I don't see much incentive for a person to do this, since it only benefits the hoster.) For example, in the Mega interface, a user could specify the length of the convergence key (a random salt whose length inversely affects the likelihood of de-duplication on the host) with a default length of 0. This would then be part of the "key" proper, as those bits are required to access the original file.


And it should be done such that the server treats everything the same. The incentive comes from deduped files counting less against storage quotas, and no time spent uploading the file. I'm just commenting on the general approach here, not the applicability to any particular type of service.

But your 'random salt' idea suffers from the attacker just generating all possible encryptions of the plaintext, due to the small number of possibilities. The "convergence key" is instead a full security-parameter-length key that you can pass around to your friends so that your files will dedupe with theirs while not being susceptible to confirmation attacks by others.
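The confirmation attack on a short salt is cheap to demonstrate. A hypothetical sketch: if the salt is only 2 bytes, an attacker who suspects a particular plaintext just enumerates all 65,536 salts and checks which derived key matches the one observed on the server, confirming the guess. (All names and the toy derivation are illustrative.)

```python
import hashlib
from itertools import product

def derived_key(plaintext: bytes, salt: bytes) -> bytes:
    # Toy stand-in: this value identifies the ciphertext for dedup.
    return hashlib.sha256(salt + plaintext).digest()

# Victim uses a short (2-byte) salt, as in the proposal above.
victim_salt = b"\x3a\x7f"
suspected = b"leaked-document contents"
observed = derived_key(suspected, victim_salt)

# Attacker enumerates every possible 2-byte salt -- only 65,536 tries.
found = None
for salt in (bytes(t) for t in product(range(256), repeat=2)):
    if derived_key(suspected, salt) == observed:
        found = salt
        break

assert found == victim_salt   # guess confirmed despite the salt
```

A full-length (e.g. 128-bit) convergence key makes this enumeration infeasible, which is the point of the distinction above.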



