> Files and folders, therefore, are encrypted with symmetric encryption. Symmetric encryption means the same key is used to encrypt and decrypt your data; this is less secure than asymmetric encryption (where one key encrypts and a different key decrypts), but it's faster and easier to implement.
Asymmetric encryption is not appropriate for encryption of large amounts of data (or would require incredibly large keys to work). Symmetric encryption is always used to encrypt large chunks of data, and asymmetric encryption can be used to encrypt a symmetric key.
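A minimal Python sketch of that hybrid pattern, built from loudly toy parts. `keystream_xor` is a hypothetical helper that uses a SHA-256 keystream in place of a real cipher such as AES-CTR. The RSA part is unpadded textbook RSA over the publicly known Mersenne primes 2^89 - 1 and 2^107 - 1, so nothing here is secure. It only shows the bulk data going through the fast symmetric cipher while the slow asymmetric step wraps just the 16-byte key.

```python
import hashlib
import os

def keystream_xor(key: bytes, data: bytes) -> bytes:
    # Toy stand-in for a real symmetric cipher (e.g. AES-CTR): XOR the data with
    # SHA-256(key || counter) blocks. Illustration only, not secure.
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + (offset // 32).to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)

# Toy textbook RSA over two well-known Mersenne primes. No padding, public primes:
# insecure, for illustration only. Requires Python 3.8+ for pow(e, -1, m).
P = 2**89 - 1
Q = 2**107 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))

def rsa_wrap(key: bytes) -> int:
    # Asymmetric step: encrypt only the small symmetric key.
    return pow(int.from_bytes(key, "big"), E, N)

def rsa_unwrap(wrapped: int, length: int = 16) -> bytes:
    return pow(wrapped, D, N).to_bytes(length, "big")

# Hybrid scheme: bulk data under a random symmetric key, that key under RSA.
file_key = os.urandom(16)
plaintext = b"a large file would go here " * 1000
ciphertext = keystream_xor(file_key, plaintext)
wrapped_key = rsa_wrap(file_key)
```

Whoever holds the private exponent `D` recovers `file_key` from `wrapped_key` and then decrypts the bulk data with the symmetric cipher; the RSA operation never touches the 27,000-byte payload.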
The article seems to hint that Mega was lazy in its implementation, but that's not really true.
> For the data stored in Mega, the encryption key used is generated for you at the time of sign-up and is itself hashed—or scrambled—using your account's password.
The key can't be hashed, or it couldn't be recovered: a hash is meant to be irreversible. Unless the article means that the key itself is derived from the password (a hash of it?). That would indeed be a major issue!
If the key is encrypted using the password (hopefully after running it through a PBKDF), it is fine.
The absence of a password change option is indeed worrying, but the absence of a password recovery option is reassuring. If you lose your password, then you lose the ability to decrypt your key. If you lose your ability to decrypt your key, you lose your data.
It is indeed problematic that you can't back up the key, though!
Why is this an issue? Other people call this 'issue' PKCS5.
That's why I suggested that a more appropriate approach would be to derive a key from the password (using a PBKDF; PKCS #5 defines one, PBKDF2) and use that key to encrypt the file encryption key.
This way, you can always change the password (provided you still know the old one) and re-encrypt the encryption key, without having to re-encrypt each file.
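A minimal Python sketch of that arrangement, assuming PBKDF2-HMAC-SHA256 from the standard library with an illustrative iteration count. The XOR "wrap" is a toy stand-in for a real key-wrapping cipher (AES key wrap or AES-GCM), and every function name here is hypothetical. The point is that a password change rewrites 16 bytes, not every file.

```python
import hashlib
import os

ITERATIONS = 100_000  # assumed PBKDF2 work factor, for illustration only

def derive_kek(password: str, salt: bytes) -> bytes:
    # PBKDF2 (PKCS #5) stretches the password into a 16-byte key-encryption key.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS, dklen=16)

def xor16(a: bytes, b: bytes) -> bytes:
    # Toy wrap of a 16-byte key; a real system would use AES key wrap or AES-GCM.
    return bytes(x ^ y for x, y in zip(a, b))

def wrap_master_key(master_key: bytes, password: str) -> dict:
    # A fresh salt per wrap, so the derived key differs on every password change.
    salt = os.urandom(16)
    return {"salt": salt, "wrapped": xor16(master_key, derive_kek(password, salt))}

def unwrap_master_key(account: dict, password: str) -> bytes:
    return xor16(account["wrapped"], derive_kek(password, account["salt"]))

def change_password(account: dict, old: str, new: str) -> dict:
    # Only the small wrapped key is rewritten; files encrypted under the
    # master key stay exactly as they are.
    return wrap_master_key(unwrap_master_key(account, old), new)
```

For example, `change_password(wrap_master_key(k, "old"), "old", "new")` yields a record that unwraps to the same `k` with `"new"`, so no stored file ciphertext has to change.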
It is in their interest to do de-duplication, and as they are still required to remove files under the DMCA, it will just mean that multiple people lose their infringing files at once. All someone then needs to do to upload the file again is change one byte.
The encryption also stops them from being able to implement a back door for the content industry to police the site themselves, as was the case with Megaupload.
If the file blocks are being encrypted with a unique key before they're uploaded, you seemingly wouldn't have to alter any bytes of a file that got hit with a DMCA request. You'd just have to re-upload it.
If every copy of THEHOBBIT.mp4 resulted in the same data after it was encrypted, i.e. if the encryption were a predictable transformation, I don't see how any court would give Mega a pass for not 'knowing' what the contents of an uploaded file were. Not only would they know; they would be explicitly generating a fingerprint of that unique file in creating the encrypted version. It'd be irrelevant as a legal shield.
Their clinging to deduplication proves they are not serious about anonymity and safety.
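To make the fingerprint point concrete, here is a minimal Python sketch of convergent encryption, assuming the file key is simply a truncated SHA-256 of the plaintext. `keystream_xor` is a hypothetical toy stand-in for a real cipher such as AES-CTR, and the file contents are invented. Two independent uploads of the same bytes produce the same ciphertext, and anyone holding the plaintext can compute the hash the server would index on.

```python
import hashlib

def keystream_xor(key: bytes, data: bytes) -> bytes:
    # Toy stand-in for a real cipher (e.g. AES-CTR); illustration only.
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + (offset // 32).to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)

def convergent_encrypt(plaintext: bytes) -> bytes:
    # The key is derived from the content, so equal plaintexts get equal keys
    # and therefore equal ciphertexts.
    key = hashlib.sha256(plaintext).digest()[:16]
    return keystream_xor(key, plaintext)

def fingerprint(plaintext: bytes) -> str:
    # Anyone who has the plaintext can compute this without the uploader's help.
    return hashlib.sha256(convergent_encrypt(plaintext)).hexdigest()

alice_upload = convergent_encrypt(b"THEHOBBIT.mp4 contents")
bob_upload = convergent_encrypt(b"THEHOBBIT.mp4 contents")
```

Changing a single byte of the input gives a completely different fingerprint, which is exactly the one-byte workaround discussed in this thread.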
I'd agree that they're not serious, if it's true that they're deterministically generating encryption keys based on file hashes. But the de-duplication part has nothing to do with it.
The biggest user-facing problem with torrents today is that it takes too long to find 'the right' or 'good' files: ones that aren't loaded up with malware, mislabeled, poorly encoded, etc. Having to push various one-byte-difference copies of files is going to explode the search problem for users. And, once found, 'good' files couldn't be popularized, lest they be pulled down. And once one user scored a 'good' file, they wouldn't be able to share it directly through the service, even with known associates, due to that same universal blacklist.
The usability of such a system sounds like a wet dream for driving people into the arms of for-pay alternatives.
Further, having a fingerprint on every file you upload is a direct violation of the marketing pitch of security and anonymity that Dotcom's making to the 'legit' crowd.
Consider if there were a security breach at a popular online service and a text file of compromised accounts released. The FBI could upload a copy and then subpoena Mega for information on every user who had that same copy.
Sure, people could alter their files themselves, but who wants to put up with that kind of cognitive overhead to prevent malicious prosecution/persecution?
As I've read elsewhere, do not store any confidential files on Mega: the encryption does not protect the user, but the platform itself.
User uploads file F. Mega computes the convergent encryption E(F) using the hash of the file, H(F). Mega then computes the hash of E(F), H(E(F)), uses it to determine that the file already exists in Mega, and deduplicates it.
Mega does not tell the user this, and the user's used storage still increases (thus, the RIAA cannot upload The Hobbit and determine that it's already there). The user enters their password P; the convergent hash H(E(F)) is encrypted with the user's password, giving P(H(E(F))), and is associated with the user only in that form. The hash of the original, H(F) (used to convergently encrypt the file), is also encrypted with the user's password, as P(H(F)).
On retrieval, the user enters their password P, and both P(H(E(F))) and P(H(F)) are decrypted. Mega now knows where to find the convergently encrypted file, using H(E(F)) to locate E(F). Mega decrypts it using the hash of the original, H(F), and returns the file F to the user.
If the password P is only stored hashed (as it should!), then there is no way to correlate a given infringing file with any other user's ownership. The user's account only contains P(H(E(F))) and P(H(F)), both of which are unique to that user.
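A minimal Python sketch of that proposal, under several labeled assumptions: "encrypted with the user's password" is read as encrypted under a PBKDF2-derived key, `keystream_xor` is a hypothetical toy cipher standing in for something like AES-CTR, and the per-record nonce is an addition needed only because the toy cipher must not reuse a keystream. The shared blob store holds one copy of E(F), while each user's record P(H(E(F))) || P(H(F)) looks different.

```python
import hashlib
import os

def keystream_xor(key: bytes, data: bytes) -> bytes:
    # Toy stand-in for a real cipher (e.g. AES-CTR); illustration only.
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + (offset // 32).to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)

def password_key(password: str, salt: bytes) -> bytes:
    # Assumption: "encrypt with the password" means a PBKDF2-derived key.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

blob_store = {}  # H(E(F)) -> E(F), shared by all users

def upload(plaintext: bytes, password: str, salt: bytes) -> bytes:
    h_f = hashlib.sha256(plaintext).digest()   # H(F), the convergent key
    e_f = keystream_xor(h_f, plaintext)        # E(F)
    h_ef = hashlib.sha256(e_f).digest()        # H(E(F)), the storage address
    blob_store.setdefault(h_ef, e_f)           # stored once, however many users upload F
    nonce = os.urandom(16)                     # assumed: per-record nonce for the toy cipher
    # The user's only record is P(H(E(F))) || P(H(F)).
    return nonce + keystream_xor(password_key(password, salt) + nonce, h_ef + h_f)

def download(record: bytes, password: str, salt: bytes) -> bytes:
    nonce, body = record[:16], record[16:]
    refs = keystream_xor(password_key(password, salt) + nonce, body)
    h_ef, h_f = refs[:32], refs[32:]
    return keystream_xor(h_f, blob_store[h_ef])  # locate E(F), decrypt with H(F)
```

As in the description, Mega still sees F at upload time here; what the scheme hides is the link between a stored blob and the accounts that reference it.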
Anyone see a problem (other than the implementation details and possible lack of motivation for Mega to do so)?
Basically, if a user's 'tree' is encrypted with his password, I don't think anyone can identify who has a particular file. It does allow revoking the file for everyone in one operation, though.
> Symmetric encryption means the same key is used to encrypt and decrypt your data; this is less secure than asymmetric encryption (where one key encrypts and a different key decrypts), but it's faster and easier to implement.
Isn't that comparing apples to oranges? What would be the benefit for Mega or the user to switch to RSA for that? Encryption would be using a symmetric cipher anyway, unless I'm missing something.
After all, unless you audit all the code fetched each time you load the page, they can mess with the code client side at any moment without anybody noticing. Is this not more telling about the limits of webdev rather than the skills of Mega's coders?
I haven't studied what Mega is doing at all, nor am I ever planning to; my point is just that however badly written the article is, there is indeed evidence that there's bad crypto in this system.
AFAIK this applies to any software.
> Symmetric encryption means the same key is used to encrypt and decrypt your data; this less computationally-intensive and easier to implement than asymmetric encryption, which we'll get to in a moment.
"Files and folders, therefore, are encrypted with symmetric encryption. Symmetric encryption means the same key is used to encrypt and decrypt your data; this is less secure than asymmetric encryption (where one key encrypts and a different key decrypts), but it's faster and easier to implement."
Symmetric ciphers are used almost exclusively for the encryption of large files. They're more secure, faster, and easier to work with.
This actually doesn't follow at all. What files a given user has in their account doesn't have to be known to the server, though it's generally not difficult to correlate gets/puts/deletes to accounts. If the hierarchy is handled on the client side and encrypted like everything else, then all the server knows is: we store blobs identified by keys, and when those keys are overlapping, we don't need to store a second copy.
This does make keeping track of storage usage impossible.
After all, if each file is being encrypted with different keys, then Alice's and Bob's encrypted copies of THE HOBBIT wouldn't match at all. Merely removing the file from Mega's servers and re-uploading it would seemingly change the key, and thus the encrypted data, entirely, right?
And when encrypted blocks do happen to match, there'd be absolutely no way of knowing whether they represent the same block of the same source file. In fact, one could be reasonably confident that a block overlap was purely incidental, given the random unique keys. (barring completely broken key generation)
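For contrast with content-derived keys, a minimal Python sketch of the random-key case, assuming a fresh `os.urandom(16)` key per upload. `keystream_xor` is again a hypothetical toy cipher standing in for AES-CTR, and the data is invented. The same plaintext uploaded twice yields unrelated ciphertexts, so there is nothing for the server to match on.

```python
import hashlib
import os

def keystream_xor(key: bytes, data: bytes) -> bytes:
    # Toy stand-in for a real cipher (e.g. AES-CTR); illustration only.
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + (offset // 32).to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)

def encrypt_with_random_key(plaintext: bytes):
    # A fresh random 128-bit key per upload, independent of the content.
    key = os.urandom(16)
    return key, keystream_xor(key, plaintext)

data = b"THE HOBBIT, chunk 1 " * 50
alice_key, alice_copy = encrypt_with_random_key(data)
bob_key, bob_copy = encrypt_with_random_key(data)
```

Each holder can still decrypt their own copy with their own key, but `alice_copy` and `bob_copy` share no structure, and a re-upload produces yet another unrelated ciphertext.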
As far as I can tell, they're generating the keys for the files from a hash of the file, meaning the keys are not random and unique (for the files -- user keys are different). I described rough steps for secure dedupe in another comment below.
They're going to wind up having to maintain a hash blacklist which is going to be just annoying enough that people aren't going to bother.
It's a shame Mega isn't just running an encrypted block-level store. They should have shipped client binaries that handle the encryption and key generation and simply exchange blocks with Mega's servers.
Suppose they are using 1MB chunks. The keys are 16 bytes. A 1MB chunk may be any of 2^(8 million-ish) different things; the keys only 2^128 different things. Thus, for two different 1MB chunks to turn into the same encrypted data under different keys, one of the 2^128 possible encryptions of the first must coincide with one of the 2^128 possible encryptions of the second, in a possibility space of 2^(8 million-ish). That any two different chunks would even have such an overlap available is exceedingly, exceedingly improbable, let alone that the two users would end up with exactly the right keys to actually produce it.
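The back-of-the-envelope numbers can be checked in a few lines of Python. This sketch assumes "1MB" means 1 MiB (2^20 bytes) and uses a heuristic: treat each chunk's 2^128 possible ciphertexts as random points in the space of all chunk values, so the expected number of coincidences is about 2^128 · 2^128 / 2^(chunk bits). The function name is hypothetical.

```python
def log2_expected_overlaps(chunk_bytes: int, key_bits: int = 128) -> int:
    # A chunk of n bytes can take 2**(8n) values; each chunk has at most
    # 2**key_bits encryptions. Heuristically, two such sets overlap in about
    # 2**(2*key_bits - 8n) places; return that exponent.
    chunk_bits = 8 * chunk_bytes
    return 2 * key_bits - chunk_bits

one_mib = 1024 * 1024
print(8 * one_mib)                      # 8388608 bits, the "8 million-ish"
print(log2_expected_overlaps(one_mib))  # -8388352
```

The same function with `64 * 1024` gives 256 - 524,288, matching the 2^(524,288) figure for a 64KB block further down.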
There are only two plausible theories when an overlap occurs: 1. It is the same file with the same key or 2. Someone's worked out how to artificially create a collision. And those two are hardly equal in probability either....
Yeah, that's a simplification, since there aren't actually 2^128 valid keys. Roll with it. Actually, I've taken a number of small liberties for simplicity; none of them matter.
People have this weird fetish around deduplication, but it isn't magic. It makes it so the tenth stored copy of a Windows XP backup hardly costs you any space. This is where the astonishing compression claims come from. It does not magically compress much of anything else, though. The claims are true, but not generally applicable. In practice, the middle bits of otherwise unrelated files don't get de-duped, except for a couple of obvious and rare cases like 'a megabyte of zero padding in the middle of a file', which hardly amount to anything. Your World of Warcraft texture file is simply not going to overlap with your eBook copy of 50 Shades of Grey, and in general, two things sampled even from a 2^(524,288) possibility space, which corresponds to an absurdly small 64KB block size, are very unlikely to collide.
Block size has very little effect on collisions; what collides are identical files, or files that are nearly identical because they are versions of the same thing (and block size tends to matter surprisingly little there for various reasons), and what doesn't collide is everything else.
For that matter, if you're writing a library to handle an existing file format that has a flexible location for metadata, please put any metadata (especially user-generated metadata) as far back in the file as possible.
Of course, the main advantage is that you don't have to shift all of the data around if the user starts editing file metadata, but it does also help block-level deduplication.
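A minimal Python sketch of why metadata placement matters for block-level deduplication, assuming fixed 64 KiB blocks identified by SHA-256 (real systems may use other block sizes or content-defined chunking). The payload and tags are invented. Appending the metadata leaves every payload block aligned; prepending it shifts every block boundary.

```python
import hashlib
import os

BLOCK_SIZE = 64 * 1024  # assumed fixed block size

def block_hashes(data: bytes) -> list:
    # Split into fixed-size blocks and hash each one, as a block-level
    # deduplicating store might.
    return [hashlib.sha256(data[i:i + BLOCK_SIZE]).hexdigest()
            for i in range(0, len(data), BLOCK_SIZE)]

def shared_blocks(a: bytes, b: bytes) -> int:
    # Number of distinct blocks the store could keep only once.
    return len(set(block_hashes(a)) & set(block_hashes(b)))

payload = os.urandom(10 * BLOCK_SIZE)  # stand-in for the bulk of a media file
tag_v1, tag_v2 = b"title=Draft", b"title=Final cut"

# Metadata at the end: only the final, short block differs between versions.
tail_a, tail_b = payload + tag_v1, payload + tag_v2
# Metadata at the start: every block boundary shifts, so nothing lines up.
head_a, head_b = tag_v1 + payload, tag_v2 + payload

print(shared_blocks(tail_a, tail_b), shared_blocks(head_a, head_b))  # 10 0
```

Editing a trailing tag costs one new block; editing a leading tag of a different length costs a whole new copy of the file.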
1MB just sounded much larger than what I've read and seen. At that size, yeah, about the only thing you're going to be de-duplicating is entire files. Interesting that smaller block sizes don't de-duplicate enough in practice to justify it.
But I'm guessing they wanted to do it that way so that the key generation is done on the client side. They could generate high-quality keys on their servers through the normal means of doing so, but then they would have access to all users' private keys, and part of the point of this service is that users don't want to have to trust the service (i.e., Mega).
How could they de-dupe stuff without knowing what it is in the first place?
(Edit: I'm talking about the proposal from previous threads of encrypting with a random key, not encrypting with the hash.)
1) Alice takes file P and hashes it to produce key K
2) Alice encrypts file P with key K to produce file C
3) Alice encrypts key K with key U (her user key) to produce key X
4) Alice sends key X and file C to Mega for storage
When Bob does the same steps on a matching file, the only things that differ are key U and key X: his user key, and his copy of the file key wrapped under that user key. That means that Mega can deduplicate file C freely, because the two copies are identical.
Note that Mega never knows the hash of the original file -- key K -- or it'd be able to decrypt the files.
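The four steps can be sketched in Python as follows. Assumptions, labeled: `keystream_xor` is a hypothetical toy cipher standing in for something like AES-CTR, the 16-byte truncated SHA-256 is one possible choice for "hashes it to produce key K", and the nonce in step 3 exists only so the toy cipher never reuses a keystream. Mega receives only X and C.

```python
import hashlib
import os

def keystream_xor(key: bytes, data: bytes) -> bytes:
    # Toy stand-in for a real cipher (e.g. AES-CTR); illustration only.
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + (offset // 32).to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)

def client_upload(plaintext: bytes, user_key: bytes):
    file_key = hashlib.sha256(plaintext).digest()[:16]      # 1) K = hash(P)
    c = keystream_xor(file_key, plaintext)                  # 2) C = P encrypted under K
    nonce = os.urandom(16)                                  # assumed nonce for the toy cipher
    x = nonce + keystream_xor(user_key + nonce, file_key)   # 3) X = K encrypted under U
    return x, c                                             # 4) only X and C go to Mega

def client_download(x: bytes, c: bytes, user_key: bytes) -> bytes:
    nonce, wrapped = x[:16], x[16:]
    file_key = keystream_xor(user_key + nonce, wrapped)     # recover K locally
    return keystream_xor(file_key, c)

mega_blobs = {}  # what the server keeps: hash(C) -> C

alice_u, bob_u = os.urandom(16), os.urandom(16)
alice_x, alice_c = client_upload(b"matching file", alice_u)
bob_x, bob_c = client_upload(b"matching file", bob_u)
for c in (alice_c, bob_c):
    mega_blobs.setdefault(hashlib.sha256(c).digest(), c)
```

The server ends up with one blob and two different X values. Note that K is still derivable by anyone who already has P, which is the convergence concern raised in the replies.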
This implies that it is not convergent, unless dedup is somehow done with the "Meta MAC".
The trick lies in ensuring the key used for encryption is linked to the file, not the user. You then have to store a separate key for each encrypted file (and that encryption key is probably going to need to be wrapped by your actual user key for storage).
It seems like either the server must know the file's encryption key, or the key must be derivable from the contents... which is no good.
Chrome has native crypto.getRandomValues() and Safari's Math.random() was changed to use Arc4 PRNG back in 2008. I'm not sure if it has been updated to account for the flaws in the first bytes of Arc4, but if it has then this is cryptographically strong.
I don't know where Firefox and the other browsers stand on CSPRNGs, but as more and more implement crypto.getRandomValues(), this will improve services relying on client-side crypto.
If this is true, why does an RSA key have to be significantly longer than an AES key?
In fact in most implementations asymmetric crypto is not used for exchanging or storing data. It's used to exchange a key that can be used for symmetric crypto.