8. Our service may automatically delete a piece of data you upload or give someone else access to where it determines that that data is an exact duplicate of original data already on our service. In that case, you will access that original data.
Duplicate check, I get that. But how do they do it? They say the files are encrypted in the browser, so if I upload file X and another user uploads X too, they can't know they're the same, because both uploads are encrypted. So they can only check for duplicates of the encrypted outcome of each file. But wouldn't that be inefficient? The probability of a collision between encrypted files is (AFAIK) really low, something like 2^(-N), N being the size of the file in bits... For a 1 MB file that works out to 2^(-8,388,608), or roughly 10^(-2,525,223).
EDIT: Added example.
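For reference, the back-of-the-envelope math in Python (the 1 MB file is just the example above, nothing more):

    import math

    size_bits = 1 * 1024 * 1024 * 8            # a 1 MB file, in bits
    log10_p = -size_bits * math.log10(2)       # log10 of 2^(-N)
    print(log10_p)                             # about -2525222.9, i.e. p ~ 10^(-2525223)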
Apparently, they use unsalted symmetric key encryption which allows them to discover [hash(file), password] duplicates. By comparison, the old Megaupload would apparently deduplicate based on [hash(file)] matches.
Suppose Alice and Bob have files [D, E, F] and [F, G, H], respectively. If MEGA discovers that Alice and Bob share a duplicate file F and Alice reveals her password (through password frequency analysis, or by handing it over to the Government), then all of Bob's files are compromised.
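A toy way to picture the two dedup criteria (the concatenation below is my own simplification for illustration, not MEGA's actual key derivation):

    import hashlib

    def megaupload_dedup_key(file_bytes):
        # old Megaupload, as described above: dedup purely on hash(file)
        return hashlib.sha256(file_bytes).hexdigest()

    def mega_dedup_key(file_bytes, password):
        # new MEGA, as described above: two uploads only collide when both the
        # file and the unsalted, password-derived key material match
        return hashlib.sha256(password.encode() + file_bytes).hexdigest()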
I would personally feel unsafe storing private documents on Mega due to the lack of public/private key encryption, but that's me.
It is probably worth pointing out that all of this encryption stuff is only for them. No matter how they brand it, the purpose is to give them plausible deniability, which they lacked with the previous Mega.
If you want encryption meant for you, then you should probably just encrypt your files yourself before you let Mega touch them.
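A minimal sketch of that with the Python cryptography package (the file name is made up; the key has to live somewhere Mega never sees):

    from cryptography.fernet import Fernet

    key = Fernet.generate_key()                    # keep this locally, never upload it
    f = Fernet(key)

    with open("taxes-2013.pdf", "rb") as fh:       # hypothetical file
        ciphertext = f.encrypt(fh.read())

    with open("taxes-2013.pdf.enc", "wb") as fh:   # upload this instead
        fh.write(ciphertext)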
Morally ok? That is subject to opinion. I don't have a problem with them now, and I never did.
Legally immune? They very well could be. Certainly any action against them is going to be much harder this time, if only because a bunch of people in New Zealand are still pissed about what happened last time without the revisions that were made. If their system works as they claim, and renders them unable to govern content, then how could they be considered culpable for content? If I start posting nasty stuff encrypted with PGP to HN, would HN be to blame for failing to recognize the nasty stuff and remove it?
Hm. This makes it sound like your point stands regardless of personal ethics; as though
> profiting off of the distribution of other people's work, without their permission, is not something that should be encouraged or tolerated
is a provable, obvious, non-debatable stance. I don't think it is.
I don't think profiting off it changes the basic ethics; if something is morally okay to do as a hobby, it should generally be morally okay to do as a business until proven otherwise.
I wouldn't agree you always need the permission of someone upstream to share ideas or content. That's something we can discuss. Don't make it sound like you have the answer sheet in front of you.
Is the line about intent rather than technicality? For example, how driving a bus that a drug dealer is on is not illegal, but driving the getaway car from a robbery is? If it is, then how does one prove that Dotcom wants to support copyright violation, rather than his official stance of just believing that there is a level past which he cannot be expected to police his customers?
Enforcement of the standard you're promoting would require pervasive surveillance of every data storage and transfer method in the world, and backdoor access for all forms of encryption.
Freedom of information and communication is orders of magnitude more important for a healthy and free society than copyright of digital goods, and you can't have both--they are fundamentally incompatible.
To address your latest spurious post: Dropbox, Gmail, etc. facilitate file sharing on a small scale. Public links to Megaupload files listed on public aggregator websites that list the latest 'releases' are an entirely different matter. You obviously know that but are willfully blind to it to make stupid arguments.
Let's say you are right. Explain to me how you would justify the pricing scheme we have for digital media to the average person on this planet, who most likely is Indian/Chinese/Nigerian and makes less than $5 a day.
How much should we charge this individual? Do you think they have the same right as Westerners to challenge themselves and experience other cultures?
No, you assume that piracy is only committed by white people who don't have a large allowance, which is a childish view perpetuated by the media. Most people don't have a computer, and if they are lucky enough to get one, they should be able to have access to a large variety of the same digital goods.
tl;dr show some goddamn empathy.
Don't you already have it in your head that danenania's position is incorrect, and that it's therefore a waste of time to argue the subject with you? It certainly seems so, especially since danenania isn't actually arguing for piracy, just that (s)he considers the steps needed to eliminate piracy to go against more important values.
As an analogy, the fact that I defend almost absolute free speech doesn't mean I'm in favor of all speech, it just means that eliminating said speech is worse than allowing it to exist freely.
Because clearly they profit off the distribution of other people's work without their permission every day.
What do you have to say to those of us who disagree completely and fully with the entire concept of intellectual property?
With the way I feel about copyright, I believe the distribution of other people's work, with permission or otherwise, is something that should be actively encouraged and praised.
Whether they profit off of it or not is completely irrelevant.
And if Mega is de-duping content, then that could technically eliminate "whack-a-mole" for the copyright holders (except for people re-encoding the file and re-uploading). I had heard in the past that Megaupload would only take down one link.
And presumably, unless they think they can legally get away with ignoring it, they will.
What they won't be able to do is respond to a request that says "Delete all copies of [Big movie of the year], and continue to delete all of our movies as they pop back up."
I feel like a goddamn broken record.
This is the important bit. Files can be shared between private groups, using Mega as an intermediary, without them ever becoming indexed on the public web. Previously, files shared in this manner could still be a target of a takedown, because Megaupload would know they had them, even if nobody else did. Now Mega can make a stronger guarantee about keeping this kind of sharing "safe", because they have no idea whether they're hosting this kind of file or not.
As soon as the sharing group got big enough to notice, then there could be agents in the group reporting infringement.
It would kind of function like hard links on linux file systems.
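For anyone who hasn't played with hard links, a tiny demo (file names made up):

    import os

    open("original.iso", "wb").write(b"some data")   # throwaway file for the demo
    os.link("original.iso", "second-name.iso")       # two names, one set of data blocks
    print(os.stat("original.iso").st_ino == os.stat("second-name.iso").st_ino)  # True
    print(os.stat("original.iso").st_nlink)          # 2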
The document does say it uses public/private keys for data transfer. You would not usually use public/private for data storage because of the huge keysizes required.
If they use asymmetric encryption for data transfer, then how does that work as part of a convergent encryption scheme? Wouldn't all of the file hashes be different at that point?
It seems really interesting, so they can check for duplicates while keeping files secure. Thanks!
"However, one more attack: an attacker can guess plaintexts and test if you have that file."
If that's the case, pirates beware.
Edit: GNUnet quote: "The gnunet encryption scheme is remarkable in that it allows identical files encrypted under different keys to yield the same ciphertext, modulo a small block of metadata. These files can then be split into small blocks and distributed (and replicated if need be) across hosts in a gnunet system to balance load."
Efficient Sharing of Encrypted Data - Krista Bennett, Christian Grothoff, Tzvetan Horozov, Ioana Patrascu
They wouldn't really want that, would they? So as clever as it is, I doubt they do it this way.
The thing with copyrighted content, though, is that even if the file you're checking might be infringing on copyrights in certain cases, in other cases it might as well be completely legit. I wrote about this on some earlier MU submission, so I won't repeat all that here, but all in all, even if you knew that file X existed on Mega's servers, it would be pretty damn haphazard to just outright delete it, because you might be hurting many legitimate users by doing so.
Anyway, I think Mega could secure users' files simply by encrypting the locator keys they hold with the user's own key, with this data only getting decrypted and parsed client-side when the user accesses Mega with their own key. This way you could only prove that a file exists on Mega's servers, but would have no way to check which user(s) it belongs to without cracking the individual user data one by one. And of course, if you don't have any exact files to check against Mega, then you wouldn't be able to figure out whether "content X" is hosted there at all, and neither could Mega (since they'd naturally only store locator hashes and the encrypted data itself).
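Very roughly something like this (Fernet and all the names are my own stand-ins, not Mega's actual scheme):

    import hashlib, json
    from cryptography.fernet import Fernet

    user_key = Fernet.generate_key()        # in reality derived from the user's password
    f = Fernet(user_key)

    blob = b"...encrypted file contents..."
    locator = hashlib.sha256(blob).hexdigest()     # what the server indexes the blob by

    # the per-user index never leaves the client in plaintext
    index = {"Holiday photos.zip": locator}
    encrypted_index = f.encrypt(json.dumps(index).encode())

    # the server stores only {locator: blob} plus encrypted_index, so it can show
    # that a blob exists but not which user(s) it belongs to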
True, and it seems that Mega's copyright infringement reporting page has an option for that:
Takedown Type: Remove all underlying file(s) of the supplied URL(s) - there is no user who is permitted to store this under any circumstance worldwide
One solution to that would be to have two hashes of the file, and to use one as the key and one as the index.
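For example, two domain-separated hashes of the same plaintext (my own construction, just to illustrate the idea):

    import hashlib

    data = b"...file contents..."
    key   = hashlib.sha256(b"key:"   + data).digest()   # used to encrypt the file
    index = hashlib.sha256(b"index:" + data).digest()   # used for dedup lookups

    # the server learns `index` but can't derive `key` from it, since the two
    # values are only related through the plaintext itself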
This is exactly the reason why sensitive data needs to be encrypted before deduplication.
OFF System is also one interesting approach to this problem: https://en.wikipedia.org/wiki/OFFSystem
These are more or less "funny" workarounds to the actual problem.
This is only a reasonable approach when you can assume that every plaintext is unique, which also makes deduplicating the data absolutely pointless.
Example: Alice and Bob both upload the same file to Mega. Alice is raided by the MAFIAA. They get a court order telling Mega to list all users having a copy of that file. Can Mega comply? Or does not even Mega know that Bob has the same file too?
Related question: Is there any way for Bob to share his uploaded file with friends?
The idea is simple: if you have data, you can generate the key from the data itself to encrypt and decrypt that data. Then you use the hash of the encrypted data to look up whether the server (or just the other side) already has the same data. If the hash exists, there is no need to send anything - just tell the server to bump the reference count. If the hash does not exist, just upload the data. The key derived from the data needs to be stored locally - if you lose it, you will lose the data too.
key = f(data);
Which is to say that every file has a globally 1-to-1 mapping to its encrypted version. I'm not sure how they are storing the (User, [(Filename,Key)]) data, but this is ideally encrypted on a per-user basis, making any sort of per-user lookup attack moot.
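A minimal convergent-encryption sketch of that scheme (AES-GCM with a fixed nonce and SHA-256 are my stand-ins; Mega's actual primitives and key derivation may differ):

    import hashlib
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    def convergent_encrypt(data):
        key = hashlib.sha256(data).digest()                  # key = f(data)
        ct = AESGCM(key).encrypt(b"\x00" * 12, data, None)   # fixed nonce: deliberately deterministic
        locator = hashlib.sha256(ct).hexdigest()             # what the server dedups on
        return key, locator, ct

    # identical plaintexts give identical ciphertexts, so the server just bumps a refcount
    k1, loc1, ct1 = convergent_encrypt(b"same file")
    k2, loc2, ct2 = convergent_encrypt(b"same file")
    assert loc1 == loc2 and ct1 == ct2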
Encrypted data is indistinguishable from random data, so it is incompressible. The chances of finding anything significant to deduplicate are so low that it's not worth trying.
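Easy to check, with random bytes standing in for well-encrypted data:

    import os, zlib

    random_blob = os.urandom(1024 * 1024)
    print(len(zlib.compress(random_blob)))   # slightly larger than the input: no savings, just overhead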
For that reason, whenever I'm sharing anything, I usually encrypt the files with my recipients' public keys before sending them out, just to make sure that the data is really private and the keys are known only to my selected peers. In some cases, when I want to make things even more private, I encrypt the data separately with each recipient's public key, so you can't even see the list of public-key IDs that are required to decrypt the data.
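Roughly this idea, sketched with the Python cryptography package (a hybrid RSA+Fernet construction of my own; alice_pub and bob_pub are hypothetical recipient keys):

    from cryptography.fernet import Fernet
    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric import padding

    def encrypt_for(recipient_public_key, plaintext):
        file_key = Fernet.generate_key()                 # fresh symmetric key per recipient
        ciphertext = Fernet(file_key).encrypt(plaintext)
        wrapped_key = recipient_public_key.encrypt(      # only that recipient can unwrap it
            file_key,
            padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                         algorithm=hashes.SHA256(), label=None))
        return wrapped_key, ciphertext

    # encrypted separately per recipient, so nobody sees who else got a copy:
    # package_for_alice = encrypt_for(alice_pub, data)
    # package_for_bob   = encrypt_for(bob_pub, data)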
I also have a 'secure workstation' which is hardened and not connected to the internet. That's the workstation I use to decrypt, handle, and encrypt data. Only encrypted and signed data is allowed in and out of that workstation.
Not sure I understand this: how can you deduplicate if every file is encrypted with a random key?
Then, on the user side, they store a per-user encrypted index (random, counter, MAC) to those individual chunks to represent the file.
That way, they can only see giant encrypted blocks of data, and per-user encrypted indexes to data. But it is all encrypted.
They would need to hack into accounts by keylogging passwords to decrypt the indexes and see what files users can actually access.
Public links could be shared by giving out, in the URL, the key to a file containing indexes to the other blocks. So whoever knows the URL knows the index, and can get the data.
That is the way I'd design it, at least... :)
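A toy version of the "key in the URL" part (the URL format and domain are made up; the point is that the key rides in the fragment, which browsers never send to the server):

    import base64, os

    file_id = "9a8F3kQ"                     # hypothetical server-side handle
    key = os.urandom(16)                    # the file's symmetric key
    share_url = "https://mega.example/#!%s!%s" % (
        file_id, base64.urlsafe_b64encode(key).decode().rstrip("="))
    print(share_url)    # whoever has the link has both the locator and the key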
They never get to see your key or what is in the file.
Of course this will only give you the key to that one particular file, not any other files that you do not have yourself.
Assuming that it works this way, it would allow Mega to figure out if you own a known "bad" file. Just take something like "New_Jay-Z_Album.zip," hash it, and try the hash against your encrypted files. It seems like Kim is trying to avoid this problem.
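Reusing the convergent-encryption sketch from earlier in the thread, the check would look something like this (the file name is from the example above; stored_locators stands in for whatever dedup index Mega keeps):

    import hashlib
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    def locator_for(plaintext):
        key = hashlib.sha256(plaintext).digest()
        ct = AESGCM(key).encrypt(b"\x00" * 12, plaintext, None)
        return hashlib.sha256(ct).hexdigest()

    stored_locators = set()                 # stand-in for the server-side dedup index
    suspected = open("New_Jay-Z_Album.zip", "rb").read()
    if locator_for(suspected) in stored_locators:
        print("someone is storing this exact file")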
"File integrity is verified using chunked CBC-MAC. Chunk sizes start at 128 KB and increase to 1 MB, which is a reasonable balance between space required to store the chunk MACs and the average overhead for integrity-checking partial reads."
Also, if you believe Microsoft Outlook, there's something called "compressible encryption", which implies there are encryption schemes that aren't exactly random, meaning in turn that not all N-bit blocks are equally likely.
not to mention the metadata associated with such a small block would probably be larger than the block itself.
However, given a file, Mega would be able to tell whether they stored that file, eliminating the plausible deniability they're aiming for.