
I just found a weird thing in their ToS [1] ...

8. Our service may automatically delete a piece of data you upload or give someone else access to where it determines that that data is an exact duplicate of original data already on our service. In that case, you will access that original data.

Duplicate check, I get that. But how do they do it? They say the files are encrypted in the browser, so if I upload file X and another user uploads X too, they can't know they're the same, because both uploads are encrypted. So they can only check for duplicates of the encrypted output of each file. But wouldn't that be inefficient? The probability of a collision between encrypted files is (AFAIK) really low, something like 2^(-N), N being the size of the file in bits... If I did it right, that'd be a collision probability of 7.458E-155 for a 1 MB file.

[1] https://mega.co.nz/#terms

EDIT: Added example.




"A node token ("magic cookie") grants access to a subtree of the issuing user's filesystem. An associated symmetric key is required to decrypt and/or store decryptable data."

https://mega.co.nz/#developers

Apparently, they use unsalted symmetric key encryption which allows them to discover [hash(file), password] duplicates. By comparison, the old Megaupload would apparently deduplicate based on [hash(file)] matches.

Suppose Alice and Bob have files [D, E, F] and [F, G, H], respectively. If MEGA discovers that Alice and Bob share a duplicate file F and Alice reveals her password (through password frequency analysis or to the Government), then all of Bob's files are compromised.

I would personally feel unsafe storing private documents on Mega due to the lack of public/private key encryption, but that's me.


> I would personally feel unsafe storing private documents on Mega due to the lack of public/private key encryption, but that's me.

It is probably worth pointing out that all of this encryption stuff is only for them. No matter how they brand it, the purpose is to give them plausible deniability, which they lacked with the previous Mega.

If you want encryption meant for you, then you should probably just encrypt your files yourself before you let Mega touch them.


It's pretty embarrassing that tech blogs are even covering this. Copyright law is about intent, not about technical specifics. The idea that new megaupload is ok because it encrypts illegitimate content before storing it is absurd.


What do you mean by "ok"?

Morally ok? That is subject to opinion. I don't have a problem with them now, and I never did.

Legally immune? They very well could be. Certainly any action against them is going to be much harder this time, if only because a bunch of people in New Zealand are still pissed about what happened last time without the revisions that were made. If their system works as they claim, and renders them unable to govern content, then how could they be considered culpable for content? If I start posting nasty stuff encrypted with PGP to HN, would HN be to blame for failing to recognize the nasty stuff and remove it?


However you feel about copyright, profiting off of the distribution of other people's work, without their permission, is not something that should be encouraged or tolerated.


> However you feel about copyright

Hm. This makes it sound like your point stands regardless of personal ethics; as though

> profiting off of the distribution of other people's work, without their permission, is not something that should be encouraged or tolerated

is a provable, obvious, non-debatable stance. I don't think it is.

I don't think profiting off it changes the basic ethics; if something is morally okay to do as a hobby, it should generally be morally okay to do as a business until proven otherwise.

I wouldn't agree you always need the permission of someone upstream to share ideas or content. That's something we can discuss. Don't make it sound like you have the answer sheet in front of you.


Roughly speaking I agree with you, but where do you draw the line? If I download a movie illegally right now then chances are Google and Firefox are immediately benefiting from it, and then even further away are Microsoft (OS), hardware manufacturers, etc. who profited off enabling me to do this.

Is the line about intent rather than technicality? For example how driving a bus that a drug dealer is on is not illegal, but driving the getaway car from a robbery is? If it is, then how does one prove that Dotcom wants to support copyright violation, rather than his official stance of just believing that there is a level past which he can not be expected to police his customers?


What, in the same way that manufacturing syringes that will be used to mainline illegal narcotics should not be "encouraged or tolerated"?


It takes some twisted logic to think that's a valid analogy. Dropbox is a perfectly valid file-hosting service because they take reasonable measures to prevent copyright infringement. Megaupload (and Mega) is not, because it's run by a person who has no real interest in preventing copyright infringement, and has shown that he's more than willing to profit off it while he pretends that he doesn't know it's occurring or can't prevent it.


Define 'reasonable measures'. People use Dropbox to share copyrighted material all the time. Ditto S3, gmail, external hard drives, and any file storage method in existence.

Enforcement of the standard you're promoting would require pervasive surveillance of every data storage and transfer method in the world, and backdoor access for all forms of encryption.

Freedom of information and communication is orders of magnitude more important for a healthy and free society than copyright of digital goods, and you can't have both--they are fundamentally incompatible.


Arguing against piracy on the internet is a giant waste of my time because if you have it in your head that piracy is "good" or "ok" then you will rationalize bad arguments all day defending an incorrect position.

To address your latest spurious post: Dropbox, Gmail, etc. facilitate file sharing on a small scale. Public links to Megaupload listed on public aggregator sites that list the latest 'releases' are an entirely different matter. You obviously know that but are willfully blind to it to make stupid arguments.


No, you are taking a childish view of the defenders of piracy.

Let's say you are right. Explain to me how you would justify the pricing scheme we have for digital media to the average person on this planet, who most likely is Indian/Chinese/Nigerian and makes less than $5 a day.

How much should we charge this individual? Do you think they have the same right as westerners to challenge themselves and experience other cultures?

No, you assume that pirates are just white people who don't have a large allowance, which is a childish view perpetuated by the media. Most people don't have a computer, and if they are lucky enough to get one they should be able to have access to a large variety of the same digital goods.

tl;dr show some goddamn empathy.


> Arguing against piracy on the internet is a giant waste of my time because if you have it in your head that piracy is "good" or "ok" then you will rationalize bad arguments all day defending an incorrect position.

Don't you have it in your head that danenania's position is incorrect and it's therefore a waste of time to argue the subject with you? It certainly seems so, especially since danenania isn't actually arguing for piracy, just that (s)he considers the steps needed to eliminate piracy to go against more important values.

As an analogy, the fact that I defend almost absolute free speech doesn't mean I'm in favor of all speech, it just means that eliminating said speech is worse than allowing it to exist freely.


First of all, MegaUpload did take preventative measures, at least that is what they argue. It's clear you don't understand how dangerous it is to take down the provider for a user's actions.


Twisted logic? Many organizations distribute syringes explicitly for heroin.


Next you're going to be telling me they make pipes for smoking marijuana. Call me when you start making sense.


No artist is ever paid when Vevo displays an artist's music video on Vevo. Yet Google pays artists who post their own work. That means Gangnam Style has probably made over a million dollars from advertising on YouTube. So who is stealing more, the distributors or Google? The answer is: they are all stealing. The largest portion always goes to the content manager. Pulling distribution from the hands of companies and putting it back into artists' control will see a much fairer distribution of wealth. Thus the conflict and the artificial moral discussions we see pushed by the media about this. Sharing is a moral act.


Congratulations, you've just made criminals of Google.

Because clearly they profit off the distribution of other people's work without their permission every day.


Gmail shows me ads related to the content of my email. Worse, it shows ads related to the content of emails sent by my friends and family, some of whom have no relationship or agreement with Google. Google does this to maximize advertising revenue. Google is profiting off the original works of my friends and family without their permission and provides them absolutely nothing in return. Now you may complain about the moral and ethical practice of distribution rights. But sharing with people you know is not distribution. Most countries have different laws for sharing than for distribution. In my country it is naturally legal to share music and other recorded content. Your country may convince you that this is immoral, but in the natural sense, sharing is never immoral. In fact it is compassion.


> However you feel about copyright

What do you have to say to those of us who disagree completely and fully with the entire concept of intellectual property?

With the way I feel about copyright, I believe distribution of other people's work, with permission or otherwise, is something that should be actively encouraged and praised.

Whether they profit off of it or not is completely irrelevant.


As soon as a file goes public and anybody can get it, it seems to me irrelevant that it's encrypted on Mega's servers. They'll have to respond to takedown requests because the contents are known.

And if Mega is de-duping content, then that could technically eliminate the "whack-a-mole" for copyright holders (except for people re-encoding the file and re-uploading). I had heard in the past that Mega would only take down one link.


> They'll have to respond to takedown requests because the contents are known.

And presumably, unless they think they can legally get away with ignoring it, they will.

What they won't be able to do is respond to a request that says "Delete all copies of [Big movie of the year], and continue to delete all of our movies as they pop back up."


As long as it's the same file, they will be able to do precisely that.


> And presumably, unless they think they can legally get away with ignoring it, they will.

I feel like a goddamn broken record.


> As soon as a file goes public

This is the important bit. Files can be shared between private groups, using Mega as an intermediary, without them ever becoming indexed on the public web. Previously, files shared in this manner could still be a target of a takedown, because Megaupload would know they had them, even if nobody else did. Now Mega can make a stronger guarantee about keeping this kind of sharing "safe", because they have no idea whether they're hosting this kind of file or not.


I would think this type of sharing would not be as much of a concern for copyright holders since a private group is not really mass distribution.

As soon as the sharing group got big enough to notice, then there could be agents in the group reporting infringement.


Tell that to the valid copyright holders hosting their content on MegaUpload. Regardless of whether the system is used for legal or infringing purposes, the reality is that distribution companies have the ability to take down both when they claim some are using it to infringe. The concern for copyright holders is not the encryption per se, but the ability to keep big business from taking down their legal content under the guise of a moral cause.


I don't think the content can be accessed by Mega without the full URL. The URL probably contains information about the location of the data plus a passphrase to unlock the key that decrypts the data. In this way Mega could hold the data without knowing the URL needed to access and decrypt it. The full URL would only be retrievable through the user interface, which Mega would not have access to unless their servers are storing your login password, which I assume they are not, for legal reasons. I am making assumptions here.
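To illustrate the kind of URL I mean (the format here is purely a guess, not Mega's actual scheme), the key would sit in the URL fragment, which browsers never send to the server:

    import base64, secrets

    def make_share_link(file_handle: str, file_key: bytes) -> str:
        # Only the handle would ever be sent to the server (to fetch the blob);
        # the key after the second '!' stays client-side in the fragment.
        key_b64 = base64.urlsafe_b64encode(file_key).decode().rstrip("=")
        return f"https://mega.co.nz/#!{file_handle}!{key_b64}"

    def parse_share_link(link: str):
        handle, key_b64 = link.split("#!")[1].split("!")
        key = base64.urlsafe_b64decode(key_b64 + "=" * (-len(key_b64) % 4))
        return handle, key

    link = make_share_link("a1B2c3D4", secrets.token_bytes(16))  # hypothetical handle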


If they did it correctly, then they would keep a reference count so that the stuff only gets deleted when the last reference to it is deleted.

It would function kind of like hard links on Linux file systems.
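A toy version of that idea (an in-memory dict standing in for the real storage backend; the names are made up):

    import hashlib

    class DedupStore:
        """Content-addressed store with reference counting, like hard links."""

        def __init__(self):
            self.blobs = {}  # sha256 hex -> (ciphertext, refcount)

        def put(self, ciphertext: bytes) -> str:
            h = hashlib.sha256(ciphertext).hexdigest()
            blob, refs = self.blobs.get(h, (ciphertext, 0))
            self.blobs[h] = (blob, refs + 1)       # a duplicate upload just bumps the count
            return h

        def delete(self, h: str) -> None:
            blob, refs = self.blobs[h]
            if refs > 1:
                self.blobs[h] = (blob, refs - 1)   # someone still references it
            else:
                del self.blobs[h]                  # last reference gone: data is freed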


Copyright law can absolutely hinge on technical specifics. Look at network DVR services being forced to store a copy of a show per user in order to be found legal in the US.


Yeah, or the fact that Aereo has one antenna per user.


In order to maintain plausible deniability, Mega cannot store a decryption passphrase, meaning they will not have links to the unencrypted files, as the URL contains the passphrase within it. The only thing required to make this private is to plug a Tor hidden service onto the front end. That way the direct link to the data is not known by the downloading user, and Mega will not store this link without losing plausible deniability. Win-win. The bad part is that all downloads now have to go through Tor (slow).


I'm not entirely sure what they mean by a "magic cookie" here.

The document does say it uses public/private keys for data transfer. You would not usually use public/private for data storage because of the huge key sizes required.


I think the cookie bit is just an authorization credential.

If they use asymmetric encryption for data transfer, then how does that work as part of a convergent encryption scheme? Wouldn't all of the file hashes be different at that point?


I would assume the asymmetric crypto is just for the transfer. In other words you encrypt the data and then send it wrapped in another layer of public/private crypto. Not entirely sure though.


To be fair, if there's any chance that your password is the same as someone else's, then you have a terrible password.


Unfortunately for a lot of mundane (non hacker) users, this is very likely.


Possibly convergent encryption: basically, when you encrypt the file you use a hash of the file as the key. This key can then be encrypted with several different passwords, meaning that several people can decrypt the file.
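If that's what they do, the core of it might look roughly like this (a sketch using the Python "cryptography" package; the nonce derivation and KDF parameters are my own assumptions, not anything Mega has published):

    import hashlib
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    def convergent_encrypt(plaintext: bytes):
        key = hashlib.sha256(plaintext).digest()           # key derived from the content
        nonce = hashlib.sha256(key).digest()[:12]          # deterministic nonce, so the same
        ct = AESGCM(key).encrypt(nonce, plaintext, None)   # file always yields the same blob
        return key, ct

    def wrap_key(file_key: bytes, password: str, salt: bytes) -> bytes:
        # Each user protects the same file key under their own password.
        kek = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 200_000)
        # A fixed nonce is tolerable here only because each kek wraps a single value.
        return AESGCM(kek).encrypt(b"\x00" * 12, file_key, None)

Since identical plaintexts produce identical ciphertexts, the server could dedup on a hash of the ciphertext without ever seeing the key.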


This? http://crypto.stackexchange.com/questions/729/is-convergent-... http://www.ssrc.ucsc.edu/Papers/storer-storagess08.pdf

It seems really interesting, so they can check for duplicates while keeping files secure. Thanks!


From that StackExchange link, the top-ranked answer has the following comment:

"However, one more attack: an attacker can guess plaintexts and test if you have that file."

If that's the case, pirates beware.


Yes, it's exactly the same problem as with Freenet. Because the same plaintext encrypts to the same ciphertext, there's a big problem: if I really don't want anyone to know that I have this data, that's a failed scenario. It makes things easier for the service provider, since they don't want to know what they're storing, just like Freenet's data cache. But if I know what I'm looking for, I can confirm whether my cache contains that data or not. Therefore this approach doesn't remove the need to pre-encrypt sensitive data; otherwise it's easy to bust you for having it.

Edit: GNUnet quote: "The gnunet encryption scheme is remarkable in that it allows identical files encrypted under different keys to yield the same ciphertext, modulo a small block of metadata. These files can then be split into small blocks and distributed (and replicated if need be) across hosts in a gnunet system to balance load."

http://grothoff.org/christian/esed.pdf Efficient Sharing of Encrypted Data - Krista Bennett, Christian Grothoff, Tzvetan Horozov, Ioana Patrascu


This means that if there's a commonly available plaintext version of a file, then you can encrypt it, compute the hash of the encrypted version and then serve it to Mega along with a DMCA takedown notice.

They wouldn't really want that, would they? So as clever as it is, I doubt they do it this way.


> then serve it to Mega along with a DMCA takedown notice.

The thing with copyrighted content, though, is that even if the file you're checking might be infringing on copyrights in certain cases, in other cases it might as well be completely legit. I wrote about this on some earlier MU submission[1], so I won't repeat all that here, but all in all, even if you knew that file X existed on Mega's servers, it would be pretty damn haphazard to just outright delete it, because you might be hurting many legitimate users by doing so.

Anyway, I think Mega could secure users' files simply by encrypting the locator keys they hold with the user's own key, with this data only getting decrypted and parsed client-side when the user accesses Mega with that key. This way you could only prove that a file exists on Mega's servers, but you'd have no way to check which user(s) it belongs to without cracking the individual user data one by one. And of course, if you don't have any exact files to check against Mega, then you wouldn't be able to even figure out whether "content X" is hosted there somewhere, and neither could Mega (since they'd naturally only store locator hashes and the encrypted data itself).

[1] http://news.ycombinator.com/item?id=4824986


There'll be plenty of cases when the content is inherently infringing. Cam copies, for example. Additionally, if there's a jurisdiction where a DVD rip of Dora The Explorer is not considered a fair use, they may start pounding Mega to limit this jurisdiction's access to the file. This sort of thing.


>There'll be plenty of cases when the content is inherently infringing.

True, and it seems that Mega's copyright infringement reporting page[1] has an option for that:

Takedown Type: Remove all underlying file(s) of the supplied URL(s) - there is no user who is permitted to store this under any circumstance worldwide

[1] https://mega.co.nz/#copyrightnotice


Oh, nice. This is so tongue-in-cheek. There's no way of knowing the state of any permission worldwide.


Wouldn't this allow MEGA to also get a copy of the key so they can decrypt your data?


Mega would be storing copies of the key that can only be decrypted with a password.


But when someone uploads the same file with a different password, if they want to de-duplicate on content alone, they need to be able to put that new encrypted file hash alongside the original.

One solution to that would be to have two hashes of the file, and to use one as the key and one as the index.
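For example (my own sketch, nothing from their docs), both could be derived from the plaintext with domain separation, so the public index doesn't reveal the decryption key:

    import hashlib

    def derive(plaintext: bytes):
        key = hashlib.sha256(b"key|" + plaintext).digest()      # stays on the client
        index = hashlib.sha256(b"index|" + plaintext).digest()  # goes to the server for dedup
        return key, index

The catch is that the index is still derived from the plaintext, so anyone who can query it can confirm whether a known file is present.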


Only if they had a copy of the original plaintext file.


Well, they could have, unless you're generating original content. This is the problem if you are sharing something which is not 100% original at the block level.

This is exactly the reason why sensitive data needs to be encrypted before deduplication.

OFF System is also one interesting approach to this problem: https://en.wikipedia.org/wiki/OFFSystem

These are more or less "funny" workarounds to the actual problem.


Which in theory they don't have, as they encrypt it on your browser.


Well, you said "encrypt" - what do you mean by encrypt? What's the key? In Freenet's solution, the encryption key for the data is derived from the data itself, not from any random or user-provided key. So the same plaintext is always turned into the same ciphertext which, as we all know, is a very bad idea - it ruins the encryption. In this case, it's quite easy to spot all of the users having the same content in their accounts.

This is only a reasonable method when you can assume that every plaintext is unique, which would also make deduplicating the data absolutely pointless.


Convergent encryption sounds great. But if Mega is using this, do they have any way of finding out which users have a specific file?

Example: Alice and Bob both upload the same file to Mega. Alice is raided by the MAFIAA. They get a court order telling Mega to list all users having a copy of that file. Can Mega comply? Or does not even Mega know that Bob has the same file too?

Related question: Is there any way for Bob to share his uploaded file with friends?


I guess that is generally done using "convergent encryption". There are many variations, and they were used by some peer-to-peer DHT filesystems (I was kinda involved in some "skunk" projects developing one).

The idea is simple: if you have data, then you can generate the key from the data itself to encrypt and decrypt that data. Then you use the hash of the encrypted data to look up whether the server (or just the other side) has the same data. If the hash exists, there's no need to send anything to the server - just tell the server to bump its references. If the hash does not exist, just upload the data. The key derived from the data needs to be stored locally - if you lose it, you will lose the data too.


And deduplication would work since the 2nd person would derive the same key for the data and thus can decrypt it?


Yes exactly.

key = f(data);                  // derive the key from the data itself

upload(encrypt(data, key));     // the server dedups on hash(ciphertext)

store_key(filename, key);       // keep the key locally; losing it loses the file

Which is to say that every file has a globally 1-to-1 mapping to its encrypted version. I'm not sure how they are storing the (User, [(Filename,Key)]) data, but this is ideally encrypted on a per user basis, making any sort of per-user lookup attacks moot.


Perhaps a much simpler possibility than the others proposed in this thread is that, in addition to uploading the encrypted file, the client uploads hashes of chunks of the plaintext data. That way the service can just compare hashes, just like a regular dedup implementation.
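Something along these lines (the chunk size is a guess), though note this leaks enough for the "guess the plaintext and check" problem mentioned elsewhere in the thread:

    import hashlib

    CHUNK = 1024 * 1024  # 1 MB plaintext chunks; the size is an assumption

    def plaintext_chunk_hashes(data: bytes) -> list:
        return [hashlib.sha256(data[i:i + CHUNK]).hexdigest()
                for i in range(0, len(data), CHUNK)]

    # The client would upload (encrypted_blob, plaintext_chunk_hashes(data));
    # the server dedups purely by comparing the hash lists.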


Maybe they use block deduplication on the storage arrays? Particularly looking at the "our service may automatically delete a piece of data you upload". In other words, you already uploaded it encrypted and they're just serving out the de-duplicated bits.


There wouldn't be much point in doing that.

Encrypted data is indistinguishable from random data, so it is incompressible. The chances of finding anything significant to deduplicate are so low that it's not worth trying.


Piece is the keyword here.


Read the Freenet documentation. They encrypt everything, yet they use very efficient deduplication. I really like Freenet's design. The encryption key is based on the payload, so if you don't know what the payload is, you can't decrypt the packet. Of course, decryption keys can be delivered using a separate encrypted tree of keys, which is used when you deliver a download link.

For that reason, whenever I'm sharing anything I usually encrypt files with my recipients' public keys before sending them out, just to make sure that the data is really private and the keys are known only to my selected peers. In some cases when I want to make stuff even more private, I encrypt the data separately with each recipient's public key, so you can't even see the list of public key IDs required to decrypt the data.

I also have a 'secure workstation' which is hardened and not connected to the internet. That's the workstation I use to decrypt, handle and encrypt data. Only encrypted and signed data is allowed to come and go from that workstation.


"Each file and each folder node uses its own randomly generated 128 bit key. File nodes use the same key for the attribute block and the file data, plus a 64 bit random counter start value and a 64 bit meta MAC to verify the file's integrity."

Not sure I understand this: how can you deduplicate if every file is encrypted with a random key?


Maybe on file upload they encrypt it with the file hash, then chunk the encrypted file and store those chunks with dedup.

Then, on the user side, they store a per-user encrypted index (random, counter, MAC) to those individual chunks to represent the file.

That way, they can only see giant encrypted blocks of data, and per-user encrypted indexes to data. But it is all encrypted.

They would need to hack into accounts by keylogging passwords to decrypt the indexes and see what files users can actually access.

Public links could be shared by giving out a key in the URL that is a file containing indexes to other blocks. So whoever knows the URL, knows the index, and can get the data.

That is the way I'd design it, at least... :)

edit:typo
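For what it's worth, a toy version of that design (the structure is entirely invented; it uses the Python "cryptography" package, with a dict standing in for the blob store):

    import hashlib, json, os
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    CHUNK = 4 * 1024 * 1024  # 4 MB chunks, as speculated elsewhere in the thread

    def store_file(plaintext: bytes, user_master_key: bytes, blob_store: dict) -> bytes:
        file_key = hashlib.sha256(plaintext).digest()        # convergent file key
        nonce = hashlib.sha256(file_key).digest()[:12]
        ciphertext = AESGCM(file_key).encrypt(nonce, plaintext, None)
        refs = []
        for i in range(0, len(ciphertext), CHUNK):
            chunk = ciphertext[i:i + CHUNK]
            h = hashlib.sha256(chunk).hexdigest()
            blob_store.setdefault(h, chunk)                  # identical chunks stored once
            refs.append(h)
        # The index (which chunks make up the file, plus the file key) is only ever
        # stored encrypted under the user's own key, so the server can't tell which
        # user can read which blobs.
        index = json.dumps({"chunks": refs, "key": file_key.hex()}).encode()
        n = os.urandom(12)
        return n + AESGCM(user_master_key).encrypt(n, index, None)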


Not sure; it doesn't seem to say anything about encrypting with the file hash, and it implies that the metadata and the actual data use the same key.


It could be achieved through convergent encryption; see this presentation from last week: http://crypto.stanford.edu/RealWorldCrypto/slides/tom.pdf


The user's JavaScript that encrypted the file can also compute a hash, send it to Mega, and then they check if they already have that file.

EDIT: They never get to see your key or what is in the file.


Ok, so I try to upload a.exe to Mega. They make a hash, detect someone has already uploaded it. They don't upload my file, and instead they place a link in my account to that "a.exe" of some other user. How can I access it then? Because it's encrypted with a key which is not mine.


The hashing is done by the client before upload. So if all clients use the same hash algo they will generate the same hash for the same file. So it is encrypted with a key that you know because you have the original file.


Deriving the key from the plaintext + the ciphertext is called a known-plaintext attack [1]. AES isn't vulnerable to this.

[1] http://en.wikipedia.org/wiki/Known-plaintext_attack#Present_...


It is if you know that the key is derived directly and deterministically from the plaintext itself.

Of course this will only give you the key to that one particular file, not any other files that you do not have yourself.


Ah I see this was in reference to your convergent encryption post above. Point taken.

Assuming that it works this way, it would allow Mega to figure out if you own a known "bad" file. Just take something like "New_Jay-Z_Album.zip," hash it, and try the hash against your encrypted files. It seems like Kim is trying to avoid this problem.
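Concretely, under the convergent scheme people are assuming elsewhere in this thread (every detail here is a guess), the check is only a few lines:

    import hashlib
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    def ciphertext_fingerprint(suspected_plaintext: bytes) -> str:
        key = hashlib.sha256(suspected_plaintext).digest()   # convergent key
        nonce = hashlib.sha256(key).digest()[:12]            # deterministic nonce
        blob = AESGCM(key).encrypt(nonce, suspected_plaintext, None)
        return hashlib.sha256(blob).hexdigest()

    # Anyone who has a copy of "New_Jay-Z_Album.zip" can compute this fingerprint
    # and ask (or compel) the provider: "do you store a blob with this hash?"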


Note that they are likely to use something like 4MB chunks of the file rather than whole files. This prevents things like metadata/name differences creating a different hash for the whole file.


In that case wouldn't decrypt(encrypt(F, K1), K2) produce garbage?


Got to wonder if this is a bit of copy and paste from the old terms, although you would think they would have checked this quite thoroughly! Good spot.


Something doesn't smell right regarding the browser-based private key encryption. What I think is happening is that they could theoretically decrypt all your content, but the fact that they don't, and would have to jump through some technical hoops to do so, is maybe enough for them to argue that they don't know what is being uploaded?


From the dev docs it looks like file integrity (and thereby the duplicate check) is done with CBC-MAC:

"File integrity is verified using chunked CBC-MAC. Chunk sizes start at 128 KB and increase to 1 MB, which is a reasonable balance between space required to store the chunk MACs and the average overhead for integrity-checking partial reads."


Perhaps they de-duplicate using smaller block sizes? Something like 256 bytes.

Also, if you believe Microsoft Outlook, there's something called "compressible encryption", which implies there are encryption schemes that aren't exactly random, meaning in turn that not all N-bit blocks are equally likely.


Even a 256-byte block is big enough to make collisions so rare that it's not even worth it.

Not to mention the metadata associated with such a small block would probably be larger than the block itself.


Here's one solution: http://news.ycombinator.com/item?id=2461713

However, given a file, Mega would be able to tell whether they stored that file, eliminating the plausible deniability they're aiming for.


This is exactly how Megaupload used to work. They used file hashes; I heard reports of some hash collisions, meaning entirely different files were identified as duplicates... I believe that was gamed, though; real-life collisions are far less likely.


Maybe they store a hash alongside your strongly encrypted file. This way they could go after copyrighted files across the system once reported, at the cost of quasi-decrypting some of your files. Your really unique files would still be safe, although you might not want them to be able to timestamp-proof your possession of those either.



