
From the README, in case it wasn't obvious:

"These utilities make use of the deduplication scheme of Dropbox to allow for "teleporting" files into your Dropbox account given only a list of hashes, provided of course that the files already exist on their servers. This enables arbitrary, anonymous transfers of files between Dropbox accounts."

Between this and the minor information leakage issue, I suspect Dropbox will be making changes to their deduplication scheme.

A simple way to fix both of these issues is to require each user to upload the complete file once, regardless of whether Dropbox already has it stored. Deduplication in storage, and skipping repeat uploads by the same user, would still be possible.

Also interesting to note: the GitHub repo for this has been deleted. A tarball of the source is still available.

A napkin-cryptography way Dropbox could fix this while still getting full deduplication: currently, when the client discovers that a file has been added locally, it sends hashes of 4MB blocks, and the server considers the file added.

Additional measure at that point: the server could challenge the client to provide the values of bytes at a couple of arbitrarily chosen byte offsets of the original file. (Could precompute that, provided the queries don't repeat often).
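
To make the idea concrete, here is a rough sketch of such a challenge in Python. The function names and the default of 8 offsets are made up for illustration; nothing here reflects how Dropbox actually works.

```python
import secrets


def make_challenge(block_len: int, count: int = 8) -> list[int]:
    """Server side: pick `count` unpredictable byte offsets inside the block."""
    return [secrets.randbelow(block_len) for _ in range(count)]


def answer_challenge(block: bytes, offsets: list[int]) -> bytes:
    """Client side: return the bytes found at the requested offsets."""
    return bytes(block[i] for i in offsets)


def verify(stored_block: bytes, offsets: list[int], answer: bytes) -> bool:
    """Server side: compare the client's answer with the stored copy."""
    expected = answer_challenge(stored_block, offsets)
    return secrets.compare_digest(expected, answer)


# A client that really holds the block passes the check.
block = b"example block contents" * 1000
offsets = make_challenge(len(block))
assert verify(block, offsets, answer_challenge(block, offsets))
```

A client that only knows the hash has to guess each byte, so the chance of passing shrinks quickly as the number of offsets grows.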

What would stop pirates from querying each other (maybe on some P2P network) for those random bytes?

Client A wants the file that Client B has, so when Dropbox asks Client A for some random offset, Client A asks Client B in the background and relays the result to Dropbox.

It really depends on how far pirates would be willing to go.

Of course, Dropbox can't prevent people from sharing content out of band. But if Client A and Client B are offering arbitrary byte ranges to complete strangers, they are effectively playing BitTorrent again.

Yes, but they are only exchanging a constant amount of information to fool the server challenge, whereas we could hope to do better if the server builds challenges which use information that it knows the client has.

For some reason, this inspired me to write a blog post: http://a3nm.net/blog/deduplication_attacks.html and http://news.ycombinator.com/item?id=2489594

Does that mean that the current protocol allows users to steal arbitrary files given a hash?

For example if some web site charges per download of a file, but still has the hash posted publicly, you can try to "steal" it from someone who has it stored privately in Dropbox?

IOW, the file hash is equivalent to your account login/password combo [restricted to any given file]?

As far as I understand the original posting, you can download any file from Dropbox's servers if you know its Dropbox hash, which apparently is a sequence of SHA256 hashes of 4MB blocks.

If you have a sub-4MB sensitive file, and you publish its SHA256, and the Dropbox protocol applies the hash function in the same way as file hashing tools (e.g. doesn't include a tag meaning "this hash is computed particularly for Dropbox deduplication" into the SHA computation), yes, then apparently people can download your file.
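
A small sketch of what such a tag could look like, in Python. The tag value is purely hypothetical; as far as the thread can tell, Dropbox does not use one.

```python
import hashlib

# Hypothetical context tag; not something Dropbox is known to use.
DEDUP_TAG = b"example-dedup-v1\x00"


def plain_sha256(data: bytes) -> str:
    """What a generic tool such as shasum would publish for the file."""
    return hashlib.sha256(data).hexdigest()


def tagged_sha256(data: bytes) -> str:
    """Same hash function, but bound to the deduplication context."""
    return hashlib.sha256(DEDUP_TAG + data).hexdigest()


contents = b"some sensitive file under 4MB"
# The published checksum no longer doubles as the deduplication key.
assert plain_sha256(contents) != tagged_sha256(contents)
```

With a tag like this, a checksum posted next to a download link would be useless as a deduplication identifier.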

However, I rarely see SHA256 checksums along with download links; more SHA1 and MD5.

That's still a bit worrying though; do people stop to consider that publishing a SHA256 hash bears the risk of being equivalent to publishing the file itself (assuming someone uploads it to Dropbox)?

Another related attack could be to start with a known file (say, your employment contract), swap in a colleague's name and generate a bunch of files with different salary amounts, essentially brute-forcing SHA256 sums. If Dropbox suddenly coughs up a file, you've revealed his salary!

Assuming you know the exact structure of the file, this would be a perfectly valid attack. There could be a lot of variance in rich formats like PDF files from things like compression, so this might be expensive to perform on non-plaintext files.
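
A toy simulation of the enumeration, in Python, for the plaintext single-block case. The template, the salary range and the `exists` callback are all invented for the example; the "oracle" is just a local set of hashes, not a real service.

```python
import hashlib
from typing import Callable, Iterable, Optional


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def guess_salary(
    template: str,
    candidates: Iterable[int],
    exists: Callable[[str], bool],
) -> Optional[int]:
    """Try each candidate value and return the first one whose hash is known."""
    for salary in candidates:
        document = template.format(salary=salary).encode("utf-8")
        if exists(sha256_hex(document)):
            return salary
    return None


# Simulated oracle: a set holding the hash of the "real" contract.
template = "Employee: J. Doe\nAnnual salary: {salary} EUR\n"
stored = {sha256_hex(template.format(salary=61500).encode("utf-8"))}
print(guess_salary(template, range(30000, 100001, 500), stored.__contains__))
```

A single differing byte anywhere in the file (whitespace, a date, a line ending) changes the hash, which is why the exact structure matters so much.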

Dropbox effectively acts as an "existence oracle". You can't ask it to cough up a file you don't have, but you can ask it if a given file exists anywhere in the system.

This would be an effective way for law enforcement or civil copyright enforcement to check for content that is clearly illegal, or certainly a copyright violation, to possess. They would need to query for a set of hashes of the given illegal content. If any query came back positive, they could issue a subpoena for all users who stored the given content in their Dropbox folder and pursue them further.

> for content that is clearly illegal, or certainly a copyright violation, to possess

How can something be "clearly" a violation? If I have an album, but copy someone else's rip instead of making my own - is that "clearly" a violation? Alternatively if I used the same application, I'd probably obtain the exact same file - is that clearly a violation too?

(Grooveshark kind of operates on the assumption that it's ok)

I'm thinking of something like a pre-release album, a theatre rip of a movie, etc. Not a rip of something legitimately licensed to you, but of something not officially released to the public.

The Perkeo database used by some German police forces contains hashes of known child-porn image files. Probably not SHA256, though, given that it was started in 1998.

The employment contract scenario doesn't require download-by-hash, only deduplication. You could just measure the amount of network traffic the client needs to "upload" your file.

Just read the reappeared source code (assuming it works as advertised): each hash is a plain SHA256 of a 4MB block of the input file. They add no message type information which could prevent mixups between Dropbox deduplication hashes and hashes computed for other purposes.

The following dropship file was assembled using only shasum, ls and vi:

    {"blocks": ["f3f754a5dcd93f271ad013a5ee84f495a36da84f152e0a1fec4646345b0c10d6"], "name": "ostseestrand.jpg", "size": 514779}

Could someone who has never shared files with me verify that it indeed produces a picture of a beach?
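
For anyone who would rather script it than use shasum and vi, a descriptor with the same three fields as the one above can be put together roughly like this in Python. The function names are my own, and I'm assuming the 4MB block size described in this thread.

```python
import hashlib
import json

BLOCK_SIZE = 4 * 1024 * 1024  # 4MB blocks, as described in the thread


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Hex SHA256 of each consecutive block; the last block may be shorter."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def make_descriptor(name: str, data: bytes) -> str:
    """JSON with the same fields as the hand-made example: blocks, name, size."""
    return json.dumps(
        {"blocks": block_hashes(data), "name": name, "size": len(data)}
    )


print(make_descriptor("example.txt", b"hello world"))
```

For a file under 4MB there is exactly one block, so the single entry in "blocks" is just the ordinary SHA256 of the whole file.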

When I run dropship with a file containing the JSON you quoted, it prints, "('Oops, blocks are not known: %s', [u'8_dUpdzZPyca0BOl7oT0laNtqE8VLgof7EZGNFsMENY'])".
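
The identifier in that error message appears to be the same digest in a different encoding: unpadded URL-safe Base64 instead of hex. A quick Python check (the helper name is mine):

```python
import base64


def hex_to_urlsafe_b64(hex_digest: str) -> str:
    """Re-encode a hex digest as URL-safe Base64 without '=' padding."""
    raw = bytes.fromhex(hex_digest)
    return base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")


quoted = "f3f754a5dcd93f271ad013a5ee84f495a36da84f152e0a1fec4646345b0c10d6"
print(hex_to_urlsafe_b64(quoted))
```

So the block hash was read correctly from the JSON; it was the lookup for that block that came back as unknown.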

Yes, I've got the beach image in my Dropbox folder now :)

Yeah, Canon PowerShot A60 ;)
