
How Dropbox Knows When You’re Sharing Copyrighted Stuff Without Actually Looking - muzz
http://techcrunch.com/2014/03/30/how-dropbox-knows-when-youre-sharing-copyrighted-stuff-without-actually-looking-at-your-stuff
======
brownbat
The moral of the story is that you should pad all your files with some nonce.

EDIT: Sorry, the _moral_ is probably to avoid copyright violations. The advice
above is more of a practical workaround. It's nonetheless useful as a way
to avoid overreach from rights holders in cases of false positives or
suppression of fair use.

~~~
meowface
Yep.

    
    
    echo a >> file.mp3

should suffice.
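
A minimal sketch of why that works, assuming the blacklist is keyed on an exact cryptographic hash of the whole file. SHA-256 and the sample bytes are assumptions for illustration; the article does not say which hash Dropbox uses.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


# Stand-in bytes for a media file; a real file would be read from disk.
original = b"pretend these are the bytes of file.mp3"

# Same effect as `echo a >> file.mp3`: append "a" plus a newline.
padded = original + b"a\n"

# Hypothetical blacklist containing only the original file's digest.
blacklist = {sha256_hex(original)}

print(sha256_hex(original) in blacklist)  # True
print(sha256_hex(padded) in blacklist)    # False
```

Any change to the bytes gives a different digest, so an exact-match list no longer recognizes the file. This says nothing about fuzzier matching such as audio fingerprinting, which is a different technique.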

------
orthecreedence
How Turtl has no idea when you’re sharing copyrighted stuff:
[http://turtlapp.tumblr.com/post/81222024691/how-turtl-has-no...](http://turtlapp.tumblr.com/post/81222024691/how-turtl-has-no-idea-when-youre-sharing-copyrighted)

------
sentenza
No new info here. Dropbox de-duplicates using hashes, so they would check those
hashes against a blacklist. The main problem with Dropbox still remains: They can
read your files if they want to, and you have only their word that they don't.

For now I'm going with BitTorrent Sync [1], at least until an open source
equivalent comes along. (Would be nice if they open sourced
theirs, though.)

[1] [http://www.bittorrent.com/sync](http://www.bittorrent.com/sync)

~~~
termain
Also, SpiderOak.

~~~
JetSpiegel
I use SpiderOak, but they can still read your files, since they hold your keys.

Something like tarsnap is the only paranoid-proof choice. It's not even
distributed as a binary file.

[https://www.tarsnap.com/](https://www.tarsnap.com/)

~~~
mathrawka
They cannot read your files: only your password can be used to decrypt them,
and it is not even saved on their server.

[https://spideroak.com/zero-knowledge/](https://spideroak.com/zero-knowledge/)

[https://spideroak.com/engineering_matters](https://spideroak.com/engineering_matters)

~~~
LoganCale
If you enter your password into a system that they control at any time, they
have potential access to your files and can be ordered by the government to
change their code so that it does collect your password in order to enable
decryption of your files.

~~~
WildUtah
_and can be ordered by the government to change their code so that it does
collect your password in order to enable decryption of your files._

And they can be ordered to do so secretly with no notice, neither to company
management nor to you nor to any judicial body that could review the decision.

That's the lesson of Lavabit.

------
mbrutsch
How do they compute a hash of a file without actually looking at the file?
Magic?

~~~
Dylan16807
'Looking' here means having a person look at the file or possibly having a
system run analysis* of the file. Neither happens.

Since you seem to be talking about having their systems touch any bytes of the
file, I have a more important question: How would you expect to have a file
synced with Dropbox and shared via Dropbox without Dropbox touching the bytes?
Magic?

* You know what I mean by analysis; think Gmail.

~~~
TheLoneWolfling
Simple. Put a TrueCrypt (or whatever other encrypted filesystem you prefer)
volume on Dropbox, and keep your files inside that volume.

Dropbox has no idea what is on the volume.

(In an ideal world, you'd have a filesystem that stored each file in the
virtual filesystem as a separate encrypted file, plus an additional
encrypted file that contained the virtual directory info. But this is
simpler.)

~~~
Dylan16807
Well sure, if you use another program to encrypt the file then Dropbox can only
see the encrypted bytes, but that's not what mbrutsch was objecting to. A
particular TrueCrypt volume could still have a takedown related to it.

~~~
TheLoneWolfling
Not automatically via the method used here (a hash blacklist).

~~~
Dylan16807
When I say 'particular' I mean someone notices the TrueCrypt volume being
shared either semi-publicly or multiple times and files a DMCA notice for the
volume itself. Then you have to alter it or make a new one.

------
bane
It's also a method Dropbox can use to save storage space. Inevitably, two or
more users will put the same file in their DB. By computing and comparing
hashes they can store just one copy of the file and point all users to
that single instance (they may also do a byte-for-byte comparison when the
hashes match, since different files can occasionally produce the same hash).
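
A rough sketch of that kind of store, purely illustrative: `DedupStore`, its methods, and the choice of SHA-256 are assumptions made here, not anything Dropbox has published.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store: one blob per distinct file content."""

    def __init__(self) -> None:
        self.blobs = {}  # digest -> stored bytes
        self.files = {}  # (user, path) -> digest

    def put(self, user: str, path: str, data: bytes) -> bool:
        """Record a file; return True only if new bytes had to be stored."""
        digest = hashlib.sha256(data).hexdigest()
        existing = self.blobs.get(digest)
        if existing is not None and existing != data:
            # Same digest, different bytes: a genuine hash collision.
            raise ValueError("hash collision; a real system needs a fallback")
        self.files[(user, path)] = digest
        if existing is None:
            self.blobs[digest] = data
            return True
        return False

    def get(self, user: str, path: str) -> bytes:
        """Return the bytes a user stored under the given path."""
        return self.blobs[self.files[(user, path)]]


store = DedupStore()
print(store.put("alice", "song.mp3", b"same bytes"))       # True
print(store.put("bob", "music/track.mp3", b"same bytes"))  # False
print(len(store.blobs))                                     # 1
```

Both users see their own file, but only one copy of the bytes is kept. The equality check on a digest match is the "exact match check" mentioned above.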

I have no idea how much efficiency this buys them, but the same mechanism can
be used against a blacklist of known copyrighted or illegal material. In this
case, having a hash that matches something in the copyrighted materials list
AND sharing it triggers this action.
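
Sketched out, the "match AND share" rule could look something like this. The function names, the sample bytes, and SHA-256 are all assumptions for illustration, not Dropbox's actual code.

```python
import hashlib


def digest(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical blacklist built from earlier takedown notices.
flagged_file = b"bytes of a file named in a takedown notice"
BLOCKED = {digest(flagged_file)}


def can_upload(data: bytes) -> bool:
    """Private storage is never checked against the list in this sketch."""
    return True


def can_share(data: bytes) -> bool:
    """Sharing is refused only when the file's digest is on the list."""
    return digest(data) not in BLOCKED


print(can_upload(flagged_file))      # True
print(can_share(flagged_file))       # False
print(can_share(b"holiday photos"))  # True
```

The check only compares digests, so nothing about a file's contents is inspected beyond whether it is byte-identical to something already on the list.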

Just as easily, they could check whatever you put in your DB against hashes of
known illegal material, such as child porn, and trigger an automated notice to
some law enforcement agency or something similar.

This can even be used for National Security without revealing the classified
contents of the documents. Say another Snowden-type grabs a bunch of
classified documents. The agency could give DB a list of hashes with
instructions to "call us" if a user uploads a bunch of leaked documents to
their DB. They could even do this blind: just send DB a hash of every
classified document or piece of data they ever produce, and catch leakers
before anybody in the agency even knew something had leaked.

~~~
aroch
Say a company with 1,000 accounts at 1TB/user has ~600GB/user in real-life
usage. That's ~0.6PB in total, and since there's going to be a lot of
crossover, they could save upwards of 0.5PB.
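
A back-of-the-envelope check of those numbers, assuming decimal units (1 PB = 1,000,000 GB). The helper names are made up for this sketch.

```python
GB_PER_PB = 1_000_000  # decimal units: 1 PB = 1,000 TB = 1,000,000 GB


def total_pb(users: int, gb_per_user: float) -> float:
    """Total stored data in PB before any deduplication."""
    return users * gb_per_user / GB_PER_PB


def duplicate_fraction_needed(saved_pb: float, users: int, gb_per_user: float) -> float:
    """Fraction of stored bytes that must be duplicates to save saved_pb."""
    return saved_pb / total_pb(users, gb_per_user)


total = total_pb(1000, 600)
needed = duplicate_fraction_needed(0.5, 1000, 600)
print(f"{total:.1f} PB stored")              # 0.6 PB stored
print(f"{needed:.0%} would need to overlap")  # 83% would need to overlap
```

So a 0.5PB saving out of 0.6PB only holds if roughly five sixths of what those users store is shared with someone else at the company.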

------
AnthonyMouse
The trouble with using hashes is that they have no context. Suppose I own the
copyright on some file and issue a DMCA takedown against somebody else sharing
it with the world. In that context the takedown is totally legitimate. But now
the hash is on the list, so what happens next time when the file is being
shared in some totally different context? What happens when the copyright
holder wants to share the file? What happens when some other third party wants
to share the file with a small number of people in a way which is
unambiguously fair use?

You end up over-blocking, unless you include some option to bypass the block,
in which case why even bother? It's not like you could inform the copyright
holder when someone shares the file without invading users' privacy and
opening up the system to abuse against political speech.

The whole thing is a mess because the fundamental problem is that the cost of
most infringement on the internet is less than the cost of evaluating whether
that use is infringing, so you end up with a high error rate no matter what
you do. It breaks the founding assumption that copyright is not in tension
with the First Amendment and makes you choose which you want to sacrifice for
the other.

------
voltagex_
This is a good time to note that ClearSkies [1], the open source BTSync
alternative, is having a fundraiser:
[https://www.bountysource.com/teams/clearskies/fundraiser](https://www.bountysource.com/teams/clearskies/fundraiser)

[1] [https://github.com/jewel/clearskies](https://github.com/jewel/clearskies)

------
PhasmaFelis
What a bunch of doublespeak bullshit. "We're not _looking_ at your stuff,
we're only computing a hash of your stuff and comparing it to a blacklist!"
Right up there with "we're not recording your phone calls, we're only
recording your phone call _metadata_."

------
norswap
I wish more posts had great summaries like that at the start.

------
ramonex
doh

