
Ask HN: Where to find a list of hashes for copyrighted material? - Fire-Dragon-DoL
Due to the recent law the EU signed, any product is required to perform some sort of content filtering of what users upload (assuming the data is available to the public).
As such, I need to perform some content filtering, but I can't seem to figure out where I'd go to gather a list of hashes of copyrighted material. Is there any API or list of hashes, or can it be requested from someone?

I'm just one software developer and don't represent any company.
======
ChrisGranger
I just don't see how hashing is going to prevent copyright violations when
motivated people can make insignificant edits to files to change their hashes,
_especially_ on text files (add a random string to the end), images (change
the HSL of a single pixel), and music (add an extra space in the ID3 tag). The
whole thing seems like a Sisyphean task. Any list of hashes will grow to
unmanageable size almost immediately.
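
To illustrate the point, here is a minimal sketch using only Python's standard library. The sample bytes are made up; the only assumption is that the list in question holds cryptographic hashes such as SHA-256. Appending a single space produces a completely different digest, so an exact-match lookup would miss the edited copy.

```python
import hashlib

# Hypothetical file contents; any bytes would behave the same way.
original = b"Some copyrighted text."
edited = original + b" "  # an insignificant edit: one trailing space

digest_original = hashlib.sha256(original).hexdigest()
digest_edited = hashlib.sha256(edited).hexdigest()

# A blocklist keyed on exact digests only catches byte-identical files.
blocklist = {digest_original}

print(digest_original in blocklist)  # the untouched file is caught
print(digest_edited in blocklist)    # the edited file slips through
```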

~~~
nikonyrh
You'll need to rely on
[https://en.wikipedia.org/wiki/Locality-sensitive_hashing](https://en.wikipedia.org/wiki/Locality-sensitive_hashing)
and fuzzy matching. I've heard Shazam works really well in most cases,
although I'm not sure whether they submit the raw recording or just
"fingerprints".

Still far from a trivial problem, especially for video.
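
As a rough sketch of the idea, here is an "average hash", one very simple perceptual fingerprint in the same spirit as locality-sensitive hashing. It is not what Shazam or any real filter uses. The 8x8 grayscale grids are made-up stand-ins for images, and the function names are my own. Unlike a cryptographic hash, a one-pixel tweak leaves the fingerprint unchanged, and you compare fingerprints by Hamming distance rather than equality.

```python
import hashlib

def average_hash(pixels):
    """Fingerprint a grayscale grid: one bit per pixel, 1 if brighter than the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return "".join("1" if p > mean else "0" for p in flat)

def hamming(a, b):
    """Number of positions at which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))

def sha256_of(pixels):
    """Cryptographic digest of the raw pixel bytes, for comparison."""
    return hashlib.sha256(bytes(p for row in pixels for p in row)).hexdigest()

# Hypothetical 8x8 grayscale "images" with values in 0..252.
original = [[(r * 8 + c) * 4 for c in range(8)] for r in range(8)]
tweaked = [row[:] for row in original]
tweaked[0][0] += 1  # change a single pixel by one brightness step
unrelated = [[252 - p for p in row] for row in original]  # inverted image

print(sha256_of(original) == sha256_of(tweaked))                    # False
print(hamming(average_hash(original), average_hash(tweaked)))      # 0
print(hamming(average_hash(original), average_hash(unrelated)))    # 64
```

A real system would first resize and normalise the image, and would pick a distance threshold empirically instead of relying on the extremes shown here.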

~~~
ChrisGranger
Oh, interesting. I was unfamiliar with this, and thinking in terms of
cryptographic hashes needing to be identical.

------
nikonyrh
Sadly, lawmakers aren't technical people; I don't think they have a clue how
to implement such a system. Basically, only big players can afford the R&D
effort.

I think the EU should build this as a free online service to which you send a
query in some compressed & hashed form, and the API would reply whether that
piece is protected by copyright or not.
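
A minimal sketch of what the lookup side of such a service might do, assuming it stores fuzzy fingerprints as integers and treats anything within a small Hamming distance as a match. No such service exists as far as I know; `is_protected`, the threshold, and the sample fingerprints are all hypothetical.

```python
def hamming(a: int, b: int) -> int:
    """Count the bits that differ between two integer fingerprints."""
    return bin(a ^ b).count("1")

def is_protected(query: int, known: set[int], max_distance: int = 4) -> bool:
    """Return True if the query is within max_distance bits of any known fingerprint."""
    return any(hamming(query, fingerprint) <= max_distance for fingerprint in known)

# Hypothetical 16-bit fingerprints of registered works.
known_fingerprints = {0b1111000011110000, 0b1010101010101010}

near_copy = 0b1111000011110001   # differs from the first entry by one bit
unrelated = 0b0000111100001111   # far from both entries

print(is_protected(near_copy, known_fingerprints))
print(is_protected(unrelated, known_fingerprints))
```

A linear scan like this would not scale to a real catalogue; it only shows the matching rule, not how to index it.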

------
kayamon
Go to thepiratebay.org, download a torrent, look at all the hashes within.

(note that there is basically no way to realistically filter content based on
file hashes alone)

~~~
Fire-Dragon-DoL
That's problematic though: it's the only reasonable implementation (reasonable
as in: no need to spend an insane amount of money on it) for any startup.
Performing video/audio/image or even text comparison is a business on its
own. It seems odd that it's not possible to get such a list.

