

Remove watermarks from PDFs/papers with pdfparanoia - kanzure
https://github.com/kanzure/pdfparanoia

======
kintamanimatt
What is the motivation behind this?

~~~
dkroy
This is from the GitHub page: "pdf watermark removal library for academic
papers"

~~~
kintamanimatt
I read that, but why? I've never been part of the magical world of academia,
and don't understand why watermark removal is something that's important.

~~~
kanzure
Author here (btw, like you I am not a part of the world of academia either).
These watermarks include your name, institution, and IP address on every page.
This is a huge privacy violation, especially when sharing research with
colleagues.

[http://scholar.google.com/scholar?q=%22Authorized+licensed+u...](http://scholar.google.com/scholar?q=%22Authorized+licensed+use+limited+to%22)

[http://scholar.google.com/scholar?q=Redistribution+subject+t...](http://scholar.google.com/scholar?q=Redistribution+subject+to+SEG+license+or+copyright)

[http://scholar.google.com/scholar?q=Redistribution+subject+t...](http://scholar.google.com/scholar?q=Redistribution+subject+to+AIP)

[http://scholar.google.com/scholar?q=Downloaded+from+http%3A%...](http://scholar.google.com/scholar?q=Downloaded+from+http%3A%2F%2Fpubs.acs.org+on)

[http://scholar.google.com/scholar?q=Downloaded+*+*+2001..201...](http://scholar.google.com/scholar?q=Downloaded+*+*+2001..2013+to+*)

More details: [https://groups.google.com/group/science-liberation-front/bro...](https://groups.google.com/group/science-liberation-front/browse_frm/thread/c68964cf55d8f6fa)

Even more details:
[https://groups.google.com/d/msg/diybio/o6irBNeCmrE/pqGJXMlQY...](https://groups.google.com/d/msg/diybio/o6irBNeCmrE/pqGJXMlQYa0J)

Also, Google seems to have deals with academic publishers based on IP address.
You can sometimes see Google in the watermarks in their own search index:

[http://scholar.google.com/scholar?q=filetype:pdf+google-inde...](http://scholar.google.com/scholar?q=filetype:pdf+google-indexer)

Even stranger, Google Docs Viewer seems to remove watermarks from some PDFs,
but I have no idea why. Did they write their own watermark removal library?

Evidence: [https://groups.google.com/group/science-liberation-front/t/a...](https://groups.google.com/group/science-liberation-front/t/a1cf073711b97b00)

~~~
jahewson
I presume this violates your license with the paper vendor?

~~~
kanzure
There are many complicated licensing relationships in this ecosystem. Do you
mean a particular ToS? A college's license with Elsevier/JSTOR/IEEE? A
scholar's use of library resources, presumably governed by some honor code or
something set up by a university? Or something else?

------
luciannovo
Is there anything like this for stock images? There's a simple way (masking) to
do it with Photoshop, but it's not automatic.

~~~
slapshot
It seems like the producers of stock photography have a library to do this;
there's a button on each stock photography page that will do it for you, but I
can't find source code for it. You just have to enter a sixteen-digit string
and a MM/YY verification for it to work. Reports are that certain 15-digit
strings will also work.

