
Show HN: Make Your PDF Look Scanned - baicunko
http://www.scanyourpdf.com
======
jmwilson
From the github repo, the site is a wrapper around exactly two shell commands.
Instead of uploading your data to an untrusted site, you can run from the
comfort and safety of your local computer:

    
    
      convert -density 150 input.pdf -colorspace gray -linear-stretch 3.5%x10% -blur 0x0.5 -attenuate 0.25 +noise Gaussian -rotate 0.5 temp.pdf
      gs -dSAFER -dBATCH -dNOPAUSE -dNOCACHE -sDEVICE=pdfwrite -sColorConversionStrategy=LeaveColorUnchanged -dAutoFilterColorImages=true -dAutoFilterGrayImages=true -dDownsampleMonoImages=true -dDownsampleGrayImages=true -dDownsampleColorImages=true -sOutputFile=output.pdf temp.pdf
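
For anyone who prefers a script, the two commands can be wrapped roughly as follows. This is a minimal sketch, not the site's code: the function names and the `temp.pdf` default are illustrative, the flags are copied from the commands quoted in this comment, and a leading dash is assumed on `-dAutoFilterColorImages`, since Ghostscript switches begin with `-`.

```python
import subprocess


def build_commands(input_pdf, output_pdf, temp_pdf="temp.pdf"):
    """Return the ImageMagick and Ghostscript argument lists quoted in the thread."""
    convert_cmd = [
        "convert", "-density", "150", input_pdf,
        "-colorspace", "gray", "-linear-stretch", "3.5%x10%",
        "-blur", "0x0.5", "-attenuate", "0.25", "+noise", "Gaussian",
        "-rotate", "0.5", temp_pdf,
    ]
    gs_cmd = [
        "gs", "-dSAFER", "-dBATCH", "-dNOPAUSE", "-dNOCACHE",
        "-sDEVICE=pdfwrite",
        "-sColorConversionStrategy=LeaveColorUnchanged",
        "-dAutoFilterColorImages=true",  # leading dash is an assumption
        "-dAutoFilterGrayImages=true",
        "-dDownsampleMonoImages=true",
        "-dDownsampleGrayImages=true",
        "-dDownsampleColorImages=true",
        f"-sOutputFile={output_pdf}", temp_pdf,
    ]
    return convert_cmd, gs_cmd


def scan_locally(input_pdf, output_pdf):
    """Run both steps in order; needs ImageMagick and Ghostscript on PATH."""
    for cmd in build_commands(input_pdf, output_pdf):
        # A list of arguments (no shell) avoids quoting problems in file names.
        subprocess.run(cmd, check=True)
```

Calling `scan_locally("input.pdf", "output.pdf")` would then run everything on the local machine, with nothing uploaded.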

~~~
cs702
Try this one-line ImageMagick command to make COMPACT pseudo-scanned files:

    
    
      convert -density 150 ORIGINAL.pdf -colorspace gray +noise Gaussian -rotate 0.5 -depth 2 SCANNED.pdf
    

Consider using `-depth 1` or `-depth 3` as the final parameter to map colors to
only 2¹=2 or 2³=8 instead of 2²=4 gray levels. Using a small number of gray
levels SIGNIFICANTLY reduces file size and also gives your pseudo-scanned
document a more pixelated, it-just-came-out-of-my-old-printer look.

Also consider using `-density 100` or even `-density 75` for long text
documents. Using a density of 75 dpi produces documents that are 4x smaller
than 150 dpi (75²=150²/4) and doesn't affect the readability of normal-sized
(10-12pt) text that much.
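
The 4x figure follows from pixel count scaling with the square of the density. A minimal sketch of that arithmetic, assuming a US Letter page (8.5 x 11 in, my assumption) and ignoring compression, so it estimates raw raster size rather than final PDF size:

```python
def raw_pixels(dpi, width_in=8.5, height_in=11.0):
    """Pixels in one rasterized page at the given density (page size is assumed)."""
    return (width_in * dpi) * (height_in * dpi)


def raw_bits(dpi, depth):
    """Uncompressed size in bits: one `depth`-bit gray value per pixel."""
    return raw_pixels(dpi) * depth


def gray_levels(depth):
    """Number of distinct gray levels available at a given bit depth."""
    return 2 ** depth


# Halving the density quarters the pixel count.
density_ratio = raw_pixels(150) / raw_pixels(75)

# Dropping from 2 bits to 1 bit per pixel halves the raw data again.
depth_ratio = raw_bits(150, 2) / raw_bits(150, 1)
```

Real files will differ because the PDF writer compresses the image data, but the direction of both effects should hold.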

Finally, sometimes it works best not to add Gaussian noise.

~~~
miles
Rather than making a COMPACT version, your command created a file over twice
the size of one created using the aforementioned

    convert letter.pdf -colorspace gray \( +clone -blur 0x1 \) +swap -compose divide -composite -linear-stretch 5%x0% -rotate 1.5 as-scanned.pdf

However, that may be a useful feature, since many users end up inadvertently
creating very large PDFs when scanning.

~~~
Moru
If you lower the resolution, those auto-scanning services that do OCR on the
documents will have problems converting the text back to normal text. Which
might or might not be the point of doing this in the first place... :-)

~~~
mwcampbell
Just remember that if you make it harder for OCR to produce accurate text from
an image, you also make it harder for blind people to read it.

------
miles
Show HN: FalsiScan – Make it look like a PDF has been hand signed and scanned
(770 points, 34 days ago)
[https://news.ycombinator.com/item?id=22811653](https://news.ycombinator.com/item?id=22811653)

~~~
derwiki
Thought this sounded familiar.

------
baicunko
I recently came across a couple of institutions which required me to print,
sign and send back a couple of documents. COVID and all of that means I don't
have a printer at home. I made this website, inspired by other posts
here, and now it's free to use! The code is open source, so feel free to suggest any
new ideas or things you would like included!

~~~
pmiller2
What do you mean "required"? Like, they wouldn't accept a clean, non-scanned
copy? That's absurd.

~~~
BillinghamJ
Some types of documents/deeds do require "wet ink" signatures by law -
[https://www.lawsociety.org.uk/support-services/advice/articl...](https://www.lawsociety.org.uk/support-services/advice/articles/signing-and-exchanging-documents/)

~~~
yosito
7 years of being a digital nomad, and signing every document digitally, and I
never ran into this. TIL.

~~~
blaser-waffle
Unless there is a serious challenge to the contract no one cares. You
obviously are okay with the contract and are signing it (if only in an
obfuscated, non-physical sense), and they're just looking for a signature that
could be kinda real -- if you're good with the deal, and they're good, who
cares?

------
ArneVogel
Edit: fixed now.

Original: Please don't upload any private or confidential pdfs right now. I
emailed OP two security concerns that trivially allow anybody to see any of
the converted pdfs.

~~~
lewiscollard
It's still far short of being suitable for any private documents.

[https://github.com/baicunko/scanyourpdf/blob/master/pdfwebsi...](https://github.com/baicunko/scanyourpdf/blob/master/pdfwebsite/views.py#L45)

This is rather less than secure; output files are named, e.g.,
"Scan_2020512_{four random lower-case letters}.pdf" into a web-server-readable
directory.

That gives a total of 456976 different possible filenames per day. It's more
than feasible to brute-force that many filenames in the hour before files get
deleted.
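
The arithmetic behind that estimate can be checked in a few lines. This is only a sketch: the one-hour window comes from the comment above, and the helper name is made up for illustration.

```python
import string

ALPHABET = string.ascii_lowercase  # 26 lower-case letters
SUFFIX_LEN = 4                     # four random letters per file name

# Every possible suffix for a given day.
keyspace = len(ALPHABET) ** SUFFIX_LEN


def rate_to_cover(keyspace, window_seconds=3600):
    """Requests per second needed to try every name within the window."""
    return keyspace / window_seconds


rate = rate_to_cover(keyspace)
```

That works out to roughly 127 requests per second to try every name in an hour, which a single machine can sustain against a typical static file path.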

OP: I don't think randomly-suffixed file names are an inherently bad way to
approach this. But you should definitely consider using a longer random
string, and definitely consider not using the `random` module too (it is not
secure and is not intended to be).

~~~
baicunko
Thank you for the comments. I agree with you. I will decrease how long the
file stays on the server (I just hit 40 GB from Hacker News) and implement
rate limiting to prevent brute forcing.

~~~
lewiscollard
Rate limiting (if by that you mean at the firewall or the web server) is not
the way to do it. That shifts the problem somewhere else in the stack, into a
place that isn't under version control in the same repository.

Consider: If you moved this on to another server, would you remember to enable
rate limiting there? If someone else uses your code, will they know to enable
rate limiting?

Rate limiting isn't a bad idea, but your security should not depend on it,
especially as you have a way of securing it in your application.
base64.b16encode(os.urandom(8)) will give you a 64-bit, filename-safe, as-
close-to-random-as-reasonable suffix that should be long enough to make it
brute-force-proof :)
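
A minimal sketch of that suggestion. The `Scan_<date>_<suffix>.pdf` pattern mirrors the file names quoted earlier in the thread; the function names are hypothetical and this is not the code in the repository.

```python
import base64
import os


def random_suffix(num_bytes=8):
    """Hex-encode bytes from the OS random source (8 bytes = 64 bits)."""
    return base64.b16encode(os.urandom(num_bytes)).decode("ascii")


def output_name(date_stamp):
    """Build an output file name; the pattern is assumed from the thread."""
    return f"Scan_{date_stamp}_{random_suffix()}.pdf"


example = output_name("2020512")
```

The suffix is 16 characters drawn from `0-9A-F`, so it is safe in file names and URLs, and the keyspace grows from 26^4 to 2^64.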

The same reasoning applies to the cron job (I presume) that is cleaning your
files - that's something you have to remember to set up for future
(re-)deployments.

Edit: I'd also like to add that showing your code on HN takes bravery and this
is, in fact, a neat tool that solves a problem I really wish _didn't_ exist.
So, good work on both counts :)

~~~
bwindsor
Or use secrets.token_urlsafe
[https://docs.python.org/3/library/secrets.html#secrets.token...](https://docs.python.org/3/library/secrets.html#secrets.token_urlsafe)
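
A short sketch of what that looks like, with a comparison against the four-letter scheme. The choice of 16 bytes is my assumption, not something from the thread.

```python
import math
import secrets

# 16 random bytes, encoded as URL-safe base64 text without padding.
token = secrets.token_urlsafe(16)

# Entropy of each scheme, in bits.
old_bits = math.log2(26 ** 4)  # four lower-case letters
new_bits = 16 * 8              # 16 random bytes
```

The old suffix carries a little under 19 bits of entropy; the token carries 128, and it only contains letters, digits, `-` and `_`.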

~~~
lewiscollard
Hey thanks - somehow, I entirely missed this being a Thing in the stdlib in 3.
:) I've used it in my PR on baicunko's repo.

------
atum47
Great that you decided to share the source code, but from it I could see that
you left the admin enabled. You can disable that in production:

[https://stackoverflow.com/questions/4845239/how-can-i-disabl...](https://stackoverflow.com/questions/4845239/how-can-i-disable-djangos-admin-in-a-deployed-project-but-keep-it-for-local-de)

~~~
baicunko
Thanks! I will implement this once the traffic from HackerNews decreases a bit
(server is getting totally hammered).

Still there's no admin user configured so it's safe

------
yourapostasy
Not to detract from this, because it is brilliant, and I'll definitely use it in
the future as a last resort.

Before resorting to this, I've found that if I convert the PDF to an image,
and send it as a TIFF file, that is usually what the organization's people are
looking for. I haven't had to do that for years now.

On the extremely rare occasions someone asks if I signed it on "real paper"
(lol), I say with a straight face, "yep, I'm a computer guy, I have a really
good scanner and image software". I do. It's just gathering dust. Last time
that happened was about 5 years ago.

Over 20 years ago, I wrote my signature in thick, black Sharpie across an
entire letter-sized, landscape-orientation page, scanned it with the highest
resolution scanner I could cadge at the time (600 dpi, wooo!), laboriously
cleaned it up, added an alpha channel, then even more laboriously vectorized
it. Ever since then, dropping my signature into PDFs has worked except for
those situations where a physical, wet-signed notarized document was required.

At first I took the trouble to convert the resultant PDF into TIFFs and
digitally sign them. Then with some experimentation I found that flattened and
stripped PDFs without the digital signature were accepted without comment.
Further experimentation revealed to me that only developers like us could even
tell the difference, and plain PDFs with the signature dropped into them
are accepted these days.

Now, I use an Acrobat DC stamp that I converted from the vectorized form, and
haven't touched the old bitmap or vectorized versions in years. Ironically,
the most secure option of digital signatures gave me the most problems.

------
switz
Impressed to see this is your first open source project! What a fantastic
blend of simplicity, technology, and wit.

I love it.

Original (was PDF):
[https://i.imgur.com/v5nn1ql.png](https://i.imgur.com/v5nn1ql.png)

Processed:
[https://www.scanyourpdf.com/media/Scan_2020512_wegb.pdf](https://www.scanyourpdf.com/media/Scan_2020512_wegb.pdf)

~~~
xiconfjs
uploaded again:
[https://www.scanyourpdf.com/media/Scan_2020512_oqkk.pdf](https://www.scanyourpdf.com/media/Scan_2020512_oqkk.pdf)

------
dkonofalski
I can't believe that I'm saying this but this is soooo needed. It's ridiculous
to me how many organizations still require hand-signed copies as if that is
somehow a deterrent to anything.

~~~
giarc
I have to submit my work hours in an excel sheet. There is a section for
"Signature". I didn't put anything since it's an electronic file and my email
sending the file should serve as a "signature". However, my manager insisted I
use a cursive font to type out my signature.

~~~
Faaak
What if your signature can't be spelled (mine doesn't mean anything, it's just
a random symbol)? Ridiculous

------
supernova87a
Not saying this site is, but it makes me think of all the (less legit) file
conversion websites which are basically portals to harvest your documents (or
your aging parents' documents that they don't otherwise know how to convert),
and you later find they appear on crappy sites like Scribd. Or worse.

~~~
renewiltord
I use these to 'accidentally' upload marketing collateral. Got some clickbacks
through unique UTMs so I know it works.

~~~
eob
Any chance you could elaborate a bit? This sounds interesting but I'm not in
the know enough to read between the lines.

You were trying to see if someone was sniffing the documents uploaded (and
confirmed they were)... or you realized you could use them as a vector through
which others would post your materials on websites elsewhere (and they did)?

~~~
renewiltord
The latter. It goes all over the place, so that's cool.

------
WalterBright
I find that digital books are simply too perfect. There should be pdf fonts
where there are maybe 10 incantations of each character, and the display:

1. picks one incantation randomly for each display
2. slightly and randomly alters the position/rotation of each character
3. adds a tiny blotch now and then

Like the print in a real book, especially ones printed before 1970.

I also suggest that the background be an actual scanned image of a blank piece
of paper. Those "paper color" backgrounds are too perfect. Take some blank
pages out of an older book sometime and scan them, and you'll see what I mean.

~~~
jjoonathan
Most of the "book simulation" features I've seen (background textures, page
turn animations, the like) have come across as gimmicky and useless, meanwhile
digital books tend to still suffer from conceptually simple formatting
problems like poor responsive text reflow or baking detailed vector figures
into tiny JPEGs.

I'd settle for "too perfect" in a heartbeat.

~~~
WalterBright
I've sent many suggestions to the Kindle people of things they could improve
on the Kindle, all of which were very simple to do. The years go by, they've
done exactly 0 of them.

One of them, for example, was an option to eliminate the margin in a pdf
display. The pdf already has a margin, so there's the pdf margin plus the
margin the ereader puts around the pdf. This significantly reduces the number
of pixels displaying information.

~~~
bargle0
Making PDF display better doesn't make them any money.

~~~
hombre_fatal
More likely, there are multiple sev5 issues for it in the ticket system that
nobody will _ever_ get around to implementing.

------
j_4
Haha, great job. There's also something to be said for the grim humour of how
technology led us to this point.

Sadly, I'd also be extremely wary of sending the kind of documents that I need
to print out and sign through some server-side black box.

~~~
baicunko
Thanks, I totally agree. That's why I made the code public so you can see
what's being run. A very privacy-minded friend told me a desktop app
could serve those who don't want to upload their documents, so that's
something I'm currently exploring.

~~~
hunter2_
> made the code public so you can see what's being run

The thing is, sharing a repository does not prove that the server is running
that same code. And someone worried about their document security wouldn't run
some random binary locally either, because it could send the document off to a
server. They would run the source code locally after reading it, which sharing
the repository allows for.

------
raldi
Suggestion: Include some before/after examples

~~~
jaifraic
This. I had no idea what this did until I read a couple of HN-comments. The
short introduction could mean a couple of things:

Just downgrading the pdf? Looking for a signature-like part and turning it into
pseudo-handwritten characters, maybe changing the color? Something completely
different?

Naturally, I did not want to upload a potentially confidential document to
some random webservice.

I actually had a pdf file containing several pages of "Lorem ipsum" that I
needed for another thing, but I deleted it yesterday because I was done with
it.

------
jduckles
I've extracted the oneliner command that runs this into a gist of a simple
bash script. I don't want to send my PDFs to an unknown server. Also modified
a bit (density and output compression) to reduce file size.
[https://gist.github.com/jduckles/29a7c5b0b8f91530af5ca3c22b8...](https://gist.github.com/jduckles/29a7c5b0b8f91530af5ca3c22b897e10)

~~~
baicunko
Great idea, I will probably include your gist in the original GitHub
mentioning you

------
9nGQluzmnq3M
Neat site, but is this really necessary? I switched to digital-only PDFs (edit
online & slap in image of signature) a long time ago, without doing any
obfuscation to make them look "real", and I've never gotten any pushback from
the various government agencies, banks, insurance companies etc that insist on
signed & scanned forms.

~~~
andrewingram
I tried to get a refund from MyProtein about 18 months ago because my order
never arrived. I filled in the refund pdf and slapped on a previously scanned
copy of my signature that’s stored in the MacOS Preview app. They rejected it
and said it had to be my real signature. I was really annoyed with them and
haven’t shopped there since.

------
doc_gunthrop
It looks like what is being used for the transformation is:

1) set to grayscale (optional), 2) add blur, 3) slight rotational tilt, 4) add
Gaussian noise (?)

You can go one further by randomly adding tiny artifacts (i.e. specks) to add
even more realism. Maybe even a simulated crease in a corner.

~~~
baicunko
I will look into adding artifacts. Someone else also suggested adding a random
rotational tilt per page to make it look more "legit".
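
One possible way to get a per-page tilt is to build one `convert` command per page with its own angle. This is a rough sketch under assumptions: the `input.pdf[N]` page selector is standard ImageMagick syntax, the flags are the ones quoted elsewhere in this thread, and the `page-NNN.pdf` output names, the one-degree limit and the function name are all made up for illustration.

```python
import random


def page_commands(input_pdf, num_pages, max_tilt=1.0, rng=None):
    """Return one ImageMagick argument list per page, each with a random tilt."""
    # The plain `random` module is fine here: the angle is cosmetic, not a secret.
    rng = rng or random.Random()
    commands = []
    for page in range(num_pages):
        angle = round(rng.uniform(-max_tilt, max_tilt), 2)
        commands.append([
            "convert", "-density", "150", f"{input_pdf}[{page}]",
            "-colorspace", "gray", "+noise", "Gaussian",
            "-rotate", str(angle),
            f"page-{page:03d}.pdf",  # hypothetical per-page output name
        ])
    return commands


cmds = page_commands("input.pdf", 3, rng=random.Random(0))
```

Each list can be passed to `subprocess.run(cmd, check=True)`, and the resulting pages would still need to be merged back into a single PDF in a later step.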

------
secfirstmd
This is cool and would be useful from time to time with some stupid
organisations that insist for some reason on full scanning.

Perhaps one suggestion. Can you update your documentation a bit to make it
easier for someone to be able to implement it themselves? There's not much
about that on the Github and I would guess some people would rather run their
own locally.

------
chmaynard
Apparently I'm the only person on HN that doesn't understand what a "scanned
look" means, and the author doesn't provide any images to illustrate. Could
someone enlighten me?

~~~
baicunko
Sure! I tried to simulate the print, sign and scan of documents to avoid
having to do it.

Sample PDF: [https://campustecnologicoalgeciras.es/wp-content/uploads/201...](https://campustecnologicoalgeciras.es/wp-content/uploads/2017/07/OoPdfFormExample.pdf)

Output PDF:
[https://scanyourpdf.com/media/Scan_2020513_gtqh.pdf](https://scanyourpdf.com/media/Scan_2020513_gtqh.pdf)

EDIT: Forgot to mention that a before and after will be included in the
website as it has been mentioned multiple times as a great way of showing what
the website does!

------
camillomiller
I live in Germany and I never had anyone telling me that a PDF signed
digitally wasn’t enough, especially if they expect you to e-mail it back to
them. Is this a US problem?

~~~
sabertoothed
Just had it in Germany with an application to a bank (DKB). Digitally signed
one was rejected. I need to print on paper, sign and then scan. Signing the
PDF in an iPad app was not OK.

~~~
camillomiller
Oh, interesting, then I guess I was just lucky. DKB is indeed well known for
being quite old-school. I'll add that in my case the need for a printed and
hand-signed copy was always connected to shipping the document or faxing it.
Never through email.

------
davchana
What I do is print the PDF as an image with 60% JPG quality. JPG artifacts make
it look like a normal-quality scan.

I have my 3-4 copies of signatures as font file, along with initials.

------
pingec
Is this project different in some way from many already existing solutions
that do the same?

I like that it is open source and in theory possible to self host since I
really wouldn't want to upload my documents anywhere.

I would really like to know if a similar solution exists that is very easy to
run locally or if it runs in the browser it does everything client-side?

~~~
baicunko
I like to think this solution is way more friendly than using terminal for 99%
of people. What others have suggested is to develop a stand-alone app for more
private documents which can't be sent somewhere else

~~~
pingec
I didn't mean the terminal. Typing "make pdf look like scanned" into Google
returns many websites with the same functionality.

------
SanchoPanda
Your convert and gs mastery is truly impressive, great job and great project.

------
miki123211
I believe printing, signing and scanning should be a punishable crime. It has
a terrible impact on accessibility, as the whole text layer of the PDF is lost
and it becomes unreadable for screen readers.

------
fabatka
Slightly related: add artificial coffee stains to LaTeX documents:
[http://hanno-rein.de/archives/349](http://hanno-rein.de/archives/349)

------
morisy
Only semi-related, but I thought there was an open source PDF-flowing tool
featured on hacker news a while back (that turned PDF into responsive HTML).
Anyone know of something like that?

------
terrycody
The real question is that in some cases you still need to add your signature and
then scan the document again. I hope this website can let you add a signature
automatically...

------
mapster
For signing docs I open PDF in Illustrator and place my signature image then
save and send. Are they really looking for signs of document having been
scanned?

------
thesis
Pretty cool!

What I normally do in a pinch is use Google Drive.. it has a "scan" option
that you can take a picture with your phone.

~~~
jeffbee
FWIW the output can be significantly better using PhotoScan app, also from
Google, no idea why they don’t integrate them.

------
hawaiian
Needs options for adding dog ears and a shadow cast from the top of the page,
and maybe stapler holes.

------
aasasd
Funny thing is, some countries require documents to be signed on paper and
then scanned. E.g. Japan.

------
Havoc
What are the legitimate uses of this? The only uses I can think of are less
than kosher.

~~~
thanksforfish
Maybe you are at home and you need to sign something, but you don't have a
printer and scanner because it's 2020 and those are seldom needed. You can use
this and move on with your life, or drive to a Kinkos or other place that will
let you print and scan for a small fee. The latter may be a waste of time.

Wait, what less than kosher method were you thinking?

------
lilblockchains
That's a pretty cool project. Do you mind explaining your deployment process?

~~~
baicunko
I will include a more in-depth deployment process in Github to simplify
implementing this by anyone

------
electriclove
Can this be done client side so the PDF does not need to be uploaded anywhere?

~~~
baicunko
I will implement a desktop stand-alone app for those private documents which
can't be uploaded somewhere else!

------
sergiotapia
can you put an example picture side by side, right on the frontpage

~~~
baicunko
Yes, I think it's a great idea. I'll work on this through the weekend and
update the website with other feedback received. Thanks

------
krick
Seriously, how stupid it is that this can be actually useful.

------
greenknight
You may want to purge the key in your repo (settings.py)

------
IRM
Have somebody download ROBLOX on a school computer

------
underlines
Why not delete the file after one download?

~~~
baicunko
I thought someone might want to share their document with someone else. In a
future release I may include a maximum number of downloads or a time limit
(e.g. delete after 2 downloads or 30 minutes).
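
A rough sketch of such a policy, reading the numbers above as "two downloads or thirty minutes, whichever comes first". The function and constant names are hypothetical, and a real version would live in the application rather than in a separate cron job, as suggested earlier in the thread.

```python
from datetime import datetime, timedelta

MAX_DOWNLOADS = 2                 # assumed limit, from the comment above
MAX_AGE = timedelta(minutes=30)   # assumed lifetime, from the comment above


def should_delete(download_count, created_at, now):
    """True once the file hit its download limit or outlived its time limit."""
    too_many = download_count >= MAX_DOWNLOADS
    too_old = now - created_at >= MAX_AGE
    return too_many or too_old


created = datetime(2020, 5, 12, 12, 0, 0)
fresh = should_delete(1, created, created + timedelta(minutes=5))
used_up = should_delete(2, created, created + timedelta(minutes=5))
expired = should_delete(0, created, created + timedelta(minutes=30))
```

Checking this on every download request means the limit is enforced even if a scheduled cleanup job is never set up on a new deployment.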

------
ecnahc515
Also semi-related:
[http://ismycreditcardstolen.com/](http://ismycreditcardstolen.com/)

------
bastard_op
I just edit files with the free Linux version of masterpdf, and send them
back with a transparent png of my signature merged. Good enough.

------
talkinghead
example on landing page would be great

------
ladybro
Genius. Nice work.

------
IRM
have somebody download Roblox on school computer

------
abiogenesis
Obligatory xkcd reference: [https://xkcd.com/1683/](https://xkcd.com/1683/)

------
behnamoh
To OP:

the server is down.

~~~
baicunko
Seems OK on my side. What are you seeing?

~~~
behnamoh
it works fine now.

Just one comment: maybe you could randomize the rotation angle so that all
pages don't look the same.

