From the GitHub repo, the site is a wrapper around exactly two shell commands. Instead of uploading your data to an untrusted site, you can run them from the comfort and safety of your local computer:
Consider using `-depth 1` or `-depth 3` as a final parameter to map colors to 2¹=2 or 2³=8 gray levels instead of 2²=4. Using a small number of gray levels SIGNIFICANTLY reduces file size and also gives your pseudo-scanned document a more pixelated, it-just-came-out-of-my-old-printer look.
Also consider using `-density 100` or even `-density 75` for long text documents. Using a density of 75 dpi produces documents that are 4x smaller than 150 dpi (75²=150²/4) and doesn't affect the readability of normal-sized (10-12pt) text that much.
Finally, sometimes it works best not to add Gaussian noise.
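To sanity-check the size arithmetic above, here is a rough sketch. It assumes a US Letter page (8.5 x 11 in) and counts only uncompressed pixel data, so actual PDF sizes will differ once compression kicks in; `raw_page_bytes` is a made-up helper name, not part of any tool mentioned here.

```python
# Rough, uncompressed size of one rasterized grayscale page.
# Assumption: US Letter page (8.5 x 11 inches). Real PDFs are compressed,
# so these numbers only show how size scales with density and bit depth.

PAGE_WIDTH_IN = 8.5
PAGE_HEIGHT_IN = 11.0


def raw_page_bytes(density_dpi: float, depth_bits: int) -> float:
    """Return the uncompressed pixel data size of one page, in bytes."""
    pixels = (density_dpi * PAGE_WIDTH_IN) * (density_dpi * PAGE_HEIGHT_IN)
    return pixels * depth_bits / 8


if __name__ == "__main__":
    for density in (150, 100, 75):
        for depth in (1, 2, 3):
            size_kib = raw_page_bytes(density, depth) / 1024
            print(
                f"{density:>3} dpi, {depth} bit(s), "
                f"{2 ** depth} gray levels: ~{size_kib:,.0f} KiB raw"
            )
```

Halving the density quarters the pixel count, which is where the 4x figure comes from; the bit depth scales the raw size linearly on top of that.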
If you lower the resolution, those auto-scanning services that do OCR on the documents will have problems converting the scan back to normal text. Which might or might not be the point of doing this in the first place... :-)
The output file size relative to other approaches will depend on the contents of the input file (e.g., does it embed any bitmaps, or is it only text in non-rasterized fonts?), the density you use (e.g., 150 dpi has 4x as many pixels as 75 dpi), the bit depth you use (3 bits gives 4x as many gray levels as 1 bit), and whether you add noise or not. Also, the one-line command above your comment is short/easy to remember and reasonably fast -- which is useful if you're impatient like me.
All that said, note that you can always add a `-compress <compression method>` to the convert command, although this could make it noticeably slower, say, for a directory full of long documents. There's even a Fax TIFF compression method that makes the document look like it came out of one of those fax machines you see in movies from the 1980s. To get a list of supported compression methods, type `convert -list compress` at the command line.
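If you end up trying several combinations of these options, a small wrapper can assemble the command for you. This is only a sketch: the flags are the ones quoted in this thread (`-density`, `-colorspace gray`, `+noise Gaussian`, `-rotate 0.5`, `-depth`, `-compress`), while the function name, defaults and file names are my own assumptions, and it presumes ImageMagick's `convert` is on your PATH.

```python
import shlex
from typing import List, Optional


def build_scan_command(
    src: str,
    dst: str,
    density: int = 150,
    depth: int = 2,
    noise: bool = True,
    compress: Optional[str] = None,
) -> List[str]:
    """Assemble an ImageMagick argument list from the options discussed here."""
    cmd = ["convert", "-density", str(density), src, "-colorspace", "gray"]
    if noise:
        # Skip this when the document looks better without Gaussian noise.
        cmd += ["+noise", "Gaussian"]
    cmd += ["-rotate", "0.5"]
    if compress is not None:
        # Pick a value from the output of `convert -list compress`.
        cmd += ["-compress", compress]
    # Bit depth goes last, just before the output file.
    cmd += ["-depth", str(depth), dst]
    return cmd


if __name__ == "__main__":
    # "input.pdf" and "scanned.pdf" are placeholder file names.
    command = build_scan_command("input.pdf", "scanned.pdf", density=75, depth=1)
    print(shlex.join(command))
    # To actually run it: subprocess.run(command, check=True)
```

Building a list rather than a shell string avoids quoting problems with file names that contain spaces.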
Anyone unsophisticated enough to require you to print and handwrite a signature is also unsophisticated enough to not realize that the PDF size is a tell.
In summary, you could say yes, but I wanted to simplify the process for the not-so-techy person. I will work on a standalone app that simplifies the command and shows a nice GUI for more private documents.
I hope you don’t take the comment you’re responding to as anything negative. I’m a software engineer and I would use this myself rather than try and remember some arcane command line utility.
Not at all, I'm actually quite happy with all the comments, ideas and feedback I've received on this, really encourages me to keep improving the project
For other noobish Windows users like myself, you can simply dump the following into a bat file. When run, it'll look for a file called "scanThis.pdf" and apply the conversion with ImageMagick.
IF EXIST scanThis.pdf (magick convert -density 100 scanThis.pdf -colorspace gray +noise Gaussian -rotate 0.5 -depth 2 SCANNED.pdf) ELSE (ECHO File scanThis.pdf not found & PAUSE)
Nope that's it. The bat file here is just a clickable shortcut so you don't have to remember the whole command. The interesting stuff is happening in ImageMagick, which has to be installed already.
I recently came across a couple of institutions which required me to print, sign and send back a couple of documents. COVID and all of that means I don't have a printer at home. I made this website, inspired by other posts here, and now it's free to use! Code is open source, so feel free to comment with any new ideas or things you would like included!
> COVID and all of that means I don't have a printer at home.
I recently felt privileged enough to get a printer at home in the workroom of my 2bd apartment in San Francisco.
And yet, it ran out of toner! The notaries wanted documents already printed! The print shops are all closed! What the deuce!
I got one to sympathize with me. She wouldn't take my flash drive, but ran out of excuses when I said I had the files in my iPhone's files section and could email them.
Unless there is a serious challenge to the contract no one cares. You obviously are okay with the contract and are signing it (if only in an obfuscated, non-physical sense), and they're just looking for a signature that could be kinda real -- if you're good with the deal, and they're good, who cares?
But scanning these documents defeats the point. The law isn't saying "it must be signed with wet ink, but after that a scanned copy is sufficient", it says you need the actual paper with the wet ink signature.
> If you’re a solicitor responsible for registration, you need the other parties' solicitors, on closing, to send you the wet-ink signed parts of the documents. You’ll only proceed when you have the wet-ink signed documents, not when you have seen electronic images of them.
I usually sign documents on my iPad, but in this case they told me the signature had to be from a pen. COVID and everything means I don't have a printer or a scanner at home, so I developed this to "scan" my iPad-signed documents. Worked like a charm!
It's interesting that the problem they had was with the signature. You can very easily make a handwritten signature on an iPad. I don't see how there's any practical way to tell the difference between a high-quality, handwritten signature done on an iPad with either a finger or an Apple Pencil and a scanned signature done with a pen.
Are you sure that's not just some excuse they gave you?
Interestingly, there are LaTeX packages for coffee stains[1] and to simulate the variability of an actual typewriter[2]. I suppose both of these involve discrete objects (the coffee stain, individual characters), whereas photocopy/scan effects involve 'fuzzing' of the whole image rather than any particular element, and TeX isn't really set up for the latter.
Banks that need monthly financials to assure regulators that they are servicing their loans are also required to justify the documents to the regulator. If you just autogenerated them, perhaps they’re wrong or falsified, while if you have to take out a pen and sign them, that won’t happen.
That's beyond absurd. The content of the document is what determines whether it's falsified or not, and you can inspect the content for inconsistencies on a generated PDF far more easily than a scanned one.
In this case I think it’s the business auditors (PwC etc.) assuring “best practices” that will satisfy the regulators and thus the regulatory auditors (Federal Reserve, Treasury, FDIC...).
Banks are themselves full of absurd practices but in this one tiny area I’d give them a pass.
Let me clarify by stating that I am not necessarily claiming that there is/was a software issue with DocuSign. There are also some big procedural differences between signing a hard copy and the way most electronic signature systems work. I could have been fooled by the user.
Regardless, I am sure there has been a version of some document signing software somewhere that had a bug with immutability. Now paper documents aren't immutable either, but you couldn't accidentally change one with a bad where clause in a SQL statement, and casual attempts will leave evidence of the change.
BTW this is why numbers in contracts are often written “1 (one)”: because in the age of pens it was easy to change “1” to “10” or “4”, or “one” to “none”.
Also why checks do this: checks are actually simply contracts, which were standardized in the US a bit in the 1920s (you’d get a check form, fill in your name and the bank name, then the amounts, etc.); the current check design with preprinted bank info dates back, I believe, to the late 1960s.
I had the same issue recently. What I did was sign a blank sheet of paper, take a pic of it with my phone, and copy-paste the text over the blank page with GIMP.
I used to do that, but nowadays I tend to sign things on iPad. When using the stylus it looks exactly like it was hand signed (even without the stylus, I can just zoom way in and make it look kinda hand signed). The PDF is of course not scanned-looking, but nobody has complained yet.
I am fairly close to various expat communities in my city. "Where can I print this document" is a very common question. It also affects tourists who need printed tickets that are only available a few days before their flight.
It occurs to me that you could do this with a lower-tech method: just sign your name on a piece of Scotch tape, align the tape on the monitor over the document, take a photo with your phone, and then use that as the 'scanned image' in the PDF file.
Original: Please don't upload any private or confidential pdfs right now. I emailed OP two security concerns that trivially allow anybody to see any of the converted pdfs.
This is rather less than secure; output files are named, e.g., "Scan_2020512_{four random lower-case letters}.pdf" into a web-server-readable directory.
That gives a total of 456976 different possible filenames on a day. It's more than feasible to brute-force that many filenames in the hour before files get deleted.
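For anyone who wants to check that figure, here is the arithmetic as a short sketch. It assumes the suffix really is four lower-case ASCII letters and that files live for one hour, as described above; the constant names are mine.

```python
import string

ALPHABET = string.ascii_lowercase  # 26 lower-case letters
SUFFIX_LENGTH = 4                  # "four random lower-case letters"
RETENTION_SECONDS = 60 * 60        # files are deleted after about an hour

# Every possible suffix for a given date prefix.
candidates = len(ALPHABET) ** SUFFIX_LENGTH

# Request rate needed to try all of them before the files disappear.
requests_per_second = candidates / RETENTION_SECONDS

print(f"{candidates} possible filenames per day")
print(f"~{requests_per_second:.0f} requests/second to cover them all in an hour")
```

That works out to roughly 127 requests per second, which a single machine can manage without much effort.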
OP: I don't think randomly-suffixed file names are an inherently bad way to approach this. But you should definitely consider using a longer random string, and definitely consider not using the `random` module too (it is not secure and is not intended to be).
Thank you for the comments. I agree with you. I will decrease how long the file stays on the server (I just hit 40 GB from Hacker News) as well as implement rate limiting to prevent any brute force.
Rate limiting (if by that you mean at the firewall or the web server) is not the way to do it. That shifts the problem somewhere else in the stack, into a place that isn't under version control in the same repository.
Consider: If you moved this on to another server, would you remember to enable rate limiting there? If someone else uses your code, will they know to enable rate limiting?
Rate limiting isn't a bad idea, but your security should not depend on it, especially as you have a way of securing it in your application. `base64.b16encode(os.urandom(8))` will give you a 64-bit, filename-safe, as-close-to-random-as-reasonable suffix that should be long enough to make it brute-force-proof :)
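Spelled out a bit more, that suggestion could look like the sketch below. The `base64.b16encode(os.urandom(8))` call is the one from the comment; the function names and the `Scan_<date>_<suffix>.pdf` pattern are assumptions based on the filename quoted earlier in the thread, and `secrets.token_hex` is shown as a standard-library alternative that does much the same thing.

```python
import base64
import os
import secrets


def random_suffix_b16(num_bytes: int = 8) -> str:
    """Return num_bytes of OS randomness as upper-case hex (16 chars for 8 bytes)."""
    raw = os.urandom(num_bytes)
    # b16encode returns bytes, so decode to get a str usable in a filename.
    return base64.b16encode(raw).decode("ascii")


def random_suffix_token(num_bytes: int = 8) -> str:
    """Same idea via the secrets module; returns lower-case hex."""
    return secrets.token_hex(num_bytes)


def output_filename(date_stamp: str) -> str:
    """Build a name like Scan_<date>_<suffix>.pdf (pattern assumed from the thread)."""
    return f"Scan_{date_stamp}_{random_suffix_b16()}.pdf"


if __name__ == "__main__":
    print(output_filename("2020512"))
    print(random_suffix_token())
```

With 64 random bits there are 2**64 possible names per day instead of 456976, so guessing one within the retention window stops being practical.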
The same reasoning applies to the cron job (I presume) that is cleaning your files - that's something you have to remember to set up for future (re-)deployments.
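One way to keep the cleanup inside the application, so it travels with the code, is a small function the app calls itself (for example before handling each upload). This is a minimal sketch under my own assumptions: the function name, the `*.pdf` pattern and the one-hour limit are illustrative, not taken from the project.

```python
import time
from pathlib import Path
from typing import List, Optional


def delete_expired_files(
    directory: Path,
    max_age_seconds: float,
    now: Optional[float] = None,
) -> List[Path]:
    """Delete PDFs in directory older than max_age_seconds; return what was removed."""
    now = time.time() if now is None else now
    removed = []
    for path in directory.glob("*.pdf"):
        try:
            if not path.is_file():
                continue
            age = now - path.stat().st_mtime
            if age > max_age_seconds:
                path.unlink()
                removed.append(path)
        except FileNotFoundError:
            # Another request may have deleted the file in the meantime.
            continue
    return removed


if __name__ == "__main__":
    # "converted" is a hypothetical output directory name.
    output_dir = Path("converted")
    if output_dir.is_dir():
        for path in delete_expired_files(output_dir, max_age_seconds=3600):
            print(f"removed {path}")
```

Because the retention rule lives in the repository, a redeployment or a fork gets it automatically, which is the point being made about cron.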
Edit: I'd also like to add that showing your code on HN takes bravery and this is, in fact, a neat tool that solves a problem I really wish didn't exist. So, good work on both counts :)
I know this doesn't really add much to the discussion, I just wanted to let you know I really, really appreciate HN over other sites for comments like this. Ones that help you learn something new in a really intuitive and on top of that "non-condescending" way (for lack of a better word I can think of). Thank you!
Hey, I know I was not as positive & encouraging as I should have been initially, hence the edit on the end. But thank you for the kind words mate, that actually means a lot to me. <3
thanks! took me quite a while to prepare as I read a bunch of other servers failing catastrophically when posting on HN due to the sheer amount of traffic.
I will start working on your comments throughout the weekend, I agree with most of them. Would love for you to follow the github page for any other comments you may have, all are appreciated
It was while reading my above comments that I realised I should have shut up and contributed code instead, because that's definitely more helpful than being critical on HN, especially to a newcomer & their first project.
So that is what I've decided to do! First step: a PR coming out of getting this up and running on my Ubuntu box. :)
Dotenv libraries are just for dev and other similar environments. In production you should still use normal environment variables (or whatever system you use to load your configuration), as dotenv files stay on the filesystem and are sometimes even committed to your SCM.
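In practice that can be as simple as reading the process environment directly and failing loudly when something is missing. A minimal sketch, assuming a setting called `SECRET_KEY` (a hypothetical name used only for illustration) and a helper of my own naming:

```python
import os


class MissingSettingError(RuntimeError):
    """Raised when a required environment variable is not set."""


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear error."""
    value = os.environ.get(name)
    if not value:
        raise MissingSettingError(f"environment variable {name} is not set")
    return value


if __name__ == "__main__":
    try:
        # SECRET_KEY is a hypothetical setting name.
        secret_key = require_env("SECRET_KEY")
    except MissingSettingError as exc:
        raise SystemExit(str(exc))
    print("configuration loaded")
```

Failing at startup is deliberate here: a missing secret is caught on deploy instead of silently falling back to a default.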
haha this is like those domain name search websites that just automatically register the good sounding domain names for themselves once the user types it in.
do you OP! I think it still provides a service, enjoy all the secrets
I thought GitHub had hooks for this kind of thing now? I remember it caught a private key I tried to push to a similar Django repo (not for a prod site or anything), and that was about 2 years ago.
Not to detract from this, because it is brilliant, and I'll definitely use it in the future as a last resort.
Before resorting to this, I've found that if I convert the PDF to an image, and send it as a TIFF file, that is usually what the organization's people are looking for. I haven't had to do that for years now.
On the extremely rare occasions someone asks if I signed it on "real paper" (lol), I say with a straight face, "yep, I'm a computer guy, I have a really good scanner and image software". I do. It's just gathering dust. Last time that happened was about 5 years ago.
Over 20 years ago, I wrote my signature in thick, black Sharpie across an entire letter-sized, landscape-orientation page, scanned it with the highest resolution scanner I could cadge at the time (600 dpi, wooo!), laboriously cleaned it up, added an alpha channel, then even more laboriously vectorized it. Ever since then, dropping my signature into PDFs has worked except for those situations where a physical, wet-signed notarized document was required.
At first I took the trouble to convert the resultant PDF into TIFFs and digitally sign them. Then with some experimentation I found that flattened and stripped PDFs without the digital signature were accepted without comment. Further experimentation revealed to me that only developers like us could even tell the difference, and plain PDFs where I dropped the signature into them are accepted these days.
Now, I use an Acrobat DC stamp that I converted from the vectorized form, and haven't touched the old bitmap or vectorized versions in years. Ironically, the most secure option of digital signatures gave me the most problems.
I can't believe that I'm saying this but this is soooo needed. It's ridiculous to me how many organizations still require hand-signed copies as if that is somehow a deterrent to anything.
I have to submit my work hours in an Excel sheet. There is a section for "Signature". I didn't put anything since it's an electronic file and my email sending the file should serve as a "signature". However, my manager insisted I use a cursive font to type out my signature.
Related, I was once applying for an apartment lease. I had entered an incorrect value on some field on one of the forms. The kind person emailed me the form pointing out the mistake and told me I would have to fix it before they could accept the application and would have to come back in.
Not wanting to drive a few hours, I exported the unsigned PDF as images, quickly made the fix, converted it back to PDF, and sent it away with a message saying not to worry, I fixed the problem.
Then communication ceased. I couldn’t get a reply. When I called, they put me on hold and said they were no longer taking applications (they claimed many units were available before).
I like to think the pristine color and noise matching, from my especially mediocre photoshop skill, was too convincing, and made them worry.
Not saying this site is, but it makes me think of all the (less legit) file conversion websites which are basically portals to harvest your documents (or your aging parents' documents that they don't otherwise know how to convert), and you later find they appear on crappy sites like Scribd. Or worse.
Any chance you could elaborate a bit? This sounds interesting but I'm not in the know enough to read between the lines.
You were trying to see if someone was sniffing the documents uploaded (and confirmed they were).... or you realized you could use them as a vector through which others would post your materials on websites elsewhere (and they did)?
I find that digital books are simply too perfect. There should be PDF fonts where there are maybe 10 variants of each character, and the display:
1. picks one variant randomly for each display
2. slightly and randomly alters the position/rotation of each character
3. adds a tiny blotch now and then
Like the print in a real book, especially ones printed before 1970.
I also suggest that the background be an actual scanned image of a blank piece of paper. Those "paper color" backgrounds are too perfect. Take some blank pages out of an older book sometime and scan them, and you'll see what I mean.
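The per-character randomization listed above is easy to prototype. Here is a sketch that only computes the layout decisions (which variant, how much offset and rotation, whether to add a blotch) and leaves the actual drawing to whatever renderer you use. The `Glyph` class, the function name, and every numeric default are assumptions of mine, picked just to make the idea concrete.

```python
import random
from dataclasses import dataclass
from typing import List

VARIANTS_PER_CHARACTER = 10  # "maybe 10" variants of each character


@dataclass
class Glyph:
    char: str
    variant: int         # which of the character's variants to draw
    dx: float            # horizontal offset, in points
    dy: float            # vertical offset, in points
    rotation_deg: float  # small rotation, in degrees
    blotch: bool         # whether to add a tiny ink blotch


def layout_text(
    text: str,
    rng: random.Random,
    max_offset: float = 0.3,
    max_rotation: float = 0.8,
    blotch_probability: float = 0.01,
) -> List[Glyph]:
    """Pick a random variant, jitter and occasional blotch for each character."""
    glyphs = []
    for char in text:
        glyphs.append(
            Glyph(
                char=char,
                variant=rng.randrange(VARIANTS_PER_CHARACTER),
                dx=rng.uniform(-max_offset, max_offset),
                dy=rng.uniform(-max_offset, max_offset),
                rotation_deg=rng.uniform(-max_rotation, max_rotation),
                blotch=rng.random() < blotch_probability,
            )
        )
    return glyphs


if __name__ == "__main__":
    for glyph in layout_text("Call me Ishmael.", random.Random()):
        print(glyph)
```

Passing in a seeded `random.Random` keeps a given page looking the same on every redraw, while an unseeded one re-rolls the imperfections each time it is displayed.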
Most of the "book simulation" features I've seen (background textures, page turn animations, the like) have come across as gimmicky and useless, meanwhile digital books tend to still suffer from conceptually simple formatting problems like poor responsive text reflow or baking detailed vector figures into tiny JPEGs.
I've sent many suggestions to the Kindle people of things they could improve on the Kindle, all of which were very simple to do. The years go by, they've done exactly 0 of them.
One of them, for example, was an option to eliminate the margin in a pdf display. The pdf already has a margin, so there's the pdf margin plus the margin the ereader puts around the pdf. This significantly reduces the number of pixels displaying information.
Yes. People wrongly argue that vinyl sounds are more accurate. CDs are more accurate. But CDs are perfect, and for us older folks, the sound of vinyl, with its scratches, pops, crackle, rumble, and crosstalk has a comfortable nostalgic appeal.
Thanks, I totally agree. That's why I made the code public so you can see what's being run. A friend very privacy-minded told me maybe a desktop app could be used by those who don't want to upload their documents so that's something I'm currently exploring
> made the code public so you can see what's being run
The thing is, sharing a repository does not prove that the server is running that same code. And someone worried about their document security wouldn't run some random binary locally either, because it could send the document off to a server. They would run the source code locally after reading it, which sharing the repository allows for.
This is very cool, I have the same concern. Most documents I need to scan like this have my PHI or PII on them and I wouldn't trust uploading to a third-party, and especially free, site. What I will add to the conversation is that seeing the source code and trusting that you're running that exact code are different things. Would love to see this as something I could run locally after cloning the repo, as a Python or npm script or something. Very cool work overall!
This. I had no idea what this did until I read a couple of HN-comments.
The short introduction could mean a couple of things:
Just downgrading the PDF? Looking for a signature-like part and turning it into pseudo-handwritten characters, maybe changing the color? Something completely different?
Naturally, I did not want to upload a potentially confidential document to some random webservice.
I actually had a pdf file containing several pages of "Lorem ipsum" that I needed for another thing, but I deleted it yesterday because I was done with it.
I've extracted the oneliner command that runs this into a gist of a simple bash script. I don't want to send my PDFs to an unknown server. Also modified a bit (density and output compression) to reduce file size. https://gist.github.com/jduckles/29a7c5b0b8f91530af5ca3c22b8...
Neat site, but is this really necessary? I switched to digital-only PDFs (edit online & slap in image of signature) a long time ago, without doing any obfuscation to make them look "real", and I've never gotten any pushback from the various government agencies, banks, insurance companies etc that insist on signed & scanned forms.
I tried to get a refund from MyProtein about 18 months ago because my order never arrived. I filled in the refund pdf and slapped on a previously scanned copy of my signature that’s stored in the MacOS Preview app. They rejected it and said it had to be my real signature. I was really annoyed with them and haven’t shopped there since.
This is cool and would be useful from time to time with some stupid organisations that insist for some reason on full scanning.
Perhaps one suggestion. Can you update your documentation a bit to make it easier for someone to be able to implement it themselves? There's not much about that on the Github and I would guess some people would rather run their own locally.
Apparently I'm the only person on HN that doesn't understand what a "scanned look" means, and the author doesn't provide any images to illustrate. Could someone enlighten me?
EDIT: Forgot to mention that a before and after will be included in the website as it has been mentioned multiple times as a great way of showing what the website does!
I live in Germany and I never had anyone telling me that a PDF signed digitally wasn’t enough, especially if they expect you to e-mail it back to them.
Is this a US problem?
Just had it in Germany with an application to a bank (DKB). Digitally signed one was rejected. I need to print on paper, sign and then scan. Signing the PDF in an iPad app was not OK.
Oh, interesting, then I guess I was just lucky.
DKB is indeed well known for being quite old-school.
I'll add that in my case the need for a printed and hand-signed copy was always connected to shipping the document or faxing it. Never through email.
I live in Italy and ING Direct did this to me. The same happened in Belgium with KBC, so I think some banks might still be doing this here, unfortunately!
I like to think this solution is way more friendly than using terminal for 99% of people. What others have suggested is to develop a stand-alone app for more private documents which can't be sent somewhere else
I believe printing, signing and scanning should be a punishable crime. It has a terrible impact on accessibility, as the whole text layer of the PDF is lost and it becomes unreadable for screen readers.
Only semi-related, but I thought there was an open source PDF-flowing tool featured on Hacker News a while back (that turned PDF into responsive HTML). Anyone know of something like that?
The real question is that in some cases you still need to sign your signature and then scan the document again. I hope this website can let you add a signature automatically...
For signing docs I open PDF in Illustrator and place my signature image then save and send. Are they really looking for signs of document having been scanned?
Maybe you are at home and you need to sign something, but you don't have a printer and scanner because it's 2020 and those are seldom needed. You can use this and move on with your life, or drive to a Kinkos or other place that will let you print and scan for a small fee. The latter may be a waste of time.
Wait, what less than kosher method were you thinking?
People who require "print, sign, scan" do so for the purpose of avoiding forgery, not because they benefit from any other aspect of the procedure. So long as you aren't putting someone else's signature onto the document, everything is kosher because you've deprived them of nothing given that forgery was successfully avoided.
It's like if all devices in your house have no internet connection, and your ISP support rep asks you to reboot your computer. You say "Ok it just rebooted" without rebooting because the reason for the procedure is simply to ensure that it's not an issue local to one device, and you have ensured that for them (albeit by alternative means).
I thought someone might want to share their document with someone else. In a future release I may include a limit on downloads (e.g. 2 downloads or 30 minutes, then delete).
I once ran into this at work. Some (fairly old) Android versions send two requests under certain circumstances. Maybe someone else can elaborate on this.