Show HN: Simple PDF to PNG Server

slowenough · on Nov 15, 2019

Actually it just uses ImageMagick and soffice under the hood, so it can do way more than PDF. I've tested it with DOC, DOCX, XLS, and they all seem to work.

Edit: in case anyone wants an example, here's the conversion of tesla-model-s.pdf[0]:

https://secureview.cloudbrowser.xyz/uploads/filekat9.v05lnle...

[0]: https://www.tesla.com/sites/default/files/tesla-model-s.pdf

slowenough · on Nov 15, 2019

Sorry, I'm afraid I just cleaned the upload directory. HN, you have converted 1.6G of files.

Here's that Tesla example link again:

https://secureview.cloudbrowser.xyz/uploads/file5rc5.9nhts2j...

slowenough · on Nov 16, 2019

Forgive me, I've cleaned the uploads directory again. Over 3.5G of files converted.

Here's that Tesla link (I'm making it permanent now):

https://secureview.cloudbrowser.xyz/uploads/filecp9z.ohbtt6u...

umvi · on Nov 18, 2019

Sounds like you need to auto-clean your uploads directory
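
A minimal sketch of what auto-cleaning could look like, assuming the uploads sit in one flat directory and anything older than a fixed age is safe to delete. The function name, the 24-hour cutoff, and the idea of running it from cron or a timer are assumptions for illustration, not something taken from the project.

```python
import os
import time


def clean_uploads(upload_dir, max_age_seconds=24 * 60 * 60, now=None):
    """Delete regular files in upload_dir older than max_age_seconds.

    Returns the sorted names of the files that were removed.
    """
    now = time.time() if now is None else now
    removed = []
    for entry in os.scandir(upload_dir):
        # Skip subdirectories and symlinks; only plain files are touched.
        if not entry.is_file(follow_symlinks=False):
            continue
        age = now - entry.stat(follow_symlinks=False).st_mtime
        if age > max_age_seconds:
            os.remove(entry.path)
            removed.append(entry.name)
    return sorted(removed)
```

Anything meant to be permanent (like the Tesla example link) would have to live outside that directory or be skipped by name.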

FreeHugs · on Nov 15, 2019

Nice. What is your tech stack to serve it?

slowenough · on Nov 15, 2019

Just a digital ocean droplet running FreeBSD, then PM2 running Node.JS and ImageMagick built with multicore support.

kodablah · on Nov 15, 2019

One wonders if you can WASM-ify ImageMagick and then use a JS FileReader and a canvas or other PNG output approach to do this all client side. Hrmm, I might tackle this myself if I get the time...

johnsonjo · on Nov 16, 2019

Looks like someone has gotten pretty far WASM-ifying ImageMagick already [1]!

[1]: https://github.com/KnicKnic/WASM-ImageMagick

z92 · on Nov 15, 2019

ImageMagick with "multicore" support won't speed things up once the number of concurrent requests reaches the number of cores you have. And in digital ocean's case you have only one core.

slowenough · on Nov 15, 2019

Actually I created a Droplet with 6 cores

Koshkin · on Nov 15, 2019

But remember that multi-threading was shown to improve performance in the era of single-core computers.

mlyle · on Nov 15, 2019

Sure, but if your concurrent number of fully tendered requests at least sometimes dips below the number of cores, it is a big win.

(At the possible cost of top-end scale; improving response to normal workload significantly and hurting the top workload reachable somewhat).
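
To put rough numbers on that tradeoff, here is a sketch that assumes the server caps ImageMagick's threads per job through the MAGICK_THREAD_LIMIT environment variable and splits the cores evenly among whatever jobs are running at that moment. The function names and the even-split policy are made up for illustration; the actual server may do nothing like this.

```python
import os


def threads_per_job(cores, active_jobs):
    """Split cores evenly across running jobs, never going below one thread."""
    return max(1, cores // max(1, active_jobs))


def magick_env(cores, active_jobs):
    """Environment for one conversion, with ImageMagick's thread count capped."""
    env = dict(os.environ)
    env["MAGICK_THREAD_LIMIT"] = str(threads_per_job(cores, active_jobs))
    return env
```

On a 6-core machine a lone request gets all 6 threads, three concurrent requests get 2 each, and from six requests up each gets 1, which is the point where multicore support stops helping any single request.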

dandanio · on Nov 15, 2019

Github? I would love to contribute.

slowenough · on Nov 15, 2019

Yep, I just put it up now

https://github.com/crislin2046/p2.

michaelmior · on Nov 15, 2019

Note that HN (and probably many others) won't correctly autolink URLs with a period at the end. It's also somewhat confusing. You may want to consider removing it.

slowenough · on Nov 15, 2019

Oh! Sorry. Let me change that

http://tiny.cc/p2github

edit: I like the dot ¯\_(ツ)_/¯

yegle · on Nov 15, 2019

Package it as a docker image and you can then run it on cloud.run (Might be expensive though).

Disclaimer: I work for Google Cloud.

slowenough · on Nov 23, 2019

Hey, this is actually atop FreeBSD so it is not as simple as just making a Dockerfile.

It shouldn't be too much tho, perhaps just making an install_deps.sh script alternative to use apt, and writing a Dockerfile.

You're welcome to submit a PR but I might get to it first.

Edit: I just made some changes relating to this, including a Dockerfile. I have not tested in Docker, but it now works on *nix (Debian)

marcus_holmes · on Nov 15, 2019

what's your experience running soffice server-side? I might have to do this soon, and I'm dreading it.

slowenough · on Nov 15, 2019

My experience is relatively small.

The `convert` script of ImageMagick calls soffice by default if the input file is a LibreOffice type.

I actually started writing a shell script to detect if it was DOCX etc, and run soffice first, but then I found out that ImageMagick does that automatically, if libreoffice is installed.

marcus_holmes · on Nov 15, 2019

hmm, OK, I didn't know that. Thanks that's useful :)

Be interesting to see how your demo site holds up under HN front page traffic - I know ImageMagick is used a lot for random web sites, but I worry about scale a lot. Any info would be useful!

slowenough · on Nov 15, 2019

no probs. if you want to email me I can let you know once it all quiets down.

marcus_holmes · on Nov 15, 2019

sent :)

lixtra · on Nov 15, 2019

We’ve been running it as part of alfresco. It’s not very stable and hangs every few weeks. Since we kill it every night we didn’t notice any problems.

marcus_holmes · on Nov 16, 2019

good tip, thanks

ztravis · on Nov 15, 2019

It's actually not too bad. You can run a pool of soffice processes, but unless you really care about the startup time, I'd suggest running individual processes per job, which you can run headless from the command line. For the most part libreoffice does well, although at some point you'll start to discover the quirks in their rendering...
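
A hedged sketch of the one-process-per-job approach, assuming `soffice` is on the PATH. The function names, the 120-second timeout, and the throwaway profile directory per job are assumptions for illustration; the timeout is there because a hung soffice would otherwise block the job forever.

```python
import subprocess
import tempfile
from pathlib import Path


def soffice_command(input_path, out_dir, profile_dir):
    """Build the argv for a one-shot headless LibreOffice conversion to PDF."""
    return [
        "soffice",
        # A separate profile per job so parallel runs don't share state.
        "-env:UserInstallation=" + Path(profile_dir).resolve().as_uri(),
        "--headless",
        "--convert-to", "pdf",
        "--outdir", str(out_dir),
        str(input_path),
    ]


def convert_to_pdf(input_path, out_dir, timeout_seconds=120):
    """Run one soffice process for this job; a hung process is killed on timeout."""
    with tempfile.TemporaryDirectory() as profile_dir:
        cmd = soffice_command(input_path, out_dir, profile_dir)
        subprocess.run(cmd, check=True, timeout=timeout_seconds)
    return Path(out_dir) / (Path(input_path).stem + ".pdf")
```

Calling `convert_to_pdf("report.docx", "/tmp/out")` would, if soffice behaves, leave `/tmp/out/report.pdf` behind; both paths here are placeholders.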

7777fps · on Nov 15, 2019

Does it support SVG?

slowenough · on Nov 15, 2019

as an output format?

7777fps · on Nov 18, 2019

I wondered about input format.

slowenough · on Nov 23, 2019

Hey, yeah I think it doesn't fully support SVG on the input end, but it does support it on the output (if you change the $format variable in the convert script).

Input looks a bit weird, for example,

https://secureview.cloudbrowser.xyz/uploads/file1tda.4paa65d...

Which is from:

https://dev.w3.org/SVG/tools/svgweb/samples/svg-files/AJ_Dig...

jermaustin1 · on Nov 15, 2019

Before I upload and download a file from a random Show HN with URLs like "secretpage-canneverbefound" and "very-secure-manifest-convert", I'd like to see some source code.

Typically with little projects like this it is customary to discuss how you built the application.

slowenough · on Nov 15, 2019

It looks like the URLs named like that are trying to hack you and steal your secrets, I get that.

The reason they're named that way is an artifact of how this was meant to be a private service to let a remote cloud browser view PDF (etc) files securely, without forcing the client to download them.

So initially I didn't intend to make this public as its own service because the service was supposed to have only 1 customer (this other service).

It wasn't easy at first, as ImageMagick took a long time to convert PDFs at 300 dpi, so I rebuilt it with multicore support, and still it took a long time. I eventually discovered a sweet spot at 100dpi.

But aside from that I tried to keep it very simple, it's just some Node.JS and some shell scripts running on FreeBSD on digital ocean.
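
For a sense of why the dpi setting matters so much, here is a small sketch, assuming US Letter pages and ImageMagick's `-density` option for the rasterization resolution. The function names and the output filename pattern are hypothetical, not taken from the project's scripts.

```python
def page_pixels(dpi, width_in=8.5, height_in=11.0):
    """Pixel dimensions of one rendered page (US Letter by default)."""
    return round(width_in * dpi), round(height_in * dpi)


def convert_command(pdf_path, out_pattern, dpi=100):
    """argv for ImageMagick's convert, rasterizing the PDF at the given density."""
    # -density must come before the input file so it applies when the PDF is read.
    return ["convert", "-density", str(dpi), str(pdf_path), str(out_pattern)]
```

A Letter page is 2550x3300 pixels at 300 dpi and 850x1100 at 100 dpi, so dropping to 100 dpi means 9 times fewer pixels per page to render and encode.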

devxpy · on Nov 15, 2019

With superb open source, cross platform libraries like pdfium available to the public, why won't I just do this on the client?

slowenough · on Nov 15, 2019

OK, that's a good point. I think the only reason was I looked at PDF.js and felt the quality was not so good, and then I tried ImageMagick and found it gave good quality, so I just went with that.

Another reason is I made this to convert the PDF (or Word doc, XLS, etc) remotely without the file ever touching the user's device. So they couldn't be exposed to any exploits contained in the file or of the PDF engine (PDFium recently caused a Chrome 0 day I think).

devxpy · on Nov 15, 2019

Wow, wasn't expecting such a well constructed counter argument.

slowenough · on Nov 15, 2019

Thank you. The internet, eh? You never know what you're gonna get.

vyuh · on Nov 15, 2019

"to convert the PDF (or Word doc, XLS, etc) remotely without the file ever touching the user's device"

This made me think that the site accepts a URL to a PDF/XLS/DOC file and returns PNG.

slowenough · on Nov 15, 2019

It sounds like that makes you think the site accepts a link to a PDF/XLS/DOC and returns a PNG.

Actually, that is basically what happens in the original context for this service.

If you take a look at

https://free.cloudbrowser.xyz

open a remote browser, then search for a PDF/XLS/DOC etc and then click on it, you'll see a dialog which says the file is transferred.

What's happening is the remote browser downloads the file to a temp directory in the cloud, then some software does a POST request with that file to the service (Simple PDF to PNG Server) and gets the viewing link back immediately, before the conversion has completed.

It then sends the viewing link back to the remote browser which opens a tab. The client can then view the document as images securely without the file ever touching their device.

Originally, I didn't intend to release this viewer as a separate thing/product/service.

If you use the viewer (without the remote browser) you do have to submit the file. I didn't make it to fetch them. I made it as a "secure document viewer" for the remote browser product.
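
The "link back immediately, before the conversion has completed" part can be sketched roughly like this, assuming a background worker pool does the actual conversion. Everything named here (`submit`, `BASE_URL`, the `jobs` dict, the random name format) is hypothetical; the real service is Node.JS and shell scripts and may be structured quite differently.

```python
import secrets
from concurrent.futures import ThreadPoolExecutor

BASE_URL = "https://viewer.example/uploads/"  # placeholder, not the real host
_pool = ThreadPoolExecutor(max_workers=2)


def submit(file_bytes, convert, jobs):
    """Queue a conversion and return the viewing link without waiting for it.

    convert is any callable taking (name, file_bytes); jobs maps name -> Future
    so the viewer page can check later whether the images are ready.
    """
    name = "file" + secrets.token_hex(8)
    jobs[name] = _pool.submit(convert, name, file_bytes)
    return BASE_URL + name
```

The caller gets the URL right away, and the viewer page at that URL can poll or refresh until the job's future is done and the images exist.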

mirimir · on Nov 15, 2019

That's a great point.

I typically don't like using remote resources for stuff that I can do locally. But for iffy pdfs, it's a good tradeoff.

Edit: Nice to see the PDF capability added :)

slowenough · on Nov 15, 2019

Thanks, I remember you commented on my first share of the remote browser thing :)

redman25 · on Nov 15, 2019

I used PDF.js in a side project for comparing pdfs and it worked quite well. It runs completely client side, so I can build the site with a static site generator and html: https://www.parepdf.com/

It can start to bog down with large image heavy pdfs. But overall it runs performantly.

4ver · on Nov 15, 2019

Does this use ghostscript? Was looking into doing this a while back but wasn't sure if I'd have to get a license from Artifex to use it in a SaaS app.

slowenough · on Nov 15, 2019

The `gs` I have installed says it is licensed with this[0][1] so I think it's fine.

[0]: https://www.gnu.org/licenses/agpl-3.0.en.html

[1]: https://tldrlegal.com/license/gnu-affero-general-public-lice...

bluetidepro · on Nov 15, 2019

Yeah, let's just upload our files to a page that's called "secretpage-canneverbefound.html", and with no info other than the 5 word title on HN. -_-

thatguyagain · on Nov 15, 2019

I mean you don't need to upload a PDF list of all your darkest secrets.

mobilemidget · on Nov 15, 2019

No, but some kind of notice or message mentioning that this is happening serverside, and that you are actually copying the file to the server, would be correct imho

behindsight · on Nov 15, 2019

> [...] mention this is happening serverside

Not being snarky, but the submission title does say "Server".

mobilemidget · on Nov 16, 2019

I know, and you, and probably all tech savvy people, realise when clicking the browse file button you are probably going to submit a file to a server, but not everybody does, and those who don't need an explicit warning in my opinion.

exhilaration · on Nov 15, 2019

I'm glad I read the comments, I was about to do exactly that!

bluedino · on Nov 15, 2019

Do this with plot files and dxf's and you can sell it

slowenough · on Nov 15, 2019

If you'd like to mail me at cris@dosycorp we could discuss what you need.

sanjeetsuhag · on Nov 15, 2019

It seems like there are free sites that do that already.

alpb · on Nov 15, 2019

Have you considered making this open source and distributing it as a container image that people can self-build/deploy/host on a serverless platform like Google Cloud Run?

For example, this DOC to PDF converter: https://github.com/as-a-service/pdf

slowenough · on Nov 23, 2019

Thanks, this is actually on FreeBSD, so it's not so easy to make it for Docker, but the repo now has a Dockerfile for *nix systems. I haven't tested it tho.

fahrixds · on Nov 16, 2019

https://www.tesla.com/sites/default/files/tesla-model-s.pdf

andrewstuart · on Nov 15, 2019

I like the minimalism.