Actually it just uses ImageMagick and soffice under the hood, so it can do way more than PDF. I've tested it with DOC, DOCX, XLS, and they all seem to work.
One wonders if you could WASM-ify ImageMagick and then use a JS FileReader and a canvas (or some other PNG output approach) to do this all client-side. Hrmm, I might tackle this myself if I get the time...
ImageMagick with "multicore" support won't speed things up once the number of concurrent requests reaches the number of cores you have. And on your DigitalOcean droplet you have only one core.
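A minimal sketch of what that implies in practice: cap the number of conversions in flight at the core count and let the rest wait, rather than letting every request start its own multithreaded `convert`. It is written in Python with asyncio purely for illustration (the service itself is Node.js and shell scripts); `run_bounded` and `convert_job` are hypothetical names, and each job is assumed to wrap one ImageMagick call.

```python
import asyncio
import os


async def run_bounded(jobs, limit=None):
    """Await each job (a zero-argument coroutine function) with at most
    `limit` running at once; results come back in input order.

    `limit` defaults to the core count, on the assumption that each
    conversion is CPU-bound, so extra concurrency only adds contention.
    """
    limit = limit or os.cpu_count() or 1
    sem = asyncio.Semaphore(limit)

    async def guarded(job):
        # Jobs beyond `limit` block here until a running one finishes.
        async with sem:
            return await job()

    return await asyncio.gather(*(guarded(job) for job in jobs))


def convert_job(pdf_path, out_pattern, dpi=100):
    """Hypothetical job: one single-threaded ImageMagick run per document."""
    async def job():
        proc = await asyncio.create_subprocess_exec(
            "convert", "-limit", "thread", "1",
            "-density", str(dpi), pdf_path, out_pattern,
        )
        return await proc.wait()
    return job
```

On a single-core machine `limit` is 1 and conversions are simply serialized; pinning each ImageMagick process to one thread avoids oversubscribing the cores when several run at once.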
Note that HN (and probably many other sites) won't correctly autolink URLs with a period at the end. A trailing period is also somewhat confusing, so you may want to consider removing it.
ImageMagick's `convert` calls soffice by default if the input file is in a format LibreOffice handles.
I actually started writing a shell script to detect whether the input was DOCX etc. and run soffice first, but then I found out that ImageMagick does that automatically if LibreOffice is installed.
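For comparison, the explicit two-step version might look like the sketch below: pick a pipeline by file extension, convert office documents to PDF with headless soffice, then rasterize the PDF. It only builds the argument lists and runs nothing; detection is by extension rather than file contents (an assumption), and `OFFICE_EXTS` and `build_commands` are hypothetical names.

```python
import os

# Assumed set of formats to route through LibreOffice first.
OFFICE_EXTS = {".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx", ".odt", ".ods", ".odp"}


def build_commands(path, outdir, dpi=100):
    """Return the argv lists needed to turn `path` into PNG pages in `outdir`."""
    stem, ext = os.path.splitext(os.path.basename(path))
    ext = ext.lower()
    commands = []
    pdf_path = path
    if ext in OFFICE_EXTS:
        # soffice writes <stem>.pdf into --outdir.
        commands.append(["soffice", "--headless", "--convert-to", "pdf",
                         "--outdir", outdir, path])
        pdf_path = os.path.join(outdir, stem + ".pdf")
    elif ext != ".pdf":
        raise ValueError(f"unsupported input type: {ext or '(none)'}")
    # -density goes before the input so the PDF is rasterized at that dpi.
    commands.append(["convert", "-density", str(dpi), pdf_path,
                     os.path.join(outdir, stem + "-%d.png")])
    return commands
```

Each list can be passed to `subprocess.run(cmd, check=True)` in order. Since ImageMagick does the first step itself when LibreOffice is installed, the explicit soffice branch may be unnecessary in practice.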
hmm, OK, I didn't know that. Thanks, that's useful :)
It'll be interesting to see how your demo site holds up under HN front-page traffic. I know ImageMagick is used a lot for random websites, but I worry about how it holds up at scale. Any info would be useful!
It's actually not too bad. You can run a pool of soffice processes, but unless you really care about startup time, I'd suggest running an individual process per job, which you can do headless from the command line. For the most part LibreOffice does well, although at some point you'll start to discover the quirks in its rendering...
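A minimal sketch of the one-process-per-job approach, in Python for illustration. One assumption worth flagging: soffice instances that share a user profile tend to interfere with each other, so this gives each job a throwaway profile through LibreOffice's `-env:UserInstallation` option. The function names, the 120-second timeout, and the temp-directory prefix are hypothetical.

```python
import subprocess
import tempfile
from pathlib import Path


def soffice_argv(src, outdir, profile_dir):
    """Build a headless soffice command that uses its own user profile."""
    profile_uri = Path(profile_dir).resolve().as_uri()  # e.g. file:///tmp/lo-profile-abc
    return [
        "soffice",
        f"-env:UserInstallation={profile_uri}",
        "--headless",
        "--convert-to", "pdf",
        "--outdir", str(outdir),
        str(src),
    ]


def convert_to_pdf(src, outdir, timeout=120):
    """Run one short-lived soffice process for a single document."""
    with tempfile.TemporaryDirectory(prefix="lo-profile-") as profile:
        # The profile directory is deleted when the block exits.
        subprocess.run(soffice_argv(src, outdir, profile),
                       check=True, timeout=timeout,
                       stdout=subprocess.DEVNULL, stderr=subprocess.PIPE)
    return Path(outdir) / (Path(src).stem + ".pdf")
```

The trade-off is the startup cost the comment mentions: each job pays LibreOffice's launch time, in exchange for no shared state between jobs and no long-lived process to babysit.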
Hey, yeah, I think it isn't fully supported on the input end, but it is supported on the output end (if you change the $format variable in the convert script).
Before I upload and download a file from a random Show HN with URLs like "secretpage-canneverbefound" and "very-secure-manifest-convert", I'd like to see some source code.
With little projects like this, it's customary to discuss how you built the application.
I get that URLs named like that look like they're trying to hack you and steal your secrets.
The reason they're named that way is an artifact of how this was meant to be a private service that lets a remote cloud browser view PDF (etc.) files securely, without forcing the client to download them.
So initially I didn't intend to make this public as its own service, because it was supposed to have only one customer (that other service).
It wasn't easy at first, as ImageMagick took a long time to convert PDFs at 300 dpi, so I rebuilt it with multicore support, and it still took a long time. I eventually found a sweet spot at 100 dpi.
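Some back-of-the-envelope arithmetic on why 100 dpi helps so much: pixel count grows with the square of the resolution, so dropping from 300 to 100 dpi leaves one ninth of the pixels per page. The sketch assumes US Letter pages (8.5 × 11 in); actual conversion time also depends on the PDF's content, so treat this as a rough guide rather than a measured speedup.

```python
def page_pixels(width_in, height_in, dpi):
    """Pixel dimensions and total pixel count of one rasterized page."""
    w = round(width_in * dpi)
    h = round(height_in * dpi)
    return w, h, w * h


letter = (8.5, 11.0)  # US Letter in inches (assumed page size)
for dpi in (300, 100):
    w, h, total = page_pixels(*letter, dpi)
    print(f"{dpi} dpi: {w} x {h} = {total:,} pixels")
# 300 dpi: 2550 x 3300 = 8,415,000 pixels
# 100 dpi: 850 x 1100 = 935,000 pixels
```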
But aside from that, I tried to keep it very simple: it's just some Node.js and some shell scripts running on FreeBSD on DigitalOcean.
OK, that's a good point. I think the only reason was that I looked at PDF.js and felt the quality wasn't so good, then I tried ImageMagick and found it gave good quality, so I just went with that.
Another reason is that I made this to convert the PDF (or Word doc, XLS, etc.) remotely, without the file ever touching the user's device, so users can't be exposed to any exploits contained in the file or in the PDF engine (PDFium recently caused a Chrome 0-day, I think).
Open a remote browser, search for a PDF/XLS/DOC etc. and click on it, and you'll see a dialog saying the file has been transferred.
What's happening is the remote browser downloads the file to a temp directory in the cloud, then some software does a POST request with that file to the service (Simple PDF to PNG Server) and gets the viewing link back immediately, before the conversion has completed.
It then sends the viewing link back to the remote browser which opens a tab. The client can then view the document as images securely without the file ever touching their device.
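The handoff described above, returning the viewing link before conversion finishes, amounts to a job store plus a background worker. Below is a minimal in-memory sketch in Python, not the author's Node.js code: the class name, the `base_url`, and the job-state fields are hypothetical, and `convert_fn` stands in for the real ImageMagick/soffice step. A long random token from `secrets.token_urlsafe` keeps viewing links hard to guess; whether the real service does this is not stated.

```python
import secrets
import threading


class ConversionService:
    """Hand back a viewing link at once and convert in a background thread."""

    def __init__(self, convert_fn, base_url="https://viewer.example/uploads/"):
        self._convert = convert_fn  # path -> list of page image paths
        self._base = base_url       # hypothetical viewer URL prefix
        self._jobs = {}
        self._done = {}
        self._lock = threading.Lock()

    def submit(self, path):
        """Register the upload and return (job_id, viewing_link) immediately."""
        job_id = secrets.token_urlsafe(16)  # unguessable, URL-safe id
        with self._lock:
            self._jobs[job_id] = {"state": "pending", "pages": []}
            self._done[job_id] = threading.Event()
        threading.Thread(target=self._run, args=(job_id, path), daemon=True).start()
        return job_id, self._base + job_id

    def _run(self, job_id, path):
        try:
            result = {"state": "done", "pages": list(self._convert(path))}
        except Exception as exc:  # report failures to the viewer instead of crashing
            result = {"state": "error", "pages": [], "error": str(exc)}
        with self._lock:
            self._jobs[job_id] = result
        self._done[job_id].set()

    def status(self, job_id):
        """What the viewer page would poll: pending, done, or error."""
        with self._lock:
            return dict(self._jobs[job_id])

    def wait(self, job_id, timeout=None):
        """Block until the job finishes (e.g. for a long-polling endpoint)."""
        self._done[job_id].wait(timeout)
        return self.status(job_id)
```

In the remote-browser flow, the link returned by `submit` is what gets sent back to open a tab, and the viewer page checks `status` until the page images are ready.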
Originally, I didn't intend to release this viewer as a separate thing/product/service.
If you use the viewer on its own (without the remote browser), you do have to upload the file; I didn't build it to fetch files from a URL. I made it as a "secure document viewer" for the remote browser product.
I used PDF.js in a side project for comparing PDFs and it worked quite well. It runs completely client-side, so I can build the site with a static site generator and plain HTML: https://www.parepdf.com/
It can start to bog down with large, image-heavy PDFs, but overall it performs well.
I know, as do you and probably all tech-savvy people, that clicking the browse-file button probably means submitting a file to a server, but not everybody realises that, and those people need an explicit warning, in my opinion.
Have you considered making this open source and distributing it as a container image that people can build, deploy, and host themselves on a serverless platform like Google Cloud Run?
Thanks. This actually runs on FreeBSD, so it's not so easy to package for Docker, but the repo now has a Dockerfile for nix systems. I haven't tested it, though.
Edit: in case anyone wants an example, here's the conversion of tesla-model-s.pdf[0]:
https://secureview.cloudbrowser.xyz/uploads/filekat9.v05lnle...
[0]: https://www.tesla.com/sites/default/files/tesla-model-s.pdf