
Show HN: Webpage to PDF Microservice - cjimti
https://imti.co/webpage-to-pdf-microservice/
======
krn
A basic command line alternative using Headless Chrome[1]:

    chrome --headless --disable-gpu --print-to-pdf https://www.chromestatus.com/

[1]
[https://developers.google.com/web/updates/2017/04/headless-c...](https://developers.google.com/web/updates/2017/04/headless-chrome#create_a_pdf_dom)
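
For anyone calling this from a script rather than a shell, here is a minimal Python sketch of the same invocation. It assumes the binary is on the PATH as `chrome` (as in the command above) and that `--print-to-pdf` accepts an `=path` value to choose the output file; the function names are made up for illustration.

```python
import subprocess


def chrome_pdf_command(url, output_path, binary="chrome"):
    """Build the argv list for the headless Chrome invocation shown above."""
    return [
        binary,
        "--headless",
        "--disable-gpu",
        f"--print-to-pdf={output_path}",
        url,
    ]


def print_to_pdf(url, output_path, binary="chrome", timeout=60):
    """Run headless Chrome and raise if it exits with an error or hangs."""
    # An argv list (rather than a shell string) keeps the URL away from shell parsing.
    subprocess.run(chrome_pdf_command(url, output_path, binary), check=True, timeout=timeout)
```

Calling `print_to_pdf("https://www.chromestatus.com/", "chromestatus.pdf")` should then behave like the one-liner, with the output path made explicit.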

~~~
rav
Similar functionality is packaged in wkhtmltopdf, which essentially runs
Webkit headless to print to PDF.

[https://wkhtmltopdf.org/](https://wkhtmltopdf.org/)

~~~
cjimti
txPDF is a simple containerized web services wrapper around wkhtmltopdf,
intended to be used as a Microservice component in a larger system.

~~~
ashkulz
wkhtmltopdf maintainer here. That's really cool!

Did you manage to find a workaround for
[https://github.com/wkhtmltopdf/packaging/issues/2](https://github.com/wkhtmltopdf/packaging/issues/2)?
If so, would appreciate a PR :-)

~~~
cjimti
Thanks, I'll check out that issue and see if there is anything I can
contribute. wkhtmltopdf is a great utility and we rely on it heavily.

------
stevekemp
I've reported many bugs in projects that turn "URL" to "PDF".

You need to be sure you're limiting the kinds of URLs that people can submit.
For example, ensure that nobody can make a PDF of:

* file:////etc/passwd

* [http://169.254.169.254/latest/meta-data/local-hostname](http://169.254.169.254/latest/meta-data/local-hostname)

* [http://localhost:8080/](http://localhost:8080/)

I'd say over half of the "PDF-creation" projects posted here have been
vulnerable to some/all of those attacks. (I continue to be surprised at how
many web-to-pdf services exist. I guess there must be a lot of people paying
for them?)
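
A rough sketch of the kind of check meant here, in Python with only the standard library. The policy (http/https only, and every resolved address must be publicly routable) is an assumption for illustration, and `is_safe_url` and `resolve_host` are hypothetical names, not part of any of the projects mentioned.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}


def resolve_host(host):
    """Return the set of IP address strings the host name resolves to."""
    return {info[4][0] for info in socket.getaddrinfo(host, None)}


def is_safe_url(url, resolver=resolve_host):
    """Return True only for http(s) URLs whose host resolves solely to public IPs."""
    try:
        parts = urlsplit(url)
        host = parts.hostname
    except ValueError:
        return False
    if parts.scheme not in ALLOWED_SCHEMES or not host:
        return False
    try:
        addresses = resolver(host)
    except OSError:
        return False
    if not addresses:
        return False
    for address in addresses:
        try:
            # Drop any IPv6 zone suffix such as "%eth0" before parsing.
            ip = ipaddress.ip_address(address.split("%", 1)[0])
        except ValueError:
            return False
        # Rejects loopback, link-local (cloud metadata), and private ranges.
        if not ip.is_global:
            return False
    return True
```

This rejects all three examples above, but it is not sufficient on its own: the renderer resolves the name again and may follow redirects, so network-level restrictions on the rendering process are still worth having.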

~~~
cjimti
These are great security suggestions, and I should clarify the intended use.
We use txPDF as a backend microservice; it is not open to direct public use.
It is good for automating report generation from other portions of a larger
system.

------
thomasfromcdnjs
Awesome timing. Just started work on a LinkedIn alternative called
[https://jaresume.com](https://jaresume.com)

We need a reliable way of turning people's resumes into PDFs.

Going to give this a go today or tomorrow.

Doing it with
[https://github.com/GoogleChrome/puppeteer](https://github.com/GoogleChrome/puppeteer)
also works quite well

~~~
dvh
Why not simply press Ctrl+p and print to PDF?

~~~
thomasfromcdnjs
We have tried it in the past; it just doesn't work reliably with different
HTML configurations.

Does ctrl+p on this page ->
[https://jaresume.com/thomas](https://jaresume.com/thomas) look good for you?

~~~
rcfox
Have you tried using a print media stylesheet? You could hide the navigation,
reduce the whitespace, maybe shrink the font size a little bit, and remove
link text decoration.

~~~
thomasfromcdnjs
Great idea. I have used print media stylesheets in the past, but found them
prone to regressions, e.g. elements that are introduced but not hidden. A
webpage-to-PDF process is also vulnerable to that, though.

I think ideally, because the resume renderer is a react component, I'd rather
just boot up chromium with the react component and resume data and do a fully
clean render of the page into pdf.

We shall see.

------
ernsheong
You can achieve this using just the browser.

In Chrome Dev Tools, click on the devices button (the icon with the phone and
tablet). Using the top-right menu, select "Capture full size screenshot".

Voilà, you now have a full size screenshot that you can convert into PDF.

Incidentally, I am the author of
[https://www.pagedash.com](https://www.pagedash.com), a personal web
scrapbook that lets you capture the current page as HTML and generate
links to share with others.

~~~
superasn
I tried it with this page only but it didn't work for me. I got a 110 KB PNG
file; it is a valid PNG but completely blank. Maybe it's buggy.

------
ZeKZ
I find wkhtmltopdf very difficult to work with; for instance, the official
documentation is just a man page [1].

I discovered the project Weasyprint[2] a few months ago. I find it easier to
use, and very powerful when using Python. You can define a custom loader to
inject images or styles generated on the fly for instance.
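
A minimal sketch of such a loader, assuming WeasyPrint's `url_fetcher` argument to `HTML(...)` and its `default_url_fetcher` helper as documented (newer releases may have changed this API). The `generated:` URL prefix and both function names are invented for this example.

```python
GENERATED_PREFIX = "generated:"  # hypothetical URL scheme used only in this sketch


def make_url_fetcher(generated, fallback):
    """Return a fetcher that serves in-memory resources and defers everything else.

    `generated` maps a resource name to a (bytes, mime_type) tuple.
    """
    def fetch(url):
        if url.startswith(GENERATED_PREFIX):
            body, mime_type = generated[url[len(GENERATED_PREFIX):]]
            return {"string": body, "mime_type": mime_type}
        return fallback(url)

    return fetch


def render_pdf(html, output_path, generated):
    """Render HTML to PDF, resolving generated: URLs from memory."""
    # Imported lazily so the fetcher logic can be exercised without WeasyPrint installed.
    from weasyprint import HTML, default_url_fetcher

    fetcher = make_url_fetcher(generated, default_url_fetcher)
    HTML(string=html, url_fetcher=fetcher).write_pdf(output_path)
```

The HTML would then reference something like `<link rel="stylesheet" href="generated:style.css">`, and the stylesheet bytes never have to touch disk or a web server.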

There are still some missing features compared to wkhtmltopdf, such as
defining a custom footer and header, but it's a very promising project.

[1]
[https://wkhtmltopdf.org/usage/wkhtmltopdf.txt](https://wkhtmltopdf.org/usage/wkhtmltopdf.txt)

[2]
[https://github.com/Kozea/WeasyPrint/](https://github.com/Kozea/WeasyPrint/)

~~~
jimnotgym
Since you mention Python, I have found pdfkit[1] to be a pretty good wrapper
for wkhtmltopdf. I have a document generation engine that uses it dozens of
times a day. The worst part is that wkhtmltopdf in the Ubuntu repos is still
compiled (when I last checked) without a patch that allows it to run
headlessly. I built from source, which was not too difficult.

[1][https://pypi.org/project/pdfkit/](https://pypi.org/project/pdfkit/)
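
For reference, a small sketch of how pdfkit can be pointed at a self-built wkhtmltopdf binary. It assumes pdfkit's `configuration(wkhtmltopdf=...)` and `from_string(...)` calls as described in its README; the option values and helper names are placeholders for illustration.

```python
def build_options(page_size="A4", margin="15mm"):
    """Map wkhtmltopdf long option names (without the leading --) to values."""
    return {
        "page-size": page_size,
        "margin-top": margin,
        "margin-right": margin,
        "margin-bottom": margin,
        "margin-left": margin,
        "encoding": "UTF-8",
    }


def html_to_pdf(html, output_path, wkhtmltopdf_path=None):
    """Convert an HTML string to a PDF file, optionally with a custom binary."""
    # Imported lazily so build_options can be used without pdfkit installed.
    import pdfkit

    # Use the given binary (e.g. one built from source) instead of the one on PATH.
    config = pdfkit.configuration(wkhtmltopdf=wkhtmltopdf_path) if wkhtmltopdf_path else None
    pdfkit.from_string(html, output_path, options=build_options(), configuration=config)
```

Passing the path explicitly avoids accidentally picking up the distribution's build when both are installed.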

------
Globz
One of my applications at work takes a user order sheet created through the
main app workflow, transposes it to an HTML document, converts that to a PDF
document with wkhtmltopdf, and dispatches it via email, etc.

I found this setup to be really stable and easy to maintain. So far it has
produced around 70k orders per year and has been running for over 4 years now
without any hiccups.

Before that I was using phantomjs, but it wasn’t as fast or reliable, for
reasons that I can’t quite remember now since I haven’t touched that part of
the app in a long time.

All I remember is that wkhtmltopdf was easier to tweak and compose with.
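
The HTML-then-PDF part of a pipeline like this might look roughly as follows. This is a guess at the shape, not the actual application: the order fields and function names are made up, and the only thing assumed about wkhtmltopdf is its basic `wkhtmltopdf <input> <output>` command-line form.

```python
import html
import subprocess


def render_order_html(order_id, items):
    """Render an order as a small HTML document; items is a list of (name, qty)."""
    rows = "".join(
        f"<tr><td>{html.escape(name)}</td><td>{qty}</td></tr>" for name, qty in items
    )
    return (
        f"<html><body><h1>Order {html.escape(str(order_id))}</h1>"
        f"<table>{rows}</table></body></html>"
    )


def html_file_to_pdf(html_path, pdf_path, binary="wkhtmltopdf"):
    """Convert an HTML file on disk to a PDF, raising if the converter fails."""
    subprocess.run([binary, html_path, pdf_path], check=True, timeout=120)
```

Escaping user-supplied fields before they reach the converter matters here, since anything in the HTML gets rendered by a browser engine.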

------
btown
[https://prerender.com/](https://prerender.com/) is a great service (fully
MIT-licensed at
[https://github.com/prerender/prerender](https://github.com/prerender/prerender)
) for this type of thing, both for rendering internal pages and for
scraping/rendering external sites that rely heavily on client-side code.

------
liftbigweights
You can also use the PDF printers available in Linux distros and even Windows now.

------
bramd
I'm still looking for a service like this, but that creates a nicely tagged
PDF and conveys the HTML structure in the PDF tags.

Tagged PDFs are a requirement in many processes for accessibility or archival
reasons.

~~~
gildas
Why not use HTML instead of PDF? I'm the author of an extension that lets you
faithfully save a web page into an HTML file [1]. From my point of view,
that should be the best solution for archiving web pages in a file. Votes on
HN disagree with me though [2]; I wish I could understand why.

[1] [https://github.com/gildas-lormeau/SingleFile](https://github.com/gildas-lormeau/SingleFile)

[2]
[https://news.ycombinator.com/item?id=18243721](https://news.ycombinator.com/item?id=18243721)

~~~
Ibethewalrus
I read recently that PDF is the de facto standard for governments.

------
jotto
Alternatively, if you want a SaaS REST API:

    curl https://service.prerender.cloud/screenshot/https://google.com/ > out.jpg
    curl https://service.prerender.cloud/pdf/https://google.com/ > out.pdf
    curl https://service.prerender.cloud/https://google.com/ > out.html

[https://www.prerender.cloud/](https://www.prerender.cloud/)
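
The same three calls from Python, as a sketch using only the standard library. The URL layout is taken directly from the curl commands above; the function names are illustrative, and no authentication or error handling beyond what `urlopen` raises is assumed.

```python
import urllib.request

BASE = "https://service.prerender.cloud"


def endpoint(target_url, kind="html"):
    """Build the service URL for a target page; kind is html, pdf, or screenshot."""
    if kind == "html":
        return f"{BASE}/{target_url}"
    if kind in ("pdf", "screenshot"):
        return f"{BASE}/{kind}/{target_url}"
    raise ValueError(f"unknown kind: {kind}")


def fetch(target_url, kind, out_path, timeout=60):
    """Download the rendered result and write it to out_path."""
    with urllib.request.urlopen(endpoint(target_url, kind), timeout=timeout) as response:
        data = response.read()
    with open(out_path, "wb") as out_file:
        out_file.write(data)
```

For example, `fetch("https://google.com/", "pdf", "out.pdf")` mirrors the second curl command.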

------
fastball
Why not just

> Print

> Open as PDF

?

~~~
supermatt
To save your microservice having to run a graphical environment and simulate
mouse interaction?

