
Show HN: Ready-to-use API to convert any web page to PDF using headless Chrome - jancurn
https://www.apify.com/jancurn/url-to-pdf
======
conradk
Saving a webpage to a PDF is literally one command line away:

    chromium --headless --disable-gpu --print-to-pdf=google.pdf http://google.com/

What does Apify add in this case?

~~~
xilni
This is exactly what I do, bundled into a nice function in my .zshrc:

    
    
        chromepdf() {
            # usage: chromepdf output.pdf https://example.com/
            chrome --headless --disable-gpu --print-to-pdf="$1" "$2"
        }

~~~
sgolestane
Is there any way to make Chrome wait until the page loads?

~~~
lashkari
Looks like it prints automatically once Page.loadEventFired is triggered.

Alternatively, you can run Chrome headless with the remote debugging API
(--remote-debugging-port=9222) and send a Page.printToPDF
([https://chromedevtools.github.io/devtools-protocol/tot/Page/#method-printToPDF](https://chromedevtools.github.io/devtools-protocol/tot/Page/#method-printToPDF))
after some delay.
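
A minimal sketch of that approach in Python, assuming you already hold a
WebSocket connection to the page target (Chrome lists targets and their
webSocketDebuggerUrl at http://localhost:9222/json when started with
--remote-debugging-port=9222). It only shows the JSON message shapes and how
the base64 "data" field of the Page.printToPDF reply is decoded; the transport,
the example URL and the helper names are hypothetical.

```python
import base64
import itertools
import json


class CdpCommands:
    """Builds Chrome DevTools Protocol commands as JSON strings.

    Every command carries a unique integer "id"; Chrome echoes it in the reply
    so responses can be matched to requests.
    """

    def __init__(self):
        self._ids = itertools.count(1)

    def command(self, method, params=None):
        return json.dumps({"id": next(self._ids), "method": method, "params": params or {}})


def pdf_from_reply(raw_reply):
    """Return the PDF bytes from a Page.printToPDF reply (base64 in result.data)."""
    reply = json.loads(raw_reply)
    if "error" in reply:
        raise RuntimeError(reply["error"].get("message", "Page.printToPDF failed"))
    return base64.b64decode(reply["result"]["data"])


if __name__ == "__main__":
    cdp = CdpCommands()
    # Page events such as Page.loadEventFired are only sent after Page.enable.
    print(cdp.command("Page.enable"))
    print(cdp.command("Page.navigate", {"url": "https://example.com/"}))
    # Send this one only after Page.loadEventFired plus your extra delay.
    print(cdp.command("Page.printToPDF", {"printBackground": True}))
```

In practice you would send the commands in that order, wait for the
Page.loadEventFired event plus whatever delay the page needs, and pass the
reply with the matching id to pdf_from_reply.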

------
jotto
I've been working on something similar:
[https://www.prerender.cloud/docs/api](https://www.prerender.cloud/docs/api)

    
    
      // URL to screenshot
      service.prerender.cloud/screenshot/https://www.google.com/
    
      // URL to pdf
      service.prerender.cloud/pdf/https://www.google.com/
    
      // URL to html (prerender)
      service.prerender.cloud/https://www.google.com/
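
The three URL patterns above can be wrapped in a small helper. This is a
hypothetical sketch: the https:// scheme, the mode names and the function name
are assumptions, and the target URL is appended verbatim as in the examples, so
target URLs with their own query strings may need extra care.

```python
# Hypothetical helper around the URL patterns above. The https:// scheme and
# the mode names are assumptions; the target URL is appended verbatim.
SERVICE_ROOT = "https://service.prerender.cloud/"

PREFIXES = {
    "screenshot": "screenshot/",
    "pdf": "pdf/",
    "html": "",  # the prerender endpoint has no path prefix
}


def prerender_url(mode, target_url):
    """Return the service URL that renders target_url as a screenshot, PDF or HTML."""
    if mode not in PREFIXES:
        raise ValueError(f"unknown mode {mode!r}, expected one of {sorted(PREFIXES)}")
    return SERVICE_ROOT + PREFIXES[mode] + target_url


if __name__ == "__main__":
    for mode in PREFIXES:
        print(prerender_url(mode, "https://www.google.com/"))
```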

~~~
FabioFleitas
Heads up that I'm getting a "Too many requests for the month, sign up for an
account at [https://www.prerender.cloud/](https://www.prerender.cloud/)" when
trying to go to any of those links.

~~~
jotto
Thank you - I had an overly aggressive rate limiter for non-auth'd accounts;
it's _improved_ now.

~~~
rahulroy9202
Nope. Still getting - Too many requests for the month, sign up for an account
at [https://www.prerender.cloud/](https://www.prerender.cloud/)

------
visarga
By the way, is there an opposite service that converts PDFs into plain HTML
for reading? I know about
[https://www.arxiv-vanity.com/papers/](https://www.arxiv-vanity.com/papers/)
but it only works on arXiv PDFs.

~~~
rpedela
Best one I have found.
[https://github.com/coolwanglu/pdf2htmlEX](https://github.com/coolwanglu/pdf2htmlEX)

------
jugjug
Off-topic, but Apify as a service looks really good. I was spinning up a
dedicated VM on AWS with Docker installed just to get a simple web scraper
running. Apify solves this elegantly and removes a significant pain point in my
workflow.

------
ak39
Any info on how this compares to commercial HTML-to-PDF renderers like PrinceXML?

~~~
schneidmaster
In my experience, Prince is great for static HTML + CSS rendering, but its
JavaScript engine is pretty lackluster -- I couldn't get it to work with
rendering React components, for example. So it depends a lot on your use case
and if you can server-side render everything. It's also pretty pricey[1] --
not that I mind paying for quality software but that sticker could be
prohibitive for a lot of folks.

[1] [https://www.princexml.com/purchase](https://www.princexml.com/purchase)

~~~
jamespaden
I'm a developer at [https://docraptor.com](https://docraptor.com). We're an
official Prince partner with a SaaS pricing model, but we've got a separate
JavaScript engine for that very reason.

------
dmmalam
Also check out [https://urlbox.io/](https://urlbox.io/). YC alum, super
helpful.

------
laktek
I built Screen.rip, which also supports PDF generation.
[https://screen.rip/#pdf](https://screen.rip/#pdf)

Screen.rip gives you control over the generated PDF beyond Puppeteer's options
(for example, it can wait for certain elements to appear, inject CSS, or switch
to the screen stylesheet instead of the print stylesheet).
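
For comparison, those three behaviours can be requested through the DevTools
Protocol directly. The sketch below only builds the command objects
(Emulation.setEmulatedMedia and Runtime.evaluate are real protocol methods; the
helper names are hypothetical) and is not how Screen.rip implements them;
sending the commands and the polling loop are left out.

```python
import json


def _evaluate(cmd_id, expression):
    # Runtime.evaluate runs JavaScript in the page; returnByValue returns plain JSON.
    return {
        "id": cmd_id,
        "method": "Runtime.evaluate",
        "params": {"expression": expression, "returnByValue": True},
    }


def use_screen_styles(cmd_id):
    # Make @media screen rules apply when the page is printed.
    return {"id": cmd_id, "method": "Emulation.setEmulatedMedia", "params": {"media": "screen"}}


def inject_css(cmd_id, css):
    # json.dumps produces a valid JavaScript string literal for the CSS text.
    expression = (
        "(() => { const s = document.createElement('style'); "
        f"s.textContent = {json.dumps(css)}; "
        "document.head.appendChild(s); return true; })()"
    )
    return _evaluate(cmd_id, expression)


def element_present(cmd_id, selector):
    # Send repeatedly until result.result.value is true to wait for an element.
    return _evaluate(cmd_id, f"document.querySelector({json.dumps(selector)}) !== null")
```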

------
phmagic
I love this service! To ease adoption, I think you could allow pre-made scripts
to be shared so that non-technical users can easily set up workflows that go
right into their email. For the technical folks, I think it would be great to
have examples of things you can do with Apify that are a hassle to do with your
local headless Chrome.

Great job!

------
nikisweeting
If you're interested in running your own personal Wayback Machine that uses
headless Chrome for archiving (among other methods), check out Bookmark
Archiver.

[https://github.com/pirate/bookmark-archiver](https://github.com/pirate/bookmark-archiver)

------
sebazzz
We are not too happy with our EvoPDF license, so in principle this is a good
option. However, I do not think this allows adding headers, footers, page
numbers, etc.
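
Chrome's print pipeline itself can draw headers, footers and page numbers. A
sketch, assuming the service forwards "pdfOptions" unchanged to Puppeteer's
page.pdf() (the option name comes from jancurn's reply further down; the
pass-through is not confirmed): displayHeaderFooter, headerTemplate,
footerTemplate and margin are Puppeteer options, and Chrome fills in elements
with the pageNumber and totalPages classes.

```python
import json

# Assumption: "pdfOptions" is passed unchanged to Puppeteer's page.pdf().
# Chrome fills the elements with class "pageNumber" and "totalPages".
FOOTER_TEMPLATE = (
    '<div style="font-size: 10px; width: 100%; text-align: center;">'
    'Page <span class="pageNumber"></span> of <span class="totalPages"></span>'
    "</div>"
)

pdf_options = {
    "format": "A4",
    "displayHeaderFooter": True,
    # An empty template suppresses the default header (date and title).
    "headerTemplate": "<div></div>",
    "footerTemplate": FOOTER_TEMPLATE,
    # Header and footer are drawn inside the page margins, so reserve space.
    "margin": {"top": "1cm", "bottom": "1.5cm"},
}

if __name__ == "__main__":
    print(json.dumps({"pdfOptions": pdf_options}, indent=2))
```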

------
jeppebemad
Is there a similar API around that accepts HTML instead of a URL? I’ve built
one for my project, but I would prefer to delegate this to an external
service.

~~~
chrismorgan
Bear in mind that you’ll need to either embed all your resources, or only use
CORS-enabled resources, or fake the origin for your HTML document so that it
can access non-CORS-enabled resources on a particular domain.

Encoding your HTML as a data: URI might work for this service as-is (provided
you use no non-CORS-enabled resources). Haven’t tried it.
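
A minimal sketch of the data: URI idea in Python. As noted above, it is
untested against this service, and the function name is hypothetical.

```python
import base64


def html_to_data_uri(html):
    """Encode an HTML string as a base64 data: URI (RFC 2397)."""
    payload = base64.b64encode(html.encode("utf-8")).decode("ascii")
    # Declaring the charset keeps non-ASCII text from being misread.
    return "data:text/html;charset=utf-8;base64," + payload


if __name__ == "__main__":
    print(html_to_data_uri("<p>hi</p>"))
    # data:text/html;charset=utf-8;base64,PHA+aGk8L3A+
```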

------
Robdel12
Question: does this create accessible PDFs? That would be a really nice
_possible_ workaround for screen reader users having issues with a website.

~~~
jancurn
I believe it's possible: by adding something like "scale: 1.5" to
"pdfOptions", you might render an accessible PDF.

------
tehlike
You could probably launch this service for free, and someone will probably
create a Docker image and make it one click.

------
colordrops
As long as GPU support is not functional in headless mode, "any web page" is a
misnomer. A large enough percentage of sites use GPU acceleration that headless
mode is useless. This needs to be addressed by the Chrome team.

~~~
kyberias
What do you mean by "use GPU acceleration"? Are you saying that a large
percentage of sites use WebGL? Using GPU acceleration for web page rendering
is just a browser performance optimization. Browsers can render the same page
without a GPU, only slower.

~~~
colordrops
I'm more specifically talking about WebGL. We'd love to use headless chrome at
our company but we can't. But even for things like CSS transforms, we do a lot
of really heavy 3D work and software emulation won't cut it.

~~~
nikisweeting
CSS transforms work just fine without the GPU. We use it extensively for
screenshot testing our CSS transform-based animations on
[https://oddslingers.com](https://oddslingers.com).

~~~
colordrops
We are using a ton of them to create 3D heads up display camera overlays. They
are too slow with GL software rendering.

~~~
nikisweeting
Not sure what you mean? Usually people use either CSS 3D transforms _or_
WebGL, but not both.

~~~
colordrops
What do you mean? We are using both right now. Why can't they both be used on
the same page?

------
panda888888
Does this work if the page is behind a password/SSO wall?

And is it possible to print multiple Chrome tabs?

Printing pages to PDF is pretty straightforward. It's with the above two issues
that I've run into problems. Anyone know of a good solution to the second one?

~~~
gbrits
Assuming you want this done automatically, what's the advantage of 'printing'
multiple tabs to PDF in a headless browser, over just sequentially loading and
printing the pages you want done?
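
The sequential approach is easy to script around the same command-line flags
used at the top of the thread. A sketch, assuming a Chromium binary called
chromium is on your PATH (it may be google-chrome or chrome instead); the
helper names and output file names are hypothetical.

```python
import subprocess
from pathlib import Path

# Assumption: the Chromium binary is called "chromium" and is on PATH;
# on your system it may be "google-chrome" or "chrome".
CHROME = "chromium"


def print_commands(urls, out_dir="pdfs"):
    """Build one headless print-to-PDF command per URL, numbering the output files."""
    commands = []
    for i, url in enumerate(urls, start=1):
        out_file = Path(out_dir) / f"page-{i:03d}.pdf"
        commands.append([CHROME, "--headless", "--disable-gpu", f"--print-to-pdf={out_file}", url])
    return commands


def print_all(urls, out_dir="pdfs"):
    """Load and print the pages one after another, stopping at the first failure."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for cmd in print_commands(urls, out_dir):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    for cmd in print_commands(["https://example.com/", "https://example.org/"]):
        print(" ".join(cmd))
```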

