
BreezyPDF Lite: HTML to PDF generation as a Service - deedubaya
https://github.com/danielwestendorf/breezy-pdf-lite
======
jim-a-1020401
I've been working on a documentation process with Markdown -> HTML + CSS ->
PDF for a while, and I've found that Weasyprint works best. It supports
everything I need except CSS target-counter for auto-generating page numbers in
a TOC, but is otherwise really good.

I really don't like that this uses something non-standard for headers/footers
and content in the margins, since that's already covered by the standardized CSS
@page rules; [https://www.w3.org/TR/css-gcpm-3/](https://www.w3.org/TR/css-gcpm-3/).
I'm using that with Weasyprint for automatic headers/footers with
auto page-numbering, including setting strings from the HTML to use for
heading names, document title, author, etc. Example CSS: `h1 { string-set:
h1-title content(), h2-title ""; }` + `@page { @bottom-left { content:
string(h1-title) string(h2-title); } }`.

Weasyprint seems closest to PrinceXML, and it's free.

~~~
deedubaya
You can use CSS page sizing as well, passing it via meta isn't required.

[https://docs.breezypdf.com/metadata#use-css-for-page-size](https://docs.breezypdf.com/metadata#use-css-for-page-size)

Headers and footers are just HTML strings, so they can be rich (images, etc.)
and styled with CSS. Page numbering in headers/footers comes for free as well.

Of course you could just use properly positioned <header> and <footer> tags
and do whatever you need to with JS for page numbering.

~~~
jim-a-1020401
That's good for page sizing, but as other comments in this thread noted, no
browsers, including Chrome, seem to properly support CSS paged media.
When I tested it, none of the @bottom-* margin boxes or other print/page-related
CSS worked at all, which is why I ended up with Weasyprint: it has its
own engine (and is really great). For examples of more paged media features
Chrome doesn't support, see [https://www.smashingmagazine.com/2015/01/designing-for-print...](https://www.smashingmagazine.com/2015/01/designing-for-print-with-css/).
Another thing Weasyprint does surprisingly well is automatically repeating
table headers on subsequent pages when a table is split across pages.

------
deedubaya
Turns HTML like this: [https://github.com/danielwestendorf/breezy_pdf_lite-ruby/blo...](https://github.com/danielwestendorf/breezy_pdf_lite-ruby/blob/master/example/ex.html)

Into a PDF like this: [https://www.dropbox.com/s/v4j4n1cvtm032w9/breezy-pdf-dashboa...](https://www.dropbox.com/s/v4j4n1cvtm032w9/breezy-pdf-dashboard-example.pdf?dl=0)

~~~
brendandahl
And it all comes back to HTML as dropbox uses pdf.js to show that PDF.

~~~
deedubaya
The circle is complete

------
nightbrawler
Despite being a bit pricey, PrinceXML is the best tool I've seen and used for
HTML to PDF conversion: a massive feature set and very reliable output.
We've used it as part of our reporting engine to output PDFs since 2011.
It supports tables spanning multiple pages, JavaScript, image loading, and
lots of other cool stuff.

~~~
Scarbutt
Prince renders HTML somewhat differently from browsers and doesn't support the
latest HTML + CSS standards. Why choose it for new work over headless Chrome
and/or Puppeteer?

~~~
wolfgang42
I use PrinceXML at work, for generating press-ready output. (E.g. a catalog,
generated straight from the product database and ready to be sent to the
printer. Previously they were doing this all manually with PageMaker, which
was very tedious and prone to mistakes.) It supports the CSS Paged Media spec,
so you have full control over page layout: margins, widows and orphans, page
breaks, bleeds, and so on. It also understands page numbers, and gives you
full control over headers and footers. Other things it supports include press
crop marks, color management, and PDF options like bookmarks and PDF link
regions.

Headless browser rendering is fine if all you need is a two-page invoice PDF,
but it falls down when you need control of anything other than basic stuff
like the font size.

~~~
deedubaya
> it falls down when you need control of anything other than basic stuff like
> the font size

This is definitely not the case anymore, and BreezyPDFLite supports most of
the features you mention, while supporting the same HTML/CSS/JS you might be
displaying to end users across evergreen browsers.

~~~
wolfgang42
Does BreezyPDFLite support CSS like this? Chrome didn't support any of this
last I checked.

    
    
      h2 {flow: static(header);}
      @page :right {
          margin-bottom: 1.4cm;
          @top-left {content: flow(header);}
          @bottom-right {content: counter(page);}
      }
    

Edit: Also, tables of contents, with page numbers and links in the PDF:

    
    
      ul.toc a::after {content: leader('.') target-counter(attr(href), page);}

~~~
jim-a-1020401
From all the testing I've been doing over the past month or two, no browsers
support those features. Weasyprint supports everything I tried except
target-counters and dot leaders, which I've only found supported in PrinceXML.
Weasyprint does generate a proper outline within the PDF, so my current workflow
(no $ for Prince) is: use Weasyprint to make the PDF with a TOC that has no
page numbers, extract the outline from the PDF with a Python tool to get the
page numbers, update the original HTML with those page numbers, then run it
through Weasyprint again. That'll be bundled into an automated build triggered
by a git commit of the original Markdown file, going straight to PDF, so it's
not something I have to do manually each time.
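The "update the original HTML with page numbers" step could look roughly like the Python sketch below. It assumes, hypothetically, that the outline has already been read out of the PDF into a dict mapping each heading's anchor id to its page number, and that the TOC fragment contains plain `<a href="#id">Title</a>` links. `add_page_numbers` and the `page` class are illustrative names, not part of Weasyprint, and a regex is only adequate for simple, well-formed TOC markup like this. It also assumes that adding the numbers doesn't shift pagination, which is worth checking after the second run.

```python
import re

# Matches a TOC link such as <a href="#intro">Intro</a>; the "anchor" group
# captures the fragment id (assumed to match the PDF outline's keys).
TOC_LINK = re.compile(r'(<a href="#(?P<anchor>[^"]+)">.*?</a>)')


def add_page_numbers(toc_html: str, pages: dict) -> str:
    """Append a page-number span after each TOC link whose anchor is known.

    pages: hypothetical mapping of anchor id -> page number, e.g. built from
    the PDF outline extracted after the first Weasyprint run.
    """
    def replace(match: re.Match) -> str:
        page = pages.get(match.group("anchor"))
        if page is None:
            return match.group(1)  # leave links with unknown anchors untouched
        return f'{match.group(1)}<span class="page">{page}</span>'

    return TOC_LINK.sub(replace, toc_html)
```

For example, `add_page_numbers('<li><a href="#intro">Intro</a></li>', {"intro": 3})` yields `<li><a href="#intro">Intro</a><span class="page">3</span></li>`; the span can then be pushed right with CSS before the second Weasyprint pass.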

------
jacquesm
PDF is 'lossy' compared to HTML: you lose a ton of semantic
information, and on top of that the resulting PDF will be _much_ larger than
the source material.

I'm currently involved in an effort to do the reverse, and there isn't a day
that I don't curse the PDF specification and its various implementations. And
with 'data:' URIs for graphical elements and MathJax, there isn't much
reason for, say, scientific papers to be published as PDFs to begin
with.

[https://github.com/thomaspark/pubcss](https://github.com/thomaspark/pubcss)

PubCSS has the right idea.

~~~
sebazzz
Converting to PDF can be very useful for generating reports. For instance, you
display a chart using JavaScript and get exactly the same chart in the
generated report.

We use EvoPDF for this purpose, which also uses a web browser under the
covers. Unfortunately it is quite slow, especially when JavaScript is
required for the report. It also handles tables spanning multiple pages
badly, and full-page backgrounds are cumbersome.

------
nicolasMLV
It is mainly a wrapper around this headless Chrome command (maybe that's all
you need):

`#{chrome_alias} --headless --disable-gpu --print-to-pdf="#{pdf_path}"
"#{html_url}"`
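A minimal Python sketch of calling that same command without a shell. It assumes the Chrome binary's name or path is passed in, since it varies by platform (e.g. `google-chrome` or `chromium`); `chrome_pdf_command` and `html_to_pdf` are illustrative names. Building an argument list avoids the shell quoting that the interpolated string above needs.

```python
import subprocess
from typing import List


def chrome_pdf_command(chrome: str, html_url: str, pdf_path: str) -> List[str]:
    """Build the argv for headless Chrome's print-to-PDF mode."""
    # Each flag is its own list element, so no shell quoting is required,
    # even if pdf_path contains spaces.
    return [
        chrome,
        "--headless",
        "--disable-gpu",
        f"--print-to-pdf={pdf_path}",
        html_url,
    ]


def html_to_pdf(chrome: str, html_url: str, pdf_path: str) -> None:
    """Run Chrome; check=True raises CalledProcessError on a non-zero exit."""
    subprocess.run(chrome_pdf_command(chrome, html_url, pdf_path), check=True)
```

Usage might be `html_to_pdf("chromium", "file:///tmp/report.html", "/tmp/report.pdf")`, given a Chrome or Chromium binary on the PATH.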

------
mapgrep
I am guessing the API/service architecture is what differentiates it from the
command line tool [https://wkhtmltopdf.org/](https://wkhtmltopdf.org/)?

~~~
forgot-my-pw
WKHTMLTOPDF is pretty inactive nowadays. It uses an older webkit engine, and
there's only been 1 release in the past 2 years (which happened to be 19 days
ago, actually).

------
cygned
We have built something similar internally: a macroservice running a cluster
of headless Chromes that turns an uploaded HTML file into a PDF. It's used for
reporting.
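A single-machine Python sketch of the core of that idea: a bounded worker pool, so that at most `max_workers` renders run at once and further uploads wait in a queue. The `render` callable (HTML bytes in, PDF bytes out) is a hypothetical stand-in for whatever actually drives headless Chrome, and `RenderPool` is an illustrative name; distributing work across several machines, as a real cluster would, is out of scope here.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable


class RenderPool:
    """Caps how many renders (e.g. headless Chrome jobs) run concurrently."""

    def __init__(self, render: Callable[[bytes], bytes], max_workers: int = 4):
        # render: hypothetical callable taking HTML bytes, returning PDF bytes.
        self._render = render
        self._executor = ThreadPoolExecutor(max_workers=max_workers)

    def submit(self, html: bytes) -> "Future[bytes]":
        # Jobs beyond max_workers wait in the executor's internal queue
        # until a worker thread becomes free.
        return self._executor.submit(self._render, html)

    def shutdown(self) -> None:
        # Block until all queued and running renders have finished.
        self._executor.shutdown(wait=True)
```

Using threads is reasonable here because each worker would mostly wait on an external Chrome process rather than compute in Python.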

We are now switching to a Microsoft Word rendering backend, where we process
Word files with template strings in them and then run Word in headless mode to
save them as PDFs. While the HTML-to-PDF approach works, most of our users work
with Word all day, so we were solving the wrong problem.

~~~
wilsonfiifi
Can you elaborate a bit more on your Word-based reporting? To date the only
solution I've been satisfied with is Aspose Words. Unfortunately the licensing
is a bit steep, so it would be good to know if any cheaper options are available.

~~~
cygned
We basically process the template with a node.js library. We then have a
node.js server on a Windows machine that takes a file, saves it onto disk and
runs a Visual Basic application on it which itself starts Microsoft Word
headless to save the file as PDF.

~~~
benbristow
> runs a Visual Basic application on it

Clever, but that does sound a bit messy.

~~~
cygned
It sounds messy, but this VB app is literally just 15 LOC and pretty
straightforward. And it is the only solution we have found so far that allows
us to leverage Word.

------
laktek
If you like a hosted API, I've built [https://pdf.cool](https://pdf.cool)

~~~
deedubaya
Or BreezyPDF.com ;)

