
Wkhtmltopdf: shell utility to convert HTML to PDF using the WebKit rendering engine - tilt
http://code.google.com/p/wkhtmltopdf/
======
cletus
I spent a lot of time 2-3 years ago assessing different tools to convert
HTML+CSS to PDF [1]. Back then, the goal was to convert HTML plus custom tags
into well-formatted legal documents.

At the time the hands-down winner was Prince XML [2]. It's relatively
expensive ($3800 for a single server license), but it _just works_, works from
many languages and produces beautiful results quickly (look at their samples).
It doesn't take a lot of developer time to make up that purchase cost.

I haven't tried this particular project, but the others I did try tended to
work for smaller samples and then die, take forever or produce unpredictable
results on even moderately large documents (~150k).

For any commercial project, honestly I'd just fork over the $3800 for Prince
without hesitation. It's simply that good.

EDIT: actually, looking over the SO question I think I did check out an early
version of this project and didn't have much success with it. The one thing
that concerns me about this project now is the last news item is over a year
old. Is it still being actively maintained?

[1]: <http://stackoverflow.com/questions/391005/convert-html-css-to-pdf-with-php>

[2]: <http://princexml.com/>

~~~
mike-cardwell
I used wkhtmltopdf in a previous project and found it to be extremely reliable
and easy to use. I was extracting the HTML mime parts from incoming email,
converting them to PDFs with wkhtmltopdf, then converting that to a PNG with
ImageMagick and displaying the PNG to the user in a web browser.

~~~
pbhjpbhj
Why bother with the intermediate stage instead of going directly from HTML to
PNG?

~~~
mike-cardwell
Originally because I didn't find a free app that would do that. Then I decided
to keep the PDF because it was quite useful: unlike in the PNG, the HTML links
were retained in the PDF. I.e., HTML anchor tags are still clickable in PDFs
generated by wkhtmltopdf.

EDIT: A PDF will also let you select text, unlike an image. However, an image
is nicer to embed in a webpage. So I utilised both in order to get the best of
both worlds.
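For anyone wanting to reproduce that pipeline, here is a minimal Python sketch. It assumes `wkhtmltopdf` and ImageMagick's `convert` (which needs Ghostscript to read PDFs) are on PATH. The helper names and output layout are hypothetical, not mike-cardwell's actual code, and only the tools' basic positional arguments are used.

```python
import shutil
import subprocess
from pathlib import Path


def build_commands(html_path: Path, out_dir: Path) -> tuple[list[str], list[str]]:
    """Return argv lists for HTML -> PDF (wkhtmltopdf) and PDF -> PNG (ImageMagick).

    Only the basic positional forms are used:
      wkhtmltopdf <input> <output.pdf>
      convert <input.pdf>[0] <output.png>
    """
    pdf_path = out_dir / (html_path.stem + ".pdf")
    png_path = out_dir / (html_path.stem + ".png")
    to_pdf = ["wkhtmltopdf", str(html_path), str(pdf_path)]
    # "[0]" asks ImageMagick for the first page only; it reads PDFs via Ghostscript.
    to_png = ["convert", f"{pdf_path}[0]", str(png_path)]
    return to_pdf, to_png


def render(html_path: Path, out_dir: Path) -> tuple[Path, Path]:
    """Run both steps and keep both outputs: the PDF for clickable links and
    selectable text, the PNG for embedding in a web page."""
    for tool in ("wkhtmltopdf", "convert"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} not found on PATH")
    out_dir.mkdir(parents=True, exist_ok=True)
    to_pdf, to_png = build_commands(html_path, out_dir)
    subprocess.run(to_pdf, check=True)
    subprocess.run(to_png, check=True)
    return Path(to_pdf[-1]), Path(to_png[-1])
```

Usage would be something like `render(Path("part1.html"), Path("out"))`, which leaves `out/part1.pdf` and `out/part1.png` side by side. Keeping command construction separate from execution makes the argv lists easy to inspect before anything runs.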

------
pilif
There are two issues with wkhtmltopdf that still make me fall back to the
printing code I wrote in 2004 to create PDFs for our users:

* WebKit's support for printing is a bit behind the times and stuff like "display: table-header-group;" isn't quite supported, so whenever you have to print big lists across multiple pages, you are practically forced to do your own page breaking.

* Due to an issue somewhere between Qt and WebKit, it's not possible to hyphenate text. Well, it is possible, but in most cases the hyphen isn't painted.

Not having to deal with manual page breaks and being able to hyphenate (and
thus do real justification) were the two things that would have moved me off my
home-grown solution, but as those two are exactly what wkhtmltopdf is missing,
I'm staying with my own solution.

Aside from that: if you can live with these shortcomings and with the fact that
you are, for all intents and purposes, forced into using their static build
(patched Qt, kerning issues for everybody trying a build with the same Qt
patches), then this might be the perfect solution for PDF generation.

It feels great to use CSS with mm measurements and get exactly what you need,
or to create barcodes just by embedding SVG, or to use the full capabilities of
HTML, CSS and even JS when building your page.
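As a small illustration of the mm-based CSS approach, this hypothetical Python helper emits a print-oriented HTML table sized in millimetres, including the `display: table-header-group` rule from the first bullet. It is only a sketch: whether the header row actually repeats on each page is up to the renderer, and per the comment above, wkhtmltopdf's WebKit may not honour it.

```python
from html import escape


def build_table_html(rows: list[tuple[str, str]]) -> str:
    """Return a print-oriented HTML page whose layout is specified in millimetres."""
    body = "\n".join(
        f"    <tr><td>{escape(item)}</td><td>{escape(amount)}</td></tr>"
        for item, amount in rows
    )
    # Doubled braces are literal CSS braces inside the f-string.
    return f"""<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<style>
  table {{ width: 180mm; border-collapse: collapse; }}
  th, td {{ padding: 2mm; border-bottom: 0.2mm solid #999; }}
  /* Ask the renderer to repeat the header row on every printed page. */
  thead {{ display: table-header-group; }}
  /* Try to keep each row on a single page. */
  tr {{ page-break-inside: avoid; }}
</style>
</head>
<body>
<table>
  <thead><tr><th>Item</th><th>Amount</th></tr></thead>
  <tbody>
{body}
  </tbody>
</table>
</body>
</html>
"""
```

The resulting file could then be handed to wkhtmltopdf in the usual `wkhtmltopdf table.html table.pdf` form; item text is HTML-escaped so user data can't break the markup.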

------
jcr
It's worth noting this log entry:

> Aug 11 2009: Development moved to git
> <http://github.com/antialize/wkhtmltopdf>

------
potyl
WebKit is quite powerful and can easily be used for generating a PDF, SVG,
PostScript or PNG with almost no effort.

I wrote a simple Deck.JS [1] and S5 [2] PDF converter using a few lines of
scripting. These programs take a slide presentation written in HTML5 and
convert it into a portable PDF document. This is very handy since you can then
share a single file that keeps all graphical elements (fonts, images, layout)
intact.

I have a GitHub toy repo [3] where I made a few tests with WebKit. One of the
programs there (screenshot.pl) even lets you use XPath to find the subnode to
grab.

[1] <https://github.com/potyl/perl-App-deckjs2pdf>

[2] <https://github.com/potyl/perl-App-s5pdf>

[3] <https://github.com/potyl/Webkit>

------
imurray
Every time I see a utility like this, I think _maybe_ I could switch to
producing some materials in HTML as the primary, or main intermediary, source
format. Then I try the utility and realize that that would be silly.

For example, I currently make PDF slides for talks. In theory I'd like to make
HTML slides, but would still like the ability to render a PDF for a robust
record. However, neither this utility nor PhantomJS (which I just tried)
immediately does a good job of converting something like:
<http://bit-player.org/deck.js/limits-to-growth-Harvard-2012-03-30/ltg-talk.html#Lotka-Volterra>

EDIT: also just tried cutycapt, with similar results to wkhtmltopdf (got all
slides rather than just the visible one, with bad page breaks, and no TeX maths).

~~~
kelvin0
Well, I am looking for some feedback on a project that converts XML to PDF.
Give it a try: <https://github.com/kelvin0/PyXML2PDF>

~~~
imurray
I am looking for a command-line utility that could do:

    
    
        webpage2pdf 'http://bit-player.org/deck.js/limits-to-growth-Harvard-2012-03-30/ltg-talk.html#Lotka-Volterra' slide.pdf
    

and actually work (create a sensible PDF representation of what I can see in a
browser). So my feedback wouldn't be useful, as my use case is out of scope
for your project: _"PyXML2PDF is NOT compatible with any XHTML/HTML/CSS. It
uses a small set of tags to quickly allow generation of PDFs."_

~~~
pbhjpbhj
Would it be sufficient to create PNGs of the web pages and extract the text of
the webpage to place in the background of a PDF file (for search,
screenreading)?

~~~
imurray
Not for me. Personally I'll stick to ways of making decent PDFs that don't go
via HTML.

------
driverdan
Over the years I've tried various HTML to PDF utilities and have yet to find
one that works correctly.

Previously I was using htmldoc, a project that was abandoned years ago and
doesn't work with CSS. It worked for what I was doing but without CSS it's
very inflexible and hard to maintain.

I recently moved to wkhtmltopdf but it has plenty of its own issues. The
biggest problem I've found is that it doesn't wrap text between pages
correctly. If you have a multi-page document it's likely the last line of text
on a page will be split over 2 pages. IMO this is a show stopping bug. It has
been known for a while but it seems no one is working on it.

The OS X version is broken. It was creating 5MB+ PDFs that should be about
50k. The Linux version doesn't have this bug.

------
hendrik-xdest
It's practically unusable when not in an environment with X11. I had to use it
on a Windows system and any text would have incorrect letter-space. Every
letter would bleed into the next, it's a typographic nightmare. You could use
Arial Unicode MS to get a somewhat acceptable result but that won't support
bold or italic text cleanly.

I'm not quite sure, but I think the fix isn't even part of the 0.11 release;
you have to compile wkhtmltopdf yourself to get it working.

When this issue is resolved, this will be perfect, though. It has great
capabilities for rendering footers, headers and JS-based output (in my case
Highcharts). For the time being you can't even switch to commercial systems -
ActivePDF, for one, has the same issue in the latest release.

~~~
TimMontague
You can work around the text kerning issue by using custom web fonts. But I
agree, they definitely need to fix this issue.

<http://code.google.com/p/wkhtmltopdf/issues/detail?id=72>

------
blakeeb
Tip: wkhtmltoimage is part of the package, allows you to render HTML+CSS into
PNG.

I used this for a project which needed a CSS powered image builder, which
created sharable images:

Builder: <http://circlek-flugtag.heroku.com/entries/shipomatic>

Thumbnails: <http://circlek-flugtag.heroku.com/entries>

------
thejosh
I used to use this, but it has started segfaulting on a large range of
websites.

I've since switched to cutycaps which handles all my needed features out of
the box.

~~~
mdaniel
I think you mean <http://cutycapt.sourceforge.net/> which I only found by
switching back to Google; DDG was not able to decipher your type-o. :-)

------
hieronymusN
Wkhtmltopdf does have its share of warts, but if you need to do a quick and
dirty PDF dump of an entire site, it can help.

I used it with wget to scrape a site for conversion:
<http://darrenknewton.com/blog/2011/10/30/mirror-site-and-convert-to-pdf/>
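The linked post has the details of that approach; what follows is only an independent Python sketch of the general idea, assuming GNU Wget and wkhtmltopdf are installed. The helper names are hypothetical, and the Wget flags are standard options chosen here for illustration, not necessarily the ones the post uses.

```python
import subprocess
from pathlib import Path


def mirror_command(url: str, dest: Path) -> list[str]:
    """GNU Wget invocation that mirrors a site for offline use."""
    return [
        "wget",
        "--mirror",            # recursive download with timestamping
        "--convert-links",     # rewrite links so the local copy is browsable
        "--page-requisites",   # also fetch images/CSS each page needs
        "--adjust-extension",  # save HTML pages with an .html suffix
        "--no-parent",         # don't climb above the starting URL
        f"--directory-prefix={dest}",
        url,
    ]


def pdf_commands(mirror_root: Path) -> list[list[str]]:
    """One wkhtmltopdf call per mirrored HTML file, writing foo.pdf next to foo.html."""
    return [
        ["wkhtmltopdf", str(page), str(page.with_suffix(".pdf"))]
        for page in sorted(mirror_root.rglob("*.html"))
    ]


def mirror_and_convert(url: str, dest: Path) -> None:
    # Wget exits non-zero (e.g. 8) if any single request failed, so don't
    # treat that as fatal; convert whatever was downloaded.
    subprocess.run(mirror_command(url, dest), check=False)
    for cmd in pdf_commands(dest):
        subprocess.run(cmd, check=True)
```

Each page becomes its own PDF; combining them into one document would need a separate tool.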

------
beggi
I used this for a project the other day, but after discovering PhantomJS I
feel like PhantomJS has more traction.

~~~
snowmaker
How specifically did you use PhantomJS for PDF generation?

~~~
imurray
There's an example here:
<http://code.google.com/p/phantomjs/wiki/QuickStart#Rendering> (gives examples
of .png and .pdf generation)

~~~
ashconnor
Thank you for posting. I recently did a bit of research on producing
thumbnails with Ruby using this project: <https://github.com/csquared/IMGKit>.

The only problem was getting the screenshot to work with Flash. It seems as
though the JavaScript delay option in wkhtmltopdf didn't delay at all.

Can PhantomJS handle pages with Flash?

~~~
imurray
Not any more, as a web search immediately reveals:
<http://code.google.com/p/phantomjs/issues/detail?id=418>

------
dantiberian
Does anyone know of any other engines like this, either paid or free? We are
using this to produce catalogs of 200+ complex pages and it is not handling
generating PDFs of this size very well. It will often become unresponsive and
create a memory leak.

~~~
hendrik-xdest
That seems to be a Qt problem, as far as I know. I think I saw somewhere how
to recompile Qt to get a more robust solution. The wkhtmltopdf issue tracker is
quite helpful here.

An easy solution could be to just use extremely short URLs as these seem to
affect the space used by wkhtml as well. But that was just my solution for a
200-page output. In addition, if you are using HTML footers or headers, try
not to give them any parameters, if possible.

Edit: I can't find the best entry at StackOverflow (I remember there must be a
Python based solution as well), but this might be a good overview:

<http://stackoverflow.com/questions/633780/converting-html-files-to-pdf>

Some of those are commercial.

------
AshleysBrain
Catchy name.

------
alok-g
While this maintains hyperlinks as such, it breaks those using relative URLs.

