
Wkhtmltopdf - Shell utility to convert HTML to PDF using WebKit - xd
http://code.google.com/p/wkhtmltopdf/
======
silentbicycle
I tried porting it to OpenBSD and found building it from source to be
surprisingly complicated. IIRC, it wanted me to install their own fork of Qt
just to render PDFs. Pass.

I went with htmldoc (<http://www.htmldoc.org/>) instead. Perhaps the rendered
PDFs aren't as pretty, but for my purposes, it works quite well.
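
For anyone weighing the two tools, here is a minimal sketch of a wrapper that prefers wkhtmltopdf and falls back to htmldoc when wkhtmltopdf isn't installed. It uses htmldoc's "--webpage -f OUTPUT INPUT" form. The function name html_to_pdf_any and the WKHTMLTOPDF/HTMLDOC override variables are hypothetical, there only so a different binary (or a test stub) can be swapped in.

```shell
# Hypothetical helper: convert one HTML file to PDF, preferring wkhtmltopdf
# and falling back to htmldoc. WKHTMLTOPDF and HTMLDOC are hypothetical
# override variables for the binaries to run.
html_to_pdf_any() {
  local src="$1" dest="$2"
  local wk="${WKHTMLTOPDF:-wkhtmltopdf}"
  local hd="${HTMLDOC:-htmldoc}"

  if command -v "$wk" >/dev/null 2>&1; then
    "$wk" "$src" "$dest"
  elif command -v "$hd" >/dev/null 2>&1; then
    # --webpage: treat the input as an unstructured web page, not a book
    "$hd" --webpage -f "$dest" "$src"
  else
    echo "html_to_pdf_any: neither $wk nor $hd found" >&2
    return 127
  fi
}
```

Usage, assuming a file page.html: html_to_pdf_any page.html page.pdf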

------
thangalin
Use wkhtmltoimage to create images of book-quality syntax-highlighted source
code:

<http://superuser.com/questions/213217/convert-html-to-image>

Use gvim to pick from 100 different ready-made colour schemes:

<http://www.vi-improved.org/color_sampler_pack/>

Shameless plug: I describe this technique in an appendix of my technical
manual.

<http://www.whitemagicsoftware.com/books/indispensable/>

For example:

    
    
        #!/bin/bash
    
        # Usage: script FILE [anything, to enable line numbers]
        if [ -z "$1" ]; then
          echo "Usage: $0 FILE [number]" >&2
          exit 1
        fi
    
        echo "Converting $1 to $1.html..."
    
        # Alternative colour scheme: pyte
        # :TOhtml opens the HTML in a new window; "wq" saves it, ":q" quits vim.
        vim "$1" -c "set nobackup" ${2:+-c "set number"} \
          -c ":colorscheme tango-morning" \
          -c ":TOhtml" \
          -c "wq" -c ":q"
    
        echo "Converting $1.html to $1.png..."
    
        wkhtmltoimage --transparent --minimum-font-size 80 \
          --format png --quality 100 --width 4000 \
          "$1.html" "$1.png"
    
        mv "$1.png" "$1.orig.png"
    
        # For scripts whose lines are all 80 characters or shorter, make the
        # generated image always the same size
        # (4000 pixels = width + (border x 2) = 3950 + 25 x 2).
        LENGTH=$(awk 'BEGIN {x=0} {if (length($0) > x) x = length($0)} END {print x}' "$1")
    
        if [ "$LENGTH" -le 80 ]; then
          EXTENT=true
        fi
    
        echo "Trimming $1.png..."
        convert -format png "$1.orig.png" -background none -antialias -trim +repage \
          -density 150x150 ${EXTENT:+-extent 3950x} \
          -bordercolor none -border 25 \
          "$1.png"
    
        echo "Removing temporary files..."
        rm -f "$1.orig.png" "$1.html"
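
The sizing logic in the script above can be pulled out into two small functions so it is easier to check on its own. This is a sketch: the function names max_line_length and extent_args are hypothetical, while the numbers (80 columns, 25 px border, 4000 px target width) come from the script.

```shell
# Hypothetical helper: print the length of the longest line in a file
# (0 for an empty file), using the same awk approach as the script.
max_line_length() {
  awk 'BEGIN { x = 0 } { if (length($0) > x) x = length($0) } END { print x }' "$1"
}

# Hypothetical helper: print the ImageMagick -extent arguments for a source
# file, or nothing when any line is longer than 80 characters.
extent_args() {
  local border=25 target=4000
  if [ "$(max_line_length "$1")" -le 80 ]; then
    # The border is added on both sides, so the pre-border width is
    # target - 2 * border = 3950.
    echo "-extent $(( target - 2 * border ))x"
  fi
}
```

Because the result is empty for wide files, it can be spliced in unquoted, e.g. convert in.png $(extent_args script.sh) out.png, the same way the script uses ${EXTENT:+...}.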

------
billybob
We are using this. Seems to do fine. My only complaint is that trying to
pronounce the name makes me choke on my own uvula.

------
asymptotic
This is fantastic; thank you so much for posting it. I'd also like to point
out another headless WebKit project, PhantomJS (<http://www.phantomjs.org/>),
which can also perform HTML-to-PDF conversion.

edit: I've just run a comparison between wkhtmltopdf and PhantomJS, and
wkhtmltopdf is far superior. wkhtmltopdf produced correct output with
bookmarks, at slightly lower quality, whereas PhantomJS's output was incorrect
and had no bookmarks, though its quality was slightly higher. From my
perspective, correctness beats quality.

------
alanh
Very useful tool. Be prepared for a tough time getting various fonts to work,
though, and some characters (thin space, anyone?) will crash it outright.

~~~
togasystems
I am using this for two different projects. The font issue can lead to a lot
of head-banging. Try tweaking your CSS to get the output close to what you
expect.
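
If a thin space really is what triggers the crash, one possible workaround is to replace thin spaces with ordinary spaces before conversion. This is a sketch under that assumption: it handles the literal U+2009 character plus the &thinsp; and &#8201; entities, and the helper name strip_thin_spaces is hypothetical.

```shell
# Hypothetical helper: print an HTML file with every thin space (literal
# U+2009, &thinsp; or &#8201;) replaced by a plain space.
strip_thin_spaces() {
  local thin
  thin=$(printf '\342\200\211')   # UTF-8 encoding of U+2009 (bytes E2 80 89)
  sed -e "s/$thin/ /g" \
      -e 's/&thinsp;/ /g' \
      -e 's/&#8201;/ /g' \
      "$1"
}
```

Usage: strip_thin_spaces page.html > page.clean.html && wkhtmltopdf page.clean.html page.pdf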

------
acangiano
I use the excellent Pandoc for my conversion needs:
<http://johnmacfarlane.net/pandoc/>

------
cstuder
On a related note: I'm looking for a command line utility (*nix) to go the
other way. Any recommendations for a tool which extracts text from a PDF?

~~~
sho_hn
pdftotext and pdftohtml ship with Poppler, the de facto PDF implementation on
Linux systems today.
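
A minimal batch sketch with pdftotext: it converts every PDF in a directory to a .txt file next to it. -enc UTF-8 is a pdftotext option that sets the output encoding; the function name pdfs_to_text and the PDFTOTEXT override variable are hypothetical. It uses pdftotext's default mode and does nothing special for multi-column layouts.

```shell
# Hypothetical helper: run pdftotext on every *.pdf in a directory, writing
# foo.txt next to foo.pdf. PDFTOTEXT is a hypothetical override variable.
pdfs_to_text() {
  local dir="$1" pdf
  local tool="${PDFTOTEXT:-pdftotext}"
  for pdf in "$dir"/*.pdf; do
    [ -e "$pdf" ] || continue   # the glob matched nothing
    # -enc UTF-8 selects the encoding of the extracted text
    "$tool" -enc UTF-8 "$pdf" "${pdf%.pdf}.txt" || echo "failed: $pdf" >&2
  done
}
```

Usage: pdfs_to_text ~/papers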

~~~
hassy
pdftotext can produce garbled output, especially from two-column PDFs.*
Depending on what you need, scripting Adobe Reader may be an option, as it
does a better job.

* I did a sizeable data-mining project a while ago that required converting a lot of scientific papers to plain text. pdftotext didn't work well on many of them, so I had a script running overnight that sent click events to Adobe Reader.

~~~
silentbicycle
While still a very new project, pdfsplit (<http://dmwit.com/pdfsplit/>) may be
worth a try.

~~~
cstuder
Thanks for all the links, I'll check those out.

------
acabal
This is a great tool; I've been using it for a long time. The only downside is
that on Ubuntu Server certain graphics libraries aren't in the PPA, so you have
to download a static binary instead of using apt-get. But it's a minor quibble
for a great program.
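
One way to make the static-binary route less awkward is a small lookup function that prefers the downloaded binary and only then falls back to whatever is on PATH. This is a sketch: /opt/wkhtmltopdf/bin/wkhtmltopdf is a hypothetical install location, and find_wkhtmltopdf and WKHTMLTOPDF_STATIC are illustrative names.

```shell
# Hypothetical helper: print the path of the wkhtmltopdf to use, preferring
# a downloaded static binary. The default location below is hypothetical;
# set WKHTMLTOPDF_STATIC to wherever the download was unpacked.
find_wkhtmltopdf() {
  local static="${WKHTMLTOPDF_STATIC:-/opt/wkhtmltopdf/bin/wkhtmltopdf}"
  if [ -x "$static" ]; then
    echo "$static"
  elif command -v wkhtmltopdf >/dev/null 2>&1; then
    command -v wkhtmltopdf
  else
    return 1   # not found anywhere
  fi
}
```

Usage: "$(find_wkhtmltopdf)" page.html page.pdf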

------
Jetlag
I was briefly looking for exactly this after our lawyer suggested manually
printing out thousands of emails and scanning them to PDF. Luckily, my push to
use a PST file finally won out. Not a confidence-inspiring moment.

------
sim0n
We're using this on <http://interstateapp.com> to allow users to download a
PDF version of their roadmap. The script works very well!

------
daeken
This tool works impressively well. I use it to convert my CV to PDF, as I
don't trust my CSS to work everywhere.
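
For a CV-style document, a minimal sketch might render with the page's print stylesheet on A4 paper. --page-size and --print-media-type are wkhtmltopdf options; the function name cv_to_pdf, the WKHTMLTOPDF override variable, and the default file names cv.html and cv.pdf are hypothetical.

```shell
# Hypothetical helper: render an HTML CV to an A4 PDF using its print CSS.
# WKHTMLTOPDF is a hypothetical override variable for the binary.
cv_to_pdf() {
  local src="${1:-cv.html}" dest="${2:-cv.pdf}"
  "${WKHTMLTOPDF:-wkhtmltopdf}" \
    --page-size A4 \
    --print-media-type \
    "$src" "$dest"
}
```

Using --print-media-type means any @media print rules in the CSS apply, which is usually what you want for a printable CV.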

------
jbaker
Cool. Can you get an image out of it?

~~~
xd
There is a utility on the same project page called wkhtmltoimage.

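
A minimal sketch of capturing a page as a PNG with wkhtmltoimage, using only options that appear in the syntax-highlighting script earlier in this thread (--format, --width, --quality). The function name page_to_png, the WKHTMLTOIMAGE override variable, and the default width of 1024 are hypothetical.

```shell
# Hypothetical helper: render a URL or HTML file to a PNG.
# WKHTMLTOIMAGE is a hypothetical override variable; 1024 is an arbitrary
# default width in pixels.
page_to_png() {
  local src="$1" dest="$2" width="${3:-1024}"
  "${WKHTMLTOIMAGE:-wkhtmltoimage}" \
    --format png --quality 100 --width "$width" \
    "$src" "$dest"
}
```

Usage: page_to_png http://example.com/ example.png 1280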
