

PDFMunge: Improve the display of technical PDFs on eBook readers - felixc
http://www.felixcrux.com/posts/pdfmunge-improve-pdfs-ebook-readers/

======
felixc
Special bonus tip for HN readers! To get a set of good starting values for
cropping the margins, use the existing pdfcrop utility with the --verbose
flag.

It will display the existing BoundingBox property as it processes each page.
Let it run for a few pages, kill it, and use those numbers as a starting
point. They will probably not be as tight as you'd like, since they won't cut
out page numbers or headers.
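
One way to turn that verbose output into a single starting crop is sketched below. It assumes the captured output contains lines of the form `%%BoundingBox: llx lly urx ury` (integers, in PDF points); the exact output format may differ between pdfcrop versions, and `union_bbox` is a hypothetical helper, not part of pdfcrop or PDFMunge.

```python
import re

# Assumed line shape: "%%BoundingBox: llx lly urx ury" (integers, PDF points).
# "%%HiResBoundingBox:" lines do not match this pattern and are ignored.
BBOX_RE = re.compile(r"%%BoundingBox:\s*(-?\d+)\s+(-?\d+)\s+(-?\d+)\s+(-?\d+)")


def union_bbox(verbose_output):
    """Return the smallest (llx, lly, urx, ury) box that contains every reported box."""
    boxes = [tuple(int(n) for n in m.groups())
             for m in BBOX_RE.finditer(verbose_output)]
    if not boxes:
        raise ValueError("no BoundingBox lines found")
    return (
        min(b[0] for b in boxes),  # leftmost left edge
        min(b[1] for b in boxes),  # lowest bottom edge
        max(b[2] for b in boxes),  # rightmost right edge
        max(b[3] for b in boxes),  # highest top edge
    )


sample = (
    "%%BoundingBox: 72 90 540 700\n"
    "%%HiResBoundingBox: 72.5 90.1 539.9 699.8\n"
    "%%BoundingBox: 60 95 550 710\n"
)
print(union_bbox(sample))  # (60, 90, 550, 710)
```

Taking the union keeps every sampled page's content inside the crop, so it is a conservative starting point to tighten by hand.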

------
w00pla
For someone that has a Kindle/KindleDX/Nook: How good is the Kindle/Kindle DX
for technical books? I want to buy one (since I am immigrating and will lose
all my textbooks).

Is it good for technical books - since the page turn speed is apparently very
slow? Will the iPad be better for this?

~~~
evgen
I have been loving my DX since Xmas, and as a former Sony Reader refugee I
cannot tell you how much I do not miss needing to run my technical documents
and research papers through this sort of PDF munging in order to get something
that is (barely) readable on the smaller e-ink screens. If you read a lot of
technical books, papers, or other docs formatted for A4/8.5x11 then do not
consider any of the smaller e-ink units.

The page turn speed is not fast, but since I was used to it from my previous
e-ink device I don't find it too much of a bother. The thing that you lose
from a physical book is the ability to scan quickly to a particular section
and then drill down to the page you want. With e-ink you guess the
approximate area and then guess a couple more times until you get to the
right page. The biggest "fix" that could be provided in this case would be for
hyperlinks within the doc to work so that you could bounce from the table of
contents or index to a specific page. At least with the DX I can actually go
to the page I want, though; with reformatted docs on a smaller display (like
the use case for the OP software) there was no match between the original page
numbers and the actual page number on the reader, so it was a real PITA.

Short version: if you read technical docs or papers in PDF format do not
consider anything smaller than a DX or iRex.

------
ComputerGuru
Pretty much the only thing that's making me consider switching from my Sony
PRS-505 (it's the one pictured in the blog post) to a Kindle DX is the ability
to read PDFs without the annoying and buggy reflowing. Cutting pages doesn't
do the trick for me though... I probably *will* get the DX :(

~~~
DanielBMarkham
I've had the DX for about six months. Love it. Rotate it into landscape and it
does great with technical PDFs.

~~~
ableal
I've tested the DX only for a few days, but the scaling of PDFs seems to work
nicely (including one scanned book, Bondy & Murty's Graph Theory,
http://www.ecp6.jussieu.fr/pageperso/bondy/books/gtwa/gtwa.html)

On single column book-page stuff, so far I only came across one figure that
looked bad in portrait and had to be seen in landscape. Otherwise fine, if
your eyes can handle slightly shrunk type.

P.S. by scaling, I mean the fixed scaling the Kindle DX does to fit the PDF on
the page. User-controlled font-size change is only available for the mobi/azw
format.

------
caryme
Interesting. Some practice with this could make textbooks (especially smaller
ones) useable on the regular Kindle / Nook / Sony reader.

~~~
felixc
That was exactly my use case for creating it in the first place :)

------
butterfi
I can't help but chuckle -- this is why geeks hate the iPad, and everyone else
will love it.

~~~
malkia
Can't you reflow your pdf before uploading it to the iPad? I thought that
would be possible...

------
agbell
Great idea. Rather than re-flow, it cuts up the pages so they fit in
'landscape' mode.

Reflow works well, but not with diagrams and images.
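
The cutting idea can be illustrated with a little arithmetic. The sketch below is not PDFMunge's actual algorithm; `landscape_slices` and its parameters are hypothetical. It assumes the page is shown at full width on a landscape screen and computes overlapping horizontal strips, each with the screen's aspect ratio, in PDF-style coordinates (origin at the bottom-left).

```python
import math


def landscape_slices(page_width, page_height, screen_width, screen_height, overlap=20):
    """Return (bottom, top) y-ranges of horizontal strips, top of page first.

    Every strip spans the full page width and has the screen's aspect ratio.
    Neighbouring strips share `overlap` points so a line cut at one edge
    reappears whole in the next strip.
    """
    strip_height = page_width * screen_height / screen_width
    if strip_height >= page_height:
        return [(0.0, float(page_height))]  # whole page already fits
    if not 0 <= overlap < strip_height:
        raise ValueError("overlap must be non-negative and smaller than a strip")
    step = strip_height - overlap
    count = math.ceil((page_height - strip_height) / step) + 1
    slices = []
    for i in range(count):
        top = page_height - i * step
        bottom = top - strip_height
        if bottom < 0:
            # Pin the last strip to the bottom edge so it keeps full height.
            bottom, top = 0.0, strip_height
        slices.append((bottom, top))
    return slices
```

Because diagrams and images are never re-laid-out, only cropped, they survive intact as long as they fit inside one strip.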

------
JulianMorrison
It would be better to publish as epub. Pragprog does this.

------
ableal
For technical papers in two-column format, I've thought of just chopping each
page into four. (Brute force, judiciously applied. ;-)

If you feel like testing that, I'd appreciate hearing about it.
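
The brute-force chop into four can be sketched as plain box arithmetic. This assumes two equal-width columns, a split at the exact vertical midpoint, and PDF-style coordinates with the origin at the bottom-left; `quadrant_boxes` is a hypothetical helper, not a function of any tool mentioned in this thread.

```python
def quadrant_boxes(width, height):
    """Return four (left, bottom, right, top) crop boxes in reading order."""
    mid_x = width / 2
    mid_y = height / 2
    return [
        (0, mid_y, mid_x, height),      # left column, top half
        (0, 0, mid_x, mid_y),           # left column, bottom half
        (mid_x, mid_y, width, height),  # right column, top half
        (mid_x, 0, width, mid_y),       # right column, bottom half
    ]


# US Letter page in points (612 x 792)
for box in quadrant_boxes(612, 792):
    print(box)
```

A fixed split like this will slice through any line of text that straddles the vertical midpoint and through any figure that spans both columns, so in practice some overlap or content-aware splitting would be needed.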

~~~
miloshh
There is a program called pdflrf that does this, and it is far from ideal,
since most two-column papers have figures that span both columns. Another tool
called PaperCrop does this right, but unfortunately only produces images as
output; I have not yet figured out how to reliably join them into an e-book
format. The non-existence of a good tool to solve this problem is quite
annoying...

~~~
evgen
Just to clarify a point, pdflrf also converts docs to images. It crops &
chops, then rasterizes and performs a few image manipulation steps to make the
image look better on the Sony screen.

------
sabat
Sigh. Well, no joy on the Mac, despite the presumed promise of
platform-neutral Python.

Python 2.5.1:

    line 8: !DOCTYPE: No such file or directory
    line 9: syntax error near unexpected token `newline'
    line 9: `"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">'

Python 3.0:

      File "pdfmunge.py", line 8
        <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
        ^
    SyntaxError: invalid syntax

~~~
scott_s
Are you making a joke, or are you being serious? If you're being serious, then
it looks like you somehow captured an HTML page instead of the Python program.
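
A quick way to check for this kind of mix-up before running a downloaded script is sketched below. It is only a heuristic: `looks_like_html` is a hypothetical helper, and it assumes the HTML markers show up within the first 1024 bytes of the file.

```python
def looks_like_html(path):
    """Heuristic: True if the start of the file looks like an HTML page."""
    with open(path, "rb") as f:
        head = f.read(1024).lower()  # inspect only the beginning of the file
    return b"<!doctype" in head or b"<html" in head
```

For example, `looks_like_html("pdfmunge.py")` returning `True` would suggest that the browser saved the web page around the script rather than the script itself.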

~~~
sabat
No joke, just overtired. Wow, I can't believe I did that.

