

What was the matter with PDF? - menacingly
http://timpromptu.com/2012/12/29/what-was-the-matter-with-pdf/

======
jeremysmyth
PDFs are fantastic, and are still wonderful today. OP's problem isn't solved
by PDFs, just like it isn't solved by XML or MP3 or ZIP formats.

Simply put, PDF is a format designed to allow me to give you a document where
you can be sure (with certain well defined exceptions that I can work around
anyway) that you see what I intended you to see. That works brilliantly if I
want you to see a letter-sized 3-color document, or an A5 booklet, or a funky
3" x 9" flyer, or whatever other size I want.

This doesn't work with e-book readers, because my e-book reader isn't A4 or 8"
x 6" or whatever other format the book is in. Epub, mobi, LRF etc. give the
e-book reader the ability to format the text (if it's pure text, and not, as
OP complains, pre-formatted with assumptions the e-book reader can't match)
for the screen or for my options wrt. zooming and font size.

I say all of this as both an avid PDF user (I write lots of documentation),
and an avid e-book user. You use the right tool for the job. If you're
complaining about PDFs on your e-book reader, you're doing it wrong.

~~~
mikevm
>Simply put, PDF is a format designed to allow me to give you a document where
you can be sure (with certain well defined exceptions that I can work around
anyway) that you see what I intended you to see.

Well put. When I buy technical books, I always look for a PDF format. I want a
copy of the book that is the same as the printed one and that's exactly the
reason why I like PDF. Plus, things like diagrams and graphs are in vector
graphics, and not raster as they are in EPUB/MOBI.

I've made the mistake of buying a couple of technical EPUB eBooks that
contained graphs, and in all cases it was a complete abomination. The
publishers seem to want to produce the smallest files they can, so what used
to be vector graphics becomes highly compressed, low-resolution raster
images. Trying to zoom in on a graph on your tablet gives you a resized, blurry
image.

I don't like this trend where some technical ebook publishers offer a PDF
version of the book in addition to an EPUB, and the PDF ends up being a mere
conversion from the EPUB version, having all the cons I mentioned above. This
happened to me on InformIT when I bought "The Mythical Man-Month", and they
ended up giving me a refund.

------
dgreensp
Not a single sympathizer with the OP?

Kindle books have truly terrible formatting and editing in general. Some are
riddled with what look like OCR errors; in others, blockquotes and italicized
regions start and end in the wrong places. Images are in awkward places.
Sometimes just paging left and then paging right causes the page boundary to
move.

The main reason, I've heard, is that even if the Kindle format and reader app
are reasonable (and I can't vouch for them), it's on publishers to put their
books in the right format. They have no idea how to do this, so they outsource
it. Some publishers probably don't care about or dislike the Kindle platform.

I find it really, really sad and bizarre to read books with so many errors.
Maybe you haven't come across a really bad one yet, but since it varies by
publisher (and I'm talking about big, mainstream publishers), just wait, you
will.

~~~
jpdoctor
> _Some publishers probably don't care about or dislike the Kindle platform._

This is the real source of the problem. Publishers have every reason to
suspect that Amazon will eat their profits and would be stupid to help Amazon
accomplish its goal.

~~~
greenyoda
Except that a badly-formatted Kindle book reflects more poorly on the
publisher than it does on Amazon, since Amazon doesn't create the content. If
a publisher really doesn't want to encourage the e-book industry, isn't it
much easier for them to not publish an e-book at all?

------
k3n
Full PDF support is a security nightmare[1]. PDF supports far more than a
simple reading program should need, which makes readers bloated and buggy and
drastically increases the attack surface.

1\. <http://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=pdf>

~~~
rwallace
Yes but there's no requirement that any particular PDF reader implement all
the extra crap.

~~~
k3n
True, but PDF files don't declare which features they require (via file
extension or anything else), so it may be impossible to tell whether your
particular reader supports everything a given file uses. The better readers
will tell you that some features of the current document aren't supported,
while others simply won't render correctly at all, or worse, will just crash
the application.

You also end up with users trying to open some dynamic form PDF with scripting
and whatnot enabled, and of course your portable device balks because it
doesn't support it, so now it gets kicked back to the device manufacturer
because the PDF reader is "broken".
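Since the file itself doesn't announce its feature set, about the best a small reader can do up front is guess by scanning for tell-tale name tokens (a heuristic similar in spirit to tools like pdfid). A minimal sketch, assuming the keyword list below is a reasonable proxy for "needs a full-featured reader"; the list and the function name `scan_pdf_features` are illustrative, not any standard:

```python
import re

# Illustrative (not exhaustive) PDF name tokens that hint at features a
# minimal reader may not implement: scripting, forms, launch actions, etc.
RISKY_NAMES = [b"/JavaScript", b"/JS", b"/AcroForm", b"/XFA",
               b"/Launch", b"/EmbeddedFile", b"/OpenAction"]

# A PDF name ends at whitespace or a delimiter: ( ) < > [ ] { } / %
_NAME_END = rb"(?![^\s()<>\[\]{}/%])"


def scan_pdf_features(data: bytes) -> dict:
    """Count occurrences of selected name tokens in raw PDF bytes.

    Heuristic only: names inside compressed object streams are invisible
    to a raw byte scan, and names can be obfuscated with #xx hex escapes.
    """
    counts = {}
    for name in RISKY_NAMES:
        # The lookahead stops /JS from matching e.g. /JSX.
        pattern = re.escape(name) + _NAME_END
        counts[name.decode()] = len(re.findall(pattern, data))
    return counts


sample = (b"%PDF-1.7\n1 0 obj << /Type /Catalog /OpenAction 2 0 R "
          b"/AcroForm 3 0 R >> endobj\n"
          b"2 0 obj << /S /JavaScript /JS (app.alert('hi');) >> endobj\n")
print(scan_pdf_features(sample))
```

A nonzero count doesn't prove a feature is actually used (and a zero count doesn't prove it isn't), which is more or less the point being made: a reader can only guess.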

------
creature
I've attempted to read technical PDFs on my Kindle. You either need a
magnifying glass (if you try to do it a page at a time) or spend the whole
time scrolling. Maybe scrolling isn't a big deal on a Kindle Touch, but it's
not fun on a standard Kindle.

I don't think this is a question of formats; I think it's a question of "How
do we show legible code samples on a 7" screen?".

~~~
TeMPOraL
Switch the screen to landscape mode and do fit-to-width. This way, a typical
PDF will only need 3 "next page" presses per page, without you having to
scroll sideways.

~~~
hahainternet
This is how I have read a number of books now. It's still frustrating, but
it's better than nothing.

------
josh_fyi
PDF and ebooks (MOBI, EPUB) optimize for different things.

PDF is for layout -- it is actually more of a graphic format than a text
format. Contiguous text may be scattered around in the internal data
representation, so long as the location of each character on the page is
expressed. This is good for tables, formulas, and other items requiring
precise layout.
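To make the "graphic format" point concrete: a PDF page can draw its text fragments in any order, as long as each one has coordinates, so anything that wants plain text back has to reconstruct reading order from positions. A minimal sketch, assuming a simplified, hypothetical `TextRun` record (real PDFs express this through text operators and a text matrix) and a naive "same baseline within a tolerance" rule for grouping lines:

```python
from dataclasses import dataclass


@dataclass
class TextRun:
    """Hypothetical positioned text fragment (a simplification of PDF text)."""
    x: float   # distance from the left edge, in points
    y: float   # distance from the bottom edge, in points (PDF's y grows upward)
    text: str


def reading_order(runs, line_tolerance=2.0):
    """Rebuild top-to-bottom, left-to-right text from runs stored in any order."""
    # Highest y first (top of the page), then left to right.
    ordered = sorted(runs, key=lambda r: (-r.y, r.x))
    lines = []
    for run in ordered:
        # Same line if the baseline is close to the line's first run.
        if lines and abs(lines[-1][0].y - run.y) <= line_tolerance:
            lines[-1].append(run)
        else:
            lines.append([run])
    return "\n".join(
        " ".join(r.text for r in sorted(line, key=lambda r: r.x))
        for line in lines
    )


# Stored out of order, as a content stream is free to do.
runs = [TextRun(72, 700, "world"), TextRun(72, 686, "Second"),
        TextRun(30, 700.5, "Hello"), TextRun(130, 686, "line")]
print(reading_order(runs))
```

Note that this naive rule would happily merge two side-by-side columns into single lines, which is one reason extracting text from layout-heavy PDFs is so unreliable.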

EBooks are for a smooth reading experience, including wrapping. This will
naturally disrupt layout.

I think that with some technical effort, it is possible to build an eBook that
flows -- while still allowing precise layout where needed.

------
Terretta
> _What was the matter with PDF?_

In a word, reflow.

~~~
Avshalom
Actually that's sort of the problem. Reflow can fucking destroy technical
texts.

~~~
bentcorner
Epub apparently supports fixed-layout content (or so says wikipedia). I don't
know if I've ever seen it done (although I haven't read many technical epubs).

~~~
nicw
Fixed-layout is supported via EPUB, but you need a reader that knows how to
display the fixed-layout version. 99% of readers only display reflowable
content.

What you really want to do is go with reflowable content that is formatted
correctly (proper HTML/CSS), and if the content breaks across the page,
changing the font size should help fix it.

(spoken from too many years of ebook + tech book experience)

------
arikrak
I understand there are some formatting issues, but PDFs don't work well on
smaller devices. In fact, I find they don't even work well on laptops, since
you need to scroll down each time to the bottom of a page. Ebook formats are
better for both; they just require some formatting work before being
published.

~~~
tibbon
Perhaps this isn't a "smaller device", but I love PDFs on my iPad.

~~~
craigching
I want to second this. I use GoodReader on my iPad and absolutely love
reading PDF on it. I was going to check out Readdle as well since that's been
getting good reviews. The best part is that I have all my technical books
categorized on my iPad which I always have with me so anytime I have a spare
moment or wish I had "that" book with me, I now do!

------
melvinmt
Reading PDFs on my iPad results in pinching every single page to make the text
somewhat readable. It is, frankly, exhausting.

~~~
wonderzombie
Try GoodReader. It has a ton of other useful features, but among them is the
ability to crop.

Cropping is applied to all pages in the document, so once you use the handles
to eliminate excessive margins, you've got nothing but the text. I find it
indispensable, to the point where I prefer reading PDFs on the iPad to reading
them anywhere else.

~~~
CountSessine
Another big endorsement for GoodReader.

GoodReader's UI is a bit of a mess, but that's just a reflection of how many
truly useful features it has. And the systematic elimination of margins -
including the ability to handle left and right pages differently - makes
technical manuals and papers actually readable.

Not having GoodReader was easily one of the worst problems I faced when I
switched to Android - there's just nothing like it.

------
goblue
This is one of the bigger reasons I got my iPad (Retina). I can read most PDF
textbooks/documents in full screen portrait without any zooming or scrolling
with ridiculous text clarity. No other device I have allows for this nearly as
comfortably.

------
lnanek2
I've had some terrible Kindle experiences as well. With textbooks on learning
Chinese, there was no way to zoom in on the tiny, intricate characters. In one
such textbook, the lines of Chinese had been placed incorrectly. Whoever did
the conversion basically treated them as images, and image support on Kindle is
very poor; in that book they placed the "images" under the wrong phrases,
making it completely wrong. Amusingly, these are also the books that disabled
themselves after I updated my phone too many times, saying I'd exhausted my
licensed downloads. So they were simultaneously the worst content conversion I
ever encountered and the most useless licensing for a developer who changes
and reflashes phones constantly. I imagine technical books are similar, with
code being treated as badly as Chinese.

------
whattttttttt
A4 PDFs work fine on a Kindle DX. That's what it was designed for. Regular
Kindles are for reading novels.

------
threedaymonk
PDFs can work really well on ebook readers.

However, most PDFs don't work at all well on ebook readers.

If you make a monochrome PDF with minimal margins, with a sensible font size,
for a 6 or 7" screen, it will be attractive and readable on that screen. It
can - if you do it right - have better wrapping, widow/orphan control, and
fonts than a normal ebook, too. I have an old Sony Reader PRS-505. The
screen's the same as a Kindle, but the processor is slow, and paginating its
native format takes ages. Nonetheless, it can display PDFs very rapidly.
Before I got a Kindle, I used to convert books from text into small PDFs, with
attractive embedded fonts, and read them on the Sony that way.
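If you want to make a PDF sized for a small screen rather than for paper, the page dimensions follow from the screen's diagonal and aspect ratio. A minimal sketch, assuming a 3:4 portrait panel (like a 600x800 e-ink screen) and ignoring margins; PDF's default user-space unit is 1/72 inch, so sizes are returned in points. The function name is just for illustration:

```python
import math

POINTS_PER_INCH = 72  # PDF's default user-space unit is 1/72 inch


def page_size_points(diagonal_in: float, aspect_w: int = 3, aspect_h: int = 4):
    """Return (width, height) in points for a screen of the given diagonal.

    Assumes portrait orientation with the given aspect ratio; margins are
    not subtracted, so pick small ones in your typesetter.
    """
    # Diagonal in "aspect units" is hypot(w, h); scale to inches.
    scale = diagonal_in / math.hypot(aspect_w, aspect_h)
    width_in = aspect_w * scale
    height_in = aspect_h * scale
    return width_in * POINTS_PER_INCH, height_in * POINTS_PER_INCH


w, h = page_size_points(6)  # roughly 259 x 346 pt (3.6" x 4.8")
print(round(w, 1), round(h, 1))
```

For a 7" tablet you'd pass that device's own aspect ratio instead of the 3:4 default, since many 7" tablets are noticeably taller and narrower than e-ink readers.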

However, if you try to read an A4 PDF, with two columns and massive margins,
on a 7" screen, you're going to have a bad time.

The downside to using PDF is that you lose the ability to reflow. But that's
the upside too, at least if you have complex content that's not amenable to
reflowing; many technical books fit into this category.

------
khill
Is this a Kindle-specific problem? I actually just bought the same book in
epub format for my Nook Color and I don't have any complaints yet.

Not sure if I have a higher tolerance for bad formatting or if the Kindle
version has some formatting issues.

------
gushie
With all the PDF tech books I read on my Kindle, I either have to read through
a microscope, scroll the screen from left to right continuously, or hold the
Kindle in landscape, meaning the buttons are in the wrong place.

~~~
mikevm
You know that old joke? A patient says to the doctor, "it hurts when I do
this", and the doctor replies: "so don't do that". Your 6" Kindle was not
meant for reading PDF books, so don't complain when you use it for something
other than what it was made for (reading novels).

------
benlower
I sympathize with the sentiments regarding tech books in Kindle format. I try
to buy all my tech books at O'Reilly since they usually will have each title
available in multiple formats (PDF included).

------
sycren
So if we do not need a self-contained file, is HTML5 the answer?

~~~
ht_th
There is a fundamental problem with viewports of unknown size: you as author
cannot ensure that the way you want to represent your information is also
rendered that way. For a novel or other basically sequential text that doesn't
matter, but for a technical document with complex representations, it might be
problematic. Long or wide tables are a common example: if headers aren't
repeated, they quickly become a meaningless bunch of symbols. A graph in which
you cannot see essential features such as the legend, the axes, all the typical
points (like extrema), or the pattern described by the whole will hardly
support the argument you are trying to make in your text. And don't get me
started on unique and more complex diagrams and infographics.

Zooming in and out just doesn't cut it.

Of course, you could write a different text with different diagrams for
different viewport sizes, but besides spending more time writing your
documents, it also becomes a referencing hell.

~~~
sycren
Tell me what you think of 'The Future of the Book' by IDEO:
<http://vimeo.com/15142335>

~~~
ht_th
Don't get me wrong, I like information rich interactive documents. I just
think it is an illusion to think that these documents fit different (sized)
devices without (extensive) editing. And there is nothing wrong with that as
long as we are aware of that. The problem I currently see is that textbook
writers (I work in education) write a textbook the traditional way and then
digitize it poorly, without any regard for the nature of the devices it is used
on or for modern information processing capabilities.

Or, stated differently, think about how information is represented best on
different devices (device groups).

~~~
sycren
Perhaps I am far too naive on this subject, but I would have thought that it
would be quite easy to algorithmically work out how best to present
information on each device using different aspect ratios, colour levels and
resolutions as variables to change the em or % CSS values of elements, or even
to switch between conditional CSS layouts.

~~~
ht_th
With respect to a purely technical translation from one device to the other,
you're absolutely right. The problem is in
interpretation/representation/communication of information/ideas/knowledge.

For example, I've built an interactive computer simulation connected to a
graphing component to enable students to explore rate of change. On a computer
screen, the simulation and graph are positioned so that there is a direct
link between what happens in the graph and in the simulation. It doesn't fit
on a smartphone. I can (automatically) change the setup to put the graph after
(below) the simulation, but if the student cannot see the graph and the
simulation simultaneously, they miss out on the support for learning the
concept of change in the original configuration.

Another example, I taught a course on regular expressions and that included a
unit on deterministic finite automatons. For non-trivial examples these
automatons aren't comprehensible on small screens. Zoom and pan doesn't work
to get a picture of the whole. On the other hand, I can think of a simulation
of an automaton that would give a better picture, on a small screen, of how a
particular automaton works by using a simplified track representing the whole
and focusing on a local state and input to see the effect of connections in
the track. However, this would mean two different approaches to learning
automatons that aren't necessarily compatible, especially in an introductory
course. Students will have different questions and problems with the different
representations. They will construct different ideas about automatons that
might make communicating in class about automatons troublesome as students
don't understand each other while talking about the same thing. But it goes
further than that: these representations will need different introductions,
different exercises, maybe even a different structured learning trajectory to
build a similar understanding of automatons.

~~~
sycren
Perhaps we need to invent different ways of displaying information when a lack
of space is a given.

For example, with your interactive simulation, could the entities change
colour or shape to show rate of change?

With your automatons, I would have thought it wise to try to separate the
different groups and then step through each group. For me, understanding each
group in order would be a better way to learn it than nitpicking at different
parts of the whole system.

~~~
ht_th
Certainly, it is similar to mobile versions of websites: build the best
representation for the device and situation and be aware of the differences
and potential effects of these differences. I think that smartphones do offer
great opportunities in education, not as the main information gateway, but as
an auxiliary tool for students and teachers to interact with the content, each
other, and the learning environment.

------
drcube
Design and looks aside, I can read a PDF on my laptop, desktop, phone, and
e-reader, on any OS. Why the hell would I want to use a format that only works
on one device?

~~~
mikecane
>>>I can read a PDF on my laptop, desktop, phone, and e-reader

When MobiPocket (the basis for Kindle format) and ePub were specced, the world
looked like this:

>>>I can read a PDF on my laptop, desktop

Expensive hardware with powerful CPUs. Kindle/ePub were designed for weak CPUs
that could basically handle text (K/ePub -- tarted-up text files) well. Back
then, these were PDAs.

It's only been really recently -- with iPad 4 -- that monster PDFs, such as
those from Google Books (scans of original book/magazine pages), can be read
without wanting to toss the hardware against the wall.

~~~
pdfsage
That's because Google Books made a decision to use a feature of PDF (JPEG2000
image compression) that is better suited for higher powered devices. They
could just as easily have chosen JPEG or Flate if they wanted.

Don't blame the format - blame the publisher!

------
jamestc
I feel like a god damned idiot, but what the fuck is a guy to do if he wants
to read PDF files on an e-reader? They look like dogshit on Kindles. Thing is,
I have a ton of them. Most of them somewhat obscure/can't be found in other
formats.

I might be talking nonsense here, but I swear there is a niche market for some
kind of affordable (<$100) tablet specifically designed for PDFs.

~~~
mikecane
>>>I might be talking nonsense here, but I swear there is a niche market for
some kind of affordable (<$100) tablet specifically designed for PDFs.

What would differentiate the hardware from current tablets to make that price
possible?

And then what kind of PDF do you mean? Something squirted from a word
processing/page-layout file that permits reflow or a collection of image scans
(see Google Books) that are static? It's not that easy.

~~~
jamestc
Less memory. Less research. It only has one job to do, etc. No extra bullshit.

I know I'm being ridiculous. I'm just bummed that I can't read some of these
really awesome books with the same leisure that I can a Kindle book or a
physical book. First world problems, etc. I have to get over being
phenomenologically appalled at the idea of reading a book in the same space I
read message boards. There's a weird mental block there.

~~~
mikecane
Less memory? I don't know about that. There are some Google Book PDF scans of
early 20th-century magazines that are close to 1GB in size. It's been my
experience that even a good tablet doesn't keep more than 3-5 of these scanned
PDF pages all rendered and ready to read if you want to switch back and forth
between a bunch of pages.

------
GotAnyMegadeth
I don't know about any of the more modern Kindles (or equivalent), but my
girlfriend's Kindle can't handle PDFs very well either. She reads a lot of
scientific journals, which seem to usually be written in columns, and the
Kindle gives you the option of either seeing the left column and the left half
of the right one, or just the right hand half of the right.

------
galaktor
This is the only reason I don't own an ebook reader yet. All the books I've
read over the past few years are technical books. If I'm going to read rather
tough material, I at least want the design/formatting to assist me in taking
in the content.

Ebooks eliminate any typographical support (good) books can deliver.

------
nater
> I mean, who gives a shit about some nerds when you’re moving bazillions of
> copies of books that help teens or moms explore new facets of their
> sexuality?

Can we have this conversation without resorting to highly gendered stereotypes
for trivial-media-that-I'm-not-interested-in examples?

~~~
mikecane
That wasn't stereotyping. He was making an allusion to Fifty Shades of Grey.

------
mjs
I'm really hoping Knuth doesn't discover how bad his books look in their
Kindle editions, he's already had to fix this problem once...
<http://en.wikipedia.org/wiki/TeX#History>

------
styluss
I like PDFs on my Nexus 7.

------
danso
As a journalist who spent a large part of my career converting masses of PDFs
(not the scanned kind, though of course those were also problematic) into
parseable text, I'd say the problem was what seemed like a total black box of
document composition. A simple tabular text PDF that was generated through
some bespoke software package could result in a stupefying variety of text
outputs, whether through third-party services like CometDocs or good ol' xpdf.
At least with HTML documents, an XML element is an XML element.

Of course, HTML presents the same stupefying array of possibilities, except in
the form of visual output...which is why I guess we needed PDF in the first
place.

