
PDF: Still unfit for human consumption, 20 years later - ciprian_craciun
https://www.nngroup.com/articles/pdf-unfit-for-human-consumption/
======
jcrawfordor
Any complaint about the PDF format tends to be hard to address because the PDF
format is so complicated and so flexible---except, of course, for the argument
that the PDF format is too complicated and flexible, which tends to be the one
enduring criticism since it has led to a history of various security,
compatibility, and performance issues related to PDFs.

The major attempts to replace PDF have largely failed, though. DjVu is
relatively limited in scope. Postscript (as a document display format) has
never been well-supported on Windows and is increasingly poorly supported on
Linux due to rarity. XPS is perhaps the most direct "PDF replacement" but is
nearly equally complicated (being based on the MS Office OOXML formats, giving
it a similar cursed heritage to PDF's basis in the Photoshop PSD format), and
there was never really a compelling argument to switch to it.

What I don't get is the suggestion that PDF should be replaced by HTML. The
purposes of the two formats are basically orthogonal and replacing one with
the other is doomed to failure. The author's argument seems more akin to
"print-layout documents should be replaced by hypertext," and perhaps this is
true in some cases, but it's definitely a different matter and one that the
author's arguments don't really support that well.

In my opinion, hopefully more humble than the author's, PDF's main downside is
the remarkable unevenness of the quality of the creation and reading tools,
considering its supposedly "reads everywhere" nature. The "reference
implementation" is a commercial product and supports a huge list of features
that are rarely or never supported by third-party commercial or open-source
implementations. The Linux toolchain still widely used with PDF (e.g.
Ghostscript) is decidedly outdated and hard to work with, but there's not a
lot of momentum towards development of more modern tools. All of these issues
are likely rooted in the basic fact that the PDF format is extremely
complicated, and so thoroughly implementing it is a massive undertaking.

The author's complaints about performance in particular reflect the
flexibility and complexity of the format. Web browsers have mostly switched
over to using pdf.js to render PDFs, which is completely satisfactory for
documents that consist of text or images (like scanned documents), but can be
absolutely unusable when dealing with extremely vector-heavy PDFs like GIS
exports.

Even printing PDFs can become rather frustrating as the complexity of the
format means that parse-related printing issues are relatively common. Even
Acrobat, for a long time, would munge certain characters when printing due to
some sort of inconsistency with how different generators and readers
implemented font embedding leading to Acrobat not being able to locate the
embedded character font. This seemed most common with the letter "l" but maybe
I'm imagining that... but also maybe it reflects some frightening detail of
the format or implementation behavior.

One of the most common issues around PDF consistency comes down to file
size... different PDF generators are prone to create representations of the
same document that are significantly different sizes. Scanners are often an
extreme example, some combination of not "knowing the tricks" for PDF
optimization and a probably very low-performance compression implementation
means that low-end network scanners often produce PDFs that are hilariously
large. Opening them in Acrobat and using the "optimize file" tool can reduce
file size by 90% without apparent visual impact... the whole fact that Acrobat
has an "optimize" tool (and that Acrobat Distiller used to exist) speaks to
the scale of this problem. Inspecting PDFs that are "optimized" by Acrobat can
be an alarming experience, as well. You may remember that this played a
strange role in Obama's birth certificate some years back, as Acrobat seems to
normally split PDFs into all kinds of different layers and apply strange
transformations to them when it "optimizes." It's hard to know how much of
this is actually "best practice" versus just a result of Acrobat accumulating
decades of eccentricities.

So the bottom line is... PDF is too complicated for its own good, but then so
are a great deal of other formats in widespread usage, like modern webpages
which require complex parsing of multiple formats to render, and a great deal
of historic cruft brought along with them. I'm not sure that there's any sound
technical argument that PDF or web pages are a "better format," it's all a
matter of opinion over whether you prefer print-format documents or hypertext,
and that's going to be very application-specific.

~~~
chipotle_coyote
> What I don't get is the suggestion that PDF should be replaced by HTML. The
> purposes of the two formats are basically orthogonal and replacing one with
> the other is doomed to failure.

Isn't "the purposes of the two formats are basically orthogonal" actually the
entire point the article is making? Literally the first line of the summary:

> Research spanning 20 years proves PDFs are problematic for online reading.
> Yet they’re still prevalent and users continue to get lost in them.

From the second paragraph:

> The [PDF] format is intended and optimized for print. It’s inherently
> inaccessible, unpleasant to read, and cumbersome to navigate online.

The bolded statement in the second paragraph that's clearly meant to be the
One Important Thing to Take Away:

> Do not use PDFs to present digital content that could and should otherwise
> be a web page.

Your comment here is eloquent, but the article's argument is not "print-layout
documents should be replaced by hypertext," it's "print-layout documents are a
poor fit for reading on screen-layout devices." When you conclude:

> It's all a matter of opinion over whether you prefer print-format documents
> or hypertext, and that's going to be very application-specific.

Aren't you essentially restating the article's thesis?

I don't want to read an article online that's a PDF for largely the same
reason that I don't want to print the web version of the same article rather
than a PDF. It's generally going to be clunky. The print page size and
dimensions are not going to be my screen/window size and dimensions. I
certainly don't want to read two- or three-column text on screen, which may
require zooming in and out and scrolling back and forth on the same "print"
page. And God help me if I'm trying to do that on my phone or iPad mini.

The article isn't saying "PDF is terrible and nobody should ever use it"; it's
saying "PDFs were meant for specific applications and in nearly all
circumstances, online reading is not it."

~~~
lmm
People who use PDFs generally do it because they _want_ to have a fixed
layout. If you tell those people to use HTML, they'll find a way to produce a
non-reflowable webpage.

~~~
bryanrasmussen
Or they use them because they have a publishing flow implemented somewhere in
the byzantine processes of their company that spits out a nice looking pdf at
the end (and maybe a crappy looking 1999 html document)

~~~
chipotle_coyote
That's actually been more my experience, yes. :)

------
GnarfGnarf
Although I agree that PDFs (and screens in general) are not the best for
reading, the PDF file format is a minor miracle. It is a thing of beauty,
combining text and graphics to preserve the author's design.

I have built a business on PDF. I develop graphics software, enabling my
customers to create large charts (36" x 96" and bigger) in PDF format, which
they can take to the print shop for printing on large-format plotters and
printers.

The sharp crispness of PDF text and vector graphics allows unlimited zooming
while never pixellating (except the photos, of course).

If you are familiar with the technical specifications of PDF (1,300 pages 2006
ed.), you will appreciate the sophistication and power of the internal
structure of PDF.

As an exchange medium, PDF has made huge contributions to commerce, technology
and culture.

~~~
xscott
Other than fillable forms, which is sometimes very important, I don't see how
PDF is much of an improvement over PostScript (which came first).

~~~
jcrawfordor
A very simple answer: Xorg has integrated PostScript support which makes
rendering .ps files very easy on Linux. Very few tools were ever developed to
do this on Windows, to the extent that using Ghostscript ported from Linux is
still a common approach. It's still a pain to deal with PostScript files on
Windows, and obvious tricks like using ports of Linux viewers that support .ps
generally don't work on Windows because those viewers were just leaning on
Xorg to do the hard part.

PDF would have a similar problem, but Adobe leveraged their previous work on
other products so they basically already had the rendering engine for Windows
and it gained traction there.

Keep in mind that both Postscript and PDF were principally designed by Adobe.
Adobe designed both because they were intended for different purposes, and
this stands today.

~~~
kbr2000
Hey, I'd like to read more about this integrated PostScript support for Xorg,
can you point me in the right direction please? Tnx

~~~
lmz
There you go:
https://docs.oracle.com/cd/E19683-01/816-0279/dps-91433/index.html

Except it's not Xorg but Sun's own server, and it's not Linux but Solaris.

------
smoe
> 4. Stuffed with fluff. PDFs tend to lack real substance, compared to
> regular web pages.

The exact opposite is the case in my experience. Unfortunately the actual
substance is often in a PDF and all the web pages pointing to it are
superficial, copy and pasted and/or clickbaity fluff.

They then go on about how in web sites the content can be better structured
and navigated. Unless I'm misunderstanding the word in English, what has that
to do with whether the content has substance?

> [...] This leads to overwhelmingly long and inane PDFs

You mean something akin to a book?

~~~
nattaylor
Like the authors, I find that content published as a PDF is often extremely
verbose, almost like the authors are paid by the page.

Government reports, or those prepared by consultants, are often the worst
offenders.

~~~
bachmeier
Honestly, that's just silly. If they used html instead of pdf, you'd still
have the same content. pdf is a format. It has nothing to do with the content.

~~~
znpy
If we used html instead of pdf the whole society would collapse.

Didn't anyone notice that it's basically impossible to save an html page today
and have it load and render correctly and offline tomorrow?

~~~
kevincox
Perfect fidelity isn't there but all popular browsers have a "save page"
functionality which seems to work really well.

~~~
oconnor663
That's exactly what the parent is criticizing. The problem with save page is
that the HTML you save still contains tons of links to server resources,
particularly CSS and JS. Of course those links will work if you look at the
saved page immediately after you save it. The problem is that if you come back
later, sometimes even just the next day, they no longer work. A lot of JS file
names are auto-generated random numbers, produced by packaging systems rather
than humans, which change whenever the developers edit their JS. They aren't
designed to be stable.

There are tools that try to fetch those links and update the HTML to point to
the local copy. But those tools can only go so far. JS is allowed to fetch new
files dynamically, and there's no reliable way to look at a piece of code and
automatically figure out what it's going to fetch when you run it.
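
As a rough illustration of the first half of that problem, here is a minimal Python sketch that lists the server-hosted resources a saved page still depends on. The sample page, the hashed file names, and the helper names (`ResourceFinder`, `remote_resources`) are made up for the example; it only looks at static `<script>`, `<link rel="stylesheet">` and `<img>` tags, and by design it cannot see anything that JS would fetch at run time.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ResourceFinder(HTMLParser):
    """Collects URLs of scripts, stylesheets, and images referenced by a page."""

    def __init__(self):
        super().__init__()
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rel = (attrs.get("rel") or "").lower()
        if tag == "script" and attrs.get("src"):
            self.resources.append(attrs["src"])
        elif tag == "link" and attrs.get("href") and "stylesheet" in rel:
            self.resources.append(attrs["href"])
        elif tag == "img" and attrs.get("src"):
            self.resources.append(attrs["src"])


def remote_resources(html):
    """Return referenced URLs that point at a server rather than a local file."""
    finder = ResourceFinder()
    finder.feed(html)
    return [
        url
        for url in finder.resources
        if urlparse(url).scheme in ("http", "https") or url.startswith("//")
    ]


# Hypothetical saved page: two remote, hash-named assets and one local image.
SAVED_PAGE = """
<html><head>
<link rel="stylesheet" href="https://example.com/static/site.3f9a1c.css">
<script src="https://example.com/static/bundle.81b2e0.js"></script>
</head><body><img src="images/logo.png"><p>Hello</p></body></html>
"""

print(remote_resources(SAVED_PAGE))
```

Anything this prints is a link that can silently stop working after the site's next deploy.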

~~~
kindofastrawman
> JS is allowed to fetch new files dynamically, and there's no reliable way to
> look at a piece of code and automatically figure out what it's going to
> fetch when you run it.

You've diverged from the context and are no longer doing an apples-to-apples
comparison. The things you're describing are all opt-in and amount to having
to deal with an adversarial input. There's nothing inherent to the medium that
_requires_ those things.

In other words, a person publishing a PDF is already abstaining from certain
things. (Namely, the sorts of things you're bringing up that would make for a
pathological case.) If the person who publishes a PDF does a straightforward
translation into a web page, then you end up with something that doesn't
exhibit any of the downsides you're discussing.

~~~
anoncake
No, but the medium allows these things. And that's a problem.

------
rayiner
Incorrect. Modern web pages are garbage and PDFs are far better. No auto-play
animations, no animations at all, no bizarre hijacking of scrolling, etc. A
multi-hundred page PDF loads in the blink of an eye compared to an
advertising-tracker-loaded web page.

Screen-size adaptability and reflow remain a problem. It would be better to
fix that on the PDF end than to move those uses over to inferior web
technologies.

~~~
scrollaway
I'd like to see you try to have a conversation on a tech & startup news
aggregator built in PDF, see how quickly your reader loads it then. You're
talking about PDF like the only documents you've seen are printed from LaTeX /
Chrome, but PDF supports forms, javascript, 3D models and more.

PDF is an atrociously bad format, and I don't know what "multi-hundred page
PDF loads in the blink of an eye" for you but even a 100 _blank page_ PDF
takes nearly a second to fully load on my beefy rig (I did the test a few
months back to prove a point). [Edit: Other commenters made the clarification
below, but single page render time is not the same as document render time]

Clearly extracting text from a PDF is nearly as difficult as extracting it
from a photo. Digitally extracting information from PDFs in general is awful,
which makes the format awful for the various things it's used for.
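
One reason for this, sketched in Python under simplifying assumptions: a page's content stream is a list of "put this string at this position" operators, with no notion of words, lines, or reading order. The stream below is a hand-written, uncompressed example using literal strings and the standard `Td`/`Tj` operators; real files usually compress their streams and often use `TJ` arrays and custom font encodings, which this naive regex does not handle.

```python
import re

# Hand-written example stream: "World" is drawn first at x=300, then the
# text position moves 250 units left and "Hello" is drawn at x=50.
CONTENT_STREAM = """BT
/F1 12 Tf
300 700 Td
(World) Tj
-250 0 Td
(Hello) Tj
ET"""

# Matches a literal string followed by the show-text operator Tj.
SHOW_TEXT = re.compile(r"\((.*?)\)\s*Tj")


def naive_extract(stream):
    """Return shown strings in stream order, ignoring where they land on the page."""
    return SHOW_TEXT.findall(stream)


print(naive_extract(CONTENT_STREAM))  # ['World', 'Hello']
```

On the page this reads "Hello World", but the stream order gives the words reversed, so a correct extractor has to reconstruct reading order from coordinates.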

Not to mention that many uninformed users _today_ still install the garbage /
malware PDF readers such as Acrobat because they don't know any better.

~~~
gspr
> I don't know what "multi-hundred page PDF loads in the blink of an eye" for
> you but even a 100 blank page PDF takes nearly a second to fully load on my
> beefy rig (I did the test a few months back to prove a point).

The manual for PGF/TikZ [1] is a huge PDF I frequently open. It's more than
1300 pages and has lots of graphics. It opens and navigates in the blink of an
eye on my 3 year old laptop (with the Okular reader). PDFs aren't perfect, but
they sure feel spiffy compared to modern webpages.

I do agree with some of the article's complaints, but not this one.

[1]
http://mirrors.ctan.org/graphics/pgf/base/doc/pgfmanual.pdf

~~~
tehabe
That is created using LuaTeX and I'm sure the sources behind that PDF document
are carefully crafted and LuaTeX works really well. But if you made the same
document with the same number of images in Microsoft Word and created a PDF
from it, it would be much, much bigger and it wouldn't load that quickly.

I will take the last part back, if someone can prove that I'm wrong about Word
and PDF documents.

~~~
gspr
In that case it sounds like a problem with Word and not with PDF.

I wouldn't know – most PDFs I consume are generated by some variant of TeX. I
gave a random 300-page datasheet I have lying around a go. It says it was made
with Acrobat Distiller and "C2 Rendition". Feels just as spiffy as the
PGF/TikZ manual.

~~~
tehabe
All that I wanted to say is: not all PDF documents are created equal; some are
really good and some are just awful.

------
crazygringo
Hard disagree. Also the author is arguing against a strawman.

Normal PDFs are simple, reliable, and interoperable.

In contrast to webpages which are _actually_ more often the "clunky", "slow",
"stuffed with fluff", and "disorienting" (with scroll hijacking) alternative.

But the strawman is people creating PDF content as an alternative to HTML.
Practically nobody is doing that. Virtually every PDF out there is designed to
be a printable document _first_, that is _then_ made available on the web.
_Nobody_ is saying "how should we architect our new site -- I know, let's make
all our pages PDFs!"

What a truly bizarre article.

~~~
visarga
> Virtually every PDF out there is designed to be a printable document first,
> that is then made available on the web.

Tell that to Arxiv. Most papers never get printed. Everything is consumed on
screen. Yet the layout is completely wrong for screens.

I think browsers should offer PDF reflow as HTML, to adapt to any screen width
with optimal font size.

~~~
ew6082
The use case for 99% of pdfs is email transfer. They are absolutely superior
to sending a clunky, bloated MS Word or CAD document. The web archive is just
the final resting place in the process that made them.

~~~
crazygringo
Exactly.

With academic articles, I virtually never want to simply read them online.

I need to save them for future reference, read them later when I've set aside
time, annotate them, refer back to my annotations four months later...

Arxiv (or JSTOR or wherever else) is just where you _get_ the papers. It's not
where most academics are going to be _consuming_ them.

(For consumption, a full-size tablet like an iPad, with a stylus or Apple
Pencil, is absolutely ideal.)

~~~
ImaCake
Can I ask what app you use on your iPad for reading PDFs? I use Adobe reader
which is pretty good on iOS, but I find Preview a better experience.

~~~
sjy
I have been maintaining a PDF library this way using GoodReader for about 10
years now. You can connect it to most cloud storage services or any SFTP or
WebDAV server, and sync annotations with Acrobat, Preview, Okular, etc. on the
desktop. I have still yet to find something this good for HTML or EPUB
documents.

------
rietta
PDF is the ultimate WYSIWYG print substitute format. My mom in her 70s can
create PDFs from OpenOffice/LibreOffice without much hassle. Asking her to
create a web site of any type is going to be a problem. Now imagine the tons of
business people who can navigate programs perfectly capable of creating PDFs.

PDF also works GREAT as an archival format. I log into financial accounts
regularly and save PDFs for each statement period. Makes reconciling a snap.
And provides a locally archived document history for audits from taxing
authorities etc. I never have to resort to finding paper.

Finally, PDF works great as a native format that my office printer/scanner
understands how to write to. I can scan those annoying tax documents sent to
my office to PDF and archive on the NAS/cloud backup as I deal with it and
know that I have my documents digitized so I can shred the paper.

~~~
gspr
> PDF also works GREAT as an archival format. I log into financial accounts
> regularly and save PDFs for each statement period. Makes reconciling a snap.
> And provides a locally archived document history for audits from taxing
> authorities etc. I never have to resort to finding paper.

I don't have a problem with PDFs myself, but surely it would be better if your
bank gave you these in text form so that you can actually easily and reliably
process them?

~~~
rietta
The transaction exports in Quickbooks or Quicken format that can be imported
into GnuCash or whatever are helpful. However, if at some point there is an
audit, nothing beats the usability of a visual format that is easy for the
auditor to understand and that is a date/time-stamped record.

------
ChrisMarshallNY
PDFs for printing are great, and they make a nice portable envelope for my
vector originals, but I _despise_ them as online or eBook formats.

For eBooks, I've settled on reflowable EPUB. I guess, in some cases, we may
want fixed format, where PDFs might be useful.

For online, I prefer HTML, usually as a continuous page, and with "pretty
print" (_@media print_) CSS. I find it annoying that the _page-break-%_ CSS
rule seems to be ignored by just about every browser, or at least, interpreted
badly.

I really have gotten a lot out of the NNG folks; in particular, Don Norman,
but they do like to kick anthills.

~~~
Joker_vD
I've yet to find an EPUB reader that would give me scrolling without page
breaks. So far, it seems all of them believe that if I scroll past the end of
the chapter it's because I've read it all and want to see the new one, not
because I want the chapter's last lines to be in the middle of the screen, for
easier reading. Argh! And every single PDF viewer, even the shitty built-in
mobile ones, has continuous scrolling!

~~~
layoutIfNeeded
> I've yet to find an EPUB reader that would give me scrolling without page
> breaks.

The Books app on iOS can do this.

~~~
ChrisMarshallNY
That's why I prefer Books. It's kinda "meh," otherwise, but I have come to
rely on it for an eBook reader.

------
mrb
Here is Hello World in PDF: a single letter page PDF displaying the string
"Hello World" at font size 48pt. You should be able to copy/paste that into a
text editor, and save it as a .pdf file. Chrome can open it. It is fully
compliant with the PDF spec (I believe). No unnecessary optional object is
present.

    
    
      %PDF-1.2
      1 0 obj
      <<
       /Type /Catalog
       /Pages 2 0 R
      >>
      endobj
      2 0 obj
      <<
       /Type /Pages
       /Kids [ 3 0 R ]
       /Count 1
       /MediaBox
       [ 0 0 612 792 ]
      >>
      endobj
      3 0 obj
      <<
       /Type /Page
       /Parent 2 0 R
       /Resources 4 0 R
       /Contents 6 0 R
      >>
      endobj
      4 0 obj
      <<
       /ProcSet[/PDF/Text]
       /Font <<
        /F1 5 0 R
       >>
      >>
      endobj
      5 0 obj
      <<
       /Type /Font
       /Subtype /Type1
       /BaseFont /Times-Roman
      >>
      endobj
      6 0 obj
      <<
       /Length 51
      >>
      stream
      BT
      /F1 48 Tf
      50 400 Td
      (Hello World)Tj
      ET
      endstream
      endobj
      trailer
      <<
       /Root 1 0 R
      >>
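
A note on the listing above: as pasted, it has no cross-reference table, no `/Size` entry in the trailer, and no `startxref` / `%%EOF` footer, all of which the PDF file structure calls for; lenient viewers generally rebuild the table themselves. Because the table stores byte offsets, it is awkward to write by hand. The following Python sketch (the function name `build_pdf` is made up for the example, and it assumes the text contains no parentheses or backslashes that would need escaping) generates the same one-page document and computes the offsets.

```python
def build_pdf(text="Hello World"):
    """Assemble a one-page PDF, recording each object's byte offset for the xref table."""
    # Page content: select font F1 at 48pt, move to (50, 400), show the text.
    content = f"BT\n/F1 48 Tf\n50 400 Td\n({text}) Tj\nET".encode("ascii")
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [ 3 0 R ] /Count 1 /MediaBox [ 0 0 612 792 ] >>",
        b"<< /Type /Page /Parent 2 0 R /Resources 4 0 R /Contents 6 0 R >>",
        b"<< /Font << /F1 5 0 R >> >>",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Times-Roman >>",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
    ]

    out = bytearray(b"%PDF-1.2\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte position where "N 0 obj" starts
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    # Cross-reference table: one 20-byte entry per object, plus the free entry 0.
    xref_start = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset

    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_start
    return bytes(out)


pdf_bytes = build_pdf()
print(len(pdf_bytes), "bytes")
# To try it in a viewer: open("hello.pdf", "wb").write(pdf_bytes)
```

The output must be written in binary mode; re-saving it through a text editor can change line endings and invalidate the offsets.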

------
bluenose69
All the journals in my field (oceanography) show papers as HTML, with a link
to get the PDF. I go for that link if a three-second glance makes me think the
paper might be of interest. I am certainly not alone in this; I have never
heard anyone state a preference for the HTML view.

This is not just for one journal; it's for the dozen or so journals that I
look at regularly.

The mathematics looks terrible in HTML, and great in PDF.

Figures usually look terrible in HTML, and quite often when you click on the
action to zoom them, you get a choice of just one zoom factor. Plus, the
caption disappears so it's easy to get lost. With PDF, you can select your
zoom factor and maintain context.

PDF has fixed page numbers, so you can refer to material in the paper easily.

The fixedness of PDF aids memory. I can look at a paper I've not consulted in
30 years, and know that something I want is (say) at the top of the right-hand
column just past the figure showing such-and-such. With HTML, I basically get
lost in a stream that changes if I zoom the text (often required to try to
decode poorly formatted mathematical symbols) or even change the geometry of
my viewing window.

I can highlight PDFs, and add comments to them. This is enormously valuable in
research work.

(La)TeX-generated PDF files can offer mathematical representations that are
not just clear, but elegant, and in a form that matches historical convention.
HTML representations vary from journal to journal (which is bad enough in and
of itself) and almost never match what the reader expects from standard
textbooks and classic papers.

I suppose HTML has the benefit that it can be set up to adjust to the viewing
platform, so I can try to read a paper on my mobile phone. Not that doing so
makes any sense at all.

For me, it's an easy decision.

~~~
leephillips
Yup, that’s it. I’m sure any scientist or mathematician here knows exactly
what you’re talking about. What do you use to highlight and add comments to
PDFs?

~~~
bluenose69
I use a mac, and often use Preview to highlight and mark up, but sometimes
I'll use acrobat instead, if I'm sharing with people who use windows machines,
since then things seem to interoperate better. I don't know what's best for
linux machines, since I've not used them in years. (As a professor, I need
something that handles microsoft files, because that's what administrators use
... and that narrows the choice to windows or macos.)

------
davrosthedalek
I'd rather read a well set, two-column PDF online than having to deal with
pop-ups, ads, dark patterns, javascript problems etc.

~~~
TheSoftwareGuy
Pretty sure you can embed javascript in a PDF document, actually.

~~~
slowmovintarget
You can also (and really ought to) switch off the JavaScript engine completely
in Acrobat reader. There isn't a legitimate reason to run Javascript during
the viewing of a PDF.

Give me a link... I might visit it in a browser.

~~~
jimhefferon
> There isn't a legitimate reason to run Javascript during the viewing of a
> PDF.

I have a book, and I'd like to display video clips. AIUI, that requires that
the viewer has JS.

------
hirako2000
The advantage is that tools have been developed to make perfect positioning of
elements (text and images). So pdf authors never have to worry about different
reader form factors.

And, when reading a PDF, you can print it and get exactly what you see on the
screen. So the tooling is very straightforward for the reader, just click
print. With other Web content, it's the browser trying to fit things the way
they should on a page and it generally looks horrible.

Positioning elements coded in a markup language into a page is actually a tricky
thing. Until we have the tooling to magically make any markdown content (with
images) fit nicely in a page, pdf will prevail. Any hint on a tool that can
take my markdown and print out beautiful pages, without having to tweak a
dozen params, please show me.

~~~
mekster
> Positioning elements coded in a markup language into a page is actually a
> tricky thing.

This is almost impossible to do right. Even browsers can't produce a PDF that
looks exactly the same as a web page.

If you want to programmatically produce a PDF from a web page, the best bet is
to load up a full browser implementation just for that purpose as any other
simpler solutions would certainly break the results pretty often.
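
A minimal Python sketch of that approach, assuming a Chromium-based browser is installed: `--headless` and `--print-to-pdf=` are Chromium command-line switches, but the executable name `chromium` is an assumption (it may be `google-chrome`, `chrome`, or a full path on your system), and the helper names are made up for the example.

```python
import shutil
import subprocess


def print_to_pdf_command(url, output_path, browser="chromium"):
    """Build the command line that asks a headless browser to print a page to PDF."""
    return [browser, "--headless", f"--print-to-pdf={output_path}", url]


def render(url, output_path, browser="chromium"):
    """Run the browser; raises if the executable is missing or exits with an error."""
    if shutil.which(browser) is None:
        raise FileNotFoundError(f"browser executable not found: {browser}")
    subprocess.run(print_to_pdf_command(url, output_path, browser), check=True)


# Only show the command here; call render(...) to actually produce the file.
print(" ".join(print_to_pdf_command("https://example.com/", "page.pdf")))
```

The result still goes through the page's print stylesheet, so it will match the browser's own "Print to PDF" output rather than the on-screen layout.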

------
flowerlad
PDF's predecessor was PostScript. PostScript was a Forth-like programming
language that contained excellent 2D graphics primitives, including bezier
curves, 2D transforms and most importantly, support for scalable fonts.
PostScript was ground-breaking for its time and is the reason for Apple's
early success. If it wasn't for PostScript and laser printers the Mac would
not have been successful.

PostScript was implemented in laser printers and printer drivers output
PostScript language programs when you printed from an application like Notepad
in Windows. High-end illustration and DTP programs output their own custom
programs instead of being limited by the program output by the printer driver.

Over time it became obvious that the programming language features of
PostScript were not being used very much. Printer drivers typically output a
fixed header containing some function definitions then they use these
functions over and over for drawing the content of the page. What if these
function definitions could be built in? Then the programming language
capabilities such as loops and conditionals could be left out and we would
still be able to do everything we're doing with PostScript. In fact the
resulting technology would be even more useful because rendering a page can be
done without implementing a programming language interpreter. Thus PDF was
born.
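
To make the "no loops needed" point concrete, here is a small Python sketch (the function name and the grid example are made up for illustration). Where a PostScript program could draw a grid with a loop that the printer executes, the PDF approach is for the generating program to run the loop itself and emit a flat list of the built-in path operators `m` (move to), `l` (line to) and `S` (stroke).

```python
def grid_operators(columns, rows, cell=72, origin=(36, 36)):
    """Emit flat PDF path operators for a grid; the loops run here, not in the viewer."""
    x0, y0 = origin
    width, height = columns * cell, rows * cell
    ops = []
    # Vertical lines, left to right.
    for i in range(columns + 1):
        x = x0 + i * cell
        ops.append(f"{x} {y0} m {x} {y0 + height} l S")
    # Horizontal lines, bottom to top.
    for j in range(rows + 1):
        y = y0 + j * cell
        ops.append(f"{x0} {y} m {x0 + width} {y} l S")
    return "\n".join(ops)


print(grid_operators(2, 1))
```

The viewer only has to execute each line in order, which is why a PDF renderer does not need a general-purpose interpreter for page content.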

PDF made perfect sense in the early 90's when it was designed. Page
Description Languages didn't need to be burdened with a programming language
because no one was taking advantage of the language features. But then came
the World Wide Web. PDF was the wrong tech for the Web, and PostScript would
have been perfect. PostScript has all the capabilities of PDF, but it is also
a programming language, which means you can dynamically alter how you render
the page based on where you are rendering it. Alas, Adobe's direction was
already set: PDF was going to be the future and PostScript was obsolete.

In summary, PostScript was invented at a time when nobody needed dynamic
features, and PDF was invented for a static world but then the world suddenly
changed and needed dynamic features.

~~~
Santosh83
In that case why didn't the broader community adopt PostScript for the web?
Was it because of technical reasons (too feature heavy and complex for HTTP as
envisioned originally) or did Adobe have some kind of patent that prevented
its free use?

~~~
jandrese
The most direct answer is that browsers didn't include PostScript support.
They had their hands full trying to beat JavaScript into shape, they didn't
want to support a completely different second language. Nobody was clamoring
for PostScript support either. The advantages would have been fairly minor for
the massive amount of work it would have taken to not only implement the
browser support (both in Navigator and IE) but also for web developers to
learn an entirely new language and create content in it.

I guess there could have been a use case for people typesetting their
documents in Word or LaTeX and then "printing to web", but PDF took that role.

------
qppo
I think there are good and bad uses for PDFs just as there are good and bad
uses for webpages, but you need a hot take like "unfit for human consumption"
to get clicks I guess.

For example, Agner Fog's instruction tables are something I look at from time
to time, and hate browsing that PDF file for the information I need.
Similarly, software manuals as PDFs are really annoying to use - and I've
written them!

But for research that needs to be referenced through other research in a
bibliography, having concrete reference points relative to the length/start of
the content is actually much more reliable than having semantic links to
headings or a URL. I'll frequently find dead links in bibliographies, or
missing webpages, or webpages completely altered and unable to parse from an
illegible URL. Versus a page number, which may be inexact or slightly wrong,
but is a good starting point rather than a dead end.

------
ssalazar
It's insane to me that Nielsen/highly opinionated contrarians get any attention
for this. Browse https://arxiv.org for 30 seconds and
tell me PDFs are "unfit for human consumption."

While it's annoying to go to sites using PDFs that should clearly be a webpage,
it's obvious that PDF is good at solving some class of problems for certain
people. The scientific community for instance has been slowly moving towards
formats that can generate both HTML + PDF, but for many reasons related to its
legacy of print publication PDF is king.

To come in and just tell these people they're wrong is the height of obnoxious
design hubris. Between that and the boastful self-accolades, delivered in 3rd
person no less, it's hard for me to take this seriously.

~~~
fortran77
PDFs may be unfit, but I use them every day. So what if they look like web
pages?

------
ptero
First, the article makes a claim about PDFs' problems for the web, when read
online, which is a lot less clickbait-y than "unfit for human consumption".

On the technical claims: while I agree that PDFs are not ideal for many uses
on the web, especially for current attention-span-of-a-fly web usage, they are
great for things where I am willing to dedicate more time for an in depth look
at the subject. For those cases the complaints that authors list about PDFs
(linear access to information, lack of advanced navigation options, optimized
for print (i.e., look best on a large monitor)) are not limiting and in fact
beneficial.

And some complaints (slow to load, stuffed with fluff, jarring user
experience) are just as, if not more applicable to most of the web. My 2c --
work in R&D likely skews my preferences in the direction of paper as an ideal
interface :)

------
mcguire
The title is clickbait: the article is (mostly) about how PDFs are not
suitable for reading on-screen. (Which is mostly true.)

Further, the arguments the article makes are gibberish:

" _4\. Stuffed with fluff. PDFs tend to lack real substance, compared to
regular web pages. When you’re building out a web page, you can visibly see
how long it’s getting and how far users will have to scroll to consume the
content. Methods of structuring and formatting digital content such as
chunking, using bullets, subheadlines, anchor links, and accordions help users
efficiently skim and scan sections that may contain the answers they seek amid
long-form copy. However, in PDFs, those techniques aren’t always used and
content creators tend to favor quantity of content over quality and
formatting. This leads to overwhelmingly long and inane PDFs._ "

"PDFs tend to lack real substance, compared to regular web pages." Really?
Really? That's the argument Jakob Nielsen is going with? HTML is magically
better?

"However, in PDFs, those techniques aren’t always used and content creators
tend to favor quantity of content over quality and formatting." In _HTML_ ,
those techniques aren't always used! They often aren't used. And HTML somehow
enforces quality of content?

~~~
young_unixer
> PDFs are not suitable for reading on-screen. (Which is mostly true.)

I read PDFs on screen every day and I don't see the problem with it. It's
honestly a great experience.

------
CalChris
The article is about online or for me, on laptop, but I also have a 32GB
Kindle Paperwhite onto which I download a ton of PDFs, mostly papers but some
books. For example, I concatenated Onur Mutlu's Architecture lecture slides
into a one GB PDF file. I like that the papers look like papers and that
the fonts and graphics are rendered correctly. Links work but I don't use
them.

However, PDFs on the Paperwhite don't make for easy reading. I could and have
converted papers to EPUB which is much easier for reading but less good for
studying, and the purpose of these PDFs is studying. Yeah, I can grouse about
PDFs but it's a tool which I use.

By comparison, I check EPUBs out from the library and they are surprisingly
pleasant to read on the Paperwhite.

Yeah, the article is about the web and I'm answering about the Paperwhite.
Maybe they have a point about browsing on the web. But for content meant to be
read, for academic content, PDFs are pretty good.

BTW, on my MacBook I use Skim which is _much_ better than Reader.

------
alfalfasprout
This is so, so wrong. Yeah, OK if you have an interactive website (a chat or
message board feature) then you have no choice. If you're trying to present
information or an article I'd take a beautifully typeset PDF _any day_ over
some website with so many trackers and javascript it takes seconds to load.

Not to mention on a tablet PDFs are much, much nicer to read.

------
transfire
The problem is that PDF targets the printed page, while HTML targets screens.
PDF does a better job with respect to printing than HTML does for screens,
because HTML has been largely repurposed for creating GUIs. Unfortunately PDFs
are not easily scripted, and HTML has essentially no support for proper
printing.

And so alas it all sucks.

~~~
gjvc
This is a remarkably complete summary of the state of things.

------
Santosh83
PDFs are great for typesetting for print, where you know the paper size and
adjust everything pixel perfect to it. Nothing beats PDF when it comes to
complex typesetting for print. Web pages are meant to reflow and much better
for reading on smaller screens. Also modern web technologies can go far beyond
a PDF when it comes to interactive/dynamic content, but web pages (sites) are
also cumbersome for a non-technical user to download for offline use with all
elements intact.

But I suspect HTML will eventually win this. While HTML can be printed, PDFs
will always struggle with changing device sizes. Plus the web is becoming more
of an app as time passes while PDFs will probably remain dumb content due to
security reasons, so their applicable niche is growing smaller as the Web
creeps in scope.

~~~
coliveira
It's not true that PDFs are only for print. In fact, MacOS display
technologies are based (at least the first iterations) on PDF. PDF for screen
can work very well, the problem is that the industry never standardized this
aspect of the technology. The result is that viewing PDF nowadays is far from
optimal. I truly believe that PDF could have been a much better technology
than HTML for modern websites. Instead HTML, which started as a semantic
technology, was shoehorned into what we have today.

------
young_unixer
Maybe I'm a weirdo or something, but I love PDFs.

I basically disagree with 80% of what this website says.

"PDFs tend to lack real substance, compared to regular web pages." made me
chuckle. I don't know what kind of PDFs this person reads, but my copy of
"Computer Networks: A Systems Approach" sure as hell has more substance and
quality than a Twitter feed or whatever the author considers a "regular" web
page.

------
doonesbury
Um no. LaTeX + PDF = awesome. All tech docs in PDF are great.

Look, web pages, to the degree PDF is bad, are worse because they're riddled
with ads and waits for servers you didn't intend to bother, servers that are
there to

\- "help you"
\- "give you info you might be interested in", viz. ads
\- deliver all other manner of social media nonsense
\- and pay the domain owner $$$$

Maybe the OP should stop getting PDFs from NYPOST, Vanity Fair, Mad Magazine,
Graphics Designers Guild dot com, or Madison 5th Ave S&Mrkting.com

------
alexfromapex
The design goal of PDFs is display on any device, not a platform for e-books,
so this makes sense.

~~~
wtetzner
Yeah, unfortunately they are used for everything.

One way to get the best of both worlds would be to have a normal webpage, but
have the "Print this page" button generate a PDF that is nicely laid out.
Often webpages are a mess to print.

I wonder how difficult it would be to write a tool that can turn a PDF into a
usable webpage.

~~~
kevincox
In theory CSS has specific controls for laying out the "print" format of a
page, so your browser's print action should do the right thing.

However in practice many websites don't put any effort into this, which is
probably an indication that they wouldn't put any effort into a custom
solution either.

> I wonder how difficult it would be to write a tool that can turn a PDF into
> a usable webpage.

I was looking into this but it is basically impossible. Since PDF is basically
a collection of images (with some "fancy" stuff on top) you can get the
basics, such as text and headings, however you won't be able to do much for
semantics or layout. All web-based PDF viewers I have seen just render each
page to an image and put invisible text on top for copy-paste support.
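
To make the "text and headings, but not semantics" point concrete, here is a minimal sketch of the kind of guessing a PDF-to-HTML tool has to do. Everything in it is an assumption for illustration: the `RUNS` data stands in for whatever a real text extractor would report, and the font-size `ratio` threshold is a made-up heuristic, not something the PDF format defines.

```python
from html import escape
from statistics import median

# Hypothetical text runs, roughly what a PDF text extractor might report:
# (text, x, y, font_size). Real extractors differ in the details.
RUNS = [
    ("1. Introduction", 72, 700, 18.0),
    ("PDF stores positioned glyph runs,", 72, 670, 10.0),
    ("not paragraphs or headings.", 72, 656, 10.0),
    ("2. Method", 72, 620, 18.0),
    ("Structure has to be guessed.", 72, 590, 10.0),
]


def guess_headings(runs, ratio=1.3):
    """Flag a run as a heading when its font is clearly larger than body text.

    The body size is taken to be the median font size, which is only a
    heuristic: nothing in an untagged PDF says "this is a heading".
    """
    body_size = median(size for _, _, _, size in runs)
    return [(text, size >= body_size * ratio) for text, _, _, size in runs]


def to_html(runs):
    """Emit <h2> for guessed headings and <p> for everything else."""
    lines = []
    for text, is_heading in guess_headings(runs):
        tag = "h2" if is_heading else "p"
        lines.append(f"<{tag}>{escape(text)}</{tag}>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_html(RUNS))
```

Note that the two body lines come out as separate `<p>` elements: deciding that they belong to one paragraph is yet another guess, which is where this approach starts to fall apart on real documents.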

------
zadkey
I think the real problem here is usage. PDFs make more sense for printing.

Using PDFs to distribute content online instead of web pages is the real
issue.

Same problem with trying to use a hammer with screws.

------
jandrese
It's interesting how many of these complaints about PDFs also apply to modern
websites.

~~~
generationP
I'd say modern websites are worse. PDFs don't keep moving under your feet
while the javascript is loading. PDFs can be straightforwardly saved on your
computer. PDFs don't blank out if you lose connection because of ajax. PDFs
don't embed malware from third parties.

~~~
dhosek
On the last point, PDFs can still embed first-party malware. E.g.,
[https://www.sentinelone.com/blog/malicious-pdfs-revealing-
te...](https://www.sentinelone.com/blog/malicious-pdfs-revealing-techniques-
behind-attacks/)

~~~
sliken
Or spy on you, without malware.

~~~
generationP
Modern websites spy on you by default, though.

------
grumbel
The underlying problem isn't PDF, but the fact that HTML is still completely
unsuited for long-form content. Really basic stuff like a proper markup-based
TOC isn't a thing in HTML. And on the browser side there are just as many
problems: you can't bookmark your scroll position and basic scrollbars are a
terrible user interface for long HTML content anyway. There are other really
basic problems like not being able to link arbitrary HTML content unless the
author of the HTML put an anchor in the document.

ePub, mobi and such were developed to work around those limitations and make
more usable book formats, but no web browser has native support for them (Edge
had some support, not sure if that's still there after the Chrome switch).
Despite being HTML-based, those formats aren't really part of the WWW.

PDF does what PDF was designed to do quite well: it's virtual paper. But the WWW
has kind of failed to evolve into becoming a platform where you can publish
long-form documents on, so PDF still continues to dominate.

------
skybrian
One thing PDFs have going for them is that they are standalone files, so you
can download and collect them. The advantages are similar to MP3s for music.
HTML doesn't qualify since not even the images are included in the file.

I'm wondering what other file formats might work better, and why aren't they
more popular? Epub maybe?

~~~
OnlyOneCannolo
Lately I've been wondering what ever happened to XPS.

My understanding is that it's basically just a zip file with XML markup and
any other assets like images. It's both human-readable and machine-readable,
which is great for everything from version control to search to conversion
between formats.

------
xhkkffbf
Uh, yeah, they have some big limitations, but they generally work well for me.
It's rare for one to fail to do what was intended which is to display a
document as it might be printed. Fonts and all.

Isn't it true that every software project -- and indeed every project -- falls
short of what people may want?

------
adrianmonk
PDFs exist to emulate paper (a need which won't totally go away), but maybe it
would be nice if the format and authoring tools supported a sort of alternate
rendering mode that is online-friendly.

So for example, a word processor may be set to produce two-column text, and
for paper that makes sense ergonomically. But it is horrible in combination
with scrollbars. The same goes for margins at the top and bottom of pages.

A typical word processor allows you to easily switch text to one-column mode
or adjust the page margins, so with just a few changes it could render your
document in a more online-friendly way. So when you save as PDF, it would be
neat if it could include both renderings into the same document.

In this hypothetical world, the PDF viewer would then decide whether to render
it in faithful-to-paper mode or in online-friendly mode.

------
mwfunk
They seem to be specifically talking about the case where you're on a web
page, you click a link to go to what ought to be another web page, but instead
you're in a PDF in your browser. I get it, PDFs are documents and not web
pages, and dumping a visitor into a longish PDF when a concise web page with
the answers they're looking for would be better is a poor experience.

So, use web pages for presenting information best presented in a web page, and
use PDFs for presenting information best presented in a PDF, and don't use a
PDF when a web page would be better and don't use a web page when a PDF would
be better. But that doesn't seem to be the point they're making for some
reason.

------
indymike
The purpose of a PDF is to have a document that can be viewed and printed as
designed and laid-out by the creator. HTML doesn't do that. The experience of
an embedded PDF viewer is still pretty horrible (even new browsers have a
pretty bad experience, and well, even Adobe Acrobat's UI is just... bizarre).

PDF supported embedded type, vector graphics, and many other features long
before the web browser could. Honestly, the issue with pdf is how documents
are created (often via fake printer drivers that often compile/translate
whatever you are printing to some pretty gnarly postscript).

------
anigbrowl
Despite all the many problems with PDFs (frequent lack of internal navigation,
too much or too little or just plain wrong metadata, inherently static)
they're still great precisely because they print out the same way they look on
screen (which is often something you want for a long or highly technical
document) and because they don't slide around and constantly throw up modal
dialogs.

Don't get me wrong, I find many aspects of PDFs hugely frustrating. But many
websites are just _horrendous_ and a complete misery to interact with. If it's
more than a couple of thousand words I tend to start looking for a pdf
version.

------
davidwitt415
Great attention grabbing headline, but it ignores the typical user scenarios
where PDFs are created. So how is your typical Office worker who is probably
using Word going to create this awesome web page? It's simply not realistic to
expect that office workers are going to use HTML, and it's been tried for
years and years. Nielsen may as well go after Powerpoint next. Same criticisms
and human limitations apply. Yes, better formats exist in the ideal, but
ignoring the user's real context and limitations goes against the principles
of User Centered Design.

------
tannhaeuser
Let's not forget that the reference PDF reader, Adobe Acrobat, turned into
a pile of shit about 15-20 years ago, with "plugins" and stuff making its load
time surpass that of browsers at the time, and severe security issues going
unfixed, and that PDFs frequently store text in non-semantic order or even as
bitmaps, with the deficiencies in searching and linking within PDFs
that go with it. Also, Adobe found it necessary to include JavaScript
execution from PDF, and also dysfunctional PDF forms/signing and interactivity
features such as linking which more often than not pose a problem rather than
solution on the rare occasions where I've encountered their use. AFAICS, valid
(?) use cases for PDFs (apart from sending out a document to a print shop)
include e-books (incl fingerprints), academic publishing, user manuals,
formal business and legal statements, and personal archival (PDF/A with
prerendered layout and embedded fonts). Even as a critic of CSS complexity, I
believe all these use cases except academic publishing should use markup+CSS
instead, and if there are deficiencies in browsers, they can and should be
addressed and fixed. I find it particularly painful that .mht, .warc, or other
HTML-based archival format hasn't gained trust (and probably won't work well
with today's JavaScript-heavy sites, many of which don't have a reason to use
JavaScript except for lock-in, analytics, and plain incompetence).

------
euske
A big mistake is that people still consider PDF as a "document" format. In
reality, it's just a convoluted image format. Because it's an image, it
doesn't have any logical structure and reformatting it is a pain. Worse yet,
its syntax is the worst of both worlds - a mixture of text and binary. It's
horrible to parse, display, and modify. As if it's like that programming
language that everyone hates (and I don't want to name). But really, it's a
burden of the future generation. We should eradicate it. End of rant.

~~~
coliveira
PDF is a document display technology. It doesn't pretend to encode the
semantics of the document. PDF producers are responsible for doing this.

------
sivvie
This article is incredibly biased with completely unscientific claims seeming
to stem from personal opinion. PDF is a great format due to the fact that the
document will look exactly as it was intended and how you would perceive the
document in real life. Using it for scientific papers, CVs or similar
reinforces trust and that the author actually invested time to create a well
formatted document. Additionally it is difficult to modify which also
reinforces the authenticity of the contents.

------
compscistd
Data communicated as PDFs when it shouldn't be has frequently been a pain.
Recently, I needed banking data that was _only_ available as bank statements
in PDFs or an unpredictable web ui.

Consuming banking data as PDFs is a nightmare. The bank I was working with
seemed to have spent _some_ money on its website (Regions Bank in the US, if
anyone wants to know, but just so happens to provide .ofx exports for
19.95/month starting from the month you sign up, but not generated for
previous months, although that's tangential). Meanwhile, my local bank that at
first glance from its website seems like it's in the stone age provides a PDF
statement that looks like it was made in the 80s (all monospaced font, no
graphics), but they also provide a .csv export for transactions with seemingly
no limits on date.

The latter bank's approach signals to me that data is in the format it should
be in. No more, no less. The former suggests that a PDF and a pretty web UI are
the de facto standard for communicating tabular data when they shouldn't be.

I get that PDFs online are a great alternative as a document that was
originally meant to be printed and mailed, but it is a poor substitute for
consumable banking data.

------
simonblack
While I'm not particularly enamored of PDFs, they are streets ahead of the
previous widespread document format, the Microsoft Word DOC.

The Word DOC format had the problem of becoming unreadable every few years
until you managed to splash out and buy the latest and greatest Microsoft Word
and its associated version of Windows.

At least the PDFs remain legible pretty much indefinitely.

~~~
leephillips
This is important: it is a published format.

------
jhallenworld
Well here are some reasons why .pdfs are better:

1\. They are a flat format. Why is this good? When you text search for
something it can be found, vs. in HTML where you can search only a single web
page instead of a hypertext graph- I mean what would a complete search even
mean in HTML?

2\. They are also hierarchical. I can print a hierarchical schematic and
navigate through it by clicking on sheet-blocks.

3\. You can view 3d renderings in them. Someone can save their solidworks
document as a .pdf, and I can open it and zoom and rotate the view in acrobat
reader. There is certainly no standard way to do this in HTML.

4\. There are no ads.

5\. I can send somebody the complete thing as a single file. For a web-site I
would have to send them a zip file that they then would have to extract- it's
just not as nice somehow, though in theory it should be OK. This shows up in
microcontroller documentation for example. Usually the chip TRM is a 1000 page
pdf, but the software is a bunch of HTML files (a web-site really). It's
inevitably easier to get the chip TRM than it is the software documentation.

Actually in this particular case there is more- the software documentation is
generated as extracted comments from source code by doxygen and it is usually
crap. Pdf documentation someone actually wrote, so it tends to be better.

When you get HTML documentation, there is often not an index.html file. If
there isn't one, which document do you open first?

6\. Every documentation as a web-site system has their own navigation method,
whereas .pdfs have acrobat reader or whatever. Even on web documentation that
has something like a go to next page or section button, it's hit or miss if it
works well. For example, the placement of the next button will vary from page
to page, so you can't easily just page through it.

------
ahmedfromtunis
I wish MHTML had more recognition as well as a chance to play the role of a
"portable document format". For one, it's easy to open (everyone has a browser
on whatever device they're using), easy to work with either for creators or
consumers and can automatically adapt to the screen it's being read on.
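
For anyone who hasn't looked inside one: MHTML is essentially a MIME `multipart/related` message, the same container email uses, with the page and its assets as parts. A rough sketch using Python's standard `email` package is below. The URLs, the HTML, and the image bytes are placeholders invented for the example, and browsers may expect more headers than this minimal version sets.

```python
from email import message_from_bytes
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Placeholder values, made up for this example.
PAGE_URL = "https://example.invalid/page.html"
LOGO_URL = "https://example.invalid/logo.png"
HTML = f'<html><body><h1>Report</h1><img src="{LOGO_URL}"></body></html>'
FAKE_PNG = b"\x89PNG\r\n\x1a\n" + b"not a real image, just stand-in bytes"


def build_mhtml(html, image_bytes):
    """Bundle one HTML page and one image into a single multipart/related blob."""
    archive = MIMEMultipart("related", type="text/html")

    page = MIMEText(html, "html", "utf-8")
    page["Content-Location"] = PAGE_URL
    archive.attach(page)

    # The subtype is passed explicitly so no content sniffing is needed.
    image = MIMEImage(image_bytes, "png")
    image["Content-Location"] = LOGO_URL
    archive.attach(image)

    return archive.as_bytes()


def list_parts(blob):
    """Read the archive back and report (content type, original URL) per part."""
    message = message_from_bytes(blob)
    return [(p.get_content_type(), p["Content-Location"]) for p in message.get_payload()]


if __name__ == "__main__":
    blob = build_mhtml(HTML, FAKE_PNG)
    for content_type, location in list_parts(blob):
        print(content_type, location)
```

The `Content-Location` header on each part is what lets a reader resolve the `<img src=...>` in the page to the embedded copy instead of the network, which is the "single portable file" property being asked for here.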

~~~
sjy
Last I checked, iOS and Android devices don’t support MHTML out of the box. It
would be more accurate to say that everyone has a PDF reader than everyone has
a browser that supports MHTML.

------
nonbirithm
Highly disappointed there was no mention of the pain of annotating PDFs. The
only way I've been able to reliably annotate a PDF with writing that I've
downloaded on Linux is to use WINE or install a bunch of KDE dependencies just
for Okular. PDF is a document format intended for consumption but so many
institutions insist on giving you a PDF with no form elements and expect you
to edit it and send it back. A web-based solution that would have a form that
autogenerates the finished PDF would work so much better, but PDF is apparently
easier for them to send and expect back an answer. As a result I dread PDF
when using it as a format that's intended to be edited. I feel like this is a
misuse of what PDF is supposed to be, as people believe that since it looks
like printed paper then you ought to be able to write on it like printed
paper.

~~~
asdff
I think it depends on your viewer. Preview on mac is pretty simple to fill
forms and markup however you like. Forms are actually pretty simple to fill
and tab to the next, feels like the format was made for this. I've had no
problems doing this on PC with acrobat as well. I'm not sure what is out there
on linux, but there has to be at least one fully featured PDF viewer.

------
aj7
I prefer PDFs on the web for serious reading. They have the least probability
of having disruptive background processes and jarring graphics. It’s a
pleasure when I’m alerted to a lawsuit dropping on Twitter, and I can find the
actual pdf.

I couldn’t disagree more with most of the assertions in this article.

------
_reza
Paper is a vastly different medium compared to computers. The ignorance large
companies (digital book distributors) show when dealing with humans (by only
focusing on ebook sales and nothing more) is really annoying. Take Adobe
Reader for example, it is really awkward in how dozens of researchers are
unable to grasp the most basic feature of computers: dynamism. These people's
minds are still stuck in the Gutenberg era and they fail to notice how powerful
computers are. Having had headaches with pdfs(I read lots of books) and the
way knowledge is buried in this format, I started a project to inject some
dynamism into our book reading.

Please check it out if you are interested.

[https://github.com/rezahsnz/readaratus](https://github.com/rezahsnz/readaratus)

------
fomine3
This is just Android's problem, but I sometimes find it very hard to copy a
PDF's URL:

Chrome for Android doesn't support loading PDFs, so it automatically downloads
the file and opens it in another PDF app. Reading the PDF in the app is fine,
but I can't copy the URL from the app because the file is already downloaded.
Normally I can just copy the URL from the link, but some sites like Google and
Twitter use link redirectors, so I'm unable to copy the URL.

Yes, it's the same for other file types that can't be opened by the browser,
but other file types are rarely linked directly. PDF shouldn't be a first-class
citizen on the web.

------
nearmuse
At least PDF books and papers are mostly self-contained and easily accessible
in their pristine condition. This piece looks very inflammatory and appears to
say "PDF is bad because PDF is not web" in too many words.

------
nikisweeting
The durability of PDFs is one of the main reasons why they're one of the core
methods of archiving websites in ArchiveBox.io.

Despite being "clunky", they render much more reliably than HTML on the
decades timescale.

------
eithed
Today I tried to get a blank PDF. I've created a blank docx file using
official Word and used official Adobe Acrobat to convert it to PDF. On the
first try I've received a message saying there was an error while sending the
file. On the second and subsequent tries I've received a message saying that
there was an error converting the file. So, after 23 years of development if a
case of converting a blank document is not supported...

------
supernova87a
I don't know if anyone can suggest some tools, but my minor problem with PDFs
is that any kind of data table gets absolutely mangled/unusable for cut/paste
purposes after creation.

It's like somehow the PDF generation process randomizes the order in which it
populates tables, such that selecting by a user later is generally impossible.

Maybe it needs to be interpreted / extracted from the PDF source itself, but
average user graphical selection of a table is out the window.
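
The usual workaround is to ignore the stored order entirely and rebuild rows from coordinates. Here is a naive sketch of that idea, assuming some extractor has already produced `(text, x, y)` fragments. The `FRAGMENTS` data and the `y_tolerance` value are invented for illustration, and real tables with wrapped cells or merged columns need much more than this.

```python
from collections import defaultdict

# Hypothetical fragments (text, x, y), listed in the scrambled order they
# might appear inside the file. PDF y coordinates grow upward.
FRAGMENTS = [
    ("12.50", 300, 500), ("Date", 72, 540), ("2020-07-01", 72, 520),
    ("Amount", 300, 540), ("Coffee", 180, 520), ("2020-07-02", 72, 500),
    ("Description", 180, 540), ("3.20", 300, 520), ("Books", 180, 500),
]


def rebuild_rows(fragments, y_tolerance=2.0):
    """Group fragments into rows by baseline, then order each row left to right.

    Bucketing by rounded y is crude: two fragments that straddle a bucket
    boundary would land in different rows. It is enough to show the idea.
    """
    rows = defaultdict(list)
    for text, x, y in fragments:
        rows[round(y / y_tolerance)].append((x, text))

    # Highest y first, because the top of the page has the largest y value.
    ordered = sorted(rows.items(), reverse=True)
    return [[text for _, text in sorted(cells)] for _, cells in ordered]


if __name__ == "__main__":
    for row in rebuild_rows(FRAGMENTS):
        print("\t".join(row))
```

This is roughly why copy/paste from a viewer fails where a dedicated extraction tool can succeed: the viewer follows the stored order, while the tool re-derives the order from geometry.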

------
Lammy
I just wish iOS Safari supported opening its regular document view for PDFs
embedded on web pages in <object> tags. It treats them like an <img> and
displays just the first page of the PDF with a transparent background letting
the page show through. Of course there's no good way to shrink that UI into an
arbitrarily-sized box on the page, but I'd prefer a button to open the regular
fullscreen view over the current behavior.

------
katsume3
Also relevant:

[https://www.sans.org/reading-
room/whitepapers/malicious/pape...](https://www.sans.org/reading-
room/whitepapers/malicious/paper/33443)

[https://digital-forensics.sans.org/media/analyzing-
malicious...](https://digital-forensics.sans.org/media/analyzing-malicious-
document-files.pdf)

Malicious PDFs are still around, even today

------
daffy
> Do not use PDFs to present digital content that could and should otherwise
> be a web page.

To get decent typography, one needs TeX, and TeX produces PDFs, not web pages.

~~~
corty
TeX can also be used to generate HTML or even ODF. But of course decent
typography is only available in PDF or maybe PS if you like it oldschool

------
RobRivera
Pdf is a format; the content is a choice of the author.

------
mmmrk
The article deeply resonates with me. I have been annoyed so often by a
website splitting content into PDFs that would have been perfectly fine as
HTML. I suppose this happens because the content department makes nicely
(sometimes) layouted documents first to print and give someone to review _and
then_ someone decides to throw them up on the website as an afterthought.

------
objektif
If the alternative is epub i will take pdf any day.

------
centimeter
I disagree with almost every point in this article.

------
StillBored
The author seems to want CHM...

For all the PDF hate, 99% of the time the rendering is better than most web
pages, and it actually works properly. I don't have scaling issues with it on
hi-dpi displays/etc.

I've also yet to see a browser do proper sgml/svg graphics scaling of high
density (think multiple hundreds of MB) maps/etc that are common in PDFs.

------
tehabe
The British Government also published, two years ago, their reasons against using
PDF [https://gds.blog.gov.uk/2018/07/16/why-gov-uk-content-
should...](https://gds.blog.gov.uk/2018/07/16/why-gov-uk-content-should-be-
published-in-html-and-not-pdf/)

------
onion-soup
At least PDF doesn't bombard me with click walls and cookie disclaimers so that
I can actually read and scroll.

~~~
tatersolid
I encounter many click-wrap-agreement PDFs in my dayjob.

The hilarious thing is that these monstrosities are created by “security”
people and work only in Acrobat on Windows with scripting _enabled_.

------
red_admiral
> PDFs tend to lack real substance, compared to regular web pages.

I am seriously wondering which universe this person is from.

------
commandlinefan
I usually prefer HTML content, but for long-form technical documentation, I
actually prefer PDF because it's always written to be read "cover to cover"
rather than randomly hyperlinked. I do bail as soon as I see two-column
output, though - too painful to deal with on a computer.

------
cryptonector
There is one thing about PDFs that makes them OK: it's one file, with
everything needed, so it works offline. That's not nothing. The Web N.0 is not
offline-friendly, and while most of the time that's not a problem, when it is
a problem, it's a nasty problem.

------
FerretFred
_Sized for paper, not screens_

Yeah! This year I decided to support Indie journalism and help the environment
by not having the paper edition mailed to me. Big mistake. They'd literally
rendered the print version as PDF, and reading that on an iPad was nearly
impossible.

------
agumonkey
PDFs are horrendous but they work in their horrendous context. Most people are
not tech savvy and want universal visual and printable. I so wish people could
exchange text + svg but you need to educate and modify workplaces. Until then
PDF it will be.

------
fomine3
As a non-native English speaker, translating PDFs is a pain, especially
two-column layouts like theses. Don't use fixed layout for web content,
please! (Seriously, it's an a11y problem.)

------
blunte
Part of what makes the PDF experience so abysmal is the Adobe reader most
people use. Apple Preview (and Quicklook!) is so much faster and more stable
that I can forget how miserable the experience is for Windows users.

------
2ion
> 1\. Linear and limiting

Perfect for information discovery. Rich annotations and hypermedia features
(external links, document-internal links, TOC) in PDF fix pretty much all
issues stemming from this. All searchable (if the PDF has been constructed
properly). Permanent, static structure vs everchanging, confusing messes of
websites. The web is NOT QUOTABLE and unusable without advanced full-text
search. Barely an URI remains stable.

> 2\. Jarring user experience. PDFs look completely different from typical web
> pages.

Typesetting on the web is a clusterfuck. Subpar microtype. Font rendering
issues galore, tens of versions of popular fonts purchased at different points
in time from different vendors with differently messed up CSS font
configuration settings. Fonts are not embedded, but hyperlinked. I want a
maximum fidelity reading experience for large portions of text and classic
formats, because familiarity aids navigating a complex document. There is no
need for fancy styles and whatnot.

> 3\. Slow to load.

Renderers differ in quality and speed. PDFs render lightning fast at
acceptable settings and if you wanna tune for maximum quality, you can do so
at the expense of slower rendering. Besides, it took 2.401ms to load the web
page these points are written on, excluding content blocked by ublock origin.
This point is delusional. A 700 page beautifully typeset PDF opens and renders
in <<1s on my 7 year old laptop, and my reader will prerender pages to speed
up navigation even more.

> 4\. Stuffed with fluff.

The entire paragraph is invalid because PDF has all those features.

> However, in PDFs, those techniques aren’t always used and content creators
> tend to favor quantity of content over quality and formatting.

The same goes for most web content put out today.

> 5\. Cause disorientation. Because PDFs aren’t web pages, they don’t show a
> standard navigation like a website would.

Document structure is clearly presented in tree on the right side if the PDF
is properly annotated/hyperlinked and the reader has a TOC view (productivity
tooling should have this). Websites lack this discoverability almost always,
and if they have it, it looks different and works differently everywhere,
creating disorientation.

> 6\. Unnavigable content masses.

This has nothing to do with PDF and everything to do with the reader in use. A
semantic desktop would index all file content, allow cross-linking between
files using file:// or other protocols, and generally expose all content to a
local or internet search engine. Google search indexes PDFs just fine! (Again,
a badly constructed PDF may not contain text at all or broken text, but that's
a generation problem.)

> 7\. Sized for paper, not screens.

This is correct, and an advantage, because the web and most other screen
content lack the fidelity of typesetting systems like LaTeX, ConTeXt, InDesign
etc which each incorporate decades of digital best practice, and several
decades more of typesetting knowledge.

It is a disadvantage in special settings, like on mobile, but even then, PDF
text can be reflowed with appropriate software.

> Users Strongly Dislike PDFs

It's my favourite format for archiving documents, knowledge, and even website
printouts.

------
auggierose
PDF is great, especially for books and papers. Really the only proper choice
for a digital technical book. Of course, you should read it on a device large
enough, like an iPad Pro 12.9 inch.

------
sradman
The distinction should be made between fixed layout formats like PDF and
reflowable text formats like HTML. In a RESTful sense, these should be two
representations of the same resource.

~~~
michieldotv
The PDF spec has everything on board to support text reflow. A good PDF
library will typically have the option to output what is called a Tagged PDF.
Annotating the structure of the document allows readers to reflow the text.
It's what the Web Content Accessibility Guidelines (WCAG) recommend doing.

~~~
sradman
Good to know, thanks:

[https://en.wikipedia.org/wiki/PDF#Logical_structure_and_acce...](https://en.wikipedia.org/wiki/PDF#Logical_structure_and_accessibility)

------
pmdulaney
PDF format is useful if you want a hard copy.

But do any of you know how to use Pandoc (or some open source or command line
tool) to convert a PDF to something easily readable on a Kindle?

------
ehutch79
I don't see how users hate pdfs? They're perfect for sending to print houses,
so we know what we send is exactly what ends up on paper!

~~~
corty
Only after hours of linting. If you don't do preflight, colors will be off,
some objects won't render, transparencies will be solid and maybe fonts will
be missing. PDF is by no means idiot-proof in this regard. I've been burned
more than once.

However, PDFs are still better than everything else.

~~~
ehutch79
Agreed! Designers spend tons of time on creating a document for printing. I
can only imagine if the print house decided to change the aspect ratio or
size of the final output randomly, or lower the printer resolution.

------
jneplokh
I have to disagree. I love consuming content through PDF format, even books,
especially when paired with a great PDF reader.

------
dejongh
I really like PDFs. It is a portable way to render something not designed for a
web browser.

------
pjmlp
Nope they are quite alright and much better than the PS tooling that they
replaced.

------
dkersten
I way prefer a PDF to most modern advert-infested, javascript-laden website.

------
tzs
> Linear and limiting

When I'm trying to learn something that is not short and simple, linear is
good.

Far too often when someone tries to present a long and complex subject via
HTML, they don't provide an easy way to go through the entire thing in an
order that is pedagogically sound.

It doesn't _have_ to be that way...but it usually is. I'm not sure why.

Instead, they provide each page with a sidebar that links to other pages,
turning the whole collection into a directed graph of pages full of dead ends
and regions that have no links to other regions.

You reach some page where the sidebar links to X, Y, and Z, which are all
things that depend on what you learned on that page and you are now ready to
learn. If you follow the X link, you may end up learning all about X but may
never again see the links to Y and Z unless you remember that a dozen pages
back you saw them and purposefully seek them out. It's very easy to not even
realize that you missed a whole major subtopic.

In a linear format, such as an actual book, a PDF, an EPUB, or even a plain
text file, the author or editor makes a decision on how X, Y, and Z should be
ordered. Maybe they decide X, followed by Y, followed by Z. Maybe they decide
X, then Y, then things that depend on both X and Y, then Z, then things that
use X, Y, and Z.

Different authors might pick a different ordering, but the point is they have
to choose something. Whatever they choose, you just keep turning the page and
you'll hit it all.

For a big subject, maybe you don't want to hit it all. I've seen math books
address this by having a list or diagram in the front giving you alternate
orders to go through a subset of the book if you want to learn just a
subset of the subject.

In theory, HTML should be great for this, especially HTML with JavaScript. You
could have a page that lets you select from different learning paths, and then
the JavaScript would put "Next" and "Previous" buttons on each page that take
you through all the pages on your selected learning path. You could still have
the sidebar links, but if you follow one the JavaScript could add a "Return to
Learning Path" button so you can always get back on track.
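
A minimal sketch of that idea, assuming a hypothetical data model in which a
learning path is just an ordered list of page IDs. The names `LearningPath`,
`NavLinks`, and `navLinks` are made up for illustration and do not come from
any real site or library. The function only computes which links to show;
wiring them to actual buttons is left out.

```typescript
// Hypothetical data model: a learning path is an ordered list of page IDs.
type LearningPath = { name: string; pages: string[] };

// The links the page script would render as buttons (null = no button).
type NavLinks = {
  previous: string | null;
  next: string | null;
  returnTo: string | null; // target of "Return to Learning Path"
};

// Compute the links for `currentPage` on the selected path.
// `lastOnPath` is the last on-path page the reader visited, if any.
function navLinks(
  path: LearningPath,
  currentPage: string,
  lastOnPath: string | null
): NavLinks {
  const i = path.pages.indexOf(currentPage);
  if (i === -1) {
    // The reader followed a sidebar link off the path: offer a way back,
    // falling back to the first page of the path.
    return {
      previous: null,
      next: null,
      returnTo: lastOnPath ?? path.pages[0] ?? null,
    };
  }
  return {
    previous: i > 0 ? path.pages[i - 1] ?? null : null,
    next: i < path.pages.length - 1 ? path.pages[i + 1] ?? null : null,
    returnTo: null,
  };
}
```

With a path of `["intro", "x", "y", "z"]`, the page "x" would get "intro" as
Previous and "y" as Next, while an off-path page reached from "y" would get
only a return link to "y".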

But until more HTML authors put in the effort to provide a linear path through
the material that books/PDF/EPUB/text formats force their authors to provide,
PDF and to a lesser extent EPUB will remain the best option for most people
trying to learn a long and complex subject online.

(I give PDF the nod over EPUB because most EPUBs do not have mathematical
notation that looks as good as it does in PDF. I don't know if this is a
technical limitation of EPUB itself, or of the EPUB readers I've used, or of
the authoring tools used to create the EPUBs, or simply the authors didn't
know how to do it right).

A good example of HTML authors putting in the effort is "The Feynman Lectures
on Physics" online edition [1]. That shows you can make a website that
presents a long and complex technical subject that works as well as a book or
PDF, yet adjusts well to a variety of different screen sizes.

[1]
[https://www.feynmanlectures.caltech.edu/](https://www.feynmanlectures.caltech.edu/)

------
Havoc
And even worse for manipulating with code.

------
Yaa101
PDFs are for printing; they come out of almost every printer the way they are
supposed to. There are other formats like txt, office and html that are better
suited for direct consumption.

------
SiempreViernes
While the comprehensive ranking of file extension reliability by Munroe (2013)
does not contain .htm or .html, one can infer from the related file formats
that HTML content would rank below PDFs.

Randall Munroe, "File Extensions"; xkcd.com, 1301, 2013-12-09.

~~~
fcatus
couldn't you just copy the link for it?
[http://imgs.xkcd.com/comics/file_extensions.png](http://imgs.xkcd.com/comics/file_extensions.png)

------
bmsd_0923
Ok, boomer.

------
WrongThinkerNo5
I find that claim rather ironic, because I feel like the primary reason that
PDFs are "unfit for human consumption" is a formatting issue, not as much a
technical or practical issue. The reason they are unfit to read online is
that they are formatted using past formatting standards that are meant for
print … not online reading.

There are of course some technical limitations to PDF that would prevent them
from being made "digital first", but even just changing page layout and
adjusting margins and spacing and font for horizontal display (as most of our
screens are) vs vertical layout as one would read a printed sheet of paper,
would make huge differences.

I for one actually compensate for that in that I have a dedicated monitor that
is vertically oriented in order to read PDF documents. Better yet if you can
do it on a very high dpi screen. But even that is not ideal because although I
actually like print formats, standards, and conventions (like margins,
spacing, and structure), it's simply not relevant or applicable in digital
until we get A4/Letter formatted tablets or desktop screens that emulate
physical paper … albeit even that, inadequately. Nothing can really replace
the advantages of paper, at least not until we get paper thin displays that
have zero measurable response times on pen inputs … i.e., likely never.

------
node-bayarea
I LOVE PDF

------
sahoo
What will you do after printing the pdf document, send it to legal? Lawyers
are not humans, right? I agree that it's good for paper, not for screen;
everything else is just phony.

