
Guido van Rossum: “PDF Must Die.” (2014) - shubhamjain
https://twitter.com/gvanrossum/status/533280495392419842
======
james_morton
PDF was made for printing, not displaying. As someone who has been in the
printing business for a long time, I found PDF a godsend: no more worrying about
missing fonts, no more page margin differences, no more driver discrepancies.

What you see is what you get, wherever you open it, wherever you print it,
that is _huge_.

~~~
delcaran
> What you see is what you get, wherever you open it, wherever you print it,
> that is huge.

Not really, it depends on the PDF software. I have printed the same bill with
SumatraPDF, Acrobat Reader X and Acrobat Reader DC: different fonts, different
sizes and missing logos...

~~~
Freak_NL
Usually it is the PDF that is at fault, or rather, the assumptions made by the
author (or the authoring software). A PDF meant to be portable and usable for
a long time should embed all fonts used. If you get different fonts and sizes
in different viewers, that tends to hint at a lack of embedded fonts, leaving
the PDF viewer to substitute its own fallback fonts.

Most PDFs I come across tend to come with embedded fonts and work just fine on
any decent PDF-viewer.

------
sambeau
All these people pointing out that PDF was for printing, not for screen
display, are, I believe, (at least partly) incorrect. I distinctly remember PDF
being touted as a screen display technology.

Adobe (and the world) already had a printable format: PostScript. But
PostScript was interpreted and had loops and other programming constructs that
made it hard to stream and to add hyperlinks to. PDF was explained to us as
'unrolled PostScript'. PDF was also claimed to be much faster to display as it
didn't require an interpreter, although NeXT (and SGI) used 'Display PostScript'
for their windowing systems.

As someone who was involved in print media at the time the font & raster-image
embedding was also a godsend but that was a characteristic of the PDF file
format rather than the PDF drawing technology.

PDF was deemed so suitable as a display technology that the Mac OS X window
server was hailed as 'Display PDF' on launch and a big deal was made of how a
Mac could not only display exactly what a printer would show but that high-
resolution printing was already built in: everything was essentially already
being printed to the screen. Although it is seldom mentioned these days, I assume
the Quartz 2D renderer is still Display PDF?

[https://en.wikipedia.org/wiki/Display_PostScript](https://en.wikipedia.org/wiki/Display_PostScript)

------
dempseye
He doesn't like PDF because it doesn't reflow to fit his screen, but that is
precisely the point of PDF.

------
cpach
I agree with the comments pointing out that PDF was not primarily intended for
screen usage. However, it’s quite a good format for typeset text. What if the
problem is not really with the format itself, but with the way it is used? What if
the PDF file were customized to the specific device it will be opened on?

It would probably not be impossible to take EPUB source and then convert to
PDF via (La)TeX, with good typographic settings for the specific device. For my
laptop screen the page would be around 28 centimeters wide, perhaps set in three
or four columns. For my smartphone, one column is enough. With this setup one
would not have to scroll up and down while reading the columns, which is the
case now when reading e.g. scientific papers. And because of the great quality
of the TeX engine it might be easier to achieve a good layout than hacking
around with HTML/CSS/Javascript.

------
0xFFC
I don't have any problem with PDFs, and I think they are wonderful, especially
for reading books. They almost simulate the experience of reading an
old-fashioned book, but on a computer! I like the idea of them not being
responsive. Because let's be honest, being responsive would come with so much
headache: every device would then render the document in a different way.

But I think the problem is the software we use as PDF viewers. Companies tend to
ignore PDFs. They think that if they can render PDFs, that is enough. No, it
isn't. Some PDF viewers can't even get fit-to-width right. It kind of feels like
software companies tend to ignore PDFs.

Apple has done a _wonderful_ job with their word lookup functionality. If
English is your mother tongue then you have no fucking idea how helpful
that is.

I am a PC user, and the _Ask Cortana_ functionality (in PDFs) in Edge made me ditch
Google Chrome for Edge entirely, and believe me, I spend all day in the browser,
24/7, and my whole life was in Chrome before ditching it. (Just imagine how
important that simple functionality is for non-English speakers.) I don't know
how to emphasize it more, but believe me, that simple lookup (without opening a
new browser tab and searching for "define $word") is _the_ feature that
most non-English users want.

P.S. I wish Google would add PDF word lookup functionality to this:
[https://chrome.google.com/webstore/detail/google-dictionary-...](https://chrome.google.com/webstore/detail/google-dictionary-by-goog/mgijmajocgfcbeboacabfgobmjgjcoja)

------
ocdtrekkie
From an additional tweet, we can see that Guido's biggest reason for disliking
PDF is the text not being responsive.

This is actually my FAVORITE thing about PDFs. They're like pages. They
perfectly replicate the book medium they are often taken from. If I want to
make something bigger or smaller, I can do that, and it just works with zoom.
It's not trying to guess what I'm trying to do and adjust things in ways I
don't want.

I'm sure it's an unpopular opinion, but I love the PDF.

~~~
jwdunne
They're delightful to read on mobile too with a reader that maintains zoom and
allows continuous scrolling. I don't mind flipping the page but it's far more
friction than flipping the page of an actual book since you do it about 5 to
10 times more.

~~~
cpach
What mobile PDF readers do you recommend? I’ve been longing for zoom fixation
and continuous scrolling but had no idea this existed in current apps.

~~~
jwdunne
Foxit PDF Reader does this beautifully. It's far better on iOS but still works
better than the alternatives on Android.

Trust me, it is a DREAM to read PDFs on mobile with Foxit. I avoided PDFs for
years because iBooks and Kindle suck on mobile. Acrobat failed to even render
the text in a PDF I tried, so that was a non-starter.

~~~
cpach
Cool! Will try Foxit then.

------
mytddu
My gripe with PDF is that it's the standard format for academic publishing,
rendering a whole mass of scientific knowledge largely inaccessible for text
processing purposes. I've wanted to analyze the Libgen archive of journal
articles for a long time but have never found an adequate solution for
extracting text from PDFs. Any suggestions on this?

~~~
IngoBlechschmid
Sure, the Linux tool "pdftotext" works just fine for this. Two small caveats:
ligatures get converted to proper Unicode ligatures and not their ASCII
fallback (as one might want or expect) and of course complex mathematical
formulas are rendered badly.
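
For the ligature caveat, here is a minimal sketch of one possible post-processing step. It assumes the extracted text is already a Python string and that only the Latin ligatures in the Unicode range U+FB00 to U+FB06 need expanding. The function name `expand_ligatures` is made up for illustration and is not part of pdftotext.

```python
import unicodedata

# Latin ligatures in the "Alphabetic Presentation Forms" block:
# U+FB00 (ff) through U+FB06 (st).
LIGATURE_RANGE = range(0xFB00, 0xFB07)


def expand_ligatures(text: str) -> str:
    """Replace Latin ligature code points with their plain-letter equivalents.

    Only characters in LIGATURE_RANGE are decomposed, so accented letters and
    other non-ASCII text are left exactly as they were extracted.
    """
    return "".join(
        unicodedata.normalize("NFKD", ch) if ord(ch) in LIGATURE_RANGE else ch
        for ch in text
    )


if __name__ == "__main__":
    # Stand-in for text extracted from a PDF (hypothetical sample).
    sample = "e\ufb03cient \ufb01le \ufb02ow"
    print(expand_ligatures(sample))
```

Applying `unicodedata.normalize("NFKC", text)` to the whole string would also work, but it rewrites many other compatibility characters (superscripts, full-width forms), which may not be wanted for text analysis.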

~~~
mytddu
I've tried both pdftotext and pdf2txt and I remember not being satisfied with
either. Neither seems to handle non-ASCII characters very well, but I'll take
another look soon.

------
girzel
PDFs are great for _reading_, but nothing else. If you want to _do something_
with the text, then plain text or some derived format is better. I would
include searching as "doing something". PDF viewers can search, but it's
always more painful than searching plain text, and I would say if you're using
search to simply locate your subject in the file, then the file author has
failed to provide a sufficient TOC/index.

To me, the case in point is the PDF documentation of LaTeX packages. The PDF
files are great as a demonstration of what the package does, but they are
awful for learning how to do it in your own documents. I am writing LaTeX in
plain text, in an editor. I want the package documentation in plain text, in
my editor. I sure as hell do not want to fire up a separate PDF viewer and
then start flicking through pages, just to discover the correct package option
incantations.

I love PDFs, but I only love them as a prose reading experience: open the
file, start at the top, read until the end. When I make PDFs for other people
to consume, I usually make three different layouts: print, screen, and
tablet/mobile. Thanks to LaTeX, that's easy to do.

------
Havoc
PDFs are great. Things look the same everywhere. With Word docs, on the other
hand, god help you if there is some minor difference in Word version, printer
model, etc. and you're shooting for consistency.

------
0xmohit
Quoting
[https://twitter.com/gvanrossum/status/533337876415533056](https://twitter.com/gvanrossum/status/533337876415533056)

    
    
      @eugeneglybin Proprietary though it is, DOC[X] is easier to reflow to fit my screen. Also, I guess PDF fans don't use mobile screens enough.
    

Reflow isn't about PDF _per se_; it is about the underlying reader being
used.

That said, PDF is certainly a much better alternative than DOC[X]. Different
versions of Microsoft Word tend to produce different output for the same
document.

With PDF, you are guaranteed the same output upon printing to any device,
which isn't quite the case with DOC[X].

~~~
cpach
Are there really any PDF viewers that do reflow? I’ve never heard of that
before.

~~~
0xmohit
The PDF creating application needs to ensure that the document is
_accessible_. Adobe Reader (on desktop) does support reflow [0]. Adobe Reader
on Android also supports reflow [1].

[0] [https://helpx.adobe.com/acrobat/using/reading-pdfs-reflow-ac...](https://helpx.adobe.com/acrobat/using/reading-pdfs-reflow-accessibility-features.html)

[1] [https://www.quora.com/Whats-the-best-free-PDF-reader-on-Andr...](https://www.quora.com/Whats-the-best-free-PDF-reader-on-Android-that-can-reflow-and-zoom-text-at-the-same-time-So-far-Ive-evaluated-Adobe-Reader-Polaris-Office-and-Qoppa-among-others)

------
_jomo
Is there a container format for HTML that includes assets such as images used
in the document? I think having a single file is important. Firefox (possibly
others) allows you to save a website, but you end up with a folder full of
assets, which isn't very user-friendly.

Replacing everything with data-uris could be an option, but this would have to
be some sort of standard for documents usually sent as PDF.
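
As a rough sketch of the data-URI idea, assuming the images are already available as bytes and the HTML references them by file name in plain `src="..."` attributes: the helper names `to_data_uri` and `inline_images` are hypothetical, and the regex is only meant for simple, well-formed markup.

```python
import base64
import re


def to_data_uri(data: bytes, mime_type: str) -> str:
    """Encode raw bytes as a base64 data URI."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def inline_images(html: str, assets: dict[str, tuple[bytes, str]]) -> str:
    """Replace <img src="name"> with a data URI for every name found in assets.

    assets maps a file name to (bytes, MIME type). Unknown names are left as-is.
    """
    def replace(match: re.Match) -> str:
        name = match.group(2)
        if name not in assets:
            return match.group(0)
        data, mime_type = assets[name]
        return f"{match.group(1)}{to_data_uri(data, mime_type)}{match.group(3)}"

    return re.sub(r'(<img\b[^>]*\bsrc=")([^"]+)(")', replace, html)


if __name__ == "__main__":
    # b"hi" is a stand-in for real image bytes.
    page = '<p><img src="logo.png" alt="logo"></p>'
    print(inline_images(page, {"logo.png": (b"hi", "image/png")}))
```

The result is a single self-contained HTML file, at the cost of roughly a third more bytes per asset because of the base64 encoding.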

Websites could then create the "raw" document and display it in an iframe, so
the website's template is separated from the document.

~~~
based2
[http://stackoverflow.com/questions/2429934/is-it-possible-to...](http://stackoverflow.com/questions/2429934/is-it-possible-to-put-binary-image-data-into-html-markup-and-then-get-the-image)

------
shubhamjain
I am wondering why there is still no compiled HTML format in an age when web
pages can render almost everything (at least, I can't think of any off the
top of my head). With base64, images can be embedded without any external
dependency and I think it should be easy to reduce what JavaScript can do to
address privacy concerns (like disabling XHR).

A web page renderer is more ubiquitous than PDF. Web pages are responsive and
everyone can build one with just a simple text editor. So why do we need PDF?

~~~
jeandejean
> "A web page renderer is more ubiquitous than PDF"

Just about any browser can render a PDF as well. PDF is good because it guarantees
the formatting intended by the author.

~~~
oneeyedpigeon
And PDF is bad because it guarantees the formatting intended by the author.
The author's formatting _might_ work on a single screen, but it sure as hell
won't work on the multitude of devices that access the web.

PDF guarantees the end print result, but it shouldn't be used for anything
else.

~~~
SFJulie
Well, Letter and Legal vs. A4 is a concern. Some printer drivers tend to alter
content in the «fitting» process, making some PDFs hard to read. And screens
tend to come in weird and diverse height-to-width ratios.

US standards are ridiculous. At least A1/A2/A3... are homothetic.
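
The "homothetic" point can be checked with a little arithmetic. This is a rough sketch using the nominal sizes of A4 (210 × 297 mm) and US Letter (8.5 × 11 in), which are not stated in the thread; the function names are made up for illustration.

```python
def aspect_ratio(width: float, height: float) -> float:
    """Long side divided by short side."""
    return max(width, height) / min(width, height)


def halve(width: float, height: float) -> tuple[float, float]:
    """Cut a sheet in half across its long side and return the new dimensions."""
    long_side, short_side = max(width, height), min(width, height)
    return short_side, long_side / 2


if __name__ == "__main__":
    a4 = (210.0, 297.0)    # millimetres
    letter = (8.5, 11.0)   # inches

    for name, sheet in (("A4", a4), ("Letter", letter)):
        before = aspect_ratio(*sheet)
        after = aspect_ratio(*halve(*sheet))
        print(f"{name}: {before:.3f} before halving, {after:.3f} after")
```

For A4 both ratios stay close to the square root of 2 (about 1.414), so a page scales to the next size without distortion. For Letter the ratio changes from about 1.294 to about 1.545 when the sheet is halved.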

------
Angostura
So let's see, you would prefer (say) schools to upload letters to parents on
their website in the original docx format?

Because that's what will happen if you kill PDF.

------
diimdeep
+1, PDF can't do this
[http://explorableexplanations.com/](http://explorableexplanations.com/)

------
legulere
Probably it's actually the other way around.

PDFs usually fulfill lots of typographic criteria for better legibility like
use of justified text and ligatures that are hardly used in HTML or badly
supported.

The only real problems are that moving around in two-column text is annoying and
that you often have to zoom in. However, with a modern touchpad offering 2D
scrolling and pinch to zoom this is hardly a problem anymore.

~~~
cpach
_”with a modern touchpad offering 2D scrolling and pinch to zoom this is
hardly a problem anymore”_

I really can’t agree with you there. Even with those, the reading experience
suffers a lot IMO.

~~~
legulere
Yeah maybe I was a bit too enthusiastic, but it's a dramatic improvement
compared to before.

~~~
cpach
I agree :)

------
camillomiller
Meh. Let's be real, PDF has its use, it's all a matter of who uses it and
_how_. That's exactly the same with HTML. Which of course is something you can't
really use for digital contracts, for example.

Wouldn't it be better to educate users to choose the right software tool for
the right job, instead of generalizing the problem with shortsighted
assertions?

------
chmaynard
Smart guy, stupid tweet. If you decide to take an extreme position to attract
attention or vent, at least say something intelligent.

------
darkhorn
I don't like PDFs but you can sign them with certificates. This makes them
great for business contracts.

~~~
girzel
You can sign any lump of plain text with a GPG signature. I know I'm dreaming
here, but...

Honest question: How much traction has PDF-plus-certificate gained versus
plain-text-plus-signature?

~~~
PointerReaper
Lots, when you consider that almost everyone on every platform has a PDF reader
to work with that signature, whereas not many people have GPG or GPG tools
that are friendly to non-technical users.

------
based2
[https://www.adobe.com/uk/epaper/tips/acr5reflow/](https://www.adobe.com/uk/epaper/tips/acr5reflow/)

[https://github.com/mozilla/pdf.js/issues/4816](https://github.com/mozilla/pdf.js/issues/4816)

------
zelcon
Both PDF and HTML are inferior substitutes for proper TeX formats.

~~~
cpach
What do you mean? TeX is a source format, not a display format.

