
PDF is not portable in the digital world - robert-zaremba
http://rz.scale-it.pl/2017/07/05/stop_publishing_pdf.html
======
massar
Rendering a Microsoft Word/PowerPoint document as a PDF is a good thing: one
no longer needs a DOC/DOCX/PPT/PPTX viewer, while most devices come with a PDF
viewer built in (e.g. Chrome :). As a bonus it kills the animations if there
are any, all while keeping the formatting intact (with some minor color
changes, depending on how one exports it).

I tend to keep a whole bunch of things I want to 'read later' in my iBooks
collection, just save as PDF and transfer to phone, or if already a PDF just
download it directly; zooms great too. I got all kinds of device manuals, but
also PADI and other diving reference books; always good to quickly check up on
it when in doubt and then to reinforce that information with the knowledge of
your dive buddy.

Indeed, for content that does not really need a layout outside of some headers
(<h1>) and paragraphs (<p>) HTML is perfectly fine.

Quite a few text portions of conference papers (read: TeX :) can be rendered
as markdown and then easily converted to HTML, but the result won't feel as
polished, thus PDF is an easier format that also reflects the original intent
and format.

IETF RFCs typically can be rendered in a myriad of ways thanks to xml2rfc;
then again, one mostly ends up reading them on tools.ietf.org or, to keep a
local copy, rendering them as PDF and loading that into iBooks.

~~~
discreditable
On the topic of MS Word, I really wish their export to HTML function was
simpler. Most of the time I just want to export the document structure
(headings, tables, bold, italic, etc.) and not include all of the styles and
extra markup.

~~~
ferdterguson
Pandoc?

~~~
discreditable
That is what I use lately. It's not perfect but it's the closest thing I have
found.

------
rubidium
PDF is beautiful. Long live PDF.

I read a lot. I don't always want to read from a screen.

Making PDFs available on the internet just saves me from having to search
through a journal stack.

PDFs have the advantage that the formatting looks good, and the
author/publisher gets to choose how it looks, usually with input from a
professional. This is much better than the "styling" many websites and epubs
provide.

~~~
robert-zaremba
That's the thing. 1) If you use simple HTML / EPUB, the majority of
publications are good for printing. 2) Think about the relief of storing these
documents on an ebook reader.

~~~
falcolas
> majority of publications are good for printing

This is a problem. PDF supports "all". Epub would have to have equivalent
support, as well as native support in most/all vanilla OS distributions to be
a real competitor for PDF.

> Think about the relief of storing these documents on an ebook reader

For many people this is their phone or tablet. Both of which support PDFs as
well (as do many standalone ebook readers). You also don't have to get an
extra app to view the PDFs.

~~~
robert-zaremba
The majority of publications are simple text with a few tables or diagrams. We
don't need complex typesetting for that. Printing from EPUB should be good.

You always need an extra app to view PDF. It's either installed by default or
not. Phones, tablets, and other screen devices have EPUB, MOBI, and HTML
readers available to install.

------
guidoism
A few weeks ago I would have made the same statement but I started reading the
PDF implementation docs and now I really like the format.

The main issue I think we all have with the format is that people make docs
that are almost impossible to read on a small screen.

There are ways around this: 1. Tagged PDFs present the underlying content and
semantics in order to reflow for accessibility purposes, though right now very
few people seem to use this feature, and 2. Maybe it wouldn't be a bad thing
to make PDF pages closer to a paperback book rather than an A4 page, with the
resulting shorter line length and reduced margins.

PDF is indeed more complex than plain HTML with some cribbed CSS but in many
ways it's a lot better: 1. It truly is portable in the sense that every
computer will render it in exactly the same way, 2. It packages up all assets
in an efficient manner (only the glyphs that are needed are included, not the
entire font with all glyphs and position hints like web fonts), 3. The
expensive layout computation is done once, on a computer in a galaxy far far
away from my battery limited phone, and 4. PDFs are (by convention) free from
all of the cacophony of crap like share buttons and navigation chrome and ads
and articles-you-may-enjoy fluff.

The format itself is actually not that bad: it's a text format in that it's
relatively easy to open up a text editor and bang one out. The only
inconveniences are the places where you need to state exactly how long strings
are (which your text editor can help with) and the creation of the index at
the end (which I've been cheating on by just running my hand-created PDFs
through a PDF lint-like utility).
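
To make those two bookkeeping chores concrete, here is a minimal Python sketch
of a hand-assembled one-page PDF. It is only an illustration under stated
assumptions: the function name `build_minimal_pdf`, the US Letter page size,
and the built-in Helvetica font are choices made for this example, not
anything from the thread. It computes the content stream's /Length and the
byte offsets for the cross-reference table (the "index at the end") instead of
leaving them to a lint tool.

```python
def build_minimal_pdf(text: str) -> bytes:
    """Assemble a one-page PDF by hand (illustrative sketch, not a full writer)."""
    # Parentheses and backslashes must be escaped inside a PDF literal string.
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    # Content stream: draw the text in 24pt Helvetica near the top-left corner.
    # Assumption: the text fits in Latin-1; other characters raise an error here.
    content = f"BT /F1 24 Tf 72 720 Td ({escaped}) Tj ET".encode("latin-1")

    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Contents 4 0 R /Resources << /Font << /F1 5 0 R >> >> >>",
        # The stream dictionary has to state the exact byte length of the body.
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset where "N 0 obj" begins
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    # Cross-reference table: one 20-byte entry per object, plus the free entry 0.
    xref_start = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_start
    return bytes(out)
```

Writing the returned bytes to a file opened in binary mode, for example
`open("hello.pdf", "wb").write(build_minimal_pdf("Hello"))`, should give a
file that a PDF viewer can open.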

The reason why most PDFs look crazy when opened up in a text editor is that
the streams are almost always compressed. You can uncompress them with "qpdf
--stream-data=uncompress in.pdf out.pdf".
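
As a rough illustration of what that uncompression amounts to for the common
/FlateDecode filter, here is a naive Python sketch using only the standard
library. The function name `inflate_streams` and the regular expression are
assumptions made for this example. It ignores predictors, object streams,
encryption, and other filters, so qpdf remains the tool to use on real files.

```python
import re
import zlib

# Naive pattern: a stream dictionary (not crossing an "endobj"), then the body.
STREAM_RE = re.compile(
    rb"<<((?:(?!endobj).)*?)>>\s*stream\r?\n(.*?)endstream", re.DOTALL
)


def inflate_streams(pdf_bytes: bytes) -> list[bytes]:
    """Return the decoded bodies of the /FlateDecode streams found by the scan."""
    decoded = []
    for dictionary, body in STREAM_RE.findall(pdf_bytes):
        if b"/FlateDecode" in dictionary:
            # decompressobj stops at the end of the zlib data, so the line
            # break that precedes "endstream" is left over as unused data.
            decoded.append(zlib.decompressobj().decompress(body))
    return decoded
```

Calling `inflate_streams(open("in.pdf", "rb").read())` on a simple file
returns the page content operators as readable text.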

~~~
klodolph
The share buttons and navigation chrome certainly _can_ be put in a PDF,
they're just incredibly uncommon. I've even played video games that were
distributed as PDF files.

~~~
mercer
Pdf games? Do you have an example you could send me, perhaps? I'm really
curious.

~~~
klodolph
The ones I played were adventure games, basically a bunch of areas
(implemented as pages) linked by buttons, with some extra logic thrown in. If
you can imagine the kind of scripting capabilities you'd need to run a
PowerPoint style presentation from a PDF file, and the kind of scripting you'd
need to make sophisticated interactive forms, you're on the right track.

Unfortunately I don't know what the games I played were called.

------
psion
I find PDF to be way more portable than most other document formats in terms
of saving a document or for printing a document. Saving an HTML page has its
own set of problems, and if I share it I have to make sure to get all the
images gathered as well. Word processor documents depend on system fonts,
etc., and I cannot be sure that what my document depends on is installed on
the other computer. With a PDF, I can be sure to get the necessary elements,
be they fonts, images, etc.

~~~
robert-zaremba
That's what EPUB / MOBI are for.

~~~
logfromblammo
EPUB is essentially an entire self-contained HTML web site in a ZIP file
container.

Whatever you can do on a website, you can theoretically do in an EPUB. It
might not display as expected when rendered by an e-reader or printed,
however, which is why most EPUB files stay rather safe and unambitious in
their CSS and JS.

I'm a bit disappointed that web browsers don't generally function as EPUB
readers or include a "Save As... EPUB" option, but I can't seem to muster the
motivation to write a Firefox add-on to do that. It wouldn't even be that
difficult, as I have created EPUBs from filesystem directories using nothing
more than 7Zip and a shell script.
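
For illustration, the same directory-to-EPUB packing can be sketched in Python
with the standard `zipfile` module. The function name `pack_epub` is made up
for this example, and it assumes the source directory already holds a valid
EPUB layout (META-INF/container.xml, the package file, and the XHTML content)
and that the output file is written outside that directory. The one detail
that is easy to get wrong is that the `mimetype` entry has to come first in
the archive and be stored uncompressed.

```python
import zipfile
from pathlib import Path


def pack_epub(source_dir, epub_path):
    """Zip an already laid out EPUB directory into a single .epub file."""
    source = Path(source_dir)
    with zipfile.ZipFile(epub_path, "w") as archive:
        # The mimetype entry goes first and is stored without compression.
        archive.writestr(
            "mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED
        )
        for file in sorted(source.rglob("*")):
            name = file.relative_to(source).as_posix()
            if file.is_file() and name != "mimetype":
                # Everything else can be deflated as in an ordinary ZIP file.
                archive.write(file, name, compress_type=zipfile.ZIP_DEFLATED)
```

A call such as `pack_epub("book/", "book.epub")` packs the directory. Whether
an e-reader accepts the result still depends on the files inside being valid.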

There is a possibility that law firms would pay for a premium version that
included a crawler and some form of cryptographic validation that could prove
that the EPUB file was created at a certain time, from a certain IP address,
and hasn't been altered since then. The idea being that a trademark owner's
lawyer takes a snapshot of a website selling knockoffs when sending the C&D
letter, then another when filing the lawsuit, and the evidence for the
complaint is preserved without having to rely on static snapshot images of the
rendered website or third-party archive sites.

~~~
zie
Someone already did it for us:
[https://addons.mozilla.org/en-US/firefox/addon/dotepub/?src=...](https://addons.mozilla.org/en-US/firefox/addon/dotepub/?src=search)

------
omgtehlion
Finally, in 2017, when PDF has become ubiquitous, requires no additional
drivers, has a somewhat usable spec, and has a lot of third-party and open
source software to work with it, do we really need to get rid of it?

Bashing PDF is so 2000...

------
emeraldd
My first thought on this is that EPUB is not fully portable either ... it's
just non-portable in a different set of circumstances. If you want to publish
on the internet, for general consumption, just use plain html. That's about as
portable as you can get without moving into raw text.

~~~
ldjb
Plain HTML isn't so good if you want to include images in your document or
have multiple pages. You end up making the user download a whole bunch of
files if they want a copy of the document.

The great thing about EPUB is that all the files are bundled together in a
single .epub file. You can copy the file and move it around without worrying
about keeping the structure intact.

I think that perhaps the main issue currently present with EPUB is that EPUB
readers aren't really part of the standard installation on devices. Pretty
much every PC or smartphone you come across in the wild will have software
installed to view PDFs, but the same isn't the case for EPUB. I think Apple
have done a good thing by including an EPUB reader (iBooks) as part of macOS
and iOS, but that's not the case for other operating systems.

It might be nice if web browsers could natively act as EPUB viewers, in the
same way a number of them natively act as PDF viewers. That way, the user
already has an EPUB viewer installed, and they don't have to go and find and
install one.

~~~
mercer
Now that I think about it I'm kind of surprised that browsers _don't_ natively
show EPUB files.

~~~
robert-zaremba
There are plugins for that (same as you have with PDFs, though you reuse the
same rendering engine, since EPUBs are HTML documents under the hood).

~~~
ldjb
There are plugins, but having to find and install one is an extra step that I
think a lot of people don't take.

It's far different than being able to link to a PDF file and having it
instantly display in the browser. As far as I'm aware, Firefox, Chrome and
Safari (and possibly other browsers) can all display PDFs natively, so I think
it would be convenient if they could also display EPUBs without the need for
additional plugins.

As you say, EPUBs use HTML under the bonnet, so rendering should be pretty
straightforward.

~~~
robert-zaremba
I hope one day all popular browsers will render EPUB straight away. Should be
a lot easier than PDF.

------
scholia
The point of the story is as follows:

 _> PDF is not portable on digital screens. It doesn’t scale. It’s not
comfortable to read PDF files on a mobile or ebook readers_

Arguing that PDFs are good for printing out, or better than some other format,
doesn't actually address the issue ;-)

------
accordionclown
1a. .pdf is a fine format.

1b. unless -- as is increasingly the case these days -- you're reading it on a
screen smaller than the one for which the .pdf was designed, in which case
.pdf is an awful format.

2a. .epub is a fine format.

2b. unless you're reading it in a viewer-app which is wonky, which most of
them are. (the inconsistencies of rendering with this so-called "standard" are
unbelievably bad, and seem to be getting worse rather than better as time goes
on.)

3a. when you try to re-use text by copying it out of a .pdf, you often get
some really bad stuff that loses a lot of important styling.

3b. when you try to re-use text by copying it out of an .epub, it's not much
better.

4a. the standard line is that an .epub is just "a website packaged into a .zip
file", implying that anything you can do on a website can be done in an .epub.

4b. the standard line is a lie. an .epub requires .xhtml rather than .html,
and a complex mess of associated files, and most .epub viewer-apps have
trouble supporting the full gamut of .css, and also do not allow you to use
javascript at all.

conclusion: the state of sharing documents on the web in a way that allows
offline use while enabling the convenient re-use of text is a sad state
indeed.

------
dragonwriter
PDF is perfectly _portable_ in the digital world (the only world in which it
has ever existed.)

It's not perfectly _optimized_ for every display (or print page, the two being
equivalent) size, resolution, etc., but then neither is any other format that
can handle the same range of content, nor will any format _ever_ be until we
have AI layout that does as well as professional layout from a single source
file for all media sizes and properties.

I find professionally laid out PDFs that are designed for letter/A4 size pages
to be superior in practical use to any reflowable format I've yet seen at pretty
much every size for most content more complex than plain linear text like
you'd find in a novel. (Smartphone and smaller devices aren't great for it,
but then they aren't great for reading content more complex than linear text
regardless of format.)

------
icebraining
Most devices include a PDF reader, but not an EPUB reader. Yes, you can
download one, but as a publisher, you can't expect your readers to jump
through that hoop.

~~~
robert-zaremba
An EPUB / MOBI reader is not a problem these days.

~~~
sigzero
That really doesn't speak to his statement. The majority of devices read PDF
by default, whereas for EPUB/MOBI the user needs to go get an app.

~~~
Spivak
10 You should publish PDFs because devices include readers.

20 Devices have readers because everyone publishes PDFs.

30 GOTO 10

------
thinkMOAR
Instead of calling for a 'ban' on a format for very subjective reasons, how
about calling for publication in multiple formats, so that people have a
choice? It is certainly not much more work, and it looks, in my humble
opinion, professional.

~~~
robert-zaremba
Good point! Sorry if my call sounds repulsive. Your idea of creating
publications in multiple formats works. I will update my post for that.
Though, in my post, I want to highlight that simple solutions usually work
fine. Most of these publications are not complex in terms of typesetting. If
there is a need for complex typesetting, then fair enough.

------
unsignedint
I'm having a hard time understanding some of the points this article makes,
particularly the claim that you need to think about typesetting and design
more. Most word processors these days have some type of style system that is
akin to HTML.

PDF is also one of the few readily available formats that has a well defined
archival spec (PDF/A), which further makes it more compatible across readers.
(Essentially, it requires that documents follow certain specs.)

------
robert-zaremba
Do you publish on the Internet? Do you read a lot of publications on your
digital devices?

How about stopping using PDF for internet publications and using EPUB instead
(or some other screen independent format)? Please, share your comments.

~~~
throwaway2016a
Trying to be helpful...

I think you may be getting downvoted because it is generally considered bad
form to submit your own blog unless it is a "Show HN", but if the content is
good it can outweigh that and an article can be upvoted anyway. But if you
do submit your own content it is probably best to let the content speak for
itself vs trying to solicit HN as a discussion forum.

~~~
robert-zaremba
Thanks for the comment. What's the reason for not posting your own article to
content discovery services / aggregators? It's a common thing.

~~~
throwaway2016a
HN is not a typical aggregator.

HN historically is not a marketing site; it's a forum where technologists
share things they find interesting. So any form of self promotion is usually
viewed through a somewhat more critical lens than if it was submitted by a
third party.
Rightly or wrongly the bar is set higher.

People liked your article (it made it to the front page) but your comment
above feels like it is extra "sales" oriented even if the article itself is
not about a product.

If you must comment on your own story you should do it as if you were
commenting on another user's post, in first person. For example, you could
rewrite your comment as:

"I wrote this article because I was frustrated with the poor reading
experience when content is delivered in PDF. What do you all think?"

