
PDF Is the World's Most Important File Format - jbegley
https://motherboard.vice.com/en_us/article/pam43n/why-the-pdf-is-secretly-the-worlds-most-important-file-format
======
crazygringo
I didn't "get" PDFs until I started doing academic research.

But the ability to collect papers, books, documents, etc. all in a single
format that I can read on any device, _and_ mark up with highlights and notes,
has been a game-changer.

Yes it's a lowest-common-denominator format. And it's designed for human
reading and manual office tasks, not computer processing of data. But it
_works_. And it's supported everywhere.

Doesn't matter if I'm on the Apple or Google or Microsoft or Adobe stack.
Doesn't matter if the PDF is 20 years old. It just works.

~~~
arcticbull
Some features, not all of them. Forms in particular are poorly supported in
the Apple stack, and specifically, it can't generate QR codes from form fields
in documents like Reader can. Fonts are usually inconsistent and wrong in
forms. Letter spacing is all wrong. I'm sure there's tons of other things
missing, the spec is ginormous and includes its own version of JavaScript
that's similar to -- but not quite -- ECMAScript, which is a huge attack vector [0].

For instance, check out the Canadian passport simplified renewal form [1]. The
upper-right corner on the first page is "$FORM$054(06-2018)$V$1.4$CS$0$C$0" on
the Apple stack and a proper QR code which changes as you fill in the
application in Reader. The big blue "Read Instructions" buttons don't work on
the Apple stack either.

It may be important, but it's a waking nightmare of a spec.

[0] [http://mariomalwareanalysis.blogspot.com/2012/02/how-to-
embe...](http://mariomalwareanalysis.blogspot.com/2012/02/how-to-embed-
javascript-into-pdf.html)

[1]
[https://www.canada.ca/content/dam/ircc/migration/ircc/englis...](https://www.canada.ca/content/dam/ircc/migration/ircc/english/passport/forms/pdf/pptc054.pdf)

~~~
sfkdjf9j3j
Not that you're wrong about the PDF spec being monstrously complex and not
universally supported, but why would someone want to generate a QR code from
form fields? Who actually _wants_ to use QR codes?

~~~
gibolt
Anyone in Asia, they are hugely popular and super useful.

Adding friends on WeChat, making payments to vendors, getting discounts,
installing an app, etc

~~~
geomark
Yep. In my case add friends on Line, buy subway tix, pay at 7-11 and many
other places. You can also receive payments to your own QR code.

------
timw4mail
PDFs for reading is one thing. Fillable PDFs are evil incarnate. Apparently
there are multiple ways to do it, and only Acrobat Reader is capable of making
them all work.

Then you have abominations like embedded flash...

As a standalone file representing the format of a book, PDFs are a good
format. But then PDFs can unfortunately (sometimes) store much more, and then
can be a security minefield.

~~~
morpheuskafka
PDFs can even contain JavaScript...

~~~
iamnothere
Indeed they can: [https://github.com/osnr/horrifying-pdf-
experiments](https://github.com/osnr/horrifying-pdf-experiments)

(Note, breakout game only works in Chrome's PDF reader)

------
maxxxxx
After spending time extracting data and text from PDF I would also say it’s
the worst file format. We have perfectly fine structured documents, convert
them to PDF to lose half of the information and then we spend insane effort to
get the data back somehow.

It also doesn’t render well on different resolutions.

PDF is a perfect case study in how inferior solutions can become standards.

~~~
svnpenn
PDF isn't and was never meant as a data storage format. It is meant as a
presentation format. You write a file in latex or docx then convert to PDF for
sharing. This means the recipient only needs a PDF reader, not whatever tools
you used to create the file. This is similar to writing a file in C and
compiling to an executable. You _can_ distribute the source, but most (outside
of programmers) just want the executable so they don't have to worry about
the tooling.

~~~
maxxxxx
That’s the problem. It was never meant as what it’s used for now. It’s bad for
manuals, it’s bad for data, it’s bad for mobile devices, it’s bad for
versioning. It’s only good for printing. But somehow it’s used for all of
these.

~~~
Spooky23
Formatting matters in many cases, and PDF respects that better than any other
common format.

I have policy and legal documents from 2005 in PDF/A that can be rendered
identically in 2019, and likely in 2105. That isn't the case for HTML, Word or
almost any non-plaintext format. If for no other reason than the US Federal
Courts require use of PDF, the format will exist and be somewhat vibrant for
many decades to come.

I can wholeheartedly agree that Adobe Reader sucks, but the format solves
lots of problems that are difficult to solve otherwise.

~~~
ghostly_s
How exactly do you think your HTML files from 2005 would render differently
today? Sure, your modern browser engine would probably make slightly different
decisions with fonts and margins, but would that negatively impact the
usability of a legal document, research paper, etc.? Meanwhile, you would have
gotten things like robust search and real compatibility with mobile phones and
e-readers _for free_.

A heavily restricted subset of HTML to replace PDF as the 'archival' format
would make the world so much better of a place.

~~~
Spooky23
Formatting is important — all of the arguments made in favor of HTML can be
made for plaintext. If you don’t care about format and want to read the
document 30, 50, 100 or more years from now, you should use text.

Will HTML formatting on a 4K display look the same as on an 800x600 monitor?

Will all of the ancient display elements display the same? Will IE4 specific
artifacts display?

Search works fine in PDF. Mobile is not optimal, but platforms optimized for
mobile require different design considerations. Few webpages look or function
identically on mobile.

~~~
ghostly_s
_How_ is formatting important though? Semantic formatting, sure, but again, a
20-year-old HTML file still has that preserved. The em-perfect formatting
inherent in PDF seems entirely purposeless unless you for some reason need to
be able to reproduce a printed copy of the document exactly. Formulas are a
special case, but the great majority of documents I see distributed in PDF
would be _more useful_ as HTML, and I don't buy the argument that it would
risk archivability. At the end of the day, HTML is a plain-text format which
is quite human-readable even if all the world's HTML renderers were somehow
lost to the ages.

The only use-case I can conceive for perfectly-reproducible layout if you are
not a print publisher is in fields where it is convention to reference text by
page+paragraph number; in those cases, the page number is actually semantic
information so could quite trivially be encoded in the content of the document
to maintain that referent.

> Search works fine in PDF.

Search works fine in PDFs that don't use hyphenation or which properly
implement it, and when they don't it breaks silently in ways that are
potentially disastrous.
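
To make that failure concrete, here is a minimal Python sketch. It assumes the text layer has already been extracted into a string; the `page_text` sample and the `dehyphenate` helper are made up for illustration and are not part of any PDF library.

```python
import re

# Hypothetical text as it might come out of a PDF text layer, where a
# line-breaking hyphen was stored as a literal "-" followed by a newline.
page_text = "The defendant's implemen-\ntation of the agreement was late."


def dehyphenate(text: str) -> str:
    """Join words that were split by a hyphen at the end of a line."""
    return re.sub(r"(\w)-\n(\w)", r"\1\2", text)


naive_hit = "implementation" in page_text
repaired_hit = "implementation" in dehyphenate(page_text)
print(naive_hit, repaired_hit)
```

The naive search misses the word and reports nothing, which is the silent part. The repair is only a heuristic: it would also glue together a genuinely hyphenated compound that happened to break across lines.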

> Mobile is not optimal, but platforms optimized for mobile require different
> design considerations.

And my point about mobile was _you don't need to optimize for it_.

>Few webpages look or function identically on mobile.

Again, it _doesn't need to look identical_. That is a false requirement.
There is no value in that. The only web pages that don't function on mobile
are ones which have been optimized for desktop. We're not talking about
general-purpose web pages here, we're talking about textual documents.

This is not some hypothetical scenario, BTW. The UK has been using HTML over
PDF for public-facing documents for a couple years now, it seems to be working
out for them[1].

1\. [https://gds.blog.gov.uk/2018/07/16/why-gov-uk-content-
should...](https://gds.blog.gov.uk/2018/07/16/why-gov-uk-content-should-be-
published-in-html-and-not-pdf/)

~~~
ezequiel-garzon
_The only web pages that don't function on mobile are ones which have been
optimized for desktop._

Although it's rare to find them nowadays, this is false for pages with no CSS
at all. By no CSS I mean not even the infamous proprietary viewport meta
tag... which is a posteriori being made CSS. Access for instance the basically
unstyled page [1] with a $1000+ smartphone of our day... and you're likely to
find unreadably small font.

Now, you could argue [1] _functions_ on mobile, but let's agree it's
stretching the meaning of that term. But that's not the main point: HTML
elements come and go (for instance MENU; find others in [2]), so it's clear
that archival reliability is not a big priority.

We all agree that ideally the source should always be made available (mandated
if tax payers' money is involved if you ask me), but that doesn't invalidate
the value of a universal presentational format.

[1] [http://www.qrg.northwestern.edu/papers/files/simhobby-
local....](http://www.qrg.northwestern.edu/papers/files/simhobby-local.htm)

[2] [https://meiert.com/en/indices/html-
elements/](https://meiert.com/en/indices/html-elements/)

------
interlocutor
PDF is based on PostScript, which is a Page Description Language (PDL) also
from Adobe. The first PostScript printer, the Apple LaserWriter, launched the
desktop publishing revolution. The interesting thing about PostScript is that
it is a full programming language, with loops and conditionals and so on. When
Adobe designed PDF, they kept the same imaging model as PostScript, but
stripped out the programming language. Thus they ended up with a dynamic page
description language (PostScript) for media (printers) that cannot fully take
advantage of a dynamic PDL, and a static PDL (PDF) for media (i.e., computer
screens) that could have benefited from a full programming language!

~~~
srean
Postscript is a pretty interesting stack based language on its own. Its size
is intimidating though. If I remember correctly there was a toy web-server
written in postscript just to show that it is possible.

------
irrational
I don't know. I personally think plain text is a more important file format.
It's read and writeable by a million programs. Any first year programming
student can easily build their own program to read and write plain text. It is
fairly easily parseable. Etc.

I just opened a random PDF on my computer in a text editor and it starts off
with "xÕ\ko‹∆˝Œ_1@ÉbD 9|ÌËáƒZçÌDJÇ¢)UZ[nıÚÆÏƒA˛Pˇeœπ"
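
A likely reason for the noise, offered as a sketch rather than a diagnosis of that particular file: page content in a PDF is usually stored in compressed streams (the FlateDecode filter, which is zlib/deflate). The content stream below is a made-up example in PDF's operator syntax, and `looks_like_pdf` is a hypothetical helper.

```python
import zlib

# A tiny page content stream in PDF's own operator syntax (readable text).
content = b"BT /F1 12 Tf 72 720 Td (Hello, world) Tj ET"

# PDF writers commonly store such streams with the FlateDecode filter,
# which is zlib/deflate compression.
compressed = zlib.compress(content)

print(compressed)                   # opaque bytes, much like the noise above
print(zlib.decompress(compressed))  # the readable operators again


def looks_like_pdf(data: bytes) -> bool:
    """Cheap check: a PDF file is expected to begin with the marker %PDF-."""
    return data.startswith(b"%PDF-")
```

So the text is in there, just not in a form a text editor can show.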

~~~
cosmie
Plain text can be a really problematic format for data preservation.

\- While it's fairly easy to read and write plain text, it's also fairly easy
to inadvertently introduce unintended artifacts in the process.

\- The more frequently a file gets passed around and read and written to, the
more likely mojibake[1] will get introduced. This concern rises exponentially
when you move to non-US audiences and introduce locale-specific encodings. File
storage settings, client operating system settings, server configuration
settings, database settings, programming languages that touch it along the
way. All of them introduce assumptions along the way of a file's encoding, and
many failure cases can be subtle and easily go unnoticed at a glance while
causing some irreversible damage to downstream recipients.

\- Even if you solve for the encoding, you still have structural issues with
tabular data. Different parsers treat escaping and quoting policies
differently. This can result in data shifts as things get mis-parsed, data
corruptions if literal values get interpreted as escape characters or vice
versa, etc.
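
A small Python illustration of the quoting point, using an invented record. The same line yields a different number of fields depending on whether the parser honours double quotes.

```python
import csv
import io

# One record with three fields; the middle one contains a comma and is quoted.
line = 'Smith,"Portland, OR",42'

naive_fields = line.split(",")                    # ignores quoting rules
csv_fields = next(csv.reader(io.StringIO(line)))  # honours double quotes

print(naive_fields)  # 4 fields: the address got cut in two
print(csv_fields)    # 3 fields
```

Every column after the split address shifts by one in the naive version, which is the kind of data shift described above.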

For preserving data, generic plain text tends to get worse and worse over time
because it's such a non-opinionated format and even if you document the
specifics on encodings and parsing details it's easy for those to get lost
over time as things exchange hands or for intermediaries to corrupt the
plaintext because they relied on defaults instead of the documented parsing
details.

For better or worse, PDF tends to solve the preservation issue while
introducing potential barriers on the parsing/processing side.

[1]
[https://en.wikipedia.org/wiki/Mojibake](https://en.wikipedia.org/wiki/Mojibake)

~~~
upofadown
>mojibake

These days, those that are promoting the idea of plain text as a long term
archive format are assuming UTF-8 by default.
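
Part of why that assumption helps is that strict UTF-8 decoding tends to fail loudly instead of producing garbage. A rough Python sketch of both directions, with an arbitrary sample string:

```python
original = "naïve café"

# Written as UTF-8, then read back by a tool that assumes Latin-1.
raw = original.encode("utf-8")
garbled = raw.decode("latin-1")
print(garbled)

# Going the other way: Latin-1 bytes read as strict UTF-8 raise an error.
legacy = original.encode("latin-1")
try:
    legacy.decode("utf-8")
    failed = False
except UnicodeDecodeError:
    failed = True
print(failed)
```

The first case is classic mojibake and goes unnoticed unless someone looks. The second at least tells you something is wrong.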

------
jerzyt
My gripe with PDF is that I don't understand why a standard format which is
almost 30 years old, requires seemingly weekly updates of the Acrobat Reader,
which in turn requires reboot of my work laptop. I upgrade the reader far more
often than I actually use it.

~~~
SigmundA
Nowadays you typically don't need Acrobat Reader: Chrome, Edge and Firefox can
all view/print PDFs. If you're on a Mac/iOS the viewers are built in, as the OS
graphics layer (Quartz) uses an imaging model that matches up pretty closely
with PDF.

~~~
bluedino
Chrome/Firefox don't handle complex PDFs well. They render slowly and
sometimes inaccurately.

It's not uncommon to print things from Chrome/Firefox and have the
margins/cropping be wrong. Then again you're using a web browser to print a
pdf file, you get what you deserve

~~~
SigmundA
I very rarely run into this in Chrome, they licensed Foxit and made it open
source
[https://opensource.google.com/projects/pdfium](https://opensource.google.com/projects/pdfium)
it's native and fast.

Firefox I think still uses
[https://github.com/mozilla/pdf.js](https://github.com/mozilla/pdf.js) which
was not bad last time I used it, but it's all JavaScript so performance isn't up
to par with native. Also printing is not great and I don't think they have
implemented an SVG backend yet for better printing. On the plus side you can
embed it in your web app if you want and have more control over the viewer.

------
nathan_f77
I run a PDF generation service [1], so this is nice to read. When I first
launched the service, I was worried that PDFs and paper forms might become
obsolete in the near future, when everyone starts to go paperless and digital.
Now that I'm more familiar with the market, this is no longer a concern. (I
have banks who are using the software to modernize their operations.)

I also realized that there might be some pressure to turn into the next
TurboTax, where the company eventually lobbies against improvements just so we
can stay in business. I made a resolution that I'll never do anything like
that. But I guess the founders of TurboTax never intended to do that either.

[1] [https://formapi.io](https://formapi.io)

~~~
nojvek
I love little profitable businesses like this. Seems like you’re living the
dream

------
bluedino
I love PDFs. The first time I used one was Acrobat 2.0. I bought a shitty SAMS
computer book that came with a bonus shitty SAMS computer book on the CD. As a
PDF of course.

It was cumbersome on a Pentium 75MHz with 640x480x8 graphics. I'd be blown
away 20 years later viewing those same files on a Macbook Pro with Retina
display.

However, it was easily the best way to distribute printed documents EXACTLY
the way they were meant to be seen.

They weren't meant to be edited, modified, data extracted from... And Adobe
went from a quick, minimal viewer to a bloated security nightmare by adding
'features'.

Luckily, 3rd party and open-source projects came to the rescue.

------
nyfresh
PDF is the perfect tool for maintaining document formatting but the worst tool
for maintaining document data. The format is not concerned with the actual data
of the document. Programmatically extracting data from the simplest PDF is an
exercise in patience. I find it kind of odd that this was chosen as the
standard all things considered

------
djhworld
I sometimes read papers but I find them an absolute pain to read on my
phone because the two aren't really designed for each other. So I end up with
this frustrating experience of zooming in/out to read the bits I'm interested
in - which is impossible to do one handed!

I know there's that arxiv vanity thing, which is cool, but most of the time I
get the "sorry we can't render this" error message.

------
jyriand
PDF is fine if you are using a laptop/desktop/tablet. But it doesn’t fit well
into phone screens. Usually I have to turn my phone horizontally and then zoom
in a little bit. And when I reach the end of the page I have trouble turning it.
And sometimes page turning messes up my perfect zoom fit etc. Responsive PDF
is what I need...

~~~
shanhaiguan
That’s not a PDF issue though: it’s just an issue of most content being the
same dimensions as real world paper sizes. No idea how responsive PDF would
work but I agree that would be cool.

~~~
colejohnson66
People in these comments keep mentioning a responsive PDF, but PDF is supposed
to be a presentation format. Wouldn’t HTML and CSS serve the purpose fine?

~~~
akvadrako
You can't represent vector graphics, complex math, or anything non-standard in
HTML+CSS. Even if you could, you couldn't rely on the user seeing it
correctly.

~~~
colejohnson66
Aren’t SVG and MathML standard in HTML5?

~~~
akvadrako
Interesting; I'm not sure if it counts as "in HTML5" but I see they are
referenced in the spec.

It's still not something you can rely upon.

------
anon1253
It's also actively harming medical and scientific progress as I allude to here
in this talk
[https://www.youtube.com/watch?v=EM61rn9Gxl4&list=PLjzcwcP9P2...](https://www.youtube.com/watch?v=EM61rn9Gxl4&list=PLjzcwcP9P2Lfcz38lDtMF_XupRnecKokz&index=12)

~~~
mkl
Can you summarise your point about that here?

------
bla3
This would've been a lot more convincing before mobile happened a decade ago.
Reading PDFs on my phone is a pain, and it hasn't improved at all in the last
ten years. Looking at the web on mobile used to be painful too, but it's ok
now.

~~~
casefields
What phone? I'm still using an iPhone 6S and they are seamless. Some textbooks
that are like 300+ mb start having hiccups though. Probably due to my oldish
phone.

~~~
bla3
They're fast, but I have to scroll and zoom around a lot.

------
wyld_one
It is not open data.

You still can have proprietary blocks of info inside the file.

The lack of open source tools to manipulate the format is a major hindrance
IMHO.

It is also very space-wasting when people only do a bitmap dump into
the file for scans. Forms are also an area that is not open source either.

It has so many hacks and kludges, it would be better if it was trashed and we
start with postscript again.

------
lixtra
What happened to HTML? Nobody could read nor discuss about PDF here without
it.

~~~
SigmundA
Let me know when browsers support the full CSS print spec...no Prince doesn't
count.

~~~
lixtra
Tell me when PDF can reasonably reflow in all environments/devices.

~~~
creatornator
The whole point of a pdf is to display _exactly_ the same across all devices.
For something like a scientific research paper, you want all figures, etc., to
remain in the same place to avoid formatting oddities.

~~~
lixtra
That’s an important feature, just like reflow. But does the former make PDF
_World’s most important file format_?

~~~
speedplane
>> The whole point of a pdf is to display _exactly_ the same across all
devices.

> That’s an important feature, just like reflow. But does the former make PDF
> World’s most important file format?

It's not just that PDFs are displayed the same across devices, it's that it's
displayed the same over time. I can open a PDF generated in 1995 and it will
look identical today as it did then. The same can't be said about HTML or Word
documents generated in 1995.

------
ris
The thing I find craziest about PDF/A is it isn't really a format in itself,
just a vague promise not to use certain features in the ensuing file. Whether
any reader holds the file to that promise is something I'm quite doubtful of.
Instead I suspect most readers will do their best to display anything they're
handed, happily passing it through any of the hundred-or-so, possibly legacy-
code-powered sub-format decoders the file author wishes - leading to a massive
attack surface.

From a developer's point of view, when trying to _enforce_ that submitted
files are strictly in PDF/A format, from what I can tell there isn't much more
you can do than dissect the file looking for umpteen disallowed features.
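
For a sense of what the do-it-yourself route looks like, here is a deliberately naive Python sketch that scans raw bytes for a few name tokens tied to features that, as far as I know, PDF/A-1 disallows. The token list, the `find_risky_names` helper and the sample bytes are illustrative assumptions, not a validator.

```python
import re

# Name tokens for features that PDF/A-1 disallows. This list is illustrative
# and far from complete.
RISKY_NAMES = [b"/JavaScript", b"/JS", b"/Launch", b"/EmbeddedFiles", b"/Encrypt"]


def find_risky_names(data: bytes) -> set[str]:
    """Return the risky name tokens that appear literally in the raw bytes."""
    found = set()
    for name in RISKY_NAMES:
        # The lookahead avoids matching a longer name such as /JSFoo.
        if re.search(re.escape(name) + rb"(?![A-Za-z0-9])", data):
            found.add(name.decode("ascii"))
    return found


sample = b"%PDF-1.4\n1 0 obj\n<< /S /JavaScript /JS (app.alert(1)) >>\nendobj\n"
print(sorted(find_risky_names(sample)))
```

A clean result here proves very little: names can be written with `#xx` hex escapes, and objects can sit inside compressed object streams where a byte scan never sees them. That gap is exactly why dissecting the file by hand is unsatisfying.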

 _Is there_ an ISO-compliance validator to anyone's knowledge?

~~~
mkl
Yes! [https://verapdf.org/](https://verapdf.org/)

~~~
ris
Thankyou!

------
jdblair
I remember when PDF came out, I didn't get it. I thought the problem was
already solved by compressed postscript files! I was already used to
downloading and sometimes printing paper documentation in this format.

It was natural to view Postscript files on NeXT and UNIX machines, and
Ghostscript was already a thing. What could be better than just using the
"native" language of the printer? I didn't realize that this was not a common
view, or even possible for most personal computers at the time.

I was also misinformed for quite some time about the internal format of PDF,
assuming it was just PS wrapped up in a container. In a sense, this is true,
but there's a lot more (embedded fonts, transparency, forms are just a few
that come to mind).

~~~
mkl
> I was also misinformed for quite some time about the internal format of PDF,
> assuming it was just PS wrapped up in a container. In a sense, this is true,
> but there's a lot more (embedded fonts, transparency, forms are just a few
> that come to mind).

There's also a lot less, as PDF is not a full programming language like PS.

------
levesque
Possibly also the most hated file format. Just the fact that no two PDF
editors behave the same way goes to show how bad of a format it is.

------
bondolo
PDFs are an accessibility nightmare and most production pipelines are terrible
at preserving the semantic structure of the document or, in many cases, even
preserving the text. A PDF is in many cases a series of page images which
aren't usable for anything other than human viewing or printing. "Export as
PDF" generally produces much better results than print to PDF though not from
every application. Many days I wish there was an alternative solution based on
SVG. While not perfect it certainly would avoid many of the problems of PDF
while having all of the important capabilities.

------
KingMachiavelli
I really hate PDF for many reasons. It seems it's only a partially open
format, a lot of features are implementation dependent (how can a file be
'locked' to prevent printing or editing?). There are very few free and FOSS
clients that handle forms, highlighting, etc. Some clients do highlighting and
annotations but don't save it to the PDF itself.

The failure of epub and other HTML based formats in this use case, IMO, is
that their focus on reflowing to support any display and device makes them
inconsistent and therefore impractical for replacing PDF based content.

------
UglyToad
I've been working on an open source PDF library for C# [1] and it's given me an
immense appreciation for the PDF format.

Sure it's horribly long, complex, comes with vulnerabilities and different
consumers have different behaviour. But given the constraints of machines at
the time it was created and the wide range of requirements and usages it's
pretty damn good and has stood the test of time.

[1]: [https://github.com/UglyToad/PdfPig](https://github.com/UglyToad/PdfPig)

~~~
walterbell
Could this be used to remove potentially-malicious content from a PDF, e.g.
anything executable?

~~~
snailmailman
It’s a different software entirely, but QubesOS has an interesting way of
removing malicious content from a pdf. It basically opens the document in a
throwaway VM, then the host takes a high resolution screenshot and makes a new
pdf from scratch.

The digital equivalent of physically printing and re-scanning the document.

------
ashelmire
Even among non-technical people this isn't true. Those people use excel and
word far more than pdf. For devs, we're all about various text formats.

------
Adamantcheese
PDF is great! For human reading. But it's garbage for everything else, and, as
I found out from a bunch of frustrating job application efforts, absolute
shit for resumés. Literally not parseable, even with OCR apparently. I don't
even know WHY any job application even accepts a PDF for a resumé. It
shouldn't be allowed to if there's any sort of post-processing done to it to
extract information. A bunch of applications didn't even let me see what it
extracted, which, depending on the PDF to text application they're using, may
spit out absolutely nothing from an entire page of text. That's right, a whole
page converted to a single empty line. Marvelous. Truly a technological marvel
that's helping the "it's hard to get a job" feeling go away.

~~~
forrestthewoods
PDF is the one and only format any resume should ever be submitted in.

It's the only way to reasonably guarantee someone will be able to open and
read a resume in its intended form.

~~~
Adamantcheese
Only from my view, once it's automatically parsed, nobody is going to look at
it, because either it'll automatically throw it out from the parsed data (in
which case a blank resume is an obvious throw away), or it'll do some keyword
parsing and determine you haven't copy/pasted enough. The second one is a
problem with all of those systems though. After that point it'll pass it on to
the hiring manager, at which point they'll look at it regardless of format.
PDF then is simply not the best choice if it gives the highest chance of
failure at the first step. A Word document is a much better choice.

~~~
forrestthewoods
> A Word document is a much better choice.

It really truly isn't. Anyone on any device can reliably open a PDF. That is
not true for Word files.

Funny enough, this thread made me want to look at my old resumes. I have a
.doc resume from 2009. You know what happens when I double-click it? Nothing.
I don't have anything that can view it! Windows 10 doesn't come with a preview
tool for doc files. Chrome/Firefox can't preview it either.

~~~
Adamantcheese
I feel you're missing my point here. A PDF is bad for the specific case of
inputting a resumé into an applicant tracking system, due to the myriad of
things that could go wrong without notifying the applicant of any error,
discarding their application without a second glance by a human. A Word
document makes the likelihood of system discard due to error much smaller. A
docx file is just a zipped file, which contains a document.xml that can be
easily read by any regular text editor. It's got all the style information in
it, but it's at least more easily computer parseable than a PDF. A doc file is
not zipped: it's an OLE compound binary file with a WordDocument stream inside.
Most of it is binary, though a text editor can often still show runs of the
text among the garbage. Still more easily parseable than a PDF.
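
As a rough illustration of the docx half of that claim, here is a Python sketch using only the standard library. It builds a toy stand-in for a .docx in memory (a real file written by Word has more parts) and pulls the paragraph text out of word/document.xml. The `docx_paragraphs` helper and the sample content are made up for the example.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Build a toy stand-in for a .docx in memory: a zip holding word/document.xml.
# A real file written by Word has more parts (content types, styles, etc.).
document_xml = (
    f'<w:document xmlns:w="{W_NS}"><w:body>'
    "<w:p><w:r><w:t>Jane Doe</w:t></w:r></w:p>"
    "<w:p><w:r><w:t>Software Engineer</w:t></w:r></w:p>"
    "</w:body></w:document>"
)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("word/document.xml", document_xml)


def docx_paragraphs(data: bytes) -> list[str]:
    """Return the text of each paragraph in word/document.xml."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(f"{{{W_NS}}}p"):
        runs = [t.text or "" for t in p.iter(f"{{{W_NS}}}t")]
        paragraphs.append("".join(runs))
    return paragraphs


print(docx_paragraphs(buffer.getvalue()))
```

The paragraphs come out as paragraphs, in order, because the format stores them that way.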

------
qrbLPHiKpiux
To me, a pdf is only a digital version of a paper version. Just to look at,
present... nothing more. Data duplication, extraction always had to be done
manually. I could never extract text correctly. So I don’t anymore. I just
accepted it. As I did with a lot of other things.

------
shadowtree
.xls runs half of the world economy.

------
lo_fye
Not moreso than .txt

------
xvilka
And at the same time PDF Forms[1], animation[2] and 3D extensions are poorly
supported in FOSS implementations - poppler, mupdf, etc.

[1]
[https://gitlab.freedesktop.org/poppler/poppler/issues/463](https://gitlab.freedesktop.org/poppler/poppler/issues/463)

[2]
[https://gitlab.freedesktop.org/poppler/poppler/issues/683](https://gitlab.freedesktop.org/poppler/poppler/issues/683)

------
alexhutcheson
Author must not realize how many businesses, government organizations, etc.
have operations that are completely dependent on .xls files that someone first
created in the late 90s.

------
dana321
[https://www.youtube.com/watch?v=-cFOsAzigyQ](https://www.youtube.com/watch?v=-cFOsAzigyQ)

------
hieronymusN
For more info on the PDF format: [VIDEO] Programming data for display, the PDF
Story by Chas Emerick -
[https://www.youtube.com/watch?v=MAki8C6qFHY&list=PLGRqfvsPiR...](https://www.youtube.com/watch?v=MAki8C6qFHY&list=PLGRqfvsPiRShRY94F1p_3TT1HMoipcc3c&index=6&t=0s)

------
stcredzero
Are there any people going around and harvesting data of all the PDFs in the
world, using Machine Learning to clean, collate, and localize the data?

------
topicseed
Surely database file formats are at least as important to hold whatever data
the World needs to go round.

------
johnvega
Microsoft tried to compete with XPS, the XML Paper Specification, around the
mid-2000s but did not succeed.

------
boromi
Adobe acrobat reader has been atrocious. I'm so glad I switched to sumatra
this year.

------
aktuel
I couldn't disagree more. I almost never use PDFs. I read a lot of books on my
mobile phone and PDFs are basically unusable there due to their fixed layout.

To say in 2019 that a file format without mobile support is the most important
is moronic.

------
AtlasBarfed
Worse is better strikes again? I'm not talking about SGML

------
781
If they said "Most Important DOCUMENT File Format", maybe, though even then
you have stuff like JPEG.

But arguably executable file formats (.exe, .so, ...) are more important
overall.

~~~
c0vfefe
Having some executable file is a prerequisite for opening any other!

------
thatoneuser
I guess if you wanna claim that, but it's a shit format that causes way more
pain than convenience for most people. Just today I had to upload sensitive
docs to a server to edit it because I can't locally and refuse to pay to do
so. Fuck pdf.

~~~
sbuk
It’s not designed to be edited.

~~~
thatoneuser
Oh so all those fillable pdfs are just hacks then right?

------
revskill
Archiving without extracting is kind of one-way to me.

I prefer a long JSON file, which I can read from or write out to whatever file
format I want, over a hard-to-extract format such as PDF.

~~~
owenmarshall
> a hard-to-extract format such as PDF

What's hard about it? It's an open standard with libraries in every language
known to man.

If you use it as a dumb wrapper for scanned images, that's going to suck. But
as a way to store and faithfully reproduce nicely typeset documents with
images - ie, to make an archival copy - I don't think it can possibly be beat.

~~~
krastanov
I think the comment was about bulk extraction of data. For instance, it is
pretty difficult to do bibliographic studies if all the articles you have are
provided only as PDFs (even if it is text pdf). For starters, I do not think
PDFs have a notion resembling a "paragraph of text", because every symbol is
placed separately with its own unique coordinates.

Extracting tables of numbers from PDFs is also a pain.
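
To show what "placed separately with coordinates" means for whoever has to put the text back together, here is a Python sketch. The `fragments` list is hypothetical extractor output and `group_into_lines` is an invented helper. Real extractors also have to deal with columns, rotation and font metrics.

```python
from collections import defaultdict

# Hypothetical output of a PDF text extractor: (x, y, text) fragments in the
# order they appear in the file, which need not be reading order.
fragments = [
    (72.0, 700.0, "Extracting"),
    (300.0, 686.1, "pain."),
    (130.0, 700.2, "tables"),
    (72.0, 686.0, "is"),
    (90.0, 686.0, "a"),
]


def group_into_lines(frags, y_tolerance=2.0):
    """Bucket fragments by baseline rounded to the nearest y_tolerance, then
    order lines top to bottom and fragments left to right."""
    buckets = defaultdict(list)
    for x, y, text in frags:
        buckets[round(y / y_tolerance)].append((x, text))
    lines = []
    for key in sorted(buckets, reverse=True):  # PDF y grows upward
        words = [text for _, text in sorted(buckets[key])]
        lines.append(" ".join(words))
    return lines


print(group_into_lines(fragments))
```

Even this only recovers lines, and the rounding can split a line whose baselines straddle a bucket boundary. Paragraphs and table cells need further guesswork on top, since nothing in the fragments says where one ends.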

~~~
maxxxxx
Exactly. I have written software for legal document search. Most of what they
get is in PDF and it’s a major PITA to get data out of them. Forget about
tables. Just try to extract text without some garbled characters and you will
lose your mind.

