A lot of people here seem to knock PDF, but I love it. Anyone who has tried to u...

aikah · on Sept 14, 2020

> A lot of people here seem to knock PDF, but I love it.

People with disabilities who rely on screen readers don't love it. There is no such problem with HTML/CSS, which should be the norm for internet documents.

> With PDF, everyone sees the same thing.

Yes, provided you can see in the first place...

jacquesm · on Sept 14, 2020

> There is no such problem with HTML/CSS, which should be the norm for internet documents.

It should be. But meanwhile everybody seems to think it is perfectly ok that there is a bunch of JavaScript that needs to run before the document will display any text at all and how that text makes it into the document is anybody's guess.

ehnto · on Sept 14, 2020

It was an amazing shift in priorities that I feel like I somehow missed the discussion for. We went from being worried about hiding content with CSS to sending nothing but script tags in the document body within 5 years or so. The only concern we had when making the change seemed to be "but can Google read it?". When the answer to that became "Uh maybe" we jumped the shark.

My bashful take is that nobody told the rest of the web development world that they aren't Facebook, and they don't need Facebook-like technology. So everyone is serving React apps hosted on AWS microservices filled in by GraphQL requests in order to render you a blog article.

I am being hyperbolic of course, but I was taken completely off guard by how quickly we ditched years of best practices in favour of a few JS UI libraries.

czechdeveloper · on Sept 14, 2020

This complaint can be applied to paper too. PDF is not much more than a precise document to be printed or presented with exact visual fidelity.

monadic2 · on Sept 14, 2020

Is PDF not supposed to improve on paper...? I'm rather surprised at the revelation that PDF is not accessible.

dirkt · on Sept 14, 2020

PDF is tied to page layout. PDF is a way to digitally describe something that's intended to be printed to paper, on a sheet of a certain size.

And as a format, it's much more sane than, say, Word or Excel.

Even if the focus on "where do I put this glyph" means the original text isn't in there by default.
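A rough sketch of what that looks like in practice, assuming a hand-written, hypothetical content stream fragment (not taken from a real file) and a toy regex that only understands the `Td` and `Tj` text operators. Real PDFs also use font-specific encodings, `TJ` arrays, escapes, and compression, none of which this handles. The point is only that the stream stores positioned glyph runs, so the space between words may exist solely as a gap in coordinates:

```python
import re

# Hypothetical content stream fragment, written by hand for illustration.
CONTENT_STREAM = """\
BT
/F1 12 Tf
72 712 Td
(Hello) Tj
40 0 Td
(world) Tj
ET
"""

# Matches either "<x> <y> Td" (move the line start) or "(<string>) Tj" (show a string).
# Toy pattern: no escapes or nested parentheses inside strings.
TOKEN = re.compile(
    r"(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s+Td|\(([^()\\]*)\)\s*Tj"
)


def positioned_runs(stream):
    """Return (x, y, text) for each Tj, tracking only cumulative Td offsets."""
    x = y = 0.0
    runs = []
    for match in TOKEN.finditer(stream):
        if match.group(3) is None:
            # Td offsets are relative to the start of the current line.
            x += float(match.group(1))
            y += float(match.group(2))
        else:
            runs.append((x, y, match.group(3)))
    return runs


runs = positioned_runs(CONTENT_STREAM)

# Concatenating the strings in stream order loses the word break,
# because no space character was ever stored.
naive_text = "".join(text for _, _, text in runs)
```

Here `naive_text` comes out as `Helloworld`. An extractor has to guess from the x positions and the font metrics that a word break was intended, which is one reason extraction results vary between tools.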

monadic2 · on Sept 14, 2020

Yea, but that's a terrible explanation for lack of basic accessibility in 2020. Literally just laziness.

FWIW this is not a technical barrier; it would be absolutely trivial to associate blocks of non-flowed text with the laid-out text.

czechdeveloper · on Sept 14, 2020

My use is exclusively "this will end up on paper in a minute", so any improvement would be irrelevant.

Why would I want a document type that can't even reflow to the display size to represent any longer written text that is supposed to be consumed on a digital device?

monadic2 · on Sept 14, 2020

I would ask a blind person.

IncRnd · on Sept 14, 2020

Adobe Reader has an option to Read Aloud PDF files. I don't know how well or poorly that works, but I'm writing this comment just in case you were not aware of that function.

z3t4 · on Sept 14, 2020

The problem is that you can't parse PDFs reliably. Half of the time they are just bitmap images from a scanner.

jlokier · on Sept 14, 2020

Some Word documents are like that too! Can't blame the PDF format if the source material is a bunch of scans.

daveFNbuck · on Sept 14, 2020

HTML/CSS has a similar problem. A lot of my email is just a collection of images that contain text.

franga2000 · on Sept 14, 2020

I absolutely hate those, but the HTML spec does require the alt attribute which is actually used in practice quite commonly.

daveFNbuck · on Sept 14, 2020

I've never read the HTML spec directly until checking it now to verify your comment. I usually use MDN, which says that the alt attribute is not mandatory. [1]

[1] https://developer.mozilla.org/en-US/docs/Web/HTML/Element/Im...
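For anyone who wants to check their own inbox, here is a minimal sketch using only Python's standard `html.parser`. The sample markup, the class name `MissingAltFinder`, and the helper `images_without_alt` are made up for illustration. It flags `img` tags whose `alt` is missing or empty; an empty `alt` is legitimate for purely decorative images, so treat the result as a heuristic, not a conformance check:

```python
from html.parser import HTMLParser

# Hypothetical email body: three images, only one with useful alt text.
SAMPLE_EMAIL = """
<html><body>
<img src="header.png" alt="Spring sale: 20% off all orders">
<img src="body.png">
<img src="footer.png" alt="">
</body></html>
"""


class MissingAltFinder(HTMLParser):
    """Collects the src of every <img> whose alt is missing or blank."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr_map = dict(attrs)
        # A valueless attribute is reported as None, so normalise to "".
        if not (attr_map.get("alt") or "").strip():
            self.missing.append(attr_map.get("src"))


def images_without_alt(html):
    finder = MissingAltFinder()
    finder.feed(html)
    finder.close()
    return finder.missing


flagged = images_without_alt(SAMPLE_EMAIL)
```

With the sample above, `flagged` contains `body.png` and `footer.png`, which are exactly the images a screen reader would have nothing to say about.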

bregma · on Sept 14, 2020

Non-conforming email should be filtered by the scam filter before it ever gets to your MUA.

daveFNbuck · on Sept 14, 2020

What is that email not conforming to?

nine_k · on Sept 14, 2020

PDF is digital paper.

You like it for that very quality: immutable, reproducible rendering.

Those who have to extract data from PDFs face nearly the same problem as those who have to deal with paper scans: no reliable structure in the data, the source of truth is the optical recognition, by human or by machine.

silisili · on Sept 14, 2020

I agree, but my argument here is that it's up to the producer. They obviously wanted it to be a digital paper, for some reason, and not a data mining source. We should blame the producers, not the format. It's equivalent to saying it's hard to get the source code from the pesky exe files people distribute, so exe is a mess.

thayne · on Sept 14, 2020

PDF is great for what it was originally designed for: a portable format for instructing printers on how to print a document. The problem is people using it in ways it wasn't designed for. Sharing a PDF of your document is about as useful as sharing an SVG export of your document (actually, an SVG probably has more semantic information). It is a vector image format, not a document format.

pathseeker · on Sept 14, 2020

> They obviously wanted it to be a digital paper, for some reason, and not a data mining source.

"Because it's pretty" is it. 99% of people don't care about text being a data mining source.

ntucker · on Sept 14, 2020

The original goal of PDF was to create documents that could "view and print anywhere" (literally the original tagline of the Acrobat project), substantially the same as how the document creator intended them. What Adobe was trying to solve was the problem of sending someone a document that looked a particular way and when they rendered it on their printer or display, it looked different, e.g. having a different number of pages because subtle font differences caused word-wrapping to change the number of lines and thus the page flow. It wasn't about it being "pretty," it was about having functional differences due to local rendering and font availability. In this regard, the format is an emphatic success.

I do wish they had focused a bit more on non-visual aspects such as screen-reader data, but to say the whole point is "because it's pretty" is a bit uncharitable. The format doesn't solve the problem you wish it solved, but it does solve a problem other than making things "pretty."

bee_rider · on Sept 14, 2020

Alternatively, "the journal only accepts LaTeX."

I quite like PDFs, but this thread has been an eye-opener.

bscphil · on Sept 14, 2020

> Anyone who has tried to use OpenOffice full time probably does too.

I agree, I do too (LibreOffice), but for the opposite reason. Even internally, the font rendering in LibreOffice with many fonts is often quite bad. This is especially noticeable for text inside graphs in Calc.

If I'm going to read something lengthy that's a LibreOffice document, I open it (in LibreOffice), and export it to a PDF. LibreOffice consistently exports beautiful PDFs (and SVG graphs), which tells me that it "knows" internally how to correctly render fonts, just that its actual renderer is quite bad.

TwoBit · on Sept 14, 2020

Is the renderer dealing with the classic small text glyph hinting problem?

bscphil · on Sept 14, 2020

Could be. I'm not sure what the issue is. Firefox, Chrome, and basically every other thing I use works fine.

Spooky23 · on Sept 14, 2020

The audience here is developers and other geeks who get stuck dealing with PDFs. The issue when you read into it is usually about structured data delivered via PDF — which I would wholeheartedly agree is a monstrous and unnecessary misuse of the format.

The other thing that is unfair is assholes who deliver tabular data in PDF format usually don’t want you to have it. When your county clerk prints a report, photocopies it 30 times, crumples it and scans to PDF without OCR, that’s not a file format issue.

Wowfunhappy · on Sept 14, 2020

Yes, thank you! I have exactly the same feelings, because I like to write in old versions of iWork. With a PDF, I know that whatever I export will look the same for whoever I send it to.

I sometimes see people complain about how PDF sucks because it doesn't look quite the same everywhere (namely, non-Adobe readers), but if you're not doing anything fancy it pretty much does. It is, at minimum, more reliable than any other "open" format I'm aware of, save actual images.

willvarfar · on Sept 14, 2020

The problem you have is that it's likely, increasingly likely, that your CV is the exact document that will next be 'read' by a computer rather than a human.

I know something about that area. Today, perhaps a 10th of CVs are sorted and prescreened by software. That fraction will only increase.

IncRnd · on Sept 14, 2020

As someone who actually used to program in PostScript, I am happy as a clam with PDFs!

There are two issues with parsing them however.

  1) PDF is an output format and was never intended to have the display text be parseable.
  2) PDF is PostScript++, which means that it is a programming language.
     This means that a PDF is also an input description to the output that we
     are all familiar with seeing on a page.

PS I don't know if it is the case anymore, but Macs used to have a display server that handled all screen images in PDF format. That was an optimization from the NeXT display server, which displayed using Display PostScript.

tonyedgecombe · on Sept 14, 2020

>PDF is PostScript++, which means that it is a programming language.

The big change that came with PDF was removing the programming capabilities. A PDF file is like an unrolled version of the same PostScript file. There is still a residue of PostScript left but in no way can it be described as a programming language.

IncRnd · on Sept 14, 2020

PDF is absolutely a programming language. It is not a general purpose programming language but a page description language. You are referring to looping constructs and procedures being removed, but a loop does not a language make. Similarly, LaTeX and sed are programming languages.

tonyedgecombe · on Sept 14, 2020

What features do you think make it a programming language? Because I have spent quite a bit of time working with it and all I can see is a file format.

IncRnd · on Sept 14, 2020

"A programming language is a vocabulary and set of grammatical rules for instructing a computer or computing device to perform specific tasks."

A programming language is not inherently a programming language due to features it contains but due to it being used to program. A program is "a series of coded software instructions to control the operation of a computer or other machine."

In this way, a PDF file embodies a program that performs specific tasks. A PDF file does not contain a general purpose programming language, but it does contain the page description language of the output format that describes what is to be imaged. Then, the PDF program is given to an interpreter that displays the output.

This is the same as a simple program in turtle graphics to display a rectangle, even if no other language feature was used. In such a case, one would say that rectangle was programmed. We would not use the word program in connection with that turtle graphics program, if the rectangle description were not sent to an interpreter that displayed the rectangle.

tonyedgecombe · on Sept 15, 2020

By that definition notepad is a programming language because it can open and show a text document.

JadeNB · on Sept 14, 2020

> PS I don't know if it is the case anymore, but Macs used to have a display server that handled all screen images in PDF format. That was an optimization from the NeXT display server, which displayed using Display PostScript.

Quartz! https://en.wikipedia.org/wiki/Quartz_(graphics_layer)#Use_of...

IncRnd · on Sept 14, 2020

AHA! Thank you.

taneq · on Sept 14, 2020

Ugh, yeah I use LibreOffice for all my internal stuff but I have to keep MS Office installed for editing externally-visible documents, so I can be (somewhat) sure the formatting isn't going to get screwed up.

Finnucane · on Sept 14, 2020

PDF is very good for what it was designed to do, which is to represent pages for printing. It is not so good for use cases where parsing the text is more important than preserving layout.

pmiller2 · on Sept 14, 2020

Indeed, I often say there's a special place in Hell where there are programmers trying to extract data from PDFs.

paledot · on Sept 14, 2020

The souls who labour there, in life, posted PDFs to websites when HTML would have sufficed.

DarkWiiPlayer · on Sept 14, 2020

> With PDF, everyone sees the same thing.

SVG does that too, but it can also have aria tags to improve accessibility and have text that can be extracted much more easily.

arnvald · on Sept 14, 2020

> everyone sees the same thing

Mostly. I've seen issues where a PDF looked fine on a Mac but not on Windows.

Also, the fact that you see the same thing everywhere is good if you have one context of looking at things, e.g. if everyone uses a big screen or if everyone prints the document, that's fine. But reading PDFs on e-book readers or smartphones can be a nightmare.

leephillips · on Sept 14, 2020

If the PDF uses a font that the creator neglected to embed, the reader’s system will have to supply the font, which could be a substitute. This is the only case I’ve seen where the PDF did not render exactly the same on all systems.

icelancer · on Sept 14, 2020

I like PDF a lot. It's got its drawbacks but the universal format is really helpful for layout-driven stuff. Shrug. It gets it done.

iovrthoughtthis · on Sept 14, 2020

And yet everyone is different, on different devices.

Why should they all see the same thing?

Using PDF here is self-serving. It’s actively user-hostile.