A lot of people here seem to knock PDF, but I love it. Anyone who has tried to use OpenOffice full time probably does too. We have 'descriptive text' in MS various formats, or even html/css. The problem is every implementer does things in slightly different ways. So my beautiful OpenOffice resume renders with odd spacing and pages with 1 line of text in MS Office. With PDF, everyone sees the same thing.
> A lot of people here seem to knock PDF, but I love it.
People will disabilities that rely on screen-readers don't love it. There is no such a problem with HTML/CSS which should be the norm for internet documents.
> There is no such a problem with HTML/CSS which should be the norm for internet documents.
It should be. But meanwhile everybody seems to think it is perfectly ok that there is a bunch of JavaScript that needs to run before the document will display any text at all and how that text makes it into the document is anybody's guess.
It was an amazing shift in priorities that I feel like I somehow missed the discussion for. We went from being worried about hiding content with CSS to sending nothing but script tags in the document body within 5 years or so. The only concern we had when making the change seemed to be "but can Google read it?". When the answer to that became "Uh maybe" we jumped the shark.
My bashful take is that nobody told the rest of the web development world that they aren't Facebook, and they don't need Facebook like technology. So everyone is serving React apps hosted on AWS microservices filled in by GraphQL requests in order to render you a blog article.
I am being hyperbolic of course, but I was taken completely off guard by how quickly we ditched years of best practices in favour of a few JS UI libraries.
My use is exclusively "this will end up on paper in a minute", so any improvement would be irrelevant.
Why would I want document type that can't even refloat on display size to represent any longer written text that is supposed to be consumed on digital device?
Adobe Reader has an option to Read Aloud PDF files. I don't know how well or poor that works, but I'm writing this comment just in case you were not aware of that function.
I've never read the HTML spec directly until checking it now to verify your comment. I usually use MDN, which says that the alt attribute is not mandatory. [1]
You like it for that very quality: immutable, reproducible rendering.
Those who have to extract data from PDFs face nearly the same problem as those who have to deal with paper scans: no reliable structure in the data, the source of truth is the optical recognition, by human or by machine.
I agree, but my argument here is that it's up to the producer. They obviously wanted it to be a digital paper, for some reason, and not a data mining source. We should blame the producers, not the format. It's equivalent to saying it's hard to get the source code from the pesky exe files people distribute, so exe is a mess.
PDF is great for what it was originally designed for: a portable format for instructing printers on how to print a document. The problem is people using it in ways it wasn't designed for. Sharing a PDF of your document is about as useful as sharing an SVG export of your document (actually, an SVG probably has more semantic information). It is a vector image format, not a document format.
The original goal of PDF was to create documents that could "view and print anywhere" (literally the original tagline of the Acrobat project), substantially the same as how the document creator intended them. What Adobe was trying to solve was the problem of sending someone a document that looked a particular way and when they rendered it on their printer or display, it looked different, e.g. having a different number of pages because subtle font differences caused word-wrapping to change the number of lines and thus the page flow. It wasn't about it being "pretty," it was about having functional differences due to local rendering and font availability. In this regard, the format is an emphatic success.
I do wish they had focused a bit more on non-visual aspects such as screen-reader data, but to say the whole point is "because it's pretty" is a bit uncharitable. The format doesn't solve the problem you wish it solved, but it does solve a problem other than making things "pretty."
> Anyone who has tried to use OpenOffice full time probably does too.
I agree, I do too (LibreOffice), but for the opposite reason. Even internally, the font rendering in LibreOffice with many fonts is often quite bad. This is especially noticeable for text inside graphs in Calc.
If I'm going to read something lengthy that's a LibreOffice document, I open it (in LibreOffice), and export it to a PDF. LibreOffice consistently exports beautiful PDFs (and SVG graphs), which tells me that it "knows" internally how to correctly render fonts, just that its actual renderer is quite bad.
The audience here is developers and other geeks who get stuck dealing with PDFs. The issue when you read into it is usually about structured data delivered via PDF — which I would wholeheartedly agree is a monstrous and unnecessary misuse of the format.
The other thing that is unfair is assholes who deliver tabular data in PDF format usually don’t want you to have it. When your county clerk prints a report, photocopies it 30 times, crumples it and scans to PDF without OCR, that’s not a file format issue.
Yes, thank you! I have exactly the same feelings, because I like the write in old versions of iWork. With a PDF, I know that whatever I export will look the same for whoever I send it to.
I sometimes see people complain about how PDF sucks because it doesn't look quite the same everywhere (namely, non-Adobe readers), but if you're not doing anything fancy is pretty much does. It is, at minimum, more reliable than any other "open" format I'm aware of, save actual images.
The problem you have is that its likely, increasingly likely, that your CV is the exact document that will next be 'read' by computer rather than a human.
I know something about that area. Today, perhaps a 10th of CVs are sorted and prescreened by software. That fraction will only increase.
As someone who actually used to program in PostScript, I am happy as a clam with PDFs!
There are two issues with parsing them however.
1) PDF is an output format and was never intended to have the display text be parseable.
2) PDF is PostScript++, which means that is is a programming language.
This means that a PDF is also an input description to the output that we
are all familiar with seeing on a page.
PS I don't know if it is the case anymore, but Macs used to have a display server that handled all screen images in PDF format. That was an optimization from the NeXT display server, which displayed using Display PostScript.
>PDF is PostScript++, which means that is is a programming language.
The big change that came with PDF was removing the programming capabilities. A PDF file is like an unrolled version of the same PostScript file. There is still a residue of PostScript left but in no way can it be described as a programming language.
PDF is absolutely a programming language. It is not a general purpose programming language but a page description language. You are referring to looping constructs and procedures being removed, but a loop does not a language make. Similarly, LaTeX and sed are programming languages.
What features do you think make it a programming language? Because I have spent quite a bit of time working with it and all I can see is a file format.
"A programming language is a vocabulary and set of grammatical rules for instructing a computer or computing device to perform specific tasks."
A programming langauge is not inherently a programming language due to features it contains but due to it being used to program. A program is "a series of coded software instructions to control the operation of a computer or other machine."
In this way, a PDF file embodies a program that performs specific tasks. A PDF file does not contain a general purpose programming language, but it does contain the page description language of the output format that describes what is to be imaged. Then, the PDF program is given to an interpreter that displays the output.
This is the same as a simple program in turtle graphics to display a rectangle, even if no other language feature was used. In such a case, one would say that rectangle was programmed. We would not use the word program in connection with that turtle graphics program, if the rectangle description were not sent to an interpreter that displayed the rectangle.
> PS I don't know if it is the case anymore, but Macs used to have a display server that handled all screen images in PDF format. That was an optimization from the NeXT display server, which displayed using Display PostScript.
Ugh, yeah I use LibreOffice for all my internal stuff but I have to keep MS Office installed for editing externally-visible documents, so I can be (somewhat) sure the formatting isn't going to get screwed up.
PDF is very good for what it was designed to do, which is to represent pages for printing. It is not so good for use cases where parsing the text is more important than preserving layout.
Mostly. I've seen issues where PDF looked fine on a Mac but not on Windows.
Also, the fact that you see the same thing everywhere is good if you have one context of looking at things - e.g. if everyone uses big screen or if everyone prints the document, that's fine. But reading PDFs on e-book readers or smartphones can be a nightmare.
If the PDF uses a font that the creator neglected to embed, the reader’s system will have to supply the font, which could be a substitute. This is the only case I’ve seen where the PDF did not render exactly the same on all systems.