Ask HN: Why is HN often talking about PDF files?

swsieber · 2024-06-26T21:34:57 1719437697

They are widely used and in a terrible format that's hard to work with (programmatically). So yes, there are problems.

My particular pet peeve is that many rendering platforms suffer from floating point rounding shenanigans, such that rendering/rasterizing an 8.5x11 doc at 300 dpi results in the height being off by a pixel :/
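To make the off-by-one concrete, here is a minimal Python sketch. It assumes the usual PDF convention of 72 points per inch, so a letter page is 612x792 points and should rasterize to exactly 2550x3300 pixels at 300 dpi. The function names (naive_pixels, exact_pixels, snapped_ceil) are made up for illustration and do not come from any particular renderer.

```python
import math
from fractions import Fraction

POINTS_PER_INCH = 72  # assumed default PDF unit: 1 point = 1/72 inch


def naive_pixels(points: float, dpi: float) -> int:
    """Scale with floats, then round up so no content gets clipped."""
    return math.ceil(points * (dpi / POINTS_PER_INCH))


def exact_pixels(points, dpi) -> int:
    """Scale with exact rationals, then round to the nearest pixel."""
    exact = Fraction(points) * Fraction(dpi) / POINTS_PER_INCH
    return round(exact)


def snapped_ceil(value: float, eps: float = 1e-6) -> int:
    """Round up, but snap to the nearest integer first if we are within eps."""
    nearest = round(value)
    if abs(value - nearest) < eps:
        return nearest
    return math.ceil(value)


# The smallest double above 3300.0 is 3300 + 2**-41. If a chain of float
# operations lands there instead of on 3300.0, a plain ceil adds a pixel.
just_over = 3300.0 + 2**-41
assert math.ceil(just_over) == 3301      # one pixel too tall
assert snapped_ceil(just_over) == 3300   # tolerant rounding recovers 3300

# Same story below: the largest double under 3300.0 floors to 3299.
just_under = 3300.0 - 2**-41
assert math.floor(just_under) == 3299

# Exact arithmetic sidesteps the problem for the letter-size case.
assert (exact_pixels(612, 300), exact_pixels(792, 300)) == (2550, 3300)
```

Whether naive_pixels actually misfires for a given page depends on the exact sequence of float operations in the renderer, which is why it only shows up on some platforms. Snapping with a small epsilon or carrying exact rationals until the final rounding are two possible ways to avoid it.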

ldenoue · 2024-06-26T21:41:07 1719438067

My issue with PDFs was the fact they are hard to read on smaller screens because the format isn’t natively easy to reflow.

I guess another issue today is RAG for AI: how can a PDF be chunked into meaningful pieces? Funny to see old methods such as document layout analysis being used for RAG.

JohnFen · 2024-06-26T21:48:49 1719438529

PDFs are, in my opinion, entirely unsuitable for anything that isn't intended to be printed.

Yawrehto · 2024-06-26T22:02:45 1719439365

Not necessarily. The clarity offered is helpful if you need to cite something, e.g. for a paper, and know exactly which page it's coming from - though this only works for book scans and is a somewhat specific use. Or when you need everyone to have exactly the same words on exactly the same page. Quite helpful for standardization.

JohnFen · 2024-06-26T22:16:04 1719440164

> e.g. for a paper, and know exactly which page it's coming from - though this only works for book scans and is a somewhat specific use.

I'd include that as "intended to be printed". I should have said "unsuitable for anything that isn't printed matter", though.

In the purely electronic world, there are much better ways of achieving the standardization you're talking about that don't come with the rather large downsides of PDFs.

Yawrehto · 2024-06-27T18:53:22 1719514402

I didn't technically intend for it to be printed, but yes, I will get on board with "unsuitable for anything that isn't printed matter".

anotherhue · 2024-06-26T21:37:51 1719437871

PDFs and JPEGs are the standards for 'digital documents'. I expect they still will be in 30 more years because they're "good enough".

Other than plaintext, are there any others?

fragmede · 2024-06-26T22:01:44 1719439304

HTML comes to mind, but mobi/epub seems to be popular for ebooks. Then there's also docx, which is a standard these days.

kartoffelmos · 2024-06-27T06:42:23 1719470543

Problem with docx is that there's no real way to _just_ view a file. I don't really want to spin up an entire word processor just to read the meeting minutes. I honestly much prefer a PDF to docx.

fragmede · 2024-06-29T11:57:20 1719662240

Someone could make a docx viewer easily enough if there were demand for it. It's just XML inside a zip file with the contents. It's no LaTeX, but it's serviceable.
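As a rough illustration of the "XML inside a zip" point, here is a minimal Python sketch that pulls paragraph text out of a .docx using only the standard library. It assumes the main body lives at word/document.xml and uses the WordprocessingML namespace. The function name docx_paragraphs is hypothetical, and this ignores styles, tables, images, headers and footers, so it is nowhere near a real viewer.

```python
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML main namespace, in ElementTree's {uri}tag notation.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(path_or_file):
    """Return the plain text of each paragraph in the document body."""
    with zipfile.ZipFile(path_or_file) as archive:
        # Assumption: the main document part is stored at this path.
        root = ET.fromstring(archive.read("word/document.xml"))

    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # A paragraph's text is split across runs; join every <w:t> node.
        text = "".join(node.text or "" for node in para.iter(W_NS + "t"))
        paragraphs.append(text)
    return paragraphs
```

Calling docx_paragraphs("minutes.docx") (a made-up file name) would give a list of strings, one per paragraph. Rendering them with the original layout is the hard part that this sketch skips.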

ldenoue · 2024-06-26T21:42:38 1719438158

So that would explain why people are interested in tools that help manipulate pdf files?

cassianoleal · 2024-06-26T23:12:58 1719443578

LaTeX, Markdown, OpenDocument Format