Ask HN: Why is HN often talking about PDF files?
6 points by ldenoue 11 months ago | 12 comments
I noticed quite a few posts about PDFs (the latest, today, was about pdfscale) and wonder why they receive so many comments here on HN.

Are there problems with PDFs that somehow make us all passionate about them?




They are widely used and in a terrible format that's hard to work with (programmatically). So yes, there are problems.

My particular pet peeve is that many rendering platforms suffer from floating point rounding shenanigans, such that rendering/rasterizing an 8.5x11 doc at 300 dpi results in the height being off by a pixel :/
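
To illustrate the kind of rounding I mean, here is a minimal Python sketch. It assumes the renderer precomputes a float scale factor (dpi / 72) and rounds the pixel size up. That is only one plausible way the off-by-one can creep in, not a claim about any particular library. The function names `pixels_naive` and `pixels_exact` are made up for this example.

```python
import math
from fractions import Fraction

POINTS_PER_INCH = 72  # PDF default user space: 1 point = 1/72 inch


def pixels_naive(points: float, dpi: float) -> int:
    """Scale with a precomputed float factor, then round up.

    300 / 72 has no exact binary representation, so the product can land
    a hair above the true value and ceil() then adds a whole pixel.
    """
    scale = dpi / POINTS_PER_INCH
    return math.ceil(points * scale)


def pixels_exact(points: int, dpi: int) -> int:
    """Same computation in exact rational arithmetic before rounding up."""
    return math.ceil(Fraction(points * dpi, POINTS_PER_INCH))


if __name__ == "__main__":
    # US Letter is 612 x 792 points (8.5 x 11 inches).
    print("naive height:", pixels_naive(792, 300))
    print("exact height:", pixels_exact(792, 300))
```

The exact version always gives 3300 for an 11 inch page at 300 dpi. The naive one can disagree by a pixel depending on how the intermediate float rounds.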


My issue with PDFs was the fact they are hard to read on smaller screens because the format isn’t natively easy to reflow.

I guess another issue today is RAG for AI: how can a PDF be chunked into meaningful pieces? Funny to see old methods such as document layout analysis being used for RAG.


PDFs are, in my opinion, entirely unsuitable for anything that isn't intended to be printed.


Not necessarily. The clarity offered is helpful if you need to cite something, e.g. for a paper, and know exactly which page it's coming from - though this only works for book scans and is a somewhat specific use. Or when you need everyone to have exactly the same words on exactly the same page. Quite helpful for standardization.


> e.g. for a paper, and know exactly which page it's coming from - though this only works for book scans and is a somewhat specific use.

I'd include that as "intended to be printed". I should have said "unsuitable for anything that isn't printed matter", though.

In the purely electronic world, there are much better ways of achieving the standardization you're talking about that don't come with the rather large downsides of PDFs.


I didn't technically intend for it to be printed, but yes, I will get on board with "unsuitable for anything that isn't printed matter".


PDFs and JPEGs are the standards for 'digital documents'. I expect they still will be in 30 more years because they're "good enough".

Other than plaintext, are there any others?


HTML comes to mind, but mobi/epub seem to be popular for ebooks. Then there's also docx, which is a standard these days.


Problem with docx is that there's no real way to _just_ view a file. I don't really want to spin up an entire word processor just to read the meeting minutes. I honestly much prefer a PDF to docx.


Someone could make a docx viewer easily enough if there was demand for it. It's essentially XML inside a zip file with the contents. It's no LaTeX, but it's serviceable.
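
For what it's worth, a rough Python sketch of the "XML inside a zip" point, using only the standard library. It assumes the main body lives at `word/document.xml`, which is the usual layout for files written by Word, and it only pulls plain paragraph text: no styles, tables, images, or headers. `docx_paragraphs` is a name I made up for the example, not an existing API.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by elements such as <w:p> and <w:t>.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_paragraphs(source) -> list[str]:
    """Return the plain text of each paragraph in a .docx file.

    `source` is a path or a binary file-like object. Assumes the main
    document part is stored at word/document.xml.
    """
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    paragraphs = []
    for paragraph in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is split across runs; join every <w:t> inside it.
        text = "".join(t.text or "" for t in paragraph.iter(f"{{{W_NS}}}t"))
        paragraphs.append(text)
    return paragraphs
```

Something like `print("\n".join(docx_paragraphs("minutes.docx")))` would then dump the text of a file (the filename is just a placeholder). A real viewer would need a lot more than this, but it shows the container isn't opaque.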


So that would explain why people are interested in tools that help manipulate PDF files?


LaTeX, Markdown, OpenDocument Format



