
That's the weirdest part of the PDF spec IMHO. It's a mix of both binary and text, with text-specified byte offsets. It would be very interesting to read about why the format became like that, if its authors would ever talk about it. My guess is that it was meant to be completely textual at first (but then requiring the xref table to have fixed-length entries is odd), and then they decided binary would be more efficient.



I actually was at an Acrobat/PDF launch event in midtown NYC. It was an embedded file type that could be generated at the time of publishing, and all dependencies could either be embedded or not.

This made a coherent point in a digital workflow that could be saved and reprinted with ease. This was a big deal before the portable document format came to be.

I once made a workflow that took PDF files from Word, FileMaker, Excel, and MiniCAD. This all got combined into a single 9,000 page PDF. The final PDF had coherent thumbnails, page numbers, and headers and footers.

Only took a couple of hours to get the final document after pushing the go button.


> My guess is that it was meant to be completely textual at first

It indeed started life as “not Turing-complete PostScript with an index” (which makes it easy to render just the third page of a PDF file, something that’s impossible in PostScript without rendering the first and second pages first). Like PostScript, that was a pure text format.

One nice feature is that you can append a few pieces and a new index to an existing PDF file and get a new valid PDF file (which would still contain its old index as a piece of “junk DNA”).
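
A rough sketch of why appending works, assuming the usual convention that a reader starts at the end of the file and follows the last `startxref` keyword to the newest index. The byte strings here are toy stand-ins, not real PDFs, and `last_startxref` is a hypothetical helper name:

```python
def last_startxref(data: bytes) -> int:
    """Return the byte offset named by the final 'startxref' keyword."""
    marker = b"startxref"
    pos = data.rfind(marker)  # search backwards from the end of the file
    if pos < 0:
        raise ValueError("no startxref keyword found")
    # The offset is the first whitespace-separated token after the keyword.
    return int(data[pos + len(marker):].split()[0])


# Toy "file": the body is elided, only the trailer lines matter here.
original = b"%PDF-1.4\n(objects and old index)\nstartxref\n100\n%%EOF\n"
# Appending new pieces plus a new index leaves the old bytes untouched.
updated = original + b"(new objects and new index)\nstartxref\n250\n%%EOF\n"
```

The old `startxref 100` is still sitting in `updated`, but a reader scanning from the end only sees the one that points at 250.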

I think compression was added because users complained about file sizes. Ascii85 (https://en.m.wikipedia.org/wiki/Ascii85) grows binary data by 25%.
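
The 25% figure falls out of the encoding itself: every 4 input bytes become 5 printable characters. A quick check using Python's standard `base64.a85encode` (the sample data and the `ascii85_overhead` name are just for illustration; the all-zero shortcut `z` is avoided by using nonzero bytes):

```python
import base64


def ascii85_overhead(data: bytes) -> float:
    """Fractional size increase from Ascii85-encoding `data`."""
    encoded = base64.a85encode(data)  # no <~ ~> framing by default
    return len(encoded) / len(data) - 1.0


# 200 nonzero bytes, so no 4-byte group collapses to the 'z' shortcut.
sample = bytes(range(1, 201))
overhead = ascii85_overhead(sample)
```

With this sample, 200 bytes encode to 250 characters.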

> but then requiring the xref table to have fixed-length entries is odd

My guess is that made it easier to hack together a tool to convert PDF to postscript.
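
Whatever the original motivation, one practical effect of fixed-length entries is that a tool can jump straight to entry N without parsing the ones before it. A minimal sketch, assuming the classic 20-byte entry layout (10-digit offset, 5-digit generation, `n` or `f`, two-byte line ending); `format_entry` and `read_entry` are hypothetical names:

```python
ENTRY_LEN = 20  # bytes per entry, including the 2-byte end-of-line


def format_entry(offset: int, generation: int = 0, in_use: bool = True) -> bytes:
    """Build one fixed-width xref entry, e.g. b'0000001234 00000 n \\n'."""
    kind = "n" if in_use else "f"
    return f"{offset:010d} {generation:05d} {kind} \n".encode("ascii")


def read_entry(table: bytes, index: int) -> tuple:
    """Read entry `index` by plain arithmetic, with no scanning."""
    start = index * ENTRY_LEN
    offset, generation, kind = table[start:start + ENTRY_LEN].split()
    return int(offset), int(generation), kind == b"n"


# Three in-use entries with made-up offsets.
table = b"".join(format_entry(o) for o in (17, 1234, 987654))
```

Here `read_entry(table, 2)` slices bytes 40 to 59 directly, which is about as simple as a lookup gets.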


The roots of PDF are in PostScript, which is like Forth and is text-based, so that’s why.



