That's the weirdest part of the PDF spec IMHO. It's a mix of both binary and text, with text-specified byte offsets. It would be very interesting to read about why the format ended up like that, if its authors ever talked about it. My guess is that it was meant to be completely textual at first (but then requiring the xref table to have fixed-length entries is odd), and then they decided binary would be more efficient.
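One plausible reason for the fixed-length entries (a guess, not the spec authors' stated rationale) is random access. A classic xref entry is always exactly 20 bytes, so a reader can jump straight to the entry for object n with arithmetic instead of scanning text. The Python sketch below builds a toy file with a single xref subsection and looks an object up that way. The helper names (`make_entry`, `build_pdf`, `find_object_offset`) are hypothetical, and the output is not a complete, renderable PDF.

```python
from __future__ import annotations

ENTRY_LEN = 20  # "nnnnnnnnnn ggggg n" (18 bytes) + a 2-byte end-of-line

def make_entry(offset: int, generation: int = 0, in_use: bool = True) -> bytes:
    kind = b"n" if in_use else b"f"
    # " \n" (space + LF) is one of the two-byte EOLs the spec allows.
    return b"%010d %05d %s \n" % (offset, generation, kind)

def build_pdf(bodies: list[bytes]) -> bytes:
    """Toy file: header, numbered objects, one xref subsection, trailer."""
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(bodies, start=1):
        offsets.append(len(out))  # byte offset where "num 0 obj" starts
        out += b"%d 0 obj\n%s\nendobj\n" % (num, body)
    xref_pos = len(out)
    size = len(bodies) + 1
    out += b"xref\n0 %d\n" % size
    out += make_entry(0, 65535, in_use=False)  # object 0 heads the free list
    for off in offsets:
        out += make_entry(off)
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n" % (size, xref_pos)
    return bytes(out)

def find_object_offset(pdf: bytes, obj_num: int) -> int:
    # The last "startxref" line says where the xref section begins.
    xref_pos = int(pdf[pdf.rindex(b"startxref"):].split()[1])
    header_start = pdf.index(b"\n", xref_pos) + 1       # line after "xref"
    entries_start = pdf.index(b"\n", header_start) + 1  # line after "first count"
    first, count = map(int, pdf[header_start:entries_start].split())
    if not first <= obj_num < first + count:
        raise KeyError(obj_num)
    # Fixed-length entries: no scanning, just multiply.
    start = entries_start + (obj_num - first) * ENTRY_LEN
    return int(pdf[start:start + 10])

pdf = build_pdf([b"<< /Type /Catalog >>", b"(page two)", b"(page three)"])
off = find_object_offset(pdf, 3)
print(pdf[off:off + 7])  # b'3 0 obj'
```

With variable-length text entries, the same lookup would need to read every line before the one you want.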
I actually was at an Acrobat/PDF launch event in midtown NYC.
It was an embedded file type that could be generated at the time of publishing, and all of its dependencies could either be embedded or left out.
This made it a coherent handoff point in a digital workflow: a file that could be saved and reprinted with ease. This was a big deal before the Portable Document Format came to be.
I once made a workflow that took PDF files from Word, FileMaker, Excel, and MiniCAD. This all got combined into a single 9,000-page PDF. The final PDF had coherent thumbnails, page numbers, and headers and footers.
It only took a couple of hours to get the final document after pushing the go button.
> My guess is that it was meant to be completely textual at first
It indeed started life as “non-Turing-complete PostScript with an index” (the index makes it easy to render just the third page of a PDF file, something that’s impossible in PostScript without rendering the first and second pages first). Like PostScript, it was a pure text format.
One nice feature is that you can append a few pieces and a new index to an existing PDF file and get a new valid PDF file (which would still contain its old index as a piece of “junk DNA”).
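A rough Python sketch of that append-only trick (the spec calls it an incremental update): new objects and a new xref section go after the old `%%EOF`, the new trailer's `/Prev` points back at the old xref section, and a reader that starts from the last `startxref` sees the newest entry for each object first. The helper names (`append_section`, `last_xref`, `resolve`) are hypothetical, the objects are toy placeholders, and a real file needs more than this to render.

```python
from __future__ import annotations
import re

def entry(offset: int, gen: int = 0, kind: bytes = b"n") -> bytes:
    # Classic 20-byte xref entry; " \n" is an allowed two-byte EOL.
    return b"%010d %05d %s \n" % (offset, gen, kind)

def append_section(out: bytearray, objs: dict[int, bytes], size: int, prev: int | None) -> None:
    """Append objects plus one xref section, trailer and startxref."""
    offsets = {}
    for num, body in sorted(objs.items()):
        offsets[num] = len(out)
        out += b"%d 0 obj\n%s\nendobj\n" % (num, body)
    xref_pos = len(out)
    out += b"xref\n"
    if prev is None:
        out += b"0 1\n" + entry(0, 65535, b"f")  # original file: object 0 heads the free list
    for num, off in sorted(offsets.items()):
        out += b"%d 1\n" % num + entry(off)  # one single-entry subsection per object
    prev_part = b" /Prev %d" % prev if prev is not None else b""
    out += b"trailer\n<< /Size %d /Root 1 0 R%s >>\n" % (size, prev_part)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos

def last_xref(pdf: bytes) -> int:
    return int(pdf[pdf.rindex(b"startxref"):].split()[1])

def resolve(pdf: bytes) -> dict[int, int]:
    """Map object number -> byte offset, walking /Prev from the newest section back."""
    table: dict[int, int] = {}
    pos: int | None = last_xref(pdf)
    while pos is not None:
        trailer_at = pdf.index(b"trailer", pos)
        lines = pdf[pos:trailer_at].split(b"\n")[1:]  # drop the "xref" keyword line
        i = 0
        while i < len(lines) and lines[i]:
            first, count = map(int, lines[i].split())
            for k in range(count):
                fields = lines[i + 1 + k].split()
                if fields[2] == b"n":
                    table.setdefault(first + k, int(fields[0]))  # newer entries win
            i += 1 + count
        trailer = pdf[trailer_at:pdf.index(b">>", trailer_at)]
        m = re.search(rb"/Prev (\d+)", trailer)
        pos = int(m.group(1)) if m else None
    return table

pdf = bytearray(b"%PDF-1.4\n")
append_section(pdf, {1: b"<< /Type /Catalog >>", 2: b"(old text)"}, size=3, prev=None)
original = bytes(pdf)
# Incremental update: append a new version of object 2 plus a new xref/trailer.
append_section(pdf, {2: b"(new text)"}, size=3, prev=last_xref(original))
offsets = resolve(bytes(pdf))
print(bytes(pdf[offsets[2]:]).split(b"\n")[1])  # b'(new text)'
```

Since `original` is a byte-for-byte prefix of the updated file, nothing before the old `%%EOF` has to be rewritten; the old xref section and the old copy of object 2 just stay in place.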