Ask HN: Why is the PDF format so inaccessible?
99 points by shawnfrostx on May 4, 2022 | 104 comments
I am working on some typographical software that is supposed to generate PDFs at the end. It seems like there is no accessible information on how to do this. The PDF ISO specification is behind a paywall and has a dead link to a 2008 spec. There are open source converters like pandoc, but nothing that actually writes to PDF that I can find. Is there any resource that goes over the process of PDF generation?
There’s also this book which provides a good introduction and overview and is useful for understanding how the format works (although the PDF reference itself is pretty decent too, as far as specs go): https://www.oreilly.com/library/view/developing-with-pdf/978... (You can find a PDF copy if you look around.) EDIT: There’s also https://www.oreilly.com/library/view/pdf-explained/978144932... which might be even better.
However, be warned that the PDF format can be quite complex and is not exactly for the faint of heart. It’s best to use an established library to generate PDF output, like PDFBox, iText, PDFSharp, PDFKit, etc. Those tend to have their own tutorials.
For emphasis: Do not generate PDFs “by hand”! You risk inadvertently generating PDFs that do not fully conform to the spec, and not noticing it because PDF readers are quite lenient in what they accept. A lot of PDFs in the wild are non-conforming in one way or another, because their generators were written not against the spec but against “whatever Acrobat Reader accepts”. This is the bane of every piece of software on the receiving end that needs to process PDFs.
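To give a feel for the bookkeeping involved, here is a rough sketch of a one-page PDF assembled by hand in Python. It is meant to illustrate the warning above, not to contradict it. The function name `build_minimal_pdf`, the page size, the font choice and the text position are all assumptions made for the example, and the output has not been checked against the spec with a validator. Note how every object's byte offset has to be recorded exactly for the cross-reference table; lenient readers will often repair a wrong offset silently, which is precisely how non-conforming files go unnoticed.

```python
def build_minimal_pdf(text: str = "Hello, PDF") -> bytes:
    """Assemble a one-page PDF by hand (illustrative sketch, not validated)."""
    # Backslashes and parentheses are special inside a PDF literal string.
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    # Content stream: select font F1 at 24 pt, move to (72, 720), show the text.
    # Assumes the text is representable in Latin-1.
    content = f"BT /F1 24 Tf 72 720 Td ({escaped}) Tj ET".encode("latin-1")

    # Indirect objects 1..5: catalog, page tree, page, font, content stream.
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << /Font << /F1 4 0 R >> >> /Contents 5 0 R >>",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        # The xref table needs the exact byte offset of each "N 0 obj".
        offsets.append(len(out))
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    xref_start = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    # Each xref entry is exactly 20 bytes; entry 0 heads the free list.
    out += b"0000000000 65535 f \n"
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_start
    return bytes(out)
```

Writing the returned bytes to a file opened in binary mode should give something common viewers will open, but this covers only the simplest case. Font embedding, compression, Unicode text, multiple pages and incremental updates each add rules of their own, which is where an established library earns its keep.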