
Do you know what this field/type is called, and if any of the big names (MS/Adobe etc.) support creating such PDFs?


OCR software like ABBYY can spit out something called a "searchable PDF", which has a text layer underneath a picture of a scan. Otherwise, PDF has 'dictionaries' with arbitrary key-value pairs in them. The "Info" dictionary has some specific metadata fields like Author, and a "Font" dictionary embeds fonts, but you're free to add your own keys to those dictionaries for whatever. There's also a standard called XMP for embedding Dublin Core, rights management and custom metadata. Files can be embedded. You can also use comments, since PDF inherited the % comment syntax from PostScript (it is derived from PostScript rather than a strict subset of it). When a PDF gets converted to PDF/A (by archiving software) or flattened/optimized, most of these are likely to be lost.
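To make the "arbitrary key-value pairs" point concrete, here is a rough sketch that hand-writes a blank one-page PDF with a custom key in its Info dictionary, plus a comment line. It uses only the Python standard library. The function name build_pdf and the key MyCustomKey are made up for illustration, and it assumes keys are plain ASCII names without spaces and values fit in Latin-1; it is not how any particular tool does it.

```python
def build_pdf(info):
    """Return the bytes of a blank one-page PDF whose Info dictionary holds `info`."""

    def esc(text):
        # Backslashes and parentheses must be escaped inside PDF literal strings.
        return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")

    # Assumption: keys are simple ASCII names (no spaces or delimiters).
    info_body = " ".join(f"/{key} ({esc(value)})" for key, value in info.items())

    objects = [
        "<< /Type /Catalog /Pages 2 0 R >>",                        # object 1
        "<< /Type /Pages /Kids [3 0 R] /Count 1 >>",                # object 2
        "<< /Type /Page /Parent 2 0 R /MediaBox [0 0 200 200] >>",  # object 3
        f"<< {info_body} >>",                                       # object 4: Info
    ]

    out = bytearray(b"%PDF-1.4\n")
    out += b"% a comment line: PDF readers skip it\n"

    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of this object, needed by the xref table
        out += f"{number} 0 obj\n{body}\nendobj\n".encode("latin-1")

    xref_pos = len(out)
    size = len(objects) + 1  # object 0 is the head of the free list
    out += f"xref\n0 {size}\n".encode("ascii")
    out += b"0000000000 65535 f \n"  # each xref entry is exactly 20 bytes
    for offset in offsets:
        out += f"{offset:010d} 00000 n \n".encode("ascii")

    out += (
        f"trailer\n<< /Size {size} /Root 1 0 R /Info 4 0 R >>\n"
        f"startxref\n{xref_pos}\n%%EOF\n"
    ).encode("ascii")
    return bytes(out)


if __name__ == "__main__":
    data = build_pdf({"Author": "Jane Doe", "MyCustomKey": "anything you like"})
    with open("custom_info.pdf", "wb") as handle:
        handle.write(data)
```

Opening custom_info.pdf in a text editor shows the custom key sitting in plain sight in object 4. Whether a given viewer displays it, or a given optimizer keeps it, is a separate question.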


I believe it's a "hybrid PDF" but I'm not sure if there's a further standard for merely embedding text.

https://stackoverflow.com/questions/67358370/what-the-standa...



