Do you know what this field/type is called, and if any of the big names (MS/Ado...

bux93 · 2025-02-06T07:19:23 1738826363

OCR software like ABBYY can spit out something called a "searchable PDF", which has an invisible text layer underneath a picture of a scan. Otherwise, PDF has 'dictionaries' with arbitrary key-value pairs in them. The "Info" dictionary has some specific metadata fields like Author, and "Font" dictionaries describe (and can embed) fonts, but you're free to put extra keys in those dictionaries for whatever. There's also a standard called XMP for embedding Dublin Core, rights management and custom metadata. Files can be embedded as attachments. You can also use comments, since PDF inherited the '%' comment syntax from PostScript (it's derived from PostScript rather than being a strict subset). When a PDF gets converted to PDF/A (by archiving software) or flattened/optimized, most of these are likely to be lost.
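To make two of those hiding places concrete, here's a rough stdlib-only Python sketch that hand-assembles a blank one-page PDF with a custom key in the Info dictionary and a '%' comment line. The function name build_minimal_pdf and the key SourceText are made up for illustration, and it assumes keys are plain ASCII names without spaces and the comment has no newline; a real tool would use a proper PDF library.

```python
def build_minimal_pdf(info: dict[str, str], comment: str) -> bytes:
    """Assemble a blank one-page PDF with an Info dictionary and a comment line."""

    def pdf_string(text: str) -> str:
        # PDF literal strings are parenthesised; escape backslashes and parens.
        escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
        return f"({escaped})"

    # Arbitrary key-value pairs; keys are assumed to be valid PDF names.
    info_body = " ".join(f"/{key} {pdf_string(value)}" for key, value in info.items())

    objects = [
        "<< /Type /Catalog /Pages 2 0 R >>",
        "<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        "<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] >>",
        f"<< {info_body} >>",  # object 4: the Info dictionary
    ]

    out = bytearray(b"%PDF-1.4\n")
    # A line starting with '%' is a comment that PDF readers skip.
    out += f"% {comment}\n".encode("latin-1")

    # Record each object's byte offset for the cross-reference table.
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))
        out += f"{number} 0 obj\n{body}\nendobj\n".encode("latin-1")

    xref_start = len(out)
    out += f"xref\n0 {len(objects) + 1}\n".encode("latin-1")
    out += b"0000000000 65535 f \n"  # each xref entry is exactly 20 bytes
    for offset in offsets:
        out += f"{offset:010d} 00000 n \n".encode("latin-1")

    out += (
        f"trailer\n<< /Size {len(objects) + 1} /Root 1 0 R /Info 4 0 R >>\n"
        f"startxref\n{xref_start}\n%%EOF\n"
    ).encode("latin-1")
    return bytes(out)


# Hypothetical usage: a standard field plus a made-up custom one.
pdf_bytes = build_minimal_pdf(
    {"Author": "Example Author", "SourceText": "original text goes here"},
    "this comment is invisible in a viewer",
)
```

Both the custom key and the comment are exactly the kind of thing an optimizer or PDF/A converter is free to throw away.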

dkjaudyeqooe · 2025-02-06T09:31:43 1738834303

I believe it's a "hybrid PDF" but I'm not sure if there's a further standard for merely embedding text.

https://stackoverflow.com/questions/67358370/what-the-standa...