
Then you at least need a PDF reader that implements that, and you have to be sure that the parsing you do have cannot be exploited while still giving a useful representation. This might be easier for ML, where you don't care about visual display, but a human generally doesn't want to read raw, unformatted text. And a surprising amount of stuff is probably needed for a halfway decent visual display.


I do view the documents we format after they have gone through the processing stage. They seem to be the same in most ways I would care about: diagrams are still present, etc. I don't know about PDFs that contain forms, as ours are not that kind of document; they are closer to research documents.



