No one can tell the difference between source and destination PDF unless they... | Hacker News

mopsi on March 3, 2020 | on: What's so hard about PDF text extraction?

> No one can tell the difference between source and destination PDF unless they look at the file size on disk.

Not even when they try to select and copy text?

hnick on March 3, 2020

You can add PDF tag commands to make rasterised text selectable and searchable, though they probably aren't doing that.
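Here is one way rasterised text can still be selectable. OCR tools typically draw the recognised text on top of the page image in text rendering mode 3 (`3 Tr`). That mode neither fills nor strokes the glyphs, so the text is invisible but can still be selected and extracted. I am assuming this is roughly what "tag commands" refers to. Tagged PDF with `/ActualText` is a different mechanism. The sketch below is minimal, stdlib-only Python built on that assumption. The function names `build_pdf` and `escape_pdf_string` are hypothetical, and a 2x2 greyscale bitmap stands in for a real scan.

```python
def escape_pdf_string(text: str) -> bytes:
    """Escape backslashes and parentheses for a PDF literal string ( ... )."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return escaped.encode("latin-1")


def build_pdf(hidden_text: str) -> bytes:
    """Hypothetical helper: one page with a raster image plus invisible text."""
    # Stand-in for a scanned page: a 2x2 greyscale bitmap (8 bits per pixel),
    # stretched to 200x50 points when painted.
    pixels = bytes([0x00, 0xFF, 0xFF, 0x00])
    content = (
        b"q 200 0 0 50 72 700 cm /Im1 Do Q\n"  # paint the image XObject
        b"BT 3 Tr /F1 24 Tf 72 712 Td ("       # 3 Tr: neither fill nor stroke
        + escape_pdf_string(hidden_text)
        + b") Tj ET"
    )
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << /Font << /F1 4 0 R >> /XObject << /Im1 5 0 R >> >> "
        b"/Contents 6 0 R >>",
        # Helvetica is one of the standard 14 fonts, so nothing is embedded.
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
        b"<< /Type /XObject /Subtype /Image /Width 2 /Height 2 "
        b"/ColorSpace /DeviceGray /BitsPerComponent 8 /Length %d >>\nstream\n"
        % len(pixels) + pixels + b"\nendstream",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
    ]

    out = bytearray(b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of "N 0 obj" for the xref table
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    # Cross-reference table: every entry is exactly 20 bytes long.
    xref_start = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    out += (
        b"trailer\n<< /Size %d /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n"
        % (len(objects) + 1, xref_start)
    )
    return bytes(out)
```

To try it, write the bytes to disk, for example `open("hidden.pdf", "wb").write(build_pdf("Hello world"))`, and open the file in a viewer. Only the stretched bitmap should be visible, but selecting over it should copy the hidden string. The file also stays small because no glyph outlines are embedded.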
