
> PDFs are incredibly flexible. Text can be specified in a bunch of ways. Glyphs can be defined to the nth degree. Text sometimes isn’t text at all. There’s no layout engine and everything is absolutely positioned.

Can't stress this enough. The next time you open a multi-column PDF in Adobe Reader and it selects a set of lines or a paragraph the way you would expect, know that a huge amount of technology is working behind the scenes to figure out where each line and paragraph starts and ends.
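To give a feel for what "figuring out the start and end of each line" involves, here is a minimal sketch in Python. It assumes the text has already been extracted as absolutely positioned fragments; the `Fragment` type, the `group_into_lines` function, and the `y_tolerance` parameter are hypothetical names made up for illustration, not part of any PDF library or of how Adobe Reader actually works.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """Hypothetical positioned text run, as a PDF content stream might yield it."""
    x: float   # left edge, in points
    y: float   # baseline, in points (PDF origin is bottom-left, so larger y is higher)
    text: str


def group_into_lines(fragments, y_tolerance=2.0):
    """Cluster fragments with nearly equal baselines into lines, top to bottom.

    y_tolerance is an assumed fudge factor; real documents need smarter handling.
    """
    lines = []  # each entry is [baseline_y, [fragments on that line]]
    for frag in sorted(fragments, key=lambda f: -f.y):
        if lines and abs(lines[-1][0] - frag.y) <= y_tolerance:
            lines[-1][1].append(frag)
        else:
            lines.append([frag.y, [frag]])
    # Within a line, reading order is left to right.
    return [
        " ".join(f.text for f in sorted(frags, key=lambda f: f.x))
        for _, frags in lines
    ]


# Fragments can arrive in any order; nothing in the file says they form lines.
page = [
    Fragment(115, 686.0, "line"),
    Fragment(110, 700.5, "world"),
    Fragment(72, 686.0, "second"),
    Fragment(72, 700.0, "Hello"),
]
print(group_into_lines(page))
```

Even this toy version has to guess a tolerance for "same baseline", and it would happily merge the left and right columns of a two-column page into single lines. Handling columns, hyphenation, rotated text, and paragraph breaks is where the real effort goes.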



