The biggest risk of vision LLMs for OCR is that they might accidentally follow i... | Hacker News

Hacker Newsnew | past | comments | ask | show | jobs | submit

		simonw 7 months ago \| parent \| context \| favorite \| on: Mistral OCR The biggest risk of vision LLMs for OCR is that they might accidentally follow instructions is the text that they are meant to be processing. (I asked Mistral if their OCR system was vulnerable to this and they said "should be robust, but curious to see if you find any fun examples" - https://twitter.com/simonw/status/1897713755741368434 and https://twitter.com/sophiamyang/status/1897719199595720722 )

pilooch 7 months ago [–]

Fun, but LLMs would follow them post OCR anyways ;)

I see OCR much like phonemes in speech, once you have end to end systems, they become latent constructs from the past.

And that is actually good, more code going into models instead.

Consider applying for YC's Winter 2026 batch! Applications are open till Nov 10
Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact