Question to anyone with experience in this domain: I have CSAM spam problems on ... | Hacker News

apt-get 4 days ago | on: Why LLMs still have problems with OCR

Question to anyone with experience in this domain: I have a CSAM spam problem on a forum I host, with bots embedding link-shortener URLs in images rather than in the post body. Traditional OCR software handles these poorly because of altered fonts and deliberately distorted text edges, and I'm obviously not going to upload a bunch of pictures that may or may not be CSAM to a SaaS/closed-source model. So I'm looking for a way to do this locally, with cheapish inference if possible (I don't mind spending a minute of compute per image, but it needs to run on the CPU). Is there any small model that would do this effectively, with pure text extraction (no formatting or anything like that)?

parsakhaz 1 day ago

Yup, Moondream is great for this use case! You can run it locally with the quickstart: https://docs.moondream.ai/

It is a 2B-parameter vision model that runs anywhere and can do object detection, pointing, querying, and more.

sramam 4 days ago

Have you looked at https://moondream.ai/?
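
Whichever local model ends up doing the text extraction, flagging the shortener links afterwards is a plain string-matching step. Here is a minimal sketch of that step only, assuming the extracted text is already available as a string. The domain list and the `find_shortener_urls` helper are hypothetical names made up for illustration, not part of Moondream or any other library.

```python
import re

# Hypothetical blocklist of link-shortener domains; extend it for your forum.
SHORTENER_DOMAINS = {"bit.ly", "tinyurl.com", "t.co", "is.gd"}

# Optional scheme, then a dotted domain, then an optional path.
URL_RE = re.compile(
    r"(?:https?://)?((?:[a-z0-9-]+\.)+[a-z]{2,})(/[^\s]*)?",
    re.IGNORECASE,
)


def is_shortener(domain):
    """True if the domain is a listed shortener or a subdomain of one."""
    return any(domain == d or domain.endswith("." + d) for d in SHORTENER_DOMAINS)


def find_shortener_urls(extracted_text):
    """Return shortener URLs (without scheme) found in text pulled from an image."""
    # Extracted text often has stray spaces around dots and slashes
    # ("bit . ly / abc"), so collapse them before matching.
    normalized = re.sub(r"\s*([./])\s*", r"\1", extracted_text)
    hits = []
    for match in URL_RE.finditer(normalized):
        domain = match.group(1).lower()
        if is_shortener(domain):
            hits.append(domain + (match.group(2) or ""))
    return hits


if __name__ == "__main__":
    print(find_shortener_urls("visit bit . ly / abc123 now"))
```

Collapsing whitespace around dots and slashes is a crude heuristic and can glue ordinary sentence boundaries together, so treat any hit as a signal for moderator review rather than as an automatic verdict.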
