Question for anyone with experience in this domain: I have a CSAM spam problem on a forum I host, with bots embedding link-shortener URLs in images rather than in the post body. Traditional OCR software handles them poorly because of font modifications and intentionally distorted text edges, and I'm obviously not going to upload a bunch of may-or-may-not-be-CSAM pictures to a SaaS/closed-source model, so I'm looking for a way to do this locally, with cheapish inference if possible (I don't mind spending a minute of compute per image, but it needs to run on the CPU).
Is there any small model that would do this effectively, with pure text extraction (without going for any kind of formatting or whatnot)?