
The challenge I have is how to get bounding boxes for the OCR, for things like redaction/de-identification.



AWS Textract works pretty well for this and is much cheaper than running LLMs.
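For the redaction use case, a minimal sketch of pulling word-level boxes out of a Textract-style response might look like the following. It assumes `response` is the dict returned by a call such as boto3's `detect_document_text`, where each block carries a `Geometry.BoundingBox` with `Left`, `Top`, `Width`, and `Height` expressed as fractions of the page size. The helper name `word_boxes_px` and the page dimensions are illustrative, not part of any library.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def word_boxes_px(response: Dict, page_width: int, page_height: int) -> List[Tuple[str, Box]]:
    """Convert the normalized WORD boxes in a Textract-style response to pixel boxes.

    Hypothetical helper: it only reads the response dict and makes no AWS calls.
    """
    boxes = []
    for block in response.get("Blocks", []):
        # Skip PAGE and LINE blocks; only word-level boxes are needed for redaction.
        if block.get("BlockType") != "WORD":
            continue
        bb = block["Geometry"]["BoundingBox"]
        left = round(bb["Left"] * page_width)
        top = round(bb["Top"] * page_height)
        right = round((bb["Left"] + bb["Width"]) * page_width)
        bottom = round((bb["Top"] + bb["Height"]) * page_height)
        boxes.append((block["Text"], (left, top, right, bottom)))
    return boxes
```

The resulting pixel boxes can then be filled in on the page image for whichever words need to be redacted.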


Textract is more expensive than this (for your first 1M pages per month at least) and significantly more than something like Gemini Flash. I agree it works pretty well though - definitely better than any of the open source pure OCR solutions I've tried.


Yeah, that's a fun challenge. What we've seen work well is a system that forces the LLM to generate citations for all extracted data, maps those citations back to the original OCR content, and then generates bounding boxes that way. There are tons of edge cases, which we've built a suite of heuristics for over time, but overall it works really well.
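A rough sketch of the citation-to-bounding-box idea, under stated assumptions: the OCR step yields words with pixel boxes, the LLM returns a short verbatim quote as its citation, and matching is exact after lowercasing and stripping punctuation. All names here (`OcrWord`, `find_citation`, `citation_to_box`) are hypothetical, and this is not the commenter's actual system, which reportedly relies on many more heuristics.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom)


@dataclass
class OcrWord:
    text: str
    box: Box


def normalize(token: str) -> str:
    """Lowercase and drop non-alphanumerics so 'Smith,' matches 'smith'."""
    return "".join(ch for ch in token.lower() if ch.isalnum())


def find_citation(words: List[OcrWord], citation: str) -> Optional[Tuple[int, int]]:
    """Return the (start, end) word-index span of the first exact match, else None."""
    target = [t for t in (normalize(tok) for tok in citation.split()) if t]
    if not target:
        return None
    # Keep original indices so punctuation-only OCR tokens do not break alignment.
    indexed = [(i, normalize(w.text)) for i, w in enumerate(words)]
    indexed = [(i, t) for i, t in indexed if t]
    n = len(target)
    for k in range(len(indexed) - n + 1):
        if [t for _, t in indexed[k:k + n]] == target:
            return indexed[k][0], indexed[k + n - 1][0] + 1
    return None


def citation_to_box(words: List[OcrWord], citation: str) -> Optional[Box]:
    """Union the boxes of the OCR words that the citation maps to."""
    span = find_citation(words, citation)
    if span is None:
        return None
    matched = words[span[0]:span[1]]
    return (
        min(w.box[0] for w in matched),
        min(w.box[1] for w in matched),
        max(w.box[2] for w in matched),
        max(w.box[3] for w in matched),
    )
```

A citation that wraps across lines would produce one large union box here; emitting one box per line, fuzzy matching for OCR errors, and handling repeated phrases are the kinds of edge cases this sketch leaves out.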


Why would you do this and not use Textract?


I too have this question.



