The challenge I have is how to get bounding boxes for the OCR, for things like r...

dontlikeyoueith · 2025-03-06T22:24:03 1741299843

AWS Textract works pretty well for this and is much cheaper than running LLMs.

daemonologist · 2025-03-06T22:48:09 1741301289

Textract is more expensive than this (for your first 1M pages per month at least) and significantly more expensive than something like Gemini Flash. I agree it works pretty well though - definitely better than any of the open source pure OCR solutions I've tried.

kbyatnal · 2025-03-06T21:18:22 1741295902

Yeah, that's a fun challenge. What we've seen work well is a system that forces the LLM to generate citations for all extracted data, maps those citations back to the original OCR content, and then generates bounding boxes that way. There are tons of edge cases, for sure, and we've built a suite of heuristics for them over time, but overall it works really well.
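
For anyone curious what the citation-to-box mapping could look like, here is a minimal sketch, not the actual system described above. It assumes the OCR step already returns words with `(x0, y0, x1, y1)` boxes and that the LLM returns a verbatim quote for each extracted value. The names `Word` and `locate_citation`, and the `min_ratio` threshold, are hypothetical and only for illustration.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), assumed layout


@dataclass
class Word:
    """One OCR word with its bounding box (hypothetical structure)."""
    text: str
    box: Box


def normalize(s: str) -> str:
    # Lowercase and drop punctuation so "Total:" and "total" compare equal.
    return "".join(ch for ch in s.lower() if ch.isalnum())


def locate_citation(citation: str, words: List[Word],
                    min_ratio: float = 0.8) -> Optional[Box]:
    """Find the OCR word window that best matches the quoted citation
    and return the union of its boxes, or None if nothing is close enough."""
    target = [t for t in (normalize(t) for t in citation.split()) if t]
    n = len(target)
    if n == 0 or n > len(words):
        return None
    target_str = " ".join(target)

    best_score, best_start = 0.0, None
    # Slide a window of n words over the page and score each one fuzzily,
    # which tolerates small OCR errors such as "Tota1" for "Total".
    for start in range(len(words) - n + 1):
        window = " ".join(normalize(w.text) for w in words[start:start + n])
        score = SequenceMatcher(None, target_str, window).ratio()
        if score > best_score:
            best_score, best_start = score, start

    if best_start is None or best_score < min_ratio:
        return None  # treat as an unverifiable citation

    span = words[best_start:best_start + n]
    return (min(w.box[0] for w in span), min(w.box[1] for w in span),
            max(w.box[2] for w in span), max(w.box[3] for w in span))


page = [
    Word("Invoice", (10, 10, 60, 20)),
    Word("Total:", (10, 30, 50, 40)),
    Word("$1,234.56", (55, 30, 110, 40)),
    Word("Due", (10, 50, 30, 60)),
]
box = locate_citation("Total: $1,234.56", page)
```

A fixed-size word window is the simplest possible choice. It will miss citations that the OCR split or merged into a different number of tokens, or that wrap across columns, which is presumably where the edge-case heuristics come in.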

dontlikeyoueith · 2025-03-06T22:24:38 1741299878

Why would you do this and not use Textract?

schcrosby · 2025-03-07T00:04:05 1741305845

I too have this question.