One problem I’ve encountered at my small startup when evaluating OCR technologies is convincing stakeholders that the “human-in-the-loop” part is both unavoidable and ultimately beneficial.
PMs want to hear that an OCR solution will be fully automated out-of-the-box. My gut says that anything offering that is snake-oil, and I try to convey that the OCR solution they want is possible, but if you are unwilling to pay the tuning cost, it’s going to flop out of the gate. At that point they lose interest and move on to other priorities.
Yup definitely, and this is exactly why I built my startup. I've heard this a bunch across the startups & large enterprises that we work with. 100% automation is an impossible target, because even humans are not 100% perfect. So how can we expect LLMs to be?
But that doesn't mean you have to abandon the effort. You can still definitely achieve production-grade accuracy! It just requires having the right tooling in place, which reduces the upfront tuning cost. We typically see folks get there on the order of days or 1-2 weeks (it doesn't necessarily need to take months).
It really depends on their fault tolerance. I think there's a ton of useful applications where OCR that is 99.9%, 99%, or even 98% reliable would be good enough. A skillful product manager can keep these limitations in mind and work around them.
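To put rough numbers on "fault tolerance", here is a back-of-the-envelope sketch. It assumes the reliability figure is per field and that field errors are independent, neither of which the thread actually states, and the 50-field document size and the function name are made up for illustration.

```python
def all_fields_correct_rate(per_field_accuracy: float, fields_per_doc: int) -> float:
    """Chance that every field in one document is read correctly.

    Assumption (not from the thread): errors on different fields are
    independent, so the per-field accuracies simply multiply.
    """
    if not 0.0 <= per_field_accuracy <= 1.0:
        raise ValueError("per_field_accuracy must be between 0 and 1")
    if fields_per_doc < 0:
        raise ValueError("fields_per_doc must be non-negative")
    return per_field_accuracy ** fields_per_doc


if __name__ == "__main__":
    # Hypothetical 50-field document at the three reliability levels mentioned.
    for accuracy in (0.999, 0.99, 0.98):
        rate = all_fields_correct_rate(accuracy, fields_per_doc=50)
        print(f"{accuracy:.1%} per field -> {rate:.1%} of documents fully correct")
```

Under those assumptions the three levels work out to roughly 95%, 60%, and 36% of documents coming through with no errors at all, which is why the same OCR engine can be fine for one product and unusable for another.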
... unavoidable "human in the loop" - depends imo.
From the comments here, it certainly seems that for general OCR it's not up to snuff yet. Luckily, I don't have great ambitions.
I can see this working for me with just a little care in upfront preprocessing, now that I know where it falls over. It casually skips portions of the document, and misses certain lines consistently. Knowing that, I can do a bit of massaging, feed it what I know it likes, and then reassemble.
I found in testing that it failed consistently at certain parts, but where it worked, it worked extremely well in contrast to other methods/services that I've been using.
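For what it's worth, the "massage, feed, reassemble" workflow might look something like this minimal sketch. The `ocr` callable is a stand-in for whatever engine or service is being tested, and `regions` is assumed to be a list of already-cropped pieces in reading order; both are hypothetical and not any particular tool's API.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Region = TypeVar("Region")


def ocr_in_pieces(
    regions: Sequence[Region],
    ocr: Callable[[Region], str],
) -> Tuple[str, List[int]]:
    """Run `ocr` on each region separately and stitch the text back together.

    Returns the reassembled text plus the indices of regions that came
    back empty, so silently skipped portions can be checked by a human.
    """
    parts: List[str] = []
    skipped: List[int] = []
    for index, region in enumerate(regions):
        text = ocr(region).strip()
        if not text:
            # The engine returned nothing for this piece: flag it for review.
            skipped.append(index)
            continue
        parts.append(text)
    return "\n".join(parts), skipped


if __name__ == "__main__":
    # Fake "engine" for demonstration: upper-cases its input.
    text, skipped = ocr_in_pieces(["abc", "   ", "def"], lambda r: r.upper())
    print(text)
    print("needs manual review:", skipped)
```

Feeding small pieces one at a time makes the failures visible per piece, instead of having them disappear inside one long output.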