Hi HN! I've spent a couple of months fiddling with OCR and wanted to share some of my findings.
The approach I share here (fine-tuning recent deep learning models) is the first one that's gotten me anything resembling high-quality OCR on these particular noisy historical documents. OCRing these has been something of a white whale for me for several years (albeit a white whale I have spent comparatively little time on).
At this point I think I am reasonably competent in OCR, but no expert... Curious for your thoughts.