Ask HN: OCR image pre-processing resources for beginners?
2 points by Curiositry on April 27, 2023
I'm using Tesseract 5 to do optical character recognition on (typewritten) scanned documents, and the output quality is mediocre, despite decent image quality.

Could anyone point me to semi-automated tools for pre-processing scanned pages to improve OCR accuracy?

I have run across scantailor-advanced, unpaper, and textcleaner, but their settings are fairly involved, and I haven't found a beginner-friendly blog post or script that suggests reasonable default settings.
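
To make "pre-processing" concrete, here is a minimal sketch of one step commonly applied before OCR: binarization with Otsu's threshold, in plain Python. It is not taken from scantailor-advanced, unpaper, or textcleaner, and it is not a recommended default for any of them. The function names `otsu_threshold` and `binarize`, and the representation of an image as a list of rows of 8-bit gray values, are assumptions made for illustration.

```python
def otsu_threshold(pixels):
    """Return Otsu's threshold for an iterable of 8-bit gray values (0-255).

    Pixels <= threshold are treated as one class (dark), the rest as the
    other (light). The threshold maximizes the between-class variance.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_dark = 0      # number of pixels with value <= t
    sum_dark = 0    # sum of gray values of those pixels
    for t in range(256):
        w_dark += hist[t]
        if w_dark == 0:
            continue            # no dark pixels yet
        w_light = total - w_dark
        if w_light == 0:
            break               # every pixel is already in the dark class
        sum_dark += t * hist[t]
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        # Between-class variance, up to a constant factor of 1 / total**2.
        var_between = w_dark * w_light * (mean_dark - mean_light) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(image):
    """Map a grayscale image (list of rows) to pure black (0) and white (255)."""
    t = otsu_threshold(p for row in image for p in row)
    return [[255 if p > t else 0 for p in row] for row in image]


if __name__ == "__main__":
    # Tiny synthetic "scan": dark ink values next to light paper values.
    page = [[30, 40, 200],
            [35, 210, 220]]
    for row in binarize(page):
        print(row)
```

A global threshold like this assumes fairly even lighting across the page; scans with shadows or stained paper usually need a local (adaptive) threshold instead, which is part of what the tools above expose through their settings.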

