1. Pre-process PDF images to detect letters better?
2. Use LLMs to spell/grammar check and perhaps even auto-complete missing pieces?
3. Employ rich text to capture style (e.g. lexical.dev)?
Unsure whether it is feasible to bundle all of that up for the web.
See also: https://github.com/RajSolai/TextSnatcher / https://github.com/VikParuchuri/surya
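For idea 1, a minimal sketch of one common pre-processing step, assuming the PDF page has already been rendered to raw RGBA pixels (the layout a canvas `ImageData.data` uses). It converts to grayscale and binarizes with Otsu's threshold. The function names are made up for illustration and do not come from TextSnatcher, surya, or any OCR library. Whether this actually helps a given OCR engine is something to measure, not assume.

```typescript
// Grayscale + Otsu binarization on raw RGBA bytes.
// All names below are hypothetical helpers, not a library API.

/** Convert RGBA bytes to one luminance byte per pixel (Rec. 601 weights). */
function toGrayscale(rgba: Uint8ClampedArray): Uint8Array {
  const gray = new Uint8Array(Math.floor(rgba.length / 4));
  for (let i = 0; i < gray.length; i++) {
    const r = rgba[4 * i];
    const g = rgba[4 * i + 1];
    const b = rgba[4 * i + 2];
    gray[i] = Math.round(0.299 * r + 0.587 * g + 0.114 * b);
  }
  return gray;
}

/** Otsu's method: pick the threshold that maximizes between-class variance. */
function otsuThreshold(gray: Uint8Array): number {
  const hist = new Float64Array(256);
  for (let i = 0; i < gray.length; i++) {
    hist[gray[i]] += 1;
  }
  let sumAll = 0;
  for (let t = 0; t < 256; t++) {
    sumAll += t * hist[t];
  }

  const total = gray.length;
  let sumDark = 0;
  let countDark = 0;
  let bestVariance = -1;
  let bestThreshold = 0;
  for (let t = 0; t < 256; t++) {
    countDark += hist[t];
    sumDark += t * hist[t];
    if (countDark === 0) continue; // no pixels at or below t yet
    const countLight = total - countDark;
    if (countLight === 0) break; // every pixel is already in the dark class
    const meanDark = sumDark / countDark;
    const meanLight = (sumAll - sumDark) / countLight;
    const diff = meanDark - meanLight;
    const variance = countDark * countLight * diff * diff;
    if (variance > bestVariance) {
      bestVariance = variance;
      bestThreshold = t;
    }
  }
  return bestThreshold;
}

/** Pixels at or below the threshold become ink (0), the rest paper (255). */
function binarize(gray: Uint8Array, threshold: number): Uint8Array {
  const out = new Uint8Array(gray.length);
  for (let i = 0; i < gray.length; i++) {
    out[i] = gray[i] <= threshold ? 0 : 255;
  }
  return out;
}
```

This part is plain arithmetic on typed arrays, so it should run in a browser without extra dependencies. The OCR model itself is the heavier piece to bundle.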
I would really want human review. Remember the Xerox copiers that silently changed digits because their JBIG2 compression was being clever?
Not sure how difficult it is to run it in the browser, though.