Wow, this is promising. I tried it on a few poorly scanned papers I had lying about. A few observations:

1. Pre-process PDF images to detect letters better?

2. Use LLMs to spell/grammar check and perhaps even auto-complete missing pieces?

3. Employ rich text to capture style (e.g., lexical.dev)?

Unsure whether it's feasible to bundle all of that up for the web.
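For the first suggestion, one common pre-processing step is to convert the scan to grayscale and binarize it so the letters stand out cleanly from the paper. The sketch below uses Otsu's global thresholding, which is my own pick and not something the tool is known to do. `to_gray`, `otsu_threshold` and `binarize` are hypothetical names, and it's plain Python over nested lists so the loop structure is easy to follow; in a browser the same loops would run over a canvas's pixel buffer.

```python
from typing import List, Tuple


def to_gray(rgb: List[List[Tuple[int, int, int]]]) -> List[List[int]]:
    """Convert rows of (r, g, b) pixels to 0-255 luma using the ITU-R BT.601 weights."""
    return [[min(255, round(0.299 * r + 0.587 * g + 0.114 * b)) for r, g, b in row] for row in rgb]


def otsu_threshold(pixels: List[int]) -> int:
    """Return the 0-255 cutoff that maximises between-class variance (Otsu's method)."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    sum_bg = 0      # running sum of intensities at or below the candidate cutoff
    weight_bg = 0   # running count of pixels at or below the candidate cutoff
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_bg += hist[t]
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        var_between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:  # strict '>' keeps the lowest cutoff on ties
            best_var, best_t = var_between, t
    return best_t


def binarize(gray: List[List[int]]) -> List[List[int]]:
    """Map dark pixels (ink) to 0 and light pixels (paper) to 255."""
    t = otsu_threshold([p for row in gray for p in row])
    return [[255 if p > t else 0 for p in row] for row in gray]


print(binarize([[30, 40, 200], [210, 35, 220]]))  # [[0, 0, 255], [255, 0, 255]]
```

A single global cutoff like this works when the lighting is even across the page. Unevenly lit scans usually need a local (adaptive) threshold instead.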

See also: https://github.com/RajSolai/TextSnatcher / https://github.com/VikParuchuri/surya




> Use LLMs to spell/grammar check and perhaps even auto-complete missing pieces?

I would really want human review. Remember that copier that changed digits because it was being clever with compression?
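One way to keep a human in the loop is to diff the raw OCR text against the LLM-corrected text word by word and show every change to a reviewer, with changes that touch digits flagged separately since that's exactly the copier failure mode. This is a minimal sketch that assumes the corrected text comes from some LLM step not shown here; `flag_changes` is a hypothetical helper built on the standard library's `difflib`.

```python
import difflib
from typing import List, Tuple


def flag_changes(ocr_text: str, corrected_text: str) -> List[Tuple[str, str, bool]]:
    """Return (before, after, touches_digits) for every word span the corrector changed."""
    before = ocr_text.split()
    after = corrected_text.split()
    # Compare word sequences rather than characters so each change reads as a phrase.
    matcher = difflib.SequenceMatcher(a=before, b=after, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        old = " ".join(before[i1:i2])
        new = " ".join(after[j1:j2])
        # Any digit on either side is treated as high-risk and worth a closer look.
        touches_digits = any(ch.isdigit() for ch in old + new)
        changes.append((old, new, touches_digits))
    return changes


print(flag_changes("Total: 1,280 units shipped", "Total: 1,230 units shipped"))
# [('1,280', '1,230', True)]
```

A reviewer then only has to look at the flagged spans rather than reread the whole page.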


I've been trying out alternative versions of this that pass images through to e.g. the Claude 3 vision models, but they're harder to share with people because they need an API key!


In case you wanted to add a pre-processing step, I found this ImageMagick script useful: https://www.fmwconcepts.com/imagemagick/textcleaner/index.ph...

Not sure how difficult it is to run it in the browser, though.
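This is not a port of textcleaner, which has many more options, but one related technique that copes with uneven lighting in bad scans is local-mean (adaptive) thresholding: a pixel counts as ink if it is noticeably darker than its own neighbourhood. The sketch below uses an integral image so each window sum costs four lookups. `adaptive_threshold` and its `radius`/`offset` defaults are hypothetical choices. It only needs plain arrays, so it could plausibly be ported to JavaScript over a canvas's pixel data.

```python
from typing import List


def adaptive_threshold(gray: List[List[int]], radius: int = 7, offset: int = 10) -> List[List[int]]:
    """Ink (0) where a pixel is more than `offset` darker than its local mean, else paper (255)."""
    h, w = len(gray), len(gray[0])
    # Integral image with a zero row/column of padding: ii[y][x] = sum of gray[0:y][0:x].
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += gray[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    out = []
    for y in range(h):
        y0, y1 = max(0, y - radius), min(h, y + radius + 1)
        row = []
        for x in range(w):
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            area = (y1 - y0) * (x1 - x0)
            # Sum of the window clipped to the image, from four integral-image lookups.
            total = ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]
            row.append(0 if gray[y][x] < total / area - offset else 255)
        out.append(row)
    return out


print(adaptive_threshold([[100, 100, 40, 100, 100]], radius=1))  # [[255, 255, 0, 255, 255]]
print(adaptive_threshold([[220, 220, 150, 220, 220]], radius=1))  # [[255, 255, 0, 255, 255]]
```

In that example the ink in the bright region (150) is lighter than the paper in the dim region (100), so no single global cutoff could separate both, while the local threshold handles each region on its own terms.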


FYI, the cert is expired.