Hey everyone! Today we're launching the stable release of Zerox, our open source OCR tool we've been building at OmniAi.
This started out as a weekend hack with gpt-4-mini, using the very basic strategy of "just ask the ai to ocr the document". But this turned out to be better performing than our current implementation of Unstructured/Textract. At pretty much the same cost.
In particular, we've seen the vision models do a great job on charts, infographics, and handwritten text. Documents are a visual format after all, so a vision model makes sense!
I posted the first experiments on HN, and since then, we've had some great contributors who have helped turn this into a full package. We have two versions now:
- pip package [https://pypi.org/project/py-zerox/]
- npm package [https://www.npmjs.com/package/zerox]
Next steps for us are working on building an open source dataset for fine tuning. We've seen some early success with a charts=>markdown fine tuning data set, and excited to keep building.
Github: https://github.com/getomni-ai/zerox
You can try out a hosted version here: https://getomni.ai/ocr-demo
It'd be great if your hosted version would also accept a URL to a PDF and give a permalink to the result as well (if you're looking for upgrades)