Hacker News new | past | comments | ask | show | jobs | submit login
Show HN: Zerox v1 – Document OCR with GPT-vision (getomni.ai)
7 points by themanmaran 7 months ago | hide | past | favorite | 3 comments
Hey everyone! Today we're launching the stable release of Zerox, our open source OCR tool we've been building at OmniAi.

This started out as a weekend hack with gpt-4-mini, using the very basic strategy of "just ask the ai to ocr the document". But this turned out to be better performing than our current implementation of Unstructured/Textract. At pretty much the same cost.

In particular, we've seen the vision models do a great job on charts, infographics, and handwritten text. Documents are a visual format after all, so a vision model makes sense!

I posted the first experiments on HN, and since then, we've had some great contributors who have helped turn this into a full package. We have two versions now:

- pip package [https://pypi.org/project/py-zerox/] - npm package [https://www.npmjs.com/package/zerox]

Next steps for us are working on building an open source dataset for fine tuning. We've seen some early success with a charts=>markdown fine tuning data set, and excited to keep building.

Github: https://github.com/getomni-ai/zerox

You can try out a hosted version here: https://getomni.ai/ocr-demo




It's interesting how it ignores things like headers and footers. LLMs have an edge there in "deciding" whether to include something in the output or not.

It'd be great if your hosted version would also accept a URL to a PDF and give a permalink to the result as well (if you're looking for upgrades)


I've noticed the same "deciding" what to include issues. Despite explicit instructions in the prompt to include all text on the page.

This is one of the items that can hopefully be resolved with fine tuning.


I thought it was a big upgrade. Comparing Zerox w/ Unstructured on the first 5 pages of [this datasheet](https://www.ti.com/lit/ds/symlink/lm5117.pdf); zerox gave me what I wanted, and Unstructured gave me a bunch of extra junk that was harder to sort through at the top




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: