How does this compare with commercial OCR APIs on a cost per page?

yigitkonur35 · 2024-09-22T06:10:17.000000Z

It is a lot cheaper! While cost-effectiveness may not be the primary advantage, this solution offers superior accuracy and consistency. Key benefits include precise table generation and output in easily editable markdown format.

Let's make some numbers game:

- Average token usage per image: ~1200 - Total tokens per page (including prompt): ~1500 - [GPT4o] Input token cost: $5 per million tokens - [GPT4o] Output token cost: $15 per million tokens

For 1000 documents: - Estimated total cost: $15

This represents excellent value considering the consistency and flexibility provided. For further cost optimization, consider:

1. Utilizing GPT4 mini: Reduces cost to approximately $8 per 1000 documents 2. Implementing batch API: Further reduces cost to around $4 per 1000 documents

I think it offers an optimal balance of affordability & reliability.

PS: One of the most affordable solution on market, cloudconvert charges ~30$ for 1K document (pdftron mode required 4 credits)

johndough · 2024-09-22T06:48:21.000000Z

> I think it offers an optimal balance of affordability & reliability.

It is hard to trust "you" when ChatGPT wrote that text. You never know which part of the answer is genuine and which part was made up by ChatGPT.

To actually answer that question: Pricing varies quite a bit depending on what exactly you want to do with a document.

Text detection generally costs $1.5 per 1k pages:

https://cloud.google.com/vision/pricing

https://aws.amazon.com/textract/pricing/

https://azure.microsoft.com/en-us/pricing/details/ai-documen...

yigitkonur35 · 2024-09-22T06:52:13.000000Z

You've got a point, but try testing it on a tricky example like the Apollo 17 document - you know, with those sideways tables and old-school writing. You'll see all three non-AI services totally bomb. Now, if you tweak it to batch = 1 instead of 10, you'll notice there's hardly any made-up stuff. When you dial down the temperature close to zero, it's super unlikely to see hallucinations with limited context. At worst, you might get some skipped bits, but that's not a dealbreaker for folks looking to feed PDFs into AI systems. Let's face it, regular OCR already messes up so much that...