Hey I wrote the Omni benchmark. I think you might be misreading the methodology ... | Hacker News


themanmaran 6 months ago | on: Show HN: OCR Benchmark Focusing on Automation

Hey, I wrote the Omni benchmark. I think you might be misreading the methodology on our side. Order on the page does not matter in our accuracy scoring. In fact, we only score on JSON extraction as the measure of accuracy, which is order independent.

We chose this method for all the same reasons you highlight. Text-similarity-based measurements are very subject to bias and don't correlate well with accuracy. I covered the same concepts in the "The case against text-similarity" [1] section of our writeup.

[1] https://getomni.ai/ocr-benchmark
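For readers following along, here is a rough sketch of why field-level JSON scoring is order independent. It is not the Omni benchmark's actual code: the `flatten` and `json_accuracy` helpers and the invoice fields are hypothetical, and real scoring may treat arrays, types, and missing fields differently.

```python
import json


def flatten(value, prefix=""):
    """Flatten nested JSON into a {path: leaf_value} dict (hypothetical helper)."""
    items = {}
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            items.update(flatten(child, path))
    elif isinstance(value, list):
        # Assumption: array elements are compared by index.
        for index, child in enumerate(value):
            items.update(flatten(child, f"{prefix}[{index}]"))
    else:
        items[prefix] = value
    return items


def json_accuracy(expected, predicted):
    """Fraction of ground-truth fields whose predicted value matches exactly."""
    expected_fields = flatten(expected)
    predicted_fields = flatten(predicted)
    if not expected_fields:
        return 1.0
    missing = object()  # sentinel so a missing field never equals a real value
    correct = sum(
        1
        for path, value in expected_fields.items()
        if predicted_fields.get(path, missing) == value
    )
    return correct / len(expected_fields)


# Made-up ground truth and two made-up extraction results.
truth = {"invoice_number": "A-17", "total": 42.5}
reordered = json.loads('{"total": 42.5, "invoice_number": "A-17"}')
one_wrong = {"invoice_number": "A-17", "total": 40.0}

score_reordered = json_accuracy(truth, reordered)
score_one_wrong = json_accuracy(truth, one_wrong)
```

Because fields are looked up by path rather than by position, the reordered result scores the same as the ground truth, while a wrong value lowers the score in proportion to the number of fields.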

kapitalx 6 months ago

I'll dig deeper into your code, but from scanning your post it does look like you are addressing this. That's great.

If I do find anything, I'll share it with you for comments before I publish the post.
