In case any scientist actually working on adaptive OCR is reading this: I was given a post-WWII newspaper archive (PDF scans, 1945-2006, German language) that I would like to OCR at the highest possible quality. Compute is not an issue; I've got an army of A100s available.
I played with OCR post-correction algorithms and invented a method myself in 1994, but haven't worked in that space since. Initial Tesseract and GPT-4o experiments were disappointing. Any pointers (papers, software) and collaboration suggestions welcome.
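To make the post-correction idea concrete, here is a toy sketch of the simplest dictionary-snapping variant (nothing like a real method; the word-list filename, the 0.8 cutoff, and the brute-force scan are all illustrative assumptions):

    # Toy dictionary-based OCR post-correction: snap each token to the
    # closest lexicon entry. Purely illustrative -- the word-list path,
    # the cutoff, and the O(N)-per-token scan are assumptions.
    import difflib

    with open("de_wordlist.txt", encoding="utf-8") as f:  # hypothetical German lexicon
        lexicon = [line.strip() for line in f if line.strip()]
    known = set(lexicon)

    def correct_token(token: str) -> str:
        if token in known:
            return token
        matches = difflib.get_close_matches(token, lexicon, n=1, cutoff=0.8)
        return matches[0] if matches else token  # leave unknown tokens untouched

    print(correct_token("Zeitumg"))  # -> "Zeitung", provided it is in the list

A real pipeline would use n-gram or language-model context rather than isolated tokens, but this is the shape of the post-correction step.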
(Tesseract managed to get 3 fields out of a damaged label, while PaddleOCR found 35, some of them barely readable even for a human taking the time to decipher them.)
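For anyone who wants to reproduce a comparison like that, the PaddleOCR side was roughly the following (the filename is illustrative, and the interface has shifted between releases, so treat this as a sketch of the pre-3.x API):

    # Rough PaddleOCR invocation (pre-3.x API, from memory -- check the
    # current docs, the interface has changed between releases)
    from paddleocr import PaddleOCR

    ocr = PaddleOCR(lang="german")             # models download on first use
    result = ocr.ocr("damaged_label.png")      # filename is illustrative

    for box, (text, confidence) in result[0]:  # one entry per detected text line
        print(f"{confidence:.2f}  {text}")

Printing the per-line confidence scores makes it easy to see which of the 35 fields the model itself considers shaky.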
When I was doing OCR on some screenshots last year I managed to get it done with Tesseract, but only barely. While looking for alternatives later on, I found something called Surya on GitHub, which people claim does a lot better and looks quite promising. I've had it bookmarked for testing forever but haven't gotten around to actually trying it. Maybe worth a try.
Would love to give this a shot with Pulse! Feel free to reach out to me at ritvik [at] trypulse [dot] ai; I'd be very curious to give these a run. In general, I'm happy to give some advice on algorithms/models to fine-tune for this task.
Thanks! Re: 18th-19th century cursive: while we handle historical handwriting, we can't guarantee specific error rates; each document's accuracy varies with condition, writing style, and preservation. Happy to run test samples to check.
Feel free to send over sample docs: sid [at] trypulse [dot] ai
Please contact archive.org about adopting this digital archive once it exists (they also have a bad habit of accepting physical donations, if you are nearby).