I did my share of OCR funny business by running the Firebase OCR SDK inside an Android app acting as a web server, through nested virtualization, back in 2018. Nothing at the time beat its accuracy and throughput for something that ran offline once the weights were fetched into the container.
Looking forward to GPU support for Workers. Cloudflare announced they were working on it in 2021 [1], but it doesn't seem to be generally available yet; they still have a signup page for notifications [2].
I know other companies have struggled with demand, so maybe they're doing it on an invite basis.
I was under the assumption that, because of how Cloudflare works, it has to be globally available.
That would mean support in every data center and easily expandable capacity.
They can't just make it available in one region and see how it goes.
I like how Cloudflare works, but this use case for them (on the edge) seems more difficult to plan. It's not just the tech; it's the infrastructure in this case.
Agreed, but this is a blocker for anyone seriously considering using the product. CPU-only inference simply isn't good enough for anything besides toy workloads. If they're waiting for people to use Constellation before investing in GPU support, nobody will use Constellation because it doesn't have GPUs, so they'll never end up investing in it, and so on…
- if you have infrequent access patterns you don't pay for the time it isn't used; and
- if you have huge bursts (say your AI project gets on the front page of Hacker News) capacity automatically scales with demand.
In return, each compute-second costs more than on a normal VPS, so there is some threshold beyond which doing it yourself is more cost-effective.
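That threshold is simple arithmetic. A sketch in Python, stressing that both prices below are hypothetical placeholders, not real Cloudflare or VPS pricing:

```python
# Break-even between a flat-rate VPS and pay-per-use edge compute.
# Both prices are HYPOTHETICAL placeholders, not real pricing.
VPS_MONTHLY_USD = 20.0                 # assumed flat VPS cost per month
EDGE_USD_PER_COMPUTE_SECOND = 0.00002  # assumed edge cost per compute-second

def break_even_compute_seconds(vps_monthly: float, edge_per_sec: float) -> float:
    """Monthly compute-seconds at which the flat-rate VPS becomes cheaper."""
    return vps_monthly / edge_per_sec

threshold = break_even_compute_seconds(VPS_MONTHLY_USD, EDGE_USD_PER_COMPUTE_SECOND)
print(f"Break-even at {threshold:,.0f} compute-seconds/month")
```

Below that threshold, pay-per-use wins; above it, the always-on VPS does.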
The benefit of "edge" here is imho that it works together with other cloudflare "edge" stuff. Not hugely useful for OCR, but imagine your blog on Cloudflare Pages has a contact form (with a CF Pages Function) and you want an AI spam filter; or bot detection that is updated each time a page is visited, or in the auth function you have offloaded to a CF Worker; or if you want to enhance your email-triggered worker with AI.
It depends on your setup and use cases. There are three major considerations:
* What language are you trying to OCR? And is it just natural-language text, or also things like math symbols?
* Do you have a GPU or not?
* Are you trying to OCR handwriting or typed words?
I explored OCRing English documents from the 1960s that were primarily typed, though with some handwriting. I tried PaddleOCR, TrOCR, Tesseract, EasyOCR, and kerasOCR on the FOSS side, and Google, Amazon, and Microsoft on the paid side.
To be clear, the paid solutions beat the FOSS ones hands down, no question. Among the FOSS options, TrOCR was the best for both typed and handwritten text. For typed text it was closely followed by Tesseract, but for handwriting TrOCR was by far the best, with all the others basically worthless. However, TrOCR took ~200x longer even on GPU than Tesseract on CPU (Tesseract is fast, even more so if you parallelize it). Tesseract isn't the best, but it's the best all-around; it's the one the Internet Archive uses.
Need to write up a blog post on this. And docTR looks interesting; I'm going to check that out.
FWIW, I just tried EasyOCR on some sans-serif text and Tesseract5 absolutely blew it out of the water. The only thing Tesseract got wrong that EasyOCR (sometimes) got right was uppercase I: it was recognized pretty much 100% of the time as a vertical bar ("|"), but since my text of interest is extremely unlikely to contain any vertical bar characters, a simple sed post-processing stage fixed that.
- Tesseract5 demolished EasyOCR on paragraph detection, getting it 100% right on the 10 pages I checked. EasyOCR missed most of the paragraph breaks.
- Tesseract got most of the punctuation correct; EasyOCR only got apostrophes and two double quotes (out of 14) correct. Every single period, comma, exclamation mark, and hyphen was missing or wrong, as were most of the double quotes. Some question marks were recognized, but with garbage after them.
- In general, EasyOCR seems to just add in closing square brackets ("]") where none exist.
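The sed stage mentioned above is a one-character substitution; in Python it could look like this, safe only under the stated assumption that real vertical bars never occur in the source text:

```python
# Post-process OCR output: Tesseract read uppercase "I" as "|", so when
# the source text can't legitimately contain vertical bars, a blanket
# substitution (the equivalent of `sed 's/|/I/g'`) is safe.
def fix_vertical_bars(text: str) -> str:
    return text.replace("|", "I")

print(fix_vertical_bars("|t was |van's idea."))  # → It was Ivan's idea.
```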
Why are Cloudflare Workers limited to languages that compile to Wasm? Can't they use a container of some sort to isolate the code? Many languages can be compiled to Wasm, but the experience is often quirky.
Containers have isolation problems and thus require a further layer of isolation (microVMs like AWS's Firecracker), which slows startup times. Cloudflare Workers are mostly intended for edge-ish scenarios, so they need fast startup times.
The "Why Edge Compute?" section compares running OCR on a Cloudflare Worker against running it on-device, but the first two advantages (low latency and low cost) also apply to running on-device. Is that a mistake in the article, or am I misunderstanding?
I apologise, I probably could have been clearer. One key advantage is being able to improve the model without needing to re-release via the App Store/Play Store. Our app (Blinq) is also native so we'd need to prepare a model to natively run on both platforms.
I use the words "Edge" and "Microservice" interchangeably in my head. If something doesn't make sense as a microservice, then it doesn't make sense for the Edge.
No. It’s running on the same machine. Our goal is distributed ML/AI in line with our usual mechanism of running everything, everywhere. That is powerful because it means anything we build scales with our network.