OCR at Edge on Cloudflare Constellation (willhackett.com)
80 points by _bhrz on July 3, 2023 | 30 comments



I did my share of OCR funny business by running the Firebase OCR SDK inside an Android app acting as a web server, through nested virtualization, back in 2018. Nothing at that time beat its accuracy and throughput for something that ran offline once the weights were fetched into the container.


Looking forward to GPU support for workers. Cloudflare announced they were working on it in 2021 [1], but it doesn't seem generally available yet; they still have a signup page for notification [2].

I know other companies have struggled with demand, so maybe they're doing it on an invite basis.

[1]: https://blog.cloudflare.com/workers-ai/ [2]: https://www.cloudflare.com/nvidia-workers/


Good things come to those who wait :-)


I was under the assumption that, because of how cloudflare works, it has to be globally available.

That would mean support in every DC and easily expandable capacity.

They can't just make it available in one region and then see how it goes.

I like how Cloudflare works, but this use case (on the edge) seems more difficult for them to plan. It's not just the tech; it's the infrastructure in this case.

Just my 2 cents


They probably just want to see how many people would sign up if they had it, before they start building something nobody wants to pay for.


Agreed, but this is a blocker for anyone seriously considering using the product. CPU-only inference simply isn’t good enough for anything besides toy workloads. If they’re waiting for people to use Constellation before investing in GPU support, nobody will use Constellation because it doesn’t have GPU support, so they’ll never end up investing in it, and so on…


I think people will be surprised how far CPU optimization will go for specialized inference. An example of the progress being made - http://ggml.ai/


They use it themselves to route traffic, which probably is not GPU intensive.

I've run face detection on CPU and it takes about 1 second, and that's probably a pretty intensive "action".

Plenty of use cases don't require a GPU.

And plenty that require one too though.


I don't get it, how is this useful?

- Takes 1.5 seconds to run, so there goes the "edge" benefit.

- And if it takes that long, it would cost more than having it running on a proper instance/vps.


It's "serverless", so

- if you have infrequent access patterns you don't pay for the time it isn't used; and

- if you have huge bursts (say your AI project gets on the homepage of hacker news) capacity automatically scales with demand.

In return, each compute-second costs more than on a normal VPS, so there is some threshold beyond which doing it yourself is more cost-effective.
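As a rough sketch of that break-even point (all prices below are made-up assumptions for illustration, not Cloudflare's or any VPS provider's actual rates):

```python
# All prices below are hypothetical, for illustration only.
SERVERLESS_PER_SECOND = 0.00005  # $ per compute-second (assumed)
VPS_PER_MONTH = 20.0             # $ flat rate for an always-on VPS (assumed)

def monthly_serverless_cost(busy_seconds: float) -> float:
    """Serverless bills only for the compute-seconds actually used."""
    return busy_seconds * SERVERLESS_PER_SECOND

def break_even_busy_seconds() -> float:
    """Usage above this many busy seconds/month makes the flat-rate VPS cheaper."""
    return VPS_PER_MONTH / SERVERLESS_PER_SECOND

# With these assumed rates the threshold is 400,000 busy seconds/month,
# i.e. roughly 15% utilization of one always-on machine.
```

Below the threshold the pay-per-use model wins; sustained load above it favors the flat-rate instance.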

The benefit of "edge" here is imho that it works together with other cloudflare "edge" stuff. Not hugely useful for OCR, but imagine your blog on Cloudflare Pages has a contact form (with a CF Pages Function) and you want an AI spam filter; or bot detection that is updated each time a page is visited, or in the auth function you have offloaded to a CF Worker; or if you want to enhance your email-triggered worker with AI.


Using the most expensive service at scale to handle "huge bursts" or "spam filters" seems like a fantastic idea!


Sure, for traffic spikes it makes sense.


What's the best for OCR currently that I could deploy for myself? Like time I tried Tesseract 5 I wasn't too impressed.


It depends on your setup and use cases. There are three major considerations:

* What language are you trying to OCR? And only language, or also things like math symbols?

* Do you have a GPU or not?

* Are you trying to OCR handwriting or typed words?

I explored OCRing English documents from the 1960s that were primarily typed, with some handwriting. I tried out PaddleOCR, TrOCR, Tesseract, EasyOCR, and kerasOCR for FOSS, and then Google, Amazon, and Microsoft for paid.

To be clear, the paid solutions beat the FOSS ones hands down, no question. Among the FOSS options, I found TrOCR was the best for both typed and handwritten text. For typed text it was closely followed by Tesseract, but for handwriting TrOCR was by far the best, with all the others being basically worthless. However, TrOCR took ~200x longer, even on GPU, than Tesseract on CPU (Tesseract is fast, even more so if you parallelize it). Tesseract isn't the best at any one thing, but it's the best all-around; it's the one the Internet Archive uses.

I need to write up a blog post on this. Also, docTR looks interesting; I'm going to check that out.


EasyOCR is a popular project if you are in an environment where you can run Python and PyTorch (https://github.com/JaidedAI/EasyOCR). Other open source projects of note are PaddleOCR (https://github.com/PaddlePaddle/PaddleOCR) and docTR (https://github.com/mindee/doctr).


FWIW, I just tried EasyOCR on some sans-serif text and Tesseract 5 absolutely blew it out of the water. The only thing Tesseract got wrong that EasyOCR (sometimes) got right was uppercase "I"s, which Tesseract recognized pretty much 100% of the time as vertical bars ("|"); but since my text of interest is extremely unlikely to contain any vertical-bar characters, a simple sed post-processing stage fixed that.

- Tesseract 5 demolished EasyOCR on paragraph detection, getting it 100% right on the 10 pages I checked. EasyOCR missed most of the paragraph breaks.

- Tesseract got most of the punctuation correct; EasyOCR only got apostrophes and two double quotes (out of 14) correct. Every single period, comma, exclamation mark, and hyphen was missing or wrong, as were most of the double quotes. Some question marks were recognized, but with garbage after them.

- In general, EasyOCR seems to just add in closing square brackets ("]") where none are present.
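The bar-to-"I" cleanup mentioned above (done with sed in the comment) can be sketched in Python; the blanket substitution is only safe under the stated assumption that the text contains no real vertical bars:

```python
def fix_ocr_bars(text: str) -> str:
    """Map mis-recognized vertical bars back to uppercase 'I'.

    Only safe when the source text cannot contain literal '|' characters.
    """
    return text.replace("|", "I")

fix_ocr_bars("|t was a dark and stormy night.")
# -> "It was a dark and stormy night."
```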


Why are Cloudflare Workers limited to languages that compile to Wasm? Can't they use a container of some sort to isolate the code? Many languages can be compiled to Wasm, but it's often quirky.


Cloudflare Workers run as "isolates"; I think of them as "browser tabs." This post helps explain it better: https://blog.cloudflare.com/cloud-computing-without-containe...


Containers have isolation problems, thus requiring a further layer of isolation (microVMs like AWS's Firecracker), which slows startup times. Cloudflare Workers are mostly intended for edge-ish scenarios, so they need to have fast startup times.


No containers. It’s a V8 isolate, i.e. Chrome and Node’s JS engine. Wasm comes included.


The "Why Edge Compute?" section compares running OCR on a Cloudflare Worker against on-device but the first two advantages also apply to running on-device (low latency and low cost). Is this a mistake in the article or do I not understand correctly?


I apologise, I probably could have been clearer. One key advantage is being able to improve the model without needing to re-release via the App Store/Play Store. Our app (Blinq) is also native so we'd need to prepare a model to natively run on both platforms.




I use the word Edge and Microservice in my head interchangeably. If something does not make sense for a microservice, then it does not make sense for the Edge.


Isn't Constellation just relaying the inference job from the worker to a more powerful machine?


No. It’s running on the same machine. Our goal is distributed ML/AI in line with our usual mechanism of running everything, everywhere. That is powerful because it means anything we build scales with our network.


This is the key reason we use Cloudflare.

I’m obsessed. Very keen to try PubSub next.


Is there a reason you didn't include any non-alphanumeric characters in your dictionary?


You can do it for free with tesseract.js.



