

Ask HN: How Do You Create An OCR Web Service? - HNY 2014 HN - davidsmith8900

HNY 2014 HN. I'm trying to create an OCR web service with Tesseract OCR, but I don't know how to get started. What should I do? Do I need a dedicated hosting service or a virtual private server?
======
tgflynn
There are a number of existing OCR web services, both free and paid (try
Googling "online ocr"). I think some of them use Tesseract. What advantages
would your service offer compared to these?

~~~
davidsmith8900
Thank you for the response, tgflynn, and happy new year. I just want to
understand the architecture of such a service and how it works. If somebody
can point me to a paper or a book, I would be glad to look at it.

~~~
tgflynn
I doubt there are any books specifically on the subject of how to build an
online OCR service. If you're interested in the underlying technology of OCR,
or in the orthogonal question of the mechanics of building web applications in
general, you may find something.

The most basic workflow would be to implement a simple web app that accepts an
uploaded image file, saves the file to local disk, runs the Tesseract program
on it, and generates a new web page that displays Tesseract's text output to
the user.
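
A minimal sketch of that workflow in Python, using only the standard library.
It assumes the `tesseract` command-line program is installed and on the PATH,
and that the client POSTs the raw image bytes as the request body rather than
using a browser upload form. The names `run_ocr`, `OcrHandler` and `serve` are
made up for illustration, and nothing here is hardened for production use.

```python
import html
import os
import subprocess
import tempfile
from http.server import BaseHTTPRequestHandler, HTTPServer


def run_ocr(image_bytes, command=("tesseract",)):
    """Save the upload to local disk, run the OCR program on it, return its text."""
    with tempfile.TemporaryDirectory() as workdir:
        image_path = os.path.join(workdir, "upload")
        with open(image_path, "wb") as f:
            f.write(image_bytes)
        # Passing "stdout" as the output base makes the tesseract CLI print
        # the recognized text instead of writing it to a .txt file.
        result = subprocess.run(
            [*command, image_path, "stdout"],
            capture_output=True,
            text=True,
            check=True,
        )
    return result.stdout


class OcrHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Assumption: the request body is the raw image, not a multipart form.
        length = int(self.headers.get("Content-Length", 0))
        image_bytes = self.rfile.read(length)
        try:
            text, status = run_ocr(image_bytes), 200
        except (OSError, subprocess.CalledProcessError) as exc:
            text, status = f"OCR failed: {exc}", 500
        # Generate a small page that shows the OCR output to the user.
        page = "<html><body><pre>" + html.escape(text) + "</pre></body></html>"
        body = page.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Run the app on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), OcrHandler).serve_forever()
```

After calling `serve()`, something like
`curl --data-binary @scan.png http://127.0.0.1:8000/` should return the page
(`scan.png` is a placeholder file name). Each request blocks while Tesseract
runs, which is where the CPU cost comes from.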

Each query will use far more CPU than a request to a typical web application,
so if you want to scale you'll probably need some sort of distributed,
load-balanced architecture with multiple (possibly virtual) servers.
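
One possible shape for the dispatching side, as a rough sketch: because OCR
jobs vary a lot in cost, this picks whichever server currently has the fewest
jobs in flight instead of rotating blindly. The `LeastBusyBalancer` class and
the backend names are hypothetical, and a real deployment would more likely
use an off-the-shelf load balancer or a job queue.

```python
import threading
from contextlib import contextmanager


class LeastBusyBalancer:
    """Hand out the backend that currently has the fewest jobs in flight."""

    def __init__(self, backends):
        backends = list(backends)
        if not backends:
            raise ValueError("at least one backend is required")
        self._in_flight = {name: 0 for name in backends}
        self._lock = threading.Lock()

    @contextmanager
    def backend(self):
        # Choose and reserve a backend under the lock so that concurrent
        # requests do not all pile onto the same server.
        with self._lock:
            chosen = min(self._in_flight, key=self._in_flight.get)
            self._in_flight[chosen] += 1
        try:
            yield chosen
        finally:
            # Release the slot whether the job succeeded or failed.
            with self._lock:
                self._in_flight[chosen] -= 1
```

A front-end handler would wrap each forwarded job in
`with balancer.backend() as host:` and send the image to `host` inside the
block, so the count drops again as soon as the job finishes.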

If you don't care about scaling and just want to test something out, you'll at
least need a hosting provider that allows you to compile and install software.
That probably means a virtual server like EC2 or something similar.

~~~
davidsmith8900
Okay, thank you for the knowledge, tgflynn. I really appreciate your time and
patience.

