

Ask HN: Recommend solution for scanning forms? - dxjones

I am working on an application where I need to collect data from lots of regular people.  I am looking for an efficient, reliable, inexpensive alternative to "Scantron" (used for machine-readable multiple-choice exams).

I am hoping to use a Mac laptop with a portable scanner (http://www.fujitsu.ca/products/scansnap/s300m/) that scans pages to PDF.

I have complete freedom in how the data forms are designed, and I am looking for reliable (free? open source?) software that could capture the data from the (scanned) PDF and save it to a text file.

If anyone has any experience with something similar, I'd really appreciate hearing from you.  Any other links or suggestions would be appreciated too.
======
weaksauce
You probably have to roll your own version if you want it.
[http://answers.yahoo.com/question/index?qid=20090421184939AA...](http://answers.yahoo.com/question/index?qid=20090421184939AAx9aCj)

I haven't looked too far into this domain but I would imagine that it is not
too difficult to do on a smallish scale.

On the form itself, print a position marker in the top-left and bottom-right
corners; finding both in the scan gives you the orientation and scale of the
scanned image. Then do some image analysis at predefined positions on the
page. (Of course, you need to take the scale and orientation of the sheet
into account so that the offset vector to each question points in the right
direction and covers the right distance.)
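
A rough sketch of the idea in Python, assuming the two corner markers have already been located in the scan (the marker detection itself is not shown). The function names `make_page_transform` and `fill_ratio`, the coordinates, and the darkness threshold are all made up for illustration and do not come from any existing OMR tool. Two marker correspondences are enough to pin down scale, rotation, and translation, but not a mirrored page.

```python
# Hypothetical sketch: names and numbers are illustrative only.

def make_page_transform(template_tl, template_br, scanned_tl, scanned_br):
    """Return a function mapping template (x, y) to scanned-image (x, y).

    Points are treated as complex numbers, so the similarity transform
    (uniform scale + rotation + translation) is simply z' = a*z + b.
    """
    p1, p2 = complex(*template_tl), complex(*template_br)
    q1, q2 = complex(*scanned_tl), complex(*scanned_br)
    a = (q2 - q1) / (p2 - p1)  # encodes scale and rotation
    b = q1 - a * p1            # encodes translation

    def to_scan(x, y):
        z = a * complex(x, y) + b
        return z.real, z.imag

    return to_scan


def fill_ratio(image, cx, cy, radius, dark_threshold=128):
    """Fraction of dark pixels in a square window centred on (cx, cy).

    image is a list of rows of grayscale values (0 = black, 255 = white).
    """
    height, width = len(image), len(image[0])
    col, row = round(cx), round(cy)
    dark = total = 0
    for y in range(max(0, row - radius), min(height, row + radius + 1)):
        for x in range(max(0, col - radius), min(width, col + radius + 1)):
            total += 1
            dark += image[y][x] < dark_threshold
    return dark / total if total else 0.0
```

For each question you would map the template position of every bubble through `to_scan` and call `fill_ratio` there; a bubble whose ratio is well above the others is the one that was shaded in.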

edit: this may be what you are looking for:

<http://www.cs.uwaterloo.ca/~a3seth/udai/OMRProj/>

~~~
dxjones
Thanks. This link looks useful.

------
noodle
It's not quite what you're looking for, I think, but I've heard good things
about <http://www.pdftoword.com/>

~~~
dxjones
Looks interesting. But as you guessed, I am not looking for OCR (optical
character recognition).

I am looking for capturing which circle was filled in: A or B or C.

~~~
noodle
Well, that service does more than that, including trying to capture and
recreate images.

~~~
dxjones
Yes, except I want to move beyond the "image" of circle A shaded in and
circle B _not_ shaded in.

I want a CSV text file listing the captured data: 1,A 2,B 3,B 4,A etc.
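
A minimal sketch of that last step in Python, assuming some earlier stage has already measured how dark each circle is. The function name `answers_to_csv`, the `min_fill` cutoff, and the input layout (one list of fill fractions per question) are assumptions made up for illustration.

```python
import csv
import io


def answers_to_csv(fill_ratios, choices="ABC", min_fill=0.5):
    """Turn per-question fill fractions into 'question,answer' CSV text.

    fill_ratios holds one list per question, with one fraction (0.0 to 1.0)
    per choice. The darkest circle wins; if even that one is below
    min_fill, the answer is left blank.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for number, ratios in enumerate(fill_ratios, start=1):
        best = max(range(len(ratios)), key=ratios.__getitem__)
        answer = choices[best] if ratios[best] >= min_fill else ""
        writer.writerow([number, answer])
    return out.getvalue()


if __name__ == "__main__":
    measured = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.0, 0.7, 0.2], [0.95, 0.0, 0.0]]
    print(answers_to_csv(measured), end="")
```

A real form would also need a rule for questions where two circles are both shaded; this sketch simply picks the darker one.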

~~~
noodle
Again, as I mentioned, this isn't the out-of-the-box solution you're
looking for, but I think it could give you a good starting point for
something custom-written.

------
dxjones
Doing a little more online research, it seems like I am looking for a
free/open-source solution for OMR (optical mark recognition) using a generic
image scanner or PDF (instead of a specialized device that just scans OMR
forms).

I wonder if there are computer/machine vision hackers who might know where to
find (or put together) a good solution.

