
Back to the Future of Handwriting Recognition - jabagawee
https://jackschaedler.github.io/handwriting-recognition/
======
snowwrestler
This is a cool exploration of technology, and I don't want to take away from
that.

> The program was efficient enough to run in real-time on an IBM System/360
> computer, and robust enough to properly identify _90 percent_ of the symbols
> drawn by first-time users.

I just want to point out that 90% accuracy is, from a user's point of view,
_awful_ handwriting recognition performance. It means you will be correcting
on average about 10 words per paragraph! Even 99% accuracy is not
nearly good enough to give people a sense that the computer is good at
handwriting recognition.
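A rough back-of-envelope check of why per-symbol accuracy understates the pain at the word level, assuming independent per-symbol errors and an average word length of five letters (both simplifying assumptions, not figures from the article):

```python
# If each symbol is recognized independently with probability p, a word
# of n symbols comes out entirely correct with probability p**n.
def word_accuracy(char_accuracy, word_len=5):
    """Probability that an entire word is recognized correctly."""
    return char_accuracy ** word_len

for p in (0.90, 0.99):
    print(f"{p:.0%} per symbol -> {word_accuracy(p):.0%} per 5-letter word")
# 90% per symbol -> 59% per 5-letter word
# 99% per symbol -> 95% per 5-letter word
```

So at 90% per-symbol accuracy, roughly two of every five words need correcting; even at 99%, about one word in twenty is wrong.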

I also want to point out the difficulty and danger in interpreting strokes
when doing handwriting recognition.

In the last demo box, try writing a capital Y without lifting the pen. You'll
have to go "up and down" one or both upper branches. Because of this, the
recognizer will call it a K, A, or N even though it is obviously a Y when
you're done.

This demo is constrained to only using one stroke per letter, but systems that
permit multiple strokes still get into trouble when the strokes don't match
what they are expecting--for example if you draw an X using 4 individual
strokes outward from a central point.

This also happens with words. In Microsoft's handwriting recognition in Office
in the early 2000s, writing the letters of a word out of order completely
borked the recognition. For example writing "xample" and then going back and
adding an "e" at the beginning would not produce a recognized word of
"example."

My point with all of this is that there is a reason you probably don't do all
your computing with natural handwriting. It's a surprisingly difficult
problem. Users do not expect it to matter how they form letters and words on
the page. And they have very low tolerance for correcting computer mistakes.

~~~
defgeneric
> This demo is constrained to only using one stroke per letter, but systems
> that permit multiple strokes still get into trouble when the strokes don't
> match what they are expecting--for example if you draw an X using 4
> individual strokes outward from a central point.

Arguably, an X drawn this way should NOT be recognized as an X--that's not how
an X is spelled.

If the task is communicating with the computer, then recognition of the
gesture is a valid approach. Just as there are conventions regarding the
spelling of words, there are conventions involved in the formation of letters.
Why not use them? It would even seem incorrect to leave these out.

~~~
snowwrestler
The human convention of written language is to interpret the symbols after
they have been completed, not during the act of writing them.

A computer that interprets the behavior of writing, rather than the final
symbols, is going to violate user expectations at some point.

Why? Because people do not always write as linearly as you might expect,
especially when writing fast. They might drop or mis-write letters or words,
then go back and fix it. Or quickly jot down just enough letters to remind
themselves of what they heard, then go back and fill the rest in. A routine
that interprets actions in order is going to have a hard time with actions
that the user completes out of order.

~~~
1787
"human convention of written language" is a bit much. Stroke order is almost
as important as what the actual strokes are in the definition of a Chinese
character, for example. Of course unless you literally watch someone write you
observe the characters after they're written, but the most predictive latent
mental representation of a character does include an order component. I know
this because I made the mistake of memorizing many characters almost like
bitmaps and have had to go back and learn how to reliably write/read
handwritten characters.

~~~
snowwrestler
I don't know what to say other than that the entire purpose of written
language is to carry information between people who aren't in a position to
directly observe each other writing. (If they were, they could just talk and
would not need to write.)

~~~
1787
There exist counterexamples in the broader world. Historically in the
Sinosphere it was commonish that two people might share a command of written
Classical Chinese but not really be able to speak to each other. For a modern
example, consider the paper below:
[https://eric.ed.gov/?id=ED515291](https://eric.ed.gov/?id=ED515291)

------
rayiner
Handwriting recognition is a great example of technology whose development
seems to have plateaued before it became "good enough." Stroke-based
recognition has been in development for half a century now, but my iPad Pro
still makes errors at least a couple of times per line, which is enough to
make it pretty much useless unless you're writing only for your own later
consumption. That and voice recognition. It's shocking how bad Android and iOS
still are at that, even after decades of work on voice recognition technology.

------
pipio21
>I think it’s worth asking why anyone in their right mind should care about
mid-century handwriting recognition algorithms in 2016.

Lots of people care, especially in Asia (Chinese and Japanese). It is just
that the problem is incredibly hard.

We put 5 very smart people working on that for a year, and it was totally
impossible to meet people's expectations, especially people like doctors
taking notes fast (and ugly).

We thought that the market was instead in creating mindmaps or something, as
people could write slower and better.

But people write a double u and expect the computer to see an "m". With deep
learning it is possible, but extremely flimsy.

------
pkaye
This is kind of interesting. I had a thought about how to approach the
handwriting recognition problem a few years back, and surprisingly I thought
of this curvature-based approach also. I never implemented it (too lazy to
try...) but it's cool to see how well something like that might work.
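The core of a curvature-style feature can be sketched in a few lines. This is only a toy illustration (signed turning angles along a stroke), not the article's actual algorithm, and the sample stroke is made up:

```python
import math

def turning_angles(points):
    """Signed turn angle (radians) at each interior sample of a stroke.

    `points` is a list of (x, y) pen samples. Toy illustration of a
    curvature-style feature, not the article's actual algorithm.
    """
    angles = []
    for (x0, y0), (x1, y1), (x2, y2) in zip(points, points[1:], points[2:]):
        heading_in = math.atan2(y1 - y0, x1 - x0)
        heading_out = math.atan2(y2 - y1, x2 - x1)
        turn = heading_out - heading_in
        # Normalize into (-pi, pi] so turns are measured the short way round.
        while turn <= -math.pi:
            turn += 2 * math.pi
        while turn > math.pi:
            turn -= 2 * math.pi
        angles.append(turn)
    return angles

# A blocky "C" in screen coordinates (y grows downward), drawn without
# lifting the pen: right-to-left, down, then left-to-right.
c_stroke = [(1, 0), (0, 0), (0, 1), (1, 1)]
total = sum(turning_angles(c_stroke))
print(total)  # two right-angle turns of the same sign: total is -pi
```

The appeal of curvature as a feature is that translating or scaling the stroke leaves the angles unchanged, and a straight stroke gives all zeros.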

------
blattimwind
The linked demo is by far the most impressive thing I've seen all week. I wish
a certain Microsoft chart editor was as easy and unfinicky to use as this demo
from 1966 (52 years ago), and that's still one of the better editors out
there.

~~~
taneq
Comparing this with the Graffiti system on my old (2000-ish) Palm Pilot, this
is somewhat more reliable even on a first attempt than that was after I'd made
a concerted effort to learn it. Very cool!

Edit: I think where the Afterword says "inputting text with a stylus is likely
slower than touch typing", they're forgetting that we still don't have a
really acceptable way of inputting text on mobile devices. Swype and its ilk
are close, but still hamfisted at times.

------
symlock
I missed it the first time, but the article has linked source code
(github.com/jackschaedler/handwriting-recognition) for all the D3.js demos
that is worth a read.

------
interfixus
All this constant talk of AI and singularities and whatnot.

Reality check: Our machines do not yet accurately manage simple reading tasks.

------
watmough
I did something like this in Visual Basic and submitted it to PC PLUS in the
UK, back in the early '90s.

It was (yay!) published as recognit.bas (VB) and I'd be really happy if
someone still has a copy.

It recognized just numbers but the basis of operation was similar to the
linked article.

------
EliasY
I wonder if it would be possible to use Hinton's idea of local features (where
a 3 is recognized as an E in a 180 deg. rotation map and a W in a 90 deg.
rotation map) to make the recognition partially rotation invariant...

------
singularity2001
So much time spent on manual feature engineering that could be implicitly
picked up by RNNs.

