Dan Ingalls PARC Talk on Sanskrit and OCR (1980) [video]

reason-mr · on June 10, 2019

I was lucky enough to be at PARC at a time when some of the original PARC researchers, like Dan, Danny Bobrow, Stew Card, etc., were still around and working. I can only say that I reached the conclusion that the early web really did an end-run around a lot of the really cool stuff that came out of the earlier PARC work: core distributed systems work, real OOP, ubiquitous computing in the style envisioned by Weiser, collaborative MOOs, early ray tracing and computer graphics, data visualization, human factors, etc. And most of this stuff came from a time when joining PARC was more like a life-ticket to work on stuff you thought was interesting and cool, not necessarily stuff with commercial potential. It wasn't directed so much as managed at a cat-herding level. This dissipated in the later versions of the lab.

svat · on June 10, 2019

This is a nice talk, in two parts.

First, Daniel Ingalls (Sr), the Harvard Sanskritist, gives a talk on (IIRC) some aspects of Sanskrit, the Mahabharata, and some reasons why he was trying to use computers for his task. I found it a reasonably good and impressively concise summary, though not sure how it appears to someone who doesn't know Sanskrit — it is clear from the video that he loses the audience at some point (late in his talk) and you get some idea of why he had a reputation for running hard classes and turning out strong students!

Then his son, a young Dan Ingalls, gives a talk on his work: OCR for the Devanagari script, specifically for scanned images of the critical edition of the Mahabharata (produced by Sukthankar and team at BORI in Pune). He goes into detail on how he went about the task; as a programmer I found this talk inspiring!

svat · on June 10, 2019

(Can't rewatch the video right now as daughter is sleeping and I've misplaced my headphones, but found in my G+ backup a post I'd made about it in 2012; reproducing it below as it's no longer on the internet anywhere. Sorry to reply to myself; made this a separate comment because it's rather long for HN.)

Sanskrit and OCR: A talk by Daniel Ingalls and his son (also) Daniel Ingalls.

The OCR part starts at around 28:00.

Apparently in 1980 a one-programmer hobby project achieved 99.5% accuracy on manually typeset Devanagari (typeset in India)! Gives one hope that it should be possible to write good OCR for Indian languages today.

----

Daniel Ingalls was Professor of Sanskrit at Harvard. He was one of the few Western Sanskritists (IMO) to understand and appreciate Sanskrit poetry on its own terms, by its own standards. He translated the Subhashita-ratna-kosha (also excerpted as Sanskrit Poetry from Vidyākara's Treasury, which has a good introduction) and Anandavardhana's Dhvanyaloka with the commentary of Abhinavagupta. He worked on Indian philosophy/logic too (navya-nyāya). A large number of today's prominent American Indologists are his students, including Sheldon Pollock, Robert Thurman, Gary Tubb, and Wendy Doniger.

His son Dan Ingalls was a computer scientist at Xerox PARC and is the main creator (along with Alan Kay) of Smalltalk, from which the discipline of object-oriented programming arose. Apparently he's also responsible for context menus, and something called "bit blit" (which he indeed mentions in this talk).

Quotes:

(30:00) "This is the text that we're working with, and here's a typical page. And this is just to show you that a significant part of the problem is simply locating the lines of text on the page. You've got page headings which you don't care about, and here comes the text that you do care about — it's in two-column format — and there's this little [?] down here and then commentary below that. And in addition just to make things interesting, there's these little squiggles here under things and they relate to footnotes on the page... Now the two-column format also is worse because you get these breaks between chapters where this is column 1 and it completes there, and then the next chapter begins here, column 1 and column 2. So you have to deal with all that page layout. But that's, I mean, you just do it and then it's done."

(33:00) "The page may not be perfectly horizontal. Or even if it is, the type probably won't be because these are all pretty much manually typeset."

(35:00) "It's interesting: you learn about the actual pieces of type that they have."

(39:40) [Handling of matras.] "This part of the program is very heuristic. I did it in a hurry and it's not actually done uniformly, but it does the correct thing."

(41:45) "This is a bha and this is a ma. ... They look almost identical. And they can look identical... or at least you can't tell them apart."

(40:00) "It's really been fun doing this because, you know, computers inside 'em don't have any idea what you're doing. And I'm in that relationship to this project."

:D

His method was something like this:

* Identify horizontal lines of text (by looking for total blackness of pixels at each height)

* Within each line, identify words (by looking for gaps between words)

* Within each word, identify bounding boxes for individual glyphs (similar)

* For individual glyphs,

Training: For each character in all its variations that appear among the text, find its "skin" (OR of all the variations) and "bones" (AND of all the variations)

Recognition: a known character scores badly against a glyph-to-be-recognised if the glyph has a dot where the skin doesn't, or has no dot where the bones do. Did not worry about rotations! Known character with lowest badness wins. (Some reward for good bits also.)

* Did the boxing (identifying a glyph) only up to the baseline, because the under-baseline matras can get kerned below the next letter, interfere with keeping them separate, etc. Turned out to be enough.

* Always keep in mind that 99.5% is good enough; cuts out a lot of complexity.
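To make the steps above concrete, here is a rough Python sketch of the two ideas as I understood them: splitting by counting black pixels per row or column, and matching by "skin" and "bones". This is my own reconstruction, not Ingalls's code. The function names, the 0/1 list-of-lists bitmaps, and the assumption that every glyph has already been cropped to the same size are all mine, and the "reward for good bits" is left out.

```python
from typing import Dict, List, Tuple

Bitmap = List[List[int]]  # rows of pixels, 1 = black, 0 = white (assumed format)


def find_runs(profile: List[int], threshold: int = 0) -> List[Tuple[int, int]]:
    """Return (start, end) pairs for consecutive entries above threshold."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value > threshold and start is None:
            start = i                      # a run of ink begins
        elif value <= threshold and start is not None:
            runs.append((start, i))        # the run ended at a blank gap
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def text_lines(page: Bitmap) -> List[Tuple[int, int]]:
    """Locate lines of text from the count of black pixels in each row."""
    return find_runs([sum(row) for row in page])


def glyph_columns(line: Bitmap) -> List[Tuple[int, int]]:
    """Locate boxes within a line from the count of black pixels per column."""
    return find_runs([sum(col) for col in zip(*line)])


def train(samples: Dict[str, List[Bitmap]]) -> Dict[str, Tuple[Bitmap, Bitmap]]:
    """For each character, build (skin, bones): pixelwise OR and AND of its samples."""
    model = {}
    for name, variants in samples.items():
        skin = [[int(any(px)) for px in zip(*rows)] for rows in zip(*variants)]
        bones = [[int(all(px)) for px in zip(*rows)] for rows in zip(*variants)]
        model[name] = (skin, bones)
    return model


def badness(glyph: Bitmap, skin: Bitmap, bones: Bitmap) -> int:
    """Count pixels set outside the skin plus bone pixels the glyph is missing."""
    bad = 0
    for g_row, s_row, b_row in zip(glyph, skin, bones):
        for g, s, b in zip(g_row, s_row, b_row):
            if g and not s:
                bad += 1                   # ink where no training sample had any
            if b and not g:
                bad += 1                   # no ink where every sample had some
    return bad


def recognise(glyph: Bitmap, model: Dict[str, Tuple[Bitmap, Bitmap]]) -> str:
    """Pick the known character with the lowest badness."""
    return min(model, key=lambda name: badness(glyph, *model[name]))
```

On a real page you would want a nonzero threshold to tolerate specks, and for Devanagari the headline joining the letters of a word means a plain column gap will not separate glyphs without more work; the sketch ignores both.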

acqq · on June 10, 2019

It's a talk by two Daniels, a father and a son:

https://en.wikipedia.org/wiki/Daniel_H._H._Ingalls_Sr.

and

https://en.wikipedia.org/wiki/Dan_Ingalls

The latter: "is a pioneer of object-oriented computer programming and the principal architect, designer and implementer of five generations of Smalltalk environments. He designed the bytecoded virtual machine that made Smalltalk practical in 1976. He also invented bit blit, the general-purpose graphical operation that underlies most bitmap graphics systems today, and pop-up menus. He designed the generalizations of BitBlt to arbitrary color depth, with built-in scaling, rotation, and anti-aliasing."

As Dan Sr. talks about the research on how the epics were made and recited, he refers, around the 12th minute, to Milman Parry and Albert Lord, and we're now lucky that there is the Milman Parry Collection of Oral Literature On-Line:

https://mpc.chs.harvard.edu/

Although most of it is in too "raw" a state to be used by researchers, who probably depend on the more processed editions.

tragomaskhalos · on June 10, 2019

Interesting bio info for Ingalls Jnr, thanks: Fun to see his wry references to the performance issues that dogged Smalltalk and which, I would argue, ultimately prevented it from achieving the far more significant place in programming that its design deserved.

jecel · on June 11, 2019

The talk is from 1980, but right after that L Peter Deutsch and Allan Schiffman added dynamic compilation to Smalltalk (known as "JIT" these days) and made it a reasonable option on the new 68000-based workstations that were coming out. This technology was spun out as the company ParcPlace in 1987. So Dan's comment "... or make Smalltalk faster!" was actually prophetic.

AdmiralAsshat · on June 10, 2019

This should be uploaded to the Internet Archive for preservation.