

A Spellchecker Used To Be A Major Feat of Software Engineering - quanticle
http://prog21.dadgum.com/29.html

======
chime
While these specific challenges are trivial now because of increased memory
and CPU, user expectations have increased too, requiring more advanced
features which are not trivial to implement across different platforms. Users
now expect good auto-suggestion when doing text entry on mobile platforms and
laugh when it fails ( <http://damnyouautocorrect.com/> ). I just launched an
iPad app to improve communication for speech-disabled users and nearly
everyone I talked to wanted word-completion and next-word suggestion, even
when offline.

While it seems trivial for Google or a database-backed server to provide
real-time intelligent suggestions (e.g. suggest 5 words that follow: Harry),
implementing such a feature on iPad took me over a month, even though I knew
exactly what I wanted to make and had all the necessary data beforehand. I had
a list of 1 million words and phrases with frequency of usage (16 MB of text
data) and wanted to suggest 3-7 words in real time as the user typed each
letter. Implementing this on iPad required quite a bit of engineering and
optimization.
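
chime doesn't describe the implementation, so what follows is only a sketch of the word-completion half of the problem (not next-word prediction). It assumes the word list with usage frequencies fits in memory as a Python dict, and the names `build_index` and `suggest` are made up for illustration. Keeping the vocabulary sorted puts every word that shares the typed prefix in one contiguous run. Two binary searches locate that run, and the most frequent few words in it become the suggestions.

```python
import bisect
import heapq

# Highest Unicode code point: prefix + MAX_CHAR is an upper bound for
# ordinary words that start with prefix.
MAX_CHAR = chr(0x10FFFF)


def build_index(freqs):
    """Return the vocabulary sorted alphabetically (done once, up front)."""
    return sorted(freqs)


def suggest(prefix, words, freqs, k=5):
    """Return up to k words that start with `prefix`, most frequent first."""
    lo = bisect.bisect_left(words, prefix)
    hi = bisect.bisect_left(words, prefix + MAX_CHAR)
    # words[lo:hi] is the run of words beginning with `prefix`.
    return heapq.nlargest(k, words[lo:hi], key=lambda w: freqs[w])
```

For example, with `freqs = {"harry": 900, "happy": 800, "hard": 700, "hat": 500, "harbor": 300}` and `words = build_index(freqs)`, `suggest("har", words, freqs, k=2)` returns `["harry", "hard"]`. For a one-letter prefix the run can be huge, so on a constrained device one would probably precompute the top-k list for short prefixes instead of scanning at every keystroke.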

~~~
jbrennan
That sounds like a really cool technical challenge. Care to write about how
you solved it?

------
yason
Oh, those were the times.

What makes working in a finite, very limited set of resources so rewarding is
that those limitations turn mere programming into art.

You can't have art unless you constrain yourself somehow. Some people paint
with dots only and some people express themselves in line art. If they allowed
themselves any imaginable method that is applicable, they wouldn't be doing
art. They could just take a photograph of a setting, and that photograph
wouldn't say a thing to anyone.

Endless bit-twiddling and struct packing may turn trivial methods into huge
optimization-ridden hacks and not get you very far vertically, but given only
a few resources, those hacks are required to turn the theoretical approach into
a real application that you can actually do useful work with. Those hacks often
exhibit ingenious thinking, and many of them approach art; the best definitely
are art. And ingenious thinking, wherever it's required, pushes the whole
field forward.

Similarly, for example, using a Python set as a rudimentary spell-checker is
fast, easy, and convenient but it's no hack because it requires no hacking.
It's like taking that photograph, or using a Ferrari to reverse from your
garage out to the street and then driving it back. Which ingenious tricks are
you required to accomplish that? None.
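
For reference, this is roughly the kind of set-based checker yason means. It's a minimal sketch that assumes a plain word list with one word per line (on many Unix systems something like `/usr/share/dict/words`, though the location and contents vary). The function names are illustrative. It flags any token absent from the set and does nothing else: no suggestions, no inflection handling, no context.

```python
import re


def load_words(lines):
    """Build the dictionary as a set of lowercased words, one per input line."""
    return {line.strip().lower() for line in lines if line.strip()}


def misspelled(text, words):
    """Return the tokens of `text` that are not in the dictionary, in order."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in words]
```

Typical use would be `with open("/usr/share/dict/words") as f: words = load_words(f)`, after which each lookup is an average O(1) hash probe. The whole word list simply sits in memory, which is exactly what was not an option on the machines the article describes.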

The bleeding edge has simply moved, and of course it must lie somewhere these
days. Maybe computer graphics, although graphics has always been capable of
leading the bleeding edge, so there's actually no news there. The fact is that
the bleeding edge is more scattered now. Early on, every computer user could
feel and sense the bleeding edge because it revolved around tasks so basic that
you could measure them with your own eyes. Similarly, even a newbie programmer
would face the same limitations early on and learn how others had dealt with
them. Now you can stash gigabytes of data into your heap without realizing it,
and wonder why the computer felt sluggish for a few seconds. Or how would you
compare two state-of-the-art photorealistic, interactive real-time 3D graphics
demos? There's so much going on behind the curtain that it's difficult to
evaluate the works without extensive domain knowledge in most of the
technologies used.

Finding the bleeding edge has become a field in itself.

~~~
joemoon
I understand your general thesis, but your statements about art just seem
completely off.

> You can't have art unless you constrain yourself somehow.

> They could just take a photograph of a setting, and that photograph wouldn't
> say a thing to anyone.

While I realize that art is subjective, I'm very surprised that you would put
these conditions around what you consider to be art. Especially since it seems
to be a condemnation of a large subset of photography.

~~~
yason
Just curious, but would you like to give me a couple of examples of recognized
good art that isn't constrained by some rule, method, technique, or approach?

A large subset of photography isn't art. In fact, almost everything people
create isn't art per se; if it were, there wouldn't be good art or bad art,
just art and lots and lots of it. Spend a few hours on some photo-sharing site
and see what people shoot. They're photographs, but rarely art.

But there are grades of art. Look at this search: <http://goo.gl/2mLVI>. A
thousand sunset pictures, while maybe pretty, aren't generally art, and not
because it's the same sun in each picture. Most of these pictures have nothing
to say. Now, some object lit by the sunset or silhouetted against it gives a
lot more potential to be art. A carefully crafted study of a sunset in the
form of a photograph _can be_ art, but it requires finding certain constraints
first, finding a certain angle that makes the photograph a message, and
eventually conveying through the lens _something that makes_ the viewer stop
for a moment, to give an idea, to give a feeling, to give a confusion.

~~~
joemoon
I'm genuinely mystified by your response. Art is an entirely subjective
endeavor. You seem to think that you can decide what is and is not art.
Frankly, you're not qualified for that task (read: no one is).

> A large subset of photography isn't art

> a thousand sunset pictures, while maybe pretty, aren't generally art

Seriously? Because you get to decide? Your response just seems incredibly
egocentric. You can define what art means to you all day long, but you cannot
define what art means to everyone.

------
Radim
I know it's HN folklore to claim "Trivial! Would do over a weekend!", but
please...

Doing a half-decent production spell checker is STILL a major feat. Same as
"just crawling the web" (further down the discussion). Both require problem
understanding and engineering you can't see and appreciate at a glance.

And no, looking up individual words in some predefined dictionary doesn't
qualify as half-decent spell checking, especially for non-English languages.
Spelling correction is another step.

    
    
        "Their coming too sea if its reel."

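Every word in that sentence is in any English dictionary, so a checker that only looks words up passes it untouched. The correction step Radim mentions can be sketched in the style of Peter Norvig's well-known toy corrector (a swapped-in technique, not something anyone in this thread describes): generate every string one edit away from an unknown word, keep those that are real words, and pick the most frequent. The frequency table in the example is a made-up assumption.

```python
import string

LETTERS = string.ascii_lowercase


def edits1(word):
    """All strings one deletion, transposition, substitution or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in LETTERS]
    inserts = [a + c + b for a, b in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def correct(word, freqs):
    """Return `word` if known, else the most frequent known word one edit away."""
    if word in freqs:
        return word  # known words are never changed, even if wrong in context
    candidates = [w for w in edits1(word) if w in freqs]
    return max(candidates, key=freqs.get) if candidates else word
```

With toy counts such as `{"their": 50, "there": 80, "reel": 2, "real": 20}`, `correct("thier", freqs)` gives `"their"`, but `correct("reel", freqs)` stays `"reel"`. Real-word errors like the ones in Radim's sentence need context, not just a dictionary.
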
~~~
nandemo
That's not the point of the article. He's not talking about writing a state-
of-the-art spelling-and-grammar checker.

> And no, looking up individual words in some predefined dictionary doesn't
> qualify as half-decent spell checking,

Well, but the author _is_ talking about _that_ problem! Even if you don't
consider that real spell-checking, his point still stands. Let's define
_crappy-spell-checking_ as "looking up individual words in some predefined
dictionary"; that problem used to be hard and now it's very easy, as in, you
could write one in 15 minutes using Python.

~~~
Radim
Ok, fair point -- I blame the misleading title :)

I read the point of the article to compare "spell-checking then (80s) and
now", whereas others read it more along the lines of "looking up static
English words then and now". Your nickname sounds Japanese, but I assume
you're talking about English as well, with those 15 minutes.

------
Dn_Ab
Interesting. But maybe a better title would be: _A Spellchecker Used To
Require A Major Feat of Software Engineering_.

Some may say something is lost and resources wasted - taken for granted as we
now brute force our way through such problems. Surely going backwards?

But now we are free, a million times a second, to disambiguate a word's
meaning, check its part of speech, figure out whether it's a named entity or an
address, figure out whether it is a key topic in the document we are looking
at, and write basic sentences. I agree. That is progress.

~~~
Evgeny
_Some may say something is lost and resources wasted - taken for granted as we
now brute force our way through such problems. Surely going backwards?_

I think it is progress: Yes, we can brute force today through the problems
that were a feat a decade or two ago. But: Not having to solve those problems
frees a lot of time. Time that can be spent on problems that are a major feat
today.

------
DanBC
It's a shame that solutions like Bob Morris's no-dictionary spell checker[1]
are left languishing just because we all have fast computers.

Getting better at a problem in one domain can spin off benefits in others.

Spelling is part of language, and language is something that computers are
really bad at. Brute force helps a bit with that (autocorrect, Siri), but
better understanding would be cool.

[1] http://www.spellingsociety.org/journals/j20/spellchecking.php

Morris, Robert & Cherry, Lorinda L, 'Computer detection of typographical
errors', IEEE Trans Professional Communication, vol. PC-18, no.1, pp54-64,
March 1975.
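
The Morris and Cherry paper cited above describes spotting likely typos without a dictionary, using statistics of letter sequences (digrams and trigrams) gathered from the text being checked. The sketch below is a simplified illustration of that idea, not the paper's actual formula. It scores each distinct word by the average surprisal, log(total / count), of its letter trigrams (with spaces marking word boundaries) and lists the highest-scoring words first. The function names are hypothetical.

```python
import math
import re
from collections import Counter


def trigrams(word):
    """Letter trigrams of a word, with spaces marking its start and end."""
    padded = f" {word} "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def peculiar_words(text, n=5):
    """Rank the distinct words of `text` by how unusual their trigrams are
    within `text` itself; no dictionary is consulted."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for w in words for t in trigrams(w))
    total = sum(counts.values())

    def peculiarity(word):
        # Mean surprisal log(total / count): rare trigrams give high scores.
        grams = trigrams(word)
        return sum(math.log(total / counts[g]) for g in grams) / len(grams)

    # Sort alphabetically first so ties come out in a stable, predictable order.
    return sorted(sorted(set(words)), key=peculiarity, reverse=True)[:n]
```

Run over a long document, the top of this list is where a human would look first. Proper names and jargon also score high, so the output is a list to review rather than a verdict.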

------
erikb
It seems the author of that article doesn't know that spell checking,
translating and understanding text are actually major features of pretty new
software, too. I don't know how the spellcheckers are for English, but in
German they have been really bad for years. You can't just let MS Word
autocorrect your text, because the spell checker will insert more errors than
it removes. So when I want to find out how to spell a word correctly in German,
I just use the Google search bar. Why? Because they consider spell checking a
hard task TODAY!

Spell checking is not about comparing a list of words to what the user wrote
and telling them what didn't match. It's much more about understanding the
user's intention and helping them shape that intention into officially
recognised grammar/spelling. Example: "then" is a correct word, but in the
context of "Google's spell check is better then Word's", "then" is actually
wrong. (Google also doesn't tell you about that mistake, but the first search
result actually contains a "than", which is recognised as what you actually
meant.)
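
Catching "better then" requires looking at context. One simple technique, assumed here for illustration (this is not a description of how Google or Word do it), is a confusion set: a small group of words that people often swap for each other. For each occurrence, compare how often each member follows the previous word in a large corpus, and suggest a member that is clearly more common. The confusion sets are hypothetical, and the bigram counts in the usage example are invented toy numbers. Real counts would come from corpus data.

```python
# Hypothetical confusion sets: words commonly typed in place of each other.
CONFUSION_SETS = [{"then", "than"}, {"their", "there", "they're"}]


def check_confusables(tokens, bigram_counts):
    """Return (index, found, suggested) for confusable words where another
    member of the set follows the previous word more often."""
    suggestions = []
    for i in range(1, len(tokens)):
        prev, word = tokens[i - 1], tokens[i]
        for group in CONFUSION_SETS:
            if word not in group:
                continue

            def score(w):
                return bigram_counts.get((prev, w), 0)

            # sorted() makes tie-breaking deterministic.
            best = max(sorted(group), key=score)
            if score(best) > score(word):
                suggestions.append((i, word, best))
    return suggestions
```

With `counts = {("better", "than"): 120, ("better", "then"): 3}`, the tokens of "google's spell check is better then word's" produce `[(5, "then", "than")]`. Looking only one word back is crude, and real systems use much richer context.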

I hope I've made it clear why I think having the best spell checker is still a
major feat, and why I think the title really should be revised.

~~~
cbsmith
Yes. This whole article struck me as the spell-checking equivalent of the "I
can build Map Reduce in 5 lines of language N" meme that went around a while ago.

------
zacharyvoase
This relates to Fred Brooks's 'No Silver Bullet': the 'accidental complexity'
of coping with hardware constraints has given way to the 'essential
complexity' of writing an algorithm to check someone's spelling.

------
dangoldin
I know Bloom filters came in handy for this sort of work but I'd love to see
some other data structures and algorithms that were developed in the 80s to
deal with limited memory.
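
For readers who haven't met them: a Bloom filter stores a set as a bit array. Adding a word sets k bit positions chosen by hashing, and a lookup checks that all k bits are set. It can wrongly accept a word that was never added (a false positive), but it never rejects one that was. For a spell checker, that trade means occasionally missing a typo in exchange for far less memory. Below is a minimal Python sketch. Deriving the k positions from one SHA-256 digest by double hashing is an implementation choice for this sketch, not a historical detail.

```python
import hashlib


class BloomFilter:
    """Bit array plus k hash positions: false positives possible, no false negatives."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, word):
        # Double hashing: position_i = h1 + i * h2 (mod num_bits).
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, word):
        for p in self._positions(word):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, word):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(word))
```

For n words and a target false-positive rate p, the usual sizing is m = -n ln p / (ln 2)^2 bits and k = (m / n) ln 2 hash positions.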

~~~
btn
Jon Bentley's _Programming Pearls_ books are full of such insights into
designing incredibly efficient data structures and algorithms.

~~~
daantje
Yes, and "Programming Pearls" has a nice chapter (13.8) on how Doug McIlroy
fit the spell dictionary into less than 64K (<http://code.google.com/p/unix-spell/>).

That site also has a paper on the development of spell
(<http://unix-spell.googlecode.com/svn/trunk/McIlroy_spell_1982.pdf>).
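
The general trick behind that chapter is to store hashes of the words instead of the words themselves. Each word is hashed to a fixed number of bits, the hash values are sorted, and only the differences between neighbours are kept. Those differences are small enough to encode compactly. The sketch below follows that outline under its own assumptions: a 27-bit hash derived from BLAKE2b, a Golomb-Rice code for the gaps, and '0'/'1' strings instead of packed bits for readability. See the paper above for what spell actually did. As with a Bloom filter, a misspelling whose hash happens to equal a dictionary word's hash is wrongly accepted.

```python
import hashlib

HASH_BITS = 27  # width chosen for this sketch


def word_hash(word, bits=HASH_BITS):
    """Map a word to an integer in [0, 2**bits)."""
    digest = hashlib.blake2b(word.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big") % (1 << bits)


def rice_encode(values, k):
    """Golomb-Rice code (k >= 1): quotient in unary, remainder in k bits."""
    out = []
    for v in values:
        q, r = v >> k, v & ((1 << k) - 1)
        out.append("1" * q + "0" + format(r, f"0{k}b"))
    return "".join(out)


def rice_decode(bits, k):
    """Inverse of rice_encode."""
    values, i = [], 0
    while i < len(bits):
        q = 0
        while bits[i] == "1":
            q += 1
            i += 1
        i += 1  # skip the '0' that ends the unary part
        values.append((q << k) | int(bits[i:i + k], 2))
        i += k
    return values


def build(words):
    """Return (encoded_gaps, k) for the sorted, de-duplicated word hashes."""
    hashes = sorted({word_hash(w) for w in words})
    # Pick k near log2 of the average gap between neighbouring hashes.
    k = max(1, ((1 << HASH_BITS) // max(1, len(hashes))).bit_length() - 1)
    gaps = [b - a for a, b in zip([0] + hashes, hashes)]
    return rice_encode(gaps, k), k


def contains(encoded, k, word):
    """Sum the gaps back into hashes until we reach or pass the target."""
    target, total = word_hash(word), 0
    for gap in rice_decode(encoded, k):
        total += gap
        if total >= target:
            return total == target
    return False
```

This version decodes from the start on every lookup. A practical one would also keep a small index into the bit stream so that a lookup can start decoding near its target.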

------
RodgerTheGreat
Progress for computing, absolutely. Progress for software engineering?
Questionable. Are the clever solutions applied in legacy spellcheckers
obsolete simply because sufficient brute force is now available on everyday
machines to solve everyday problems?

~~~
boredguy8
There are always questions on the edge of computational complexity. In 30
years: "Indexing all of the individual web pages in the world used to be a
tough challenge. Network speeds were in the hundreds of megabits, and..."

~~~
hyperbovine
I would be surprised if that analogy holds up. I have heard the Google guys
state that the growth of the web easily outpaces advances in computing and
bandwidth. The English language, not so much...

~~~
boredguy8
You're missing the point. "Growth of the language" wasn't the problem for
spell-check, it was the size of the language combined with limited space.
Computational breakthroughs solved that problem. The parent to my post implied
that solving the 'space' problem in a different way (i.e. by having more
space) somehow harmed software engineering, which it hasn't. Ergo, the
suggestion that the web is outpacing computing and bandwidth _all the more_
supports my point, since that problem will be persistent.

------
mda
Designing a high-quality, compact spell checker for agglutinative languages
like Turkish or Hungarian is definitely a non-trivial and challenging problem.

------
WalterBright
Now, if they could only fix OCR technology! I'm getting tired of "the" being
scanned as "die".

~~~
simcop2387
Just run it through google translate, German -> English. That'll fix "die"
into "the", but it'll make some other things strange I bet.

------
bozho
It's not that easy today either. Loading a list of words into a hash table will
work for... Chinese (pinyin), but most languages have declension and
conjugation. You want to accept both "cat" and "cats", and all of "walk",
"walking" and "walked", and a word list normally wouldn't contain those forms.
(I haven't checked the English ones, but it certainly doesn't for languages
with more complex inflection.)

Yes, you don't have the memory complications that are really hard, but you
still need to think. Get a proper data structure (a trie, for example) and fill
it with all forms of the words.
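
Here is a minimal sketch of that suggestion. It assumes a lemma list tagged with a part of speech and a deliberately naive table of English suffixes. Both are hypothetical, and real inflection needs per-word data for irregular forms and spelling changes such as "stop" -> "stopped". In a trie, all the forms of a lemma share the nodes for the lemma itself, so "walks", "walked" and "walking" only add nodes for their suffixes.

```python
class TrieNode:
    __slots__ = ("children", "is_word")

    def __init__(self):
        self.children = {}    # next character -> TrieNode
        self.is_word = False  # True if the path to this node spells a full word


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def __contains__(self, word):
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return False
        return node.is_word


# Hypothetical, naive suffix rules keyed by part of speech.
SUFFIX_RULES = {"noun": ["", "s"], "verb": ["", "s", "ed", "ing"]}


def build_lexicon(lemmas):
    """Insert every generated form of each (lemma, part_of_speech) pair."""
    trie = Trie()
    for lemma, pos in lemmas:
        for suffix in SUFFIX_RULES[pos]:
            trie.insert(lemma + suffix)
    return trie
```

For example, `build_lexicon([("cat", "noun"), ("walk", "verb")])` accepts "cats" and "walking" but rejects "catting" and the bare prefix "wal".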

------
joejohnson
How long was it until a grammar checker could be implemented within these
space constraints?

------
ADent
I had a decent spell checker on my Apple //c: 128K RAM (64K commonly usable),
and it had to fit on a 140KB floppy. While slow, it worked better than about 10
years of MS Word.

