
A Spellchecker Used to Be a Major Feat of Software Engineering (2008) - jaybosamiya
http://prog21.dadgum.com/29.html
======
rwallace
I remember when I was about twelve or thirteen, seeing some mention of
automatic spelling checking, trying to figure out how it could be done, and
ending up very skeptical. It seemed to me there just wasn't enough regularity
for any compact algorithm to do the job, so it just couldn't be done without
literally having a list of every word in the English language, and even if you
had a 64K machine, that wouldn't fit in memory alongside a word processor and
your document.

~~~
VonGuard
Imagine my surprise when my final project in discrete math class in college
was building one of these.

------
teh_klev
Obligatory Peter Norvig "How to Write a Spelling Corrector":

[http://norvig.com/spell-correct.html](http://norvig.com/spell-correct.html)

------
bithive123
Others might also find interesting this video from 1982 where Brian Kernighan
demonstrates building a simple spell checker using UNIX command line
utilities:
[https://youtu.be/XvDZLjaCJuw?t=5m15s](https://youtu.be/XvDZLjaCJuw?t=5m15s)
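
For readers who think better in code, here is a rough Python analogue of that style of composition, assuming the stages are: split text into words, lowercase them, sort and de-duplicate, then keep whatever is missing from a dictionary. The function names (`make_words`, `lowercase`, `unique_sorted`, `mismatch`, `spell`) are illustrative and are not claimed to be the commands typed in the video.

```python
import re

def make_words(text):
    """Split text into alphabetic words, one per item."""
    return re.findall(r"[A-Za-z]+", text)

def lowercase(words):
    """Lowercase each word lazily, like a filter in a pipeline."""
    return (w.lower() for w in words)

def unique_sorted(words):
    """Equivalent of sorting and then dropping duplicates."""
    return sorted(set(words))

def mismatch(words, dictionary):
    """Keep only the words that are absent from the dictionary."""
    return [w for w in words if w not in dictionary]

def spell(text, dictionary):
    """Compose the stages in order, each feeding the next."""
    return mismatch(unique_sorted(lowercase(make_words(text))), dictionary)
```

Each stage knows nothing about the others, which is the point of the demonstration: the spell checker is just the composition.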

~~~
danso
That was a great watch... and if you keep watching for a few more minutes,
you can watch Lorinda L. Cherry rewrite his implementation using Unix pipes
and then create a talking calculator. The screencasting technology wasn't
quite there yet ;), but both are really excellent demonstrations and
elucidations of the Unix philosophy and mindset, which are pretty much
unchanged and invaluable ~34 years later:
[https://youtu.be/XvDZLjaCJuw?t=13m47s](https://youtu.be/XvDZLjaCJuw?t=13m47s)

------
guard-of-terra
This is perhaps true for English, an isolating language with clear word
boundaries.

Making a spellchecker for Russian (an inflecting language), Japanese (I think
they don't always put spaces between words, Koreans less so) or Finnish
(an agglutinating language) still takes a good part of a decade, more or less.

------
tokenadult
Natural language processing is HARD, and is an ongoing topic of research.
Making a spellchecker that works right is still a major feat of software
engineering. I have to hand-tweak all the spellcheckers in all the software
programs I use that include spellcheckers.

------
gmu3
I still remember getting a new computer for Christmas almost two decades ago
and seeing Microsoft underline words red and correct spelling/grammar the
first time. It felt like magic.

------
personjerry
But it seems to me the "progress" mentioned is more in the hardware than
the software, in contrast to the emphasis on Software Engineering in the
title.

~~~
ihm
We still could be stuck writing assembly on super fast hardware. Programming
languages' UI has definitely improved, enabled to be sure by hardware (as in
his Python/Perl example).

------
ufmace
Spell-checking seems straightforward enough now, if you're just comparing
words against a stored dictionary. The tricky part seems to be providing
reasonable suggestions for a misspelled word.
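
A minimal sketch of that split, assuming a tiny in-memory word list: the check is a set lookup, and the suggestions come from the standard library's `difflib.get_close_matches`, which ranks by a similarity ratio rather than by any spelling-specific model. The `WORDS` set, the function names, and the `0.7` cutoff are placeholders chosen for illustration.

```python
import difflib

# Tiny illustrative word list; a real checker would load a full dictionary file.
WORDS = {"spelling", "checker", "dictionary", "suggest", "misspelled", "word"}

def is_known(word, words=WORDS):
    """Return True when the lowercased word is in the stored dictionary."""
    return word.lower() in words

def suggest(word, words=WORDS, limit=3):
    """Return up to `limit` dictionary words that resemble `word`."""
    candidates = sorted(words)  # stable order so ties resolve predictably
    return difflib.get_close_matches(word.lower(), candidates, n=limit, cutoff=0.7)
```

The lookup is the easy half; whether the similarity ranking puts the word the user meant first is where the real difficulty sits.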

------
rileymat2
I would like to see some progress on the OS X spell checker. I am tired of
going to Google to get the correct spelling for words. Google seems to be much
better at correcting spelling.

~~~
sp332
Google's spellchecker has been watching billions of people correct their own
typos keystroke-by-keystroke for years. It runs on a global network of
computers with untold hoards of RAM and disk arrays with indefinite IOPS. Your
OS X spellchecker gets updated... occasionally, and it runs on your Mac.

~~~
true_religion
> It runs on a global network of computers with untold hoards of RAM and disk
> arrays with indefinite IOPS.

Much of that is just to provide automatic spell checking to millions of people
concurrently.

I would doubt the actual spell checking software for a single language,
excluding proper names, would be something a Mac can't run.

------
coldcode
Everything was a major feat if you go back far enough. Having started as a
programmer in the early '80s, I have watched every programming generation do
things that seemed impossible to the prior one. Today the "generations" just
happen faster.

------
timonoko
Tell me about it:
[http://www.ling.helsinki.fi/~fkarlsso/genkau2.html](http://www.ling.helsinki.fi/~fkarlsso/genkau2.html)

Karlsson's Finnish spellchecker was already there around 1982(?), because it
is rule-based, but very complex. Rule tables are sized almost like English
dictionaries.

Google tries to compile Finnish language based on dictionaries, but fails
every time and will fail forever.

------
transfire
Er... nope. Spellcheck is not just a matter of looking up a word in a hash
table. Spellcheck is still a hard problem even if hardware and software make
it a bit easier today than it was a couple decades ago. To elucidate,
computers are still far too slow to do a simple brute force search for the
closest matching words. One has to use binary searches, at least, or better,
tree data structures to get good performance.
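
As a small illustration of the binary search option, here is a sketch using Python's `bisect` module over a sorted word list. It covers only exact lookup and prefix narrowing, not ranking of near matches; `SORTED_WORDS`, `contains`, and `with_prefix` are hypothetical names, and the prefix trick assumes no word contains the character U+FFFF.

```python
import bisect

# Sorted once up front; a real word list would be sorted at load time.
SORTED_WORDS = sorted(["apple", "apply", "banana", "band", "bandit", "cherry"])

def contains(word, words=SORTED_WORDS):
    """Exact lookup in O(log n) comparisons via binary search."""
    i = bisect.bisect_left(words, word)
    return i < len(words) and words[i] == word

def with_prefix(prefix, words=SORTED_WORDS):
    """All words starting with `prefix`, found as one contiguous slice."""
    lo = bisect.bisect_left(words, prefix)
    # U+FFFF sorts after any character expected in a word, so this bounds the range.
    hi = bisect.bisect_left(words, prefix + "\uffff")
    return words[lo:hi]
```

Narrowing to a slice like this is one way to avoid comparing a misspelling against every word in the list.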

------
ntumlin
Does anyone have a link to specific tricks used or even implementations of
spellcheckers from then?

~~~
PeCaN
AFAIK most of them use some blend of edit distance (usually a modification of
Levenshtein distance to support transpositions) and often a phonetic algorithm
like Soundex[1].
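
To make the "Levenshtein plus transpositions" idea concrete, here is a sketch of the optimal string alignment variant, which is one common such modification (not necessarily the one any particular spellchecker uses). The function name `osa_distance` is made up for this example.

```python
def osa_distance(a, b):
    """Edit distance counting insertions, deletions, substitutions and
    adjacent transpositions (the optimal string alignment variant)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete every character of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert every character of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                # Swap of two adjacent characters counts as a single edit.
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]
```

With this, `osa_distance("teh", "the")` is 1, where plain Levenshtein distance would charge 2 for the same typo.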

Dictionaries are most often stored as a radix trie, and a bloom filter can be
used to quickly check if a word is misspelled.
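
A minimal Bloom filter sketch, to show why it suits this check: a "no" answer is always correct, so a word the filter rejects is definitely not in the dictionary, while a "yes" may occasionally be a false positive. The sizes (`num_bits`, `num_hashes`) and the salted SHA-256 hashing are arbitrary choices for illustration, not tuned values.

```python
import hashlib

class BloomFilter:
    """Set-membership sketch: no false negatives, occasional false positives."""

    def __init__(self, num_bits=8192, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, word):
        # Derive several bit positions by salting one hash function.
        for salt in range(self.num_hashes):
            digest = hashlib.sha256(f"{salt}:{word}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, word):
        """Set every bit position derived from the word."""
        for pos in self._positions(word):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, word):
        """False means definitely absent; True means probably present."""
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(word))
```

The memory cost is fixed by `num_bits` regardless of word length, which is what makes the structure attractive on small machines.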

EDIT: I'm dumb, you said "from then". Whoops. If I had to guess it was
probably similar, as radix trie + bloom filter is a very memory efficient way
to store and search the dictionary.

1:
[https://en.wikipedia.org/wiki/Soundex](https://en.wikipedia.org/wiki/Soundex)

~~~
unhammer
Bloom filter: That's assuming your vocabulary is reasonably finite.
[http://www.ling.helsinki.fi/~fkarlsso/genkau2.html](http://www.ling.helsinki.fi/~fkarlsso/genkau2.html)
has a list of forms for a single Finnish noun. That's including case,
possessives and clitics, but you can also have compounding (eg. several nouns
in a row as one word) and derivations into other parts of speech (verbs become
nouns etc). And once you've found out it's misspelt, you need suggestions..

------
rabbyte
I can only conclude from this that every line of code you write today, no
matter how brilliant or beautiful, is an ugly hack.

~~~
hueving
Why would you conclude something so cynical and ignorant? It's an indicator
that in general we are advancing forward.

~~~
rabbyte
I mean it as a matter of perspective. As a perfectionist, I find it helpful.
Sorry you took it as ignorant cynicism.

