
A Spellchecker Used to Be a Major Feat of Software Engineering - breily
http://prog21.dadgum.com/29.html
======
nirmal
<http://www.norvig.com/spell-correct.html>

A simple Python based spell checker along with some discussion of probability
and the pitfalls of doing things the naive way.

~~~
aston
Norvig's code solves an even more difficult problem, spell _correcting_. But
yeah, good read nonetheless.
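
For readers who have not opened the link, here is a minimal sketch of the candidate-generation idea behind spell correcting: produce every string one edit away from the typo and keep the ones found in a word list. It is written in the spirit of the linked essay and is not a copy of Norvig's code. The names `edits1` and `suggestions` and the toy word set are assumptions for illustration, and the probability-based ranking the essay discusses is left out.

```python
import string


def edits1(word, alphabet=string.ascii_lowercase):
    """Return all strings one delete, transpose, replace or insert away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:]
                for left, right in splits if right for c in alphabet]
    inserts = [left + c + right for left, right in splits for c in alphabet]
    return set(deletes + transposes + replaces + inserts)


def suggestions(word, known_words):
    """Keep only the one-edit candidates that are real words (unranked)."""
    return sorted(edits1(word) & set(known_words))


# Toy word list, purely illustrative.
known = {"spelling", "spewing", "sapling"}
print(suggestions("speling", known))
```

A real corrector would also rank the surviving candidates, for example by word frequency in a corpus.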

------
pistoriusp
I remember Bloom filters (<http://en.wikipedia.org/wiki/Bloom_filter>) being
good for this sort of thing.
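
A rough sketch of how a Bloom filter could stand in for the dictionary: each word sets a few bits in a fixed-size bit array, and a lookup checks that all of those bits are set. A "no" is always right, while a "yes" can occasionally be a false positive, so a few misspellings may slip through. The class name, the sizes, and the use of SHA-256 with double hashing are all assumptions chosen for illustration, not something taken from the article.

```python
import hashlib


class BloomFilter:
    """Fixed-size bit array probed at num_hashes positions per word."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, word):
        # Derive the probe positions from two halves of one SHA-256 digest
        # (double hashing). Any reasonable hash family would work here.
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, word):
        for pos in self._positions(word):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, word):
        # True means "probably in the set"; False means "definitely not".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(word))


# Toy word list and sizes, purely illustrative.
words = ["apple", "banana", "cherry"]
bf = BloomFilter(num_bits=1024, num_hashes=5)
for w in words:
    bf.add(w)
print(all(w in bf for w in words))
```

The appeal on a small machine is that the bit array's size is chosen up front and does not depend on word length.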

~~~
jrockway
Thanks for this link. Much more interesting than the article.

~~~
dangoldin
Definitely. I enjoy reading about these data structures/algorithms.

Seems a lot of these cool things came out in the 80s due to the limits of
memory/computational speed.

------
pfedor
Looking up a word in a dictionary is only part of what a spellchecker does.
The other part is offering suggestions. This one is still not easy, especially
if you want it to give reasonable suggestions for all languages. For example,
the algorithm vim 7.0 uses (based on tries) was first described in a research
paper but was not implemented until Bram Moolenaar did it, because the
implementation was hard.

------
tlrobinson
Random side note... justin.tv's spellcheck problem is a fun one...
<http://www.justin.tv/problems/spellcheck>

------
dfranke
I don't get what the big deal is. As long as the document you're checking fits
in RAM, then you can just sort the words alphabetically and then check them in
a single pass through the dictionary. If you really need random access, use a
B+ tree.
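
A minimal sketch of the sort-then-single-pass idea, assuming the dictionary can be streamed in the same sort order that Python uses for strings (plain code point order). The function name `misspelled` and the sample words are made up for illustration. Only the document's words are held in memory; the dictionary is consumed one entry at a time.

```python
def misspelled(document_words, dictionary_iter):
    """Return document words that do not appear in a sorted dictionary stream."""
    unknown = []
    dict_iter = iter(dictionary_iter)
    current = next(dict_iter, None)
    for word in sorted(set(document_words)):
        # Advance the dictionary until it catches up with this word.
        while current is not None and current < word:
            current = next(dict_iter, None)
        if current != word:
            unknown.append(word)
    return unknown


# The dictionary could just as well be a file object yielding stripped lines.
dictionary = ["brown", "fox", "quick", "the"]
print(misspelled(["the", "quick", "brwon", "fox"], dictionary))
```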

~~~
tom_rath
Where do you plan to store the dictionary for reference?

~~~
gojomo
Gzip compression will achieve around 6x compression on a sorted wordlist. So
even the 2008 wordlist of 240K words would reduce to about 400K. Cut the list
in half or more, use a compression better suited for a sorted wordlist, and
you could have a useful wordlist that would fit temporarily in RAM even in
256K machines.
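
One example of "a compression better suited for a sorted wordlist" is front coding, where each word is stored as the number of leading characters it shares with the previous word plus the remaining suffix. This is a sketch of that one technique under my own naming (`front_encode`, `front_decode`), not necessarily what was meant above, and it makes no claim about the ratio achieved on a real wordlist.

```python
def front_encode(sorted_words):
    """Encode each word as (shared prefix length, remaining suffix)."""
    encoded = []
    previous = ""
    for word in sorted_words:
        shared = 0
        limit = min(len(previous), len(word))
        while shared < limit and previous[shared] == word[shared]:
            shared += 1
        encoded.append((shared, word[shared:]))
        previous = word
    return encoded


def front_decode(encoded):
    """Rebuild the word list by reusing the prefix of the previous word."""
    words = []
    previous = ""
    for shared, suffix in encoded:
        word = previous[:shared] + suffix
        words.append(word)
        previous = word
    return words


# Toy sorted list, purely illustrative.
sample = ["spell", "spelled", "speller", "spelling"]
encoded = front_encode(sample)
print(encoded)
print(front_decode(encoded) == sample)
```

Because neighbours in a sorted list share long prefixes, most entries shrink to a small count and a short suffix, and the result can still be fed through a general-purpose compressor afterwards.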

~~~
eru
Perhaps you should do a
<http://de.wikipedia.org/wiki/Burrows-Wheeler-Transformation> first?

------
falsestprophet
It still is, I think. Nothing seems to compare to Google's spell check.

~~~
ashleyw
Yes, and in some situations, you can easily tap into Google's spell check for
your own automated needs.

------
tptacek
It was just as trivial then, in decent languages, as it is now. Conversely, if
you make your data set large enough (say, the whole web as a corpus, not
/usr/share/dict/words), the same problem is just as hard today.

------
jrockway
Executive summary: "programming was harder when there was less abstraction".

I was hoping for a discussion of some algorithms, but instead I just got a
long winded "wow, progress is amazing" rant.

This post has inspired me to start a new blog where I state the obvious and
then submit it to every social news site. My first article will be "the sky is
blue" except I will explain that in 1000 words or more. With ads.

~~~
xlnt
now that you've given away the summary, i'm not going to read your post. and i
was planning to click on lots of ads.

