
From Google (Norvig): How to write a spelling check in 20 lines (very cool) - chandrab
http://www.norvig.com/spell-correct.html
======
python_kiss
As one of the collaborative projects on my social network, we asked people to
code a spell-check script in PHP. One member, who had zero previous knowledge
of PHP, took up the challenge. After investing a few hours in it, he came up
with the following:

<http://shuzak.com/Teamwork.php>

You're welcome to download the source code. One major issue with the code is
that it slows down noticeably when the dictionary grows from 1,000 to 20,000
words.

~~~
patryn20
Memcache your dictionary if you can. Or dump it into a MySQL heap table. That
will speed it up a lot.

Seriously, though, looping through the dictionary is really a bad solution.
Reading the dictionary into an associative array keyed by word, so each check
is a key lookup rather than a scan, would be a better idea for speed. Loop
through the words in the post instead of the words in the dictionary.
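
A minimal sketch of that idea, written in Python rather than the PHP of the linked script (a Python set plays the role of the associative array). The function names `load_dictionary` and `misspelled` and the letters-only tokenizer are assumptions made for illustration; none of this is taken from the Shuzak source.

```python
import re

def load_dictionary(words):
    """Build a set of lowercase words so each lookup is a hash check, not a scan."""
    return {w.strip().lower() for w in words if w.strip()}

def misspelled(post, dictionary):
    """Return the words in the post that are missing from the dictionary,
    in order of first appearance, without duplicates."""
    seen = set()
    unknown = []
    # Iterate over the words in the post, not over the dictionary.
    for word in re.findall(r"[a-z]+", post.lower()):
        if word not in dictionary and word not in seen:
            seen.add(word)
            unknown.append(word)
    return unknown

dictionary = load_dictionary(["the", "quick", "brown", "fox"])
print(misspelled("The quikc brown fox, the fxo", dictionary))
```

With this layout the work scales with the length of the post, so growing the dictionary from 1,000 to 20,000 words should cost memory rather than lookup time.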

The major limitation here is that a word has to be in the dictionary to be
considered correct. Non-optimal. There is no pluralization handling (other
than brute-forcing it by adding plurals to the dictionary), no possessive case
(as was noted in the code), and no newline handling. Hint: simply replace
every newline in the input with a space and be done with it. Check for hyphens
before a newline to detect continued words, though those are unlikely in a web
post. Just don't write the cleaned-up output directly back into the post;
maintain a control copy.
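
The newline hint above could look roughly like this in Python. The function name `normalize_for_checking` is hypothetical, and the sketch assumes that a hyphen at the end of a line always marks a continued word, so a genuinely hyphenated word that happens to break there would be joined incorrectly.

```python
import re

def normalize_for_checking(text):
    """Return a copy of text suitable for spell checking; the input is left untouched."""
    # Rejoin words continued across a line break: "spell-\ning" -> "spelling".
    joined = re.sub(r"(\w)-[ \t]*\r?\n[ \t]*(\w)", r"\1\2", text)
    # Replace every remaining newline (and the whitespace around it) with one space.
    return re.sub(r"\s*\r?\n\s*", " ", joined)

original = "spell-\ning check\nworks"            # control copy, never modified
print(normalize_for_checking(original))
```

Because the function returns a new string, the original post stays available as the control copy and only the normalized version is fed to the checker.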

Also, I can't download the source directly (file not found). I had to edit the
URL to make it work. You need to fix the link by removing the second instance
of shuzak.com.

------
gms
The headline is somewhat misleading: this is on Mr. Norvig's personal site, so
it's only from him, not from Google.

------
dhbradshaw
Half of the value of this example is the chance to see beautiful code. Looking
at it has improved my programming.

That makes me wonder: are there repositories of quality code out there for us
to use as training sets? Links?

