

Natural Language Corpus Data (2009) [pdf] - danso
http://norvig.com/ngrams/ch14.pdf

======
agibsonccc
This might be relevant for some people:
[http://storage.googleapis.com/books/ngrams/books/datasetsv2....](http://storage.googleapis.com/books/ngrams/books/datasetsv2.html)

------
danso
FYI, I found the OP while re-reading Norvig's classic on spell-checker design,
in which he mentions the chapter in passing:
[http://norvig.com/spell-correct.html](http://norvig.com/spell-correct.html)

The chapter is part of a book, Beautiful Data, and Norvig has a separate page
for the source code:

[http://norvig.com/ngrams/](http://norvig.com/ngrams/)

~~~
abecedarius
Fun fact: the edits function in the newer chapter doesn't find quite the same
edits as the older spelling corrector. (I found one or two differences when
testing on /usr/share/dict/words. That was originally with the code I'm
acknowledged for at the end, but I think I checked that the difference was
still there in the chapter's version of the code. Since the difference is so
rare and correctness is probabilistic anyway, it's kind of nitpicky to call it
a bug.)
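
For readers who want to try this kind of comparison, here is a minimal sketch. It assumes an edit-distance-1 generator in the style of the older spelling corrector (deletes, transposes, replaces, inserts) and is not the chapter's code. The `disagreements` helper is a hypothetical name introduced here. It only reports where two generators differ, and says nothing about which differences the comment above refers to.

```python
import string

def edits1(word, alphabet=string.ascii_lowercase):
    """Return the set of strings one simple edit away from `word`."""
    # Every way to cut the word into a (prefix, suffix) pair.
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in alphabet]
    inserts = [a + c + b for a, b in splits for c in alphabet]
    return set(deletes + transposes + replaces + inserts)

def disagreements(words, edits_a, edits_b):
    """Yield (word, only_in_a, only_in_b) wherever two edit generators differ.

    Hypothetical helper: edits_a and edits_b are any two functions that map
    a word to an iterable of candidate strings.
    """
    for w in words:
        a, b = set(edits_a(w)), set(edits_b(w))
        if a != b:
            yield w, a - b, b - a
```

To run it over a word list such as /usr/share/dict/words, read the file into a list of lowercase words and pass it as `words` along with the two edit functions being compared. An empty result means the two generators agree on every word in the list.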

