

Ask HN: N-Gram spelling correction - arthurk

Hi,

I was wondering why no one has done an n-gram spelling correction yet. Nearly all the research papers take spelling correction as an example of what can be done with n-gram data, yet I see no services that make use of this.

What are the disadvantages of using n-gram data for spelling correction?
======
ynn4k
A general problem with n-grams is the trade-off between data sparseness and
reliability of estimation. A larger order n captures more context, but it also
increases the size of the model, which in turn requires more data and storage
to estimate reliably. Thanks to the Web as a corpus and cloud computing, we now
have up to 5-gram models computed from terabytes of data, provided you are
resourceful. One problem with this approach is selecting the web data to use
for training: the better the adaptation to the target scenario, the better the
accuracy.
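
To make the sparseness point concrete, here is a minimal sketch that counts distinct n-grams for n = 1 to 5 and reports what share of them occur only once. The corpus is a toy sentence invented for illustration, and the function names (`ngram_counts`, `sparsity_report`) are my own, not from any library.

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Count every contiguous n-token window in the token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sparsity_report(text, max_n=5):
    """For each order n, return (n, distinct n-grams, share seen only once)."""
    tokens = text.lower().split()
    report = []
    for n in range(1, max_n + 1):
        counts = ngram_counts(tokens, n)
        if not counts:  # text is shorter than n tokens
            break
        singletons = sum(1 for c in counts.values() if c == 1)
        report.append((n, len(counts), singletons / len(counts)))
    return report

# Toy corpus, invented for illustration only.
corpus = "the cat sat on the mat and the cat ate the fish on the mat"

for n, distinct, singleton_share in sparsity_report(corpus):
    print(f"n={n}: {distinct} distinct, {singleton_share:.0%} seen once")
```

Even on this tiny text, the share of n-grams seen only once climbs as n grows, which is why higher orders need far more data before their counts can be trusted.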

    
    
      I see no services that make use of this.
    

Most services have proprietary implementations of spell correction that are an
amalgamation of several techniques, including n-grams, and they may prefer not
to make them public.
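
As a rough idea of what combining techniques can look like, here is a sketch that generates candidates by edit distance 1 and then ranks them with bigram counts from the preceding word. This is only an assumption about one possible combination, not how any particular service works; the corpus is invented and the names (`edits1`, `train`, `correct`) are hypothetical.

```python
from collections import Counter
import string

def edits1(word):
    """All strings one delete, transpose, replace, or insert away from word."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def train(text):
    """Build unigram and bigram counts from whitespace-separated text."""
    tokens = text.lower().split()
    return Counter(tokens), Counter(zip(tokens, tokens[1:]))

def correct(prev_word, word, unigrams, bigrams):
    """Pick the known candidate that best fits after prev_word."""
    candidates = sorted(w for w in {word} | edits1(word) if w in unigrams)
    if not candidates:
        return word  # nothing known nearby; leave the word unchanged
    # Rank by bigram count with the previous word, then by unigram count.
    return max(candidates, key=lambda w: (bigrams[(prev_word, w)], unigrams[w]))

# Toy training text, invented for illustration only.
corpus = "peace of mind . a piece of cake . world peace now . a piece of pie"
unigrams, bigrams = train(corpus)

print(correct("a", "peice", unigrams, bigrams))
print(correct("world", "peice", unigrams, bigrams))
```

The same misspelling resolves to a different word depending on the word before it, which is the part a plain dictionary lookup cannot do. A real system would also need smoothing for unseen bigrams and a much larger corpus.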

------
sagacity
Very interesting question. We've been thinking of offering something like this
at the back of:

<http://www.RapiDefs.com>

Will post more on this later today.

