

Damn Cool Algorithms: Levenshtein Automata - mbowcock
http://blog.notdot.net/2010/07/Damn-Cool-Algorithms-Levenshtein-Automata

======
mattdeboard
For those of you like me who tried to read this article without the proper
education to understand everything at first pass, here is the definition of a
couple of acronyms he used without defining first:

NFA = Nondeterministic Finite Automata
([http://en.wikipedia.org/wiki/Nondeterministic_finite-
state_m...](http://en.wikipedia.org/wiki/Nondeterministic_finite-
state_machine)) DFA = Deterministic Finite Automata
([http://en.wikipedia.org/wiki/Deterministic_finite-
state_mach...](http://en.wikipedia.org/wiki/Deterministic_finite-
state_machine))

------
sk5t
I wonder if this method could extend to character transposition and weight-
able costs; e.g., allow "hello"/"ehllo" = distance 1, as logically this is a
single typing mistake of whacking the 'h' a bit early.

I've used the perl String::Approx module to great effect, but it takes a lot
of resources to blast through a large dictionary.

~~~
robrenaud
You can do both supporting variable weights and transposition. The key is
understanding the initial construction of the DFA.

It's pretty general and clever in its simplicity. It's really just trying all
possible edits, recording the cost and progress until the costs exceed the
budget or you've reached the goal.

To handle transposition, you need the state to remember the last character
typed if it happens to match the next character, so the state space gets
bigger. In your example for hello, from the start state you'll introduce a
state for starting with e. This state encodes typed 'last char was e, matched
nothing, cost 1'. From here, when you get an h, you move to the 'matched first
two characters, used cost of one' state.

------
nikcub
I thought for a moment that Nick is back writing blog posts, but this is from
the archives.

If you are a developer, esp Python and AppEngine, you really should read all
the archives - it is one of the best general online sources for python,
appengine (and the datastore) and dev.

~~~
mbowcock
I wasn't familiar with the site but found this post interesting. After digging
around for a while there is a lot of good content.

------
s-phi-nl
Unfortunately, "food" is a poor example for explaining how Levenshtein
Automata work in general because it has a double letter. This means that in
one state, you have the "o" transition to more states than usual because it is
both the next letter and the next letter after that. If you want to generalize
his automaton, do the powerset construction for some other word like "news".

------
viggity
I use levenshtein heavily for one of my products <http://www.getatomiq.com>

~~~
krschmidt
Me too, and for a rather similar purpose, though for a different market.
<http://www.myhandin.com>

