
Damn Cool Algorithms: Levenshtein Automata - desigooner
http://blog.notdot.net/2010/07/Damn-Cool-Algorithms-Levenshtein-Automata
======
sqrt17
Actually, if you want to see whether the edit distance (Levenshtein) is within
a fixed bound, say k, it's no longer O(mn), but around O(kn) (since you're
only ever filling a narrow band of width ~k along the diagonal instead of the
whole rectangle).

Nonetheless, the DFA construction is priceless if you want flexible matching
against a dictionary.

------
viggity
I used Levenshtein Automata when I wrote Atomiq Duplicate Code Finder
(<http://www.getatomiq.com>). It is some pretty cool stuff!

~~~
Confusion
Something I wonder about sometimes: how did you know about the existence of
this algorithm? Do you:

\- Simply know about it, because it was mentioned once, somewhere in a book on
algorithms or a class and you remembered it when you knew what you needed?

\- Knew what you needed and searched for it. If so, how do you know where to
search and what terms do you use?

\- Rediscover the algorithm, because you knew what you needed, implemented it
and recognized it only later?

I don't have a formal CS background and I sometimes get stuck on: I know what
I need and I'm sure someone has implemented it before (and better), but I
don't know how to get at that information.

~~~
sqrt17
There's several central ideas in CS that most people know. (Not everyone knows
every single of them, but most people know many of them). One of them is the
idea of finite-state automata. If you have a finite set of strings, or a
regular language, you can build a DFA to recognize that.

Once you know know about DFAs, and you realize that there are only finitely
many words that have a Levenshtein distance of less than k, it's just a
question of (i) practicality and (ii) technical details. But when you have the
idea to combine concepts X and Y, you know what terms to search for and what
papers you want to look at.

As to practicality, there's potentially many words that have a Levenshtein
distance of at most two for some word, but you don't care about the bits that
are different, so the automaton is much smaller.

Part of being a successful CS background is to know all the kinds of hammers
that are around and having a feel for the right hammer to use given a
particular nail.

In this case, the hammers are abstractions. Algorithms are fundamentally
driven by abstractions into which you then fill necessary technical detail -
getting the technical details wrong may leave you with a correct but
inefficient algorithm at worst (but enable you to recognize a more efficient
way when you come across it), whereas you're bound to reinvent a new duct-
taped makeshift wheel wherever you go when you don't know the right
abstractions.

~~~
Confusion
Thanks for your explanation. It's a reasonably comforting answer, since it
suggests that as long as I spend time picking up concepts (which I have been
doing ever since I went into software engineering) and connecting my problems
to those concepts, there is a reasonable chance I will stumble upon a
beautiful solution like a Levenshtein automaton, if it exists for the problem
at hand.

------
moultano
Saving this to read for later. I just wanted to say in advance that it's
really refreshing to see a blog so well designed that I feel like I've
_already_ clicked my Readability bookmarklet when I first arrive.

~~~
eru
Hmm, the first pictures (that floats right from the text) touches the text.
That's a bit ugly. Otherwise, it's a nice and readable design.

------
caf
This is one of those algorithms that's so clever that the end result almost
feels like magic.

------
torpor
Great article, and as it happens right now I'm using this algorithm in a
hardware MIDI sequencer I'm working on .. it works very well for phrase-based
recognition of patterns in a MIDI context, also ..

~~~
Confusion
I've posed the same question to viggity, but in case you only see this in your
'threads' view: how did you know about the existence of this algorithm? Do
you:

\- Simply know about it, because it was mentioned once, somewhere in a book on
algorithms or a class and you remembered it when you knew what you needed?

\- Knew what you needed and searched for it. If so, how do you know where to
search and what terms do you use?

\- Rediscover the algorithm, because you knew what you needed, implemented it
and recognized it only later?

I don't have a formal CS background and I sometimes get stuck on: I know what
I need and I'm sure someone has implemented it before (and better), but I
don't know how to get at that information.

~~~
torpor
I first encountered the Levenshtein distance at work (where I am part of a
team developing SIL-4 level operating systems for use in the transportation
industry), as part of our coding rules which describe the differences in
variable names allowed (e.g. "char fic; char foc; // ") .. After I looked up
the algorithm on my own intuition, I realized it had applicability to MIDI
sequences in unique ways, and thats how it came about.

