

Levenshtein distance and the Triangle Inequality - alter8
http://richardminerich.com/2012/09/levenshtein-distance-and-the-triangle-inequality/

======
Sniffnoy
This post is confusingly written. As the author points out, the issue is not
with Damerau-Levenshtein distance, which certainly does obey the triangle
inequality; rather, the issue is with the incorrect algorithm used to compute
it. Nonetheless, after the introduction he usually calls the quantity that
algorithm computes "Damerau-Levenshtein" when in fact it's an incorrect
version of Damerau-Levenshtein.

The difference he's pointing out isn't that Levenshtein obeys the triangle
inequality but Damerau-Levenshtein doesn't; it's that a naive algorithm to
compute Levenshtein works, but a naive algorithm to compute Damerau-
Levenshtein doesn't -- and that the measure it does compute does not obey the
triangle inequality.
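
To make the distinction concrete, here is a minimal sketch of the two naive dynamic-programming algorithms. The function names are my own, and the strings "CA", "AC", "ABC" are a standard counterexample rather than anything taken from the post. The second function is the usual "add a transposition case to the Levenshtein table" approach, which computes the restricted distance rather than true Damerau-Levenshtein.

```python
def levenshtein(a: str, b: str) -> int:
    """Plain Levenshtein distance: insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete from a
                           cur[j - 1] + 1,             # insert into a
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def restricted_distance(a: str, b: str) -> int:
    """Naive extension: the Levenshtein table plus one extra case for
    swapping two adjacent characters. A swapped pair is never edited
    again, so this is the restricted variant, not true Damerau-Levenshtein."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = a[i - 1] != b[j - 1]
            d[i][j] = min(d[i - 1][j] + 1,
                          d[i][j - 1] + 1,
                          d[i - 1][j - 1] + cost)
            # Extra case: the last two characters are swapped.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]


# "CA" -> "AC" costs 1 and "AC" -> "ABC" costs 1, yet "CA" -> "ABC" costs 3.
print(restricted_distance("CA", "AC"),
      restricted_distance("AC", "ABC"),
      restricted_distance("CA", "ABC"))
```

The restricted algorithm cannot go "CA" -> "AC" -> "ABC", because that would insert a character between the two it just swapped. True Damerau-Levenshtein allows that path, so its distance from "CA" to "ABC" is 2 and the inequality holds.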

While it's clear that the author recognizes this, he should really be more
explicit and avoid conflating terms like this; this sort of thing is going to
confuse people.

~~~
malkia
According to Wikipedia, Damerau-Levenshtein does not obey the triangle
inequality:
[http://en.wikipedia.org/wiki/Damerau–Levenshtein_distance#Ap...](http://en.wikipedia.org/wiki/Damerau–Levenshtein_distance#Applications)

~~~
Sniffnoy
Read more closely -- it's the "restricted edit distance" that does not obey
the triangle inequality, and, as the "Algorithm" section explains, that is not
actually the same thing as Damerau-Levenshtein distance. (Notice also how the
section you point to makes sure to differentiate between restricted edit
distance and real edit distance, i.e. Damerau-Levenshtein distance.)

That Damerau-Levenshtein distance obeys the triangle inequality follows
trivially from the definition, since it's just distance in an appropriate
graph.
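
Spelled out as a sketch (the notation here is my own): take the graph whose vertices are all strings, with an edge between two strings whenever a single insertion, deletion, substitution, or adjacent transposition turns one into the other, and let d(x, y) be the length of a shortest path from x to y. Gluing a shortest path from x to y onto a shortest path from y to z gives some path from x to z, and the shortest one can be no longer:

```latex
d(x, z) \;\le\; d(x, y) + d(y, z)
```

The restricted variant is not a shortest-path length in this graph, since it forbids paths that edit a transposed pair again, which is why the same argument does not apply to it.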

~~~
malkia
Thanks, Sniffnoy!

------
tomrod
You know, I have a mathematics and economics background. I love coming across
CS/Applied Math gems like this. In my daily work I never even consider the
computational consequences of resorting. Love it.

------
sauravc
Reminds me of this blog entry:

[http://blog.notdot.net/2007/4/Damn-Cool-Algorithms-Part-1-BK...](http://blog.notdot.net/2007/4/Damn-Cool-Algorithms-Part-1-BK-Trees)

------
numlocked
I must be losing my mind, but I can't figure out what 3 edits would get you
from rick->irkc in the final diagram. It seems like the distance is 4, not 3
(not problematic because the triangle inequality still holds, but it's bugging
the heck out of me).

~~~
jknighton
The following works; it follows the minimum-cost branches of the Levenshtein
recursion, working from the ends of the strings.

"rick", "irkc": remove the final character of "irkc" at a cost of 1 (insertion)

"rick", "irk": remove the final character of both strings at no cost

"ric", "ir": remove the final character of both strings at a cost of 1
(substitution)

"ri", "i": remove the final character of both strings at no cost

"r", "": base case, cost of 1.

In summary, the costs add up to 3, and the edits are "rick" -> "rickc" ->
"ickc" -> "irkc".
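
If you want to check this yourself, here is a minimal sketch of the textbook recursion on final characters (the name `lev` is my own, and this is not code from the post). Each branch drops the last character of one or both strings, which is the same kind of step the trace takes.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def lev(a: str, b: str) -> int:
    """Levenshtein distance via the textbook recursion on final characters."""
    if not a or not b:
        # Base case: one string is empty, so the cost is the other's length.
        return len(a) + len(b)
    return min(
        lev(a, b[:-1]) + 1,                      # drop last char of b (insertion)
        lev(a[:-1], b) + 1,                      # drop last char of a (deletion)
        lev(a[:-1], b[:-1]) + (a[-1] != b[-1]),  # drop both; free if they match
    )


# Walk the same chain of subproblems as the trace.
for pair in [("rick", "irkc"), ("rick", "irk"), ("ric", "ir"), ("ri", "i"), ("r", "")]:
    print(pair, lev(*pair))
```

The printed values should go 3, 2, 2, 1, 1, matching the costs picked up along the trace.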

