

Damn Cool Algorithms: BK-Trees - nickb
http://blog.notdot.net/archives/30-Damn-Cool-Algorithms,-Part-1-BK-Trees.html

======
shahper
I couldn't understand what this means: "Say we take an arbitrary string, test
and compare it to query. Call the resultant distance d. Because we know the
triangle inequality holds, all our results must have at most distance d+n and
at least distance d-n from test."

Can anyone please explain this in simpler terms?

~~~
shiro
It's easier to understand using 2D geometry. Pick a point on a piece of paper
and call it _query_, the input, and draw a circle of radius _n_ around that
point. Your goal is to find the points in the dataset which are inside the
circle. Let's call those points _answers_.

Pick an arbitrary point (in the dataset) and call it _test_. The distance
between _test_ and _query_ is _d_. Now you can see that every point in
_answers_ is at most _d+n_ and at least _d-n_ away from _test_ (assuming
_d_ > _n_).

Since the tree files each point under its distance from its parent, we can
skip every subtree whose distance from _test_ falls outside that range, which
quickly narrows the search.
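
In case code helps, here is a rough Python sketch of that pruning. It is my
own illustration, not the article's code: the names `levenshtein`, `BKTree`,
`add` and `search` are made up for this example, and it assumes edit distance
as the metric.

```python
def levenshtein(a, b):
    """Edit distance via the usual row-by-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


class BKTree:
    """Hypothetical minimal BK-tree; a node is (word, {distance: child})."""

    def __init__(self, distance=levenshtein):
        self.distance = distance
        self.root = None

    def add(self, word):
        if self.root is None:
            self.root = (word, {})
            return
        node = self.root
        while True:
            d = self.distance(word, node[0])
            if d == 0:
                return  # word is already in the tree
            child = node[1].get(d)
            if child is None:
                node[1][d] = (word, {})
                return
            node = child

    def search(self, query, n):
        """Return sorted (distance, word) pairs within n of query."""
        results = []
        stack = [self.root] if self.root is not None else []
        while stack:
            test, children = stack.pop()
            d = self.distance(query, test)
            if d <= n:
                results.append((d, test))
            # Triangle inequality: answers lie between d-n and d+n from test,
            # so only children stored under those distances can hold answers.
            for k, child in children.items():
                if d - n <= k <= d + n:
                    stack.append(child)
        return sorted(results)


tree = BKTree()
for w in ["book", "books", "cake", "boo", "cape", "cart", "boon", "cook"]:
    tree.add(w)
print(tree.search("bool", 1))  # [(1, 'boo'), (1, 'book'), (1, 'boon')]
```

The `d - n <= k <= d + n` test is the circle picture above: children filed
under any other distance cannot contain a point inside the circle.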

~~~
shahper
Thanks, dude, for clearing it up. I don't know why I always get confused by
easy things like this.

------
machine
Metric trees (sometimes called ball trees) are a simple generalization of this
to arbitrary metrics. They're also useful for points in a high-dimensional
Euclidean space that have low intrinsic dimensionality.

------
enonko
I don't see why this would be much better than simply doing fuzzy match on a
trie.
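
For anyone who hasn't seen it, a fuzzy match on a trie can be sketched roughly
like this in Python. This is only one common way to do it (carrying one row of
the edit-distance table down each branch), and `make_trie`, `fuzzy_search` and
the `END` marker are names invented for the example.

```python
END = ""  # end-of-word key; the empty string can never be a real character


def make_trie(words):
    """Build a nested-dict trie; node[END] holds the word ending there."""
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node[END] = w
    return root


def fuzzy_search(trie, query, n):
    """Return sorted (distance, word) pairs within edit distance n of query."""
    results = []
    first_row = list(range(len(query) + 1))
    if END in trie and first_row[-1] <= n:
        results.append((first_row[-1], trie[END]))  # the empty word, if stored

    def walk(node, ch, prev_row):
        # Extend the edit-distance table by one row for the character ch.
        row = [prev_row[0] + 1]
        for j in range(1, len(query) + 1):
            row.append(min(row[j - 1] + 1,
                           prev_row[j] + 1,
                           prev_row[j - 1] + (query[j - 1] != ch)))
        if END in node and row[-1] <= n:
            results.append((row[-1], node[END]))
        # If every entry already exceeds n, no extension can get back under n.
        if min(row) <= n:
            for c, child in node.items():
                if c != END:
                    walk(child, c, row)

    for c, child in trie.items():
        if c != END:
            walk(child, c, first_row)
    return sorted(results)


trie = make_trie(["book", "books", "cake", "boo", "cape", "cart", "boon", "cook"])
print(fuzzy_search(trie, "bool", 1))  # [(1, 'boo'), (1, 'book'), (1, 'boon')]
```

Here the pruning comes from shared prefixes rather than from the triangle
inequality, so it only works for edit-distance-style metrics on strings.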

~~~
aswanson
How do you do a fuzzy match? Are there any sites out there that deal with all
the text matching algorithms? I need to do this for a project.

~~~
aston
No websites, probably, but lots of papers. A good start might be
<http://courses.csail.mit.edu/6.851/spring07/lec.html> (lectures 8 and 9 or
so).

~~~
aswanson
Thanks.

------
chwolfe
Very cool. Has anyone here used BK-Trees in their work?

------
henning
Needs more code.

