
A dive into spatial search algorithms - signa11
https://medium.com/@agafonkin/a-dive-into-spatial-search-algorithms-ebd0c5e39d2a
======
geophile
This discussion is incomplete without mentioning space-filling curves. These
curves map the data into a 1-d space and then you can use _any_ conventional
search structure -- sorted array with binary search, balanced tree, skiplist,
b-tree, etc. This allows you great flexibility, including the use of 3rd party
libraries of data structures.

(A quadtree is basically the z-order space-filling curve combined with a trie
of degree 4. Octree -- degree 8.)
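Here is a minimal sketch of the idea in the first comment. It assumes non-negative integer coordinates that fit in 16 bits. Points are keyed by their Z-order (Morton) code and kept in a plain sorted Python list, so the only "search structure" is binary search through the standard `bisect` module. The names `interleave` and `ZOrderIndex` are illustrative and do not come from any library.

```python
import bisect


def interleave(x: int, y: int, bits: int = 16) -> int:
    """Z-order (Morton) code: bit i of x goes to bit 2i, bit i of y to bit 2i+1."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


class ZOrderIndex:
    """2-D points stored in an ordinary sorted list, keyed by Morton code."""

    def __init__(self, points, bits: int = 16):
        self.bits = bits
        self.entries = sorted((interleave(x, y, bits), x, y) for x, y in points)
        self.keys = [code for code, _, _ in self.entries]

    def query(self, xlo, ylo, xhi, yhi):
        """Points inside the inclusive box [xlo, xhi] x [ylo, yhi]."""
        # Morton codes are monotone in each coordinate, so every point in the
        # box has a code between those of its lower-left and upper-right corners.
        lo = bisect.bisect_left(self.keys, interleave(xlo, ylo, self.bits))
        hi = bisect.bisect_right(self.keys, interleave(xhi, yhi, self.bits))
        # That code range can also contain points outside the box: filter them.
        return [(x, y) for _, x, y in self.entries[lo:hi]
                if xlo <= x <= xhi and ylo <= y <= yhi]


index = ZOrderIndex([(1, 1), (2, 3), (5, 5), (10, 2), (3, 9)])
print(sorted(index.query(0, 0, 5, 5)))  # [(1, 1), (2, 3), (5, 5)]
```

Read two bits at a time from the top, the code first selects a quadrant, then a sub-quadrant, and so on. That is the degree-4 trie view of a quadtree. Scanning a single code range per query is the simplest possible strategy. When a box straddles quadrant boundaries, that range can be far larger than the box itself, so practical implementations usually split the box into several smaller code ranges.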

This approach also handles non-point data very cleanly. A non-point object
gives rise to multiple index entries. (You can tune the number of index
entries quite easily.)
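To show how a non-point object turns into several index entries, here is a hypothetical `cover` function. It decomposes an axis-aligned rectangle into aligned quadtree cells on a 256 x 256 grid. The sketch assumes `max_depth` plays the role of the tuning knob the comment mentions. A small value gives a few coarse cells that over-cover the rectangle, while a large value gives many cells that fit it exactly.

```python
def cover(rect, max_depth, world_bits=8):
    """Quadtree cells (x, y, size) whose union covers rect.

    rect is (xlo, ylo, xhi, yhi), half-open, on an integer grid of side
    2**world_bits. Cells fully inside rect are emitted whole; cells that only
    partially overlap it are split until max_depth and then emitted anyway,
    so the cover may over-approximate rect.
    """
    xlo, ylo, xhi, yhi = rect
    cells = []

    def visit(x, y, size, depth):
        if x >= xhi or x + size <= xlo or y >= yhi or y + size <= ylo:
            return  # cell does not touch rect
        inside = xlo <= x and x + size <= xhi and ylo <= y and y + size <= yhi
        if inside or depth == max_depth or size == 1:
            cells.append((x, y, size))
            return
        half = size // 2
        for dx in (0, half):
            for dy in (0, half):
                visit(x + dx, y + dy, half, depth + 1)

    visit(0, 0, 1 << world_bits, 0)
    return cells


r = (10, 10, 100, 60)
for depth in (0, 2, 4, 8):
    cells = cover(r, depth)
    area = sum(s * s for _, _, s in cells)
    print(depth, len(cells), area)  # exact area of r is 90 * 50 = 4500
```

Each aligned cell of side `size` at `(x, y)` covers one contiguous run of `size * size` Morton codes, starting at the code of `(x, y)`. That means each cell can be stored as a single entry in a 1-d index.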

Space-filling curves also lead to a very nice kNN algorithm: To find the k
nearest neighbors of p, first search the index for p. Find k neighbors along
the space-filling curve (these are adjacent in the index), and of these, pick
the point q maximizing distance(p, q). All k nearest neighbors of p must be in
a circle centered at p with radius distance(p, q). Now search your index again
using this circle and pick the k points n1, ..., nk minimizing distance(p,
ni).
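The kNN procedure above can be sketched as follows. The sketch assumes non-negative integer 2-D coordinates and a Z-order curve. For brevity, the index is rebuilt on every call. Step 4 also searches the circle's bounding box as a single Morton-code range before filtering by true distance. A real implementation would keep the index around and split the box into tighter ranges. `morton` and `knn` are hypothetical names.

```python
import bisect
import heapq
import math


def morton(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of x and y (Z-order curve)."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def knn(points, p, k, bits=16):
    """k nearest neighbours of p among integer 2-D points."""
    side = (1 << bits) - 1
    entries = sorted((morton(x, y, bits), (x, y)) for x, y in points)
    keys = [code for code, _ in entries]
    k = min(k, len(entries))
    if k == 0:
        return []
    # 1. Search the index for p.
    i = bisect.bisect_left(keys, morton(p[0], p[1], bits))
    # 2. Take k entries adjacent to p's position along the curve.
    start = max(0, min(i - k // 2, len(entries) - k))
    window = [q for _, q in entries[start:start + k]]
    # 3. All k nearest neighbours lie within the farthest of these.
    r = max(math.dist(p, q) for q in window)
    # 4. Search again with the circle: scan the Morton range of its bounding
    #    box (clamped to the grid), keep points within r, pick the k closest.
    xlo, ylo = max(0, math.floor(p[0] - r)), max(0, math.floor(p[1] - r))
    xhi, yhi = min(side, math.ceil(p[0] + r)), min(side, math.ceil(p[1] + r))
    lo = bisect.bisect_left(keys, morton(xlo, ylo, bits))
    hi = bisect.bisect_right(keys, morton(xhi, yhi, bits))
    candidates = [q for _, q in entries[lo:hi] if math.dist(p, q) <= r]
    return heapq.nsmallest(k, candidates, key=lambda q: math.dist(p, q))
```

Correctness does not depend on the curve. The k entries adjacent to p only supply an upper bound on the distance to the k-th nearest neighbour. The curve affects how tight that bound is, and therefore how many candidates step 4 has to examine.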

~~~
robrenaud
Yeah, the effectiveness of that idea made me a little sad when I worked on
Google Maps. I spent all this time studying computational geometry in college
and it went out the window. Use space filling curves and then standard text
indexing techniques and you don't need to deal with any fancy geometric
algorithms.
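One way to read "space-filling curves plus standard text indexing" is a geohash. This is an assumption for illustration, not a description of what Google Maps actually used. A geohash interleaves longitude and latitude bisection bits and writes them as base-32 text. Every cell then becomes a string prefix, and a cell lookup becomes a prefix range scan over sorted strings. Here is a minimal sketch with hypothetical `geohash` and `PrefixIndex` names:

```python
import bisect

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet


def geohash(lat: float, lon: float, length: int = 9) -> str:
    """Interleave longitude/latitude bisection bits, 5 bits per character."""
    lat_lo, lat_hi, lon_lo, lon_hi = -90.0, 90.0, -180.0, 180.0
    chars, value, nbits, use_lon = [], 0, 0, True
    while len(chars) < length:
        if use_lon:  # even bits refine longitude
            mid = (lon_lo + lon_hi) / 2
            bit = lon >= mid
            lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
        else:  # odd bits refine latitude
            mid = (lat_lo + lat_hi) / 2
            bit = lat >= mid
            lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
        value = (value << 1) | int(bit)
        use_lon = not use_lon
        nbits += 1
        if nbits == 5:
            chars.append(BASE32[value])
            value, nbits = 0, 0
    return "".join(chars)


class PrefixIndex:
    """Sorted (geohash, name) pairs: the kind of structure any text index offers."""

    def __init__(self, places):
        self.entries = sorted((geohash(lat, lon), name) for name, lat, lon in places)

    def in_cell(self, prefix):
        """Names whose geohash starts with prefix (one contiguous run)."""
        lo = bisect.bisect_left(self.entries, (prefix,))
        hi = bisect.bisect_left(self.entries, (prefix + "~",))  # '~' sorts after 'z'
        return [name for _, name in self.entries[lo:hi]]


print(geohash(57.64911, 10.40744, 5))  # u4pru
```

Two nearby points can sit on opposite sides of a cell boundary and share no long prefix. For that reason, proximity searches built this way usually scan the neighbouring cells as well.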

~~~
amelius
Space filling curves, although elegant, are not a solution for every problem
in computational geometry of course.

------
vvanders
Along similar lines, Real-Time Collision Detection[1] is a tour de force in
spatial partitioning algorithms. It's basically data structures for 2D/3D
along with some really good coverage of common floating-point numerical issues
and cache-friendly algorithms.

Easily one of the top 3 technical books I've purchased. Each page is solid
gold and very little space is wasted.

[1] - https://www.amazon.com/Real-Time-Collision-Detection-Interactive-Technology/dp/1558607323

------
mattnewport
Coming from a games background, I've not heard of an R tree before. Quadtrees
(or octrees for 3D) are probably the most common spatial query data structure
used in game engines. I'm surprised they weren't mentioned in the article.
Gonna have to do some research on what the pros and cons of an R tree are vs a
quadtree now.

~~~
jandrewrogers
R-trees are relatively slow, particularly if you have a very high mutation
rate. Their sole redeeming feature is that they can handle non-point
geometries with constant space complexity whereas quad-trees have pathological
space complexity for geometries like polygons.

If you are just dealing with points, an R-tree is a poor algorithm choice.
Quad-trees are pretty close to ideal in terms of performance for that use
case.
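For reference, here is a minimal point-region quadtree of the kind described above, in plain Python. Each leaf holds up to `capacity` points and splits into four equal squares when it overflows. A range query skips any square the query box misses. `capacity` and `max_depth` are illustrative parameters, not values anyone in the thread suggested, and the sketch makes no performance claims.

```python
class QuadTree:
    """Point-region quadtree over the square [x, x + size) x [y, y + size)."""

    def __init__(self, x, y, size, capacity=4, max_depth=16, depth=0):
        self.x, self.y, self.size = x, y, size
        self.capacity, self.max_depth, self.depth = capacity, max_depth, depth
        self.points = []       # points stored here while this node is a leaf
        self.children = None   # four sub-squares once the node has split

    def _contains(self, px, py):
        return self.x <= px < self.x + self.size and self.y <= py < self.y + self.size

    def insert(self, p):
        if not self._contains(*p):
            return False
        if self.children is None:
            # Leaves hold up to `capacity` points; max_depth stops endless
            # splitting when many identical points are inserted.
            if len(self.points) < self.capacity or self.depth == self.max_depth:
                self.points.append(p)
                return True
            self._split()
        return any(child.insert(p) for child in self.children)

    def _split(self):
        h = self.size / 2
        self.children = [QuadTree(self.x + dx, self.y + dy, h, self.capacity,
                                  self.max_depth, self.depth + 1)
                         for dx in (0, h) for dy in (0, h)]
        for q in self.points:
            for child in self.children:
                if child.insert(q):
                    break
        self.points = []

    def query(self, xlo, ylo, xhi, yhi, out=None):
        """Collect every stored point inside the inclusive box."""
        if out is None:
            out = []
        if (xhi < self.x or xlo >= self.x + self.size
                or yhi < self.y or ylo >= self.y + self.size):
            return out  # box misses this square: prune the whole subtree
        out.extend((px, py) for px, py in self.points
                   if xlo <= px <= xhi and ylo <= py <= yhi)
        for child in self.children or ():
            child.query(xlo, ylo, xhi, yhi, out)
        return out


tree = QuadTree(0, 0, 1024)
for p in [(10, 20), (500, 500), (900, 40), (510, 495)]:
    tree.insert(p)
print(sorted(tree.query(400, 400, 600, 600)))  # [(500, 500), (510, 495)]
```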

~~~
santaclaus
> quad-trees have pathological space complexity for geometries like polygons

Any idea what is used in the wild for closest point queries against non-point
data (triangle meshes or otherwise)?

~~~
geophile
Quadtrees are fine. Just don't approximate the polygon to the limit of
resolution of the space. I wrote a paper many years ago on a related problem.
The conclusion was that a very small number of quads would be optimal. Too
many is obviously a problem; too few and you have to filter out a lot of false
positives.

The paper was in ACM SIGMOD 89, Redundancy in Spatial Databases.
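The trade-off described above shows up in a small experiment. The setup assumes a disk stands in for the polygon and the unit square is the space. We cover the disk with quadtree cells up to some depth, then compare two quantities: the number of cells (index entries) and the area covered outside the disk (the region that produces false positives). `cover_disk` is a hypothetical helper, and no figures from the SIGMOD paper are reproduced here.

```python
import math


def cover_disk(cx, cy, r, max_depth):
    """Quadtree cells (x, y, size) in the unit square that cover a disk.

    Cells entirely inside the disk are kept whole; cells straddling its
    boundary are split until max_depth and then kept anyway, so the cover is
    a superset of the disk (the excess is where false positives come from).
    """
    cells = []

    def visit(x, y, s, depth):
        # Nearest point of the cell to the centre: if it is outside, skip.
        nx, ny = min(max(cx, x), x + s), min(max(cy, y), y + s)
        if (nx - cx) ** 2 + (ny - cy) ** 2 > r * r:
            return
        # Farthest corner of the cell: if it is inside, the whole cell is.
        fx = max(abs(x - cx), abs(x + s - cx))
        fy = max(abs(y - cy), abs(y + s - cy))
        if fx * fx + fy * fy <= r * r or depth == max_depth:
            cells.append((x, y, s))
            return
        h = s / 2
        for dx in (0.0, h):
            for dy in (0.0, h):
                visit(x + dx, y + dy, h, depth + 1)

    visit(0.0, 0.0, 1.0, 0)
    return cells


disk_area = math.pi * 0.3 ** 2
for depth in range(1, 9):
    cells = cover_disk(0.5, 0.5, 0.3, depth)
    excess = sum(s * s for _, _, s in cells) - disk_area
    print(f"depth {depth}: {len(cells):4d} index entries, excess area {excess:.4f}")
```

Deeper covers store more entries per object but leave fewer candidates to discard. Where the best balance lies depends on the workload.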

------
greg7mdp
You should check out the Churchill Challenge
(http://churchillnavigation.com/challenge/) and the 54 solutions (source code
available at https://github.com/churchillnavigation/challenge1). I came in
at #7.

Beyond the classic spatial search data structures, the fastest solutions all
used SSE instructions (see https://github.com/sDessens/churchill-challange for
a nice writeup).
People did try pretty hard to win the grand prize of $5000, and the fastest
solution was 7 times faster than the best optimized code that Churchill's team
had previously.

------
Boothroid
In my experience, most of the work in this area has already been done: you
will rarely find yourself having to think seriously about spatial indexing,
and even when the task forces you to, most of the tools you would commonly
choose between ship with several indexing options as standard.

------
neel8986
Nice algorithm. I am curious what happens when the dimension becomes really
large (more than 100). I have heard about k-d trees, but as far as I know their
performance becomes really bad for d > 16. What strategy should be used for
finding nearest neighbours in such high dimensions?

~~~
snovv_crash
It often depends on what the actual dimensionality of the problem is. If it
can be embedded in a lower dimensional space, as many problems are, then a
well tuned kD-Tree can still do a good job up to about 16 dimensions of used
space, as you say. Cover trees also work well in this case, and are better at
extracting just the salient dimensions than kD-Trees are, especially as the
discrepancy between the subspace and data space grows.
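For concreteness, here is a minimal exact nearest-neighbour search on a k-d tree, written for any fixed dimension. The comment before the last branch marks the pruning step. A common explanation for the slowdown discussed here is that, in high dimensions, the splitting plane is almost always closer than the current best match, so very little gets pruned. The function names are illustrative.

```python
import math


def build_kdtree(points, depth=0):
    """Nested tuples (point, axis, left, right); splits cycle through the axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return (pts[mid], axis,
            build_kdtree(pts[:mid], depth + 1),
            build_kdtree(pts[mid + 1:], depth + 1))


def nearest(node, q, best=None):
    """Exact nearest neighbour of q, skipping subtrees that cannot beat best."""
    if node is None:
        return best
    point, axis, left, right = node
    if best is None or math.dist(q, point) < math.dist(q, best):
        best = point
    diff = q[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, q, best)
    # Visit the far side only if the splitting plane is closer than the best
    # match so far; in high dimensions this test rarely prunes anything.
    if abs(diff) < math.dist(q, best):
        best = nearest(far, q, best)
    return best


tree = build_kdtree([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
print(nearest(tree, (9, 2)))  # (8, 1)
```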

If the dimensionality gets larger then a better approach is to instead remove
overhead and improve parallelism, for example by pushing the problem to the
GPU or discretizing the measurement space down to 8-bits or even 1-bit per
dimension. I know of products that do both, and solve O(N^2) problems 20x
larger than a naïve approach could in the same time. POPCNT runs really fast
on GPUs.
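Here is a minimal sketch of the 1-bit-per-dimension idea. It assumes per-dimension median thresholds, which is one simple choice among several. Each vector becomes a packed integer, and distance becomes XOR plus popcount. The code runs on the CPU in plain Python, so it only shows the data layout, not the GPU speedups mentioned above. The function names are hypothetical.

```python
import statistics


def fit_thresholds(vectors):
    """Per-dimension median, so each bit splits the data roughly in half."""
    return [statistics.median(col) for col in zip(*vectors)]


def binarize(vec, thresholds):
    """Pack 1 bit per dimension into an int: bit i is set if vec[i] > thresholds[i]."""
    code = 0
    for i, (v, t) in enumerate(zip(vec, thresholds)):
        if v > t:
            code |= 1 << i
    return code


def hamming(a, b):
    """XOR then popcount; int.bit_count() does the same on Python 3.10+."""
    return bin(a ^ b).count("1")


def hamming_knn(codes, query, k):
    """Brute-force scan over the compact codes; ties broken by index."""
    return sorted(range(len(codes)), key=lambda i: (hamming(codes[i], query), i))[:k]


data = [[0.9, -0.2, 0.4, 1.5], [0.8, -0.1, 0.5, 1.4],
        [-1.0, 0.7, -0.3, 0.0], [-0.9, 0.6, -0.4, 0.1]]
th = fit_thresholds(data)
codes = [binarize(v, th) for v in data]
print(hamming_knn(codes, binarize([0.85, -0.15, 0.45, 1.45], th), 2))  # [0, 1]
```

Hamming distance between codes only approximates the original distance. A common pattern is to use it to shortlist candidates and then re-rank that shortlist with exact distances.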

