

Damn Cool Algorithms: Spatial indexing with Quadtrees and Hilbert Curves - nickjohnson
http://blog.notdot.net/2009/11/Damn-Cool-Algorithms-Spatial-indexing-with-Quadtrees-and-Hilbert-Curves

======
crux_
Addenda & thoughts since I do a lot of work along similar lines:

It's probably not a bad idea to have the option of a spatial index at the
leaves of a geohash, as the geohash is necessarily of limited precision and
will behave badly if someone inserts 100k points all within a 1x1 foot area.
;) (This is equivalent, roughly, to a plain quadtree only with the first N
levels of the tree pre-calculated and cached locally.)

I'm guessing the riddle at the end ("Does it remind you of anything?") would
be gray code.

Quad-trees can be used for polygons, although they are less than ideal, by
storing a list of polygons for each quad. If a quad is entirely contained by
the polygon, add the polygon to its list; if a quad is part in and part out of
the polygon, add it to its list if length(poly-list) < heuristic_value;
otherwise split the quad and redistribute its list among its leaves. (The same
arrangement can be used for line segments only, too.) The "query for a region"
image shows how this would work, just change the caption to "insert a region."
;)

A reference for this type of index, which is called a PMR-quadtree:
<http://www.cs.umd.edu/~hjs/pubs/NelsSIGG86.pdf>

Another good option for points is a KD-tree. You can't turn it into a hash, at
least not straightforwardly, but it gives you some nice queries like
n-nearest-neighbours, and can be done with as many dimensions as you like
rather than just two. (Example use: accelerating the k-means clustering of big
multivariate datasets.)

~~~
rbranson
Quadtrees are actually actively used for spatially indexing polygons (in
shapefiles) by the OGR tools (part of the GDAL package) and MapServer.
Quadtrees work quite well in MapServer because it's primary use case is as a
read-only renderer of shapefiles. In this scenario, rebuilding the index from
scratch each time you make a change is trivial.

------
martian
This is a great write-up.

For n-dimensional Hilbert curves check out
<http://www.tiac.net/~sw/2008/10/Hilbert/>

------
EricBurnett
That was a good explanation to calculating the index of a point on the Hilbert
curve - I had been wondering how that was done myself. I'm going to have to
sit down and implement one of these someday.

Other than the quadtree, the Hilbert curve can be used to optimize the r-tree
as well: <http://en.wikipedia.org/wiki/Hilbert_R-tree> .

------
codexon
You may be interested in my more detailed writeup of morton codes here that
was briefly mentioned in this article. It also comes with Python code.

<http://www.codexon.com/posts/morton-codes>

------
wheaties
Very interesting stuff. Our project has been required to work with the ESRI
ArcObjects stuff for the past 4 years and I've been dying to get inside of
their algorithms. Thanks for resparking my curiosity.

~~~
crux_
ESRI ArcObjects under the hood:

    
    
       if (random()) { crash(); }
       else return UndefinedError;
    

Tongue in cheek, of course, and they've gotten noticeably better since 4 years
ago. (I've been working with ArcObjects since their 8.3 release... _shudder_.)

If you're interested in how they implement vector operations:

    
    
       - http://www.vividsolutions.com/jts/main.htm
       - http://trac.osgeo.org/geos/
    

As far as raster operations go, I think that GRASS does all they do, plus
more. All of the above are nice open-source implementations so you can go as
deep as you'd like. ;)

p.s. Most of the algorithms ESRI (and similar) use might not be spelled out in
blog posts, but they certainly are in textbooks. ;)

p.p.s. Out of curiosity, what have you been working on? ESRI tech is about as
anti-startup as one can imagine. It's like using Oracle for your startup's
database.

~~~
wheaties
Heh, I've been using ESRI since 9.0. Well, to be honest it's required by the
contract. I joined when it got bought out.

------
memetichazard
Interesting. I've been thinking about the 'Small World' problem for the
Facebook puzzles and came up with the quadtree concept, but I never even
imagined something like the geohash/Hilbert curve.

------
BearOfNH
FTA: _the next part of our position is 1, and the square we should consult
next is the third one._

Doesn't he mean the second one (again)?

This is a good article, but he really should have added an 8x8 illustration
and/or mentioned his coordinates start at (0,0) in the upper left. And maybe
picked a different point so the example doesn't end up going from the first
square to the second to (I believe incorrectly) the third.

