
Damn Cool Algorithms: Quadtrees and Hilbert Curves - emson
http://blog.notdot.net/2009/11/Damn-Cool-Algorithms-Spatial-indexing-with-Quadtrees-and-Hilbert-Curves
======
jrockway
This might be obvious to many, but spatial indexes are good for more than just
"show me the restaurants within five miles of me". It's a good technique to
know for efficiently searching _any_ two columns simultaneously. If you have,
for example, a database of people, their hometown, and their favorite color,
and you want to know all people from Chicago that like green, a spacial index
over color and hometown will be vastly more efficient than a "regular" single-
column index over any one of those columns. (The reason is that an index on
one column doesn't efficiently produce a small resultset. If we index on color
and there are five colors, an index over color only reduces the dataset by
1/5th. Similarly, indexing over hometown might reduce the size of the dataset
by 1/1000th. But if we index over both columns at the same time, then the only
results that the index will ever produce are results that are the exact answer
to our question. No unnecessary scanning!)

Anyway, lat/long might be the easiest "pair" of columns to imagine when
discussing anything "spatial", but most other two-column pairs work the exact
same way.

~~~
tom_b
Yes - I remember using these techniques addressing 2d dense matrices. The
specific case was when the matrices involved grew so large that the linear
access operations were causing page faults for sequential row or column
access. This was a perfect example of pathologically bad memory accesses when
data was stored in row-major or column-major layouts.

Very impressive speedups for these interleaved data layouts. The end goal was
to embed these layouts into the compiler tools, so that the programmer would
do normal two dimensional matrix allocations and would get the optimized
layout based on analysis.

I wrote terrible C to accomplish these layouts and minimize the instruction
counts.

------
Smerity
Whilst I was an intern at Google I worked with Hilbert Curves. Range queries
are something BigTable and Cassandra excel at and are the core database
component of any Hilbert Curve based spatial index.

The real issue is building the database queries. Google released Uzaygezen[1],
a Java library for multi-dimensional indexing, that handles the majority of
this for you.

[1]: <http://code.google.com/p/uzaygezen/>

~~~
manmal
I did not get this in the article - how exactly do we construct range queries
(which the Hilbert curve improves on) here? Is the quadtree range resolved
into actual coordinates, or is the datatype of the index bound to be integer,
so we can query a range over the index (0110110 to 0110111 = 54 to 55, so the
query range is [54, 55])?

------
robrenaud
This technique is simultaneously cool and depressing. Remember all that cool
stuff you learned in computational geometry? Throw it out.

Just associate geometry with a set of well chosen ids, and you can stuff it
into really simple, conventional data structures and do all of the spatial
operations you wanted to quickly and simply in practice.

------
willvarfar
I'm a bit confused. I've naively put axis-aligned bounding-boxes (AABBs) in my
octree without any kind of thought. I put them in the smallest box that
contains them. I also keep a byte using bits to mark the sectors they
intersect, a cache that greatly cheapens things.

It works very well. I even keep small things like bullets in my octree (most
game engines keep them separate), putting them in an AABB representing their
position for some duration.

I am doing my queries in nanoseconds.

Its worth saying that for big worlds (think minecraft++), octrees don't work
as well as zoning systems e.g. pages. You can of course use a hierarchy, using
pages or zones with octrees in them.

------
jasonkolb
Does anyone know if there's a commercial database using quadtrees or hilbert
curves? (Aside from the libraries mentioned in the comments, I mean something
more productized)

~~~
abbotao
Oracle used to use a similar fractal curve system for their spatial index, but
has since moved to R-trees.

I believe MongoDB is using a geohash system for their spatial queries as well.

------
rcfox
Thanks, I had never heard of geohashes before, but it sounds like they might
really help with a problem I've been having lately.

------
manmal
How would someone implement this, then? Is it really just an index column
besides the data tuples, containing the recursive quadrant position (111111
for bottom-right in a 3-level grid)? And, what datatype would you use for this
- clearly, an integer is not long enough if you have a great number of
levels..

~~~
manmal
Ok I just wrapped my head around it. For querying to work here, the datatype
HAS to be integer or long, or else queries on higher levels would not return
the tuples associated with lower-level-quads.

------
philf
The Wikipedia description of Geohashes is a bit odd in that its base32
decoding uses the alphabet with some letters (a, i...) left out instead of
simply using [0-9a-v]. Anyone know why that would make sense?

