

Ask HN: Storing spatial data in SQL - Azuldolphin

What's the best way to store spatial data in SQL in a way where the query "find all points within x miles of point a" can be performed efficiently.<p>My initial approach is to store latitude and longitude, and create an index on both longitude and latitude. However, since I'd be querying both longitude and latitude with a range, only one of the indexes would be useful. Basically, the index could narrow down the search to one vertical bar of the entire earth, and then a table scan to find the locations within that bar. (Of course, clustering on latitude first would have the opposite effect: one horizontal band of the earth, and then within the band would be a table scan).<p>I know there are spatial data types in some of the SQL implementations, but can anything really ever be more efficient than that when doing a "find all points within x miles" type query? If so, how's this possible?
======
arnabdotorg
Postgres: <http://www.postgresql.org/docs/8.1/static/indexes-types.html>:

"PostgreSQL provides several index types: B-tree, R-tree, Hash, and GiST.
R-tree indexes are suited for queries on two-dimensional spatial data."

PostGIS: <http://postgis.refractions.net/> :

"PostGIS adds support for geographic objects to the PostgreSQL object-
relational database. In effect, PostGIS "spatially enables" the PostgreSQL
server, allowing it to be used as a backend spatial database for geographic
information systems (GIS)"

Also see "Programming the K-means Clustering Algorithm in SQL":
<http://pdf.cx/6gc9a>

------
geophile
To roll your own:

    
    
      - Interleave the bits of the coordinates to form a "z-value".
      - Index the z value.
      - "X miles around a" is a circle. Approximate with a bounding box if you'd like.
      - Generate 1-d search regions to yield a set of z ranges,
        and search the index using it.
    

For more details, look up z-order, or see books by Hanan Samet.

But a spatial index will probably do something reasonably close to this.

One big thing to watch out for: The spatial index or z-value index should be
clustering, so that the points you retrieve are packed tightly into pages. If
the index is not clustering, your index lookup will be fast but you'll pay a
lot of IO to bring in all the pages containing the qualifying points.

------
vyrotek
SQL Server 2008 has Spatial Support.

They also let you perform spatial operations/queries. You can even store GPS
'polygons' and perform queries such as find ones that overlap, find the
polygon which covers a specific point, etc. The spatial query support is
amazing.

Indexing the Geography datatype is also supported.

[http://www.microsoft.com/sqlserver/2008/en/us/spatial-
data.a...](http://www.microsoft.com/sqlserver/2008/en/us/spatial-data.aspx)

~~~
Azuldolphin
I'm hoping to use PostgreSQL (because I'm using Heroku;) ), but I read up on
how MS Sql Server implements spatial indexes. Basically, it recursively
divides the coordinate system into quadrants. Then, when looking for a range
of coordinates, it can figure out which quadrants are covered by that range
and efficiently query them using b-trees. Very cool. I wonder if PostgreSQL
spatial support works similarly...

~~~
nym
Use PostGIS!

Heroku supports it -> [http://www.mail-
archive.com/heroku@googlegroups.com/msg04556...](http://www.mail-
archive.com/heroku@googlegroups.com/msg04556.html)

------
joshu
The lazy way is to hash coordinates to tiles, and then index on the tile
identifier. To query a bounding box, calculate the tiles that would cover the
bounding box, and then query for those tiles.

------
nym
I recently built a site for finding vineyards, and used GeoDjango with
Postgres+PostGIS.

Setting up was a small PITA, but a bounding box query looks like this:

    
    
      footprint = Polygon(((ullon, ullat), (lrlon, ullat), (lrlon, lrlat), (ullon, lrlat), (ullon, ullat)))
      vineyards = Winery.objects.filter(loc__within=footprint)
    

If you're interested, check it out: <http://finelocalwine.com/>

~~~
Azuldolphin
Thanks a lot, that looks like it's along the same lines as what I'm trying to
do. How did you test this locally? Does Sqlite support this? Or did you set up
PostgreSQL on your development machine?

~~~
nym
Yeah, you have to set it up locally, but again, it's worth it.

------
thibaut_barrere
I've been using sphinx for georanking, on top of mysql. Works really well.

If you're using ruby, be sure to have a look at thinking_sphinx which provides
a great dsl to configure the index and query it.

Also MongoDB has a built-in geographical index.

I would not personnally roll my own here.

