Hacker News new | comments | ask | show | jobs | submit login
Redis Geo (matt.sh)
241 points by mnutt on June 24, 2015 | hide | past | web | favorite | 58 comments



Hello, I'm currently integrating, refactoring and fixing the original code, in the context of this branch: https://github.com/antirez/redis/commits/geo

So far the code was simplified and polished, fixed (it did not returned the right set of elements in all the cases), GEORADIUS speed was improved.

On the other side I removed all the JSON things and the Pub/Sub (IMHO is interesting to subscribe to areas to see objects entering/exiting them, not much to position updates since it's the same of doing PUBLISH+GEOADD from the client side).

Added: GEOHASH to return a Geohash standard string from of a given element, mainly for interoperability. I'm also adding GEODIST (to calculate distances) and GEOPOS (to retrieve latitude/longitude of objects without the need to query for radius).

The work will be available in the context of Redis 3.2, unless I get mad and merge back into 3.0, which is unlikely but not impossible.

EDIT: before of three days ago I was not aware of Geohashing at all, so if you have five mins to check the code, I'll be glad :-) Thanks. I'm pretty confident that it's ok now after I added a fuzzy test that replicates the feature (by brute forcing distance computation) in Tcl that checks against the Redis output, but still...

EDIT2: wanted to add that I'm very glad Matt efforts are not going to be wasted and we are finally merging this.


Redis is damn cool. As an effort to better understand it and after reading through the Twitter in Redis example, I decided to see how much of a graph database I could create using it and Python (https://github.com/emehrkay/rgp)

Dynamic Redis seems like a decent project and would make hacking core functionality fun and simple.


Loading plugins into Redis, a-la Dynamic Redis approach or otherwise, has long been discussed in the Redis community. Presently, antirez prefers not to destablize Redis (and thus damage its solid stability reputation) by allowing external code be run in it.

So, despite the obvious perks that we'll enjoy by side-loading our own junk, antirez's "Quality or DEATH" approach on this matter leaves no open questions about the possible inclusion of this functionality. There is, however, another way to get code running in Redis... it is called a GitHub Pull Request :) If you've thunk up something really amazing/useful and coded it, drop by the redis-dev mailing list to discuss it, maybe do an RCP and submit your code. No one promises that your PR will be accepted, but even if it won't make its way into the core, anyone will be able to use it nonetheless.

Another point worth mentioning in this context is that in order to even contemplate such a mechanism, Redis' core will need heavy refactoring of the internal API. This is by all accounts a huge effort that, if taken by the developer, is likely to postpone a lot of other important work.


As someone outside to redis development, I'd like to know: is Dynamic Redis that more performant than good ol' lua scripts that it's worth introducing potential incompatibilities between redis instances deployments ? I stumbled upon the old discussion about adding a scripting language to redis, and the number 1 requirement was that additional code should be "migratable" very easily from instance to instance.

A related question: is there some kind of repo for "useful lua scripts that everybody could use but didn't make it in core redis (yet)" ?


Depends a lot on the kind of task at hand. For many things Lua is very fast, but not for implementing something like geocoding. On the other hand Dynamic Redis does not offer a proper API to Redis (I would be against plugins even if it was better in this regard, btw), so the assumption is that modules manipulate Redis internals in order to do anything remotely advanced, and you can't change a Redis internal without breaking modules, a constraint that we currently don't have at all: if the exported API is the same, we can improve Redis internals anytime without backward compatibility issues. This was crucial in the history of Redis.


It's really cool to see more projects tackling location problems. The API is very simple which is another great plus.

Had Redis Geo been around at the time we replatformed at Hailo we would have given it some serious consideration. But instead we have developed our own solution and open sourced a library in Go called "geoindex" to store and retrieve geo information: https://github.com/hailocab/go-geoindex .

It's part of our core functions within our microservices stack. You can read more about it on our blog too: https://sudo.hailoapp.com/services/2015/02/18/geoindex/


This is unrelated, but I've been hearing great things about the readability and teaching ability of the Redis code for people interested in learning good C practices.

I knew/know C but have been doing high level programming for years. Would anyone suggest if I wanted to visit this source tree, I check out an earlier version (like 1.x) or just jump straight to latest stable or master?


The latest stable branch is an amazing show of readability. It's very simple to follow, and very easy to hack at if you want to try and manipulate structures to how you're wanting them.

10/10 would read through again.


I'm also curious in this if anyone has an answer.


This is awesome. I've been using Redis -- naively, with hashes -- as a lat/lon cache for some path analysis on my website. Didn't even know this was out there, and can't wait to switch over!


Indeed - and as we speak antirez is busily refactoring, merging, testing and extending it in Redis... happy-meter is off the scales here.



Would love to know how this performs compared with analogous PostGIS operations on a similar sized set of geos.


It is a little bit apples and oranges. This is a simple point-in-plane index, PostGIS is a full-blown geospatial data modeling capability. PostGIS is going to be much slower but it is also doing much more as part of the process so that it is useful for sophisticated spatial applications. PostGIS is not fast relative to comparable systems but this particular comparison is not fair because PostGIS is expected to do so much more.

That said, I have also implemented large-scale point-in-plane indexing systems, though not in a while. Per CPU core, you can encode and index around 20M geo-points per second if the implementation is good.


We're using Redis Geo to track customers who are "online" so that we can easily retrieve a list of users who are online and in the area. The user's permanent location is stored in a SQL database for permanent storage.


How do you deal with locations close to boundaries? Isn't that still a problem with comparing geohashes for proximity?


This is handled by the Redis implementation (and most others AFAIK). So it returns the right items regardless of the query position or the items position. Redis has tests in place for this.


It's great to see you (jandrewrogers) of SpaceCurve reply to correctly to the question! Your blog post on why Geospatial Databases are Hard has long influenced me and a handful of others I've run into thinking about all of this.

Geohashed (X+Y) points in Redis is only peripherally related to actual geospatial database concerns, particularly big-data or ("NoSQL") spatial database concerns - how do you efficiently index anything other than point data (polygons, multi_polygons), or ask spatial queries (e.g., nearest neighbors) is not even close to being addressed.

Blog post here for reference: http://www.jandrewrogers.com/2015/03/02/geospatial-databases...


i can only ask that this extension requires longitude and latitude in the correct order: x,y.


Somebody opened an issue about this too. While in the GIS world apparently this is the standard, outside the GIS world everybody talks in terms of latitude,longitude (y,x). So I'm not sure what to do. I suspect the majority of the users were not exposed to GIS systems, and that the GIS exposed users will check the parameters better, so I'm inclining in taking y,x in favor of the average person that like me thinks latitude,longitude is normal.


the issue is that you are now entering a different sector of technology, where there are already standards in place, and those standards are x, y, z ordering:

* mongodb geospatial extensions - http://docs.mongodb.org/manual/core/geospatial-indexes/

* postgresql geospatial extensions - http://postgis.net/docs/manual-2.1/

* geocouch - https://github.com/couchbase/geocouch/wiki/Spatial-Views-API

so, it isn't just GIS users who are doing this, it is also the convention across databases that support geo functionality.

i hope that you reconsider this, and don't go against what is both standard by committee and standard by implementation.


Oh if it's a standard among implementations of databases consider this done. It will not be funny for people migrating from the module to the Redis implementation, but I can't be backward compatible with API that never entered Redis official code base after all... Thanks for pushing me into the right direction with your arguments.


thanks! i know that it is going to be harder for those migrating, but as they move into a more location-driven world, they'll ultimately be happier i think.


Damn, you're right about GIS databases. In my apps I always used lattitude first, as was taught in geography lessons, but you've convinced me to move from lattitude endian to longitude endian. Two choices - too many choices...


it gets worse. there are tons and tons of spatial references, and they can all be defined differently.

see: http://spatialreference.org

oregon, for instance, has two official state spatial references (if top of memory is correct).


Spatial References I can understand - when you want to create a map of a single country/region, a custom projection centered specifically around that region can provide a more accurate approximation of equal-distance points, parallel lines etc, something that is not possible with a worldwide projection.

Note that Redis Geo uses the infamous Web Mercator, used also by Microsoft Maps, Google Maps, OpenStreeMap, MapQuest, etc. but apparently rejected by the standards body EPSG with the declaration: "We will not devalue the EPSG dataset by including such inappropriate geodesy and cartography." Ouch. [1]

[1] https://en.wikipedia.org/wiki/Web_Mercator


But now accepted as EPSG 3857.

I always found their argument a little specious given that EPSG 4326 exists.


It is there: EPSG:3857


There is nothing that says "the standard is longitude,latitude". If you want to be picky, most implementations say "easting,northing" and in a geographic coordinate system (e.g. EPSG:4326) that means longitude,latitude. There are so many "standard" coordinate systems out there (see https://github.com/mbostock/d3/wiki/Geo-Projections and http://spatialreference.org/) and many times the units are different (e.g meters, feet, etc using a cartesian coordinate system).

To be fair, if you look at most geo APIs (e.g. GMaps) and you just blindly copy values around without paying attention, you are more likely to be right if you were copying [y,x]. Mind you I say most and not all, since there are very popular geo libraries (e.g. Leaflet) that use [x,y] instead.

I would just pick something, make sure it is well documented and clear and just go with that.


Dynamic Redis (which is a Redis Fork that lets you add modules to it) is a great find! Especially with the Geo module added.

I needed a limit parameter for the Georadius method because i wanted a result set sorted by distance in a large radius, but not have thousands of results send from the redis server to my webservers. Anyone interested in such a parameter can use my little fork for the Geo module:

https://github.com/wouter33/krmt


Good idea, I'll add LIMIT. Makes sense to say I want N nearest elements with max distance of Y.


Great Salvatore! Could you make the max distance parameter optional? Sometimes i just want the nearest items, even though they could be on the other side of the world.


I feel like that is a fairly unusual usecase; plus you could just set the max distance parameter to the diameter of the Earth.


One of our applications has user generated content with a geolocation attached to each of the items. For several selections we want to show the top 10 nearest items. If they are only a few, the radius has to be very large to have some items in the result set.

Currently the Geo module only supports a range up to 20.000 km. So your idea does not work at the moment. I agree that those items are not really "nearby", but to be consistent through out the application it would be nice to be able to select without a max distance.


I think the redis convention would be required and -1 gives you everything.


I've got a Docker image for Dynamic Redis that includes all of the standard modules, including geo.so: https://registry.hub.docker.com/u/urlgrey/dynamic-redis/

Github repo: https://github.com/urlgrey/docker-dynamic-redis


We've been using this since day 1 at our startup (well, at this point it's more like early-stage free-time group project). Very happy to see support from antirez


I never knew that redis can be extended for geo. This is really cool.

I researched geodb a while back and I only found PostGIS but I recently found out that elastic search also supports geo features.

Shameless plug - This is the simple API I built on top of PostGIS https://github.com/moo-mou/GeoGo


Very interesting. We could have used this ourselves, but ended up writing a way (circa 2010) for Postgres to be queried by MGRS (Military Grid Reference System: https://en.wikipedia.org/wiki/Military_grid_reference_system)


I wrote an introductory article to the new Geo API. Comments are welcome :) http://cristian.regolo.cc/2015/07/07/introducing-the-geo-api...


I have a question that I haven't seen address in the article. Does the code always assume the "sphere" is Earth or can you provide geometric information for other geometries?


It assumes the Earth is a sphere, also the code is not designed to work well towards the poles, at latitudes less or more than -/+ 85.


I work on a system that uses GPS data to figure out some metrics about roads. At that scale, the spherical model was not a useful approximation. It's been useful for drawing on top of Google Maps, but that's obviously because Google Maps chose it to keep straight lines rectilinear. The spherical model is a visually convenient model, but I don't think you can really do any serious calculations with it.


Redis Geo commands are mainly intended to solve the specific problem of indexing points into the earth and querying by radius. In this setup, the error we have currently is around 0.5% because Earth is not actually a sphere, but the good news is that fixing it should be just a very simple matter of using a more accurate distance formula: http://www.movable-type.co.uk/scripts/latlong-vincenty.html

Since anyway the bounding boxes we calculate and the resulting areas we scan are larger than needed, so all the actual precision is in the filtering function that removes points outside the radius.

TLDR: It will be trivial to fix that later in a backward compatible way. Btw adding an item in my TODO in order to check if I can do it right now without compromising performances.


Having done both Spherical Mercator and Universal Transverse Mercator in my own code, UTM is something like 25 times as many arithmetic operations as Spherical. That's not to say I have the most ideal implementation, though.


I think the parent post was asking if you could swap out the default sphere for another. Like, Mars. That way this could be used for geo queries on any planet.

Or is it just assuming that the earth is a perfect sphere?


The latter. Very few people need precise geolocation on Mars or any other non-Earth planet. But planets are not perfect spheres (mainly due to rotation), so treating them as such introduces errors.

If you're curious about the topic, you can start here: https://en.wikipedia.org/wiki/World_Geodetic_System.


Hell yeah ! I've been waiting for this to be integrated into Redis.


Some can ELI5 redis for me? And also this extension?


Redis is an in-memory "data structure server", in that besides simple key-value (think memcached) storage you can also have sets, lists, sorted sets, hashes and more. See redis.io.

This extension adds geo-specific commands and storage to the list of operations supported.


Interesting! Thank you for your answer. I have previously heard that people use that to do cache server for websites. How does that use case is covered by reddis? Also, what are other common use cases for reddis? Thank you.


What reading have you done of http://redis.io/ ? What search terms have you used on your favourite search engine?

People here are generous with their time, skills, and knowledge, but your initial question and your followup makes it sound like you've not even tried to educate yourself.


As I replied to chriswgt, I am not able to see any real use cases. All I see are features that look cool but no use case.


Steve, you're probably talking to a bot, designed to collect some karma points, to be used for upvoting/downvoting some important article or comment in the future.


Ahaha lol whut?! Seriously, I'm asking questions because all I see are features but no real use cases that I could say that Reddis is useful. I think you may be a litle too defensive.


even though your account is only 5 hours old, and has only posted on this thread, i'm going to give you the benefit of the doubt.

some uses for redis, as we use it:

* pubsub (publish/subscribe) - push information through to other servers on channels

* job queuing - redis supports lists, which can easily become job queues

* statistics - redis supports incrementing and decrementing keys very quickly and easily, allowing for real-time statistics gathering and reporting

hope that helps.


> hope that helps

Yeah thank you. That looks like a message bus, but with many more features than just a bus.

My account is only 5 hours old because I created it to ask my question. I am relatively here and I was just lurking before.


that's not including the very fast caching - so much much much more than just a bus :)




Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact

Search: