
A fast, offline reverse geocoder in Python - jp_sc
https://github.com/thampiman/reverse-geocoder
======
yellowbkpk
If you're interested in making both forward and reverse geocoding better,
please consider paying attention to a project I started and help maintain
called OpenAddresses:

[http://openaddresses.io](http://openaddresses.io)

The goal is to collect address datasets so that forward and reverse geocoding
is an easier problem to solve. A contributor wrote an excellent overview of
the project the other day:

[https://medium.com/colemanm/creating-an-open-database-of-add...](https://medium.com/colemanm/creating-an-open-database-of-addresses-73a7d0dc24c5)

~~~
mvexel
It's a nice overview but it glosses over the fact that while the OA database
is governed by a CC0 license, the individual address collections in the
database are still governed by their own licenses which can be (much) more
restrictive. The fact that you can download the data doesn't mean you can use
it the way you want. The OA web site hints at this but doesn't address (ah-
hah) that underlying problem. That doesn't mean that OA is not valuable -
quite the contrary - but I think the fact that it's presented as one big free
and open dataset can be misleading.

~~~
gabemart
As the (dead) sibling comment points out, my (non-lawyer) understanding of US
copyright law is that simple collections of facts, when compiled in a way that
requires no creativity, do not enjoy any copyright protection at all.

I would be surprised if a simple list of addresses, even a very large one, is
something that could be subject to copyright.

------
bzz01
You can't really use KD trees with lat/lon coordinates, at least you can't use
euclidean distance there for nearest neighbor search.

First, longitude wraps from -180 to +180 at the antimeridian, so distance
calculations will fail there. Second, and I'd say more importantly, the length
of one degree of longitude in meters varies a lot with latitude, so this
library will be heavily biased towards longitudinal neighbors when used for
locations far from the equator.
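
To make both failure modes concrete, here is a minimal sketch using only the standard library. It compares a nearest-neighbor pick made with Euclidean distance on raw degrees against one made with straight-line (chord) distance between points on a unit sphere, which grows monotonically with great-circle distance. The function names and test coordinates are made up for illustration, a spherical Earth is assumed, and nothing here is taken from the library's code.

```python
import math

def to_unit_xyz(lat_deg, lon_deg):
    """Map a latitude/longitude pair (degrees) to a point on the unit sphere."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))

def nearest_naive(query, candidates):
    """Nearest neighbor by Euclidean distance on raw degrees (the flawed way)."""
    return min(candidates, key=lambda c: math.dist(query, c))

def nearest_chord(query, candidates):
    """Nearest neighbor by chord distance between 3D points on the sphere."""
    q = to_unit_xyz(*query)
    return min(candidates, key=lambda c: math.dist(q, to_unit_xyz(*c)))

# Antimeridian: the true neighbor is 0.2 degrees away, across the +/-180 seam.
am_query = (0.0, 179.9)
am_candidates = [(0.0, -179.9), (0.0, 170.0)]

# High latitude: at 80N, 5 degrees of longitude is a much shorter distance
# than 2 degrees of latitude, but raw degrees say the opposite.
hl_query = (80.0, 0.0)
hl_candidates = [(80.0, 5.0), (82.0, 0.0)]

print(nearest_naive(am_query, am_candidates), nearest_chord(am_query, am_candidates))
print(nearest_naive(hl_query, hl_candidates), nearest_chord(hl_query, hl_candidates))
```

The 3D points could be indexed by a k-d tree just as easily as the raw pairs, since chord distance is ordinary Euclidean distance in three dimensions.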

~~~
shawn-butler
I can recommend the excellent GeographicLib::Geodesic for this, having used it
in the past.

[http://geographiclib.sourceforge.net](http://geographiclib.sourceforge.net)

There is a python implementation available as well.
[http://pypi.python.org/pypi/geographiclib](http://pypi.python.org/pypi/geographiclib)

~~~
thampiman
Thanks!

------
natch
Kudos for a very well done README (and it's not just cribbed from the original
project, it explains the new stuff very well and tells what the project is,
and gives credit back). So many projects neglect the README.

One question - is it OK to put an MIT license on something that is based on
LGPL code? I don't know enough about how the LGPL works (I do know it is less
"infective" than plain GPL).

Well two questions: python2, or python3?

~~~
thampiman
Thanks for that comment!

Good question regarding the license. I'm not too sure about that. I'd
appreciate it if someone could shed some light on it.

Regarding the version, I've only tested it on python2. I should add that in
the README. Thanks!

~~~
stared
As of now, it does not work on Python 3, but it seems that the fixes are not
hard: [https://github.com/thampiman/reverse-geocoder/issues/2](https://github.com/thampiman/reverse-geocoder/issues/2)

~~~
thampiman
UPDATE: I've just released v1.2 which supports Python 3. For details:
[https://github.com/thampiman/reverse-geocoder](https://github.com/thampiman/reverse-geocoder)

~~~
natch
Sweet!

------
Animats
While we're on this subject, is there a good, free street address parser that
will work for at least the US, Canada, UK, and the major EU countries? I've
tried most of the available ones, and they can parse about 90-95% of business
addresses.

(Regular expressions don't work well for this. Neither does starting from the
beginning of the address. Proper address parsing starts at the end of the
address and works backwards, with the information found near the end, such as
country name and postal code, used to disambiguate the information found
earlier.)
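
A toy sketch of the end-first idea, assuming comma-separated input and a tiny hypothetical lookup table (the `COUNTRIES` and `POSTCODE_PATTERNS` names and contents are invented for illustration and are nowhere near complete). The country is read from the last field, and that choice decides which postal code format to look for in the field before it. Patterns are applied only to that already-isolated trailing field, not to the whole string.

```python
import re

# Hypothetical, deliberately tiny lookup tables for illustration only.
COUNTRIES = {"usa": "US", "united states": "US", "canada": "CA",
             "uk": "GB", "united kingdom": "GB"}
POSTCODE_PATTERNS = {
    "US": re.compile(r"\b\d{5}(?:-\d{4})?$"),
    "CA": re.compile(r"\b[A-Z]\d[A-Z] ?\d[A-Z]\d$"),
    "GB": re.compile(r"\b[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}$"),
}

def parse_from_end(address):
    """Peel fields off the end of a comma-separated address, country first."""
    parts = [p.strip() for p in address.split(",") if p.strip()]
    result = {"country": None, "postcode": None, "rest": []}

    # 1. The last field is the country, if it is one we recognise.
    if parts and parts[-1].lower() in COUNTRIES:
        result["country"] = COUNTRIES[parts.pop().lower()]

    # 2. The country decides which postal code format to look for.
    pattern = POSTCODE_PATTERNS.get(result["country"])
    if pattern and parts:
        match = pattern.search(parts[-1])
        if match:
            result["postcode"] = match.group(0)
            remainder = parts[-1][:match.start()].strip()
            if remainder:
                parts[-1] = remainder   # e.g. keep "DC" from "DC 20500"
            else:
                parts.pop()

    # 3. Everything earlier is left unparsed, in its original order.
    result["rest"] = parts
    return result

us = parse_from_end("1600 Pennsylvania Ave NW, Washington, DC 20500, USA")
ca = parse_from_end("111 Wellington St, Ottawa, ON K1A 0A9, Canada")
print(us)
print(ca)
```

A real parser would keep working backwards through region, city, and street, and would need to cope with missing commas and missing country names.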

~~~
unclesaamm
The most principled approach I've seen on this is at
[https://github.com/datamade/usaddress](https://github.com/datamade/usaddress).
They use tagged training data and conditional random fields. I haven't seen
comparisons with other systems, but it's worked well enough for my projects.

Though as the name suggests, it's only trained for US addresses.

------
mcbetz
Very good companion for Geocoder -
[https://github.com/DenisCarriere/geocoder](https://github.com/DenisCarriere/geocoder).
Glad to see Python getting more geo libraries for Non-GIS users.

~~~
dheera
Are there any offline Geocoders that work for the whole world, even if not
free? Nominatim doesn't work for a lot of Asian addresses.

------
bronson
Very impressive, I'll be looking closer at K-D trees.

I wrote a quick (500k lookups/sec) offline geocoder for Ruby:
[https://github.com/bronson/geolocal](https://github.com/bronson/geolocal) to
comply with the silly EU cookie rules. It precompiles the statements you're
interested in:

    Geolocal.in_eu?(request.ip)
    Geolocal.in_us?('8.8.8.8')

Glad to see that my lib has a role model if it ever grows up. :)
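
For readers who want the same kind of check in Python, here is a rough sketch of one way a precompiled range lookup could work: CIDR blocks are flattened into sorted integer ranges once, and each query is a binary search. This is an assumption about the general technique, not a description of geolocal's internals. The `EU_NETWORKS` list is hypothetical and uses documentation-only address blocks; the ranges are assumed not to overlap.

```python
import bisect
import ipaddress

# Hypothetical ranges for illustration; real data would come from an
# IP-to-country table. These are documentation-only address blocks.
EU_NETWORKS = ["192.0.2.0/24", "198.51.100.0/24"]

def compile_ranges(cidrs):
    """Turn CIDR blocks into sorted, parallel lists of integer starts and ends."""
    ranges = sorted(
        (int(net.network_address), int(net.broadcast_address))
        for net in map(ipaddress.ip_network, cidrs)
    )
    starts = [lo for lo, _ in ranges]
    ends = [hi for _, hi in ranges]
    return starts, ends

# Done once, up front; lookups only touch these two lists.
EU_STARTS, EU_ENDS = compile_ranges(EU_NETWORKS)

def in_eu(ip):
    """Binary-search the precompiled (non-overlapping) ranges for an IPv4 address."""
    value = int(ipaddress.ip_address(ip))
    i = bisect.bisect_right(EU_STARTS, value) - 1
    return i >= 0 and value <= EU_ENDS[i]

print(in_eu("192.0.2.17"), in_eu("8.8.8.8"))
```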

------
zetahunter
Awesome, one more thing that can be made standalone instead of using google
maps service.

~~~
thampiman
Thanks! I'm the developer of this library and I hope you find it useful.

------
sandstrom
Looks really interesting!

Would it be possible to use OpenStreetMap data?

[http://planet.openstreetmap.org/](http://planet.openstreetmap.org/)

~~~
mtmail
OSM data doesn't contain an easy way to find the top 1000 cities. You'd end up
with hundreds of thousands of places. Looking for wikipedia tags, population
(which often comes from Wikipedia) and 'admin' tags might be a good guide.
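
A minimal sketch of that filtering idea, assuming the places have already been extracted into dictionaries of OSM tags. The records, place names, and function names below are invented for illustration. Population is treated as free text because the tag is not guaranteed to hold a clean number.

```python
# Hypothetical records shaped like OSM nodes with their tag dictionaries.
places = [
    {"name": "Bigtown", "tags": {"place": "city", "population": "2100000",
                                 "wikipedia": "en:Bigtown"}},
    {"name": "Midville", "tags": {"place": "city", "population": "350,000",
                                  "wikipedia": "en:Midville"}},
    {"name": "Hamlet", "tags": {"place": "hamlet"}},
    {"name": "Oddplace", "tags": {"place": "town", "population": "about 9000"}},
]

def population(tags):
    """Parse the population tag; return 0 when it is missing or not a number."""
    try:
        return int(tags.get("population", "").replace(",", ""))
    except ValueError:
        return 0

def top_places(records, limit):
    """Keep records that carry a wikipedia tag and rank them by population."""
    notable = [r for r in records if "wikipedia" in r["tags"]]
    notable.sort(key=lambda r: population(r["tags"]), reverse=True)
    return notable[:limit]

top = [r["name"] for r in top_places(places, 1000)]
print(top)
```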

(I work on an OSM geocoder; it's not offline, but it has a Python library:
[http://geocoder.opencagedata.com/](http://geocoder.opencagedata.com/))

------
nickstefan12
Nice! Shameless plug for a SQLite no network geocoder that uses (I believe)
the same text files to seed everything. [https://github.com/NickStefan/no-network-geocoder](https://github.com/NickStefan/no-network-geocoder)

------
pjkundert
On a related note: An efficient geolocation encoder/decoder with error
correction using Reed-Solomon. 3m accuracy with error correction in 10
symbols. 20mm accuracy with 5-nines certainty in 15 symbols:

[https://github.com/pjkundert/ezpwd-reed-solomon](https://github.com/pjkundert/ezpwd-reed-solomon)

------
dexterbt1
Starred. We're currently using nominatim + osm data + postgis on our own
hosted servers. Can this be a good alternative?

~~~
thampiman
I should think so. I've tried nominatim/osm data but it took forever to query
a large set of coordinates. I was only interested in knowing the nearest city
and admin regions 1/2. And this library is really fast... ~20s to look up 10M
coordinates on my MBP. If you'd however like to know the full address, then
this is maybe not a good idea.

~~~
dexterbt1
That's fast. Though yes, our use case involves reverse geo of full street
level addresses. Currently we do several 10s to 100s of req/s on nominatim.

The sibling thread asked about using OSM data; it'd be awesome if street level
OSM data is workable.

------
alexcroox
This is great, does anyone know of a js version? I'm currently using
[http://nominatim.openstreetmap.org/reverse](http://nominatim.openstreetmap.org/reverse)
in my Node app but I'd rather not rely on a 3rd party, especially under heavy
load.

~~~
thampiman
Check out this library for Node: [https://github.com/tomayac/local-reverse-geocoder](https://github.com/tomayac/local-reverse-geocoder)

~~~
tomayac
Thanks for posting this. Would appreciate pull requests and feature requests
or simply general feedback if you use this in practice.

------
thecodemonkey
This is super cool! Shameless plug. If you're looking for street-level reverse
(or forward) geocoding, we offer[1] a super affordable API and CSV upload
tool.

[1] [http://geocod.io](http://geocod.io)

------
kelukelugames
Hello, I read a little bit about geocoding on Wikipedia but was hoping to
learn more. Is there a good beginner guide on geocoders/geocoding?

