The goal is to collect address datasets so that forward and reverse geocoding is an easier problem to solve. A contributor wrote an excellent overview of the project the other day:
I would be surprised if a simple list of addresses, even a very large one, is something that could be subject to copyright.
I assume you don't have a manual procedure to report and correct individual addresses that are in error, and the trick would be to contact the agency that provided the erroneous dataset, is that right?
It would be useful to have a way to find out "where did the data for this location come from, and who do I contact to correct it?"
For example, our house doesn't have an outline on your sample map, although there is a red dot there. Our street address shows up on the house next door, and the neighboring houses in that direction are shifted similarly.
Most other online maps for our location have the same error or similar. I think the confusion happened because we are on a corner and some years ago a previous owner changed our street address from one street to the other. So whatever agency maintains this data goofed something up in the process.
If there were an easy way to find out who this mystery agency is, then I could help them sort out this mixup. :-)
(In the US the data might come from someone who accepts updates from someone else, but it is usually the state, county or city.)
First, longitude wraps from -180 to +180 at the antimeridian, meaning distance calculations will fail there; second, and I'd say more importantly, the length of one degree of longitude in meters differs a lot depending on latitude, meaning this library will be heavily biased towards longitudinal neighbors when it is used for locations far from the equator.
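To put rough numbers on both points, here is a minimal sketch. It assumes a spherical Earth with a mean radius of 6371 km, and the function names are made up for illustration; they are not part of the library being discussed.

```python
import math

EARTH_RADIUS_M = 6371000.0  # assumption: spherical Earth, mean radius in meters


def lon_degree_length_m(lat_deg):
    """Approximate length in meters of one degree of longitude at a given latitude."""
    return math.radians(1.0) * EARTH_RADIUS_M * math.cos(math.radians(lat_deg))


def wrapped_lon_diff(lon1, lon2):
    """Smallest signed difference lon2 - lon1 in degrees, in the range [-180, 180)."""
    return (lon2 - lon1 + 180.0) % 360.0 - 180.0


# One degree of longitude shrinks with cos(latitude).
for lat in (0, 45, 60, 80):
    print("lat %2d: one degree of longitude is about %.0f m" % (lat, lon_degree_length_m(lat)))

# Two points on either side of the antimeridian are 1 degree apart, not 359.
print(wrapped_lon_diff(179.5, -179.5))  # 1.0
```

Comparing raw degree differences ignores both effects, which is why a nearest-neighbor search on plain (lat, lng) pairs gets worse the further you are from the equator.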
There is a Python implementation available as well: http://pypi.python.org/pypi/geographiclib
import math
from collections import namedtuple


def haversine_distance(origin, destination):
    """Haversine formula to calculate the distance between two lat/long points on a sphere."""
    radius = 6371  # FAA approved globe radius in km
    dlat = math.radians(destination.lat - origin.lat)
    dlon = math.radians(destination.lng - origin.lng)
    a = math.sin(dlat / 2) * math.sin(dlat / 2) + math.cos(math.radians(origin.lat)) \
        * math.cos(math.radians(destination.lat)) * math.sin(dlon / 2) * math.sin(dlon / 2)
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
    d = radius * c
    # Return distance in km
    return d


LatLng = namedtuple('LatLng', 'lat, lng')
origin = LatLng(51.507222, -0.1275)  # London
destination = LatLng(37.966667, 23.716667)  # Athens
# The parenthesized form works on both Python 2 and Python 3
print("Distance (km): %d" % haversine_distance(origin, destination))
One question - is it OK to put an MIT license on something that is based on LGPL code? I don't know enough about how the LGPL works (I do know it is less "infective" than plain GPL).
Well, two questions: Python 2 or Python 3?
Good question regarding the license. I'm not too sure about that. I'd appreciate it if someone could shed some light on it.
Regarding the version, I've only tested it on Python 2. I should add that to the README. Thanks!
Beyond that, everything regarding licenses looks good if you ask me. MIT is compatible with the LGPL both if you modify the LGPL code and if you link with it, so one can use either of the two licenses.
(Regular expressions don't work well for this. Neither does starting from the beginning of the address. Proper address parsing starts at the end of the address and works backwards, with the information found near the end, such as country name and postal code, used to disambiguate the information found earlier.)
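To illustrate the "work backwards from the end" idea, here is a toy sketch. It assumes comma-separated US-style addresses; the lookup tables and the function name are hypothetical, and a real parser needs far larger tables and many more rules than this.

```python
# Hypothetical lookup tables, deliberately tiny and for illustration only.
COUNTRIES = {"USA", "US", "UNITED STATES"}
US_STATES = {"CA", "NY", "TX", "WA"}


def parse_us_address_backwards(address):
    """Toy parser: consume parts from the END of the address and work backwards."""
    parts = [p.strip() for p in address.split(",") if p.strip()]
    result = {}

    # 1. The last part may be a country name.
    if parts and parts[-1].upper() in COUNTRIES:
        result["country"] = parts.pop()

    # 2. The next part from the end may end in a postal code and/or a state.
    if parts:
        tokens = parts[-1].split()
        if tokens and tokens[-1].replace("-", "").isdigit():
            result["postal_code"] = tokens.pop()
        if tokens and tokens[-1].upper() in US_STATES:
            result["state"] = tokens.pop().upper()
        if tokens:
            parts[-1] = " ".join(tokens)  # something else is left, e.g. a city
        else:
            parts.pop()  # the whole part was state and/or postal code

    # 3. With the tail consumed, the next part from the end is the city.
    if len(parts) >= 2:
        result["city"] = parts.pop()

    # 4. Whatever remains at the front is treated as the street line.
    if parts:
        result["street"] = ", ".join(parts)
    return result


print(parse_us_address_backwards(
    "1600 Amphitheatre Parkway, Mountain View, CA 94043, USA"))
```

Knowing the country and postal code first is what lets you pick the right rules for the city and street parts, which is much harder when reading left to right.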
Though as the name suggests, it's only trained for US addresses.
I wrote a quick (500k lookups/sec) offline geocoder for Ruby: https://github.com/bronson/geolocal to comply with the silly EU cookie rules. It precompiles the statements you're interested in:
Would it be possible to use OpenStreetMap data?
(I work on an OSM geocoder; it's not offline, but it has a Python library: http://geocoder.opencagedata.com/)
The sibling thread asked about using OSM data; it'd be awesome if street-level OSM data were workable.
 - https://github.com/pelias
edit: We are only interested in knowing the nearest big city though
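For the "nearest big city" case, a brute-force search over a city table may already be enough. This is a minimal sketch, assuming a small hand-picked table: the city list and function names are made up for illustration, and a real table would come from one of the datasets mentioned in this thread.

```python
import math

# Hypothetical sample table: name -> (lat, lng) in degrees.
CITIES = {
    "London": (51.507222, -0.1275),
    "Athens": (37.966667, 23.716667),
    "Paris": (48.8566, 2.3522),
    "Berlin": (52.52, 13.405),
}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km on a sphere of radius 6371 km."""
    dlat = math.radians(lat2 - lat1)
    dlon = math.radians(lon2 - lon1)
    a = (math.sin(dlat / 2) ** 2
         + math.cos(math.radians(lat1)) * math.cos(math.radians(lat2))
         * math.sin(dlon / 2) ** 2)
    a = min(1.0, a)  # guard against floating-point overshoot
    return 6371.0 * 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))


def nearest_city(lat, lon, cities=CITIES):
    """Return the name of the city in the table closest to (lat, lon)."""
    return min(cities, key=lambda name: haversine_km(lat, lon, *cities[name]))


print(nearest_city(50.85, 4.35))  # a point in Brussels
```

This is linear in the number of cities per lookup, which is fine for a few thousand big cities; a spatial index would only be worth it for much larger tables or very high lookup rates.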