If you're interested in making both forward and reverse geocoding better, please consider paying attention to a project I started and help maintain called OpenAddresses:
The goal is to collect address datasets so that forward and reverse geocoding is an easier problem to solve. A contributor wrote an excellent overview of the project the other day:
It's a nice overview but it glosses over the fact that while the OA database is governed by a CC0 license, the individual address collections in the database are still governed by their own licenses which can be (much) more restrictive. The fact that you can download the data doesn't mean you can use it the way you want. The OA web site hints at this but doesn't address (ah-hah) that underlying problem. That doesn't mean that OA is not valuable - quite the contrary - but I think the fact that it's presented as one big free and open dataset can be misleading.
As the (dead) sibling comment points out, my (non-lawyer) understanding of US copyright law is that simple collections of facts, when compiled in a way that requires no creativity, do not enjoy any copyright protection at all.
I would be surprised if a simple list of addresses, even a very large one, is something that could be subject to copyright.
I assume you don't have a manual procedure to report and correct individual addresses that are in error, and the trick would be to contact the agency that provided the erroneous dataset, is that right?
It would be useful to have a way to find out "where did the data for this location come from, and who do I contact to correct it?"
For example, our house doesn't have an outline on your sample map, although there is a red dot there. Our street address shows up on the house next door, and the neighboring houses in that direction are shifted similarly.
Most other online maps for our location have the same error or similar. I think the confusion happened because we are on a corner and some years ago a previous owner changed our street address from one street to the other. So whatever agency maintains this data goofed something up in the process.
If there were an easy way to find out who this mystery agency is, then I could help them sort out this mixup. :-)
You can't really use KD trees with lat/lon coordinates, at least you can't use euclidean distance there for nearest neighbor search.
First, longitude wraps from -180 to +180 at the antimeridian, meaning distance calculations will fail there; second, and I'd say more importantly, the length in meters of one degree of longitude differs a lot depending on latitude, meaning this library will be heavily biased towards longitudinal neighbors when using it for locations far from the equator.
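To make both problems concrete, here is a minimal sketch, assuming a spherical Earth and using only the Python standard library. It is not taken from the library under discussion; the function names are made up for illustration. One common workaround is to map each lat/lon pair to a 3D point on the unit sphere. Straight-line (chord) distance between those points grows monotonically with great-circle distance, so Euclidean nearest-neighbor search on the 3D points, including in a KD tree, ranks candidates correctly.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius; spherical approximation (assumption)

def to_unit_vector(lat_deg, lon_deg):
    """Map latitude/longitude in degrees to a 3D point on the unit sphere."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))

def great_circle_m(p, q):
    """Great-circle distance in meters, recovered from the 3D chord length."""
    chord = math.dist(to_unit_vector(*p), to_unit_vector(*q))
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, chord / 2))

def naive_degrees(p, q):
    """Euclidean distance on raw (lat, lon) degrees: the problematic metric."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def nearest(query, candidates, metric):
    """Brute-force nearest neighbor; a KD tree over the 3D points ranks the same way."""
    return min(candidates, key=lambda c: metric(query, c))

# At 60 degrees north, a degree of longitude is about half as long as a degree of latitude.
query = (60.0, 0.0)
east = (60.0, 1.5)   # 1.5 degrees to the east, roughly 83 km away on the ground
north = (61.0, 0.0)  # 1 degree to the north, roughly 111 km away on the ground

print(nearest(query, [east, north], naive_degrees))   # picks the point to the north
print(nearest(query, [east, north], great_circle_m))  # picks the point to the east

# Two points 0.2 degrees apart on either side of the antimeridian.
a, b = (0.0, 179.9), (0.0, -179.9)
print(naive_degrees(a, b))   # nearly 360 "degrees" apart
print(great_circle_m(a, b))  # a few tens of kilometers
```

The brute-force `nearest` is only there to keep the sketch self-contained; the point is the choice of metric, not the search structure.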
Kudos for a very well done README (and it's not just cribbed from the original project, it explains the new stuff very well and tells what the project is, and gives credit back). So many projects neglect the README.
One question - is it OK to put an MIT license on something that is based on LGPL code? I don't know enough about how the LGPL works (I do know it is less "infective" than plain GPL).
If it's based on LGPL code, then any original code needs to retain its license.
Beyond that, everything regarding licenses looks good if you ask me. MIT is compatible both if you modify the LGPL code and if you link with it, so one can use either of the two licenses.
Hi, I am the author of the original library, which uses the LGPL license. My reason for using LGPL was so people would be obligated to share their modifications, so I would expect this is not compatible with MIT.
While we're on this subject, is there a good, free street address parser that will work for at least the US, Canada, UK, and the major EU countries? I've tried most of the available ones, and they can parse about 90-95% of business addresses.
(Regular expressions don't work well for this. Neither does starting from the beginning of the address. Proper address parsing starts at the end of the address and works backwards, with the information found near the end, such as country name and postal code, used to disambiguate the information found earlier.)
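A toy sketch of the backwards approach, to illustrate the idea rather than to serve as a usable parser. The lookup tables, the `parse_from_end` name, and the comma-separated input format are all assumptions made for this example. The country is read first from the end, and it then decides which postal code pattern is tried on the preceding component.

```python
import re

# Hypothetical, deliberately tiny lookup tables for illustration only.
COUNTRIES = {
    "usa": "US", "united states": "US",
    "canada": "CA",
    "uk": "GB", "united kingdom": "GB",
}
POSTAL_PATTERNS = {
    "US": re.compile(r"\b\d{5}(?:-\d{4})?$"),
    "CA": re.compile(r"\b[A-Z]\d[A-Z] ?\d[A-Z]\d$", re.I),
    "GB": re.compile(r"\b[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}$", re.I),
}

def parse_from_end(address, default_country="US"):
    """Parse a comma-separated address, consuming components from the end."""
    parts = [p.strip() for p in address.split(",") if p.strip()]
    result = {"country": default_country, "postal_code": None}

    # Step 1: the last component may name the country.
    if parts and parts[-1].lower() in COUNTRIES:
        result["country"] = COUNTRIES[parts.pop().lower()]

    # Step 2: the country selects the postal code pattern for the new last component.
    if parts:
        match = POSTAL_PATTERNS[result["country"]].search(parts[-1])
        if match:
            result["postal_code"] = match.group(0)
            remainder = parts[-1][:match.start()].strip()
            if remainder:
                parts[-1] = remainder
            else:
                parts.pop()

    # Step 3: whatever is left is the street line followed by locality components.
    result["street"] = parts[0] if parts else None
    result["locality"] = parts[1:]
    return result

print(parse_from_end("1600 Amphitheatre Parkway, Mountain View, CA 94043, USA"))
print(parse_from_end("10 Downing Street, London SW1A 2AA, UK"))
```

Real addresses are far messier than this (missing commas, unit numbers, localized country names), which is where the remaining 5-10% tends to come from.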
The most principled approach I've seen on this is at https://github.com/datamade/usaddress. They use tagged training data and conditional random fields. I haven't seen comparisons with other systems, but it's worked well enough for my projects.
Though as the name suggests, it's only trained for US addresses.
Have you tried Nominatim? I've found it to work for most of the above. I'm only disappointed it doesn't work well for developing countries, particularly a lot of Asia.
Very impressive, I'll be looking closer at K-D trees.
I wrote a quick (500k lookups/sec) offline geocoder for Ruby: https://github.com/bronson/geolocal to comply with the silly EU cookie rules. It precompiles the statements you're interested in:
OSM data doesn't contain an easy way to find the top 1000 cities. You'd end up with hundreds of thousands. Looking for Wikipedia tags, population (which often comes from Wikipedia) and 'admin' tags might be a good guide.
On a related note: An efficient geolocation encoder/decoder with error correction using Reed-Solomon. 3m accuracy with error correction in 10 symbols. 20mm accuracy with 5-nines certainty in 15 symbols:
I should think so. I've tried Nominatim/OSM data but it took forever to query a large set of coordinates. I was only interested in knowing the nearest city and admin regions 1/2. And this library is really fast... ~20s to look up 10M coordinates on my MBP. However, if you'd like to know the full address, then this is maybe not a good idea.
For reverse geocoding only, as it does not seem to do geocoding proper. Pelias[1] might be a better alternative once they have simplified the install process. I had to "reverse engineer" (aka read and understand) their Chef cookbook, as Vagrant was not an option for me. Not that complicated, but time-consuming when you don't know Chef.
This is great, does anyone know of a js version? I'm currently using http://nominatim.openstreetmap.org/reverse in my Node app but I'd rather not rely on a 3rd party, especially under heavy load.
This is super cool! Shameless plug. If you're looking for street-level reverse (or forward) geocoding, we offer[1] a super affordable API and CSV upload tool.
http://openaddresses.io
The goal is to collect address datasets so that forward and reverse geocoding is an easier problem to solve. A contributor wrote an excellent overview of the project the other day:
https://medium.com/colemanm/creating-an-open-database-of-add...