The biggest pain is that running your own OSM geocoder (Nominatim) is kind of a bear to set up. I imagine there are Docker images that could make things a bit easier and allow you to grab a region extract, but I could be wrong there.
On a side note, I was working on an open-source geocoder based on OSM that was easy to build, as part of a mapping util I've been working on (https://github.com/buckhx/diglet). I lost some traction on the geocoding side of things and have been focusing more on map building/serving, but I did get a stable version for US addresses. There are lots of interesting problems that come along with geocoding.
> The biggest pain in running your own OSM geocoder (nominatim) is kind of a bear to set up.
You said it shouldn't be scary, then immediately called it a bear. I am scared of bears.
I'm the founder of the OpenCage geocoder. We'll gladly work with anyone, large or small; we have clients doing millions of requests per day. We use OpenStreetMap but also other open geo datasets. https://geocoder.opencagedata.com
One thing our app needs is a way to get only street-address matches. Basically, I already know that an address is a street address, or at least a very small side road. It's not an entire county or city, and never a very long road like a highway. Google lets me filter out such bogus results by returning a "bounds" field in the results. Using this bounding box I can calculate the area and reject anything larger than about 200x200m, regardless of the type of feature returned (which is sometimes a "route" even though it's very small). I see your API returns a "bounds" in the results, but it's not documented. Is that what it's for?
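The bounds-based filter described above can be sketched in a few lines. This assumes a Google-style response shape (a `geometry.bounds` object with `northeast`/`southwest` points); the 200 m threshold and the `is_street_level` helper name are just illustrations:

```python
import math

EARTH_RADIUS_M = 6_371_000

def bounds_size_m(sw_lat, sw_lng, ne_lat, ne_lng):
    """Approximate width and height in meters of a lat/lng bounding box
    (equirectangular approximation, fine at street scale)."""
    mid_lat = math.radians((sw_lat + ne_lat) / 2)
    height = abs(math.radians(ne_lat - sw_lat)) * EARTH_RADIUS_M
    width = abs(math.radians(ne_lng - sw_lng)) * EARTH_RADIUS_M * math.cos(mid_lat)
    return width, height

def is_street_level(result, max_side_m=200):
    """Keep a geocoder result only if its bounds fit inside a
    max_side_m x max_side_m box (i.e., drop cities, counties, highways)."""
    bounds = result.get("geometry", {}).get("bounds")
    if bounds is None:
        return True  # point result with no bounds: nothing to reject it on
    w, h = bounds_size_m(
        bounds["southwest"]["lat"], bounds["southwest"]["lng"],
        bounds["northeast"]["lat"], bounds["northeast"]["lng"],
    )
    return w <= max_side_m and h <= max_side_m
```

A bounding box of 0.1 degrees on a side (roughly a city) fails the check, while one of 0.001 degrees (roughly a block) passes.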
I'm also looking for a good autocompleter. Basically, if my app is about Montana, then the user should be able to type "M" and get "Missoula" as the top hit within 50ms. Not Mississippi or Montreal or anything outside Montana. It should also be possible to misspell names, type partial street addresses and so on. Google's geocoding API is useless for this since it doesn't do fuzzy matching. They have an API called Google Places, but it has restrictions and doesn't seem designed exactly for this stuff.
Places API autocomplete can be limited to only return results of type 'geocode', which will give you the autocomplete of just geocodable locations, and was designed for this purpose.
Generally, the geocoder's substring matching has limited range, which is why it is fairly useless for autocompletion. Its partial matching seems limited to misspellings or nearly-complete terms. For example, I just tried typing "mead", and asked the API for 10 results. I got 5 results, all things like "Mead Lane" or "Meade Ave", but no "Meadowood". So it stopped at 5 even though there are more sharing that prefix. If I type "mea", I get just _one_ match, about some random place with "Mea" in the name. In other words, it's doing something else (something more sophisticated, but less appropriate for this use case) than mere prefix/substring matching.
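For contrast, plain prefix matching (what the "mead" → "Meadowood" use case actually wants) is straightforward over a sorted name list. This is a minimal sketch, not anything Google's API does; the place names are the ones from the example above:

```python
from bisect import bisect_left

def prefix_matches(query, names, limit=10):
    """Case-insensitive prefix matching over a list of place names.
    Returns up to `limit` names starting with `query`."""
    # Sort case-insensitively so the lowercased copy is also sorted.
    names = sorted(names, key=str.lower)
    lows = [n.lower() for n in names]
    q = query.lower()
    i = bisect_left(lows, q)  # first candidate >= query
    out = []
    while i < len(names) and lows[i].startswith(q) and len(out) < limit:
        out.append(names[i])
        i += 1
    return out
```

Unlike the fuzzy behavior described above, typing "mead" here returns everything sharing that prefix, including "Meadowood"; ranking within Montana first would be a separate scoring layer on top.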
Again, it's been a while, but from the last time I looked, I recall the Places API being better for this type of incremental autocompletion, but it had some issues related to geofencing that made it almost as bad.
We don't offer an autocomplete.
The advantage of OSM geocoding is that, unlike Google's geocoder, there are no terms of service barring you from using the data in a non-Google product or from storing the data for your own use and analysis.
Basically correct. However, the OpenStreetMap licence (ODbL) may apply to geocoded data, so it might not be as simple as "no terms and conditions".
(which for many regions will have much better coverage than OSM)
I looked at hosting OSM myself but it seemed like a lot of work. Huge data files for the initial import and setting up daily increment jobs. Glad to see a managed service emerging!
But yeah, for the occasional geocode, running your own instance is overkill.
We also felt more comfortable with MapBox given Google's history of sudden API-breaking changes...
Also, our requirements demanded very accurate geocoding. MapBox's and Google's geocoding results came out pretty much identical. Just about every other service, free or paid, was not accurate enough.
Our current config allows an import in 8 hours and responds within 20ms (not including network latency). It's not cheap though.
Just saw the company is Hyderabad based too, I always enjoy seeing new Indian startups on the radar!
Thanks for your wishes!
* Good free and premium paid service
* They've gone a bit above and beyond to provide examples/demos in all languages in their docs. Super easy for users
* Can process bulk/batch requests super fast in parallel on your behalf
* Have a super cool front-end CSV upload tool built in React so non-programmers can geocode data in seconds
* Forward/reverse geocoding
* Also give Census data, congressional districts, state legislative districts, and even school districts
Full disclosure: I know the husband-and-wife founders. Very nice people.
If your geocoding needs are enterprise-grade, or you are OK with spending a bit, you should look at Mapzen, OpenCage, and now Geocodio.
Are you publishing your source code?
We're using TIGER/Line and rooftop-level (through OpenAddresses) datasets under the hood. We don't use OSM, as it's generally not optimized for geocoding.
Of course OSM is not a great source of addresses either, as it simply doesn't have data for so many regions.
"I'm at 50.00000N, 15.00000E (a GPS coordinate). How do I get to the Foo Bar in Baz City (a text input or selection), by public transit (a choice of transport modes)?"
- Reverse geocode "bus stop or train stop or tram stop near 50 N, 15 E" - "there's a bus stop named Xyzzy at 50.0012 N, 15.0003 E"
- Geocode "Foo Bar, Baz City" - "51 N, 14 E"
- Reverse geocode "bus stop or train stop or tram stop near 51 N, 14 E" - "there's a train station named Baz City Central at 50.99998 N, 14.001 E"
(plus routing and scheduling on top of that - but that is beyond the scope of geocoding, which is one part of the toolchain)
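The forward and reverse steps above map directly onto the public Nominatim API's /search and /reverse endpoints. A minimal sketch that just builds the request URLs (the place names are the hypothetical ones from the example; `q`, `lat`, `lon`, and `format` are real Nominatim parameters):

```python
from urllib.parse import urlencode

BASE = "https://nominatim.openstreetmap.org"

def geocode_url(query, limit=1):
    """Forward geocode: free-text query -> Nominatim /search request URL."""
    return f"{BASE}/search?" + urlencode({"q": query, "format": "jsonv2", "limit": limit})

def reverse_url(lat, lon):
    """Reverse geocode: coordinate -> Nominatim /reverse request URL."""
    return f"{BASE}/reverse?" + urlencode({"lat": lat, "lon": lon, "format": "jsonv2"})
```

So step 2 of the workflow would fetch `geocode_url("Foo Bar, Baz City")`, take the lat/lon from the JSON response, and feed it into `reverse_url` (or a nearby-stops query) for step 3. Note the public instance has a usage policy (roughly one request per second), so anything heavier needs a self-hosted or commercial instance.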
Other example: "I'm at 50 N, 15 E; get me a list of restaurants around here" (optionally: non-smoking, currently open - not sure if Nominatim directly supports filtering like that)
So say you have a textual address and want to know where it is located. You feed it into a geocoder and it returns a location.
The same database can often be used to go the other direction, taking a location and returning the name and other details about a place.
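A toy illustration of that symmetry, using the made-up places from the thread's example (real geocoders index millions of entries with spatial data structures, not a flat dict, but the two directions really do share one table):

```python
# One table of name -> (lat, lon) serves both directions.
PLACES = {
    "Foo Bar": (51.0, 14.0),                        # hypothetical entries
    "Baz City Central Station": (50.99998, 14.001),
}

def geocode(name):
    """Forward: textual name -> coordinates (or None if unknown)."""
    return PLACES.get(name)

def reverse_geocode(lat, lon):
    """Reverse: coordinates -> nearest known place.
    Naive squared-degree distance scan; fine for a toy table."""
    return min(PLACES, key=lambda n: (PLACES[n][0] - lat) ** 2 + (PLACES[n][1] - lon) ** 2)
```

A production reverse geocoder replaces the linear scan with an R-tree or similar spatial index, and returns structured details (street, city, country) rather than just a name.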
By the way, (again, at least for the US), the best (fastest, very accurate) geocoder I've ever used was created by Alteryx. I've always been curious if it's actually their own geocoder, or they are using another service in the background. (edited to add: though of course, this is for Alteryx's proprietary system, and though it provides decent ways to get the data in/out, it's not simply a plug and play system if you're writing your own software.)
ESRI's is one of the worst; relatively slow, not all that accurate and worst of all (at least this was the case) it'll choke on anything over 300,000 records.
I can't speak to the opinions of others, but for me your question is a lot like asking "what's the best programming language?" The only realistic answer is that it depends on your task. We're continually facing new customer requirements, and what one customer thinks is absolutely essential, the next guy couldn't care less about.
A good example is speed. For some clients every millisecond is critical (imagine real time bidding systems), for others they are running a batch process to geocode their database in the middle of the night and couldn't care less if it takes one hour or two. Likewise huge differences in requirements in terms of accuracy. Some clients will accept only perfection, meanwhile the next guy intentionally wants a vague answer so that consumer privacy is maintained. Then obviously there are big differences across countries, forward and reverse, etc, etc. Some clients must have the attributes that using an open data source like OpenStreetMap allows, others care only about price.
So there is almost certainly a perfect answer for your specific geocoding needs, but there is no perfect geocoder.
And I get that different users have different needs, but I'm still curious about the accuracy (it's geography after all, I don't care how fast the results are returned if they're wrong.)
And especially given the multiple data sets that OpenCage uses, how do you know that you're returning the right results? (Obviously there is the spatial aspect, i.e., within 100 yards of the true location; but I'm most curious about the percentage of returned addresses with a greater than 90% probability of being the "correct" address.) I wouldn't expect it from most geocoder services, but that's what "ground truthing" is for. And what happens if you come across conflicting results when using the multiple data sets?
So again, all these new geocoders provide some nice services, but how are they measuring their accuracy of results? I could also shrink this question down to a business question, what makes your service better versus all the others? Who can prove to me that they provide the "best" (most accurate) results? (I'm not in the market, sorry, it's a hypothetical.)
Nevertheless, yes of course I get what you are asking. Fundamentally all geocoders rely on someone having verified the input data, be it a government surveyor, a car taking pictures that are then evaluated (by humans and/or image processing software) or an OpenStreetMap volunteer, etc. We are at the end of a long data chain and have to trust the inputs we get.
In my 20% time I'm working on a world map at 1:1 scale which will solve this problem, hoping to launch next quarter ....
And I've searched everywhere.
There are Nominatim, the Data Science Toolkit, and the Twofishes geocoders.
All of this is built on open geospatial data including OpenStreetMap, Yahoo! GeoPlanet, Natural Earth Data, Thematic Mapping, Ordnance Survey OpenSpace, Statistics New Zealand, Zillow, MaxMind, GeoNames, the US Census Bureau and Flickr's shapefiles plus a whole lot more besides. Here's the full list of datasources. https://geocoder.opencagedata.com/credits
I am about to use this for a project so thought I would recommend the find when I saw this post.
Today I use the MapQuest API, and it's been stable and fine for the 2+ years I've used it, and the search has always been very good, intuitively selecting the right entity... i.e. London, UK rather than London, OT, and finding the right lat:lon for pubs and not getting confused by other places with similar names.
Make sure to pay close attention. MapQuest recently announced that they are going to discontinue their direct access OSM tile offering:
This is part of an ongoing reorganization of their offerings; it wouldn't be surprising if they flipped the switch on their other open-data stuff too.
i.e. the London example I gave.
For instance, on locationiq.org:

"Department of Food Safety and Zoonoses (FOS) World Health Organization Avenue Appia 20"

does not work. Neither do variants like:

"Department of Food Safety and Zoonoses (FOS)"
"World Health Organization Avenue Appia 20"
"World Health Organization"
"Avenue Appia 20. CH-1211 Geneva 27"
Note that it places the statue in New Jersey, so it isn't properly handling the fact that Liberty Island is an exclave of New York (most likely a bug).