Currently, the company I work for uses Google for geocoding. We do about 1.1 million geocodes a day, which ends up costing around $5k (22%) more per year than these folks... but! It includes international geocoding, Google Maps, etc.
Simply using census data to geocode US addresses is easy, and there are directions on how to do it here in the comments... but setting up Nominatim (from OpenStreetMap) is a serious amount of effort (and not cheap, given the 32GB server it needs), though it /is/ capable of global-level geocoding.
One great use case for this service, though: displaying results on Mapbox, which is currently forbidden by Google's TOS...
While I'm stoked to see competition in this space, I wish the competition were a bit more robust (but everyone has to start somewhere, right?).
I hope you all continue forward with this, and hopefully add international capabilities as well as price drops. I for one would do away with your free offer altogether, since the free users' ROI will probably always be an expensive crap-fest, and allocate those resources to driving the price down for your paying customers.
If/when you all can do ~1.1mil international geocodes per day for less than $10k a year, LET ME KNOW! :)
Building off of what's above, Geocodio is intended to be accessible to developers who don't have $10k to drop on geocodes. We found that this is a big need in the community (and for ourselves, for our other projects). All of the other non-major-mapping geocoding services we found, including ones offering CSV upload instead of an API, were more expensive than $0.001 each (oftentimes much more -- $0.25+).
Also, we don't have limitations on how you use the data. No requirements that you use a specific brand of map with it, no attribution requirements, etc.
We priced it at this point, with a free tier, so that people can give it a try first. No, our data isn't quite as good as Google's -- we get about 90% of addresses within 1 mile, and most within a tenth of a mile -- and we want people to be able to play around with the service and get to know it before they have to give credit card info.
With that said, we definitely plan to continue improving the product and add international support.
PS. We are HUGE fans of Mapbox, so we're pretty excited that you listed that as a potential use case :)
As mentioned in our FAQ, we do indeed provide special pricing and capacity for high-volume users; we would definitely be able to beat Google's pricing by far.
How does the accuracy (as well as address parsing capabilities) compare to the completely free solutions such as Nominatim or DSTK?
Both services provide capabilities for local installations, obviously with no query limits and minimal latency.
We have mostly been running tests against the Google Maps API, and in a totally random sample of 100 addresses, 90 of them were within a mile of the location the Google Maps API returned (most of them were actually within 0.01 miles).
I'm not sure how we would compare to OpenStreetMap and Data Science Toolkit, since our data source is different (US Census Bureau). But the obvious reason we provide this as a SaaS is that you don't have to host anything yourself or juggle gigabytes of boundary data. We handle all the mess.
We ended up using that as a base and then making some customizations for our US-based geocoding solution. As these guys are figuring out, there's no great int'l option. Google is bad from a licensing perspective (but their tech is fantastic). MapQuest is great but can get really expensive. We've had decent luck with TomTom I think, but if I remember correctly there are a lot of caveats.
I rewrote the geocommons geocoder in Java to speed up the loading and geocoding process, and wrapped a REST api around it. I used a minimal perfect hash function to map zips/streets (metaphone3'd and ngramfingerprint'd) to data stored in a key-value structure. The key-value structure is small enough to fit in memory of a decent sized EC2 instance, but I haven't tested the throughput except from a slow disk--which got me about 100-150 results/sec.
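As a rough illustration of the lookup scheme described above (a Python sketch, not the actual Java implementation): a plain dict stands in for the minimal perfect hash, and a simple n-gram fingerprint stands in for the metaphone3 + fingerprint normalization step.

```python
def ngram_fingerprint(text, n=2):
    # Collapse case, punctuation, and spacing, then key on the sorted,
    # de-duplicated set of n-grams so equivalent spellings normalize to
    # the same string. (The original also ran metaphone3 first.)
    s = "".join(c for c in text.lower() if c.isalnum())
    return "".join(sorted({s[i:i + n] for i in range(len(s) - n + 1)}))

# A plain dict stands in for the minimal perfect hash structure;
# each key combines a ZIP code with a fingerprinted street name.
index = {}

def put(zipcode, street, record):
    index[(zipcode, ngram_fingerprint(street))] = record

def lookup(zipcode, street):
    return index.get((zipcode, ngram_fingerprint(street)))

put("02139", "Massachusetts Ave", {"lat": 42.3634, "lng": -71.0969})
```

Case, punctuation, and spacing variations normalize to the same key; a real minimal perfect hash would avoid storing the keys at all, which is part of what keeps the structure small enough to fit in memory.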
The results include parsed address, lat/lng in WGS84 datum, and associated US census region info (state, county, block group, block, msa, cbsa/csa, school district, legislative district, etc.).
I'd considered open sourcing it, and I was trying to architect it such that one could plug in various data sources beyond TIGER when higher-accuracy info is available (e.g., from SF's address parcels, Massachusetts has lots of E911 parcel data available, etc).
The state of the art in open-source geocoders would be TwoFishes: https://github.com/foursquare/twofishes, written in Scala and developed and used by Foursquare.
I also wrote a primer explaining the basic geocoding ideas:
The biggest problem we've had is turning non-well-formed or ambiguous addresses into canonical addresses with lat/lng. Google Maps wins on that front.
Setting it up was quite a pain because they don't use semantic HTTP status codes, and I had to experiment a lot to handle their undocumented error codes (they store them inside body.info.statuscode). Good to read that you return semantic HTTP codes.
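For anyone hitting the same issue, defensive handling looks roughly like this. The field names follow the comment's description of that provider (status buried in `body.info.statuscode`) and are assumptions, not a documented contract:

```python
def check_geocode_response(body):
    # Some geocoding APIs return HTTP 200 for every request and bury the
    # real status inside the JSON body. We assume 0 means success, which
    # is a common convention for such in-body status codes.
    status = body.get("info", {}).get("statuscode")
    if status == 0:
        return body.get("results", [])
    raise RuntimeError(f"geocoding failed with in-body status {status}")
```

With semantic HTTP codes, this whole layer collapses into a single `response.raise_for_status()`-style check.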
If you want to differentiate from the competition, I would suggest improving the address parsing and supporting more patterns. Think of us having to geocode user-typed location fields from Twitter. Enjoy it :)
Would actually be neat though for that to be a quick demo of your software.
When people read "$0.001 each" they sometimes understand it to be one thousandth of a cent rather than one thousandth of a dollar.
Even though you are completely correct/accurate, people find it confusing (1).
Wouldn't it be clearer to say "1 cent for every ten uses" (or "10 calls for a cent" or "a tenth of a penny per call")?
Admittedly, your audience is semi-technical and should parse it correctly, but why not simplify it?
So, if, as you admit, the service's audience is semi-technical, and if the average semi-technical person's brain works like mine (big assumption, I know), I would argue that they should stick with $0.001.
- http://geocod.io/contact/ says DC but shows me a map centered somewhere south of Topeka.
- Random $0.02 suggestion: stop using "ridiculous".
Are you close to Topeka?
+1 on the versioned API endpoint... when we released ours nearly 8 years ago, versioning APIs wasn't really a thing yet. We're paying that technical debt off now as we vigorously rewrite and improve our service.
Quick feedback: Links on the FAQ page are hard to distinguish from regular text.
Good luck with the project!
We'll update the FAQ links, thanks!
1) Most services will accept shortcuts for names, like "SF" for San Francisco or "NYC" for New York, but in both cases I got error messages instead of geocodes.
2) Addresses that aren't "properly" formatted (i.e., without commas or something) often return very incorrect information. Here's an example:
2680 NW 8th Pl, Fort Lauderdale, FL 33311 - returns correct info
2680 NW 8th Pl Fort Lauderdale FL 33311 - returns incorrect info (see suffix, formatted_address)
For what it's worth, SmartyStreets mangles even the first address that you got correct, but on the other hand, they're very good at correctly returning data for improperly formatted addresses like the second one.
Anyway, good luck. Great tool.
Our address parser will try to pick up the address even if it isn't formatted correctly with commas, but it obviously won't work in all cases. Address parsing is indeed a very complex problem.
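To illustrate why comma-less parsing is tractable but fragile, here's a toy sketch (hypothetical, not Geocodio's actual parser) that anchors on the trailing state/ZIP pair and treats everything before it as one blob:

```python
import re

# Anchor on the trailing "STATE ZIP" pair. Without a place-name database
# we can't reliably split street from city, so they stay together; a
# directional like "NW" can also false-match if a 5-digit token follows.
ADDR_RE = re.compile(
    r"^(?P<street_and_city>.+?)[,\s]+"
    r"(?P<state>[A-Z]{2})[,\s]+"
    r"(?P<zip>\d{5})(?:-\d{4})?$"
)

def parse(address):
    m = ADDR_RE.match(address.strip())
    return m.groupdict() if m else None
```

Both the comma-separated and comma-less forms of the example address above yield the same state and ZIP, which is what makes the looser input recoverable at all.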
If you can start with a list of all the following, you've got a great start:
Add to that all the possible misspellings, and then factor in Levenshtein and Soundex to account for misspellings you didn't know about, and you've got a pretty dang good address parser. Figure out how to do that lickety-split fast, and you've got gold.
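For reference, the two fuzzy-matching primitives mentioned are small enough to sketch in pure Python (the Soundex here is the simplified variant, ignoring the H/W adjacency rule):

```python
def levenshtein(a, b):
    # Classic edit distance via dynamic programming, one row at a time.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

# American Soundex digit groups.
CODES = {**dict.fromkeys("BFPV", "1"), **dict.fromkeys("CGJKQSXZ", "2"),
         **dict.fromkeys("DT", "3"), "L": "4",
         **dict.fromkeys("MN", "5"), "R": "6"}

def soundex(word):
    # First letter + up to three digits, collapsing adjacent duplicates.
    word = word.upper()
    digits = [CODES.get(c, "") for c in word]
    out, prev = [word[0]], digits[0]
    for d in digits[1:]:
        if d and d != prev:
            out.append(d)
        prev = d
    return ("".join(out) + "000")[:4]
```

Soundex buys you a cheap blocking key ("Robert" and "Rupert" land in the same bucket), and Levenshtein ranks the candidates within a bucket; the hard part, as noted, is making that fast over millions of street names.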
People, this is a lesson: if you post a "Show HN", be ready to respond to people's questions and comments. Posting and then going silent for hours sends a bad message to the people you want using your service. It says you haven't thought enough about your level of service.
Kudos to GeoCod.io.
I wrote you guys a Ruby client: https://github.com/davidcelis/geocodio
The code's maybe a bit rough, but it's worked in my limited usage. Maybe you can take it for a test run before I push version 1.0.0 to RubyGems?
As for the pricing, we are indeed much cheaper than Google's geocoding offerings (given the nature of our product). If you are looking to do a high volume of geocoding requests, just contact us and we'll work out a pricing model for you.
Version 1.0.0 is on RubyGems now, by the way!
* query both services for each address
* if the [lat,lon] are equal (within a threshold), store Joe's result as correct
* store nothing otherwise
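The agreement check in those steps can be sketched like this: haversine distance between the two results, with a mile-scale threshold (the names and the default threshold are illustrative, not from either provider):

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_MILES = 3958.8

def haversine_miles(a, b):
    # Great-circle distance between two (lat, lng) pairs, in miles.
    lat1, lng1, lat2, lng2 = map(radians, (*a, *b))
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lng2 - lng1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(h))

def reconcile(cheap_result, reference_result, threshold_miles=0.1):
    # Keep the cheaper provider's coordinates only when they agree with
    # the reference provider within the threshold; otherwise store nothing.
    if haversine_miles(cheap_result, reference_result) <= threshold_miles:
        return cheap_result
    return None
```

Storing only the agreeing result sidesteps the question of caching the reference provider's data, which is the TOS concern raised below.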
Are you storing Google's results in that case?
Also great is Pete Warden's http://www.datasciencetoolkit.org/
Street Address to Coordinates: calculates the latitude/longitude coordinates for a postal address.
Currently only the US and UK have street-level detail.
Google-style Geocoder: Are you currently using Google's geocoding API and want to switch? Replace maps.googleapis.com with the address of a DSTK server and your code should work without changes.
Free to use, also available as a (free) self-hostable VM.
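Per the DSTK description above, the drop-in swap amounts to changing the base URL. A minimal sketch (the DSTK host shown is its public instance; swap in your own VM's address for self-hosting, and note the `sensor` parameter reflects the Google API of that era):

```python
from urllib.parse import urlencode

GOOGLE_BASE = "https://maps.googleapis.com"
DSTK_BASE = "http://www.datasciencetoolkit.org"  # or your self-hosted VM

def geocode_url(address, base=DSTK_BASE):
    # Build a Google-style geocoding request; swapping `base` between
    # the two hosts is the only change the DSTK docs call for.
    return f"{base}/maps/api/geocode/json?" + urlencode(
        {"address": address, "sensor": "false"})
```

Existing Google-geocoder client code should then work unchanged against the new host.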
edit: signed up. Does not work outside the US. Why not document that?
And apologies that we didn't make the US-only part more prominent before. We've added it to the front page and moved it to the top of the FAQ.
For others looking for a solution you can play with yourself, here's a VM image with a pretty good geocoder you can set up yourself (iffy address parsing, though):
TIGER is a pretty bad starting point; geocoding based on block faces is really inaccurate once you zoom in to the street level. And it's US-only.
OSM Nominatim should be a better place to start.
I'd love to see open sourced Street View data collection / processing as part of the OSM project. Then there is a chance to compete with Google.
Thereby spreading the bulk cost of an API license across your customers, each of whom pays a significantly smaller amount, with the total adding up to a profit?
We had millions of them though, so maybe an API isn't really the way to go.
If you guys can do the same without the rate limiting restrictions they place on us, we'd switch over in a heartbeat.
Note to self: code back-end API consumers with Interfaces and drivers instead of hardcoding API calls.
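That note-to-self might look like the following sketch: an abstract interface the app codes against, with provider-specific drivers behind it (the class and method names are made up for illustration):

```python
from abc import ABC, abstractmethod

class Geocoder(ABC):
    # The interface the rest of the app codes against; concrete drivers
    # wrap specific providers, so swapping vendors is a one-line change.
    @abstractmethod
    def geocode(self, address: str) -> tuple:
        ...

class GeocodioDriver(Geocoder):
    def __init__(self, api_key):
        self.api_key = api_key

    def geocode(self, address):
        # A real driver would call the provider's HTTP API here.
        raise NotImplementedError

class StubDriver(Geocoder):
    # Fixed responses for tests -- another payoff of the interface.
    def __init__(self, table):
        self.table = table

    def geocode(self, address):
        return self.table[address]

def tag_with_location(record, geocoder: Geocoder):
    lat, lng = geocoder.geocode(record["address"])
    return {**record, "lat": lat, "lng": lng}
```

Application code only ever sees `Geocoder`, so a rate-limited or discontinued provider can be replaced by writing one new driver.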
Most of the time users will only care about the results, so you'll be sending useless data.
What does the "accuracy" value in the return mean? Maybe I am missing something but I don't see it in the FAQ or docs.