Show HN: Ridiculously cheap bulk geocoding (geocod.io)
224 points by thecodemonkey on Jan 21, 2014 | 101 comments

Just for those curious, it's not that cheap actually compared to Google's enterprise level Geocoding. Nor, I'm guessing, is it able to geocode internationally. In which case you might as well use Mapquest, as it's completely free.

Currently, the company I work for uses Google for geocoding and we have 1.1mil a day which ends up costing around $5k (22%) per year more than these folks... but! It includes international geocoding, google maps, etc.

Simply using census data to geocode US addresses is easy, and there are directions for how to do it here in the comments... but setting up Nominatim (from OpenStreetMap) is a serious amount of effort (and not cheap for a 32GB server), though it /is/ capable of global level geocoding.

One great use case for this service though: using mapbox, which is currently forbidden by Google's TOS...

While I'm stoked to see competition in this space, I wish the competition was a bit more robust (but everyone has gotta start somewhere, right?)

I hope you all continue forward with this, and hopefully add international capabilities as well as price drops. I for one would do away with your free offer altogether, as the free users' ROI will probably always be an expensive crap-fest, and allocate those resources to driving the price down for your paying customers.

If/when you all can do ~1.1mil international geocodes per day for less than $10k a year, LET ME KNOW! :)

Other founder of Geocodio here -- thanks for your feedback! You bring up a lot of good points.

Building off of what's above, Geocodio is intended to be accessible to developers who don't have $10k to drop on geocodes. We found that this is a big need in the community (and for ourselves for our other projects). All of the other non-major-mapping geocoding services we found, including CSV upload instead of API, were more expensive than $0.001/each (oftentimes much more -- $0.25+)

Also, we don't have limitations to how you use the data. No requirements that you use a specific brand of map with it, no attribution requirements, etc.

We priced it at this point, with a free tier, so that people can give it a try first. No, our data isn't quite as good as Google's -- we get about 90% of addresses within 1 mile, and most within a tenth of a mile -- and we want people to be able to play around with the service and get to know it before they have to give credit card info.

With that said, we definitely plan to continue improving the product and add international support.

PS. We are HUGE fans of Mapbox, so we're pretty excited that you listed that as a potential use case :)

Thanks for the feedback, it's really valuable! The idea behind Geocodio is definitely to save you the hassle of building and hosting a dataset yourself, which is indeed a very time-consuming process. Note that MapQuest still has a limit of 5k requests per day [1], making this a viable alternative.

As mentioned in our FAQ [2], we do indeed provide special pricing and capacity for high-volume users, and we would definitely be able to match or beat Google's pricing.

[1] http://developer.mapquest.com/web/products/dev-services/geoc...

[2] http://geocod.io/faq/

Mapquest is free, but only up to 5k per day, right?

Neat website! Very clean, simple pricing. Thank you for batch geocoding to minimize network traffic...

How does the accuracy (as well as address parsing capabilities) compare to the completely free solutions such as Nominatim[1] or DSTK[2]?

Both services provide capabilities for local installations, obviously with no query limits and minimal latency.

[1] http://wiki.openstreetmap.org/wiki/Nominatim

[2] http://www.datasciencetoolkit.org/

Thanks! Yes, I haven't really seen a lot of other services that provide batch geocoding as an API endpoint.

We have mostly been running tests against the Google Maps API. In a random sample of 100 addresses, 90 were within a mile of the location returned by the Google Maps API (most of them were actually within 0.01 mile).

I'm not sure how we would compare to OpenStreetMap and Data Science Toolkit, since our data source is different (US Census Bureau). But the obvious reason why we provide this as a SaaS is that you don't have to host anything yourself or juggle gigabytes of boundary data. We handle all the mess.

They both pull in Tiger data + other sources (so the expected outcome would be that they are more complete in some areas).

For those looking to roll your own, the Ruby implementation of a TIGER geocoder released by GeoIQ a while back is a pretty solid starting point: https://github.com/geocommons/geocoder/

We ended up using that as a base and then making some customizations for our US-based geocoding solution. As these guys are figuring out, there's no great int'l option. Google is bad from a licensing perspective (but their tech is fantastic). MapQuest is great but can get really expensive. We've had decent luck with TomTom I think, but if I remember correctly there are a lot of caveats.

I had the need to geocode 10s to 100s of thousands of US addresses weekly, with the ability to accept slightly-reduced accuracy vs. the parcel-level accuracy of Google Maps.

I rewrote the geocommons geocoder in Java to speed up the loading and geocoding process, and wrapped a REST api around it. I used a minimal perfect hash function to map zips/streets (metaphone3'd and ngramfingerprint'd) to data stored in a key-value structure. The key-value structure is small enough to fit in memory of a decent sized EC2 instance, but I haven't tested the throughput except from a slow disk--which got me about 100-150 results/sec.

The results include parsed address, lat/lng in WGS84 datum, and associated US census region info (state, county, block group, block, msa, cbsa/csa, school district, legislative district, etc.).

I'd considered open sourcing it, and I was trying to architect it such that one could plug in various data sources beyond TIGER when higher-accuracy info is available (e.g., from SF's address parcels, Massachusetts has lots of E911 parcel data available, etc).

That one is a very smart geocoder, and I have contributed to it. It is a bit old now and not that easy to get started with.

The state of the art in open source geocoders would be TwoFishes: https://github.com/foursquare/twofishes. It is written in Scala and developed and used by Foursquare.

Sounds like TwoFishes doesn't do addresses, only city names. That puts it in a different category of product.

That's a pretty nice starting point, but unfortunately the code base hasn't been updated for years and the data import process is extremely time consuming [1]. That said, rolling your own geocoding solution is the most restriction free service you can get. Just be prepared for the maintenance and the time consuming set up.

[1] https://github.com/geocommons/geocoder/wiki/Installation-Ins...

Seconding the usefulness of that service - has worked well for me before now. Would be interested to know if this service is using it.

I've been doing freelance geocoding gigs for a couple of institutions in the past years (canadian addresses), with only open source tools and data.

I also wrote a primer explaining the basic geocoding ideas:


Great blog post! We obviously also use address interpolation to determine the exact location.
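For readers who have not met address interpolation before, here is a minimal sketch of the idea. It assumes TIGER-style data in which each street segment carries an address range and the coordinates of its two ends, and it places a house number proportionally along a straight line between them. The function name and the single straight segment are assumptions for illustration, not a description of how Geocodio actually does it.

```python
def interpolate_address(house_number, range_start, range_end, start_point, end_point):
    """Estimate (lat, lon) for a house number on a straight street segment.

    range_start / range_end: address numbers at the two ends of the segment.
    start_point / end_point: (lat, lon) tuples for those ends.
    """
    low, high = sorted((range_start, range_end))
    if not low <= house_number <= high:
        raise ValueError("house number is outside the segment's address range")
    if range_start == range_end:
        fraction = 0.0  # single-address segment: use the start point
    else:
        # How far along the segment the house number falls (0.0 to 1.0).
        fraction = (house_number - range_start) / (range_end - range_start)
    lat = start_point[0] + fraction * (end_point[0] - start_point[0])
    lon = start_point[1] + fraction * (end_point[1] - start_point[1])
    return lat, lon
```

Real data also splits ranges by odd and even sides of the street and follows the bends in a segment, both of which this sketch ignores. That simplification is one reason interpolated points land on the right block rather than on the right rooftop.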

For an API, you can also try http://open.mapquestapi.com/nominatim/ (which is kinda free -- and uses OpenStreetMap data).

The biggest problem we've had is changing non well-formed addresses / ambiguous addresses into canonical addresses with lat/lng. Google Maps wins on that front.

We obviously can't beat Google in that case :) That's also why it's priced to be way more affordable. It does however happen that Geocodio is more accurate than Google Maps - try for example "8895 Highway 29 South, 30646" (Address of a CVS store) on Google Maps and Geocodio.

I'm using the MapQuest geocoding API[1], which basically does what you do for free, without the rate limits.

Setting it up was quite a pain because they don't use semantic HTTP codes, and I had to play with it a lot to handle their undocumented error codes (they store them inside body.info.statuscode). Good to read that you return semantic HTTP codes.

If you want to differentiate from the competition, I would suggest that you improve the address parsing and support more patterns. Think of us having to geocode user-typed location fields from twitter. Enjoy it :)

[1]: http://open.mapquestapi.com/geocoding/

Cherry picking one example does not make you more accurate than Google Maps. TIGER has some giant holes in it, and is based on block faces not building footprints like Google Maps. In most cases Google Maps will be much more accurate and comprehensive.

UI suggestion: 'street addresses' currently has a box around it, so I thought it was an <input type="text"> field, thought "how cute", tried to click on it to enter an address to geocode, and was disappointed to find out it was just some bolded text. Might be a fun little feature to have that actually be an entry point into trying out a demo of the API (I thought I was supposed to enter an address to have geocoded).

Agreed. Did the same thing.

Would actually be neat though for that to be a quick demo of your software.

Good point!

Your pricing page is not as clear as it could be.

When people read "$0.001 each" they sometimes understand it to be one thousandth of a cent rather than one thousandth of a dollar.

Even though you are completely correct/accurate, people find it confusing [1].

Wouldn't it be clearer to say "1 cent for every ten uses" (or "10 calls for a cent" or "a tenth of a penny per call")?

Admittedly, your audience is semi-technical, and should parse it correctly, but why not simplify it?

[1] http://verizonmath.blogspot.com/

Maybe it's because I'm semi-technical, but "$0.001 each" is clearer to me than "1 cent for every 10 uses." I mean, I recognize that they're synonymous, but the former clicks in my mind faster than the latter.

So, if, as you admit, the service's audience is semi-technical, and if the average semi-technical person's brain works like mine (big assumption, I know), I would argue that they should stick with $0.001.

That would only be an issue if they wrote 0.001c. $0.001 is very clear.

The part I was unclear about, actually, is whether or not one is charged for the first 2500 if one were to hit that threshold.

The first 2,500 are always free. So 2,501 would be $0.001. I'll make that clearer on the site -- thanks!

Other founder here -- sorry about that! We can definitely make it clearer, you're right.

Congrats on the product! Just a couple website level things:

- http://geocod.io/contact/ says DC but shows me a map centered somewhere south of Topeka.

- Random $0.02 suggestion: stop using "ridiculous".

I'm getting a map centered in Brooklyn. I'm in Midtown Manhattan... so, maybe it's trying to find our location?

Are you close to Topeka?

Seems like our embed code was bad, I've updated it now. Thanks for the feedback!

mkessy, you are hellbanned.

fun fact: the "geographic center" of the US is in Kansas. Often if something geocodes to only "USA" then it gets placed there.

I work at SmartyStreets, where we've learned that geocoding is very, very difficult, so I definitely feel your pain! We started with basic Census Bureau stuff and it's definitely complicated, and accuracy can be spotty. (We've since worked with other data vendors to improve the accuracy.) It's too bad we don't all have little cars to roam the country with and manually collect rooftop-level data like Google does.

+1 on the versioned API endpoint... when we released ours nearly 8 years ago, versioning APIs wasn't really a thing yet. We're paying that technical debt off now as we vigorously rewrite and improve our service.

Quick feedback: Links on the FAQ page are hard to distinguish from regular text.

Good luck with the project!

Thanks! Yes, it is definitely not easy, a lot of edge cases to take care of too. Luckily we are not trying to directly compete with any of the big guys out there, which makes us able to keep the price low and the output high.

We'll update the FAQ links, thanks!

We were tired of dealing with the often steep pricing on geocoding when you reach your daily free limit (e.g. Google Maps starts at $10k/year). So I built this service so I can use it myself and hopefully it would be useful for others too.

Love it. I'll keep using my current service for now (SmartyStreets), but I'll let you know two things I noticed:

1) Most services will accept shortcuts for names, like "SF" for San Francisco or NYC for New York, but in both cases, I got error messages instead of geocodes.

2) Addresses that aren't "properly" formatted (i.e., without commas or something) often return very incorrect information. Here's an example:

2680 NW 8th Pl, Fort Lauderdale, FL 33311 - returns correct info

2680 NW 8th Pl Fort Lauderdale FL 33311 - returns incorrect info (see suffix, formatted_address)

For what it's worth, SmartyStreets mangles even the first address that you got correct, but on the other hand, they're very good at correctly returning data for improperly formatted addresses like the second one.

Anyway, good luck. Great tool.

Thanks for the feedback! We don't currently support shorthands for city names - only states. But this is definitely something that's on the todo list now.

Our address parser will try to pick up the address even if it isn't formatted correctly with commas, but it obviously won't work in all cases. Address parsing is indeed a very complex problem.

Address parsing is actually very easy. Knowing when you got it right (or wrong), that's the hard part, and that's where address validation comes in handy.

If you can start with a list of all the following, you've got a great start:

prefix abbreviations street names street types suffixes city names state names

Add to that all the possible misspellings and then factor in levenshtein and soundex to account for misspellings you didn't know about and you've got a pretty dang good address parser. Figure out how to do that lickety-split fast, and you've got gold.
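As a rough illustration of the Levenshtein part of that recipe, the sketch below matches a possibly misspelled token against a list of known street types by edit distance. The vocabulary, the function names, and the distance cutoff of 2 are all assumptions made for the example. A phonetic key such as Soundex could be added as a second lookup in the same way, but it is left out here.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


# Hypothetical, tiny vocabulary; a real parser would load a full suffix list.
STREET_TYPES = ["street", "avenue", "boulevard", "place", "drive", "court"]


def closest_street_type(token, max_distance=2):
    """Return the known street type nearest to token, or None if nothing is close."""
    token = token.lower()
    best = min(STREET_TYPES, key=lambda known: levenshtein(token, known))
    return best if levenshtein(token, best) <= max_distance else None
```

For example, `closest_street_type("Avenu")` resolves to "avenue", while a token that resembles nothing in the list returns None. Comparing every token against every known word is the slow way to do it, which is where the "lickety-split fast" part of the problem comes in.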

Is it possible to delete an account? I created one before discovering it was US only.

Sure! Just send us an email at support@geocod.io and I'll remove your account right away.

An admirably quick response, both to my question here, and to the email I sent.

People, this is a lesson. If you post a "Show HN" then be ready to respond to people's questions and comments. Posting and then going silent for hours is not a good message to send to people who you want using your service. It says you haven't thought enough about your level of service.

Kudos to GeoCod.io.

Will your service be expanding to provide reverse geocoding? We would be interested if it did both (we need both).

If there's enough interest, we'll definitely be working on this next. It would just require a slight restructuring of our data to make the lookups as efficient as possible.

+1 to reverse geocoding support. Our app runs around 25k/day reverse geocode calls to OSM and Mapquest's Nominatim. We are projecting up to 4x growth within the year so an accurate, bulk and cheap service will help ease our pain. And oh, we're based and operating in the Philippines (which hopefully you can add soon as well).

I'd be interested in this. We'd pay if it was accurate, fast, and available.

Cool project. Like others have said, not particularly convinced that it's cheaper than Google's enterprise geocoding, but I'm more than glad to see the competition.

I wrote you guys a Ruby client: https://github.com/davidcelis/geocodio

The code's maybe a bit rough, but it's worked in my limited usage. Maybe you can take it for a test run before I push version 1.0.0 to RubyGems?

Thanks! This looks great! Would you mind if we possibly mentioned this in our documentation?

As for the pricing, we are indeed much cheaper than Google's geocoding offerings (given the nature of our product). If you are looking to do a high amount of geocoding requests, just contact us[1] and we'll work out a pricing model for you.

[1] hello@geocod.io

Definitely feel free to mention it in the docs! Thanks again for an awesome new alternative.

Version 1.0.0 is on RubyGems now, by the way!

I wonder how the "choose your own api key" policy is going to work in practice... given that people don't usually make very secure passwords and that the example is "Real estate website" you're going to get some pretty easy to guess api keys.

That's actually just a name to identify the API key, the actual API key is a 40-character automatically generated string. The idea is that you will be able to create an API key for each of your projects and revoke them individually as necessary.
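To illustrate what an automatically generated 40-character key can look like, here is a minimal sketch using Python's standard `secrets` module. The hexadecimal format and the function name are assumptions; the thread only says the key is a 40-character generated string.

```python
import secrets


def new_api_key():
    """Return a random 40-character hexadecimal string (20 random bytes)."""
    return secrets.token_hex(20)
```

Because the key is random rather than user-chosen, the human-readable label ("Real estate website") can be as guessable as you like without weakening the key itself.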

I tested this website's API on 2000 randomly selected home addresses, and it's not accurate enough. On average it's 4000 feet away from Google's lat/lng. That is less accurate than Bing's 1000 and datasciencetoolkit's 2200.

Geocoda (http://geocoda.com) launched last year, does point storage as well as geocoding, and should be comparable for low amounts of geocoding, and cheaper for large amounts per month (> 250K).

TIGER (the dataset that this is based on) has some giant holes in it, and is based on block faces, not building footprints like Google Maps. It's also U.S. only... why not base it on OSM, which should include TIGER as well as all the other contributions?

Gah! This is awesome. Where were you when I was trying to get an idea launched and the cost of geocoding was the wall I kept hitting??? Seriously this makes my week, maybe it's time to dust off some old projects...

Feedback like this is exactly why we released this (and made it so cheap) :)

IIRC Google's TOS prohibits saving geocoded points. "Caching" is allowed, but I think this has value/is different insofar as it would let you store points permanently without breach of contract.

I wonder if, given APIs for both Google Maps and Joe's Free And Permissive But Sometimes Wrong Maps, you could:

* query both services for each address

* if the [lat,lon] are equal (within a threshold), store Joe's result as correct

* store nothing otherwise
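The comparison step in that list can be sketched as a great-circle distance check. The 50 metre threshold and the function names are assumptions chosen for illustration, and the sketch covers only the arithmetic, not whether any provider's terms allow using its results this way.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def agreed_point(permissive_point, reference_point, threshold_m=50.0):
    """Return the permissive service's point if both services agree, else None."""
    if haversine_m(permissive_point, reference_point) <= threshold_m:
        return permissive_point
    return None
```

Only the permissive service's coordinates are ever returned, so a caller that stores the result of `agreed_point` keeps nothing from the reference service except the implicit fact that the two agreed.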

Are you storing Google's results in that case?

Where the pricing says $.001/ea for 2501+ geocodes, are the first 2500/day prior to that still free? Or am I paying $2.50 for the day as soon as I make that 1 extra request above the free limit?

Yep, the first 2500/day will always be free. So if you geocoded 2510 addresses in one day, you would pay $0.01
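The billing rule in that answer can be written out as a short calculation. This is a sketch of the arithmetic as described in the thread (2,500 free lookups per day, then $0.001 each); the constant and function names are made up for the example.

```python
from decimal import Decimal

FREE_PER_DAY = 2500
PRICE_PER_LOOKUP = Decimal("0.001")  # dollars, as quoted in the thread


def daily_cost(lookups):
    """Dollars owed for one day's lookups; only those past the free tier are billed."""
    billable = max(0, lookups - FREE_PER_DAY)
    return billable * PRICE_PER_LOOKUP
```

With these numbers, `daily_cost(2510)` bills 10 lookups and comes to one cent, and `daily_cost(2501)` comes to a tenth of a cent. `Decimal` is used so that fractions of a cent do not pick up floating-point noise.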

This is great.

Also great is Pete Warden's http://www.datasciencetoolkit.org/

Street Address to Coordinates: Street Address to Location calculates the latitude/longitude coordinates for a postal address. Currently only the US and UK have street-level detail.

Google-style Geocoder: Are you currently using Google's geocoding API and want to switch? Replace maps.googleapis.com with the address of a DSTK server and your code should work without changes.

Free to use, also available as a (free) self-hostable VM.

Why don't you put a demo query page so I can try addresses in my country without signing up?

Edit: signed up. It does not work outside the US. Why not bother documenting that?

Good idea about the demo page!

And apologies that we didn't make the US-only part more prominent before. We've added it to the front page and moved it to the top of the FAQ.

Neat! If you can get your address parsing up to Google's level or anywhere close, you should do quite well.

For others looking for a solution you can play with yourself, here's a VM image with a pretty good geocoder you can set up yourself (iffy address parsing, though): http://www.datasciencetoolkit.org

Parsing isn't the hard part, it's the source data, which if you do the math on what Google has done (drive around the world taking 360 video and LIDAR of streets) is literally billions of dollars worth of work.

TIGER is a pretty bad starting point; geocoding based on block faces is really inaccurate if you want to zoom in to the street level. And it's U.S. only.

OSM Nominatim should be a better place to start.

I'd love to see open sourced Street View data collection / processing as part of the OSM project. Then there is a chance to compete with Google.

What you're talking about (massive ground-level driving effort to pinpoint where along streets specific addresses are located) would boost accuracy. Without a Google level address parser, though, you don't get usability for a lot of use cases, which is frankly much more important for a lot of companies. One of the best things about Google's geocoder is that you can throw various location names, as humans type them, and Google will return something, and it's usually the right thing. For many applications, this is the desired behavior, rather than precision.

I really like the look of your API. I work a lot with location-based apps, I'll probably be giving this a go :)

Thanks! Please let us know what you think.

What's preventing people from simply signing up for Google Maps API for Business then sending your requests that way and returning the results?

Thereby spreading out the bulk cost of an API license amongst your customers who have to pay a significantly smaller amount, but adding up to profit?

I would imagine what you're talking about is a violation of their terms of service.

I believe they have a daily limit as well. They also charge $50/20k addresses for bulk geocoding which is way more than what we charge :)

I oversaw a project like this elsewhere, where we had reams and reams of geo coordinates but needed text-searchable tags (like "Canada", "Toronto", etc.).

We had millions of them though, so maybe an API isn't really the way to go.

How does the dataset compare to google? Would love to see some side by side comparisons.

See my previous answer [1], obviously it's impossible to compete directly with Google and especially not at this price point. Our goal is to return a geo coordinate that is at least on the correct block and as close to the street number as possible.


Would love if you could get integrated into http://www.rubygeocoder.com/ - That would make my switch much easier. Would love to support you guys.

I think this is great. We use Google Maps for geocoding today; we paid around 10k for this year's license.

If you guys can do the same without the rate limiting restrictions they place on us, we'd switch over in a heartbeat.

We actually don't have any rate limiting currently (we can handle a pretty high amount of concurrent requests and will hopefully be able to scale up hardware before we hit any performance issues).

Very cool. I'm in the telematics industry and forward geo-coding is something in which I am always interested, since it can be quite the bitch of a task. How did you go about assembling the shape files?

Great to see something new in this space. I remember having to rewrite quite a bit of backend code when SimpleGeo shut down.

Note to self: code back-end API consumers with Interfaces and drivers instead of hardcoding API calls.
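One way to read that note is sketched below: application code depends on a small interface, and each provider gets its own driver behind it. Every name here is hypothetical, and the in-memory driver only stands in for a class that would call a real provider's API.

```python
from typing import Optional, Protocol, Tuple

Point = Tuple[float, float]  # (lat, lon)


class Geocoder(Protocol):
    """What the application needs from any geocoding provider."""

    def geocode(self, address: str) -> Optional[Point]: ...


class StaticGeocoder:
    """Stand-in driver backed by a dict; a real driver would call a provider's API."""

    def __init__(self, known: dict):
        self._known = known

    def geocode(self, address: str) -> Optional[Point]:
        return self._known.get(address)


def locate_all(geocoder: Geocoder, addresses):
    """Application code depends only on the Geocoder interface, not on a provider."""
    return {address: geocoder.geocode(address) for address in addresses}
```

If a provider shuts down, only its driver class has to be replaced; `locate_all` and everything built on it stay as they are.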

Small thing: I would drop the "bulk" as the tagline is too much of a mouthful and "bulk" is unnecessary. It's free at smaller volumes anyway, so certainly not deceptive to drop it.

The ability to understand how the input was parsed is an interesting feature, but I think it would be better as an option.

Most of the time users will only care about the results, so you'll be sending useless data.

Good point, we might want to add that as an optional parameter. Also note that our address parsing API endpoint is free and doesn't count towards the usage statistics and billing :)

Is there a similar service for reverse geocoding? US and international?

Very nice! Can't wait to try it out.

What does the "accuracy" value in the return mean? Maybe I am missing something but I don't see it in the FAQ or docs.

Great job, guys! This definitely opens up some nice options. Reminds me how much I miss the old TIGER/Line file formats, though.

Interesting. What interesting things can you do once you geocode a street address? How are you (your business) using this?

I used it to take a contact database for a series of seminars around the country and allocate participants to the geographically-appropriate venue.

We mainly use it personally for putting locations on a map. It can also be used to calculate distances and many other things.

It allows you to put markers on an embedded google map. You can also do directions to and from markers, etc.

I like the price point of about $1 per 1000 records. Just curious to hear how you arrived at this price point.

Our infrastructure is pretty efficient, making us able to keep our operating costs low. We wanted to have a pricing point that was below any other similar services we could find.

This is very cool! We use SmartyStreets because of the price. Where did you get the addresses database?

In another comment OP says the source is US Census Bureau.

Very nice. Pricing is very attractive. May need to use this in the future.

So... US addresses only?

Yep! Unfortunately we will manually have to add support for each country (including getting data, normalizing it, etc.) which is quite some work. We're planning to add support for additional countries if demand is high enough.

Separating a street name into "street" and "suffix" is a baffling decision which probably has a few issues even in the US, and definitely won't work elsewhere.

Nope, it won't work outside the US. But US addresses actually have a list of suffixes that all ordinary addresses comply with. See https://www.usps.com/send/official-abbreviations.htm

Agreed, this is unusable for us until there's international support. We have a large percentage of international users in 130 different countries. Doesn't make sense to use this over, say, MapQuest.

next up, geofencing-as-a-service?
