
Ask HN: What's a good US address parsing library? - traviswingo
I'm looking to parse address strings (user input) into address objects for use across multiple services. What's a good library to do this?

I'm currently using Google Geocoding API, but since I don't use these addresses within a Google Map, I'm pretty sure I'm violating the TOS and would like to yank and replace it before I get shut down.
======
SingAlong
My coworkers recently set up an internal API using a Flask app for US address
parsing. They used libpostal after comparing it with usaddress and address
(both are Python libraries).

libpostal had the best results and good abbreviation parsing ("St" for Street,
"Blvd" for Boulevard, etc.). We use this to check if two address strings are
equal, so you can imagine how important abbreviations are to us. The USPS
website has a list of postal abbreviations, but writing our own address parser
from it is not a simple task.

libpostal is a download of a few gigabytes and needs about 2 GB of RAM to set
up, but it's worth it.
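
To give a rough idea of the comparison step, here is a minimal Python sketch.
It does not use libpostal at all: the `ABBREVIATIONS` table and the
`normalize` / `same_address` helpers are hypothetical stand-ins, and the table
holds only a handful of entries rather than the full USPS list.

```python
import re

# Hypothetical, deliberately tiny table; the real USPS list is far longer.
ABBREVIATIONS = {
    "st": "street",
    "blvd": "boulevard",
    "ave": "avenue",
    "rd": "road",
    "apt": "apartment",
}


def normalize(address: str) -> str:
    """Lowercase, drop punctuation, and expand the abbreviations we know."""
    tokens = re.findall(r"[a-z0-9]+", address.lower())
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


def same_address(a: str, b: str) -> bool:
    """Treat two address strings as equal if their normalized forms match."""
    return normalize(a) == normalize(b)
```

A naive table like this breaks quickly: "St. Louis" would come out as "street
louis", since "St" can also mean "Saint". Context-dependent cases like that
are a big part of why writing your own parser is not a simple task.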

~~~
traviswingo
Yeah. I've tested it before and it looks very promising, but I'm not sure the
overhead is necessary for my use case. I'll need to run some tests against all
options and figure out where the equilibrium lies between cost (size, memory,
price, etc.) and functionality/ux.

------
kardos
Have you looked into libpostal?

[https://github.com/openvenues/libpostal](https://github.com/openvenues/libpostal)

~~~
traviswingo
I have looked into libpostal. I'm not sure the overhead is completely
necessary for my use case. Since posting I've toyed with usaddress and it
seems nice. It's a Python library, so I've had to pull it into my project
(Node.js) with a child process, but it seems to work very well.
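
For anyone curious, the Python side of a bridge like this can be quite small.
The following is only a sketch of one way to do it, not my exact code: it
reads one address per line on stdin and writes one JSON object per line on
stdout. `usaddress.tag` is the library's real entry point; the `tag_lines`
helper, the `--serve` flag, and the JSON field names are made up for
illustration.

```python
import json
import sys
from typing import Callable, Iterable, Iterator, Mapping, Tuple

# Shape of usaddress.tag: address string -> (label-to-text mapping, address type).
TagFn = Callable[[str], Tuple[Mapping[str, str], str]]


def tag_lines(lines: Iterable[str], tag: TagFn) -> Iterator[str]:
    """Yield one JSON string per non-empty input line."""
    for line in lines:
        raw = line.strip()
        if not raw:
            continue
        try:
            components, kind = tag(raw)
            result = {"input": raw, "type": kind, "components": dict(components)}
        except Exception as exc:  # e.g. usaddress.RepeatedLabelError
            result = {"input": raw, "error": str(exc)}
        yield json.dumps(result)


def main() -> None:
    import usaddress  # third-party: pip install usaddress

    for out in tag_lines(sys.stdin, usaddress.tag):
        print(out, flush=True)  # flush so the parent process sees each line promptly


# Hypothetical flag so that importing or testing the module never blocks on stdin.
if __name__ == "__main__" and "--serve" in sys.argv:
    main()
```

Keeping one long-lived child process and writing a line per address avoids
paying Python's startup and model-loading cost on every request.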

------
VoidWhisperer
It would appear you are correct that you are breaking Google's ToS:

[https://developers.google.com/maps/documentation/geocoding/p...](https://developers.google.com/maps/documentation/geocoding/policies)

~~~
traviswingo
Thanks. I found this after adding the reply below since I felt the need to
verify :)

------
tylercubell
Keep in mind that address parsing is only one piece of the puzzle. A good
library can parse out address components with some degree of accuracy, but
unless the address can be verified, the result is worthless in many cases. In
order to get accurate results, the library must use up-to-date, real-world
data.

The best options are commercial: ArcGIS, HERE, Google Maps, SmartyStreets,
etc. There are free options such as OpenStreetMap, Census TIGER, and
OpenAddresses, but in my experience the results are unreliable.

------
urbanstat
Our company is dealing with this problem. We have implemented an API to
provide enterprises address parsing & geocoding capabilities.

If you want something cheap, business-friendly and ready to go ASAP, I would
advise you to use HERE Maps. They have batch-geocoding plans that are quite
affordable.

If you want an API that runs behind the firewall or some customization on
parsing & geocoding, then we can help you with that.

------
olegkikin
Why do you think you're breaking Google's ToS?

~~~
traviswingo
I'm automatically parsing the user input and using the Geocoding API output to
link their address with their account, and then I'm charging for services that
are ordered using this address. I'm currently under the impression that
automatic parsing without displaying the data in a Google Map is against the
TOS, but I haven't been able to verify this 100%.

------
zbyte64
There is a Python library that uses probabilistic string parsing on addresses.
You can even add labeled data to improve accuracy:
[https://github.com/datamade/usaddress](https://github.com/datamade/usaddress)

------
caleblloyd
I've had luck using [https://smartystreets.com/](https://smartystreets.com/)
if you don't mind using a paid service. They're far cheaper than Google.

~~~
bruck
I also use SmartyStreets - generally very happy with it, and you can do up to
250 lookups per month for free. One thing to watch out for is that SS is meant
primarily for mail delivery. There are many addresses in the US that represent
real places on a map, but cannot receive USPS mail, and thus will be rejected
by SmartyStreets. Our customer base includes a large number of industrial
addresses, so that may skew our results, but we find that up to 20% of our
legitimate addresses get rejected by the service. You can control how strict
the validation will be, up to a point, but beyond that we have to rely on
Google as a backup for cleanup and standardization.
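
The fallback pattern itself is simple. Here is a rough sketch of the idea,
assuming each service is wrapped in a function that returns a standardized
address or `None` on rejection. The `Validator` type and
`validate_with_fallback` are hypothetical names, and no real SmartyStreets or
Google client calls are shown.

```python
from typing import Callable, Iterable, Optional, Tuple

# A validator takes a raw address and returns a standardized string,
# or None if the service rejects the address.
Validator = Callable[[str], Optional[str]]


def validate_with_fallback(
    address: str,
    validators: Iterable[Tuple[str, Validator]],
) -> Optional[Tuple[str, str]]:
    """Try each (name, validator) pair in order and return the first success.

    Returns (validator_name, standardized_address), or None if every
    validator rejects the address.
    """
    for name, validator in validators:
        result = validator(address)
        if result is not None:
            return name, result
    return None
```

Returning the name of the service that accepted the address makes it easy to
track how often the backup is actually needed.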

