
Show HN: Parse Address – Simple Address Parsing API - traviswingo
http://parseaddress.io/
======
traviswingo
Hey all. I built this as a simple-to-use solution for parsing free-form street
addresses around the world. It's currently integrated into my other side
project, postcardbot.co. I've had many people reach out to me about the
address accuracy, so I've decided to separate it out into its own project.

I'll keep it free for now, unless people start abusing it.

------
brad0
I'm curious as to why you chose an array of components as a response instead
of an object of component keys?

~~~
traviswingo
Thanks for pointing that out. Left behind from some cleaning/testing. It's
back to normal now.

~~~
brad0
Nice turnaround! I'm building an API myself to get public feedback.

Are you exposing this to get an idea for demand? How are you tracking this?

~~~
traviswingo
Nope :). It's just a problem I've come across a bunch before. I'm exposing it
so people will have a quick resource to use for a side project. As long as
things stay relatively modest, I'll keep it open for free.

------
Jemaclus
This does a pretty good job for a simple, naive parser.

Some possible edge cases to consider:

 _123 Summit Blvd West Palm Beach FL 33406_ This returns the city as "West
Palm Beach", when it is actually in "Palm Beach" and the street would be
"Summit Blvd W". (How do you know? Lots of ways, but the ZIP is the obvious
contender.)

 _W324S7840 Paul Ln, Mukwonago, WI 53149_ In some places in Wisconsin, they
have something called the WISCRS ("whiskers"), which is a coordinate-based
address system. The "W324S7840" is actually the "number" component, while
"Paul Ln" is the street name.
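
A rough sketch of how a parser might recognize that kind of grid-style number. The pattern below (one or two direction-plus-digits pairs run together) is my assumption about the format based on the example above, and `split_house_number` is a hypothetical helper, not anything from the API:

```python
import re

# Assumed format: one or two direction+digits pairs, e.g. "W324S7840".
GRID_NUMBER = re.compile(r"^(?:[NSEW]\d+){1,2}$", re.IGNORECASE)
# Ordinary house numbers, optionally with a trailing letter ("12B").
PLAIN_NUMBER = re.compile(r"^\d+[A-Za-z]?$")


def split_house_number(street_line):
    """Split a street line into (number, remainder).

    Returns (None, street_line) when the first token does not look
    like a house number at all.
    """
    first, _, rest = street_line.strip().partition(" ")
    if GRID_NUMBER.match(first) or PLAIN_NUMBER.match(first):
        return first, rest.strip()
    return None, street_line.strip()


print(split_house_number("W324S7840 Paul Ln"))
print(split_house_number("123 Main St"))
```

This does not handle a grid number written with a space in the middle, and a bare directional like the "W" in "W Main St" is deliberately not treated as a number because the pattern requires digits.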

 _Carmel, CA_ There is actually a portion of Carmel that doesn't have any
mailboxes. Kind of a strange place. Probably not relevant to this, but worth
mentioning.

Like I said, this is pretty good for naive parsing. It's not _terribly_ useful
for businesses, though, because for most business cases, you need more than
simply breaking an address down into its components.

For example:

\- "SF" and "San Francisco" are equivalent but not == to each other, so if
someone were using this to de-duplicate addresses, they would hit a failure
point here.

\- "Street" and "St" are equivalent but not == to each other, so they can't be
de-duplicated either.

\- Minor misspellings, like "Warrington" vs "Warington". You can't detect
these, necessarily, without a more comprehensive database, but misspellings
would contribute to billing/shipping errors and duplication issues.

\- ZIP codes can tell you a lot about an address. For example, all ZIPs
beginning with "941" are in San Francisco, "946" are in Oakland, "945" are in
Alameda, and so on. Getting a list of all the states and city ZIP codes can
help you determine whether to keep or reject an address's ZIP code. For
example, I put in "123 Main St, Boston, MA 90210", and it just spat out
"90210". If you remember the 90s at all, you'll know that 90210 is in California,
not in Massachusetts. This would cause a failure in either billing or shipping
for any business. The ZIP Code is actually super powerful. For example, in
94102, there are only 44 possible streets. If you were to create this ZIP-
city-street mapping, you could very quickly verify the authenticity of the
street, city, state, and ZIP components. (Numbers are harder, units even
harder than that.)

\- 401 Rodeo Way, Austin, TX is an apartment complex, but 403 Rodeo Way,
Austin, TX is a parking lot (and is not a real address). For most business
cases, you don't want to allow people to put in bad addresses. This is hard,
though, because sometimes databases don't get updated with new construction or
renovated buildings. Tricky, but something to consider.
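
To make a few of the points above concrete, here is a minimal sketch of normalizing components before comparing them, flagging a ZIP that cannot belong to the given state, and suggesting a fix for a misspelled city. All of the lookup tables are tiny illustrative stand-ins that I made up for the example (a real system would load complete postal data), and every function name is hypothetical:

```python
import difflib
import re

# Illustrative tables only; deliberately incomplete.
SUFFIXES = {"street": "st", "avenue": "ave", "boulevard": "blvd", "lane": "ln"}
CITY_ALIASES = {"sf": "san francisco"}
# First ZIP digit -> states. Approximate: ignores territories, military
# addresses, and the handful of exceptions that exist in practice.
ZIP1_TO_STATES = {
    "0": {"CT", "MA", "ME", "NH", "NJ", "RI", "VT"},
    "9": {"AK", "CA", "HI", "OR", "WA"},
}


def normalize_street(street):
    """Lowercase, drop punctuation, and abbreviate a trailing suffix."""
    words = re.sub(r"[.,]", "", street.lower()).split()
    if words:
        words[-1] = SUFFIXES.get(words[-1], words[-1])
    return " ".join(words)


def normalize_city(city):
    key = city.strip().lower()
    return CITY_ALIASES.get(key, key)


def dedup_key(street, city, state, zip_code):
    """Key that compares equal for equivalent spellings of one address."""
    return (normalize_street(street), normalize_city(city),
            state.upper(), zip_code[:5])


def zip_matches_state(zip_code, state):
    """True/False when the table covers the ZIP's first digit, else None."""
    states = ZIP1_TO_STATES.get(zip_code[:1])
    if states is None:
        return None
    return state.upper() in states


def closest_city(name, known_cities):
    """Suggest the most similar known city for a possible misspelling."""
    lowered = {c.lower(): c for c in known_cities}
    hits = difflib.get_close_matches(name.lower(), list(lowered), n=1, cutoff=0.85)
    return lowered[hits[0]] if hits else None


a = dedup_key("123 Main Street", "SF", "ca", "94102")
b = dedup_key("123 Main St.", "San Francisco", "CA", "94102-1234")
print(a == b)
print(zip_matches_state("90210", "MA"))
print(closest_city("Warington", ["Warrington", "Washington"]))
```

None of this tells you whether a specific house number exists, which is the part that really needs an authoritative database.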

To be clear, I think you've done a fantastic job. I built a full-featured
address parser for a large real estate company a few years ago. In fact, this
is my go-to coding challenge when I interview engineers. I say "Here's 123
Main St, Boston, MA 00235. Break it apart into its components." Typically,
they try to split on commas or use a regex, at which point I remove the
commas and everything breaks. There are a ton of ways to write an address, all
of which may be valid but are hard to predict. Another good example would be
something like "123 St Francis Ave, Boston, MA 00235". The "St" in "St
Francis" trips people up often, but your app catches that. Fantastic job with
that.
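
For anyone curious, one way to avoid depending on commas is to work from the right: peel off the ZIP, then the state, then treat the last street-suffix token as the end of the street. This is only a sketch of that idea, not how parseaddress.io or my own parser works; `parse_address` and the short suffix list are assumptions made for the example:

```python
import re

# Short illustrative suffix list; a real parser would need far more.
SUFFIXES = {"st", "ave", "blvd", "ln", "rd", "dr", "way"}
STATES = set(
    "AL AK AZ AR CA CO CT DE FL GA HI ID IL IN IA KS KY LA ME MD MA MI MN "
    "MS MO MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC SD TN TX UT VT VA "
    "WA WV WI WY DC".split()
)


def parse_address(text):
    """Parse a US-style address without relying on commas."""
    tokens = text.replace(",", " ").split()
    result = {"number": None, "street": None, "city": None,
              "state": None, "zip": None}

    # Work right to left: ZIP first, then the state code.
    if tokens and re.fullmatch(r"\d{5}(-\d{4})?", tokens[-1]):
        result["zip"] = tokens.pop()
    if tokens and tokens[-1].upper() in STATES:
        result["state"] = tokens.pop().upper()

    # A leading token that starts with a digit is taken as the number.
    if tokens and tokens[0][0].isdigit():
        result["number"] = tokens.pop(0)

    # The LAST suffix token ends the street, so the "St" in
    # "St Francis Ave" is not mistaken for the suffix.
    suffix_at = max(
        (i for i, t in enumerate(tokens) if t.lower().rstrip(".") in SUFFIXES),
        default=None,
    )
    if suffix_at is None:
        result["street"] = " ".join(tokens) or None
    else:
        result["street"] = " ".join(tokens[:suffix_at + 1])
        result["city"] = " ".join(tokens[suffix_at + 1:]) or None
    return result


print(parse_address("123 St Francis Ave, Boston, MA 00235"))
print(parse_address("123 St Francis Ave Boston MA 00235"))
```

Both calls give the same result with or without commas. The heuristic still breaks on a city like "St Louis", where the last suffix token belongs to the city, and on trailing directionals like the Summit Blvd example, so it needs real data behind it to be trustworthy.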

Keep up the good work. Super impressed. It'd be great if you could tackle some
of the edge cases I mentioned above, especially with regard to correcting ZIP
codes.

