I'll keep it free for now, unless people start abusing it.
Are you exposing this to get an idea for demand? How are you tracking this?
Some possible edge cases to consider:
123 Summit Blvd West Palm Beach FL 33406: This returns the city as "West Palm Beach", when it is actually in "Palm Beach" and the street would be "Summit Blvd W". (How do you know? Lots of ways, but the ZIP is the obvious contender.)
W324S7840 Paul Ln, Mukwonago, WI 53149: Some places in Wisconsin use something called the WISCRS ("whiskers"), which is a coordinate-based address system. The "W324S7840" is actually the "number" component, while "Paul Ln" is the street name.
Carmel, CA: There is actually a portion of Carmel that doesn't have any mailboxes. Kind of a strange place. Probably not relevant to this, but worth mentioning.
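For the Wisconsin case, here's a rough sketch of how a parser might treat a grid-style token as the house number before falling back to plain digits. The pattern is my own guess based only on the "W324S7840" example (two direction-plus-digits pairs, optionally separated by a space), and `split_number` is a hypothetical helper, not something from your app.

```python
import re

# Assumed shape (a guess from one example): two direction+digits pairs,
# e.g. "W324S7840", optionally written with a space between the pairs.
GRID_NUMBER = re.compile(r"^([NSEW]\d+)\s?([NSEW]\d+)\s+(.+)$", re.IGNORECASE)
PLAIN_NUMBER = re.compile(r"^(\d+)\s+(.+)$")


def split_number(street_line):
    """Return (number, street_name); number is None if nothing matched."""
    line = street_line.strip()
    m = GRID_NUMBER.match(line)
    if m:
        # Join the two pairs so both spellings give the same number.
        return (m.group(1) + m.group(2)).upper(), m.group(3)
    m = PLAIN_NUMBER.match(line)
    if m:
        return m.group(1), m.group(2)
    return None, line


print(split_number("W324S7840 Paul Ln"))
print(split_number("123 Main St"))
```

Checking the grid pattern first matters, since a digits-only rule would never see a number in "W324S7840 Paul Ln".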
Like I said, this is pretty good for naive parsing. It's not terribly useful for businesses, though, because most business cases need more than a simple breakdown into components:
- "SF" and "San Francisco" are equivalent but not == to each other, so if someone were using this to de-duplicate addresses, they would hit a failure point here.
- "Street" and "St" are equivalent but not == to each other, so not de-duplicatable.
- Minor misspellings, like "Warrington" vs "Warington". You can't detect these, necessarily, without a more comprehensive database, but misspellings would contribute to billing/shipping errors and duplication issues.
- ZIP codes can tell you a lot about an address. For example, all ZIPs beginning with "941" are in San Francisco, "946" are in Oakland, "945" are in Alameda, and so on. Getting a list of the ZIP codes for every state and city can help you determine whether to keep or reject an address's ZIP code. For example, I put in "123 Main St, Boston, MA 90210", and it just spat out "90210". If you remember the 90s at all, you'll know that 90210 is in California, not in Massachusetts. This would cause a failure in either billing or shipping for any business. The ZIP Code is actually super powerful. For example, in 94102, there are only 44 possible streets. If you were to create this ZIP-city-street mapping, you could very quickly verify the authenticity of the street, city, state, and ZIP components. (Numbers are harder, units even harder than that.)
- 401 Rodeo Way, Austin, TX is an apartment complex, but 403 Rodeo Way, Austin, TX is a parking lot (and is not a real address). For most business cases, you don't want to allow people to put in bad addresses. This is hard, though, because sometimes databases don't get updated with new construction or renovated buildings. Tricky, but something to consider.
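To make a few of the points above concrete, here's a minimal sketch of three checks: a normalized key for de-duplication, a ZIP-versus-state sanity check, and a fuzzy street lookup for small misspellings. Every table in it is a tiny hand-written sample that I'm assuming for illustration (a real version would load USPS data), and all the function names are hypothetical.

```python
import difflib
import re

# Hypothetical sample tables; a real system would load these from USPS data.
SUFFIXES = {"street": "st", "avenue": "ave", "boulevard": "blvd", "lane": "ln"}
CITY_ALIASES = {"sf": "san francisco"}
# First three ZIP digits -> state, as inclusive ranges (sample only).
ZIP3_RANGES = {"CA": [(900, 961)], "MA": [(10, 27), (55, 55)]}


def _words(text):
    """Lowercase, drop punctuation, split on whitespace."""
    return re.sub(r"[^a-z0-9 ]", "", text.lower()).split()


def dedupe_key(street, city, state, zip_code):
    """Build a comparable key so 'Street'/'St' and 'SF'/'San Francisco' match."""
    words = _words(street)
    if words:
        words[-1] = SUFFIXES.get(words[-1], words[-1])
    city_norm = " ".join(_words(city))
    city_norm = CITY_ALIASES.get(city_norm, city_norm)
    return (" ".join(words), city_norm, state.upper(), zip_code[:5])


def zip_matches_state(zip_code, state):
    """True/False if the sample table covers the state, None if it can't tell."""
    ranges = ZIP3_RANGES.get(state.upper())
    if ranges is None or not re.fullmatch(r"\d{5}(-\d{4})?", zip_code):
        return None
    prefix = int(zip_code[:3])
    return any(lo <= prefix <= hi for lo, hi in ranges)


def closest_street(name, known_streets, cutoff=0.85):
    """Return the best fuzzy match among the streets known for a ZIP, or None."""
    by_lower = {s.lower(): s for s in known_streets}
    hits = difflib.get_close_matches(name.lower(), list(by_lower), n=1, cutoff=cutoff)
    return by_lower[hits[0]] if hits else None


print(zip_matches_state("90210", "MA"))
print(closest_street("Warington", ["Warrington", "Washington"]))
```

None of this tells you whether 403 Rodeo Way is a parking lot; that still needs a delivery-point database. It only catches the cheap mismatches before you get that far.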
To be clear, I think you've done a fantastic job. I built a full-featured address parser for a large real estate company a few years ago. In fact, this is my go-to coding challenge when I interview engineers. I say "Here's 123 Main St, Boston, MA 00235. Break it apart into its components." Typically, they try to split on commas or use a regex, at which point I remove the commas and everything breaks. There are a ton of ways to write an address, all of which may be valid, but they are hard to predict. Another good example would be something like "123 St Francis Ave, Boston, MA 00235". The "St" in "St Francis" trips people up often, but your app catches that. Fantastic job with that.
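For anyone curious what a comma-free approach to that interview question might look like, here's a rough sketch that peels tokens off both ends and treats the last street suffix as the end of the street. It's one possible approach under my own assumptions, not how your app works; `parse_address` and the suffix list are hypothetical, and the suffix list is a small sample.

```python
import re

# Sample suffix list (assumption); a real parser would use the full USPS list.
SUFFIXES = {"st", "ave", "blvd", "ln", "rd", "dr", "way"}
STATES = set(
    "AL AK AZ AR CA CO CT DE FL GA HI ID IL IN IA KS KY LA ME MD MA MI MN MS "
    "MO MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC SD TN TX UT VT VA WA WV "
    "WI WY DC".split()
)


def parse_address(text):
    """Split an address into parts without relying on commas."""
    tokens = text.replace(",", " ").split()
    parts = dict.fromkeys(["number", "street", "city", "state", "zip"])

    # Work inward from the right: ZIP first, then the state code.
    if tokens and re.fullmatch(r"\d{5}(-\d{4})?", tokens[-1]):
        parts["zip"] = tokens.pop()
    if tokens and tokens[-1].upper() in STATES:
        parts["state"] = tokens.pop().upper()
    # Then from the left: a leading token that starts with a digit.
    if tokens and tokens[0][0].isdigit():
        parts["number"] = tokens.pop(0)

    # The street ends at the LAST suffix token, so the "St" in
    # "St Francis Ave" stays inside the street name.
    suffix_at = [i for i, t in enumerate(tokens) if t.lower().rstrip(".") in SUFFIXES]
    if suffix_at:
        end = suffix_at[-1] + 1
        parts["street"] = " ".join(tokens[:end])
        parts["city"] = " ".join(tokens[end:]) or None
    else:
        parts["street"] = " ".join(tokens) or None
    return parts


print(parse_address("123 St Francis Ave Boston MA 00235"))
```

The "last suffix wins" rule has its own failure mode: a city like "St Louis" would get split in the wrong place, which is exactly why a city list keyed by ZIP helps.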
Keep up the good work. Super impressed. It'd be great if you could tackle some of the edge cases I mentioned above, especially with regard to correcting ZIP codes.