

Data Science Toolkit: free-as-in-speech Natural Language Processing & Geo tools - nosignal
http://www.datasciencetoolkit.org/about

======
timr
Before you start building something too elaborate on top of this, you should
play with the tools to get a sense of their limitations. The Geocoder::US
module, in particular, has some problems with real-world input (e.g. geocoding
"151 Third St San Francisco" doesn't work, but geocoding "151 Third St San
Francisco CA" does). Similar caveats apply to the heuristics used by the other
tools.

There are a lot of these corner cases, and they're hard to solve. As always,
nothing replaces experience when you're doing this kind of work.

~~~
taliesinb
Kind of amusing Alpha result:
[http://www.wolframalpha.com/input/?i=151+Third+St+San+Franci...](http://www.wolframalpha.com/input/?i=151+Third+St+San+Francisco)

~~~
timr
Yeah...parsing addresses is non-trivial. It quickly turns into a
probabilistic/NLP problem, if you want to deal with any sort of real-life
input. I can believe that Wolfram Alpha hasn't put a lot of effort into it.

The DB lookup part of the problem (the part that Geocoder::US solves) is the
straightforward, mechanical part -- but geocoder precision/recall is almost
entirely determined by the quality of the parsing. That's a lot harder to do
well.
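
To make the parse/lookup split concrete, here's a rough Python sketch. To be clear, this is not how Geocoder::US works internally; the regex, the `parse` and `geocode` names, and the one-row `TOY_DB` are all made up for illustration, and the coordinates are placeholders. It only shows how a strict parser that expects a trailing state fails before the lookup ever runs:

```python
import re
from typing import Optional, Tuple

# Hypothetical strict pattern: house number, street ending in one of a few
# suffixes, city, then a two-letter uppercase state at the very end.
ADDRESS_RE = re.compile(
    r"^(?P<number>\d+)\s+"
    r"(?P<street>.+?\s(?:St|Ave|Blvd|Rd))\s+"
    r"(?P<city>.+?)\s+"
    r"(?P<state>[A-Z]{2})$"
)

# Toy stand-in for the street database. The coordinates are placeholders,
# not real data.
TOY_DB = {
    ("151", "third st", "san francisco", "CA"): (0.0, 0.0),
}


def parse(address: str) -> Optional[dict]:
    """Split an address into fields, or return None if it doesn't fit the pattern."""
    match = ADDRESS_RE.match(address.strip())
    return match.groupdict() if match else None


def geocode(address: str) -> Optional[Tuple[float, float]]:
    """Parse first, then do the mechanical lookup. A parse failure ends it early."""
    parsed = parse(address)
    if parsed is None:
        return None
    key = (
        parsed["number"],
        parsed["street"].lower(),
        parsed["city"].lower(),
        parsed["state"],
    )
    return TOY_DB.get(key)


if __name__ == "__main__":
    for text in ("151 Third St San Francisco CA", "151 Third St San Francisco"):
        print(repr(text), "->", geocode(text))
```

The lookup is a single dictionary access. Everything that goes wrong with the second input happens in `parse`, which is why loosening the parser (optional state, fuzzy city matching, ranking candidate parses) is where the real effort goes.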

