

How to parse postal addresses from documents - khichi

I have seen that Google and Yahoo do a neat job of parsing postal address information from free-form text. What technique or API do they use? What can I use to parse all postal addresses from a PDF, Word, text, or HTML document?
======
trollhammeren
Parse the document to plain text, tokenize it, run NER (named entity
recognition), then match the extracted tokens against a list of geographical
names (gazetteer matching). That's the process, more or less. For tools, you
can search on the terms above and find many. Hope this helps.
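
For a rough idea of how tokenizing and gazetteer matching fit together, here is a minimal sketch in Python. It is not what Google or Yahoo do. Everything in it is an assumption made for illustration: the three-entry GAZETTEER, the US-style street pattern in STREET_RE, and the function names are all hypothetical, and a regex stands in for a real NER model. It also assumes the PDF, Word, or HTML document has already been converted to plain text.

```python
import re

# Hypothetical, tiny gazetteer. A real one would be loaded from a
# geographical names list with many thousands of entries.
GAZETTEER = {"springfield", "portland", "mountain view"}

# Assumed US-style pattern: house number, 1-3 capitalized words, street suffix.
# This stands in for the NER step and will miss many real address formats.
STREET_RE = re.compile(
    r"\b(\d{1,6})\s+((?:[A-Z][a-z]+\s+){1,3})"
    r"(Street|St|Avenue|Ave|Road|Rd|Boulevard|Blvd|Drive|Dr)\b\.?"
)


def tokenize(text):
    """Split text into alphabetic words and runs of digits."""
    return re.findall(r"[A-Za-z]+|\d+", text)


def find_places(tokens, gazetteer=GAZETTEER, max_len=3):
    """Greedy longest match of token n-grams against the gazetteer."""
    found = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n]).lower()
            if candidate in gazetteer:
                found.append(candidate)
                i += n
                break
        else:
            # No n-gram starting at this token is a known place name.
            i += 1
    return found


def extract_addresses(text, window=60):
    """Find street-like spans, then look for a known city just after each."""
    results = []
    for match in STREET_RE.finditer(text):
        tail = text[match.end():match.end() + window]
        places = find_places(tokenize(tail))
        results.append({
            "number": match.group(1),
            "street": (match.group(2) + match.group(3)).strip(),
            "city": places[0] if places else None,
        })
    return results


if __name__ == "__main__":
    sample = "Our office is at 123 Main Street, Springfield. Call us."
    print(extract_addresses(sample))
```

The greedy longest-match loop is what lets a multi-word name such as "mountain view" win over a single-token lookup. Addresses outside the assumed pattern (other countries, PO boxes, unit numbers) would need a trained NER model or a dedicated address parser instead of the regex.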

