
Libpostal: international street address parsing in C trained on OpenStreetMap - riordan
https://mapzen.com/blog/inside-libpostal
======
freyfogle
An amazing effort, many congrats to all involved.

I work on the address-formatting project, one small piece of the many used
here. We currently have formatting rules for 93% of the world's 249
territories (as defined by ISO 3166-1 alpha-2 codes), but we need help to
finish things out - especially from people with local knowledge and native
speakers. Even for the countries we've "finished", more tests are always
useful.

Here's the repo if you'd like to get involved:
[https://github.com/OpenCageData/address-formatting](https://github.com/OpenCageData/address-formatting)

Here's a post I wrote a week ago on the regions we need help with, though since
then we've started making good progress on Arabic-speaking countries.
[http://blog.opencagedata.com/post/138991962708/an-update-on-...](http://blog.opencagedata.com/post/138991962708/an-update-on-address-formatting-help-needed)

Feel free to ping me if you'd like to get involved. Thanks.

~~~
bojanz
I have a project with similar aims:
[https://github.com/commerceguys/addressing](https://github.com/commerceguys/addressing)

It uses the Google dataset, which is in the public domain. Might make sense
to compare formats?

Btw, I love how your worldwide.yaml handles both the deduplication and the
redirection for subterritories (such as Vatican City).

~~~
kdeldycke
I was planning to use the same dataset in my Python module too:
[https://github.com/scaleway/postal-address](https://github.com/scaleway/postal-address)

And thanks @bojanz for asking Google about their data's license! :)

I haven't looked in detail at your PHP addressing module, or even at Libpostal,
but I feel there should be ways to deduplicate efforts and converge all the
datasets, both for testing and for i18n/l10n.

In the meantime, OpenCageData's language-neutral YAML structure for
address-formatting seems quite nice.

------
pella
Another open training dataset:
[https://openaddresses.io/](https://openaddresses.io/)

