Frank's Compulsive Guide to Postal Addresses (columbia.edu)
85 points by pera 7 months ago | 19 comments

I sometimes contribute to OpenAddresses, a project to collect address data for the entire world from official sources. It's remarkable how complex address schemas are, and how hyper-local.

We're used to simple things in the United States like "123 Maple Lane", but even those addresses can be awfully complex. And then you get oddities like Portland's 0234 SW Bancroft St; the leading 0 is significant. Hawaii addresses look like "96-3208 Maile St, Pahala".

Of course it gets much more complicated internationally. India has addresses like "291, HA Block, Sector III, Salt Lake City, HA Block, Sector III, Salt Lake City, Kolkata". And much of the world doesn't have any regular address system at all, going instead by navigational systems like "third left after the banyan tree, down the road by the dairy."

Previously on HN, a related article: https://news.ycombinator.com/item?id=8907301

> the leading 0 is significant.

Word to the wise: Don't store numeric address information in integer fields. I sometimes see ZIP codes with missing leading 0's.

My rule of thumb is that unless I'm going to do arithmetic with it, it probably ought to be a string.
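A minimal Python sketch of that rule of thumb. The sample values and variable names are illustrative, not taken from any real dataset:

```python
# Sketch: why ZIP codes and house numbers belong in string fields.
# All sample values are illustrative.

zip_as_text = "02134"          # a ZIP code with a leading zero
zip_as_int = int(zip_as_text)  # what an integer column would store

print(zip_as_int)              # 2134: the leading zero is gone
print(str(zip_as_int))         # "2134": converting back does not restore it

# Zero-padding can patch 5-digit ZIP codes after the fact...
print(f"{zip_as_int:05d}")     # "02134"

# ...but it cannot work for house numbers, where the leading zero is
# significant and the intended width is unknown.
house_a = "0234"
house_b = "234"
print(house_a == house_b)            # False: distinct as strings
print(int(house_a) == int(house_b))  # True: the distinction is lost as integers
```

Padding only works when the field has a known fixed width, which is why it rescues ZIP codes but not street numbers.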

When I was in Nicaragua you’d see addresses like ‘down the street from where the Mercado used to be’.

I have received mail from a variety of rural USA locations, and many times the Postmaster is just one person, running the local shop. They get to know you, so informal or incomplete addresses may still work.

But if it is non-standard, it will be delayed or returned or just thrown into storage.

I love the openaddresses.io project and use it as a data source for geocode.xyz (https://geocode.xyz/legal). It seems, however, that it has stagnated in recent years. After some healthy initial growth, the number of addresses in the dataset has been hovering at just under 500M globally (496,689,649 as of the last count). Has the project been abandoned?

It still appears to be active on GitHub[0][1]. Perhaps all the low-hanging fruit has been picked. As mentioned in the contributing guide[2], they only use authoritative data sources.

"OpenAddresses is a collection of authoritative data for address locations around the world. We collect address data from authoritative sources; we do not create our own data. A source is a location where authoritative address data can be found. Examples might be a downloadable CSV file or live ArcServer feature service hosted by a national postal service, a state GIS department, or a county property parcel database."

[0] https://github.com/openaddresses/

[1] [for posterity] https://web.archive.org/web/20180721063519/https://github.co...

[2] https://github.com/openaddresses/openaddresses/blob/master/C...

No, it's not abandoned at all; there's a lot of active work finding new sources. You can see a log of commits here: https://github.com/openaddresses/openaddresses/commits/maste...

If the number doesn't go up as fast as it used to, it's probably because the project has already collected all the addresses that were easy to find. Denmark's 3,689,469 addresses, for instance, are in a single CSV file. We found it once and that was it (although we do update the data every few days). The work now is mostly in finding much more challenging address data buried in databases, and in convincing government agencies to let us have the data.

Indian addresses are some of the worst. (I grew up there, so I know).

The article lists some guidelines here => http://www.columbia.edu/~fdc/postal/#india

In reality though, people use all kinds of hints and landmarks in the address [1].


TACTV Arasu Sevai Center
O/o J.Er,No.88,Durgadevi Nagar,
6th Cross Street,Extension,
Behind ECI Matric Higher Secondary School,

[1] Source: http://www.chennaicorporation.gov.in/images/ucsc.pdf

People inject their own "Old this, old that", or "Next to Mari Amman Koil" (the name of a temple or landmark), or "Behind this school / that landmark", etc.

This is compounded by the fact that a lot of South Indian town and district names are reeaaaalllly long, and I often run out of data entry space when making bookings for my fam in India on online travel sites.

Ex: My cousins live in a town called Thillaiganganagar (17 chars). Another friend of my gramps lives in a place called Thiruvananthapuram (18 chars).

Postal addresses are really complicated and this page makes them look far simpler and more consistent than they really are. I used to deal with these all the time when working with geocoding at Google Maps, and I ended up braindumping a lot of what I learned the hard way into Wikipedia, because for many countries there's very little documentation available in English:



Something that could also go into the Japanese addressing system article (https://en.wikipedia.org/wiki/Japanese_addressing_system):

Ryūgasaki-shi in Ibaraki is a Japanese town that doesn't use cho, so you get addresses like: 〒301-8611 茨城県龍ケ崎市3710番地 (3710 Banchi, Ryūgasaki-shi, Ibaraki-ken 301-8611)

Japanese news video about this oddity: https://www.youtube.com/watch?v=8MfFnpx7e6k

Apparently they neglected to adopt 町名 (chōmei) subdivisions when the town was promoted from machi to shi in 1954.

It took me about 10 tries to put my address in the US ESTA (Visa waiver) system.

First, it said the Ø in København was no good, so I changed the whole thing to Copenhagen. Then it changed that to Koebenhavn. Then it complained about the æ in the street name, which I changed to ae.

Then it kept saying "Street number invalid" or something. The building is about three years old, but the number wasn't in their database.

I tried several combinations of splitting the building number and apartment number between three boxes before one passed validation.

Then my application was "held for manual review".

I left the bit on social media aliases blank.
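The substitutions that form ended up wanting (ø to oe, æ to ae) can be sketched in Python as a plain lookup table. The table below is an illustrative assumption, including the å entry, and is not the ESTA system's actual rule set:

```python
# Sketch: folding Danish letters to ASCII with an explicit table.
# The mapping is an illustrative assumption, not the ESTA system's rules.

ASCII_FOLD = {
    "ø": "oe", "Ø": "Oe",
    "æ": "ae", "Æ": "Ae",
    "å": "aa", "Å": "Aa",
}

def fold_to_ascii(text: str) -> str:
    """Replace each mapped letter and leave every other character unchanged."""
    return "".join(ASCII_FOLD.get(ch, ch) for ch in text)

print(fold_to_ascii("København"))  # Koebenhavn
```

An explicit table is needed because ø and æ have no Unicode decomposition, so the usual trick of normalizing and stripping combining marks leaves them untouched.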

The page is now giving a 404 error (maybe it was removed after the traffic spike?). But here's the most recent archive.org mirror: https://web.archive.org/web/20180430071524/http://www.columb...

There's also the UPU S42 standard for a common vocabulary for postal addresses. A couple of years ago I was working on implementing it for a large logistics customer. Postal addresses are actually a field where XML can be put to good use. An international address would look like a piece of semistructured text, with its line breaks and other formatting preserved, and with portions of the text such as ZIP codes and names tagged with elements.
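A rough Python sketch of that idea, using the standard library's ElementTree. The element names and the address itself are invented for illustration and are not taken from the UPU standard:

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: element names and values are invented for
# illustration and do not come from the UPU standard.
ADDRESS_XML = """<address><addressee>Jane Example</addressee>
<street>Example Street</street> <number>12</number>
<postcode>01234</postcode> <locality>Exampletown</locality>
<country>EXAMPLELAND</country></address>"""

root = ET.fromstring(ADDRESS_XML)

# Tagged portions can be pulled out individually...
print(root.findtext("postcode"))   # 01234

# ...while the full text, line breaks included, is still recoverable
# by concatenating element text and the text between elements.
label = "".join(root.itertext())
print(label)
```

The point of the mixed-content layout is that the printable label and the structured fields live in one document, so neither has to be regenerated from the other.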

We don't really use postal codes in Romania, at least not in big cities (I live in Bucharest). The reason is that nobody really knows them. Every time I have to fill in a form on a non-Romanian website that has the postal code as a mandatory field, I have to actually look up which postal code is attached to my address.

We use the street name, the street number, the building's name if needed (for example, my building/block of flats is named "12", but the building sits at number 91) and the entrance name (which is usually a capital letter, starting from A). And of course the apartment number. For individual houses it is simpler: you just need the street name and the street number. For houses located in small villages you just need the village name and the person to whom your letter/package is addressed; the postman knows how to get to that person.
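One way to model the components listed above is a record where only the street and number are required. This is a sketch; the class name, field names, street name and apartment number are all made up for illustration:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class UrbanAddress:
    """Components from the description above; all names are illustrative."""
    street: str
    street_number: str                    # kept as text, not an integer
    building_name: Optional[str] = None   # e.g. a block of flats named "12"
    entrance: Optional[str] = None        # usually a capital letter
    apartment: Optional[str] = None
    postal_code: Optional[str] = None     # optional: often nobody knows it

    def describe(self) -> str:
        """Join the components that are present, skipping the missing ones."""
        labelled = [
            ("street", self.street),
            ("number", self.street_number),
            ("building", self.building_name),
            ("entrance", self.entrance),
            ("apartment", self.apartment),
            ("postal code", self.postal_code),
        ]
        return ", ".join(f"{name} {value}" for name, value in labelled if value)

flat = UrbanAddress("Strada Exemplu", "91", building_name="12",
                    entrance="A", apartment="7")
house = UrbanAddress("Strada Exemplu", "5")

print(flat.describe())   # street Strada Exemplu, number 91, building 12, entrance A, apartment 7
print(house.describe())  # street Strada Exemplu, number 5
```

Making the postal code optional in the model is the design choice that matters here: a form that insists on it forces people to look it up.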

I hope I never have to write a concept for the UI and validation of international addresses again...

The use of plus codes for addresses would simplify mail delivery and vehicle routing in many parts of the world. https://plus.codes/

I just tested this on Google Maps. Dropped a pin, copied the plus code from the details section, pasted it again in the search box and hit Search. No results!

Tried it a couple more times for slightly different locations, but hit the same issue.

This makes it seem like it’s not ready for prime time.
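For context on what the plus codes discussed above contain: a standard 10-digit code is a base-20 encoding of latitude and longitude, interleaved in pairs. Below is a simplified Python sketch of that encoding. It is not the reference Open Location Code library, the function and constant names are illustrative, and it omits the extra grid-refinement digits and code shortening:

```python
import math

# Simplified sketch of 10-digit plus code encoding.
# Not the reference library; names are illustrative.

ALPHABET = "23456789CFGHJMPQRVWX"   # the 20 symbols used in plus codes
CELLS_PER_DEGREE = 8000             # the finest pair resolves 1/8000 of a degree
SEPARATOR_POSITION = 8              # the "+" goes after the eighth symbol

def encode_plus_code(latitude: float, longitude: float) -> str:
    # Shift both coordinates to non-negative ranges and count grid cells.
    lat_cells = math.floor((latitude + 90.0) * CELLS_PER_DEGREE)
    lng_cells = math.floor(((longitude + 180.0) % 360.0) * CELLS_PER_DEGREE)
    # Clamp so the poles and rounding edge cases stay inside the grid.
    lat_cells = min(max(lat_cells, 0), 180 * CELLS_PER_DEGREE - 1)
    lng_cells = min(lng_cells, 360 * CELLS_PER_DEGREE - 1)

    # Peel off base-20 digits from least to most significant; each pair
    # is one latitude symbol followed by one longitude symbol.
    pairs = []
    for _ in range(5):
        pairs.append(ALPHABET[lat_cells % 20] + ALPHABET[lng_cells % 20])
        lat_cells //= 20
        lng_cells //= 20

    code = "".join(reversed(pairs))
    return code[:SEPARATOR_POSITION] + "+" + code[SEPARATOR_POSITION:]

print(encode_plus_code(47.365590, 8.524997))  # 8FVC9G8F+6X
```

Because the code is derived purely from coordinates, it identifies a small grid cell rather than a building or a mailbox.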

There’s something oddly satisfying about skimming through all these addresses in Courier New. Maybe I was a mailman in my previous life.
