Falsehoods programmers believe about addresses (mjt.me.uk)
144 points by michaelt 1427 days ago | 128 comments



The article remains Western- (and especially UK-) centric in outlook; a more international business will probably find plenty of others. For instance, Japan's addresses disprove:

* The address format is uniform across the country (Sapporo has its own, and Kyoto uses an alternative system on top of the standard one)

* Addresses go from the most specific to the least (e.g. flat number to postal code); Japan is the exact opposite

* Addressing systems don't change (Japan's was reformed in 1998)

* Building numbers are street-based (Japan's are block-based)

It also expands on things like "addresses will have a street": standard Japanese addressing is subdivision-based, so addresses provide the prefecture, the ward (~county), the district and the city block.

Except for Sapporo and Kyoto (see 1): Kyoto allows denoting blocks by the intersection of two streets and the position relative to that intersection (north, south, east or west). One reason is that some wards have multiple districts with the same name... Sapporo uses a system where blocks are addressed by their distance (in blocks) and direction from the city center, so you might be told the address is "the 4th building, 3 blocks north and 5 blocks east".


US addresses exhibit neither increasing nor decreasing specificity. Apartment numbers are usually written in the middle, like "30 Tristan Way, Apt. 107, Gwyneth VA 27384". This never seemed strange to me until I saw it being done more sensibly elsewhere.

Singapore is another good one: its postcode scheme changed twice in less than a century, and being a city-state, there is nothing useful to put in a "city" field (mail is often sent to "Singapore Singapore", or even "Singapore Singapore Singapore").


This really bothered me when I had mail shipped to me in Shanghai from Amazon (US). Amazon forces you to provide a state. Shanghai is its own administrative unit, at a level with Chinese provinces. So my address ended up being "<street>, Shanghai, Shanghai China".

I've seen natives handle that problem by filling in their district in the "city" field, so e.g. in my case "<street>, Minhang district, Shanghai China", analogous to e.g. "<street>, Nob Hill, San Francisco, USA".


> Singapore [...] being a city-state, there is nothing useful to put in a "city" field

Good point, which should apply to most city-states (e.g. Monaco, or the Vatican)


...not to mention the substantial number of rural addresses that don't include city names in most countries.


>US addresses exhibit neither increasing nor decreasing specificity. Apartment numbers are usually written in the middle, like "30 Tristan Way, Apt. 107, Gwyneth VA 27384". This never seemed strange to me until I saw it being done more sensibly elsewhere.

Best to think of it as an onion. On the outer layers (beginning and end) automated and semi-automated sorting can be applied most easily (the ZIP code was designed for automated sorting and routing). Apartment and suite numbers in the middle are handled (usually) manually by carriers already at or near the service address.


That's a very verbose way of writing an apartment... In Australia, we'd write the first part of that address as 107/30 Tristan Way.


And in Sri Lanka it would be written the other way around - 30/107 Tristan Way.


Or for Singapore, I've seen people put in "China" for the country field. But that was just someone from the US trying to ship to us..


...worse, Japan's addresses are chronologically assigned! You're essentially limited to a big lookup table by the structure of the system.

This is both a comical inconvenience for travelers (try finding an address without speaking the language), and a huge problem for porting any sort of local-data application to Japan.


I think the UK examples are quite good though. They show that you don't even have to change language to run into massive problems with addresses. More than being anglo-centric, the issue of addresses is often even more specific, i.e. US specific.

Japan et al. add another layer of complexity (other text encodings, etc.), but even in the seemingly simple case of English being used, it's not quite as simple as many forms make it out to be.


For those that want an example:

100-0001 (ZIP Code) 東京都 (Tokyo-to, Prefecture) 千代田区 (Chiyoda-ku, City) 千代田1−1 (Chiyoda 1-1, Area Block-Subblock)
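
A rough Python sketch of the ordering point, using the example above. It assumes components are stored largest area first and simply rendered in either direction; the labels and the `render` helper are made up for illustration and are not any postal standard.

```python
def render(components, largest_first):
    """Join (label, value) pairs that are stored largest area first."""
    values = [value for _label, value in components]
    if not largest_first:
        values = list(reversed(values))
    return " ".join(values)


# Components taken from the example above; the labels are hypothetical.
tokyo = [
    ("postal_code", "100-0001"),
    ("prefecture", "東京都"),
    ("city", "千代田区"),
    ("block", "千代田1−1"),
]

japanese_style = render(tokyo, largest_first=True)
western_style = render(tokyo, largest_first=False)
```

The same stored data yields both orders, so the display order becomes a rendering decision rather than something baked into the schema.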


I'd really like people to turn these into specialized wikis or Github repos or something.

A format of:

  Statement of Falsehood
  Explanation / Description
  Counterexamples
  Suggested mitigations
would be fine. It'd make it possible for us to submit ones that aren't apparent to the originator.


No mention of unicode, either. It might not be a huge problem in Japanese (unless the system silently ignores unicode, and gets all the way to shipping with a blank address field), but romanizing Chinese characters can result in very unreadable text (for a pathological example, see http://en.wikipedia.org/wiki/Lion-Eating_Poet_in_the_Stone_D...)


The biggest takeaway for me here should sound familiar: don't overengineer your data model. The more decisions you make about specific properties, the more special cases you will need to handle (and those are a continuing source of frustration for both programmers and users down the line).

In this case, most of these issues can be avoided by allowing a free text field instead of dedicated fields for street name, number, apartment, floor, state, county, district, and whatever.


The typical problem with what you're advocating is that in an enterprise environment typically your application is communicating with a number of external data sources that already make a bunch of assumptions about data, whether they're reasonable assumptions or not, and they typically expect that you'll be able to return them their data at the same level of granularity as they've provided it to you. E.g. if your client's 70s COBOL application gives you a list of their customers with addresses including a postcode field and then your application filters that or something and sends back a second list, it won't be acceptable to give them back addresses with the postcode simply concatenated into a text field with the other parts of the address.

The correct way to handle this is to be able to dynamically accept and store arbitrary granularity of data (and be able to translate that into a single text field at runtime if needed).


Then you will have a problem if you want to filter the entries. You will need a good parser to identify that "Paris" means the capital of France or one of the many in the USA[1].

[1] http://en.wikipedia.org/wiki/Paris_%28U.S.A.%29


> In this case, most of these issues can be avoided by allowing a free text field instead of dedicated fields for street name, number, apartment, floor, state, county, district, and whatever.

The problem is that free form text fields are the opposite of over-engineering, because you're exchanging complexity at input time for the impossible task of parsing what the user typed in later on. Garbage in, garbage out.

An optimal way to tackle complex addresses is allowing the user to fill a free-form address, but annotate with meaningful metadata so you have useful data. Something like this:

  [Number ] [24]
  [Street ] [Westover Lane]
  [City   ] [Palm Coast]
  [State  ] [Florida]
  [Zipcode] [32164]

[+ Add another field]
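
Something like the following might back that form. This is only a sketch in Python, assuming the address is kept as an ordered list of user-labelled fields; the `Address` class and its method names are invented here.

```python
from dataclasses import dataclass, field


@dataclass
class Address:
    # Ordered (label, value) pairs exactly as the user entered them.
    fields: list = field(default_factory=list)

    def add(self, label, value):
        self.fields.append((label, value))

    def get(self, label):
        """Return the first value with this label, or None if absent."""
        for name, value in self.fields:
            if name == label:
                return value
        return None

    def as_text(self):
        """Flatten to a single free-text line, preserving entry order."""
        return ", ".join(value for _name, value in self.fields)


addr = Address()
for label, value in [
    ("Number", "24"),
    ("Street", "Westover Lane"),
    ("City", "Palm Coast"),
    ("State", "Florida"),
    ("Zipcode", "32164"),
]:
    addr.add(label, value)
```

Code that needs a specific part asks for it by label and has to cope with `None`; code that only prints a label uses `as_text()`.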


In the U.S., if you're mailing a large number of items, you can get significant discounts for providing the mail to the post office pre-sorted by the 9-digit ZIP code. To take advantage of that, your address schema would at least need to be able to extract that information.

One compromise would be to impose an address schema by default, but allow users whose addresses are incompatible with it to opt out of it and enter a free-form address.
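
As a sketch of the presorting idea in Python: pull a ZIP or ZIP+4 off the end of a free-text US address and sort on it. The regex, the helper name and the sample +4 digits are assumptions for illustration; the actual USPS presort rules are not modelled here.

```python
import re

# Matches a 5-digit ZIP with an optional +4 suffix at the end of the text.
ZIP_RE = re.compile(r"(\d{5})(?:-(\d{4}))?\s*$")


def zip_sort_key(address_text):
    """Return (zip5, plus4) as strings, or None when no ZIP is found."""
    match = ZIP_RE.search(address_text)
    if match is None:
        return None
    return (match.group(1), match.group(2) or "")


# Sample data; the +4 suffixes are invented.
mail = [
    "24 Westover Lane, Palm Coast, FL 32164-1234",
    "30 Tristan Way, Apt. 107, Gwyneth VA 27384",
    "1 Example Street, Sampletown, MA 02134-0001",
]

# Addresses without a recognisable ZIP sort first so they can be reviewed.
presorted = sorted(mail, key=lambda text: zip_sort_key(text) or ("", ""))
```

The key stays a pair of strings, so a ZIP such as 02134 keeps its leading zero.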


There's a fine line between being flexible in what you accept for input and parsing, and writing another programming language.


Another point I'd like to add. Quite a few addresses in India tend to have popular "landmarks" listed with them. Something like: Opposite <a popular establishment> or Near <a well known park> etc. This makes it very easy to physically locate them!

Here's an example from Citibank India website:

     Citibank,
     No. 91, Prestige South End,
     South End Road, Opposite Surana College,
     Jayanagar, Bengaluru - 560 004


I regularly see these... petrol pumps and bus stops seem popular reference points:

State Bank of India, No.997, Opposite To Bus Stop, Service Road, 4th Cross, 9th Main, Rpc Layout, Vijayanagar, Bangalore - 560040


I had no idea - I really like that and I think we should all adopt this convention. I could use "opposite beautiful landscaping" for my address, not sure what my neighbor would put in.


Some additional things I've seen:

* The address encodes where the building is physically located.

* It does not matter whether the address is written as "name, company" or "company, name".

* There is a unique mapping between postal codes and geographical areas (or post offices). I've seen buildings that have a different postal code depending on which shipping service you use, and multiple discontinuous areas that share the same postal code.

* An address has exactly one city/town (although in the Czech case most such addresses do not have a street).

* An address contains non-Latin characters used in only one language.

* Non-Latin characters in an address can be mapped to ASCII unambiguously.

* An address is always written in order from most specific to least specific.

* Buildings whose numbers differ only in a letter suffix (33a vs. 33e) are near each other. Alternatively: such a suffix does not matter.

* There is a single building number per address. (Czech addresses have two building numbers; usually only one is written in the address, but for some places you either have to write both or indicate which one you mean.)


I would like to add one point:

* Addresses can have accented or other foreign characters.

I ordered a package to "Fäkéstreet", but it arrived addressed to "Fäkéstreet", luckily the postman was smart enough to decipher it.

I have also run into the address length issue, even though my address is only 54 characters long.
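
One plausible way a street name like that gets mangled (a guess at the mechanism, not a claim about what that particular shop did) is UTF-8 bytes being read back as Latin-1. A minimal Python illustration:

```python
street = "Fäkéstreet"

# Encode correctly as UTF-8, then decode the bytes with the wrong charset.
mangled = street.encode("utf-8").decode("latin-1")

# If nothing else touched the text, the damage can be reversed.
repaired = mangled.encode("latin-1").decode("utf-8")
```

Each accented letter turns into two characters, which also makes the address longer and more likely to hit a length limit.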


What's really depressing is I've had that happen when ordering from a Swedish company. I can kind of understand that US companies get it wrong, but when a company can't deal with all the letters in the alphabet of the country it's based in you know something is fucked.


One of the issues is actually that web browsers can have inconsistent encoding of the data they send, and depending on the amount of testing (across browsers) done that can yield surprises.

For instance, the "unicode snowman" is because MSIE 5-8 will refuse to send a form as UTF-8 (completely ignoring `accept-charset`) if it can encode everything to Latin-1. Conversely, most browsers will default to UTF-8 (but I believe normalization may vary). If the system was built in the early 00s and only tested in MSIE, it might well expect all input data as Latin-1 (because that seemed to work at the time) and crap out when UTF-8 comes in.


What does "unicode snowman" have to do with this?


Some websites now include a hidden input field in all forms: <input type="hidden" name="snowman" value="&#9731;" />

to convince IE that it's supposed to be sending UTF-8, not Latin-1 (and so the site can recognize if the input was likely mangled).
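
The server-side half of that trick could look roughly like this in Python. It assumes the form arrives as an already-decoded dict; the `looks_mangled` helper is hypothetical, and only the field name `snowman` comes from the markup above.

```python
SNOWMAN = "\u2603"  # the character that &#9731; refers to


def looks_mangled(form):
    """True when the hidden sentinel did not survive the round trip intact."""
    return form.get("snowman") != SNOWMAN


good_form = {"snowman": "\u2603", "street": "Fäkéstreet"}

# Simulate a request whose UTF-8 bytes were decoded as Latin-1 somewhere.
bad_form = {
    "snowman": "\u2603".encode("utf-8").decode("latin-1"),
    "street": "Fäkéstreet".encode("utf-8").decode("latin-1"),
}
```

A missing sentinel is treated as suspicious too, since `form.get` returns `None` in that case.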


It's built into Rails, except they use utf8=✓ now.


I am from Brazil...

Most programmers here assume US-like addresses.

And government forms frequently require a specific style of address (Street - building number, city, state, zipcode).

But in Brasilia (Brazil's capital) it already breaks.

A typical address there for example is: Building 3, Apartment 12, Block H2, Residential Section, North Wing, Brasilia

As you can see, Brasilia does not have street-based addresses; in fact most streets there are not named at all. The city was designed on purpose to allow this new type of addressing, which the city planners thought was better.

Also, zipcode precision is almost random: I once lived in a building with 20 apartments that had its own zipcode, I also lived in a city of 150,000 people that had only one zipcode, and I know a building that has three zipcodes.

And there are zipcodes for streets, neighborhoods, and several other random things.

And the coolest feature of all: Brazilian zipcodes can start with a zero, breaking every stupid database that stores them as a number instead of a string.

Also, not only can streets have more than one name, it is possible for the post office to have the WRONG name.

I lived once in a street named "Gracia Mauro Chieni", and for many years it was listed on post office as "Graça Maria Cheni"

After nagging them a lot, they fixed...

It was now: "Gracia Mauro Cherene"

And of course, sites that used the post office company API to find the zipcode worked great with the name of that street ;) (NOT)

Oh yeah, remember the street I mentioned, the one with two wrong names in the post office? Well, in the power service registry (That in Brazil is valid as address proof and some other address related things), the name of the street is "H", yes, just one letter... It was the original name of the street when it was first opened, and they never bothered to fix it in their database.


There are US zip codes that start with 0 as well.


Which Excel loves to silently turn into integers and subsequently destroy your zip code column. Not. Fun.
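
The same failure in a few lines of Python, using made-up sample codes: once a postcode has been through an integer, the leading zero is gone, and hyphenated formats never fit in the first place.

```python
raw = "02134"

as_number = int(raw)            # what a numeric column effectively stores
round_tripped = str(as_number)  # the leading zero is lost

# Zero-padding only repairs it if you already know the intended width.
padded = str(as_number).zfill(5)

# A hyphenated code cannot be stored as an integer at all.
try:
    int("100-0001")
    hyphen_ok = True
except ValueError:
    hyphen_ok = False
```

Treating postcodes as opaque strings avoids both problems.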


Those excel "features" always seem to be breaking something. The other recent one that comes to mind is SKUs being converted to scientific notation.


Your address for Brasília looks strange to me. Typically I recall them being like:

SQS 102, Bloco A, Apt 100 Brasília, DF 70000-000

That is SQS or SQN and the number indicating which wing and which section. And the Block and Building are also the same. Are there places in the city that actually require the long format you describe?


I was using this from memory, but yes, your short form is correct.


> And the coolest feature of all: Brazilian zipcodes can start with a zero, breaking every stupid database that stores them as a number instead of a string.

Considering the number of countries with alphanumeric zipcodes, or zipcodes with interspersed signs (e.g. Japan is \d{3}-\d{4})... the mind boggles.


Another falsehood:

* Street numbers are integers.

A house near my university was numbered something like 3 1/2. It was between 3 and 5, and 4 was taken by a house on the other side of the street. I'm also sure I've seen houses with 1/4 and 3/4 in their numbers. Platform nine and three quarters doesn't need to be magical!

I once lived in a house that was numbered 50-2. A lot of U.S. websites wouldn't accept the hyphen. Fortunately, I could just truncate it to 50 because there were only a handful of houses with the same prefix and the postman knew where my family lived.

I also once lived at an address where the street number had the suffix "B".
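
If you still want numbers like these to sort sensibly, one option is to keep them as text and derive a sort key. A Python sketch; the helper name and the sample "12B" are invented, and the ordering rule is just one reasonable choice.

```python
import re


def house_number_sort_key(number):
    """Sort by the leading integer when there is one; raw text breaks ties."""
    match = re.match(r"\d+", number)
    leading = int(match.group()) if match else -1
    return (leading, number)


numbers = ["5", "3 1/2", "50-2", "12B", "3"]
ordered = sorted(numbers, key=house_number_sort_key)
```

Nothing is rejected or truncated: "3 1/2" lands between "3" and "5", and the stored value is still exactly what the resident typed.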


There are a lot of 1/2 addresses here. Usually happens when a house that was originally a single address has a room (or several) divided off and rented out as a separate dwelling. So that can become "320 1/2 S. Lincoln Ave" or whatever. Sometimes though, in the same situation, separate apartment numbers will be used on the original address. I'm not sure what drives this one way or the other.


A friend just moved to house number 1 on his street. To the left of him is the typical number 3. To the right, someone added a duplex, which provides two units: 1A and 1B.


Australian blocks that have been sub-divided (Turned into some form of Duplex) will often have an A and a B. So you will have 17A Station St.

I also lived on a street without numbers (~15 years ago). It was a small country town with only 5 houses on the street, and the postman knew who we were.


Grand Junction, Colorado has all of its streets numbered by how far away they are from the Utah border. Basically all of the numbered streets are a fraction of some kind.


It seems to include one in itself, with:

> The user will know their postal code/zip code.

Which assumes that the user has a postal/zip code. For a counter example, all addresses in the ROI outside of Dublin do not have postcodes. (And most online sites won't validate Dublin postcodes either, being of the form D4 or Dublin 4 for example)


Same for the 7M+ inhabitants of Hong Kong: no-one uses postcodes, and it becomes really bothersome when most websites asking for your address insist on you entering one...


Not sure how the folk in Hong Kong handle that, but as an Irishman: I always use the province (Munster) in online forms which insist on a postcode.

Which works fine on the envelope, postman hardly notices, but leads to another one for the believing programmers: a postcode field contains a postcode


One of my favorite address stories: the Russian Post Office successfully delivered a package with an address hand-written with the wrong character encoding.

https://en.wikipedia.org/wiki/File:Letter_to_Russia_with_kro...


Another fun story, this time just involving repeated escaping and mangling of an address:

http://www.jwz.org/blog/2013/05/i-resemble-this-remark/


The example of "GB Technical Services, Unit W7a, Warwick House, 18 Forge Lane, Minworth Industrial Park, Minworth, Sutton Coldfield, B76 1AH, United Kingdom" could still reasonably be written within five lines this way:

GB Technical Services

Unit W7a, Warwick House, 18 Forge Lane

Minworth Industrial Park, Minworth

Sutton Coldfield, B76 1AH, United Kingdom

I don't think British people would actually write it as eight lines. Does anyone have a better example of an address that does not reasonably/conventionally fit on five freeform lines?


You could compress it a bit, but the standard is to write the postcode and the country name on lines of their own, and usually the post town too (that's "Sutton Coldfield" here). So trying to compress down to five lines would require really weird formatting (and possibly confuse the automatic postcode-reading hardware).

(http://www.royalmail.com/personal/help-and-support/How-do-I-... specifically wants postcode on a line of its own.)


You could get away addressing that as:

GB Technical Services

Unit W7a

Warwick House

18 Forge Lane

B76 1AH

The postcode effectively encodes the post town and county. It may not be easily 'human readable' but Royal Mail will deliver it no problems.


Very few experienced programmers will have these beliefs. Addresses are the most fucked up personal information that we have to deal with. Names run a close second.

Over beers a few of my mates came up with a system like DNS to map a mailing domain for an individual/organization to a physical location. But after much discussion we decided it was just easier to let the geo-challenged lease postal boxes with saner locations.


I'd say that in the US Social Security Numbers run neck-and-neck with names for causing problems, though the data is less screwed up. The problem is that they look simple, unique, and universal so they're an obvious primary key... but only the first of those is close enough for most uses. Duplicate SSNs have been issued often enough that I've run into multiple collisions in a single production DB (and confusing two people on electronic home arrest can be pretty bad). Even more common are people who simply don't have one - religious objectors, non-citizens who haven't acquired them (though some have them), etc. The net result is that people keep using them as primary keys, as join keys, and as unique constraints and then get bitten hard.
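
A minimal sketch of the alternative, using Python's built-in `sqlite3`: give each person a surrogate key and store the SSN as an ordinary, nullable, non-unique column. The table layout and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE person (
        id   INTEGER PRIMARY KEY,  -- surrogate key, assigned by the database
        name TEXT NOT NULL,
        ssn  TEXT                  -- nullable and deliberately NOT unique
    )
    """
)

people = [
    ("Alice Example", "000-00-0001"),
    ("Bob Example", "000-00-0001"),  # duplicate SSN, as described above
    ("Carol Example", None),         # no SSN at all
]
conn.executemany("INSERT INTO person (name, ssn) VALUES (?, ?)", people)

row_count = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
distinct_ids = conn.execute("SELECT COUNT(DISTINCT id) FROM person").fetchone()[0]
```

All three rows go in, and joins can use `id` without ever depending on the SSN being present or unique.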


> Very few experienced programmers will have these beliefs.

You're over-generalizing your personal experience: I've run into quite a few experienced programmers who just haven't spent enough time working on geo code to understand how treacherous it can be, particularly if their past work hasn't been sufficiently international. This is particularly bad with low-level / overly-academic developers who think the problem is that previous attempts failed because they didn't model addresses sufficiently rigorously.


> Over beers a few of my mates came up with a system like DNS to map a mailing domain for an individual/organization to a physical location.

Isn't that what ZIP codes should have been?

In the UK my postcode is limited to not many addresses. (Mine is 10, but they can be more.) The Post Office database is widely used. It's not always particularly accurate. While there's no official database of addresses the Post Office database is pretty close.


In France postcodes usually span several towns, or one medium city, and only larger cities have several postcodes (three for my 300 000 inhabitants city, for example).

They are not very precise, which is to be expected since there are only about 100000 possible postcodes in our scheme.


The US used to have five-digit postcodes, now has nine-digit ones, but laypeople mostly ignore the extra four digits. Singapore used to have two-digit codes, then upgraded to four, then to six. Maybe French postcodes will grow another digit or two someday.


The U.S. system is nice in that most people only ever need 5 digits. The USPS reduces fees for their large customers that use all 9 digits, so they still end up with the majority of their mail being very specific, while those just sending a letter to a relative can just use the 5-digit code and let the destination post office take care of re-sorting it before delivery.


This really makes me wonder about the practice of "anonymizing" data down to birthday / sex / zip


It's not anonymous enough. I pointed out that this fact was buried in a story a while ago on HN [1]; it had a grand total of 6 upvotes and mine was the only comment.

Moral: The best content doesn't always stay on the front page for any length of time, you should check out /newest once in a while too.

[1] https://news.ycombinator.com/item?id=5609470


I saw that; I was aware that such data could be fairly reliably deanonymized. But with a country norm where zip codes frequently contain 10 people, who would ever imagine that just zip, with birthday / sex stripped, would be even slightly anonymous?


Wow. That's insane.

As a software developer, I'd like to propose some changes in the world.

1) Addresses should be the very precise lat/long of the mailbox + the name of the recipient. Nothing else.

2) There should be one time zone for the whole world and no daylight savings. Time should be 24-hour-based. Perhaps you'll need to set your personal alarm for 19:00 in the "morning". Tough.

3) Nobody is allowed to change their names.

4) Whoever keeps sawing outside our building should stop.

That is all for now. I look forward to the complete capitulation of all world leaders to my frank good sense.

Thank you.


I generally agree. Regarding #1, that should be the lowest level available, akin to IP addresses on the internet. There should be another layer of indirection available, so that I can e.g. register "mike@mikeash.com" with the post office and have it work for physical mail. That way I don't have to notify anybody else when I move, and I don't have to tell people where I live just so that they can send me mail.
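
A toy Python version of that indirection layer, purely to illustrate the idea: an alias resolves to whatever location is currently registered. The functions are hypothetical and the coordinates are invented.

```python
# alias -> (latitude, longitude) of the current mailbox; all values invented.
registry = {}


def register(alias, lat_long):
    """Point an alias at a new physical location, replacing any old one."""
    registry[alias] = lat_long


def resolve(alias):
    """Return the current location for an alias, or None if unregistered."""
    return registry.get(alias)


register("mike@mikeash.com", (40.0, -75.0))
register("mike@mikeash.com", (41.0, -76.0))  # after a move
current = resolve("mike@mikeash.com")
```

Senders only ever hold the alias, so a move is a single update on the recipient's side.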


Once we adopt a single time zone, I also propose using the same core work hours everywhere in the world. That way, if you need to schedule a conference call with someone on the other side of the world, no one needs to come in early or stay late. Also, it will eliminate jet lag when traveling.


That's fine, until the next major earthquake invalidates a whole mess of addresses. Also, long term this won't work as the continents do drift ...


You left out adopting the metric system.


Ah, so you could never move a mailbox?


Or have mailboxes stacked on top of each other. Has GP never visited a mail room?


Japanese addresses are also unusual in that they generally don't involve streets (or street numbering) at all but are instead based around city districts and blocks.

http://en.wikipedia.org/wiki/Japanese_addressing_system


Korean addresses are similar, but they're slowly transitioning to an American-style addressing system that uses street numbers and street names.

I guess the district-based addressing system made a lot of sense back when people just built houses in random locations around the town center.


Should be "Falsehoods US programmers believe about addresses".

Anyhow true. I recall the "Apply for this position" form of a big player, open to remote working. A full US address was actually required to submit /facepalm


Which is two problems, since it implicitly discriminates against the homeless, who need a job to get an address.


Though I don't think many US programmers would think postal codes were small. My rural birthplace ZIP has a few thousand addresses and there are some ZIPs here with over 100,000. But yes indeed I learned quite a few things about the Royal Mail in the post.


My address is:

street_name 42/38/13

Yes, with slashes. It's the number of the block, then the number of the stairwell, and then the number of the flat. I cannot enter that bloody number anywhere online, every website shouts at me that it's incorrect. It's not, I think I know where I live!


My address is of exactly the same form and I can't use it to subscribe to the delivery of the magazine I read. The site doesn't allow slashes.

BTW the zip code contains the letters and the dash too.


So, the obvious temptation here is to say "fuck it" and just leave a textbox for users to put in their address.

I cannot help but feel, though, that this is inviting ruin and disaster. I'm currently dealing with some of this myself--what's the best solution other than just forcing people to use a five-line US-style address and hoping for the best?


I would look at what you need the address for and actually make the request based solely on that. Generally speaking, you ask for an address in order to send mail of some kind. Inside the US, you basically never need the city/state: they're for error-checking (wait, XXXXX isn't actually in California; it's in Iowa). I had a friend give me an address that the postman told me didn't make sense (no city field), but it still got to him fine: I'd lay good odds that the post office people local to my friend simply had a modified convention.


I thought this was about computer memory addresses. Falsehood about falsehood about addresses ;)


It's interesting how a lot of these are from the UK. In Germany, on the other hand, every address follows the same format. The only weirdness you run into is that sometimes (especially in former East German areas) the numbering of buildings isn't really sequential.


Sure, but you still have problems, even with a well-structured system. Postcodes and addresses are designed for delivering mail, but then society tries to shoehorn them into doing other things. As an example, consider the query "show me all addresses in the database in Bavaria". It sounds very straightforward, but it isn't, because postal addresses don't contain that info; you have to keep a mapping of postcodes to Länder or whatever other geo unit you want to use. Likewise there are all sorts of very well known places that have no correspondence to official postal directories; for example, here in London if I ask someone to meet me in Chinatown they will know exactly where that is. But it doesn't exist as an "official" address.
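
To make the Bavaria query concrete, here is a deliberately tiny Python sketch. The lookup table is hand-written for illustration and covers three postcodes; a real system would need a complete, maintained dataset, and the record layout is an assumption.

```python
# Illustrative only: a hand-written postcode -> Land table.
POSTCODE_TO_LAND = {
    "80331": "Bayern",
    "10115": "Berlin",
    "20095": "Hamburg",
}

addresses = [
    {"name": "A", "postcode": "80331"},
    {"name": "B", "postcode": "10115"},
    {"name": "C", "postcode": "99999"},  # not in the table
]


def in_land(address, land):
    """True only when the postcode is known and maps to the given Land."""
    return POSTCODE_TO_LAND.get(address["postcode"]) == land


bavarian = [a["name"] for a in addresses if in_land(a, "Bayern")]
```

Anything missing from the table silently drops out of the result, which is exactly the kind of gap that makes the "simple" query hard in practice.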

We (my company Nestoria - a real estate search engine in 8 markets) have to geocode tens of thousands of listings a day all with incomplete or haphazard address info. Here's a talk one of our guys gave a few months ago about how we do it. TL;DR - geocoding real world data is hard

http://www.lokku.com/nestoria-geo-system/

he's going to give a much more detailed talk at the next #geomob if anyone is in London and interested: http://geomobldn.org/post/47481139743/date-set-for-july-geom...


I think Australia also has a very rigid system. I signed up for Telstra prepaid online services, and everything was very strictly validated (including whether the street number existed), and almost everything came out of a dropdown (down to a separate one for "is the road called road, lane, street, way, etc").


Nuh-uh. I had this as my address for a while:

  5 Division
  Australian Defence Force Academy
  Department of Defence
  Canberra ACT 2600
Or another one:

  Officer's Mess
  RAAF Williamtown
  Newcastle, NSW
Both military of course. Try sending mail to someone stationed on a boat, uh sorry, ship!

I'm sure various universities, government departments, and companies have similar silly situations. For example I would be willing to bet that you get some very weird addresses in some mining operations.


Ships are usually pretty sane, at least by military standards. Officers' mail tends to be addressed to the wardroom (the Naval equivalent of the Officer's Mess). Other ranks' mail goes to their mess, which is generally designated by bulkhead number and port/starboard. (Rank is, of course, optional, and not expected on civvy mail.) The "street" is the ship's name, the city is the home station (where the mail is sorted and dispatched), and there is usually a separate postal code for each ship. So, for a fictional Canadian Able Seaman:

  Ralph Rackstraw
  19 Starboard,
  HMCS Pinafore,
  CFB Halifax, NS (, CANADA - optional)
  X1X 1X1
The system in Oz at least used to be similar. Which is great if you're actually posted to a vessel. That isn't always the case in Canada; we air types, the folks what flies and fixes the helicopters, are nominally "Air Force" and are seconded to ships for voyages only, remaining on squadron strength back at our home airfield, so unless one is vigilant about changing addresses every few weeks, important mail may be headed to your post box at the barracks while you're at sea and to the ship while you're at home.


That's not true: e.g. "27321 Finkenburg, Germany" has no street and no house number. And the Finkenburg is in western Germany, close to Bremen. It was a former Saxenschanze, an artificial hill between the Weser and the dike, that is surrounded by water several times a year.


One aspect of German addresses is rather strange, I find as a foreign resident. If a building is split into individual flats, each flat does not have its own number in the address. Letters are addressed to names in a building. We have to deal with the electricity company with designations like Erdgeschoss Links (ground floor, left). They also get rather confused if married people do not have the same surname.


Married people with different names is not a problem: you list all of the names on the name shield. This is also common with WGs (shared accommodation).

The common case that fails badly is where two people in the same apartment block share the same surname. There is no good workaround for this case, especially given the German resistance to putting first names in formal correspondence.


This was the case when I lived in Switzerland as well. My address was just my name at the building's address.

There were 12 units in the building and I don't believe they had numbers. The occupants attached their names to the mailboxes outside the building so the mail carrier knew which box to put things in.


Counterexample: the village Rutha (in Thuringia) seems to have only one un-named street. So its addresses lack the street name field.


Same in Knechtsteden, near Dormagen. It's a former cloister, currently a school, and has no street name in its address.


Any PO-Box address in Germany has an address format different from the normal addresses. Some PO-boxes only have a zip-code and the town. It's still all fairly standardized but exceptions exist.


Let's cross-check what the United Nations has to say about NAD (the UN/EDIFACT name and address segment):

http://www.unece.org/trade/untdid/d04b/trsd/trsdnad.htm

As you can see, only the 3035 "PARTY FUNCTION CODE QUALIFIER", telling if this address is a buyer, a supplier, a shipper, or something else, is mandatory and everything else is conditional.

You have 5 optional lines of "Name and address description", 5 optional lines of "Party name", 4 optional lines of "Street and number or post office box identifier", an optional "City Name", several optional "Country Sub-Entity Details", an optional "Postal Identification Code" and an optional "Country name code".

I think that's pretty flexible, allowing short addresses like "27321 Finkenburg, Germany" and long ones also. The main restriction is line length, so the address has to fit onto an envelope.
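
As a rough illustration of that "one mandatory field, everything else optional" shape, here is a minimal Python sketch. The class and field names are made up for this comment and are not taken from the UN/EDIFACT directory; only the mandatory qualifier and the line limits come from the description above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class NadAddress:
    """Hypothetical container loosely mirroring the NAD layout described above."""

    party_function_code: str  # the only mandatory element (3035), e.g. "BY" for a buyer
    name_and_address_lines: List[str] = field(default_factory=list)  # up to 5
    party_name_lines: List[str] = field(default_factory=list)        # up to 5
    street_lines: List[str] = field(default_factory=list)            # up to 4
    city_name: Optional[str] = None
    country_sub_entity: Optional[str] = None
    postal_code: Optional[str] = None
    country_code: Optional[str] = None

    def __post_init__(self) -> None:
        # Only the qualifier is required; every other field may stay empty.
        if not self.party_function_code:
            raise ValueError("party function code qualifier is mandatory")
        limits = (
            ("name_and_address_lines", 5),
            ("party_name_lines", 5),
            ("street_lines", 4),
        )
        for name, limit in limits:
            if len(getattr(self, name)) > limit:
                raise ValueError(f"{name} allows at most {limit} lines")


# A short address with no street and no house number is perfectly valid here.
finkenburg = NadAddress("BY", city_name="Finkenburg", postal_code="27321", country_code="DE")
```

Note that nothing in the sketch validates the content of a line; it only caps how many lines there are, which seems to be the spirit of the segment.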


My personal favorite is Nicaragua. Here's an example:

Reparto Serrano, de la Policía de Plaza del Sol 4 cuadras al Lago, Casa esquinera Managua, Nicaragua

Translation: Serrano Division. From the Policía de Plaza del Sol go four blocks towards the lake (which is North in Managua). Corner House. Managua, Nicaragua.

Absolutely crazy. No street names at all. Addresses are really directions from well-known places (or from where they used to be, when the place no longer exists: "de donde fue", "from where there was.."). They use the old Spanish unit 'vara' (84 cm) instead of the meter to specify houses within blocks. Up and down aren't north and south; they're where the sun goes up and down (east and west).

More information here: http://vianica.com/nicaragua/practical-info/14-addresses.htm...


See also “Falsehoods programmers believe about geography”: http://wiesmann.codiferes.net/wordpress/?p=15187


As a complete aside about addresses and because falsehood number one needs some clarification: even when you are living on a houseboat you are required to have a mooring license unless you want to be classed as a "constant cruiser" (moving beyond the parish borders every [arbitrary interval]) and in order to get this license you have to provide British Waterways with a fixed address - which by their own account they do not check.

I'm somewhat biased against BW as my (disabled) father had his boat repossessed under a "section 8" with no warning and was made homeless because he did not supply them with a "fixed address". When confronted a few months before the eviction, BW even refused to take an address my father's advisor offered during the meeting because they knew it wasn't his, even though they had confirmed that they do no background checks on addresses and knew full well that most boaters were using addresses that were not their own.

There are real human costs to these sorts of things and my dad was one of those people caught up in a self reinforcing loop that saw no solution. Fortunately it is an edge case, and most likely the result of corruption - the £30K+ boat was eventually sold on by BW for £5K and my dad never saw a penny. He can't even contest the decision in court because it would cost a small fortune.


I worked for a while for a US company that had a satellite office in Costa Rica. The street address, translated from Spanish, was "the office above the chicken factory".


Slightly off-topic, but street names in the Twin Cities metro are broken. I live in a subdivision in which nearly every street has the same name, differentiated only by the road's "last name." I.e., Maplecliff Drive, Maplecliff Circle (which is actually a cul-de-sac), Maplecliff Alcove (which isn't even a type of street), Maplecliff Curve, Maplecliff Way, Maplecliff Court...

In Minneapolis, all the streets are numbered by their distance from the center of town, and the north-south thoroughfares are called Avenue, while the east-west roads are called Street. So if someone tells you they're waiting for you at the corner of 2nd and 4th, you have 8 potential intersections to check.

Everywhere else in the cities (OK, that may be hyperbole) building addresses have something to do with intersecting streets. For example, 47 7th Avenue is going to be between 4th and 5th Streets, closer to 5th (and on the west side of the road). God help you if you try to apply that logic in St. Paul. Their numbers bear no relationship at all to their cross streets.

Finally, while my Maplecliff example above evinces a certain laziness or lack of creativity on the part of the subdivider, I think the height of laziness has to go to the subdividers who don't even bother to come up with a whimsical base name for their streets, and just name them after nearby, already-existing roads. Especially when said existing roads are merely numbered roads. You'll get road names like 172nd Street Circle or 163rd Avenue Way. And yes, I did once see a 185 Street Lane Court.

A friend from Arizona reports that in his city, NS roads are Avenues, and EW roads are Streets, but there was one road that cut diagonally through the city--so they called it a Stravenue. Abbreviated Stra, if I remember correctly.


Here in Seattle, on the opposite side of Lake Washington from the city of Bellevue, there is an intersection between streets named Bellevue Place, Bellevue Avenue, and Bellevue Court. A block away from this strange triplicate intersection you will find the intersection of Belmont Avenue and Belmont Place.


Here is the State of Illinois catch-all approach to addresses (from their voter registration form):

http://www.elections.il.gov/downloads/votinginformation/pdf/...

    IF YOU HAVE NO STREET ADDRESS, 
    below describe your home: list the name of subdivision; cross
    streets; roads; landmarks; mileage and/or neighbors' names.


As a regular shipper, the addresses I have the biggest problem with are at colleges and universities. No address verification/normalization software recognizes things like "3948-B Engineering Wing, Richguy Hall, Anystate University, Cityville, ST, 99999". But if you force the address through, the local UPS guy will know where that location is every time and deliver the package without a problem.


"The user will know their postal code/zip code. Most users will, of course." Not in Indonesia.

Also, house numbers are sometimes not sequential. Some people have a "favorite" number, and when they see it's not taken, they will use it ("Nobody in the neighborhood has 123, so it's mine.")


Next article should be about times/dates. Times and dates are F*cking hard sometimes :-)



Jon Skeet's "The joys of date/time arithmetic"

http://msmvps.com/blogs/jon_skeet/archive/2010/12/01/the-joy...

His talk at London Dev Days 2009 also goes into problems you can have with dates and timezones (including Argentina's 2009 last-minute time zone change back into not using daylight saving time).

http://vimeo.com/7516539


It still annoys me that unix-time is not monotonically increasing. It was such a beautiful idea and it got totally screwed up by the lame idea of leap seconds.


Unix time is monotonically increasing, as long as there are no negative leap seconds (and there never have been). It's just not uniformly increasing, since it tracks UTC.


I think you have the concept of leap-second backwards. A positive leap-second means that the time-of-day is held for one additional second. Since unix time does not respect leap seconds there are 35 different unix time values at the resolution of seconds which refer to periods of time lasting 2 seconds instead of one.

And that means that unix time at a resolution higher than a second will jump backwards at those 35 different leap seconds.


> I think you have the concept of leap-second backwards. A positive leap-second means that the time-of-day is held for one additional second. Since unix time does not respect leap seconds there are 35 different unix time values at the resolution of seconds which refer to periods of time lasting 2 seconds instead of one.

Monotonic functions can yield repeated values, they don't have to be strictly increasing or decreasing. A constant function is monotone.

> And that means that unix time at a resolution higher than a second will jump backwards at those 35 different leap seconds.

Yes, sub-second timestamps won't be monotonic, but UNIX time (or POSIX time) is only defined with a resolution of a second, so it is monotonic.


This isn't quite correct, and as the grandparent applies the concept it is definitely incorrect. A monotonically increasing function may never repeat a value. A constant function is not monotonically increasing; it is monotonically nondecreasing (and of course, also monotonically nonincreasing).
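
To make the distinction concrete, here is a minimal Python sketch. The sample readings are made up for illustration (they are not real clock values), and the helper names are mine.

```python
def is_nondecreasing(xs):
    """True if no value is smaller than the one before it (repeats allowed)."""
    return all(a <= b for a, b in zip(xs, xs[1:]))


def is_strictly_increasing(xs):
    """True if every value is larger than the one before it (no repeats)."""
    return all(a < b for a, b in zip(xs, xs[1:]))


# Hypothetical whole-second readings across a positive leap second:
# the value 100 is reported for two consecutive seconds.
whole_seconds = [99, 100, 100, 101]

# Hypothetical half-second readings across the same leap second,
# assuming the clock steps back to repeat second 100.
sub_second = [99.5, 100.0, 100.5, 100.0, 100.5, 101.0]

print(is_nondecreasing(whole_seconds))        # repeats are fine here
print(is_strictly_increasing(whole_seconds))  # but not here
print(is_nondecreasing(sub_second))           # the step back breaks even this
```

So the whole-second sequence is nondecreasing but not strictly increasing, and the sub-second one is neither, which is roughly what the comments above are arguing about.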


POSIX time.h defines "unix time" as seconds since the epoch in type... time_t, which (according to the POSIX standard) can be either an integer or a floating point value.

In practice though, "unix time" is merely a convention which is based off of the POSIX standard.


The "roads have only one name" falsehood rings very true for me. I usually denote my road as "S Hwy X" (where X is a letter, not a number). But over the years the official USPS designation has changed slightly and different organizations have different naming standards, ranging from "Highway X" to "South County Rd X" etc. One time while attempting to make a credit card payment I discovered that the vendor validated the billing address against both my bank AND the USPS standard, which were doomed to never agree. Needless to say I did not end up buying anything from them.


The worst one is "every country has postal/zip codes". Hong Kong doesn't. When the field is required (most of the time, even for core internet companies), /(HK){1,3}/ tends to work, but not always.
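
For what it's worth, a form doesn't have to force that workaround. A minimal sketch in Python, assuming a hypothetical set of places treated as having no postal codes (the set and the function name are illustrative, not from any real library):

```python
from typing import Optional

# Illustrative only: ISO 3166-1 alpha-2 codes treated here as having no postal codes.
NO_POSTCODE = {"HK"}


def postcode_error(country: str, postcode: str) -> Optional[str]:
    """Return an error message, or None if the value is acceptable."""
    if country.strip().upper() in NO_POSTCODE:
        # Accept anything, including an empty field or a placeholder like "HKHK".
        return None
    if not postcode.strip():
        return "postal code is required"
    # Deliberately no format check: formats vary too much to guess.
    return None
```

With this, `postcode_error("HK", "")` returns `None`, while `postcode_error("US", "")` returns the "required" message.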


Ireland doesn't


Latitude- and longitude-based addressing is well past due. Two decimal coordinates of around 10 digits each and a building number are all we really need these days.


Hah!

Actually, Salt Lake City in Utah is sort of vaguely like that, though with street numbers (in the usual units of 100 per block) rather than latitude and longitude per se.

The addresses are things like "250W 500S, Salt Lake City" -- which is going to be on the fifth street south of the Temple (and marked as 500S on the street signs), two and a half blocks west of the north-south street that's centered on the Temple (which is Main Street, not W. Temple Street. W. Temple Street is what would be 100W if it weren't named).

I'm not sure what the history is on canonical orderings of the two parts of the address. Currently it seems to be usually the one that's the "street number" followed by the one that's the "street name", but I'm not sure if that's a result of auto-regularization by systems that assume such a thing, or if it's historical.


Here's a funny UK one.

I was opening an account at the Lloyds TSB branch in Hanover Square (http://www.allinlondon.co.uk/directory/1063/11849.php). Turns out their address form requires the first half of a postcode to end in a number, even though the first half of the branch's own postcode ends in a letter.
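
A minimal Python sketch of what probably went wrong. Both patterns are my own guesses for illustration, not the bank's actual rule and not an official postcode grammar; the looser one is only a plausibility check.

```python
import re

# Guess at the form's rule: one or two letters followed only by digits.
NAIVE_OUTWARD = re.compile(r"[A-Z]{1,2}[0-9]{1,2}")

# Looser rule: one or two letters, a digit, then an optional digit or letter.
LOOSE_OUTWARD = re.compile(r"[A-Z]{1,2}[0-9][0-9A-Z]?")


def accepts(pattern, outward):
    """True if the first half of a postcode matches the pattern in full."""
    return pattern.fullmatch(outward.strip().upper()) is not None


for code in ["M1", "B33", "CR2", "DN55", "W1A", "EC1A"]:
    print(code, accepts(NAIVE_OUTWARD, code), accepts(LOOSE_OUTWARD, code))
```

The naive pattern accepts M1, B33, CR2 and DN55 but rejects W1A and EC1A, which is exactly the central London style of code that ends in a letter.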


Not all addresses in the US have streets, either. I grew up in rural Wisconsin at a house that only had a fire number and county. "N3042, Rock County WI" was a valid address.

I believe my grandparents only got a street assignment in the last 2 or 3 years -- and even then, the actual street is about 2/3 of a mile from their house.


HL7 V3 (including ISO 21090) has a good, flexible data model for addresses. Anyone can have zero or more addresses each of which has a use code (home, work, postal, etc). Each address has zero or more parts each of which has a type (line, street, city, postal code, direction, suffix, etc).
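
Roughly, that shape could be sketched in Python like this. The class and field names are hypothetical and chosen for readability; they are not the identifiers HL7 or ISO 21090 actually use.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AddressPart:
    type: str   # e.g. "line", "street", "city", "postal code"
    value: str


@dataclass
class Address:
    use: str                                   # e.g. "home", "work", "postal"
    parts: List[AddressPart] = field(default_factory=list)  # zero or more parts

    def as_text(self) -> str:
        # Parts are kept in the order they were written; nothing is reordered.
        return " ".join(part.value for part in self.parts)


@dataclass
class Person:
    name: str
    addresses: List[Address] = field(default_factory=list)  # zero or more addresses


# An address with no street at all still fits the model.
resident = Person(
    "Example Resident",
    [Address("home", [AddressPart("postal code", "27321"), AddressPart("city", "Finkenburg")])],
)
print(resident.addresses[0].as_text())
```

The point of the design is that no part type is required, so an address made of a single free-text "line" part is as valid as a fully structured one.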


One that applies to the US: many folks have TWO addresses.

One is where they live, and the other is where they receive mail. Up here in rural California that is the case; postal delivery is not a given for most of the residences. Sometimes the only delivery option is getting a post office box.


And sometimes an address that references a box number is both the "where they get mail" and the "where they live" address.

I grew up at the address of "Route 1, Box 78, City, State ZIP". That was the rural mail delivery route, but it was the only actual address there was, unless you wanted "last house on county road 601 before you go over the top of the mountain" sorts of things. (The mailbox in question was on a post at the end of our driveway, and said "78" on it.) UPS and FedEx would deliver there, though sometimes we had to point out to a new UPS driver that no, we weren't box 79; they were two miles away on the next bit of the mail route.

This also points out the changeability of addresses. Before I was old enough to remember, it was Route 1 Box 150A. And then ten or fifteen years ago, they went through and named all the streets and numbered all the houses, so it became "420 Streetname, City, State ZIP". The postman still knew that when my grandfather sent mail to Box 150A it should come to us, though.

My current house is a duplex. The normal scheme would be something like "100 Streetname, unit A" or something, but somehow we ended up with "100A Streetname" instead. For added confusion, the actual doors aren't labeled, just the mailboxes.


Or said another way, the falsehood is "Addresses don't depend on delivery mechanisms." I'm also in California without home postal delivery. If you're going to deliver a package via USPS, you need to use my PO Box. If you're going to deliver via FedEx/UPS, you need my street address. If you don't tell me which way you're going to send it until the end of the order process, I have to go back and fix it if I guessed wrong.
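
A minimal sketch of that idea in Python, assuming the carrier is known before the address is chosen. The names are hypothetical, and the rule that only USPS can use the PO box is taken from my own situation above, not from any carrier's documentation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Recipient:
    street_address: Optional[str] = None
    po_box: Optional[str] = None


def address_for(recipient: Recipient, carrier: str) -> str:
    """Pick the address that the given carrier can actually deliver to."""
    if carrier.upper() == "USPS":
        # Prefer the PO box; fall back to the street address if there is none.
        chosen = recipient.po_box or recipient.street_address
    else:
        # Assumption: private carriers need a physical location.
        chosen = recipient.street_address
    if chosen is None:
        raise ValueError(f"no usable address for carrier {carrier!r}")
    return chosen


me = Recipient(street_address="100 Streetname", po_box="PO Box 78")
print(address_for(me, "USPS"))
print(address_for(me, "UPS"))
```

The store only has to ask for the shipping method first, or store both addresses and pick at dispatch time.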


When I was little, we had no address. We did live next to the local post office, so using a PO box for mail was nice and simple. For UPS, we had to give an address like, "the house next to the post office on the east side". It usually worked, but occasionally they'd get their directions mixed up and deliver to the house on the other side.


The situation is pretty similar to email addresses. You could try parsing them, but it's pointless, and usually not what you want. What you want isn't a street, 'zip code', or city name. What you want is a route from point A to point B.


I live in India and most of the points mentioned are common sense stuff here - in other words just don't assume anything apart from Country and State. Just give two/three fields and let the user write his address... :-)


A house will have just one postcode.

- Not if it is a new build, had a temporary postcode during construction and has since been assigned a new postcode.

(In the UK) an address will contain a county.

- Not if it is in London (city and administrative county).


Addresses, like phone numbers, are difficult to normalize.

The best you can do is hit the 80/20 rule: provide a solution for most addresses/phone numbers out there and minimize the edge cases.


I read recently that Dubai doesn't even have physical addresses. All package and mail delivery is done via detailed directions and descriptions of the destination.


In the places I have been in Dubai, they definitely have addresses, but it's more like "Office X, Jumeirah Business Tower, Jumeirah Lakes, Cluster W." (I am not sure about post boxes or post codes.) Sometimes the building names are Arabic, so you would instead say "Building 4" in English as they are well numbered.

In some places in Africa this is definitely true. An example of an address in Mauritius would be "Second house on the right, past the Hindu temple, the house with the red wall".


It sounds like we should just make name and address Unicode-compatible textareas with no validation.


Here in Ireland there are no postal/zip codes. I always have to enter 00000 on websites.



