While interesting, I can't see why any programmer would ever assume half of these. Why would anyone go out of their way to restrict place names to the "usual character set of the country"?

No one would go out of their way to, but it's pretty common for these cases to break because no one ever bothered to test them. On-screen keyboards don't have characters no one will ever type, fonts don't have characters no one will ever display, sorting and string manipulation may not handle accented characters correctly if there will never be accented characters, etc.

No one actively thinks "I'm going to intentionally omit solid Unicode support"; we just don't bother with it until we feel there's a good reason to, and by then it's often too late.
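
To make the sorting point concrete, here is a minimal Python sketch. The place list and the `fold_key` helper are made up for illustration, and stripping accents is only a rough stand-in for real collation, which is locale-dependent (some alphabets sort accented letters as separate letters).

```python
import unicodedata

def fold_key(name: str) -> str:
    """Hypothetical sort key: decompose, drop combining marks, casefold."""
    decomposed = unicodedata.normalize("NFD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()

# Illustrative data; NFC makes sure the accented letters are single code points.
places = [unicodedata.normalize("NFC", p)
          for p in ["Zurich", "Ávila", "Bern", "Évry", "Oslo"]]

# Default sort compares code points, so accented capitals land after "Z".
naive = sorted(places)

# Sorting on the folded key puts them next to their unaccented neighbours.
folded = sorted(places, key=fold_key)

print(naive)   # ['Bern', 'Oslo', 'Zurich', 'Ávila', 'Évry']
print(folded)  # ['Ávila', 'Bern', 'Évry', 'Oslo', 'Zurich']
```

Nothing here crashes, which is exactly why it goes unnoticed until someone with an accented place name looks at the list.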

Data validation - you want to make sure your users enter addresses correctly, and catch errors early (say, at checkout) rather than let them turn into a negative experience (say, a missed delivery or returned package).

You may also want to make sure your customers have entered their full address, rather than a short form that cannot be used (e.g. "123 Fake St", without any markers for city, county, country, etc.). Doing so necessarily requires some structured understanding of addresses... which comes with all the pitfalls of assumptions.

There are also uses for addresses that aren't about delivering a physical item to said address - for example, determining the correct taxes to charge a customer based on ZIP code (some ZIP codes do not map to a physical area, and so are not useful for determining taxation).

There are lots of perfectly understandable reasons why programmers would assume the format of an address.

It's not that you'd restrict it, but you might not test it with "weird" characters, and it might break your application in some way (e.g. layout).
