Falsehoods programmers believe about geography (2012) (codiferes.net)
62 points by AndreyKarpov on July 31, 2017 | 57 comments

I work on geocoding, so I have a ridiculous stash of these saved up for a talk some day. Here's a couple:

- Place names are listed smallest feature to largest feature (address, city, state, etc)

- There is one name (per language) for every place, and everyone agrees on it

- Place names will start with an alphabetic character (see: 's-Heer Arendskerke)

- Addresses start with a number, for each building

- Addresses include a number for each building

- Addresses exist for each building

- On streets with addresses, the addresses will be in order of location on street

- Speaking of streets: all streets have names

- Every country's place hierarchy is roughly similar, or at least has a similar number of layers

- A city cannot contain other cities

- Zipcodes have defined areas

- Zipcodes are areas

- Zipcodes are stationary areas

Here's some that are more Arg! Programming!:

- People will search for features in the language in their phone settings

- Two similar names separated by punctuation are likely to be synonyms (lookin' at you, Helena-West Helena)

- Governments use consistent projections

- Governments use consistent file formats (thanks for the SOSI, Norway) (no but really y'all make great maps, thank you)

I could go on. Even though there's a ton of inconsistencies and frustrating moments, it really is a joy to learn about the world through its idiosyncrasies.

(edited for formatting)

> - Every country's place hierarchy is roughly similar, or at least has a similar number of layers

It was always fun seeing packages arrive in Singapore with an address ending in "Singapore, Singapore, SINGAPORE" since every international shipping company assumes that city, state/prefecture, country are all different and required.

Ideally address input would just be one multiline text field, but unfortunately users are just too unreliable to be left to their own devices like that...

Yeah! It's also hard to distinguish programmatically when this is a good response: New York, New York, is expected, while Hong Kong, Hong Kong makes a lot less sense.
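A minimal sketch of the "don't force every level" idea, assuming a hypothetical `PostalAddress` type where city and region are optional. The field names and the example street lines are made up for illustration, and the fixed smallest-to-largest output order is itself one of the falsehoods listed above, so treat this as a toy:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PostalAddress:
    """Hypothetical address record: only free-form lines and country are required."""
    lines: str
    country: str
    city: Optional[str] = None
    region: Optional[str] = None

    def format(self) -> str:
        # Skip levels that do not exist for this place instead of repeating a name.
        # Deliberately no de-duplication: "New York, New York" is a legitimate pair.
        parts = [self.lines, self.city, self.region, self.country]
        return ", ".join(p for p in parts if p)


city_state = PostalAddress("1 Example Road", "SINGAPORE")
nested = PostalAddress("1 Example Ave", "USA", city="New York", region="New York")
```

Here `city_state.format()` has no repeated levels, while `nested.format()` keeps both "New York" entries because nothing in the code guesses that a repeat is a mistake.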

There's a fun street in West Hartford: "Boulevard". Everyone abbreviates it to Blvd, of course. They didn't use it as a descriptor, it's the name.


  - Zipcodes have defined areas
  - Zipcodes are areas
  - Zipcodes are stationary areas
Can you elaborate on these? I always assumed (thankfully not for any professional purposes) there was indeed an area assigned to each zip code.

Military zipcodes often do not have a defined area - as military units are mobile. The zipcode may direct mail to varying actual physical places.

Certain extremely high-traffic entities (e.g., government agencies) may also receive their own zipcodes - and mail directed at them may go to different places. Private places (see: Empire State Building, 10118) may also receive their own zipcodes which, while representing a physical building, defies the usual "defined boundary" zipcode definition.

It's worth noting that the zip-for-building case is very common, most cities with skyscrapers have a few with their own zip.

It still definitely counts as an edge case though, in the sense that if the building were torn down or foreclosed I don't think its block would still warrant its own code

Indeed, and it's important to remember that skyscraper-zipcodes create holes in otherwise regular polygons, which is often not accounted for when storing shapes of zipcodes.

Lots of edge cases abound when using zipcodes, especially zipcodes in a spatial context.
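A rough sketch of the hole problem, assuming a zipcode shape is stored as one outer ring plus a list of hole rings. The coordinates below are invented squares, not real boundaries, and the ray-casting test ignores points exactly on an edge:

```python
from typing import List, Tuple

Point = Tuple[float, float]
Ring = List[Point]


def in_ring(pt: Point, ring: Ring) -> bool:
    """Ray casting: count how many ring edges a ray going right from pt crosses."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through pt can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def in_polygon(pt: Point, outer: Ring, holes: List[Ring]) -> bool:
    """Inside the outer ring and not inside any hole."""
    return in_ring(pt, outer) and not any(in_ring(pt, h) for h in holes)


# Hypothetical shapes: a 10x10 surrounding zipcode with a 2x2 skyscraper zipcode cut out.
surrounding = [(0, 0), (10, 0), (10, 10), (0, 10)]
skyscraper = [(4, 4), (6, 4), (6, 6), (4, 6)]
```

If the hole list is dropped, a point inside the skyscraper is reported as belonging to the surrounding zipcode, which is the storage mistake described above.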

> It still definitely counts as an edge case though, in the sense that if the building were torn down or foreclosed I don't think its block would still warrant its own code

For example, the World Trade Center's zip code was retired after the 9/11 attacks.


In Singapore, every building of a reasonable size has its own zip code

Zipcodes in the United States are postal routes, which aren't really polygons like cities. We can make an approximation, but that's not how they're defined. In some countries, point postcodes are common. We just spent a bunch of time on postcodes in Great Britain, which commonly only have a few addresses in each one. Even in the United States, some universities or other large institutions will have a "postcode" which is a single address. The US military also assigns postcodes to ships, which tend to move a bit more than your average building.

As others have mentioned, zip codes might not have any geometry at all (point zips), and they are frequently updated/changed. In addition, zips do not follow standard constraints that other defined region types do. For instance, counties are entirely contained by their parent state, but that is not the case for zip codes, as they frequently cross county and state borders.

My understanding: Zipcodes are postal routes. We just typically visualize them as the box containing those routes.

USPS maintains them as lists of addresses (with some sort of relationship to delivery routes). A new address will be assigned a zip code based on convenience.

So at a given moment you can generalize those lists into areas, but the way they are defined, there isn't a real boundary, just a line that happens to separate the addresses with different postcodes.

Bangalore, a large city in Southern India, is probably the world leader in the proportion of layers that have embedded numbers. Here's an example:


I happen to be in Bangalore right now and literally no one uses addresses. Everything is landmark based.

And that's missing all the coordinate stuff. Let's add

* FALSE: You don't need to know the difference between geographic and projected coordinate systems

* FALSE: There is only one geographical coordinate system and it's called WGS84

* FALSE: For any pair of coordinate systems there is one and only one way to transform data between them

(On the last one: https://blogs.esri.com/esri/arcgis/2009/05/06/about-geograph... )
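To make the geographic vs projected distinction concrete, here is a small sketch of the spherical Web Mercator forward projection, chosen only because it fits in a few lines. The function name is mine, and real work would normally go through a projection library rather than hand-rolled math:

```python
import math
from typing import Tuple

WGS84_SEMI_MAJOR_M = 6378137.0  # sphere radius used by spherical Web Mercator


def lonlat_to_web_mercator(lon_deg: float, lat_deg: float) -> Tuple[float, float]:
    """Geographic degrees in, projected metres out. Not valid at the poles."""
    x = WGS84_SEMI_MAJOR_M * math.radians(lon_deg)
    y = WGS84_SEMI_MAJOR_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y
```

The input is an angle pair and the output is a pair of metre offsets on a flat plane, so mixing the two up silently produces numbers that look plausible but are wildly wrong.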

Another fun one:

* FALSE: A coordinate that looks like 15 45 38 is Degrees, Minutes, Seconds.

(Sometimes it's degrees, decimal minutes. Happily, the maps in question had some "seconds" values greater than 59, which tipped us off.)
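A quick sketch of the two readings, assuming (my guess at the encoding, not something stated above) that in the decimal-minutes case the third field holds the digits after the decimal point of the minutes. Function names are made up and only non-negative values are handled:

```python
def as_dms(text: str) -> float:
    """Read 'D M S' as degrees, minutes, seconds."""
    d, m, s = text.split()
    return int(d) + int(m) / 60 + int(s) / 3600


def as_ddm(text: str) -> float:
    """Read 'D M F' as degrees and decimal minutes M.F (assumed encoding)."""
    d, m, frac = text.split()
    return int(d) + float(f"{m}.{frac}") / 60


def cannot_be_dms(text: str) -> bool:
    """A third field of 60 or more cannot be seconds, which is the tip-off described above."""
    return int(text.split()[2]) >= 60
```

For "15 45 38" the two readings differ by a few thousandths of a degree, small enough to pass a sanity check and large enough to put a point on the wrong block.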

* FALSE: calculating the distance between points is straightforward and there is one way to do it.
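As an illustration, here are two common ways to compute a distance that disagree on the same pair of points. Both assume a spherical Earth with a mean radius, which is already a simplification; ellipsoidal methods give yet another answer:

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; assumes a sphere, which the Earth is not


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance on a sphere."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def equirectangular_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Flat-map approximation: cheap and fine for short hops, poor for long ones."""
    phi_mean = math.radians((lat1 + lat2) / 2)
    x = math.radians(lon2 - lon1) * math.cos(phi_mean)
    y = math.radians(lat2 - lat1)
    return EARTH_RADIUS_KM * math.hypot(x, y)
```

Along the equator the two agree, but for two points at 60°N that are 90° of longitude apart the flat approximation overshoots the great-circle distance by hundreds of kilometres.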

> Buildings do not move

In communist Romania even churches moved, and quite by a long distance (200 meters or so). There's this Google Images link for moving churches (https://www.google.ro/search?client=firefox-b-ab&biw=1920&bi...) and this other one for moving buildings (https://www.google.ro/search?client=firefox-b-ab&biw=1920&bi...)

I had a fast food chicken restaurant near me that moved about 30 miles once. Guess it was cheaper than building a new one.

It happens so rarely, though, I can't imagine putting in any specific code for a building having been moved. Idk.

* FALSE: all place names are hierarchical (eg City ⊂ County ⊂ Country)

* FALSE: the levels of the hierarchy are the same everywhere

* FALSE: the levels of the hierarchy are the same within each country

* FALSE: street addresses contain street numbers

Also, postcodes have a geographic area. Here in the UK, postcodes can be non-geographic: http://www.royalmail.com/sites/default/files/docs/pdf/11july...

I remember trying to deliver an urgent parcel in London (I was coming from France) and discovering that the address numbers don't follow each other and are not organized by odd numbers on one side of the road and even numbers on the other side.

So you could have house number 200 between 15 and 42 ... of course I had to go the full length of the street before finding the right address. I remember feeling sympathy for English postmen back then ;-)

In Japan, most streets don't have names and buildings are numbered based on when they were built. Even with Google Maps most businesses provide a drawn map showing the nearest convenience stores and stations.

They are usually odd on one side, even on the other. But not every road fits into that neat pattern so sometimes you get them all on one side. They should be in a vaguely sensible order though - it would be very weird to get 200 between 15 and 42.

In Australia, postcodes have a many:many relationship with suburbs. A single postcode can cover multiple suburbs and a single suburb can have multiple postcodes.

I think postcodes are geographically contiguous but I'm probably wrong. :P
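A small sketch of what many:many means for a data model, assuming a hypothetical `PostcodeIndex` that keeps both directions. The postcodes and suburb names in the usage lines are placeholders, not real Australian data:

```python
from collections import defaultdict
from typing import Dict, Set


class PostcodeIndex:
    """Two-way lookup: neither side can be a plain foreign key on the other."""

    def __init__(self) -> None:
        self._suburbs_by_postcode: Dict[str, Set[str]] = defaultdict(set)
        self._postcodes_by_suburb: Dict[str, Set[str]] = defaultdict(set)

    def add(self, postcode: str, suburb: str) -> None:
        self._suburbs_by_postcode[postcode].add(suburb)
        self._postcodes_by_suburb[suburb].add(postcode)

    def suburbs(self, postcode: str) -> Set[str]:
        # .get avoids creating empty entries for unknown keys
        return set(self._suburbs_by_postcode.get(postcode, ()))

    def postcodes(self, suburb: str) -> Set[str]:
        return set(self._postcodes_by_suburb.get(suburb, ()))


index = PostcodeIndex()
index.add("1000", "Suburb A")  # placeholder values
index.add("1000", "Suburb B")
index.add("1001", "Suburb B")
```

A schema with a single `postcode` column on the suburb table, or a single `suburb` column on the postcode table, cannot represent the third `add` call.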

In the US postcodes have no fixed relationship at all with administrative areas, they are just lists of addresses. A code might span a county or be limited to a small village.

I would add: All house numbers with the same street name in a town are on the same street.

My address is 11 XXX Street. The house next door to mine is 17 XXX Street. 13 and 15 XXX Street both exist, however they are on an identically named street on the other side of town with the same town name and zip code mailing address. Every single "in between" number on my entire street is on the doppelgänger street.

Not so much of a problem from programmers (yet!) but delivery/repair people sometimes get confused and end up on the wrong street.

FALSE: Place names are a neutral fact, and using one source without interrogating what it names what and where it draws borders will never cause an incident.

Derry vs Londonderry being one. It's interesting how different publications have different style guide rules to handle it as best as possible. I can't remember which it was, but one had that it would always be called Londonderry first, then Derry for the rest of the article/piece.

I just noticed that BBC Weather plays safe and goes for "Derry-Londonderry" - I would imagine even that would anger people due to the ordering!


All countries have a Postal Code / ZIP Code or equivalent. Ireland only very recently added them and called them Eircodes (because of course we did). It used to be slightly annoying having to find some combination of characters that the online form would accept, as they often varied site to site. Interestingly enough, several of these companies had their European headquarters in Ireland. I also had to talk to the bank manager because the cashier in a UK bank wouldn't let me open an account without giving a valid postcode for my Irish address.

EDIT: typos

And to increase confusion, Eircodes aren't areas. Each letter box gets its own Eircode. So Irish postcodes are points, and 2 Eircodes can have the same point. Also, neighbouring Eircodes aren't similar to each other: "D01 ABCD" can be beside "D01 87D4".

We have some code here, not written by me!, which queries by Canadian Postal Code, of which there are several hundred thousand.

Apparently, dropping the last character and requerying is the equivalent of a proximity search!

This in itself is laughable. But the company sent the code back to the creators to have the ability to query by city added.

Guess what. Apparently if you drop the last letter of a city, it's the equivalent of doing a proximity search!

I must be totally turned around on this.

Postal addresses have counties or states. In the UK counties haven't been a required part of postal addresses for many years.


I recently had to give up buying an item from a web shop. They were asking for my address and forcing me to include a state (my country, and thus my address, has nothing like it), but then complaining that my address wasn't the one associated with my credit card.

I had to give up buying an item from a UK web shop, because they "do not ship outside EU". For a digital item. For an address within the EU.

And if you do give a county, do you mean the historic unchanging counties or the modern government administrative areas with the same names?

> One of the Kergelen islands (part of France) is called Île de Croÿ, most french persons have no clue how to type the “ÿ” character.

Meanwhile, programmers believing some falsehoods about text encoding (that one character is one byte and -1 is EOF) may find themselves with a surplus of the “ÿ” character.
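For anyone who hasn't hit this one: in Latin-1, "ÿ" is the single byte 0xFF, and 0xFF read as a signed 8-bit char is -1, the same value C uses for EOF. A small Python sketch of the arithmetic (the helper name is made up; the actual bug lives in C code that stores the result of `getchar()` in a `char`):

```python
def as_signed_char(byte: int) -> int:
    """Interpret an 8-bit value the way a signed C char would."""
    return byte - 256 if byte > 127 else byte


y_diaeresis = "ÿ"
latin1_bytes = y_diaeresis.encode("latin-1")  # one byte
utf8_bytes = y_diaeresis.encode("utf-8")      # two bytes, so "one character is one byte" fails too

# Going the other way: an EOF of -1 truncated to a byte and printed as Latin-1.
eof_as_text = bytes([-1 & 0xFF]).decode("latin-1")
```

So a real "ÿ" in the input can be mistaken for end of file, and an unchecked EOF written back out shows up as a stray "ÿ".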

> most french persons have no clue how to type the “ÿ” character

Of course we do. While ÿ is not common, you find it here and there.

Typing it on a keyboard is no different from typing ë (the tréma first, then y). It is true however that some mobile keyboards do not have it as part of the "y" family.

Here is a GitHub repo with many more lists of "Falsehoods programmers believe in ...": https://github.com/kdeldycke/awesome-falsehood

Post office oddities in my universe:

Two addresses with an identical number and street name (e.g. 12 Aspen Lane) (*). As delivery people we (USPS, FedEx, UPS) make that mistake once. That's why knowing the name is critical.

Farmland made into houses: postal address in town A, property in town B. Have 4 houses with this in my area.

(*) the only reason this never changes is because 911 knows the difference.

Let me add one, pertaining to my corner of East Europe that drives me crazy whenever I try to look up places on Google maps:

- Residential buildings can be identified by the name of the street they're on and a number.

For whatever reason, we decided that naming streets and numbering the buildings on them is too passé, so lots of our cities use Section + Number (where the numbers carry no geographic meaning), in parallel with Street + Number, but every building is addressable using only one scheme.

Concrete example with the city of Sofia, Bulgaria - consider these two buildings [1] [2]. They're both next to Vasil Kalchev street. One is a kindergarten, the other is a block of flats. Let's see what the address for each is, if you want to send a letter to them. The kindergarten is, obviously, St. Pimen Zografski street No. 5... well OK, that's the street on the other side of the building, nothing too strange; while the block of flats is zh.k. Dianabad bl. 54. The abbreviations mean literally "residential quarter Dianabad, residential building number 54". No, the building is not addressable via the street, you cannot send post to that building or locate it on a map via "Vasil Kalchev street, No. X" for any X. There are, in fact, no numbers on Vasil Kalchev street. And the residential building numbers aren't geographically meaningful - directly east of said building 54 is building 53, but directly west of it are buildings 42 and 43. There is no building 44. There are, however, 33, 33A, and 33B. They are just ad-hoc numbers (maybe with letters) that you need to have in a database, like you have the locations of streets and where the numbers on the street are geographically.

So what does this have to do with Google? Maps doesn't understand the Section+Number address system. 90% of the residential buildings cannot be found on Google maps using their official address. If I need to send my address to a friend, I can only do it as coordinates, because entering my address, the one on which I receive mail, will result in either no results, or worse - Maps will try interpreting it as a place name, do a partial match, and send you to some completely unrelated building, maybe on the other end of the city.

They're getting kind of better, because people are adding buildings to the map as "missing places", but it's still much safer to just use our local maps site. What I do to plan routes using Google Maps is first locate where the place is using our maps, then match the location on Google's. At least it has pretty good road data.

[1] https://www.google.bg/maps/place/Kindergarten+49+Radost/@42....

[2] https://www.google.bg/maps/place/42%C2%B039'53.7%22N+23%C2%B...

Edit: P.S. I just checked and OSM understands my address, and even shows you building numbers when you scroll around the map.
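A minimal sketch of how the two Sofia schemes described above could sit side by side in a data model, assuming two hypothetical record types. The class names and the output format are mine; the example values are the two buildings from the comment:

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class StreetAddress:
    street: str
    number: str  # a string, because "33A" is a valid building number


@dataclass(frozen=True)
class BlockAddress:
    quarter: str  # the "zh.k." residential quarter
    block: str    # ad-hoc identifier with no geographic ordering


Address = Union[StreetAddress, BlockAddress]


def format_address(addr: Address) -> str:
    """Each building is addressable through exactly one scheme."""
    if isinstance(addr, StreetAddress):
        return f"{addr.street} No. {addr.number}"
    return f"zh.k. {addr.quarter} bl. {addr.block}"


kindergarten = StreetAddress("St. Pimen Zografski street", "5")
flats = BlockAddress("Dianabad", "54")
```

The point of the union is that a geocoder cannot fall back from one scheme to the other: there is no street-and-number form of `flats` to try, so a block address has to be looked up as-is.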

One of the Kergelen islands (part of France) is called Île de Croÿ

Surely they mean the Kerguelen Islands?

My office has two separate zip codes... Regularly causes confusion with the delivery people.

While interesting, I can't see why any programmer would assume half of these ever. Why would anyone go out of their way to restrict place names to be in the "usual character set of the country"?

No one would go out of their way to, but it's pretty common to have these cases break because no one's ever bothered to test them. On-screen keyboards don't have characters no one will ever type, fonts don't have characters no one will ever display, sorting and string manipulations may not bother to handle accented characters correctly if there will never be accented characters, etc.

No one actively thinks "I'm going to intentionally omit solid unicode support"; we just don't bother with it until we feel there's a good reason to, and by then it's often too late.
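One concrete way this bites, using the "Croÿ" from elsewhere in the thread: the same visible text can arrive as different code point sequences, and a naive comparison treats them as different places. A small sketch with Python's standard `unicodedata` module (the helper name is made up):

```python
import unicodedata

composed = "Cro\u00ff"     # ÿ as a single code point
decomposed = "Croy\u0308"  # y followed by a combining diaeresis


def same_text(a: str, b: str) -> bool:
    """Compare after normalizing both sides to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)
```

Both strings render the same, but `composed == decomposed` is false and their lengths differ, so lookups, de-duplication and length limits all need the normalization step.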

Data validation - you want to make sure your users are entering addresses correctly, and catch errors early (say, at checkout) rather than result in a negative experience (say, a missed delivery or returned package).

You may also want to make sure your customers have entered their full address, rather than a short form that cannot be used (see: "123 Fake St", without any markers for city, county, country, etc) - and doing so necessarily requires some structured understanding of addresses... which comes with all the pitfalls of assumptions.

There are also uses for addresses that aren't necessarily about delivering a physical item to said address - for example determining the correct taxes to charge a customer based on zip code (some zip codes do not map to a physical area, therefore are not useful for determining taxation).

There are lots of perfectly understandable reasons why programmers would assume the format of an address.

It's not that you'd restrict it, but you might not test it with "weird" characters, and it might break your application in some way (e.g. layout).

Would be nice if there was an ANSI/ISO standard for addresses like we have for dates.

Why the hell do these things keep coming up?

Can we just concat "Falsehoods (Arbitrary group of people) believe about (Arbitrary Massive Topic)" and just post it directly to a wiki?

This seems like the equivalent of "10 ways you're doing sex wrong".

They keep coming up because people find them interesting and/or helpful in some way. If you don't, that's ok, you won't like everything posted to HN. Just don't click and move on.

Almost any topic in CS has a multitude of edge cases and potential pitfalls. I've learned quite a few tricky edges to things I had never considered just reading through RFCs and standards docs on topics like calendars, Unicode, floating point numbers, and lots more. Practical guides like these blog posts of "Things I've really encountered problems with" are just the sprinkles on top.

Such a wiki would be great, but it would basically amount to collecting all the practical knowledge of every domain to which computers have ever been applied!

It's not spitefully calling out programmers for being ignorant. It's an invitation for reflection and getting some perspective.
