How to untangle phone numbers (factbranch.com)
45 points by tosh 13 days ago | 48 comments





I particularly like how the UK phone number examples are not written how we would write them in the UK, which I guess underlines the point.

True, I’m not sure I’ve ever seen a 2 + 4 (e.g. 12 3456) style six digit number in the UK. 01234 567890 is probably what I’d expect; maybe 01234 567 890.

I've always seen it with the leading zero part of the area code to make 4, then 3 then 4 grouping. The leading zero was always listed.

e.g. 0123 456 7890

and 01/02 prefixes are landlines, 07 are mobile phones, 08 free, and 09 premium rate lines.

Wikipedia [0] says that there are 8 digit local numbers, and 2/5 digit area codes, but I happen not to have lived in a place that uses them until mobile phones took over anyway.

[0] https://en.wikipedia.org/wiki/List_of_dialling_codes_in_the_...


I like how the first US number does not have enough digits to be valid.

It is all just extremely confusing if you stare at the examples as I am pretty sure there can't be a country code +12... that would get parsed as +1. So, that first one is actually +1(234)567-8901 ;P.

Obviously no country code can be a prefix of another one. The switches would not know how to interpret it.

But 2 or more countries can use the same country code. The area code will determine which country you call to.

In the North American numbering plan +1 all numbers must have a fixed length 3 + 3 + 4.

In many other numbering plans the length varies. Both area code length and subscriber number length can vary. So it gets pretty complicated to parse a number. Basically rules don't help, you need a database and you need to update it regularly. No idea whether an open, non-commercial data set exists.
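A rough sketch of the two points above (country codes are prefix-free, and within +1 the area code picks the country). The tables are tiny hand-picked subsets standing in for the database you would really need, and the function names are made up for illustration:

```python
from __future__ import annotations

# Tiny illustrative subset of country calling codes; a real table is much longer.
COUNTRY_CODES = {"1", "33", "39", "44", "49", "54", "61", "81"}

# Illustrative subset of +1 area codes that belong to countries other than the US.
NANP_REGIONS = {"242": "Bahamas", "246": "Barbados"}


def split_country_code(e164: str) -> tuple[str, str]:
    """Split '+CCnnnn' into (country_code, national_number)."""
    if not e164.startswith("+") or not e164[1:].isdigit():
        raise ValueError(f"not an E.164-style number: {e164!r}")
    digits = e164[1:]
    # Country codes are 1 to 3 digits long and no code is a prefix of another,
    # so at most one of these lengths can match.
    for length in (1, 2, 3):
        if digits[:length] in COUNTRY_CODES:
            return digits[:length], digits[length:]
    raise ValueError(f"unknown country code in {e164!r}")


def nanp_region(national_number: str) -> str:
    """For a +1 number, use the 3-digit area code to pick the country."""
    return NANP_REGIONS.get(national_number[:3], "US/Canada/other NANP")
```

With this, `+12345678901` splits as country code `1` plus a ten-digit national number, never as a hypothetical `+12`.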

When I call to Germany Android shows the name of pretty small places. I believe Android reports every number you call to Google. But the database is probably still local, I believe locations are also shown when you have no data connection.

I am not from the US, but I understood there you might also need extensions to be dialled after the number proper.


The way the system used to be in the USA, area codes only had 0 or 1 in the middle. So only +120 and +121 would be USA numbers until they shifted in 1995 to allow US area codes with other digits in the middle. I'll guess any of those little islands only get subsets of the number space to maintain compatibility. Once I almost got ripped off by this: someone called an adult number with a domestic-looking area code that went to Guyana in South America. I had American Long Lines block all foreign and all pay numbers; they knew it was a foreign porn number and tried to get their cut. I refused, as they knew it was international and, worse, they knew it was porn. I knew to block because my roommate was still in college and I assumed issues would arise without those blocks.

There's no +1234 (you'll end up in a number block allocated for Ohio), but other +12* numbers can go international. +1242 will end up in the Bahamas and +1246 is the international dialing code for Barbados. A bunch of Caribbean islands are part of the American numbering plan: https://en.wikipedia.org/wiki/List_of_North_American_Numberi...

But that isn't because +12 is a country code: it is because, as your link even shows, some of those codes are international. I did not claim anything about the country, only the parsing process and how using +12 as an example is just confusing the whole matter, as the reader is probably familiar with +1 and +12 can't exist.

The E.164 Standard is also a great place to start.

E.123 is for printing and general-purpose use (as in URLs: https://datatracker.ietf.org/doc/html/rfc2806#section-2.2).

While the E.164 is (primarily) for storage and processing.


> If a local number starts with a single 0, strip the 0 and prepend the country code. If it starts with 00, strip the 00 but assume the country code is already there.

This will work for most of the world, but not everywhere. There are countries out there that don't use 00 as the international call prefix (Wikipedia has a list: https://en.wikipedia.org/wiki/List_of_international_call_pre...).

Take Australia, for instance, where 0011 is the prefix to strip to turn an international number into a local number, but 0018 is what you use to route a call to Optus. Australia's country code is not 18!

All of this can be prevented on mobile phones by simply placing a + at the front, but landlines (and systems simulating landlines, such as VoIP) quickly become part of a hellish web of deviating telephony standards.

Like email, the format may look simple, but only if you pretend not to have to deal with any other country or culture in the future.
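To make the international-prefix point concrete, here is a minimal sketch that strips the prefix based on where the number was dialed from, instead of assuming 00 everywhere. The table is a small hand-picked subset and `to_plus_form` is a hypothetical helper, not part of any library:

```python
# International call prefixes for a few countries (illustrative subset only).
IDD_PREFIX = {"DE": "00", "AU": "0011", "US": "011", "JP": "010"}


def to_plus_form(dialed: str, dialing_from: str) -> str:
    """Turn an internationally dialed number into '+<country code><number>'."""
    text = dialed.strip()
    has_plus = text.startswith("+")
    digits = "".join(ch for ch in text if ch in "0123456789")
    if has_plus:
        # A leading '+' already means "international", whatever the country.
        return "+" + digits
    idd = IDD_PREFIX[dialing_from]
    if digits.startswith(idd):
        return "+" + digits[len(idd):]
    raise ValueError(f"{dialed!r} is not an international number as dialed from {dialing_from}")
```

For a German number dialed from Australia, `to_plus_form("0011 49 30 123456", "AU")` gives `+4930123456`, whereas a blanket "strip 00" rule would produce `+114930123456`.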


Tangent - I really like Factbranch. At a previous company we used it to display data from our production database within Zendesk. It was a snap to set up the connection and also easy to edit the template.

Fun fact, I have vanity numbers ending with NNN-0001 and 0000 on my cell phones and I have received ZERO telemarketer calls to date on either number. Whatever mass calling software telemarketers use, they won’t call numbers that look obviously fake.

Unfortunately -9999 seems to get lots of misdials.

A small note "If a local number starts with a single 0, strip the 0 and prepend the country code" is FALSE for at least Italian numbers.

For instance, the Turin, Milan, Rome and Genoa landline prefixes are 011, 02, 06, and 010. Every landline number in Italy starts with a leading zero, BUT you can't strip it.

Let's say you have a Rome landline, 061234567: it can't be called as +3961234567, it MUST BE +39061234567. On the contrary, in France, where essentially all numbers start with zero (mobile included, which in Italy start with 3, no leading zero), you MUST cut the leading zero when you call with an international prefix.

That's why in some countries the leading 0 is written as (0), meaning "you might or might not need it depending on which extension of which country you originate the call from".
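In code, that means the "strip the zero" step has to be a per-country setting rather than a universal rule. A minimal sketch with only the two countries mentioned above; the table layout and function name are made up for illustration:

```python
# Per-country rule: is the leading 0 a trunk prefix (dropped abroad) or part of the number?
NATIONAL_RULES = {
    "FR": {"cc": "33", "strip_leading_zero": True},
    "IT": {"cc": "39", "strip_leading_zero": False},
}


def national_to_international(national: str, country: str) -> str:
    """Convert a nationally written number to '+<cc><number>' for the given country."""
    rule = NATIONAL_RULES[country]
    digits = "".join(ch for ch in national if ch in "0123456789")
    if rule["strip_leading_zero"] and digits.startswith("0"):
        digits = digits[1:]
    return "+" + rule["cc"] + digits
```

So `national_to_international("061234567", "IT")` keeps the zero, while the French call drops it.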


At my first job out of college I wrote a client-side phone number validator for US phone numbers. I was quite pleased with myself at the time, but I'm sure it was crap in retrospect. I remember reading up on the North American Numbering Plan and being sure to exclude phone numbers with e.g. a prefix starting with 1 and the 555 numbers that are unassigned (which is not all of them, to be clear).
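For the curious, a validator along those lines might look roughly like the sketch below. This is not the original code, just an approximation of the structural rules as I understand them: area code and exchange must start with 2-9, neither may be an N11 service code, and 555-0100 through 555-0199 is the range reserved for fictional use. It says nothing about whether a number is actually assigned.

```python
import re

# Optional +1 or 1, then area code, exchange, and 4-digit line number.
NANP_RE = re.compile(r"^(?:\+?1)?([2-9][0-9]{2})([2-9][0-9]{2})([0-9]{4})$")


def is_plausible_nanp(raw: str) -> bool:
    """Structural check only; does not prove the number exists."""
    digits = re.sub(r"[^0-9+]", "", raw)
    match = NANP_RE.match(digits)
    if not match:
        return False
    area, exchange, line = match.groups()
    # N11 codes (211, 411, 911, ...) are service codes, not area codes or exchanges.
    if area[1:] == "11" or exchange[1:] == "11":
        return False
    # 555-0100 through 555-0199 is reserved for fictional use.
    if exchange == "555" and line.startswith("01"):
        return False
    return True
```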

Sorry if you got stuck maintaining it :-)

As an aside, my fake phone number for bullshit store loyalty programs is 420-911-6969. CVS seems to cull it reliably, but other places have accepted it. I'm amazed that no human I've given that to has asked if it's actually my phone number.


I suppose if a computer needs to use this for texts or calls then normalizing makes sense. Other than that, I don't think it's worth it as most of the time the phone field in a database is just for humans to look at in the UI.

This article also completely skips over Phonewords, eg 1-800-Flowers

https://en.m.wikipedia.org/wiki/Phoneword


Yes but users of your web site or app are pretty unlikely to put their phone number as one of these, aren’t they?

Not really, at least for B2B stuff.

I have access to a bunch of big marketing lists and there are always at least some Phonewords in there. The only big lists without them are the ones collected by forms that only allowed digits in the phone field.

Also on a personal level I have a "joke" number of (404)myname that I put into contact forms that allow it. Developers I talk to seem to have an easy time remembering my # because of the joke.
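If you do want to accept phonewords, converting them to digits is a short lookup. A minimal sketch, assuming the common ITU E.161 keypad layout (ABC on 2 through WXYZ on 9); the function name is made up:

```python
# Standard keypad layout, assumed here; older or national layouts can differ.
KEYPAD = {
    "2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
    "6": "MNO", "7": "PQRS", "8": "TUV", "9": "WXYZ",
}
LETTER_TO_DIGIT = {
    letter: digit for digit, letters in KEYPAD.items() for letter in letters
}


def phoneword_to_digits(text: str) -> str:
    """Map letters to keypad digits, keep digits, drop everything else."""
    out = []
    for ch in text.upper():
        if ch in "0123456789":
            out.append(ch)
        elif ch in LETTER_TO_DIGIT:
            out.append(LETTER_TO_DIGIT[ch])
    return "".join(out)
```

`phoneword_to_digits("1-800-Flowers")` returns `18003569377`.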


Shout out to http://phonespell.org, on the internet since 1995.

> What language is PhoneSpell written in?

> PhoneSpell is a system of multiple parts, some in C++, some in Perl, some in C, some in shtml, and some in shell scripts.

Checks out.


The mapping from letter to digit varies by country. Less so nowadays with mobile phones, but still.

> The way someone writes a phone number can give you hints about the country and area codes.

Or other things, depending on the country. For example, in Japan, area codes 090, 080, and 070 indicate mobile numbers. 050 means it's an IP phone, irrespective of area (hikari denwa).


In the UK, 01, 02 area codes are land lines, 07 are mobiles, 08 are freephone or fixed price services, and 09 are premium services. 03 and 05 are in use but less common, 04 and 06 are unused.
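That breakdown fits in a small lookup table. A sketch that only encodes the categories listed above, for numbers written in national format with the leading 0; the labels and function name are my own:

```python
# First two digits of a UK national-format number -> rough category.
UK_PREFIX_KIND = {
    "01": "landline",
    "02": "landline",
    "03": "in use, less common",
    "04": "unused",
    "05": "in use, less common",
    "06": "unused",
    "07": "mobile",
    "08": "freephone or fixed price",
    "09": "premium",
}


def classify_uk(national: str) -> str:
    """Classify a UK number such as '01234 567890' by its first two digits."""
    digits = "".join(ch for ch in national if ch in "0123456789")
    return UK_PREFIX_KIND.get(digits[:2], "unknown")
```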

... and here in Buenos Aires, we have a phantom 9. If your cell phone is +54-11-5555-5555 in some apps/webpages you must write +54-911-5555-5555. But not in all apps/webpages, so sometimes you must try both options until the app/webpage is happy.

Also, from a cell phone you can just use 5555-5555, but from a line phone you must add a 15, i.e. 15-5555-5555. So users type whatever combination of 11, 911, 15 or nothing they think is the best one.
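One way to cope when matching stored numbers is to generate both spellings and compare against either. A rough sketch based only on the +54 / +54 9 pattern described above; it assumes the input is already in + form and ignores the domestic 15 prefix, and the helper name is hypothetical:

```python
from __future__ import annotations


def argentina_variants(e164: str) -> set[str]:
    """Return the number with and without the 'phantom 9' after +54."""
    if not e164.startswith("+54"):
        raise ValueError("not an Argentine (+54) number")
    rest = e164[3:]
    # Assumption: a 9 directly after the country code is the mobile marker.
    without_nine = rest[1:] if rest.startswith("9") else rest
    return {"+54" + without_nine, "+549" + without_nine}
```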


For my phone at work, I could need a "how to untangle phone cables" guide! :)

I used to just store them as e164 (always with the country code), and as bigints.

you need a bit of country-code based metadata to convert back to strings, but the storage format is ultra compact and unambiguous.
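The conversion itself is tiny. A minimal sketch of the round trip (function names are made up); it works because an E.164 number has at most 15 digits, which fits in a signed 64-bit integer, and never starts with 0, so no leading zeros are lost:

```python
import re

# '+' followed by 2 to 15 digits, first digit non-zero.
E164_RE = re.compile(r"\+[1-9][0-9]{1,14}")


def e164_to_int(e164: str) -> int:
    """Validate an E.164 string and return it as an integer for storage."""
    if not E164_RE.fullmatch(e164):
        raise ValueError(f"not a valid E.164 number: {e164!r}")
    return int(e164[1:])


def int_to_e164(value: int) -> str:
    """Turn the stored integer back into its '+...' string form."""
    return f"+{value}"
```

Pretty-printing with spaces or brackets is the part that needs the per-country metadata.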


Is the storage savings worth this added complexity vs just using normalized strings? What do you do when it needs to store `*555` in such a field? To me, in 2024, this seems like using a bitmask in a database to pack a bunch of bools into a single byte: technically it could be perfectly valid (ignoring the example above), but you're probably going to ruin someone's day who comes along after you and just wants things to work.

What customer has *555 as their phone number?

Edited to add:

It does depend on what you are doing with the numbers. Focussing on the storage side is missing the point. It’s about the ambiguity. In general I’ve found that anything other than integers is a mistake.

For example, *555 is not a valid phone number. You can’t put it in a tel: URL and you can’t dial it or send a text to it. It will work only in limited situations. If you want to store an extension it should be a separate field.

Once you realise that phone numbers are functional, almost exactly like IP addresses, you realise that storing them as integers has benefits because that way you literally are unable to store useless data.

Once you get in the habit of using only e164 numbers, the phone number becomes usable in almost any context, and you can make functional assumptions about it.

As an aside, it’s much easier to create fast and small prefix indexes on integers, but that’s a different story…


Maybe not a regular customer, but there's other reasons you might have a phone field:

> In Israel, certain advertising numbers start with a *.

> In New Zealand, non-urgent traffic incidents can be reported by calling *555 from a mobile phone.


> I’ve found that anything other than integers is a mistake.

They're not integers, they're strings. Strings of digits it's true, but still strings.


Actually, a better example is 911. It is a phone number, you can dial it, but has no E164 representation.

Wouldn't it be (assuming US) +1911?

+1-911 isn't a valid E164. 911 and similar numbers can't be represented in global form, they are only represented in local form.

I discovered that the tel: URL supports local numbers with context. So the bigint scheme can't represent valid tel: URLs.
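For reference, the two shapes look roughly like this. A minimal sketch following the RFC 3966 syntax as I read it, where a global number starts with + and a local number carries a `phone-context` parameter; `tel_uri` is a hypothetical helper and does no validation of the digits:

```python
from __future__ import annotations


def tel_uri(number: str, phone_context: str | None = None) -> str:
    """Build a tel: URI for a global (+...) or local (with context) number."""
    if number.startswith("+"):
        return "tel:" + number
    if phone_context is None:
        raise ValueError("a local number needs a phone-context")
    return f"tel:{number};phone-context={phone_context}"
```

So a US emergency number would come out as `tel:911;phone-context=+1`.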


I still feel like my point is being lost.

A generic contact identifier that you’re only going to show to other humans can of course be a string. People can put whatever they want in there. But it won’t be useful for automation or other kinds of processing.

If you’re going to actually use the number in any kind of automation, including tel: urls, it’s better as an integer because then you’re forced to process the incoming number properly before you put it in the database.

In terms of E164, You’re not ever going to store numbers like 911 in a database of phone numbers that you’ll plug into automations.

The whole point of E164 is to normalise the international numbering system. Why would you not use the international telecommunications standard for numbering, to store telephone numbers that are intended to be used to contact people?

It’s the same as storing an IP address as a string. Sure, you can do it, but why? The only possible outcome is that you will eventually store useless strings that aren’t actually valid.

Of course the tel: url scheme can support E164 numbers. I mean unless your app is limited to only a single geography, why would you even want it to be local only?


> What customer has *555 as their phone number?

"Surely we won't ever need it"


“Surely accepting randomly formatted text instead of an international standard won’t ever cause problems”

how about accepting anything that can be typed with a phone keyboard `[0-9()+*;]` or whatever the right chars are? going from phone keyboard to database and back to phone keyboard should not modify what gets dialed
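Something like the sketch below, where the character set is a guess on my part (digits, +, *, #, comma and semicolon for pause/wait, plus the usual separators) and the value is stored exactly as typed:

```python
import re

# Assumed set of dialable characters plus visual separators; adjust as needed.
DIALABLE_RE = re.compile(r"[0-9+*#,;() .\-]+")


def accept_as_typed(raw: str) -> str:
    """Validate the characters only and return the input unchanged (minus outer spaces)."""
    value = raw.strip()
    if not DIALABLE_RE.fullmatch(value):
        raise ValueError(f"contains characters a phone keypad cannot dial: {raw!r}")
    return value
```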

DTMF signalling restricts what you can dial over a traditional PSTN line. In practice you can only dial what a regular landline phone can dial. * and # are used to signal commands rather than calling numbers. There are another 4 DTMF signals, but CPEs don't generally use them.

So that leaves you with just the 10 digits. Which can be easily stored in an integer, which can then be formatted reliably for humans according to the international numbering plan.

Like IP addresses, phone numbers are just numbers. You can fuck about all you like by adding funny letters and brackets and complicated parsers (just like you can add dots and colons to an IP address), OR you can just store them as normalised E164 numbers, which will work everywhere and for which there are clear formatting rules. Just like you can store an IP address as an IP address datatype in most databases.

Almost nobody wants to store the NZ time service or 911 or anything else in a large database. Nobody is dialling an extension number over the PSTN network - that's like trying to include the port number in an IP address and calling it ... an IP address. Such numbers should be stored somewhere else.


Lots of extensions are written like that.

An internal extension on your PBX

One huge problem with this is that you need to normalize every phone number. Lots of people enter local numbers without country code, and you need to know the country.

For most phone numbers, you just need to round trip them. They will be presented in the same context. It is better to store ambiguous numbers as entered, along with context, and let a human figure it out than to mess it up.


It's trivially easy to deal with this on input, for example using https://intl-tel-input.com

> For most phone numbers, you just need to round trip them

That would be a great entry in a book, "myths developers believe about phone numbers".


remember to dial 9 first to get out.

And god help you if you get assigned an extension that matches a popular first 4 digits of phone numbers in your area, because you will forever get calls by people that forget the 9.

Or 0, depending where you are.

or M, for murder

Just use ChatGPT API lol


