
Falsehoods Programmers Believe About Names (2010) - rahuldottech
https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/
======
staticassertion
I've stopped using my real name. My last name has an apostrophe, and at this
point I'm just sick of websites rejecting it and then forcing me to refill in
all of my information.

So, now I drop the apostrophe.

In high school my school ID was completely broken. It printed <first half of
last name> <first letter of last name> <middle name> <first name>.

They ended up putting a quotation mark instead of the apostrophe - somehow the
machine handled that fine. That was the workaround.

Hotel wifi systems tend to break if they ask for a last name to connect.
Again, training me to reserve the hotel under a 'fake' last name.

I actually notice it every time I write my last name. I'm like "is this a
system that will break" and now I'm inclined to just never use the "real"
spelling because then I'll get a mismatch later on.

This is something I've really noticed over the last decade. The older I get,
and the more systems become a part of my life, the more systems I have to give
a fake name. Paper records will show my true name, but I wonder what, at this
point, the digital records would show it to be.

~~~
maire
I was at Xerox when they were inventing i18N in the mid 1980s. They were so
excited when I started because my real name is Mairé and not Maire. They
wanted my login name to include the acute accent.

Over time all of the i18N people left for Apple. My login broke and I removed
the accent. When I went to Apple I never bothered to add the accent again even
though I was friends with the Unicode gang. I never added it back.

~~~
sushid
Username relevant :(

~~~
jacurtis
Ouch, that one hits close to home

------
partyboat1586
In the last sign-up system I built, you get a single max-length UTF-8 field called
'Name'. This field is not used at all in the UI except on the profile screen
where it appears in full and in email templates where it also appears in full.
It might not be perfect but it's as good as you are going to get without
having to become an expert in names. I'll leave full localisation to companies
that can afford to dedicate a team to it.

~~~
cprecioso
In my university (TU Delft), there are people from all kinds of countries, so
they have a (I suppose legally-mandated) first name / last name entry, but
also a "How should we call you" entry, in which you can write anything you
want, and that's what will appear in the UI and class lists.

------
cblum
Tell me about it.

I'm Brazilian, a country where the vast majority of people will have _at
least_ two last names (typically one maternal, one paternal, but not
uncommonly multiple from either or both parents), separated by spaces.

I've immigrated to the US, where the standard is for people to have a single
last name, or if they have multiple, they're separated by hyphens.

To make my situation a little worse, legally I have two first names i.e. the
second one is not registered as a middle name in any official document I have.

I've dealt with systems that wouldn't accept my last names. I either had to
join them together, or separate them by a hyphen.

Some systems don't accept my legal double first name. When I got my driver's
license in WA, the system accepted it but later had trouble generating the
license identifier, so they had to manually apply the system's logic and enter
it by hand.

Some systems just ask you to enter your full legal name, and then try to be
smart about figuring out what your first, middle, and last names are. So I end
up with the first of my last names being registered as a middle name, and I
can't edit it.

Fortunately I'm close to being able to apply for naturalization, and that'll
give me the opportunity to change my name. I'll just drop part of my first
name and one of my last names and finally make it simple.

Edit: another thing that's funny wrt names in my life is my family situation.
My wife has her own two last names (it's not super common in Brazil for women
to take their husband's last name), and she has a daughter from her first
marriage. So we all have totally different last names, with only my wife and
my stepdaughter sharing one of their last names. That caused us a little bit
of trouble crossing the border to Canada once.

------
rdiddly
Ha! This _has_ to be in response to this from several hours ago:
[https://news.ycombinator.com/item?id=21490850](https://news.ycombinator.com/item?id=21490850)

Wherein someone named Amr keeps having his first name split into Mr. A by the
airline booking system. Which would be bad enough, even if they didn't also
insist that a name must be at least 2 characters. Good times.

~~~
Piskvorrr
It's almost a decade old...but still brings out the Followers Of Holy ASCII ;)

See also the article about Christopher Null.

------
jacquesm
If you think names are hard, try email addresses. Most complex validator I
ever wrote, and then had to switch off lots of it because email addresses in
the wild don't actually follow the RFC...

In school we had a guy whose last name was 'Van', a fairly common prefix for
names in NL, which tends to be used like this 'van den Broek' or 'van der
Berg'.

Teachers would ask for his name, he'd say 'Jan Van' and they'd invariably ask
'Jan van Wat?'. So he eventually gave up and answered 'Jan van Wat' right off
the bat, which would leave the teachers even more confused because there was
no such person on their lists of students...

Given that we know at least one guy whose first name is Van (Morrison) it is
theoretically possible there is a person called Van Van somewhere.

~~~
t0mbstone
Here's an example regex that implements the official RFC 5322 email validation
rules, along with the "preferred" syntax from RFC 1035 (which is one of the
recommendations in RFC 5322):

(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])

Alternately, you can just use this super basic regex which handles all of the
common use cases (while still allowing all of the weird edge cases), and
functions as a quick "smoke test" for email addresses:

[^@\s]+@[^@\s]+

Basically, require at least one character (except a space, a line break, or an
@) followed by an @, then at least one character (except a space, a line
break, or an @).

The main flaw of this simple regex is that it is too permissive and potentially
allows weird edge cases like special UTF-8 characters that might not be
allowed.

But hey, one might argue that if a user is intentionally entering an invalid
email address with wonky characters to create an account, then that's kind of
their fault when they never receive the account activation email with the
email validation link, right?
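
A minimal Python sketch of that smoke test, for illustration only. The function name `looks_like_email` is made up here, and the pattern is just the permissive "something@something" check described above, not an RFC 5322 validator. Real verification would still come from the activation email.

```python
import re

# Permissive smoke test: one or more characters that are neither whitespace
# nor "@", then a single "@", then the same again. Not an RFC 5322 validator.
SMOKE_TEST = re.compile(r"[^@\s]+@[^@\s]+")


def looks_like_email(address: str) -> bool:
    """Return True if the whole string has the shape something@something."""
    return SMOKE_TEST.fullmatch(address) is not None


if __name__ == "__main__":
    candidates = [
        "user@example.com",
        "o'brien+tag@example.co.uk",
        "no-at-sign",
        "a@b@c",
        "has space@example.com",
    ]
    for candidate in candidates:
        print(candidate, looks_like_email(candidate))
```

Using `fullmatch` rather than `search` matters here: with `search`, a string like `a@b@c` would pass because part of it matches.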

~~~
jacquesm
Prediction: it will fail on many email addresses that are out there in the
wild and in daily use. The problem with 99% cases is that if you handle enough
accounts that 1% becomes a serious tech support issue. In the end, the best
test of whether an email address is valid is whether the mail arrives with the
intended recipient. So what you do if an email address does not pass
verification is prompt the user to that effect but do not force them to do
anything at all.

~~~
mathisonturing
Is that a prediction for the simple @ validation that was suggested? If so,
I'm curious about possible cases

~~~
jacquesm
GP's comment was edited extensively since I replied.

------
anonymfus
The few commonly believed falsehoods which are missing in this list at the
moment of me loading the page with it:

n. Names can be included in any sentence in the same form as user entered
them.

For example any language like Russian where names decline breaks it.

n+1. Names are grammatically nouns

In Russian, last names and patronymic/matronymic names are usually adjectives.

n+2. Any person's name can be disclosed to people who know them by their other
name.

For example if some LGBTQIA person has homo/bi/trans+-phobic relatives they
financially depend on then such disclosure can be a catastrophic outing.

~~~
Merrill
Names are associated with roles, not with persons. This is particularly true
for prefix and suffixes. Mayor Lyndon Smersh and Lyndon Smersh, Esq. are
different roles as politician and lawyer of the underlying person Lyndon
Smersh. Many women also have multiple names, e.g. a married name used socially
and their original name used professionally.

When dealing with persons, it's necessary not to confuse their roles.

------
pteraspidomorph
Once I was working in a piece of software that handled people's names. The
client wanted exactly a first name and a last name for every person in the
database. I tried to explain why that could be a problem, but they weren't
interested. It isn't necessarily the programmer's fault.

~~~
hn_throwaway_99
TBH, I disagree with most of the points in this article, or at least the implied
idea that a particular piece of software needs to handle every permutation of
how someone wants to be addressed.

I'm pretty sure most US legal documents require some form of given name and
some form of family name. If you're trying to interface with a US government
system, and that's your primary use case, a single first and last name will
probably suffice (assuming it's not too arbitrarily short, people can change
their name, etc.)

~~~
rovr138
I’m from the US. Since birth, neither my legal first name nor my last name has been one word. My
first name also includes an é.

My birth certificate has all of these, as does my passport. My driver’s
license doesn’t have the é.

Truncating my name is how I get mail from people states away who have never
lived here. One of them being my dad. He’s never lived in this state so it’s
not due to an old address.

That’s also how I commit felonies by opening those letters thinking they’re for
me.

~~~
hn_throwaway_99
Spaces and é can easily be represented in a single DB column, and none of what
you wrote precludes having a single "first name" and "last name" field to
represent your name in a DB.

------
beerandt
We didn't name our baby until about two weeks after birth, which probably
wouldn't have been a problem, except for the required daily jaundice checkups
and blood work, all outpatient.

While delivery and the nursery apparently are set up to deal with this (by
barcoding the baby's patient bracelet), computers in other departments were
not, at least without navigating multiple menus and screens.

Every computer that required patient name and birthdate to be entered was a 10
minute process, to the point that we were famous by the third day of visits.

At some point, a higher up was able to log in and "fix" the name to be
discoverable by just last name and birthday, or via the mother's patient info. I'm
not exactly sure how, but I was impressed with it being "fixable" so easily
(for us at least), especially network wide.

I also understand now why the vital records people were so persistent in
trying to get us to finish the birth certificate before discharge. We
obviously didn't.

------
meristem
Programming decisions around names get encoded operationally as well: within
the last decade I missed a flight when a major airline's system did not accept
my hyphenated last name at the check-in kiosk. I was then told by customer
service I did not need the hyphen in my name -- their system was correct, and
my view of my name was not.

------
jackschultz
Oh my god names are so, so, difficult to deal with. I'm doing something where
I get information / stats about players from different sites, and it's like
all of them have different names. For example:

    
    
      Joseph Smith     // nice and generic
      Joseph Smith Jr. // gotta make sure we tell people he's the son of another Joseph Smith
      Joseph Smith Jr  // psh, we don't use periods!
      Jose Smith       // if from a different country, some sites keep their native name
      José Smith       // yup, sometimes accents are kept
      Joe Smith        // nicknames, all about those.
      Joey Smith       // kid nickname, sure
      Jos� Smith      // literally even had to deal with a site that had unknown unicode letters
      Juan Jose Smith  // sometimes, people go by their middle names, but some sites like to use their full name instead
      Juan Smith       // remember how some people go by their middle names? Maybe a site only gives you their first and last
    

How, how in the world is someone supposed to go through and match all those
names with each other in a somewhat quick manner? I can't imagine it's fully
solved because there'll always be other cases.

I've tried a couple ways, like removing periods and accents, removing "Jr",
"Sr", "III", removing all vowels even. And then went further and tried some of
the string matching libraries that return a number with how close words are to
each other. That would help cases where Joe, Joey, Joseph would come back
quite close, but then I'd run into the issue that "Darren Smith" and "Darek
Smith" would come back so similar that the computer thought they were the same
person.

In cases like names, that Patrick writes about here, yeah, they're almost
impossible to get right and it ends up being mostly by hand, which is fine
since I can get it correct, but eats up so much time.
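
A rough Python sketch of the approach described above (strip accents, periods and suffixes, then score with a string-similarity function), offered as an illustration and not as a solution. The suffix list and the helper names `normalize` and `similarity` are assumptions, and `difflib.SequenceMatcher` stands in for whichever string matching library was actually used. Stripping accents is itself lossy, so this is only reasonable for fuzzy matching, never for storage or display.

```python
import unicodedata
from difflib import SequenceMatcher

# Hypothetical list of generational suffixes to drop before comparing.
SUFFIXES = {"jr", "sr", "ii", "iii"}


def normalize(name: str) -> str:
    """Lowercase, strip accents and periods, and drop generational suffixes."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    tokens = no_accents.lower().replace(".", "").split()
    return " ".join(t for t in tokens if t not in SUFFIXES)


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


if __name__ == "__main__":
    print(normalize("José Smith Jr."))
    print(similarity("Joe Smith", "Joseph Smith"))
    print(similarity("Darren Smith", "Darek Smith"))
```

In this sketch the Darren/Darek pair scores higher than the Joe/Joseph pair, so no single threshold accepts the nickname while rejecting the different person. That is the failure mode described above.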

~~~
Jasper_
Manual data entry. There is no such algorithm that can do it.

~~~
svnpenn
You can do it with AI

[https://youtube.com/watch?v=gQddtTdmG_8](https://youtube.com/watch?v=gQddtTdmG_8)

~~~
Jasper_
The problem is that you're using names as a proxy to ask whether two people
are the same. If I ask you: "in this film, are Will Smith and Will Smith the
same", you do not have enough information to answer, because there are at
least 74 Will Smiths on record in film credits:
[https://www.imdb.com/find?q=will%20smith&s=nm&exact=true](https://www.imdb.com/find?q=will%20smith&s=nm&exact=true)

And this is from an industry with guilds and unions to try to ensure that no
two credits have the same written name.

You can make a very good guess, but names can't tell you enough to determine
if these two people are the same or different. One person can have many names,
and many people can have the same name.

~~~
krferriter
We should like, give tax breaks or pay people based on how nationally-unique
they name their child. Should be an easy system to implement. Something like
inverse log or inverse root of the frequency of the name within the existing
population.

------
te_platt
The one I would add is:

Names are a reasonable indicator of the person's sex. Go ahead and prefix a
Mr. / Ms.

~~~
Zekio
It is even better when the assumed sex changes depending on what country you
are in

~~~
Piskvorrr
...and what decade.

[https://www.scarymommy.com/girl-names-formerly-boy-names/](https://www.scarymommy.com/girl-names-formerly-boy-names/)

~~~
cprecioso
I think GP was rather referring to names such as Andrea (which is a
traditionally female name in Spain, but traditionally male in Italy).

~~~
Piskvorrr
I'm aware of this, having such a name myself. Border control agencies eye me
with various degrees of suspicion for this, depending on what gender my name
is supposed to signify to them. (Yes, that's in the passport as well,
but...well, bureaucracy)

------
pkaye
I worked at a place where the convention was for employee emails to be first
initial plus last name. Well, this guy had a last name that was just the letter "I"
and his first name started with "H". So his email became hi@company.com.

------
stevoski
> People have last names, family names, or anything else which is shared by
> folks recognized as their relatives.

My fav example of this is U Thant, who was Secretary General of the United
Nations.

His name was actually just “Thant”. The U is something akin to “Mr” in
Burmese.

~~~
noneeeed
I once met a guy whose name is Tree. That's it, just Tree.

I can't imagine how often he must encounter systems that get tripped up by
that.

~~~
spc476
In college, there was a student I knew whose name was officially "The
Rebellious One," and went by the nick name "Rebel One".

------
Ayesh
Speaking of names, my full name has 6 parts, and I share 4 of them with my
brother. We never use the first three, and instead use the 4th and 6th names to
make the full name that satisfies the falsehoods.

In the web sites I work on, I have decided to have one field that asks "How
would you like to be called", and allow full Unicode range in it. If you want
to use Emoji, fine, go for it. I have fewer data to store (privacy and data
protection concerns), and users feel better not typing the last name either.

------
ben509
Hopefully, "names correspond to pronounceable words" is only false in a Fry and
Laurie sketch[1].

[1]:
[https://www.youtube.com/watch?v=1LopIroSjsU](https://www.youtube.com/watch?v=1LopIroSjsU)

~~~
jfk13
Well, for a while there was the artist known as an unpronounceable symbol that
was sometimes explained as "the artist formerly known as Prince".

------
l0b0
"Preferred name," a UTF-8 string with absolutely no implied formatting, is a
great way to deal with a bunch of these. Simply ask people what they want to
be called and use that verbatim, always. Unfortunately the absolutely most
common misconception about names is that they are easy, so almost every system
tries to implement something more "clever," inevitably failing in stupid ways.
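
A minimal Python sketch of the "preferred name" idea, under stated assumptions: the `Profile` class and `greeting` function are hypothetical, and rejecting whitespace-only input is a choice made for this sketch, not something the approach requires.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """Hypothetical user record with a single free-form name field."""

    preferred_name: str  # stored and displayed verbatim, never parsed

    def __post_init__(self) -> None:
        # The only check is that something other than whitespace was entered.
        if not self.preferred_name.strip():
            raise ValueError("preferred name must not be empty")


def greeting(profile: Profile) -> str:
    """Use the name exactly as given: no splitting, recasing, or titles."""
    return f"Hello, {profile.preferred_name}!"


if __name__ == "__main__":
    for name in ["Mairé", "van den Broek", "Tree", "The Rebellious One"]:
        print(greeting(Profile(name)))
```

The point of the design is what is absent: no first/last split, no capitalization fix-up, no character whitelist.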

------
Merrill
Machine-readable Passport Name Standards

[https://egov.ice.gov/sevishelp/schooluser/machine-readable_p...](https://egov.ice.gov/sevishelp/schooluser/machine-readable_passport_name_standards.htm)

Actually, names should be considered to be descriptor data that can be
searched on, not as identifiers.

------
irrational
[https://travel.stackexchange.com/questions/149323/my-name-ca...](https://travel.stackexchange.com/questions/149323/my-name-causes-an-issue-with-any-booking-names-end-with-mr-and-mrs)

------
chris_wot
Elaine Yellow Horse was banned from Google+:

[http://randomtechnicalstuff.blogspot.com/2014/03/google-name...](http://randomtechnicalstuff.blogspot.com/2014/03/google-names-policy.html)

~~~
Mindless2112
Well, she didn't miss out on much.

------
neonsunset
Alright, this one got me at Unicode and people having names at all.

~~~
samatman
There are cultures in which it is considered unlucky to name a child before,
say, their second birthday, so it's never done.

In this case, it would be quite normal to have an 18 month old with no name at
all. I mean, you could _call_ them "Baby Surname" or something. But they don't
have a name.

~~~
pimmen
What if you have twins in that case? Or triplets? ”Baby 1 Surname”, ”Baby 2
Surname”, ”Baby 3 Surname”...

~~~
leovander
Typically entered as “Baby Surname”, “Baby Boy Surname”, “Baby Girl Surname”.
For twins, yeah I don’t think I’ve seen an index appended to their name. I
feel like even if the EHR system had the birth time, I would normally get
their birthdate starting at midnight. Let’s hope the EHR system keeps a unique
MRN for those twins, because down stream systems are totally going to
commingle their records. Same applies to Jrs living in the same house as the
parent, since their demographics match the parent’s other than birthdate.

~~~
Scoundreller
From what I’ve seen in multiple pregnancies; they’ll give a name like “Male
Child 1” for the first male birth in that pregnancy and so on.

All MRNs should be unique and everything else can be a duplicate.

Every system should support name changes and merges. John Does come in all the
time, and then you figure out their existing MRN, or their real name.

~~~
leovander
> All MRNs should be unique and everything else can be a duplicate.

> Every system should support name changes and merges. John Does come in all
> the time, and then you figure out their existing MRN, or their real name.

In theory yes, I’d hope to see that across the board, but fat fingering
happens way too often on top of the reuse of MRNs. I work in the IHE space,
so I can point fingers at most of the big guys as we accept their HL7/CDA
feeds and not even ids from the same system are consistent between the
formats. I don’t want to get too off topic with that last line, but yeah we
try to consolidate records that are received by leveraging Fellegi-Sunter[0].

[https://en.m.wikipedia.org/wiki/Record_linkage](https://en.m.wikipedia.org/wiki/Record_linkage)
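
For readers unfamiliar with Fellegi-Sunter, here is a rough Python sketch of its core scoring idea. Each field contributes log2(m/u) when two records agree on it and log2((1-m)/(1-u)) when they disagree, where m is the probability of agreement among true matches and u is the probability of agreement among non-matches. The field names and the m/u values below are invented for illustration; real systems estimate them from data, and nothing here reflects the pipeline mentioned above.

```python
import math

# Hypothetical m/u probabilities per field (illustrative values only).
# m = P(field agrees | same person), u = P(field agrees | different people).
FIELD_PARAMS = {
    "last_name": (0.95, 0.01),
    "first_name": (0.90, 0.02),
    "birthdate": (0.97, 0.003),
}


def match_weight(record_a: dict, record_b: dict) -> float:
    """Sum the per-field agreement or disagreement weights (log base 2)."""
    total = 0.0
    for field, (m, u) in FIELD_PARAMS.items():
        if record_a[field] == record_b[field]:
            total += math.log2(m / u)  # agreement raises the score
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement lowers it
    return total


if __name__ == "__main__":
    twin_1 = {"last_name": "Surname", "first_name": "Baby Boy", "birthdate": "2019-11-01"}
    twin_2 = {"last_name": "Surname", "first_name": "Baby Girl", "birthdate": "2019-11-01"}
    print(match_weight(twin_1, twin_1))
    print(match_weight(twin_1, twin_2))
```

With these made-up parameters, twins who share a surname and birthdate still get a clearly positive score, which echoes the commingling risk raised earlier in this subthread.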

------
FabHK
It's always annoyed me that this blog post above didn't contain examples, and
I'm grateful that someone here on HN in another thread posted a later post
that _does_ contain examples:

[https://shinesolutions.com/2018/01/08/falsehoods-programmers...](https://shinesolutions.com/2018/01/08/falsehoods-programmers-believe-about-names-with-examples/)

------
WilliamEdward
Linkedin is extremely upset at this post.

~~~
Piskvorrr
Has been for a decade now ;)

------
gtirloni
_> People’s names are all mapped in Unicode code points._

Well, what hope is there then?

~~~
Piskvorrr
There's hope in functional literacy. The list clearly states that this is
_not_ a bunch of checkboxes your app needs to tick, and that some of the
requirements are irresolvably contradictory.

------
kuharich
Prior discussion:
[http://news.ycombinator.com/item?id=1438472](http://news.ycombinator.com/item?id=1438472)

------
teekert
A friend's last name is Fun. A normal last name in the Netherlands. Facebook
doesn't accept it so now he has a fake name.

~~~
AllegedAlec
> Fun. A normal last name in the Netherlands

No it's not.

------
jancsika
> People’s names fit within a certain defined amount of space.

Ok I'll bite:

A defined amount of space: 250 Terabytes.

Please give me an example of an extant name which does not fit into that
amount of space.

Bulletproof edit: _or_ the physical space that is the facade of the Empire
State Building.

------
avip
I have personally encountered a system that used name + birthdate to uniquely
identify its "users" (large % of citizens of a country of non negligible
size).

That was absolutely insane. Try applying such logic in China or a big Arab
state.

~~~
noneeeed
That's terrible :/

The "birthday paradox" means you need just 23 people before you have a 50%
chance of a collision, and 70 gets you 99.9%...

~~~
Merrill
23 people born in the same year with the same name.
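
The 23 and 70 figures can be checked with a short Python sketch. It assumes birthdays are uniformly distributed over 365 days and independent, which real birth data only approximates, and the function name `collision_probability` is made up here. It also only applies to a group that already shares a name and a birth year.

```python
def collision_probability(people: int, slots: int = 365) -> float:
    """Probability that at least two of `people` share one of `slots` values,
    assuming values are uniformly distributed and independent."""
    p_all_distinct = 1.0
    for i in range(people):
        # The (i+1)-th person must avoid the i values already taken.
        p_all_distinct *= (slots - i) / slots
    return 1.0 - p_all_distinct


if __name__ == "__main__":
    for n in (22, 23, 70):
        print(n, round(collision_probability(n), 4))
```

How many people share one full name and birth year depends entirely on the population, so the risk for a name + birthdate key is driven by the most common names in it.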

------
burtonator
> People’s names are all mapped in Unicode code points.

Uh.. can't help you with that one!

------
razzimatazz
I propose we introduce a clever new concept we can call your "Fizz". You get a
first fizz, and a last fizz. You can change it whenever, but you need to have
one. But a fizz can only be ASCII characters without case and no spaces. We
just need a way to map (with some but not perfect reliability) your fizz to
your identity as proven by some authoritative third party.

The point being that 'name' is a term interpreted in culturally varied ways,
which I don't care about when I design most systems. So to make it clear, let's
not use that any more.

~~~
doubleunplussed
My first fizz is the ascii bell character. My second fizz is an ascii
backspace.

------
behnamoh
> 40. People have names.

Any examples where people _don't_ have names?

~~~
CapitalistCartr
In the USA, "John Doe" is used for an unknown name. God help someone with that
actual name. But hospitals commonly encounter the problem of a patient with
unknown name that must still be entered into the database. Presumably the
patient _has_ a name, it's merely unknown. Except babies with no or unknown
parents. Really no name.

~~~
Scoundreller
While someone could have the name John Doe, their real date of birth won’t be
some random day and month in 1867.

~~~
Piskvorrr
Unless you have a system which also references dead people. ("Oh! Those
exist?") There are administrative databases spanning centuries..."just use a
random year in the 19th century" is just a y2k bug in a new suit.

------
jajaioxjeyo
That's actually a great and informative article

------
adamnemecek
(2010)

------
Exuma
> People’s names are all mapped in Unicode code points

How is this not true?

Edit: it's a joke lol, as I get farther in the list

~~~
Jasper_
China has a long tradition of naming people with rare symbols that might not
be in Unicode, and it's causing real grief for some [0].

[0]
[https://archive.nytimes.com/www.nytimes.com/2009/04/21/world...](https://archive.nytimes.com/www.nytimes.com/2009/04/21/world/asia/21china.html)

~~~
pteraspidomorph
I'm not sure it's reasonable to expect support for characters not in Unicode.
It would be more reasonable to get the character mapped.

~~~
Jasper_
I mean, it worked for 3000 years without support for those characters in
Unicode.

