Hacker News
Falsehoods Programmers Believe About Names – With Examples (shinesolutions.com)
159 points by andrelaszlo 10 months ago | 169 comments

I like reading these lists and find them very interesting.

At the same time, I would caution anyone against building software that blindly tries to accommodate every point in these lists.

Every piece of software has a set of requirements and a target market. It should be built to meet those requirements and anticipate strange situations, but within reason. This is very context-sensitive. There is no one ruleset.

At the same time, it should be possible to address edge cases in some reasonable way. For example, if your software does not support pasting in images as names (and therefore cannot represent Prince's symbol), it should allow transliterations into a script that it can support. We can't support all cases perfectly, but being aware of them and finding some way to fit them in makes for a better experience.

I find it interesting because it's a more universal problem that predates software. Each civilization had its own way of doing things and had to occasionally accommodate people who didn't follow its customs, and they had to use many of these very same workarounds.

"You don't use our script? Okay, you'll have to come up with a version of your name that we can write in our script."

"You don't have a [second] family name? We'll have to come up with a convention for deriving one when we log you."

"Your name has sounds we can't pronounce? We'll treat you as having a version of your name that we can pronounce."

"You don't have a signature version of your name? Uh, write X or something."

Exactly this. Your name is not your own--your name is a contract between you and the culture / civilization you're dealing with.

Yeah, it's like me saying my name is every character that was or is written by a human. It keeps growing. You have to accommodate me.

Try our best, find compromises, and fail gracefully.

The major difference between a thing that might go wrong and a thing that cannot possibly go wrong is that when a thing that cannot possibly go wrong goes wrong it usually turns out to be impossible to get at or repair. - Douglas Adams

Not only would handling every case listed be time-consuming and introduce other problems, it wouldn't be good enough. Given enough time and users, a system that needs to take and honor 'open' inputs like names will eventually hit a case it can't handle, no matter how much work went into flexibility. When it does, the question that matters is whether it fails elegantly and offers some kind of usable support.

Being aware of common edge cases is important to avoid getting stuck in a design that can't fail gracefully. Covering edge cases is still good, not least because cutting the amount of support you need to offer makes more time to do support well. But I hope people aren't reading these lists and concluding that the fix is to be sure they support everything on the list.

I think the point of these “Falsehoods Programmers Believe” lists is not that we should all write massively complicated code to accommodate each and every edge case. The point is we should question the practice of codifying our incorrect assumptions into requirements in the form of needless restrictions and validations.

Is it really necessary for your software to go through the trouble of rejecting names that contain percent signs or emoji? Why is a name over 20 characters invalid? What causes your business to stop working if your customer’s name is 21 characters? Do you really need to distinguish between a person’s first and last name? What is the exact business requirement driving this? So much code out there (not to mention end users’ time) is wasted needlessly and artificially dividing user input into “good” and “bad” piles and rejecting the bad.

The wisdom from these “Falsehoods” lists should result in less code, not more code.

In my experience, the answer to each of those questions is "sometimes yes, sometimes no."

Our software does not exist in isolation. If I don't ask you to separate your first name and your last name, and a need comes up to interact with another software (or manual) system that requires the first and last name to be separate, the problem just shifts. Either it will not be possible to interact with such a system, or I will need to make the decision in my system, which is worse than asking you to do it.

In a perfect world, I would snap my fingers and all legacy systems, government forms, and legal requirements would be updated to eliminate some of the more arbitrary (by today's standards) limitations, and would use UTF-8 encoding to boot.

In a less perfect world, I will choose my battles and sometimes force people whose names don't conform to the more prevalent norms of Western society (which is where my clients mostly do business) to compromise. But I will be mindful and hope for a better future.

I actually wrote a service attempting to deal with this exact problem. The number of edge cases and exceptions makes it particularly challenging, nearing the difficulty of general NLP.


Probably my favorite case was some Manchurian names, which could only be written vertically.


Wow, that actually looks like Klingon.

Sometimes the answer to item 29:

> And will your software only be dealing with people named by your society?

Is: "yep"

This is true in more places than software. Heck, even signing a check involves rendering a name in ink on paper, which is a restriction for some names I imagine.

Sure, it's true that people have different renderings for their name. So choose one that fits into the software form you're filling out. Not rocket science.

Is there any requirement that one's signature be a handwritten version of the printed version of one's name?

My signature is pretty legible at least if you already see my name in print, but I know lots of people who sign a squiggle. Your name could be ʤᔙ but you still might sign it ᠼ and I doubt anybody would care.

(Made-up examples here, obviously.)

I'm told the squiggle is actually better than a legible signature. The legible signature is something you were trained to do following rules, and thus can be forged by someone else following the same rules. The squiggle is you, and so the existence/absence of some loop is enough to show a forgery and thus a forger has to memorize the entire thing with no rules to help.

I'm not qualified to state if the above is true, but I like to believe it is.

Also, if you have an elaborate squiggle, and you write it fast, it makes it very difficult to copy since the forger needs to write it quickly as well to get the same pen stroke results.

In some cultures a signature is somewhat like a signet ring: a written emblem not necessarily related to alphabetic sequences. An 'X', for instance, signed on a contract by an illiterate person and countersigned by a witness (well, that was alphabetic, but you get the idea). I knew a fellow from Bangalore whose signature was a graphic: some symbol written above a line and some letters below the line.

In Japan, some places require me to sign my name exactly the way it's written in my passport, using block letters. Normally I'd just write a cursive, stylized version (without my middle name).

Other places allow stylized signatures, but then they require that the signature be almost exactly the same every time I write it. Again, not something I'm used to.

This, of course, reflects the fact that the Japanese don't use signatures to sign contracts; they use personal name stamps. The requirements make sense if you think of your signature as a "name stamp that's written manually".

Of course, signatures are not names, they have extra cultural, societal and legal baggage, but... just a data point that might be interesting.

There was a minor kerfuffle a few years ago because Obama was considering someone for Secretary of the Treasury whose signature (which would have appeared on all American bills) was just five or six loops. It bore no resemblance to his name. He had apparently just decided that was going to be his signature, and that's what he used for most of his adult life.

I have really terrible handwriting, whenever I sign a document it literally is just some random continuous line. It's never been an issue. Nobody seems to care. You could probably make a smiley face and never have an issue regardless of the importance of the document.

My name is too long to fit on the Australian customs forms when I return from holiday. I just omit my middle names. I've yet to be refused entry.

Which is "it compiles? Ship it!" in a different cloak. And like the original, it will probably come to bite you.

Often the schema is already enforced - e.g. in healthcare you'll find that somebody else has already normalized the wonderful world of names down to two Unicode strings called last and first. Names that don't fit that schema will be coerced by this wonderful technology called "an admissions clerk".

The DICOM standard has rather more to say about names than that:

A character string encoded using a 5 component convention. The character code 5CH (the BACKSLASH "\" in ISO-IR 6) shall not be present, as it is used as the delimiter between values in multiple valued data elements. The string may be padded with trailing spaces. For human use, the five components in their order of occurrence are: family name complex, given name complex, middle name, name prefix, name suffix.

Any of the five components may be an empty string. The component delimiter shall be the caret "^" character (5EH). Delimiters are required for interior null components. Trailing null components and their delimiters may be omitted. Multiple entries are permitted in each component and are encoded as natural text strings, in the format preferred by the named person.

For veterinary use, the first two of the five components in their order of occurrence are: responsible party family name or responsible organization name, patient name. The remaining components are not used and shall not be present.

This group of five components is referred to as a Person Name component group.

For the purpose of writing names in ideographic characters and in phonetic characters, up to 3 groups of components (see Annexes H, I and J) may be used. The delimiter for component groups shall be the equals character "=" (3DH). The three component groups of components in their order of occurrence are: an alphabetic representation, an ideographic representation, and a phonetic representation.

Any component group may be absent, including the first component group. In this case, the person name may start with one or more "=" delimiters. Delimiters are required for interior null component groups. Trailing null component groups and their delimiters may be omitted.

Precise semantics are defined for each component group. See Section

For examples and notes, see Section
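To make the quoted encoding concrete, here is a minimal Python sketch that splits a single-valued Person Name string into its component groups (on "=") and then into the five components (on "^"), padding omitted trailing parts with empty strings. The names `parse_pn`, `PersonName` and the component keys are hypothetical, not from any DICOM toolkit. The sketch ignores character-set escape handling, multiple entries within a component and the veterinary semantics, and it rejects the backslash, which the excerpt reserves as the separator between multiple values.

```python
from typing import Dict, NamedTuple

# Component order for human names, as listed in the quoted excerpt.
COMPONENT_KEYS = ("family", "given", "middle", "prefix", "suffix")


class PersonName(NamedTuple):
    alphabetic: Dict[str, str]
    ideographic: Dict[str, str]
    phonetic: Dict[str, str]


def _parse_group(group: str) -> Dict[str, str]:
    """Split one component group on '^'; omitted trailing components become ''."""
    parts = group.split("^")
    if len(parts) > len(COMPONENT_KEYS):
        raise ValueError(f"too many components in {group!r}")
    parts += [""] * (len(COMPONENT_KEYS) - len(parts))
    return dict(zip(COMPONENT_KEYS, parts))


def parse_pn(value: str) -> PersonName:
    """Parse a single-valued Person Name string (hypothetical helper)."""
    if "\\" in value:
        raise ValueError("backslash separates multiple values; split them first")
    value = value.rstrip(" ")  # the string may be padded with trailing spaces
    groups = value.split("=")
    if len(groups) > 3:
        raise ValueError(f"too many component groups in {value!r}")
    groups += [""] * (3 - len(groups))  # trailing empty groups may be omitted
    return PersonName(*(_parse_group(g) for g in groups))


pn = parse_pn("Yamada^Tarou=山田^太郎=やまだ^たろう")
print(pn.alphabetic["family"], pn.ideographic["given"])  # Yamada 太郎
```

An empty leading group (a value starting with "=") simply yields an alphabetic representation whose components are all empty strings.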

It depends which healthcare system you're dealing with. Legacy systems often have severe limitations. The more modern HL7 CDA R2 data interchange standard has an extremely flexible model for names which can properly accommodate the majority of "falsehoods".

Ah! So, are you saying that it's actually not "yep, all the data fit", but "that which doesn't fit will be made to, by force if necessary"? Yup, that's what I am saying: let's just design this in a lazy way and pretend.

Can you propose a schema that permits all names to be represented without requiring coercion in any case? You'll need to start with a graphical format, or a character encoding scheme more expressive than Unicode.

Computer system or not, you won't be admitted to a hospital with your name recorded as the first 1000 digits of pi followed by the emoji for coffee and a drawing of a mouse in sneakers.

I cannot, nor am I pretending to. What I was reacting to was this:

> Sometimes the answer to item 29:

> > And will your software only be dealing with people named by your society?

> Is: "yep"


I understand that the list is not a set of absolute rules, rather a set of caveats - even mutually exclusive ones! - and that an actual implementation will necessarily violate some of that, and that some intermediate representation would be needed. In other words, there will probably be a "name(s)" textfield, none of this graphical strawman.

What I have bristled at was the abovementioned "let's wish it away, that's enough: we'll pretend that everything is ASCII and let the users cope with our design problems ad hoc."

He's not wishing it away. His answer could just have been rooted in, "I understand my software won't cater to everyone. I'm OK with that."

Bristle at that if you must, I guess; some people aren't happy unless they're mad.

Sometimes refusing to ship means that you just lost the first-to-market advantage to someone who did. There are always trade-offs, cost-benefit analysis, and risk analysis to do with these decisions. For most software out there, assuming that a person at least has a name is a pretty safe assumption.

A few random thoughts:

They chickened out on giving an actual example of a "bad word" name, though there's a not very dirty and really common example:

Dick, which can be short for Richard.

My ex is a Junior. When I get phone calls asking for "Doreen Traylor Junior" I know it's a telemarketer.

I used to try to crack jokes, but they never got it. As far as I know, Junior and Senior are masculine. Even if they weren't, it's my married name. Doreen Traylor didn't exist until I married. I couldn't have inherited it from my mother, even if culturally we did that kind of thing. (I mean even if it was common to give daughters the same name as their mother and call them Junior.)

I'm not sure where that falls on this list, but it seems like a glaringly obvious error if you know my gender. (And, as far as I know, Doreen is a female name. Though Michele can be male in some places, like Italy, they usually don't seem to know my middle name.)

> As far as I know, Junior and Senior are masculine.

Falsehoods programmers (including me up until about 30 seconds ago) believe about Junior: it's masculine only.


Example: https://en.wikipedia.org/wiki/Winifred_Sackville_Stoner_Jr.

In a lot of Latin American countries people named Enrique are nicknamed Kiké, sometimes with the accent mark and sometimes without. (And many online services and forms don't accept accent marks anyway so it's kind of moot).

This can, understandably, lead to some confusion. It's kind of like when people visit India or Japan and see swastikas everywhere.

The word "Kiki," in France, is sort of a juvenile way to refer to a penis (like "wee-wee" or "pee-pee"). So all the Kristines and Annikas and just plain Kikis of the world can be met with a lot of giggles when they travel.

Jerk is a Swedish name that once was common in the region I grew up in. Now there are only 88 Swedes with the name left. It's short for Jerker (1522 people)...


I've gone through life with a surname that euphemistically refers to an intimate part of male anatomy. While I've never had problems with it (well, technological problems, at least), this writer, with a slightly different spelling, has:


That entry could probably get a sub-entry: "Well, I can assume this list of long, language-specific bad words doesn't have any names in it!"

I'm sure a lot of the most common issues are internationalization problems where short words arise in several places - Fanny and Wang come to mind as English names that are only inappropriate in certain countries. But if you're matching substrings, literally anything offensive is going to show up somewhere. My favorite example is that 'specialist' includes 'cialis', which is enough to trip some spam filters.
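A small Python illustration of the substring problem described above: a filter that matches blocked words anywhere in the text flags "specialist" because it contains "cialis". Matching whole words removes that class of false positive, but it still flags a legitimate name that happens to be on the list, such as "Dick". The blocklist and function names are hypothetical, for illustration only.

```python
import re

# Hypothetical blocklist, for illustration only.
BLOCKLIST = {"cialis", "dick"}


def substring_flag(text: str) -> bool:
    """Flag text if any blocked word appears anywhere, even inside another word."""
    lowered = text.lower()
    return any(word in lowered for word in BLOCKLIST)


def whole_word_flag(text: str) -> bool:
    """Flag text only if a blocked word appears as a complete token."""
    tokens = re.findall(r"\w+", text.lower())
    return any(token in BLOCKLIST for token in tokens)


for sample in ["Ask a specialist", "Benedick", "Dick Rogers"]:
    print(sample, substring_flag(sample), whole_word_flag(sample))
# Ask a specialist True False
# Benedick True False
# Dick Rogers True True
```

The last line is the sub-entry the parent comment suggests: no word list can tell a nickname for Richard from an insult.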

Interesting that the one pronounced "winer" has problems, whereas the "weener" doesn't. For anyone not familiar with German: in these vowel pairs, it is the second letter that is pronounced.

I'm not German -- this ends up being another, though unrelated, constant difficulty with this name. (It's pronounced like "Whiner.")

Family legend holds that when my great grandfather reached Canada and told the immigration agent his Lithuanian name, the agent replied, "I don't know how to spell that. I'm writing down 'Wiener'."

Strangers will insist we are pronouncing the family name wrong, though I firmly hold the position that we can pronounce it however we want, since it's our name.

Hmm, sounds easier to change the spelling rather than spend a lifetime working against standard expectations.

Ha! I have a friend who emigrated to Canada; his family name was Penus (pronounced Pen-woosh). He had to change it :)

> They chickened out on giving an actual example of a "bad word" name

Incontinentia Buttocks and Bigus Dickus spring to mind.

I knew a "Dick Rogers". I have a really difficult time calling anyone 'Dick', but I also didn't feel right calling him "Mister Rogers" because that just conjured up images of the American Public Television kids show host.

I pray that I am never friends with someone named "Dick Rogers" because I think I would be a terrible friend who would call them Duck Dodgers.

For those who have to design systems to properly accommodate most of these "falsehoods", don't reinvent the wheel. Take a look at how the HL7 CDA R2 format handles names for healthcare data interchange.


Each person can have multiple different names tagged by use (legal, maiden, alias, etc). Names can be composed of multiple different parts (prefix, suffix, given, family, or just free text). All of Unicode is supported. And names can also be NULL in various different ways (which is distinct from just blank or missing).
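A minimal Python sketch of the shape this comment describes: several names per person, each tagged with a use, built either from typed parts or from free text, plus an explicit reason when a name is absent (kept distinct from an empty string). It is loosely modeled on the description above, not on the actual HL7 CDA R2 schema. The class names, the use values and the `null_reason` strings are hypothetical placeholders rather than HL7 codes.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class NameUse(Enum):  # hypothetical subset of name uses
    LEGAL = "legal"
    MAIDEN = "maiden"
    ALIAS = "alias"


class PartKind(Enum):
    PREFIX = "prefix"
    GIVEN = "given"
    FAMILY = "family"
    SUFFIX = "suffix"


@dataclass
class NamePart:
    kind: PartKind
    value: str


@dataclass
class Name:
    use: NameUse
    parts: List[NamePart] = field(default_factory=list)
    free_text: Optional[str] = None    # for names that do not split into parts
    null_reason: Optional[str] = None  # e.g. "unknown": absent, which is not the same as ""

    def display(self) -> Optional[str]:
        """Render the name; None if the name is recorded as absent."""
        if self.null_reason is not None:
            return None
        if self.free_text is not None:
            return self.free_text
        # Parts are rendered in the order they were entered, not a fixed given/family order.
        return " ".join(part.value for part in self.parts)


@dataclass
class Person:
    names: List[Name] = field(default_factory=list)

    def names_for(self, use: NameUse) -> List[Name]:
        return [n for n in self.names if n.use is use]


patient = Person(names=[
    Name(NameUse.LEGAL, [NamePart(PartKind.FAMILY, "Yamada"), NamePart(PartKind.GIVEN, "Tarou")]),
    Name(NameUse.ALIAS, free_text="Tarou-san"),
])
print([n.display() for n in patient.names_for(NameUse.LEGAL)])  # ['Yamada Tarou']
```

Under these assumptions, a patient whose name is not yet known could carry a legal name with `null_reason` set plus an alias holding whatever placeholder the hospital assigns.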

One for "people have names"- in the context of an emergency room or criminal investigation, a person's name may not be initially known.

ERs work around this by assigning a code name to every trauma patient... meaning those patients have an additional name in the context of the hospital system.

> "It seems some people believe that you get a name and it never changes. Not so, even in Western countries, where a person may change their name when they marry."

Since it is traditionally /women/ who change their names when married, this blind spot does not surprise me.

In common-law countries, you could also change your name just by using the new name. You could stand in the town square and announce, "I'm Max Power now, everybody. Stop calling me Doug Putz. New name: Max Power." And that would be it.

Or you go up on stage and say "I'm not just 'Gordon' any more. Call me 'Sting'." And everybody rolls with it, because no one cares about a bass player named Gordon, but a guy named Sting has to be cool. And then he actually has two completely separate names, that both refer to the same person.

In modern times, governments make you fill out a form and pay a fee, because the people who need to know about the change no longer all live within an hour's ride by horse. And publishing a name change to hundreds of millions of people that may want to know about it cannot be done by soapbox and megaphone.

And those forms cause all kinds of problems. All the information on it has to fit inside the boxes, you see.

I don't think you need to reach for sexism as an explanation there. Name changes are sufficiently rare that building the flows to support it will usually only become an issue in mature production software where accurate names are extremely important, and where preserving long term records is equally important. That's not most systems. And that's before you get into cases where names are primary keys like usernames.

The business case for supporting such changes can be pretty weak relative to other features.

Name changes are not rare.

In recent decades, about 80 percent of brides choose to change their names after the wedding, both professionally and legally. [1] Most people marry at least once. [2] Women are half the population.

That being said, I do agree that in many systems, supporting name changes is not an immediate priority.

[1] http://www.nytimes.com/2015/06/28/upshot/maiden-names-on-the... [2] https://flowingdata.com/2017/11/01/who-is-married-by-now/

It is not rare. The overwhelming majority of married women have changed their surname and remember the "fun" that came with it.

Women are half the population, and while marriage rates are going down, marriages are not rare at all.

I know people see No. 29 on this list and decide that their product or service will only be used by English speakers of European descent, but in my experience most services fail even these extraordinarily low testing requirements. I’ve got three short (<8 letters), entirely ASCII names, and even then I still have trouble:

* I regularly get letters addressed to my first name and my middle name, missing my surname.

* Banks and airlines always seem to try a different way of dealing with my middle name, from only using an initial to abbreviating it to anywhere from three to six characters.

* A friend once got issued travel documents where all of their given names were concatenated together (no spaces) and printed after their last name.

* Many websites still require you to pick an honourific from a drop down menu, but the options vary considerably (and usage is different in every country, even with a common language). Just make it a text field and be done with it.

I really don’t think it’s too difficult nowadays, especially if you’re writing something from scratch, to just accept really long Unicode strings and store given names and surnames separately. And it’s not like testing this is difficult - you can find datasets of a few hundred names in different languages and scripts fairly easily.
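A minimal round-trip test in the spirit of that suggestion, using Python's built-in `sqlite3` as a stand-in for whatever storage layer you actually use: insert a few awkward names and check that they come back unchanged. The sample names are illustrative assumptions, not a recommended dataset, and the table layout is hypothetical.

```python
import sqlite3
from typing import List, Tuple

# Illustrative (family, given) pairs; a real test would load a multilingual dataset.
SAMPLE_NAMES: List[Tuple[str, str]] = [
    ("O'Brien", "Siobhán"),
    ("山田", "太郎"),
    ("de la Cruz-Stevens", "John Quincy"),
    ("", "Teller"),  # mononym: an empty family name, not a missing one
]


def round_trip(names: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Store names in an in-memory SQLite table and read them back."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE person (id INTEGER PRIMARY KEY, family TEXT NOT NULL, given TEXT NOT NULL)"
        )
        # Parameterized queries: apostrophes and other punctuation need no escaping.
        conn.executemany("INSERT INTO person (family, given) VALUES (?, ?)", names)
        return conn.execute("SELECT family, given FROM person ORDER BY id").fetchall()
    finally:
        conn.close()


assert round_trip(SAMPLE_NAMES) == SAMPLE_NAMES, "some names were altered in storage"
```

The same assertion, pointed at your real database, form handling and export code, catches truncation, mangled encodings and "helpful" normalization.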

I'm sure it's not easy. Even if you're writing from scratch. Imagine you're making a new travel site. You add the any size name field. But, you have to send that name to airlines, hotels, car rentals, credit card companies, all of which will have different limits.

There are a number of websites now which ask me for my “name as it appears on your card” when doing transactions/bookings, which seems like a decent solution. I’ve also had hotels ask for a card number or some form of ID, which also seems more sensible than asking for a name.

To your second and third points: I recently traveled internationally, and the airline concatenated my first and middle name on my ticket and boarding passes. My passport has a space between my first name and middle initial. My driver’s license has a space between my first name and middle name.

Needless to say, I was unable to check in online or with one of the machines at the airport.

For some reason Delta prints GAVINJ as my first name on all tickets even though the J is the first letter of my middle name. When I book flights this isn’t the case, either directly or through a 3rd-party; I assume they have some master record in a mainframe somewhere that was created by some other system incorrectly and I’m stuck with it forever.

However, I’ve flown hundreds of times and nobody who has checked my ID has ever been bothered by this, which is a little worrying.

Why is it worrying? What would you prefer they do?

I had something similar happen on a recent flight (First name and middle initial concatenated) but had no issues with checking in.

"Many websites still require you to pick an honourific from a drop down menu, but the options vary considerably"

In Switzerland, many major web sites are offered in multiple languages, including sometimes English. Yesterday I ordered off the English language version of a site I generally use in German, and was amused to see the order be addressed to "Sir <My Name>".

Was “Sir” a translation?

“Master” vs “mister” is annoying because technically there are a lot of rules about the use of one or the other, but these rules differ considerably between countries. Apparently in parts of the US it would be rude to refer to anyone over 12 as “master”, but in other situations I’ve been told it’s perfectly acceptable until early adulthood, or until one becomes a father or marries. I generally hope forms only give me “Mr.” as an option.

A translation of "Herr", which as a form of address simply means "Mr." (though by itself, it also refers to God, i.e. "Lord"; at least they did not call me "Lord X").

Same; I have a suffix and it often gets concatenated onto my first or last name.

My son has a further generational suffix, and his often gets truncated to a single letter and used as his middle initial.

Our names are basic short-ish names using standard ASCII characters and most systems still fuck it up.

This was an interesting list, but I’d be more interested in reading examples of how these assumptions are manifested as erroneous design decisions.

For example, many of the flawed assumptions are a variation of “Names don’t change/vary, ever”. I think most U.S. programmers are very aware of this — especially the many who have some variation of a name like “Mike/Michael”, “Dan/Daniel”, “Kate/Katlyn/Katherine”. But it took me a bit to figure out where a programmer would most likely forget that fact, in a way that’s a disservice to users — making a database system that assumes the first name/surname fields should never be updated.

Updated? That's a design error right there.

"Nope, can't give you the records of Jane Doe from 2005, you are in our system as Jane Smith. You had a different name then? Doesn't compute!" In other words, the old name needs to be kept, the new one is not a fix from an erroneous state: it used to be correct then, the new one is correct now, and you still need to match on both.

Why? Because changes never propagate instantly, if they propagate at all: I still get mail addressed to my mother (who hasn't lived here for decades), addressed to her previous name (another decade on top of that), into my mailbox, even though no trace of any of those names remains. As far as the post office goes, "XY Street 123, box 45" is still the residence of Mrs Foo. (That's actually a counterexample, yeah)

Honest question: why shouldn’t names be updated in place?

If you do have a denormalized "this is a current name" column, sure, go ahead. But the example is right above you: you might need to match not just on a current name, but on historical names as well. People change their names a lot.

That’s true, actually. I didn’t read the reply properly. It makes complete sense to ban updates if you want to have access to historical names.
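A minimal Python sketch of the approach this subthread describes: instead of overwriting the name, close the current record and append a new one, so the system can answer both "what is this person called now?" and "does this old name belong to anyone?". The class and function names are hypothetical, and name matching is exact and case-sensitive for simplicity.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Iterable, List, Optional


@dataclass
class NameRecord:
    name: str
    valid_from: date
    valid_to: Optional[date] = None  # None means this is the current name


@dataclass
class Person:
    person_id: int
    names: List[NameRecord] = field(default_factory=list)

    def change_name(self, new_name: str, effective: date) -> None:
        """Close the open record(s) and append the new name; nothing is overwritten."""
        for record in self.names:
            if record.valid_to is None:
                record.valid_to = effective
        self.names.append(NameRecord(new_name, effective))

    def current_name(self) -> Optional[str]:
        for record in reversed(self.names):
            if record.valid_to is None:
                return record.name
        return None

    def name_on(self, when: date) -> Optional[str]:
        """The name that was valid on a given date, if any."""
        for record in self.names:
            if record.valid_from <= when and (record.valid_to is None or when < record.valid_to):
                return record.name
        return None


def find_by_any_name(people: Iterable[Person], name: str) -> List[Person]:
    """Match against every name a person has ever had, not just the current one."""
    return [p for p in people if any(r.name == name for r in p.names)]


jane = Person(1, [NameRecord("Jane Doe", date(2000, 1, 1))])
jane.change_name("Jane Smith", date(2010, 6, 1))
print(jane.current_name(), "|", jane.name_on(date(2005, 3, 1)))  # Jane Smith | Jane Doe
```

A denormalized "current name" column, as suggested above, can still be kept alongside this history for fast display.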

The whole concept of using your mother's maiden name as some kind of secret is already dodgy in Western cultures, but it's even dumber when they do it in cultures where women don't change their name after marriage.

I have a coworker whose last name is the same as his mother's maiden name and a form asked him his mother's maiden name but would not allow the answer to be the same as his current name, for "security reasons". He had quite a field day showing that around to every programmer at the company.

Here's one for name changes - my brother-in-law has the same call name as me, my first given name; and we share a surname initial so have the same nickname (like TJ).

So I go by my middle name, sometimes.

My late uncle just flat out changed his name by use when he started a new role post-retirement. Which I found weird.

At funerals it seems relatively common, IMLE, to hear "X better known to many of you as Y ...".

Your funeral experience is certainly shared by me, especially when it comes to more distant, older relatives. I have been to several funerals where I learned someone's actual birth name and realized I had only ever known them by a nickname or middle name.

My paternal grandfather was named Howard, but only went by Red from a fairly early age, a reference to pretty extreme red hair. By the time I was born his hair was neither red nor present, but the name was forever. His youngest son (my uncle) is also named Howard and is also a notable redhead, but I don't think I've ever heard anyone call him anything but Howie.

My brother-in-law is Alexander, except he uses his middle name, which starts with a J. He has four brothers, all of whom also have a middle name starting with J that they use day to day.

My father's and brother-in-law's names are also homonyms. In emails making plans it's always clear who is being referenced, but when someone shouts from the other room it's confusing.

Names are weird.

There's a lot of systems which seem to assume that name changes are of fraudulent intent, rather than being fairly routine when women get married. Let alone the long list of other reasons to change one's name.

Men change their name on marriage too, not that often in my culture, but it happens. Children also change their name when their parents marry, and sometimes change it back (my father did that).

So what is the solution now?

Have a single full Unicode 'name' field which is over 200 characters long?

What is even the longest possible name?

The former German minister of defence has 108 characters in his ridiculous name.

Relevant stackexchange answer [1]:

> one second-century consul was named Quintus Pompeius Senecio Roscius Murena Coelius Sextus Iulius Frontinus Silius Decianus Gaius Iulius Eurycles Herculaneus Lucius Vibullius Pius Augustanus Alpinus Bellicius Sollers Iulius Aper Ducenius Proculus Rutilianus Rufinus Silius Valens Valerius Niger Claudius Fuscus Saxa Amyntianus Sosius Priscus.

> The ex german minister of defence

That would be "Dr. Copy Paste" a.k.a. Karl-Theodor Maria Nikolaus Johann Jacob Philipp Franz Joseph Sylvester Buhl-Freiherr von und zu Guttenberg [2].

[1] What linguistic impact, if any, has the Roman three-name naming system left on modern Romance and European languages? https://linguistics.stackexchange.com/a/29277

[2] https://de.m.wikipedia.org/wiki/Karl-Theodor_zu_Guttenberg

> So what is the solution now?

The solution used on machine-readable passports is that everyone has a surname and one or more given names, written using only the characters A-Z, in a name field of 39 characters in total.


Everyone who travels internationally has a name in this format, even if they have other names for other purposes as well.

I would suggest for a lot of software, the name in Machine-Readable Passport format, plus another field "How would you like to be addressed" allowing any unicode characters, would satisfy most requirements.
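A rough Python sketch of that two-field proposal: a constrained, passport-style name (surname, "<<", given names, only A-Z with "<" as filler, 39 characters) next to a free-form "how would you like to be addressed" field. This is a simplification, not an implementation of ICAO Doc 9303: diacritics are stripped via Unicode NFKD decomposition rather than the official transliteration tables, truncation is a naive cut, and letters with no simple decomposition (e.g. "Ø" or non-Latin scripts) are rejected. All names here are hypothetical.

```python
import unicodedata
from dataclasses import dataclass

MRZ_NAME_WIDTH = 39  # name field width, per the comment above


def _mrz_chars(text: str) -> str:
    """Uppercase, strip diacritics, map spaces/hyphens to '<' (simplified)."""
    out = []
    # NFKD splits letters like 'É' into 'E' plus a combining accent, which is then dropped.
    for ch in unicodedata.normalize("NFKD", text.upper()):
        if unicodedata.combining(ch):
            continue
        if "A" <= ch <= "Z":
            out.append(ch)
        elif ch in " -":
            out.append("<")
        elif ch.isalpha():
            # Letters like 'Ø' or non-Latin scripts need a real transliteration table.
            raise ValueError(f"no transliteration for {ch!r} in this sketch")
        # Other punctuation (apostrophes, periods) is dropped.
    return "".join(out)


def mrz_name(surname: str, given_names: str) -> str:
    """Surname, '<<', given names, truncated or padded with '<' to the field width."""
    name = _mrz_chars(surname) + "<<" + _mrz_chars(given_names)
    return name[:MRZ_NAME_WIDTH].ljust(MRZ_NAME_WIDTH, "<")


@dataclass
class Customer:
    mrz: str                # constrained, machine-matchable name
    preferred_address: str  # free-form Unicode: "How would you like to be addressed?"


c = Customer(mrz_name("de la Cruz", "José Ángel"), "Pepe")
print(c.mrz)  # DE<LA<CRUZ<<JOSE<ANGEL, then '<' padding up to 39 characters
```

The constrained field is for matching against documents and external systems; the free-form field is what people actually see.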

Now you assume everyone travels internationally (and that assumes they travel with a passport, not necessarily true for instance in the EU), or even remembers how their name was botched by this format.

> Now you assume everyone travels internationally

I'm not making that assumption. I'm saying that in terms of all the directions that name-processing and name-recording software are pulled in, it's a compromise that works for many purposes.

It isn't perfect, and I'm sure some people can come up with edge cases where other data formats might work better.

There is no single solution. In terms of complexity required, I would suggest:

- Not having a field, if you don't need it.
- Having a single Unicode variable width field for "how would you like us to call you" (so not necessarily a full or formal name), with a maximum length of at least 1K characters.
- Having two such fields for "given name" and "full name", with the understanding that neither may match the legal name because of practical limitations. Which field you use depends on the formality required by the specific use.

If your business is related to certain government, healthcare, legal, etc. stuff you'll probably have domain specific requirements, but I could imagine you might need to be able to have several names connected to a single person. One scheme that might be useful in such a situation might be:

- Having a set of (description, name) Unicode string tuples (variable length, generous max length). For example, one might have:

    {("Given name", "John"), ("Birth name", "Vera de la Cruz"), ("Legal name", "John Quincy de la Cruz-Stevens")}
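In Python, that scheme might look like the following minimal sketch. `PersonNames` and `names_labelled` are hypothetical names; the descriptions are deliberately free-form labels rather than a fixed set of field types, which is the point of the design.

```python
from dataclasses import dataclass, field


@dataclass
class PersonNames:
    """A person with any number of (description, name) pairs.

    Hypothetical schema: descriptions are free text, names are arbitrary
    Unicode strings, and the same description may appear more than once.
    """
    names: set[tuple[str, str]] = field(default_factory=set)

    def add(self, description: str, name: str) -> None:
        self.names.add((description, name))

    def names_labelled(self, description: str) -> list[str]:
        """Return every name stored under `description` (possibly none)."""
        return sorted(n for d, n in self.names if d == description)


person = PersonNames()
person.add("Given name", "John")
person.add("Birth name", "Vera de la Cruz")
person.add("Legal name", "John Quincy de la Cruz-Stevens")
print(person.names_labelled("Legal name"))  # ['John Quincy de la Cruz-Stevens']
print(person.names_labelled("Nickname"))    # []
```

A set mirrors the braces in the example above; a list would work just as well if the order of names matters (e.g. a chronological history of name changes).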

> - Not having a field, if you don't need it.
> - Having a single Unicode variable width field for "how would you like us to call you" (so not necessarily a full or formal name), with a maximum length of at least 1K characters.
> - Having two such fields for "given name" and "full name", with the understanding that neither may match the legal name because of practical limitations. Which field you use depends on the formality required by the specific use.

I look forward to being in line to check into a hotel room while the clerk tries to puzzle out how to lookup the reservation for a guy whose name is just a string of emoji.

> Have a single 'name' field which is over 200 characters long?

Or, as postgresql calls it, "varchar":

> The storage requirement for a short string (up to 126 bytes) is 1 byte plus the actual string, which includes the space padding in the case of character. Longer strings have 4 bytes of overhead instead of 1. Long strings are compressed by the system automatically, so the physical requirement on disk might be less. Very long values are also stored in background tables so that they do not interfere with rapid access to shorter column values. In any case, the longest possible character string that can be stored is about 1 GB.

Or just use "text" :-)

There's no storage advantage using varchar in PostgreSQL. Just the length constraint checking.

> There's no storage advantage using varchar in PostgreSQL. Just the length constraint checking.

Contrariwise there's no advantage to using text either.

Longer than 200 characters?

Makes no difference to postgres. The one and only difference is that you can limit the length of a varchar when you define the column but that aside the storage mechanism is the one I quoted above for both, and for char as well:

> There is no performance difference among these three types (nb: varchar, char and text), apart from increased storage space when using the blank-padded type, and a few extra CPU cycles to check the length when storing into a length-constrained column. While character(n) has performance advantages in some other database systems, there is no such advantage in PostgreSQL; in fact character(n) is usually the slowest of the three because of its additional storage costs. In most situations text or character varying should be used instead.

And for applications which introspect the schema and display different field types depending on the column type, well you may not want a textarea for a name field.

Where’s a reasonable cut off point? Surely you don’t want someone uploading a copy of War and Peace as their name.

Portuguese-origin names can be a bit ridiculous in this sense as well. Famously around here, the first emperor of Brazil was named "Pedro de Alcântara Francisco António João Carlos Xavier de Paula Miguel Rafael Joaquim José Gonzaga Pascoal Cipriano Serafim" [1]. That's 124 characters (128 bytes in UTF-8), beating Picasso's example! The tradition of long names persists to this day, although not to this extreme.
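Whether a name like this "fits" depends on what you count. Assuming the name is stored NFC-normalized, Python reports 124 code points but 128 bytes of UTF-8, and 128 code points again if the same name arrives decomposed (NFD), where each accented letter is a base letter plus a combining mark:

```python
import unicodedata

name = ("Pedro de Alcântara Francisco António João Carlos Xavier de Paula "
        "Miguel Rafael Joaquim José Gonzaga Pascoal Cipriano Serafim")

nfc = unicodedata.normalize("NFC", name)  # precomposed: "â" is one code point
nfd = unicodedata.normalize("NFD", name)  # decomposed: "a" + combining circumflex

print(len(nfc))                  # 124 code points
print(len(nfc.encode("utf-8")))  # 128 bytes (â, ó, ã, é take 2 bytes each)
print(len(nfd))                  # 128 code points
```

So a "128-character limit" can mean three different things; it is worth picking one explicitly, and normalizing, before enforcing any cap.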

[1] https://en.wikipedia.org/wiki/Pedro_I_of_Brazil#Birth

That still violates point 11:

"People’s names are all mapped in Unicode code points."

It seems like it is impossible to address all these points. Obviously there's no way for me to let people enter names that have no known code point. So at some point you're always going to be saying "tough luck, you must adapt your name, or come up with something, that conforms to these constraints." And I am sure people are clever enough to do so.

That point and point 40 ("People have names") can't be solved by technical means. There are two options here:

- You can make name fields optional for people without (representable) names.
- You can loosen what the field stands for to "what you want us to call you": for people who don't have a (representable) name, they can choose an alternative, without implying the user's (lack of) name is Wrong.

Yes, you can't solve all cases, but I'm betting someone will find a workaround for the sub-0.1% of those cases.

That's much rarer than your system breaking because it chokes on the apostrophe in O'Donnell or something similar.
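A common way systems "choke" on such names is an over-strict validation pattern. The sketch below contrasts a hypothetical ASCII-letters-only rule with a more permissive one that rejects only blank input and control characters; both rules are illustrative, not taken from the article.

```python
import re
import unicodedata

# Hypothetical over-strict rule: ASCII letters separated by single spaces.
STRICT = re.compile(r"[A-Za-z]+( [A-Za-z]+)*")


def strict_ok(name: str) -> bool:
    return STRICT.fullmatch(name) is not None


def permissive_ok(name: str) -> bool:
    """Accept any non-blank string without control characters (category Cc)."""
    return bool(name.strip()) and not any(
        unicodedata.category(ch) == "Cc" for ch in name
    )


for candidate in ["O'Donnell", "Jean-Claude van Damme", "田中太郎", "Jüri"]:
    print(f"{candidate!r}: strict={strict_ok(candidate)}, "
          f"permissive={permissive_ok(candidate)}")
```

Every candidate above fails the strict rule and passes the permissive one, which is roughly the gap between "looks like an English name" and "is a name".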

This is not a design requirement - just something to keep in mind when designing a system. Otherwise, you're going to have a tough time requiring a `tussenvoegsel` form field in cultures where nobody knows what that is (spoiler: it's the Dutch "van " prefix; been there), or requiring that everyone has a middle name (USA), or a second surname (Mexico).

In other words, the usual problem is not "how do I fulfill all of these things?", but "how do I convince the USians that ASCII is not just insufficient, but that even Latin-1 won't do?"

You could just allow them to draw/upload an image.

Well...there's Monty Python's John Gambolputty. https://www.youtube.com/watch?v=UDPqB9i1ScY

While a comical exaggeration, there simply isn't a hard limit - just somebody drawing the line and saying "640 kB should be enough for anybody."

As for "ridiculous" - having many names is apparently a nobility custom, even if most people don't insist on using the full name in everyday matters: https://en.wikipedia.org/wiki/Karel_Schwarzenberg is also Karl Johannes Nepomuk Josef Norbert Friedrich Antonius Wratislaw Menas Fürst zu Schwarzenberg and Karel Jan Nepomuk Josef Norbert Bedřich Antonín Vratislav Menas kníže ze Schwarzenbergu (oh look, several non-equivalent but canonical names!)

Name field is a blob that can contain a vector graphic with animation capability, or an audio file. Each individual can have 0 to n names. For each name, allocate a canvas for the image and scale the name image into it, or present audio controls. A person with no name may be identified by pronouns.

If someone manages to change their name to a taste, smell, or tactile sensation, you may need to revise your system again.

A deaf person may consider their name to be a series of hand movements. A mariachi may identify by their signature grito. A luchador may be known mainly by the pattern on their mask. A corporation has its logo, trademarks, and audio marks ("by Mennen", Meow Mix jingle, etc.). A tiny purple musician could switch to an unpronounceable symbol, to protest something in their recording contract. Clowns are identified by their egg.

If your name is a million Unicode characters long, you might need zooming and panning capability for your name canvas.

First of all, there should be no artificial limitations. This is something that always comes back to bite developers.

It’s worth noting that mistakes here have enfranchisement implications as well: officials with one U.S. political party have been eager to combat voter fraud, including by refusing to allow people to vote whose names don’t perfectly match in all relevant databases.

Also see the curated list of falsehoods programmers believe in:


Another one for "People’s names do not change." - transgender people will almost always change their name during their transition.

The other obvious example is changing names when you get married.

Yes, which is mentioned in the article.

I knew names were hard but now am convinced they are diabolical. The only safe solution seems to be to store every name as an alphanumeric blob and forget about parsing altogether.

> I knew names were hard but now am convinced they are diabolical.

That's a pretty common theme of more or less any datapoint evolved through human cultural history, because you're dealing with divergent cultures and cultural mores.

> The only safe solution seems to be to store every name as an alphanumeric blob and forget about parsing altogether.

That is absolutely the recommendation. Depending on your use case you may e.g. ask for a name and a "name of address" (possibly defaulting to the regular name) so you don't display "Hello, Pablo Diego José Francisco de Paula Juan Nepomuceno María de los Remedios Cipriano de la Santísima Trinidad Ruiz y Picasso" on every screen or use that in every notification email or whatever.
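A minimal sketch of that recommendation, with a hypothetical `Person` record: the full name is one opaque string that is never split or parsed, and an optional form of address falls back to it when left blank.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    full_name: str                         # stored verbatim, never split or parsed
    form_of_address: Optional[str] = None  # what goes after "Hello, "

    @property
    def greeting_name(self) -> str:
        # Fall back to the full name when no form of address was given.
        return self.form_of_address or self.full_name


picasso = Person(
    "Pablo Diego José Francisco de Paula Juan Nepomuceno María de los "
    "Remedios Cipriano de la Santísima Trinidad Ruiz y Picasso",
    form_of_address="Pablo",
)
print(f"Hello, {picasso.greeting_name}")  # Hello, Pablo
print(Person("田中太郎").greeting_name)     # 田中太郎
```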

As was pointed out elsewhere, it’s rare to have a system which doesn’t need to interact with other systems. If you’re running a travel agency, those names have to be sent to airlines, hotels, etc., and they won’t be expecting a blob of data.

Scalable vector graphic with animation capabilities, audio file, or both.

An SVG file could render Prince's symbol, a series of ASL hand movements, or a regular old ASCII text string.

Searching and sorting would be a real bitch, though.

If only there was a unique alphanumeric blob representing a name.

A name, or a person? The first one is a bit easier; person:name is, at the easiest, 1:n. But yeah, a name can have multiple representations, even before multiple languages come in (Peter/Cephas/Petros).

(Pedantry: Jean-Claude van Damme doesn't fit into the alphanumeric requirement on two separate counts)


One option might be to use a hash of one's DNA sequence and be done with it. The name identifier can then just be an image with any markings made by the person. This would then move name / id search from a character set problem to an image recognition problem. Works for me :-)

Chimeras: https://en.wikipedia.org/wiki/Chimera_(genetics)

Essentially, people can (sometimes) not have a single set of DNA.

And this has already caused issues:

> In 2002, Lydia Fairchild was denied public assistance in Washington state when DNA evidence showed that she was not related to her children. A lawyer for the prosecution heard of a human chimera in New England, Karen Keegan, and suggested the possibility to the defense, who were able to show that Fairchild, too, was a chimera with two sets of DNA, and that one of those sets could have been the mother of the children.

> 9. People’s names are written in ASCII.

This is apparently true in the UK, although they use the terms "special characters" and "normal letters". https://passportapplication.service.gov.uk/help/html/pages/1... :

> We cannot show full stops, hyphens, accents or other special characters in your passport. Full stops and hyphens in a name will be replaced with spaces.
>
> If your name has a special character or accent mark please enter your name using a normal letter eg e instead of é or a instead of ä etc.

"normal" letter sounds rather judging, doesn't it? Although I suspect it is probably a product of an undereducated person.

Agreed, simply "please use the unaccented letter" would have been fine instructions. Accented letters are pretty "normal" in many languages.

Accented letters are not the only issue. Apart from the already mentioned apostrophes and hyphens, there are consonants with diacritics such as ç, č and ñ, or cases where substituting an “accented” letter with the corresponding unaccented one is not appropriate (e.g. ä in German is transcribed as ae in ASCII).
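To make the point concrete: stripping diacritics via Unicode decomposition turns "Müller" into "Muller", while the German convention gives "Mueller". The sketch below applies a small override table before falling back to stripping; the table is illustrative and deliberately incomplete, and `to_ascii` is a hypothetical helper.

```python
import unicodedata

# Illustrative, incomplete overrides for German; other languages need their own.
GERMAN_OVERRIDES = {"ä": "ae", "ö": "oe", "ü": "ue",
                    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue", "ß": "ss"}


def strip_marks(text: str) -> str:
    """Naive approach: decompose, then drop combining marks.

    Characters with no decomposition (e.g. ß) pass through unchanged,
    so the result is not guaranteed to be ASCII.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def to_ascii(text: str, overrides: dict[str, str]) -> str:
    """Apply language-specific overrides first, then strip remaining marks."""
    text = unicodedata.normalize("NFC", text)  # so "ü" is one code point
    replaced = "".join(overrides.get(ch, ch) for ch in text)
    return strip_marks(replaced)


print(strip_marks("Müller"))                 # Muller (wrong for German)
print(to_ascii("Müller", GERMAN_OVERRIDES))  # Mueller
print(strip_marks("Gonçalves"))              # Goncalves
```

The passport guidance quoted above ("a instead of ä") is exactly the naive `strip_marks` behaviour.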

> "normal" letter sounds rather judging, doesn't it?

it is a bit.

> Although I suspect it is probably a product of an undereducated person.

Or an attempt to use non-technical English. I doubt most people know what an ASCII character is.

An Australian businessman and politician called Benjamin Benjamin died in 1905.

There's a current Australian politician called Grace Grace.

Yeah, this is actually quite common. This is the man who rediscovered how to move the moai using simple tools: https://en.wikipedia.org/wiki/Pavel_Pavel In this case, neither the first name nor the surname are rare, when used separately.

He (Patrick) missed out "People in the same family have unique names" as a general concept.

He pointed out "Jr." and "I/II/III", but how about Maria <middle name> <last name>?

Both my sisters' first names are Maria, as is my daughter and my upcoming one. My new daughter will have the same initials as one of my sisters (My mother, her sisters and my cousin also are all Marias, but luckily many of them have different last names).

So, if your form only has an initial for middle name, you could have a situation where a person living in the same household (i.e. same address) is indistinguishable from another.

So neglecting the middle name, as is often done, is terrible.

And let's not forget George Foreman, who named all five of his sons George Edward Foreman. So a middle name wouldn't have helped you there.
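The collision is easy to reproduce. In this sketch with made-up records (the Maria names and the addresses are hypothetical), keying people on given name, middle initial, surname and address merges two of the Marias, and even the full key cannot separate the Foreman case:

```python
from collections import defaultdict

# Hypothetical household records: (given, middle, surname, address).
records = [
    ("Maria", "Clara", "Silva", "1 Example St"),
    ("Maria", "Carla", "Silva", "1 Example St"),
    ("Maria", "Beatriz", "Silva", "1 Example St"),
]


def group_by(key_fn, rows):
    """Bucket rows by key; each bucket is what a dedup step would merge."""
    groups = defaultdict(list)
    for row in rows:
        groups[key_fn(row)].append(row)
    return groups


def initial_key(row):
    given, middle, surname, address = row
    return (given, middle[:1], surname, address)  # middle initial only


def full_key(row):
    return row  # every name part plus the address


print(len(group_by(initial_key, records)))  # 2: Clara and Carla collapse
print(len(group_by(full_key, records)))     # 3

# Same full name, same address, different people (the Foreman case).
namesakes = [("George", "Edward", "Foreman", "2 Example Rd")] * 2
print(len(group_by(full_key, namesakes)))   # 1: two people, one key
```

Some other attribute, such as a date of birth or, better, an internal ID you assign, has to carry the disambiguation.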

I enjoyed the list and examples but I feel the hypothesis in the introduction is flawed.

All anecdata, but I expect those of us who read these sorts of lists will likely understand them quickly without needing examples. There's a large number of developers that are not reading software blogs and lists and are not applying much thought to their programs beyond "how do I meet the minimum requirements put forth in this user story". As long as those sorts of developers continue to be prevalent, it's partly on the business to specify these requirements explicitly.

Tragically I don't see this changing any time soon.

Small example with my own name: in my home country I'm called Jüri, in Finland probably Jyri, in Russia Juri, and in the passport it’s also written as Jueri. Some people mistakenly write it as Yuri.

Can't we just refer to people as a hash of their DNA.. And forget about identical twins.

"Falsehoods Programmers Believe About DNA"

Yeah, there are at least two in there already.

"Lydia Fairchild is an American woman who exhibits chimerism, in having two distinct populations of DNA among the cells of her body." - https://en.wikipedia.org/wiki/Lydia_Fairchild

Microchimerism is more common than previously believed:


TLDR: Having a baby tends to leave you with DNA from the baby distributed throughout your body.

And entry errors, and matching errors, and misassigned records, and yeah, let's just have [Jedi handwave] a perfect identification system. Hey look, no more problems!

Some percentage of people have a different dad than they or the world thinks. Some are adopted and don't know it. Some have genetic disorders they don't wish to publicly disclose.

Methinks this goes from frying pan to fire.

A cryptographic hash of a person's DNA sequence wouldn't reveal any of that data.

Your DNA is not constant (it changes over time), and not everybody has just one DNA sequence (chimerism).
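This is also why a hash makes a poor identifier here: a cryptographic hash is designed so that any change to the input changes the digest completely. In the sketch below the sequences are tiny made-up strings standing in for a genome, and `genome_id` is a hypothetical helper.

```python
import hashlib


def genome_id(sequence: str) -> str:
    """Hypothetical identifier: SHA-256 of an upper-cased base sequence."""
    return hashlib.sha256(sequence.upper().encode("ascii")).hexdigest()


blood_sample = "ACGTACGTTAGC"  # made-up stand-in for one cell population
cheek_sample = "ACGTACGTTAGG"  # same, but one base differs (mutation/chimerism)

a, b = genome_id(blood_sample), genome_id(cheek_sample)
print(a == b)  # False: one differing base yields an unrelated digest
print(sum(x == y for x, y in zip(a, b)), "of", len(a), "hex digits match")
```

So the same person sampled from two tissues, or at two points in time, would get two unrelated IDs.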

My bad. I forgot: There is no such thing as a security breach.


Cryptographic hashing algorithms are one way. A breach which gave an attacker access to a hash of your genome wouldn't reveal anything about you, not even your species.

I think the original comment was intended humorously, so it is probably pointless to argue this in earnest, but I did used to write HIPAA violation letters as one part of my job when I worked in insurance. So I am familiar with myriad ways that information can get leaked. It doesn't always involve someone intentionally attacking your data.

The letters I wrote were very frequently due to situations like "Your dad/brother/cousin who lives in the same house/on the same street/in the same town and whose name is weirdly similar to yours and they happen to also have a policy with us happened to get your check/letter by accident because we got you two mixed up (sometimes: Again!)."

Not that we could word our letters that way. We couldn't. But that was a surprisingly common scenario.

So, color me skeptical that a cryptographic hashing algorithm somehow magically solves this and ensures that no breaches happen ever.

And that may seem like no big deal if it is the occasional individual scenario and you are sure it can't happen en masse, but such information can be simply devastating when it comes out, such as this case:


Anyway, that's kind of my line of thinking, which may be a case of "I am short of sleep and taking a one line joke way too seriously."

This was a fun read. However, at least one point is wrong. "People’s names fit within a certain defined amount of space" This is actually true.

And how do you define the amount of space people's names fit within? ;) Tautologies are cheating.

"People’s names fit within a certain amount of space" is definitely true though.

Technically. Alas, I have seen too many cases where this meant "20 bytes of ASCII is this amount."

Despite being a middle-class white guy from the US, I've experienced three of these on the same day.

When I met my future wife, my last name ("Coon") happened to be a (somewhat rare) slur on her ethnic background. At that time, a man couldn't automatically change names on his wedding day in California, so we had to go to court ahead of time to fill out a legal name change form.

The paper form had only one blank for the entire new name. A Hispanic family ahead of us in line had a son whose last name was supposed to be [father's surname] [mother's surname], but his birth certificate had accidentally recorded his father's surname in the middle-name field instead. So the family wanted to move his "middle" name to the front of his surname. With only one blank for the whole name, the kid's current legal name had the same exact sequence of characters as the new legal name, and there was no place on the form to make a note. The judge asked if they were OK with adding a hyphen, and they were.

Epilogue: each of my daughters has two middle names.

This article points out lots of false or at least weak assumptions that programmers make about names. But it doesn't really address the root of the problem: why programmers make these assumptions in the first place.

There are two main reasons:
1. UI/vanity: it's nice to see your name in your app/email somewhere, and to display your name, the software often must make assumptions about your name.
2. Disambiguation: Software often has to determine if "Jon Livingston" is the same as "Jonathan M. Livingston". That is by its nature pretty error-prone, but over a finite and well-understood dataset, it can be made relatively accurate.
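As a rough illustration of point 2, here is a deliberately simple matcher that treats "Jon Livingston" and "Jonathan M. Livingston" as the same person. The nickname table, the tokenization and `probably_same` are all assumptions for this sketch; its false-positive and false-negative risks are exactly what the replies argue about.

```python
import re

# Hypothetical nickname table; real systems use much larger curated lists.
NICKNAMES = {"jon": "jonathan", "bill": "william", "bob": "robert"}


def tokens(name: str) -> list[str]:
    """Lower-case word tokens; punctuation such as '.' is dropped."""
    return re.findall(r"\w+", name.lower())


def canonical(token: str) -> str:
    return NICKNAMES.get(token, token)


def probably_same(a: str, b: str) -> bool:
    """Same surname (last token) and compatible first names; middle names ignored."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb or ta[-1] != tb[-1]:
        return False
    first_a, first_b = canonical(ta[0]), canonical(tb[0])
    # A bare initial is compatible with any first name starting with that letter.
    if len(ta[0]) == 1 or len(tb[0]) == 1:
        return first_a[0] == first_b[0]
    return first_a == first_b


print(probably_same("Jon Livingston", "Jonathan M. Livingston"))  # True
print(probably_same("J. Livingston", "Jane Livingston"))          # True (maybe wrong!)
print(probably_same("Jon Livingston", "Jon Livingstone"))         # False
```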

At its worst, this article says that the two pieces of functionality above are impossible. At its best, it says that they are possible, but there are a number of things you need to consider. I'm an optimist, but I'd like to see more that points to the latter.

If you want to refer to a user by a "friendly" name, the registration form should have a "I prefer to be addressed as" field instead of ass-uming anything.

Okay, so now your signup form has "first name", "last name", plus "preferred name". Seems pretty minor, doesn't it? Yet adding that next question drops signups by, say, 4%. Is it really worth asking the question if you're going to scare away 4% of your customers? Or do you try to make do with what you have?

> Software often has to determine if "Jon Livingston" is the same as "Jonathan M. Livingston"

wat? What kind of software does that? That's dumb as hell. You have one name and that's the one on your id card.

I’m British.

All our ID cards are optional.

We don’t have “legal names”, just names we happen to be using. If I want to change my name to “Mr Yellow-Rat Foxysquirrel Fairydiddle” for pint of beer, I can. I have no desire to do so, but I can: http://www.freerepublic.com/focus/chat/760106/posts

I have bank cards with and without my middle initial. Cards aren’t long enough for all my full names. My dad’s were even longer.

I don't think birth certificate is optional in the UK; and children get an NHS number at birth usually. Birth Certificates / entries in the register of births include the mother's name and child's -- though it can be a temporary name ("baby Smith", "Tuesday Jones") IIRC, to be corrected later if desired.

Also if you work, or claim benefits you'll have an officially recognised designation that comports with your birth certificate and have a National Insurance Number (NINO).

It's not quite as anarchistic as you imply.

My NI card very explicitly says “not ID”. Also, and this is important, my NI number is a number, not a name.

The name on the (yes, mandatory) birth certificate is in law merely a suggestion.

The only things I cannot call myself are things which constitute fraud: “Theresa May”, “Doctor Phlogiston”, “HRH The Duke of Wellington”.

Maybe de jure the name is "merely a suggestion" but de facto it's pretty solid. When you make a claim for child benefit, say, or other action then you have to enter the name as shown on the birth certificate; you can't just vary it at will. This then is established as part of your identity as held in government records.

You can make your own deed poll, but you need an enrolled deed poll to change your name on official documents like a passport.


My ex is known by three first names and two last names. Two of those combinations are in use on valid passports, issued before and after we were married respectively, by different countries. Neither of those two combinations are the ones she is known as, and she usually would not use either in a situation where she is not required to use the government recognised version of her name.

So, yes, it's dumb as hell to assume you can link two records relating to the same person by name. But it's also dumb as hell to assume the name on any one given id is the only valid official name. It's not even safe to assume it's the only valid official name recognised by a single government - often someone changing their name will for a long time have valid government id documents with different names, for example.

I use my full name for all government documents, unless it'll be checked against my driver's license. That has my nickname. It's never caused a problem, as most people have the sense to know I'm the same person. The computer's dumb, but people are flexible.

I think your story proves the point. As long as humans analyze your name and see it in context, no problem. But once a computer sees your name, and potentially mistakes your nickname for your last name, it can't be terribly good.

The following clip is fictional, but the exact same thing does happen in data entry IRL, even as you are reading this.


Bam, you now have a government-assigned name. How come you didn't use it before?

(For a very brief period, I was known to the US authorities under a slightly different name than the one on my other ID cards, just because a data entry clerk had decided that my family name was a typo of a similar, Spanish-sounding name, and "corrected" it for me. Fortunately I noticed and had a fix applied before the error propagated; I was well aware that sooner or later someone would use your line, i.e. "that's dumb as hell. You have one name and that's the one on your id card"; that assumption would have been false, and could have gotten me deported on account of an invalid visa.

Same thing, global corporation a few years later: "yeah, the surname matches in first five characters, let's link this schmuck's ID card with an old account of somebody who previously worked for us in a completely different country and capacity." A few hours later: "What exactly are you trying to access here, your credentials are for Marketing in Ireland, not IT here!!!"

In other words: which ID card?)

This article might be of interest to you: https://news.ycombinator.com/item?id=18567548

a) I don't have an ID card

b) I rarely use my middle name or initial, except on "official" documents

c) People abbreviate their names a lot

d) Sometimes it's helpful to match people with the same name at the same address as being the same person .. and sometimes it's adversarial.

Ad d: an acquaintance of mine named his son after himself. We don't do suffixes here, so no "II." or "Junior" - they're both Mr. Foo Bar, at Baz Drive, Quuxtown. The age differs, of course, but rarely does anyone collect it, much less use it for disambiguation. As you say, it's a mixed bag - either you get to impersonate the other, or the police gets all confused when looking for a rowdy adolescent and finding a middle-aged man, or you just can't get booked ("we already have you here" "no you don't, that's my son" "that's impossible, there can't be two of you, computer says so!").

"Correcting" names is more than just a data-scientist cleanup procedure; it's a political one too. Some states now require names on voter registrations to exactly match the names on IDs. I run a law firm analytics company across a few continents, and I wish it were as easy as saying "stop working with that team" or "start working with this team". For our handful of big accounts, they want more than just our "service" at this bra, they want our love.

So now the little bug that failed to match Michael Sander with Michael Sanders, Esq., leads to under- or overcounting in a general election, and we've seen that an extra 200 votes in any county could swing it Democrat.

Regardless of how racist you guilt trip your self into, if you go out with Migos, and they promise you a fun time, it's probably in your interest to go with them and do what you want.

That's just how the government (or issuing authority) knows you. Culturally, one might have other names. Government does not define culture (though try, they might).

Case in point: in Lithuania your ID has to use your name translated to Lithuanian. So if Bill Clinton wanted to get a Lithuanian passport, it would state his name as "Bills Klintons". His wife would be called "Hilarija Klintonine". By the way, a hilarious percentage of genealogy software assumes that names written differently are different, even though in a ton of languages they depend on the sex of the person.

> By the way, a hilarious percentage of genealogy software assumes that names written differently are different, even though in a ton of languages they depend on the sex of the person.

I had never even considered that. That doesn't sound like it would be fun to code for. I know some other languages do that as well, and I assume they're not all quite the same, probably?

Not at all fun or easy, but having some option for defining equivalence groups of names solves the problem. This also solves the issues of weird spelling (the same name written 200 years ago looks different). A simple solution, covers a ton of use cases.
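One way to implement such equivalence groups is a small union-find structure over spellings. The `NameGroups` class is a hypothetical sketch; the example variants are the ones mentioned in this thread.

```python
class NameGroups:
    """Union-find over name spellings: linked spellings share one representative."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}

    def find(self, name: str) -> str:
        self.parent.setdefault(name, name)
        while self.parent[name] != name:
            # Path halving keeps the trees shallow.
            self.parent[name] = self.parent[self.parent[name]]
            name = self.parent[name]
        return name

    def link(self, a: str, b: str) -> None:
        """Declare that spellings a and b refer to the same name."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

    def same(self, a: str, b: str) -> bool:
        return self.find(a) == self.find(b)


groups = NameGroups()
for variant in ["Jyri", "Juri", "Jueri", "Yuri"]:
    groups.link("Jüri", variant)
groups.link("Klintons", "Klintonine")  # masculine and feminine surname forms

print(groups.same("Jueri", "Yuri"))       # True
print(groups.same("Klintonine", "Jyri"))  # False
```

Because links are transitive, a spelling only needs to be connected to one member of a group, which suits the "same name written 200 years ago" case where variants turn up one at a time.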

This was an interesting read!

Being from Belgium, I wouldn't have expected the difference in how we index "Vincent Van Gogh". We (or at least I) always thought that using the same software in Belgium (Flanders) and the Netherlands would be easy, i18n-wise.

(There are issues with different laws though depending on the domain of your software)
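For readers outside the Low Countries: in the Netherlands a prefix such as "van" is normally ignored when alphabetizing (van Gogh files under G), while in Flanders it is treated as part of the surname and files under V. A minimal sketch of a region-dependent sort key, assuming a hypothetical `sort_key` helper and a short, incomplete prefix list:

```python
# Illustrative, incomplete list of Dutch surname prefixes (tussenvoegsels).
# Longer prefixes that share a start with shorter ones are checked first.
PREFIXES = ("van der ", "van den ", "van de ", "van ", "den ", "de ", "ter ", "ten ")


def sort_key(surname: str, region: str) -> str:
    """Hypothetical sort key: "NL" skips prefixes, "BE" (Flanders) keeps them."""
    key = surname.casefold()  # capitalization of the prefix does not matter
    if region == "NL":
        for prefix in PREFIXES:
            if key.startswith(prefix):
                return key[len(prefix):]
    return key


surnames = ["van Gogh", "Vermeer", "de Vries", "Hals"]
print(sorted(surnames, key=lambda s: sort_key(s, "NL")))
# ['van Gogh', 'Hals', 'Vermeer', 'de Vries']
print(sorted(surnames, key=lambda s: sort_key(s, "BE")))
# ['de Vries', 'Hals', 'van Gogh', 'Vermeer']
```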

> People whose names break my system are weird outliers. They should have had solid, acceptable names, like 田中太郎.
>
> No, your system is badly designed.

> People have names.
>
> This one is perhaps the most difficult for which to give solid examples. There was an isolated culture in which no one had names – they referred to everyone in relative terms, such as “my mother’s eldest sister”.

So taking the last two statements together, any system that assumes a name exist is thus poorly designed?

Some of these examples are things people need to take into account, but that doesn't mean bad design. A well-designed system takes into account that there is a business need and a business budget. To ignore the constraints of the real-world situation makes less sense than ignoring the one culture where people don't have names at all.

"e e cummings preferred his name written in all lower case." - this is not exactly true. He signed with capital letters, most of the time. (see https://en.wikipedia.org/wiki/E._E._Cummings#Name_and_capita...)

A good example of such a name, though, would be eden ahbez: https://en.wikipedia.org/wiki/eden_ahbez (he's also a good example of someone with more than one name.)

The original McKenzie list was posted to HN many times from 2010 on. Original, 170+ points:


41. You share a surname with your father.

This is not true of many people from South India. My last name is my father's first name. So my sibling, mother and I share a last name, but my father has a different one (his father's first name).

Shoutout to the consultant who was helping me fill a visa application and helpfully "fixed" my typo.

I have a hyphenated last name and I'd say 1/4 of online forms do not validate when I enter my name.

When a system has had an account created for me and doesn't accept logging in with my name, it is always a guessing game to see whether they normalized it with a space, an underscore, by removing the first or second name, etc.
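One defensive approach, sketched below under the assumption that the lookup only needs to be forgiving rather than exact, is to compare names through a normalized key that case-folds and treats spaces, hyphens and underscores as the same separator. `lookup_key` and the surname are hypothetical, and this does not help when one of the two surnames was dropped entirely.

```python
import re
import unicodedata


def lookup_key(name: str) -> str:
    """Case-fold and collapse runs of spaces, hyphens and underscores to one space."""
    name = unicodedata.normalize("NFC", name).casefold()
    return re.sub(r"[\s\-_]+", " ", name).strip()


stored = "Smith-Jones"
for attempt in ["smith jones", "Smith_Jones", "SMITH - JONES", "Jones"]:
    print(attempt, lookup_key(attempt) == lookup_key(stored))
```

The first three attempts match the stored name; "Jones" alone does not.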

Nitpick. I don't think most developers "believe" most of these things. I think most developers would rather wire up some simple code to handle names and move on to (what they feel is) more interesting logic so they end up using 40 chars of ascii for names and leave it at that.

Also somewhat related: Computerphile's video on dates and timezones. Programmers might be tempted to make similar assumptions about dates, times and timezones:


The worst offense is two-word last names. If you give me one field that says "Full name" I may as well roll the dice to determine if the three strings I input in there are First Middle Last, First Last1 Last2, First1 First2 Last, etc.
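The ambiguity is easy to quantify. Assuming, hypothetically, that a full name is just given names followed by surnames, every split point is a plausible reading, so an n-word name already has n - 1 candidate parses before middle names are even considered:

```python
def candidate_splits(full_name: str) -> list[tuple[str, str]]:
    """Every (given names, surnames) split of a space-separated full name."""
    words = full_name.split()
    return [
        (" ".join(words[:i]), " ".join(words[i:]))
        for i in range(1, len(words))
    ]


for given, surname in candidate_splits("Ana María García López"):
    print(f"given={given!r:<20} surname={surname!r}")
```

Nothing in the string says which of the three splits is right (for this made-up Spanish-style name it is probably "Ana María" + "García López"), which is why separate fields, or not splitting at all, beat guessing.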

I always wanted an exhaustive list of name falsehoods. The worst part is probably that I read it with interest... I have no life!

Exhaustive? Ha, this doesn't even scratch the surface! How about "There is a set of strings that never refers to a real person, so treat 'Christopher Null' as test data." https://www.wired.com/2015/11/null/ (Also, talk about timing: "'Abcde' is not a real name, either" https://www.bbc.com/news/world-us-canada-46393501 )

Let alone "Little Bobby Tables" :-)

Some years ago, a certain Mr. O'Neil had something to do on a site of ours, which made certain assumptions. He couldn't get through to our contact form - in the e-mail, he was...outspoken.
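For the record, the usual root cause of both the Bobby Tables joke and trouble like Mr. O'Neil's is building SQL by string formatting. A minimal sketch with Python's built-in `sqlite3` module and a hypothetical `people` table: the `?` placeholder passes the name as data, so the apostrophe needs no special handling.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT NOT NULL)")

name = "O'Neil"

# Broken: string formatting puts the apostrophe straight into the SQL text.
try:
    conn.execute(f"INSERT INTO people (name) VALUES ('{name}')")
except sqlite3.OperationalError as exc:
    print("formatted query failed:", exc)

# Correct: the placeholder keeps the value out of the SQL text entirely.
conn.execute("INSERT INTO people (name) VALUES (?)", (name,))
print(conn.execute("SELECT name FROM people").fetchall())  # [("O'Neil",)]
```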

I recall reading a story somewhere about a company whose very first customer was one Henry Test. Apparently well-meaning new employees would occasionally try to delete his account, thinking that it was left over from when they were first setting up the billing system.


I know that was just a rant, but really, supporting names in non-western cultural formats isn't that unreasonable.
