It’s hard to have an unusual name in China (1843magazine.com)
113 points by sohkamyung on July 27, 2018 | 120 comments

Every time I enter China, I have to go to a separate special terminal at customs because their regular systems don't have my Chinese character in their character set.

Even worse is if you try to open a bank account in China as a foreigner. Their systems are designed for names of 2-4 logographic characters, so if you are using a 15-30 character alphanumeric name, then good luck. My English name attached to my bank account has gone through a half-dozen permutations over the years: with spaces, without spaces, in the wrong order, etc. I've given up trying to fix it and am now content with a name that is not quite my name, but is close enough to pass passport checks when I go to the bank.

My passport shows several complicated last names, some with accented letters, separated by hyphens and commas. I never encountered a system in China that accommodated all of this. Best case, the accented letters get switched to plain ones. Usually my multiple last names are also cut off in the middle because they are too long. And sometimes all my name components get stuck together, with no spaces or hyphens.

Félicité Jean-David François Laurent => FELICITEJEANDAVIDFRANCO
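The mangling described above (accents stripped, separators dropped, the result cut short) takes only a few lines to reproduce. This is a hypothetical Python sketch, not any bank's actual code; the 23-character limit is an assumption chosen only because it reproduces the example.

```python
import unicodedata

# Hypothetical field limit; the real limit of any given system is unknown.
MAX_FIELD_LEN = 23


def legacy_name_field(full_name: str, max_len: int = MAX_FIELD_LEN) -> str:
    """Mimic a lossy legacy pipeline: strip accents, drop separators, upper-case, truncate."""
    # NFKD splits "é" into "e" plus a combining accent, so the mark can be dropped.
    decomposed = unicodedata.normalize("NFKD", full_name)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Keep only ASCII letters and digits: spaces and hyphens disappear.
    squashed = "".join(ch for ch in no_marks if ch.isascii() and ch.isalnum())
    return squashed.upper()[:max_len]


name = "Félicité Jean-David François Laurent"
print(legacy_name_field(name))      # FELICITEJEANDAVIDFRANCO
print(legacy_name_field(name, 30))  # FELICITEJEANDAVIDFRANCOISLAURE
```

Two systems with different limits produce two different "names" for the same person, which is exactly why linking cards across apps fails.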

Linking different bank cards in one app was impossible, for example, because each system butchered my name in a different way, so the names didn't match.

I can never log in to systems that ask for my name as part of authentication; it's impossible to remember how my name was altered for each thing I registered.

As additional fun, many apps and services are not accessible, or crash, if you are not a Chinese citizen with a Chinese ID, even if you have a long-term residence permit.

Administrative life in China just sucks in general if you don't fit the norm. It really made me think that using a name as part of any authentication/login process is harmful, because it's never going to be handled well by programs. Let me log in with my passport number and phone number, for example; those are designed to be handled by such systems.
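A minimal Python sketch of that idea, assuming an in-memory store: accounts are keyed by (passport number, phone number), and the name is kept only for display. The class, function names and values are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Account:
    passport_number: str  # identifier the commenter suggests logging in with
    phone: str            # second identifier the commenter suggests
    display_name: str     # stored exactly as given; never used for matching


# Hypothetical in-memory store keyed by (passport_number, phone), not by name.
_accounts: dict[tuple[str, str], Account] = {}


def register(passport_number: str, phone: str, display_name: str) -> Account:
    account = Account(passport_number, phone, display_name)
    _accounts[(passport_number, phone)] = account
    return account


def lookup(passport_number: str, phone: str) -> Optional[Account]:
    # The name plays no part in the lookup, so it doesn't matter
    # how each system mangled it.
    return _accounts.get((passport_number, phone))


# Hypothetical placeholder values.
register("X0000000", "+86 100 0000 0000", "Félicité Jean-David François Laurent")
print(lookup("X0000000", "+86 100 0000 0000"))
```

One caveat raised further down the thread: passport numbers change on renewal, so a real system would also need a way to link an old passport number to a new one.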

I'm surprised you don't use Félicité Laurent as your official name (though no judgement at all). In the USA people will often drop "middle names" except on the most official paperwork.

I also drop the middle names for all daily use, but in China, banks and any kind of public office are required to enter your name as displayed on the passport, meaning the whole thing. Interestingly, it could work as a passive way to protect my privacy, since different organizations recorded my name differently, so they might not be able to join the data properly.

Sounds like the inverse of what some Asians, who have 2-letter surnames like “Ng”, have to put up with when systems expect last names to be 3+ characters. Though I don’t think that minimum restriction comes up in major govt systems.

My previous boss had a single-letter last name. It turns out most airline ticketing systems have a 3-letter minimum, so he just entered his one-letter name 3 times to purchase the ticket. But this meant that every single time he got to the airport he would get stopped because his ticket didn't match his passport.

I once met a man named Jo Å. I have started using that name when testing name fields in IT systems.

People named Null also have a hard time: http://www.bbc.com/future/story/20160325-the-names-that-brea...

>People named Null also have a hard time

Strong type systems do help.

Having NaN as a name must be even harder

At least it's the truth

Try having a surname Å'); DROP TABLE Users;--

Little Jo A Tables we call him.
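Both jokes have the same standard defense: pass names to the database as bound parameters instead of pasting them into SQL text. A minimal sketch using Python's built-in sqlite3 module; the table layout is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, surname TEXT)")

# Names from this thread that break naive string handling.
tricky = ["Null", "Å'); DROP TABLE Users;--", "Ng", "Å"]

for name in tricky:
    # The "?" placeholder sends the value separately from the SQL text,
    # so quotes and semicolons in a name are stored as ordinary characters.
    conn.execute("INSERT INTO users (surname) VALUES (?)", (name,))

stored = [row[0] for row in conn.execute("SELECT surname FROM users ORDER BY id")]
print(stored == tricky)  # True: every name survives unchanged

# The string "Null" is not SQL NULL.
print(conn.execute("SELECT COUNT(*) FROM users WHERE surname IS NULL").fetchone()[0])  # 0
```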


Well, actually they do, in some states' driver's license systems. And 2-letter surnames are actually quite common ("Lu", "Li", "So", "Ng", etc.).

That's really interesting to me. I have an account with Bank of China, China Merchants Bank, and used to have one with Agricultural Bank of China (needed it temporarily to receive some money once). My passport full name has 26 characters, including one hyphen and three spaces (I have two middle names). I've never had any trouble opening a bank account anywhere (I've opened them in multiple cities). I also have no problem linking my online payments (Alipay and WeChat) with my bank accounts and using them to buy things every day.

I have recently had a problem where my Internet line stopped working. Customer service called me and told me it's because my name has a hyphen and their system can't process hyphens. When I talked with them more, it sounds like China Telecom's IT department made some kind of system change that made hyphens impossible to process in names. That sucked, not sure why they would do that when it was perfectly fine before. There may or may not have been a good reason for it from their side, I obviously didn't get to talk with anyone in their IT department about it. So now my China Telecom account has my name all mashed up with no spaces and no hyphens. But banks I'm still OK for some reason.

I'm curious why some people have certain experiences that I don't share. We're all using the same service and system. I'm waiting one day for some Chinese bank IT employee to finally write up a really weird debugging story and submit it to HN for discussion. It would be good to know what's up.

Or... it could be a bank employee's user error too, I suppose.

It is pretty common for Asian people to use a phonetic version of their name in the Latin alphabet in the West. Would it be reasonable to do the same in reverse with your Western name? Would it even be accepted by banks/government institutions?

Time for a Kafkaesque story:

Initially, when I applied for a bank account under the name of "Jack Middle Chen," they listed my name as "Jackmiddlechen." Just one name, like Cher. Chinese names don't have spaces so it made sense, but this of course made it impossible for me to pay online, where they require family and given names separately.

So I went back and asked them to differentiate between my first, middle and last names, so they added spaces, without changing the order. So my name became "Jack Middle Chen," where "Jack" was my last name and "Middle Chen" was my first name.

So I went back and asked them to change the order, because it's obviously wrong. So they changed my name to "Chen Middle Jack." Where Chen is now my correct family name, but my new first name is now "Middle Jack."

So I went back and asked them to change my first name to "Jack" because "Middle" is my middle name. But of course, Chinese don't have middle names so they don't have a field for that. So my first name then became "Jack Middle." Which is fine.

The issue is that my names now all had spaces. Chinese names don't have spaces, as mentioned, so to separate my alphanumeric names they literally had to insert a space character into my name. So when I try to shop online, my name doesn't match, because my official registered name has several space characters somewhere in there, encoded in a weird way.

So at this point, I go back and try to clarify what my name actually is in their records. I tell them just to make it look exactly like my passport. So now my name is "JACK MIDDLE CHEN," all caps. I still have no idea what my first name and last name are, but I've given up shopping online with this card, and since it matches my passport exactly, I can use it at the bank, which is good enough for me.

*Name altered for anonymity
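The comment doesn't say which "weird" space was inserted. One plausible candidate is the full-width ideographic space U+3000 used in CJK text; assuming that, this Python sketch shows why an exact comparison fails and how Unicode NFKC normalization plus whitespace collapsing would reconcile the two. The names and values are hypothetical.

```python
import unicodedata

# Assumption: the bank's record uses ideographic (full-width) spaces, U+3000,
# while the online shop sends ordinary ASCII spaces, U+0020.
bank_record = "JACK\u3000MIDDLE\u3000CHEN"
shop_input = "JACK MIDDLE CHEN"

print(bank_record == shop_input)  # False: the strings differ code point by code point


def normalize_name(name: str) -> str:
    # NFKC maps compatibility characters such as U+3000 to their plain forms;
    # split()/join then collapses any run of whitespace into a single space.
    return " ".join(unicodedata.normalize("NFKC", name).split())


print(normalize_name(bank_record) == normalize_name(shop_input))  # True
```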

Pulling a Cher and just changing your name _to_ Jackmiddlechen probably would have been easiest.

Fun anecdote: In Sweden it's not uncommon for people to have their middle name first in the passport. For example a person named Gustav Sven Andersson might actually have Sven as his "first name". This of course causes problems in the US which assumes everyone's first name is the one that comes first in their passport. I have several friends who have just given up and go by a different name in the US (ESTA, immigration, plane tickets etc).

Are you paying online with a credit card? What type of sites are you having issues with? In a normal online payment gateway, there is no facility to verify the cardholder’s name so I’m curious what type of transaction you are doing.

If Chinese people don't have middle names, why not ask for 'Jack Chen'?

Dunno about China, but in Japan your bank name is supposed to match your government name (for foreigners, this means your passport) for money laundering regulatory reasons, although this isn’t always enforced.

That would be fun for me, as the first word in the name field of my passport is my nobility title, which is not supposed to be part of my name; on my national ID it's in a different field.

"Would it even be accepted by banks/government institutions?"

No. You can choose a Chinese name (or have one chosen for you) to make life easier for your Chinese friends and colleagues, but your bank account etc. will use your real name.

The name on your bank account is likely to be one of several permutations of your name, depending on which bank you use and what kind of day the teller is having.

And you need to know what it is, because without the correct name it's impossible to receive inbound bank transfers, or to link your bank card with WeChat/AliPay. Your name is not usually printed on your bank card, which you are given as soon as you open the account.

Fun anecdote: Born in China, given a Chinese name, then naturalized in Canada. My name was spelled out phonetically on Canadian documents, unsurprisingly. When I return to China now, the name shown on all documents is LASTNAMEFIRSTNAME, not a single Chinese character in sight.

My friend has a Chinese driver's license since he lived there for 4 years, and he had to use his Chinese name for it. I don't know why (inability to enter Latin into their system?) But they insisted he give them a Chinese name.

I am surprised at this. Even in Hong Kong, if you choose to take a Chinese name, at the time that you apply for your HKID, that becomes your legal name.

There are very few foreigners in mainland China who are eligible for and have received a Chinese ID card. I've never met one in person, so don't know what name is shown on there. But that card is especially for foreigners so I'd expect it to use the regular name.

Cf. ateji in Japanese, where kanji were fitted to the sounds of foreign words. It was more common in the past; transliteration into katakana is more popular now.

A common everyday use of ateji is in abbreviations for country names:

- 仏 (fu), from 仏蘭西, France
- 英 (ei), from 英吉利, England
- 米, from 亜米利加, America
- 露, from 露西亞, Russia

One that uses katakana: ソ from ソビエト連邦, Soviet Union

Usage: 米ドル (bei doru), US dollar; 英国 (eikoku), United Kingdom

米 is used for America because 亜 was already an abbreviation for 亜細亜, Asia.

Have done both:

I successfully have my name in Latin characters with my banks (CCB, CMB and Citi), and it links to Wepay and Alipay.

I've also used a Chinese name. My work permit has my Chinese name (in Hanzi), and that's linked to my passport. It can be used to open an account (CB).

Though it's so much easier now than a decade ago.

God forbid your last name has a space in it in your passport (MC DIRMID). The fun in China will never end with that. Also, your middle name is now part of your first name, sigh.

On the other hand, reverse roles for a second. Try opening a bank account in a Western country with a real Chinese name and you don't even get to the "fun" part. We have to put up with some language gotchas; they're forced to invent an entire new name.

Huh? Pinyin romanization in mainland China is extremely standardized and fixed by your passport anyways? How can these screw it up?

Taiwan and the other Chinese-speaking places are where all the weird romanizations come from.

>they're forced to invent an entire new name.

If you're Chinese and find yourself in the West, you spell your name in Pinyin. You don't have to invent a new name unless you want to (I know a Chinese person who did this, but it was for social reasons, not necessity.)

On the flip side, you definitely should invent a new name if you're from the West and want to live in China for an extended period of time, or you'll find that life can be very painful. Sometimes you can even find a direct translation, e.g. David => 大卫 (Da Wei).

Chinese names aren’t that useful in China for official business. They are useful for non-English-speaking in-laws.

I rarely used mine while living in Beijing for 10 years.

Two personal anecdotes about this.

I'm from a country with the Cyrillic alphabet, and around 2012 they decided to change the romanization rules. When I renewed my passport, I became e.g. VASILII MAIAKOVSKII instead of VASILIY MAYAKOVSKY. Endless grief in China. There is literally no way to certify that you are you if your passport number has changed (which never happens with a Chinese ID) and your name has changed (which never happens with Chinese names). In the end, it was so much trouble that I had to get a new passport again and supply a special letter to the consul asking for my name to be romanized the old way.

I have a German friend with Ö in their name. Chinese staff sometimes write it as "O", but some German documents transcribe it as "OE", so he has one bank account with O and one with OE, and of course it never works right. Also, almost no Chinese system lets you input Ö, and a few times it changes it to "Ö"...
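Three common ways software handles the same letter, sketched in Python with a hypothetical surname: dropping the diacritic, German-style transliteration, and reading UTF-8 bytes as Windows-1252 (one frequent source of garbled names; whether that is what happened in this case is not known).

```python
import unicodedata

name = "Schröder"  # hypothetical surname containing ö

# Rule 1: drop the diacritic, as the Chinese staff did ("O").
decomposed = unicodedata.normalize("NFD", name)
stripped = "".join(c for c in decomposed if not unicodedata.combining(c))

# Rule 2: German-style transliteration, as in some German documents ("OE").
transliterated = name.replace("ö", "oe").replace("Ö", "Oe")

# Rule 3: UTF-8 bytes misread as Windows-1252, a common source of mojibake.
mojibake = name.encode("utf-8").decode("cp1252")

print(stripped.upper(), transliterated.upper(), mojibake)
# SCHRODER SCHROEDER SchrÃ¶der
print(stripped.upper() == transliterated.upper())  # False: two accounts, two "names"
```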

I'm in the process of proposing 7 new Taiwanese/Minnan characters to the Unicode IRG.

They're in the 1996 Bible and 2009 Presbyterian Hymnal. In order to print those texts, the church uses inline JPGs or Private Use Area codes with special fonts.

There's more in the Hakka Bible and hymnal, which I plan to study more this weekend.

That's interesting - what's the process there? Do you have to provide proof that they are used? Let's say they are accepted, do you then have to convince someone to create the glyphs? Do you have to add them to a current font or do you create an entirely new one?

Is this an area you work in or is it a hobby project?

My first draft was based on the Unicode Power Symbol project. It turns out that was overkill - they just need a Word document with a table of glyph/source/radical/stroke count.



They also need a font. I tried to find the original, but in the end just asked Andrew West from BabelStone Han to redraw the characters for me.

It's a hobby project. Finding characters was a side effect of trying to scrape data for Pingtype https://pingtype.github.io (my Chinese learning program). When I tried to scrape characters from the Taiwanese Bible, I found inline JPGs and thought "that can't be right..." which led me down a rabbit hole ending here, exchanging emails with Richard Cook, Ken Lunde and John Jenkins (the world experts).


If you have access to a Chinese-language paper library, please try to help take photos of some characters. Search "please contact" on the BabelStone list for some urgent ones. For example, "U+F2DD Alternative character for Db (Dubnium)" and "U+F2E2 Alternative character for Rf (Rutherfordium)" could be easily found on a periodic table, I guess.


It's hard to have an unusual name anywhere really. Whether it's from unusual characters, or a name that doesn't fit cultural expectations.

I wrote Alphanym to help encourage an interface pattern which preserves the natural diversity found in people's names, and to hopefully help mitigate technical issues like these.


Is this 'just' the front end part (in quotes because it's such a hairy issue, I imagine you have lots and lots of 'special case' code)? How do you suggest people structure their databases when storing this information? Does your product help with the full 'processing chain' of working with names?

Alphanym is a full-stack solution, and can help anywhere you need to use people's names.

For storing names, I'd suggest two very long Unicode string fields of at least 1024 characters (some names actually get that long). One is a full name field, the other a "betanym" field (the name used anywhere you'd normally use a person's "first name"). Use the full name in billing/idiomatic contexts, and the betanym when addressing customers directly (or if you need a shorter name for UI reasons).
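A minimal sketch of that two-field layout using Python's built-in sqlite3. This is not Alphanym's actual schema: the table name, column names other than "betanym", and the sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two free-form Unicode columns. SQLite's TEXT has no declared length limit;
# on databases that require one, VARCHAR(1024) or larger would follow the
# "at least 1024 characters" advice.
conn.execute("""
    CREATE TABLE customer_name (
        customer_id INTEGER PRIMARY KEY,
        full_name   TEXT NOT NULL,  -- billing and other formal contexts
        betanym     TEXT NOT NULL   -- greetings and space-constrained UI
    )
""")


def save(full_name: str, betanym: str) -> int:
    cur = conn.execute(
        "INSERT INTO customer_name (full_name, betanym) VALUES (?, ?)",
        (full_name, betanym),
    )
    return cur.lastrowid


cid = save("Félicité Jean-David François Laurent", "Félicité")
full, short = conn.execute(
    "SELECT full_name, betanym FROM customer_name WHERE customer_id = ?", (cid,)
).fetchone()
print(f"Invoice to: {full}")
print(f"Hi {short}!")
```

Nothing here splits the name into first/middle/last parts, which sidesteps the field-ordering problems described earlier in the thread.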

The full-stack UI is there to encourage user feedback, because names are surprisingly ambiguous. Though in more lax contexts like ML/NER, direct API calls without feedback may be adequate.

>Is this 'just' the front end part (in quotes because it's such a hairy issue, I imagine you have lots and lots of 'special case' code)?

There's surprisingly little special case code on the backend, because it primarily relies on ML to generate name interpretations. So most of the special casing is embedded in the ML models. However I am introducing more special case code to refine the ML models with a cleaner dataset.

Using names is ridiculously complex in the general case, seeing as it's a proper subset of NLP. So the API relies on user feedback, which is stored by Alphanym so it can offer more accurate interpretations in subsequent requests. The `name-uncertain` field allows clients to bypass the confirmation if the API has encountered the name before, so at no point does the system assume anyone's name. Yet most of the time people will only have to fill out a single form field.

Do you have to solve that horrible Google captcha for every invocation of the demo? What's the point of that?

The demo doesn't work at all on my smartphone. I see the "solve recaptcha" button below the input field, but nothing happens when I press it.

Strange. Do you know your OS/browser combo? I've had issues with it before in areas with poor connectivity, since it requires making API calls off to Google (after the page has loaded). The button should start the whole reCAPTCHA flow, but I've noticed it's not 100%.

Android 5.0.1, Brave browser. I tried disabling the tracking protection, but it still did nothing. Connectivity shouldn't be a problem, because I'm using my WiFi right now.

Edit: nevermind, it works now. Maybe I was too impatient before.

As someone whose first name is spelt in a way that's uncommon where I live, I can appreciate to some degree how frustrating this is. Most places spell my name wrong thanks to people "helpfully" correcting the "mistake" when I submit forms or other such things.

In America we have a similar problem, but related to length instead of special characters. I have two middle names. My names are 6, 7, 6 and 6 letters long, but apparently 28 characters (including the three spaces) is just too many for some databases (or maybe it's the extra space).

I've had a lot of trouble over the years, especially when my name needs to match in two places (like getting TSA PreCheck: I had to sit down with a TSA agent to figure out exactly what to type where when buying airline tickets so it would match and give me PreCheck).

Clearly they didn’t follow the guidelines that popped up [1], in this great thread [2]. It turns out that assuming almost anything about people’s names is probably incorrect.

[1] https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-...

[2] https://news.ycombinator.com/item?id=13637102

It's probably not so much that they believe names never get above a certain length, as they have to physically fit on some form of paper tickets.

Similar issue with driver's licenses.

Koreans' full names usually consist of 3-4 syllables, so a large number of databases in Korea simply were not designed to store long [western] names. It does hurt sometimes. For instance, my company uses the first 3 characters of my _given name_ as my _family name_, and the last 3 characters of my _given name_ as my... well, given name. They simply could not store my full name (17 bytes) in their database.
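The limit is often in bytes rather than characters. A Python sketch, assuming a legacy EUC-KR column (where each common Hangul syllable takes 2 bytes) sized for 4 syllables; the column size and both names are hypothetical.

```python
def fits(name: str, max_bytes: int, encoding: str = "euc_kr") -> bool:
    """True if the encoded name fits in a byte-limited column."""
    return len(name.encode(encoding)) <= max_bytes


def truncate(name: str, max_bytes: int, encoding: str = "euc_kr") -> str:
    """What a column that silently cuts at max_bytes would keep."""
    # errors="ignore" drops a multi-byte character that was cut in half.
    return name.encode(encoding)[:max_bytes].decode(encoding, errors="ignore")


COLUMN_BYTES = 8  # hypothetical: 4 Hangul syllables x 2 bytes each in EUC-KR

korean = "김민준"            # hypothetical 3-syllable name: 6 bytes in EUC-KR
western = "Alexander Smith"  # hypothetical 15-character Latin name: 15 bytes

print(fits(korean, COLUMN_BYTES), fits(western, COLUMN_BYTES))  # True False
print(truncate(western, COLUMN_BYTES))                          # Alexande
```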


The name from the article (i.e. 𬎆 ) is included among the 88,000-odd characters in the Unihan repertoire so there shouldn't be any technical problems in using it, only political problems.

Just because a character is in Unicode doesn't mean it's supported in practice: it's missing from the font Chrome is using to render your text and is highly unlikely to be selectable from an IME.

I can't see it either with Windows/Firefox :(

I don't have that problem because I've set the Hanazono font (A and B) as first choice for all Unihan characters on my computers (both Linux and Windows). The latest version, released last year, contains fonts for all 88,000 or so characters in Unihan up to extension F. The downside is that the Japanese version of the character is displayed instead of the Chinese where they differ.

Not sure about Mainland China, but usually it's expected that your name can be encoded using the Chinese Commercial Code [1], which assigns each character a four-digit code from 0000 to 9999. Here is an example of an HKID [2]. The 12-digit number is the commercial code.

[1]: https://en.wikipedia.org/wiki/Chinese_telegraph_code

[2]: https://upload.wikimedia.org/wikipedia/en/2/23/HKID_pic-adul...
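Since every character maps to one four-digit code, a 12-digit number on an ID card corresponds to a three-character name. A small Python sketch that splits such a run; the digits in the example are placeholders, not real telegraph codes, and the helper name is hypothetical.

```python
from __future__ import annotations


def split_ctc(code_string: str) -> list[str]:
    """Split a run of Chinese telegraph codes into one 4-digit code per character."""
    digits = code_string.replace(" ", "")
    if not (digits.isascii() and digits.isdigit()) or len(digits) % 4 != 0:
        raise ValueError("expected a whole number of 4-digit codes")
    return [digits[i:i + 4] for i in range(0, len(digits), 4)]


# Placeholder digits, not a real person's codes: 12 digits -> 3 characters.
print(split_ctc("123456789012"))  # ['1234', '5678', '9012']
```

A character with no assigned code simply cannot appear in such a string, which is one more place a rare name character can fall through.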

I paged through 15 sets of characters (at 10 per page) in my Pinyin IME and couldn't find it, and yeah, mine also shows up as a box. Here's the info if anyone's interested: http://www.fileformat.info/info/unicode/char/2c386/index.htm

Ironically, the character is just a box for me, presumably because it isn't supported on my Mac with Safari.

It's supported, you just don't have a font with a glyph for it.

Well, since I'm using the default fonts, wouldn't that effectively be the same thing?

No, because supporting a character means knowing all the metadata about it, not merely drawing it on the screen. You have to know whether it's a letter or a number or punctuation so that you can match against it in regular expressions, or do word breaking when you double click on it, etc. You have to know its bidi class so that you can render it in the right direction when putting it on the screen. You have to know the shaping rules that apply when it's near other characters. None of that comes from your fonts; fonts just have glyphs in them. All the other metadata comes from the Unicode specification.
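Python's standard unicodedata module exposes part of that metadata from the Unicode Character Database, independently of any installed font. A short sketch for the character from the article, U+2C386 (results depend on the Python build's Unicode version including CJK Extension E, which any recent Python does).

```python
import unicodedata

ch = "\U0002C386"  # the character from the article

print(unicodedata.name(ch))           # CJK UNIFIED IDEOGRAPH-2C386
print(unicodedata.category(ch))       # Lo (letter, other)
print(unicodedata.bidirectional(ch))  # L (left-to-right)
print(ch.isalpha())                   # True, so a \w regex matches it
print(len(ch.encode("utf-8")))        # 4 bytes in UTF-8
print(len(ch.encode("utf-16-le")))    # 4 bytes: a surrogate pair in UTF-16
```

Whether it actually shows up as a glyph or a box is a separate question of which fonts are installed.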

Same (well, box with hex code) on Mac with Firefox.

Many Chinese systems use the GB18030 (in mainland China) or Big5 (in Taiwan and Hong Kong) encodings. Unicode has proven unpopular for complicated, somewhat political reasons.

While in general the Chinese are pretty pragmatic with standards and just go with what seems to be the best out there, or at least well established, in this case it really seems to be somewhere in between. The technical reason was that GB18030 is compatible with GB2312, the pre-Unicode encoding for simplified Chinese, so it made a smooth transition of existing systems possible while still being able to encode the full Unicode range. Some services like Baidu actually use a mix of GB18030 and UTF-8 for different resources, which I imagine could be rather strange to manage sometimes.

GB18030 is more or less Unicode, isn't it?


$ echo 张𬎆| iconv -f utf-8 -t gb18030 | iconv -f gb18030 -t utf-8
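The same round trip in Python, plus the contrast with the older GB2312, which has no code for the rare character. This is a sketch using only the standard codecs.

```python
name = "张𬎆"  # 张 (Zhang) followed by the character from the article

# GB18030 covers all of Unicode; characters outside the BMP become 4-byte sequences.
encoded = name.encode("gb18030")
print(encoded.decode("gb18030") == name)  # True: lossless round trip
print(len("𬎆".encode("gb18030")))        # 4

# GB2312, the older simplified-Chinese standard, has no code for the rare character.
try:
    name.encode("gb2312")
except UnicodeEncodeError as err:
    print("GB2312 cannot encode", err.object[err.start:err.end])
```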


I would say that it is hard to have an unusual name everywhere.

If I could receive some money each time I teach people how to write it I would be rich by now :) and my name is only mildly unusual. So, future parents of this world, think carefully before naming your children.

This is really curious because it suggests that Chinese databases do not have a unique identifying number for each individual. Can there even be a database without a unique identifier? There must be millions of individuals with the same name.

Huh? Why does this suggest that? After you create a bank account, for instance, a unique account number is generated for you. But banks don't usually create one for every citizen at birth.

> My Ying character is also absent from the database used to make online medical appointments...

I meant, in the above case, why can't he make an appointment using his identity number from his Resident Identity Card? http://www.wiki-zero.co/index.php?q=aHR0cHM6Ly9lbi53aWtpcGVk...

I know in Japan it's fairly common to have unusual characters in your name that are only used in ceremonial situations, calligraphy and such. In everyday life, a more common version of the character is used.

Other name restrictions across the world: https://youtu.be/f5Y3cf3MFIw

My last name always blows up when I get shipments from Asian countries.

Sometimes I simply get Pler

Sometimes I get things like Pl#&!π¥er on my packages.

For what it's worth, it's there in Unicode, but only in traditional Chinese, so it's 張㼆 or nothing.

Wow! I take it back! https://news.ycombinator.com/item?id=17623870 gives the character 𬎆. But for whatever reason, 㼆 and 𬎆 are not marked as variants of one another.

Somewhat related: Falsehoods Programmers Believe About Names [1]

[1] https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-...

From that, the scariest one is: "People’s names are all mapped in Unicode code points." (i.e. some names can't be represented in Unicode at all)

My god man, if your name can't be done in Unicode, you'll have to live in a hut somewhere, as 'the internets does not want you'. Might be an interesting idea for those who truly want off the grid, as they can't be 'on' the grid in the first place!

I would actually love to see an example of every list item he's presented. Well, except maybe the last one, might be disingenuous to ask for that.

Last one still doesn't make sense as soon as you introduce anyone who doesn't already know someone from the "tribe" (or whatever) or isn't related to anyone.

You can't describe yourself as "my mother's oldest son" if I have no idea who your mother is. Or rather you can, but that's literally meaningless. How do these people even get someone's attention? Is it always "hey you!", since you can't ask your mother's sister's oldest cousin to pass the meat at the campfire if you have non-family members present?

The Machiguenga [1] are one example of a people using that system [2]. They apparently have no problem using the personal names given to them by Spanish speakers, but do not usually name themselves, instead preferring kinship terms. They probably don't (or didn't) have much contact with outsiders, which makes it possible to know enough about everyone you might talk about to accurately describe them without a name.

[1] https://en.wikipedia.org/wiki/Machiguenga

[2] https://books.google.com/books?id=_JXC70OnxEgC&pg=PA9

We do something similar when we say “hello Cousin” at a family event or address our parents (dad and mom).

If your community is small enough (100 people) it could easily be applied to every type of relationship, at the exclusion of individual name.

Also note that in many languages there is or used to be a dedicated word for very specific relationship to someone, with different word if the relationship is via the mother, the father, etc.

In India, in my language Punjabi, we have different words for:

- Sister of mother
- Sister of father
- Elder brother of father
- Younger brother of father
- Wife of elder brother of father
- Wife of younger brother of father
- Brother of mother
- Wife of brother of mother
- Son of brother
- Daughter of brother
- Son of sister
- Daughter of sister
- Husband of sister
- Wife of younger brother
- Wife of elder brother
- Sister of wife
- Brother of wife
- Husband of sister of wife
- Wife of brother of wife
- Brother of father of husband or wife
- Sister of father of husband or wife
- Wife of son of brother
- Wife of son of sister
- Husband of daughter of brother
- Husband of daughter of sister

In English, most of these are Uncle, Aunt, Cousin, sister-in-law, brother-in-law, mother-in-law, father-in-law, nephew, niece.

In Japanese, there are also distinctions like these, but only in characters, not words. For example:

- 叔母さん and 伯母さん both read おばさん (obasan) and mean, respectively, younger sister of father or mother and elder sister of father or mother.

- 叔父さん and 伯父さん both read おじさん (ojisan) and mean, respectively, younger brother of father or mother and elder brother of father or mother.

- お母さん and お義母さん both read おかあさん (okaasan) and mean, respectively, mother and mother-in-law, etc.

These distinctions are imported from Chinese, where 叔母 and 伯母 are pronounced differently, although nowadays 婶婶 is much more common than 叔母 and both only refer to the wife of the father's younger brother, while 伯母 is an older brother's wife.

There are many more different terms for various fine distinctions of relatedness, but I only know those I had a need to use, and when I asked a Chinese friend for help, he told me that he can't remember all of them either.

Nice to learn that Bulgarian is not the only language that is so precise about relations. However, many of these words are disappearing from usage as families get smaller and younger people don't learn them.

It may be a bit fantastical, but I think one example is the time that Prince changed his name. See https://www.bbc.com/news/magazine-36107590 for background.

Did he change it legally? I don’t know, but for the spirit of the linked list, it probably doesn’t matter: He says his name is now something else, and so it is, and I doubt it’s in the Unicode tables!

Why are there so many (and seemingly exclusively) negative posts about China on Hacker News?

You could easily have written an equally negative article about Americans, who for the longest time created systems that only worked with ASCII. Even a small deviation from that, like Europeans whose names had accented letters, could cause trouble. One example is payment systems that refused the transaction if your name had an accented letter. I experienced that as recently as 5-6 years ago.

These problems are rarer these days, but Americans have had 50 years to get it right; the Chinese started much more recently. The cause of the problems is the same in both cases: the system designers are oblivious to, or ignore, cultures outside their own.

It's an economic superpower, and it does plenty of things worth criticizing.

This isn’t even that kind of negative article, more like names are done differently there.

It's called the Baader-Meinhof phenomenon.

Negative? I'd rather say it's part of the stuff that's normalizing totalitarianism by being too polite about it.

> Truly, I live in dark times! / An artless word is foolish. A smooth forehead / Points to insensitivity. / He who laughs / Has not yet received / The terrible news.

> What times are these, in which / A conversation about trees is almost a crime / For in doing so we maintain our silence about so much wrongdoing! / And he who walks quietly across the street, / Passes out of the reach of his friends / Who are in danger?

-- Bertolt Brecht, ‘To Those Who Follow in Our Wake’

> Every line of serious work that I have written since 1936 has been written, directly or indirectly, against totalitarianism and for democratic socialism, as I understand it. It seems to me nonsense, in a period like our own, to think that one can avoid writing of such subjects. Everyone writes of them in one guise or another. It is simply a question of which side one takes and what approach one follows.

-- George Orwell, "Why I Write"

Why would you on the other hand not pay attention? Because you or nobody you love has been put in a camp, for example? Because you're with those who ignore the millions of people who did resist totalitarianism and were murdered, as if those never existed, while giving the time of the day to those who build on their corpses and call it culture or the nature of the people?

Imagine an alternate universe where the Nazis hadn't "won the war", but never started one, and had "just" done what they did "internally", while militarizing without attacking. Few would have even raised a brow, nobody would have done anything, and after 20-30 years of that, nobody even capable of forming an opinion would be left alive in Germany. And then people would just say "awww, that's just how they are, that's how they like to treat each other". In that alternative universe, there would also be people for whom that would not be enough, who wouldn't stop at accepting totalitarianism, who don't merely want ugliness to stop being called ugliness, they want to see it considered beauty. In short, you're among the worst in a reality where hardly anyone is good.

So this means that if you are in China, you cannot have an English, Latin, or other foreign name, because they won't have a character for it? Not sure if I understand this correctly.

The very idea that you have to think about logistics before naming your baby is ridiculous.

My son was born in China. His official name uses the Latin alphabet, and this has caused him no problems so far. He is not a Chinese citizen, but his birth certificate (issued in China) carries his name in Latin characters.

I don't know if it's possible for a child born in China, to two Chinese nationals, to have a non-Chinese name on their birth certificate.

But I'm surprised you think this is ridiculous. What happens in the UK if you want to name your child:

- With only Chinese characters? or

- With regular alphabetic characters (A-Z) but also including some ancient English characters like wynn (ƿ) that have unicode code points but are not easy to type?

It's the same logistics as an American thinking twice before naming their baby with the Cyrillic alphabet. It's a foreign language that will probably not play well with domestic systems.

It's not the names, it's the technology. Ask any non-US, non-UK, non MS-Windows computer user: any document not in the original ASC-127 subset is prone to be mangled for being shown with the wrong character set, wrong linefeed configuration, not having the right BOM, not allowing right-to-left...

Open the document in the wrong application, and the text will be interspersed with ∆ and °, or ® and ▯, in place of all the non-English characters, with no obvious way to fix the encoding.

The most robust solution would be for every document-viewing application to have a dialog showing how the text looks in all the supported codesets at once, letting the user pick the right one. Instead, we get config options buried deep in the menus, showing endless lists of cryptic options and forcing the user to try them one by one. It's frustrating to know that the correct configuration is one click away while having no idea which option is the right one.

Mmmmmhhh... That gives me an idea for an app...
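For what it's worth, the core of that "show every codeset at once" dialog is small. Here's a rough sketch, assuming the app has the raw bytes of the file; `preview_decodings` and the candidate codec list are my own hypothetical choices, not anything an existing app uses:

```python
# Hypothetical candidate list; a real app would tailor it to the user's locale.
CANDIDATE_CODECS = ("utf-8", "cp1252", "latin-1", "mac_roman", "gbk", "shift_jis")


def preview_decodings(raw: bytes, codecs=CANDIDATE_CODECS) -> dict:
    """Map each codec name to its decoding of `raw`, or an error marker."""
    previews = {}
    for codec in codecs:
        try:
            previews[codec] = raw.decode(codec)
        except UnicodeDecodeError:
            # Some byte sequences are simply illegal in some codecs.
            previews[codec] = "<not valid in this codec>"
    return previews


if __name__ == "__main__":
    sample = "Félicité".encode("utf-8")
    for codec, text in preview_decodings(sample).items():
        print(f"{codec:>10}: {text}")
```

Run on UTF-8 "Félicité", the cp1252 row shows the familiar "FÃ©licitÃ©", so the user can spot the right row at a glance instead of digging through menus.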

> any document not in the original ASC-127 subset is prone to be mangled for being shown with the wrong character set, wrong linefeed configuration, not having the right BOM, not allowing right-to-left...

It got much better in recent years as UTF-8 gained wider adoption. The problems are mostly with legacy government systems. (I can't comment on that, since my name fits in ASCII.)

Maybe for registering to government systems...

Using desktop applications is still a nightmare, where opening any document from the internet or navigating to a foreign page is hit or miss.

Also, the main frustration is not that it happens often; it's that when you encounter it, fixing it may involve studying the whole software stack to find out where the setting is switched or at what point in the toolchain the format was mishandled.

What kinds of documents are these? Because while sometimes I don't have the right fonts installed, never in my life have I seen a foreign web page fail to decode. The only mojibake I've seen is text that was corrupted during posting, not even showing up right to the author. Documents that people make have always worked as intended.

It depends... I've seen pages in Spanish where common Spanish-specific characters are shown mangled. It tends to happen on websites built on older platforms, but still...

It also often happens with pages in Chinese or Japanese. But then there's another problem: when a document is shown as a list full of ▯▯▯▯▯▯,▯▯-.▯▯▯, there's no way to tell whether the problem is with the encoding or the fonts, no obvious way to know which font should be installed, and no way to tell which part of the software stack is responsible.
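A crude heuristic can at least separate the two cases: if the bytes aren't valid UTF-8 (and carry no BOM), it's probably an encoding problem; if they decode cleanly but you still see boxes, it's probably a missing font. A sketch of that idea, with `diagnose` as a hypothetical helper; it uses the first word of each character's Unicode name as a rough stand-in for "script", which is an approximation, not a real script lookup:

```python
import codecs
import unicodedata

# Checked in order; UTF-32 BOMs are not handled in this sketch.
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def diagnose(raw: bytes) -> str:
    """Rough guess: is garbled text an encoding problem or a missing-font problem?"""
    for bom, codec_name in BOMS:
        if raw.startswith(bom):
            return f"BOM found: decode as {codec_name}"
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return "not valid UTF-8: likely an encoding problem"
    # First word of each non-ASCII character's Unicode name, e.g. 'CJK', 'HIRAGANA'.
    blocks = set()
    for ch in text:
        name = unicodedata.name(ch, "")
        if ord(ch) > 0x7F and name:
            blocks.add(name.split(" ")[0])
    if blocks:
        return "valid UTF-8; boxes likely mean a missing font for: " + ", ".join(sorted(blocks))
    return "valid UTF-8, ASCII only"
```

It won't catch text that is valid UTF-8 but was already double-encoded somewhere upstream; that needs a different check.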

The whole field of displaying documents to end users could benefit from some user-centered design: tools that help users fix these situations, based on common heuristics for the most frequent problems.

And there's a non-zero chance that switching to the wrong option and saving the document will permanently break the file, leaving it as a mix of two wrongly configured codesets.
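Here's roughly how that permanent breakage happens, as a sketch assuming the file was UTF-8 and the editor guessed cp1252 (both function names are hypothetical). One wrong open-and-save can often be undone by reversing the steps, but not if the editor quietly replaced bytes it couldn't decode:

```python
from typing import Optional


def wrong_round_trip(text: str, wrong_codec: str = "cp1252") -> bytes:
    """Simulate opening a UTF-8 file as `wrong_codec`, then saving it back as UTF-8."""
    # errors="replace" mimics editors that silently swap undecodable bytes for U+FFFD.
    misread = text.encode("utf-8").decode(wrong_codec, errors="replace")
    return misread.encode("utf-8")


def try_repair(data: bytes, wrong_codec: str = "cp1252") -> Optional[str]:
    """Undo one wrong round trip; return None if the damage can't be reversed."""
    try:
        return data.decode("utf-8").encode(wrong_codec).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return None
```

"Zoë café" survives (it comes back from "ZoÃ« cafÃ©"), but "Ábel" doesn't: the second UTF-8 byte of "Á" is 0x81, which cp1252 leaves undefined, so it becomes U+FFFD and the original byte is gone for good.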

I don't know, maybe it's me doing something wrong, and I should switch to using only a small subset of properly configured software tools?

But I do need to process documents from many sources using a variety of tools; I've never found a good programmable multimedia editor that satisfies all my needs (something like Emacs but with WYSIWYG capabilities).

And apparently I'm not the only one having this problem [1], but I can't find software tools explicitly dedicated to solving it at the user layer (though I suppose there are plenty in IDEs and programming languages).

[1] https://softwareengineering.stackexchange.com/questions/1871...

All sorts of people of non-English extraction will have the same problem.

I've known a few people called Zoë (enough that I still know the Alt+0235 shortcut for it on Windows), and inevitably their licenses omit the diaeresis from their name.
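Side note on why that shortcut works, and one more way systems trip over the diaeresis: the same "Zoë" can be stored as two different code point sequences, which won't match unless the software normalizes them. A small sketch; `same_name` is a hypothetical helper, and the Alt-code behaviour assumes a Western-European Windows setup where the ANSI code page is cp1252:

```python
import unicodedata

# On a Western-European Windows setup, Alt+0235 types character 235 of the
# cp1252 code page, which is ë (U+00EB LATIN SMALL LETTER E WITH DIAERESIS).
ALT_0235 = bytes([235]).decode("cp1252")

composed = "Zo" + ALT_0235                           # 3 code points
decomposed = unicodedata.normalize("NFD", composed)  # 'Zoe' + U+0308 COMBINING DIAERESIS


def same_name(a: str, b: str) -> bool:
    """Compare names after NFC normalization so both forms of 'Zoë' match."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


if __name__ == "__main__":
    print(composed == decomposed)           # False: different code point sequences
    print(same_name(composed, decomposed))  # True
```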

It's really a version of Falsehoods Programmers Believe About Names [1] writ large, by government.

[1] https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-...

I was curious and googled a bit. Some British people on netmums.com report that they successfully put these names on birth certificates: Éilis, Zoë. I would guess that "Michèle" would also be allowed. I've never heard of an official set of allowed characters, but everything relating to names of people seems to be unregulated in the UK.

One argument people mentioned (though I have no idea whether it has any validity) is that if you have the diacritic on the birth certificate then it's easy to drop it later, or not use it in practice, but the other direction would be harder to justify.
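That asymmetry argument holds up at least mechanically: dropping diacritics is a one-line transformation, but it's many-to-one, so you can't go back. A sketch of the usual decompose-and-filter approach (`strip_diacritics` is a hypothetical name; letters like "ø" that don't decompose under NFD are left alone):

```python
import unicodedata


def strip_diacritics(name: str) -> str:
    """Drop combining marks: decompose, filter out category Mn, recompose."""
    decomposed = unicodedata.normalize("NFD", name)
    kept = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", kept)


if __name__ == "__main__":
    for n in ("Zoë", "Zoé", "Éilis", "Michèle"):
        print(n, "->", strip_diacritics(n))
```

"Zoë" and "Zoé" both come out as "Zoe", so given only "Zoe" there's no way to know which one the birth certificate should have said.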

> but everything relating to names of people seems to be unregulated in the UK

As an aside, one can indeed simply change one's name whenever one likes and start going by a new name, so long as it's not for fraudulent purposes. One is advised to get some kind of documentary name-change paperwork, and getting a passport etc. in your new name will require that paperwork.

The diaeresis in Zoë is a purely English mark. It's not foreign in any sense.

It is however slowly going the way of the þorn, since most people don't know how to type it, and even if they did they wouldn't see a reason to use it.

With Cyrillic, because of its proximity to Latin, there may be a way to have a 'standard transfer pattern': a set of rules whereby Cyrillic<->Latin can be done with clarity and consistency. But we'd have to get our act together...

This requires assuming some pronunciation for the Latin letters, so the problem of choosing standard patterns devolves into choosing the standard Latin-alphabet language. (Assigning a pronunciation to Cyrillic can be done approximately correctly: there are no issues such as the difference in pronunciation of 'ja' between German (e.g. Jacke) and English (e.g. Jack).)

Aside: the inverse problem is also funny. When you have a Latin-script name, it's hard to tell how it's supposed to be pronounced, and when transliterating it into Cyrillic, one would want to preserve the pronunciation. This causes some names to be transliterated in many different ways, depending on the transliterator's opinion of the correct pronunciation of the original name.

> choosing the standard Latin-alphabet language

And of course choosing the standard Cyrillic-alphabet language

There are dozens of languages using Cyrillic and many of them have their own unique letters. That would be very hard.

ISO 9 was adopted in 1954.

It's worth noting that it's a standard, not the standard: even that page includes multiple variations, and other transliteration standards are officially used in various places (e.g. Russian international passports transliterate names a bit differently from ISO 9). So you can't really do "Cyrillic<->Latin can be done with clarity and consistency"; you get different, inconsistent transliterations of the same name, and the process isn't 100% reversible, especially if you don't know how it was transliterated.

For example, the (quite common) name Юрий has been transliterated as Yuriy, Yurij, Yurii, Yuri, Juriy, Jurij or even other options.
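To make that concrete, here's Юрий run through three real schemes, using only the handful of letters this name needs. The tables are partial (the real schemes cover the whole alphabet and some add positional rules this sketch ignores), and `transliterate`/`is_reversible` are hypothetical helpers:

```python
# Partial tables: only the letters needed for Юрий.
SCHEMES = {
    "BGN/PCGN": {"ю": "yu", "р": "r", "и": "i", "й": "y"},
    "ISO 9": {"ю": "û", "р": "r", "и": "i", "й": "j"},
    "ICAO 9303": {"ю": "iu", "р": "r", "и": "i", "й": "i"},
}


def transliterate(word: str, table: dict) -> str:
    """Letter-by-letter substitution, then capitalize the first letter."""
    return "".join(table[ch] for ch in word.lower()).capitalize()


def is_reversible(table: dict) -> bool:
    """Within this table, undoing is unambiguous only if no two letters share an output."""
    return len(set(table.values())) == len(table)


if __name__ == "__main__":
    for scheme, table in SCHEMES.items():
        print(f"{scheme:>10}: {transliterate('Юрий', table)}")
```

That gives Yuriy, Ûrij and Iurii for the same four letters, and the ICAO table (the one used for machine-readable passports) sends both и and й to "i", so you can't reliably get back to the Cyrillic spelling.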

Also, you can't transliterate from Cyrillic as such; you can only transliterate from a particular language, since phonetic transliterations differ slightly between, for example, Russian and Ukrainian (even ISO 9 accounts for that). A sequence of letters without context isn't enough: the exact same sequence of Cyrillic letters may have to be transliterated differently depending on its language.
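A quick illustration with the name Галина, which is spelled identically in Russian and Ukrainian. Again these are partial tables covering only this name's letters, and the function is a hypothetical sketch: BGN/PCGN for Russian and the 2010 Ukrainian national system:

```python
# Partial tables for the letters of Галина only.
TABLES = {
    "ru": {"г": "g", "а": "a", "л": "l", "и": "i", "н": "n"},  # BGN/PCGN for Russian
    "uk": {"г": "h", "а": "a", "л": "l", "и": "y", "н": "n"},  # Ukrainian national system (2010)
}


def transliterate(word: str, language: str) -> str:
    """Same Cyrillic letters, different output depending on the source language."""
    table = TABLES[language]
    return "".join(table[ch] for ch in word.lower()).capitalize()


if __name__ == "__main__":
    print(transliterate("Галина", "ru"), transliterate("Галина", "uk"))
```

The letters are identical, but the result is "Galina" for Russian and "Halyna" for Ukrainian, so the function can't do its job without being told the language.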

Pedantry: I think you can transliterate from Cyrillic. But you can't transcribe from Cyrillic.

There are situations in which you have to transliterate, rather than transcribe, because you don't know what language it is. For example, it's a name in a list of names of people from different places.

Sure, you can do that if you want or have to, but you won't be able to do it consistently or properly. You simply have to accept that some of your transliterations will differ from the official/proper transliterations of the same names, and that some of these people will have an official ID in the Latin alphabet with a different name than what you wrote. This is not a theoretical situation: such issues with wrong transliterations (someone missing a name because they're searching for a different spelling, or someone being offended because you wrote their name wrong) occasionally come up at international sporting events, in law enforcement, and in medicine/casualty situations.

There are no "official/proper transliterations" other than the ones you create for your institution, which is probably an academic library because I've not heard of anyone else caring about consistent transliteration. I've seen the same Greek and Russian names transcribed in all sorts of ways in government documents. Fortunately, all government documents have some kind of a number on them. That's what you use for your database key. Names aren't unique in any case, even with the extra variation introduced by whimsical transcription.
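In code, "use the document number as the key" looks something like this sketch: names become a set of spellings you've seen, useful for searching but never for identity. `Registry`, `Person` and the sample numbers are all hypothetical:

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    document_number: str  # the actual database key
    name_variants: set = field(default_factory=set)


class Registry:
    """Hypothetical store keyed by document number; names are just searchable aliases."""

    def __init__(self) -> None:
        self._people = {}

    def record(self, document_number: str, name_as_written: str) -> Person:
        person = self._people.setdefault(document_number, Person(document_number))
        person.name_variants.add(name_as_written)
        return person

    def find_by_name(self, name: str) -> list:
        """Document numbers of everyone recorded under this spelling (case-insensitive)."""
        wanted = name.casefold()
        return sorted(
            p.document_number
            for p in self._people.values()
            if any(v.casefold() == wanted for v in p.name_variants)
        )

    def __len__(self) -> int:
        return len(self._people)
```

Recording "Yuriy Ivanov" and "Jurij Ivanov" under the same number leaves one person with two spellings, while two different people who share a spelling stay separate.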


I’d say you should also check the availability of the DNS entry before picking a name!

Reminds me of a 4chan post where someone said he had named his kid '--;drop table users;

Little Bobby Tables https://xkcd.com/327/
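And the boring fix, for anyone whose system has to store Bobby: pass the name as a bound parameter instead of pasting it into the SQL string. A minimal sketch with Python's built-in sqlite3; the table and function names are made up for the example:

```python
import sqlite3


def add_student(conn: sqlite3.Connection, name: str) -> None:
    # The ? placeholder sends `name` as data, so it is never parsed as SQL.
    conn.execute("INSERT INTO students (name) VALUES (?)", (name,))


def demo() -> list:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE students (name TEXT)")
    for name in ("Robert'); DROP TABLE students;--", "'--;drop table users;"):
        add_student(conn, name)
    return conn.execute("SELECT name FROM students ORDER BY rowid").fetchall()


if __name__ == "__main__":
    print(demo())
```

Both names go in and come back out verbatim, and the table is still there.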
