
It’s hard to have an unusual name in China - sohkamyung
https://www.1843magazine.com/dispatches/its-hard-to-have-an-unusual-name-in-china
======
howlingfantods
Every time I enter China, I have to go to a separate special terminal at
customs because their regular systems don't have my Chinese character in their
character set.

Even worse is if you try to apply for a bank account in China as a foreigner.
Their systems are designed for 2-4 character logogram names, so if you are
using a 15-30 character alphanumeric name, then good luck. My English name
attached to my bank account has gone through a half dozen permutations over
the years, with spaces, without spaces, in the wrong order, etc. I've given up
trying to fix it and now am content with a name that is not quite my name, but
is close enough to pass passport checks when I go to the bank.

~~~
danso
Sounds like the inverse of what some Asians, who have 2-letter surnames like
“Ng”, have to put up with when systems expect last names to be 3+ characters.
Though I don’t think that minimum restriction comes up in major govt systems.

~~~
null0pointer
My previous boss had a single letter last name. Turns out most airline
ticketing systems have a 3-letter minimum so he just entered his single
character name 3 times to purchase the ticket. But this meant that every
single time he got to the airport he would get stopped because his ticket
didn't match his passport.
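
A minimal sketch of the difference between the two validation rules discussed here. Both function names are hypothetical and not taken from any real ticketing system; the only assumption is that a surname is valid as long as something remains after trimming whitespace.

```python
def is_acceptable_surname(raw: str) -> bool:
    """Accept any surname that still has content after trimming whitespace."""
    return len(raw.strip()) > 0


def legacy_min_three(raw: str) -> bool:
    """The kind of rule described above: reject anything under 3 characters."""
    return len(raw.strip()) >= 3


# Surnames of the kind mentioned in this thread, plus a common longer one.
for surname in ["Ng", "O", "Å", "Smith"]:
    print(surname, is_acceptable_surname(surname), legacy_min_three(surname))
```

With the permissive rule, nobody has to type their name three times to get past the form, so the ticket can still match the passport.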

~~~
runarb
I once met a man named Jo Å. I have started using that name when testing names
in IT systems.

People named Null also have a hard time:
[http://www.bbc.com/future/story/20160325-the-names-that-
brea...](http://www.bbc.com/future/story/20160325-the-names-that-break-
computer-systems)
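
One way the "Null" problem can be avoided, sketched with Python's built-in sqlite3 module. The table and column names are made up for illustration; the point is only that a parameterized query keeps the surname "Null" distinct from a missing value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, surname TEXT)")

# Parameterized inserts: the values are passed as data, never spliced into SQL text.
conn.execute("INSERT INTO people (surname) VALUES (?)", ("Null",))  # a real surname
conn.execute("INSERT INTO people (surname) VALUES (?)", (None,))    # genuinely missing

# IS NULL matches only the row with no surname at all.
missing = conn.execute("SELECT id FROM people WHERE surname IS NULL").fetchall()

# An equality test matches only the person who is actually called Null.
named_null = conn.execute(
    "SELECT id FROM people WHERE surname = ?", ("Null",)
).fetchall()

print(missing)     # [(2,)]
print(named_null)  # [(1,)]
```

Trouble tends to start when code builds queries or config files by string concatenation and a bare `Null` gets reinterpreted somewhere along the way.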

~~~
baybal2
Having NaN as a name must be even harder

~~~
k__
At least it's the truth

------
peterburkimsher
I'm in the process of proposing 7 new Taiwanese/Minnan characters to the
Unicode IRG.

They're in the 1996 Bible and 2009 Presbyterian Hymnal. In order to print
those texts, the church uses inline JPGs or Private Use Area codes with
special fonts.

There's more in the Hakka Bible and hymnal, which I plan to study more this
weekend.

~~~
philbarr
That's interesting - what's the process there? Do you have to provide proof
that they are used? Let's say they are accepted, do you then have to convince
someone to create the glyphs? Do you have to add them to a current font or do
you create an entirely new one?

Is this an area you work in or is it a hobby project?

~~~
peterburkimsher
My first draft was based on the Unicode Power Symbol project. It turns out
that was overkill - they just need a Word document with a table of
glyph/source/radical/stroke count.

[http://unicodepowersymbol.com/we-did-it-how-a-comment-on-
hac...](http://unicodepowersymbol.com/we-did-it-how-a-comment-on-hackernews-
lead-to-4-%C2%BD-new-unicode-characters/)

[https://www.unicode.org/L2/L2017/17204-uax45-additions.pdf](https://www.unicode.org/L2/L2017/17204-uax45-additions.pdf)

They also need a font. I tried to find the original, but in the end just asked
Andrew West from BabelStone Han to redraw the characters for me.

It's a hobby project. Finding characters was a side effect of trying to scrape
data for Pingtype [https://pingtype.github.io](https://pingtype.github.io) (my
Chinese learning program). When I tried to scrape characters from the
Taiwanese Bible, I found inline JPGs and thought "that can't be right..."
which led me down a rabbit hole ending here, exchanging emails with Richard
Cook, Ken Lunde and John Jenkins (the world experts).

[http://www.lingshyang.com/bible/taiwan_Bible/2ch/2ch14.htm](http://www.lingshyang.com/bible/taiwan_Bible/2ch/2ch14.htm)

If you have access to a Chinese-language paper library, please try to help
take photos of some characters. Search "please contact" on the BabelStone list
for some urgent ones. For example, "U+F2DD Alternative character for Db
(Dubnium)" and "U+F2E2 Alternative character for Rf (Rutherfordium)" could be
easily found on a periodic table, I guess.

[http://www.babelstone.co.uk/Fonts/PUA.html](http://www.babelstone.co.uk/Fonts/PUA.html)

------
rectangletangle
It's hard to have an unusual name anywhere really. Whether it's from unusual
characters, or a name that doesn't fit cultural expectations.

I wrote Alphanym to help encourage an interface pattern which preserves the
natural diversity found in people's names, and to hopefully help mitigate
technical issues like these.

[https://www.alphanym.com/demo](https://www.alphanym.com/demo)

~~~
roel_v
Is this 'just' the front end part (in quotes because it's such a hairy issue,
I imagine you have lots and lots of 'special case' code)? How do you suggest
people structure their databases when storing this information? Does your
product help with the full 'processing chain' of working with names?

~~~
rectangletangle
Alphanym is a full-stack solution, and can help anywhere you need to use
people's names.

For storing names, I'd suggest two very long Unicode string fields, each at
least 1024 characters (some names actually get that long). One is a full name
field; the other is a "betanym" field (the name used anywhere you'd normally
use a person's "first name"). Use the full name in billing/idiomatic contexts,
and the betanym when addressing customers directly (or if you need a shorter
name for UI reasons).
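
A rough sketch of that two-field layout, using Python's built-in sqlite3 module. This is not Alphanym's schema or API; the table name, column names, and helper functions are all assumptions made for illustration. SQLite's TEXT type is unbounded, so in another database you would pick a type wide enough for the lengths mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE customers (
        id        INTEGER PRIMARY KEY,
        full_name TEXT NOT NULL,  -- used for billing and formal contexts
        betanym   TEXT NOT NULL   -- used when addressing the person directly
    )
    """
)


def add_customer(full_name: str, betanym: str) -> int:
    """Store both forms exactly as the person entered them."""
    cur = conn.execute(
        "INSERT INTO customers (full_name, betanym) VALUES (?, ?)",
        (full_name, betanym),
    )
    return cur.lastrowid


def greeting(customer_id: int) -> str:
    (betanym,) = conn.execute(
        "SELECT betanym FROM customers WHERE id = ?", (customer_id,)
    ).fetchone()
    return f"Hello, {betanym}!"


def invoice_header(customer_id: int) -> str:
    (full_name,) = conn.execute(
        "SELECT full_name FROM customers WHERE id = ?", (customer_id,)
    ).fetchone()
    return f"Bill to: {full_name}"


cid = add_customer("Jo Å", "Jo")
print(greeting(cid))
print(invoice_header(cid))
```

Neither field is split into "first" and "last" parts, so nothing in the schema assumes a name order or a minimum length.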

The full-stack UI is there to encourage user feedback, because names are
surprisingly ambiguous. Though in more lax contexts like ML/NER, direct API
calls without feedback may be adequate.

>Is this 'just' the front end part (in quotes because it's such a hairy issue,
I imagine you have lots and lots of 'special case' code)?

There's surprisingly little special case code on the backend, because it
primarily relies on ML to generate name interpretations. So most of the
special casing is embedded in the ML models. However I am introducing more
special case code to refine the ML models with a cleaner dataset.

Using names is ridiculously complex in the general case, seeing as it's a
proper subset of NLP. So the API relies on user feedback, which is stored by
Alphanym so it can offer more accurate interpretations in subsequent requests.
The `name-uncertain` field allows clients to bypass the confirmation if the
API has encountered the name before, so at no point does the system assume
anyone's name. Yet most of the time people will only have to fill out a single
form field.

------
basica
As someone whose first name is spelt in a way that's uncommon where I live, I
can appreciate to some degree how frustrating this is. Most places spell my
name wrong thanks to people "helpfully" correcting the "mistake" when I submit
forms or other such things.

------
jedberg
In America we have a similar problem but related to length instead of special
characters. I have two middle names. My name lengths are 6 7 6 6, but
apparently 28 characters is just too many for some databases (or maybe the
extra space).

I've had a lot of trouble over the years, especially when my name needs to
match in two places (like getting TSA pre -- I had to sit down with a TSA
agent to figure out exactly what to type where when buying airline tickets so
it would match and give me precheck).

~~~
lostlogin
Clearly they didn’t follow the guidelines that popped up [1], in this great
thread [2]. It turns out that assuming almost anything about people’s names is
probably incorrect.

[1] [https://www.kalzumeus.com/2010/06/17/falsehoods-
programmers-...](https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-
believe-about-names/)

[2]
[https://news.ycombinator.com/item?id=13637102](https://news.ycombinator.com/item?id=13637102)

~~~
nitwit005
It's probably not so much that they believe names never get above a certain
length, as they have to physically fit on some form of paper tickets.

Similar issue with drivers licenses.

------
senozhatsky
Koreans' full names usually consist of 3-4 syllables, thus, a large number of
databases in Korea simply were not designed to store long [western] names. It
does hurt sometimes. For instance, my company uses the first 3 characters of
my _given name_ as my _family name_, and the last 3 characters of my _given
name_ as my... well, given name. They simply could not store my full name (17
bytes) to their database.
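
To illustrate why a field sized for Korean names can be too small for a Western one, here is a small sketch comparing character counts with encoded byte counts. Both names are placeholders, and the choice of UTF-8 and EUC-KR is an assumption; the comment above does not say which encoding the company database uses.

```python
def sizes(name: str) -> dict:
    """Return the length of a name in characters and in two encodings."""
    return {
        "characters": len(name),
        "utf8_bytes": len(name.encode("utf-8")),
        "euc_kr_bytes": len(name.encode("euc_kr")),
    }


korean = "홍길동"              # placeholder three-syllable Korean name
western = "Alexandra Example"  # placeholder 17-character Western name

print(korean, sizes(korean))
print(western, sizes(western))
```

A column sized for a handful of Hangul syllables holds the first name comfortably in either encoding, while the 17-character Latin name needs 17 bytes no matter which of the two is used.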

-ss

------
vorg
The name from the article (i.e. 𬎆 ) is included among the 88,000-odd
characters in the Unihan repertoire so there shouldn't be any technical
problems in using it, only political problems.

~~~
jedberg
Ironically, the character is just a box for me, presumably because it isn't
supported on my Mac with Safari.

~~~
db48x
It's supported, you just don't have a font with a glyph for it.

~~~
jedberg
Well since I'm using the default fonts, wouldn't that effectively be the same
thing?

~~~
db48x
No, because supporting a character means knowing all the metadata about it,
not merely drawing it on the screen. You have to know whether it's a letter or
a number or punctuation so that you can match against it in regular
expressions, or do word breaking when you double click on it, etc. You have to
know its bidi class so that you can render it in the right direction when
putting it on the screen. You have to know the shaping rules that apply
when it's near other characters. None of that comes from your fonts; fonts
just have glyphs in them. All the other metadata comes from the Unicode
specification.
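
A small sketch of that distinction, using Python's standard unicodedata module. The `describe` helper is a made-up name; the properties it reads come from the Unicode Character Database bundled with the interpreter, not from any installed font, so the exact results depend on the Unicode version your Python ships with.

```python
import unicodedata


def describe(ch: str) -> dict:
    """Look up character metadata that exists independently of any font."""
    return {
        "codepoint": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch, "<unnamed>"),
        "category": unicodedata.category(ch),         # letter, digit, punctuation...
        "bidi_class": unicodedata.bidirectional(ch),  # drives left/right rendering
    }


# The character from the article, a Latin letter, a digit, and a Hebrew letter.
for ch in ["𬎆", "Å", "7", "א"]:
    print(describe(ch))
```

This runs the same way on a machine that shows 𬎆 as an empty box, which is the sense in which the character is "supported" even without a glyph for it.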

------
galfarragem
I would say that it is hard to have an unusual name everywhere.

If I could receive some money each time I teach people how to write it I would
be rich by now :) and my name is only mildly unusual. So, future parents of
this world, think carefully before naming your children.

------
nyc111
This is really curious because it suggests that the Chinese database does not
have a unique identifying number for each individual. Can there even be a
database without a unique identifier? There must be millions of individuals
with the same name.

~~~
erikb
Huh? Why does this suggest that? After you create a bank account for instance
a unique account number is generated for you. But usually they don't create
these for each citizen at birth in each bank.

~~~
nyc111
> My Ying character is also absent from the database used to make online
> medical appointments...

I meant, in the above case, why can't he make an appointment using his
identity number from his Resident Identity Card? [http://www.wiki-
zero.co/index.php?q=aHR0cHM6Ly9lbi53aWtpcGVk...](http://www.wiki-
zero.co/index.php?q=aHR0cHM6Ly9lbi53aWtpcGVkaWEub3JnL3dpa2kvUmVzaWRlbnRfSWRlbnRpdHlfQ2FyZCNJZGVudGl0eV9jYXJkX251bWJlcg)

------
TorKlingberg
I know in Japan it's fairly common to have unusual characters in your name
that are only used in ceremonial situations, calligraphy and such. In everyday
life, a more common version of the character is used.

------
CaliforniaKarl
Other name restrictions across the world:
[https://youtu.be/f5Y3cf3MFIw](https://youtu.be/f5Y3cf3MFIw)

------
k__
My last name always blows up when I get shipments from Asian countries.

Sometimes I simply get Pler

Sometimes I get things like Pl#&!π¥er on my packages.
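
Both symptoms are typical of encoding problems somewhere in the shipping pipeline. The sketch below reproduces them with a purely hypothetical surname (the commenter's real name is not given); it shows one plausible mechanism, not a diagnosis of any particular shop's system.

```python
name = "Plößer"  # hypothetical surname containing two non-ASCII letters

# A system that silently discards anything outside ASCII drops those letters.
ascii_only = name.encode("ascii", errors="ignore").decode("ascii")

# A system that writes UTF-8 bytes but reads them back as Latin-1
# turns each non-ASCII letter into two unrelated characters.
garbled = name.encode("utf-8").decode("latin-1")

# The damage is reversible only if the wrong guess is known and undone.
repaired = garbled.encode("latin-1").decode("utf-8")

print(ascii_only)
print(repr(garbled))
print(repaired)
```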

------
Rounin
For what it's worth, it's there in Unicode, but only in traditional Chinese,
so it's 張㼆 or nothing.

~~~
Rounin
Wow! I take it back!
[https://news.ycombinator.com/item?id=17623870](https://news.ycombinator.com/item?id=17623870)
gives the character 𬎆. But for whatever reason, 㼆 and 𬎆 are not marked as
variants of one another.

------
DenisM
Somewhat related: Falsehoods Programmers Believe About Names [1]

[1] [https://www.kalzumeus.com/2010/06/17/falsehoods-
programmers-...](https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-
believe-about-names/)

~~~
sonnyblarney
From that, the scariest one is: "People’s names are all mapped in Unicode code
points." (i.e. some names are not doable in Unicode)

My god man, if you can't be done in Unicode, you'll have to live in a hut
somewhere as 'the internets does not want you'. Might be an interesting idea
for those who truly want off the grid, as they can't be 'on' the grid in the
first place!

~~~
marzell
I would actually love to see an example of every list item he's presented.
Well, except maybe the last one, might be disingenuous to ask for that.

~~~
vitovito
[https://shinesolutions.com/2018/01/08/falsehoods-
programmers...](https://shinesolutions.com/2018/01/08/falsehoods-programmers-
believe-about-names-with-examples/)

~~~
nextlevelwizard
The last one still doesn't make sense as soon as you introduce anyone who doesn't
already know someone from the "tribe" (or whatever) or isn't related to
anyone.

You can't describe yourself as "my mother's oldest son" if I have no idea who
your mother is. Or rather you can, but that's literally meaningless. How do
these people even get someone's attention? Is it always "hey you!", since you
can't ask your mother's sister's oldest cousin to pass the meat at the campfire
if you have non-family members present?

~~~
juliendorra
We do something similar when we say “hello Cousin” at a family event or
address our parents (dad and mom).

If your community is small enough (100 people) it could easily be applied to
every type of relationship, to the exclusion of individual names.

Also note that many languages have, or used to have, a dedicated word for
each very specific relationship to someone, with a different word depending on
whether the relationship is via the mother, the father, etc.

~~~
davchana
In India, in my language Punjabi, we have different words for

Sister of mother

Sister of father

Elder brother of father

Younger brother of father

Wife of elder brother of father

Wife of younger brother of father

Brother of mother

Wife of brother of mother

Son of brother

Daughter of brother

Son of sister

Daughter of sister

Husband of sister

Wife of younger brother

Wife of elder brother

Sister of wife

Brother of wife

Husband of sister of wife

Wife of brother of wife

Brother of father of husband or wife

Sister of father of husband or wife

Wife of son of brother

Wife of son of sister

Husband of daughter of brother

Husband of daughter of sister

In English, most of these are Uncle, Aunt, Cousin, sister-in-law, brother-in-
law, mother-in-law, father-in-law, nephew, niece.

~~~
glandium
In Japanese, there are also distinctions like these, but only in characters,
not words. For example:

- 叔母さん and 伯母さん both read おばさん (obasan) and mean, respectively, younger
sister of father or mother and elder sister of father or mother.

- 叔父さん and 伯父さん both read おじさん (ojisan) and mean, respectively, younger
brother of father or mother and elder brother of father or mother.

- お母さん and お義母さん both read おかあさん (okaasan) and mean, respectively, mother and
mother in law. etc.

~~~
yorwba
These distinctions are imported from Chinese, where 叔母 and 伯母 are pronounced
differently, although nowadays 婶婶 is much more common than 叔母 and both only
refer to the wife of the father's younger brother, while 伯母 is an older
brother's wife.

There are many more different terms for various fine distinctions of
relatedness, but I only know those I had a need to use, and when I asked a
Chinese friend for help, he told me that he can't remember all of them either.

------
nikofeyn
why are there so many (and seemingly exclusively) negative posts about china
on hacker news?

~~~
loeg
It's an economic superpower, and it does plenty of things worth criticizing.

~~~
seanmcdirmid
This isn’t even that kind of negative article, more like names are done
differently there.

------
wawhal
So this means, if you are in China, you cannot have English or Latin or other
names, because they won't have a character for it? Not sure if I understand
this correctly.

The very idea that you have to think about logistics before naming your baby
is ridiculous.

~~~
howlingfantods
It's the same logistics as an American thinking twice before naming their baby
with the Cyrillic alphabet. It's a foreign language that will probably not
play well with domestic systems.

~~~
sonnyblarney
With Cyrillic, because of its proximity to Latin, there may be a way to have
a 'standard transfer pattern' set of rules, whereby Cyrillic<->Latin can be
done with clarity and consistency. But we'd have to get our act together ...

~~~
bmn__
ISO 9 was adopted in 1954.

~~~
PeterisP
It's worth noting that it's _a_ standard, not _the_ standard - even that page
includes multiple variations, and there are other transliteration standards
officially used in various places (e.g. the Russian international passports
transliterate names a bit differently than ISO 9), so you can't really do
"Cyrillic<->Latin can be done with clarity and consistency", you get different
inconsistent transliterations of the same name and also it's not 100%
reversible, especially if you don't know how it was transliterated.

For example, the (quite common) name Юрий has been transliterated as Yuriy,
Yurij, Yurii, Yuri, Juriy, Jurij or even other options.

Also, you can't transliterate from _Cyrillic_ , you can transliterate from a
particular language, since any phonetic transliterations will be slightly
different between, for example, Russian and Ukrainian - even ISO 9 accounts
for that, so a sequence of letters without context can't be sufficient for
transliteration, the exact same sequence of cyrillic letters may have to be
transliterated differently depending on its language.
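
A toy illustration of that last point. The tables below cover only a handful of letters and are simplified for illustration, not a full implementation of ISO 9 or of any passport standard; they rely on the common convention that г and и are romanized as g and i from Russian but as h and y from Ukrainian. The function name and language codes are assumptions.

```python
# Letters that are romanized the same way in both simplified tables.
SHARED = {"а": "a", "е": "e", "к": "k", "н": "n", "о": "o",
          "р": "r", "ц": "ts", "ч": "ch"}

# Per-language tables: only the two differing letters are added.
TABLES = {
    "ru": {**SHARED, "г": "g", "и": "i"},
    "uk": {**SHARED, "г": "h", "и": "y"},
}


def transliterate(text: str, language: str) -> str:
    """Romanize text letter by letter using the table for the given language."""
    table = TABLES[language]
    out = []
    for ch in text:
        latin = table.get(ch.lower(), ch)  # leave unmapped characters as they are
        if ch.isupper():
            latin = latin.capitalize()
        out.append(latin)
    return "".join(out)


for name in ["Гончар", "Гриценко"]:
    print(name, transliterate(name, "ru"), transliterate(name, "uk"))
```

The same Cyrillic letters come out as Gonchar or Honchar, and Gritsenko or Hrytsenko, depending only on which language the caller assumes.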

~~~
bloak
Pedantry: I think you can _transliterate_ from _Cyrillic_. But you can't
_transcribe_ from _Cyrillic_.

There are situations in which you have to transliterate, rather than
transcribe, because you don't know what language it is. For example, it's a
name in a list of names of people from different places.

~~~
PeterisP
Sure, you can do that if you want or have to, but you won't be able to do it
_consistently_ or _properly_ - you simply have to accept that some of your
transliterations will differ from the official/proper transliterations
of the same names, and that some of these people will have an official ID in
the Latin alphabet with a different name than what you wrote. And this is not a
theoretical situation: such issues with wrong transliterations (somebody
missing a name because they're searching for a different spelling, or someone
being offended because you wrote their name wrong) tend to appear occasionally
at various international sporting events, and in law enforcement and
medicine/casualty situations.

~~~
bloak
There are no "official/proper transliterations" other than the ones you create
for your institution, which is probably an academic library because I've not
heard of anyone else caring about consistent transliteration. I've seen the
same Greek and Russian names transcribed in all sorts of ways in government
documents. Fortunately, all government documents have some kind of a number on
them. That's what you use for your database key. Names aren't unique in any
case, even with the extra variation introduced by whimsical transcription.
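
A minimal sketch of keying records on the document number rather than the name. The document numbers, the surname, and the helper names are invented for illustration; the given-name spellings are two of the variants of Юрий listed earlier in the thread.

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    document_number: str                         # the actual lookup key
    spellings: set = field(default_factory=set)  # every spelling seen so far


registry = {}  # maps document number -> Person


def record(document_number: str, spelling: str) -> Person:
    """Attach a spelling to whoever holds this document number."""
    person = registry.setdefault(document_number, Person(document_number))
    person.spellings.add(spelling)
    return person


def find_by_spelling(spelling: str) -> list:
    """Names are not unique, so a name search can return several people."""
    return sorted(
        number for number, person in registry.items()
        if spelling in person.spellings
    )


record("AB123", "Yuriy Ivanov")
record("AB123", "Jurij Ivanov")  # same person, transcribed differently
record("CD456", "Yuriy Ivanov")  # different person, same spelling

print(sorted(registry["AB123"].spellings))
print(find_by_spelling("Yuriy Ivanov"))
```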

