
‘Ghost kanji’ lurk in the Japanese lexicon - camtarn
https://www.japantimes.co.jp/life/2018/10/29/language/ghost-kanji-lurk-japanese-lexicon/
======
hardmaru
There was also a nice discussion about Ghost Kanji a few months back:

[https://news.ycombinator.com/item?id=17637375](https://news.ycombinator.com/item?id=17637375)

A spectre is haunting unicode:

[https://www.dampfkraft.com/ghost-characters.html](https://www.dampfkraft.com/ghost-characters.html)

~~~
vilhelm_s
Yeah, also that article seems a lot less sensational. The Japan Times article
says

> The fact is, most ghost characters have yet to be fully understood. One such
> example is 彁

while the blogpost says

> In the end only one character had neither a clear source nor any historical
> precedent: 彁.

According to the table in Wikipedia[1], of the 28 characters which were
originally considered ghosts, 16 were found to be legitimately used to write
place names, 8 occurred in earlier dictionaries, 3 (including 妛) were similar
to (and probably a mis-read form of) some dictionary entry, and 彁 is of
unclear origin.

[1]
[https://ja.wikipedia.org/wiki/%E5%B9%BD%E9%9C%8A%E6%96%87%E5...](https://ja.wikipedia.org/wiki/%E5%B9%BD%E9%9C%8A%E6%96%87%E5%AD%97#%E5%85%B8%E6%8B%A0%E3%81%8C%E4%B8%8D%E6%98%8E%E3%81%AA%E3%82%82%E3%81%AE)
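For anyone who wants to poke at these characters directly, here is a minimal Python sketch (standard library only) showing that 彁 and 妛 are ordinary, encodable code points today. It assumes Python's built-in `euc_jp` codec as a stand-in for the JIS repertoire; the helper name `describe` is made up for illustration.

```python
import unicodedata


def describe(ch: str) -> dict:
    """Return the code point, Unicode name, and EUC-JP bytes of one character."""
    # Raises UnicodeEncodeError if the codec cannot represent the character.
    encoded = ch.encode("euc_jp")
    return {
        "char": ch,
        "codepoint": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch),
        "euc_jp_hex": encoded.hex(),
        # True when decoding the bytes gives back the same character.
        "round_trips": encoded.decode("euc_jp") == ch,
    }


if __name__ == "__main__":
    for ch in "彁妛":
        print(describe(ch))
```

The Unicode names of these ideographs are generated mechanically from the code point, so the name itself says nothing about what the character means.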

------
lifthrasiir
In another example of ghost characters, someone couldn't spell Baht correctly
[1]. And typos are frequent as ever [2].

Unicode is full of such mistakes, now stabilized thanks to the Stability
Policy [3]. Thankfully having some character mistakenly encoded and/or named
in Unicode seems to hurt virtually no one.

[1] [https://blogs.adobe.com/CCJKType/2016/03/bahts-is-parts.html](https://blogs.adobe.com/CCJKType/2016/03/bahts-is-parts.html)

[2] [https://www.unicode.org/notes/tn27/](https://www.unicode.org/notes/tn27/)

[3]
[https://unicode.org/policies/stability_policy.html](https://unicode.org/policies/stability_policy.html)
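To make the "stabilized mistakes" concrete, here is a small Python sketch using one well-known case: U+FE18, whose formal name contains the misspelling BRAKCET. The choice of this particular character is mine, as an illustration. The sketch assumes Python 3.3 or later, where `unicodedata.lookup` also accepts corrected name aliases.

```python
import unicodedata

# U+FE18 is a vertical presentation form of a lenticular bracket.
ch = "\uFE18"

# The formal character name, typo included, can never be changed.
TYPO_NAME = unicodedata.name(ch)

# The corrected spelling exists only as a formal alias.
CORRECTED_ALIAS = "PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRACKET"


def resolves_to_same_char(alias: str, char: str) -> bool:
    """Check that looking up a name or alias yields the given character."""
    return unicodedata.lookup(alias) == char


if __name__ == "__main__":
    print(TYPO_NAME)
    print(resolves_to_same_char(CORRECTED_ALIAS, ch))
```

Both the misspelled name and the corrected alias resolve to the same character, which is roughly how the standard lives with its typos: the name stays, an alias is added.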

~~~
sho
Amusingly, even the Thais often don't spell baht "correctly". I'm there right
now and see prices specified any number of ways:

    
    
      100 บาท (in the local language)
      100 B
      100 ฿ 
      100 BAHT
      100 BATH (disturbingly common)
    

None of it matters of course as the meaning is totally evident from the
context, but it's interesting coming from countries where symbols like $ or ¥
are totally dominant. ฿ is probably the least common of all of them (yes, less
common than "bath") and it's kind of easy to see why - there's only one
conceivable meaning a capital B could have coming after a number in Thailand,
so why even bother learning the keystrokes to bring up the unicode?

~~~
caf
Very much like c in place of ¢ for cents which is very common in some places.
I've even seen e instead of €.

------
harimau777
I'm a little surprised that these haven't gained any sort of meaning over the
years. For example, the Japanese version of Prince could change his name to 彁
and people would have to refer to him as "The artist formerly known as 王子".

~~~
wodenokoto
While there is no meaning associated with that character, it does have a
recorded reading.

[https://jisho.org/search/彁](https://jisho.org/search/彁)

It is read "ka" or "sei".

------
paradoxparalax
Nice! Let me try to guess where this ghost character 彁 came from... It was in
the '70s, so maybe it has something to do with the soda pop named Coca-Cola,
written vertically on some old sign? :) If you take a look at their logo,
maybe the ghost of someone who died of thirst, the thirsty ghost, wrote
something like 口可 (mouth thirsty?) twice on the wall with a brush... It's
interesting to read the story of how the Chinese name of Coca-Cola was chosen.
Another theory is that it was a karaoke-singing ghost who died of boredom
after the beer ran out... It may have something to do with "brother" or
"turtle", haha, who knows... Anyway, this Japan Times post is a very
interesting read.

------
rococode
Hovered over the words with my dictionary extension to find that they're also
referred to as "jabberwocky words" which I personally think is a more apt
name.

Interestingly, the dictionary also gives them readings. I wonder how they come
up with them?

[https://i.imgur.com/14N8yo1.png](https://i.imgur.com/14N8yo1.png)

~~~
lifthrasiir
The reading カ derives from a phono-semantic analysis of the character (it is
the same reading as 哥). The origin of the reading セイ is much less clear; the
first known attestation comes from IBM's conversion table [1].

[1]
[http://www.asahi.com/special/kotoba/archive2015/moji/2011081...](http://www.asahi.com/special/kotoba/archive2015/moji/2011081800015.html)

------
km3k
So the unicode standard doesn't have a way to deprecate/remove characters?

