
A Spectre is Haunting Unicode - hardmaru
https://www.dampfkraft.com/ghost-characters.html
======
rossdavidh
So, since we have them and they're not going away, we should invent a meaning
for them. Obviously, one of them will have the meaning "ghost character".
Other than that, meanings like "mistake that would have been simple to correct
if it had been caught quickly, but now it is too late", or "something that
slips through the cracks of bureaucracy", or "mistake that does not cause an
actual problem, but seems wrong".

~~~
UncleEntity
I say just leave them alone so drunk teenagers can get them tattooed and get
laughed at by native Japanese speakers.

~~~
HelloNurse
They could be given sarcastic Unicode names like MEANINGLESS KANJI-STYLE
TATTOO #1 to #12.

------
userbinator
It somewhat reminds me of this:
[https://en.wikipedia.org/wiki/Cangjie_method#Early_Cangjie_s...](https://en.wikipedia.org/wiki/Cangjie_method#Early_Cangjie_system)

 _A particular "feature" of this early system is that if you send random
lowercase words to the character generator, it will attempt to construct
Chinese characters according to the Cangjie decomposition rules, sometimes
causing strange, unknown characters to appear._

[https://en.wikipedia.org/wiki/File:Mingzhu_xiaoziku1.PNG](https://en.wikipedia.org/wiki/File:Mingzhu_xiaoziku1.PNG)

------
ume
This is utterly wonderful - brought to mind the film Brazil's fly in a
typewriter.

I'm now going to have to go down the rabbit hole of each of these ghost kanji.
Good times.

~~~
Freak_NL
I wonder if some interested Japanese ever got together to simply find new
sensible meanings and words to retroactively apply to these ghost kanji. As a
sort of creative pastime.

~~~
nneonneo
The article links to the Nico Nico Douga Wiki (a wiki for creative works), in
which each of the 12 characters is imagined to be the name of a Japanese
youkai (a type of spirit/demon/monster):
[http://dic.nicovideo.jp/a/%E5%B9%BD%E9%9C%8A%E6%96%87%E5%AD%...](http://dic.nicovideo.jp/a/%E5%B9%BD%E9%9C%8A%E6%96%87%E5%AD%97)
(Japanese). It’s also quite a neat coincidence that there are exactly 12
remaining unattested characters (down from ~60 before the thorough
investigation), because 12 is a rather auspicious number in East Asian
tradition.

For example, 妛 (山 mountain atop 女 female) is imagined to mean a yama-uba
(traditional “mountain witch” youkai) who lives at the foot of the mountain
(as befitting the “女” position in the character).

There’s a good tradition of reusing old or outdated characters in new
contexts. For example, in Chinese, the archaic character 囧, meaning “window”
(modern term 窗) has been resurrected to mean “embarrassed” or “awkward” due to
its pictorial resemblance to a facial expression.

~~~
vorg
There also happen to be exactly 12 standard ideograph characters that were
erroneously encoded in the Unihan compatibility character block (i.e.
﨎﨏﨑﨓﨔﨟﨡﨣﨤﨧﨨﨩).
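
One rough way to see what sets these twelve apart, sketched with Python's standard `unicodedata` module: ordinary compatibility ideographs carry a canonical decomposition to a unified ideograph, while these twelve carry none. The code points below are my reading of the characters listed above (U+FA0E through U+FA29, with gaps), and U+FA10 is picked only as a contrasting example; the exact results depend on the Unicode data bundled with your Python build.

```python
import unicodedata

# Assumption: these are the code points of the twelve characters listed above.
UNIFIED_IN_COMPAT_BLOCK = [
    0xFA0E, 0xFA0F, 0xFA11, 0xFA13, 0xFA14, 0xFA1F,
    0xFA21, 0xFA23, 0xFA24, 0xFA27, 0xFA28, 0xFA29,
]


def has_decomposition(code_point: int) -> bool:
    """Return True if the character has a decomposition mapping."""
    return unicodedata.decomposition(chr(code_point)) != ""


# The twelve should report no decomposition at all.
for cp in UNIFIED_IN_COMPAT_BLOCK:
    print(f"U+{cp:04X} decomposes: {has_decomposition(cp)}")

# A regular compatibility ideograph from the same block, for contrast.
print(f"U+FA10 decomposes: {has_decomposition(0xFA10)}")
```

In practice this means normalizing text (NFC, NFD and so on) leaves the twelve untouched, whereas their neighbours in the block get replaced by the unified character they map to.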

------
invalidusernam3
Obviously not exactly the same, but this article reminded me of the scroll lock
key. Many people seem to have forgotten its original purpose: When PC Magazine
asked an executive of keyboard manufacturer Key Tronic about the key's
purpose, he replied "I don't know, but we put it on ours, too"

~~~
dbenhur
Wikipedia remembers: "The Scroll Lock key was meant to lock all scrolling
techniques, and is a vestige of the original IBM PC keyboard. In the original
design, Scroll Lock was intended to modify the behavior of the arrow keys.
When the Scroll Lock mode was on, the arrow keys would scroll the contents of
a text window instead of moving the cursor. In this usage, Scroll Lock is a
toggling lock key like Num Lock or Caps Lock, which have a state that persists
after the key is released."

[https://en.wikipedia.org/wiki/Scroll_lock#Window_scrolling](https://en.wikipedia.org/wiki/Scroll_lock#Window_scrolling)

~~~
ComputerGuru
It still works that way too in a real vtty. Hit scroll lock to scroll through
the text buffer at the terminal in a text-only session with the up and down
arrows on BSD. Very handy.

~~~
mrguyorama
My modern keyboard even still contains little arrow glyphs on the number pad
keys to correspond with them.

~~~
collinmanderson
Isn't that for num-lock? (or when num-lock is off?)

------
duckerude
The fixating effect of digitalization is scary.

New characters were introduced by accident, and although they're useless,
they're probably not going anywhere. But what about the inverse? Will
character sets still be able to change and grow new characters now that
they're standardized like this?

~~~
Freak_NL
Character sets in general will certainly continue to grow — look no further
than the ever-expanding emoji set. Unicode is designed to be able to
incorporate new characters.

Whether the CJKV¹ set will grow is another matter entirely. I expect that
there will be a couple more extensions for characters found in historical
sources; the goal of the Unicode standard is — roughly speaking — to be able
to digitize any and all existing texts, after all.

But for actual _new_ characters to be added they would first have to become
popular enough to warrant inclusion. Neither Chinese nor Japanese is a
language that produces new Chinese characters like they did historically. The
barrier to getting a character accepted beyond a fringe group is quite high (in
any language) and there really is no strong need to create new characters.

1: Chinese, Japanese, Korean, Vietnamese (the latter two historically used
Chinese characters as well)

~~~
nabla9
Emojis are a mistake and were added too soon. We end up with a standard that is
full of short-lived symbols from 2000-2025 that were once in fashion, and new
generations of people will abandon them for something else when the culture
changes.

Adding new stuff (excluding old languages) to Unicode should have a delay, at
least 20-25 years from proposal to the standard (roughly a generation).

In the meantime it's possible to insert emoji into text as :), smiley,
-smiley-, smiley.jpg, ::smiley:: or whatever you want and the system you use
is free to change it into a picture.

~~~
fermuch
Weren't emojis added because Japanese phones had had them for a long, long time
(since before smartphones), and the main pursuit of Unicode is to be able to
represent all written characters? The first emojis would be close to the
20-year mark now, and they are the most used.

~~~
jsjohnst
1999, so yeah, the first set of almost 200 emoji are nearly 20 years old. And
yes, they were added first in Japan and didn’t become a western fad until
after the iPhone added them (not claiming the iPhone was the catalyst, but the
timing was about the same).

------
narrator
This reminds me of the famous mistake of "referer" being spelled with only 1
'r' in the HTTP standard.

~~~
maaaats
As a kid that tinkered with programming/computers before I learned English,
this has stuck as something I always misspell.

------
reaperducer
If they're in Unicode, it should be possible to register ghost characters as a
ghost domain name, right?

~~~
zamadatix
I don't see why not. Hell, it's probably possible to register a domain name
for characters that aren't even defined in Unicode yet.
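
For the curious, here is a minimal sketch of the ASCII ("xn--") form such a name would take on the wire, using Python's built-in `idna` codec. That codec implements the older IDNA 2003 rules, and `.example` is a reserved TLD used here purely as a hypothetical name. Whether any registry would actually accept the label is a separate policy question this does not answer.

```python
def to_ascii(domain: str) -> bytes:
    """Encode a Unicode domain name to its ASCII-compatible form."""
    return domain.encode("idna")


def to_unicode(ascii_domain: bytes) -> str:
    """Decode an ASCII-compatible domain name back to Unicode."""
    return ascii_domain.decode("idna")


# Hypothetical name: U+599B is the ghost character quoted in the article,
# and ".example" is reserved, so nothing here refers to a real domain.
ghost_domain = "\u599b.example"

ascii_form = to_ascii(ghost_domain)
round_trip = to_unicode(ascii_form)

print(ascii_form)
print(round_trip == ghost_domain)
```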

~~~
reaperducer
So you could take a chance registering some not-yet-defined code point, and
possibly end up with the next poop emoji as your e-mail address!

Worst lottery ever.

------
akx
The Google Translate Chinese text-to-speech for that core set of ghost kanji
is also spooky.
[https://translate.google.com/#zh-CN/en/%E5%A6%9B%E6%8C%A7%E6...](https://translate.google.com/#zh-CN/en/%E5%A6%9B%E6%8C%A7%E6%9A%83%E6%A4%A6%E6%A7%9E%E8%9F%90%E8%A2%AE%E9%96%A0%E9%A7%B2%E5%A2%B8%E5%A3%A5%E5%BD%81)
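
If you want to see which characters are hiding in that percent-encoded link, a small sketch with Python's standard `urllib.parse.unquote` decodes them. The encoded string is copied from the URL above; the variable names are just illustrative.

```python
from urllib.parse import unquote

# Percent-encoded text copied from the link above (UTF-8 bytes, %XX escaped).
ENCODED = (
    "%E5%A6%9B%E6%8C%A7%E6%9A%83%E6%A4%A6%E6%A7%9E%E8%9F%90"
    "%E8%A2%AE%E9%96%A0%E9%A7%B2%E5%A2%B8%E5%A3%A5%E5%BD%81"
)

# unquote() interprets the escaped bytes as UTF-8 by default.
ghosts = unquote(ENCODED)

# List each character next to its code point.
for ch in ghosts:
    print(f"U+{ord(ch):04X} {ch}")

print(len(ghosts), "characters")
```

Each `%XX%XX%XX` triple is one three-byte UTF-8 sequence, so the 36 escaped bytes come out as 12 characters.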

------
ape4
Like DNA coding mistakes

------
mistaken
Any reason why these characters cannot be revoked if nobody uses them?

~~~
TheDong
How do you prove a unicode character is unused? Look at every database field
in the entire world? Every book someone has ever typed on a computer and
printed? See what fonts include them or don't?

You can't define what an unused character looks like once it's in the
standard... even less if it's already available on almost every modern
computer.

People have already used these "ghost characters" when describing them, as
jokes in their online handles, etc... This article and now this comment
include "妛" which is one such character.

Furthermore, how would you even remove them? Tell all font authors "unicode
point X now shouldn't be included in your font"... why would they ever bother
to remove something from a font?

And what would be the benefit? Unicode offers enough codepoints that there's
plenty of space left over for whatever garbage you wish to include, like new
emojis and super-astral-aether-planar characters.

------
malkia
Basically "NaN"s in character form :) - fascinating read!

------
garmaine
> In the end only one character had neither a clear source nor any historical
> precedent: 彁. The most likely explanation is that it was created as a
> misreading of the 彊 character, but no specific indcident was uncovered.

I think I've uncovered a new English word, "indcident." I wonder what it means
and how it is pronounced?

