
Jōyō kanji variants: The curious case of 叱 and 𠮟 (2016) - yuhong
https://namakajiri.net/nikki/joyo-kanji-variants-the-curious-case-of-and-%E5%8F%B1/
======
yeukhon
> Can you spot the difference between 𠮟 and 叱? Me neither.

Of course I can, not because I am Chinese, but because they really do look different.
How can you NOT see the difference? Also they have different meanings.

Edit: Here is one that’s really hard to recognize, especially in handwriting,
and is often written wrong if you're not careful.

已 vs 己

This puzzles many Chinese people...

Can you spot the difference? 已 has to do with time (stopped); 己 is self.

So 自己 is me/self, while 已經 means already.

There is also 巳, which in the ancient Chinese clock means 9-11am, I believe.

~~~
TheSpiceIsLife
I can't _read_ any of these characters but to me the difference between 已 vs 己
is as clear and obvious as the difference between 𠮟 vs 叱

~~~
IgorPartola
The question is if you would recognize the difference if they weren’t side by
side.

I am trying to teach myself Japanese, which allegedly is easier than Chinese.
Hiragana as far as I can tell makes no internal sense. The characters for yo
and ya don’t incorporate elements from o or a. And when you are done with
Hiragana you realize that it is on its own not enough to actually communicate
in Japanese, so now in addition to those 46 characters, you have to memorize
2-4 thousand kanji. Oh man that is disheartening. I keep plugging away but it
is slow going.

~~~
azernik
> I am trying to teach myself Japanese, which allegedly is easier than
> Chinese.

Who did you hear _that_ from? The Japanese writing system is the Chinese one,
minus all of the internal consistency, plus lots of characters the Chinese
aren't traditional enough to keep in the language. It often uses the same
character to write distinct Chinese loanwords which differ wildly in
pronunciation based on _when_ and _from whom_ the Japanese heard them, and
they often have different shades of meaning.

Chinese, on the other hand, uses only one writing system in any given body of
text, uses the Latin alphabet with tone markings for phonetic spelling for
learners, and is much less reliant on garbled loanwords.

~~~
jack1243star
If you know English, the loanwords become immediately accessible. Also Japanese
dictionaries could be easier to check, since you just need to know the
pronunciation.

Your conception of the writing systems is quite weird; the consistency (if any) is
the same, I would say. I do not get what you mean by characters being
"traditional" or not.

Chinese loanwords are more garbled, with different methods of assigning them
random Chinese characters. And the Japanese use Latin characters to mark
pronunciations also, minus the tone markings. (Let me introduce you to --
bopomofo)

~~~
rangibaby
> If you know English, the loanwords become immediately accessible.

That's a common misconception (on the level of "katakana is for loan words"):

I switched my televi off and rode my autobike from my mansion to the cleaning
to pick up my Y shirt because my car's front glass broke. On the way home I
stopped at the conveni.

Televi: TV Autobike: motorcycle Mansion: apartment Cleaning: dry cleaning
Front glass: windshield Conveni: Conveni-ence store

> Chinese loanwords are more garbled, with different methods of assigning them
> random chinese characters.

This was standard in Japanese too (ateji). It is more common now to just make
a loanword, but if someone doesn't learn them they may be confused about why
Japanese abbreviate America as "rice".

米 US from 亜米利加 A ME RI KA (亜 is for Asia) eg 米軍 (US military) or 米ドル (US
dollar)

~~~
Manishearth
IIRC 米国 (mi guo / "rice country") is used in Taiwan as well. The other Chinese
word for America (美国 / mei guo / "beautiful country") is also a phonetic
thing.

(I used to think 美国 was descriptive since it didn't sound phonetic, until I
came across 米国 and was confused as to why anyone would describe the US as
"rice country", and then I learned that both were phonetic, but for the "me"
in "america")

------
tekacs
I was stunned to find that no article has ever been submitted to HN with 'Han
unification' in the title[0].

(there are comments [1] on the topic if you're interested to see HN's
discussion on the topic as I was)

[0]:
[https://hn.algolia.com/?query=Han%20unification&sort=byPopul...](https://hn.algolia.com/?query=Han%20unification&sort=byPopularity&prefix&page=0&dateRange=all&type=story)

[1]:
[https://hn.algolia.com/?query=Han%20unification&sort=byPopul...](https://hn.algolia.com/?query=Han%20unification&sort=byPopularity&prefix&page=0&dateRange=all&type=comment)

------
ramshorns
The reason the second Japanese character is censored out of the title seems to
be that having a character outside the BMP in the title broke everything
horribly, not because of actual censorship of swearing or something.

I wonder how long it will be until UTF-8 is used everywhere and non-BMP
characters enjoy first-class support and testing. You'd think the U+1Fxxx
emoji would have been enough to make this happen.
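
For anyone who wants to see what "outside the BMP" means in practice, here is a minimal Python sketch. It is only an illustration, not anything taken from the article or from HN's code, and the helper name `describe` is made up. It reports, for each of the two characters, whether the code point fits in the BMP and how many UTF-8 bytes and UTF-16 code units it needs.

```python
def describe(ch):
    """Return (code point, in_bmp, UTF-8 byte count, UTF-16 code unit count)."""
    cp = ord(ch)
    utf8_len = len(ch.encode("utf-8"))
    # utf-16-be avoids the byte-order mark; each code unit is 2 bytes.
    utf16_units = len(ch.encode("utf-16-be")) // 2
    return cp, cp <= 0xFFFF, utf8_len, utf16_units

# Escapes are used so the example survives systems that mangle non-BMP text.
for ch in ("\u53f1", "\U00020b9f"):  # 叱 and 𠮟
    cp, in_bmp, utf8_len, utf16_units = describe(ch)
    print(f"U+{cp:04X} in_bmp={in_bmp} utf8_bytes={utf8_len} utf16_units={utf16_units}")
```

Code that assumes one UTF-16 code unit per character, or storage limited to 3-byte UTF-8 sequences (older MySQL `utf8` columns are a well-known case), will trip over the second character but not the first.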

~~~
yuhong
I just filed
[https://github.com/algolia/hn-search/issues/104](https://github.com/algolia/hn-search/issues/104)
after editing the HN title and discovering problems.

------
glandium
Not even going into the "fun" of the Han unification, there are some weird
things in Chinese/Japanese characters.

For example, 右 (right) and 左 (left). You'd think the top-left part is the
same, but it's not. In the case of 左, you write the horizontal stroke first,
but in the case of 右, it's the second stroke. They also have a slightly
different shape.

Another example from the Jōyō kanjis, 臭 (stinking, odor) and 嗅 (smell). You'd
think the second is just 口 (mouth) added to the first, but it's not.
Etymologically, 臭 is 自(nose, simplified form of 鼻)+犬(dog), but was simplified
in the Jōyō list, and became 自+大. 嗅 was not part of the list back then. Which
doesn't mean it didn't exist. It just means it was not recognized as regular
enough by the ministry of education. 嗅 was only recently added to the list
(2010), but was not simplified to remove the extra stroke, so it's 口+自+犬,
leading to this funny inconsistency.

~~~
dastbe
not sure if you're coming from a purely japanese background, but in chinese 左
and 右 both have the same stroke order for the upper left portion.

and to show just how ridiculous han unification is, on my laptop your example
makes no sense because the 臭 has a 犬!

~~~
glandium
So interestingly, for 左 and 右, the etymological stroke order is the japanese
one. It was "simplified" in mainland China and Taiwan.

~~~
patal
Do you have a source for your claim? Particularly, you're suggesting that the
stroke order for the top left part was once different in China. I'd really
like to see a document on this.

~~~
dastbe
I can buy that, as I know the top left portion in chinese historically is for
the left hand. If the historical variation of right was for the right hand,
that would make a lot of sense.

~~~
patal
Are you really suggesting that writers changed hands during writing a single
character? That sounds totally impractical. Also, that does not explain why
there should be a different stroke order.

~~~
Manishearth
No, 左 is "left" and 右 is "right", they're saying that the 𠂇 radical itself
means "left hand", and they later added a similar one for "right hand".

Looking it up (on wiktionary, at least) it seems like originally 又 was "right
hand", however if you look at really old versions of that character it looks
mostly like a mirror of 𠂇.

Interestingly, Wiktionary lists both left and right as phonosemantic compounds
where the _phonetic_ part is "left hand" or "right hand" and the semantic part
is "assist" and "mouth" (I think in this case the "mouth" is used to bolster
the "pronounced like" of the phonosemantic compound, since it's used on the
right side not the left). This seems to be because the word for "left hand"
became the word for "left" and same for "right hand", so they're pronounced
the same; and the semantic component was added later to bolster/specialize the
glyph.

Anyway, it seems like the 𠂇 radical in 右 is etymologically a variant of 又
which is a mirror of 𠂇 (well, a mirror of a three-pronged historical form of
that) except it was rotated around the glyph so that it looks exactly like 𠂇
but the stroke order is reversed.

~~~
patal
Thanks for clearing that up.

------
corey_moncure
The depth and breadth of the diaspora around Chinese characters makes it
extremely difficult to get these things right. Even this article presenting
highly specific domain knowledge tees off with a questionable example:

`The Japanese cross the blade in 刃, the Koreans don’t`

Well Japanese have both 刀 for katana, the well known sword, and 刃 for yaiba /
"blade". But in fact 刀 is used for many kinds of blade, by itself as `katana`
and as a component of words like 太刀, 印刀, 日本刀, &c. (see
[http://kanji.quus.net/jyukugo1498/](http://kanji.quus.net/jyukugo1498/)) Is
this really pointing to a distinct concept from 刀 in Korean, as the article
suggests?

~~~
emodendroket
You've misunderstood.
[https://i.imgur.com/pTkbX7g.png](https://i.imgur.com/pTkbX7g.png)

(for a bigger list see
[https://en.wikipedia.org/wiki/Han_unification#Examples_of_la...](https://en.wikipedia.org/wiki/Han_unification#Examples_of_language-dependent_glyphs))

~~~
corey_moncure
If that's what the author is referring to, there are less ambiguous ways to
express it.

Even so: in my experience in Japan I've seen e.g. 認 hand-written both ways,
and specifically remember becoming curious about this variant. My conclusion
after consulting with native speakers and university professors I knew there
was that they are essentially and functionally equivalent. That is, against
the author's point, Japanese speakers do not note a difference at all.

~~~
Manishearth
FWIW many of the unified characters under Han unification are considered
"wrong" specifically in Japan (in other CJK countries folks seem to be more
lax). From what I understand the feeling is as if you started writing `s` as
`ſ` in English.

I'm not very sure of this but it seems like the Japanese characters mostly
stayed the same after branching off of Hanzi usage at various points many
centuries ago (more than a millennium ago, actually), whereas in China these
characters evolved.

~~~
emodendroket
To the extent this is true I'd say it's because Korean people only
occasionally use Chinese characters and the default is almost always to
display Chinese variants, and not because Japanese people are uniquely
particular.

------
g09980
A favorite of Japanese study beginners is learning how to distinguish between
ツ and シ, as well as between ソ and ン.

~~~
cperciva
For those of us who don't read Japanese, can you explain what the four
characters in question mean?

~~~
tjallingt
Not a Japanese reader so i can't fill you in on what exactly the characters
mean but they are characters from the Japanese katakana script. Katakana are
generally used for loanwords and each character represents a sound:

ツ - tsu

シ - shi

ソ - so

ン - n

The reason these are difficult to learn is because the tiny differences in
stroke angles (especially when handwritten) make it easy to confuse them.

~~~
anw
In college, my Japanese teacher told us that a lot of foreigners get these
characters wrong, but if you know the stroke order then it's easy to see the
difference.

シ (shi) is written top to bottom. You can see that all the starting points for
the strokes line up vertically on the left. Also, the last stroke curves from
the bottom-left to the upper-right.

ツ (tsu) is written from left to right. You can see that all the starting
points for the strokes line up horizontally at the top. Also, the last stroke
curves from the upper-right down to the bottom-left.

ン (n) like 'shi' is written top to bottom. The starting points for the strokes
line up vertically on the left. It also uses the same direction for the
longer, final stroke as 'shi'.

ソ (so) like 'tsu' is written left to right. The starting points for the
strokes line up vertically on the left. It also uses the same direction for
the longer, final stroke as 'tsu'.

~~~
anw
Replying since I can't edit: The sentence for "ソ (so)" should read "The
starting points for the strokes line up horizontally at the top".

------
jasonjei
Chinese gets even more complicated being a language without an alphabet. Even
among the traditional and simplified variants, there are different forms for
the same character based on popularity.

For example, 吃 and 喫 have the same modern meaning "to eat," but one is more
commonly used. A character like 鎌 has a variant like 鐮, just as well as 塚 and
冢. People choose characters based on stylistic reasonings (and in Taiwan, many
choose the Japanese variant to be "hip").

~~~
HumanDrivenDev
> Chinese gets even more complicated being a language without an alphabet.

If I could be very pedantic for a moment...

Mandarin can be written in many alphabets. Almost every single native speaker
on earth uses an alphabet to input Chinese characters on computers and
phones. And almost every native speaker learns their language using an
alphabet - at least initially.

[https://en.wikipedia.org/wiki/Hanyu_pinyin](https://en.wikipedia.org/wiki/Hanyu_pinyin)
[https://en.wikipedia.org/wiki/Bopomofo](https://en.wikipedia.org/wiki/Bopomofo)

~~~
Manishearth
To get even more pedantic: I wouldn't say Cangjie or Dayi inputs are an
"alphabet", and a sizeable number of users use those. They're certainly a way
of inputting characters; and a method of looking at how glyphs can be broken
down (one by stroke order, one by shape), but for it to be an alphabet there
must be a rough phoneme mapping AFAICT; which there isn't.

(but yes, the majority seems to use phonetic/alphabetic input methods, i.e.
pinyin or zhuyin.)

~~~
HumanDrivenDev
As a fellow pedant I made sure to use words like "almost" (:

~~~
Manishearth
Right, it's not even "almost"; I'm saying a significant number of folks use
Cangjie -- not a majority; but still a large number.

------
hudibras
If you're interested in what the other mentioned kanji with regularly-used
non-standard forms mean, here are some rough definitions:

餌 - animal feed, bait

遡 - go back in time; go upstream

遜 - humble, modest

謎 - riddle, enigma

餅 - rice cake; _mochi_

------
greggman
I'm often surprised these kinds of issues aren't fixed quicker. There are so
many issues in browsers related to non Roman based letters (assuming that's
the correct term for it)

on the one hand I get that most browser (and OS?) dev happens in the West by
people unaffected by these issues. On the other hand, with > 1 billion people
using non Roman based languages I'd expect this kind of stuff to be more of a
priority.

maybe this particular issue isn't that important? the one that bites me the
most is pressing ESC to cancel IME editing and having it exit some dialog
because the browser/os passed the ESC all the way down to the app when it was
meant only for the IME. I get there are probably no easy solutions tho

this is also a place where VSCode fails, because VSCode and any other editor
based on text editing in html needs more IME info than current browser APIs provide

~~~
gorgedev
I am developing a program using a webbrowser control, for my own use, to aid
in learning the Jouyou Kanji and their readings. I need some text controls to
be exclusively latin, hiragana, or katakana.

The state of IME support in Windows is very poor.

WPF (legacy software?) does not support this at all, with reported bugs going
back five years.

Winforms does, but doesn't meet my requirements.

Html does define the inputmode attribute, but this only works as a hint to
smart phone on-screen keyboards. The inputmode attribute is totally ignored by
desktop browsers.

The Microsoft IME does not appear to have any API which can determine the
current state or switch between modes. However, there is currently an HTML5
working group on an API for IME control.

In this day and age of multinational software, this is truly pathetic.

------
titanix2
Interesting read. Chinese characters and Chinese character-like characters can
quickly become hard to deal with as soon as you leave the BMP. The ids.txt
file (Kanji database) is really a lifesaver in this case.

------
greggman
I wish more handwritten kanji input systems would use stroke order and count
as a weight and not a filter. As a learner I generally know the stroke order
when looking at an unknown kanji, but as most input methods filter on order and
count, a single wrong order or count means the matching kanji doesn't even show
up in the list of possible matches.
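
To make the "weight, not filter" idea concrete, here is a rough Python sketch. Everything in it is an assumption for illustration: the function names, the penalty value, and the shape-similarity scores are made up, and no real handwriting recognizer is claimed to work this way. Only the stroke counts are the standard ones.

```python
# Standard stroke counts for a handful of kanji.
STROKE_COUNTS = {"口": 3, "日": 4, "右": 5, "石": 5}

def rank_with_filter(shape_scores, drawn_strokes):
    """Hard filter: drop every candidate whose stroke count differs."""
    kept = {k: s for k, s in shape_scores.items()
            if STROKE_COUNTS[k] == drawn_strokes}
    return sorted(kept, key=kept.get, reverse=True)

def rank_with_weight(shape_scores, drawn_strokes, penalty=0.1):
    """Soft weight: subtract a penalty per stroke of mismatch, keep everyone."""
    adjusted = {k: s - penalty * abs(STROKE_COUNTS[k] - drawn_strokes)
                for k, s in shape_scores.items()}
    return sorted(adjusted, key=adjusted.get, reverse=True)

# Made-up shape-similarity scores for a learner who drew 右 with 4 strokes.
scores = {"口": 0.30, "日": 0.40, "右": 0.90, "石": 0.70}
filtered = rank_with_filter(scores, 4)
weighted = rank_with_weight(scores, 4)
print(filtered, weighted)
```

With the hard filter, the intended kanji never appears because its stroke count is off by one; with the penalty it stays at the top of the list, just with a slightly lower score.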

~~~
wodenokoto
All the proprietary systems do that. For some reason, proprietary Chinese
handwritten input systems are more widely available than Japanese.

macOS and iOS are good examples: they ship with a good one for Chinese and nothing
for Japanese. If you buy a Sony android phone you'll get a good handwritten
Japanese input out-of-the-box.

The reason is that basic stroke order information is freely available, but
you need much more information to build a good system (you need common
abbreviations and mistakes, for example).

------
Typhon
Han unification was really a pretty stupid idea.

------
mjevans
Is the 'unspecified' rendering supposed to be the same as the 'popular'
rendering (on the right); or is it supposed to be the one in the middle?

For me seemingly every example's left most and right most column (for the sets
of 3) looked identical, while the middle form was different.

~~~
hudibras
I tried a couple browsers, and I'm seeing the left and middle kanjis as being
the same, with the right one different.

------
GolDDranks
I see 「つかむ」 often being written as 掴む, but that seems to be a common
abbreviation too, since the official version is 摑む. The simplification 國 → 国
is just applied there, similarly to the 辶 → ⻌ case.

------
yuhong
One of the tables seems to show characters that are in the Unicode BMP but not in
JIS X 0208 (many, I think, came from JIS X 0212). Another table of course shows
characters not in the BMP.

------
ksec
And that is why I sometimes wonder if CJK encoding should be something else
entirely rather than part of Unicode, which required Han unification.

------
jhanschoo
Isn't there some form of case folding of one of the characters into the other?
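
One way to check this is to try Python's built-in case folding and the four Unicode normalization forms on the pair. This is a minimal sketch; the helper name `unify_attempts` is made up, and the result depends on the Unicode data shipped with your Python version. As far as I know, both are ordinary unified ideographs with no decomposition mapping, so none of these operations should merge them.

```python
import unicodedata

A = "\u53f1"      # 叱
B = "\U00020b9f"  # 𠮟

def unify_attempts(x, y):
    """Try case folding and all four normalization forms; report which make x == y."""
    results = {"casefold": x.casefold() == y.casefold()}
    for form in ("NFC", "NFD", "NFKC", "NFKD"):
        results[form] = unicodedata.normalize(form, x) == unicodedata.normalize(form, y)
    return results

print(unify_attempts(A, B))
# Contrast: a CJK *compatibility* ideograph does normalize to its unified twin.
print(unicodedata.normalize("NFC", "\uf900") == "\u8c48")
```

So any folding of one into the other would have to come from an application-level mapping table, not from the standard normalization forms.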

