
> So the fact that kanji literally means "Chinese character" doesn't mean that kanji and hanzi should be considered the same script.

The Japanese writing system differs from the Chinese one in having its own distinct scripts (hiragana, katakana), but most of its Chinese-character subset (kanji) is the same as the Chinese script. The most comprehensive Chinese character dictionary is a Japanese one (the Daikanwa Jiten), which gives definitions and Japanese readings, and this is possible precisely because the script is the same.

The only differences are characters created for use in Japan (kokuji), which can be treated as an extension like the Vietnamese Nôm; characters simplified by the Japanese government (some jôyô kanji); and variations in some glyphs' shapes (黃/黄). So treating these languages' full character inventories as different scripts would make no more sense than encoding the English, French, and Czech alphabets separately because a few characters differ.

My opinion is that Han unification makes sense, but the mistake was to encode the variant interpretation at the application level (e.g. HTML lang tags), which is not portable. I don't know in detail how Unicode variation sequences work (appending a trailing code point to a character to indicate precisely which variant is meant), but something like that at the text-encoding level could ease a lot of pain.
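For what it's worth, the "trailing code" mechanism does exist in Unicode as ideographic variation sequences: a base Han character followed by a variation selector (U+E0100..U+E01EF) that picks a registered glyph variant. A minimal Python sketch (the specific character and selector here are just an illustration, not a claim about which variants are registered):

```python
# An ideographic variation sequence (IVS) is a base Han character
# immediately followed by a variation selector code point.
base = "\u845b"            # 葛 (U+845B)
vs17 = "\U000e0100"        # variation selector 17, first of the IVS range
ivs = base + vs17          # together they request a specific glyph variant

# The sequence is two code points, but a renderer that supports IVS
# displays it as a single glyph; one that doesn't just shows the base.
print(len(ivs))                      # → 2
print([hex(ord(c)) for c in ivs])    # → ['0x845b', '0xe0100']
```

Because the selector travels inside the text itself, the variant survives copy-paste across applications, which is exactly the portability that lang tags lack.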
