Japanese writing system basics (candyjapan.com)
961 points by bemmu on Aug 20, 2016 | 338 comments

This was an interesting symbol to choose for an internet explanation. It took me a while to realise that the rectangle is actually the symbol that I'm supposed to see, rather than a missing glyph.

It's not exactly a rectangle though, the two downward strokes extend slightly past the bottom horizontal stroke.

In handwriting, or calligraphy, it might be rendered like this, in fact:

  1 \        /
     \ ---- /
The three strokes are not all connected, and the left and right sides taper downward a bit (exaggerated in the ASCII above).

Image search for "kuchi-no kanji + shodou (calligraphy)":


It seems there are two schools on the relationship between stroke 2 or 3: 2 can extend downward past 3, or 3 can underscore 2.

Wikipedia has stroke order animations for radicals.


Not on my screen (http://i.imgur.com/korTPAQ.png)

Maybe it is a missing character after all :) (Oh no, I've been fooled by chrome/freetype/something!)

I see, it is indeed showing a missing character symbol on your computer. The actual character is much more square than that, in addition to the strokes I mentioned. (http://i.imgur.com/s6bW7u0.png)

Yep, that's the missing character glyph - known colloquially as "tofu".

Yes, it is.

It has the added bonus of being almost correct even if the symbol is missing.


I agree. And I'm an Australian in Japan learning Japanese.

(I was expecting it to be a rant about insufficient unicode usage.)

Doesn't help that at least for me it gets rendered differently in the title on HN and the actual article.

That said, I expected something Japanese related, given that it's Candy Japan, but you'd think common fonts would have different looking characters for “no glyph” and the kanji for mouth.

Another problem is that if the character appears in isolation, it's hard to tell whether you are looking at 口 (mouth) or 囗 (the "box radical" kanji, or "kunigamae").

Ha, for me, they look quite different in the font used for editing, but not so much in the rendered comment other than a small difference in size.
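For what it's worth, the two are separate Unicode code points, so software can tell them apart even when a font can't. A minimal sketch in Python using only the standard library; the helper name `describe` is my own, and the characters are written as escapes so a lookalike glyph can't sneak in:

```python
import unicodedata


def describe(ch: str) -> str:
    """Return the code point and Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


mouth = "\u53e3"      # the "mouth" kanji
enclosure = "\u56d7"  # the kunigamae enclosure radical

for ch in (mouth, enclosure):
    print(ch, describe(ch))

# They render almost identically but compare as different strings.
print(mouth == enclosure)
```

So a search or a diff will distinguish them, even if your eyes won't.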

That rectangle denoting a missing glyph is called "tofu" because it looks like it, so there is a connection to Japanese anyway.

It's not clear whether or not that naming for that character originated in Japan.

In Chinese 食 is pronounced shí.

As someone who speaks Chinese, I got a chuckle out of reading 'put your favorite snack in your 口 and 食t it!' due to the association in my mind of that character and its Chinese pronunciation, immediately followed by a 't'.

Japanese use of Kanji is usually more similar to Cantonese (Cantonese is a much older language than modern Mandarin)

So in Mandarin you wouldn't find people using 食 as a verb, but in cantonese it is the correct verb.

In Cantonese it's pronounced something like "sekk"

("sihk" in Yale romanisation, but I think if you're not familiar with Yale you'd try and read that "sick" or "sikh")

Edit: "sikh"

Not to mention that Mandarin would use 嘴(里)rather than 口

The pronunciations of kanji correlate to different Chinese eras and areas.

The usage and choice of characters are influenced by Classical Chinese, which was the literary language in Japan until, uhm, recently-ish.

There is a very good recent YouTube video that explains the usage of Chinese characters in Japanese and which I recommend to anyone interested in the subject:


I was very impressed by the accuracy and knowledge displayed, as most of what is written and said on the subject is ehhh somewhat disappointing.

I've never learned Japanese, nor spent much time there.

From my brother who studied Japanese for a while, the Kanji was a real blocker for him. So I did wonder what it'd be like, given I already know a good amount of Chinese characters.

I guess this video answers that question:

Even more confusing.

Yes, 食 is a verb in Cantonese and Hakka! Both of them are closer to archaic Chinese than Mandarin is.

As I understand Hakka is actually a collection of languages, not necessarily mutually intelligible.

I learned a bit of one dialect from a friend's mum. And the basics were very similar to Cantonese.

Would be interested to learn more dialects once I'm done with Mandarin and Cantonese.

Standard Mandarin (especially Beijing dialect) pronunciation of shí is much closer to the American pronunciation of "sure"†, though, so shít would be pronounced more like "shirt". (Sorry to ruin the joke.)

†IPA something like this: /ʃɝ/

A technical correction. In Mandarin, shí is pronounced very close to the sound of "shi" in ship and "shee" in sheep. You are right that it sounds close to "sure" when it is pronounced by people with strong Beijing dialect.

Um. Modern Standard Mandarin pronunciation is based on Beijing dialect, and it doesn't take a Beijing accent to pronounce it like "sure", that's totally standard. Beijing dialect does feature more erhua than other dialects, but even 哪儿 is considered standard, not specific to Beijing dialect. There are definitely dialects (mostly southern iirc) that pronounce shi as in "ship", but that's not Standard Mandarin. Every television show I've seen, for example, pronounces it like "sure".

Edit: To add a clear example, look at 事. Standard Mandarin would pronounce it "sure", Beijing dialect goes even further and pronounces it more like "shar".

I would argue 事儿, the combination of two characters, is pronounced like "sure" in most Northern dialects. (Strictly speaking, some are dialects and some are accents. Let's skip this difference here since it doesn't matter to the current topic.)

Not all the people in television shows, including talk shows, speak Standard Mandarin. This is especially true in shows produced in the north. They adopt Beijing accent to certain degrees, which, more or less, leads to the erhua pronunciation.

For Standard Mandarin, listen to 新闻联播,the official news program from CCTV.

I notice that for non-native Chinese speakers whose mother tongue is an Indo-European language, erhua pronunciation seems to be easier than Standard Mandarin. If erhua pronunciation is pushed to the extreme, it is called 大舌头。

Also, erhua pronunciation is usually NOT used in very formal settings, like presentations. Some people regard erhua pronunciation as vulgar, except in very widely adopted cases.

I've listened to 新闻联播, the way they pronounce shi sounds much closer to sure than to ship. It is somewhere in-between, and that vowel is more rhotic in Beijing dialect, but Standard Mandarin definitely still rhotacizes that vowel (especially compared to southern dialects where it isn't rhotacized at all).

Yeah I know it's pronounced differently, but as someone who enjoys (sometimes strained) cross-language puns, it didn't stop me from finding it funny :-)

Another interesting aspect of traditional Chinese characters is that complex words are expressed by combining simpler symbols. For example, the Chinese word for computer is 電腦. The first character represents "electricity", and the second character represents "brain". Which is really what a computer is, an electric brain. Similarly, a computer programmer is 程序員, where the three symbols are "rule", "order", and "person" - one who orders rules.

An interesting consequence of this is that you only need to learn around 3000 symbols to read a Chinese newspaper, just like how you can ascertain the meaning of an unfamiliar English word by having knowledge of a small set of Latin/Greek roots.

> you only need to learn around 3000 symbols

This is not a good result. Learning 3000 characters could take years, and even then there would still be some you don't know. To read an English newspaper you only need to learn 26 symbols. Yes you need to learn the words, but you also have to do that for Chinese which is separate from learning the symbols for each word. It's a hugely inefficient way to write.

Furthermore, while combining smaller words to represent a more obscure word is an improvement over introducing a new character (and is a step towards an alphabet), the examples I've heard are very one way. You might look at 程序員 and think "one who orders rules" makes sense for "programmer." But that sequence of characters could just have easily meant any number of other things (lawyer, politician, etc).

>It's a hugely inefficient way to write.

Agreed. Reminds me of an anecdote from David Moser [1]:

"I was once at a luncheon with three Ph.D. students in the Chinese Department at Peking University, all native Chinese (one from Hong Kong). I happened to have a cold that day, and was trying to write a brief note to a friend canceling an appointment that day. I found that I couldn't remember how to write the character 嚔, as in da penti 打喷嚔 "to sneeze". I asked my three friends how to write the character, and to my surprise, all three of them simply shrugged in sheepish embarrassment. Not one of them could correctly produce the character. Now, Peking University is usually considered the "Harvard of China". Can you imagine three Ph.D. students in English at Harvard forgetting how to write the English word "sneeze"??"

Regarding the grandparent's point of joining characters, my favorite has always been 火雞 (fire chicken) for turkey.

[1] Why Chinese Is So Damn Hard http://www.pinyin.info/readings/texts/moser.html

> my favorite has always been 火雞 (fire chicken) for turkey.

Turkeys are not native to Asia so it had to be added late to the language, which is why it doesn't make much sense.

The problem with Chinese is that when you have a new thing like turkeys you have to either create a new character, which then people have to memorize forever, or you have to string together sort-of related ideas into a compound word that has only some tangential relation to its parts. When you have thousands of words that are only somewhat related to their parts, the parts lose their meaning and become not much more than a really large and complicated alphabet.

Chinese was over-engineered to work great for maybe a few thousand words, but the world keeps getting bigger and bigger and every new thing makes the Chinese language worse.

This has been mitigated to a large degree by phones. Even pre-smartphone, they were used to look up difficult characters.

> To read an English newspaper you only need to learn 26 symbols.

Replace English with any other language using the latin alphabet and this doesn't hold up as there is no indication of meaning. For that matter, there's not a strong indication of pronunciation. I may know English and French but that doesn't help me understand Hungarian or Swedish.

I do not know Chinese but my understanding is those 3000 characters are not only glyphs, they are elements of meaning. Incidentally, that is the same order of magnitude as the number of words one needs to know to read an English paper, with the difference that in English, a portion of the meaning is encoded in grammar, i.e. the relation and order of words. That's a cognitive overhead we take for granted in our language system.

That's the point though. To learn Hungarian or Swedish, you'll have to learn the spoken language, and then reading and writing comes essentially for free.

In Chinese, you have to learn the spoken language, and then you have to spend just as much time (if not more) to learn the written language.

Many Chinese characters do contain elements of meaning and elements of pronunciation, but that doesn't mean that you don't have to learn them.

"Dilapidated" contains elements of its meaning in the root lapis = stone, but you'd be hard pressed to deduce the meaning from it.

English has 52 characters including both cases. Moreover, the ways in which these are combined to form words are highly inconsistent and idiosyncratic.

Chinese characters are made by combining 214 radicals, and most of the characters are written by combining two of these. Most words are created by combining two such characters. It's not really that much more difficult than remembering how to spell an English word.

程序 means program. 程序员 builds on this, not the individual parts of the constituent words.

Chinese is actually a very logical language. The writing system is arguably more complicated than it needs to be (see Victor Mair and his quixotic crusade for romanization), but it conceals a language with very flexible morphemes and simple grammar. I recently had to translate 照顾着 and 被照顾着 for an app, and I had a hard time coming up with concise and decent translations that mirrored the simple relational antonymity of the Chinese.

Characters do give a lot more semantic information than an alphabet, and it's definitely easier to learn new words through context if you can see the characters vs just hearing them.

Knowing Chinese characters is closer to knowing not only the alphabet but also understanding most basic English words plus Latin/Greek stems like auto (self), locus/loco (place), -ous (possessing or full of -), etc. From that base it's pretty quick to learn longer words and there are tons of clues to help you remember them.

> It's a hugely inefficient way to write.

Not necessarily true.

For instance, "Not necessarily true" is 18 characters and 2 white spaces, while in Chinese, it could be as simple as "未必".
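If anyone wants to check the arithmetic, here is a rough sketch in Python. The function name `size_report` is made up for this comment, and UTF-8 is just one encoding I picked to compare storage size:

```python
def size_report(text: str) -> dict:
    """Count code points, whitespace, and UTF-8 bytes in a string."""
    return {
        "code_points": len(text),
        "non_space": sum(1 for ch in text if not ch.isspace()),
        "spaces": sum(1 for ch in text if ch.isspace()),
        "utf8_bytes": len(text.encode("utf-8")),
    }


english = "Not necessarily true"
chinese = "未必"

print(size_report(english))
print(size_report(chinese))
```

The English comes out at 18 letters plus 2 spaces (20 bytes), the Chinese at 2 characters (6 bytes in UTF-8). That only measures length on the page or on disk, of course.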

I think "hugely inefficient" was meant not so much in terms of space requirement or number of strokes, but rather in terms of cognitive load and time required to master it.

While English might be among the worst alphabetic writing systems (compared to, say, Spanish, which has a wonderful correspondence between what is written and what is spoken), it is certainly a more efficient writing system than Chinese, in that fewer years of school have to be dedicated to just learning to read and write.

Furthermore, as highlighted by someone else, it is not uncommon for writers of Chinese to just completely forget how to write a word.

In English, you might misspell it, but it'll be rare that you can't render it at all.

Great book on the topic of Chinese, debunking several misconceptions, is "The Chinese Language: Fact and Fantasy" by John DeFrancis. http://www.uhpress.hawaii.edu/p-819-9780824810689.aspx

What traditional character using place are you speaking of? In Taiwan, programmer is 程式設計師 and in Hong Kong, it's 程式設計員. I had thought 程序 was basically a mainland-only variant.

Also, FWIW, I couldn't read a paper at 3k characters. It's more like 4-5k if the goal is reading rather than slogging through with a dictionary.

Perhaps GP was thinking of the mainland variant, but only had a Traditional Chinese IME available on the computer.

Speaking of mainland-only variants, their official translation of computer is "计算机" (lit. computation machine).

Isn't that the same in most languages though? For instance in English "electronic mail" became "e-mail", which became "email". "Web log" became "weblog", which became "blog". "Cellular phone" became "cell phone".

And indeed in older books and articles it is not unusual to see people writing "electronic brain" to refer to a computer.

A lot of non-English words for computer terms in various languages are translations of outdated English terms for the same that stuck around after the English terms changed.

English tends to sneakily disguise its words with Latin or Greek. So you may not think about the literal meaning of a word like "predict", while in German "vorhersagen" (to say before) is the exact same thing, just more obvious.

> Another interesting aspect of traditional Chinese characters is that complex words are expressed by combining simpler symbols. For example, the Chinese word for computer is 電腦. The first character represents "electricity", and the second character represents "brain". Which is really what a computer is, an electric brain.

That's nice, but I think most languages do this. It's not particular to Chinese. The word "computer" comes from "com-" (with/together) and "putare" (to reckon). Or "composition"; to position together. "compile"; to pile together, etc.

Right, but when learning English as a second language, they don't teach you "com" and "putare", because the roots are too far away and too diverse (Latin, Germanic) whereas in Chinese, it's often immediately obvious. Well, more often than in other languages. Most of it is still random-looking.

Well, that's the cool thing about Chinese - it's the Latin/Greek of East Asia :) It's to Japanese/Korean/Vietnamese what Latin/Germanic are to English, so there are fewer indirections.

"To pile together?"

In Chinese, 'eat' is usually '吃'. We use '吃了 (le)' to say we 'ate' and '在 (zài) 吃' or '吃着 (zhe)' for 'eating'.

It is really interesting that in China people often ask their friends '吃了吗? (Have you eaten?)' rather than '嗨 (Hi)' in daily life. So initially I thought this post was describing something in Chinese w/o my noticing the URL.

Same in Thai. They ask กินข้าวหรือยัง ("Have you eaten yet?", or literally "Have you eaten rice yet?") as a greeting.

In lowland Scotland one finds 'You'll have had your tea?' which is a semi-polite way by which the enquirer informs the visitor that he'll not be given any tea ( i.e. dinner, food ).

Has entered mainstream as a cliche of Scottish parsimony but does exist in the wild.

Interesting. It's sort of the opposite when Thais ask if you have eaten already because if you answer "not yet" it is common for them to offer you something.

The Scottish response to 'not yet' is usually something like 'well it seems foolish to have ventured out in that case', obviously delivered in the local dialect.

Though I have extracted one dinner from an Edinburgh man who deploys the phrase, but that was probably because his ( English ) wife had put it on to cook earlier!

That reminds me that sometimes in my family we'll jokingly say jeet? It comes from the comedian Jeff Foxworthy, who said it's redneck for "Did you eat?"

I'm in Canada and not a redneck, so I wonder if jeet is actually used in the US.

> I wonder if jeet is actually used in the US

Not really. Foxworthy's joke isn't that that is actually a word, which it's not; it's that that is how the phrase "Did you eat?" comes out sounding in the accent of his home, which is a couple states east of mine. (Also apparently in the accent of New Jersey, which suggests to me that this particular trait may be more coastal than specifically Southern.)

My father's native New-Jerseyan uses "jeet jet" for "did you eat yet".

The one bit of that I consistently use is "agnishna", for "air conditioner".

I recently found a list of these for Scottish that were amazing.

E.g.: "space ghettos" (in standard American accent) becomes Scottish "Spice Girls" :D

If you want literal, it's "eat rice already" - Thai does not have conjugation

Actually, if you want it literally it would be "eat rice or not yet" (กิน=eat, ข้าว=rice, หรือยัง=or not yet). Like you said, verbs are not conjugated in Thai but instead verb tenses are accomplished with helper words to indicate past, present or future. Sure is a lot more orderly than English verb conjugation with all its irregularities.

That greeting got me a bit when I was just starting to learn to read Chinese (I never attempted to learn to speak it). Learning something so different to the languages I know really made me appreciate the similarities between Swedish and English, let's say, and how that made learning easier.

I was taught that "Chi le ma?" was an ancient greeting. In old times when food was scarcer, asking whether someone had eaten was another way of asking about someone's general situation.

How do you respond? "Yes"?

Depends on the question.

If the question "你好吗" ("How are you?" although literally translates to "Are you good?"), you can answer with "好" ("Good"). If you're asked "你是老外吗" ("You're a foreigner?") you can answer "是的" ("I am").

So it usually depends on the verb that was used in the question. You can often just say "对", though, which along with "是的" is the closest thing Chinese has to "Yes".

In this specific case, you would likely just answer "吃了" ("I ate") or "吃了,你呢?" ("I ate, what about you?").

Keep in mind that this is more or less the equivalent of "How are you doing?" or "What's up?" in the US; it's a greeting and people aren't necessarily expecting an actual answer.

How about if one hasn't eaten. Should one say so in reply to a stranger asking or would that be rude?

You'd probably say something like "还没有", "没有" or "没吃" ("Not yet", "No" or "I haven't eaten"). Again, it's just a greeting, the answer doesn't matter much (unless it's your mom asking you in which case she'll make a big deal of it, but that's a pretty international behavior).

Typically just a restatement: 吃了。

Yeah yeah, this all makes sense until you see 品, which isn't "a three-mouthed monster" but "goods".

Languages are weird, man.

I wOUld agree with you if it weren't for the wOUnd I have where a bOUlder pOUnded me in the head, so I don't have the cOUrage.

I see what yOU did there.

I interpret that character as "three boxes" so "goods" is a very intuitive translation :)

According to kanjinetworks.com that is, in fact, what the etymology boils down to:

> Square-shaped object (tripled) → high-quality goods spread throughout a protective container (compare 舗 and 販) → quality; counter for goods; (person's) character; grade; value.


Yeah, I'm not sure what world you have to live in for "three-mouthed monster" to be a more obvious meaning for that glyph than "stacked crates".

But I think I might like to visit.

I see you haven't gotten to the 4th paragraph of TFA...

"口" = mouth, not crate

"品" is a stack of mouth characters

I read the whole article, thanks. Do you think it's possible that more than one square or rectangular thing may exist?

Out of the several uses of 品, one is the verb "taste" which makes slightly more sense.

Whoa! I know that one -- it is "Shinagawa". I know this because my hotel was at that train station in Tokyo on a trip in the early 90's, long before signage in English was common in Japan.

It's actually just the "shina". The "gawa" is 川, river.

You need a lot of goods to feed many mouths.

Two or three of something is a pretty common pattern in kanji. Two basically implies "several", while three implies "a fuckton". Two suns, for instance, is "bright". Three suns is a sparkling crystal.

You might enjoy (and probably have already seen) "Why Japanese People": https://youtu.be/AqragQq63Js

I liked the beginning and expected the next word after "mouth" to look almost the same, with a subtle change, which would be a logical extension. But it was quite a jump from a simple square (口) to god-knows-what (食).

Also, why is 食 translated as "Eclipse" in Google?

It's not a very good choice to use "食" to represent "eat" here. In modern Chinese we say "吃", and you can see that "吃" has a square part, which is exactly "口". And "食" also means "food", but "吃" doesn't.

Also, since ancient Chinese people thought eclipse was caused by a "dog in the sky" eating the moon, it's reasonable for them to use "食" to describe it. But modern Chinese almost always say "月食", meaning eclipse of the moon, to distinguish from eclipse of the sun, so Google Translate did this wrong. It should translate it to "eat" or "food" first.

Putting the parts together, 吃 would mean the begging mouth. The native Japanese word for eat, kuu, was at least at one time written 喰う which I think we would all agree makes a lot of sense. Both languages underwent different simplifications, I suspect in Chinese it became 吃 and Japanese 食, so in both languages the original logic has been lost.

I'm a westerner who learned Japanese as an adult. I feel it's quite unfortunate how much of the meaning was lost with the Chinese simplification. I can mostly make out Taiwanese/Republic of China newspapers, but can see nothing in the simplified characters.

Edit: Yay, I'm wrong, see below. Thank you internet.

Nah, 喰 is entirely unrelated; it's one of the few characters Japan invented (most are imported and sometimes simplified from Traditional Chinese).

On the other hand, you might recognize 吃 as the 喫 from 喫茶店 (café).

The Chinese simplification was overall a good thing. A lot of the simplifications are from Japanese, even. Like, compare the Traditional 體 with the Simplified 体 (body) - the latter is from Japanese.

The Japanese simplifications were pretty good, but a lot of the Communist ones are aesthetically ruinous. 车东气门 have none of the symmetry that 車東氣門 have. 气 doesn't even have its center of mass over its base of support, although at least in this case it is six strokes less. The Japanese simplifications seem to have kept the artistic flavor.


Those simplifications appear to have been designed purely for reduction of stroke count (that is, making it faster to write by hand), not for simplification in the sense of making it more simple, logical, and consistent.

(As a matter of fact, that "simplification" introduced further inconsistencies, in that certain radicals were written differently when part of a character, while the traditional writing maintained it. Example: 金 gold is the left part of money, which you can see in the traditional 錢, but not in the simplified 钱. Similarly 言 in traditional 說 vs simplified 说.)

Yeah, and the radicals sure got uglified. I calmed down a bit when I found that apparently a lot of the simplifications were just officializing shortcuts people were already taking. Kind of like spelling "with" as "w/", I'm guessing.

Prewett - true. But then, why make it "simple" but ugly for _printing_? It's absurd... just keep the complex form in books and reading printed text, and tolerate what people are writing out by hand in cursive. That's distinct anyway. It's as if we'd "simplify" the "-ing" at the end of words to some wiggle with a dot and a loop in printed matter.

for those who don't know, this is called 略字 (ryakuji) eg 門 > 门

To add some anecdata from native speakers (not me), I've noticed many simplifications like 机車 in "Traditional" Taiwanese handwriting, but I've never seen 机车.

That's completely subjective, most characters are not symmetrical anyway, and when they are it's a mistake to draw them symmetrical.

Characters are mostly displayed on screen nowadays, and most fonts are indeed symmetrical. I don't find Simplified Chinese handwriting ugly at all, but it looks odd on-screen.

喰 is not a Chinese character, Japanese created this character...吃 is not a modern simplification for 喰, it has appeared in ancient written Chinese but has a different meaning. Now in modern Chinese 吃 has the same meaning as ancient 喫(eat). 食 has always been in Chinese characters, composed of 人良, which means "things that hold one's life". 食 serves both as a noun and a verb.

Japanese kanji come from Chinese characters, but have become very different too. As a native Chinese speaker, I think there is a clear link between modern simplified Chinese and 'near ancient' Chinese characters. Here 'near ancient' means characters used up to the 汉(漢, Han) Dynasty. Before Han, Chinese was very different too. There has always been a simplification process and a link.

You can't simply take a character apart and glue the meanings of the parts together - it's more complicated. Having the "乞" part doesn't necessarily mean it has the meaning of begging - it's a "sound" element, rather than a meaning element, which "provides" the sound of "吃".

And although I don't know the "喰" character, I can tell it didn't become Chinese 吃 and Japanese 食. 食 is more "ancient", where "吃" seems only used so widely in modern time.

Simplification of Chinese characters indeed started many arguments, but the "traditional" Chinese used in Taiwan has also developed some "simplified" characters.

Yeah, the Japanese version of hànzì is much more ancient (Song or Tang dynasties, I think) and has changed comparatively little since then. Also, they simplified some characters in a very different way than the Chinese did. (I'm probably wrong but: if I remember correctly, Japanese has changed a lot, but the writing system hasn't, so the sound part of a character isn't always correct or even close. It's a lot like Irish or English in that the language has changed much but the writing hasn't.)

I have lived in Japan since I was a teenager and I had a much better time reading in China (was there for a few weeks as a tourist) than speaking.

It was quite funny because staff at restaurants thought I was some kind of weirdo who could point at exactly what he wanted on the menu but couldn't answer basic questions.

They appear to have simplified them predictably in a way that is not impossible to understand if you have a senior HS-level of kanji knowledge.

It's probably because it's a Japanese website and 食 does make sense to represent 'eat' in Japanese.

True. And it also makes sense in Chinese despite its "ancient" feel. I just didn't realize it's a Japanese website :/

Well, it is called Candy Japan.

When I was younger and took Chinese lessons, there was a push to use more "official" Chinese. We'd be told that in official use "吃" is a verb used by ghosts (which is to say, it's impolite to use 吃), and "食" was to be used by humans. So... eh YMMV.

I personally find that elitist though, but then again, in chinese history there was always a distinction between what the commoner spoke (白话文) and what the intellectual elites wrote (文言文) so I'm not surprised that this mentality has continued on


I'd also like to add that 食物 is food, which translates literally to "eat" and "thing". Put together it means "edible thing", i.e. food. The original meaning of 食 still means, "eat", not "food". It's a modern contraction that "食" means food

According to the Kangxi Dictionary (康熙字典), 吃 is equivalent to 喫, which means `to eat`. Its other meaning was `stuttering`.

The ghosts part was not supported.

http://tool.httpcn.com/Html/KangXi/22/PWCQUYAZMEUYILAZKO.sht... http://tool.httpcn.com/Html/KangXi/22/PWCQRNKOCQUYUYAB.shtml

Agreed. The ghosts bit is apocryphal. Not difficult to see where it comes from though. Mouth + Qi = spiritual nonsense.

I had to look a bit to figure out how you got "qi" from a word pronounced "chi". 契, the right part of the character is pronounced qi4, as is the character 氣, which does have spiritual meanings. 契, however, has no such meaning (according to wiktionary), so I assume this is another one of those Chinese superstitious puns.

Is a preference for 食 a Taiwanese thing? All the mainland Chinese people I've ever heard say 吃.

There was a linguistic shift to prefer 吃. I would say it probably happened in the Cultural Revolution. In Cantonese 食 is pronounced "sek", and is regularly used - "sek fan" as in "eat rice".

Since HK was quite insulated from the Cultural Revolution, and evidence from older texts that use 食 all the time (喫 was not really used IINM), it would not be amiss to say that the development to prefer 吃 is quite new. Hence in my other post I mentioned that it was political agenda that drove selection of preferred words to use.

addendum: I think there is also a nice narrative in the shift to use 吃 - it was more of a "commoner" word, and communism was then about replacing the elite-sounding words with simpler words that are common to everyone.

乞 is most commonly used with 乞丐 (beggar), but the etymology of the word comes from qi (气) according to zhongwen.com

I learned Japanese first, and it bothered me that 食 wasn't "eat" in Chinese. I'm glad to know the history!

Didn't know that ghost part.. Fun to know :)

I'm just suggesting from modern Chinese's perspective, because, after all, we are modern people. 文言文 (uh I don't know its English translation) is fun to read and learn, but it's like Latin since basically nobody writes it anymore.

Probably some stupid made up shit to scare kids into using "proper" chinese (for certain definitions of proper as defined by political agendas).

I do find 文言文 to be quite elegant and terse though.

So a native Chinese might have less knowledge in his language :) thanks for the added part.

Both Cantonese and Hakka use "食" as verb since ancient times, no doubt about it.

The article is about Japanese (as you can see from the part on inflection).

If anyone is interested in this kind of etymology, academic Kenneth Henshall has written 'A Guide To Remembering Japanese Characters' which contains the etymologies for the ~2000 general use Kanji. Over the millennia of evolution of the characters, some characters have multiple disputed etymologies which are still unresolved by the academics who study the history.

For the record, 食 is a pictogram of a small amount of food (the "roof" looking thing) stacked on a heavily stylized pictogram of a kind of table or plate (do an image search for 'takatsuki table'). It's claimed the Japanese word for bean 豆, has an older stylization of the same takatsuki table with a little bit of food at the top.

From a learning to read and memorization perspective, most people will probably find doing Look/Cover/Write/Check type drills (either manually or with a spaced repetition flashcard program like Anki) much more effective than using mnemonics based on (sometimes very complex) etymology.

The etymology of 食 is a bit more complex than that. It's ancient chinese that combined an upside down mouth: 亼 (best approximation) and a bowl of rice. This is the 甲骨文 version: http://imgur.com/NRBoG7F. Cute eh? It looks like someone nomnomnoming a bowl of rice.

EDIT: found another one: http://imgur.com/BsQsNBb

The method of "Tuttle Learning Chinese Characters" ( https://www.amazon.com/Tuttle-Learning-Chinese-Characters-Re... ) is working very well for me for memorization.

It uses mnemonics, but it only loosely follows real etymologies. It diverges into nonsensical but memorable stories when this makes things easier to remember. It also has mnemonics to remember tone and pronunciation (for Mandarin Chinese only though).

That is not wrong, but weird to give as a first definition. The word "食" can mean either food or eclipse. In both cases it is read the same way; so I assume this happened because they did not have a symbol for eclipse so decided to use a homophone.


I thought the original character for eclipse is 蝕, but people got lazy and started using a more common character with the same pronunciation.

Maybe because in the eclipse the moon eats the sun.

...and in the Eclipse, the Oracle eats the 日.

(https://en.wiktionary.org/wiki/%E5%8F%A3#Etymology and https://en.wiktionary.org/wiki/%E6%97%A5#Etymology if you didn't get the admittedly horrible pun.)

The link you provided has no information as to whether that's a correct etymology or not, only that 月食 yuèshí is made up of characters 月 yuè 'moon' and 食 shí 'eat'. There are numerous other explanations for why those two characters could be used: for example, perhaps 食 used to have a different meaning which dropped out of common usage, or maybe 月食 used to be written with different characters and people switched to the easier-to-write 食 from a more complicated character, or any of a number of other reasons.

I don't know which of these is true, if any of them is. But your link doesn't have enough etymological information to indicate one way or another, either!

Thus proving that symbolic meaning applied at the glyph level is too cumbersome to be practical because it requires pure memorization of all possible combinations of glyphs and conjugations thereof.

Thus proving that the alphabet is too cumbersome to be practical because it requires the pure memorization of all possible combinations of letters, with the lengths of such combinations becoming longer and longer as words are added to vocabulary.


It's a "leaky abstraction". Makes things easier, until it doesn't. Much like "I before E, except after C".

The primary problem with people who don't know Japanese or Chinese arguing against kanji/hanzi is that they use very loose comparisons that they think are 1:1.

For example, when you write "inconceivable," you're not regurgitating every single letter in a line from memory. You probably remember the prefix "in," "conceive," and you know "able." You probably also know the common patterns "cei" or "eive" or "con," so the word "inconceivable" really isn't as complex as the initial length makes it look, as long as you know the blocks.

Kanji/hanzi are the same way -- they look complex and inscrutable to the uneducated eye, but they're all made of common building blocks that make it easier to remember them. After all, human memory works roughly the same way all the world around; people wouldn't be able to memorize thousands of 20-stroke characters if they were all completely patternless.

The vocabulary utilizing kanji/hanzi works the same way.

Someone could look at "inconceivable" and say "well shit, that doesn't make sense! It's long, you'd have to memorize so many letters, and the letters themselves have so many bits! Plus it has 'in' in it, which makes no sense because 'in' commonly means 'inside of something', and 'con' usually means 'to swindle someone'! This alphabet thing is completely useless."

It's absurd, reductionist, and a bit offensive.

If chinese characters weren't unusually cumbersome, why then do chinese schoolchildren learn a different alphabet first (pinyin), just to assist them in learning chinese characters?

> people wouldn't be able to memorize thousands of 20-stroke characters if they were all completely patternless.

Well, people don't. 20 strokes is an unusually high stroke count, and people don't remember thousands of those. Simplified chinese characters were created because traditional characters were too complex and cumbersome for people to remember.

> If chinese characters weren't unusually cumbersome, why then do chinese schoolchildren learn a different alphabet first

Also why did Korea and Vietnam abandon them entirely?

Probably because Chinese characters are a poor fit for Korean and Vietnamese grammar/vocabulary. They are a poor fit for Japanese grammar/vocabulary, too, as you can see by the fact that every character has multiple possible sounds depending on which word it is used in. In Chinese, however, the characters very much make sense for the language. Most words are one or two syllables, and characters correspond to both the meanings and the pronunciation. A large number of characters even have a pronunciation hint built in. I, personally, think that Chinese is much easier to read in characters than pinyin, and you certainly won't find any Chinese ever using pinyin for more than a teaching tool. (Especially because nobody except foreigners seems to put tones on the pinyin). The fact that China has kept using them, despite a very pragmatic government that wanted to move the language more phonetic, should say something about their utility.

> The fact that China has kept using them, despite a very pragmatic government that wanted to move the language more phonetic, should say something about their utility.

Or just inertia. Norway has had a steady stream of language reforms over the last century aimed at bringing the official written language better into sync with a majority of spoken dialects. This is a result of hundreds of years of Danish rule that ended in 1814, followed by the period of national-romanticism in the period up to the subsequent break from Sweden, that led to a lot of desire to make language etc. more uniquely Norwegian.

As a simple example, we inherited parts of Danish counting.

It used to be in some parts of the country that we'd say "fire og tyve" for 24 - literally "four and twenty". This was changed to "tjuefire" (twentyfour) in the early 1950's. Anyone who has learned Norwegian in school since then has learned the new form in school and been marked down for using the old forms etc.

Despite that, and being born to parents who were in primary school when this had just changed and who learned the new forms, I still regularly use the old form.

I never learned it at school, and I occasionally had teachers complain about it. I don't use it consistently, to make matters worse - it's not a conscious choice to use a more conservative style or anything, it's just habit I picked up mostly from my dad, which is persisting in my spoken language now, when I'm 41, despite having changed in a language reform a couple of decades before I was born.

This is a difference where there's no practical benefit at all to the old form - it's longer, and the new form is more consistent with spoken Norwegian overall -, yet more than half a century later the old form still persists out of habit.

In particular, trying to engineer changes to language tends to take a long time even when there's no resistance to the change.

> learn a different alphabet first (pinyin), just to assist them in learning chinese characters?

Is that what they use hanyu pinyin for these days? I've always thought of pinyin as a pronunciation guide for Mandarin, similar to furigana in Japanese.

A child/foreigner can point at "inconceivable" and ask:

'what does "in-con-ceiv-able" mean?'

as opposed to 'what does "..." mean'

A japanese beginner will see 照り焼き and say "uhhhh.. ri..."

compare this with seeing テリヤキ

> 'what does "in-con-ceiv-able" mean?'

Really? You don't think someone with less experience in English would say "what does... inkonsayvaybull mean?" There are plenty of instances where the word is not pronounced the way you think it is.

> A japanese beginner will see 照り焼き and say "uhhhh.. ri..."

A child or beginner would probably be more likely to say "uh, what's that thing with the 日 and the 火, it's something ri something ki." Just because something is a symbol doesn't mean you can't describe it. Children are also very likely to just sketch out a picture of what they remember, even if it's incorrect, and you can usually figure that out.

> Really? You don't think someone with less experience in English would say "what does... inkonsayvaybull mean?" There are plenty of instances where the word is not pronounced the way you think it is.

You're missing his point and English is a crappy example because its spelling is an unmitigated disaster. For example, if you can read and vocalize the Greek alphabet, you can just ask someone what "νόστιμο φαγητό"* means because you can vocalize it. You only need basic knowledge of the alphabet there. Whereas with Chinese/Japanese you need to have a good base of characters to be able to potentially vocalize an unknown character, which requires much more work than learning a new alphabet.

(* νόστιμο φαγητό means delicious food)

Similarly in Spanish, which has great consistency, in that if you see a word written, you pretty much know how to pronounce it, and vice versa. English is pretty bad in that department, but still much better than Chinese.

While many Chinese characters have a phonetic component (in addition to a component related to the meaning), it rarely corresponds exactly to the current pronunciation (in Mandarin).

Furthermore, you can very rarely conjure the right character out of pronunciation and some aspect of the meaning.

Alternatively, it could mean that Google Translate is not very good.

If I had to guess, most Japanese people aren't going to have much trouble disambiguating ‘eat’ from ‘eclipse’. (And as zorceta explains, Chinese uses different hanzi for them anyway.)

The original character for eclipse was "蝕". We use 食 now in Japan because of some simplification. (But the symbolism of moon eating sun is there, as you see 蝕 has 食 in it.)

蝕 was still used with this writing in Berserk, though.

Not sure how great an explanation that really is. I like Zompist's explanation: http://www.zompist.com/yingzi/yingzi.htm

That certainly looks more complete. I was mostly wanting to see if I can pull the reader into learning Japanese without realizing that they were doing so.

Your domain was a bit of a hint. ^_^

Thanks for this, it's a great primer.

Aren't we sort of starting to do this with the introduction of emojis? They're a little bit ambiguous, but they do have meaning behind them, nonetheless.

That's why you see services with lots of pictographic functions eclipse pure text-based ones in Asia, because they are very used to that kind of symbolic communication. And why complex emoji using text characters in creative ways were developed in Asia in the first place.

For example, Twitter never really took off in southeast Asia, but Line is incredibly popular. Why? Stickers. Line offers endless little pictures you can use with your messages, while Twitter doesn't.

Now stickers (and emoji) are taking off a lot more in the West, because they are super compact and effective communication symbols. I think we'll see more and more of it.

Sometimes I wonder if all these people that criticize or that think that a Latin alphabet can be adapted seamlessly to all languages have tried to study past a beginner level any logographic language.

I have studied Japanese, and still think that a logographic writing system was a mistake. Consider the time and effort it takes for native speakers to become literate.

I also think that the Latin alphabet could be easily used for Japanese, which does not contain any sounds that do not have an obvious equivalent in English, and even if it did, we could always repurpose a character or sequence of characters for that sound (do we really need a 'c').

Having said that, the Japanese phonetic system writes voiced sounds as a modification of their unvoiced counterparts. why can't we all do that.

The biggest risk of using Latin is that simply sharing an alphabet could cause spelling conventions of other languages to bleed in.

Native speakers seem to do fine. Learning a language while growing up, having the Hiragana as a helper, while all your media is written in Japanese makes everything easier. When they finish school they know enough Japanese to get by. It's obviously different for non-native people.

Also, it's not like you stop learning even after school. For example English has according to the Oxford dictionary 171,476 words in current use excluding inflections, and several technical and regional vocabularies. Do all English university students know these words?

Logographic systems have some major disadvantages:

• It's possible to know how to say a word, but have no clue how to write it. This phenomenon is called character amnesia, and it affects most native speakers.[1] Phonetic languages allow you to write out a misspelled word, which readers can understand (or autocorrect can fix).

• Likewise, it's possible to know what a symbol means, but have no idea how to pronounce it. This is extra-fun in Japanese, where most kanji have multiple pronunciations.

• Looking up words is harder, as there are no "letters" to sort by. Sorting can be done by stroke count, by radical (four corners or SKIP), or by phonetic spelling (in pinyin or hiragana). Modern technology has made this easier, and some phone apps (like Pleco) can even OCR hanzi. Still, it's far less convenient than phonetic languages.

The only aspect in which logographic systems win is information density. You can fit more words on a single page. This is obvious if you've ever seen Chinese or Japanese copies of works that were originally written in English. The Harry Potter books are crazy thin. Also, Chinese and Japanese tweets can express a paragraph of information.

1. https://en.wikipedia.org/wiki/Character_amnesia

> It's possible to know how to say a word, but have no clue how to write it.

> Likewise, it's possible to know what a symbol means, but have no idea how to pronounce it.

As a second language learner of English I can attest that this is not just a problem of languages written in logographic systems:-)

>The only aspect in which logographic systems win is information density.

I vaguely remember a paper that claimed that information density is pretty much constant across languages and writing systems, but I can't find it right now. There is another thread on HN [1] where people compared the size of the "Universal Declaration of Human Rights" in different languages. I think this misses the point because it doesn't account for intra-character information density. It'd be much more interesting to render the text into a bitmap and then compare compressed bitmap sizes.

[1] https://news.ycombinator.com/item?id=8236135
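
A minimal sketch of the kind of size comparison discussed above, in Python. It does not render bitmaps (that would need an imaging library and a font choice); it only shows how the same text gives different "sizes" depending on whether you count characters, encoded bytes, or compressed bytes. The function name `size_report` and the sample phrases are made up for illustration and are not data from the linked thread.

```python
import zlib


def size_report(text: str) -> dict:
    """Return several size measures for one piece of text."""
    utf8 = text.encode("utf-8")
    utf16 = text.encode("utf-16-le")  # little-endian, no byte-order mark
    return {
        "chars": len(text),                      # Unicode code points
        "utf8_bytes": len(utf8),
        "utf16_bytes": len(utf16),
        "zlib_bytes": len(zlib.compress(utf8)),  # includes fixed zlib overhead
    }


if __name__ == "__main__":
    # Illustrative sample phrases only (assumption, not measured corpus data).
    for sample in ("mouth", "口", "delicious food", "おいしい食べ物"):
        print(sample, size_report(sample))
```

For very short strings the zlib figure is dominated by header overhead, so a meaningful comparison would need long parallel texts such as full translations of the same document.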

People like to joke about English spelling, but see farther down-thread for examples of how bad things are in logographic systems. Even native-speaking PhDs can forget how to write words like "sneeze" or "toad". It's a failure mode that simply doesn't exist in phonetic languages (even ones as imperfect as English).

Sorry if it wasn't clear, but by "information density" I meant area on a page or screen, not digital bytes. In the thread you linked to, people correctly point out that digital information density depends on encoding, and that compression schemes matter far more than language.

The paper you're probably thinking of is A Cross-Language Perspective on Speech Information Rate[1][2], which (as the title indicates) studied spoken language, not written. Annoyingly, the study was widely misrepresented in the media. It found that languages with lower information density tended to have higher syllabic rates. That is: Spanish contained less information per syllable than English or Mandarin, but Spanish speakers spoke faster to make up for that. Most media summaries of the paper omitted an important finding: the compensations didn't balance out. Different languages had different information rates. In the study, English had the highest. The runner-up (French) was 10% slower. And Japanese was 30% slower at conveying information.

1. http://ohll.ish-lyon.cnrs.fr/fulltext/pellegrino/Pellegrino_...

2. This blog post has a more accessible summarization of the data: https://www.tofugu.com/japanese/why-do-japanese-people-talk-...

>Phonetic languages allow you to write out a misspelled word, which readers can understand (or autocorrect can fix).

You can certainly write things out in kana. When I was more serious about studying Japanese, I knew less than 1000 kanji, but had a vocabulary several times that size, and would at times write out the word I meant in hiragana. And if we're counting autocorrect, your IME is going to take that hiragana and let you find the character.

>• Looking up words is harder, as there are no "letters" to sort by. Sorting can be done by stroke count, by radical (four corners or SKIP), or by phonetic spelling (in pinyin or hiragana). Modern technology has made this easier, and some phone apps (like Pleco) can even OCR hanzi. Still, it's far less convenient than phonetic languages.

Eh, I disagree here. It's harder if you're used to looking things up by the spelling, but once you're fast at looking things up by radical, it's not that difficult. My misguided attempts at slogging through 1Q84 while reading at a, at best, middle school level got me pretty fast at looking up kanji. Not any appreciable difference vs. looking things up in a regular dictionary.

You cannot write things out in Kana in Chinese. As such, GP's point against logographic writing systems stands, notwithstanding mixed writing systems such as Japanese.

Even without autocorrect, you can write a word in English such that most people would understand. Of course, in a logographic system you'd just write a homophone (which is what people actually do, write a simpler word pronounced the same).

As for looking up, it is in principle easier though. You only need to learn the order of about 26 things, not about 200, and can then run iterative binary search over it, and don't have to switch to stroke count. It is possible, of course.
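
A rough sketch of the binary-search lookup described above, using Python's standard `bisect` module. The word list and the helper name `lookup` are hypothetical, and plain code-point ordering of lowercase ASCII words stands in for a real dictionary's collation rules.

```python
from bisect import bisect_left


def lookup(sorted_words: list[str], word: str):
    """Return the index of word in sorted_words, or None if it is absent.

    bisect_left halves the search range at every step, which is the same
    idea as repeatedly opening a paper dictionary near the middle.
    """
    i = bisect_left(sorted_words, word)
    if i < len(sorted_words) and sorted_words[i] == word:
        return i
    return None


if __name__ == "__main__":
    # Hypothetical mini-dictionary; it must be sorted before searching.
    words = sorted(["toad", "sneeze", "mouth", "food", "eclipse", "conceive", "able"])
    print(lookup(words, "sneeze"))
    print(lookup(words, "ghost"))
```

The same search works for any total ordering, including radical-then-stroke-count order; the practical difference raised in the thread is how many ordered symbols a person has to memorize before the ordering becomes usable.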

Some upper and lower case letters have no clear resemblance, see Aa Rr Gg Nn, so one has to learn 52 symbols. Add another 52 symbols for script, if you have to. Then in the case of English learn how to pronounce or spell words, because in some cases there are no rules (why ocean and not oshean? Because of derivation from Greek, still...)

Anyway, any alphabet is better than Chinese characters.

> • It's possible to know how to say a word, but have no clue how to write it. This phenomenon is called character amnesia, and it affects most native speakers.[1] Phonetic languages allow you to write out a misspelled word, which readers can understand (or autocorrect can fix).

> • Likewise, it's possible to know what a symbol means, but have no idea how to pronounce it. This is extra-fun in Japanese, where most kanji have multiple pronunciations.

I don't think English is much better in these cases. In fact, the writing can be so divorced from speech that spelling bees are a thing.

I've had Chinese colleagues who, when asked to write a word they'd just used in a sentence, were simply unable to. At first I thought they were playing a joke on me. But nope, they'd just forgotten the appropriate hanzi, and they couldn't even hazard a guess. It's a totally different failure mode than imperfectly-phonetic languages like English.

From Why Chinese Is So Damn Hard[0]:

> I was once at a luncheon with three Ph.D. students in the Chinese Department at Peking University, all native Chinese (one from Hong Kong). I happened to have a cold that day, and was trying to write a brief note to a friend canceling an appointment that day. I found that I couldn't remember how to write the character 嚔, as in da penti 打喷嚔 "to sneeze". I asked my three friends how to write the character, and to my surprise, all three of them simply shrugged in sheepish embarrassment. Not one of them could correctly produce the character. Now, Peking University is usually considered the "Harvard of China". Can you imagine three Ph.D. students in English at Harvard forgetting how to write the English word "sneeze"?? Yet this state of affairs is by no means uncommon in China. English is simply orders of magnitude easier to write and remember. No matter how low-frequency the word is, or how unorthodox the spelling, the English speaker can always come up with something, simply because there has to be some correspondence between sound and spelling.

0: http://www.pinyin.info/readings/texts/moser.html

To be fair, you can also "come up with something" in Chinese. Since there aren't all that many sounds, you can write in generic characters for the sound of the word that you can't remember.

Yep. The analogy I use is, it's a bit like if someone walked up and asked you to draw the logo of this or that company. Even if you've seen the logo a million times, you might not be able to summon up a mental picture of it, or you might remember the rough shape but have no idea how many lines go where.

I've never heard this term "Character Amnesia" but it's an analogue to my situation.

I can read and write (via pinyin) a large number of characters, but cannot recollect their shape in abstraction.

I think that's just because as a foreigner learning chinese in the modern world I've never had to learn this skill.

The difference between Recollection and Recognition.

Same here - and strangely enough, it's rarely a problem. Faking characters by using the correct radical and a random homophone base character works okay in a pinch.

But because I never write characters by hand, I have a really hard time reading handwritten notes, and that is a problem.

> or autocorrect can fix

If you're bringing computers into it, isn't text entry in Japanese usually done phonetically anyway?

> For example English has according to the Oxford dictionary 171,476 words in current use excluding inflections, and several technical and regional vocabularies.

Here is a website which questions you with some random sample of words from an English dictionary, mixed with randomly generated non-words. Then it estimates the percentage of English words you know.


I am a non-native speaker, and I have scored in the 77% to 89% range, when doing this test several times.

I'm curious: did you only answer yes to the words whose meanings you knew, or to anything that you knew was indeed a word? There were some that were pretty obviously words, but I wasn't certain the exact meaning (although I could guess), so I answered no. Ended up with 77% (as a native speaker). Apparently average for native speakers is 67%, so 77-89 as a non-native speaker sounds really good.

I just did it, and I answered yes to words I knew, or knew that were actual words but I didn't know the exact meaning of. Like Argon, I know it is something related to chemistry but I don't actually know what it is. Some words were compound words which I am not sure would be in a dictionary, but still valid words.

I got 73% and I didn't say 'yes' to any fake words.

73% is apparently "This is a high level for a native speaker."

> I also think that the Latin alphabet could be easily used for Japanese

Writing Japanese entirely in Latin characters would be no different from writing it entirely in hiragana. Have you ever tried reading that way?

Kind of. In my first semester of japanese we worked in hiragana+spaces.

Having read English language papers on Japanese linguistics, I can also say that reading the Latin is easy too.

Sure, I didn't mean to suggest it can't be done in short spurts. But reading a novel that way would be hellish.

The larger point being, Japanese isn't locked into using a logographic system - it already has two phonetic syllabaries that people could start using exclusively if there was some advantage to doing so.

That sounds like an absolutely miserable experience. I'd rather be forced to look up every 3rd or 4th kanji than try to deal with all hiragana writing.

> I also do not think that the Latin alphabet could be easily used for Japanese, [...]

You stuck an extra “do not” in your sentence

* * *

As far as alphabets go, the Phoenician/Greek/Etruscan/Latin alphabet is pretty ad hoc and mediocre. But hey, it’s what we know. At this point, I think we’re stuck with it.

Similar story for modern Hindu/Arabic/European numeral glyphs. Learning arithmetic would be noticeably simpler if the glyphs expressed some of the symmetries of the number system. Alas.

Removed the "do not"

As far as the alphabet itself goes, I do not think that Latin is that bad. All symbols have a canonical sound associated with them. The problem is that our usage of the alphabet is horribly inconsistent. This is partially due to the fact that English has sounds that cannot be expressed using the "pure" alphabet. Arguably Japanese has this same problem in their system, with the ゃ、ょ、ゅ modifiers. But at least they distinguish those from や、よ、ゆ by size, and are disciplined about their usage, so we can consider the set of compounds to be their own characters and not have a mess.

Of course you still have the ず/づ issue, and the pronunciation of は and を as わ and お in their most common usage. But, even in modern Japanese, these oddities are not universal.

Out of curiosity, are you aware of any numeral system that beats Arabic? By pre-Arabic European standards, Arabic numerals are a masterpiece of symmetry.

Here’s my proposal for base twelve numerals, http://i.imgur.com/UobIObq.jpg ; multiplication mod twelve, http://i.imgur.com/dRielBv.jpg
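
For readers who want numbers rather than an image, here is a minimal sketch of a multiplication table reduced mod twelve, i.e. the last base-twelve digit of each product. It says nothing about the glyph design in the links; the function name `mod_table` is made up for illustration.

```python
def mod_table(base: int = 12) -> list[list[int]]:
    """Return table[a][b] == (a * b) % base for all digits a, b below base."""
    return [[(a * b) % base for b in range(base)] for a in range(base)]


if __name__ == "__main__":
    for row in mod_table():
        # Right-align each entry so the repeating patterns line up in columns.
        print(" ".join(f"{value:2d}" for value in row))
```

Rows for divisors of twelve (2, 3, 4, 6) cycle through only a few values, while rows for digits coprime to twelve (1, 5, 7, 11) hit every digit exactly once.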

It can also be nice to use a “balanced base”, with digits for negative numbers, e.g. in a base ten context you’d have digits for –4 to 5 (or if you’re willing to have multiple expressions for the same number, –5 to 5).

A balanced base twelve multiplication table might look like this: http://i.imgur.com/quEcxH0.png
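
A small sketch of the "balanced base" idea, assuming an even base of at least four with digits running from -(base/2 - 1) up to base/2, which gives -4 to 5 for base ten and -5 to 6 for base twelve. The function names `to_balanced` and `from_balanced` are hypothetical.

```python
def to_balanced(n: int, base: int = 10) -> list[int]:
    """Convert n to balanced digits, most significant digit first."""
    if base < 4 or base % 2 != 0:
        raise ValueError("this sketch only handles even bases >= 4")
    if n == 0:
        return [0]
    half = base // 2
    digits = []
    while n != 0:
        r = n % base          # Python's % always yields 0 .. base-1 here
        if r > half:          # fold the upper digits down to negative ones
            r -= base
        digits.append(r)
        n = (n - r) // base   # exact division, since n - r is a multiple of base
    return digits[::-1]


def from_balanced(digits: list[int], base: int = 10) -> int:
    """Inverse of to_balanced: evaluate the digits positionally."""
    value = 0
    for d in digits:
        value = value * base + d
    return value


if __name__ == "__main__":
    for n in (6, 27, -6, 7):
        print(n, to_balanced(n), to_balanced(n, base=12))
```

Negative numbers need no separate sign in this scheme: negating a number is not simply a digit-by-digit sign flip here, because the digit range is asymmetric (5 has no -5 partner in base ten), which is why the comment mentions the alternative of allowing -5 to 5 at the cost of multiple spellings for some numbers.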

> As far as alphabets go, the Phoenician/Greek/Etruscan/Latin alphabet is pretty ad hoc and mediocre. But hey, it’s what we know. At this point, I think we’re stuck with it.

You mix the whole development line of that Latin alphabet into one dismissive argument. I see lots of difference between the Phoenician and the Latin alphabet and FWIW, the Latin alphabet is quite versatile as its wide application shows.

I wonder what you consider mediocre about them?

> Similar story for modern Hindu/Arabic/European numeral glyphs. Learning arithmetic would be noticeably simpler if the glyphs expressed some of the symmetries of the number system. Alas.

I don't think learning arithmetic would be much simpler with other numerals. Even the Romans could do it and they had one of the worst possible numerical systems.

I find our numerals quite fine. My daughter was recognizing numbers before she turned 2. There is some mnemonic to the first four (1 line, 2 corners on the left, 3 corners on the left, 4 corners overall) and most are quite distinct from our Latin letters. 6 and 9 are annoyingly symmetrical of each other, though.

Writing a less dismissive / more serious argument about the Latin alphabet would take a few hundred pages. You’re right though, I’m not a speaker of (or expert in) ancient Phoenician, perhaps their alphabet was a bit better structured for that language (it looks pretty ad hoc though). I can primarily speak to the Latin alphabet’s irregularity and mediocrity for representing modern English/Spanish/etc., though it doesn’t seem to have been much better for Greek or Latin. Obviously it works well enough to be the practical anchor for written culture, and I can certainly imagine worse systems (little Egyptian-style pictographs for letters for example). But it’s hardly elegant or systematic. The ordering of the letters is also pretty much arbitrary, and has nothing to do with the separation between consonants and vowels, or the relationship between particular sounds.

For an example of a better designed alphabet, check out Korean Hangul.

* * *

The numerals 1, 2, 3 come from just writing strokes, like tally marks, which over time became connected in handwriting. The other numbers were mostly fairly arbitrary symbols, which morphed slowly over time with occasional replacements and swaps. Otherwise, the symbols have absolutely nothing to do with the numbers they represent or with the base ten number system. Overall, I’d say numbers 0 and 1 are pretty effective. The rest are a huge waste of potential.

Same story for the words/names used to represent the numbers. They are made of arbitrary sounds in arbitrary numbers of syllables, reveal nothing about the theoretical properties of the numbers, some of them are hard to say or easy to mistake, etc. Especially for numbers beyond ten, the names are irregular and confusing. This has a real practical impact. Counting is notably easier for Chinese speaking children than for English speakers.

> I don't think learning arithmetic would be much simpler with other numerals. Even the Romans could do it and they had one of the worst possible numerical systems.

In general, Romans did their arithmetic using little pebbles (“calculus”) on a counting board (“abacus”), and used written symbols only for recording the output of their calculations. This made some types of computation very difficult (because using pebbles to record every step gets cumbersome), which helps explain why science has taken off in the past 500 years in Europe after we started developing better notational conventions and using Hindu–Arabic numerals and later decimal fractions, logarithms, etc.

My son is about 2 weeks old, so I can’t tell you yet how well he learns arithmetic using a different set of numerals. Ask me again in about 10 years.

We should switch to Fëanorean script. It's almost IPA without the notational horrors.

> the Japanese phonetic system writes voiced sounds as a modification of their unvoiced counterparts. why can't we all do that.

Fun fact, we do do that in English, at least for C and G. (G was introduced as a modified C to indicate the voicing).

by that measure we should forget about historical languages and learn something constructed like esperanto.

languages are not solely a means of communication but a part of a people's cultural identity. I think the greater dependence on contextual cues and ambiguity in Chinese/Japanese lends itself much better for linguistic art forms like poetry and literature.

I think the debate is more Logographic vs. Alphabet, rather than Logographic vs. the Latin Alphabet.

There are pros and cons. A big con with alphabets is that words lose their meaning over time. I find reading Old English (1500 years old) to be less comprehensible than "modern" Latin, despite being a native English speaker and knowing only a little Latin.

I find reading even Early Modern English (400 years old) an effort initially before I get reacquainted with it (Shakespeare).

In 300 years time I hate to think what English speakers will think of our texts.

That said, if I had to choose another language to learn, it would be one with an Alphabet, which seems far easier to me to learn, and type, than memorizing 1000s of symbols.

If you replace kanji with katakana and keep hiragana for particles and conjugation, you can call it a day.

Easy to learn, no more trying to guess if it's on/kunyomi, immune to mispronunciation from using a foreign alphabet, the list goes on.

Speaking as someone who started as an adult and is now fluent, I think this would make Japanese much, much harder to learn.

Or rather, the first 2-3 months would be ten times easier and everything after would be ten times harder.


What is the advantage of using a different symbol for each word, that offsets the huge disadvantages of having to learn and remember a different symbol for each word?

Especially considering that the spoken language already distinguishes between all possible words through pronunciation (and context in the case of homophones.)

It's hard to explain. In English, spelling, pronunciation, and meaning are all more or less interrelated, right? In Japanese, writing (kanji) correlates to pronunciation and to meaning, but pronunciation and meaning are mostly unrelated to each other. Kanji is what disambiguates them.

So, obviously learning 1000 kanji isn't easy. But doing that is what makes it possible to learn 100,000+ words whose pronunciations and meanings would be otherwise largely unrelated.

It's quite similar to the role that Latin/Greek roots play in English. When you see a word that includes "-graph-" you know it probably involves writing, and similarly when a student of Japanese sees a word with "間 (kan)" they know it involves an interval or space. Throw away the kanji, and your student now just sees "kan" - which means the word will probably involve an interval -- or a barrier, or emotion, or appearance, or a tube, or a building, a warship, a crown, an ending, China, a publication, a government ministry, or.. you get the idea.

A lot of people think that, and personally, as someone fluent in Japanese (as a second, well, rather something like fourth, language), I also sort of feel the same way. However, if you look at it without the learned biases, there is a great example of a country with a fairly similar language in terms of grammar and sounds that used to use Chinese characters, switched to a phonetic alphabet, and is not noticeably worse off for it: Korea.

This has been brought up and replied to lots of places elsewhere on the page.

There are way too many homophones and you don't always have the luxury of context. Learning a symbol for each root (not word!) is not that bad; English spelling is almost as bad, actually.

Spoken language is quite limited compared to written Japanese.

Do Japanese audiobooks exist?

Assuming yes, do their users have significant problems understanding the written text when pronounced in an audiobook? Are there well-known conventions or shortcuts or explanations that audiobook readers insert into their speech to signal the correct meaning of the word?

Do Japanese audiobooks provide evidence for or against the idea that doing away with kanji in writing would not harm understanding significantly?

Fiction audiobooks do exist (although they are not nearly as common as in English-speaking countries), but audiobooks can't possibly work with non-fiction and especially technical texts unless you are going to use English words for literally every single term. I mean, Japanese has only about 100 morae and way too many words are just 2-3 morae long.

We should all speak Hawaiian.

Another interesting side-effect that the compactness of Chinese symbols (other variations: Hanja in Korean, Kanji in Japanese and so forth) allowed was a higher chance of survival against natural disasters like wild fires or crimes like thefts or vandalism.

It was/is far easier to ensure redundancy of scripts and books since the costs of reprinting/copying was far lower compared to other forms of phonetic systems.

The compactness explains how so many archaic, buddhist scripts could survive to this day.

A counterpoint: When a message is written with an alphabet, it's not as compact, but its meaning can be guessed at even if significant portions of the message are missing (known as lacunae). See, e.g., the TV game show Wheel of Fortune; another example is the Dead Sea Scrolls and other ancient manuscripts that have deteriorated over time.

That may be true. And perhaps the same argument could be made for Chinese characters?

Could you elaborate on why the alphabetic system is intrinsically more efficient than Chinese characters in terms of recovering messages from partial loss of texts?

Spatial dispersal of the glyphs means that fewer glyphs would be taken out by any given insect-gnawed hole, UV-radiation fading, hurled paint glob, etc., and thus less of the overall message would be lost to that single incident of damage.

It's the same principle as how soldiers are trained to spread out when in battle: If they bunch up, it increases the risk that a single mortar shell (or artillery round or machine-gun burst) could take out a lot of troops.

Oh, now I see. Thank you for explaining your point succinctly.

Though I do not have data to back up my argument, I still reckon the Chinese glyphs/scriptures would have had a better chance of survival.

While I think your point is valid, its disadvantages outweigh the advantage, at least since paper/papyrus was invented.

Being spread out to double in length (double being an arbitrary multiplier) would still be inferior to being dispersed to two physical locations (redundancy). I think this is where don't put all your eggs in one basket holds true.

Plus, important docs must have been actively maintained by hired librarians(?). With human maintenance involved, less in volume could have been an advantage for it is easier to move around and maintain the docs. Ofc, when left out in the wild, it is a different story.

Personally I do not like the Chinese character system, as it has so high a barrier to entry for learners. I love alphabets, Korean Hangeul, or Japanese Hira/Katakana for this matter. Have you tried learning any of those? :-)

Anybody care to explain why "costs of reprinting/copying was far lower compared to other forms of phonetic systems"? I know nothing about printing technology, but I'm Chinese.

Far more symbols are required to represent a word with alphabetical systems. I used 65 symbols and 11 spaces to write that sentence, whereas with ideograms I'd need about 13 symbols.

OIC. To express the same meaning, Chinese needs far less physical space than English. And you know what? Classical Chinese takes the compactness to the next level ;D

I'll try give an example: English: The quick brown fox jumps over the lazy dog

Chinese: 敏捷的棕毛狐狸从懶狗身上跃过

Classical Chinese: 棕色敏狐跃懶犬 Note: This is composed by me, maybe not very well-written, and maybe it can be even more compact, but you see what I mean ;)
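For anyone who wants to check the compactness claim, here is a quick Python sketch that counts the three fox sentences above. The `measure` helper is just a name made up for this illustration, and "symbols" here means code points other than spaces.

```python
# The three sentences from the comment above, copied as-is.
samples = {
    "English": "The quick brown fox jumps over the lazy dog",
    "Chinese": "敏捷的棕毛狐狸从懶狗身上跃过",
    "Classical Chinese": "棕色敏狐跃懶犬",
}

def measure(text):
    """Return (symbols excluding spaces, spaces, size in UTF-8 bytes)."""
    spaces = text.count(" ")
    return len(text) - spaces, spaces, len(text.encode("utf-8"))

for name, text in samples.items():
    symbols, spaces, size = measure(text)
    print(f"{name}: {symbols} symbols, {spaces} spaces, {size} bytes in UTF-8")
```

On the page the Chinese versions are far shorter (14 and 7 symbols against 35 letters plus 8 spaces), but note that in UTF-8 each of these characters takes 3 bytes, so the modern Chinese sentence is about the same size as the English one when stored.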

入 means "enter", 口 means "mouth". 入口 means... "entrance". Actually for most kanji there is no single meaning. Some meanings might even have nothing in common with each other, because they've been based on ancient Chinese wordplay or something.

But 口 also means "loophole", so 入口 meaning "entrance" is perfectly logical - "The loophole that allows you to enter another building/structure".

Fun coincidence:

A vomitorium (any modern person associates that with vomiting, i.e., stuff coming out of your mouth) was the name for entrances in Roman amphitheatres.

Entering the mouth of the building?

I think the character does double service in that it has meanings which are very much mouth-related - 薄口 (thin-mouthed - like weak taste) 後口 (after-mouth - after taste); but it also has a ton of meanings which are like opening / spout / hole / crater.


It's not that a land's-mouth is a crater, more like, mouth and opening are more synonymous in feeling in Japanese.

Almost certainly. One fascinating aspect of language is that many metaphors that are baked into language appear in many languages. E.g. In English we can form the future tense with modal verbs, "I will..." and "I am going to..." and in Chinese there are similar modal verbs "我要..." and "我去...". In both languages the idea of intention, or motion, are used as a metaphor in forming the future tense. Or 加油, an expression of encouragement similar to "put your foot on it" which has no equivalent in English, but does in Danish, "giv det gas".

"Put your foot on it" means the accelerator / gas pedal, that seems very much equivalent.

I mean in English it isn't used as a generic encouragement, while in Mandarin and Danish it is.

put the pedal to the metal

(idiomatic) To exert maximum effort.


Yes, that is an equivalent phrase, but much less commonly used than 加油 and "gi' det gas".

"Since we already have symbols for all the sounds we can pronounce"... Not with 26 letters we don't. Other languages have other sounds that English can only try to emulate, and even English has sounds that require multiple letters.

Here are the 44 sounds that English generally uses: http://www.antimoon.com/how/pronunc-soundsipa.htm

It does not include sounds which are borrowed from other languages, like the Hebrew 'ch' which happens at the top of your throat, or the Spanish trilled r, or the glottal stop which actually occurs in spoken English all the time in some dialects.

If 口 substitutes for mouth -- the adjective form of "mouth" is "oral", an etymologically (and audibly!) distinct word. Should that use 口 too?

If king is 王, kingly is 王ly, and royal is 王al, what is regal?

If mouth is 口 and mouthed is 口ed, why would ate be 食t rather than 食ed?

Japan misinterpreted the Chinese writing system (already terrible) into easily the worst writing system known to mankind. It won't look cute when you go beyond two symbols.

>If 口 substitutes for mouth -- the adjective form of "mouth" is "oral", an etymologically (and audibly!) distinct word. Should that use 口 too?

Yes. And that's exactly what they do in Japan. Rather uncommonly among languages, English is a fusion of Romance and Germanic stems. Often you'll see two phonographic bases for the same concept. The easy ones are food: Pork/Swine, Cow/Beef, but also Oral/Mouth, etc.

Japanese is similar: many words have both a native Japanese stem and an imported Chinese sound. The imported Chinese character was assigned to the native Japanese word, even though it sounds totally different. I imagine it makes Japanese quite frustrating to learn as a Chinese speaker, but Chinese is really easy to learn for a Japanese speaker.

Take for example 食 (eat, from the article)


can be read: i, ji, jiki, shoku (which derive from chinese, imported twice during two dynastic periods, corresponding to chinese ji/yi/zi)

but also deriving from native japanese words: ku-, ha(mu) ta(beru) o(su), uka, uke, ke, shi, (last five are very rare).

All of these uses mean variously "to eat". The pronunciations are entirely contextual.

> Yes. And that's exactly what they do in Japan.

I'm aware that that's what they do in Japan. I'm pointing out that it's idiotic.

That's because you're thinking in English. "Kingly" is a very English way of thinking of things (as in king-like behaviour for example). In Chinese and Japanese, there simply isn't a way to describe it as elegantly as in English

In Chinese it'd be 王道 (the tao of the King if translated directly to English), which has a completely different connotation (it's more holistic in concept (i.e. it has an overloaded meaning) when compared to "kingly" in English, which has a more singular-use meaning).

"Royal" itself is an overloaded adjective in English. AFAIK there are no adjectives in the east asian language that has the same semantic meaning as "royal" - The Japanese version would simply be 王の, which translate to "belonging to the king", while the chinese version would be 王室的 or 王室之(belonging to the king's office(I guess you could say crown)[0]). Ditto with "regal". There simply isn't any proper translation for the adjectives in English.

Other than that, in Chinese, there is the concept of a radical, which can be combined to inform the readers about the context it's used in. In Japanese, as bemmu wrote, it'd be additional kanas to inform of context.

[0]Fun fact: 之 and の used to denote the same things up to about 300-ish years ago I believe (timeline could be wrong). In Japanese 之 may be pronounced the same as の. Either way, the Chinese and Japanese words are very much the same, barring some minor kanji differences. Grammarwise, however, it's a completely different language

>"Kingly" is a very English way of thinking of things.

I'm not sure about that. Japanese has adjectival nouns (commonly referred to as na-adjectives), and the 的 suffix.

Additionally, as you identify, the の particle also serves this function; but you give it a much more restrictive role than it actually has (as is typical in English-language Japanese learning material). In general, の marks the genitive case, which simply means that the first noun modifies the following noun in some way. It is often used to show possession, but can also be used in a way close to ~ly in English.

> That's because you're thinking in English.

No, it's because I thought making the point in English made more sense in an English-speaking forum than making the point in Chinese. I'd go with 像国王一样(的) "like a king" or 适合国王(的) "fit for a king" for the English senses of "kingly".

The point is that, as bad as the Chinese writing system is, it's still fundamentally a writing system. 女, 娘, and 妮 may all variously mean "girl" (actually, 女 is an adjective), but they are written differently because they are different words (or, as the case may be, stems). Conversely, it makes sense to talk about the pronunciation of a Chinese character. Japan somehow overlooked this principle when trying to adopt writing, and the Japanese system is a total mess. Japanese 漢字 can only be read in context; in isolation, they represent a grab bag of some unrelated words with shared semantics along with assorted nonsense syllables.

> Other than that, in Chinese, there is the concept of a radical, which can be combined to inform the readers about the context it's used in. In Japanese, as bemmu wrote, it'd be additional kanas to inform of context.

This appears to be... nonsense? The concept of a character radical is not restricted to Chinese. It refers to a part of the character that gives you a hint about the overall meaning. For example, the radical of 冷 "cold" is the 冫 on the left, which means "ice". The radical of 切 "cut" is the 刀 "knife" on the right. And the radical of 漢 "the Han race" is the 氵, which means "water" (they're not all helpful). They are an inherent part of the character and are totally independent of any context. And the kanas you describe as "inform[ing] of context" in the OP do no such thing; they encode grammatical suffixes which have no characters of their own. Chinese uses (wait for it...) Chinese characters for the same purpose; Chinese grammatical markers, unlike Japanese ones, do have dedicated characters. Radicals are a completely unrelated phenomenon.

>already terrible Don't quite see how Chinese is so terrible. It's a remarkably efficient mode of writing for a language filled with monosyllabic words and homophones.

Japanese's appropriation of hanzi is largely a historical accident due to geographical circumstance, but most people learned in the language would agree it's far more efficient than simply using hiragana or katakana or even romaji; disambiguation by pictographs (though in modern times they are more accurately phono-semantic compounds) is of great value in written language where space is at a premium.

> Don't quite see how Chinese is so terrible. It's a remarkably efficient mode of writing for a language filled with monosyllabic words and homophones.

It is the opposite. A writing system that requires multiple years to learn is not "efficient".

It's more efficient if you have a lot of words with the same pronunciation.

In what way?

I think you're pointing out the idiosyncrasies of English here.

"Mouth" is both a noun and a verb. The past tense of "eat" is "ate", not "eated".

No wonder English is so difficult to learn.

The only problem is that you learn all 2000 at the very minimum and more than that if you actually want to do something practical.

Each one has more than 1 reading, a particular stroke order, and many other things.

Pretty sure you need to learn more than 2000 English words at a minimum too. The only difference is that instead of learning the 1-dimensional arrangement of letters, you learn the 2-dimensional arrangement of radicals.

> The only difference is that instead of learning the 1-dimensional arrangement of letters, you learn the 2-dimensional arrangement of radicals.

Except that usually, almost nothing about the arrangement of strokes can be inferred from the sound or meaning of the word, and vice versa.


No, you also have to learn the sounds of the words so that you can speak them, and then most of the characters have multiple sounds used in different places, and they place multiple characters together for a lot of words.

It's a lot more complex than just using letters, which is something they can also do.

I just discovered NativLang on YouTube so this really is in my zone of interest today.

I've just spent the last few hours learning all about languages, how they developed, and each culture's spin on adding as much meaning as efficiently as possible to written symbols. I've always loved languages, so this was more of a brush-up plus learning.

It seems, and rightly so, that ambiguity is death to any character, and that efficiency is also fundamental to the character.

I'm not Korean, but I like their style. Literally: I like how their writing is so efficient in relating characters to mouth position. It was created because Chinese characters didn't suit the Korean language. Japan also streamlined Chinese characters to better suit their culture.

Mayan is another wild language full of meaning in such compact symbols. I had a hard time following their characters.

Everything old is new again.

I have no , but I must ...

Edit: Nevermind, HN swallowed the Emojis.

Actually, without the emojis, it's even more intriguing!

This has the added advantage of being recognized instantly by both native Japanese and Chinese speakers, as long as we keep it to kanji.

I recognized what the author is doing from the start as a Chinese speaker.

How many symbols are there? And how would the keyboard look like though?

A few thousand, with most adults knowing ~1000-2000, but you can get pretty far just knowing the 250 most common.

The keyboards look about the same. The trick is there is also a phonetic alphabet that you can use to compose the ideographic characters. Basically Japanese input methods work kind of like autocompletion in an IDE. You spell the word you want using phonetic characters and a little popup lets you pick the ideographic transliteration when you hit the space bar.

Here's a typical Japanese keyboard https://s7.postimg.io/i5fg1c1kr/SKB_KG3_BK_FM.png There are a few different kinds with slightly different ways of working but they're more or less the same. A lot of Japanese just use a standard American layout and spell the phonetic characters using Romaji (Japanese phonetic characters transliterated to Latin characters).

So then typing Japanese on a keyboard still takes several key-presses for each word? How does writing speed compare between English and Japanese on a keyboard?

Furthermore, how was the placement of the characters decided? Are they more closely related to QWERTY (so that typewriters don't jam) or to Dvorak (so that the most frequently used letters are on the home row, and so that alteration between left and right hand is maximized), or unlike either? I use Dvorak and if I were to learn Japanese and type it on a keyboard, I'd want the typing experience to be similar to how it feels to type on Dvorak for English compared to QWERTY.

Yeah, it takes several keypresses per word. Speed is pretty comparable to English as the process is very similar to spelling English words.

To explain how it works in detail, it's important to first note that Japanese doesn't really use spaces between words. If you're used to English / most Western languages, this probably sounds like it'd be hard to read, but actually it's not. You basically compose one word at a time and it draws an underline under the word you are currently typing, then when you press the space bar it autocompletes to the correct ideographic spelling. Pressing space again lets you cycle through different possible transliterations (including leaving it spelled in phonetic characters); it almost always gets it right except for homophones (which there are a lot of in Japanese) or if you want to deliberately pick some unusual/archaic spelling for stylistic reasons. Then you proceed to composing the next word. Sometimes dedicated Japanese keyboards have a separate button for the "autocomplete" function, but most use the space bar AFAIK. Wikipedia has a description with a demo image of the Windows IME (they all work pretty similarly AFAIK) https://en.wikipedia.org/wiki/Japanese_input_methods . Cell phone input, described on that page, is where things get interesting / deviate more, if you're interested.

So there's a small amount of extra overhead with selecting the correct transliteration, but it's minute once you get used to it. Japanese is, I'd say, slightly more information-dense than English, so it compensates and the typing speed is about the same. I'm not actually sure how the character layout was chosen for the Japanese keyboards. Looking at it roughly, I'd guess that the most frequently used kana sit about where QWERTY puts English letters of similar frequency, as that looks about right to my eyes, but that's just a guess.

You can use Dvorak or whatever you want to, though. You don't need a specialized Japanese keyboard. Instead of typing the kana (phonetic characters) directly you just type their Romanized form. So instead of typing たべます (phonetic spelling of "I eat") you'd type "tabemasu", but it'd otherwise behave as I described. I know a lot of Japanese people who don't even bother using proper Japanese keyboards and just use standard English keyboards, especially programmers. You'll have to fiddle with some settings, but I know for sure that it can at least be done on Linux and Mac and I'm sure Windows can do it too.
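If it helps to see the flow, here is a toy Python sketch of the two steps described above: romaji is turned into kana as you type, and the space bar then cycles through candidate spellings. This is only an illustration, not how any real IME is implemented. The syllable table covers just this demo, and the `CANDIDATES` dictionary and both function names are made up for the example.

```python
# Toy romaji-to-hiragana table: only the syllables needed for this demo.
ROMAJI_TO_KANA = {"ta": "た", "be": "べ", "ma": "ま", "su": "す", "ru": "る"}

# Hypothetical conversion dictionary: kana reading -> spellings, best guess first.
# The last candidate leaves the word in phonetic characters.
CANDIDATES = {
    "たべます": ["食べます", "たべます"],
    "たべる": ["食べる", "たべる"],
}

def romaji_to_kana(text):
    """Convert romaji to hiragana by greedy longest match against the table."""
    out = []
    i = 0
    while i < len(text):
        for size in (3, 2, 1):
            chunk = text[i:i + size]
            if chunk in ROMAJI_TO_KANA:
                out.append(ROMAJI_TO_KANA[chunk])
                i += size
                break
        else:
            raise ValueError(f"no kana for {text[i:]!r}")
    return "".join(out)

def convert(kana, space_presses=1):
    """Return the spelling shown after pressing space the given number of times."""
    options = CANDIDATES.get(kana, [kana])
    return options[(space_presses - 1) % len(options)]

kana = romaji_to_kana("tabemasu")
print(kana, convert(kana, 1), convert(kana, 2))
```

A real IME also uses the surrounding words to rank the candidates, which is why it usually gets homophones right on the first press.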

Edit: to explain how Japanese writing works a little better, there are actually 3 "alphabets" - two phonetic (hiragana and katakana) plus the ideographic one, called kanji. The phonetic alphabets always correspond to the same sounds, whereas the kanji refer more to a meaning/idea and can be read to correspond to multiple different sounds depending on context. For example 食, the character for eating/food, is pronounced "ta" in "taberu/食べる" ("to eat"/"I eat"), but "shoku" in "shokudou/食堂" ("cafeteria"; the two characters literally mean "eating room").

Kanji usually only serve as the "root" of a lot of words and Japanese writing tends to be a mixture of ideograms and phonetics. For example, 食べる, where べ ("be") and る ("ru") are phonetic characters. If you conjugate the verb to, for instance, past tense 食べた ("tabeta"/"I ate"), the root character 食 that means "eat" stays the same but the phonetic characters change to indicate tense. It's also completely valid to not use kanji and spell things out entirely phonetically, although doing so is considered childish / uneducated. This is how most people learn: they start out this way and gradually replace the phonetic spellings with more and more kanji as they learn.

Japanese is a lovely language, I recommend learning some. Relative to English I'd say it's actually grammatically much more organized / logical (and hence easier), but on the other hand reading and writing are significantly harder to learn. The easy availability of manga/anime/novels in both untranslated and translated forms across a wide range of language levels makes it much more accessible than it was even just 10-20 years ago.

Here's a hilarious design for a keyboard with a key for most Japanese symbols: https://japan.googleblog.com/2010/04/google.html

It's an April Fool's joke, obviously. For Japanese you can use romaji or some other phonetic system; for Chinese you can use pinyin or bopomofo.

LOL I like the toroidal shape in the first sketch!

Hah, and there's the kicker. In Japanese the most common input method is phonetic, either with a latin alphabet keyboard or a hiragana keyboard. The IME kicks in and has you select the ideogram you mean, so in reality it would save you no time typing and actually cost you time.

> The IME kicks in and has you select the ideogram you mean, so in reality it would save you no time typing and actually cost you time.

I don't think anyone does it kanji-by-kanji. In reality, it's really autocomplete as it exists with English keyboards. You type the first few bits in hiragana or romaji, then autosuggest comes up with commonly-used words and you select the one you want.

In addition, hiragana input on mobile devices is FAR faster than romaji input, so I'm not sure how you lose time.

Yep, and add in that with swipe gestures, the hiragana keyboard only needs ~10 keys, making it much less fidgety to hit the keys on mobile.

Typing Japanese on mobile phones is pretty fast and painless. Easier than English on a regular keyboard, though not quite as fast as English "swipe" style input (whatever the term for that is?).

> in reality it would save you no time typing and actually cost you time.

There's often (always?) a reaction when new forms of communication are introduced to language. Usually the older generation railing against the "degradation" or "misuse" of the language the way they learned it. For example txt and emojis currently in English.

Similarly with e.g. Chinese and modern IMEs. Because there are frequently many characters that match phonetically, to save time, typically younger people just pick the first one that's suggested. Mostly due to internet and phone/tablet use.

I highly doubt that's true, based on my experience with Chinese pinyin IMEs.

A Japanese phonetic IME should have enough local context to guess which Kanji is correct. You type, then go back and quickly proofread later.

It's been about 10 years, but I used the IME in Windows 2k quite extensively.


I recall this window strongly, with ~10 items in it being common. Playing with it now in OS X perhaps it's improved in recent years.

Again just talking about my experience with Chinese, but the IME guessing quality has really improved markedly over the last decade.

In fact, you can often just type initials for your entire sentence and the IME will guess correctly all the way through. It's like being able to type 'w a y u t?' and getting 'What are you up to?' filled in for you.
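A toy version of that initials trick, using the English analogy from above, might look like this in Python. The phrase list is made up for the example. A real pinyin IME works on syllable initials and ranks candidates by frequency and context, none of which is modelled here.

```python
# Hypothetical phrase list standing in for the IME's dictionary.
PHRASES = ["What are you up to?", "Where are you from?", "See you tomorrow."]

def initials(phrase):
    """First letter of each word, lowercased: 'See you tomorrow.' -> 'syt'."""
    return "".join(word[0].lower() for word in phrase.split())

def complete(typed):
    """Return every known phrase whose initials match what was typed."""
    key = typed.replace(" ", "").rstrip("?.!").lower()
    return [p for p in PHRASES if initials(p) == key]

print(complete("w a y u t?"))
```

With a dictionary this small there is only ever one match; the hard part in practice is ranking the many phrases that share the same initials.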

I'm pretty sure Chinese is even more homophonic than Japanese, which is why I'd expect the Kanji inference to work better.

There are several tens of thousands of Chinese characters, but the vast majority are rarely, if ever, used. If you know, say, 3000, you'll do quite well reading most things.

(In Japanese, the topic of the original article, the situation is somewhat better, as mostly only a fairly well defined subset of about 2000 characters is used, next to (two) syllabaries and the Latin alphabet).

Knowing only 250 or even 1000 characters is quite unsatisfying, though. When you read a newspaper, it'll read like this: "At the meeting yesterday, president <someone> said that it is of great importance to <something> the <something>, otherwise surely the <something> will <something> down. <someone> suggested a possible solution, though, by bringing the whole country together to <something> for the future and implement a better <something>."

Sure, you understand 90%, but that's not really cutting it.

In Japanese primary school, at the end of grade 6, students are expected to know 1,006 characters. At the end of high school you are expected to know a total of 2,136.

If Japanese or Chinese speakers had invented computers before discovering the Latin alphabet, I'm guessing keyboards would have about 6 keys, one for each type of stroke. (I think that's how many types of strokes there are.). You would have stroked in each character. There are input methods that do this, but nobody uses them.

That was cool. Interestingly enough, you can come to the same conclusion with emojis.

Exactly. Although the Japanese realized that the emoji are moji a long time ago.

This was my thought about how the article would progress until finishing it!

I think it would be interesting if there were a Latin equivalent of Chinese characters. Different roots could be represented as different characters, some could be used for each of the suffixes like "ly", "tion", etc., and the characters would be joined together to create words like in Chinese.

Different Romance languages could be represented this way. In the same way that Mandarin and Cantonese use similar character sets with different pronunciations, and with some characters specific to each one, different languages that have Latin roots would have a few of their own characters specific to their own language, but mostly drawing from the Latin pool.

The pronunciation for each would have to be memorized of course.

Japanese is an unusual language to write. It's influenced by Chinese (in two separate eras), English and perhaps many other systems. Symbols alone aren't a perfect fit for the language (since they add tense and such), but neither is an English style alphabet.

Why wouldn't an "English style" (Roman) alphabet work?

Romaji works just fine, albeit with the addition of a macron diacritic. Though if it really was the primary writing system, a way to notate tone accent might be necessary (My gut says no, it carries relatively little semantic weight).

It doesn't work due to the insane number of homophones. When you are speaking with someone you have context and you can discern the meaning of what is said. But random words or texts can change their meaning depending on which character is used. And a tone system doesn't help, as can be seen in Chinese pinyin.

For example, how many kanji can be read as 'shuu'/しゅう: http://jisho.org/search/%E3%81%97%E3%82%85%E3%81%86%20%23kan...

Try to do that with tones.

Why do you have less context in writing than in speech?

I'd be willing to bet heavily that the vast majority of those "homophones" are primarily writing-only, domain specific or archaic "shorthands", which are referred to in speech with slightly more verbose alternatives. Switching to a non-character based system would admittedly in that case mean some domain specific writing would be slightly less compact, but that seems a reasonable tradeoff given the unwieldiness of the current writing system.

> I'd be willing to bet heavily that ...

You'd lose your bet. In that "shuu" link as an example, most (10-12 or so) are common enough that you might hear them in a typical newscast, with that pronunciation.

What makes things manageable is the combinatorics. E.g. there are dozens of kanji read "shuu", and many dozens more read "kan", but most of them are only read that way when part of a 2-character compound, and only a small subset of the possible "shuukan"s are words, and only a subset of those words are common in spoken conversation.

Even then, it is a very homophone-heavy language. I can think of four "shuukan"s off the top of my head that you might hear from a newsreader; it would only be after those that you'd get into domain-specific words. This is pretty typical.
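To make the "shuukan" point concrete, here is a minimal sketch that groups written words by their kana reading. The four しゅうかん words are my own picks (not necessarily the four the parent had in mind), and the `ENTRIES` list and `group_by_reading` helper are made up for illustration.

```python
from collections import defaultdict

# Illustrative (word, kana reading, gloss) entries chosen for this sketch.
ENTRIES = [
    ("週間", "しゅうかん", "week (period)"),
    ("習慣", "しゅうかん", "habit, custom"),
    ("週刊", "しゅうかん", "weekly publication"),
    ("収監", "しゅうかん", "imprisonment"),
    ("秋", "あき", "autumn"),
]


def group_by_reading(entries):
    """Map each kana reading to the list of written words that share it."""
    by_reading = defaultdict(list)
    for word, reading, _gloss in entries:
        by_reading[reading].append(word)
    return dict(by_reading)


homophones = group_by_reading(ENTRIES)
for reading, words in homophones.items():
    # One reading, several spellings: this is what kana-only text loses.
    print(reading, len(words), " ".join(words))
```

Written in kana or romaji, all four collapse to the same string, so the reader has to fall back on context, just as a listener does.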

It's not that you have less context so much as that you /need/ less context. Instead of a few extra words to describe something, you get a different character.

Here's a great example of how in writing you can disambiguate "aunt"/"older woman": https://twitter.com/MaggieSensei/status/765769637372030977

In the above example all three are read as おば (pronounced: oba). When spoken you still need to differentiate, but it'd either be obvious from context or you'd just explain it manually.

Because you are going to select different phrases and words while speaking than you are when writing. Even with context clues from the conversation, it can at times be confusing, so you have to explain what you meant. Usually it's verbal, sometimes it's 空書 (sky writing). To avoid having to do this frequently, people will often adopt a subset of the language that is less prone to confusing homophones for their vocal communication.

It creates a situation where you have people who have wildly different voices in their writing than they do in their everyday speaking, which is an interesting phenomenon. (To me, at least)

Korean has just as many homophones and used to be written with characters like Japanese. Now they are doing just fine with their phonetic writing system.

Korean didn't work at all under the imported Chinese system. Japanese also had problems, but they solved them by inventing hiragana, a syllabary. The two languages chose different systems, and both work fine.

Also, Korean avoids many homophones thanks to its 10 vowels. Japanese has 5.

In addition, Korean spelling is heavily morphophonemic, which is a fancy way of saying that words are written based on their "base form" even when the actual sound is different due to interaction with grammatical suffixes.

A bit like English "packed" being written with "-ed" even if it sounds identical to "pact". Helps disambiguation.

(Actually, come to think of it, it's rather analogous to the Japanese way of maintaining the same Kanji while the suffix changes.)

Koreans did have an old writing system[1] made of Chinese characters, where some were used for meaning and others were used to denote Korean suffixes with a similar sound (kinda like how Hiragana started out, I guess). But it eventually died out.

[1] https://en.wikipedia.org/wiki/Idu_script

There's a tongue twister in Chinese with all characters pronounced "shi".[1][2] Hmm, I don't know Japanese grammar, but if those characters are Chinese ones, you could almost write something like that with these characters. 秋收 and 修習 are legitimate words, for starters.

[1] https://en.wikipedia.org/wiki/Lion-Eating_Poet_in_the_Stone_... [2] https://zh.wikipedia.org/zh-sg/%E6%96%BD%E6%B0%8F%E9%A3%9F%E...

All characters are pronounced "shi" in Mandarin, with 4 different tones, leaving 4 distinct pronunciations.

In other "dialects", such as Cantonese or Teochew, the characters are pronounced as 7 or distinct syllables, with 6 different tones, leaving more than 20 distinct pronunciations.

Mandarin has very few available syllables compared to other languages (not only, say, English, but also older Chinese "dialects").
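The Mandarin count can be checked on the poem's title alone, 施氏食獅史 (Shī Shì shí shī shǐ). A rough sketch using only the standard library; the `TITLE_PINYIN` list and `strip_tone` helper are names invented here.

```python
import unicodedata

# Pinyin for the five characters of the poem's title, 施氏食獅史.
TITLE_PINYIN = ["shī", "shì", "shí", "shī", "shǐ"]


def strip_tone(syllable):
    """Drop tone diacritics by decomposing and removing combining marks."""
    decomposed = unicodedata.normalize("NFD", syllable)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Normalize first so visually identical syllables compare equal.
with_tones = {unicodedata.normalize("NFC", s) for s in TITLE_PINYIN}
without_tones = {strip_tone(s) for s in TITLE_PINYIN}

print(len(with_tones), "distinct syllables with tones")
print(sorted(without_tones), "once the tone marks are dropped")
```

So tones buy a factor of four at most here, which is why pinyin with tone marks still leaves the poem unreadable.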

Homophones aren't a problem. Just use a silent radical in front of kana sequences. Swapping 2000 kanji for about 200 radicals is a good enough "90% solution". Some possible examples from the first page of results from your link:

    主 = 主シュウ
    集 = 木シュウ
    终 = 幺シュウ
    州 = 川シュウ
    衆 = 血シュウ
    就 = 京シュウ
    秋 = 禾シュウ
    收 = 又シュウ
    週 = 辶シュウ
    周 = 周シュウ
    宗 = 宗シュウ
    修 = 彡シュウ
    習 = 白シュウ
    執 = 幸シュウ
    秀 = 乃シュウ
    渋 = 氵シュウ
    拾 = 扌シュウ
    袭 = 衣シュウ
    捜 = 扌シュウ
    祝 = 礻シュウ
There'll be some duplicates, especially looking at all 213 found, but it solves most of the problem of homophones.
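A quick sketch to see how many of the twenty examples above already collide under this scheme. The pairs are copied from the list; the `collisions` helper is hypothetical and only groups kanji by their proposed "radical + kana" spelling.

```python
from collections import defaultdict

# (kanji, proposed silent radical) pairs, copied from the list above.
PAIRS = [
    ("主", "主"), ("集", "木"), ("终", "幺"), ("州", "川"), ("衆", "血"),
    ("就", "京"), ("秋", "禾"), ("收", "又"), ("週", "辶"), ("周", "周"),
    ("宗", "宗"), ("修", "彡"), ("習", "白"), ("執", "幸"), ("秀", "乃"),
    ("渋", "氵"), ("拾", "扌"), ("袭", "衣"), ("捜", "扌"), ("祝", "礻"),
]
READING = "シュウ"


def collisions(pairs, reading):
    """Return the radical+kana spellings shared by more than one kanji."""
    by_tag = defaultdict(list)
    for kanji, radical in pairs:
        by_tag[radical + reading].append(kanji)
    return {tag: ks for tag, ks in by_tag.items() if len(ks) > 1}


clashes = collisions(PAIRS, READING)
distinct_tags = len({radical for _, radical in PAIRS})
print(len(PAIRS), "kanji ->", distinct_tags, "distinct spellings")
print(clashes)
```

On this sample only 拾 and 捜 end up with the same spelling (扌シュウ); how well that holds across all 213 results is not something this sketch can tell you.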

There are a lot of homophones among the Chinese loanwords in Japanese. A writing system like Romaji that only records the phonetic information would be no more useful than just hiragana (which might then arguably be more useful, as it aligns with the syllables?)

It would also be no more or less useful than just speaking. Do Japanese people carry around flash cards with kanji on them to disambiguate their speech?

There are non verbal cues or different nuances in inflection that help disambiguate. There's also situational context.

Furthermore, people use simpler language when speaking than when writing.

In Chinese, which has even more homophones, it is quite difficult to tease out the meaning of a passage written phonetically (in hanyu pinyin). When speaking and there is a word out of context (such as a name), it is necessary to explicitly disambiguate by putting the word in a common phrase or describing the character's constituent parts. For instance, I would introduce myself as "Zhe as in 'philosophy', Hao which is 'sun' on top of 'sky'".

It's actually not uncommon when speaking to have to either draw a kanji in the air, or refer to the symbol you mean by cross-referencing another word that uses it to disambiguate.

No, but that's spoken Japanese. Different phrases are prevalent between spoken and written Japanese, as a workaround for those ambiguities.

Spaces carry a lot of semantic weight, as do other components such as capitalisation and punctuation (most of which in use in Japan is borrowed from romaji anyway).


Well, in English spaces certainly carry weight, partly because the word unit has so many characters. In languages like Chinese and Japanese, units /are/ the characters (or a few of them). Particles then make natural separators in strings of kanji.

That's exactly why you need more than just hiragana: it'd be too difficult to tell what each character was part of at speed. Something like katakana replacing kanji and hiragana for particles would be perfect.

Modern Japanese could learn a few lessons from hangul imo.

But then Chinese moved to simplified Chinese, making some symbols very different.
