Of course I can, not because I am Chinese, but because they really do look different. How can you NOT see the difference? They also have different meanings.
Here is one that's really hard to tell apart, especially in handwriting, and is often written wrong if you're not careful.
已 vs 己
This puzzles many Chinese people...
Can you spot the difference?
已 has to do with time (stopped)
己 is self
So 自己 is me/self, while 已經 means already.
There is also 巳, which in the ancient Chinese timekeeping system denotes the hours from 9 to 11 am, I believe.
I am trying to teach myself Japanese, which allegedly is easier than Chinese. Hiragana, as far as I can tell, makes no internal sense: the characters for yo and ya don't incorporate elements from o or a. And when you are done with hiragana, you realize that it is not enough on its own to actually communicate in Japanese, so in addition to those 46 characters you have to memorize 2-4 thousand kanji. Oh man, that is disheartening. I keep plugging away, but it is slow going.
Who did you hear that from? The Japanese writing system is the Chinese one, minus all of the internal consistency, plus lots of characters the Chinese aren't traditional enough to keep in the language. It often uses the same character to write distinct Chinese loanwords which differ wildly in pronunciation based on when and from whom the Japanese heard them, and they often have different shades of meaning.
Chinese, on the other hand, uses only one writing system in any given body of text, uses the Latin alphabet with tone markings for phonetic spelling for learners, and is much less reliant on garbled loanwords.
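The tone markings follow a small placement rule that is easy to mechanize: mark a or e if present, mark the o in "ou", otherwise mark the last vowel. A minimal sketch that converts numbered pinyin (the ASCII form many learners type, e.g. `ma3`) into marked pinyin, one syllable at a time; the function name and the convention of typing `v` for `ü` are assumptions for illustration.

```python
import unicodedata

# Combining diacritics for tones 1-4; tone 5 (neutral) gets no mark.
TONE_MARKS = {1: "\u0304", 2: "\u0301", 3: "\u030c", 4: "\u0300"}

def numbered_to_marked(syllable: str) -> str:
    """Convert one numbered-pinyin syllable, e.g. 'ma3' -> 'mǎ', 'lv4' -> 'lǜ'."""
    if not syllable or not syllable[-1].isdigit():
        return syllable
    base, tone = syllable[:-1].replace("v", "ü"), int(syllable[-1])
    if tone not in TONE_MARKS:
        return base  # neutral tone: no diacritic
    lower = base.lower()
    if "a" in lower:
        idx = lower.index("a")
    elif "e" in lower:
        idx = lower.index("e")
    elif "ou" in lower:
        idx = lower.index("o")
    else:
        # Otherwise the last vowel takes the mark (so "gui" -> "guì", "liu" -> "liú").
        idx = max(lower.rfind(v) for v in "aeiouü")
    if idx < 0:
        return base  # no vowel found; leave unchanged
    marked = base[: idx + 1] + TONE_MARKS[tone] + base[idx + 1 :]
    # NFC folds letter + combining mark into the precomposed character (a + caron -> ǎ).
    return unicodedata.normalize("NFC", marked)

print(" ".join(numbered_to_marked(s) for s in ["zhong1", "wen2"]))
```

Normalizing to NFC at the end matters because many fonts and search tools expect the precomposed forms rather than base letter plus combining mark.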
Your conception of the writing systems is quite weird; the consistency (if any) is the same, I would say. I do not get what you mean by characters being "traditional" or not.
Chinese loanwords are more garbled, with different methods of assigning them random Chinese characters. And the Japanese also use Latin characters to mark pronunciations, minus the tone markings. (Let me introduce you to bopomofo.)
That's a common misconception (on the level of "katakana is for loan words"):
I switched my televi off and rode my autobike from my mansion to the cleaning to pick up my Y shirt because my car's front glass broke. On the way home I stopped at the conveni.
Cleaning: dry cleaning
Front glass: windshield
Conveni: convenience store
> Chinese loanwords are more garbled, with different methods of assigning them random chinese characters.
This was standard in Japanese too (ateji). It is more common now to just make a loanword, but if someone doesn't learn them they may be confused about why Japanese abbreviate America as "rice".
米 (US) comes from 亜米利加, a-me-ri-ka (亜 on its own is used for Asia)
e.g. 米軍 (US military) or 米ドル (US dollar)
(I used to think 美国 was descriptive since it didn't sound phonetic, until I came across 米国 and was confused as to why anyone would describe the US as "rice country", and then I learned that both were phonetic, but for the "me" in "america")
Bit of a stretch IMO - the pronunciation and usage can still often be different enough to trip you up, especially when you consider there are non-English loanwords on occasion (e.g. playing Persona 5, it took me far too long to realise that ルブラン, which I naively transliterated as ruburan, was actually the French "Le Blanc").
As a Chinese speaker learning the kana, I found knowing the etymology of the kana somewhat useful as a learning tool, but obviously this is of no help if you don't already know Chinese.
Also, Mandarin (and other Northern Chinese) pronunciations have changed quite a bit more than southern dialects, so knowing Cantonese is more helpful than Mandarin, e.g.
か 'ka' comes from 加, which is 'jia' in Mandarin but 'ga' in Cantonese
A syllabary that 'makes sense' would be Korean Hangul, where the relationships between the various parts of the mouth were purposefully encoded into the relationships between the components of glyphs.
Don't worry about "learning kanji": learn words. A few kanji are used in isolation as words, but most kanji have meanings that you can only infer from context (as part of words). Furthermore, even if you "know the meanings" of kanji, you'll find you just have to memorize words anyway, because you usually can't combine the meanings of the kanji to come up with the meaning of a word. Not easily, anyway.
Too many westerners try to learn Japanese by memorizing 2,000 kanji, because that's easier than memorizing 30,000 words. It doesn't work. The way you "learn kanji" in immersion language schools is that you get word lists, and you're quizzed on how to read (i.e. write the readings) and write (i.e. write the kanji) those words. You quickly start to see patterns.
Memorizing lots of words is painful, but then, language learning is painful. This method comes with the huge advantage that the sooner you start learning words, the sooner you'll get positive reinforcement from the real world. You won't get that by memorizing kanji.
That said, I've found that studying the kanji separately really helped my Japanese. If nothing else, it's kept me from falling too far behind the Chinese and Taiwanese students in class...
The book I used for the kanji is The Kodansha Kanji Learner's Course by Andrew Scott Conning. Can't recommend it enough.
(Though, it should be said that the reason the Chinese kids are better isn't that they "know kanji", but that many of the words are written the same way in Japanese. Substitute some of the kanji with hiragana, and they'll lose the bubble quickly.)
The only things I can say with confidence are:
* focus on words
* learning words becomes faster in *any* language as you gain proficiency
* learning radicals helps you recognize kanji, and is a good thing to do
Those god forsaken counters.
You can't just say five pencils. You have to say
5 long thin object even-numbers-4-and-under-and-odd-numbers-5-and-over-counter pencil
Go hon enpitsu
The pattern for which is
They can't simply say the number of clouds the same way they say the number of machine presses. Or count small animals the same way they count apples. Or demigods the same way they count flat objects.
Small round objects and military brigades would obviously be counted the same way though. Obviously.
There's also a unique counter specifically for armed naval vessels and slices of fish on top of balls of rice.
And one just for straw mats.
And one for graves, CPUs, wreaths, and dams.
And while small animals and arthropods have one counter, and large animals have another counter, you count butterflies using the counter for large animals.
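Choosing a counter is essentially a lookup keyed on the noun's category, and some counters also change their own sound with the number (go-hon, but ip-pon, san-bon). A minimal sketch with a hypothetical, deliberately tiny table; real usage has many more counters and exceptions, and the 本 readings below are the common forms.

```python
# Hypothetical category -> counter table (only a few of the many counters).
COUNTERS = {
    "long_thin": "本",     # pencils, bottles
    "flat": "枚",          # paper, shirts
    "small_animal": "匹",  # cats, fish
    "large_animal": "頭",  # cows (and, per the thread, butterflies)
    "machine": "台",       # cars, machines
}
# Hypothetical noun -> category assignments for the demo.
NOUNS = {"鉛筆": "long_thin", "紙": "flat", "猫": "small_animal", "牛": "large_animal", "車": "machine"}

# Common readings of 本 for 1-10; note the ぽん/ほん/ぼん sound changes.
# (8 and 10 also have the variants はちほん and じっぽん.)
HON_READINGS = ["いっぽん", "にほん", "さんぼん", "よんほん", "ごほん",
                "ろっぽん", "ななほん", "はっぽん", "きゅうほん", "じゅっぽん"]

def count_phrase(noun: str, n: int) -> str:
    """Noun + number + the counter for its category, e.g. 鉛筆5本."""
    return f"{noun}{n}{COUNTERS[NOUNS[noun]]}"

def hon_reading(n: int) -> str:
    """Reading of n + 本 for n in 1..10."""
    if not 1 <= n <= 10:
        raise ValueError("this sketch only covers 1-10")
    return HON_READINGS[n - 1]

print(count_phrase("鉛筆", 5), hon_reading(5))  # prints: 鉛筆5本 ごほん
```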
Oh and the first 10, 14th, 20th and 24th days of the month essentially have unique names.
I knew a native Japanese guy who spent 4 years mostly away from Japan, and he started to have trouble remembering the days of the month.
At this point for me, the measure words are actually a help instead of a hindrance when I encounter a new noun that I don't know: they hint strongly enough at what sort of noun I'm dealing with that the context usually lets the meaning snap into place. More useful than memorizing grammatical gender (masculine/feminine/neuter) in most Indo-European languages.
(Assuming it isn't just like... "parutikkuru akuseroreta" or something)
(Chinese is also much simpler when it comes to measure words)
Kanji isn't that hard if you are realistic about how large a task it is. Japanese children spend 10 years learning kanji at school, and Japanese is their native language.
Capital I, lowercase l, numeral 1
Numeral 0 and uppercase O
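Security tooling deals with exactly these look-alikes: Unicode Technical Standard #39 compares strings by reducing each to a "skeleton" built from a table of confusable characters. A minimal sketch of that idea, assuming a tiny hand-written table rather than Unicode's real confusables data (the `CONFUSABLES` entries, including the CJK pairs from this thread, are illustrative only):

```python
import unicodedata

# Hypothetical hand-picked mappings; the real data is Unicode's confusables.txt.
CONFUSABLES = {
    "l": "1", "I": "1",    # lowercase L and capital I vs numeral 1
    "O": "0",              # capital O vs numeral 0
    "已": "己", "巳": "己",  # CJK look-alikes from this thread
    "士": "土",
}

def skeleton(text: str) -> str:
    """Map every character to a representative, in the spirit of UTS #39."""
    decomposed = unicodedata.normalize("NFD", text)
    mapped = "".join(CONFUSABLES.get(ch, ch) for ch in decomposed)
    return unicodedata.normalize("NFD", mapped)

def confusable(a: str, b: str) -> bool:
    """Two strings are confusable if their skeletons are identical."""
    return skeleton(a) == skeleton(b)

print(confusable("Il1", "111"), confusable("自己", "自已"))  # prints: True True
```

Note that distinct code points are exactly what makes this necessary: Unicode normalization never folds I, l and 1 together, so visual similarity has to be handled by a separate table.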
> Of course I can, not because I am Chinese, but they really do look different. How can you NOT see the difference? Also they have different meanings
The difference at my resolution is about 10-15 pixels, which seems much smaller than the difference between one font and another. But of course I don't have any knowledge of Hanzi. Is the difference whether the diagonal stroke touches the part to the left? It seems very hard to notice.
So I'd say that the difference is mostly how your font chose to indicate the proper stroke direction.
We have plenty of more ambiguous characters in our limited alphanumeric set, and if font designers did not take special care, they could look exactly the same.
Every written system has its own clues that readers learn to focus on.
In Japanese the number and direction of strokes is important but may translate to barely noticeable pixel differences that Westerners would dismiss as noise.
I think this particular pair is confusing because in Japanese, orientation is not supposed to matter, but in this case it does.
I saw the difference, because it reminded me of the difference between ン and ソ in katakana, but if you aren't Chinese I can see how you could assume they are the same. You can draw the same character differently in English without changing the meaning (e.g. 9 straight or curved, 7 with or without a bar through it, looped k or f), so overlooking the difference between 𠮟 and 叱 seems easy enough.
荼 vs 茶
日 vs 曰
已 vs 己 vs 巳 vs 巴
士 vs 土
水 vs 氺
人 vs 入 vs 八 vs 几
干 vs 千 vs 于
天 vs 夫
... the list goes on.
Also 苶 looks similar.
Ask any Japanese person why they use kanji and not just hiragana, katakana, or romaji and they'll tell you the same thing: Because it is hard to determine the meaning. They don't say it is impossible, just that it is difficult, which it is.
For a contrived example, if I write はしをさがしてる。What am I looking for, a chopstick, a bridge, or the side of something? There is no way to discern from the sentence alone. However, if I use this kanji for hashi, 箸, then it is obvious I'm talking about chopsticks. There is no ambiguity.
This doesn't always help. Sometimes the homonyms use the same kanji. 青山, for instance, can be read あおやま, which is a surname, or it can be read せいざん, which means a lush, green mountain. Most of the time though, kanji provides meaning and clarity where hiragana, katakana, and romaji do not.
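The ambiguity runs in both directions, which a toy lookup makes concrete. The tables below are hypothetical, hand-written samples covering only the words in this comment, and the keyword "context hints" are a crude stand-in for the statistics a real input method uses to rank candidates:

```python
# Hypothetical mini-tables built only from the examples in this comment.
KANA_TO_KANJI = {
    "はし": ["箸", "橋", "端"],  # chopsticks, bridge, edge/side
}
KANJI_TO_READINGS = {
    "青山": ["あおやま", "せいざん"],  # surname vs. "lush, green mountain"
}
# Hypothetical context clues; a real IME would use statistics, not keyword lists.
CONTEXT_HINTS = {
    "箸": ("ご飯", "食べ"),
    "橋": ("川", "渡"),
    "端": ("道の", "机の"),
}

def candidates(reading: str, context: str = "") -> list[str]:
    """Kanji spellings for a kana reading, narrowed by context when possible."""
    options = KANA_TO_KANJI.get(reading, [])
    hinted = [k for k in options
              if any(h in context for h in CONTEXT_HINTS.get(k, ()))]
    return hinted or options  # no clue in the context: still ambiguous

def readings(word: str) -> list[str]:
    """Readings for a kanji spelling; some kanji words stay ambiguous too."""
    return KANJI_TO_READINGS.get(word, [])

print(candidates("はし", "はしをさがしてる。"))                 # all three remain
print(candidates("はし", "ご飯を食べたい。はしをさがしてる。"))  # ['箸']
print(readings("青山"))                                          # two readings
```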
What Chinese lacks is spaces between words.
Even Korean has spaces. It makes reading a large block of text so much easier, and it helps me to identify vocabulary that I recognise.
Adding spaces and Pinyin are just a couple of the many features of Pingtype.
(There are comments on this topic if you're interested in HN's discussion of it, as I was.)
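Adding spaces means segmenting the text into words first. A classic baseline is forward maximum matching: at each position, take the longest dictionary word that fits, falling back to a single character. The sketch below assumes a tiny hypothetical dictionary and is not how Pingtype itself works (its method isn't described here):

```python
def segment(text: str, dictionary: set[str], max_len: int = 4) -> list[str]:
    """Forward maximum matching: greedily take the longest dictionary word."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest window first, shrinking until a word matches;
        # a single character is always accepted as a fallback.
        for size in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if size == 1 or chunk in dictionary:
                words.append(chunk)
                i += size
                break
    return words

DICTIONARY = {"我们", "喜欢", "学习", "中文"}  # hypothetical mini-dictionary
print(" ".join(segment("我们喜欢学习中文", DICTIONARY)))
# prints: 我们 喜欢 学习 中文
```

Greedy matching is known to mis-segment some ambiguous strings, which is why production segmenters add large dictionaries and statistical models on top of it.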
I wonder how long it will be until UTF-8 is used everywhere and non-BMP characters enjoy first-class support and testing. You'd think the U+1Fxxx emoji would have been enough to make this happen.
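The cost of a non-BMP character shows up as soon as you count code units. This compares 叱 (inside the BMP) with 𠮟 and 𠂇 (both in CJK Extension B, outside it): the latter each need four bytes in UTF-8 and a surrogate pair in UTF-16, which is why code that assumes one UTF-16 unit per character (JavaScript's `String.length`, for instance) miscounts them.

```python
def code_units(ch: str) -> dict[str, int]:
    """Code point and encoded size of a single character."""
    return {
        "code_point": ord(ch),
        "utf8_bytes": len(ch.encode("utf-8")),
        # utf-16-le writes no BOM, so every 2 bytes is one code unit.
        "utf16_units": len(ch.encode("utf-16-le")) // 2,
    }

for ch in ["叱", "𠮟", "𠂇"]:
    info = code_units(ch)
    plane = "BMP" if info["code_point"] <= 0xFFFF else "non-BMP"
    print(f"{ch} U+{info['code_point']:04X} {plane}: "
          f"{info['utf8_bytes']} UTF-8 bytes, {info['utf16_units']} UTF-16 units")
```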
For example, 右 (right) and 左 (left). You'd think the top-left part is the same, but it's not. In the case of 左, you write the horizontal stroke first, but in the case of 右, it's the second stroke. They also have a slightly different shape.
Another example from the Jōyō kanji: 臭 (stinking, odor) and 嗅 (smell). You'd think the second is just 口 (mouth) added to the first, but it's not. Etymologically, 臭 is 自 (nose, an older form of what is now written 鼻) + 犬 (dog), but it was simplified in the Jōyō list and became 自+大. 嗅 was not part of the list back then, which doesn't mean it didn't exist; it just means the ministry of education did not recognize it as regular enough. 嗅 was only added to the list recently (2010) and was not simplified to remove the extra stroke, so it's 口+自+犬, leading to this funny inconsistency.
It's a little like replacing every French à with a plain a, where à is only available through variant fonts that most operating systems don't support, and where setting your system locale to French would put à in place of a in every English word.
That's the case for HTML:
Either way it's annoying to set people up to screw up in this way. And the unification also means it's quite difficult to get multiple variants into the same text.
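Unicode's mechanism for putting several variants of a unified character into one text is the Ideographic Variation Sequence: a base ideograph followed by one of the 240 variation selectors VS17 to VS256 (U+E0100 to U+E01EF). Which sequences mean anything is defined by collections registered in the Ideographic Variation Database, and the font must support them for a different glyph to appear. The sketch only builds and inspects sequences; the helper names are hypothetical and registration is deliberately not checked.

```python
import unicodedata
from typing import Optional

FIRST_IVS = 0xE0100  # VARIATION SELECTOR-17
LAST_IVS = 0xE01EF   # VARIATION SELECTOR-256

def with_variation(base: str, index: int) -> str:
    """Append VS(17 + index) to a base ideograph; registration is NOT checked."""
    if not 0 <= index <= LAST_IVS - FIRST_IVS:
        raise ValueError("index must select one of VS17..VS256")
    return base + chr(FIRST_IVS + index)

def split_variation(seq: str) -> tuple[str, Optional[str]]:
    """Split a two-code-point base+selector sequence; otherwise return (seq, None)."""
    if len(seq) == 2 and FIRST_IVS <= ord(seq[1]) <= LAST_IVS:
        return seq[0], seq[1]
    return seq, None

seq = with_variation("葛", 0)
print(len(seq), unicodedata.name(seq[1]))  # prints: 2 VARIATION SELECTOR-17
```

The sequence is two code points long, so anything that counts, truncates or compares strings has to treat the selector as part of the character.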
And to show just how ridiculous Han unification is: on my laptop your example makes no sense, because the 臭 has a 犬!
Looking it up (on wiktionary, at least) it seems like originally 又 was "right hand", however if you look at really old versions of that character it looks mostly like a mirror of 𠂇.
Interestingly, Wiktionary lists both left and right as phonosemantic compounds where the phonetic part is "left hand" or "right hand" and the semantic part is "assist" and "mouth" (I think in this case the "mouth" is used to bolster the "pronounced like" of the phonosemantic compound, since it's used on the right side not the left). This seems to be because the word for "left hand" became the word for "left" and same for "right hand", so they're pronounced the same; and the semantic component was added later to bolster/specialize the glyph.
Anyway, it seems like the 𠂇 radical in 右 is etymologically a variant of 又 which is a mirror of 𠂇 (well, a mirror of a three-pronged historical form of that) except it was rotated around the glyph so that it looks exactly like 𠂇 but the stroke order is reversed.
Which in and of itself, is not sufficient, but is a strong clue.
Then, if you look up the seal script origins of 左 and 右, you find that the top-left part of the character comes from different directions: It comes from the left for 左 (https://upload.wikimedia.org/wikipedia/commons/thumb/d/d2/%E...) and from the right for 右 (https://upload.wikimedia.org/wikipedia/commons/thumb/e/ef/%E...). One way to look at it is that the descending stroke of 𠂇 in 左 is the arm, while it's the horizontal stroke in 右.
But at this point, that's mostly conjecture.
So the next step is to simply find how the characters were actually written by writers from far in the past. And it turns out there are such resources:
This contains the characters as written in a lot of different material from different periods. And across those many instances of the characters, the way the first two strokes are connected is a strong indicator of how they were written.
For instance, on 左, they connect on the top-right end, indicating the horizontal stroke comes first:
And on 右, they connect on the lower-left end, indicating the horizontal stroke is last:
You can't do such connections by writing the horizontal stroke first.
There are also examples of the horizontal stroke connecting to the 口 part, further indicating it's the second stroke:
If you look at the pages linked above, you'll also find the names of those writers, which you can look up; you'll see that some of them lived more than 1000 years ago.
Funnily, there's this interesting exception of 左 with the horizontal stroke second: http://pic.guoxuedashi.com/shufa/sf26/r201509387.0252.9eb7a9...
after looking through my dictionary, it appears that 右 was originally written with 又 which is a pictograph of a right hand (no idea if the split-top was an accepted variation of the radical or the original form) and was later standardized to use 𠂇 for both characters. Japanese picked them up before the standardization occurred and so missed out on the homogenization.
this also makes 友 a combination of the left and right hand, which imo is kind of cool.
Actually, the writing with 𠂇 is older than the japanese using the characters. The wikipedia page on stroke order suggests the homogenization in stroke order might have happened last century.
My favorite one of these "look the same but aren't" pairs is 土 & 士 which differ only by the relative lengths of the upper and lower stroke. There's also 囗 and 口, which differ slightly in size but more importantly in the hooking of the second stroke; except half the fonts don't display that; and certainly not at the small font size I keep English text at (which is why I've tweaked the font-size for Chinese in my browser). It would be even more confusing when they get used as radicals except 囗 is always used to surround a glyph, whereas 口 is used squished up the "regular" way as in your example.
I'm not sure what you mean about the hooking of the second stroke of 囗 and 口. Do you mean the fact that in the former the angle tends to be 90°, while in the latter it tends to be < 90° (except in fonts, as you note)?
I don't know about Chinese, but in Japanese, the former is not a character on its own (at least, it's not among the 2136 Jōyō kanji, nor among the ~6000 characters of the top level of the Kanji Kentei).
No, I mean that the end of the stroke in the former is often hooked and crosses the third stroke, whereas in the latter the second stroke stops short of the third stroke and isn't hooked. I'm not 100% sure; and I'm still learning :)
I don't think it's a common character on its own in Chinese either; I know it as a radical.
(In general there doesn't seem to be a good source for what kinds of variations in a character are accepted and what the "kernel" of the character is.)
There are a few resources, like glyphwiki or Kanji-VG, that have stroke information, but TTBOMK, they don't contain any information about the kinds of strokes.
Even if they did, that would barely scratch the surface of what I am, personally, interested in. For instance, I'm taking Japanese calligraphy classes, and am interested in the various ways characters can be written in the different styles (楷書, 行書, 草書).
I've also found that interesting etymological facts about kanjis help me remember how they're written. To give a couple examples of things I've found by looking at japanese kanji dictionaries, 並 (line up) is derived from 立 (stand up) repeated twice (立立), and 自 (self) had the meaning of nose before the character for nose was created (鼻), which is why you'll find it in many nose-related characters like 臭 (smell) or 息 (breath). The latter is the kind of thing that I've found most learning apps completely miss. All those I tried, which are all essentially based on the same data sets anyways, will tell you that the radical 自 is for self, and be done with it. Somehow they do manage to tell you the radical 月 can be related to flesh (which is why it appears in many organ-related characters). In the worst case, Kanji-VG based apps will even tell you that characters with the radical 自 contain the radical 目, because you know, 自 looks like 目 with a stroke on top. Which is completely misleading.
Essentially, what I've found works for me is to go to the local library, open good old-fashioned Japanese-Japanese dictionaries, and research the subject. It helps that I live in Japan.
I did try a few Japanese-targeted apps, but they are mostly drill-based and/or heavily targeted at kids. That doesn't work for me. Maybe I simply haven't found the good ones... finding anything in those app stores is nearly impossible.
As for Japanese learning tools for foreigners, they tend to be targeted at beginners, which is fine when you're a beginner.
On the subject of drills, I haven't found any that actually challenge you on similar-looking, possibly even homophonous characters. At best, they will make you pick between homonyms, like "given $context, do you write きんし 禁止 or 近視?" (both are correct writings for きんし but have totally different meanings, so the context tells you which to use), but I haven't found any that make you pick between options that are purposefully wrong but very close-looking, like "do you write そくおん 促音 or 捉音?" (only one of those actually exists; the other would have the same reading if it did exist, but it doesn't).
But I digress.
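A drill like the one described could generate its wrong answers mechanically by swapping one character for a look-alike. A minimal sketch, assuming a small hypothetical look-alike table (built from characters mentioned in this thread) and an optional set of known words to filter out, since a swapped form could accidentally be another genuine word:

```python
import random

# Hypothetical look-alike table; a real tool would need a curated list.
LOOKALIKES = {
    "促": ["捉"],
    "己": ["已", "巳"],
    "土": ["士"],
}

def distractors(word: str, known_words: frozenset[str] = frozenset()) -> list[str]:
    """Swap each character for each of its look-alikes, skipping real words."""
    out = []
    for i, ch in enumerate(word):
        for alt in LOOKALIKES.get(ch, []):
            candidate = word[:i] + alt + word[i + 1:]
            if candidate not in known_words:
                out.append(candidate)
    return out

def make_question(reading: str, answer: str, rng: random.Random) -> tuple[str, list[str]]:
    """Return the reading plus the shuffled answer and look-alike options."""
    options = [answer] + distractors(answer)
    rng.shuffle(options)
    return reading, options

print(make_question("そくおん", "促音", random.Random(0)))
```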
`The Japanese cross the blade in 刃, the Koreans don’t`
Well Japanese have both 刀 for katana, the well known sword, and 刃 for yaiba / "blade". But in fact 刀 is used for many kinds of blade, by itself as `katana` and as a component of words like 太刀, 印刀, 日本刀, &c. (see http://kanji.quus.net/jyukugo1498/) Is this really pointing to a distinct concept from 刀 in Korean, as the article suggests?
(for a bigger list see https://en.wikipedia.org/wiki/Han_unification#Examples_of_la...)
Even so: in my experience in Japan I've seen e.g. 認 hand-written both ways, and specifically remember becoming curious about this variant. My conclusion after consulting with native speakers and university professors I knew there was that they are essentially and functionally equivalent. That is, against the author's point, Japanese speakers do not note a difference at all.
I'm not very sure of this, but it seems like the Japanese characters mostly stayed the same after branching off from Hanzi usage at various points many centuries ago (more than a millennium ago, actually), whereas in China these characters evolved.
regardless of whether 刀 and 刃 have similar meanings in Korean and Japanese (in Chinese the former refers to the knife/sword itself, whereas the latter refers to the edge of the knife), the unification of them across multiple languages makes it difficult to handle variations, nuance, and (most importantly for a forward-thinking standard) divergence over time. that Korean, Japanese, and Chinese have similar characters for historical reasons doesn't mean they are bound to be the same character forever in their respective languages.
that another commenter below me can't easily transmit to me the character for "smelly" in japanese because whatever font i use is preferring the chinese character for it is a signal that the standard is broken.
then add in your point that the written forms themselves are diverging over time.
ツ - tsu
シ - shi
ソ - so
ン - n
These are difficult to learn because the tiny differences in stroke angles (especially when handwritten) make them easy to confuse.
シ (shi) is written top to bottom. You can see that all the starting points for the strokes line up vertically on the left. Also, the last stroke curves from the bottom-left to the upper-right.
ツ (tsu) is written from left to right. You can see that all the starting points for the strokes line up horizontally at the top. Also, the last stroke curves from the upper-right down to the bottom-left.
ン (n), like 'shi', is written top to bottom. The starting points for the strokes line up vertically on the left. It also uses the same direction for the longer, final stroke as 'shi'.
ソ (so), like 'tsu', is written left to right. The starting points for the strokes line up horizontally at the top. It also uses the same direction for the longer, final stroke as 'tsu'.
シ comes from the Chinese character 之, meaning either "of" or "this"
ツ comes from the Chinese character 州 (although some researchers disagree), meaning "state" or "province"
ソ comes from the Chinese character 會, meaning "meet", "party", or "interview"
ン comes from the Chinese character 尓 (again not clear), meaning "you" or "that"
This divergence happened because Chinese characters were used to write Japanese, a language from a different family. Some Chinese characters were used by some writers for their meaning (as ideograms) and by others for their sound, to approximate the sounds used in Japanese. Obviously Chinese influenced the Japanese language (for example, numbers in Japanese sound similar to numbers in medieval colloquial Hokkien). Over the last 1,000 years or so there have been numerous occasions for changes, standardisation and simplification. So even to those who study these changes over time, it can sometimes be unclear.
ソ comes from 曾 ("once, at one point") not 會. Yes, those are different characters.
The trick is that 曾 is sometimes written with 八 as its "roof", which makes it look like 會, and sometimes with 丷, depending on your font. (Similar to 兑 and 兌 in the article)
pacman -S ttf-hanazono
For example, 吃 and 喫 have the same modern meaning "to eat," but one is more commonly used. A character like 鎌 has a variant like 鐮, just as well as 塚 and 冢. People choose characters based on stylistic reasonings (and in Taiwan, many choose the Japanese variant to be "hip").
If I could be very pedantic for a moment...
Mandarin can be written in many alphabets. Almost every single native speaker on earth uses an alphabet to input Chinese characters on computers and phones. And almost every native speaker learns the language using an alphabet, at least initially.
(but yes, the majority seems to use phonetic/alphabetic input methods, i.e. pinyin or zhuyin.)
I've made Pingtype, which lets you decompose a character into the parts, and rebuild another similar character if your handwriting recognition isn't perfect.
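The decompose-and-rebuild idea can be sketched as a lookup from components to characters. The component table below is a hypothetical hand-written sample, not Pingtype's data or code; real decomposition data is far larger and usually nested (components contain components).

```python
# Hypothetical sample: character -> its top-level components.
COMPONENTS = {
    "好": {"女", "子"},
    "妈": {"女", "马"},
    "字": {"宀", "子"},
    "明": {"日", "月"},
    "朋": {"月"},
}

def decompose(ch: str) -> set[str]:
    """Top-level components of a character (empty if unknown)."""
    return COMPONENTS.get(ch, set())

def rebuild(*parts: str) -> set[str]:
    """All characters that contain every requested component."""
    wanted = set(parts)
    return {ch for ch, comps in COMPONENTS.items() if wanted <= comps}

# Recognizer returned 好, but you wanted the character with 女 and 马:
print(decompose("好"), rebuild("女", "马"))
```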
Written Chinese has basically three different standards in the Sinosphere:
1. Mainland China: Simplified Chinese. Meanwhile, contrary to most people's understanding, the PRC has also standardized its own form of Traditional Chinese, which is permitted for logos/trademarks and research purposes.
2. Hong Kong: Hong Kong has its own system, plus several hundred characters created specifically for Cantonese.
3. Taiwan: Taiwan's system has some interesting distinctions in characters like 骨 (bone), which differs from both the Mainland China and Hong Kong systems.
Since the 20th century, Japan has simplified and reformed its kanji standard on its own, but unlike China, many kanji still follow the forms of the old Kangxi Dictionary (康熙字典). Korea is an interesting case: Hanja were abandoned in everyday use, so nobody bothered to simplify them, and as a result Korean Hanja actually follow the most orthodox forms of all the systems mentioned above.
Examples of writing systems that used characters primarily to represent sounds did come later, though, similarly, it is difficult to argue that any known writing system is purely phonetic. As a simple example in English, take '&': though it is not a recognised letter, neither is it punctuation, and it is unarguably a glyph in the English writing system about which one could debate whether it is ideographic or logographic. (But I digress.)
There are many types and varieties of phonetically representative writing systems; most of the earliest, like Linear B, are syllabaries. The earliest known surviving examples date from the mid-15th century BCE and are written forms of Mycenaean Greek.
Fully segmental writing systems (please distinguish 'fully' from 'purely', as elaborated above) are definitely younger than the rest. The earliest examples of these are often abjads and abugidas, which either eschew vowel representations or form consonant-vowel digraphs respectively.
The Phoenician alphabet, or more correctly the Proto-Canaanite script used by the Phoenicians, is the earliest surviving example of a segmental writing system: it is an abjad whose earliest surviving examples date from roughly 1000 BCE.
The earliest preserved writings that aren't just numbers date to about 3100 BCE in Sumer, so they predate preserved phonetic (excluding syllabic) writings by about 2000 years; the two are on the order of 5000 and 3000 years old, respectively. So it's a big stretch to call phonetic writing a 'recent invention', but it is certainly not as ancient as logographic or mnemonic writing systems.
https://en.m.wikipedia.org/wiki/Syllabary and https://en.m.wikipedia.org/wiki/Linear_B
https://en.m.wikipedia.org/wiki/Abjad and https://en.m.wikipedia.org/wiki/Abugida
> The ampersand is the logogram &, representing the conjunction "and". It originated as a ligature of the letters et, Latin for "and".
餌 - animal feed, bait
遡 - go back in time; go upstream
遜 - humble, modest
謎 - riddle, enigma
餅 - rice cake; mochi
on the one hand I get that most browser (and OS?) dev happens in the West by people unaffected by these issues. On the other hand, with more than 1 billion people using languages not written in the Latin alphabet, I'd expect this kind of stuff to be more of a priority.
maybe this particular issue isn't that important? the one that bites me the most is pressing ESC to cancel IME editing and having it exit some dialog, because the browser/OS passed the ESC all the way down to the app when it was meant only for the IME. I get that there are probably no easy solutions, though
this is also a place where VSCode fails, because VSCode, like any other editor that does its text editing in HTML, needs more IME info than current browser APIs provide
The state of IME support in Windows is very poor.
WPF (legacy software?) does not support this at all, with reported bugs going back five years.
Winforms does, but doesn't meet my requirements.
HTML does define the inputmode attribute, but it only works as a hint to smartphone on-screen keyboards; desktop browsers ignore it entirely.
The Microsoft IME does not appear to have any API that can determine the current state or switch between modes. However, there is currently an HTML5 working group on an API for IME control.
In this day and age of multinational software, this is truly pathetic.
This script is called "Latin".
> on the one hand I get that most browser (and OS?) dev happens in the West by people unaffected by these issues.
As far as I can tell, this isn't in the browsers' hands. It's a font/unicode issue. Unicode has to encode these variants; and fonts have to support them.
That said, I am seeing that red table work in Firefox but not in Chrome; however this might just be because Firefox is my primary browser; and I've done a bunch of fiddling with font settings for any language I speak or am learning, and Chinese is one of them. The default OSX fonts may not be so good for that.
macOS and iOS are good examples: they ship with a good one for Chinese and nothing for Japanese. If you buy a Sony Android phone, you'll get good handwritten Japanese input out of the box.
The reason is that basic stroke-order information is freely available, but you need much more information to build a good system (common abbreviations and mistakes, for example).
For me, seemingly every example's leftmost and rightmost columns (for the sets of 3) looked identical, while the middle form was different.