
> like Chairman Mao Zedong, who seemed to equate Chinese modernization with the Romanization of Chinese script

One of Mao's better ideas




Romanization of Chinese writing was already proposed during the New Culture movement in the 1910s-20s. China's most famous modern writer supported it.

However, the Chinese language has evolved alongside the characters for about 3000 years, and it's very difficult to just separate the two. A huge amount of culture is bound up with the characters. Not only that, but the Romanized writing system is viewed as something that only little children use (as an aid to learn the characters). Once you've put in the effort to learn the characters (as about a billion people have), it's very difficult to accept their replacement by what is viewed as a script for children.


The nice thing about Chinese is information density of writing. Something nice about seeing how much information can be squeezed into a small space. Feels like you front load more on the learning side, but get rewarded when reading and scanning texts. Not sure how much scientific evidence is behind that, just an anecdotal observation. Relatively few Chinese speakers want to give up characters.


I’m not sure how much evidence there is for that either — a Chinese friend couldn’t believe that I could just look at a paragraph of English and instantly know roughly what it was about; she, despite her fluency in written English, thought only Chinese characters would allow for such rapid comprehension.

It’s certainly denser, though. And I agree about the front-loading of learning. It’s like learning vi. An absolute pain at first, then very comfortable.


I don't think (for me) Chinese or English reading is particularly different. In both cases you're scanning whole blocks (words, phrases) at a time. Sometimes I feel like I read Chinese slower purely because of how dense it is.


Yeah. That was pretty much my point — no native speaker is even looking at each letter (or even each word), wnlch js wzy yxu cyn upigemqand thws siktsmce wjtdut mnrh of a ptublim. Each word is its own shape, much like how Chinese speakers aren’t looking at each stroke.


Much like in English you can pull out lots of vowels and still read the text, you can cover up about the bottom 40% of the characters in a sentence and still read it.


> information density of writing

I feel like a proper comparison would not be number of characters, but a kind of pixel-budget, assuming both meet a certain reading speed and accuracy rate.


I was reading a Wikipedia page (https://en.wikipedia.org/wiki/Twelve_Metal_Colossi) and was struck by the difference in length of the Chinese quotes and translations. E.g.:

  收天下兵, 聚之咸陽, 銷以為鍾鐻金人十二, 重各千石, 置廷宮中. 一法度衡石丈尺. 車同軌. 書同文字.
was translated into

  He collected the weapons of All-Under-Heaven in Xianyang, and cast them into twelve bronze figures of the type of bell stands, each 1000 dan [about 30 tons] in weight, and displayed them in the palace. He unified the law, weights and measurements, standardized the axle width of carriages, and standardized the writing system.
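For anyone who wants to put rough numbers on that difference, here is a minimal sketch. It only counts glyphs and UTF-8 bytes for the two strings quoted above; the helper names `glyph_count` and `utf8_size` are made up for illustration, and neither number says anything about reading speed or the pixel budget mentioned upthread.

```python
# Rough density comparison of the quote above and its English translation.
# glyph_count and utf8_size are illustrative helpers, not a standard metric.
chinese = "收天下兵, 聚之咸陽, 銷以為鍾鐻金人十二, 重各千石, 置廷宮中. 一法度衡石丈尺. 車同軌. 書同文字."
english = (
    "He collected the weapons of All-Under-Heaven in Xianyang, and cast them "
    "into twelve bronze figures of the type of bell stands, each 1000 dan "
    "[about 30 tons] in weight, and displayed them in the palace. He unified "
    "the law, weights and measurements, standardized the axle width of "
    "carriages, and standardized the writing system."
)


def glyph_count(text):
    """Count letters, digits and ideographs; skip spaces and punctuation."""
    return sum(1 for ch in text if ch.isalnum())


def utf8_size(text):
    """Size of the text in bytes when encoded as UTF-8."""
    return len(text.encode("utf-8"))


for label, text in (("Chinese", chinese), ("English", english)):
    print(label, glyph_count(text), "glyphs,", utf8_size(text), "UTF-8 bytes")
```

Note that each ideograph takes 3 bytes in UTF-8, so the byte gap is smaller than the glyph gap.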


I don't speak Chinese, but my understanding is that it's not a totally fair comparison: classical Chinese text was often highly abbreviated, to such a degree that you have to be an expert historian to interpret it correctly.

For example, the characters comprising your example text start like:

collect (收) [from] [all] soldiers (兵) under the sky (天下), gather (聚) at(?) (之) Xianyang (咸陽), melt (銷) and (以) become(?) (為) bell-stand (鍾鐻) metal (金) person (人) twelve (十二) ...

As you can see, the English "translation" is more like an annotated translation. E.g., the original doesn't say who did it, or what he collected from soldiers: we just inferred "weapon" because what else could be melted into statues?

Similarly, "standardized the axle width of carriages" is just: cart (車) same (同) axle width (軌). We're supposed to infer "standardized" because we are talking about the Emperor's deeds.


Classical Chinese (also called Ancient or Old Chinese; multiple terms are used) is the language the quote was written in. It predates Middle Chinese and, by extension, all modern Chinese languages, and it had a very different grammar, many features of which have all but disappeared from all Chinese languages. Classical Chinese texts are incomprehensible to a modern Chinese person who has not first invested considerable time and effort in studying Classical Chinese.

There is a book, «Classical Chinese for everyone: a guide for absolute beginners» by Bryan W. van Norden, that is easy to read and gives a gentle introduction to Classical Chinese.

The old grammar and vocabulary, coupled with the Chinese style of writing metaphorically with abundant allusions and with the same Chinese characters having multiple unrelated meanings, make Ancient Chinese texts very terse and notoriously difficult to understand, even for educated Chinese people.


I just started learning Chinese about 2 months ago, to me it seems they stuff whole concepts into characters.

For example,

"去" (pronounced "Qú") is "going to the". "超市" (pronounced "Chao Shi") is "supermarket". "去超市" (pronounced "Qú Chao Shi") is "going to the supermarket".

3 syllables vs 7 syllables.

To me, it seems that instead of composing letters into words to convey meaning, they have more letters that are mini-words unto themselves.


Don't forget all the abbreviation. "超市", supermarket, is abbreviated from "超級", super, and "市場", market. The equivalent in English would be "sup-mark" or something along those lines. (Or in Japanese, just "super".)


Since we're now talking about verbal rather than written:

> No matter how fast or slow, how simple or complex, each language gravitated toward an average rate of 39.15 bits per second, they report today in Science Advances.

-- https://www.science.org/content/article/human-speech-may-hav...
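To make "bits per second" concrete: an information rate is the information per syllable multiplied by the number of syllables spoken per second. A minimal sketch follows. It uses a plain unigram Shannon entropy over a toy syllable list, which is an assumption made for illustration and not necessarily the estimator the study used; the function names and the sample data are made up.

```python
import math
from collections import Counter


def bits_per_syllable(syllables):
    """Unigram Shannon entropy (in bits) of a sequence of syllables."""
    counts = Counter(syllables)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_rate(syllables, syllables_per_second):
    """Bits per second = bits per syllable * syllables per second."""
    return bits_per_syllable(syllables) * syllables_per_second


# Toy data: four equally likely syllables carry 2 bits each.
toy = ["a", "b", "c", "d"]
print(bits_per_syllable(toy))      # bits per syllable
print(information_rate(toy, 5.0))  # bits per second at 5 syllables/s
```

The trade-off is visible in the product: a language with fewer bits per syllable can reach the same rate by being spoken faster, and vice versa.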


This tracks - it's difficult to speak at the same pace in Chinese as I can English. That said - are those 39.15 bits plaintext? Compressed? Encrypted?

The size of a word does not correlate with its concept - I still posit that some languages can transfer concepts faster than others, minus our baud rate.

Edit: Or, perhaps I am not as gifted an English speaker as my bias has presumed :| For example, I had to look up "syntagmatic".


Actually “去” is pronounced “Qù”


Thank you


> However, the Chinese language has evolved alongside the characters for about 3000 years, and it's very difficult to just separate the two. A huge amount of culture is bound up with the characters.

How did that work out for Korea when they switched to Hangul?


They are not comparable. The Chinese script was tailor-made for Chinese languages, while it was simply adopted by the Koreans, which arguably was a bad fit because Korean is 1) agglutinative and 2) not even a Sino-Tibetan language. Even then hanja is still part of the national education curriculum today (look up 한문 교육용 기초 한자).


Prewar Korean written script used Japanese style Kanji for nouns intermeshed between Hangul phonetics. Postwar, under US influence they transitioned into all-Hangul phonetic language, but IMO it looks like a big regression in their communication ability due to resulting arrays of pure homonyms.

They rely purely on context to distinguish {"apples", "apologies"}, {"mayor", "market"}, {"stomach", "ship", "pear", "double"}, {"acting", "delays", "smoke"} so on and so forth if what I'm scrolling is right. There's no tonal or character distinction. That surely isn't great.


Pure Hangul was used for a long time before then, just not in any kind of official capacity after Sejong. But e.g. most "women's literature" would be written in it.

And back when it was first introduced, it certainly did wonders for literacy. Although it should be noted that original Hangul was more phonemic wrt its contemporary Korean, and the letter shapes were a bit simpler as well.


I don't know where you got the idea of "under US influence", but mixed Korean/Chinese character writing was common in South Korea well into 1980s, long after Korea became its own country. For example, in 1987, the newly founded Hankyoreh newspaper made a splash by deliberately writing all articles in pure Korean script, which was not the norm until then.

Gradually more books and newspapers followed suit, because pretty much everybody found that writing everything in Korean letters actually makes communication less ambiguous and easier to understand. If your phrase is ambiguous between whether someone's offering apples or apologies, then you just change the word or add additional context to make it clear which one is being offered. It's no different from how English speakers deal with bear/bear, tear/tear, arm/arm, ground/ground, and so on.


Here is a wonderful article by John DeFrancis on the topic:

The Prospects for Chinese Writing Reform (2006)

https://sino-platonic.org/complete/spp171_chinese_writing_re...

It is cited frequently.


Almost all digital communication is written using pinyin, which today is almost all written communication


This is an extremely mainland-centric view. Cangjie is the dominant IME in Hong Kong.


That's why I said almost all


It’s only almost all if you only interact with the millennials or younger. Pinyin is an IME for Mandarin. If you aren’t fluent in Mandarin, chances are you use voice input or stroke typing.


Why shouldn't it be mainland-centric? Mainland China is 99.5 percent of the population of China. That's like refuting a claim about Americans by calling it "a very non-Pennsylvanian view".


Because China is not the only place where Chinese languages are spoken. There are more than 10 million ethnic Chinese in Southeast Asia alone. And it’s not only a mainland-centric view: it’s a mainland–Mandarin speaking centric view.


Pinyin is used as input to select characters, but the final text that's used to communicate is composed of characters.
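A toy sketch of that flow, for readers who haven't used a pinyin IME: the user types Latin letters, the IME offers candidate characters, and only the chosen characters end up in the text. The candidate table and function names here are hypothetical and tiny; real IMEs use large dictionaries and ranking models.

```python
# Toy pinyin input method. CANDIDATES is a made-up, three-entry table;
# a real IME ships a large dictionary and ranks candidates by context.
CANDIDATES = {
    "qu": ["去", "取", "区"],
    "chaoshi": ["超市"],
    "shi": ["是", "市", "十", "事"],
}


def candidates(pinyin):
    """Return the candidate characters offered for a typed pinyin string."""
    return CANDIDATES.get(pinyin.lower(), [])


def compose(choices):
    """Build the final text from (pinyin, chosen_index) pairs."""
    return "".join(candidates(pinyin)[index] for pinyin, index in choices)


# The user types "qu" and "chaoshi" and picks the first candidate each time.
print(compose([("qu", 0), ("chaoshi", 0)]))
```

The pinyin never appears in the output; it is only the lookup key.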


Thank God it didn’t happen.


Much of the simplification adopted shorthand already in common use, which is why Japanese shinjitai simplification independently arrived at many similar characters and patterns. The second simplification round was an abysmal newspeak-esque failure, and thank goodness _that_ wasn't adopted either.


pinyin is the best thing that happened to the language after simplification.

Not only did it propel literacy rates to basically 100%, but it added a phonetic component to the language


Again, this is a very mainland-centric view. Hong Kong has never simplified their writing system or even developed a proper romanization, and yet has consistently one of the highest literacy rates in the world. Guess what helped literacy? Post-war socioeconomic development like poverty reduction, mass education and industrialization.

> it added a phonetic component to the language

Fanqie has been a thing since the 2nd century. Zhuyin was invented in 1913.


Agreed. I have seen kids from mainland China spending lots of time learning pinyin while kids from Hong Kong at the same age can already write some characters and pronounce the words accurately


Simplification is just bad. It removes so much that it breaks the ability of non-speakers to infer meanings. Complexity of letter shapes is irrelevant to ease of use on computers, so it's just a massive loss.


>it breaks ability for non-speakers to infer meanings

Not sure what you mean by this. Do you mean that it's less convenient for people that don't speak / read Chinese? Why would that be a relevant metric?

You may be missing that character standards have changed over time and that different writing styles (草书,行书) are implicitly simplifications. You can think of Latin or Russian cursive as a simplification of the printed letters.

In practice, the phonetic component has been mangled / evolved over time, so simplification doesn't make things more or less difficult for students (be it 5 year old native speakers or 50 year old non native speakers).


Worked out excellent for Korean (Hangul) though. Also English.

Both massive wins


I don't think it did for Korean, though I need input from speakers to be sure. From my experience, Korean MT routinely stops halfway through inputs and dumps nonsensical phonetic transcripts, likely from failing to identify words. I suspect they were just being complaisant to American influence in postwar years. Computers failing to even isolate and match words in this day and age is not a sign of an excellent working script.


Translation needs phonetic transcription to handle proper names. If there are words that may or may not be proper names depending on context, machine translation will guess the context wrong at least some of the time and phonetically transcribe what should be translated, or translate names that should be transcribed.

The problem can also happen when translating from English, if you think about all the surnames that are occupations, or names like "bill" or "lily." Capitalization usually helps disambiguate, but there's title case and all caps and people who never capitalize anything...


It's not just proper nouns. Korean MT seems to routinely "de-synchronize" into wonbonhangugeotegseuteububun mid-sentence; sometimes it comes back in sync, and sometimes it stays out of sync until the end of the sentence. It happens way more often than average with the Korean language.


Do you have an example input where that happens?


Simplified Chinese characters are already difficult enough for foreigners to learn. Making them learn traditional characters would just be sadistic.


Traditional characters are built on common parts for pronunciation and meaning cues. Simplified removed that, so IMO it compresses worse and is therefore harder. It's visually less dense, but, so what.


Those cues are there in exactly the same way in most simplified characters.

The cases where simplification has removed those cues are rare enough that the extra complexity of traditional characters is really not worth it.

I've never heard anyone claim that simplified characters are more difficult to learn, and it just seems false to me.


“Literacy rate” is just a bureaucratic index. It was increased in most countries with mostly the same measures, no matter what their writing system was. If you look closely, “literacy” meant “making masses of workers and soldiers capable of following basic instructions”, and there often was not much for them to read except for parroted propaganda (obviously, I'm not talking about China specifically, as it has been the same everywhere).


Phonetics can be counterproductive to comprehension, or converting meaning to text. Take an example much closer to English: Scottish Gaelic, which is written with the Latin alphabet. It's considerably older than English, has more distinct consonants and vowels, and it is really difficult to guess the pronunciation from a written word if you only speak English (unlike Welsh, which has nice orthography and is easier than English in that respect).

Because of these difficulties, there is a long tradition of anglicising names of settlements to meaningless collections of letters which when read by an English speaker approximate vaguely to the original Gaelic name. Unfortunately this is not a reversible process - you can't look at a modern anglicised name and guess what the Gaelic is, in general.

Now while Gaelic has a tiny population of native speakers, there are millions of people who know some "map Gaelic" - that is, we can look at a map with Gaelic place names, and understand the elements. It doesn't work for towns and villages, but generally in the north, no-one bothered to anglicise the names of natural features, just the settlements - and walking is the most popular outdoor recreation in the UK, so we learn this when we read maps.

When the first SNP government of Scotland came in, they introduced bilingual road signs, even in areas where Gaelic is no longer spoken. There was and is complaint over this, but I found that things became much clearer. I could look at a placename like Machrihanish, and see that it is Machaire Shanais. I still don't know what Shanais means, but Machaire is a type of landscape that I know, so I immediately know that this is low-lying and grassy, and fairly level. I can do this for thousands of place names without being able to reliably tell how to pronounce the words - similar to the way that the pronunciation of a word indicated by a Chinese character can vary widely with the part of China, so that the pronunciation becomes quite secondary to communication.


Uh... no. Bopomofo which is used in Taiwan is a phonetic script that is used as a popular IME.

And simplification's only "arguable merit" is that it saves a fortune in ink at the expense of losing its historical roots. But guess what? We mostly use computers now. So great job Mao, now we have two competing standards. (Nod to XKCD).

Unrelated but to those of us who started with 繁體字, simplified just looks ugly. (龙 vs 龍)


Sure traditional looks nicer, but holy fuck is writing it (by hand) ever annoying. When I asked friends who grew up with the traditional characters about it they said a lot of people use some form of simplification when taking notes or leaving messages for friends/family. People from the mainland seem to only shorten words by omitting characters of longer words, if at all.

And about losing the historical roots, I guess if you're interested in it, the characters will always be there and accessible for you to study. I'd be interested how much the average Joe from Taiwan really remembers about random characters' roots, composition and meaning. I know many more people from the mainland, and among them are people who don't give a shit, and those who can also write a lot of traditional characters and give lectures about the origin of meaning of some character and whatnot.

Also, since this is about computers after all, I saw a study a while ago from the mainland where they tested how many mistakes people make writing less common characters. There was a bar chart that went down between 10 and 20ish, then went up a bit and started to go down again at around 30. It was speculated that people in school still have to write a lot by hand, and during/after college that stops and everything has been digital for a decade now so people just forget again, but folks old enough to have used pen and paper for a couple decades just had enough practice. I wonder if this effect would be more or less pronounced with traditional characters.


I feel like Japanese strikes the right balance, no ugly oversimplified characters but making common kanji easier to write for children (國→国、櫻→桜)

For example 竜 is a fairly common simplification of 龍 and imo not nearly as ugly


There are some strange-looking ones too (圖-図、圓-円), but agree that overall it was lighter touch. I think all simplification projects have an inherent awkwardness in taking handwriting shorthand or cursive and trying to reformalize it back to print. In any case it's a shame that there was no coordination due to obvious geopolitical conflicts, that we're now left with 3 sets. It was easier last time, 2.2k years ago when some dude took over all places that wrote Chinese characters and forced a single way of writing :)


Yeah except hiragana and especially katakana both look ugly though.


> simplified just looks ugly

I prefer simplified for the aesthetics alone. Traditional is cringe and ugly in typed form.


Vietnamese is relatively OK.


Chữ Nôm is a borrowed writing system and not native to Vietnamese, which isn’t even a Sino-Tibetan language to begin with.


Latin is a borrowed writing system not native to English, German, Polish and many others which aren’t even Romance languages to begin with and must resort to di- and trigraphs plus non-Latin characters like J, V, ß, ł or å, among others (not to mention diacritics).


Alphabets are much more flexible than the Chinese characters.

An alphabet can be adapted to basically any language. You just have to map the letters to the sounds, and you're pretty much done.

By contrast, the Chinese writing system is adapted very specifically to the properties of Chinese language. Every syllable in Chinese has a meaning (or set of meanings), so every character represents one meaning (or a few). English does not have that structure: words can have very arbitrary syllables that don't have any meaning on their own. Chinese characters encode a meaning plus a sound, which is often reflected in how they're composed (i.e., a character will often be composed of two simpler characters, one of which has the correct meaning and one of which has the correct sound). Chinese words do not change form: there's no conjugation, no plural form, etc. As a consequence, the writing system has no way to deal with things like conjugation.

I have no idea how one would even begin trying to adapt Chinese characters to write English. On the other hand, it's relatively easy to come up with a way to write Chinese in any alphabet.


"Å" is just "O" stacked on top of "A" though. And "V" is in fact the OG Latin form ("U" is the newly introduced one).

But yeah, the whole notion is kinda silly. Most writing systems in the world are developed from very few originals. E.g. for most of Eurasia, the source is either Egyptian hieroglyphs or the Shang Oracle bone script.


Relatively. The amount of diacritics on Vietnamese surpasses European languages so text rendering becomes a challenge if a naive developer doesn't test with Vietnamese.
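One concrete way this bites, as a minimal sketch using Python's standard `unicodedata` module: a Vietnamese letter with two diacritics can be stored either as one precomposed code point or as a base letter plus separate combining marks. The sample word and the `same_text` helper are chosen for illustration only.

```python
import unicodedata

word = "Việt"

# NFC prefers precomposed code points; NFD splits letters into base + marks.
nfc = unicodedata.normalize("NFC", word)
nfd = unicodedata.normalize("NFD", word)

print(len(nfc), [hex(ord(ch)) for ch in nfc])
print(len(nfd), [hex(ord(ch)) for ch in nfd])


def same_text(a, b):
    """Compare two strings after normalizing both to the same form."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# The two forms render the same but are different code point sequences,
# so a naive == comparison or length check treats them as different strings.
print(nfc == nfd, same_text(nfc, nfd))
```

Code that measures width, truncates, or compares strings by code point without normalizing first will behave differently on the two forms.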


Is bringing back Chu Nom script going to simplify Vietnamese support on computers by a lot? It's unintelligible to CJK users, but as far as text rendering goes, it seems just simple Kanji/Hanzi.


The Vietnamese romanized their writing, and they seem to be doing fine.


This isn’t factually correct. The French colonial administration romanized their writing and enforced chữ Quốc ngữ.



