I read a page at a time, looking up and writing all unknown vocabulary in a notebook. There could be dozens of new words on the page. I then reread the same page. Usually I can read the entire page through quickly the second time, because it's fresh in my memory. However, when the same words come up again later I may have forgotten the reading or meaning, so they get another entry in the notebook (even if I recognize them). After noting the same word a few times, I may start to remember it.
The advantage of reading books, in my opinion, is that they tend to use the same vocabulary over and over. You'll eventually remember the most frequently used words naturally. The other advantage is that vocabulary appears in context. This is better for both remembering and for understanding. This is why I don't use flashcards, and just let the book be my study tool.
It's much easier to use this method if the book is selected carefully. It should be relatively easy, but still interesting. Books that might be assigned reading in middle school or high school are, I feel, often good examples.
Every day, I'd do that, and also do my SRS reviews. Every week or so, I'd reread the whole week's text. If it was a particularly difficult book, then I might do the same thing after a month (i.e., reread the whole month's work).
Almost always, by the time I got halfway through the book, I didn't need the SRS software anymore. Authors tend to use the same vocab and grammar over and over again. So I could read several books written by that author with almost no difficulty (although it depends on the level).
Anyway, what I found was that this was the fastest way to bootstrap me to free reading. If I didn't memorise vocab as I went, then I'd need to look up stuff for the entire book (and beyond). But if I did memorise vocab, then I could easily get to a point where I'm just reading casually without any desire to look stuff up. Studies (which I don't have pointers to at the moment, unfortunately) have shown that you need 95% comprehension (on average) to learn new vocabulary and grammar from context. So the key is to get yourself there (in the context of what you are reading) as quickly as possible. It makes a big difference.
A pointer to the study that you're probably thinking of is research done by Paul Nation and Robert Waring (vocabulary researchers). They cite their own 1985 study and a 1989 study with the following quote: "With a vocabulary size of 2,000 words, a learner knows 80% of the words in a text which means that 1 word in every 5 (approximately 2 words in every line) are unknown. Research by Liu Na and Nation (1985) has shown that this ratio of unknown to known words is not sufficient to allow reasonably successful guessing of the meaning of the unknown words. At least 95% coverage is needed for that. Research by Laufer (1989) suggests that 95% coverage is sufficient to allow reasonable comprehension of a text. A larger vocabulary size is clearly better." 
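To make the 95% figure concrete, here is a minimal Python sketch of how coverage of a text could be computed. The function name `coverage`, the whitespace tokenization, and the toy sentence are all assumptions for illustration; none of them come from the Nation and Waring paper.

```python
def coverage(tokens, known_words):
    """Return the fraction of running words (tokens) found in known_words."""
    if not tokens:
        return 0.0
    known = sum(1 for t in tokens if t in known_words)
    return known / len(tokens)


# Toy example (made up): 20 running words, exactly one of them unknown.
text = ("the cat sat on the mat and the dog sat on the rug and "
        "the cat saw the quokka today")
tokens = text.split()
known_words = {"the", "cat", "sat", "on", "mat", "and",
               "dog", "rug", "saw", "today"}

print(f"{coverage(tokens, known_words):.0%} of running words are known")
```

Note that coverage counts running words, so a frequent word like "the" counts every time it appears. That is why a fairly small vocabulary can cover a large share of a text.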
However, I have no doubt that the paper I was reading references this one. Great find. Thanks!
Perhaps you're thinking of glosses? I'm not familiar with graded readers, but I am rather familiar with mediæval manuscripts, wherein it's common to see copies of Latin works with little annotations near certain (sometimes all) words. When glosses have been added above (or below) the word(s) they annotate, they're said to be interlinear, but marginal glosses aren't unheard of.
The glosses were sometimes written by the same scribe that made the copy, but often they appear to have been added later, perhaps by the owner of the book—sometimes in a comically small hand so as to fit in narrow spaces :)
Thus, I found myself to be more successful with a notebook.
There is another reason though. In addition to the word I write the Japanese definition of the word as well (from a children's dictionary). I mostly avoid using/memorizing English translations. This is just my personal preference, and certainly not the speediest or most effective way depending on your goals.
So it's much better for over-learning on known words, and less good at acquiring tons of new ones.
Related: Do 20 pages of a book give you 90% of its words?
HN discussion: https://news.ycombinator.com/item?id=14673229
At a convenience store or book store I find the display of "new/popular" books, pick one up, and read a few paragraphs to check the level of difficulty and for cursory interest. If it seems about right—I could generally understand even without recognizing all the words—buy and try.
Recently I read Harry Potter #1 and カエルの楽園, which turned out to be a politically motivated parable about Japanese rearmament. They were both at a great level for me. Currently I am reading ハツカネズミと人間 (Of Mice and Men) which was given to me by a friend. Next up is a book on eating one meal a day.
If this technique did help at all I think the effect was mainly psychological. Looking at a page with a bunch of words you don't know is stressful and demoralizing. Even being able to think that you have seen a character somewhere without knowing what it means can be comforting. It sounds silly but any amount of stress can severely inhibit language acquisition. You might be better off meditating for 15 minutes... Or taking drugs
The only thing that seemed to be 100% effective was reading content that I was completely engaged in. It didn't matter if said content was close to my skill level or way beyond my skill level. As long as I am sufficiently engaged it doesn't matter how many times I have to reference the dictionary or comb Google for grammar explanations.
I think any time I mistakenly use SRS expecting that it represents the learning process rather than facilitates it, I'm prone to this syndrome. More explicitly, the learning process involves active use and problem solving, creating a web of connections between concepts. SRS merely changes conceptual recall times. For a CS analogy, it moves the memorized concept to main memory instead of residing on a slow disk or over the network. Cumulative slow seeks frustrate learning, so this is useful. But having something in fast memory isn't useful if you don't have an index to it -- the absence of which (contextually) is flashcard blindness.
I've felt this problem a lot too, but haven't found a good alternative for self-study. Anki/flashcards need to be a support that uses a minority of your study time and links closely to the other parts. This is easier when following a class than when doing self-study.
It is so easy to build elaborate flashcard strategies and get lost in implementing them.
Flash card-based methods are very fast, especially in the beginning. Reading goes further, but is too slow for the first maybe 1000-2000 words (depending on the language).
First of all: Don't start reading a real book at too low a level. It's absolutely fantastic that the author has the energy and perseverance to attack a real book, but it can quickly end in frustration for most people. The author says he passed HSK4. Let's assume that his Chinese was actually a little bit better than HSK4, which would correspond to a vocabulary of around 1800 words. This sounds like a lot but it's actually...nothing. I can only assume that his real level was significantly higher than HSK4, otherwise he would have seen many more than 5-10 new words per page (20 or 30 per page would be more realistic for the first 50 pages in my opinion).
Second, in my experience, even if you know all words of a Chinese text, you are still very far from understanding it. The grammar is a bitch. What will probably happen is that you will stare at what you assume is a sentence and wonder whether you accidentally got some old edition where the characters are printed from right to left or vertically. It's a pity that the author has solely focused on vocabulary acquisition in his article. That could give readers who only know Indo-European languages the impression that Chinese is somehow like Spanish (or even Farsi) where the sentence structure is similar enough to English and where just knowing the words can already be enough to enjoy a book.
I was curious though, when you said Chinese grammar was tough, what specifically did you think was tough? I always sell Chinese grammar as being super simple. It’s the easiest grammar of any language I’ve ever studied so far. I understand how depending on what language you come from other languages can be easier or harder of course. (E.g. Koreans can learn Japanese easier than Americans, etc.)
Definitely. For me (native speaker of an Indo-European language), learning Chinese is like learning a functional programming language when you only know C. Give me a subject, a verb, and some objects and I can build you a more or less natural sounding sentence in five different European languages (two of them I have never used in an oral conversation). When I tried the same in Chinese, I would first have to consult my grammar book to see what specific pattern to use, decide whether it would be more natural to have a topic in front of the sentence, check the previous sentence to see whether I need the subject at all or whether I can just attach the verb to the previous sentence, and then finally add a couple of "le" here and there :) and still have a result that is wrong because "that's not how you say it".
A grammar book can't tell you this, because it's a style choice. All the grammar book will tell you is how to do it. For what it's worth, topic fronting is a standard feature of English too.
> and then finally add a couple of "le" here and there
Yeah... I feel your pain there.
Thanks for your response!
I've often heard successful learners would learn by rote memorization first, up to a very high level of characters, and only then start picking up grammar and the rest from reading or conversations.
I think the best way to tackle this is to just try and try again: read the sentence a second time, read the next sentence and see if you can understand the previous sentence with the added context, or just be content with an incomplete understanding; maybe that grammar usage will be easier to understand the next time you see it. Another good way is to read textbooks. Textbooks are great for grammar (although spaced repetition tends to work better for vocabulary).
Concerning your tool, how well does the word segmentation of jieba work in your experience? I tried a few, but at the end I always returned to http://mandarinspot.com/annotate
(Edit: because I can add whitespaces to segment words manually if I see that the output doesn't make sense)
I thought of trying to read The Three-Body Problem along with the audiobook; however, the one I found online doesn't seem to contain all the chapters.
Learning Chinese based on characters is going to be very frustrating. Most words have two characters and knowing their meaning only helps slightly with guessing the meaning of the word.
In the SUBTLEX-CH frequency data, 94% of characters are among the top 1000, but only 82% of words. So it's quite common to see a sentence where you know all the characters, but don't understand it because you don't know how they group into words.
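The gap between character coverage and word coverage can be illustrated with a small Python sketch. The function name `top_n_coverage` and the tiny hand-segmented sample are assumptions made up for illustration; this is not SUBTLEX-CH data and the numbers it produces say nothing about real corpora.

```python
from collections import Counter


def top_n_coverage(counts, n):
    """Share of all occurrences accounted for by the n most frequent items."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    covered = sum(c for _, c in counts.most_common(n))
    return covered / total


# Made-up, hand-segmented sample: each list item is one word.
segmented = ["我", "要", "牛肉", "他", "需要", "牛奶", "重要", "的", "肉"]

word_counts = Counter(segmented)
char_counts = Counter(ch for word in segmented for ch in word)

print("top-3 characters cover", round(top_n_coverage(char_counts, 3), 2))
print("top-3 words cover     ", round(top_n_coverage(word_counts, 3), 2))
```

In this sample 要 shows up inside three different words (要, 需要, 重要), so knowing the character covers a lot of the text while each word still has to be learned separately.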
It's really infuriating how many two-character words contain an extremely frequent character (like 要 yào) and you still have no idea what the word means.
1. Has different character sets, including traditional and simplified.
2. Has tens of thousands of different characters.
3. Has both conjugates and homophones.
4. Makes new words by combining characters.
5. Has completely different ways of speaking; none of which are easy.
6. Generally assumes that the other communicator can use the context to denote the tense.
Like. Guys. Come on.
If you were combining characters to make words why bother with tens of thousands of characters in the first place?
2. Only ~2000 needed for normal vocab. Compare this with 10000+ for English.
3. Mandarin in particular has a lot of homophones but it's practically not a problem since terms are comprised of one or more sounds and are thus easily disambiguated.
4. This is a good thing, vocab is much lower (see point 2)
5. Same with any language?
6. All humans use context
2. You only need to know several thousands.
3. Chinese uses no conjugation (I guess you meant something else). All languages have homophones.
4. That's good. That's why you only need several thousand words.
5. I'm not sure what you mean.
6. It simply doesn't convey tense most of the time.
Those tens of thousands is what you get if you count the most obscure characters you can find, that nobody really uses.
Also useful for this kind of analysis: https://www.chinesetextanalyser.com/
I'm doing something similar but instead of reading native books I'm still sticking to readers with increasing difficulty so that I don't have to look too many words up while reading. Since the readers also include word lists I can learn almost all of the words beforehand.
My reading list (traditional characters) is here: https://www.chinese-forums.com/forums/topic/44336-graded-rea...
having actually learnt to the point of watching news tv shows and reading chinese forums i can say that going the classroom or exam route is a waste of time and money
learn the top 50 to 70 chars then start watching dual subtitled programs like cctv4
do it actively, meaning pausing, rewinding, attempting to read subtitles before the speaker, etc., but never touch a dictionary in flow; only much later, if you couldnt figure it out from context repeatedly, ie more than 6-7 context attempts fail
1.5 years and you will be much ahead of anyone else
If it weren't for Hollywood movies and Western RPGs, I wouldn't have found my inner calling to conquer English at all. I picked up writing simply by wanting to have a reasonable discussion with people on Reddit (well, that was years ago, not sure it is possible nowadays).
To some degree, I see a similarity between how humans and algorithms approach language. The trick is always data-driven: expose yourself (or the model) more to the true distribution, which in this case is the society where the language is used natively. The more the better.
It has a huge database of modern Chinese TV shows (ranging from ancient dramas to modern CSI-esque crime).
But the greatest part about it is that it has a "learn mode" that provides instant translations when you hover over characters you don't know, and pauses the show automatically.
i learnt chinese pretty much just watching Homeland Dreamland 远方的家 and Across China 走遍中国 from about 2016 to mid 2017 after which i was able to watch purely chinese shows without english subtitles
these are travel and tech shows so you learn about people places and lot more than just the language
also keeps it fresh and never boring so you never lose motivation for long
100s of hours of content all dual subtitled
It also has support for simplified/traditional conversion, bopomofo, Taiwanese and Cantonese, and typing from pictures.
The most useful texts are bilingual. For me, I was reading the Bible, song lyrics, comics, and subtitles. I want to upload those, but I've been threatened with copyright issues.
I've got a rough prototype of a tool for finding unknown words in text in European languages, but you've gotta mark them as known rather than integrating with Anki: https://words.sh/
Anyway, until that stage flashcard based methods are faster and I find it easier to stick with that.
I recommend "Decipher Chinese" for beginning to advanced Chinese learners. Also Duolingo has a nice conversation-based course, though I recommend using something else to get a grasp of the first maybe 400 characters. You need to know how characters work. You really need to write them with your hands, either with pen and paper or on a tablet/phone. My theory is that this is necessary for an efficient "neural encoding" of Chinese characters. They are designed to be written in a certain way, as if you had a wet brush in your right hand.
Part of my difficulty learning Chinese is that I taught myself, primarily by studying vocabulary and learning characters rather than practicing conversation with other people, and I think this is a bad way to start. It's much better to put in far, far more speaking and listening practice than reading/writing. I didn't do so, and now I can read Chinese text and communicate in chat, but have difficulty listening. And that's a shame, because I think once your listening skills get reasonably good, you can start watching Chinese TV and movies, and bootstrap a lot faster -- and in an entertaining way that seems less like mentally taxing work.
In the case of Chinese novels, depending on the author's style, idioms will not be a huge problem as long as you are not reading period novels. However, because the Chinese language is so flexible, authors tend to combine characters into new words of their own to keep their language fresh, which may cost local readers little effort but can present great challenges to language learners.
This goal is difficult the way you put it. In Chinese, besides the reduction in strokes, a number of traditional characters (of different or similar meanings but mostly pronounced the same) were merged into one simplified character. In other words, the traditional character set has an n-to-1 relationship with the simplified character set. For most characters n=1, but you also see n=2, 3, ... quite often.
For this reason, you often see articles that read awkwardly after being converted automatically from simplified characters into traditional characters. The other direction--traditional character to simplified character conversion--has no such problem.
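The asymmetry can be shown with a short Python sketch. The table below holds only a handful of well-known merged characters and the function names are made up for illustration; a real converter needs a full mapping and word-level context to pick the right traditional form.

```python
from collections import defaultdict

# A few well-known many-to-one pairs (traditional -> simplified).
TRAD_TO_SIMP = {
    "發": "发",  # to emit, to send out
    "髮": "发",  # hair
    "後": "后",  # after, behind
    "后": "后",  # empress
    "麵": "面",  # noodles, flour
    "面": "面",  # face, surface
    "離": "离",  # to leave, to separate
}

# Invert the table: one simplified character may map back to several.
SIMP_TO_TRAD = defaultdict(list)
for trad, simp in TRAD_TO_SIMP.items():
    SIMP_TO_TRAD[simp].append(trad)


def to_simplified(text):
    """Traditional -> simplified is a plain per-character lookup."""
    return "".join(TRAD_TO_SIMP.get(ch, ch) for ch in text)


def traditional_candidates(simp_char):
    """Simplified -> traditional may return several candidates."""
    return SIMP_TO_TRAD.get(simp_char, [simp_char])


print(to_simplified("理髮"), to_simplified("出發"))
print(traditional_candidates("发"))
```

Both 理髮 (haircut) and 出發 (to set out) end up with the same 发, so going back requires knowing which word the character belongs to. A character-by-character converter that always picks one candidate produces exactly the awkward text described above.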
For example, COW + MEAT = BEEF.
牛 = Cow
肉 = Meat
This is a very simple example but there are more complicated words such as Divorce.
In Chinese this is 離婚 which is 離 (from or without) and 婚 (marriage) but it's hard to learn that this is actually Divorce in English.
Edit: Wife is Taiwanese and I'm trying to learn Chinese.
But going the other way, for someone learning English, this is a problem. They may very well understand and know "cow meat", but have no idea about the word "beef" if they have not encountered it before.
Just like 对不起人 translates to “Canadians”
For example, 離婚 is in the file.
- trad: 離婚
pinyin: lí hūn
- to divorce
- divorced from (one's spouse)
- trad: 我想她會和他離婚。
pinyin: wǒ xiǎng tā huì hé tā líhūn 。
eng: I think she will divorce him.
In this case, I think actually neither should be on the list, and you should instead learn 牛，羊，and 肉 separately and infer the meaning of 牛肉 and 羊肉 from those. I'll change it.
It's something you learn over time, for the ones that are more obvious at least. Not all combination words are straightforward, but many basic words are. "羊肉" is literally "lamb meat" and if you're told 牛 means "cow" you can probably guess "牛肉" means "cow meat". And if 豚 means "pig", what do you think "豚肉" means? And 魚 means "fish" so "魚肉" means? They aren't always so perfectly straightforward, but I bet you can guess what this means too: 消防士 "extinguish (a fire), defend, gentleman."
This type of meaning inference doesn't always work, though it will for the vast majority of what many would consider "basic vocabulary".
In my opinion, reading traditional characters with spaces is much easier than simplified characters without spaces.
I do notice that some generated words are bogus (compounds of other words or redundant in some way). I have a file that lists redundant words, and when I notice these words I add them to the file so the tool won't generate them again. It also lists the words that they are duplicates of, so that those words can be upranked. This is the file if you're curious: https://github.com/kerrickstaley/Chinese-Vocab-List/blob/mas...
1. (verb) leave; part from; be away from -- 我的夫人離我而去 My wife left me.
2. (verb) separate
3. (verb) defy; go against
4. (preposition) distant / apart from -- 北京離上海有多远？ How far is Beijing from Shanghai?
5. (noun) name of one of the 8 Trigrams
6. (noun) name of one of the 64 Hexagrams
The sense of 離 is separation; it makes perfect compositional sense in 離婚. A divorce involves two people starting together and going apart.
That said, I wonder if it also creates cards for the individual characters when they make sense. I know that for me, it's often easier to learn a composite word if I know the individual characters, and it helps you figure out other likely words (e.g. cow + meat = beef, so pig + meat = pork).
Also, in order to generate a flashcard for such a word, it would have to appear in a standalone context, e.g. 我没吃过牛肉 would not generate flashcards for 牛 or 肉, but 牛不吃肉 would.
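The standalone rule described above might look roughly like this in Python. This is a guess at the behaviour, not the tool's actual code: the function name `flashcard_entries` is hypothetical, and the sentences are segmented by hand here, whereas a real pipeline would get the tokens from a segmenter such as jieba (whose output may differ).

```python
def flashcard_entries(segmented_sentences):
    """Return each distinct token once, in order of first appearance.

    A character only gets its own entry if the segmenter emitted it
    as a token of its own; characters inside longer words are skipped.
    """
    entries = []
    seen = set()
    for tokens in segmented_sentences:
        for token in tokens:
            if token not in seen:
                seen.add(token)
                entries.append(token)
    return entries


# Hand-segmented examples (assumed segmentation).
compound = [["我", "没", "吃过", "牛肉"]]   # 牛肉 appears only as one word
standalone = [["牛", "不", "吃", "肉"]]      # 牛 and 肉 appear on their own

print(flashcard_entries(compound))
print(flashcard_entries(standalone))
```

With this rule the first sentence yields a card for 牛肉 but none for 牛 or 肉, while the second yields cards for both single characters.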
The Michel Thomas method has worked pretty well, much better than Pimsleur, for me:
And the Yale "Speak Mandarin" series with romanized Chinese is also really helpful.
Learning Mandarin, overall, carries a much higher cognitive load than, say, Spanish, because the sounds don't map to 26 letters. They map to 1000s of characters that must be memorized. This makes reading a much less useful way for a beginner to approach a language.
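import re
import requests
from bs4 import BeautifulSoup
# Assumption: r is a requests response; url (not given in the original) is the page to scrape
r = requests.get(url)
soup = BeautifulSoup(r.text, 'lxml')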
RE = re.compile(u'[^⺀-⺙⺛-⻳⼀-⿕々〇〡-〩〸-〺〻㐀-䶵一-鿃豈-鶴侮-頻並-龎]', re.UNICODE)
# txt = soup.find_all(string=RE)
# Strip everything that is not a Han character, leaving only the Chinese text
txt = re.sub(RE, '', soup.get_text())
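The filtering idea in the snippet above can be tried offline with the standard library alone. The sketch below is an assumption-laden simplification: it uses only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF), which is narrower than the ranges in the original regex, and the names `NON_HAN` and `han_only` and the sample string are made up for illustration.

```python
import re
from collections import Counter

# Anything outside the basic CJK Unified Ideographs block is dropped.
# This is a narrower range than the regex in the snippet above.
NON_HAN = re.compile(r"[^\u4e00-\u9fff]")


def han_only(text):
    """Remove every character that is not a basic Han ideograph."""
    return NON_HAN.sub("", text)


sample = "Hello, 我想吃牛肉! 123 牛不吃肉。"
chars = han_only(sample)
print(chars)

# Count how often each character occurs in the filtered text.
for ch, n in Counter(chars).most_common(3):
    print(ch, n)
```

A count like this is character-based. To find unknown words rather than unknown characters, the filtered text would still need to go through a word segmenter.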
Get a book that is written in conversational language and learn the vocabulary of a page before reading it. This probably won't get you speaking or writing in that language, but being able to read it is a good start.
I tried also:
* special books: but they're impractical, as they usually don't translate enough words for my level
* webpages with perapera plugin: font is too small, translation is unaware of context and the texts I found were not easy to read
Why is that? Living in England, there weren’t many German-speaking people around so I didn’t get much out of it. Even when I went to Germany, they all spoke English better than I spoke German so we mostly ended up conversing in English.
Chinese on the other hand... there are lots of Chinese speakers in most places, and they’re mostly very happy, relieved and even excited to speak Chinese with you. If you ever go to China, speaking Chinese opens you up to a whole host of new experiences. Few people confidently speak English there so you can get a lot of practice very quickly.
I do agree that ROI on reading/writing is low, and it’s mostly due to the insane difficulty of memorizing characters. My suggestion: learn how to read and type simplified Chinese. Don’t bother with writing (which interestingly is much harder than typing on a pinyin keyboard), or traditional Chinese (there are tools which convert back and forth between the two if you need).
I personally find traditional Chinese characters easier to learn to read than simplified characters. Simplified characters are definitely much easier to write but that's not very useful in modern times. Plus learning traditional Chinese characters helps with Japanese.
The reason that traditional Chinese characters are easier for me to learn is that they keep components that hint at the meaning and that are left out of the simplified versions.
For example, hear is 聽 (traditional) and 听 (simplified). The traditional character has 耳 which means ear, the new simplified character has 口 (mouth) and 斤 (axe). That's not necessarily the best example since some of the other components of the traditional character are a bit weird in a character related to hearing but I do feel that it's easier to derive the meaning from the traditional character.
Likewise, I prefer 愛 (love) to have the heart component 心.
But realistically, Mandarin is very hard to learn. We study it here in China every day until the last day of high school, and the daily Chinese classes take no less time than any other major course. That's a serious time investment even for native speakers, yet many college students are still basically asses at it.
I'd say the information available in its language world is low quality. Large parts of it were translated poorly from English, and it is nowhere near the English-language one, since English is the world language. So you are basically reduced to using it for everyday conversation. If you really want to learn it, I suggest you don't invest more than that.
And I oppose teaching it in foreign middle schools, except to Chinese Americans. It's good to dabble in for some basics, though.
this roi is pretty good
also have huge respect for the chinese ultra realist world view though i dont agree with it
a story like the three body problem is basically impossible from anywhere else
chinese is also devastatingly succinct on a whole another level and much appreciated