
“Prestudy”: Learning Chinese Through Reading - KerrickStaley
http://www.kerrickstaley.com/2018/09/04/chinese-prestudy
======
dwg
I am reading novels in Japanese. My method is similar but without the
flashcards.

I read a page at a time, looking up and writing all unknown vocabulary in a
notebook. There could be dozens of new words on the page. I then reread the
same page. Usually I can read the entire page through quickly the second time,
because it's fresh in my memory. However, when the same words come up again
later I may have forgotten the reading or meaning, so it gets another entry in
the notebook (even if I recognize it). After noting the same word a few times,
I may start to remember it.

The advantage of reading books, in my opinion, is that they tend to use the
same vocabulary over and over. You'll eventually remember the most frequently
used words naturally. The other advantage is that vocabulary appears in
context. This is better for both remembering and for understanding. This is
why I don't use flashcards, and just let the book be my study tool.

It's much easier to use this method if the book is selected carefully. It
should be relatively easy, but still interesting. Books that might be
assigned reading in middle school or high school are, I feel, often good
examples.

~~~
Timpy
Do you have any suggestions for a first novel in Japanese? I am already
reading slice-of-life manga, but the pictures and furigana are a crutch, and
the vocab is pretty much limited to everyday life.

~~~
pouetpouet
I would look into graded readers.

Related : Do 20 pages of a book give you 90% of its words?

Link:
[https://blog.vocapouch.com/do-20-pages-of-a-book-gives-you-9...](https://blog.vocapouch.com/do-20-pages-of-a-book-gives-you-90-of-its-words-795a405afe70)

HN discussion:
[https://news.ycombinator.com/item?id=14673229](https://news.ycombinator.com/item?id=14673229)

------
s_m_t
After using what was essentially this technique to learn to read Japanese I'm
not entirely convinced it is actually all that useful compared to just
reading. I found myself spending so much time in anki that sometimes I
wouldn't do real reading at all... And for what? So I would have a chance at
being sort of familiar with a character when I came across it? So called
"flashcard blindness" is also a real thing. Context is so important when
learning, sometimes it was like I had to learn two words, the word I 'learned'
in anki and the word in the wild. Even though they were ostensibly the same
word somehow they couldn't connect.

If this technique did help at all I think the effect was mainly psychological.
Looking at a page with a bunch of words you don't know is stressful and
demoralizing. Even being able to think that you have seen a character
somewhere without knowing what it means can be comforting. It sounds silly but
any amount of stress can severely inhibit language acquisition. You might be
better off meditating for 15 minutes... Or taking drugs

The only thing that seemed to be 100% effective was reading content that I was
completely engaged in. It didn't matter if said content was close to my skill
level or way beyond my skill level. As long as I am sufficiently engaged it
doesn't matter how many times I have to reference the dictionary or comb
google for grammar explanations.

~~~
generativist
I had never heard the term "flashcard blindness." But, wow -- it fits my
experience.

I think any time I mistakenly use SRS expecting that it _represents_ the
learning process rather than it _facilitates_ learning, I'm prone to this
syndrome. More explicitly, the learning process involves active use and
problem solving, creating a web of connections _between_ concepts. SRS merely
changes conceptual _recall_ times. For a CS analogy, it moves the memorized
concept to main memory instead of residing on a slow disk or over the network.
Cumulative slow seeks frustrate learning, so this is useful. But, having
something in fast memory isn't useful if you don't have an index to it -- the
absence of which (contextually) is flashcard blindness.

------
rdlecler1
One summer I was running a month long simulation for my PhD and decided to
learn Chinese using an undergrad text I found. It had about 17 chapters with
50 new characters per chapter. Normally this would be a two semester course.
Using spaced repetition, some books that helped to show the pictographic
evolution, and color coding the tones, I was able to go through a chapter a
day (granted, it took about 14 hours a day). After a few weeks I could read, write,
and speak about 1300 words/850 characters. Reading was the easiest, then
writing, then speaking. Listening was the hardest and one I always struggled
with. I don’t have a great memory, but if you have the time, tools, and will
it’s very manageable. Usually we lack the time for such an indulgence.

~~~
selimthegrim
Which text was this?

~~~
cepp
Sounds like this is Integrated Chinese. A very popular undergraduate text in
the US for teaching Chinese language skills.

~~~
rdlecler1
Can’t recall. It was Yale’s undergrad text circa 2007. Looked old but was very
good.

------
tralarpa
Just a word of warning to all HN readers who plan to follow the author's
method. It is a quite painful way to learn a language, especially Chinese. I
have learned a language (not Chinese) in this way, but I doubt that it is more
efficient than following a decent language course (which are, unfortunately,
very rare for Chinese. That could explain why the author did what he did).

First of all: Don't start reading a real book from a too low level. It's
absolutely fantastic that the author has the energy and perseverance to attack
a real book, but it can quickly end in frustration for most people. The author
says he passed HSK4. Let's assume that his Chinese was actually a little bit
better than HSK4 which would correspond to a vocabulary of around 1800 words.
This sounds a lot but it's actually...nothing. I can only assume that his real
level was significantly higher than HSK4, otherwise he would have seen many
more than 5-10 new words per page (20 or 30 per page would be more realistic
for the first 50 pages in my opinion).

Second, in my experience, even if you know all words of a Chinese text, you
are still very far from understanding it. The grammar is a bitch. What will
probably happen is that you will stare at what you assume is a sentence and
wonder whether you accidentally got some old edition where the characters are
printed from right to left or vertically. It's a pity that the author has
solely focused on vocabulary acquisition in his article. That could give
readers who only know Indo-European languages the impression that Chinese is
somehow like Spanish (or even Farsi) where the sentence structure is similar
enough to English and where just knowing the words can be already enough to
enjoy a book.

(Edit: language)

~~~
PavlikPaja
And Chinese vocabulary is rather small and extremely systematic, so those 1000
or so characters form a solid base for understanding. Far from looking up
every other character or so. Only a few % of encountered characters are not
among the first 2000.

I thought of trying to read The Three Body Problem along with the audiobook,
however the one I found online doesn't seem to contain all the chapters.

~~~
yorwba
> And Chinese vocabulary is rather small and extremely systematic, so those
> 1000 or so characters form a solid base for understanding.

Learning Chinese based on characters is going to be very frustrating. Most
words have two characters and knowing their meaning only helps slightly with
guessing the meaning of the word.

In the SUBTLEX-CH frequency data, 94% of characters are among the top 1000,
but only 82% of words. So it's quite common to see a sentence where you know
all the characters, but don't understand it because you don't know how they
group into words.
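
To make the character-versus-word gap concrete, here is a minimal sketch of how such a coverage figure could be computed. The word list below is a made-up toy corpus, not SUBTLEX-CH data, and the `coverage` helper is a hypothetical name used only for illustration.

```python
from collections import Counter


def coverage(counts, top_n):
    """Fraction of all tokens accounted for by the top_n most frequent items."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    covered = sum(n for _, n in counts.most_common(top_n))
    return covered / total


# Toy corpus, already split into words (invented for illustration only).
words = ["我", "要", "重要", "我", "需要", "要", "主要", "我"]

word_counts = Counter(words)
# Character counts come from the same corpus, ignoring word boundaries.
char_counts = Counter(ch for w in words for ch in w)

char_cov = coverage(char_counts, 2)  # share of characters covered by the top 2
word_cov = coverage(word_counts, 2)  # share of words covered by the top 2
print(char_cov, word_cov)
```

In this toy corpus the two most frequent characters cover a larger share of the text than the two most frequent words, which is the same direction as the 94% versus 82% gap above. With a real frequency list you would load the counts from the file instead of building them by hand.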

~~~
tralarpa
> Most words have two characters and knowing their meaning only helps slightly
> with guessing the meaning of the word.

It's really infuriating how many two-character words contain an extremely
frequent character (like 要 yào) and you have still no idea what the word
means.

~~~
3pt14159
As someone that knows a bit of Russian and French, Chinese seems like a
language almost invented to be hard. Let me get this right; Chinese:

1\. Has different character sets, including traditional and simplified.

2\. Has tens of thousands of different characters.

3\. Has both conjugates and homophones.

4\. Makes new words by combining characters.

5\. Has completely different ways of speaking; none of which are easy.

6\. Generally assumes that the other communicator can use the context to
denote the tense.

Like. Guys. Come on.

If you were combining characters to make words why bother with tens of
thousands of characters in the first place?

~~~
dionian
1\. "Simplified" is technically a script variant of traditional developed in
the last half century or so, only in the PRC - but they are ultimately still
the same system. Those who can read traditional chinese script can read
simplified with ease.

2\. Only ~2000 needed for normal vocab. Compare this with 10000+ for English.

3\. Mandarin in particular has a lot of homophones but it's practically not a
problem since terms are composed of one or more sounds and are thus easily
disambiguated.

4\. This is a good thing, vocab is much lower (see point 2)

5\. Same with any language?

6\. All humans use context

------
wibr
This is neat!

Also useful for this kind of analysis:
[https://www.chinesetextanalyser.com/](https://www.chinesetextanalyser.com/)

I'm doing something similar but instead of reading native books I'm still
sticking to readers with increasing difficulty so that I don't have to look
too many words up while reading. Since the readers also include word lists I
can learn almost all of the words beforehand.

My reading list (traditional characters) is here:
[https://www.chinese-forums.com/forums/topic/44336-graded-rea...](https://www.chinese-forums.com/forums/topic/44336-graded-readers-by-the-numbers-characterswords-page-count/?do=findComment&comment=438883)

~~~
KerrickStaley
Thanks for the links! Really interesting to see other projects working in this
direction. I'll try picking up some of those readers too :) I'm trying to
learn both simplified and traditional (but my simplified is pulling ahead
right now because of all the work I'm putting in on The Three Body Problem).

------
prehistacct123
people need to go massive input either in country or online

having actually learnt to the point of watching news tv shows and reading
chinese forums i can say that going the classroom or exam route is a waste of
time and money

learn the top 50 to 70 chars then start watching dual subtitled programs like
cctv4

do it actively meaning pausing rewinding attempting to read subtitles before
the speaker etc but never touch a dictionary in flow only much later if you
couldnt figure from context repeatedly ie greater than 6 7 context attempts
fail

1.5 years and you will be much ahead of anyone else

~~~
archgoon
Hi, that cctv4 looks like a great resource; thanks! Can you recommend any
others? Are there archives of cctv4?

~~~
prehistacct123
youtube has cctv4 shows going back years and also livestreams

[https://m.youtube.com/user/cctvch](https://m.youtube.com/user/cctvch)

i learnt chinese pretty much just watching Homeland Dreamland 远方的家 and Across
China 走遍中国 from about 2016 to mid 2017 after which i was able to watch purely
chinese shows without english subtitles

these are travel and tech shows so you learn about people places and lot more
than just the language

also keeps it fresh and never boring so you never lose motivation for long

100s of hours of content all dual subtitled

------
peterburkimsher
I wrote [https://pingtype.github.io](https://pingtype.github.io) to add spaces
between words, pinyin, the literal translation for each word, and a parallel
English translation when available.

It also has support for simplified/traditional conversion, bopomofo, Taiwanese
and Cantonese, and typing from pictures.

The most useful texts are bilingual. For me, I was reading the Bible, song
lyrics, comics, and subtitles. I want to upload those, but I've been
threatened with copyright issues.

------
cgag
I think prestudying vocab before doing more extensive reading is a great
technique and I haven't seen it mentioned much.

I've got a rough prototype of a tool for finding unknown words in text with
european languages, but you've gotta mark them as known rather than
integrating with anki: [https://words.sh/](https://words.sh/)
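
For anyone curious what "finding unknown words" could look like, here is a rough sketch in Python. It is not how words.sh works (I don't know its internals); the `unknown_words` function, the sample sentence, and the known-word set are all made up for illustration, and it does no lemmatization, so inflected forms count as separate words.

```python
import re
from collections import Counter


def unknown_words(text, known):
    """Return (word, count) pairs for words not in `known`, most frequent first."""
    # [^\W\d_]+ matches runs of letters, including accented ones like "ü".
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(t for t in tokens if t not in known)
    return counts.most_common()


# Hypothetical data: words the reader has already marked as known.
known = {"der", "die", "und", "ist"}
text = "Der Hund und die Katze. Der Hund ist müde."

to_study = unknown_words(text, known)
print(to_study)
```

Sorting by frequency means the words that show up most often in the text are the first ones you would prestudy.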

~~~
farresito
A bit of a late answer, but your idea is pretty cool, definitely something I
could see myself using if it were integrated in an ebook reader or something
similar. For German (and a few other languages, actually), a very good website
to fetch sentences and translations from is Linguee. I might implement
something similar on top of epubjs-reader.

------
bayesian_horse
I don't believe it's a very good idea to start reading "god-mode" texts in a
foreign language until you can get to something resembling a smooth reading
flow. A couple of new words per page is ok, a couple of new words per sentence
isn't. One of the keys is to NOT look up every unknown word. You'll figure out
which ones you need and which ones you don't. Next important secret: Don't
write down the words. Instead, reread the pages a while later, and see if you
can remember it from the context. If not, look it up again.

Anyway, until that stage flashcard based methods are faster and I find it
easier to stick with that.

I recommend "Decipher Chinese" for beginning to advanced Chinese learners.
Also Duolingo has a nice conversational-based course, though I recommend using
something else to get a grasp of the first maybe 400 characters. You need to
know how characters work. You really need to write them with your hands,
either with pen and paper or on a tablet/phone. My theory is that this is
necessary for an efficient "neural encoding" of Chinese characters. They are
designed to be written in a certain way, as if you had a wet brush in your
right hand.

------
philliphaydon
I'm unsure how this works, because this technique won't teach you what the
English equivalent is when 2 (or more) words are next to each other.

For example, COW + MEAT = BEEF.

牛肉

牛 = Cow

肉 = Meat

This is a very simple example but there are more complicated words such as
Divorce.

In Chinese this is 離婚 which is 離 (from or without) and 婚 (marriage) but it's
hard to learn that this is actually Divorce in English.

Edit: Wife is Taiwanese and I'm trying to learn Chinese.

~~~
gricardo99
Both your examples seem to be a problem for someone learning english (i.e.
there's a specific word for cow meat), but not a problem for someone learning
Chinese. Perhaps I'm missing your point, but if I come across a word that
translates to "cow meat", I also know that as "beef" in english, but it's
perfectly understandable to me as "cow meat".

But going the other way, for someone learning english though, this is a
problem. They may very well understand and know "cow meat", but have no idea
about the word "beef" if they have not encountered it before.

~~~
ronilan
Exactly.

Just like 对不起人 translates to “Canadians”

;)

~~~
vfulco2
this was good :-)

------
vonnik
For anyone learning Mandarin, the best advice I have is: don't start with
reading, start with speaking.

The Michel Thomas method has worked pretty well, much better than Pimsleur,
for me:

[https://www.michelthomas.com/](https://www.michelthomas.com/)

And the Yale "Speak Mandarin" series with romanized Chinese is also really
helpful.

[https://yalebooks.yale.edu/book/9780300000849/speak-mandarin...](https://yalebooks.yale.edu/book/9780300000849/speak-mandarin-textbook)

Learning Mandarin, overall, carries a much higher cognitive load than, say,
Spanish, because the sounds don't map to 26 letters. They map to 1000s of
characters that must be memorized. This makes reading a much less useful way
for a beginner to approach a language.

------
cromwellian
I estimate I'm currently at HSK3, but I have not read any Chinese books or
novels except children's books. I would imagine that an adult Chinese novel
probably contains a lot of idioms (成语), and these have to be recognized 4
characters at a time; the meaning won't come from just knowing the meaning of
the individual characters. Think of an intermediate English reader
encountering the phrase "piece of cake", or "cakewalk", or a phrase like "bite
the bullet".

Part of my difficulty learning Chinese is I taught myself, primarily by
studying vocabulary, and learning characters, not practicing conversation with
other people and I think this is a bad way to start. It's much better to put
in far far more speaking and listening practice than reading/writing. I didn't
do so, and now I can read Chinese text and communicate in chat, but have
difficulty listening. And that's a shame, because I think once your listening
skills get reasonably good, you can start watching Chinese TV and movies, and
bootstrap a lot faster -- and in an entertaining way that seems less like
mentally taxing work.

~~~
oh-kumudo
Reading novels is always difficult. Even after living in an English-speaking
country for many years and communicating fluently with locals, in certain
cases even passing under their radar without being identified as a foreigner,
I still find it pretty hard to read a reasonably popular English novel. The
words they use and the rhetoric are all so different from everyday spoken
language.

In the case of Chinese novels, depending on the author's style, if you are not
reading period novels, the idioms will not be a huge problem. I do see, though,
that because of the flexibility of the Chinese language, authors tend to
combine characters into new vocabulary of their own to keep their language
fresh, which might be low-effort for local readers but might present great
challenges to language learners.

------
zmh
>It also only supports texts with simplified characters. I’ll eventually add
support for traditional characters. The silver lining is that when you add a
flashcard for a simplified character, you’ll also get a flashcard for the
traditional character. It’ll be suspended by default so you’ll have to
unsuspend if you want to study it.

This goal is difficult the way you put it. In Chinese, besides the reduction
in strokes, a number of traditional characters (of different or similar
meanings but mostly pronounced the same) were merged into one simplified
character. In other words, the traditional character set has an n-to-1
relationship with the simplified character set. For most characters n=1, but
you also see n=2,3... quite often.

For this reason, you would often see articles awkwardly converted from
simplified characters into traditional characters when it's done
automatically. The other direction--traditional character to simplified
character conversion--has no such problem.
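
A small sketch of why one direction is easy and the other is not. The table below holds only a few hand-picked, well-known examples and is nowhere near complete; the function names are hypothetical, and a real converter would need a full table plus word-level context to choose among candidates.

```python
# Simplified character -> traditional characters that were merged into it.
# Hand-picked examples only, for illustration.
SIMPLIFIED_TO_TRADITIONAL = {
    "发": ["發", "髮"],        # 發 (to send out) / 髮 (hair)
    "后": ["後", "后"],        # 後 (after, behind) / 后 (empress)
    "干": ["乾", "幹", "干"],  # 乾 (dry) / 幹 (to do; trunk) / 干 (shield)
    "头": ["頭"],              # the common case, n=1
}


def traditional_candidates(simplified_text):
    """For each simplified character, list every traditional form it may stand for."""
    return [SIMPLIFIED_TO_TRADITIONAL.get(ch, [ch]) for ch in simplified_text]


def to_simplified(traditional_text):
    """Traditional -> simplified is a plain lookup: each character has one target."""
    reverse = {
        trad: simp
        for simp, trads in SIMPLIFIED_TO_TRADITIONAL.items()
        for trad in trads
    }
    return "".join(reverse.get(ch, ch) for ch in traditional_text)


print(to_simplified("頭髮"))           # unambiguous
print(traditional_candidates("头发"))  # the second character has two candidates
```

Going from 頭髮 (hair) to simplified needs no decisions, but going back from 头发 a character-by-character converter has to pick between 發 and 髮, and only the surrounding word tells you which is right.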

------
gpetukhov
I use Pleco dictionary for reading Chinese books (read 20% of 三体 so far).
There is a built-in copy-paste reader in the app. You copy-paste any book
(it's just text) you want and it's possible to look up any word and add it to
flashcards, pronounce it, etc. Minimum effort and preparation.

~~~
howlingfantods
This is what I do too. In the iOS version, you can directly open epubs. It's
great

------
foobarqux
Nice tool. I'm interested in Chinese websites primarily so here is some code I
wrote to dump text from a Chinese website.

    
    
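    import re

    import requests
    from bs4 import BeautifulSoup

    url = 'http://www.36kr.com'
    r = requests.get(url, timeout=10)
    r.raise_for_status()
    # Guess the encoding from the body; the header default can garble Chinese.
    r.encoding = r.apparent_encoding
    soup = BeautifulSoup(r.text, 'lxml')
    # Matches every character that is NOT a CJK ideograph or radical.
    RE = re.compile(u'[^⺀-⺙⺛-⻳⼀-⿕々〇〡-〩〸-〺〻㐀-䶵一-鿃豈-鶴侮-頻並-龎]', re.UNICODE)
    # txt = soup.find_all(string=RE)
    # Strip the non-Chinese characters, leaving only the Chinese text.
    txt = re.sub(RE, '', soup.get_text())
    print(txt)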

------
k__
Sounds like a good technique.

Get a book that is written in conversational language and learn the vocabulary
of a page before reading it. This probably gets you neither speaking nor
writing in that language, but being able to read it is a good start.

------
freshfey
Very interesting article. I'm trying to learn Mandarin with a focus on
speaking and conversing (using Memrise 2x per day for 15min). Would this
help as well? Reading seems to me like a much more difficult project to
tackle.

~~~
KerrickStaley
I've always been a visual learner and listening and speaking never really did
it for me. Reading, though harder because of the character system, has helped
cement vocabulary and grammar in a way that speaking and listening didn't. My
speaking and listening are getting better by osmosis too.

------
dgacmu
Author, since you're commenting: you have the pinyin for San Ti wrong in the
first line of your first major subsection. San is first tone, but it's marked
in your post as 4th. (you have Ti correctly in 4th)

------
baby
I’ve been trying this technique via two apps:

* duchinese

* wordswing

I tried also:

* special books: but it’s impractical as they usually don’t translate enough words at my level

* webpages with perapera plugin: font is too small, translation is unaware of context and the texts I found were not easy to read

------
peteretep
That is _awesome_. I've been wanting to do exactly this for Thai Wikipedia for
quite a while, but you have some serious issues with identifying words in
Thai.

~~~
yorwba
When I want to segment words in some language, I usually check what Apache
Lucene does. In this case, the Thai tokenizer [1] simply uses
java.text.BreakIterator [2] and hopes that Thai is supported.

[1]
[https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;a=...](https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;a=blob;f=lucene/analysis/common/src/java/org/apache/lucene/analysis/th/ThaiTokenizer.java;h=53f71699f2f9041bbdaab2d450e9315a6545cce9;hb=HEAD)

[2]
[https://docs.oracle.com/javase/10/docs/api/java/text/BreakIt...](https://docs.oracle.com/javase/10/docs/api/java/text/BreakIterator.html)
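
If you want to see the general idea without Java, here is a minimal sketch of greedy longest-match segmentation against a word list. To be clear, this is a different and much simpler technique than whatever BreakIterator or Lucene do internally. The `segment` function and the tiny dictionary are made up for illustration, and the example is Chinese only because it reuses words from this thread; the same approach should apply to any script written without spaces, given a decent word list.

```python
def segment(text, dictionary, max_len=4):
    """Greedy longest-match segmentation.

    At each position, take the longest dictionary word that starts there;
    fall back to a single character when nothing matches.
    """
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words


# Hypothetical toy dictionary; a real one would hold tens of thousands of words.
toy_dictionary = {"牛肉", "牛", "肉", "離婚"}

print(segment("我要牛肉", toy_dictionary))
```

Greedy matching makes mistakes when a longer word swallows the start of the next one, which is part of why real tokenizers are more involved.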

~~~
peteretep
Fantastic tip, thanks

------
lalos
Kerrick what are your thoughts on the Pleco flashcards instead of Anki for
iPhone Chinese flashcards?

~~~
wibr
The big advantage of flashcards in Pleco is the integration with the whole
app. If you use the Pleco Reader, you can add new cards directly from the
Reader. You can also create cards from dictionary entries and during studying
check other dictionaries for each card, play the audio etc. Some people
prefer Anki, but personally I would recommend the Pleco Flashcards for
learning Chinese.

------
de_watcher
Software developers should be more efficient with symbols. This approach may
actually work better.

------
ofrzeta
Does anyone have experience with methods like Michel Thomas or Rosetta Stone
for learning Mandarin?

------
bcaa7f3a8bbc
I wonder if the author's tool can be generalized to be used for other
languages.

------
ikawe
Very nice!

