
Typing Chinese like English - bobajeff
https://tedclancy.wordpress.com/2016/10/31/typing-chinese
======
devy
As a native Chinese speaker and one of the first generation that was born and
grew up with personal computers (born in the early 80s), I'd say a
machine-learning-powered handwriting input method is more convenient. No hard
feelings. It's just too hard to reinvent the wheel on a problem that tens of
thousands of engineers worked on throughout the 90s. I remember when I went to
college in Beijing in the late 90s there were literally hundreds of companies
selling Chinese input method software, hardware, or both. What I've witnessed
over the past two decades is that they all converged into a few methods based
on pronunciation (variants of Pinyin or voice dictation) or based on shapes
(variants of Wubi or handwriting). And I can attest that Pinyin is predominant
since it's a mandatory course in schools. There are even newer
machine-learning-powered hybrid input methods emerging.[1]

Having said that, if you don't know Chinese or any of its input methods and
just want to input Chinese characters casually, this may be useful, but if you
do want to dive in and learn Chinese from the ground up, I am not sure that
learning this Wuhou input method is time well spent.

[1]: [http://www.xunfei.cn/](http://www.xunfei.cn/)

~~~
hawkice
> if you don't know Chinese or any of its input methods and just want to input
> Chinese characters casually, this may be useful

As a casual student of Chinese, pinyin is actually pretty good for me (I'm
American). I don't need any new markings on my keyboard, and it helps force me
to learn pronunciations anyway, which is good. :)

Also: I saw people using the handwriting-based input in Hong Kong and it
seemed like they were typing slower than I do, which is bananas.

~~~
thaumasiotes
> Also: I saw people using the handwriting-based input in Hong Kong and it
> seemed like they were typing slower than I do, which is bananas.

Speaking as another American who types in Chinese, predictive input doesn't
work at all for me when I type in English. It works really well when I type in
Chinese, because I'm just not able to produce intricate, idiosyncratic Chinese
sentences. It doesn't surprise me that native speakers might take more time to
type in their own language; their range of expression is much greater.

------
zsj
I suggest you look at the Wubi input method[1], which has been widely used in
China for more than 20 years. My first impression is that your method is much
the same as Wubi.

The reason the Pinyin input method is more popular, I guess, is that Wubi is
too hard to learn: you need to remember a lot of rules.

But to quote Wikipedia,

> it is true that Wubi is extremely fast when used by an experienced typist.

So Wubi is still very popular among those who need to type a lot of
characters.

[1]
[https://en.wikipedia.org/wiki/Wubi_method](https://en.wikipedia.org/wiki/Wubi_method)

~~~
huac
Wait, what's the difference between "wubi" and "wubihua"?

~~~
jjcc
wubi is based on the components of a character, which means typing is super
fast at the cost of memorizing all the components. wubihua is based on
strokes: you enter the first 4 strokes and the last one. It's much simpler to
use yet not very fast.
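
To make the "first 4 strokes plus the last one" rule concrete, here is a minimal Python sketch. It assumes the usual five wubihua stroke classes numbered 1 to 5 (horizontal, vertical, left-falling, dot, turning). The `STROKES` table and the `wubihua_code` helper are hypothetical names covering only a handful of characters, and real implementations may truncate differently.

```python
# Hypothetical stroke-order table: each digit is one stroke, using the five
# wubihua classes 1=horizontal, 2=vertical, 3=left-falling, 4=dot, 5=turning.
STROKES = {
    "十": "12",
    "人": "34",
    "木": "1234",
    "日": "2511",
    "国": "25112141",
}


def wubihua_code(char: str) -> str:
    """Return the first four strokes plus the last stroke of `char`."""
    strokes = STROKES[char]
    if len(strokes) <= 5:
        # Short characters are typed out in full.
        return strokes
    return strokes[:4] + strokes[-1]


for ch in "十木国":
    print(ch, wubihua_code(ch))
```

Many characters start and end with the same strokes, so a code like this usually leads to a candidate list rather than a single character, which fits the "simpler yet not very fast" trade-off described above.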

------
pavel_lishin
> _My current employer doesn’t let me work on Open Source projects or personal
> projects without prior permission_

That's ... weird. I hope they're paying well for this.

~~~
MrSourz
I had the same thought. Do you know if there's a list anywhere of valley /
well known companies with this type of policy?

It's the type of thing that'd turn me off for sure during a job search.

~~~
svachalek
It's more typical for them to claim ownership of it, rather than to forbid it.
Any job where you have to submit a list of prior work with your employment
contract (which is most of them in my experience) is usually a sign that
there's a clause in there claiming ownership of everything you do while
employed there. Personally I doubt it's very enforceable, especially since
California is so worker-friendly on everything else I know about, but I don't
know how well these clauses work out in practice.

~~~
kmicklas
> especially since California is so worker-friendly on everything else

Are there no tech jobs outside California?

~~~
kobeya
He said "valley"

------
libeclipse
> My current employer doesn’t let me work on Open Source projects or personal
> projects without prior permission, so let me be clear that this is something
> I developed prior to my current employment.

Err wait. How does this work?

~~~
khedoros1
Usually, it's a clause in the employment contract saying that the company
claims ownership of anything you create while under their employment, under
the theory that a salary covers more than just the time you're in the office.
Maybe combine that with a company policy against releasing internally-produced
source code, or something?

My employer has something similar to that contract clause, but I've never
heard of them actually invoking it.

------
donald123
There is already the Wubi typing method in China, which was widely used by my
parents' generation. This is essentially reinventing the Wubi method.

~~~
WuhouIM
It's way easier than Wubi. At least, that's the intention.

------
WuhouIM
Hey, I don't know who you are, but thanks for posting about my input method
here! Most of my referrals are coming from here.

Edit: Hacker News is now complaining that I'm commenting "too fast" and won't
let me leave new comments. Feel free to leave comments on my Wordpress blog.

Someone upthread said they couldn't figure out how to type the 金 radical. It's
just the 金 key. 钱 is just "金戋". Couldn't be easier. Make sure you have
simplified mode (简体) selected at the top of the page since that's a simplified
character.

------
jxy
Not sure if it is a new system or an implementation of an existing one. There
are already a few existing ones [0] (no English translation). They are great
for people who can't speak the official Chinese language properly, which is
more than 80% of the population in China (or > 90% world wide), to whom
pronunciation based input systems are impossible to grasp.

[0] [https://zh.wikipedia.org/wiki/字形输入法](https://zh.wikipedia.org/wiki/字形输入法)

The efficiency of such input methods depends heavily on one's proficiency, and
also on the compression ratio of the encoding. After some practice, these
methods beat pronunciation-based methods easily, as the codes are shorter and
less degenerate (fewer characters share the same code).

On the other hand, all the cell phones have the basic Wubihua [1] that uses
only five keys to encode the order of how you would write a character.

[1]
[https://en.wikipedia.org/wiki/Wubihua_method](https://en.wikipedia.org/wiki/Wubihua_method)

~~~
natch
As you indirectly suggest, it really depends who the user is.

Especially when you say these methods beat pronunciation-based methods, I
think it's important to qualify that as something that depends on the specific
user.

For a touch typist and an only semi-literate (emphasis on semi) non-native
speaker, the ability to go straight to touch typing of pinyin with absolutely
no learning curve is a HUGE advantage over other systems. By contrast, this
system does not leverage my existing typing ability at all, so its ultimate
speed is gated by the fact that I will never bother to use and learn it. For
users like me, the same is true of other such systems.

The point is just that theoretical speed comparisons are moot for some people
if practical matters make one system more useful.

Speaking of moot, it seems to me that all our points may be moot as voice
input is going to be an ever growing proportion of how text is entered.
Especially in Chinese languages.

------
ensiferum
_shameless plug_

And for those who can speak Chinese, i.e. know the grammar but are limited to
pinyin and can't type in Chinese characters, there's this handy tool that
combines pinyin input with a dictionary (and frequency table) and makes it
easy to write in Chinese. Basically just type the word in pinyin and then
choose the right Chinese character from the suggested word list. The English
definition of the word helps to choose the correct Chinese character. There is
support for both traditional and simplified characters.

[https://github.com/ensisoft/pinyin-translator](https://github.com/ensisoft/pinyin-translator)

~~~
potatosoup
In fact popular IMEs like Sogou do this and also get updated to recognize
names of famous people, movies, etc.

You can even type abbreviations for common phrases, like "cflm" and it'll
figure out you meant chi-fan-le-ma 吃飯了嗎
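
The initial-letter trick can be sketched roughly like this. It is not how Sogou is implemented; the `PHRASES` table and the `initials` and `expand_abbreviation` helpers are made-up names, and a real IME would rank candidates by frequency rather than return them in dictionary order.

```python
# Toy phrase table: phrase -> pinyin syllables (tones omitted).
PHRASES = {
    "吃飯了嗎": ["chi", "fan", "le", "ma"],
    "你好": ["ni", "hao"],
    "謝謝": ["xie", "xie"],
}


def initials(syllables: list[str]) -> str:
    """Join the first letter of each syllable, e.g. chi-fan-le-ma -> cflm."""
    return "".join(s[0] for s in syllables)


def expand_abbreviation(abbr: str) -> list[str]:
    """Return every phrase whose syllable initials spell `abbr`."""
    return [p for p, syl in PHRASES.items() if initials(syl) == abbr.lower()]


print(expand_abbreviation("cflm"))
```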

~~~
ensiferum
Oh, that's interesting. Back in the day when I wrote that tool, all the IME
tools I found and tried that were able to understand pinyin expected you to
know the _correct_ character but lacked support for any kind of verbal cue
(definition, dictionary) about which is the correct character with the
intended meaning.

~~~
inimino
Current input methods are based on large corpora and use context to
disambiguate. They are surprisingly smart.

------
akavi
I'm a non-native/hobbyist Chinese learner, and I'm liking it so far. I wonder
if this could end up suiting the purposes of people in my niche, even if it
can't supersede other methods among native speakers? Personally, I strongly
dislike pinyin because using roman letters engages the intrusive "thinking in
English" part of my brain - something I (and I think, a lot of other learners)
try and avoid when engaging a new language.

I've spent a decent amount of time trying wubi, and this seems to be easier.
Wubi tries to limit itself to 26 keys, so you get a lot of non-intuitive
grouping of seemingly unrelated strokes onto the same key (Or at least, going
back to my first point, non-intuitive to non-native speakers. Can't speak to
how the groupings are perceived by people who learn to read/write via the
standard pedagogy.).

Small things that would make this better for me:

* The fact that there are more radicals behind the shift key is really not obvious. You might want to add a note/tooltip pointing that out?

* A "reverse lookup" going from character to key compositions. Right now I'm struggling to figure out how to type 钱 because I can't figure out how to write the 金 radical.

* Is there just the one, canonical way of writing a character? Or can you "compose" them up from any "correct" series of strokes? If it's the former, how hard would it be to make it the latter?

~~~
thaumasiotes
> Personally, I strongly dislike pinyin because using roman letters engages
> the intrusive "thinking in English" part of my brain

If this really bothers you, use zhuyin, or double pinyin.

> Is there just the one, canonical way of writing a character? Or can you
> "compose" them up from any "correct" series of strokes?

There is always one canonical way (the "stroke order") of writing a character.
I find it a little surprising that a hobbyist learner wouldn't already know
this? How are you studying?

My introductory text for writing characters was
[https://www.amazon.com/Learning-Chinese-Characters-Ms-Zhang/...](https://www.amazon.com/Learning-Chinese-Characters-Ms-Zhang/dp/7561912943/),
and it worked well enough that I can generally input unfamiliar characters in
bihua.

On your phone, Pleco (which you should have if you're studying Chinese) has a
$5 stroke order addon, and Skritter (meh) is all about practicing stroke
orders.

~~~
akavi
> If this really bothers you, use zhuyin, or double pinyin.

Double pinyin is still pinyin. Zhuyin works, but I suppose, on reflection, I
also want the benefit of reinforcing my knowledge of how to write a character
by hand.

> There is always one canonical way (the "stroke order") of writing a
> character. I find it a little surprising that a hobbyist learner wouldn't
> already know this? How are you studying?

I'm well aware; I was talking about this input system. I.e., a given radical
can be further decomposed into smaller radicals or individual strokes. If I
input "stroke 1" + "stroke 2" instead of "radical comprising stroke 1 and
stroke 2", is it meant to still work? (I discovered several situations where
that wasn't the case, so another way of stating my question is: Is this
intentional or a bug?)

~~~
natch
> I also want the benefit of reinforcing my knowledge of how to write a
> character by hand.

In that case you should just use handwriting recognition software such as that
built into your smartphone.

------
raingrove
Things like this make me appreciate Hangul(한글) more and more. It isn't
immediately obvious to me how I can enter basic characters like "国". (I still
can't figure it out.)

~~~
swang
because you're using the simplified version.

the traditional version of that character is: 國

type "F" for 囗, and "z" for 戈, if you were using cangjie you'd have to add the
extra "口" and "一" but i'm guessing this system proposes that 國 is just 囗 + 戈
because that's good enough to identify what you're going for.

edit: the problem seems to be you need to switch to simplified. "F-q-g-e" got
me the character.

~~~
khc
but some simplified characters are encoded (like 号). I also can't type words
like 回 (which is weirdly not FF)

~~~
univerio
It's Ff (capital F, lowercase f). Don't know why there's a difference between
big and small 口, though.

~~~
khc
Ah, so 国 is Fqg

------
natch
What I enjoy about pinyin input is I can just touch type.

Without remembering how the character is written.

And then when I see the result, I can check it and (assuming I sort-of-know
the character well enough to do so) I confirm it and go on. Sometimes
confirming is not an extra step; it's simply continuing to type.

Also I can type a long bunch of characters in sequence, and they give each
other context so the characters get disambiguated, meaning I have less to
check and fewer things to choose or correct.

This new method seems like a good system for someone who doesn't know how to
say the words they want to type. Such as some of the people the OP described,
so good job on that front! Maybe not so good if you are learning or aiming to
learn spoken Chinese, though, because it keeps you away from learning or
reinforcing the pronunciations, doesn't build on existing knowledge of the
keyboard, and doesn't create a very portable skill as pinyin typing would.

------
peterburkimsher
Thank you! I'm working for a computer company in Kaohsiung. There are no
Mandarin classes outside my working hours. I'm doing what I can with private
tutoring and self-study, but it's going to take decades before I can read
basic signs or have a conversation. I thought about making an app like yours,
but also got quite intimidated by the scale.

Is there some way to see your character composition database? Lots of words
that look like 包 sound like "bao" or "pao", but I haven't yet seen any
patterns in the few characters I've actually learned.

~~~
barry-cotter
It won't take decades. I have not been the most diligent of students in my
four years working in Shanghai and it was over a year before I even tried to
learn to read a bit. It's really easy to get by in Shanghai with very, very
little Chinese too.

You won't be able to get cheap one on one tutoring in Taiwan but you can get
pretty cheap tutoring over italki.com or another language exchange website.
Putonghua and Guoyu aren't that different anyway.

For learning to read the best way in the long run is to hand write it but it's
a massive pain. Skritter is almost as good and takes care of revision for you.
It has a great spaced repetition system inbuilt. You should also get Pleco.
The free version is excellent, the paid version is cheap if you actually use
it. It also has a flash card SRS built in for practicing vocabulary.

There are a number of good simplified Chinese graded readers. I'm not familiar
with any for traditional Chinese but if you look through tokenadult's comment
history you'll find some recommendations.

------
archagon
Tangent: One of the nice things about OSX is that it has a pretty robust set
of input method APIs. I used them to make a transliterating Russian
keyboard[1] (though I only scratched the surface) and I think they would work
really great for something like this as well. Typing in a browser (in my case,
using translit.ru) works alright, but nothing beats native support!

[1]: [https://github.com/archagon/cyrillic-transliterator](https://github.com/archagon/cyrillic-transliterator)
(warning: not very polished!)

------
_chu
Finally! Looking forward to sending this to my parents after it develops a bit
more. They've struggled with communication for years (decades?) now because
they can't type using phonetics.

~~~
WuhouIM
That's exactly my goal! I'm very much interested in supporting people who
speak minority dialects. I don't feel that technology should be pushing
Mandarin on people.

------
zhenjl
I really want to like this. Maybe what the developer can do is build a simple
app that translates a Chinese character into a series of keys. That would ease
some of the frustration of not being able to figure out how to type using this
input method. For example, I am trying to type 瓦 but for the life of me I
can't figure out how.

edit: removed "shift doesn't work"...does seem to work.

~~~
zhenjl
Ok, I figured it out...瓦 = qRre
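
The "character to keys" helper requested above could be as simple as inverting the forward table. A minimal sketch, assuming the input method stores a mapping from key sequence to character (the real data format is not shown in this thread). The entries below are only sequences reported by commenters here, and `FORWARD`, `REVERSE`, and `keys_for` are hypothetical names.

```python
from collections import defaultdict

# Key sequences reported in this thread (not the project's actual data file).
FORWARD = {
    "Fqg": "国",
    "Fqge": "国",
    "Ff": "回",
    "qRre": "瓦",
    "c": "日",
    "v": "月",
}

# Invert the table: one character may be reachable by several sequences.
REVERSE = defaultdict(list)
for keys, char in FORWARD.items():
    REVERSE[char].append(keys)


def keys_for(char: str) -> list[str]:
    """Return known key sequences for `char`, shortest first."""
    return sorted(REVERSE.get(char, []), key=len)


print(keys_for("国"))
print(keys_for("瓦"))
```

Returning every known sequence, rather than one, sidesteps the question of whether a character has a single canonical encoding in this system.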

------
inimino
I would be interested to read more about the history of this idea and how it
compares to existing input methods.

------
zhenjl
Ok, trying to write 国家栋樑. Unfortunately I only got the first two characters..国
is Fqge, 家 is ei(, but haven't figured out how to do the last two. Any idea
how to get the last two?

Quick suggestion: when the shift key is not down, it should show lowercase
letters. When the shift key is down, it should show uppercase letters.

------
jiqiren
Think I'll stick with the pinyin input method.

I spent about 20min trying to figure out how to write 白 and still cannot.

~~~
WuhouIM
It's 丿日.

Did you figure out that you have to press Shift to see some of the radicals?
Apparently that's not obvious. My bad.

------
runeblaze
A basic lookup table would be very nice to get started with this input method.
I am currently trying to find out how to type "武后".

~~~
wodenokoto
It's very similar to how you would look up characters in a paper dictionary
and as such most dictionary apps I know of have a similar feature.

Start at the top left and go counter clockwise and look for compounds.

Try typing 一戈 to get 武

------
Grue3
I like this a lot. I didn't even read how it works, and was able to input some
Chinese characters I know. Incredibly intuitive.

~~~
zhemao
I actually had the opposite impression. Tried to write my Chinese name but
couldn't figure out which pieces to put together. There seem to be a lot of
shorthand radicals but not enough basic strokes.

------
hebby06
It is really awesome as a side project. Super curious how he built the big
(19K) lookupMap. Does anybody know?

------
jiyinyiyong
Nowadays voice input in pure Chinese is already fast enough for daily use.

------
laurent123456
Is it actually a new system or is it an implementation of an existing one?

~~~
WuhouIM
New system.

------
wodenokoto
How do I write "日月"? I can only write "明"

~~~
spiznnx
space after each character.

so 日 <space> 月 <space>, or c <space> v <space>.
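
In other words, the space bar commits the current composition. Here is a small sketch of that behaviour, taking 日 as c and 月 as v as stated above, and inferring from the question that "cv" composes to 明. The `TABLE` and `type_keys` names are hypothetical and this is not the project's actual code.

```python
# Tiny composition table; "cv" -> 明 is inferred from the question above.
TABLE = {"c": "日", "v": "月", "cv": "明"}


def type_keys(keys: str) -> tuple[str, str]:
    """Simulate typing: a space commits the pending key sequence.

    Returns (committed_text, pending_keys).
    """
    committed = []
    pending = ""
    for key in keys:
        if key == " ":
            if pending:
                # Unknown sequences are passed through unchanged.
                committed.append(TABLE.get(pending, pending))
            pending = ""
        else:
            pending += key
    return "".join(committed), pending


print(type_keys("c v "))
print(type_keys("cv "))
```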

