
Chinese Twitter users live in a density 2x to 8x their English counterparts. - audreyt
http://pugs.blogs.com/audrey/2009/10/our-paroqial-fermament-one-tide-on-another.html
======
lsb
Michael Mitzenmacher, at Harvard, had a paper in the 2003 IEEE Data
Compression Conference that gave empirical evidence that translations
compressed to roughly similar sizes (using the Bible and the EU texts), but
had wildly varying sizes uncompressed; this correlates well with linguistic
theories.

ftp://ftp.deas.harvard.edu/techreports/tr-12-02.ps.gz
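To try the compression comparison on your own parallel texts, here is a minimal sketch. It uses zlib (DEFLATE) at maximum level as a stand-in compressor. That is an assumption: this is not necessarily what the paper used, and the sketch makes no claim about the ratios the paper found. `size_report` is a hypothetical helper that reports character count, raw UTF-8 size and compressed size side by side.

```python
import zlib

def size_report(text: str) -> dict:
    """Character count, raw UTF-8 size and zlib-compressed size (bytes) of text."""
    raw = text.encode("utf-8")
    return {
        "chars": len(text),
        "utf8_bytes": len(raw),
        # zlib (DEFLATE) at level 9 is a stand-in compressor; this is an
        # assumption, not necessarily what the paper used.
        "compressed_bytes": len(zlib.compress(raw, 9)),
    }

if __name__ == "__main__":
    # Genesis 1:1 in English (KJV) and Chinese. One verse is far too short for
    # a fair test: zlib's fixed header and checksum dominate, so compare whole
    # parallel texts instead.
    samples = {
        "en": "In the beginning God created the heaven and the earth.",
        "zh": "起初，神创造天地。",
    }
    for lang, text in samples.items():
        print(lang, size_report(text))
```

Each CJK character takes 3 bytes in UTF-8, so the raw sizes alone say little about information content. The compressed sizes of long parallel texts are the more meaningful comparison.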

~~~
est
similar post on reddit:

[http://www.reddit.com/r/programming/comments/9sjt0/rudebox_b...](http://www.reddit.com/r/programming/comments/9sjt0/rudebox_by_alcatraz_4091_bytes_of_magic/c0e91bl?context=1)

------
megaduck
This is one of the reasons that texting is so darn popular in China, and email
is not. Texting is _fast_, and you can pack so much info into a single SMS
that you almost never need to send anything longer.

The same goes for books and essays. Chinese books and magazines are often
shorter, just because the information density is so high. It's a neat feature
of the language.

However, sometimes it can be demoralizing when you spend all evening writing
something and realize that you've only produced a single page of text.

~~~
jbert
> This is one of the reasons that texting is so darn popular in China, and
> email is not. Texting is fast

I don't see why texting in Chinese should be faster than in English.

Don't you need to push a similar number of bits through the numeric keypad?
I'd imagine that to be the limiting factor.

In other words, English texters need to send more chars, but they're choosing from fewer
chars so need fewer keypresses to select each one. Chinese texters need to
send fewer chars, but I imagine they need to make more keypresses to select
each char.
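jbert's "bits through the keypad" argument can be made concrete with a lower bound. With k keys and N equally likely symbols, a fixed-length code needs ⌈log_k N⌉ presses per symbol. The sketch below assumes the 8 lettered keys (2 to 9) and, as a labeled assumption, a round figure of about 3,500 commonly used Chinese characters. `min_presses` is a hypothetical helper. Real input methods exploit character frequency and context, so they can average fewer presses than this uniform bound.

```python
def min_presses(alphabet_size: int, keys: int = 8) -> int:
    """Fewest presses per symbol for a fixed-length code on `keys` keys:
    the smallest p with keys**p >= alphabet_size, i.e. ceil(log_keys(N)).
    Ignores frequency and context, which real input methods exploit."""
    if keys < 2:
        raise ValueError("need at least two keys to distinguish symbols")
    presses, reachable = 0, 1
    # Integer loop instead of math.log, to avoid floating-point rounding at exact powers.
    while reachable < alphabet_size:
        reachable *= keys
        presses += 1
    return presses

if __name__ == "__main__":
    # 26 letters vs. ~3,500 common Chinese characters (an assumed figure).
    print("English letter:", min_presses(26))    # 8**2 = 64 >= 26
    print("Chinese char:  ", min_presses(3500))  # 8**4 = 4096 >= 3500
```

Under these assumptions, each Chinese character costs more presses than each English letter, as jbert expects. Whether a whole message costs more depends on how many characters each language needs.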

~~~
quant18
The better Chinese input methods achieve ~3 keystrokes per character on
handheld devices. [1] IIRC average word length in English is around 5 or 6,
vs. around 2 for Chinese. So the input time should be about the same.
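quant18's estimate, written out as arithmetic. The figure of about one press per English letter is an assumption (predictive input such as T9, with the intended word offered first); the comment itself does not state it:

```latex
\begin{aligned}
\text{Chinese: } & 2\ \tfrac{\text{chars}}{\text{word}} \times 3\ \tfrac{\text{keystrokes}}{\text{char}} \approx 6\ \tfrac{\text{keystrokes}}{\text{word}} \\
\text{English (T9, assumed): } & 5\text{ to }6\ \tfrac{\text{letters}}{\text{word}} \times 1\ \tfrac{\text{keystroke}}{\text{letter}} \approx 5\text{ to }6\ \tfrac{\text{keystrokes}}{\text{word}}
\end{aligned}
```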

The real bottleneck isn't the actual typing, but thinking of something to type
--- and it's a lot easier to think of how to say something verbosely rather
than concisely!

[1] [http://www.pascal-man.com/navigation/faq-java-browser/2009_S...](http://www.pascal-man.com/navigation/faq-java-browser/2009_SMC_G6.pdf)

~~~
megaduck
You're right that Chinese and English can be comparable in input time,
provided that you're using something like T9 for the English.

However, you lose a ton of speed the moment that you hit a word that isn't in
your T9 dictionary. Chinese input methods don't have that problem. So, the
best case for English usually equals the average case for Chinese.
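Here is a simplified keystroke count that puts rough numbers on megaduck's point. It assumes the standard phone keypad letter layout (ITU-T E.161). The hypothetical `t9_presses` charges one press per letter when the word is in the dictionary and assumes it is the first suggestion. Otherwise it falls back to spelling the word in multi-tap. Pauses, suggestion cycling and mode switches are ignored, so real costs are somewhat higher.

```python
# Standard phone keypad letter layout (keys 2 to 9).
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}

# letter -> (key, taps needed in multi-tap mode)
LETTER_TO_KEY = {
    letter: (key, pos + 1)
    for key, letters in KEYPAD.items()
    for pos, letter in enumerate(letters)
}

def multitap_presses(word: str) -> int:
    """Presses in multi-tap mode ('c' is three taps on key 2, and so on).
    Ignores the pause needed between two letters on the same key."""
    return sum(LETTER_TO_KEY[ch][1] for ch in word.lower())

def t9_presses(word: str, dictionary: set[str]) -> int:
    """One press per letter if the word is in the (hypothetical) dictionary and
    comes up as the first suggestion; otherwise fall back to multi-tap.
    Ignores presses to cycle suggestions or switch modes."""
    if word.lower() in dictionary:
        return len(word)
    return multitap_presses(word)

if __name__ == "__main__":
    words = {"hello", "world"}
    print(multitap_presses("hello"))        # 2+2+3+3+3 = 13
    print(t9_presses("hello", words))       # dictionary hit: 5
    print(t9_presses("zhongwen", words))    # miss: falls back to 17 taps
```

Under these assumptions, a dictionary hit costs one press per letter, but a miss costs more than twice as much. That gap is the slowdown megaduck describes.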

Of course, since texting is the primary communication mode for many young
Chinese, they can usually text blindingly fast in _any_ language.

------
jrockway
If you are willing to do a bit of encoding/decoding, then you can map 2 or 3
Latin-1 characters onto one Unicode code point, and then tweet with that
instead.
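Here is a minimal sketch of this idea, assuming Python 3 and input that is pure Latin-1. Unicode code points stop at U+10FFFF (about 1.1 million values). Two 8-bit Latin-1 characters (65,536 combinations) therefore always fit in one code point, but three (about 16.8 million combinations) cannot all fit. The hypothetical `pack_latin1`/`unpack_latin1` functions offset each pair into plane 1 (U+10000 to U+1FFFF) to stay clear of the surrogate range. A leftover odd character goes into plane 2. There are two caveats. Most of these code points are unassigned, so fonts and web pipelines may mangle them. Each one also costs 4 bytes in UTF-8, so the scheme saves characters, not bytes.

```python
PAIR_BASE = 0x10000    # plane 1: one code point per pair of Latin-1 bytes
SINGLE_BASE = 0x20000  # plane 2: a trailing odd byte

def pack_latin1(text: str) -> str:
    """Pack two Latin-1 characters into each code point (hypothetical scheme)."""
    data = text.encode("latin-1")  # raises UnicodeEncodeError for non-Latin-1 input
    out = []
    for i in range(0, len(data) - 1, 2):
        out.append(chr(PAIR_BASE + ((data[i] << 8) | data[i + 1])))
    if len(data) % 2:
        out.append(chr(SINGLE_BASE + data[-1]))
    return "".join(out)

def unpack_latin1(packed: str) -> str:
    """Inverse of pack_latin1."""
    data = bytearray()
    for ch in packed:
        cp = ord(ch)
        if PAIR_BASE <= cp < SINGLE_BASE:
            value = cp - PAIR_BASE
            data += bytes((value >> 8, value & 0xFF))
        elif SINGLE_BASE <= cp < SINGLE_BASE + 0x100:
            data.append(cp - SINGLE_BASE)
        else:
            raise ValueError(f"not produced by pack_latin1: U+{cp:04X}")
    return data.decode("latin-1")

if __name__ == "__main__":
    msg = "hello, twitter!"
    packed = pack_latin1(msg)
    print(len(msg), "->", len(packed), "characters")
    print(unpack_latin1(packed) == msg)
```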

My understanding of UTF-8 indicates that you can actually represent any number
as one character, but somewhere in the xterm / firefox / twitter pipeline,
that gets fucked up. I think I have some code on github for this, actually:

<http://gist.github.com/191446>

The idea is to pack any UTF-8 string into one character. It works for about 3
or 4 ASCII characters, but I think this is a Perl bug rather than some
fundamental limitation. Patches welcome.

(As an aside, I am always pleased when I get to use the (>>=) operator in
Perl. And yes, I do pronounce it "bind" and not "right-shift-equals" ;)

~~~
tlrobinson
UTF-8 encodes any Unicode character using between 1 and 4 bytes. Also, some
byte sequences aren't valid UTF-8. I don't think it's a Perl bug.
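A short Python demonstration of both points follows. It shows the byte lengths at the 1-, 2-, 3- and 4-byte boundaries, the hard ceiling at U+10FFFF (so "any number as one character" cannot work), and a few byte sequences a strict decoder rejects. `utf8_length` and `is_valid_utf8` are illustrative helpers built on the standard `str.encode`/`bytes.decode`.

```python
def utf8_length(cp: int) -> int:
    """Bytes UTF-8 needs for code point cp. cp must not be a surrogate
    (U+D800 to U+DFFF), which UTF-8 cannot encode."""
    return len(chr(cp).encode("utf-8"))

def is_valid_utf8(data: bytes) -> bool:
    """True if data decodes under Python's strict UTF-8 decoder."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

if __name__ == "__main__":
    # Boundaries between the 1-, 2-, 3- and 4-byte forms.
    for cp in (0x7F, 0x80, 0x7FF, 0x800, 0xFFFF, 0x10000, 0x10FFFF):
        print(f"U+{cp:04X} -> {utf8_length(cp)} bytes")

    # Code points stop at U+10FFFF, so not every number fits in one character.
    try:
        chr(0x110000)
    except ValueError as err:
        print("chr(0x110000):", err)

    # Byte sequences that are not valid UTF-8.
    print(is_valid_utf8(b"\xc0\x80"))  # overlong encoding of U+0000
    print(is_valid_utf8(b"\xe4\xb8"))  # first two of the three bytes of "中"
    print(is_valid_utf8(b"\xff"))      # 0xFF never appears in UTF-8
```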

------
johannchiang
Kai-Fu Lee mentioned the same observation in a talk. Chinese news headlines
usually carry enough information that Google News in Chinese doesn't need the
excerpts its English counterpart shows. A tweet in Chinese could be an essay.

