

What's the equivalent of Twitter's 140 char limit for non-Latin charsets? - bensummers
http://bens.me.uk/2010/twitter-charset-experiment

======
dkersten
Since, presumably, the 140 character limit derives from SMS message limits, it
might be interesting to look at what the message lengths are for these (note
that this is GSM specific and I don't know the CDMA equivalents):

The SMS payload is 140 octets. That means that you can store 140 8-bit
characters, e.g. in Latin-1 encoding, in a text message.

However, the GSM default encoding is 7-bit packed. Each character takes 7 bits
and the characters are packed together into the 140 octets, which gives a
maximum of 160 characters per message.

For international characters, GSM uses UCS-2, which is a 16-bit encoding,
limiting messages to 70 characters.
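
The three limits above all fall out of the same division. A minimal Python
sketch, assuming only the 140-octet payload and the bit widths given above
(the names PAYLOAD_OCTETS and max_chars are made up for illustration):

```python
PAYLOAD_OCTETS = 140  # SMS payload size, as stated above


def max_chars(bits_per_char: int) -> int:
    """Whole characters of the given width that fit into one payload."""
    return (PAYLOAD_OCTETS * 8) // bits_per_char


for name, bits in (("GSM 7-bit packed", 7), ("8-bit", 8), ("UCS-2", 16)):
    print(f"{name}: {max_chars(bits)} characters")
```

This should print 160, 140 and 70 characters respectively.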

You can also send concatenated messages, where multiple SMS messages make up
the actual message text. In this case, you must subtract (from memory, so may
be off) 4 octets from the limits, for the user data header. The user data
header consists of a length and one or more information elements. An
information element consists of 3 octets: an information element identifier
(IEI; in this case 0, for a concatenated message) and some parameters. I'm not
sure whether the parameters are specific to the IEI; I could look it up, but
I'm too lazy. For concatenated messages, the next octet is the number of
messages and the final one is the current message index.

Hopefully someone found this information interesting ;-)

~~~
dkersten
Minor correction: an information element consists of an information element
identifier, a length, and then "length" octets of data, for a total of
"length + 2" octets per information element. In the case of concatenated
messages, the IE identifier is 0, the length is 3, and the 3 octets of data
are the message id, the number of parts and the current part index, in that
order.
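
To make the layout concrete, here is a rough Python sketch that builds such a
user data header and works out how many characters are left per part. The
function names are invented for illustration. The order of the three data
octets used here (reference id, total parts, part index) is my reading of 3GPP
TS 23.040, so treat it as an assumption and check the spec before relying on
it.

```python
PAYLOAD_OCTETS = 140  # SMS payload size
IEI_CONCAT = 0x00     # information element identifier for concatenation


def concat_udh(ref: int, total: int, index: int) -> bytes:
    """User data header: one length octet, then a single information element."""
    data = bytes([ref, total, index])             # assumed field order
    ie = bytes([IEI_CONCAT, len(data)]) + data    # "length + 2" octets
    return bytes([len(ie)]) + ie                  # header length octet + IE


def chars_per_part(bits_per_char: int, udh: bytes) -> int:
    """Whole characters that fit after the header.

    For 7-bit text the real encoding pads the header to a septet boundary;
    rounding down here gives the same count.
    """
    return (PAYLOAD_OCTETS - len(udh)) * 8 // bits_per_char


udh = concat_udh(ref=0x2A, total=3, index=1)
print(udh.hex(), len(udh))
for bits in (7, 8, 16):
    print(bits, chars_per_part(bits, udh))
```

With one information element of 5 octets plus the header length octet, the
header comes to 6 octets, which leaves 153, 134 and 67 characters per part for
the 7-bit, 8-bit and UCS-2 encodings.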

------
lmkg
It's an interesting experiment, but I don't trust the conclusion. Automatic
translation still tends to be fairly literal, rather than idiomatic. I have a
suspicion that this tends to increase the average message length. An
interesting experiment would be to translate English tweets into those various
target languages, and compare the two. If my hunch is right, the text will get
bigger in both directions. The ratio between the increases may be a good
measure of the information density of idiomatic expression.

~~~
bensummers
You're right, it's absolutely unscientific and the results are of dubious
statistical significance. Going through machine translation makes the whole
thing pointless anyway.

But it was fun to write, which I suppose was the actual point of it. One of
the joys of being able to program computers is that when you want to have such
a question answered, you can write a short bit of code to get the answer.

------
jrockway
Speaking of which, I've been looking for a new Twitter-alike since Twitter
hard-limited tweets to 140 characters. It's just not enough.

Any suggestions?

