> Currently, a certain range of Unicode blocks, including Basic Latin, are counted half as much as other blocks, such as CJK Unified Ideographs.
Has anyone made a Twitter clone where the "character limit" is simply a limit in the message's UTF-8 byte size? Seems like approximately where they ended up anyway. (Except, I guess, with Emoji being unwontedly cheap.)
That would penalise Greek and the languages written in Cyrillic, which use more than one byte per letter in UTF-8 but need roughly the same number of letters per tweet as English.
Nah, I meant that Twitter's current logic is like counting UTF-8 bytes, except that Twitter's scheme gives an emoji code point a cost of 1, whereas it "should" have a cost of 4.
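For anyone who wants to check the per-character costs being discussed, here is a rough Python sketch of a byte-based limit. The names `utf8_cost` and `fits`, and the 280-byte limit, are made up for illustration; this is not how Twitter actually counts.

```python
def utf8_cost(text: str) -> int:
    """Return the number of bytes in the UTF-8 encoding of text."""
    return len(text.encode("utf-8"))


def fits(text: str, limit: int = 280) -> bool:
    """Check a message against a hypothetical byte limit (280 is an assumption)."""
    return utf8_cost(text) <= limit


# One sample character per script mentioned in the thread.
samples = {
    "Latin": "a",
    "Greek": "α",
    "Cyrillic": "я",
    "CJK": "字",
    "Emoji": "😀",
}

for name, ch in samples.items():
    print(f"{name}: {utf8_cost(ch)} byte(s)")
```

This prints 1 byte for the Latin letter, 2 each for the Greek and Cyrillic letters, 3 for the CJK ideograph, and 4 for the emoji, so under a 280-byte limit a message of 140 Cyrillic letters would just fit while 141 would not.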