Our Discovery of Cramming (2017)

derefr · on March 20, 2018

> Currently, a certain range of Unicode blocks, including Basic Latin, are counted half as much as other blocks, such as CJK Unified Ideographs.

Has anyone made a Twitter clone where the "character limit" is simply a limit in the message's UTF-8 byte size? Seems like approximately where they ended up anyway. (Except, I guess, with Emoji being unwontedly cheap.)

Asooka · on March 21, 2018

That would penalise Cyrillic-script languages and Greek, which use more than one byte per letter in UTF-8 but need roughly the same number of letters per tweet as English.

nicky0 · on March 20, 2018

Expensive, I guess you mean.

derefr · on March 20, 2018

Nah, I meant that Twitter's current logic is like counting UTF-8 bytes, except that Twitter's scheme gives an emoji codepoint a cost of 1, whereas it "should" have a cost of 4.
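
For anyone who wants to check the byte costs being discussed, here is a rough Python sketch. It only measures UTF-8 encoded length; the function name `utf8_cost` and the sample characters are my own picks for illustration, not anything from Twitter's code.

```python
def utf8_cost(text: str) -> int:
    """Return the number of bytes `text` occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


# One sample character per script (illustrative choices only).
samples = {
    "Basic Latin": "a",
    "Cyrillic": "д",
    "Greek": "λ",
    "CJK ideograph": "漢",
    "Emoji": "😀",
}

for script, ch in samples.items():
    print(f"{script}: {utf8_cost(ch)} byte(s)")
```

Under a pure byte limit, the Cyrillic and Greek letters cost 2 each, the CJK ideograph 3, and the emoji 4, which is the penalty Asooka describes above.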

nicky0 · on March 21, 2018

Ah, got ya.

on March 20, 2018

[deleted]

dang · on March 20, 2018

Added. Thanks!