
Our Discovery of Cramming (2017) - kevlar1818
https://blog.twitter.com/engineering/en_us/topics/insights/2017/Our-Discovery-of-Cramming.html
======
derefr
> Currently, a certain range of Unicode blocks, including Basic Latin, are
> counted half as much as other blocks, such as CJK Unified Ideographs.

Has anyone made a Twitter clone where the "character limit" is simply a limit
in the message's UTF-8 byte size? Seems like approximately where they ended up
anyway. (Except, I guess, with Emoji being unwontedly cheap.)

~~~
nicky0
Expensive, I guess you mean.

~~~
derefr
Nah, I meant that Twitter's current logic is like counting UTF-8 bytes, but
that Twitter's scheme gives an emoji codepoint a cost of 1, whereas it
"should" have a cost of 4.

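To make the comparison concrete, here's a rough Python sketch. `utf8_cost` is
just the encoded byte length. `toy_weighted_cost` is a hypothetical stand-in
for the scheme as described in this thread (Basic Latin costs 1, emoji cost 1,
everything else such as CJK costs 2); the function names, the emoji range, and
the weights are assumptions for illustration, not Twitter's actual table.

```python
def utf8_cost(text: str) -> int:
    """Cost of a message if the limit were its UTF-8 byte size."""
    return len(text.encode("utf-8"))


def toy_weighted_cost(text: str) -> int:
    """Toy per-code-point weighting, following the claims in this thread.

    Assumed weights (not Twitter's real configuration):
      - Basic Latin (U+0000..U+007F): 1
      - a rough emoji range (U+1F300..U+1FAFF): 1
      - everything else, e.g. CJK Unified Ideographs: 2
    """
    total = 0
    for ch in text:
        cp = ord(ch)
        if cp <= 0x7F:
            total += 1
        elif 0x1F300 <= cp <= 0x1FAFF:
            total += 1
        else:
            total += 2
    return total


for sample in ["a", "漢", "😀"]:
    print(repr(sample), utf8_cost(sample), toy_weighted_cost(sample))
```

The two counts agree on "a" (1 and 1), are close on "漢" (3 bytes vs. a weight
of 2), and diverge on "😀" (4 bytes vs. a weight of 1), which is the gap I
meant.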
~~~
nicky0
Ah, got ya.

