
(from the first puzzle)

> The venerable SMS system uses a message limit of 160 bytes. This was designed so that a message could fit in exactly one packet, thus being really cheap and fast to handle on first-generation mobile phone networks. Although the approach makes sense for technical reasons, it unfairly penalizes people who use non-latin (e.g. Russian, Greek, Japanese) alphabets - in most encodings, they need more bytes per character than latin alphabets.

Except that obviously the system is going to use an encoding that makes sense for the local language. It was long remarked that Chinese Twitter users enjoyed a less restrictive limit. [Practically no limit at all, since as this puzzle notes Twitter limited by the character instead of the byte.]

You need two bytes per character in Chinese (unless you really want to use UTF-8, which takes three bytes for most Chinese characters).

    是她吗?                           -  8 bytes
    Is that her?                      - 12 bytes
    是的                              -  4 bytes
    Yes                               -  3 bytes
    我做了很多宝宝的表情包            - 22 bytes
    I made a lot of stickers of her   - 31 bytes

This doesn't look like a penalty to me. If we did switch the Chinese into UTF-8, it would take about as much space as the English.
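The byte counts above are easy to check. Here is a small Python sketch; it assumes GBK as the two-byte Chinese encoding (the comment does not name one) and uses Python's built-in `gbk`, `utf-16-le` and `utf-8` codecs. `PHRASES` and `byte_counts` are illustrative names only.

```python
# Assumption: GBK stands in for "a two-byte Chinese encoding";
# the comment above does not say which one was used.
PHRASES = [
    "是她吗?",
    "Is that her?",
    "是的",
    "Yes",
    "我做了很多宝宝的表情包",
    "I made a lot of stickers of her",
]


def byte_counts(text: str) -> dict[str, int]:
    """Return the encoded length of `text` in GBK, UTF-16 and UTF-8."""
    return {
        "gbk": len(text.encode("gbk")),
        # The -le variant is used so no byte-order mark is counted.
        "utf-16": len(text.encode("utf-16-le")),
        "utf-8": len(text.encode("utf-8")),
    }


if __name__ == "__main__":
    for phrase in PHRASES:
        print(phrase, byte_counts(phrase))
```

With the ASCII `?` shown, GBK gives 7 bytes for 是她吗?; the listed 8 matches UTF-16/UCS-2, or GBK with a full-width ？. Either way, the UTF-8 column stays in the same range as the English byte counts, which is the comment's point.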



SMS in Europe had a maximum of 140 bytes per message, which holds 160 characters in a 7-bit encoding, and there were various custom 7-bit encodings for most Western languages. SMS also supported UCS-2, essentially Unicode with fixed 16-bit code units (so 70 characters per message). UCS-2 cannot represent modern emojis, but all normal languages can be shown; whether your phone has (or had) the fonts was another matter.
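The 160 and 70 figures follow from simple arithmetic on the 140-byte payload. The Python sketch below (helper names are hypothetical) computes them and checks whether a string fits in UCS-2, which only covers the Basic Multilingual Plane. It does not check membership in the GSM 7-bit default alphabet, which would need that character table.

```python
# A single SMS carries 140 bytes (1120 bits) of user data.
PAYLOAD_BYTES = 140


def max_chars(payload_bytes: int, bits_per_char: int) -> int:
    """How many fixed-width characters fit into `payload_bytes` bytes."""
    return payload_bytes * 8 // bits_per_char


def ucs2_encodable(text: str) -> bool:
    """UCS-2 uses one 16-bit unit per character and has no surrogate pairs,
    so it can only represent code points U+0000..U+FFFF (the BMP)."""
    return all(ord(ch) <= 0xFFFF for ch in text)


GSM7_LIMIT = max_chars(PAYLOAD_BYTES, 7)   # 7-bit alphabet
UCS2_LIMIT = max_chars(PAYLOAD_BYTES, 16)  # UCS-2
```

Most emojis live above U+FFFF (for example U+1F600), which is why a strict UCS-2 message cannot carry them while Greek, Cyrillic or CJK text fits fine.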

And when concatenating SMS messages, a User Data Header (UDH) of at least 6 bytes had to be added to each part, reducing the bytes available from 140 to 134, which allows only 153 characters per part for 7-bit messages or 67 per part for Unicode (UCS-2) messages.
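A sketch of the per-part arithmetic, assuming the minimal 6-byte concatenation header mentioned above and treating every character as fixed width. Real 7-bit text can cost more, since characters from the GSM extension table (such as `€`) take two septets; `per_part_limit` and `sms_parts` are hypothetical names.

```python
import math

PAYLOAD_BYTES = 140
CONCAT_UDH_BYTES = 6  # assumed minimal concatenation header


def per_part_limit(bits_per_char: int, concatenated: bool) -> int:
    """Characters that fit in one part, after reserving room for the UDH."""
    usable = PAYLOAD_BYTES - (CONCAT_UDH_BYTES if concatenated else 0)
    return usable * 8 // bits_per_char


def sms_parts(length: int, bits_per_char: int) -> int:
    """Number of SMS parts needed for `length` fixed-width characters."""
    # A message that fits in a single SMS needs no header at all.
    if length <= per_part_limit(bits_per_char, concatenated=False):
        return 1
    # Otherwise every part carries the header and holds fewer characters.
    return math.ceil(length / per_part_limit(bits_per_char, concatenated=True))
```

So a 161-character 7-bit message already needs two parts, and a 71-character UCS-2 message does too.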



