anotheronebites's comments

anotheronebites · on Oct 5, 2022

ASCII and Unicode are subtly different things.

You should compare ASCII with one of the UTF encodings of Unicode.

The "Unicode" counterpart of ASCII would be just the English alphabet extended with digits, punctuation, and a few control characters.
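
To make the distinction concrete, here is a minimal Python sketch (the variable names are illustrative, not from the thread). It shows one code point being turned into bytes by several encodings, and that ASCII and UTF-8 agree for code points below 128:

```python
# Unicode assigns code points; encodings turn code points into bytes.
text = "A"
code_point = ord(text)                    # 65, i.e. U+0041

ascii_bytes = text.encode("ascii")        # b'A'
utf8_bytes = text.encode("utf-8")         # b'A' (same single byte as ASCII)
utf16_bytes = text.encode("utf-16-le")    # b'A\x00'
utf32_bytes = text.encode("utf-32-le")    # b'A\x00\x00\x00'

# A character outside the ASCII range still has a code point,
# but ASCII has no byte sequence for it.
try:
    "\u00e9".encode("ascii")              # U+00E9, e with acute accent
    ascii_can_encode = True
except UnicodeEncodeError:
    ascii_can_encode = False
```

So the fair comparison is ASCII against UTF-8, UTF-16, or UTF-32, not against Unicode as a whole.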

giantrobot · on Oct 5, 2022

That's what the GP is saying. You can't really tell a priori what encoding to use for a stream of "text" (bytes). Without some sort of metadata about the stream you just have to guess. Convention will help you make an informed guess but it's not guaranteed to be correct. Then stuff breaks in unexpected and stupid ways.
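
A small Python sketch of the guessing problem, assuming a receiver that has only the bytes and no metadata (the sample word is just an illustration). The same bytes decode without error under two encodings, and only one result is what the sender meant:

```python
# The sender encodes text as UTF-8.
raw = "caf\u00e9".encode("utf-8")         # b'caf\xc3\xa9'

# A receiver that guesses UTF-8 recovers the original text.
as_utf8 = raw.decode("utf-8")             # 'café'

# A receiver that guesses Latin-1 also succeeds, but gets mojibake,
# because every byte value is a valid Latin-1 character.
as_latin1 = raw.decode("latin-1")         # 'cafÃ©'

# Nothing in the byte stream itself says which guess was right.
both_decoded_without_error = True
```

Neither decode raises, which is why a wrong guess tends to surface later as garbled output rather than as an immediate failure.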

falcolas · on Oct 5, 2022

Constraining Unicode encodings to fewer than 4 bytes means we limit how many countries can use text interfaces in their language, or how much data from those countries can be passed between programs.

blueflow · on Oct 5, 2022

We do not have enough countries to fill up all that space. For UTF-8 with a 4-byte restriction, less than 18% of the available space is currently allocated to blocks.
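
For a sense of scale, here is a rough Python sketch of the sizes involved. It does not verify the 18% figure, since that depends on the Unicode version and on which denominator is meant, neither of which the comment states. It only computes the capacity numbers:

```python
# The 4-byte UTF-8 form is 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx,
# so it carries 3 + 6 + 6 + 6 payload bits.
four_byte_payload_bits = 3 + 6 + 6 + 6
raw_capacity = 2 ** four_byte_payload_bits      # 2_097_152 bit patterns

# Unicode caps the code space at U+10FFFF.
code_space = 0x10FFFF + 1                       # 1_114_112 code points

# Surrogates (U+D800..U+DFFF) are reserved for UTF-16 and are not
# encodable as characters in UTF-8.
surrogates = 0xDFFF - 0xD800 + 1                # 2_048
scalar_values = code_space - surrogates         # 1_112_064
```

Either way, the space reachable with at most 4 bytes is on the order of a million code points.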