anotheronebites's comments

ASCII and Unicode are subtly different things.

You should compare ASCII with one of the UTF encodings of Unicode.

The "Unicode" equivalent for ASCII would be just its character set: the English alphabet extended with digits, punctuation, and a few control characters.
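To make the character-set vs. encoding distinction concrete, here is a short Python sketch (the sample string is arbitrary). It puts the code points, which are what Unicode defines, next to the bytes two UTF encodings produce for them, and shows how ASCII lines up with UTF-8.

```python
# Unicode assigns each character an abstract code point (a number);
# an encoding such as UTF-8 or UTF-16 turns code points into bytes.
text = "Aé"
code_points = [f"U+{ord(ch):04X}" for ch in text]
print(code_points)                      # ['U+0041', 'U+00E9']

# The same code points produce different bytes under different UTF encodings.
utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16-le")  # little-endian, no byte-order mark
print(utf8_bytes)                       # b'A\xc3\xa9'
print(utf16_bytes)                      # b'A\x00\xe9\x00'

# ASCII is a 128-character set with its own one-byte encoding. For those
# 128 characters, UTF-8 produces exactly the same bytes.
ascii_bytes = "A".encode("ascii")
print(ascii_bytes == "A".encode("utf-8"))   # True

# "é" is outside ASCII's character set, so the ASCII codec rejects it.
try:
    "é".encode("ascii")
except UnicodeEncodeError as err:
    print("not ASCII:", err.reason)     # not ASCII: ordinal not in range(128)
```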


That's what the GP is saying. You can't really tell a priori which encoding to use for a stream of "text" (bytes). Without some sort of metadata about the stream, you just have to guess. Convention will help you make an informed guess, but it's not guaranteed to be correct. Then stuff breaks in unexpected and stupid ways.
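A small Python illustration of the guessing problem: the same bytes read as different text depending on the assumed encoding. `guess_decode` is a hypothetical heuristic, not a standard-library function or a recommended detector. It tries strict UTF-8 and falls back to Latin-1, and the last call shows it confidently returning the wrong answer.

```python
def guess_decode(data: bytes) -> tuple[str, str]:
    """Hypothetical heuristic: try strict UTF-8 first, fall back to Latin-1.

    Latin-1 maps every byte to some character, so the fallback never fails,
    which also means it can silently produce the wrong text.
    """
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        return data.decode("latin-1"), "latin-1"

raw = "café".encode("utf-8")            # b'caf\xc3\xa9'

print(raw.decode("utf-8"))              # café
print(raw.decode("latin-1"))            # cafÃ©  (same bytes, wrong assumption)
print(guess_decode(raw))                # ('café', 'utf-8')
print(guess_decode(b"caf\xe9"))         # ('café', 'latin-1')

# Latin-1 text that happens to be valid UTF-8 is misread without complaint.
print(guess_decode("Ã©".encode("latin-1")))   # ('é', 'utf-8')
```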


Constraining Unicode encodings to fewer than 4 bytes per character (in UTF-8, a 3-byte cap stops at U+FFFF, the end of the Basic Multilingual Plane) limits how many countries can use text interfaces in their own language, or how much data from those countries can be passed between programs.
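The byte ranges behind that cap can be checked directly. This Python sketch computes the UTF-8 length of a code point from the RFC 3629 ranges (`utf8_length` is a hypothetical helper name) and cross-checks it against Python's own encoder. The sample characters are arbitrary picks from inside and outside the Basic Multilingual Plane.

```python
def utf8_length(cp: int) -> int:
    """Bytes UTF-8 needs for a Unicode scalar value (RFC 3629 ranges).

    Surrogates (U+D800..U+DFFF) fall in the 3-byte range here but are not
    encodable in well-formed UTF-8; the samples below avoid them.
    """
    if cp < 0x80:
        return 1
    if cp < 0x800:
        return 2
    if cp < 0x10000:
        return 3
    if cp <= 0x10FFFF:
        return 4
    raise ValueError("beyond the Unicode codespace")

# Arbitrary samples: ASCII, Latin-1 Supplement, a BMP CJK ideograph,
# a CJK Extension B ideograph (plane 2), and an emoji (plane 1).
samples = ["A", "é", "中", "𠀀", "😀"]

for ch in samples:
    cp = ord(ch)
    n = utf8_length(cp)
    assert n == len(ch.encode("utf-8"))   # cross-check against Python's codec
    fits = "yes" if n <= 3 else "no"
    print(f"U+{cp:04X}: {n} byte(s), fits a 3-byte cap: {fits}")
```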


We do not have enough countries to fill up all that space. For UTF-8 with a 4-byte restriction, less than 18% of the available space is currently allocated to blocks.
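The size of that space is easy to compute. The Python sketch below derives the codespace (17 planes of 65,536 code points) and then counts code points that the interpreter's bundled Unicode database treats as assigned characters. Note that this counts assigned characters, a different measure from space allocated to blocks (blocks also contain unassigned positions), so it is not meant to reproduce the 18% figure; the result also depends on the Unicode version your Python ships with.

```python
import unicodedata

PLANES = 17
PLANE_SIZE = 0x10000                        # 65,536 code points per plane
codespace = PLANES * PLANE_SIZE             # U+0000..U+10FFFF
assert codespace == 0x10FFFF + 1 == 1_114_112

# Payload bits in the 4-byte UTF-8 pattern 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
# (3 + 6 + 6 + 6 = 21); RFC 3629 restricts UTF-8 to U+10FFFF anyway.
raw_4byte_capacity = 2 ** 21                # 2,097,152

# Count code points this Python's Unicode database treats as assigned,
# excluding unassigned/noncharacters (Cn), private use (Co), and surrogates (Cs).
excluded = {"Cn", "Co", "Cs"}
assigned = sum(
    1 for cp in range(codespace)
    if unicodedata.category(chr(cp)) not in excluded
)

print(f"Unicode database version: {unicodedata.unidata_version}")
print(f"codespace: {codespace:,} code points")
print(f"assigned characters: {assigned:,} ({assigned / codespace:.1%})")
```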

