
I can see why this might be confusing—I chose the word "character" carefully. The annex you cited clarifies that noncharacters are code points that can be used in interchange. However, this does not mean that a noncharacter is a character.

The number of code points available for use as characters must, by definition, exclude the noncharacters. So, to expand on the original comment: Unicode defines 1114112 code points (U+0000 through U+10FFFF). Excluding the 2048 surrogates leaves 1112064 that can be used in interchange, and excluding the 66 noncharacters as well leaves 1111998 that can be defined as characters. UTF-16 can only represent the 1112064 that are valid for interchange, and the 66 noncharacters should generally be avoided (especially U+FFFE, which is the byte-swapped form of the byte order mark U+FEFF).
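
If you want to check the arithmetic yourself, here is a rough Python sketch that just counts the ranges. The helper names (`is_surrogate`, `is_noncharacter`) are my own, not from any library, and it assumes the usual definitions: surrogates are U+D800 through U+DFFF, and noncharacters are U+FDD0 through U+FDEF plus the last two code points of each of the 17 planes.

```python
MAX_CODE_POINT = 0x10FFFF  # highest Unicode code point


def is_surrogate(cp: int) -> bool:
    """True for the 2048 surrogate code points U+D800..U+DFFF."""
    return 0xD800 <= cp <= 0xDFFF


def is_noncharacter(cp: int) -> bool:
    """True for U+FDD0..U+FDEF and for U+nFFFE / U+nFFFF in every plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


total = MAX_CODE_POINT + 1
# Code points valid for interchange: everything except surrogates.
scalar_values = sum(1 for cp in range(total) if not is_surrogate(cp))
# 32 in the U+FDD0 block plus 2 per plane across 17 planes.
noncharacters = sum(1 for cp in range(total) if is_noncharacter(cp))
# Code points that could ever be assigned to a character.
assignable = scalar_values - noncharacters

print(total, scalar_values, noncharacters, assignable)
```

The surrogate and noncharacter ranges do not overlap, which is why a plain subtraction is enough for the last figure.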



