
\u20ac specifies a code point, not an encoding (specifically, code point U+20AC). The encoding of wchar_t in C89 is up to the implementor; GCC in particular uses UCS-4 on most platforms. But this is beside the point: C99 and C++ both require any language processor that runs across that sequence of characters to interpret it as the Unicode character €, or rather to behave as if it did. So you can have function names like "b\u00EAte" if you were for some reason incapable of typing ê.
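To make the code point versus encoding distinction concrete, here is a minimal sketch. The helper names euro_value and euro_length are made up for illustration, and the result assumes an implementation whose wchar_t holds Unicode code points directly (one that defines __STDC_ISO_10646__, as GCC with glibc does); other implementations may store something else.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: returns the value the compiler stored for the
   universal character name \u20ac in a wide string literal. */
unsigned long euro_value(void)
{
    static const wchar_t euro[] = L"\u20ac";
    return (unsigned long)euro[0];
}

/* Hypothetical helper: the literal is a single wide character,
   regardless of how many bytes it later takes on output. */
size_t euro_length(void)
{
    return wcslen(L"\u20ac");
}
```

Under that assumption euro_value() is 0x20AC, the code point itself, and nothing in the source file has said anything about UTF-8 or any other byte encoding yet.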

What setting the locale to UTF-8 does is require (well, for very weak definitions of require; the standard is pretty damned quiet on anything other than the C and POSIX locales) that the Unicode character sequence in question be output in the UTF-8 encoding. Presumably you could define a locale en_US.UTF-16 that outputs in UTF-16, although I would point and laugh if you did.
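A rough sketch of where the locale comes in, using the standard setlocale and wcstombs calls. The function name euro_in_locale is hypothetical, and which locale names exist ("en_US.UTF-8", "C.UTF-8", and so on) depends entirely on what the system has installed, so treat the name you pass as an assumption.

```c
#include <locale.h>
#include <stddef.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: converts the euro sign to the multibyte encoding
   of the named locale. Returns the number of bytes written (not counting
   the terminating null), or (size_t)-1 if the locale is unavailable or
   cannot represent the character. */
size_t euro_in_locale(const char *locale_name, char *out, size_t out_size)
{
    /* Only LC_CTYPE matters for character conversion. */
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return (size_t)-1;   /* locale not installed on this system */

    return wcstombs(out, L"\u20ac", out_size);
}
```

With a UTF-8 locale and a large enough buffer, the same wide character comes out as the three bytes E2 82 AC. The wide string did not change; only the locale's choice of output encoding did.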

In sum: C89 doesn't say a damn thing of any use about wide chars; just that they exist and here are some convenient functions (and most of those only in the 1995 Amendment 1). C99 requires that you be able to specify characters using Unicode code points in wide character strings but otherwise does not specify input or output. The locales, which are standardized only by convention, talk about encoding from a (usually) well-specified disk format to whatever the library's internal representation is.



