Interestingly, on my Ubuntu box, before I added the std::locale call it was printing "I have EUR100 to my name.", which is a pretty cool fallback.
I mean, really...C++ can work with individual bits; it can obviously support Unicode strings. The problem here is one of making the compiler aware of the details of the platform on which it's being used. The blame here appears to lie on Apple:
I'm just speechless.
Realistically, for serious text processing you need to leave the entire broken POSIX locale system and C++ wchar_t alone and work with libraries such as ICU.
In fact, I'm guessing that Apple never got around to fixing this because they probably don't use std::locale for real i18n work internally.
(update: the ICU homepage says that Mac OSX uses ICU internally -- http://site.icu-project.org/)
Try to avoid POSIX locales at all cost. Use ICU!
And yes, they are flawed, seriously so, mostly due to a lack of ambition. setlocale assumes that the locale the user is currently working in is also the locale the user has always been working in since time immemorial, and it assumes there is only one user. If you have a multilingual document, you're in trouble. If you use the locale date/time printer for anything other than immediate input and output to the user, you're in trouble (and there are circles of hell that certain Usenet newsreader authors will burn in for all eternity for this one). Any time you write a file out to disk with one locale and read it in with another you are potentially in horrible danger. This is where something like ICU comes in really handy; any time you have to deal with data from several different origins, you will be ill served by locale. So in that sense, you're quite right.
But all this is a very different problem from making printf work the way it is advertised to work: from making it possible to output, in UTF-8, a Unicode string that you have told the computer in every way you can that you want written in UTF-8. If you want to do simple text output and not deal with ICU, GNU recode, or whatever, you need to use setlocale, or your libc will mutely suppress everything but 7-bit ASCII and you will be very, very confused.
And because everybody else is ill served, I recommended using ICU and not touching setlocale() when processing Unicode text. I'm not sure where we disagree. What makes you think that I feel well served by POSIX locale?
std::cout << s << std::endl;
Also, he's completely wrong about the program terminating on output of a wide string; it's just that wcout is broken. If he had tried producing output afterwards with, say, printf(), he'd have noticed the program was still alive. Running it in GDB would show this just as well. It's generally speaking a bad idea to test whether your program is alive using the same mechanism that you suspect of killing it!
$ native2ascii Foo.java
public class Foo {
    public static void main(String[] args) {
        System.out.println("I have \u201a\u00c7\u00a8100 to my name.");
    }
}
So (untested, because I'm baking a cake) printing to fdopen(1, "wb") should just send bytes, allowing printf etc to work?
MacRoman? In 2009? Really?
It never would have occurred to me to use the built-in locale stuff. That's heading for a world of hurt.
Yeah right, Unicode might be impossible in C++ ...
... on OS X
What setting the locale to UTF-8 does is require (well, for very weak definitions of require; the standard is pretty damned quiet on anything other than C or POSIX locales) that the Unicode character sequence in question be output in the UTF-8 encoding. Presumably you could define a locale en_US.UTF-16 that output in UTF-16, although I would point and laugh if you did.
In sum: C89 doesn't say a damn thing of any use about wide chars; just that they exist and here are some convenient functions (and those only in the 1995 Amendment 1 to the standard). C99 requires that you be able to specify characters using Unicode code points in wide character strings but otherwise does not specify input or output. The locales, which are standardized only by convention, talk about encoding from a (usually) well specified disk format to whatever the library's internal representation is.
If you create a wide-char string (C++'s most obvious candidate for full Unicode-level support) and print it out to the console using wcout, the programmer should reasonably expect the library to perform transcoding as necessary to match the destination (provided things are configured correctly).
The fact that \u20ac maps to a code point in UCS2 or UTF-16 when encoding a wide string in text format for the compiler to read, is all but irrelevant. So long as the final data at runtime is in a valid encoding that matches its type, the runtime library should handle everything from there.