I agree with Rob's "UTF-8 everywhere". I took this approach in the TXR language. Its I/O streams output and input UTF-8, and only that. Period. (There is no virtual switch for alternative encodings.) Internally, everything is a wide character code point. I do not call the "fuck my C program function" known as setlocale, and no behavior related to character handling or localization is influenced by magic environment strings.
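
To make the "UTF-8 in, code points inside, no setlocale" idea concrete, here is a minimal C sketch of a decoder that turns UTF-8 bytes into code points without consulting the locale. It is not TXR's actual code: the names utf8_decode_one and utf8_decode are hypothetical, and substituting U+FFFD for malformed input is an assumption of this sketch rather than a description of what TXR does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REPLACEMENT 0xFFFDu  /* assumption: malformed bytes become U+FFFD */

/* Decode one code point from s, with n > 0 bytes available.
 * Stores the result in *cp and returns the number of bytes consumed.
 * On any malformed sequence, stores REPLACEMENT and consumes one byte. */
static size_t utf8_decode_one(const unsigned char *s, size_t n, uint32_t *cp)
{
    size_t len, i;
    uint32_t c, min;

    assert(s != NULL && cp != NULL && n > 0);

    if (s[0] < 0x80) {                    /* plain ASCII */
        *cp = s[0];
        return 1;
    } else if ((s[0] & 0xE0) == 0xC0) {   /* 110xxxxx: 2-byte sequence */
        len = 2; min = 0x80; c = s[0] & 0x1F;
    } else if ((s[0] & 0xF0) == 0xE0) {   /* 1110xxxx: 3-byte sequence */
        len = 3; min = 0x800; c = s[0] & 0x0F;
    } else if ((s[0] & 0xF8) == 0xF0) {   /* 11110xxx: 4-byte sequence */
        len = 4; min = 0x10000; c = s[0] & 0x07;
    } else {                              /* stray continuation or invalid lead */
        *cp = REPLACEMENT;
        return 1;
    }

    if (n < len) {                        /* truncated sequence */
        *cp = REPLACEMENT;
        return 1;
    }

    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) {      /* not a 10xxxxxx continuation byte */
            *cp = REPLACEMENT;
            return 1;
        }
        c = (c << 6) | (uint32_t)(s[i] & 0x3F);
    }

    /* Reject overlong forms, surrogates, and values beyond U+10FFFF. */
    if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)) {
        *cp = REPLACEMENT;
        return 1;
    }

    *cp = c;
    return len;
}

/* Decode up to cap code points from the n bytes at src into dst.
 * Returns the number of code points written. No locale is involved. */
size_t utf8_decode(const char *src, size_t n, uint32_t *dst, size_t cap)
{
    size_t i = 0, count = 0;

    assert(src != NULL && dst != NULL);

    while (i < n && count < cap)
        i += utf8_decode_one((const unsigned char *)src + i, n - i, &dst[count++]);

    return count;
}
```

Calling utf8_decode on the ten bytes of "a\xC3\xA9\xE2\x82\xAC\xF0\x9F\x98\x80" yields the four code points U+0061, U+00E9, U+20AC and U+1F600, whatever LANG or LC_ALL happen to contain, because nothing in the code reads the environment.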

LANG and LC_ALL are the work of ISO C and POSIX; they are not the fault of Linux. Linux has these in the name of compliance; they were foisted upon the free world, essentially.

