The problems solved by LANG or LC_ALL are not solved by UTF-8 alone. Even if you use UTF-8 for all your input and output, there is still the question of how to format numbers and dates for the user and how to collate strings.
These things depend on country and language, sometimes even varying between different places in a single country (in Switzerland, the German-speaking part uses . as the decimal separator, while the French-speaking part prefers ,).
These things are entirely independent of the encoding of your strings, and they still need to be defined. Also, because this is something that basically every application needs, it's something the user very likely prefers to set only once, in one place.
Environment variables don't feel like too bad a place for that.
This localization BS has spawned a whole class of nonsense where, for example, CSV files are not actually CSV in some regions: because the comma is already taken as the decimal separator there, the values are not COMMA-separated (as the name implies) but semicolon-separated. And we programmers have to deal with it somehow, not to mention obsolete Faildows encodings like CP1251, still widely used here in lots of tech-slowpoke organizations.
So: one encoding, one datetime format, one numeric format for the world and for the win. Heil UTF-8!
As we're talking encodings: the worst file I ever had to deal with combined, within one file, UTF-8, CP437 and CP850.
I guess they had DOS and Unix machines but no Windows boxes touching that file.
This is a problem that won't go away. Many developers are not aware of how character encoding, let alone Unicode, actually works and, what's the worst about this mess, many times, they can get away without knowing.
Humans find thousands separators useful. You're asking humans to give up useful things because they're hard to program.
That said, I idly wonder whether they could be implemented with font kerning. The bytes could be 123456.78, but the font could render it with extra space, as 123 456.78.
I don't know if it's possible with current font technology, and there are probably all sorts of problems with it even if it is, but it might be vaguely useful.
I agree, though, that this can (and should) be solved at the font-rendering level, not at the application level.
See also paper sizes and electrical power outlets.
Your point's correct, but linefeed hasn't died: it's still the line ending on Unixes. Old Macs used carriage return; Windows uses carriage return plus line feed; Unix uses linefeed. I don't know what Mac OS X uses, because I stopped using Macs before it came out.
I use miles for the sport of running. This is because 1609 meters is close to 1600. Four laps around a standard 400 meter track is about a mile and everything follows from that. All my training is based on miles. I think of paces per mile. If I'm traveling abroad and some hotel treadmill is in kilometers and km/h, it annoys the heck out of me.
However, paradoxically, road signs and car speedometers in miles and miles/hour also annoy the heck out of me; though since I use miles for running, at least I'm no stranger to the damn things.
For laying out circuit boards, I use mils, which are thousandths of an inch: they are a subdivision which gives a metric air to an imperial measure. This is not just personal choice: they are a standard in the electronics industry. The pins of a DIP (the old-school large one) are spaced exactly 100 mils (0.1") apart, and the rows are 300 mils apart. So you generally want a grid in mil divisions. (The finer-pitched DIPs are 0.05" -- 50 mils.)
There is something nice about a mil in that when you're working with small things on that scale, it's just about right. A millimeter is huge. The metric system has no nice unit which corresponds to one mil. A micron is quite small: a mil is 25.4 microns. (How about ten of them and calling it a decamicron? Ha.)
Inches themselves are also a nice size, so I tend to use them for measuring household things: widths of cabinets and shelves and the like. Last time I designed a closet shelf, I used Sketchup and everything in inches.
Centimeters are too small. Common objects that have two-digit inch measurements blow up to three digits in centimeters.
Centimeters don't have a good, concise way to express the precision of a measurement (other than the ridiculous formality of adding a +/- tolerance). In inches, I can quote something as being 8 1/16 inches long. This tells us not only the absolute length but also the granularity: the fact that I didn't say 8 2/32 or 8 4/64 tells you something: that I care only about sixteenth precision. The 8 1/16 measurement is probably a concise approximation of something that lies between 8 1/32 and 8 3/32.
In centimeters, a measurement like 29 cm may be somewhat crude, but 29.3 cm might be ridiculously precise. It makes 29.4 look wrong, even though it may be the case that anything in the 29.1-29.5 range is acceptable. The 10X jump in scale between centimeters and millimeters is just too darn large. The binary divisions in the imperial system give you about 3.3 geometric steps inside one order of magnitude, which is useful. For a particular project, you can choose that it's going to be snapped to a 1/4" grid, or 1/8", or 1/16", based on the required precision.
So for these reasons, I have gravitated toward inches, even though I was raised metric, and came to a country that turned metric before I got here. (And of course, the easy availability of rulers and tape measures marked in inches, plus support in software applications, and the enduring use of these measures in trade: e.g. you can go to a hardware store in Canada and find 3/4" wood.)
P.S. And yes, my ruler is made from aluminium, not aluminum.
Both the words "aluminium" and "aluminum" are British inventions. Both derive from "alumina", a name given in the 1700s to aluminum oxide. That word comes from the Latin "alumen", from which the word "alum" is also derived.
"Aluminum" was coined first, by English chemist Sir Humphry Davy, in 1808. He first called it "alumium", simply by adding "-ium" to "alum" (as in, the elemental base of alum, just like "sodium" is the elemental base of soda), and then added "n" to make "aluminum". In 1812, British editors replaced Davy's new word with "aluminium", keeping Davy's "n", but restoring the "-ium" suffix which coordinated with the other elements like potassium.
North Americans stuck with Davy's original "aluminum".
In Slovakia, we have a nice word for it: hliník, derived from hlina (clay).
Also, how on earth is it a good idea to make the core string routines in the library be influenced by this cruft? What if I have some locale set up, but I want part of my program to just have the good old non-localized strcmp?
The C localization stuff is founded on wrong assumptions such as: programs can be written ignorant of locale and then just localized magically by externally manipulating the behavior of character-handling library routines.
Even if that is true of some programs, it's only a transitional assumption. The hacks you develop for the sake of supporting a transition to locale-aware programming become obsolete once people write programs for localization from the start, yet they live on because they have been enshrined in standards.
Can I really expect it to work if I set different LC_* categories to locales with different encodings? How would the two encodings be used? How would they be used in a message containing both monetary and datetime parts?
Should the setting not be one for encoding (selected from a range of encodings), then settings for formatting and messages (selected from a range of locales), and finally a setting for collation, which is both a locale and an encoding? Or does the Linux locale system simply use these as keys, so that in reality LC_TIME behaves the same whether you use encA or encB, and only the locale prefix en_GB matters?
Full month names would be encoded in encA. Currency symbols in encB. Is it a good idea? No.
>Should the setting not be one for encoding (selected from a range of encodings), then settings for formatting and messages (selected from ranges of locales), then finally a setting for collation which is both a locale and an encoding?
I would argue an encoding setting should not be there to begin with, or at most should be application-specific, because that really doesn't depend on the system locale (as long as the characters used by the system locale can be represented in the encoding used by the application).
I was just explaining why LC_* should exist even on a strictly UTF-8-everywhere system. I never said storing the encoding in the locale was a good idea (nor is it part of the official locale specification; it's a POSIX-ism).
It's even worse when things assume that my date preferences reflect my unit preferences. I prefer standard units (feet, pounds, knots &c.) and British/Continental dates: I don't want to use French units, nor do I want to use American dates. And yet so much software assumes that it's all or nothing.