echo "é" | iconv -t utf8 -f iso8859-15
I doubt these issues will go away within even, say, twenty years.
I've been writing code to clean up a 2013 database dump. The database stored everything in LATIN-1 fields. Not because the data is in LATIN-1, but because LATIN-1 will accept any byte value. This makes error messages during input go away. See this bad advice on Stack Overflow.
Some of the data is ASCII. Some is UTF-8. Some is Windows-1252. Some data is none of those, but is mostly ASCII except that there's a 0x9d once in a while. (Still haven't figured out what character set that is. From context, the ™ or ® symbol is intended.) So I have recognizers for these cases, and convert everything to UTF-8, testing every field value individually.
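The recognizer chain described above can be sketched in Python. This is a minimal illustration of the approach, not the author's actual code: try the strictest interpretation first, and let Windows-1252 with `errors="replace"` absorb undefined bytes like that stray 0x9d.

```python
def to_utf8(raw: bytes) -> str:
    """Decode a single field value to text, trying strict encodings first."""
    for encoding in ("ascii", "utf-8"):
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            pass
    # Windows-1252 accepts almost everything; the few bytes it leaves
    # undefined (0x81, 0x8d, 0x8f, 0x90, 0x9d) become U+FFFD instead
    # of raising an error.
    return raw.decode("cp1252", errors="replace")
```

Testing every field individually like this means a single mis-encoded value can't poison the interpretation of the whole column.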
One column has garbaged non-English names. Someone had tried to "normalize" UTF-8 to lower case by using an ASCII lowercasing function on UTF-8 stored in a LATIN-1 field:
KACMAZLAR MEKANİK -> kacmazlar mekanä°k
Anita Calçados -> anita calã§ados
Felfria Resor för att Koh Lanta -> felfria resor fã¶r att koh lanta
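That corruption is easy to reproduce. A plausible reconstruction in Python, assuming the lowercasing actually treated the stored UTF-8 bytes as LATIN-1 text (a strictly ASCII-only lowercaser would have left the Ä uppercase):

```python
def mangle(name: str) -> str:
    # Store UTF-8 bytes in a LATIN-1 column (so "ç" becomes "Ã§"),
    # then lowercase the result as if it were real text.
    return name.encode("utf-8").decode("latin-1").lower()

print(mangle("Anita Calçados"))  # anita calã§ados
```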
There are a lot of Unicode-hostile environments out there. Java is old enough that you have to declare the encoding explicitly for pretty much every tool ... compiler, documentation generator, etc. Forget it in any one place and you get garbage. Reading or writing text files should always make the encoding explicit, but rarely does. C#'s string methods all accept, but don't require, a Culture parameter, without which you're practically guaranteed to get case conversion or substring searches wrong in the general case. There was an excellent, long answer by tchrist on SO once about the Perl boilerplate needed to properly support Unicode in most circumstances (it's complicated and long, and I doubt many people go to those lengths).
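Python has the same trap: `open()` without an `encoding` argument falls back to a platform-dependent default, so a file written on one machine can come out garbled on another. A small sketch (the file name is made up for illustration):

```python
from pathlib import Path

# Hypothetical file, just for illustration.
path = Path("names.txt")
path.write_text("Calçados\n", encoding="utf-8")

# Be explicit on read too; the platform default may differ
# (e.g. a legacy code page on Windows).
text = path.read_text(encoding="utf-8")
```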
Point being, even when using something that supports Unicode well, the programmer still has to care, simply because text and language are messy things and it simply isn't possible to have a magic bullet that does everything right.
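For a concrete taste of that messiness: Unicode lowercasing isn't even length-preserving. In Python (which applies the default, non-Turkish case mapping), the Turkish dotted capital İ lowercases to two code points:

```python
s = "İ"                # U+0130, LATIN CAPITAL LETTER I WITH DOT ABOVE
lowered = s.lower()    # "i" followed by U+0307 COMBINING DOT ABOVE
print(len(s), len(lowered))  # 1 2
```

In a Turkish locale the correct answer would instead be the dotless-aware "i", which no locale-ignorant API can know.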
Here in CJK territory, using the wrong encoding makes the output so obviously broken that mistakes are almost always caught before hitting production.
If you happen to be using Windows configured in a foreign language the first time you start Outlook, your inbox, sent mail, and other folders are named in that language, will never change, and you'll have to live with the non-standard folder names.
At least it teaches you how to configure folders manually in most email clients.
For anyone who doesn't know OSX, this translation happens at the UI level. Typing ls in a terminal gives you the real directory name.
I'm 100% certain that computing (and the internet specifically) is altering human written language across the world, much like the printing press did.
It is just so much easier to avoid anything outside ASCII, because you can be certain ASCII will survive even some awful MS Access -> CSV -> SQL -> SQL -> Excel -> SQL -> COBOL ETL pipeline, no matter what version of any software is being used.
Technology has always shaped written language and we should fight to do better but at this point it seems inevitable.
(To be clear: I'm not saying this is a good or desirable state of affairs)
E.g. A, Ą, Å, Æ and Ä are all different letters in most languages that use them, not simply a base letter with a pronunciation guide. I think most European languages use at least two from that list.
Yes. In some languages those are actually not "markings" but denote proper letters, like the German ä, ö, ü and ß. But even where they don't, as in French, they can alter the meaning of words, e.g. la != là. Therefore, for most Europeans and speakers of other languages that need more letters than ASCII provides, it is very annoying when this is not supported properly.
However, I have found in a few cases that Americans in particular have a hard time understanding this. The remark about your wife not caring seems to be in this vein, too. Recently, I decided to convert our MySQL DB tables from latin1 to UTF8. (I wasn't even aware that we didn't have some form of unicode, as our DB is only a few years old, and I thought some form of unicode was the default everywhere nowadays. But then, MySQL...)
Anyway, my CEO (also an American, incidentally) tried to talk me out of it because he thought it wasn't high priority. However, we're about to go live in a French-speaking region which also has other indigenous languages (and therefore names) with their own "special" characters (I put "special" in quotes because for those languages they're not "special" at all -- but I guess you get my gist by now).
Also, in previous jobs I have converted legacy systems to unicode, so I know what a pain it is to do down the road. Not to mention all the hard-to-find bugs you get if you don't, because some strings don't compare as they should, or people are simply annoyed that their names aren't displayed correctly.
So I went ahead with the conversion anyway. We may never know for sure, but I'm convinced I saved us some major customer frustration, days of bug hunting, and weeks of converting everything later, when existing data would have needed to be migrated.
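When a latin1 column has been holding UTF-8 bytes all along, the encoding damage (though not a lossy lowercasing) can often be reversed by round-tripping. A sketch in Python of that common repair pattern:

```python
def unmangle(text: str) -> str:
    # Text that was really UTF-8 but got decoded as LATIN-1:
    # re-encode as LATIN-1 to recover the original bytes,
    # then decode them properly as UTF-8.
    return text.encode("latin-1").decode("utf-8")

print(unmangle("CalÃ§ados"))  # Calçados
```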
So please everyone, just use UTF8 or some other unicode variant from the get-go. The few bits you might save otherwise are just not worth it.