Somebody immediately saying it's an encoding error, several people "troubleshooting" by copy-pasting the resulting text into other programs, and finally someone three years later posting in all caps asking for help with a completely unrelated problem.
I find this line astonishing. I was hoping there would be more to this story than "we fixed Notepad and left the OS function broken", but after following the reference it seems they just didn't think it was worth fixing.
That said, I do find it very annoying that Windows now puts a useless BOM at the beginning of every file.
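For anyone bitten by that BOM: in UTF-8 it's the three bytes EF BB BF, and a decoder that doesn't expect it keeps it as an invisible U+FEFF character at the start of the text. Here's a minimal Python sketch using the standard "utf-8-sig" codec, which drops one leading BOM if present and otherwise behaves like plain UTF-8 (strip_utf8_bom is just an illustrative helper name):

```python
import codecs

def strip_utf8_bom(data: bytes) -> str:
    """Decode UTF-8 bytes, dropping a single leading BOM (EF BB BF) if there is one."""
    # "utf-8-sig" strips one leading BOM; bytes without a BOM decode like "utf-8".
    return data.decode("utf-8-sig")

with_bom = codecs.BOM_UTF8 + "hello".encode("utf-8")
print(with_bom[:3])                     # b'\xef\xbb\xbf'
print(repr(with_bom.decode("utf-8")))   # '\ufeffhello' (plain utf-8 keeps the BOM as U+FEFF)
print(strip_utf8_bom(with_bom))         # hello
print(strip_utf8_bom(b"hello"))         # hello (no BOM, nothing to strip)
```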
Some of the weirdness in the Unicode spec even comes from the need for backwards compatibility. 17 planes and 1,112,064 total usable codepoints (1,114,112 minus the 2,048 surrogates)... these numbers would not have been arrived at if the designers had foreseen that 16 bits wasn't enough in the first place. They were derived from reassigning private use area codepoints into surrogate pairs for UTF-16. Unicode would probably rather have (and maybe should have) started out as a 32-bit spec and avoided this mess from the get-go.
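The arithmetic behind those numbers, as a quick Python sketch: UTF-16 sets aside 1,024 high and 1,024 low surrogates, and each high/low pair addresses one codepoint above the BMP. That gives 1,024 × 1,024 = 1,048,576 extra codepoints, i.e. exactly 16 more planes of 65,536, for 17 in total. (to_surrogate_pair is an illustrative helper, not a library function.)

```python
# Where "17 planes" and "1,112,064" come from, using the UTF-16 surrogate ranges.
HIGH_SURROGATES = range(0xD800, 0xDC00)   # 1,024 lead units
LOW_SURROGATES = range(0xDC00, 0xE000)    # 1,024 trail units

bmp = 0x10000                                                # plane 0: 65,536 codepoints
supplementary = len(HIGH_SURROGATES) * len(LOW_SURROGATES)   # 1,048,576 = 16 planes
total = bmp + supplementary                                  # 1,114,112 = 0x110000
planes = total // 0x10000                                    # 17
# Surrogates are reserved for UTF-16 and can never stand for characters themselves.
scalar_values = total - (len(HIGH_SURROGATES) + len(LOW_SURROGATES))  # 1,112,064

def to_surrogate_pair(cp: int) -> tuple[int, int]:
    """Split a supplementary codepoint (U+10000..U+10FFFF) into UTF-16 lead/trail units."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only supplementary-plane codepoints need a surrogate pair")
    offset = cp - 0x10000                    # a 20-bit value
    # Top 10 bits go into the high surrogate, bottom 10 bits into the low one.
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)

print(total, planes, scalar_values)                   # 1114112 17 1112064
print([hex(u) for u in to_surrogate_pair(0x1F600)])   # ['0xd83d', '0xde00']
```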
Maybe "older", but very much current: Qt's QString type uses UTF-16 internally. https://doc.qt.io/qt-5/qstring.html
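One practical consequence of a UTF-16 string type like QString: its size() counts 16-bit units (QChars), not characters, so anything outside the BMP counts as two. A rough Python illustration; utf16_code_units is a made-up helper that simply measures the UTF-16 encoding and does not call Qt:

```python
def utf16_code_units(s: str) -> int:
    """Number of 16-bit units needed to store s as UTF-16 (2 bytes per unit)."""
    return len(s.encode("utf-16-le"))  // 2

SAMPLES = {
    "ASCII 'Qt'": "Qt",
    "U+00E8 e-grave": "\u00e8",
    "U+1F600 emoji": "\U0001F600",
}
for label, text in SAMPLES.items():
    # Python's len() counts codepoints; a UTF-16 container counts code units.
    print(f"{label}: {len(text)} codepoint(s), {utf16_code_units(text)} UTF-16 unit(s)")
# ASCII 'Qt': 2 codepoint(s), 2 UTF-16 unit(s)
# U+00E8 e-grave: 1 codepoint(s), 1 UTF-16 unit(s)
# U+1F600 emoji: 1 codepoint(s), 2 UTF-16 unit(s)
```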
Sadly, that wasn't it. 16 bits weren't enough, and things like the fact that a Unicode codepoint ≠ a printed character (e.g. [è] can be either a single codepoint, or the Latin letter [e] followed by a combining [`] accent) meant there was basically no point in using 16/32-bit chars in the first place. By the time people really understood this, it was almost the '00s, and things like Python, Windows, macOS (due to NeXTSTEP), Java, .NET and Qt were stuck. It's impossible to go back to plain `char` without annihilating backward compatibility, so everyone kept using wide chars internally.
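The [è] case in Python: both forms render the same, but they are different codepoint sequences of different lengths, and they only compare equal after normalizing with the standard unicodedata module. So even 32-bit "characters" don't give you one unit per visible character:

```python
import unicodedata

precomposed = "\u00e8"   # è as one codepoint: LATIN SMALL LETTER E WITH GRAVE
decomposed = "e\u0300"   # 'e' followed by U+0300 COMBINING GRAVE ACCENT

print(precomposed == decomposed)           # False: different codepoint sequences
print(len(precomposed), len(decomposed))   # 1 2
# Canonical normalization maps one form onto the other.
print(unicodedata.normalize("NFC", decomposed) == precomposed)   # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)   # True
```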
Fun fact: some of the languages and frameworks I mentioned never bothered switching completely to UTF-16; for instance, Python now uses a weird mixture of ASCII and UCS-2 internally.
Really? Last I heard (PEP 393), the rule was: "8 bits if all codepoints are less than 256 (i.e. Latin-1); 16 bits if all codepoints are less than 2^16 (i.e. BMP); otherwise 32 bits". This means that text made up entirely of Latin-1 characters (which are exactly the first 256 codepoints of Unicode) is stored internally as, well, Latin-1. Since ASCII is a subset of Latin-1, ASCII strings are stored as ASCII.
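You can watch the PEP 393 widths from Python itself. This sketch is CPython-specific (sys.getsizeof reports implementation details, and other interpreters may differ): two freshly built strings of the same kind differ only in payload length, so the size difference per extra character gives the storage width. bytes_per_char is just an illustrative name:

```python
import sys

def bytes_per_char(ch: str) -> int:
    """Estimate CPython's per-character storage for a string made only of ch."""
    # Both strings share the same header size; only the payload grows by 100 chars.
    return (sys.getsizeof(ch * 201) - sys.getsizeof(ch * 101)) // 100

for ch in ("a", "\u00e9", "\u4e2d", "\U0001F600"):
    print(f"U+{ord(ch):04X}: {bytes_per_char(ch)} byte(s) per character")
# U+0061: 1 byte(s) per character    (ASCII)
# U+00E9: 1 byte(s) per character    (Latin-1)
# U+4E2D: 2 byte(s) per character    (BMP)
# U+1F600: 4 byte(s) per character   (outside the BMP)
```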
These languages don't even provide alternative APIs that are easy to use, which means it'll be some time before we stop having to suffer this.
The other one I remember her sending me was along the lines of "OMG if you enter that plane's flight number into MS Word 97 and set the font to Wingdings <variant whatever> you get a picture of a plane, two buildings, a skull and a Star of David!!1!eleven".
On a side note: If you entered the right combination of text into Excel 97, you could fly a plane over a fractal landscape ;-)
The kind of "logic" at work here is still used nowadays (e.g. the Sandy Hook school shooting conspiracy theory, whose followers pointed to the name appearing in a movie from around that time). What the logic behind this is supposed to be still eludes me. So if you plan a massive government conspiracy, you make sure to plant very precise, hidden clues all over the place in movies, TV shows, random office software and similar things years in advance, because... um... why exactly?
I wonder if anyone with more historical perspective knows of older examples of blatantly false theories like these, or if this type of conspiracy theory is unique to the digital age.
The Salem Witch Trials come to mind.
Similarly, a story came out during the Windows 3.1 era about how typing NYC with your font set to Wingdings yielded a skull and crossbones, a Star of David, and a thumbs up sign, stoking fears of antisemitism in a time of religious tension and a revitalization of the right wing after the Waco siege. While it is true that the letters in NYC mapped to those symbols, it was not deliberate. In the successor font Webdings, the letters NYC were deliberately mapped to an eye, a heart, and a city skyline (referencing "I love New York").
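The mechanics are also why nothing is "hidden" in the text: a symbol font like Wingdings doesn't change the characters you typed, it only changes which glyph the font draws for each character code. A toy Python sketch, assuming a hand-written three-entry lookup table that approximates the Wingdings glyphs for N, Y and C with the closest Unicode characters (the table and render_as_wingdings are illustrative, not a real font API):

```python
# Hypothetical, hand-written table: Wingdings letter -> closest Unicode character.
WINGDINGS_TO_UNICODE = {
    "N": "\u2620",       # skull and crossbones
    "Y": "\u2721",       # Star of David
    "C": "\U0001F44D",   # thumbs up sign
}

def render_as_wingdings(text: str) -> str:
    """Mimic a symbol font: same character codes in, different pictures out."""
    # Letters missing from this toy table are shown as "?".
    return "".join(WINGDINGS_TO_UNICODE.get(ch, "?") for ch in text)

typed = "NYC"
print(typed.encode("ascii"))         # b'NYC': the stored bytes never change
print(render_as_wingdings(typed))    # ☠✡👍
```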
Edit: Minor clarification: I'm referring to past experience translating Chinese to English in particular.