Not always. As far as I can tell writing garbage bytes to various APIs works fine unless they explicitly try to handle encoding issues. First time I noticed encoding issues in my code was when writing an xml structure failed on windows, all because of an umlaut in an error message I couldn't care less about. The solution was to simply kill any non ascii character in the string, not a nice or clean solution but the issue wasn't worth more effort.
> In python 3 it always blows up when you mix bytes with text so you can catch the issue early on.
That is nice if your job involves dealing with unicode issues. My job doesn't, any time I have to deal with it despite that is time wasted.
"Dealing with unicode" is really just about dealing with it at the input/output boundaries (and even then libraries handle it most of the time). But without the clear delineation that Python 3 provides, when you _do_ hit some issue you probably insert a "fix" in the wrong space. Leading to the classic Py2 "I just call decode 1000 times on the same string because I've lost track"
Interesting text follows company set naming schemes, which means all english and ascii. The rest could be random bytes for all I have to care about.
Many formats like plain text or zip don't have a fixed encoding and I am not going to start guessing which one it is for every file i have to read, there is no way to do that correctly. Dealing with that mess is explicitly something I want to avoid.
This is a lot of old code, and it's all ASCII, no matter what the locale of the system is. And even if the code was updated, all the messages would still be in some text == bytes encoding, because there's no "user data" involved, and the throughput desired is in many gigabytes of text processed per second.
So yeah, unicode is not "everywhere": it may be everywhere on the public internet, but there is a world beyond this.
So you can throw in your emoji and they might not correctly show up on the xml logging metadata I write, because I don't care. But they will end up in the processed file the same way they came in instead of <?> or some random Chinese or Japanese symbol that the guessing algorithm thought appropriate.
Also, there's no guessing happening in this instance. A locale configured in your environment variable are used if you open files using text mode.