Backup tools written in Python are my worst terror because of exactly that: they kept destroying our clients' data because those clients dared (gasp!) to name files with characters from their own language, or to do unthinkable things like create a document titled "CV - <name with non-Unicode characters>.docx".
The fact that Python 3 at least tries to stop programmers from destroying data the moment someone types in a foreign name (which happens even in the USA) is a good thing.
You can have badly written tools in any language. Python even has functions that return file paths as bytes (e.g. [os.getcwdb](https://docs.python.org/3/library/os.html#os.getcwdb)); it's just that most people don't use them, because horribly broken filenames are rare-ish and the bytes APIs are less convenient.
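For instance, Python's bytes-oriented path APIs plus the surrogateescape error handler let a program pass a non-UTF-8 filename through untouched (a sketch; the exact decoded string depends on the filesystem encoding, but the round trip holds either way):

```python
import os

# A byte filename that is not valid UTF-8 (Latin-1 "café.txt").
raw = b"caf\xe9.txt"

# os.fsdecode applies the filesystem encoding with the surrogateescape
# error handler, so undecodable bytes become lone surrogate code points...
name = os.fsdecode(raw)

# ...and os.fsencode reverses that mapping exactly, byte for byte.
assert os.fsencode(name) == raw

# The bytes variants skip str entirely: os.getcwdb(), os.listdir(b".")
# and friends take and return bytes, so nothing is ever decoded at all.
assert isinstance(os.getcwdb(), bytes)
```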
Do other languages get this right 100% of the time on all platforms? I don't think so, it's just you've never noticed.
* C: has no concept of Unicode strings per se; it may or may not work depending on the implementation and on how you choose to display the names (a CLI probably "works", a GUI probably doesn't)
* Rust: seems to assume UTF-8? (https://doc.rust-lang.org/std/ffi/index.html#conversions)
* Go: gets this right, but probably breaks on Windows? "string is the set of all strings of 8-bit bytes, conventionally but not necessarily representing UTF-8-encoded text" (https://golang.org/pkg/builtin/#string)
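To make the Python side of the comparison concrete, here's the kind of failure it turns loud rather than silent (a sketch; `\udce9` is the lone surrogate that surrogateescape produces for the raw byte `0xE9`):

```python
# A str produced by surrogateescape-decoding a non-UTF-8 filename.
bad = "caf\udce9.txt"  # \udce9 stands in for the raw byte 0xE9

# Strict UTF-8 encoding refuses to emit the lone surrogate, so a naive
# "write the name into a UTF-8 log/archive" step fails with an
# exception instead of silently corrupting the name...
try:
    bad.encode("utf-8")
    raised = False
except UnicodeEncodeError:
    raised = True
assert raised

# ...while the surrogateescape handler recovers the original bytes.
assert bad.encode("utf-8", "surrogateescape") == b"caf\xe9.txt"
```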
In short, either I don't understand what point you're making, or it isn't unique to Python.
> This means that a plain string is defined as an array of 8-bit Unicode code units. All array operations can be used on strings, but they will work on a code unit level, and not a character level. At the same time, standard library algorithms will interpret strings as sequences of code points, and there is also an option to treat them as sequence of graphemes by explicit usage of std.uni.byGrapheme.
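Python splits that same code-unit/code-point distinction across two types rather than one: `bytes` indexes 8-bit code units, `str` indexes code points (grapheme-level iteration needs a third-party library). A sketch:

```python
s = "héllo"            # str: indexed by code point
b = s.encode("utf-8")  # bytes: indexed by 8-bit code unit

assert len(s) == 5     # five code points
assert len(b) == 6     # "é" occupies two UTF-8 code units
assert s[1] == "é"
assert b[1:3] == "é".encode("utf-8")  # the two bytes b"\xc3\xa9"
```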
And perhaps my favorite part (https://tour.dlang.org/tour/en/gems/unicode):
> According to the spec, it is an error to store non-Unicode data in the D string types; expect your program to fail in different ways if your string is encoded improperly.
I should note that what I really like about this approach is the total lack of ambiguity. There is no question about what belongs in a string: if it's not UTF-encoded, you had better be using a byte or ubyte array, or you are doing it wrong by definition.
What I like about the D approach isn't that declaring it an error actually solves anything directly (obviously it doesn't), but that it removes any ambiguity about how things are expected to work. If the encoding of strings isn't well defined, then when you're writing a library, which encodings will users expect you to accept? Or worse, which encodings does that minimally documented library function you're about to call accept?
They’re conversions because the data isn’t UTF-8 in the first place; that is, it isn’t String/str. The conversions are as cheap as we can make them. That wording is meant to describe converting from OsString to String, not from the OS to OsString.