The Python 3 string design also requires scanning, and often transcoding, every piece of string data it encounters, both on the way in and again on the way out. As a result, the string type is inappropriate not only for any data that might not be valid Unicode but also for any data that might be large.
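To illustrate the point, here is a small Python sketch. `to_text` is a hypothetical helper, not part of any library. Turning bytes into a `str` means decoding, and therefore checking, every byte. A single invalid byte makes the whole input unusable as a strict `str`. Writing the text back out then re-encodes it in a second full pass.

```python
def to_text(data: bytes):
    """Hypothetical helper: strict UTF-8 decode, or None if any byte is invalid."""
    try:
        # decode() walks the entire input and builds a new str object
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None


valid = "café".encode("utf-8")  # b'caf\xc3\xa9'
invalid = b"caf\xff"            # 0xFF never occurs in well-formed UTF-8

text = to_text(valid)      # every byte was checked and transcoded on the way in
broken = to_text(invalid)  # one bad byte rejects the whole input

# On the way out, the str is encoded again: a second pass over all the data.
out = text.encode("utf-8")
```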
I’ve been meaning to write a blog post about how Julia handles strings, but haven’t yet gotten around to it. Among other benefits of Julia’s approach:
- You can process any data as strings and characters, whether it’s valid Unicode or not.
- If you read any data as strings or characters and write it back out, you get the exact same data back, no matter what it is, valid or not.
- Invalid characters are parsed according to the Unicode 10 spec.
- You only get an error if you actually ask for the code point of an invalid character; that is a fairly rare operation, and it must error because there is no correct answer.
- The standard library generally handles invalid Unicode gracefully.
- You can use strings for large data: there’s no need to look at string data, let alone transcode it. If you don’t access something, no work is done.
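The properties listed above can be sketched in Python. This is a minimal illustration of the idea, not Julia's implementation. `BytesString` and `Char` are hypothetical names, and the `is_valid` and `codepoint` members only loosely mirror Julia's `isvalid` and `codepoint`. The string stores its bytes untouched and decodes characters lazily while iterating. Invalid bytes become invalid characters that keep their exact bytes, so writing them back out is lossless. Only asking for the code point of an invalid character raises. One simplification, stated as an assumption: invalid bytes are emitted one at a time instead of being grouped according to the Unicode 10 rules.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class Char:
    """Hypothetical character type that keeps its exact encoded bytes, valid or not."""
    raw: bytes

    @property
    def is_valid(self) -> bool:
        try:
            self.raw.decode("utf-8")
            return True
        except UnicodeDecodeError:
            return False

    def codepoint(self) -> int:
        # The only operation that fails: an invalid character has no code point.
        return ord(self.raw.decode("utf-8"))  # raises UnicodeDecodeError if invalid


def _seq_len(lead: int) -> int:
    """Expected UTF-8 sequence length for a lead byte (0 if it cannot start one)."""
    if lead < 0x80:
        return 1
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 0


class BytesString:
    """Hypothetical string type: stores raw bytes and decodes only on demand."""

    def __init__(self, data: bytes) -> None:
        self.data = data  # kept as-is: no scan and no transcoding up front

    def __iter__(self) -> Iterator[Char]:
        data, i = self.data, 0
        while i < len(data):
            n = _seq_len(data[i])
            chunk = data[i:i + n] if n else b""
            if n and len(chunk) == n and Char(chunk).is_valid:
                yield Char(chunk)
                i += n
            else:
                # Simplification: one invalid byte per invalid character.
                yield Char(data[i:i + 1])
                i += 1

    def __bytes__(self) -> bytes:
        return self.data
```

For example, `list(BytesString(b"caf\xc3\xa9 \xff!"))` yields seven characters: `é` is one valid character and `\xff` is one invalid character. Joining the `raw` bytes of all seven reproduces the input exactly.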