Well, it can, except you then need to go through and update all of your internal APIs to be correct.
Really the string transition was just a poor choice in my opinion. Python2 already had unicode strings that were easy enough to specify (just prefix with a `u`).
It would have been better to just delineate that barrier better from an API standpoint.
I understand the appeal of having unicode for the default string literal type, but it was actively hostile to existing projects.
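For anyone who never wrote Python 2, a minimal sketch of the two literal types living side by side:

```python
# Python 2: two string types coexisted.
s = "caf\xc3\xa9"   # str: a byte sequence (here, UTF-8 encoded bytes)
u = u"caf\xe9"      # unicode: a sequence of code points
print type(s), len(s)   # <type 'str'> 5   (five bytes)
print type(u), len(u)   # <type 'unicode'> 4   (four characters)
```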
> Well, it can, except you then need to go through and update all of your internal APIs to be correct.
You do, but it's easy: run a compile, fix the errors, repeat until no more errors.
> It would have been better to just delineate that barrier better from an API standpoint.
Isn't that exactly what the Python 3 transition was? i.e. stop accepting non-unicode "strings" (actually just arbitrary byte sequences) for APIs that semantically require a string, reserve them for APIs that actually want a byte sequence.
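Concretely, Python 3 draws that line at the type level (interpreter session):

```python
>>> "name: " + b"alice"      # text APIs no longer accept byte sequences
TypeError: can only concatenate str (not "bytes") to str
>>> b"name: " + b"alice"     # byte APIs still take bytes
b'name: alice'
```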
> You do, but it's easy: run a compile, fix the errors, repeat until no more errors.
The reason this doesn't work is that previously the double-quote literal produced a "string" type. That string type was, yes, just a sequence of bytes, but in an ASCII-centric world it also meant text.
Python2 added unicode string literals that accepted unicode code points. Most APIs were happy to sloppily mix the two and generally worked quite adequately.
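The classic Python 2 failure mode, for reference: the sloppy mixing worked right up until the bytes stopped being ASCII:

```python
>>> u"user: " + "alice"          # implicit ASCII coercion, works fine
u'user: alice'
>>> u"user: " + "caf\xc3\xa9"    # non-ASCII bytes blow up at runtime
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 3:
ordinal not in range(128)
```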
Python3 then made the hard distinction between byte-string and unicode-string. Not an unreasonable position to take on the face of it. The issue is many python2 APIs were written from the perspective of "accepts string literal types", where that could be either bytestring or unicode string.
Now suppose you have a large Python codebase that spans the entire stack, from database interaction to webserver to desktop application, all built on double-quoted string literals, with unicode strings only in the places that actually needed them (mainly user-facing text) and UTF-8 bytestrings anywhere data was stored on disk or sent over the network.
Then you go to switch to Python3, and suddenly all of your string literals are interpreted as unicode instead of bytestring/ASCII sequences. So now you need to go through every place in your codebase that accepts a string argument and determine "is this a user-facing string, or a UTF-8 bytestring?", because they used to be basically the same thing, and now they aren't.
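A minimal sketch of what one such audit looks like (`record_py2`, `record_py3`, and `app.log` are made up for illustration; the py2 body parses under Python 3 but fails at runtime, which is exactly the point):

```python
import sys

# Python 2 era: "value" was just a str, and the same object flowed to
# disk and to the user without complaint.
def record_py2(value):
    open("app.log", "ab").write(value + "\n")  # disk wants bytes
    sys.stdout.write(value + "\n")             # display wants text

# The Python 3 port forces the question at every such site:
# is "value" user-facing text or a UTF-8 bytestring?
def record_py3(value):
    open("app.log", "ab").write(value.encode("utf-8") + b"\n")  # encode for disk
    sys.stdout.write(value + "\n")                              # text stays text
```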
It's not "difficult" really, it's just a pain in the neck.
None of that would be a problem in a typed language. The ultimate destination of any string literal is some standard library function, whether that's write to network socket, display to user, or something else. So you just ripple backwards from that through your own functions that are calling those standard library functions, until you get to the point where you're passing in the literal, and then you know what kind of literal it needs to be.
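Python did eventually grow an optional version of exactly this workflow: type annotations checked by mypy. A sketch of that backwards ripple, with `frame` and `greet` as hypothetical helpers:

```python
import socket

def frame(msg: str) -> bytes:
    # socket.send() demands bytes, so the requirement ripples backwards:
    # the encode has to happen somewhere on the way down to the socket.
    return msg.encode("utf-8") + b"\n"

def greet(sock: socket.socket, name: str) -> None:
    sock.send(frame("hello " + name))

# greet(sock, b"alice") is now flagged before the program ever runs:
#   error: Argument 2 to "greet" has incompatible type "bytes"; expected "str"
```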
> None of that would be a problem in a typed language.
Python is dynamically typed and weakly typed, but still typed. That's precisely the problem! The difference is that a statically typed language gives you all the information up-front, while a dynamically typed language still fails in the same places, just without providing you the necessary information ahead of time.
> Python is dynamically typed and weakly typed, but still typed.
People who claim that dynamic typing is a thing claim that Python is strongly typed. (This is of course nonsense; there's no such thing as dynamic typing, because types are by definition something that expressions in a language have, not something that runtime values have).
That is not a "nice explanation". It is writing to obscure rather than to clarify. And it certainly acknowledges that one cannot have differently typed values in a dynamic language.
That's not really much different from the path we took; instead of running the compiler, we ran the linter and test suite until things passed. The catch is that when you have a million lines of code, that takes quite a while.