Read, understand, and be glad that the interpreter isn't trying to "help" any more!
Or, at least, to me he is. This man helps so many people and asks for nothing in return. He's a regular on some Python IRC channels, and he has personally helped me a great deal. He makes difficult concepts easy to understand. I encourage everyone to watch his PyCon talks. Start with his talk on loops: https://www.youtube.com/watch?v=EnSu9hHGq5o
FunkyBob (Curtis Maloney) is also a staple of the Django community; he's spent years and years helping people and asking for nothing in return.
- Command line arguments
- Environment variables
- Files in general
- Many expected-to-be-human-readable fields of popular network protocols
In short, there are many situations where you want to treat a bytestring as a string, not as an array of integers.
If bytes in Python 3 had acted like str in Python 2 (except for the implicit conversions / comparisons with unicode strings), the situation would be a lot better. As it is, they feel like a second-class citizen designed to discourage use, and as a result they are unsupported in most libraries that NEED to support them.
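To make the "array of integers" complaint concrete, here is a minimal sketch of how Python 3 bytes behave under indexing and iteration. The sample data and variable names are made up for illustration.

```python
# Hypothetical sample data: the start of an HTTP request line.
data = b"GET /index.html"

# Indexing a bytes object yields an int, not a length-1 bytes object.
first = data[0]

# Slicing, by contrast, still yields bytes.
first_slice = data[0:1]

# Iterating yields ints as well.
codes = list(data[:3])

# Only after decoding do you get string-like indexing back.
text = data.decode("ascii")
first_char = text[0]

print(first, first_slice, codes, first_char)
```

In Python 2, `data[0]` on a str would have been `"G"`; in Python 3 you need the slice or an explicit decode to get anything string-like.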
(edited for formatting)
However, as someone who has done a mixture of low level (e.g. system tools), high level (e.g. web apps) and network protocol programming, the Python 3 bytes/str model works well. If you really want to treat an 8-bit byte string as a string, you can always decode as "latin1". In my modern Python 3 code, I don't find a good reason to ever do that anymore.
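For anyone who hasn't seen the latin1 trick, a small sketch of why it is lossless: Latin-1 maps each byte value N to code point N, so decoding never fails and encoding gives back the original bytes. The header value below is invented for the example.

```python
# Every possible byte value survives a latin-1 round trip unchanged.
raw = bytes(range(256))
text = raw.decode("latin-1")
round_tripped = text.encode("latin-1")

# Hypothetical header containing a byte that is not valid UTF-8.
header = b"Content-Type: text/html\xff".decode("latin-1")

# Ordinary str methods now work on the data.
name, _, value = header.partition(": ")

# Encode again before writing the value back out.
value_bytes = value.encode("latin-1")
```

The catch is that the resulting str is only a carrier for bytes; non-ASCII characters in it are not necessarily the characters the sender meant.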
As someone who thinks the Python 3 behaviour is largely the right behaviour (I'm unconvinced the solution used for filenames on POSIX is the right one, nor that assuming stdin/out isn't arbitrary bytes is the right choice), I still have a lot of issues with the Python 2 -> 3 migration. (Note I haven't read the article because it won't load here, nor on archive.is.)
As someone who has dealt with a fair number of codebases migrating over the past decade, I would like to have seen a clearer migration path. The route taken basically asked developers to go from:
return x == b"a"
If Python 2.6/7 had a mode like -b (which warns when bytearray and unicode are compared) that warned when str/unicode are compared, that would already have been a big improvement for the migration path. As it is, people have written tools that do this (unicode-nazi), but then you quickly run into the fact that the Python 2 stdlib does this all the time, making it hard to just try and resolve such comparisons within a Python 2 codebase. (Note Python 3's -b does warn for bytes/str!)
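A rough sketch of what Python 3's -b does today, run through a child interpreter so the flag can be toggled. The helper name `run_comparison` is just for this example, and the exact warning text may vary between Python versions.

```python
import subprocess
import sys


def run_comparison(flag):
    """Evaluate b'a' == 'a' in a child interpreter started with `flag`."""
    cmd = [sys.executable]
    if flag:
        cmd.append(flag)
    cmd += ["-c", "print(b'a' == 'a')"]
    return subprocess.run(cmd, capture_output=True, text=True)


plain = run_comparison(None)    # silent: the comparison is just False
warned = run_comparison("-b")   # same result, plus a BytesWarning on stderr
strict = run_comparison("-bb")  # the warning becomes an exception

print(plain.stdout.strip(), warned.stderr.strip(), strict.returncode)
```

Something equivalent for str/unicode comparisons in 2.7 is what would have let people find these spots while still running on Python 2.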
Now, at the same time as the behaviour of "a" == u"a" changed, Python also changed the return type of (e.g.) os.listdir(). This means if you want to compare against a different list loaded from elsewhere, you need to have that list in different types depending on whether you're running on Python 2 or Python 3. In a dynamically typed language, it's hard to make all these changes with confidence that you've actually fixed every place.
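To illustrate the os.listdir() point on Python 3 only: the return type follows the argument type, so a list of expected names has to match whichever you pass. This is a sketch using a throwaway temp directory; the file name "a" and the `expected` set are invented for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create a single empty file named "a".
    open(os.path.join(tmp, "a"), "w").close()

    as_text = os.listdir(tmp)                # str path in, str names out
    as_bytes = os.listdir(os.fsencode(tmp))  # bytes path in, bytes names out

# A list "loaded from elsewhere" as text only matches the str variant.
expected = {"a"}
text_matches = set(as_text) == expected
bytes_matches = set(as_bytes) == expected
```

On Python 2 a plain "..." literal was a byte string, so the same call site silently hands you a different type after porting.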
Agreed. This was a major mistake in the migration story for Py3. They bet on the static translation approach of 2to3, which is just inappropriate for a dynamic language like Python. Better to have doubled down on Python's dynamism by adding modes to the interpreter to suss out the code that wouldn't work on Py3.
Instead of which, we got a lot of “write-once, run-everywhere” nonsense, with everyone vying to bend their code out of shape in the most creatively unproductive ways possible. Absolutely ridiculous makework, and the Python community should’ve called itself on it. Unfortunately, the geeks love a challenge, far more than being told when they’re having a brain fart. Oh well, at least that whole shambles has finally just about run its course; here’s hoping its lessons are learned for Python 4. :)
I suppose it's moot at this point.
The result is the chaos of the present day.
That said, would anyone have been interested in a totally new encoding? For European languages which use mostly the same 26 Latin characters with occasional diacritics and accents, UTF-8 read by an incompatible consumer degrades into occasional unreadable characters. But if your out-of-date browser or application gave you a "cannot decode this encoding" error, that might have caused a whole lot of pain during that transition. Not to mention that some of the same issues with OS/filesystem/language library interaction would probably remain.
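A tiny sketch of that graceful degradation, assuming the incompatible consumer reads the bytes as Latin-1 (the word is just an example):

```python
word = "caf\u00e9"  # "café"

# The producer writes UTF-8; the accented letter becomes two bytes.
encoded = word.encode("utf-8")

# An out-of-date consumer assumes Latin-1 and decodes byte by byte.
misread = encoded.decode("latin-1")

# The ASCII letters survive; only the accented one turns into two odd characters.
ascii_part = misread[:3]
garbled_part = misread[3:]

print(misread)
```

The reader sees "cafÃ©": ugly, but still recognisable, which is the property a brand-new incompatible encoding would not have had.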