I've heard several people say that 3.6 was the first Py3 worth transitioning to, largely for async (async/await syntax actually landed in 3.5, but asyncio only came out of provisional status in 3.6). There is this take on performance: https://hackernoon.com/which-is-the-fastest-version-of-pytho...
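
For reference, a minimal sketch of the syntax (asyncio.run shown here is 3.7+; on 3.5/3.6 you'd drive the loop with get_event_loop().run_until_complete):

    import asyncio

    async def main():
        await asyncio.sleep(0.1)   # yield to the event loop
        print("hello from a coroutine")

    asyncio.run(main())   # 3.7+; on 3.6: asyncio.get_event_loop().run_until_complete(main())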



For me, having run a slew of large Python code bases, 3.3 was the first 'useful' version to upgrade to, as it re-enabled the u"string" syntax. Without that, writing 2/3-compatible code was hard.

3.5 fixed % formatting for bytes, further making 2/3 transitions easier (or harder, depending on your use/abuse of strings vs bytes).
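
A quick illustration of what that restored (this is PEP 461; on 3.0 through 3.4 both lines raise TypeError):

    print(b"Content-Length: %d\r\n" % 42)
    print(b"GET %s HTTP/1.1\r\n" % b"/index.html")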

3.6 added the first 'exciting' new feature: f"formatted string {literals}" -- at least if you don't have an asynchronous project that can make good use of async/await.
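
For anyone who hasn't tried them, something like:

    name, n = "spam", 3
    print(f"{name} repeated {n} times is {name * n!r}")
    # spam repeated 3 times is 'spamspamspam'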


I do think the "worth transitioning" crowd is the final group of people, the ones with the most reason to hold off. If new features are what made it worth transitioning, then it probably was 3.6 with async/await. If you relied a lot on unicode and already had a mature codebase, you'd wait too. If it was a new project, you'd probably have started with 3 much earlier because of unicode.


Ordered dicts and f-strings are also very sweet!



I expect sametmax was referring to the new feature in Python 3.6 that plain old `dict` now preserves the insertion order of its elements. See docs here: https://docs.python.org/3/library/stdtypes.html#dict.values

(Or in language-lawyer terms: CPython 3.6 introduced this behavior and documented it as an implementation detail; from Python 3.7 on, it's guaranteed as a feature of the language proper.)
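
A quick demonstration:

    d = {}
    d["banana"] = 1
    d["apple"] = 2
    d["cherry"] = 3
    print(list(d))   # ['banana', 'apple', 'cherry'] -- insertion order, not sorted
    # On CPython <= 3.5 this order was arbitrary (hash-dependent).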


And optional type annotations. Those were the key feature for me.
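
For example, roughly (annotations are ignored at runtime; a checker like mypy flags mismatches):

    def greet(name: str, times: int = 1) -> str:
        return f"Hello, {name}! " * times

    count: int = 2          # PEP 526 variable annotation, new in 3.6
    print(greet("world", count))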


The Py3 string handling was beyond frustrating prior to 3.6. For what I do, Unicode complexity is not required. Having to stuff it into everything was more trouble than it was worth.


The ubiquity of emoji alone means that Unicode is everyone's problem in 2010+, and ignoring it won't make it go away. It's Python 2.x where dealing with Unicode (which is everywhere) is far too complex and more trouble than it is worth. The "complexity" of Python 3 string handling is worth it, and no worse (even pre-3.6) than in any other modern programming language -- possibly easier than some, because it raises clear runtime errors exactly where accidents are easiest to make. (Though arguably I'm biased: having done plenty of Unicode work in other languages, Python 3 seemed familiar and easy from the start for my Unicode needs. Your mileage of course varies; we all have different backgrounds.)


Python 3's Unicode handling is uniquely bad. I haven't heard of any other language where you can obtain magic strings that crash the program if you try to print them:

    % python3.7 -c "import sys; print(sys.argv[1])" "$(echo -e '\xff')"           
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
    UnicodeEncodeError: 'utf-8' codec can't encode character '\udcff' in position 0: surrogates not allowed


Oh, that's a fun example. ("Fun", anyway.)

The key surprising thing that's going on here is this clever hack (clever, but a hack):

> In Python, file names, command line arguments, and environment variables are represented using the string type. On some systems, decoding these strings to and from bytes is necessary before passing them to the operating system. Python uses the file system encoding to perform this conversion ...
>
> On some systems, conversion using the file system encoding may fail. In this case, Python uses the surrogateescape encoding error handler, which means that undecodable bytes are replaced by a Unicode character U+DCxx on decoding, and these are again translated to the original byte on encoding.

https://docs.python.org/3/library/os.html#file-names-command...

This is meant as a way of fudging the fact that (a) for UI purposes, you want to treat filenames as text strings; (b) your Linux filenames are probably all encoded as UTF-8 (or your locale encoding); (c) but they might not be -- they could be arbitrary bytes, anything except NUL; (d) and if they are, you really don't want to munge the name when you go back and try to operate on the file.

The fudge is that filenames get decoded as (by default) UTF-8... but if invalid, the offending bytes get stuffed into the Unicode surrogate range (U+DC80 through U+DCFF). Then filesystem APIs encode back to UTF-8, except they look for that surrogate hack and turn those code points back into the original bytes, so it all round-trips.

It goes pretty wrong if you try to hand such a hacked-up string to something that just expects to encode normal real Unicode with UTF-8, though. That's what's happening in your example.

The magic words are `os.fsencode`/`os.fsdecode` -- `os.fsencode` is how you get the original bytes back, round-trip clean, from that hack.
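
To make the round trip concrete, a minimal sketch (assuming a typical UTF-8 locale on Linux):

    import os

    raw = b"\xff"                    # a byte sequence that is not valid UTF-8
    name = os.fsdecode(raw)          # surrogateescape turns it into '\udcff'
    print(ascii(name))               # '\udcff'
    assert os.fsencode(name) == raw  # encoding restores the original byte

    # A plain UTF-8 encode of that string blows up, exactly as in the
    # traceback above:
    #   name.encode("utf-8")  ->  UnicodeEncodeError: surrogates not allowed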


What an excellent writeup of why modern languages have something like four stringy types: bytes, unicode, OS string, and path string. You could go further if you want to talk to other computers.


That seems more like a difference of opinion on how to handle a bad string than an example of Python 3 being bad. I'd argue failing silently is worse: in programs beyond your toy example, you might end up storing badly encoded data, and broken data is way worse to deal with than a crashing app.


Nothing in my ordinary, routine full-time engineering work over the past decade has required anything outside the ASCII character set or, alternatively, plain bytes. The use cases my users present to me do not include emoji or non-ASCII characters. If those arise, naturally, one would select tools to address those needs. I don't know the future; in a year or two, my routine work may, of course, change.



