
A Rebuttal for Python 3 - victorvation
https://eev.ee/blog/2016/11/23/a-rebuttal-for-python-3/
======
jcranmer
As this rebuttal points out, the original article basically finds two flaws in
Python 3 and then tries to puff them up as much as possible. These two
complaints are:

* Python 3 is not backwards-compatible with Python 2.

* Byte strings and character strings are different things so now I have to care about stuff.

Although I list two complaints, they are really one: the main element of
incompatibility is the byte string/character string split, and it's that
split that makes it hard to know what the right thing to do is.

Having had to deal with internationalization before, I can honestly say that
there's only one way to handle it: convert all string data to your internal
representation (be it UTF-8 (Rust), UTF-16 (Java), or UTF-32 (Python)) at the
margin where you read/write it, and keep binary strictly separate from textual
in terms of representation. This is what Python 3 does and what Python 2
doesn't do. I have seen some people argue that Python 2's model is better
because it means that writing something like cat is harder, but that's
optimizing for the wrong thing. I've seen a build system written in Python 2
break for the sole reason that I thought to write 一 in the commit message.
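
The decode-at-the-boundary approach described above can be sketched in Python 3 roughly as follows. This is only an illustration, not code from the build system mentioned: the helper name `normalize_commit_message` and the choice of UTF-8 are assumptions.

```python
def normalize_commit_message(raw: bytes, encoding: str = "utf-8") -> bytes:
    """Hypothetical helper: decode on input, work on text, encode on output."""
    text = raw.decode(encoding)        # bytes -> str at the input boundary
    cleaned = " ".join(text.split())   # all processing happens on str
    return cleaned.encode(encoding)    # str -> bytes at the output boundary


message = "  一   fix the build\n".encode("utf-8")
print(normalize_commit_message(message).decode("utf-8"))
```

Everything between the two conversions only ever sees `str`, so a non-ASCII character in the message is just another character.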

As for the complaint that it's taking 9 years for Python3 to slowly become the
default instead of Python2, keep in mind that 15 years after the release of
C99, gcc was still using C89 by default (gcc 5 skipped C99 and went straight
to C11).

~~~
the_mitsuhiko
> Having had to deal with internationalization before, I can honestly say that
> there's only one way to handle it: convert all string data to your internal
> representation (be it UTF-8 (Rust), UTF-16 (Java), or UTF-32 (Python)) at
> the margin where you read/write it

I'm not going to go into the actual subtleties here, but there is a lot more
to it than what you suggest. However, the biggest mistake you made is to treat
all encodings as equivalent. There is a tremendous difference between using
UTF-8 as the internal encoding and using a variable internal representation
like Python 3 does. Rust and Go have free bytes/string conversion, whereas
Python 3 has none of that.

~~~
jcranmer
Python 3's string types are internally ASCII (or Latin-1, I'm not sure),
UTF-16, or UTF-32 depending on the contents of the string, but the API
effectively assumes that the encoding is UTF-32.

The main point is that you don't want to be passing around binary data
internally and expecting people to decode charsets as needed--the knowledge of
charsets will usually be dropped.

~~~
the_mitsuhiko
> Python 3's string types are internally ASCII (or Latin-1, I'm not sure),
> UTF-16, or UTF-32 depending on the contents of the string, but the API
> effectively assumes that the encoding is UTF-32.

That is incorrect. The API assumes the string is O(1) indexable. It makes no
assumptions or guarantees about the internal encoding.

> The main point is that you don't want to be passing around binary data
> internally and expecting people to decode charsets as needed--the knowledge
> of charsets will usually be dropped.

This is an incorrect blanket statement and I can give you plenty of examples
where this is _exactly_ what you will have to do because the knowledge of the
encoding is not available until later. This is in fact a common problem that
Python 3 has because it decodes to unicode so early that it had to come up
with surrogate-escapes to roundtrip bytes in case the encoding it
guessed/inferred was wrong.
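
For readers unfamiliar with the mechanism, here is a small sketch of the `surrogateescape` error handler mentioned above. The file name and the assumption that the bytes were really Latin-1 are made up for illustration.

```python
raw = b"caf\xe9.txt"  # Latin-1 encoded bytes; not valid UTF-8

# Decode with a guessed encoding. Undecodable bytes are smuggled through
# as lone surrogates instead of raising UnicodeDecodeError.
text = raw.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
restored = text.encode("utf-8", errors="surrogateescape")

# Once the real encoding is known, the original bytes can be decoded properly.
corrected = restored.decode("latin-1")

print(repr(text), restored == raw, corrected)
```

The intermediate `text` is not valid Unicode text in the usual sense; it only exists so the bytes can survive the round trip.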

~~~
mattashii
He's talking about CPython, the default interpreter/compiler for Python on
many systems. Since PEP 393 (CPython 3.3), CPython internally uses a 1-byte
representation if the maximum Unicode code point in the string is 255 or less
(Latin-1, with pure ASCII as a special case), a 2-byte representation if it is
65535 or less (UCS-2, i.e. only the Basic Multilingual Plane), and UTF-32 for
the rest, which means all lookups can be done in O(1).
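
The storage width can be observed indirectly on CPython 3.3 or later. The sketch below is an assumption-laden probe, not a documented API: `bytes_per_char` is a hypothetical helper, and it relies on `sys.getsizeof` reflecting the string's internal buffer, which is a CPython implementation detail.

```python
import sys


def bytes_per_char(ch: str, n: int = 1000) -> int:
    """Estimate per-character storage by comparing two string sizes (CPython only)."""
    # The fixed object header cancels out in the subtraction.
    return (sys.getsizeof(ch * 2 * n) - sys.getsizeof(ch * n)) // n


for sample in ("a", "\u00e9", "\u4e00", "\U0001F600"):
    print(f"U+{ord(sample):04X}: {bytes_per_char(sample)} byte(s) per character")
```

Other implementations, or future CPython versions, are free to store strings differently, which is the point made in the replies.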

Assuming UTF-8 encoding for source files is not that strange, as it is getting
more and more the default or preferred encoding for string-based I/O.

~~~
the_mitsuhiko
I understand how CPython works; however, the external API makes no guarantees
about any internal encoding. The implementation of the storage has also
changed many times.

------
fao_
This is a great rebuttal against such an awful book for teaching.

Last year I recommended this book to a friend who was starting to learn to
program. After a few weeks I was talking to him about lists, and he didn't
understand what I was saying because it doesn't touch a _basic concept like a
list_ until Chapter 32! It takes literally two seconds to explain to a person
what lists are, maybe a little longer for syntax. Given that he invites people
to read other people's code early on (in one of the single digit lessons
IIRC), this is a serious omission.

In addition to this, much of the attitude towards programming in the book
isn't so much of "Woah! A new thing! It's so cool let's learn it" (i.e.
energetically curious), but tends to rebel against new things in a "Well, I
know it'll be hard and difficult..." way, which I honestly believe is damaging
to new programmers. Programming is one of those disciplines where if you don't
have a positive spirit towards both bugs and learning new things, you will get
worn down over time. I think it's best to either already possess such an
attitude or to show people documentation, etc. that helps them develop such an
attitude when learning.

I ended up prodding my friend to pick up Lua. Partly because he was more
interested in games development (and there is a great community for that
around the Love2D engine), because it is a small enough language to learn and
memorize in a week or two, and because I wanted to undo the damage that
reading LPTHW had done to his attitude towards programming.

------
soyiuz
Mr. Shaw's book used to be a decent starter reference. I liked his step by
step short chapters, but there were some red flags. For example, he spent
several early chapters on string formatters, which are neither terribly common
nor useful for a beginner. Other key concepts like iteration were not
adequately explained, or introduced far too late in the progression.

Besides the demagoguery, this new piece displays shocking ignorance of basic
CS concepts. I will now actively warn my students against consulting his work.

Thanks to eevee for a detailed rebuttal, which displays an appropriate amount
of reprobation, given the original's caustic tone.

Python is an incredible community effort. It deserves support and constructive
participation.

------
bobjordan
I certainly didn't get Zed's article. Maybe I'm not familiar with the
superior tooling of other languages, but whenever I've used 2to3 I found it
helpful enough to get real work done. For example, two years ago, when I was
squarely a beginner skill-wise, I filed an issue with a repo owner asking him
to make the project Python 3 compatible (an ORM for RethinkDB -
[https://github.com/linkyndy/remodel](https://github.com/linkyndy/remodel)).
The obviously talented repo owner initially responded he feared it would be
complex and take a long time. I then spent a weekend to learn the library,
used 2to3 to get 95% there, fixed the remaining small issues, and updated the
repo to be compatible with Python 3, passing all tests. Sometimes these
python2 programmers just do not want to change.

------
aorth
Great point for point rebuttal. I was annoyed at the original article when it
was circulated recently. As eevee points out, most of the original author's
points are invalid or unimportant, and to me it seems that he is just mad he
has to write another "learn Python" book for Python 3.

I'm a Python beginner and I recently wrote a few small tools in Python 3. They
have a few functions, do network I/O, argparsing, connect to PostgreSQL, etc.
It took me a few hours and it was fun.

------
_coldfire
It's human nature to resist change. Making a clean break and avoiding
backwards compatibility was the smart move. It annoyed many, but I look around
and see some languages truly suffering from such things years down the track.

Python3 is superior; it's nothing but fandom when people say otherwise.
Starting from scratch, I have yet to see any real benefit to 2, yet it still
happens routinely due to the mindsets of some.

------
gnuvince
The original article by Zed should stand as an example of logical fallacies.

------
incompletewoot
Maybe Curmudgeon Driven Development is a thing. I was totally surprised that
by the end he hadn't advocated Perl 6. Like, "This is how it should be done:
Parrot in Perl 6".

I was hoping that out of this kerfuffle someone would make Python3's
TypeError show which variable was the byte string and which the normal string. I
haven't seen any people argue for improving the error reporting. It's that
kind of thing that helps beginners when they start trying to do things with
their current knowledge.

------
BerislavLopac
I'm amazed how people always list "three ways of formatting strings", when in
fact there are four. :)

Admittedly, the fourth is not built-in, but it's part of the standard library:
string.Template
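
For comparison, here are the four approaches side by side. This is a small sketch: the sentence is borrowed from the example discussed elsewhere in the thread, and the f-string line assumes Python 3.6 or later.

```python
from string import Template

name = "Łukasz"

percent_style = "Hi, my name is %s." % name            # printf-style operator
format_method = "Hi, my name is {}.".format(name)      # str.format()
f_string = f"Hi, my name is {name}."                   # f-string, Python 3.6+
template = Template("Hi, my name is $name.").substitute(name=name)

print(percent_style == format_method == f_string == template)
```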

------
elmiko
for all his points about adoption of python 3, i think an interesting data
point is that many of the OpenStack projects are actively converting (or
already have converted) to python 3. that's no small amount of python.

------
chucksmash
> Zed Shaw, your behavior here is fucking reprehensible.

s/here//

------
zde
Ported all my code base to Lua. Python can die now.

------
pwdisswordfish

        # -*- coding: utf8 -*-
        print "Hi, my name is Łukasz Langa."
        print "Hi, my name is Łukasz Langa."[::-1]
    

This is a bad example. Reversing Unicode strings isn't an operation that is
often needed in practice, and it shouldn't be done codepoint-wise anyway,
because some characters have to be represented with multiple codepoints:

    
    
        Python 3.5.2 (default, Nov  7 2016, 11:31:36) 
        [GCC 6.2.1 20160830] on linux
        Type "help", "copyright", "credits" or "license" for more information.
        >>> 'x́q'[::-1]
        'q́x'
    

Doing it properly is a bit of a grapheme clusterfuck:
[http://unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries](http://unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries).
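
As a rough sketch of the idea, the code below keeps combining marks attached to their base character before reversing. It is a deliberate simplification and not an implementation of the UAX #29 rules linked above: it ignores Hangul jamo, emoji sequences, bidi controls and more. The helper names are hypothetical.

```python
import unicodedata


def simple_clusters(text: str) -> list:
    """Group each base character with the combining marks that follow it.

    Simplification: only characters with a nonzero canonical combining
    class are treated as cluster extenders.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch      # attach the mark to the previous cluster
        else:
            clusters.append(ch)     # start a new cluster
    return clusters


def reverse_text(text: str) -> str:
    """Reverse by (simplified) clusters instead of by code points."""
    return "".join(reversed(simple_clusters(text)))


sample = "x\u0301q"                  # x + combining acute accent, then q
print(ascii(sample[::-1]))           # the accent ends up on the q
print(ascii(reverse_text(sample)))   # the accent stays on the x
```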

~~~
ambivalence
See, this _is_ a good example because in Python 2 the string is just bytes. Ł
is 0xC5 0x81. Reversing those bytes just corrupts the content. This isn't
exclusive to this one operation. Let me give you another example:

    
    
      >>> print "Hi, my name is Łukasz Langa.".lower()
      hi, my name is Łukasz langa.
    

Whereas on Python 3:

    
    
      >>> "Hi, my name is Łukasz Langa.".lower()
      'hi, my name is łukasz langa.'
    

Or even more basically, Python 2:

    
    
      >>> len("Hi, my name is Łukasz Langa.")
      29
    

Python 3:

    
    
      >>> len("Hi, my name is Łukasz Langa.")
      28
    

Sure, there's hard things about code points and graphemes that Python 3 also
doesn't touch, like the example you gave (originally: characters outside of
the BMP using surrogate pairs, later: using combining characters), or equality
(where normalization might be needed to compare two strings that look alike),
or sorting (where collation rules mean the same code points might sort
differently depending on the language used). But there's tools to deal with
those tricky situations and the default is much more robust.
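
One of those tools is `unicodedata.normalize` from the standard library. The sketch below shows the equality case only; the sample word and the helper name `same_text` are illustrative assumptions, and collation is not covered here.

```python
import unicodedata

composed = "caf\u00e9"       # 'é' stored as a single code point
decomposed = "cafe\u0301"    # 'e' followed by a combining acute accent


def same_text(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC (hypothetical helper)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(composed == decomposed)           # plain comparison of code points
print(same_text(composed, decomposed))  # comparison after normalization
```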

And sure, you can use the `unicode()` type in Python 2 to get the correct
results but the problem is that most APIs will work with both types and
silently do the wrong thing every now and then. Those include
UnicodeDecodeErrors and UnicodeEncodeErrors that you get _sometimes_. In
Python 3, without the magic type promotion, you'll get a sane TypeError every
time.
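
A minimal Python 3 sketch of that behaviour, using the same sentence as above. The exact wording of the error message differs between Python versions, so only the exception type is relied on here.

```python
text = "Hi, my name is Łukasz Langa."
data = text.encode("utf-8")

print(len(text), len(data))  # code points versus UTF-8 bytes

try:
    text + data              # str and bytes never mix implicitly
except TypeError as exc:
    mixing_error = exc
else:
    mixing_error = None

print(type(mixing_error).__name__)
```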

Disclaimer: the name used in the example is mine.

~~~
pwdisswordfish
It's only 'more robust' as long as you don't care about languages in which it
isn't.

The example is bad, because reversing code points _also_ corrupts the content,
but in a different and a somewhat harder to detect way. A better method of
reversing text is to split it by grapheme clusters and reverse that. (It may
still fail to work when someone manages to sneak a bidi control character in
the string, and I am quite confident there may be other issues here that I
cannot anticipate right now.) Python 3 offers no advantage here over Python 2.
Reversing code points only appears to work, until you feed it stress-marked
Cyrillic or Korean with individually encoded jamo; which is quite similar to
the charge levelled against Python 2 strings: 'it appears to work, until you
feed it non-ASCII input'. The problem isn't solved; the failures are just
swept over to language environments in which Western programmers often don't
bother to test their programs.

Expecting len() to count characters in a string, as you seem to be doing, is
also wrong.

    
    
        >>> len('а́')
        2
    

The correct way to count characters is, again, to count grapheme clusters.
However, counting characters is not an operation that is useful in practice
either. If you want to check a string against a database limit, those are
usually specified in bytes (or sometimes indeed in code points). If you wish
to know how much a piece of text takes on screen, you should ask a font-
rendering library to compute it for you; unless you're rendering to a tty, in
which case fine, count grapheme clusters. Apart from which syntax gets you
codepoint versus byte length, there's no difference between Python 2 and
Python 3 here.
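
For the database-limit case, a check in Python 3 might look like the sketch below. The helper `fits_byte_limit` and the UTF-8 column encoding are assumptions made for illustration.

```python
def fits_byte_limit(text: str, limit: int, encoding: str = "utf-8") -> bool:
    """Check the encoded size of text against a limit given in bytes."""
    return len(text.encode(encoding)) <= limit


name = "Łukasz"
print(len(name))                  # length in code points
print(len(name.encode("utf-8")))  # length in UTF-8 bytes
print(fits_byte_limit(name, 6))
```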

Case conversion is also locale-dependent. In Turkish and Azeri, the uppercase
form of i is not I, but İ. In Lithuanian, the lowercase form of Ì should be
i̇̀, which has both a dot and a grave accent above. str.upper() and
str.lower() fail to take that into account, and don't even warn about it in
the documentation. Again, it _almost_ works... until you're a Turk and it
doesn't. Is it really all that better than Python 2?
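
The Turkish case can be shown with a short sketch. `turkish_upper` is a hypothetical helper that hard-codes the dotted/dotless i rule; it is not a general locale-aware solution, for which a library built on the Unicode locale data would be needed.

```python
def turkish_upper(text: str) -> str:
    """Uppercase with Turkish/Azeri i rules (hypothetical, illustrative only)."""
    # Map dotted i -> İ and dotless ı -> I before the default uppercasing.
    return text.replace("i", "\u0130").replace("\u0131", "I").upper()


word = "istanbul"
print(word.upper())            # locale-independent default mapping
print(turkish_upper(word))     # starts with İ (U+0130)
print(ascii("\u0130".lower())) # i followed by a combining dot above
```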

Manipulating human-readable strings is _hard_. Operating on code points
instead of bytes makes it easier, but doesn't solve it completely. Handwaving
about 'tools to deal with those tricky situations' is not an answer; I am
quite sure some of those tools exist for Python 2 as well.

------
aprogrammerab
It's really not surprising to me that the original link to Shaw's article is
buried on HN.

This is exactly what he is talking about: it's not about right or wrong
anymore, it's all about tribalism. I can remember when RoR was the newest,
hip, thing in Silicon Valley. If you even mentioned something wrong with it on
HN, you were silenced and downvoted.

There were real issues with RoR that needed to be addressed, but the community
developers would rather spend time silencing opposing opinions than actually
fixing the flaws in the language.

Where are these developers now? They all moved on to Javascript frameworks and
node.js.

There are real issues with Python 3. The Python 3 faction of the community has
basically told anyone that asks to fuck off and do things their way.

The responses here on HN and this 'rebuttal' have really just proven his
point. I would have rather it proven him wrong.

~~~
ubernostrum
Ah yes. "You wrote an article which was incorrect, here is an explanation of
why what you wrote was incorrect, backed up with reasoning and sources" is
_exactly identical_ to "fuck off, do things my way".

I am glad to have your wisdom on this issue.

