
How Python 3 Should Have Worked (2012) - denzil_correa
http://www.aaronsw.com/weblog/python3
======
overgard
I agree with this 100%.

Python 3 is a disaster. The problem is they made the entire thing out to be a
Big Deal, but they didn't really offer any compelling reason to upgrade. I
mean, the unicode is... kinda better, and iterators are a bit improved, but
couldn't those things have been point releases? 2.8? They basically said that
python 3 was a new language, and then offered no significant reason you should
use this new language. So we all kept using actual python, quirks and all. To
me, as a python developer, Python 3 is a failed fork. Harsh but true.

IMO, if they were going to do that sort of thing, they should have had at
least one killer feature. Like maybe if python 3 had been based off pypy they
could be saying "Look! We're 5x faster! Want to upgrade now?". That would have
been compelling. But their message was: "we cleaned up some stuff that most of
you don't care about, and broke a bunch of things". Think about pitching that
sort of upgrade to your boss. "Well it doesn't solve any of our problems, and
it creates a ton of new ones, but it's the right thing to do because it makes
some code slightly cleaner arguably! Convinced yet?"

If I were in charge of python, I would do this: announce python 4, have it be
based on the pypy interpreter, and keep compatibility with python 2 the
language while reforming the C extension APIs to be more future proof (for
getting rid of the GIL and so on). (Or maybe get rid of them entirely and just
have people use CFFI.)

~~~
crag
"... they should have had at least one killer feature"

Or if Python 3 had included reliable (and updated) package and environment
managers; and/or a default GUI framework (QT maybe) - out of the box.

The fragmentation between Python 2 and Python 3 is killing the language. Not
to mention the community. Python needs a united front.

~~~
pak
> a default GUI framework

Tkinter is Python's "default GUI framework". At least, that's what they
continue to claim
([https://wiki.python.org/moin/TkInter](https://wiki.python.org/moin/TkInter)),
and it's the GUI library I ran into first when I first learned Python.

How does it look and feel once you get started? Well, let's just say it's a
bad sign if a GUI framework website has no screenshots. Even with increasingly
hacky theming engines layered on top, it still is hard to get anything feeling
close to native:
[http://tktable.sourceforge.net/tile/screenshots/macosx.html](http://tktable.sourceforge.net/tile/screenshots/macosx.html)

Go ahead, you can start laughing...

(for anybody that is saddened by the above, there are thankfully binding
libraries for Qt and Wx, both of which do get you fairly decent cross-platform
widgets from within Python, and either of which would be better default GUI
libraries in 2014.)

~~~
jerf
People have been making the argument that QT or Wx should be the default for a
while; the problem is, and remains, A: licensing, or B: the complexity of
shipping "batteries included" distributions. For A, PyQT's license might
surprise you: [1]. Python cannot distribute it with the rest of the
essentially-BSD-licensed Python distribution. For B, none of the linux
distributions particularly want "Python", which is often a _base requirement_,
to pull in either Wx or QT, both of which are quite sizable and require their
own stack of other things to come piling in too.

It's sad, but I'm not sure how to resolve the problem, and nobody else has
figured it out in the past 10 years either.

[1]:
[http://www.riverbankcomputing.com/software/pyqt/license](http://www.riverbankcomputing.com/software/pyqt/license)

~~~
digisign
There's a new alternative to pyqt, though its name escapes me now.

~~~
jerf
PySide (thanks lambda) appears to be LGPL; "better" than PyQT, but still not
shippable in the core Python distro without changing the license of the core
distro.

I think that's as "good" as a QT binding can be, too; QT itself is LGPL (or
commercial license).

------
fear91
It's also really off-putting for beginners who try to learn the language.
Python 3 is served as the main download when you search for it, yet when you
search for tutorials, most of them are in Python 2 - and 80% of them DO NOT
state whether they are for Python 2 or Python 3 (because most were made during
the Python 2 era?).

So people try to learn to code with Python 3.x and get frustrated because the
simplest things don't work.
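The canonical example of a "simplest thing" breaking: Python 2's print
statement is gone in Python 3, so the very first line of many older tutorials
is a syntax error. A minimal demonstration:

```python
# Python 2's 'print' statement is invalid syntax in Python 3, where print is
# a function; compile() lets us show the failure without a second interpreter.
try:
    compile('print "hello"', "<tutorial>", "exec")
except SyntaxError:
    print('SyntaxError: print "hello" is Python 2 syntax')

print("hello")  # the Python 3 spelling
```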

They should seriously rethink the whole 3.x thing. I must say, introducing it
probably did more harm than good for the future of this language.

~~~
maxerickson
This is the major result when you Google "python download":

[http://www.python.org/getit/](http://www.python.org/getit/)

It is the same as this page:

[http://www.python.org/download/](http://www.python.org/download/)

That page gives fairly equal weight to the two versions (I guess that could
have changed over time).

Edit: It might make sense to have a warning about matching the interpreter
version up with the tutorial, but clear wording for it is not obvious to me.

~~~
fear91
People tend to click the first links with far higher frequency than those
located below.

Additionally, when a beginner sees this page, he sees two versions - one is 3
and one is 2 - and I think most people will choose the "newer" (newer =
better?) version, because they don't really know about the differences between
the two.

The second part of the problem is the fragmentation of tutorials on other
websites - which can't be fixed by changing the download page.

------
jfaucett
This strategy makes a lot of sense to me. I really like the "deprecated
warnings" to "explicit failure" transitions, IMO this works really well,
though I only have experience with it at a library level using semver. So I
can't say much about point #1, except that this is essentially what you're
doing when you test against the xlib-head branch.

Aaron implies that this approach was not taken with Python (non-py guy here);
could someone tell me what the reasoning was behind that? Too much legacy code
in the Py2 code base, just wanting to start from a clean slate, or what?

EDIT: (to add another question :) could you efficiently accomplish what Aaron
is talking about, and what would be the best way to go about it? @pak would
you really have to load 2 stdlibs, or is there no (efficient) way around
syntax errors?

~~~
aston
There are pieces of Python 3 that are syntax errors in Python 2. And there are
Python 2-isms that are valid syntax in Python 3 but have a different
interpretation. It's not as simple as importing certain features (which
creates a sort of language version hybrid).

The idea to allow one project to switch between Python 2 and Python 3 for
individual files is more interesting, but practically speaking would lead to
sort of a mess.
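A small illustration of the second case - syntax that both versions accept but
interpret differently - using nothing beyond the stock interpreters:

```python
# Both interpreters accept this line, but Python 2 floors 1/2 to 0 while
# Python 3 performs true division.
print(1 / 2)    # 0.5 under Python 3 (0 under Python 2)

# The unambiguous spellings shared by both versions:
print(1 // 2)   # explicit floor division: 0 in both
print(1.0 / 2)  # explicit float division: 0.5 in both
```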

~~~
pak
> The idea to allow one project to switch between Python 2 and 3 for individual
files

Yes, I believe key parts of the standard object model changed between the two
(e.g. strings vs bytes, many of the magic methods and operators) making this
nearly impossible. Every time objects would pass back and forth, they'd have
to be converted, which is wasteful and bug-prone (and this is a whole mess of
library code that the python3 guys probably did not want to write). You'd also
need to load two different standard libraries, which would waste memory.

You only need to scan through the upgrade feature list to see how hard
intercompatibility would have been.
[http://docs.python.org/3.0/whatsnew/3.0.html](http://docs.python.org/3.0/whatsnew/3.0.html)

Although I totally agree with Aaron that this would have allowed people to
actually _use_ Python 3 without fear, anything short of forcing the entire
program and all of its modules to run in v3 mode as opposed to v2 mode would
have been a disaster from a reliability and technical design standpoint. And
that's closer to how things actually went down with 2to3, etc.

~~~
jfaucett
Thanks for that link; now I see the problems. Text Data vs. Unicode alone
would be an enormous overhaul, though the syntax changes don't seem that
problematic.

~~~
bskap
They aren't. And there are automatic tools for converting between them (2to3
and 3to2), along with those __future__ imports that Aaron mentioned: doing
"from __future__ import print_function, unicode_literals, absolute_import,
division" would give you most of the Python 3 syntax changes in Python 2. The
"everything expects bytes" to "everything expects text" change is the biggest
hurdle for a lot of projects.
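For reference, the imports bskap mentions, in a file that runs under either
interpreter (under Python 3 they are harmless no-ops):

```python
from __future__ import print_function, unicode_literals, absolute_import, division

# Under Python 2 these imports switch the module to Python 3 semantics;
# under Python 3 they change nothing, so this file behaves the same on both.
print(7 / 2)   # true division: 3.5 (plain Python 2 would print 3)
print(7 // 2)  # floor division: 3 in either version
```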

------
edanm
I think the strategy Aaron talks about makes a lot of sense. I especially like
the idea of simply shipping future interpreters that can work with both 2.x
and 3.x code. Seriously, it makes it even more dead-simple to get started with
Python 3.

We can extend Aaron's ideas to even more radical ideas, for example, instead
of allowing Python 3 and 2 code to be mixed on a per-file basis, allow it to
be mixed on a per-function basis. In fact, allow running Python 3 code, and
drop in "backwards incompatible" blocks inside of a function to let you
program things that will be backwards-compatible. In other words, let people
program in Python 3 as much as they want, but allow them a way to use
libraries that only support Python 2 without making a mess. I'm not saying
this will be easy at all, but it will definitely make Py3k adoption actually
happen.

On the meta level, I'm really glad people are now discussing how to get the
Python 3 rollout happening, because we really are dangerously close to having
a "dead" language in Python if nothing changes.

------
nostrademons
This misses the point of why Python 3 was invented: Unicode.

Python 2's string handling is broken in the presence of unicode characters,
often leading to subtle errors that wouldn't cause exceptions until far away
from the place where the error was introduced, and oftentimes didn't produce
exceptions at all, just wrong data. Strings were defined as sequences of
bytes, and then provided a .decode method to convert them to a unicode object
that stores them as a sequence of codepoints. The problem was that a large
number of libraries (including all of Aaron's that I've looked at) used str as
their internal string type, which meant they were storing a sequence of bytes
in an arbitrary encoding but not storing the encoding along with it. If you
pass such a library a string in a different encoding, it will happily store
it, manipulate it, and concatenate it with other strings. If you pass such a
library multiple strings in multiple encodings (like, for example, if you're
pulling data from multiple webpages), you will get garbage data that can't be
decoded in any codec.
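A Python 3 re-enactment of that failure mode (the strings are hypothetical,
but the mechanics are exactly as described): concatenate byte strings in two
encodings without recording either encoding, and no single codec recovers the
text.

```python
# The same word encoded two ways, mimicking data pulled from two webpages.
latin1_bytes = "café".encode("latin-1")  # b'caf\xe9'
utf8_bytes = "café".encode("utf-8")      # b'caf\xc3\xa9'
mixed = latin1_bytes + utf8_bytes        # Python 2's str would allow this silently

# utf-8 rejects the stray latin-1 byte outright...
try:
    mixed.decode("utf-8")
except UnicodeDecodeError:
    print("utf-8 decode failed")

# ...while latin-1 "succeeds" but yields mojibake instead of an error.
print(mixed.decode("latin-1"))  # cafécafÃ©
```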

Python 3 changes this so that str stores unicode codepoints and there's a
separate 'bytes' type for uninterpreted bytes, and you are supposed to decode
your bytes into strings at system boundaries. This is recommended software
engineering practice for anyone who builds large systems that have to interact
with foreign-language text; however, a large number of Python developers work
in English-only environments where anything they receive will automatically be
ASCII. They've never tried to track down subtly broken encoding issues; for
them, the decode step is extra busywork that seems pointless.
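Concretely, the Python 3 behavior at such a boundary (a minimal sketch):

```python
raw = b"caf\xc3\xa9"        # utf-8 bytes arriving at a system boundary

# Python 2 would happily concatenate this with text; Python 3 refuses,
# forcing the explicit decode described above.
try:
    raw + "!"
except TypeError:
    print("decode first")

text = raw.decode("utf-8")  # the explicit boundary conversion
print(text + "!")           # café!
```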

The reason the Python2->3 transition has been so painful is that it involves a
whole language ecosystem fixing _bugs_ in their software, but the bugs are
subtle enough that the vast majority of people doing the work will never have
encountered them.

You can't just use the "from __future__ import python3_unicode" support
because this is a change to the semantics of an existing language feature. In
Python2, a string is a sequence of bytes. In Python3, a string is a sequence
of unicode codepoints. What happens when a Python3 program calls a Python2
library with a string object? Do you try to auto-convert the strings? You
can't, really, because strings in Python2 don't specify their encoding; you
have no way of knowing which codec the Python2 library _meant_ , because
chances are they didn't think about it.

The other major change in Python 3 - iterators everywhere - is similar, and
it's a recognition that an increasingly large proportion of the programming
ecosystem lives in a world where async operation is important and many
concurrent activities may be happening at once. And I'm really glad to see
Python willing to take on these challenges even with 5 years of short-term
pain, because it shows a commitment to keeping Python relevant for the issues
that 21st-century programmers will face. An increasing number of software
platforms will have to deal with non-English text; an increasing number will
need to handle concurrent, event-based environments. Without these changes
Python would basically cede these areas to languages like Go or Javascript
that have their unicode story straight and are well-adapted to async
programming.

~~~
overgard
Ok but... you could do unicode in python 2, it just wasn't ideal. The problem
is, python 3 doesn't actually solve most people's actual day-to-day problems.

Here are real problems with python:

* It's slow (excluding pypy)

* The C interface sucks (compared to something like Lua) and holds back language progress

* It can't handle multicore well outside of multiprocess hacks (which are sold as "the right way" -- bullshit. Sometimes threads are useful).

* Lambdas/closures are unnecessarily limited (I don't buy the whitespace/syntax argument -- look at how Boo works. You can do this just fine while keeping it pythonic).

* Explicit "self" is stupid and most people hate it. Javascript and Ruby are comparable languages, and neither of them need this while still having the exact same flexibility as python.

* (Down somewhere near the bottom:) strings should probably be unicode by default.

Python 3 doesn't solve any of the first five major problems, and the last
problem can be worked around in python 2.

You've correctly identified problems with python 2, but I think you're
incorrectly giving them more weight than they deserve. Most people just don't
run into those issues, and don't care, and that's why python 3 is dead in the
water -- because it doesn't solve the real pain points of python enough to
make people want to upgrade.

~~~
zanny
> Sometimes threads are useful.

I would never in a million years look at a performance problem and, given
these two options:

1\. Write the critical section in a faster language in serial (i.e., rather
than a dynamic interpreted script, maybe compiled bytecode, or maybe even
native machine code).

2\. Write the critical section multithreaded in the script language.

I would _never_ think to use #2 first. I would always just move my CPU bound
code into a tiny C++ library and only worry about threading as a matter of
last resort. You get so many huge leaky problems from going multithreaded
(even if you got rid of the GIL you would be looking at variable
synchronization, atomic timings, and cache coherency) that it is never worth
it over just writing the same code section in native code and using a native
call API, even the default way of writing your native code against Python.h.
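A minimal sketch of that "move the hot code behind a native call API"
approach, using ctypes and libc's strlen as a stand-in for a real C++ hot
path (the library lookup is platform-dependent; this assumes a Unix-like
system):

```python
import ctypes
import ctypes.util

# Locate the C runtime; the name varies by platform, which is exactly the
# kind of packaging friction native extensions bring. CDLL(None) falls back
# to the running process, which links libc on Unix-like systems.
libc = ctypes.CDLL(ctypes.util.find_library("c") or None)

# Declare the signature so ctypes marshals arguments correctly.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

print(libc.strlen(b"hello world"))  # 11
```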

~~~
drewcrawford
Your argument is essentially that one can write slow things in Python and fast
things in C, and that this solves the majority of problems (let's even be
generous and call it 98% of them that fit neatly into these two categories).
The trouble here is that "the number of programs" is a large number, and 2% of
a large number is still large, and leaves important classes of programs
without a good solution.

One class of programs left out in the cold is the network server. Now a
network server must respond to a large number of requests. From a basic
software engineering perspective, a 16-core machine should be responding to AT
LEAST 16 requests at once (much more if some of the requests are IO bound). So
the network server needs some kind of parallel processing (whether threads,
subprocesses, or whatever you want to suggest). Under your philosophy,
programs that need threading (thus all network servers) should not be written
in Python; somebody should instead step down to C. While it is probably
true that a very small minority of network servers should not be written in
Python, the broader claim is absurd; you should be able to write reasonably-
performing network servers in Python with relative ease. It is, after all, a
server-side language; "writing a server" should be very high on the list of
"things you can do".
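For scale, the stdlib has long made the thread-per-request version of such a
server a few lines (a toy echo server, not anyone's production design; since
the handler is I/O-bound, the GIL is not the bottleneck here):

```python
import socket
import socketserver
import threading

class EchoHandler(socketserver.BaseRequestHandler):
    # ThreadingTCPServer runs each handle() call on its own thread, so
    # multiple I/O-bound requests overlap despite the GIL.
    def handle(self):
        data = self.request.recv(1024)
        self.request.sendall(data.upper())

# Port 0 lets the OS pick a free port.
server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), EchoHandler)
host, port = server.server_address
threading.Thread(target=server.serve_forever, daemon=True).start()

with socket.create_connection((host, port)) as conn:
    conn.sendall(b"hello")
    reply = conn.recv(1024)

server.shutdown()
server.server_close()
print(reply)  # b'HELLO'
```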

Now more broadly, the existence of greenlet, Twisted, gevent, and their
popularity (we're talking top-100 packages here) speak to the fact that there
are a LOT of python programmers who have threading-related requirements. Are
they on crack? Now mix in the new standard library stuff like asyncio (3.4)
and threading is clearly an important enough issue to get major attention from
the core committers. Are _they_ on crack?

Now you might operate in a world where every time you need threads is an
isolated case and it's fairly simple to drop down to C. But there are a lot of
people (in absolute terms; I don't know if they are in the majority) for
whom, when they want threads, the right solution is to use threads.

The thing I hear from the core committers whenever the GIL comes up is "if we
worked on the GIL, we would be taking lots of time away from more important
things." But when you look at the things they work on instead--unicode,
iterators, ordered dictionaries, argparse, etc.--plenty of people in this
thread are insufficiently motivated to upgrade. Are ordered dictionaries
really more important than GIL work? To me, the answer is clear. I would
rather have some progress on the GIL problem than every single py3k feature
combined.

~~~
nostrademons
So here's some perspective on the concurrency problem. I write network servers
for a living - most are in C++ or Java, but I would love to be able to use
Python.

There are a number of high-level approaches you can use to concurrency.
Shared-nothing processes. Threads and locks. Callback-based events.
Coroutines. Dependency graphs and data-flow programming.

They all suck, and they all suck in different ways. Processes have large
context-switching overheads, and take up a lot of memory, and require that you
serialize any data you want to communicate across them. Threads and locks make
it very easy to corrupt memory if you forget a lock, very easy to deadlock if
you don't have a clear convention for what order to take locks in, and end up
being non-composable when you have libraries written under different such
conventions. Callbacks require that you give up the usage of "semicolon" (or
"newline") as a statement terminator; instead you have to break up your
program into lots of little functions whenever you make a call that might
block, and you have to manually manage state shared between these callbacks.
Coroutines require explicit yield points in your code, and open up the
possibility of a poorly-behaving coroutine monopolizing the CPU. Dependency
graphs also require manual state management and lots of little functions, and
often a lot of boilerplate to specify the graph.

Python has a "There should be one - and only one - obvious way to do things"
philosophy, and with asyncio, Guido seems to have decided that the obvious way
for Python is going to be coroutines. It's an interesting choice, and he's not
alone in that - I recall Knuth writing that coroutines were an under-studied
and under-utilized language concept that had many desirable properties.
Coroutines free you from having to worry about your global mutable state
potentially changing on every single expression, and they also give you the
state-management and composition benefits that explicit callbacks lack.

There are parts of them that suck - like having to explicitly add "yield
from" at any blocking suspension point, and having to propagate that "yield
from" down the call stack if it is added to a synchronous call. But having written a
bunch of threaded Java server and (desktop) GUI code, a lot of callback-based
Javascript, and a lot of C++ in both callback and dependency-graph style, all
of those models suck a whole lot as well.
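For readers who haven't seen it, the coroutine style under discussion looks
like this (shown with the later async/await spelling; asyncio 3.4 wrote the
same thing with @asyncio.coroutine and yield from):

```python
import asyncio

async def fetch(name, delay):
    # 'await' marks the explicit suspension point: the coroutine yields
    # control to the event loop here instead of blocking it.
    await asyncio.sleep(delay)
    return name

async def main():
    # Both coroutines are suspended concurrently, so the total wall time is
    # roughly the longest delay, not the sum.
    return await asyncio.gather(fetch("a", 0.01), fetch("b", 0.02))

print(asyncio.run(main()))  # ['a', 'b']
```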

------
philh
> 3\. In Python 2.c, warnings begun being issued when you tried to use the old
> way, explaining you needed to change or your code would stop working.

> 4\. In Python 2.d, it actually did stop working.

I'm curious, what are some examples here? The `from __future__`s that I recall
offhand are `print_function`, `division` (still need to be explicit) and the
old `with_statement` (incompatible code broke as soon as this became default).

The other old way of doing things that I can think of offhand is `except
Exception, e`, which has been replaced with `except Exception as e`, but not
removed.
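For the record, only the `as` form survives in Python 3 (a minimal sketch):

```python
# The Python 2 comma form is a SyntaxError in Python 3:
try:
    compile("try:\n    pass\nexcept Exception, e:\n    pass", "<old>", "exec")
except SyntaxError:
    print("comma form rejected")

# Only the 'as' form parses and runs:
try:
    1 / 0
except ZeroDivisionError as e:
    print(type(e).__name__)  # ZeroDivisionError
```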

~~~
sepeth
there is `unicode_literals` too.

For the full list, see
[http://docs.python.org/2/library/__future__.html](http://docs.python.org/2/library/__future__.html)

And there is one forgotten in that list: from __future__ import braces
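"Forgotten" deliberately: `from __future__ import braces` is a long-running
CPython easter egg that refuses to work.

```python
# Attempting the import raises SyntaxError("not a chance") rather than
# enabling brace-delimited blocks; compile() shows this without crashing.
try:
    compile("from __future__ import braces", "<easter-egg>", "exec")
except SyntaxError as err:
    print(err.msg)  # not a chance
```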

~~~
philh
So it looks like in Aaron's list, 2.a == 2.c and 2.b == 2.d, and the process
has been completed only three times, and two of those times were just adding
keywords.

------
glynjackson
quote from python.org: "New to Python or choosing between Python 2 and Python
3? Read Python 2 or Python 3." "Which version you ought to use is mostly
dependent on what you want to get done."

No other language I have ever worked with speaks like this. "New to Ruby? You
could use version 2.1, or why not try version 1.0, it still works ok."

Python.org treats versions 2 and 3 as completely different things; newbies to
the language like myself don't see it as an update to Python 2.x because
that's not how it's sold to us.

------
jbeja
I think the only people that hate Python 3 are the ones who have worked with
it for several years; newbies wouldn't even care a little.

------
codecondo
This aaron guy likes to talk a lot of shit so it seems

