Python 3 is a disaster. The problem is they made the entire thing out to be a Big Deal, but they didn't really offer any compelling reason to upgrade. I mean, the unicode is... kinda better, and iterators are a bit improved, but couldn't those things have been point releases? 2.8? They basically said that python 3 was a new language, and then offered no significant reason you should use this new language. So we all kept using actual python, quirks and all. To me, as a python developer, Python 3 is a failed fork. Harsh but true.
IMO, if they were going to do that sort of thing, they should have had at least one killer feature. Like maybe if python 3 had been based off pypy they could be saying "Look! We're 5x faster! Want to upgrade now?". That would have been compelling. But their message was: "we cleaned up some stuff that most of you don't care about, and broke a bunch of things". Think about pitching that sort of upgrade to your boss. "Well it doesn't solve any of our problems, and it creates a ton of new ones, but it's the right thing to do because it makes some code slightly cleaner arguably! Convinced yet?"
If I were in charge of python, I would do this: announce python 4, have it be based on the pypy interpreter, and keep compatibility with python 2 the language while reforming the C extension APIs to be more future proof (for getting rid of the GIL and so on). (Or maybe get rid of them entirely and just have people use CFFI.)
Or if Python 3 had included reliable (and updated) package and environment managers; and/or a default GUI framework (QT maybe) - out of the box.
The fragmentation between Python 2 and Python 3 is killing the language. Not to mention the community. Python needs a united front.
Tkinter is Python's "default GUI framework". At least, that's what they continue to claim (https://wiki.python.org/moin/TkInter), and it's the GUI library I ran into first when I first learned Python.
How does it look and feel once you get started? Well, let's just say it's a bad sign if a GUI framework website has no screenshots. Even with increasingly hacky theming engines layered on top, it still is hard to get anything feeling close to native: http://tktable.sourceforge.net/tile/screenshots/macosx.html
Go ahead, you can start laughing...
(for anybody that is saddened by the above, there are thankfully binding libraries for Qt and Wx, both of which do get you fairly decent cross-platform widgets from within Python, and either of which would be better default GUI libraries in 2014.)
It's sad, but I'm not sure how to resolve the problem, and nobody else has figured it out in the past 10 years either.
Requests might make a good addition to the standard library at some point. PyQt definitely shouldn't though.
I think that's as "good" as a Qt binding can be, too; Qt itself is LGPL (or commercial license).
I already commented on it and I agree. Python 2 is great and Python 3 is a little bit greater. But just not better enough to switch.
> Or if Python 3 had included reliable (and updated) package and environment managers;
I remember being at Pycon and distutils2 was announced. There was an aura of coolness, enthusiasm and hope in the air. Someone asked "but what about RPM packages?" they laughed at him and Tarek hid behind the podium in a funny gesture. Years later I am still using the default old included distutils package and building RPMs with it.
You do know that PyPy also uses a GIL.
CFFI is for calling C functions from Python. It won't let you run C extensions written for CPython (something would still have to implement the Python C API).
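To make the distinction concrete, here's a minimal ABI-level foreign call using the stdlib `ctypes` module; cffi's dlopen/cdef mode is the same idea with a nicer C-declaration parser. Note that this only *calls* into libm, it doesn't emulate the CPython C API:

```python
import ctypes
import ctypes.util

# Load the C math library (falling back to the main process's symbols on
# platforms where libm is folded into libc). This is a sketch, not a
# portable FFI recipe.
libm = ctypes.CDLL(ctypes.util.find_library("m") or None)

# Declare the C signature so ctypes marshals floats correctly
libm.sqrt.restype = ctypes.c_double
libm.sqrt.argtypes = [ctypes.c_double]

print(libm.sqrt(9.0))  # 3.0
```

The cffi equivalent replaces the manual restype/argtypes declarations with an actual C declaration string, which is why people find it less error-prone.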
I remember hearing about someone actually submitting a patch to cpython to remove the GIL about 10 years ago, but it was rejected because it made the language about 2-10x slower. IMO, that decision was INSANE. I have a 16 core machine right now and Python can only really use 1/16th of it. Nobody uses python for its speed, and people that need it are probably using C extension libraries like numpy, so if you were to cut the language's speed in half for a short term I'm going to say that pretty much nobody would care, especially if it were to solve a problem that everybody hates. Ruby is already about 10x slower than python on most tasks, and it's just not a big deal. GVR is optimizing for the wrong thing.
If Python were 10x slower than it is, I'd use Perl. When I started using Python for real (2004), I'd have stuck with Perl if Python had been 2x slower. (Note that I very much dislike Perl, and thought of Python as a nicer Perl when I first learned it.) I care about speed, and I also care about convenience. Python has a very nice balance of the two for me.
On the other hand, I only use threads in one script that I've ever written (and the GIL isn't a problem with it). So I just don't care about the GIL.
(Edit: re-worded second paragraph)
No it isn't - both are roughly in the same ballpark. Even before YARV it wasn't anywhere near an order of magnitude in the general case.
So people try to learn to code with python 3.x and get frustrated because the simplest things don't work.
They should seriously rethink the whole 3.x thing. I must say, introducing it probably did more harm than good for the future of this language.
It is the same as this page:
That page gives fairly equal weight to the two versions (I guess that could have changed over time).
Edit: It might make sense to have a warning about matching the interpreter version up with the tutorial, but clear wording for it is not obvious to me.
Additionally, when a beginner sees this page, he sees two versions - one is 3 and one is 2 - and I think most people will choose the "newer" (newer = better?) version because they don't really know about the differences between the two.
The second part of the problem is the fragmentation of tutorials on other websites - which can't be fixed by changing the download page.
Aaron implies that this approach was not taken with Python (non-py guy here); could someone tell me what the reasoning was behind that? Too much legacy code in the Py2 code base, just wanting to start from a clean slate, or what?
EDIT: (to add another question :) could you efficiently accomplish what Aaron is talking about, what would be the best way to go about this? @pak would you really have to load 2 stdlibs, is there no (efficient) way around syntax errors?
The idea to allow one project to switch between Python 2 and Python 3 for individual files is more interesting, but practically speaking would lead to sort of a mess.
Yes, I believe key parts of the standard object model changed between the two (e.g. strings vs bytes, many of the magic methods and operators) making this nearly impossible. Every time objects would pass back and forth, they'd have to be converted, which is wasteful and bug-prone (and this is a whole mess of library code that the python3 guys probably did not want to write). You'd also need to load two different standard libraries, which would waste memory.
You only need to scan through the upgrade feature list to see how hard intercompatibility would have been. http://docs.python.org/3.0/whatsnew/3.0.html
Although I totally agree with Aaron that this would have allowed people to actually use Python 3 without fear, anything short of forcing the entire program and all of its modules to run in v3 mode as opposed to v2 mode would have been a disaster from a reliability and technical design standpoint. And that's closer to how things actually went down with 2to3, etc.
We can extend Aaron's ideas to even more radical ideas, for example, instead of allowing Python 3 and 2 code to be mixed on a per-file basis, allow it to be mixed on a per-function basis. In fact, allow running Python 3 code, and drop in "backwards incompatible" blocks inside of a function to let you program things that will be backwards-compatible. In other words, let people program in Python 3 as much as they want, but allow them a way to use libraries that only support Python 2 without making a mess. I'm not saying this will be easy at all, but it will definitely make Py3k adoption actually happen.
On the meta level, I'm really glad people are now discussing how to get the Python 3 rollout happening, because we really are dangerously close to having a "dead" language in Python if nothing changes.
Python 2's string handling is broken in the presence of unicode characters, often leading to subtle errors that wouldn't cause exceptions until far away from the place where the error was introduced, and oftentimes didn't produce exceptions at all, just wrong data. Strings were defined as sequences of bytes, and then provided a .decode method to convert them to a unicode object that stores them as a sequence of codepoints. The problem was that a large number of libraries (including all of Aaron's that I've looked at) used str as their internal string type, which meant they were storing a sequence of bytes in an arbitrary encoding but not storing the encoding along with it. If you pass such a library a string in a different encoding, it will happily store it, manipulate it, and concatenate it with other strings. If you pass such a library multiple strings in multiple encodings (like, for example, if you're pulling data from multiple webpages), you will get garbage data that can't be decoded in any codec.
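The failure mode is easy to reproduce, even on Python 3, if you use bytes the way those libraries used str (a sketch, not taken from any particular library):

```python
# Two encodings of the same text, stored as raw bytes -- which is exactly
# what Python 2's str type was
s1 = "café".encode("utf-8")    # b'caf\xc3\xa9'
s2 = "café".encode("latin-1")  # b'caf\xe9'

mixed = s1 + b", " + s2        # concatenates silently -- no error at this point

# ...the garbage only surfaces later, far from where it was introduced:
try:
    mixed.decode("utf-8")
except UnicodeDecodeError:
    print("not valid UTF-8")   # the latin-1 half poisoned the whole string
```

And `mixed.decode("latin-1")` doesn't raise at all; it just silently produces mojibake for the UTF-8 half, which is the "wrong data, no exception" case described above.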
Python 3 changes this so that str stores unicode codepoints and there's a separate 'bytes' type for uninterpreted bytes, and you are supposed to decode your bytes into strings at system boundaries. This is recommended software engineering practice for anyone who builds large systems that have to interact with foreign-language text; however, a large number of Python developers work in English-only environments where anything they receive will automatically be ASCII. They've never tried to track down subtly broken encoding issues; for them, the decode step is extra busywork that seems pointless.
The reason the Python2->3 transition has been so painful is that it involves a whole language ecosystem fixing bugs in their software, but the bugs are subtle enough that the vast majority of people doing the work will never have encountered them.
You can't just use the "from __future__ import python3_unicode" support because this is a change to the semantics of an existing language feature. In Python2, a string is a sequence of bytes. In Python3, a string is a sequence of unicode codepoints. What happens when a Python3 program calls a Python2 library with a string object? Do you try to auto-convert the strings? You can't, really, because strings in Python2 don't specify their encoding; you have no way of knowing which codec the Python2 library meant, because chances are they didn't think about it.
Here are real problems with python:
* It's slow (excluding pypy)
* The C interface sucks (compared to something like Lua) and holds back language progress
* It can't handle multicore well outside of multiprocess hacks (which are sold as "the right way" -- bullshit. Sometimes threads are useful).
* Lambdas/closures are unnecessarily limited (I don't buy the whitespace/syntax argument -- look at how Boo works. You can do this just fine while keeping it pythonic).
* (Down somewhere near the bottom:) strings should probably be unicode by default.
Python 3 doesn't solve any of the first four major problems, and the last one can be worked around in python 2.
You've correctly identified problems with python 2, but I think you're incorrectly giving them more weight than they deserve. Most people just don't run into those issues, and don't care, and that's why python 3 is dead in the water -- because it doesn't solve the real pain points of python enough to make people want to upgrade.
I want to in principle, don't get me wrong, but I just don't have the time and money to do it, especially at the opportunity cost of not doing other product related stuff.
> * It's slow (excluding pypy)
Don't agree. Python is good enough for me. Point being "slow" and "fast" are just invitations for flame war without a specific benchmark or use case.
> * The C interface sucks (compared to something like Lua) and holds back language progress
As other post mentioned, try python cffi. That one is pretty good.
> * It can't handle multicore well outside of multiprocess hacks (which are sold as "the right way" -- bullshit. Sometimes threads are useful).
Meh, this is often parroted. In what I do (network and server io stuff) threads work very well!
Completely disagree. This is a terrific feature. I hate implicit hidden defaults and assumptions. All the other languages have an implicit this/self; Python makes it explicit. That is a good thing in my book.
Your implicit hidden default is my handy abbreviation.
Imagine if we didn't have contractions in English.
For that previous sentence, is it really at all frightening that the "didn't" meant "did not" but we haven't (ha) written it out? Could it possibly mean anything else? Wouldn't English be all the more stilted and ugly if there were only one explicit and verbose way (a Python design principle) of negating verbs?
If I redefined the "self" as any other name Pythonistas would hate on me for being unidiomatic. Most editors highlight "self" in anticipation of what it means. In fact, I have never seen code where specifying another name would be justifiable. The only conclusion from this situation is that it is a de facto keyword and re-specifying it every. single. time. is a waste of programmatic breath.
And for this reason, every time I tab between languages and get my favorite "function takes 1 argument (2 given)" error (by the way: a completely bewildering message for a programmer coming from any other language, rereading the method definition and wondering where the second argument is coming from), I mutter curses at my terminal for Python relentlessly carrying this wart into its third decade.
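For anyone who hasn't hit it, here's a minimal way to trigger that error (the exact wording quoted is Python 2's; Python 3 rephrased it but kept the same off-by-one confusion):

```python
class Broken:
    def method():      # programmer forgot 'self'
        return "oops"

try:
    Broken().method()  # the instance is passed implicitly -- one argument "too many"
except TypeError as e:
    # Python 2: "method() takes no arguments (1 given)"
    # Python 3: "method() takes 0 positional arguments but 1 was given"
    print(e)
```

The method definition shows zero parameters, yet the interpreter insists one was given, which is exactly the mismatch between the call site and the definition being complained about.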
Implicit and explicit are relative to expectations. Python moves the implicitness into the way arguments are passed to methods. It is not any more explicit when placed in the context of all other OOP languages, but rather an eccentric convention.
800 results on SO. Read them and weep: http://stackoverflow.com/search?q=Python+takes+arguments+giv...
When method call argument one is method definition argument two, things get confusing quickly. What's the benefit to readability? Python has plenty of other implicit rules (semantic whitespace being the most troublesome--though ironically its primary benefit of readability runs counter to the visual noise of the explicit self arg), and as pak points out, explicit "self" in the method def just moves the implicitness elsewhere in the code.
Lua, Go, and Rust all have explicit self.
I hate to be picky, but he only acknowledged portions of what you said.
> but a couple decorators held this one back.
@classmethod and @staticmethod are somewhat important language features to not break.
And strings aren't? Whatever, I understand that there are tradeoffs and to people that only use Python, the aesthetic blemishes tend to matter less. To people that use it in the context of other languages, it sticks out like a bad paint job or a prominent stain. This, and many more opinions than I could express succinctly, are on the reddit thread for GVR's post.
@redditrasberry: To make a bad analogy, it's like a stain on your carpet in your front hallway. If you live with it for long enough you won't even see it any more. But to visitors coming to your house it's the most obvious thing. And it's particularly noticeable because the rest of Python is so nice - it's like I'm visiting an art gallery and everything is beautiful and pristine, but there on the carpet at the entrance is this huge stain that nobody has ever cleaned up.
Not sure how explicit self breaks strings? You've lost me on the way I'm afraid.
If we're quoting Reddit comments at each other:
> You may be used to a different kind of magic. Perhaps the magic of a variable called this appearing inside your method, or the magic in which an un-prefixed variable name somevar is sometimes a local but at other times a member variable this.somevar. So it may take a little time to get used to Python's style, but isn't it only because you are used to the other magic?
Explicit self is hardly a burden after about 10 minutes of learning Python, and there's really little objective argument to be made in favour of or against it as opposed to everything else.
> It's odd to me that they were willing to break so much code with 3.0 (by "fixing" strings and Unicode) but a couple decorators held this one back.
Removing self would be a far more massive change - it wouldn't just impact certain strings, it would impact every Python class.
> I hate to be picky, but he only acknowledged portions of what you said.
Read it and for me he did not acknowledge anything at all.
That statement is too broad to be useful though. All languages have warts.
If I were looking at a performance problem and given these two options:
1. Write the critical section in a faster language in serial (ie, rather than dynamic interpreted script, maybe compiled bytecode, or maybe even native machine code).
2. Write the critical section multithreaded in the script language.
I would never think to use #2 first. I would always just move my CPU-bound code into a tiny C++ library and only worry about threading as a matter of last resort. You get so many huge leaky problems from going multithreaded (even if you got rid of the GIL you would be looking at variable synchronization, atomic timings, and cache coherency) that it is never worth it over just writing the same code section in native code and using a native call API, even the default approach of writing your native code against Python.h.
One class of programs left out in the cold is the network server. Now a network server must respond to a large number of requests. From a basic software engineering perspective, a 16-core machine should be responding to AT LEAST 16 requests at once (much more if some of the requests are IO bound). So the network server needs some kind of parallel processing (whether threads, subprocesses, or whatever you want to suggest). Under your philosophy, programs that need threading (thus all network servers) should not be written in Python, but somebody should be stepping down to C. While it is probably true that a very small minority of network servers should not be written in Python, the broader claim is absurd; you should be able to write reasonably-performing network servers in Python with relative ease. It is, after all, a server-side language; "writing a server" should be very high on the list of "things you can do".
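For what it's worth, a thread-per-connection server is a few lines of stdlib code; blocking socket I/O releases the GIL, which is why this style serves many requests at once despite the GIL. A sketch (the `Echo` handler and the port-0 trick are illustrative, not from any post above):

```python
import socket
import threading
from socketserver import StreamRequestHandler, ThreadingTCPServer

class Echo(StreamRequestHandler):
    def handle(self):
        # one thread per connection; while this thread blocks on I/O,
        # other request threads keep making progress
        self.wfile.write(self.rfile.readline())

srv = ThreadingTCPServer(("127.0.0.1", 0), Echo)  # port 0: pick any free port
threading.Thread(target=srv.serve_forever, daemon=True).start()

with socket.create_connection(srv.server_address) as c:
    c.sendall(b"ping\n")
    reply = c.makefile().readline()
srv.shutdown()

print(reply)  # ping
```

The GIL complaint is about the CPU-bound case: if `handle` did heavy computation instead of I/O, those 16 cores would still sit idle.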
Now more broadly, the existence of greenlet, Twisted, gevent, and their popularity (we're talking top-100 packages here) speak to the fact that there are a LOT of python programmers who have threading-related requirements. Are they on crack? Now mix in the new standard library stuff like asyncio (3.4) and threading is clearly an important enough issue to get major attention from the core committers. Are they on crack?
Now you might operate in a world where every time you need threads is an isolated case and it's fairly simple to drop down to C. But there are a lot of people (in absolute terms; I don't know if they are in the majority) where when they want threads the right solution is to use threads.
The thing I hear from the core committers whenever the GIL comes up is "if we worked on the GIL, we would be taking lots of time away from more important things." But when you look at the things they work on instead--unicode, iterators, ordered dictionaries, argparse, etc.--plenty of people in this thread are insufficiently motivated to upgrade. Are ordered dictionaries really more important than GIL work? To me, the answer is clear. I would rather have some progress on the GIL problem than every single py3k feature combined.
There are a number of high-level approaches you can use to concurrency. Shared-nothing processes. Threads and locks. Callback-based events. Coroutines. Dependency graphs and data-flow programming.
They all suck, and they all suck in different ways. Processes have large context-switching overheads, and take up a lot of memory, and require that you serialize any data you want to communicate across them. Threads and locks make it very easy to corrupt memory if you forget a lock, very easy to deadlock if you don't have a clear convention for what order to take locks in, and ends up being non-composable when you have libraries written under different such conventions. Callbacks require that you give up the usage of "semicolon" (or "newline") as a statement terminator; instead you have to break up your program into lots of little functions whenever you make a call that might block, and you have to manually manage state shared between these callbacks. Coroutines requires explicit yield points in your code, and opens up the possibility of a poorly-behaving coroutine monopolizing the CPU. Dependency graphs also require manual state management and lots of little functions, and often a lot of boilerplate to specify the graph.
Python has a "There should be one - and only one - obvious way to do things" philosophy, and with asyncio, Guido seems to have decided that the obvious way for Python is going to be coroutines. It's an interesting choice, and he's not alone in that - I recall Knuth writing that coroutines were an under-studied and under-utilized language concept that had many desirable properties. Coroutines free you from having to worry about your global mutable state potentially changing on every single expression, and they also give you the state-management and composition benefits that explicit callbacks lack.
However, I'll point out that they're following a very sensible "stop the bleeding" maintenance strategy in targeting Unicode and async first. Whenever you need to upgrade a massive old codebase to a new way of doing things, your first priority needs to be to stop things from getting worse. That means getting everybody onto the new system, either via a shim or (worst case, but true in this case) by fiat, and then cleaning up the older code and introducing new cool stuff that's enabled by the new features.
The big problem with Python 2's Unicode handling was that it made things easy on people who handled only ASCII and then dumped the problem on top of people who wanted to handle Unicode and still rely on libraries from people who didn't know what Unicode was. As a result, the latter people just didn't use Python, because of the pain involved. Guido and the core maintainers (correctly, IMHO) identified this group of people as important to the future of Python, and so they want to make things viable for them even if it's inconvenient for a large section of Python's existing userbase.
Special emphasis on the multicore stuff. It's just dreadful. I tried to build a complex processing system on top of `multiprocessing` and friends but just had to give up because it doesn't work half the time for obscure reasons (here's a fun one: you can't call a function from a `multiprocessing.Pool` if the function had been defined after the pool was instantiated. wat.) I just drank the kool-aid and used Celery, which is awesome, but I shudder every time I think about how many sacrifices must've been made to make it work.
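For the record, the `Pool` behavior described comes from fork-time snapshots: workers forked when the pool is created never see functions defined afterwards. A sketch of the safe ordering (function names are illustrative; start-method details vary by platform):

```python
from multiprocessing import Pool

def square(x):               # defined at module top level, BEFORE any Pool exists,
    return x * x             # so worker processes can find it by name

if __name__ == "__main__":   # required on spawn-based platforms (e.g. Windows)
    with Pool(processes=2) as pool:
        print(pool.map(square, [1, 2, 3]))  # [1, 4, 9]
```

Move the `def square` below the `Pool(...)` call on a fork-based platform and the workers fail to resolve it, which is the "wat" moment described above.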
That said, other languages -- most recently and notably, Ruby -- have managed to make the Unicode transition without leaving much of the existing user base behind. I realize that Python is often used in larger and slower-moving organizations than Ruby, and also has a more conservative philosophy. Even so, perhaps they could have announced Python 2.8, identical to 2.7 in every way except that strings are now Unicode. Then, after 1-2 years of everyone getting their Unicode house in order, we could have moved onto 2.9 or 3.0, and/or used the "from future" syntax.
There is no perfect solution, but it seems to me that by breaking so many things at once, we're in a situation where everyone wants to upgrade, but no one sees the overwhelming benefit from doing so, and thus puts it off even further.
I also believe that dealing with unicode more transparently is a very reasonable strategy. There are not many places where you actually have to deal with code points. Text formatting and rendering is one of those few places. Having the file i/o and standard streams do encoding/decoding by default goes against that design.
Programs generally need to deal with textual data at one of three levels:
1. Manipulate strings and substrings. e.g., "does string x contain substring y"? Byte sequences of utf8 data are fine for this. You may need to normalize the data first, but that's true of codepoint sequences as well.
2. Deal with a small number of specific ASCII codepoints--e.g., the filename separator character. Again, utf8 byte strings are fine for this.
3. Deal with glyphs, including glyphs composed of multiple combining characters. You need to iterate to assemble the glyphs, so a codepoint sequence offers no advantages over a byte sequence of utf8 with a codepoint iterator.
A sequence of bytes is just fine as a native string type. Store Unicode data as utf8-encoded byte strings. Provide easy iteration over codepoints in utf8 strings, transcoding for I/O, normalization functions, etc. Call it a day.
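To make the first two levels concrete: UTF-8's design guarantees that no character's byte sequence appears inside another's, and ASCII bytes never occur inside a multi-byte sequence, so byte-level operations really do work (a sketch):

```python
s = "naïve café".encode("utf-8")

# level 1: substring search works byte-for-byte on UTF-8
assert "café".encode("utf-8") in s

# level 2: splitting on a specific ASCII codepoint is safe, because ASCII
# bytes never appear inside a multi-byte UTF-8 sequence
assert s.split(b" ") == ["naïve".encode("utf-8"), "café".encode("utf-8")]

# and codepoint iteration is available on demand by decoding lazily
print(len(s.decode("utf-8")))  # 10 codepoints, though len(s) is 12 bytes
```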
UTF-8 everywhere works great when you can enforce it. On the level of an individual project, you can enforce it. On the level of a language ecosystem, you can't, and you need to. Otherwises you end up with some libraries who assume their internals are UTF-8 encoded strings, some libraries that assume they are ASCII strings, some libraries that assume the caller is handling encoding issues and will take care of ensuring that it's all UTF-8, some that make no assumptions at all and carry around the encoding everywhere, and some that just haven't thought about the problem and silently break when you use them with data that came from other libraries.
It's interesting that Go and Java - both languages with mature Unicode handling - still have a distinction between uninterpreted bytes and UTF-8 or UCS-2 text. In Go, you have separate bytes and string types, even though Go strings are just UTF-8 byte sequences. In Java, you have byte and String. The problem is not a technical one of how to represent strings, it's a social one of how to get all the library authors on a language to agree on a convention of how to handle encoding.
Ok, but the solution isn't to make everyone rewrite their libraries by holding the future ransom if they do not.
It could have been real simple: let python track encoding when it's known and throw an exception if they mix oldstr(unknown) with newstr(py3).
In py3 modules newstr is str, and in py2 modules oldstr is str. Now you only have to fix the bits where they mix and the programmer can always choose to fix it in the new code.
What's the largest codebase you tried a unicode-ification project on? It's a nightmare unless you keep de/encoding as close to the i/o operations as possible.
I can't understand how you've ever found it just as easy to do "string x contain substring y" on bytes vs uc strings. Any case-insensitive test will fail miserably unless you only ever see ASCII input. Then there's sorting and tokenization. Oh god, the sorting bugs...
Even measuring the length of a string is a miserable fail. And blind substitution on utf8 bytes horribly mangles the output, causing mysterious segfaults or silent corruption.
On a large codebase, programmers can't keep track of what encoding is being used in which parts of the code. Eg. Let's allow the users to specify input file encoding! But our OS does filenames in UTF16-LE. And the Web API is UTF-8... nasty stuff. It's far saner to use character strings everywhere except immediately after/before I/O operations.
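A small sketch of the length and case-insensitivity failures being described, assuming Python 3 for the text side (the example word is mine):

```python
s = "Straße"
b = s.encode("utf-8")

print(len(s), len(b))  # 6 7 -- 'ß' is one codepoint but two UTF-8 bytes

# case-insensitive match: hopeless on raw bytes (bytes.lower only touches
# ASCII), correct on text via casefold ('ß' folds to 'ss')
print(b.lower() == b"STRASSE".lower())       # False
print(s.casefold() == "STRASSE".casefold())  # True
```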
In Py3.3 or so, they at last detected the problem and fixed at least this one ... but very late! Many programmers might have turned their backs on Py3 already.
Basically, all strings would internally be represented as unicode (OR as a byte array + an encoding, though that might be a little too ambitious), and each would have 2 APIs it can be accessed by. Non-instance operations (such as making new strings) similarly have 2 APIs. Pre-unicode-switch code gets an API that emulates the old behaviour as best as possible, and post-unicode-switch code gets the full unicode API that exists now for unicode strings.
Even if encoding is flat out broken, if you put in a stream of chars and store that as <whatever>, and then later you query this thing, the broken usually ends up undoing itself. Yes, this will fail spectacularly once you try to for example concatenate a string with broken encoding to a string with a different encoding, but presumably places that are international proof have switched to unicode strings long ago, and places that aren't really aware of the importance of anything that isn't in ASCII 32-127 will find all their code to magically 'just work'.
Similar things should be possible for the iterator business. Internally things are iterators, but if ever any P2 code touches them, they turn into a (memory hogging, etc) list. yes, of course, this will flat out break if you attempt to pass an infinite iterator to code that isn't used to it, but these are all transitional pains, and the key point is: As long as you don't fork the pre-big-switch version (beyond security updates), sooner rather than later libraries will fix the bugs, or their community of users will just die out as someone else writes a new one to fill the void.
Painful? yes. very. We've seen this in programming land before. Java5 introduced generics and as a result any interaction with pretty much any library written before it resulted in a cavalcade of unsafe/raw warnings. It sucked. Huge communities stuck to 1.4 (IBM WebSphere notably stayed there for almost half a decade before moving on to 1.5).
But, today? Libraries have upgraded or have been replaced. The chance you still run into pre-generics code is tiny. It took long, it sucked, yadayada, but for all intents and purposes that was Java's Python2->Python3, and in Java-land everyone is on the new version at this point. P2->P3 has had roughly the same time span, and that transition seems nowhere near complete.
> 4. In Python 2.d, it actually did stop working.
I'm curious, what are some examples here? The `from __future__`s that I recall offhand are `print_function`, `division` (still need to be explicit) and the old `with_statement` (incompatible code broke as soon as this became default).
The other old way of doing things that I can think of offhand is `except Exception, e`, which has been replaced with `except Exception as e`, but not removed.
For the full list, see http://docs.python.org/2/library/__future__.html
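For reference, the opt-ins look like this (on Python 3 these imports are accepted as no-ops, so the snippet runs under either version):

```python
from __future__ import print_function, division

print(1 / 2)   # 0.5 -- true division, even on Python 2 with this import
print(7 // 2)  # 3   -- floor division is still spelled //
```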
And there is one forgotten in that list:
from __future__ import braces
No other language I have ever worked with speaks like this. "New to Ruby? You could use version 2.1, or why not try version 1.0, it still works ok."
Python.org treats version 2 and 3 as completely different things; newbies to the language like myself don't see 3 as an update to Python 2.x because that's not how it's sold to us.