Python 3 makes it easier to develop high-quality software (migrateup.com)
237 points by ColinWright on Nov 18, 2015 | 238 comments

This is sad and frustrating. The article is trying to convince people to upgrade to a 7-year-old version of Python! And everyone is still skeptical about it, thinking it's too new, too risky, unproven, and libraries aren't ready yet.

Python 3 adoption has taken longer than the entire lifetime of NodeJS. In the same timeframe, PHP introduced and EOLed 5.3 (edit: and there were 4 years between the release of 5.0 and the EOL of 4.x, which involved a backwards-incompatible refactoring of OO semantics).

> And everyone is still skeptical about it,

They are not skeptical. Nobody thinks "yeah if I just wait long enough, Python 3 will go away and 2.8 will come out".

They just don't see an incentive to update. Simple as that. It is basic economics of time and money.

Python 2 wasn't terrible and Python 3 isn't dramatically better. Unicode and other stuff in Python 3 is nice. Is it nice enough to start digging into stable, working, money making code just to say "Yay, Python 3"? I think the development community has spoken by how it was handled. It's gonna happen but it will happen at a very slow pace.

Imagine this alternative scenario -- "Python 3 brings 3x speed improvement + removed GIL". Ok, I bet you, there would have been a much faster adoption because now there is a greater incentive to justify it. People would think "Yeah, I'll destabilize my code a bit but I gain performance, I can do that".

Someone in a sibling comment suggested "They should have just stopped updating security patches for 2.7, that would have forced them to use 3.0". That's ok if Python holds a monopoly on development languages. But it doesn't. As soon as that happens developers will not just look at updating to Python 3, they'll start looking at other ecosystems: Go, Elixir, Java, Clojure, Scala, NodeJS etc.

I could not agree more.

To add to this: if it had included even a 2x speed-up / memory reduction, I think 90% of everyone would have switched by now. And you know what else? Go probably wouldn't exist.

This is a good point. Go is a direct response to Google's experience with Python.

I would argue that removing the GIL would scare the shit out of developers and slow adoption instead of hasten it. A lot of accidentally thread-safe code would suddenly become thread-unsafe.

It was just a hypothetical example of the kinds of changes that would have made everyone flock to Python 3.

I think the energy spent supporting Python 2 and the inability to really unify and focus the community on Python 3 is preventing or delaying that kind of dramatic improvement.

If people would rather move to another ecosystem than update their code, that's fine. The Python community doesn't get paid a nickel every time the interpreter runs; (at least slightly) engaged users are what move Python forward.

That's true if we look at it from the point of view of the Python community (and they are all about Python 3). But another view is that of developers who see themselves as developers of product X or company Y, and who value the bread on their table, or X and Y winning, more than Python being popular or Python succeeding.

I would say that actual, continued support is an incentive to update.

I think there is way less "anti" Python 3 attitude than there appears: a few haters, a few blog spammers/clickbaiters, a few people who never even looked at Python 3. The vast majority of people using Python 3 (me included) just use it. It's not a big deal. No reason to toot my horn over it.

There is lots of FUD left over from the early 3.0-3.2 days. Python 3.5 is solid and well supported by 3rd-party libs; check this out:


Python 3 has been adopted now. It just isn't newsworthy, so we don't hear about it.

While CPython 3.x is rock solid and shipped years ago - no argument there - both PyPy and Jython still lack stable Python 3.x releases. That's a major migration pain point.

It is hard trying to introduce Python 3 at a place that has systems that depend on both CPython and Jython. Jumping between Python 2 and 3 across projects is quite taxing mentally. And staying on Python 2 only leaves me with a sense of lingering doom, to say nothing of having to endure all the idiosyncrasies of Python 2.x. I really enjoy the few moments when I can use Python 3.

It is perhaps easy to consider the 2 vs 3 debate over and done with for people who can just disregard PyPy and Jython entirely in their environment. I probably would think so myself, had I not been so unfortunate as to be the unwilling maintainer of solutions dependent on Jython right now.

Jython has always been behind, and sometimes by a large margin. I don't think that is a "major" migration pain point at all.

No, it hasn't, based on download stats of packages. Guido found this out the hard way when he went to work for Dropbox. They run mostly Python 2.7 code, and now he has to maintain it.

Also, PyPy is 2.7 compatible, although there's a 3.x beta.

Guido knew this long before going to Dropbox. Google is still on 2.7. IIRC I saw some mailing list posts back around 2008 that acknowledged that most large organizations will never upgrade to 3.0, because it just doesn't make sense when you have a large legacy codebase.

That doesn't make Python 3 a failure or unadopted, it's just the structure of our industry. By search & job stats [1], the most popular programming languages are Java, C, C++, C#, and yep, Python. Companies tend to pick whatever language is best for early adopters when they first get started and then language usage grows along with the company. So the most popular languages now are the ones that were cutting edge 15 years ago, when the biggest companies of today were tiny startups. Go look at what's most popular among startups today and you'll see what'll be the hot programming languages in 15 years.

[1] http://www.tiobe.com/index.php/content/paperinfo/tpci/index....

Nice metric to see if a language is going to be mainstream. I think this should work. Can you think of any way to validate your theory?

You can backtest it against languages that appear frequently in programming forums or Silicon Valley meetups and then take a look at their popularity & use by big companies 5 years later.

I've done this anecdotally (i.e. not with a huge quantitative sample size, but looking at specific "hot" tech companies and the tech stacks they end up using) and it seems to work, but is subject to certain gotchas. Data points:

EBay (1995) - Perl, rewritten in C++ in 1997, rewritten in Java in 2002.
Google (1998) - C++.
Del.icio.us (2003) - Perl.
GMail (2002) - Perl, rewritten in Java circa 2002.
LiveJournal (2002) - Perl.
Flickr (2004) - PHP.
Facebook (2004) - PHP.
YouTube (2005) - Python.
Reddit (2005) - Lisp, rewritten in Python in 2006.
AirBnB (2006) - Ruby on Rails.
GitHub (2007) - Ruby on Rails.
Dropbox (2007) - Python.
Hacker News (2007) - Arc, YC internal tools (c. 2012) are written in Ruby on Rails.
Twitter (2007) - Ruby on Rails, rewritten 2010 in Java/Scala.
Uber (2010) - Node.js, supposedly this was a rewrite.

There are two big caveats:

Sometimes a big competitor comes along right when a language is undergoing a massive rewrite - for example, it looked like Perl would own the Internet around 2003, but just a couple years later we got Rails and Django and Perl stagnated with Perl 6, and so Perl jobs are nowhere near as hot as they would otherwise be. Similarly, Python was poised to take over much of the non-web server space in 2008, but in 2009 Node.js came out just as Python was losing steam in the Python 3 transition. Python still remains pretty hot (it's been helped by its use in data science), but it may've lost the startup server market to Node.

The other big anomaly is that there was a lot of attention around Haskell and Erlang from 2005 - 2008, and yet these are still niche languages. Erlang had one big success with Whatsapp and some smaller ones with Facebook chat and CouchDB, but didn't really become mainstream. My hypothesis there is that because both of these were old languages that were rediscovered, they were built with some assumptions (eg. strings are lists, or funky POSIX interfaces) that didn't fit modern startup development, and so they couldn't gain critical mass. But then, both Python and Ruby were old languages that were rediscovered; perhaps it was the sudden resurgence of Linux (which both Python and Ruby were steeped in) along with the web that carried them forwards.

What a great analysis!

So it hasn't been adopted, based on a sample size of 1?

New companies, like the one I'm part of, use Python 3, all the time. Django, and the other 15 packages we require, all support Python 3. In fact, I've been using Python 3 since 2008, and I've been using it for work since 2013. I think we're past all of that, at this point.

I'm not sure why migration is always regarded as a metric for success. Python 3 wasn't supposed to be an "upgrade", per se. It was supposed to allow the core to rid itself of a lot of the language's warts. It has done this quite well without sacrificing much. The result was a slow migration, not by existing code bases, but by means of new companies.

PyPy3 stable is compatible with Python 3. PyPy4 is not, yet.

This is true and incomplete at the same time.

It's true that 3 is happily happening for many people, effectively and without fuss.

But it's incomplete to pretend the schism is not an issue; not to try to understand the pro-2.x points which many reasonable users have.

It's a huge issue, even now, and to pretend it's not there is to do BOTH Pythons a disservice, because it will simply serve to feed the disagreement, to the detriment of both flavours of the language.

The Python 3 project needs to understand where it went wrong and acknowledge it, 'better language' arguments (true-ish but largely ineffective) notwithstanding, and try to bridge the community.

Yes, the 3 segment reasonably wishes all this would just go away so we can all focus on moving Python forward, but it's not going away. It must be addressed without yet another post about 3 being better, which simply makes 2.7 people feel like second-class citizens. They've heard all the arguments before.

> The Python 3 project needs to understand where it went wrong, acknowledge it...

Already happened, years ago. Breaking backward compat is the issue, and it is a difficult situation for any popular platform. Guido has mentioned it won't happen again.

I'm not sure how to even read this, after I give it the requirements file for my last Flask project. It says six projects block transition for me, then offers me a goofy badge saying 4 blockers. Over on the right, the top thirteen of the projects in my file are black and not given as links.

(However, it is clear that going to Python 3 would not work.)

Python 2 devs are also missing out on the amazing ecosystem developing around the new async features.

There are asyncio ports of many popular and less popular libraries, and it even has a nicer syntax than NodeJS.
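For illustration, a minimal sketch of the Python 3.5 async/await syntax (the "fetch" here is simulated with asyncio.sleep rather than real I/O, so the names and delays are purely illustrative):

```python
import asyncio

async def fetch(name, delay):
    # Simulated I/O: yields control to the event loop while "waiting".
    await asyncio.sleep(delay)
    return name

async def main():
    # Run both "requests" concurrently; gather preserves argument order.
    return await asyncio.gather(fetch("a", 0.01), fetch("b", 0.02))

loop = asyncio.new_event_loop()
print(loop.run_until_complete(main()))  # ['a', 'b']
loop.close()
```

The total wait is roughly the longest delay, not the sum, since both coroutines sleep concurrently on the same loop.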

I don't believe it's the same. Code from an old version of PHP could still be expected to run the same in a newer version, except maybe for deprecation warnings. Probably a reason why it is not easy for PHP devs to fix all the typecasting quirks their language has.

Python 3, on the other hand, has tried to fix many design decisions and hence requires some transition effort. The effort is many times greater with so much software already written in Python 2 and large codebases. I am not surprised the adoption is slow.

There were plans for a never-released PHP 6. Some books were even published for PHP 6; it was meant to be the big Unicode release. Most of its features made it into the 5.4 version. In the end, it seems that it was the better decision to introduce new features in a smoother way. PHP hasn't suffered the problems that exist with Python 2 to 3. Although there was the PHP 4 to 5 transition, which broke the older object-oriented implementation - though most never used OO in PHP 4, as it was known to be slow - so it wasn't a problem for the community.

Note also that Sun/Oracle has periodically come out with updates for Java, and these get taken up quickly, even by "enterprise" sorts of people. The enthusiasm for Java 8 is huge and the 7 -> 8 transition is astonishingly trouble-free; the 6 -> 7 transition was relatively bad because the way in which JAXB creates POJOs changed in a way that broke people's assumptions.

Java has done a great job of evolving over time -- they made the controversial choice of using type erasure in generics. This makes generics much less useful than they are in C#, but Java kept compatibility between the classic and generic collections and C# didn't. Thus the C# code base has two different syntaxes for collections, particularly in the stuff that comes from Microsoft.

The cynical lesson I've taken from watching languages evolve over time is that, if you wish to provoke mass-migration to your new language version, having either a missing feature people want, or an annoying misfeature people don't, is really important.

Java and PHP have had both in spades. Back in the Perl 4 days, people really wanted a real object system[1].

Python 2 (aside from a few things that annoy the crap out of me, but apparently not many others) is a pretty complete, fairly decent language. Perl5 is pretty complete and people are pretty happy with it. (Well, the ones who write in it.)

[1] Yes, I know, not here to argue the semantics. It is "sufficiently OO" for what people want to do with it.

You missed a whole subclass of people: those that don't like Python 3. Personally, I believe Python 3 took a wrong turn with the approach of converting everything to UTF-32 and making a big deal about the philosophical difference between "strings" and "bytes". I very much suspect the future will be something like how Julia deals with the issue, where UTF-8 can just be left as UTF-8.

Apparently you don't deal with much non-ASCII data if you don't like Python 3's string handling. It's literally the reason I first upgraded, because reading in simple text data on Python 2 can break completely. Seeing as literally no language on earth can be written entirely in ASCII characters (since even English text can contain words with accents, or old-fashioned things like the long s ſ), this is a pretty big issue.

Yes, it is possible to laboriously add code to check and encode/decode text that might sometimes be text and sometimes binary pretending to be text pretending to be binary. But why go through that pain when you could just use Python 3, where things are clear and generally actually work.

I fully agree. Python 3 makes life much, much easier with non-ASCII languages and prevents the annoying Unicode conversion bugs you see in most Python 2 code bases.

It's also very intuitive, which is great for teaching - I've always had a hard time explaining the difference between "strings" and "unicode" objects in Python 2, but the clear boundary makes it straight-forward in Python 3.

This is also a significant reason why people have problems when migrating Py2 code to Py3. The code is actually buggy and mixes strings with bytes. Once they run it in Py3, all of the bugs show up, because Py3 is stricter about it.

For the same reason, if you need to write code that should work on Py2 and Py3, it's much easier to write it for Py3 and then add backward compatibility through "from __future__ import ..." and the six module.

I've heard lots of good things about this project, by the way: http://python-future.org/

It's much less clunky than six. Instead of manually adding the compatibility layers to your code, it automates the process, which results in cleaner code. It can even translate (some) Python 2 modules on-the-fly.

Many projects switched to it from six.

Not always straightforward in Python 3. Try explaining to your class how to ignore case in string comparisons.

>Try explaining to your class how to ignore case in string comparisons

Okay: You can't. You'll need to convert both strings to the same case, and then compare those. In the majority of cases, you can do:

return string_a.lower() == string_b.lower()

But sometimes that won't work, because converting case isn't always easy. And that's a unicode issue, not a python issue. It's not even completely a unicode issue, just a "writing can be very weird" issue.

For example, "ß" (U+00DF) is a character that doesn't/shouldn't have an uppercase form. Unicode does support it as U+1E9E, but all the examples I know of will convert it to "SS" ([U+0053, U+0053]) instead.

Try this in a javascript console, for example.

  > "ß".toUpperCase().toLowerCase() === "ß".toLowerCase()
  > "ß".toUpperCase() === "SS".toUpperCase()
  > "ß".toLowerCase() === "SS".toLowerCase()
  > "ß".length
  > "ß".toLowerCase().length
  > "ß".toUpperCase().length
  > "ß".toUpperCase().toLowerCase()
Also, try ctrl+f'ing for "ss" on the Wiki page for ß [0] and see what it matches. Or ctrl+f this page for "ß" and it will match every "ss". I think it's pretty neat.

[0] https://en.wikipedia.org/wiki/%C3%9F

(edit: You might already know everything in this comment, but I started writing and it ended up being a cool exercise for myself, so I figured I'd submit).

> "writing can be very weird"

Absolutely! I think there's a lot of baggage in computing because it happened to emerge in places where language could mostly be written in simple characters with surprisingly simple rules. When exported internationally suddenly you deal with "oddities" that are really rather the norm in other languages.

In French, for example, it is customary NOT to put accents on capital letters, even though it's possible. But then again, sometimes people do it anyway - it's not a hard rule. So converting to lowercase may often require contextual understanding, because some words are written the same but with different accents, and would capitalize to the same string. Which word was meant would only be evident from the context of the sentence.

I suspect that if computing had rather developed somewhere where these things were commonplace we would have much better handling of it today.

Interesting that the "ss" search only works on Safari and Chrome, but not on Firefox (which happens to be the only browser that lets you easily toggle case-matching, AFAICT). I presume this means that the first two went with a toUpperCase() approach while FF went with toLowerCase(). I wonder which choice is better in the face of Unicode oddities like this one.

That depends on the type of strings you are comparing. If you include the full Unicode set, you're going to run into huge trouble in probably all languages, simply due to the nature of the Unicode character set rather than the specific language's handling of strings and Unicode code points.

But personally, if you're teaching Python to a class, whether it's 2.x or 3.x, you should probably stick to some variation of string_a.upper() == string_b.upper() to explain simple comparisons, for simplicity's sake.

This is exactly what str.casefold is for: https://docs.python.org/3/library/stdtypes.html#str.casefold
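A minimal illustration of the difference (str.casefold is available from Python 3.3; "ß" is the case where .lower() alone is not enough):

```python
# str.casefold applies full Unicode case folding, which .lower() does not:
# "ß" folds to "ss", so the two spellings of the same word compare equal.
a, b = "STRASSE", "straße"

print(a.lower() == b.lower())        # False: "strasse" != "straße"
print(a.casefold() == b.casefold())  # True:  both become "strasse"
```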

casefold is only part of the story and by itself does not supply a complete solution.

Could you expand on where it falls short? I'm interested, having not yet relied on it heavily.

I would suspect that you don't deal with much non-ASCII data if you like Python 3's string handling. In Python 2 you have control over encodings. You can work with unicode strings internally, but you receive and produce bytes, which is actually the only truth in the computer world. Everything else is a lie. It would be nice to pretend that there's only one encoding in the world and that this encoding is roughly equivalent to Unicode itself, but this isn't true. The world doesn't run on UTF-8. By abstracting encodings, Python 3 obscures what's actually happening.

Not at all, it forces you to unify on storing character data as Unicode code points. Eventually, if you go down far enough, you're reading the raw bytes you speak of from some sort of external system/file/DB. It's at that point that you define your encoding, and you never worry about it ever again. Except for when you touch those external systems/files again, again with full knowledge of what encoding is required.

Either way, if you're not dealing with differently encoded character data in Python 2.x as noted above, you're causing yourself some unnecessary headaches.
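The decode-at-the-boundary pattern described above can be sketched as follows (the input bytes are inlined here rather than read from a real file or socket, so they are purely illustrative):

```python
# Decode at the boundary, work in str, encode at the boundary.
raw = "héllo wörld".encode("utf-8")         # bytes, as they would arrive from outside

text = raw.decode("utf-8")                  # boundary: bytes -> str, exactly once
words = [w.upper() for w in text.split()]   # all internal work is on str

out = " ".join(words).encode("latin-1")     # boundary: str -> bytes for output
print(out)  # b'H\xc9LLO W\xd6RLD'
```

Note the output boundary can use a different encoding (latin-1 here) than the input one; internally the code never cares.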

Something tells me you did not experience the pre-Unicode world around the 90s. I did, and I didn't like it. I prefer it abstracted from me, thankyouverymuch.

But seriously, all that Python (and, as a matter of fact, the majority of languages) does is place a clear distinction between a byte and a character. This forces developers to keep in mind what they are dealing with and reduces the number of bugs.

The truth is that if you treat everything as bytes, chances are that your program has bugs. Actually, a good test is to convert your existing Py2 code to Py3: if you get encoding/decoding errors, then your code was buggy.
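That strictness is easy to demonstrate in a couple of lines (a minimal sketch):

```python
# Python 3 refuses to mix text and bytes implicitly; Python 2 would have
# coerced silently (sometimes raising UnicodeDecodeError only at runtime).
try:
    "abc" + b"def"
except TypeError as e:
    print("mixing str and bytes:", e)

# The fix is an explicit decode (or encode) at the point of mixing:
print("abc" + b"def".decode("ascii"))  # abcdef
```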

> But seriously, all that Python (and, as a matter of fact, the majority of languages) does is place a clear distinction between a byte and a character.

No, they make a clear distinction between bytes and code points. This is the wrong approach as code points are not characters. People are going to think that everything is fine now that they're using code points and the unicode type when they should be using a unicode library for text manipulation.

In Python 2, I couldn't just print the string I had just read from a file - such were the joys of the Windows console, which has a different encoding than the rest of the system (cp85x in the console, cp125x in the filesystem & GUI)...

Compared to that, Python 3 was like a gift from heaven.

Python 3 is okay-ish for developers who don't think about encodings unless forced to, but it actually makes it more difficult to do "the right thing":


Having to think about encoding all the time, without being forced to, brings in a lot of unwanted accidental complexity.

Did you know that there is an operating system where a simple loop that reads lines from a file and prints them to the console will end up with errors? (That being Windows with some European code pages, where the console is cp85x and the filesystem read and write is cp125x.)

The accidental complexity here is due to Windows, not Python. Python would rather generate an error than give you garbled output without your knowledge.

Most developers are not aware of that and write code that just bombs out for the user. The user would prefer at least garbled text over an exception, because he is used to that anyway in other apps. At least he has some result, instead of a stack trace.

Well, and that's before the user tries to redirect stdout to a file, in which case it will end up with an error too, because that is going to be 7-bit ASCII.

Anyway, Python 3 does the right thing by defining the encoding at the edges, and everything works right for the user out of the box.

Recent versions only convert to UTF-32 when a wide character is present (they also use 2-byte characters when that is sufficient):
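This per-string sizing (the PEP 393 "flexible string representation", CPython 3.3+) can be observed with sys.getsizeof; exact byte counts vary by build, so only the ordering is shown here:

```python
import sys

# Since PEP 393, CPython stores each string with the narrowest width that
# fits its widest code point: 1, 2, or 4 bytes per character.
ascii_s  = "a" * 100           # ASCII: 1 byte/char
bmp_s    = "\u20ac" * 100      # euro sign, BMP but > U+00FF: 2 bytes/char
astral_s = "\U0001F600" * 100  # emoji, non-BMP: 4 bytes/char

print(sys.getsizeof(ascii_s) < sys.getsizeof(bmp_s) < sys.getsizeof(astral_s))  # True
```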


Glancing at Julia, I'm not sure "invalid character index" is really a great thing to have to handle when dealing with text. It does seem that systems for dealing with text will eventually offer various APIs beyond just a somewhat dysfunctional character API (which I think is a fair description of str in Python 3).

> Recent versions only convert to UTF-32 when a wide character is present

It's non-BMP characters that trigger the conversion to 4 bytes per codepoint. Unfortunately Emoji are non-BMP and are super common.

If you're dealing with emoji -- and yes, they are incredibly common -- you'd better not use any Python before 3.3.

Non-BMP characters in 2.x-3.2 give you implementation-specific behavior. It can vary from one system to the next. Some string methods fail on "narrow builds" of Python (the ones that store their characters in UCS-2), while others return different results than they would on a "wide build".
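A quick sketch of the 3.3+ behavior (on a narrow 2.x/3.2 build the same literal would be stored as a two-element surrogate pair):

```python
# On Python 3.3+, a non-BMP character such as an emoji is always a single
# code point, regardless of platform or build.
s = "\U0001F600"    # 😀, U+1F600, outside the Basic Multilingual Plane

print(len(s))       # 1 (would be 2 on a narrow build)
print(hex(ord(s)))  # 0x1f600
print(s[0] == s)    # True: indexing no longer splits surrogate pairs
```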

Not an issue because I don't use the Unicode type. All my data comes in as UTF-8 on which I do a quick validation check. Then it's just handled as bytes all the way to and from my database.

Sure, that is an option that's always available. You don't have to worry if Python's string methods are wrong if you're opting not to use them.

>Recent versions only convert to UTF-32 when a wide character is present (they also use 2 byte characters when that is sufficient) ...

I happen to know this fact. Are you really asking that everyone who comments on a language acknowledge all the possible implementation details? I at no point complained about the memory required to hold strings in CPython, so I can't really understand why you would bring this up.

Because someone else reading "converting everything to UTF-32" might not be aware of the nuance. That's all.

I'm a casual user of Python 2, and I certainly didn't know this, so I appreciated the clarification.

Sorry for the snark but this has become a trigger for me. Every time anyone disagrees with the way Py3 does Unicode someone will immediately assume that that person is just confused about the implementation.

I agree. Everything is so much simpler when you use UTF-8 everywhere. Being effectively forced to use the ASCII/UCS2/UCS4 unicode type in Python3 is the only thing that has kept me from switching to it.

The only thing you are forced to think about is whether you're working with strings or with bytes. Unicode is handled transparently (but you do need to know the encoding when you're decoding bytes). How it's stored internally is none of your concern.

Yes, it is. Every conversion to and from the unicode type, and every expansion of the unicode type from 1 to 2 to 4 bytes, eats up processor time and cache/memory bandwidth, and puts more pressure on the garbage collector.

And I don't want to be forced to distinguish between strings and bytes. To me they are one and the same as I use UTF-8 for everything. Not once in my code do I need to encode/decode anything.

Nice, downvoted just for someone not agreeing with me. I'd rather know what you thought was wrong with what I wrote.

I've always wanted to know how things are done internally. It makes me a better programmer. What I said about what all this extra conversion does to performance is correct, especially when scaling up.

And my choice to use UTF-8 internally is a valid way to do things. I make sure that what comes in from the outside world is UTF-8 (either by rejecting or converting) and then don't have to worry about it from that point on. The bulk of my code deals with outbound data so no need for me to do any encoding as it's already UTF-8.

Most of us aren't so lucky. I really do wish we lived in a 100% UTF-8 world. But even if we did, I don't see how e.g. regular expressions would work.

The world seems to be headed to UTF-8 for everything. That's why having Python3 force us away from it doesn't make sense to me.

As for regular expressions, they work perfectly if your delimiters are in the ASCII range. If you're trying to match a single character then they don't work for Unicode even when using Python3's unicode type. The reason is that code points are not characters.

UTF-8 is just an encoding. Unicode is a more general term. Python 3 allows you to convert bytes to unicode from any encoding, including utf-8 (which is default). Using utf-8 as the internal unicode representation is flawed too - index and length operations are slow.

For regexes, \w matches any unicode word character (unless ASCII mode is on), just use it on strings. If you use it on bytes, then yes, you can only match bytes.

> Using utf-8 as the internal unicode representation is flawed too - index and length operations are slow.

And this perfectly illustrates what I posted in another comment elsewhere: thinking that doing operations on code points (indexing and length) is the proper thing to do. You need to use a Unicode library to work on graphemes and take grapheme clusters into account.

> For regexes, \w matches any unicode word character

From what I've read, \w just matches code points. I did find that the https://pypi.python.org/pypi/regex module supports \X which matches graphemes as per the unicode spec http://www.unicode.org/reports/tr29/.
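The code-point vs. grapheme distinction is easy to demonstrate with the stdlib alone, using a combining character (this deliberately does not use the third-party regex module's \X):

```python
import re
import unicodedata

# Two visually identical strings: precomposed U+00E9 vs 'e' + combining acute.
precomposed = "\u00e9"   # é as a single code point
combining   = "e\u0301"  # é as two code points forming one grapheme cluster

print(len(precomposed), len(combining))  # 1 2 -- len counts code points, not characters
print(precomposed == combining)          # False

# Normalizing to NFC composes the pair, so the strings compare equal:
nfc = unicodedata.normalize("NFC", combining)
print(nfc == precomposed)                # True

# A code-point regex sees the base letter and the combining mark as two
# separate matches -- this is why grapheme-aware matching needs a library.
print(re.findall(r".", combining))       # ['e', '\u0301']: two matches, one visible character
```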

I think we're going to see the switch occur quickly now. This year I've seen several large enterprise projects using Python 3. Also, the release of v3.5 brings useful features that will not be backported as 2.8.

It's unproven because nobody uses it and nobody uses it because it's unproven.

I don't have any compelling reason to move off of python 2.7, and for my use cases the positives are outweighed by the negatives (in particular getting used to needless changes).

Speaking of Node, it still requires Python 2.7, right? Or at least node-gyp does: https://github.com/nodejs/node-gyp

gyp itself still relies on Python 2 (and, to my knowledge, Google has mostly moved on from gyp, so they're unlikely to do anything large to it).

I've said it before, I'll say it again. When all of the major Linux platforms provide the same pre-built Python 3 packages for the packages I use, I'll make the migration. Until then, it's just easier to stay on Python 2.

Yes, I could build the packages/wheels myself. Yes, I could install a compiler on the target machine and let the binaries build from scratch. I won't, however, since it's a lot of time for remarkably little benefit. Python 2 is still well supported, and is a great language to program in.

Yes, Python 3 has cleaned up some of the rough edges of Python 2; but the improvements are still not significant enough to offset the cost of porting libraries and building packages.

but for node we expect everyone to install/build from source and not use the supplied packages, because they are ancient in comparison.

Which distribution doesn't ship python3.3?

Most ship python 3.3, but most of the libraries which the distro provides pre-compiled are not Python 3.


python-mysql, python-django, python-keyczar, python-pyhsm

> but for node we expect everyone to install/build from source an not use the supplied packages

This is not necessarily an improvement in the state of packaging. Verifying that unsigned source code has not been changed is a hard problem (which NPM has not yet solved). Installing compilation tools and extensive header files on a production machine opens up a new class of security vulnerabilities on that machine (whether a VM, container, or bare metal).

There's Anaconda too: https://www.continuum.io/why-anaconda

Many companies use it to run a recent Python stack on any Linux distro they like.

Anaconda was the easiest way for me to get Python 3 running on my Slackware instance. I am very pleased with it.

Don't you use a virtualenv? There are benefits beyond having up-to-date packages, and it's not especially complicated (I'm guessing it would take an hour or two to learn).

Yes, I know about virtualenv. I also know it still requires either having pre-compiled wheels, or a compiler on the frontend. Same problem, a bit better isolated from system packages.

Is it so hard to do an 'apt-get install build-essential'? It seems a bit silly to stay on only system packages when an 'apt-get install mysql-dev && pip install pymysql' gets you a fresh, up-to-date Python 3 (or 2) package in your virtualenv. Heck, if managing a bunch of virtualenvs is too much, install virtualenvwrapper and just use 'mkvirtualenv yourproject'.

I alternate between 2 and 3 depending on the system, but I am like you as well - me using 2 has nothing to do with philosophical differences and all of that, it has to do with my Linux distribution having more python2 packages than python3 packages.

> install a compiler on the target

One solution: build on your dev machine and produce distro packages instead.

"Yes, I could install a compiler on the target machine and let the binaries build from scratch."

Are we talking about the same Python?

Yes -- specifically packages with C shared libraries. The kind you get when installing bcrypt, python-mysql, or any number of libraries with optimized binaries.

When I went to learn Python I was immediately faced with a choice that would have profound effects on my future success: Python 2 or Python 3.

Both sides are extremely persuasive and thus I felt that no matter what choice I made it was the wrong one and would haunt me down the line.

So I just stuck with a different modern language that has wide acceptance and isn't battling with itself.

> Both sides are extremely persuasive and thus I felt that no matter what choice I made it was the wrong one and would haunt me down the line.

It really isn't that big or hard of a decision. For the vast majority of projects, the choice doesn't matter, so I'd default to Python 3 at this point. Really, the only reason to choose Python 2 for a new project is compatibility with libraries you need[0], but the list of Py3-incompatible libraries is constantly getting smaller and smaller.

That said, if you already have a lot of Python 2 code written, there probably isn't any point to upgrading--the benefits are marginal at best, and it can be a lot of work to migrate a large code base.

[0]: Here's a pretty good reference for Python 3 support in popular libraries: http://py3readiness.org/

The same thing happened to me in 2011, so I ended up learning Ruby. No regrets.

Which is quite ironic, because (at least from my experience) Ruby constantly breaks backward compatibility.

Not at all, I can easily migrate 1.9 Ruby projects to 2.2.3 w/o any problems.

I remember that code for Ruby 1.8 required changes in source to make it run on Ruby 1.9 and that was just a minor version change.

When was your experience? I could definitely agree with that pre 1.9.x. But I can think of barely a handful of examples in which minor things have broken during my time in Ruby from 1.9.3 to 2.1.5...even when booting up a year-old laptop with godknows what actual version of Ruby via rbenv I have on it...things just seem to work...And even problems that arise from specific scenarios with libraries, or when I simply must move an old system from 2.0 to 2.1.x or whatever...Upgrading the Ruby and then running bundler almost always just works.

In contrast, with Python, on a weekly basis I run into 2.x and 3.x problems...last week it was running the Google Cloud SDK, which is 2.x only. I say this as someone who has almost switched exclusively to Python (3.x) and enjoy it...but man I miss the days of Ruby's ease-of-use.

(but yeah, 1.8 to 1.9 was not a lot of fun, but thankfully, many years have passed since then)

Yes, it was 1.8 to 1.9 and frustrating because the change was just increase in minor version number.

It's not necessarily about breaking or maintaining backward compatibility that matters here. It's about how supporting two similar looking yet ultimately incompatible versions creates confusion for new users, driving them away.

This is the same issue facing Perl (although the Perl situation seems more dire) WRT Perl 5 vs Perl 6. For Python maybe the benefits of keeping Python 2 alive for large users offsets the harm of any confusion, I have small projects (all in Python 3) so it's a rare annoyance rather than a deal breaker.

> This is the same issue facing Perl (although the Perl situation seems more dire) WRT Perl 5 vs Perl 6.

The situation with Perl is neither more dire nor really the same issue as with Python. Perl 5 is not going away. Perl 6 maintainers are not the same people as Perl 5 Porters -- both will be maintained for the foreseeable future.

Second, Perl 6 is a significant change to Perl, with far more features and somewhat different syntax. People will want to change for the features. Running Perl 5 code from inside Perl 6 works by calling out to libperl, which runs a real version of Perl 5. Perl 5 will still be popular due to ubiquity, access to CPAN modules, and things like CPAN Testers.

So it's a choice for the developer, to develop in a new language that just happens to be Perl-flavored, or to go with any other language. Not really a crisis.

I agree. I think they should simply deprecate Py2 instead of dragging it out for 7 years + 4 more planned. People would start migrating if they knew Py2 would be EOL soon, and the question would no longer be Py2 vs. Py3 but whether you want your code to run in the future with all security bugs fixed.

So, what did you choose?


PHP is still the king of the web; php5-cli might not be as popular as Python.

Python seems more popular in academia, and it's essentially the beginner's language for the general public, replacing Java at colleges nowadays.

That being said, I use PHP myself; PHP7 looks promising.

Python 2 vs 3, AngularJS 1 vs 2: the self-conflicted "upgrade" cuts both ways.

PHP7 new features are very welcome and look great.

Speed improvement is great as well. Seeing 2-5x speed improvements. :)

My problem with this post is not its factual accuracy. It's factually accurate, downright impressive.

My problem is with its premise. That premise is that Python 2.7 people need to be somehow educated. To be shown some kind of light. It's the same premise that the project leaders have been going on about for 7 years. "Can't you see how much better this is?". Worse: "are you 2.7 people stupid, or what?".

It hasn't worked.

Why hasn't it worked? Not because 2.x people are slow or uninformed. They're very well informed. They assess the merits and downsides. They've had 7 years!

The reason is a combination of inertia, but also, many people actually think that in many ways, 3.x is a regression. If you don't need Unicode, Unicode is a huge pain in the xxxx. I was just having to mess with "b"s everywhere today on strings over msgpack to a non-pythonic endpoint. Lazy ranges (can also be) a huge pain in the bt. So I must move to 3.5. For what? Type annotations? Asyncio? I've been using (the highly impressive) Tornado for 3 years on 2.7.
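
A minimal sketch of that friction (the payload keys and field names here are invented for illustration, not taken from msgpack's API). In Python 3, anything headed for a byte-oriented endpoint needs explicit b"" literals or .encode() calls at every boundary:

```python
# Python 3 keeps str (text) and bytes strictly separate, so a payload for a
# byte-oriented protocol needs b"" literals everywhere (hypothetical fields):
payload = {b"command": b"ping", b"seq": 1}   # keys must be bytes, not str
assert b"command" in payload
assert "command" not in payload              # the str key is a different value

line = "status=ok"
raw = line.encode("utf-8")                   # str -> bytes: explicit in 3.x
assert raw == b"status=ok"
```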

Now google brings out the brand new TensorFlow. On 2.7. "But porting to 3 is their #1 priority" I'm told. Okay cool. You wait. I won't.

So the advances of 3 meet the regressions of 3. Even if you believe the net is positive in 3's favour, that net is not nearly enough to persuade people with actual lives outside of computer science, to port years of code, and more importantly, to port their brains to work with the new dispensation. Python is a language for people who want to get things done. 2.7 gets things done. Zero problems. For many people, that's the killer feature.

Every programming language, programming language update, and programming tool since the dawn of time claims to make it easier to produce high-quality software.


Seriously though, the point is well taken. The right tool for the right job will always apply, and I like to think most developers and programmers acknowledge that there are very few tools that should be used in every opportunity and by every person.

And oftentimes it's true. Not sure what point you're trying to make here. The linked article pretty convincingly makes the case for the listed improvements in Python 2 -> 3.

Just because "Use X, it's better" is a frequently made argument doesn't mean it's wrong, especially when X actually is better!

When saying "the main bottom-line difference between X and Y is that Y is better", you're implying that you'll say something interesting and nuanced and then... not.

I could say "the main difference between Ruby 1.8 and 1.9 is that programs run twice as fast in 1.9" (true, 1.8 is dog-slow.) And that's "Y is better", but it's also fairly specific.

"Python 3.0 makes it easier to produce high-quality software" is annoyingly vague. They seem to mean software that behaves more consistently by reducing the amount of default unpredictability in the language spec -- that's fair and specific. It's also not what the article said, sadly.

The summary of the argument may be vague, but that's ok, it's the summary; the specifics are why there's an accompanying article.

Sure. But that's still a lousy title. "Python 3 is better than Python 2", for instance, while true, is completely failing to explain why the article is interesting.

I'm not defending the title, and I don't think lots of discussion over said title is really that useful. I found the contents of the article interesting.

This article makes a cogent case for the assertion at least.

That's because people don't generally make shittier things. It happens, rarely, by accident. But by and large (for computer languages) newer == better.

That is only true for the languages that manage to gain some momentum. I could probably write a new language, given some time (I am not underestimating that task), but most likely it wouldn't bring anything new so nobody would use it. I'd say that language designers who manage to convince one person to spend the time to learn their languages for a pet project have already done a good job.

The difference being while a lot of people argue against upgrading to Python 3, I've never heard anybody argue against upgrading to PHP 5.

I don't know the version but I saw one program written in PHP break because there was a custom function called "goto" and a later version added that as a keyword. But often people don't really have a choice as to whether PHP is upgraded or not since it often runs on shared servers.

Yeah, but that's not a good excuse to avoid upgrading as you can simply rename the existing goto method.

Python 3 is so far ahead now as a language that it surprises me anyone would even consider Python 2 for a new project. Among other things, Python 3.4 and 3.5 have added support for coroutines that provide a whole new level of utility to the language. Meanwhile, Python 2.7 is a study in limitations to anyone who has followed up on Python 3.x developments.
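
For instance, the async/await coroutine syntax added in 3.5 looks like this (a minimal sketch; asyncio.run is the 3.7+ entry point, used here to keep the example short -- on 3.5 you'd drive a loop with run_until_complete instead):

```python
import asyncio

async def fetch(name, delay):
    # await suspends this coroutine and lets the event loop run others
    await asyncio.sleep(delay)
    return name + " done"

async def main():
    # both simulated requests run concurrently, not back to back
    return await asyncio.gather(fetch("a", 0.01), fetch("b", 0.01))

results = asyncio.run(main())
assert results == ["a done", "b done"]
```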

I still don't understand what happened here historically. Was there ever an upgrade path or a way to make this a smooth transition? I can't think of another language (surely there is one) that had such a hard-breaking version change that still persists 10 years later as a fracture across the community.

VB6 versus VB.NET is a worse historic pothole of a version change for arguably the same language.

Ultimately the Python 2 -> 3 transition has been as smooth as developers make it. There are certainly a lot of transition efforts put in place, such as the ability to import from __future__ some of the 3 behaviors and abilities and some automated migration tools. At this point, too, most of the major libraries today support 3 and there's few reasons not to build for 3 and maybe do a few tweaks to also support 2, if you truly have to. (A lot of good 3 code runs on 2 unmodified and just needing the Python equivalent of polyfill libraries and __future__ imports.)

Some of the fracture has been binary ABI compatibility and mini-versions of this struggle can certainly be seen in just about any language with an ecosystem of cross-platform native libraries to support (the brief Node/io.js being an interesting fracture more in that it was resolved so amicably so relatively quickly).

But ultimately it's tough to resolve the psychological and political hurdles: for whatever reason there seem to be "die hard" Python 2 fanatics that hate the direction Python 3 has moved and don't seem interested at all in compromise or transition. It certainly doesn't seem to be as strong as the hate divide between VB6 and VB.NET (which will likely forever be a weird blood feud until VBA and VB6 programmers die of old age), but still haters are gonna hate.

It hasn't been 10 years. There were known IO performance regressions at the time 3.0 was released, nobody involved in releasing it would have said to use it in production right away. 3.1 was released ~6 years ago:


(Even if the above is deemed BS, the release of 3.0 was 7 years ago)

Many of the 3.0 features can be switched on in 2.7 and there were some attempts to automate as much of the code change as possible (but people ended up preferring to write code that runs on both 2.7 and 3.x, but I suppose that then is not really evidence of a fracture).

I remember quite distinctly when the Python 3000 manifesto was released just under ten years ago, in April of 2006. That is when the line in the sand was drawn and the key mistakes were made that caused the "schism" in the community. What the Python community was screaming for was a fix for the GIL and the other performance issues in Python, not minor changes to Unicode handling that were maybe awkward but perfectly possible to get right in Python 2.6.

Google seriously spent tons of effort at the time trying to remove that GIL. They were one of the biggest proponents of Python at the time, and frankly they were part of what made it so successful. With their work on Unladen Swallow not making meaningful progress and the Python core developers concentrating on Python 3000, making it clear that they don't even care, it isn't a wonder that Google was hedging their bets by hiring Rob Pike to work on Go, which many now see as most directly being a competitor to Python, and which has sapped a lot of people who would potentially be Python users.

Meanwhile, the broken release of 3.0 needs to be blamed for part of the problem, not used as a reason to delay the historical date of fracture: a lot of things were broken in 3.0, including some basic things that Python got wrong in Unicode. The performance regressions were not quickly fixed, making people hesitant of the benefits of Python 3, and the Unicode issues (which were not fixed until years later, when they added the round-trip-safe Unicode conversions; but even this is a workaround, as filenames are defined to be strings of bytes by the filesystem) were ironic as the whole point of Python 3 was to somehow be better at dealing with Unicode :/.

And you seem to be forgetting that the real problem was not the features added in 3.0 but the features taken away at the same time that made it impossible to write reasonable code that ran on both 2.7 and 3.0 even for extremely simple cases like "catches an exception". They only just a year or so ago have started to crack their party line of "shut up and upgrade to 3: it is better and you are wrong" and add some backwards compatibility features to 3.x, such as supporting the u prefix (which was the absolute dumbest thing they could have removed).

The transition has been pretty smooth, but quite slow. Nearly all the major libraries are updated - IMHO if a library doesn't support Python 3 these days it's probably unmaintained and should be avoided. There's plenty of support for transitioning smoothly, especially with a handful of changes in 3.2 and 3.3 to make this even easier.

I think there are however a small number of very vocal people who didn't like the change, and their posts tend to linger...

Perl 6 started 15 years ago.

Perl was the workhorse of the early web, when I was a young programmer interested in the web it seemed like the most worthwhile language I could learn. Perl isn't dead but it's lost most of its appeal, and a large part of that is due to the failed development on Perl 6.

Yeah, but that's not really the same as what has happened in the Python community where Python 3 has been available for so long but there's still resistance to making the shift.

Perl 6 still isn't available.

Perl6 is very much available, right now: http://perl6.org/

Incomplete implementations of Perl 6 that are identified as pre-final releases are available right now (and have been for some time), but that's different from a general release. A "complete" release is planned for this Christmas; it's a very different situation than Python 3, which has had stable releases for several years, including several feature releases after the first stable release.

I don't think it's fair to characterize Rakudo Perl 6 as incomplete, at least on the MoarVM backend. It is very nearly complete, with the bulk of the semantics fixed in years past. It is currently passing 122627 tests[1]. If you write code today, it will probably not break by December.

I guess it still counts as unstable software, but I used Gmail beta for half a decade and that turned out fine. At this point, it's a matter of nomenclature, because I've certainly used buggier "final release" software.

[1]: https://github.com/coke/perl6-roast-data/commit/2aeeb87edc

> I don't think it's fair to characterize Rakudo Perl 6 as incomplete, at least on the MoarVM backend.

Look, I think Rakudo Perl 6 is great, and its quite usable for many things. OTOH, I think it is quite fair to refer to it as "incomplete", because:

> It is very nearly complete

"very nearly complete" is a kind of "incomplete".

You can't really make a fair comparison between Perl 6 uptake (before the Perl project itself has even called Perl 6 "ready") and Python 3 uptake 7 years after the PSF said Python 3 was ready.

All of the professional Python devs I personally know laugh at people who voluntarily stick with Python 2.

There is a library (and script) called 2to3 that automates code translation from Python 2 to 3. I wouldn't rely on it for my own projects, but I have seen a number of authors use it successfully (e.g. write their app in Python 2 and tell users it is compatible with automatic translation).
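
For illustration, these are the kinds of mechanical rewrites 2to3 performs (the Python 2 forms are shown in comments; the code itself is the translated Python 3 output you'd expect from the print, has_key, xrange, and dict fixers):

```python
# print "total:", n            ->  print becomes a function call
n = 3
print("total:", n)

# d.has_key("a")               ->  membership test with "in"
d = {"a": 1}
assert "a" in d

# xrange(5)                    ->  range(5)
assert list(range(5)) == [0, 1, 2, 3, 4]

# for k, v in d.iteritems():   ->  d.items()
assert dict(d.items()) == {"a": 1}
```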

If you're on 2.7.x, you can certainly import quite a few of the newer features. There are also some resources for writing 2/3 compatible code, such as the six library and http://python-future.org/. It is no problem for fairly trivial applications, but I don't envy anyone who has to ensure 2/3 compat for a large project.
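
As a rough sketch of the pattern those libraries implement, here is a hand-rolled version of the kind of shim six and python-future provide (ensure_text here is illustrative, not necessarily their exact API):

```python
import sys

PY2 = sys.version_info[0] == 2

# a hand-rolled version of the kind of compatibility alias six defines:
if PY2:
    text_type = unicode  # noqa: F821 -- only defined on Python 2
else:
    text_type = str

def ensure_text(value, encoding="utf-8"):
    """Return text under both Python 2 and 3, decoding bytes if necessary."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value

assert ensure_text(b"abc") == "abc"
assert isinstance(ensure_text("abc"), text_type)
```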

Floobits was forced to write python 6 ... or 2/3 ... for our Sublime Plugin. The python3 experience has been awful enough to completely dissuade me from using python for any other project. Completely breaking backwards compatibility for small features no one cares about was a complete disaster as evidenced by half? of the community still using 2.7.

(sorry, April 2015, not 2014)

On a tangential note, no pun intended, I am a little confused. Why would you ever want compare in the manner of "TrickyAngle(6) < TrickyAngle(5)"? I can understand something like "trickyAngle_A.degrees < trickAngle_b.degrees" but when comparing instances of a class like that, when would one ever be...greater than the other?

Often, an instance of a class represents a mathematical entity (vector, matrix, imaginary number, mathematical set, et al) where less than might be a fundamentally reasonable operation, especially in some specific context.

His example is using it as an angle, where comparing two of them is conceptually pretty sensible.

If you pretend it's a structure, sure, you'll need to extract primitive data types out of it before doing anything with it. But if you want it to behave as a higher-level abstraction, reimplementing less-than (say, for comparing the vector magnitude) at every call site isn't great.

(This is completely beside the point.)

> comparing two of them is conceptually pretty sensible.

Well, there are a bunch of sensible ways to do it, depending what the angle represents. You might not want to privilege any of them over the others directly in the class.

Is it a point on the unit circle? It doesn't make much sense to compare (sqrt(2), sqrt(2)) to (-1, 0).

Is it an angular distance to travel from point A to point B? Then you want 355 and -5 to both be less than 10.

Is it an amount that a screw has to be rotated? Then you can't take mods, so 720 > 360 > 0.

Do you have a robot that can only turn anticlockwise? Then maybe 5 < -10 < 355. This is the comparison implied by the implementation in TFA (self.degrees = degrees % 360), but it's pretty obscure.
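
For what it's worth, a hypothetical Angle class along the lines of TFA's (normalizing with degrees % 360) makes that last, "obscure" ordering explicit:

```python
from functools import total_ordering

@total_ordering
class Angle(object):
    # normalize like TFA's example: self.degrees = degrees % 360
    def __init__(self, degrees):
        self.degrees = degrees % 360

    def __eq__(self, other):
        return self.degrees == other.degrees

    def __lt__(self, other):
        # total_ordering derives <=, >, >= from __eq__ and __lt__
        return self.degrees < other.degrees

assert Angle(365) == Angle(5)    # wrap-around
assert Angle(5) < Angle(350)
assert Angle(-5) > Angle(10)     # -5 % 360 == 355: the "obscure" ordering
```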

1. Encapsulation (arguably the whole purpose of creating such a class in the first place).

2. Would you think it's reasonable for a class to implement equality ==, so it can tell you if it's equal to another instance of the same class? Then why not ordering?

3. Take Python's set type as a useful example. A < B tells you whether A is a subset of B, A | B returns the union, etc. All very handy.
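
Concretely, this is standard behavior of Python's built-in set type, easy to verify in a REPL:

```python
a = {1, 2}
b = {1, 2, 3}
assert a < b                    # proper subset
assert a <= a and not (a < a)   # <= allows equality, < does not
assert (a | b) == {1, 2, 3}     # union
assert (a & b) == {1, 2}        # intersection
assert (b - a) == {3}           # difference
```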

I disagree with 3, a subset is not "less than" the superset. Sets are equatable, but they're not comparable. A named function or a different operator would be preferable.

The set of sets is, itself, a strictly partially ordered set by the "is a proper subset of" relation (and a non-strict partially ordered set by the "is a subset of" relation); if Python's < and > operators are viewed as generic strict ordering operators (and those symbols are, outside of Python, conventionally used for both strict total orders like those on reals and strict partial orders), then their use with sets for the strict partial order defined by the "is a proper subset of" relation is quite natural.

Given that even a relatively inexperienced developer can easily port 2,500+ lines of Python 2 to Python 3 per day, there's really very little reason for anyone to still be on Python 2 regardless of the benefits or lack thereof of Python 3. This is especially true if you're not trying to take advantage of all the performance benefits of Python 3 right away, and are content just to wrap iterators in list() or whatever just to get the functionality working the same without much effort.

I picked up Python about 1 year ago as my hobby language, and following the common advice, I started with 3. But I quickly encountered two problems:

1) some libraries I actually wanted to use were still 2

2) When I tried to google a solution for something, more than once I stumbled across code which just didn't work. Then it occurred to me that this was because the example was Python 2, while I was on 3.

To sum it up, I later switched to 2, and everything was fine since then. I also asked my friend, who is a professional python programmer, and to this day his company still uses 2 and don't plan to switch.

> 1) some libraries I actually wanted to use were still 2

For anyone still in this situation, re-evaluate them - that issue is mostly solved.

Python 2 is deprecated. It's no longer supported after 2020. There's no point in using it for new projects, unless there's a very good reason. You're missing out on lots of incredibly cool features like asyncio and static type checking.

There's even an increasing number of Python 3-only libraries.

Can you elaborate on static type checking? Is this the type hinting a.k.a type annotations outlined in PEP 0484 or something else? What is this even useful for? Better documentation or integration with an IDE like PyCharm?

I think he's talking about PEP 0484. As I understand it, the annotations can be written inline or in separate stub files, and it's a gradual typing system. The inspiration came from mypy http://mypy-lang.org/ and the point is mainly for IDEs and convenience in development currently, though they could be used for optimizations in implementations other than the reference interpreter.
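
A minimal sketch of the PEP 484 style (function and names invented for illustration):

```python
def greet(name: str, times: int = 1) -> str:
    # annotations are inert at runtime; a checker like mypy enforces them
    return ", ".join(["Hello " + name] * times)

assert greet("world") == "Hello world"
assert greet("world", 2) == "Hello world, Hello world"
# mypy would reject greet(42) at check time; plain CPython only fails
# at runtime, when "Hello " + 42 raises TypeError
```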

This seems like your mental model assumes relatively small codebases. On a codebase with 1m LOC, your estimation of 2500 lines/day still takes ~400 person-days. That's a big investment to put into something that doesn't present clear benefits. Moreover, converting a huge codebase piecemeal is something of a challenge; I don't know if it's even possible to have python 2 importing python 3 (or vice versa) but assuming your large codebase is relatively interdependent, reorganizing the codebase such that you can split it into modules which can be migrated independently could also be a big undertaking. I would also anticipate that there would be a long tail of fixing bugs caused by incorrect migration (are we code reviewing those 2500 lines/day?) as well as lower productivity as developers get used to the new language.

This is why you frequently hear that large, established codebases that control their environment aren't updating to Python 3. It's pretty easy to move a few thousand lines over but complexity scales non-linearly with the size of the codebase.

It's libraries rather than porting.

Funnily, though, I just thought 'yeah there's that library I use, they never ported that', for one of my codebases. Just looked it up and they now have ported it. So I guess it's time for me to do the same. I wonder what proportion of (actually used) libs are Py 2 only still.

At this point very few 'actually used' (meaning, used by more than a handful of organizations at most) are Py2 only. There are a few still in the process of porting, and a few that will never be ported due to being replaced (but are still somewhat popular for one reason or another.)

For those who don't know: http://python3wos.appspot.com/

People are still releasing brand-new Py2-only libraries to this very day. This is especially true in science, math, nlp, ml, etc.

Yes, and I kinda think it is silly, but whatever. Care to share any examples of recent releases? I know Google's TensorFlow was/is Py2 only, but a Py3 release is imminent. Google has an unfortunate habit of using Py2... I can understand they have their reasons... it is a bit irritating though :)

(Edit: Not that I care what they use internally... but sometimes they release a really cool open source library and it is Py2 only... arr! But at least they are sharing, so I can't really be too upset :)

snap-python. Development started in 2013, and yet they refuse to support Python 3: https://github.com/snap-stanford/snap-python/issues/52

So you'll have to convince them. Maybe they did not realize that SWIG supports Python 3 nowadays?

There's really no good reason not to support Python 3, and projects which refuse will at some point be forked.

Nah, just do the smart thing and stick with python 2.7.

Such as....

The entire scipy stack, nltk, tensorflow (partial 3.0 support, with plans to support both), caffe, theano, etc.

There are few to no libraries that don't support, or don't plan to soon support, both.

I thought you were listing examples of libraries that didn't support Py3... so I was gathering links to refute your claims... then I realized you were posting examples in support of my claim that Py3 support is the norm :)


Python 3 support is literally issue #1: https://github.com/tensorflow/tensorflow/issues/1

I suspect this is just an issue of they didn't want to block the release on not having Python 3 support ready at launch.

The weird thing as a non-python user is why did they even start with python 2 as first class in mind?

Probably the same reason (or at least somewhat related to) that Google App Engine only supports python 2.7. (https://cloud.google.com/appengine/docs/python/)

Python 2 is used internally at Google, and TensorFlow is, I believe, an open sourcing and nicer packaging of something that has already existed internally for awhile.

Big companies are moving slow, even Google.

because the low risk, high-audience strategy, when you're launching a new product, is to follow the path of least resistance, rather than being dogmatic. Maximum audience.

It's easy to port to python 3 but I think the holdup with some libraries is that they want to support both 2 and 3 with the same library so that devs who cannot upgrade the python on their system don't get left behind. Supporting both at the same time is possible but takes longer in most cases.

I just read the http://lucumr.pocoo.org/2014/5/12/everything-about-unicode/ article referenced elsewhere in this discussion and it illustrates why there is some stuff that is not so easy to port.

Most servers are probably still running on Python 2.6 and 2.7.

Between virtualenvs, containers, or just plain ol' `make altinstall`, there are enough ways to run a python application that _doesn't_ use the system python interpreter that this shouldn't be an issue. Not to mention that most linux distros have packages set up in a way to allow python 2 and python 3 to be installed simultaneously, without interfering with system applications that rely on the version of python that is shipped with the OS.
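
A minimal sketch of that isolation with the stdlib venv module (paths are illustrative; --without-pip just keeps the example offline):

```shell
# create an isolated environment that ignores the system site-packages
python3 -m venv --without-pip /tmp/demo-env

# this interpreter resolves packages from the venv, not the system prefix
/tmp/demo-env/bin/python -c 'import sys; print(sys.prefix)'
```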

The majority of people don't care about the new shiny; they care about things that broke or changed. Probably 90% of them started complaining because the print statement became a function :).

Python 3's insistence on Unicode hurts my brain. I had the 'pleasure' of writing a Python 3 script manipulating data produced by Python 2 (a custom pickle). I ended up reimplementing the whole Python 2 pickle library by hand, because apparently no one imagined a use case involving reading raw bytes into a buffer.

The problem is really that Python 3 isn't new and shiny. It's an attempt to improve some downsides of Python 2 and some parts did actually get better but many parts didn't although they easily could have been improved, some parts even got worse.

Python 3 provides no real benefit to an experienced Python developer. So considering all the things you could possibly do, switching to Python 3 is not switchting to a new and shiny thing, it's just a bad decision.

If you really want to switch to something new and shiny, there are now many more interesting language options. Modern Javascript has copied so much from Python, you can feel very much at home there. If you care about concurrency, you have Go. If you care about concurrency and care about using a language that's well designed or just want performance, there is Rust.

The asyncio library is really, really nice. You can easily issue async requests now. Also, a bunch of useful utilities like shutil were improved.

The idea behind asyncio is nice. The API is mostly nice. The new async/await syntax that has been added mostly with asyncio is great. The documentation however is a horrible mess and the implementation could almost certainly be a lot better. Now it's in the standard library though...

David Beazley recently held a great talk on this very topic, in which he goes into these issues in more detail: https://www.youtube.com/watch?v=lYe8W04ERnY

Complaints about a language translate automatically to low-quality software the same way micro-optimizations automatically translate to fast software.

In the real world, software depends significantly more on the properties of a developer or a team. Building software is about people more than it's about tools.

Dealing with Unicode in Python 2.x is a _nightmare_. So if you need to go beyond ASCII, Python 3 is a huge improvement. On the downside, if you want to use tools like App Engine, you're still stuck with 2.7 for now, but hopefully that will change.

Python 2's handling of Unicode basically isn't. That's as much the fault of third party libraries as the core language, but it is a real and constant time suck.
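
A small illustration of what Python 3 changed: bytes and text are separate types that never mix implicitly, so errors surface immediately rather than only when non-ASCII data shows up:

```python
s = "café"                       # text: a sequence of Unicode code points
b = s.encode("utf-8")            # bytes: what actually goes over the wire
assert b == b"caf\xc3\xa9"       # é is the two bytes C3 A9 in UTF-8
assert b.decode("utf-8") == s

# Python 2 coerced str/unicode silently via ASCII and failed only on
# non-ASCII input; Python 3 makes mixing them an immediate TypeError:
raised = False
try:
    s + b
except TypeError:
    raised = True
assert raised
```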

I'm sure I hit Unicode problems in Python 3 a few years ago as well, but if current versions are better that would be a sufficient reason for switching.

I've had no issues whatsoever dealing with unicode in Python2. Just wondering what you found to be a nightmare?

This is excellent. The first one was totally unknown to me. I wish there were a comprehensive list of the changes between 2.7 and 3.x with code; the release notes are okay but are split across versions and hard to grasp at once.

It's really hard to keep up with language changes and implementation changes.

EDIT: I am so surprised at the number of downvotes in this HN thread. I bet there are some haters out there today :-)

The first one was a headline change in the Python 3000 release notes:


The other items in the article are also mentioned there, but not as headline items. The great majority of behavioral changes from 2.7 to 3.x should be listed in that one document, there was a freeze on language changes for several of the 3.x releases.

That level of code demonstration from the author is more comprehensive than the What's New documentation.

A lot of the PEPs are way too verbose for a quick bite. PEPs are great for a weekend read, but for a quick summary, OP does a better job.

I wasn't saying that you should be satisfied with just the release notes, I was pointing out that you were overstating the difficulty in finding an overview of the changes that went into Python 3.

There are lots of other changes since then, but most of them are additions (that don't directly impact old code) or to libraries or whatever.

Does this serve well enough as a comprehensive diff?


(Actually, maybe not, because it doesn't have code examples.)

The whole 2 vs. 3 debate would simply have not existed if the Python core devs would have simply refused to backport tweaks and fixes from 3 to 2 and would have forcefully deprecated 2.x immediately after releasing 3.0, even if it was inferior to 2.x at first release.

Guido should have used his power to force the move forward down people's throats, because that's the responsibility of a BDFL. The D is there for a reason, and it's exactly in cases like this that the power of the Dictator role should have been used.

Communities rarely move forward on their own if left to decide democratically: they need to be forced or tricked into moving forward by individuals with vision!

...heck, I'm even starting to like the nodejs folks for their "shove the future down people's throats" attitude :)

I think it would have been more prudent to have implemented an iterative approach:

Python 2.8: - boring easy to fix changes like division, print function, exception handling syntax, removal of iteritems() / etc

Python 2.9 - library reorganization (urllib2 & urllib, etc)

Python 3.0 - unicode stuff

Most organizations that have been lingering at 2.7 for 5+yrs would have long since migrated to 2.9, and would thus be half way to 3.x in terms of backwards compatibility.

Now instead, everyone has to introduce the better part of a decade's worth of backwards compatibility changes at once, or be left in the dust. Hardly a fair choice.

Some of your Python 2.8 exists as from __future__ imports added to various 2.x releases.


(in versions where the new behavior is the default, the import does nothing and is not an error)
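For illustration, here's roughly what that opt-in looks like in a Python 2 file (under Python 3 the same imports are accepted and simply do nothing):

```python
# Opting into Python 3 semantics from Python 2, one file at a time.
# Under Python 3 these imports are legal no-ops.
from __future__ import print_function, division, absolute_import, unicode_literals

quotient = 7 / 2   # true division: 3.5 (plain Py2 would give 3)
floored = 7 // 2   # floor division: 3, same in both versions
print("7 / 2 =", quotient)
```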

I agree. Spreading it over multiple releases would have been better.

I'm sorry, but given how much high quality software is written in Python 2, examples of high quality software written more easily in Python 3 would do a better job proving the central thesis of the article than pointing out design defects of Python 2 that are fixed in Python 3. Because at the end of the day, there are still plenty of design defects in Python 3 (or any other language) that a coder might trip over. I mean, who is to say that the new yield functionality of Python 3 won't lead to hard-to-understand spaghetti code? Or that the new way of handling Unicode isn't worse than the old way? Listing concrete projects that benefitted from being written in Python 3 rather than 2 would be much more convincing.

> I'm sorry, but given how much high quality software is written in Python 2

The reason for that isn't "Python 2 is intrinsically better than Python 3," though. It's the sheer inertia of a language whose last minor version (2.7) was released five years ago and the persistent meme that Python 3 is buggy and unusable, despite clear advantages as noted in the article and the added compatibility of popular Python 2 packages in recent years.

As with hardware, Python 2 will only die when the top packages completely drop support for it, and it wouldn't surprise me if that happens sooner than later. (Django, for example, will be dropping Python 2 support in 2017: https://www.djangoproject.com/weblog/2015/jun/25/roadmap/ )

Why refer to these as two separate languages? Right now, the vast majority of Python's most popular libraries and packages support both 2.7 and 3.4+ with a single code base. We are a Python shop, and we are doing the same with all our new projects as well. There's a little bit of extra awareness you have to have to do certain things (metaclasses, for instance), but the overhead is no more onerous than, say, following well-defined coding conventions.

The latest Python 2 specification is five years old, but new implementations are still being released (PyPy 4: October 29, 2015). Similarly, the latest Common Lisp specification is from the 80s, but there are still several implementations compatible with modern OS's. As long as the language is useful for somebody, it will be kept alive.

As I understood it, he was saying the better way to prove Python 3 is good for creating high quality software might be to look at the high quality software already written in Python 2, rather than pointing out flaws in it - since all programming languages will have flaws (including Python 3).

It doesn't help that MacBooks come with 2.7 installed. For someone starting out (Python is often used as a teaching language in CS101-esque classes), figuring out how to get one's machine to use 3 might seem really difficult/impossible.

Almost everyone will say "Use homebrew". I don't think it will be that difficult.

Yeah brew install python3 seems to work. You then just type python to run 2 and python3 to run 3. The compsci course I did was in Python 2.

> Python 3 makes it easier to develop high quality software

That is, unless said software needs to include a popular library that has not been ported.

There aren't many unported popular libraries left. Actually, the only one I remember off-hand is Twisted, and they're working on it.

Popular libraries aren't the problem. The big problem is unpopular and obscure libraries.

Circa 2010 is calling and wants its FUD back.

Library compatibility is definitely an issue. You still have some very influential libraries that are not Python 3.x compatible. But an oft underestimated issue is the number of Py3 improvements that have been backported to Py2. For example absolute imports. That has made the case for moving to Py3 harder.

Argh, I don't like this argument.

There's really barely anything that is python2 specific; most of the things that weren't ported are no longer being maintained.

I think the main reason you should start moving to python3 is that Python 2 will be discontinued. Python 2.6 was EOLed in 2013; with 2.7 they've been more generous and you have 4 more years, though originally it was supposed to be 2015.

There's no good reason to stick with 2.7, if you have some legacy apps there's nothing preventing you from having Py2 and Py3 installed side by side.

If you need to write libraries that share code, it's much easier to write the code in Py3 and then make it backwards compatible with use of from __future__ ... imports and six module.
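For the curious, here's a hand-rolled sketch of the kind of shims six provides (names are illustrative; in real code you'd just depend on six):

```python
# A hand-rolled sketch of the kind of compatibility shims six provides.
import sys

PY2 = sys.version_info[0] == 2

if PY2:
    string_types = (str, unicode)  # noqa: F821 -- 'unicode' exists only on Py2
    def iteritems(d):
        return d.iteritems()
else:
    string_types = (str,)
    def iteritems(d):
        return iter(d.items())

# The same calling code then works unchanged on both versions:
assert isinstance("hello", string_types)
assert dict(iteritems({"a": 1})) == {"a": 1}
```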

I feel like people who are still complaining about the difficulties have not actually tried Python 3. Py3 is a much more enjoyable language to write in.

TensorFlow is just the latest library that is Py2 only. App Engine is Py2 only. The bigger issue is the two biggest weaknesses of Py2 that were not addressed in Py3 (in CPython at least): the GIL and the lack of a JIT. The JIT is addressed more or less by PyPy, but it is Py2.7-compatible, not Py3-compatible. Addressing these in the standard CPython implementation would make for a very compelling case to move to Py3. Otherwise, it does not buy you much.

TensorFlow and App Engine are both released by Google which prefers Py2 because of the FUD, although regarding TF hopefully it won't be for too long. The first issue created on GitHub[1] is basically about adding Py3 support.

Regarding the GIL and JIT: those problems are mostly performance related and not easy. As you mentioned, the GIL is present in both versions, so it's not a good argument regarding py2 vs py3.

As for a JIT, you get it with PyPy, which by itself is not 100% compatible with CPython (for example, as of now the majority of python modules written in C simply won't work).

They do have a PyPy compatible with Python 3, although it is not as stable; but since Py3 is picking up, hopefully it will become more popular.

For me, it's the other way around: I have not used PyPy primarily because it's compatible with 2.7, and I don't use Python for performance. But if I could get a performant Py3 implementation, why wouldn't I want to use it?

[1] https://github.com/tensorflow/tensorflow/issues/1

Python 3 is great, but that doesn't matter.

The ecosystem is rotting for 2 reasons.

First, there's no easy way for library developers to maintain a codebase that supports both 2/3 development in parallel. Yes, it's easy to use a 2to3/3to2 tool, but unlike transpilers in Javascript the output code may still require fine-tuning after the transform.

This could be solved by adding preprocessor support to python (i.e. for conditional loading of code), but GvR is adamantly against this approach due to the way many devs write unmaintainable code in C via the preprocessor.

I actually published an OSS project called pypreprocessor that was supposed to solve this issue. It works to add preprocessor conditionals to code, but it won't work with both 2/3 code in the same file because there's no hook to inline code before the lexer runs and throws syntax errors.

Second, package management in python still sucks. Do you use distutils, distutils2, setup.py, etc? There's no single standardized approach that everybody agrees on and each tool follows a unique approach or needs to be supported because of legacy code.

Unlike the Ruby and Javascript ecosystems, where package management and publishing are relatively straightforward/easy, it's an absolute PITA to publish packages to PyPI.

Back in 2011, the answer was: just wait 5 years for everybody to adapt their libraries to support Python 3. 5 years later, nothing has changed. The existing libraries don't want to piss off users by dropping Python 2 support, and maintaining two separate branches of the code in parallel sucks. So, since most everybody that uses python in production still relies on Python 2, Python 2 is still the default.

The library ecosystem makes the platform. The tooling sucks so library devs don't want to make the switch.

Until there is a major cultural shift, python devs have 2 options: use a better version of the language without the support of a mature library ecosystem, or use an outdated version of the language with a rich ecosystem.

And how many subtle bugs will be introduced while porting software from Python 2 to Python 3?

We ported a major production site from python 2 to 3 a year ago. Exactly one bug has come up - something to do with unicode encoding on a relatively minor part of the site. Thanks to 2to3, the concerns for bugs coming up in porting are way overblown.

The only insidious issues I've encountered with porting apps have been related to Unicode as well. A lot of old Python 2 code is (what appears to me) very hackish compared to the elegance that is Python 3.

I've seen quite a bit when porting a statistical library that used Python 2's default floor/int division behavior for quite a few operations.

Sometimes, though, the 2 version had bugs because of this behavior. So, the bugs were both subtle and bi-directional. I definitely fault Python 2 for allowing that practice, and am glad we're now using Python 3.
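For anyone who hasn't hit it, the change is small enough to show in full. These assertions hold under Python 3, and the first one is exactly where Python 2 code silently differed:

```python
# Division semantics in Python 3; the first assertion is the behavior
# that silently differed under Python 2 (7 / 2 == 3 there).
assert 7 / 2 == 3.5    # true division always returns a float
assert 7 // 2 == 3     # explicit floor division
assert -7 // 2 == -4   # floors toward negative infinity, not toward zero
```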

I think it has more to do with that people using Python3 want to write good looking code, and the people still using Python2 are more interested in getting something running quickly and therefore tend to produce hackish code.

The traps and tricks are well known. With due diligence -- few if any.

The traps are not always well known (to me).

For example, I just discovered a few days ago while porting a Python2 library that byte concatenation is an O(N^2) procedure in Python3 while it is an O(N) procedure in Python2.

So you have to use bytearrays and really rewrite all the code that deals with bytes. Basically, even if something works, you still have to performance profile.

Do you have a reference that explains this? I'd like to learn more.

I don't have a reference, but bytearray concatenation with some more bytes is amortized O(1) w.r.t. original bytearray size, but for bytes it is O(n) where n is the length of the byte string, because bytes are immutable and they are copied to a new byte string on each concatenation.

In python2 strings/bytes are also immutable but concatenation is O(1) w.r.t. original string. Not sure about internal implementation differences.

so basically whenever you see code like:

    some_bstr = b''
    for something in get_something():
        some_bstr += something
And if you have to do it this way, make sure that some_bstr is a bytearray or make it a list that you later join.
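Concretely, the two fixes look something like this (helper names are just illustrative):

```python
# Two ways to avoid quadratic byte concatenation in Python 3.

def build_with_bytearray(chunks):
    buf = bytearray()
    for chunk in chunks:
        buf += chunk         # bytearray append is amortized O(1)
    return bytes(buf)

def build_with_join(chunks):
    return b''.join(chunks)  # single allocation at the end

chunks = [b'abc', b'def', b'ghi']
assert build_with_bytearray(chunks) == b'abcdefghi'
assert build_with_join(chunks) == b'abcdefghi'
```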

For me it was these two...

1. Calling the next() builtin instead of the .next() method on an iterator or generator.

2. .items() on a map in 3 is like .iteritems() in 2. Which means that nasty shortcuts like map.keys()[0] no longer work.

Oh yeah, and print() is a function :)
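For reference, the Python 3 versions of those patterns (a small sketch, with illustrative names):

```python
# 1. next() is a builtin; generators no longer have a .next() method.
gen = (n * n for n in range(3))
assert next(gen) == 0

# 2. dict.items()/keys() return lazy views, so keys()[0] raises TypeError.
d = {"a": 1}
first_key = next(iter(d))            # grab an arbitrary key without a list
assert first_key == "a"
assert list(d.items()) == [("a", 1)]  # materialize the view when needed

# 3. print is a function.
print("still works, just with parentheses")
```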

I think a bigger issue with keys() is that now the order is non-deterministic. So people may have relied on it not being in any particular order but being the same order. These are the kinds of bugs that are difficult to detect.

I don't understand this comment. The hash method changed under different releases of Python 2.x, enough that we had to change our doctests to ensure a consistent order. So as far as I'm aware, the only guarantee is that calling .keys() and calling .values() will give you the terms in the same order, so long as there hasn't been a modification in the middle.

Ok, try running this in a program like so:

    d = {chr(i): i for i in range(65, 91)}
    print(d)
Do it with python2 and python3. You'll see that the output in python3 changes every time.

If someone was relying on consistent ordering, they're going to have a bug.

Python2's ordering is deterministic[0]


Okay, we're talking about the same thing. As the documentation points out:

> If items(), keys(), values(), iteritems(), iterkeys(), and itervalues() are called with no intervening modifications to the dictionary, the lists will directly correspond.

If you restart Python, you have broken the correspondence.

You'll note that either the documentation is incomplete or your interpretation is incorrect, as the most recent versions of 2.6 and 2.7 will use a randomized hash table when the -R flag is enabled:

    % ~/Python-2.7.10/python.exe -R x.py
    {'M': 77, 'L': 76, 'O': 79, 'N': 78, 'I': 73, 'H': 72,
     'K': 75, 'J': 74, 'E': 69, 'D': 68, 'G': 71, 'F': 70,
     'A': 65, 'C': 67, 'B': 66, 'Y': 89, 'X': 88, 'Z': 90,
     'U': 85, 'T': 84, 'W': 87, 'V': 86, 'Q': 81, 'P': 80,
     'S': 83, 'R': 82}
    % ~/Python-2.7.10/python.exe -R x.py
    {'Z': 90, 'Y': 89, 'X': 88, 'W': 87, 'V': 86, 'U': 85,
     'T': 84, 'S': 83, 'R': 82, 'Q': 81, 'P': 80, 'O': 79,
     'N': 78, 'M': 77, 'L': 76, 'K': 75, 'J': 74, 'I': 73,
     'H': 72, 'G': 71, 'F': 70, 'E': 69, 'D': 68, 'C': 67,
     'B': 66, 'A': 65}
I can totally understand how people expect an invariant order. As I pointed out, our regression code broke in the 2.x series because we relied on consistent ordering, and CPython never made that promise. But what I quoted above is the only guarantee about dictionary order. Everything else is an implementation accident.

Nor is it the only such implementation-specific behavior that people sometimes depend on.

  >>> for c in "This is a test":
  ...   if c is "i": print "Got one!"
  Got one!
  Got one!
That's under CPython, where single character strings with ord(c) < 256 use an intern table. Pypy doesn't print anything because it doesn't use that mechanism.

Note that 'is' testing is also faster:

    % python -mtimeit -s 's="testing 1, 2, 3."*1000' 'sum(1 for c in s if c is "t")'
    1000 loops, best of 3: 893 usec per loop
    % python -mtimeit -s 's="testing 1, 2, 3."*1000' 'sum(1 for c in s if c == "t")'
    1000 loops, best of 3: 1.01 msec per loop
This extra 10% is sometimes attractive.

Sure, but why did you enable the -R flag?

We are talking about the default way of doing things in by far the most commonly used implementation.

I'm not saying someone should have relied on the specific ordering or that the code that does rely on it is a great way of doing things.

CPython2 did make that promise -

    CPython implementation detail: Keys and values are listed in an arbitrary order which is non-random, varies across Python implementations, and depends on the dictionary’s history of insertions and deletions.
In other words, the order should be the same no matter how many times you restart the application.

It is a subtle source of bugs since it always worked in each specific version of CPython2 without the -R flag.

"why did you enable the -R flag?"

Because either the documentation means to include -R in the description, in which case your interpretation of the documentation is incorrect, or the documentation is incomplete because it doesn't describe a valid CPython 2.x run-time. Either way, it indicates that the difference isn't, strictly speaking, a Python2/3 issue.

"an arbitrary order"

Where does it say that the arbitrary order must be consistent across multiple invocations? Quoting from https://docs.python.org/2/using/cmdline.html#cmdoption-R :

> Changing hash values affects the order in which keys are retrieved from a dict. Although Python has never made guarantees about this ordering (and it typically varies between 32-bit and 64-bit builds), enough real-world code implicitly relies on this non-guaranteed behavior that the randomization is disabled by default.

I totally understand your point. I remember the debates about how this would break code. But it's there to mitigate algorithmic complexity attacks against an ever-increasing attack surface. This was the best solution they came up with, along with a migration path to the new default.

> It's the kind that easily sneaks its way past the most vigilant unit tests

If your unit tests are the most vigilant, you'll test the output of that function for just such a comparison.

Nothing about unicode handling?

Unicode works just fine in 2.x for people that give a shit :)

It's a lot easier to work with in 3.0, though.

For example, xmlrpclib in 2.x will return either str or unicode depending on whether the particular piece of data it's returning happens to contain non-ASCII characters. In 3.x it always returns a unicode string.

Mitsuhiko has written extensively about Py3 unicode problems, e.g. http://lucumr.pocoo.org/2014/5/12/everything-about-unicode/

I think a lot of it boils down to, sometimes you really really don't want to care about unicode. Linux filenames are one thorny issue - Linux thinks of them as strings of bytes, and a tool is broken if it can't also deal with them that way. Py3 makes it very difficult to do that correctly. There are also issues with stdin/out being opened as unicode vs byte streams.

For web development, the "unicode sandwich" (http://nedbatchelder.com/text/unipain.html) works great in my experience. However, I can see his point that for some kinds of tasks, Py3 is unambiguously a downgrade.

The problem is that in Python 3 some cases aren't easier, they've become impossible to do properly. Dealing with paths is a good example where Python 3 has created a huge mess. IO specifically stdin/-out/-err is another area in which Python 3 has created massive problems.

I believe they fixed that in 3.1, so it's no longer impossible: https://www.python.org/dev/peps/pep-0383/

Regarding the huge mess, that exists with or without Python 3 if filenames use unknown, incorrect or inconsistent encodings. To treat a path as text, e.g. to show it to a human or send it to another system, you have to know its encoding. Python 3 and GTK (with its G_FILENAME_ENCODING and G_BROKEN_FILENAMES) make this clear; some other software (like Python 2) just silently uses and propagates the broken data.
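The mechanism behind that fix is PEP 383's surrogateescape error handler, which smuggles undecodable bytes through str and restores them on encode. A quick sketch:

```python
# PEP 383's surrogateescape error handler lets Python 3 round-trip
# filenames that are not valid in the expected encoding.
raw = b'caf\xe9.txt'  # latin-1 bytes, not valid UTF-8

name = raw.decode('utf-8', errors='surrogateescape')
assert '\udce9' in name  # the invalid byte 0xE9 became lone surrogate U+DCE9

# Encoding back with the same handler restores the original bytes exactly.
assert name.encode('utf-8', errors='surrogateescape') == raw
```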

Yes I recall you having written as much, several times! b^)

People who care about handling Unicode correctly have already made up their mind.

Many people don't, though. Many people run their Python scripts in cmd.exe.

> Many people don't, though. Many people run their Python scripts in cmd.exe.

Did you know that cmd.exe supports unicode just fine? It's Python that does not (out of the box). The development version of Click for instance supports input/output in cmd.exe with both 2.x and 3.x.

Unicode is not a feature of Python 3.

I've seen what you've managed to do with Click and cmd.exe and I complimented you for it on Twitter.

I doubt that anyone has ever managed to output Unicode into cmd.exe in Python before you made this.

I hope Python 3.6 can incorporate what you did.

Question is: could they have fixed these problems without breaking everything else?

TL;DR: No. Many, many of the changes are backward incompatible with the existing Python 2 syntax and APIs.

Any changes that could be made in some compatible way have largely been backported to the Python 2.x versions (the latest stable version of which is Python 2.7.10) that have been released more-or-less in parallel with the 3.x line (search for 'backport'): https://hg.python.org/cpython/raw-file/15c95b7d81dc/Misc/NEW...

Enough such work has been done that you can write Python code that runs in recent versions of both Python 2 and Python 3 (which is very important for 3rd-party library maintainers that want/need to support both in a single codebase), but to do that you will be forgoing most of the productivity and 'niceness' benefits of the sort that are described in the OP.

I think 3.x could easily have kept backwards compatibility aliases for unicode(), long(), xrange(), iterkeys(), itervalues(), iteritems(). It seems to me keeping those would certainly have been a huge help when upgrading.

Basically, I think a lot of the shims in the six library could have been included in 2.6, 2.7 and 3.x.

Also, I think it would have been wise to spread the breaking changes over many releases, not saving them all for one massive breakage. That way you can deal with only a few subtle bugs and breakages at a time.

Is it "people use Python 2"?

Slightly tangential, but I wish this blog would date its posts.

Reading "If you're starting a brand-new application today, there are plenty of valid reasons to write it in 2 instead of 3. There will be for a long time." on the same day it was written is not the same as reading it 6 months or a few years down the line.

I don't actually know when this was written except for the copyright: 2014-2015, so it can't be more than a bit short of 2 years ago.

I remember seeing a retweet in my stream a couple months ago where somebody made the suggestion of not dating blog posts so as to make them seem less ephemeral. I think the logic went that if the writing isn't tied to a single moment in time, then it'll become timeless.

It's one of the more irrational and idiotic suggestions I've ever come across. One of, if not _the_ most important piece of metadata for anything that has or will ever exist is its creation date. In particular for writing, putting it in a historical context usually makes the piece more powerful and increases your understanding of the subject. If it's bad writing then it's not going to have a chance anyway. Sorry to say but that may be true of the post in question. I was under the impression that due to poor Python 3 adoption, Python 2.7 would continue to be supported.

I second OP, please date your blog posts.

The Wayback Machine reveals that it was posted sometime after September 2015: http://web.archive.org/web/20150921221742/http://migrateup.c...

"If you're starting a brand-new application today, there are plenty of valid reasons to write it in 2 instead of 3. There will be for a long time."

This has been true for the last seven years, and I certainly don't see it changing anytime soon.

Your title is a little misleading, since you actually discuss three differences.

I think the OP means the single main difference is "Python 3 makes it easier to develop high quality software." and gives three examples of this.

Which still doesn't make much sense. One thesis, "better software", three supporting arguments.

It's ok to be pedantic, but only if you're correct about the thing you're being pedantic about.

Just checking - you do understand that the person who submitted the link is not the person who wrote the article, right? Just thought I'd ask, because the wording of your comment seems to think otherwise.

Py3 is a completely different language that happens to share the same name as Py2. If you want to talk about typed comparisons you might as well mention C++ and Java and Go and Rust, because you might just as easily move from Py2 to any of them, as from Py2 to Py3.

Not true. Not only can most of the code be used verbatim, but the libraries and APIs are still the same - which would constitute most of the porting effort to another language.

A Python 2 dev can learn Python 3 in a day.


Your comment makes it sound like you might think the passage you quoted was intending to highlight a shortcoming of Python 3. The article is in fact praising this behavior in Python 3 and stating that the default in Python 2 was goofy.
