Moving Away from Python 2 (asmeurer.github.io)
227 points by ngoldbaum on May 20, 2016 | 275 comments

I've seen this story happen twice before.

20 years ago there was this great MP3 player, WinAmp 2. And then they released WinAmp 3, which broke compatibility with skins and plugins and which was slow. People didn't upgrade. Finally, they came to their senses and they released WinAmp 5, marketed as 2+3, which was faster and brought back compatibility with older stuff.

In the retail trading Forex world, there is this great trading app called MetaTrader 4. Many years ago they released MetaTrader 5, which broke backward compatibility and removed some popular features. People didn't upgrade. Today, they are finally bringing back the removed features and making MetaTrader 5 able to run MetaTrader 4 code.

Me, I wait for Python 5 (=2+3) which will be able to import Python 2 modules, so that you can gradually convert your code to the newer version.

I have tens of thousands of lines of Python 2 code in a big system. I can't just take 2 months off to move all of it to Python 3. Moving it in pieces is also not really possible, since there are many inter-dependencies.

Uglifying my code with the "six" module is also not a solution, since once I move, I won't care about Python 2 anymore.

So basically I'm just waiting until a consensus emerges.

It should be noted that the quantity of Python 2 code keeps on growing; I wrote most of my code while Python 3 already existed. If Python 3 had allowed an easy path forward, we wouldn't be in this situation.

All of the reasonable "2+3" features in Python have already landed, like u'' strings in Python 3.3. The really big issues, like the difference between strings and bytes objects, can't really be emulated away. This isn't like WinAmp or MetaTrader where you can just reimplement the old interface in the new application. Code that mixes strings and bytestrings freely is simply not going to interface well with Python 3 code.

There are a few paths for upgrading. You haven't mentioned 2to3, but I will. I've migrated projects of the same size you mention (10s of kloc) with 2to3, and the process took days, not months. About three days with one developer, actually.
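To give a feel for what 2to3 automates, here is a sketch of a few of its mechanical rewrites, shown as before/after (illustrative examples only, not the tool's full fixer list):

```python
# A few of the rewrites 2to3 applies automatically, with the
# Python 2 form shown in comments above each line.
d = {'a': 1, 'b': 2}

# Python 2: print "items:", d.items()
print("items:", sorted(d.items()))   # print statement -> print() function

# Python 2: for k, v in d.iteritems(): ...
for k, v in d.items():               # iteritems() -> items()
    pass

# Python 2: total = sum(xrange(10))
total = sum(range(10))               # xrange() -> range()

# Python 2: if d.has_key('a'): ...
if 'a' in d:                         # has_key() -> `in` operator
    print("found 'a'")
```

Most of a codebase tends to consist of exactly this kind of change, which is why the tool gets you so far so quickly.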

There are a number of projects which also have code bases that are compatible with both Python 2 and 3. You say that once you move, you won't care about Python 2 any more, but that's not a good reason to reject six outright. In my experience, the code isn't really "uglified"; there are just a few minor cases here and there you need to think about.

Anyway. Small-ish projects like the one you are talking about are usually not as hard to migrate as you might think. There are some exceptions, of course.

I agree that the bytes/string problem is a tough one. But the dev team created it and dropped it into the laps of its users.

The Windows API had the same problem: it was ANSI, and then they added Unicode. But instead of forcing your app to exclusively use a single one of them (sort of like use Py2 OR Py3), they allowed you to mix and match your code and call either the old or the new version of the API. Gradually, people stopped using the older version and started using the Unicode one. Today, new Windows APIs are strictly Unicode.

Of course, the solution was ugly from their side, having two duplicate APIs for the same thing, but it didn't bring pain to their users and allowed them to proceed at leisure.

So, it's like Python. You can still mix and match; you just have to explicitly convert between them at the appropriate points.

The main breaking change is that you can't read a stream of bytes from a byte-oriented resource (say, a network connection that by definition can only send a stream of octets) and magically treat it like text elsewhere. While that's a tempting thing to want to do, it's been an unending fountain of bugs over the years. For example, in Python 2.7:

  >>> a = 'this is a test ಠ_ಠ'
  >>> b = bytes(a)
  >>> b
  'this is a test \xe0\xb2\xa0_\xe0\xb2\xa0'
  >>> type(b)
  <type 'str'>
In Python 3.5:

  >>> a = 'this is a test ಠ_ಠ'
  >>> print(a)
  this is a test ಠ_ಠ
  >>> b = bytes(a)
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
  TypeError: string argument without an encoding
  >>> b = bytes(a.encode('utf-8'))
  >>> b
  b'this is a test \xe0\xb2\xa0_\xe0\xb2\xa0'
  >>> type(b)
  <class 'bytes'>
In 2.7, you encode a series of Unicode values back to... a string? This is the source of all kinds of fun as functions expecting a string as input don't have a way to tell whether it's already been encoded or if they need to do so. In 3.5, there's no ambiguity. Functions that write to files or network sockets can either receive strings and know that they have to encode the strings locally, or bytes and know that they don't have to.

Right up near the top of Zen we get:

  >>> import this
  The Zen of Python, by Tim Peters

  Beautiful is better than ugly.
  Explicit is better than implicit.
The Python 2-style implicit conversions were awfully convenient 15 years ago when most things were ASCII. They're absolutely painful now that we've moved past the expectation that strings are arrays of single bytes.

It may just be me, but I tend to assume all strings are UTF-8.

The problem is that Python Unicode strings are indexable with integers. They support random access. Each element contains one character, not one byte. The implementation chosen for Python 3 is to handle them as arrays of 1-, 2-, or 4-byte items, automatically widening based on the widest character in the string. In Python 2, the options were to build the interpreter for 2-byte or 4-byte Unicode characters (narrow vs. wide builds).
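The 1/2/4-byte storage is observable in CPython 3.3+ with sys.getsizeof (a sketch; the exact object header sizes are implementation details, so the helper compares two lengths of the same string so the fixed overhead cancels out):

```python
import sys

def bytes_per_char(ch, n=100):
    # Subtract the size of a short string from a long one made of the
    # same character; the fixed header cancels, leaving the per-character
    # storage cost of CPython's flexible string representation.
    return (sys.getsizeof(ch * (n + 1)) - sys.getsizeof(ch * 1)) // n

print(bytes_per_char('a'))           # ASCII: 1 byte per character
print(bytes_per_char('\u0100'))      # Latin Extended-A: 2 bytes
print(bytes_per_char('\U0001F600'))  # astral plane: 4 bytes
```

This is why mostly-ASCII data is cheap in Python 3, and also why a single astral character quadruples a string's storage.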

This could have been done differently. I would have been tempted to store strings as UTF-8 in Python 3, but hide this from the user. Most of the string functions, including regular expressions, don't really random access a string; they're sequential and forward only. They could work on UTF-8. The ones that return a string index should return an opaque type (say, "StringPos") which has a reference to the string and a position in it. The string functions should accept that type as an index.

If someone applies "int()" to a string index, or performs arithmetic on it, or indexes a string with an actual integer, then it's necessary to scan the entire UTF-8 string and create an index of where each Unicode character starts. This should be a rare event. Especially if there's support for adding or subtracting small integers from a StringPos by walking the string forwards or backwards, rather than building the full index.


   s = "This is a test" # Unicode string, internal representation UTF-8
   i = s.find("is")     # returns a StringPos, not an int.
                        # no need to index the string for any of these
   s1 = s[i]            # StringPos selects the char found by find
   s2 = s[i:]           # No need to build an index for this
   s3 = s[i+1:]         # Adding a small int just walks the string
                        # these force generation of an index 
                        # the first operation like this on a string is expensive
                        # comparable to a UTF-8 to rune array conversion
   j = int(i)           # converts a StringPos to an int 
   s4 = s[i*2]          # again, expensive
In practice, those expensive operations are rare.

This would cut Python memory consumption way down when processing mostly-ASCII Unicode data. It would also make it more efficient to input and output UTF-8, which is usually what you want.

I recognize your name as a rustacean, and since I'm someone who reluctantly had to add some unsafe code to extract a substring due to graphemes being unstable, I'm surprised to see you take shots at the Python model.

(Granted, I'm still learning, so maybe I can be as efficient without relying on unsafe, but still...)

I'm a Rust user; I'm not involved with its development. I had dinner with some of the main developers once; that's about it.

I'd recommend using this crate to work with grapheme clusters in Rust: https://crates.io/crates/unicode-segmentation

... Due to graphemes being unstable? What do you mean?

Then I think you'd be much happier with Python 3 as it's explicitly clear when you're transforming between str and bytes objects.

I think you miss the point - if all text is UTF-8, then most operations can be performed on bytes without worrying about any transformations whatsoever. By rubbing your nose in the difference between str and bytes, Python 3 complicates your life. On the flip side it makes the complicated case more tractable.

if all text is UTF-8, then most operations can be performed on bytes without worrying about any transformations whatsoever

This is incorrect for anything other than an ASCII-only world. In general, "just use UTF-8" is a shorthand for "just let me pretend non-ASCII doesn't exist", because it leads to people writing code that assumes one byte == one character.
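The gap between bytes and characters is easy to demonstrate (a minimal sketch):

```python
s = "naïve ಠ_ಠ"
b = s.encode('utf-8')

print(len(s))   # 9 code points
print(len(b))   # 14 bytes: ï takes 2 bytes, each ಠ takes 3

# Slicing the bytes can cut a character in half; decoding the
# truncated buffer leaves a replacement character at the break.
print(b[:8].decode('utf-8', 'replace'))
```

Any code that indexes, slices, or counts "characters" on raw UTF-8 bytes hits exactly this class of bug.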

By rubbing your nose in the difference between str and bytes, Python 3 complicates your life.

Having had a lot of experience with it, I would instead say Python 3 makes you think about things and find bugs at the appropriate time -- up-front during the development process -- rather than at an inappropriate time, like 3AM on a Sunday when your pager goes off because you trusted the "helpful" but wrong way of doing things.

> In general, "just use UTF-8" is a shorthand for "just let me pretend non-ASCII doesn't exist", because it leads to people writing code that assumes one byte == one character.

Only if people writing code are entirely oblivious to what UTF-8 is. The most basic feature of UTF-8 is that it's a variable-length encoding scheme. If the main feature is that it's variable-length, why would any programmer assume the length doesn't vary?

If the main feature is that it's variable-length, why would any programmer assume the length doesn't vary?

Because UTF-8 is the Unicode encoding that looks just like ASCII, so long as the characters stay within the set supported by ASCII, so it lets them say they're doing "Unicode" while really just writing code to handle ASCII.

I am a Dane - we have three letters in our alphabet that are not represented in ASCII (æ, ø and å), so let me assure you I don't expect text to be in ASCII, but I do expect it to be in UTF-8 and will consider it a bug if it isn't.

But åäöæø are still in the < 256 code-point range, so in Latin-1 they occupy one byte each, like any other "normal" ASCII character (in UTF-8, though, they take two bytes).

But they are different animals. When you read from a socket, you get bytes. Are they text or are they a JPG? Py2 says "they're text holding the contents of a JPG." Py3 says "I don't know. It's your data, you tell me."

Which works great up until the point that some existing data that's supposed to be text, like a filename, or a file in a text-based format, or string fields in a file in a binary format, et cetera, turns out to be invalid UTF-8. Maybe because it was written in the wrong encoding, or because some binary data has been stuffed in (e.g. with heredoc quoting), or due to corruption, or even maliciously as an obfuscation measure. And Py3 turns up its nose and rejects it, probably by throwing an exception and killing the script - while programs written in other languages, including Py2 (if the unicode type is avoided), 'just work'.

For most strings there's no actual need to analyze them as text or look at character properties or do anything other than move them from one place to another, possibly interpolated with other strings, and in that case byte strings work fine, including with non-English text if everything standardizes on UTF-8. Therefore they should be the default, and explicit Unicode processing the exception, like Py2.

The only situation where explicit decoding improves things is if you have to deal with legacy 8-bit encodings, which aren't all that common these days, and in that case you probably should be doing complicated things anyway like parsing any applicable headers, autodetecting encoding if that doesn't work, giving the user an override option, etc., which is work that types can't save you from.

Using byte strings also doesn't 'just work' if the original format is a Unicode encoding other than UTF-8; e.g. Windows filenames are UTF-16 and can't be converted to UTF-8 without risking errors caused by unpaired surrogates. But that's what WTF-8 is for (or rather, what it should be for, disclaimers in the spec notwithstanding)...

> Which works great up until the point that some existing data that's supposed to be text, like a filename, or a file in a text-based format, or string fields in a file in a binary format, et cetera, turns out to be invalid UTF-8. Maybe because it was written in the wrong encoding, or because some binary data has been stuffed in (e.g. with heredoc quoting), or due to corruption, or even maliciously as an obfuscation measure. And Py3 turns up its nose and rejects it, probably by throwing an exception and killing the script - while programs written in other languages, including Py2 (if the unicode type is avoided), 'just work'.

I'm tired of people spreading FUD about it. They either don't understand what they are doing (and do it incorrectly) or are repeating what other people (who did things incorrectly) said:

    » echo $LANG
    » echo $LC_CTYPE
    » echo 'works' > $'\377'$'\377'$'\377'
    » ls
    » cat $'\377'$'\377'$'\377'
    » python3
    Python 3.4.1 (default, May 23 2014, 17:48:28) [GCC] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> a = b'\377\377\377'
    >>> a
    b'\xff\xff\xff'
    >>> a.decode()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
    >>> b = open(a)
    >>> b.read()
    'works\n'
Accessing files with invalid UTF-8 in their names works just fine. Being explicit and clearly distinguishing between strings and bytes allows Python to know whether a filename is encoded using UTF-8 (or in fact any other encoding you have defined in your environment) or is raw bytes. Actually, Python 2 struggles here, because it treats everything as bytes and can't tell which is which. Yes, legacy Python probably won't crash, but it will spew garbage.

Python 3, on the other hand, is strict in order to point out errors in your code. If Python 3 code crashes, it's either a bug in Python or (most likely) a bug in your code, and you should fix it.

I (and I'm sure a lot of people) prefer a program that crashes and points at the error over one that silently corrupts data once in a while, leaving you, when you notice the issue, to spend days or months figuring out where the corruption is happening.

Okay, but where would that filename come from in a real script? If it was passed in argv, then apparently Py3 gives you a string which is invalid Unicode (contains unpaired surrogates), and explodes if you try to print it or do anything else that implicitly encodes it without the special 'surrogateescape' setting. It's really dangerous to pass such a string around. But at least you can open it, so it's not completely broken. Oh, and that means that two different strings can encode to the same path, which... is probably not usually a problem...

On the other hand, if you read a list of filenames from a file, you'd better remember to either use binary or explicitly choose surrogateescape decoding. Which is a special case of "file in a text-based format" from my original post.

I don't know what you mean by "corruption".

What I meant by corruption is that the program might be spewing garbage instead of simply crashing and pointing at where the encoding problem is.

As for your other point: you're right, but there's no easy way to resolve this. Either treat everything as Unicode and remember it's binary on Linux, or have Linux working and everything else broken.

In that scenario you probably should use .encode('utf-8', 'surrogateescape') as soon as you can and work on bytes, so there's no need to do the above.
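The surrogateescape round-trip being discussed can be sketched as follows:

```python
raw = b'\xff\xff\xff'   # not valid UTF-8

# surrogateescape smuggles each undecodable byte into a lone
# surrogate code point instead of raising UnicodeDecodeError.
name = raw.decode('utf-8', 'surrogateescape')
print(repr(name))       # repr is safe; printing the raw string is not,
                        # since lone surrogates fail strict re-encoding

# Encoding with the same handler restores the original bytes exactly.
assert name.encode('utf-8', 'surrogateescape') == raw
```

This is the mechanism Python 3 itself uses for os-level strings like argv and filenames, which is why such strings are dangerous to pass to anything that encodes strictly.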

Also, if you call os.listdir() with a bytes argument, it'll return entries as bytes:

    Python 3.4.1 (default, May 23 2014, 17:48:28) [GCC] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import os
    >>> os.listdir(b'.')
    [b'\xff\xff\xff', b'test.py']
But IMO, things could still be better. I would love it if there were a sys.setfilesystemencoding() to match sys.getfilesystemencoding(). In scenarios where you are not sure of the encoding, you could fall back to ISO-8859-1 (latin-1), which has a 1:1 mapping for bytes.

Currently it looks like you would have to use:

    filename.encode(sys.getfilesystemencoding(), 'surrogateescape').decode('iso-8859-1')
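The latin-1 fallback works because ISO-8859-1 maps every possible byte 1:1 onto the first 256 code points, so decoding can never fail and re-encoding restores the exact bytes (a sketch):

```python
raw = bytes(range(256))           # every possible byte value

text = raw.decode('iso-8859-1')   # never raises: each byte -> one code point
assert text.encode('iso-8859-1') == raw   # lossless round-trip
print(len(text))                  # one character per input byte
```

The trade-off is that non-Latin-1 input displays as mojibake, but nothing is lost or rejected.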

> if all text is UTF-8, then most operations can be performed on bytes without worrying about any transformations whatsoever.

Let me guess: you don't work on anything that has any internationalization ever.

Make a list of the things that work vs. the things that don't. Works: input, output, copying, formatting, numeric conversion. Doesn't: slicing, character counting. Probably doesn't: regular expressions, case conversion. Remember that all this is premised on all I/O being done in UTF-8.


Even if we accept your myopic list of features that work and believe your statement that "most operations can be performed on bytes", that doesn't make the language usable for internationalized content. Trading having to call encode and decode, which fail with clear, easy-to-fix errors, for a myriad of subtle, silent encoding failures is not a good trade. Which would be abundantly clear to you if you had internationalized anything ever.

During my last job hunt I kept getting people doing the "test for palindrome" phone-screen question.

After a while I started creatively expressing my dislike of phone screens by showing them an implementation that hints at some of the complexity lurking in this allegedly-simple task (i.e., you need to understand Unicode classes and normalization in order to write a palindrome tester).

That's totally a question I'd ask, and I don't understand why you don't like it. The reason I'd ask it is not because it's a simple task; it's because it's not a simple task: what I'm looking for is that you aren't going to overconfidently throw naive solutions at hard problems. That will cause bugs and have to be cleaned up.

I think if you showed me an implementation that hinted at some of the complexity lurking in the task, that would be a big point in your favor.

I think it depends on the interviewer and the purpose of asking the question. For a phone screen I'd expect it to be a variant of FizzBuzz, designed to weed out the people that aren't worth wasting time on. Coming up with a complex solution, when the interviewer was looking for something simple, is probably grounds for a fail - even if the complexity is justified.

Yeah, a lot of places use it as an initial phone-screen "can you code" question.

I did bust out a version on one call which did detection of combining classes, normalized into composed form before checking the reverse, etc., and the interviewer seemed kinda lost as I explained it.

Ended up not working for them, but for different reasons.

> Works .. Doesn't

I question some details in your lists but imagine you'd agree that the Works vs Doesn't split approximates to whole string level vs sub-string/character level operations.

There is nothing more mind-boggling than using mixed sources of ascii and unicode in Python2.

Except Python2 is the POSIX standard on how to handle this and I prefer retro OSs without a bytes/string distinction.

Python3 is the Windows way of handling strings so the change was pure churn. Both are valid, but Linux is "pretty popular" right? So why break everyone's code just to churn the pot.

If it was good for Kernighan and Ritchie, good for Joe Ossanna, good for Rob Pike, well, it's good enough for me.

The problem is it might be good enough for Kernighan and Ritchie, but it's not good enough for Chávez and Çelik (or, well, most of the world). The A in ASCII stands for "American" -- and it gives a good indication of where you can look to find a whole lot of folks who ASCII isn't good enough for.

Actually, I'd have naively thought it meant America as in "the Americas", including South America. It's no wonder the IANA wants to change it to US-ASCII (or USA-SCII).

Python2 supports unicode. Chávez and Çelik are covered.

I'll take Python 2's POSIX text model over Microsoft's default Unicode strings every time, because I prefer Linux. At this point in time I see no reason to cater to a shrinking server platform.

Not to mention they didn't even get unicode right with Python3. Google did with Go (assume UTF8).

Python2 supports Unicode, but it requires careful thought and deliberation to do it right - and at the same time makes it very easy to do it wrong (by assuming all bytes are strings etc). As a result, when such code is written by people with little awareness of the world beyond ASCII or Latin-1, it tends to be buggy with locales that need more than that.

And yes, it's not really a Python issue, it's a more generic issue with treating bytearrays as strings (what you refer to as POSIX text model). Until UTF-8 became the standard encoding everywhere, there were many Linux apps that couldn't handle non-Latin1 encodings properly, either. It was also the case on Windows in 9x days, for all the same reasons.

> Python2 supports unicode. Chávez and Çelik are covered.

Most Python 2 code doesn't. It has the nice property of exploding spectacularly when the string you are writing out to a console or file contains Chávez or Çelik.

> Most python2 code doesn't.

Citation needed. I can't say I've ever had practical issues handling Unicode in Python when I needed to. No matter which library. Please give practical examples of where you cannot deal with Unicode on Python 2.

just plain

    for line in yourfile:
        print line
with yourfile containing non-ASCII characters, running in a Windows console.

That sounds like a limitation of the Windows console, not the programming language.

It's a limitation of how the programming language interacts with the Windows console, and it's a mistake. If you are writing potentially-garbled strings directly to the console, there is already no guarantee that the output will properly reflect the string or e.g. copying and pasting will work correctly, because the string could include backspace characters, or a ton of newlines. Therefore it doesn't make sense to do anything other than make a 'best effort' attempt when rendering strings to the console, including with Unicode handling. If it works, great, but if the string is invalid Unicode, that's only one of multiple problematic cases, so what exactly is throwing an exception supposed to accomplish? Python should use replacement characters by default in this case.
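Python already has the "best effort" error handler this comment asks for; it just isn't the default for console output. A sketch of the difference:

```python
s = 'caf\xe9 \u0ca0_\u0ca0'   # 'café ಠ_ಠ'

# Strict encoding to a limited charset raises...
try:
    s.encode('ascii')
except UnicodeEncodeError as e:
    print('strict encode failed:', e.reason)

# ...while the 'replace' handler degrades gracefully with placeholders,
# which is the behavior argued for above.
print(s.encode('ascii', 'replace'))              # every non-ASCII char -> b'?'
print(s.encode('cp437', 'replace').decode('cp437'))  # cp437 keeps é, drops ಠ
```

cp437 here stands in for the legacy IBM PC code page the old Windows console emulates.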

Stop using the Windows console. Or fix it. More rational than a breaking upgrade to Python 3.

Fixing the Windows console rather than upgrading to Python 3 is the only sensible choice.


It is, and Java has similar problems on Windows. The Windows console can be switched to Unicode, but by default, it emulates the IBM 5150 PC character set, circa 1980. It's not even ISO LATIN-1.

Works fine on Linux.

Windows console support is broken on Python 3 too, on many versions. The Click library can print Unicode to the console on Windows, however. No Python 3 needed.

> Citation needed.

I'm sorry to say, but your code, for example. While I love your Click library, it has many small issues, which annoy the heck out of me.

I would submit a patch but this is not a simple change and requires a bit of effort. Perhaps I'll find some time to work on it, but I'm afraid it might possibly break compatibility.

Can you talk specifics about what does not work? Because if there is an actual unicode problem in click i want to fix it.

I'd have to go over the rest of the code, but the code in ClickException is broken:

    class ClickException(Exception):
        """An exception that Click can handle and show to the user."""

        #: The exit code for this exception
        exit_code = 1

        def __init__(self, message):
            ctor_msg = message
            if PY2:
                if ctor_msg is not None:
                    ctor_msg = ctor_msg.encode('utf-8')
            Exception.__init__(self, ctor_msg)
            self.message = message

        def format_message(self):
            return self.message

        def __unicode__(self):
            return self.message

        def __str__(self):
            return self.message.encode('utf-8')

        def show(self, file=None):
            if file is None:
                file = get_text_stderr()
            echo('Error: %s' % self.format_message(), file=file)
This code is arguably wrong. I think I understand what you were trying to do, but I'm not certain. There is a way to use it correctly in Python 2, though, but many people might not be aware of it.

In Python 3 you pass the message as-is (it might cause another issue, but more on that later).

In Python 2 you immediately encode the passed variable using UTF-8. This means that you're expecting the argument to be of unicode type, but at the same time you're discouraging users from using unicode_literals, and in most situations users will pass a regular string.

In Python 3, __str__ is trying to convert text to bytes, while the message would already be text. Most of the time this will look correct, but it might spew garbage when there are non-ASCII characters.

Here's corrected code (I did not test it, though):

    if not PY2:
        unicode = str  # Python 3 has no separate unicode type

    class ClickException(Exception):
        """An exception that Click can handle and show to the user."""

        #: The exit code for this exception
        exit_code = 1

        def __init__(self, message):
            if PY2 and isinstance(message, unicode):
                message = message.encode('utf-8')
            Exception.__init__(self, message)
            self.message = message

        def format_message(self):
            return self.message

        def __unicode__(self):
            return str(self).decode('utf-8')

        def __str__(self):
            return str(self.message)

        def show(self, file=None):
            if file is None:
                file = get_text_stderr()
            echo('Error: %s' % self.format_message(), file=file)
There are a few other things that give headaches (not necessarily Python 2 only).

In Python 3, for example, Click refuses to run if LANG and LC_CTYPE are not defined. Why doesn't it do what all other applications do and simply fall back to latin-1 (ISO-8859-1) instead of printing an error?

Also, another issue (and my code above is affected by this too): Click checks environment variables before continuing, yet encode and decode have UTF-8 hardcoded. It probably should use whatever locale.getdefaultlocale() returns.

Yes, it will explode if you try to write a Unicode string to something that doesn't have UTF-8 encoding by default. But it won't explode if you write byte strings, or if your default encoding is UTF-8.

The Windows console is one case that doesn't default to UTF-8. At the time when I was working with it, Py2 defaulted to ASCII.

For some locales Windows has an additional bonus: the ANSI (GUI) encoding is different from the OEM (console) encoding, and it was a heroic undertaking to make your program work correctly with both.

Python3 works out of the box.

I can sympathize, Windows is my environment too. But I'm not blind to other environments where some simplifying assumptions can be made.

For true insanity try adding IDLE to the mix. IDLE under Windows in Python 2 will let you output UTF-8, but won't accept it as input. Python 3 is more consistent.

Windows doesn't actually support Unicode, not really (for backwards compatibility reasons): it's just UCS-2 at the OS level, and hence you can't represent OS level strings as UTF-8 without data loss, because in practice you need to handle lone UTF-16 surrogate code units. This is why WTF-8 exists.


I had thought you were being sarcastic in your grandparent post.

> So why break everyone's code

Dunno about you, but the amount of encoding/decoding errors in Py2 broke my little code multiple times in the past. "oh yeah I'll just fetch this html page"... all good, until that page contains non-ASCII and then BOOM.

With Py3 you're forced to think about this sort of issues right away and with minimal fuss you're sorted for life.

There isn't a "Windows" way of handling strings. You can handle strings however you want, just make sure that you pass the right data to syscalls. Windows APIs take UTF-16 or some other code page. Linux takes bytes. OS X takes UTF-8. There is no greatest common denominator. But if you use Unicode, then at least there's a well-defined way to write the syscall wrappers on all platforms.

And it's not like Python 3 is forcing you to give up bytes. You can still os.listdir(b'/some/dir') and you'll get a list of bytes objects back. It's just that that's not default, since using Unicode by default simply makes more sense on a cross-platform system.

Actually, the Win API is broken when it comes to Unicode. There is no real Unicode in Windows, and that's a huge problem.

Handling UTF-8 the way Linux does is definitely awkward in the Win32 API. And that was actually caused purely by backwards compatibility.

It would actually be huge if they forced users toward Unicode more. But I doubt that will ever happen. We may see Windows Active Directory on Linux before it happens.

It's not the API that's broken; it uses UTF-16 consistently, which qualifies as Unicode by any definition.

The problem really only lies in file and console encoding. If they'd simply give you the option between legacy code pages and UTF-8, and made UTF-8 the default, life would be grand.

Yep! In my earlier days as a programmer, I wondered why all Win32 API functions have both ANSI and Unicode versions. Isn't it messy? In the end I realized it's an honorable effort to keep old Windows programs compatible.

On the other hand, a few years ago, Delphi, the RAD development tool, introduced its new Unicode version, completely breaking backward compatibility by changing the literal string type from "AnsiString" to "UnicodeString". I know many third-party libraries were abandoned and many projects were prevented from upgrading. It was also a disaster to me...

> There are some exceptions, of course.

Those exceptions are often bugs in the logic of handling text. Python 2 allows you to sweep that under the rug.

> Me, I wait for Python 5 (=2+3) which will be able to import Python 2 modules, so that you can gradually convert your code to the newer version.

Thank you for saying this! For those interested in this direction, there was a discussion about it on HN a couple of months ago:


Proof-of-concept examples embedding Python interpreters within themselves:



Cython can probably compile your existing codebase such that it's compatible with both 2 and 3. Then you can write your new code in Python 3, while importing the old code as a C extension.

You can also do this piece-by-piece, if for some reason your code doesn't compile with Cython right away.

The compilation process also makes it much easier to identify bytes/unicode bugs, because you can choose to declare types.
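A hypothetical setup.py for this approach might look like the following sketch (the package name and module list are placeholders, and this is an untested build recipe, not a definitive one):

```python
# setup.py -- compile legacy Python 2 modules as C extensions with Cython,
# so Python 3 code can import them while new code is written natively.
from setuptools import setup
from Cython.Build import cythonize

setup(
    name='mypkg',                      # hypothetical package name
    ext_modules=cythonize(
        ['mypkg/legacy_module.py'],    # hypothetical legacy Py2 module
        language_level=2,              # tell Cython to parse Python 2 syntax
    ),
)
```

The language_level directive is the key piece: it lets Cython interpret the old source with Python 2 semantics while producing an extension importable from either interpreter.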

That sounds like an interesting approach. Have you tried it with a real-world project? I'm curious if the approach has been proven.

> can't just take 2 months off to move all of it to Python 3

Welcome to the modern COBOL then. Have fun maintaining that and running only old libs, because everyone else in the world has passed you by.

> I'm just waiting until a consensus emerges.

Wake up, it's done emerged. Almost no libs aren't avail on 3 now.

> Almost no libs aren't avail on 3 now.

What an awkward way to say "most libs work on Python 3 now"... Why is that? I suspect it's because saying it clearly doesn't deliver the message you want.

It should be noted that the quantity of Python 2 code keeps on growing

According to this article Python2 support is shrinking.


Both claims are correct, as is common with statistics.

If you write `print "hello"`, you just increased the QUANTITY OF PYTHON 2 CODE. If there is one guy left in the world who occasionally writes some Python 2--and even if he's the only one left who does--the quantity of Python 2 code will, in some sense, "keep on growing", regardless of what happens to Python 3.

The article you cite refers to PERCENTAGE OF LIBRARIES that support Python 2 or 3, either one or the other exclusively or both, and it demonstrates a decrease in libraries that only support Python 2 and an increase in libraries that only support Python 3. This is presumably correct, too, but it doesn't suggest that nobody writes any Python 2 code anymore.

> Uglyfying my code with the "six" module it's also not a solution, since when I'll move, I won't care about Python 2 anymore.

I've upgraded a lot of my code with just future imports and it is just a tiny step from there, unless you're doing a lot of character encoding. Leave those projects behind, but moving most other projects is easier than I expected.

> I have tens of thousands of lines of Python 2 code in a big system. I can't just take 2 months off to move all of it to Python 3. Moving it in pieces is also not really possible, since there are many inter-dependencies.

If you can't evolve a system incrementally because of excessive coupling, that's a problem, sure, but it's not a problem of either the platform you are on or the one you might be thinking of moving to, unless the former forced a tightly-coupled architecture on you.

We've solved this problem by breaking our architecture up to be more distributed and introducing Python 3 one small service at a time.

Same here. It's also been a great opportunity to step back and reevaluate some early-startup engineering decisions, which has made the new service codebase fun to work with. Rather than having a single backend that has to handle all possible situations ("this one thing we're talking to speaks XML..."), we can deploy only the stuff actually needed for that one service ("I think we can safely assume all-JSON here"). I highly recommend this approach.

Can you provide a few of the irreconcilable issues that stop you from using Python 3?

> Moving it in pieces is also not really possible, since there are many inter-dependencies.

I guess something is wrong with your architecture in the first place, if you can't separate domains or layers easily.

I've used this module for upgrading a Python 2-only lib: https://pypi.python.org/pypi/future . Basically it allows you to write Python 3 code that will run on a Python 2 interpreter (barring advanced features like coroutines or the matrix multiplication operator).

Once you decide not to support Python 2 anymore, just drop the imports. They also have a Py2/3 compat cheat sheet: http://python-future.org/compatible_idioms.pdf
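A tiny illustrative snippet (the function name is made up) of what that style looks like; on Python 2 the `builtins` module comes from the `future` package, while on Python 3 it's part of the standard library:

```python
from __future__ import absolute_import, division, print_function, unicode_literals
from builtins import str  # future's shim on Py2; the built-in on Py3

def halve(n):
    # True division, unicode literals, and print() all behave
    # identically on Python 2 and Python 3 with the imports above.
    return str(n) + " halved is " + str(n / 2)

print(halve(5))  # 5 halved is 2.5
```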

It is pretty obvious there is a problem when your quick and easy cheat sheet is a 33-page tome. If only Guido and company would acknowledge the problem and fix it, instead of putting their fingers in their ears pretending Python 2 will cease to exist in 2020.

They acknowledge the problem, and their solution is to stop actively aiding and abetting the people who are continuing to make it worse by writing python 2 code.

Wow, what a simplification.

What about us who have and want to use massive libraries of existing Python 2 code?

"Just upgrade your libraries!" is not realistic for everyone, especially those who have to justify a mostly ideological conversion to a similar language, and spend real time and money, outside of actually developing the end product, to do it.

Four more years of official support is a long time.

Someone suggested Cython to compile python 2 code as modules and import to python 3. Did not try it, but that sounds plausible.

The biggest obstacle people seem to complain about (besides print, which is trivial IMO) is unicode. This is generally also the most painful part of making python 2 code work on python 3.

The issue with it is that, from a unicode perspective, your Python 2 code is inherently broken. There's no automated tool that can fix it, and there's no way a Python 5 would be able to run Python 2 code correctly unless unicode support were removed. You have to manually tell Python which strings are made of characters and which were made out of bytes.
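A minimal illustration of why no interpreter flag can paper over this:

```python
data = b"GET /index.html"  # bytes, as read off a socket
text = "résumé"            # str (unicode text)

# Python 2 would silently coerce via ASCII here (and then crash at
# runtime, but only on non-ASCII input). Python 3 refuses up front:
try:
    data + text
except TypeError as err:
    print(err)  # e.g. "can't concat str to bytes"

# The fix has to state the encoding explicitly at the boundary:
combined = data + b" " + text.encode("utf-8")
print(combined.decode("utf-8"))  # GET /index.html résumé
```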

You can automatically remove six from your codebase after you're Python 3.x only. So six can be a possible solution (provided you have automated tests).

I'm so glad to see this. I started with Python 1.4, and after several years too long on 2.6+ I've just recently been able to convince an employer to start a new codebase on Python 3.5. Having used it in production now, I would not willingly go back. Python 2.7 is excellent, but Py3 feels like "Py2 done right". It's at least as nice in every way and much nicer in quite a few.

I'm in a shop with solid microservice underpinnings, so our new project could just as easily have been in Go or something else for all its clients would know or care. Given that all the libraries we wanted to use were already available for Py3, this was a no-brainer. There were plenty of reasons to upgrade and no compelling reasons to stay on Py2. Should you find yourself in such a situation, I highly recommend investigating whether you can make the same move.

Which framework are you using for the said microservice?

Flask + Flask-RESTful for this one.

The Windows 10 approach - apply pain to the users until they "upgrade". Python 3 hasn't made it on its own merits, and we have fanboys like this trying to figure out some way to force people to upgrade.

Library porting to Python 3 did not go well. Many Python 2.x libraries were replaced by different Python 3 libraries from different developers. Many of the new libraries also work on Python 2. This creates the illusion that libraries are compatible across versions, but in fact, it just means you can now write code that runs on both Python 2 and Python 3. Converting old code can still be tough. (My posting on this from last year, after I ported a medium-size production application.[1] Note the angry, but not useful, replies from Python fanboys there.)

Python 3, at this point, is OK. But it was more incompatible than it needed to be. This created a Perl 5/Perl 6 type situation, where nobody wants to upgrade. The Perl crowd has the sense to not try to kill Perl 5.

Coming up next, Python 4, with optional unchecked type declarations with bad syntax. Some of the type info goes in comments, because it won't fit the syntax.

Stop van Rossum before he kills again.

[1] http://www.gossamer-threads.com/lists/python/python/1187134

What kills me is, they decided to make breaking changes, and they still kept the damn global interpreter lock! It's like saying "Sorry guys, it's 2016, and you have to port all your code, but no parallelism for you!"

Beatings will continue until morale improves...

My understanding is that it doesn't make practical sense to remove the GIL; doing so would result in a slowdown of the interpreter[1]:

  Back in the days of Python 1.5, Greg Stein actually implemented a
  comprehensive patch set (the “free threading” patches) that removed the GIL
  and replaced it with fine-grained locking. Unfortunately, even on Windows
  (where locks are very efficient) this ran ordinary Python code about twice as
  slow as the interpreter using the GIL. On Linux the performance loss was even
  worse because pthread locks aren’t as efficient.
  Since then, the idea of getting rid of the GIL has occasionally come up but
  nobody has found a way to deal with the expected slowdown, and users who
  don’t use threads would not be happy if their code ran at half the speed.
  Greg’s free threading patch set has not been kept up-to-date for later Python
  versions.

> no parallelism for you!

This is a bit of a hyperbole; Python supports both hardware threads and processes, both of which can be used to achieve parallelism. Despite the GIL limiting Python code to 1 CPU, many I/O routines will release the GIL while they perform I/O meaning I/O can be done in parallel, and native code can release the GIL to do long computations. Processes can be used to overcome the GIL directly in Python, and the standard library offers support to make this as easy as it is to launch a thread, as well as some higher-level support for parallel-mapping a function across a pool of processes.

[1]: https://docs.python.org/3/faq/library.html#can-t-we-get-rid-...
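A minimal sketch of the process-based route mentioned above, using the standard library's `multiprocessing.Pool`:

```python
from multiprocessing import Pool

def square(n):
    # CPU-bound work: each call runs in its own worker process,
    # each with its own interpreter and therefore its own GIL.
    return n * n

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        # Parallel map across 4 worker processes.
        print(pool.map(square, range(8)))  # [0, 1, 4, 9, 16, 25, 36, 49]
```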

The global interpreter lock is the price paid for being able to monkey-patch code and objects from one thread while they are being used in another. That falls out of the original CPython interpreter, which is a naive interpreter in which everything is a dictionary. It would be "un-Pythonic" to change that.

> The global interpreter lock is the price paid for being able to monkey-patch code and objects from one thread while they are being used in another.

So, in other words, “we don't trust our users to do concurrency correctly, and we need to keep our system safe in spite of that”?

I'm not aware of any popular scripting language that supports utilizing multiple processors without starting new processes. Is this a thing? I don't do a lot of parallel stuff with Python but multiprocessing has met my needs.

> I'm not aware of any popular scripting language that supports utilizing multiple processors without starting new processes.

Really, this is more about implementations than languages; a fairly-popular Ruby implementation (JRuby) supports native threading (and has no GIL); IIRC the main Perl 6 implementation also.

Elixir is up and coming in popularity (don't know if you consider it "scripting", which is a relatively fuzzy-bounded category), and definitely supports utilizing multiple processors without starting new OS processes (it uses "processes" as that term is used in the Erlang ecosystem, which are a different thing, basically M:N green threads.)

"Jython and IronPython have no GIL and can fully exploit multiprocessor systems" - https://wiki.python.org/moin/GlobalInterpreterLock

Granted, neither implementation has the popularity of CPython.

And both are sadly stuck on Python2, last I checked :(

Admittedly it's not all that popular, but Tcl can do this with no problems at all.

(By the way, Tcl also has an integrated event loop as well as a complete implementation of coroutines, both of which Python only very recently got.)

First, there is a solid argument to be made that if you need the performance of multi-threading, you shouldn't be doing it in Python anyway.

Second, parallelism in Python is not hindered beyond threading (e.g. multi-processing works just fine: https://www.youtube.com/watch?v=gVBLF0ohcrE).

Finally, removing the GIL is not trivial and the other changes to the language that mandated breaking backwards compatibility are pretty worth it (IMO).

GIL is a design decision for the interpreter, not a caveat of the language itself.

The reference implementation of most compilers/interpreters is usually the slowest one, because its code has to stay legible above all else.

Just to add some perspective from another language: Ruby had a similar, traumatic rift when transitioning from 1.8 to 1.9. I don't think it's a coincidence that -- like Python 2.x to 3.x -- Unicode-handling was one of the changes...so even though Ruby 1.8 to 1.9 had fewer compatibility breaks than Python 2.x to 3.x, it still pissed off a lot of people.

But what made the community jump, IMO, was the fact that the Rails maintainers announced that they would be upgrading to 1.9 [0]. And since there is a very small subset of Ruby users who don't use Rails, that was the end of discussion.

Is there any library in Python that enjoys as much dominance over the language as Rails does to Ruby? Not from what I can tell...And virtually all of the big mindshare libraries in Python have made the transition (e.g. NumPy, Django)...So I agree with OP that making libraries commit to 3.x-or-else is the way to encourage adoption of 3.x...but I just don't see it working as well as it did for Rails/Ruby. That's not necessarily a bad thing, per se, in the sense that it shows the diversity of Python and its use-cases, versus Ruby and its majority use-case of Rails. But forcing the adoption of a version upgrade is one situation in which a mono-culture has the advantage (also, see iOS vs Android).

[0] http://yehudakatz.com/2009/07/17/what-do-we-need-to-get-on-r...

Django made the transition more than 4 years after Python 3 was released. Flask took a few months less. Requests took more than 3 years. Basically, for the first three years (2009-2012) Python 3 was almost entirely embargoed by the most popular Python projects.

The problem was not the lack of monoculture, but that all the strongest projects took their sweet time. (To be fair, it might be that they were lobbying for the compatibility hacks that did land in 3.2 and 3.3, about 2 years after the .0 release.) There was no trailblazer among the big projects. The only real pioneers were the Arch guys, making py3 the default interpreter quite early on; but the popularity of a minor Linux distribution is nowhere near the main projects'.

Still today, some of the most prominent leaders on such projects (the_mitsuhiko, kennethreitz etc) are very much Py3-skeptics; but at least the big frameworks have moved on, so everything else is starting to catch up. We just lost 3 years.

Ruby 1.9 rewarded your upgrade efforts by running twice as fast. There was a lot of enthusiasm to migrate.

Python 3 still seems to reward you by running slightly slower. Where's the hook? Marginal improvements to language design are bit of a weak sell for a language that's already pretty nice.

All of the testing I've done with real-world code has shown an improvement of 10-20% moving from 2.7 to 3.4. I believe 3.5 made some slight further improvements.

If only CPython3 wasn't completely blown out of the water by Python2's production-ready PyPy. That said, my tests are the exact opposite of what you're claiming- I've been doing the same benchmarks on all the CPython3 releases since 3.2. I think you're the only person I've heard make that claim in fact.

One Python library with some mindshare that has not made the transition is SageMath (http://sagemath.org). We haven't even tried yet (mostly due to lack of funding/time: https://groups.google.com/forum/#!topic/sage-devel/cUVKUx31g...). SageMath depends on sympy, so if sympy were to switch to support Python3 only before we do, then I guess we would never update sympy again in Sage.

> For SymPy, the fact that 1/2 gives 0 in Python 2 has historically been a major source of frustration for new users.

I love Python, and I think Py3k is great. But I guess I have written too much C/C++ or something because integer division yielding integers (implicitly "with truncation") is what I would expect and not a float. And I'm a big fan of the "Principle of Least Surprise" but in this case I'm surprised to see int/int give float.

But, hey, like I said -- Py3k is a net improvement. And I'm sympathetic to new programmers and I love that Python's so popular for those new to coding.

The integer division is more surprising in a dynamic language because it might stay undetected until the day you happen to pass round numbers to your function.

I don't think you're correct. If you, the designer, made those operands, you should expect them to be int() or float() because of how you created them. int() and float() have lots of other differences beyond just how they're treated by division.

In my experience, at some point someone will write f(x, 2) instead of f(x, 2.0). This especially happens when you are trying to debug something at the REPL. The addition of the // operator has really solved the problem and given us the best of both worlds here.

The SymPy author was explicitly talking about users. The sort of folks that use matlab, mathematica, and maple.

I don't know of any of them that expect 2 / 3 to be an integer.

And as ufo says, it creates subtle bugs because it works right on most data until you just happen to get a pair of ints. Then boom.

eg in Matlab all numbers are doubles unless you take action to make them otherwise.

Having a function/operator return different results depending on the type seems a bit insane, from a non-C/Python perspective.

> There should be one-- and preferably only one --obvious way to do it.

If I divide 1 in half, I'm left with 0.5.

There's a better way.

    (/ 2 3)   ; => 2/3
    (/ 2.0 3) ; => 0.6666667

"you, the designer" may not have created the operands, just taken them as function arguments. Sure, you should have written tests for different types of input, but if you didn't, it may surprise you later.

I think part of the "surprise" comes from Python being so versatile. I mostly use it for prototyping numerical programming in a research setting, i.e. like MATLAB. So I'm occasionally surprised by the integer division stuff, because I want to just treat my numbers as numbers and be done with it. OTOH, when I'm doing systemsy stuff, it doesn't surprise me at all.

Usually I would agree, but it is written in the context of SymPy, a symbolic math library. There I would expect 1/2 to be the solution to 2*x = 1.

Whether or not SymPy gives int(1)/int(2) as a float or Python gives int(1)/int(2) as an integer are wholly different things.

In the context of a programming language, I have to agree with wyldfire that int-on-int operations should yield integers.

Now if SymPy wanted to override the base behavior, I think they absolutely should! I could even understand that the context of symbolic math makes this the least surprising outcome.

But Python is a programming language, not a Domain Specific Language.

On the other hand, 1/3 is still not the (exact) solution to 3*x=1...

But it can be done, even in systems implemented in Python 2. Try the following in http://sagecell.sagemath.org/

    x = 1/3
    3 * x == 1

1. I introduced the "Sage preparser", which does things like replacing "1/3" by "Integer(1)/Integer(3)" in Sage, before feeding that input to Python to be evaluated; Robert Bradshaw also did a lot of work on the preparser. The preparser uses a hook into the IPython evaluation system. I decided I really needed Sage to use a preparser like this during a talk I gave at PyCon 2005, based on audience feedback, which is why it exists.

2. I would really like somebody to remove the preparse code from Sage and make it a standalone library available on pypi. Here's the code (happy to relicense it under BSD): https://github.com/sagemath/sage/blob/master/src/sage/repl/p...

3. Permalink to your example -- http://sagecell.sagemath.org/?q=nwzdzn

It depends on what / means. Sometimes it's integer division and sometimes it isn't. (You can use // to get integer division more consistently.)

> You can use // to get integer division more consistently

Exactly. / in Python 3 now consistently means "real" division. If you intend integer division, you can explicitly say that with //.

Apart from breaking compatibility with Python 2, I don't see any reason why this isn't 100% better/clearer/more consistent.
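A quick comparison of the two operators under Python 3 semantics:

```python
# Python 3: / is always true division, // is always floor division.
print(1 / 2)       # 0.5  (int operands, float result)
print(1 // 2)      # 0    (floor division)
print(-1 // 2)     # -1   (floors toward negative infinity, not truncation)
print(5.0 // 4.0)  # 1.0  (// works on floats too; result is floored)
```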

It's funny how different backgrounds give different perspectives. When you say "real" division I think of ALUs and if it's real it's "really what you meant" when you had two integer operands. Or did you not mean "actual" when you used "real"? Did you mean like Real numbers?

I get that it's confusing and a little subtle to new folks that "1 / 2" is not the same as "1.0/2.0".

But if I wanted float division I would've coerced one or both of the operands to floats.

I think of it as "real" in the sense of accurate. 1/2 = 0 is mathematically incorrect. 1/2 = 0.5 is mathematically correct. / is a broadly used symbol for division. It should do division, not type checking. That's why we have 1/2 = 0.5 and 1//2 = 0. The second one is explicitly requesting a "special" form of division (flooring).

> 1/2 = 0 is mathematically incorrect. 1/2 = 0.5 is mathematically correct.

Those are two different functions, and both are "mathematically correct". The notation is the problem. The former operation isn't specifying the division operator, it's specifying an integer division operator.

So "/" actually represents two completely different functions, and to disambiguate between them you inspect the types of the operands. That works OK in statically typed languages, but Python is dynamically typed. There's no way to specify which operator you want without verbosely type-casting before each use.

So Python 3 fixes this by separating the two different functions to use different operators: "/" for Real (float) division, and "//" for integer division. Problem solved.

I agree with anything that you just said, and that's not the portion of the previous post that I had a problem with. The Python3 disambiguation is a nice way to handle the former ambiguity.

I'm aware that they are different functions, but I strongly feel they were poorly implemented. / is a mathematical symbol in addition to being a function in Python, but in Python it is implemented in a different way that is mathematically (but not pragmatically) incorrect.

> I strongly feel they were poorly implemented.

I'll agree with this. In the context of Python, the behavior is surprising.

> but in Python it is implemented in a different way that is mathematically (but not pragmatically) incorrect.

I disagree with this statement, for pedantic reasons. I wouldn't expect someone to use "/" in the context of standard arithmetic to mean integer division without explicitly redefining it, but redefining it would certainly be "mathematically correct". Look at the notation for Galois fields. That redefines all of the standard operators to work within a field of limited elements, and I'd consider the use-case to be comparable.

> Look at the notation for Galois fields. That redefines all of the standard operators to work within a field of limited elements, and I'd consider the use-case to be comparable.

Finite fields define the standard operators because applying mathematical operations across two different fields would be ambiguous. So I don't consider that "redefining", in the sense of the operators being used unexpectedly; I consider it just defining how to use them in this special case.

Whereas Python is defining / to do two operations: divide and truncate. I call that redefining because there's a certain expected outcome if you've had elementary math, and that's not what Python delivers.

I put real in quotes for a combination of at least the following reasons:

* "real" as in real numbers instead of integers.

* to acknowledge the fact that computer division is only ever an approximation of real numbers.

* real as in "what division actually means". Only in computers would you expect 1/2 to be 0. If you're doing math, you express that with something like floor(1/2) (don't have time to copy/paste floor bracket symbols).

"Real" is sometimes used as a synonym for "floating-point". Many programming languages uses "real" as the name for the primitive floating-point type.

Sadly this is not true. // is flooring division on integers, but real division on floats.

No, // is flooring division on floats too. It returns a float, but that float is rounded down to a whole number.

I... could've sworn. I guess I'm just flat out wrong.

Just double checked in py 3.5.0 and the double slash will do flooring division for any type. i.e. 5.0 // 4.0 returns 1.0, not 1.25.

Lisp has it right.

    * (/ 1 2)
    1/2

Part of the problem is that the big hosting services don't even support Python 3 yet. Google App Engine and AWS Lambda don't support it, Heroku does but only for the past year, etc.

I'm building a brand new company and I'm being forced to use Python 2.7 because I'm using Lambda. This was my choice, but the point is I can't use 3 even if I want to.

That would make me pretty nervous. A couple of years from now, it will be taken for granted that all new Python projects are written in Python 3, and AWS Lambda will either be obsoleted by a Python 3-based competitor or--about the time you get your code built out--they will announce with great fanfare that Python 3 is their future and put projects like yours on the B Team's legacy infrastructure.

After "forcing" you to build your company on Python 2, they'll build theirs on Python 3 and too bad for you.

I'm not too worried -- we write all our code with Python 3 in mind (and import future) so the transition will be easy. It's just frustrating when I can't take advantage of some of the newer language features.

Agreed. When you start to work with python 3 it's a real drag to discover that your command line tools require a special virtualenv. It's not even a new problem, eg https://github.com/GoogleCloudPlatform/gsutil/issues/29

Yes, a few days ago I checked AWS Lambda; it clearly only supports Python 2.7, not Python 3. I understand your worry, which is realistic.

> Python 2.7 support ends in 2020. That means all updates, including security updates. For all intents and purposes, Python 2.7 becomes an insecure language to use at that point in time.

There are alternative python implementations like pypy, and they haven't expressed an intent to drop python 2.7 support. Only CPython 2.7 will then become an insecure interpreter.

If you think Python 2.x support ends when the core team stops supporting it, you haven't been studying history. Programming platforms with large code bases (e.g. Cobol, Fortran) have extremely long lifetimes. What's more expensive:

1) porting millions (possibly billions) of lines of working Python 2.x code to Python 3

2) keeping the Python 2.7.x interpreter in deep maintenance mode?

This idea that you can force people to move to Python 3 keeps coming up and is misguided, IMHO. There are two proper solutions to this issue:

1) make it easier to port from 2.x to 3.x. We are still making progress. Python 3.5 includes %-style formatting for byte strings. Allowing 'u' as a prefix for strings was another example. Practicality beats purity.

2) make 3.x a more compelling platform for new development. The amount of goodies in 3.0 was pretty underwhelming. I certainly wasn't very excited about moving to it. Async IO is a neat feature. Keep them coming.
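The porting aids in (1) are concrete; for instance, PEP 461 in Python 3.5 restored %-style formatting for bytes:

```python
# Python 3.5+ (PEP 461): %-formatting works on bytes again, which
# Python 2 code assembling binary protocols relied on heavily.
header = b"Content-Length: %d\r\n" % (42,)
print(header)  # b'Content-Length: 42\r\n'
```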

This is the problem with Python 3. The arrogance of the people in support of wrestling it from us.

> Frankly, if these "carrots" haven't convinced you yet, then I'll wager you're not really the sort of person who is persuaded by carrots.

As someone that uses Python as a C replacement for one-off-scripts that work with binary objects and pure ASCII, Unicode is not a carrot. Unicode will never be a carrot. If anything, Unicode keeps me away.

If I was developing web apps, GUI programs, or data processing libraries, sure. These would all be carrots. But I'm not. The simplicity of Python 2 is my carrot.

> Python 3 does have carrots, and I want them.

You have them. But you seem to be more interested in taking away my carrots, so you don't have to worry about them any more. That's arrogance.

The truth is that free Python 2.x support ends in 2020. After that, it will be sponsored-only.

Implying there won't be a community fork with security fixes.

Things like PyPy will be the last thing that changes, they basically support whatever is currently popular.

That was supposed to happen in 2014, IIRC. I wonder if 2019 rolls around and they push it back to 2025.

I still don't get why they aren't just backporting the simple things people are missing. Like breaking: print "hello". I don't think being academically correct is worth the current schism.

That's far and away the easiest transition: start using parens in your Python 2.7 code today and you won't have to alter them when migration day comes.

And use __future__ to force yourself not to forget: https://docs.python.org/2/library/__future__.html
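A sketch of what that looks like (the import is a no-op on Python 3):

```python
from __future__ import print_function  # on Py2: print becomes a function

print("hello")                    # identical on both interpreters
print("a", "b", "c", sep=", ")    # plus keyword arguments the old
print("partial line", end="")     # print statement never supported
```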

You can fix this with one bash command, or an IDE, or whatever. Sounds like the people who can't upgrade are just, well, you fill in the word yourself.

Call me strange but I actually like that change. Not because I enjoy typing more, but because invalid syntax errors on print is an unambiguous warning that whatever code I just downloaded and ran is incompatible with 2.x

Without that, there may be other much more subtle bugs I'd have missed and spent time troubleshooting.

Python 3 ships with a utility that fixes that stuff with no work on your part: just run 2to3 on all your Python 2 source, and like 99% of the differences are taken care of.

Why not have "print thing" to "print(thing)" just be an automatic conversion though?

It is; that's what the 2to3 program does: it automatically converts that, among other things.

Nah, I'm good. Python 2.7 is not going to change. I have zero desire to port my code and gain zero benefit. I can actually write Python 2.7 with confidence that invoking "python2" won't break on the next rev of the OS.

Most likely, if CPython & crowd give up on 2.7, PyPy will carry the flag of "Stable Python" forward and that'll be that.

But if Python dies, I won't really be sad. Better languages are out there. :)

> But if Python dies, I won't really be sad. Better languages are out there. :)

Like what? I'm curious, because I've tried a lot of the newer languages out there, and I haven't found a general purpose one that I like nearly as much as Python. Elixir is really cool, but it has a pretty narrow focus, which makes it difficult for me to make the investment in learning it.

Elixir _is_ the correct answer. Not sure what you mean by "narrow focus" - it is as general purpose as anything. Only thing holding me back from fully moving from Python to Elixir is that the ecosystem isn't mature enough yet.

I like Elixir, but the ecosystem has years of catching up to be anywhere close to Python. Not just libraries but experienced developers. There's also a danger it might end up like Ruby - basically the support system for a web framework.

Elixir is nice, but I hardly think it compares to Python, due to the fact that you are required to hand back control from a native call approximately every half second or so, or else you will screw up the internal scheduler of the Erlang VM.

Would you do Bitcoin mining or 3D games with Elixir?


I think use-cases like that are included when you say general purpose.

Let's put it this way: there is nothing I would do in Python, that I wouldn't do in Elixir. For all intents and purposes, that's general purpose.

Common Lisp is better as a dynamic language. I have a strong suspicion OCaml is actually better as a static language, but I haven't spent enough time with it to say otherwise. If you're comfortable debugging lazy evaluation bugs at 3am: Haskell. Clojure is better if you want a pile of libraries.

For shops with a functional language horror, Java, particularly Java 8, is better.

My opinion on Python is that it's stunningly bad. I'll spare you the litany.

If you're going to get in the weeds and build libraries from scratch that don't exist, would you enjoy that in Python3? Or would you like to get in the ground floor of something truly new and truly superior in every regard with Elixir?

Other languages have eaten some of Python's lunch over the 3 debacle, but Elixir is the answer to any question about "what's better".

Why does this guy care if others don't drop Python2 support? Instead, live and let live.

Everyone is going to use what they want. People have been whining about this for almost a decade now; give it up. This community isn't going to be on one version again, and that is NOT the community's fault. It's the core dev team's and Guido's, for poor decision making, technical churn, and a refusal to go back and fix their mistakes. There's no technical reason why CPython can't run Python 2 and 3 code; JVMs and the CLR have proven this sort of thing is possible.

If there's not strong enough support for Python 3, let it die. Failed technical churn is not supposed to survive, let alone live on through propaganda and coercion. I don't see the problem. If he bought into 3 before it succeeded, he created his own problem. I plan on porting what I have over when 3 actually gets over the hump, but not before. That's what's in my best interests, which is what everyone should do (and it's exactly what the core dev team and GvR did).

Besides, if you're not into numerical Python and mainly use it for web, PyPy makes more sense to migrate to than Python3.

> I plan on porting what I have over when 3 actually gets over the hump but not before.

The majority of popular open source Python projects have been ported or are in the process of porting over to Python 3 (mostly 3.3+).

I remember my boss telling me a few days ago that at his previous job the codebase was still running Java 1.3 or 1.4. There was no incentive to upgrade because people were afraid of breaking changes, even from one release to the next. So when are they going to port over? To 1.4, then 1.7, and then finally 1.8? God knows, and trust me, that'd be the day IBM AS/400 support finally RIPs. You can choose to live with that, but actually maintaining a codebase on an obsolete platform is painful. Ask the Ansible maintainers: writing code that had to work with Python 2.3 was brittle. Ask anyone who has done Red Hat administration; they know too well how brittle it is to work with a system released 10 years ago (even though RH does support these old versions for years). Pain in the fucking ass.

> If there's not strong enough support of Python3, let it die.

Where did you or the author get that? There is a strong motivation to go to Python 3. It just so happens people are still running on Linux that ships with Python 2 by default, or on MacOSX which is still on Python 2 by default.

If you maintain Django and you don't want to drop Python 2 support, you make the call. You don't have to join. He isn't forcing people to join his coalition. You are making a big deal out of his post/tweet.

So you think that the major reason people aren't upgrading is that Python 2 ships by default with Linux/MacOSX?

For what an anecdote is worth: yes, I only use 2.x because the Mac system default is 2.x.

Relying on the Mac default Python keeps my binary sizes small, and Python 2.7 is good enough. While I have at least done the __future__ imports to adopt features of 3.x such as print(), a full switch to 3.x will be more trouble than it’s worth until Apple decides to install it.
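For anyone curious, the __future__ header I mean is just a couple of lines at the top of each module (a minimal sketch; the exact set of imports depends on which 3.x behaviors your code relies on):

```python
# Opt in to Python 3 semantics while still running on the system 2.7.
from __future__ import absolute_import, division, print_function

# With these in place, the same lines behave identically on 2.7 and 3.x:
print("true division:", 1 / 2)    # 0.5 on both, not 0
print("floor division:", 1 // 2)  # 0 on both
```

That gets you print-as-a-function and true division for free; the bytes/unicode split is the part a __future__ import can't paper over.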

May I ask why you're optimizing for the size of your installed applications? Serious question.

Size is not the entire point. I do however like small downloads (sick of seeing applications that consume hundreds of megabytes, probably because they’re shipping an entire VM or something).

The bigger reason is to make the application as useful as possible to Python extensions. If you don't force scripts to use a particular interpreter, the entire application can be imported into almost any script, alongside any number of other modules, to do interesting things. The only module that has to be compatible is the application itself; if the application module doesn't work with one Python version, a second copy of it can exist to cover both versions. Whereas if your application only works inside a special interpreter binary, extensions have to reinstall special versions of every module they want to use, and possibly port an entire stack of code to support that Python version. Both approaches are workable, but an unspecified interpreter is by far the more convenient way.

> I plan on porting what I have over when 3 actually gets over the hump but not before.

Don't you think that mentality is part of the problem?

No, because as I noted in the parent- I disagree with your premise that there is a problem. This is how open source works, failed ideas are allowed to fail.

That may be true in the case of smaller open-source things. But when it comes to larger projects, it's largely the same way as it is in the rest of our society: things that are considered too important to the whole are changed regardless of pushback.

Not a big deal if something like screen dies or fades. But the same is not true of something as big as Python.

Open-source is democratic in that you have a voice, but it doesn't mean your opinion matters much in certain cases.

Python2 hardliners have been warned for years. By the time it is sunset it will have been 12 years since its release. Why should the Python devs stop moving their project forward because you're not interested in refactoring and deprecating old code?

Where I work we move forward and adapt our code/patterns over time. If an old thing breaks (due to updating to Py3 compatible features), we deprecate/replace or refactor it to work, whichever the team decides holds more value for us as a group. There's probably very little code in there that was written 8 years ago when Python3 dropped. We iterate forward.

I get pretty frustrated by this "but this stuff I wrote a decade ago works and I don't wanna..." attitude in the tech world. Not saying people should burn the candle at both ends to keep up, but come on. 12 years and you can't refactor forward? Then you find yourself in a position where, oh shit we really have to burn the candle at both ends.

This isn't about the actual work involved then. Because you're going to write code anyway. It's about not wanting to change your headspace (not all that much, esp for experience programmers) to write Python3 compatible code. The burden then is on Python hardliners to keep up or fade away, much in the same way you suggest.

Old is replaced with new. It's the way the world works. Get over it.

Throwaway account created just for that? A lot of words to say nothing other than that you're pissed off and disconnected from the real world of software development. Python2 means more than Python3 does: AWS matters, GAE matters, 1000KLOC+ codebases matter, the 50K long tail of libraries on PyPI matters, PyPy matters. You must be new. Why not reveal your real account instead of hiding behind a pseudonym?

Bottom line is that you believe newer is always better. That's not true at all, especially in programming languages.

Just a bit of an understatement. That's the point of this blog post.

I agree. And if you're into numerical Python, sticking with Python 2.7 is standard. Why would you migrate to Python3 when there is no performance benefit in numerical computing? If you're going to migrate may as well migrate to what will improve performance on the non-numpy bits (pypy) than a migration just for the sake of migrating (py3).

Proof that there is no performance improvement? There are some changes in PY3 making CPython more performant in some areas, but in some other areas PY3 may perform worse. There was a PyCon talk about this a few years back.

I haven't seen any proof there is a performance improvement. The only comprehensive benchmarks I have seen show on par or worse performance. If you are going to improve the performance of something I think it is on the "improved" system to demonstrate an improvement.
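If you'd rather check for yourself than trust anyone's benchmarks, a rough micro-benchmark is easy to run under both interpreters (the snippet here is arbitrary; substitute something representative of your workload):

```python
import timeit

# Run this file once under python2 and once under python3 and compare.
snippet = "sum(i * i for i in range(1000))"
seconds = timeit.timeit(snippet, number=1000)
print("1000 runs of %r took %.3f s" % (snippet, seconds))
```

Same caveat as always with micro-benchmarks: the result only tells you about that snippet, not your application.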

Well, he states in the article that supporting both makes his job as a library writer more difficult. That seems entirely reasonable.

Yes but that's for him. He's talking about asking other maintainers to sign some sort of pact to coerce the community into migrating. He can drop Python2 support if he wants. Go ahead, live and let live? He knows that will grant his library overnight irrelevance.

If he's so committed to Python3 he can sink on his own. Guys like this, GvR, and the core dev team ALL think that this is a top-down affair rather than users-up. Wrong. That's the 8-ball he's behind, and why this harebrained scheme was hatched. Python3 landed in 2008; people are just getting delusional at this point.

It is pretty simple: when I spin up an EC2 instance on amazon, and I type "python" in the console, whatever version that is will be the version I use.

Nowadays, I no longer use the distribution Python at all.

Debian Stable and pyenv it is. That way, I have a rock solid operating system and my applications are completely decoupled.

Using distribution packages for application deployments is insane. It makes sense for workstation or systems software where the application you develop is part of the operating system, but for a server application, it makes no sense at all.

I use distribution packages for all my deployments, period. I have no desire to maintain a separate system for installing and upgrading my own applications versus those provided in repositories already. Time to push out a new release? `tito release koji-el7`, SRPM built and sent to Koji to be built and packaged, every 15 minutes mash runs to update our internal yum repo from Koji - and next time Puppet runs, the package is automatically upgraded to the latest version without me having to do anything.

No virtualenv's, no tarballs, nothing special beyond an RPM. I won't deploy software any other way (and if something doesn't already have an RPM I will make one for it).

https://ius.io/ is your friend.

Among many things you have all versions of python available that don't conflict with system packages or themselves.

What's awesome is that you can install multiple versions and all of them just work.

Note to self: snuxoll likes pushing boulders up hills by ignoring all the tooling people have invented to make this stuff easy.

> ignoring all the tooling people have invented to make this stuff easy.

I don't, I just use the standard one that comes with the operating system instead of insisting on re-inventing the wheel because "packaging is hard". I could have just had tito push directly to a yum repository if I so desired, but it's much nicer to use Koji (which I'm already familiar with).

> I don't, I just use the standard one that comes with the operating system

...which is "ignoring all the tooling people have invented to make this stuff easy". virtualenv and pip were explicitly designed to work around two damn nigh insurmountable problems:

1) OS vendors move slowly, and 2) Sometimes two projects on the same machine need two different libraries

It's trivial to use virtualenv and pip to make Project A and Project B run side by side with different Python versions and completely different dependency manifests. You can come up with your own ad-hoc system to deal with this or you can adapt the best practice of every other Python shop. Make no mistake though: your current way is deliberately choosing the hard road.

virtualenv is trying to solve a problem that's already solved by setuptools and eggs, and doing a shitty job at it. Back before eggs were a thing, sure, virtualenv had a purpose - but now there's absolutely no need to create a completely independent environment when you can just use pkg_resources.require() or add the appropriate eggs to sys.path.
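For the curious, the pkg_resources route looks roughly like this (using setuptools itself as the example distribution; substitute your own requirement string):

```python
import pkg_resources

# require() resolves a distribution (and its dependencies) among the
# installed eggs/wheels, activates it on sys.path, and returns the
# distributions it activated.
dists = pkg_resources.require("setuptools")
print(dists[0].project_name)  # "setuptools"
```

Whether that's nicer than a virtualenv is exactly the argument here, but it is a real alternative for pinning versions at runtime.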

Virtualenv doesn't make my life any easier; it makes deploying applications harder, since I have to write a custom deployment pipeline - whereas I already maintain RPMs for other non-Python software we use internally, and I can use the same infrastructure to handle both.

Edit: For most of our applications the latest Fedora Server release is also our target host, so "distributions moving slowly" is rarely an excuse as typically the latest packages are available (and if not we make them ourselves, like we do for selenium).

You have your history mixed up.

Eggs are older than virtualenv, from the times of easy_install. Now there are wheels that are nicely integrated with pip.

Also, virtualenv has since been integrated into Python itself (as the venv module, with the pyvenv command).

Another option here is to pin your Python version to whatever Anaconda is. It's very easy to install and use Anaconda in production.

They aren't completely decoupled. Even with a virtualenv you still need to install and maintain a lot of library and header packages on the host OS.

Well, almost completely. The build dependencies don't change nearly as fast as the application, and are generally backwards compatible.

For complete decoupling, better use NixOS.

Or Docker containers :-)

That's a problem which is solved with wheels.

No sense?

What if you are happy with the current version and don't plan on changing it before changing the OS? What if you don't install more than one app on a system, or are using containers? Then you might not need the extra complexity.

Even then. Suppose you want to change the OS for some reason. LTS eventually ends, maybe you want to change out some common library, or whatever. It's nice to have the option.

You could do it when you need it, ala YAGNI.

Is it that much more work to type "python3" ?

What's the incentive? How discoverable is that? As someone not familiar with Python, I certainly would never guess to type 'python3'.

If you are not familiar with it, why are you running the interactive interpreter? If you are running someone else's script, they will have chosen python3 for you if it is a requirement.

Distros are dropping/have dropped python2 from default install. So incentive is that py3k is available out-of-the-box while you need to install py2 separately. As for discoverability, typing pyt<tab><tab> is probably what most people would do and that discovers python3 just fine.

One of the most popular "distros", whose current release is named El Capitan, still ships Python 2. Sure, there are some Homebrew incantations that can change that, but it's still a bother.

I'm a casual user of Python, and a heavy user of OS X command line, and it will be really nice if/when Apple finally changes this.

Python 2 and python 3 are not the same language. There are big breaking changes between them. Most distros kept "python" pointing at python 2 to avoid breaking old scripts. If you want Python 3, you have to ask for it.
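A one-line illustration of one of those breaking changes (same source, different results under each interpreter):

```python
# Under Python 2 this prints 3 (floor division); under Python 3 it
# prints 3.5 (true division). The function-call form of print parses
# on both, which is why this line runs at all under 2.x.
print(7 / 2)
```

Silent behavior changes like this are precisely why distros won't quietly repoint the `python` name.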

Yes, there is a breaking change. No, they are still essentially the same language.

I mean it's a different binary with a different interpreter.

You still need to pick your linux distro.. and they aren't all running the same version. E.g. archlinux symlinks python to python3.

python will probably always run Python 2. If you want to see whether Python 3 is installed, type python3.

I am so annoyed at Python for not having a good upgrade path with backwards compatibility. Even with iOS you can use Swift code along with ObjC.

I haven't had trouble upgrading. What issues are you encountering?

Twisted doesn't support py3.

SCons not supporting Python 3.

No web2py for Python 3.

I took parallel algorithms with the guy that wrote that framework. Interesting to see it pop up on HN.

As a developer, it is difficult for me to move to Python 3 (and I want to). I'm working on a Django project and relying heavily on various Django plugins. There are a whole lot of plugins that are only supported for Python 2. Without much knowledge of Django landscape, I anticipated this and went with Python 2.7. Turned out to be a prudent decision that saved me tons of time.

Interestingly, Django is one of the big projects that has already announced they're dropping Python 2 support before 2020. Django 1.11, to be released in April 2017, is scheduled to be the last version to support Python 2. [1]

They've prudently made their last Py2 release a long-term support release, but if you want new Django features after December 2017, you're going to be using Python 3.

Django works great on Python 3, BTW, and if you've got plugins that haven't updated, I would recommend being a bit suspicious of their code quality.

[1] https://www.djangoproject.com/weblog/2015/jun/25/roadmap/

If it helps anyone, 1.8 works with 2.7 and was an LTS release, which is supported until April 2018. You don't get the cool new features like Channels out of the box, but they're available via third parties.

I started a new project in Django recently and Python3 was obviously the choice. The nice thing is that if a library doesn't support Python3 then it's a good smell that it's bloated (or 'mature') and worth finding a replacement for. Sadly, many legacy Django packages are basically losing steam and shouldn't be used.

I think it's instructional to consider what happened to Microsoft with Windows 8 here. They offered a product that users were utterly ambivalent about, and took a hardline "no debate, that's just how we're doing it" stance. And it totally flopped. So they offered some concessions in Windows 8.1, but by that point the narrative on Windows 8 had already formed and the brand was tarnished. So what was the fix? Windows 10. Arguably 10 isn't actually that different from 8 (I mean yeah, it's better, but if you installed a start menu in 8 it's really not that different). I think Windows 10 worked because by being a new windows, it got to form a new narrative separate from windows 8.

The way I see it, Python 3.4+ is a lot like Windows 8.1. It's actually really not bad, but the narrative around its predecessors will never let it truly succeed. The realistic way for python to move forward is to offer a brand new version (5 or 4, doesn't really matter), give some concessions to the python 2 people that will make them feel heard and feel reinvested (bring back the print statement, give us some long wanted things like not-stupid lambdas), and guarantee python 3 compatibility so they're not forking the damn project again.

I think at this point so many people have said bad things about Python 3 that the brand is toxic. A new version number might not really mean much technically, but symbolically as a sign to the python community that the devs have finally stopped being stubborn and arrogant and acknowledged they fucked up, it would be huge.

Is this really still so much of an issue? I switched to Python 3 some years ago and haven't even considered going back.

I can't say I've come across an issue caused by an incompatibility between the two since the first couple months after switching...

You must be in a very luxurious position then to not have to work with any legacy code base. The reality is that there are some giant python 2.7 code bases with forked versions of some major frameworks and libraries. There's no way in hell these places are going to migrate to Python 3 any time in the next decade.

The people who really decide what you do don't care about syntax. They'll ask you if it is faster. And Python 3 is even slower than Python 2.

You don't have this problem with PHP. "Hey, Boss! We need to rewrite our code for PHP 7." – "Will it be faster? How much?" – "It will be faster. By factor X." – "Go on, make it so."

The people who really decide what you do don't care about syntax.

Some bosses care about syntax, at least if you phrase it as maintainability.

That's a good point despite the downvotes. If you don't need any of the Python 3 goodies, by upgrading you will make your code run even slower. Hard to justify.

If you care this much about performance, python is most likely wrong language for you, you should try statically typed languages instead.

What about the open source developers who are tired of the messy packaging experience of python3.2?

Who are doing this as a hobby. And don't have extensive time.

Why guilt us?

We are neither companies' nor foundations' bitch.

Packaging in Python is a mess, and everyone wants to contribute to the shiny new features, while very few want to deal with the toilet cleaning that packaging (PyPI, deb, rpm...), maintenance, dependency management, and bug-report platform consistency involve.

As a maintainer, I am bored of being expected to deal, without a thanks, with all the shit coming from the social pressure of "you should do this or blah": complying with esoteric, unproven needs that only result in more work, more mess, and layers of bureaucratic ideas disguised as "best practices".

I am no one's bitch. I am an open source coder. I code what I want, when I want, at the speed I want, and I am no slave that will do what he is being ordered.

I have a feeling that by the time this happens, a lot of the Python developers will have moved to Go or some other language.

Python 2 isn't going anywhere so long as the multi-million line 2.X codebases at big tech companies exist. Every big Python codebase I'm aware of is 2.x.

This sort of war between Python 2 and Python 3 is why I never looked seriously into Python. I don't know where should I start.

It's the same language. This is not the break between Perl 5 and 6, which are different languages keeping only the brand.

I find it very easy to write code which is simultaneously Python 2 and 3. A few ``from __future__ import`` statements and you'll be fine. The only reason some people are complaining is that they were ignoring bugs in their code regarding text data. It's not a Python problem, it's a problem with computers -- bytes are not text, but many systems ignore the difference.
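Concretely, the bug class I mean is code that mixes the two types. Python 3 makes the mistake blow up immediately instead of corrupting data later (a small sketch):

```python
data = "héllo".encode("utf-8")   # bytes: what actually goes over the wire
text = data.decode("utf-8")      # str: text your program reasons about

assert isinstance(data, bytes) and isinstance(text, str)

try:
    "prefix-" + data             # implicit mixing: silently "worked" on 2.x
except TypeError as exc:
    print("Python 3 rejects it:", exc)
```

If your 2.x code already decodes at the boundaries like this, porting is mostly mechanical; if it doesn't, Python 3 is exposing bugs you already had.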

Many large enterprises are starting new projects in Python 3 exclusively. It's fine if you want to use some Python 3-only syntax.

> It's the same language.

No it's not. The syntax is what defines the language:

    In linguistics, syntax is the set of rules, principles,
    and processes that govern the structure of sentences
    in a given language.

It doesn't have the same syntax. It's not the same language, by definition.

Then why do we have two different words -- syntax and language? Python is a high level language, a bytecode, a standard library, and an interpreter (or many interpreters). You might also lump into that a culture, a community, and a Zen.

When we say it's the same language, most of those aspects are the same from Python 2 to Python 3.

With such a definition, Python 2.5 and Python 2.6 are different languages, because they have different syntax (Python 2.5 does not support the "with" statement).
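(For what it's worth, 2.5 could opt in to that syntax a release early, which is exactly the kind of transition path people wish 2-to-3 had more of:)

```python
# On Python 2.5 this import enabled the "with" statement a release
# ahead of 2.6; on 2.6+ and on all of 3.x it is a harmless no-op.
from __future__ import with_statement

import tempfile

with tempfile.TemporaryFile() as f:
    f.write(b"hello")
```
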

Do you consider this definition to be nothing but absurd? I think it has a lot of merit.

What impact does one or the other way of thinking (absurd vs. has merit) about this kind of change (a potential break in backward compatibility between language versions, due to the addition, removal, or alteration of one or more language features) have on the language and its ecosystem and culture?


* such potential breakage is relatively benign iff there's an easy-to-follow technique (eg using a linting tool or a program editor's search feature) that always ignores/accepts code compatible with the older language version and always catches/rejects code compatible with the newer language version.

* a further improvement on this is to have the inter-version checking done by a language's interpreters/compilers.

* best of all is when the language is at least partly compiled and compilation using the older interpreter/compiler does the rejecting at compile-time.


Honestly, that was a big part of it for me too, and when node.js and io.js originally split I worried about something far worse for the JavaScript community. Fortunately node got its shit together (for the most part), but every time I try to use Python 3 for a project, someone else tells me I can't because X, Y and Z, so I have yet to use any Python 3. Kinda makes me sad in a way, because it has lots of things I want to take advantage of.

But I do feel like tides are shifting and Python 3 is slowly "winning". It's just taking, what, almost a decade longer than most would have hoped?

Python 3.5, installed by default on LTS distributions, and available everywhere else.

Output from my systems right now:

"Ubuntu 15.10" - Python 2.7.10

"Ubuntu 14.04.4 LTS" - Python 2.7.6

Not sure where you're getting the "by default 3.5" from.

    > python3 --version
Or: http://packages.ubuntu.com/xenial/python3

the python executable will probably be python 2 for quite some time to come. type python3 --version

Python 3.6

No, still in alpha. If you're just tinkering, sure.

Same here with me. But seeing how nothing is being done in perl anymore, I am forcing myself to at least start using python3. I don't see any point in learning ruby, and all the other new languages out there are sorta still niche. There is plenty going on though with python and django (at least jobwise)

But yes, the "now there are two ways to do it" brought on by 2 and 3 really sux.

I should also say I always hated the "small" syntax changes even between point releases. This was also something that kept me away (Ruby has this problem too). Perl never had this problem: stuff written in 2001 can still run on the Perl of 2016. But I see now that this is a double-edged sword. I have worked for several companies that run modern Perl versions but refuse to touch their code base, written like it was 1998, because it just "still works". That was another reason I started looking into Python. I very much welcome complete breakage every few years at this point. It will at least force people into staying modern with the stack they use instead of becoming a 15-year-old fossil with a disgusting resume. (I mean, what would you do if someone sent a resume to your dev shop saying they have only done CGI Perl for the last 15 years and have no idea what JavaScript is or why they should learn it?)

Start with Python 3; the differences aren't anything you can't pick up in an hour, and since you won't have an existing Python 2 code base, you will probably never need to.

The biggest issue people are having is converting their Python 2 code to work on Python 3. There's virtually no downside to using Python 3. All maintained libraries work with Python 3.

Please stop working so hard to piss in my soup.

I plan to keep using Python 2.* until the sun grows cold or I die (whichever comes first.) So this entire effort seems like a busybody with nothing better to do working hard to screw me over.

Knock it off!

I finally was able to start new projects in Python 3 because most major libraries are compatible.

Is there any reasonable measurement of Python 3 market share? I guess it's still <25%.

No there isn't. People used to be able to extrapolate it from the pypi downloads stats (like this: https://alexgaynor.net/2014/jan/03/pypi-download-statistics/), but sadly it's not possible anymore.

There is https://caremad.io/2015/04/a-year-of-pypi-downloads/ which is also based on PyPI downloads and is more recent.

Oh, thanks! It's kind of crazy to see Python 3.x usage stuck around 7%, though.

Note: that's from a year ago.

It's tough to know but I keep an eye on this stuff. You can count on a solid ~10% marketshare for Python3. Going too far beyond that without updated numbers would be overly generous.

Roughly 12% of PyPi has been ported to Python3[0] and about 10% of pip downloads are Python3[1].



More importantly (to me) is that Py3 uploads have now exceeded Py2, according to https://twitter.com/chupapuma/status/730088983586181120

I saw that one too, but Py3 hasn't actually intersected with Py2 yet, and uploads will level off. It has no bearing on the market-share question; using it that way would be misleading. It does show some momentum.

Most everything you would want already exists on Python2. A lot of those uploads are forks or new projects that already have solutions in 2. I have no problem with Python3, and I'm glad to see people putting their money where their mouth is and "getting to porting", rather than blogging, complaining about the userbase, and cooking up backroom deals and conspiracies to kill Python2.

What do you mean by "market" in this context? Percentage of firms using either? Or do you just mean SLOC?

