It is even easier using conda.[1] Install whichever version you use the most (likely Python 3) using either the full Anaconda installer or the simpler Miniconda installer, then use conda to create an environment for the other Python release. Conda has another benefit: you can create custom environments for analyses, where "dependency hell" is a common problem...
You know what? I appreciate the authors drill-down into the "how" of this issue but I'd like to focus on the "why".
Why has it taken so long to get off of Python 2.x? Granted, I understand that a few libraries (cough, Twisted, cough) have been holding back the projects that depend on them. But I haven't written a single block of code in half a decade that wasn't both Python 2.x and Python 3.x compatible.
At what point will we exclusively be developing on 3.x? I'm quite tired of the apologists claiming that making/maintaining things in 2.x is totally fine. At the very least you could run a pass of `2to3` (which is generally quite mature today) before claiming that it's difficult/impossible/too time consuming.
The real reason is that the Python community has been TOO NICE.
NodeJS and Ruby broke everything.
What did they do?
They said "move your butt" and gave a small window of time.
Everybody moved.
Python?
We gave 10 years. Created tools. Tutorials. A way to write Python 2/3 compatible programs. Then gave 5 more years.
The result?
Nobody moved for 9 years, and people spread FUD about how hard it is.
But the reality is: most people are not Google. They don't have thousands of 2.7 files to convert.
If you have a 20k LOC Python project, you can convert it in 2 days. After which you will get a MUCH, MUCH better Python version anyway, so it's largely worth it.
The way I remember it is that people were complaining about all the changes in Py2 that were breaking their code all the time. So it was decided to leave Py2 alone and do all the breaking stuff on Py3. The staticness of Py2 made it an attractive target for new development. As a result Python became very popular.
So if they had just broken everything, that very easily could have been the straw that broke the camel's back. Python would not have achieved the popularity it did.
I think the fundamental error here was the idea that languages become popular because of some fundamental quality of the language as a language. That never really happens. It is always something else. In the case of Python it was the "Batteries Included" thing. It came with a whole whack of functionality in the form of libraries written in other languages. No one much cared about the details of the framework for getting at that functionality.
I have a Python project with 28KLOC of Python, 22KLOC for a Python/C extension, and 26KLOC for the tests, so about 4x larger.
It took me about 6-8 weeks to port.
Part of it was because I had to change the API. I had an API with a method like "X.to_string(fmt)" where the format specifier could be something like 'csv' for text and 'csv.gz' for gzip-compressed text.
I had to split that out into "X.to_string()" and "X.to_bytes()" variants.
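As a sketch of what such a split can look like (the class and method bodies here are invented for illustration, not the author's actual code; I'm assuming gzip for the compressed variant):

```python
import gzip

class Table:
    """Toy stand-in for the X object described above."""
    def __init__(self, rows):
        self.rows = rows

    def to_string(self):
        # Always returns str (text).
        return "\n".join(",".join(row) for row in self.rows)

    def to_bytes(self, compress=False):
        # Always returns bytes; compression is now an explicit flag
        # instead of being inferred from a 'csv.gz' format string.
        data = self.to_string().encode("utf-8")
        return gzip.compress(data) if compress else data

t = Table([["a", "b"], ["1", "2"]])
assert t.to_string() == "a,b\n1,2"
assert gzip.decompress(t.to_bytes(compress=True)) == b"a,b\n1,2"
```

The point of the split is that callers can no longer receive bytes where they expected text, which is exactly the confusion Python 3's str/bytes separation forces into the open.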
I also deal with formats which almost always only use ASCII but where people end up putting in Latin-1 or UTF-8. The API was something like "X.get_field(name)", where name was a byte string, and it returned a byte string.
(GIGO and it's your responsibility to deal with your byte strings.)
In Python 3, X had to acquire "encoding" and "errors" parameters, so it could know the expected encoding for the entire record. But in practice each field could have a different encoding, so I ended up also adding "X.get_field_as_bytes(name)". The 'name' can be a Unicode or byte string.
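A minimal sketch of that two-accessor shape (the method names mirror the comment; the class body is invented for illustration):

```python
class Record:
    """Toy record with byte-string fields and a record-level encoding."""
    def __init__(self, fields, encoding="utf-8", errors="strict"):
        self._fields = fields          # dict mapping bytes name -> bytes value
        self.encoding = encoding
        self.errors = errors

    def get_field_as_bytes(self, name):
        # Accept either a Unicode or a byte-string field name.
        if isinstance(name, str):
            name = name.encode("ascii")
        return self._fields[name]

    def get_field(self, name):
        # Decode using the record-level encoding; callers needing a
        # per-field encoding fall back to get_field_as_bytes().
        return self.get_field_as_bytes(name).decode(self.encoding, self.errors)

r = Record({b"title": b"caf\xc3\xa9"})
assert r.get_field("title") == "café"
assert r.get_field_as_bytes(b"title") == b"caf\xc3\xa9"
```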
These changes to return a Unicode string from a byte string ended up adding a lot of code of the form:
    try:
        field = field_bytes.decode(self.encoding, self.encoding_errors)
    except UnicodeDecodeError as error:
        die("Cannot decode field %r (%r): %s" %
            (field_name, field_bytes, error))
All of these new branches then meant adding unit tests to ensure coverage.
Parts of the code supported passing in a byte string, Unicode string containing only ASCII characters, or buffer object. There's no easy way to handle that in the API, so I eventually dug into Python's own source code to copy how Modules/binascii.c supports it.
I also had to switch all my code to use the new buffer API. Only to find I missed a PyBuffer_Release(), which caused a slow memory leak.
Then the performance in Python 3 was slower than Python 2, so I pushed more of the Python-level code into a C extension.
I'm not convinced the new API is all that much better. It's certainly harder to implement and therefore maintain, and I don't have all that many users.
But I can say that "you can convert that in 2 days" feels like an optimistic statement. The test suite contains over 2000 byte string constants, and even at 5 changes per minute that one change alone would take 6 hours.
Admittedly, part of the reason it took 6-8 weeks was because the result works under both Python 2.7 and Python 3.5+. It is not so easy to make compatible Python/C extensions. But as literally all of my users were using Python 2.x, I wasn't going to drop support for at least Python 2.7.
(I learned a few months ago that one potential customer was still using Python 2.6 on their compute cluster. They have since upgraded to Python 2.7. I warned them about 2020.)
You took my "port 20K LOC of Python code" to "take 70K LOC of mixed Python and C and make it compatible with 2 and 3". That's an entirely different story, and is definitely not the average project.
Agreed, but are you saying the scaleup is non-linear in LOC?
Because simple linear scaleup suggests 6 days of work, not 30-40 days of work.
I also pointed out that just the test suite, which is not much bigger than 20 KLOC, required hours of work to tweak all of the b"" strings correctly.
One of the other tedious difficulties is that this code base started during Python 2.5, so there were many submodules with print statements which needed to be turned into print functions.
(Annoyingly, some "print >> f, s" debug lines, enabled with an undocumented environment variable, made it to release, because that is still valid syntax under Python 3 — it only fails at runtime with "unsupported operand type(s) for >>: 'builtin_function_or_method'" — and it wasn't part of the test suite.)
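The failure mode in that parenthetical is easy to reproduce (this snippet is just an illustration, not the author's code):

```python
import io

f = io.StringIO()
# Python 2: "print >> f, s" redirects print output into f.
# Python 3: the same line still PARSES ("print" is just a name, so this
# is the tuple (print >> f, s)), but evaluating ">>" fails at runtime.
try:
    print >> f, "debug message"
    failed = False
except TypeError as error:
    failed = True
    message = str(error)

assert failed
assert "unsupported operand type(s) for >>" in message
```

Because it's a runtime error rather than a SyntaxError, nothing flags the line until it actually executes — which is why an untested debug path can slip through.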
I looked at another code base of mine, which is 18.5 KLOC (9.5KLOC my source, 4.5KLOC for a third-party submodule which already supported Python 3, and 4.5KLOC of a machine generated table which needed no porting). This one started under Python 2.7, and made sure to use __future__ compatibility even when it was Python 2.7-only. That helped reduce the porting cost.
There are 23 commits which specifically mention fixing compatibility problems. However, I did the Python 3 update during a major refactor, and included Python 3 changes as part of the refactoring. I then re-tested under Python 2 to check for compatibility. This means I can't identify all of the changes which are 2->3 specific.
These are mostly the usual iteritems()->items(), zip()->list(zip()), itertools.izip()->zip(), a compatibility shim for isinstance(obj, basestring), open(name, "U") vs. open(name, "r", newline=None), etc.
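For reference, the basestring shim mentioned there is typically just a try/except at import time — a common pattern, not necessarily the author's exact code:

```python
try:
    string_types = basestring   # Python 2: covers both str and unicode
except NameError:
    string_types = str          # Python 3: only str counts as "stringy"

def is_stringy(obj):
    """Cross-version isinstance check for text-like objects."""
    return isinstance(obj, string_types)

assert is_stringy("text")
assert not is_stringy(42)
```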
Some of the trickier ones were: how to read bytes from stdin (sys.stdin.buffer); explicit flushes on sys.stderr (not needed for Python 2.x); a StringIO-like wrapper around BytesIO so I could write() both bytes and Unicode strings containing only ASCII (which is what cStringIO.StringIO did); and a few workarounds for changes in argparse behavior.
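The BytesIO wrapper described is small but fiddly; here is a minimal sketch (the class name is hypothetical) that accepts both bytes and ASCII-only text on write(), the way cStringIO.StringIO did under Python 2:

```python
import io

class ASCIIBytesWriter:
    """Accepts write() of bytes or ASCII-only str, buffering as bytes."""
    def __init__(self):
        self._buf = io.BytesIO()

    def write(self, data):
        if isinstance(data, str):
            # Mirrors cStringIO.StringIO: non-ASCII text raises an error.
            data = data.encode("ascii")
        return self._buf.write(data)

    def getvalue(self):
        return self._buf.getvalue()

w = ASCIIBytesWriter()
w.write(b"header: ")
w.write("value")
assert w.getvalue() == b"header: value"
```

(Reading bytes from stdin is the simpler half of the problem: in Python 3, `sys.stdin.buffer` exposes the underlying binary stream.)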
The trickiest is that I allow the user to pass in an eval-able mathematical expression on the command-line. My code first compiles the expression and checks that all of the global variables are known, in order to generate a more detailed error message than Python itself would. In Python 3, True and False are keywords, while in Python 2 they are built-ins. I didn't originally allow "True"/"False" in the expression, so there could be simple expressions which were valid under the Python 3 version but not under the Python 2 version.
While the change was easy, figuring out that I needed the change was harder.
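That name-checking trick can be sketched with compile() and the code object's co_names; the ALLOWED_NAMES whitelist and error text below are illustrative, not the author's. It also shows the cross-version wrinkle: in Python 3, True and False are keywords and never appear in co_names at all.

```python
ALLOWED_NAMES = {"x", "y", "abs", "min", "max"}   # illustrative whitelist

def check_expression(expr):
    """Compile an eval-able expression and reject unknown global names."""
    code = compile(expr, "<cmdline>", "eval")
    unknown = set(code.co_names) - ALLOWED_NAMES
    if unknown:
        raise ValueError("unknown name(s): %s" % ", ".join(sorted(unknown)))
    return code

code = check_expression("abs(x) + min(y, 3)")
assert set(code.co_names) <= ALLOWED_NAMES

# True/False are keywords in Python 3, so they don't show up as names;
# under Python 2 they were ordinary builtins and did.
code = check_expression("x if True else y")
assert "True" not in code.co_names
```

This is why an expression using True could pass validation under one version and fail under the other: the set of names the validator sees differs between Python 2 and 3 for the exact same input.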
I don't know how long it took for this code base. Based on the bunching of the commit messages, it looks like I spent at least 16 hours (= 2 days) on it, figuring from first commit of each bunch to the last.
% hg history | grep --before 1 -i 'python.*[23]' | grep date
date: Fri Jun 02 05:04:30 2017 +0200
date: Fri Jun 02 03:51:28 2017 +0200
date: Thu Jun 01 03:37:37 2017 +0200
date: Thu Jun 01 01:16:33 2017 +0200
date: Mon May 29 17:07:00 2017 +0200
date: Sun May 28 04:48:14 2017 +0200
date: Sun May 28 03:38:34 2017 +0200
date: Sun May 28 03:18:56 2017 +0200
date: Sat May 27 00:28:08 2017 +0200
date: Fri May 26 16:29:03 2017 +0200
date: Fri May 26 16:28:43 2017 +0200
date: Fri May 26 15:37:40 2017 +0200
date: Fri May 26 14:26:47 2017 +0200
date: Fri May 26 13:56:42 2017 +0200
date: Thu May 25 04:38:45 2017 +0200
date: Thu May 25 03:52:04 2017 +0200
date: Thu May 25 03:51:29 2017 +0200
date: Thu May 25 03:51:01 2017 +0200
date: Thu May 25 03:36:34 2017 +0200
date: Thu May 25 01:33:26 2017 +0200
date: Wed May 24 18:44:42 2017 +0200
date: Wed May 24 16:52:40 2017 +0200
date: Wed May 24 16:51:47 2017 +0200
This is about twice as long as your estimate would suggest. (I'm excluding the auto-generated data table and the third-party package from my LOC count.)
> Agreed, but are you saying the scaleup is non-linear in LOC?
Yes. Complexity is always exponential.
> Because simple linear scaleup suggests 6 days of work, not 30-40 days of work.
Well, again, your project is particularly hard: a C extension plus a Python 2/3 compatible code base. That's not the common case.
You can always find somebody with a particular situation. And we hear those people more than the hundreds with just a few scripts or a website.
> I also pointed out that just the test suite, which is not much bigger than 20 KLOC, required hours of work to tweak all of the b"" strings correctly.
That's a Python 2/3 compatible code base for you. This tweak is almost a no-brainer in pure Python 3.
> many submodules with print statements which needed to be turned into print functions.
2to3 does that automatically, along with many other fixes that you can even cherry-pick. If you don't use the provided tooling, you make your life harder.
> These are mostly the usual iteritems()->items(), zip()->list(zip()), itertools.izip()->zip(), a compatibility shim for isinstance(obj, basestring), open(name, "U") vs. open(name, "r", newline=None), etc.
python-future provides all the helpers for that: normalized builtins, import aliases, wrappers, etc. It has existed since 2013. People need to spread the word instead of reinventing the proverbial wheel on this one.
> Some of the trickier ones were: how to read bytes from stdin (sys.stdin.buffer); explicit flushes on sys.stderr (not needed for Python 2.x); a StringIO-like wrapper around BytesIO so I could write() both bytes and Unicode strings containing only ASCII (which is what cStringIO.StringIO did); and a few workarounds for changes in argparse behavior.
Yeah, that's the hard part. You can't automate that. But it's what, 3% of your code? Unless you hate DRY.
It's quite fair that bumping an entire project to a non-compatible version of your platform requires you to do this.
> The trickiest is that I allow the user to pass in an eval-able mathematical expression on the command-line. My code first compiles the expression and checks that all of the global variables are known, in order to generate a more detailed error message than Python itself would.
Quite a cool project :)
> This is about twice as long as your estimate would suggest. (I'm excluding the auto-generated data table and the third-party package from my LOC count.)
Granted. Still, 4 days of porting for an entire project, one time, in a 25-year-old language feels OK in my book.
Don't forget that since 2000:
- Perl 6 has been late for a decade.
- PHP failed its V6 and jumped to V7.
- NodeJS forked and merged twice.
From this point of view, what Python did is an amazing accomplishment.
If "Complexity is always exponential", then a 10 KLOC project should take less than half of the 2 days you estimated for a 20 KLOC project. Which is contrary to my experience.
> This tweak is almost a no-brainer in pure Python 3.
The timing estimate I made assumed 5 changes per minute. That's already pretty no-brainer. Are you suggesting it should have been faster than that? I don't know how. Not all of the strings needed to be converted to byte strings.
> python-future provide all the helpers for that
First time I heard of it. I thought we were supposed to be using 'six', which is where I got the shims from.
If I understand correctly, that package should not be used for Python libraries which want to maintain Python 2 compatibility. That is, if I have library X, and some of my users are on Python 2, then I shouldn't use python-future because the 'standard_library.install_aliases()' call modifies builtins, which may change how Python 2 works for those users.
> 4 days of porting for an entire project, one time, in a 25-year-old language feels OK in my book.
Yes. My point is that I think your original numbers are optimistic, not that they are fundamentally flawed.
No Mac comes with Python 3 installed. So if I want to distribute simple scripts, I can either send a file and say "run this", or give a long list of instructions to install Python 3. Similarly, many Linux distributions still don't distribute Python 3, or only very recently started to.
I don't know of any place which would have Python 3 and not Python 2.
So, the question is, why should I switch to Python 3? Python 2 is everywhere, Python 3 is not, and as far as I'm concerned I don't care about the differences.
Only running things that are installed by default is an odd concept. Sure, it's an extra step, and you have a "similar enough" language sitting in the base OS. But I don't stick to vi because vim isn't installed, and I don't stick to sh because bash isn't installed, etc. Sometimes the improvements are worthwhile.
But the need to "upgrade" is not arbitrary. I wouldn't claim that you need to upgrade just to maintain support, for instance, since dropping support for 2.x looks arbitrary in isolation.
I would instead claim that the features in Python 3 (and some of the backwards-incompatible fixes like UTF-8 everywhere) are incredibly desirable: static method decorators, improvements to string formatting, and of course asyncio and other goodies like ordered dictionaries and keyword-only arguments. There's a lot going on there and most of it is good stuff, but it's impossible to backport because 2.x is architecturally broken and required a breaking upgrade.
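Two of the smaller goodies mentioned above, sketched (the function and values are made up for illustration):

```python
# Keyword-only arguments: anything after the bare * must be passed by name.
def connect(host, *, port=5432, timeout=10):
    return (host, port, timeout)

assert connect("db.example.com", port=5433) == ("db.example.com", 5433, 10)
try:
    connect("db.example.com", 5433)      # positional port: rejected
    rejected = False
except TypeError:
    rejected = True
assert rejected

# Ordered dictionaries: plain dicts preserve insertion order
# (an implementation detail in CPython 3.6, guaranteed by the language
# from 3.7 onward).
d = {"b": 1, "a": 2}
assert list(d) == ["b", "a"]
```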
In answer to your question: I have Python 3 installed on all servers. SaltStack supports it, so even on Windows we use Python 3.
Don't people generally just use some sort of package manager to distribute scripts and their dependencies? I don't use Mac so maybe they're way behind the curve in this.
Well, as I write in the article, Django 2.0 was a turning point for me. I was always using Python 2.7.x and was really happy with it, but since Django 2.0 does not support Python 2.x anymore, I have been forced to move to Python 3.x in order to be able to write Django 2.0 apps.
Yes, I know that Django 1.11 (which works with Python 2.7.x) will be supported until 2020, but out of principle I don't want to start a new project on the previous Django version. Upgrading my old Django projects to 2.0 is another (rather painful) story, of course...
It didn't work for me when I tried it out; there was key functionality broken for various tasks I was using. I'm sure it will be fixed eventually, but there's still some work to be done IME.
Well, systemd was useful back when Linux had a thousand different startup systems and you wanted the same easy one for all of them.
But indeed, now with systemd becoming ubiquitous, it's a better solution unless you target also windows.
But honestly, when I need something to manage my processes, I just attach them to my crossbar.io instance, since I have it running anyway. This way it's cross-platform and eases my deployment.
It's insane to think that Python 2 is going to go away because of the existence of Python 3. That is simply not how computer languages work. Python 2 has more than enough critical mass to continue pretty much forever in some form. It is even a reasonable target for new development because it is pretty much guaranteed to be stable. I would not be surprised if there was another initiative to fix Python 2's Unicode handling that would create a fork from Python 3 in the future as Python 3 took an approach that did not catch on.
> Python 3 took an approach that did not catch on.
In your world, maybe, but 97% of the projects and trainings I have taken part in since Python 3.5 came out were Python 3. And this spans a lot of companies and countries.
Elephant in the room is that this is a chicken/egg problem.
The best of both worlds is a temporary transition period where you can run both (like we want both native IPv4 and native IPv6). Unfortunately, because that doesn't get adopted either, we end up in a situation where people need to pick one.
Articles like these help, but I use Chocolatey to maintain software on a Windows system.
So true. Just last week I thought I would start to use Python 3 because I was starting a new project and had a new computer. I installed Anaconda Python 3, only to have it crash when starting Jupyter, because the new Python 3 version doesn't support backtick quoting for some reason, and Jupyter was using backtick quoting.
Oh well, back to python 2 for me, which will be stable for at least 2 years. Hopefully by then, I will be able to switch to another language.
No one mentioned this, and maybe I got things wrong, but Python 3 is significantly slower than Python 2. At least this is certainly true using Spyder (maybe it's not in other cases).
Laptop, Py 2.7 (4 cores, 3 GHz, 16 GB RAM) vs. desktop, Py 3.5 (6 cores, 4.5 GHz, 64 GB RAM), both Windows. When I run my models for Kaggle competitions, my laptop usually finishes first by about 20% of total time, unless the model I am running can take advantage of my GPU.
edit: I never figured out why, but if I generate a submission with 3.5.x, 75% of the time its accuracy is significantly worse.
Very interesting anecdote, perhaps something wrong with the build?
I know that my python3 saltstack returners return before the python2 ones.
I have not done any probing into the exact performance profile of Py2 vs. Py3, though. I would be surprised if Python 3 was actually slower, unless it's to do with the UTF-8 everywhereness or something.
Easy: python 2 works, and python 3 offers me nothing I need to justify any effort to port code. Python 3 is a different language from python 2 (so people keep saying), and there's no business case for changing my stack's language since my stack works fine right now.
Well, I am Greek, so accents would be the smallest of my problems if I tried to use my actual name as a username (Σεραφείμ) :)
Installing stuff on directories containing characters that are not plain-old ASCII letters and underscores (for example there are huge problems when installing things on directories with spaces in Windows) is a major no-no and I didn't even think of mentioning it.
And with the Anaconda distribution, you can't have any special characters or spaces in the path of the installation directory. And on some machines that have multiple users, the default installation directory is a hidden folder, which causes all sorts of issues and requires a reinstall.
Just some of the issues I've encountered running python workshops. Linux and Mac users have far fewer issues.
As I have minimal programming knowledge, I don't use Windows as my dev machine. I am just stating some problems that you can face if you use your real name. For example, I was once working for a contractor and they all had Windows machines with pregenerated users; mine had an accent in the username.
I get around this by installing both versions of WinPython. It's super easy since you install them into separate directories. You can double-click the command prompt shortcut that links to the folder and use pip to your liking for each version. Changing the interpreter in PyCharm is about two clicks too.
Once you embrace it, using Python 2.x or 3.x on Windows (or MacOS, or Linux) involves changing 1 line in 1 file and you're pretty much done. Each app you develop can also use whatever version you want with total isolation.
There's no Windows hoops to jump through or virtual environments to create.
My approach to using both py2 and py3 in windows:
Use 2 folders of WinPython (completely portable) for py2 and py3.
A shell script which changes all the necessary env vars from 2->3, 3->2.
Just double-click on the shell script when you want to switch and you're done.
This was my earlier approach before I moved to using 3 for most things. At that time, I tried moving env vars to 3->2 and using the Windows Launcher and haven’t looked back. It makes everything work so much better. This article actually does a fairly insufficient job of explaining it. The official docs on the Windows Launcher are worth working through. Being able to use #! in my scripts and work seamlessly with ‘py’ commands in terminal is superior in every way.
[1] https://conda.io/docs/