

Python 3 in Science: the great migration has begun - ngoldbaum
http://astrofrog.github.io/blog/2015/05/09/2015-survey-results/

======
krychu
> The main reason for Python 2 users to not switch to Python 3 is the lack of
> motivation/killer features. We need to therefore be more proactive in
> encouraging people to switch to Python 3 by (a) making sure that any new
> users are always directed to the latest Python 3 version, and (b) releasing,
> in the near future, new major versions of packages for Python 3 only, while
> maintaining long term bugfix support for Python 2 versions.

That's the evil right there. Read carefully what's written.

" _The_ main reason for Python 2 users to not switch to Python 3 is the lack
of motivation/killer features". Which says that users _are_ familiar with what
Python 3 has to offer but consider it not good enough. Why would you then jump
to the conclusion that you have to be more proactive in directing people to
switch to Python 3? Or even better, to stop adding features to what 81% of
people use?

~~~
ubernostrum
In my experience the main reason is actually people who just looked at 3.0,
decided it didn't have anything new and required work to port over, and then
never looked at 3.1, 3.2, 3.3, 3.4 or the in-progress 3.5.

Which means those people don't know about/don't get to use, among other
things:

* Vastly improved unittest module, including mock objects

* Dictionary-based logging config

* The sysconfig module

* The pathlib module

* Built-in enums

* Single-dispatch generic functions

* The statistics module

* The asyncio module

* The "yield from" syntax to delegate to generators

* The lzma module

* The ipaddress module

And that's without getting into 3.5, which adds things like the matrix
multiplication operator that science-y packages _will_ care about and probably
port for.
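
A few of the items listed above, as a minimal sketch (standard library only, Python 3.4+). The file path, sample values and function names are made up for illustration.

```python
import statistics
from functools import singledispatch
from pathlib import PurePosixPath

# pathlib: paths as objects instead of string juggling (3.4+).
# PurePosixPath does no filesystem access; the path itself is hypothetical.
data_file = PurePosixPath("/data/run42") / "spectrum.fits"
stem, suffix = data_file.stem, data_file.suffix

# statistics: basic descriptive statistics without NumPy (3.4+).
samples = [1.0, 2.0, 2.0, 3.0, 7.0]
mean, median = statistics.mean(samples), statistics.median(samples)


# singledispatch: choose an implementation from the first argument's type (3.4+).
@singledispatch
def describe(value):
    return "object"


@describe.register(int)
def _(value):
    return "integer"


@describe.register(list)
def _(value):
    return "list of %d items" % len(value)


# "yield from": delegate to a sub-generator (3.3+).
def flatten(nested):
    for item in nested:
        if isinstance(item, list):
            yield from flatten(item)
        else:
            yield item


flat = list(flatten([1, [2, [3, 4]], 5]))
```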

~~~
lqdc13
Almost none of these things are used in scientific computing though. Maybe
asyncio and yield from.

~~~
ubernostrum
I suspect enums have uses, as do the improved async code constructs (and
there's more of that in the pipeline).

~~~
m_mueller
Yes, I do use python for scientific applications and enums are one of the
things I miss - I've built around it in python 2.x by using this construct:

    
    
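        def enum(*sequential, **named):
            # Number the positional names 0..n-1, then add any explicit name=value pairs.
            enums = dict(zip(sequential, range(len(sequential))), **named)
            # Build a throwaway class whose attributes are the enum members.
            return type('Enum', (), enums)
    
        Init = enum(
            "NOTHING_LOADED",
            "DEPENDANT_ENTRYNODE_ATTRIBUTES_LOADED",
            "ROUTINENODE_ATTRIBUTES_LOADED",
            "DECLARATION_LOADED",
        )
    
        myState = Init.NOTHING_LOADED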
    

3.x is still a non starter for me since many clusters where I want my software
to work still only come with 2.6, e.g. I can't even use dictionary
comprehensions. Adoption for scientific software is heavily influenced by how
many external dependencies you have - requiring users without root access to
compile python 3.x for their cluster home is a no-go.
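
For comparison, on Python 3.4+ the same states could be written with the standard-library enum module. This is only a sketch, not a drop-in replacement: the 2.x helper above yields plain integers, while Enum members are distinct objects that carry a name and a value. The explicit values are chosen here to match what the helper would assign.

```python
from enum import Enum


class Init(Enum):
    # Values mirror the 0..n-1 numbering of the 2.x helper.
    NOTHING_LOADED = 0
    DEPENDANT_ENTRYNODE_ATTRIBUTES_LOADED = 1
    ROUTINENODE_ATTRIBUTES_LOADED = 2
    DECLARATION_LOADED = 3


my_state = Init.NOTHING_LOADED

# Members can be looked up by value and compared by identity.
last_state = Init(3)
```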

~~~
ubernostrum
No offense meant, but if you're at a stage where you use a pre-2.7 version
because that's what comes with the OS, then it's not really a case of "looked
at 3.x and decided it didn't have anything".

OS vendors who will be supporting ancient Python versions until the next
decade are a blight on the entire ecosystem.

~~~
m_mueller
> it's not really a case of "looked at 3.x and decided it didn't have
> anything"

I agree, this isn't the main reason why I'm not looking at 3.x.

------
skierscott
I'm a scientific user with no incentive to switch; I'm still on Python 2.7.

But the release of the @ matrix multiplication operator in Python 3.5 gives me
strong incentive to switch. I'll be switching as soon as Anaconda supports
Python 3.5.
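
As a rough illustration of what the operator does: in Python 3.5+, `a @ b` calls `a.__matmul__(b)` (PEP 465). The `Matrix` class below is a toy written for this example, assuming compatible shapes; in practice one would use the operator on array types from a numerical library rather than roll one's own.

```python
class Matrix:
    """Tiny list-of-lists matrix, only to illustrate the @ operator."""

    def __init__(self, rows):
        self.rows = [list(r) for r in rows]

    def __matmul__(self, other):
        # Standard row-by-column product; no shape checking in this sketch.
        cols = list(zip(*other.rows))
        return Matrix(
            [[sum(x * y for x, y in zip(row, col)) for col in cols]
             for row in self.rows]
        )


a = Matrix([[1, 2], [3, 4]])
b = Matrix([[0, 1], [1, 0]])  # swaps the two columns of a
c = a @ b
```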

------
analog31
I'm a scientific user, and work in an R&D environment for scientific
instrumentation. I'm probably the most advanced Python user at my site, but
have inspired a number of colleagues to give it a try. We use it for pretty
much everything but developing our actual commercial software.

Perhaps an added dimension worth studying is that Python is so widely used by
people who have little chance of even understanding the differences, aside
from the print() function. I'm not sure that I could clearly articulate them
myself.

I migrated to 3.4, just so I could find out and then tell people from my own
experience that 3.4 is not broken, and that it supports a sufficiency of
packages. In other words I'm using myself as a guinea pig. At the same time, I
offer to help people by maintaining my own programs in a way that lets them
run on 2.7 or 3.4 systems, e.g., with some kind of "if version > 3" verbiage.
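
The "if version > 3" approach might look roughly like this. It is a minimal sketch: `PY3`, `text_type` and `iteritems` are illustrative names, not taken from anyone's actual programs, and the numbers are the survey percentages quoted elsewhere in this thread.

```python
import sys

# True on any Python 3.x interpreter, False on 2.x.
PY3 = sys.version_info[0] >= 3

if PY3:
    text_type = str

    def iteritems(d):
        return iter(d.items())
else:
    text_type = unicode  # noqa: F821 (this name only exists on Python 2)

    def iteritems(d):
        return d.iteritems()


# The rest of the program uses the shims and never checks the version again.
usage = {"2.7": 81, "3.4": 16}
total = sum(share for _, share in iteritems(usage))
```

Keeping the version check in one place, rather than scattering it through the code, makes it easier to delete once 2.x support is no longer needed.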

Given that I know my audience, I can tell them confidently that they will not
get hung up by the 2 vs 3 dichotomy, and that I will help them if it ever
becomes an issue. I think the benefits of Python, including the giant
ecosystem of packages, outweigh the risks of choosing the wrong version for
most of us.

Edit: One more thing for beginners, a lot of us have working Python
installations on our computers, but we don't quite know how or why. A useful
migration tool might be a program that analyzes your 2.x installation, tells
you what you've got, and maybe even offers to build an identical 3.x
installation for you.

------
coldtea
> _But this is all wrong – we should be teaching new users to use Python 3!
> New users won't thank you if you teach them Python 2 and they have to
> migrate all their scripts to Python 3 in a few years..._

Believe me, they won't have to. Python 2 will stay with us for a loooong time,
and most NEW web stuff AND scientific work is done with it.

> _However, the Python developers have now stated that there will be no Python
> 2.8 release. Essentially, no new features are going to be added to Python 2.
> In fact, after 2020 (which is not so far in the future), Python 2 will no
> longer be supported._

Yeah, let's see how this works out. After all it's the same guys who said that
Python 3 transition would end with 3 being the default choice by 2014.

~~~
coliveira
Whatever the result of this, the transition Python 2->3 will remain as one of
the most troubled ever for a mainstream programming language (of course,
nothing beats Perl 5 in this area). If you consider that Java is on version 8
and C++14 is here, we can see that even old languages have done this in a much
more organized way.

~~~
lqdc13
Isn't Python the only one with breaking changes out of this group?

~~~
unscaled
Which group are you referring to? Mainstream languages? The languages
mentioned above? Perl 6 was designed from start to be very breaking, but even
putting Perl 6 aside, Ruby had quite a few breaking changes going from 1.8 to
1.9 - and that was merely a point release.

Breaking changes are tough, but many libraries with a large codebase managed
to update to Python 3 and even maintain dual-version support using compatible
features, compatibility libraries and automated conversion tools.

For everyday users things would be much simpler, and hence the main
reason for not converting is not "too many breaking changes" but rather "not
enough compelling changes".

~~~
maxerickson
It's probably reasonable to compare the breaking changes in Ruby 1.9 to Python
1.6/2.0, when Python first added language support for Unicode. There wasn't
much in the way of blow back.

If there is at some point a decision to significantly change the semantics of
string handling in Ruby (I think tagged strings are the addition of string
handling, not a change to it), expect just as much whining as Python has seen
(but I think people will look at how it has gone and avoid doing it).

------
tzs
> Firstly, most users are using either Python 2.7 or 3.4

That was an amusing statement. His data shows 81% using 2.7, so that 3.4 could
be replaced with any version and the statement would remain true.

~~~
sfilipov
The usage of 3.4 is 16% and combined with the 81% for 2.7 that gives us 97%.
You can't replace 3.4 with "any version" and keep the statement true.

~~~
Walkman
"Firstly, most users are using either Python 2.7 or 100" That is 81% (0%
version 100) using Python 2.7 or 100. AFAIK anything over 50% is most :)

------
BuckRogers
Why can't those who want to use Python3 just use 3? Why do they have to push
the rest of us over? This reads as a war against the silent majority to me.

The transition has failed. Python is now 2 separate communities and we only
have the 'leadership team' to thank for that. Until they make more compromises
this will continue.

That ~17% userbase is your new Python3 community.

~~~
Animats
There's a definite "screw the 2.x users" attitude in the Python community.
Report a bug in 2.7 and see what happens. This attitude is encouraged by
Python's little tin god.[1] The big excuse for Python 3 is that it does
Unicode. Python has done Unicode since Python 2.6; it just wasn't the default.
I've been writing all-Unicode Python for about five years.
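
A minimal sketch of what "all-Unicode" code tends to mean in practice, under the assumption that it refers to keeping text as Unicode strings internally and converting to bytes only at the boundaries. The `u""` literal is accepted by Python 2 and by Python 3.3+, so the snippet runs on both.

```python
# -*- coding: utf-8 -*-

# Text: a unicode object on 2.x, a str on 3.x.
text = u"caf\u00e9"

# Bytes: what actually goes to files, sockets and so on.
data = text.encode("utf-8")

# Decode again when reading the bytes back in.
roundtrip = data.decode("utf-8")

# Four characters, but five bytes, because the accented letter
# takes two bytes in UTF-8.
n_chars, n_bytes = len(text), len(data)
```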

The killer problem is that many package developers, faced with the 2->3
conversion, abandoned their packages. New packages were written by others to
replace them. Users are thus forced to convert to using different packages,
packages with different bugs and a much smaller user base. I wrote previously
on converting a medium-sized production system from Python 2 to Python 3. I
was finding bugs in third-party Python 3 packages, bugs so blatant that they
would have been found years ago if the packages were being widely used. This
was in the web space; the article indicates similar problems in the scientific
space.

[1]
[https://www.python.org/dev/peps/pep-0404/](https://www.python.org/dev/peps/pep-0404/)

~~~
BuckRogers
Worse it becomes an emotional plea, "do the right thing". As if not switching
sides to 3.x is a moral issue. And the myth that 3.x is in our best interests
or "inevitable". The author is fooled into this even though his own numbers
show they can't crack 20% even though 3.0 was released in late 2008.

Using those numbers, even if the Python3 community doubles in another 7 years
they'll still be at 34%. I think that would be an optimistic outlook as many
people are simply finding something else to use instead of moving to 3. I'd be
as interested in Ruby 3.0 or Go as I would Python3. Many programmers for some
reason cannot resist technical churn. To me that's what Python3 is. All things
considered not better, not worse. Just different. But with the library issue.

My guess is Pyston will catch on and maintain the 2.x line after 2020. Pyston
may end up the innovation that Py3 wasn't, but without breaking everyone's
code. LLVM JIT compiler with C extension support. It's music to my ears and
transitioning from being a Python programmer to a Pyston programmer sounds
good. I suspect by 2020 it'll be ready as a spiritual 2.8, and the 80%+ will
move to that.

Only this core dev team could've achieved defeat from the jaws of success by
shooting itself in the face.

~~~
Animats
The successor to Python 2 may, in practice, be Go. That's the direction Google
is going. Google used Python internally for non-speed-critical tasks, but the
performance was too low for anything that had to scale. A few years ago Google
hired van Rossum, and Google had a project, "Unladen Swallow", to produce a
faster Python.

It failed.[1]

Van Rossum is no longer with Google. Google hired others to develop Go, which
seems to be a good language for doing server-side web-related work. It's fast,
memory-safe, scales well on multiprocessors, and has lower development costs
than C++. That's what Google needed in their business.

Google maintains many of the key Go libraries. They're well-exercised
production code. This is not the case for Python 3, as I spent a painful month
discovering. I have some technical criticisms of Go, but when I write
something in Go, it usually works as expected without surprises. You can use
Go for important work with confidence. Python 3, six years on, isn't there
yet.

[1]
[https://en.wikipedia.org/wiki/Unladen_Swallow](https://en.wikipedia.org/wiki/Unladen_Swallow)

~~~
BuckRogers
Failing something like Pyston taking over for 2.x, that is my plan. Going from
Py2 (what I currently use) to Py3 doesn't really do anything for me at all.
There's the constant threat online from people about 'support' and guilt
trips, but the support that matters (library support) isn't going away
anytime, if ever, with an 80%+ userbase in 2015.

So I'm going to stick with Python2, and if I migrate to anything it'll be Go.
Though it'll likely just be _also_ using Go. I've already worked through a
book on Go a couple years ago. I have my complaints about it, but I also have
my complaints about Python.

Nothing is perfect, but Go IS really easy to get started with, which is worth
a lot to an individual programmer or to a team. I'm not sure it's flexible
enough as a drop-in replacement for Python, but if forced it's good enough to
completely replace Python, if this Python2/3 debacle isn't resolved.

I'm seeing Python as 2 separate communities now. Of course the Python3
diehards see that as the worst-case outcome, since their community is always
the short stick. But I think we're here now and it'll remain this way.

------
deckiedan
Python 2.7 is an extremely productive language, for a lot of us. I want to
move to 3.x at some point, but there's very little I can't do with 2.7, and it
is the default on OS X and many Linux platforms.

The _one_ killer feature that would make me move almost straight away would be
making it a lot faster.

But since I have pypy, I'm much more interested in installing that as my
'extra python version' than I am in going to 3.x straight away.

------
zf00002
Is the high % of mac users still on 2.7 partly because that is what OS X ships
with?

------
Alex3917
Python 3.2 is the latest version that is PyPy compatible, so support for 3.2
shouldn't be dropped if that's a concern.

------
jcadam
I'm currently working on a _new_ project using Python 2.6. You see, I'm
creating an analysis tool that absolutely must run on an old, isolated (no
internet connection) RedHat system, and have access to only a very limited
number of packages I can install. Which, in my case, means Python 2.6, fairly
old versions of NumPy and SciPy, and Tk for the GUI.

This is actually my first significant project in Python, so naturally I wanted
to use the latest and greatest (learning opportunity and all), but no such
luck.

~~~
mappu
I've never quite been convinced by this argument. If you can deploy and run
.py files, surely you could deploy and run a local copy of python3? It's not
like it needs root.

The target system not having an internet connection is a best-case for a local
python3 package, since the lower attack surface makes package security updates
less urgent.

------
lucb1e
Perhaps off topic, but this was the most interesting to me:

[http://astrofrog.github.io/images/survey_plots/os.svg](http://astrofrog.github.io/images/survey_plots/os.svg)

People really use Linux in scientific communities? Like, non-computer people?
In the Netherlands Linux usage (or anything other than Windows and OS X), even
among university software engineering students, is almost non-existent. Three
out of seventy students I know use it (4%), including myself.

~~~
ngoldbaum
Linux is very common in the physical sciences. Back in the 80s and 90s a Unix
box from Sun Microsystems might have been more common, but Linux has been
firmly in place for more than a decade now.

The audience for this survey is also probably somewhat biased, since it was
mostly promoted on twitter. That said, none of my colleagues use Windows. I'd
say 75% Mac with the rest various flavors of Linux.

~~~
pjmlp
Currently I am doing some consulting work in life sciences for well known
multinationals.

On my specific case, Linux is only used in their HPC clusters and for hosting
some DB servers.

All researchers use Windows systems as their desktops and control systems for
their automated robotic systems.

------
JulianWasTaken
The scientific community should focus on moving to PyPy and helping to excise
CPython C extensions from more scientific libraries.

That's at least a change that will benefit them.

~~~
rhodysurf
The C extensions shouldn't be cut out of Python. I get that PyPy is better, but
cutting out C extensions and rewriting them in pure Python would hurt
performance more.

~~~
reipahb
Personally, I would really like to see more Python extensions written using
the ctypes foreign function library. This has two advantages:

* Supported on more than just CPython. I.e. you can use them on PyPy as well.

* The extensions are far easier to install, since one doesn't need both a full C compiler as well as all development headers for the library installed on the server where you install the extension.
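
A minimal sketch of the ctypes approach, assuming a POSIX system where the C standard library can be located at run time. `strlen` is used only because it is a convenient function to call; a real wrapper would load its own shared library instead.

```python
import ctypes
import ctypes.util

# Locate and load the C library; fall back to the symbols already
# loaded into the current process if the lookup returns nothing.
libc = ctypes.CDLL(ctypes.util.find_library("c") or None)

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

# No compiler or header files were needed to make this call.
n = libc.strlen(b"hello")
```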

~~~
fdej
ctypes overhead is horrendous though, at least in CPython (I don't know about
PyPy). Fine if you're doing array operations on huge arrays, but not so if you
have lots of small objects.

I wanted to use ctypes to wrap a C library for a recent project (for the ease
of installation and development that you mention) but had to give up when it
turned out to be more than 10x slower than a wrapper written in Cython, and
barely faster than doing a pure Python implementation of the C library itself.
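
One rough way to see the per-call cost on a given machine is to time a trivial C function through ctypes against the equivalent builtin. This is only a sketch, again assuming a POSIX system; it says nothing about Cython, and the ratio it produces will vary by interpreter and platform.

```python
import ctypes
import ctypes.util
import timeit

libc = ctypes.CDLL(ctypes.util.find_library("c") or None)

# int abs(int j);
libc.abs.argtypes = [ctypes.c_int]
libc.abs.restype = ctypes.c_int

n = 100000
# Each call through ctypes converts arguments and the return value,
# so the time is dominated by call overhead, not by abs() itself.
t_ctypes = timeit.timeit(lambda: libc.abs(-5), number=n)
t_builtin = timeit.timeit(lambda: abs(-5), number=n)

ratio = t_ctypes / t_builtin
```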

------
nchammas
On a related note, Apache Spark recently landed Python 3 support in master,
which will be released as part of Spark 1.4 in the next month or so. [0]

[0]
[https://issues.apache.org/jira/browse/SPARK-4897](https://issues.apache.org/jira/browse/SPARK-4897)

------
has2k1
Does this[1] type of plot have a name?

[1]
[http://astrofrog.github.io/images/survey_plots/python_vs_exp...](http://astrofrog.github.io/images/survey_plots/python_vs_experience.svg)

~~~
po
hmm... I would describe it as something like a '2D histogram square-bin plot',
a 'categorical plot', or perhaps a variant of a 'mosaic plot' using color
density instead of area.

[http://www.math.yorku.ca/SCS/sugi/sugi17-paper.html](http://www.math.yorku.ca/SCS/sugi/sugi17-paper.html)

[http://en.wikipedia.org/wiki/Mosaic_plot](http://en.wikipedia.org/wiki/Mosaic_plot)
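
Whatever the name, the data behind such a plot is a table of counts over two categorical variables. A minimal sketch of building that table, using invented responses rather than the survey's data:

```python
from collections import Counter

# Hypothetical responses: (Python version, years-of-experience bucket).
responses = [
    ("2.7", "0-2"), ("2.7", "3-5"), ("2.7", "3-5"),
    ("3.4", "0-2"), ("2.7", "6+"), ("3.4", "3-5"),
]

# Count how often each (version, bucket) pair occurs.
counts = Counter(responses)

versions = sorted({v for v, _ in responses})
buckets = sorted({b for _, b in responses})

# One row per version, one column per bucket; missing pairs count as 0.
grid = [[counts[(v, b)] for b in buckets] for v in versions]
```

Each cell of `grid` would then be drawn as a square whose color intensity reflects the count.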

~~~
has2k1
Thanks, that is a lot more than I hoped for yet _very_ appropriate considering
what I plan to do with it.

------
jincheker
My conclusion is: if there were no Python 3, Python would be much better

------
watersb
Only 10% Windows? Has ESO switched to Linux?

~~~
frozenport
And half are astrophysics?

------
jbdigriz
I'm assuming you chose science for the basis of this analysis because
scientists would be more likely to provide thoughtful replies, if any reply at
all, correct? I did like the analysis, but once the proselytizing kept popping
up, it started to lose value and actually stoke some of the existing
tensions on this matter (we should stop teaching Python 2??)

So now we have to come down from the clouds into reality. For one, having
limited yourself to scientific fields (sort of) you have literally chosen the
tip of the iceberg. Let's ignore the fact that the stats clearly show abysmal
adoption of Python 3 among those who make a living by being on the cutting
edge and can most afford to adopt. But 786 replies?? And we should stop Python
2 for THAT (even though ~700 in that set aren't using it)??? If you venture
into the commercial world, this entire analysis becomes irrelevant. My field
of finance and derivative trading is one of the big up-and-comers in the
Python space for a number of reasons, most likely related to cost: rapid
prototyping, powerful testing, ease of training, etc. And I can say with a fair degree of
confidence that there are more Python developers at my firm than this entire
set - and all of them use Python 2.6. Or at least they did until a multi-year
upgrade migration finally brought us to Python 2.7 just this year (and cost
millions to achieve). Most are still using 32-bit, develop in Windows (on
in-house-built IDEs) and have some split of deployment between Windows
client-side projects (GUIs) and Linux server deployment. This is not to say we are
simpletons - we have vast and complex systems, globally distributed compute
grids with tens of thousands of cores (and equivalent disaster recovery
clusters), reactive caching graphs, time shifting time series, cutting edge
data stores manipulating massive data sets in the tens of billions of rows and
basically run huge swaths of the HUNDREDS OF TRILLIONS in notional derivatives
that drive the entire global economy. These systems have BILLION dollar
budgets. And mine is one firm out of thousands employing tens (if not
hundreds) of thousands of well compensated software developers.

If my point isn't abundantly clear by now, then to put it frankly: WE (and
many other industries like ours) ARE the Python community. And it shifts all
your stats to the point where Python 3 isn't even statistically significant.
So perhaps it would be wise to work with the elephant in the room instead of
trying to pull the rug out from under him. Because elephants are big and don't
like that shit. Moreover, as a multitude of examples can attest to, any
features left out of Python 2.X ultimately just get developed in house and
become closed source, with a huge inefficiency in developing the wheel again
and again but also locking up the critical knowledge on the best way to do it,
backed by proven production deployments with big money at stake.

I don't want to dissuade this sort of analytical review, but if it's going to be
science, it should bear at least some resemblance to it.

~~~
tomrod
I wonder if you and I have crossed paths at some point in the industry.

