
Will Scientists Ever Move to Python 3? - johndcook
http://jakevdp.github.com/blog/2013/01/03/will-scientists-ever-move-to-python-3/
======
zzzeek
I've observed the scientific community to be generally curmudgeonly about
software issues overall, so I'd predict that they will be the _last_ group to
move to Python 3. So once every other community has moved over (in this order:
hobbyists, bleeding-edge startups, tech companies/startups in general,
corporate environments), then it will be their turn, and they certainly will
because the ecosystem will have completed moving over.

As far as 2to3/3to2, that entire process in my opinion will be going away;
once you've pegged Python 2.6 as your bottom version, it's straightforward
(though not without effort) to produce a codebase that runs in Python 2 and 3
without any changes. I've already done this for my template language Mako,
where version 0.7.4 has been enhanced to support Python 2.4 all the way
through the latest 3.3 without any 2to3/3to2 step.

~~~
ajdecon
OT, is there a good guide somewhere for writing software which works in both
2.6+ and in 3.x?

~~~
zzzeek
yes, check this out: <http://python3porting.com/noconv.html>

then read the source code (very short) to six:
[https://bitbucket.org/gutworth/six/src/64891e65200f9853aa64d...](https://bitbucket.org/gutworth/six/src/64891e65200f9853aa64d370184de838deee249c/six.py?at=default)

I don't actually pull in six, I just include those functions which I need
locally, usually with some tweaks.
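
A minimal sketch of what such a local compatibility shim might look like, in the spirit of six but not copied from it. The names `PY3`, `text_type`, `binary_type`, `iteritems`, and `to_text` are illustrative assumptions, not the actual code in Mako or six:

```python
import sys

PY3 = sys.version_info[0] >= 3

if PY3:
    text_type = str
    binary_type = bytes

    def iteritems(d):
        # Python 3: items() already returns a lazy view.
        return iter(d.items())
else:
    text_type = unicode  # noqa: F821 (this name exists only on Python 2)
    binary_type = str

    def iteritems(d):
        return d.iteritems()


def to_text(value, encoding="utf-8"):
    """Return value as text, decoding byte strings if needed."""
    if isinstance(value, binary_type):
        return value.decode(encoding)
    return text_type(value)
```

The rest of the codebase then imports these names instead of touching `unicode`, `str`, or `dict.iteritems` directly, so the same source runs on both interpreters.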

------
ChuckMcM
Why? One of the interesting things about science is that your results are
_better_ if you don't change too much. So if you're using Python 2.4 to do
your data analysis and now you're re-creating your experiment from 5 years ago,
if you have a new Python and libraries and you get different results, then one
of the things you will need to do is put everything back to 2.4 and see if
your results changed because of the software or because of the experiment.

~~~
_dps
I'd add that for the bulk of people writing software to support science,
programming practices are among the least of their concerns. A data analysis
program supporting a particular research project is more akin to a
mathematical formula than to a piece of software engineering. Upgrading it to
Python 3 would be about as motivating as going back and updating your
equations in a research paper because a new font became available.

Moving your current/new development to Python 3, to a scientist, sounds a lot
like "So there's this new font. It sort of works like the old one. No it
doesn't let you write any new equations, and it may or may not complain about
being used near papers written in the older font. We know you don't
particularly like fussing with typesetting, but how about you invest in this
maybe-compatible thing instead of spending your time doing your science? We
promise you, the professional typesetters have moved beyond that stodgy
Yourfont2.7 you've been using to write your papers."

~~~
catch23
So maybe the best way to encourage adoption is to convince a researcher in the
field to implement their new math algorithms in python 3. Anyone who wishes to
use these new algorithms would be forced to upgrade.

------
sprash
Hopefully never. All the new changes might make Python better in the view of a
computer language consistency fetishist (or maybe for web apps). But for
practical purposes in data analysis things got worse not only because of the
broken backward compatibility.

Letting major APIs return iterators or views instead of lists just introduces
unnecessary complication. Most people doing data analysis don't even know what
these structures are but they definitely know lists.
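
For readers who haven't met the change: in Python 3, `dict.keys()`, `map()`, and `zip()` return views or iterators rather than lists, and wrapping the call in `list()` gives back the Python 2 behaviour. A small sketch with made-up data:

```python
counts = {"a": 1, "b": 2}

keys = counts.keys()      # a view in Python 3, a plain list in Python 2
key_list = list(keys)     # an explicit list behaves the same on both

# map() and zip() are lazy in Python 3; list() materializes them.
doubled = list(map(lambda v: v * 2, [1, 2, 3]))
pairs = list(zip("ab", [1, 2]))
```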

Scientists have to deal a lot with bytes or strings of bytes but never
Unicode. Python3 treats Unicode as first class citizen as opposed to raw
strings of bytes like in Python2.

Sometimes you have to convert a lot of clear text data formats and needing to
use 'print(x, end=" ")' instead of a simple 'print x,' makes me cringe every
time. Printing is so fundamental; why shouldn't it be a statement?
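
For comparison, a short sketch of the function form writing space-separated values (the data is made up, and an in-memory buffer stands in for the output file). On Python 2.6+ the same calls are available after `from __future__ import print_function`:

```python
import io

row = [1.5, 2.5, 3.5]
buf = io.StringIO()

# One value at a time, separated by spaces (Python 2 spelling: `print x,`).
for x in row:
    print(x, end=" ", file=buf)
print(file=buf)  # finish the line

# Or the whole row in a single call, using the sep keyword.
print(*row, sep=" ", file=buf)

lines = buf.getvalue().splitlines()
```

Note that the loop form leaves a trailing space on the line, while the `sep` form does not.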

Finally the loss of performance in Py3k is the straw to break the camel's
back. I use numpy because it is usually faster than the stuff I wrote myself
in C. I use python because I got the best performance without having to care
much about programming.

I want to be able to reproduce results I or other people did 10 years ago.
Maybe people in 100 years will want to do that. A language that breaks backward
compatibility for trivial consistency issues is definitely not suited for
that. The easiest solution would be to stay with 2.x for ever or choose a more
stable language.

~~~
GeorgeTirebiter
I agree. "Python" to me means "Python 2.4" -- all the rest is just sugar. I
have no reason to "upgrade".

However, one way to get people to upgrade is to have BACKWARD COMPATIBILITY.
Yes, I know, Python developers consider backward compatibility some sort of
noose, but it is the only thing that permits forward-going change without
losing the existing user base. It would be easy, too: at the beginning of each
file, have """Python 2.4""" (etc) and _everything_ would "just work". There
would be no _need_ to wait for anybody to make libraries compatible with new
versions.

The Python developers have done us all a huge disservice by ignoring full
backward compatibility. And while I appreciate all of the time and effort they
have donated to the community, I believe this one single issue, backward
compatibility, matters most.

------
lmm
Unicode support may be a small thing for scientists, who are usually dealing
with numbers rather than strings. But it's absolutely huge for the rest of us,
and that alone will drive the big libraries to python3. It'll be slow going -
I've essentially stopped work on my biggest personal project as I'm stuck in a
mess of libraries that have moved to python3 and libraries that haven't - but
unlike with perl6, I haven't seen any major libraries explicitly planning not
to support python3.

So yeah, 2018 may be a lower bound for the point at which the scientific
community moves to python3. But that doesn't mean it won't happen.

~~~
tellarin
Well, it depends on what kind of research you do. My former team worked with
different text corpora in multiple languages. Proper unicode support is/was
very welcome.

------
xaa
Speaking as a scientist: the reason for me is that there are lots of breaking
compatibility changes in exchange for very little benefit. The only
improvements I would care about -- fixes to the bad GC and broken concurrency
system -- weren't and still aren't on the table.

~~~
xyproto
I experienced the same deficiencies using Python 2 and Python 3 for years in
projects at work. If you're just even a bit like me, you should find much
delight in the programming language Go.

~~~
xaa
Yes, there are lots of languages -- for example Go, Clojure, and even C++11 --
that are better languages in their own right and have also solved the
concurrency problem well enough.

Unfortunately, scientific programming is highly dependent on libraries. So, in
my field (bioinformatics), I'm basically constrained to Python or R. Python is
the less shitty of the two. You can see how thrilled I am about this
situation.

~~~
stcredzero
What about a way of calling out to Python and/or R libraries in Go? There's
already such a beast for calling Python from Lua.

------
madhadron
A lot of scientists are working on poorly administered clusters where the
installed Python is 2.5 or lower. I've even run into one 2.3 occasionally.
Some have versions of libc so old that you can't always compile newer software
(as I learned to my horror when I tried to install GHC on the VITAL-IT cluster
in Lausanne -- libc on that system is now approaching ten years out of date).

~~~
AUmrysh
Hopefully virtualenv has fixed that issue, do you think it could be
implemented after the fact in clusters?

~~~
herge
As great as virtualenv is, it won't help you with outdated versions of libc.

~~~
ajdecon
The traditional answer to outdated libraries on clusters is to install them in
a non-standard location and use Environment Modules[0] to select which
libraries to use. Modules are a common tool since clusters have big user bases
with lots of different needs, so you almost always have to have multiple
versions of some libraries available.

The classic example is keeping multiple MPI implementations around; but I know
scientists who keep multiple versions of several libraries in their home
directories depending on which apps they want to use, and just select the
right module groups.

That said, I've never tried this with anything as fundamental as libc...

[0] <http://modules.sourceforge.net/>

------
dman
One thing that doesn't receive enough attention is the changes to the Python
C API. References:

1) <http://python3porting.com/cextensions.html>

2) [http://docs.python.org/3.0/whatsnew/3.0.html#build-and-c-api...](http://docs.python.org/3.0/whatsnew/3.0.html#build-and-c-api-changes)

In the scientific computing world C extensions are used everywhere and moving
Python 2.x c extensions to Python 3 is a significant roadblock.

------
pnathan
As a note, it's really sad to see these kinds of articles. I do perhaps 90% of
my home programming in Common Lisp, a language which is around 30 years old,
but still has an active development community. _Without modifications_ , I can
run code from 30 years ago on a modern computer and have it return correctly.
I have no particular belief that this will be true for Python, Ruby, Perl 5,
Perl 6, C++, and a variety of other popular languages developed _after_ Common
Lisp was popularized and standardized. So much effort wasted upgrading your
language instead of solving your problems[1].

It's something to think about with your software choice: every line is a
legacy line for someone else to understand and compile/interpret/run. Can they
do it in the future? Do you _care_? If so, think hard and- I recommend- choose
a language which has a formal standard with multiple implementations.

[1] Obviously innovation is useful. But reinventing wheels into the same
design isn't. Obviously Haskell and other research-y languages will change.
I'm talking about production languages for production environments or for
other things persisting over the 5-20+ year spans such as science.

~~~
zurn
> [...] Common Lisp, a language which is around 30 years old, but still has an
> active development community.

OTOH many Common Lisp people lament the fact that the standardization stopped
with the 1994 spec, saying it led to the fragmentation of the implementations
with mutually incompatible extensions.

Do you think it would have hurt Common Lisp if there had been one dominant
implementation that everyone followed and reused libraries from, like with
Python?

~~~
pnathan
> OTOH many Common Lisp people lament the fact that the standardization
> stopped with the 1994 spec, saying it led to the fragmentation of the
> implementations with mutually incompatible extensions.

There is room for an update, I think. Probably something to do with thread
memory semantics. But the language _itself_ provides facility for a great deal
of extension. E.g., the default threading API (bordeaux-threads) is
identical cross-implementation, although the exact Lisp system calls will
vary.

You have to be careful to consider what exactly needs to be changed and what
can simply be added by a macro & library function. Certain guarantees relating
to memory will fall in that area.

Of course I do not think CL is perfect. For one thing, it could have used a
respin post-CLOS to have a smoother type system and better interfaces.

> Do you think it would have hurt Common Lisp if there had been one dominant
> implementation that everyone followed and reused libraries from, like with
> Python?

Yes. There's only 1 python: python.org's python. Never mind Jython,
IronPython, and PyPy. We all are tied to CPython. :-( It defines the
situation, regardless of the docs. Really not ideal for the Python world.

Having a regularly evolving standard means that you have to port your code
constantly to ensure you're up to date with the latest hotness... or even be
able to interrelate with newer code. One of the great strengths of the POSIX
standard & C is that C89 has stayed constant and available on *nix for the
last two decades; C itself has stayed mostly constant for about three and a
half. The cost of forcing regular change is enormous and may well destroy a
community.

rachelbythebay, a blogger I follow, wrote a post I can't find right now, on
the problems when you rely on this sort of situation. She does a lot of
sysadmin work, usually (I guess) in C++, and was astounded when she learned
that just upgrading your system might shatter your software in Perl or
what have you. I hold that opinion (based on my experience) as well. Having
had to contend with Python 2.4, 2.5, 2.6, and 2.7 all in the same company
for the same codebase, I have this conclusion: do it right the first time,
second if you can swing it. Common Lisp succeeds at that (n.b., before CL,
lots of Lisp fragmentation existed). Python fails. C wins. Perl 5 is trying to
have run-time machine switching based on specified version (may have details
wrong there). Ruby fails. C++98 is "ok", C++11 may be a bear.

I do think that a language should be excruciatingly small and very effective,
and libraries should be built around that language; this allows libraries to
be evolved/replaced without having the core language altered. C got this
right. Scheme from R1-R4 also went this road, but as it is an academically
driven language, it's been in the shadows by and large. I guess newer Schemes
have bigger standards though.

~~~
dman
I am very fond of lisp but even I find it amusing that you use Common Lisp on
one hand and think that programming languages should be excruciatingly small
on the other.

~~~
pnathan
Yes. It's pretty ironic. :-)

------
fnordfnordfnord
Some of them will never stop using Fortran, ever, until they die. My old boss
was trained with Fortran, and made a moderately successful career in particle
physics never needing to learn new tools. The ideas he needed expressed were
short, he was good at explaining them, and it was never too big an effort to
turn them into something that could run in PAW, ROOT, Python, VHDL, whatever
was needed. He did the thinking, and expressed it in the language he knew
best. He had postdocs, graduate students, and sometimes undergrads to do the
leg work. Would he have had a better career if he'd spent time learning new
languages rather than thinking about solving physics problems? I doubt it; and
he probably wouldn't have had as much fun.

~~~
sampo
_"Some of them will never stop using Fortran, ever, until they die."_

In my opinion, Fortran (meaning Fortran 90/95/03/08) gives you the ease of
native array syntax (like Matlab, and leaner than NumPy), the speed of a
compiled language (like C, or somewhat faster), and the safety of a typed
language.

For numerical computing with lots of 2-, 3-, or 4-dimensional arrays, I don't
really see any other language to be on par with Fortran.

------
pixelbath
Forgive the snark, but are we all moving to Python 3 now? Every time /I/ try
to use Python 3, I look for code samples and libraries written in 3, but
they're all sitting around version 2.7. Documentation searches bring up 2.7
docs for most things I want to do.

Don't get me wrong, I have nothing against Python 3, and many of the changes
seem fairly sensible, but I shouldn't have to fight against it to do something
that works perfectly well with minimal effort in 2.7. I guess that makes me a
curmudgeon, but I don't want to dick around with Python from a theoretical
standpoint, or take a great deal of time exploring its features. Since I
usually only use it for quick-and-dirty scripts, this is my ideal use case. I
honestly don't care how much better iterators and GC are handled with it.

------
Xcelerate
Whatever tool is best for the job. I'm still using Fortran in some areas of my
research (molecular dynamics stuff), but then I use CoffeeScript to post-
process simulation results. Throw in some C when I need a good balance between
speed and rapid development, and Haskell when I'm just messing around ;)

------
alpb
I actually wonder, will startup developers ever move to Python 3? Or hobbyist
programmers? Python 2.x is still widely used in the industry and many startups
actually seem like they won't be able to make a radical change and move. What
do you think of status of Python 3 in the industry?

~~~
tdfx
I think once the web framework holdouts like Django and Flask move to Python 3
we will begin to see a broader adoption of Python 3 in the startup community.
I don't expect to see a lot of apps being ported to 3.x, but I would think
most new apps would probably be 3.x-based.

~~~
marcosdumay
Django was recently ported, but South wasn't, and then there are the CMS
packages - Mezzanine didn't even start, and I don't see how one can even
discover the Django-CMS status.

------
16s
I once held a position as a systems programmer for a big data science
research institute at a large state University. RHEL 5.x comes with Python 2.4
by default. Our Platform compute clusters ran RHEL. We would build Python
2.7.x from source and use that rather than the older 2.4. However, we never
used Python 3, nor needed it. No one cared about it. It was not a topic of
conversation even. Python 2 got the job done and lots of research code was
written in it, and still is.

I have nothing against Python 2 or 3. I'm just relaying my own experience. We
used a lot of C++ and Java too. And the Java guys were fond of Groovy. I did
mostly C++ and Python.

------
tomrod
I think this sums it up nicely: <http://codepad.org/H7quTJlE>

~~~
samuel
I couldn't agree more. If you're going to break compatibility, do it for a
good reason. Type annotations, that would appeal to scientists (BTW, a lot of
them are using Cython)...

~~~
nostrademons
Type annotations are included in Python 3:

<http://www.python.org/dev/peps/pep-3107/>

(More specifically, Python 3 includes generic "function annotations" that look
a lot like type declarations do in other languages, but can be used for other
purposes as well. The idea is to offload typechecking to a library, so you
could eg. include type inference, dimensional analysis, preconditions or DBC,
etc. as desired.)
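
A rough sketch of the "offload typechecking to a library" idea. The `check_types` decorator and the `scale` function are hypothetical names written for illustration, not part of PEP 3107 or the standard library; only `__annotations__` and `inspect.signature` are real:

```python
import functools
import inspect


def check_types(func):
    """Check annotated arguments with isinstance() at call time."""
    sig = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = func.__annotations__.get(name)
            # Only enforce annotations that are plain classes.
            if isinstance(expected, type) and not isinstance(value, expected):
                raise TypeError("%s must be %s" % (name, expected.__name__))
        return func(*args, **kwargs)

    return wrapper


@check_types
def scale(x: float, factor: int = 2) -> float:
    return x * factor
```

Python itself attaches no meaning to the annotations; `scale(1.5, 2)` runs normally, while `scale("a")` raises `TypeError` only because the decorator chooses to check.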

------
spenrose
The article makes a cogent recommendation for robust 3-to-2 tooling as the key
missing piece. Recommended.

~~~
sopooneo
If we want people to migrate _to_ version 3, wouldn't we want 2-to-3 tooling
so people could migrate their old work? I assume I am misunderstanding the
purpose or the meaning.

~~~
KMag
Read the article. 2to3 is pretty robust, but 3to2 is much less usable.

The argument is you need the major library authors to start primarily coding
in Python3 and using 3to2 to backport to Python2. Until this happens, the
quality of Python3 libraries will suffer and there will be disincentives to
migrate.

------
api
Stuck-version disease is a major problem for all these newer interpreted
languages.

~~~
untog
Python is over 20 years old. Where is the line for "new"?

~~~
Aloisius
Everything after (or perhaps including) the Bourne shell is new.

~~~
tesmar2
Whipper-snapper! You are probably using that new-fangled C language for all
your high level work.

------
w1ntermute
I highly doubt it. There are a lot of scientists still clinging dearly to
FORTRAN90. Many scientists think in terms of the next grant/paper, not the
long-term goal. This results in short-sighted software decisions, which are
exacerbated by the fact that most scientists are (very poorly) self-taught
coders.

~~~
fnordfnordfnord
1\. These folks live or die by the status of the next grant. Most of them are
not motivated by money, but rather by keeping a lab open, with a steady stream
of students flowing through it. Let's not forget that there is more than one
goal in science (solve problems, feed family, train more
scientists/engineers/technicians/etc).

2\. Their job is to solve scientific problems, not solve them in ways that
make computing folk feel good.

3\. Arguably the scientist who wastes time fretting about the technical
details of software rather than their research or their grant, is the short
sighted one.

~~~
stcredzero
_> 2\. Their job is to solve scientific problems, not solve them in ways that
make computing folk feel good._

Actually, it should be. Whole fields can and do suffer because of generally
crappy software practices. (Medicine) It should be thought of as not pissing
off your suppliers.

------
anonymouz
For me the two issues are:

1) Sage, which I use, is a huge project incorporating lots of other scientific
software and using loads of Cython. It simply takes time to migrate all the
dependencies and then Sage itself to Python 3, and it's not really a priority
for anyone.

2) It's not really something that anybody seems to focus on a lot. Scientists
mainly worry about science, minor differences in the programming language used
are generally of little concern. Migrating to a different version is therefore
not something most of them would spend time on, unless forced to by external
circumstances.

------
kedean
Something the author didn't mention is that a lot of science-related python
code relies on third party libraries to support devices they use. SR
Research's Eyelink device, for instance, is accessible through APIs provided
by SR themselves, which are provided in C and in a number of Py2.x flavors.
They don't provide it in 3.x, and so the experiment cannot be written in 3.x
either. The scientists have no control over this, because it's not an open
source project, it all relies on whether or not the company producing it wants
to upgrade.

~~~
KMag
But if a native library is provided, scientists aren't powerless. Sure, they'd
have to either call the libraries directly using ctypes or write thin C
extension stubs, but it's quite easy unless the API is huge.
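
A minimal sketch of the ctypes route, using the C library's `strlen` as a stand-in, since the vendor's actual library and function names aren't given here. It assumes a POSIX system; for a real device you would pass the path of the vendor's shared library to `ctypes.CDLL` instead:

```python
import ctypes
import ctypes.util

# Locate the C library; fall back to the current process's symbols on POSIX.
libc = ctypes.CDLL(ctypes.util.find_library("c") or None)

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

length = libc.strlen(b"Eyelink")
```

Declaring `argtypes` and `restype` for each function is most of the work, which is why this stays manageable unless the API is large.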

------
sampo
Libraries and legacy code are an issue, but if the Julia (
<http://julialang.org/> ) folks manage to deliver most of their promises,
Julia might cover a lot of the data processing needs (and is faster than
Python), for which people use Python + NumPy at the moment.

And the main reason some scientists migrated to Python is probably that Matlab
is non-free, and the language of Matlab sucks. Julia might attract both Python
users and future Matlab refugees.

------
rflrob
As a scientist who teaches Python to other scientists in the summers, I wonder
if what we instead need to do is just get a sufficient mass of _new_ Python 3
programmers, and let them drag the rest over. Many fields long ago accepted
that programming is a vital skill, but that's only recently been true in
biology (if it even is true yet), so there's a lot more people who will be
learning in the next few years, as compared to, for instance, astrophysics.

~~~
ajdecon
Downside: those new Python 3 scientists will be unable to use the Python 2
libraries which already exist in their domains, so they'll all end up re-
implemented poorly...

------
zurn
Hmm.. they sound a lot more eager than the networking world. Django, Twisted
and Gevent are all still lacking Python 3 support. Saying that the biggest
scientific packages only got support in 2010-2011 sounds like a pretty good
track record to me!

------
GeorgeTirebiter
It would help if the old "print" statement would be re-instated. And rename
the new "print()" as "printf()".

~~~
KMag
I'm guessing you're just venting some bitter sarcasm, but if not...

I can (kind of) understand your complaining about the print statement going
away, despite its inelegance. However, if the semantics haven't changed, why
change the name?

------
g3orge
can someone explain why python has more scientific projects than ruby for
example... why is it considered as better for that type of work?

~~~
3pt14159
Scipy and Numpy blow anything Ruby has completely out of the water.

~~~
blakeweb
Agreed, although just wanted to mention SciRuby for rubyists jealous of scipy
and numpy. It's in development (led by a good friend of mine) and
contributions are welcomed. <http://sciruby.com>

------
JohnFromBuffalo
Once they saw my Anaconda v1, why switch to a more mature Python 3?

------
alxndr
tl;dr: no

<http://en.wikipedia.org/wiki/Betteridges_law_of_headlines>

~~~
tomrod
Could easily be rewritten to say "will Scientists avoid moving to Python3?"

