
A plea for stability in the SciPy ecosystem - jnxx
http://blog.khinsen.net/posts/2017/11/16/a-plea-for-stability-in-the-scipy-ecosystem/
======
anc84
I am not convinced by the reproducibility issues. To me, it is obvious that
reproducing a scientific finding means using the exact same versions of all
the software involved (unless the authors rule out specific side-effects).

So to reproduce paper A from 2015 that used numpy 1.234 in Python 2.654 on a
32-bit Windows system and version abcxyz123 of obscure library X, I would need
to recreate those exact conditions.

Whatever happens to the software afterwards is irrelevant. Bugfixes might
change results anyway.

~~~
itronitron
alternatively, scientific results should be robust enough to still be
observable with marginally different software versions

~~~
maltalex
Not sure if that’s possible in practice. Every analysis would have to be
written using several different software stacks... It’s probably doable only
if the software is fairly simple or if someone has a lot of extra funding to
burn through.

~~~
hyperbovine
Key word there is marginally. If your finding holds up using NumPy 1.10 but
not 1.11... you do not have a finding.

~~~
hogu
That's probably true for numpy 1.10-1.11, but I've seen other libraries make
dramatic changes (for example, changing the default behavior of a function):

[http://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.ewma.html](http://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.ewma.html)

pandas changed NaN handling for ewma in 0.15, which can greatly change the
output if your data is sparse.
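To illustrate with made-up data: in current pandas the old behaviour survives
(roughly) as the `ignore_na` flag on `Series.ewm`, so you can see how much the
two conventions diverge on a sparse series.

```python
import numpy as np
import pandas as pd

# Sparse series: mostly NaN, with two observations (made-up data).
s = pd.Series([1.0, np.nan, np.nan, np.nan, 2.0])

# ignore_na=True skips NaNs when assigning weights (roughly the
# pre-0.15 behaviour); ignore_na=False weights observations by their
# absolute position, so the run of NaNs heavily discounts the first
# observation.
old_style = s.ewm(span=2, ignore_na=True).mean()
new_style = s.ewm(span=2, ignore_na=False).mean()

print(old_style.iloc[-1])  # 1.75
print(new_style.iloc[-1])  # ~1.9878
```

Same data, same function, noticeably different final value — exactly the kind
of silent divergence the comment is pointing at.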

~~~
hyperbovine
But again, if there are NaNs cropping up in your analysis, and your results
depend on what an intermediate tool decides to do with them, you have
overlooked something substantial.

------
iClaudiusX
This article never really addresses SciPy or NumPy despite the title and much
of the discussion. Rather, the author is ranting about the change from python
2 to 3.

And even then his only supporting anecdote is that matplotlib made a breaking
change to the way it handles legends. Meanwhile he was perfectly capable of
reproducing his scientific results 4 years after publication despite the
update from python 2.7 to 3.5 and minor updates to the rest of the cited
libraries.

In light of that paucity of evidence I find it hard to support the many
hyperbolic statements that the situation is a "big mistake", "calamity", or
"earthquake" for the scientific community.

I do agree with the more general point that scientific code requires funding
consideration for long term maintenance. Many aspects of research have adopted
provisions for equipment like reagents and computing hardware. These are
considered core infrastructure and are often shared among labs. I could see a
future where software development is supported in a similar way.

~~~
peatmoss
Do you disagree with the author’s premise that multi-decade reproducibility is
of value to the scientific community? On what basis would you make that
disagreement?

And if you concede that the author’s proposed timescales have merit, how does
4 years of stability (interrupted by the requirement to make minor changes)
meet the requirement?

To me, the premise of longer-than-four-year reproducibility timelines seems
obvious. And I hope it at least seems plausible to others. Hinsen’s point is
that the SciPy community actively markets itself to the scientific community
as a way of carrying out computationally aided research, but hasn’t even
articulated what its disposition toward reproducibility is. I think
transparency in this regard is probably the right thing. And I also find
compelling Hinsen’s supposition that we should bias a little more in favor of
reproducibility.

EDIT: and I should note that, given the SciPy community has chosen to make its
home on top of Python, and is the layer of abstraction that many researchers
are now interacting with, the Python 2 -> 3 transition is very much a SciPy
issue... specifically so for the reasons articulated in this article.

------
teekert
"The disappearance of Python 2 will leave much scientific software orphaned,
and many published results irreproducible. Yes, the big well-known packages of
the SciPy ecosystem all work with Python 3 by now, but the same cannot be said
for many domain-specific libraries that have a much smaller user and developer
base, and much more limited resources."

Why? If you store software and data together, that shouldn't be an issue?
Anyway, this is the case with any software, not just Python or SciPy.

~~~
ylem
I think he's referring to the resource problem. Let's suppose that you write a
library in your copious free time (like he apparently did). Once you reach
around 100,000 lines of code, it may be nontrivial to update it (for example,
if it has, say, bindings to C)--given that he probably doesn't have a team, but
maybe just himself and a graduate student.

~~~
OskarS
I think it's a fair point to say "the move from 2 -> 3 caused an enormous
burden for library maintainers, and many packages aren't going to be upgraded
for 3". There's no question that this is true, inside the scientific computing
community and out. It was arguably worth it to get a better language, but
still, the transition doesn't come without cost.

However, that's an entirely different point from "the move from 2 -> 3 means
that there's a bunch of science that isn't reproducible anymore". No there
isn't, just use older versions of Python. Python 2.7 isn't going to self-
destruct in 2020, you can still install it if you want to verify and examine
an old result.

------
ak217
Instead of asking others to shoulder the (enormous) burden of backwards
compatibility so that his software may continue to work, I think this person
would do well to ask package maintainers to make sure their package metadata
is correct (e.g. it lists which versions of Python are supported), and to
adopt containerization-based tools for packaging his software dependencies in
a sustainable way. The beauty of Docker (and LXC and other container
technologies) is that the Linux kernel ABI becomes your backwards-compatible
interface, and the kernel maintainers _are_ willing to shoulder that burden.
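As a sketch of that approach (the image tag and pinned versions here are
illustrative, not from the thread): a container image that freezes both the
interpreter and the exact dependency versions a paper used.

```dockerfile
# Freeze a 2015-era analysis environment (illustrative versions).
FROM python:2.7.18-slim

# requirements.txt pins exact versions, e.g.:
#   numpy==1.9.2
#   scipy==0.15.1
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# The analysis script itself, run with the frozen stack.
COPY analysis.py .
CMD ["python", "analysis.py"]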

------
jnxx
The end-of-life of Python 2 will cause a lot of scientific software to break.
Many readers who work in different domains of software development will just
shrug and say "Then, Scientists should just change to Python 3 and write all
new code in it".

It ain't that easy. The main issue is summarized in this tweet:

[https://twitter.com/quantumpenguin/status/933123060822978560](https://twitter.com/quantumpenguin/status/933123060822978560)

In an attempt to summarize Hinsen's argument:

Reproducibility is important for computational science. This means it needs to
be possible to run the same code with the same data, and get the same result.
And because code represents scientific models, not experiments, and these
models are used for decades, reproducibility for decades is needed.

Yet Hinsen (who is a main contributor to both Numerical Python (numpy) and
Scientific Python) observes that a typical Python script will only run for two
or three years, not five or more.

Further, Hinsen points out that scientific software consists of four layers:
domain-specific code, domain-specific libraries, scientific infrastructure,
and non-scientific infrastructure. While the scientific infrastructure code,
for example Numerical Python or Pandas, has already been updated to Python 3,
for many domain-specific libraries this is not going to happen, because of
restrictions on time and funding.

(An interesting insight for me: While by far most of the actual code in any
given program will consist of OS routines, system libraries, and libraries
such as Numpy, the _total amount_ of scientific code outside of these layers
is far larger, and will probably never be rewritten.)

There are two other interesting statements Hinsen has cited in other blog
posts. One is Linus Torvalds' "We don't break user space":
[https://lkml.org/lkml/2012/12/23/75](https://lkml.org/lkml/2012/12/23/75)

The other is Rich Hickey's "Spec-ulation" keynote on maintaining compatibility
of interfaces - here is an old HN thread:
[https://news.ycombinator.com/item?id=13085952](https://news.ycombinator.com/item?id=13085952)

~~~
viraptor
I understand a lot of the changes cause pain in the academic community.
There's a lot that could be better. But I really don't agree with the
reproducibility issues. Python 2 doesn't "disappear". It's still available.
D. Beazley recently did an experiment compiling old Python versions, going
back all the way to the pre-1.0, pre-VCS releases. They still compile (with
minimal changes) and still run. Old packages are still available (you should
keep copies of specific versions anyway if you want reproducibility). It's not
trivial, but it's not rocket surgery either.
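In its simplest form, "keeping copies of specific versions" is just recording
them alongside the code and checking for drift later. A minimal sketch (the
package names and versions are illustrative, not from the thread):

```python
def version_mismatches(pinned, installed):
    """Return packages whose installed version differs from the pinned one,
    mapping name -> (installed, pinned). Missing packages show up as None."""
    return {name: (installed.get(name), wanted)
            for name, wanted in pinned.items()
            if installed.get(name) != wanted}

# Versions recorded alongside a paper's code (illustrative):
pinned = {"numpy": "1.9.2", "pandas": "0.17.0"}
# What the reproducing machine actually has (illustrative):
installed = {"numpy": "1.9.2", "pandas": "0.23.4"}

print(version_mismatches(pinned, installed))
# {'pandas': ('0.23.4', '0.17.0')}
```

In practice a `pip freeze` lockfile plus a check like this at startup catches
most "it ran on my machine in 2015" surprises before they corrupt a rerun.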

The article also seems to think other languages are somehow immune:

> Today’s Java implementations will run the very first Java code from 1995
> without changes,

Even though Java release notes document incompatible changes:
[http://www.oracle.com/technetwork/java/javase/8-compatibility-guide-2156366.html](http://www.oracle.com/technetwork/java/javase/8-compatibility-guide-2156366.html)

~~~
dezgeg
It might compile, but with modern compilers doing increasingly aggressive
optimizations (especially on undefined behaviour), it becomes more of a gamble
whether the old source with a modern compiler produces the same results as the
old code with an old compiler.

~~~
Bjartr
So use an old compiler?

------
plaidfuji
This may sound trite, but this is why science is based on math, and not code.
Math results can be fully reproduced from a paper document no matter how old
it is. That is the point of math. Until code reaches that level of
self-consistency and rigor, no one is going to waste their time building a friggin
virtual machine for their one-off incremental molecular dynamics result that
all of 2 people will try to reproduce.

~~~
avip
"Science" is not discussed here, but academic research publishing. And to
state that "Academic research papers are based on math" is, lacking a more
polite term, false.

------
xemdetia
This feels like the kind of plea for Fortran that has hindered other
scientific software communities, like High Energy Physics, where they are
still struggling to get key software ported to C++. If the SciPy community
glues itself to Python 2, it's just going to be that much worse when Python 4 exists
years from now. I agree with a lot of the other people commenting here that
you should be able to reproduce the results with the code at the time it was
written, if that is not being preserved as either an artifact or a versioning
piece that is a process problem when releasing work.

Losing the work generated before, or making it not clearly reproducible, is bad,
but trying to be a stick in the mud will prevent advancement for the
generations hereafter, as the High Energy Physics example shows: when was
the last time you saw undergraduates taking Fortran en masse? Keeping an
esoteric version of software only helps _you_ unless you make sure it helps
those who come after.

------
zaarn
I don't think reproducibility will actually be hurt; you can always download
an old copy of Python 2 somewhere (there are places that archive it) and run
the code. Unsupported doesn't mean the software version implodes into vacuum.
It just means if you install it, it'll be a rather ancient version that has
possibly many bugs and vulnerabilities and you should probably not have that
touch production.

------
Myrmornis
So development on the author’s Molecular Modeling Toolkit will stop, I see
that. But it sounds like these older packages can still be used, it’s just
that scientists can’t blindly run everything on the same python installation.
But with pyenv/virtualenv, or dockerized versions of python2, it should be
perfectly possible to keep using them for a very long time. For the same
reason, the author’s comment that scientific python scripts tend not to be
usable after 5 or more years seems not entirely accurate.

However, it won't be possible to use them together in the same project with
modern Python 3 code. So that's bad. I wonder whether one option is to create a
dockerized version of the library running against Python 2 and then write a
serialization/deserialization layer allowing the old API to be implemented in
Python 3?
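A rough sketch of that idea: the Python 3 side serializes each call as JSON
and ships it to a separate interpreter. Here the legacy side is replaced by a
stand-in echo worker so the sketch is self-contained; in reality it would be
the Python 2 library running in its own container, and every name below is
hypothetical.

```python
import json
import subprocess
import sys

# Stand-in for the legacy side: reads one JSON request from stdin and
# echoes a JSON reply to stdout. In reality this would be something like
# "python2 mmtk_worker.py" (hypothetical) inside a Python 2 environment.
WORKER = (
    "import json, sys;"
    "req = json.load(sys.stdin);"
    "json.dump({'ok': True, 'function': req['function']}, sys.stdout)"
)

def call_legacy(function, args):
    """Serialize a call, run it in the other interpreter, parse the reply."""
    request = json.dumps({"function": function, "args": args})
    proc = subprocess.run([sys.executable, "-c", WORKER],
                          input=request, capture_output=True,
                          text=True, check=True)
    return json.loads(proc.stdout)

print(call_legacy("minimize_energy", {"pdb": "1abc.pdb"}))
```

Every call pays a process-spawn plus serialize/deserialize round trip, which
is exactly the overhead the author objects to in the reply below.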

~~~
Myrmornis
I asked the author; he says that there would be too much
serialization/deserialization overhead.

[https://twitter.com/khinsen/status/996046318773657600](https://twitter.com/khinsen/status/996046318773657600)

------
nabla9
It would be great to have a PDF/A equivalent for a subset of scientific
software that could be distributed and archived. It would allow easy access
for future generations. Alas, it's not going to happen. Perlis epigram #14:
"In the long run every program becomes rococo - then rubble."

Great Filter Hypothesis: Software rewrite chokes civilizations.

------
sanxiyn
SciPy was 0.x software until 2017. Demanding stability to 0.x software is
unreasonable. The author used SciPy with full knowledge that it's 0.x
software.

------
ylem
I'll start by saying that I don't have a good answer to this problem. But, I
do have some thoughts. I recently reviewed a paper and was extremely happy to
see that they included a Jupyter notebook and their Keras model along with
some testing data. For those of you not in science, this was rather amazing.
And when the paper is published, it will be part of the supplemental
materials, so at least for awhile, readers will be able to play with it (and
the authors plan to release their training data once they find a good way to
share such a large set of data--again, a limit for academic researchers). So,
this is going above and beyond what I've seen in normal practice (I'm in
condensed matter physics).

But...there is the problem of context. In their case, they had relatively few
dependencies and I was able to get everything to run, but there are more
complex environments and ecosystems. Even if I create a docker container, at
some point it will no longer run. I think what we can do is try to make it
possible for referees and early authors to run our code for a time. We can't
hope that 5 or 10 years from now this will still be possible--but hopefully,
if we document our reduction steps, then if someone really wants to reproduce
the work, they can see the flow.

Now, why might this become important? One example is a case of outright fraud.
I went to a talk by someone from MD Anderson about a case of fraud at Duke in
an oncology study. They saw an amazing result, and their colleagues wanted to
be able to use the same statistical methodology. They initially tried to work
with the original authors, but once they discovered problems with the work,
the original author stopped being responsive. They spent an amazing number of
man-years trying to reproduce the result and figure out what went wrong
(intentionally and not). This was important because human trials were
beginning. If the original source code (and infrastructure) had been publicly
available, this could have been avoided.

For those that say a mathematical description should be sufficient--I would
say, not always. In some cases, the math could be fine, but the implementation
could be flawed. Often if you find an error in a previous result, you need to
at least make a guess as to what could have gone wrong before. The early days
of Monte Carlo simulations sometimes suffered from flaws in the
implementations of random number generators, even when the overall algorithm
was fine...
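As a small illustration of why the generator implementation itself matters for
reproducibility: in NumPy, the legacy and modern generators produce entirely
different streams from the same seed, so recording "seed = 42" alone does not
pin down the numbers — you also have to record which generator was used.

```python
import numpy as np

# Same seed, two generator implementations: the legacy MT19937-based
# RandomState versus the modern PCG64-based Generator. The streams differ.
legacy = np.random.RandomState(42).random_sample(3)
modern = np.random.default_rng(42).random(3)

print(legacy)
print(modern)
```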

Containerization might solve the problem over the short term (which I would
argue is the most relevant time period). But, it won't solve the author's
second problem which is maintaining software. Here, I think the problem is a
lack of resources--there's not much credit or funding for maintaining
scientific software...

~~~
danyx
For sharing large training data, have a look at
[https://zenodo.org](https://zenodo.org), which is run by the CERN people. Up
to 50GB is no problem, and after that they say just talk to us :).

------
stinos
I've given this some brief thought and looking at other software and hardware
being used in scientific research it is clear this isn't limited to Python, at
all. In other words: no matter what you use, it looks like at some point
you're going to have to change _something_ (or run the entire thing in a
virtualized environment, which will last way longer but has its own problems).
What exactly needs changing, and how much work it is, depends on the
implementation originally selected and on the amount of time spent trying to
avoid such change, but it looks like this is nearly unavoidable. Off the top
of my head, these are problems we had to deal with in the past year or so:

\- Matlab changes some behaviour here and there from time to time, deprecates
functionality, ...

\- both C and C++ have gone through quite some changes as well, and so have
the compilers; so it's not exactly uncommon to find 20 year old code written
without this in mind and making use of some specifics which now are
unavailable

\- assembly code written for specific DSP hardware can quickly turn obsolete:
hardware unavailable anymore, build tools not running on recent OS etc

\- not an uncommon problem in psychological research etc: parallel ports are
disappearing

Problems with hardware/compiler/platform-specifics changing can usually be
avoided in software with the proper abstractions (quite the work though,
sometimes) but it's kinda hard to foresee what the evolution of a
language/library is going to be in 10 or 20 years, let alone work around that.
Or if you're going the other way and don't want to update to newer versions
but are looking to recreate environments/dependencies: what is going to happen
to the tools you are using to recreate environments in 20 years, what if they
get breaking changes? And to the sources used to recreate the environment?

tldr; not sure if there really is a complete failsafe solution for this which
spans multiple decades

Also looking just at the end results of science: suppose you need to reproduce
a result of a dataset in 20 years, maybe there's a point where trying to make
everything future-proof now is more work than starting over from scratch in 20
years (just the analysis, supposing the data is still there)? For example at
some point there was C but not yet SciPy. I can certainly imagine cases where
writing a certain analysis from scratch in SciPy now would be less work than
trying to get the ancient C working again.

~~~
poster123
"\- both C and C++ have gone through quite some changes as well, and so have
the compilers; so it's not exactly uncommon to find 20 year old code written
without this in mind and making use of some specifics which now are
unavailable"

C, C++, and Fortran have ISO standards, and if you write standard-conforming
code, the committees that revise standards are very careful not to break
things. It's not like Python where the BDFL changes the syntax of the print
statement.

------
gaius
Do these issues impact the R community to the same extent?

~~~
peatmoss
They’ve got some infrastructure such as “Packrat” that helps a little. You can
at least be sure you have the correct version of each dependency stored away
with your sources.

You still need the right interpreter version. And of course if you have any
exotic dependencies, you’d need to figure out what to do there.

I’d say the major difference is one of disposition. The R community seems more
on board with the premise that this is a problem worthy of solving. I suspect
that that may be a function of the R community having a longer / deeper
academic history than the Python community.

EDIT: I should mention that the R community maintains some documentation about
the issue as well.
[https://cran.r-project.org/web/views/ReproducibleResearch.ht...](https://cran.r-project.org/web/views/ReproducibleResearch.html)

------
anc84
(2015)

~~~
eesmith
? The date given is 2017-11-16. The blog started in 2015.

~~~
anc84
Sorry, must have mixed up dates in my head. You are correct.

