

Performance of Python runtimes on a non-numeric scientific code - HerrMonnezza
http://arxiv.org/abs/1404.6388

======
mih
TL;DR - _Nuitka_ (which has been garnering some attention on HN today -
[https://news.ycombinator.com/item?id=8771925](https://news.ycombinator.com/item?id=8771925))
is not as fast as CPython, and slower still than PyPy, for the chosen scenario
(enumerating fatgraphs and computing their graph homology).

A year and a half after this paper was published (Apr 2013), Nuitka's main
strength seems to be the convenience of packaging code into a standalone
executable, which a number of readers have reported success with. If runtime
and memory usage matter more to you, you are better off sticking with the
interpreters for now.

~~~
mkesper
This should be re-done with up-to-date versions of all the programs. The
authors mention having opened bug reports and hitting existing known bugs. All
contestants should get time to react to those, shouldn't they?

~~~
andreasvc
While that would be interesting, I don't see it as the responsibility of the
author of a conference poster that has already been presented. It would make
more sense for the authors of the various Python implementations to take this
up as a benchmark.

------
andreasvc
This comparison is interesting in that it measures the performance of plain or
type-annotated Python code, but to get the full performance benefit of Cython
you would replace lists with C arrays, Python objects with C structs, etc.
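
To make that concrete, here is roughly the difference (a toy sketch of my own,
not code from the paper; both functions must be compiled with Cython):

    # "type-annotated" tier: the loop still manipulates boxed Python objects
    def total(list xs):
        cdef int i
        t = 0
        for i in range(len(xs)):
            t += xs[i]
        return t

    # full Cython: a typed memoryview and C locals, so the loop compiles
    # down to plain C arithmetic with no Python object handling
    def total_c(double[:] xs):
        cdef double t = 0.0
        cdef Py_ssize_t i
        for i in range(xs.shape[0]):
            t += xs[i]
        return t

The second version is where the order-of-magnitude speedups usually come from.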

------
TazeTSchnitzel
By all appearances, the alternatives are only negligibly faster than CPython?!

~~~
chrisseaton
Did you see that those graphs are logarithmic? PyPy looks 3x to 4x faster on
problems that run long enough for its JIT to warm up properly.
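
You can see the warm-up effect yourself by timing the same workload repeatedly
in one process (a toy sketch, unrelated to the paper's benchmark):

    import time

    def work(n):
        # a trivial hot loop standing in for real work
        total = 0
        for i in range(n):
            total += i % 7
        return total

    # on PyPy the first runs include tracing and JIT compilation;
    # later runs execute the compiled traces and are much faster
    for rep in range(5):
        t0 = time.time()
        work(10 ** 7)
        print("run %d: %.3fs" % (rep, time.time() - t0))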

------
reikonomusha
I didn't find this paper to be very good. While it talks lightly about the
relatively complex mathematical object it computes, it says little about
what's involved in the computation, beyond some very high-level keywords
("comprehensions", "object graphs").

What algorithms were used? What data structures? Was the code idiomatic? Was
there any effort to reduce things like allocation?
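
(By "reduce allocation" I mean things like the following toy example, which
has nothing to do with the paper's code:)

    # idiomatic: allocates a fresh list on every call
    def doubled(xs):
        return [2 * x for x in xs]

    # allocation-conscious: reuses a caller-provided buffer
    def doubled_into(xs, out):
        for i, x in enumerate(xs):
            out[i] = 2 * x
        return out

Whether the benchmarked code does anything like the latter matters a lot for a
memory comparison, and the paper doesn't say.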

Was homological computation the only test case? Even numerical benchmarks
typically come in a suite (a good sprinkling of linear algebraic computations,
tight straight line floating point programs, differential equation solvers,
various numerical simulators, ...), because one LAPACK function will not give
you the full picture.

This paper did not give me a very good understanding of how performant
non-numeric math (itself an extremely broad and general term) is on each
implementation.

~~~
dalke
It's a conference paper, from EuroSciPy 2013, distributed through a preprint
service. There's no expectation it will be a high quality paper. Instead, it's
an appropriate quality for where and how it was published.

"What algorithms were used"? The papers says "The code used to install the
software and run the experiments is available on GitHub at
[https://github.com/riccardomurri/python-runtimes-
shootout](https://github.com/riccardomurri/python-runtimes-shootout) "

Checking it now, it gives a reproducible way to download the specific packages
used, and the benchmark framework. The actual code benchmarked is fatghol,
from [https://code.google.com/p/fatghol/](https://code.google.com/p/fatghol/)
. There's also a link to a preprint describing the construction algorithm, at
[http://arxiv.org/pdf/1202.1820v2.pdf](http://arxiv.org/pdf/1202.1820v2.pdf) .

What you propose is an unrealistic expectation, and only possible for people
with lots of money and time.

Instead, in real life what happens is people do A, and publish A, then do B
(building on A), and publish B, then do C (building on B) and publish C.
There's a trail of work backing up the final publication. It makes no sense
for publication Q to revisit all of A-P, nor for the author to wait until Z
before finally publishing everything. I also think knowledge transfer would be
lower since someone interested in this paper's conclusions about the available
documentation for the different Pythons (EuroSciPy is not a graph theory
specialist conference) would almost certainly not be interested in the
algorithm generation details.

You do realize that LINPACK is the "gold standard" benchmark used to rank the
Top 500 supercomputers, right? And all it does is solve A x = b. In any case,
the performance suites like SPEC MPI still need to evaluate the individual
benchmarks before assembling them in a suite. Even if you require a suite for
something to be meaningful to you, this could be seen as a first step to
building such a meaningful suite.
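
In Python terms, the core of a LINPACK-style run is tiny (a sketch using
NumPy, just to show the scale of what is being measured):

    import numpy as np

    # the heart of a LINPACK-style benchmark: solve a dense
    # random system A x = b via LU factorization
    n = 2000
    A = np.random.rand(n, n)
    b = np.random.rand(n)
    x = np.linalg.solve(A, b)
    assert np.allclose(np.dot(A, x), b)  # sanity-check the residual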

It appears to me, therefore, that you are being needlessly harsh and critical.

~~~
reikonomusha
Thank you for the comments.

I don't know about EuroSciPy 2013. I guess from your comment that such a
conference does not require very high quality submissions.

It is typically not good style to simply say "here is a repository which
contains the benchmark code". That is necessary, but not sufficient.
(Although, I will say many papers do _not_ include any link to where code can
be found, so this was a distinct advantage of this paper.)

There's no need to regurgitate all previous work, but a bit more than a
reference is extremely beneficial to legibility and allows for emphasis on
particular aspects of what will be measured.

My problem with the paper is my answer to the following question, "What can I
conclude from the paper?" What I gathered was approximately the following:

1. The author has a library for computing homologies. The abstract method for
their computation is referenced in a (peer reviewed? published?) paper. The
library is linked to, though no particular version is mentioned. (Can we
really call it reproducible then?)

2. The author has given a very brief overview of the stages of the FatGHoL
program, two of which are relevant to the benchmark. The author does _not_
discuss the structure of the objects as implemented, so I must view it as a
black box, unless I read source code.

3. The author, in a few sentences, summarizes (but does not delve into) the
few very high-level data structures used.

4. The author spends the rest of the paper showing CPU time and memory
graphs.

5. The author draws conclusions from the data, with explanations that are
sometimes plausible.

There is no outline of what is actually being tested, except this black-box
library. There are no code samples, as you would typically see in a survey or
conference paper. As a reader, I've at best concluded, "a subset of some
version of FatGHoL has the following time and space measurements for a few
input parameters." Was this the conclusion the author wanted me to have?

But note the abstract says the tests "[are] an opportunity for every Python
runtime to prove its strength in optimization." Is this true? The author has
not even remotely convinced me that the code being run exercises the relevant
optimization capabilities.

I don't think adding one or two more pages discussing these things would have
cost the author excessive time or money.

The unfortunate bit is that elsewhere on HN and Reddit, people are now linking
to this paper as almost the definitive resource for comparing the performance
of Nuitka against other implementations.

Lastly, I do realize LINPACK is among the benchmarks used for supercomputers
(even though LAPACK would arguably be more appropriate, and sometimes is
used). I am very well aware of the details of the benchmark, having written an
equivalent version myself.

~~~
dalke
Quoting from the web site: "The annual EuroSciPy Conferences allows
participants from academic, commercial, and governmental organizations to:
showcase their latest Scientific Python projects, learn from skilled users and
developers, and collaborate on code development." It isn't a conference which
requires rigorous submissions.

You say "very high quality". I used "rigorous" because quality has many
dimensions. I believe people go to EuroSciPy in part to learn which other
tools exist, and to learn from the experience of others. This paper appears to
have that audience in mind. It's partially an experience paper, and discusses
things like available documentation and the stage of development of the tools
(eg, Falcon is in early development, and crashed on the test code).

If someone came to the conference interested in performance (which describes
most of the audience) but not in NumPy (a smaller group), then this is a
high-quality paper for this type of conference, guiding them on which Python
implementations to prioritize, even if the benchmark per se were ignored.

You quoted where the abstract said "an opportunity for every Python runtime to
prove its strength in optimization". I can see how that might be interpreted
as a very broad benchmark. But it earlier mentioned "Python library FatGHol
... moduli space of Riemann surfaces" and later says "This paper compares the
results and experiences from running FatGHol with different Python runtimes",
so I think you're reading too much into that quote.

My code is also non-numeric scientific code. It's extremely unlikely that I
would understand the algorithm in that code, or that its mix of instructions
would match my code's. I would skip the extra details as irrelevant to my
interests. Whereas the other points, like how Nuitka's claim that it
"create[s] the most efficient native code from this. This means to be fast
with the basic Python object handling." has at least one real-world
counter-example, and how PyPy can use a lot of memory, do affect how I weigh
the available options.

Do you seriously think that one or two more pages would have had a significant
effect on the comments on HN or Reddit? For that matter, I see eight comments
total on HN about the paper, including mine and your three. I don't see people
(on HN) regarding it as a 'definitive resource', only as a resource. I don't
read Reddit so I can't say what's going on there, but surely complaining here
about Reddit doesn't help.

Also, the paper was 4 1/2 pages long. You want the author to spend about 30%
more time writing the paper, which I think is excessive.

