
Pythran: Crossing the Python Frontier [pdf] - serge-ss-paille
https://www.computer.org/csdl/mags/cs/2018/02/mcs2018020083.pdf
======
eljost
Interesting article, but just skimming through it, some things stand out
immediately: 1.) The first snippet isn't even valid Python code, as floats
don't have a shape attribute.

    
    
      s = 0. 
      n = s.shape
    

2.) The inline latex math isn't rendered properly.

~~~
rlayton2
I believe the input should be a numpy array of floats which has a shape
attribute

~~~
serge-ss-paille
(shameful author here)

One should read

    
    
      def rosen_explicit_loop(x): 
        s = 0. 
        n = x.shape[0] 
        for i in range(0, n - 1): 
          s += 100. * (x[i + 1] - x[i] ** 2.) ** 2. + (1 - x[i]) ** 2 
        return s
    

(edited)

~~~
chestervonwinch
...

    
    
        n = x.shape[0]

~~~
quietbritishjim
Or just len(x). This works perfectly well on numpy arrays and has the bonus
that it works on regular lists/tuples of floats, so the first snippet doesn't
rely on numpy.
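
For concreteness, a minimal sketch of the corrected snippet using len(x) as
suggested; with this change it runs on plain lists/tuples as well as numpy
arrays, no numpy import required:

```python
def rosen_explicit_loop(x):
    # Rosenbrock function with an explicit loop; len(x) works for
    # lists, tuples, and 1-D numpy arrays alike.
    s = 0.
    n = len(x)
    for i in range(0, n - 1):
        s += 100. * (x[i + 1] - x[i] ** 2.) ** 2. + (1 - x[i]) ** 2
    return s
```

E.g. `rosen_explicit_loop([1.0, 1.0, 1.0])` gives `0.0`, the function's minimum.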

------
accurrent
Link to the actual software: [https://github.com/serge-sans-
paille/pythran](https://github.com/serge-sans-paille/pythran)

~~~
emmelaich
> _" Pythran is an ahead of time compiler for a subset of the Python language,
> with a focus on scientific computing."_

and

> _a claimless python to c++ converter_

~~~
serge-ss-paille
(main dev writing) Don't focus too much on the _claimless_ :-) There are
still far more widespread tools that solve the same kind of issues (Numba,
Cython, Julia)...

The idea of the change was that it's more important to convey the problem it
solves rather than how it's done ;-)

~~~
UncleEntity
Nice project, I've wanted to do something similar using my libjit python
bindings but never seem to work up the gumption.

 _Simple and Effective Type Check Removal through Lazy Basic Block Versioning_
[0] could be adapted to make it unnecessary to compile more than one version
of the function, at a (probably) minor cost in performance. It's geared more
towards jitted code, but something similar could work here: compile
specialized blocks where the types matter, and fall back to interpreted code
(or let the Python interpreter throw an exception) when users pass in
unexpected types.

[0][http://drops.dagstuhl.de/opus/volltexte/2015/5219/pdf/9.pdf](http://drops.dagstuhl.de/opus/volltexte/2015/5219/pdf/9.pdf)
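
A function-level caricature of that idea (my sketch, far cruder than the
paper's basic-block granularity; all names are made up): keep type-specialized
versions in a cache and fall back to a generic path for unexpected types.

```python
# Pretend these lambdas are natively compiled, type-specialized versions.
_specialized = {
    float: lambda x: x * x,   # "compiled" float version
    int:   lambda x: x * x,   # "compiled" int version
}

def square(x):
    impl = _specialized.get(type(x))
    if impl is not None:
        return impl(x)        # fast, type-specialized path
    return x * x              # generic fallback ("stay in the interpreter")
```

Unexpected types (here, complex) just take the slow generic path instead of
requiring a new compiled version.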

------
targafarian
The author's claims about Numba seem a little unkind, if not wrong, to me.

Numba can handle vectorized code (numpy-style operations) directly, in
addition to explicit loops. The former is accelerated less in comparison to
plain Python calling numpy (if you can use numpy operations directly, it's
already really fast), but the numpy bits can also be automatically
parallelized by Numba. Explicit loops in Numba are accelerated hundreds of
times over Python loops (and you can use e.g. prange to write parallel loops,
too). The point is, the two paradigms can be mixed and matched at will within
Numba.
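
A hedged sketch of that point (the function and data are made up, not from
the article): an explicit loop behind Numba's @njit, with a pure-Python
fallback so the snippet still runs if numba isn't installed.

```python
try:
    from numba import njit
except ImportError:
    # No-op stand-in decorator so the sketch runs without numba installed.
    def njit(func=None, **kwargs):
        if func is None:
            return lambda f: f
        return func

@njit
def sum_of_squares(x):
    # Explicit loop: the kind of code Numba compiles to fast native loops.
    # numpy-style vectorized expressions could be mixed in freely.
    s = 0.0
    for i in range(len(x)):
        s += x[i] * x[i]
    return s
```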

It seems like every example is of Cython, but then the author generalizes the
conclusions to Numba as well. It would be much more "honest" to show a side-
by-side comparison of Numba, Cython, and Pythran, since these all have
different syntaxes and are fairly different tools.

Another example is that you don't have to rewrite functions for different
argument types in Numba, but you do in Cython (see "convolve_laplacian"
example, which can work with a simple decorator as a numba function). There
again, the impression is given that Numba suffers from the same issue as
Cython (and as mentioned elsewhere in the comments here, it's possible that
Cython has a way around this, but I don't know the details).

------
AnimalMuppet
Off topic, but this seems like as good a place as any to ask: It's my
impression that numpy is really good. Is it as good as Fortran? That is, if I
have a large, sparse, complex matrix, Fortran will have an efficient solver
for it that will also be numerically stable, and will have four decades of use
to find any weaknesses. Is numpy equivalent (except for the four decades
part)? Is it close? Or does it just cover the basic cases well, and for the
specializations you're on your own?

~~~
zb
NumPy is designed to work with SciPy, which is a wrapper for literally the
same 4 decade old Fortran libraries (LAPACK &c.) that you're referring to
here.

~~~
gnufx
Note that LAPACK isn't four decades old (and probably should be superseded
anyway). You perhaps won't want to use large-scale numerical libraries that
are that old unless they've had a lot of development and been well
parallelized.

One issue with mixed-language programming that we learned decades ago is how
hard debugging typically is across the language interface, along with general
tool support. I recently asked the local Python expert about HPC-style
profiling of Python calling C(++) libraries, for instance. (I couldn't make
TAU work.)

------
geoalchimista
> _" As a matter of comparison, Cython does not support principles 1, 2, or 3
> and has optional support for 4."_

For _2 Type agnosticism_ , this can be emulated with a "fused type" in Cython.
See this example:
[http://cython.readthedocs.io/en/latest/src/userguide/numpy_t...](http://cython.readthedocs.io/en/latest/src/userguide/numpy_tutorial.html#more-
generic-code)

But I think the major inconvenience with Cython vectorization is really not
about `float32` and `float64`. You get `float64` NumPy array by default from
floating-point calculations. The actual inconvenience is that the vectorized
function cannot take a scalar input like the NumPy ones. To remain
polymorphic, I usually have to perform an `is_scalar` check on the input in a
Python wrapper before sending the input data to the Cython function.
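
A hedged sketch of that wrapper pattern (names are illustrative; the "kernel"
here is a plain Python stand-in for the compiled Cython function):

```python
from numbers import Number

def _kernel(values):
    # Stand-in for the compiled, vectorized Cython function,
    # which only accepts sequences, never a bare scalar.
    return [v * 2.0 for v in values]

def double(x):
    # Thin Python shim that keeps the API polymorphic:
    # scalars are wrapped, computed, and unwrapped.
    if isinstance(x, Number):
        return _kernel([x])[0]
    return _kernel(x)
```

So `double(3.0)` and `double([1.0, 2.0])` both work, mimicking numpy's
scalar/array polymorphism.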

~~~
targafarian
Note that the alternative in Numba is incredibly convenient: you trivially
write a function whose body operates on a scalar, and the @vectorize
decorator turns it into a numpy ufunc automatically. This generalizes the
function to operate equally well on scalars or numpy arrays of any
dimensionality (just like "built-in" numpy functions do).

Oh, and if you set target='gpu', your function also works on GPUs. Or you can
use target='parallel' to make it parallelize automatically across CPU cores.
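
A hedged sketch of that pattern (the function is made up; a pure-Python
stand-in is used if numba isn't installed, so the snippet still runs):

```python
try:
    from numba import vectorize

    @vectorize(["float64(float64)"])
    def relu(x):
        # Scalar body; @vectorize broadcasts it into a numpy ufunc.
        return x if x > 0.0 else 0.0
except ImportError:
    def relu(x):
        # Pure-Python elementwise stand-in for illustration only.
        try:
            return type(x)(v if v > 0.0 else 0.0 for v in x)
        except TypeError:  # scalar input
            return x if x > 0.0 else 0.0
```

The same `relu` then accepts a bare float or a whole array.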

------
gnufx
When Python was announced, the three main things I remember striking dynamic
languages people (apart from using a bastardized offside rule) were weird
scope rules, lack of proper GC, and that it appeared to be designed
particularly to preclude efficient implementation. We've subsequently seen the
huge amount of effort that's been devoted to different ways of working around
the implementation issue.

------
fermigier
Talk from PyParis 2015:
[https://www.youtube.com/watch?v=Af8B30mXZ7E](https://www.youtube.com/watch?v=Af8B30mXZ7E)

Slides:
[https://fr.slideshare.net/PoleSystematicParisRegion/track-32...](https://fr.slideshare.net/PoleSystematicParisRegion/track-32-serge-
guelton-et-pierrick-brunet)

~~~
icebraining
Also from this year's FOSDEM:
[https://fosdem.org/2018/schedule/event/pythran/](https://fosdem.org/2018/schedule/event/pythran/)

------
dilawar
I wonder why the author did not compare performance against PyPy? PyPy is a
JIT compiler, after all.

~~~
dec0dedab0de
I was going to say that most scientific python libraries use numpy, but a
quick google shows that pypy supports numpy now.

------
joshsyn
Isn't this solved by Julia? I think the scientific community should use a
more functional language rather than a language like Python, tbh.

~~~
hprotagonist
just as soon as someone who knows C/C++ ports numpy and scipy and pandas. and
gensim and nltk and sounddevice. and tensorflow and scikit-learn and keras....

~~~
ChrisRackauckas
That was done quite a while ago? Julia now has a bunch of unique stuff Python
doesn't have, because the basics are already done.

~~~
joshuamorton
Such as?

~~~
ChrisRackauckas
The iterative linear solvers from IterativeSolvers.jl along with the
preconditioner ecosystem is more expansive and uses genericness to have a lot
more functionality (it's all able to be used with matrix-free operators, GPUs
and Xeon Phis, arbitrary precision number choices along with complex,
quaternions, etc.). The differential equations solvers from
DifferentialEquations.jl covers a lot more domains than the stuff you'll find
in SciPy+PyDSTool (SDEs, DDEs, DAEs, semi-linear ODEs via exponential
integrators, IMEX, etc.). The dynamical systems library DynamicalSystems.jl is
one of a kind. QuantumOptics.jl is not only faster than QuTIP but it also
covers more areas like stochastic Schrodinger. And JuMP for mathematical
programming (optimization) is also very good in comparison to Pyomo.
Scientific computing's core is linear algebra, optimization, and diffeqs and
right there you have the basics plus some widespread applications.

I agree that Python has a library advantage in data science + ML. R has a
library advantage in the area of statistics. But Julia has quite a few
advantages in the core math areas of scientific computing and algorithm
development. There is headway being made into DS+ML as well. Julia's
pandas/dataframe equivalent is JuliaDB which adds out-of-core and online stats
functionality, so it's more at the level of pandas+dask. Flux.jl is still in
its early stages but it's quite a unique ML framework which can directly
incorporate any Julia function at any level, and then has some working
experiments with compiling to things like JS and XLA.

But in the broad view of things, every language covers SciPy+NumPy pretty
satisfactorily (ex: Julia's Base library has most of it, and the top 20
packages cover the rest), but from there each has tradeoffs in what areas the
community is specializing in.

~~~
joshuamorton
Cool! To be clear, I realize that could have come across as an accusatory
"prove it!"; it wasn't meant that way. I've just never really needed anything
not available in the SciPy ecosystem, except for some obscure statistical
methods (or not that obscure, but with a non-terrible API) that I don't think
are available in Julia either.

~~~
ChrisRackauckas
You'd have to go to R for those esoteric stats packages :). Yeah, I think that
what's available in each of the languages is kind of unknown until people need
it. It would be hard to catalog it all too in a way that's both accessible and
comprehensive.

------
HerrMonnezza
Does anyone know how this compares to existing Python-to-C++ transpilers like
Cython or Shedskin?

~~~
Arkanosis
Cython is a bit different from CPython / Pythran / Shedskin in that you need
to learn the Cython language, which is a Python-ish programming language, but
not Python.

Shedskin and Pythran look somewhat similar to me (disclaimer: I've
contributed quite a bit to Shedskin but have never used Pythran so far),
except that in Shedskin you don't even need annotations like you do with
Pythran (the downside being that the only fine control you have over the
native types used is through the transpiler options). Also, Shedskin
development is not very active these days — to say the least — and there's
zero support for Python 3, while Pythran is under fairly active development
and has beta support for Python 3.
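
To illustrate the annotation difference (a hedged sketch: `smooth` and its
signature are made up, but `#pythran export` is Pythran's actual mechanism):
the export line is an ordinary comment to CPython, so the file stays plain,
runnable Python.

```python
# Pythran reads this comment to know which signature(s) to compile;
# CPython simply ignores it.
#pythran export smooth(float list)

def smooth(xs):
    # Simple 3-point moving average over the interior points.
    return [(xs[i - 1] + xs[i] + xs[i + 1]) / 3.0
            for i in range(1, len(xs) - 1)]
```

The same source runs unmodified under the interpreter or compiled by Pythran.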

If you're interested in Python / native implementations, you might be
interested by Nuitka as well: [http://nuitka.net/](http://nuitka.net/)

~~~
HerrMonnezza
Thanks for your answer!

However, I think this might be a bit misleading for people who do not know
Cython:

> the Cython language, which is a Python-ish programming language, but not
> Python.

Actually, [http://cython.org/](http://cython.org/) states:

""" The Cython language is a superset of the Python language that additionally
supports calling C functions and declaring C types on variables and class
attributes. """

In my experience with Cython, this description is quite accurate: code can be
annotated with C types and then compiled to efficient C code by Cython; if you
don't use annotations, then you can still compile to C code but with less
speed advantage.

I haven't used Cython since version 0.17 (quite old now) but IIRC the major
drawback was that it was mainly targeting writing extension modules for
Python; it could generate self-standing executables, but would still require a
Python interpreter to be embedded in any compiled code (that was the price for
seamless interoperability between Cython/compiled code and "pure Python"
code).

