
Python C API, PyPy and the road into the future - mattip
http://lostinjit.blogspot.com/2015/11/python-c-api-pypy-and-road-into-future.html
======
binarycrusader
I strongly disagree with the author's assertions that the need for better C
API support is strictly a need for integrating with "legacy applications".

The fact of the matter is that a vast array of systems-level and
high-performance software is only available via a C API. New or old, that's
still true today.

I've used ctypes and I've used cffi; I'm glad they exist, but authors of
popular packages often don't have the luxury of supporting both of them and
the CPython API.

If by "legacy" they're attempting to paint the CPython API itself as the
legacy interface, I also find that a misleading argument at best. PyPy is not
yet the official future of Python, and attempting to paint themselves as the
successor without any clear indication of support for that future from the
Python leadership seems odd.

I'm all for them getting financial support they need to do the work, but
attempting to justify that work by claiming only "legacy" applications need it
seems uninformed at best.

~~~
Animats
_" PyPy is not yet the official future of Python."_

We're past that point. PyPy is the future. CPython is the past. The PyPy team
succeeded in making an optimizing compiler for a language that fought them
every step of the way with gratuitous hidden dynamism. That's a considerable
achievement. It extends the life of Python by making it competitive on speed.

Go exists mostly because Python was too slow. Google used to use Python quite
a bit internally, but their effort to speed it up, Unladen Swallow, was a
disaster. That provided some of the motivation for Go.

~~~
Demiurge
> We're past that point. PyPy is the future. CPython is the past.

CPython is the past and the present; the future you can't predict. However,
if you try to extrapolate, you should perhaps consider as much historical
context as possible. For instance, why is Python what it is? Is it speed? I
think it is the accessibility of the language's syntax, design and features.
CPython is the base of that language evolution, and PyPy improves only speed.
So I would extrapolate that CPython will always be more popular.

Go exists because someone at Google wanted to make a better C and C++: a
statically typed language. It doesn't have much to do with Python. Google
always preferred C++ and Java to Python because of static typing, not just
because of speed.

Overall, I think it's a mistake to fixate so much on speed of execution when,
oftentimes, speed of development is considered more important. This niche is
never going away, no matter how hard some people hammer square pegs into
round holes.

~~~
BuckRogers
>Overall, I think it's a mistake to fixate so much on speed of execution
when, oftentimes, speed of development is considered more important. This
niche is never going away, no matter how hard some people hammer square pegs
into round holes.

That's the point of PyPy. You get fast development speed and fast CPU
performance. The best of both worlds. That's why we use it and not CPython.
It's already bigger than you think.

~~~
Demiurge
>That's the point of PyPy. You get fast development speed and fast CPU
performance. The best of both worlds. That's why we use it and not CPython.
It's already bigger than you think.

What do you use it for, where you need the speed and don't have a C-based
module to rely on? I don't mind having a bit more speed for 'free', but every
time I've tried, PyPy hasn't been hassle-free due to some module-compatibility
issue, and yet pure Python code has _never_ been a bottleneck for me. I have
been using Python for 10+ years.

Anyway, the above question is for curiosity's sake; it doesn't change the
point that CPython is where new language features are added. If Guido adopted
PyPy tomorrow, I would be happy, but otherwise it will always be trying to
catch up, so I don't see how it can be the future.

~~~
BuckRogers
>the above question is for curiosity's sake; it doesn't change the point that
CPython is where new language features are added.

So far. Everything changes the moment the PyPy team announces they're an
official continuation of Python2. They can easily integrate more 3rd party
backported features from 3->2 into PyPy4. Guido doesn't really matter there.

That said, it doesn't take that to get me to use PyPy. The speed improvement
is a dream come true. Writing pure Python is far preferable to me to writing
a C extension, so PyPy is the present and future for most of us.

~~~
Demiurge
Well, that would be kind of hostile of PyPy; I really don't see that
happening.

I'm still quite curious: what is your use case for PyPy, where it is such a
godsend? I don't enjoy C, but it's just been so rare that I've had to Cython
or C anything; things are quite well optimized in the ecosystem.

~~~
BuckRogers
Why would it be hostile? There are tons of Python 2 users out there like me
who need support and a secure path forward for Python and for our code.

PyPy didn't create that situation; CPython did. They're just filling a market
need. It may be opportunistic, but it's hardly hostile. If anything was
hostile, it was CPython 3, but I'd say neither was a "hostile" move.

Everyone should always act in their own best interests. Especially users like
myself. The CPython team did what they felt was in theirs. I do what I feel is
mine. PyPy should do the same, not be fearful of some toxic, "hostile"
accusation.

The godsend is that I don't have to worry about CPU performance being a
limiting factor at all. That's a big deal and reduces my hardware needs to do
the same amount of work. My VPS is hardly overpowered, and I like it that way
because it's cheap for my projects. :) CPython only exacerbates hardware
issues.

It's very nice to have a simple interpreter to fall back on, but at this point
in time I think most dynamic languages need to be on a JIT like PyPy. It's
just too good and I can't wait for the STM branch to be merged into PyPy4. No
more GIL for those of us using it.

------
tych0
I have nothing but good things to say about the pypy/cffi team. I have been
maintaining a project using it for about a year and a half, and we've had
great experiences reporting bugs to and getting help from Armin (the
maintainer of cffi, a pypy guy).

Distributing cffi libraries that depend on each other can still be somewhat of
a snarl, but last I looked, a number of remedies for this were being
discussed.

Chapeau to the team!

------
BuckRogers
I've been using PyPy as my main development platform for quite some time now.
So for me, if I were to move to Python 3, it would have to be Python 3.2,
since that's all PyPy3 supports. I've also been told that PyPy3 is not
production-ready, while PyPy4 is, so it's a no-brainer for me to develop for
PyPy4 and fall back to CPython 2.7 if I run into problems.

Fingers crossed, but so far my home business is running off pure PyPy4. No
C-extensions, no interpreters. :)

------
jboy
This post highlights an interesting dichotomy in the Python scientific
computing community. Everyone knows that PyPy runs faster than CPython for
many common tasks [1].

[1] PyPy is on average 7x faster than CPython:
[http://speed.pypy.org/](http://speed.pypy.org/)

But those in the Python community who are serious about scientific computing
(or image processing, like my startup) are already using Numpy & Scipy, which
provide hand-coded C implementations of most matrix-related operations.
Everyone knows that Python for-loops are "slow" [2], and storing a large 2-D
matrix as a list-of-list-of-Python-int-object would require a huge amount of
memory & indirection. So, Numpy offers an N-dimensional array type,
implemented in C: C arrays of densely-packed C primitive types, with for-loops
in C to iterate over the matrix elements. Then Scipy builds a lot of Matlab-
like functionality as modules on top of this fundamental Numpy array type.

[2] Python for-loops are slow: [https://jakevdp.github.io/blog/2014/05/09/why-
python-is-slow...](https://jakevdp.github.io/blog/2014/05/09/why-python-is-
slow/)

So the most expensive operations in a Python number-crunching program are
likely already implemented using Numpy & Scipy operations, which run in
compiled C (and additionally, often make use of Blas/Atlas/LAPACK/etc, for
even greater speedups in sustained number-crunching).
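
To make that concrete, a minimal sketch (the function and variable names are
illustrative, not from any of the linked projects): summing the squares of a
million floats with a Python-level loop over a list of boxed float objects,
versus one vectorized Numpy call whose loop runs in C over densely-packed
doubles.

    import timeit

    import numpy as np

    data_list = [float(i) for i in range(1000000)]
    data_array = np.arange(1000000, dtype=np.float64)

    def sum_of_squares_python(xs):
        # One bytecode dispatch & one boxed float object per element.
        total = 0.0
        for x in xs:
            total += x * x
        return total

    def sum_of_squares_numpy(arr):
        # The loop runs in compiled C (np.dot may also dispatch to BLAS).
        return np.dot(arr, arr)

    print(timeit.timeit(lambda: sum_of_squares_python(data_list), number=10))
    print(timeit.timeit(lambda: sum_of_squares_numpy(data_array), number=10))

On a typical machine the Numpy version wins by around two orders of
magnitude, for exactly the reasons above: no per-element boxing, no
Python-level loop.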

But unfortunately, Numpy & PyPy do not naturally work together. Being written
in C, Numpy makes substantial use of the CPython C-API -- and in fact, Numpy
provides its _own_ C-API [3]! The official Numpy package doesn't work with
PyPy; the PyPy project very thoughtfully provides its own PyPy-compatible
Numpy package [4].

[3] Numpy C-API:
[http://docs.scipy.org/doc/numpy-1.10.0/reference/c-api.html](http://docs.scipy.org/doc/numpy-1.10.0/reference/c-api.html)

[4] PyPy-compatible Numpy package: [http://pypy.org/download.html#installing-
numpy](http://pypy.org/download.html#installing-numpy)

Furthermore, Numpy is fantastic, but it can't offer all possible permutations
of matrix operations. In particular, there are certain image-processing
operations that are awkward (and thus, computationally-inefficient) to express
using Numpy operations. So you might ultimately need to go to the Numpy C-API
anyway.
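
As a hypothetical illustration (a toy example, not from our codebase): a
first-order recursive filter, where each output element depends on the
previous output element, has no single Numpy operation for the recurrence, so
a straightforward implementation drops back to a Python-level loop.

    import numpy as np

    def first_order_filter(x, alpha):
        # y[i] = alpha * x[i] + (1 - alpha) * y[i-1]
        # The loop-carried dependency on y[i-1] defeats vectorization.
        y = np.empty_like(x)
        y[0] = x[0]
        for i in range(1, len(x)):
            y[i] = alpha * x[i] + (1.0 - alpha) * y[i - 1]
        return y

(Scipy's `lfilter` happens to cover this particular recurrence, but many
image-processing recurrences have no ready-made compiled kernel, and that's
when you reach for the C-API.)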

This is why we created (and, just a few days ago, open-sourced) Pymod:
[https://github.com/jboy/nim-pymod](https://github.com/jboy/nim-pymod)

Pymod is a Nim+Python project that auto-generates all the Python C-API
boilerplate & auto-compiles a Python C extension module that wraps the
functions in a Nim module. Pymod enables us to write our Numpy array-
processing code in Nim, then compile it (for C++-like speeds) as a well-
behaved Python module. Nim made this very easy, because it compiles to C.

After considering our Python-integration options (CPython C-API, `ctypes` and
`cffi`), we decided to go with the CPython C-API & Numpy C-API. We explained
this decision in greater detail in the "Implementation details" section of the
Pymod README [5]. The executive summary: `ctypes` seems better suited to
wrapping C types in Python than to exposing existing Python types in C, while
the CPython C-API code could be generated & compiled along with the C code
that Nim was going to produce anyway.

[5] Pymod implementation details: [https://github.com/jboy/nim-
pymod#implementation-details](https://github.com/jboy/nim-
pymod#implementation-details)
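
For contrast, here's the direction `ctypes` _is_ well suited to -- wrapping an
existing C library for use from Python. A minimal sketch (the library lookup
is platform-dependent; this is the Unix spelling):

    import ctypes
    import ctypes.util

    # Load the C math library and declare the signature of cos().
    libm = ctypes.CDLL(ctypes.util.find_library("m"))
    libm.cos.argtypes = [ctypes.c_double]
    libm.cos.restype = ctypes.c_double

    print(libm.cos(0.0))  # 1.0

Going the other way -- constructing & returning Numpy-aware Python objects
from inside generated C code -- is exactly what the CPython & Numpy C-APIs are
for, which is why we chose them.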

That said, we would be delighted for Pymod-produced Python modules to be able
to run under PyPy. We've been strongly considering implementing a `cffi` back-
end for Pymod, but this won't necessarily solve the Numpy issue. It would be
even better if PyPy could support all the CPython C-API extension modules in
the world in one fell swoop.

~~~
p4wnc6
Can you comment on the difference between Pymod and Cython? Cython's pretty
mature now and it's straight up amazing how easy it is to generate cross-
platform extension modules. It even exposes a lot of C++ too.

Cython is a separate programming language that allows you to write C code with
a special Python-like syntax, or write Python with Python syntax (including
NumPy), and has a few extra type annotation bits here and there (like array
syntax or pointer syntax). At the end, it creates the equivalent C code (with
all needed CPython API boilerplate) and compiles it into an importable shared
object file.

Just from the description above, it sounds superficially the same as Pymod; is
it fair to say Pymod is to Nim what Cython is to C/C++? I'd be very interested
to know more about how the two tools compare and contrast.

~~~
jboy
Sure, this is a question that people ask quite frequently. I most recently
answered it here:
[https://news.ycombinator.com/item?id=10569768](https://news.ycombinator.com/item?id=10569768)

"""Actually, Pymod was designed to be almost an anti-Cython. :)

My issue with Cython is that it's a limited sub-language within Python, where
you add Cython elements incrementally & iteratively (diverging from Python in
the process) until the code runs "fast enough". I'd rather work directly in a
full-featured, internally-self-consistent language from the start. Nim has a
clean Pythonic syntax, with all the best parts of C++ (including its runtime
speed).

Hence, Pymod takes the form of an `exportpy` annotation (a user-defined Nim
pragma) onto existing Nim functions, which are then auto-compiled into a
Python extension module. So there's no gradual divergence of my Python code
(as it becomes more "Cythonized"); rather, the high-performance code is
written directly in pure Nim. :)"""

There are a few more details in that thread comparing the wrapping of existing
C libraries in Cython vs Pymod. (It doesn't seem right to copy-paste an entire
thread...)

~~~
p4wnc6
I'm not sure I appreciate that criticism of Cython. A general strategy in
Cython is to use the annotation feature (cython -a) and visually inspect how
much CPython API interaction each line of Cython source requires. I've
generally found this process to be really enlightening. Of course you can use
that information to select portions of the code that don't need to involve
the CPython API, add typed constructs to those parts of the Cython source,
and iterate. But you can also learn a lot about how CPython works ... for
example, how using the 'and' keyword can invoke a long chain of Python
special functions with lots of type-checking overhead.

What this lets me do is to be extremely fine-grained about my optimizations,
or conversely, to also see when some optimizations are not worth it because
they don't help much but they do hurt the readability, or cause too much
Python-to-Cython divergence as you put it.

In a lot of cases, I prefer that this is left up to me, rather than having
Cython make hand-mapped choices about which Python things compile to which C
or machine-code things. Taken to the limit, a Cython that did that would just
become what PyPy is, except ahead-of-time compiled instead of JIT compiled.

But I do see the benefit of both approaches. Sometimes you don't want the
burden of choosing your annotations to induce the desired compilation effect,
and you don't want to allow for similar but not identical Cython source files
to result in dramatically different C code, as can often happen currently.

~~~
jboy
The preferred workflow that you've described seems to be a (more considered)
form of the standard Cython workflow that I see described:

(1) Write Python. (2) Compile with Cython. (3) Run compiled Cython, profile &
review. (4) Consider what Cython annotations to make to code; make code
changes. (5) Goto 2.

The emphasis is always on iteration & incremental additions.

Of course I practice iterative & incremental development, and of course I'll
prototype a quick proof-of-concept implementation first (often in
Python+Numpy) before profiling & algorithmic optimization. But the Cython
workflow seems to me to add more iteration (of incremental Cython syntax
additions) than is really necessary. When I'm working to implement some
algorithm, I don't really want to iteratively learn how Cython or CPython
implement various Python functions; I'd rather write my "proper version" code
just once, properly the first time.

So why not take it to the logical extreme and write it all in Cython-lang from
the start? If we're writing for-keeps code in a language with Pythonic syntax
& static types, I find Nim-lang a more expressive, more full-featured language
than Cython-lang for general-purpose uses (with features such as generics &
type-safe Lisp-like macros in particular; I note that Cython _does_ support
pointers & operator overloading), without being very different at all in
simple uses.

For example, there is an example `primes` function in the Cython tutorial:
[http://docs.cython.org/src/tutorial/cython_tutorial.html#pri...](http://docs.cython.org/src/tutorial/cython_tutorial.html#primes)

Here is an equivalent implementation in Nim. As you can see, it's really
almost identical in syntax:

    
    
      proc primes(kmax: int): seq[int] =
        var kmax = kmax
        var n, k, i: int
        var p: array[1000, int]
        result = @[]
        if kmax > 1000:
          kmax = 1000
        k = 0
        n = 2
        while k < kmax:
          i = 0
          while i < k and n mod p[i] != 0:
            i = i + 1
          if i == k:
            p[k] = n
            k = k + 1
            result.add(n)
          n = n + 1
        return result
    

All of this said, I understand that a great deal of this decision comes down
to personal preference: Would you rather start with Python & then iteratively
diverge? Would you rather start & stay in Nim? And I can also see the benefit
of both approaches in different circumstances. :)

~~~
porker
> The emphasis is always on iteration & incremental additions.

> All of this said, I understand that a great deal of this decision comes down
> to personal preference: Would you rather start with Python & then iteratively
> diverge?

When doing scientific research, I'd prefer to start with Python and then
iteratively diverge. The aim is to get my task done as quickly as possible (so
I can go write those 15 journal articles due last Monday) and go just as far
as is needed. It's the scientist/engineer vs programmer mindset difference.

However, if the code was done and I knew I'd need to use it again, I like the
idea of rewriting it from scratch in Nim - not that I or any of my scientific
colleagues know Nim (only one has even heard of it).

Anyway, I look forward to keeping an eye on your work and maybe using it in
the future :)

------
_RPM
One of the great things about Python is the C API and how well documented it
is.

~~~
aardvark179
APIs like that are both good and bad: because they let you do almost
anything, they make many optimisations either hard or impossible, and they
have a nasty habit of exposing implementation details which constrain future
changes to the language implementation.

Even APIs such as JNI, which do their best to abstract over such details,
tend to block a JIT's view of what is going on and end up being a real pain.

------
ericfrederich
Personally I'd be in favor of a more general solution... Some kind of
cffi/CPython bridge.

------
tomrod
I'm afraid I'm not up to speed on pypy. What is the benefit here?

~~~
x0x0
Speed. In many common cases, PyPy is significantly faster than the CPython
runtime. Unfortunately, the Python C API was not designed with a JIT in mind.

~~~
p4wnc6
But as Numba has shown, it's achievable to write a CPython-to-LLVM-IR compiler
for subsets of the language, then add NumPy-awareness, and add multiple
targets like GPUs, and achieve better-than-PyPy speed ups jitting pure Python
(CPython) code.
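
For a sense of what that looks like in practice, a minimal Numba sketch (the
function is a toy example; `@jit(nopython=True)` is the Numba-0.21-era
spelling of the "nopython mode" mentioned later in this thread):

    import numpy as np
    from numba import jit

    @jit(nopython=True)  # compile to LLVM IR; error out rather than fall back to CPython
    def array_sum(a):
        total = 0.0
        for i in range(a.shape[0]):
            total += a[i]
        return total

    x = np.arange(1000000, dtype=np.float64)
    print(array_sum(x))  # compiled on first call for this type signature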

I think it's an interesting question whether PyPy "is the future", or whether
it's rather something more like Numba, or even a whole ecosystem of
specialized variants of Numba, tweaked or tuned to take advantage of different
hardware optimizations or to work better for different problem domains (e.g.
image processing vs. network programming).

In a lot of cases where PyPy improves over some standard CPython code, it's
exactly the sort of code you really wouldn't care about optimizing in the
context of a larger application. In any sort of scientific application, the
pure CPython parts of it are usually small and insignificant compared to
what's written in NumPy/SciPy/pandas and science libraries written on top of
those, and Numba can be seen as the first "specializing compiler" for this
sort of application -- with a lot of room for people to write other kinds of
specializing compilers that compile different genres of Python code for
domain-specific purposes, the way that Numba compiles generally scientific
Python for numerical computing purposes.

~~~
Veedrac
> But as Numba has shown, it's achievable to write a CPython-to-LLVM-IR
> compiler for subsets of the language, then add NumPy-awareness, and add
> multiple targets like GPUs, and achieve better-than-PyPy speed ups jitting
> pure Python (CPython) code.

Other than "to-LLVM" and "targets like GPUs", jitpy does exactly that for
PyPy-in-CPython.

[http://jitpy.readthedocs.org/en/latest/](http://jitpy.readthedocs.org/en/latest/)

It's not quite as fast for numeric code (though it's close _and_ conformant)
but it's also not a language subset nor does it degrade to CPython performance
on dynamic parts of the language. It should be a terrific option if you're
able to section out discrete workloads that CPython's struggling with, even in
non-numeric code.

Few people know of it though.

~~~
p4wnc6
Thanks for the link to JitPy. I will definitely read more about it. Just from
the first parts of the docs, though, it looks very limited compared with
Numba. In Numba, there is a distinction between 'python mode' and 'nopython
mode' \-- meaning that even at the final LLVM IR emission, if Numba is forced
to punt on type inference (e.g. because an untyped Python list object is an
argument or something), Numba can still fall back directly to the CPython API,
and in general even this has speed benefits as a lot of intermediate calls or
intermediate variables _can_ be lowered or typed.

So even though JitPy won't "degrade to CPython performance", it's slightly
moot since you can only handle a limited set of types in JitPy. I guess one
area where JitPy should win in theory is if there is a dynamic Python object
inside of the function, like a dynamic list. Then JitPy is OK since it's not
part of the signature, and when it can't optimize due to indeterminate type
inside the function, the fallback will be PyPy instead of CPython, and hence
should be faster.

But Numba offers a lot on the array computing side, which is what it is
specialized for. For example, from the Numba docs <
[http://numba.pydata.org/numba-
doc/0.21.0/developer/architect...](http://numba.pydata.org/numba-
doc/0.21.0/developer/architecture.html#stage-5-rewrite-typed-ir) >:

> Numba implements a user-extensible rewriting pass that reads and possibly
> rewrites Numba IR. This pass’s purpose is to perform any high-level
> optimizations that still require, or could at least benefit from, Numba IR
> type information.

> One example of a problem domain that isn’t as easily optimized once lowered
> is the domain of multidimensional array operations. When Numba lowers an
> array operation, Numba treats the operation like a full ufunc kernel. During
> lowering a single array operation, Numba generates an inline broadcasting
> loop that creates a new result array. Then Numba generates an application
> loop that applies the operator over the array inputs. Recognizing and
> rewriting these loops once they are lowered into LLVM is hard, if not
> impossible.

> An example pair of optimizations in the domain of array operators is loop
> fusion and shortcut deforestation. When the optimizer recognizes that the
> output of one array operator is being fed into another array operator, and
> only to that array operator, it can fuse the two loops into a single loop.
> The optimizer can further eliminate the temporary array allocated for the
> initial operation by directly feeding the result of the first operation into
> the second, skipping the store and load to the intermediate array. This
> elimination is known as shortcut deforestation. Numba currently uses the
> rewrite pass to implement these array optimizations. For more information,
> please consult the “Case study: Array Expressions” subsection, later in this
> document.
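
To make the quoted optimization concrete, a toy sketch (mine, not from the
Numba docs): in the array expression below, a naive lowering would allocate a
temporary array for `b * c` in one loop and then add `a` in a second loop;
the rewrite pass can fuse these into a single loop with no intermediate array.

    import numpy as np
    from numba import jit

    @jit(nopython=True)
    def fused_expression(a, b, c):
        # Candidate for loop fusion + shortcut deforestation:
        # one pass over the inputs, no temporary for (b * c).
        return a + b * c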

One reason why I see the Numba route as more effective than the PyPy route is
that these kinds of highly specific optimizations seem likely to occur all
over the place, and to be closely tied to the genre of computing you are
doing. One person may not want Numba to perform these optimizations because
loop fusion isn't important to their codebase, which doesn't do a lot of
array computing. Instead, they might want some kind of domain-specific
optimization that assists with kernel bypass for some low-latency algorithm.

This could all be exposed in Numba, but as separate domain-specific sub-
modules or via user-defined IR optimization passes or something. Or it could
exist as totally separate JIT compiler projects, of which JitPy/PyPy's JIT
compiler and Numba's JIT compiler would each just be genre-specific examples.

I tend to see this as the more likely future: apart from cases where someone
is really just taking a bunch of pure Python code and running it with the
PyPy interpreter instead of the CPython interpreter (which I forecast will be
a rare case), the difference between PyPy and CPython won't matter nearly as
much as the difference between domain-specific JIT compilers.

~~~
Veedrac
I agree with everything you've said. jitpy is not so much competing in the
numeric space (although it's not far from it), since Numba is as you say
really good there.

You do need to be able to build an abstraction boundary, such that complex
classes never cross it, but nothing prevents you from instantiating a PyPy
class and communicating with it through function calls.

You'll probably want some way to share references across them, but that should
be as simple as having a global mapping and a `Ref` class that cleans it up on
destruction. Should be a few dozen lines of code.
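
A minimal sketch of that scheme (all names hypothetical): the owning side
keeps objects in a global registry, only an opaque integer handle crosses the
boundary, and a `Ref` wrapper frees the registry slot when it is collected.

    import itertools

    _registry = {}                  # handle -> object; lives on the owning side
    _handles = itertools.count()

    def register(obj):
        # Store obj on this side of the boundary; return its handle.
        handle = next(_handles)
        _registry[handle] = obj
        return handle

    class Ref(object):
        # Opaque cross-boundary reference; only the integer crosses.
        def __init__(self, handle):
            self.handle = handle

        def resolve(self):
            return _registry[self.handle]

        def __del__(self):
            # Release the registry slot when the last Ref is collected.
            _registry.pop(self.handle, None)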

