
Inside cpyext: Why emulating CPython C API is so Hard - mattip
https://morepypy.blogspot.com/2018/09/inside-cpyext-why-emulating-cpython-c.html
======
haberman
There is a lot of detail here, but the higher-level takeaway is this: the
Python C API is a very "wide" API. It exposes lots of details of CPython: the
good, the bad, and the ugly. You get direct access to pointers to the
underlying objects. You manually manage reference counts.
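
To make "you manually manage reference counts" concrete, here is a minimal
sketch that pokes at the refcount machinery from Python itself instead of
from a C extension. It assumes a CPython interpreter (`ctypes.pythonapi` and
`sys.getrefcount` are CPython-specific), and the variable names are only
illustrative.

```python
import ctypes
import sys

# ctypes.pythonapi exposes the C API of the running CPython interpreter.
api = ctypes.pythonapi
api.Py_IncRef.argtypes = [ctypes.py_object]
api.Py_IncRef.restype = None
api.Py_DecRef.argtypes = [ctypes.py_object]
api.Py_DecRef.restype = None

obj = object()  # a fresh object, so nothing else holds references to it

before = sys.getrefcount(obj)
api.Py_IncRef(obj)            # take an extra reference, as C code would
after_incref = sys.getrefcount(obj)
api.Py_DecRef(obj)            # forgetting this call would leak the object
after_decref = sys.getrefcount(obj)

print(before, after_incref, after_decref)
```

Every extension written against "Python.h" has to keep such increments and
decrements balanced by hand, which is part of what an alternative
implementation with a different memory model has to emulate.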

For the polar opposite of this, consider the Lua API. You don't get pointers
to any VM data structures except a single pointer to the "Lua state". You do
not perform any manual memory management.

Lua's approach has yielded amazing results. LuaJIT is not only source-
compatible with pre-existing Lua 5.1 extensions, it is _binary compatible_
with them. You can take a .so that you built before LuaJIT ever existed and it
will work with LuaJIT _without recompiling_. This is astounding to me.

Moral of the story: keep your interfaces as narrow and encapsulated as
possible!

~~~
typomatic
Worth noting that the PyPy article links to a whole site devoted to the
troubles and rehabilitation of the CPython API:
[https://pythoncapi.readthedocs.io/](https://pythoncapi.readthedocs.io/) .
It's really exciting to see these backend concerns being addressed with depth
and care, especially against the backdrop of syntactic sugar PEPs blowing up
uglily.

~~~
antt
PyPy is one of the few projects that are doing good work, communicating their
work well and making the right strategic decisions. At this rate I won't be
surprised if python4 is just PyPy.

~~~
sitkack
If the PyPy team made something that was compatible with both Python2 and
Python3, that Python would be _that_ Python.

------
klibertp
> for a total 2-3 person-months of work.

In a year. In a project which could put Python next to JS, for the last pain-
point that prevents it.

The community of Python - one of the top 10 most popular languages - and all
its industrial users, including some of the most successful companies on the
planet, can afford to put 3 person-months of work into that feature.

There has to be something else at play here that I'm missing. Well, other than
missing that "donate" button for a tad too long...

~~~
rtpg
There are very few commercial sponsors for Python and related projects. Django
has enough resources for basically one full-time paid contributor, but that’s
it. Huge amounts of Python infra is maintained in people’s free time, despite
powering so much.

------
quotemstr
Right. It's exactly this sort of coupling between the runtime and extension
modules that prompted me to adopt a very conservative design for the Emacs
extension module interface, which tries as hard as possible to isolate
extension modules and abstract over VM implementation details.

I didn't quite get as far as I wanted, since the module system still relies on
conservative stack scanning to find C-extension GC roots (because everyone
else wasn't sold on JNI-like explicit local references), but it's still much
more tightly specified than the Python API.

The Python extension API has another problem: it relies on FILE* and other
assorted bits of the C runtime. That's mostly okay on POSIX-y systems where
it's common for a whole process to share one C runtime, but on Windows, where
different modules can come with different C runtime versions, this kind of
leaking of objects across an interface boundary really hurts.

------
robmccoll
It seems like it would be easier and more performant (begin heresy and likely
vast oversimplification) for PyPy to have a mostly complete implementation of
CPython that it interacts with at the object level such that each object in
PyPy could be a PyPy GC object descended from W_root or a CPython RefCount
object implemented on PyObject*. There would be many points where you would
still have to convert to make operations work. Either way, running under the
assumption of "objects from CPython or objects that are passed to CPython are
more likely to interact with other CPython objects, so let's not convert them
back until we have to" might result in closer to CPython performance for code
with a lot of CPython extensions.

Does this seem reasonable? Is this even possible? I don't know much about the
internals of PyPy...

~~~
antt
PyPy in general has much better performance than CPython.

------
faragon
In contrast, calling C from Python, even using callbacks, is very easy and
incredibly useful, thanks to "ctypes" [1]

[1]
[https://docs.python.org/2/library/ctypes.html](https://docs.python.org/2/library/ctypes.html)
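
As a rough illustration of calling C with a Python callback, the sketch below
hands a Python comparison function to the C library's `qsort`. It assumes a
POSIX-like system where the C library can be loaded through
`ctypes.util.find_library("c")`; the names `py_cmp`, `values` and
`sorted_values` are made up for this example.

```python
import ctypes
import ctypes.util

# Load the C runtime; on POSIX a None result still opens the current process.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# C signature: int (*compar)(const int *, const int *)
CMPFUNC = ctypes.CFUNCTYPE(
    ctypes.c_int, ctypes.POINTER(ctypes.c_int), ctypes.POINTER(ctypes.c_int)
)


def py_cmp(a, b):
    # a and b are pointers to the two ints being compared.
    return (a[0] > b[0]) - (a[0] < b[0])


libc.qsort.argtypes = [
    ctypes.POINTER(ctypes.c_int),  # base of the array
    ctypes.c_size_t,               # number of elements
    ctypes.c_size_t,               # size of one element
    CMPFUNC,                       # comparison callback
]
libc.qsort.restype = None

values = (ctypes.c_int * 5)(5, 1, 7, 33, 99)
callback = CMPFUNC(py_cmp)  # keep a reference while C may call it
libc.qsort(values, len(values), ctypes.sizeof(ctypes.c_int), callback)

sorted_values = list(values)
print(sorted_values)
```

No compilation step and no "Python.h" are involved; the C side only ever
sees plain C ints and a function pointer.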

~~~
simias
True, but that's not particularly impressive: most non-sandboxed scripting
languages have a straightforward way to call C. It's just a lot simpler that
way: C doesn't have much of a runtime, so you don't have to worry about
corrupting C's state too much. There's no garbage collector in C, for instance.

------
mattip
It would be nice if funding came through to make C extensions fast on PyPy;
then we could try out many of the what-if scenarios described in other comments:

What if we wrote that in pure Python instead? What if we moved the computation
to a GPU C extension? What if we used a different GC strategy?

------
sitkack
I have been saying this for 8+ years, maybe longer. The CPython "Python.h"
should be viewed as deprecated and folks should be using cffi [1] for all of
their native code gluing needs. By targeting cffi, one gets future-proofed
multi-runtime extensions for free. "Python.h" ties one to a specific
implementation.

[1]
[https://cffi.readthedocs.io/en/latest/](https://cffi.readthedocs.io/en/latest/)

------
bratao
I started making PyPy the default interpreter for my projects, and you should
too!

Maybe I'm a sucker for the underdog, but in my mind, PyPy could save the
Python ecosystem from irrelevance. People are looking at Rust and Go with the
excuse of performance, and they are now the new hip languages. Even Ruby is
catching up.

Without an answer, Python could be side-tracked in an increasing number of
scenarios.

------
Pxtl
Seriously, this surface area should've been closed off when they broke
backwards-compatibility with the Python 2 to Python 3 jump.

------
aportnoy
Can anyone point to an introduction to CPython internals? My goal is to write
a Python extension in C that manipulates Python objects.

~~~
Too
Try boost-python, SWIG or cython instead. They give you a nicer C++ API and
automatically handle ref-counting, exception handling and other things that
are easy to forget.

[https://www.boost.org/doc/libs/1_61_0/libs/python/doc/html/t...](https://www.boost.org/doc/libs/1_61_0/libs/python/doc/html/tutorial/index.html#tutorial.quickstart)

~~~
eesmith
How is the Boost/PyPy integration? Sure, there's cpyext, so in principle a
Boost-generated extension can be used by pypy.

Are there any changes which would make Boost-based extensions better
integrated/supported by PyPy?

The linked-to document only talks about Cython and cffi.

~~~
mattip
There is cppyy, which is cffi for C++. Boost python is not being maintained;
in the CPython world pybind11 is more popular, but cppyy is PyPy-friendly.

~~~
eesmith
Boost isn't being maintained? I did not know that. One of the projects I use
often - F/OSS but I'm a user, not core developer - uses Boost for C++/Python
integration. They chose it many years ago. This topic has never come up on the
mailing list or user group meetings. I suspect that since Boost "just works",
no one has cared to re-evaluate that decision.

Thanks for the pointers to what's going on in the C++/python integration
layer. I'll experiment with it.

------
ezoe
For the healthy improvement of a programming language, it is vital to have
multiple independent and competing implementations. Python doesn't allow that.

Because of this, I think Python will suffer the same fate as Perl did.

