Hacker News
Hello, HPy (hpyproject.org)
345 points by milliams 9 months ago | 52 comments



This looks like the holy grail of extension development, effectively a Python runtime HAL, it's wildly exciting. Getting anything to work optimally on PyPy and CPython more or less means writing it twice. CFFI overhead on CPython really sucks for anything where the runtime of the wrapped function does not eclipse the overhead of the binding. HPy looks like a fantastic approach.

Only complaint from a quick scan -- carrying over that horrid PyArg_Parse() API. It's huge, needs varargs, isn't type safe, repeatedly rebuilds strings to look them up in kwargs, and isn't even all that convenient to use. Also, if you haven't dealt with a SEGV due to PyArg_Parse() before, you probably have never tried to write an extension.

PyPy will become about 1000x more practical if something like this ever makes it into Cython. That would make it easy to bring over huge swathes of existing extensions like lxml.


We plan to introduce something similar to CPython's Argument Clinic eventually. I.e., you will be able to declare the C signature of your impl function, and HPy will generate the argument parsing code automatically.

One nice effect of such an API is that under certain conditions, implementations will be allowed to bypass argument parsing entirely: e.g., if I have a function which takes two C longs, the PyPy JIT will be able to emit a call directly to it, without having to box/unbox the arguments just to do the call, as happens right now.

On the other hand, we will still need to provide an API similar to the existing PyArg_ParseTuple, because we want to make it easy to migrate existing extensions.


I'm not familiar with the Python runtime, but PyArg_Parse sounds very much like a PHP C API function (zend_parse_parameters) which has the same purpose and problems. Several years ago, an alternative API was added where you did one function call (actually macro) per argument. It has worse code size but better performance. Maybe Python can do the same?


We have this same problem in Ruby - implementing the Ruby C extension API if you aren't exactly the same as the reference Ruby implementation is extremely challenging. I think Charlie Nutter proposed something like this for Ruby in the past https://github.com/headius/xni.


> We have this same problem in Ruby - implementing the Ruby C extension API if you aren't exactly the same as the reference Ruby implementation is extremely challenging.

Isn't that (plus cross-interpreter native extensions) exactly what the FFI [0] gem provides? (It was first implemented for Rubinius, then JRuby, and now works on pretty much every Ruby interpreter that matters except Opal.)

AFAIK, FFI has been the recommended way of writing C-exts for Ruby for some time, as FFI exts are both cleaner to write and portable across Ruby implementations.

[0] https://github.com/ffi/ffi


It'd be great if people used FFI, or the standard built-in equivalent, Fiddle, but they generally don't, so it doesn't really practically solve the problem at the moment. I think there's something like half a billion lines of C extension code using the traditional API out there.

However, note that FFI isn't really applicable to one very common use-case of C extensions anyway - increased performance by reaching through abstractions. See fast_blank for example. Can you reimplement that effectively using a cross-platform FFI? I'm not sure you can.

The FFI also adds a bit of runtime overhead for each call, which you may not want.


Only support extensions in Wasm? Change the build tooling (I don't use Ruby, but I'm sure it has an equivalent) so that setup.py builds the C extension into a Wasm module.

Or grab the fn that the FFI points to and lightly decompile and inline into the interpreter?

Compile the typed subset of dynamic language of the day into Wasm and inline that into the interpreter?

I am sure something here is the Holy Grail of native-ish extension mechanism.


GraalPython and TruffleRuby are trying this exact idea, but with LLVM bitcode rather than WASM.


I've always used Cython to make an extension. Speed improvements have always been awesome. I'm not sure when I'd want to use HPy over Cython. What are your opinions on this, please?


I think the idea is for cython to move away from generating CPython C API code to generating hpy code in the long run. From the point of view of cython users this won’t make much difference.


hello, OP of the blog post here. Yes, the plan is to write a Cython backend at some point, so that all Cython extensions will benefit from HPy automatically.

The main Cython developer (Stefan Behnel) was present during the very first few meetings where we started to talk about HPy and gave positive feedback on that.


One obvious improvement here is the use of `HPyContext`. This allows the Python interpreter state to be encapsulated rather than globally-accessible, which would enable multiple interpreters in a single process.

Lua’s C API is generally considered to be well-designed, and does something similar with `lua_State`. How does the HPy API compare to the Lua API more generally?


> enable multiple interpreters in a single process.

CPython (the default interpreter) has had support for multiple interpreters in the same process since 1997, but it has only been exposed through the C API, not in the language itself. Python 3.10, coming out later this year, will expose multiple interpreters in one process (subinterpreters, see PEP 554 [1]).

I'm excited about what could eventually come out of this. If there is one GIL per interpreter, we could have something like the `multiprocessing` library for parallel execution, but all within one process.

1 https://www.python.org/dev/peps/pep-0554/


This is basically how the Threads extension [0] works for Tcl. Tcl has long supported creating many interpreters per thread. The Thread extension exposes some threading commands to the interpreter, which lets you create a new thread+interp. You can then send messages to that interpreter, or pass it file descriptors, but otherwise they are fairly isolated (apartment threading model).

[0] https://www.tcl-lang.org/man/tcl8.6/ThreadCmd/thread.htm


Can multiple Tcl interpreters run truly parallel on different threads in the same process? That’s possible with Lua but AFAIK not with Python, because one of the pieces of global Python interpreter state is the Global Interpreter Lock.

Encapsulated interpreter state via HPyContext would allow replacing the per-process lock with a per-interpreter lock, enabling such parallelism.


Yes, threads from the Thread extension are real OS threads, which each have a Tcl interp in them. Since they are separate interpreters, there's no impact from any kind of "per-interpreter lock".

Passing messages between Threads can be done asynchronously which does not block either thread.

Example Tcl program with threads

    package require Thread
    set tid [thread::create]
    thread::send -async $tid {
        while true { after 200; puts Jazz }
    }
    while true { after 100; puts Party }
This creates one thread which is in an infinite loop printing "Jazz" to stdout every 200ms, and the "main" thread which is in an infinite loop printing "Party" to stdout every 100ms.


Related but separate from the subinterpreters PEP, there is interest in moving from a global interpreter lock to one lock per subinterpreter. This would need to be in place before running Python interpreters truly in parallel within one process. There’s a great summary here: https://lwn.net/Articles/820424/


No. See "A Disclaimer about the GIL" right after the Abstract.


The big difference between Lua and Python C APIs is that Python values are exposed directly as first-class values (as pointers in CPython, and as handles in HPy). If one native function wants to call another native function and pass it a Python object, it can do so directly - the VM is none the wiser for it, except for reference count updates.

In the Lua API, all Lua values are bound to a stack, and C functions have to exchange values via that stack - there's no free-standing "Lua value" type in the API. Thus, the runtime is fully aware of all the objects that flow back and forth, even when one native Lua function is calling another native Lua function.


> In the Lua API, all Lua values are bound to a stack, and C functions have to exchange values via that stack - there's no free-standing "Lua value" type in the API.

That's not 100% correct. You can have values unbound from the stack; they're just typed, rather than there being one "value" type (lua_Number, lua_CFunction, etc.). Though bound functions do still use the stack for taking/returning values. (However, your C functions can work directly on the other value types.)


Correct me if I'm wrong, but doesn't this only work for value types? i.e. there's no way you can hold a reference to a Lua (not C) function, or a table, without the stack.


What you're looking for in that case is lua_topointer, which can turn any Lua userdata, table, thread, or function into a void pointer.


Not really. Looking at the doc for that function:

"There is no way to convert the pointer back to its original value. Typically this function is used only for debug information."

(In CPython, you can e.g. stash a PyObject* away in C globals.)


If what you want is a reference to any Lua object, that you can grab from somewhere random in C, that doesn't exist on the stack, then you'll be using Lua's registry.

Your first stop will be luaL_ref [0].

[0] https://www.lua.org/manual/5.3/manual.html#luaL_ref


Yep, that's exactly what I meant. It goes back to the same thing - all Lua objects exist strictly in the "Lua world" (be it the stack or the registry), and anything that wants to exchange them has to go through those mechanisms. In Python, you just get a PyObject*, and you can pass it around as much as you want - the only thing you have to take care of is to reference-count it properly.


A registry object in Lua is an integer in C, and you can pass it around as much as you want. However, you have to be careful, because it can be eliminated by the GC, in much the same way as a PyObject pointer.

The registry object really is just an address. It's the equivalent of: lua_State->top+LUA_REGISTRYINDEX[reference] in C.

There's no functional difference to a PyObject pointer.


I love that the debug mode catches missing refcount increments/decrement. I hate programming with manual reference counting. Looking forward to trying this out.


I know SWIG stopped being cool a long, long time ago, but it does still appear pretty easy to use [1], and has the advantage of being able to target multiple languages.

I get the feeling people don't use it mostly because of the "uncool" factor...

Would you use SWIG to extend python or other languages? Why or why not?

1: https://www.geeksforgeeks.org/wrapping-cc-python-using-swig-...


I used SWIG once a long time ago, for a project that did actually target multiple languages.

I got burned so badly that I ditched all efforts at automatic binding generation from then on. Debugging this thing was impossible (to me, a long, long time ago) and I ended up writing manual bindings to Python & MATLAB instead. Those two were the important targets anyhow, and it allowed me to really think through how I exposed & handed over the different chunks of memory being allocated in the native code.


AFAIK SWIG doesn't support PyPy (or IronPython, etc.), which is one of the core motivations for HPy.


Nice! I've been following HPy for a while, and I'd like to eventually implement it in RustPython[0] (the blockers to doing so are mostly on our part, not theirs). I'm hopeful that this will vastly improve compatibility for native modules for interpreters other than CPython - numpy seems almost impossible to get working in RustPython without something like HPy.

[0]: https://github.com/RustPython/RustPython


Anything that can improve the C extension experience is very welcome. The current APIs are kind of... rough.


My 2 cents as an implementer of an alternate Python implementation -- I am skeptical about this approach. We tried the "provide a limited API" approach in the past and found the following things:

- People actually use the "hard to implement" parts of the C API

- Moving things from macros to function calls can often be quite detrimental to performance

- C extensions have been co-optimized with the C API, so changing the C API will make things less optimized

- These "check if runtime debugging is enabled" checks are not free

My guess is that this will end up in a tough middle ground where extension writers will be faced with a tradeoff along the lines of making their extension 10% slower for 99% of their users in order to make things 2x faster for 1% of their users.


What is the Python implementation you are involved with?

I agree with your concerns; HPy has tried to address them since the beginning. Basically, there are two distinct compilation modes:

- CPython ABI: in this mode, things like HPy_Dup and HPy_Close are translated directly into Py_INCREF and Py_DECREF. The overhead is 0 both in theory and in practice; all the benchmarks we have run so far confirm this.

- HPy Universal ABI: in this mode, you introduce the indirection which makes things like the debug mode possible. Our benchmarks indicate a 5-10% overhead, which is in line with what you (and we :)) expected.

So, if you are an extension writer, you will be able to distribute both CPython-optimized and universal binaries.


Watch out people! If everyone adopts this, people can just start using any Python implementation they want with any extension module.


This could be big. A strong porting guide will go a long way. I wonder to what extent that could be automated?


We plan to write a porting guide and examples at some point, but we haven't yet, partly because we are still in the experimental phase and the API is still subject to change (although it changes less quickly nowadays).

In theory, most of the porting could be automated. The only thing which cannot is turning Py_INCREF/DECREF into HPy_Dup/HPy_Close: they are closely related, but HPy requires that you close each handle independently, unlike Python/C, where you can INCREF/DECREF the same PyObject* multiple times as long as the final refcount is correct.

For that, the debug mode will be very useful because it precisely tells you which handles you didn't close and which ones you closed multiple times.


Is cython still a thing? If so, what's the benefit of this over cython?


It still is, and Cython is great for accelerating critical Python code.

A C extension is far preferable when you want to code in C, either to write a new data type[1], or write a Python frontend to a C library[2] that is too complex to be well supported by simple FFI.

I think people use Cython more internally when they value the maintainability of "mostly Python" over the fact that it's slower than what native C would get them.

[1]: https://github.com/tobgu/pyrsistent

[2]: https://github.com/libgit2/pygit2


> over the fact that it's slower than what native C would get them.

Do you have a source for that? Cython code, in the context of "C-extended Python", is probably never slower than native C. There has been a lot of work over the years to make sure of that.

Now, if you want to compare it to "a C program/library in the wild", then we are not comparing apples to apples. The whole point of Cython is inserting C code into a Python module, to have (high performance) C code inside a Python environment.


My source is simply running `cython -a` to see all the highlighted lines, fixing them, and observing how many hoops I have to jump through to get code that doesn't interact with the interpreter.

Cython makes you do more work to avoid the interpreter. C makes you do more work to involve the interpreter.

> There has been a lot of work over the years to make sure of that.

Some cython code can compile close to the C code you'd write.

But that's not how it's likely to be used, because the design pushes you in the opposite direction by default, and the value of Cython is that it's far faster than vanilla Python, and the painless interop.


>Cython code, in the context of "C-extended-python" probably never is slower than native C

Huh? Cython tries, but it's not exactly optimized hand-coded C. And it works better for some things (e.g. numerical typed code) than others.


> A C extension is far preferable when you want to code in C

Or any other language that supports C FFI.


> If so, what's the benefit of this over cython?

It's not a "benefit over" thing, they're tools operating at completely different levels and thus not necessarily exclusive.

cython binds to the C API of CPython, meaning you've got the same problem loading a cython-compiled extension into alternative implementations as you have with other CPython extensions.

If HPy succeeds, Cython can (hopefully) grow an HPy backend and generate HPy bindings, somewhat transparently.


Kudos to the authors. This is solid infrastructure, if you get corporate users please ask for sponsorship!


If you are on C++ there is Boost.Python. It wraps the CPython API in a good way, with lots of nice stuff like automatic ref counting, function definitions, and data type conversions.


These days you should really use PyBind11.


Oh, it's great if you're binding C++. Template magic, even more magical than Boost - I didn't think it was possible, but it works really nicely out of the box. So nice it inspired libraries in other languages; see Ada-py-bind on GitHub. There's still some code to write by hand, but the abstraction is quite nice.


boost::python targets Python 2 only. At least it used to, and the docs don't mention anything about fixing this.

https://www.boost.org/doc/libs/1_72_0/libs/python/doc/html/t...

https://www.boost.org/doc/libs/1_72_0/libs/python/doc/html/b...


It works with Python 3 just fine.


[flagged]


Not sure what you mean - Smalltalk primitives are methods implemented by the language implementation. This statement is just about the fact that they provide an alternative API via an alternate header file.


this is unnecessarily dismissive - you could have said "smalltalk primitives are an interesting parallel to this idea" and people would have enjoyed following the pointer, rather than getting annoyed.



