Something I don't see being mentioned in the comments:
What's a really frustrating part of working with the C API?
Setting up the compile/link flags!
The python3-config script generally works well but is only available at the OS level.
But you don't want to mess with that (e.g., to access pip-installed packages).
Beyond that, everything is a mess!
python3 -m venv doesn't even bother creating such a script.
anaconda/miniconda? Don't even try!
So every package pollutes its build scripts with many hardcoded `python3 -c "import sys; print..."` calls.
I've opened a CPython PR that may help a bit by adding a `--json` flag to `python3 -m sysconfig` [0]
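For reference, here's a sketch of the kind of thing those hardcoded calls end up computing. The stdlib `sysconfig` module works the same inside a venv or a conda env; the config vars below are POSIX-flavored (Windows layouts differ):

```
import sysconfig

# roughly what python3-config --cflags / --ldflags boil down to (POSIX)
cflags = "-I" + sysconfig.get_path("include")
ldflags = "-L{} -lpython{}".format(
    sysconfig.get_config_var("LIBDIR"),
    sysconfig.get_config_var("LDVERSION"),
)
print(cflags)
print(ldflags)
```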
Very happy to see that these issues are getting attention now. I think that the Python language being so centered on one implementation is a long-term threat to its success. Web servers, command-line programs, and embedded devices have different requirements: high post-warmup throughput, fast startup, low memory usage. They aren't necessarily best served by the same implementation. If this project succeeds in replacing Python's C API with something that doesn't expose implementation details, such as whether the implementation uses reference counting, that could make it easier both to maintain alternative implementations, and to experiment with new techniques in CPython.
Is my understanding correct that this would provide version-agnostic Python bindings? Currently, I am building my bindings separately for each version (e.g., building and linking with Python 3.7, 3.8, etc.). While automated, it still makes CI/CD take quite a long time.
As others have said, this has been supported since the limited/stable APIs were introduced. What this adds is a way of implementing a Python extension that can be loaded in (not just compiled for, which is already an improvement!) different Python implementations, namely CPython, PyPy, and GraalVM.
But it is very limited. Understandably so, as they don't want to ossify the internal APIs, but it is still so limited that you can't actually build anything using just that API, as far as I know.
Woah! Okay, that's very cool. I thought it was much more limited than that (for the stable ABI). Awesome!
It seems like they mostly use normal Python as a bridge to the Rust codebase. So from what I've seen on their repo, they mostly do not use any CPython APIs (apart from a few wrappers, I think). Which makes sense!
That makes sense: I would assume polars mostly converts from Python to Rust at the edges, then works in Rust internally.
Though I’ve not really looked at the details, I’d assume most of the missing stuff would be “intimate” APIs of builtin types, and all the macros leveraging implementation details.
I wouldn't recommend ccache (or sccache) in CI unless you really need it. They are not 100% reliable, and any time you save from caching will be more than lost debugging the weird failures you get when they go wrong.
You can't cache based on the file contents alone. You also need to key the cache on all the OS/compiler queries/variables/settings that the preprocessor depends on, since the header files might generate completely different content depending on which ifdef gets triggered.
And that’s not impossible, just tedious. One tricky (and often unimportant) part is negative dependencies—when the build depends on the fact that a header or library cannot be found in a particular directory on a search path (which happens all the time, if you think about it). As far as I know, no compilers will cooperate with you on this, so build systems that try to get this right have to trace the compiler’s system calls to be sure (Tup does something like this) or completely control and hash absolutely everything that the compiler could possibly see (Nix and IIUC Bazel).
It’s not about that, that’s not relevant to ccache at all. (And yes, C23 does have __has_include, though not a lot of compilers have C23 yet.) It’s about having potentially conflicting headers in the source file’s directory, in your -I directories, and in your /usr/include directories.
Suppose a previous compile correctly resolved <libfoo.h> to /usr/include/libfoo.h, and that file remains unchanged, but since that time you’ve installed a private build of libfoo such that a new compile would instead resolve that to ~/.local/include/libfoo.h. What you want is to record not just that your compile opened /usr/include/libfoo.h (“positive dependencies” you get with -MD et al.), but that it tried $GITHOME/include/libfoo.h, ~/.local/include/libfoo.h, etc. before that and failed (“negative dependencies”), so that if any of those appear later you can force a recompile.
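To make that bookkeeping concrete, here's a toy sketch of the idea; the search path and the `resolve` helper are hypothetical, since no real compiler exposes this:

```
import os

# hypothetical include search path, highest priority first
SEARCH_PATH = [
    os.path.expandvars("$GITHOME/include"),
    os.path.expanduser("~/.local/include"),
    "/usr/include",
]

def resolve(header):
    """Walk the search path the way a compiler resolves an #include,
    recording every miss (negative dependency) before the hit (positive)."""
    misses = []
    for directory in SEARCH_PATH:
        candidate = os.path.join(directory, header)
        if os.path.exists(candidate):
            return candidate, misses
        misses.append(candidate)
    return None, misses

hit, misses = resolve("libfoo.h")
# A cached compile is only safe to reuse if `hit` is unchanged AND every
# path in `misses` still does not exist (e.g. no ~/.local/include/libfoo.h
# has appeared since); -MD-style dep files record only `hit`.
```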
Oh yeah, that can cause lots of weird problems. I've run into that sort of issue a lot when cross-compiling, because often you have a system copy of a library and a different version for the target; that can be a real pain.
Please read the documentation before dispensing uninformed advice like this -- it works using the output of the preprocessor and, optionally, file paths.
Why are you so skeptical? Think about how it works and then you'll understand that cache invalidation bugs are completely inevitable. Hell, cache invalidation is notoriously difficult to get right even when you aren't building it on top of a complex tool that was never designed for aggressive caching.
It would be interesting to see benchmarks comparing HPy extensions to equivalent Cython/pybind11 implementations in terms of performance and development time.
I'm a little unclear as to how this fits in with libraries like PyBind11 or nanobind? It seems like those libraries would need to be rewritten (or new libraries with the same goals created) in order to use this in the same way?
Yeah, sure, I mean, how many people write C to build an end-user Python module? There's stuff that genuinely wraps C libraries or predates higher-level language wrappers, like numpy or matplotlib, but how many new modules are actually themselves written in C?
The point is that’s not relevant; the issue is the API/ABI of the modules, their requirements, and their limitations, not the language in which the modules are written.
I would guess also that HPy would replace the includes of `Python.h` that pybind11 et al make in order to bind to CPython, and so existing extensions should be easier to port?
A lot. You don't have to write in C, just use the C-API functions. pybind etc. introduce a whole new set of problems, with new version issues and decreased debuggability.
First of all, cool to see some activity on this front!
I’ve written a fair share of pure CPython bindings and regularly post about implementing them with minimal overhead (<https://ashvardanian.com/posts/discount-on-keyword-arguments...>) and would love to share a few recommendations, questions, and concerns :)
Just a suggestion to help you grow—I'd restructure the landing page (<https://hpyproject.org/>) and the README of the repo (<https://github.com/hpyproject/hpy>). It could benefit from some examples to clarify the "Nicer API" bullet point. Maybe these could be taken from the API documentation page (<https://docs.hpyproject.org/en/latest/api.html>). The page could also be more convincing with some supporting stats in favor of PyPy, GraalPython, and other Python runtimes. A reader like me might not be sure if they have enough usage and are stable enough.
Avoiding singletons and having encapsulated context objects like `HPyContext` is definitely a great thing to have, especially in the multi-threaded Python future or in complex environments with multiple sub-interpreters. But this doesn't really solve the problem if, under the hood, the `HPyContext` still redirects to CPython's singleton.
I've also looked at the linked benchmarks (<https://pypy.org/posts/2019/12/hpy-kick-off-sprint-report-18...>). They are from 2019, five years ago, and already mention CPython's `METH_FASTCALL` fast calling convention, but it seems like they were never compared against it. In any case, parsing arguments from one "ll" string specifier is hardly a detailed benchmark if the underlying magic isn't explained. I occasionally do one-off benchmarks as well, but it's better to describe the principle: why the thing is supposed to be faster. For example, if you're concerned about performance, you'd just parse the arguments directly from the tuple without string formatters, e.g. `PyTuple_GET_ITEM` plus `PyLong_AsLong` instead of `PyArg_ParseTuple(args, "ll", ...)`.
tangentially related question: is there something as simple as luajit's ffi for python? as in: give it a c header, load the shared library, it simply makes structs usable and functions callable.
It's a no-go at this point, if you want this on MS Windows. CGo on MS Windows uses MinGW, while CPython uses MSVC. It's very hard to make this work due to name mangling.
I.e. you can do this for Python from MSYS2, for example, but not for the one your users will likely have.
Unless it was done at the very beginning, I doubt it would even have been possible, because the current C API is a remnant of that very first public version.
Python has one of the most fractured development ecosystems of any moderately used language. I'm pretty convinced Python is a language that attracts poor development practices and magnifies them due to its flexibility. The people who love it don't understand the extreme flexibility makes it fragile at scale and are willing to put up with its annoyances in an almost Stockholm syndrome way.
I think any programming language with a lot of popularity attracts poor development practices, simply because a lot of programmers don't actually know the underlying processes of what they build. The flip side of this is that freedom and flexibility also give you a lot of control. Yes, it's very easy to write bad Python. In fact, it's probably one of Python's weaknesses, as you point out. If you're going to iterate over a bunch of elements, you probably expect your language's standard libraries to do it in an efficient way, and Python doesn't necessarily do that.

What you gain by this flexibility (and arguably sometimes poor design) is that it's also possible to write really good Python and tailor it exactly to your needs. I think Python scales rather well, in fact. Django is a good example, as it's a massive workhorse for a lot of the web (Instagram still uses its own version of it, as one example). It does so sort of anonymously, similar to how PHP and Ruby do it outside of the hype circle.
One of the advantages Python has, even when it's bad, is that it's often "good enough". 95% of the software which gets written is never really going to need to be extremely efficient. I would argue that in 2024 Go is actually the perfect combination of the good stuff from both Python and C. But those things aren't necessarily easy to get into if you're not familiar with memory management, (maybe) strict typing, explicit error handling, and the differences between an interpreted and a compiled language.
Anyway, I don't think Python is any more annoying than any other language. The freedom it gives you needs to be reigned in, and if you don't do that, you'll end up with a mess. A mess which is probably perfectly fine.
But CPython itself has poor development practices: for about 8 years, those in the inner circle have been able to modify anything and pose as experts while brutally squashing criticism.
Thanks! I'm not a native English speaker so I'll probably fuck this up again but it's nice to know. Reign would be cooler though. I guess I could say "you gotta reign your developers use of Python".
> most fractured development ecosystems of any moderately used language
Can you elaborate? What does Python do wrong that other "moderately used languages" do right?
For a start, C/C++ doesn't even have an official ecosystem. For Java or Golang, it looks better only because the "ecosystem" does not always include native extensions like cgo or JNI. Once you add them, the complexity is no better than Python's.
You have the Anaconda packaging world vs PyPI. You have pyproject.toml for project management, which is not supported by Anaconda or the flagship documentation generation tool: Sphinx. You have half a dozen package installers, none of which works to the full extent / all have different problems. You have plenty of ways to install Python, all of them suck. You have plenty of ways to do some common tasks, such as GUI, Web, automation: and all of them suck in different ways, without a hint of a unifying link. Similarly, you have an, allegedly, common relational database interface, but most commonly used SQL bindings don't use it. And the list goes on.
As I said, it's only because .so extensions were hard. If every package were pure Python, I would simply copy-paste them into my source tree's `lib` path.
Don't laugh at me; this is called "vendoring" or "static linking" by other languages, and `requests` famously included a version of urllib3 for quite a while.
Oh, but there's plenty more to Python packaging... unfortunately. You can put a ton of random stuff that's not Python modules into wheels: scripts, data, headers. Anaconda doesn't support most of that.
There is no fracture or "versus" here. You can pip install on top of Anaconda. Anaconda provides a more stringent solver and OS-level packages that some pip-level modules often depend on; it just solves the integration problem. I use both, including requirements.txt in my Anaconda env.yml, all the time.
> You have pyproject.toml for project management, which is not supported by Anaconda or the flagship documentation generation tool: Sphinx.
Again, Anaconda is not a "standard" Python thing, it is a replacement for building OS-level packages, such as GDAL, which is just a subset of Python modules. Anaconda does not need to support standard Python tooling, because those Python tools exist outside of Anaconda.
To simplify: for every Anaconda package, you can likely find it on PyPI, but not every PyPI package can be found in conda. Anaconda is not a competitor for PyPI; it does not need to replicate every PyPI feature.
> You have plenty of ways to install Python, all of them suck.
What does this actually mean? You install Python with all the major OS installation methods, and absolutely none of them suck, any more than installing anything else on that OS does. The standard ways are the Python Setup.exe, apt-get install, and brew install. Yes, you have additional options such as conda distros, yet what exactly sucks about them? Nothing.
> You have plenty of ways to do some common tasks, such as GUI, Web, automation: and all of them suck in different ways, without a hint of a unifying link.
I think I'm starting to get it. Everything sucks if you've been around long enough. Django is the vastly prevalent web framework. wxWidgets is standard, and there are bindings for most GUI toolkits. There are many toolkits; is it Python's fault they all got invented by different organizations? Is it an interpreted language's responsibility to provide a cross-platform GUI toolkit for you?
> Similarly, you have an, allegedly, common relational database interface, but most commonly used SQL bindings don't use it.
What are you even talking about? Who in the world cares about this? People use database-specific libraries, in every single language, because every database has its own set of features.
> And the list goes on.
Your list reeks of someone flinging critiques without even knowing what they’re talking about—just a lot of hot air fueled by emotional baggage, likely from some long-dead language you once cherished before it was mercifully abandoned.
This is what people believe when they don't know how it works: no, you cannot. But this isn't even the point. The point is that you have different tools that have no interop between them, nothing in common at all: conda-build and setuptools (and there are plenty of half-implemented Python packaging tools that cannot package native extensions).
> Again, Anaconda is not "standard" python thing
Python doesn't have a standard at all. Nothing is standard about any aspect of Python outside of marginal stuff like floating point or XML, etc. Anaconda is as legitimate as any other tool that works with Python. This is how it was intended. You probably wanted to say "not as popular as", which would be true, but Anaconda is also popular enough for this to be a problem.
> which is just a subset of Python modules.
Are you sure you know what Anaconda is? You make the opposite impression...
> Anaconda is not a competitor for PyPI
Anaconda is a competitor of PyPI. It literally provides its own package index (this is what P and I stand for in PyPI).
> Everything sucks if you've been around long enough.
Python sucks. Let's not extrapolate this to other things. Marriages, for example, usually don't suck if they've lasted long enough. I can think of a few more things that get better with time.
But Python is not a good language by any metric. It's also not unique in that respect, so I don't know why you'd draw so much attention to this fact. Good languages are rare; good and popular ones -- I'm yet to find one.
> Who in the world cares about this?
The parent poster of the post you replied to. But more broadly, common interfaces are important because they allow one to avoid vendor lock-in, lower maintenance costs, and reduce onboarding time for new developers.
> Your list reeks of someone flinging critiques without even knowing what they’re talking about
I don't care to name names. Python is garbage, and I never claimed otherwise. As for knowing my stuff... so far you seem to be that kind of guy. But keep going. Sometimes the urge to argue may lead you to read about the subject of your argument.
> This is what people believe when they don't know how it works: no, you cannot.
Yes, you literally can. Whatever you think the problem with this is, it's not a problem for me or the hundreds of people I've worked with who are doing this. So your problem is a niche nitpick, irrelevant to pretty much anyone.
> Python doesn't have a standard at all.
Again, no one cares, in a practical sense. When you install Python 3, you get pip; that's what "standard" means here, colloquially. You calling something "garbage" because of your desire for some kind of strict hierarchy of paperwork is your, very niche, personal problem that no one else cares about.
> Are you sure you know what Anaconda is? You make the opposite impression...
yeah, it's a snake. haha, get it?
> Anaconda is a competitor of PyPI. It literally provides its own package index (this is what P and I stand for in PyPI).
Yeah, two things having an overlap in functionality does not mean there is some kind of competition. It's just different tools solving similar, but different problems.
> Let's not extrapolate this to other things. Marriages, for example, usually don't suck if they've lasted long enough. I can think of a few more things that get better with time.
No, actually, the entire point of the saying went over your head. If something has been around long enough, it has accumulated both good and bad. Marriages, anything you can think of. The point is that you judge the whole thing, not just by how bad the bad part is. Python is solving a huge amount of technical problems, and for many of those problems, it sucks a lot less than PHP, Perl, or C. The sheer fact that it is popular makes it suck less than any language that is obscure, that no one else knows, except you. I'm sorry you have a favorite BNF grammar that is useless for getting actual things done in the real world.
> I don't care to name names. Python is garbage, and I never claimed otherwise. As for knowing my stuff... so far you seem to be that kind of guy. But keep going. Sometimes the urge to argue may lead you to read about the subject of your argument.
Whatever you think you know, by merely calling Python hot garbage you lose any intellectual authority and display emotional immaturity. Why not throw some names around? I can name Oberon; it's a better language. There are better-designed languages. It's all the rest of the rant that is complete nonsense, that shows someone who clearly doesn't understand why people use Python to do things like find new particles, Earth-like planets, or cancer-curing molecules, let alone build boring web apps. Anaconda works great, pip works great, Python is great to read and write, the library ecosystem is fantastic, it's the best language to get a lot of work done, and I've considered all the other choices on every project for some 20 years; Python has many times been the top choice.
> Sometimes the urge to argue may lead you to read about the subject of your argument.
This is not an argument; you're tilting at windmills, and I'm playing the internet, lol.
> Yes, you literally can. Whatever you think the problem with this is, it's not a problem for me or the hundreds of people I've worked with who are doing this.
Well, the number of people who don't know how something works isn't proof of anything... Also, "doesn't work" in this context means that the design of the feature is flawed and in corner cases cannot be made to work. This is different from, for example, a bug, which suggests that the design is fine but the implementation failed. It is also different from "doesn't work at all", but that would be too obviously false to be considered.
The reason it doesn't work is this: some Python packages are distributed as source distributions. There are plenty of reasons for that, which I don't want to go into. In this case, pip will try to build these packages unless you tell it not to (but then you won't be able to install them, which is probably not what you want). It doesn't matter whether you use pyproject.toml or any other description of the build system: pip will have to somehow find it and run it. And at this point, all bets are off. conda may improve detection of such cases and identify potential breakage caused by this, but there isn't a universal solution to this problem, and given the current position of the people working on Python infrastructure, there won't be one.
Conda is also unlikely to give in to pip, because conda's packages are a lot better designed than PyPI's. It would be a huge downgrade if they did. So I expect this to remain a problem for a long time.
> Yeah, two things having an overlap in functionality does not mean there is some kind of competition.
This is exactly what it means: overlap in functionality means competition.
> yeah, it's a snake. haha, get it?
This took an unexpected turn to... an elementary school?
Yeah, I’m sorry, you’re bringing this down below elementary-school levels of logical adequacy.
No, overlap in functionality is not the definition of competition. Mugs and cups have overlap in functionality, it doesn’t mean they’re competing in your cabinet. You can use either one, depending on the specific occasion or fancy.
You’ve rendered the following terms completely moot:
“Something works” and “something competes”
Arguing the sky is not blue because you can see infrared, which few people care about, is great entertainment, and I’m learning a lot about your vision, but you’re not going to enlighten anyone.
Python .pth files are horrific. Here's an actual .pth file I was dealing with the other day (from Google Cloud Storage) which completely prevents you from overriding the module using PYTHONPATH:
import sys, types, os;has_mfs = sys.version_info > (3, 5);p = os.path.join(sys._getframe(1).f_locals['sitedir'], *('google',));importlib = has_mfs and __import__('importlib.util');has_mfs and __import__('importlib.machinery');m = has_mfs and sys.modules.setdefault('google', importlib.util.module_from_spec(importlib.machinery.PathFinder.find_spec('google', [os.path.dirname(p)])));m = m or sys.modules.setdefault('google', types.ModuleType('google'));mp = (m or []) and m.__dict__.setdefault('__path__',[]);(p not in mp) and mp.append(p)
If .pth files are the worst thing you can find to complain about, Python's doing pretty well. That horrific .pth file is better laid at the feet of its creators than at the mechanism itself.
I searched all my machines and could only find simple ones, all related to some legacy crud for setuptools, one of which was a way to get eggs (which nobody should be using these days) to work.
They're barely a thing these days, mostly a relic from before the end of the Great Setuptools Stagnation, and the `site` module all but discourages their use.
It shows that the language is highly dynamic and you can patch anything? The .pth mechanism allows the party controlling the Python installation (site) to run some init code before any user code, basically an rc mechanism. Nothing more, nothing radical. Maybe you’re unhappy with the dynamism, in which case your complaint is misplaced.
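For contrast, a benign .pth file (hypothetical contents) is unremarkable. Plain lines are appended to sys.path, lines starting with `import` are executed at startup, and each executable statement must fit on a single line, which is exactly what produces monstrosities like the one quoted above:

```
# hypothetical mylib.pth dropped into site-packages ("#" lines are skipped)
/opt/mylib/src
import os; os.environ.setdefault("MYLIB_HOME", "/opt/mylib")
```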
> the people who love it don't understand the extreme flexibility makes it fragile at scale and are willing to put up with its annoyances in an almost Stockholm syndrome way
The people who love it understand that its extreme flexibility makes it applicable everywhere, while academic purity mostly doesn't work in the real world. They also prioritize getting things done over petty squabbling, but they know how to leverage available tooling where reliability is crucial.
The reason is popularity, not a technical one. With popularity, it's inevitable that different parties take an interest in improving different parts of the ecosystem.
After cpyext and cffi, this is the third attempt, largely driven by PyPy people, to get a C-API that people want to use.
If they succeed and keep the CPython "leaders" who ruined the development experience and social structure of CPython out of PyPy, PyPy might get interesting. If they don't keep them out, those "leaders" will merrily sink yet another project.
cffi is used to wrap C libraries. Only a masochist would use ctypes to wrap a whole library. While both are technically FFIs, it does not make sense to compare them. From a conceptual perspective, cffi was written to replace the C API for C modules.
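It's also about the closest thing Python has to the LuaJIT-style ffi asked about upthread. A minimal sketch of cffi's ABI mode; the library name assumes glibc on Linux:

```
from cffi import FFI

ffi = FFI()
# paste (a subset of) the C declarations, LuaJIT ffi.cdef style
ffi.cdef("double cos(double x);")
# load the shared library by name; "libm.so.6" assumes glibc on Linux
libm = ffi.dlopen("libm.so.6")
print(libm.cos(0.0))  # -> 1.0
```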
[0] https://github.com/python/cpython/pull/123318