Agree. I had a build pipeline pulling down an internal package to build into a `.pex` [0] file for running a Python program in a release pipeline that did not have access to the internal package repository. I came up with this solution because the typical virtualenv that you might build locally does not appreciate being copied wholesale to another system (iirc it had some absolute paths).
One day the release pipeline broke because a dependency's authors published a new version containing a wheel for a newer ABI, though the older ABI was still supported. My build pipeline pulled the one for the later ABI because it was compatible, but the release pipeline environment, which could not be upgraded, did not have the newer version of gcc required. It was a nasty introduction to the fact that one Python package version can have multiple published wheels targeting different C++ compilers!
But the worst part was not that I learned something new! It was that I wrangled with both pex and pip for a few hours trying to figure out how to download a wheel for an older ABI, since both tools resolve the latest compatible wheel by default. The options to do so are ostensibly there, but they didn't seem to have any effect. I definitely could have made some mistakes, but in my mind it shouldn't be _hard_ to get this right.
I ended up downloading the wheel for the internal package in the build and copying it directly into the artifact. The release pipeline resolved public dependencies and installed the internal package from the local file.
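For anyone hitting the same wall, the download step can be sketched roughly like this. The package name, version, and tag values below are made up for illustration; the pip flags themselves (`--only-binary`, `--platform`, `--abi`, `--python-version`, `--no-deps`) are real, and together they force pip to pick a wheel for the *target* environment rather than the newest one compatible with the build machine:

```python
# Sketch: build a `pip download` command pinned to a specific wheel.
# "internal-pkg" and the tag values in the usage below are hypothetical.
def pin_wheel_cmd(pkg, version, platform, abi, pyver, dest="wheels"):
    return [
        "pip", "download", f"{pkg}=={version}",
        "--only-binary", ":all:",   # wheels only, no sdist fallback
        "--platform", platform,     # e.g. manylinux2014_x86_64
        "--abi", abi,               # e.g. cp38
        "--python-version", pyver,  # e.g. 38
        "--no-deps",                # fetch just this one wheel
        "-d", dest,                 # drop it into the artifact directory
    ]

print(" ".join(pin_wheel_cmd(
    "internal-pkg", "1.2.3", "manylinux2014_x86_64", "cp38", "38")))
```

The release pipeline can then `pip install` the downloaded `.whl` file directly from the artifact, so it never has to resolve the internal package against a repository it cannot reach.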
Not sure what you mean. pex is the dependency lock, at least from a version perspective. To be clear, it broke after I intentionally upgraded the offending package to the later version.
I admit to being unaware of "versions inside versions" where a version may have multiple published ABIs that are not compatible across systems, but a nice packaging system would still make it easy for me to use platform X to build for platform Y.
Oh, from your "One day the release pipeline broke because a dependency's authors published a new version..." wording it seemed that this breakage occurred without you upgrading anything.
> It was a nasty introduction to the fact that one Python package version can have multiple published wheels targeting different C++ compilers!
Erm, is this not a failure of the C++ standard rather than of Python?
This makes it look like Python is the failure point, but it's really caused by C++ not having a stable ABI. Python is simply trying to paper over the brokenness of the C++ ecosystem.
> Python is simply trying to paper over the brokenness of the C++ ecosystem.
I'd be pretty comfortable saying that this is indeed a Python problem. C++ ABI interop has been an issue for decades. Instead of papering over it, Python should be exposing a host triplet (or whatever) so that end users can more easily identify what is and isn't compatible.
Pretty much every other ecosystem (except perhaps JavaScript?) does this. Look at FreeBSD packages: the ABI is used as part of the identifier. GNU Autoconf, rvm, Homebrew, Rust, etc., etc. all use and expose an ABI identifier so that you aren't accidentally going to mix and match things.
Python packaging doesn't have good ways to depend on specific versions of <insert-other-language-here> packages. This isn't specific to C/C++ and is mostly historical: it's designed for an ecosystem of packages where everything is written in Python! But nowadays, more and more Python libraries are bindings to code written in many other languages, so you need a package management approach that includes this.
There's no good way to depend on specific versions of C or C++ libraries. The actual binaries vary depending on architecture, compilation toolchain, compiler flags, link options, and all sorts of other things. There's really no way to address those built artifacts even if they were available on pypi or something.
1. Build the artifact that you depend on, publish it on "pypi or something" referenced by both the version of the library and a hash that uniquely identifies the build. This is the approach taken by conda, for better or worse.
2. Only allow one canonical build of a library you depend on so that the "version" becomes a unique identifier for the build. This is the approach taken by distribution package managers.
3. Create metapackages that describe ABI constraints that are required by packages and must be satisfied by the underlying system. For example, a "cxxabi" package could be provided by the underlying system, and the packaging tools could automatically add dependencies on "cxxabi" at build time, based on either an exact pin to the library built against, or in some cases, a relaxed dependency by inspection of versioned symbols used by the binary.
4. Statically link all your dependencies and/or vendor all your dependencies. These are used by quite a few pypi packages that depend on standalone C libraries to avoid most of the issues altogether.
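To make option 3 concrete, here is a toy sketch of the build-time injection step. The "cxxabi" package name and this whole mechanism are hypothetical, not anything real packaging tools do today:

```python
# Hypothetical build-time step: after compiling a binary wheel, inject a
# dependency on a "cxxabi" metapackage that the target system must provide.
def inject_abi_dependency(requires, built_against, exact=True):
    # Either pin exactly to the ABI version the wheel was built against,
    # or relax to a lower bound (e.g. derived by inspecting the versioned
    # symbols the binary actually references).
    op = "==" if exact else ">="
    return requires + [f"cxxabi{op}{built_against}"]

print(inject_abi_dependency(["numpy>=1.20"], "1.3.13"))
# ['numpy>=1.20', 'cxxabi==1.3.13']
```

Installation would then fail up front with an unsatisfiable dependency, instead of failing at runtime with a missing-symbol error.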
Of course, all of these either have flaws [1] or are so detailed that they're distributed build caches with more steps. You can hash project source files, all build commands, all textually included headers, precise versions of toolchains, etc., into a Merkle tree, but this is not generally how python applications pin "versions" of dependencies.
[1] For instance, you cannot version a C or C++ library build independent from the versions of all of the transitive dependencies of that library (more or less). Of the options listed here, only distribution package managers can really account for this problem, and not every distribution package manager cares to.
My thoughts are that viewing binary package distribution/archival as really just a distributed build cache is the only real way to go, and that's exactly why pypi and associated tooling has so many flaws.
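A minimal sketch of that "distributed build cache" keying, assuming the key covers source contents, toolchain, flags, and the keys of transitive dependencies (which is what makes it Merkle-style: a change anywhere below changes every key above it):

```python
import hashlib

def build_key(sources, toolchain, flags, dep_keys):
    """Content-address a build: hash every input that can change the output.

    sources:  {path: bytes} of source/header file contents
    dep_keys: build keys of direct dependencies; since each of those keys
              already covers its own dependencies, this chains transitively.
    """
    h = hashlib.sha256()
    for path in sorted(sources):                     # order-independent
        h.update(path.encode())
        h.update(hashlib.sha256(sources[path]).digest())
    h.update(toolchain.encode())                     # e.g. "gcc-9.4.0"
    h.update(" ".join(flags).encode())               # e.g. ["-O2", "-std=c++17"]
    for k in sorted(dep_keys):
        h.update(k.encode())
    return h.hexdigest()
```

Two builds share a key only when every input matched; change a compiler flag or a dependency's key and the artifact is addressed differently, which is exactly the property that version numbers alone cannot give you.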
That’s a good point! However, there are still things Python could do to make the experience better.
The package I was using had multiple artifacts per version, including the raw zipped source code and wheels for each ABI they built for. It’s certainly convenient that the most applicable one for your current system is pulled automatically by pip. But in this case that’s not the behavior I wanted, and I could not successfully get pex or pip to download a “less optimal” artifact.
You can blame C++, but if Python is already papering over deficiencies, it’s not unreasonable to expect improvements that aren’t fundamentally impossible. It should definitely be possible for there to be one easily understood argument specifying which artifact pip should download. In my (potentially flawed) experience, figuring out which ABI value to pass was already not straightforward, and even then, the arguments for selecting an ABI, or even for asking for the raw source instead of a wheel, didn’t seem to have any effect.
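For what it's worth, the wheel filename itself spells out exactly which interpreter, ABI, and platform it targets (the PEP 427 format), which at least helps identify what you're about to get. A small parser sketch; the numpy filename is just an illustrative example:

```python
# Wheel filenames follow
#   {dist}-{version}(-{build})?-{python}-{abi}-{platform}.whl
# so the last three dash-separated fields are always the compatibility tags.
def wheel_tags(filename):
    parts = filename[:-len(".whl")].split("-")
    return {"python": parts[-3], "abi": parts[-2], "platform": parts[-1]}

print(wheel_tags("numpy-1.24.0-cp38-cp38-manylinux2014_x86_64.whl"))
# {'python': 'cp38', 'abi': 'cp38', 'platform': 'manylinux2014_x86_64'}
```

Those three tag values are the same ones pip matches against the running system when it picks "the most compatible" wheel, and the same ones you would hand to `pip download --python-version/--abi/--platform` to override that choice.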
[0] - https://pex.readthedocs.io/en/latest/