This is a great writeup by a central figure in Python packaging, and gets to the core of one of Python packaging's biggest strengths (and weaknesses): the PyPA is primarily an "open tent," with mostly independent (but somewhat standards-driven) development within it.
Pradyun's point about unnecessary competition rings especially true to me, and points to (IMO) a hard reality about where the ecosystem needs to go: at some point, there need to be some prescriptions about the one good tool to use for 99.9% of use cases, and with that will probably come some hurt feelings and disregarded technical opinions (including possibly mine!). But that's what needs to happen in order to produce a uniform tooling environment and UX.
To me the natural choice would be pip which, as OP points out, has the advantage of shipping with Python by default. We could remedy the shortcomings of pip by incorporating the most successful features from the N alternatives that exist today.
The npm/yarn fork that formed in 2016 is the closest analogy I can think of. My impression is that npm improved on its biggest weaknesses (e.g. a lock file) and has remained the default choice in that ecosystem. I suspect the same would happen with pip if it too made significant improvements.
Even if pip were a little better I still wouldn’t use it, because Poetry solves the problem so well you’d need to pull it out of my cold hands. It solves not only the package-install problem, but also locking, venvs, project structure, and packaging, in a well-integrated solution with a fantastic UX.
If Poetry (or similar) were renamed to pip and included in the stdlib, I'd obviously switch.
It doesn’t solve the single-static-binary problem or the docs problems, but it’s much, much closer.
Every time you install a package through Poetry, you're also running pip code.
You didn't even understand the point they made: the need for Pipenv and Poetry would pretty much go away if pip added support for a proper lockfile and venvs. And that's the only correct choice, as pip is already Python's package manager.
Any multi-step process is guaranteed to produce random assortments of workflow tools to manage those steps, which will seek to replace the original process as the one true process.
If there are two steps, one part of the community will insist they belong in: a makefile, bash script, Python script, lambda network service, bazel, pants, scons, terraform, ansible, npm …
Before Poetry and Pipenv were a thing, pretty much everyone was recommending the virtualenvwrapper script instead of using venv directly...
So yes, that's what it ultimately always ends up as.
Poetry ended up being mostly a disappointment, having used it for the last 18 months. First-time installs eventually took an hour to resolve, and lock files were platform-specific, making them almost useless for reproducibility. The wider Python ecosystem is frankly painful to productionize even today, and it really does need some focus and governance.
I spent a long time migrating a project to use poetry. One of the reasons I opted for poetry over others was that the lockfile retained all of the environment markers in the packaging metadata, so that the lockfile could support multiple interpreters and interpreter versions, multiple platforms, etc.
I can't speak for Poetry directly, but knowing how Python dependency resolution works: I don't think Poetry can make lockfiles not platform-specific, since package source distributions are allowed to (and regularly do) run platform-specific code for their own dependency selection logic.
For example, your package might depend on `foo`, which in turn could sniff the host OS and select the appropriate subdependency. You'd then end up pinning that subdependency, which would be incorrect on a different host OS.
(Similarly for Python versions: a subdependency might be required on < 3.7, so re-installing from a lockfile generated from an older Python could produce a spurious runtime dependency.)
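To make that concrete, here's a rough sketch of the kind of dependency selection a source distribution is allowed to do at build time (the package and dependency names are made up for illustration):

    # setup.py of a hypothetical package "foo": dependencies are chosen while the
    # sdist is being built/installed, so a lockfile generated on one platform or
    # Python version can't see the branch another platform would have taken.
    import sys
    from setuptools import setup

    install_requires = ["requests"]
    if sys.platform == "win32":
        install_requires.append("pywin32")      # only ends up pinned if you locked on Windows
    if sys.version_info < (3, 7):
        install_requires.append("dataclasses")  # backport only needed on older Pythons

    setup(name="foo", version="1.0", install_requires=install_requires)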
If you use C extensions or platform-specific packages like pywin32, how would you expect them to NOT be platform-specific? Would you want it to only allow pure-Python packages?
I've given up on poetry. I still to this day can't get it to use the active version of python, which you'd think would be the default. Even after setting the config flag. Setting the config flag somehow made things worse, all my poetry envs started using the wrong python. I can't even force it to use the version I want with 'env use /path/to/python', I can only pass stuff like 'env use 3.9'. What the hell is 3.9? I want to pass a specific path to a version I have.
Shipping with python is a disadvantage; any real workflow tool needs to be able to manage multiple python installations, and if you try to use a part of the python installation to do that you've got a circular dependency problem. Plus the whole "where modules go to die" thing.
To me, that’d be a UX nightmare. Even if I tried to, I’m not going to memorize which of my 20 checked-out Python projects requires which Python invocation.
IMHO it’s absolutely the toolchain’s job to manage that for me. I’d never adopt a toolchain that wouldn’t.
What exactly are you defining as “traditional”? The Python point releases have generally been painless for a long time. There was a little friction in the early 2-3 era, before they made it easier to write code which worked on both without translation, but in general it’s been at least a decade since I’ve cared about this, except for one time when a couple of libraries depended on a private API which changed in Python 2.7.9, and that was back in 2014.
Maybe in the 2.4 -> 2.7 transitions - Python3 after about 3.6 is pretty good, you might see some minor package breakage if you're running the very latest version, but that's fairly rare.
Generally speaking, if you're on a modern OS - like a recent Ubuntu LTS or macOS - you're getting 3.8 -> 3.10, and compatibility is very good in these releases.
You might want to look at micromamba as a separate tool that can install python itself, libraries, etc. with zero dependency on a python being installed. It's a very fast tool written in C++ so it's easy to install and use almost anywhere. It uses the conda ecosystem of packages.
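For example, roughly (the environment and package names here are just illustrative):

    $ micromamba create -n py311 -c conda-forge python=3.11 numpy
    $ micromamba run -n py311 python -c "import numpy; print(numpy.__version__)"

No pre-existing Python is needed for any of that.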
> Pradyun's point about unnecessary competition rings especially true to me, and points to (IMO) a hard reality about where the ecosystem needs to go: at some point, there need to be some prescriptions about the one good tool to use for 99.9% of use cases, and with that will probably come some hurt feelings and disregarded technical opinions (including possibly mine!). But that's what needs to happen in order to produce a uniform tooling environment and UX.
Not to mention that all efforts would be focused on improving a single package manager. Even if the one that was standardized wasn't the "best" at first (by whichever metric you prefer), giving everyone an incentive to improve that one will likely make it the best within a few years.
The primary (probably build and) packaging system for a software application should probably support the maximum level of metadata sufficient for downstream repackaging tools.
Metadata for the ultimate software package should probably include a sufficient number of attributes in its declarative manifest:
Package namespace and name,
Per-file paths and checksums, and at least one cryptographic signature from the original publisher. Whether the server has signed what was uploaded is irrelevant if the archive and the files within don't match a publisher signature made at upload time.
And then there's the permissions metadata, the ACLs and context labels to support any or all of: SELinux, AppArmor, Flatpak, OpenSnitch, etc.. Neither Python packages nor conda packages nor RPM support specifying permissions and capabilities necessary for operation of downstream packages.
You can change the resolver, but the package metadata would need to include sufficient data elements for Python packaging to be the ideal uni-language package manager, IMHO.
For me, before I even get to packaging, I hit the install/environment issue with python. Python, by default, wants to be installed at a system level and wants libraries/packages to be at a system level.
That shit has to stop. The default needs to be project-local installs. Node might have its issues, but one thing it got right is defaulting to project-local installs, vs. Python, where you need various incantations to get out of the default "globals" and other incantations to switch projects.
However, you'll find that as with all packaging discussions there are people opposing it, because their workflow doesn't match yours and they don't want to change how they work. We, as a community, need a way to resolve such stalemates or I fear we won't make much headway.
We need some person, a nice person, a benevolent person, who could some how tell other people what to do, dictate it if you will, and it would be best if they could keep up this job for the rest of his or her life. We’ll call them, the Friendly Language Uncle.
My solution to this problem has been … to stop using python as a universal tool for all things and instead use other tools purpose built for those purposes.
Unfortunately (or not) that basically means no more python. Because it is definitely a “Jack of all trades, master of none”.
Focusing on programming languages that just want to be programming languages, and not also a system service, makes life so much better. You end up using languages that produce self-contained, easily shippable binaries, or languages with easily embeddable runtimes, instead of trying to write code that has to somehow survive in a “diverse ecosystem” - which generally makes it overly bloated and brittle, as it grows so many appendages to solve so many orthogonal incompatibilities that it comes to resemble enterprise open source…
You can do this now by creating a Python virtual environment[0]. Then you can package your project with a requirements file and some instructions on its use.
I use Python venvs often, and they work really well.
Activating virtualenvs isn't necessary. In every python project I work in, I do
$ cd projectFoo
$ ./ve/bin/python whatever...
(more realistically, it's `make whatever` which then builds the virtualenv into `./ve` if needed, pip installs required packages into it, and runs the command).
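Spelled out without make, the whole thing is roughly:

    $ python3 -m venv ve
    $ ./ve/bin/pip install -r requirements.txt
    $ ./ve/bin/python -m pytest   # or whatever command the project needs; no activation required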
Yes, I agree that it would be nice if the default behaviour of `pip install -r requirements.txt` was to install into an isolated virtualenv specific to that project, but it's also not like it's completely impossible magic.
This is the important difference. Scripting languages should default to examining the current directory and then its parent directory etc. to find the resources they need. Python doesn't have this default and probably can't change at this point.
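As a rough sketch of that resolution model (the marker file is just an example; Python has no such mechanism built in):

    from pathlib import Path

    def find_project_root(start=None, marker="requirements.txt"):
        # Walk from the current directory up through its parents looking for a
        # project marker, roughly the way node finds the nearest node_modules.
        start = Path.cwd() if start is None else Path(start)
        for directory in (start, *start.parents):
            if (directory / marker).exists():
                return directory
        return None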
The Debian dist-packages/site-packages mechanism alleviates a lot of the problems with dependency hell and clobbering from packages needing to be installed. It's a shame it hasn't been embraced by mainline Python. A case of perfect being the enemy of good. Instead we get the chaos of pyproject as the next great thing.
This PEP is meant to ensure you don't clobber your Linux distro's base environment and break key OS applications.
Also, this problem doesn't exist on e.g. Windows, where there is no OS installed python to clobber. Other folks have taught themselves to always use virtual environments for this very reason and therefore don't share your problem.
Hence, there are tutorials out there that don't talk about your problem and tools exist where the default behavior might be dangerous on your Linux distro.
I get a lot of flak for saying this, but I do hope Python committers and PSF members take a hard look at Perl and its demise. Perl had a very similar problem of having N equivalent choices for doing the same thing, and Python, for better or worse, is heading in the same direction.
In the last few years of my professional career, many projects I worked on were migrating Python -> Golang because, frankly, many people were fed up with the Python 2 -> 3 migration.
Not that Golang doesn't have its fair share of issues, but the packaging and deployment aspects of a single binary make CI/CD and operations a breeze. I hope Python makes this a first-class citizen in its ecosystem.
These days I don’t think I’m being too brash in saying that anyone dealing with Python 2 to Python 3 transition stuff is so deep in the realm of enterprise abandonware that it’s not worth paying it much attention. Not necessarily including you here!
But as someone that went through the transition, I’m certainly not going to deny how much of a shit show it was. And I get a feeling that the two situations were/are caused by the same political / organisational / philosophical factors.
I’d be miffed if the Python / PyPA mob got so distracted with their internal politics, certain terse personalities, or the hands-off competition-is-good ideology, that the packaging story makes Python an increasingly unappealing choice. Even worse, we could end up with a packaging ecosystem run by effing Microsoft like the JS people do, because “competition”.
And yes, I’m totally conflating aspects of packaging here. At least we all settled on PyPI, except for all those that haven’t… :)
The industry I am in is extremely conservative about its upgrades and changes. Between that, our custom software interfacing with the 800 pound gorilla for our market, and a few other things, we're behind.
Now, this software depends on a very specific version of ArcGIS. Desktop, not Pro. Not just version 10. Not just version 10.2. But version 10.2.1.
If you go look at ESRI's page for Python and 10.2.1, it says, in large and bold letters (which is something ESRI doesn't do in its documentation very often), NOT TO UPGRADE OR CHANGE PYTHON VERSIONS, EVEN A TINY BIT.
And that version is 2.7. I couldn't even get pip working when I wanted to install a very old version of lxml I wanted.
I guess what I am getting at is, as a language becomes successful and has market penetration, it also seeps down, way far down, in the chain of dependencies. It's the price of success, essentially. I think language maintainers should really pay more attention to that. Just as you know how a program ends up sticking around longer than you originally thought, so too do versions of your language. It is in some ways akin to trying to explain to da Vinci that very far in the future, some of his artworks will adorn clothing and coffee mugs: the good stuff just gets to places you would never dream.
Your situation is related to this: https://pypackaging-native.github.io/key-issues/native-depen...
where it is incredibly hard to align various pieces of software spanning OSes, architectures, and programming languages. The result is that you have to compile the whole universe for them to work together. In your case that universe is stuck on 2.7, and it will require a large effort to get everything compatible with a recent Python version.
I guessed you worked in GIS before you mentioned ArcGIS. Our old software was tied to QGIS and I worked hard to decouple it. Tying a Python env to these whales is insane.
I think a Cython extension or a second process that uses ArcGIS, with a simple interface between them, might be a better choice for the future? Never needed that though.
I'm convinced 99% of all issues with pip can be solved with 3 simple aliases. It already has everything else; it's just the UX that isn't opinionated enough. There wouldn't be a need for Pipenv, Poetry, or yet another project-definition or lockfile format, if only these few fundamental commands with sane defaults were included out of the box.
1. Installation. Just install everything in a bloody application-specific virtualenv by default and use the existing lockfile mechanism by default.
3. Running. Remembering to activate a virtualenv is too easy to forget and leads to mode-confusion. We need an equivalent of npm run.
pyrun() { "./pip_modules/bin/$1" "${@:2}"; }  # as a shell function rather than an alias, since aliases can't take arguments
Done. That took 5 minutes tops. Now if you were to do this just a little more seriously it's of course a bit more effort, but still within the realm of weeks/months, not years. It seems they are more busy bikeshedding forever about the build backends or whatever else that only very few packagers care about. The packaging side of course has its issues, but they are all minor compared to preserving end-users' sanity. The UX is the #1 issue that impacts millions of users daily; it can and needs to be fixed, yesterday.
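For illustration, the "installation" default above could be sketched as a shell function too (the pip_modules name is just a convention here, not existing pip behaviour):

    pyinstall() {
        test -x ./pip_modules/bin/python || python3 -m venv ./pip_modules
        ./pip_modules/bin/pip install -r requirements.txt
    }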
> Now if you were to do this just a little more seriously it's of course a bit more effort, but still within the realm of weeks/months, not years.
My feeling is that that approach would cause dozens of itches to pop up. Things like, “my coworker is on Windows and I’m struggling to come up with step-by-step instructions for her to set up the project.” Those are valid UX issues but likely just the beginning of a rabbit hole.
I think that tackling those issues would inevitably lead to yet another Poetry/Pipenv clone – because with UX, the devil is in the details.
> A class of users expect a packaging tool that provides a cohesive experience (like npm (NodeJS), [...], etc) – a single tool that provides a build system, dependency manager, publishing, running project-specific tasks/scripts, etc. I’ve referred to this as “workflow tool” in this post.
I don't know enough about node, but aren't there at least two or three package managers (npm, yarn, maybe pnpm)? Then there are half a dozen different things for the "build" / transpile / compile stage of frontend work, and
> Pick from N ~equivalent choices is a really bad user experience
this has caused me to bounce off of getting into frontend work several times. It's so aesthetically displeasing that my brain doesn't want to learn it.
...aren't there at least two or three package managers (npm, yarn, maybe pnpm)?
I'm not sure about yarn, but pnpm has exactly the same API as npm has. It simply has a different disk organization and caching strategy. If npm maintainers so chose, pnpm's behavior would be an option within npm. Since they haven't chosen that, it seems completely reasonable to "compete" in the way that pnpm does.
Yes, but one has to come to that conclusion oneself, which increases friction.
I realise you’re talking about pnpm, and not yarn, but when I think of issues like this, yarn is what comes to mind. A bunch of projects prefer yarn over npm, or at the very least place them side by side, in the project documentation. As someone reasonably familiar with JavaScript, but that only works with it sporadically, I always find myself going back and trying to reverse-engineer the reasoning, and I’m never entirely confident.
So when I circle back to my 6-monthly journey into writing some JavaScript, and see another thing called “pnpm” making its way into package installation docs, I cannot be blamed for sighing and being a little irked. Especially since my daily driver is Python, so I’ve got loads of packaging trauma.
I think the nice thing about the node ecosystem is that all the packaging tools are using the exact same package.json. The only thing that’s different is the way they resolve/install the packages.
Yarn has a slightly nicer CLI than npm and (I think) innovated a bit more quickly. pnpm mitigates the node_modules problem while otherwise just being npm.
You can usually substitute the package manager in the docs with your favorite, but in doubt just use the one they use.
Yarn used to be compatible with npm package files too. But now that it's moving away from it I probably won't use it anymore; the advantages it brings are not worth the incompatibility.
I'm not an expert at JS, but can't you use npm to do multiple things if you choose? I.e. build, install, create a place to install some package. In Python, traditionally, this would need separate tools (setuptools, pip, virtualenv). That's what the author meant – not that there are no competitors.
There really needs to be one tool that combines installing, building, testing and running Python projects. This tool should be officially endorsed by core developers of Python and gradually added to the standard distribution model. The good starting point for this would be Pip.
It is de facto standard tool for installing packages. It can be extended to do more.
Also, the standard should favor pure-Python packages, in order to untangle Python from its legacy as a glue language for modules written in C(++). Otherwise, no progress will be made at the language level and Python will die out once C becomes a legacy language, replaced by safer systems programming languages.
We need to learn from Java's and Javascript's ecosystems. Those languages are now used mostly in their pure form. They are portable, optimized and more stable than most languages. With enough care, Python can become the same.
> Also, the standard should favor pure-Python packages, in order to untangle Python from its legacy as a glue language for modules written in C(++).
Never gonna happen. Python's explosion has been through data science and ml, which is based entirely around calling Fortran/C/C++/etc.
> Otherwise, no progress will be made at the language level and Python will die out once C becomes a legacy language, replaced by safer systems programming languages.
I assume you mean Rust here. Do you think python can't already call rust? It's been able to do that for the better part of a decade.
I mean, this is one of Python's main selling points. It doesn't matter what your numeric/scientific library is written in, Python can call it, and it's vaguely convenient to write (depending on your taste for comprehensions).
Why would the python ecosystem shoot themselves in the foot like that?
"Never gonna happen. Python's explosion has been through data science and ml, which is based entirely around calling Fortran/C/C++/etc."
What's Django, then?
"I assume you mean Rust here. Do you think python can't already call rust? It's been able to do that for the better part of a decade."
Python can only call C code. The fact that Rust itself (or Fortran, btw) has good C ABI is on Rust, not Python. Also, by calling Rust's C ABI, you are making a double indirection, thus severely reducing performance.
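For what it's worth, the call looks the same from the Python side no matter what produced the shared library; a minimal ctypes sketch (the library and function names are made up):

    import ctypes

    # Load a C-ABI shared library; a Rust cdylib exposing `extern "C" fn add`
    # is loaded exactly the same way as a C one.
    lib = ctypes.CDLL("./libfast.so")
    lib.add.argtypes = (ctypes.c_int, ctypes.c_int)
    lib.add.restype = ctypes.c_int
    print(lib.add(2, 3))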
That's why I said we need to optimize Python, and even make progress towards implementing Python in Python itself. Just like Go can compile itself, Python should be able to run itself. But that is all for the long run.
Pretty rare compared to data work, at least among python projects I've worked on.
> Python can only call C code. The fact that Rust itself (or Fortran, btw) has good C ABI is on Rust, not Python. Also, by calling Rust's C ABI, you are making a double indirection, thus severely reducing performance.
That's a bit of an implementation detail, but fair. There is some overhead. But every language since C has shied away from committing to a stable ABI (or at least the C ecosystem is willing to commit to one; the standard mentions nothing about it).
That's only overhead at the boundaries though, and the common pattern of using python as an orchestrator with the vast majority of logic happening inside native code really does minimize the overhead you actually experience. At least you can architect an application that way (same pattern works with things like pyspark as well).
> That's why I said we need to optimize Python
By all means, optimize python. I think you'll run into the GIL pretty fast, but I wasn't objecting to that. I was objecting to making things like pyspark or tensorflow second class citizens. Doing that is going to destroy python.
I'm doubly confused. Not only is Python perfectly capable of binding to Rust, but also why would you want things you'd usually do in faster languages to be done in Python? It's great as glue but I think we'd agree that it's not very good at some performance critical things.
> Also, the standard should favor pure-Python packages, in order to untangle Python from its legacy as a glue language for modules written in C(++).
Why? Except for the C(++) part, which was never particularly accurate in the first place except that that’s the most popular set of lower-level languages to start with, what's the problem with Python being a glue language?
> Otherwise, no progress will be made at the language level
Clearly false, as progress continues to be made at the language level.
> Python will die out once C becomes a legacy language, replaced by safer systems programming languages.
Why? Python works as glue for Rust as well as anything else. If anything, the bigger threat to Python as a glue language is more dev-friendly systems languages, not safer ones, but even there I don't see how packaging deemphasizing support for non-Python modules does anything but accelerate Python's problems.
> We need to learn from Java's and Javascript's ecosystems. Those languages are now used mostly in their pure form.
They always were, though; both were designed for use cases without any reliable lower-level besides their own VM, and became popular in that environment. That's not what Python’s ecosystem grew on, and arguably what Java and JS teach here is lean into what made you a success.
Python has its warts, and this is one of them, but I think people embellish how bad it is. Perhaps the reason there isn't a singular packaging solution like other languages have is that most Python users get along just fine with the alternatives that exist.
The (absent) packaging system is the single greatest reason I’ve moved away from python professionally. Life is too short to do without vendoring and/or lock files - especially in an interpreted language.
I don't know, I'm pretty happy with Pipenv. It's the standard among the people I talk to and very good at being "pip, but better". I've tried poetry but don't really find it good enough to justify being so different, maybe unless you already have a pyproject.toml anyway.
There's a lot of talk about Python packaging but as a semi-casual user I've actually had very few problems with it.
Moved all my projects to Poetry for reasons. But in hindsight, I must say that Pipenv used to work just fine for me, as does Poetry today. No major issues so far with either tool.
What I do miss is being able to `pipenv run myscript`. That UX has degraded slightly for me, having to say `poetry run poe myscript` now. Barely an inconvenience though.
Packaging and Nvidia are pretty much the main reasons Python has degraded in status over the past 7 years. It started with the added and unnecessary confusion from conda (and silly things like pythonxy), but has really escalated through extra stupidities like Poetry. Many of the features that make these extra packages enticing should instead have pushed their developers to improve Python's standard tools (pip).
The other problem is due to Python's lead in ML and the need to work with GPUs. That started with TensorFlow, which had (and still has) immense issues with versioning, mostly due to Nvidia (matching CUDA versions and such). So effectively, Nvidia and package management killed a good language.
Poetry certainly made our life much easier and was a productivity improvement. It's the tool that brought us back from Conda, because it was able to solve environments that nothing in pip land could.
It's unlikely Python would still be a popular language without Conda. As the article says, many Python users are not traditional software engineers - they need a tool that just works and Conda is the closest thing in the ecosystem.
I used to spend a lot of time optimizing Gentoo Linux but eventually switched to Ubuntu to be productive. I feel Conda is similarly useful, especially when Docker isn't in the picture (i.e. local dev), and am disappointed that hackers seem to be dismissing it.
Poetry seems pretty slow due to some of the extra dependency logic it builds on top of pip. Whether this trade off is worth it probably depends on the scope of the project. For most of my projects I stick with pip and almost never run into issues.
The problem is that poetry/conda/etc chose to make their own thing to begin with, when they should've just improved the standard language tool with the features they wanted. That's one key difference between shit languages (such as R) and languages that have good tooling (e.g. Rust) - the community works on the core together more rather than each person doing their own thing and confusing everyone in the community by not having a standard way to do anything.
Every package management developer for python should have just added features to the standard package manager instead of doing their own thing.
It will be a VERY deserving fate, since the community is always busy fighting amongst themselves and the maintainers are conservative to the point of being unreasonable.
Maybe such a community deserves obscurity.
Python has real problems that need addressing, and that was true 10 years ago as well, packaging included. Pretending this isn't the case makes more serious techies just laugh at Python, and it's fully deserved.
We do tech, for God's sake. We don't do religion. Merit is what should call the shots.
Also a good language for non-professional programmers. When people ask me which language they should learn as a first language, I recommend Python. Not because it's a great language; but because it has a massive ecosystem that allows beginners to build fun things with minimal effort.
The greatest strength of Python, I think, is that it allows cross-disciplinary collaboration, because it's easy for non-programmers to learn.
Which makes the lack of a functional easy-to-use package manager all the more frustrating.
You're not wrong but it's a double-edged sword: people's first language leaves a deep impression and people often can't un-write (from their brains) the wrong programming practices they learned from Python.
I've seen people openly admit they'd learn their second language much faster if they didn't try to constantly compare it with Python.
As usual, hindsight is 20/20, and nobody is telling you these things beforehand. And those who are lucky enough to have wisdom shared with them are usually quick to dismiss it and not listen to it.
But isn’t this true no matter the first- and second-language combo? I started out writing C and this definitely influenced how I learned and wrote python code. My colleagues that started out with ruby, or java, also program in distinct ways that show their “accent”, so to say.
I have seen recent predictions that put Python as the main AI language in 2030 with 90% confidence. It's not dead or going away.
Python also works better in Codex and ChatGPT, as an output of AI. Almost all AI research comes out in Python, most of it never gets reimplemented, and the cutting-edge stuff is usually just in Python. So if you want to try the latest toys - you guessed it - you need Python. Or be prepared to wait 6-12 months/forever for it to get ported to your language, and be prepared for bugs and less support.
AI is going to be a differentiator for programming languages. The languages with more code out there and more questions answered on SO will work better, causing more adoption; it's a rich-get-richer problem.
I got no horse in the race and I don't care if Python is "going away". It likely will not since it has a huge inertia and devoted fans.
Furthermore, you citing the current state of affairs is not convincing. You're basically saying "the sun is shining now at noon, surely it will keep shining during midnight". Or "right now it's raining so surely it'll keep raining 24/7 for the next year".
Also who predicts stuff for 2030 with 90% confidence? I'd like to have that person's self-esteem because nobody can predict as far into the future.
I've been part of a number of communities and Python's seems dysfunctional and anarchic.
Maybe that's a good thing and breeds creative forces -- the proponents certainly make that case, I heard, and I'm not opposed to the idea, just a tad skeptical. Time will tell, right?
In the meantime, Python is still missing some stability guarantees and a good package manager. As a programmer that's a turnoff, so I work with other languages. Make of that what you will but I'll restate that I got no horse in the race. I'm using my experience to judge if something seems a good fit to work with, and maybe -- does it have a future.
Python packaging and versioning has been a shitshow for far more than 7 years. The language has existed since the early 1990s and the philosophy about backwards compatibility and packaging hasn't changed.
Python makes backwards incompatible changes to the language all the time, even in dot releases. This means that you can't just install everything into some shared directories since some code would need version X of Python and some would need version Y. So you need virtualenv or something like that.
But virtualenv isn't enough either because most actually useful Python code calls into C libraries (Pandas, Tensorflow, Matplotlib, etc.) And those need to be managed by something like RPM, deb, or even docker.
This is leaving aside implementation issues. For example, many python dependency management tools are incredibly slow and there are some differences of opinion on how to express dependencies.
I don't understand your paragraph about C libraries. With wheels (which pip can install) there is no problem packing, and linking to, compiled C libraries.
It's often not feasible for legal or regulatory reasons to install bundled versions of C libraries. Like most enterprise software wants to use the system version of openssl because otherwise the vendor has to scramble to update for each openssl security issue.
Even when it is possible to install bundled libraries, someone has to do the work to set up Python packages that bundle the native libraries. This work is done for some of the most important packages like TensorFlow, but probably not for some more esoteric library you want to use. I also suspect that people seriously using TensorFlow probably want careful control over which version they are using, rather than letting a random Python package manage that.
Basically, interacting with native libraries is a hard problem to fully solve for any language. A lot of other languages like Java have struggled with this as well. But at least in Java you seldom use native dependencies. In Python you use them all the time (arguably, serving as glue code for crusty old FORTRAN and C libraries is where Python shines.)
Python's stubborn determination not to standardize on anything for packaging (we have easy_install, conda, pip, pipenv, poetry, and who knows how many others) as well as its insistence on breaking its own compatibility certainly don't help either. And Python is more hostile than most languages to integrating with the system package manager (RPM, deb, etc.) because of how it sprays its files around the filesystem.
Keras/tensorflow has some <= version specifications for modules it depends on. If there is a > version installed, pip install fails.
Being able to install multiple versions of the same module seems like a fairly basic feature for a package manager. And it would seem to me that having a shim executable examine a profile of some kind for an app at startup, and load/install the correct version of Python, isn't impossible.
Second that. Pip has been the default for more than a decade. And yes it does have issues. Why not fix it or add options to pip instead of piling on more tools?
Because for whatever reason Python really attracts the Start Over And This Time We’ll Get It Right crowd. It’s why I don’t start new projects with it anymore, but make most of my living cleaning up the mess that incompatible changes make with existing software. So, in that sense, it’s great for keeping devops employed.
Very good summary of the situation, I hope something good will come out of it.
I've used pipenv, pyenv, poetry and settled for my own use-cases on just pip with virtualenvwrapper.
So any program goes into its dedicated folder with requirements and virtualenv.
However there's something that I've never managed to get working.
You have say a 3.8 virtualenv, and package file mentions python should be >=3.8.
But when calling python -m build, the build process creates a new environment using the system python.
So if you are on an old system that is on Python 3.7, the resulting package cannot be installed because of the inconsistency between declared python version and the one that has been used.
It may be a very stupid problem, but I haven't been able to find any documentation for it.
Any search engine is just too happy to throw any 'python packaging result' before anything so specific.
Maybe I should use one other tool for that? But there's no clear (default?) path documented.
You could look into cibuildwheel to have more control over which Python versions are used to build a wheel, but as the name implies it's not a great solution for running things locally.
Looking through the docs of pdm, Hatch or Poetry, I can't really find a definitive answer if they will use the Python version you specify. They all at least need to be able to locate the correct Python version in order to do so. It would be great if managing Python versions was something that came along with Python, so that these tools could rely on it more.
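One more thing that might be worth checking for the parent's problem: `python -m build` derives its isolated build environment from whichever interpreter you invoke it with, so running it via the project venv's own Python should keep the declared and actual versions consistent (the ./ve path here is just illustrative), roughly:

    $ ./ve/bin/pip install build
    $ ./ve/bin/python -m build   # the isolated build env is then created from this interpreter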
At the top is a "tl;dr" which makes some points that I think are good to keep in mind for any set of tools:
Pick from N different tools that do N different things is a good model.
Pick from N ~equivalent choices is a really bad user experience.
Picking a default doesn’t make other approaches illegal.
As someone who has had to take over the projects of other people, I will say the third point needs a caveat:
Picking a non-default tool needs a written explanation with sound reasoning
When there is a default, new members to the community will gravitate towards it and, if there isn't a good reason, will avoid learning other options.
In time, this ends up being anywhere from a minor annoyance to very frustrating, as not only are you losing time to learning, but potentially losing time to not having access to extras as the community around the default expands and your now special snowflake project goes without.
For very specific tools and libraries, it might be okay if they are easily interchangeable (but that is apparently bad UX?). But for things with larger responsibilities - a full-blown ORM, package manager, etc. - it can end up being a millstone.
Packaging pure Python often boils down to where to place the packages, the hierarchy between them and where to find the starting point for an application (if it is one and not a library).
Insert native extensions into the mix and you get all kinds of issues.
Like, which library to link to? Which version? Which foreign function library to use? Will my Python version be compatible with that library? Et cetera.
Why does Ruby not have similar problems? (Or if it does, why does nobody seem to care?)
Like why does Ruby have gem and bundler, each doing one thing, whereas python has fifty bazillion tools that all do nearly the same thing if you squint but all have their own weird problems?
I've personally just ended up using poetry and that has mostly stopped me from having to care overmuch about the tooling.
I feel like I'm spoiled coming from Java land, where aside from too much XML, maven just works and if you need to do weird shit (like if you're android, for instance) then you use gradle, which still doesn't fuck up the maven repository format and we can all just live happily. (And nobody uses Ant anymore).
In some ways, it's actually pretty weird that Ruby hasn't had similar problems (I say wearing both my Ruby fanboy and Python packaging contributor hats).
In others, it's less weird: Ruby's upswing happened with Rails and a handful of other "killer" frameworks, which helped to solidify developer workflows (and expectations) around packaging. Python, by contrast, has had multiple generations of "killer" usecases, each with their own baggage (and many predating any real packaging standards).
The history of Python packaging is also much older, and much more devolved than anybody in 2023 would consider reasonable for a packaging ecosystem: the earliest generation of PyPI, for example, was just an index that pointed to other webhosts for downloads, rather than a full package host. This helped ossify manual workflows that developers at the time were content with, and some of that cost is still being paid forwards.
I think the main reason Ruby has avoided this (like most other newer languages) is the lack of a numeric/scientific ecosystem, which is where the real mess begins. Science code never dies (because the rewrite almost never implements everything the predecessor had, so now you need both the original tool and its "replacement"), so I have tools built from Fortran, C, C++, Tcl, Java, Python and now Rust to combine.
Python packaging has too many people inventing their own tools and their own little fiefdoms instead of picking one good tool and pushing it forward. There’s also the PyPA, which is an organization (or a group of semi-related volunteers, depending which interpretation suits them best) that maintains many of the tools (although none of the most feature-complete, namely poetry and PDM) and that produces standards that promote tooling proliferation.
Not a Ruby guy, but my feeling is that C extensions are more rare in that ecosystem. The majority of the Python woes feel directed at the proper way to package and compile non-Python code.
Nokogiri was historically a bit difficult to install but that was because of dependencies and not packaging. To overcome that limitation, nokogiri now has prebuilt binaries for most platforms.
Packaging in ruby using bundler is amazing. It also evolved naturally. In the beginning there was only `gem install`. Then came `bundler` with package list and lock files. It had widespread adoption and now it is shipped with ruby.
Ruby was actually the precursor to the solutions seen nowadays. It was the first mainstream case of a packaging tool using lockfiles, and one of the people behind it was also involved in the development of yarn (which brought lockfiles to node) and Rust's cargo.