I see a lot of package managers I've never heard of; I'm a happy venv & pip user.
One of the key faults of pip is what happens when you decide to remove a dependency. Removing a dependency does not actually remove the sub-dependencies that were brought in by the original dependency, leaving a lot of potential cruft.
This is not really an issue if your virtual environments are disposable. Just nuke and recreate venv from scratch using only what you need.
This is a similar approach to “zero-based budgeting”. It forces you to carefully pick your dependencies and think about what you carry.
I never mention transitive dependencies in my requirements.txt file, just direct dependencies and rely on pip to install all transitive libs.
You don't even have to freeze the version; just list the name and pull the latest version whenever you run `pip install --upgrade`.
If you don't do that, you can quickly go down JavaScript's path of bloated node_modules.
Can people explain why venv & pip is a bad solution that doesn't work for them, so that they have to resort to other package managers?
Even venv is not really required if you dockerize your Python apps, which you will have to do anyway at deploy time.
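The "nuke and recreate" habit described in this comment can be sketched with the standard library's `venv` module. This is a minimal illustration, not the commenter's actual script; the function name `recreate_venv` is hypothetical, and installing the direct dependencies afterwards is left out.

```python
import venv
from pathlib import Path


def recreate_venv(env_dir: Path, with_pip: bool = True) -> Path:
    """Delete whatever is in env_dir and build a fresh virtual environment."""
    # clear=True empties the target directory before the environment is created,
    # so packages left over from earlier installs cannot survive.
    builder = venv.EnvBuilder(clear=True, with_pip=with_pip)
    builder.create(env_dir)
    return env_dir / "pyvenv.cfg"
```

Calling `recreate_venv(Path(".venv"))` and then installing from a requirements file that lists only direct dependencies gives an environment with no leftover sub-dependencies.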
I do a lot of maintenance work and every time I've encountered a (complex) project set up in this way that's older than ~6 months, it's bitrotted from breaking changes in the dependencies.
The python ecosystem does not stand still and seems quite happy to introduce breaking changes in non-major versions.
To my mind, there are 3 ways to make sure your python project of today will work in 2 years time:
1. Have no dependencies
2. Destroy your virtual environment and reinstall it every day for the next 2 years and fix anything that breaks
3. Freeze your entire dependency tree where it stands and occasionally do large breaking change updates
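Option 3 amounts to recording an exact version for every installed distribution. A rough sketch of that idea using `importlib.metadata` follows; it approximates what `pip freeze` reports rather than reimplementing it, and the helper names are hypothetical.

```python
from importlib.metadata import distributions


def format_pins(packages):
    """Turn (name, version) pairs into sorted 'name==version' lines."""
    return sorted(
        (f"{name}=={version}" for name, version in packages), key=str.lower
    )


def freeze_current_environment():
    """Pin every distribution visible to the running interpreter."""
    return format_pins(
        (dist.metadata["Name"], dist.version) for dist in distributions()
    )
```

Writing the result of `freeze_current_environment()` to a file captures the whole tree, transitive dependencies included, at the versions that currently work.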
IMHO the problem with python package management isn't python package management. The biggest problem I've hit is when python depends on non-python - `pip install` failing because I need to `apt install libsomething-dev` is the big one, but also managing multiple versions of python itself (I count this as a dependency on non-python because the python3 binary isn't written in python - if it was, then pip+virtualenv could manage it like everything else). And also as you note there's just ecosystem churn.
If we could reboot all of computer science, I'd want one unified cross platform package manager with the same market takeover level as Git, and all languages designed to be aware of it.
No file system overlay stacking or anything, no file system hierarchy of different folders for different types of resources, just python style search paths and virtual environments, and no global environment allowed, ever.
But Python development is pretty nice at the moment so I can't complain.
Isn't option 3 exactly what you're supposed to do? Freezing your dependency graph and/or explicitly denoting what version of the dependency you want are your best bets for avoiding problems like this.
A lot of people will assume that specifying major version upper bounds on dependencies is what you're supposed to do, but I've seen this fail more often than freezing dependencies.
The problem with major version upper bounds is that if it's possible to write a test case for a bug, it's possible to depend on broken behavior. Changing behavior in a way that breaks users should be a major version bump, but that's not actually how people use semver and semver isn't really described that way either. It's described in a way that makes people think that changes in type signatures are the predominant impetus to bump major versions.
I mention this ceiling pinning footgun in the article.
It's an enormous pain in the ass to explain to folks, and some software engineers I've met are totally incredulous that that's "not the right thing to do"
It is, but Python software tends towards large dependency graphs which will quickly accumulate CVEs, and so this strategy greatly upsets your security/audit people.
(Obviously patching vulnerabilities is good and proper, but automatically flagged CVEs are only ever _potential_ at best; in many contexts most of them are not actual vulnerabilities.)
Yes. But then any transitive dependency stays there forever even when not needed anymore.
Initial requirements (only first level, version ranges) and dependency resolution results (all transitive packages, exact versions or hashes) are two very different things and should be treated separately.
You can implement and maintain it by hand with two requirements.txt files but it's rarely done this way. And really at this level you're better off with a normal package manager.
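To make the two-file split concrete, here is a small sketch that compares a list of direct requirements with a fully pinned list. It only handles simple `name` plus specifier lines (no URLs, extras, or markers), and the function names are hypothetical.

```python
import re


def normalize(name: str) -> str:
    """Normalize a project name as PEP 503 describes (case, '-', '_', '.')."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_name(line: str) -> str:
    """Extract the project name from a simple requirement line, or ''."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", line)
    return normalize(match.group(1)) if match else ""


def split_lock(direct_lines, lock_lines):
    """Return (direct deps missing from the lock, transitive-only pins)."""
    direct = {requirement_name(l) for l in direct_lines} - {""}
    locked = {requirement_name(l) for l in lock_lines} - {""}
    return sorted(direct - locked), sorted(locked - direct)
```

The second list is exactly the set of packages that should disappear when the direct dependency that pulled them in is removed and the lock is regenerated.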
FWIW, we built an OSS plugin for poetry around option #3: when publishing, it translates the lock file into the published wheel metadata so that installs are repeatable. It also supports monorepos.
What sucks is when you imply that your project will be unmaintained for 2 years, without any kind of update (not even basic security upgrades). This is not a project, this is abandonware from day 1.
Obviously my profession gives me a ton of bias but in general it seems that for most people the squeaky wheel gets the grease and once a project is working it is ignored until it stops working again because there are only so many hours in the day and management has an infinite list of demands...
And then it breaks 2 years later and I'm called in because the person who wrote it left and no one else knows how it works because they didn't write any documentation etc etc etc
It's not like that simple 50 line Python script taking stuff from API A and stuffing them to API B is a full ass project with a JIRA project ID and a Product Manager.
It works, people forget it exists.
Then something breaks and I need to start fixing it. And the amount of work it takes to just update the packages is more than it'd take me to rewrite the whole thing in a language with a sane distribution mechanism (Go).
I go over the downsides in the article, but there is nothing fundamentally wrong with using pip and venv. It's just that if you've ever worked in other programming ecosystems, it would be immediately apparent that things could be significantly simpler and more reproducible.
Why does python need virtual environments? Why isn't it that you simply are in the correct environment when you're in your project folder? Why are there upwards of a dozen different config files for a project rather than just one standardized one? Why do you need to nuke your whole environment and recreate it just to remove one package?
These things are silly. They get in the way of reproducibility. They are security liabilities. There is a better way.
> Why isn't it that you simply are in the correct environment when you're in your project folder?
I typically have several environments for my main project. Mostly I use a Python 3.12 version with the full set of optional third-party packages installed, but I also have a "clean" version with none of them installed so I can test error handling, as well as a Python 3.10 version as that is the oldest supported version for one of my third-party optional dependencies, plus a 3.8 version since I still support that version in my core system.
In that way I can quickly test code against the typical deployment cases before committing.
The project has about 500K lines of C code, mostly in one Python/C extension which takes over a minute to fully rebuild, but most changes are to Python code or other bits of Python/C or Cython code which are faster to rebuild, so I do editable builds for most of my work for the quick edit cycle.
I don't know how to nominate one of these environments as "correct".
(The full test suite against the various permutations of Python versions and 10 or so optional components is a mess. I've given up on testing all combinations as it wasn't proving worthwhile.)
How would that help? At the end I'm still using setuptools to build the extension, right?
So I'm stuck with the same core issues, but with a different front end?
To add to the complexity, I distribute my software as source, and I am wary of telling my customers to install another package in order to run the commands to install my package.
Virtual environments are cool, and necessary, but at the same time, they are incredibly limited and I always get frustrated at the lack of features they should have. They are too fragile.
Nuking a whole venv when you mess up isn't really efficient.
They aren't portable. You need to package up your editable project for an offline system? Too bad, virtual environments use hardcoded paths and symlinks that will be broken when you try.
Want to convert the packages in your venv back to .tar.gz or wheels? There's no way to do that either.
I've read "they're too/so/very fragile" so many times, but nobody has given me an example of how or why they consider them fragile, what steps do I need to do to break them.
Myself? I consider them ephemeral: I create my Makefile with a 'venv' target to delete/rebuild the virtual environment in case of changes. Some do take longer to rebuild, but with a global pip cache it takes much less time to rebuild once it's been done the first time.
Hardcoded paths and symlinks? Not a problem, I know that virtual environments don't travel; and since they're ephemeral if I need it at a different location I can always rebuild them.
Do you have an example for your third paragraph?
And I've done the last one, converted all packages to wheels, uploaded to an artifact server, and used those for production deployments.
I'm curious, really; not bashing you for having troubles with it, but I don't understand the aversion given my lack of blockers when using them.
Can I ask you a question, what do you think is better approach: 1) publish packages as wheels or 2) publish apps as docker images (or docker-compose files or helm charts)?
Why do some people prefer 1 to 2? I think 2 is more “production friendly” and universal across other languages and stacks (the same approach is used for Java, JS, Ruby, etc.).
Publish a module as a wheel if you want to be able to use pip to install it in your servers, or anywhere really.
Once you've built out your virtual environment you can build your docker image and push it to a registry.
Your helm chart should include references to docker images that are getting deployed to a kubernetes cluster, among other entities that need to be built out for your app to work.
Those are all different layers of the deployment, independent of each other.
If someone prefers 1 over 2 (or vice versa), they don't understand what they think they know.
The wheel, kept on an artifact repository, will let you lock down your supply chain, plus there's no need to rebuild it every time it gets used.
I would build it once for whatever architectures are required, build wheels for them, and use those for packaging up the final apps that need deploying (through docker, k8s, whatever).
Wheels can be production friendly if you know that you're only ever going to care about one specific OS family, and your app is pure python+stuff available on PyPI.
For internal use, where you can just pretend everything other than Debian based systems don't exist, it's ok, but I definitely see the value of docker, even though I like Snap a lot more.
I'm surprised nobody has ever made a "distro" where all the packages had a thin python wrapper and installed from a private Pip repository, so you could package your whole app and all dependencies with it.
Dockerfile and requirements.txt is everything you need to know to recreate the environment, and this is how apps are usually run in production. This is how you can create a consistent, recreatable, and testable env and carry it from dev machine to test, staging, CI/CD, and prod environments.
Never really had an issue with python apps not being portable.
Building regular python app? Just use official python docker image.
Using advanced CUDA stuff with nvidia cards? Just use nvidia’s docker image and forget about fiddling with configuring and compiling dependencies. It is all as easy as carrying dockerfile and requirements.txt
> They aren't portable. You need to package up your editable project for an offline system? Too bad, virtual environments use hardcoded paths and symlinks that will be broken when you try.
This isn't 100% true. If you carefully use the same python version and the same path for your venv and python then copying the venv between computers works perfectly. I've done it many times, actually last time I changed my computer I copied over all my projects and venvs and everything worked flawlessly.
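One place where a venv records an absolute path is its `pyvenv.cfg` file, whose `home` key points at the directory of the base interpreter. The sketch below reads that file and checks whether the recorded directory still exists on the current machine. It is only a partial portability check, since scripts inside the venv also embed absolute paths; the function names are hypothetical.

```python
from pathlib import Path


def read_pyvenv_cfg(venv_dir: Path) -> dict:
    """Parse the 'key = value' lines of a venv's pyvenv.cfg file."""
    settings = {}
    for line in (venv_dir / "pyvenv.cfg").read_text().splitlines():
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def base_interpreter_present(venv_dir: Path) -> bool:
    """Check that the directory recorded under 'home' exists on this machine."""
    home = read_pyvenv_cfg(venv_dir).get("home", "")
    return bool(home) and Path(home).is_dir()
```

If `base_interpreter_present` returns False after copying a venv, the Python installation it was built against is not at the same path, which matches the condition the comment describes.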
Pip will cache the downloaded package tarballs to make repeated installs faster at least, but that's little help when you have a dozen venvs with multiple gigabytes of pytorch.
> Just nuke and recreate venv from scratch using only what you need.
That's great until you have dynamic dependencies which means you can run apparently happy yet fail later on an import. This means you have to take special care generating your dep list.
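A cheap guard against an import that only fails later is a smoke test that imports every module the application may load dynamically. This is a sketch under the assumption that you can list those module names up front; `find_broken_imports` is a hypothetical helper.

```python
import importlib


def find_broken_imports(module_names):
    """Try to import each module and collect the ones that fail."""
    failures = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError as exc:  # also covers ModuleNotFoundError
            failures[name] = str(exc)
    return failures
```

Running this right after rebuilding the venv turns a late runtime failure into an early, visible one.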
> I never mention transitive dependencies in my requirements.txt file, just direct dependencies and rely on pip to install all transitive libs.
> You don't even have to freeze the version; just list the name and pull the latest version whenever you run `pip install --upgrade`.
This gets you into situations when you need to release an urgent one-line hotfix into production but your docker builds are broken because some version changed. Stable reproducible builds are an absolute must if you ask me.
> Can people explain why venv & pip is a bad solution that doesn't work for them, so that they have to resort to other package managers?
It's very simple:
1. As you said yourself, you don't need venv at all.
2. The format used by `pip install -r` (colloquially known as "requirements file") is not a defined standard.
3. pip is not a package manager, but a package version resolver and installer.
Pulling the latest version without freezing is how you Break Stuff. Every python developer seems to be addicted to breaking changes.
Nuking and recreating, freezing, etc. are all easy minor tasks, but each one nonetheless adds a few lines of code.
Auditing for security issues would probably need some other package or other.
And with virtualenv, you can have an activated environment. It's stateful and your scripts have to then take into account state. Poetry run doesn't have that issue.
That's enough minor annoyances to make me not want to ever use anything but poetry plus the audit plugin and freeze-wheel if needed.
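For scripts, one way around activation state is to call the interpreter inside the venv by path, which works without activating anything. A small sketch, with hypothetical helper names:

```python
import os
import sys
from pathlib import Path


def venv_python(venv_dir: Path, windows: bool = os.name == "nt") -> Path:
    """Path of the interpreter inside a venv, usable without activation."""
    if windows:
        return venv_dir / "Scripts" / "python.exe"
    return venv_dir / "bin" / "python"


def running_inside_venv() -> bool:
    """True when the current interpreter belongs to a virtual environment."""
    # In a venv, sys.prefix points at the environment while sys.base_prefix
    # still points at the installation it was created from.
    return sys.prefix != sys.base_prefix
```

A script can pass `str(venv_python(Path(".venv")))` to `subprocess.run` instead of relying on whichever environment happens to be active in the shell.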
> This is not really an issue if your virtual environments are disposable. Just nuke and recreate venv from scratch using only what you need.
There was (or still is, I wouldn't be surprised!) a time when the AWS CLI and the AWS Elastic Beanstalk CLI had incompatible versions of some dependency and couldn't be installed in the same venv. Not fun.
I also use venv and pip and I'm happy with the solution. I don't get the knock on JavaScript node modules though. Why wouldn't you want to list out your dependencies and their versions? It can be greatly helpful and I don't see how it contributes to bloat
I use the same workflow, and it works very well for me.
It would be cool, though, to have a wrapper around pip that adds any packages I install to the requirements file. That way I don't have to do it manually.
I actually would love something like `pip generate-reqs` where pip goes through every .py file in my app and looks at what's being imported and writes these libraries to requirements.txt, together with all transitive dependencies from the currently installed `pip list`
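pip has no such command, but the import-scanning half of the idea can be sketched with the standard `ast` module. Note the limits: import names are not always distribution names (for example `yaml` is provided by PyYAML), and imports done dynamically are invisible to this scan. The function names are hypothetical.

```python
import ast
import sys


def top_level_imports(source: str) -> set:
    """Collect the top-level module names imported by a piece of source."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # level == 0 skips relative imports such as "from . import x"
            found.add(node.module.split(".")[0])
    return found


def third_party_imports(source: str) -> set:
    """Drop standard-library modules (sys.stdlib_module_names needs 3.10+)."""
    stdlib = getattr(sys, "stdlib_module_names", frozenset())
    return top_level_imports(source) - set(stdlib)
```

Applying `third_party_imports` to the text of every .py file in a project and taking the union gives a candidate list of direct dependencies to review by hand.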
pip is so easy, but unfortunately I've found that if you add package signatures to requirements.txt, pip chokes on it when installing later. And subdependencies aren't always specified precisely, e.g. they might specify ~=1.4, and a subdependency that was once 1.4.0 is now 1.4.27, and incompatible or compromised.
conda is so heavyweight, installing whole pre-approved builds, and I find the command line options extremely frustrating.
I need supply chain security and perfectly reproducible builds, so poetry was the only real option.
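For reference, pip's hash-checking mode expects each requirement line to carry a `--hash=sha256:...` option, and once any requirement has a hash, pip requires hashes and exact pins for all of them, transitive ones included. That all-or-nothing rule is one possible reason a partially hashed file fails to install, though the comment does not say what the actual error was. The sketch below only shows how such a line is formed; the package name is a placeholder.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Hex SHA-256 digest of an artifact's bytes."""
    return hashlib.sha256(data).hexdigest()


def pinned_line(name: str, version: str, artifact: bytes) -> str:
    """Format a requirement pinned to both a version and an artifact hash."""
    return f"{name}=={version} --hash=sha256:{sha256_of(artifact)}"
```

In practice the digest comes from the downloaded wheel or sdist file, and tools that generate lock files usually write these lines for you.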
Wouldn't `pip freeze > requirements.txt` solve your problem? It will list everything currently installed, including transitive dependencies, at the currently installed and working versions.
I concur and I also think that there are too many build backends.
pdm is my current favorite package manager. It is fully PEP-compliant and the lockfile generation is nice. I wouldn't call hatch a package manager because I don't think it can make lockfiles.
uv is on my radar but it doesn't look ready for primetime yet. I saw they are building out a package management API with commands such as `uv add` and `uv remove`. Cross-platform lockfiles, editable installs, in-project .venv, and a baked-in build backend might be enough for me to make the switch. It's my pipe dream to get the full build/install/publish workflow down to a single static binary with no dependencies.
Anna-Lee Popkes has an awesome comparison of the available package managers [0] complete with handy Venn diagrams.
The pyOpenSci team has another nice comparison of the available build backends [1].
Another user of pdm here for professional projects. It sure is more standards compliant than poetry. Support for in-project venvs and integration of configs for packages such as pytest is quite useful.
When evaluating package managers, poetry was a contender for sure. However, hearing others' experiences of poetry developers introducing breaking changes that could potentially break the CI pipeline made it a no-go [1]. uv seems to be coming along rather nicely, but wasn't anywhere near pdm's level of stability during the evaluation phase.
pdm is my go to as well. I've decided to really jump on the pyproject.toml train and pdm plays very well with it. .venv by default is pretty nice as well.
Anything, and I mean anything, is better than pipenv though.
I use pip. I plan to continue using pip. If I need an isolated environment, I use conda, but then I install everything with pip. If I need to guarantee versions I pip freeze.
There's a lot of cruft and desire for a one-size-fits-all solution but the base tools are probably good enough. My setup is not the one-size-fits-all solution but it works for me, and my team, and lots of other teams.
Beware anyone who tells you that thirty years of tooling doesn't have a solution to the problem you're facing and you need this new shiny thing.*
*Playing with shiny things is fun and should not be discouraged, but must not be mandated either.
BUT importantly... it REALLY does not work well for lots of teams. For me, this setup has caused production outages multiple times across multiple teams. Maybe the root python ecosystem should learn and adopt from other ecosystems that have figured out complex deployment in a much easier way.
I've seen people use that setup but not freeze the dependencies and so have errors in production that didn't exist in development and waste days trying to figure out what was going on.
I've seen people use this setup and then struggle to deploy in different environments, especially when a dependency updates and no longer works correctly on a particular device, or where there are differences in behaviour or packaging or something in two different machines.
I've seen people accidentally install packages locally and not add them to the requirements file (especially when they're less experienced with Python), and cause outages by having the application crash on startup.
I've seen people freeze the dependency list and then have excess dependencies floating around because they couldn't differentiate between dependencies that were being used, and dependencies that were previously transitively installed and no longer needed. This doesn't necessarily cause outages, but does slow everything down over time, either in continual package maintenance or in downloading excess packages.
Most of the time, when I've seen teams use this sort of "simple" packaging process, they end up writing a bunch of scripts to facilitate it (because it's rarely so simple in practice). I have seen these scripts fail in almost every possible way. Often this happens in a development environment or before production, but I've seen production issues here as well.
To be clear, I think there are some situations where .venv and requirements.txt really are all you need. But I don't think going down that route removes complexity or makes things easier. Instead, it means you need to manage that essential complexity yourself. There are sometimes advantages to that, and reasons why it might make sense to take that option, but they are relatively rare. And given that pip/venv are right now the most official way of handling packaging in Python, that raises a massive red flag for the entire ecosystem.
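The "excess dependencies" case above is a reachability question: anything installed that cannot be reached from the direct dependencies is a leftover. A sketch over a plain dictionary follows; building the real graph from installed metadata is left out, and the names are hypothetical.

```python
def reachable(direct, graph):
    """All packages reachable from the direct dependencies."""
    seen = set()
    stack = list(direct)
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        stack.extend(graph.get(name, ()))
    return seen


def orphans(installed, direct, graph):
    """Installed packages that no direct dependency needs any more."""
    return sorted(set(installed) - reachable(direct, graph))
```

With `graph` mapping each package to the packages it requires, `orphans` lists what a frozen requirements file still carries after a direct dependency has been dropped.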
The problem mentioned with pyenv is that people accidentally develop/test on the wrong version of python itself. But that's specific to pyenv, and I don't actually see where the article discusses problems with venv. So again: What exact steps would a team take using just pip+virtualenv or pip+conda (the comment you responded to didn't mention pyenv or venv) that would lead to production outages?
It feels like you've determined there's nothing wrong with pyenv, pip, and virtualenv so any issues brought up, you will reject.
If that's not the case, here's the issue: someone used pyenv and did not specify the exact Python version. I believe we were on 3.9, prod was 3.9.11, and the current Python version was 3.9.12. There was a downstream package that had an OS dependency - I believe it was pandas - that conflicted and needed to be updated locally to work with 3.9.12. This broke and raised an error in production that was not reproducible locally - and when you deploy on AWS, reproducing can be a pain in the butt. I'm sure if the data scientist had used perfect pyenv, virtualenv, and pip commands, we would have caught this. However, they're very complicated - especially for people who focus on math - so requiring full knowledge of these tools is unrealistic for most data scientists.
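A patch-level mismatch like 3.9.11 versus 3.9.12 can at least be made loud with a startup check. This is a sketch of one possible guard, not something the comment's team used; `version_mismatch` is a hypothetical helper.

```python
import sys


def version_mismatch(expected: str, actual=None) -> str:
    """Return a message when the interpreter differs from expected, else ''."""
    actual = tuple(sys.version_info[:3]) if actual is None else tuple(actual)
    wanted = tuple(int(part) for part in expected.split("."))
    # Only the components that were written down are compared, so "3.9"
    # accepts every 3.9.x patch release while "3.9.11" accepts exactly one.
    if actual[: len(wanted)] == wanted:
        return ""
    found = ".".join(str(part) for part in actual)
    return f"expected Python {expected}, running {found}"
```

Pinning only "3.9" passes on both machines in the story, which is why the exact patch version has to be written down for the check to catch it.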
> It feels like you've determined there's nothing wrong with pyenv, pip, and virtualenv so any issues brought up, you will reject.
Alternatively, I'm rejecting your claims because you keep making them and then not providing evidence. Now that you've actually described the problem, I can agree that that's a footgun, and pyenv should start to strongly discourage setting a global version in much the same way that pip has started to protect against people using `sudo pip install` to trash their systems.
This is what I do essentially. I make a new conda env for each project and use pip or conda install. What if I have a new project that needs components from two projects? Sometimes there will be dependency conflicts that are impossible to solve when trying to use both components. It's not feasible to dive into each dependency within each dependency to figure out how to resolve them.
Rust's package manager, cargo, is able to handle this by allowing multiple versions of libraries to be installed in a single environment. Why can't python do that? How can one solve this with conda/pip or any currently available python tool? I've given up and decided to use websockets between different python processes from different environments.
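The cross-environment workaround can be illustrated without a network layer. The sketch below swaps the websockets mentioned above for a plain subprocess that exchanges JSON over stdin and stdout, which is simpler to show and is a different technique from the commenter's. The worker snippet is a stand-in for code that needs the other environment's packages.

```python
import json
import subprocess
import sys

# Stand-in worker: reads a JSON list from stdin and prints a JSON result.
WORKER = "import json, sys; print(json.dumps({'total': sum(json.load(sys.stdin))}))"


def call_other_env(python_exe: str, numbers):
    """Run WORKER under another interpreter and exchange JSON over pipes."""
    result = subprocess.run(
        [python_exe, "-c", WORKER],
        input=json.dumps(numbers),
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)
```

Passing the path of a second environment's interpreter as `python_exe` keeps the two dependency sets in separate processes; `sys.executable` works as a stand-in when trying it out.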
I used conda once... Once. It broke my python installation and then when I went to fix it by upgrading python it broke my Debian system. All these layers in software is really starting to feel like a house of cards to me at this point.
I do the same. I see that using conda outside data science is frowned upon, but in my experience, is the closest I can get to a docker container. Freezing the python version and its dependencies in a conda env has saved me a lot of trouble when coming back to old projects.
I have worked with poetry professionally for about 5 years now and I am not looking back. It is exceptionally good. Dependency resolution speed is not an issue beyond the first run since all that hard to acquire metadata is actually cached in a local index.
And even that first run is not particularly slow - _unless_ you depend on packages that are not available as wheels, which last I checked is not nearly as common nowadays as it was 10 years ago. However it can still happen: for example, if you are working with python 3.8 and you are using the latest version of some fancy library, they may have already stopped building wheels for that version of python. That means the package manager has to fall back to the sdist, and actually run the build scripts to acquire the metadata.
On top of all this, private package feeds (like the one provided by azure devops) sometimes don't provide a metadata API at all, meaning the package manager has to download every single package just to get the metadata.
The important bit of my little wall of text here though is that this is all true for all the other package managers as well. You can't necessarily attribute slow dependency resolution to a solver being written in C++ or pure python, given all of these other compounding factors which are often overlooked.
I will! I'm sure it's faster when the data is available. But when it's not, in the common circumstances described above, network and disk IO are still the same unchanged bottlenecks, for any package manager.
In conversations like this, we are all too quick to project our experiences onto the package managers without sharing the circumstances in which we use them.
I ship a lot of Python in a CI/CD or devops context, and also deploy it to embedded targets. I've never needed anything more than pip, a venv, and pip-tools (to provide pip-compile). Venvs are treated as disposable. The basic workflow I use is:
One-time (or as-needed for manual upgrades):
1. Make a venv with setuptools, wheel, and pip-tools (to get pip-compile) installed.
2. Use venv's pip-compile to generate a fully-pinned piptools-requirements.txt for the venv.
3. Check piptools-requirements.txt into my repo. This is used to get a stable, versioned `pip-compile` for use on my payload requirements.
During normal development:
1. Add high-level dependencies to a `requirements.in` file. Usually unversioned, unless there's a good reason to specify something more exact.
2. On changes to `requirements.in`, make a venv from `piptools-requirements.txt` and its `pip-compile` to solve `requirements.in` into a fully-pinned `requirements.txt`.
3. Check requirements.in and requirements.txt into the repo.
4. Install packages from requirements.txt when making the venv that I need for production.
This approach is very easy to automate, CI/CD friendly, and completely repeatable. It doesn't require any nonstandard tools to deploy (and only needs pip-compile when recompiling requirements.txt). It also makes a clear distinction between "what packages do the developers actually want?" and "what is the fully-versioned set of all dependencies".
It's worked great for me over the years, and I'd highly recommend it as a reliable way to use the standard Python package tooling.
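The recompile step of this workflow can be sketched as a list of commands. The file names come from the comment; the venv location, the POSIX `bin/` layout, and the use of `python -m piptools compile` (the module form of `pip-compile`) are assumptions made for illustration.

```python
from pathlib import Path


def compile_plan(tool_venv: Path, tool_reqs: str, in_file: str, out_file: str):
    """Commands that rebuild the tool venv and re-pin out_file from in_file."""
    python = str(tool_venv / "bin" / "python")  # POSIX layout assumed
    return [
        # 1. Throwaway venv that only holds pip-tools and its pinned deps.
        ["python3", "-m", "venv", "--clear", str(tool_venv)],
        # 2. Install the pinned pip-tools set checked into the repo.
        [python, "-m", "pip", "install", "-r", tool_reqs],
        # 3. Solve the high-level requirements into a fully pinned file.
        [python, "-m", "piptools", "compile", "--output-file", out_file, in_file],
    ]
```

Each entry can be handed to `subprocess.run(cmd, check=True)` in order; production installs then only need `pip install -r requirements.txt`.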
This is the only way I know to get it right with (almost) raw pip. Surprisingly it's not widely known or documented (as can be seen in comments here too), so you need to train the team to use it and then to enforce it. This is easier to achieve with more integrated tools.
    dependencies = [
      # direct dependencies
      "build >= 1.0.0",
      "click >= 8",
      "pip >= 22.2",
      "pyproject_hooks",
      "tomli; python_version < '3.11'",
      # indirect dependencies
      "setuptools", # typically needed when pip-tools invokes setup.py
      "wheel", # pip plugin needed by pip-tools
    ]
pip is nice since it comes out of the box with Python. setuptools used to but now it's gone.
The dependency explosion is a huge problem for a lot of people trying to lockdown the security and maintainability of their codebases. I think that's why a lot of people are rallying around Astral's projects like uv and ruff... they do so many things and they do them well.
The nice thing about my approach is that pip-tools is only needed when modifying your requirements, and it also lives in a separate venv/requirements.txt from the dependencies you actually care about.
So yes: you need those dependencies in a development context, if you're actively modifying your Python requirements. But they don't make their way into your production requirements.txt, and don't need to be installed anywhere other than a short-lived venv.
Use Rye. It wasn't abandoned; its ownership was transferred.
Rye uses other pretty standard stuff under the hood, tools that follow PEPs; it's just a front end that is sane. uv is fast as well. It downloads the pinned version of standalone Python, it keeps everything in its own venv, and there's very little messing with or tweaking of the environment.
It is messy, although it's getting better. I doubt everything will ever standardise on one tool, however.
The people who say "just use pip and venv" don't understand the issue.
Distutils has been ripped out of Python core, setuptools is somewhat deprecated but not really. Just don't call setup.py directly. Or use flit. Or perhaps pyproject.toml? If the latter, flit, poetry and the 100 other frontends all have a different syntax.
Would you like to copy external data into the staging area while using flit? You are out of luck, use poetry. Poetry versions <= 1.2.3.4.5.111 on the other hand do not support a backend.
Should you use pip? Well, the latest hotness are the search-unfriendly named modules "build" and "install", which create a completely isolated environment. Which is the officially supported tool that will stay supported and not ripped out like distutils?
Questions over questions. And all that for creating a simple zip archive of a directory, which for some reason demands gigantic frameworks.
And if you want to build a Python/C extension, perhaps with a codegen step as well, then most of these modern tools throw their hands in the air.
I had to hack my setup.py because PEP 517's fascination with completely isolated environments meant even a single typo fix in a Python comment triggered a full re-compile of all 500KLOC in my main Python/C extension.
Setuptools creates temporary directories for the .o files, and passes them to the build command. I had to override that to revert to the old behavior:
And reading your post, I still do not understand your issue. Is it that you can do various things in various ways? Would you like a single path that must be followed?
> The people who say "just use pip and venv" don't understand the issue.
The people saying that say that because it works for them and it solved their problems.
> setuptools is somewhat deprecated but not really
setuptools remains the default and most popular backend (the thing that builds your package). What was deprecated was calling it directly, instead you use pip or build or any other frontend.
Why? One reason is because it exposed lots of non-standard eccentricities that users would then start to rely on, breaking the effort of standardisation.
> Or perhaps pyproject.toml? If the latter, flit, poetry and the 100 other frontends all have a different syntax.
They all use TOML syntax and they all use the same build dependency standard. Poetry does not use PEP 440 for configuring version dependencies, but it does read PEP 440 dependencies from wheel metadata fine. I hope one day the version dependency spec can be updated with everything that has been learnt, from both Poetry and the rest of the ecosystem, but Python packaging is a slow moving beast.
> Should you use pip? Well, the latest hotness are the search-unfriendly named modules "build" and "install", which create a completely isolated environment.
pip does this for you though, I'm not sure why most end users need to know "build" or "install" exist.
> Which is the officially supported tool that will stay supported and not ripped out like distutils?
I'm not sure why you think any of these are "the officially supported tool", they are an attempt by various PyPA members to minimally and correctly implement the standards, but PyPA is a loose collection of volunteers.
Whereas the CPython core devs have a pretty clear arm's-length attitude towards all of packaging. This for me feels like the real contention in the Python packaging world.
> Questions over questions. And all that for creating a simple zip archive of a directory, which for some reason demands gigantic frameworks.
Well the people using pip and venv didn't have any of these questions, they just got on with it because it solved everything for them.
Likely you want your package to figure something out as it’s being built, or have interesting metadata, or include binaries, or something else not on the happy path, and you want your build tool to "just work" even though chances are you’re doing something not that simple or what the tool creator thought. Otherwise, if you really just need a “simple zip archive of a directory” just write out the directory and call python -m zipfile.
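For what it's worth, the "simple zip archive of a directory" case really is only a few lines with the standard library. A minimal sketch, assuming the archive is written somewhere outside the directory being zipped (otherwise it would try to include itself); the function name `zip_directory` is just made up for illustration:

```python
import zipfile
from pathlib import Path


def zip_directory(src_dir: str, archive_path: str) -> list[str]:
    """Write every file under src_dir into archive_path; return the stored names."""
    src = Path(src_dir)
    names = []
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(src.rglob("*")):
            if path.is_file():
                # Store each file under its path relative to the source root.
                arcname = path.relative_to(src).as_posix()
                zf.write(path, arcname)
                names.append(arcname)
    return names
```

The command-line equivalent is roughly `python -m zipfile -c archive.zip somedir/`. Neither version produces wheel metadata, which is exactly the part the build backends exist for.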
There’s nothing wrong with lots of people needing specific use cases, but it does mean all these build tools need to add lots and lots of options to accommodate them all, and before you know it you're complaining of a "gigantic framework" (even though I wouldn't describe either setuptools or hatchling like that).
> pip does this for you though, I'm not sure why most end users need to know "build" or "install" exist.
Because at least at one point in time "build" was marketed as the future, it even appeared in a setuptools warning message. This developer spends several paragraphs on the issue:
If it is no longer the future, so be it. Pip seemed to force isolated installs by default for a while and then backed off again.
As for the volunteer question: All OSS package managers are maintained by volunteers and I do not know of any other ecosystem that prompted an xkcd comic about the packaging situation.
Wait, so build isn't the future standard/"happy path" anymore? I thought setup.py will inevitably be phased out. Do you have any links or discussion that point to that change?
Poetry isn't perfect but I'm happy to have it. It installs most packages without issue and effortlessly handles multiple versions of Python on the same system.
In my experience, Poetry works much better than, say... ... npm.
Can you describe what issues you've had with NPM that you haven't had with Poetry? In principle, they should be doing the same thing - both use a per-package environment and a lockfile, allow you to run commands in the environment, ensure the checked-in dependencies remain in sync with the local dependencies, etc. The main difference is that Python's ecosystem only allows one version of a dependency to exist at a time, but Node's ecosystem can handle multiple coexisting dependencies.
I like poetry but I’m not sure the Python ecosystem is disciplined enough for it, all too often I get stuck trying to resolve dependencies between packages which should be compatible but the dependencies weren’t defined well
I share the author's excitement about `uv`. I was a big fan of `pip-compile` because it was the simplest possible way to have clear top-level dependencies that also froze sub-dependencies (you note your top-level dependencies in `requirements.in`, then use `pip-compile` to freeze those plus the sub-dependencies in `requirements.txt`, and it adds comments noting what top-level dependency brought it in).
Python actually now has over a dozen package managers, none of which are as good as Cargo in Rust. Here's my attempt at a nearly comprehensive rundown.
Maybe it's just a joke, but for those that don't know that is Perl's motto and even better because there's the venerable, wonderful CPAN [0] which is the de facto package repository for the language.
It's certainly a joke, because the Python original motto was "there's only one way to do it", and the current one is "there's only one (obvious) way to do it".
Python's motto was created as an obvious reference to Perl's, purposefully negating it.
I've found Nix to be basically the be-all-and-end-all of making Python work reasonably. Mitchell Hashimoto put out a post about how to actually do that a while ago [1].
Happily using pip, venv, and pip-tools for every project and still finding them more than suitable. They might not have the marketing budget or pizazz of others, but if you're looking for effective and boring tools that get the job done so you can solve more interesting problems they work just fine.
Additionally I use pyenv and pyenv-virtualenv to manage multiple Python versions. Different venvs pointing to different Python versions etc, but the core tools are still pip, venv and pip-tools. (I do not use conda)
I publish a Python SDK with about 85k monthly downloads according to PyPI. I make sure to run tests (unit tests, type checking, integration tests) against all currently supported minor versions of Python (3.8 - 3.12)
I think pyenv finally solved the versioning problem for me.
Additionally, being able to set a Python version or virtual environment per-directory with "pyenv local" has eliminated having to remember which venv I was using in a project directory, or which convention for venvs I used for a project.
And "pyenv shell" is also handy for temporarily changing the default Python version of your current shell session.
Or changing the global default using "pyenv global" - for example I currently have this set to "3.12.3", even though I usually test the SDKs I build against the oldest version I have to support first - the latest 3.8.X
On Rye: "This project was ultimately abandoned by its author in 2023 and given to Astral.sh in favor of supporting uv instead"
I don't think that's quite the right way to frame this. Handing Rye over to a company that could maintain it full time isn't the same thing as "abandoning" it - and the new maintainers are active on that project: https://github.com/astral-sh/rye/commits/main/
I made https://pip.wtf, which is a "god damn it, I'm doing this myself" alternative for single-file scripts that just need some basic deps. You paste some code into your script and then it installs dependencies to a local directory.
Nice! The only criticism I have of this is needing to figure out the obscure shell quoting rules. But it is pretty easy to follow the DIY spirit and replace the os.system call with subprocess!
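To make the subprocess suggestion concrete: passing the command as a list of arguments means no shell is involved, so version specifiers like `requests>=2,<3` or target paths with spaces need no quoting at all. This is only a sketch of the idea, not the actual pip.wtf snippet; `pip_install_command` and `install` are hypothetical names, and it assumes pip is available to the running interpreter.

```python
import subprocess
import sys


def pip_install_command(deps, target_dir):
    """Build the pip invocation as an argument list (no shell, no quoting)."""
    return [sys.executable, "-m", "pip", "install", "--target", target_dir, *deps]


def install(deps, target_dir):
    """Run pip; check=True raises CalledProcessError on a non-zero exit."""
    subprocess.run(pip_install_command(deps, target_dir), check=True)
```

Using `sys.executable -m pip` rather than a bare `pip` also ensures the dependencies are installed by the same interpreter that is running the script.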
> Naturally this led to a proliferation of new Python package managers which leverage the new standard. Enter poetry, PDM, Flit, and Hatch.
An important qualification: Poetry uses pyproject.toml, but it doesn't use the standard (i.e. PEP 518, and 621) metadata layout. This in practice means that it doesn't follow the standard; it just happens to (confusingly) use a file with the same name.
To the best of my knowledge, the others fully comply with the listed PEPs. In practice this means that the difference between them is abstracted away by tools like `build`, thanks to PEP 517.
I disagree with the title, but it's an okay rundown of the various package managers. Wish the author had tried out hatch though since it seems good. Also rye is not abandoned. You can see the repo has updates within the last 24 hours for a new release. I think they want uv to eat rye, but that hasn't happened yet.
My current favorites are uv + mise. Handles lockfiles, multiple versions of python, and it's very fast since uv is very fast. Have not tried pdm or hatch though.
> As part of this release, we're also taking stewardship of Rye, an experimental Python packaging tool from Armin Ronacher. We'll maintain Rye as we expand uv into a unified successor project, to fulfill our shared vision for Python packaging.
uv is transformative because it is (1) correct and (2) crazy fast.
I wanted to like Poetry but the performance is atrocious. On the other hand, with uv, I feel like I can always build a new instance of a system whenever I want, whereas with conda or poetry I might have to wait ten minutes.
I really like Nix for Python, as long as the packages I need are already packaged. Otherwise I'll use pip and venv to try stuff out. Can't stand conda.
If you haven't yet, check out https://devenv.sh (super powered nix shell and more). It's pretty nice for python packages and installs your requirements to a project local venv for you via whatever tool you want (pip, poetry, uv etc).
I've been using it for a couple of years and it's super nice to be able to manage both python and "native" dependencies, and other non-python development tools all together.
I used just nix and whatever python packages are already in nixpkgs for several projects. And that works really really well until you run into an issue with compatibility like I did. It seems to mostly happen when some extremely common tool like `awscli2` depends on a specific version of some package and so it's pinned.
Package management has been a problem almost every time I have dabbled in Python. This is a great overview of the situation and will save me time the next time.
It didn't use to be like this. I remember first starting in Python, around a decade ago, when Python3 had been out a long time but everyone still wrote in Python2 for some reason. You used pip, that's it. Everything was easy.
There were very good reasons to make some of the changes that have been made, but I think that big switch kind of normalized breaking changes in the python world. At this point we have a catastrophe on our hands. A language and its tooling are supposed to get out of my way, they're supposed to be the tool by which I express my intention, not be a thing I have to tinker with all day. There's room for being opinionated, and there's room for upgrading things, but if I need to follow 10 RSS feeds just to keep up to date with changes, if I'm arguing with my colleagues about which of a dozen ways to use a language is best, something has gone horribly wrong.
At my last job, we lost days if not weeks to "solving environment" due to poetry and many folks not understanding that ceiling pinning is BAD when you use a tool like poetry to manage your deps.
Seriously, conda can be pretty annoying for packagers. I don't like the Jinja configuration stuff, I don't like the chaotic build output (see above), sometimes libraries are leaking in from the system, for some undocumented issues it takes ages to search for an answer.
Once everything is set up, things tend to work, apart from the slow solver. I also do not like the way that conda embeds itself in the Windows installation.
It is a bit overhyped, like all things in the "scientific" ecosystem.
ML engineer here, I am up to 3 irreversibly broken environments after simply adding a package. Conda is inexcusably slow and not even mamba can save it. Every conda project I have ever had was later switched to poetry.
Here's a fun challenge: try to determine channel priority in your conda env, go ahead try.
Conda is the single worst packaging tool I have ever used.
I use Python but I’m not a professional Python programmer. I use pip and venv, with requirements.txt. It copies the Python binary and some of the shared standard libraries from the system, installs the required packages in site-packages, and it always works fine.
Am I missing anything major by not using conda and Poetry?
That's generally fine until the build breaks due to a transitive dependency several levels down releasing a breaking change. While requirements.txt may hold enough information to build the project now, it is not guaranteed to work in the future -- you would need lockfiles for that guarantee.
I've seen projects sort of emulating this by using "requirements.in" to list direct program dependencies, and auto-generating "requirements.txt" using "pip freeze". But when you find yourself doing that, it's probably time to switch to Poetry.
I thought pip freeze > requirements.txt lists direct dependencies, as well as the transitive dependencies, and includes exact version numbers. From the system, it copies the relevant binaries.
In the future, pip install -r requirements.txt will install the exact same set of packages at those pinned versions. Unless the system components drastically change, the build is deterministic and the project should run in the future.
Sure, in the future if you change the version number for one package, the build might break, but that’s expected.
It looks like the lock files specify the dependency graph, hashes of each package, URLs, and metadata about the system, so they do better when a package is updated.
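If you stay on plain requirements.txt, one cheap sanity check is to verify that every entry really is pinned with `==`, since a single unpinned line is enough to let a future release in. A rough sketch, assuming a simplified pattern rather than a full PEP 508 parser; it skips option lines such as `-r` or `--hash`, does not verify hashes, and the name `unpinned_requirements` is made up:

```python
import re

# name, optional [extras], then an exact "==" pin (simplified, not full PEP 508)
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]*\])?\s*==\s*[^\s;]+")


def unpinned_requirements(text: str) -> list[str]:
    """Return requirement lines that are not pinned to an exact version."""
    unpinned = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or line.startswith("-"):
            continue  # blank lines, comments, and pip options
        line = line.split(" #", 1)[0].strip()  # drop trailing comments
        if not PINNED.match(line):
            unpinned.append(line)
    return unpinned
```

Output from `pip freeze` should normally pass this check for packages installed from an index, while a hand-written list of bare names would not.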
Why vendoring is not a common practice baffles me, especially since the leftpad incident happened over 8 years ago.
For Python, you can use `pip wheel` (https://pip.pypa.io/en/stable/cli/pip_wheel/) to download .whl files of your dependencies in a folder, add that folder to your version control, and update `sys.path` to include your .whl.
For updating packages, you run `pip wheel` again and check in the new .whl files after carefully reviewing the changes.
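The `sys.path` step could look something like the sketch below. It assumes the vendored wheels are pure Python: a .whl is a zip archive, so Python's zip import machinery can load modules from it, but wheels containing compiled extensions cannot be imported this way and need a real install. Importing straight from wheels is also generally discouraged by the packaging docs, so treat this as an illustration; `add_vendored_wheels` and the directory layout are hypothetical.

```python
import sys
from pathlib import Path


def add_vendored_wheels(vendor_dir: str) -> list[str]:
    """Put every .whl file in vendor_dir at the front of sys.path."""
    added = []
    for wheel in sorted(Path(vendor_dir).glob("*.whl")):
        entry = str(wheel.resolve())
        if entry not in sys.path:
            # Earlier entries win, so vendored copies shadow installed ones.
            sys.path.insert(0, entry)
            added.append(entry)
    return added
```

You would call this once at startup, before importing any of the vendored packages. A more conservative variant is to keep the wheels in version control but install them into a venv with `pip install --no-index --find-links <folder>`.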
There is also virtualenvwrapper. It’s quite handy to create, list, remove virtual environment. I prefer to store all venvs in ~/.cache/virtualenvs instead of .venv in project directory, makes it more clean, no need to exclude for backups or git repository.
Notable omission in pip-tools which many are suggesting here as being simpler: it can't write requirements files for multiple environments/platforms without running it once for each of those environments and having one file for all of them.
We settled on Poetry at the time but it has been quite unstable overall. Not so bad recently but there were a lot of issues/regressions with it over time.
For this reason I am happy to see new takes on package management, hopefully some of these will have clearer wins over the others, where you have to spend ages trying to figure out which one will do what you need.
One thing this blog does not question is whether package managers are needed at all. Look at how Deno does it: packages are imported from URLs and downloaded to a local cache at run-time. This allows CDN-based package distribution, but you still need a package search engine.
Composer did the right thing: it doesn't handle binaries, it uses SAT to solve for a _single_ version across the entire project tree, and overall it focuses on simplicity.
You need to handle PHP extensions outside of it. Composer can only warn you if your PHP lacks some compiled part, but won't act on it. It's good separation of concerns (and phpize/pecl already handles extensions very well).
You also don't have multiple copies of the same package at different versions. Composer requires that a single version must be compatible. This sounds like a nightmare that would break easily, but it actually works by forcing package providers to be more backwards-compatible.
That being said, we often don't use Composer to install tools in a machine, like we do with Python. Composer is there mostly to build standalone projects to be deployed elsewhere.
You can `composer global install` stuff, but you often don't need to. I absolutely see this as a win. Tools for my machine should be installed by my OS package manager, not some language specific thing.
I think this problem comes from ideas we had before that were well intentioned. The idea that led to all of this is, what if we could save developers a ton of work and users a ton of disk space by packaging libraries for them to use? And then we have a ton of problems as a result of this decision. C is now your system's API, mismatched dependencies, a gazillion package managers.
And then what solution do we have? Virtual environments, virtual machines, docker and appimage. We package all dependencies and even entire operating systems so as to avoid all these problems. It's legacy support all the way down.
From scratch, I'd say, devs should just pull all dependencies into their code and package them with their product. Users should never even have to touch something like pip, or a virtual environment. A package manager that allows a developer to publish tools others can use to build code, but that packages the dependencies with their package instead of pulling it for users, would be ideal. Where possible, avoid dependency on anything external entirely. What's that XKCD about yet another standard? I know it will never happen, but I sure do wish it worked like that.
Cargo is strictly worse than any of the solutions for python. Doing even the simplest things seems to involve pulling down and compiling hundreds of dependencies.
This is... Not a bad thing? Compiling your dependencies into your standalone executable is like butter, no dependency hell, no virtual environments or containers to avoid said hell. You release a program that has everything it needs to run, it's as good as it gets IMO.
The problem is that the resulting program ends up too large for my users to download, because it depended on libtwoplustwo that in turn depended on libkitchensink.
The rust compiler does not include in your binary any unused code. Unless libtwoplustwo uses the entirety of libkitchensink, the latter is not entirely included in your binary.
This is not true for python, or external libraries, or appimages, or docker containers.
Not quite, it is pretty trivial to check what external dependencies your program calls, and what those dependencies call and so forth. I'm sure you could write a program that has a dependency chain that requires the entire database of packages in cargo (and that would be an interesting experiment). A lot of modern programming languages evaluate what code is referenced in a working program and only include used code, and the compiler enforces writing your program in such a way that it doesn't have to go over infinite possibilities to check. Of course, you have to have the entire dependency to compile for each import (and so would your user if you only distribute source code) to reference the first problem you mentioned above, and I'm sure there are ways to trick the compiler into endlessly checking dependencies or evaluating references, probably using unsafe, but that's why the compiler won't let you compile with unassigned values and has ownership and scope, you have to write your program in such a way that the compiler can tell what code will be executed at runtime or it won't compile.
A program that is Turing complete cannot be reasoned about unless you can solve the Halting problem. So you cannot decide at compile time if a dependency that is imported is actually needed or not. In rust even the build-scripts are Turing-complete rust-programs, which makes this even harder.