This makes me so happy. Back when we had Jenkins slaves, one of our devops guys set a pipeline up that pip installed different versions over the top of system packages causing weird intermittent failures everywhere. Different pipelines would be running with different requirements files. I revoked sudo privs for Jenkins immediately (I didn't add them in the first place) and reprovisioned the whole build cluster, resulting in pipelines breaking consistently where they should have been breaking: when trying to do stupid stuff.
Personally I only ever use the system python packages on Linux if I can get away with it. Saves a whole world of problems.
> Back when we had Jenkins slaves, one of our devops guys set a pipeline up that pip installed different versions over the top of system packages causing weird intermittent failures everywhere.
Not everyone might like containers, but using them for CI seems like a good way to avoid situations like this, at least when viable (e.g. web development). You get to choose what container images you need for your build, do whatever is necessary inside of them, and they're essentially thrown away once you're done with what you need to do, cache aside. They also don't have any significant impact on, or dependencies on, the system configuration, as long as you have some sort of supported container runtime.
Seconding containers (however controversial they may otherwise be). The official Python Alpine image is < 20 MB and the slim Debian variants are < 45 MB. For many of my Python projects I end up needing to install various system dependency libs like CUDA, libav, libsndfile, etc. I tend to be a "fan" of containers generally, and they seem like the best way to handle situations like mine.
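As a concrete (hedged) example of the throwaway property: run the whole build in an ephemeral container so nothing ever touches the host's Python. The image tag and file names below are assumptions, not anything prescribed in this thread:

```sh
# --rm discards the container afterwards, -v mounts the checkout, -w sets the workdir.
# Assumes pytest is listed in requirements.txt.
docker run --rm -v "$PWD":/app -w /app python:3.11-slim \
  sh -c "pip install -r requirements.txt && python -m pytest"
```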
Nowhere in the official Python documentation (where 99% of new python users are going to go) does it warn or even talk about Linux and Debian specific issues like only using apt packaged versions of dependencies. It wasn't even until recent years that pip gave a hint or warning something might break in those setups. The situation with Python on Debian has been pretty bad IMHO with a cloistered group of people saying the status quo is just fine because it works for them exclusively.
"The situation with Python on Debian has been pretty bad IMHO"
The situation with anything has been pretty bad in Debian lately.
I'm all for the minimalistic approach in regards to Python. It's Ok to provide only the packages needed by applications and the core system. For everything else, there's pip.
EDIT:
I meant to say: there's pip inside a venv.
I agree. The problem is that if you just `pip install foo` on Debian, like 99% of Python docs and readmes say, it instantly fails. Worse yet, if you then think you should just elevate to root and force the pip install to work, you can potentially break the Python install. It's a nasty footgun.
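The non-footgun path on those systems is a per-project venv; `foo` is the same placeholder as above:

```sh
python3 -m venv .venv    # private environment next to the project
. .venv/bin/activate
pip install foo          # lands in .venv/, not in /usr/lib/python3/dist-packages
```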
Mixing pip with another package manager has always seemed weird to me. You're just asking for things to conflict and break.
I noticed with Homebrew that there was no way to untangle packages installed through pip and ones installed through Homebrew. After dealing with that mess once, I now make sure to use pip install --user. It can still cause things to break, but if that does happen it's at least easy to nuke the packages installed to my home directory.
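Roughly what that workflow looks like (package name is a placeholder; the user-site path varies by platform):

```sh
python3 -m pip install --user somepackage   # installs into the user site, not Homebrew's prefix
python3 -m pip list --user                  # shows only the packages living there
# the "nuke it" option: on Linux the user site lives under ~/.local,
# on macOS framework builds it's under ~/Library/Python/<version>
rm -rf ~/.local/lib/python3.*/site-packages
```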
Good. Now we just need to get pip itself updated so it refuses to run outside of a venv, and refuses to run unless invoked with "python -m pip" and we'll finally have something at least half decent.
And don't even get me started about how much better npm is at publishing packages, versus pip's refusal to add the same user friendliness.
PEP 704 is a recent proposal to require a virtual environment by default for any package installer - https://peps.python.org/pep-0704/. Again, you can opt-out if you want.
You could have a container whose entry point is a shell script that calls multiple Python programs that need different environments, or a multiprocess container that runs multiple Python programs, although I guess you could still address either by breaking down your containers differently.
> There’s no shortage of package management alternatives available for Python [...]
> How someone is meant to pick between these as a new developer is a mystery.
This.
Every time I get booked to look at some Python project, hours are usually wasted up front figuring out which dependency management solution was used, and how.
And with what 'special sauce' the respective developers deemed to be 'the right way' (or some library required because ... it just does).
As the author wrote: it seems common to omit the dependency setup in the Readme for Python projects.
I can understand why one would not mention this 'step' in a Rust or Node project but for Python it seems very much necessary.
I outright look for alternatives when my search turns up something written in Python; the well-accepted strategy for deploying Python seems to be "abandon all hope of deploying it yourself in a way where updating is easy, and hope the Docker container someone built to deal with this mess will be enough."
Huh? What stack? Python and Ruby take the same (I think better) approach of being web-server agnostic, which requires you to do some packaging work with your preferred WSGI/ASGI server, but after that it's like everything else. In your container, copy all your shit over, install deps, pass envs to talk to external services like postgres/redis, run migrations, run server. Updates are just: build the container again with the new version and run it.
I’m too lazy for that so in my own stuff I embed the web server in the project itself and start it programmatically (same with the migrations) so there’s less setup.
If the issue is the Docker container, then that's not really much to do with Python; it's that pretty much all software is written with that deployment strategy in mind. Those single-file, no-libc, statically compiled binaries are built that way so they can run in a from-scratch container.
I appreciate the tools that release a self-contained executable using something like PyInstaller. I don't have to worry about dependency issues and it runs without needing a whole Docker container.
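For context, producing such a binary is usually a one-liner (tool and file names here are placeholders):

```sh
pip install pyinstaller
pyinstaller --onefile --name mytool main.py   # self-contained executable ends up in dist/mytool
```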
The author mentioned venv in the article. Also, I believe the parent comment is talking about the difficulty of choosing between different dependency management solutions, venv among them, rather than the lack of (a good) one.
Why cause yourself difficulty by drifting toward optionality vs. taking the OP's suggestion and using venv?
This topic gets posted to HN far too often - I'm starting to think people are deliberately avoiding venv for some reason, because otherwise it's a perfectly capable system for package management.
Yes the article mentioned venv. And the parent said it's hard to choose. But the choice is easy: just use the basic built-in one. (Until you have a reason to use something else.)
It's a good general philosophy for software engineering: don't add stuff without good reason. There really is the potential to add infinite stuff these days - "awesome tools" and "best practices", without end. Individually they can help with particular problems you may have, but together they make a mess, and distract focus from the particular purpose of your software.
>> While pip alone is often sufficient for personal use, Pipenv is recommended for collaborative projects as it’s a higher-level tool that simplifies dependency management for common use cases.
There are many, but it recommends one. I don't think any reasonable person will actually go out and try all 7(?) of them.
Which is basically the same advice. If you don’t already know what you want start with pip and requirements.txt and when you hit pain points: wanting to soft lock transitive dependencies, automate venv setup for onboarding, locking packages by hash, package caching, bundling Python itself with your code, etc. then there’s probably a tool out there tailored to your use-case.
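In shell terms, that baseline looks like this (the lock-file name is just a convention I'm assuming here):

```sh
python -m venv .venv && . .venv/bin/activate
pip install -r requirements.txt     # the top-level deps you actually declared
pip freeze > requirements.lock.txt  # pins the full transitive set once you hit that pain point
```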
I'm convinced that there are very few python libraries that Just Work if you follow their installation instructions. I've never found one that didn't come with issues myself.
Complain about this to a Python dev and you'll be "well actually"-ed into oblivion, and each and every one will have their own opinion-as-fact on the best practice for managing these -- totally unaware of how antithetical Python development has become to The Zen of Python.
Yeah, the well-actually's are a problem. It's not all of us though.
Python devs know we have a problem; it's just hard to fix because "people developing apps and worrying about dependencies" is a rather small part of the Python community. It's not like Java or something where everybody writing the language is a developer. Most are scientists or business people or students working in environments like Anaconda or Jupyter. So it's really hard to get momentum behind an all-together-now solution.
I've slowly been gravitating toward Nix flakes so I can use it to pin to a project versions of all of the things you can't reliably install with pip alone (like python itself, or numpy, or postgres or whatever) and then have it read deps from poetry (via poetry2nix) for everything that "just works," but that's never gonna fly with the non-developer Python community. Hell, it probably won't even fly with half of the developers either, but it works well for me.
I think my situation is typical of python developers, which is why we have this problem. I think it'll stick around for a while because it's not like "just use a different language" is gonna fly with the non-dev crowd. They're going to expect somebody else to solve these problems for them.
(I may have a bias because my company offers OSS python apps in a SaaS form factor, so our support folk are the ones solving these problems--typically by either handling the virtualenv behind the scenes or by ensuring that users with conflicting dependencies are using different images).
I wanted to run guarddog on source packages and only then build them locally and install them. Turns out, `pip download` triggers code execution in the fetched packages.
Somewhat surprising, and in this day and age worth spreading awareness of.
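One hedged mitigation, if the package publishes wheels: wheels are just unpacked rather than built, so this sidesteps the sdist build step that runs setup.py (at the cost of not getting the raw source):

```sh
pip download --only-binary :all: --no-deps somepackage -d ./to-scan
```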
Put your package names in requirements.txt and run `make update-frozen`. To reinstall everything from frozen state, `make clean frozen`. (And replace the first space with a tab; HN is stripping my tabs out)
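For anyone curious what targets like those typically wrap, a rough sketch (the commenter's actual recipes may well differ):

```sh
# update-frozen: install what's named in requirements.txt, then record the exact resolved set
pip install -r requirements.txt && pip freeze > frozen.txt
# clean + frozen: throw the env away and rebuild it purely from the frozen state
rm -rf .venv && python -m venv .venv && . .venv/bin/activate && pip install -r frozen.txt
```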
I know Pythonistas like to use Python for everything, but there are other tools out there that will make your life much simpler.
The article talks about installing Python packages for development, but if you find yourself using `pip` to install Python tools/scripts then you should use `pipx` - it will properly sandbox those tools so they don't break (or be broken by) the system or other Pythons:
The tool-specific use case for pipx is unique though: it's laser-focused and perfect at the job of getting a Python tool to users regardless of whatever wacky state their Python install is in. It's kind of separate from the issues of managing project dependencies. It's a fantastic tool I wish more Python documentation and users would embrace.
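For anyone who hasn't tried it, the shape of it (tool names are just examples):

```sh
pipx install httpie        # each tool gets its own private venv; the `http` command lands on PATH
pipx run black --version   # or run a tool once in a throwaway environment
pipx upgrade-all           # upgrade every pipx-managed tool without touching the system Python
```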
The core problem (as I see it) is that Linux distros tend not to have any firm distinction between "system" packages, "user" packages, and "development" packages (which are a subset of user packages). The system package manager installs everything globally, while also being considered the only approved/safe way to install packages.
Languages tend to try to get around this by providing their own package registries and build systems to use them (npm, pip, cargo, etc), and developer tools often include some sort of sandboxing to avoid interference from the system packages (venv, bazel, cargo, nix develop, etc).
For user packages a tool like Snap, home-manager, Flatpak, or AppImage seems necessary.
Python makes the problems very obvious, especially since it has so many package management systems, gets used for system packages, and gets used for user applications.
If anyone's interested in a pipx clone with excellent tab completion, I would appreciate any feedback on pipz, a function of my zsh plugin for python environment and dependency management: zpy
While I agree with the author to not do global pip installs for every new project, I also don’t want to see text in every git repo README explaining Python package managers.
The lack of one true package management approach is a failure of the language. OP is advocating for a saner default like npm, instead of the current venv + pip mess.
npm has the same property of keeping the files locally, but without any need to activate/deactivate a venv. It “just works” that way by default if you “npm install”
Agreed, using the paths makes it feel like a conventional toolchain. I haven't tried this, but it sounds like if I execute the Python executable in the venv directory I get that environment. The only issue from there is writing executables that invoke the venv path in a deployable way.
It isn't a mess: venv + pip is simple and (usually) sufficient.
Legacy/existing code or genuine justifications excepted, of course, there is no need to use anything else - even if an alternative is better, the use of alternatives is usually worse. Short of any massive technical reason, the best option is almost always to use the default option.
The only thing they have in common is package.json, but even then they can interpret things differently, such as workspaces.
And then node_modules, which packages should not rely on but do, forcing many other tools into a compatibility mode which often makes an install take a very long time.
Various packages rely on node_modules existing as a directory with a particular layout, some rely on being able to write into it. Some of the npm alternatives are built to store and manage dependencies in other ways (e.g., keep packages as zip files or other archives and get node to load direct from the zip), and these other mechanisms do not use a node_modules directory, hence compatibility problems.
I think he prefers a python-esque way where they're sort of dumped in a flat namespace (and not in current project directory), rather than the node_modules way where it's recursively a copy of each thing and its specific exact dependencies, all the way down.
There are ways to not use node_modules, by using newer Yarns for example.
> There are ways to not use node_modules, by using newer Yarns for example.
My point was that if you use Yarn 2 in PnP mode, and you have a dependency that depends on the node_modules layout being at the same level as package.json, then even if your package manager doesn't need or use node_modules, it must emulate it so the dependency can find its files.
As a non-Python person who has had hair-pulling experiences with pip / pip3 / python2 / python3 / python-is-python2-or-python3, this was a revelation.
pipenv looks like what pip should have been.
Another story on HN is "what happened to Ruby" and that really crystallized what I don't like about python. I'm not a ruby programmer, but I have to admit how much fantastic software came out of Ruby.
Ruby was always fighting Java for some reason, it should have been fighting Python. If only Ruby had won THAT war.
This is where npm gets it right. It's so much simpler to have the default install in a local folder, and then have an option to install globally if you like.
I appreciate the concern for new developers, but I really don't think it's a good solution to have every project readme describe pip, poetry, pipenv, and whatever other new hotness there is in the package management world. There's a reason that all the readmes describe pip installation: it's the lowest common denominator, present with every standard python install, and along with virtualenv (also standard) it can do most of the requirements for package management.
I think that to help new developers, we could encourage documentation to briefly point to the official PyPA documents on the variety of options available. It would be better to focus on making that more accessible, rather than trying to throw the burden onto package maintainers to describe using their package with every new tool.
Omg this is so true! I installed a package globally, but then my interpreter was using another version of Python, which didn't have the installed package. It took me an hour to figure this out. What a waste of time.
It's also interesting how things like AWS Lambdas, Graviton, etc, are exposing all the shortcomings of the various pip install, venv, poetry, etc, approaches.
It's not impossible to figure it out, but you end up spending a lot of time to come up with something that works locally, within containers, inside a CI/CD system, and then deployed out across things like Lambdas, or non x64 machines.
Then, after it's all working, upgrading the Python version, or an extension that has C code, etc, repeats some of the hard bits.
At least with Lambda it really is easy, just use Serverless Application Model and when you do “sam build” choose “--use-container”. It will look at your CloudFormation template where you are referring to the local directory containing your source code and requirements.txt and download and build in the appropriate Docker container for your language, version and architecture.
I assume that means container based Lambdas, which would have slower cold start times and maybe some other disadvantages, but yes, it would be simpler.
No. You just build zip file based Lambdas locally using containers.
In your CFT you specify the local directory and the architecture.
SAM will download the Amazon Linux container for your language runtime locally using the correct architecture (x86 or ARM) and download the correct dependencies based on your architecture and package everything in a local folder. It will then output a modified template pointing to the local folder where your Lambda was built. It will contain your source code and dependencies that are compatible with Amazon Linux.
“sam package” will then zip up the files built by “sam build” and upload them to S3. It will then create another template that references the zip file in S3.
“sam deploy” will deploy the standard zip-file-based Lambda.
This lets you build zip file based Lambdas locally including Amazon Linux native dependencies on either Windows, Macs or other versions of Linux.
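A hedged sketch of that flow (bucket and stack names are placeholders):

```sh
sam build --use-container   # resolves requirements.txt inside a Lambda-like Amazon Linux image
sam package --s3-bucket my-bucket --output-template-file packaged.yaml
sam deploy --template-file packaged.yaml --stack-name my-stack --capabilities CAPABILITY_IAM
```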
When I switched to Arch Linux, I learned that pip has a --user option to install Python packages in the home dir of the current user. This is essential to not interfere with the system install from the system package manager. I had real trouble with that in the past.
Furthermore, as I'm now used to bleeding-edge packages, at least once a week I update all the outdated Python packages among my >450 installed ones.
When some packages get downgraded because of requirements, I ask:
Do I need the package that caused the downgrade more often or with more of the packages in the main environment, or is this true for one or some of the downgraded packages?
According to the answer, I put the 'problematic' package(s) in a new or existing venv, and update the downgraded ones in the main environment, if necessary.
This work cannot be done by a package manager!
Costs me <10 minutes every week to keep the main environment up to date, a bit more if I want that for some or all venvs.
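In concrete terms, the weekly routine looks something like this (package name is a placeholder):

```sh
pip install --user somepackage      # stays under ~/.local, away from pacman-managed files
pip list --user --outdated          # the weekly "what drifted?" check
pip install --user -U somepackage   # upgrade only the ones worth upgrading
```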
If I didn't have A installed and then I install B which transitively installs A, then I expect that uninstalling B will also uninstall A. If only one system is managing the packages, then it is able to do this. It will have a record of the things I've explicitly installed so it knows what dependencies are safe to uninstall.
I prefer a package manager that tells me that there are things that may be safe to uninstall to one that decides to uninstall things on its own.
Maybe I installed B who installed A. Maybe sometime later I needed A and I didn’t do anything because it was already there. Seeing A disappear when I uninstall B may be unexpected.
apt handles this by marking packages as manually installed. You and the author could both be happy with that solution but afaik pip doesn't currently store such information.
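For reference, the apt side of that looks like:

```sh
apt-mark showmanual          # packages you explicitly installed
apt-mark showauto            # packages pulled in only as dependencies
sudo apt-get autoremove      # removes auto-installed packages nothing needs anymore
sudo apt-mark manual libfoo  # "I use this directly now, keep it" (libfoo is a placeholder)
```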
This post points out one of my struggles with python.
I am not a Python developer, but I use Python heavily for some tooling. So all I need to do is "distribute" my tools to other servers in a replicable and consistent manner, isolated from global packages.
Can you please help me understand two points?
1. If I use venv+pip to install some python app, do I have to “activate” that specific virtual environment before executing that tool or can I just simply call it by its path on the file system?
2. Are there any official guide rails for making venv-wrapped app accessible to other users on a server? Or just as simple as placing links to /usr/local/bin/ for example?
1: usually you can just run the binary by its path. tbh I don't fully understand why it doesn't always work, but it's fairly rare, and most of the ones I can kinda-remember may have been during install time.
2: due to 1, symlinks often work. It's how I've installed all of my custom python binaries. Otherwise you'll very frequently see python "binaries" installed by e.g. Homebrew that are actually ~5 lines of minor environment prep and then running the actual binary - that's the only reliable way afaik.
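Concretely (paths here are made up for the example):

```sh
# 1. Run the venv's entry point directly; its shebang already points at the venv's interpreter:
/opt/mytool/.venv/bin/mytool --help
# 2. Or make it visible to every user on the box:
sudo ln -s /opt/mytool/.venv/bin/mytool /usr/local/bin/mytool
```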
> Lets say you use the same package again, but theres been a new release with some additional features. When you upgrade your global Python to use it, you now need to ensure every project you’ve done now works with it. That’s annoying and unlikely to happen, what you’ll be left with is a broken build.
Wait, what?
Don't python packages generally use `semver` versioning, and ensure that upgrades in the same major version are backwards-compatible?
And that different major versions are co-installable?
I saw this Twitter thread the other day (https://twitter.com/fchollet/status/1617704787235176449?s=46...) about similar problems, and some comments suggest using Docker. I couldn’t find any guides or ways to do this for a Python project; anyone here know more or has done this before?
poetry breaks once a while for me, so I am not using it these days.
pipenv used to be my first choice, but it became inactive; it seems it's under active development again?
A few weeks ago there was a recommendation for PDM, but I have not really used it.
For now I am using the pip+venv approach.
By the way, you'd better do `python -m pip install` instead of `pip install`. I don't remember why anymore, but I did read something that explained the difference and I agreed then that `python -m pip install` is preferable.
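The usual reason given: `python -m pip` guarantees pip runs under the interpreter you actually invoked, while a bare `pip` on PATH can belong to a different Python entirely. For example (the version and package are placeholders):

```sh
python3.11 -m pip install requests   # installs into python3.11's site-packages, no ambiguity
python3.11 -m pip --version          # shows exactly which interpreter this pip serves
```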
I use Python for research. If I need some package, I simply want the latest version; pip install is usually fine.
If something depends explicitly on the fixed (old) version, that's when problems happen and I grudgingly remember how to use pyenv. But I like to use the most recent versions and most recent Python, and I like packages that share this bleeding edge approach.
Article conflates global installation into the system python with global installation in general. Not everything is a project dependency. If you want, say, ipython, available everywhere, global installation is appropriate. You can get this without clobbering my system python by simply not using the system python for my projects.
I've been a happy user of pipenv for several years (at work, in production) and still recommend it. You lock the versions you want independently of the requirements.txt so you can update just the packages you want without worrying about sub-dependencies. 10/10 recommend.
You probably want conda if you're in this situation, as it basically solves for these issues (but doesn't have great docs for actually adding packages to it, unfortunately).
npm certainly has a number of problems (at the end the article compares pip to npm) -- but after reading this article I didn't realize pip was so problematic. I also didn't realize it installed things globally.
> I also didn't realize it installed things globally
It doesn't. It's a subtle distinction but the 'blame' doesn't lie with pip. When you do a pip install it does it in the context of the python interpreter you're using.
If you use your global python you get an installation in a global context from pip. If you use a non-global python you get a non-global installation from pip. And this is what venv etc give you; a local interpreter, which means the associated pip installs in a local context (a separate one for each venv).
I don't understand why pip doesn't do it like npm. Admittedly, I don't write Python code much, but "npm install xyz@1.2.3" simply installs to a node_modules/ folder in the current directory. Very easy to parse and nuke if I need to. I don't really understand how venv and its weird shell prompt are a better solution.
Is there a canonical example of how python projects should manage dependencies and sandboxing such that other developers can just clone, install, and get to work?
Put everything in a docker container/OCI image and have someone own managing and babysitting the build of that image for everyone else.
There really is no single tool or workflow for everything in the python world. What works for a simple source only python package can break horribly if you try using sophisticated scientific computing packages with numerous native dependencies (and then you realize you need conda or a whole other set of tools).
I'm not sure I get the point of this article. So basically the author has learnt that there are different ways of managing packages in Python? I'm aware that this might be a problem in Python, but let's be serious guys, you only need to spend 5 mins learning about venv/conda and you'll never face any problem in a basic Python project. You don't have to write an article about that.
You can still force it via `pip install --break-system-packages ...` if needed.