Freezing Requirements with Pip-Tools (simonwillison.net)
55 points by BerislavLopac on July 15, 2022 | 57 comments



I've been writing Python forEVER and the one thing I fail to comprehend about it is how the dependency situation on Python only gets worse. It has somehow surpassed pre-nvm Node for being a total dependency clusterfuck.

What are we using now? Pip? Pyenv? Pipenv? Poetry? Hatch? Conda?

Once that's settled, there's invariably some library that has some native buildchain tooling requirement that isn't satisfied and the build process has shat all over my terminal. So now I have to hunt down whatever libraries gcc is complaining about.

Christ, it should be against the law to release a programming language that doesn't have dependency management built-in.


> What are we using now?
>
> Pip?

Of course, for standard package installation.

> Pyenv?

Yes, if you need multiple Python versions available on the same system.

> Pipenv?

God no. https://chriswarrick.com/blog/2018/07/17/pipenv-promises-a-l...

> Poetry? Hatch?

Yes, for local environment management and library dependency configuration.

> Conda?

Not really, unless you're doing data science and are heavily invested in the rest of the Anaconda ecosystem.

I don't really understand where the confusion is coming from - Python is used in a wide range of use cases and scenarios, and it is no surprise that there are many different tools in the box, each solving a different subset of problems.

The packaging issue has been all but resolved in recent years, with the advent of wheels, pyproject.toml and a number of other standards.

> So now I have to hunt down whatever libraries gcc is complaining about.

Which has nothing to do with Python.


100%. I switch between Go and Python a lot, and even with go mod's detractors it's still so much simpler to have everyone using one package manager built into the standard toolchain.


Poetry is all you need.


Did you ever try poetry[1]?

It is a great tool that I use in all my projects. It effectively solves dependency management for me in a very easy to use way.

Dependency management using requirements.txt used to give me such a headache, now I just have a pyproject.toml that I know works.

    [tool.poetry.dependencies]
    python = ">=3.8,<3.9"
    pandas = "^1.4.3"
This basically means: use a Python version from 3.8 up to (but not including) 3.9, and any pandas version from 1.4.3 up to (but not including) 2.0.0 (the caret allows any compatible release within the same major version).

What I like about poetry is that it makes sure that the whole dependency graph of the packages that you add is correct. If it can not solve the graph, then it fails fast and it fails hard, which is a good thing.

This is probably a very bad explainer of what poetry does, but be sure to check it out! :)
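
For a concrete feel, the day-to-day commands look roughly like this (a sketch; the pandas constraint just mirrors the example above):

    # add a dependency; poetry resolves the whole graph and updates poetry.lock
    poetry add pandas@^1.4.3
    # recreate the exact environment elsewhere from the lockfile
    poetry install
    # re-resolve the graph after editing pyproject.toml by hand
    poetry lock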

[1] https://python-poetry.org/


Poetry has a major issue with its lockfiles when working with active projects. It generates a top level dependency listing checksum, which causes any two PRs/branches that independently update the top level requirements to conflict with each other.

The related issue, https://github.com/python-poetry/poetry/issues/496, has been open for 4 years with no movement.

The other issue with Poetry is that it uses its own pyproject.toml dependency listing format instead of the one standardized in the relevant PEP (https://peps.python.org/pep-0621/). This is understandable for historical reasons (Poetry was first written before this was standardized), but Poetry should have been updated to support the standard format.

A relatively minor issue, but the poetry shell command is also a footgun. It's presented as a way to configure your shell to activate the virtualenv for the project. In reality it's a very slow, barely functional terminal emulator running on top of your terminal, which will cause problems for any programs that assume a working terminal or talk to the tty directly.


https://twitter.com/SDisPater/status/1521932867214921728

poetry shell is also not a terminal emulator, it's just a subshell with some environment variables set up for your project. Once you are in, it's just a regular shell. If anything is slow, it's when you add or remove a dependency, but even that is probably faster than you editing requirements.txt, clearing out your virtualenv and then reinstalling everything again.


https://github.com/python-poetry/poetry/blob/master/src/poet...

The process spawned by `poetry shell` is a terminal emulator driven by the pexpect and cleo packages. It hijacks and proxies the user's keystrokes before sending them to the underlying terminal.


Cleo creates "subcommands" in a git-like manner, whereas pexpect spawns a subshell. That cleo Terminal class you see is only a viewport.

A terminal emulator would be something substantially more complex, such as libvterm. If it ain't handling terminfo, it ain't a terminal emulator.

https://launchpad.net/libvterm


That is the point I was making. It's not a proper terminal emulator, instead it's a half-assed one. If it gets between the user's keystrokes and the host shell, it should be a proper emulator. Otherwise it should set up the environment and get out of the way.


Poetry is good, I'll hit a bug in poetry from time to time but it's improving quickly.

Direnv is better: its python layout will automatically create a hidden virtualenv directory in your work tree and set up your PATH when you cd into it. The only downside is it doesn't seem to work on Windows.


I don't see the link between poetry and direnv? Poetry is about solving python's dependency issues, direnv doesn't seem to have anything to do with that


Set up direnv, and then:

  cd /your/project && echo "layout python" > .envrc && direnv allow
Give it a go, then tell me what the point is of any of these poetry/pipenv/hatch/flit/pdm/pyflow things if neither you nor your teammates work on Windows.


That will only take care of setting up and loading the correct virtualenv. Which you may prefer to "poetry shell" or "poetry run", since it's more automatic, but that's not the main reason for using poetry.

The main reason to use poetry is sane dependency management the way it exists for most other ecosystems (bundler, cargo, npm, maven, gradle, ...). In particular, that includes lockfiles.


Use direnv to take care of the virtualenv part of the functionality poetry offers, and pip-tools to deal with the dependency management/reproducible build part. Or, if you like, straight-up pip freeze.

Use pyenv to install all the different python versions you need.

Ideally, all of this functionality would be bundled in one tool, but the only thing available is pyflow, which won't stop blowing up with an exception for me.
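
Roughly, the combination looks like this (a sketch; it assumes direnv, pyenv and pip-tools are installed and your top-level deps live in a requirements.in file):

    # pick and pin the interpreter with pyenv
    pyenv install 3.10.5
    pyenv local 3.10.5
    # let direnv create and auto-activate a virtualenv for this directory
    echo "layout python" > .envrc && direnv allow
    # pin the full dependency graph with pip-tools, then sync the env to it
    pip-compile requirements.in    # writes requirements.txt
    pip-sync requirements.txt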


Then you're not recommending direnv as a poetry replacement, you're recommending direnv + pip-tools + pyenv. That's a different matter.


The hardest packaging problem in Python is not resolving or pinning down dependencies. Pip largely solved that some years ago. Literally every Python packaging tool uses some parts of pip underneath. The messiest part is literally packaging: how to install and isolate the packages.

How you install multiple Python versions doesn't really matter as long as the binaries are on your PATH. You can use Homebrew or Macports or Pyenv or whatever. The only remaining problem is how to manage your virtualenvs. You can use virtualenv or venv directly, but you will have to manage where to put them and remember to activate them before you install dependencies and dev tooling. But if you use direnv, it's fire and forget: once you have direnv set up, one line of directive in a .envrc file, and perhaps a gitignore entry or two, you don't have to think about where to put the virtualenv or remember to activate it again.

So yes, I'm actually just recommending direnv if you want to keep it simple.


> The hardest packaging problem in Python is not resolving dependencies or pinning down dependencies. Pip has it largely solved a some years ago.

I really disagree, and I think so does everyone who uses Poetry, pipenv, pip-tools, etc.


Yeah, I'm a huge poetry fan for dep management, but I never use "shell" or any of its virtualenv management features.


When your project depends on a module version and that module depends on another one (a sub-dependency), it's very common that re-installing the same module version in a new environment will cause something to break because the sub-dependency was updated. This is not something that direnv solves, so those other tools are still needed.


pip freeze > requirements.txt


Then you lose track of which are actual dependencies and which are sub dependencies
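
With the pip-tools approach from the article that split stays explicit (package names below are just examples):

    # requirements.in holds only the dependencies you actually asked for:
    #   django
    #   requests
    pip-compile requirements.in
    # requirements.txt now pins those two plus every sub-dependency,
    # each annotated with a "# via <parent>" comment so you can tell them apart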


It doesn't matter. Pinning means pinning every dependency and sub-dependency.


Isn't that the same as adding version constraints to setuptools' setup.cfg? [1] pip will use these in its dependency resolver.

You can also constrain the Python version in there if you want.

[1] https://setuptools.pypa.io/en/latest/userguide/quickstart.ht...


Maybe it is the same, but the idea is to take that selected version of the dependency and store a hash checksum for it, so that one can later get the exact same dependencies. Poetry just doesn't source setup.cfg, only its own tool-specific config file. This way it stays out of the way of any other tool.


pip-tools can hash as well: https://pip-tools.readthedocs.io/en/latest/#using-hashes

Using a tool-specific config file seems like a design choice with upsides and downsides, which I respect.
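
For reference, the hash workflow is just a flag (a sketch assuming a requirements.in file exists):

    # pin everything and record artifact hashes in requirements.txt
    pip-compile --generate-hashes requirements.in
    # pip then refuses to install anything whose hash doesn't match
    pip install --require-hashes -r requirements.txt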


A Pipfile can store hashes for multiple versions of a package built for multiple architectures; whereas requirements.txt can only store the hash of one version of the package on one platform.

Can a requirements.txt or a Pipfile store cryptographically-signed hashes for each dependency? Which tool would check that the hashes validate against package-builder signing keys rather than just PyPI-upload keys?

FWIU, nobody ever added GPG .asc signature support to Pip? What keys would it trust for which package? Should twine download after upload and check the publisher and PyPI-upload signatures?


Are cryptographically-signed hashes really necessary in this case (and why)? What would be the secret used for signing?


If the hashes are retrieved over the same channel as the package (i.e. HTTPS), and that channel is unfortunately compromised, why wouldn't a MITM tool change those software package artifact hash checksums too?

Only if the key used to sign the package (or the package manifest with per-file hashes) is or was retrieved over a different channel (e.g. WKD, or HKP over HTTPS with or without certificate pinning), and the key is trusted to sign for that package, should you install the software package artifact and assign file permissions and extended filesystem attributes.


From the sigstore docs: https://docs.sigstore.dev/ :

> sigstore empowers software developers to securely sign software artifacts such as release files, container images, binaries, bill of material manifests [SBOM] and more. Signing materials are then stored in a tamper-resistant public log.

/? sigstore sbom: https://www.google.com/search?q=sigstore+sbom

> It’s free to use for all developers and software providers, with sigstore’s code and operational tooling being 100% open source, and everything maintained and developed by the sigstore community.

> How sigstore works: Using Fulcio, sigstore requests a certificate from our root Certificate Authority (CA). This checks you are who you say you are using OpenID Connect, which looks at your email address to prove you’re the author. Fulcio grants a time-stamped certificate, a way to say you’re signed in and that it’s you.

https://github.com/sigstore/fulcio

> You don’t have to do anything with keys yourself, and sigstore never obtains your private key. The public key that Cosign creates gets bound to your certificate, and the signing details get stored in sigstore’s trust root, the deeper layer of keys and trustees and what we use to check authenticity.

https://github.com/sigstore/cosign

> our certificate then comes back to sigstore, where sigstore exchanges keys, asserts your identity and signs everything off. The signature contains the hash itself, public key, signature content and the time stamp. This all gets uploaded to a Rekor transparency log, so anyone can check that what you’ve put out there went through all the checks needed to be authentic.

https://github.com/sigstore/rekor


Hatch is also interesting and very similar to Poetry.

https://hatch.pypa.io/latest/

In comparison to poetry I think it includes more advanced multi-environment and multi-python-version support and a tox-like testing matrix. It probably gets a little too complex there.

It also works with pyproject.toml

If anyone else has experience with Hatch vs Poetry please share!


You can't lock your dependencies with Hatch.

https://hatch.pypa.io/latest/meta/faq/#libraries-vs-applicat...


> What I like about poetry is that it makes sure that the whole dependency graph of the packages that you add is correct.

As does pip these days.


I tried, thanks! :) And I also suggest evaluating PDM https://pdm.fming.dev/


I've been using PDM recently and, although there have been a few issues, I really like just cd'ing into a directory, running "python", and having the correct set of packages available.



My least favorite part about poetry is how slow it is.


I seem to remember that some part of poetry's slowness is due to how the python index works and is therefore a problem shared by all such tools.

That said, I've used both pipenv and poetry, and I had projects where pipenv would simply time out when trying to resolve packages. I haven't seen the same behaviour with poetry (indeed, that was the reason I migrated one project from pipenv to poetry after I just had to give up with the former).


Pip-Tools is great for pinning and freezing with hashes. Does what it should. So much faster than pipenv or poetry. Never looked back.


Until you need a platform- or Python-version-agnostic lock file. pip-tools compiles the list for your current environment, which makes it limited.


This is interesting; can you expand or point to some documentation? I don’t have this design requirement right now, so I’m trying to understand any growing pains I might be locking myself into.


Not OP, but in our case, there was a package that had a dependency for python3.6 but not for python3.8.

Our production environment was python3.6. Devs rebuilt the requirements.txt with python3.8.

When we attempted to use the requirements.txt with python3.6, we couldn't because a package was missing (and we installed with `--require-hashes`). The dependency was `importlib-metadata` iirc.

But googling around, here's an example of a package that has dependencies that changed based on the python version: https://github.com/pypa/pep517/blob/main/pyproject.toml#L13 .

In our case, we just made sure to rebuild the requirements.txt with the version that matched our production; not sure if there's a "nice" way to support multiple versions with pip-tools.
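
One workaround (not pretty, but it works) is a separate lockfile per interpreter, compiled by running pip-tools under each Python you deploy to. A sketch, assuming pip-tools is installed under both interpreters:

    python3.6 -m piptools compile -o requirements-py36.txt requirements.in
    python3.8 -m piptools compile -o requirements-py38.txt requirements.in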


As you discovered, the actual fix is making sure your production and development python environments match. As for platform discrepancies, docker helps with this


> platform or Python version agnostic lock file

I might be splitting hairs here, but this seems like an oxymoron: if it's agnostic on anything, it's not really a lock file.


You still need something else to manage your python versions and virtualenvs tho, and as soon as you've picked a solution for those two problems, chances are you'll discover these tools also have lock files that'll solve the problem of freezing packages and enabling reproducible builds for you.


What about just throwing everything into docker?


Docker is good at what it's good at, but sometimes I don't want to deal with a whole other system just to share code with five coworkers.


That would be the "something else" :)

If you don't want to fire up docker, you'll have to look elsewhere than just pip-tools.


We are migrating our company codebase to bazel, which uses this, and it is quite helpful. Hermetic builds are nice, and all devs share the same versions of packages, so reproducing bugs is easy.


FWIW I have a code snippet in a pip-tools issue that will list only the outdated root dependencies in requirements.in:

https://github.com/jazzband/pip-tools/issues/1167

I personally find it extremely useful when upgrading dependencies.


Shameless plug: I built pip-chill to help me generate less stringent (and shorter) requirements.txt files. Its main use is to remove "noise" - packages required by other packages that you don't care about - but it can also remove version numbers (which is useful for testing against the latest and greatest).
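
Basic usage looks roughly like this (a sketch; double-check the exact flags against the README):

    # list only the packages you installed directly, with pinned versions
    pip-chill > requirements.txt
    # the same list without version numbers
    pip-chill --no-version > requirements.txt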


pip-tools is alright but it doesn't support cross-platform (or python version) lockfiles.

poetry is alright but it doesn't support the latest PEP standards, and it's slow.

PDM is where it's at; it's fast, has a really responsive maintainer, supports all the latest PEP standards, and has really good cross-platform support.

pdm.fming.dev/
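
For anyone who wants to kick the tires, the basic flow is roughly this (a sketch; see the docs above for specifics, and app.py is just a placeholder):

    pdm init                  # create pyproject.toml interactively
    pdm add requests          # add a dependency and update pdm.lock
    pdm install               # install everything from the lockfile
    pdm run python app.py     # run inside the managed environment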


PEP 582 is only a DRAFT, though. Not sure I want to sign on to a solution that might undergo substantial revision or be rejected.


This exchange makes me a bit skeptical. Is there later news?

https://twitter.com/mkennedy/status/1375242144135270403?lang...


I just discovered this after using (and being annoyed by) pipenv. I’ve heard good things about poetry, but I guess my big mystery is why either of them even exists when pip-tools seems to pre-date them?


pip-tools has a minimalistic design philosophy, while poetry and pipenv have a maximalistic one. In other words, pip-tools tries to do one thing well, while poetry and pipenv try to do about a dozen different things well and have various opinions on how to do them. Minimalistic tools often lose out to maximalistic ones in the short term, only to win out in the long term as the maximalistic ones lose the momentum and resources needed to maintain their feature set.

pip-tools was never advertised as well as poetry or pipenv, despite in my opinion being a better tool for the job, and having a best-in-class dependency resolver.


> Minimalistic tools often lose out to maximalistic ones in the short term, only to win out in the long term as the maximalistic ones lose the momentum and resources needed to maintain their feature set.

I don't think this is true. Pipenv and poetry's feature set largely coincides with the feature set of similar tools in other languages (e.g. bundler, cargo). Some such tools have even larger feature sets (e.g. maven, gradle). Nonetheless, these tools haven't lost out to "smaller tools", but have become standard tools in their respective communities.

The problem in the Python ecosystem is fragmentation. Nobody can agree on what the right tool for the job is, so everyone uses something different, dependencies/projects don't always work well with all tools, etc.



