Constraints Are Good: Python's Metadata Dilemma (pocoo.org)
77 points by ingve 9 months ago | 24 comments


Eggs are dying out, as pointed out by this 2-year-old blog post:

https://about.scarf.sh/post/python-wheels-vs-eggs

The metadata problem is related to the problem that pip had an unsound resolution algorithm: "try to resolve something optimistically, hope it works, and backtrack when you get stuck".

I did a lot of research along the lines that led to uv 5 years ago and came to the conclusion that, when installing from wheels, you can set up an SMT problem the same way Maven does and solve it right the first time. There was a PEP to publish metadata files for wheels on PyPI, but before that I'd built something that could pull the metadata out of a wheel with just 3 HTTP range requests. I believed that any given project might depend on a legacy egg, and in those cases you can build that egg into a wheel via a special process and store it in a private repo (a must for the perfect Python build system).
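Roughly, the trick looks like the sketch below. This is not the code from uv or any real tool; the class name and wheel URL are placeholders, and zipfile will issue a handful of small reads rather than exactly 3 unless you batch them. The idea: expose the remote wheel as a seekable file object whose reads become HTTP range requests, then let zipfile pull out just the .dist-info/METADATA member.

  import io
  import urllib.request
  import zipfile

  class HTTPRangeFile(io.RawIOBase):
      """Read-only, seekable view of a remote file via HTTP Range requests (sketch)."""

      def __init__(self, url):
          self.url = url
          self.pos = 0
          # One request up front to learn the total file size.
          head = urllib.request.Request(url, method="HEAD")
          with urllib.request.urlopen(head) as resp:
              self.size = int(resp.headers["Content-Length"])

      def seekable(self):
          return True

      def readable(self):
          return True

      def tell(self):
          return self.pos

      def seek(self, offset, whence=io.SEEK_SET):
          if whence == io.SEEK_SET:
              self.pos = offset
          elif whence == io.SEEK_CUR:
              self.pos += offset
          else:  # io.SEEK_END
              self.pos = self.size + offset
          return self.pos

      def read(self, n=-1):
          if n < 0:
              n = self.size - self.pos
          if n == 0 or self.pos >= self.size:
              return b""
          end = min(self.pos + n, self.size) - 1
          req = urllib.request.Request(
              self.url, headers={"Range": f"bytes={self.pos}-{end}"}
          )
          with urllib.request.urlopen(req) as resp:  # expects a 206 response
              data = resp.read()
          self.pos += len(data)
          return data

  # Placeholder URL; any index that honors Range requests works.
  url = "https://files.pythonhosted.org/.../example-1.0-py3-none-any.whl"
  with zipfile.ZipFile(HTTPRangeFile(url)) as whl:
      meta = next(n for n in whl.namelist() if n.endswith(".dist-info/METADATA"))
      print(whl.read(meta).decode("utf-8"))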


The metadata problem is unrelated to eggs. Eggs haven’t played much of a role in a long time but the metadata system still exists.

Range requests are used by both uv and pip if the index supports it, but they have to make educated guesses about how reliable that metadata is.

The main problems are local packages during development and source distributions.


Back in the days of eggs you couldn't count on having the metadata until you ran setup.py, which forced pip to be unreliable because so much stuff got installed and uninstalled in the process of a build.

There is a need for a complete answer for dev and private builds, I'll grant that. Private repos like we are used to in maven would help.


Eggs did not contain setup.py files. From my recollection the metadata was, as for wheels, embedded in the egg (in the EGG-INFO folder). Eggs were zip-importable after all.
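To illustrate (a hedged sketch, not anything from setuptools itself; the egg file name is made up): an egg is just a zip, so whatever dependency metadata it declares can be read from EGG-INFO without running anything.

  import zipfile

  # Hypothetical egg file; eggs are plain zip archives with an EGG-INFO/ folder.
  with zipfile.ZipFile("example_pkg-1.0-py2.7.egg") as egg:
      if "EGG-INFO/requires.txt" in egg.namelist():
          print(egg.read("EGG-INFO/requires.txt").decode())
      else:
          print("no static dependency metadata in this egg")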


It looks like you can embed dependency data in an egg, but it is also true that (1) a lot of eggs have an internal setup.py that does things like compile C code, (2) that's a hassle for developers on platforms that might not have the right C toolchain installed, and (3) eggs reserve the right to decide what dependencies they include at install time, based on the environment and such.


I decided to look into how this works these days.

These days you need a TOML parser to read pyproject.toml; the standard library had none until Python 3.11 added the read-only tomllib module: https://packaging.python.org/en/latest/guides/writing-pyproj...
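For example, a minimal sketch of reading the metadata, using the stdlib tomllib where available and the third-party tomli backport otherwise (the file is assumed to sit in the current directory):

  try:
      import tomllib  # standard library on Python 3.11+
  except ModuleNotFoundError:
      import tomli as tomllib  # backport: pip install tomli

  with open("pyproject.toml", "rb") as f:  # tomllib requires binary mode
      pyproject = tomllib.load(f)

  print(pyproject["project"]["name"])
  print(pyproject["project"].get("dependencies", []))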

pip's docs strongly prefer pyproject.toml: https://pip.pypa.io/en/stable/reference/build-system/pyproje...

Over setup.py's setup(setup_requires=[], install_requires=[]): https://pip.pypa.io/en/stable/reference/build-system/setup-p...

Blaze and Bazel have Skylark/Starlark to support procedural build configuration with maintainable conditionals

Bazel docs > Starlark > Differences with Python: https://bazel.build/rules/language

cibuildwheel: https://github.com/pypa/cibuildwheel ;

> Builds manylinux, musllinux, macOS 10.9+ (10.13+ for Python 3.12+), and Windows wheels for CPython and PyPy;

manylinux used to specify a minimum libc version for each named build tag like manylinux2010 or manylinux2014; pypa/manylinux: https://github.com/pypa/manylinux#manylinux

A manylinux_x_y wheel requires glibc>=x.y. A musllinux_x_y wheel requires musl libc>=x.y; per PEP 600: https://github.com/mayeut/pep600_compliance#distro-compatibi...
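In practice the PEP 600 rule is just a version comparison against the runtime glibc. A stdlib-only sketch of what that means (function names are mine; real installers use the packaging library's tag logic rather than this simplification):

  import platform

  def glibc_version():
      libc, version = platform.libc_ver()
      if libc != "glibc":
          return None  # musl or non-Linux; covered by other tags
      major, minor = (int(part) for part in version.split(".")[:2])
      return (major, minor)

  def manylinux_compatible(tag):
      # e.g. "manylinux_2_17_x86_64" requires glibc >= 2.17 (PEP 600)
      parts = tag.split("_")
      required = (int(parts[1]), int(parts[2]))
      current = glibc_version()
      return current is not None and current >= required

  print(manylinux_compatible("manylinux_2_17_x86_64"))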

> Works on GitHub Actions, Azure Pipelines, Travis CI, AppVeyor, CircleCI, GitLab CI, and Cirrus CI;

Further software supply chain security controls: SLSA.dev provenance, Sigstore, and the new PyPI attestations storage too

> Bundles shared library dependencies on Linux and macOS through `auditwheel` and `delocate`

delvewheel (Windows) is similar to auditwheel (Linux) and delocate (Mac) in that it copies DLL files into the wheel: https://github.com/adang1345/delvewheel

> Runs your library's tests against the wheel-installed version of your library

Conda runs tests of installed packages;

Conda docs > Defining metadata (meta.yaml) https://docs.conda.io/projects/conda-build/en/latest/resourc... :

> If this section exists or if there is a `run_test.[py,pl,sh,bat,r]` file in the recipe, the package is installed into a test environment after the build is finished and the tests are run there.

Things that support conda meta.yaml declarative package metadata: conda and anaconda, mamba and mambaforge, picomamba and emscripten-forge, pixi / uv, repo2docker REES, and probably repo2jupyterlite (because jupyterlite's jupyterlite-xeus docs mention mamba but not yet picomamba): https://jupyterlite.readthedocs.io/en/latest/howto/configure...

The `setup.py test` command has been removed: https://github.com/pypa/setuptools/issues/1684

`pip install -e .[tests]` expects extras_require['tests'] to include the same packages as the tests_require argument to setup.py: https://github.com/pypa/setuptools/issues/267
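A hedged example of keeping the two in sync (package name and dependencies are made up; tests_require itself is deprecated along with `setup.py test`):

  from setuptools import setup, find_packages

  TEST_REQUIREMENTS = ["pytest", "pytest-cov"]

  setup(
      name="example-package",                       # hypothetical
      version="0.1.0",
      packages=find_packages(),
      install_requires=["requests"],                # hypothetical runtime dep
      tests_require=TEST_REQUIREMENTS,              # legacy `setup.py test`
      extras_require={"tests": TEST_REQUIREMENTS},  # `pip install .[tests]`
  )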

TODO: is there a new single command to run tests, like `setup.py test` was?

`make test` works with my editor. A devcontainer.json can reference a Dockerfile that runs something like this:

  python -m ensurepip && python -m pip install -U pip setuptools
But then I still want to run the software's tests with one command.

Are you telling me there's a way to do an HTTPS range request against a wheel for its dependency version constraints and/or package hashes (but not GPG pubkey fingerprints to match an .asc manifest signature) and its build & test commands, yet you still need a separate file alongside pyproject.toml, like Pipfile.lock or poetry.lock, to store the hashes for each ~bdist wheel on each platform? There's now a -c / PIP_CONSTRAINT option to specify an additional requirements.txt, but that doesn't solve for Windows- or Mac-only requirements in a declarative requirements.txt: https://pip.pypa.io/en/stable/user_guide/#constraints-files
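(For what it's worth, the per-file hashes those lock files and pip's --require-hashes mode record are plain sha256 digests of each downloaded wheel; a minimal sketch with a made-up file name:)

  import hashlib

  def wheel_hash(path):
      digest = hashlib.sha256()
      with open(path, "rb") as f:
          for chunk in iter(lambda: f.read(1 << 16), b""):
              digest.update(chunk)
      return "sha256:" + digest.hexdigest()

  print(wheel_hash("example_pkg-1.0-py3-none-any.whl"))  # hypothetical wheel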

conda recipes support putting a `# [win]` selector at the end of a meta.yaml list item if it's for Windows only.

Re: optimizing builds for conda-forge (and PyPI, though PyPI doesn't build packages when there's a new PR, nor sign each build for each platform): https://news.ycombinator.com/item?id=41306658



    Maybe the solution will be for tools like uv or poetry to warn if dynamic metadata is used and strongly discourage it. Then over time the users of packages that use dynamic metadata will start to urge the package authors to stop using it.
I wouldn’t bet on this one. I know a lot of Python package maintainers who would likely rather kill their project than adapt to a standard they don’t like. For example, see flake8’s stance on even supporting pyproject.toml files, which have been the standard for years: https://github.com/PyCQA/flake8/issues/234#issuecomment-8128...

I know because I’m the one who added pyproject.toml support to mypy, 3.5 years ago. Python package developers can rival Linux kernel maintainers for resistance to change.
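The check the parent quote asks for is at least mechanically cheap; a hedged sketch (not how uv or poetry actually detect it) that just flags PEP 621 `dynamic` fields in a pyproject.toml:

  import tomllib  # Python 3.11+

  with open("pyproject.toml", "rb") as f:
      project = tomllib.load(f).get("project", {})

  dynamic = project.get("dynamic", [])
  if dynamic:
      print(f"warning: dynamic metadata {dynamic} means a build step must "
            "run before the version or dependencies are known")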


> The challenge with dynamic metadata in Python is vast, but unless you are writing a resolver or packaging tool, you're not going to experience the pain as much.

But that is by choice. As a user, I am forced to debug this pile of garbage whenever things go wrong, so in a way it's even worse for users. It's a running joke in the machine learning community that the hard part about machine learning is dealing with Python packages.


A lot of the problem seems to be driven by a desire to have editable installs. I personally have never understood why having editable installs is such an important need. When I'm working on a Python package and need to test something, I just run

python -m pip install --user <package_name>

and I now have a local installation that I can use for testing.


That would require you to reinstall the local app you're developing against after every code change. Very few people will want to do that, and it's potentially very slow.

It’s also a step not needed by most other ecosystems.


> It’s also a step not needed by most other ecosystems.

From what I can gather, most other ecosystems don't even have the problem under discussion.


Go (a.k.a. Golang), with its network-first import system (e.g. import "example.org/foo/bar"), has solved the problem in a surprisingly simple way. You just add a "replace" directive in a go.mod file and you can point your import (and all child imports) to any directory on the filesystem.


> it’s potentially very slow.

Potentially, perhaps. But it's certainly not for the cases where I use it: a pure python package, whose dependencies are already installed and are not changing (only the package itself is). Under those conditions, the command line I gave takes a couple of seconds to run.


That iteration loop is pure madness.


Why? It works for me.


I.e. orders of magnitude longer


Orders of magnitude longer than what?


Than editable installs. The main sentry app takes ~10 seconds to pip install. I would not want to run that after every code change. Also, it's more painful to debug because the filenames in the stack trace no longer match what you have open in your editor.


> The main sentry app takes ~10 seconds to pip install.

Which is much longer than the "couple of seconds" I gave for my use case. Yes, if it takes that long, I can see how you would want some alternative.

> Also, it's more painful to debug because the filenames in the stack trace no longer match what you have open in your editor.

Why not? If you do a fresh install, everything should match up. It seems like this problem would be more likely with an editable install, if things aren't kept in sync properly.


> Why not? If you do a fresh install, everything should match up.

Absolutely not. The file names in stack traces will be from the site-packages folder in the venv instead of the local checkout.
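A quick way to see it (the package name is hypothetical): print the imported module's __file__ after each kind of install.

  import example_package  # hypothetical
  print(example_package.__file__)
  # regular install:  .../.venv/lib/python3.12/site-packages/example_package/__init__.py
  # editable install: /home/you/src/example-package/example_package/__init__.py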


Yeah. It's too slow. Editable installs make application development much faster.


I am curious how Python got into this situation. Was it largely taking the path of least resistance to more and more adoption?

I get that Python is, strictly speaking, an older language. But, it isn't like these are at all new considerations.


Classic case of lack of constraints early on. Once people use all that power you end up with a mess.




