
Distributions vs. Releases: Why Python Packaging Is Hard - BerislavLopac
https://pydist.com/blog/distributions-vs-releases
======
perlgeek
Most of my pain with python packaging comes from incompatible changes in the
toolchain.

Some years ago, pip started distrusting HTTP mirrors, and while you could add
some options to force it to use HTTP, those options weren't present in earlier
pip versions. That meant you now had to provide options depending on the pip
version -- which is harder if you don't call pip directly (for example
through dh-virtualenv).

We switched to HTTPS, but with a TLS cert signed by a company-internal CA.
Getting pip to trust another CA is non-obvious, and again depends on the pip
version. So another version-dependent workaround was necessary.

Then pip suddenly moved all of its implementation from pip.something to
pip._internal.something, stating that it had been documented all along that you
shouldn't import anything from the pip.* namespace. But the package names
didn't start with an underscore, so by Python convention they looked like
perfectly fine public classes and methods.

Moreover there simply isn't any public API to do things you _really_ don't
want to reimplement yourself, like parsing `requirements.txt` files.
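
For illustration, the version-guarded import dance this forces on you looks
roughly like the sketch below; both paths are pip internals rather than a
public API, so even this can break again on a later pip release.

    # A minimal sketch of the version-dependent internal import described
    # above; parse_requirements has never been a public API, so this may
    # still break on future pip versions.
    try:
        from pip._internal.req import parse_requirements  # pip >= 10
    except ImportError:
        from pip.req import parse_requirements  # pip < 10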

As soon as you want to do anything slightly non-standard with
pip/distutils/setuptools, you find a solution pretty quickly, and that
solution turns into a nightmare when you upgrade any of the components
involved.

Also, finding a local pypi mirror that lets you inject your own packages _and_
deals with wheels correctly _and_ doesn't generally suck _and_ is somewhat
easy to use... not easy.

~~~
icebraining
Agreed; for something that managed to become the de-facto solution for Python
package installation, pip always felt surprisingly unreliable to me.

My solution at a small company, where we couldn't waste time with this crap,
was to vendor everything: when a developer updated the requirements.txt, they
would also install the packages in a project directory (using "pip install -r
requirements.txt -t ./dependencies") and commit that. That way, the rest of
the process (CI, packaging, etc.) could just check out the repo, set
PYTHONPATH=./dependencies and ignore pip completely.
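
A minimal sketch of the consuming side, as an alternative to exporting
PYTHONPATH: the application can put the vendored tree on sys.path itself at
startup (the "dependencies" directory name is just the one used above).

    # Minimal sketch: prepend the vendored directory to sys.path instead of
    # relying on the PYTHONPATH environment variable.
    import os
    import sys

    _vendored = os.path.join(os.path.dirname(os.path.abspath(__file__)), "dependencies")
    sys.path.insert(0, _vendored)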

------
boapnuaput
Nix's usage of source hashes to pin every package is more and more prescient,
as each language's custom-written package manager reinvents the wheel. (Pun
indented!)

~~~
Godel_unicode
There are plenty of situations where the output of the build process is not
determined by the source alone.

~~~
coherentpony
Example?

~~~
Godel_unicode
Install Python from source with and without readline-devel in the ld library
path.

Or any other piece of C software with macros that depend on the build
environment. Same .c file in, different binary out.
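
A concrete way to see this from the Python side, assuming two interpreters
built as described: CPython only builds its readline extension module if the
headers were available at compile time, so the same source tree yields
interpreters that behave differently.

    # Hedged illustration: run this under each interpreter. A CPython built
    # without the readline headers available will typically lack the module.
    try:
        import readline  # noqa: F401
        print("this interpreter was built with readline support")
    except ImportError:
        print("this interpreter was built without readline support")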

~~~
m_mueller
This sort of thing is IMO exactly why containers have merit. Let the software
vendor decide what to distribute, or let others jump in and fill the gap.
Power users can still read the Dockerfiles and decide what to do, or even build
the images themselves. Once it's built, the image is well defined, i.e. not
depending on the host environment, including the filesystem, except for volume
mounts. When will we get a whole desktop distro like this?

~~~
mikepurvis
Many of the pieces which go into a desktop have a strong assumption of being
able to interact with other components in ways other than a network socket.
Containerizing an application is one thing, but how do you containerize
something that has plugins? What about desktop widgets? Do they go in their
own containers, or get mixed in with the host's somehow? How is ABI
compatibility enforced now that we're in container land and none of that
matters any more?

What about client/server components where there's dbus in between, like the
network manager or volume control?

These are solvable problems, but it's been enough of a challenge building a
server OS from containers (RancherOS), and building a desktop OS that way is a
significantly harder problem.

~~~
m_mueller
> containerize something that has plugins

I don't really see the problem there; I see the plugins as essentially just
data (i.e. stored in a mounted volume), for which updating and versioning is
in the domain of the application itself or maybe some standardized library it
uses.

Desktop widgets: essentially the same thing; a widget is a plugin to the
desktop environment and can be stored as a volume mount on the DE container.

Dbus is probably something that would require an evolution on the container
side, or alternatively it would need to be abstracted into network
interfaces. Another possible way to look at it is to have a layer between the
kernel and the containerized userland that is responsible for manipulating all
the physical host things in the traditional way, and the examples you give are
exactly that. Maybe this sort of thing should continue to be distributed
tightly together with the kernel.

------
olah_1
I only got interested in Python again after I became aware of tools like
Poetry [https://poetry.eustace.io/](https://poetry.eustace.io/)

~~~
ptx
Looks very interesting, but the part of the installation instructions where it
says that its directory "will be in your $PATH environment variable" worries
me – is the installer taking the liberty of messing with my configuration
without asking first?

------
malkia
If you take this further, and can compile all of the "C" code (open source),
then you can link that existing code into one single binary - e.g. "python" -
basically Python with all the compiled code it needs. Thus:

- You no longer need .dll, .so, etc. to go along; you can package the rest of
the .py files into a .zip and slap it at the end of the now "fat" python
binary, effectively having one executable - a bit of "Go"-style releasing (see
the sketch after this list)...

- This can even have smaller sizes, but be careful with FFI - because only the
code that is exposed to be used from Python gets linked into the fat "python"
binary. That is, FFI is trickier, as you have to force symbols to be kept.

- You can obviously do the same with other runtimes - Java, for example - so
your whole "java" can come down to one executable.
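
The pure-Python half of this is roughly what the stdlib zipapp module already
does; a minimal sketch, assuming a hypothetical myapp/ directory containing a
__main__.py (the statically linked "fat" interpreter itself still has to be
built separately):

    # Minimal sketch using the stdlib zipapp module; "myapp/" and its
    # __main__.py are hypothetical. The result is a single .pyz archive of
    # the Python code, not the fully static interpreter described above.
    import zipapp

    zipapp.create_archive("myapp/", target="myapp.pyz",
                          interpreter="/usr/bin/env python3")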

But to get there, you need a sane build system that can express these
dependencies outside of python/java/whatever... And obviously you need the
source code (although precompiled static libs should work in this scheme too)...

The goal, again, is just to have one thing, and most importantly no .so, .dll,
or .dylib files lying around.

~~~
icebraining
This is what tools like py2exe and PyInstaller do.

~~~
paavoova
You can do this with plain zip files, as Python can unpack and execute them
in-memory. But you have to extract non-Python code such as C libraries at
runtime somewhere so the OS can link them. I would be interested to know how
the packagers you mentioned get around this, or if they too unpack to
temporary files.

~~~
dead_mall
This is exactly how I package my python scripts. The answer to your question:
Memfd: [https://magisterquis.github.io/2018/03/31/in-memory-only-elf-execution.html](https://magisterquis.github.io/2018/03/31/in-memory-only-elf-execution.html)

And the zipimport pip package.

~~~
malkia
Right, there are OS-specific approaches on all systems, but if you are able to
compile Python itself along all open source C/C++ libs that your .py-thon file
might use, you end up with exactly one executable (you can possibly even go
further packaging the .py files in some compressed (bytecode compiled) inside
the same binary) - e.g. all you need is there, and the approach would fare
better across more systems (e.g. does not rely on advanced techniques, no need
to unpack the .DLL in temp folder, etc.). Also better for post-mortem
debugging - your .PDB, and your .DWP (or whatever symbol files) are.. well
just one single file, rather than several.

Same thing for java, dart, etc. This makes it much easier to deploy, use,
download, as it's basically "zero"-config install, no need for scripts to
tweak your PATH, etc.

(But it clearly has limitations: if you use proprietary code, or you can't
compile a given open source library bundled with your binary - e.g. LGPL code
that you have to dynamically link - this doesn't work; then again, if your
project is open source this is no longer that much of a limit (but I wouldn't
know that for sure).)

------
alexbecker
Author here, happy to answer any questions.

~~~
4dahalibut
Why isn't the obvious answer to make this change?
[https://github.com/pypa/packaging.python.org/issues/564#issuecomment-428553391](https://github.com/pypa/packaging.python.org/issues/564#issuecomment-428553391)

~~~
alexbecker
Because right now you can _only_ upload individual distributions to PyPI. A
release is implicitly created when you upload the first one. If PyPI
implemented that change, you could only have one distribution per release.

The proper fix would be to make publishing a release a separate operation. But
that breaks all existing tooling and workflows.

~~~
MrOxiMoron
Make it a configuration option for the package on pypi, then tooling can
migrate slowly and at some point it can become the default for new packages.
If then at some point someone uses old tooling the downside would be that they
might need to do a manual publishing step to actually publish their package,
but at least you don't have the problem of publishing before you are ready.

------
your-nanny
[https://github.com/pypa/pip/issues/5874](https://github.com/pypa/pip/issues/5874)

This seems like a reasonable suggestion.

------
MrOxiMoron
Shouldn't pip then also look at the hashes (if provided) to determine which
package to install? If it worked at one time, it should keep working even if
"better" options are available now. Consistency is important here.

~~~
di
The article links to
[https://github.com/pypa/pip/issues/5874](https://github.com/pypa/pip/issues/5874)
which proposes exactly this.

~~~
alexbecker
In fact I'm hoping to get some time this weekend to make a PR implementing
this! (I'm assuming the reason you/Donald Stufft haven't done so is time,
rather than disagreement over whether this is a good solution or some unseen
obstacle to implementing it.)

------
xiaodai
How does R avoid this then?

~~~
nestorD
I believe that, for R, releases are synonymous with distributions. The code is
compiled on the user's machine (I once got a Fortran error message while
installing an R package).

If reproducibility matters, you can also use Microsoft R Open to get your
packages from a frozen snapshot of CRAN:
[https://mran.microsoft.com/](https://mran.microsoft.com/)

------
nirse
[https://xkcd.com/1987/](https://xkcd.com/1987/) (edit: this is for anyone who
mentions conda in here)

~~~
mlthoughts2018
How so? Conda is strictly simpler than alternatives, by a solid order of
magnitude at least.

------
mlthoughts2018
It’s beyond disappointing that the article does not discuss conda and conda’s
build variant capabilities [0].

Maybe 2 years ago there was still room for honest debate about Python packaging
tooling. There just sincerely isn't anymore.

[0]: [https://conda.io/projects/conda-build/en/latest/resources/variants.html#build-variants](https://conda.io/projects/conda-build/en/latest/resources/variants.html#build-variants)

~~~
phowon
As someone who aggressively uses conda, I do think one of its downsides is how
heavy it is. I agree that it's a good one-stop-shop if you want to get an
environment up and running with no issues. But if you're not doing any
scientific computing for a given project, it might help to use something more
lightweight.

~~~
nirse
I've had to 'debug' quite a few (ana)conda setups, mostly ones that were used
for scientific work, where the whole python setup got so messed up that
nothing was quite working anymore, both system and conda pythons. I advise
against it.

~~~
mlthoughts2018
This sounds highly implausible, since conda’s number one mode of operation
will fully isolate system dependencies into separate environments. It is
extremely difficult to misuse conda in a way that would cause this problem,
which makes me believe this comment is just trolling.

~~~
nirse
Sorry, no trolling here. The times I've tried to fix those setups I've lost my
patience with it and IIRC got things working again by deleting conda and
installing the packages manually through Homebrew (all Macs). But it wasn't on
my machine, so I'm not sure how they got into that state. I think it was due to
using a wide range of IDEs/environments that all tried to manage their own
packages and load paths, combined with perhaps some manual setup.

Probably Conda is fine if you just commit to it, but I think the users who got
into these messes weren't knowledgeable enough to foresee the consequences of
their actions.

So you could argue that the problem isn't with conda, but with the users.
However, it seems that for naive users the consequences of using conda aren't
clear (enough), which can get them into a mess.

~~~
kalefranz
Hi! I was the conda dev lead for 2.5 years. We work really hard to make sure
conda does the Right Thing in unpredictable, almost hostile, environments.
It’s a large surface area to guard against, but in general we’ve been
incredibly successful.

I hate to hear that you had a bad experience. If it's at all possible, please
provide something on our issue tracker (github.com/conda/conda) that we can
replicate and write regression tests against. Help us improve on what we've
missed.

