
The problem with packaging in Python - jimsojim
http://blog.ionelmc.ro/2015/02/24/the-problem-with-packaging-in-python/
======
syllogism
Noooooooo

This is a really terrible suggestion.

The last thing we need is _another_ way to do things, that only takes care of,
say, 70% of the functionality.

All that will happen is, someone now has to look through the simple interface,
and decide oh wait, I need one feature this doesn't provide. Now your
"solution" has made the problem worse, not better.

If you're going to provide this sort of simplified interface, you need to make
damn sure a _super majority_ of users never need to look past it. Otherwise
you've done much more harm than good, by providing yet another competing
alternative.

The real problem with packaging in Python is that users are asked to write
this program, setup.py, that by itself should not be a very difficult program
to write. Its tasks simply are not that complicated.

What makes it complicated is that they're then told that their program should
consist of a single function call --- a call to this monstrously complicated
setup() function, with a confusion of conflicting arguments.

This is a stupid way to write a program! This design decision is at the heart
of the whole problem. The interface to setuptools, distribute, distutils etc
is fucked and always will be fucked, because there's no way to provide a good
interface to the functionality via a single function call.

That's why every semi-complicated setup script ends up having to monkey patch
the internals of setuptools or distutils, to say, replace some of the
Extension class machinery, or try to correct a compiler flag. It's because the
design is fundamentally a failure. The whole idea does not work.

Things would be much easier if we had direct, simple control. You can provide
a default fall-through, but it should be easy to route the "build_ext",
"install", "sdist" etc commands to your own function.

Just tell us what our program needs to do and provide us a nice library of
building blocks to do it, and we will write this program. Simple.

By far the majority of my tickets for my NLP library, spaCy, are related to
packaging and installation. Why should putting some files onto a user's
computer and zipping up some files on my machine and calling out to a compiler
and specifying some dependencies be harder than understanding natural
language?

~~~
crdoconnor
This is essentially the same problem faced by all frameworks. When they are
done well and in a loosely coupled fashion, a framework will make the basic
stuff trivial to do with a few _declarative_ lines of code and everything you
might want to do will be possible without much trouble. Even, perhaps
especially, the things that the creators didn't even imagine.

When a framework is done badly, in a tightly coupled fashion, it may make a
bunch of basic stuff easy but it will also make more complex stuff either
impossible or possible but requiring horrendous contortions in the code.

I think that a bunch of libraries is better than a poorly designed framework
but I still don't think they're a substitute for a well designed framework.

Unfortunately, many framework developers' attitude to such contortions is not
"wow, we screwed up, we'd better accomodate this valid use case" (or even
better make it possible to do declaratively) but instead this:

"You shouldn't be doing that"

~~~
Radim
Even worse: "Let's add another function API parameter (to the list of two
dozens already there!), which will take care of _this_ particular use case".

Now the opaque black-box "framework solution" works again!

Also discussed in my "API bondage" post: [http://rare-technologies.com/data-
streaming-in-python-genera...](http://rare-technologies.com/data-streaming-in-
python-generators-iterators-iterables/)

------
crdoconnor
One thing that is direly needed in pip is a way to specify system package
dependencies. The way packages fail right now if you're missing something like
libjpeg is cryptic and nasty. Users need to know that they need to "apt-get
install x" or "yum install y" without googling for that cryptic error message.

Compared to that, I'd consider the overgrown setup.py interface or need for a
MANIFEST.in file to be barely even a problem.
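
A rough sketch of the kind of check being asked for: detect a missing system library up front and print a distro-appropriate hint instead of a cryptic build error. The `HINTS` mapping here is illustrative, not a real database of package names:

```python
# Sketch: detect a missing system library before building and print a
# distro-specific install hint. HINTS is a hypothetical mapping; a real
# tool would need a maintained per-distro package-name database.
import shutil
import sys
from ctypes.util import find_library

HINTS = {
    "jpeg": {"apt-get": "libjpeg-dev", "yum": "libjpeg-turbo-devel"},
}


def check_system_lib(name):
    """Return True if lib<name> is findable; otherwise print a hint."""
    if find_library(name):
        return True
    for tool, pkg in HINTS.get(name, {}).items():
        if shutil.which(tool):  # suggest whichever package manager exists here
            print("Missing lib%s: try 'sudo %s install %s'" % (name, tool, pkg),
                  file=sys.stderr)
            break
    return False
```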

I tried writing a package that took care of specifying unix packages in a
distro-independent way
([https://github.com/unixpackage/unixpackage](https://github.com/unixpackage/unixpackage)),
checked for installation and asked politely via sudo to install if the package
wasn't installed.

However, in pip, the "ask politely" approach became impossible after v1.5.6
because of this bug/decision:

[https://github.com/pypa/pip/issues/2732](https://github.com/pypa/pip/issues/2732)

(a sudo prompt came up but it was not possible to give an indication to the
user of why it was there, so I gave up on the idea)

~~~
falsedan
pip is garbage. Example: when building the dependency graph of requirements,
if the same distribution appears twice with different versions, it picks one
and doesn't emit an error.
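
The failure mode described can be shown in a few lines: a toy resolver that keeps the last pin it sees versus one that refuses to guess. Package names are hypothetical:

```python
# Toy illustration of the complaint above: silently picking one version
# when the same distribution is pinned twice, versus raising an error.
def naive_resolve(requirements):
    """Last-one-wins: an earlier pin is silently overwritten."""
    picked = {}
    for name, version in requirements:
        picked[name] = version
    return picked


def strict_resolve(requirements):
    """Raise on conflicting pins instead of picking one arbitrarily."""
    picked = {}
    for name, version in requirements:
        if name in picked and picked[name] != version:
            raise ValueError("conflict for %s: %s vs %s"
                             % (name, picked[name], version))
        picked[name] = version
    return picked
```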

~~~
crdoconnor
Ouch. That is bad.

------
falsedan
So Perl had the same sort of problem—everyone used the cpan client to install
packages by wrapping the calls to

    
    
        perl Makefile.PL
        make
        make test
        make install
    

And package authors would use helper modules to handle the Makefile.PL
cruft[0][1]. It sucked.

rjbs wrote Dist::Zilla[2], a packaging abstraction tool: you would tell it
what was in your package, and it would write the installer for you (and you
could choose which of the Makefile.PL helpers to use, if you cared). The end
user never sees any remnant of the tool.

The closest Python has right now is pbr[3].

[0]
[https://metacpan.org/pod/ExtUtils::MakeMaker](https://metacpan.org/pod/ExtUtils::MakeMaker)
[1]
[https://metacpan.org/pod/Module::Install](https://metacpan.org/pod/Module::Install)
[2]
[https://metacpan.org/pod/Dist::Zilla](https://metacpan.org/pod/Dist::Zilla)
[3]
[http://docs.openstack.org/developer/pbr/](http://docs.openstack.org/developer/pbr/)

------
yen223
It's funny that a lot of Python's design hinges around having "one right way
to do it", which is a strength of the language, and yet there's a million
different ways to do packaging.

Python really needs one, and only one, true way to do packaging, but I think
it's too late for that to happen now.

~~~
moonbug
Python's reputation for being clean is largely an artifact of it being a
contemporary of PHP.

~~~
krapp
And yet PHP wound up with a fairly decent package manager when Composer showed
up.

Although it's becoming more and more common to require NPM as well for PHP
development now, apparently.

------
slavik81
I actually just built my first python package two weeks ago. It was pure-
python and I found it fairly straightforward, starting with the bare-bones
LPTHW [1] template, and adding things from the Sharing Your Labor of Love [2]
guide.

It basically 'just worked' across Python 2, Python 3, Linux, OSX and Windows.
There's a lot of outdated examples out there, and it might be more troublesome
if I had a more complicated project, but... well, it was ok for me.

[1]
[http://learnpythonthehardway.org/book/ex46.html](http://learnpythonthehardway.org/book/ex46.html)
[2] [https://hynek.me/articles/sharing-your-labor-of-love-pypi-
qu...](https://hynek.me/articles/sharing-your-labor-of-love-pypi-quick-and-
dirty/)

------
colanderman
As an outsider to Python/Perl/Ruby/etc., I've never understood why these
languages each have their own packaging system, rather than simply relying on
the host OS's packaging system. After all, it serves C, C++, and Bash well;
and pip packages generally are repackaged as Red Hat/Debian/etc. packages
anyway.

Making .rpms and .debs is not all that hard, though I admit there are a few
too many package formats (and that still leaves non-Linux users without a
clean solution). Besides this, what benefits do users of these language-
specific packaging systems enjoy that I am missing?

~~~
keypusher
The first is being distro-agnostic. The second is the ability to install
packages without sudo, plus tooling like virtualenv. The recommended way to
set up a Python project these days is to create a virtualenv and install all
your packages there. That way it does not interfere with system packages, and
can easily be cloned or destroyed as necessary.
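
The isolation described needs nothing beyond the standard library on Python 3; a minimal sketch of creating a throwaway environment with the stdlib `venv` module:

```python
# Create a disposable isolated environment programmatically, mirroring
# the "create a virtualenv and install packages there" workflow above.
# Target path is a throwaway temp directory.
import os
import tempfile
import venv

target = os.path.join(tempfile.mkdtemp(), "env")
venv.EnvBuilder(with_pip=False).create(target)  # with_pip=False keeps it fast

# The environment gets its own interpreter directory and site-packages:
bindir = "Scripts" if os.name == "nt" else "bin"
print(os.path.isdir(os.path.join(target, bindir)))  # True
```

Destroying the environment is just deleting the directory, which is exactly the "easily cloned or destroyed" property being praised.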

~~~
slavik81
I really feel that's a failing of rpm and dpkg. You should be able to install
new packages for your local user without sudo and without affecting other
packages. If the system tools weren't so lacking, I doubt we'd have the
proliferation of language-specific packaging systems.

If they were better designed, you could simply tell debian users to install
rpm to handle their rust packages, like you now tell them to install cargo.
That sounds ridiculous, but it shouldn't.

I'm hoping Nix, Guix or something of that sort can one day fill that role.
Integrate them nicely into distros and let them slowly take over the non-
system packages. Let rpm and dpkg slowly wither away until finally, with a
little "pop", they disappear into nothingness.

Of course, all these language-specific package managers support Windows, so I
hope Nix and/or Guix actually do so. As git has shown, it doesn't even have to
be amazing support. But, without something basic, people are going to keep
inventing new package managers to make Windows work.

Will you kill off all other package managers? Obviously not, but I do think
you could prevent quite a few people from re-inventing the wheel for their
next language if there was already a good tool to build upon.

------
jaywunder
My biggest takeaway from the article is that you "want something simple", and
most people would agree with you. What you don't address, or only dance
around, is saying "X is how we should do it".

Personally I like node's way of using package.json and cargo's Cargo.toml. If
someone made that for python that'd be awesome.

The other thing node has specifically is the "node_modules" folder, which
recursively has all the dependencies and their node_modules folders. While it
might not be super space efficient it's really stupid simple.

~~~
wldcordeiro
Npm 3 now does some dependency tree pruning by flattening the node_modules
folder as much as possible, which addresses that size issue.
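
The hoisting idea can be sketched in miniature: lift each dependency to the top level unless a different version is already there, in which case keep a nested copy. This is a toy, not npm's actual algorithm:

```python
# Toy sketch of npm 3-style flattening: hoist a dependency to the top
# level unless a conflicting version was hoisted first.
def flatten(deps):
    """deps: list of (name, version, subdeps) triples.
    Returns (hoisted, nested) where nested holds conflicting copies."""
    hoisted = {}
    nested = []

    def walk(items):
        for name, version, subdeps in items:
            if name not in hoisted:
                hoisted[name] = version          # hoist to top level
            elif hoisted[name] != version:
                nested.append((name, version))   # conflict: keep a nested copy
            walk(subdeps)

    walk(deps)
    return hoisted, nested
```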

------
awinter-py
I want to see an 'algebra' approach to packaging -- strongly-typed packaging
modeled as types and operators. Package and import systems present a lot of
surprising behaviors and edge cases in part because they're not well-defined
enough to be well-documented.

Also, packaging systems need to get better at integrating with VCS versioning.
The elm packaging system goes beyond this and automatically bumps versions
when your API changes [https://github.com/elm-lang/elm-package#version-
rules](https://github.com/elm-lang/elm-package#version-rules), whoa cool. But
at minimum, record the git/hg id.
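
The elm version rules reduce to a simple decision procedure; a toy sketch that compares only the set of exported names (real elm also diffs the types of those exports):

```python
# Toy sketch of elm-package-style automatic version bumping: decide
# which version component to bump by diffing the public API surface.
def bump(old_api, new_api, version):
    """old_api/new_api: sets of exported names. version: (major, minor, patch)."""
    major, minor, patch = version
    if old_api - new_api:        # something removed: breaking change
        return (major + 1, 0, 0)
    if new_api - old_api:        # something added: backwards-compatible
        return (major, minor + 1, 0)
    return (major, minor, patch + 1)
```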

The 'just use npm' approach is interesting too
[http://dominictarr.com/post/25516279897/why-you-should-
never...](http://dominictarr.com/post/25516279897/why-you-should-never-write-
a-package-manager). NPM's approach to 'dependencies of dependencies' has a lot
of fans.

Finally: part of the problem here is that there's no package manager for C.
The author dances around this in saying 'C extensions are hard, we should
ignore them'. If we had a good C package manager a lot of work linux distros
do in supporting C libraries would be simplified and builds of everything
would be _so much_ more portable.

~~~
derefr
> NPM's approach to 'dependencies of dependencies' has a lot of fans.

NPM's approach works for Javascript because package A importing libfoo-1.0 and
package B importing libfoo-2.3.11 doesn't cause any sort of global namespace
collision for "Foo" within the JS runtime; each Foo is just a property of the
object returned by require().

This isn't true of most other platforms; you can't e.g. import two versions of
a Ruby library into the same Ruby runtime. (You can import two versions of an
Erlang module into the same BEAM VM, but the second one will be treated as a
code upgrade for the first. Processes that don't do a fully-qualified tail-call
will stay on the "old" version, though, so both versions _can_ be loaded in
parallel. There's still only one global "module table", though.)

On the other hand, the "packages get their own scratch namespaces which their
dependencies can be exposed into" _is_ true of Unix (or rather, can be made
true by clever use of symlinks/chroots/etc.), which is why Nix can do what it
does.

~~~
awinter-py
Solid point -- the py import system uses global sys.modules, so npm-style
nested deps couldn't happen on a vanilla python.

That said, it's not impossible to hack the import command to use a different
package tree per module. The main difficulty would be supporting the different
ways of storing modules (eggs, filesystem, .so support).
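
The global-table point is easy to demonstrate: `sys.modules` is keyed by name alone, so a second "version" of the same module name replaces the first for every subsequent importer:

```python
# Demonstrating why npm-style nested deps don't work on vanilla Python:
# the module cache is a single global dict keyed by module name.
import sys
import types

foo_v1 = types.ModuleType("foo")
foo_v1.VERSION = "1.0"
sys.modules["foo"] = foo_v1

import foo
print(foo.VERSION)  # 1.0

foo_v2 = types.ModuleType("foo")
foo_v2.VERSION = "2.0"
sys.modules["foo"] = foo_v2  # replaces v1 globally

import foo  # re-binds from the single, global module table
print(foo.VERSION)  # 2.0 -- v1 is gone for all new imports
```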

------
objectified
I think that for the majority of use cases when it comes to packaging and
shipping Python programs, we really want to ship OS packages. We don't want
users (that includes sysadmins/operations teams in this case) to deal with
having pip available, understanding what virtualenv is, and so on. We just
want them to be able to use apt, yum, or whatever they experience as most
common on their system. Yet at the same time, we don't want our application to
be forced to depend on whatever ancient versions of OS packages are available.

I think in the end we want a self-contained OS package, to offer a sane
install method for any target OS. Why should users have to deal with the
difference between e.g. a C program that they install through apt, and a
Python program that all of a sudden requires a completely different way of
installing?

Here are some interesting thoughts on it: [https://hynek.me/articles/python-
app-deployment-with-native-...](https://hynek.me/articles/python-app-
deployment-with-native-packages/)

And here's a tool that I wrote to be able to wrap up a virtualenv into an OS
package, while specifying a precise list of OS dependencies. It's not perfect
by far, and admittedly a bit crude, but it already does the job well for some
real life "production" stuff. It's called vdist (virtualenv distribution).

[https://github.com/objectified/vdist](https://github.com/objectified/vdist)

Documentation is here:

[https://vdist.readthedocs.org/en/latest/](https://vdist.readthedocs.org/en/latest/)

~~~
iofj
Software engineers often need their package to work on multiple distributions
- or even OSes, which they may know little about. They maintain the program,
and try really hard not to care what it runs on. They need the SSL bug fixed
on Red Hat, Debian, and Mac OS X alike. Hence the "static linking" solutions
where all the packages are controlled from the program, like virtualenv/pip.

System engineers are charged with keeping various programs running on
(usually) a fixed distribution/OS. They need to have a way to update SSL when
there is a security issue, for all applications. They need to be able to
update various libraries and have programs transparently use new or otherwise
fixed versions of libraries. They also very much need the ability to uninstall
packages.

OS level packages are a solution written to serve the need of sysadmins. But
software engineers don't care, because maintaining the correct package list
and versions for 4-5 distributions is horrible. So they don't.

What I don't get is why docker isn't the ideal solution for both groups.
Assuming, of course, sysadmins have both ability and the inclination to insist
on the sources.

~~~
curryst
> What I don't get is why docker isn't the ideal solution for both groups.

For some things, Docker is a great solution for both groups, but that requires
that your organization and application meet a list of requirements. Is the
application only going to be run by you, as a server? If it's going to be run
elsewhere, you may not have the option to require that it run on Docker; and
if it's going to be run as a client, then you definitely don't have that
option.

The conflicting goals come from the System Engineer's goal of making it run on
all platforms vs the System Administrator's goal of making it run on _this_
platform. System Engineers want a solution that makes it easier to abstract
away the platform; System Administrators want a solution that makes it easier
to install on this platform. OS packages are easier as a System Administrator
because it all ties into a central source of authority. Each package knows
which other packages are installed on the system and which packages it needs.
Fragmenting the environment makes it more difficult because you've got
distinct sources of authority. Pip package A needs version 1.2 of OS package
B, but pip package C needed for pip package D needs version 1.3 of OS package
B. All of which you'd have no idea about until it all crashes and burns and
you have to go figure out why.

OS packages are usually better in terms of installation, but pip packages are
easier to create. Ubuntu's new packaging system is meant to address this,
though we'll see how it goes.

------
dikaiosune
Having used the combination of Gradle and Capsule in Java-land, I desperately
want an equivalent for Python: a tool that can declaratively state
dependencies and bundle them into a single file (akin to a static binary)
which has a single entry point and all dependencies included. For libraries,
the JAR model works well too.
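
For applications, the standard library actually comes close to the JAR/Capsule model: `zipapp` bundles a directory (dependencies can be pip-installed into it first) into a single runnable `.pyz` with one entry point. Paths here are throwaway temp directories:

```python
# Bundle a directory into one executable archive with the stdlib zipapp
# module (Python 3.5+), then run it as a single file.
import os
import subprocess
import sys
import tempfile
import zipapp

src = tempfile.mkdtemp()
with open(os.path.join(src, "__main__.py"), "w") as f:
    f.write("print('hello from a single-file bundle')\n")

target = os.path.join(tempfile.mkdtemp(), "app.pyz")
zipapp.create_archive(src, target)  # deps could be pip-installed into src first

out = subprocess.run([sys.executable, target],
                     capture_output=True, text=True)
print(out.stdout.strip())  # hello from a single-file bundle
```

It is not as polished as Capsule (no native-extension handling, for one), but the single-entry-point, single-file shape is there.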

~~~
pmahoney
I don't know the state of Python packaging in Nix [1], but it can almost do
what you ask. I've used it to create a tarball of, for example, a customized
Postgres (built with MySQL foreign data wrapper in this case), and all its
dependencies down to libc (there's an implicit dependency on a Linux kernel),
and unpack and run it on a Linux system where Nix package manager is not
installed.

I've done similar for Ruby projects using many gems with various C library
dependencies, though I used a fair bit of custom Nix definitions for this
(i.e. at the time, the state of Ruby packaging in Nix was poor; I'm not sure
of the current status).

A significant requirement is that it must live in /nix (it's theoretically
possible to build everything for another path, but I don't think many people
do this so there may be surprises).

And this is all easier if the target system has the Nix package manager,
rather than dealing with (somewhat bulky, since they include everything)
tarballs.

[1] [http://nixos.org/nix/](http://nixos.org/nix/)

------
Others
[https://xkcd.com/927/](https://xkcd.com/927/)

------
mkhpalm
I often skip all that and use native packages with pyinstall, dh_python, and
dh_virtualenv (from spotify)

For me it's much cleaner and controlled than distutils, plus the knowledge is
largely transferable to other stacks. I wrap our node, java, and ruby
applications using the same tooling as well as any services we depend on.
(kafka, kibana, spark, etc) After that I can pass off making images,
containers, installing, managing stuff, whatever onto anybody in IT or
operations that knows linux.

Obviously that wouldn't work well for everybody but it sure beats trying to
manage 20 different ways to do everything in my case.

------
pjtr
I'd prefer docs. And please: PEPs are NOT docs!

In the last couple of years setuptools changed / broke so many things that
previously worked without documenting anything. Even trying to figure out what
the relevant docs would be, and who maintains them is hard. As Nick Coghlan
mentions in the comments: Even the people maintaining this stuff have no idea
what it all does.

And still, the correct (but incredibly boring and frustrating) solution is for
someone to sit down and figure this stuff out and document it. Definitely not
me though. :)

------
moonbug
Use Anaconda and be done with it.

~~~
jaywunder
I think this is the problem Others was referring to with the xkcd. Anaconda
might be the best one in your opinion, but there's a bunch more, like pip's
`requirements.txt` and virtualenv.

~~~
moonbug
I've used them all. Conda bears the least resemblance to a software clown car.

~~~
craigyk
conda is quite good. It makes a strong case to be a good general (not just
Python) package manager. In particular I think the way it tries to make
compiled binaries relocatable is very useful (and clever).

------
harlowja
Openstack uses PBR (no not the beer):

"""Python Build Reasonableness"""

[http://docs.openstack.org/developer/pbr/](http://docs.openstack.org/developer/pbr/)

It is pretty nice and helps a ton with packaging sanity...
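
pbr's documented convention is that setup.py shrinks to a stub and the metadata moves into a declarative setup.cfg. Shown here as file contents (names illustrative) rather than executed, since pbr would need to be installed:

```python
# The pbr convention, roughly: setup.py is reduced to a stub and
# setup.cfg carries the declarative metadata. Contents shown as strings
# for illustration; the metadata values are hypothetical.
SETUP_PY = """\
from setuptools import setup

setup(setup_requires=['pbr'], pbr=True)
"""

SETUP_CFG = """\
[metadata]
name = example
summary = An example package

[files]
packages =
    example
"""
```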

------
sigsergv
We don't need to get rid of setuptools/distutils/etc. We need something like
debhelper but for Python: a set of tools that generates and maintains a proper
setup.py and its relatives.
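
A toy sketch of that debhelper-style idea: generate a boring, correct setup.py from a tiny declarative description, so no human ever edits the generated file by hand. All names are illustrative:

```python
# Toy "setup.py generator" in the spirit of debhelper: the tool owns the
# generated file; the developer only maintains the declarative inputs.
TEMPLATE = """\
from setuptools import setup, find_packages

setup(
    name={name!r},
    version={version!r},
    packages=find_packages(),
)
"""


def generate_setup_py(name, version):
    """Render a setup.py from declarative inputs."""
    return TEMPLATE.format(name=name, version=version)


print(generate_setup_py("example", "0.1"))
```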

