
Packagers don't know best - decklin
http://vagabond.github.io/2013/06/21/z_packagers-dont-know-best/
======
tmoertel
The packagers actually do know what’s best. What they do makes patches flow
faster not only downstream but also upstream. _Improvements and fixes get to
more people and get to them faster._

Unbundling upstream libraries from downstream projects flattens the change-
flow network, reducing the time it takes for things to get fixed and for the
fixes to propagate. For example, say that project P uses library L and bundles
a slightly modified L in its release. Whenever L’s developers fix or improve
or security-patch L, P’s users don’t get the new code. They have to wait for
P’s developers to get around to pulling the new code from L, applying their
own modifications, and re-releasing P.

Packagers say that’s crazy. They ask: Why does P need a modified L? Is it to
add fixes or new features? If so, let’s get them into L proper so that L
proper will not only meet P’s needs but also provide those fixes and new
features to everyone else. Is it because P’s version of L is no longer L in
anything but name? Then let's stop calling it L and confusing everybody. Fold the no-
longer-L into P or release it as a fork of L called M that can have a life of
its own.

The point is that keeping L out of P makes two things happen: (1) It
ensures that when L’s developers improve L, all users, including P’s
downstream users, get those improvements right away. (2) It ensures that when
P’s developers improve L, those improvements flow upstream to L quickly and
reach all of L’s users, too.

More improvements, to more people, faster. That's the idea.

~~~
npsimons
Yeah, I love how the article is so myopic that they can't imagine a world in
which they might be using a package that _some other package is also using_,
and therefore it _might need to be upgraded separately from their package_. So
the author has worked on two large projects that have dependencies, yet he
thinks he has the experience to say that splitting a package up (say, into
docs, libs and executables) is a bad thing? How many embedded devices has he
administered? Or clusters? Or simple networks where things are set up to have
NFS mounts across machines, and it's obvious that while you can install the
docs on the NFS-doc-server once, you may need to have separate binary and
library installs for each architecture/OS on the NFS-binary-servers. There's a
reason sysadmins _love_ well packaged software.

~~~
regularfry
Splitting a package into docs, libs and executables makes a lot of sense.
Splitting those further, so you've got umpteen "independent" packages which
95% of users are just going to have to manually recombine to get the
functionality the upstream package provides out of the box, can get
pathological. Debian has historically been particularly bad at this, and
Ubuntu inherited that tendency.

~~~
mwcampbell
Please give an example of a pathological case in Debian.

~~~
regularfry
Ruby in etch was pretty absurd, from memory.

------
jordigh
The point of not using embedded libraries isn't about saving space. It's about
not having several slightly different versions of the same bug spread out
across several slightly different versions of the same library.

Saving space is just a nice side effect, so why not have that too?

The DLL hell problem doesn't exist in a GNU-based system because we have
sonames. Windows and Mac OS X don't have those; instead, the software
libraries there can't coordinate with each other harmoniously, so each program
has to have all of its libraries packaged with its own set of bugs while
making a hostile and rude gesture to the rest of the programs in the OS.
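
For readers unfamiliar with the mechanism: a soname is a compatibility version
embedded in the shared library itself, which the dynamic linker uses to pick a
matching version. A quick sketch of how that looks on a GNU/Linux box (library
name and version numbers illustrative):

    $ readelf -d /usr/lib/libfoo.so.1.2.3 | grep SONAME
    # the library declares its compatibility version: libfoo.so.1
    $ ls -l /usr/lib/ | grep libfoo
    # libfoo.so.1 -> libfoo.so.1.2.3   (what programs resolve at run time)
    # libfoo.so   -> libfoo.so.1.2.3   (what the linker uses at build time)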

~~~
marssaxman
And yet, the Mac OS X user experience is _so much nicer_ than the one you get
with a GNU-based system; you download an app, it is self contained, it works,
end of story. I have been hearing the same old story for years about how
dependency-tracking package managers are the right way, and yet that
environment continues to have problems, as described in the article, while
the supposedly inferior Mac OS X packaging system just works and I never have
to mess with anything.

I am happy to give up a little extra disk space in exchange for having
predictable executables that work in the configuration they were built and
tested for.

~~~
dman
Can you imagine using OSX as a development box without homebrew?

~~~
hosay123
Yes, and my /usr/local isn't a freaking .git repository. Homebrew,
bikeshedding what MacPorts achieved a decade previous (and getting it horribly
wrong) since 2009.

~~~
randallsquared
If by "achieved", you mean "aspired to". MacPorts never actually seemed to
correctly build the things I needed. Of course, I gave up before 2009.

------
JoshTriplett
"I know you have all these rules that try to make packages consistent so
sysadmins don't have to give any extra thought to each individual one, but I'm
a special snowflake that you should treat differently."

I care about having a system with hundreds or thousands of packages installed
on it that all work consistently.

Linux is not OS X, and packages are not .dmg files; I want your package using
the system version of libfoo, not your own fork of libfoo. If you have awesome
changes to libfoo, then you should either get them into upstream libfoo, or go
all the way and _actually_ fork libfoo into libfooier upstream to allow
packaging it separately.
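
Checking which copy a binary actually uses is a one-liner (binary and library
names are placeholders):

    $ ldd /usr/bin/yourapp | grep libfoo
    # want: libfoo.so.1 => /usr/lib/libfoo.so.1  (the distro's copy),
    # not a private copy shipped under the application's own directory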

~~~
dvanduzer
Reading your comment, I am starting to understand why everyone is so confused
about this.

Linux and OS X both have the same underlying options for static or shared
libraries. There is a large amount of "enterprise" software that is
distributed just like a .dmg file.

There is plenty of middle ground between having _everything_ dynamically
linked and _everything_ statically linked. The author of the article believes
that packagers should trust developers to make good decisions. (Granted, there
are plenty of bad developers and abandonware galore. Packagers are justified
in stepping in to make new decisions here.)

~~~
mwcampbell
As a sysadmin, which do you prefer: self-contained software distributions with
all the dependencies included, or packages that use the host distro's package
manager, use system versions of libraries wherever possible, and otherwise
integrate well with the host system? The latter seems better to me, but it's
an honest question, not rhetorical.

~~~
grosskur
False dichotomy: bundling dependencies doesn't preclude integrating well with
the host system. See Basho's latest Riak package for Ubuntu 12.04:

[http://s3.amazonaws.com/downloads.basho.com/riak/1.3/1.3.2/u...](http://s3.amazonaws.com/downloads.basho.com/riak/1.3/1.3.2/ubuntu/precise/riak_1.3.2-1_amd64.deb)

It integrates well with the host system (init script, "riak" user/group, data
in /var/lib/riak, logs in /var/log/riak, etc.) _and_ bundles dependencies in
/usr/lib/riak.
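
You can verify the layout by listing the package contents (filename taken
from the download URL above):

    $ dpkg -c riak_1.3.2-1_amd64.deb | grep -E 'init.d|usr/lib/riak' | head
    # shows the init script alongside the bundled libraries under /usr/lib/riak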

In my experience, bundling dependencies is often the only practical way to
install a complex app. Take Sentry as another example:

[https://github.com/getsentry/sentry](https://github.com/getsentry/sentry)

The current version (5.4.5) depends on 37 Python packages:

    
    
      BeautifulSoup==3.2.1
      Django==1.4.5
      Pygments==1.6
      South==0.7.6
      amqp==1.0.11
      anyjson==0.3.3
      billiard==2.7.3.28
      celery==3.0.19
      cssutils==0.9.10
      distribute==0.6.31
      django-celery==3.0.17
      django-crispy-forms==1.2.8
      django-indexer==0.3.0
      django-paging==0.2.5
      django-picklefield==0.3.0
      django-social-auth==0.7.23
      django-social-auth-trello==1.0.3
      django-static-compiler==0.3.3
      django-templatetag-sugar==0.1
      gunicorn==0.17.4
      httpagentparser==1.2.2
      httplib2==0.8
      kombu==2.5.10
      logan==0.5.6
      nydus==0.10.6
      oauth2==1.5.211
      pynliner==0.4.0
      python-dateutil==1.5
      python-openid==2.2.5
      pytz==2013b
      raven==3.3.11
      redis==2.7.6
      sentry==5.4.5
      setproctitle==1.1.7
      simplejson==3.1.3
      six==1.3.0
      wsgiref==0.1.2
    

Ruby or Node apps have dependency trees of similar or greater size. It takes
an enormous amount of effort to roll all of these as individual packages (yes,
I've done it) and it's a colossal waste of time once you realize you can have
a working package in under a minute with virtualenv, pip, and fpm:

    
    
      $ virtualenv --distribute /opt/sentry
      $ /opt/sentry/bin/pip install sentry
      $ fpm -n sentry -v 5.4.5 -s dir -t deb /opt/sentry
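
The result (filename assumed from fpm's usual name_version_arch convention)
then installs like any other deb, with the whole virtualenv landing under
/opt/sentry:

    $ sudo dpkg -i sentry_5.4.5_amd64.deb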

------
dfc
_Look at this ubuntu erlang package, it depends on 40 other packages, as well.
That isn’t even the worst of it, if you type ‘erl’ it tells you to install
‘erlang-base’, which only has a handful of dependencies, none of which are any
of these erlang libraries!_

That package is a dummy package that depends on erlang-base and the rest of
the base erlang platform. You would have to force dpkg to ignore dependencies
in order to install erlang without erlang-base. I would love to hear how that
happened.

Splitting things up into multiple packages makes distributions easier to
manage. One person can take the lead on package-dev while another person can
take the lead on package-doc. Splitting things up into multiple smaller
packages also makes distributing fixes a lot easier. With a one-line fix to
one include, would you rather ship the entire Erlang environment or just the
small package that needed the fix?
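
With split packages, shipping that fix becomes a one-package operation
(package name illustrative):

    $ sudo apt-get update
    $ sudo apt-get install --only-upgrade erlang-dev
    # only the small package carrying the fixed include comes down the wire,
    # not the whole Erlang platform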

And yes, splitting things up to reduce storage requirements is most useful for
resource constrained devices, not new servers/laptops. But it means that a
user who is comfortable with Debian or Fedora on the server/desktop can use
their same trusty OS on their next project when the device places serious
restrictions on system overhead.

~~~
omaranto
You misunderstood the point the author made about erlang-base: it's not that
he or she somehow installed Erlang without installing erlang-base. Rather, if
on an Ubuntu system you try to run 'erl' before installing any Erlang packages
at all, you get a message telling you something like "to get the erl command,
install the package 'erlang-base'", and if you go and do that, you still don't
get the Erlang standard library! The point is that Ubuntu should either
suggest 'erlang' instead, or not have all those separate tiny packages in the
first place.
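
Concretely, the trap goes something like this on an Ubuntu box of that era (a
sketch; the exact package split varied by release):

    $ erl                                # not installed; command-not-found suggests erlang-base
    $ sudo apt-get install erlang-base   # you get erl, but not the full standard library
    $ sudo apt-get install erlang        # the metapackage that pulls in the whole platform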

~~~
YokoZar
Ahh, this is just a simple bug in the command-not-found package that makes
those recommendations when you type in a missing binary, not an underlying
problem with the entire philosophy of splitting packages!

~~~
omaranto
I'm not sure it's really a bug with command-not-found: I think in this case it
correctly gives the package containing the erl command, namely erlang-base;
the problem is that Ubuntu decided that there should exist an erlang-base
package that gives you erl without the standard library.

------
andrewflnr
I'm going to go ahead and plug the Nix package manager here:

    
    
      Nix is a purely functional package manager. This means
      that it can ensure that an upgrade to one package cannot
      break others, that you can always roll back to previous
      versions, that multiple versions of a package can coexist
      on the same system, and much more.
    

So you can all have your own versions of lager or whatever, and still have
everything managed sort of nicely. Doesn't solve the "include the docs or not"
problem, though. And I'm not sure if it does anything for tmoertel's patching
concerns.

[http://nixos.org/nix/](http://nixos.org/nix/)
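
In practice that looks roughly like this (a sketch; package names and nix-env
flags may differ between Nix versions):

    $ nix-env --install firefox   # built into its own immutable /nix/store path
    $ nix-env --upgrade firefox   # the old version stays in the store
    $ nix-env --rollback          # flip the profile back to the previous generation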

~~~
zaphar
Nix is possibly the only system I know of other than maybe homebrew, strangely
enough, that has designed itself sanely.

It solves the security issue, the space issue, and also the "I need special
patches for my version of this lib in my unique application" issue.

------
schmonz
Upstream developers don't know best, either. Packagers sometimes make bad
decisions, just like upstream does, because we're all people. "Install our
software the way we think you should" is a point of view, but not a very smart
one unless it's accompanied by a willingness to be persuaded otherwise. This
particular upstream developer clearly hasn't seen an OS-agnostic cross-
platform package manager like pkgsrc, where one of the packager's tasks is
often to make software more portable than upstream cares to bother with. To
take one obvious example, we make sure libtool works on all our supported
platforms, and then we make sure software with its own precious all-the-
world's-a-Linux way of linking shlibs uses libtool instead. Do we try to feed
back our portability fixes upstream? Of course. Does upstream always want
them? Of course not. Are they wrong to not care? We sometimes think so. Are we
wrong to patch their code? They sometimes think so. They have their goals, we
have ours. If anyone reliably knows best about anything, it's users.

------
m-r-a-m
My favorite _interesting_ packaging choice is TeX Live in Fedora 18 [1]. There
are about 4500 texlive-* packages (out of around 35000 binary packages in
Fedora total). The packagers split up the packages based on upstream metadata.

[1]
[https://bugzilla.redhat.com/show_bug.cgi?id=949626](https://bugzilla.redhat.com/show_bug.cgi?id=949626)

~~~
bcl
s/interesting/insane/

~~~
andor
As a TeX user, I find this extremely useful. The Fedora packages map 1:1 to
Texlive packages. There is no need to research whether a LaTeX package is
available and in which Fedora package it is hidden; you can just install
"tex-packagename".

------
ochs
The solution is easy: if you fork a project and it becomes incompatible with
the upstream, _rename it_. How is anyone supposed to discriminate between the
two versions if they have the same name?

Also, I'd say, if your software needs lots of modified dependencies, you're
not communicating with those projects properly.

If every single project were to fork every one of their dependencies, the
result would be a maintenance nightmare.

~~~
notacoward
>if your software needs lots of modified dependencies, you're not
communicating with those projects properly.

This, a hundred times. The OP wants to bundle modified versions of other
people's open-source software as part of their own without feeding the changes
upstream properly, and that's just not the right way to do things.
Distributions' rules discouraging bundled packages are there because _even
worse things happen_ if everyone does that. Sometimes the dependent package
has to put off packaging a new release for a particular distro until their
dependencies are satisfied, but then it's time to put on big-girl panties and
move on. Managing dependencies and reducing version sensitivity are part of a
developer's job.

~~~
zaphar
A change that is only useful to you has little likelihood of being accepted
upstream. Changes that are only useful to you are far more frequent than you
seem to think.

~~~
notacoward
Don't assume you know what I think. I've had to grapple with this issue myself
many times. I've had to implement nasty workarounds because upstream rejected
a trivial patch. It's a pain, but spewing about how packagers all have OCD and
live in the past is hopelessly egocentric and whiny . . . and
counterproductive. They do know what they're doing, and their policies
generally do make sense if you consider what works across thousands of
packages instead of just one. Exceptions and accommodations can be made when
the benefit outweighs the cost or risk, but a case has to be made for that.
Throwing a tantrum isn't making a case.

------
Nux
There's also the security factor (that many devs today like to ignore); using
shared stuff simplifies it. Maybe packagers don't know best, but neither does
this guy.

~~~
kevingadd
You say this guy doesn't know better, but given that he's talking about
shipping a security sensitive application that relies on custom tuned, tested
forks of libraries, how can you say that he's wrong for not wanting his
library fork replaced with some arbitrary version on an end-user's machine?
How can that possibly be safer?

It's certainly nice to be able to take an existing library an app depends on,
patch it to fix a security hole, and drop that in. But that isn't what's
happening in this context...

~~~
binarycrusader
So the developer wants to reduce their own cost of properly engineering and
documenting their application's usage of a particular library, in exchange
for pushing onto the OS developer and their customers the significantly
increased cost of rebuilding and updating every software package that uses
the same libraries?

~~~
zaphar
Who said they didn't properly engineer and document their application's usage
of a particular library?

~~~
tantalor
By forking a library, you are not properly using it. You're using something
else.

Here "properly engineering and documenting" means pushing upstream changes to
officially support your use case, and documenting it so other people know why
your use case is important.

------
tehwalrus
I have two projects which follow different ideas on this (mostly due to size).

In both cases, I've basically written "lazy python bindings" for something in
C++ (lazy because I only support the features I want in pythonland). Neither
of the C++ projects is on github or anything, they're just hosted out there
somewhere else (one on SVN, and one only available as archives, I think.)

In the archive case, and since the codebase is small, I just included the
whole codebase in my git repo, and added a few small cpp, pyx and py files
around it. This library already has a fork, and has the most stars (like, 3)
of all my github repos - embedding all the required code and statically
linking (indeed, compiling) it as part of my `setup.py` works great, and is
easy for 3rd party users too.

In the SVN case, the main project is huge, like a few hundred MB of source
(and they use some crazy code generation, so that's not even the half of it.)
It also comes with its own very very basic python driver. So, my approach is
to give people two or three small patches, build instructions (the project is
a nightmare to build correctly,) and then my python code just installs on its
own and talks to the project as a normal python library. This version is
useless - it's permanently out of date, _I_ can't even get the build
instructions I wrote 3 months ago to work when I'm trying to set it up _for_
someone else, and the whole thing is a massive nightmare. If I'd forked it and
provided the huge source tree myself, that would be reduced - but that project
is also under active development and it'd be great to actually _use_ their
latest, least buggy version!

Each of these decisions was made the way it was for real, sensible reasons -
I'd hate for a package manager to have to contend with the mess of the second
project, and yet apparently that's the way they'd prefer to go with both!

Good job no one needs to use any of my code, really.

------
binarycrusader
While I sympathise with some of the complaints the developer has, the idea
that every software component should live as an isolated stack that duplicates
its entire set of dependencies is misguided.

OS administrators want a maintainable, supportable system that minimises the
number of security vulnerabilities they're exposed to and packages software in
a consistent fashion. They also want deterministic, repeatable results across
systems when performing installations or updates.

Likewise, keeping various components from loading multiple copies of the same
libraries in memory saves memory, which helps the overall performance of the
system.
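
You can see that sharing directly: the read-only pages of a shared library
are mapped once by the kernel and reused by every process that links it.

    $ grep libc /proc/self/maps | head -2
    # every dynamically linked process maps the same libc pages; a bundled or
    # statically linked copy would get its own private pages instead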

Also, statements like this aren't particularly helpful and are factually
inaccurate:

    
    
      So package maintainers, I know you have your particular
      package manager’s bible codified in 1992 by some grand
      old hacker beard, and that’s cool. However, that was
      twenty years ago, software has changed, hardware has
      changed and maybe it is time to think about these choices
      again. At least grant us, the developers of the software,
      the benefit of the doubt. We know how our software works
      and how it should be packaged. Honest.
    

Some packaging systems are actually fairly new (< 10 years old), and their
packaging rules were drawn up in the last five years, not twenty years ago as
the author claims. Nor are the people working on them grand, old, bearded
hackers.

OS designers are tasked with providing administrators and the users of the
administrated systems with an integrated stack of components tailored and
optimised for that OS platform. So developers, by definition, are generally
not the ones that know how to best package their software for a given
platform.

As for documentation not being installed by default? Many people would be
surprised at how many administrators care a great deal about not having to
install the documentation, header files, or unused locale support on their
systems.

Every software project has its own view of how its software should be
packaged, and while many OS vendors try to respect that, consistency is key to
supportability and satisfaction for administrators.

So, in summary:

* preventing shipping duplicate versions of dependencies can significantly reduce:

  - maintenance costs (packaging isn't free)

  - support provision costs (think technical support)

  - potential exposure to security vulnerabilities

  - disk space usage (which does actually matter on high multi-tenancy systems)

  - downtime (less to download and install during updates means the system is up and running faster)

  - potential memory usage (important for multi-tenancy environments or virtualised systems)

* administrators expect software to be packaged consistently regardless of the component being packaged

* some distributors make packaging choices due to lack of functionality in their packaging system (e.g. -dev and -doc packaging splits)

* administrators actually really care about not having unused components on their systems, whether that's header files, documentation, or locales

* in high multi-tenancy environments (think virtualisation), 100MB of documentation doesn't sound like much, until you realise that 10 tenants mean 10 copies of the docs, which is a wasted gigabyte; then consider thousands of virtualised hosts on the same system and suddenly it's a bit more important

* stability and compatibility guarantees may require certain choices that developers may not agree with

* supportability requirements may cause differences in build choices developers do not agree with (e.g. compiling with -fno-omit-frame-pointer to guarantee useful core files at a minor cost in performance for 32-bit; see the sketch below)
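
A minimal sketch of that last trade-off:

    $ gcc -O2 -g -fno-omit-frame-pointer -o myapp myapp.c
    # keeps the frame-pointer chain intact so core files and stack traces stay
    # walkable, at a small performance cost (most visible on 32-bit x86)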

I'd like to see the author post a more reasoned blog entry with specific
technical concerns that are actually addressable.

~~~
zaphar
"the idea that every software component should live as an isolated stack that
duplicates its entire set of dependencies is misguided"

That's not what he said. He said that packagers frequently break his software
for users by incorrectly breaking it up into the wrong pieces and then
including a version of that piece that doesn't work. It's especially bad in
the case of Erlang applications, as he enumerates, and it's caused by
packagers not taking the time to understand the consequences of where they
split the software into packages, all in the name of having only one version
of lib-erl-foo installed on your system.

~~~
binarycrusader
If the developer didn't make it clear that they had essentially forked Erlang
or what the component's requirements are, the blame lies with them, not the
packager.

If the developer did, then they need to reconsider how difficult they're
making the lives of their customers by forcing the potential for additional
vulnerability exposures on the system.

There's a non-zero cost involved in packaging.

------
mkhattab
I love that the OP brought up FreeSWITCH because this is one example where I
believe it's most troubling for package maintainers, software engineers and
system implementers alike. From a software engineer's perspective, including
3rd party libraries in one source tree transfers the burden of maintenance
and support to one project maintainer. Not reinventing the wheel is good and
all, but you still have to maintain its integrity.

From a package maintainer's perspective, especially in the case of Debian,
they must ensure that packages are stable and secure. It's their job to make
sure security updates are released. In the case of FreeSWITCH, there's no
distinction between the main source and its dependencies. Package maintainers
might as well not bother with including software like FreeSWITCH in their
repos or risk the integrity of their system.

System implementers are mostly ambivalent about these issues until their
distro's FreeSWITCH package includes broken dependencies or until their
FreeSWITCH installation has a security exploit due to a library that can't be
patched independently.

I love FreeSWITCH but I'm sorry to say that it's poorly architected. However,
I'm a system implementer, so I don't care.

------
mey
Along the same issue, see Debian (and as a result Ubuntu) and Ruby Gems. Used
to drive me up the wall (until I stopped bothering).

~~~
claudius
Yeah, these Ruby guys releasing incompatible versions every other weekend and
expecting to be allowed to blurt their stuff all over the system was rather
strange. Good thing Debian provided decent packages.

------
mwcampbell
Having read the arguments on this thread, and having seen the pathologies of a
single mega-repository of packages as in Debian (e.g. long release cycles,
breaking stability policies for the major web browsers), I think that Ian
Murdock's former company Progeny was on the right track with its component-
based Debian derivative. As I remember it, the idea was to have a small base
system, then have separate components for things like GNOME, Firefox,
OpenOffice.org (now LibreOffice), etc.

Meanwhile, Ubuntu's split between main and universe/multiverse is a pretty
good compromise. I wouldn't be disappointed if Ubuntu jettisoned universe and
multiverse, the better to focus on having a solid main repository, and let a
thousand small, focused repositories pick up the slack. As long as all of
those repositories leave the packages in main alone, as EPEL does with Red
Hat-based systems.

------
malkia
There is something to be said about the APIs themselves. For example, sqlite
is backwards compatible (interface-wise), but my recent worst example was the
perforce (p4) client library. It uses C++, and the folks there keep changing
member variables in the exposed interfaces, forcing us to recompile.

------
spullara
The real issue with bundling software is that you can't pull in a security
patch. You actually have the same issue with internal packages at large
companies. If you can stay on the current release you can drastically reduce
the effect of security bugs.

------
mwcampbell
I wonder which packagers the author is griping about primarily. I don't see
Riak in Debian.

------
jeremiahjordan
This is the reason our company creates its own packages and runs its own
repositories.

------
grigio
I think the right approach is packages for the operating system layer and
bundles for the applications.

It isn't very smart that a user must be root to install a GUI application.

------
MostAwesomeDude
If only Erlang had versioning in modules like other languages do. Modules are
hard, most languages get them wrong, and this should be fixed, but you
shouldn't blame packagers.

~~~
nosequel
You are missing the point. Even if erlang did support versioning of modules
the problem would still exist. Package maintainers arbitrarily break things up
because they immediately see a dependency and think it needs to be a separate
package. They do this completely ignoring the big picture of shipping solid /
tested code.

~~~
asuffield
On the contrary, the people maintaining packages in the distribution are
firmly on the side of shipping solid, tested code. That hacked up duplicate of
a library that you copied into your source tree? It does not have one
hundredth of the testing that has been applied to the version of the library
which every other package on the system uses. You have to think about the
system as a whole, not assume it is a bootloader for one application.

~~~
zaphar
The answer here is don't package the application using that library then.
You'll just ship broken software.

Or....

You can recognize that the author needed those patches to that library and
figure out some way to include them.

