Packagers don't know best (vagabond.github.io)
119 points by decklin on June 21, 2013 | 126 comments

The packagers actually do know what’s best. What they do makes patches flow faster not only downstream but also upstream. Improvements and fixes get to more people and get to them faster.

Unbundling upstream libraries from downstream projects flattens the change-flow network, reducing the time it takes for things to get fixed and for the fixes to propagate. For example, say that project P uses library L and bundles a slightly modified L in its release. Whenever L’s developers fix or improve or security-patch L, P’s users don’t get the new code. They have to wait for P’s developers to get around to pulling the new code from L, applying their own modifications, and re-releasing P.

Packagers say that’s crazy. They ask: Why does P need a modified L? Is it to add fixes or new features? If so, let’s get them into L proper, so that L proper will not only meet P’s needs but also provide those fixes and new features to everyone else. Is it because P’s version of L is no longer L except in name? Then let’s stop calling it L and confusing everybody. Fold the no-longer-L into P, or release it as a fork of L called M that can have a life of its own.

The point is that keeping L out of P causes two things to happen: (1) It ensures that when L’s developers improve L, all users, including P’s downstream users, get those improvements right away. (2) It ensures that when P’s developers improve L, those improvements flow upstream to L quickly and reach all of L’s users, too.

More improvements, to more people, faster. That's the idea.

I don't want to speak with any authority on this subject since it's mostly foreign to me. However, I will say that your explanation comes with a pretty big assumption: that a library sits on some one-dimensional spectrum from bad to good. If you don't subscribe to this notion, libraries don't improve; they simply change. If you accept this, then you begin to see why arbitrarily changing parts of a software package without even trying to understand the consequences of those changes is madness.

I'm not saying I disagree with you, just trying to point out a spot you might have overlooked.

Ok, I definitely have a bias toward what the grandparent is saying: flatten the hierarchy.

But just from a logical standpoint, couldn't you apply the following equally to the Riak guy who is complaining: "arbitrarily changing parts of a software package without even trying to understand the consequences of those changes is madness"

It absolutely is madness. If the Riak guys want to use leveldb in a way Google won't support, they should rally with the package managers and get Google to stop being "pretend open source." (Hint, Google: just releasing the source doesn't work if you ignore all bug reports and patches from outside.)

I suspect the real issue here is too much "Not Invented Here" syndrome by all parties involved.

That's silly. Leveldb is totally open source in every sense of the word. Just because you want to customize it in a way the authors didn't anticipate doesn't mean they are "pretending" to be open source. The fact that Riak is allowed to customize it in their own fork at all is largely because Google is not "pretending".

There are plenty of reasons to ignore patches from outside that are completely valid. Google gets to decide the direction of their fork of leveldb. If a patch doesn't fit that direction they are under no obligation to accept it.

It's not madness for Riak to want a divergent version of a package. Nor is it madness for the package maintainer not to desire to take that package in the direction that Riak wants. This is why we have forks in the first place, and it's perfectly fine.

In short: no, it doesn't equally apply to the Riak guy. The packager is responsible for cutting boundaries in the proper place; if they don't want to do the work of investigating that, then they shouldn't package it.

> If the Riak guys want to use leveldb in a way Google won't support, they should rally with the package managers and get Google to stop being "pretend open source."

They don't have any control over Google, by "rallying" or otherwise.

Consider the recent case of the WebKit/blink split. Here you have two sets of some of the world's smartest engineers, who cannot agree about how to render a webpage! And this is a well-defined problem. There are actual standards about how to render webpages! In theory, everybody agrees about what's going on here, and yet it's fork time.

As for what Library X does, there are no standards, and nobody necessarily agrees about what they are building. And let's be honest here, you probably do not have WebKit-caliber developers hacking on Library X. So the chance that you can arrive at consensus for Library X is much lower than for WebKit/blink.

Meanwhile, instead of dicking around with the will-they-won't-they-merge-upstream committee, you can just ship software that works in practice to people who want to use it. If you are a software developer, and you have the choice between writing software and arguing about it, it is usually a good bet to write software.

Well put, and this particular problem can easily be handled by having a prominent doc section called "Information for packagers" that outlines all this stuff. This isn't a new problem, and it seems to be best handled by engaging with the packagers and putting a small amount of effort into helping them; it's easy and pays enormous dividends.

The "information for packagers" doc section is often referred to as a Makefile.

It enumerates minimum versions of shared libraries, as well as explicit versions of static libraries.

The problem isn't with minimum versions; the problem is with maximum versions, which are potentially unknown at the time a project is released. I.e., the code works with the latest release of a library, but a future release of the library may break the project.
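The minimum-vs-maximum asymmetry can be sketched with a toy version-constraint checker (illustrative only; `parse_version` and `satisfies` are my own simplifications, not any real packaging tool's API):

```python
def parse_version(v):
    """Naive dotted-version parser (a simplification; real packaging
    tools also handle epochs, pre-releases, etc.)."""
    return tuple(int(part) for part in v.split("."))

def satisfies(installed, minimum, maximum=None):
    """A project can declare the minimum it was tested against, but the
    future version that will break it is usually unknown at release time,
    so maximum is typically left open."""
    v = parse_version(installed)
    if v < parse_version(minimum):
        return False
    if maximum is not None and v >= parse_version(maximum):
        return False
    return True

print(satisfies("1.2.8", "1.2.0"))  # True: meets the declared minimum
print(satisfies("2.0.0", "1.2.0"))  # True: constraint satisfied, but nobody ever tested 2.0
```

The second call is the whole problem in miniature: the declared constraint passes, yet the project may still break.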

The only real way to avoid that is for a project to include a suite of compliance tests that validate that an underlying library correctly performs the operations the project needs it to do. But this is actually a lot of work, so in reality it will almost never be done. Which leads us back to the discussion about the wisdom of packagers changing libs without understanding the effects on downstream projects.
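For instance, a minimal compliance test for a hypothetical project that leans on zlib might look like this (a toy sketch using Python's stdlib zlib binding; a real suite would exercise every operation the project actually depends on):

```python
import zlib

def check_zlib_compliance():
    """Toy compliance checks: verify that the installed zlib still
    performs the exact operations this hypothetical project relies on."""
    payload = b"the quick brown fox" * 100

    # Round-trip: decompress(compress(x)) must return x exactly.
    assert zlib.decompress(zlib.compress(payload)) == payload

    # Compression level 9 must be accepted and still round-trip.
    assert zlib.decompress(zlib.compress(payload, 9)) == payload

    # crc32 of the empty string is pinned to 0 by the zlib spec.
    assert zlib.crc32(b"") == 0

    return True

print(check_zlib_compliance())  # True if the installed zlib behaves as expected
```

Run against whatever library version the packager ships, this either passes or pinpoints exactly which assumed behavior broke.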

They really don't.

This is part of the reason for the plethora of Linux distributions. Some deployments can afford the rapid pace (and consequent instability) of the short term Ubuntu releases or Fedora. Other deployments really do require the longer term stability of the more methodical Ubuntu LTS releases or CentOS / RHEL.

Any improvement requires change, but not all changes are an improvement.

I'm not sure I was able to get my point across to you. Let me try another approach.

The improvement I'm talking about occurs upstream of the distributions, even though it is caused by the distributions' packaging policies.

Libraries are upstream from projects, and projects are upstream from distributions. If the distributions discourage projects from bundling libraries, this policy will encourage project developers to talk to the upstream library developers to get desired changes into the libraries, rather than go the customize-and-bundle route. This improved coordination and patch-flow benefits the users of the libraries and the users of the projects, regardless of whether those users rely on any particular distribution to get the software. Users are, as always, still free to pick whatever distribution best suits their preferences, or no distribution at all. Still, they benefit from the distributions' debundling policy.

I might be able to comment better if the term "projects" were better defined. It just depends on, well, the nature of the dependencies. Most libraries depend on other libraries. Often at least three layers deep.

If you could explain how your philosophy would deal with, for example, nginx and Apache both depending on libssl, which itself depends on libcrypto, which depends on libz and libc (both of which are also separate independent dependencies of Apache and nginx) then maybe we could discuss it better.

Oh, and in theory I should be able to swap libssl for libgnutls arbitrarily. How do we handle that?

You're conflating the dependency graph with the change-flow network. The first represents how projects rely on other projects; the second represents how changes must propagate to reach all users. Once you understand the difference, you'll understand why debundling is the sensible response to the sea of large-scale interdependent software-development projects that characterizes most FOSS ecosystems.

If I understand you correctly, you're saying the dependency graph and the "change-flow network" are completely orthogonal.

Separation of concerns is a value of good software projects. But there are practical realities that the author of the article enumerates specifically.

If there is a tight coupling between his application and a handful of upstream libraries, packagers are far more likely to break his application by distributing the latest version of that shared library. Other applications that aren't as tightly coupled can handle that upgrade. Since it is tightly coupled, he's going to be highly attuned to the upgrade needs for his specific statically compiled version.

They're not orthogonal; they're two directed graphs with the same vertices and different edges.

If the dependency graph G = (V, E) has a vertex for every software project and an edge x -> y iff downstream project y depends on upstream project x, then the change-flow network is the graph C = (V, F), where there is an edge x -> y in F iff there is a downstream path between x and y in G and also y requires an update and re-release when x changes (e.g., because it bundles a copy of x in its releases).

So if there is a change to project x, for it to flow to all affected dependents, you must update all downstream neighbors of x in the change-flow network C.

For example, consider the following dependency graph, in which library L is used by downstream library L2, and L2 by project P:

    L -> L2 -> P
If none of the projects bundle their upstream dependencies in their own releases, then the corresponding change-flow network has no edges, and updating any project requires only re-releasing its own package to satisfy all dependencies.

But if L2 bundles a copy of L, and P bundles a copy of L2, then the corresponding network looks like this:

    L -> L2
    L -> P
    L2 -> P

A change to L requires re-releasing not only L but also L2 and P. A change to L2 requires re-releasing L2 and also P.
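Here's a rough Python sketch of that construction (all names are my own invention, not from any real packaging tool), deriving the change-flow edges from a dependency graph plus a record of who bundles what:

```python
# Hypothetical sketch: derive the change-flow network C from a dependency
# graph G plus the set of bundling relationships.

def downstream_path_exists(deps, x, y):
    """True if there is a path x -> ... -> y in the dependency graph."""
    seen, stack = set(), [x]
    while stack:
        node = stack.pop()
        for nxt in deps.get(node, []):
            if nxt == y:
                return True
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False

def change_flow_edges(deps, bundles):
    """deps: {x: [direct dependents of x]}.
    bundles: set of (x, y) pairs meaning y ships its own copy of x
    (directly or via a bundled intermediate), so y must re-release
    whenever x changes."""
    vertices = set(deps) | {d for ds in deps.values() for d in ds}
    return {(x, y) for x in vertices for y in vertices
            if (x, y) in bundles and downstream_path_exists(deps, x, y)}

# The L -> L2 -> P example: L2 bundles L, and P bundles L2 (and with it L).
deps = {"L": ["L2"], "L2": ["P"]}
bundles = {("L", "L2"), ("L2", "P"), ("L", "P")}
print(sorted(change_flow_edges(deps, bundles)))
# [('L', 'L2'), ('L', 'P'), ('L2', 'P')]
# With bundles = set(), the change-flow network has no edges at all.
```

Each edge in the result is a re-release somebody has to do before a change reaches all users.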

Does that make more sense now?

You seem to misunderstand how static linking works.

If P statically links to its own version of L2, then L2 is just a part of P. The fact that there may be a dynamically linked version of L2 elsewhere on the system is irrelevant.


  L -> L2 -> P
  L -> L2 -> Q
  L -> L2 -> R

If the authors of L2 release a new version that P and Q are happy with, but creates an extremely subtle segfault condition in R, then what?

The packager could just wait to release the upgrade to L2 until all downstream packages have compatible releases.

The packager could backport a subset of the L2 patches that is still compatible with R (Redhat does this a lot).

The packager could silently curse the author of R for not statically linking the necessary frozen-in-time version of L2 and thus bypassing this problem entirely.

> If P statically links to its own version of L2, then L2 is just a part of P. The fact that there may be a dynamically linked version of L2 elsewhere on the system is irrelevant.

No, it's highly relevant because when a security fix lands for L2, it takes longer to propagate to users if projects like P bundle their own versions of L2 as part of their releases. In that case, users must wait for the project developers to work the already-released L2 fixes into their own bundled versions of L2 and then release new versions of the projects before any downstream users get the fix. But if P and other projects use the same version of L2 that everybody else does, everybody gets the fix right away.

> If the authors of L2 release a new version that P and Q are happy with, but creates an extremely subtle segfault condition in R, then what? ...

> The packager could silently curse the author of R for not statically linking the necessary frozen-in-time version of L2 and thus bypassing this problem entirely.

More likely, the packager would patch L2 to fix the problem with R and then talk to the upstream L2 developers to get the patch included in L2 proper. This way, R's users get the fix right away and the problem gets eliminated at its source, in L2, rather than papered over in R's private copy of L2.

As I wrote in my original post, one of the big benefits of the "no bundling" policy is to make sure that patches flow upstream to where they belong instead of piling up in downstream repos, where they do good for only one dependent project instead of all dependent projects.

I mean, sure, just typeset it in LaTeX and it'll breeze right past your Fortune 500 IT department's change management board.

Meanwhile, organizations too small for a change management board can outsource that function to Debian.

Fortune 500 companies use Debian, and mom and pop shops still have to deal with the fact that patches occasionally break critical services.

If you have never in your entire life run `apt-get upgrade` and spent the next four hours wishing you hadn't, then you are quite fortunate. Regardless of the size of your shop, if you're outsourcing patch approval blindly to any distribution, you probably aren't doing anything particularly interesting.

Yeah, I love how the article is so myopic that the author can't imagine a world in which he might be using a package that some other package is also using, and which therefore might need to be upgraded separately from his package. So the author has worked on two large projects that have dependencies, yet he thinks he has the experience to say that splitting a package up (say, into docs, libs and executables) is a bad thing? How many embedded devices has he administered? Or clusters? Or simple networks where things are set up to have NFS mounts across machines, and it's obvious that while you can install the docs on the NFS-doc-server once, you may need separate binary and library installs for each architecture/OS on the NFS-binary-servers? There's a reason sysadmins love well packaged software.

Sysadmin here.

Authors of well packaged software know when they need fine grained library features, and include static versions of that library. Authors of well packaged software also pay attention to the distribution of commonly used libraries, and make careful decisions when using system-provided shared versions of those libraries.

The author of the article is complaining when someone downstream overrides those decisions. If you're asking how many clusters one of the primary developers of Riak has run, you may not be reading closely enough.

Splitting a package into docs, libs and executables makes a lot of sense. Splitting those further, so you've got umpteen "independent" packages which 95% of users are just going to have to manually recombine to get the functionality the upstream package provides out of the box, can get pathological. Debian has historically been particularly bad at this, and Ubuntu inherited that tendency.

Please give an example of a pathological case in Debian.

Ruby in etch was pretty absurd, from memory.

In principle I agree. However, in my experience the time from me submitting a patch to L that I need for my new feature in P to work, until that patch makes it into a stable release of L that packagers actually ship, can be months.

In that time frame I'm stuck between not shipping a new version of P (often unacceptable as I've got users to answer to) or shipping my own slightly modified version of L.

The point of not using embedded libraries isn't about saving space. It's about not having several slightly different versions of the same bug spread out across several slightly different versions of the same library.

Saving space is just a nice side effect, so why not have that too?

The DLL hell problem doesn't exist in a GNU-based system because we have sonames. Windows and Mac OS X don't have those; instead, the software libraries there can't coordinate with each other harmoniously, so each program has to have all of its libraries packaged with its own set of bugs while making a hostile and rude gesture to the rest of the programs in the OS.
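Roughly, the soname convention works like this sketch (a simplification of the ELF soname/symlink scheme; the filenames and the `links` table are illustrative, not read from a real system):

```python
# Sketch of the soname convention: programs link against the soname
# (libz.so.1), which is a symlink to the concrete installed version.
# Bumping the major number (libz.so.2) signals an ABI break, so old
# and new series can coexist on one system without conflict.
links = {
    "libz.so": "libz.so.1",        # dev symlink, used at link time
    "libz.so.1": "libz.so.1.2.8",  # runtime soname symlink: ABI series 1
}

def resolve(name, links):
    """Follow symlinks until we reach a concrete file name."""
    while name in links:
        name = links[name]
    return name

print(resolve("libz.so", links))  # libz.so.1.2.8
```

Because programs record only the soname (`libz.so.1`), the packager can swap in `libz.so.1.2.9` behind it without touching or re-releasing any program, which is exactly the coordination the parent is describing.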

And yet, the Mac OS X user experience is so much nicer than the one you get with a GNU-based system; you download an app, it is self contained, it works, end of story. I have been hearing the same old story for years about how dependency-tracking package managers are the right way, and yet that environment continues to have problems, as described in the article; while the supposedly inferior Mac OS X packaging system just works, and I never have to mess with anything.

I am happy to give up a little extra disk space in exchange for having predictable executables that work in the configuration they were built and tested for.

> And yet, the Mac OS X user experience is so much nicer than the one you get with a GNU-based system; you download an app, it is self contained, it works, end of story. I have been hearing the same old story for years about how dependency-tracking package managers are the right way, and yet that environment continues to have problems, as described in the article; while the supposedly inferior Mac OS X packaging system just works, and I never have to mess with anything.

Clearly we have differing requirements. I've found the Debian user experience so much nicer than the one you get on OS X or Windows: you install a package with apt, it pulls in all dependencies, and it Just Works. I consider "self-contained" a bug and a warning sign that makes me start looking for a ten-foot pole; "self-contained" is another way of saying "inconsistent" and "not well integrated".

I am surprised to see you compare the OS X and Windows experiences. They seem completely dissimilar to me.

With Windows, it seems like every little thing needs some complex installer procedure that splats files all over the system. Every program needs dozens of DLLs, and the only way to get rid of it is to run an uninstaller that hopefully remembers everything it created, and even more hopefully doesn't break anything else on your machine.

With Mac OS X, there's generally no installation process at all. You download the app, and you put it where you want it to go, and then you run it, and that's it. Nothing goes anywhere and you don't need any special process to manage it.

The experience I've had on Ubuntu is sort of midway between these. There's a complex installation process, and everything has to deal with it, and shit gets plastered all over your machine and there's no way you can keep track of it all, but at least things generally mostly work most of the time.

But really: why manage complexity when you can do away with it?

> and you don't need any special process to manage it.

You're speaking of apt/packaging like it's a bad thing, when it's awesome. Want to download a new OSX or Win app? Open your browser, hunt it down, in the case of Win, figure out if you can trust the site, download it, open the downloaded item and do the install dance, and then have it stay sessile on your system, never updated... unless it has its own phone-home system - and now you have more crap on your system.

Want to install a package with package management? It takes literally seconds plus download time. Package management systems are all about doing away with managing complexity.

If you think OS X's approach has done away with it, you're wrong. Maybe you've found it a worthwhile tradeoff, but flattening each application's dependency tree is still a tradeoff: you get truly independent applications, but you pay for it in duplication, the costs of which are well enumerated in binarycrusader's post. Maybe you mostly don't encounter those costs. Maybe you even believe most people mostly don't encounter those costs. But they do exist.

Exactly. The "self-contained" approach means I can't just upgrade a library once and have everything using it Just Work with the features enabled (or bugs fixed) by the new version.

It's not going to just work all the time. There are so many possible libraries and possible versions of those libraries and their sub-libraries that it is possible, even likely, that nobody in the world has ever tested that precise combination of components before. You don't know whether it's going to work until you try it, and when it doesn't work, it's up to you to figure out what went wrong and fix it. This seems like a monumental waste of time. I would rather use a complete, monolithic application built and tested by the app developer, leave it the way it came, and upgrade it when the app developer has a new, complete, built, and tested version I can use.

Personally, I'd prefer a library version tested by a huge number of people running a given distribution than one tested only by the developer of a single package.

And the opposite approach means I only need to update glibc once and flash audio is broken.

> you install a package with apt, it pulls in all dependencies, and it Just Works

And what happens in the case of a package not available through apt? ("Sorcery! All applications must be packaged! All shared libraries must be packaged individually!")

And there went your afternoon, installing some just-slightly-uncommon piece of software.

Then you fall back to this 'superior' individual packaging method. I don't see how this is a counter.

If you packaging partisans manage to convince developers of the merits of your position, the individual packaging method won't be easily available: it'll be "wait for the OS maintainers to decide to include the newer version" or "make; make test; make install" (i.e., be your own packager).

The developers are still free to package it themselves and distribute installers as .deb, .rpm, etc. If they have a dependency not available in your package manager (or too old a version), they can either also send you that, or have you add their PPA. The only significant problem I see is that there is a fragmented market of packaging systems, which makes it difficult for individual developers to target everyone. Of course, *.tar.gz is a good universal installer for when someone doesn't use a major package manager.

We "packaging partisans" are just saying we find classical packaging superior to the every-piece-of-software-on-its-own model, not that the latter isn't an OK fallback if there is no package.

Or you set up pkgsrc under /opt or /usr/pkg or some such, and use that for the software where you want newer versions than your distro provides.

For applications it's easier, for sure. I've always liked how easy it is to install statically linked applications. Still, bugs happen.

Development libraries, command-line utilities, interpreters -- bread-and-butter development and unixy stuff, in other words, the things the average user doesn't see -- are usually a lot easier to get and keep up to date when you have a package manager.

Then one day somebody finds a critical bug in zlib and you need to go and fetch an update for every single application you have ever installed on your system. Then it turns out half of them don't have updates.

This is a fallacy. Just because a bug was exposed in a library you used does not mean your software will be exposed in the same way. Your use of the lib may not even overlap with the bug exposure. And I seriously doubt you carefully comb all the libs you select on a project for "security" before you release something.

Say there's a vulnerability or something in zlib's decompression, as GP suggested. That's gonna affect pretty much all software using zlib (i.e. a LOT). On a system where the package maintainers took care of having all packages use the system zlib, the whole problem is fixed by ONE ENTITY (the zlib maintainer or team) waking up and patching one package. Every user updates the zlib library package through their package manager (which informs them that they need to), and the vulnerability goes away for thousands of programs.

If, on the other hand, those thousands of programs all bundled zlib, the user won't be safe until hundreds of maintainers wake up and do (repeatedly the same) patching. Or even worse, if there isn't even a package management system, as some apparently want, the user has to also go fetch the fixed programs from thousands of upstreams. Oh, and the user also has to know about the vulnerability. Not gonna happen!

As we can see, the classical model reduces work duplication, reduces patching times and manpower need, and certainly takes a big responsibility off the user's shoulders.

Funny that you call the packaging scheme the "classical model" - to me that's the "[relatively] new and weird" scheme. I'd say that "everybody builds their own binaries" is the classical model.

This is the argument that comes up every time. Perhaps it matters to people who are running complex servers hosting an array of services. In my life, it never comes up. I am either using a personal machine or managing a server which is responsible for one single service.

The idea that upgrading one library could affect the behavior of hundreds of programs is terrifying. How do I know they all still work? I don't. I have to go test them all. What this means is that I never update any libraries at all, on a linux machine, because I can't know in advance what the upgrade might break.

My concern is not about keeping everything up to date; it is about keeping everything working. If there is a bug in one program then I want to update that program and only that program and no other programs at all. Then I can evaluate the behavior of the new program. If it is worse than the old behavior, I can hopefully go back to the old version. If it is better, then I can keep it. Nothing else should change.

This is exactly what I get on Mac OS X, and it's what I get when I build apps with statically linked libraries on Linux: stuff works until I break it, and then I know what I broke, so I can fix it.

When I let package managers update things for me, my system becomes an unknowable chaos of changing behavior. Instead, I simply never update anything until I am ready to pave the machine and start from scratch. I install everything I might want to use, then I disable updates and leave it alone until I am ready to start over.

> Funny that you call the packaging scheme the "classical model" - to me that's the "[relatively] new and weird" scheme. I'd say that "everybody builds their own binaries" is the classical model.

I stand corrected.

> In my life, it never comes up. I am either using a personal machine or managing a server which is responsible for one single service.

I just went back through my own update log for the last couple of months: You don't use anything that uses libxml [1], ffmpeg (audio/video Swiss army knife) [2], poppler (popular PDF library) [3] or openSSL [4]? Do you pay attention to the security notices of every program you have that bundled one of those? Are you sure that upstream is paying attention to the security notices of those libraries?

> The idea that upgrading one library could affect the behavior of hundreds of programs is terrifying. How do I know they all still work? I don't. I have to go test them all. What this means is that I never update any libraries at all, on a linux machine, because I can't know in advance what the upgrade might break.

For non-rolling distributions (for example Debian stable, Ubuntu, Fedora, RedHat) packages don't have their behavior changed throughout the lifetime of a release [6]. Security and bug fixes are backported to whatever version is in place for the lifespan of the distribution's release. This is part of the point of stable releases, and something that's routinely forgotten by those who hound package maintainers for newer versions of things!

As an example, let's consider [1]. Ubuntu 13.04 uses libxml2 version 2.9.0 plus some Debian/Ubuntu patches. They released version 2.9.0+dfsg1-4ubuntu4.1 with the following changelog [5]:

  * SECURITY UPDATE: multiple use after free issues
    - debian/patches/CVE-2013-1969.patch: properly reset pointers in
      HTMLparser.c, parser.c.
    - CVE-2013-1969
Only the security fix is applied, and after upgrade you know that libxml2 will continue as it has done for the lifetime of your distro release, except it's no longer vulnerable to CVE-2013-1969. Moreover, this applies to every program using libxml (i.e. a lot) with one update. Again: There's no other change in behavior for libxml!

> This is exactly what I get on Mac OS X, and it's what I get when I build apps with statically linked libraries on Linux: stuff works until I break it, and then I know what I broke, so I can fix it.

Stuff seems to work until you break it. Then a security hole is found, and it turns out things aren't really working very well at all. To me it seems like a lot of work to keep track of such things by yourself!

[1] http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2013-1969

[2] http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2013-2496

[3] http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2013-1790

[4] http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2013-0169

[5] https://launchpad.net/ubuntu/+source/libxml2/2.9.0+dfsg1-4ub...

[6] Firefox and a few other packages, mostly web browsers whose release practices make it hard to provide security backports, are notable exceptions.

I agree that the Mac packaging system is wonderful for installing programs, but I think this is because it does not even attempt to solve the problem of uninstalling programs.

Arguably, this is a good trade-off because users rarely uninstall software. But it means that if you ever become uncertain about the configuration state of a Mac, you're probably going to have to reinstall from scratch.

What?! This is total nonsense.

Application uninstalls are as trivial as dragging the application to the trash bin. No, this will not eliminate the application's data from ~/Library, etc, but 98% of the time you don't want that anyway. If you know what you're doing, it's usually a quick `rm -rf ~/Library/...` and you're done. Some poorly behaved apps stick stuff in other places or otherwise muck with your system, but now with the app store, that's no longer an issue.

And, if you're absolutely anal about deleting every single trace of an app, there are tools that automate the process. For example: http://www.appzapper.com/ -- But really, it's probably a waste of your time unless you had a badly behaved app go rouge. In my many years of Mac ownership, I've installed and uninstalled hundreds of apps and the only time I ever had to bang my head against the wall was when I used to use MacPorts and a Postgres install went haywire because of the same sort of packaging nonsense that the article is talking about.

> Application uninstalls are as trivial as dragging the application to the trash bin. No, this will not eliminate the application's data from ~/Library, etc, but 98% of the time you don't want that anyway.

When uninstalling an application, you usually do want to remove all of the application's components. How often do you say, "You know, I'd like to uninstall 25% of this application, even though the remaining 75% will just be dead weight without it"?

The two kinds of things there are configuration/settings, and the user data. When you uninstall Adium (for whatever reason), uninstalling the chat logs from the last five years is probably not part of your expected outcome. Uninstalling an application shouldn't reach into your homedir and delete things, either on Linux or OS X.

I do expect when I delete, say, Steam, it won't leave 20 GB of games I can't play sitting around in a totally invisible place. Similarly, when I delete some video or sound editing software, I don't expect it to leave several gigabytes of samples and filters lying around. Both of these are real situations I've encountered when people came to me asking why their hard disk was so ridiculously full.

I can understand not deleting things out of ~/Documents, but a lot of stuff that goes in the Library folders is not what users think of as data that should outlive the application.

Steam itself is sort of a package manager, so that's an interesting edge case...

However, I think that there are basically three categories of application data:

1) Documents -- these should never be deleted, and are not invisible.

2) Settings & other small data not worth deleting, probably nice to keep around in case you ever re-install. Most stuff.

3) Large semi-temporary files, like samples and other downloaded add-ons that are optional parts of the application.

I think OSX handles 1 & 2 well, but you're right, it needs a way to handle #3 too. However, I think that #2 is a much better default than #3.

I can see arguments both ways on things that are technically recoverable but might require lots of download time (like 20GB of games). However, your settings in those games (if not held on Steam's servers; I don't know where they are) should never be deleted by the application. If you're saying that non-recoverable settings and such should go in ~/Documents, I disagree with that, too: things in the Library folders aren't what the users think of as data, but they are stuff that the users will be upset are missing if they uninstall and reinstall.

That 'go rouge' is a neat typo, it sounds much naughtier than going rogue.

Riak could be contained in one OS X application bundle. All its dependencies in one directory. To uninstall, just trash it. It's up to the distributor.

Can you imagine using OSX as a development box without homebrew?

No, but homebrew doesn't break things up into an insane number of pieces either. In the context of this blog post, Riak on homebrew is a single package.

Yes, and my /usr/local isn't a freaking .git repository. Homebrew: bikeshedding what MacPorts achieved a decade earlier (and getting it horribly wrong) since 2009.

If by "achieved", you mean "aspired to". MacPorts never actually seemed to correctly build the things I needed. Of course, I gave up before 2009.

Sure, pkgsrc has made it easy to manage third-party software on Mac OS X for over a decade. Lots of other platforms too, with root or without. Homebrew is cool, but I'm pretty happy about being able to use the same package manager everywhere I go.

Yes. Never used homebrew. Not sure I'm missing anything.

MacPorts all the way, since it's sane.

The first thing I do with a new mac is install homebrew.

Your argument is disingenuous. While it might prevent a bug from spreading due to older/differing embedded libraries, it is equally likely to cause new bugs when library signatures change and some library function is suddenly gone.

When you bundle the libraries yourself, you only have to target the libraries you included. When you let the package manager do the magic for you, you have to target every version of the libraries, ever.

> when library signatures change, and some library function is suddenly gone.

No, no, no. This is not how you fix critical / security issues in a well maintained system. You either backport a single patch that fixes the problem without changing any signatures, or if you support a very old, incompatible software you reimplement the fix yourself. Then the release is not a new library. It's the old one + fix.

This is what the proper package maintenance is about. No functions should ever be "suddenly gone".

Also if you say in your installation requirements "this software requires libfoo >= 1.2.3, < 2.3.4", no sane package maintainer will disagree. Your application may be patched in the packaging process to work with a different supplied version, but most likely it will just get what's needed.
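To make that concrete, here is a hypothetical Debian control fragment (package and library names invented) expressing exactly that constraint; the package manager then enforces it at install time:

```
Package: myapp
Version: 1.0-1
Depends: libfoo1 (>= 1.2.3), libfoo1 (<< 2.3.4)
```

Note that Debian spells strict "less than" as `<<`; a candidate library version outside the range simply refuses to install rather than silently breaking the application.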

I think this is a key point that many people overlook the importance of when pressuring, say, Debian or Ubuntu maintainers for new package versions inside of a single release.

Just as a point of order, Windows has dynamic library versioning and multiple loading at least as good as sonames. This was first introduced in Windows 2000, 13 years ago.

I don't know about Windows, but OS X certainly does have that [1]. It's simply called by a different name.

[1] https://developer.apple.com/library/mac/documentation/develo...

And Windows has WinSxS, side-by-side assemblies... Then again, it's the thing that makes the C:\Windows\winsxs folder grow to a dozen gigabytes and more (like 9 different versions of msvcr90, which is only for VS2008).

Oh, and it's the thing that totally stops me from buying a Surface, since I did "dir /s" on one of the devices at the store.

"I know you have all these rules that try to make packages consistent so sysadmins don't have to give any extra thought to each individual one, but I'm a special snowflake that you should treat differently."

I care about having a system with hundreds or thousands of packages installed on it that all work consistently.

Linux is not OS X, and packages are not .dmg files; I want your package using the system version of libfoo, not your own fork of libfoo. If you have awesome changes to libfoo, then you should either get them into upstream libfoo, or go all the way and actually fork libfoo into libfooier upstream to allow packaging it separately.

Reading your comment, I am starting to understand why everyone is so confused about this.

Linux and OS X both have the same underlying options for static or shared libraries. There is a large amount of "enterprise" software that is distributed just like a .dmg file.

There is plenty of middle ground between having everything dynamically linked and everything statically linked. The author of the article believes that packagers should trust developers to make good decisions. (Granted, there are plenty of bad developers and abandonware galore. Packagers are justified in stepping in to make new decisions here.)

As a sysadmin, which do you prefer: self-contained software distributions with all the dependencies included, or packages that use the host distro's package manager, use system versions of libraries wherever possible, and otherwise integrate well with the host system? The latter seems better to me, but it's an honest question, not rhetorical.

For my boxes I'm at the point of striking a middle ground. I use system versions of libraries wherever possible, but in many cases I compile the actual applications from source. The package manager tends to be too many versions behind, and a lot of the software I use is in active enough development that the difference is actually of significant importance. If the package managers updated more quickly, I'd use them.

Edit: Also, the package managers have an awful habit of installing dependencies that are not actually dependencies. Drives me up a wall.

False dichotomy---bundling dependencies doesn't preclude integrating well with the host system. See Basho's latest Riak package for Ubuntu 12.04:


It integrates well with the host system (init script, "riak" user/group, data in /var/lib/riak, logs in /var/log/riak, etc.) and bundles dependencies in /usr/lib/riak.

In my experience, bundling dependencies is often the only practical way to install a complex app. Take Sentry as another example:


The current version (5.4.5) depends on 37 Python packages:

Ruby or Node apps have dependency trees of similar or greater size. It takes an enormous amount of effort to roll all of these as individual packages (yes, I've done it) and it's a colossal waste of time once you realize you can have a working package in under a minute with virtualenv, pip, and fpm:

  $ virtualenv --distribute /opt/sentry
  $ /opt/sentry/bin/pip install sentry
  $ fpm -n sentry -v 5.4.5 -s dir -t deb /opt/sentry

Every organization's needs are different. A large operation like Heroku has a large server footprint, but a very uniform build. A large operation like General Electric might have a large server footprint, but with all sorts of bizarre business requirements that you or I couldn't even imagine.

(I spent the first half of my career in the ISP business.) If I see something as an infrastructure cost that scales slowly but surely, I want a standard distro package wherever I can get one. Sometimes you're in an environment where you've got extremely specific vendor-or-client-imposed requirements, and the best you can hope to do is standardize your configuration / deployment process.

I've had environments where I cared more about CPAN or PyPI than yum or apt (or RHN or roll your own). If I have N00 servers doing the same thing or N00 servers doing a variety of things, the answer shifts.


Very few people understand the importance, and benefits, when they have never seen anything other than .msi[1], .dmg, or worse, .zip.

[1]: Let's just pretend for a minute that Windows software only comes as plain MSI... all the various .exe's which splatter stuff all over the disk simply don't exist.

Look at this Ubuntu erlang package: it depends on 40 other packages as well. That isn't even the worst of it: if you type 'erl', it tells you to install 'erlang-base', which only has a handful of dependencies, none of which are any of these Erlang libraries!

That package is a dummy package that depends on erlang-base and the rest of the base erlang platform. You would have to force dpkg to ignore dependencies in order to install erlang without erlang-base. I would love to hear how that happened.

Splitting things up into multiple packages makes distributions easier to manage. One person can take the lead on package-dev while another person can take the lead on package-doc. Splitting things up into multiple smaller packages also makes distributing fixes a lot easier. With a one-line fix to one include, would you rather send out the entire Erlang environment or just the small package that needed the fix?

And yes splitting things up to save storage requirements is most useful for resource constrained devices, not new servers/laptops. But it means that a user who is comfortable with Debian or Fedora on the server/desktop can use their same trusty OS on their next project when the device places serious restrictions on system overhead.

You misunderstood the point the author made about erlang-base: it's not that he or she somehow installed Erlang without installing erlang-base. Rather, if on an Ubuntu system you try to run 'erl' before installing any Erlang packages at all, you receive a message telling you something like "to get the erl command, install the package 'erlang-base'", and if you go and do that, you don't get the Erlang standard library! The point is that Ubuntu should either suggest 'erlang' instead, or not have all those separate tiny packages in the first place.

Ahh, this is just a simple bug in the command-not-found package that makes those recommendations when you type a missing binary, not an underlying problem with the entire philosophy of splitting packages!

I'm not sure it's really a bug with command-not-found: I think in this case it correctly gives the package containing the erl command, namely erlang-base; the problem is that Ubuntu decided that there should exist an erlang-base package that gives you erl without the standard library.

Nice catch. It did not occur to me that people would run a program without installing it, use command-not-found, and/or ignore the Suggests list when installing a package. As another commenter pointed out, this has nothing to do with splitting packages up. It's a toss-up between user error and a bug in c-n-f.

I'm going to go ahead and plug the Nix package manager here:

  Nix is a purely functional package manager. This means
  that it can ensure that an upgrade to one package cannot
  break others, that you can always roll back to previous
  version, that multiple versions of a package can coexist
  on the same system, and much more.
So you can all have your own versions of lager or whatever, and still have everything managed sort of nicely. Doesn't solve the "include the docs or not" problem, though. And I'm not sure if it does anything for tmoertel's patching concerns.


Nix is possibly the only system I know of other than maybe homebrew, strangely enough, that has designed itself sanely.

It solves the security issue, the space issue, and also the "I need special patches for my version of this lib in my unique application" issue.

I had never heard of this and it looks awesome, thanks for the link.

Upstream developers don't know best, either. Packagers sometimes make bad decisions, just like upstream does, because we're all people. "Install our software the way we think you should" is a point of view, but not a very smart one unless it's accompanied by a willingness to be persuaded otherwise. This particular upstream developer clearly hasn't seen an OS-agnostic cross-platform package manager like pkgsrc, where one of the packager's tasks is often to make software more portable than upstream cares to bother with. To take one obvious example, we make sure libtool works on all our supported platforms, and then we make sure software with its own precious all-the-world's-a-Linux way of linking shlibs uses libtool instead. Do we try to feed back our portability fixes upstream? Of course. Does upstream always want them? Of course not. Are they wrong to not care? We sometimes think so. Are we wrong to patch their code? They sometimes think so. They have their goals, we have ours. If anyone reliably knows best about anything, it's users.

My favorite interesting packaging choice is TeX Live in Fedora 18 [1]. There are about 4500 texlive-* packages (out of around 35000 binary packages in Fedora total). The packagers split up the packages based on upstream metadata.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=949626


As a TeX user, I find this extremely useful. The Fedora packages map 1:1 to Texlive packages. There is no need to research if a LaTeX package is available and in which Fedora package it is hidden, you can just install "tex-packagename".

It does sound insane to have so many packages, but it also follows Fedora's policy of staying as close to upstream as possible. The entire package building process is automated since TeX Live provides the packaging metadata with the source.

They also include the meta packages [1] so you don't have to install every package individually if disk space is not an issue.

[1] https://fedoraproject.org/wiki/Features/TeXLive#Benefit_to_F...

Why is this insane? Seriously, what problems does it cause, hypothetical or otherwise?

The solution is easy: if you fork a project and it becomes incompatible with the upstream, rename it. How is anyone supposed to discriminate between the two versions if they have the same name?

Also, I'd say, if your software needs lots of modified dependencies, you're not communicating with those projects properly.

If every single project were to fork every one of their dependencies, the result would be maintenance nightmare.

>if your software needs lots of modified dependencies, you're not communicating with those projects properly.

This, a hundred times. The OP wants to bundle modified versions of other people's open-source software as part of their own without feeding the changes upstream properly, and that's just not the right way to do things. Distributions' rules discouraging bundled packages are there because even worse things happen if everyone does that. Sometimes the dependent package has to put off packaging a new release for a particular distro until their dependencies are satisfied, but then it's time to put on big-girl panties and move on. Managing dependencies and reducing version sensitivity are part of a developer's job.

A change that is only useful to you has little likelihood of being accepted upstream. Changes that are only useful to you are far more frequent than you seem to think.

Don't assume you know what I think. I've had to grapple with this issue myself many times. I've had to implement nasty workarounds because upstream rejected a trivial patch. It's a pain, but spewing about how packagers all have OCD and live in the past is hopelessly egocentric and whiny . . . and counterproductive. They do know what they're doing, and their policies generally do make sense if you consider what works across thousands of packages instead of just one. Exceptions and accommodations can be made when the benefit outweighs the cost or risk, but a case has to be made for that. Throwing a tantrum isn't making a case.

> if you fork a project and it becomes incompatible with the upstream, rename it

I agree completely, but I'd like to take your idea further in a direction you likely didn't intend.

The fundamental observation of distributed version control systems, in my opinion, is: Every commit is essentially a fork.

When you combine ideas 1) fork->rename and 2) change==fork with 3) the identities & values from FP/Clojure/etc., you realize that version numbers are complete folly.

Coincidentally, I just wrote about this with respect to SemVer: http://www.brandonbloom.name/blog/2013/06/19/semver/

In short, if you have awesomelib and make an incompatible version, you can call it awesomelib2. Or you could call it veryawesomelib or whatever else you want. If you give up on the silly idea of being able to compare version numbers, then versioning and naming become equivalent.
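As a small aside on how slippery even "comparing" version numbers is, GNU sort's -V mode implements one common version ordering (a sketch; plain lexical comparison gets it wrong):

```shell
# Lexical ordering says "1.10" sorts before "1.9"; version-aware
# ordering disagrees.
printf '1.9\n1.10\n' | sort | tail -1      # lexical: prints 1.9
printf '1.9\n1.10\n' | sort -V | tail -1   # version-aware: prints 1.10
```

Two widely deployed tools on the same machine can disagree about which of two releases is "newer", which is part of why naming schemes that sidestep comparison are attractive.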

Version numbers only appear to be folly because proper engineering discipline has not been applied when managing the stability and/or backwards-compatibility of shared interfaces.

If more developers cared about versioning their software appropriately based on incompatible changes or stability guarantees, it would significantly reduce the costs of maintaining OS software distributions and providing integrated software stacks to users.

> Version numbers only appear to be folly because proper engineering discipline has not been applied when managing the stability and/or backwards-compatibility of shared interfaces.

Encoding intelligence (beyond, perhaps, simple sequence) in version numbers for software is fundamentally folly.

Encoding intelligence about compatibility in version numbers of APIs is only folly to the extent that "proper engineering discipline has not been applied when managing the stability and/or backwards-compatibility of shared interfaces."

Confusing what makes sense with software and what makes sense with APIs is as problematic as any other confusion of interface with implementation.

I think we're saying the same things.

The purpose of libraries is generally to provide an API to one or more consumers.

I'm talking about versioning as applied to the library, as a representative of a set of interfaces provided.

Not as some sort of runtime detection mechanism.

> If more developers cared about versioning their software appropriately based on incompatible changes or stability guarantees

But they don't. And you're not going to be able to make them. And even if you did, people would disagree about what constitutes compatibility, stability, and engineering discipline. One man's "breaking change" is another man's "that was an implementation detail". It's not possible to get this right, since first you need to define "right". That's why versioning is folly.

There's also the security factor (that many devs today like to ignore); using shared stuff simplifies it. Maybe packagers don't know best, but neither does this guy.

You say this guy doesn't know better, but given that he's talking about shipping a security sensitive application that relies on custom tuned, tested forks of libraries, how can you say that he's wrong for not wanting his library fork replaced with some arbitrary version on an end-user's machine? How can that possibly be safer?

It's certainly nice to be able to take an existing library an app depends on, patch it to fix a security hole, and drop that in. But that isn't what's happening in this context...

So the developer wants to reduce their cost of properly engineering and documenting their application's usage of a particular library in exchange for significantly increasing the costs of rebuilding and updating every software package that uses the same libraries onto the OS developer and their customers?

Who said he didn't properly engineer and document his application's usage of a particular library?

By forking a library, you are not properly using it. You're using something else.

Here "properly engineering and documenting" means pushing upstream changes to officially support your use case, and documenting it so other people know why your use case is important.

I have two projects which follow different ideas on this (mostly due to size).

In both cases, I've basically written "lazy python bindings" for something in C++ (lazy because I only support the features I want in pythonland). Neither of the C++ projects is on github or anything, they're just hosted out there somewhere else (one on SVN, and one only available as archives, I think.)

In the archive case, and since the codebase is small, I just included the whole codebase in my git repo, and added a few small cpp, pyx and py files around it. This library already has a fork, and has the most stars (like, 3) of all my github repos - embedding all the required code and statically linking (indeed, compiling) it as part of my `setup.py` works great, and is easy for 3rd party users too.

In the SVN case, the main project is huge, like a few hundred MB of source (and they use some crazy code generation, so that's not even the half of it). It also comes with its own very, very basic Python driver. So my approach is to give people two or three small patches, build instructions (the project is a nightmare to build correctly), and then my Python code just installs on its own and talks to the project as a normal Python library. This version is useless: it's permanently out of date, I can't even get the build instructions I wrote 3 months ago to work when I'm trying to set it up for someone else, and the whole thing is a massive nightmare. If I'd forked it and provided the huge source tree myself, that would be reduced -- but that project is also under active development and it'd be great to actually use their latest, least buggy version!

Each of these decisions was made the way it was for real, sensible reasons - I'd hate for a package manager to have to contend with the mess of the second project, and yet apparently that's the way they'd prefer to go with both!

Good job no one needs to use any of my code, really.

While I sympathise with some of the complaints the developer has, the idea that every software component should live as an isolated stack that duplicates its entire set of dependencies is misguided.

OS administrators want a maintainable, supportable system that minimises the number of security vulnerabilities they're exposed to and packages software in a consistent fashion. They also want deterministic, repeatable results across systems when performing installations or updates.

Likewise, keeping various components from loading multiple copies of the same libraries in memory saves memory, which helps the overall performance of the system.

Also, statements like this aren't particularly helpful and are factually inaccurate:

  So package maintainers, I know you have your particular
  package manager’s bible codified in 1992 by some grand
  old hacker beard, and that’s cool. However, that was
  twenty years ago, software has changed, hardware has
  changed and maybe it is time to think about these choices
  again. At least grant us, the developers of the software,
  the benefit of the doubt. We know how our software works
  and how it should be packaged. Honest.
Some packaging systems are actually fairly new (< 10 years old), and the rules determined for packaging software with that system have actually been determined in the last five years, not twenty years ago as the author claims. Nor are the people working on them grand, old, bearded hackers.

OS designers are tasked with providing administrators and the users of the administrated systems with an integrated stack of components tailored and optimised for that OS platform. So developers, by definition, are generally not the ones that know how to best package their software for a given platform.

As for documentation not being installed by default? Many people would be surprised at how many administrators care a great deal about not having to install the documentation, header files, or unused locale support on their systems.

Every software project has its own view of how its software should be packaged, and while many OS vendors try to respect that, consistency is key to supportability and satisfaction for administrators.

So, in summary:

* preventing shipping duplicate versions of dependencies can significantly reduce:

- maintenance costs (packaging isn't free)

- support provision costs (think technical support)

- potential exposure to security vulnerabilities

- disk space usage (which does actually matter on high multi-tenancy systems)

- downtime (less to download and install during updates means system is up and running faster)

- potential memory usage (important for multi-tenancy environments or virtualised systems)

* administrators expect software to be packaged consistently regardless of the component being packaged

* some distributors make packaging choices due to lack of functionality in their packaging system (e.g. -dev and -doc packaging splits)

* administrators actually really care about not having unused components on their systems, whether that's header files, documentation, or locales

* in high multi-tenancy environments (think virtualisation), 100MB of documentation doesn't sound like much, until you realise that 10 tenants mean 10 copies of docs, which is a wasted gigabyte; then consider thousands of virtualised hosts on the same system and suddenly it's a bit more important

* stability and compatibility guarantees may require certain choices that developer may not agree with

* supportability requirements may cause differences in build choices developers do not agree with (e.g. compiling with -fno-omit-frame-pointer to guarantee useful core files at minor cost in perf. for 32-bit)

I'd like to see the author post a more reasoned blog entry with specific technical concerns that are actually addressable.

"the idea that every software component should live as an isolated stack that duplicates its entire set of dependencies is misguided"

That's not what he said. He said that packagers frequently break his software for users by incorrectly breaking it up into the wrong pieces and then including a version of a piece that doesn't work. It's especially bad in the case of Erlang applications, as he enumerates, and it's caused by packagers not taking the time to understand the consequences of where they split the software into packages, all in the name of having only one version of lib-erl-foo installed on your system.

If the developer didn't make it clear the they had essentially forked Erlang or what the component's requirements are, the blame lies with them, not the packager.

If the developer did, then they need to reconsider how difficult they're making their customers' lives by forcing the potential for additional vulnerability exposure onto the system.

There's a non-zero cost involved in packaging.

> OS administrators want a maintainable, supportable system that minimises the number of security vulnerabilities they're exposed to and packages software in a consistent fashion.

I am the administrator of the machines I use. I am also the user of the machines I use. I care far, far more about my experience as a user than I do as an administrator. The less administrating I have to do the better. As the administrator of my array of personal computers what I want is for everything to work, and to stay working, and for new things never, ever, under any circumstances to break old things.

As a user of the machines I use, I value consistency. I don't want to have to hunt down documentation in weird and wonderful places, I want to go to one spot and have it there.

I love that the OP brought up FreeSWITCH, because this is one example where I believe it's most troubling for package maintainers, software engineers and system implementers alike. From a software engineer's perspective, including 3rd party libraries in one source tree transfers the burden of maintenance and support to one project maintainer. Not reinventing the wheel is good and all, but you still have to maintain its integrity.

From a package maintainer's perspective, especially in the case of Debian, they must ensure that packages are stable and secure. It's their job to make sure security updates are released. In the case of FreeSWITCH, there's no distinction between the main source and its dependencies. Package maintainers might as well not bother with including software like FreeSWITCH in their repos or risk the integrity of their system.

System implementers are mostly ambivalent about these issues until their distro's FreeSWITCH package includes broken dependencies, or until their FreeSWITCH installation has a security exploit due to a library that can't be patched independently.

I love FreeSWITCH but I'm sorry to say that it's poorly architected. However, I'm a system implementer, so I don't care.

Along the same issue, see Debian (and as a result Ubuntu) and Ruby Gems. Used to drive me up the wall (until I stopped bothering).

Yeah, these Ruby guys releasing incompatible versions every other weekend and expecting to be allowed to blurt their stuff all over the system was rather strange. Good thing Debian provided decent packages.

Having read the arguments on this thread, and having seen the pathologies of a single mega-repository of packages as in Debian (e.g. long release cycles, breaking stability policies for the major web browsers), I think that Ian Murdock's former company Progeny was on the right track with its component-based Debian derivative. As I remember it, the idea was to have a small base system, then have separate components for things like GNOME, Firefox, OpenOffice.org (now LibreOffice), etc.

Meanwhile, Ubuntu's split between main and universe/multiverse is a pretty good compromise. I wouldn't be disappointed if Ubuntu jettisoned universe and multiverse, the better to focus on having a solid main repository, and let a thousand small, focused repositories pick up the slack. As long as all of those repositories leave the packages in main alone, as EPEL does with Red Hat-based systems.

There is something to be said about the APIs themselves. For example, sqlite is backwards compatible (interface-wise), but my recent worst example was the Perforce (p4) client library. It uses C++ and the folks keep changing member variables in the exposed interfaces, forcing us to recompile.

The real issue with bundling software is that you can't pull in a security patch. You actually have the same issue with internal packages at large companies. If you can stay on the current release you can drastically reduce the effect of security bugs.
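As an illustration of why shared linking helps here: one patched copy of a library on disk is picked up by every dynamically linked consumer, and you can see what a binary resolves at load time with ldd (assumes a Linux system):

```shell
# Each line is a shared object resolved at load time. Updating that
# single copy on disk fixes every consumer at once, whereas a bundled
# static copy needs its own rebuild and redeploy.
ldd /bin/ls
```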

I wonder which packagers the author is griping about primarily. I don't see Riak in Debian.

This is the reason our company creates their own packages and runs our own repositories.

I think the right approach is packages for the operating-system layer and bundles for the applications.

It isn't very smart that a user must be root to install a GUI app.

If only Erlang had versioning in modules like other languages do. Modules are hard, most languages get them wrong, and this should be fixed, but you shouldn't blame packagers.

Erlang applications get versions, and you can specify them when you build releases, which are Erlang's way of packaging self-contained executables, through a platform-independent mechanism.

The thing is, Erlang assumes that things work from said releases, and finds the newest available applications in its library path. This makes sense because it is entirely possible for an Erlang application that was upgraded without ever being shut down to want to roll back to older versions.

When this happens, the application has a path with all the libraries and dependencies it ever needed and can roll back to an older version (without shutting down), or start fresh from the newest one automatically.

Other metadata may be added by each release as required.
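As a sketch of what this looks like in practice: a release is assembled from a `.rel` file that pins the runtime version and the version of every application on the release's code path (the file name and version numbers below are hypothetical):

```erlang
%% myapp.rel -- hypothetical release specification. The release
%% tooling (systools/reltool/relx) uses this to assemble a
%% self-contained release with exactly these application versions
%% available, which is what makes live upgrade and rollback possible.
{release,
 {"myapp", "1.2.0"},            % release name and version
 {erts, "5.10.1"},              % Erlang runtime system version
 [{kernel, "2.16.1"},           % pinned library versions
  {stdlib, "1.19.1"},
  {sasl,   "2.3.1"},
  {myapp,  "1.2.0"}]}.
```

Splitting these pinned applications into separate OS packages, each upgraded on its own schedule, is exactly what defeats this mechanism.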

The thing is that experienced Erlang developers who write and ship products in Erlang will know this and try to build releases and packages that respect it. Then package managers will (often) undo it to fit whatever pattern they have in mind. They did it, for example, with Ubuntu, removing one of the test frameworks that is part of the standard library and putting it in a different package.

Users who tried the language for the first time couldn't run things that depended on the standard library because it was split across many different packages.

You are missing the point. Even if Erlang did support versioning of modules, the problem would still exist. Package maintainers arbitrarily break things up because they immediately see a dependency and think it needs to be a separate package. They do this while completely ignoring the big picture of shipping solid, tested code.

On the contrary, the people maintaining packages in the distribution are firmly on the side of shipping solid, tested code. That hacked up duplicate of a library that you copied into your source tree? It does not have one hundredth of the testing that has been applied to the version of the library which every other package on the system uses. You have to think about the system as a whole, not assume it is a bootloader for one application.

The answer here is don't package the application using that library then. You'll just ship broken software.

You can recognize that the author needed those patches to that library and figure out some way to include them.

Erlang releases appear to be their solution to this.

I fail to understand how, exactly, the packagers' choice not to use the release is Erlang's (or Riak's) fault.
