So you want to write a package manager (medium.com)
173 points by sdboyer 441 days ago | 108 comments



This article describes the status quo of package managers. Lock files, dependency resolution algorithms, etc. However, these are obsolete. The functional package management paradigm approaches the problem from a different angle which is able to precisely describe the full dependency graph for a piece of software, all the way down to the bootstrap binaries (the C compiler's compiler, etc.). In this system there is no dependency resolution, because each package is explicit in what it depends on. A dependency is much more than a version number. The novel idea in functional package management is that a package can be viewed as a pure function whose arguments are the dependencies, the source code, the build script, etc. and whose output is the built binary. By viewing packaging from this perspective, functional package managers overcome the problems that plague traditional package managers, allowing for unprivileged package management, transactional upgrades and roll backs, multiple variants of a package coexisting without conflict, etc. One such implementation of functional package management, the one I hack on and recommend checking out, is GNU Guix.

https://gnu.org/software/guix/manual/html_node/Introduction....
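
To make the "package as a pure function" idea concrete, here is a minimal sketch in Python. It is not how Guix or Nix actually compute store paths; the `Package` fields, the `/store/` prefix, and the hash truncation are all assumptions made for illustration. The point is only that the output location is derived from every input, so changing any dependency yields a different, non-conflicting path.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Package:
    """A package described purely by its build inputs (illustrative fields)."""
    name: str
    version: str
    source_sha256: str      # hash of the source tarball
    build_script: str       # the build recipe, as text
    inputs: tuple = ()      # direct dependencies, as Package objects


def store_hash(pkg: Package) -> str:
    """Hash every input of the build, recursing through the dependencies."""
    payload = {
        "name": pkg.name,
        "version": pkg.version,
        "source": pkg.source_sha256,
        "build": pkg.build_script,
        # Each dependency contributes its own hash, so a change deep in the
        # graph changes the hash of everything built on top of it.
        "inputs": sorted(store_hash(dep) for dep in pkg.inputs),
    }
    encoded = json.dumps(payload, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()[:32]


def store_path(pkg: Package) -> str:
    """Hypothetical install location derived from the input hash."""
    return f"/store/{store_hash(pkg)}-{pkg.name}-{pkg.version}"


# Two builds of the same program against different libc versions.
libc_a = Package("libc", "2.23", "aaa", "make")
libc_b = Package("libc", "2.24", "bbb", "make")
hello_a = Package("hello", "2.10", "ccc", "make", (libc_a,))
hello_b = Package("hello", "2.10", "ccc", "make", (libc_b,))

print(store_path(hello_a))
print(store_path(hello_b))  # differs from the line above
```

Because the two `hello` builds land in different directories, both variants can be installed side by side, which is the coexistence property described above.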


I dislike the whole 'functional' package manager idea as it just punts the work onto people to add explicit deps when most of the dependency information is within the code itself. Prefixing with a hash is also not ideal in my view, as there are about a dozen other ways to do the same thing without such harsh consequences. An ideal package manager could prove that package y is a valid substitute for package x and analyze source for deps, which gets into the deep end of the pool (symbolic execution, static analysis, etc.).


> The novel idea in functional package management is that a package can be viewed as a pure function whose arguments are the dependencies, the source code, the build script, etc. and whose output is the built binary.

And this is an awesome idea. Cryptographically hashing your install and keeping the install spec around for later is very powerful.

> Lock files, dependency resolution algorithms, etc. However, these are obsolete.

These are still very important (and hard) problems. Claiming that they’re obsolete ignores what they’re used for.

> The functional package management paradigm approaches the problem from a different angle which is able to precisely describe the full dependency graph for a piece of software, all the way down to the bootstrap binaries (the C compiler's compiler, etc.).

I suppose this is fine from the perspective of a system package manager. But at the application development level, there are not very many users who want to specify all that.

If you want to play around with different software stacks in Guix and Nix, you have to actually write packages, which involves tweaking a lot of files. There aren’t so many people with the intestinal fortitude to get down to that level. I want a system where I can say “try running with a different version/build/configuration of this particular dependency”, and then the PM figures out what else needs to be done to accommodate that. Users don’t want to tweak package files a lot, and even less do they want to manage the profusion of version/configuration-specific package files that result from doing such experimentation.

We’ve developed Spack as kind of a compromise between these extremes. Spack [1] has builds parameterized by version, compiler, build variants, compiler flags (soon), etc. and it attempts to let the user experiment with a large combinatorial space of packages. You can do that with a command-line syntax (without editing package files), and you can specify tweaks down to the dependencies with a recursive syntax [2].

I support code teams at LLNL who want to experiment with many different compiler flags, dependency versions, etc., across Blue Gene/Q, Cray, Linux, GPU, and Xeon Phi machines, and none of them want to specify everything quite as rigorously as Nix/Guix demand. What we want is really good build parameterization and a concise way to specify it. I don’t see many Guix/Nix builds written that way — they’re all tied to specific versions and you’d need a new package file to support a new version.

> functional package managers overcome the problems that plague traditional package managers, allowing for unprivileged package management, transactional upgrades and roll backs, multiple variants of a package coexisting without conflict, etc.

That’s one way to look at it. The other is that these systems ignore these problems. What you’re really offering is a really good reproducible build system, at the price of rigorous specification. Reproducibility is a big deal, and there are great reasons to do this, but you can’t say the other systems are “obsolete” when they have very different goals.

What Nix/Guix are not doing is reducing the burden of specification. You write down all the details of a very specific, reproducible software stack, but you do not make that software much more composable or extensible than it already was. The user can install the version combination that you packaged, but can they easily try their own? Can they easily try to build with a different compiler/compiler version/set of compiler flags/dependency version/etc.?

npm, pip, etc. hide a lot of details from the user, and that is why people likely continue to use them (i.e., they are not “obsolete”). Constraint solving, etc., are still necessary to hide a lot of the complexity that users don’t want to deal with. Nix and Guix are great for reproducing a snapshot, but app devs want to explore the build space more than that.

Spack attempts to find a happy medium. We cryptographically hash builds, but we also let the user build things that the original package author may not have tried yet, without modifying the package. That does require all the constraint solving nastiness, but it doesn’t kill reproducibility. Spack stores the generated build provenance after doing some constraint solving [3], but the tool helps fill in the missing details of the dependency graph, not the human. Nix and Guix, AFAIK, do not do that. Spack isn’t fully reproducible because it’s not a full system package manager and it doesn’t use chroot, but you can try the same build again using the provenance that was generated when all the solving was done. The tool just helped the user along the way, and I think that’s still a very useful (not obsolete) thing.

[1] https://www.computer.org/csdl/proceedings/sc/2015/3723/00/28...

[2] Slides 9-11: https://tgamblin.github.io/files/Gamblin-Spack-SC15-Talk.pdf

[3] Slide 14: https://tgamblin.github.io/files/Gamblin-Spack-SC15-Talk.pdf


> they’re all tied to specific versions and you’d need a new package file to support a new version

I haven't looked at Spack, but as a Nix user, it's often the case that different versions of some software package require different packages in order to build them. For example, particularly on a less-used platform like darwin, I've submitted numerous patches to the Nix package of MariaDB due to minor breakages in newer versions. It's not always simple to parameterize the upstream version.

That said, there are many cases where it is that simple. Overriding a package in Nix to change the version is possible [1], but could be easier from a UX perspective.

> The user can install the version combination that you packaged, but can they easily try their own

As an example, I needed an older Ruby version that required an older libxml2 than available in nixpkgs. I did need to create my own Ruby package, but was able to simply override the version in the existing libxml2 package, and I configured nix to use that version across all other predefined nixpkgs that required libxml2 and were needed by my project. This meant a lot of compiling on my part (since nixpkgs of course can't publish binaries for my customized builds), but it was fairly straightforward.

[1] http://nixos.org/nixos/manual/index.html#sec-customising-pac...


These are the things that Spack attempts to solve. If you look at slides 14-15 in the presentation above, the concretization process automates something like your second example. You can build with an arbitrary package version, e.g. `spack install ruby@1.8 ^libxml2@1.2`, and Spack will evaluate constraints on packages to build a new DAG. If some other dependency in the DAG says it can't build with the older libxml, or if the older libxml implies other dependency version constraints, they'll be integrated in the new DAG. The output of this process is a concrete specification, with all versions/compiler/variants/etc. filled in, which could be used to reproduce the build later.

If spack doesn't know about the version, it can try to scrape the package webpage to find it automatically, or you can just add another one-line `version()` directive in the package file, rather than creating a new package entirely or making a global config change.

My point is mainly that to automate this type of change, you need something like constraint solving to adjust the DAG. That can be used in conjunction with a functional Nix-like build, and actually adds some value to it. That is what Spack is doing. This is why I claim the constraint solving, etc., is not obsolete.
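
As a rough illustration of what "evaluate constraints to build a new DAG" means, here is a toy concretizer in Python. It is not Spack's algorithm: the `REPO` contents, the use of explicit version sets instead of version ranges, and the greedy newest-first choice without backtracking are all simplifying assumptions.

```python
# Hypothetical repository: package -> {version: {dependency: allowed versions}}
REPO = {
    "ruby": {
        "1.8": {"libxml2": {"1.2"}},
        "2.3": {"libxml2": {"2.9"}},
    },
    "libxml2": {
        "1.2": {"zlib": {"1.2.8"}},
        "2.9": {"zlib": {"1.2.8"}},
    },
    "zlib": {"1.2.8": {}},
}


def vkey(version):
    """Turn '1.2.8' into (1, 2, 8) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def concretize(root, user_constraints):
    """Fill in a concrete version for every package reachable from root.

    user_constraints maps a package name to the set of versions the user
    will accept. Raises ValueError when the constraints cannot be met.
    """
    allowed = {pkg: set(versions) for pkg, versions in user_constraints.items()}
    chosen = {}
    todo = [root]
    while todo:
        pkg = todo.pop()
        candidates = set(REPO[pkg]) & allowed.get(pkg, set(REPO[pkg]))
        if not candidates:
            raise ValueError(f"no version of {pkg} satisfies the constraints")
        version = max(candidates, key=vkey)   # greedy: prefer the newest
        if chosen.get(pkg) == version:
            continue
        chosen[pkg] = version
        # The chosen version narrows what its dependencies may be.
        for dep, versions in REPO[pkg][version].items():
            allowed[dep] = allowed.get(dep, set(REPO[dep])) & versions
            todo.append(dep)
    return chosen


default_build = concretize("ruby", {})
old_ruby = concretize("ruby", {"ruby": {"1.8"}})
print(default_build)
print(old_ruby)
```

Asking only for `ruby` 1.8 pulls in the older `libxml2` automatically, and asking for `ruby` 1.8 together with `libxml2` 2.9 raises an error. The returned dictionary plays the role of the concrete specification that could be stored and replayed later.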


> What Nix/Guix are not doing is reducing the burden of specification. You write down all the details of a very specific, reproducible software stack, but you do not make that software much more composable or extensible than it already was.

This is not so. You do not have to write down all the details of the complete software stack as build systems can be abstracted away (e.g. `gnu-build-system` provides the GCC toolchain, binutils, make, etc). Only immediate first-level dependencies are declared.

And in many cases you don't have to do even that because you can generate package expressions from upstream information using `guix import`. There are importers for CRAN, bioconductor, pypi, hackage, CPAN, and others.

> The user can install the version combination that you packaged, but can they easily try their own? Can they easily try to build with a different compiler/compiler version/set of compiler flags/dependency version/etc.?

Yes! We use Guix for multiple clusters at a research institute and of course users must be able to create package variants, with different configure flags or compiler versions. This use-case is covered very well by Guix.


My point is mainly that the package specification in Guix is very verbose -- I must hack (and in some cases generate) package files to do what I want to do. The approaches you mention for generating package variants seem to generate new packages -- how do you deal with the profusion of packages that result here?

Don't get me wrong -- Nix is the inspiration for a lot of what Spack is doing, but we've added to it by making an attempt to template the packages by version, compiler, and options. So the user doesn't have to "create" the package variant at all: they just run `spack install numpy %intel@15.0.1` to compile with the Intel compiler, because the compiler can be swapped in/out of any build. We do similar things with versioned interfaces like MPI -- you can swap OpenMPI or MVAPICH in/out of a build. I have not seen anything to lead me to believe Guix allows this in a simple way, without generating a new package and copying a lot of boilerplate from the old one. The graph transformation stuff you mentioned in your other comment is promising, though.
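
For readers unfamiliar with the command-line syntax used in this thread, here is a small Python sketch that parses the subset shown here: `name@version`, `%compiler@version`, and `^dependency@version`. The `Spec` class and `parse_spec` function are hypothetical names, and the real Spack grammar supports much more (variants, version ranges, and so on).

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Optional sigil (^ or %), a name, and an optional @version.
TOKEN = re.compile(r"(\^|%)?\s*([A-Za-z0-9_\-]+)(?:@([^\s%^]+))?")


@dataclass
class Spec:
    name: str
    version: Optional[str] = None
    compiler: Optional[str] = None
    compiler_version: Optional[str] = None
    deps: dict = field(default_factory=dict)   # dependency name -> Spec


def parse_spec(text: str) -> Spec:
    """Parse e.g. 'ruby@1.8 ^libxml2@1.2' into a Spec with constraints."""
    root = None
    current = None
    for sigil, name, version in TOKEN.findall(text):
        if sigil == "%":
            # A compiler constraint applies to the most recent package.
            if current is None:
                raise ValueError("compiler given before any package")
            current.compiler = name
            current.compiler_version = version or None
        elif sigil == "^":
            # A constraint on a dependency of the root package.
            if root is None:
                raise ValueError("dependency given before any package")
            current = Spec(name, version or None)
            root.deps[name] = current
        else:
            if root is not None:
                raise ValueError("only one root package is supported")
            root = current = Spec(name, version or None)
    if root is None:
        raise ValueError("empty spec")
    return root


numpy_spec = parse_spec("numpy %intel@15.0.1")
ruby_spec = parse_spec("ruby@1.8 ^libxml2@1.2")
print(numpy_spec.compiler, numpy_spec.compiler_version)   # intel 15.0.1
print(ruby_spec.deps["libxml2"].version)                  # 1.2
```

Anything left as `None` is an unconstrained detail that the tool, not the user, would be expected to fill in.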


I'll look into Spack, but I don't see the fundamental problems with Nix as you describe them. Packages can indeed be very parameterized and these parameters become inputs to the functions. They encourage you to have your own clone of the package repository so you can easily modify packages—yes, this is very hacker oriented, but who else wants to experiment with compiler flags?


> What Nix/Guix are not doing is reducing the burden of specification. You write down all the details of a very specific, reproducible software stack, but you do not make that software much more composable or extensible than it already was.

Yep, this. Glad someone else is saying it, because I'm starting to wonder how people can read literally thousands of words rooted on a foundation of making it easier for developers, and reducing their uncertainty, and then expect them to write a book-length spec doc.

(note that requiring people to write down tons of spec information reduces, at most, one kind of uncertainty, and at high cost)

> Spack stores the generated build provenance after doing some constraint solving [3], but the tool helps fill in the missing details of the dependency graph, not the human.

Sounds a lot like what I described, no?

(I haven't heard of Spack, will look into it)

> isn’t fully reproducible because it’s not a full system package manager

I didn't touch on shared libs in the article, and probably should've, at least to point out that, for the purposes of the discussion of a PDM, it's mostly out of scope.

Things are really just so, so much nicer when we actually let there be layers and scopes of responsibility.


So many useful tools are packaged in language-specific package management schemes, and they are a pain to deal with in conjunction with a system package manager. I'm speaking of nice command-line applications that work like Unix utilities, that simply parse STDIN and write to STDOUT, but for some reason I have to install PyPI/NPM/gem/whatever to install them, and figure out what language-specific idiosyncrasies I have to massage to get them to work (venv, rvm, and whatnot).

Like the author, I want language implementers to split the packaging realm: anything that's a pure source dependency can and should be packaged with language-specific tools, but anything that runs on the system should be distributed in some sort of universally executable format, such as a launcher shell script with an application directory in /usr/bin.

Otherwise you're in this current mess where anyone writing a system package manager has to deal with language runtimes and fight with the language-specific package manager to handle distribution/installation/removal/rollback/versioning in a coherent way with the rest of their platform. Moreover, the user has to make the decision between installing a package from their system vendor or from the language-specific repository.

So I guess what I'm advocating for is for the "language package manager (LPM)" concept to go away. If you need to distribute source dependencies, that should be handled on a per-project and not per-language-environment or per-system basis. If you need runtime dependencies, distribute them as a tarball in a standard directory format and let system vendors package them like the rest of the software they manage.

The JVM world has good things going with Maven and Gradle. I never have to use some Java command-line package manager to install a global dependency; rather, I'm declaring what my project needs and letting the build tools handle the rest. I never have to deal with conflicting dependency versions between projects or language versions. And binary Java applications are easily available via my system's package manager. There is just no need for the middle layer.


The problem is that, as a software developer, I am not going to learn 3 PMs to distribute my free software (yum, apt, homebrew) nor am I going to navigate the politics of Fedora, Debian, Ubuntu, etc. for getting my free software included in their repositories and/or keeping it current. I have things to do, like write software.

Language PMs are "good enough" to solve the problems of the people actually writing the software, and in most cases that is really all that matters. I'm sorry they're not good enough for you, but it's not my fault that Rust, Python, and Go will let me upload to their package manager but Debian/Fedora/whatever you use will not.

Your fight is not with the language PMs, it is with your OS vendor who will neither give me upload rights nor send out someone else who does, who neither designs tools for me to use nor sends someone who already knows how to use them.

To be clear, I think there are good reasons why OS vendors won't change their ways, but software developers won't change our ways either, so neither language PMs nor OS PMs are going away any time soon.


> Your fight is not with the language PMs, it is with your OS vendor who will neither give me upload rights nor send out someone else who does, who neither designs tools for me to use nor sends someone who already knows how to use them.

As far as Fedora is concerned, becoming a package maintainer is a fairly straightforward process, and you are more than welcome to package up your own software and submit it to the software collection or solicit someone to assist you with it. The rpmdevtools package contains a super-useful utility called rpmdev-newspec that provides templates to create packages for most popular standards out there already (Ruby gems, Python setuptools/distutils scripts, CPAN modules, etc.). It's really easy to get started, and you can always push your packages up to COPR without needing sponsorship from the existing packaging team.


This is missing the point. The problem is still that there are a million different system PMs, but (usually) only one for each language. Package maintainers have no incentive to care about Fedora, regardless of how easy it is. The real solution is not for Fedora to absorb all language-specific packages but to integrate the different package managers so that they work better together.


> The real solution is not for Fedora to absorb all language specific packages but to integrate the different package managers so that they work better together.

Which is precisely why rpmdev-newspec has templates for this case. Simply running `rpmdev-newspec -t ruby/python/perl/php-pear/ocaml/R mypackagename` will generate an RPM spec file ready to go for any of these languages using their standard package managers (gem, distutils/setuptools, CPAN, pear, OPAM, packrat). Just add the necessary metadata (Requires/BuildRequires, summary, description, version, and a changelog).

It's not magic, it still requires minimal effort (3 minutes worth of metadata + calling mock to test the build) - but it's hardly difficult. Sure, you could make some extra scripts to automatically populate the metadata from the language PM's descriptor of choice to limit the manual work to release bumps and editing the changelog - and I don't think anyone would be opposed to that, but that's pretty much all that would be left.

The biggest problem is that, outside of rolling-release distributions like Arch, Gentoo, openSUSE Tumbleweed and Fedora Rawhide, bumps in major versions within a release are highly discouraged, and a lot of people used to language PMs have a habit of changing things rapidly even post-1.0 and not supporting old branches, often leaving the distribution packager needing to backport patches for security or bug fixes. I'm not going to argue whether it's better or worse, but it's an issue that needs to be addressed.


The responsibility to package software at a system level isn't yours, it's the system vendor's. Let them hash out a standard packaging model while you ship your plain tarball or provide `configure && make && make install` instructions.


What you say is very sane: why should you learn 3 package managers? The problem is that language package managers are not good enough. They are enough to get software installed on the developer's system, but that often leaves the future and the users hanging with something less than optimal.

I guess my issue is that "good enough" is a very fluid concept, and I often discover some months after that it was my excuse for shitty code/integration test.


The ideal solution would be for the system PM to delegate the responsibility for the language dependencies to the language (preferably compartmentalized). It would be pretty slick to have a standardized way for language PMs to be able to declare dependencies on system packages.


No one is stopping you from creating your own yum or apt repository. For Fedora you could use COPR [1]:

> Copr is an easy-to-use automatic build system providing a package repository as its output.

> Start with making your own repository in these three steps: 1) choose an architecture and system you want to build for; 2) provide Copr with src.rpm packages available online; 3) let Copr do all the work and wait for your new repo.

[1]: https://copr.fedorainfracloud.org/


You missed his point. As a software developer, if he uses his language's package manager, that's where his responsibility stops. In your example, he'd make a package for yum or apt, but what about brew? What about $obscure_other_one? It's an abstraction layer.


> The JVM world has good things going with Maven and Gradle.

Gradle internally uses Maven. Leiningen (the Clojure build tool) also uses Maven, and the nice thing here (while being rife with other problems) is that it pulls the correct language version in as a dependency (i.e. it pulls in Clojure in the required version).


These were my thoughts entirely while reading the article. If I want a command line utility, I really don't want to install npm then mess with any problems with that. I want to simply 'brew install' everything like that and be done.

Sure, if I need the source of something for compilation or testing, i.e. 'pip install numpy', then having that handled by pip + virtualenv makes sense.

Besides that use case, brew should handle everything to reduce conflicts.


What would you say is the difference then, between Maven / Gradle and say, NPM? I haven't used JVM tools, but what you are describing sounds like how NPM works. You specify that your project needs library X, version 4.5, in your package.json file and then NPM handles the rest.


For one, you can't install system-level dependencies with Maven. `gradle install <foo>` simply isn't a concept. Dependencies you download for a project are confined to that project, and optionally cached for other projects dependent on the same versioned artifacts.

The end result is that there is no "global" dependency like you would install with `npm install <foo>`. There is no dealing with dependency mismatches between projects that depend on "<foo> >= 1.2" for example, which can be a hassle to track down in the "global dependency installation" model most often used by these language-level package managers.
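
The mismatch problem described above can be sketched in a few lines of Python. Everything here is hypothetical (the `AVAILABLE` index, the project names, the requirement functions); it only contrasts one shared install slot per package with a separate resolution per project.

```python
AVAILABLE = {"foo": ["1.0", "1.2", "1.5", "2.0"]}   # hypothetical package index


def vkey(version):
    """Turn '1.2' into (1, 2) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


# Each project states which versions of foo it can work with.
PROJECTS = {
    "app-a": {"foo": lambda v: (1, 2) <= vkey(v) < (2, 0)},
    "app-b": {"foo": lambda v: vkey(v) >= (2, 0)},
}


def resolve_per_project(projects):
    """Project-confined model: each project gets its own copy."""
    result = {}
    for project, requirements in projects.items():
        result[project] = {}
        for pkg, accepts in requirements.items():
            matches = [v for v in AVAILABLE[pkg] if accepts(v)]
            if not matches:
                raise ValueError(f"{project}: no usable version of {pkg}")
            result[project][pkg] = max(matches, key=vkey)
    return result


def resolve_global(projects):
    """Global model: one installed version must satisfy every project."""
    result = {}
    packages = {pkg for reqs in projects.values() for pkg in reqs}
    for pkg in sorted(packages):
        matches = [
            v for v in AVAILABLE[pkg]
            if all(reqs[pkg](v) for reqs in projects.values() if pkg in reqs)
        ]
        if not matches:
            raise ValueError(f"no single version of {pkg} satisfies every project")
        result[pkg] = max(matches, key=vkey)
    return result


per_project = resolve_per_project(PROJECTS)
print(per_project)   # {'app-a': {'foo': '1.5'}, 'app-b': {'foo': '2.0'}}
```

Calling `resolve_global(PROJECTS)` on the same input raises `ValueError`, because no single version of `foo` is acceptable to both projects.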


IMO this, the Maven and Gradle way, is the way it should be. Global dependencies are a nightmare.


What you're describing for Maven is exactly how NPM works. You run `npm install foo`, it installs foo locally, for the project you are working on. Again, I am having a hard time understanding what people think is different between Maven and NPM.


Used both; the difference is nonexistent. Not sure what parent means.


It is similar, but you need to have npm installed first don't you? With gradle you typically interact with it through a shell script (aka gradle wrapper) and that downloads the correct version of gradle if it isn't already present in the project directory and then gradle downloads the dependencies. The person who sets up the project would generate the wrapper script using gradle.


> It is similar, but you need to have npm installed first don't you?

Same thing for gradle: you need to install it in the first place. There is absolutely no difference except the language that is being used. And yes, you can use shell scripts with npm too, except your shell scripts aren't going to run on Windows.


Does npm have this?

https://docs.gradle.org/current/userguide/gradle_wrapper.htm...

I think that's what GP was referring to -- it's not a shell script for automating use of the tool, it's a shell script for automating acquisition and initialization of the tool.


When you fetch a package with npm, the package will be either a library (installed locally) or a command-line tool (installed globally). You can also run any number of tasks per project (npm run <task>), described in the npm manifest, or execute tasks before or after installing, and before publishing a library.


Right, but that doesn't address how you acquire/install/configure npm itself, which is one of the main benefits of having gradlew[.bat].


You need to have node installed already. Once it's installed you'll have npm because it's bundled with node.

I googled quickly for gradlew.bat and from the first link I opened it looks like it depends on java being installed already.

Node already includes npm because it's the official package manager. Java doesn't include gradle, presumably because they have multiple options and none of them is considered the official one?


I wasn't aware of the implicit "if you have node you have npm" relationship -- thanks for clearing that up.


The language package manager can have so much richer information than the system one, is the thing. The system package manager only understands a crude, lowest-common-denominator notion of dependencies, and everything goes via the system repo which is never going to be remotely up-to-date with the language package repo. Finally most system package managers won't let you have multiple versions of a library installed at all, which makes using more than 2 programs at the same time virtually impossible (because they usually want different versions of some dependent library or other).


> The language package manager can have so much richer information than the system one, is the thing.

This is simply not true. Language package managers cannot describe the entire dependency graph of a piece of software because they can't handle things that aren't written in that language. For example, there are plenty of Ruby gems that require a C compiler and C shared libraries in order to work, but those dependencies cannot be encoded in that system. System package managers can describe the entire graph, thus providing richer information.

> Finally most system package managers won't let you have multiple versions of a library installed at all, which makes using more than 2 programs at the same time virtually impossible (because they usually want different versions of some dependent library or other).

Yes, this is a big issue with the status quo system package managers. Fortunately, there's hope on the horizon in the form of functional package management. Functional package managers such as GNU Guix and Nix allow for an arbitrary number of variants of the same software to coexist without conflict. Additional benefits include unprivileged package management, transactional upgrades and roll backs, reproducible builds, and no central point of trust for binaries (anything can be built from source).

Language package managers are good for quickly sharing code written purely in that language with other developers, but unacceptable for anything else.
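
A rough Python sketch of what "transactional upgrades and roll backs" can look like: every change produces a new, complete generation of the profile, and switching generations is a single pointer update. The `Profile` class and the `/store/...` paths are made up for illustration and do not mirror the real Guix or Nix data structures.

```python
class Profile:
    """Append-only list of generations; each maps a package name to a store path."""

    def __init__(self):
        self.generations = [{}]   # generation 0 is the empty profile
        self.current = 0

    def install(self, name, store_path):
        # Build the next generation completely before switching to it, so a
        # failure part-way through never leaves a half-updated profile.
        next_generation = dict(self.generations[self.current])
        next_generation[name] = store_path
        self.generations.append(next_generation)
        self.current = len(self.generations) - 1

    def rollback(self):
        if self.current == 0:
            raise ValueError("already at the first generation")
        self.current -= 1

    def active(self):
        return dict(self.generations[self.current])


profile = Profile()
profile.install("guile", "/store/aaa-guile-2.0")
profile.install("guile", "/store/bbb-guile-2.2")   # an upgrade
upgraded = profile.active()
profile.rollback()
restored = profile.active()
print(upgraded)   # {'guile': '/store/bbb-guile-2.2'}
print(restored)   # {'guile': '/store/aaa-guile-2.0'}
```

Both store paths stay on disk the whole time, which is why the rollback is cheap and why two variants of the same package do not conflict.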


> This is simply not true. Language package managers cannot describe the entire dependency graph of a piece of software because they can't handle things that aren't written in that language. For example, there are plenty of Ruby gems that require a C compiler and C shared libraries in order to work, but those dependencies cannot be encoded in that system. System package managers can describe the entire graph, thus providing richer information.

On the contrary, the system package manager has to have a simpler model, precisely because it has to handle multiple languages. A generic model is never going to be able to capture the ruby-specific details as well as a ruby-specific model.


Based on my experience, this also isn't true. I added support for Ruby gems in the GNU Guix package manager. The way it was done was by creating a Ruby-specific build system that Ruby-based software packages elect to use. We have a generic model that allows for many types of specific build systems to exist that deal with the domain-specific concerns. This generic model works better than a Ruby-specific model because, unlike RubyGems, Guix can capture the entire dependency graph.

Take the Pg gem, for example. In order to build and use it, you need the libpq shared library. RubyGems can't encode this dependency, but Guix can. The recipe for ruby-pg is only 25 lines long: http://git.savannah.gnu.org/cgit/guix.git/tree/gnu/packages/...

There was a talk given at FOSDEM about foreign packages in Guix with a focus on Ruby which is worth watching or skimming the slides: https://fosdem.org/2016/schedule/event/guixmodules/
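
To show why the whole graph matters, here is a small Python sketch. The `PACKAGES` table is invented (it is not the real ruby-pg recipe, and the `ecosystem` labels are an assumption); it compares the build order a full-graph tool can compute with what a tool that only sees one language's packages would know about.

```python
# Hypothetical package table: name -> ecosystem and direct inputs.
PACKAGES = {
    "ruby-pg": {"ecosystem": "ruby", "inputs": ["ruby", "libpq"]},
    "ruby":    {"ecosystem": "c",    "inputs": ["libc"]},
    "libpq":   {"ecosystem": "c",    "inputs": ["libc"]},
    "libc":    {"ecosystem": "c",    "inputs": []},
}


def build_order(package, visible=lambda name: True):
    """Depth-first walk that lists dependencies before their dependents.

    `visible` models what a given tool can see; invisible packages (and
    everything only reachable through them) are silently left out.
    """
    order, visited = [], set()

    def visit(name):
        if name in visited or not visible(name):
            return
        visited.add(name)
        for dep in PACKAGES[name]["inputs"]:
            visit(dep)
        order.append(name)

    visit(package)
    return order


full_view = build_order("ruby-pg")
gem_view = build_order(
    "ruby-pg", visible=lambda name: PACKAGES[name]["ecosystem"] == "ruby"
)
print(full_view)   # ['libc', 'ruby', 'libpq', 'ruby-pg']
print(gem_view)    # ['ruby-pg']
```

In the restricted view, `libpq` simply does not exist, so the user has to install it by some other means before the gem will build.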


> Language package managers are good for quickly sharing code written purely in that language with other developers, but unacceptable for anything else.

Right, so, what almost all of the FLOSS world is doing, almost all of the time. Miiiight be worth optimizing for that case. Especially if a proper PDM would compose well with larger systems...which it would!


> Right, so, what almost all of the FLOSS world is doing, almost all of the time.

Citation needed. I have been packaging (scientific) software for years and it's usually a happy mix of different languages and libraries.


Got to agree with davexunit here. Language package managers aren't going to handle your non-<language> dependencies. One place where this comes up in spades is with numerical software. Example: frequently you want to build numpy/scipy with fast math libraries (BLAS, mkl, etc.), which might be in C, Fortran or whatever. Python's myriad custom package managers have no support for that level of customization so you're back to manual installs.

Spack [1] attempts to solve this by allowing extensions [2,3] for languages that have their own module systems. Basically you can embed language-specific module enablement logic in the package, and you can enable/disable packages in particular Python installs. You can then build several versions of numpy with different dependency libraries and swap them in and out. It's a compromise between language and system package managers. Right now it's too stateful in that the modules are linked/unlinked into a python installation. We have plans to provide something more like virtualenv or conda environments.

Functional package managers are nice, as are reproducible builds, but they come up short for customizability. I'd have to write an awful lot of packages with Nix and Guix to try out many different versions, compilers, dependencies, etc. Spack has a syntax [4] that allows these to be composed arbitrarily. It's not fully reproducible right now (though it could be if we went down to glibc and used a chroot jail), but the packages are much more composable than the current crop of functional package managers.

[1] http://github.com/llnl/spack/

[2] http://software.llnl.gov/spack/basic_usage.html#extensions-p...

[3] http://software.llnl.gov/spack/packaging_guide.html#extensio...

[4] http://software.llnl.gov/spack/basic_usage.html#specs-depend...


> Functional package managers are nice, as are reproducible builds, but they come up short for customizability. I'd have to write an awful lot of packages with Nix and Guix to try out many different versions, compilers, dependencies, etc.

I disagree. At least with Guix you have programmatic access to the dependency graph (since package objects are just Scheme values). You can traverse the package graph and easily swap out all occurrences of a particular package. You do not have to write the package expressions yourself as you can just modify the package objects directly.

See also the paper "Reproducible and User-Controlled Software Environments in HPC with Guix"[1].

Since the paper was published Guix also gained command-line syntax to rewrite the dependency graph on the fly, e.g.

    guix build guix --with-input=guile=guile-next
which maps the "guile" input to "guile-next" when building the "guix" package. Recent versions contain a pretty generic package transformation framework.
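To make the graph-rewriting idea concrete, here is a minimal sketch in Python rather than Scheme. It is not Guix's API: the `Package` class, the `rewrite_inputs` helper, and the version strings are all made up for illustration. It only shows what "swap every occurrence of one package in the dependency graph" means when packages are plain values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Package:
    """Hypothetical stand-in for a package object; not a Guix type."""
    name: str
    version: str
    inputs: tuple = ()  # tuple of Package values this package depends on


def rewrite_inputs(pkg, replacements, memo=None):
    """Return a copy of pkg's graph with packages swapped by name.

    `replacements` maps a package name to the Package that should take
    its place everywhere in the graph. The original graph is not mutated.
    """
    memo = {} if memo is None else memo
    if pkg in memo:
        return memo[pkg]
    if pkg.name in replacements:
        result = replacements[pkg.name]
    else:
        new_inputs = tuple(
            rewrite_inputs(dep, replacements, memo) for dep in pkg.inputs
        )
        result = Package(pkg.name, pkg.version, new_inputs)
    memo[pkg] = result
    return result


# Illustrative graph; the version strings are placeholders.
guile = Package("guile", "1.0")
guile_next = Package("guile-next", "2.0")
gnutls = Package("gnutls", "1.0", (guile,))
guix = Package("guix", "1.0", (guile, gnutls))

rewritten = rewrite_inputs(guix, {"guile": guile_next})
print([dep.name for dep in rewritten.inputs])
```

Note that the replacement applies transitively: the rewritten `gnutls` also points at `guile-next`, while the original `guix` value is left untouched.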

[1] https://hal.inria.fr/hal-01161771/en


In Spack this would look like:

    spack install guix ^guile@<new version>
How do you handle optional dependencies in this model? Or, a more complex case, dependencies that are only present for certain versions of the package? Look at the gcc package in Spack:

    https://github.com/LLNL/spack/blob/develop/var/spack/repos/builtin/packages/gcc/package.py#L54
This builds gcc 4.x and 5.x versions, and it's pretty concise. The `mpc` and `isl` dependencies are only present when the version of gcc is 4.5 or higher, or 5.0 or higher, respectively.

There's an iterative DAG-building step called concretization that looks at these types of constraints and tries to solve for a satisfactory DAG, which is then built in a functional style. I don't see how I would model that in Guix, but in Spack it's a few lines, and there's syntax to customize versions on the command line, not just programmatically, e.g.:

    spack install gcc@5.0 ^isl@0.11.1 ^mpc@1.0.2
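A rough sketch of the version-conditional dependency idea, in Python. This is not Spack's package DSL or its concretizer (which iterates over a whole DAG); `RULES`, `parse_version`, and `concretize` are hypothetical names, and the only facts taken from the comment above are that `mpc` applies from gcc 4.5 and `isl` from gcc 5.0.

```python
def parse_version(text):
    """Turn '4.9.2' into (4, 9, 2) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


# Hypothetical rule table: (dependency, minimum package version it applies to).
RULES = {
    "gcc": [
        ("mpc", "4.5"),
        ("isl", "5.0"),
    ],
}


def concretize(name, version, rules, pins=None):
    """Pick the dependencies that apply to this version of the package.

    `pins` plays the role of command-line constraints such as ^isl@0.11.1;
    unpinned dependencies are left as "any".
    """
    pins = pins or {}
    wanted = parse_version(version)
    deps = {}
    for dep, min_version in rules.get(name, []):
        if wanted >= parse_version(min_version):
            deps[dep] = pins.get(dep, "any")
    return deps


old_gcc = concretize("gcc", "4.4", RULES)
mid_gcc = concretize("gcc", "4.9", RULES)
new_gcc = concretize("gcc", "5.0", RULES, {"isl": "0.11.1", "mpc": "1.0.2"})
print(old_gcc, mid_gcc, new_gcc)
```

The point is only that the set of dependencies is computed from the requested version, and user pins are layered on top of that result.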


One more reason for me to enjoy Gobolinux i guess.

The people in charge there have either implemented or plan to implement support for those language specific managers into the Gobolinux tree.

As best i understand it, this is done by making the Compile tool able to use these managers to download, then capturing the output and placing it into the proper place in the /Programs tree.


> OS/system package manager (SPM): this is not why we are here today

Arguably, this might already be the wrong approach. Say what you will, but it is a problem (and is causing problems) that I can do both `apt-get install python-mutagen` and `pip install mutagen`. And what I certainly cannot do at this point is get rid of one and only one of these tools. So before speaking about how I can fuck up writing some language-specific PM, it might be fair to admit that pretty much nothing is actually language- or project-specific when talking about computer systems, and to consider why somebody might write another such tool when we're done with our task.


Yes, I’ve argued before that in a modern software environment both package management and environment separation/virtualisation should be OS responsibilities. As you say, it’s silly that we have OS/distro level functionality, language-specific tools, and sometimes generic but not OS-integrated tools all doing essentially the same job.

If we also had some simple, standardised, cross-platform convention for indicating which packages were available and had which dependencies on which platforms, so we could have common repositories that could advertise the specific versions but also the specific supported platforms of each package they hosted, and so any package management tool could speak the same protocol to those repositories, then many things would get better in terms of portability as well. This is an area where the language-specific tools do sometimes have advantages today, though those advantages typically turn to dependency hell almost immediately if you have a package containing a native code element that it decides to build with whatever C(++) compiler it can find on the target system.


i agree, this overlap is a problem. and i agree that there's a not-obviously-wrong argument to be made that ignoring SPMs is the wrong approach. to this end, someone pointed me to rubinius' wiki yesterday, which had a nice breakdown of the PM hierarchy: https://github.com/rubinius/mkrb/blob/master/README.md#the-m...

OTOH: part of the reason that i included the notion of "compiler, phase zero" is because i think that's one thing that, scope-wise, differentiates SPMs from PDMs. an SPM's output is executables. i think this difference is significant. maybe i'm wrong.

(If anything, I think there's more overlap between an SPM and an LPM).


The difference is obvious. pip works on all os/platforms. apt-get certainly doesn't.

If I write a language and ask users to use their favorite package manager to fetch dependencies, I'm not going to build a big community because packages and their dependencies will not be shared across platforms, or each user will have to add some manifest for every package manager, or different repositories will have to be maintained for each platform ...

os package managers and language package managers ARE orthogonal.


Except they aren't. Not even remotely. The package for some scripting language like Python or PHP might have binaries it depends on, or might have code written in another language. Pretty much any project-wise language-specific dependency manager can include something that is not actually a library or a bunch of source code, but a utility exposed globally (like a unit-testing tool or framework code generator). There are dozens of projects which started out as a domain-specific library (and thus subject to being operated via PDM or LPM only) and soon ended up being used as a utility, useful for the end user (because the difference between a library and a tool here was 50 lines of code, exposing some existing methods, which we would like to have anyway for testing). Such a tool is then installed via several package managers (and additionally from the source code, because github is far ahead of your PM repository, yeah…) and can cause version incompatibility problems or something even more obscure. And that's just off the top of my head: there can be hundreds of cases which I did not remember right now or even think of, ever. And neither have you.

In fact, pretending that these are different things and that "developer" is something drastically different from "user" is precisely the reason why all (or, well, most) package managers in existence are so fucked up. The hard truth we must accept before we talk about package managers is that there are no hard lines in this domain. It's a complex problem and must be dealt with accordingly, without exceptional reductionism.


While I agree that it is the wrong approach, the proliferation of language package managers shows a different case: that the system package managers have taken on too much responsibility, and that a better solution would be to add support for integrating the language PMs. There really is nothing that would stop apt-get install from using pip under the hood.


Spack has, to some degree, a compromise between the language and system package management sides of things. Languages in Spack can specify how their extensions should be activated/deactivated within the installation, and modules like numpy can ‘extend’ the language packages. So, you can plug and play with different python modules. I haven’t looked at how Go does language extensions, but it would be interesting to see if this can be mapped to Go, Lua, etc.

[2] http://software.llnl.gov/spack/basic_usage.html#extensions-p...

[3] http://software.llnl.gov/spack/packaging_guide.html#extensio...


Since Go is the language he brings up in the writeup, I have to ask: honestly, why does Go need a package manager?

Go didn't support shared libraries until very recently, and still doesn't use them for things other than where runtime pluggability is actually required. If I want to install a Go program without compiling it, I just use my favorite package management tool-- the cp command.

Meanwhile, Linux package managers have acceptable versions of many of the Go programs I want to use. And building from source just uses git to pull the dependencies it needs. Not every programming language needs to reinvent the wheel, poorly. This looks like a solution looking for a problem.


> why does Go need a package manager

Because a language is nothing without an ecosystem of libraries. These libraries need to be versioned and dependencies between these libs need to be resolved.

> Go didn't support shared libraries until very recently

It still doesn't.

> If I want to install a Go program without compiling it, I just use my favorite package management tool-- the cp command.

That's not the issue at hand here. If i'm a developer and I need to publish an open source project, i'm not going to publish it with all its dependencies. I need a mechanism to manage them, that's what a package manager is for.

Go has a half-baked one, go get. It can fetch dependencies but it cannot tell between versions of the dependencies.

It's interesting to note that a lot of Go developers are actually against a separate package manager.


A community, gaslit by years of poor tooling, IMO


> It still doesn't [support shared libraries]

OK, fair enough. Go supports shared libraries, but not shared libraries in Go that can be loaded from Go. It's a bit confusing. Design doc here: https://docs.google.com/document/d/1nr-TQHw_er6GOQRsF6T43GGh...

> If i'm a developer and I need to publish an open source project, i'm not going to publish it with all its dependencies. I need a mechanism to manage them, that's what a package manager is for.... Go has a half-baked one, go get. It can fetch dependencies but it cannot tell between versions of the dependencies.

You don't need a package manager to lock down the versions of external Go dependencies. You only need "go get with versions". The reason why "go get with versions" has not been implemented is very simple: the Go developers looked at the horrible stuff that was going on in Maven with ancient dependencies being used all over the place, and decided "let's not implement that." This is the point that all the pro-version-locking people never talk about, let alone acknowledge: version locking encourages people to use old versions "for stability reasons" and makes it seem acceptable to break APIs in new versions.

It's disingenuous to call "go get" "a half-baked package manager." "Go get" is fully baked, but Go ain't C++ or Java. We don't need DLL hell, lockfiles, dependency graphs, and all that.

This whole discussion makes me think of a quote by Yossi Kreinin. Many people feel that "good design" means "lots of code" or "lots of layers". Actually, those are some of the meanings of "bad design". If you think you need to build a rocket ship to walk down to your mailbox and pull out the mail, you might want to rethink.

There is a compromise solution brewing where you can "vendor" dependencies (i.e. bundle them with your project.) See https://docs.google.com/document/d/1Bz5-UB7g2uPBdOx-rw5t9MxJ... There are maybe a few cases where this won't work, like with proprietary code where you're not allowed to redistribute the dependency, but in general this should solve 99% of the problems out there. The forthcoming shared library support with traditional Linux package managers should solve the other 1%.


Not only don't I want to write one, I don't really want to use one.

Especially as a user who isn't programming in the given language. Don't make me install some package manager for your language before I can run your program.

Self-contained executable or installer, please, or go back to the wood shed.


Yep - an LPM (which can work as the sort of self-contained installer you're asking for) is a reasonable thing to present to users. As I argue in the article, though, you can't have a sane LPM without a PDM undergirding it.


>Self-contained executable or installer, please, or go back to the wood shed.

Language package managers suck, but this is way worse. Self-contained executables bundle all of their dependencies, which is terrible for both reproducibility and security. Users and system administrators have no reasonable way of identifying, patching, and updating vulnerable libraries/programs in these things, leaving them dependent on the upstream providers of each binary bundle for a bunch of software they didn't even write. I've seen a growing push to "appify" GNU/Linux lately, with Docker, xdg-app, OmniBus, etc. and I have become very worried about what will happen if this becomes the dominant way of distributing software on GNU/Linux.

Package management is good for users. We need a safe and sane way to maintain the software running on our computers. The control ought to be in the hands of the user, not the developer.


> Self-contained executables bundle all of their dependencies, which is terrible for both reproducibility and security.

That is only a religious belief. If two programs share a common library, an upgrade to that library (say, a security fix) could fix things for one program but introduce a hole into the other. Every upgrade to some shared piece must be validated by the developers and QA of every program that uses it. 99 programs could be fine with the change, and the hundredth could break.

And if you want utmost reproducibility, then you in fact need a given version of a program to have its exact dependencies, so that you're running exactly what the developers are running. If program X needs libfoo.1.2, and program Y needs libfoo.1.3, and the programs are actually bundled with their specific version of libfoo, then you have better reproducibility than if libfoo.1.3 is foisted upon program X because program Y requested that version.

The model where you have one libfoo only works if everything is open source and packaged by an upstream distribution, which takes care of curating the entire combination of stuff, so that when program Y needs libfoo.1.3, the entire distro is officially pushed forward to that libfoo version; it becomes the official libfoo for program X also. What you have matches the upstream and so behaviors are likely to be reproducible. If the vendors for different programs are completely independent, then you in fact sometimes need multiple versions of dependencies.


This is why we need stuff like Nix or Guix, where you get the best of both worlds. Packages can depend on different versions of libraries, there is still good accountability of what each package depends on (critical for fixing urgent security bugs), if the same library is used it is shared, things can be easily sandboxed, and they are also reproducible.


Reposting my comment from the other day since it's something I've been struggling with for years. I'm halfway through the article only so maybe I'm repeating what's being said but...

  I really wish we'd get generic package management.
  A proper package management server and protocol that is.
  Something that can be used and picked up by Windows and linux
  distros but also by domain-specific packages (pypi, etc), browser
  extensions, games with addon repositories and so on.

  It's sad that right now we can't answer "and now we need package
  management" by "sure, let me fire up pkgserve.d" or something.
I think a package management server would be a good first step. Package managers (clients) themselves will inevitably be numerous as different use cases call for different UIs. Android wants a pretty user-facing store. Python wants something that can interact with the python environment at a low level. Debian wants a million checks and firefox wants to integrate with the browser anyway.


Well, spoiler then: I don't advocate for generic package management. That's someone else's problem. Maybe, if the ideas I've put forth in there lay useful foundations, it's something that could happen, but I very much doubt it's practicable.

Even if it does happen, though, I suspect that end-user package management still ends up looking different from developer package management.


I agree. I don't think "generic package management" is possible at all. I think it's possible to make it more generic than it is now (namely by having similar domains agree on single package managers), but these issues are far less relevant.

What is possible I think is to have generic package management servers and protocols. The same way you have HTTP servers right now and don't just roll your own domain-specific protocol for your own domain-specific web just because it's in go instead of python or whatever. And yes, you end up with multiple web browsers and that is fine. You even end up with domain-specific browsers (eg. embedded in games, or as a library, or "live apps" and what have you) and that's all fine.

What I see, whenever I see a new package manager, is a colossal waste of time that could have been avoided.


Best i can think of is tar.gz. But then i am a Gobolinux user, and there binary packages are not much more than a tar.gz-ed branch of the /Programs tree that can effectively be unpacked anywhere and then symlinked back into the tree.


nixos.org: it's been done, please use it.


Nix is not even sort of a generic package manager in the sense of actual package management. It's a system package manager.

Try to host your game's addons in a nix repository, see how well that goes...


http://nixos.org/nix/manual/#sec-one-click

Also, I'm curious as to what you mean by "system package manager" when nix explicitly supports installation of software by unprivileged users (the packages only get linked to from the user's profile, so it's safe). I also run nix under Debian a lot, very successfully.


What would be the problem with that? I'm curious.


"system package manager" is a meaningless distinction.

If you want the game to load/unload addons at runtime, rather than just wrap the game's executable with flags, then yes, you have to learn to talk to the nix daemon directly, which is a bit more work. But do that and things will still work just fine.


As long as I briefly have HN's attention - how did y'all find the repo-as-universe, and universe alignment, metaphors? Useful?

(This was the diagram leading up to it: http://imgur.com/bzy22DA)


I liked it. Nice article. It makes me happy about Dart's design decisions in this area. Hopefully something nice happens for Go.

In the hazy future, one thing I wonder about is how to do back-pressure outside the monorepo: if upstream makes a mistake and breaks something important downstream, how quickly do they find out? Today this happens informally. Ideally (from a responsible maintainer's point of view) you'd have an easy way of finding out before publishing a new version, and the second-best approach would be automatic notification soon after. This might involve speculatively compiling and testing some downstream packages to find out how bad the breakage is, and probably involves coordinating continuous builds somehow, or perhaps making continuous builds the job of the central repo (making it even more centralized).
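One way to picture the "find out what breaks downstream" step is as a walk over reverse dependencies. The sketch below is Python and purely illustrative: the package names and the `downstream_of` helper are invented, and a real system would get the graph from a registry rather than a dict.

```python
from collections import deque


def downstream_of(package, depends_on):
    """Return every package that directly or transitively depends on `package`.

    `depends_on` maps each package name to the list of packages it uses.
    """
    # Invert the graph: for each dependency, who uses it?
    used_by = {}
    for pkg, deps in depends_on.items():
        for dep in deps:
            used_by.setdefault(dep, set()).add(pkg)

    # Breadth-first walk over the inverted edges.
    seen = set()
    queue = deque([package])
    while queue:
        current = queue.popleft()
        for user in used_by.get(current, ()):
            if user not in seen:
                seen.add(user)
                queue.append(user)
    return seen


# Made-up graph for illustration only.
graph = {
    "app": ["web"],
    "web": ["http"],
    "cli": ["http"],
    "http": [],
    "unrelated": [],
}
affected = downstream_of("http", graph)
print(sorted(affected))
```

The resulting set is the list of packages a central build service would want to speculatively rebuild and test before (or right after) a new `http` release is published.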


I can't help but think that if 10 years ago submitting Debian packages had been made easier rather than harder then we would not have this ugly mess today (and Debian would be in better shape, too). As a software author all I want is to provide others with a simple way to install my stuff. I'd like to write a simple deb spec and say "just apt-get it". And the fact that not everybody is running Debian is not really a concern since packages can be automatically converted most of the time. But that is not possible, not because writing deb specifications is hard, but because pushing any new thing into Debian requires going through a "becoming an official Debian maintainer" process involving lots of steps, including convincing some people that you and your software are worth it, an ID check, etc. I do not want to become an official anything, so I just git push to whatever smaller package manager is used in that community.


Just wanted to say this was an awesome article. I have no immediate need to write a package manager, but it was a great overview of an interesting and tricky domain. Thanks!


If you are going to build a package manager, can I humbly suggest to take the learnings of late and use a DHT + strong signing of every package?

This practice of forever chaining a language to a bunch of fragile servers out there that go bad at the rate of links is worrying.


I considered including some mention of crypto assurances of packages in the article. I did not, because it would take someone who knows the constraints dictated by such systems far better than I to come up with a means by which such mechanisms could be integrated into a PDM...and have people still use it.

Most of the integrity of packaging systems (at the PDM level) derives from the assurances provided by the underlying VCS - e.g., Git's tamper-proofing assurances by virtue of how its commit DAG is built. If your system has a registry in the middle, that does create an SPOF; if that registry intermediates the VCS with tarballs, that takes away the clients' ability to rely on the VCS for verification. So, in my naive view, that's the point at which signing becomes more crucial.
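The tamper-evidence property mentioned above comes from each commit's ID covering its parent's ID. Here is a deliberately simplified Python sketch of that chaining. It is not Git's object format (Git hashes structured commit objects, not raw content like this), and `commit_id` and `build_chain` are illustrative names.

```python
import hashlib


def commit_id(parent_id, content):
    """Hash the parent ID together with this commit's content."""
    digest = hashlib.sha256()
    digest.update(parent_id.encode("ascii"))
    digest.update(b"\0")
    digest.update(content)
    return digest.hexdigest()


def build_chain(contents):
    """Return the list of IDs for a linear history of commits."""
    ids = []
    parent = ""  # the first commit has no parent
    for content in contents:
        parent = commit_id(parent, content)
        ids.append(parent)
    return ids


honest = build_chain([b"initial import", b"fix bug", b"release 1.0"])
tampered = build_chain([b"initial import", b"fix bug + backdoor", b"release 1.0"])

# Changing one commit changes its ID and every ID after it.
print(honest[-1] == tampered[-1])
```

A client that pins the final ID therefore detects a change anywhere in the history, which is the assurance lost when a registry hands out unsigned tarballs instead.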


I wish there was some option to quickly install libraries when using C or C++ on windows.

It's just unbelievable that windows doesn't have this yet.

The thing I really hate doing is compiling a lib for a given version of MSVC and adding it to my project, meaning I have to add the include path, the lib path, and the lib name, for both debug and release, after having run CMake on it.

Windows is still a little cranky.


Well, ms has nuget now for C# development, and I think it supports C/C++, but I don't imagine the community is huge.


I enjoyed this a great deal, but one thing that I don't see touched on in any of these package management posts is the difference between system-level and user-level package management. Maybe it's because I use a Macintosh, and thus have never been subject to the joys of apt/yum/pacman et al, but I've never been comfortable with the idea that I could install some binary or library and it'd be puked into the system execution context. I use homebrew, which has its problems no lie, but I can upgrade my own versions of software that might conflict with system level stuff without fear.

I'd like the distribution to stick to its knitting and leave the decisions about my personal world of versioning and dependencies to me.


> Maybe it's because I use a Macintosh, and thus have never been subject to the joys of apt/yum/pacman et al, but I've never been comfortable with the idea that I could install some binary or library and it'd be puked into the system execution context.

I don't use a Macintosh. Is their OS system not built out of binaries or libraries? How are those managed, if not by a package manager? I was under the impression that Apple called their package manager "the App Store".

> I use homebrew, which has its problems no lie, but I can upgrade my own versions of software that might conflict with system level stuff without fear.

It's true that some systems' package managers don't support separation of each user's packages from each other, or from system-wide packages. For example, I've used dpkg and rpm which don't support that. However, some package managers can handle user-specific installation (e.g. Nix can), so I don't see how using a combination (e.g. using Nix for user-specific packages, in a Debian system managed by dpkg) is any different from using homebrew alongside whatever-the-Macintosh-package-manager-is.

I've found that using the same package manager for both makes life easier though; e.g. it avoids having two copies of something installed, since one PM didn't spot that the other had already installed it.

> I'd like the distribution to stick to its knitting and leave the decisions about my personal world of versioning and dependencies to me.

Personally, I consider myself to be in charge of my computers, so my personal decisions about versioning and dependencies apply to both user-specific packages and the whole system; e.g. if I want to test on different versions of Python, I should be able to; if I want to have my kernel built with different compiler flags, I should be able to.


> I don't use a Macintosh. Is their OS system not built out of binaries or libraries? How are those managed, if not by a package manager? I was under the impression that Apple called their package manager "the App Store".

The App Store isn't a package manager; it is a store where end users download applications, like Google Play, and it has a lot of restrictions regarding what can be published and distributed. One certainly cannot distribute libraries through the App Store or register alternative repositories. And the App Store doesn't resolve dependencies or stuff like that.


System-level package management is mostly an open-source-y phenomenon; it requires the maintainers of the package repository to be able to freely recompile everything from source (so that they can all use the same shared libraries). Over in Mac-land, the closest equivalent would be your OS updates (which ship the whole set of bits as a unit, and therefore do not require much in terms of management). Things in the app store are apps, i.e. not libraries, and they all have to carry their dependencies with them in a larger app bundle, thereby trading disk space for much simpler package management.

This is all separate from user-level package management (homebrew, nix, etc.), and project/application/source-level package management (what that article was actually talking about).


This is by the author, or one of the authors, of Glide, which so far is the best dependency manager for Go that I have come across: https://github.com/Masterminds/glide.


Glad you like glide, and thanks! Though I'm at the very most a co-author - I've contributed only ideas so far, and maybe a comment fix or two.

Now that I'm finally done writing this, maybe I can write some code.


Slightly off topic, but last time I tried, I couldn't add a new dependency with glide without upgrading existing dependencies. Can glide do it now?


One important issue that gave me a headache in the past (at least for NPM) is the reproducibility factor. You can't reproduce an installation using NPM.

That's because with NPM you can install a specific version of module A on time t0 that also installs, based on package.json, the latest version of module B. But on t1 there is a new version of module B, so the new installation of the same version of module A ends up having a different version of module B. Obviously, this problem expands dramatically when you use multiple modules with different degrees of depth.

Helpfully, mook pointed out the mention of shrinkwrap in the article.


npm-shrinkwrap [1] (mentioned in the article) is supposed to help with this; it makes a lock file (npm-shrinkwrap.json) that records the resolved version.

I think it started being used at work (by stuff I wasn't working on) because installing old, unchanged code broke after some HTTP proxying middleware for Express changed APIs or something.

[1]: https://docs.npmjs.com/cli/shrinkwrap
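The drift problem and the lock-file fix discussed above can be shown with a toy resolver. This is a Python sketch under stated assumptions: `resolve`, the registry dicts, and `module-b` are hypothetical, and the lock here is a plain dict, not the actual npm-shrinkwrap.json format.

```python
def version_key(version):
    """Turn '1.10.0' into (1, 10, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def resolve(requirements, registry, lock=None):
    """Pick a version for each requirement.

    A locked version always wins; otherwise "latest" means the newest
    version the registry knows about at the time of the install.
    """
    lock = lock or {}
    resolved = {}
    for name, wanted in requirements.items():
        if name in lock:
            resolved[name] = lock[name]
        elif wanted == "latest":
            resolved[name] = max(registry[name], key=version_key)
        else:
            resolved[name] = wanted
    return resolved


requirements = {"module-b": "latest"}
registry_t0 = {"module-b": ["1.0.0", "1.0.1"]}
registry_t1 = {"module-b": ["1.0.0", "1.0.1", "1.1.0"]}  # new release at t1

install_t0 = resolve(requirements, registry_t0)
install_t1 = resolve(requirements, registry_t1)        # drifts to 1.1.0
lock = dict(install_t0)                                # record what t0 resolved
locked_t1 = resolve(requirements, registry_t1, lock)   # reproduces t0

print(install_t0, install_t1, locked_t1)
```

Without the lock, the same manifest yields different installs at t0 and t1; with it, the t1 install matches t0 exactly.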


fwiw, folks on twitter suggested to me that npm may be moving in something like my "sync-based PDM" direction:

https://twitter.com/rebeccaorg/status/698370989386694657


Whilst this is true, it's clearly bad practice to have a 'latest' dependency in any module that is itself designed to be depended upon.

Personally I've never seen this be a problem in practice. Possibly because I'm lucky, possibly because I've only used dependencies that don't do this. Since I mostly use quite popular libraries, and check the dependencies of the more niche ones I decide to use for exactly things like this, I think the latter is probably the case.

Out of curiosity, I wonder if anyone here has hit this problem in practice with npm and could provide a concrete example - in particular with popular open source dependencies?

Yes it's a possible problem in theory, but a lot of problems in software are possible in theory but rarely occur in practice because people follow sensible patterns to guard against them.


I hit it with an unpopular package (like 10 stars on Github). A dependency of a dependency slightly changed its API in a patch release. Fun debugging. So glad we have this auto-install-newer-untested-versions feature as default.


I was one of the "lucky" people who experienced this. The main module was bitcore-wallet-client


With 7 links to the same essay in the last 12 articles, and an aggregate of 24 upvotes, I would have thought someone would have made a comment in at least one of them by now.

I'm only halfway through it - package managers aren't something I'm used to dealing with. It makes me wonder though, just how many package managers are there these days? Even within Python, I've heard of several.


If you use Debian or a derivative, consider pypi-install, cpan2deb and any wrapper which builds debian packages from $LANG repos.


TL;DR stop and use Nix.

SCNR.


Or its sister project Guix. I really like some of their ideas, e.g. the guix challenge command [1].

[1] https://www.gnu.org/software/guix/manual/html_node/Invoking-...


Sister project? That implies that they are somehow affiliated...


There's overlap in the community/technology, but they are separate projects. Oftentimes people think that Guix is a fork of Nix, but this isn't so. Fun fact: Guix was started by a former NixOS developer.


Yeah.

Interestingly there are several threads on the nix mailing-list right now discussing how to do just this: implement a language-specific package manager, for a brand-new language, using nix. One guy is even using it to replace make.


+1

Are there advantages in developing a new package manager today? There are many mature packaging systems (npm, Bower, Nix, OPAM, ...) which are not written in Go but could probably be adapted to support the Go ecosystem. Moreover, using an existing solution prevents the human cost of debating on "design choices".


You got there before me. Nix brings me so many happys every day.


Nix/Guix have been brought up a bunch of times in follow-ups. I was aware of Nix (though not Guix) before writing the piece, and I do need to check it out. But...

The primary requirement for a PDM is that developers will actually use it. If the interaction is much more complicated than the command diagrams I laid out towards the end of the general section, it's really just not gonna happen. Maybe someone could build a guix frontend that focuses in on just those things. I don't know. New territory for me.

My guess, though, is that SPMs and PDMs are really only superficially similar.


I'm a developer. I used it.

The important thing is that the underlying idea is rock solid, and the work of taming all the crappily packaged software out there is mostly done. So yes, there is work to do, but it's already getting easier. Big todos:

- Proper CI. We need to not merge PRs until we know the merge commit builds. (Travis is wrong in this regard.)

- Sane CLI. Things change over a decade (cough git cough), but the good news is we're going to replace the whole damn thing, zero fucks given for backwards compat, and make something sane even for those that don't appreciate/understand the underlying elegance.

- Support for institutional installations: A central build daemon + farm is a pretty lame way to synchronize things. Should make it so anybody can build stuff, and anybody can share binaries with those that trust them, and shared NFS store or equivalent for de-dup.

None of these things are hard at all.


> Support for institutional installations: A central build daemon + farm is a pretty lame way to synchronize things. Should make it so anybody can build stuff, and anybody can share binaries with those that trust them, and shared NFS store or equivalent for de-dup.

At the research site where I work as a software person, we use a central guix-daemon managing a shared NFS store. Anyone can build stuff and manage their own software profiles from cluster nodes. Works very well for us and gets us a big step closer to reproducible science.

Sharing build artifacts also works with Guix. You can either export from and import items to the store or use "guix publish" to share items via HTTP.


How does this work security-wise? Can any user deploy on your clusters? AFAIK Guix requires root/chroot. This would never fly at our site...


The daemon runs as root (on one server that has write access to the shared store) to spawn build processes in chroots (work is underway to use user namespaces where possible). The builds themselves are performed as unprivileged build users. Users communicate with the daemon via RPCs.

Since every build is forced to its very own unique output directory (by prepending a hash of all the inputs), one user's build/installation does not affect other users.
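The "unique output directory from a hash of all the inputs" idea can be sketched as follows. This is Python, not the actual hashing scheme used by Guix or Nix; `store_path` and its parameters are hypothetical, and the hash prefixes in the example inputs are placeholders.

```python
import hashlib


def store_path(name, version, input_paths, build_script, store="/gnu/store"):
    """Derive an output directory from everything that goes into a build.

    Any change to the version, the build script, or any input path yields
    a different directory, so builds never overwrite each other.
    """
    digest = hashlib.sha256()
    for part in [name, version, build_script, *sorted(input_paths)]:
        digest.update(part.encode("utf-8"))
        digest.update(b"\0")  # separator so fields cannot run together
    return f"{store}/{digest.hexdigest()[:32]}-{name}-{version}"


# Same package, built against two different (placeholder) numpy inputs.
alice = store_path("mytool", "1.0", ["/gnu/store/aaa-numpy-1.9"], "make install")
bob = store_path("mytool", "1.0", ["/gnu/store/bbb-numpy-1.10"], "make install")
again = store_path("mytool", "1.0", ["/gnu/store/aaa-numpy-1.9"], "make install")

print(alice != bob, alice == again)
```

Two users building the same package with different inputs land in different directories, while identical inputs map to the same directory and can share one build.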


Last I checked with Nix, NFS and SQLite did not play well. Does Guix work around that?


And Windows users? And Mac users?


I don't know about windows, but Nix runs on OS X.


There's work to make it run on Windows, but it proceeds slowly.


I wrote my own "package manager".

It's nothing more than three short portable shell scripts, using only ftp, sed, gzip, tar, rm, cd, etc.

Credit to the pkgsrc folks for making things simple enough that this is possible.

However my usage of "packages" is minimal. I prefer statically compiled binaries that I compile myself. And I write scripts to automate the fetching, patching and compiling.



