I can't believe the article doesn't mention it.
I've been using NixOS for development and on my desktop, and we're in the middle of transitioning to it for production deployments too.
Nix (the package manager, not the distribution) solves so many of the discussed problems. And NixOS (the Linux distribution) ties it all together so cleanly.
I keep my own fork of the Nixpkgs repository (which includes everything required to build the entire OS and every package). This is like having your own personal Linux distribution, but with the simplest possible way of merging changes from upstream or contributing back.
I use it like I'd use virtualenv.
I use it like I'd use chef.
I use it like I'd use apt.
I use it like I'd use Docker.
In addition to Nix, there is also a newer project: GNU Guix. Guix is built on top of Nix but replaces the custom package configuration language with Scheme, among other differences. https://gnu.org/software/guix/
When package management is solved at the system level, our deployment situation becomes a whole lot better. I used to do a lot of Ruby programming. Wrestling with RVM and bundler was a real pain, especially since bundler was incapable of helping me with the non-Ruby software that I needed as well, like libmysqlclient, imagemagick, etc. Using Nix/Guix, you can throw out hacky RVM (which overrides bash built-ins like cd!) and simply use a profile that has the right Ruby version.
Bye pip, bundler, composer, CPAN, puppet, ansible, vagrant, ..., and hello Nix/Guix!
Personally, I'm rather more keen on Nix; the language is pretty much designed for writing JSON-style configuration except as a formal programming language, which is what the vast majority of Nix code is (both package definitions and system configurations).
Additionally, with Nix, you can be close to certain that if you build something twice, you'll get the same result, because it can't access impure resources.
Finally, because Guix is a GNU project, the official repositories are going to go nowhere near non-free software. Nixpkgs contains non-free software, although disabled from installation by default. You might be a little less likely to have people help you get non-free software working on the GNU Guix mailing lists, if you happen to use any.
Guix uses an embedded domain specific language that is also designed for easily writing package recipes, but it uses s-expressions instead of something that is "JSON-style". Also, Nix build scripts are written in Bash, whereas Guix build scripts are written in Scheme. I think that makes Guix more consistent in its programming style.
>Additionally, with Nix, you can be close to certain that if you build something twice, you'll get the same result, because it can't access impure resources.
Guix has this same certainty because it uses the Nix daemon, and the defaults are a bit stricter than Nix.
>Finally, because Guix is a GNU project, the official repositories are going to go nowhere near non-free software.
That doesn't mean that you can't host your own non-free packages or use someone else's non-free packages. But yes, Guix does not ship with packages that ask the user to give up their freedom. To me, that's an advantage.
On the other hand, in order to write a Guix build script, you have to know Scheme (and whatever libraries Guix provides for this task) rather than utilising your existing knowledge of writing shell scripts.
> Guix has this same certainty because it uses the Nix daemon, and the defaults are a bit stricter than Nix.
Really? So you don't actually get access to any of the I/O Scheme libraries from Guix? My understanding (and it seems the understanding of several other people) is that while Guix uses the Nix daemon and thus derivations (and thus build processes) are pure once generated, the process for generating them from the Scheme code is not guaranteed to be so, given that the Scheme code can do practically anything.
Of course, you might not actually write non-deterministic Scheme code, but it's nice to have the guarantee that given a .nix file and a specific version of nixpkgs, the build will always come out to the same result no matter what the creator of that file has done.
To learn Nix you need to learn both how to write shell scripts and how to write Nix expressions. How is that better than just learning Scheme, whose basics are trivial to pick up? For most packages you don't really need to learn much anyway, because you can reference other packages; it's really just like a configuration file.
One of the goals of GNU is to really make Guile ubiquitous: used for configuration of packages, build processes, service configuration (via DMD) and software configuration/extension. There should be no need to learn dozens of different configuration formats and languages; Scheme is the only language you'll need to be able to fully drive your OS. (Well, perhaps not strictly true, you'll probably still need to use the shell, but you'd preferably write Guile scripts rather than plain Bash.)
> My understanding (and it seems the understanding of several other people) is that while Guix uses the Nix daemon and thus derivations (and thus build processes) are pure once generated, the process for generating them from the Scheme code is not guaranteed to be so, given that the Scheme code can do practically anything.
You can do anything from the shell too (which nixpkg can invoke) - you can even invoke guile from a shell script. The guarantee given by both systems is that the build happens in an isolated environment (via chroot), and it doesn't matter what general purpose computation happens inside the environment.
Neither Nix nor Guix makes guarantees about the resulting binary from a build process: we do not yet have reproducible builds (https://wiki.debian.org/ReproducibleBuilds). The only guarantee made by both PMs is that packages have an identity which is a hash of their source, dependencies and build instructions. Changing the build instructions results in a new derivation, so unless you have some crazy package that deliberately tries to make itself non-reproducible, you should get approximately/functionally identical binaries from building, even if they are not bit-exact. Obviously we'd like ReproducibleBuilds in both systems, to be able to authenticate the actual build via its hash.
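To make the "identity is a hash of source, dependencies and build instructions" idea concrete, here is a minimal sketch. It is not the actual hashing scheme used by Nix or Guix; the function name `package_identity`, the field names and the sample inputs are all made up for illustration.

```python
import hashlib
import json


def package_identity(name, source_hash, build_script, dependency_ids):
    """Return a hex digest derived from every input to the build.

    Hypothetical scheme for illustration only, not Nix's or Guix's.
    """
    payload = json.dumps(
        {
            "name": name,
            "source": source_hash,
            "build": build_script,
            # Sort so the identity does not depend on listing order.
            "deps": sorted(dependency_ids),
        },
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Made-up inputs: the two ruby packages differ only in build instructions.
libc = package_identity("libc", "src-aaa", "make && make install", [])
ruby_a = package_identity("ruby", "src-bbb", "./configure && make", [libc])
ruby_b = package_identity("ruby", "src-bbb", "./configure --disable-docs && make", [libc])
```

Because a dependency's identity is itself an input, changing anything lower in the tree changes the identity of everything built on top of it. Note that the hash covers the inputs only; it says nothing about whether two builds produce bit-identical output.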
Except that as a Linux admin, you probably already know how to write shell scripts anyway, and you'd be rather hard-pressed to manage a Linux system while never once having to write or read one. There's also a lot more information on writing shell scripts than Guile scripts for various common purposes.
> scheme is the only language you'll need to be able to fully drive your OS.
With a different DSL for each use, meaning you have to learn the restrictions of each DSL anyway.
> You can do anything from the shell too (which nixpkg can invoke) - you can even invoke guile from a shell script.
That's true, although in Nix's case, the derivations map directly to the source, as Nix code itself can't call out to anything (only return derivations which might) - and in theory, the build could then happen in a sandbox, making it even more likely that the result would be the same.
We might not have reproducible builds yet, but Nix is closer to having them than Guix if somebody wanted to make a research project out of it.
I worked with a lot of platforms, such as PHP, Perl, Ruby, Python, Node.js and .NET. I felt the pain of pip, easy_install, setup-tools, virtualenv, bundler, gems, cpan, pear, rvm, rbenv, npm, bower, apt-get or whatever else I used at some point or another. And I swear, in spite of all the criticism that Java or Maven get and in spite of all warts, in terms of packaging and deployment for me it's been by far the sanest. I mean, it's not without warts, heaven forbid to end up with classpath issues due to transitive dependencies, but at the very least it is tolerable.
I don't think being locked into the JVM is a very good solution. Java libraries can depend on other non-Java components.
Nix doesn't require you to specify the entire dependency tree; each dependency specifies its own dependencies, and those are resolved during the build process.
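A rough sketch of what "each dependency specifies its own dependencies" means in practice: you name only what you use directly, and the full set is computed by walking the declarations. The package names and the `DECLARED` table below are invented for the example; this is not how Nix itself is implemented.

```python
def closure(package, declared):
    """Return the package plus everything it needs, dependencies first."""
    ordered = []
    seen = set()

    def visit(name):
        if name in seen:
            return
        seen.add(name)
        for dep in declared.get(name, []):
            visit(dep)
        # Appended only after all of its dependencies have been appended.
        ordered.append(name)

    visit(package)
    return ordered


# Hypothetical declarations: each entry lists only its direct dependencies.
DECLARED = {
    "my-app": ["ruby", "imagemagick"],
    "ruby": ["openssl", "zlib"],
    "imagemagick": ["zlib"],
    "openssl": [],
    "zlib": [],
}

build_order = closure("my-app", DECLARED)
```

Here `my-app` never mentions `zlib` or `openssl`, yet both end up in `build_order` ahead of the packages that need them.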
(For the record, at least Haskell, Python, and node.js packages are pulled into the nixpkgs tree from their respective package repositories regularly, albeit many missing native dependencies; there's a separate file that you can edit and send a pull request for packages which have native dependencies.)
Also, "Java" is mentioned twice in the article but I can't find mention of OCaml Functors. I thought they solved most package problems even before Nix was around?
The basic idea is
- do away with modules
- all functions have unique distinct names
- all functions have (lots of) meta data
- all functions go into a global (searchable) Key-value database
- we need letrec
- contribution to open source can be as simple as contributing a single function
- there are no "open source projects" - only "the open source Key-Value database of all functions"
- Content is peer reviewed
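A toy sketch of the proposal in the list above, assuming an in-memory dict stands in for the global database. The names `REGISTRY`, `publish` and `search`, and the metadata fields, are hypothetical, and Python is used only for illustration; the original proposal was made in the context of Erlang.

```python
# Global key-value store: unique function name -> function plus metadata.
REGISTRY = {}


def publish(name, func, **metadata):
    """Add one function to the store; names must be globally unique."""
    if name in REGISTRY:
        raise KeyError(f"function name already taken: {name}")
    REGISTRY[name] = {"func": func, "meta": metadata}


def search(keyword):
    """Return the names of functions whose description mentions the keyword."""
    return sorted(
        name
        for name, entry in REGISTRY.items()
        if keyword in entry["meta"].get("description", "")
    )


# Contributing a single function, with made-up metadata.
publish(
    "list_reverse_v1",
    lambda xs: xs[::-1],
    description="reverse a list",
    author="example-contributor",
)
```

Even in this tiny version the metadata is already longer than the function it describes.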
These are discussed in no particular order below:
This is the crux of the problem. Before too long, the amount of metadata dwarfs the thing it describes and it's easier to rewrite the function than it is to find or describe it.
Modules and sub-module hierarchy offers a greater, simpler organizational methodology.
Seriously? Is that his solution to package management?
Keep in mind that Joe Armstrong is talking about Erlang here, which is a functional language - most of the functions in libraries are sort-of kind-of independent from each other; they especially don't share state.
There seems to be a fundamental disconnect between the people making languages and how people actually use those languages. I don't have time to follow your Twitter feed, because I'm working on a lot of different things. I know it's important to you, the Language Developer, and so you think it should be important to me, the Language User. But I have dozens of things to keep track of, and all of them imagine that they're the most important thing in my world.
It's like the old office culture mocked in "Office Space" where the guy has 7 different bosses, each imagining their own kingdom is the most important.
Package management itself is not a solved problem, so you can't very well expect programming languages to be any different. The existing systems work quite well and make total sense: your package manager is tailored to your specific use case. Centralized/decentralized is a red herring. First figure out how to package every single thing for every single system and use case in the world, and then come back to me about organizational systems.
Every programming language has its own package manager because it's written in that language. No language maintainer is going to say something like, "Hey, want to use Ruby? Just install Perl first so you can install some Ruby packages!"
Likewise, every OS-level package manager assumes an OS. I'm sure apt-get, and yum and Nix are great. I'm also sure their greatness isn't very helpful to Windows users.
There's also the dependency between those two. An OS-level package manager can't easily be written in a high-level language, because one of its core jobs is to install high level languages. A language-level package manager doesn't want to re-invent the OS stack.
Bootstrapping is hard. Package managers sit very very low on the software stack where any dependencies are very difficult to manage and where consolidation is nigh impossible.
"Stable" distributions have an additional downside he doesn't mention: when you upgrade every package all at once it's a LOT more effort than if you had upgraded them slowly over time. Dealing with multiple library changes at once is an order of magnitude more difficult than dealing with them one-at-a-time.
And also, to some extent, if all the libraries you are using have a long term stable API, then it doesn't actually matter which one you pick - anything is painless.
Curious... I have exactly the opposite experience. I find that a certain amount of time is required to carefully regression-test my application code after upgrading a library. Doing this 23 times for my 23 different dependencies that need to be upgraded can be quite costly. If I, instead, upgrade all of the libraries at once and perform my extensive regression testing just once, I save a great deal of effort.
That's if everything goes smoothly. If something does NOT go smoothly and I encounter an error, then I need to determine which upgrade caused the problem. Most of the time (85% perhaps?), that turns out to be easy and obvious just by looking at the error that presents itself. In the remaining cases, I simply roll back half of the package upgrades and start binary-searching to identify the culprit (or culprits in the case of a conflict between libraries).
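The binary search described above could be sketched roughly like this. It assumes exactly one culprit, a non-empty list, and upgrades that can be applied independently of each other; the `passes` callback is a hypothetical stand-in for "apply these upgrades and run the regression suite". The conflict case (two libraries that only break together) needs more than this.

```python
def find_culprit(upgrades, passes):
    """Binary-search for the single upgrade that breaks the test suite.

    `passes(applied)` must return True when the suite passes with only
    the upgrades in `applied` installed.
    """
    candidates = list(upgrades)
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        if passes(half):
            # The first half is clean, so the culprit is in the rest.
            candidates = candidates[len(candidates) // 2 :]
        else:
            candidates = half
    return candidates[0]


# Made-up scenario: 23 upgrades, one of which breaks the suite.
upgrades = [f"lib{i}" for i in range(23)]
culprit = find_culprit(upgrades, lambda applied: "lib17" not in applied)
```

With 23 upgrades this takes at most five test runs, on top of the one run that revealed the failure.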
From my experience exactly the opposite is true. Compare upgrading Slackware to keeping an Arch Linux system running. With Slackware, I have to sit down for an hour, do the upgrade, read the notices that come along with it, and maybe see if it will break any of my custom packages. This happens once or twice a year (security upgrades are completely painless as they don't break things). With Arch Linux I need to do that every day. If I don't have time to do it for a month, the system is basically broken beyond recognition...
I'm fine with having the odd out of date version of something, I'm just saying: be incremental about keeping your stuff up to date.
If you're very lucky, the packaging in question will not conflict horribly with apt or yum. So you probably won't be lucky.
That's what I wish the language designers had avoided. It would be convenient for users if there were never any reasons to want anything but the most recent version.
I know Perl and similar languages tend to value rapid development over complete reliability, but I'd prefer not to think about version numbers or worry about updates that break things. Maybe if the package management systems had been designed with less rope, users would more rarely get hung.
Maybe this time we can talk about how to meaningfully solve these problems instead of just fighting pointlessly about whether old tools are so great that they should be used for everything.
Decentralized package management huh?
How would that work?
A way of specifying an ABI for a package instead of a version number? A way to bundle all your dependencies into a local package to depend on and push changes from that dependency tree automatically to builds off of it, but only manually update the dependency list?
I'm all for it. Someone go build one.
http://0install.net/ does this (sad to see it wasn't mentioned in the article). Basically:
1. Use URIs rather than short names to identify packages.
2. Scope dependencies so different applications can see different versions of the same library where necessary.
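A minimal sketch of those two points, not 0install's real data model or API: packages are keyed by URI, and each application carries its own view of which version of a library it sees. The `Application` class and the example.org URIs are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Application:
    uri: str
    # Scoped per application: library URI -> version this app should see.
    dependencies: dict = field(default_factory=dict)

    def resolve(self, library_uri):
        return self.dependencies[library_uri]


# Hypothetical URIs; a URI names the library without any central registry.
LIB = "https://example.org/feeds/libfoo.xml"

editor = Application("https://example.org/feeds/editor.xml", {LIB: "1.4"})
viewer = Application("https://example.org/feeds/viewer.xml", {LIB: "2.0"})
```

Both applications depend on the same library URI, but because the mapping lives on the application rather than in a global table, they can see different versions side by side.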
Here's an OSNews article from 2007 about such things:
Technically impossible for many languages (have fun figuring out what it would look like in Perl...). And even when it's possible, it's not a guarantee: you can have a semantic change without an ABI change. Cargo, Rust's newfangled package manager, supposes semantic versioning, and I think it's a sane attitude.
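For reference, the kind of check that assuming semantic versioning buys you looks roughly like the sketch below. It handles plain MAJOR.MINOR.PATCH strings only; treating a change of minor version as breaking while the major version is 0 is a conservative choice made for this sketch, and the function names are made up.

```python
def parse(version):
    """Split 'MAJOR.MINOR.PATCH' into a tuple of three integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def is_compatible(required, candidate):
    """True if `candidate` may stand in for `required` under semver rules."""
    req, cand = parse(required), parse(candidate)
    if cand < req:
        return False  # older than what was asked for
    if req[0] == 0:
        # Pre-1.0: assume any minor bump may break (sketch assumption).
        return cand[:2] == req[:2]
    return cand[0] == req[0]  # same major version means no breaking change
```

For example, `is_compatible("1.2.3", "1.4.0")` is true and `is_compatible("1.2.3", "2.0.0")` is false. As noted above, this is a promise made by the package author, not something the tool can verify: a semantic change can slip through without any version bump.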
We've taken a pretty good shot at this in the OCaml ecosystem via the OPAM package manager (https://opam.ocaml.org).
* OPAM composes its package universe from a collection of remotes, which can be fetched either via HTTP(S), Git, Hg or Darcs. The resulting package sets are combined locally into one view, but can be separated easily. For instance, getting a view into the latest XenAPI development trees just requires "opam remote add xapi-dev git://github.com/xapi-project/opam-repo-dev".
* The same feature applies to pinning packages ("opam pin add cohttp git://github.com/avsm/ocaml-cohttp#v0.6"). This supports local trees and remote Git/Hg/Darcs remotes (including branches).
* OCaml, like Haskell, is statically typed, and so recompiles all the upstream dependencies of a package once it's updated. This lets me work on core OCaml libraries that are widely used, and just do an "opam update -u" to recompile all dependencies to check for any upstream breakage. We did not go for the very pure NixOS model due to the amount of time it takes to compile distinct packages everywhere. This is a design choice to balance composability vs responsiveness, and Nix or 0install are fine choices if you want truly isolated namespaces.
* By far the most important feature in OPAM is the package solver core, which resolves version constraints into a sensible user-facing solution. Rather than reinvent the (rather NP-hard) solver from scratch, OPAM provides a built-in simple version and also a CUDF-compatible interface to plug into external tools like aspcud, which are used by other huge repositories such as Debian to handle their constraints.
This use of CUDF leads to some cool knobs and utilities, such as the OPAM weather service to test for coinstallability conflicts: http://ows.irill.org/ and the solver preferences that provide apt-like preferences: https://opam.ocaml.org/doc/Specifying_Solver_Preferences.htm...
* Testing in a decentralized system is really, really easy by using Git as a workflow engine. We use Travis to test all incoming pull requests to OPAM, much like Homebrew does, and can also grab a snapshot of a bunch of remotes and do bulk builds, whose logs are then pushed into a GitHub repo for further analysis: https://github.com/ocaml/opam-bulk-logs (we install external dependencies for bulk builds by using Docker for Linux, and Xen for *BSD: https://github.com/avsm/docker-opam).
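To give a feel for what the solver core mentioned above has to do, here is a deliberately naive backtracking sketch. It is not OPAM's solver and does not use CUDF; the `solve` function, the integer versions and the `UNIVERSE` table are all invented for the example.

```python
def solve(wanted, universe, chosen=None):
    """Pick one version per package so every constraint holds, or return None.

    `wanted` is a list of (package name, set of acceptable versions).
    `universe[name][version]` maps each dependency name to the set of
    versions it accepts. Plain backtracking; real solvers are far smarter.
    """
    chosen = dict(chosen or {})
    if not wanted:
        return chosen
    (name, allowed), rest = wanted[0], wanted[1:]
    if name in chosen:
        # Already decided: the earlier choice must satisfy this constraint too.
        return solve(rest, universe, chosen) if chosen[name] in allowed else None
    for version in sorted(universe[name], reverse=True):  # prefer newest
        if version not in allowed:
            continue
        deps = list(universe[name][version].items())
        result = solve(rest + deps, universe, {**chosen, name: version})
        if result is not None:
            return result
    return None  # no version works; the caller backtracks


# Hypothetical package universe.
UNIVERSE = {
    "app": {1: {"http": {1, 2}, "json": {2}}},
    "http": {1: {"json": {1}}, 2: {"json": {1, 2}}},
    "json": {1: {}, 2: {}},
}

solution = solve([("app", {1})], UNIVERSE)
conflict = solve([("app", {1}), ("json", {1})], UNIVERSE)
```

The second call fails because `app` insists on `json` version 2 while the user asked for version 1. Reporting that kind of conflict in a way a user can act on is the hard part that dedicated tools like aspcud exist for.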
All in all, I'm very pleased with how OPAM is coming along. We use it extensively for the Mirage OS unikernel that's written in OCaml (after all, it makes sense for a library operating system to demand top-notch package management).
If anyone's curious and wants to give OPAM a spin, we'd love feedback on the 1.2beta that's due out in a couple of weeks: http://opam.ocaml.org/blog/opam-1-2-0-beta4/
Also, you can pick which version of the compiler to run, and have it manage switching everything.
It seemed like it was years ahead of cabal, but that might just be because I only used it a little, I don't know. But there are some things to learn from OPAM.
Do you have a blog post like this, or something I could post to the Haskell subreddit?
(How to pin a development is central to the day-to-day development workflow of OCaml/OPAM users and quite annoying to change after-the-fact, so we're eager for feedback on this iteration before we bake it into the 1.2.0 release).
The OPAM blog is only about 2 weeks old, so there are quite a few more posts coming up as our developers discover there's quite a lot to write about :)
I'm quite excited.
The real problem is that it's so powerful and hard to ramp up on... The docs aren't sufficient for its overall complexity. That all aside, if the will were there, it could be the git of package managers.
* Quality and Trust mechanisms. If there are 14 different postgres clients, which do I choose?
* Package Metadata management. Where can I send bug reports? Who is the maintainer? How can I contact someone? Is there an IRC channel?
* Documentation and Function/Class Metadata. Why should I go to the Github README for one package, and to a random domain for another package?
* Linking compile and runtime error messages to documentation or bug reports. Why is google still the best way to track down the cause of an obscure error message?
* Source data linking and code reviews. I should be able to type in a module/namespace qualified function name and view the source without having to scour a git repository. I should also be able to comment directly on that source in a way that is publicly visible or privately visible.
I want to illustrate this with a detailed example of something I did just the other day, when I set up the structure for a new single page web application. Bear with me, this is leading up to the point at the end of this post.
To build the front-end, I wanted to use these four tools:
- SASS (a preprocessor to generate CSS)
Notice that each of these directly affects how I write my code. You can install any of them quite happily on its own, with no dependencies on any other tool or library. They are all actively maintained, but if what you’ve got works and does what you need then generally there is no need to update them to newer versions all the time either. In short, they are excellent tools: they do a useful job so I don’t have to reinvent the wheel, and they are stable and dependable.
In contrast, I’m pretty cynical about a lot of the bloated tools and frameworks and dependencies in today’s web development industry, but after watching a video by Steven Sanderson (the creator of Knockout) where he set up all kinds of goodies for a large single page application in just a few minutes, I wondered if I was getting left behind and thought I’d force myself to do things the trendy way.
About five hours later, I had installed or reinstalled:
- 2 programming languages (Node and Ruby)
- 3 package managers (npm with Node, gem with Ruby, and Bower)
- 1 scaffolding tool (Yeoman) and various “generator” packages
- 2 tools that exist only to run other software (Gulp to run the development tasks, Karma to run the test suite) and numerous additional packages for each of these so they know how to interact with everything else
And this lot in turn made some undeclared assumptions about other things that would be installed on my system, such as an entire Microsoft Visual C++ compiler set-up. (Did I mention I’m running on Windows?)
I discovered a number of complete failures along the way. Perhaps the worst was what caused me to completely uninstall my existing copy of Node and npm — which I’d only installed about three months earlier — because the scaffolding tool whose only purpose is to automate the hassle of installing lots of packages and templates completely failed to install numerous packages and templates using my previous version of Node and npm, and npm itself whose only purpose is to install and update software couldn’t update Node and npm themselves on a Windows system.
Then I uninstalled and reinstalled Node/npm again, because it turns out that using 64-bit software on a 64-bit Windows system is silly, and using 32-bit Node/npm is much more widely compatible when its packages start borrowing your Visual C++ compiler to rebuild some dependencies for you. Once you’ve found the correct environment variable to set so it knows which version of VC++ you’ve actually got, that is.
I have absolutely no idea how this constitutes progress. It’s clear that many of these modern tools are only effective/efficient/useful at all on Linux platforms. It’s not clear that they would save significant time even then, compared to just downloading the latest release of the tools I actually wanted (there were only four of those, remember, or five if you count one instance of RequireJS).
And here’s the big irony of the whole situation. The only useful things these tools actually did, when all was said and done, were:
- Install a given package within the local directory tree for my project, with certain version constraints.
- Recursively install any dependent packages the same way.
That’s it. There is no more.
The only things we need to solve the current mess are standardised, cross-platform ways to:
- find authoritative package repositories and determine which packages they offer
- determine which platforms/operating systems are supported by each package
- determine the available version(s) of each package on each platform, which versions are compatible for client code, and what the breaking changes are between any given pair of versions
- indicate the package/version dependencies for a given package on each platform it supports
- install and update packages, either locally in a particular “virtual world” or (optionally!) globally to provide a default for the whole host system.
This requires each platform/operating system to support the concept of the virtual world, each platform/operating system to have a single package management tool for installing/updating/uninstalling, and each package’s project and each package repository to provide information about versions, compatibility and dependencies in a standard format.
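As a rough idea of how little information such a standard format would need to carry, here is a sketch. The `Release` record, its field names, the constraint strings and the sample index are all hypothetical; no existing package format is being described.

```python
from dataclasses import dataclass, field


@dataclass
class Release:
    name: str
    version: str
    platforms: set
    # platform -> {dependency name: version constraint string}
    dependencies: dict = field(default_factory=dict)


# Made-up index entries for one package.
INDEX = [
    Release("imgtool", "1.0.0", {"linux", "windows"},
            {"linux": {"zlib": ">=1.2"}, "windows": {}}),
    Release("imgtool", "2.0.0", {"linux"},
            {"linux": {"zlib": ">=1.3"}}),
]


def available(index, name, platform):
    """Versions of a package that are offered on the given platform."""
    return [r.version for r in index if r.name == name and platform in r.platforms]


def dependencies(index, name, version, platform):
    """Dependencies of one release on one platform."""
    for r in index:
        if r.name == name and r.version == version and platform in r.platforms:
            return r.dependencies.get(platform, {})
    raise LookupError(f"{name} {version} is not available on {platform}")
```

A release that does not work on a platform is simply not listed for it, so `available(INDEX, "imgtool", "windows")` only offers 1.0.0.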
As far as I can see, exactly none of this is harder than problems we are already solving numerous different ways. The only difference is that in my ideal world, the people who make the operating systems consider lightweight virtualisation to be a standard feature and provide a corresponding universal package manager as a standard part of the OS user interface, and everyone talks to each other and consolidates/standardises instead of always pushing to be first to reinvent another spoke in one of the wheels.
We built the Internet, the greatest communication and education tool in the history of the human race. Surely we can solve package management.
So now that we know what to do, the big question is: who's going to spend the next 5-10 years of their life on that project?
But this is my point: We are already solving all of those problems, and doing almost all of the work I suggested.
All of the main package managers recognise versions and dependencies in some form. Of course the model might not be perfect, but within the scope of each set of packages, it is demonstrably useful, because many of us are using it every day.
All of the people contributing packages to centralised package repositories for use with npm and gem and pip and friends are already using version control and they are already adding files to their projects to specify the dependencies for the package manager used to install their project — or in many cases, for multiple package managers, so the project can be installed multiple different ways, which is effectively just duplicated effort for no real benefit.
All major operating systems already come with some form of package management, though to me this is the biggest weak point at the moment. There are varying degrees of openness to third parties, and there is essentially no common ground across platforms except where a few related *nix distributions can use the same package format.
All major operating systems also support virtualisation to varying degrees, though again there is plenty of scope for improvement. I’ve suggested before that it would be in the interests of those building operating systems to make this kind of isolation routine for other reasons as well. However, even if full virtual machine level isolation is too heavyweight for convenient use today, usually it suffices to install the contents of packages locally within a given location in the file system and to set up any environment accordingly, and again numerous package managers already do these things in their own ways.
There is no need for multi-year ISO standardisation processes, and there is no need to have everything in the universe work the same way. We’re talking about tools that walk a simple graph structure, download some files, and put them somewhere on a disk, a process I could have done manually for the project I described before in about 10 minutes. A simple, consolidated version of the best tools we have today would already be sufficient to solve many real world problems, and it would provide a much better foundation for solving any harder problems later, and it would be in the interests of just about everyone to move to such a consolidated, standardised model.
These all happen regularly when OS maintainers have to package software for release. They spend thousands of hours to resolve [by hand] each one in order to support the various use-cases of their end users. If you are imagining some automated process just magically makes all your software come together to build you a custom development environment, you are mistaken. It's all put together by humans, and only for the use cases that have been necessary so far.
So yes, all these things exist. In small, bespoke, use-case-specific solutions. What you're asking for - universal software management standardization - can't practically be achieved in more than one use case. This is why we are all constantly stuck in dependency hell, until a bug is filed, and the system is once again massaged into a working state by a human. Frustrating, sure. But it works most of the time.
And yet npm remains a useful tool, and mostly it does what it should do: download a bunch of files and stick them somewhere on my disk. The same could be said for gem, pip, Bower, and no doubt many other similar tools. They just all do it a bit differently, which leads to a huge amount of duplicated effort for both the writers/maintainers and the users of these packages.
I’m not arguing for magic or for orders of magnitude more work to be done. I’m just arguing for the work that is mostly being done already to be co-ordinated and consolidated through standardisation. To some extent I’m also arguing for operating systems that include robust tools to navigate the modern software landscape as standard, mainly because installing things with tools like apt has an unfortunate way of assuming there should be one global copy of everything, which is frequently not the case for either development libraries or end user software on modern systems, and because if the OS doesn’t provide good universal package management tools then someone else will immediately invent new tools to fill the gaps and now we are back to having overlapping tools and redundancy again.
And no, it isn't code publishers that spend thousands of hours resolving broken and incompatible builds, it's release maintainers. Go look at bug lists for CentOS. Look at the test trees for CPAN. It is literally mind numbing how much shit breaks, but it makes total sense when you realize it's all 3rd party software which largely is not designed with each other in mind. Somebody is cleaning it all up to make it work for you, but it sure as shit ain't the software authors.
Once you develop enough things or maintain enough things you'll see how endlessly complex and difficult it all is. But suffice to say that the system we have now is simpler than the alternative you are proposing.
Sure you can. Projects of all scales do this all the time. Have you never heard C described as being portable assembly language?
Unless you are writing low-level, performance-sensitive code for something like an operating system or device driver, usually details like endianness matter only to the extent that they specify external protocols and file formats. I would argue that this sort of detail is normally best encoded/decoded explicitly at the outer layers of an application anyway.
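A small sketch of what encoding explicitly at the outer layer can look like, using Python's standard `struct` module. The header layout (a 32-bit length followed by a 16-bit version, both little-endian) is invented for the example.

```python
import struct

# "<" fixes the byte order to little-endian and disables padding, so the
# layout is the same on every host. Fields: uint32 length, uint16 version.
HEADER = struct.Struct("<IH")


def encode_header(length, version):
    """Serialise the header to its external 6-byte form."""
    return HEADER.pack(length, version)


def decode_header(data):
    """Parse the first 6 bytes of `data` back into (length, version)."""
    return HEADER.unpack(data[: HEADER.size])
```

Everything inside the application works with ordinary integers; only these two functions know what the wire format looks like, so the host's native byte order never leaks into it.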
Obviously if you rely on primitive types like int or long in C or C++ having a specific size or endianness, or if you assume that they will be equivalent to some specific external format, you’re probably going to have problems porting your code (and any package containing it) across some platforms.
However, that issue does not contradict what I proposed. It’s perfectly viable — indeed, it’s inevitable — to have packages that are only available on some platforms, or packages which depend on different things across platforms. That’s fine, as long as your packaging system doesn’t assume by default that the same thing works everywhere.
> And no, it isn't code publishers that spend thousands of hours resolving broken and incompatible builds, it's release maintainers.
Who is the “release maintainer” who made those jQuery libraries I mentioned in my extended example above play nicely together?
Again, this issue does not contradict what I proposed anyway. In my ideal world, if packages are incompatible or don’t have sufficient dependencies available on a certain platform, you just don’t list them as available for that platform in whatever package index they belong to. Once again, this is no harder than what a bunch of different package management tools do (or fail to do) right now.
It's not something you can make generic like a file/folder based version control tool. It's like asking for the Git of unit testing/continuous integration or whatever, not going to happen.
It needs to do this because each application is sandboxed. For most uses a generic packager is fine though. After all, most languages also have RPM, Deb, packages etc.