Package managers should be immutable, distributed and decentralized (evertpot.com)
137 points by treve on March 23, 2016 | 61 comments



Asking for immutable / append-only is insanity, because eventually someone WILL clutter the hell out of it, and all of your mirrors will hate you for it, as will downstream ecosystem toolchains. The only viable way to allow it is by vetting every upload, and uh, good luck finding volunteers for that.

Decentralized / Distributed is a good thing to have and prior art already exists.

Source: Experience with the granddaddy, CPAN.


> The only viable way to allow it is by vetting every upload, and uh, good luck finding volunteers for that.

Debian seems to have no problems with that.


Why was the parent comment downvoted?

Major Linux distributions (Fedora/Debian/SuSE/etc.) are an excellent example of a proper solution to the "dependency hell" problem for C programs. JavaScript is the new C, but for the web, so the same solution will appear for the same problem.


Debian vets the hell out of the developers, but once you're a dev you can update any package. It's a major faux pas to push an update to a package that you're not responsible for without a very good reason, but since every upload is signed with the uploader's key this isn't a problem.


> Debian seems to have no problems with that.

Because Debian developers do not package all sorts of crap.


Yeah, no. This is 90% of the reason why I stopped using Debian based distros. There's a complete lack of consistency with how they do their package management to the point that I eventually started to build most things from source. Too many outdated and unpatched packages.

Archlinux is a bit better, but the crazy amount of timeouts and the everything-goes mentality is getting pretty frustrating.

CentOS/RHEL isn't much better, for what it's worth. Dear god, those repos....


Immutability doesn't mean you need to have the whole history locally. For example, in git you can just check out one branch while the whole repo is on the server. For bitcoin there are clients which download only a part of the blockchain, even though every transaction is immutable and cryptographically signed.


It is not insanity. It is actually what many sane package managers do. Allowing someone to "unpublish" something that many depend on - that is insanity.


Vetting new modules is what the drupal.org project does, and the backlog is massive. People don't want to submit new projects because of this new-module hell.

After their first module they can create as many new modules as they want, though.


Worth noting too, that there is a strong community movement AWAY from the thorough vetting process and TOWARDS an unvetted, open system modeled after npm et al.


Maven Central (Java) is like that. So far, not many problems.


Nix (https://nixos.org/nix), probably?


Speaking of Nix, I just watched this video by Eric Merritt from recent Erlang Factory conf:

https://www.youtube.com/watch?v=xRSFJH3Lw6I

He is the author of one of the popular Erlang books, and here he talks about using Nix in production instead of just pure language-specific package managers (or OS package managers).

Some really good insights on immutability, reproducibility, and of course real production use examples.


Yes, Nix, but Nix that uses IPFS (https://ipfs.io) as a distribution medium for packages. This way a package will stay published as long as someone is using it.


Yeah, they're getting there. The binary package cache is a start. If it cached upstream tarballs and git revs as well as builds, it would mitigate the sort of thing the OP talks about.

Nix still needs some sort of distributed trust model, though. Without that, nixos.org is a single point of failure.


You can use your own binary caches and of course your own clone of nixpkgs repo. I think moving the binary caches to IPFS would also be a fun project.


A really powerful thing that I think is extremely underutilized for this is the ability to sign git tags. Using that it should be very possible to allow for trusted pinning against a source code version that is easily replicated in mirrors.


Why not pin against the git hash?


I wrote the above on my phone, so I probably could have expanded more. I don't just mean pinning for a local build, I mean being able to publicly pin a version as being 'released' through signed commits, and that state propagating through mirrors in a way you can trust. This would make your git repo and your gpg key the source of truth on package versioning rather than offloading it to a mutable centralized service.

It'd still be valuable to be able to propagate revocations, to indicate versions that are no longer valid (because of security issues or what have you), but it would be entirely possible to have mirrors available for widely used packages that still have the correct signed sha for that version for projects that can't or shouldn't move on for whatever reason.

Basically I'm saying a released version of a package could quite easily be a tuple of the following:

- gpg key of project releaser

- sha of release tag

With auxiliary information relevant to discovering both newer versions and mirrors of older versions:

- canonical git repo where the tag and signature can be found, as well as future revisions of the same

- (optional) mirror repos where you can find the above if the canonical repo is not available.
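
To make the tuple above concrete, here is a minimal sketch in Python. The class and field names are my own invention for illustration, not an existing format, and it does no signature checking at all; it only shows that the key and the tag sha form the identity while the repo URLs are just hints for finding it.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Release:
    """Hypothetical release record; all names here are illustrative."""
    signer_key_fingerprint: str      # gpg key of the project releaser
    tag_sha: str                     # sha of the signed release tag
    canonical_repo: str              # where the tag and signature can be found
    mirrors: Tuple[str, ...] = ()    # optional fallback repos

    def identity(self) -> Tuple[str, str]:
        # Only the key and the tag sha identify the release.
        # Repo and mirror URLs are discovery hints and may change.
        return (self.signer_key_fingerprint, self.tag_sha)

    def fetch_candidates(self) -> Tuple[str, ...]:
        # Try the canonical repo first, then the mirrors in order.
        return (self.canonical_repo,) + self.mirrors
```

Two records that point at different repos but carry the same key and sha compare equal by `identity()`, which is the property that lets a mirror stand in for a canonical repo that has gone away. Verifying the tag signature against the key would still have to happen with git and gpg themselves.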


Because while it's difficult, I don't think it's impossible to force a hash collision, making it theoretically possible to push malicious code that would look as if it was the trusted version you want to use.


Signed Git tags reference the same hash (commit id) that you could pin against directly.


We need blockchains for releases, where projects, etc. are addressed by hash. Search for your project; the name can freely change, but the hashes are immutable, and release hashes guarantee the integrity of all subordinate artifacts (files, metadata). Perhaps if you want to publish a new project, you have to first find a free "block", with basically linear (slowly increasing) proof-of-work. Then, for distribution, something like BitTorrent to distribute bandwidth and resilience across many systems (companies can choose to host their own complete public mirrors).
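
A rough sketch of the "release hash covers all subordinate artifacts" part, in Python. The function names and the choice of SHA-256 are assumptions for illustration; this leaves out the proof-of-work and the distribution side entirely.

```python
import hashlib
from typing import Dict


def artifact_hash(data: bytes) -> str:
    """Hash a single file or metadata blob."""
    return hashlib.sha256(data).hexdigest()


def release_hash(artifacts: Dict[str, bytes]) -> str:
    """Hash a release as the digest of its sorted (path, artifact hash) pairs.

    Hypothetical scheme: changing any path or any byte of any artifact
    changes the release hash, while the project name is not part of it.
    """
    h = hashlib.sha256()
    for path in sorted(artifacts):
        h.update(path.encode("utf-8"))
        h.update(b"\0")
        h.update(artifact_hash(artifacts[path]).encode("ascii"))
        h.update(b"\n")
    return h.hexdigest()
```

Because the paths are sorted before hashing, two people who assemble the same files in a different order still arrive at the same release hash.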


To achieve decentralization, we need a better model of trust for OSS at Github scale. If there was a way to verify the authenticity of packages, we could even use existing centralized repositories. I've been tinkering on an approach to this at https://github.com/chromakode/signet as a proof of concept.


I hope one day Cargo (package manager for Rust) will run over IPFS. :)


As far as I know (after searching), there's no IPFS module for Rust. IPFS + GPG would be ideal, and I think that as one of the youngest major languages, Rust should not wait and should lead in this other aspect of security!


The good thing is that IPFS exposes an HTTP API, so doing language bindings should be quite easy. For example, JS bindings for IPFS: https://github.com/ipfs/js-ipfs-api


Yeah, that's a way to do it!


Go technically would have been immune to such an attack, unless of course they took down the project's GitHub repository, but it's possible to host your Go code anywhere with git or another VCS. This is one thing I have always loved about Go.


Exactly. People have been laughing at Go for advocating vendoring. Now we have very concrete evidence that this is a sane approach.

The vendoring mechanism advocated by Go is better than a central registry (like npm, PyPI, RubyGems or crates.io).

crates.io’s FAQ claims that a central registry is good for discoverability (find popular packages) and speed (fetch only necessary data instead of the whole git repository), but a proxy could provide the same benefits to an ecosystem based on decentralized repositories and vendoring.


Not saying that NPM is perfect but you can host your node modules wherever you want as well. Supports tarballs/folders/git out of the box.


If that's the case, then why don't the users of that package update their dependencies to his github repositories? I know it sounds like a pain, but users shouldn't be so reliant on a service that could one day disappear, which is one reason I like Go's approach.


https://ipfs.io would be able to accomplish this goal.


Semi-related, but recently Sam Boyer wrote a good article about writing a package manager. He explores the topic quite well, indicating that there are several things that are conflated as package managers.

https://medium.com/@sdboyer/so-you-want-to-write-a-package-m...


The issues here are 1) authenticity (tampering impossible) as well as 2) distributed source control (GIT already is a best-in-class FOSS for SC). So we need to make GIT able to publish something that is a BlockChain-encoded item (File metadata + Content bytes, on a tree/directory structure) then we're done!

Come to think of it GIT already is able to map commits to checksums. That's practically everything that is needed. As soon as someone implements a BLOCKCHAIN GIT protocol, we're done. The problem will essentially be solved. Or in summary: Implementing a BLOCKCHAIN that stores GIT commits, along with some metadata for each commit.

The main point here is that BlockChain is good for source control as well as finance. BlockChain can store anything, but what matters here is redundant, verifiable storage of all sources.


What if an archived version of a module is found to contain errors or security holes? How would they fix that when the module is frozen in the BlockChain? Add a blacklist?


No, you just publish a new release with a suitable semantic version number. The point of the blockchain is immutability of each release, not limiting a package to only ever exist as a single release.
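
To illustrate, here is a minimal append-only release log in Python. It is only a hash-chained list, not a real blockchain (no consensus, no proof-of-work), and the class and field names are made up for the sketch.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: Dict[str, str]) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


class ReleaseLog:
    """Hypothetical append-only log: releases can be added, never changed."""

    def __init__(self) -> None:
        self.entries: List[Dict[str, str]] = []

    def publish(self, name: str, version: str, content_sha: str) -> Dict[str, str]:
        # An existing (name, version) pair can never be published again.
        if any(e["name"] == name and e["version"] == version for e in self.entries):
            raise ValueError(f"{name} {version} is already published")
        prev = self.entries[-1]["entry_hash"] if self.entries else GENESIS
        body = {"name": name, "version": version, "content_sha": content_sha, "prev": prev}
        entry = dict(body, entry_hash=_digest(body))
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        # Recompute every hash and check that each entry links to the one before it.
        prev = GENESIS
        for e in self.entries:
            body = {k: e[k] for k in ("name", "version", "content_sha", "prev")}
            if e["prev"] != prev or e["entry_hash"] != _digest(body):
                return False
            prev = e["entry_hash"]
        return True
```

A security fix is then just `publish("left-pad", "1.0.1", ...)` on top of the existing 1.0.0 entry. The old entry stays where it is, and editing it after the fact makes `verify()` fail.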


Something like an Ethereum contract could be used for this. And the smart contract could also allow for deprecating/flagging bad releases as long as the transaction comes from the original author.


A clickbait title (also not the original title) with buzz keywords.

A package manager should be fast and easy to use, with good conflict resolution (and useful information when this fails), fast downloads and a web of trust behind it.

Not incidentally, you get this from major GNU/Linux distributions.


I wonder why this article has gone 10 hours without a title change?


We already have the immutable repo that you are looking for. We do all our builds from a caching proxy server that pulls in new binary objects when we need them, but serves up the old one for the 99% majority of repeat builds.

Artifactory is one package that does all of this, but in the past I have rolled my own using various packages to cache Maven repos, Ubuntu packages and Python PyPI packages. It is not that hard, and it makes you mostly unaffected by Internet outages and repos that go away for some reason.

And since the cache is under your control you can publish your own binary objects to it, and clean it up too if you really want to get rid of something.
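
The core of such a caching proxy is small. Below is a minimal sketch in Python, assuming a file-per-URL cache directory and an injected `fetch_upstream` callable; both are my own simplifications and not how Artifactory works internally.

```python
import hashlib
from pathlib import Path
from typing import Callable


class ArtifactCache:
    """Hypothetical read-through cache for build artifacts."""

    def __init__(self, root: Path, fetch_upstream: Callable[[str], bytes]) -> None:
        self.root = root
        self.fetch_upstream = fetch_upstream  # e.g. an HTTP download function
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, url: str) -> Path:
        # One file per URL, named by the hash of the URL.
        return self.root / hashlib.sha256(url.encode("utf-8")).hexdigest()

    def get(self, url: str) -> bytes:
        path = self._path(url)
        if path.exists():
            # Repeat builds are served locally, even if upstream is gone.
            return path.read_bytes()
        data = self.fetch_upstream(url)
        path.write_bytes(data)
        return data

    def publish(self, url: str, data: bytes) -> None:
        # Put your own binary objects into the cache.
        self._path(url).write_bytes(data)

    def evict(self, url: str) -> None:
        # Clean up something you really want to get rid of.
        path = self._path(url)
        if path.exists():
            path.unlink()
```

Only the first request for a given URL touches the network. After that the upstream repo can disappear and the build keeps working.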


An example would be "https://github.com/whyrusleeping/gx", a Go package manager built on IPFS.


Happy archlinux pacman user chiming in.


Unless it has changed since I last used Arch some years back, Pacman still doesn't have mandatory package signing.


Pretty sure it has changed since then. At least, I used to make the same complaint as you, and a friend who uses Arch recently told me what I'm saying is way out of date.

https://wiki.archlinux.org/index.php/Pacman/Package_signing


Well, then it has changed. That is good.


NPM has a huge and strong community. Within a couple of days you will see replacements for those modules on npm again, just under a different name.

Not to mention most of the modules on NPM that are maintained by one person are usually less than 1k lines of code.

If you have a small project then it would not be a huge thing to replace them; if you have a big project, you've probably already done that.


Unless one of those modules is used as a dependency in a lot of the other modules that you use, in which case a bunch of different people all need to update before your project is fixed. Which is exactly what happened in this case.


Sadly there's even something worse that can happen at this point. https://news.ycombinator.com/item?id=11343985


Why even have package managers? Serious question.

Do you really enjoy bloating your projects with 3 versions of the same package because it is required by 3 other packages that have been independently maintained?

Why would you not want to know what is in your app, if you spend so many man-hours building it?

Version control for all projects seems to be a much better solution than package management.


If that is truly a serious question and not just feigning surprise, then I have a serious question for you; have you ever worked on a project with so many people and so much history there is no way that you can personally understand it all?


Yes.


Why have tire factories? Don't you want to be confident that you know how your tire is made? Why don't we just make tires for all of our vehicles?


It's pretty crazy to me to think that so many people were relying on a module that merely left-pads a string.


I think trusting people is actually a good thing which the node community has going for them. The opposite culture of vetting everything and making it really cumbersome to publish artifacts would most likely have meant that 'left-pad' was never published.


>...they can even modify already-released packages.

I keep seeing people say that, but NPM rejects it if you try to publish a version that already exists. Am I misunderstanding what the author is saying?


It may be counting unpublishing as a modification.


I don't really use npm, but I imagine you could unpublish the version and then republish the version as a modified one?


Using hashes as identity rather than names would solve these problems.
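
A toy illustration in Python of what "hash as identity" means. The store class is hypothetical and in-memory only, and SHA-256 is just my choice for the sketch.

```python
import hashlib
from typing import Dict


class ContentStore:
    """Hypothetical content-addressed store: the key is the hash of the bytes."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        # Storing the same bytes twice is a no-op; nothing can be overwritten.
        self._blobs.setdefault(digest, data)
        return digest

    def get(self, digest: str) -> bytes:
        data = self._blobs[digest]
        # Re-check on the way out so a corrupted or swapped blob is rejected.
        if hashlib.sha256(data).hexdigest() != digest:
            raise ValueError("stored content does not match its hash")
        return data
```

A dependency list then pins hashes instead of names, so republishing different code under an old name cannot change what you get. A name is only a convenience for looking up a hash.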


What about security? So much software, so little of it is being reviewed...


and use dht_rss for management of versions over bittorrent?

http://libtorrent.org/dht_rss.html


Another reason to use blockchain technology for trust management.



