Which programming language has the best package manager? (versioneye.com)
48 points by reiz on Jan 16, 2014 | hide | past | favorite | 104 comments

This guy doesn't seem to understand the real differences between java/clojure packaging, and scripting language packaging.

Ruby's "gems" are source packages. When a gem is installed, arbitrary code is executed, C stubs (or even C libraries!) are compiled, and the resulting artifacts can be very different on different systems. Some Ruby gems even modify the Ruby code at install time. The same is true for Perl (CPAN) and Python (setuptools, pip, etc.).

Java and Clojure packages are artifacts. Each jar is a finished, built product. If it has JNI stubs, those stubs were pre-built against a specific version. The artifact is identical on every installed system. No arbitrary code is executed. Reasoning about the contents of a Clojure package does not require me to solve the halting problem.
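The distinction can be made concrete. Below is a minimal, hypothetical gemspec (the gem name and the `EXAMPLE_BUILD_STAMP` variable are invented for illustration): gem metadata is itself Ruby, so merely evaluating it runs arbitrary code, and the result can differ from machine to machine.

```ruby
require "rubygems"

# Hypothetical gemspec source: the metadata is a Ruby script, so merely
# evaluating it executes arbitrary code. EXAMPLE_BUILD_STAMP is an
# invented variable name, standing in for any machine-local state.
SPEC_SRC = <<~RUBY
  Gem::Specification.new do |s|
    s.name    = "example"
    s.version = ENV.fetch("EXAMPLE_BUILD_STAMP", "1.0.0")
    s.summary = "metadata computed on the machine that evaluates it"
  end
RUBY

spec = eval(SPEC_SRC)   # roughly what `gem build` does with a .gemspec
puts spec.version       # depends on the evaluating machine's environment
```

A jar, by contrast, ships its metadata as inert files (MANIFEST.MF, pom.xml) that can be read without executing anything.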

The artifact approach is unambiguously better for any production deployment. The source-based approach found in Ruby, Perl, and Python is a problem for me more often than a solution. It is hypothetically great that I can use the same "gem" on OSX and Linux, but it is more important to me that I get consistent deployments on two different Linux systems.

The artifact approach is unambiguously better for any production deployment.

Which is precisely why many Linux distributions use them, and "app stores" were already a thing in Debian in the mid-90s. Every time I read about a programming language's "package manager" I can't help but feel that the creators of such are reinventing the wheel, poorly, and mostly to paper over issues in backwards OSes that don't have proper package management.

Hmm. This is an appropriate implementation for an operating system, but I don't think it's suitable for a programming language.

To take Ruby as an example - I have lots of different interpreter versions installed. 1.8.7, 1.9.3, 2.0, 2.1, jruby, rubinius. There's some scope for managing this in OS package managers, but it's already complex. Then for each of these environments, I want a different set of packages with different versions (some are compatible with older versions, and some aren't, for example). And finally, I want different versions and combinations of these packages for different apps that I might want to run - again, there's limited scope to implement this in an OS package manager.

It's only a problem because the versioning on most ruby gems is broken, because Ruby package managers enable such breakage.

The big difference between dpkg and gem is that gem allows you to install more than one version of a gem at a time. With dpkg you overcome this by creating a whole new package when the major version changes. Thus I can install libcap and libcap2 simultaneously. Any breakages in minor version changes are considered significant problems.

This is the right thing to do in a production environment: minor & patch level increments are very likely bug fixes and security fixes, and should be applied to all packages that use that library.

But it's a PITA for development. And since package manager developers are mostly developers, they did what was right for them, and allowed multiple versions to be installed simultaneously.
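The versioning semantics being discussed can be sketched with RubyGems' own dependency class ("libexample" is an invented gem name). The pessimistic operator `~>` is what a production pin looks like: patch releases pass, minor bumps do not.

```ruby
require "rubygems"

# "~> 1.4.2" means ">= 1.4.2, < 1.5": patch-level updates (likely bug
# and security fixes) are accepted; minor and major bumps are refused.
dep = Gem::Dependency.new("libexample", "~> 1.4.2")  # invented gem name

puts dep.match?("libexample", "1.4.9")  # patch bump: accepted
puts dep.match?("libexample", "1.5.0")  # minor bump: refused
```

This is the gem-side analogue of the dpkg convention above: dpkg encodes the incompatible major version in the package name (libcap vs libcap2), while gems encode the policy in the constraint.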

Today, the right answer is to use something like Vagrant or Docker so that development environments are much closer to production environments.

There are two easy/sane ways to solve your problem.

1. RHEL / CentOS solve this problem with "software collections," aka "scl." Using scl gets you the advantages of ye olde retargetable RPMs but without all the hassle and suffering. Makes it relatively easy to scope your packages that way.

2. fpm doesn't do nearly as bad a job as it used to. Given a gem, it can correctly build it, assign its C dependencies, and assign a prefix so your gems for ruby-ee don't overlap with ruby19 or ruby18.

You ideally want both. Source packages for development work, and artifacts for deployment. I've switched to .debs for deployment. Oh so simple.

Oh yeah, don't get me wrong - that's lovely. There's nothing more frustrating than trying to get a mildly complex Rails app deployed on a mutable server.

.. or they don't like the limitation of only having 1 version of any artifact installed at a time, artifacts being installed globally, etc. etc.

Of course they could target Nix instead, but how many people use Nix?

"Of course they could target Nix instead, but how many people use Nix?"

More than use the new system they developed instead, at the outset.

You make it sound like "papering over issues in backward OSes" is a bad thing. The fact is, those "backward OSes" have a HUGE user base, no matter how much you may judge them for being inferior. There is clearly demand for a tool that:

1. Allows developers to distribute their stuff to users on those OSes.

2. Requires as little effort as possible on the part of the developer.

3. Does not require the developer to custom-package their stuff multiple times, once for each OS.

Face it, developers are lazy (not meant in a derogatory way). Most solutions do not meet requirements 2 and 3, which is why developers will largely choose to avoid them until they have no choice.

And also, what matthewmacleod said in this thread is very true.

> creators of such are reinventing the wheel

It's kinda funny how they named the latest and greatest in the Python world because of this: http://wheel.readthedocs.org/en/latest/

I don't think the artifact approach is unambiguously better. Sure, "consistent deployments on two different Linux systems" is important but you can achieve that with rubygems along with the extra flexibility that distributing the packages as source provides.

Two approaches you can take:

1. bundle package will package up all of your installed gems, which you can then deploy with your project.

2. Your build server can install all of your gems, and then you can package up the resulting directory and deploy it - you could also build e.g. an rpm package.

"bundle package" still just bundles the source packages.

Installing all of the gems then packaging the resulting directory is an example of an artifact-based approach.

I prefer to do it gem-by-gem using 'fpm' in order to maintain better dependency data -- fpm will invoke rpmbuild with the right options to find shared objects and automatically attach correct dependency data for the ruby runtime, C libraries etc

Python has wheel, which is a binary package format that pip supports. It is something I've been looking into for deploying Python packages in production.

Edit: Link to PEP for Wheel: http://www.python.org/dev/peps/pep-0427/

I used to be in the Ruby / Rails / Bundler camp, and this was indeed an annoyance.

But with Node.js, the typical deployment scenario for us, and I believe others as well, is to simply tar up the entire app including dependencies.

I'm not sure if this is possible with the other scripting languages you mention.

I also don't believe this is as big an issue in development for Node.js, because most packages are pure-JS, and don't run any build steps whatsoever.

Finally, I think it's also very convenient to have my dependencies' sources available in development. I trace into them quite often.

Bundler offers similar functionality, but it seems like most people don't use it. http://bundler.io/v1.5/bundle_package.html

"bundle package" caches the source packages, the gems themselves, not the artifacts produced by installing them.

I think virtualization can take care of consistent deployments. And developing under a virtualized container ensures consistency from multiple platforms.

My point is: a source-based approach is not the tool to blame for inconsistency. When it's used carelessly, that mistake is on the crafter.

I work on two different platforms as one. Virtualization achieves that.

Yes, freezing particular OS configurations into golden images on a virt platform can make a source-based approach repeatable. Unfortunately, this just moves the problem: I now have to maintain the golden images.

Relying on virtualization for repeatable source-based deployment is like screwing in a lightbulb by rotating the entire building around the bulb.

The artifact approach is unambiguously better for any production deployment.

Which is only a problem if you're still doing application deployments directly to production machines, which is becoming less and less common. Tools like Docker, Packer and Aminator allow you to do the package install at build time.

If anyone else was wondering: http://en.wikipedia.org/wiki/Halting_problem

Specifically, much gem metadata is stored as a ruby script. To get some information about a gem, I have to execute the ruby code.

Python libraries that use setuptools are the same way.

What gems modify the Ruby code?

puppet, facter, rake, mongrel use 'sed' to edit the code on the fly at install time, to set shebang lines etc

There are surely many others, these are just the ones I have at hand as examples.

I assume he means: any gems that monkey-patch any of the built-in classes. I believe ActiveSupport modifies things like Fixnum (1.days.from_now, for example).

Does it really happen at install time or (like in Python) at runtime everytime the package/module/gem is loaded/required?

ActiveSupport doesn't modify the base Ruby classes until it's been loaded. Merely installing gems will not impact anything except disk space.
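A minimal sketch of what this load-time modification looks like. This is a simplified, hypothetical stand-in for what ActiveSupport actually does (and modern Ruby uses Integer where older versions had Fixnum):

```ruby
# Reopening a core class, ActiveSupport-style. This happens only when
# the library is loaded into a process; nothing on disk is modified.
class Integer
  def days
    self * 24 * 60 * 60  # simplified: plain seconds, unlike the real API
  end
end

puts 1.days  # 86400, but only inside this process
```

Once the process exits, the patch is gone; installing the gem alone never triggers it.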

Yes. That's what I meant.

Ah. Ok. I was thinking something changing core files on disk.

I bet that's doable ...

Change string.c and recompile - if a gem can execute arbitrary code, it could do that.

They really should have added CPAN (for perl) to their list. Perl might not be the most popular language nowadays, but in my opinion, their package manager gets a lot of things right...

* centrally hosted

* mirrored all over the world

* packages are automatically checksummed & the checksum is compared on install

* it is quite easy to publish packages

and very important in my opinion

* it encourages you to write tests. Unit tests included in packages are automatically executed when it is installed. Many systems (semi)-automatically report back when the installation of a package fails. Hence you get tests on tons of platforms for your package

* pod documentation is automatically available on cpan.

More importantly, without CPAN, we might not have things like pip and npm.

Please check CPAN Testers and copy that to the Python world, too.

Not only will you get comprehensive tests and reports of all the packages with different OS/language versions, it is a great test of the language implementation with different C compilers etc.

We are constantly adding new package managers to our system. CPAN and Nuget are already on our radar. We will update the blog post in a couple months with them.

I'm picking up Python in my new job, and one thing that I dearly miss from Perl is an equivalent to Dist::Zilla (https://metacpan.org/release/Dist-Zilla). After a bit of initial setup (name, PAUSE account, etc), publishing is about as easy as

   dzil new My::Module
   <build library>
   dzil test #Make sure the build and tests go okay
   <set version and add changes to changefile>
   dzil publish

AFAICR, CPAN is PM's PM, and it's been that way for as long as I can remember. When is the next Perl resurrection?

No clear winner when npm has more dots than all of the others? What's the point in even comparing them, then?

I think npm's approach is pretty cool -- it sacrifices disk space for portability (package dependencies are always installed in the node_modules directory of the package root), so while you might effectively have 3 or 4 copies of the same coffee-script folder in your install, you don't have to worry about virtual environments for development and you don't have to worry about dependency upgrades breaking some other package. I think the nature of javascript allows this to be effective, considering that many dependencies are smallish snippets of code.

No. There is no clear winner. The point is to show how other communities are solving this problem. By comparing them you get maybe new ideas.

NPM is doing a pretty good job. If they signed their packages and made licenses mandatory, it would be the clear winner. Only 2 points are missing.

It appears to be biased towards npm given that one set of dots is "JSON Based" and could have just as easily been "XML Based." I mean strictly the chart, not the article. The article seemed fair.

In the first version of the chart I had a column for "XML". But it didn't look good, because there was only 1 candidate for XML, and that was Maven. That's why I dropped it again. Besides that, I think XML is kind of overkill for dependency management. JSON is just fine: much smaller and easier to read. Having the dependency definition in native code (Python, Groovy, whatever) is not a good decision, in my opinion, because it's a source of security vulnerabilities.

I understand, you make a choice that summarizes the data the best that you can; I wasn't trying to attack you. I was trying to point out that choice of a language for the representation of your configuration is exactly that, a choice.

I don't know that I agree that having the dependency definition in native code is a source of vulnerabilities; it could make the definition easier to write for people packaging code in that language. Also, I think JSON could be considered native code for npm since it is written in nodejs/javascript.

I didn't feel attacked. I appreciate feedback ;-) And you are right, the choice of a language is just a choice. But a dependency definition in native code is always a security vulnerability, because resolving the dependencies means executing unknown code, especially if the packages are not signed. I could, for example, publish a Python package on PyPI with a setup.py that contains code to delete files on your hard disk. The moment my setup.py gets executed on your machine, you lose some files. Something like that can not happen with JSON, XML or YML.
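The contrast is easy to demonstrate: parsing a declarative manifest is inert, and the worst a malicious one can do is raise a parse error. A sketch (the manifest content is invented):

```ruby
require "json"

# A declarative dependency definition is pure data: parsing it cannot
# execute anything, unlike evaluating a setup.py or a .gemspec.
manifest = '{ "dependencies": { "example": "~> 1.4" } }'
deps = JSON.parse(manifest)["dependencies"]

puts deps["example"]  # "~> 1.4"
```

Whatever ends up in such a file, the parser only ever produces strings, numbers, arrays and hashes.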

> package dependencies are always installed in the node_modules directory of the package root

Usually but not always. You can also have peer dependencies: http://blog.nodejs.org/2013/02/07/peer-dependencies/

More dots means the author has a deeper understanding of npm and thus found more to write about it.

Actually, in my daily work I use Bundler and Maven more. But I'm a big fan of NPM anyway.

I find it funny that the mother of all package managers, Perl's CPAN, is only mentioned in a footnote.

I'm sorry for that. But I'm getting requests every day for Bower, CocoaPods and Nuget. I don't get so many requests for CPAN. But it's on my radar and I will update the blog post sometime this year.

Doesn't CTAN predate CPAN?

It does, but I don't think TeX counts as a programming language.

FWIW, TeX is Turing complete (as is PostScript).

The details for Maven and Lein are a bit off. They both use the same repositories (I think they even use the same library to download packages), yet they are reported as having different qualities in this area (licensing, mirrors, etc.). I don't think licenses are mandatory for either, and there are tons of mirrors. You can even create your own pretty easily [1] (at least for the subset of libraries you actually use).

[1] http://www.sonatype.org/nexus/

Maybe I was not clear enough on that point. Of course Lein is using a Maven Repository as backend. But the official Clojure Repository is not search.maven.org it is https://clojars.org. And there is no Mirror of clojars.org. Of course it's possible to mirror it, because it's a Maven Repo. But currently nobody is doing that.

> Usually GEMs are not cryptographically signed! This can lead to security issues! It is possible to sign GEMs, but it’s not mandatory and most developers don’t sign their GEMs.

Unfortunately, Nobody Cares About Signed Gems: http://www.rubygems-openpgp-ca.org/blog/nobody-cares-about-s...

We've tried making a difference in Phusion Passenger by setting an example, and supporting gem signing (http://www.modrails.com/documentation/Users%20guide%20Nginx....). All Phusion Passenger gem releases since 4.0.0 RC 4 (1 year ago) are signed. All our other gems (default_value_for, etc) are also signed. Unfortunately not many people followed.

We'll continue to sign all our stuff, but it's sad that it never took off community-wide.

I read that blog post a couple days ago. Too sad. I think people don't sign their Gems because it's some extra work and developers are lazy. I bet the artifacts in Maven are only signed because it's mandatory. You can not submit an unsigned artifact to search.maven.org. They will decline it. But in the intranets of many companies there are a lot of self-hosted Maven Repositories, and believe me, nobody is signing the Jars there!

If we want to have more security in the Ruby community then there is only one way. RubyGems has to decline every unsigned Gem. Signing Gems must be mandatory.

I have to disagree with this statement: "They learned from the failures of other package managers. It’s almost prefect!"

I find the following cons with NPM:

- shrinkwrap does not work as well as the alternatives in Bundler (yes, that's plural).

- specifying versions as git hashes does not verify the hash on "npm install"

- npm link is more awkward than using :path => in Bundler.

The big difference between NPM and Bundler is that NPM allows common dependencies of dependencies to have different versions. This is usually awesome, except when it isn't -- it can cause subtle breakages if you try to pass an object from one dependency to another. This is very rare though, and if I had to choose, I'd choose the NPM behaviour.

I didn't know that about the git hash. That's a good point! What I like about NPM is that they use JSON for defining the dependencies and that packages are by default not installed globally. NPM and Bundler differ in some ways, but in general they both do a good job.

I'd also add:

- Allowing multiple versions of the same package in one application is asking for Cthulhu to climb out of the ocean and burn a hole in your mind and your program.

One axis that matters to me and is largely unaddressed here: how easy it is to set up your own package server, and whether the package manager can deal with more than one package source at a time. It's mentioned in passing, but it would be good to know how well this works in a little more depth.

It's virtually trivial to set up a gem server, for instance, but I wouldn't know where to start for the others.

Good point. I will take that thought into the next blog post. The next blog post will cover CPAN and Nuget, too.

He missed my most crucial requirement: Can a package manager support more than its chosen language? Am I forever consigned to vendor lock-in?

Every one of these tools is 100% focused on supporting a single language. They go so far as to use that language as the build spec, so even attempting to bend one to another platform feels alien. It's okay if I only ever want to use Ruby, or if JavaScript will be my thing for the next decade. But what if I want to take my existing npm or gem workflow and knowledge and apply it to a new language? I have to start all over from scratch. I'll have no idea how to fetch dependencies or package up my stuff for sharing.

If I start to tinker with more than a handful of languages, my head is going to be full of details on package management rather than APIs or library capabilities. Who wants that?

Very good point. I didn't take that into the article because these package managers are all built for 1 single language. But I totally get your point. A language-agnostic package manager, or at least a language-agnostic repository with a clearly defined API, would be awesome! The clients could still be different for each language, to handle language-specific problems, but the repository server could be the same.

Great overview, but do you think it's possible to compare package managers between different programming languages? I mean they all work quite differently when it comes to dependencies, build process etc. I'm a Python coder & pretty happy with PIP and PYPI...

Yes. I think comparison is possible and important. Of course dependencies are handled a bit differently in every language. But on the other side they all do the same, more or less. The differences are more on the client side.

PIP is a good tool. But in my opinion the definition of dependencies shouldn't be in a setup.py file, because it's too easy to execute random code in a setup.py. A JSON file is the better choice for a project file, in my opinion.

I agree, when you're solving a problem, is package management a big enough issue to influence what language you're going to choose? I think it's an interesting question and not sure of the answer. My gut says no, but others might think otherwise.

fwiw, julia's package manager is interesting - it's a wrapper around git. i don't know the details, but it seems to work well and allows you to have both "standard" and "in development" packages.

As someone who is new to frontend/js development: Bower is a lifesaver.

It's currently the only solution; there are no alternatives. It's cool that they built it, but currently it has many weak points. I hope they will add user auth and some kind of validation soon.

Seems odd that they'd totally miss out Pip for Python.

As I understand it, pip == PyPI.

When you type 'pip install package_name', pip connects to PyPI to retrieve package_name.

PIP and Python are on the list. Look at PyPI.

Am I the only person that gets confused between PyPI and PyPy?

I personally am quite fond of Haxelib. First of all, installing Haxe with Haxelib was dirt simple, but what I like is that it seems to always resolve dependencies properly; it's easy to upgrade single packages, grab dev versions, or upgrade all your libraries at once. Updating the language or Haxelib itself is also easy. And it's fast as hell.

In fact, installing the entire Haxe toolchain is the easiest I've ever experienced.

Bundler is just a dependency manager, not a package manager. It's the same as saying apt-get is a package manager when dpkg is the real package manager.

Well. You are right. Next time I will pick my words more carefully.

Rubygems has at least one public mirror: http://stackoverflow.com/a/17150536/437888

There are also solutions such as https://github.com/pusewicz/rubygems-proxy allowing you to cache / proxy to rubygems.org

But the mirror posted there doesn't work for me. I always get a Forbidden message: http://mirror1.prod.rhcloud.com/mirror/ruby/. Or is it just me?

You are not supposed to be able to hit the url in your browser. You are supposed to use it as a gem source in a Gemfile or with the gem command line tool.
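For example, a Gemfile can point at the mirror as its source. A config sketch (the mirror URL is the one from the linked answer; the gem line is illustrative):

```ruby
# Gemfile fragment: use a mirror as the gem source instead of rubygems.org.
source "http://mirror1.prod.rhcloud.com/mirror/ruby/"

gem "rails", "~> 4.0"  # illustrative dependency
```

Bundler then resolves and fetches everything through the mirror rather than the canonical index.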

There is also http://gems.gzruby.org/ if you happen to be behind the Great Firewall of China

Thanks for the hint. I really didn't know that. I will update the blog post with my new knowledge. One point more for Bundler/RubyGems :-)

To be exact, Composer allows you to install packages which don't have a license, but you cannot publish a package on Packagist anymore if it does not have a license (and packages which were submitted before the license became required are not updated anymore in the registry until they specify the license)

Nils Adermann just confirmed that. I updated the blog post and the info graphic.

As a long time maven user who recently started using PyPi, I have to say that PyPi is a mess. I frequently get broken packages, timeouts, etc. Most python users seem to have been trained to keep local copies, and/or to download once and tarball it up, and don't see a problem.

Good point. My next package manager after Maven was Bundler/RubyGems, and it was enlightening. I don't use PyPI so much, but I heard that the core committers are working on a big refactoring.

Also worth mentioning, Maven manages dependencies but it's primarily a build system.

Gradle is another build system with dependency management.

The initial setup for publishing an artifact to Maven Central is a high barrier, but Sonatype makes it easier. Even easier is publishing to Clojars.

Yeah, that is a good point. I guess I should have pointed that out in the article.

But I still think that the initial publishing on Maven Central is more difficult than it should be. Pretty much every other package manager does a better job on publishing. It took me less than a minute to publish my very first Ruby Gem. But it took me 1 week to publish my very first Jar file on Maven Central.

Clojars is doing better.

I had real issues with NPM when trying to work on a spotty network connection and offline. It really really wanted to go and hit the network over and over to check versions.

Was just working on this with the npm developers. https://github.com/npm/npm/issues/4131

I was actually thinking more of something like Maven, which has a local cache that it checks first, like everything should, and doesn't bother with the net if there is no connection. No desktop app seems to be able to do that.

GEM isn't an acronym. It's just a name.

Sent from my Apple MAC.

Thanks for the note. Actually I don't know why I wrote it like an acronym. I should know better!

A much better title to this article would be "Comparing Programming Language Package Managers".

You will laugh. That was my first title. I showed the article around 2 days ago, and a good friend of mine told me: "You need a more provocative title!"

Has anyone tried doing something like npm for C?

Or would we consider apt-install <whatever>-dev to be that?

I talked to many C devs, and they all use the native Linux package managers, for example apt-get, yum or RPM.

Yeah, that's what I use, but that also means that on Windows I sometimes have annoying troubles hunting down dependencies. Then again, there would be the temptation to somehow bolt that functionality onto cmake, and then we'd all be doomed.

That's true. That's the disadvantage of that system. If there were a language-agnostic package manager, would you use it?

For all the Java devs who want to escape Maven insanity, there is Ivy.

Lots of people seem to be moving over to Gradle now? I don't suppose anyone can provide an opinion on Gradle vs. Maven?

Personally I've always found Maven to be a pain, but a necessary evil for larger projects / teams. I've tried using it for smaller personal projects and the benefit wasn't worth the time invested.

I'd rather not invest more time learning Ivy / Gradle / anything else until someone can convince me it's not the same old crap under a different label...

Good point! I will blog more about that. I think there is a reason why there are so many build tools in the Java ecosystem: obviously there is no clear winner, and people are not satisfied with the current status.

Will Gradle ever expose an API so developers can use the same language for the build scripts as they do for the projects they're building, i.e. Groovy with Gradle for Groovy projects, and Scala with Gradle for Scala projects?

The Gradle overview at http://www.gradle.org/overview says "The Gradle design is well-suited for creating another build script engine in JRuby or Jython. It just doesn't have the highest priority for us at the moment."

Perhaps if Gradle easily enabled any JVM language to be used, there'd be far greater uptake. There's a few of us out here who don't like using Groovy.

No Nuget?

Nuget is on my radar. I will update the blog post in a couple months with Nuget and CPAN. It just takes some time.

C, with no doubt possible.

I'd say Scala's SBT. Detractors say it's too hard, but unlike other build tools, it lets you get the job done with minimal fuss.

I've been pleasantly surprised using SBT... it's been doing the job for me for a year now and I haven't had any problems.

