
> Python packaging in general is such a messy ecosystem

Not just messy. It’s probably worst in class.

I at least can’t come up with a single worse package-management story among the platforms I do know.

Edit: I’m talking about platforms with actual package management which sucks, not platforms with no package management altogether.



As someone who's been in software development for the better part of two decades now, I would say that's far from the truth.

Yes, it's not great right now, and it's not completely clear whether I should recommend pip+venv, pipenv, pyenv, poetry, or conda to a new user, and that's a big problem. But Python was actually quite early to handle dependencies and packaging in a standardized and structured way. I remember that most of the projects I worked on early in my career more or less completely relied on vendoring dependencies; if packaging systems existed, they were either very complex or the one that came with your OS/distro.

However, Python currently needs to catch up: we need some alignment and a clear path forward. It would also be great to have an official way of building artefacts. There are plenty of ways today, but no great blessed way to recommend to a newcomer.
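To make the fragmentation concrete, here is a sketch of the same "create an environment and add a dependency" task per tool. Only the stdlib route is guaranteed to be available; the others are shown commented out because each is a separate install with its own workflow (the commands are illustrative, not a recommendation):

```shell
# stdlib route: venv + pip (available since Python 3.3 / pip bundled via ensurepip)
python3 -m venv demo-env
demo-env/bin/python -m pip --version   # pip is bootstrapped into the venv

# pipenv:  pipenv install requests
# poetry:  poetry init && poetry add requests
# conda:   conda create -n demo python=3.11 && conda install requests
```

Each tool keeps its own metadata (Pipfile, pyproject.toml, environment.yml), which is exactly the alignment problem described above.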


> As someone who's been in software development for the better part of two decades now, I would say that's far from the truth.

As someone who has had to deal with Python's packaging a lot, I would say that it's really, really bad. Maybe not the worst in the world, but it is much closer to the worst than to the best.

There are 15 ways to package Python modules, none of which are feature complete (but a lot of them pretend to be). Every year or so someone decides that they need to write another packaging tool for Python, and then they give up before it is feature complete.

A tool like virtualenv is useful, but it overpromises and underdelivers (this is the overarching theme of Python packaging imo): it does not completely separate your environment from the system. E.g. virtualenv still uses the host system python packages cache, and it does something rather nasty with .so's that get copied from the host system...
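One easily observable sign of that incomplete separation (assuming a POSIX system with a stock CPython): by default the environment's interpreter is just a symlink back to the host python, so upgrading or removing the host interpreter can break every environment created from it:

```shell
# Create an environment and inspect its interpreter.
python3 -m venv leaky-env

# On POSIX, bin/python is a symlink to the host interpreter, not a copy.
ls -l leaky-env/bin/python
readlink leaky-env/bin/python
```

`python3 -m venv --copies` copies the binary instead, but shared libraries and the host's package cache are still involved, so it is isolation by convention rather than a true sandbox.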

And don't get me started on dependency management in Python. When the dependencies of your dependencies make breaking interface changes, you're in for a world of pain.

I find it amusing, in a sad way, that the language that prides itself on "there is one way to do it" screwed up so badly in the packaging department by not having one good way to do it. Meanwhile the language whose motto is "there is more than one way to do it" has a standard, sane way to package things, with multiple tools that work together in a coherent way. Perl got this one right.


> Meanwhile the language whose motto is "there is more than one way to do it" has a standard, sane way to package things, with multiple tools that work together in a coherent way. Perl got this one right.

I was always impressed by CPAN compared to most other package managers I've used. And, yeah, between Python, Perl, Node, Ruby, and even PHP, Python is by far the most inscrutable and the most prone to giving me fits when existing Python apps like Lektor just... inexplicably break after one of the Homebrew versions of Python gets updated.


Oh, glad that I wasn't the only one appalled at how this breaks Python "rules" !


To add to the confusion, there’s also setuptools, easy_install, wheels, eggs, ...


I mainly use virtualenv and would recommend that to people until they have this problem trying to use matplotlib on osx: https://matplotlib.org/3.1.0/faq/osx_framework.html

osx includes a "non-framework" build of python making it hard to use matplotlib

Actually just now finding out about "venv" in the standard library introduced in python 3.3
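For anyone else discovering it late, the stdlib `venv` workflow is a minimal sketch like this (no third-party virtualenv needed for the common case):

```shell
# Create and enter an environment using only the standard library.
python3 -m venv .venv
. .venv/bin/activate                       # puts .venv/bin first on PATH

# sys.prefix now points inside .venv rather than the system install.
python -c 'import sys; print(sys.prefix)'

deactivate                                 # restore the previous shell state
```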


Maven basically solved Java's dependency problems back in 2005.


Serious question: Isn't maven the reason that you can't really package Hadoop? It is my (high level) understanding that it is at the core of (or a big part of) why Hadoop is so hard to build on your own vs VM.


Hadoop is hard to package because build scripts are treated as second-class citizens in most projects. There's a high standard of code review for many projects, but for the build scripts it's "it works, merge it".

I've built quite big projects with Maven (>2 million LOC) without issues.


Well, C/C++ dependencies are also a hell to reliably setup on different platforms.


But C/C++ is messy because it doesn’t have a language/platform-provided package-management system, while python actually does.

So that’s apples to no oranges, I guess. And it still doesn’t leave python looking particularly good.


But Python is a mess because initially it didn't have such a system either. Then a bunch of them were introduced, and some of them became official, but there was never one that solved all the problems.


I would consider the packages shipped by most distros as platform-provided package-management. It's not specific to C/C++, but if I'm developing something for say, Debian, I'm going to use as many system-provided libs as possible.


Have you tried Conan[1] by chance? I've only used it for small projects, but it was pretty nice.

https://conan.io/


I thought NPM was everyone's favourite hated package manager/repository?


The faults aren't so much with the actual NPM software as with the whole ecosystem. The real root of the problem IMO is that Javascript has such a tiny std lib compared to other popular languages that have package management systems. This encourages lots of people who are missing various functions normally found in std libs to write packages implementing various combinations of those functions. Those who are writing larger packages then depend on various combinations of those little helper packages. So of course, if you need to use multiple large packages to do something useful, they'll tend to pull in a huge forest of tiny packages in a bunch of versions.


NPM itself was also not great, but it has become much better over the last few major versions. The weakness of the stdlib is still a problem, but it is very, very slowly getting better with new ECMAScript and Node.js versions.


Hit the nail on the head.


left-pad am-i-right?


It clearly has its faults, but it is simple to use, requires no “venvs”, is CI-friendly and mostly does what it’s supposed to with few surprises.

Much unlike how things work with python.


And it comes from a time where a 2 MB webpage is considered acceptable. The whole idea of venvs still assumes at least some library reuse. If you shout out loud "SCREW YOU, LIBRARY REUSE", then npm is perfect, of course...


It is, at least on HN, but it is also one of the best overall.


Yes, but for opposite reasons. You probably don't get left-pad in an ecosystem without one — and preferably only one — obvious way to publish and use it.


I might be alone but I hate dealing with Java’s package management system.


What would be an example of a good packaging ecosystem?


Probably cargo.

- It was built in to Rust from the beginning and officially sanctioned so fifteen different people don't have to build their own incomplete, buggy package managers.

- You can add plugins for things like automatically updating or adding dependencies.

- It handles projects and subprojects.

- All you have to do to install dependencies and run a project is "cargo run", reducing friction for getting into new projects.

- For the most part, it just works.

That said it isn't perfect, especially when needing custom build scripts, but it's good.
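The low-friction flow described above comes down to a single manifest; a minimal sketch (hypothetical project name and dependency choice):

```toml
# Cargo.toml — `cargo run` fetches, builds, and runs everything below
# with no separate install step, and records the exact resolved
# versions in Cargo.lock.
[package]
name = "demo"
version = "0.1.0"
edition = "2021"

[dependencies]
serde = "1.0"   # a version *requirement*; the lockfile holds the exact pin
```

The split between version requirements here and exact pins in `Cargo.lock` is the part most of the competing Python tools reinvent separately.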


Golang is probably the worst. Python a close second.


Cmake? (And especially Kitware Superbuild... try pinning packages there!)


Isn't it a build tool, not package manager?


It is used as a package manager often enough -- stick `git clone` or ExternalProject into the CMakeLists and you have the (bad) equivalent of `pip install` for C++.

For example, I mentioned "kitware superbuild" above. It describes itself [0] as:

> It is basically a poor man’s package manager that you make consistently work across your target build platforms

[0] https://blog.kitware.com/cmake-superbuilds-git-submodules/
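A sketch of that pattern, assuming a hypothetical `libfoo` repository (the URL and tag are placeholders):

```cmake
# "Poor man's package manager": ExternalProject clones and builds a
# dependency at build time, inside the build tree.
include(ExternalProject)

ExternalProject_Add(libfoo
  GIT_REPOSITORY https://example.com/libfoo.git
  GIT_TAG        v1.2.3   # the closest thing to version pinning here
  CMAKE_ARGS     -DCMAKE_INSTALL_PREFIX=${CMAKE_BINARY_DIR}/deps
)
```

There is no resolver and no lockfile; pinning is whatever you hard-code in `GIT_TAG`, which is why pinning across a Superbuild is so painful.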


I thought NPM was everyone's favourite hated package manager?


C/C++.


C/C++'s package management story is "defer it to distributions". Personally I much prefer `apt-get install libfoo-dev` to 1) each language inventing its own incompatible system, and 2) each developer self-publishing, so there's little to no safety or accountability when adding a new dependency.


So, to be clear, I also tend to prefer distro-supported packaging. However. Forcing everything to go through the distro means that you massively limit what's available by raising the barriers to entry, you slow pushing new versions (ranging from Arch's "as soon as a packager gets to it" to CentOS's "new major version will be available in 5 years"), and you lock yourself into each distro's packages and make portability a pain (Ubuntu ships libfoo-dev 1.2, Arch ships libfoo-dev 1.3, CentOS has libfoo-devel 0.6, and Debian doesn't package it at all). When distro packages work, they're great, but they do have shortcomings.


> limit what's available by raising the barriers to entry

But that’s what you would have to do yourself anyway. You can’t use the freshest upstream version of everything, since they don’t all work together. So some versions you’ll have to hold off on, and some other versions might require minor patching. But this is exactly what distro maintainers do.


That's a fair point. I live in embedded world most of the time, and using third party libraries in that space is not always so easy :)


> Personally I much prefer `apt-get install libfoo-dev`

So, what do you do when different projects need different versions of libfoo? Right, you download and build it locally inside the project tree and fiddle with makefiles to link to the local version rather than the one installed by apt-get. So basically you do your own dependency management. Good luck with that.


> So, what do you do when different projects need different versions of libfoo?

Well, typically I would rely on the distro to compile other people's projects as well, and then the distro maintainers would sort that out. That may include making a patch to allow older projects to use newer versions of a library; or, if the library maintainer made breaking changes, it might mean maintaining multiple versions of the library.

Obviously if I myself need a newer version than the distro has, I may need to work around it somehow: I might have to build my own newer package, or poke the distro into updating their version of the library.

I mean, honestly, I don't build a huge number of external projects (for the reason listed above), and so I've never really run into the issue you describe. It seems to me that the "language-specific package" thing is either a side-effect of wanting basically the same dev environment in Windows and MacOS as on Linux, or of people just not being familiar with distributions and seeing their value.


> So, what do you do when different projects need different versions of libfoo?

That’s an untenable situation. The package which depends on the older version of libfoo is either dead, in which case you should stop using it, or it will soon be updated to use the newer version of libfoo, in which case you’ll have to wait for a newer release. This is what release management is.


> The package which depends on the older version of libfoo is either dead, in which case you should stop using it, or it will soon be updated to use the newer version of libfoo

So you are suggesting that every time libfoo bumps its version I have to update the dependencies on all of my projects to use the latest, find and fix all the incompatibilities, test, release and deploy a new version? Seriously?


Yeah, I mean what's the alternative? Your code will bit rot if you don't keep up. You don't have a living software project if you don't do this.

You should read the release notes of the new version of your dependency, fix any obvious issues from that, see if your tests pass, and wait for bug reports to roll in for non-obvious things not caught by automated tests. Ideally you should do this before the new release of the dependency hits the repos of the distros most of your users use, so that it's only the enthusiasts that are hit by unexpected bugs.

Even if you could delay and batch the work together every several releases of a dependency, you're still doing the same amount of work, and it's usually simpler to keep up bit by bit than all at once.

One trick is to not use dependencies that have constant churn and frequent backward-incompatible changes, and to avoid using newly introduced features until it's clear they've stabilised. When you choose dependencies, you're choosing how much work you're signing up for, so choose wisely.

Of course you could go the alternate route and ship all dependencies bundled - but that is a way to ignore technical debt and accidentally end up with a dead project.

Also, your project should not demand a specific version of `libfoo`. If `libfoo` follows semver and a minor release breaks your project, that is a bug in `libfoo`. Deployments of production software should pin versions, but not your project itself.
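In Python terms (since that's where this thread started), the same split might look like this hypothetical fragment: the project declares a compatible range, and only deployments pin exact versions:

```toml
# pyproject.toml — the project states what it is *compatible with*.
[project]
name = "demo"
version = "0.1.0"
dependencies = [
  "libfoo >=1.6,<2",   # hypothetical package; a semver-compatible range
]
```

The deployment side then records exact versions separately (e.g. a `pip freeze` snapshot installed with `pip install -r`), so the library range and the deployment pin never fight each other.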


> You don't have a living software project if you don't do this.

What I might have is a working piece of software that is an important part of company infrastructure. For mission critical software reliability is a much more important metric than being current. Unless there is some really compelling reason to update something it should not and will not get updated. There are mission critical services out there running on software that hasn't been changed in decades and that is a good thing.

> Also, your project should not demand a specific version of `libfoo`. If `libfoo` follows semver and a minor release breaks your project, that is a bug in `libfoo`.

Who the fuck cares if it is their bug or not? I need my service working, not play blame games. And if I have a well tested version deployed, why the hell would I want to fuck with that? And if I have tested my service when linked against libfoo 1.6.2.13.whatever2 I had better make sure that this is the version I have everywhere and that any new deployments I do come with this exact version.

But if I start a new project, I might want to use libfoo 3.14.15.whocares4 because it offers features X, Y and Z that I want to use.


Exactly. This is what we signed up for when we release software and commit to keeping it maintained. This is what we do.

If you instead just like writing software and throwing it over the wall/to the winds, you are an academic in an ivory tower, and have no connection to your users in the real world.


> you are an academian in an ivory tower, and have no connection to your users in the real world

Quite the opposite, actually. My responsibility is to my users. And that responsibility is to keep the software as stable as possible. So the only time I will consider upgrading my dependencies is when reliability requires it. If libfoo fixes some critical bug that affects my project, yes, maybe I should upgrade (although I am running the risk of introducing other regressions). If libfoo authors officially pronounce end of life for the version of libfoo I am using, maybe I should consider upgrading, even though it is safer to fork libfoo and maintain the well tested version myself. But it is irresponsible to introduce risk simply to keep up with the version drift. So if my project is used for anything important, I should strive to never upgrade anything unless I absolutely must.


> I should strive to never upgrade anything unless I absolutely must.

It seems that the choice is whether to live on the slightly-bleeding edge (as determined by “stable” releases, etc), or to live on the edge of end-of-life, always scrambling to rewrite things when the latest dependency library is being officially obsoleted. I advocate doing the former, while you seem to prefer the latter.

The problems with the former approach are obvious (and widely seen), but there are two problems with the latter approach, too. Firstly, you are always using very old software which is not using the latest techniques, or even reasonable techniques. This can even be considered a bug – take the MD5 hash, for example: while it was better than what preceded it, much software was using MD5 as a be-all-and-end-all hashing algorithm, and this later turned out to be a mistake. The other problem is more subtle (and was more common in older times): it’s too easy to be seduced into freezing your own dependencies, even though they are officially unsupported and end-of-lifed. The rationalizations are numerous: “It’s stable, well-tested software”, “We can backport fixes ourselves, since there won’t be many bugs.” But of course, in doing this, you condemn your own software to a slow death.

One might think that doing the latter approach is the hard-nosed, pragmatic and responsible approach, but I think this is confusing something painful with something useful. I think that doing the former approach is more work and more pain from integration, and the latter approach is almost no work, since saying “no” to upgrades is easy. It feels like it’s good since working with an old system is painful, but I think one is fooling oneself into doing the easy thing while thinking it is the hard thing.

The other reason one might prefer the former approach to the latter is that by doing the former approach, software development in general will speed up by all the fast feedback cycles. It’s not a direct benefit; it’s more of an environmental thing which benefits the ecosystem. Doing the latter approach instead slows down all feedback cycles in all the affected software packages.

Of course, having good test coverage will also help enormously with doing the former approach.


> but I think this is confusing something painful with something useful

There is software out there that absolutely cannot break. Like "if this breaks, people will die". Medical software, power plant software, air traffic control software, these are obvious examples, but even trading and finance software falls into this category, if some hedge fund somewhere goes bankrupt because of a software bug real people suffer.

It doesn't matter how boring, inefficient and outdated these systems are. It doesn't matter how much pain they are to maintain and integrate. These are systems you do not fuck with. A lot of times people who do maintenance of these don't even fix known bugs in order to avoid introducing new ones and to avoid the rigorous compliance processes that has to be followed for every release. Updating the software just to bump up some related library to the latest version is simply not a thing in this context.

I am not working on anything like this. I work on a lighting automation system. If I fuck up my release, nobody is going to die (well, most likely, there are some scenarios), but if I fuck up sufficiently, a lot of people will be incredibly annoyed. So I have every version of every dependency frozen. All the way down the dependency tree. I do check for updates quite often and I make some effort to keep some things current, but some upgrades are simply too invasive to allow.
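With stock Python tooling, freezing the whole tree that way usually comes down to `pip freeze`; a minimal sketch (the lockfile name is arbitrary):

```shell
# Snapshot the exact version of everything installed in an environment.
python3 -m venv lock-env
lock-env/bin/python -m pip freeze --all > requirements.lock

# Every line is an exact pin, all the way down the dependency tree.
cat requirements.lock

# Later / elsewhere:  pip install -r requirements.lock
```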


> I am not working on anything like this. I work on a lighting automation system. If I fuck up my release, nobody is going to die (well, most likely, there are some scenarios), but if I fuck up sufficiently, a lot of people will be incredibly annoyed. So I have every version of every dependency frozen. All the way down the dependency tree.

Most people’s systems are not that special that they absolutely need to do this, but it feeds one’s ego to imagine that it is. And, as I said, it feels more painful, but it’s actually easier to do this – i.e. being a hardass about new versions – than to do the legitimately hard job of integrating software and having good test coverage. It feeds the ego and feels useful and hard, but it’s actually easy; it’s no wonder it’s so very, very easy to fall into this trap. And once you’ve fallen in by lagging behind in this way, it’s even harder to climb out of it, since that would mean upgrading everything even faster to catch up. If you’re mostly up to date, you can afford to allow a single dependency to lag behind for a while to avoid some specific problem. But if you’re using all old unsupported stuff and there’s a critical security bug with no patch for your version, since the design was inherently buggy, you’re utterly hosed. You have no safety margin.


This is the danger of living on the edge of EoL, as you called it. At some point you are forced to update a package, usually on short notice at the most inconvenient time, due to some zero-day vulnerability found in one of your dependencies. Then the new version no longer supports an old version of a subdependency, which you also pinned, so you have to update the subdependency too. And to upgrade that package you have to update yet another subdependency, and so on.

Suddenly a small security-patch forces you to essentially replace your whole stack. If your test flags any error you have no idea which of the updated subpackages that caused it, because you have replaced all of them. Eventually you accumulate so much tech debt that it's tempting to cherry-pick the security patches into your packages instead of updating them to mainline, sucking you even deeper down the tech debt trap.

Integrating often means each integration is smaller, less risky, and easier to diagnose when it fails. Of course this assumes you have good automatic test coverage, which I assume you do if the systems are as life-critical as the parent claims them to be.

There's also a big difference between embedded and connected systems here. Embedded SW usually gets flashed once and then just does whatever it is supposed to do. Such SW really is "done"; there is no need to maintain it or its dependencies, because it's not connected to the internet, so zero-days and other vulnerabilities are not really a thing.


Conan is actually quite nice. I've been porting a series of projects to it and it's been a pleasant experience - Conan is very flexible, the documentation is thorough and the developers are very responsive on Slack/GitHub. https://conan.io/



