This will be a somewhat intemperate response, because as a developer of a significant library I found this quite irritating.

If you publish a Python library without pinned dependencies, your code is broken. It happens to work today, but there will come a day when the artifact you have published no longer works. It's only a matter of time. The command the user ran before, like "pip install spacy==2.3.5", will no longer work, and the user will then have to go to significant trouble to find the set of versions that worked at the time.

In short unpinned dependencies mean hopeless bit-rot. It guarantees that your system is a fleeting thing; that you will be unable to today publish an end-to-end set of commands that will work in 2025. This is completely intolerable for practical engineering. In order to fix bugs you may need to go back to prior states of a system and check behaviours. If you can't ever go back and load up a previous version, you'll get into some extremely difficult problems.
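
Roughly, the difference in a library's published metadata looks like this (package names and versions are made up, not spaCy's actual dependencies):

    # Unpinned: the resolver picks whatever is newest at install time, so
    # "pip install mylib==2.3.5" in 2025 may pull a somedep 4.x that didn't
    # exist when mylib 2.3.5 was released and may not work with it.
    install_requires=["somedep>=1.0"]

    # Pinned: reproduces the dependency set that was actually tested at release.
    install_requires=["somedep==1.7.2"]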

Of course the people who are doing the work to actually develop these programs refuse to agree to this. No we will not fucking unpin our dependencies. Yes we will tell you to get lost if you ask us to. If you try to do it yourself, I guess we can't stop you, but no we won't volunteer our help.

It's maddening to hear people say things like, "Oh if everyone just used semantic versioning this wouldn't be a problem". Of course this cannot work. _Think about it_. There are innumerable ways two pieces of code can be incompatible. You might have a change that alters the time-complexity for niche inputs, making some call time-out that used to succeed. You might introduce a new default keyword argument that throws off a *kwargs. If you call these things "breaking" changes, you will constantly be increasing the major version. But if you increase the major version every release, what's the point of semver! You're not actually conveying any information about whether the changes are "breaking".




If you publish a Python library with pinned dependencies, your code is broken as soon as someone tries to use it with another Python library with pinned dependencies, unless you happened to pin exactly the same version of the dependencies you have in common.
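
Concretely (hypothetical package names and versions):

    # library_a's setup.py
    install_requires=["commondep==1.2.3"]

    # library_b's setup.py
    install_requires=["commondep==1.2.4"]

    # An application depending on both has no installable solution: only one
    # version of commondep can exist in the environment, and no single version
    # satisfies both exact pins.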

Python libraries should not pin dependencies. _Applications_ can pin dependencies, including all recursive dependencies of their libraries. There are tools like Pipenv and Poetry to make that easy.

This is less of an issue in (say) Node.js, where you can have multiple different versions of a library installed in different branches of the dependency tree. (Though Node.js also has a strong semver culture that almost always works well enough that pinning exact versions isn’t necessary.)


The most frustrating thing is that pip doesn't make it easy to declare loose dependencies while freezing to concrete versions for deployment. Everybody rolls their own.
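
One common pattern, sketched below (not the only way to do it), is to keep loose, abstract bounds in the library metadata and freeze a concrete set separately for deployment:

    # setup.py: loose, abstract dependencies for the library
    install_requires=["requests>=2.20,<3"]

    # For a deployment, capture and install the concrete versions that were
    # actually tested:
    #     pip freeze > requirements.txt
    #     pip install -r requirements.txt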

> Python libraries should not pin dependencies. _Applications_ can pin dependencies, including all recursive dependencies of their libraries.

Is the pypi package awscli an application or a library?

poetry is frustrating in that it doesn't allow you to override a library's declared requirements to break conflicts. They refuse to add support [1][2] for the feature too. awscli, for example, causes huge package conflict issues that make poetry unusable. It's almost impossible not to run into a requirement conflict with awscli if you're using a broad set of packages, even though awscli will operate happily with a broader set of requirements than it declares.

[1] https://github.com/python-poetry/poetry/issues/697

[2] https://github.com/python-poetry/poetry/issues/697#issuecomm...


For this purpose, I’m defining a “library” as any PyPI package that you expect to be able to install alongside other PyPI packages. This includes some counterintuitive ones like mypy, which needs to extract types from packages in the same environment as the code it’s checking.

The awscli documentation recommends installing it into its own virtualenv, in which case pinned dependencies may be reasonable. There are tools like pipx to automate that.

Though in practice, there are reasons that installing applications into their own virtualenv might be inconvenient, inefficient, or impossible. And even when it’s possible, it still comes with the risk of missing security updates unless upstream is doing a really good job of staying on top of them.

I don’t think that respecting declared dependency bounds is a Poetry bug. Pip respects them too (at least as of 20.3, which enables the new resolver by default: https://pip.pypa.io/en/latest/user_guide/#changes-to-the-pip...). If a package declares unhelpful bounds, the package should be fixed. (And yes, that means its maintainer might have to deal with some extra issues being filed—that’s part of the job.)


Why on earth would you ever add awscli as a dependency? That makes very little sense. It’s an application (that is no longer distributed via pypi).

You should use boto3


> Is the pypi package awscli an application or a library?

Hopefully a library! As hopefully the AWS command-line interface is maintained and distributed separately from any SDK that powers it...


botocore/boto/boto3 are the libraries, which awscli drives.


> Python libraries should not pin dependencies. _Applications_ can pin dependencies, including all recursive dependencies of their libraries.

This is essentially what we do where I work. When we make a tagged release, we create a new virtual environment, run a pip install, run all the tests, and then run pip freeze. The output of pip freeze is what we use for the install_requires parameter in the setup() call in setup.py.
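
Roughly, the end result looks like this (package names and versions are illustrative):

    # setup.py, with install_requires taken verbatim from the `pip freeze` output
    from setuptools import setup

    setup(
        name="ourpackage",
        version="1.4.0",
        packages=["ourpackage"],
        install_requires=[
            "somedirectdep==2.11.3",
            "sometransitivedep==1.0.5",  # pip freeze includes indirect deps too
        ],
    )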

That said, a library could certainly update its old releases with a patch release and specify a <= requirement on a particular dependency when newer versions no longer work. It would be a bit of work, though, since indirect dependencies would also have to be accounted for.


It's maddening to hear people say things like, "Oh if everyone just used semantic versioning this wouldn't be a problem". Of course this cannot work. _Think about it_. There are innumerable ways two pieces of code can be incompatible. ... If you call these things "breaking" changes, you will constantly be increasing the major version.

One of the things that prompted the OP was this breakage in Python's cryptography package [1] (OP actually opened this issue) due to the introduction of a Rust dependency in a 0.0.x release. The dependency change didn't change the public API at all, but did still cause plenty of issues downstream. It's a great question on the topic of semver to think about how to handle major dependency changes that aren't API changes. Personally, I would have preferred a new major release, but that's exactly your point, syllogism — it's a matter of opinion.

As a sidenote, Alex Gaynor, one of the cryptography package maintainers is on a memory-safe language crusade. Interesting to see how that crusade runs into conflict with the anti-static linking crusade that distro packagers are on. I find both goals admirable from a security perspective. This stuff is hard.

[1] https://github.com/pyca/cryptography/issues/5771


It's hard because underneath is a battle of who bears the maintenance and testing costs that no one wants to bear.

Asking a publisher to qualify their library against a big range of versions just means that they need to do a lot more testing and support. Obviously they want to validate their code against one version, not 20, and they certainly don't want an open-ended > constraint, which would force them to re-validate every time a new version of a dependency is released.

Similarly, when publishers say "I will only work against version X", this puts a bigger burden on users to configure their dependencies and figure out which versions they can use. They would like to push that work onto vendors.

What's a bit depressing is that these economic concerns are not raised openly as the primary subject matter; the discussion is always veiled in terms of engineering best practices. You're not gonna engineer your way out of paying some cost. Just agree on who bears the cost and how you will compensate them for it, and the engineering concerns become much easier.


Libraries pinning dependencies only fixes a narrow portion of the problem and introduces a bunch of others (particularly in ecosystems where only a single version of a package can exist in a dependency tree). It does make life slightly easier for the library developers. However, if every library pinned deps, it becomes much harder to use multiple libraries together: suppose an app used libraries A and B, and A depends on X==1.2.3 while B depends on X==1.2.4. It's then pushed onto every downstream developer to work out the right resolution of each conflict, rather than upstream libraries having accurate constraints.

Pinning dependencies in applications/binaries/end-products is clearly the right choice, but it’s much fuzzier for libraries.


I think you're really under-rating how important it is to be able to do something like "pip install 'requests==1.0.5'" or whatever, in order to reconstruct the past state of a project. If requests hasn't pinned its dependencies, that command will simply not work. The only way you'll be able to install that version of requests is to manually go back and piece together the whole dependency snapshot at that point in time.

There's pretty much no point in setuptools automatically installing library dependencies for you if you expect the library dependencies to be unpinned. In fact it would be actively harmful --- it just leads people to rely on a workflow that works today but will break tomorrow.

You're asking for an ecosystem where there's no easy way to go back and install a particular version of a particular library. That's not better than having version conflicts.

The other thing I'd note is that it's quite an understatement to say that pinning dependencies makes life "slightly easier" for library developers. We're not going to accept builds just breaking overnight, and libraries that depend on us aren't going to accept us breaking their builds either.


Sure, it sucks that unpinned dependencies lose historical context as the deps move forward, and I’ve personally suffered this in my own library maintenance work... but there’s still the fundamental issue of conflicting pinned versions if there’s multiple libraries.

(At the app level, the right approach to “going back in time” is for those apps to pin all their deps, with a lockfile or ‘pip freeze’, not just top level ones. That is, one records the deps of requests==1.0.5 in addition to requests itself.)
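
(So the app's lockfile ends up looking something like this; the transitive pins below are illustrative, not the real dependency set of requests 1.0.5:)

    # requirements.txt produced by `pip freeze` at release time
    requests==1.0.5
    urllib3==1.7.1        # illustrative transitive pin
    certifi==2013.10.29   # illustrative transitive pin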


Can you give an example of a real ecosystem that can't handle such a conflict? In my actual experience, the package manager will either automatically use the latest version, or in one case has more complex rules but still picks a version on its own (but I stay away from that one due to the surprise factor). Your argument has force against bad package managers and against using very strict dependency requirements, but not against pinning dependencies sensibly in a good ecosystem.

The only conflict I've seen that can't be automatically resolved is when I had some internal dependencies with a common dependency, and one depended on the git repo of the common dep (the "version" being the sha hash of a commit), and another depended on a pinned version of the common dep. Obviously there's no good way to auto-resolve that conflict, so you should generally stick with versions for library deps and not git shas.


> If you publish a Python library without pinned dependencies, your code is broken.

> you will be unable to today publish an end-to-end set of commands that will work in 2025

Not necessarily.

Since ~2010 I have maintained an application with an unpinned requirements.txt; it doesn't even have version constraints at all.

The only breakages I had were either:

1. when switching from Python 2 to Python 3 (obviously)

2. when a new Python version introduces a bug (but Python is not pinnable anyway)

3. once, when a dependency released a new major version and removed an internal attribute I was using in my tests out of laziness (so that one is entirely on me)

The trick is to only use good libraries, that care about not breaking other people's code.

---

It's also worth noting that it's not your job as a developer to make sure your application can be installed anywhere; it's the packager's job to make sure your app can be installed in their distribution.

And if your users want to use pip (which is kind of the Python equivalent of wget + ./configure + make install) instead of apt/yum/... to get the very latest version of your software, then they should be able to figure out how to fix those issues.


It's clear that your approach is a possible approach:

1. 'only use good libraries'

2. 'it's not your job as a developer to make sure your application can be installed'

3. 'if your users want to use pip... they should be able to fix those issues'

However, this isn't a solution to the problem that led to the existence of language ecosystems. It is a refusal to acknowledge the problem.


> It's maddening to hear people say things like, "Oh if everyone just used semantic versioning this wouldn't be a problem". Of course this cannot work. _Think about it_. There are innumerable ways two pieces of code can be incompatible. You might have a change that alters the time-complexity for niche inputs, making some call time-out that used to succeed. You might introduce a new default keyword argument that throws off a *kwargs. If you call these things "breaking" changes, you will constantly be increasing the major version. But if you increase the major version every release, what's the point of semver! You're not actually conveying any information about whether the changes are "breaking".

That's the point of the link to Hyrum's law. The article argues that the practice of pinning encourages that attitude: consumers feel free to depend on internal implementation details, producers feel free to change behaviour arbitrarily, and no-one takes responsibility for specifying and maintaining a stable interface. Specifying a stable interface is how you actually break that knot: producers declare which parts are stable and which are not, consumers respect that and don't depend on implementation details, and then you can actually use semver because it's clear what counts as a breaking change.


You’re completely wrong and this advice is somewhat harmful. What you’re describing is how a Python application should be managed. Not a library. Libraries should absolutely not lock their advertised dependencies to arbitrary point-in-time versions for fairly obvious reasons.

Picking a suitable dependency specifier depends heavily on the maturity of the library you're using and on whether you need specific features that were added or removed in a particular release.

Saying your library depends on “spacy==2.3.5” is a lie that will mean any other library that depends on spacy>=2.3.6 can’t be used. Even if your code will realistically work fine with any spacy 2.x release.
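
For example, a range that reflects what you actually believe you support (the exact bounds are a judgment call):

    # Instead of an exact pin in the library's metadata:
    install_requires=["spacy==2.3.5"]

    # declare the range you actually support:
    install_requires=["spacy>=2.3.5,<3.0"]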


Everyone needs commands like "pip install 'spacy==2.3.5'" to work reliably in the future, so that you can go back and bisect errors. You need to be able to get back to a particular known-good state and work through changes.

I'm not saying we pin our dependencies to exact specific versions, but we absolutely do set an upper bound, usually to the minor version.


> I'm not saying we pin our dependencies to exact specific versions, but we absolutely do set an upper bound, usually to the minor version.

OK. That's more sensible, but "pinning" implies == to a specific version. If you know a library follows semantic versioning and only breaks its API in major releases, then ~= is fine. Just not ==.
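
(For reference, PEP 440's compatible-release operator expands like this:)

    spacy ~= 2.3.5   # equivalent to: spacy >= 2.3.5, == 2.3.*
    spacy ~= 2.3     # equivalent to: spacy >= 2.3, == 2.*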


You should use a tool that supports a lockfile, not pip directly. I recommend Poetry.


Even applications shouldn't really be pinning dependencies. The only time to pin dependencies is when deploying that application. That could mean bundling it with pyinstaller or making a docker image. But someone should still be able to install it from source with their own dependencies.


This is the difference between an "application" and a "library": https://caremad.io/posts/2013/07/setup-vs-requirement/

Libraries should absolutely not pin their dependencies. Applications should if you care about reproducible builds (not necessarily byte-for-byte, but "can build today == can build tomorrow").

Installing both libraries and applications in the same way in the same environment is a fundamental mismatch that pip encourages, and yes - it leads to fragile binaries.


I get the impression that this advice is accurate for the python ecosystem, but that’s because the entire ecosystem is broken with respect to backwards compatibility.

The exact same mechanisms work fine with other programming languages, and (more importantly, probably) different developer communities.

In fairness, Python’s lack of static types does make things worse than the situation for compiled languages. (Though that’s a general argument against writing non-throwaway code in python).

People claim node does better, even though JS is also missing static types, so presumably they solved this issue somehow (testing, maybe?). I don’t use it, so I have no idea.


> In short unpinned dependencies mean hopeless bit-rot.

No, this is not true, for the simple reason that there will _always_ be unpinned dependencies (e.g. your compiler. your hardware. your processor) and thus _those_ are the ones that will guarantee bitrot.

Pinning a dependency only _guarantees you rot the same or even faster_ because now it's less likely that you can use an updated version of the dependency that supports more recent hardware.


> your compiler

Compilers of languages like C, C++, Rust, Go, etc. go above and beyond to maintain backwards compatibility. It is extremely likely that you will still be able to compile old code with a modern compiler.

> your processor

Hardware is common enough that people go out of their way to make backwards compatibility shims. Things like rosetta, qemu, all the various emulators for various old gaming systems, etc.

> your hardware

Apart from your CPU (see above), your hardware goes through abstraction layers designed to maintain long-term backwards compatibility. Things like OpenGL, Vulkan, Metal, etc. The abstraction layers are in widespread enough use that as older ones become outdated, people start implementing them on top of the newer layers. E.g. here is OpenGL on top of Vulkan: https://www.collabora.com/news-and-blog/blog/2018/10/31/intr...

> [Your kernel]

Ok, you didn't say this part, but it's the other big unpinned dependency. And it too goes above and beyond to maintain backwards compatibility. In fact Linus has a good rant on nearly this exact topic that I'd recommend watching: https://www.youtube.com/watch?v=5PmHRSeA2c8&t=298s

> Pinning a dependency only _guarantees you rot the same or even faster_ because now it's less likely that you can use an updated version of the dependency that supports more recent hardware.

Dependencies are far more likely to rot because they change in incompatible ways than the underlying hardware does, even before considering emulators. It's hard to take this suggestion seriously at all.


> Dependencies are far more likely to rot because they change in incompatible ways than the underlying hardware does

Yes, that is true. It is also very likely that you can more easily go back to a previous version of a dependency than you can go back to previous hardware. The argument is that, therefore, pinning can only speed up your rotting.

If you don't statically link your dependencies and an upgrade breaks something, you can always go back to the previous version. If you statically link, and the hardware, compiler, processor, operating system, or whatever causes your software to break, then you can't update the dependency that is causing the breakage. And it is likely that your issue is within that dependency.

Pinning can only make you rot faster.


Honest question: have you ever worked as an application developer? Responsible for getting working artifacts to users as a means to an end?

Pinning dependencies absolutely and unquestionably works better, and for longer, than dynamic linking, for this use case.


Perhaps I am still failing to explain myself: what I am saying is that _not pinning_ only _adds_ more choices, so by definition it can only work better.

Pinned or not, if a software update breaks things, you can always just revert back to a previous version of your dependencies. This applies to a myriad of software problems, including a dependency changing its interface.

However, when pinning, when one of your static dependencies is broken due to a change outside your control (e.g. hardware, operating system, security issue making it unusable, or something else), the user's only recourse is to call the developer to fix the software.

I am not claiming that one happens more frequently than the other, or that hardware changes cannot break the main software itself, which would often nullify the point. All these issues can happen to software with either static or dynamic linking. However, dynamic linking has at least one extra advantage that static linking cannot have, and the opposite is not true.

> have you ever worked as an application developer? Responsible for getting working artifacts to users as a means to an end?

Look, ironically I find that all of this crap discussion is because of a newer generation of "application developers" who do not yet know what it means to "deliver working artifacts to users". Imagine my answer to that question.


> However, when pinning, when one of your static dependencies is broken due to a change outside your control (e.g. hardware, operating system, security issue making it unusable, or something else), the user's only recourse is to call the developer to fix the software.

In practice, this happens so infrequently it can be ignored as a risk. (When it does happen, users generally don't expect the software to continue to work.)

> dynamic linking has at least one extra advantage...

You don't seem to be acknowledging the downside risk to dynamic linking which motivates the discussion in the first place. An update to a dynamically linked dependency which breaks my delivered artifact is an extremely common event in practice.


> In practice, this happens so infrequently it can be ignored as a risk.

Well, I disagree there. Security issues or external protocol changes (e.g. TLSv1.2 to TLSv1.3) are rather frequent, not to mention that customers usually want to upgrade their machines (the old ones break) and the existing operating system no longer supports the new hardware.

> An update to a dynamically linked dependency which breaks my delivered artifact is an extremely common event in practice.

Again, I agree. A "surreptitious" dependency update breaking the software is much more common. However, I have already acknowledged that _two times_, and the point I'm making is that it doesn't matter whether you are pinning dependencies or not: the customer CAN FIX these issues without help from the developer. They just have to roll back the update!

On the other hand, the customer CAN'T fix the first kind of issue (e.g. new hardware).


If developers are unwilling to maintain dependencies and be good citizens of the larger language community, should they be adding those dependencies in the first place?

If you're not operating in the large ecosystem then fine. But if your project is on e.g. pypi, then there is an issue.

(edit: note, yes, I know virtualenvs exist, Docker exists, etc., but those are space and complexity trade-offs made as a workaround for bad development practices)


> No, this is not true, for the simple reason that there will _always_ be unpinned dependencies (e.g. your compiler. your hardware. your processor) and thus _those_ are the ones that will guarantee bitrot.

Docker with sha256 tags fixes that issue (and Docker containers even specify a processor architecture).
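
For example (the digest here is just a placeholder, not a real image hash):

    # Dockerfile: pin the base image by content digest instead of a mutable tag
    FROM python@sha256:<digest-of-the-exact-image>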


If you're introducing breaking changes in every new release you should still be in the 0.x stage of SemVer. You're doing something wrong if you end up on v77.0.0. The Node ecosystem's strict compliance with SemVer works fine 99% of the time because SemVer is indeed an effective versioning system (when people use it right).


I don't have a particularly strong viewpoint on this, but I find it noteworthy that in your example the user themselves is asking for a specific version of the software. You don't seem to intend for users to simply ask for the latest version and have that work, but for a specific one, and you want that specific version to work exactly as it did when it was published.

I can see some instances in which this expectation is important, and others where it is likely not or else certainly less important than the security implications.

At one extreme, research using spaCy has a very strong interest in reproducibility, and the impact of any security issues would likely be minimal on the whole, simply due to the relatively few people likely to run into them.

At the other extreme, say some low-level dependency is compromised so badly that simply running the code gets the user ransomware'd, after just long enough a delay that this whole scenario is marginally plausible. Then say spaCy gets incorporated into some other project that goes up the chain a ways and ultimately ends up in LibreOffice. If all of these projects have pinned dependencies, there is now no way to quickly or reasonably create a safe LibreOffice update. It would require a rather large number of people to sequentially update their dependencies and publish new versions, so that the next project up the chain can do the same. LibreOffice would remain compromised, or at best unavailable, until the whole chain finished, or else somebody found a way to remove the offending dependency without breaking LibreOffice.

I'm not sure how to best reconcile these two competing interests. I think it seems clear that both are important. Even more than that, a particular library might sit on both extremes simultaneously depending on how it is used.

The only solution - though a totally unrealistic and terrible one - that comes to mind is to write all code such that all dependencies can be removed without additional work and all dependent features would be automatically disabled. With a standardized listing of these feature-dependency pairs you could even develop more fine-grained workarounds for removal of any feature from any dependency.

The sheer scale of possible configurations this would create is utterly horrifying.

At any rate, your utter rejection of the article's point seems excessively extreme and even ultimately user-hostile. I can understand your point of view, particularly given the library you develop, however I think you should probably give some more thought to indirect users - ie users of programs that (perhaps ultimately) use spaCy. I don't know that it makes sense to practically change how you do anything, but I don't think the other viewpoint is as utterly wrongheaded as you seem to think.


> I'm not sure how to best reconcile these two competing interests.

What would help a lot is if the requirements were specified outside of the actual artifact, as metadata. Then the requirements metadata could be updated separately.


>Of course this cannot work. _Think about it_. There are innumerable ways two pieces of code can be incompatible.

There's a very simple solution here: just don't write bugs.


Completely and utterly wrong. I hope nobody heeds this advice.



