Hacker News
Lock Files Considered Harmful (chriswarbo.net)
3 points by todsacerdoti 19 days ago | 8 comments



I don't understand any of the statements in this post. They don't seem to form a coherent argument, and they don't match my understanding of how lockfiles work.

There's a bunch of very strongly worded statements ("hack", "legacy tooling", "poor design"), but no actual explanations of _why_ or _how_ lockfiles are supposedly bad.

Package managers _are_ deterministic, but they also are reliant on the published package versions that match the listed constraints, _at the time the package manager ran and generated the lockfile_. A dep listed as `foo: ^1.2.0` will match version 1.2.3 today. If 1.3.0 or 1.9.57 is published tomorrow, it will match that newer version.
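To make that concrete, here's a minimal sketch (in Python, with made-up package data) of how a resolver picks the highest published version matching a caret constraint like `^1.2.0`:

```python
# Hypothetical sketch of caret-constraint resolution; the version list
# is illustrative, not any real package's history.

def caret_match(constraint: str, version: str) -> bool:
    """True if `version` satisfies a ^major.minor.patch constraint:
    same major version, and at least the constraint's minor.patch."""
    base = tuple(int(x) for x in constraint.lstrip("^").split("."))
    cand = tuple(int(x) for x in version.split("."))
    return cand[0] == base[0] and cand >= base

published = ["1.2.3", "1.3.0", "1.9.57", "2.0.0"]
matching = [v for v in published if caret_match("^1.2.0", v)]
best = max(matching, key=lambda v: tuple(int(x) for x in v.split(".")))
# With only 1.2.3 published, `best` is 1.2.3; once 1.9.57 appears in
# `published`, the same constraint resolves to 1.9.57 instead.
```

The point is that the resolver's output depends on `published`, which changes over time; the lockfile freezes the output so it doesn't.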

So, the main purpose for lockfiles is to A) cache a set of resolutions so that the package manager doesn't have to re-run the resolution process every time a repo is cloned and deps are installed, and B) make sure that a _consistent_ set of resolved packages is installed each time, so that you don't accidentally get newer versions that might cause different behavior.

The argument about hashes seems to ignore what lockfiles already do. They normally include hashes of the resolved packages, both standalone and sometimes as part of a download URL.
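For example, an entry in npm's `package-lock.json` format looks roughly like this (package name and hash are placeholders, not a real package):

```
"node_modules/foo": {
  "version": "1.2.3",
  "resolved": "https://registry.npmjs.org/foo/-/foo-1.2.3.tgz",
  "integrity": "sha512-…"
}
```

The `integrity` field pins the exact tarball contents, so a swapped or republished artifact fails installation.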

The one statement with any seeming validity is that they _can_ be "attack vectors", in that they are large, borderline unreadable, and no one reads the diffs.

But beyond that, this post feels like it never makes a coherent argument.


> Package managers _are_ deterministic, but they also are reliant on the published package versions that match the listed constraints, _at the time the package manager ran and generated the lockfile_.

Yes, that's why "version numbers" are basically documentation; unrelated to the actual files depended on by a build (e.g. try copying random junk into `~/.m2/repository`). Hence, if we want to use those names/versions to refer to particular files, we should specify the database we're using to do the lookup (AKA the repository's "index state").

> A dep listed as `foo: ^1.2.0` will match version 1.2.3 today. If 1.3.0 or 1.9.57 is published tomorrow, it will match that newer version.

hackage.haskell.org lets us specify what "today" is, via a datetime https://cabal.readthedocs.io/en/3.4/cabal-project.html?highl...
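Concretely, that's the `index-state` field in a `cabal.project` file (the timestamp here is just an example):

```
-- cabal.project
packages: .
index-state: 2021-08-01T00:00:00Z
```

Every build then resolves against the Hackage index as it existed at that instant, regardless of what has been published since.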

However, it's even better to use a database that's kept in version control. There's a third-party attempt to do this for Hackage, at https://github.com/commercialhaskell/all-cabal-hashes whereas crates.io does this natively at https://github.com/rust-lang/crates.io-index


Nix has a bunch of different build functions for Rust packages, and they generally involve reading the Cargo.lock file in order to figure out what dependencies need to be downloaded. On the other side, there is pip2nix, which generates a lock file in the form of a Nix file from a list of Python packages, which you then add to git and reference. If Python had a commonly used lockfile format, this step wouldn't be necessary.
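For the Rust side, a sketch using nixpkgs' `rustPlatform` (the package details are made up; `cargoLock.lockFile` is the relevant attribute):

```nix
rustPlatform.buildRustPackage {
  pname = "my-app";        # illustrative name
  version = "0.1.0";
  src = ./.;
  # Dependencies are read from the committed Cargo.lock:
  cargoLock.lockFile = ./Cargo.lock;
}
```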


My preferred approach is to use one of those `foo2nix` tools, but as part of the build process (using the "import from derivation" feature of Nix). There's no need to keep such build products in git.

I've done this professionally for Maven projects (mostly Scala) via mvn2nix, and personally I've been doing it for many years with Haskell via cabal2nix (Nixpkgs now has `haskellPackages.callCabal2nix`; before that I was hacking together similar things like https://github.com/Warbo/nix-helpers/blob/45a66d714877233680... )
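For the Haskell case, the "import from derivation" approach looks roughly like this (project name and path are illustrative):

```nix
# callCabal2nix runs cabal2nix at Nix evaluation time, so the
# generated build expression never needs to be committed to git.
let pkgs = import <nixpkgs> { };
in pkgs.haskellPackages.callCabal2nix "my-project" ./. { }
```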


>There's no need to keep such build products in git.

There is when the bridged build system doesn't use hashes, like requirements.txt.


Yes, that's covered in the article, e.g.

> Input data, especially descriptions of available dependencies, should be specified such that we can reproduce it for any previous run. This is the problem many legacy systems struggle with: e.g. deferring important details to some HTTP response, with no way to validate its consistency (it’s usually not consistent, due to new versions being included in queries!). This is the main problem with legacy tools, which makes their users resort to lock files.

The article is pointing out that these are mistakes, in the hope that developers of new packaging tools will stop copying them. In the mean time, those of us stuck using those tools will need to keep working around their problems using hacks like lock files.


Hot takes without sincerely exploring the problem space. Benefits: not discussed. Tradeoffs: none mentioned. I figured it was a junior dev, but I see posts from the author since 2007. I'd be willing to revisit if the depth/quality is also revisited.


This blog post was a spin-off from this page, which goes into much more detail and has a worked example of combining Nix and Haskell's Cabal tool to resolve dependencies in a reproducible, decentralised way http://www.chriswarbo.net/projects/nixos/nix_dependencies.ht...

Now that I've finished that page, I should probably update this blog post to reference it!



