Hacker News new | past | comments | ask | show | jobs | submit login
NPM lockfiles can be a security blindspot for injecting malicious modules in PRs (snyk.io)
259 points by fagnerbrack 6 months ago | hide | past | favorite | 73 comments

Note that with Rust's cargo, replacing a registry source with a source from github isn't possible without also changing the Cargo.toml. Cargo always consults both .toml and .lock files, and only uses the .lock file to the extent where it's consistent with the .toml. If it notices any inconsistencies, e.g. .lock pointing to github but .toml pointing to crates.io, its behaviour depends on the parameters passed. If you pass --locked or --frozen, it gives an error. If you don't pass any parameters, it tries to fix the issue by itself. It can't use the information from the .lock file to get the precise version, and thus just uses the latest version that's still consistent with the version requirement in the .toml file.

I don't think that cargo's decision to always consult the .toml was actually motivated with security in mind. Instead I think cargo's authors just wanted to avoid the hassle for developers of having to constantly invoke commands to update the .lock file (is this what "convention over configuration" means?). So I think this is an instance where a choice done for better ergonomics actually ended up improving security as well.

Note though that even with current cargo, any PR that claims to do a simple cargo update could very well not be such a PR. It could e.g. include an "update" to a yanked and possibly vulnerable/backdoored version. One malicious upload, even if yanked, could be milked for a long time. See repi's comment in this thread for a solution: https://news.ycombinator.com/item?id=21887445

Added a verification step to prevent this in Rust code with our cargo-deny tool. https://github.com/EmbarkStudios/cargo-deny/pull/80

We run this in GitHub Actions for all of our Rust repos, primarily to prevent banned or duplicate dependencies and approved licenses. But also works for verifying the sources of crates to prevent this specific issue now

Wait, you're supposed to commit those? Genuinely curious.

I always set up an ignore rule for them and use package.json's dependency list instead. My projects are all on the small side though; is the idea to commit the .lock file and avoid breakage through things like minor version updates of your first level dependencies? Wouldn't that carry a bigger security risk for large numbers of packages that then aren't receiving patch updates as readily?

Lock files are useful in production environments where you want to ensure the exact same package versions are being used that you tested with. A package.json file specifies package names and versions, but theres no guarantee your package registry will always serve the same content.

The lock file records the hash of each package and uses it to verify package integrity on future installs. If any of the hashes don't match up, the `npm clean-install` command[0] will throw an error.

[0] https://docs.npmjs.com/cli/ci.html

Integrity checks are important, but an even more basic benefit obtains regarding version ranges. Typical package.json dependency entries specify _ranges_ of versions that will satisfy the dependency, rather than a single specific ("pinned") version. So it's by design, and fully to be expected, that over time, subsequent builds will yield different results as library authors publish newer versions which still satisfy the version ranges. Knowing when this has happened, and being able to diff the results in the dependency graph, is a key feature of lockfiles. A proper lockfile implementation (like yarn's) provides deterministic builds, so anyone building the project with the same lockfile gets precisely the same dependency graph. You get the benefits of version pinning without the downside.

Without the package lock file, your build is not repeatable. In two years (or more likely, two weeks) someone new will check out your code, it will be broken out of the box, and the poor sod will have no idea why. More than likely they will blame you.

May be I'm missing something, but why dependency resolution cannot be deterministic? If you always choose a version specified in the config, or a highest version if few different versions of the same dependency are defined through some of intermediaries, you'd always end with the same set. AFAIK it's the strategy in Gradle for JVM projects, and a similar deterministic strategy is used by Maven.

NPM resolution is not deterministic because package dependencies may be specified using SemVer, e.g. “~1.0.0” which may result in pulling 1.0.0, 1.0.1, 1.0.2, and so on - anything that matches 1.0.x specifier.

The NPM ecosystem commonly uses these types of specifiers. Even if your own dependencies are all specified in “package.json” using exact versions, they almost certainly depend themselves on other versions (dependencies of dependencies, and so on) using the semver specifiers. That means whenever an author of that package publishes themselves, your resolved dependencies may change.

The package-lock.json is simply a record of the exact dependency resolution graph based on the registry state that existed at the moment you generated it, so that you can reliably reproduce that graph a month or a year later. Otherwise it’s very common for NPM-based projects to have “NPM install” fail later on (or succeed, but create a bug or unit test failure) because some new version was added to the registry and created a compatibility issue or bug.

But package lock is ignored for all users of your npm. Only root package’s lock file has effect. This makes it useless in most cases. It’s even more useless if you think that production apps are built very often as tagged docker images = are reproducible by design.

> But package lock is ignored for all users of your npm. Only root package’s lock file has effect. This makes it useless in most cases.

But package-lock is intended to be used at the project level. i.e. lock all package versions in the current project or app.

That isn't "useless in most cases", it solves specifically the case it is meant to solve.

> It’s even more useless if you think that production apps are built very often as tagged docker images = are reproducible by design.

But you're stuffed if you need to rebuild the docker image later, or for example, go back to an old commit/release, create a 'support' branch to retrofit a change and build a new image. You want repeatability all the way down.

Before lockfiles I would pin version in my package.json file. Now, I still pin version in my package.json file ;)

package-lock is for pinning transitive dependencies, which package.json has no control over

> avoid breakage through things like minor version updates of your first level dependencies?

No, a much more useful use case is avoid breakage from minor version updates in your 15th level dependencies. You might be cognizant of your semver best practices, but the team who publishes the string utility library included by the query builder included by your ORM might not be so careful.

> No,

I think you mean "Yes, but also ..."

I don't really. If you only wanted to control your first-level dependencies, you would simply not use caret and tilde ranges in your package.json, no lockfile needed (except for cryptographic verification).

> I don't really. If you only wanted to control your first-level dependencies

They didn't say "only", they said "things like".

Commit your lock files, but never the node_modules folder.

Well that depends. e.g. yarn has an auto clean feature and a feature to flatten the dependency tree. If you have a well groomed dependency tree, one might want to commit the node_modules folder:

1. Ultimately dependencies end up in your production environment, so folks might wanna review the diff of dependency updates.

2. No need to run yarn install. So if your build server has only access to your git repo, no need to get packages from a remote environment.

3. No need for yarn or npm on non-Frontend dev machines. It will just work with Node installed.

4. If npm or yarn ever cease to exist (or the registries), any commit should still work as intended.

If you have native dependencies, committing node_modules will break cross-compatibility with different platforms.

That's exactly what we were hit with - a mix of Mac and PC developers and everything ploded (this was literally one of our first node projects about 5 years ago we didn't know to not commit node_modules)

All of these (except 3) also get solved by moving dependencies to a hosted npm registry. Something which has been suitable for some projects I’ve worked on. And keeps ’git clone’-times down.

Well, not necessarily. Basically you create a new dependency and a piece of infrastructure _you_ need to maintain and that often still relies on yarn/npm upstream registries.

So if that part of the infrastructure is down, you cannot work. A repo might be more error resilient, because if you check it out, you check it out.

Some people prefer to commit them (along with composer lock files for php). I always advise against it for this exact reason and because I want to make sure packages are correctly installed on the target system. Occasionally, deeply flawed versions are removed and it helps with catching it as well since you are forced to reinstall packages on a pull or build.

Edit: a minor version update should not be backwards incompatible and break things. If it does, the package is poorly maintained imo.

Not committing your lock file leaves you vulnerable to any of your dependencies publishing a new vulnerable version and directly updating to that.

You can pick your poison.

There is also that. But one can reference the exact version and it becomes a non issue. In a properly secure environment it might be desirable. But as with everything there are pros and cons to each approach.

With NPM, this is not true. You can reference exact versions in every single entry in the package.json file and it doesn’t matter.

Each one of those exact versioned libraries you referenced will itself likely have semantic version dependencies which may update themselves. Unless every single dependency you depend on also follows the policy of pinning to exact versions (and their dependencies do the same, and so on), you are still vulnerable (See left-pad incident, which was a sub-dependency of babel).

The only way to avoid this is the old shrinkwrap or the new package-lock.json feature.

I stand corrected

While the title of the page says "NPM lockfiles", the article actually talks about yarn lockfiles. In essence though, it applies to all lockfiles. An easy step for many organizations (especially with github actions) is to have it verified.

Better yet don’t accept third party contributions to lockfiles... I don’t know any reason I would approve an external PR that modifies the lockfile.

Perhaps you're misunderstanding what this article about.

The author is making a pr that updates the package.json, which also means the lockfile is updated.

If you don't accept any PR that touches the package.json file, then that means you're not accepting a PR that introduces any new dependency, tries to update a dependency for security reasons, or uses a feature available in a newer version of the dependency.

All of those reasons seem like valid reasons to accept a lockfile change -- that the author of the PR changed the package.json file for any of those reasons.

That's fine. Normally when I get a PR to update dependencies, I go ahead and update them myself, thank the submitter, and close the PR. No need to merge it directly. Similarly, if it adds a new dependency, I can ask them to put that in an isolated commit, cherry pick all the other commits, and update the dependencies myself.

No, I would likely not accept a PR that adds a new dependency. We try to minimize dependencies as much as possible, if the feature absolutely needed the dependency, we probably wouldn't open it for third party contributions in the first place (not least of all because maintenance of the dep now falls onto us if ever upstream falls short).

As for updating dependencies for security/features, we leave that to dependebot.

if not, the PR may fail.

But since the lockfile can be generated from the package.json, I still don't see a reason that the lockfile needs to be committed by a third-party contributor instead of being generated automatically post-merge, or by the repository maintainers during the course of normal development.

If you use the lockfile like that then there is little point in having a lockfile. The lockfile specifies the exact versions of the whole dependency tree, and their hashes, at a given point in time. If you don't commit it then everyone will have different versions of the dependencies.

I see your point, but if a dependency package needs exact versions for its dependencies, I'd expect it to be specified in its package.json as well. If not, then I don't see why exact dependencies for the entire tree are needed.

Sure, if a package absolutely needs exact dependencies for its entire tree, it can check in the lock file, but I've not found this necessary in practice provided I use dependencies I trust , that follow semantic versioning.

> Sure, if a package absolutely needs exact dependencies for its entire tree

Not sure you're talking about the same thing as everyone else.

There's a big difference if you're maintaining a package vs maintaining an app.

If you're a "package" maintainer, you don't want to pin dependencies. Because the package consumer (i.e. people building apps using your package) should not have their exact versions dictated to them.

If you're an "app" maintainer, you absolutely need to check in your lockfile, because you should care about repeatable builds.

It certainly doesn't help that the NPM's and Yarn's lockfiles are hard to read. Compared them to Gemfile.lock which bundler generates.

NPM's package-lock.json https://github.com/search?q=filename%3Apackage-lock.json&typ...

Yarn's yarn.lock https://github.com/search?q=filename%3AGemfile.lock&type=Cod...

Bundler's Gemfile.lock https://github.com/search?q=filename%3AGemfile.lock&type=Cod...

Generate the lockfile (and any generated files) on a trusted machine instead. Generate it and merge it from a ci server.

But a lockfile is not like "any generated file", because for typical generated artifacts they should always be generated the same, for the same commit.

This is not true for a lockfile, where the whole point is to capture the specific versions at the point in time that the generation is done.

The benefit of the contributor committing the lockfile is that it encodes the exact combo of dependencies that worked at the time the related code changes were made (in the same PR).

This means other project maintainers aren't left scratching their heads trying to figure out why a PR worked on your machine but fails in CI.

Code smells if a change works with one lockfile but not another generated around the same time. The utility of a lockfile increases with the age of the file.

We expect lockfiles to change when we update it add dependencies. If a pull request cannot build with a lockfile generated when it is merged then that change should not be merged.

> Code smells if a change works with one lockfile but not another generated around the same time.

This isn't true at all. You can't predict when a new, potentially breaking, version of a dependency might get published. It could happen a second after you generate your lockfile, or create your PR.

At time of PR creation we could have versions 1, 2, 3 and 4 of our transitive dependencies.

At time of merge we could have versions 1.1, 2.2, 3.3 and 4.x available.

Dependencies are outside of your control, so always "smell", that's why it's crazy to do anything other than pin the heck out of them.

This. Just reset package.json and yarn.lock to master state and then re-apply the changes yourself (via yarn). Should be possible most of the time as package.json changes are usually small.

This is a lot of manual work for every pr that touches package.json.

To me this looks extremely frightening. Sure, you can create good routines etc to prevent this, but that's not really enough. As a user of any module you can't trust every package to have good routines. In any basic project there are tens and hundreds ++ dependencies, and we all know that some of them will be vounerable for this exploit. We need a MECHANISM that prevents this, routines and good practices are not enough

There is something really weird and complex with the way dependency management works out in JS that I can't quite put my finger on, but seems to make things especially intractable somehow. While other platforms have pretty similar dependency management tools... I think that the ruby `Gemfile.lock` for instance would usually have much smaller and human-readable diffs for an analogous change.

Instead of changing the module's URL to e.g. a private evil GitHub repository, how about registering a new package within the official npm registry, using homoglyphs/similar-looking characters.

For diff viewers that are highlighting entire lines for changed content, the difference in the package name wouldn't be spotted in reviews when also changing the version or some other part of the URL.

Or am I missing something?

At least when I review code, if I see a line changed that doesn't need to be, I leave a comment asking why and hold off on approving the code for merge until I figure out what's going on. This is more due to not having changes people made by mistake merged in rather than a strategy for dealing with malicious submitters, though.

I think the issue is that the version number and name change would be the same line. Maybe you'd see that the line changed and that the version number changed and assume that was the only change for the line. (Depending on client, might not highlight specific sub-line changes)

Curious, would you ever approve a third party change to a lockfile? I maintain a very large OSS project and I don’t know any reason I would accept a PR that changes the lockfile.

Would you accept a PR that changes the package.json file to add a new dependency?

Not likely, we try to minimize dependancies as much as possible. If a new feature were to absolutely necessitate a new dependency, it probably wouldn't be open for third party contributions in the first place (not least of all because maintenance of the dep now falls onto us if ever upstream falters).

You could just discard the PR’s lock and generate your own in the PR

I believe npm blocks publishing packages with similar names to existing packages.


It looks like only validate the domain isn't enough.

NPM does not care about whether the package name in the tarball match to the name in lock or not.

So you can just point to tarball path to another package that has the similar name. and nobody with ever notice that.

And domain whitelist won't work in this situaltion either.

Does it care about casing? I doubt it does...so I could for example point ABCmagical to ABCmagicaI. (The last letter in the latter being an uppercase i) You would have to update the package too, or something else to red herring the colorized diff.

The irony of this article is that it recommends you to use npx.

Great work, very creative vector!

On the other hand, looks like a big missed opportunity from snyk.io! Feels like they could have taken the extra step and time to come up with a counter to this in their product, and use that post as both a way to showcase their product and the discovery.

In general I wonder about the ability to receive a root kit from any number of sources, sharing software projects is a major concern. What is best practice here? Only work inside a VM?

Why would the lock file contain the url?

Different modules can come from different hosts. Whether that’s a good idea is debatable.

so why isn't the host for the dependency specified in the package.json instead?

I think NPM is poorly designed. These lessons have all been learnt in the past by the maven ecosystem, and yet, it is repeated again in the javascript npm ecosystem.

Maven sidesteps the issue by not even having lockfiles and leaving everyone who checks out a repo free to have different versions of subdependencies, so I'm not sure it's a very stellar example.

only if you use snapshot dependencies (or your dependencies use that).

Maven pins down the version for you if you followed the "best pactise", or you can override sub-dependencies' versions if you should so choose (via dependencyManagement), and also lets you control the actual repo to download from as well. And as a bonus, you get to share the binary across different projects if they are on the same machine.


> What's poorly designed about each dependency recursively installing its own dependencies even if some of them are shared?

Something npm doesn’t do, so not a great example. Simple apps having dependencies with lots of files is also a package ecosystem thing, not a package manager thing.

Run `npm ls` without `npm dedupe` and notice "deduped".

> Something npm doesn’t do, so not a great example.

Doesn't do anymore. It definitely used to, I'd estimate around 2014-2015.

> Are you telling me the package manager that installs tens of thousands of files for a simple app is poorly designed?

I blame the tiny JS stdlib and the resulting culture for this, rather than the package manager.

It's true that the package manager enabled that culture to flourish, but a package manager would have come about either way. And if npm had somehow disallowed massive dependency chains and 14 line packages, a competitor would have filled the gap and been adopted as a result.

In my recollection that culture predated package managers for Node. When npm came around people on the official node IRC channel were already advocating tiny, single function modules. Those tiny packages on npm, albeit being weird, felt like a pretty organic culture thing. Not sure where that came from, I think it might have already been a thing for a subset of that community coming from other tech stacks at the time.

While the node stdlib isn't all-bolts-included, I wouldn't exactly call it tiny either. What are you missing? I kind of blame the community for moments of stupidity like that leftpad thing just to pick one, if somebody had added some great extensions to the string class before that happened it would have been some other tiny issue.

I don't know the js world well at all, but it seems to me it's got to be the culture much more than the standard library.

Look at C. Libc is tiny and relatively useless compared to most languages. Things are rarely ever added relative to decades ago. But there are many high quality self contained projects with very thin dependencies. Example: sqlite depends on libc and a few syscalls. It's an extremely solid piece of work despite those constraints. What's your excuse?

Why does the article recommend requiring https, if the packages have sha512 integrity checks?

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact