If you aren’t reviewing the diffs of your dependencies when you update them, you’re trusting random strangers on the Internet to run code on your systems.
Espionage often spans multi-year timelines of preparation and trust building. No lesser solution will ever be sufficient to protect you. Either read the diffs, or pay someone like Red Hat to do so and hope that you can trust them.
Code review can't catch determined malicious actors. It just isn't a viable protection against that kind of attack.
Take a look at the Underhanded C Contest for plenty of proof: even very senior developers who are told up front that there is a backdoor in the code often can't find it! And the entries can't all be blamed on C being C; many of them would work the same in any language.
I don't know the solution, but shaming users and developers for not reviewing code enough sure as hell isn't it.
All that being said, reviewing changes in dependencies is still a good idea, as it can catch many other things.
There’s no shame in refusing to study dependency diffs. If you consciously evaluate the risk and deem it acceptable, I agree with your judgement call.
What I find shameful is the lack of advisory warnings about this risk — by the repository, by the language’s community, by the teaching material.
This should have been a clearly-known risk. Instead, it was a surprise. The shame here falls on the NPM community as a whole failing to educate its users on the risks inherent in NPM, not the individual authors and users of NPM modules.
> you’re trusting random strangers on the Internet to run code on your systems
I mean that's pretty much how the world works. Even running Linux is trusting random strangers on the internet. Most of the time it works pretty well, but obviously it's not perfect. Even the largest companies in the world get caught with security issues from open source packages (remember Heartbleed?).
When I visit a random website, it is very hard for that website to compromise my computer or my private data. The only really viable way is a zero-day in my browser, or deception (e.g. phishing, malicious download).
When I install an app on my iPhone, it is very very hard for that app to compromise my phone or my private data.
In both of these cases, I can download and run almost any code, and be fairly confident that it won't significantly harm me. Why? Because they're extremely locked down and sandboxed with a high degree of isolation. On the other hand, if I install random software on my desktop computer, or random packages off NPM, I don't have such safety any more.
The prevalence of app stores and the Web itself speaks to the fact that it _is_ possible to trust random strangers without opening yourself up to a big security risk.
What does that have to do with development? My point was nearly everyone uses libraries that are written by strangers from the internet. It mostly works.
My Maven projects' dependencies are technically all pinned, but updates are just a "mvn versions:use-latest-releases" away. But, crucially, I have a file that lists which GPG key IDs I trust to publish which artifacts. If the maintainer changes, the new maintainer will sign with their key instead, my builds will (configurably) fail, and I can review and decide whether I want to trust the new maintainer or not.
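To illustrate, the trust file can be as simple as a mapping from artifact to signing key, along these lines (hypothetical format and placeholder key IDs, not any particular plugin's syntax):

    # Which GPG key is allowed to sign which artifact (illustrative only).
    com.example:some-library  = 0x1234567890ABCDEF
    org.example:other-library = 0xFEDCBA9876543210
    # A release signed with any other key fails the build until this file
    # is deliberately updated to trust the new maintainer.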
Of course, NPM maintainers steadfastly refuse to implement any kind of code signing support...
What if the maintainer had also given away the key used to sign the previous releases?
I know it doesn't make much sense; why would anyone do that? But then again, I think that "why would you do that?!" feeling is part of what is triggering the negative reactions here. We just don't expect people to do things we wouldn't.
Even in package managers where signing support nominally exists, the take-up is often poor.
IIRC RubyGems supports package signing, but almost no-one uses it, so it's effectively useless.
We're seeing the same pattern again with Docker. They added support for signing (content trust), but unfortunately it's not at all designed for the use case of packages downloaded from Docker Hub, so its adoption has been poor.
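For what it's worth, content trust is also opt-in per client rather than enforced by the registry, which doesn't help adoption. Roughly (standard Docker environment variable, hypothetical image name):

    # Content trust is off by default; each client has to opt in.
    export DOCKER_CONTENT_TRUST=1
    # With it enabled, pulls of tags without signed trust data are refused.
    docker pull example/unsigned-image:latest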
I think browsers show how to migrate away from insecure defaults successfully. The client software should start showing big obvious warnings. Later stages should add little inconveniences such as pop-ups and user acknowledgement prompts, e.g. 'I understand that what I'm doing is dangerous, wait 5 seconds to continue'. The final stage should disable access to unsigned packages without major modifications to the default settings.
Browser security has heavily benefited from the fact that there are a small number of companies with control over the market and an incentive to improve security.
Unfortunately the development world doesn't really have the same opportunities.
If, for example, npm started to get strict about managing, curating, and securing libraries, developers could just move to a new package manager.
Security features (e.g. package signing, package curation) have not been prioritised by developers, so they aren't widely provided.
Actually, when publishing to the biggest Java package repository (Sonatype) you NEED to sign your packages.
Also, you can't transfer ownership without giving away your domain or GitHub account. You can add others who can also upload under your name, but if an accident occurs, you're liable too.
Would you have distrusted this maintainer though? If someone takes it over and publishes what appear to be real bug-fixes, I'd imagine most people would trust them. The same goes for trusting forks, or trusting the original developer not to hand over access.
> Would you have distrusted this maintainer though? If someone takes it over and publishes what appear to be real bug-fixes, I'd imagine most people would trust them.
Quite possibly. But I'd make a conscious decision to do it, and certainly wouldn't be in any position to blame the original maintainer.
The sale of any business that makes use of cryptography will generally include the private keys and passwords necessary to ensure business continuity. Code signing would not necessarily protect you against a human-approved transfer of assets as occurred here, whether as part of a whole-business sale or as a simple open-source project handoff.
If you have tons of dependencies then it's not feasible to check every diff. You may be able to do it or pay someone if you are a bigger organization, but a small shop or solo developer can't do this.
"If you have tons of depedencies then it's not feasible to check every diff."
Part of bringing in a dependency is bringing in the responsibility for verifying it's not obviously being used badly. One of the things I've come to respect the Go community for is its belief that dependencies are more expensive than most developers currently realize, and so library authors generally try to minimize dependencies. Our build systems make it very easy to technically bring in lots of dependencies, but rarely assist us in maintaining them properly. (In their defense, it is not entirely clear to me what the latter would even mean at an implementation level. I have some vague ideas, but nothing solid enough to complain about not having when even I don't know what it is I want exactly.)
I've definitely pulled some things in that pulled in ~10 other dependencies, but after inspection, they were generally all very reasonable. I've never pulled a Go library and gotten 250 dependencies pulled in transitively, which seems to be perfectly normal in the JS world.
I won't deny I haven't audited every single line of every single dependency... but I do look at every incoming patch when I update. It's part of the job. (And I have actually looked at the innards of a fairly significant number of the dependencies.) It's actually not that hard... malicious code tends to stick out like a sore thumb. Not always [1], but the vast majority of the time. In static languages, you see things like network activity happening where it shouldn't, and in things like JS, the obfuscation attempts themselves have a pretty obvious pattern to them (big honking random-looking string, fed to a variety of strange decoding functions and ultimately evaluated, very stereotypical look to it).
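To make the "sore thumb" point concrete, here's a toy sketch of the shape such a payload usually takes (an invented, harmless blob, not taken from any real incident):

    // An opaque blob, a decode step, and an eval at the end.
    const blob = "Y29uc29sZS5sb2coJ3B3bmVkJyk="; // decodes to: console.log('pwned')
    const decode = (s) => Buffer.from(s, "base64").toString("utf8");
    // Real payloads use a far longer string, several layers of decoding, and
    // something much worse than a console.log, but the silhouette is the same.
    eval(decode(blob));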
And let me underline the point that there's a lot of tooling right now that actively assists you in getting into trouble on this front, but doesn't do much to help you hold the line. I'm not blaming end developers 100%. Communities have some work to be done here too.
I'm not convinced that this incident argues in favor of Go's "a little copying is better than a little dependency", which I continue to strongly disagree with. Rather, it indicates that you shouldn't blindly upgrade. Dependency pinning exists for a reason, and copying code introduces more problems than it solves.
I don't think you have to get all the way to the Go community's opinions to be in a reasonable place; I think the JS community is at a far extreme in the other direction and suffers this problem particularly badly, but that doesn't mean the other extreme is the ideal point. I don't personally know of any other community where it's considered perfectly hunky-dory to have one-line libraries... which then depend on other one-line libraries. My Python experiences are closer to the Go side than the JS side... yeah, I expect Python to pull in a few more things than maybe Go would, but still be sane. The node_modules directories I've seen have been insane... and the ones I've seen are for tiny little projects, relatively speaking. There isn't even an excuse that we need LDAP and a DB interface and some complicated ML library or something... it was just a REST shell and not even all that large of one. This one tiny project yanked in more dependencies than the sum total of several Perl projects I'm in charge of that have run over the course of over a decade, and Perl's a bit to the "dependency-happy" side itself!
I suggest a different narrative. That node.js achieved the decades-old aspiration of fine-grain software reuse... and has some technical debt around building the social and technical infrastructure to support that.
Fine-grain sharing gracefully at scale is a hard technical and social challenge. A harder challenge than CPAN faced, and addressed so imperfectly. But whereas the Perl community was forced to struggle over years to build its own infrastructure - purpose-built infrastructure - node.js was able to take a different path. A story goes that node.js almost didn't get an npm, but for someone's suggestion "don't be python" (which struggled for years). It built a minimum-viable database, and leaned heavily on GitHub. The community didn't develop the same focus on, and control over, its own communal infrastructure tooling. And now faces the completely unsurprising costs of that tradeoff. Arguably behind time, due to community structure and governance challenges.
Let's imagine you were creating a powerful new language. Having paragraph- and line-granularity community sharing could well be a worthwhile goal. Features like multiple dispatch and dependent types and DSLs and collaborative compilation... could permit far finer-grain sharing than even the node.js ecosystem manages. But you would never think npm plus github sufficient infrastructure to support it. Except perhaps in some early community-bootstrap phase.
Unfortunately, one of the things that makes JS dev pull in so many deps is that it lacks a decent standard library. Meanwhile, the Go standard library is amazing!
If you're on point enough to know which features/bugfixes you're getting then you're probably doing enough to be safe already. Just don't go around running npm -u for no reason and you should be fine.
The only way to be truly safe from this attack vector is to own all of your dependencies, and nobody is willing to do that so we're all assuming some amount of risk.
That will work assuming you have audited the code already, but you will also have to audit every changed dependency (factorially many, once transitive changes are counted) every time you bump a dependency!
I would argue that having tons of dependencies is a problem in and of itself. This has become normal in software development because it’s been made easy to include dependencies, but a lot of the costs and risks are hidden.
Maybe not, but you can avoid updating unless necessary. Assuming you only make necessary updates (and at least do a cursory check at the time) and vet any newly-added dependencies as you go, you can greatly reduce your own attack surface. You're still probably vulnerable to dependency changes up the chain, but then at least you're depending on a community that is ostensibly trustworthy (i.e. if every maintainer at least feels good about their top-level dependencies then the whole tree should be trustworthy).
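In npm terms, "only make necessary updates" mostly comes down to pinning exact versions, installing from the lockfile, and bumping one thing at a time. A minimal sketch using standard npm commands (package name and version are placeholders):

    # Record exact versions instead of ^ranges when adding dependencies.
    npm config set save-exact true
    # Install exactly what package-lock.json says, nothing newer.
    npm ci
    # Upgrade deliberately: see what's outdated, bump one package, read its diff.
    npm outdated
    npm install some-package@1.2.3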
I would caution one to not have tons of dependencies. More surface area in terms of the amount of 3rd party libraries/developers means more chances that one of them is not a responsible maintainer, as in this case. That increases the application's security risk.
Then let's hope you're not using Webpack, which alone has several hundred of them, and not small ones, mind you... super complex libraries, such that trying to "own" enough of them to be able to securely review code diffs is completely infeasible.
That's a big and totally objective reason to abandon the Node.js/NPM ecosystem, like its original author did.
A language that doesn't have a decent standard library means that you'll have to use huge amounts of code written by random strangers, and the chain of dependencies will grow larger and larger.
In languages like Ruby and Python, you have a decent standard library, and then big libraries and frameworks that are maintained by the community, like Rails, Django, SqlAlchemy, Numpy, etc. That's healthy because it minimises or zeros the amount of small libraries maintained by a single guy, thus maximising the amount of code that you can trust (because you can trust the devs and development process of a popular library backed by a foundation or with many contributors).
With Node, almost every function comes from a different package. And there's no bar to entry, and no checks.
If Node.js is going to stay, someone needs to take on the responsibility of forming a project that belongs to an NGO or something, where the more popular libraries are merged and fused into a standard library, like Python's. Personally, I'm not touching it until then.
You can't, you're forced to trust to some degree depending on factors specific to your project. If you're writing missile navigation code then you better check every last diff, but if you're writing a recipe sharing site then you don't have the same burden really.
Unfortunately this isn't really doable in today's world of JavaScript development. If you want to use any of the popular frameworks you are installing a metric ton of dependency code. So not only do you have to somehow review that initial set of code, but you need to know how to spot these types of things. Then, once you complete that task, you now have to look at the diffs for each update. And there will be a lot of updates.
What you're suggesting is a great idea from a security perspective. But for typical workflows for JS development it just isn't practical.
Now, maybe this means we need different workflows and fewer dependencies. But it's so ingrained I don't know that it's easy to fix or change.