NPM package ‘ua-parser-js’ with more than 7M weekly downloads is compromised (reddit.com)
255 points by geocrasher on Oct 22, 2021 | 141 comments



There is an open NPM “audit assertions”[1] RFC which would allow the compromised package to be nested one dependency deeper, and would allow the hijacked ua-parser-JS package to assert that it’s unaffected.

Had this proposal already gone through, it’s quite possible this would have gone much further. It would have allowed the attacker to wait for detection, and to silence any propagation of that to users of the otherwise legitimate package.

I had to bow out of the conversation because I didn’t feel like I could engage with it in a healthy way—I’m way too personally affected by my own experience dealing with dismissal of serious security issues—but I think folks here might be interested in it given this compromise.

1: https://github.com/npm/rfcs/pull/422


A malicious audit assertion is grounds to consider the package malicious. Thus you just mark the parent package as malicious.

Edit: I think your misunderstanding is you're considering "malicious config" and "malicious code" significantly different issues, and arguing malicious config won't be noticed as malicious because it's only malicious in the context of what it enables. I believe this is incorrect. Innocuous seeming usage of a legitimate feature that actually has malicious effect is not a new issue.

Looking at the issue in more detail, you were told essentially exactly this.

> A far, far easier version that already exists is: put malicious code directly into an otherwise innocuous package (or as a dependency). No vulnerability warning would exist, and thus there'd be no need for assertions. In other words, your "exploit" is just a more complex way of doing something that has always been possible.

> Once you've trusted a maintainer, you have already completely surrendered any protections to their judgement. If they make a mistake with an assertion, they could also make a mistake with code, or by depending on a package with an unknown vulnerability. Similarly, if they're malicious, they already own you because you depend on their code. This RFC does not introduce any new problems or attack kinds - all it does is add another vector that is more complex and easier to fix than existing ones.

Edit2: Another way of saying it is you're essentially arguing "this feature means that, given sufficient control of a popular package to make innocuous-seeming but actually malicious changes, essentially any compromise is possible", and the proponents of this feature are saying there are already so many ways for that level of access to be exploited that this makes no difference.


I don’t regret linking the issue, I hope someone who isn’t going to be as emotionally impacted by it will recognize the risks it introduces. But I regret the energy I’ve spent even trying to get past my own physical pain from nearly two years of burnout being dismissed this way even as I’ve tried to discuss the problem in good faith.

I don’t have energy for this. You, like the people in the issue, are welcome to explain away the similarities between different attack vectors and refuse to imagine how this might present different opportunities. I frankly can’t engage in it anymore. The link is there for people who can.


Is the concern that it will encourage library maintainers to do essentially something analogous to an empty catch block to make the error go away without having to think too hard?

  try:
    use_thirdparty_library()
  except SecurityIssues:
    # Make error go away. It's probably fine.
    pass
This would then hide the problem instead of letting it noisily bubble up the stack.


One way to view it: someone writes a package whose expected usage is as a build tool. It uses a library that has an input validation vulnerability. Since they view their tool as build tooling, they mark their package as not impacted. Npm audit no longer shows the vulnerability.

Someone uses that build tool in a production system in a way the maintainer never even considered.

Npm audit shows nothing to that end user. They are vulnerable.

People will say they misused the tool. But either way, shit gets compromised, and it's something that could have been prevented, assuming the end user was paying attention to audit findings.

And yes, many corporations and govt agencies take audit findings seriously. Hiding this because a maintainer said "not impacted" is scary IMO.


And on the other hand, every week I see a few issues about "regex DoS" for tools that usually go nowhere near user-generated input. The signal/noise ratio of NPM audit is very poor, and right now I think its main impact is to pressure people to update often. All of that creates very fertile ground for distributing malicious code through NPM.


I acknowledge and address that other hand both in the linked issue and this comment thread.


Wow. That takes the lax attitude towards security in the NPM world to a whole new level.


To be fair to the proposal’s advocates and the person who posted the blog article which prompted it: they’re not wrong that npm audit is mostly irrelevant noise and actively harmful (incentivizes individuals to ignore meaningful security issues) because of that. Their proposal is just the wrong solution, and last I checked in they were too wedded to it.


npm audit is pretty damn meaningless at the moment. Creating a bog-standard react app with create-react-app (`npx create-react-app my-app`) right now results in

> 68 vulnerabilities (21 moderate, 45 high, 2 critical)

Even installing the latest npm itself (`npm install -g npm@8.1.1`) results in

> 3 moderate severity vulnerabilities


create-react-app made its own security problems by bringing in the entire world; npm audit just makes that clear.

npm itself having vulnerabilities is a more serious problem and it's not clear that they're taking it seriously.


I think security at NPM is "done".

It's a public repository of stuff. End of story. Why should NPM do the job of vetting everything? They aren't getting paid for it (or most of it).


> npm, Inc. is a company founded in 2014, and was acquired by GitHub in 2020.

https://www.npmjs.com/about

> Headquartered in California, [GitHub] has been a subsidiary of Microsoft since 2018.

https://en.wikipedia.org/wiki/GitHub

I think they're effectively a department that generates a lot of PR. They have paid security staff.

https://jobspresso.co/job/software-engineer-platform-2-2-2-2...

This is a job posting for a security engineer at npm from July 4 that appears to be filled. I'm sure that as an organization npm Inc. is aware of vulnerabilities in their core product, so there's internal back and forth - the usual stuff.


Wow, there's some very dismissive responses in there - and who calls users "muggles"?


The maintainer already released clean versions "on top of" the compromised ones, and NPM acted on reports and removed the compromised versions as well.

Compromised (and no longer downloadable from NPM):

- 0.7.29

- 0.8.0

- 1.0.0

Clean:

- 0.7.28 (last version before the hijack)

- 0.7.30

- 0.8.1

- 1.0.1

Compromised versions apparently contained a cryptomining tool capable of running on Linux, and a trojan that extracts sensitive data (saved passwords, cookies) from browsers on Windows. Both are blocked by up-to-date Windows Defender and presumably other AV software.
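
If you want to check whether any of these versions ended up somewhere in your tree, something like this should do it (the --all flag needs npm 7+):

    npm ls ua-parser-js
    # --all makes sure deeply nested copies show up too (npm 7+)
    npm ls ua-parser-js --all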


We pin all of our npm dependencies and upgrade them via dependabot. Dependabot links to the GitHub or GitLab release for each dependency bump, and I typically skim / scan every single commit to each dependency. But there's no guarantee that what's on GH matches what is uploaded to npm (which is what happened in this case; there are no malicious commits).

Does anyone know of a good way to verify that a npm release matches what's on GH? Version controlling the entirety of node_modules/ and running untrusted updates in a sandbox would work in theory, but in practice many packages contain minified js which makes the diffs between version bumps unreadable.
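
The closest thing I've come up with is hand-diffing the published tarball against a checkout of the matching tag, which only really works when the package has no build step (and assumes the repo actually tags its releases). Roughly:

    # grab exactly what the registry serves; it unpacks into ./package
    npm pack ua-parser-js@0.7.28
    tar -xzf ua-parser-js-0.7.28.tgz
    # check out what GitHub has for the same release (the tag name is an assumption)
    git clone --depth 1 --branch 0.7.28 https://github.com/faisalman/ua-parser-js.git repo
    # files only present on one side (docs, tests, .npmignore'd stuff) show up as noise
    diff -rq package repo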


Skip the nonsense and just check your dependencies in directly to your repo. The separation has no real world gains for developers and doesn't serve anyone except the host of your source repo. As it turns out most people's repo host is also the operator of the package registry they're using, so there aren't even theoretical gains for them, either.

Doing it this way doesn't preclude the ability to upgrade your dependencies; it _completely_ sidesteps the intentional or unintentional desync between a dependency's source and its releases; it means people have to go out of their way to get a deployment that isn't reproducible; and in 4 years, when your project has rotted and someone tries to stand it up again, even if just temporarily to effect some long-term migration, they aren't going to run into problems because the packages and package manager changed out from beneath them. I run into this crap all the time, to the point that I know people who claim it isn't a problem have to be lying.


> I run into this crap all the time, to the point that I know people who claim it isn't a problem have to be lying.

I don't think that's right.

Just because someone denies a problem exists—a problem that you know for a fact, with 100% certainty exists—doesn't mean they're lying.

It may mean you know they are wrong, but wrong != lying, and it's a good thing to keep in mind.

If you have external reasons to believe that the person you're talking to should or does know better, then it's fair to say they are lying.

But, in general, if you accuse someone who is simply wrong to be lying, you're going to immediately shut down any productive conversation that you could otherwise have.


People don't do this because `node_modules` can be absolutely massive (hundreds of megabytes or more), and a lot of people don't like (for various reasons) such large repositories.

There is a deprecated project at my work that committed the entire yarn offline cache to the repo. At least those were gzipped, but the repo still had a copy of every version of every dependency.

It isn't a good long term solution unless you really don't care at all about disk space or bandwidth (which you may or may not).


A middle ground that I've seen deployed is corporate node mirrors with whitelisted modules. Then individual repos can just point to the corporate repo. Same thing for jars, python packages, etc.
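
The "point at the corporate repo" part is usually just a registry override, per project or per machine (the URL below is a placeholder):

    # writes to your ~/.npmrc; a registry= line in a project-level .npmrc works too
    npm config set registry https://npm.internal.example.com/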


And build pipelines that fail due to the size of the repo.


Committing node_modules and reproducibility are somewhat orthogonal, though.

You can get reasonable degrees of reproducibility by choosing reasonable tools: Yarn lets you commit its binary and run that in the specified repo regardless of which version you have installed globally. Rush also allows you to enforce package manager versions. Bazel/rules_nodejs goes a step further and lets you pin the Node version per repo in addition to the package manager. Bazel plus Bazelisk for version management of Bazel itself provides a very hermetic setup.
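
For the Yarn part, that's basically a single command these days (the version number is just an example):

    # downloads the release into .yarn/releases and records yarnPath in .yarnrc.yml;
    # commit both and every clone runs this exact Yarn, whatever is installed globally
    yarn set version 3.0.2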

Packages themselves are immutable as long as you don't blow away your lockfile. I used to occasionally run into very nasty non-reproducibility issues with ancient packages using npm shrinkwrap (or worse, nothing at all), but since npm/yarn got lockfiles, these problems largely went away.

These days, the non-hermeticity stuff that really grinds my gears is the very low-level plumbing. On Mac, node-gyp uses Xcode tooling to compile C++ modules, so stuff breaks with macOS upgrades. I'm hoping someone can come up with some zig-based sanity here.

As for committing node_modules, there are pros and cons. Google famously does this at scale and my understanding is that they had to invest in custom tooling because upgrades and auditing were a nightmare otherwise. We briefly considered it at some point at work too but the version control noise was too much. At work, we've looked into committing tarballs (we're using yarn 3 now) but that also poses some challenges (our setup isn't quite able to deal w/ a large number of large blobs, and there are legal/auditing vs git performance trade-off concerns surrounding deletion of files from version control history)


Ill-advised tool adoption is exactly the problem I'm aiming to get people to wake up and say "no" to. You need only one version control system, not one reliable one plus one flaky one. Use the reliable one, and stop with the buzzword bandwagon, which is going to be a completely different landscape in 4 years.

> Packages themselves are immutable as long as you don't blow away your lockfile

Lockfiles mean nothing if it's not my project. "I just cloned a 4 year old repo and `npm install` is failing" is a ridiculous problem to have to deal with (which to repeat, is something that happens all the time whether people are willing to acknowledge it or not). This has to be addressed by making it part of the culture, which is where me telling you to commit your dependencies comes from.


This problem does happen, but committing node_modules won't fix it. Assuming the npm registry doesn't disappear, npm will download the exact same files you would have committed to your repo. Wherever those files came from, 4 years later you upgraded your OS, and now the install step will fail (in my experience usually because of node-gyp).

Unless you were talking about committing compiled binaries as well, in which case every contributor must be running the same arch/OS/C++ stdlib version/etc. M1 laptops didn't exist 4 years ago. If I'm using an M1 today, how can I stand up this 4 year old app?

The real problem is reproducible builds, and that's not something git can solve.


> Assuming the npm registry doesn't disappear, npm will download the exact same files you would have committed to your repo

I think that’s where you missed what the gp is saying.

If everything is checked in to source control, npm will have nothing to download. You won’t need to call npm install at all, and if you do it will just return immediately saying everything is good to go already.

The workflow for devs grabbing a 10 year old project is to check it out then npm start it.


You are probably not running the same OS/node/python version as you were 10 years ago. If you were to try this in real life, you'd get an error like this one. https://stackoverflow.com/questions/68479416/upgrading-node-....

The error helpfully suggests you:

>Run `npm rebuild node-sass` to download the binding for your current environment.

Download bindings? Now you're right back where you started. https://cdn140.picsart.com/300640394082201.jpg

Of course if you keep a 10 year old laptop in a closet running Ubuntu 10.04 and Node 0.4, and never update it through the years, then your suggestion will work. But that workflow isn't for me.


> which to repeat, is something that happens all the time

Are you sure this isn't just a problem in your organization? As I qualified, the issue you're describing was a real pain maybe two or three years ago, but not anymore IME. For context, my day job currently involves project migrations into a monorepo (we're talking several hundred packages here) and non-reproducibility due to missing lockfiles is just not an issue these days for me.

As the other commenter mentioned, node-gyp is the main culprit of non-reproducibility nowadays, and committing deps doesn’t really solve that, precisely because you often cannot commit arch-specific binaries, lest your CI blow up trying to run Mac binaries.


> Are you sure this isn't just a problem in your organization?

I'm really struggling to understand the kind of confusion that would be necessary in order for this question to make sense.

Why do you suspect that this might be a problem "in [my] organization"? How could it even be? When I do a random walk through projects on the weekend, and my sights land on one where `npm install` ends up failing because GitHub is returning 404 for a dependency, what does how things are done in my organization have to do with that?

I get the dreadful feeling that despite my saying "[That] means nothing if it's not my project", you're unable to understand the scope of the discussion. When people caution their loved ones about the risk of being the victim of a drunk driving accident on New Year's Eve, it doesn't suffice to say, "I won't drink and drive, so that means I won't be involved in a drunk driving accident." The way we interact with the whole rest of the world and the way it interacts with us is what's important. I'm not concerned about projects under my control failing.

> non-reproducibility due to missing lockfiles is just not an issue

Why do you think that's what we're talking about? That's not what we're talking about. (I didn't even say anything about lockfiles until you brought it up.) You're not seeing the problem, because you're insisting on trying to understand it through a peephole.


I mean, of course I'm going to see this through the lens of my personal experience, which is that nasty non-reproducibility issues usually only happen when someone takes over some internal project that has been sitting in a closet for years and the original owner is no longer at the company. Stumbling upon reproducibility issues in 4 year old projects on GitHub is just not something that happens to me (and I have contributed to projects where, say, Travis CI had been broken in the master branch for node 0.10 or whatever), and getting 404s on dependencies is something I can't say I've experienced, unless we're talking about very narrow cases like consuming hijacked versions of packages that were since unpublished, or possibly a different stack that uses git commits for package management (say, C) - and even then, that's not something I've run into (I've messed around w/ C, zig and go projects, if it matters). I don't think it's a matter of me having a narrow perspective, but maybe you could enlighten me.

As I mentioned, my experience involves seeing literally hundreds of packages, many of which were in a context where code rot is more likely to happen (because people typically don't maintain stuff after they leave a company and big tech attrition rate is high, and my company specifically had a huge NIH era). My negative OSS experience has mostly been that package owners abandon projects and don't even respond to github issues in the first place. I wouldn't be in a position to dictate that they should commit node_modules in that case.

Maybe you could give me an example of the types of projects you're talking about? I'm legitimately curious.


> I don't think it's a matter of me having a narrow perspective, but maybe you could enlighten me.

I have to think it is, because the "shape" of your responses here has been advice about things that I/we can do to keep my/our own projects (e.g. mission critical or corporate funded stuff being used in the line of business) rolling, and has completely neglected the injured-in-collision-with-other-intoxicated-driver aspect.

Again, I _have_ to think that your lack of contact with these problems has something to do with the particulars of your situation and the narrow patterns that your experience falls within. Of the projects that match the criteria, easily 40% of them are the type I described. (And, please, no pawning it off with a response like "must be a bad configuration on your machine"; these are consistent observations over the bigger part of a decade across many different systems on different machines. It's endemic, not something that can be explained away with the must-be-your-org/must-be-your-installation handwaving.)

> code rot is more likely [...] My negative OSS experience has mostly been that package owners abandon projects and don't even respond to github issues

Sure, but the existence of other problems doesn't negate the existence of this class of problems.

> I wouldn't be in a position to dictate that they should commit node_modules in that case.

Which is why I mentioned that this needs to be baked in to the culture. As it stands, the culture is to discourage simple, sensible solutions and prefers throwing even more overengineered tooling that ends up creating new problems of its own and only halfway solves a fraction of the original ones. (Why? Probably because NodeJS/NPM programmers seem to associate solutions that look simple as being too easy, too amateurish—probably because of how often other communities shit on JS. So NPMers always favor the option that looks like Real Serious Business because it either approaches the problem by heaping more LOC somewhere or—even better—it involves tooling that looks like it must have taken a Grown Up to come up with.)

> Maybe you could give me an example of the types of projects you're talking about? I'm legitimately curious.

Sure, I don't even have to reach since as I said this stuff happens constantly. In this case, at the time of writing the comments in question, it was literally the last project I attempted: check out the Web X-Ray repo <https://github.com/mozilla/goggles.mozilla.org/>.

This is conceptually and architecturally a very simple project, with even simpler engineering requirements, and yet trying to enter the time capsule and resurrect it will bring you to your knees on the zeroeth step of just trying to fetch the requisite packages. That's to say nothing of the (very legitimate) issue involving the expectation that even with a successful completion of `npm install` I'd still generally expect one or more packages for any given project to be broken and in need of being hacked in order to get working again, owing to platform changes. (Several other commenters have brought this up, but, bizarrely, they do so as if it's a retort to the point that I'm making and not as if they're actually helping make my case for me... The messy reasoning involved there is pretty odd.)


> check out the Web X-Ray repo <https://github.com/mozilla/goggles.mozilla.org/>.

Thanks for the example! Peeking a bit under the hood, it appears to be due to transitive dependencies referencing github urls (and transient ones at that) instead of semver, which admittedly is neither standard nor good practice...

FWIW, simply removing `"grunt-contrib-jshint": "~0.4.3",` from package.json and related jshint-related code from Gruntfile was sufficient to get `npm install` to complete successfully. The debugging just took me a few minutes grepping package-lock.json for the 404 URL in question (https://github.com/ariya/esprima/tarball/master) and tracing that back to a top-level dependency via recursively grepping for dependent packages. I imagine that upgrading relevant dependencies might also do the trick, seeing as jshint no longer depends on esprima[0]. A yarn resolution might also work.

I'm not sure how representative this particular case is to the sort of issues you run into, but I'll tell you that reproducibility issues can get a lot worse in ways that committing deps doesn't help (for example, issues like this one[1] are downright nasty).

But assuming that the installation in your link just happens to have a simple fix and that others are not as forgiving, can you clarify how committing node_modules is supposed to help if you're saying you can't even get it to a working state in the first place? Do you own the repo in order to be able to make the change? Or are you mostly just saying that hindsight is 20-20?

[0] https://github.com/jshint/jshint/blob/master/package.json#L4...

[1] https://github.com/node-ffi-napi/node-ffi-napi/issues/97


I don't understand your questions.

My message is that for a very large number of real world scenarios the value proposition of doing things the NPM way does not result in a Pareto improvement, despite conventional wisdom suggesting otherwise.

I also don't understand your motivation for posting an explanation of the grunt-related problem in the Web X-Ray repo. It reminds me of running into a bug and then going out of my way to track down the relevant issue and attach a test case crafted to demonstrate the bug in isolation, only to have people chime in with recommendations about what changes to make to the test case in order to not trigger the bug. (Gee, thanks...)

And to reiterate the message at the end of my last comment: the rationale of trying to point at how badly mainstream development manages to screw up other stuff, too, is without merit.


Personally, I see code as a fluid entity. If me spending a few minutes to find a way to unblock something that you claimed to "bring you to your knees on the zeroeth step" is a waste of time, then I guess you and I just have different ideas of what software is. For me, I don't see much value in simply putting up with some barely working codebase out of some sense of historical puritanism; if I'm diving into a codebase, the goal is to improve it, and if jshint or some other subsystem is hopelessly broken for whatever reason, it may well get thrown out in favor of something that actually works.

You may disagree that the way NPM does things works well enough to be widely adopted (and dare I say, liked by many), or that true reproducibility is a harder problem than merely committing files, but by and large, the ecosystem does hum along reasonably well. Personally, I only deal with NPM woes because others find value in it, not because I inherently think it's the best thing since sliced bread (in fact, there are plenty of aspects of NPM that I find absolutely atrocious). My actual personal preference is arguably even less popular than yours: to avoid dependencies wherever possible in the first place, and relentless leveraging of specifications/standards.


> My actual personal preference is arguably even less popular than yours: to avoid dependencies wherever possible in the first place

You lack the relevant facts to even be able to speculate about this. You haven't even done an adequate job grappling with the details presented here without the need for assumptions.

> If me spending a few minutes to find a way to unblock something

Imagine you have a coworker who frequently leaves unwashed dishes in the sink. You wash and put them away, but it happens enough that you decide to bring it up. Now imagine that when you do bring it up, your coworker lectures you, condescendingly and at length, about the steps you can take to "unblock" the dirty dishes problem (by washing them), as if there's some missing piece involving not knowing how and that that's the point of the discussion, instead of the fact that this (entirely avoidable) problem is occurring in the first place.

You're not unblocking anything, nor would this be the place to do so even if you were. Under discussion is the phenomenon that, for all the attention and energy that has gone into NPM, frequently old projects fail on some very basic preliminary step: just being able to complete the retrieval of the dependencies, i.e. the one thing that `npm install` is supposed to do. You voiced your skepticism and then with every response to this thread moved the goalposts further and further away, starting out generally dismissive that the problem exists, then wrapping up by offering your magnanimity to what you choose to see as someone struggling.

There is no struggle here, nor are any of the necessary conditions that would make your most recent comments relevant satisfied. Like the earlier anecdote about the annoyance of dealing with the type of person who jumps in to offer pointers about how to change a failing test case so that it no longer reveals the defect it was created to isolate, these are responses that fail at a fundamental level to understand the purpose and greater context that they're ostensibly trying to contribute to.

If `npm install` is broken on a Thursday and annoys the person who ends up forced to work through that problem, and then you show up on a Saturday with an after-the-fact explanation about what to do in situations like the one that happened on Thursday, what possible purpose does that serve? At best, it makes for an unfocused discussion, threatening to confuse onlookers and participants alike about what exactly the subject is (whether the phenomenon exists vs morphing this into a StackOverflow question where there's an opportunity to come by with the winning answer and subtly assert pecking order via experience and expertise). At worst, it does that and also comes across as incredibly insulting. And frankly, that is the case here.

> You may disagree that the way NPM does things works well enough to be widely adopted (and dare I say, liked by many)

By your count, how many times have you moved the goalposts during this, and how many more times do you plan on moving them further?

> I don't see much value in simply putting up with some barely working codebase out of some sense of historical puritanism

Major irony in relation to the comment above, and given the circumstances where this discussion started. Shocking advice: let your source control system manage your source code, and examine the facts about whether late-fetching dependencies the NPM way is worth the cost of putting up with the downsides that I brought up and the recurring security downsides that lead to writeups like the article that was originally posted here.


> At work, we've looked into committing tarballs (we're using yarn 3 now) but that also poses some challenges (our setup isn't quite able to deal w/ a large number of large blobs, and there are legal/auditing vs git performance trade-off concerns surrounding deletion of files from version control history)

With Git LFS the performance hit should be relatively minimal (if you delete the file it won't redownload it on each new clone anyways, stuff like this).
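
Roughly, assuming the vendored archives are .tgz/.zip files:

    # have LFS store the package archives so clones only pull pointers until needed
    git lfs install
    git lfs track "*.tgz" "*.zip"
    git add .gitattributes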


And what happens when you need to update those dependencies?

Software is a living beast, you can't keep it alive on 4yr-old dependencies. In fact, you've cursed it with unpatched bugs and security issues.

Yes, keep a separate repo, but also keep it updated. The best approach is to maintain a lag between your packages and upstream so issues like these are hopefully detected & corrected before you update.


> And what happens when you need to update those dependencies?

Then you update them just like you do otherwise, like I already said is possible.

> you can't keep it alive on 4yr-old dependencies. In fact, you've cursed it with unpatched bugs and security issues

This is misdirection. No one is arguing for the bad thing you're trying to bring up.

Commit your dependencies.


Let's say that the day you update your dependencies is after this malware was injected but before it was noticed.

Now you have malware in your local repo :(

Having a local repo does not prevent malware. Your exposure to risk is less because you update your dependencies less frequently, but the risk still exists and needs to be managed. There's no silver bullet.


This is more misdirection. By no means am I arguing that if you're doing a thousand stupid things and then start checking in a copy of your dependencies, you're magically good. _Yes_, you're still gonna need to sort yourself out re the 999 other dumb things.


Sounds like a great way to end up with what $lastco called "super builds" - massive repos with ridiculous amounts of cmake code to get the thing to compile somewhat reliably. It was a rite of passage for new hires to have them spend a week just trying to get it to compile.

All this does is concentrate what would be occasionally pruning and tending to dependencies to periodic massive deprecation "parties" when stuff flat out no longer works and deadlines are looming.


That’s the whole deal with yarn 2 isn’t it? With their plug’n’play it becomes feasible to actually vendor your npm deps, since instead of thousands upon thousands (upon thousands) of files you only check in a hundred or so zip files, which git handles much more gracefully.

I was skeptical at first as it all seemed like way too many hoops to jump through, but the more I think about it the more it feels that it’s worth it.


> Skip the nonsense and just check your dependencies in directly to your repo.

Haha, no.

That would increase the size of the repository greatly. Ideally, you would want a local proxy where the dependencies are downloaded and managed, or to tarball the node_modules and save it in some artifact manager, server, or S3 bucket.


What's the problem with a big repository? The files still need to be downloaded from somewhere. It's mostly just text anyway, so no big blobs, which is usually what causes git to choke.

For that one-off occasion when you are on 3G, have a new computer without an older clone, and need to edit files without compiling the project (which would have required npm install anyway), there is git partial clone.

Does npm have a shared cache if you have several projects using the same dependencies?


>Does npm have a shared cache if you have several projects using the same dependencies?

pnpm does, that's why I'm using it for everything. It's saving me many gigabytes of precious SSD space.

https://github.com/pnpm/pnpm


Now anything with native bindings is broken if you so much as sneeze.


> Does anyone know of a good way to verify that a npm release matches what's on GH?

I'm not aware of any way to do this, and it's a huge problem. It would be great if they introduced a Docker Hub verified/automated builds[0]-type thing for open source projects. I think that would be the only way we could be certain what we're seeing on GitHub is what we're running.

Honestly it’s hard to believe we all just run unverifiable, untrustable code. At the very least NPM could require package signing, so we'd know the package came from the developer. But really NPM needs to build the package from GitHub source. Node is not a toy anymore, and hasn't been for some time—or is it?

[0] https://docs.docker.com/docker-hub/builds/


This is ~solvable at a third party level. Nearly everything on NPM (the host) is MIT licensed or similar. When packages are published, run their publish lifecycle and compare to the package that’s actually published.

I don’t have the resources or bandwidth to do this, but it’s pretty straightforward, give or take weird publishing setups.

Edit: of course this doesn’t apply to private repositories but… you’re in a whole different world of trust at that point.


I started working on this exact problem a few years ago. Didn't get far, though, I think I stopped because I assumed there just wouldn't be any real interest.


I couldn't find the code, so I just started over. Haven't hosted it anywhere yet.

https://github.com/connorjclark/npm-package-repro


Awesome! Thank you.


Doesn't npm have a facility to tell it to download releases directly from source? Most package managers have in one form or the other, but I'm not very familiar with npm.

To be honest I'm not sure if npm (the service, not the tool) and similar services really add all that much value. The only potential downside I see is that repos can disappear, but then again, npm packages can also disappear. I'd rather just fetch directly from the source.

This is how Go does it and I find it works quite well. It does have the GOPROXY now, but that's just an automatic cache managed by the Go team (not something where you can "login" or anything like that), so that already reduces the risk, and it's also quite easy to outright bypass by setting GOPROXY=direct.


Deno (https://deno.land/), another runtime based on v8, has a system similar to Go, with local and remote imports https://deno.land/manual@v1.11.5/examples/import_export.


You can’t really fetch from git because for the majority of packages there is a non-standard build step that packages do not consistently specify in package.json, if at all. Packages on NPM are just tarballs uploaded by the author. Furthermore, what about transitive dependencies?


> what about transitive dependencies?

What about them?

As for unspecified build steps: this seems like a solvable problem. I would just submit a patch.


Fetching from git is possible. The downsides are lack of semver, having to clone the full history of the repo, and having to clone the complete repo including files not needed for just using the lib, eg preprocessors, docs and tests.


I assume people use tags and such no? That gives you versions and you can just fetch it at a specific tag. Either way, this is very much a solvable problem.

A few docs and tests doesn't strike me as much of an issue.


Renovate gives you links to diffs on the published npm package: https://app.renovatebot.com/package-diff?name=hookem&from=1....

It's also great at doing bulk updates, so you get a lot less spam than you do from Dependabot.


Technically you could pin directly to a git commit instead of an NPM release
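
Something like this, assuming the package doesn't need a build step (the SHA is a placeholder):

    # install straight from GitHub at an exact commit instead of a registry tarball
    npm install faisalman/ua-parser-js#<commit-sha>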


Although npm supports lifecycle methods that run before publish / on install, many packages fail to use those correctly (or at all) yet still require a build step, so using the GH repo directly very often does not work.


This is the right answer. Everyone else replying is patently wrong.


Edit: this was intended for a child comment sorry


Why is this a link to Reddit? Usually that’s when the link is to the specific Reddit discussion because that’s what is being shared, but that’s not the case here.

@mods I think the link should be to the actual post here: https://github.com/faisalman/ua-parser-js/issues/536


I don't know why anyone would auto-download untrusted packages as part of their own build/release chain and not review the code manually. Anything could be in there.

The way people use NPM is pure fucking insanity.


I mean, the alternative is reviewing 3000 packages by hand. No thanks. I’ll trust Snyk to let me know if anything is compromised.


I suspect a huge fraction of those packages need to be rolled up into a standard library for which some foundation can take money and do proper release engineering.


But then aren’t you still trusting a third-party to properly audit their own code and ensure it doesn’t get compromised? Where’s the line?


It's easier to trust 1 org than 1000 different developers.


I don't know where the line is, but I know users of eg Debian don't find trojans and cryptominers appearing on their systems after routine updates. They have thousands of obscure packages, but they don't let just anyone upload something.


You have to trust someone at some point. I think that was part of the point of the "Reflections on Trusting Trust" lecture.


When these compromised npm packages happen, the blast radius would be a fraction of what it is today if the JavaScript standard library wasn't so anaemic.


Scanners are overrated. They only do basic heuristics like checksumming and pattern matching. It's like antivirus for your code and we all know how useless that is.


When you've got thousands of packages, I'll take a scanner over nothing. For a big ass monorepo like the one I maintain at work, it's a great tool.

Snyk does report on vulnerabilities that don't necessarily affect your usage of a package, but IMHO, that's erring on the safe side and I'm ok with that.


most importantly, they check version numbers against known compromised versions like this, just in case you aren't on HN to see it.


Snyk is okay but I've had my issues with them. Never put all of your eggs in one basket.


Could you elaborate on those issues?


Totally agree, this is a huge reason I use Yarn. Reproducibility.

The catch is, particularly with yarn 3, if you want the soundness and capability to vendor zip files (completely removing the concept of node_modules) you need to use PnP and deal with some packages that don’t define their dependencies very well.

There are still packages that will try to import/require stuff they don't declare, which is unsound. So when you get the warning, you need to declare it yourself in the root yarn config.

It’s a bit annoying, but the payoff is soundness.

You can clone to a new machine (be it a for a dev, or ci/cd) and be good to go without touching the network.

It’s like going from js to ts. Extra rigour for a price, but more often than not the price is worth it.


Does yarn still not honor the yarn.lock of transitive dependencies?


Even if they were trusted, it's still a horrific waste of resources. Maybe it's related to the insane trendchasing attitude ("gotta get the latest!!!!!111") that has infected JS/web development. Combine that with "free" cloud CI services and the general spreading stupidity of not knowing where the data actually is or what's going on (popularity of "quick tutorials" that have completely clueless noobs running one-line commands which can silently manipulate gigabytes of data), and it's not hard to see the result.


Convenience and invented here syndrome.

https://en.m.wikipedia.org/wiki/Invented_here


I recently looked at cloudflare workers. Then I remembered that the JS ecosystem is built around NPM. No thanks.


Our approach to solving this on our open source project:

1. Enforce use of lock files.

2. On any PR that changes any hash in the lock file, run a static analysis that looks for “permission” changes in any source file, e.g., used network, used env vars, used filesystem.

3. Comment on the PR with any observed permission changes.

It is very alpha-quality and not yet generally available but here are some details: https://deps.semgrep.dev/


Does that tool resolve things like:

   setTimeout("dan" + "gerous" + "func" + "tion" + "()", 1);
I'm only somewhat superficially familiar with JavaScript (and only in the browser, not NodeJS), but you can probably get far more obfuscated than that; or at least, you can in some other similar-ish languages. For example in Ruby on Rails they use (or used? It's been years so may be different now) all sorts of meta-programming stuff, which at times actually made it somewhat difficult to find a method definition if you didn't know where to look; it was all dynamically defined at runtime.

My worry with such a tool would be that malicious actors would go out of their way to hide their code from such tools, and that the tool would give me a false sense of security.


Yeah, static analysis will never be enough, you'll need to parse the code and evaluate it in a sandbox to find evil functions.

Another more obfuscated example:

    function hello() {
      console.log("I'm stealing all your passwords now");
    };hello();

Can be turned into:

    (function(_0x3c7f26,_0x1837dd){var _0x8bd36e=_0x5929,_0x90b69b=_0x3c7f26();while(!![]){try{var _0x249481=parseInt(_0x8bd36e(0x1cc))/0x1*(parseInt(_0x8bd36e(0x1d0))/0x2)+-parseInt(_0x8bd36e(0x1ce))/0x3+parseInt(_0x8bd36e(0x1d5))/0x4+-parseInt(_0x8bd36e(0x1d4))/0x5*(parseInt(_0x8bd36e(0x1d2))/0x6)+-parseInt(_0x8bd36e(0x1d1))/0x7+-parseInt(_0x8bd36e(0x1cd))/0x8+-parseInt(_0x8bd36e(0x1cf))/0x9*(-parseInt(_0x8bd36e(0x1cb))/0xa);if(_0x249481===_0x1837dd)break;else _0x90b69b['push'](_0x90b69b['shift']());}catch(_0x19e610){_0x90b69b['push'](_0x90b69b['shift']());}}}(_0x2849,0xf22bd));function hello(){var _0x2f0a61=_0x5929;console[_0x2f0a61(0x1d3)]('I\x27m\x20stealing\x20all\x20your\x20passwords\x20now');}function _0x5929(_0x2fab05,_0x1e12e){var _0x284982=_0x2849();return _0x5929=function(_0x5929e4,_0x24e341){_0x5929e4=_0x5929e4-0x1cb;var _0x1372b0=_0x284982[_0x5929e4];return _0x1372b0;},_0x5929(_0x2fab05,_0x1e12e);};hello();function _0x2849(){var _0x114094=['8091OneRlW','38iQspVa','2224236lvStpa','54nJkbsM','log','908785hkDdiw','7617548ADhgFx','32610QhPrTC','44539hBNEpr','7460904ZMwHlf','5412480lnulMs'];_0x2849=function(){return _0x114094;};return _0x2849();}
No way for a static analysis tool to catch that.


Tumblr themes used to have a similar static analysis and this was the exact way to get around it (well, `atob()` in that specific case)


Our analysis actually covers this case (and the obfuscation case below) just fine: we don’t care about being precise, just very conservative. So any use of setTimeout where the first argument is not a function is flagged as a potential dynamic execution. There are a couple other security teams at other companies who are collaborators on the project with us who have contributed Semgrep rules that cover what should be all the dynamic injection points in JS (eval, vm.*, child_process). In fact many obfuscation techniques will show up as a permission change as they involve the addition of dynamic behavior. Which is a great signal to have—why is my dependency publishing obfuscated code?

And it does highlight “where should we look” in lockfile diffs, which makes us more comfortable updating frequently.

Also, the strongest signal comes from 0 -> 1+ permission transitions. Like if leftpad suddenly adds an exec or setTimeout call.

Would love to hear more critique of the approach though!


> Would love to hear more critique of the approach though!

It wasn't really a "critique" as such, and more of a question. It's not hard to imagine being able to fool a naïve tool – flagging any dynamic code execution seems like the right approach.


An unfortunate side-effect of this type of system, in my experience, is that you end up being stuck on very old dependencies for a very long time.

When a security issue is discovered, perhaps months or years later, it is extremely difficult to upgrade because

1) the patch will always be applied to the most current release, which could be many major versions beyond what you run, and

2) manually backporting the security fix is impossible without creating your own fork, and the code will likely be so different that the patch won't apply cleanly.

You are creating so much friction to update dependencies that you will find yourself unable to update them in the future.

Small example: I recently upgraded a dependency that required us to move from node 12 to node 14. That required us to upgrade more packages, including our database ORM. The ORM was 2 years out of date. Now we have changes that literally affect the entire application and require significant regression testing (automated and manual) because the risk surface is so much larger. This wouldn't have happened if we didn't use a lock file with pinned versions.

------

In the past (node 4.x), lock files weren't written by default, and I (on a small project) had sufficient test coverage that always allowed NPM to install the latest minor version bump of every package, every time. If there was a problem caught by the tests, then I immediately fixed the issue with the upgrade.

The nice thing about this is that I was always on the latest version of every dependency, so I always had the latest security fixes, and the online docs for the dependencies were always accurate to what I was using.

If this particular exploit happened, it would have been more likely to get deployed but it would have also automatically gotten fixed as soon as the patch update rolled around. Whereas with a lock file, someone may have updated it and then not been aware when the "fixed" version came out because npm/yarn don't tell you that.

As a JavaScript dev (full stack) I think lock files are a bad idea because they make us LESS secure, not more (keeping around insecure older dependencies that can't be patched).


Lock files making you less secure is untrue. Using lock files does not mean your dependencies are not being updated; it means you have control over _when_ they are updated. Which means:

1. you won't be stuck on a dependency update breaking your code while you are working on an unrelated feature, because you can work on adapting your code to the update independently,

2. you can check whether a lock file update breaks the tests separately from testing your own changes,

3. with vcs you can retroactively investigate which dependency update caused a particular breakage or behavior change.
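
In practice that tends to look like: CI installs exactly what the lockfile records, and dependency updates are their own reviewable change. For example:

    # CI / fresh checkouts: install exactly what package-lock.json records,
    # and fail instead of silently resolving something newer
    npm ci
    # as a separate, reviewed change: see what's pending, then update and re-test
    npm outdated
    npm update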


> If this particular exploit happened, it would have been more likely to get deployed but it would have also automatically gotten fixed as soon as the patch update rolled around.

Note that updating your deps to a non-hacked version would not "fix" it, though. The machine is persistently compromised until reimaged if it was given the chance to install the compromised package.


Pre-lockfiles I had multiple instances of a new version of a dependency being released that didn't follow semver correctly (it can be hard to identify what constitutes a breaking change at times).

Without a lockfile this was a pain to fix; now you can just not merge the upgrade until the problem is resolved (rather than every branch suddenly failing).


> Now we have changes that literally affect the entire application and require significant regression testing (automated and manual) because the risk surface is so much larger. This wouldn't have happened if we didn't use a lock file with pinned versions.

But without a lock file you would always be using the latest version, right?

Wouldn't those two years of constant possible breaking updates (would you know if something updates to the latest version? How do you ensure that all versions are the same across all machines?) require a lot more regression testing over those entire two years and introduce a lot more risk of breakage?


No, because the changes would have been many small incremental changes, each of which is easier to validate on its own. It's the sum of those dozens of updates all at once that creates the problem.


NPM/any package manager should not let you upload/manage packages with more than 1k downloads a month without 2FA. It's a liability and this is bound to happen.


IMO they—NPM especially—should also prevent publishing arbitrary packages where the source is accessible. In such cases, `npm publish` should be an authorization check and run your publish lifecycle to produce the package.

It’s absolutely bonkers that an ecosystem where the vast majority of packages are permissively licensed allows those packages to be published with basically whatever the heck you want, completely independent of whatever you would find inspecting and building it yourself.


Does any package ecosystem do this? Rust had the opportunity to start off doing the right thing but instead uses long-lived API tokens which can get easily owned.


NPM supports TOTP, but it’s opt in.
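
Opting in is one command, and publishing then takes a one-time code (123456 below is obviously a placeholder):

    # require an OTP both for login and for publishing/modifying packages
    npm profile enable-2fa auth-and-writes
    # publish prompts for the code, or you can pass it explicitly
    npm publish --otp=123456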


This can be difficult to integrate into CI systems though. I don't recall ever seeing a way to allow interactive input during a CI pipeline to provide a 2FA token.

I'd argue that a completely automated release pipeline still trumps a manual/local process with 2FA (for reproducibility and transparency mainly).

Being able to have a hold step that also prompts for a token in order to publish in a CI system could be pretty neat though, and the best of both worlds.


I’ve been saying this for years: https://news.ycombinator.com/item?id=17514541

Though rather than mandating it, just show if the publisher has it enabled and let the market decide.


Yes, I'd love it if NPM would mark any package as insecure if it transitively depended on a developer that didn't use 2FA.


Semi-related: Microsoft is going to be (or has begun) checking for differences between published npm packages and their source control.

I got a PR in my repository a few days ago leading back to a team trying to make it easier for packages to be reproducible from source https://github.com/microsoft/Secure-Supply-Chain


How will this work with, say, TypeScript? It's pretty common to not commit the transpiled code - do you expect this to change, or would this check basically do a build + pack and then compare?


Yes it will have to do that. Some packages .npmignore the source files so those may not even be present.

But doing that is actually easier than having to deal with all the different ways of versioning packages. For example, I am not using git tags in my project, some others may be, some may be using a branch not named `master` to deploy from. They're going to have to intelligently traverse the git tree to draw meaningful conclusions.


Submitted link is to a reddit thread.

Better one might be the GitHub issue discussing it: https://github.com/faisalman/ua-parser-js/issues/536


which has already been submitted here: https://news.ycombinator.com/item?id=28960439


This appears to affect other popular packages as well, e.g. fbjs.

It's time for proper supply chain security; the typo- and namesquatting minefield that is PyPI shows clearly why there is a need. It's way too easy to distribute malware this way, or to register repos for popular packages by appending a version number, for example. I'm constantly running into shady deserted packages. Why not partner with a large security vendor? The infrastructure is often provided as a donation from cloud providers as well.


> I noticed something unusual when my email was suddenly flooded by spams from hundreds of websites (maybe so I don't realize something was up, luckily the effect is quite the contrary).

I’ve been running a small deal aggregation and notification website for 15 years. Almost no new users these days. A few months ago I was excited to see dozens of new accounts created each day, but no one ever clicked the verification link that was sent to their email. That’s when I first learned that creating unused accounts is a technique used by hackers to flood people’s inboxes to hide fraudulent purchases or other nefarious activity, like publishing a hijacked npm package. A simple "what is x + y?" captcha solved the issue.


For packages that popular, authorization from more than one person should be required, or at least some kind of delay before `npm update`/`npm install` picks up a new version unless that version is specifically requested. If I remember correctly, if someone hijacked my npm account and published something, I wouldn't even notice. Obviously the publisher should get an email or maybe even an SMS.


The delay idea sounds promising. I'd want that as part of npm, but I suppose someone could build an NPM proxy that adds the delay.

Has anyone tried something like this, even for a different ecosystem?


It's possible with AWS CodeArtifact using EventBridge to promote after a delay. The company I work for has a three day delay on npm.


A large part of npm's staggering success is the very low friction to publish packages. I worry that any mitigation that slows down the publishing side of things would kill innovation in the ecosystem.


I imagined an optional delay for npm install/update. That way it would only slow down users who wanted more time.


Got it. Yeah, I agree that's a useful and easy mitigation, and I'm sure enterprise users would certainly pay for other safety labeling/filtering/signing of packages.


It makes so much sense; I'm surprised it doesn't exist.


Pretty sure that every time (or almost every time) I run `npm install` in a repo downloaded from GitHub, NPM complains about several security problems in the dependencies. Kinda feels like an unending ‘cry wolf’ situation.


Exactly the same analogy Dan Abramov used in his blog: https://overreacted.io/npm-audit-broken-by-design/


I wish NPM had an option to disallow pulling dependencies from accounts that don’t use 2FA. Even if you trust a publisher it is so easy for their packages to get compromised because they are lax about security.


That the package is downloaded 7 million times a week exacerbates the impact significantly. The CI/CD trend of pulling everything from upstream over and over for builds (despite caching, as the absurd NPM download counts indicate) ensures that any upstream issue is distributed widely and frequently.

Continuous Integration and Continuous Deployment of vulnerabilities.


This is the price of free code. If you don't read the open source you're adding to your system then don't complain when it owns you! You wouldn't take your medicine without reading the label first.

NPM is a miracle for open software development, but it requires diligence.


Weird comparison. The equivalent with medicine is not 'reading the label', but understanding the chemistry, reading all the research, reading all the associated papers, understanding all the biology, etc.

It's reduced to you needing to only read the label by regulation.

Expecting people to read the source of their dependencies, outside of very specific use cases and industries, is a lost cause.


Imagine if you needed a doctoral degree and a billion dollar FDA study before being able to publish open source code.


The difference is that NPM packages are provided for free.


Many drugs are free or almost free where I live. Well, not really free, they're paid by our taxes. And npm is paid by Microsoft not paying theirs. So the difference isn't so big.


"And npm is paid by Microsoft not paying theirs."

LOL ok.


Amusingly enough I guarantee you that 90%+ of people only ever read the label on their medication when there’s absolutely nothing else to read in the bathroom.

It’s assumed the doctors and the pharmacist have done the reading for you.


Unless these are compiled binaries, it should be fairly easy to spot funny business. Surely someone out there must do diffs on version updates to see what changed. I know that I am nosey enough to see what changes people made rather than relying on change logs. Either way, it would be nice if NPM devs signed their scripts. Nothing as complex as rpm/deb packages; just corresponding .asc files might be useful for the small handful of people that validate signatures. Or if public key management is a PITA and too much friction, then even a signed manifest file on a trusted origin server vs. mirrors could be super handy. The code deployment systems could curl the manifest and at least validate the checksums of the files.
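
For what it's worth, the registry already records an integrity hash per tarball, so a deployment system pulling from a mirror could at least check the mirror's copy against what the main registry claims. It's nowhere near an author signature, but it's a start:

    # the integrity recorded at publish time (older publishes may only have dist.shasum)
    npm view ua-parser-js@0.7.28 dist.integrity
    # recompute sha512 over the tarball you actually fetched; compare to the part after "sha512-"
    openssl dgst -sha512 -binary ua-parser-js-0.7.28.tgz | openssl base64 -A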


I'm surprised there isn't some machine learning system for this.


This is what I don't like about the JS world.

I started a project in React and it installed 100+ random packages. I had to audit every one of them; it's so silly not to have a standard library.


You want a standard library for... React?

What about the people that like Vue or angular more than react?


I think the suggestion was that there's this crazy explosion of transitive dependencies because JS is not 'batteries included' with some standard library that covers quite a lot of what people want.

If there's no standard library, you have to include some libraries to do what you want. Those libraries had no standard library to depend on either, so they had to do the same thing you did. And so on.


The JavaScript culture around dependencies is baffling to me.

The simplest of tools will easily have hundreds of transitive dependencies, making these sort of events a mess to deal with, and causing the blast radius to be avoidably large.


Left-pad was already a warning sign that something is seriously wrong with the culture. https://news.ycombinator.com/item?id=11348798


The worst thing about it is that now left-pad is in the standard (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...), but that's it. is-odd has more than 400k weekly downloads https://www.npmjs.com/package/is-odd.


https://github.com/helpers/handlebars-helpers/issues/315

This is the only package that even uses it (via is-even, which includes is-odd) while having more than 10 weekly downloads.


IMO this is largely a side effect of the JavaScript standard library being so anaemic - if available functions were bolstered significantly, it would eventually have some impact on this madness.


I'd encourage everyone to try the npm fund command (https://docs.npmjs.com/cli/v7/commands/npm-fund) on their repos to see how many projects need funding. All of these projects are an opportunity for a malicious actor to offer a lump sum of money to get the package.


I'm usually pretty critical of tiny NPM modules, but I really want to support whoever maintains this package.

Hundreds of lines of regexes, people asking about IE8 support, and a constantly moving target every time a browser changes something. This really must be a horrible package to maintain.


That’s their own fault. No one needs to “detect” random esoteric browsers. Once you have 4/5 detections you should be good. IE8’s UA hasn’t changed in forever, so you could just copy-paste one of the many regexes online and keep it unchanged.


A stupid dependency for user agent parsing... I'm just becoming more and more disillusioned with the JS ecosystem. And then this. And at work, the junior devs turn to random npm packages for every little thing.


Seems like a good time to plug my project https://gitlab.com/mikecardwell/safernode


I was actually about to use this package last week. Damn, what timing.


- steals cookies and browser data on Windows: this means your browser cookies and all logged-in accounts might be compromised, maybe even passwords saved in the browser. I'd even extend that to password manager extensions.

- crypto miner on both platforms

- maybe: changes registry for existing crypto miners



