Small world with high risks: a study of security threats in the NPM ecosystem (acolyer.org)
74 points by godelmachine on Sept 30, 2019 | hide | past | favorite | 48 comments

What the npm ecosystem really needs is something like the distinction between Ubuntu's "main" and "universe" repositories, so that you have a smaller subset of known-good packages with harmonized transitive dependencies, stronger centralized curation, a lot more direct scrutiny, tighter control over versioning, and some party that is responsible for addressing vulnerabilities and ensuring appropriate maintainership. If you could rely on that for core functionality and only needed to go outside of it for the long tail of more specialized things, it would be a lot cleaner and safer than what we do today.

Linux distros have many, many years of experience curating packages at scale. It is crazy to me that the node community just totally disregards all of the best practices learned by the Linux distros in favor of practices that are lazy and unambiguously dangerous.

Intel and Nokia were trying to add repository trust levels to rpm back in 2010/2011.

To this day, if you can add a new repository, it is fully trusted and can provide trojaned updates to any core package. Bump the version number to something higher than what the core distro would use and you can persist for quite a long time. I remember pointing this out at a Nokia project (would have been no later than early 2008) as an attack vector via apt repositories.

We wanted to prevent a rogue or compromised repository from being able to provide an update to, say, glibc or libstdc++. Never got to work on that - Elopcalypse took it all down.

> To this day, if you can add a new repository, it is fully trusted and can provide trojaned updates to any core package.

Invent idiot-proof security and someone will invent a better idiot. Even if you have a "trusted=1" flag for repos, you can be sure people will set external repos to be just as trusted as the core ones without a second thought, as soon as those restrictions stop them from doing what they want to do (whatever that is, or however in/sane it is).

It's maybe not the most elegant option, but security-conscious sysadmins could already today disable all repos and run the system-wide update only from certain whitelisted repos, while still allowing specific one-off package installs from third-party repos.

By adding a third-party repo you have already said you trust that repo to install software, probably as root, on your machine. It doesn't get much worse than that from a security standpoint anyway, right? Will "trust flags" really make much of a difference to the wider security issue?

Sorry if I'm making too many assumptions here or thinking out loud.

> Even if you have a "trusted=1" flag for repos, you can be sure people will set external repos to be just as trusted as the core ones

That was the thing - you couldn't.

I don't remember which way the level hierarchy went, but the idea was that rpm could not be executed directly even by root (an LSM prevented that), and all package installation logic was confined to Zypper/libzypp. Packages and repos could declare their security level. Trying to pull in an upgrade to a LEVEL(core) package from a LEVEL(media) repository would fail. Zypper simply would not allow installing a package from a lower-security repository over an installed package that had come from a higher-security origin. Manually added LEVEL(core) repos would have been ignored.
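The core of that policy check is simple to sketch. The following is a hypothetical illustration, assuming a total ordering of levels; the names and their ordering are made up here, not the real libzypp code:

```python
# Hypothetical sketch of a repo trust-level check like the one described above.
# Level names and their ordering are assumptions for illustration only.
TRUST_LEVELS = {"media": 0, "vendor": 1, "core": 2}

def upgrade_allowed(installed_origin: str, candidate_repo: str) -> bool:
    """Allow an upgrade only when the candidate repo is at least as
    trusted as the origin of the already-installed package."""
    return TRUST_LEVELS[candidate_repo] >= TRUST_LEVELS[installed_origin]

# A LEVEL(media) repo may not replace a package that came from LEVEL(core):
print(upgrade_allowed("core", "media"))   # False
print(upgrade_allowed("media", "core"))   # True
```

The point is that the decision depends on the provenance of the installed package, not just on whether the new repo's signature checks out.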

There were a few layers of applied crypto, package signatures and immutable keyrings involved too. I had a prototype Zypper/libzypp with package-level signature verification almost ready just before my vacation and planned to polish it into an RFC after coming back. Best laid plans... During those two weeks the entirety of Nokia's Linux development had been axed, Maemo killed, and thousands of good embedded systems engineers were suddenly looking for new jobs.

Funny thing, though. Zypper's source code was almost pleasant to work with after having seen the Lovecraftian horrors that made up libapt.

Take a look at Debian's package pinning.

It's not a security feature but a convenience one: it forbids repositories from updating packages that came from the repositories you trust more (with regard to correctness). Yes, a binary flag would be useless. What the GP wants also does not add any security, but it's an important part of a stable system that language-specific repositories have been overlooking for a while.

Debian-based distros can use package pinning to this effect, but since it is an opt-in feature for advanced users, it is not a full solution. I'm mentioning it here just for the sake of completeness.
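For illustration, a minimal pin in /etc/apt/preferences.d might look like this (the origin is a made-up placeholder; a priority below 500 lets you install packages from that repo explicitly while preventing it from upgrading over the distro's own versions):

```
# /etc/apt/preferences.d/third-party  (illustrative; the origin is made up)
Package: *
Pin: origin "thirdparty.example.com"
Pin-Priority: 100
```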

Totally second this. I see novice web developers writing in TypeScript in an effort to be more type-safe while at the same time using hundreds of npm packages, often written by amateurs who make basic mistakes.

Just take an average startup's web project's node_modules directory: what's inside? Hundreds and hundreds of packages, most of which are dependencies of other packages. Anyone could have written them! Novice devs swear by using TypeScript, but at the same time use hundreds of black boxes that can easily contain stuff way more damaging than a string assigned to a number.

Remember left-pad? That was an easy one to fix, but still caused damage at large scale. What about a vulnerability in a larger and more complex package, owned by some bad party?

For me the biggest problem is webpack. Do you know an alternative bundler with fewer dependencies and TypeScript support? I already use webpack without the webpack-cli package in most of my projects in order to avoid the extra dependencies.

I'm currently investigating rollup because it only has 3 dependencies (2 of which are @types)

I'm trying to use Vue in a secure web app, and Webpack is a nightmare. I have to trust hundreds of dependencies to use it at all. Trying to avoid this and use some other method of compiling multiple .vue single-file components into a larger single js file is proving tricky (to say the least). How did the JS ecosystem end up in this situation?

I recommend rollup for libraries and parcel for web apps.

I'll 2nd the rollup recommendation for libraries, but unfortunately parcel has the same (or worse) problems as Webpack with its proliferation of dependencies.

Rollup does have some issues with webapps since it's so heavily oriented toward JS-as-entrypoint, but tbh Webpack has the same issues; it just works around them by layering some extra complexity on top in the form of plugins (which you can actually do relatively easily in rollup too, though it's still not ideal in either situation).

Edit: Just measured those dependencies for comparison.

- rollup: 3 dependencies (all top-level, none have subdependencies. 2 are just typings which contain no executable code)

- webpack: 425 dependencies (23 top-level)

- parcel: 837 dependencies (57 top-level)

I haven't used parcel, so can't comment on its function. By the sounds of their website they're focused on speed, so if performance is your only concern they might be a good shout. If you're looking to reduce your security surface area, their approach to dependency management seems pretty irresponsible.

Yeah I actually didn't realize parcel had so many dependencies. I chose it over webpack not for speed but because it's highly opinionated and requires essentially zero config.

So I guess I’ll change my suggestion and say just use rollup

Parcel has 57 dependencies... (not even counting transitive dependencies)

That sort of defeats the purpose of switching from webpack.

And for libraries: Why do you even need a bundler? Can you not distribute the library as multiple files?

Yeah I made some incorrect assumptions about the nature of parcel, foolish mistake on my part.

For your other question, it depends on your library. If you're only targeting ESM or CJS, sure; if you need a UMD or want separate outputs for separate targets, a bundler like rollup can help prevent a lot of pointless boilerplate.
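As a sketch of that boilerplate-saving, a minimal rollup.config.js producing CJS, ESM and UMD outputs from one entry point might look like this (the file names and the UMD global name are made-up placeholders):

```javascript
// rollup.config.js - one input, three output formats (illustrative only)
export default {
  input: "src/index.js",
  output: [
    { file: "dist/lib.cjs.js", format: "cjs" },              // CommonJS for Node
    { file: "dist/lib.esm.js", format: "esm" },              // ES modules for bundlers
    { file: "dist/lib.umd.js", format: "umd", name: "Lib" }, // browser global
  ],
};
```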

Isn't parcel a front-end for webpack?

> It is crazy to me that the node community just totally disregards all of the best practices learned by the Linux distros in favor of practices that are lazy and unambiguously dangerous.

"Move fast and break things."

The whole Javascript ecosystem depends on high churn rapid adoption of "new" technologies. Stopping to check things would slow this down. Higher-QA technologies get outcompeted by lower-QA technologies.

> Higher-QA technologies get outcompeted by lower-QA technologies.

While both are still relatively strong, it seems that JS's recent (~8y) surge in popularity over established mainstays like Java may stand counter to that point.

JS has a unique position in being the only supported language in browsers after Java Applets, Flash, and ActiveX were all killed off (mostly for valid reasons).

Given that the frontend has to be written in Javascript, it's tempting to use it for the backend as well.

The software industry is where quality goes to die; if anything, the surge in popularity proves the point.

>Linux distros have many, many years of experience curating packages at scale. It is crazy to me that the node community just totally disregards all of the best practices learned by the Linux distros in favor of practices that are lazy and unambiguously dangerous.

"Screw your wheel, we can invent our own" is not exactly a surprising strategy when it comes from a language/development community that is mostly building framework after framework, each abandoned in a year or two.

In the React Native ecosystem, the Expo project https://expo.io is doing a great job at providing this kind of centralized curation, with monthly release cycles, consistent APIs and dealing with the version hell that makes "raw" React Native such a pain to work with.

Security has always been one of my biggest worries when using node. The mentality in the javascript world seems to be to simply not care.

To just use whatever libraries. On all the node.js projects I have worked on, none has had any security review when importing a new library. That is a bit scary, since many libraries have a ton of dependencies themselves.

Not once have I even heard a discussion about a security topic, or about vetting dependencies and similar things.

or maybe just make a standard library (with a tool to pick only what's necessary for the web). I think the worst thing is the incredible number of packages just replacing the node-native modules. One would suppose that in an "always-run-on-the-latest-version" ecosystem people would try to integrate those into the mainline, but apparently some people really like a filled package.json...

Also, if you look closely, a lot of commonly required modules in the big projects are just used once in an act of "code"-spam by interested parties...

EDIT: just have a look at this: https://github.com/eslint/eslint/commit/55bc35dcd2dc3987cc77... - let's use cross-spawn, when you could use this node-native function (available for about 100 years...): https://nodejs.org/api/child_process.html#child_process_chil...

For basic JS functions, I much prefer to write my own than rely on someone else's. A low number of dependencies is a factor for me when choosing a package (if there are options).

Quite a few packages cover basic functions, such as type identification, array handling, etc... which have only a few lines of relevant JS code but are used by many other packages. That seems to unnecessarily increase the risk to my project of the module disappearing or becoming a security threat. There's something to be said for rolling your own helper functions.
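For example, the functionality of a typical micro-package in this space boils down to a couple of lines you could just own yourself. These are hand-rolled sketches, not any package's actual source:

```javascript
// Hand-rolled replacements for common micro-package functionality.
function isNumber(value) {
  return typeof value === "number" && Number.isFinite(value);
}

function isPlainObject(value) {
  return Object.prototype.toString.call(value) === "[object Object]";
}

console.log(isNumber(42));        // true
console.log(isNumber(NaN));       // false
console.log(isPlainObject({}));   // true
console.log(isPlainObject([]));   // false
```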

There is also ongoing work to isolate libraries at run time so they can't do anything; see https://www.infoq.com/presentations/npm-security-realms-ses/

Unix-based operating systems have pretty good file system security, and almost everything is based on the file system; you can use unix sockets for networking. So you can, for example, create a new user, give that user access to only the folders it needs, and run the program as that user. You can also, from within the app itself, read configuration files etc., then chroot and setuid to continue as a non-privileged user. Then you can use AppArmor or SELinux to fine-tune access.

But you can't apply any of that to one particular library that a program uses, while not restricting the rest of the program.

Yeah, you could spawn new processes with different access levels, but it might add too much complexity to the app. It would be cool if we could require modules like this:

    foo = require("excitingModule", {fs: true, net: true, os: true});
It wouldn't be that hard to implement either.

The creator of Node is currently working on a runtime to enable exactly that: https://deno.land

    deno --allow-net https://deno.land/std/examples/echo_server.ts
This would be the same as running something with an unprivileged user:

    sudo -u otheruser node echo_server.js
Deno however takes it to a whole new level by running server code directly from the web =)

You can scope it further:

    deno --allow-net= https://deno.land/std/examples/echo_server.ts
Or even provide a list of addresses:

    deno --allow-net=,localhost https://deno.land/std/examples/echo_server.ts

On Linux you can use namespaces

    ip netns exec networkname node script.js
Still only on the entire app, though; the idea was how to restrict single modules and their dependencies, not your whole app! Something like:

    const foo = require("bar", {fs: true, net: "", os: true});

Do you know the basic working principle of Linux namespaces? And IIRC the JS interpreter is not spawning a subprocess with a communication pipe for each require. So good luck with your "easy implementation"; you could also just create a standard library or make PRs removing unnecessary cruft from existing high-level packages.

And if you think creating an interpreter in the interpreter is the solution, I'm quite sure the thing will not be secure if it is a natural child of the JS ecosystem.

I wonder if a system similar to the way Kosher certification works might be applicable here. A company could continuously scan packages for threats and host the scanned packages, which could be accessed via subscription.

Why should we trust a company over the community? And how accurate could the scanning be? What if something is missed, or if there are a lot of false positives?

I do wonder how frequently and thoroughly the community is looking at the packages, though. Perhaps I'm in the minority, but I really only look when I'm debugging or trying to write a similar function myself to remove a dependency. But as a fairly novice and self-taught developer, I'm not sure I'd recognize a security threat even if I was staring right at it.

That is exactly my point. Studying code for security exploits is hard and tedious work. If I were developing a Node.js app, I would be glad to pay someone to do that for me.

You can have a contract with a company and take them to court for breach of service. If you don't like their work, you can switch to a different vendor. This isn't ideal but I prefer it to relying on the kindness of strangers.

"Altruism is a fine motive, but if you want results, greed works much better." - Henry Spencer

Why don't you use podman, bubblewrap, or just straight namespaces?

Because I'm familiar with Docker

I wonder if it would be worth it for a set of large companies who depend on the npm ecosystem to get together and jointly fund security auditing for the top packages.

"vetting the most dependent upon 1,500 packages would reduce the ITP ten fold, and vetting 4,000 packages would reduce it by a factor of 25"... I wonder how accurate this automated package vetting could be?

NPM should use The Update Framework (TUF) [1] and in-toto [2] in order to protect its users from attacks against the registry itself. Notable adoptions include:

* Datadog is using both TUF and in-toto to defend against attacks between developers and end-users of its Agent integrations [3].

* Docker Content Trust is protecting users from a compromise of Docker Hub itself [4] using a version of TUF [5].

* Uptane is a version of TUF that is being standardized to protect software updates for automobiles [6].

* PyPI is considering adopting TUF to protect users of Python packages [7].

* Cloud Native Application Bundles is standardizing the use of TUF and in-toto to protect users of cloud-native applications [8].

Disclosure: I am a security engineer at Datadog, and am involved with both TUF and in-toto.

[1] https://theupdateframework.com

[2] https://in-toto.io

[3] https://www.datadoghq.com/blog/engineering/secure-publicatio...

[4] https://success.docker.com/article/docker-hub-user-notificat...

[5] https://blog.docker.com/2015/08/content-trust-docker-1-8/

[6] https://uptane.github.io/

[7] https://github.com/pypa/warehouse/issues/5247

[8] https://github.com/deislabs/cnab-spec/blob/master/300-CNAB-s...

Who is ehsalazar? Is that actually a person publishing micro-packages, or what?

How does this compare to package managers such as Maven (Java) or CPAN (Perl)?

What is the vetting process for other package managers?

I thought this was written by a certain company that is trying very hard to profit from creating FUD in the npm ecosystem. To my pleasant surprise it isn't; it's written by a new/different author. Scroll almost to the end, though, and there it is: the author is a board observer at said company.

It's not FUD if the danger is real. In that case it's just everyone's responsibility to stay informed.
