That said, I wish more people would talk about both sides. Yes, every dependency has a cost. BUT the alternatives aren't cost-free either. For all the ranting against micropackages, I'm not seeing a good pro/con discussion.
I think there are several lessons to be learned here (nixing "unpublish" is a good one, and I've not been impressed with npm's reaction there). The most important is probably that we should change our build process: dev should pull in updates freely, to maintain the easy apply-fixes-often environment that has clearly been popular; versions should then be pinned when they go past dev (to ensure later stages are consistent); and we should have some means of locally saving the dependencies, to reduce our build-time dependency on package repos.
Sadly, though, I've not seen a lot of discussion on a reasonable way to apply those lessons. I've seen a lot of smugness ("Any engineer that accepts random dependencies should be fired on the spot", to paraphrase an HN comment), a lot of mockery ("haha, look at how terrible JS is!"), and a lot of rants against npm as a private entity that can clearly make mistakes, but not much in the way of constructive reflection.
Clearly JS and NPM have done a lot RIGHT, judging by success and programmer satisfaction. How do we keep that right and fix the wrong?
For the sake of discussion, here is my set of best practices.
I review libraries before adding them to my project. This involves skimming the code or reading it in its entirety if short, skimming the list of its dependencies, and making some quality judgements on liveliness, reliability, and maintainability in case I need to fix things myself. Note that length isn't a factor on its own, but may figure into some of these other estimates. I have on occasion pasted short modules directly into my code because I didn't think their recursive dependencies were justified.
I then pin the library version and all of its dependencies with npm-shrinkwrap.
Periodically, or when I need specific changes, I use npm-check to review updates. Here, I actually do look at all the changes since my pinned version, through a combination of change and commit logs. I make the call on whether the fixes and improvements outweigh the risk of updating; usually the changes are trivial and the answer is yes, so I update, shrinkwrap, skim the diff, done.
I prefer not to pull in dependencies at deploy time, since I don't need the headache of GitHub or npm being down when I need to deploy, and production machines may not have external internet access, let alone toolchains for compiling binary modules. npm pack followed by npm install of the resulting tarball is your friend here, and gets you pretty close to 100% reproducible deploys and rollbacks.
This list intentionally has lots of judgement calls and few absolute rules. I don't follow all of them for all of my projects, but it is what I would consider a reasonable process for things that matter.
[edit: I should add that this only applies to end products which are actually deployed. For my modules, I try to keep dependency version ranges at defaults, and recommend others do the same. All this pinning and packing is really the responsibility of the last user in the chain, and from experience, you will make their life significantly more difficult if you pin your own module dependencies.]
Originally we used to simply check in the node_modules folder.
Now I check in the npm-shrinkwrap.json (sanitised via https://www.npmjs.com/package/shonkwrap), and then use a caching proxy between the CI server and the real npm.
There are a bunch of choices available for this proxy; I've used one called nopar, but sinopia is also popular. Both Artifactory and Nexus can be configured to do this as well, and they act as caching proxies for a number of other package systems too.
One thing that would be useful to this debate is an analysis of a language ecosystem where there are only "macropackages", to see if the same function shows up over and over again across packages.
Look no further than C++, where nearly every major software suite has its own strings, vectors, etc. implemented, frequently duplicating functionality already implemented in (1) STL, and (2) Boost. I seem to recall that the original Android Browser, for example, had no fewer than 5 kinds of strings on the C++ side of the code base, because it interfaced with several different systems and each had its own notion of what a string should be.
The advantage (or disadvantage) of including common functionality in macro packages is that you can optimize for your particular use case. In theory, this may result in more well-tailored abstractions and performance speedups, but it also results in code duplication, bugs, and potentially even poor abstraction and missed optimization opportunities (just because you think you are generalizing/optimizing your code doesn't mean you actually are).
Clearly, we need some sort of balance, and having official or de facto standard libraries is probably a win. Half the reason we're even in this situation is because both JS and C++ lack certain features in their standard libraries which implicitly encourage people to roll their own.
In many ways this problem exists because there used to be different ideas about how strings should be set up. Today we have mostly settled on UTF-8 and only convert to legacy APIs where needed. It's a bad comparison because it's literally caused by legacy code; C++ projects cannot avoid having different string classes because of that.
When a company goes public, there are lots of regulations. A lot of people rely on you. You can't just close up shop tomorrow.
When your software package is released as open source, it becomes an issue of governance. If you can't stick around to maintain it, the community has a right to appoint some other people. There can be professional maintainers taking over basic responsibilities for various projects.
Please read this article I wrote a couple of years ago called The Politics of Groups.
This is a general problem. Here is an excerpt:
If the individual - the risk is that the individual may have too much power over others who come to rely on the stream. They may suddenly stop publishing it, or cut off access to everyone, which would hurt many people. (I define hurt in terms of needs or strong expectations of people that form over time.)
Note that this isn't a cost-benefit approach. It simply asks what needs to be the case for you to accept third-party code. Every project is going to answer this slightly differently. I would hope that running a private repo, pinning versions, learning a bit about the author and some sort of code review would be the case for at least most, but apparently many folks feel OK about the unrestricted right of net.authors, once accepted into your project, to publish code at seemingly random times which they happily ingest on updates.
A lot of coders seem to see only the "neat, I don't have to write a state machine to talk to [thing]," or "thank god someone learned how to [X] so I don't have to." But that, combined with folks who don't manage their own repos or even pin things, leads to people whose names the coders probably don't even know essentially having commit privileges on their code.
>Note that this isn't a cost-benefit approach.
Eh, it is a cost-benefit approach if the appropriate circumstances are "where the benefit outweighs the cost/risk." And I don't know any good way to answer it without doing that.
What are your own examples of specification of such circumstances that don't involve cost-benefit?
Unless you just decide, when asking the question like that, "Geez, there's no circumstances where it's appropriate" and go to total NIH, no external dependencies at all (no Node, no Rails, no nothing).
What I was getting at is that "under what conditions" is a baseline gating requirement that needs to reflect the nature of what you're building. If I'm building something that is intended to replace OpenSSL, my gating conditions for including third-party code are going to be a lot different than if I'm building what I hope is the next Flappy Bird's high-score implementation.
People have all sorts of gating functions. I don't write code on Windows, because I never have and see no reason to start. Baseline competency of developers is another one. You can view baseline requirements as part of the cost-benefit if you like, but generally, if the requirements need to change to make a proposed course of action "worth it", you probably need to redo the entire proposal because the first one just failed. (If your requirements are mutable enough that ditching some to make the numbers work seems acceptable, I have grave doubts about the process and/or the decision making.)
I basically included my own examples already: a private repo to freeze versions in use and host our own forks when appropriate, code review by at least one experienced developer of any included modules, and a look at the developer.
This is pretty fuzzy (the process, not necessarily the developer). I generally google-stalk them to see what else they've worked on, whether there have been public spats that lead me to believe the future upgrade path might be trouble for us, whether they have publicly visible expertise in what they're coding, etc. Basic reputation stuff, plus evidence of domain expertise, if appropriate.
Without being clear about that, it can sound like your conditions are just about having a certain internal _process_ in place, and not about any evaluation at all -- if the process is in place, the condition is met. Rather you're saying you won't use any dependencies without reviewing them more than people typically do and deciding they are okay.
The problem of HOW you decide if something is okay still remains; although the time it takes to review already means you are definitely going to be using fewer dependencies than many people do, even before you get to the evaluation, just by having only finite time to evaluate. And you'll be especially unlikely to use large dependencies, since they will take so much time to review. (Do you review on every new version too?)
And you still need to decide if something is even worth spending the time to code review. I'd be surprised if expected 'benefit' doesn't play a role in that decision. And, really, I suspect expected benefit plays a role in your code review standards too -- you probably demand a higher level of quality for something you could write yourself in 8 hours than for something that might take you months to write yourself.
What you do is different from what most people do, in that you code review all dependencies and make sure you have a private repo mirror. I'm not sure it's really an issue of "conditions instead of cost-benefit analysis", though. You just do more analysis than most people, mainly, and perhaps have higher standards for code quality than most people. (Anyone doing "cost-benefit analysis" probably pays _some_ attention to code quality via some kind of code review, just not as extensive as yours and without requirements as high. If you're paying no attention at all to the code quality of your dependencies, you probably aren't doing any kind of cost-benefit analysis either; you're just adding dependencies with no consideration at all, which is another issue entirely.)
Try this. Say you want to build a house. You plan it out, and while doing so write requirements into your plan. One is that you want zombie-green enameled floors, and a second is that it needs to comply with local construction ordinances.
Neither of those are costless. It is possible that your green floor related urges are something you'll compromise on - if it is too expensive, you'll suffer through with normal hardwood. This, to me, looks like a classic cost-benefit tradeoff - you want to walk on green enamel, but will take second-choice if it means you can also have a vacation instead of a bigger mortgage.
The building code requirement looks different. While you can certainly build your home while ignoring codes, that is usually not a very good strategy for someone who has the usual goals of home ownership in mind. Put a different way, unless you have rather unusual reasons for building a house, complying with building codes is only subject to cost-benefit insofar as it controls whether the home is built at all.
Does this better illustrate where I'm coming from?
When one begins a coding project, hopefully one has a handle on the context in which it is to be used. That knowledge should inform, among many other things, expectations of code quality. Code quality in a given project is not just the code you write, but also the code you import from third parties. If you don't subject imported code to the same standards you have for internally produced code, you have a problem, in that at the very least you don't know whether it meets the standards you set.
So… I agree that, from one perspective, every decision is cost-benefit, including eating and putting on clothes when you leave the house. I think it is useful, however, to distinguish factors for which tradeoffs exist within the scope of succeeding in your goals from factors that amount to baseline conditions for success.
As an aside, if you wanted to jump up and down on this, I'm surprised you didn't take the route of asking how far down the stack I validate. What's _really_ running on that disk controller?
 I do hope we can avoid digressing into building code discussions.
 "I don't care" is a perfectly fine standard, too, depending. Not everything is written to manage large sums of money, or runs in a life-or-death context, or handles environment control for expensive delicate things, or... I absolutely write code that I'm not careful about.
Pretty much the only thing needed to accomplish this is to discourage the use of the global flag, plus a few strong notes in the documentation and tutorials reminding people to check in their node_modules folder - and that if they don't check it in, they have established a build-time dependency on npm being up and correct.
edit: corrected spelling
Indeed. Unfortunately, they've mostly been learned by others before this, and ignored or forgotten. If you want to see how these issues have been dealt with in the past, look to a project that's been doing the same thing for much longer, such as CPAN and the Perl module ecosystem. It's been around for quite a while, and is very mature at this point. Here's a few features that I would hope other module ecosystems would strive for:
- Mirrored heavily, easy to become a mirror. 
- Immutable by default; older packages are migrated to a special archive version of the system, which retains them rather than deleting them.
- Indexed in multiple ways (author, name, category) and searchable. 
- All modules tested regularly on a matrix of multiple operating systems, language versions, and module releases. A few different architectures are often tested to boot (IBM/z, SPARC), though that may be quite a bit less regular.
- Static analysis of modules offered as a service to authors, to help them identify and fix potential problems, with the results reported publicly.
- Module installation defaults to running all tests and failing to install if tests fail.
- The ability to set up a local private CPAN mirror for yourself or your company (you can peg your cpan client to a specific mirror). 
Undoubtedly there are features of other module ecosystems that CPAN doesn't have, or doesn't do as well, and that it could look to for a better way to accomplish some things, but that's the whole point. None of this exists in a vacuum, and there are plenty of examples to look to. Nothing makes any one system inherently all that different from another, so best practices are easy to find if you look. You just need to ask.
http://cpants.cpanauthors.org/ (CPANTS is distinct from CPAN Testers)
This is what I was mentioning in the other thread (and being called a troll for... sigh). I appreciate the idealism of "if we have micromodules, we don't have to reimplement common helper functions, which scales to thousands of bytes saved!". But in practice, there's craptons of duplicate dependencies with different versions. Which negatively scales to hundreds of kilobytes wasted. In code, in downloads, in install time, in developer time (because devs install things too. A lot more than end users in fact...), etc.
One of the many problems which means what's on paper doesn't at all correspond to what we actually get.
If you have 5 similar string formatting libraries that depend on leftpad, you could collapse them into one and have a single instance of leftpad inside the combined library. Fewer licenses, fewer READMEs, less time downloading, and with tree shaking you still get a similar end result.
In practice, you need to balance the overhead of an extra dependency with the benefits from sharing versions with others. When you add a dependency, you also now have a larger attack surface. Any extra dependency adds some amount of friction. Sometimes it is negligible. But, if you look at the overhead across all of your modules, it can add up quickly.
In some cases, the benefits from a dependency outweigh the overhead costs.
In other cases, you just write your own leftpad function and move on.
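For reference, the function at the center of all this is small enough that rolling your own is a few minutes' work; a minimal sketch, roughly equivalent to (but not the same code as) the npm package:

    // Minimal hand-rolled left-pad.
    function leftPad(str, len, ch) {
      str = String(str);
      ch = ch === undefined ? ' ' : String(ch);
      while (str.length < len) {
        str = ch + str;
      }
      return str;
    }

    leftPad('5', 3, '0'); // => "005"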
You could call it a "standard library."
Previously if you wrote a frontend library and wanted to depend on another library you had to either tell developers to install it too, or bundle it in your library (not a great idea), but either way the file size it added was obvious.
Now you can just drop one line in your package.json, which is super convenient, but it obscures the cost of that dependency and of all of its dependencies.
The main question is about caching. Why not just have signed versions of everything floating around?
There's plenty of good libraries like lodash and math.js that are pretty much the next best thing to a standard library.
The problem was fixed 10 minutes later anyway. This whole discussion surrounding this is a combination of knee-jerk reaction, "waah", and realization of "Oh shit, depending on external code means we are dependent on external code!"
If you want to code without dependencies, go write JavaEE. Everything is included, you don't need any third-party dependencies, and you can use cutting-edge tech like JSP, app servers, and JavaServer Faces.
What about the fact that React and other popular projects have apparently bought into this ecosystem? You can't use them without also using their dependency tree.
Dependencies aren't bad in and of themselves, but when you have lots of tiny dependencies, the costs mount rapidly.
It's a matter of taste, and Node accommodates various tastes.
Sure, npm gives you plenty of rope to hang yourself with, but you can't really blame people for pointing out that what you were doing was a bad idea when you end up dangling from the noose you just created.
Oh I agree. But maybe the answer is neither extreme? A healthy ecosystem with both established "standard" libraries and packages larger than a single line.
How is it even remotely similar? In one case it's all your code and in the other it's none of your code.
This has never yet quite worked out in software. Object-orientation was part of the resulting research effort, as are UNIX pipelines, COM components, microkernels and microservices. When it goes wrong you get "DLL Hell" or the "FactoryFactoryFactory" pattern.
But really this is the fault of the closed source browser manufacturers, who prefer to attempt lockin over and over again through incompatible features rather than converge on common improvements.
But remember that things like computing a KWIC index used to be real problems back in the day that required serious programmer work. They have become trivial thanks to better libraries and better computers.
There's currently only a single major browser engine that is still closed source, EdgeHTML (and with the current trend of open-sourcing things at Microsoft, this might change very soon).
Plus, the standards bodies were created to prevent that. After a significant stagnation in the mid-2000s, which was ended by the WHATWG, we're getting amazing progress.
An unrelated Cisco study of code reviews found that 200–400 LOC is the limit of how many LOC can be effectively reviewed per hour. Applying the findings of these studies suggests that neither functions nor patches/pull-request diffs should exceed 200 LOC. FWIW, I have worked on commercial software that had functions many thousands of lines long! :)
This was my response as well:
> The combination of a micro-library culture, “semver” auto-updates, and a mutable package manager repository is pretty terrifying.
Either of the last two properties is dangerous on its own, but a culture of micro-libraries compounds the problem.
The actual issue has to do with trusting a package of any size over time. This is true regardless of whether the package implements 1 line of code or 1000.
The trustworthiness of a package is a function of several factors. Code that is not actively maintained can often become less trustworthy over time.
What we need is one or more 3rd party trust metrics, and our bundling/packaging utilities should allow us to use that third party data to determine what is right for our build.
Maybe some of us want strong crypto, maybe others of us want adherence to semver, maybe others want to upgrade only after a new version has had 10K downloads, maybe others only want to use packages with a composite "score" over 80.
On the continuum of code quality from late night hack to NASA, we all must draw a line in the sand that is right for a particular project. One size does not fit all.
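To make that concrete, the kind of policy a bundling or upgrade tool could consult might look something like this; every option here is hypothetical, since no such standard exists today:

    // Hypothetical per-project trust policy for an upgrade/bundling tool.
    // None of these options are a real npm feature; this is just a sketch.
    module.exports = {
      requireSignedReleases: true,       // "strong crypto"
      requireSemver: true,               // reject releases that break semver
      minDownloadsBeforeUpgrade: 10000,  // wait until a release has 10K downloads
      minCompositeScore: 80,             // third-party trust metric threshold
    };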
It's a big mistake (as well as a profound example of bad reasoning) to blame micropackages. The size of the package has nothing to do with it. Any codebase with any number of dependencies faces some risk by trusting the maintainers or hosting of those dependencies to third parties, which is the problem we need to do a better job of solving.
The issue is measuring the trustworthiness of a dependency, and recursively doing that operation throughout the dependency graph.
Simply focusing on the number of dependencies or the size of a dependency is silly.
How many developers here would gladly add a rogue "dependency" - a developer they had never spoken to before - into their project without some care? And yet the willingness to open the front and literal back doors of the project to so many dependencies, like low-quality functions-as-a-module, is astounding.
I guarantee that the weaknesses of the NPM ecosystem are already known and exploited by bad actors. There are people who earn large 6 figure salaries / consulting fees for finding and exploiting these issues. This is a wakeup call that we need to do something about it.
But it does not spiral out of control nearly as badly as any attempt at frameworks on npm, because unlike Node, almost every Flask extension depends on three things: Flask, the external package the extension attaches to (e.g. WTForms), and Python's standard library.
A similar Node package would depend on possibly hundreds of tiny one liners to replace the absence of standard library.
For example: "I, mbrock, think that pad-left v1.0.3 with hash XYZ seems like an uncompromised release."
Then the tool that upgrades packages could warn when a release isn't trusted by someone you trust (or transitively via some trust web scheme).
The approval system becomes like a "release review" process.
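Such an attestation could be as simple as a signed record that the upgrade tool checks against the reviewers you trust; a hypothetical sketch, since nothing like this exists in npm today:

    // Hypothetical release attestation and check - not an existing npm feature.
    const attestation = {
      reviewer: 'mbrock',
      package: 'pad-left',
      version: '1.0.3',
      tarballHash: 'XYZ',        // hash of the exact tarball being vouched for
      verdict: 'uncompromised',
    };

    // Warn on upgrade unless someone you trust (directly or via a trust web)
    // has vouched for this exact release.
    function isTrusted(release, attestations, trustedReviewers) {
      return attestations.some(a =>
        trustedReviewers.has(a.reviewer) &&
        a.package === release.package &&
        a.version === release.version &&
        a.tarballHash === release.tarballHash &&
        a.verdict === 'uncompromised');
    }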
Trustworthy people wouldn't approve releases automatically... but then, who's trustworthy?
Like Pynchon wrote, paranoia is the garlic of life...
But there's a philosophy and mindset in JS, relatively unique among languages, of optimizing for download size to the browser (noting, per another comment, that tree shaking is not entirely "a thing" yet). Users don't want to include all of Lodash (or a stdlib) in their download just to use _.isArray. Lodash does mitigate that download hit by also publishing function-specific subpackages (lodash.isarray can be loaded as an independent package), but it's relatively unique in that sense.
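Concretely (lodash.isarray really was published as a standalone package; treat the size difference as illustrative):

    // Pulls in all of lodash just for one check:
    const _ = require('lodash');
    _.isArray([1, 2, 3]);        // => true

    // Pulls in only the single function as its own package:
    const isArray = require('lodash.isarray');
    isArray([1, 2, 3]);          // => true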
I find a good way to think about things is that every single dependency you have adds another set of people you have to trust.
You're trusting the competence of the developer (i.e. that the library has no security flaws), you're trusting their intent (i.e. that they don't deliberately put malicious code into the library) and you're trusting their Operational Security practices (i.e. that their systems don't get compromised, leading to loss of control of their libraries).
Now when you think about how little you know about most of the owners of libraries you use, you can see possibility for concern.
The bit I disagree with the article about is signing. I personally think that developer signing is a useful part of this, as it takes the repository owner out of the trust picture (if done correctly). Without it you're also trusting the three items above for the repository provider, and it's worth noting that a large software repo is a very tempting target for quite a few well-funded attackers.
Docker at least has provided some of the technical pieces to address this in their repositories with content trust...
One-line function, but 20 lines of tests and another 20 test environments.
The solution should not come from people that write only front-end JS code. I'm waiting for a response by the libraries that were broken by left-pad.
Your stdlib is bundled with the language runtime and contains binaries, OS interactions and the necessary fundamental stuff. Updating it is hard, because it depends on hard things like the OS, and because it must maintain compatibility with everything.
However, the commons package can add all kinds of pure (as in, just the language in question) stuff to the language, such as several dozen collections, all kinds of utility functions, and so on. Updating it would be a lot less risky than updating the entire language runtime, especially if it uses proper semantic versioning. This would also allow the lib to iterate faster and take in these utility functions more easily.
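Using such a commons package might look something like this; the package name and exports are hypothetical, just to illustrate the idea:

    // Hypothetical pure-JS "commons" package layered on top of the runtime
    // stdlib, versioned and updated independently of the language itself.
    const { leftPad, isArray, flatten } = require('js-commons');

    leftPad('7', 3, '0');   // => "007"
    isArray([]);            // => true
    flatten([[1], [2, 3]]); // => [1, 2, 3]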
This is one approach we used to deal with this last year, for example, on build/devops side: https://medium.com/@ojoshe/fast-reproducible-node-builds-c02...
1) Bundle everything in your distribution. Not unreasonable, but would be nice to have a hybrid protocol that lets the publisher store a signed copy of all the dependencies but only send them on request (so less duplication is sent over the wire).
2) Have the same as 1 but in dedicated "mirrors" and persistent distributed storage a la freenet. Files are only deleted if there isn't enough space on the network and they are the least--
Personally, I much prefer a single nice, complete, well-polished module that works perfectly over lots of tiny modules which are awesome individually but which suck when used together.
I don't really understand why there isn't a stdlib of these "micropackages" that can be downloaded to save a lot of effort.
This is not true. NPM locally caches every module the first time it is downloaded.
Therefore with widely downloaded modules such as isarray, it is very likely it has already been downloaded on the local system and so is pulled from the cache.
The actual percentage of fresh downloads from NPM in a real-world deployment is overwhelmingly small.
This is literally coming from the npm website. Presumably that number is what it says: downloads.
People should be upset that NPM will just take your module away if they feel it's appropriate. The whole reason left-pad was unpublished has been completely ignored.
There are so many one line packages:
https://github.com/chalk/ansi-regex/blob/master/index.js (my favourite)
And I ran out of willpower, at only L. Seems to me the complete lack of any decent standard library has caused this Cambrian explosion of packages, and the overhead is astounding. Sure, it's appealing to google "nodejs validate ip", then run "npm install is-ip" and use it with "require('is-ip')", but fuck me, how wasteful do you want to be? My frontend Ember app ends up installing over 500MB of dependencies (most of which is useless test files and other redundant fluff files). How has this happened?
What's to stop one of those one-line packages adding a malicious one-liner that concatenates and uploads your super-secret private source code to somebody's server? You're really trusting the complete integrity of your codebase because you depend on "is-array", because you can't be bothered to write "x.toString() === '[object Array]'", and JS introspection (which seems so (ab)used) is so broken that this is needed? Woah.
([1,2,3]).toString() => "1,2,3"
Because type introspection is so ugly in JS, but so needed, that it's hidden behind 'require('is-array')' to make it palatable. Ridiculous.
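For what it's worth, the working check is still short; it just isn't the naive toString call:

    [1, 2, 3].toString();                       // => "1,2,3" (joins the elements)

    // Borrowing Object.prototype's toString is the classic introspection trick:
    Object.prototype.toString.call([1, 2, 3]);  // => "[object Array]"

    // Or, on any remotely modern engine:
    Array.isArray([1, 2, 3]);                   // => true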
40 years of computer experience as an EE, coder, IT security person, and DBA tells me: this happened when IT moved from being a way to do work (science) to a thing of its own (marketing), during the dot-com bubble; time to market became the goal and security was tossed. You hear this in mantras like:
Push NOW, fix later.
Fail fast and often.
I say: Secure, Complete, Fast - Pick two.