Micropackages and Open Source Trust Scaling (pocoo.org)
291 points by s4chin on Mar 24, 2016 | 87 comments



I think this is a serious set of reasonable thoughts about the incident, and don't want to demean the article.

That said, I wish more people would talk about both sides. Yes, every dependency has a cost. BUT the alternatives aren't cost-free either. For all the ranting against micropackages, I'm not seeing a good pro/con discussion.

I think there are several lessons to be learned here (nixing "unpublish" is a good one, and I've not been impressed with the reaction from npm there), the most important of which is probably that we should change our build process. Dev should pull in updates freely, to maintain the easy apply-fixes-often environment that has clearly been popular; those versions should then be pinned when they go past dev (to ensure later stages are consistent); and we should have some means of locally saving the dependencies to reduce our build-time dependency on package repos.

Sadly, though, I've not seen a lot of discussion on a reasonable way to apply those lessons. I've seen a lot of smugness ("Any engineer that accepts random dependencies should be fired on the spot", to paraphrase an HN comment), a lot of mockery ("haha, look at how terrible JS is!"), and a lot of rants against npm as a private entity that can clearly make mistakes, but not much in the way of constructive reflection.

Clearly JS and NPM have done a lot RIGHT, judging by success and programmer satisfaction. How do we keep that right and fix the wrong?


I suspect you aren't seeing much discussion because those who have a reasonable process in place, and do not consider this situation to be as bad as everyone would have you believe, tend not to comment on it as much.

For the sake of discussion, here is my set of best practices.

I review libraries before adding them to my project. This involves skimming the code or reading it in its entirety if short, skimming the list of its dependencies, and making some quality judgements on liveliness, reliability, and maintainability in case I need to fix things myself. Note that length isn't a factor on its own, but may figure into some of these other estimates. I have on occasion pasted short modules directly into my code because I didn't think their recursive dependencies were justified.

I then pin the library version and all of its dependencies with npm-shrinkwrap.

Periodically, or when I need specific changes, I use npm-check to review updates. Here, I actually do look at all the changes since my pinned version, through a combination of change and commit logs. I make the call on whether the fixes and improvements outweigh the risk of updating; usually the changes are trivial and the answer is yes, so I update, shrinkwrap, skim the diff, done.

I prefer not to pull in dependencies at deploy time, since I don't need the headache of github or npm being down when I need to deploy, and production machines may not have external internet access, let alone toolchains for compiling binary modules. Npm-pack followed by npm-install of the tarball is your friend here, and gets you pretty close to 100% reproducible deploys and rollbacks.

This list intentionally has lots of judgement calls and few absolute rules. I don't follow all of them for all of my projects, but it is what I would consider a reasonable process for things that matter.

[edit: I should add that this only applies to end products which are actually deployed. For my modules, I try to keep dependency version ranges at defaults, and recommend others do the same. All this pinning and packing is really the responsibility of the last user in the chain, and from experience, you will make their life significantly more difficult if you pin your own module dependencies.]
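As a rough illustration of the pinning step described above, here is a minimal Python sketch that flags dependencies in a package.json-style manifest whose version specifier is a range rather than an exact version. The manifest contents and the helper name `unpinned_dependencies` are made up for the example, and the regex only recognizes plain `MAJOR.MINOR.PATCH` strings. It is not a semver parser and not a replacement for shrinkwrapping.

```python
import json
import re

# Exact versions look like "1.2.3". Anything else (^, ~, *, >=, tags, URLs)
# is treated as unpinned. Simplifying assumption, not real semver parsing.
EXACT_VERSION = re.compile(r"\d+\.\d+\.\d+")


def unpinned_dependencies(manifest_text):
    """Return {name: specifier} for dependencies not pinned to an exact version."""
    manifest = json.loads(manifest_text)
    deps = manifest.get("dependencies", {})
    return {
        name: spec
        for name, spec in deps.items()
        if not EXACT_VERSION.fullmatch(spec)
    }


# Hypothetical manifest, for illustration only.
example_manifest = """
{
  "name": "my-app",
  "dependencies": {
    "left-pad": "0.0.3",
    "lodash": "^4.6.1",
    "express": "~4.13.4"
  }
}
"""

loose = unpinned_dependencies(example_manifest)
print(sorted(loose))  # names of dependencies that still use ranges
```

Per the edit above, a check like this only makes sense for the final deployed product, not for a published module, where ranges are the friendlier default.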


These practices may not be as widespread as I assumed, but this is how I've been doing npm dependencies for the last few years.

Originally we used to simply check in the node_modules folder.

Now I check in the npm-shrinkwrap.json (sanitised via https://www.npmjs.com/package/shonkwrap), and then use a caching proxy between the CI server and the real npm.

There are a bunch of choices available for this proxy. I've used one called nopar, but sinopia is also popular. Both Artifactory and Nexus can also be configured to do this, as well as act as caching proxies for a number of other package systems.


If you aren't using something like nopar or sinopia, you're really doing it wrong - I mean, if you're taking the micro-dependency route, surely you're not building your application as one monolithic chunk of code, right? So you need somewhere to publish your private modules to, anyway.


Essentially we're trying to figure out when it's appropriate for "my" code to become "everyone's" code, and if there are steps in between. ("Standard library", for example.)

One thing that would be useful to this debate is an analysis of a language ecosystem where there are only "macropackages", to see if the same function shows up over and over again across packages.


> One thing that would be useful to this debate is an analysis of a language ecosystem where there are only "macropackages", to see if the same function shows up over and over again across packages.

Look no further than C++, where nearly every major software suite has its own strings, vectors, etc. implemented, frequently duplicating functionality already implemented in (1) STL, and (2) Boost. I seem to recall that the original Android Browser, for example, had no fewer than 5 kinds of strings on the C++ side of the code base, because it interfaced with several different systems and each had its own notion of what a string should be.

The advantage (or disadvantage) of including common functionality in macro packages is that you can optimize for your particular use case. In theory, this may result in more well-tailored abstractions and performance speedups, but it also results in code duplication, bugs, and potentially even poor abstraction and missed optimization opportunities (just because you think you are generalizing/optimizing your code doesn't mean you actually are).

Clearly, we need some sort of balance, and having official or de facto standard libraries is probably a win. Half the reason we're even in this situation is because both JS and C++ lack certain features in their standard libraries which implicitly encourage people to roll their own.


> Look no further than C++, where nearly every major software suite has its own strings, vectors, etc. implemented, frequently duplicating functionality already implemented in (1) STL, and (2) Boost.

In many ways the problem exists because there used to be different ideas about how strings should be set up. Today we have mostly settled on UTF-8 and only convert to legacy APIs if needed. This is a bad comparison because it's literally caused by legacy code. C++ projects cannot avoid different string classes because of that.


As the author of Groups on iOS and the Qbix Platform (http://qbix.com/platform) I think I have a pretty deep perspective on this issue of when "my thing" becomes "everyone's thing".

When a company goes public, there are lots of regulations. A lot of people rely on you. You can't just close up shop tomorrow.

When your software package is released as open source, it becomes an issue of governance. If you can't stick around to maintain it, the community has a right to appoint some other people. There can be professional maintainers taking over basic responsibilities for various projects.

Please read this article I wrote a couple years ago called the Politics of Groups:

http://magarshak.com/blog/?p=135

This is a general problem. Here is an excerpt:

If the individual - the risk is that the individual may have too much power over others who come to rely on the stream. They may suddenly stop publishing it, or cut off access to everyone, which would hurt many people. (I define hurt in terms of needs or strong expectations of people that form over time.)


For workaday engineers (e.g., not people attempting to build distributed libraries), the appropriate question about dependency management is, "under what conditions is it appropriate for a person or group outside of your org to publish code into your project?"

Note that this isn't a cost-benefit approach. It simply asks what needs to be the case for you to accept third-party code. Every project is going to answer this slightly differently. I would hope that running a private repo, pinning versions, learning a bit about the author and some sort of code review would be the case for at least most, but apparently many folks feel OK about the unrestricted right of net.authors, once accepted into your project, to publish code at seemingly random times which they happily ingest on updates.

A lot of coders seem to see only the "neat, I don't have to write a state machine to talk to [thing]," or "thank god someone learned how to [X] so I don't have to." But that, combined with folks who don't manage their own repos or even pin things, leads to folks whose names the coders probably don't even know essentially having commit privileges on their code.


> the appropriate question about dependency management is, "under what conditions is it appropriate for a person or group outside of your org to publish code into your project?"

>Note that this isn't a cost-benefit approach.

Eh, it is a cost-benefit approach if the appropriate circumstances are "where the benefit outweighs the cost/risk." And I don't know any good way to answer it without doing that.

What are your own examples of specification of such circumstances that don't involve cost-benefit?

Unless you just decide, when asking the question like that, "Geez, there's no circumstances where it's appropriate" and go to total NIH, no external dependencies at all (no Node, no Rails, no nothing).


Well, in that everything is eventually a cost-benefit decision, sure.

What I was getting at is that "under what conditions" is a baseline gating requirement that needs to reflect the nature of what you're building. If I'm building something that is intended to replace OpenSSL, my gating conditions for including third party code are going to be a lot different than if I'm building what I hope is the next Flappy Bird's high-score implementation.

People have all sorts of gating functions. I don't write code on Windows, because I never have and see no reason to start. Baseline competency of developers is another one. You can view baseline requirements as part of the cost-benefit if you like, but generally, if the requirements need to change to make a proposed course of action "worth it", you probably need to redo the entire proposal because the first one just failed. (If your requirements are mutable enough that ditching some to make the numbers work seems acceptable, I have grave doubts about the process and/or the decision making.)

I basically included my own examples already: a private repo to freeze versions in use and host our own forks when appropriate, code review by at least one experienced developer of any included modules, and a look at the developer[1].

[1] This is pretty fuzzy (the process, not necessarily the developer). I generally google-stalk them to see what else they've worked on, if there have been public spats that lead me to believe the future upgrade path might be trouble for us, whether they have publicly visible expertise in what they're coding, etc. Basic reputation stuff, plus evidence of domain expertise, if appropriate.


So when you say "code review by at least one experienced developer" as a condition, you really mean "code review AND the developer thinks it's 'good enough'", right? Presumably the same goes for "a look at the developer": the condition isn't just that someone is looking at the developer, it's that they are looking and satisfied.

Without being clear about that, it can sound like your conditions are just about having a certain internal _process_ in place, and not about any evaluation at all -- if the process is in place, the condition is met. Rather you're saying you won't use any dependencies without reviewing them more than people typically do and deciding they are okay.

The problem of HOW you decide if something is okay still remains; although the time it takes to review already means you are definitely going to be using less dependencies than many people do, even before you get to the evaluation, just by having only finite time to evaluate. And you'll be especially unlikely to use large dependencies, since they will take so much time to review. (Do you review on every new version too?)

And you still need to decide if something is even worth spending the time to code review. I'd be surprised if expected 'benefit' doesn't play a role in that decision. And, really, I suspect expected benefit plays a role in your code review standards too -- you probably demand a higher level of quality for something you could probably write yourself in 8 hours, than for something that might take you months to write yourself.

What you do is different than what most people do in code reviewing all dependencies, and making sure you have a private repo mirror. I'm not sure it's really an issue of "conditions instead of cost-benefit analysis" though. You just do more analysis than most people, mainly, and perhaps have higher standards for code quality than most people (anyone doing "cost benefit analysis" probably pays _some_ attention to code quality via some kind of code review, just not as extensive as you and without as high requirements. If you're not paying any attention to code quality of your dependencies, you probably aren't doing any kind of cost-benefit analysis either, you're just adding dependencies with no consideration at all, another issue entirely).


I'm actually having difficulty seeing if there's a real disagreement here or if we're arguing semantics. And for the record, you are correct that, for instance, I don't ignore what was learned in code reviews; I assumed I didn't need to spell everything out, but you're correct that I'm not endorsing cargo-cult code review or whatever.

Try this. Say you want to build a house. You plan it out, and while doing so write requirements into your plan. One is that you want zombie-green enameled floors, and a second is that it needs to comply with local construction ordinances.

Neither of those are costless. It is possible that your green floor related urges are something you'll compromise on - if it is too expensive, you'll suffer through with normal hardwood. This, to me, looks like a classic cost-benefit tradeoff - you want to walk on green enamel, but will take second-choice if it means you can also have a vacation instead of a bigger mortgage.

The building code requirement looks different. While you can certainly build your home while ignoring codes, that is usually not a very good strategy for someone who has the usual goals of home ownership in mind. Put a different way, unless you have rather unusual reasons for building a house, complying with building codes is only subject to cost-benefit insofar as it controls whether the home is built at all.[1]

Does this better illustrate where I'm coming from?

When one begins a coding project, hopefully one has a handle on the context in which it is to be used. That knowledge should inform, among many other things, expectations of code quality. Code quality in a given project is not just the code you write, but also the code you import from third parties. If you don't subject imported code to the same standards you have for internally produced code, you have a problem, in that at the least you don't know if it is meeting the standards you set.[2]

So… I agree that, from one perspective, every decision is cost-benefit, including eating and putting on clothes when you leave the house. I think it is useful, however, to distinguish factors for which tradeoffs exist within the scope of succeeding in your goals from factors that amount to baseline conditions for success.

As an aside, if you wanted to jump up and down on this, I'm surprised you didn't take the route of asking how far down the stack I validate. What's _really_ running on that disk controller?[3]

[1] I do hope we can avoid digressing into building code discussions.

[2] "I don't care" is a perfectly fine standard, too, depending. Not everything is written to manage large sums of money, or runs in a life-or-death context, or handles environment control for expensive delicate things, or... I absolutely write code that I'm not careful about.

[3] http://www.wired.com/2015/02/nsa-firmware-hacking/


PHP's composer, for example, writes a "composer.lock" tracking the exact versions/commit-shas as was recursively resolved on "install" or "upgrade" time, so you can commit that after testing a "composer upgrade" and be sure you stay with tested versions, and others can then do a composer install based on the lock-file instead of the composer.json "spec" file (which may contain version wildcards).
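To make the spec-file versus lock-file distinction concrete, here is a minimal Python sketch: wildcard constraints are resolved once against the versions available at that moment, and the exact result is what gets committed. The package names, version lists, and the `resolve` helper are all hypothetical. Composer's real resolver and lock format are considerably more involved; this only shows the idea.

```python
def parse(version):
    """Turn "1.4.2" into (1, 4, 2) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def matches(constraint, version):
    """Match a constraint such as "1.*", "2.3.*", or an exact "1.0.0"."""
    wanted = constraint.split(".")
    actual = version.split(".")
    for want, have in zip(wanted, actual):
        if want == "*":
            return True  # wildcard accepts this part and everything after it
        if want != have:
            return False
    return len(wanted) == len(actual)


def resolve(spec, available):
    """Pick the highest available version satisfying each constraint."""
    lock = {}
    for name, constraint in spec.items():
        candidates = [v for v in available[name] if matches(constraint, v)]
        if not candidates:
            raise ValueError(f"no version of {name} satisfies {constraint}")
        lock[name] = max(candidates, key=parse)
    return lock


# Hypothetical spec (with wildcards) and registry contents.
spec = {"acme/router": "1.*", "acme/logger": "2.3.*"}
available = {
    "acme/router": ["1.2.0", "1.10.1", "2.0.0"],
    "acme/logger": ["2.3.0", "2.3.4", "2.4.0"],
}

lock = resolve(spec, available)  # this is what you would commit after testing

# A new release appears later. Installing from the lock is unaffected;
# only an explicit re-resolve (an "upgrade") picks it up.
available["acme/router"].append("1.11.0")
upgraded = resolve(spec, available)
```

The point is that `lock` stays at the tested versions until someone deliberately re-runs the resolution.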


npm has a similar feature, it's called a shrinkwrap.


Maybe more encouragement of caching where NPM downloads into places where it can be checked into source control. Then you can have the beginner friendliness of NPM, but you are only relying on it being up when you want to add or update a package.

Pretty much the only thing needed to accomplish this is to discourage the use of the global flag, and a few strong notes in the documentation and tutorials to check in your node_modules folder that remind people that if you don't check it in, you have established a build time dependency on NPM being up/correct.

edit: corrected spelling


> I think there are several lessons to be learned here.

Indeed. Unfortunately, they've mostly been learned by others before this, and ignored or forgotten. If you want to see how these issues have been dealt with in the past, look to a project that's been doing the same thing for much longer, such as CPAN and the Perl module ecosystem. It's been around for quite a while, and is very mature at this point. Here's a few features that I would hope other module ecosystems would strive for:

- Mirrored heavily, easy to become a mirror. [1]

- Immutable by default, older packages migrated to special archive version of system which does not archive older packages. [2]

- Indexed in multiple ways (author, name, category) and searchable. [3]

- All modules tested regularly on a matrix of multiple operating systems, language versions, and module releases. [4][5] There's also often a few different architectures tested to boot (IBM/z, Sparc), but that may be quite a bit less regular.

- Static analysis of modules as a service to authors to help them possibly identify and fix problems, as well as reporting. [6]

- Module installation defaults to running all tests and failing to install if tests fail.

- The ability to set up a local private CPAN mirror for yourself or your company (you can peg your cpan client to a specific mirror). [7]

Undoubtedly there are features of other module ecosystems that CPAN doesn't do, or doesn't do as well, and could look to for a better way to accomplish some things, but that's the whole point. None of this exists in a vacuum, and there are plenty of examples to look to. Nothing makes any one system inherently all that different than another, so best practices are easy to find if you look. You just need to ask.

1: http://mirrors.cpan.org/

2: http://backpan.cpantesters.org/

3: http://www.cpan.org/modules/index.html

4: http://cpantesters.org/

5: http://cpantesters.org/distro/P/Path-Tiny.html

6: http://cpants.cpanauthors.org/ (CPANTS is distinct from CPAN Testers)

7: http://blogs.perl.org/users/marc_sebastian_jakobs/2009/11/ho...


> Sentry depends on it 20 times. 14 times it's a pin for 0.0.1, once it's a pin for ^1.0.0 and 5 times for ~1.0.0.

This is what I was mentioning in the other thread (and being called a troll for... sigh). I appreciate the idealism of "if we have micromodules, we don't have to reimplement common helper functions, which scales to thousands of bytes saved!". But in practice, there's craptons of duplicate dependencies with different versions. Which negatively scales to hundreds of kilobytes wasted. In code, in downloads, in install time, in developer time (because devs install things too. A lot more than end users in fact...), etc.

One of the many problems which means what's on paper doesn't at all correspond to what we actually get.
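For what it's worth, the quoted numbers can be tallied in a few lines of Python. This sketch only uses the three specifier counts from the quote. The `major` helper is made up for the example, and how many copies actually end up on disk depends on the npm version and how well it deduplicates, so the result is a lower bound rather than a measurement.

```python
from collections import Counter

# Specifier counts as quoted from the article: 14 + 1 + 5 dependents.
specifiers = ["0.0.1"] * 14 + ["^1.0.0"] * 1 + ["~1.0.0"] * 5

by_specifier = Counter(specifiers)
total_dependents = sum(by_specifier.values())


def major(spec):
    """Major version a specifier targets, ignoring a leading ^ or ~."""
    return int(spec.lstrip("^~").split(".")[0])


# An exact 0.0.1 pin and a 1.x range can never be satisfied by the same
# release, so the number of distinct majors is a lower bound on copies.
distinct_majors = {major(spec) for spec in by_specifier}
min_copies = len(distinct_majors)

print(total_dependents, min_copies)
```

So even with perfect deduplication there are at least two copies, and with nested installs there could be up to one per dependent.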


I don't see how that is worse than the alternative where every library rewrites its own version of the micromodules. With N libraries with their own version of leftpad, you get N copies of leftpad. With N libraries sharing versions of leftpad, you get <= N versions. Seems like a win to me...


Ideally you would cut down on dependencies on every level, and end up with significantly less than N dependencies.

If you have 5 similar string formatting libraries that depend on leftpad, you could collapse them into one, and have a single instance of leftpad inside the combined library. Fewer licenses, fewer READMEs, less time downloading, and with tree shaking you still get a similar end result.


It would be nice for packages to include other packages in it...


Theory != practice.

In practice, you need to balance the overhead of an extra dependency with the benefits from sharing versions with others. When you add a dependency, you also now have a larger attack surface. Any extra dependency adds some amount of friction. Sometimes it is negligible. But, if you look at the overhead across all of your modules, it can add up quickly.

In some cases, the benefits from a dependency outweigh the overhead costs.

In other cases, you just write your own leftpad function and move on.
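For a sense of scale, "write your own" here means something like the following. This is a Python sketch of the idea, not the code of the npm left-pad module, and the function name and single-character restriction are choices made for this example.

```python
def left_pad(text, length, fill=" "):
    """Pad text on the left with fill until it is at least length long."""
    text = str(text)
    if len(fill) != 1:
        raise ValueError("fill must be exactly one character")
    missing = length - len(text)
    if missing <= 0:
        return text  # already long enough, return unchanged
    return fill * missing + text


print(left_pad("5", 3, "0"))  # prints 005
```

Python also ships `str.rjust`, which does the same thing, which is arguably the standard-library argument in miniature.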


It seems like the thing to do would be for the community to build up a single set of commonly used functions. Having one authoritative package would make that package CDN-friendly, which should cut down on the bytes-downloaded problem; and if it becomes popular enough, it could eventually be bundled with browsers or rolled into the standard for the language itself.

You could call it a "standard library."


And worse, if you're using npm for frontend, every time every single end user loads your web app.

Previously if you wrote a frontend library and wanted to depend on another library you had to either tell developers to install it too, or bundle it in your library (not a great idea), but either way the file size it added was obvious.

Now you can just drop one line in your package.json, which is super convenient, but obscures the cost of that dependency, and all of its dependencies.


It's about maintenance and testing. Who is going to be responsible for maintaining functionality X? The module developer or you are going to be responsible for everything!

The main question is about caching. Why not just have signed versions of everything floating around?


[nevermind]


OP was talking about the code sent to the end user of the web app, not the size of node_modules.


The problem with standard libraries is they are a standard library. A place where good code goes to die. Standard libraries also mean you can't use the particular version of the module you need; now you are pinned to the version of the standard library that comes with the version of the language you are running on. The workaround there is to fork out the standard library code into...a module. Now, a lot of these modules are designed for old JS runtimes like old versions of IE, so you wouldn't have a standard library anyway.

There's plenty of good libraries like lodash and math.js that are pretty much the next best thing to a standard library.

If your dependency tree sucks, that's a personal problem. It's not npm, JavaScript or node's fault. That's like blaming git because you pushed some crappy code.

The problem was fixed 10 minutes later anyway. This whole discussion surrounding this is a combination of knee-jerk reaction, "waah", and realization of "Oh shit, depending on external code means we are dependent on external code!"

If you want to code without dependencies, go write JavaEE. Everything is included, you don't need any third party dependencies, and you can use cutting-edge tech like JSP, app servers and JavaServer Faces.


>If your dependency tree sucks, that's a personal problem. It's not npm, JavaScript or node's fault.

While I sort of agree, in the case of JavaScript, the language is so limited that people are attempting to make it into a general purpose language by gluing on stupid dependencies that really did belong in a "standard library", or directly in the language. The "isarray" function is silly; if you write code that needs that kind of insight into data types, then you shouldn't have picked JavaScript.

I agree that it's a knee-jerk reaction, and only a few people were really affected. It does, however, highlight a major shortcoming of JavaScript: the fact that it's not a usable language in itself, it needs libraries to be usable as a general purpose language. Hopefully ES6 will fix a lot of this.


Well, first of all, you don't have a choice on the client, and secondly, Array.isArray is in ES5 (see: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...).


> If your dependency tree sucks, that's a personal problem

What about the fact that React and other popular projects have apparently bought into this ecosystem? You can't use them without also using their dependency tree.


I don't think Armin would disagree with you. The problem here is the sheer number of micromodules developed under different banners that would be better suited to be part of a larger coherent effort like lodash, and all the problems that come with having such functionality spread across so many micropackages with many separate maintainers.

Dependencies aren't bad in and of themselves, but when you've lots of tiny dependencies, the costs mount rapidly.


"Larger coherent efforts like lodash" already exist and are used by many people on many projects. Perhaps Node's real problem is that it allows developers to structure each project in exactly the way that they feel is best for that project, rather than enforcing a "Java way" etc. A greater variety of techniques naturally yields more techniques that bother any particular developer's sensibilities. Those who don't like the "Java way", for instance, are likely to simply avoid Java rather than tweeting, "OMG I just realized that you can write Java the Java way!"

It's a matter of taste, and Node accommodates various tastes.


You could say that about just about any modern-ish packaging system.

Sure, npm gives you plenty of rope to hang yourself with, but you can't really blame people for pointing out that what you were doing was a bad idea when you end up dangling from the noose you just created.


Haha I was saying "real problem" ironically. I don't think it's a problem at all to flexibly support a variety of approaches. Let a hundred flowers bloom, I say. Neither do I think it's inherently misguided to use the "micromodule" approach. That will work for some projects, if not for others. This episode has been entirely overblown. I'm sure some people went offline over it, but that wasn't entirely or even mostly due to the assholes at Kik. Rather, it was one or more fragile parts of their own infrastructure, exposed by an untimely missing dependency.


> A place where good code goes to die.

Oh I agree. But maybe the answer is neither extreme? A healthy ecosystem with both established "standard" libraries and larger than single-line packages.


"If your dependency tree sucks, that's a personal problem. It's not npm, JavaScript or node's fault. That's like blaming git because you pushed some crappy code."

How is it even remotely similar? In one case it's all your code and in the other it's none of your code.


Like git, npm does only what it is told to do. One writes one's own package.json file. Depending on a module one doesn't own is an action one takes oneself.


Since at least the 70s people have been trying to "componentise" software in the same way that electronics is componentised: rather than assembling something out of a pile of transistors, build integrated circuits instead. The intent is to reduce cost, complexity and risk.

This has never yet quite worked out in software. Object-orientation was part of the resulting research effort, as are UNIX pipelines, COM components, microkernels and microservices. When it goes wrong you get "DLL Hell" or the "FactoryFactoryFactory" pattern.

It looks like the javascript world has forgotten about integration and instead decided to do the equivalent of assembling everything out of discrete transistors every time. The assembly process is automated, so it appears costless - until something goes wrong.

But really this is the fault of the closed source browser manufacturers, who prefer to attempt lockin over and over again through incompatible features rather than converge on common improvements.


I disagree. I think it has worked out quite well. Nowadays nobody has to write basic data structures or algorithms themselves. Unfortunately, the hard part of building software that is useful to today's businesses is not sorting lists or storing dictionaries.

But remember that things like computing a KWIC index used to be real problems back in the day that required serious programmer work. They have become trivial thanks to better libraries and better computers.


> But really this is the fault of the closed source browser manufacturers, who prefer to attempt lockin over and over again through incompatible features rather than converge on common improvements.

There's currently a single major browser engine that is still closed source, EdgeHTML (and with the current trend of open sourcing things at Microsoft, this might change very soon).

Plus the standards bodies were created to prevent that. After a significant stagnation in the mid-2000s, which was ended by the WHATWG, we're getting amazing progress.


Steve McConnell's classic book "Code Complete: A Practical Handbook of Software Construction" references some ~1980s studies of code quality. IIRC, defects per KLOC was inversely correlated with function length, leveling off around 200–400 LOC per function. Software composed of many micro-functions (or, in the npm case, micro-libraries) is more difficult to grok because there is more context "off screen" to keep in your head.

An unrelated Cisco study of code reviews found that 200–400 LOC is the limit of how many LOC can be effectively reviewed per hour. Applying the findings of these studies suggests that neither functions nor patches/pull-request diffs should exceed 200 LOC. FWIW, I have worked on commercial software that had functions many thousands of lines long! :)


> My opinion query quickly went from “Oh that's funny” to “This concerns me”.

This was my response as well:

> The combination of a micro-library culture, “semver” auto-updates, and a mutable package manager repository is pretty terrifying.

https://mobile.twitter.com/tlrobinson/status/712442098381754...

Either of the second two properties is dangerous on its own, but a culture of micro-libraries compounds the problem.


Everyone is blowing the "micropackages are the problem" completely out of proportion. The real problem with the left-pad fiasco is that someone was able to revoke a package other people depended on. Packages should be immutable.


To quote an old adage, package size doesn't matter.

The actual issue has to do with trusting a package of any size over time. This is true regardless of whether the package implements 1 line of code or 1000.

The trustworthiness of a package is a function of several factors. Code that is not actively maintained can often become less trustworthy over time.

What we need is one or more 3rd party trust metrics, and our bundling/packaging utilities should allow us to use that third party data to determine what is right for our build.

Maybe some of us want strong crypto, maybe others of us want adherence to semver, maybe others want to upgrade only after a new version has had 10K downloads, maybe others only want to use packages with a composite "score" over 80.

On the continuum of code quality from late night hack to NASA, we all must draw a line in the sand that is right for a particular project. One size does not fit all.

It's a big mistake (as well as a profound example of bad reasoning) to blame micropackages. The size of the package has nothing to do with it. Any codebase with any number of dependencies faces some risk by trusting the maintainers or hosting of those dependencies to third parties, which is the problem we need to do a better job of solving.
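To make the "one size does not fit all" point concrete, here is a rough sketch of a per-project policy check. Everything in it is hypothetical: the metadata fields, the score, and the policy names are invented for illustration, and no existing registry or trust service is assumed to publish them.

```javascript
// Hypothetical metadata a third-party trust service might publish for a release.
// None of these fields come from npm; they are made up for illustration.
const candidate = {
  name: 'left-pad',
  version: '1.0.3',
  signed: true,        // release carries a verified signature
  followsSemver: true, // no breaking change detected in a minor/patch bump
  downloads: 25000,    // downloads of this exact version so far
  score: 86,           // composite 0-100 score from the trust service
};

// Each project draws its own line in the sand.
const strictPolicy = { requireSigned: true, requireSemver: true, minDownloads: 10000, minScore: 80 };
const hackPolicy = { requireSigned: false, requireSemver: false, minDownloads: 0, minScore: 0 };

// Returns the reasons a release fails a policy (empty list means acceptable).
function violations(release, policy) {
  const reasons = [];
  if (policy.requireSigned && !release.signed) reasons.push('unsigned release');
  if (policy.requireSemver && !release.followsSemver) reasons.push('semver not respected');
  if (release.downloads < policy.minDownloads) reasons.push('too few downloads');
  if (release.score < policy.minScore) reasons.push('score too low');
  return reasons;
}

// A brand-new version that almost nobody has installed yet.
const fresh = { ...candidate, version: '1.0.4', downloads: 12 };

console.log(violations(candidate, strictPolicy)); // []
console.log(violations(fresh, strictPolicy));     // [ 'too few downloads' ]
console.log(violations(fresh, hackPolicy));       // []
```

The same release passes or fails depending on the policy, which is the point: the bundler would enforce whatever line the project chose, not a global rule about package size.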


The size of the package matters when smaller packages tend to mean a higher number of packages as dependencies.


Larger projects tend to have more dependencies too, so why not rail against large projects?


Because there will be fewer of them.


Fewer packages? Who is to determine the optimal number of packages? Not sure how that benefits anyone. I hesitate to accuse you of trolling but your argument does not seem all that coherent.


It's pretty simple: the more dependencies there are, the more upstream authors you have to trust.


That assumes a lot. My code may utilize one dependency that itself utilizes a few dozen useless ones. Whereas someone else may carefully choose 20 dependencies, none of which include any dependencies.

The issue is measuring the trustworthiness of a dependency, and recursively doing that operation throughout the dependency graph.

Simply focusing on the number of dependencies or the size of a dependency is silly.
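A minimal sketch of what "measuring recursively throughout the dependency graph" could mean in practice. The registry snapshot and the trust numbers are invented; where such numbers would come from is exactly the open question.

```javascript
// Hypothetical registry snapshot: package -> { author, trust (0..1), deps }.
// All values are made up for illustration.
const registry = {
  'app':           { author: 'me',    trust: 1.0, deps: ['big-framework', 'tiny-util'] },
  'big-framework': { author: 'alice', trust: 0.9, deps: ['pad', 'is-thing'] },
  'tiny-util':     { author: 'bob',   trust: 0.8, deps: [] },
  'pad':           { author: 'carol', trust: 0.4, deps: [] },
  'is-thing':      { author: 'alice', trust: 0.9, deps: [] },
};

// Walk the whole graph once, tracking distinct authors and the weakest link.
function audit(root, reg) {
  const seen = new Set();
  const authors = new Set();
  let weakest = { name: root, trust: reg[root].trust };
  const stack = [root];
  while (stack.length > 0) {
    const name = stack.pop();
    if (seen.has(name)) continue; // shared dependencies and cycles are visited once
    seen.add(name);
    const pkg = reg[name];
    authors.add(pkg.author);
    if (pkg.trust < weakest.trust) weakest = { name, trust: pkg.trust };
    stack.push(...pkg.deps);
  }
  return { packages: seen.size, authors: authors.size, weakest };
}

const report = audit('app', registry);
console.log(report);
// { packages: 5, authors: 4, weakest: { name: 'pad', trust: 0.4 } }
```

Here the app lists only two direct dependencies, yet the weakest link is a transitive one, so neither the direct count nor the size of any single package tells you much on its own.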


I heard one of the JS "devs" refer to npm as nano package management. It sounded more like an abdication of a developer's duty to understand what you are adding as a dependency, why, and the long-term cost.

How many developers here would gladly add a rogue "dependency", like a developer they had never spoken to before, into their project without some care? And yet the willingness to open the front and literal back doors of the project to so many dependencies, like low-quality functions-as-a-module, is astounding.


The large number of Python standard packages is clearly a benefit for Python. Node should start a vetting process for inclusion in a standard package system, start moving key libs in, and then host official docs.

I guarantee that the weaknesses of the NPM ecosystem are already known and exploited by bad actors. There are people who earn large 6 figure salaries / consulting fees for finding and exploiting these issues. This is a wakeup call that we need to do something about it.


A lot of the value in these remarks is that they are coming from the author of Flask, the most popular microframework for Python, which itself has a massive extension tree that suffers from a lot of the same problems as NPM. Trying to update or maintain a Flask project often involves navigating a tremendous amount of dependency hell across all kinds of modules, from flask-sqlalchemy to flask-wtforms to flask-bootstrap to flask-oauth, etc. The worst part is that tons of these modules and extensions are dead projects whose code has been rotting for years, and when everything is implemented independently in its own git tree it gets many fewer eyes on it, as Armin mentions in the OP regarding one-liner Node packages.

But it does not spiral out of control nearly as badly as any attempt at frameworks on NPM, because unlike Node almost every Flask extension depends on three things: Flask, the external package the extension attaches to (e.g. WTForms), and Python's standard library.

A similar Node package would depend on possibly hundreds of tiny one-liners to make up for the absence of a standard library.

Which gets to the heart of the problem, right? The reason I've never even considered Node is because Javascript is like PHP - a mutant language born of need rather than intent, that kind of just grew over time to fill use cases constrained by its unique position in the ecosystem rather than as what someone considered the "best answer to the job". Python (3) is almost entirely the antithesis of that. Writing Python is a joy because it is designed from the ground up to be a paradigm to solve problems, not a problem that breeds a paradigm.

There is no way for Node to fix this as long as it tries to be browser compatible. We will never see ECMAScript standards adopt an ISO C++ stance of maturing the language with a comprehensive standard library to meet the needs of the language in the day and age it is being used, because there are very disparate interests involved in Javascript's language design going forward. That is its blessing and curse - Javascript will never grow into a Java-scale monstrosity of standard library bloat because a tremendous number of people involved in Javascript also have to implement Javascript and thus don't want a larger surface area of work to do. But it is important to remember that Javascript was never meant to be anything. It was made for dynamic HTML pages in Netscape. The fact that two decades later it is being shoehorned into web server dev and desktop applications should be scary.


Maybe we could start to publish signed approvals of specific package hashes.

For example: "I, mbrock, think that pad-left v1.0.3 with hash XYZ seems like an uncompromised release."

Then the tool that upgrades packages could warn when a release isn't trusted by someone you trust (or transitively via some trust web scheme).

The approval system becomes like a "release review" process.
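Roughly, the upgrade-time check could look like the sketch below. The record shape, reviewer names, and `checkUpgrade` are all hypothetical, and signature verification is deliberately left out: in a real scheme each approval would be a signed message that gets verified before it is counted.

```javascript
// Hypothetical approval records. In a real scheme each would be a signed
// message and the signature would be verified first; that step is omitted.
const approvals = [
  { reviewer: 'mbrock', pkg: 'pad-left', version: '1.0.3', hash: 'XYZ' },
  { reviewer: 'stranger', pkg: 'pad-left', version: '1.0.4', hash: 'ABC' },
];

// Direct trust only; a web-of-trust scheme would expand this set transitively.
const trusted = new Set(['mbrock']);

// Reviewers you trust who approved exactly this package, version, and hash.
function trustedApprovers(release, records, trustedReviewers) {
  return records
    .filter((a) => a.pkg === release.pkg && a.version === release.version && a.hash === release.hash)
    .filter((a) => trustedReviewers.has(a.reviewer))
    .map((a) => a.reviewer);
}

function checkUpgrade(release) {
  const who = trustedApprovers(release, approvals, trusted);
  return who.length > 0
    ? `ok: ${release.pkg}@${release.version} approved by ${who.join(', ')}`
    : `warning: ${release.pkg}@${release.version} has no approval from anyone you trust`;
}

console.log(checkUpgrade({ pkg: 'pad-left', version: '1.0.3', hash: 'XYZ' }));
console.log(checkUpgrade({ pkg: 'pad-left', version: '1.0.4', hash: 'ABC' }));
console.log(checkUpgrade({ pkg: 'pad-left', version: '1.0.3', hash: 'TAMPERED' }));
```

Matching on the hash as well as the version is what catches a release whose contents were swapped after it was reviewed.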


Yes! I've been working on this scheme at https://github.com/chromakode/signet


That would probably be doable using a PGP-style trust system. Don't know if it'd add much security in practise though, since the "trust these declarations of trust" decision would most likely be automated.


Yeah, just GPG with Keybase and some place to publish the messages.

Trustworthy people wouldn't approve releases automatically... but then, who's trustworthy?

Like Pynchon wrote, paranoia is the garlic of life...


You still have to decide who to trust, but having a collection of many independent parties verifying a package can be a useful signal even if you don't have anyone directly in your trust chain. It makes it a lot harder for rogue releases to go unnoticed.


Help me understand why these micropackages exist in a world where tree shaking is a thing? Why is there no stdlib that rolls up all of the commonly used small dependencies? (I'm kind of a n00b to JS so it's a non-rhetorical question.)


Up until very recently, tree-shaking JavaScript code was essentially impossible due to the dynamic nature of commonjs and AMD. Thanks to the new static module system in ES6, Rollup and Webpack 2 will be able to support tree shaking of ES6 modules. But the vast majority of modules out there are still commonjs, and it will be quite a while before that changes.
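As a toy illustration of the idea (not how Rollup or Webpack are actually implemented), tree shaking boils down to a reachability pass over statically known imports. The library graph and the `shake` function below are made up for the example.

```javascript
// Toy model of one library: each export lists the other exports it calls.
// With ES6 `import { x } from 'lib'` the names an app uses are known
// statically, which is what makes this analysis possible at all.
const lib = {
  isArray: [],
  padLeft: ['repeat'],
  repeat: [],
  deepClone: ['isArray', 'isObject'],
  isObject: [],
};

// Keep everything reachable from the statically imported names; drop the rest.
function shake(moduleGraph, importedNames) {
  const keep = new Set();
  const visit = (name) => {
    if (keep.has(name)) return;
    keep.add(name);
    moduleGraph[name].forEach((dep) => visit(dep));
  };
  importedNames.forEach((name) => visit(name));
  const dropped = Object.keys(moduleGraph).filter((n) => !keep.has(n));
  return { kept: [...keep].sort(), dropped: dropped.sort() };
}

// The app only imports padLeft, so three of five exports can be removed.
console.log(shake(lib, ['padLeft']));
// { kept: [ 'padLeft', 'repeat' ], dropped: [ 'deepClone', 'isArray', 'isObject' ] }

// With CommonJS, a property lookup on the module by a runtime-computed name
// is legal, so a bundler cannot know the used names ahead of time and has to
// assume everything is used.
console.log(shake(lib, Object.keys(lib)).dropped); // []
```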


There are packages which roll these up. Lodash is popular.

But there's a philosophy and mindset in JS, relatively unique compared to other languages, which is to optimize for download size to the browser (as noted in another comment, tree shaking is not entirely "a thing" yet). Users don't want to include all of Lodash (or a stdlib) in their download package just to use _.isArray. Now, Lodash does also provide subpackages specific to a single function (lodash.isArray) that can be loaded as independent packages to mitigate that download hit, but it's relatively unique in that sense.


I find tree shaking a much better solution for that issue.


Good article even though I don't agree with all the conclusions.

I find a good way to think about things is that every single dependency you have adds another set of people you have to trust.

You're trusting the competence of the developer (i.e. that the library has no security flaws), you're trusting their intent (i.e. that they don't deliberately put malicious code into the library) and you're trusting their Operational Security practices (i.e. that their systems don't get compromised, leading to loss of control of their libraries).

Now when you think about how little you know about most of the owners of libraries you use, you can see possibility for concern.

The bit I disagree with the article about is signing. I personally think that developer signing is a useful part of this as it takes the repository owner out of the trust picture (if done correctly). Without it you're also trusting the three items above for the repository provider and it's worth noting that a large software repo. is a very tempting target for quite a few well funded attackers.

Docker at least has provided some of the technical pieces to address this in their repositories with content trust...


I think the problem is in npm and not in the micro-modules.

Writing isomorphic, cross-browser code in a language full of edge cases, like JavaScript, is hard.

It's a one-line function, but 20 lines of tests and another 20 test environments.

The solution should not come from people that write only front-end JS code. I'm waiting for a response by the libraries that were broken by left-pad.


Maybe the solution for high-level languages is to just routinely add useful helper functions, either in separate namespaces or directly to global with a naming convention to avoid conflicts. If thousands of people are doing the same thing it really doesn't make any sense for them to all come up with their own version.


Thinking about this for the last few days, I think the Java SE / commons-approach seems the most sensible to me.

Your stdlib is bundled with the language runtime and contains binaries, OS interactions and the necessary fundamental stuff. Updating this is hard, because this depends on hard things like the OS, and because it must maintain compatibility with everything.

However, the commons-package can add all kinds of pure (as in, just the language in question) stuff to the language, such as several dozens of collections, all kinds of utility languages, and so on. Updating this would be a lot less risky than updating the entire language runtime, especially if it uses proper semantic versions. This would allow this lib to iterate faster and include these utility-functions easier as well.


The example that always drives me crazy is itertools in the Python standard library. In the documentation there is a recipes* section that lists 23 helpful functions, instead of just including them in the library.

* https://docs.python.org/2/library/itertools.html#recipes


What's super frustrating is that Python 3 would've been an ideal opportunity to move those into the standard library, but it was never done.


This is the PHP approach, minus the consistent naming convention. It's very convenient from the developer side to have a myriad of things in the core language.


There is a bigger debate on micropackages, for sure. But even in the short term, breaking your build instantly every time third parties make changes is just madness. Reduce operational dependencies as well as library dependencies.

This is one approach we used to deal with this last year, for example, on build/devops side: https://medium.com/@ojoshe/fast-reproducible-node-builds-c02...


Isn't this similar to broken links on the web? You can either:

1) Bundle everything in your distribution. Not unreasonable, but would be nice to have a hybrid protocol that lets the publisher store a signed copy of all the dependencies but only send them on request (so less duplication is sent over the wire).

2) Have the same as 1 but in dedicated "mirrors" and persistent distributed storage a la freenet. Files are only deleted if there isn't enough space on the network and they are the least-recently-requested ones.


I asked about this last year:

https://news.ycombinator.com/item?id=9629091


Totally agree, I never understood this fetish developers have with small packages. It's probably related to the 'Unix philosophy' but it just doesn't scale in a web context...

Personally, I much prefer a single nice, complete, well-polished module that works perfectly over lots of tiny modules which are awesome individually but which suck when used together.


This was well written. The balance between convenience and liability is something that takes time to digest.

I don't really understand why there isn't a stdlib of these "micropackages" that can be downloaded to save a lot of effort.


Micro deps are the sign of something missing from the core language. We should be working to have that expanded and not have it shouldered to a package manager system and community to fill IMO.


That's where git-vendor [0] comes into play!

[0]: https://brettlangdon.github.io/git-vendor/


Learn about clean-room design. If you saw the code, you'll copy it subconsciously. If the license requires attribution and you don't give it, you're in trouble.


> Multiplied with the total number of downloads last month the node community downloaded 140GB worth of isarray.

This is not true. NPM locally caches every module the first time it is downloaded.

Therefore with widely downloaded modules such as isarray, it is very likely it has already been downloaded on the local system and so is pulled from the cache.

The actual percentage of fresh downloads from NPM in a real-world deployment is overwhelmingly small.


> This is not true. NPM locally caches every module the first time it is downloaded.

This is literally coming from the npm website. Presumably that number is what it says: downloads.


I'm curious if npm's published statistics like "downloads in a day", "downloads in the last month" include copies from a local cache.


Why is nobody talking about the real issue? That NPM unpublished someone's module because a lawyer threatened them in an email.

People should be upset that NPM will just take your module away if they feel it's appropriate. The whole reason left-pad was unpublished has been completely ignored.


Holy crap, I had a look through https://www.npmjs.com/~sindresorhus

There are so many one line packages:

https://github.com/sindresorhus/is-finite/blob/master/index....

https://github.com/sindresorhus/is-fn/blob/master/index.js

https://github.com/sindresorhus/is-gif/blob/master/index.js

https://github.com/sindresorhus/is-github-down/blob/master/c...

https://github.com/sindresorhus/is-ip/blob/master/index.js

https://github.com/sindresorhus/is-npm/blob/master/index.js

https://github.com/sindresorhus/is-obj/blob/master/index.js

https://github.com/imagemin/advpng-bin/blob/master/lib/index...

https://github.com/chalk/ansi-regex/blob/master/index.js (my favourite)

https://github.com/sindresorhus/array-move/blob/master/index...

https://github.com/sindresorhus/compare-urls/blob/master/ind...

https://github.com/sindresorhus/debug-log/blob/master/index....

https://github.com/sindresorhus/file-url/blob/master/index.j...

https://github.com/sindresorhus/fix-path/blob/master/index.j...

https://github.com/sindresorhus/fn-args/blob/master/index.js

https://github.com/sindresorhus/fn-name/blob/master/index.js

https://github.com/sindresorhus/globals/blob/master/index.js

https://github.com/sindresorhus/imul/blob/master/index.js

https://github.com/sindresorhus/is-text-path/blob/master/ind...

https://github.com/sindresorhus/is-up/blob/master/index.js

https://github.com/sindresorhus/is-travis/blob/master/index....

https://github.com/sindresorhus/is-video/blob/master/index.j...

https://github.com/sindresorhus/is-webp/blob/master/index.js

https://github.com/sindresorhus/is-archive/blob/master/index...

https://github.com/sindresorhus/is-admin/blob/master/index.j...

https://github.com/sindresorhus/is-absolute-url/blob/master/...

https://github.com/sindresorhus/ipify/blob/master/index.js

https://github.com/sindresorhus/is-url-superb/blob/master/in...

https://github.com/sindresorhus/is-tif/blob/master/index.js

https://github.com/datetime/leap-year/blob/master/index.js

https://github.com/sindresorhus/lpad/blob/master/index.js

https://github.com/sindresorhus/md5-hex/blob/master/index.js

And I ran out of willpower, at only L. Seems to me the complete lack of any decent standard library has caused this Cambrian explosion of packages, and the overhead is astounding. Sure it's appealing to google "nodejs validate ip", then run "npm install is-ip" and use it with "require('is-ip')", but fuck me how wasteful do you want to be. My frontend ember app ends up installing over 500mb of dependencies (most of which is useless test files and other redundant fluff files). How has this happened?

What's to stop one of those one line packages adding a malicious one liner that concatenates and uploads your super-secret private source code to somebody's server? You're really trusting the complete integrity of your codebase because you depend on "is-array", because you can't be bothered to write "x.toString() === '[object Array]'", and JS introspection (which seems so (ab)used) is so broken that this is needed? Woah.


> because you can't be bothered to write "x.toString() === '[object Array]'"

([1,2,3]).toString() => "1,2,3"


My bad, '({}).toString.call([1,2,3])'.

Because type introspection is so ugly in JS, but so needed, that it's hidden behind 'require('is-array')' to make it palatable. Ridiculous.
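For anyone following along, the difference between the two spellings is easy to check in a console. The helper names `tag` and `isArrayByTag` are just for this example; `Array.isArray` is the built-in that has been standard since ES5.

```javascript
// Array#toString joins the elements, so it says nothing about the type.
console.log([1, 2, 3].toString()); // "1,2,3"

// Borrowing Object.prototype.toString reports the internal class tag instead.
const tag = (x) => Object.prototype.toString.call(x);
console.log(tag([1, 2, 3]));     // "[object Array]"
console.log(tag({ length: 0 })); // "[object Object]"

// A hand-rolled check next to the ES5 built-in it duplicates.
const isArrayByTag = (x) => tag(x) === '[object Array]';
console.log(isArrayByTag([]), Array.isArray([]));       // true true
console.log(isArrayByTag('abc'), Array.isArray('abc')); // false false
```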


It's telling that only the immature JS ecosystem thinks this is a good idea.


If you don't know history.....

40 years of computer experience as an EE, coder, IT security, and DBA tells me this: when IT moved from a way to do work (science) to a thing of its own (marketing), which happened during the dot-com bubble, time to market became the goal and security was tossed. You hear this in mantras like:

Push NOW, fix later. Fail fast and often.

I say: Secure, Complete, Fast - Pick two.



