The security of package managers is something we're going to have to fix.
Some years ago, in offices, computers were routinely infected or made unusable because the staff were downloading and installing random screen savers from the internet. The IT staff would have to go around and scold people not to do this.
If you've looked at the transitive dependency graphs of modern packages, it's hard to not feel we're doing the same thing.
In the linked piece, Russ Cox notes that the cost of adding a bad dependency is the sum of the cost of each possible bad outcome times its probability. But then he speculates that for personal projects that cost may be near zero. That's unlikely. Unless developers entirely sandbox projects with untrusted dependencies from their personal data, company data, email, credentials, SSH/PGP keys, cryptocurrency wallets, etc., the cost of a bad outcome is still enormous. Even multiplied by a small probability, it has to be considered.
As dependency graphs get deeper, this probability, however small, only increases.
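To put rough numbers on that, here is a minimal sketch. It assumes, unrealistically, that every dependency is bad independently and with the same small probability; the function names and the per-package probability are made up for illustration, not measured.

```python
def p_any_bad(p_single: float, n_deps: int) -> float:
    """Probability that at least one of n_deps dependencies is bad,
    assuming each is bad independently with probability p_single."""
    return 1.0 - (1.0 - p_single) ** n_deps


def expected_cost(p_single: float, n_deps: int, cost_if_bad: float) -> float:
    """Expected cost = probability of a bad outcome times its cost."""
    return p_any_bad(p_single, n_deps) * cost_if_bad


if __name__ == "__main__":
    p = 0.0001  # hypothetical per-package probability, not a measured figure
    for n in (10, 100, 1000):
        print(n, p_any_bad(p, n))
```

Under those assumptions the probability grows almost linearly with the number of packages while it is small, so ten times the dependencies is roughly ten times the expected cost.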
One effect of lower-cost dependencies that Russ Cox did not mention is the increasing tendency for a project's transitive dependencies to contain two or more libraries that do the same thing. When dependencies were more expensive and consequently larger, there was more pressure for an ecosystem to settle on one package for a task. Now there might be a dozen popular packages for fancy error handling and your direct and transitive dependencies might have picked any set of them. This further multiplies the task of reviewing all of the code important to your program.
Linux distributions had to deal with this problem of trust long ago. It's instructive to see how much more careful they were about it. Becoming a Debian Developer involves a lengthy process of showing commitment to their values and requires meeting another member in person to show identification to be added to their cryptographic web of trust. Of course, the distributions are at the end of the day distributing software written by others, and this explosion of dependencies makes it increasingly difficult for package maintainers to provide effective review. And of course, the hassle of getting a library accepted into distributions is one reason for the popularity of tools such as Cargo, NPM, CPAN, etc.
It seems that package managers, like web browsers before them, are going to have to provide some form of sandboxing. The problem is the same. We're downloading heaps of untrusted code from the internet.
After using Go and Dart on a number of projects and using very few dependencies (compared to JavaScript projects) I'd say a good starting point is having a great standard library.
For example, it's a bit ridiculous that in 2019 we cannot decode a JWT using a simple browser API, still need Moment for time and date operations, have no observable type (a 4-year-old proposal is still in draft stage), and still have no native data-binding.
TC39 is moving too slowly and that's one of the reasons why NPM is so popular.
I mean, even all of those examples you listed aren't as crazy as the fact that you need a library to parse the cookie string and deal with individual cookies...
> Becoming a Debian Developer involves a lengthy process of showing commitment to their values and requires meeting another member in person to show identification to be added to their cryptographic web of trust
At the very least. More often people receive mentoring for months and meet in person.
> this explosion of dependencies makes it increasingly difficult for package maintainers to provide effective review
It makes packaging extremely time-consuming, and that's why a lot of things in Go and JavaScript are not packaged.
The project cares about security and compliance with licensing.
> ... the increasing tendency for a project's transitive dependencies to contain two or more libraries that do the same thing. When dependencies were more expensive and consequently larger, there was more pressure for an ecosystem to settle on one package for a task. Now there might be a dozen popular packages for fancy error handling and your direct and transitive dependencies might have picked any set of them.
It's not just a security problem. It also hampers composition, because when two libraries talk about the same concept in different "terms"/objects/APIs (because they rely on two different other libraries to wrap it), you have to write a bridge to make them talk to each other.
That's why large standard libraries are beneficial - they define the common vocabulary that third-party libraries can then use in their API surface to allow them to interoperate smoothly.
> The security of package managers is something we're going to have to fix.
why the generalization? a lot of package managers have been serviceable for decades, their security model based solely on verifying the maintainer identity with clients deciding which maintainer to trust.
this is of course an issue with all package managers, but it's the lack of trusted namespacing that makes it easy to fall into it. (there are scopes, which sound similar, but the protection model of the scope name is currently unclear to me and it's optional anyway)
compare to maven, where a package prefix gets registered along with a cryptographic key and only the key holder can upload under it to the central repo.
sure, you get malicious packages going around, but it's far easier not to fall for them because it's significantly harder to get users to download a random package from outside the namespaces they know
> We're downloading heaps of untrusted code from the internet.
this is not something a package manager can fix, it's a culture problem. even including a gist or something off codepen is dangerous. a package manager cannot handle the 'downloading whatever' issue, it's not reasonable to put that in its threat model, because no package manager maintainer can possibly guarantee that there is no malicious code in its repository and it's not its role anyway. a package manager is there to get a package to you as it was published at a specific point in time identified by its versioning, and its threat model should be people trying to publish packages under someone else's name.
speaking of which, it took npm 4 years to prevent people from publishing a package with new code under an existing version number: https://github.com/npm/npm-registry-couchapp/issues/148 - they eventually came to their senses, but heck, the whole node.js ecosystem's gung-ho attitude is scary.
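The two guarantees argued for above (a version, once published, never changes, and the client gets exactly what was published) can be sketched roughly like this. It is a toy in-memory model with hypothetical names (`Registry`, `verify`), not how npm or Maven Central is actually implemented.

```python
import hashlib


class VersionAlreadyPublished(Exception):
    """Raised when someone tries to replace an existing name@version."""


class Registry:
    """Toy in-memory registry: (name, version) maps to immutable bytes."""

    def __init__(self):
        self._artifacts = {}  # (name, version) -> artifact bytes

    def publish(self, name, version, artifact):
        key = (name, version)
        if key in self._artifacts:
            # Refuse to put new code under an existing version number.
            raise VersionAlreadyPublished(f"{name}@{version}")
        self._artifacts[key] = artifact
        return hashlib.sha256(artifact).hexdigest()

    def fetch(self, name, version):
        return self._artifacts[(name, version)]


def verify(artifact, pinned_sha256):
    """Client-side check that a download matches the digest recorded earlier."""
    return hashlib.sha256(artifact).hexdigest() == pinned_sha256
```

Note that this says nothing about who is allowed to publish under a name; that part would need the key-per-namespace scheme described for maven.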
> why the generalization? a lot of package managers have been serviceable for decades, their security model based solely on verifying the maintainer identity with clients deciding which maintainer to trust.
What happens when the maintainer of a package changes?
The big problem I see happening is maintainers getting burned out and abandoning their packages, and someone else taking over. You might trust the original maintainer, but do you get notified of every change in maintainer?
> The security of package managers is something we're going to have to fix.
Companies that care about this already have dependency policies in place. The companies that don't care so much about security already have an approach to security problems that they will employ if a significant threat is revealed, spend time and money to fix it then.
It's a herd approach. Sheep and cattle band together because there's strength in numbers and the wolves can only get one or two at a time. It's extremely effective at safeguarding most of the flock.
>Companies that care about this already have dependency policies in place. The companies that don't care so much about security already have an approach to security problems that they will employ if a significant threat is revealed, spend time and money to fix it then.
I think that probably the majority of companies actually fall into a third group: Those who don't really care enough about this but also don't really have a good policy for dealing with it.
> It's instructive to see how much more careful they were about it.
"Much more careful" would have been a requirement to consult upstream on all patches that are beyond the maintainer's level of expertise. Especially so for all patches that potentially affect the functioning of cryptographic libraries.
Debian has had a catastrophe to show the need for such a guideline. Do they currently have such a guideline?
If not, it's difficult to see the key parties as anything more than security theatre.
> The security of package managers is something we're going to have to fix.
Inclusiveness and the need for Jeff Freshman and Jane Sophomore to have a list of 126 GitHub repos before beginning their application process for an intern job are at odds with having vetted entities as package providers.
When I was developing Eclipse RCP products, I had three or five entities that provided signed packages I used as dependencies.
Plus: with npm, you even have tooling dependencies, so the former theoretical threat of a malicious compiler injecting malware is now the sad reality[0].
I'm not claiming the "old way" is secure, but the "new way" is insecure by design and by policy (inclusiveness, gatekeeping as fireable offense).
[0] I have tooling dependencies in Gradle and Maven too, but again, these are by large vendors and not by some random resume padding GitHub user.
I'm a big fan of kitchen sink frameworks for this reason. Whenever I want to do something in JS the answer is to install a package for it. When I want to do something in Rails the answer is that it's built in. I have installed far, far fewer packages for my back end than for the frontend, and the back end is vastly more complex.
TLDR: it boils down to analysing dependencies at the level of the callgraph; but building those callgraphs isn't easy. The benefit in the security use case is ~3x increased accuracy when identifying vulnerable packages (by eliminating false positives).
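A minimal sketch of the idea, assuming the call graph has already been built (which, as noted, is the hard part): a vulnerable function only matters if it is reachable from the application's entry points. The graph shape and the function names are hypothetical.

```python
from collections import deque


def reachable(call_graph, entry_points):
    """Return every function reachable from entry_points (breadth-first)."""
    seen = set(entry_points)
    queue = deque(entry_points)
    while queue:
        caller = queue.popleft()
        for callee in call_graph.get(caller, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


def actually_affected(call_graph, entry_points, vulnerable_functions):
    """Vulnerable functions that the application can really reach."""
    return reachable(call_graph, entry_points) & set(vulnerable_functions)
```

A package-level check would flag any project that merely depends on the vulnerable library; this one stays silent when the vulnerable function is never called, which is where the eliminated false positives come from.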
This right here is why Go's 'statically link everything' is going to become a big problem in the long run when old servers are running that software and no one has the source code anymore.
i don't see how that's true. in both worlds, a developer has to take the manual action to review published vulnerabilities, track down repos they own that are affected, and upgrade the dependencies.
No: with dynamic linking, and especially with Linux distributions, most of the work is automated and the patching is done by the distribution security team.
The time to write a patch and deliver it to running systems goes down to days or, more often, hours.
Cautiously posting that link, because I'm not against vendoring. You just need a process around keeping your dependencies up to date / refreshed automatically. The ability to vendor is one thing, how you use it is another.
It would be nice if our compilers had the ability to directly incorporate the source code into the binary in some standard way. E.g. on Win32, it could be a resource, readily extractable with the usual resource viewer. On Unix, maybe just a string constant with a magic header, easy to extract via `strings`. And so on.
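As a rough sketch of what that could look like as a post-build step rather than a compiler feature: the source is stored uncompressed so that `strings`-style tools still show it, followed by a length and a marker at the very end of the file. The marker value and function names are invented for this example (no such standard exists), and whether a given executable format tolerates trailing bytes is not checked here.

```python
MAGIC = b"\x00SRC-EMBED-v1\x00"  # hypothetical marker, not an existing standard


def embed_source(binary: bytes, source: bytes) -> bytes:
    """Append the source, its 8-byte big-endian length, and the marker."""
    return binary + source + len(source).to_bytes(8, "big") + MAGIC


def extract_source(blob: bytes) -> bytes:
    """Read the trailer written by embed_source and return the source bytes."""
    if not blob.endswith(MAGIC):
        raise ValueError("no embedded source found")
    length_end = len(blob) - len(MAGIC)
    length_start = length_end - 8
    if length_start < 0:
        raise ValueError("truncated trailer")
    length = int.from_bytes(blob[length_start:length_end], "big")
    source_start = length_start - length
    if source_start < 0:
        raise ValueError("corrupt length field")
    return blob[source_start:length_start]
```

Putting the marker at the end instead of the front is a design choice of this sketch: the extractor never has to guess whether an earlier occurrence of the marker inside the binary is the real one.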
I agree this would be positive. But the source only gets you halfway there; you still need to actually be able to reproduce a compatible build system. And the longer ago the software was originally developed, the more challenging that becomes.
There are lots of old projects out there relying on some ancient VS2003 installation. The same will happen with modern languages in a decade - code goes stale, and it gets more and more difficult to pull down the versions of software it was originally built with.
I hate (read: love) to be pedantic, but all scripting languages already have this feature built-in, and thanks to modern VMs and JIT compilers and the like, performance is much less of an issue.
It would be interesting to see e.g. a Go executable format that ships with the source, build tools and documentation that would compile to the current platform on demand. Should be doable in a Docker image at least.
No one's going to waste resources putting source code on the server dude, they'll host it somewhere else and then something will happen to it or they just won't see the need to give the source to anyone because they're the only people in the company who understand it anyway etc.
Given the ease with which the parser and AST are made available to developers, we should be able to implement tools which can detect naughty packages. Also, given the speed at which projects can be compiled, the impetus to keep the source code should remain strong.
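A very rough sketch of what such a tool might start from, using Python's standard `ast` module purely for illustration (the same approach applies to any language that exposes its parser). The lists of "suspicious" names are arbitrary examples, and a real scanner would need far more than a name match.

```python
import ast

SUSPICIOUS_CALLS = {"eval", "exec", "compile", "__import__"}  # illustrative only
SUSPICIOUS_ATTRS = {("os", "system"), ("subprocess", "Popen")}  # illustrative only


def find_suspicious_calls(source: str):
    """Return sorted (line, name) pairs for calls that deserve a human look."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if isinstance(func, ast.Name) and func.id in SUSPICIOUS_CALLS:
            findings.append((node.lineno, func.id))
        elif (isinstance(func, ast.Attribute)
              and isinstance(func.value, ast.Name)
              and (func.value.id, func.attr) in SUSPICIOUS_ATTRS):
            findings.append((node.lineno, f"{func.value.id}.{func.attr}"))
    return sorted(findings)
```

This only catches direct, unobfuscated calls; anything hidden behind an alias or built from strings at runtime slips straight past it.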
> we should be able to implement tools which can detect naughty packages
We can! It's one thing to know that there's no major technical obstacle to having a security-oriented static analysis suite for your language of choice. It's quite another for one to actually have already been written.
The primary wrinkle tends to be around justifying the cost of building one. For companies that use small languages, that means a non-trivial cost in engineer time just to get a research-grade scanner. For companies whose products are security scanners, it means waiting until there's a commercial market for supporting a language.
This is a problem I've been struggling with. I sympathize a great deal with developers who want to use the newest, most interesting, and above all most productive tools available to them. This stacks up awkwardly against the relatively immature tooling ecosystem common in more cutting-edge languages with smaller communities and less corporate support.
Granted. But it will at least raise the bar for building an exploit package from "knows how to code" to "knows how to code, knows something about exploits, and knows how to avoid detection by an automated scanner."
It really depends on how the developers work; if they know the software will have to run for 10+ years, mostly unmaintained / unmonitored, they can opt to vendor all the dependencies so that a future developer can dive into the source of said dependencies.
Also, the Go community tends to frown at adding superfluous dependencies - this is a statement I got while looking for web API frameworks. Said frameworks are often very compact as well, a thin wrapper around Go's own APIs.
I've also worked on a project which had to be future-proofed solidly; all documentation for all dependencies had to be included with the source code.
It's an interesting line of inquiry to think about how many of these evaluation heuristics, which are all described as things a person can do manually, could instead be built into the package manager itself to do for you automatically.
The package manager could run the package's test suite, for instance, and warn you if the tests don't all pass, or make you jump through extra hoops to install a package that doesn't have any test coverage at all. The package manager could read the source code and tell you how idiomatically it was written. The package manager could try compiling from source with warnings on and let you know if any are thrown, and compare the compiled artifacts with the ones that ship with the package to ensure that they're identical. The package manager could check the project's commit history and warn you if you're installing a package that's no longer actively maintained. The package manager could check whether the package has a history of entries in the National Vulnerability Database. The package manager could learn what licenses you will and won't accept, and automatically filter out packages that don't fit your policies. And so on.
In other words, the problem right now is that package managers are undiscriminating. To them a package is a package is a package; the universe of packages is a flat plane where all packages are treated equally. But in reality all packages aren't equal. Some packages are good and others are bad, and it would be a great help to the user if the package manager could encourage discovery and reuse of the former while discouraging discovery and reuse of the latter. By taking away a little friction in some places and adding some in others, the package manager could make it easy to install good packages and hard to install bad ones.
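A sketch of how a few of those checks might look once the metadata is available. Everything here is hypothetical: the `PackageInfo` fields, the thresholds, and the idea that a registry would hand you this data in one place.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class PackageInfo:
    """Hypothetical metadata a package manager might collect before install."""
    name: str
    license: str
    last_commit: date
    tests_pass: bool
    known_vulnerabilities: int


def install_warnings(pkg, allowed_licenses, today, max_idle_days=365):
    """Return human-readable reasons to add friction before installing pkg."""
    warnings = []
    if pkg.license not in allowed_licenses:
        warnings.append(f"license {pkg.license!r} is not on your allow list")
    if not pkg.tests_pass:
        warnings.append("test suite does not pass")
    if pkg.known_vulnerabilities > 0:
        warnings.append(f"{pkg.known_vulnerabilities} known vulnerability entries")
    idle_days = (today - pkg.last_commit).days
    if idle_days > max_idle_days:
        warnings.append(f"no commits for {idle_days} days")
    return warnings
```

An empty list would mean "install without friction"; anything else is where the extra hoops would go.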
NPM (nodejs package manager) has started doing that, revealing three combined metrics for each package in the search results. It assesses every package by popularity (number of downloads), quality (a dozen or so small heuristics about how carefully the project has been created) and maintenance (whether the project is actively maintained, keeps its dependencies up to date, and has more closed issues than open issues on GitHub). The idea is that when you search for a package, you can see at a glance the relative quality and popularity of the modules you're choosing between.
It's not perfect - there's no way to tell if the packages under consideration are written in a consistent style or if they have thorough unit tests, but it's a clever idea. And by rating packages on these metrics they encourage a reasonable set of best practices (write a readme and a changelog, use whitelists / blacklists, close issues on GitHub, etc). The full list of metrics is here:
To some extent, R works like this... packages are only on CRAN if they pass automatic checks and there's a pretty strong culture of testing with testthat. You can have your own package on Github or external repo, but then that's a non-standard, extra step for installing the package.
1) suppose we have pseudonym reputation ("error notice probability"): anyone can create a pseudonym, and start auditing code, and you mark the parts of code that you have inspected. those marks are publicly associated with your pseudonym (after enough operation and eventual finding of bugs by others, the "noticing probability" can be computed+).
2) consider the birthday paradox, i.e. drawing samples from the uniform distribution will result in uncoordinated attention, while with coordinated attention we can spread attention more uniformly...
+ of course there are different kinds of issues, i.e. new features, arguments about whether something is an improvement or was an overlooked issue etc... but the future patch types don't necessarily correlate to the individuals who inspected it...
ALSO: I still believe formal verification is actually counterintuitively cheaper (money and time) and less effort per achieved certainty. But as long as most people refuse to believe this, I encourage strategies like these...
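To illustrate point 2 with a simple model (my assumption, not something the comment spells out): if each review lands on one of n code units uniformly at random, some units get looked at repeatedly while others are never seen; if reviewers can see the existing marks and always pick an unreviewed unit, nothing is wasted.

```python
def random_coverage(n_units: int, n_reviews: int) -> float:
    """Expected fraction of units seen at least once when each review
    picks one of n_units uniformly at random (uncoordinated attention)."""
    return 1.0 - (1.0 - 1.0 / n_units) ** n_reviews


def coordinated_coverage(n_units: int, n_reviews: int) -> float:
    """Fraction covered when reviewers always pick a not-yet-reviewed unit."""
    return min(n_reviews / n_units, 1.0)
```

With as many reviews as units, the coordinated case covers everything, while the uncoordinated case leaves roughly a third of the code unread in expectation.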
The big idea is for people to publish cryptographically signed "proofs" that they've reviewed a particular version of a given module, allowing a web-of-trust structure for decentralised code review. I particularly like how, thanks to the signatures, a module's author can distribute reviews alongside the module without compromising their trustworthiness - so there's an incentive for authors to actively seek out reviewers to scrutinise their code.
This paper lends significant legitimacy to a casual observation that I've been concerned about for a long time: as the standard for what deserves to be a module gets ever-lowered, the law of diminishing returns kicks in really hard.
The package managers for Ruby, C#, Perl, Python etc offer ~100k modules. This offers strong evidence that most developer ecosystems produce (and occasionally maintain) a predictable number of useful Things. If npm has 750k+ modules available, that suggests that the standard for what constitutes a valuable quantity of functionality is 7.5X lower in the JS community. Given that every dependency increases your potential for multi-dimensional technical risk, this seems like it should be cause for reflection. It's not an abstract risk, either... as anyone who used left-pad circa 2016 can attest.
When I create a new Rails 5.2 app, the dependency tree is 70 gems and most of them are stable to mature. When I create-react-app and see that there's 1014 items in node_modules, I have no idea what most of them actually do. And let's not forget: that's just the View layer of your fancy JS app.
When I create a new rails 5.2.2 app, I see 79 dependencies in entire tree. Which is about what you said, and a lot less than 1014, sure.
There are various reasons other than "low standards" that the JS ecosystem has developed to encourage even more massive dependency trees.
One is that JS projects like this are delivered to the browser. If I want one function from, say, underscore (remember that?), but depend on all of it... do I end up shipping all of it to the browser to use one function? That would be unfortunate. Newer tools mean not necessarily, but it can be tricky, and some of this culture developed before those tools.
But from this can develop a community culture of why _shouldn't_ I minimize the weight of dependencies? If some people only want one function and others only another, shouldn't they be separate dependencies so they can do that? And not expose themselves to possible bugs or security problems in all that other code they don't want? If dependencies can be dangerous... isn't it better to have _surgical_ dependencies including only exactly what you need so you can take fewer of them? (Makes sense at first, but of course when you have 1000 of those "surgical" dependencies, it kind of breaks down).
Another, like someone else said, is that JS in the browser has very little stdlib/built in functions.
Another is tooling. The dependency trees were getting unmanageable in Ruby before Bundler was created (which inspired advanced dependency management features in most of the tools that followed). We probably couldn't have as many dependencies as even Rails has without Bundler. Your dependency complexity is limited by tooling support; but then when tooling support comes, it gives you a whole new level of dependency management problems that come with the crazy things the tooling lets you do.
These things all feed back on each other back and forth.
I'm not saying it isn't giving rise to very real problems. But it's not just an issue of people having low standards or something.
You are correct, sir: 79 dependencies. Need more coffee!
The influence of Bundler is just another example of Yehuda Katz doing something initially perceived as unpopular having a massive long-term impact on developer ecosystems. He is the Erdős of the web dev world. I wish he would sell options on his future endeavours like David Bowie did.
Anyhow, in trying to keep my initial comment relatively brief, I held back on several points; what I really and truly don't get about the 750k thing is how anyone can track the libraries available. You know... with their brains. In Rails, gems are big ideas: authentication, an ORM, the ability to address all of AWS, send tweets. The idea that someone would even think to publish a "left-pad" (which I understand is just missing from the also-missing stdlib) as something people should import is what seems crazy. What makes you stop and think... wait, I'm typing too many characters, I need to see if there's anything on Github that can insert spaces at the beginning of a string. How would you even know what it's called? Is there a module for concatenating two strings? When does it become silly?
How is it possible that finding a library to add padding, which may or may not exist, doing even a cursory code review, and integrating it does not take longer than just writing a few lines of code?
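For scale, "a few lines" really is about this much. This is a sketch in Python rather than JS (where Python, for what it's worth, already ships the equivalent as `str.rjust`); the name `left_pad` is just illustrative.

```python
def left_pad(text: str, width: int, fill: str = " ") -> str:
    """Pad text on the left with fill until it is at least width long."""
    if len(fill) != 1:
        raise ValueError("fill must be a single character")
    missing = width - len(text)
    return fill * missing + text if missing > 0 else text
```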
Using the example of the escape-string-regexp module mentioned in the whitepaper... this would be a deeply flawed thing to add to your project. It has a hard-coded error message that has zero affordances for localization strategies. It is, at best, a few random lines from someone's hobby app.
If every one of those 1014 modules in an empty project has weird, unknown-unknown failure modes, that sounds like a recipe for trouble.
Finally, 1014 packages getting frequently version bumped is way less reliable in terms of unfortunate conflicts. Your surface area of all things that can go wrong shoots up and to the right... all to save a few lines of code?
I don’t really disagree with anything you’ve said, just want to add.
It’s not really 750k packages you’re keeping in your head. Instead, there are swirling communities of standard convention. This happens because the js community is so large and the code syntax so “forgiving” that many different “dialects” of js exist. There’s a reason “Babel” is called “Babel.” Within your dialect, one package may be better suited than another package, which does the same thing. A great example of this is where ‘_ => result’ may look perfectly coherent in one sub-ecosystem, another sub-ecosystem would more readily understand ‘function AsyncCalculationProvider(callback: CompletionFunction) { return callback }’, both for perfectly legitimate reasons. That people wish to ‘standardize’ on one common way of doing a thing within their sub-ecosystem is the reason people accept dependencies for dumb little things. That people disagree on how those things should manifest in the context of their subculture is the reason there’s 10 different popular ways to do the same exact little thing.
This isn’t even just ECMAScript’s fault. The Web APIs provide a lot of great fill-in as a standard library (for example, for Date localization), but even here we do not see consistency. For example, the “fetch” api for browsers is nice enough, but is not implemented in Nodejs. If you want to share a signature for HTTP requests between nodejs and browsers, you’re going to use “axios” or maybe “node-fetch” (which replicates the Web api but is more verbose), or you’re going to end up rewriting a wrapper for that functionality, and why would you do that when there’s a community that will immediately see and understand what you mean when you import the “axios” module?
A big optional standard library is needed, but a minimized deliverable is also needed. Recent tooling (e.g. tree-shaking and Babel) makes this more reasonable today than it was in the past.
If historical lessons continue to apply, what will likely happen is a few more significant iterations of shakeout as the community coalesces around something closer to consensus. It's important to remember that the vast majority of developers in any ecosystem are simply trying to use a tool to do a job; they are far less likely to commit a significant amount to the ecosystem and that's okay. Wikipedia is very similar in this regard.
Perhaps the best comparison is the evolution of the Linux distribution ecosystem. Maybe React is Ubuntu? The story is still being written.
Here's an actual example I'm dealing with right now.
I am trying to write something in more-or-less "pure" browser Javascript, for Reasons. (Not JQuery, not react/vue/etc, maybe some polyfills).
I need a "debounce" function. There are a bunch around as copy-paste code. The one in underscore is famous. There are some blog posts saying they have taken the one from underscore, but what they paste doesn't match _current_ underscore code. Underscore has apparently changed with improvements or bug fixes. The literal code in underscore depends on other parts of underscore, so if I want to copy-paste current one, I've got to do some work to refactor it to be independent.
I don't want to use underscore itself, because I don't want all of underscore to be shipped to client browsers when I just need debounce.
I'm going to spend several hours or a day or more on this before I'm done. (that I am fairly new to 'modern' JS doesn't help)
If there was just a debounce dependency that seemed popular and trustworthy that I could just take, in a few seconds instead of spending a day on investigating it, so I could get back to actually developing my domain func -- you bet I'd take it. (Yeah, I spent some time looking for _that_ too, and surprisingly didn't find it, despite everyone using an independent 'leftpad', there isn't an independent 'debounce' (which is much less trivial code even though it's still only a few lines when done)? Or did I just not find it?)
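For reference, the core of a trailing-edge debounce is small. The sketch below is in Python with `threading.Timer` standing in for the browser's `setTimeout`/`clearTimeout`; it is not underscore's implementation and leaves out its extras (such as firing on the leading edge).

```python
import threading


def debounce(wait_seconds):
    """Decorator: run the function only after wait_seconds with no new calls."""
    def decorator(func):
        timer = None
        lock = threading.Lock()

        def debounced(*args, **kwargs):
            nonlocal timer
            with lock:
                if timer is not None:
                    timer.cancel()  # drop the previously scheduled call
                timer = threading.Timer(wait_seconds, func, args, kwargs)
                timer.daemon = True
                timer.start()

        return debounced
    return decorator
```

Each call cancels the pending one and schedules a fresh one, so only the last call in a burst runs, using that call's arguments.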
There are "ecological" reasons that lead to the nature of the ecosystem, various pressures and barriers in the existing environment, not just an issue of developer judgement.
The fact that this kind of JS code is _shipped to browser_ makes the size of the built product matter a lot more, which changes approach to dependencies.
For those of us who use _Rails_, Rails _comes_ with so much (including, apropos the current discussion, various text mangling utilities which aren't directly related to the web domain specifically) that you don't need to look elsewhere for it. But _that_ choice has been subject to much criticism too, with people thinking Rails does too much, which has its own downsides. It's all trade-offs. And the trade-offs often effectively happen at the "community" level rather than the individual one.
If it wasn't obvious from my barely-contained bias, I'm old-stock Rails and I have been present for all of the heckling about ActiveSupport. Those criticisms happened but largely amounted to a tempest in a teapot.
Rails between 3.0 and 3.2 was a painful time, as the merger of Rails and Merb didn't go smoothly. Yehuda got everyone excited about pure modularity but didn't stick around long enough to do the hard, boring last 10%. Thank goodness for Tenderlove. Today, you really can disable ActiveSupport if it offends you, but in the field you rarely see a Rails project without ActiveSupport. It's modularity that nobody would benefit from taking advantage of. It's libraries like CoffeeScript, Turbolinks and now thankfully Sprockets that people are turning off. (I would strongly argue that people should use Turbolinks 5, but that's a different thread for a different day.)
The thing is that DHH's whole deal was that there's an "omakase" best-practices menu that you can sign off on and reap a huge number of benefits for free. What people don't seem to understand is that the biggest of these benefits was not any particular feature of Rails, but that it was a consistent vision throughout... both in terms of what was there and especially what was not there.
I think that's why the fuzzy size feeling I perceive from most gems is so practical and comforting to me. They feel like the right amount of stuff, and I can be reasonably confident that they will fit my other stuff.
Honestly, I feel really bad that most JS devs have never experienced anything like that. Whatever gains have been obtained by adopting JS-everywhere, the losses in developer ergonomics are not properly appreciated.
> that suggests that the standard for what constitutes a valuable quantity of functionality is 7.5X lower in the JS community.
Err, does it? The first thing this suggests is that the JS community is 7.5x larger (true for Ruby, even larger factor for Perl and not true for Python [1]). The second thing this suggests to me is that npm is X times more usable than those languages' package managers, which in my experience is true for Python (not for Ruby though, dunno about Perl).
I'm not trying to be argumentative, but I honestly don't see how you conclude that 7.5x more packages === 7.5x more usable.
I don't have a huge amount of experience with pip or artisan, but they seemed roughly similar to both npm and bundler in terms of concept and operation. You have a file with a list of modules to require, an executable tool that downloads all of those modules and their dependencies and finally a generated file that tracks the snapshot state of all dependencies.
I don't perceive npm bringing 7.5x game to that evaluation.
Don't get me wrong: I'm thrilled that it's working for so many people. Java was officially working for people when Rails became popular, too. However, popularity is a dangerous way to measure whether something is actually objectively better. Sometimes the most popular thing really is better. And sometimes Putin is super popular. It's complicated. :)
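The shared shape described above (a manifest, a tool that walks the dependencies, and a generated snapshot) can be sketched in a few lines. The index layout and names are hypothetical, and this toy skips everything that makes real resolvers hard, such as version constraints and conflicts.

```python
def resolve(manifest, index):
    """Walk the transitive dependencies of the packages in manifest.

    manifest: list of top-level package names
    index:    dict mapping name -> (version, [dependency names])
    Returns a lockfile-like dict of name -> exact version, sorted by name.
    """
    lock = {}
    pending = list(manifest)
    while pending:
        name = pending.pop()
        if name in lock:
            continue  # already pinned via another path
        version, deps = index[name]
        lock[name] = version
        pending.extend(deps)
    return dict(sorted(lock.items()))
```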
> If npm has 750k+ modules available, that suggests that the standard for what constitutes a valuable quantity of functionality is 7.5X lower in the JS community.
Or that JS lacks a decent stdlib! As evidenced by the left-pad fiasco, JS simply lacks things that are either available by default in other languages or easily include-able from the system libc (in c/c++).
Totally agree. If JS had only a decent standard lib for string, collections and math I bet the number of packages would be reduced by a lot. I wonder what the reception of such a thing in NPM would be. Would people appreciate having only one, but larger dependency?
There's an old engineering joke about how there's 13 incompatible power socket standards in use on Earth today. Sensing an opportunity to engineer a better solution, they created a power socket that is simpler, easier to use and cheaper to install than all current standards. Now there are 14 power socket standards.
I know that one but I don't think it applies here. In JS there really doesn't seem to be a comprehensive library. When I look at a lot of NPM packages they are simple 5-10 line functions. Consolidating them into a larger library would help a lot in my view.
>Is the code well-written? Read some of it. Does it look like the authors have been careful, conscientious, and consistent? Does it look like code you’d want to debug? You may need to.
This, 10,000x. I've repeated a similar mantra many, many times, and it's one of the most important reasons I refuse to use proprietary software. You should consider no software a black box, and consider the software you chose to use carefully, because it's your responsibility to keep it in good working order.
Making it someone else's responsibility to keep it in good working order is the value proposition behind (good) proprietary software: You give them money, they give you a support contract.
For a company with more money than development resources, or even just a company whose development resources can be more profitably focused elsewhere, this can be a quite reasonable trade to make.
If a company behind proprietary software goes belly up, there's no support. But there are always companies or even freelance devs who can be paid to support open source code.
That said, realize that a lot of what people think is important when it comes to reviewing code is iffy, at best. Consider, git.[1] Odds are extremely high that this does not fit any style of most startups. Now, you could take that as an argument against many startup stylings, but that is not necessarily my intent here.
To their credit, they have a very exhaustive coding guideline that is fairly open to breaking rules when need be.[2]
That is, I am asserting most metrics people use for intrinsic quality of code would likely mark git as something they would not want to support. And yet it is proving to be very reliable.
My personal sense, from watching developments in this space, is that we are going to have to find some way for taking on an open source dependency to be an economic transaction, with money actually changing hands. With open source, the code itself is free (in both the libre and gratis sense), but there are other places to identify value. One of them is chain of custody - is there an actual, somewhat responsible human being behind that package? Many of the most dramatic recent failures are of this nature.
Other value is in the form of security analysis / fuzzing, etc. This is real work and there should be ways to fund it.
I think the nature of business today is at a fork. Much of it seems to be scams, organized around creating the illusion of value and capturing as much of it as possible. The spirit of open source is the opposite, creating huge value and being quite inefficient at capturing it. I can see both strands prevailing. If the former, it could choke off open source innovation, and line the pockets of self-appointed gatekeepers. If the latter, we could end up with a sustainable model. I truly don't know where we'll end up.
On the other hand, it seems like making automatic payments to dependencies would be easy to screw up. Adding money to a system in the wrong way tends to attract scammers and thieves, requiring more security vigilance, while also giving people incentives to take shortcuts to make money. (Consider Internet ads, SEO, and cryptocurrency.)
Monetary incentives can be powerful and dangerous. They raise the stakes. You need to be careful when designing a system that you don't screw them up, and this can be difficult. Sometimes it can be easier to insulate people from bad incentives than to design unambiguously good incentives.
A counterpoint: the system has already attracted scammers. See, e.g., the bitcoin injection in npm. And now that someone smart has blazed the way and demonstrated the opportunity, others are sure to follow.
Absolutely. Two other negative models are the music publishing industry and academic publishing. I was going to write "paywalled academic publishing," but some of the worst ethics are in the predatory open access space.
There's an opportunity. Paying an open source developer a living wage in return for taking some responsibility for security and updates is a reasonable thing, and would obviously benefit everyone all around. Whether we can actually get there is another question.
ActiveState has had this business model for quite a while. Even though you can download everything from PyPI, ActiveState has customers who are happy to pay someone else to take responsibility for dependencies.
We desperately need people using packages to pay. Otherwise it's nothing but a bunch of companies issuing demands to often unpaid people who build / maintain our shared code in these packages.
I will personally cop to having received an email complaining about a broken test in code I shared with the world and writing a less than polite email back. The code is freely given; that does not come with any obligations on my behalf.
Fyi, the article's title and sibling top-level comment by austincheney may give the wrong impression of what Russ Cox is talking about.
His essay is not saying software dependencies themselves are a problem. Instead, he's saying software dependency _evaluation_ methodology is the problem. He could have titled it more explicitly as "Our Software Dependency Evaluation Problem".
So, the premise of the essay is already past the point of the reader determining that he will use someone else's software to achieve a goal. At that point, don't pick software packages at random or just include the first thing you see. Instead, the article lists various strategies to carefully evaluate the soundness, longevity, bugginess, etc of the software dependency.
I think it would be more productive to discuss those evaluation strategies.
For example, I'm considering a software dependency on an eventually consistent db such as FoundationDB. I have no interest nor time nor competency to "roll my own" distributed db. Even if I read the academic whitepapers on concurrent dbs to write my own db engine, I'd still miss several edge cases and other tricky aspects that others have solved. The question that remains is if FoundationDB is a "good" or "bad" software dependency.
My evaluation strategies:
1) I've been keeping an eye on the project's "issues" page on Github[0]. I'm trying to get a sense of the bugs and resolutions. Is it a quality and rigorous codebase like SQLite? Or is it a buggy codebase like MongoDB 1.0 back in 2010 that had nightmare stories of data corruption?
2) I keep an eye out for another high-profile company that successfully used FoundationDB besides Apple.
3) and so on....
There was a recent blog post where somebody regretted their dependency on RethinkDB[1]. I don't want to repeat a similar mistake with FoundationDB.
What are your software dependency evaluation strategies? Share them.
- How easily and quickly can I tell if I made the wrong choice?
- How easily and quickly can I switch to an alternative solution, if I made the wrong choice?
To contextualize those a bit, it's often when trying to pick between some fully managed or even serverless cloud services vs something self-managed that ticks more boxes on our requirements/features wish-list.
Also, it's pretty important to consider the capabilities and resources of your team...
- Can my team and I become proficient with the service/library/whatever quickly?
Re: other high-profile companies using FoundationDB in production, I suggest checking out these two talks from the project's community conference, FoundationDB Summit: Wavefront/VMware[0], and Snowflake Computing[2].
> Does the code have tests? Can you run them? Do they pass? Tests establish that the code’s basic functionality is correct, and they signal that the developer is serious about keeping it correct.
This is one thing I thoroughly miss from Perl's CPAN: modules there have extensive testing, thanks to the CPAN Testers Network. It's not just a green/red badge but reporting is for the version triplet { module version, perl version, OS version }. I really wish NPM did the same.
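As a rough illustration of why per-triplet reporting says more than a single green/red badge, here is a small sketch that groups results by { module version, perl version, OS }. The `TestReport` shape and the sample data are invented for this example and are not the actual CPAN Testers data format.

```typescript
// Hypothetical report shape; one entry per test run submitted by a tester.
interface TestReport {
  moduleVersion: string;
  perlVersion: string;
  os: string;
  passed: boolean;
}

// Invented sample data.
const reports: TestReport[] = [
  { moduleVersion: "1.2.0", perlVersion: "5.26", os: "linux", passed: true },
  { moduleVersion: "1.2.0", perlVersion: "5.26", os: "freebsd", passed: true },
  { moduleVersion: "1.2.0", perlVersion: "5.10", os: "linux", passed: false },
  { moduleVersion: "1.2.0", perlVersion: "5.26", os: "linux", passed: true },
];

type Tally = { pass: number; fail: number };

// Count passes and failures separately for every triplet.
function summarize(rs: TestReport[]): { [triplet: string]: Tally } {
  const out: { [triplet: string]: Tally } = {};
  for (const r of rs) {
    const key = r.moduleVersion + " / perl " + r.perlVersion + " / " + r.os;
    let bucket = out[key];
    if (bucket === undefined) {
      bucket = { pass: 0, fail: 0 };
      out[key] = bucket;
    }
    if (r.passed) bucket.pass += 1;
    else bucket.fail += 1;
  }
  return out;
}

const summary = summarize(reports);

// Collect the triplets with at least one failure.
const failing: string[] = [];
for (const key of Object.keys(summary)) {
  const bucket = summary[key];
  if (bucket !== undefined && bucket.fail > 0) failing.push(key);
}
console.log(failing);
```

A single aggregate badge for this data would read "3 of 4 passing"; the grouped view shows that the one failure is specific to the older perl.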
That implies too much faith in tests. Tests are no better or worse than any other code. In fact, writing good tests is an art and most people cannot think about every corner case and don’t write tests that cover every code path.
So, unless you audit the tests they add no practical additional layer of trust, IMO, to just using the “package” with or without tests.
Many times I've had the most use for a test that didn't fit into the conventional unit test format but I didn't try to get it approved because I didn't want to get into a dogmatic argument about what a test should or shouldn't be. A lot of what I worry about doesn't get tested well using unit tests.
I feel like the production environment situation has changed significantly since Perl became popular. Now everything is run in "FROM alpine:latest" or whatever, and if it works once, it will mostly work everywhere. All the bugs that CPAN testers tried to find were platform differences like "debian puts /etc/passwd somewhere other than redhat". Yes, you will absolutely at some point encounter a bug due to some difference between Broadwell and Haswell or ARM and x86_64, so you can't completely ignore the issue. But the regex to escape parentheses will probably work on ARM if it worked on x86_64. (I doubt the test cases are likely to find this kind of bug anyway, though it sure is nice if they do.)
Modest proposal: do the opposite of everything suggested in this article. After all, if you spend all your time inspecting your dependencies, what was the point of even having them in the first place?
This will ensure that maximum time possible is spent implementing new features. Everyone on your team can pitch in to accelerate this goal. Even non-technical outsiders can give valuable feedback. At the same time, this ensures minimum time spent fiddling about in a desperate attempt to secure the system and slowing everyone else down. Besides, unless you're already a Fortune 500 company, no one on your team knows how to do security at all. (And even then the number of experts on your team is probably still dangerously close to zero.)
The software you ship will obviously be less secure than if you had focused any time at all on security. However, the utility of your software will skyrocket compared to what it would have been if you had sat around worrying about security. So much that your userbase is essentially forced to use your software because nothing else exists that even has a fraction of its feature set.
Sooner or later the insecurity will catch up with you. But this is the best part-- your software has so many features it is now a dependency of nearly everything else that exists. There is no chess move left except to sit down and somehow actually secure it so that arbitrarily tall stacks of even less secure software can keep being built atop it without collapsing like a house of cards.
And it's at this point that the four or five people in the world who actually understand security step in and sandbox your software. Hey, now it's more secure than a system built by a cult of devs tirelessly inspecting every little dependency before they ship anything. Problem solved.
Worse than package dependency is platform dependency. My code runs on top of 10 million lines of Kubernetes insanity that no one really understands, including the thousands of authors who wrote it. In theory, that means at the drop of a hat I can switch to a different cloud, kubectl apply, and presto! Platform independence. In reality, every cloud is slightly different, and we now depend on and work around a lot of weird quirks of Kubernetes itself. We're stuck with what we've got.
1. Convenience at cost to everything else. Easier is generally preferred over simplicity. If the short term gains destroy long term maintenance/credibility they will solve for that bridge when they come to it at extra expense.
2. Invented Here syndrome. Many JavaScript developers would prefer to never write original code (any original code) except for as a worst case scenario. They would even be willing to bet their jobs on this.
For me (Javascript Developer), you have to stand on the shoulders of giants if you want to compete. Any code you re-invent is code you have to maintain.
I've found though, some engineers love to create everything from scratch, and this greatly hinders their ability to hire/fire as everything is proprietary, and usually not documented.
Most decisions are pretty grey, but for me, choosing to handle stuff yourself is never a good choice. In the same way as no-one should ever try and create Unity from scratch, no-one should try to create React from scratch. You simply can't compete with the support and effort of a global development team.
If you wanna learn though, that's a different kettle of fish. Reinvent the wheel all day. Just don't use it in production.
Re: Any code you re-invent is code you have to maintain.
But the flip side is that any code you borrow you have to fix, debug, and/or replace if it doesn't work as intended. I personally find it far easier to fix my own code than others' code.
I'm not claiming one should reinvent most wheels to gain control, though, only that there are trade-offs for each. If you borrow code, you should have an alternative(s) in mind if it stops working in the future or key bugs are found in it.
For example, using a JavaScript date-picker gizmo for forms is usually not a big maintenance risk because there are multiple to choose from. If you had only one choice and that choice went kaflooey, then you'd be stuck.
Well, the user could hand-enter dates. For internal software that's an acceptable temporary work-around, but not for commercial software, because every customer expects a date-picker (and the HTML5 ones currently stink if you want mm/dd/yyyy formats).
In short, don't heavily depend on any external part you don't have a "Plan B" for.
> I personally find it far easier to fix my own code than others' code.
This is a problem with larger teams though if anyone else needs to work on the code, and results in code that doesn't have documentation/support available.
Not documentation as in 'He didn't make a confluence' page, but documentation as in 'If I google this error message will I get a good answer'
I'm not sure what your googling-error-message example is intended to demonstrate. And again I'm not claiming self-rolled is always better; but merely responding to a specific claim. The bottom line is one has to weigh the trade-offs using multiple metrics and factors. A large part of managing technical projects is the balancing of tradeoffs.
IF an app depends heavily on a feature AND there are not many external alternatives, then a shop should lean toward rolling their own.
A related issue is the volume of code. If the in-house version is 500 lines of code and the external (downloaded) version is 5,000 lines, then the internal one may be easier to debug and fix. (Yes, I realize that lines of code is not the only factor to consider.)
Common factors to look at:
1. Complexity of the part, including dependencies on other parts.
2. Swappability of the part for alternatives.
3. How well the part is written and documented.
4. Need for the part. If it's merely a "bonus", then dependency may be less of a problem/risk.
If you use an external library, it's much more likely that you can google any issues you have with it, rather than having to reverse engineer it. The reason the GP quoted this:
> I personally find it far easier to fix my own code than others' code.
is that if you work on a team, you're fixing others' code either way. The difference is that with an external library there's usually documentation, a community, etc; but with a home-rolled solution, you're usually on your own.
External libraries usually have to cater to wider usage and thus have more code and more parts and more options. A "dedicated" in-house module will typically be much less code because it has one job, meaning less to read, reverse engineer, test, etc. I'm not saying one is always better than the other, only that one brings in a rather complex wad of code in most external libraries that could be a bottleneck.
The "community" often ignores my bug fix requests, I would note. Maybe I ask wrong, but we don't all have wonderful asking skills. If you do, that's a nice feature of your personality, but doesn't necessarily extrapolate to all code users.
I'm not sure how long you've been a JavaScript dev or how senior you are, so forgive me if this sounds condescending. For what it's worth, I'm genuinely not trying to be condescending.
As I matured into the profession, I went through stages of dependency. In the beginning, I wanted to build everything myself because I was learning. In retrospect, this was a good choice because I was way too green to pick a dependency and at least I knew my own code was crap. Then, I reached a place where I wanted to stand on everyone's shoulders - I agreed that any code I had to write myself was code I had to maintain myself.
Lately, I see all the grey in between those two points. I'd sooner not try to reinvent React because that would take me an obscenely long time. But, for more trivial things like left-pad or escape-string-regex, I've discovered that it's cheaper to implement and maintain it myself than to go through all the steps to vet a new dependency.
There's a strange place in this industry where rolling your own solution is cheaper in the long run.
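For a sense of scale, here is roughly what in-house versions of those two helpers could look like. This is a minimal sketch with names chosen for this example (`leftPad`, `escapeRegExp`); it is not the code of the npm packages, and it handles only the simple cases shown.

```typescript
// Pad `input` on the left with `padChar` until it is at least `length` long.
function leftPad(input: string, length: number, padChar: string = " "): string {
  if (padChar.length !== 1) throw new Error("padChar must be a single character");
  let result = input;
  while (result.length < length) result = padChar + result;
  return result;
}

// Backslash-escape characters that have special meaning in a regular expression.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

console.log(leftPad("42", 5, "0")); // prints "00042"
console.log(new RegExp(escapeRegExp("f(x)")).test("call f(x) here")); // prints true
```

The whole thing is small enough to review in a minute, which is the trade being described: a few lines to own versus a new dependency to vet.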
I never suggested left-pad myself, but what I would do is copy and paste a function off stack-overflow, name it appropriately, and encapsulate it, maybe write a test that is the flavour of the month.
Then leave a link to the page I got it from in the comment above.
I'm about 5 years into my career, and have re-invented the wheel in the past myself. It's great fun to make these huge Rube Goldberg-esque inventions but for making an MVP, it simply isn't worth it.
Specifically avoiding libraries like express, react, ORMs or crypto though... People love to do that, and far too often it ends up living for eternity.
I've been a professional developer for 15+ years, I don't think I've ever seen anything posted on stack overflow of high enough quality that I would happily paste it into a production system.
Great to use as a guideline for a problem... but copy and pasting from stack overflow (even with a comment to that page, which I've seen!) just no!
I'm inclined to agree with you, though I still think that RealDinosaur deserves a lot of credit. He or she was still honest and forthcoming about that. It's impossible to start teaching better practices if people aren't honest about what they do!
I am a JavaScript developer fulltime for more than a decade.
> you have to stand on the shoulders of giants if you want to compete
Hold on there cowboy. We are what we are. Nobody starts out as a rockstar coding ninja. It takes practice. Just take it one step at a time. Solve for problems as you encounter them. The challenge you go through to provide that solution will open your eyes to other potential solutions to other ever challenging problems you never would have seen otherwise.
> Any code you re-invent is code you have to maintain.
If it's code that has found its way into your code base you are maintaining it anyways. That is why dependencies are debt. If you have not reviewed the tests and/or validation of those dependencies you are introducing untested code into your application on blind faith.
> In the same way as no-one should ever try and create Unity from scratch
I don't think anyone is really trying to create Unity from scratch (unless they're trying to compete with Unity). But that doesn't necessarily mean you should just use Unity. Unity is complicated, and designed to be a general purpose game engine. Which means for any given game, it may or may not be a good fit. A lot of times you can get up and running faster in Unity, but soon start to hit walls as you try to do something more advanced. If you wrote your own engine, you would just modify it to meet your needs, but with Unity you end up trying to find workarounds for its shortcomings.
I also think your example with React is flawed. There's a big difference between left-pad and React, and I think the most interesting packages to discuss is the range of things between those two.
> no-one should ever try and create Unity from scratch
Some game developers do write their game engines from scratch. I know of at least one successful example: Jonathan Blow, with Braid (2D engine) and The Witness (3D engine). Note that in both games, a generic engine wouldn't have worked, or at least would have required such an amount of customisation that it's not clear it would have cost less, or looked and felt as good. Sure, don't go rebuild a generic engine from scratch. But a custom one, tailored to a very specific use case? That's not such an obvious no-no.
Another example would be Monocypher¹, my crypto library. Why would I write such a thing when we already have Libsodium? The reason is, I saw that I could do better for my use case: an opinionated, easy to use, portable package. The result is massively simpler than Libsodium. I don't care that I cannot compete with the support and effort of Libsodium team. I made sure I didn't need to.
> Another example would be Monocypher¹, my crypto library.
I don't know who you are, nor what Monocypher is, beyond what you have written here, but this example should come with some caveats listed.
It is generally considered "best practice" to -not- attempt to roll your own cryptographic system and use it for production purposes, unless or until it has "run the gauntlet" of peer review - and that review may be long and harsh.
Maybe your library is a wrapper around existing functionality to make that functionality simpler to use; or maybe your library has "run the gauntlet" and you are also a "well known" person in cryptographic circles, and so your work is trusted.
But again - the general developer should never think to create and distribute their own cryptographic library system, and one would be cautioned that even a crypto library that is "only a wrapper" around other crypto algorithms or libraries should also be thoroughly vetted before incorporating it into your project, especially if that project is anything more than a hobby grade system.
> this example should come with some caveats listed.
No longer. You would know if you spent 10 hours reviewing Monocypher (which I reckon is not a good use of your time), so it's natural that you don't.
> It is generally considered "best practice" to -not- attempt to roll your own cryptographic system
I am keenly¹, painfully² aware of what it takes to write production grade crypto. And I didn't really roll my own. I only implemented primitives everyone trusts. And I wasn't alone either. I've had lots of reviews, as well as substantial external advice and contributions. That you can confirm by scouring the GitHub repository and the Monocypher website for 15 minutes.
Braid and Witness could have been written in Unity though.
I'd argue that dealing with high level concepts such as game/level design and art direction and low level stuff like graphics and rendering simultaneously is insane. I don't know how Jon Blow did it, but personally being able to abstract away all that low level stuff makes the design process way easier for me.
There was a recent game, 'Return of the Obra Dinn', which went the opposite way. It was the dev's first game with Unity, and he attributed most of his success to the Engine.
It doesn't look like a Unity game, it doesn't play like a Unity game, and it has won several game of the year awards.
Come to think of it, there's Antichamber, a non-Euclidean labyrinth based on Unreal Engine (3, I believe).
As for how Jon Blow did it, I suspect having his own engine let him explore gameplay ideas more readily than using a generic one. The time travelling in Braid and all its variations would be pretty hard to bolt on a generic engine: it's not just rewind, it's partial rewind, with some entities being immune to the rewind. There's even a level where time goes forward and backward depending on the position of the main character. Go right, forward. Go left, backwards.
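To give a feel for what "partial rewind" means mechanically, here is a toy sketch in which each entity records its past positions and a rewind step skips entities flagged as immune. It is purely illustrative and makes no claim about how Braid's engine is actually implemented; the `Entity` shape and both functions are invented for this example.

```typescript
// Hypothetical entity: a 1D position plus a log of earlier positions.
interface Entity {
  id: string;
  x: number;
  immuneToRewind: boolean;
  history: number[]; // past x positions, oldest first
}

// Advance every entity one tick, recording where it was before moving.
function stepForward(entities: Entity[], dx: number): void {
  for (const e of entities) {
    e.history.push(e.x);
    e.x += dx;
  }
}

// Rewind one tick: immune entities keep their current state.
function stepBackward(entities: Entity[]): void {
  for (const e of entities) {
    if (e.immuneToRewind) continue;
    const previous = e.history.pop();
    if (previous !== undefined) e.x = previous;
  }
}

const player: Entity = { id: "player", x: 0, immuneToRewind: false, history: [] };
const glowingKey: Entity = { id: "key", x: 0, immuneToRewind: true, history: [] };
const world = [player, glowingKey];

stepForward(world, 1);
stepForward(world, 1);
stepBackward(world);
console.log(player.x, glowingKey.x); // prints 1 2
```

Even this trivial version touches every entity's update path, which hints at why bolting the real thing onto a generic engine would be awkward.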
For The Witness, it's a bit more subtle, but about a third of the game required pretty crazy 2D projective analysis of the 3D world (the "environmental puzzles", don't look them up if you don't want spoilers). While it didn't end up being central to the game, it was basically the starting point.
The engines of Jonathan Blow's games are more central to their gameplay than for most games. Still bloody impressive, but probably less unnecessary than one might originally think. Also, Jonathan Blow has pretty strong opinions about game development, and I got the feeling that he disagrees with most generic engines out there. Working with them would probably have caused suffering, a cost he didn't want to pay. (Speaking for myself, my productivity drops pretty sharply when I spot stuff I too strongly disagree with, and I can't fix it.)
Maybe Braid and The Witness could have been written in Unity, but it's not at all clear that it would have saved any time, or that it would have resulted in something as good.
There are a lot of issues I've seen devs run into with using Unity, sometimes much earlier on than you'd expect. Unity was designed to be a generic engine, which means it won't be ideal for every use case.
There's a greater issue that is often ignored here. If React is so large/complex that one ought not create their own version from scratch (I actually question this assertion, but will grant it for the sake of discussion), it begs the question as to how risky it is to bring into your stack. How trivial is it to replace if it's so complex? Vendor lock-in can be a real problem for teams.
Remember the React license fiasco? What if Facebook hadn't caved there and removed all the controversial patent language (by relicensing as MIT)? What if, as a matter of existential risk, you had to make the business decision to remove React from your app?
As such, I think the other component of this particular decision tree should include how much you become dependent on a particular library/module/framework and the consequent risks that introduces to your organization.
There’s a nuance here that I’d like to dive into, though, as well. There’s a parallel that any code you don’t invent, I’d argue, is harder to maintain, if necessary.
Find a bug or want to add new functionality? You’re either now maintaining an internal fork or working through legal to get a patch pushed to OSS (Personal experience, but the idea of there being work involved with changing a dependency is still generalizable, I believe).
The result of this friction, I’ve found, is one of two scenarios. The first is that there is just a lot of glue code. This glue code bridges different libraries in an idiomatic way so that the team can now understand how different parts interact. The second method is the wrapper method. Every piece of useful functionality is abstracted out so that the wrapper writer can do additional things in between and also buffer API churn.
Even with that said, I often fall into the “If someone else can do it for me, I’ll manage this dependency risk” category, as I assume you do as well.
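The wrapper method mentioned above might look something like this minimal sketch. The "vendor" library here is a stand-in written inline so the example runs on its own; `vendorDateLib`, `DateFormatter`, and `makeDateFormatter` are all hypothetical names, not a real package's API.

```typescript
// Stand-in for a third-party library; this API is invented for the example.
const vendorDateLib = {
  toIsoDay: function (epochMs: number): string {
    return new Date(epochMs).toISOString().slice(0, 10);
  },
};

// In-house interface: the rest of the codebase depends only on this.
interface DateFormatter {
  formatDay(epochMs: number): string;
}

// The wrapper is the single place that touches the vendor API, so API churn
// or a library swap is absorbed here instead of at every call site.
function makeDateFormatter(log: (msg: string) => void): DateFormatter {
  return {
    formatDay: function (epochMs) {
      log("formatDay(" + epochMs + ")"); // the "additional things in between"
      return vendorDateLib.toIsoDay(epochMs);
    },
  };
}

const calls: string[] = [];
const formatter = makeDateFormatter(function (msg) { calls.push(msg); });
console.log(formatter.formatDay(0)); // prints "1970-01-01"
```

The cost is exactly the friction described: the wrapper is more code to own, and it only pays off if the dependency actually changes or gets replaced.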
What a regular backend developer would do is to call ffmpeg from his language
What a JS/node dev does is find a random package on npm that does that for him. Maybe he has no idea what ffmpeg is and no idea that he can change parameters and options; he just reads the package readme, installs it and its 10 dependencies, and job done.
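For comparison, calling ffmpeg directly mostly comes down to building an argument list yourself. The sketch below only builds that list, so it runs without ffmpeg installed; in a Node program the result would typically be handed to something like `child_process.spawn("ffmpeg", args)`. The `TranscodeOptions` shape, the function name, and the file names are assumptions made for this example; `-i`, `-c:v`, and `-crf` are standard ffmpeg options.

```typescript
// Hypothetical options object for a simple transcode.
interface TranscodeOptions {
  input: string;
  output: string;
  videoCodec?: string; // e.g. "libx264"
  crf?: number;        // x264 quality setting; lower means higher quality
}

// Build the ffmpeg command-line arguments for the given options.
function buildFfmpegArgs(opts: TranscodeOptions): string[] {
  const result: string[] = ["-i", opts.input];
  if (opts.videoCodec !== undefined) result.push("-c:v", opts.videoCodec);
  if (opts.crf !== undefined) result.push("-crf", String(opts.crf));
  result.push(opts.output);
  return result;
}

const ffmpegArgs = buildFfmpegArgs({
  input: "in.mov",
  output: "out.mp4",
  videoCodec: "libx264",
  crf: 23,
});
console.log("ffmpeg " + ffmpegArgs.join(" "));
// prints "ffmpeg -i in.mov -c:v libx264 -crf 23 out.mp4"
```

Every parameter stays visible and adjustable, which is the point being made about knowing the underlying tool.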
JS Code runs on so many VMs, which makes trivial things difficult. Imagine you have 10 different JDKs and you want to support all of them.
Of course you can write your own code, but you will be in a bubble then and "works for me" will be something that you say very often.
On the other hand, maintaining this code means that you need to have quite solid tests for it, which means that it would be easier to just separate it into a module.
Modules are published on npm. Yeah there are lots of them.
When it comes to environmental concerns two general rules account for most of your design decisions: 80/20 rule and separation of concerns. Environment APIs should be well separated from the core of your application. That core of your application will get you the 80% value from 20% of code/effort.
On my open source projects, for example, I perform a huge ton of test automation. My application runs in browsers, but I don't provide any test automation for the browser environment. I just manually test there occasionally. The effort is too high and the environment is so simple that I can get what I need from brief manual testing. Since the code that runs in the browser (and its documentation) is dynamically assembled maybe half the concerns there are solved by testing the code in other environments more conducive to test automation.
Rationally speaking you don't really need to provide 100% test coverage. You just need to provide enough tests to guarantee the application does all it claims to do.
> Of course you can write your own code, but you will be in a bubble then and "works for me"
That is what test automation and feature tests are for. The application does all it claims to do or it doesn't regardless of whether you wrote original code or cobbled together various untested packages.
> My application run in browsers, but I don't provide any test automation for the browser environment. I just manually test there occasionally. The effort is too high and the environment is so simple that I can get what I need from brief manual testing.
I do the same.
But I must admit, nearly every time I manually test on a different browser, some behaviour or other is different in some broken way, and I spend a while updating the code to handle yet another browser difference.
Perhaps I'm pushing the edge a bit with the sort of things I write, but still. It's astonishing how much variation there is.
I am pretty old school with my approaches to writing for the browsers so I don’t really see these differences in either my JS or CSS. I love template strings, and if it weren’t for my heavy use of those my code would work just fine in IE9 and possibly IE8.
It might be that data protection regulations start to 'encourage' movement in this area regards more careful consideration of the software dependency chain. If you pull in a malicious dependency which results in personal information being exfiltrated, I doubt the "we pulled in so many third party dependencies it was infeasible to scrutinise them" defence is going to mitigate the fines by very much.
That is the ideal path, but sadly most things indicate the system prefers the opposite path. Look at "responsible disclosure", where the contributor is expected to give a centralized temporary-secrecy agency advance warning, and we have to blindly trust them not to weaponize what essentially amounts to an endless stream of 0days (or trust them not to turn a selective blind eye to malicious exfiltration of these 0days).
I like (and basically agree with) the article, but I think it does a good job of pointing out the problem and a bad job of suggesting a solution. The sheer number of dependencies of most commercial software now, and the ratio of backlog to developers, basically ensures that the work required to check all your dependencies does not normally get done.
Hypothesis: it will require a massive failure, that causes the ordinary citizen (and the ordinary really, really rich citizen) to notice that something is wrong, before it changes much.
Hypothesis 2: after that happens, the first language whose dependency manager handles this problem well, will move up greatly in how widely it's used.
For a 100 man-year project we have accumulated around a dozen external dependencies, and only two of them are transitive (one for zipping and one for logging).
I think that’s fairly reasonable and about what I’d expect.
So as you might have guessed it’s not a node project, but that’s my point - perhaps the idea of dependencies is manageable so long as the platform allows you to keep it reasonable. Meaning, at the very least, a good standard library.
I think object-capabilities are one way to have much safer code reuse. Suppose a dependency exports a class UsefulService. In current languages, such a class can do anything: access the filesystem, access the network, etc. Suppose however that the language enforces that such actions can only be done given a reference to e.g. NetworkService, RandomService, TimeService, FilesystemService (with more or less granularity). Then if UsefulService is declared with `constructor(RandomService, TimeService)`, I can be sure it doesn't access any files or exfiltrate any data over the network, and neither do any of its transitive dependencies.
The method of sandboxing using OS processes + namespaces and what not is too heavy and unusable at such granularity.
The method of per-dependency static permission manifests in some meta-language is also poor.
The method of a single IO monad is too coarse. Also using any sort of `unsafe` should not be allowed (or be its own super-capability).
Obviously there are many tricky considerations. [For example, it is anti-modular - if suddenly UsefulService does need filesystem access, it's a breaking change, since it now must take a FilesystemService. But that sounds good to me - it's the point after all.] But does any language try to do this?
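To make the idea concrete, here is a minimal sketch of the constructor-injection shape in Python. It only illustrates the structure: Python does not enforce capabilities (any module can still `import os`), so this is the discipline a capability-safe language would enforce, not a working sandbox. `FixedClock` and `make_token` are names invented for the example.

```python
import random
import time


class RandomService:
    """Capability: the only sanctioned way to obtain random numbers."""

    def __init__(self, seed: int) -> None:
        self._rng = random.Random(seed)

    def randint(self, low: int, high: int) -> int:
        return self._rng.randint(low, high)


class TimeService:
    """Capability: the only sanctioned way to read the clock."""

    def now(self) -> float:
        return time.time()


class UsefulService:
    """Hypothetical dependency; its constructor lists every capability it may use."""

    def __init__(self, rng: RandomService, clock: TimeService) -> None:
        self._rng = rng
        self._clock = clock

    def make_token(self) -> str:
        # Touches only the capabilities it was handed: no files, no network.
        return f"{int(self._clock.now())}-{self._rng.randint(0, 9999):04d}"


class FixedClock(TimeService):
    """A restricted stand-in the caller can pass instead of the real clock."""

    def now(self) -> float:
        return 1_000_000.0


# The caller decides exactly what the dependency gets to see.
service = UsefulService(RandomService(seed=42), FixedClock())
token = service.make_token()
```

If a later version of UsefulService wanted filesystem access, its constructor signature would have to change, which is exactly the visible breaking change described above.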
The problem I see is not the fact that developers choose to rely on third party software reuse and thus create dependencies, but how developers choose which third party software to use. If their judgment fails, the consequences for the user can be dire.
For example, Google chose to reuse the c-ares DNS library for their Chromebooks over other available DNS libraries. It is maintained by the same person who oversees the popular libcurl.
The company issued a challenge and a $100,000 bounty for anyone who could create a persistent exploit with the Chromebook running in guest mode.
As it happened, the winning exploit relied on an off-by-one mistake in the c-ares library.
Users are not in the position to decide which (free, open-source) code is reused in a mass market corporate product. They must rely on the judgment of the developers working for the corporation.
On my personal computers, where I run a non-corporate OS, I prefer to use code from djbdns rather than c-ares for DNS queries. If someone finds an off-by-one mistake in djbdns, and this has negative consequences for me, it will be my own judgment that is to blame.
The real dependency problem is that most languages give out way too much trust by default. Any code can have any side effects.
I'd like ways to guarantee my dependencies have no side effects, as if they were Haskell with no IO/unsafePerformIO, or to aggressively audit and limit those side effects. Malicious event stream package suddenly wants to use the network? No.
Another way to state this is: accept the state of the world and approach the problem using an existing methodology. Treat code as untrusted and whitelist execution paths. SELinux and others do this, and Intrinsic is another product that uses the same approach for app runtime. I think this is probably the future of this problem space.
This is zero trust, and this pattern is showing up everywhere (again?).
There used to be talk about how to increase "reuse" of software, and now that systems use masses of libraries, the down-sides of heavy but casual reuse are coming to light.
I'm not sure of an easy answer. Perhaps the libraries can be reworked to make it easier to only use or extract the specific parts you need, but it's difficult to anticipate future and varied needs well. Trial and error, and blood, sweat, and tears may be the trick; but, nobody wants to pay for such because the benefits are not immediate nor guaranteed.
OOP used to be "sold" as a domain modelling tool. It pretty much failed at that for non-trivial domains (in my opinion at least), but made it easier to glue libraries together, and glue we did.
It's not that hard. You just need to think of dependencies as something that has non-zero benefits and non-zero costs. The problem is that, as usual, wherever you've got a "zero" showing up in your cost/benefit analysis, you're overlooking something. Sometimes it's minor and negligible stuff, but sometimes it's not. Act accordingly.
One thing that I believe we will come to a consensus on is that there is a certain fixed cost to a dependency, analogous to the base cost for a physical store of managing the stock of anything that appears on its shelves, no matter how cheap the individual item may be, and that a dependency will need to overcome that base cost to be worthwhile. I suspect that the requisite functionality is generally going to take in the low hundreds of lines at a minimum, and that we're going to see a general movement away from these one-line "libraries".
I say generally more than a few hundred lines because there are some exceptional cases, such as encryption algorithms or some very particular data structures like red-black trees, where there may not be a whole lot of lines per se, but they can be very dense, very detail-oriented, very particular lines. Most of our code is not like that, though.
Do you mean creating libraries that are flexible and partitioned well for future needs? I do find that hard and almost no library maker I know of gets it right the first time. Analysis of current needs is difficult; analysis of future needs is extra difficult. Experience helps, but is still not powerful enough. The future continues to surprise the heck out of me. Tell God to slow things down ;-)
No, I mean that it's not that hard to do some due diligence when picking a dependency. You just need to get over the idea that it's something you don't need to do.
No, you're not going to read every single line, but you ought to be running through the basics outlined by Russ in his post. If you're being paid to code and you're not doing those basics, you're being negligent in your professional duty.
And knowing the internet and its inability to deal with nuance, let me say again, no, it's not trivial. But it's not that hard, either. If a dependency is worth bringing in, it's bringing you enough value that you ought to be able to spare the effort of doing the basic due diligence.
15 years ago adding an external module was an endeavor involving approval forms, lawyers, etc., so that it frequently was much easier just to develop the required functionality yourself. These days I still shudder seeing how the build goes somewhere and downloads something (usually you notice it only when whatever package manager is being used for that part of the build didn't find the proxy or requires a very peculiar way of specifying it; of course, at companies with transparent proxies people didn't notice even that) ... completely opaque in the sense that even if I spend some time today looking into what is downloaded and where from, tomorrow another guy would just add another thing ...
Is the package management story significantly worse for js/node than other languages or is it just a meme? If it actually does have more issues, why? Are the npm maintainers less rigorous than those of Maven Central (for example)?
Java is lucky enough to have a lot of very solid Apache libraries built with enterprise money. Is the culture different for js and npm?
1) Crap standard library and core language, with lots of accidental complexity. These problems are fixed or (more often) swept under the rug many times over by library authors over the course of years, then eventually the core language/lib provides its own attempt at a fix, but by then there are a ton of implementations out in the wild and it'll take years for a typical mid-size project's dependency tree to shake out all the "deprecated" libraries for any given feature like this, if it ever does. That so many libraries have to pull in other libs for really basic stuff bloats your node_modules dir in a hurry.
2) Platform incompatibility plus no-one actually writes real Javascript that can run directly on any platform anymore anyway, so there are polyfills for yet-to-be-implemented language features and compatibility overlays galore.
3) And yes a lot of it's just the fault of Javascript "culture".
Java/.NET/C++/etc. people don’t have the urge to publish every other line of code they deem “useful”. They also don’t have the urge to import said one-liners when writing a helper method in 15 seconds is perfectly adequate.
> Adapting Leslie Lamport’s observation about distributed systems, a dependency manager can easily create a situation in which the failure of a package you didn’t even know existed can render your own code unusable.
Gold right here. Makes me wonder what Lamport’s TLA+ could be used for in the problem area.
> We do this because it’s easy, because it seems to work, because everyone else is doing it too, and, most importantly, because it seems like a natural continuation of age-old established practice.
And because we literally could not be creating software with the capabilities we are at the costs it is being produced without shared open source dependencies.
I guess this is the same thing as "it's easy", but it's actually quite a different thing when you say it like this.
Dependencies are such a huge pain, but I kinda liked the way we handled it when I did contracting work for the NSA years ago. Essentially we told them _exactly_ what dependencies we needed, including subdependencies, and we audited them the best we could and then we included them. Wanting to avoid this headache meant we were less inclined to just pull in a module for every little thing and, instead, wrote our own where necessary or used modules that had fewer subdependencies.
I think we're ready for a new class of dependencies. Dependencies that have little to no subdependencies. Dependencies that you can more easily audit because of fewer subdependencies.
Also, we need less building of JavaScript code in npm packages. Instead, let people access the raw code so they can not only do tree shaking but they can examine the code that is running versus the code that may be in git. You can still include it and minify it with your stuff. This would also mean you could have larger libraries that do more stuff because you'd only include what you use (think how many Java libraries work except you could pull out what you need).
I don't think there is a good software / npm solution. I think we need to change the way we work with dependencies entirely.
I much prefer Java's model of software dependency consisting of (for the most part) well-documented, large, feature-filled libraries distributed in an easily discoverable and maintainable manner (maven/gradle/...) to the dependency hell that is modern JS libraries. Hopefully newer languages like rust don't succumb to the same trap.
I once took a close look at a Java Web service application running internally at a large bank. It depended on over 3000 jar files, most of them likely transitive. When I queried the rationale of this, the dev team just shrugged it off as common Java practice. I do not think Java is in a significantly better place than the JS world with regard to dependencies.
I am curious if it was actually 3000, or if that is embellished. I've been in about 6 java shops from Nike to startups, and the number is USUALLY around 100.
The reason is really simple - Jar files used to be required to download manually to add as a dependency. So there was a history of about 15 years of doing it the hard way. After maven was introduced, it took a while before OSS libs started adding other libs. Most of the time OSS libs are just including other Jars from their own organization.
It's possibly an embellished number, I cannot remember the actual number but it was significantly larger than 100. I seem to have remembered it as 3000. The project had many frameworks: spring, glassfish, camel among others.
Gotcha, I'm guessing 300 range. That is quite a bit for Java actually - so still counts as pretty bad. Contextually here's a fairly complicated program in our current stack (which is all JS), the node_modules folder has 722 dependencies in it right now. Edit: I was replying to your first edition of the reply. If it's truly 3000 that's quite insane. That being said, the projects with a few hundred would have Glassfish, CXF, Spring, etc...
Yes, that's why I package as much stuff as I can into a hermetic Bazel build, including Python modules (and yes, I build Python programs in Google PAR format using Subpar). They're all stored in my own cloud bucket, the entire transitive closure can be tracked down, and they don't change underneath me willy-nilly. For C++ cross-builds I also package toolchains in a similar fashion. You could also package a toolchain for the host if you'd like, I just don't bother. And I package test data likewise. The build isn't 100% hermetic, but I'd say about 90%. I feel pretty good about this set-up and recommend it to others. Grabbing random packages (and worse, their transitive closures) from the internet as a part of the build sounds insane to me.
The article kind of mangles the relationship between software reuse (which _has_ been here for a long time) and language-specific library management.
Many years ago now, systems administrators were tasked with providing a safe and sane environment for end users and developers by performing the exact due diligence that is described in this article. In the 'move fast and break things' age all this has been thrown to the wind and everyone decries the language manager code sprawl and breakage. Of necessity enterprises revert to 'immutability' as if it was a desirable and necessary deployment characteristic. This is an ugly time in IT.
The way I see it, our over-dependency (sorry for overloading the phrase) on Javascript as the de-facto web language has the pendulum far in one direction. How much longer can we keep this up? What's the maximum capacity of a developer ecosystem before dependency-hell and framework churn reaches critical mass? This is still a complicated information system - how far can it scale? What's the breaking point?
There's so much amateur work and muddied merit-sense-making of what's good software, who to listen to, and how to move forward - my feeling is that pendulum is just about at peak.
But what of the newbie developer? Is he/she going to just roll their own dependencies and do so in a way that's tenable? Green developers make up most of the category.
I guess I was trying to approach a few concerns beyond just dependencies: learning curve, conventions/standards, framework volatility, and merit assess-ability of ideas.
The more people involved (popularity), the greater the difficulty to parse the merit of an idea without pre-existing competence. How easy is it for a new developer to find a cogent way of doing things in Javascript land compared to a smaller more specific ecosystem? In the smaller ecosystem the experts are easier to determine due to a smaller population, whereas in Javascript-land there's so many people, opinions, articles, and conventional disparities; a much more challenging exercise.
This sounds like a problem pretty unique to Javascript, honestly.
I think if you saw a professional C++/C#/F#/LISP*/Clojure dev etc pulling in a dependency to do IsOdd you'd rightly laugh at them... Yet in JS, that just seems like an acceptable thing to do.
I don't understand why the standard is so low. Is it because of all these learn Javascript / Web development in 12 weeks bootcamps?
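For scale, the kind of helper being discussed is a one-liner in most languages. A Python version, shown only to illustrate how little code the IsOdd dependency replaces (the function name is chosen here for the example):

```python
def is_odd(n: int) -> bool:
    """Return True when n is odd; also correct for negative integers."""
    return n % 2 != 0
```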
Along the same lines is Docker Hub. Blindly building your own images via Dockerfiles that pull from others' images should warrant serious consideration, especially given that those images can be updated at any time.
"Dependency managers can often provide statistics about usage"
Using module usage statistics as a proxy for trust is not always a good idea.
For example, I confirmed with the security team of npm that they do not audit module download statistics, i.e. no detection of gaming the system through multiple downloads from a given IP.
It's quite possible for a module to have 10,000 weekly downloads, all generated by a cron curl script run by the module's author.
I wouldn't be surprised if this was the case for not a few modules on npm, especially to develop trust for later exploits.
- Cost of creating a package must be low (ideally the package just lives in source control). This encourages code reuse and therefore testing.
- Verification of changes must be easy. Git is a great tool for this - we can review patches between versions, rather than whole versions at once.
- It should be easy to extract the dependency graph (including transitive deps) so that you can analyze who you are trusting.
- There must be a verifiable chain from package source to package bundle (NPM fails here: do you really know the source code reviewed on GitHub is what went into the bundle on the NPM registry?). Better yet, have no bundles at all, just source code + build instructions.
- Reproducible installations (usually implemented via lock-files) are critical. Many package managers have lock-files that do not actually give reproducible installs. Beware!
- Package builds must be isolated from each other (otherwise one package might tamper with another; I believe this is possible in NPM packages).
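As a small illustration of the "extract the dependency graph" point in the list above, the set of packages you end up trusting is the transitive closure of your direct dependencies. A minimal sketch, assuming the package manager can give you a map from each package to its direct dependencies; the package names below are made up:

```python
from collections import deque

# Hypothetical direct-dependency map, as a package manager might report it.
DIRECT_DEPS = {
    "my-app": ["web-framework", "logger"],
    "web-framework": ["http-parser", "logger"],
    "http-parser": [],
    "logger": ["string-padder"],
    "string-padder": [],
}


def transitive_deps(root: str, graph: dict[str, list[str]]) -> set[str]:
    """Return every package reachable from root, excluding root itself."""
    seen = {root}  # seeding with root keeps cycles from looping forever
    queue = deque(graph.get(root, []))
    while queue:
        pkg = queue.popleft()
        if pkg in seen:
            continue
        seen.add(pkg)
        queue.extend(graph.get(pkg, []))
    return seen - {root}


trusted = transitive_deps("my-app", DIRECT_DEPS)
print(sorted(trusted))
# → ['http-parser', 'logger', 'string-padder', 'web-framework']
```

Here `my-app` declares two dependencies but trusts four packages, and every one of their authors.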
This is excellent. Not only for the subject matter but the quality of writing. I often take an article like this, distill it into my own (usually fewer) words and save it as a text file. This one I kept "distilling" only to realize, nope, nope, the way he said it was more exact/precise/correct.
This is probably a prelude to a deeper discussion of the module notary service that the Go project intends to run. First announced in this post from the end of last year: https://blog.golang.org/modules2019
There aren't a lot of tools out there to keep you 100% up to date and to keep moving.
There is maven-dependencies that can auto upgrade, however, that's just a simple version upgrade and may have issues with non-standard versioning. Also it doesn't help with transitive dependency conflicts.
We need good tools to alert and stop transitive dependency conflicts in their tracks. Versions helps with this, but it doesn't tell you much.
What we do need: Jenkins dependency triggers for the projects. We need something that will automatically work with SCM and CI to create commits based on newly found dependencies. If something changed, your tests should confirm whether it works or not.
Worst case is more along the lines of a bad actor makes malicious changes to the dependency which you then unwittingly deploy to prod potentially compromising your entire system.
Really I couldn't help but facepalm when I heard about that. "Haven't those people heard of local caching?!"
Really, and also freeze the version number to what you know will work while you are at it. Unless it is an actual security and/or standardization-critical component (say SSH), it can wait until you know what it will do and that it won't break anything. It is good for bloat avoidance, security, and reliability.
Reducing the number of dependencies can avoid many of the problems, makes it easier to examine the code, and is less likely to cause problems (of several kinds). Much code has too many dependencies. Whether writing in JavaScript or C or something else, I will usually not use many external libraries; most commonly none at all.
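One way to make a frozen version more than a number is to record a content hash next to it and check the locally cached artifact against that hash before using it. A minimal sketch, assuming a lock structure that maps (name, version) to a SHA-256 digest; the package name, bytes, and `verify_pinned` helper are invented for illustration and do not correspond to any particular package manager's format:

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def verify_pinned(name: str, version: str, artifact: bytes,
                  lock: dict[tuple[str, str], str]) -> None:
    """Raise ValueError unless the artifact matches the hash pinned in lock."""
    expected = lock.get((name, version))
    if expected is None:
        raise ValueError(f"{name}=={version} is not pinned")
    actual = sha256_hex(artifact)
    if actual != expected:
        raise ValueError(
            f"{name}=={version}: hash mismatch (expected {expected}, got {actual})"
        )


# Hypothetical cached artifact and the lock entry recorded when it was reviewed.
cached_artifact = b"def pad(s, n): return s.rjust(n)\n"
lock = {("string-padder", "1.0.2"): sha256_hex(cached_artifact)}

verify_pinned("string-padder", "1.0.2", cached_artifact, lock)  # passes silently
```

If the cached bytes for that version ever differ from what was reviewed, the check fails instead of silently shipping the new content.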
I am pretty new to development, and I keep trying to prove myself wrong over my apprehension to willy-nilly accumulate dependencies just because “the time savings add up”.
Before starting any new project, I research and try all the existing similar projects I can find. I can predict their stability with overwhelming precision just by glancing at the dependencies, so the few projects I have built use only the most vanilla version of mainstream dependencies.
And another result of this observation has been that I have come to devalue the word of devs with that happy-go-lucky approach to dependency accumulation. It seems to correlate with the exaggerated optimism that persists around everything in the development community. I’d like to be more optimistic just like everyone else, but ignoring debt like this doesn’t seem like the right way to do it.
I'll just go ahead and take the downvotes/burial/lectures/ridicule whatever but I need to say it anyway. I've been programming for thirty years and in my opinion effective code reuse with npm is one of the greatest achievements in the history of software engineering. It's not perfect but it should be appreciated more and the issues are being overblown.
It seems that package managers, like web browsers before them, are going to have to provide some form of sandboxing. The problem is the same. We're downloading heaps of untrusted code from the internet.