Surviving Software Dependencies (acm.org)
147 points by kick 16 days ago | 37 comments

Good read, I like that he expanded on the assumptions we make when we include a dependency. However, I think his treatment of dependency upgrades is a bit confused. Of course Equifax should have upgraded their Struts dependency, but that is not the result of relying on software reuse, as security vulnerabilities are found in all code. If they had built their own framework, the team that produced it could just as easily have published a security patch that was ignored. So to me the point is that important upgrades, whether proprietary or FOSS, should be made as backward compatible as possible. Fortunately, they usually are.

A related issue that wasn't mentioned in the article is the problem of forced upgrades. Dependencies can arbitrarily introduce incompatibilities that don't align with your own priorities, so that you end up spending a lot of time keeping current with package releases that your users really don't care about. Publicly facing services with a broad attack surface should choose their dependencies carefully, as they'll be forced to upgrade often. For services behind a secure firewall, upgrades are less urgent.

I think you missed his point on the Equifax story. I think he was trying to illustrate that their reliance on the Struts dependency came with a responsibility both to know where that dependency was being used and to monitor that usage so that they weren't left vulnerable by it.

It's entirely orthogonal to the responsibility of the dependency's publishers, which you address in your comment. Both are important. But Russ's audience wasn't publishers. He was talking to the consumers and what they need to understand when they take on a dependency.

> If they had built their own framework, the team that produced it could just as easily published a security patch that was ignored.

I believe you're just making the inverse assumption from Cox. He seems to assume a monorepo-esque setup where everyone depends on HEAD, and you seem to assume a microrepo-esque setup where everyone declares their own dependency revisions.

Back in the day we had curated libraries. E.g. Roguewave, Boost, etc. You chose your libraries relevant to domain needs and stuck with them. Adding a new library to the project was a big deal.

This may still be true in C & C++ land, and to a small extent in the Java world with the Spring libraries. But Node, Python, et al. seem to have zero curation and an explosion of transitive dependencies. By making it easy to reference arbitrary dependencies, projects haul in random unscrutinised dependencies.

> But Node, Python, et al. seem to have zero curation and an explosion of transitive dependencies. By making it easy to reference arbitrary dependencies, projects haul in random unscrutinised dependencies.

Adding a library should be easy and I think we shouldn't blame ease of use. I'm worried too about the explosion of dependencies in Node, Python and Rust in my case specifically, but at the same time I'm happy that I don't have to waste my time with the boring work of adding a new library like in C or C++.

What is needed in my opinion is three things:

1. Developers should be more suspicious. I believe security disasters to come will solve this problem over time.

2. A better way to express trust in package managers. I would like to be able to express trust or revoke trust against persons and groups and not against packages. For example, in the Rust world there are a few packages written by folks that mainly write the standard library and work on the compiler. I trust these people, and if I didn't I probably couldn't use Rust at all. So I don't want to spend a single thought on importing one of these crates. Crates from others will undergo more scrutiny, but ultimately using a package or not boils down to the fact that I trust certain people and by default I don't trust everyone else. What I want is to be able to express this to my package manager, so that it prevents me from using stuff from untrusted entities and at the same time doesn't bother me otherwise.

3. Curated repos. When I wrote Java for big corp we had to use a curated Maven repo. The regular public Maven repo was not allowed. The curated version was supplied by a third party, Sonatype in this case. Back then I thought this service would become a big deal, that we would soon have many companies providing services like Sonatype and that they would blossom. It never happened, but maybe the time was not ripe for it...
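
Point 2 above (trusting people rather than packages) could be sketched roughly as follows. This is only an illustration of the idea, not how any existing package manager works: the publisher names, the package names, and the `untrusted_packages` helper are all hypothetical, and it assumes the registry can tell you who publishes each package.

```python
# Hypothetical data: every name below is made up for illustration.
# People or groups I have explicitly decided to trust.
TRUSTED_PUBLISHERS = {"alice", "carol"}

# Package name -> set of publishers with release rights (assumed to be
# available from the registry).
PACKAGES = {
    "core-utils": {"alice"},
    "fast-json": {"alice", "bob"},
    "shiny-new": {"mallory"},
}


def untrusted_packages(packages, trusted):
    """Return the names of packages with at least one publisher outside
    the trusted set, sorted by name."""
    return sorted(
        name
        for name, publishers in packages.items()
        if not publishers <= trusted  # subset test: all publishers trusted?
    )


if __name__ == "__main__":
    for name in untrusted_packages(PACKAGES, TRUSTED_PUBLISHERS):
        print(f"needs review before use: {name}")
```

With a check like this, revoking trust in one person flags every package they can publish, without listing those packages one by one.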

> This may still be true in C & C++ land, and to a small extent in the Java world with the Spring libraries. But Node, Python, et al. seem to have zero curation and an explosion of transitive dependencies.

That's a great point. I wonder if there is any interest in providing a Boost-esque package bundle for Node or Python, where maintainers adopt a selection of packages and treat their official releases as unstable/bleeding edge releases.

Don't really agree about Python. The trend is still to larger projects maintaining a set of libraries that are relatively self-contained.

Off-topic, but reading "Roguewave" gave me a sudden flashback to an old project.

Genuinely a name I've not heard for 20+ years. Thanks for the memories!

Their C++ stdlib docs used to be best in class around 2008... maybe even later.

I just spent two weeks trying to figure out code at work that required 8 dev dependencies from npm locked down to certain version numbers not specified in the package.json.

That ends up being hundreds of total dependencies to run two build steps and 1 http service. What a waste of time and code.

How did we get so drug addicted to shitty outside code? Had I not been completely new to the team I could have written a better solution in about an hour directly with Node.

You need one function so you add a library. But that library contains dozens of functions and uses other libraries to implement those functions. So now you have all those dependencies even if you don't use them.
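
A minimal sketch of how that adds up, assuming the dependency graph is available as a plain mapping. The graph and every package name in it are invented for illustration; a real project would read this from its lock file or package manager.

```python
from collections import deque

# Hypothetical dependency graph: package name -> direct dependencies.
DEPS = {
    "my-app": ["pad-helper", "http-kit"],
    "pad-helper": [],
    "http-kit": ["url-parse", "tiny-log"],
    "url-parse": ["tiny-log"],
    "tiny-log": [],
}


def transitive_deps(graph, root):
    """Return every package reachable from root, excluding root itself."""
    seen = set()
    queue = deque(graph.get(root, []))
    while queue:
        pkg = queue.popleft()
        if pkg in seen:
            continue  # already visited via another path
        seen.add(pkg)
        queue.extend(graph.get(pkg, []))
    seen.discard(root)  # guard against cycles that lead back to root
    return seen


if __name__ == "__main__":
    direct = set(DEPS["my-app"])
    everything = transitive_deps(DEPS, "my-app")
    print("asked for:", sorted(direct))
    print("also got: ", sorted(everything - direct))
```

Here the app names two dependencies and ends up with four, because `http-kit` pulls in two more that the app never asked for.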

Do you really need that one external function though? At what point is it better to simply write it yourself?

It's a matter of balance, always. A thin line. And dogmas that people on teams follow (we all do in some way).

Not invented here syndrome, always create a class, always put a class in its own file, premature optimization blablah, don't use goto, this and that is bad practice in that language, don't optimize until you need it (but you later don't have time to profile, or don't really want to...)

Never-invent-here syndrome is just as bad as not-invented-here syndrome, though.

No, never-invent-here (aka extensive 3rd party library reuse) is orders of magnitude worse. NIH means you retain full control of your codebase. It's really determinism that leads to you being able to make reliability/robustness, availability and security guarantees.

Never-invent-here is a disaster waiting to happen.

External code is better than internal code because you can reduce it to internal code just by forking it. If it is even minusculely useful, that's a win.

My main point is that if it's something you cannot write exhaustive tests for yourself in a few minutes, then forget it and use something that has been in the field.

That is not a compelling argument. Yes, tests are important. However, tests also require maintenance, so they are debt just like your dependencies. If writing my own build and HTTP service requires a few tests I would happily take that over failing for weeks with hundreds of megs of extraneous code that doesn't work.

Honestly, I think people justify this nonsense to themselves because they are scared to write original code, even if it's incredibly tiny. The primary purpose of writing software is automation; if you aren't automating things with everything you write you are probably just an expense center. This is the financial way to say that your misplaced objectives make you unproductive and/or unreliable. It doesn't have to be that way.

You don't need hundreds of megs. Use a stripper to strip out the parts you don't need.

I don't govern the mega corp.

IMO it's a bad state of modularity. There should never be a desire to write a function yourself out of fear of bringing unused code into the project.

Imagine a world where you can import a single function and it'll bring in only the required dependencies, no more.

But, yeah, currently you have to weigh whether it's better to write the code yourself or bring in external libraries with some burdens.

IMO, you get NPM when you get venture capital pouring money at script kiddies and tech evangelists. No worries, it will all burn down eventually.

This reminds me of a recent post I made, urging people to not use EntityFramework.


It's almost never worth it.

This is a great write-up, but it seems to be missing a key ingredient in deciding to use a dependency. That is, are you willing to become a contributor on the project, and/or are the maintainers open to your participation? Example: You may really want to use a dependency, but test coverage is lacking. Are you willing to contribute the test coverage?

The open source model is about building an alliance with others so that you do not have to create the whole tech stack yourself. It's the only way to compete with a monopoly and win.

That's certainly important for some users for some dependencies. But most users cannot and should not contribute to most of their dependencies.

The overhead of reviewing all those contributions would be enormous, and the effort required to make a quality contribution that doesn't waste other people's time would be impossible to justify in most cases.

Getting involved in a project also means dealing with its politics and drama. And the entitlement, criticism and blame from users.

And in a lot of cases, it would be less effort to write the functionality you actually need (and the tests for it) than to write the tests for an entire OS project (and then deal with all the other crap).

People say this and then you find out it means that they brag about reducing their dependencies by 1%. How big is the code base for your entire system? We're not shipping 128kB cartridges anymore.

The way people carelessly added dependencies to their Python projects kept me away from the language for way longer than it should have.

It’s another thing you have to be aware of when you’re programming just like good naming and problem decomposition.

Not just Python. For some reason, Java developers all got together without consulting me and decided that

    import org.apache.commons.lang.StringUtils;

    if (!StringUtils.isEmpty(s))

is better than

    if (s != null && s.length() != 0)

bonus points for:

    if (s != null && !StringUtils.isEmpty(s))

OK, I'll bite. The third option is obviously wrong. The first one should instead have been "if (StringUtils.isNotEmpty(s))".

But it seems like you prefer to type repetitive code that is more difficult to read, rather than use a function, if that function is in a library. Did I get that right? Is that because the repetitive code is relatively short (it gets longer if the variable has a longer name), or because there is the extra line required for the import?

Apache Commons is the "things that should have been in Java SE, but are missing for mysterious reasons" library. I can barely imagine a project that wouldn't benefit from some of its functionality. (Of course, there is always the alternative option of reinventing the wheel.)

By the way, with static imports the code can be reduced to "if (isNotEmpty(s))".

Ok, I’ll bite back. Yes, “StringUtils.isEmpty” is materially worse, in every way. It’s not really any more readable: the first time you see it, you ought to check that it is just testing whether the pointer is null or the string is 0-length, and that it’s not doing something else, like trimming whitespace, which you may not want. You’re adding a few hundred kilobytes of dependency plus an extra unnecessary function call, just to make it less clear what your program is doing. If you use static imports, it’s even less readable because it’s not at all clear where this mystery function came from: I hope you’re lucky enough to be able to import this code into an IDE and click through the function to see what it is and what it does. Maybe if it was just this one function it wouldn’t be so bad, but this bit of pointlessness ends up scattered everywhere.

This is great!

I feel this is the missing guide we should have referred to, in our recent writeup:

"Software Engineering for Scientific Big Data Analysis" - https://doi.org/10.1093/gigascience/giz054

We were touching on the issue in point 1 in that paper, and did have a discussion about it internally, but this guide really takes a holistic view on the problem.

Thinking about all of this, I'm also happy we managed to build our workflow manager SciPipe (http://scipipe.org) completely without code-level dependencies :)

To me the Equifax story is the Nth such story that points out a crying need for a better security model. In 99%+ of cases, single teams can no longer produce 100% of the code that companies rely on. Most teams do not have (or cannot afford to hire) people with the expertise to audit, or even the ability to understand, the implementation details of the third-party code they use. Consequently any 3rd party code should only be allowed to access the resources or data structures that it absolutely needs to carry out some work on behalf of its caller. This is separate from managing the scale and complexity of s/w dependencies.

The fundamental problem is that as a group we write too much code, building dozens of approximately equivalent stacks and ecosystems instead of compromising on a few things of much higher quality that require a little customization.

A highly relevant study:

"Small World with High Risks: A Study of Security Threats in the npm Ecosystem"


The paper is full of hard data.

This seems to be a copy of the blog post from January 2019. Previous discussion: https://news.ycombinator.com/item?id=18979596

Mods, please update the status of the link.

The ACM article used a bunch of dependencies that break the text formatting. Russ's original blog post is well formatted.
