
Thoughts on “Dependency hell is NP-complete” - cwp
https://thefeedbackloop.xyz/thoughts-on-dependency-hell-is-np-complete/
======
thinkpad20
Nix has a different approach to the problem, with its own set of benefits and
drawbacks: packages do not specify version ranges of their dependencies at
all; instead they reference them directly, most commonly via variables (more
generally, via any arbitrary expression). For example, if I'm writing a
python package which depends on flask, I might list `flask` as a dependency of
my package, or perhaps `flask11` or something.

    
    
        # pseudocode: flask is itself a package value, pinned to one exact source
        flask = buildPythonPackage {
          name = "flask-0.12";
          source = fetchTarball { url = "https://..."; };
        };

        my_package = buildPythonPackage {
          name = "my_package";
          source = ./path/to/source;
          dependencies = [flask];  # a direct reference to the value above
        };
    

Whatever that variable resolves to is the thing which will be built (and is
itself defined in a similar way), so "version resolution" is replaced with a
pure computation, the evaluation of an expression.

This approach offers a number of advantages, including very fast (roughly
linear) calculation of what to build, full determinism in what gets built, and
the ability to specify any arbitrary nix expression as a dependency, say some
C library or a command-line tool, rather than being limited to packages
written for your particular language.
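
To make the "roughly linear" claim concrete, here is a minimal Python sketch
(hypothetical package names, not Nix's actual implementation): because each
package names its exact dependencies directly, computing what to build is just
a walk of the dependency graph, with no backtracking.

        # Hypothetical sketch: Nix-style "resolution" as a plain graph walk.
        # Each package pins its dependencies by direct reference, so there is
        # no constraint solving -- we just collect the transitive closure.

        packages = {
            "my_package": ["flask"],          # `flask` names one exact package,
            "flask": ["werkzeug", "jinja2"],  # not a version range
            "werkzeug": [],
            "jinja2": ["markupsafe"],
            "markupsafe": [],
        }

        def closure(root):
            """Everything reachable from `root`: O(nodes + edges)."""
            seen, stack = set(), [root]
            while stack:
                pkg = stack.pop()
                if pkg not in seen:
                    seen.add(pkg)
                    stack.extend(packages[pkg])
            return seen

        print(closure("my_package"))
        # e.g. {'my_package', 'flask', 'jinja2', 'markupsafe', 'werkzeug'}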

However, of course, it does come with some drawbacks. Among them is that you
often can't simply use nix out-of-the-box to build your code; depending on how
developed the nix ecosystem is for a particular language, you might have to
figure out for language X how packages are built, tested, installed, and
linked together, and write the appropriate abstractions in nix. It also means
that you won't automatically pick up later versions of a dependency when you
(re)build, so you might miss out on updates. This latter drawback is arguably
a benefit, though: by the same token, you won't accidentally pull in
regressions via an update to a dependency.

~~~
skrebbel
I don't understand this. You're saying that the way the dependencies are
written down (with code, not declarations) changes the NP-completeness
characteristics of dependency resolution.

I strongly doubt that, just like a search in an unindexed database table is
O(N) no matter whether you write it as an SQL query or a for loop.

What assumptions about dependencies does Nix make or weaken to get
linear-time resolution? Any of the 4 Russ Cox lists? Or did Nix come up with
some genius insight showing that there are more than 4 core assumptions in
the NP-completeness proof, and the 5th is the one that can be safely
dropped?

~~~
DougBTX
> > Whatever that variable resolves to

The "fifth" or perhaps "zeroth" assumption is that the versions of all the
dependencies must be resolved by a single system. If the developer does half
the work by e.g., defining explicitly what flask is, then the remaining part
of the system won't have to be NP complete any more.

I think that's what the parent post means by "won't automatically pick up
later versions of a dependency". Since it isn't doing full dependency
resolution, it doesn't need to solve the whole problem, but also doesn't give
all the benefits.

~~~
smallnamespace
Even though the developer does half the work, the system of 'Nix-using
developer + Nix' is still constrained by NP-completeness, so in practice the
two combined will not be able to solve the 'dependency resolution problem' in
its full generality without a potentially exponential search.

So the real question is: which assumption does the combination of 'Nix-using
developer + Nix' give up?

~~~
cwp
It violates assumption #4. With nix, you can have multiple versions of a
package installed.

------
Too
> _I am personally very interested in the idea of allowing packages to mark
> their dependencies as either "shared" or "internal". We could then disallow
> shared dependencies from duplicating, but allow internal dependencies to
> duplicate freely._

CMake has this feature: you can mark dependencies as public, private, or
interface. Public means that sub-dependencies can be reached through your
public interface; private means they stay internal to the program and can
be replaced while still maintaining full backwards compatibility with your
public interface.
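
As a rough illustration of the shared/internal idea quoted above, here is a
hypothetical policy check in Python; this is not CMake's or any real tool's
algorithm, just the concept:

        # Hypothetical policy check for the shared/internal split: "shared"
        # dependencies must resolve to one version across the whole build;
        # "internal" ones may duplicate freely, since they never leak
        # through a public interface (compare CMake's PUBLIC vs. PRIVATE).

        requirements = [
            # (dependent, dependency, exact version, visibility)
            ("app",  "json_lib", "2.0.0", "shared"),
            ("libx", "json_lib", "2.0.0", "shared"),    # must agree with app's
            ("app",  "left_pad", "1.0.0", "internal"),
            ("libx", "left_pad", "0.9.0", "internal"),  # duplication is fine
        ]

        def check(reqs):
            shared = {}
            for dependent, name, version, visibility in reqs:
                if visibility == "shared":
                    pinned = shared.setdefault(name, version)
                    if pinned != version:
                        raise ValueError(f"{dependent} wants {name} {version}, "
                                         f"but {pinned} is already shared")
            # internal deps need no check: each dependent keeps its own copy

        check(requirements)  # passes: shared versions agree, internal ones differ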

------
jpollock
I didn't understand why it would be NP-complete; dependencies form an
acyclic graph.

However, this is the killer:

"Two different versions of a package cannot be installed simultaneously."

Get rid of that, and your problems disappear.

Which is why my linux server has:

gcc-4.8-base:amd64

gcc-4.9-base:amd64

gcc-5-base:amd64

gcc-6-base:amd64

~~~
lazyjones
> _dependencies represent an acyclic graph_

AFAICT that's not always true. Several scripting languages allow circular
dependencies (Node.js, Perl), and so do some popular package managers (with
specific workarounds, but the point is that such dependency constellations
exist): [https://www.debian.org/doc/debian-policy/ch-relationships.html](https://www.debian.org/doc/debian-policy/ch-relationships.html)

Perhaps the problem needs to be examined differently depending on whether
circular dependencies are allowed or already exist in the wild.

~~~
dozzie
>> dependencies represent an acyclic graph

> AFAICT that's not always true.

And you are right. apt-get has had "upgrade" and "dist-upgrade" commands for a
long, long time for a reason (the latter allows upgrading packages in the
presence of dependency cycles and some other cases that would be nasty if left
in the middle of an upgrade; obviously, no changes that need dist-upgrade land
in Debian stable).

------
tolmasky
At some point we will do a more in-depth write-up about this, but I think we
have a unique take on this at RunKit, as we've effectively installed every
version of every package. It's actually more interesting than that: we've
installed every reasonable combination of dependency resolutions for every
version of every package on npm.

Long story short, I think there was tremendous insight in npm 2's original
model of having every dependency duplicated. Beyond allowing neat things like
diffing two versions of the same package ([https://runkit.com/tolmasky/api-diff-example-4](https://runkit.com/tolmasky/api-diff-example-4)),
I think it most accurately portrays the meaning of a package/version.

The fundamental problem with trying to "unify" all the dependencies a program
will need is that it almost guarantees that code will be running in an
untested state. Package A was written when Dependency 1 was at version 1.0.0.
By assigning it the semver range 1.x.x (instead of the strict 1.0.0
dependency), at some later point it will run against code it's never seen or
been tested against before. Sure, semver ranges are supposed to protect
against this having a "meaningful" effect, but this simply kicks the can down
the responsibility chain, changing the error to "human error" as if that makes
a difference. For example, let's say at the time of publishing, Dependency 1
had a bug that Package A accounted for. According to semver, a fix for this
bug would qualify as a 1.0.1, but now Package A's workaround may actually
_break_. The fact of the matter is: 1. rarely is code fully tested, 2. humans
can't be trusted to properly assign semver, 3. semver doesn't even account for
every possible "meaningful" change (I've seen performance changes that had
devastating effects despite the "result" being the same). Different code is
different code.

Now, you may be saying: what does this have to do with npm? It sounds like
what you want is strict versioning. The problem is that compounding this
already tenuous situation with versions that change depending on the
combination of dependencies you have (in order to maximize deduping) makes
things even worse. Let me give you a _security_ example:

1. Package A relies on Dependency 1, ranged at 1.x.x.

2. Dependency 1 is found to have a security problem, so a 1.0.1 fix is
issued.

3. Package A correctly no longer suffers from this problem.

Later, Package B is added, into which someone has maliciously, 10
sub-dependencies deep, added a strict dependency on Dependency 1 at 1.0.0.
The deduping algorithm now says:

1. 1.x.x and 1.0.0 are the requirements; I will use 1.0.0.

Now Package A is using an insecure version of Dependency 1. Previously, if
every package was given its own dependencies, the issue would have been scoped
to Package B; instead it now infects everyone that relies on this Dependency.
Sure, you can fix this by saying "dedupe after giving every package its
highest possible dependency", but then you still suffer from monkey-patching
affecting other code. You end up with very difficult-to-understand results:
sometimes all the code uses the same package, other times not, seemingly
randomly. The problem is that a performance issue is being turned into a
functional issue. Packages should be black boxes, not things whose guts you
change regularly. There already exists a tool for sharing dependencies:
"peerDependencies". It says "I expect this to be around" vs. "I am including
this".

The basic idea here is that you would never say "let's arbitrarily replace
functions I've written in my program with approximately the same functions in
order to be able to share them and reduce program size". This is nuts!
Package versions should be considered hashes that only coincidentally have
human names. npm 2 is the closest I've seen anything come to this, and it
creates the least surprises.

Most of the remedies I've seen revolve around solving the performance or
efficiency problems this model has. But I think it's important to first make
it right, THEN make it fast. You can definitely imagine a system that uses
content hashes to get the space efficiency without giving up the conceptual
separation. In fact, IED does this quite well:
[https://github.com/alexanderGugel/ied](https://github.com/alexanderGugel/ied)
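
A minimal sketch of the content-hash idea (hypothetical, not IED's actual
implementation): store file contents once under their hash, and let every
duplicated package tree point at the shared blob.

        import hashlib

        # Sketch: content-addressed storage keeps npm-2-style duplicated
        # trees while storing identical file contents only once.

        store = {}  # content hash -> file bytes

        def add_file(content: bytes) -> str:
            key = hashlib.sha256(content).hexdigest()
            store[key] = content  # idempotent: same bytes, same slot
            return key

        # Two packages each vendor their own copy of the same left-pad file:
        tree_a = {"node_modules/left-pad/index.js": add_file(b"module.exports = ...")}
        tree_b = {"node_modules/left-pad/index.js": add_file(b"module.exports = ...")}

        # Conceptually two separate trees; physically one stored blob:
        print(len(store))  # 1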

~~~
tlrobinson
A couple points:

1. The npm 2 model breaks down for client-side applications where you want to
minimize the download size. I don't want 5 slightly different versions of
left-pad when 1 will do.

2. I think of semver ranges in package.json as expressing what versions the
developer believes _should_ work, and npm-shrinkwrap.json or yarn.lock as
expressing the latest versions that have _actually been tested_.

I think it's insane to use npm without shrinkwrap (even though it has its own
problems). One of the best things about yarn is that it generates yarn.lock by
default.

I suppose this view is problematic in the context of RunKit, which doesn't
have any sort of lock file and where one would be challenging to implement
UX-wise.

3. Packages with known security issues should be marked as deprecated, which
causes a warning to be printed every time they're installed.

Malicious packages can do a lot worse things than forcing another package's
dependency to a specific version, though that is a particularly subtle attack.

~~~
tolmasky
RunKit generates a shrinkwrap on the fly. In that sense it is the same as
yarn.

------
zzzcpan
If you keep dependencies resolved at all times and operate on packages with
hardcoded versions of dependencies, as Nix does, you can avoid the
NP-complete problem.

Sadly, I remember people involved in Go packaging discussions being dismissive
of Nix's ideas.

~~~
sagichmal
You remember incorrectly. Nix, unfortunately, just isn't suitable for non-
final dependency resolution, i.e. libraries. And that's an important part of
the mandate for language packaging discussions.

~~~
zzzcpan
Nix itself isn't suitable for many things, but the ideas those guys came up
with are great and work well.

------
hellbanner
Relevant: The Science of Security, arguing against Turing-complete programs
when less powerful ones will suffice:

[https://www.youtube.com/watch?v=v8F8BqSa-XY](https://www.youtube.com/watch?v=v8F8BqSa-XY)

