
Always Review Your Dependencies, AGPL Edition - zdw
https://www.agwa.name/blog/post/always_review_your_dependencies
======
marcan_42
> The bulk of my trust is consolidated in the Go project, and thanks to their
> stellar reputation and solid operating procedures, I don't feel a need to
> review the source code of the Go compiler and standard libraries.

Well... I _have_ reviewed the Go runtime, after I ran into a bug in it... And
that's one of the reasons I no longer use Go unless forced to. That thing is
_ugly_ under the covers. There is seriously quite a bit of insanity and poor
design going on, e.g. when they did the ARM port they had to make a ton of
changes to data structures/functions to include the link register, because
they didn't or couldn't abstract out architecture specific tidbits like that.
Also, the whole C interop stuff is nuts with a ridiculous call chain and
several stack switches every time you call to or from C code (this is where I
found the bug).

Not saying people shouldn't use it, but I wouldn't hold the deep guts of Go up
as an example of stellar development practices. Maybe all the stdlib stuff on
top is prettier, but the runtime sure isn't.

~~~
grammarxcore
I ran into a speed issue last year messing around with Go RSA keys. Turns out
it's an open bug. Even though fixes have been made (see links in the thread),
it's not a global fix and Go ciphers can be abysmally slow.

[https://github.com/golang/go/issues/20058](https://github.com/golang/go/issues/20058)

The recommended fix is to use a library like this one. However, that means
your containers blow up with complicated dependency trees so it's not really a
good solution for a distributed container architecture (eg Kubernetes).

[https://github.com/ncw/gmp](https://github.com/ncw/gmp)

I love working with Go because of its simple binaries and small containers,
but there are some things that it just does not do well.

~~~
dependenttypes
I don't know if using gmp for RSA is a good idea. I am pretty sure that gmp
would open you to side channel attacks due to its operations not being
constant time.

~~~
grammarxcore
I abandoned the project because I couldn't find a reasonable solution using
standard Go libs that worked in a reasonable timeframe. The bottleneck is this
function.

math/big.addMulVVW

There was some work on it recently.

[https://go-review.googlesource.com/q/addMulVVW](https://go-review.googlesource.com/q/addMulVVW)

But I feel like this issue might have been ignored.

[https://go-review.googlesource.com/c/go/+/164966](https://go-review.googlesource.com/c/go/+/164966)

It might be addressed in Go 1.14 (although it's been marked as Backlog since I
last looked at that issue).

[https://github.com/golang/go/issues/32492](https://github.com/golang/go/issues/32492)

Point being Go, like OP of this thread suggests, has some issues. This is, to
me, a critical flaw preventing my teams from using Go as a primary web
language. It is both consistently faster and consistently timed to make an
external call to, say, gpg2 to generate a key than it is to use the openpgp
lib that relies on math/big. That's nuts.

------
cygned
I try to convince my team that the node.js ecosystem has reached a stage
where it cannot be used for security/financial applications because the sheer
number of dependencies poses an inherent threat. I advocate for Go because of
its tendency toward fewer and more easily reviewable dependencies. Nobody except me
seems to see a problem there, despite me being able to point out specific
security incidents.

I am wondering if I am missing something obvious here and would value any
opinion.

~~~
lmm
Dependencies are cattle, not pets. There's nothing wrong with having zillions
of them; what you need is good tools to manage them in bulk.

In the JVM ecosystem, projects are only allowed in the Maven Central
repository if they have their license documented in a machine readable fashion
there; it's then trivial to check what licenses you're depending on (and there
are several plugins available for doing so). I'm amazed other ecosystems don't
offer the same.

~~~
anton_gogolev
> There's nothing wrong with having zillions of them...

There's nothing wrong until something goes wrong and now you're royally
screwed. With a zillion dependencies you are at the mercy of a zillion maintainers,
and none of them has any obligation to you. They can break backwards
compatibility in patch releases, introduce subtle behavior changes, steer the
project in an unexpected direction or abandon it altogether.

~~~
codeflo
I’m a bit torn on this. I have most of my experience in the .NET ecosystem,
where dependencies are a lot more manageable. However, if something breaks,
you’re screwed a lot harder, because it’s not so easy to replace a large
library, and there are very likely fewer well-maintained alternatives than
there would be on NPM.

In total, I find it hard to deny how productive the NPM ecosystem can be,
despite my philosophical objections to the way the community is run. Am I
crazy here?

~~~
iSnow
You aren't alone in this. The Node/NPM/JS scene is churning out code and
innovations like there's no tomorrow, that's something to admire.

What I feel they are missing is a community process to consolidate things. You
don't need three generations of ten incompatible solutions for a given problem
- after some iterations, things should consolidate into one or two more or
less standardized libs that don't break existing code at every damn point
release.

~~~
q3k
> You aren't alone in this. The Node/NPM/JS scene is churning out code and
> innovations like there's no tomorrow, that's something to admire.

I don't find churning out code admirable, and I also don't think I've seen any
true innovation come out of the NPM scene (bar innovation in the browser/JS
space itself, which I think isn't a good measure as it's mostly just working
around limitations that shouldn't be there in the first place).

------
rubenbe
This is one of the items embedded Linux development using Yocto has
particularly well covered. It checks every license for every package and you
can ask it to verify the checksum of the license file. If the license changes
after upgrading a package, you'll get a build error.
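
A minimal sketch of the checksum idea described above, in Python rather than Yocto's own recipe syntax. The function names, the `LicenseChanged` exception, and the choice of SHA-256 are assumptions made for illustration; this is not how Yocto itself implements the check.

```python
import hashlib


class LicenseChanged(Exception):
    """Raised when a license text no longer matches its recorded checksum."""


def license_checksum(text: bytes) -> str:
    # Hash the raw license bytes; any edit to the text changes the digest.
    return hashlib.sha256(text).hexdigest()


def verify_license(package: str, text: bytes, recorded: str) -> None:
    # Fail loudly (like a build error) if the license text has changed
    # since the checksum was recorded.
    actual = license_checksum(text)
    if actual != recorded:
        raise LicenseChanged(
            f"{package}: license checksum {actual} does not match {recorded}"
        )
```

The recorded checksum would be stored next to the pinned package version, so an upgrade that silently swaps the license stops the build until someone reads the new text.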

------
generationP
> Aside: this is why I don't like to accept pull requests that move code
> around. Even if the new code organization is better, it's usually not worth
> the time it takes to ensure the pull request isn't doing anything extra.

This sounds like a solvable problem no one has bothered to solve. We need an
analogue of diff highlighting for move-around changes, ideally one that
decomposes a changeset into the coarsest block partition such that the changes
boil down to a permutation of the blocks.

Something similar should be done for merge commits, which at the moment are
completely undebuggable.
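
A rough sketch of the block-permutation idea for the simplest case, assuming every line in the file is unique (duplicate lines make the decomposition ambiguous, so the sketch gives up on them). The function name `move_blocks` is hypothetical; no existing diff tool is implied.

```python
def move_blocks(old, new):
    """Return the coarsest blocks of `old` whose reordering yields `new`.

    Each block is a half-open (start, end) range of indices into `old`,
    listed in the order the blocks appear in `new`. Returns None when the
    change is not a pure move or when duplicate lines make it ambiguous.
    """
    if sorted(old) != sorted(new) or len(set(old)) != len(old):
        return None
    position = {line: i for i, line in enumerate(old)}
    blocks = []
    for line in new:
        i = position[line]
        if blocks and blocks[-1][1] == i:
            # This line directly follows the previous one in `old`,
            # so it extends the current block.
            blocks[-1] = (blocks[-1][0], i + 1)
        else:
            blocks.append((i, i + 1))
    return blocks
```

A reviewer would then only need to confirm that the result is not None and glance at a handful of block boundaries, instead of rereading every moved line.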

~~~
robbya
For changes that are a pure file rename 'git mv' tracks those pretty cleanly.
Anything beyond that... I don't know of any good tooling either.

~~~
progval
Mercurial goes a bit further by tracking copies, so if you split a file in
two, a diff viewer will show there is no added code (but a lot of deletions).
[https://stackoverflow.com/a/4156146/539465](https://stackoverflow.com/a/4156146/539465)

------
sansnomme
Package managers need to automatically derive properties of end builds based
on licenses. E.g. Eclipse License 2.0 without the presence of another more
liberal license means it cannot be used in copyleft software, any dependency
that is copyleft is also infectious etc. Of course it won't account for every
single legal property but the basic checks should be done. To prevent work
duplication a single binary/library written in Go/Rust can take care of this
problem and be used across all package managers. It is in the interest of
GitHub to sponsor such a project.

~~~
256dpi
The licensing of a dependency could be easily determined programmatically
(GitHub already built a decent scanner). However, I think that the quality of
a dependency is most important and that requires a manual vetting process. A
trivial solution would be to create a crowd-sourced dependency vetting
platform.

~~~
kijin
GitHub's license detection algorithm is crap. It tends to get GPLv2 right, but
most other licenses are hit-and-miss. And I still can't find a setting where I
can manually specify the license when GitHub can't autodetect one or gets it
wrong.

~~~
rurounijones
Ruby gems have the license as an explicit field in the gemspec (manifest
file). Is this not the case elsewhere?

It makes license scanning a doddle.

~~~
saagarjha
An even older tradition is a LICENSE file in the root of your project, which
GitHub seems to read, but it's not great at matching up the license text to
actual licenses.

~~~
kijin
An alternative tradition uses a COPYING file. I'm not sure which is older, but
most GNU projects use COPYING instead of LICENSE. Meanwhile, some projects use
COPYRIGHT or other variation, BSD-style licenses are often added directly to
source files, and properly applying LGPLv3 to your project involves adding a
separate COPYING.LESSER file. It's a mess.

------
oaiey
I am doing similar analysis within my company. While I usually never dig that
deep asking why something is AGPL or GPL, the number of carelessly managed
dependency trees within the NPM ecosystem is astonishing. It is not only about
licenses, but also code quality and vulnerabilities.

I came to the absolute same conclusion as the author: focus on platforms where
there is a reasonable standard library. That is .NET in my case and Go in his
case. These product dependency trees are much (much) cleaner.

I ended up writing software to analyze the dependency trees' licenses for
our browser-based products.

------
brian-armstrong
It seems like this is a good opportunity for Github to warn you if

a) a PR would merge a new dependency with an incompatible license to yours

and

b) allow you to filter out projects based on licenses from search results. in
general most of us would prefer to avoid being tainted by GPL code and it'd be
great to hide it on the site entirely.

------
samat
I would love to have this guy write my security sensitive software.

~~~
AmericanChopper
I used to work with a guy that would review every line of our project's node
modules. He’d find issues all the time, so there’s no way he wasn’t paying
attention. He also managed to be productive with writing his features. Having
that guy on the team was amazing, I have no idea what motivated him to do such
tedious work, but he seemed to love it...

~~~
atoav
With a bit of electronics background this feels like the difference between a
hobbyist and a professional.

Most programmers feel like hobbyists to me, in their YOLO-approach. Your
software might ruin someone's day, their life or maybe even end up killing
people and more people should take that thing seriously. The way your colleague
worked should be the norm.

~~~
TeMPOraL
I'd think the comparison goes the other way around :). Hobbyists _care_ enough
to do things right even if it's not in the short-term interest. Professionals,
judging by all the advice I read on-line, are supposed to focus on _delivering
value_ - which is usually measured short-term, and not aligned with doing
things right. It's that attitude that makes most companies care little to none
about security. Bringing in tons of dependencies in order to increase velocity
and implement user-facing features faster is what I see advertised as
archetype of a professional in this industry.

~~~
atoav
> I'd think the comparison goes the other way around :)

Only in the Software industry. This is however _not_ normal:

* Mechanical Engineering: Hobbyist bridge vs. professional Bridge – which one should you be able to trust more to carry you?

* Electrical Engineering: Hobbyist wall wart vs. professional wall wart – which one should you be able to trust more not to burn your house down?

* Medical Treatment: Hobbyist vs. professional – which one do you trust more with your body?

The list goes on. Professional work isn't about cobbling together something
that barely works at the edge of complexity you can just about manage.
Professional work is building something reliable and robust that _you_ can
guarantee, at least to a certain degree.

I know that this isn't how Software Development works right now in most places
– that was the whole point of my comment. However it is _not normal_. It is a
bit like with early cars: back then a hobbyist could have easily made a safer
car than any professional manufacturer with the right amount of commitment,
because there were no real safety standards. Try doing that nowadays.

Edit: Good related talk by the legend himself (Ross Anderson):
[https://media.ccc.de/v/36c3-10924-the_sustainability_of_safe...](https://media.ccc.de/v/36c3-10924-the_sustainability_of_safety_security_and_privacy)

~~~
jlg23
IMHO, lots of software development resembles crafting much more than
engineering and we have put so much effort into sounding smart that we are now
training craftsmen that sound like scientists but are not necessarily good at
their craft...

"I used the Singleton Pattern!" "Awesome, you learned about global variables!"
_blank stare_

~~~
ncmncm
"All patterns are anti-patterns."

------
mcguire
" _I just finished writing a vastly simpler, attestation-free library which is
less than one tenth the size (I will open source it soon - watch this space)._
"

...

" _Even though I love Rust, I am terrified every time I look at the dependency
graph of a typical Rust library: I usually see dozens of transitive
dependencies written by Internet randos whom I have zero reason to trust._ "

I have some kind of point to this comment, but I'm trapped in a sudden feeling
of professional malaise.

------
fenomas
Not sure about this bit:

> This is quite a bit of work, but is necessary to avoid falling victim to
> attacks like _event-stream_.

Reviewing dependencies is important, but I don't think anything the author
mentions would have made a difference with event-stream. The whole issue there
was that malicious changes were snuck in via a change of maintainers and a
later update to a child dependency, so when people initially adopted it as a
dependency there were no red flags in the library to find.

~~~
square_usual
He talks about reviewing dependencies when they are updated, as well, which
certainly would help if someone snuck in malicious code in a minor version
change.

~~~
fenomas
Well, he talks about reviewing when _he upgrades_ a dependency, but the
tricksy thing about event-stream was that people could get the malicious
change without having intended to upgrade anything.

The only thing that really prevents such issues is version locking of
transitive dependencies (which the author doesn't mention, but it could be
that his package manager does it by default, or similar..).
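
To make the locking idea concrete: a lockfile pins each transitive dependency to an exact version plus a content hash, and the installer refuses anything that does not match. Below is a small sketch of the hash half, using the `algorithm-base64digest` string format that npm lockfiles use for their integrity field; the function names are hypothetical and this is not npm's own code.

```python
import base64
import hashlib


def integrity_of(data: bytes, algorithm: str = "sha512") -> str:
    # Produce an "algorithm-base64digest" string for the given bytes.
    digest = hashlib.new(algorithm, data).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


def matches_pin(data: bytes, pinned: str) -> bool:
    # Recompute the digest with the algorithm named in the pin and compare.
    algorithm, _, _ = pinned.partition("-")
    return integrity_of(data, algorithm) == pinned
```

A pin recorded at install time only proves the bytes have not changed since then; it says nothing about whether they were trustworthy in the first place.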

~~~
colonwqbang
I had missed that npm doesn't (didn't?) lock package versions by default.
That's really scary.

~~~
fenomas
It does today, and I could be crazy but I think it did even before event-
stream. But of course some users will have been on old versions of npm, or not
checking in their lockfiles, etc.

I should add, I was probably wrong to talk about lockfiles _preventing_ such
issues. With event-stream the malicious code was hidden deviously enough to
evade a pretty rigorous check - if a Node update hadn't deprecated one of the
functions used in the payload, I suppose we might still not know about it. In
such cases a lockfile is at least a layer of defense, but naturally it only
helps if you're lucky enough to have installed the library before it got
corrupted..

------
herpderperator
When this webpage loads, it briefly flashes in plaintext before loading CSS -
at least in Chrome. This is unusual since it has the CSS stylesheet in the
head which I thought would block until it's fully loaded. Does anyone know
what's going on with that?

~~~
holdenc
This is what happens when you dynamically insert your CSS tag in the head
after the DOM loads.

~~~
herpderperator
If you view the raw data coming from the web server, the link tag containing
the stylesheet is already present in the head. Although what you're saying
sounds like it can absolutely cause this type of behaviour, I'm not sure if
that's what's happening in this case.

~~~
holdenc
You are correct. I suppose it could be an unreviewed NPM dependency causing
this ;)

------
choeger
That's the hidden cost of npm, cargo, pip, et al. The other one is IMO akin
to overweight. Try to modernize a mid-sized project after one or two years and
cry when you see the dependency graph.

Ceterum censeo go inferior est.

~~~
lifthrasiir
If you define the "modernization" as mindlessly switching to a trending
framework or library, the cost is inevitable even without package managers.

~~~
squiggleblaz
You don't need to mindlessly switch to a trending framework or library.

Try upgrading typescript to the latest version. I think that's a fair
definition of "modernisation" and something which should be benign.

But now all of a sudden you have to upgrade every dependency to the latest
version - or write your own .d.ts files - since they're typically written to
the current library version - typescript version combination.

~~~
gpm
The rust eco system has done a pretty good job at this.

The language and standard library is always backwards compatible (minor
security fixes excepted). So updating to a newer language version just works.

Major libraries at the roots of many dependency trees have upgraded in
backwards incompatible ways a few times, but the community has pretty
consistently come together and helped move every other library that anyone
uses to the new version.

------
correct_horse
Why don't npm-like package managers have settings for licenses in applications
(as opposed to libraries)? Settings like "no AGPL" or "no copyleft
dependencies" would allow easy vendoring with modifications. This (disabled by
default) feature might break some proprietary code, but if it does, that
indicates you were not following copyright law prior.

Obviously this doesn't solve the general quality problem with dependencies
that the author notes, but it fixes some licensing issues.

~~~
fmajid
There are over 200 OSI-certified open source licenses, and no automatable way
to handle them.

~~~
thenewnewguy
I (and I'm sure many others) would gladly take an option to provide a
whitelist of acceptable licenses.
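
A sketch of what such an allowlist check could look like, assuming each dependency already declares a license identifier in its metadata (as the `license` field in package.json does). The nested-dict tree shape and the function name are assumptions for illustration, not any package manager's real data model.

```python
def violations(deps, allowed, path=()):
    """Yield (dependency path, license) for every dependency, direct or
    transitive, whose declared license is missing or not in `allowed`."""
    for dep in deps:
        here = path + (dep["name"],)
        declared = dep.get("license")
        if declared not in allowed:
            yield here, declared
        # Recurse so that transitive dependencies are checked too.
        yield from violations(dep.get("dependencies", []), allowed, here)
```

Reporting the full path matters: the offending package is usually several levels down, and the path shows which direct dependency pulled it in.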

------
mattigames
It's always been weird to me how Microsoft invests so much in making
JavaScript easy to develop and maintain with TypeScript but does so little to
make it safe, the one thing JS needs is a standard library by Microsoft (or
similar, e.g, Google) that we can trust, aiming to significantly reduce the
number of dubious-origin dependencies of every JS project (on node and the
browser)

~~~
evmar
Here's another opportunity to point out this unfortunate licensing bug in
TypeScript's standard library:

[https://github.com/microsoft/tslib/issues/47](https://github.com/microsoft/tslib/issues/47)

(Disclaimer: I like TS, but I filed the above bug.)

~~~
pbhjpbhj
Aside, can anyone explain this to me?

>Please do not use "public domain", it's not a license and it makes it
impossible to use such code in corporate context. //

Seems pretty ridiculous, "can't use code unless it's encumbered by licensing
restrictions"??

~~~
Doctor_Fegg
CC0 is a licence. "Public domain" is a state.

Some works have no copyright, typically because they're the product of a US
Federal institution, or the copyright has expired. In American English this
copyright-free state is also known as "public domain". (In British English it
can mean something quite different.)

To put something in the "public domain", you need to make a statement of such.
In other words, a licence. You could just say "I put this in the public
domain" but it may not provide the certainty that bigcorps like. CC0 is a way
of saying this in bigcorp-friendly language. (The CC0 page at
[https://creativecommons.org/share-your-work/public-domain/cc...](https://creativecommons.org/share-your-work/public-domain/cc0/)
explains this well.)

Alternatively, you can use WTFPL, which is a really great way of putting your
works in the public domain while guaranteeing that bigcorps won't use them
(see: statements on HN by Google people, passim).

~~~
M2Ys4U
>To put something in the "public domain", you need to make a statement of
such. In other words, a licence. You could just say "I put this in the public
domain" but it may not provide the certainty that bigcorps like. CC0 is a way
of saying this in bigcorp-friendly language.

It's more than that - it's _impossible_ to place things into the public
domain in many jurisdictions - like the UK.

(This is because copyright is property, and property _must_ have an owner
under the law of England and Wales [also Scots law I think, but I don't know].
Thus, a declaration that something is in the PD is of no effect.)

~~~
pbhjpbhj
You can abandon property to public ownership (land, cars, anything you leave
in the street in England & Wales: it gets sold and proceeds go to the
Exchequer or Police) so your justification at the end doesn't sway me.

Moreover, Google have been allowed to assume ownership of intellectual
property (content of books). The moral rights on those properties have been
effectively cancelled.

~~~
M2Ys4U
>You can abandon property to public ownership (land, cars, anything you leave
in the street in England & Wales: it gets sold and proceeds go to the
Exchequer or Police) so your justification at the end doesn't sway me.

Sure, bona vacantia is A Thing - but your comment spells it out. The copyright
gets _sold_ by the state to whoever wants to buy it.

That is not the same thing as public domain - the new owner can enforce their
monopoly rights over the works which can screw over anybody depending on the
work being "in the public domain".

------
robmccoll
On the licensing front, one thing that has really helped me is adding a script
to my build / CI process that scans every package I'm dependent on for a
LICENSE / COPYING / etc. file and then runs through a set of dumb fingerprints
to match on different licenses for a combined whitelist / blacklist approach.
Packages with no recognized license file or license are rejected. Then it
spits the licences out into a map[string]string in a file where the Go package
maps to the license string which gets served up on its own HTTPS endpoint to
keep in compliance with copyright notice requirements.

In case it's useful to anyone else, here's an older copy of it:
[https://gist.github.com/robmccoll/240317eceb73e3f4e29ea662e3...](https://gist.github.com/robmccoll/240317eceb73e3f4e29ea662e3ea4063)
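
For readers who just want the shape of the fingerprint approach, here is an independent sketch in Python. It is not the code from the gist: the fingerprint phrases, the allowlist, and the function names are all assumptions, and it works on license text already loaded into memory rather than scanning files on disk.

```python
import re

# Ordered on purpose: the AGPL text also mentions the plain GPL by name,
# so the more specific phrase has to be tried first.
FINGERPRINTS = [
    ("AGPL-3.0", "gnu affero general public license"),
    ("GPL", "gnu general public license"),
    ("Apache-2.0", "apache license version 2.0"),
    ("MIT", "permission is hereby granted, free of charge"),
    ("BSD", "redistribution and use in source and binary forms"),
]
ALLOWED = {"MIT", "BSD", "Apache-2.0"}


def identify(text):
    # Collapse whitespace and case so line wrapping does not defeat a match.
    normalized = re.sub(r"\s+", " ", text).lower()
    for name, phrase in FINGERPRINTS:
        if phrase in normalized:
            return name
    return None


def check(packages):
    """packages maps package name -> license text (or None if no file).

    Returns (licenses, rejected): accepted packages with their license,
    and the names of packages with a missing, unknown, or disallowed one.
    """
    licenses, rejected = {}, []
    for name, text in sorted(packages.items()):
        found = identify(text) if text else None
        if found in ALLOWED:
            licenses[name] = found
        else:
            rejected.append(name)
    return licenses, rejected
```

Rejecting anything unrecognized keeps the dumb matcher safe: a false negative costs a manual look, while a false positive would let an unvetted license through.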

~~~
dwheeler
I _completely_ agree that you should run 1+ license checkers in your CI
process, just like you should check anything else in your CI process that you
care about.

In the CII Best Practices badge project we use both the "license_finder"
program (which is an OSS tool) and the FOSSA service (
[https://fossa.com/](https://fossa.com/) ). Both examine the dependencies for
licensing issues. While something could still slip through, it's much less
likely, and more importantly we have a really good case for showing due
diligence. I'm not a lawyer, but I do know that courts look very favorably on
people who are demonstrably making an effort to meet their legal obligations.
You're _way_ less likely to have legal problems that way.

------
fmajid
The problem is reviewing dependencies is labor-intensive, thus expensive. The
economics would justify that being a paid service, like UL/TÜV certification
for electronics, but most companies don't actually care enough about security
to pay for such a thing.

------
pythonist
And also very important to watch for changes of your dependencies, as they can
be quite dynamic. A single review too early or too late in the development
process may have misleading results. Watching for commit changes is too time
consuming, so watching for releases is a bit easier, especially if you use
something like [https://newreleases.io](https://newreleases.io) or other
similar sites.

------
henvic
Slightly related: I have created this useful tool called vendorlicenses that
allows you to check and concatenate licenses on Go programs (to give credit on
a CLI "legal" command).

It will only work if you are vendoring your packages though (and you should!).

[https://github.com/henvic/vendorlicenses](https://github.com/henvic/vendorlicenses)

------
Vizarddesky
> I repeat the above recursively on transitive dependencies as many times as
> necessary. I also repeat the cursory code review any time I upgrade a
> dependency.

If this guy has to work on a "modern" frontend project, he's gonna review
dependencies until the heat death of the universe.

~~~
fhars
But doesn’t that say more about modern leftpaddable frontend frameworks than
about the author?

~~~
taneq
It sure does. ‘Don’t repeat yourself’ and ‘avoid NIH syndrome’ are noble goals
but ‘automatically update myriad libraries from random sources on the internet
and then run them’ gives me the heebie jeebies.

------
povils
I think you would end up in a rabbit hole. Do you also review all your
GNU/Linux libraries and dependencies? Probably not because you trust them.
Thus I think we should be pragmatic and review only libraries which are
created by unknown/untrusted creators.

~~~
atoav
Well I think you should at least do two things:

- avoid dependencies where you reasonably can. Fewer moving parts are usually
good

- just have a _look_ at the dependencies of your dependencies - this might
help you decide which one to trust

------
baybal2
The NPM ecosystem has quite a similar situation.

A lot of popular packages are being used in complete contravention of their
license terms, with no care at all about their dependency chains pulling in
quite restrictive licenses.

------
ones_and_zeros
Johnny come lately but is that true of the AGPL, that merely linking to agpl
code infects your code? No modification to the agpl code needed to trigger the
clause?

What is the point of having a license like that? Trolling?

~~~
paulmd
Yes, that is generally how copyleft (GPL) software works. Linking against it
confers a requirement that your source also be distributed under a compatible
license. This is why it's called a "viral" license.

AGPL was intended to close the SaaS gap in copyleft software. Essentially,
prior copyleft provisions don't kick in if you never distribute the software
itself, so if you only offer network interaction _with_ the software you have
no duty to redistribute.

AGPL is intended to restore the freedom of end users to have access to the
source of the software they interact with, even if it lives on someone else's
computer. As with all copyleft software, it does this by restricting the
rights of the developers of the software (to keep their source proprietary).

As always, it is the old philosophical split between MIT/BSD style licenses
and GPL style licenses. MIT/BSD provide you with complete freedom, including
the freedom to use that library and keep your source closed. GPL is intended
to prevent that and to build a common base of software on which further things
can be built.

(there is also Lesser GPL license, which only kicks in for modifications to
the library itself. So if you write something that uses a LGPL library, you
don't have to give away source for the larger program as a whole, just the
LGPL program and any modifications you made to it. This is essentially
removing the "viral" copyleft provisions.)

------
vemv
> It started off poorly when I noticed some Brown M&M's: despite being a
> library, it was logging messages to stdout

In many language ecosystems, logging to stdout tends to be the right thing to
do in a lib as it does not impose any specific choice of logging
implementation.

One always can recapture stdout and route it through one's logging solution of
choice; doing something analogous with an extraneous logging library can be
harder if not unfeasible.
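
As one concrete instance of that recapture, here is roughly how it looks in Python with the standard library's `contextlib.redirect_stdout`. The `noisy_library_call` function is a hypothetical stand-in for a third-party library that prints.

```python
import contextlib
import io
import logging


def noisy_library_call():
    # Hypothetical stand-in for a library function that prints to stdout.
    print("connecting...")
    return 42


def call_with_stdout_logged(func, logger):
    # Capture everything written to stdout while func runs, then forward
    # each captured line to the caller's logger.
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        result = func()
    for line in buffer.getvalue().splitlines():
        logger.info(line)
    return result
```

Note that `redirect_stdout` swaps `sys.stdout` for the whole process while the block runs, so output from other threads is captured too.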

Are things different in Go land?

~~~
robmccoll
In what language or environment would recapturing and routing stdout be a
desirable solution? Seeing that without a really good explanation would
concern me. That's a global hack with fun side effects that can lead to very
confusing bugs. Also, you would lose which library is which log if you have to
do this for more than one library.

In Go, I would assume that a library that needs to log would do one or more of
the following a) offer the ability to disable logging b) offer overriding the
output that it uses for logging probably with an io.Writer interface or
something c) accept a log.Logger or offer an interface that something like
log.Logger meets. Ideally it would offer an interface that supports some key-
value meta data to be able to have nicely structured logs with formatting
independent of the library.
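
The same three options carry over to other languages. A small Python analogue, purely as a sketch: `Client`, `logger_for_stream`, and the `examplelib` logger name are hypothetical, and a Go version would use io.Writer and log.Logger as described above.

```python
import logging

# (a) Silent by default: a NullHandler means the library emits nothing
# unless the application configures logging for it.
_silent = logging.getLogger("examplelib")
_silent.addHandler(logging.NullHandler())


class Client:
    def __init__(self, logger=None):
        # (c) The caller may inject any logging.Logger of their choice.
        self.log = logger if logger is not None else _silent

    def fetch(self, key):
        self.log.info("fetch key=%s", key)
        return key.upper()


def logger_for_stream(stream, name="examplelib.caller"):
    # (b) Route log output to any writable stream the caller supplies.
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.addHandler(logging.StreamHandler(stream))
    return logger
```

Either way the library never touches stdout itself, so the application keeps control over where messages go and which component they came from.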

