
Backstabber's Knife Collection: A Review of Open Source Supply Chain Attacks - adulau
https://arxiv.org/abs/2005.09535
======
marcus_holmes
I have a feeling this is going to be the major problem of the next decade in
software/web development.

~~~
seemslegit
Only if by next you mean last and by 'major problem' you mean 'mostly ignored
due to lack of tangible accountability for anyone involved'

~~~
foobiekr
this.

at my previous company, and on the current project, we have explicit audit and
mirrors for every dependency. it's a ton of work, especially at first. the
developers, especially the front end, sort of resent it. but for our
particular niche, especially previously, we put a priority on knowing what we
are running and requiring importers to take an active, documented role in
vetting that they pull in instead of just grabbing whatever.

the _entire_ ecosystem works against you if you want to do this. it's pretty
incredible.

~~~
neilv
I did this same thing for the last large system I worked on. All third-party
packages got audited and checked into our CM system before they could be used.

Also, I prohibited third-party HTTP/S requests, such as to pull a JS package
from a CDN. I did this not only so that we could audit it and assure integrity
and availability, but also so that we weren't leaking information to third
parties under ordinary circumstances.

(Little details: I had some naming conventions for versioning so that multiple
versions of a package could be used simultaneously in a checkout. For the JS
packages, it simply meant making a subdirectory named after the package name
and version number; for the main application language, it was more complicated, to
support their package system better. I made the main backend use thin wrapper
modules for each of the third-party packages used, which effectively specified
which version to use (with zero runtime cost from the wrapper), since the
language platform package system didn't have some kind of manifest to do
this.)
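A minimal sketch of the per-version subdirectory convention described above, in Python (the package names and directory layout here are hypothetical, not from the original system):

```python
import os

def vendor_path(name: str, version: str, root: str = "vendor") -> str:
    """Map a package name and version to its checked-in subdirectory,
    so multiple versions can coexist in a single checkout."""
    return os.path.join(root, f"{name}-{version}")

print(vendor_path("leftpad", "1.3.0"))  # vendor/leftpad-1.3.0

# A thin wrapper module would then pin the version in exactly one place,
# so callers never name a version themselves, e.g.:
#   # wrappers/leftpad.py
#   from vendor.leftpad_1_3_0 import *   # no runtime cost beyond import
```

The wrapper layer is what substitutes for a package-system manifest: bumping a dependency means editing one wrapper, not every import site.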

~~~
foobiekr
solid.

our jenkins can't talk to anything except github. it's isolated.

we use git for everything; our "versions" are git hashes. we maintain a
manifest for the state of the repo and dependencies. this allows us to upgrade
versions of dependencies as part of the normal workflow with ease (just a pull
& checkout).
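A sketch of what that manifest-driven workflow could look like, assuming a simple whitespace-separated `name repo commit` manifest format (the format, paths, and names are illustrative assumptions, not the commenter's actual tooling):

```python
def checkout_commands(manifest_text: str) -> list[str]:
    """Turn a manifest of pinned git hashes into the fetch/checkout
    commands a build script would run for each mirrored dependency."""
    cmds = []
    for line in manifest_text.strip().splitlines():
        name, repo, commit = line.split()
        cmds.append(f"git -C deps/{name} fetch {repo}")
        cmds.append(f"git -C deps/{name} checkout {commit}")
    return cmds

manifest = "leftpad https://github.com/example/leftpad.git 0a1b2c3\n"
for cmd in checkout_commands(manifest):
    print(cmd)
```

Because every dependency is pinned to a commit hash rather than a mutable tag or version string, upgrading is just editing the manifest and re-running the checkout.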

------
paulvs
I think that the test cases included with packages might give an attacker a
convenient place to obfuscate URLs or other strings as benign-looking test
dummy data.

This would be especially easy by using the technique called string sampling
that the author mentions. I could choose a "Lorem ipsum" like text for use as
dummy data, but ensure that the first letter of every word, when combined,
forms the domain name of a server that will be used to download a second
malicious payload.
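A toy Python illustration of that acrostic idea (filler words and the domain are invented for the example; real dummy text would read more naturally):

```python
def hide_domain_in_lorem(domain: str, filler_words: list[str]) -> str:
    """Build innocuous-looking dummy text whose word initials spell a
    domain name, per the 'string sampling' technique described above."""
    words = []
    for ch in domain:
        # pick a filler word starting with the needed letter;
        # fall back to the bare character (e.g. '.') if none exists
        match = next((w for w in filler_words if w.startswith(ch)), ch)
        words.append(match)
    return " ".join(words)

def extract_domain(text: str) -> str:
    """What the malicious loader would do at run time: take the first
    letter of every word to recover the hidden domain."""
    return "".join(word[0] for word in text.split())

filler = ["evil", "x-ray", "amet", "magna", "porta", "lorem",
          "commodo", "omnis", "tempor", "dolor", "ipsum"]
text = hide_domain_in_lorem("example.com", filler)
print(extract_domain(text))  # recovers "example.com"
```

To a reviewer skimming the test fixtures, the generated text is just Latin-ish filler; nothing in it pattern-matches as a URL until the initials are reassembled.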

------
trishankkarthik
This is why we designed TUF and in-toto to detect MitM attacks anywhere in the
software supply chain between developers and end-users themselves, and provide
E2E compromise-resilience.

It's strange that the paper doesn't mention us considering that we have
considerable expertise in this very area.

https://www.datadoghq.com/blog/engineering/secure-publication-of-datadog-agent-integrations-with-tuf-and-in-toto/

~~~
raesene9
Whilst TUF absolutely does help with some of the cases in the paper and
generally, it's important to notice that at least one of the scenarios in the
paper may not be covered by solutions like TUF.

I'm thinking of the scenario where a bad actor takes over an existing library
with the original owner's blessing, either by contributing and then taking on
maintainership, or via payment to the original owner.

In that case, ownership of the signing keys may transition to the new owners
voluntarily, so there would be no noticeable change in the signing of
packages.

~~~
trishankkarthik
That's a bit like saying: well, encrypting the iPhone isn't all that jazz,
because all I have to do is hit the owner with a $5 wrench.

I mean, yes, but cryptography alone cannot solve that problem. TUF and in-toto
provide cryptographic solutions to cryptographic problems, which is much more
than anyone else is doing today.

------
mmhsieh
What is the effectiveness of obfuscation? My understanding is that the
existing dynamic analysis tools can usually defeat anything obfuscated within
O(1 day).

~~~
negus
And what kind of automated dynamic analysis of popular dependencies do you
expect to encounter? And who will analyze its reports and prevent the
malicious package from being uploaded to the repo?

------
nathancahill
They mentioned the dataset that they collected a few times in the paper, but I
didn't find the actual data. Is that typical for this type of research?

~~~
boomboomsubban
They link to a github account containing it, but say access will only be
provided on justified requests. No clue if that's standard.

~~~
knolax
Standard or not I wouldn't take any "research" that doesn't actually provide
data seriously.

~~~
boomboomsubban
Sure, this one provides the data though.

