
Software supply chain security - mayakacz
https://github.blog/2020-09-02-secure-your-software-supply-chain-and-protect-against-supply-chain-threats-github-blog/
======
westurner
Estimates of prevalence do assume detection. How would we detect that a
dependency that was installed a few deployments and reboots ago was
compromised?

How does the classic infosec triad (Confidentiality, Integrity, Availability)
apply to software supply chain security?

Confidentiality: Presumably we're talking about open source projects, which
aren't confidential. Projects may request responsible disclosure in e.g. a
security.txt, and vulnerability reports may be kept confidential for at least
a little while.

Integrity: Secure transport protocols, checksums, and cryptographic code
signing are ways to mitigate data integrity risks. GitHub supports SSH, 2FA,
and GPG keys. Can all keys in the package signature keyring be used to sign
any package? Can we verify a public key over a different channel? When we
specify exact versions of software dependencies, can we also record package
hashes which the package installer(s) will verify?
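For the last question: pip's hash-checking mode does exactly this. A minimal sketch of the check the installer performs (the package bytes and digest here are made up for illustration):

```python
import hashlib

def verify_artifact(data, expected_sha256):
    """Mimic pip's --require-hashes check: compare the downloaded
    artifact's digest against the hash pinned in requirements.txt."""
    return hashlib.sha256(data).hexdigest() == expected_sha256

# A pinned requirement line would look like:
#   somepkg==1.2.3 --hash=sha256:<digest>
pkg = b"pretend wheel contents"
pinned = hashlib.sha256(pkg).hexdigest()

assert verify_artifact(pkg, pinned)            # untampered artifact passes
assert not verify_artifact(b"tampered", pinned)  # any modification fails
```

With `pip install --require-hashes -r requirements.txt`, every dependency must carry such a hash, so a compromised mirror or registry can't silently substitute a different artifact.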

Availability: What are the internal and external data, network, and service
dependencies for the development and deployment DevSecOps workflows? Can we
deploy from local package mirrors? Who is responsible for securing and
updating local package mirrors? Are these service dependencies all HA? Does
everything in this system also depend upon the load balancer? Does our
container registry support e.g. Docker Notary (TUF)? How should we mirror TUF
package repos?

See also: "Guidance for [[transparent] proxy cache] partial mirrors?"
[https://github.com/theupdateframework/specification/issues/110](https://github.com/theupdateframework/specification/issues/110)

~~~
jonahbenton
A toolset that answers some of your questions is Grafeas, a metadata store at
[https://github.com/grafeas/grafeas](https://github.com/grafeas/grafeas), and
Kritis, a policy engine at
[https://github.com/grafeas/kritis](https://github.com/grafeas/kritis).

Cheers.

~~~
p932
Thanks for the links. Do you know how this toolset helps mitigate/prevent
what the GitHub blog post calls "supply chain compromises"? I quickly checked
around and couldn't find anything that applies to the dependencies of
applications/binaries before they land in the target runtime (i.e. k8s).

~~~
jonahbenton
Have you seen these presentation slides?

[https://www.slideshare.net/mobile/aysylu/q-con-sp-software-supply-chain-management-with-grafeas-and-kritis](https://www.slideshare.net/mobile/aysylu/q-con-sp-software-supply-chain-management-with-grafeas-and-kritis)

They walk through one of the workflows (end state is deploying to k8s).

Grafeas is a metadata store; Kritis is a policy engine that plugs into k8s as
an admission controller, blessing the "admission" (running) of an image in a
namespace.

There are existing tools for each language/runtime that produce known-vulnerability
lists for individual artifacts in that language's ecosystem. You feed
these into Grafeas. Your CI pipeline provides a manifest for each built image
listing all upstream dependencies (produced by each app's build tool). Then at
deploy time, Kritis checks the manifest on the image and, for each artifact in
the image, checks for vulnerabilities and determines whether any of them
should keep the image from being deployed.

Hope that helps. There are many other workflows but that one is the most
direct.

Cheers.

------
trishankdatadog
Don't miss how we used TUF [1] and in-toto [2] to build compromise-resilient
CI/CD (the first in the industry, AFAICT) for the Datadog Agent integrations
[3][4], which detects attacks _anywhere_ between our developers and end-users.

[1] [https://theupdateframework.io/](https://theupdateframework.io/)

[2] [https://in-toto.io/](https://in-toto.io/)

[3]
[https://www.youtube.com/watch?v=9hCiHr1f0zM](https://www.youtube.com/watch?v=9hCiHr1f0zM)

[4] [https://dtdg.co/integrations-tuf-in-toto](https://dtdg.co/integrations-tuf-in-toto)

~~~
p932
How does this pattern/toolset protect against supply chain compromises of the
dependencies used to build the Datadog Agent itself?

~~~
trishankdatadog
Apply the pattern/toolset recursively. Software supply chain problems will
largely be solved this way, eventually.

~~~
p932
Is there any initiative in this direction toward applying this pattern to the
big dependency management tools (e.g. Maven, pip, npm)?

~~~
trishankdatadog
Yes, please see PEP 458:
[https://www.python.org/dev/peps/pep-0458/](https://www.python.org/dev/peps/pep-0458/)

------
philips
I think a big step forward is for folks to pin versions of things. npm, pip,
and many other systems let software depend on a semantic version range of its
dependencies, which makes it impossible to know what will be installed. If you
at least know what is going to be installed, and the URL is known, then you
can rely on a third-party notary to tell you the expected contents...

Which is what we are building with Asset Transparency to provide a public
transparency log backed database of URL content digests.

[https://www.transparencylog.com](https://www.transparencylog.com)

We have started to build tools for integrating into release pipelines too:

[https://www.transparencylog.com/software-release-process-integration/](https://www.transparencylog.com/software-release-process-integration/)

I think it would be great to see package management systems use things like
this. Go already does.
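To see why version ranges make installs non-deterministic, here is a sketch of caret-range resolution in the style npm uses (the registry contents are invented for illustration):

```python
def resolve_caret(range_base, available):
    """Pick the highest available version compatible with ^range_base:
    same major version, and at least range_base."""
    key = lambda v: tuple(int(x) for x in v.split("."))
    major = range_base.split(".")[0]
    candidates = [v for v in available
                  if v.split(".")[0] == major and key(v) >= key(range_base)]
    return max(candidates, key=key)

# At first publish, "^1.2.3" resolves to 1.2.3...
assert resolve_caret("1.2.3", ["1.2.3"]) == "1.2.3"

# ...but after the registry gains releases, the *same* spec installs
# something else, which you never reviewed:
published = ["1.2.3", "1.2.4", "1.3.0"]
assert resolve_caret("1.2.3", published) == "1.3.0"
```

Pinning exact versions (plus digests) collapses that ambiguity to a single, verifiable artifact.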

If anyone wants to get started quickly, check out our CLI tool:

[https://github.com/transparencylog/tl](https://github.com/transparencylog/tl)

~~~
p932
How does this compare with
[https://github.com/theupdateframework/notary](https://github.com/theupdateframework/notary)?

~~~
philips
Notary is a signing scheme from the publisher. It is an improvement over GPG
signing, plus a better scheme for signaling to clients the next version to
update to.

Asset Transparency doesn't require the publisher to be involved at all and
can work on any URL on the internet that is publicly accessible. It is also
complementary to signing schemes.

Here is the Asset Transparency CLI fetching and verifying the contents of a
Notary release, for example:

    tl get https://github.com/theupdateframework/notary/releases/download/v0.6.1/notary-Linux-amd64

Or if you are curious, hit the service's lookup endpoint directly:

    curl http://beta-asset.transparencylog.net/lookup/github.com/theupdateframework/notary/releases/download/v0.6.1/notary-Linux-amd64

~~~
trishankdatadog
Philip is right: they are complementary:

[https://ssl.engineering.nyu.edu/blog/2020-02-03-transparent-logs](https://ssl.engineering.nyu.edu/blog/2020-02-03-transparent-logs)

------
brobdingnagians
By default, have your firewall block _all_ outbound connections. Whitelist
only the ones you know you need, and as narrowly as possible (i.e. specific
hosts).

Minimize the number of dependencies. Systems that make it hard to add
dependencies have the virtue of forcing you to think harder about whether you
want to add them. Having a few central libraries that do exactly what you need
is better than dragging in the kitchen sink.

It is often easier than people think to write a specific function that does
precisely what you need.

That is easier to change, and easier to maintain in the long run, than
ingesting a huge library with its dependencies that do things you will
probably never need.

------
snicker7
Related: GNU Guix [0] and Bootstrappable Builds [1]. Guix tries to
reproducibly bootstrap an entire Linux distribution from an ever-shrinking
binary seed.

[0]: [https://guix.gnu.org/](https://guix.gnu.org/)

[1]: [https://bootstrappable.org/](https://bootstrappable.org/)

------
pornel
I'm a proponent of distributed code reviews as a solution:
[https://github.com/crev-dev/crev](https://github.com/crev-dev/crev)

Ultimately, someone has to manually review the code. Antivirus-like heuristics
won't catch everything. Sandboxing may prevent some exfiltration, but can't
prevent malicious code from returning malicious results (e.g. imagine a
password checking library modified to always accept attacker's password - it
can be sandboxed like a nuclear reactor and still screw you). If you verify
that the code is actually safe and does what it says, then it doesn't matter
where the code came from, who wrote it, or which CI server published it.

But reviewing code is tedious. It's wasteful for every user to individually
review the same code over and over again. You can trust code if enough people
who you trust have reviewed it.
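The "trust code if enough people you trust reviewed it" rule is essentially a threshold check over a web of trust. A toy sketch (the reviewer names, packages, and threshold are made up; crev's real model also weights trust levels and transitive trust):

```python
# Reviewers whose judgment I have explicitly chosen to trust.
my_trusted = {"alice", "bob", "carol"}

# Hypothetical review data: (package, version) -> set of reviewers
# who published a positive review of that exact version.
reviews = {
    ("leftpad", "1.0.0"): {"alice", "bob", "mallory"},
    ("leftpad", "1.0.1"): {"mallory"},
}

def vetted(package, version, threshold=2):
    """A version counts as vetted when at least `threshold` reviewers
    I trust have reviewed that exact version."""
    reviewers = reviews.get((package, version), set())
    return len(reviewers & my_trusted) >= threshold

assert vetted("leftpad", "1.0.0")      # alice and bob both reviewed it
assert not vetted("leftpad", "1.0.1")  # only an untrusted reviewer did
```

The point of the threshold is exactly the amortization the comment describes: once enough trusted people have reviewed a version, everyone downstream in the trust graph can skip re-reviewing it.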

------
holri
Just use software that is in Debian stable. If a library is not in Debian,
then package it and become a Debian developer, solving that problem for you
and thousands of other people who are affected.

The software supply chain is a very old problem and already solved. No need
to reinvent the wheel for each generation of software developers.

~~~
dward
Just run apt install billion-$$$-arr regulated-institution and write a systemd
unit file, then run apt upgrade occasionally. What’s the problem?

------
Tainnor
Maybe we need some sort of "trust model" for dependencies. E.g. if you depend
on a package, you'll have to explicitly state that you trust it. Conversely, a
package author may declare that not only are they responsible for their own
code, they have also either only used trusted dependencies, or declare their
own trust (e.g. by review) of certain dependencies, so that you can
transitively build up a trust chain...

In practice, that would all be much more difficult, of course. But it would
surface the underlying issue which is that while code reuse is fine and
acceptable, using unvetted code is not.
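A minimal sketch of that transitive chain (all package names and trust declarations here are invented): a root package is trusted only if every package reachable through its declared dependencies has itself been vouched for.

```python
# Hypothetical declarations: each package lists the dependencies it
# explicitly vouches for (e.g. after reviewing them).
declared_trust = {
    "myapp": {"webframework"},
    "webframework": {"httplib"},
    "httplib": set(),
    "shadyapp": {"cryptominer"},  # cryptominer made no declaration
}

def chain_trusted(root):
    """Walk the dependency graph; the chain holds only if every
    reachable package has published its own trust declaration."""
    seen, stack = set(), [root]
    while stack:
        pkg = stack.pop()
        if pkg in seen:
            continue
        seen.add(pkg)
        if pkg not in declared_trust:
            return False  # an unvetted dependency breaks the chain
        stack.extend(declared_trust[pkg])
    return True

assert chain_trusted("myapp")        # every transitive dep is vouched for
assert not chain_trusted("shadyapp") # cryptominer was never vetted
```

In practice you would attach evidence (review identities, signatures) to each edge rather than a bare set membership, but the surfaced issue is the same: every transitive dependency needs someone accountable for vetting it.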

------
katsume3
And more reading on this matter, relevant to CCleaner's supply chain mishap:

[https://www.wired.com/story/inside-the-unnerving-supply-chain-attack-that-corrupted-ccleaner/](https://www.wired.com/story/inside-the-unnerving-supply-chain-attack-that-corrupted-ccleaner/)

Something like BleachBit seems more reputable, though not necessarily immune
to similar attacks; it's what I use instead of CCleaner.

~~~
mint2
Interesting. I always thought a supply chain hack meant compromising one of
the dependencies a product uses, not simply hacking the company itself and
altering their product.

------
marcus_holmes
I'm glad this is beginning to be taken seriously. And that the answer isn't
"sandbox every dependency" which is ridiculous.

~~~
ryukafalz
Why is that ridiculous? Certainly it’s not the sole answer to the problem on
its own, but if I were to use (say) a string manipulation library, why should
it have access to my filesystem and the internet?

~~~
marcus_holmes
Because it's not an app. It's just some code.

You're going to have to segment your whole application into chunks, each chunk
being sandboxed away from the others, causing huge overheads and
complications. It'll generate more complexity, more errors, more security
vulnerabilities. And it doesn't even guarantee that the code doesn't do other
bad things that the sandbox doesn't deny. Sandboxing has comprehensively
failed as a security measure for browser extensions - hence both Chrome and
Firefox retreating from extensions.

Or, as a spurious example: you could audit the library's code to make sure
it's not doing bad things, and then copy/paste it into your code base. You
could even copy just the bits you need and leave out the bits that deal with
use cases you don't need. Easier, simpler, more efficient, and less dangerous.

~~~
ryukafalz
>Because it's not an app. It's just some code.

Yes. So?

>You're going to have to segment your whole application into chunks, each
chunk being sandboxed away from the others, causing huge overheads and
complications. It'll generate more complexity, more errors, more security
vulnerabilities.

I'm going to dispute this. Yes, if your sandbox takes a ton of memory to
isolate some piece of code, scaling that up to confine each module
individually isn't going to be workable. But who says a sandbox has to be
heavyweight?

Our current systems (UNIX-likes, etc) provide a ton of ambient authority to
each process; given that, it takes a lot of effort to e.g. intercept syscalls
and decide whether or not the application should have access to them. That's
an artifact of design decisions from decades ago, though; let's say we were
starting from scratch, why give every process access to all those syscalls to
begin with? If you want an example of how a system could be designed from the
start _without_ that authority, take a look at this paper:
[http://mumble.net/~jar/pubs/secureos/secureos.html](http://mumble.net/~jar/pubs/secureos/secureos.html)

For a recent attempt at doing essentially this, take a look at this intro to
the Bytecode Alliance:
[https://hacks.mozilla.org/2019/11/announcing-the-bytecode-alliance/](https://hacks.mozilla.org/2019/11/announcing-the-bytecode-alliance/)
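The capability idea can be illustrated without any OS support at all (this is a toy, not how the paper or the Bytecode Alliance actually implement it): instead of letting library code reach for ambient authority like `open()` on arbitrary paths, you hand it only the specific handle it needs, and it can touch nothing else.

```python
import io

def count_lines(readable):
    """A 'string manipulation' routine that can only read the one
    stream it was handed -- it has no way to name other files, open
    sockets, or touch the rest of the filesystem."""
    return sum(1 for _ in readable)

# The caller grants a single capability: one readable stream.
grant = io.StringIO("line1\nline2\n")
assert count_lines(grant) == 2
```

Language-level sandboxes like WebAssembly make this enforceable rather than merely conventional: an imported module literally cannot call what it was not given.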

~~~
marcus_holmes
>Yes. So?

There's a difference between a compiled binary and uncompiled code. I guess if
you're working in an interpreted language that never gets compiled, like
Python, you might not notice the difference so much. But even then, this is
not using an API for a separate service that exists on a different server.
This is something that happens in your process.

> ...processes...

If your string processing library has to live in a separate process in order
to sandbox it, then yes, you are creating more problems than you're solving.

------
trabant00
Please stop with the blog spam, upvote astroturfing, and "great
article/comment!" stuff. I understand you are paid for evangelism, but please
take it somewhere else.

