
Binary Authorization for Borg - mayakacz
https://cloud.google.com/security/binary-authorization-for-borg/
======
rsync
In case anyone else wonders what 'borg' is:

"Our infrastructure is containerized, using a cluster management system called
Borg."

I was hoping they had some predictable, indexed build for borg backup[1].

[1] [https://www.stavros.io/posts/holy-grail-backups/](https://www.stavros.io/posts/holy-grail-backups/)

~~~
kevincox
Borg is basically the internal predecessor to Kubernetes.

~~~
mehrdada
Predecessor may imply "worse" or "outdated" (though that may not be the OP's
intent). I want to clarify that this is definitely not the case: Kubernetes is
a joke compared to Borg when it comes to running Google workloads (on many
dimensions, most importantly scale).

~~~
kerng
This post highlights why many people start disliking Google - it's the vibe of
superiority that often comes across. These days Google's workload (e.g., QPS)
isn't that unique anymore...

Moving Borg to Kubernetes will eventually happen, because it doesn't make
sense to maintain two systems that solve the same problem. And because
Kubernetes is open source, it will eventually be superior, thanks to its
diverse group of contributors.

~~~
mehrdada
Huh? How is superiority relevant here? I don’t work at Google (I have in the
past), both systems came out of that company, and some of the same people have
worked on both. If anything, Google's marketing department would probably
prefer people to believe that Kubernetes ~= Borg to help sell GKE, not the
other way around. Kubernetes is basically limited to several thousand hosts.
That doesn’t even register at Google scale. Other folks do have high-QPS
systems, but none really use Kubernetes to manage the entire cluster; Facebook
comes to mind, for example, with an in-house system. I would bet against that
prediction; I don’t believe such a thing is even on the roadmap internally.

------
trishankdatadog
On a related note, we have built an E2E-verified, tamper-evident CI/CD
pipeline for the Datadog Agent integrations [1]: the Agent will trust and
install only integrations that correspond to source code that has been signed
by our developers. If there is an attack anywhere between our developers and
end-users, it will be caught.
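A toy sketch of that guarantee - install only artifacts whose digest was attested by the required developers - might look like this (illustrative only; the real pipeline uses TUF and in-toto with asymmetric signatures, and all names here are invented):

```python
import hashlib

# Toy attestation store: maps an artifact's SHA-256 digest to the set of
# developers who signed off on the corresponding source revision.
# (A real system verifies cryptographic signatures, not a plain dict.)
ATTESTATIONS: dict[str, set[str]] = {}

def record_attestation(artifact: bytes, signers: set[str]) -> None:
    ATTESTATIONS[hashlib.sha256(artifact).hexdigest()] = signers

def may_install(artifact: bytes, required_signers: set[str]) -> bool:
    """Install only if the artifact's digest was attested by the required
    developers - any tampering in transit changes the digest, the lookup
    fails, and the install is refused."""
    signers = ATTESTATIONS.get(hashlib.sha256(artifact).hexdigest(), set())
    return required_signers <= signers

record_attestation(b"integration-v1 payload", {"alice", "bob"})
```

The point of hashing the artifact itself is that an attacker between developer and end-user can't swap the payload without also breaking the digest lookup.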

Unlike Binary Authorization for Borg, our security guarantees are publicly
verifiable.

[1] [https://www.datadoghq.com/blog/engineering/secure-publication-of-datadog-agent-integrations-with-tuf-and-in-toto/](https://www.datadoghq.com/blog/engineering/secure-publication-of-datadog-agent-integrations-with-tuf-and-in-toto/)

~~~
packetslave
That's a bit of a disingenuous reply...

Binary Authorization for Borg is for verifying binaries running inside Google,
not code installed on end-user machines. Having the authorization be "publicly
verifiable" makes no sense.

~~~
mayakacz
Agreed with this statement. It's generally a best practice to verify that all
software updates originate from a particular source before applying them in
your environment. Most over-the-wire updates do this. What's different with
Binary Authorization for Borg is that within Google, that last verification
step means not just "came from Google" but "came from Google and went through
all previous necessary checks", because of how the pieces of the CI/CD system
work together.

Disclosure: I work at Google and helped write this whitepaper on Binary
Authorization for Borg.

~~~
ithkuil
in-toto.io also addresses the "proof it went through some steps". How would
you compare the two systems?

~~~
mayakacz
The biggest difference for me is that in-toto lets you define any set of
upstream metadata requirements in a very open format, while Binary
Authorization has a set of centrally defined requirements that teams tend to
implement in tranches to meet minimum bars. It may sound better to have a
freeform format, but in practice I've found that it makes it harder for people
to know what they should _actually_ do. In Binary Authorization for Borg,
services still define service-specific policies, but they pick from a
previously defined set of potential requirements. See the section on
service-specific policies: [https://cloud.google.com/security/binary-authorization-for-borg/#service-specific-policy](https://cloud.google.com/security/binary-authorization-for-borg/#service-specific-policy)
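A toy model of the "pick from a centrally defined set" idea (the requirement names and schema here are my own illustration, not the actual Binary Authorization policy format):

```python
# Centrally defined menu of requirements every team chooses from.
KNOWN_REQUIREMENTS = {
    "code_reviewed",        # change was approved by a second person
    "built_by_trusted_ci",  # binary produced by the central build system
    "submitted_source",     # built from committed, not local, code
}

def make_policy(requirements: set[str]) -> set[str]:
    """A service-specific policy is any subset of the central menu.
    Free-form requirements are rejected, which keeps expectations
    uniform across teams - the upside of the less open format."""
    unknown = requirements - KNOWN_REQUIREMENTS
    if unknown:
        raise ValueError(f"unknown requirements: {unknown}")
    return requirements

def admits(policy: set[str], attestations: set[str]) -> bool:
    # A deploy is admitted only if every requirement the service
    # chose was attested somewhere upstream in the pipeline.
    return policy <= attestations
```

The contrast with a freeform scheme is the `ValueError`: a team can't invent its own check, so "what should we actually require?" always has a short, shared answer.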

You can more easily compare Grafeas and Kritis (OSS projects Google developed,
which are similar to GCR Vulnerability Scanning and Binary Authorization for
GKE) to in-toto. In fact, I gave a talk covering some of the options for this
here:
[https://youtu.be/uDWXKKEO8NU?t=1314](https://youtu.be/uDWXKKEO8NU?t=1314)

Disclosure: I work at Google and helped write this whitepaper on Binary
Authorization for Borg.

------
justicezyx
I led the portion of this project on Borg itself.

The security team did most of the security infrastructure, and handled the
coordination among almost every large infrastructure system team inside TI.

I'll be waiting for them to answer any questions. :)

~~~
agv123
I've seen references to gVisor being used 'in production' for google app
engine && cloud run and so forth.

Scanning through recent commits && the github repo this is clearly not the
case - there are way too many outstanding issues and outright missing support
for various things. Is this another project where it was written in a
different language or something and then ported out?

Can you clarify?

~~~
prattmic
I work on gVisor, I can answer this!

gVisor is not a rewritten version of an internal tool. The code you see really
does run in production for App Engine and Cloud Run. While there are some
internal modifications to better integrate with internal infrastructure, the
vast majority of the code is identical to open source, critically including
all of the system call handling, filesystem, and memory management code.

While browsing through our issues will show that we still have plenty to work
on, the vast majority of applications work well inside gVisor.

------
seriesf
One thing that really squicked me out when I left Google is how other
companies, even large and sophisticated ones, are using all kinds of garbage
that comes from Canonical or Red Hat or Percona, and they have NO IDEA what's
in there. Say what you want about Google's NIH culture, but in regards to code
provenance and verifiable builds they are doing the right thing and many
others are not.

~~~
GuyPostington
Can you give an example of this garbage?

~~~
seriesf
Literally anything that comes from a vendor in a package? Percona
server/toolkit? Every binary package in Ubuntu? The Linux kernel as built and
distributed by Red Hat?

~~~
pjc50
Yeah, that's not a definition of garbage that's going to be taken seriously.

~~~
seriesf
You're right; Google is the only big tech that takes insider risk seriously.

------
philsnow
> Adopting similar controls in your organization

> Figure out how to manage third party code.

> Many of the CI/CD controls we describe in this paper are placed where your
> code is developed, reviewed, and maintained by one organization. If you are
> in this situation, consider how you will include third party code as part of
> your policy requirements. For example, you could initially exempt the code,
> while you move towards an ideal state of keeping a repository of all third
> party code used, and regularly vet that code against your security
> requirements.

I don't know how much third party code is in use at Google these days, but I'd
be curious to know if there is a formal effort at cataloging most-often used /
most-sensitive third party code and prioritizing reviews of it.

I've thought about the problem of vetting programming language packages (pypi,
npm, rubygems, whatever) off and on. It seems like the only two tenable
strategies are "don't pin anything / always use tip of master" and "freeze
deps, vet transitive deps at that frozen point, vendor the corresponding deps,
and if you ever need to update a requirement for a feature or bugfix, go
through the process again".
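The "freeze, vet, vendor" loop above hinges on being able to tell whether a vendored dep still matches the bits that were vetted. A minimal sketch of that check (my own illustration, not any particular tool's workflow):

```python
import hashlib
import pathlib

def digest_tree(root: pathlib.Path) -> dict[str, str]:
    """SHA-256 of every file under a vendored dependency, keyed by
    relative path - a record of the 'frozen point' at which it was vetted."""
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }

def unvetted_changes(vetted: dict[str, str],
                     current: dict[str, str]) -> set[str]:
    """Paths added, removed, or modified since vetting - anything in this
    set means the dependency has to go through review again."""
    return {
        path for path in vetted.keys() | current.keys()
        if vetted.get(path) != current.get(path)
    }
```

Checking the recorded digests into the repo alongside the vendored code is what would let you out-source the vetting step: another org can publish "we vetted tree X at digest map Y" and you only have to verify that Y matches what you actually have.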

The latter seems like it could be out-sourced to a certain degree, if you
trusted other organizations to "vet transitive deps".

~~~
cbhl
For one thing, all third-party code is checked into one top-level folder in
the monorepo.

This is publicly documented here:

[https://opensource.google/docs/thirdparty/](https://opensource.google/docs/thirdparty/)

This means that dependencies on a third-party library can be found simply by
looking at deps lines in BUILD rules. That can then inform which projects you
want to run (for example) fuzzers on:

[https://opensource.googleblog.com/2016/12/announcing-oss-fuzz-continuous-fuzzing.html](https://opensource.googleblog.com/2016/12/announcing-oss-fuzz-continuous-fuzzing.html) [https://google.github.io/oss-fuzz/](https://google.github.io/oss-fuzz/)
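The "look at deps lines in BUILD rules" step can be sketched with a toy scan (illustrative only; real tooling would use `bazel query` rather than a regex, and the BUILD contents in the usage below are made up):

```python
import re

# Bazel-style labels under the single //third_party folder, quoted
# inside deps lists, e.g. "//third_party/zlib" or "//third_party/openssl:ssl".
DEP_RE = re.compile(r'"(//third_party/[^"]+)"')

def third_party_deps(build_file_text: str) -> set[str]:
    """Collect every //third_party target named in a BUILD file - a cheap
    inventory of which external code a project pulls in, and a starting
    point for deciding where to aim fuzzers or reviews."""
    return set(DEP_RE.findall(build_file_text))
```

Because all third-party code lives under one top-level folder, a single prefix match is enough to separate external deps from first-party ones.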

------
sterlind
At Microsoft, we just require all binaries to be signed on production systems.
Some systems are configured to block execution of unsigned code. Where we
can't do that, monitoring cuts an immediate sev-2 and wakes us up if any
unsigned code is executed.

Does Linux not have a way to run only signed ELFs?

~~~
cbhl
If a single person can generate a signed binary by themselves, then
restricting execution to signed binaries does not address the threat model in
the article.

~~~
sterlind
You can't. Only official builds are signed, and two different people have to
submit and approve PRs.

~~~
cbhl
That's a big difference from the starting state that Google had, which was
that a single person could create a signed production binary from
_unsubmitted_ code all by themselves.

(It was very convenient for iterating on one-off fixes in production in an
emergency, but you would rightly question how someone gets into that position
in the first place. Plus there was no guarantee that the code would ever get
submitted, and post-fix code review might cause the code to be subtly broken
prior to being committed to the monorepo.)

------
tptacek
Has anyone outside of Google implemented something similar in spirit to this
for K8s or ECS? What was the threat model you were considering when you built
it? Was it worth it?

~~~
mayakacz
Yes, there's a few listed in this blog post: [https://cloud.google.com/blog/products/identity-security/beyondprod-whitepaper-discusses-cloud-native-security-at-google](https://cloud.google.com/blog/products/identity-security/beyondprod-whitepaper-discusses-cloud-native-security-at-google)

- Kubernetes admission controllers, OSS part of Kubernetes: [https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/](https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/)
- Kritis, OSS: [https://opensource.google/projects/kritis](https://opensource.google/projects/kritis)
- OPA Gatekeeper, OSS: [https://github.com/open-policy-agent/gatekeeper](https://github.com/open-policy-agent/gatekeeper)
- Binary Authorization on GKE/Anthos: [https://cloud.google.com/binary-authorization/](https://cloud.google.com/binary-authorization/)

They don't all do all the pieces. The hardest part is going to be integrating
whatever enforcement solution you choose with your upstream CI/CD pipeline.

Disclosure: I work at Google and helped write this whitepaper on Binary
Authorization for Borg.

------
alexellisuk
Fascinating, the part we may get the most out of is "Adopting similar controls
in your organization" - not just Google For Everyone Else.

------
HocusLocus
I read the CIO-level summary!

Woo hoo hoo! Im portant.

------
panarky
_> We want to have confidence that the administrators who run the systems that
access user data cannot abuse their powers._

So "Binary Authorization for Borg" is a defense against getting Snowdened.

~~~
shadowgovt
It's more a defense against getting NSA'd (via the specific threat model of an
attacker secretly replacing a security service with an implementation that
looks very similar but is much easier to crack).

~~~
skybrian
More generally you might say it supports rule of law. If something happens
according to procedure then it's ok.

You might not think that's much of a guarantee, but it beats the alternative
where things happen due to shadow processes.

~~~
mayakacz
I think that's right. I would strengthen that statement slightly: it's about
ensuring that no actor - whether an insider, someone who has stolen their
credentials, or someone who has otherwise compromised them - can
single-handedly perform an action that accesses user data without it being
known to another actor, via access logs, approvals, etc.

In terms of the upstream introduction of a new vulnerability, Binary
Authorization for Borg can only verify that the code was in fact merged. See
the section on third party code, "When importing changes from third party or
open source code, we verify that the change is appropriate (for example, the
latest version)."

Disclosure: I work at Google and helped write this whitepaper on Binary
Authorization for Borg.

