
Zanzibar: Consistent, Global Authorization System - acjohnson55
https://ai.google/research/pubs/pub48190
======
iampims
“Zanzibar scales to trillions of access control lists and millions of
authorization requests per second to support services used by billions of
people. It has maintained 95th-percentile latency of less than 10 milliseconds
and availability of greater than 99.999% over 3 years of production use”

Impressive!

~~~
the-rc
"This caching, along with aggressive pooling of read requests, allows Zanzibar
to issue only 20 million read RPCs per second to Spanner." ("Only")

I'm surprised by all the numbers they give out: latency, regions, operation
counts, even servers. The typical Google paper omits numbers on the Y axis of
its most interesting graphs. Or it says "more than a billion", which makes
people think "2B", when the actual number might be closer to 10B or even
higher.

~~~
justicezyx
Kind of needed in the cloud war, where many actually question whether the
mysterious Google supremacy in global infrastructure is really true.

~~~
wbl
Throwing servers at the problem is less impressive than thinking very hard
and solving it with less.

~~~
justicezyx
If your conclusion is "throwing servers at problems" after years of reading
the papers about Google infrastructure, you are probably not an
infrastructure person.

A serious conclusion should be that all this infrastructure enables
application devs and researchers alike "to throw servers at problems". And
this work is exactly the opposite: they spent years, and sometimes even
decades, sweating the details to figure out the most effective and
efficient way of utilizing the servers.

~~~
paulryanrogers
With the demise of Moore's Law and physics education improving, I imagine
efficiency of the machines will eventually overtake developer-time
concerns.

------
wallflower
> There's also a story behind that project name. That is not the original
> project name. The original project name was force-removed by my SVP. Once my
> hands are free again, I can explain

[https://mobile.twitter.com/LeaKissner/status/113663143751427...](https://mobile.twitter.com/LeaKissner/status/1136631437514272768)

~~~
pronoiac
[https://twitter.com/LeaKissner/status/1136691523104280576](https://twitter.com/LeaKissner/status/1136691523104280576)

> Zanzibar was not the original name of the system. It was originally called
> "Spice". I have read Dune more times than I can count and an access control
> system is designed so that people can safely share things, so the project
> motto was "the shares must flow"

~~~
adfm
“The people who can destroy a thing, they control it.”

------
argd678
The distinguishing feature I see compared to other systems is the ACL
ordering and consistency, which is indeed difficult to do at scale. Looks
like Spanner is doing most of the heavy lifting; a great use case for the
database.

~~~
usaar333
Well, even more broadly, it is how generalizable it is, while still
providing ordering guarantees (though not necessarily perfect ones; see my
long sibling post).

Using Windows-style ACEs for ACLs is also perfectly scalable and
consistent (and more performant), so long as users don't end up in too many
groups and objects only inherit ACLs from objects on the same shard. It's
just nowhere near as generalizable as Zanzibar, which allows much more
complex dependencies.

There are always tradeoffs! But this is the best system I've seen for
general ACL evaluation against non-recently-updated objects.

~~~
argd678
I’ve been part of similarly generalized ACL systems, and it’s pretty
straightforward and very similar to Zanzibar. Though we didn’t need n ACLs
and could assume the list wasn’t too long, so we didn’t need a tree. If we
had, then we’d have ended up in a similar place as Zanzibar, I believe;
there are a limited number of ways to solve that problem.

------
victor106
What do other large (non-Google-scale) to medium companies use for
authorization? Can anyone recommend open source (preferably) or
closed-source products?

~~~
jamafu
We use Keycloak at our place and are really happy with it.

Website: [https://www.keycloak.org/](https://www.keycloak.org/)

GitHub:
[https://github.com/keycloak/keycloak](https://github.com/keycloak/keycloak)

~~~
dusty_mc_dusty
Yup, it's quite nice!

------
usaar333
Excellent paper. As someone who has worked with filesystems and ACLs, but
never touched Spanner before, I have some questions for any Googler who has
played with Zanzibar. (in part because full-on client systems examples are
limited)

A check on my understanding: Zanzibar is optimized to handle zookies that
are a bit stale (say, 10s old). In this case, the indexing systems (such
as Leopard) can be used to vastly accelerate query evaluation.

Questions I have (possibly missed explanations in the paper):

1. If I understand the zookie time (call it T) evaluation correctly,
access questions for a given user are effectively "did a user have access
to a document at or after T"? How in practice is this done with a check()
API? The client/Zanzibar can certainly use the snapshots/indexes to give a
True answer, but if the snapshot evaluation is false, is live data used
(and if so, by the client or by Zanzibar itself)? (e.g. how is the case
handled of a user U just gaining access to a group G that is a member of
some resource R?)

2. Related to #1, when is a user actually guaranteed to lose access to a
document (at a version they previously had access to?) E.g. if a user has
access to document D via group G and user is evicted from G, the protocol
seems to inherently allow user to forever access D unless D is updated. In
practice, is there some system (or application control) that will eventually
block U from accessing D?

3. Is check latency going to be very high for documents that are being
modified in real time (so zookie time is approximately now or close to now)
that have complex group structures? (e.g. a document nested 6 levels deep in a
folder where I have access to the folder via a group)? That is, there's
nothing Zanzibar can do but "pointer chase", resulting in a large number of
serial ACL checks?

4. How do clients consistently update ACLs alongside their "reverse edges"?
For instance, the Zanzibar API allows me to view the members of a group
(READ), but how do I consistently view which groups a user is a member of?
(Leopard can cache this, but I'm not sure if this is available to clients,
and regardless it doesn't seem to be able to answer the question for "now"
- only for a time earlier than indexed time).

Or, for a simpler example, if I drag a document into a folder, how is the
Zanzibar entry that D is a child of F made consistent with F's views of its
children?

E.g. can you do a distributed transaction with ACL changes and client data
stored in Spanner?

5. It looks like the Watch API is effectively pushing updates whenever the
READ(obj) would change, not the EXPAND(object). Is this correct? How are
EXPAND() changes tracked by clients? Is this even possible? (e.g. if G is a
member of some resource R and U is added to G, how can a client determine U
now has access to R?)
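To make (3) concrete, here's the kind of naive serial evaluation I'm
worried about. This is purely my own sketch; the names and data layout are
hypothetical, not the actual Zanzibar algorithm:

```python
# Hypothetical illustration of the "pointer chase" in question 3: with no
# usable index or cache, membership checks walk the nesting chain serially.
def has_access(user, resource, parents, members):
    """parents maps object -> containing folder (or None);
    members maps group/folder -> set of direct members."""
    node = resource
    while node is not None:
        # One "ACL read" per level of nesting: a document 6 folders deep
        # costs up to 7 serial lookups before group expansion even starts.
        if user in expand(node, members):
            return True
        node = parents.get(node)
    return False

def expand(group, members, seen=None):
    """Recursively expand nested group membership (more serial reads)."""
    seen = seen if seen is not None else set()
    out = set()
    for m in members.get(group, set()):
        if m in seen:
            continue
        seen.add(m)
        out.add(m)
        out |= expand(m, members, seen)
    return out
```

Every level of folder nesting and group nesting adds another round trip
that can't be parallelized away, which is the latency concern.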

~~~
ruomingpang
Excellent questions. We have encountered all of them in practice and have
solutions for most of them, e.g., (4) requires an ACL-aware search index.

Unfortunately we don't have enough space to explain them in the paper. Please
consider coming to Usenix. :-)

------
eximius
Semi-off topic: What is the latest and greatest in authorization mechanisms
lately?

I like capability-based security at the OS level, but sadly I'm not doing
anything that interesting. For things like webapps, is there anything
better than ACLs or role-based access control? Or at least any literature
talking about them? It's probably overkill for the application I work on,
but it'd be nice to take inspiration from best practices.

------
ubercow
Semi-off topic but is there a curated "best of" list for systems papers like
this that anyone knows about, from Google or otherwise?

~~~
bretthardin
If you find this, please let me know.

------
zeeed
I got stuck on the first line in the abstract:

> Determining whether online users are authorized to access digital objects is
> central to preserving privacy.

Can someone dissect that sentence and explain why that is? I honestly fail to
make the connection.

~~~
Dowwie
Replace "digital object" with "a PDF of your checking account transactions for
2018". You want to control who can do what with that PDF. Your privacy is at
stake.

~~~
zeeed
Sure, that’s privacy in the sense of “no one can access my stuff,
unauthorized”.

I struggled with the sentence because, at the same time, creating one
global centralized authentication source creates the opposite of privacy
in the sense of anonymity. Certainly OT wrt the actual content of the
work...

~~~
BitPolice
I may be misunderstanding the issue you're pointing out here... but I note
that while the paper/sentence talks about "authorization", you're talking
about centralized "authentication."

As an authorization system, Zanzibar focuses on: can agent A (identified
through some means) perform action X on object Y? It isn't about deciding
whether an arbitrary actor is agent A, but prescribing what actions agent
A can perform against the universe of all possible objects (which likewise
are referenced abstractly and not stored within the system itself).

The knowledge that A could do X on Y is information that might be disclosed
(and thus entails some privacy risk)... but inherently doesn't reveal:
anything about the identity of A; whether A has ever done X; or what Y's
contents are or what it represents.

On the other hand, perhaps you mean that because membership in sets of users
is also stored within it (via a sort of "is member of" permission) you can use
that to de-anonymize who a given actor is. This might work, but it assumes
you can uniquely derive which agent from a set of abstract agents
represents that individual _and_ that you extrinsically know something
about the person being the only person in this specific set of sets.

------
sb8244
How would you deal with questions like "provide all content accessible to a
user" in a system like this? Would you watch and replicate to your own
database?

~~~
ruomingpang
You will need an ACL-aware index, which is one of the main use cases of
Zanzibar.

~~~
sb8244
Do you get the ACL from Zanzibar to your data store using a watch?

I'm just confirming that replication is the best strategy and that there's
not some magic I'm not aware of.

------
1023bytes
Why is it called Zanzibar, though? I'm kind of intrigued.

~~~
pronoiac
There's another thread about naming it -
[https://twitter.com/LeaKissner/status/1136691523104280576](https://twitter.com/LeaKissner/status/1136691523104280576)

The original name was Spice, which was nixed by a higher-up; they went
with Zanzibar, one of the Spice Islands.

~~~
ddebernardy
Odd. Zanzibar is off the coast of Tanzania in East Africa. The Spice Islands
(the Moluccas) are in Eastern Indonesia.

~~~
pronoiac
I was going by the twitter thread, but I looked and found this in Wikipedia:

> the Zanzibar Archipelago, together with Tanzania's Mafia Island, are
> sometimes referred to locally as the "Spice Islands" (a term borrowed from
> the Maluku Islands of Indonesia).

[https://en.wikipedia.org/wiki/Zanzibar](https://en.wikipedia.org/wiki/Zanzibar)

------
cryptonector
This reminds me I need to get my authz paper published, and sooner rather
than later...

I've built an authz system around labeled security and RBAC concepts.
Basically:

      - resource owners label resources
      - the labels are really names for ACLs in a directory
      - the ACL entries grant roles to users/groups
      - roles are sets of verbs

There are unlimited verbs, and unlimited roles. There are no negative ACL
entries, which means they are sets -- entry order doesn't matter. The whole
thing resembles NTFS/ZFS ACLs, but without negative ACL entries, and with
indirection via naming the ACLs.

ACL data gets summarized and converted to a form that makes access control
evaluation fast to compute. This data then gets distributed to where it's
needed.

The API consists mainly of:

      - check(subject, verb, label) -> boolean
      - query(subject, verb, label) -> list of grants
        (supports wildcarding)
      - list(subject) -> list of grants
      - grant(user-or-group, role, label)
      - revoke(user-or-group, role, label)
      - interfaces for creating verbs, roles, and labels,
        and adding/removing verbs from roles.

Note that access granting/revocation is done using _roles_, while access
checking is done using _verbs_.

What's really cool about this system is that because it is simple it is
composable. If you model certain attributes of subjects (e.g., whether they
are on-premises, remote, in a public cloud, ...) as special subjects, then you
can compose multiple check() calls to get ABAC, CORS/on-behalf-
of/impersonation, MAC and DAC, SAML/OAuth-style authorization, and more. When
I started all I wanted was a labeled security system. It was only later that
compositions came up.

Because we built a summarized authz data distribution system first, all the
systems that have data will continue to have it even in an outage -- an outage
becomes just longer than usual update latencies.

check() performance is very fast, on the order of 10us to 15us, with no global
locks, and this could probably be made faster.

check() essentially looks up the subject's group memberships (with the
group transitive closure expanded) and the {verb, label} pair's direct
grantees, and checks whether the intersection is empty (access denied) or
not (access granted). In the common case (the grantee list is short) this
requires N log M comparisons, and in the worst case (the two lists are
comparable in size) it requires O(N) comparisons. This means check()
performance is naturally very fast when using local authz data. Using a
REST service adds latency, naturally, but the REST service itself can be
backended with summarized authz data, making it fast. Using local data
makes the system reliable and reliably fast.
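As a sketch of that intersection (my names, not the real API):

```python
import bisect

# Hypothetical sketch of the check() described above: intersect the
# subject's expanded memberships with the {verb, label} pair's direct
# grantees. Both inputs are collections of principal ids.
def check(subject_closure, grantees):
    # Binary-search the larger sorted list for each element of the
    # smaller one: ~N log M comparisons when the grantee list is short.
    small, big = sorted([list(subject_closure), list(grantees)], key=len)
    big = sorted(big)
    for s in small:
        i = bisect.bisect_left(big, s)
        if i < len(big) and big[i] == s:
            return True   # non-empty intersection: access granted
    return False          # empty intersection: access denied
```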

query() does more work, but essentially amounts to a union of the subject's
direct grants and a join of the subject's groups and the groups' direct
grants.
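A toy version of that union/join (again, hypothetical names):

```python
# Hypothetical sketch of query(): union the subject's direct grants with
# the grants of each group the subject belongs to.
def query(subject, groups_of, direct_grants):
    grants = set(direct_grants.get(subject, set()))
    for g in groups_of.get(subject, set()):   # join on group id
        grants |= direct_grants.get(g, set())
    return grants
```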

Special entities like "ANYONE" (akin to Authenticated Users in Windows)
and "ANONYMOUS" also exist, naturally, and can be granted to. These are
treated like groups in the summarized authz data. We also have a "SELF"
special entity which allows one to express grants to any subject who is
the same as the one running the process that calls check().

~~~
galaxyLogic
Cool. Keep us posted

------
GMLOOKO
A

------
sonnyblarney
What's interesting to me here is not the ACL thing; it's how, in a way,
'straightforward' this all seems to be.

It's the large-scale architecture of a fairly basic system, done, I
suppose, 'professionally'.

I'm curious to know how this works organizationally. What kinds of
architects were involved? Because this system would have to interact with
any number of others, how do they do requirements gathering? Do they just
'have experience' and 'know what needs to be done', or is this something
socialized with 'all the other teams'?

And how many chefs are in that kitchen once the preparation starts?
Because there are clearly a lot of pieces. Do they have just a few folks
wire it out and then check with others? Who reviews designs for such a big
thing?

Or was all of this developed organically, over time?

~~~
the-rc
Especially at Google, you first see the same problem appearing and getting
solved in multiple products, then someone tries to come up with a more generic
solution that works for most projects and, just as importantly, can serve more
traffic than the existing solutions. Having to rewrite things on a regular
basis because of growth is painful, but can also be a blessing in disguise.

Who that someone who works on the generic solution is can vary. Sometimes
it's one or more of the teams already mentioned. Sometimes, like in this
case, it's someone with expertise in related areas who takes the
initiative. And a project of this scope invariably gets reviewed on a
regular basis by senior engineers, all the way up to Urs (who leads all of
technical infrastructure). Shared technologies require headcount not just
to design and write the systems, but also to operate them (by SREs when
they're large enough), so you need to get upper management involved as
well.

~~~
sonnyblarney
This project says way more about the organization than any specific technical
competence.

I'm not close to Google, but from those I know on the product side it can
be 'a Gaggle' with nobody really in charge ... but I guess if you have
enough self-motivated, conscientious actors, and mature people, without
ugly turf wars, who can have reasonable discussions, and responsible
enough people in charge who can steer things in an appropriate direction
... it works.

But the fact that this is an evolution and not a 'new product' is probably
a prerequisite - so many smart people are hard to corral around new ideas,
but if it's been done A, B, C times, then a 'Z' solution speaks to an
engineer's sense of efficiency, and it should be natural for such an org
to want to do it.

I won't name names, but I worked at a large tech company that could not get
'Single Sign On' to work. It was really frustrating to think so many
reasonably smart people couldn't figure that out.

We don't need genius, I think; just a wealth of experience and a lot of
common sense.

------
colesantiago
I love reading about Google's systems, but I wish I could work on those
problems at scale; that is my dream, really. I wonder what other systems
Google has that we don't know about.

I know Borg inspired what we know as k8s, but surely there must be more
things that Google has made internally that are not open source.

Curious about this and would like to know more about it from anyone in the
trenches at Google.

~~~
gregorygoc
The harsh truth of working at Google is that, in the end, you are moving
protobufs from one place to another. They have the most talented people in
the world, but those people still have to do some boring engineering work.

~~~
duality
What is the right data format to move around? JSON?

~~~
Xorlev
It's just a saying. All we do is move protos from one service to another.

JSON is definitely not the right stuff.

~~~
dmoy
The encoding/decoding cost is painful :(

I mean, in this context, if you're doing that level of scale. For a lot of
purposes JSON is totally fine.

~~~
isatty
It's really not, compared to the wire cost, static type checks, and loads
of other stuff you give up.

------
nippoo
As a side-note: 95th percentile latency statistics are pretty meaningless at
this scale. With a million requests per second, a 95th percentile latency of
10ms still means that 50,000 requests per second are slower than that.
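For anyone checking the arithmetic:

```python
# Requests per second falling above a given latency percentile.
def slower_than_percentile(qps, percentile):
    return round(qps * (1 - percentile / 100.0))
```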

~~~
jsty
They do give p99 latencies in the table on page 10

~~~
ktta
p99: < 20ms

p99.9: < 90ms

That is amazing.

~~~
alexeldeib
This is absolutely incredible. Since we saw login with Apple yesterday, it
makes me wonder if any of the other big companies can compete with this.
Curious about Facebook/Netflix/Amazon.

Netflix seems zippy, but I've never looked at the request timings, which could
differ pretty dramatically from UI load times. I imagine Google also dwarfs
their login scale. Would be interesting to see numbers capturing full load
time from clicking the login UI to successful redirect (or however you would
measure this without including the time of the page load post-login).

~~~
lclarkmichalek
This isn't a login service, this is an ACL service. Related space, but
different concerns. You wouldn't send a user's password here to find out if
it's correct (authentication), you'd use this to figure out if a user can do
something once you know who they are (authorization) :)

Also, generating the login page etc is often more expensive than the actual
'validate the username and password'. Getting to the server is also going to
dwarf these latencies; you probably don't store all your passwords in PoPs, so
you need to make the full trek to your local Google datacentre to complete a
login :)

~~~
alexeldeib
Awkward. I realized it was used for authz, but for some reason I assumed it
would be used for authn as well. Now I’m wondering how Google does authn...

And yeah, the second half of my comment is trying to scope down the comparison
to one that is reasonably “fair”

~~~
scottlamb
> Now I’m wondering how Google does authn...

That's my corner of Google. We haven't published anything comparable to this
paper in the time I've worked on it (maybe we could—I'm pleasantly surprised
to see the Zanzibar folks got approval to share qps numbers and everything)
but here's a bit about how it worked back in 2006:

[https://www.usenix.org/legacy/event/worlds06/tech/prelim_pap...](https://www.usenix.org/legacy/event/worlds06/tech/prelim_papers/perl/perl_html/gaia-worlds.html)

Some of that still applies.

fwiw, while we do our fair share of password checking, we do a _lot_ more
oauth token and cookie checking. Most folks just stay signed in on both mobile
and web, so no need to recheck their passwords. In contrast, session
credentials get checked on every request.

------
demarq
Not sure how I feel about adopting a country's name for a project.

Or, more to the point, I'm not sure how I would feel if, every time I
searched my country's name on the web, this Google project appeared rather
than my actual country.

i.e. Zanzibar is a national identity, not just a "spice island"

~~~
NameOfTeam
To be clear, Zanzibar is neither a country nor a national identity. It’s a
semi-autonomous region of Tanzania.

------
stingraycharles
Am I alone in thinking that 99.999% measured availability, for a service
so completely in the critical path for almost everything, is relatively
low?

Phrased another way: when it is not available, do end users experience
service disruption, and if not, how is that mitigated?

~~~
gtirloni
I think you might be alone. 5.26 minutes of downtime per year is beyond
excellent for any moderately complex system.

~~~
cheez
I have a simple system that depends on another system, and I can't keep it
up for a week without 15 minutes of downtime.

