
JWT scope claim compression using a bitmap - rmedaer
https://raphael.medaer.me/2020/05/21/scope-bitmap.html
======
sonofgod
That's a great garden path sentence. (Or at least beautifully ambiguous.)

I initially parsed it as "[The] James Webb Telescope ([which is a tele]scope)
[team] claim [they can] compress [pictures somehow] using a bitmap".

~~~
cpcallen
I've skimmed the article and I'm still not sure what JWT stands for.

~~~
eyelidlessness
At least the first several google results look pretty explanatory to me?

~~~
ernesth
> the first several google results look pretty explanatory to me?

Is the article really about Java Web Toolkit?

~~~
eyelidlessness
Maybe my Google results are especially tailored to my search/browsing history
:( Every result for me was about JSON Web Tokens.

------
shoo
As a casual reader not familiar with the problem of large JWT scopes, I
suggest the strength of the argument for this proposal could be improved by
explaining exactly why larger JWT tokens cause problems in practice, defining
some metrics or benchmarks that can be used to measure the impact of the
problem, then comparing the proposed solution to the baseline approach using
those metrics.

E.g. are large JWT tokens bad because they take longer to send? Then compare
durations between both implementations. Or are large JWT tokens bad because
they exceed some maximum supported token size? Then compare how many scopes
the standard approach and the new approach support before hitting that limit.

Another thought: the proposed compression scheme requires each token producer
& consumer to depend upon a centralised scope list that defines the single
source of truth for which scope is associated with which index, for all
scopes in the ecosystem.
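
For concreteness, here's a minimal sketch of what that shared-index scheme
could look like (the registry contents and encoding details below are my own
invention, not necessarily the article's exact format):

    import base64

    # Hypothetical centralised scope registry: bit index -> scope name.
    # Every token producer and consumer must share this exact list.
    SCOPE_REGISTRY = [
        "profile:read",   # bit 0
        "profile:write",  # bit 1
        "repo:read",      # bit 2
        "repo:write",     # bit 3
        "admin",          # bit 4
    ]

    def encode_scopes(scopes):
        """Set one bit per granted scope, then base64url-encode the bitmap."""
        bits = 0
        for scope in scopes:
            bits |= 1 << SCOPE_REGISTRY.index(scope)  # raises on unknown scope
        n_bytes = (len(SCOPE_REGISTRY) + 7) // 8
        raw = bits.to_bytes(n_bytes, "little")
        return base64.urlsafe_b64encode(raw).rstrip(b"=")

    def decode_scopes(value):
        value += b"=" * (-len(value) % 4)  # restore base64 padding
        bits = int.from_bytes(base64.urlsafe_b64decode(value), "little")
        return [s for i, s in enumerate(SCOPE_REGISTRY) if bits >> i & 1]

    # Two scopes become a 2-character claim value instead of ~23 characters.
    token_value = encode_scopes(["profile:read", "repo:write"])  # b'CQ'
    assert decode_scopes(token_value) == ["profile:read", "repo:write"]

Note that any token minted against one version of the registry silently
changes meaning if the list is ever reordered, which is exactly why that
centralised source of truth matters.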

If we assume we've set that up, why not generalise and say the centralised
shared scope list is actually a centralised JWT compression standard that
defines how to compress and decompress the entire JWT?

This could be implemented as something like
[https://facebook.github.io/zstd/](https://facebook.github.io/zstd/) running
in dictionary compression mode, where a custom compression dictionary is
created that is good at compressing the expected distribution of possible JWT
values. As long as each JWT producer and consumer is using the same version of
the centrally managed compression dictionary the compression & decompression
could still occur locally.
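
As a rough sketch of that idea using the `zstandard` Python bindings (the
sample payloads here are invented; a real deployment would train on an actual
corpus of observed tokens):

    # pip install zstandard
    import zstandard as zstd

    # Train a dictionary on a corpus of representative JWT payloads.
    scopes = ["profile:read", "profile:write", "repo:read", "repo:write", "admin"]
    samples = [
        ('{"scope":"%s","sub":"user%d"}' % (" ".join(scopes[: i % 5 + 1]), i)).encode()
        for i in range(2000)
    ]
    dictionary = zstd.train_dictionary(1024, samples)

    # Producer and consumer each hold the same dictionary version locally.
    compressor = zstd.ZstdCompressor(dict_data=dictionary)
    decompressor = zstd.ZstdDecompressor(dict_data=dictionary)

    payload = b'{"scope":"profile:read repo:read repo:write","sub":"dave"}'
    compressed = compressor.compress(payload)
    assert decompressor.decompress(compressed) == payload
    print(len(payload), "->", len(compressed))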

In practice, in any given JWT ecosystem there's perhaps a subset of scopes
that commonly appear together. If common enough, that entire subset of common
scopes could be encoded in a single bit or less using a custom compression
scheme tuned to the frequency of observed JWT scopes, instead of compressing
each scope independently.

~~~
rmedaer
Hi shoo, thanks for your feedback. At this point this is only a proposal, and
a proposal that needs to be challenged. So thanks again for your useful
feedback.

Indeed, the goal is to limit the size of JWT tokens. I can't tell you yet
whether it really improves performance. I've already started a spreadsheet to
compare the bitmap scope list against the space-separated list, although I
still need some real examples and metrics to do a relevant analysis of the
impact.

One of the questions you raised is about commonly used scopes. Should we
define a shared dictionary/registry? Maybe. I propose we open an issue on
GitHub to discuss that. Here it is:
[https://github.com/rmedaer/rmedaer.github.io/issues/2](https://github.com/rmedaer/rmedaer.github.io/issues/2)

If there is more interest in this proposal, I'd propose creating a dedicated
repository where ~~I~~ __we__ could discuss, compare and challenge it.

Kind regards,

~~~
magicalhippo
> Indeed the goal is to limit the size of JWT tokens.

At work we just implemented some M2M auth using JWT[1]. The other party
requires a full certificate chain as our identification and RS256 as the
algorithm, so our "compact" tokens end up around 8k in size.

At least the auth token we get back lasts a couple of minutes.

[1]:
[https://difi.github.io/felleslosninger/maskinporten_protocol...](https://difi.github.io/felleslosninger/maskinporten_protocol_jwtgrant.html)

~~~
rmedaer
I see that you have a lot of scopes
([https://github.com/difi/felleslosninger/blob/ad9ef79b4fef61f...](https://github.com/difi/felleslosninger/blob/ad9ef79b4fef61f741cdfb296ac5f975167f81bf/_docs/ID-porten/oidc/oidc_protocol_scope.md)).
Especially from 3rd parties
([https://integrasjon.difi.no/scopes/all](https://integrasjon.difi.no/scopes/all),
[https://integrasjon-ver2.difi.no/scopes/all](https://integrasjon-ver2.difi.no/scopes/all)).

Do you have some statistics about that? For instance, do you know how many
scopes are usually requested, on average?

~~~
magicalhippo
Unfortunately not. We're just outsiders, using Maskinporten to get an auth
token to be used against the REST API of some other gov't agency. For that we
use one of two scopes, prod scope or test scope, as they (the agency we talk
to) haven't narrowed it down further yet.

But if you're interested, maybe try contacting the Difi folks running
Maskinporten; from my impression there's a good chance someone there is
willing to share.

Maskinporten is being phased in as the primary M2M auth solution for any
Norwegian gov't agency, so they're bound to get a lot more "users" (agencies),
and hence scopes, going forward.

------
tyingq
I wonder if this is much smaller than using one character claims and regular
http transport deflate/gzip compression.
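
For anyone curious, a quick and unscientific way to measure that (the scope
names below are invented for illustration):

    import string
    import zlib

    # One-character identifiers for 50 granted scopes...
    one_char = (string.ascii_letters + string.digits)[:50]
    # ...versus the full space-separated names they stand for.
    full = " ".join(
        "https://api.example.com/auth/service%d.readonly" % i for i in range(50)
    )

    for label, s in (("one-char", one_char), ("full names", full)):
        print(label, len(s), "->", len(zlib.compress(s.encode(), 9)), "bytes")
    # For comparison, a 50-scope bitmap is ceil(50 / 8) = 7 bytes
    # (~10 base64url characters) before any transport compression.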

~~~
rmedaer
Here I'm talking about the value of one particular claim: `scope`. If you
identify each scope by only one character, you're limited to as many scopes
as your alphabet has characters (e.g. 64 with the base64url alphabet, or 4096
if you allow two characters per scope).

As for claim names, they already aim to be short. For instance, the claims
defined in RFC 7519 Section 4.1
([https://tools.ietf.org/html/rfc7519#section-4.1](https://tools.ietf.org/html/rfc7519#section-4.1))
are only 3 characters long. As explained in the same section:

      "All the names are short because a core goal of JWTs is for the representation to be compact."

------
tehbeard
> Your resources (aka content) ACL should not be in the scope itself

I have to disagree on this one: being able to specify which resources an
OAuth client can tinker with is useful (e.g. only read access to repos x, y,
and z).

I'm also curious how often these use cases genuinely need many scopes / a god
JWT, versus production usage keeping a narrow scope for the task at hand.
There's also the other option (if you're in charge of authoring the resource
server) of having broader scopes that encompass several others.

~~~
user5994461
In the enterprise, one example is when Active Directory groups are put into
the token.

This makes sense because permissions are often managed by groups (read-write,
read-only, user, admin, etc.), so employees can request a specific group to
access some business application(s). This causes issues when an employee
invariably has a hundred groups, adding multiple kilobytes to the token, more
than is permitted by HTTP headers.

~~~
tehbeard
Ah, I haven't had the pleasure of dealing with enterprise AD that convoluted
yet.

------
tlarkworthy
The access token is designed to save size. Scopes are in the ID token. The
disadvantage of the access token is the required back channel. But then this
scheme also needs a shared back channel, so it's pretty much an access token
implementation (you could express it as such).

[https://developer.okta.com/docs/guides/validate-id-tokens/ov...](https://developer.okta.com/docs/guides/validate-id-tokens/overview/)

------
cafxx
How about side-stepping the problem of big lists of scopes by training a zlib
or zstd dictionary on the list of scopes, and then compressing the scopes in
the JWT token using this dictionary?

The obvious benefit is that you can still represent a scope that is not
included in the ones used to train the dictionary (unlike the proposed
approach, which breaks down in this case).

------
abhishektwr
I am curious: why would you not use the OAuth 2 userinfo endpoint, which can
serve a lot more detail and keep the claims in the JWT simpler and more
lightweight?

~~~
quaffapint
If you can just pass around the JWT, you save a network call. I'd say the
size of the JWT matters less than that call.

~~~
abhishektwr
You still have to make network calls to obtain the public key (JWKS) to
validate the token signature, unless you are using shared private keys. With
userinfo you will know whether the token has been invalidated.

I guess it also depends on the use case. If you are in a domain such as
banking with elevated security requirements, then you probably want to hit
the userinfo endpoint; otherwise you can continue validating tokens with
cached or stored keys.

~~~
jepcommenter
You don't pull JWKS on every request

------
quaffapint
Shouldn't HTTP/2 header compression take care of the JWT size, to the point
that this is more work than it's worth?

~~~
rmedaer
Thanks quaffapint for raising this point. To be honest, I hesitated to
address this question in the post. Indeed, HPACK could partially solve the
issue. But as you said, it requires HTTP/2. Btw, HPACK is well explained here:
[https://developers.google.com/web/fundamentals/performance/h...](https://developers.google.com/web/fundamentals/performance/http2#header_compression)

I tried not to talk about the "transport" in this post. Indeed, JWTs can be
used with HTTP/1.1, HTTP/2 or even SIP. Furthermore, HPACK may be disabled in
some cases. Here is what RFC 7541 (HPACK) says about the Authorization header
([https://tools.ietf.org/html/rfc7541#section-7.1.3](https://tools.ietf.org/html/rfc7541#section-7.1.3)):

      "An encoder might also choose not to index values for header fields that are considered to be highly valuable or sensitive to recovery, such as the Cookie or Authorization header fields."

------
amaccuish
This is very similar to SID compression in Windows Kerberos. Funny to see the
same challenges and problems in the web space.

------
compassionate
I would like to see a standardized scopes composer tool to complement this.

------
Ken_Adler
I commend you for the attempt...

But the issue you are trying to mitigate (heavy tokens due to a complex scope
strategy) is a symptom of a bigger problem that has caused OAuth-using folks
to scratch their heads for a long while. (Of course, it also relates to
non-OAuth JWTs.)

Tldr: The new "cloud native" way of solving this is to not push your
"permissions" through the token.

Basically, you limit the scopes included in a token to just a few basic ones
(essentially assigning the user to a "Role" - think RBAC)...

... and then you use a modern Authorization approach (e.g. CNCF Open Policy
Agent) to implement the detailed/fine grain authorization.

It's hella cool, declarative, distributed, and infinitely scalable...

... and it obviates the whole "heavy JWT" issue before it starts....

Source: This is what I do day in day out in my day job....
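
For illustration, a minimal sketch of that pattern, assuming an OPA sidecar
on its default port; the policy path (`myapp/authz/allow`) and the input
shape are made up:

    import requests

    OPA_URL = "http://localhost:8181/v1/data/myapp/authz/allow"

    def is_allowed(claims, action, resource):
        """The JWT carries only a coarse role; OPA makes the fine-grained call."""
        resp = requests.post(OPA_URL, json={"input": {
            "role": claims.get("role"),  # e.g. "editor" -- the slim token claim
            "action": action,            # e.g. "read"
            "resource": resource,        # e.g. "repos/x"
        }}, timeout=2)
        resp.raise_for_status()
        return resp.json().get("result", False)  # missing "result" == undefined

The Rego policy behind that path can then grow arbitrarily detailed without
the token gaining a single byte.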

~~~
cordite
What libraries or services do you recommend using to implement that very
approach?

