API Tokens: A Tedious Survey (fly.io)
387 points by enobrev 33 days ago | 120 comments

Since the author is on here - reckon you could beat up on simple random tokens a _little_ bit more? In particular how easy they are to identify and prevent leaking (easily fixed by adding a prefix).

I work on secret scanning at GitHub. When token issuers use easily identifiable formats for their tokens we can easily spot them when they're accidentally committed. We can then work with the token issuer to automatically alert them of those leaks. A good example is AWS - if you commit an AWS key and secret to a public GitHub repo we will tell AWS about it and they will tell you about it (and quarantine the exposed keys) within a few seconds. We work with dozens of other token issuers too, though - some of the latest we added were Linear, PlanetScale and Ionic.

The above relies on tokens being identifiable - we can't send hundreds of partners everything that looks like 32 hex chars. In future we want to be able to do even more sophisticated things, like ask users for confirmation before they push code that contains secrets. We recently changed our own token pattern for that reason.

GitHub secret scanning program: https://docs.github.com/en/developers/overview/secret-scanni...

GitHub's updated token format: https://github.blog/2021-04-05-behind-githubs-new-authentica...

I bring this up every time this is mentioned, but I really wish the API token format included a domain to notify in case of leaks. Having services register with your in-house secret scanning system works very well if you're GitHub, but otherwise it's a very closed mechanism.

If sendgrid tokens were `secret:sendgrid.com/91on9SIkbUfSs` instead of `SG.91on9SIkbUfSs`, or Amazon keys looked like `amazon.com/JGUIERHT` instead of `AKIAJGUIERHT`, we wouldn't need a database of regexes and endpoints to report secret leaks.
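A prefix costs nothing in entropy, since all the secret material still comes from the random part. A minimal sketch of minting such a token (the `secret:<domain>/` layout here is just the hypothetical format from this comment, not any real standard):

```python
import secrets

def mint_token(domain: str, nbytes: int = 20) -> str:
    """Mint a random bearer token with an identifiable, non-secret prefix.

    All the entropy comes from token_urlsafe(); the prefix only exists
    so scanners can recognise the token and know where to report leaks.
    """
    return f"secret:{domain}/{secrets.token_urlsafe(nbytes)}"

token = mint_token("example.com")
```

The random suffix is exactly as strong with or without the prefix, which is why this is such a cheap change for issuers.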

See also the last time I ranted about this: https://news.ycombinator.com/item?id=26568651

Appreciate your passion here Remi. I don't think a full standardisation of API token formats is ever likely, but I do think there's value in nudging things in that direction.

One big challenge is that it's hard to get service providers to change their token formats. Very few have this at the top of their priority list - they're busy with other things. Here's an example playing out in OSS that is pretty typical: I tried to persuade the (excellent) team at Sentry to update their format, and they essentially told me "we have other priorities" https://github.com/getsentry/sentry/pull/26313. And that's a relatively simple change, not the adoption of a whole standard.

In addition, as Thomas points out in this article, there are a lot of different token types for someone thinking about minting API tokens to choose between. They might rationally have different preferences over them. A standard that is prescriptive of format and approach is likely to struggle given that diversity.

With that said, I do see an opportunity here for a more modest standard targeted at service providers that already use JWTs or Macaroons. Generic tokens of those types are relatively easy for scanning providers to identify, and it's easy (and hopefully uncontroversial) for service providers to encode more information in them, like an "if found" link. I think a standard that defines the attribute name there, and the API for reporting / responding, would be a good start that might see adoption.

> I don't think a full standardisation of API token formats is ever likely

I thought so too but then https://www.rfc-editor.org/rfc/rfc8959.html came out this year.

Importantly I am not proposing a big change at all: The tokens can stay exactly the same (in the database and crypto code), you can still use UUID or Macaroons or JWT, you only change the frontend to add this prefix. Apologies if this wasn't clear in the two examples I posted without explanations. The benefits would also be a bit higher than the PR you reference, which seems to help with scanning on GitHub (you mention that it would already work without the change).

As you note in your PR, many tokens are already identifiable, so standardizing a way to put the reporting domain in there shouldn't reduce security (by obscurity).

Taken a step further, the secret should just be a URL that revokes itself - like https://revoke.sendgrid.com/91on9SIkbUfSs. GitHub should then just make a GET request to every URL (you can whittle abuse down a bit by requiring `https://revoke.sendgrid.com/robots.txt` to have a `Revoke: YES` section). They and anyone else could maintain an allowlist of revocation URLs to pattern match as well. This makes a global registry unnecessary, and standardises the act of revocation.

"Woops. I hit ctrl+enter instead of ctrl+c while copying my secret. Guess production's down for a bit while we roll new ones!"

I mean your core idea is decent but that's just really funny.

There's some amount of practicality being lost if your secrets start growing massively. There's also potentially restrictions to what you can put in them, and a prefix with an underscore or colon might be easier than something that has slashes in it.

Your idea would probably live best as a queryable DNS record on the domain in question. Or a standard subdomain, or even a .well-known path.

> Woops

Could require a POST or maybe even HTTP DELETE to guard against woopsies. The latter has semantic niceness about it.

The alternative (as in, current reality) is "Whoops. I hit ctrl+enter while copying my secret and no-one noticed for a month. Guess all our data is leaked now!"

It's also up to each provider what actually happens when the "revoke" action is triggered. Maybe they just warn you immediately, which is still better than nothing.

But if it's not a URL, Ctrl-Enter won't do anything.

If it is a URL, it opens in a web browser, and now you have problems.

Have shorter lived secrets. Also, not sure we should be solving for the "ctrl c" of secret use case tbh.

What I had in mind is posting them to <domain>/.well-known/report-leaked-secrets or a location looked up from the domain using DNS. Making them URLs is an interesting idea, but they are likely to look awkward (e.g. include "revoke" like your example) and get a lot of non-revocation traffic (even if we have a way to tell scanning apart from actual revocation requests, we'd probably rather only get the revocation traffic).
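For what it's worth, the scanner side of that scheme is tiny. A sketch, assuming the hypothetical `secret:<domain>/...` token format and the (equally hypothetical, unregistered) `/.well-known/report-leaked-secrets` path:

```python
def report_url(token: str) -> str:
    """Given a domain-prefixed token like 'secret:example.com/91on9SIkbUfSs',
    build the well-known endpoint a scanner would POST the leak report to.
    Both the token format and the path are hypothetical, not a standard.
    """
    if not token.startswith("secret:"):
        raise ValueError("not a domain-prefixed token")
    domain = token[len("secret:"):].split("/", 1)[0]
    return f"https://{domain}/.well-known/report-leaked-secrets"

url = report_url("secret:sendgrid.com/91on9SIkbUfSs")
```

Building the URL is the only format-dependent part; no regex database or partner registry needed.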

My professional email also mangles URLs, turning them into urldefense.proofpoint.com/... Such solutions are sure to interfere with tokens looking like URLs.

Maybe don't let anyone delete other people's tokens, even if leaked, but automatically alerting the admin if anyone accesses the "URL" would probably be a good option.

That wouldn’t be idempotent.

Also, you couldn’t send such a secret to, say, Gmail.

It would be, in that repeatedly hitting the URL has no further effect beyond disabling the token the first time.

But yeah, auto-link followers will invalidate them immediately. There's a case to be made for that being a good thing, but don't want to get into that.

I agree with you. Someone brought this up on Twitter and I'm kicking myself for not remembering to include the notion of adding identifiable markings to sensitive tokens (I'd do it now, but I'd feel like I was plagiarizing).

And it's a noodly and somewhat incoherent notion of "safety" I'm using here, because of course, random tokens are unconstrained bearer tokens --- authenticated requests, CATs, Macaroons, and Biscuits all address that weakness. I'm biased by my concern over cryptographic implementation mistakes.

It's a neat property of Macaroons (and maybe Biscuits) that you can come up with sane configurations where checking a Macaroon into source code can be, if not totally safe, at least not a major incident. I wish I'd thought of that, too, since I think "checking the token into source control" is a more vivid example than "emailing tokens" or "passing them around".

Surely you needn't feel like you're plagiarizing, just give credit. The credit won't be any less deserved next week or next year, even if you "Know" you'd definitely have fixed this without prompting, nobody else knows that, and so it looks like you avoided giving credit where it was due, so, don't do that.

Nobody's asking for royalties, so "Shout out to @SomeoneOnTwitter for reminding me" is enough.

I will probably get around to doing that. If it was an error in the post I'd feel differently, but this is just another good idea I forgot to include.

Thanks for doing the good work of encouraging people to mint better tokens!

tbh I actually do think this is an error. I worked somewhere where we didn't do this and I think of it as a bug rather than a missing feature. It makes it very difficult to identify the token if it's ever leaked without checking it against our database!

Macaroons are great. I was using them successfully in a (now dead) product. Very easy to reason about once you get the basics. Unfortunately the ecosystem is small, and the effort spent on getting colleagues on board (i.e. convincing them you're not using some fringe thing) was substantial and ongoing.

I think part of the problem with Macaroons is the belief that there should be an ecosystem of them, and a standard, and standard libraries. They may work best when they're custom tailored to the applications that really want them.

> In future we want to be able to do even more sophisticated things, like ask users for confirmation before they push code that contains secrets

If you all have thought about it, do you imagine you'd only warn in the presence of some generic token identifier, like `secret-token` a la https://datatracker.ietf.org/doc/html/rfc8959 ? Or, would you be able to warn on everything that matches the regular expressions your partners give you to identify their API tokens?

The latter. Our objective for secret scanning is to prevent as many serious secret leaks as possible. Where a service already has a token format that is highly identifiable we want to take advantage of that, rather than rely on the adoption of generic token identifiers.

That's for the exact same reason (obvious identification and easy scanning) that I proposed to prefix all Scaleway's access keys with `SCW`.

See: https://blog.scaleway.com/strengthening-scaleway-token-secur...

This is great.

One thing that's worth remembering about randomly generated tokens is that it's important to always use safe comparison methods when comparing them to the stored one - otherwise you could be vulnerable to timing attacks.

In Python you can use secrets.compare_digest(a, b) for this: https://docs.python.org/3/library/secrets.html#secrets.compa...

I am wondering how that relates to searching for the token in a database (index). Does it still matter? I would assume the time might depend on where, or even whether, the token is in the index, though I'm not sure how much, or what to do about it.

Yes, it matters - timing attacks could absolutely work against an indexed column in a database.

The trick I've used for this is to have tokens that look like this:

    234523:7a561002f780e23853bdfbd89ae79bf2

Then you have database entries like this:

    id = 234523
    token = 7a561002f780e23853bdfbd89ae79bf2

When a token comes in you use the bit before the : to look up the database record by its indexed ID, then perform a safe comparison between the rest of the token and the value you retrieved from the database.
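A sketch of that scheme in Python (the dict stands in for the indexed table; the id and token values are the ones from the example above):

```python
import secrets

# Toy in-memory stand-in for the indexed database table.
TOKENS = {234523: "7a561002f780e23853bdfbd89ae79bf2"}

def check_token(presented: str) -> bool:
    """Split 'id:random' tokens: the id part does the (timing-unsafe)
    index lookup, the random part gets a constant-time comparison."""
    try:
        token_id, token_secret = presented.split(":", 1)
        stored = TOKENS[int(token_id)]
    except (ValueError, KeyError):
        return False
    return secrets.compare_digest(token_secret, stored)
```

The index lookup can leak timing freely because the id carries no secret; only the random half needs the safe comparison.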

I’d rather store an HMAC of the token.

That way you’re hashing the user input, which sanitizes it. It also protects against timing attacks.*

And most importantly it protects against credentials leaking. Even insiders who can read a token from the database can’t use it to authenticate, because the app logic expects a pre-image of that value.

*HMAC resists timing attacks because the attacker can no longer control the bit pattern on either side of the equality check.
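A minimal sketch of that approach (the key material here is made up; in practice the HMAC key comes from a KMS or environment variable, never the database itself):

```python
import hashlib
import hmac

SERVER_KEY = b"keep-me-in-a-kms-not-the-database"  # made-up key material

def token_digest(token: str) -> str:
    """Only this HMAC is persisted; the raw token never touches the DB."""
    return hmac.new(SERVER_KEY, token.encode(), hashlib.sha256).hexdigest()

stored = token_digest("91on9SIkbUfSs")  # at issuance time

def check(presented: str) -> bool:
    # Per the footnote, even plain == would resist the timing attack here,
    # since the attacker controls neither bit pattern after the HMAC;
    # compare_digest costs nothing extra, though.
    return hmac.compare_digest(token_digest(presented), stored)
```

An insider reading `stored` out of the database can't authenticate with it, because the app expects the pre-image.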

So basically passwords

Tokens are basically passwords, so yes.


Semantically these are very different objects/concepts


Passwords:

- tightly couple authentication (I am kortex) and authorization (I can do everything logged-in kortex can do)

- not meant to be rotated unless breached (forced password rotation mandates weaken passwords, imho)

- are passwords meant for human transport, if even briefly

- hard to revoke (once expired, still valid until user rotates)

- usually insufficient entropy, dependent on user habits

- one flavor: a secret string


Tokens:

- decouple authentication and authorization

- ephemeral

- revokable

- enable automatic rotation (session tokens vs auth tokens)

- can have metadata

- as much entropy as you want

- many flavors

Give me a break -- we're just arguing semantics at this point. My whole premise was that tokens are a secret, just like passwords are a secret, and that tokens should be treated like a first-class secret. Not that they're absolutely "the same." The article gives the illusion that Simple Random Tokens are safe to store as plain text. I was arguing that they're not, and that the token should be run through an HMAC function at the very least to prevent plain text storage and to harden against timing attacks when querying. Nowhere was I implying that passwords and tokens are strictly identical in all aspects. I've worked at places where tokens are stored in plain text (i.e. almost everywhere I've worked) and it's a bad practice.

Passwords are hashed because users constantly reuse them between sites. If they didn't do that there would be no reason to hash them.

I thought salting+hashing was primarily to mitigate damage in the event of someone scooping the credentials data store. And/or bruteforcing.

If there were one login system/site in the whole world, you'd still want to hash/encrypt them.

I mean hashing is one of the easiest things to do to add a layer of security, so even if the db is encrypted, why not? Maybe I'm missing your point.

No, passwords are salted because they are reused...

This is not true at all. This whole discussion is pretty confused. All of the mitigations we apply to passwords are premised on passwords being (1) cryptographically weak, (2) irrevocable, and (3) infectious to other systems. Nerds are, for reasons I will never understand, hyperfixated on "salting". Randomized password hashes primarily address an old attack that is more or less irrelevant to any modern hash or KDF. But that attack, too, mattered because (1), (2), and (3) hold for passwords.

Ah ok I see what you are saying. My bad.

Totally agree with your main point of "tokens, like passwords, should be kept secret."

Tokens are secrets.

Passwords are secrets.

Ed25519 private keys are secrets.

We hash passwords.

Thus we must hash tokens.

Should we therefore hash our Ed25519 private keys?

> Should we therefore hash our Ed25519 private keys?

No, but I'm of the opinion that you should encrypt private keys before storing them in a database.

What about the key that decrypts the private keys?

Encrypt that one too! Chelonia all the way down.

Isn't there a weird attack possible with publishing SHA512 hashes of Ed25519 private keys that Signify barely side-stepped?

Not really, no.

You wrote that entire blog post ... and you disagree that tokens are essentially passwords? Why so?

You might find the answer in the post.

I read the entire post. Still not following. Perhaps just a misunderstanding between us?

Same here. These are all examples of the same things. If you get your hands on the secret, you can do anything.

The only thing the post explains is that there is a difference between checking a secret against the database vs message validation, also based on a secret. But for both, if you have the secret, you're doomed.

Expiration / immediate revocation when doing message validation is more difficult than having a distributed cache, which supports key eviction/expiry/removal.

Also, an asymmetric solution has similar problems: if you get the private key from the client, you're doomed. So it does protect the server part, but if people have access there, you're even more doomed. But at least it'll be easier to identify who had a security leak.

I think the difference may be in the "They’re easily revoked and expired" part.

Passwords are usually one-per-resource, e.g. a user has a single password. If that password is compromised you can reset it, but all the consumers that used it will need the new credential.

Whereas tokens are typically one-per-consumer, so if one is compromised you can revoke/expire just that token, without affecting other consumers.

Same with expiry times -- you can set that on a token without it expiring all access to that resource.

Not sure if that's what the author had in mind, but it's a difference in how these things are often used (even if fundamentally they are otherwise very similar).

Part of the problem here is that these discussions mean a bunch of different things when they draw the equivalence between passwords, tokens, and keys. Sometimes we're talking about the secret storage problem; sometimes we're talking about the bearer token problem; sometimes we're talking about brute-force, human-recall, and reuse problems.

Frustratingly, different tokens have different levels of exposure to all of these problems. A well-confined Macaroon has essentially none of the secret storage problems of a password. An API request authenticating key has none of the bearer token problems. No API token has the human-recall, reuse, and brute-force problems.

People hyperfixate on the secret storage problem, I think because they feel like they can get their heads around it. As you can probably tell, I'm loath to relitigate the debate about whether we should store passwords with secret salts. Even when the discussion is legitimately focussed on that problem, I find discussions about it go to weird, incoherent places that aren't really informed by real threat models, and are really hair-splitting arguments about the security of environment variables vs. the security of filesystems, just dressed up as top-3 security considerations.

Even for trivial basic-auth API keys, the secret storage problem is not the same as that of a password. Part of the problem with passwords is that they effectively cannot be revoked; "revoked" passwords creep back into systems, and, worse, infect other systems. That's not how API keys work; the threat model is different and so are the countermeasures that are profitable to deploy for them.

Another thing that I feel happens in discussions like this is that there's no notion of cost-benefit. There is option A, and option B, and option B is on some axis superior to option A, even if that benefit is marginal. Since there's no engineering cost to consider, there's no meaningful discussion. But in reality, there's always a cost: deploying a countermeasure at a minimum incurs the opportunity cost of not deploying some other countermeasure that could have been built with the same (finite) engineering resources. Since these discussions always seem to get mired in secret-salt double-hashing hmac(hmac(root-secret, measurement_1 || measurement_2), secret) stuff, I hope you can see the concern: the discussion essentially advocates for ever-more-marginal wins that "fit" the message board thread, rather than seriously considering the real problem.

But of course, the other problem is that the arc of the thread bends towards HMAC'ing your Biscuit token. "Tokens are just passwords", after all.

At any rate: this thread says, "passwords are secrets, tokens are secrets, ergo passwords are tokens". Signing keys are also secrets. Nobody HMACs them. Something is wrong with the syllogism. I'm not that interested in picking apart what.

Serious question - what makes the database serialization not vulnerable to timing attacks in the same vein? I wouldn't expect those to be purely constant time implementations.

You do the final comparison outside of the database after retrieving the stored value - I use the Python secrets.compare_digest() function for that.

> I am wondering how that relates to searching for the token in a database (index). Does it still matter?

It can, but the practicality of exploiting this timing leak isn't at all a settled issue.

Previously, https://soatok.blog/2021/08/20/lobste-rs-password-reset-vuln...

Run the token through a secure HMAC function before storing it in the DB. Problem solved. KISS. :)

Why would something derived from a random string have better comparison properties than the random string?

You can’t perform a timing attack for a token “foo” in SELECT WHERE token = :token if the value stored in the DB is the HMAC of “foo”. E.g. trying “f” and then “fo” produces two entirely different, random-looking values from the query’s POV. The attacker could never deduce that the correct token is “foo.”

Could you elaborate on what you mean by timing attacks? What would go wrong if you did a basic params[:token] == dbvalue ?

Most naive string comparison methods (including language expressions like the above) will compare one character at a time until a mismatch is found. This allows an attacker to build up the correct password one character at a time, because a correct character takes longer to check than an incorrect one.
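The leak is deterministic even before you measure wall-clock time: an early-exit comparison simply does more work when more of the guessed prefix is right. A toy illustration (counting comparisons instead of timing them):

```python
def naive_compare(a: str, b: str) -> bool:
    """Early-exit comparison, like ==: stops at the first mismatch,
    so the amount of work done leaks the length of the matching prefix."""
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False
    return True

def chars_examined(secret: str, guess: str) -> int:
    """Count character comparisons before naive_compare would bail out."""
    n = 0
    for x, y in zip(secret, guess):
        n += 1
        if x != y:
            break
    return n

# A guess sharing a longer prefix with the secret does strictly more work:
secret = "7a561002"
work_good_prefix = chars_examined(secret, "7a999999")  # prefix "7a" matches
work_bad_prefix = chars_examined(secret, "00000000")   # mismatch at char 1
```

`secrets.compare_digest` avoids this by always examining every character, so the work done is independent of the matching prefix.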

But in the "real world" wouldn't any such time difference be very small and fluctuations in network delay and measurement error so large in comparison that this isn't typically exploitable?

Sure, better be safe and use a safe comparison method and that's simple enough, but still, how realistic is such an attack over the public internet?

I'm aware of "practical" results: http://crypto.stanford.edu/~dabo/papers/ssl-timing.pdf

But the experiment setup was on a LAN, with AFAICT a single server, no other traffic to that server, and other such conditions.

The trick with timing attacks is that you don't measure the time taken for a single request. You send the same request thousands of times and take an average of the response time, which lets you pick away at the secret one character at a time.

Sure, that's what you'd attempt in order to address errors, but that still doesn't convince me. That only gets you so far. My intuition says that you won't be able to distill any meaningful results in practice across long distances and requests being routed to different backend servers under varying load. You are trying to isolate a very specific duration on the order of microseconds.

Of course, I'm not in the field and am happy to learn more. However, my intuition also tells me we should have seen more realistic experiments and results over the years to get a clear picture of the extent to which such an attack is actually feasible and when it ceases to be.

The difference is probably below 1 microsecond. A modern CPU running at 3GHz roughly performs 3 instructions every nanosecond. It varies a lot, but character-by-character comparison, especially one that’s happening often and thus cached, is one of the most trivial use cases, so let’s assume this nominal throughput.

I’d estimate a naive comparison loop to be around 20-30 cycles, for compare and control? And let’s wrap it in 250 cycles more because someone decided to use Python. 300 cycles, then, are about 100ns (tenth of a microsecond).

Not saying it can’t be done with a timing attack over the internet but you’d need a huge sample size.

A real attacker would try to minimize all those constraints by getting as close to the target as possible, such as in the same datacenter or, if possible, the same rack.

This post reminded me a little of https://blog.thea.codes/building-a-stateless-api-proxy/ from 2019, an absolutely brilliant hack that attempted to make up for GitHub's lack of finely-grained API tokens.

The key idea there is to build your own custom proxy for the GitHub API, then issue tokens for it which are actually encrypted bundles of the full-permission API token plus a set of rules about what the proxy should allow it to do - only allow a GET to paths that match "/gists/.*" for example.

It's somewhat similar to Biscuits storing a Datalog program "to evaluate whether a token allows an operation."

This was great. A really fair survey of various token methods. Plus plenty of liveliness, not boring at all. Thanks, OP!

One thing that I wish was addressed more was language/library support. It gets casual references a couple of times, but for an average developer (as I consider myself) a set of robust, supported open source libraries that help me use a token is so important (not write an implementation, but use in a project that just wants to use the tokens safely).

I don't have anything but anecdata, but I feel like most software is going to be in the 'just want to use it' category, rather than the 'need to implement it'.

This is where the standards like OAuth and JWT win right now. That doesn't mean they always will, but in my experience, that's the current situation.

For PASETO, the quick guide to library support is https://paseto.io

There's an additional nuance to opaque random-ish tokens that can be helpful in high-traffic situations. You can essentially encode some, for lack of a better word, "routing" information (shard, region, etc) into the token when you generate it. It's still random, you still verify the whole token with your database, but you can extract the routing info and pass it to the correct backend from a mostly-stateless frontend.
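A sketch of that pattern (the shard label and separator are made up; nothing secret rides in the routing part):

```python
import secrets

def mint(shard: str) -> str:
    """Random token with a routing hint bolted on. The shard label is not
    trusted for anything security-relevant: the frontend only uses it to
    pick a backend, which still verifies the whole token against its DB."""
    return f"{shard}.{secrets.token_urlsafe(16)}"

def route(token: str) -> str:
    """Extract the routing hint a mostly-stateless frontend would use."""
    shard, _, _secret = token.partition(".")
    return shard

t = mint("us-east-1")
```

Since `token_urlsafe` never emits `.`, the partition is unambiguous; an attacker tampering with the shard part only mis-routes their own request.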

The one thing I'm not super comfortable about here is my PASETO take. My attitude going in was that PASETO has a lot of boosters and not a lot of critical takes. I can beat up on Macaroons because we're using them, and I'm going to follow up with a post about what our Macaroons look like. I'm not doing that with PASETO. So, like, I stand by it, but take it for what it's worth.


I turned the post into a handy chart. Let me know if you want a poster of it.


I have no idea if I am reading the chart right, but it looks like biscuits are the worst for scalability. Why is that?

I would imagine that if the entire policy is encoded in the biscuit, it is very easy to evaluate without needing to call external services. And it can be extended like macaroons without needing a central authority, assuming I grokked your blog post correctly. The only issue I can see is revocation.

Ok so first of all let me just say the chart was a joke I wrote for Twitter. Then Joël Franusic suggested I add a bunch of meta tags to the post so that Twitter would show the chart as the "card" for the post on Twitter. To make that work I had to pull the chart into the actual site (I'd just posted it to Twitter originally), so I figured, what the hell, might as well slap it on the end of the post. I don't even know if the ratings I came up with make sense! I docked Biscuits because they're chained public key verification on a per-API-request basis, but who knows? I haven't used them! The chart isn't serious! And I feel like it's all I'm talking about now!

To top it all off, my dumb meta tags didn't even work; they needed to be in the <head> of the page, and I'll be damned if I'm going to figure out how to do that in our static site generator configuration.

I just wanted the Carl Yastrzemski with the big sideburns.

haha okay fair enough!

I hope you have learned your lesson in adding pictures to a long blog post :)

Jokes aside - I did enjoy reading through it and thank you for educating me on macaroons, CATs, and biscuits.

In your article, you say "For reasons I will never understand, the PASETO authors submitted it to CFRG for consideration. Never do this". Is issue here the implied design-by-committee that led to JWT being a bad standard, or something else? It does seem like it found some valid issues in PASETO, at least, so I had trouble understanding your point.

CFRG's response to PASETO was tepid. The few people who participated in the thread identified three issues:

* The v1 local tokens used a novel nonce construction (I'm doing this from memory) and CFRG's take was "standard constructions or GTFO".

* The HMAC/RSA thing, which PASETO noted and documented but didn't fix.

* The fact that PASETO is basically a restricted profile of JWTs, begging the question of why it didn't just specify a restricted JWT profile.

I don't think this feedback was especially valuable.

I think there are subjects on which CFRG discussions shed a fair bit of light, when they're high-profile enough to drag academic cryptographers into the fray. But the other thing that happens in CFRG is that bad stuff (like the Dragonfly PAKE) gets blessed (because there's no outcome besides "this is fine" and "this is trivially broken" --- and even "this is trivially broken" can get laundered back to "this is fine" if the threads get tedious enough).

In the worst case, you get people proposing bikeshed changes to constructions that are already de facto standards, which (if I remember right) happened with Curve25519, though thankfully not successfully.

I think the whole practice of standards based cryptography is mostly discredited at this point. Signal Protocol isn't a standard despite being the reference model for most secure messaging systems. WireGuard isn't a standard either. The original ethos of the IETF was that things get popular, and then they get standardized. IETF does a lot of stuff de novo now, which is how we end up with stuff like Heartbleed and JWT.

is there a key to decode the scores on the chart?

This seemed funny until I realized everyone else isn't 40+ years old and former Consumer Reports subscribers. The red-with-white-dot thing is "best".

Ya, I can't parse this whatsoever.

What is the difference between red and black? Thick circle? Half circle? No circle?

Original harvey balls might be better here: https://en.wikipedia.org/wiki/Harvey_balls

PS - thanks for putting together.

You'll have to take that up with Consumer Reports. :) The chart isn't very serious.

Here's some help (left to right is worst to best): https://www.pentagram.com/work/consumer-reports#22596

I still don't know where the black with white dot goes. I guess it's one better than the empty circle in the middle.

Brilliant chart, but this 40+ year old was left wanting a “TP Best Buy”

Could do with adding a key to the chart.

As a feedback on the visuals, I'm sure you know what this scorecard means. I'm sure I could figure out what this scorecard means if I put a bunch of effort in. The fact that "figuring out what it means" is a problem worth talking about is all you need to know.

If the idea was "LOL, joke scorecard" then, I guess joke is on me, otherwise, if you revise this I'd recommend to decide what one thing the scorecard is supposed to communicate, then visualise only that and accept that people will need to read the rest of the text to know more.

I also was unclear on what the scorecard was. From elsewhere in this thread it's something to do with "Consumer Reports", which I've never heard of. Wikipedia helped with the rest.

"Consumer Reports graphs formerly used a modified form of Harvey balls for qualitative comparison. The round ideograms were arranged from best to worst. On the left of the diagram, the red circle indicated the highest rating, the half red and white circle was the second highest rating, the white circle was neutral, the half black circle was the second-lowest rating, and the entirely black circle was the lowest rating possible"

It's like 90% a joke (it's the old Consumer Reports scale, where the red-white-dot is "best" and black-white-dot is "worst"). It's definitely not important to the content in the post.

You know, what I'd like to see more than standardised API tokens to make scanning easier is actual addressing of the underlying problem.

For example, we had a pentest done on a website and the pentester got all excited because they found some AWS tokens.

Trouble is, they would be worthless to anyone external because we were making good use of AWS IAM to lock them down with ACLs, Roles etc.

So it was effectively a non-event.

What happened to the old concept of layered security? Why should discovery or leakage of an API key automatically give the attacker all the keys to the castle?

In my ideal world, all cloud and API service operators would have the equivalent to AWS IAM and preferably would enforce its usage (i.e. "here's your API key, but it won't work until you set some layered security")

This is very good practice, and I am pleased that the one time I got the dreaded message from GitHub and AWS that I'd done the unthinkable (on a public repo), the keys were only for accessing a single junk dev S3 bucket - phew!

But no amount of layering makes the problem go away. Sometimes god-like keys are unavoidable. Those needed by Terraform etc are an example that comes to mind.

I'd love to hear tptacek's view on the pros and cons of "authenticated connections," where the client uses asymmetric crypto to establish an authenticated-as-the-principal bytestream, and then issues a bunch of unsigned requests over that bytestream.

In practice, I'm talking about (1) "SSH public-key authentication to GitHub" (or to an EC2 or GCE instance), or (2) using FIDO2/WebAuthn (or TLS client certs...) to authenticate an HTTP connection or browsing session.

One major pro is that this is a lot more widely implemented and standardized than "authenticated requests" -- there are lots of reasonable implementations of SSH pubkey authentication or FIDO2/WebAuthn, compared with trying to sign an individual HTTP request. And that maturity brings robustness and security, e.g., I like that I can have my SSH private key (or my FIDO2 private key) on a tamper-resistant hardware device and not worry about some rogue software stealing my private key, whereas trying to find a hardware device to hold my AWS credentials and do the AWSv4 signature is a different story. Any bearer-token-based scheme seems... a lot scarier in that sense.

I'm less clear on the practical cons (assuming "authenticated requests" is not available). With SSH keys or TLS client certs, I could imagine it's annoying because only the terminus of the secure connection can verify that the client is who it claims to be, and maybe it's annoying to have your SSH/TLS termination component directly talk to a user database to authenticate individual clients and then attest that it did so to the rest of the backend. But... somehow GitHub/GitLab/Bitbucket all manage to do this at scale, and with WebAuthn/FIDO2, any component can be the one to make the challenge. (I suppose only recently can you make a "user presence not required" challenge, so maybe that's why nobody uses this in the non-presence setting...?)

I have actually a lot to say about this, in the context of mTLS; it's a post I'm working on.

I am generally a fan of the mTLS approach for very simple topologies, like "central Consul cluster nobody can talk to without the right client certificate". I am much less a fan of it for the general inter-service authentication problem.

Very consistent with what tptacek has been advocating for on news.yc over the years.

The authn mechanism we use is closer to Keybase's NIST (non-interactive session tokens)[0] that are a mix of AWS-style Bearer Tokens and the usual Random Tokens. Of course, the problems around "logistics" (public-key cryptography)[1] are a real nightmare as the post points out.

We exchange these tokens between devices (if needed) over password-authenticated channels (using CPACE [2]).

[0] https://keybase.io/docs/api/1.0/nist

[1] As examples, see what goes on when a Keybase user associates a new device: https://book.keybase.io/docs/crypto/key-exchange or when SQRL user revokes compromised keys: https://www.grc.com/sqrl/idlock.htm

[2] https://github.com/jedisct1/cpace

Is NIST NIST-compliant?

Nothing to do with NIST.gov; that's just an unfortunate naming coincidence. Re: compliance: Keybase, if I'm not wrong, has pretty much rolled their own crypto here. At least in one instance they were criticized for rolling their own key-wrapping scheme, TripleSec: https://news.ycombinator.com/item?id=9655245

> I continue to believe that boring, trustworthy random tokens are underrated, and that people burn a lot of complexity chasing statelessness they can't achieve and won’t need, because token databases for most systems outside of Facebook aren’t hard to scale.

Then why did you decide to go with macaroons rather than random tokens? Do you know for sure that fly.io needs that statelessness?

We need statelessness for service reliability more than we need it for scale. There are a number of services that people use within our infrastructure that need to continue working even when a centralized DB is unavailable.

Logs are a good example. We expose logs as a NATS service, people auth and subscribe to their logs. It's useful to terminate auth right there both because it's faster and because it continues to work when the internet burps.

I think this is a good question. I'm writing up our Macaroon implementation as I roll it out --- it'll be a few weeks before I publish --- and, if I've done it right, I'll have answered that question clearly.

Google's internal mutual authentication/encryption might be of interest. https://cloud.google.com/security/encryption-in-transit/appl...

They show how far you can drive Protocol Buffer Tokens.

I tend to fall into the trap where, once you have authenticated the person and are passing around tokens that hold claims/roles for authorization, you eventually reach the point where the tokens get badly bloated due to the complexity of the access controls.

AuthN is the next big challenge especially in a multi-tenant/multi-enterprise SaaS type platform


...and by that I mean authorization is the next big challenge especially in a multi-tenant/multi-enterprise SaaS type platform

Authorization is authz, for what it's worth. "authn" means authentication.

I'm curious where kerberos tickets fit into this. A little out of vogue, but it'd be neat to hear how exactly they're long in the tooth.

Nobody sane uses Kerberos to do IAM for public APIs. But people use them for inter-service authentication, as the post mentions. It links to an article I wrote a couple years ago that considers Kerberos in the context of a variety of other inter-service security tools.

> Nobody sane uses Kerberos to do IAM for public APIs.

Why, if you don't mind? I'm actually looking at a major proprietary protocol used by hundreds of millions of users that uses Kerberos as the fundamental cryptographic attestation of identity and roles. Clients all connect to a VPN whose bridge uses the Kerberos ticket to whitelist the backend services accessible from that connection. It's basically being used as if an API gateway understood OAuth claims and could stop whole classes of client calls at the perimeter.

Because it's very complicated and fussy (no HTTP API client framework has K5 built in) and, if you're going to force your clients to use a nonstandard authentication protocol, you can do better than Kerberos. A private CA, mTLS, and an authenticated role-based certificate issuer probably does a better job across the board. Facebook talks a little bit about the tradeoffs here in the paper linked to the post; note that they could have used K5 instead of CATs, and the stuff that CATs does is in some ways a response to the limitations of K5.

Because standards are always improved by creating new options, I should note runes here (simplified macaroons with Python implementation): https://pypi.org/project/runes/

Disclosure: I'm the author

Simple random tokens are essentially passwords. Thus one should store them 'salted' in the authentication database. Is my understanding correct?

I don't see the need to salt them in most cases.

The reason you hash and store passwords with a salt is that there's a very good chance that a user will have used that same password somewhere else. As such, it's important that you make it as hard as possible for you - or a co-worker - or someone who gets hold of a dump of your database - to gain access to the original password.

Random tokens were generated by you. They are only valid against your own service. If someone bad gets hold of them, all they can do is make API calls against your service until that token is revoked. So salting and hashing them doesn't win you much.

That said, if a lot of people at your company have unfettered access to run queries against your database it may be a good idea to store your random tokens in a way that prevents people from copying them out and abusing them. But if you're storing any private data at all and you have that kind of a culture you have bigger problems you need to solve.
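To make the "store only a hash, but don't bother salting" idea from this thread concrete, here's a minimal sketch (the `myapp_` prefix and function names are hypothetical, not any particular vendor's format). Because the token is full-strength CSPRNG output, a single fast unsalted hash is enough; nobody can brute-force a ~256-bit preimage the way they can a human-chosen password.

```python
import hashlib
import secrets

def issue_token() -> tuple[str, str]:
    """Mint a random API token; return (token, digest), where only
    the digest is stored server-side."""
    # A recognizable prefix ("myapp_" is a made-up example) makes the
    # token easy for secret scanners to identify if it ever leaks.
    token = "myapp_" + secrets.token_urlsafe(32)  # ~256 bits of entropy
    digest = hashlib.sha256(token.encode()).hexdigest()
    return token, digest

def verify(presented: str, stored_digest: str) -> bool:
    """Hash the presented token and compare digests in constant time."""
    candidate = hashlib.sha256(presented.encode()).hexdigest()
    return secrets.compare_digest(candidate, stored_digest)
```

Unlike with passwords, there's no need for bcrypt/scrypt here: the hash only has to stop someone holding a database dump (or a nosy co-worker) from recovering usable tokens.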

If you're worried about database timing attacks (I'm mostly not) then there's a good thread about that here:


What about revokable JWT tokens? Isn't that similar to the random token approach, except that each token holds some additional meaning and is revocable?

Yes. Revocation is painful with all of the stateless approaches; it's not on its own a reason to avoid JWT. On the other hand, the folkloric draw of JWT is that it's stateless, and they're only stateless if you can revoke them without issuing SQL queries.

A downside of stateful auth is the extra DB round trip on every request.

Different revocation techniques like periodically distributing a revocation list to your auth services can resolve that part of the issue.
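As a sketch of that approach (the class and parameter names are assumptions, not any particular product's API): each auth service keeps an in-process set of revoked token IDs and refreshes it from the central store on a timer, so the per-request hot path never needs a DB round trip.

```python
import time

class RevocationList:
    """In-process cache of revoked token IDs (e.g. JWT jti values),
    periodically refreshed from a central store."""

    def __init__(self, fetch, ttl_seconds: float = 60.0):
        self._fetch = fetch          # callable returning set[str]
        self._ttl = ttl_seconds
        self._revoked: set[str] = set()
        self._fetched_at = float("-inf")  # force a fetch on first use

    def is_revoked(self, jti: str) -> bool:
        now = time.monotonic()
        if now - self._fetched_at > self._ttl:
            # Stale: pull a fresh revocation set from the central store.
            self._revoked = self._fetch()
            self._fetched_at = now
        return jti in self._revoked
```

The trade-off is a revocation lag of up to `ttl_seconds`, which is usually acceptable; a revocation that must take effect instantly puts you right back at a per-request lookup.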

I get the criticism of ECDH-ES with JWT in vivo, but not the linked tweets' broadside against it in vitro. Are we not just talking about ECIES?

It's all the things that had to go wrong all at once. P-curves, not Curve25519 (where curve point validation is less important). Static-ephemeral ECDH, so there's a key to target. Long-term durable keys. I'm not making an argument against ECIES (though I guess I'd push not to use the P-curves).

Ironically, my main takeaway is that given the pros and cons, JWT seems like the best balance of security and convenience, assuming you always hardcode/verify the algo. Lots of library support, and it allows Macaroon-type policy setup using the JSON (but no chaining). It's not recommended in the article, but surveying what's out there has made JWT's usefulness clearer to me, even while making its downsides more obvious.

Why all the hate for JWTs?

Just pick a crypto scheme and the JWT is just an encoding that makes it easier to use.

(It's just a more convenient way of rolling your own scheme)

That said: random tokens have a lot going for them :)

> Why all the hate for JWTs?

> Just pick a crypto scheme and the JWT is just an encoding that makes it easier to use.

That's not what JWT is, but I can understand why someone would be misled into believing that.

JWT isn't just an encoding format, it also includes a crypto algorithm negotiation protocol that lets the attacker choose the algorithm. Even if you strictly allow-list which algorithm you want to support, you can accidentally bypass this control in many libraries if you support the `kid` (key ID) header. [1]

It also allows attackers to completely strip the security. [2] [3]

In short, JWT is a gun aimed directly at your foot. That's why there's so much hate for JWTs.

[1] https://github.com/firebase/php-jwt/issues/351

[2] https://paragonie.com/blog/2017/03/jwt-json-web-tokens-is-ba...

[3] https://www.howmanydayssinceajwtalgnonevuln.com/

> Even if you strictly allow-list which algorithm you want to support

This is the norm.

As for the key ID attack, this sounds like it's just a trick to know where the private key is located? It shouldn't be publicly accessible.


> Blame the library, or its defaults.

Every JWT proponent says that, but it's a misuse that shows up in multiple libraries, in multiple languages, and isn't explicitly called out in the JWT Best Practices RFC at all.

I'm going to blame the standard for being error-prone.

There's nothing in any JWT RFC, to date, that calls out the need for cryptographic keys to carry their parameter choices (i.e., which algorithm they're valid for) in addition to the raw key material, rather than being just the raw key material. That's a fault of the standard.

That's not a single library's fault. That's the standard's fault.

PASETO has this to say: https://github.com/paseto-standard/paseto-spec/blob/master/d...

> As for the key ID attack, this sounds like it's just a trick to know where the private key is located? It shouldn't be publicly accessible.

This doesn't involve private keys at all. Look at the proof of concept code. https://github.com/firebase/php-jwt/files/6966712/php-jwt-po...

I don't think it's called out in the RFC because when you have a list of all the keys that are accepted, it makes it so the algorithm is effectively whitelisted.

If I give you a map of kid -> key material that looks like this:

      "my-super-cool-key-id": "some hs256 secret key string goes here",
      "another-key-id": "-----BEGIN RSA PUBLIC KEY-----\n ... snip ...",
      "yet-another-key-id": "----BEGIN EC PUBLIC KEY-----\n... snip ..."
And let's say you've got three different API endpoints, which each hard-code a specific algorithm (one HS256, one PS256, one ES256). But, because of the framework you're developing in, you're expected to provide a single configuration object containing a map of key IDs used by the entire application. (This is a common framework quirk.)

What stops you from swapping out the kid in a JWT's header and getting the underlying library to use the wrong key type for the endpoint that accepts HS256?

      "alg": "HS256",
  -   "kid": "my-super-cool-key-id",
  +   "kid": "yet-another-key-id",
      "typ": "JWT"
The answer varies per implementation. The JWT standards do not call this misuse potential out at all.

Strictly listing the keys does not, at all, hard-code the algorithm those keys are used with in every possible programming language and runtime.

Some languages (Java) accidentally prevent this through type safety in the low-level crypto APIs. Others accidentally prevent this by not supporting the kid header.

I have yet to see a JWT library that deliberately prevents this misuse potential. Is that the library's fault? Or their defaults' fault?


The fix, by the way, requires doing this:

      "my-super-cool-key-id": {
          "alg": "HS256",
          "key": "some hs256 secret key string goes here"
      "another-key-id": {
          "alg": "PS256",
          "key": "-----BEGIN RSA PUBLIC KEY-----\n ... snip ..."
      "yet-another-key-id": {
          "alg": "ES256",
          "key": "----BEGIN EC PUBLIC KEY-----\n... snip ..."
And then verifying that the alg for the key matches the alg for the token before attempting to verify the signature/MAC.
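As a sketch of that final check, here's a stdlib-only HS256 verifier that refuses to proceed unless the key's pinned alg, the endpoint's expected alg, and the header's claimed alg all agree. The key map mirrors the hypothetical kid values above; a real service should still prefer a well-vetted library.

```python
import base64
import hashlib
import hmac
import json

# Hypothetical key map in the "fixed" shape: each kid pins its algorithm.
KEYS = {
    "my-super-cool-key-id": {"alg": "HS256", "key": b"hs256-secret"},
}

def b64url_decode(s: str) -> bytes:
    # JWTs strip base64url padding; restore it before decoding.
    return base64.urlsafe_b64decode(s + "=" * (-len(s) % 4))

def verify_hs256(token: str, expected_alg: str = "HS256") -> dict:
    header_b64, payload_b64, sig_b64 = token.split(".")  # 3 parts or ValueError
    header = json.loads(b64url_decode(header_b64))
    key = KEYS[header["kid"]]  # unknown kid -> KeyError -> reject
    # The crucial check: the alg pinned to the key must match both the
    # alg this endpoint expects and the alg claimed in the header.
    if not (key["alg"] == expected_alg == header["alg"]):
        raise ValueError("algorithm mismatch")
    signing_input = f"{header_b64}.{payload_b64}".encode()
    expected = hmac.new(key["key"], signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    return json.loads(b64url_decode(payload_b64))
```

With the flat "kid -> bare key" map, nothing forces that alg equality; this structure makes the mismatch a hard error instead of an implementation-dependent behavior.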

> crypto algorithm negotiation

If you control both sides, then you can ignore this part or do it out-of-band.

Though, if you control both sides, then you can use literally anything else too.

But that's the problem with JWTs, the whole "if you..." part. You want fewer of those rather than more in your crypto code, and JWT has too many. That's the whole problem.

There's a lot of material out there explaining what's wrong with JWTs. Two of my recent favourites:

https://groups.google.com/g/django-developers/c/6oS9R2GwO4k/... - on the Django mailing list

https://www.zofrex.com/blog/2020/10/20/alg-none-jwt-nhs-cont... - where the punchline is "Writing the code to sign data with a private key and verify it with a public key would have been easier to get correct than correctly invoking the JWT library. In fact, the iOS app (which gets this right) doesn’t use a JWT library at all, but manages to verify using a public key in fewer lines of code than the Android app takes to incorrectly use a JWT library!"

The main argument is not against JWT; it is against the libraries, and against people using them without knowing what they are doing.
