Hacker News new | past | comments | ask | show | jobs | submit login

A reason to hash cookies/tokens on the server side is so you don't have to reset them if your server database is compromised. A fast cryptographically secure hash with digest size equal to the token size is a good choice.

This also has the benefit of defending against timing attacks, for example, if you don't completely trust your code or the compiler / interpreter to not optimize your "constant time" comparison.




It would be good practice to reset your auth tokens in the event of a database compromise anyway. In most cases that's a low-cost operation. The advantage is that it rescues you from the possibility that the attacker logged auth tokens when they were passing through one of your servers.


I agree, of course there is nothing stopping you from resetting if and when you want to, and on your own schedule. That's not a reason not to run a hash on the input token. I think there's no possible reason not to run 'token = sha256(token)'.

> In most cases that's a low-cost operation

But resetting API tokens is not necessarily low cost, as it's literally pulling the plug on all your persistent connections, and requiring manual intervention to bring them back online. There are many cases where you can't simply pull the plug on all your clients.

Also, as a byproduct it enforces all sorts of best practices, like ensuring the token is just a token, and doesn't have data piggybacking on it.


So what are your cookies/tokens if hashing them makes a difference? Seems like cookies should be CSPRNG values, and it is unclear what hashing one of those gets you.


The tokens are CS-PRNG, e.g. 32-bytes or larger. The preimage is stored on the client as the token. The token value received by the server is sent through a hash or HMAC before being compared against the persisted value on your backend database.

If the backend database is compromised/leaked whatever, the attacker sees the result of the Hash/HMAC, but cannot generate a preimage, so your tokens stored by clients do not need to be reset. Otherwise, a leak of the backend token database grants full access to the attacker.

This is particularly useful for long-lived tokens, like API tokens, but can also be used for short-lived sessions alike.

This is not currently considered a must-have / expected best practice, but I believe it will evolve into one, because it's hard to get wrong, and it protects against a reasonably common attack vector. Guaranteed timing attack resistance is a bonus add-on which is valuable in its own right.


The backend has been so invasively compromised that attackers have dumped the session token store. The thing that the session tokens protect essentially belongs to the attackers. Even if, as a result of crazy overengineering, they can't read the original "seed" secret the store holds authenticators for, they can simply overwrite them.

There is no point to hashing cookies and session keys. Don't bother. It's not just not a best practice. It's something a smart auditor would file a sev:info bug against.


Writing a new token will make a lot of noise in the system, exposing the breach. Using a stolen token from a new IP, perhaps much less so, although that too should set off warning bells. Appeal to authority aside, I can't see a downside.

What I like most about 'token = sha256(token)' is that it expresses a significant part of the design in a single, obvious line of code; we don't store tokens, we can't provide tokens after the fact, we can't display tokens on our god-mode UI, if you lose your token a new one much be regenerated because we simply don't have it anymore, the token must not encode data beyond being a unique random value, we cannot be socially engineered into sharing your token with an attacker, the token will never be disclosed through a timing side-channel, insider threats cannot access your token, etc.


> Writing a new token will make a lot of noise in the system

Simply doing a hash of the token won't get you the desired "noise" in the system. That is, you need to do something else besides what you have said here. For example, writing a new token, whether produced by an attacker or a legitimate user, will produce the same some "noise", no? Unclear how, when the attacker owns your database and any triggers, you are going to tell if it is an attacker.

Similarly, the single obvious elegant line of code is not going to deter an attacker. Oh, I imagine that there are attackers that enjoy the beauty of code in a hacked system and put it on pastebin, possibly slowing them down.

And what are the chances if the attacker is that deep into your system, that they aren't reading the tokens before they are hashed?

The downside that I see, and I would flag this scheme, is that it doesn't provide the protection that the developer is led to believe.


An attacker can't change a token without locking out the valid client, that's the "noise" I was speaking of. If my requests suddenly start failing because my token is no longer valid, the attacker has shown their hand and we now have proof of a compromise. There are many attack vectors which could provide read or even read/write access to the token database without memory access or code execution. To name one from last week, how about Slack?

Nothing, literally, nothing, can provide protection against the attack vector you're saying this does not help against. Sure, and of course I agree. But obviously we don't give up and do nothing because sometimes exploits grant root. We add layers of defense and expect that some will hold, and some will not. Some will act to contain a breach across certain vectors, and some will be run over. Some will simply slow an attacker down, or prevent them from walking away with 100% of your data, and only get a percentage based on the time they were able to maintain the compromise.

I don't know what you mean that the line is "not going to deter an attacker". Surely you understand the difference between a short-lived attack which can only target persisted data, and a persistent attack with full memory access, and surely your threat model allows for one without the other in many cases? So green boxes in those cases for this technique, and N/A for the other boxes. What am I missing?

There are attacks it does not withstand, the same attacks that make password hashing useless by-the-way, and there are attacks that it certainly does withstand. There are also several attack vectors which it eliminates completely, and various follow-on benefits which I listed as well.

On the whole it is a large positive in a simple obvious package with no possible downside (except perhaps auditors flagging it as a bug). I never said it was a magic bullet which protects you from full persistent system compromise. I understand exactly what protections it provides and beneficial policy restrictions it imposes.

I find the resistance to such a simple defense-in-depth perplexing. It not being perfect is not a reason not to do it. Neither is password hashing, but we still do that. I really am shocked that token hashing would be at all controversial. I do it, and I think everyone else should too.

"It doesn't provide the protection that the developer is led to believe." Well this is much more true of bcrypt than it is of token hashing! What percentage of passwords does bcrypt protect, and for how many minutes after the database is popped? It's impossible to answer. What percentage of tokens does token hashing protect? If you can answer a) describe the access level granted by the compromise, and b) describe the time when the compromise began, then I can definitively answer that question.

The answer cannot be, assume the attacker had full root access to all systems. Because we are under attack every single day, and some of those attacks succeed in some limited measure. There are laws on the books today, and new ones coming, which require rapid disclosure of the actual extent of data breaches, so this isn't academic.

Of course the exact extent of a compromise is not always know, but likewise there are cases where the extent of a compromise is precisely known, and in a subset of those cases, a hashed token prevents an attacker impersonating your clients.

I mean, sha256(token) is child's play, to be arguing about it is just odd. I'd like to see homomorphic encryption stop even a memory-sniffing attacker from ever intercepting my tokens. Lets get serious! What, you'd rather just trust TLS? :-/




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: