The supposed benefit of the scheme is that you don't need to evolve your DB schema as you add or remove validation fields. That's rubbish. If you're happy with storing data with no schema in the links, you should be equally happy storing it without a schema in the DB. (For example: a token field, an expiration-time field, and a JSON blob containing the actual payload corresponding to the token.)
Even if you're using a stateful approach (i.e., storing the token in the database) you still need to store and send a fairly long and convoluted URL. So in either case you're going to have to deal with the usability problem of making sure your long URL is still easily clickable.
As for the link invalidation, I'm confused as to why "." was included in the first place. It's not usually included in a base64 encoding. (Still, that's why I use base62 for sending URLs.)
They'll be more likely to be misinterpreted by a MUA, more likely to be copy-pasted badly (e.g. due to word-wrap issues), and will look more suspicious to users wary of phishing. Optimizing the URLs for length will then cause other problems, like my base64 one (where I started with hex, and then switched to a supposedly URL-safe base64 encoding). It'll also prevent you from having a fallback mechanism when the URL does break. With state, the email can simply include the token as plaintext ("If the link didn't work type in this 10 letter code"). It'll make it harder to later implement validation over SMS.
On the flip side, the gains from not storing state are basically non-existent.
If you can get away with a 64-bit or even 128-bit hash (maybe truncated SHA-256?), and you encode it in some base32 scheme, then it's 13 or 26 characters. At worst that's https://example.com/resetpw/d232rgs5690y6nphk00tdmmw60 (128-bit), which isn't so bad. Base64 only gets you down to 11 / 22 characters, so it's not a huge win.
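A quick check of those character counts, as a minimal Python sketch: it truncates an HMAC-SHA256 to 64 or 128 bits and encodes it with the standard RFC 4648 base32 alphabet (lowercased, padding stripped). The key and message are placeholders, and the example URL may use a different base32 alphabet; only the lengths matter here.

```python
import base64
import hashlib
import hmac
import math

KEY = b"placeholder-secret-key"                  # hypothetical key
MESSAGE = b"resetpw|user=42|expires=1700000000"  # hypothetical message

def short_token(key: bytes, message: bytes, bits: int) -> str:
    """Truncate HMAC-SHA256 to `bits` bits and base32-encode it without padding."""
    digest = hmac.new(key, message, hashlib.sha256).digest()
    truncated = digest[: bits // 8]
    return base64.b32encode(truncated).decode("ascii").rstrip("=").lower()

def chars_needed(bits: int, bits_per_char: int) -> int:
    """Characters needed to carry `bits` bits at `bits_per_char` bits per character."""
    return math.ceil(bits / bits_per_char)

for bits in (64, 128):
    token = short_token(KEY, MESSAGE, bits)
    # bits, actual base32 length, predicted base32 length, predicted base64 length
    print(bits, len(token), chars_needed(bits, 5), chars_needed(bits, 6))
# 64 13 13 11
# 128 26 26 22
```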
The benefit of the scheme is that this is stateless and you don't need to access the database at all. This isn't about schema, this is so that you don't have to think about expiring fields from the DB at all. It's done entirely in the web server.
That said, the input to the hashes should be well-structured (so you don't generate a collision by having two API functions that can be coerced to hash the same string with different meanings). That's a vulnerability in this writeup, and I think that would also be something a proper specification of this scheme should address.
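One way to get well-structured hash input, as a minimal Python sketch: start with a purpose label (so different API functions never share a MAC domain) and length-prefix every field (so different field lists never serialize to the same bytes). The function names and purpose strings are hypothetical.

```python
import hashlib
import hmac
import struct

def encode_fields(purpose: str, *fields: str) -> bytes:
    """Length-prefix every part (4-byte big-endian length, then UTF-8 bytes),
    so no two different (purpose, fields) tuples encode to the same bytes."""
    out = bytearray()
    for part in (purpose, *fields):
        data = part.encode("utf-8")
        out += struct.pack(">I", len(data)) + data
    return bytes(out)

def sign(key: bytes, purpose: str, *fields: str) -> str:
    return hmac.new(key, encode_fields(purpose, *fields), hashlib.sha256).hexdigest()

KEY = b"placeholder-secret-key"  # hypothetical
# Plain concatenation is ambiguous: these two field lists give the same string...
assert "ab" + "c" == "a" + "bc"
# ...but their length-prefixed encodings differ.
assert encode_fields("reset", "ab", "c") != encode_fields("reset", "a", "bc")
# The purpose label keeps a reset token from verifying as an email-confirmation token.
assert sign(KEY, "reset", "42") != sign(KEY, "confirm-email", "42")
```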
There is a really good reason to store these things in the database though, and that's on-demand authorization retraction.
IIRC, Django uses both password and last login timestamp inside the hash. If you needed to invalidate someone's token you could just add 1 millisecond to their last login time.
Edit: This obviously doesn't invalidate the general idea.
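A minimal Python sketch of that idea (not Django's actual code, which lives in `django.contrib.auth.tokens.PasswordResetTokenGenerator`): the password hash and last-login time are MAC inputs, so changing either one silently invalidates outstanding tokens. The `User` record, its field names and the secret are hypothetical.

```python
import hashlib
import hmac
import time
from dataclasses import dataclass

SECRET = b"placeholder-secret-key"  # hypothetical; load from config in practice

@dataclass
class User:  # hypothetical user record
    id: int
    password_hash: str
    last_login_ms: int

def _mac(user: User, expires: int) -> str:
    # The password hash and last-login time are MAC inputs but never appear in
    # the URL, so bumping either one invalidates every token issued earlier.
    msg = f"reset|{user.id}|{expires}|{user.password_hash}|{user.last_login_ms}"
    return hmac.new(SECRET, msg.encode(), hashlib.sha256).hexdigest()

def make_reset_token(user: User, ttl_seconds: int = 3600) -> str:
    expires = int(time.time()) + ttl_seconds
    return f"{expires}-{_mac(user, expires)}"

def check_reset_token(user: User, token: str) -> bool:
    expires_str, _, mac = token.partition("-")
    if not expires_str.isdecimal() or int(expires_str) < time.time():
        return False
    return hmac.compare_digest(mac.encode(), _mac(user, int(expires_str)).encode())

alice = User(id=1, password_hash="$2b$12$placeholderhash", last_login_ms=1_700_000_000_000)
token = make_reset_token(alice)
assert check_reset_token(alice, token)
alice.last_login_ms += 1  # "add 1 millisecond to their last login time"
assert not check_reset_token(alice, token)
```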
The point of a transactional database is, you know... handling transactions... and in this case, resetting a password is a transaction... Sure, you have to kill long-standing transactions manually, but that's a fair bit better than having to deal with a URL that is longer than https://foo.com/account/password-reset/(uuid) for users who may well copy/paste ...
While I do like and appreciate the idea, it's easy enough to scale when you are either using a replica set for your reads, or a sharded database...
Base36 and base32 are popular where portability and paranoia about being mangled by MIME or URL handling are concerns, e.g., Google Authenticator's OATH TOTP implementation for the secret key.
Unfortunately, that's just a problem with the Wikipedia article. The only RFC listed there as using '.' is RFC 4648 (in the padding column it currently says "= (optional, not recommended. Recommended use .)"). That's wrong, though. RFC 4648 actually says:
The remaining unreserved URI character is ".", but some file system environments do not permit multiple "." in a filename, thus making the "." character unattractive as well.
The pad character "=" is typically percent-encoded when used in an URI, but if the data length is known implicitly, this can be avoided by skipping the padding
The data length is known for a cryptographic hash, so you would omit the '='.
I'll fix Wikipedia.
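Omitting the padding is straightforward in practice. A minimal Python sketch using the standard `base64` module's URL-safe variant (the RFC 4648 §5 alphabet: A-Z, a-z, 0-9, '-' and '_', with no '.'): strip the '=' when encoding, and restore it from the length when decoding. The key and message are placeholders.

```python
import base64
import hashlib
import hmac

def b64url_nopad(data: bytes) -> str:
    """RFC 4648 base64url with the '=' padding stripped."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def b64url_decode(text: str) -> bytes:
    """Re-add the padding implied by the length, then decode."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))

digest = hmac.new(b"placeholder-secret-key", b"example", hashlib.sha256).digest()
token = b64url_nopad(digest)
assert len(token) == 43                      # 256 bits / 6 bits per char, rounded up
assert "=" not in token and "." not in token
assert b64url_decode(token) == digest
```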
> The problem is that the links become ugly, unwieldy for the users to handle
Why? The hash only needs to be big enough that brute-forcing becomes infeasible. Depending on the type of server-level flood control you're using, you can determine exactly the number of bits required.
Or if you choose not to, a conservative estimate of 48 bits (8 characters in base64) should be more than enough. Remember that offline brute-forcing is not possible here, and at a rate of, say, 1M tries per second you'll be brute-forcing yourself silly for something on the order of 2 years.
If you can make tighter estimates about flood control, these are rough (and still conservative by a factor of about 4) estimates: at a limit of 15k tries per second you can make do with 7 base64 characters (6 bits each, so 42 bits). At a limit of 200 tries per second, just 6 characters of base64 (36 bits) will do (without proper flood control a large botnet could make this last one somewhat feasible, maybe).
... unless you mean to imply that the string "reset?user=John&expiry=987654321" is the ugly, unwieldy part :)
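The arithmetic behind these estimates, as a small Python sketch (rates taken from the comment; bits = base64 characters × 6). Exhausting the whole keyspace takes roughly 9 to 11 years in all three cases, so the ~2-year figure leaves a margin of roughly 4x against exhausting the space (about 2x against the expected time to a hit, which is half the keyspace).

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def years_to_exhaust(bits: int, tries_per_second: float) -> float:
    """Years needed to try every value of a `bits`-bit token at a fixed online rate.
    An attacker expects to hit a given token after about half of this."""
    return 2 ** bits / tries_per_second / SECONDS_PER_YEAR

# (token bits, attacker's sustained guesses per second), as in the comment above
for bits, rate in [(48, 1_000_000), (42, 15_000), (36, 200)]:
    print(f"{bits} bits at {rate:,} tries/s: {years_to_exhaust(bits, rate):.1f} years")
# 48 bits at 1,000,000 tries/s: 8.9 years
# 42 bits at 15,000 tries/s: 9.3 years
# 36 bits at 200 tries/s: 10.9 years
```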
> For example the base64 encoding I was using for the authentication token included '.' as one of the characters. Well, turns out that the linkification code in Gmail will ignore a '.' at the end of the link (which is kind of sensible). So 1/64 of my validation links ended up being invalid.
Using the wrong encoding for your URLs has nothing to do with this technique. In fact, I don't think the article even suggested a particular type of encoding.
Did you not know what characters your base64 routine would output, or something? There's more than just the period that you don't want to end a URL with because of ambiguity with the ways people might want to use URLs in a sentence: Closing parens, comma, exclamation, asterisk, etc. Regardless of what one particular webmail provider will linkify or not, it's just common sense, if you think about it.
Closing parens are always a big problem with many Wikipedia links; it's not like this is some obscure "gotcha".
That is what the %XX URL-encoding was made for.
Or why not just use what YouTube does? They seem to have it figured out pretty well: A-Z, a-z, 0-9, _ and -.
Why not just try appending "." on the server side if what you get is too short? All of the links become valid again.
Edit: interestingly, HN's text formatter doesn't!
But for things that are less common, like password resets and email validation, consider the downsides:
a) using the database gives you a built-in (depending on schema) audit log of these events, where a signed token does not
b) If the key is stolen in the scheme described, the person in possession of the key can hack into all your accounts. If you use database state, someone needs to have access to the database.
c) (also mentioned above) You will almost certainly be able to have a smaller token if it refers to database state, which has its own advantages (fewer copy-and-paste errors, etc.)
There are certainly advantages to crypto tokens/cookies, and they're the right call in some circumstances, but there are downsides to consider as well.
Should I allow an abusive user to send 37 reset codes in 15 minutes to an email address they don't own (or even if they do own it)? Absolutely not. How else do you keep track of that activity without storing it in a database of some sort to check against?
- Anyone with access to the secret key can now generate their own URLs without fail or evidence. With a DB you at least need to insert a row and thus leave evidence (especially if you are auditing DB activity)
- No way to individually invalidate anyone. Example: What if ONE account's db tokens were exposed. Should be able to invalidate those tokens, not ALL tokens.
- Not necessarily easier than one "expiring token" mechanism that is re-used throughout the system.
- I don't feel great about exposing all my private security checks in url parameters. Why give users clues on how I do security? I can provide that in a better, more meaningful way.
- No way to easily invalidate all tokens of specific nature. Example: If you want to invalidate all password reset tokens, all 2-factor authorization tokens, all confirm email tokens after any password reset, all systems must know of each other rather than go to the token table and set "valid" on all those tokens to "false" for everything. The hash would need to account for what change triggers this hash invalidation rather than just someone picking what to reset from a dinner menu as a separate concern.
And they wouldn't be valid. It's much simpler to invalidate them, provided you don't need information from the DB, than to read from the DB.
> No way to individually invalidate anyone. Example: What if ONE account's db tokens were exposed. Should be able to invalidate those tokens, not ALL tokens.
Of course, you design it in such a way that you can change one of the inputs, which by definition invalidates the tokens.
> Not necessarily easier than one "expiring token" mechanism that is re-used throughout the system.
The point is to separate this boilerplate logic from your core business logic.
> I don't feel great about exposing all my private security checks in url parameters. Why give users clues on how I do security? I can provide that in a better, more meaningful way.
Any serious security system assumes that an attacker already knows 100% of the workings of the system. You should not rely on obscurity, but on your secret: in this case, the crypto key.
> No way to easily invalidate all tokens of specific nature.
Again, if this is a requirement, then make such a global parameter an input to the token generation. It's still much simpler than doing it "the old fashioned way".
I was looking at using JWT to avoid a database read when authenticating a request, but in order to get some sort of variable per-user value I'd have to hit the database to get it, no? Doesn't that kind of defeat the purpose?
Include a parameter that varies per user. And optionally something that will change over time, and that an attacker cannot trivially obtain.
Now the generated urls will fail unless the attacker knows both the user ids _and_ the auxiliary information. You can make that auxiliary information simple (e.g. last login time and old password hash, in the password reset example), or you can explicitly generate a value per user that is updated according to certain criteria. Call it an API key for the user account.
(as for auditing, nothing stops you from logging requests; a big difference is typically that it's easier to scale non-transactional log-writes where eventual consistency is generally sufficient)
> - No way to individually invalidate anyone. Example: What if ONE account's db tokens were exposed. Should be able to invalidate those tokens, not ALL tokens.
See above. All you need is to ensure the URL includes a value that can be changed per user. This could be an existing field that is re-purposed (as in the "last login time" example), or a field added explicitly for this purpose if you so choose. But it's a valid consideration.
> - Not necessarily easier than one "expiring token" mechanism that is re-used throughout the system. - I don't feel great about exposing all my private security checks in url parameters. Why give users clues on how I do security? I can provide that in a better, more meaningful way.
So encrypt the fields to obscure it, or make the key/values mean nothing to users.
> - No way to easily invalidate all tokens of specific nature. Example: If you want to invalidate all password reset tokens, all 2-factor authorization tokens, all confirm email tokens after any password reset, all systems must know of each other rather than go to the token table and set "valid" on all those tokens to "false" for everything. The hash would need to account for what change triggers this hash invalidation rather than just someone picking what to reset from a dinner menu as a separate concern.
All you need is to encode a value for the request type as part of the information that you can change when you want to invalidate that type of request.
You don't need to include that information in the URL either (assuming the URL already holds other information to identify the type of request)
Bump the value, and all the old urls of that type are now invalid. You don't even need to think about this in advance: Just allow the mapping to map to null/nil and in that case don't include the value when calculating the hmac. Now you can invalidate one specific type without invalidating any of the other urls that pre-date adding this mechanism. It's easy to evolve this checking if you identify additional factors you need to be able to revoke on.
You can also use this mechanism to selectively enable/disable subsets of requests based on various factors (e.g., you can create URLs that are only valid within office hours, if you please, by including a flag when you calculate the HMAC and changing the "true" value of that flag depending on the time of day), though it quickly becomes easier to simply include the information in the URL and rely on the HMAC to determine the validity of the assertion.
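A minimal Python sketch of the per-request-type value described above, assuming a server-side `GENERATIONS` mapping (a hypothetical name) where `None` means "leave it out of the HMAC". The fields are serialized as a JSON list, so appending the extra value can never make two different inputs collide.

```python
import hashlib
import hmac
import json
from typing import Dict, Optional

SECRET = b"placeholder-secret-key"  # hypothetical

# Server-side only, never in the URL. Bump a value to revoke every outstanding
# URL of that type; None keeps URLs minted before this existed verifying.
GENERATIONS: Dict[str, Optional[int]] = {
    "password-reset": None,
    "confirm-email": None,
}

def sign(request_type: str, payload: str) -> str:
    fields = [request_type, payload]
    generation = GENERATIONS.get(request_type)
    if generation is not None:
        fields.append(generation)
    return hmac.new(SECRET, json.dumps(fields).encode(), hashlib.sha256).hexdigest()

def verify(request_type: str, payload: str, sig: str) -> bool:
    return hmac.compare_digest(sig.encode(), sign(request_type, payload).encode())

payload = "user=42&expires=1700000000"
reset_sig = sign("password-reset", payload)
email_sig = sign("confirm-email", payload)
GENERATIONS["password-reset"] = 1  # revoke all outstanding password-reset URLs
assert not verify("password-reset", payload, reset_sig)
assert verify("confirm-email", payload, email_sig)  # other types unaffected
```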
So is the alternative the author describes (except for scaling). HMACs for temporary credentials are clever and simple, if you're already versed in crypto primitives. But to anyone else, they're magic security sauce: "But why can't I just use MD5? Wait, no, that's broken somehow - what about SHA? What do I include in the HMAC? Do I have to use a new key every time?"
The solution trades implementation complexity for arcane knowledge. The author is so confident that his solution is superior because he already has that arcane knowledge.
That said, a well-implemented HMAC-based password reset flow is probably more bulletproof than a well-implemented DB-based flow. So if you're implementing AWS, go for HMAC. If you're implementing a blog, go with whatever you feel like. Just implement it well.
While I consider myself to be a pretty good developer, I'm definitely not well-versed in crypto and math isn't my strongest skill. Despite that, I've had no problem using crypto for several of the obvious use cases (ex. password resets).
The most important rule of thumb for crypto is that you'll probably mess up doing it yourself.
However, if you read Wikipedia on HMAC, you'll find that HMAC is in fact probably one of the easier crypto primitives to understand and implement (as a TOY project). As long as you do the padding thing exactly as specified.
But really, use a library.
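For the curious, here is roughly what "the padding thing" means: a toy Python transcription of the HMAC construction from RFC 2104, checked against the standard library's `hmac` module. It is for understanding only; real code should call `hmac.new`.

```python
import hashlib
import hmac

def toy_hmac_sha256(key: bytes, message: bytes) -> bytes:
    """HMAC-SHA256 written out per RFC 2104, for understanding only."""
    block_size = 64  # SHA-256 block size in bytes
    if len(key) > block_size:
        key = hashlib.sha256(key).digest()      # long keys are hashed first
    key = key.ljust(block_size, b"\x00")        # the "padding thing"
    inner_key = bytes(b ^ 0x36 for b in key)    # key XOR ipad
    outer_key = bytes(b ^ 0x5C for b in key)    # key XOR opad
    inner = hashlib.sha256(inner_key + message).digest()
    return hashlib.sha256(outer_key + inner).digest()

# Check the toy against the library version, which is what you should actually use.
for key in (b"short key", b"k" * 100):
    msg = b"reset|user=42|expires=1700000000"
    assert toy_hmac_sha256(key, msg) == hmac.new(key, msg, hashlib.sha256).digest()
```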
I'm saying that in this scenario, it doesn't make much of a difference which system is used. So the author shouldn't be quite so forceful in telling people to use a crypto primitive outside of their comfort zone for minimal benefit. I'd rather use a system written by someone confident in their DB solution, rather than someone who learned their HMAC solution in a couple of hours on Wikipedia.
The first thing you learn about crypto is that it's harder than it looks and best left to experts. Knowing when it's actually okay is difficult. At the very least, I'd expect that they ask for a review from someone who does know what they're doing, and how many developers who don't work at a big Internet company have a spare security expert handy?
Signed URLs are incredibly useful, but the examples given don't seem that compelling to me.
Don't get me wrong, I'm not advocating this principle or implying it is needed and doable with the current technologies (I don't think we have a way of reliably storing such long-term data within the user's browser), but it's an interesting train of thought. It doesn't obviate the need for the server to store any sensitive information, just some of it. User info, password hashes, credit card numbers, maybe even social graphs depending on the use case, could all be stored this way. There will always be websites for which this doesn't make a lot of sense, but a lot of the other ones, mainly small players, could benefit from this.
It's curious how, when you describe it like that, it sounds as if your data is being held hostage by a third party. After all, the server doesn't allow you to see what's in the encrypted opaque blobs; it only allows a controlled set of operations to be done with this data, and the only thing they do is further mutate this opaque blob. But it is no less transparent than the current state of affairs, namely servers keeping the data for themselves.
I'm interested to hear HN's opinion on this, flaws in the approach I've missed, etc.
Trying to authenticate solely on the client side isn't as fruitful as you might be imagining, since it introduces potential security issues. Without any server-side authentication using only information (i.e., a blob) that the server has access to, it becomes very easy for clients to spoof/pretend to be other clients.
Consider that if this kind of system were easy/fruitful to implement, a lot of businesses would have done it already to avoid increased development/server costs on their end.
One issue I can think of is users spoofing older blobs (reminiscent of a replay attack). I believe this could be mitigated if the server keeps a single 256-bit hash of each user's state (this would also make client-side HMACs redundant), or a 64-bit ever-increasing integer which is then included in the state. That's far smaller than any user state.
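A minimal Python sketch of the first mitigation, assuming the client sends its opaque state blob back with each request: the server keeps only the SHA-256 of the latest blob per user, so any older (or forged) blob fails the comparison without needing a MAC key. The class name and blob contents are hypothetical, and a dict stands in for the per-user storage.

```python
import hashlib
import hmac
from typing import Dict

class StateDigests:
    """Server keeps only SHA-256(current blob) per user; the blob lives client-side."""

    def __init__(self) -> None:
        self._digests: Dict[str, bytes] = {}  # stand-in for a tiny DB table

    def store(self, user_id: str, blob: bytes) -> None:
        self._digests[user_id] = hashlib.sha256(blob).digest()

    def is_current(self, user_id: str, blob: bytes) -> bool:
        expected = self._digests.get(user_id)
        return expected is not None and hmac.compare_digest(
            expected, hashlib.sha256(blob).digest())

store = StateDigests()
v1 = b'{"cart": ["book"]}'
v2 = b'{"cart": ["book", "pen"]}'
store.store("alice", v1)
store.store("alice", v2)                  # the state moves on
assert store.is_current("alice", v2)
assert not store.is_current("alice", v1)  # replaying the older blob fails
```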
User 1 registers username "alice".
User 2 registers username "alice".
User 1 verifies account.
User 2 tries to verify the account. Assuming the programmer didn't forget to check a second time whether the account name exists, this results in some error.
This might not happen often, but you still need to write a good error handling for that case.
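In the stateless registration flow, the account row only gets created when a verification link is clicked, so the database can be the final arbiter of this collision. A minimal sketch using Python's built-in `sqlite3` with a `UNIQUE` constraint; the table and the `activate` function are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The UNIQUE constraint means two pending sign-ups for "alice" cannot both be
# activated, no matter how the application-level checks interleave.
conn.execute("CREATE TABLE users (username TEXT NOT NULL UNIQUE, email TEXT NOT NULL)")

def activate(username: str, email: str) -> bool:
    """Create the account when its verification link is clicked.
    Returns False if someone else already verified the same username."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO users (username, email) VALUES (?, ?)",
                         (username, email))
        return True
    except sqlite3.IntegrityError:
        return False

assert activate("alice", "user1@example.com") is True   # User 1 verifies
assert activate("alice", "user2@example.com") is False  # User 2 gets a clean error
```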
In fact, I seem to remember Django's default out-of-the-box password reset mechanism does it more or less this way.
It just seemed so obvious.
In the password reset example, something is stored: The new password hash. Hence the example of including the old password hash, as a value that will be invalidated when the request has been processed.
Otherwise you need to bump a value on accessing the URL, but you still benefit by not having to create book-keeping information when creating the URL.
This is solved by rate limiting, I suppose. Feels like something that should've been included in the article.
Of course, I'm not arguing against rate limiting.
What does this have to do with whether you're storing tokens or using a hash?
The question is not how to make links that expire after a time, the question is how to make links that expire after a fixed number of uses (e.g. one use).
Suppose your link expires in 30 minutes (or any other time). What stops me from using that link 30 times, once per minute?
It's still not clear to me, however, how to generalize this technique to create limited-use links that are not necessarily for resetting passwords.
JSP and ASP.NET have allowed this kind of shenanigans in their "view state" mechanism (albeit security is a configuration option away...). It's not hard to extend it to anything flying back and forth to the user.
As for usability, these sort of things should be wrapped up in a nice container class; HMAC taken care of, and (probably) a key-value API presented. No fuss, no muss.
If there's no such library, creating one could definitely pose a security risk to any project without sufficient expertise, as this post appears to be endorsing.
Find an existing, tested, reviewed implementation that provides the API you need, and stick with it.
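Purely to illustrate the shape of the wrapper described above (in real code, per the advice above, use an existing, reviewed library; Python's `itsdangerous` package is one example of this kind of API), here is a minimal sketch: a dict goes in, a URL-safe `body.mac` token comes out, and nothing is parsed until the MAC checks out. `SignedParams` and its methods are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

class SignedParams:
    """Hypothetical key-value wrapper: dict in, URL-safe token out, and back."""

    def __init__(self, key: bytes, purpose: str) -> None:
        self._key = key
        self._purpose = purpose.encode()  # domain separation between uses

    def _mac(self, body: bytes) -> bytes:
        return hmac.new(self._key, self._purpose + b"\x00" + body, hashlib.sha256).digest()

    @staticmethod
    def _b64(data: bytes) -> str:
        return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

    @staticmethod
    def _unb64(text: str) -> bytes:
        return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))

    def dumps(self, params: dict, ttl_seconds: int) -> str:
        data = dict(params, exp=int(time.time()) + ttl_seconds)
        body = json.dumps(data, sort_keys=True, separators=(",", ":")).encode()
        # The '.' separator is never the last character of the token.
        return f"{self._b64(body)}.{self._b64(self._mac(body))}"

    def loads(self, token: str) -> Optional[dict]:
        try:
            body_part, mac_part = token.split(".")
            body, mac = self._unb64(body_part), self._unb64(mac_part)
        except (ValueError, TypeError):  # binascii.Error is a ValueError subclass
            return None
        if not hmac.compare_digest(mac, self._mac(body)):
            return None  # never parse unverified input
        data = json.loads(body)
        if data.get("exp", 0) < time.time():
            return None
        return data

signer = SignedParams(b"placeholder-secret-key", purpose="password-reset")
token = signer.dumps({"user": 42}, ttl_seconds=3600)
assert signer.loads(token)["user"] == 42
```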
Also, unless you plan on making the valid window very short, you probably want to mark reset requests as used so someone's old email doesn't get used again.
It's not a completely crazy idea, but the minimal amount of trouble required for the extra tables doesn't seem like enough to give up the logging and expiration benefits.
You also don't need to mark reset requests as used, if the generated token was composed of the user's old hashed password or email, as that token will be invalid as soon as the user does the reset.
How else can you keep track of whether there is a wave of reset abuse targeting a user / email, if not by saving it to some sort of data store?
e.g., after 5 reset attempts in 15 or 30 minutes, prevent any further reset attempts for the next X amount of time, either outright or based on a signature of the request
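A minimal in-memory sketch of that kind of limit in Python, keyed here by the target email address (a hypothetical choice; the key could equally be a signature of the request). It uses a sliding window rather than a fixed lockout period, and it is exactly the sort of state the stateless scheme does not remove: across several servers, the timestamps would have to live in shared storage.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional

class ResetRateLimiter:
    """Sliding-window limit: at most `max_attempts` resets per key per `window` seconds."""

    def __init__(self, max_attempts: int = 5, window: float = 15 * 60) -> None:
        self.max_attempts = max_attempts
        self.window = window
        self._attempts: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        attempts = self._attempts[key]
        while attempts and now - attempts[0] >= self.window:
            attempts.popleft()  # forget attempts that fell out of the window
        if len(attempts) >= self.max_attempts:
            return False        # blocked attempts are not recorded
        attempts.append(now)
        return True

limiter = ResetRateLimiter()
results = [limiter.allow("victim@example.com", now=t) for t in range(6)]
assert results == [True] * 5 + [False]                      # 6th attempt blocked
assert limiter.allow("victim@example.com", now=15 * 60 + 1)  # window has moved on
```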
There is a big difference between defining a table in your database that keeps track of resets, or using a queue of messages (event sourcing even?) with filters applied for detecting abnormal behavior.
The former is a dirty solution: dirty because you're storing junk that shouldn't be stored in a relational database, and it doesn't take care of other, much more important kinds of attacks on your system. Whereas the latter is extensible.
So let's say that in addition to limiting the number of resets, you also want to limit failed login attempts to 10 per hour. You may also want to limit users jumping between IP addresses (you don't want to be too strict about it, because of mobile connections), but you do want to prevent multiple active sessions that use the same user credentials. You may also want your system to evolve by taking averages of users' past activity.
Now where does that data go? Of course it's in "some kind of data store", but that says nothing, because the log files stored on disk are also a data store.
In the examples in the article, the JWT header is just plain cruft, because you're unlikely to switch encodings often (and if you do decide to, a single much shorter token as a "stand-in" for the bloated JSON header would be much better than using JSON).
The payload also represents a lot of extra overhead unless you intend to transfer more than just a single-level dictionary.
It's kind of comical that they present it as "compact" given that probably something like 30% of the length of the presented example is unnecessary.
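To put numbers on the JWT overhead, a small Python sketch that base64url-encodes the standard `{"alg":"HS256","typ":"JWT"}` header and a payload built from the `user=John&expiry=987654321` example. The header alone costs 36 characters (plus a '.' separator) before any payload, and the JSON payload encodes to 46 characters, versus 13 characters for the two raw values.

```python
import base64
import json

def b64url(data: bytes) -> str:
    """Unpadded base64url, as used for each JWT segment."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

compact = (",", ":")  # compact JSON, no whitespace
header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}, separators=compact).encode())
payload = b64url(json.dumps({"user": "John", "expiry": 987654321}, separators=compact).encode())

print(header, len(header))
# eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9 36
print(len(payload), len("John") + len("987654321"))
# 46 13
```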
I read "the network is my favorite database" a while ago. Similar concept. But the problem is that the URLs are really long and wouldn't fit into an SMS, for example. If you are trying to get users started with your service, the last thing you want to do is throw more friction at the process.
It's a bit like Microsoft shuttling ViewState back and forth in old ASP.NET, instead of just storing sessions in the database. Signing things is good for validation, to avoid giving out resources to a client that wasn't given valid credentials. But that stuff should be stored in cookies, not links. Links should be simple and to the point. And if that means having the links contain a token the server can use to look up additional info, tinyurl-style, then so be it. That lookup is fast, and once you've done it, you can cache it in a hash-based cache like memcache. Better experience for the end user and better security for you.
In short - I agree with the author's premise about guarantees etc. but when giving humans links, they should be simple. Store all the authentication crap in cookies.
To bring my point home, I'll just remark that inviting someone by SMS or email is a way not only to motivate them to sign up: having them click a unique link also verifies their mobile number or email address in one shot. One click and they have an account.
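The tinyurl-style alternative, as a minimal Python sketch: a short random token that only points at server-side data. A dict stands in for the database table, memcache or Redis you would actually use, and the class and method names are hypothetical. One-time use and revocation come for free (delete the row); the price is exactly the storage the article is trying to avoid.

```python
import secrets
import time
from typing import Dict, Optional, Tuple

class ResetTokenStore:
    """Stateful alternative: a short random token maps to server-side data."""

    def __init__(self, ttl_seconds: int = 3600) -> None:
        self.ttl = ttl_seconds
        self._rows: Dict[str, Tuple[float, dict]] = {}

    def issue(self, data: dict) -> str:
        token = secrets.token_urlsafe(16)  # 128 random bits, 22 URL-safe chars
        self._rows[token] = (time.time() + self.ttl, data)
        return token

    def redeem(self, token: str) -> Optional[dict]:
        """One-time use: the row is deleted whether or not it has expired."""
        expires, data = self._rows.pop(token, (0.0, {}))
        return data if expires > time.time() else None

store = ResetTokenStore()
token = store.issue({"user_id": 42})
assert len(token) == 22
assert store.redeem(token) == {"user_id": 42}
assert store.redeem(token) is None  # a second use fails
```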
Also, Rails has done something similar with their sessions, ages ago, with the whole state being stored in the client with a secret on the server.
It's such a cool idea, and now there's the JWT thing that you can use, which I use all the time these days.
Not to say it's not useful, just that it's not really novel. You also still have to be careful of replay attacks, which the author briefly addresses.
Also, keeping systems secure when they need to support use cases that inherently try to subvert some measure of security (like "forgot password") is difficult, and I find the difficulty is much more likely to be in the overall workflow than flaws in the "generateToken" and "validateToken" methods themselves.
Unless you are Google scale you don't need that, so I am on your side (other than HMACs not actually being that difficult to fuck up, compared to, e.g., just encrypting the entire thing with AES, at which point you are hosed).
1. In Rails apps with the devise gem the password reset token is just a column, not a whole other table.
2. Email activation, having the additional non-activated records is not a big problem, 20% extra records in my users table is pretty acceptable.
Django has native support for unverified users.
That said it is still a good point.
Maybe I'm wrong, but he can still use this password reset if he remembered his login and logged in. The ID stays the same, the token might not have expired, the bcrypt hash stays the same, and the IP stays the same.
That, of course, is suboptimal.
If you want to expose to users their account's activity, like a history of recent password resets and recent login activity (things a typical FB user would see on their activity page), that can be a really nice security feature. So maybe after all the HMAC approach works just fine with extra state added. But I think the point the author is making is that the application can simply do HMAC(.....) and verify it in a few lines with a single if/else branch, whereas the traditional password reset code would probably take more lines and more conditions. I had written code that did exactly what the author didn't like, and trust me, that code wasn't pretty; I spent many hours optimizing it. Edge cases and test cases bloated my codebase.
This was a really refreshing read on HN. Wish there were more articles like this one utilizing crypto for modern programming applications.
While I don't think I'll be implementing any of the solutions outlined in the article (because the database does not represent a pain point for me) I do think this is a fascinating technique and is something worth keeping in the toolbox as a solution.
user id|timeout|user custom property
Having a custom property (a per-user number) gives you an opportunity to kill sessions. E.g., if a session is compromised, just increment it and regenerate the session for the valid user.
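A minimal Python sketch of that `user id|timeout|user custom property` format, assuming the per-user number is kept server-side (a dict stands in for it) and the token is the three fields plus an HMAC over them. All names are hypothetical; incrementing the number kills every outstanding session for that user.

```python
import hashlib
import hmac
import time
from typing import Dict

SECRET = b"placeholder-secret-key"             # hypothetical
session_counters: Dict[str, int] = {"42": 0}   # per-user number, stored server-side

def _mac(body: str) -> str:
    return hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()

def issue_session(user_id: str, ttl_seconds: int = 3600) -> str:
    body = f"{user_id}|{int(time.time()) + ttl_seconds}|{session_counters[user_id]}"
    return f"{body}|{_mac(body)}"

def check_session(token: str) -> bool:
    parts = token.split("|")
    if len(parts) != 4:
        return False
    user_id, timeout, counter, mac = parts
    if not hmac.compare_digest(mac.encode(), _mac(f"{user_id}|{timeout}|{counter}").encode()):
        return False
    # The MAC proves we issued these fields, so the int() calls are safe here.
    return int(timeout) > time.time() and int(counter) == session_counters.get(user_id)

def kill_sessions(user_id: str) -> None:
    session_counters[user_id] += 1  # every outstanding token for this user now fails

token = issue_session("42")
assert check_session(token)
kill_sessions("42")
assert not check_session(token)
assert check_session(issue_session("42"))  # the regenerated session works
```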
I'm not sure I follow this. To change a password you need to write to the db anyways, it's one extra check to set token used=1 or remove it. Why is that a huge deal?
Now you don’t have to worry about clearing old bitrot from the database or worrying about when to expire non-verified accounts.
It's also possible to use Redis or another fast store with timeouts. The problem seems rather exaggerated.
Usually when I implement a stateless approach like this I still fire off a log event every time we generate a password reset.
The reason I completely disagree with this is that the secret is embedded in code. All your developers will have access to this code, and people join and leave your company all the time. If this scheme is used to protect more important information and the secret leaks, it could cause chaos.
The difference is that all developers generally have access to code, but only some well-trusted people have database access in prod. If I care about security, that is a more secure way to do things.
PS: I don't understand why you are getting downvoted; your comment is more relevant than most of the comments here from people who don't understand a MAC.
If the production DB credentials are hard-coded, developers would indeed have access to the prod DB; if the production HMAC secret were hard-coded, developers would indeed be able to forge tokens. So don't do that.
That's why we use environment variables.
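A minimal Python sketch of that pattern, assuming a hypothetical `TOKEN_HMAC_KEY` variable holding a hex-encoded key. Failing fast on a missing or short key (the 32-byte minimum is this sketch's assumption) avoids silently signing with an empty secret.

```python
import os

def load_hmac_key(var: str = "TOKEN_HMAC_KEY") -> bytes:
    """Read the signing key from the environment rather than the source tree.
    Generate one with: python -c "import secrets; print(secrets.token_hex(32))"
    """
    value = os.environ.get(var)
    if not value:
        raise RuntimeError(f"{var} is not set")
    key = bytes.fromhex(value)  # raises ValueError on malformed hex
    if len(key) < 32:
        raise RuntimeError(f"{var} must decode to at least 32 bytes")
    return key
```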
Compare those savings with the additional size of the URL parameters (increasing network overhead), plus the processing overhead of encryption/signing/validation. Have we really made a performance gain? I don't have any hard numbers, but I'm willing to wager that the gains are fairly negligible.
From a security standpoint, I'm not fond of this approach, and I don't think its benefits outweigh the added risks. Here's why:
- Encryption (including simple signing) is really easy to get wrong in subtle ways. A simple mistake like using SHA256() instead of HMACSHA256() breaks the system badly.
- A corollary to the first point is that using a strong key is vital. The article says nothing of the criteria a strong key should meet, and a poor key could enable additional attacks. While I'm not aware of anything allowing for HMAC key recovery in better than brute force, this could be problematic: if your key is a 10-character ASCII string (especially one that isn't truly random), an attacker holding a single valid token may be able to brute-force it offline. If it's 256 random bits, you're much better off.
- This scheme introduces a significant number of anonymous user-supplied inputs to the application which didn't previously exist. Each of these is a potential place for things to go wrong, which can give rise to other vulnerabilities (XSS, SQLi, etc) depending on what you're doing with that input. You'll need to be sure that you validate the signature prior to doing anything with these unverified inputs, in addition to performing sanity checks on their contents prior to signing. Part of secure application design is to minimize attack surface and complexity - this does the opposite.
- Once a signature is created, it can't be revoked. I mean, you could store the signature in the database and validate its presence on receipt, or include in your signed values a nonce that's stored in the database, but that kind of defeats the goal of avoiding database storage. The article tries to address this by signing over things like an expiry time, the user's old password digest, etc. But that's insufficient as well (I'll get into why below).
- Even if you get everything right, key exposure is fatal. You've introduced a single point of failure in the system that didn't previously exist, and a fairly catastrophic one. If the app ever leaks that key, your only option is to rotate it (assuming you detect the exposure), which invalidates all existing signed values.
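The first two points in Python terms, as a minimal sketch using only the standard `secrets`, `hashlib` and `hmac` modules (the message format is a placeholder): draw a 256-bit key from the OS CSPRNG, use HMAC rather than a bare SHA-256 of key plus message (which is open to length-extension attacks), and compare signatures without short-circuiting.

```python
import hashlib
import hmac
import secrets

KEY = secrets.token_bytes(32)  # 256 random bits from the OS CSPRNG

def broken_sign(message: bytes) -> str:
    # Don't do this: SHA-256(key || message) is vulnerable to length extension,
    # letting an attacker append (padding plus) data and still get a valid tag.
    return hashlib.sha256(KEY + message).hexdigest()

def sign(message: bytes) -> str:
    return hmac.new(KEY, message, hashlib.sha256).hexdigest()

def verify(message: bytes, signature: str) -> bool:
    # compare_digest avoids short-circuiting on the first mismatch,
    # which resists timing attacks on the comparison.
    return hmac.compare_digest(sign(message).encode(), signature.encode())

msg = b"reset|user=42|expires=1700000000"
assert verify(msg, sign(msg))
assert not verify(msg + b"|admin=1", sign(msg))
```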
That's the broad points, let's look at the article's examples:
As the article notes, the token can also be reused multiple times, even if Johnny has already used it. They offer a means of fixing this:
Furthermore, this token is still valid if Johnny requests a password reset and, while waiting for the email to arrive, remembers that he changed it from 'Password123' to 'Password456' and logs in. There's then a window of time where that token remains valid (until expirationTime arrives) and can be abused. The article doesn't address this case at all. I suppose they'd say that on issuing a token, you should disable login to Johnny's account until he's reset, but how would you do this? Well, you'd set a flag in the database on Johnny's record, and we're kind of back where we started, with both a database read and write, but with added cryptographic overhead and attack surface too.
The next example talks about account registration. It states that you can verify a user's email account by sending them a link to:
Case Study 3 talks about one-time-use and expiring resources. There's not enough detail here about their scheme for me to really critique, but they point out AWS/S3 use of HMAC signing as an example. Except that AWS/S3 signing isn't about one-time use or expiration. Instead, it's about authenticity.
AWS operates with an ACCESS_KEY and SECRET_KEY which are a pair of tokens related to each other. Requests with 'params' are signed as:
Signature = HMAC(ACCESS_KEY + params, SECRET_KEY)
This can be done reasonably well, but it's a minefield to navigate when we have a tried and true solution. What's so bad about database storage of tokens? The post kind of starts with the assertion that they should be avoided, but I see no clear explanation as to why.
Is a table and a couple of columns actually difficult? Is this, or has this ever been, a problem for anyone? When did we enter the "post-database" world?
This all makes plenty of sense if you buy the strawman, otherwise it seems like some pretty strong over-reaching.