Life in a post-database world: Using crypto to avoid database writes (neosmart.net)
256 points by ComputerGuru on Feb 15, 2015 | 117 comments

Last time I used this approach I ended up regretting it. The problem is that the links become ugly, unwieldy for the users to handle, and occasionally cause problems that are rather hard to debug. For example, the base64 encoding I was using for the authentication token included '.' as one of the characters. Well, it turns out that the linkification code in Gmail will ignore a '.' at the end of the link (which is kind of sensible). So 1/64 of my validation links ended up being invalid.

The supposed benefit of the scheme is that you don't need to evolve your DB schema as you add or remove validation fields. That's rubbish. If you're happy with storing data with no schema in the links, you should be equally happy storing it without a schema in the DB (for example, a token field, an expiration time field, and a JSON blob containing the actual payload corresponding to the token).
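For comparison, here is a minimal sketch of the stateful design described above (a token field, an expiration time field, and a JSON payload), using Python's built-in sqlite3 with an in-memory database. The table name pending_tokens and the helpers issue_token/redeem_token are hypothetical, not taken from the article.

```python
import json
import secrets
import sqlite3
import time
from typing import Optional

# Hypothetical generic table: one row per outstanding confirmation of any kind.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pending_tokens ("
    " token TEXT PRIMARY KEY,"
    " expires_at INTEGER NOT NULL,"
    " payload TEXT NOT NULL)"  # schema-less JSON blob
)


def issue_token(payload: dict, ttl_seconds: int = 3600) -> str:
    """Store the payload under a short random token and return the token."""
    token = secrets.token_urlsafe(16)  # 16 random bytes -> 22 URL-safe characters
    with conn:  # commits the insert
        conn.execute(
            "INSERT INTO pending_tokens VALUES (?, ?, ?)",
            (token, int(time.time()) + ttl_seconds, json.dumps(payload)),
        )
    return token


def redeem_token(token: str) -> Optional[dict]:
    """Return the payload once, deleting the row; None if unknown or expired."""
    with conn:
        row = conn.execute(
            "SELECT expires_at, payload FROM pending_tokens WHERE token = ?",
            (token,),
        ).fetchone()
        if row is None:
            return None
        conn.execute("DELETE FROM pending_tokens WHERE token = ?", (token,))
    expires_at, payload = row
    if expires_at < time.time():
        return None
    return json.loads(payload)
```

Deleting the row on redemption is what makes these tokens single-use, and the token stays a short, fixed-length lookup key no matter how large the payload grows.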

Sorry, but I don't really see how this introduces new problems.

Even if you're using a stateful approach (i.e., storing the token in the database) you still need to store and send a fairly long and convoluted URL. So in either case you're going to have to deal with the usability problem of making sure your long URL is still easily clickable.

As for the link invalidation, I'm confused as to why "." was included in the first place. It's not usually included in a base 64 encoding. (Still, that's why I use base 62 for sending URLs.)

The difference is that the URL with just a lookup token will be a lot shorter than the URL with all the data embedded in it, and will have a static and predictable length. And longer URLs are a source of problems.

They'll be more likely to be misinterpreted by a MUA, more likely to be copy-pasted badly (e.g. due to word-wrap issues), and will look more suspicious to users wary of phishing. Optimizing the URLs for length will then cause other problems, like my base64 one (where I started with hex, and then switched to a supposedly URL-safe base64 encoding). It'll also prevent you from having a fallback mechanism when the URL does break. With state, the email can simply include the token as plaintext ("If the link didn't work type in this 10 letter code"). It'll make it harder to later implement validation over SMS.

On the flip side, the gains from not storing state are basically non-existent.

Just truncate it to a collision probability that is acceptable to you. Nothing's forcing you to keep all of the signature bytes.

I would rather do both, and store the short URL to long URL mapping in a key-value store with automatic deletion. I would rather keep that crap away from my customer data if it's not needed.

It'd be nice for someone to standardize this. I believe that it's safe to use a cryptographic PRF that's too short to be a proper cryptographic hash, like the 64-bit SipHash https://131002.net/siphash/ : you're not worried about collisions, and you (hopefully) have your own ways of shutting down 2^64 brute-force attempts already. But I haven't thought hard so don't quote me on that.

If you can get away with a 64-bit or even 128-bit hash (maybe truncated SHA256?), and you encode in some base32 scheme, then it's 13 or 26 characters — at worst https://example.com/resetpw/d232rgs5690y6nphk00tdmmw60 (128-bit), which isn't so bad. base64 only gets you down to 11 / 22 characters, so it's not a huge win.
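A minimal sketch of the truncated-hash idea, assuming an HMAC-SHA256 (rather than a bare hash, so the token can't be recomputed without the key) and Python's standard RFC 4648 base32 alphabet. SECRET_KEY and short_token are hypothetical names. Note that the example URL above appears to use a different base32 alphabet (it contains 0 and 9, which RFC 4648's A-Z, 2-7 alphabet omits); the lengths are the same either way.

```python
import base64
import hashlib
import hmac

SECRET_KEY = b"example-secret-key"  # hypothetical; load a real key from config


def short_token(message: bytes, bits: int = 128) -> str:
    """HMAC-SHA256 of message, truncated to `bits` bits, as unpadded lowercase base32."""
    if bits % 8 != 0 or not 0 < bits <= 256:
        raise ValueError("bits must be a multiple of 8 between 8 and 256")
    digest = hmac.new(SECRET_KEY, message, hashlib.sha256).digest()
    truncated = digest[: bits // 8]
    # RFC 4648 base32 uses A-Z and 2-7; the '=' padding carries no information.
    return base64.b32encode(truncated).decode("ascii").rstrip("=").lower()


print(len(short_token(b"user=42&expires=1424000000", bits=64)))   # 13
print(len(short_token(b"user=42&expires=1424000000", bits=128)))  # 26
```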

The benefit of the scheme is that this is stateless and you don't need to access the database at all. This isn't about schema, this is so that you don't have to think about expiring fields from the DB at all. It's done entirely in the web server.

That said, the input to the hashes should be well-structured (so you don't generate a collision by having two API functions that can be coerced to hash the same string with different meanings). That's a vulnerability in this writeup, and I think that would also be something a proper specification of this scheme should address.
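One way to make the MAC input well-structured, sketched under the assumption that each token carries a purpose label plus a list of string fields: length-prefix every field so different field splits can never produce the same bytes. The purpose labels and helper names are hypothetical; canonical JSON would be another reasonable choice.

```python
import hashlib
import hmac

SECRET_KEY = b"example-secret-key"  # hypothetical


def naive_input(*fields: str) -> bytes:
    # Ambiguous: ("ab", "c") and ("a", "bc") concatenate to the same bytes.
    return "".join(fields).encode("utf-8")


def structured_input(purpose: str, *fields: str) -> bytes:
    """Prefix each field with its 4-byte length; the first field names the purpose."""
    out = bytearray()
    for field in (purpose, *fields):
        data = field.encode("utf-8")
        out += len(data).to_bytes(4, "big") + data
    return bytes(out)


def sign(data: bytes) -> str:
    return hmac.new(SECRET_KEY, data, hashlib.sha256).hexdigest()


assert naive_input("ab", "c") == naive_input("a", "bc")
assert structured_input("reset", "ab", "c") != structured_input("reset", "a", "bc")
# A token minted for one purpose will not verify for another.
assert sign(structured_input("reset_password", "42")) != sign(structured_input("verify_email", "42"))
```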

This is why people use base48 or base36 for binary data in URLs.

There is a really good reason to store these things in the database though, and that's on-demand authorization retraction.

You can still do that on-demand retraction as long as the hash is fed something from the database that you can change.

IIRC, Django uses both password and last login timestamp inside the hash. If you needed to invalidate someone's token you could just add 1 millisecond to their last login time.
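A rough sketch of that idea, assuming a hypothetical User record with id, password_hash and last_login fields. This is not Django's actual implementation, just an illustration of why changing any MAC input (bumping last_login, or changing the password) invalidates outstanding tokens without storing the tokens themselves.

```python
import hashlib
import hmac
from dataclasses import dataclass

SECRET_KEY = b"example-secret-key"  # hypothetical


@dataclass
class User:  # hypothetical user record
    id: int
    password_hash: str
    last_login: float  # Unix timestamp


def make_token(user: User, issued_at: int) -> str:
    # Any field whose change should revoke the token goes into the MAC input.
    # Only password_hash is free-form text and it comes last, so '|' stays unambiguous.
    msg = f"{user.id}|{user.last_login!r}|{issued_at}|{user.password_hash}".encode()
    mac = hmac.new(SECRET_KEY, msg, hashlib.sha256).hexdigest()
    return f"{issued_at}-{mac}"


def check_token(user: User, token: str, now: float, max_age: int = 3600) -> bool:
    issued_str, sep, _ = token.partition("-")
    if not sep or not issued_str.isdigit():
        return False
    issued_at = int(issued_str)
    if not 0 <= now - issued_at <= max_age:
        return False  # expired (or issued in the future)
    # Compare as bytes so untrusted non-ASCII input can't raise TypeError.
    return hmac.compare_digest(make_token(user, issued_at).encode(), token.encode())
```

Adding a millisecond to last_login, as suggested above, changes the MAC input for that one user, so only their tokens stop verifying.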

You can always use the base64url variant too: take base64, remove the trailing "="s, replace "+" with "-", and replace "/" with "_". Though "_" and "-" can still look ugly at the beginning or end. You can also percent-encode them (%XX style).
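The recipe above, sketched in Python next to the standard library's RFC 4648 section 5 ("URL and filename safe") encoder, which performs the same character substitution; only the padding still has to be stripped by hand.

```python
import base64


def b64url_manual(data: bytes) -> str:
    """Standard base64, drop '=', then '+' -> '-' and '/' -> '_'."""
    s = base64.b64encode(data).decode("ascii").rstrip("=")
    return s.replace("+", "-").replace("/", "_")


def b64url_stdlib(data: bytes) -> str:
    """The same thing via the URL-safe alphabet in Python's base64 module."""
    return base64.urlsafe_b64encode(data).decode("ascii").rstrip("=")


sample = b"\xfb\xff"  # encodes to "+/8=" in standard base64
print(b64url_manual(sample))  # -_8
assert b64url_manual(sample) == b64url_stdlib(sample)
```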

Same problem here. Plus I am using sendgrid which automagically makes the link about 4 times longer using the same "trick" which further amplifies the problem. In fact I decided that I should go the other direction instead. Now every outgoing email is stored in the database before sending. This makes it trivial to track every email open, every click on every link and the links look beautiful, like "/e/<random_email_id>/open". Been happy ever since.

Edit: This obviously doesn't invalidate the general idea.

Another benefit of generating and saving an email to the database first, is at that point putting a "view this email on the web" type link into the top of the email becomes trivial. Formatting HTML emails is a significant pain, and mobile only exacerbates that...

The point of a transactional database is, you know... handling transactions... and in this case, resetting a password is a transaction... Sure, you have to kill long standing transactions manually, but that's a fair bit better than having to deal with a url that is longer than https://foo.com/account/password-reset/(uuid) for users that may well copy/paste ...

While I do like and appreciate the idea.. it's easy enough to scale when you are either using a replica-set for your reads, or a sharded database...

A UUID is 128 bits, written in hex, so it's 36 bytes. If you use an appropriately URL-safe base64, you can encode a 216-bit value in the same number of bytes. That's long enough to be cryptographically safe for this application.
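The arithmetic above can be checked with Python's secrets module: 216 bits is 27 bytes, and since 27 is a multiple of 3, URL-safe base64 encodes it in exactly 36 characters with no padding. For comparison, a random (version 4) UUID string is also 36 characters but carries only 122 random bits, since 6 bits are fixed by the version and variant fields.

```python
import secrets
import uuid

# 216 bits = 27 bytes; 27 * 4 / 3 = 36 base64url characters, no '=' padding.
token = secrets.token_urlsafe(27)
uuid_text = str(uuid.uuid4())  # 32 hex digits + 4 hyphens

print(len(token), len(uuid_text))  # 36 36
```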

This of course is an email layer on top of your core logic. Your core logic still thinks it's sending a mail with a very long URL. It's just shortened and when it's incoming it will be converted back. So you get the best of both worlds.

'.' is not a standard base64 character; did you roll your own?

Base36 and base32 are popular where portability and paranoia of being MIME/URL-mangled are concerns, e.g., Google Authenticator's OATH TOTP implementation for the secret key.

if you check http://en.wikipedia.org/wiki/Base64 there are actually RFC-standard alphabets that use . and - because they're URL-safe.

I was surprised when I read that, since I've implemented those RFCs before and didn't remember this coming up - so I checked.

Unfortunately, that's just a problem with the Wikipedia article. The only one listed there as an RFC using '.' is RFC 4648 (in the padding column right now it says "= (optional, not recommended. Recommended use .)"). That's wrong though. In RFC 4648 it actually says:

The remaining unreserved URI character is ".", but some file system environments do not permit multiple "." in a filename, thus making the "." character unattractive as well.


The pad character "=" is typically percent-encoded when used in an URI, but if the data length is known implicitly, this can be avoided by skipping the padding

The data length is known for a cryptographic hash, so you would omit the '='.
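A small sketch of why dropping the padding is safe here: a SHA-256 digest is always 32 bytes, so its unpadded URL-safe base64 form is always 43 characters, and the decoder only needs the '=' restored mechanically. The helper names are hypothetical.

```python
import base64
import hashlib


def encode_unpadded(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).decode("ascii").rstrip("=")


def decode_unpadded(text: str) -> bytes:
    # Re-add the dropped '=' so the length is a multiple of 4 before decoding.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


digest = hashlib.sha256(b"user=42&expires=1424000000").digest()  # always 32 bytes
encoded = encode_unpadded(digest)
print(len(encoded))  # 43
assert decode_unpadded(encoded) == digest
```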

I'll fix wikipedia.

You should use URL-safe base 64 encoding described in [1]. It uses `-` and `_` and should be safe for Gmail.

1: http://tools.ietf.org/html/rfc4648#section-5

In the cloud, I would think that a better alternative is to store temp data in Azure Table Storage / Amazon SimpleDB. This is cheap, scalable storage that avoids having to use the database when not needed.

The benefit is that you do not have to store any additional data in your DB, or, if you need it for BI purposes, you can store it in an appropriate BI system instead.

These regrets sound to me more like "your problems" than problems with this confirmation scheme.

> The problem is that the links become ugly, unwieldy for the users to handle

Why? The hash only needs to be big enough that brute-forcing becomes infeasible. Depending on the type of server-level flood control you're using, you can exactly determine the amount of bits required.

Or if you choose not to, a conservative estimate of 48 bits (8 characters in base64) should be more than enough. Remember that offline brute-forcing is not possible here, and at a rate of, say, 1M tries per second you'll be brute-forcing yourself silly for somewhere in the order of 2 years.

If you can make tighter estimates about flood control, these are rough (and still conservative, x4) estimates: at a limit of 15k tries per second you can make do with 7 base64 characters (6 bits each). At a limit of 200 tries per second, just 6 base64 characters (without proper flood control a large botnet could make this last one somewhat feasible, maybe).
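The estimates above can be reproduced with a few lines of arithmetic, assuming a single valid token, 6 bits per base64 character, and that the flood-control limit is the attacker's total guessing rate. The function name is hypothetical.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_to_exhaust(bits: int, tries_per_second: float) -> float:
    """Time to try every value once; on average an attacker needs about half of it."""
    return 2 ** bits / tries_per_second / SECONDS_PER_YEAR


for chars, rate in [(8, 1_000_000), (7, 15_000), (6, 200)]:
    bits = 6 * chars  # each base64 character carries 6 bits
    print(f"{chars} chars ({bits} bits) at {rate}/s: {years_to_exhaust(bits, rate):.1f} years")
```

This prints roughly 8.9, 9.3 and 10.9 years, so all three scenarios land in the same range of several years of sustained guessing (about half that on average). The 48-bit case comes out closer to nine years than two, which only strengthens the point.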

... unless you mean to imply that the string "reset?user=John&expiry=987654321" is the ugly, unwieldy part :)

> For example, the base64 encoding I was using for the authentication token included '.' as one of the characters. Well, it turns out that the linkification code in Gmail will ignore a '.' at the end of the link (which is kind of sensible). So 1/64 of my validation links ended up being invalid.

Using the wrong encoding for your URLs has nothing to do with this technique. In fact, I don't think the article even suggested a particular type of encoding.

Did you not know what characters your base64 routine would output, or something? There's more than just the period that you don't want to end a URL with because of ambiguity with the ways people might want to use URLs in a sentence: Closing parens, comma, exclamation, asterisk, etc. Regardless of what one particular webmail provider will linkify or not, it's just common sense, if you think about it.

Closing parens are always a big problem with many Wikipedia links, it's not like this is some obscure "gotcha".

That is what the %XX URL-encoding was made for.

Or why not just use what YouTube does? They seem to have it figured out pretty well: A-Z, a-z, 0-9, _ and -.

Can I ask, did you send an HTML email or did you just throw the link in and did GMail do you the service of making 63/64 links easily clickable without copying it to the address bar? Because I'm pretty sure it will work fine 64/64 times if you use an a-href.

Text email. A scheme that requires HTML email to paper over the problem of ugly URLs, has in my mind already lost any ease of implementation argument.

> So 1/64 of my validation links ended up being invalid.

Why not just try appending "." on the server side if what you get is too short? All of the links become valid again.

It's treating the symptom rather than curing the problem. In this case the problem is Gmail's tendency to remove a character that's considered URL-safe. So really the best you can do is use a different encoding that is not subject to this.

I wonder if Gmail would understand the standard textual form <URL:http://example.net/stuff.>.

Edit: interestingly, HN's text formatter doesn't!

Also, that's usually <http://example.net/stuff.> or [http://example.net/stuff.]; no 'URL:'.

There's really a strong reason to do this for things like user session tokens, where using crypto and avoiding database updates could easily remove a substantial portion of your database traffic.

But for things that are less common, like password resets and email validation, consider the downsides:

a) Using the database gives you a built-in (depending on schema) audit log of these events, where a signed token does not.
b) If the key is stolen in the scheme described, the person in possession of the key can hack into all your accounts. If you use database state, someone needs to have access to the database.
c) (Also mentioned above.) You will almost certainly be able to have a smaller token if it refers to database state, which has its own advantages: fewer copy-and-paste errors, etc.

There are certainly advantages to crypto tokens/cookies, and they're the right call in some circumstances, but there are downsides to consider as well.

Spot on. The article disregards that you really need a log of activity to check against with reset requests.

Should I allow an abusive user to send 37 reset codes in 15 minutes to an email address they don't own (or even if they do own it)? Absolutely not. How else do you keep track of that activity without storing it in a database of some sort to check against?

For short lived things like these you can store the info you need in a memory based storage.
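A minimal sketch of such memory-based tracking for the reset-request abuse case: a per-address sliding window. The class name and the limit of 3 emails per 15 minutes are hypothetical choices. The state lives in a single process and is lost on restart; with several app servers it would have to move to a shared store.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class ResetRateLimiter:
    """In-memory sliding window: at most `limit` reset emails per address per `window` seconds."""

    def __init__(self, limit: int = 3, window: float = 900.0) -> None:
        self.limit = limit
        self.window = window
        self._sent: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, email: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        recent = self._sent[email.strip().lower()]
        while recent and now - recent[0] >= self.window:
            recent.popleft()  # forget sends that have left the window
        if len(recent) >= self.limit:
            return False
        recent.append(now)
        return True
```

With the defaults, an address gets three reset emails, the fourth request inside 15 minutes is refused, and capacity returns as the oldest sends age out of the window.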

Problems I see:

- Anyone with access to the secret token can now generate their own urls without fail or evidence. With a DB you need to at least insert a row and thus leave evidence (especially if you are auditing db activity)

- No way to individually invalidate anyone. Example: What if ONE account's db tokens were exposed. Should be able to invalidate those tokens, not ALL tokens.

- Not necessarily easier than one "expiring token" mechanism that is re-used throughout the system.

- I don't feel great about exposing all my private security checks in url parameters. Why give users clues on how I do security? I can provide that in a better, more meaningful way.

- No way to easily invalidate all tokens of specific nature. Example: If you want to invalidate all password reset tokens, all 2-factor authorization tokens, all confirm email tokens after any password reset, all systems must know of each other rather than go to the token table and set "valid" on all those tokens to "false" for everything. The hash would need to account for what change triggers this hash invalidation rather than just someone picking what to reset from a dinner menu as a separate concern.

> Anyone with access to the secret token can now generate their own urls without fail or evidence. With a DB you need to at least insert a row and thus leave evidence (especially if you are auditing db activity)

And they wouldn't be valid. It's much simpler to invalidate them provided that you don't need information from the DB, than reading from the DB.

> No way to individually invalidate anyone. Example: What if ONE account's db tokens were exposed. Should be able to invalidate those tokens, not ALL tokens.

Of course you design it in such a way, that you can change one of the inputs, which will by definition invalidate the tokens.

> Not necessarily easier than one "expiring token" mechanism that is re-used throughout the system.

The point is to separate this boilerplate logic from your core business logic.

> I don't feel great about exposing all my private security checks in url parameters. Why give users clues on how I do security? I can provide that in a better, more meaningful way.

Any serious security system assumes that an attacker already knows 100% of the workings of the system. You should not rely on obscurity, but on your secret (in this case the crypto key).

> No way to easily invalidate all tokens of specific nature.

Again, if this is a requirement, then make such a global parameter an input to the token generation. It's still much simpler than doing it "the old fashioned way".

> Of course you design it in such a way, that you can change one of the inputs, which will by definition invalidate the tokens.

I was looking at using JWT to avoid a database read when authenticating a request, but in order to get some sort of variable per-user value I'd have to hit the database to get it, no? Doesn't that kind of defeat the purpose?

Said variable could be in memory, but the point isn't to save DB reads; it's to avoid saving data you don't need to. This is a technique to implement the same feature without having to store additional information in your core business logic that is relatively useless outside of this password reset or user signup. Of course you may want to save this for BI or audit purposes, but then you can (and should) use an appropriate separate system.

Note that the title says avoiding database writes.

> - Anyone with access to the secret token can now generate their own urls without fail or evidence. With a DB you need to at least insert a row and thus leave evidence (especially if you are auditing db activity)

Include a parameter that varies with the user, and optionally something that will change over time and that an attacker cannot trivially obtain.

Now the generated urls will fail unless the attacker knows both the user ids _and_ the auxiliary information. You can make that auxiliary information simple (e.g. last login time and old password hash, in the password reset example), or you can explicitly generate a value per user that is updated according to certain criteria. Call it an API key for the user account.

(as for auditing, nothing stops you from logging requests; a big difference is typically that it's easier to scale non-transactional log-writes where eventual consistency is generally sufficient)

> - No way to individually invalidate anyone. Example: What if ONE account's db tokens were exposed. Should be able to invalidate those tokens, not ALL tokens.

See above. All you need is to ensure the URL includes a value that can be changed per user. This could be an existing field that is re-purposed (as in the "last login time" example), or a field added explicitly for this purpose if you so choose. But it's a valid consideration.

> - Not necessarily easier than one "expiring token" mechanism that is re-used throughout the system. - I don't feel great about exposing all my private security checks in url parameters. Why give users clues on how I do security? I can provide that in a better, more meaningful way.

So encrypt the fields to obscure it, or make the key/values mean nothing to users.

> - No way to easily invalidate all tokens of specific nature. Example: If you want to invalidate all password reset tokens, all 2-factor authorization tokens, all confirm email tokens after any password reset, all systems must know of each other rather than go to the token table and set "valid" on all those tokens to "false" for everything. The hash would need to account for what change triggers this hash invalidation rather than just someone picking what to reset from a dinner menu as a separate concern.

All you need is to encode a value for the request type as part of the information that you can change when you want to invalidate that type of request.

You don't need to include that information in the URL either (assuming the URL already holds other information to identify the type of request)

Bump the value, and all the old urls of that type are now invalid. You don't even need to think about this in advance: Just allow the mapping to map to null/nil and in that case don't include the value when calculating the hmac. Now you can invalidate one specific type without invalidating any of the other urls that pre-date adding this mechanism. It's easy to evolve this checking if you identify additional factors you need to be able to revoke on.
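A minimal sketch of the per-type revocation mapping described above, assuming the request type and a serialized payload are what get signed. REVOCATION_VERSIONS, mac_for and verify are hypothetical names. A type mapped to None contributes nothing extra to the MAC, so links issued before the mechanism existed keep verifying until a version is set.

```python
import hashlib
import hmac
import json
from typing import Dict, Optional

SECRET_KEY = b"example-secret-key"  # hypothetical

# Hypothetical server-side setting: request type -> revocation version.
# None means "not versioned yet", so URLs issued before this existed remain valid.
REVOCATION_VERSIONS: Dict[str, Optional[int]] = {
    "reset_password": None,
    "verify_email": None,
}


def mac_for(request_type: str, payload: str) -> str:
    fields: list = [request_type, payload]
    version = REVOCATION_VERSIONS.get(request_type)
    if version is not None:
        fields.append(version)  # only mixed in once a version has been set
    msg = json.dumps(fields).encode()  # a JSON array keeps field boundaries unambiguous
    return hmac.new(SECRET_KEY, msg, hashlib.sha256).hexdigest()


def verify(request_type: str, payload: str, mac: str) -> bool:
    return hmac.compare_digest(mac_for(request_type, payload).encode(), mac.encode())
```

Setting REVOCATION_VERSIONS["reset_password"] = 1 invalidates every outstanding reset link while leaving email-verification links untouched; bumping it again later revokes the next batch.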

You can also use this mechanism to selectively enable/disable subsets of requests based on various factors (e.g. you can create urls that are only valid within office hours, if you please, by including a flag when you calculate the hmac, and change the "true" value of that flag depending on the time of day), though it quickly becomes easier to simply include the information in the URL and rely on the HMAC to determine the validity of the assertion.

In other words: a maintenance nightmare, filled with security gotchas, difficult to scale, and hard to write.

So is the alternative the author describes (except for scaling). HMACs for temporary credentials are clever and simple, if you're already versed in crypto primitives. But to anyone else, they're magic security sauce: "But why can't I just use MD5? Wait, no, that's broken somehow - what about SHA? What do I include in the HMAC? Do I have to use a new key every time?"

The solution trades implementation complexity for arcane knowledge. The author is so confident that his solution is superior because he already has that arcane knowledge.

That said, a well-implemented HMAC-based password reset flow is probably more bulletproof than a well-implemented DB-based flow. So if you're implementing AWS, go for HMAC. If you're implementing a blog, go with whatever you feel like. Just implement it well.

I don't really consider HMACs to be particularly arcane knowledge. They're easy to generate in any language and straightforward to work with.

While I consider myself to be a pretty good developer, I'm definitely not well-versed in crypto and math isn't my strongest skill. Despite that, I've had no problem using crypto for several of the obvious use cases (ex. password resets).

Arcane to use as the consumer of a library, or arcane to implement on your own?

The most important rule of thumb for crypto is that you'll probably mess up doing it yourself.

You shouldn't implement HMAC yourself, of course. I repeat: you should not do this.

However, if you read Wikipedia on HMAC, you'll find that HMAC is in fact probably one of the easier crypto primitives to understand and implement (as a TOY project). As long as you do the padding thing exactly as specified.

But really, use a library.

That rule is for, like, implementing AES on your own, not just using HMAC! No one is saying roll your own HMAC!

That's a common misconception. While designing your own cipher or implementing AES is a terrible idea, it's fairly easy to produce broken systems with strong primitives like AES or HMAC, and the brokenness would not have anything to do with the implementation of those primitives.

How will you make a broken system with HMAC? Other than exposing the key or the data inside there really isn't that much you can do wrong with HMAC.

Are you arguing in favor of developer laziness? An average programmer should be able to understand the use cases for HMAC within a couple hours or so, given nothing but the Wikipedia page and an IDE or REPL, and knowing about MACs might prevent him from cooking up all manner of broken security protocols in the future.

Well, I'm certainly not in favor of name-calling like that. HMACs don't provide any greater provable security guarantees than a database row in this scenario, so it comes down to implementation. HMACs are harder if you're not familiar with them, but almost everyone is familiar with a database. On the other hand, HMACs make the security of the overall system easier to verify, assuming HMACs are used correctly - but it's harder for newcomers to use them correctly.

I'm saying that in this scenario, it doesn't make much of a difference which system is used. So the author shouldn't be quite so forceful in telling people to use a crypto primitive outside of their comfort zone for minimal benefit. I'd rather use a system written by someone confident in their DB solution, rather than someone who learned their HMAC solution in a couple of hours on Wikipedia.

How confident should the average developer be that they are able to create a secure crypto implementation in a couple of hours, based on some possibly-dubious advice they read in Wikipedia? I would argue that they shouldn't be very confident in this.

The first thing you learn about crypto is that it's harder than it looks and best left to experts. Knowing when it's actually okay is difficult. At the very least, I'd expect that they ask for a review from someone who does know what they're doing, and how many developers who don't work at a big Internet company have a spare security expert handy?

Nobody is suggesting the developer should roll their own crypto. Pretty much every language out there has mature packages for HMACs. What you need to know are a few basic things about how to use them to verify the integrity of a message.

The risk is that a developer might not realize there is a significant difference between just encrypting something, and using a MAC.

This is a valid and very real concern and one of the reasons that a lot of companies want to hire CS graduates as developers. They have an idea of what this is, and any reasonable CS grad would know where to look for this, if they aren't already well versed in the crypto world.

I felt like the article sounded like an infomercial: making an ordinary task sound incredibly bothersome when it really isn't.

Signed URLs are incredibly useful, but the examples given don't seem that compelling to me.

Armin Ronacher wrote about this back in 2013. [My Favorite Database is the Network](http://lucumr.pocoo.org/2013/11/17/my-favorite-database/)

It's a nice article and generally a sound approach, but reading it has led me to think about something else: imagine an Internet where the data about each user (what would normally be entries in different tables on the servers) is stored, encrypted and authenticated, on the user's computer, as opposed to being stored on the server. Yes, I know, this goes against the 'cloud' principle we see around us and introduces a bunch of issues with syncing this data and it being available everywhere, which is the point of SaaS-like websites. But on the other hand, small websites could use this approach to drastically cut their costs and maintenance work they need to do. Also, a compromise of a server would expose the keys used for encryption, but then the attackers would need another piece of the puzzle, and that's stored on the user's computer.

Don't get me wrong, I'm not advocating this principle or implying it is needed and doable with the current technologies (I don't think we have a way of reliably storing such long-term data within the user's browser), but it's an interesting train of thought. It doesn't obviate the need for the server to store any sensitive information, just some of it. User info, password hashes, credit card numbers, maybe even social graphs depending on the use case, could all be stored this way. There will always be websites for which this doesn't make a lot of sense, but a lot of the other ones, mainly small players, could benefit from this.

It's curious how, when you describe it like that, it sounds that your data is being held hostage by a third party. After all, the server doesn't allow you to see what's in the encrypted opaque blobs, it only allows a controlled set of operations to be done with this data, and the only thing they do is further mutate this opaque blob. But it is no less transparent than the current state of affairs, namely servers keeping the data for themselves.

I'm interested to hear HN's opinion on this, flaws in the approach I've missed, etc.

> imagine an Internet where the data about each user (what would normally be entries in different tables on the servers) is stored, encrypted and authenticated, on the user's computer, as opposed to being stored on the server.

Trying to authenticate solely on the client side isn't as fruitful as you might be imagining, since it introduces potential security issues. Without any server-side authentication using only information (i.e. a blob) that the server has access to, it becomes very easy for clients to spoof/pretend to be other clients.

Consider that if this kind of system were easy/fruitful to implement, a lot of businesses would have done it already to avoid increased development/server costs on their end.

If the data is encrypted and authenticated with a server-known key, you couldn't change the data and end up with the server accepting it. How would a spoofing scenario work in this case? If you can steal another user's blob, you can pretend to be them, but if you can steal that blob, you can also steal a cookie if the website uses cookies for authentication. Maybe I am misunderstanding you.

One issue I can think of is users spoofing older blobs (reminiscent of a replay attack). I believe this could be mitigated if the server keeps a single 256-bit hash of each user's state (this would also make client-side HMACs redundant), or a 64-bit ever-increasing integer which is then included in the state. That's far smaller than any user's state.
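A minimal sketch of the counter variant: the client holds its state as a signed blob, and the server keeps only one integer per user. Every newly issued blob bumps the counter, so replaying an older blob fails. The blob layout (base64url JSON, '.', hex HMAC) and all names are hypothetical, and this sketch only authenticates the blob; it does not encrypt it, so the client can still read its contents.

```python
import base64
import binascii
import hashlib
import hmac
import json
from typing import Dict, Optional

SECRET_KEY = b"example-secret-key"  # hypothetical
server_counters: Dict[str, int] = {}  # the only per-user state the server keeps


def seal(user: str, state: dict) -> str:
    """Issue a new client-held blob and bump the user's counter, killing older blobs."""
    counter = server_counters.get(user, 0) + 1
    server_counters[user] = counter
    body = json.dumps({"user": user, "ctr": counter, "state": state}, sort_keys=True).encode()
    mac = hmac.new(SECRET_KEY, body, hashlib.sha256).hexdigest()
    return base64.urlsafe_b64encode(body).decode("ascii") + "." + mac


def open_blob(blob: str) -> Optional[dict]:
    """Return the state if the blob is authentic and current, else None."""
    encoded, _, mac = blob.rpartition(".")
    try:
        body = base64.urlsafe_b64decode(encoded)
    except (binascii.Error, ValueError):
        return None
    expected = hmac.new(SECRET_KEY, body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected.encode(), mac.encode()):
        return None  # tampered or forged
    data = json.loads(body)
    if data["ctr"] != server_counters.get(data["user"]):
        return None  # an older blob being replayed
    return data["state"]
```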

This is kind of what some sites use to eliminate signing up and logging in - give each user a unique URL that itself is enough to authenticate them. Not great for security, but OK for a game or something that doesn't matter too much.

Why the scheme in case-study 2 can be bad:

User 1 registers username "alice". User 2 registers username "alice". User 1 verifies the account. User 2 tries to verify the account. Provided the programmer remembered to check a second time whether the account name already exists, user 2 gets some error.

This might not happen often, but you still need to write good error handling for that case.

Add a salt. It doesn't even have to be that large because of the unlikelihood. Alternately, adding IP address as a hashed parameter should distinguish them well.

Do I understand you right, that you should add something to the user-generated usernames? This seems like something that would be rejected by users to me.

This is the way I've always done password resets and activation emails, and it never seemed particularly non-obvious.

In fact I seem to remember django's default out-of-the-box password reset mechanism does it more or less this way.

I've also always done it this way. I didn't realize that was anything unusual.

It just seemed so obvious.

An iPad app I like and wanted to build on top of uses signed requests. Except the signing occurs on the iPad and is verified by the server. Oops. I disassembled the app and took the secret key to generate my own requests.

Cool approach, but I don't see how it could be used to make one-time links. You can't invalidate a valid immutable datum; you can only wait until it expires. To make a link expire immediately after being clicked, you need to store something server-side.

You are right that something needs to be stored server side for one-time links.

In the password reset example, something is stored: The new password hash. Hence the example of including the old password hash, as a value that will be invalidated when the request has been processed.

Otherwise you need to bump a value on accessing the URL, but you still benefit by not having to create book-keeping information when creating the URL.

If you include the current password hash when calculating the hash for the reset link and you also include a random salt in your password hash, the reset link will become automatically invalidated because the password hash will change when a successful reset is performed.

I think the point was, if I request a password reset X times for the same email within the duration of the expiration time, there'll be X amount of valid password reset URLs that can potentially be bruteforced. It doesn't matter if the other links are invalidated as long as one of them works.

This is solved by rate limiting, I suppose. Feels like something that should've been included in the article.

I think this problem can also be mitigated by the additional parameters in the reset link, like the expiration time. If these parameters differ, you don't get to hash one thing and test it against all currently valid reset URLs to make brute force easier. You could even salt your reset URL deliberately.

Of course, I'm not arguing against rate limiting.

> I think the point was, if I request a password reset X times for the same email within the duration of the expiration time, there'll be X amount of valid password reset URLs that can potentially be bruteforced.

What does this have to do with whether you're storing tokens or using a hash?

Sorry for the late reply. In the apps I have, I delete any existing tokens for the given e-mail address when a new one is requested, so only one is valid at any given time.

You include the expiry time in the URL and HMAC the URL as well.

I don't think you understand. At a minimum, I don't understand you.

The question is not how to make links that expire after a time, the question is how to make links that expire after a fixed number of uses (e.g. one use).

Suppose your link expires in 30 minutes (or any other time). What stops me from using that link 30 times, once per minute?

mschwaig replied with the answer above [0]: use the old password hash when creating the token. For an example in the wild, see how Django does it [1].

0. https://news.ycombinator.com/item?id=9055749

1. https://github.com/django/django/blob/master/django/contrib/...

Thank you for the reply, and once I read it enough times in a row, I got it through my thick head.

It's still not clear to me, however, how to generalize this technique to create limited-use links that are not necessarily for resetting passwords.

If you want to have limited-use links that don't cause any writes to the db, then no, you can't with this approach. All it does is use an already necessary db write, the password change, to avoid adding another one, a token.

Yea... this has been around for years.

JSP and ASP.NET have allowed for this kind of shenanigans in their "view state" mechanism (albeit security is a configuration option away...). It's not hard to extend it to other things flying back and forth to the user.

As for usability, these sort of things should be wrapped up in a nice container class; HMAC taken care of, and (probably) a key-value API presented. No fuss, no muss.

If there's no such library, rolling your own, which this post appears to endorse, could definitely pose a security risk to any project without sufficient expertise.

Find an existing, tested, reviewed implementation that provides the API you need, and stick with it.

These things can be useful fraud-detection signals. If you don't log password reset requests, you're missing a chance to catch suspicious behavior. Even if you're capturing that data but not using it now, at least having it gives you that potential.

Also, unless you plan on making the valid window very short, you probably want to mark reset requests as used so someone's old email doesn't get used again.

It's not a completely crazy idea, but the minimal amount of trouble required for the extra tables doesn't seem like enough to give up the logging and expiration benefits.

Logging things in a relational database for the purpose of catching suspicious behavior is a bad idea.

You also don't need to mark reset requests as used, if the generated token was composed of the user's old hashed password or email, as that token will be invalid as soon as the user does the reset.

It's worth noting the parent didn't say anything about relational databases. Even if they did, I don't see how that would be a problem.

How else can you keep track of whether there is a wave of reset abuse targeting a user / email, if not by saving it to some sort of data store?

e.g., after 5 reset attempts in 15 or 30 minutes, prevent any further reset attempts for the next X amount of time, either outright or based on a signature of the request.
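As an illustration of the rule described above, here is a minimal in-memory sliding-window limiter keyed by email. The class name and the defaults (5 attempts per 15 minutes, then a one-hour lockout standing in for "X amount of time") are assumptions; a real deployment would keep this state in whatever shared store the thread below is arguing about.

```python
import time
from collections import defaultdict, deque

class ResetRateLimiter:
    """Hypothetical sliding-window limiter; state lives only in this process."""

    def __init__(self, max_attempts=5, window_seconds=15 * 60, lockout_seconds=60 * 60):
        self.max_attempts = max_attempts
        self.window = window_seconds
        self.lockout = lockout_seconds
        self.attempts = defaultdict(deque)  # email -> timestamps of recent attempts
        self.locked_until = {}              # email -> time the lockout ends

    def allow(self, email, now=None):
        now = time.time() if now is None else now
        if self.locked_until.get(email, 0) > now:
            return False
        q = self.attempts[email]
        # Drop attempts that have fallen out of the sliding window.
        while q and q[0] <= now - self.window:
            q.popleft()
        if len(q) >= self.max_attempts:
            # Too many recent attempts: start a lockout for this email.
            self.locked_until[email] = now + self.lockout
            q.clear()
            return False
        q.append(now)
        return True
```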

"Some sort of data store" does not imply the normal solution that people deploy to solve this problem.

There is a big difference between defining a table in your database that keeps track of resets, or using a queue of messages (event sourcing even?) with filters applied for detecting abnormal behavior.

The former is a dirty solution, dirty because you're storing junk that shouldn't be stored in a relational database, and it doesn't take care of other, much more important kinds of attacks on your system. The latter, by contrast, is extensible.

So let's say that in addition to limiting the number of resets, you also want to limit failed login attempts to 10 per hour. You may also want to limit users jumping between IP addresses; you don't want to be too strict about it, because of mobile connections, but you do want to prevent multiple active sessions using the same user credentials. You may also want your system to evolve based on averages taken from users' past activity.

Now where does that data go? Of course it's in "some kind of data store", but that says nothing, because the log files stored on disk are also a data store.

you should have a look at JSON Web Token http://jwt.io/

Interesting, but massive overkill when you're transferring assertions between two parties where only one party (the server) is allowed to create the assertions.

In the examples in the article, the JWT header is just plain cruft, because you're unlikely to be switching encodings often (and if you do decide to, a single, much shorter token as a "stand-in" for the bloated JSON header would be much better than using JSON).

The payload also represents a lot of extra overhead unless you intend to transfer more than just a single level dictionary.

It's kind of comical that they present it as "compact" given that probably something like 30% of the length of the presented example is unnecessary.

About the only thing useful here is having the session id be validated before being used. PHP is notorious for allowing the client to select any session id they want.

I read "the network is my favorite database" a while ago. Similar concept. But the problem is that the URLs are really long and wouldn't fit into an SMS, for example. If you are trying to get users started with your service, the last thing you want to do is throw more friction at the process.

It's a bit like Microsoft shuttling ViewState back and forth in old ASP.NET, instead of just storing sessions in the database. Signing things is good for validation, to avoid giving out resources to a client that wasn't given valid credentials. But that stuff should be stored in cookies, not links. Links should be simple and to the point. And if that means having the links contain a token the server can use to look up additional info, tinyurl-style, then so be it. That lookup is fast, and once you've done it, you can cache the result in a hash-based cache like memcache. Better experience for the end user and better security for you.

In short - I agree with the author's premise about guarantees etc. but when giving humans links, they should be simple. Store all the authentication crap in cookies.

To bring my point home, I'll just remark that inviting someone by SMS or email not only motivates them to sign up; having them click a unique link also verifies their mobile number or email address in one shot. One click and they have an account.

I love this technique. I copied it from Amazon a while back and have been using it to avoid storing state in the database ever since.

Also, Rails did something similar with its sessions ages ago, storing the whole session state on the client and signing it with a secret kept on the server.

It's such a cool idea, and now there's JWT [1], which I use all the time these days.

[1] http://jwt.io/

Using HMAC'd tokens to authenticate requests is pretty common in the form of cookie-based session variables (at least in most non-PHP web frameworks, e.g. Flask). This is really the same technique, except for email requests, where the data is embedded in a URL instead of a cookie.

Not to say it's not useful, just that it's not really novel. You also still have to be careful of replay attacks, which the author briefly addresses.

It's both common and a common source of mistakes. Every web framework I know of that uses HMAC'd sessions has had at least one glaring vulnerability that could have been avoided by using the old school opaque token database lookup technique.

If you are Google scale, I would say do whatever you need to do to make your scale work, since your very expensive scale experts will know if something is going to bite you. In most cases, though, that is overkill. Store it in your database, and then in some key-value store (e.g. Redis) if you must.

I tend to mistrust using crypto for things that can reasonably be done without crypto. Even if the initial implementation is perfect, it's too easy for a future maintainer to make a simple mistake that introduces major security flaws. The place to use this sort of technique is where it enables a fundamentally different architecture for your application, and not just where it's a small performance optimization.

While this is true, to my mind it's almost "6 of one/half dozen of the other". Using the examples given, there have been many security flaws where a generated token like a password reset token (one stored in the DB) is not truly random, or not as random as the implementer envisioned.

Also, keeping systems secure when they need to support use cases that inherently try to subvert some measure of security (like "forgot password") is difficult, and I find the difficulty is much more likely to be in the overall workflow than flaws in the "generateToken" and "validateToken" methods themselves.

In this case (although it wasn't shown very well in the article) it does enable a very different architecture: you can generate the code in one data center and have it work on a computer in any other datacenter.

Unless you are Google scale you don't need that, so I am on your side (other than HMACs not actually being that difficult to fuck up, compared to, e.g., just encrypting the entire thing with AES, at which point you are hosed).

Pure code with stateless systems are usually easier to test and less error prone.

I loved the article and all the use cases are valid. However I don't think the first two use cases save that much effort.

1. In Rails apps with the devise gem, the password reset token is just a column, not a whole other table.
2. Email activation: having the additional non-activated records is not a big problem; 20% extra records in my users table is pretty acceptable.

It's not the extra database columns, it's the removal of a state field. The addition of a single boolean state field essentially doubles your API surface, because now you have two types of users who may try to do things, and it's highly unlikely you have saturation testing for unverified users.

Actually, Rails has (or had; I haven't looked at it seriously since the YAML load-and-execute fiasco) a relatively simple before filter (it may actually have been called before_filter) that you use to filter out users who are not signed in. It wouldn't be difficult at all to change that to check that the user is signed in and verified, and to exempt the single place where an unverified user is allowed in (likely /verify) from that filter.

Django has native support for unverified users.

That said it is still a good point.

But that's not the only place you use users. What about batch jobs? What about every place where you get a list of users for the admin for one reason or another? Did you cover and test those? Devs seriously underestimate how much the addition of a state field impacts their code, especially once you get the combinatorial explosion of several state variables interacting...

That makes sense; normally Devise handles that for me: https://github.com/plataformatec/devise

Since Rails 4.1, the technique described in the article is trivial to use. Check out message_verifier.

I looked it up: http://api.rubyonrails.org/classes/ActiveSupport/MessageVeri... It looks cool, but I didn't find a quick way to use it with Devise.

>HMAC256("userId=johnnysmith&expirationTime=1356156000&oldBcryptHash=$oldBcryptHash&clientIpAddress=$clientIpAddress", $mySuperSecretKey)

Maybe I'm wrong, but he can still use this password reset link if he remembers his password and logs in. The ID stays the same, the link may not have expired yet, the bcrypt hash stays the same, and the IP stays the same. That, of course, is suboptimal.

I don't understand what problem you are describing. A user who remembers his momentarily-forgotten password can of course change or reset it regardless. The article quite sensibly describes using the old password hash as a nonce without pestering the database with extraneous records.

It would be nice to stop all reset tokens if the user has logged in after the reset tokens were made. But you're right, it's not really a problem unless one chooses to make it one ;)

One could overcome this by tracking the user's last password-driven login date and rolling it into the MAC; alternately, just use a short expiration time on the reset token and don't sweat it.

But that's just introducing additional state (last login date), so if you're starting to do that there's not much point in the HMAC approach.

I think it all comes down to what benefits your users' security.

If you want to show users their account's activity, like a history of recent password resets or recent logins (the things a typical FB user would see on their activity page), that can be a really nice security feature. So maybe the HMAC approach works just fine with extra state added after all. But I think the point the author is making is that the application can simply compute HMAC(.....) and verify it in a few lines with a single if/else branch, whereas traditional password reset code would probably take more lines and more conditions. I have written code that did exactly what the author dislikes, and trust me, that code wasn't pretty; I spent many hours optimizing it. Edge cases and test cases littered my codebase.

It will be checked against the current hash. The reset creates a new hash, so the token won't work a second time.

He's saying the user logs in with their normal password and takes no other action on the reset. The reset link is still valid until it expires.

Loved reading this. Most original article I've recently read.


This was a really refreshing read on HN. Wish there were more articles like this one utilizing crypto for modern programming applications.

Great article!

While I don't think I'll be implementing any of the solutions outlined in the article (because the database does not represent a pain point for me) I do think this is a fascinating technique and is something worth keeping in the toolbox as a solution.

Another use case is session IDs. You can sign something like:

user id|timeout|user custom property

Having a custom property (a per-user number) gives you a way to kill sessions. For example, if a session is compromised, just increment the number and regenerate the session for the valid user.
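A rough sketch of the `user id|timeout|user custom property` scheme, assuming the custom property is a per-user "generation" counter held in a hypothetical `session_generation` mapping (in practice a column on the user record, so checking it is a read). It also assumes user IDs never contain `|`.

```python
import hashlib
import hmac
import time

SECRET_KEY = b"example-secret-key-change-me"  # hypothetical; load from config

# Hypothetical per-user counter; bumping it invalidates every earlier session.
session_generation = {"42": 0}

def _mac(payload: str) -> str:
    return hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()

def issue_session(user_id, ttl=3600, now=None):
    now = int(time.time() if now is None else now)
    payload = f"{user_id}|{now + ttl}|{session_generation[user_id]}"
    return f"{payload}|{_mac(payload)}"

def check_session(token, now=None):
    """Return the user ID if the token is authentic, unexpired and current."""
    now = time.time() if now is None else now
    try:
        user_id, expires, gen, sig = token.split("|")
    except ValueError:
        return None
    if not hmac.compare_digest(_mac(f"{user_id}|{expires}|{gen}").encode(), sig.encode()):
        return None
    # Fields are trusted only after the MAC check above succeeds.
    if now > int(expires) or int(gen) != session_generation.get(user_id):
        return None
    return user_id
```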

In other words: a maintenance nightmare, filled with security gotchas, difficult to scale, and hard to write.

I'm not sure I follow this. To change a password you need to write to the db anyways, it's one extra check to set token used=1 or remove it. Why is that a huge deal?

Now you don’t have to worry about clearing old bitrot from the database or worrying about when to expire non-verified accounts.

It's also possible to use Redis or another fast store with timeouts. The problem seems rather exaggerated.
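For comparison, here is a minimal sketch of the stateful alternative: opaque random tokens kept in a store with expiry and deleted on first use, which is what makes a link genuinely single-use. The dictionary below is only a stand-in for Redis or a similar store (there you would presumably rely on key expiry and an atomic get-and-delete); the class name is hypothetical.

```python
import secrets
import time

class OneTimeTokenStore:
    """In-memory stand-in for a key-value store with TTLs."""

    def __init__(self):
        self._data = {}  # token -> (expires_at, payload)

    def issue(self, payload, ttl=1800, now=None):
        now = time.time() if now is None else now
        token = secrets.token_urlsafe(16)  # 128 random bits, URL-safe characters
        self._data[token] = (now + ttl, payload)
        return token

    def redeem(self, token, now=None):
        now = time.time() if now is None else now
        # pop() deletes the entry, so a second redemption of the same token fails.
        entry = self._data.pop(token, None)
        if entry is None or now > entry[0]:
            return None
        return entry[1]
```

The token stays short and fixed-length, which also sidesteps the long-URL concerns raised elsewhere in the thread.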

This is all fine and well until you need to measure "how many users reset their passwords" or "what is the bounce rate of users confirming their email addresses". This may be useful for avoiding sensitive data, but it's not a practical substitute for writing to the database, nor is it really that much effort to create two database tables.

That sort of data belongs in an analytics or event database (i.e. one with entirely different requirements from your normal database).

Usually when I implement a stateless approach like this I still fire off a log event every time we generate a password reset.

If I understand the article properly, what the author is suggesting is that instead of writing lines into the database that contain information about users, they create a hash that has some secret + user-supplied information, and if it matches, then that's considered validation.

The reason why I completely disagree with this is because this is embedded in code. All your developers will have access to this code, and you have people joining and leaving your company all the time. If this scheme is used to protect more important information and if it leaks, then it could cause chaos.

The difference is that all developers generally have access to code, but only some well-trusted people have database access in prod. If I care about security, that is a more secure way to do things.

You could make the secret of the MAC an environment variable.

PS: I don't understand why you are getting downvoted; your comment is more relevant than most of the comments here from people who don't understand MACs.

How do you prevent your developers from accessing the production database? Use the same mechanism to prevent them from knowing the production HMAC secret.

If the production DB credentials are hard-coded, developers would indeed have access to the prod DB; if the production HMAC secret were hard-coded, developers would indeed be able to forge tokens. So don't do that.

That's why we use environment variables.
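A small sketch of the environment-variable approach, assuming a hypothetical variable name `RESET_TOKEN_HMAC_KEY` holding a hex-encoded key of at least 256 bits. The secret is provisioned per environment by whoever runs production, so it never lands in the repository, and rotating it when someone leaves is a deploy-time change.

```python
import hashlib
import hmac
import os

def load_hmac_key(var_name="RESET_TOKEN_HMAC_KEY"):
    # Hypothetical variable name; the value is expected to be hex-encoded.
    value = os.environ.get(var_name)
    if not value:
        raise RuntimeError(f"{var_name} is not set")
    key = bytes.fromhex(value)
    if len(key) < 32:
        raise RuntimeError(f"{var_name} must encode at least 256 bits")
    return key

def sign(message: bytes, key: bytes) -> str:
    return hmac.new(key, message, hashlib.sha256).hexdigest()
```

A suitable value can be generated with `python -c "import secrets; print(secrets.token_hex(32))"`.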

Sounds like security by obscurity to me. If you are worried about the secret part of the MAC then keep that secret, or change it when someone leaves.

Lately I see this approach used more and more. I really don't understand what the big gain is. Sure, you save yourself a few database round trips, but is that really that big a deal?

Compare those savings with the additional size of URL parameters (increasing network overhead), plus the processing overhead of encryption/signing/validation. Have we really made a performance gain? I don't have any hard numbers, but I'm willing to wager that the gains are fairly negligible.

From a security standpoint, I'm not fond of this approach, and I don't think its benefits outweigh the added risks. Here's why:

- Encryption (including simple signing) is really easy to get wrong in subtle ways. A simple mistake like using SHA256() instead of HMACSHA256() breaks the system badly.

- A corollary to the first point is that using a strong key is vital. The article says nothing of the criteria a strong key should meet, and a poor key could enable additional attacks. While I'm not aware of anything allowing HMAC key recovery faster than brute force, that only helps if the key is strong: a short or human-chosen key may fall to an offline brute-force or dictionary attack, whereas 256 random bits puts brute force out of reach.

- This scheme introduces a significant number of anonymous user-supplied inputs to the application which didn't previously exist. Each of these is a potential place for things to go wrong, which can give rise to other vulnerabilities (XSS, SQLi, etc) depending on what you're doing with that input. You'll need to be sure that you validate the signature prior to doing anything with these unverified inputs, in addition to performing sanity checks on their contents prior to signing. Part of secure application design is to minimize attack surface and complexity - this does the opposite.

- Once a signature is created, it can't be revoked. I mean, you could store the signature in the database and validate its presence on receipt, or include in your signed values a nonce that's stored in the database, but that kind of defeats the goal of avoiding database storage. This is something the article tries to address by signing over things like an expiry time, the user's old password digest, etc. But that's insufficient as well (I'll get into why below).

- Even if you get everything right, key exposure is fatal. You've introduced a single point of failure into the system that didn't previously exist, and a fairly catastrophic one. If the app ever leaks that key, your only option is to rotate it (assuming you detect the exposure), which invalidates all existing signed values.
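To make the first two points concrete, here is a short sketch contrasting a plain keyed hash with HMAC, using a key drawn from the OS CSPRNG. The function names are illustrative only. The known weakness of `SHA256(key + message)` is length extension: with SHA-256's Merkle-Damgård construction, a party who sees one tag can compute a valid tag for the message plus padding plus an appended suffix without knowing the key. HMAC is not affected.

```python
import hashlib
import hmac
import secrets

# 256 random bits from the OS CSPRNG, rather than a short human-chosen string.
KEY = secrets.token_bytes(32)

def bad_sign(message: bytes) -> str:
    # Wrong: a plain hash of key || message is not a MAC and is open to
    # length-extension forgery.
    return hashlib.sha256(KEY + message).hexdigest()

def good_sign(message: bytes) -> str:
    # HMAC-SHA256 is designed for exactly this job.
    return hmac.new(KEY, message, hashlib.sha256).hexdigest()

def verify(message: bytes, tag: str) -> bool:
    # Constant-time comparison avoids leaking how many leading characters matched.
    return hmac.compare_digest(good_sign(message).encode(), tag.encode())
```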

Those are the broad points; now let's look at the article's examples:

Okay, so we have an irrevocable reset token for johnnysmith, valid until 12/22/2012. The article doesn't mention where the key comes from. Since the key is the only thing Johnny doesn't know, he can try as many keys as he wants offline. If he gets a hit, he can create a valid token for any account he wants. NOT GOOD. If the key is sufficiently long and randomly generated, this becomes difficult to the point of infeasibility, but if the key is weak, so is the overall security.

As the article notes the token can also be reused multiple times, even if johnny has already used it. They offer a means of fixing this:

  HMAC256("userId=johnnysmith&expirationTime=1356156000&oldBcryptHash=$oldBcryptHash&clientIpAddress=$clientIpAddress", $mySuperSecretKey)

Okay, so with the bcrypt digest covered by the HMAC, a change of password will cause the signature to no longer match. But now on verification we have to pull the user's hash before calculating the HMAC, so we've added back a database read, which reduces the perceived benefits.

Furthermore, this token is still valid if Johnny requests a password reset and, while waiting for the email to arrive, remembers that he changed his password from 'Password123' to 'Password456' and logs in. There's then a window of time where that token remains valid (until expirationTime arrives) and can be abused. The article doesn't address this case at all. I suppose they'd say that on issuing a token, you should disable login to Johnny's account until he has reset his password, but how would you do this? Well, you'd set a flag in the database on Johnny's record, and we're kind of back where we started, with both a database read and a write, but with added cryptographic overhead and attack surface too.

The next example talks about account registration. It states that you can verify a user's email account by sending them a link to:

This doesn't concern me as much as password recovery, since account creation is likely to be fairly anonymous anyhow and not nearly as security sensitive. Still, it's not technically true that the token validates your receipt of the email. Instead, it validates your receipt of the email, or your knowledge of the key and values used to generate the token. In practice, this is likely a small difference, but a truly random token stored in the database would require database access or visibility into the email to obtain, whereas this requires only knowledge of the key instead.
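For readers who want to see what such a stateless verification link might look like, here is a sketch assuming a hypothetical `https://example.com/verify` endpoint and made-up parameter names (`email`, `expires`, `sig`); the article's actual link format isn't reproduced here. As the comment notes, anyone holding `SECRET_KEY` can mint these links without ever receiving the email.

```python
import base64
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlparse

SECRET_KEY = b"example-secret-key-change-me"  # hypothetical; load from config

def _tag(email: str, expires: int) -> str:
    mac = hmac.new(SECRET_KEY, f"{email}|{expires}".encode(), hashlib.sha256).digest()
    # URL-safe base64 without '=' padding keeps the link shorter than hex.
    return base64.urlsafe_b64encode(mac).rstrip(b"=").decode()

def make_verify_link(email: str, ttl: int = 86400, now=None) -> str:
    expires = int(time.time() if now is None else now) + ttl
    query = urlencode({"email": email, "expires": expires, "sig": _tag(email, expires)})
    return f"https://example.com/verify?{query}"

def check_verify_link(link: str, now=None) -> bool:
    now = time.time() if now is None else now
    params = parse_qs(urlparse(link).query)
    try:
        email = params["email"][0]
        expires = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False
    return now <= expires and hmac.compare_digest(_tag(email, expires).encode(), sig.encode())
```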

Case Study 3 talks about one-time-use and expiring resources. There's not enough detail here about their scheme for me to really critique, but they point out AWS/S3 use of HMAC signing as an example. Except that AWS/S3 signing isn't about one-time use or expiration. Instead, it's about authenticity.

AWS operates with an ACCESS_KEY and SECRET_KEY which are a pair of tokens related to each other. Requests with 'params' are signed as:

  Signature = HMAC(ACCESS_KEY + params, SECRET_KEY)

Signature, ACCESS_KEY, and params are sent to the server. SECRET_KEY is the HMAC signing key. On the server side, AWS presumably does a DB lookup for your ACCESS_KEY, obtains the corresponding SECRET_KEY, and validates that the signature matches. This authenticates that the requester knows the SECRET_KEY, but it implies the existence of a database lookup. If you think of the ACCESS_KEY as a username, you'll see that this is really authenticating each request, as well as preventing tampering. You can issue key pairs that are good for only one use, or that expire after a set time period, etc., but there's nothing implicit in the HMAC scheme that causes this. Interestingly, this scheme almost certainly requires storing the token in a server-side database, the exact thing the author wants to prevent.
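A sketch of the simplified scheme described above, mainly to show where the server-side lookup happens. It follows the comment's formula, not AWS's real signing algorithm, which is considerably more involved; `KEY_TABLE` is a hypothetical stand-in for the database of issued key pairs.

```python
import hashlib
import hmac

# Hypothetical key table: the server must look up SECRET_KEY by ACCESS_KEY.
KEY_TABLE = {"AKEXAMPLE": b"example-secret-for-akexample"}

def sign_request(access_key: str, params: str, secret_key: bytes) -> str:
    # Client side: Signature = HMAC(ACCESS_KEY + params, SECRET_KEY)
    msg = (access_key + params).encode()
    return hmac.new(secret_key, msg, hashlib.sha256).hexdigest()

def verify_request(access_key: str, params: str, signature: str) -> bool:
    secret = KEY_TABLE.get(access_key)  # the lookup the comment points out
    if secret is None:
        return False
    expected = sign_request(access_key, params, secret)
    return hmac.compare_digest(expected.encode(), signature.encode())
```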

This can be done reasonably well, but it's a minefield to navigate when we have a tried and true solution. What's so bad about database storage of tokens? The post kind of starts with the assertion that they should be avoided, but I see no clear explanation as to why.

Okay, wow... forgive my typos and grammatical errors. Rage replying :)

While encryption is obviously a tool every developer should be using, this seems a little hand-wavy and built against a rather contrived strawman. It reads a little like an infomercial -- you fuss, you muss...

Is a table and a couple of columns actually difficult? Is this, or has this ever been, a problem for anyone? When did we enter the "post-database" world?

This all makes plenty of sense if you buy the strawman; otherwise it seems like some pretty strong overreaching.
