There are no good constant-time data structures (wingolog.org)
43 points by dmit on Dec 2, 2014 | 52 comments



I feel like I must be missing something here.

You do not need constant-time comparisons of password hashes; to see why, reason through how you'd attack them.

An attack against an HMAC authenticator provides full, incremental control over the hash. No such control exists for a password hash; you can't step from "AAAAAA" to "AAAAAB" without a devastating break in the hash function itself. The capability to predict the input that generates "AAAAAB" implies a total failure of preimage resistance.

You can end up in a situation where timing leaks are relevant to password authentication. To do that, you need to design your own password storage system, and it needs to be badly flawed; for instance, you could literally use a general-purpose fast lookup hash and a chaining hash table. This is yet another reason to simply use bcrypt or scrypt to store passwords; doing so takes timing off the table.

In reality, though, you could just use a salted SHA1 hash (don't, though) and still be safe from this attack.


I had to read the article a couple times.

The scenario he's describing is:

  1. A user submits a password (with no username) to authenticate.
  2. The server looks up the password in a hash table. This involves:
     a. Hashing the password to find the correct bucket.
     b. Doing a string comparison against passwords stored in the bucket.
There are vulnerabilities in two places:

1. Buckets containing more passwords will take longer to return a negative result because they will have to perform more string comparisons. (This is irrelevant except in pathological cases.)

2. String comparisons with longer common prefixes will take longer to return a negative result.
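To make the scenario concrete, this is roughly the kind of lookup being described (illustrative names only, nothing from the post): the chain is walked and each stored password is strcmp'd, so both the chain length and the common-prefix length of each miss show up in the timing.

    #include <stddef.h>
    #include <string.h>

    struct node { char *password; struct node *next; };

    struct node *find_password(struct node **buckets, size_t nbuckets,
                               unsigned long (*hash)(const char *),
                               const char *input) {
        struct node *n = buckets[hash(input) % nbuckets];
        for (; n != NULL; n = n->next)
            if (strcmp(n->password, input) == 0)   /* early exit leaks timing */
                return n;
        return NULL;
    }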

This is a bad example for a couple reasons:

1. Password inputs are not compared directly to stored values in any real authentication system. They are hashed first. A timing attack here implies a preimage attack against the hash function.

2. Passwords are not global. Passwords are paired with a username. You don't get to time the comparison against every password in the system to determine common prefixes, for example.

A better example might have been API tokens, which are often stored in cleartext and fetched directly from the database.


I don't think the post is about passwords in that sense, but of some kind of authorization tokens handed out by the server after authentication. E.g. in web frameworks an opaque session cookie that's used to index into server side session storage doesn't seem like a horribly exotic design choice.


Never mind, I think I see it now. For posterity:

I still don't follow. Good authentication tokens make extremely sparse use of their numeric domain; they are e.g. 128 bit numbers, of which only ~15-20 bits are needed. It doesn't look like there's enough information in the indices to incrementally attack them, which is how timing attacks work.

I buy that I'm just missing something here. I'm just looking for someone to reframe this issue in terms of what an attacker actually does.

The canonical example of the target you're referring to is a Java JSESSIONID cookie. So: what's the timing attack against JSESSIONID?


So I'm not a security guy. And I don't know what kind of constraints Andy's system has, or what attack he was thinking of. But it just didn't seem like there was a need for a "use bcrypt" response since from the context these were not passwords in the classical sense. That'd be an inappropriate operation to have on the authorization fast path.

That said, I think we'd be talking of something like this for a web framework session storage attack:

Assumptions:

- A hash table implementation using separate chaining, without any kind of per-table randomization (e.g. what Perl does for DoS prevention).

- A string comparison function that's not constant time.

What the attacker would do:

- Log in, observe session key.

- Generate, on the client side, other tokens that hash to the same bucket as the real session key.

- Use the generated keys in a timing attack to determine the number of other session keys in the same hash bucket.

- Repeat with new logins until they find a bucket with exactly one other session key.

- Conduct a timing attack on the first character of the other session key in the bucket. (I.e. generate 256 fake session keys all hashing to the same bucket, each starting with a different byte).

- Conduct a timing attack on the second character.

- And so on.

- Once you've found a genuine session key, see whether it granted you any elevated rights.

Seems to me that the only thing that's needed is the ability to generate strings with an arbitrary prefix that hash to an arbitrary bucket. Without giving it too much thought, that seems pretty simple. I must be missing something here since you guys are finding this to be such an outlandish idea. Looking forward to hearing what it is.
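A rough sketch of that "arbitrary prefix, arbitrary bucket" step, assuming the table uses an unkeyed hash such as FNV-1a and a known power-of-two size (all names here are made up): keep the wanted prefix fixed and vary a suffix until the token lands in the target bucket.

    #include <stdint.h>
    #include <stdio.h>

    /* Unkeyed FNV-1a, standing in for whatever non-randomized hash the table uses. */
    static uint32_t fnv1a(const char *s) {
        uint32_t h = 2166136261u;
        for (; *s; s++) { h ^= (unsigned char)*s; h *= 16777619u; }
        return h;
    }

    void forge_token(const char *prefix, uint32_t target_bucket,
                     uint32_t table_size, char *out, size_t outlen) {
        for (uint32_t n = 0; ; n++) {
            snprintf(out, outlen, "%s%08x", prefix, n);
            if ((fnv1a(out) & (table_size - 1)) == target_bucket)
                return;   /* out now has the wanted prefix and the wanted bucket */
        }
    }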


This doesn't look like a straightforward attack, but it seems plausible and worth exploring. I think the word "password" just turned my brain off --- in part because I see so many people obsess about constant-time comparison for password hashes.

That said, 'tedunangst's suggestion to hash before comparison seems sound; double-hashing is an old hack people use to do timing-safe comparisons of normal hashes, too.

So I'm still not sure you need timing-safe containers or timing-safe indices, although it's interesting to think about how you'd design them.


This may be doable. Finding the size of the hash table and the length of each bucket seems like the hard part. What are the real-world differences between zero, one, and two long chains? Also, practically speaking, I think the attack assumes chaining, but Python, Lua, and others use open addressing, which makes it harder to target one bucket.

If this concerns you, I think I'd be more worried about a system that looks up tokens in a database. A (My/Post)SQL btree index may leak more data and is more "linearly" attackable.


Sure, I'm definitely not personally worried about this :-) Just spitballing on what such an attack might look like.

I wouldn't have thought that figuring out the chain size is the weak point. Every step in the chain must add at least as much processing time as a string comparison that fails on the first byte, and the rest of the attack is already assuming that we can measure something at that resolution.


The solution is the same. The server does sha256(token) and then looks that up in its magic table.
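A minimal sketch of that, assuming OpenSSL's SHA256() and some existing map; table_get() and struct session are placeholders, not anything from the post.

    #include <openssl/sha.h>
    #include <string.h>

    struct session;                                      /* whatever the map stores */
    struct session *table_get(const unsigned char *key, size_t len);

    struct session *lookup_token(const char *token) {
        unsigned char digest[SHA256_DIGEST_LENGTH];
        /* Index by SHA-256(token) rather than by the raw token. */
        SHA256((const unsigned char *)token, strlen(token), digest);
        return table_get(digest, sizeof digest);
    }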


In fairness, almost nobody actually does this; they directly index the token itself.


Sure, and the post explicitly mentions that solution multiple times.


Write a password library that uses bcrypt. Compare hashes with strcmp. Put it on the internet. See what happens. :)


That (using ==) is what py-bcrypt does, written by Damien Miller, a fairly well-known OpenSSH and OpenBSD hacker. It has definitely attracted some "smart" comments from "experts," as does anything that uses == in a security setting, no matter the actual context.

Sure, it's usually poignant critique, but anyone who actually knows anything about this subject agrees that doing a non-constant-time-compare on what is essentially a random oracle leaks nothing of value to an attacker, and people are just parroting a catch phrase without understanding it. It's the evolution of the fan favorite "That's security through obscurity!" applied outside cryptography.


Who do you think told me what happens? :)


Perfect. Small world.


That doesn't tell me anything. :/

As far as I can tell, you're implying an attacker can determine a password using timing of hash comparisons. How is this any different (just more complicated) than a regular brute force attack?


strcmp will stop at the end of the string (the first '\0' byte). Comparing the hashes ABC\0DEF and ABC\0XYZ will therefore come out as equal. There are a lot of hash collisions, which makes finding a password a lot easier, but that password won't work with sites that don't use strcmp.
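In other words (a small illustration, not anyone's library code): compare the full digest length with memcmp rather than relying on strcmp's NUL-terminated view of it.

    #include <string.h>

    /* strcmp stops at the first '\0', so binary digests that agree up to an
       embedded NUL byte would compare equal; memcmp looks at every byte. */
    int digests_equal(const unsigned char *a, const unsigned char *b, size_t n) {
        return memcmp(a, b, n) == 0;   /* still not a constant-time comparison */
    }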


I don't understand: why not write the function to take a predetermined (or even slightly random) amount of time to run, and have it just sleep at the end of execution until the timer's up?


Came here to say this. If that's deemed too heavyweight, you can just add the "delay" as a random amount of computation overhead (say, a loop that does nothing but does not get optimized away). The trick is to make the delay large enough that if you run a password with a wrong first character and a password with a wrong last character X times each, you aren't able to tell the difference (or you need something like a zillion tries, at which point you may as well brute-force).

If you did not have the requirement that passwords can be arbitrarily long (which seems like a bad idea anyway), you could just pad the password, then compare it character by character, never stopping until the end and updating, at each character, a boolean that records whether the strings differ.
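A sketch of that padded comparison, assuming some maximum length MAXLEN and inputs already zero-padded to it:

    #define MAXLEN 64   /* assumed maximum password length */

    /* Examine every byte, never returning early; the accumulated 'diff'
       is zero only if all MAXLEN bytes match. */
    int equal_padded(const unsigned char a[MAXLEN], const unsigned char b[MAXLEN]) {
        unsigned char diff = 0;
        for (int i = 0; i < MAXLEN; i++)
            diff |= a[i] ^ b[i];
        return diff == 0;
    }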

Also, this might not be a concern on the web where there's already quite a bit of variability in the response time.


Adding random delays does not remove signal; it only adds noise. Noise can be removed by simply performing more trials.


The point is to pad the time to a random amount, not by a random amount.

If a particular function by design needs to have a throughput of at least 100 requests/sec, then you ensure that it has a throughput of exactly 100 rq/sec by calculating the response and then busywaiting until 10 ms has elapsed, no matter whether the actually necessary work took 1 or 9 ms based on the input, caching, and whatever else.

Alternatively, you use a secure random to state that this request will respond in exactly x ms - not when some calculations are finished. Again, you need to be sure that your calculations are guaranteed to finish before x, so you need to know the worst case time of your algorithm, but you don't need it to be constant time.


The delays aren't random. They last just enough so that the function always takes the same amount of time, presumably the worst-case amount of time.


> presumably the worst-case amount of time.

Determining the worst-case amount of time is far from trivial. What's the slowest amount of time it takes lookup-table-based AES to decrypt a block?

The only answer to that question is to measure it a bunch, then pick some percentile and hope your margin of safety is sufficient. Or you could use a constant time algorithm to start and rely less on hope.


It is actually quite trivial.

You simply align the function's running time to the worst case _so far_. Start with 0, put it through a warm-up cycle and use that to set a reasonable default. Cache the value between the app's sessions, and re-initialize it when the setup changes.
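A sketch of that scheme (do_check() and the warm-up are placeholders; the clock handling is simplified):

    #include <time.h>

    extern int do_check(void);            /* the real work, provided elsewhere */

    static long long worst_ns;            /* seeded by a warm-up cycle, cached between runs */

    static long long now_ns(void) {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec * 1000000000LL + ts.tv_nsec;
    }

    int padded_check(void) {
        long long start = now_ns();
        int result = do_check();
        long long elapsed = now_ns() - start;
        if (elapsed > worst_ns)
            worst_ns = elapsed;           /* align to the worst case seen so far */
        while (now_ns() - start < worst_ns)
            ;                             /* stall until the worst-case duration has passed */
        return result;
    }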


'If that's deemed too heavyweight, you can just add "delay" as a random amount of computation overhead (say a loop that does nothing but does not get optimized away).'

If the delay here is not meant to be random, what is the "that" being contrasted against? I understood "that" to be what you say, which would imply this is something different.


I know, that was my point about making the random delay large enough that recovering the signal would be less efficient than a brute-force attack.

And I also suggested a scheme that rounds to a fixed delay.


> a key-value map on common hardware that runs in constant time and is sublinear in the number of entries in the map

Counterexample: Cuckoo hashmap, at least for lookup.

Calculate both hashes. Index the table via each hash. Check both table indexes to see if either match.

The only timing information this leaks is the length of the input string, and that an adversary already knows.

Now, there will be some cache information potentially leaked, although there are ways around that too. But this is constant time disregarding cache, and is sublinear.

As for the question of "a nice constant-time comparison algorithm for (say) 160-bit values", it's relatively easy. Do a pairwise comparison of constant-size chunks of the value, mapping <, =, and > to 01, 10, and 11, respectively. Then recurse as necessary. Of course, this assumes you have a constant-time comparison function for a single chunk.
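A sketch of such a lookup (h1/h2, the slot layout, and KEYLEN are assumptions): both candidate slots are always probed and both keys are always compared in full, so the work done does not depend on where, or whether, the key is stored.

    #include <stddef.h>
    #include <stdint.h>

    #define KEYLEN 20   /* e.g. 160-bit tokens */

    struct slot { unsigned char key[KEYLEN]; int used; void *value; };

    extern uint32_t h1(const unsigned char *key);   /* the two cuckoo hashes, */
    extern uint32_t h2(const unsigned char *key);   /* provided elsewhere     */

    void *cuckoo_get(struct slot *t1, struct slot *t2, size_t nslots,
                     const unsigned char *key) {
        struct slot *a = &t1[h1(key) % nslots];
        struct slot *b = &t2[h2(key) % nslots];
        unsigned da = !a->used, db = !b->used;      /* empty slots can never match */
        for (int i = 0; i < KEYLEN; i++) {
            da |= a->key[i] ^ key[i];               /* both comparisons always run */
            db |= b->key[i] ^ key[i];               /* to completion               */
        }
        if (da == 0) return a->value;
        if (db == 0) return b->value;
        return NULL;
    }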


A 2-way 1-ary cuckoo hash was going to be my answer as well. As you say, you want to always check both table indexes rather than waiting to see if you need to, but in almost all cases it's faster to do this anyway.

Since I think he only wants to check for equality, a constant time comparison of 160-bit values should be easy too. If you have a modern x86 machine with SSE4.1, the easiest might be to use _mm_cmpeq_epi64(vpcmpeqq), _mm_and_si128(pand), and _mm_testz_si128(ptest). With 256-bit vectors, it gets even simpler. But any scalar solution should be fine too as long as you can convince the compiler to do what you want.
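A sketch with SSE4.1 intrinsics, using an XOR/OR/PTEST variant of the sequence above; two overlapping 16-byte loads cover the 20 bytes, and the overlap doesn't matter for an equality test.

    #include <immintrin.h>

    /* Constant-time equality of two 20-byte (160-bit) values; returns 1 if equal. */
    int eq160(const unsigned char *a, const unsigned char *b) {
        __m128i a0 = _mm_loadu_si128((const __m128i *)a);
        __m128i a1 = _mm_loadu_si128((const __m128i *)(a + 4));  /* bytes 4..19 */
        __m128i b0 = _mm_loadu_si128((const __m128i *)b);
        __m128i b1 = _mm_loadu_si128((const __m128i *)(b + 4));
        __m128i d  = _mm_or_si128(_mm_xor_si128(a0, b0),
                                  _mm_xor_si128(a1, b1));
        return _mm_testz_si128(d, d);   /* 1 iff every difference bit is zero */
    }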


You're likely to leak a fair amount of information simply from cache misses, but also from things like branch prediction. Constant-time operation on x86 is surprisingly hard.


A cuckoo hashmap doesn't have any data-dependent branches, assuming the hash function itself is likewise free of them.

And as for cache misses - what exactly is a cache miss going to tell you in this case? All it'll tell you is that an input having a hash close to either hash of your input wasn't accessed recently. That, as far as I can tell, doesn't leak anything, assuming your passwords are salted.


> One problem is, I don't know of a nice constant-time comparison algorithm for (say) 160-bit values. [...] I would appreciate any pointers to such a constant-time less-than algorithm.

The following should be constant time relative to the contents of its inputs:

    #include <stddef.h> /* for size_t */

    // return -1 if x < y, else 0 (inputs must be below 2^31)
    static unsigned lt(unsigned x, unsigned y) {
      return -((x - y) >> (sizeof(x)*8-1));
    }

    // return -1 if x != y, else 0 (inputs must be below 2^31)
    static unsigned ne(unsigned x, unsigned y) {
      const unsigned d = (x - y) | (y - x);
      return -(d >> (sizeof(d)*8-1));
    }

    // return 1 if x[0..n-1] is lexicographically less than y[0..n-1], else 0
    int less_than(const unsigned char * x, const unsigned char * y, size_t n) {
      unsigned flag = 0;
      unsigned done = 0;
      for(size_t i = 0; i < n; ++i) {
        flag |= lt(x[i], y[i]) & ~done;
        done |= ne(x[i], y[i]); /* accumulate, so an earlier difference stays sticky */
      }
      return flag & 1;
    }


This is very similar to the algorithm OpenBSD uses in timingsafe_memcmp().

http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/lib/libc/string...


Indeed. The two main differences are that I didn't care about returning {-1, 0, 1} to emulate `memcmp`, and that I don't rely on implementation-defined behavior (signed shifts being arithmetic).


Right-shift of signed integer values is implementation-defined behaviour though. If (x - y) is unsigned, the comments are wrong, they would return 1, not -1.


The `-` is there to turn 1 into -1 (mod 2^n), and 0 into 0. I use `- (x >> (sizeof(x)*8-1))` to get arithmetic shift semantics while only using unsigned arithmetic.


Why not just use xor instead of "or + subtraction" in your "ne" function?


I'm not sure I understand what you're suggesting. x ^ y will obviously never be 0 when x != y, but there is no quick way to locate where they differ to turn it into a mask. When x != y, either x - y or y - x will be negative and therefore have the MSB set.


    //quantize run duration to mitigate timing attacks:
    //sleep until the next 250 ms boundary before returning
    checkPasswordWrapper(...) {
        result = checkPassword(...)
        until (getMilliseconds() % 250 == 0) {
            sleep(1) //Needs optimization
        }
        return result
    }


You can gate the completion of each hash lookup with an absolute-sleep, e.g. using the POSIX function clock_nanosleep with a flags value of TIMER_ABSTIME.

Before the hash lookup, calculate a set time into the future, say 300 ms. Then do the hash lookup. Then wait until the predetermined time and return the result.

Waking up a thread at a specific time is just real-time programming 101.
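A sketch of that gating, assuming the lookup always finishes within the 300 ms budget (do_lookup() is a placeholder):

    #include <time.h>

    extern int do_lookup(void);    /* the actual hash lookup, provided elsewhere */

    int timed_lookup(void) {
        struct timespec deadline;
        clock_gettime(CLOCK_MONOTONIC, &deadline);
        deadline.tv_nsec += 300 * 1000000L;           /* 300 ms into the future */
        if (deadline.tv_nsec >= 1000000000L) {
            deadline.tv_sec  += 1;
            deadline.tv_nsec -= 1000000000L;
        }

        int result = do_lookup();

        /* Sleep until the predetermined absolute time, however long the lookup took. */
        while (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &deadline, NULL) != 0)
            ;                                         /* retry if interrupted */
        return result;
    }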

Another idea: if you have some context information that lets you identify that queries are coming from the same entity (same IP address, same tty, same operating system user, whatever), you can impose a shadow ban after 10 unsuccessful tries: fail even if there is a hash "hit", and randomize the times to feed the attacker junk timing data.


You have to ensure the attacker can't measure temperature-induced clock drift, because it reveals how much work the CPU is doing to surprising accuracy. You can probably introduce extra busy-work, or farm the work out to satellite servers whose timing jitter can't be observed at the microsecond level.

You can also run your lookup against uncacheable memory, so that the attacker can't detect hot keys. There's still some memory timing analysis available because of DRAM banking, though.

Security is a process :-)


Or you could put a heater with temperature control on the CPU. Oh look, just take that one out of the soldering iron ... :) No wait, sync the clock to a remote external source.


One approach is to use a hash table where each entry has a block of values, not a linear list. The entire block is always tested linearly, even though most of the values are null. Block overflow means an expensive operation to increase the size of the hash table or the blocks, but that's an insertion-time issue, not a lookup-time one.

There's still the problem of cache noise.
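A sketch of that bucket layout, with made-up sizes; every slot is examined on every lookup, whether or not it is occupied.

    #define BLOCK 8     /* fixed bucket capacity (assumption) */
    #define NKEY  20    /* 160-bit keys */

    struct bucket { unsigned char keys[BLOCK][NKEY]; unsigned char used[BLOCK]; };

    /* Returns the matching slot index, or -1; the scan never exits early. */
    int bucket_lookup(const struct bucket *b, const unsigned char *key) {
        int found = -1;
        for (int s = 0; s < BLOCK; s++) {
            unsigned diff = !b->used[s];            /* empty slots never match */
            for (int i = 0; i < NKEY; i++)
                diff |= b->keys[s][i] ^ key[i];
            int mask = -(diff == 0);                /* all-ones on a match */
            found = (s & mask) | (found & ~mask);   /* branch-free select */
        }
        return found;
    }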


So, hash tables don't really work like this. There's no reason for the "FOOBAR" password to be stored close to "FOOBAZ" if you pick a well-behaved hash function. You might get to know how many passwords are in the same hash bucket, but that's it. Furthermore, the authentication will be accessed over a network. The network latency (and its variance) is several orders of magnitude larger than the possible variations. It would be seriously hard to measure something like this with the resolution needed (nanoseconds). Even if you're taking samples and doing an average-based guess, you'd need an absurd number of samples (so a brute-force attack on passwords would be more feasible?).

Edit: Actually, even if by coincidence you did get FOOBAZ in the same bucket as FOOBAR, you wouldn't easily be able to distinguish a single comparison matching 5 characters from multiple comparisons each matching a smaller number of characters. Also, if you're getting so many collisions that this becomes a problem, you've sized your hash table incorrectly.


For the problem mentioned in the post (determining whether a given password is valid), how about using a Trie[0]? Wouldn't this work perfectly in this situation?

[0]: http://en.wikipedia.org/wiki/Trie


The author's point about memory access being non-constant still holds for a trie. Nodes that have been recently accessed will still be in cache and will load faster.

The quest for a sublinear algorithm is futile in the face of caching. The solution is similar to the solution to branch-based timing attacks: don't branch, in code or in data. Every comparison must follow the same code path and the same memory path, and that means reading every element of the data structure every time you search.


I'm embarrassed to admit that I thought "timing attacks" had something to do with race condition bugs. ...Crap, I've got some reading to catch up on.


Or, ...add sleep(rand(..)) to each lookup, regardless of result.


This is suggested every time timing attacks are discussed. This is not a good mitigation. It increases the number of requests required to complete a timing attack, but in the end all of your rand() calls average out and you still see timing differences.


OK, then why not have the sensitive operation always take 100ms? If it finishes early, just sleep until the 100ms mark.


And in case it takes more than 100ms or a widely variable amount of time: https://news.ycombinator.com/item?id=8691076


That won't help in general, since the noise this method adds can be filtered out over many measurements. And that's without even taking into account that the sleep would have to be at least of the same order of magnitude as the timing difference you want to hide.


Not if the random variable's distribution changed over time. Calculate a new mean and variance every now and then, based on some hash of the current time or whatever you want, maybe the view count of Gangnam Style videos :-). Joking, but that would work.



