
There are no good constant-time data structures - dmit
https://wingolog.org/archives/2014/12/02/there-are-no-good-constant-time-data-structures
======
tptacek
I feel like I must be missing something here.

You do not need constant-time comparisons of password hashes; to see why,
reason through how you'd attack them.

An attack against an HMAC authenticator provides full, incremental control
over the hash. No such control exists for a password hash; you can't step from
"AAAAAA" to "AAAAAB" without a devastating break in the hash function itself.
The capability to predict the input that generates "AAAAAB" implies a total
failure of preimage resistance.

You _can_ end up in a situation where timing leaks are relevant to password
authentication. To do that, you need to design your own password storage
system, and it needs to be badly flawed; for instance, you could literally use
a general-purpose fast lookup hash and a chaining hash table. This is yet
another reason to simply use bcrypt or scrypt to store passwords; doing so
takes timing off the table.

In reality, though, you could just use a salted SHA1 hash (don't, though) and
still be safe from this attack.

~~~
jsnell
I don't think the post is about passwords in that sense, but of some kind of
authorization tokens handed out by the server after authentication. E.g. in
web frameworks an opaque session cookie that's used to index into server side
session storage doesn't seem like a horribly exotic design choice.

~~~
tedunangst
The solution is the same. The server does sha256(token) and then looks that up
in its magic table.
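Roughly like this, in C (with FNV-1a standing in for SHA-256 — not cryptographic, just to show the shape — and a made-up fixed-size table):

```c
#include <stdint.h>

/* Toy stand-in for SHA-256 (FNV-1a is NOT preimage-resistant; a real
   system would use an actual cryptographic hash here). */
static uint64_t digest(const char *s) {
    uint64_t h = 1469598103934665603ULL;
    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 1099511628211ULL;
    }
    return h;
}

/* Hypothetical session table: keyed by the digest, never by the raw token. */
#define TABLE_SIZE 1024
static uint64_t table_keys[TABLE_SIZE];   /* 0 = empty slot */

void store_token(const char *token) {
    uint64_t d = digest(token);
    table_keys[d % TABLE_SIZE] = d;
}

int token_present(const char *token) {
    uint64_t d = digest(token);              /* hash the input first...   */
    return table_keys[d % TABLE_SIZE] == d;  /* ...then index the table   */
}
```

Because the attacker only controls the preimage of the lookup key, stepping the token byte-by-byte buys them nothing.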

~~~
tptacek
In fairness, almost nobody actually does this; they directly index the token
itself.

------
thaumaturgy
I don't understand: why not write the function to take a predetermined (or
even slightly random) amount of time to run, and have it just sleep at the end
of execution until the timer's up?

~~~
norswap
Came here to say this. If that's deemed too heavyweight, you can instead add
the delay as a random amount of computation overhead (say, a loop that does
nothing but doesn't get optimized away). The trick is to make the delay large
enough that after running a password with a wrong first character and a
password with a wrong last character X times each, you can't tell the
difference (or you'd need a zillion tries -- at which point you may as well
brute-force).

If you did not have the requirement that passwords can be arbitrarily long
(which seems like a bad idea anyway), you could just pad the password, then
compare it character by character, never stopping until the end, updating at
each step a boolean that records whether the strings differ.
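Roughly, in C (assuming a fixed padded length; names are made up):

```c
#include <stddef.h>

#define PW_LEN 64  /* assumed fixed padded length */

/* Compare two padded passwords without ever stopping early: every byte is
   inspected, and a single accumulator replaces the "do they differ yet"
   branch. Returns 1 if equal, 0 otherwise. */
int password_equal(const unsigned char a[PW_LEN], const unsigned char b[PW_LEN]) {
    unsigned char diff = 0;
    for (size_t i = 0; i < PW_LEN; i++)
        diff |= a[i] ^ b[i];
    return diff == 0;
}
```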

Also, this might not be a concern on the web where there's already quite a bit
of variability in the response time.

~~~
stouset
Adding random delays does not remove signal; it only adds noise. Noise can be
removed by simply performing more trials.

~~~
Narishma
The delays aren't random. They last just enough so that the function always
takes the same amount of time, presumably the worst-case amount of time.

~~~
tedunangst
> presumably the worst-case amount of time.

Determining the worst-case amount of time is far from trivial. What's the
slowest time it takes lookup-table-based AES to decrypt a block?

The only answer to that question is to measure it a bunch, then pick some
percentile and hope your margin of safety is sufficient. Or you could use a
constant time algorithm to start and rely less on hope.

~~~
eps
It _is_ actually quite trivial.

You simply align the function's running time to the worst case _so far_.
Start with 0, put it through a warm-up cycle to set a reasonable default,
cache the value between the app's sessions, and re-initialize it when the
setup changes.
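Sketched out (assuming the elapsed time of the real check is measured with a monotonic clock and passed in as nanoseconds; `pad_to_worst` and its persistence are invented for illustration):

```c
#define _POSIX_C_SOURCE 199309L
#include <time.h>

/* Worst case observed so far; seeded by a warm-up cycle and cached across
   app sessions in practice. */
static long long worst_ns = 0;

/* Call with how long the real check took; sleeps out the difference and
   returns the current worst case. */
long long pad_to_worst(long long took_ns) {
    if (took_ns > worst_ns) {
        worst_ns = took_ns;                 /* new worst case: nothing to pad */
    } else {
        long long remain = worst_ns - took_ns;
        struct timespec pad;
        pad.tv_sec  = (time_t)(remain / 1000000000LL);
        pad.tv_nsec = (long)(remain % 1000000000LL);
        nanosleep(&pad, NULL);              /* align to the worst case so far */
    }
    return worst_ns;
}
```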

------
TheLoneWolfling
> a key-value map on common hardware that runs in constant time and is
> sublinear in the number of entries in the map

Counterexample: Cuckoo hashmap, at least for lookup.

Calculate both hashes, index the table with each, and check both slots to see
if either matches.

The only timing information this leaks is the length of the input string, and
that an adversary already knows.

Now, there will be some cache information potentially leaked, although there
are ways around that too. But this is constant time disregarding cache, and is
sublinear.
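A rough sketch of such a lookup (toy multiplicative hashes, a simplified insert with no eviction loop, and 0 reserved as the empty slot; the `==` comparisons compile branch-free on typical x86, though a hardened version would mask there too):

```c
#include <stdint.h>

#define SLOTS 1024  /* assumed table size (power of two) */

/* Two independent toy hashes; a real cuckoo table would use stronger,
   keyed hash functions. Each yields a 10-bit slot index. */
static uint32_t h1(uint64_t k) { return (uint32_t)((k * 0x9E3779B97F4A7C15ULL) >> 54); }
static uint32_t h2(uint64_t k) { return (uint32_t)((k * 0xC2B2AE3D27D4EB4FULL) >> 54); }

static uint64_t slot[SLOTS];  /* 0 = empty */

/* Both candidate slots are always read and compared; no data-dependent
   branch, no chain to walk. */
int cuckoo_contains(uint64_t key) {
    uint64_t a = slot[h1(key)];
    uint64_t b = slot[h2(key)];
    return (a == key) | (b == key);
}

/* Simplified: a real insert relocates the displaced key (the cuckoo step). */
void cuckoo_insert(uint64_t key) {
    if (slot[h1(key)] == 0) slot[h1(key)] = key;
    else slot[h2(key)] = key;
}
```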

As for the question of "a nice constant-time comparison algorithm for (say)
160-bit values", it's relatively easy. Do a pairwise comparison of constant-
size chunks of the value, mapping <, =, and > to 01, 10, and 11, respectively.
Then recurse as necessary. Of course, this assumes you have a constant-time
comparison function for a single chunk.

~~~
Retric
You're likely to leak a fair amount of information simply from cache misses,
but also from things like branch prediction. Constant-time operation on x86
is surprisingly hard.

~~~
TheLoneWolfling
A cuckoo hashmap doesn't have any data-dependent branches, assuming the hash
function itself doesn't either.

And as for cache misses - what exactly is a cache miss going to tell you in
this case? All it'll tell you is that an input with a hash close to either
hash of your input wasn't accessed recently. That, as far as I can tell,
doesn't leak anything, assuming your passwords are salted.

------
pbsd
> One problem is, I don't know of a nice constant-time comparison algorithm
> for (say) 160-bit values. [...] I would appreciate any pointers to such a
> constant-time less-than algorithm.

The following should be constant-time relative to the contents of its
inputs:

    
    
        // return -1 if x < y, else 0
        static unsigned lt(unsigned x, unsigned y) {
          return -((x - y) >> (sizeof(x)*8-1));
        }
    
        // return -1 if x != y, else 0
        static unsigned ne(unsigned x, unsigned y) {
          const unsigned d = (x - y) | (y - x);
          return -(d >> (sizeof(d)*8-1));
        }
    
        // return 1 if x[0..n-1] is lexicographically less than y[0..n-1], else 0
        int less_than(const unsigned char * x, const unsigned char * y, size_t n) {
          unsigned flag = 0;
          unsigned done = 0;
          for(size_t i = 0; i < n; ++i) {
            flag |= lt(x[i], y[i]) & ~done;
            done |= ne(x[i], y[i]);
          }
          return flag & 1;
        }

~~~
tedunangst
This is very similar to the algorithm OpenBSD uses in timingsafe_memcmp().

[http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/lib/libc/string/timingsafe_memcmp.c?rev=1.1&content-type=text/x-cvsweb-markup](http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/lib/libc/string/timingsafe_memcmp.c?rev=1.1&content-type=text/x-cvsweb-markup)

~~~
pbsd
Indeed. The two main differences are that I didn't care about returning {-1,
0, 1} to emulate `memcmp`, and that I don't rely on implementation-defined
behavior (signed shifts being arithmetic).

------
daveloyall

        // quantize run duration to mitigate timing attacks
        checkPasswordWrapper(...) {
            result = checkPassword(...)
            until (getMilliseconds() % 250 == 0) {
                sleep(1) // needs optimization
            }
            return result
        }

------
kazinator
You can gate the completion of each hash lookup with an absolute-sleep, e.g.
using the POSIX function clock_nanosleep with a flags value of TIMER_ABSTIME.

Before the hash lookup, calculate a set time into the future, say 300 ms. Then
do the hash lookup. Then wait until the predetermined time and return the
result.
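A sketch, for POSIX/Linux (`demo_lookup` is a stand-in for the real work):

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Stand-in for the real hash lookup (hypothetical). */
static int demo_lookup(void) { return 1; }

/* Compute the deadline before the work, do the work, then sleep until the
   absolute deadline. TIMER_ABSTIME means "wake at this instant", not "sleep
   this long", so the total elapsed time doesn't depend on how long the
   lookup itself took. */
int gated_lookup(int (*lookup)(void)) {
    struct timespec deadline;
    clock_gettime(CLOCK_MONOTONIC, &deadline);
    deadline.tv_nsec += 300L * 1000000L;   /* 300 ms into the future */
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    int result = lookup();                 /* the variable-time work */

    clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &deadline, NULL);
    return result;
}
```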

Waking up a thread at a specific time is just real-time programming 101.

Another idea: if you have some context information that lets you identify that
queries are coming from the same entity (same IP address, same tty, same
operating system user, whatever), you can impose a shadow ban after 10
unsuccessful tries: fail even if there is a hash "hit", and randomize the
times to feed the attacker junk timing data.

~~~
kabdib
You have to ensure the attacker can't measure temperature-induced clock
drift, because from it the amount of work the CPU is doing can still be
measured with surprising accuracy. You can probably introduce extra busy-
work, or farm the work out to satellite servers whose timing jitter can't be
observed at the microsecond level.

You can also run your lookup against uncacheable memory, so that the attacker
can't detect hot keys. There's still some memory timing analysis available
because of DRAM banking, though.

Security is a process :-)

~~~
kazinator
Or you could put a heater with temperature control on the CPU. Oh look, just
take that one out of the soldering iron ... :) No wait, sync the clock to a
remote external source.

------
Animats
One approach is to use a hash table where each entry has a block of values,
not a linear list. The entire block is always tested linearly, even though
most of the values are null. Block overflow means an expensive operation to
increase the size of the hash table or the blocks, but that's an insertion-
time issue, not a lookup-time one.
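A sketch of that layout (geometry and names invented):

```c
#include <stdint.h>

#define NBUCKETS 256   /* assumed table geometry */
#define BLOCK    8     /* fixed block of slots per bucket */

static uint64_t buckets[NBUCKETS][BLOCK];  /* 0 = empty slot */

/* Lookup always scans the entire block, match or not, so its timing
   doesn't depend on where (or whether) the key sits in the block. */
int block_contains(uint64_t key) {
    const uint64_t *blk = buckets[key % NBUCKETS];
    int found = 0;
    for (int i = 0; i < BLOCK; i++)
        found |= (blk[i] == key);
    return found;
}

/* Overflow is the expensive insertion-time path: a real version would
   grow the table or the blocks here. */
int block_insert(uint64_t key) {
    uint64_t *blk = buckets[key % NBUCKETS];
    for (int i = 0; i < BLOCK; i++)
        if (blk[i] == 0) { blk[i] = key; return 1; }
    return 0;  /* block overflow */
}
```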

There's still the problem of cache noise.

------
TheCoreh
So, hash tables don't really work like this. There's no reason for the
"FOOBAR" password to be stored close to "FOOBAZ", if you pick a well behaved
hash function. You might get to know how many passwords are on the same hash
bucket, but that's it. Furthermore, the authentication will be accessed over
a network. The network latency (and its variance) is several orders of
magnitude larger than the possible variations. It would be seriously hard to
measure something like this at the resolution needed (nanoseconds). Even if
you're taking samples and doing an average-based guess, you'd need an absurd
number of samples (so a brute-force attack on passwords would be more
feasible?)

Edit: Actually, even if by coincidence you did get FOOBAZ in the same bucket
as FOOBAR, you wouldn't easily be able to distinguish a single comparison
with 5 equal characters from multiple comparisons each with a smaller number
of equal characters. Also, if you're getting so many collisions that this is
a problem, you've sized your hash table incorrectly.

------
sjm
For the problem mentioned in the post (determining whether a given password is
valid), how about using a Trie[0]? Wouldn't this work perfectly in this
situation?

[0]: [http://en.wikipedia.org/wiki/Trie](http://en.wikipedia.org/wiki/Trie)

~~~
salmonellaeater
The author's point about memory access being non-constant still holds for a
trie. Nodes that have been recently accessed will still be in cache and will
load faster.

The quest for a sublinear algorithm is futile in the face of caching. The
solution is similar to the solution to branch-based timing attacks: don't
branch, in code or in data. Every comparison must follow the same code path
and the same memory path, and that means reading every element of the data
structure every time you search.
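That full-scan search might look like this (a toy sketch; it assumes 0 is never a stored value, and uses unsigned arithmetic to avoid implementation-defined signed shifts):

```c
#include <stdint.h>
#include <stddef.h>

/* Branch-free full scan: every entry is read on every search, and the
   matching value is selected with a mask rather than a branch. Returns 0
   when the key is absent. */
uint64_t scan_lookup(const uint64_t *keys, const uint64_t *vals,
                     size_t n, uint64_t key) {
    uint64_t result = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t d = keys[i] ^ key;
        uint64_t mask = ((d | (0 - d)) >> 63) - 1;  /* all-ones iff d == 0 */
        result |= vals[i] & mask;
    }
    return result;
}
```

Linear in the number of entries, of course, which is exactly the cost being argued for.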

------
daveloyall
I'm embarrassed to admit that I thought "timing attacks" had something to do
with race condition bugs. ...Crap, I've got some reading to catch up on.

------
masonium
Or, ...add sleep(rand(..)) to each lookup, regardless of result.

~~~
NeutronBoy
This is suggested every time timing attacks are discussed. This is not a good
mitigation. It increases the number of requests required to complete a timing
attack, but in the end all of your rand() calls average out and you still see
timing differences.

~~~
cousin_it
OK, then why not have the sensitive operation always take 100ms? If it
finishes early, just sleep until the 100ms mark.

~~~
daveloyall
And in case it takes more than 100ms or a widely variable amount of time:
[https://news.ycombinator.com/item?id=8691076](https://news.ycombinator.com/item?id=8691076)

