Hacker News new | past | comments | ask | show | jobs | submit login
OpenSSL DSA key recovery attack (iacr.org)
184 points by bqe on June 11, 2016 | hide | past | web | favorite | 17 comments

Make Sure DSA Signing Exponentiations Really are Constant-Time

"...the OpenSSL team committed two code changes relevant to this work. The first adds a “constant-time” implementation of modular exponentiation..."

"The execution time of the “constant-time” implementation still depends on the bit length of the exponent, which in the case of DSA should be kept secret [12, 15, 27]. The second commit aims to “make sure DSA signing exponentiations really are constant-time” by ensuring that the bit length of the exponent is fixed."

"...While the procedure in this commit ensures that the bit length of the sum kq is fixed, unfortunately it introduces a software defect. The function BN_copy is not designed to propagate flags from the source to the destination. In fact, OpenSSL exposes a distinct API BN_with_flags for that functionality..."

"In contrast, with BN_copy the BN_FLG_CONSTTIME flag does not propagate to kq. Consequently, the sum is not treated as secret, reverting the change made in the first commit..."

Exploitation requires a local 'spy' process recording timing signals while the handshakes are running. I assume this is an unprivileged process, otherwise wouldn't the key be directly accessible?

Yes, the posited attacker is an unprivileged spy process sharing a cache with the victim. The FLUSH+RELOAD cache-timing algorithm they use relies on a shared cache. See section 2.2.

Of course, there may be other ways to extract the same data remotely. Bernstein's earlier paper[1] demonstrating cache-timing attacks on AES over the network is an example. He sent many packets of different sizes to evict different lines from cache. Compared to FLUSH+RELOAD, Bernstein's technique is extremely low-resolution; I don't believe anyone has ever demonstrated it against a typical, real-world server configuration.

[1] https://cr.yp.to/antiforgery/cachetiming-20050414.pdf

From section 5.2, Attacking SSH (emphasis mine):

"...our client saves the hash of the concatenated string and the digital signature raw bytes sent from the server. All subsequent messages, including SSH_MSG_NEWKEYS and any client responses, are not required by our attack. Our client therefore drops the connection at this stage, and repeats this process several hundred times to build up a set of distinct trace, digital signature, and digest tuples. See Section 6 for our explicit attack parameters. Figure 3 is a typical signal extracted by our spy program in parallel to the handshake between our client and the victim SSH server."

This inclines me to believe that an attack can be executed via connections to a vulnerable SSH server, not from a local process on the SSH server.

This is the only point where I wish the paper was more clear...

"Similar to Section 5.1, we wrote a custom SSH client that launches our spy program, the spy program collects the timing signals during the handshake. At the same time it performs an SSH handshake where the protocol messages and the digital signature are collected for our attack."

I think that 'spy process' must be local to observe the low-level cache timings they are using to perform the attack, but I haven't found anywhere they come out and say that explicitly.

Exactly. The paper is frustratingly unclear on what the exploit scenario is.

"...we wrote a custom SSH client that launches our spy program, the spy program collects the timing signals during the handshake..."

I have a bad feeling that this means an attacker with a custom client can extract private key information over the network by repeatedly establishing connections with a vulnerable server.

I believe they're talking about a test harness they built, to synchronize collection of cache traces with the execution of the handshake. FLUSH+RELOAD requires a co-resident attacker (they need to be on the same hardware, though not necessarily the same VM).

Random network delays would probably hide any meaningful timing signal in the case of remote connections, thus they used a local process.

One might expect this to be the default case, but it may not be so, given a large enough sample size.

Found further discussion here: http://seclists.org/oss-sec/2016/q2/514

One would presumably not be seeding keys with data pulled from network delays. Too predictable.

Oh, and in case you're wondering why was the constant time implementation put behind a flag anyway, the paper explains it's because the constant-time algorithm is slower than the alternate implementation, so constant-time is only supposed to be used when it's actually needed to protect a secret. Of course that opens the potential for not using constant-time when you were supposed to, which is exactly what happened here.

I'm not sure how it was missed that the constant-time code path wasn't actually being used in this case. If you put your security-critical DSA bugfix behind a flag, wouldn't you check the flag actually propagated to the internal function for DSA operations?

What I don't get is why the _default_ isn't made the safe choice, with the more performant option being available for when it actually matters?

I think this would have been a much smarter way to code it! Default slow and safe, require an added flag for more performance. That would make it much harder to end up on the timing-sensitive / high performance implementation by mistake.

Probably the trap was since the constant-time algorithm was the new code, "backward compatibility" would suggest adding a flag to select the new algorithm.

You might be misinterpreting the code. Not all algorithms are vulnerable to this attack, so OpenSSL might be picking the faster version for those specific algorithms, and the constant time version for the others.

Needs to listen to only a few hundred handshakes. Ugh. Something tells me this could be deployed on AWS large instances with some success.

As others have mentioned, the attack requires a local process that has a shared cache, so it would also require a server that has a remote code execution vulnerability.

My memory is that there are ways to punch through virtualization that AWS uses and get insight into cache hits/misses of other tenants, but I'm not sure if that applies here.

In that world, you just rent some time on a large AWS instance and sit around mucking with the cache timing waiting for another tenant to do some cryptography.

I can't find the link to the paper now, but my thoughts went to combining these two immediately -- you can get a local process without remote code execution if you just log onto your shiny new multitenant virtual server.

Registration is open for Startup School 2019. Classes start July 22nd.

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact