"...the OpenSSL team committed two code changes relevant to this work. The first adds a “constant-time” implementation of modular exponentiation..."
"The execution time of the “constant-time” implementation still depends on the bit length of the exponent, which in the case of DSA should be kept secret [12, 15, 27]. The second commit aims to “make sure DSA signing exponentiations really are constant-time” by ensuring that the bit length of the exponent is fixed."
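This is a rough toy sketch in Python of why bit length still leaks, and what the second commit's fix does. It is an illustration only, not OpenSSL code. A Montgomery ladder stands in for OpenSSL's BN_mod_exp_mont_consttime. It does the same work for every exponent bit, so the only data-dependent quantity left is the number of loop iterations, which is the exponent's bit length. Adding q (or 2q) to the nonce k leaves g^k mod p unchanged, because g has order q, and it pins the exponent at q.bit_length() + 1 bits. The group p = 23, q = 11, g = 2 and the function names are hypothetical, chosen only to keep the numbers small.

```python
def ladder_pow(g: int, e: int, m: int) -> tuple[int, int]:
    """Montgomery-ladder modular exponentiation (toy, not side-channel safe).

    Each exponent bit costs one multiply and one square regardless of its
    value, so the loop count (= e.bit_length()) is the remaining leak.
    Returns (g**e mod m, number of loop iterations).
    """
    r0, r1 = 1, g % m  # invariant: r1 == r0 * g (mod m)
    iterations = 0
    for i in reversed(range(e.bit_length())):
        iterations += 1
        if (e >> i) & 1:
            r0, r1 = r0 * r1 % m, r1 * r1 % m
        else:
            r0, r1 = r0 * r0 % m, r0 * r1 % m
    return r0, iterations


def pad_exponent(k: int, q: int) -> int:
    """Return k + q or k + 2q, whichever has exactly q.bit_length() + 1 bits.

    For 0 < k < q, g^(k+q) == g^(k+2q) == g^k mod p when g has order q.
    """
    kq = k + q
    if kq.bit_length() <= q.bit_length():
        kq += q
    return kq


# Hypothetical toy group: 2 has multiplicative order 11 modulo 23.
P, Q, G = 23, 11, 2

for k in range(1, Q):
    raw, raw_iters = ladder_pow(G, k, P)
    padded, pad_iters = ladder_pow(G, pad_exponent(k, Q), P)
    print(f"k={k:2d} raw_iters={raw_iters} padded_iters={pad_iters} same={raw == padded}")
```

With the raw nonce, the iteration count varies with k. With the padded exponent, it is the same for every k, and the result does not change.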
"...While the procedure in this commit ensures that the bit length of the sum kq is fixed, unfortunately it introduces a software defect. The function BN_copy is not designed to propagate flags from the source to the destination. In fact, OpenSSL exposes a distinct API BN_with_flags for that functionality..."
"In contrast, with BN_copy the BN_FLG_CONSTTIME flag does not propagate to kq. Consequently, the sum is not treated as secret, reverting the change made in the first commit..."
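Here is a hedged Python model of the defect described in the quote. ToyBN, toy_copy, toy_add, mod_exp and the CONSTTIME constant are all hypothetical stand-ins, not OpenSSL's API. The only behaviour taken from the quote is that the copy carries the value but not the flag, and that the constant-time path is selected by that flag. In this toy, addition leaves the destination's flags alone, so kq ends up unflagged and the dispatcher silently picks the variable-time branch. Re-marking the copy restores the intended path. In OpenSSL terms, that would be something like BN_set_flags(kq, BN_FLG_CONSTTIME).

```python
from dataclasses import dataclass

CONSTTIME = 0x04  # hypothetical stand-in for BN_FLG_CONSTTIME


@dataclass
class ToyBN:
    """Hypothetical model of a BIGNUM: a value plus a flags word."""
    value: int
    flags: int = 0


def toy_copy(dst: ToyBN, src: ToyBN) -> ToyBN:
    """Models BN_copy as the quote describes it: value copied, flags not."""
    dst.value = src.value
    return dst


def toy_add(r: ToyBN, a: ToyBN, b: ToyBN) -> ToyBN:
    """Toy addition: only the value changes; r keeps its own flags."""
    r.value = a.value + b.value
    return r


def mod_exp(base: int, exp: ToyBN, mod: int) -> tuple[int, str]:
    """Flag-gated dispatch: only flagged exponents take the constant-time
    branch. Returns the result and the name of the path taken."""
    path = "consttime" if exp.flags & CONSTTIME else "variable-time"
    return pow(base, exp.value, mod), path


k = ToyBN(7, flags=CONSTTIME)  # secret nonce, correctly marked
q = ToyBN(11)

kq = toy_copy(ToyBN(0), k)     # flag silently dropped here
toy_add(kq, kq, q)             # kq = k + q, still unflagged
print(mod_exp(2, kq, 23))      # takes the "variable-time" path

kq.flags |= CONSTTIME          # explicitly re-mark the sum as secret
print(mod_exp(2, kq, 23))      # now takes the "consttime" path
```

This also illustrates the complaint further down the thread. When the safe path sits behind a flag, a test that checks which path the flagged exponent actually takes would catch this.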
Exploitation requires a local 'spy' process recording timing signals while the handshakes are running. I assume this is an unprivileged process; otherwise, wouldn't the key be directly accessible?
Of course, there may be other ways to extract the same data remotely. Bernstein's earlier paper demonstrating cache-timing attacks on AES over the network is an example. He sent many packets of different sizes to evict different lines from cache. Compared to FLUSH+RELOAD, Bernstein's technique is extremely low-resolution; I don't believe anyone has ever demonstrated it against a typical, real-world server configuration.
"...our client saves the hash of the concatenated string and the digital signature raw bytes sent from the server. All subsequent messages, including SSH_MSG_NEWKEYS and any client responses, are not required by our attack. Our client therefore drops the connection at this stage, and repeats this process several hundred times to build up a set of distinct trace, digital signature, and digest tuples. See Section 6 for our explicit attack parameters. Figure 3 is a typical signal extracted by our spy program in parallel to the handshake between our client and the victim..."
This inclines me to believe that an attack can be executed via connections to a vulnerable SSH server, not from a local process on the SSH server.
"Similar to Section 5.1, we wrote a custom SSH client that launches our spy program, the spy program collects the timing signals during the handshake. At the same time it performs an SSH handshake where the protocol messages and the digital signature are collected for our attack."
I think that 'spy process' must be local to observe the low-level cache timings they are using to perform the attack, but I haven't found anywhere they come out and say that explicitly.
"...we wrote a custom SSH client that launches our spy program, the spy program collects the timing signals during the handshake..."
I have a bad feeling that this means an attacker with a custom client can extract private key information over the network by repeatedly establishing connections with a vulnerable server.
I'm not sure how it was missed that the constant-time code path wasn't actually being used in this case. If you put your security-critical DSA bugfix behind a flag, wouldn't you check the flag actually propagated to the internal function for DSA operations?
Probably the trap was that, since the constant-time algorithm was the new code, "backward compatibility" suggested adding a flag to select the new algorithm.
In that world, you just rent some time on a large AWS instance and sit around mucking with the cache timing waiting for another tenant to do some cryptography.
I can't find the link to the paper now, but my thoughts immediately went to combining these two: you can get a local process without remote code execution if you just log onto your shiny new multitenant virtual server.