

NIST may not have you in mind - dmit
http://www.imperialviolet.org/2012/10/21/nist.html

======
blazingice
This article focuses pretty heavily on the possibility of cache timing attacks
against AES, and cites djb's original work along with Tromer/Osvik's
publication in 2005.
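
(For anyone who hasn't seen these attacks: the leak comes from table-driven
AES implementations, which index large lookup tables with secret-derived
bytes. A schematic of one round column in the style of OpenSSL's T-tables --
the names Te0..Te3 are just the conventional ones, not code from our paper:)

    #include <stdint.h>

    extern const uint32_t Te0[256], Te1[256], Te2[256], Te3[256]; /* 4KiB total */

    /* One column of a table-driven AES round (schematic). Which cache
       lines the Te tables touch depends on the secret state bytes, so
       memory access timing leaks key-dependent information. */
    uint32_t round_column(uint32_t s0, uint32_t s1, uint32_t s2,
                          uint32_t s3, uint32_t rk)
    {
        return Te0[(s0 >> 24) & 0xff]   /* secret-dependent index */
             ^ Te1[(s1 >> 16) & 0xff]
             ^ Te2[(s2 >>  8) & 0xff]
             ^ Te3[ s3        & 0xff]
             ^ rk;                      /* round key word */
    }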

Last week at CCSW, we published a paper[1] detailing our attempts to bring
these attacks to bear against Chromium.

In short, we don't see AES cache timing attacks as possible on more recent
processors, especially once you factor in the sheer size of modern
application code.

[1] <http://cseweb.ucsd.edu/~kmowery/papers/aes-cache-timing.pdf>

~~~
tptacek
This is very cool, thanks for posting.

DJB's attacks were from a remote attacker's vantage point. But your paper also
takes on Osvik and Tromer, who used "spy processes" to continuously probe
local caches to create traces that could be analyzed for key information. I
know your paper mentions branch prediction and says you don't have results for
it, but what's your take on whether Aciicmez's BTB attack is going to remain
viable?

I thought the BTB attack was the cleverest and most disquieting of the bunch,
in that it suggested that we don't even know enough about the x86-64
microarchitecture to predict what side channel vulnerabilities we might have
in software AES.

Regarding the paper itself: the most provocative claim it makes is that we're
trending towards "complete mitigation" of cache side channel attacks. You give
two reasons: AES-NI and multicore systems.

The AES-NI argument seems compelling but a little obvious, in the same sense
as one could have argued that peripherals that offloaded AES would also blunt
attacks against software AES. AES-NI blurs the line between software and
hardware AES, but it's still a hardware implementation.

Another point that could be made here is that AES-NI only mitigates
cache-timing attacks against systems that use _AES_. It doesn't do much good
if you can't use AES, since the most popular block ciphers that compete with
AES are also implemented with table lookups.

I found the multicore argument a lot less compelling, since it relied in part
on the notion that attackers wouldn't easily be able to predict the cache
behavior of their target multicore systems. It seems to me that the most
likely environment in which cache timing attacks are going to be a factor on
the Internet is shared hosting environments, in which attackers with the
sophistication to time AES are easily going to be able to get a bead on
exactly what hardware and software they're aiming at. Most users of AES are
also using off-the-shelf hardware and software.

~~~
blazingice
Aciicmez's BTB attack looks at the branch predictor, and is potentially valid
against any implementation which branches based on sensitive data. There's a
whole class of these attacks which look at instruction paths, including a new
one by Zhang et al. against ElGamal at CCS this year, but they usually target
asymmetric ciphers. In particular, since AES doesn't have key-dependent
branching, these attacks don't apply.
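
(To make the distinction concrete: instruction-path attacks target code like
textbook square-and-multiply modular exponentiation, where the branch pattern
_is_ the key. A sketch, with modmul as a hypothetical modular-multiply
helper:)

    #include <stdint.h>

    uint64_t modmul(uint64_t a, uint64_t b, uint64_t n); /* hypothetical */

    /* Textbook square-and-multiply: the "if" is taken exactly when the
       corresponding secret exponent bit is 1, so the branch predictor's
       state ends up encoding the key -- this is what BTB probing sees.
       AES has no analogous key-dependent branch. */
    uint64_t modexp(uint64_t m, uint64_t d, uint64_t n, int bits)
    {
        uint64_t r = 1;
        for (int i = bits - 1; i >= 0; i--) {
            r = modmul(r, r, n);          /* always square */
            if ((d >> i) & 1)             /* secret-dependent branch */
                r = modmul(r, m, n);
        }
        return r;
    }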

I do agree with you that x86-64 is extremely complicated, and that new attacks
might crop up due to some future optimization.

As for the paper:

Yeah, AES-NI is sort of the final hammer against AES cache timing attacks,
since it doesn't use the cache at all, but I felt that a paper on AES cache
timing would be remiss without mentioning it :)
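
(For the curious, this is what "doesn't use the cache at all" looks like in
practice; a minimal sketch using Intel's AES-NI intrinsics:)

    #include <wmmintrin.h>   /* AES-NI intrinsics; compile with -maes */

    /* One full AES round (SubBytes, ShiftRows, MixColumns, AddRoundKey)
       executed in-register by the aesenc instruction: no table lookups,
       so no data-dependent cache footprint to observe. */
    static __m128i aes_round(__m128i state, __m128i round_key)
    {
        return _mm_aesenc_si128(state, round_key);
    }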

There are two parts to the multicore argument: the first is that it
complicates things massively, and the second is that it can be a complete
mitigation if used properly.

First is the complication bit, and that's just saying that the attacker must
understand almost everything about the multicore implementation, including
multilevel cache behavior and (possibly non-deterministic?) replacement
strategy. I'm willing to believe that, were this the only hurdle, a dedicated
attacker could still succeed. I was looking at a single core machine, so I
didn't have to deal with the complexity here.

For the complete mitigation, you need to rely on platform support for core
pinning. If you're allowed to say "I want to do encryption now, give me my own
core for 400ms", then, since the 4KiB T-tables fit into your core's L2,
attacker threads on other cores just can't examine them during use. This
complicates the VM hosting model and might itself be a decent DoS vector, but
it does completely stop cache probing attacks.
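
(On Linux the pinning primitive already exists; a minimal sketch of pinning
the calling thread to one core before running the table-driven code -- the
core number is arbitrary:)

    #define _GNU_SOURCE
    #include <sched.h>

    /* Pin the calling thread to core 2 so its per-core cache contents
       can't be probed by threads scheduled on other cores while we
       encrypt. */
    int pin_to_core(void)
    {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(2, &set);
        return sched_setaffinity(0, sizeof(set), &set);
    }

Note that this only pins; getting the core truly to yourself additionally
requires keeping other tasks off it (e.g. isolcpus or cpusets), which is the
platform support I mean.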

Finally, as you said, my work can really only apply to AES on the x86 on the
desktop. Change one of these variables (such as AES to ElGamal or RSA or
Blowfish), and side channel attacks might still exist. Such is the problem
with negative results :)

~~~
tptacek
This was fun to read; thanks. It's interesting how side channel attacks can be
both assisted and complicated by new hardware; usually, advances in hardware
tend to favor attackers slightly more than defenders, but even just by pushing
operations below attacker measurement thresholds --- without even trying, that
is --- hardware makes some side channels very hard to exploit.

If you're an HN'er reading along at home, Aciicmez' BTB timing paper (you
should just be able to Google that) is very very very cool. They not only
realized that you could theoretically watch the caches used by the branch
predictor to build a trace from which you could recover RSA keys, but also
came up with a very simple way to profile those branch predictor caches; that
is, they designed a "spy process" like Osvik and Tromer did for memory caches
that targeted the BTB instead.
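
(The memory-cache version of the spy process is easy to sketch; a vastly
simplified Prime+Probe loop on x86, ignoring eviction-set construction and
noise handling:)

    #include <stdint.h>
    #include <x86intrin.h>            /* __rdtsc */

    #define LINES      64
    #define CACHE_LINE 64
    static volatile uint8_t buf[LINES * CACHE_LINE];

    /* Prime: pull our lines into the cache, then let the victim run. */
    static void prime(void)
    {
        for (int i = 0; i < LINES; i++)
            (void)buf[i * CACHE_LINE];
    }

    /* Probe: time each re-read; a slow read means the victim evicted
       that line, revealing which cache sets it touched. */
    static void probe(uint64_t t[LINES])
    {
        for (int i = 0; i < LINES; i++) {
            uint64_t start = __rdtsc();
            (void)buf[i * CACHE_LINE];
            t[i] = __rdtsc() - start;
        }
    }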

------
dfc
I do not understand the jump from the NSA having a history of building systems
from the chip up to reasoning by analogy that the same is true for NIST (the
shared-worldview link is 20 years old). I'm not disagreeing with the
statement; I just do not see any support for the conclusion that NIST's
standard is bad for the general public because, unlike NIST's target
customers, we are not building custom chips.

Can anyone shed any light?

~~~
tptacek
NIST doesn't build systems. It standardizes technology for the US. NIST
standardizes far more than just crypto algorithms, but in the crypto cases,
NSA reviews potential standards before the standard is published, for
suitability to DoD. It is entirely reasonable to propose that NSA pushes NIST
towards standardizing crypto that NSA is in a better position to use than
industry is.

~~~
dfc
I was hoping you would respond, thanks tptacek. In light of your comment, is
it reasonable to assume that NSA is going to supply the custom chips to the
rest of the federal government? Given federal procurement standards it seems
that the majority of federal IT departments rely on industry to provide
hardware. Is it still reasonable to propose that NSA pushes NIST in a
direction that serves NSA's interest at the cost of weakening other
governmental agencies? What is the implementation deadline for federal use of
SHA-3? Is it unreasonable to assume that the standards committee expects a
SHA-3 hardware implementation similar to AES-NI?

On a related note, AES is the NIST standard for protecting sensitive but
unclassified information:

 _"Applicability. This standard may be used by Federal departments and
agencies when an agency determines that sensitive (unclassified) information
(as defined in P. L. 100-235) requires cryptographic protection.

Other FIPS-approved cryptographic algorithms may be used in addition to, or in
lieu of, this standard. Federal agencies or departments that use cryptographic
devices for protecting classified information can use those devices for
protecting sensitive (unclassified) information in lieu of this standard."_
[1]

I have always assumed that this scope limitation within FIPS 197 meant that
NSA, DoD, Secret Army of Northern VA, etc. had different standards and
requirements (NSA Suite A&B) for classified (and up) information. Is this the
case? If so, why would NSA have so much skin in the game if they were not
restricted to FIPS 197 requirements?

[1] <http://csrc.nist.gov/publications/fips/fips197/fips-197.pdf>

~~~
m0nastic
I wouldn't expect the NSA to be involved with sourcing of hardware for the
rest of the federal government (unless the continued fear around supply chain
management completely takes over, which I suppose is possible, as it's
probably the number one concern presently among a lot of agencies).

While they provide a disproportionate amount of guidance to other agencies,
that is generally orthogonal to their primary motivations.

You can certainly make a case that NSA pushes NIST towards making guidelines
for things that they favor, but I don't seriously believe that's at the
expense of weakening other agencies (at least not as a goal).

As to classified and up information, the NSA can't get enough ECC, and many of
the Suite B implementations of other standards are just a version of the
standard that works (and is "certified") to use Elliptic Curve Cryptography
(like TPM Suite B, which we work on).

~~~
dfc
I apologize for not being clearer about NSA's motivations. I did not mean to
imply that there was any malicious intent when it comes to weakening federal
IT standards. (It seems that NSA would be aware of the negative side effects
of their recommendations.)

Could NSA's hardware-centric recommendations be motivated by an interest in
leveraging economies of scale (given the size of federal IT procurement) and
purchasing COTS hardware optimized for AES?

~~~
m0nastic
It's possible (although they're already procuring hardware at GSA-approved
rates, so I'm not sure if there's the same economies of scale that you see in
the commercial realm).

------
dguido
I just checked and all of the computers and devices I own for work have AES
hardware in them (Mac Mini, Macbook Air, iPhone). Maybe NIST thinks that,
through standardization efforts, they can encourage more people to integrate
such hardware over the long term?

The amount of hardware support that AES has already is pretty substantial:
<https://en.wikipedia.org/wiki/AES_instruction_set>

I'd rather not suppose there's something insidious going on here, just that
maybe NIST is taking a longer-term view rather than racing to put AES and
SHA-3 into everything yesterday.

------
pbsd
Käsper and Schwabe's bitsliced AES [1] does not need very long streams to be
fast. It processes 8 blocks simultaneously, not 128 (as a 'pure' bitsliced
approach would), and therefore reaches peak performance at relatively small
lengths, starting at 128 bytes.

[1] <http://cryptojedi.org/papers/aesbs-20090616.pdf>
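
(The arithmetic behind that number: 8 blocks x 16 bytes = 128 bytes, so CTR
mode can hand the core a full batch of counter blocks even for short
messages. Schematically -- encrypt8 and the block layout here are
hypothetical, not Käsper/Schwabe's actual API:)

    #include <stdint.h>
    #include <string.h>

    void encrypt8(const uint32_t *ks, uint8_t blocks[8][16]); /* hypothetical */

    /* Build 8 consecutive CTR-mode counter blocks and encrypt them in
       one call to the bitsliced core: 8 x 16 = 128 bytes of keystream. */
    void ctr_batch(const uint32_t *ks, const uint8_t iv[12], uint32_t ctr,
                   uint8_t blocks[8][16])
    {
        for (int i = 0; i < 8; i++) {
            uint32_t c = ctr + (uint32_t)i;
            memcpy(blocks[i], iv, 12);
            blocks[i][12] = (uint8_t)(c >> 24);   /* big-endian counter */
            blocks[i][13] = (uint8_t)(c >> 16);
            blocks[i][14] = (uint8_t)(c >> 8);
            blocks[i][15] = (uint8_t)c;
        }
        encrypt8(ks, blocks);
    }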

~~~
agl
Looks like I haven't been keeping up to date even with my own colleagues'
work! Thanks, I'll update the post.

------
Tobu
Hardware will evolve. CPUs' design constraints — programs with low parallelism
and not much awareness of the memory hierarchy — have caused a bottleneck.
SHA-3 will end up as yet another specialty instruction, with the actual
programming done by the hardware vendor. For people who don't want to be
dependent on that, I imagine GPUs provide a faster and more flexible
alternative.

------
el_cuadrado
> try Salsa20 rather than AES

Salsa20 is a stream cipher, AES is a block cipher.

It is like saying 'Try GCC rather than Windows 8'.

~~~
dchest
In addition to what others said, there are even fewer differences between them
internally.

Salsa20 <stream cipher> is not a "traditional" stream cipher -- basically it's
Salsa20 <hash function> (not collision resistant) in counter mode. The hash
function itself is implemented as a permutation with a final addition of the
input words to make it irreversible. With this permutation you can build a
block cipher, and by the way that's what SHA-3 finalist BLAKE does by
introducing addition of key words during the rounds (except BLAKE is built on
a variant of Salsa20 called ChaCha).
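
(The structure is easy to see in code; the whole Salsa20 core fits in a
screenful. This follows djb's spec -- 20 rounds of an add-rotate-xor
permutation, then the feed-forward addition that makes it one-way:)

    #include <stdint.h>
    #include <string.h>

    #define R(v, n) (((v) << (n)) | ((v) >> (32 - (n))))

    /* Salsa20 core: 20 rounds of ARX on a 4x4 word state, then the
       final addition of the input that makes it irreversible. */
    void salsa20_core(uint32_t out[16], const uint32_t in[16])
    {
        uint32_t x[16];
        memcpy(x, in, sizeof x);
        for (int i = 0; i < 20; i += 2) {
            /* odd round: quarter-rounds down the columns */
            x[ 4] ^= R(x[ 0]+x[12], 7);  x[ 8] ^= R(x[ 4]+x[ 0], 9);
            x[12] ^= R(x[ 8]+x[ 4],13);  x[ 0] ^= R(x[12]+x[ 8],18);
            x[ 9] ^= R(x[ 5]+x[ 1], 7);  x[13] ^= R(x[ 9]+x[ 5], 9);
            x[ 1] ^= R(x[13]+x[ 9],13);  x[ 5] ^= R(x[ 1]+x[13],18);
            x[14] ^= R(x[10]+x[ 6], 7);  x[ 2] ^= R(x[14]+x[10], 9);
            x[ 6] ^= R(x[ 2]+x[14],13);  x[10] ^= R(x[ 6]+x[ 2],18);
            x[ 3] ^= R(x[15]+x[11], 7);  x[ 7] ^= R(x[ 3]+x[15], 9);
            x[11] ^= R(x[ 7]+x[ 3],13);  x[15] ^= R(x[11]+x[ 7],18);
            /* even round: quarter-rounds across the rows */
            x[ 1] ^= R(x[ 0]+x[ 3], 7);  x[ 2] ^= R(x[ 1]+x[ 0], 9);
            x[ 3] ^= R(x[ 2]+x[ 1],13);  x[ 0] ^= R(x[ 3]+x[ 2],18);
            x[ 6] ^= R(x[ 5]+x[ 4], 7);  x[ 7] ^= R(x[ 6]+x[ 5], 9);
            x[ 4] ^= R(x[ 7]+x[ 6],13);  x[ 5] ^= R(x[ 4]+x[ 7],18);
            x[11] ^= R(x[10]+x[ 9], 7);  x[ 8] ^= R(x[11]+x[10], 9);
            x[ 9] ^= R(x[ 8]+x[11],13);  x[10] ^= R(x[ 9]+x[ 8],18);
            x[12] ^= R(x[15]+x[14], 7);  x[13] ^= R(x[12]+x[15], 9);
            x[14] ^= R(x[13]+x[12],13);  x[15] ^= R(x[14]+x[13],18);
        }
        for (int i = 0; i < 16; i++)
            out[i] = x[i] + in[i];      /* feed-forward: now one-way */
    }

Note there isn't a single secret-dependent load or branch in there, which is
rather the point of the article.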

(I'm not saying that you should build your own block cipher from Salsa20, of
course :)

The original sentence, though, is:

 _try Salsa20 rather than AES and Poly1305 rather than GCM._

AES-GCM is also not a "block cipher".

~~~
tptacek
For those playing along at home, AES-GCM is AES in CTR mode plus a variant of
a Wegman-Carter MAC to authenticate the data.
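
(The polynomial-hash half is small enough to sketch. This is the GF(2^128)
multiply from NIST SP 800-38D that GHASH folds over the AAD and ciphertext
blocks -- written bit-by-bit for clarity, which no real implementation would
do:)

    #include <stdint.h>
    #include <string.h>

    /* X <- X * H in GF(2^128), per SP 800-38D Algorithm 1. GHASH hashes
       data by repeatedly doing X ^= block; gf128_mul(X, H); where
       H = AES_K(0^128). */
    static void gf128_mul(uint8_t X[16], const uint8_t H[16])
    {
        uint8_t Z[16] = {0}, V[16];
        memcpy(V, H, 16);
        for (int i = 0; i < 128; i++) {
            if (X[i / 8] & (0x80 >> (i % 8)))        /* bit i of X set? */
                for (int j = 0; j < 16; j++)
                    Z[j] ^= V[j];
            int lsb = V[15] & 1;
            for (int j = 15; j > 0; j--)             /* V >>= 1 */
                V[j] = (uint8_t)((V[j] >> 1) | (V[j - 1] << 7));
            V[0] >>= 1;
            if (lsb)
                V[0] ^= 0xe1;   /* reduce mod x^128 + x^7 + x^2 + x + 1 */
        }
        memcpy(X, Z, 16);
    }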

------
planckscnst
It seems strange that the author is complaining about AES speed. About a year
ago, I benchmarked an IPsec setup between two cheap routers with an ARM9
processor that did not have any special crypto blocks in it. AES significantly
outperformed the other algorithms I tried.

~~~
tptacek
You were almost certainly benchmarking an implementation of AES that relies on
table lookups for speed; those table lookups create a side channel
vulnerability, which was much of the point of this article.

~~~
planckscnst
Interesting; it was what's in the Linux kernel.

~~~
tptacek
Pretty sure that's an S-box AES. The issue is that when implemented (a) in
software and (b) without reliance on large lookup tables --- i.e., "securely"
--- AES is significantly slower.
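
(To see where the slowdown comes from: the standard fix is to touch the whole
table on every lookup so the access pattern is independent of the secret
index. A sketch of one such constant-time lookup:)

    #include <stdint.h>

    /* Constant-time S-box lookup: read all 256 entries and mask out
       everything but the wanted one. The access pattern no longer
       depends on idx -- but it's ~256x the memory traffic of a plain
       table lookup, which is much of the performance gap. */
    static uint8_t sbox_ct(const uint8_t sbox[256], uint8_t idx)
    {
        uint8_t r = 0;
        for (uint32_t i = 0; i < 256; i++) {
            uint32_t diff = i ^ idx;
            uint8_t mask = (uint8_t)(((diff - 1) >> 8) & 0xff); /* 0xff iff i == idx */
            r |= sbox[i] & mask;
        }
        return r;
    }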

------
Nursie
Intel have specific instructions for GCM that mitigate some of this stuff, I'm
sure. I know this doesn't translate to 'NIST are keeping software
implementations in mind', but when these things are available on a few
processors, it does make the software guy's job easier.
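
(The instruction in question is PCLMULQDQ, carry-less multiplication, which
lets GHASH's GF(2^128) arithmetic run in-register instead of from lookup
tables; a minimal sketch:)

    #include <wmmintrin.h>   /* compile with -mpclmul */

    /* Carry-less multiply of the low 64-bit halves of a and b: the
       building block for a table-free, constant-time GHASH. */
    static __m128i clmul_lo(__m128i a, __m128i b)
    {
        return _mm_clmulepi64_si128(a, b, 0x00);
    }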

