
CREAM: the SSL attack you’ve probably never heard of - bascule
http://tonyarcieri.com/cream-the-scary-ssl-attack-youve-probably-never-heard-of
======
sdevlin
This attack is conceptually simple, and the linked paper
([http://cr.yp.to/antiforgery/cachetiming-20050414.pdf](http://cr.yp.to/antiforgery/cachetiming-20050414.pdf))
is very approachable for the layman. I recommend giving it a read.

Here's the basic idea:

  1. The first thing the AES encrypt function does is XOR the user input against the secret key.
  2. The result of this operation is used to index into a lookup table (the S-box).
  3. The timing characteristics of step 2 will vary based on the contents of the cache.

By measuring the timings for a large number of random inputs and correlating
them by index (e.g. sorting all messages into buckets based on the value of
the first byte), we can recover the corresponding byte of the secret key.

Though it's not clear a priori what the timing characteristics _should_ look
like for a particular key byte, we can easily measure this on a machine we
control that matches the description of the target platform.
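To make the bucketing-and-correlation step concrete, here is a toy simulation. None of this is Bernstein's actual measurement code: the timing profile, noise model, sample counts, and function names are all invented for illustration. The idea is just that the target's per-bucket averages are the reference machine's averages permuted by XOR with the key byte.

```python
import random

random.seed(1)

# Invented per-index "timing" profile standing in for the target's cache
# state: looking up S-box index i costs PROFILE[i] (plus noise).
PROFILE = [random.gauss(100.0, 10.0) for _ in range(256)]

def toy_encrypt_timing(pt_byte, key_byte):
    """Toy model of steps 1-2: the first table lookup is at pt ^ key."""
    return PROFILE[pt_byte ^ key_byte] + random.gauss(0.0, 5.0)

def measure(key_byte, samples=100_000):
    """Bucket timings by input byte and average each bucket."""
    totals = [0.0] * 256
    counts = [0] * 256
    for _ in range(samples):
        b = random.randrange(256)
        totals[b] += toy_encrypt_timing(b, key_byte)
        counts[b] += 1
    return [t / c for t, c in zip(totals, counts)]

# Profile a machine we control with a known key (here 0x00), then
# measure the target, whose key byte is unknown to the attacker.
reference = measure(key_byte=0x00)
SECRET = 0xA7
target = measure(key_byte=SECRET)

def correlation_guess(reference, target):
    """The target's buckets are the reference's, permuted by XOR with
    the key byte; find the XOR shift that aligns them best."""
    best_k, best_err = 0, float("inf")
    for k in range(256):
        err = sum((target[b] - reference[b ^ k]) ** 2 for b in range(256))
        if err < best_err:
            best_k, best_err = k, err
    return best_k

print(hex(correlation_guess(reference, target)))  # prints 0xa7
```

With the noise averaged down across ~400 samples per bucket, the correct shift wins by a wide margin; a real attack needs far more care (network jitter, cache-line granularity), but the correlation logic is the same.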

~~~
xiata
I know this is naive as hell, but would adding some random sleeps before
returning encrypted bits make it better or worse for this particular attack?

~~~
AlyssaRowan
_That_ kind of "blinding" is entirely ineffective, I'm afraid. It just
averages right out.

~~~
AnimalMuppet
If "sleep" means "sleep until the next timer tick", how can it average out?
Especially if the timer is started at the start of the encryption, and all
encryptions (at least, for one block) take less time than the timer is set
for. That means all times get set to exactly the timer time, and no
information reaches the attacker.

What am I missing?

~~~
bmm6o
"Random sleep" does not in general mean "sleep until the next timer tick". The
best fix is making the function constant-time. If you can achieve this with a
sleep that makes the operation always take exactly one quantum, then the sleep
is really an implementation detail and quite far from a "random sleep".

~~~
e12e
I've realized this (and am certainly not alone in that; it's rather obvious
when one thinks about the nature of timing) -- but does anyone have some links
on implementing it? Are there some (sequences of) x86_64 instructions that can
be used to bound a procedure (in the Pascal sense of the word) to a quantum,
regardless of things like when the procedure is called? Assuming the procedure
is short enough, I suppose branching behaviour, instruction fetch/decode, data
fetch/write, and the accompanying cache hits/misses can make it hard to a)
select a worst-case run time in terms of clock cycles to target (and if so,
for which concrete CPUs), and b) make sure the CPU is actually busy for
exactly that many cycles. Is this even possible to approach in portable C99?

I suppose if one ignores the information leak due to possible changes in CPU
load, one might devise a kind of evented "call back" model, where one waits to
return the result of a procedure until an interrupt is triggered?

I don't expect a full answer, but if anyone has a link to some source code
that isn't _too_ complicated, I'd be very happy (either "real-world" or some
good "example" code).

~~~
bmm6o
The most common way to avoid timing side channel attacks is to write the
procedure in C or ASM in such a way that there are no data-dependent
differences in execution path. You've probably seen e.g. the memcmp that
doesn't exit early. This attack is a little different in that it's not that
different instructions are executed, it's that different memory access
patterns take different amounts of time. For that, you can maybe change the
implementation to not have any data-dependent array accesses, or maybe you can
do things with prefetching to make the memory accesses constant time.
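For reference, the no-early-exit comparison mentioned above looks something like this (a Python sketch of the idea; real implementations do this in C or assembly):

```python
def constant_time_eq(a: bytes, b: bytes) -> bool:
    """Equality check with no early exit: the loop always walks the
    full buffer, so the running time doesn't reveal the position of
    the first mismatching byte."""
    if len(a) != len(b):
        return False
    diff = 0
    for x, y in zip(a, b):
        diff |= x ^ y  # OR-accumulate differences; no data-dependent branch
    return diff == 0
```

In real code you'd reach for a vetted routine such as Python's `hmac.compare_digest` rather than rolling your own; and note this fixes the early-exit leak, not the memory-access-pattern leak that the cache-timing attack exploits.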

An approach where you watch the clock will be inherently less portable and
actually much harder. Not only will the timing calls be hardware or OS
specific, but so will the worst-case time. Imagine having to deal with a chip
going into low power mode during your computation. Also you probably don't
want to count time that your thread wasn't scheduled to run, so now you're
talking about integrating with the scheduler.

------
feld
So what I'm hearing is:

- AES is vulnerable to timing attacks by design unless the implementation is
very careful. But even then there's no promise it is safe, because of CPU
hardware nuances between models and architectures.

- Use AES-GCM if you must use AES.

- AES-GCM(?) is accelerated by the Intel AES extension and provides a
constant-time implementation, but you have to trust that Intel's hardware
isn't backdoored.

- Don't use AES for network communications if you can avoid it.

Is this correct?

~~~
zimmerfrei
>> - Use AES-GCM if you must use AES

Without CLMUL instructions, GCM is either very slow or potentially sensitive
to cache-based side-channel attacks (regardless of AES being the underlying
cipher).

The EAX mode of operation is more secure -- though not very popular, for
whatever reason, and not part of any TLS cipher suite.

~~~
tptacek
Given Rogaway's recent patent grants, OCB would be a more straightforward
improvement over GCM.

~~~
tedunangst
Does anything use OCB? It seems GCM "won".

The patents may have killed it. Even "free for open source" can be troublesome
if you're worried about putting free software into an appliance and getting
trapped. Easier to avoid patented algos entirely.

~~~
AlyssaRowan
Rogaway's patent grants are now _very_ liberal: they cover all open source and
everything non-military. It passed through CFRG, too, and is documented in an
RFC:
[https://tools.ietf.org/html/rfc7253](https://tools.ietf.org/html/rfc7253)

But yes, the need for patent grants definitely hurt its adoption a _lot_
before; even when (as in WiFi) it was one of the contenders, CCM is more
common.

A _few_ things do use it: off the top of my head, I think Mumble does,
although I think that's an earlier variant (OCB2, perhaps, rather than OCB3 as
documented in the RFC?).

I'm also looking forward to the results of the CAESAR authenticated-encryption
competition -
[http://competitions.cr.yp.to/caesar.html](http://competitions.cr.yp.to/caesar.html)
\- there are a lot of entries, and quite a few have fallen and been withdrawn.
The current version of OCB is among the remaining contenders, along with
several other interesting candidates.

~~~
tedunangst
Specifically regarding the patent grants (of which there are three on
[http://web.cs.ucdavis.edu/~rogaway/ocb/license.htm](http://web.cs.ucdavis.edu/~rogaway/ocb/license.htm):
open source, non-military, and OpenSSL), they would appear at first glance to
cover OpenBSD -- all three, in fact. The problem is that this then creates a
trap for anyone taking OpenBSD and using it to build something to sell to a
military. Suddenly they are no longer protected; we prefer not to incorporate
anything that can create such traps.

For example, this came up in the thread where OCB was proposed to be added to
OpenSSL. You think you're free and clear, and then you're not.
[http://marc.info/?l=openssl-dev&m=136016226304441&w=2](http://marc.info/?l=openssl-dev&m=136016226304441&w=2)

Then came the OpenSSL specific license. That license probably applies to
LibreSSL today, but now there's a Ship of Theseus problem. How much OpenSSL
does one need to keep to qualify? And of course, the OpenBSD IPsec stack is
completely unrelated to OpenSSL.

~~~
e12e
> Specifically regarding the patent grants (of which there are three on
> [http://web.cs.ucdavis.edu/~rogaway/ocb/license.htm](http://web.cs.ucdavis.edu/~rogaway/ocb/license.htm):
> open source, non-military, and OpenSSL) they would appear at first glance to
> cover OpenBSD. All three in fact. The problem is that this then creates a
> trap for anyone taking OpenBSD and using it to build something to sell to a
> military. Suddenly they are no longer protected; we prefer not to
> incorporate anything that can create such traps.

How so? As far as I can tell, the "open source" grant covers everything under
a BSD license (among other licenses) -- and holds no provision for "military
use". I don't see how anyone using the [ed: algorithm, not code] under license
1 could become subject to license 3?

~~~
tedunangst
People take OpenBSD and turn it into not open source products all the time.
For a more famous example, FreeBSD is at the core of the Playstation OS, but
it's no longer open source.

------
tptacek
For this attack to get a stupid name, it needs to actually attack SSL/TLS. But
Bernstein got it to work only in a lab setting, with a target pessimized to
expose the vulnerability. That target, of course, did not use SSL/TLS.

Pedantry Points += 10

~~~
bascule
Normally I'd agree the way an attack makes the leap from an academic paper to
a cute name is a real-world PoC||GTFO, but c'mon, is this not the perfect
name?

------
ZoFreX
I'm really wary of the "don't invent your own crypto" mantra, so I want to
venture an idea I had here, rather than writing any code for it:

I understand the problems with adding a random delay to try to add "noise" to
the measurements, but what if the delay was non-random? Specifically, what if
the delay was calculated to make the whole operation always take constant
time?

Example:

User input comes in on thread A. Thread A sends the request to thread B and
delays for time t. Thread B does the encryption and sets the result. Thread A
resumes and returns the result.

If we pick 't' such that it is always larger than the amount of time it takes
to do the operation, any timing differences observed by the attacker won't be
correlated with the timing of the cryptographic operations. I think.

(NB: it could be something other than threads, such as "microservice B"
instead of "thread B", that particular detail isn't important)
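A minimal sketch of that pad-to-a-deadline idea (names are invented for illustration; `deadline_s` stands in for the 't' above):

```python
import time

def run_with_deadline(fn, deadline_s, *args, **kwargs):
    """Run fn, then sleep out the remainder of a fixed deadline.

    deadline_s must exceed fn's worst-case running time; if fn ever
    overruns the deadline, the overruns themselves become the timing
    signal the attacker is looking for.
    """
    start = time.monotonic()
    result = fn(*args, **kwargs)
    remaining = deadline_s - (time.monotonic() - start)
    if remaining > 0:
        time.sleep(remaining)
    return result
```

The caveats raised elsewhere in the thread still apply: sleep granularity and scheduler jitter are OS-specific, picking a safe worst-case bound is hardware-specific, and hiding the response time does nothing against a co-resident attacker who can probe the cache state directly.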

~~~
sdevlin
A better strategy (and an area of research for Dan Bernstein, author of the
referenced paper) is to design crypto that doesn't leak this kind of side-
channel information. See, for example, his Salsa20 family of stream ciphers.

~~~
ZoFreX
I know that is a better strategy, but it's also a much much harder strategy :)
Even a well designed cipher could end up leaking info in practice due to
compiler quirks or implementation mistakes.

------
secabeen
Many SSL attacks seem to require thousands or millions of interactive sessions
or inputs. Is there a reason we aren't modifying our Internet-facing servers
to drop connections and discard ephemeral keys when a particular IP or set of
IPs performs actions that are outside the norm?

~~~
snowwrestler
Well I think the crypto nerds would like to design crypto systems that are
inherently (i.e. mathematically) resistant to such attacks.

But in general, I think you are right. I am astounded at how few networked
applications perform rate limiting. WordPress, for example, does not ship with
any rate limiting on the login form. Brute force? Go ahead, give it a shot.

By comparison, Drupal 7 out of the box limits any IP to a small number of
quick login attempts before blocking that IP temporarily.

If your application is intended for human interaction, it just makes sense to
limit things to human speed. Maybe it's harder than I think it is, or maybe
people just don't think of it.
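The Drupal-style behavior described above can be sketched in a few lines (this is not Drupal's actual flood-control code; the class name and thresholds are invented for illustration):

```python
import time
from collections import defaultdict, deque

class LoginRateLimiter:
    """Per-IP throttle: after max_attempts tries within window_s
    seconds, an IP is blocked until its oldest attempt ages out."""

    def __init__(self, max_attempts=5, window_s=3600):
        self.max_attempts = max_attempts
        self.window_s = window_s
        self.attempts = defaultdict(deque)  # ip -> timestamps of attempts

    def allow(self, ip, now=None):
        """Return True (and record the attempt) if ip may try to log in."""
        now = time.monotonic() if now is None else now
        q = self.attempts[ip]
        while q and now - q[0] > self.window_s:
            q.popleft()  # drop attempts older than the window
        if len(q) >= self.max_attempts:
            return False  # temporarily blocked
        q.append(now)
        return True
```

A real deployment would persist the counters and, per the reply below, weigh the DoS tradeoff of blocking a whole NATed IP.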

~~~
e12e
It's a tradeoff (like so much in security): limit an IP to N quick login
attempts, and it's easy for your students to DoS the Drupal-powered school
portal (assuming the school is behind a NAT, at least). Often you want more
security at the cost of convenience ... but it's not as easy as "most secure
all the time!".

------
justcommenting
from the blog post:

> Fortunately, Intel solved this problem… for the hyperspecific case of AES.
> Newer Intel CPUs (and also other vendors including ARM) now provide a fast,
> constant-time implementation of AES in hardware.

Others have pointed to particular aspects of Intel hardware that they don't
believe are 'backdoored', for some definition of backdoored. One point some of
these comments appear to be missing is that using things like AES-NI typically
_also_ means using things like RDRAND.

Whether you think there's any relationship between Intel's Bull Mountain and
NSA's BULLRUN is up to you, but at the end of the day, the only way any of us
can know for sure whether RDRAND uses DUAL_EC_DRBG is to:

1) decide to take Intel at their word about RDRAND using only CTR_DRBG, or

2) undertake some difficult and probably-expensive hardware forensics to find
out

~~~
pbsd
DUAL_EC_DRBG would have been the dumbest possible way to backdoor RDRAND. It
would also imply that Intel is somehow able to slip in circuitry capable of
doing two 256-bit elliptic curve scalar multiplications in under 300 cycles
and pass it off as AES circuitry, which would normally be orders of magnitude
smaller and faster.

------
Blackthorn
CREAM? We Wu-Tang now?

