

Cache Missing for Fun & Profit - jacquesm
http://www.daemonology.net/papers/htt.pdf

======
tptacek
Jacques Mattheij seems floored by the idea that you could look at something as
obscure as cache timings in a hyperthreading-enabled processor and use it to
deduce crypto keys. He's right: it's a clever attack. But if it _surprises_
you, please count it as another reason you should never implement your own
crypto.

For the past 10 years or so, crypto researchers have been pawing through the
x86 microarchitecture (microarchitecture: the part of the processor that isn't
directly exposed in the instruction set; best example: cache) looking for side
channels. They've found a lot of scary stuff.

For instance, every time your processor takes a branch, it's caching the
computed branch target; it does that so that it can predict branches, so that
when it snorfles instructions into its pipeline, it doesn't have to chuck them
when a branch goes the "wrong" way. But the branch cache (the BTB) is finite.
You can write a process that continuously profiles branch timing to deduce the
state of the branch caches, and if that code runs alongside crypto code, it
can make predictions about key bits (that's Aciicmez's paper).

There's every reason to believe that there are undocumented areas of the
microarchitecture with even worse effects.

There is actually a practical point to pull out of this (and here I've buried
a lede): it is very probably unsafe to run "secure" code (SSL credit card
processing, etc) on shared hosting of any sort. A motivated attacker can
probably get their code running on the same physical hardware, and the odds
right now say that attacker can pull key bits out of your processes.

~~~
DarkShikari
An acquaintance of mine implemented a solution to this problem: an AES
implementation that uses no tables _and_ no branches; it uses the
shuffle instruction (pshufb) in a very creative way, combined with a
different interpretation of the AES algorithm, to make this possible. It
should be invulnerable to (at least standard) timing and cache attacks.
For bonus points, it was also far faster than most standard
implementations.

Link to the paper: <http://mikehamburg.com/papers/vector_aes/vector_aes.pdf>

~~~
tptacek
All mainstream modern crypto libraries are hardened in one way or another
against timing attacks (Nate Lawson can probably be counted on to jump in here
and correct us on this), but to my mind the bigger concern is that we simply
don't know what trails our code is leaving inside the processor. Processor
microarchitecture is vast, undocumented, complicated, and intricate.

~~~
jacquesm
Essentially you'd have to audit the entire cpu design to be sure that you
didn't leave any fat fingerprints all over the place. Any flip-flop that had
changed state in a recoverable way would be a potential security leak.

Good luck getting that kind of access to a cpu design, though, and even
better luck trying to figure out _all_ the angles of attack: miss one
and someone else has a vector, and silicon is hard to change after the
fact.

What is interesting to me about this whole subject is that it apparently
went from 'theoretical' to 'practical' long ago, and you'd be
hard-pressed to prove that you didn't lose a key or two (or all of them)
to an attacker sophisticated enough to do his snooping and then
disappear quietly.

After all, any hack or hacker that you read about or hear about wasn't as good
as they could have been, the ones you never hear about are the ones to worry
about.

For me, this whole idea of having someone plant a piece of software on a
machine that I own without me detecting it was a prime reason for
choosing not to do my own card processing (even though in the dark ages
I wrote 'webpay', which later became the foundation for a major IPSP).

I realized that by simply having a page hosted that accepts credit cards
over an https connection I would basically be putting up an open 'hack
me' invite.

~~~
tptacek
You want to google "Remote Timing Attacks Are Practical", which is the
landmark paper in this subfield and also the one crypto people use to scare
their children around campfires with.

What you're really observing is the flip side of the "covert channel problem",
which has been a well-known Hard Problem driving systems research since the
'70s (it's part of Saltzer/Schroeder). Systems security people generally
concede that the covert channel problem is impractical to solve in theory, and
have resigned themselves to occasional fire drills as theory becomes practice.

~~~
jacquesm
Maybe I'm thinking about this too simply, but wouldn't fudging the
timing of any cryptographic operation in a random fashion make it (much)
harder to deduce the keys?

~~~
tptacek
No. This is basic signal processing. Increase the noise, and all you do is
increase the number of measurements needed to recover the signal.

This is a really common misconception about defeating timing attacks.

A more pernicious problem is that many of these microarch side channel
attacks don't rely on direct timing of crypto code so much as on
microarch artifacts.

(For what it's worth, the specifics beyond this get to a place where I'm out
of my depth and Colin and Nate are still in theirs.)

~~~
jacquesm
Right, I get that: increase the noise by a factor of two and you'd need
more samples, increasing the chance of detection but ultimately not
making the problem much harder.

But every factor of two by which you make it harder is effectively one
bit extra on the key that you're trying to find, right?

Assuming that's true (going out on a limb here), the more time you are
willing to waste, the harder it gets for an attacker to recover the
keys, so ultimately it is a trade-off of efficiency vs. security.

As for microarchitecture, I don't know much about that beyond reading up
on the 68K when it came out and how it functioned at the block level.
But one way to side-step the issue completely, if you want to write
bulletproof crypto code (if that is even possible), would be to write it
for a controller on a separate plug-in card (pci, possibly usb) that
would be permitted to perform exactly one crypto operation per fixed
interval of time.

That way you could effectively 'hide' the key and all its side effects
from both local and remote attackers.

It would then take, at a minimum, physical access to the card to monitor
its buses in order to recover the keys.

------
cperciva
Are there any scribd people around here? I sent the required DMCA notice to
have this taken down on 2008-10-04, and I believe that it was taken down at
that point; do I have to send a new DMCA notice every time the same document
gets re-vacuumed?

~~~
tptacek
It's a scientific paper. Why do you want it taken down?

~~~
cperciva
I'd prefer to have people reading it from my website. Among other issues, this
allows me to see from where it is being linked.

------
jacquesm
That's one of the most amazing side channel hacks I've ever heard of. I
just ran into this while reading up on how hyperthreading works on the
i7 and what the consequences are for performance.

Trust cperciva to use that to translate hyperthreading into a security
issue.

Wow.

~~~
cperciva
_That's one of the most amazing side channel hacks I've ever heard of_

Thanks! The Osvik-Shamir-Tromer attack on AES via hyperthreading (the same
shared-cache issue, just a different target) was pretty neat too, although I
must say that I find my attack on RSA to be more aesthetically pleasing thanks
to its ability to steal a key by observing just a single cryptographic
operation.

~~~
jacquesm
Not to put too fine a point on it, but I'm really completely floored by
the leap from the general idea of a resource monitoring attack to
actually figuring out how to retrieve the contents of the cache and
working out which bits map onto which bits in the key.

Did you ever get it to a practical level where you were able to retrieve
a complete key on a live system in a substantially reduced time?

How much knowledge would you have to have about the process being
monitored? (In most cases I assume you would not have the luxury of
knowing in advance that someone was going to run a key generator, so I
assume that you'd have to be monitoring all the time; but then you have
to pick the cryptographic activity out of the background of all activity
by all processes.)

(What I got from the article is that you basically stopped at the point
where you had retrieved a significant number of bits, but that leaves us
guessing how long it would take to recover the remainder, through brute
force or otherwise. Obviously every retrieved bit shrinks the remaining
search space substantially. In other words, how practical is this
attack?)

Has Intel done something against this kind of attack on the i7? From the
references I take it that you did this work before the i7 was released.

Have you heard of this attack being used in the wild?

edit: re-reading the paper, more questions :)

Do the people who write cryptography code spend time making sure that
the code gives as little opportunity for this as possible, by ensuring
that the length of the code paths and the memory footprint of processing
a '1' bit in the key are the same as for a '0' bit?

~~~
cperciva
_Did you ever get it to a practical level where you were able to
retrieve a complete key on a live system in a substantially reduced
time?_

I retrieved enough key bits to make the rest easily computable, yes.

 _How much knowledge would you have to have about the process being
monitored?_

Cryptographically speaking? None. Operationally speaking, if you don't know
what program is running (or more importantly, what cryptographic library is
being used) it may take you a few hours of "eyeballing" the timings before you
see something which looks like a modular exponentiation -- but once you find
it, you should be able to extract key bits (if the cryptographic code is
vulnerable, that is).

 _you retrieved a significant number of bits, but that leaves us
guessing how long it would take to recover the remainder_

Recovering the remaining bits is just a problem of modular algebra --
computationally speaking it's trivial.

 _Has Intel done something against this kind of attack on the i7_

I don't know. I've asked, but they don't want to talk to me.

 _Have you heard of this attack being used in the wild?_

Not exactly... but I've heard reports of certain government agencies suddenly
getting very interested in how to defend against this sort of attack, so I
suspect that some spooks are aware of this attack being used.

~~~
jacquesm
Do you plan on verifying whether this vulnerability is still present in
Nehalem/i7?

What about multi-core and level 2 cache monitoring: is that also
feasible, or do you get too much noise?

~~~
cperciva
_Do you plan on verifying whether this vulnerability is still present in
Nehalem/i7?_

If I have the time and opportunity, probably. Right now I don't have an i7
processor or the time to spend on it.

 _What about multi-core and level 2 cache monitoring: is that also
feasible, or do you get too much noise?_

Every L2 cache I've seen so far leaks data too slowly to be useful as a
cryptographic side channel (at least for attacks like mine).

~~~
jacquesm
Thanks for all your answers. If you ever get around to this and need
access to an i7, let me know.

