
Robust and Efficient Elimination of Cache and Timing Side Channels (2015) - KirinDave
https://arxiv.org/abs/1506.00189
======
zeotroph
This seems to require user input

"In our solution, developers annotate the functions performing sensitive
computation(s) that they would like to protect."

and assumes that the cache timing is used to gather info on a secret function,
so it does not consider the use case of dumping everything a process can see?

------
comex
Their performance results depend on isolating “protected” functions that
access sensitive data, while keeping the rest of the code unprotected. That’s
fine in a sane world, but with Spectre, just about _any_ code can be tricked
into speculatively accessing sensitive data in its address space. So trying to
adapt this into a Spectre mitigation reduces to just flushing all CPU caches
on _every_ context switch (plus giving each core a separate chunk of the L3
cache, which is supposed to be shared). That would be… not pretty, from a
performance standpoint.

~~~
KirinDave
Consider though: similar logic has led us to not flush the cache, and here
we are.

Until we have better ways of reasoning about and restricting the code we are
running, it isn't going to get prettier. Conservative approaches like this may
not be best for every purpose, but in hardened code like a hypervisor it might
make sense.

~~~
srett
Isn't the problem rather that we have instructions to control the cache at
all? As far as my research goes, things like CLFLUSH were introduced at the
same time as SSE2, which was what, the Pentium 4 era? We were apparently
doing fine without them before that, even with 4+ CPU systems.

I'm currently trying to get the spectre.c PoC to work without using CLFLUSH or
similar, but it doesn't look too promising yet. Then again, I only started an
hour ago and lack the ability to think in those creative and twisted ways that
led to the discovery of this whole issue in the first place.

~~~
jcelerier
> We were apparently doing fine without them before that

well, feel free to return to using pentium2-like CPUs

------
esaym
This reads as if a poisoned CPU cache can indeed leak secrets and has been
known for many years now? Am I reading that right?

~~~
KirinDave
Even longer.

What's really wild is that they even got the rough proportion of security cost
for losing it down to 5% of the numbers we're seeing with today's security
patches.

I submitted this because I think their technique is preferred for lots of
stuff distributed systems developers do with crypto primitives (as opposed to
retpoline).

There are lots of well-known things no one seems to know outside of Academia,
like how generalized sorting is O(n) and compilers that can prove the
correctness of real-world code when it's written a specific way.

~~~
tyilo
> generalized sorting is O(n)

What do you mean by generalized sorting and how isn't this affected by the
comparison sorting lower bound?

Do you have a source?

~~~
cryptonector
They probably mean the "postman sort", or whatever catchy name it goes by
(radix sort, IIRC). That doesn't involve any comparisons between elements to
be sorted. Think of how a postman sorts mail into P.O. boxes... you just
algorithmically map a number to a row and column (bin), and deliver. This
truly is O(N), but there's a catch: it doesn't work when you don't have bins
to sort into, or when you don't know even how many bins you'll need. If you
try to generalize this it becomes O(N log N).
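
To make the binning concrete, here is a minimal Python sketch of a least-significant-digit radix sort. The name `radix_sort` and the `base` parameter are just my choices for illustration, and it assumes the keys are non-negative integers, which is exactly the kind of "known bins" assumption described above.

```python
def radix_sort(values, base=256):
    """LSD radix sort for non-negative integers (illustrative sketch)."""
    result = list(values)
    if not result:
        return result
    if min(result) < 0:
        raise ValueError("this sketch only handles non-negative integers")

    largest = max(result)
    place = 1  # weight of the digit currently being binned
    while place <= largest:
        # One bin per possible digit value; no element-to-element comparisons.
        bins = [[] for _ in range(base)]
        for v in result:
            bins[(v // place) % base].append(v)
        # Concatenating the bins in order keeps the pass stable.
        result = [v for b in bins for v in b]
        place *= base
    return result
```

Each pass is O(n + base), and the number of passes depends on how many digits the largest key has, so the linear bound holds only when the key width is fixed.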

~~~
KirinDave
Well... Actually. No. First of all, radix sort uses structural features to
arrive at linear time; it doesn't incompletely sort the input.

Secondly, I've tried to get the white paper exposed here and it generally goes
over like a lead balloon but:

[http://www.diku.dk/hjemmesider/ansatte/henglein/papers/hengl...](http://www.diku.dk/hjemmesider/ansatte/henglein/papers/henglein2011a.pdf)

Is the technique. It's an imposing 80-page paper that even dedicated educators
like Edward Kmett have trouble explaining simply, but there is a talk here by
the author to help sum it up:
[https://www.youtube.com/watch?v=sz9ZlZIRDAg](https://www.youtube.com/watch?v=sz9ZlZIRDAg)

We've had some of Henglein's associates here to talk about it, too.

There is a Haskell implementation and it can really speed up certain types of
operations. It's tricky to get the constants low in Haskell, but Kmett seems
to have done a pretty good job.
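
For a rough feel of the idea, here is a small Python sketch. It is my own simplification, not code from the paper or from the Haskell library, and the names `disc_nat`, `disc_pair` and `sort_with` are made up for illustration. A "discriminator" takes (key, value) pairs and returns the values grouped by key, with groups in key order, and discriminators for compound keys are built from discriminators for their parts.

```python
def disc_nat(size):
    """Discriminator for integer keys in range(size): a single bucket pass."""
    def disc(pairs):
        buckets = [[] for _ in range(size)]
        for key, value in pairs:
            buckets[key].append(value)
        # Drop empty buckets; remaining groups are already in key order.
        return [b for b in buckets if b]
    return disc


def disc_pair(disc_first, disc_second):
    """Discriminator for (a, b) keys, ordered lexicographically."""
    def disc(pairs):
        # Group on the first component, carrying the second along as payload.
        regrouped = [(a, (b, value)) for (a, b), value in pairs]
        groups = []
        for group in disc_first(regrouped):
            # Refine each group using the second component.
            groups.extend(disc_second(group))
        return groups
    return disc


def sort_with(disc, keys):
    """Sort keys by using each key as its own value and flattening the groups."""
    return [k for group in disc([(k, k) for k in keys]) for k in group]
```

For example, `sort_with(disc_pair(disc_nat(4), disc_nat(4)), points)` sorts pairs of small integers without ever comparing two pairs. Note that this sketch allocates a fresh bucket table on every call, so it is not linear as written; it only shows how the pieces compose.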

