
Spectre is here to stay: An analysis of side-channels and speculative execution - matt_d
https://arxiv.org/abs/1902.05178
======
phkahler
>> These attacks leak information through micro-architectural side-channels
which we show are not mere bugs, but in fact lie at the foundation of
optimization.

So we'll need to have non-speculative execution for cloud CPUs and stronger
efforts to keep untrusted code off our high performance CPUs. This may even
lead to chips with performance cores and trusted cores.

~~~
Tuna-Fish
No. The paper notes that Spectre can, and will in the future be able to, defeat
all programming language level techniques of isolation. With properly designed
OoO, Spectre cannot defeat process isolation. The fundamental lesson that
everyone must take to heart is that in the future, any code running on a
system, including things like very high-level languages running on
interpreters, always has full read access to the address space of the process
they run in. Process isolation is the most granular security boundary that
actually works.

Or in other words, running javascript interpreters in the same address space
as where you manage crypto is not something that can be done. Running code
from two different privilege levels in the same VM is not something that can
be done. Whenever you need to run untrusted code, you need to spin up a new
OS-managed process for it.

~~~
pcwalton
So this is something that I've never gotten a full answer to: what is the
difference between a "thread" and a "process" in this model?

This isn't a facetious question. A thread is just, at its core, a process that
shares memory with another process. (In fact, this is how threads are
implemented on Linux.) But _all_ , or virtually all, processes _also_ share
memory with other processes. Text pages of DLLs are shared between processes.
Browser processes have shared memory buffers, needed for graphics among other
things.

What separates processes that share memory from threads that share memory
regarding Spectre? Is it the TLB flush when switching between processes that
doesn't occur between threads? Or something else?

~~~
twtw
For Meltdown (Spectre v3, iirc) it's not so much sharing memory as sharing
address space. Processes have different page tables. Threads within a process
share page tables.

For spectre v1 and v2, right now (on existing hardware) mostly nothing
separates threads from processes. In the future, process isolation is a good
candidate for designing hardware + system software such that different
processes are isolated (via partitioning the caches, etc).

You probably still want threads within a process to share cache hits.

~~~
pcwalton
So, if that's true, why is Chrome considered to have solved Spectre? Browser
content processes from different domains share some memory. Moreover, if
process boundaries don't have any effect on the branch predictor on current
hardware, then why is process separation relevant at all? Doesn't all this
mean Spectre is still an issue?

~~~
twtw
I guess I jumped the gun a bit in my comment above.

In terms of the _possibility_ of exploit, as I understand there isn't at this
point any isolation between processes.

In terms of the _ease_ of exploit, being able to run untrusted code in the
same process as the victim helps quite a bit. Otherwise, you have to find a
gadget (i.e. qualifying bounds check for v1, indirect branch for v2) in the
victim process that you can exploit from the attacker process. Possible, but
quite a bit harder than making your own gadget.
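
To make the v1 gadget shape concrete, here is a toy Python model (not an exploit, and not from the paper). It assumes a hypothetical `ToyCPU` whose `cache` set stands in for real cache state, a made-up `SECRET` byte placed right after a `PUBLIC` array, and a `predict_in_bounds` flag standing in for a mistrained branch predictor. What it tries to show: the bounds check holds architecturally, yet the speculative load leaves a footprint that can be read back.

```python
# Toy model only: a Python set stands in for the CPU cache.
PUBLIC = bytes([1, 2, 3, 4])      # data the victim is willing to serve
SECRET = b"K"                     # hypothetical secret, adjacent in memory
MEMORY = PUBLIC + SECRET          # one flat address space
PUBLIC_LEN = len(PUBLIC)


class ToyCPU:
    def __init__(self):
        self.cache = set()        # probe lines that are currently "hot"

    def victim(self, index, predict_in_bounds):
        """Bounds-checked read, i.e. the 'qualifying bounds check' gadget."""
        in_bounds = index < PUBLIC_LEN
        if predict_in_bounds or in_bounds:
            # Runs speculatively (or for real): the dependent load touches
            # a probe line chosen by the byte that was read.
            self.cache.add(MEMORY[index])
        # Mis-speculated results are thrown away architecturally,
        # but the cache footprint above is not rolled back.
        return MEMORY[index] if in_bounds else None


def attack(cpu, secret_offset):
    cpu.cache.clear()                                  # "flush"
    result = cpu.victim(PUBLIC_LEN + secret_offset,    # out of bounds
                        predict_in_bounds=True)        # mistrained predictor
    assert result is None                              # the check held
    (leaked,) = cpu.cache                              # "reload": find hot line
    return leaked
```

In this model `attack(ToyCPU(), 0)` returns 75, the byte value of "K", even though `victim` never returned it. On real hardware the "reload" step is a timing measurement rather than a set lookup, which is where the noise and the difficulty come from.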

This all ignores the forward looking reasons process isolation is a good idea.
I can't keep track of the latest mitigations in Linux, but they pretty much
all will only help between processes by flushing various hardware data
structures. And hopefully someday we will have hardware actually designed to
restore the guarantees of isolation between processes.

I'm pretty sure this is accurate, but I'm just a random guy on the internet so
don't trust my word for it too much.

~~~
zzzcpan
It's not really about process isolation then, but the amount of control
untrusted code can have over a process. Which means if everything that code
can do is masked to some part of the process, it should be able to achieve the
same isolation between such subprocesses but within the OS process boundaries.
Although the paper claims this is too hard.
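
A minimal sketch of what "masked to some part of the process" could mean, assuming a hypothetical power-of-two `REGION_SIZE` and a made-up `masked_load` helper. Python only shows the arithmetic here; a real implementation would have to emit the mask in JIT-generated machine code, and this sketch says nothing about whether masking is sufficient against the other variants.

```python
REGION_SIZE = 16          # hypothetical sandbox size; must be a power of two
MASK = REGION_SIZE - 1


def masked_load(memory, base, index):
    """Read memory[base + (index & MASK)].

    There is no bounds-check branch to mispredict: whatever index the
    untrusted code supplies, the address stays inside
    [base, base + REGION_SIZE).
    """
    return memory[base + (index & MASK)]
```

With `memory = list(range(64))`, `masked_load(memory, 16, 100)` reads slot 20 (100 & 15 is 4), so an out-of-range index wraps inside the region instead of escaping it.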

------
rickmode
How about CPUs without speculative execution and simultaneous multithreading
(SMT / Hyper-Threading, which has similar issues)? We would, of course, need
other optimizations to claw back the performance loss--an engineering problem
I feel we can solve.

I've wondered if the solution is more, simpler cores. We concentrate on
smaller, faster cores, and the programming to utilize them better. Perhaps
advances in memory architectures as well. Hardware isn't my specialty, so I'm
just brainstorming here.

Perhaps this is where ARM and even RISC-V based systems can step in.

But I'm a software guy, so what do I know? I just know I'd feel more
comfortable with systems based on simpler CPUs that just cannot be exploited
by the recent side-channel attacks discovered, rather than trying to play
whack-a-mole with patches, along with trying to reason when it might be safe
to use CPUs with these optimizations.

~~~
dnautics
I don't know why this is being down-voted.

A few things: ARM and RISC-V definitely have specEx baked in (though on RISC-V
you can choose not to include a SpecEx module). There are interesting alternatives to
SpecEx. DSPs use delay slots, and I've seen delay slots used quite well in a
GP-CPU. Getting high instruction saturation on a CPU with delay slots is a
"hard compiler problem", but I have a few things to say about that:

Despite jokes about "better compilers", compilers _are_ getting better (e.g.
polyhedral optimization). One way to think of OOOex/SpecEx is that it's
figuratively the CPU JITting your code on the fly. The most popular
programming language JITs aggressively anyway, so one wonders if there isn't
some reduplication going on.

Furthermore, the most popular programming language isn't entirely the most
raw-power performant, and it's pretty clear that in our current ecosystem just
pushing operations through the FPU (which is what x86 optimizes for) isn't
necessarily the most important thing in the world; uptime, reliability,
fault-tolerance, safe parallelization, distribution, and power conservation
might be more important moving forward.

HM, oops, apparently RISC-V has OOOEx, not SpecEx.

~~~
xenadu02
> Despite jokes about "better compilers", compilers are getting better

The compiler has to make static decisions. The hardware knows what is actually
happening. There is an inherent information asymmetry at work that a
"sufficiently smart" compiler seems unlikely to overcome.

My intuition says software can't beat the speed of a superscalar OOO CPU
any more than a GP CPU can beat a roughly equivalent DSP for algorithms
suitable to run on the DSP, but I have no proof for that.

I'll also note that we've been promised "smarter compilers" for decades. Intel
has tried that route several times. No one has ever made it work.

~~~
repolfx
I think it's definitely worth exploring this angle because modern JIT
compilers have become very advanced, and there's still a lot of juice left to
squeeze there. Look at some of the things Graal is doing and it looks a lot
like what OOO speculation is doing - it'll recompile branches on the fly based
on profiling information and things like that.

~~~
gpderetta
Nvidia Denver couples a software-based JIT/translator with an in-order VLIW
backend. It is vulnerable to spectre.

------
openasocket
Side-channel attacks are some of the most terrifying to me, because
fundamentally your code works fine, it's just some detail in the timing that
gives away information. At least with buffer overflows there are automated
ways to try and find them, like fuzz testing. It's not perfect, but at least
it's something to try and validate you are doing the right thing. Is there
some sort of fuzz testing that could potentially find side-channel timing
attacks? Maybe some sort of statistical analysis on the timing results? I
found this
[https://arxiv.org/pdf/1811.07005.pdf](https://arxiv.org/pdf/1811.07005.pdf)
but it seems to be fairly recent, and I don't know how mature this area is.
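
As one rough idea of what "statistical analysis on the timing results" could look like (not the method of the linked paper): collect timings for two input classes and compute Welch's t-statistic between them. The names `welch_t` and `looks_leaky` and the 4.5 threshold are assumptions for illustration; the sample data is synthetic.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t-statistic for two independent samples of timings."""
    var_a = statistics.variance(a)
    var_b = statistics.variance(b)
    std_err = math.sqrt(var_a / len(a) + var_b / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / std_err


def looks_leaky(a, b, threshold=4.5):
    """Flag a timing difference between the two classes.

    The threshold is an assumed cut-off; a large |t| only says the
    timing depends on the input class, not that it is exploitable.
    """
    return abs(welch_t(a, b)) > threshold


# Synthetic cycle counts: same secret-independent input vs. varying input.
class_fixed = [100, 101, 99, 100, 102, 98, 100, 101]
class_varied = [110, 111, 109, 110, 112, 108, 110, 111]
```

Here `looks_leaky(class_fixed, class_varied)` is true, while comparing `class_fixed` against itself gives a t-statistic of zero. A real harness would need many more samples and some care about outliers and measurement noise.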

~~~
Strilanc
One of the interesting aspects of quantum computing is that it upgrades these
issues into actual bugs. Bit flips are errors in the Z basis, and phase flips
(information leaks) are errors in the X basis. [1][2]

Trying to build large scale quantum computers could lead to styles of
optimization that provably don't have side channel issues.

[1]:
[https://www.scottaaronson.com/blog/?p=3327](https://www.scottaaronson.com/blog/?p=3327)

[2]:
[https://www.youtube.com/watch?v=uPw9nkJAwDY](https://www.youtube.com/watch?v=uPw9nkJAwDY)

------
rolph
I found some good reads here if anyone would like to review and get a good
perspective:

[https://www.realworldtech.com/sandy-bridge/10/](https://www.realworldtech.com/sandy-bridge/10/)

[https://en.wikipedia.org/wiki/Out-of-order_execution](https://en.wikipedia.org/wiki/Out-of-order_execution)

[https://googleprojectzero.blogspot.com/2018/01/reading-privi...](https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html)

[https://bugs.chromium.org/p/project-zero/issues/detail?id=12...](https://bugs.chromium.org/p/project-zero/issues/detail?id=1272)

------
jnordwick
Has there been an in-the-wild exploit found for many of these micro-arch side
channel attacks?

I still have reservations that Spectre can actually be exploited, and
certainly not by JavaScript running in a browser, it seems (even without the
timing fixes -- just too much noise in the system to get a real world
exploit). All the proofs of concept I saw effectively needed a running start
(and a much lower bar to clear). I'm still not sure if we are just making too
much out of many of these attacks.

Also, many of these exploits require running native code to really even be
possible, and a JIT offers a large degree of protection. Most of the machines
I use, if you are already able to run native code, the battle has already been
lost.

I think the security community can sometimes have Chicken Little syndrome. I
most definitely wouldn't want performance enhancing techniques that might have
side-channel vulnerabilities to not be produced or used because they might be
exploitable in a use case (eg, web serving and rendering) that doesn't fit
most others (eg, internal servers).

~~~
YorkshireSeason
The article states in Section 4.5 (Page 20):

 _As part of our offensive work, we developed proofs of concept in C++,
JavaScript, and WebAssembly for all the reported vulnerabilities. We were able
to leak over 1KB/s from variant 1 gadgets in C++ using rdtsc with 99.99%
accuracy and over 10B/s from JavaScript using a low resolution timer._

~~~
icedchai
This is nice, but is there a _real world_ exploit, not a POC? Example: pulling
session cookies for another site, or extracting an SSH private key?

~~~
xenadu02
What are you basing this unearned confidence on? We've seen this same pattern
repeat thousands of times:

1. This might be a vulnerability in theory but there's no POC.
2. Well it's just a POC, no one has actually weaponized this.
3. Sure someone weaponized this but it is very difficult to pull off.
4. Oh the script-kiddie toolkits can build exploits for this automatically.
   Who could have seen it coming?!?

(Step #2 is probably where the various governmental spying agencies start
deploying the exploit in targeted ways)

Perhaps this time things are different but it seems unlikely.

~~~
icedchai
Reading 10B/sec from JS, which is under ideal conditions, does not sound like
a fast path to exploitation. I agree it is a concern, but I think it is
overblown, similar to pcwalton below.
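
For a sense of scale, a back-of-the-envelope calculation with the two rates quoted from the paper above. The target sizes (a 32-byte key, a 4 KiB page) are hypothetical, 1KB is taken as 1000 bytes, and the sketch assumes the attacker already knows exactly where to read, which is a big assumption.

```python
def seconds_to_leak(num_bytes, bytes_per_second):
    """Idealized time to read num_bytes at a steady leak rate."""
    return num_bytes / bytes_per_second


JS_RATE = 10        # B/s, the quoted JavaScript figure ("over 10B/s")
CPP_RATE = 1000     # B/s, the quoted C++ figure ("over 1KB/s")

KEY_BYTES = 32      # hypothetical 256-bit key
PAGE_BYTES = 4096   # hypothetical 4 KiB page
```

Under those assumptions the key takes about 3.2 seconds at the JavaScript rate, and the full page about 409.6 seconds (versus roughly 4.1 seconds at the C++ rate). Whether that counts as slow depends mostly on how long it takes to find the right address in the first place.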

