
Understanding L1 Terminal Fault aka Foreshadow: What You Need to Know - jterrill
https://www.redhat.com/en/blog/understanding-l1-terminal-fault-aka-foreshadow-what-you-need-know
======
lvh
This is _bananas_.

\- Unlike previous speculative execution attacks against SGX, this extracts
memory "in parallel" to SGX, instead of attacking the code running in SGX
directly. It always works: it doesn't require the SGX code to run and it
doesn't require it to have any particular speculative execution
vulnerability. This also means existing mitigations like retpolines don't
work.

\- It lets you extract the sealing and remote attestation keys. That's about
as bad as it gets. Because SGX is primarily about encrypting RAM, anything
that pops L1 cache is game over, and this is a stark reminder of that fact.

\- The second attack that fell out of this allows you to read arbitrary L1
cache memory, across kernel-userspace or even VM lines.

The good news here is that the mitigation is somewhat straightforward. It's a
pure L1d attack: flush L1d (or prevent things from accessing the same L1d via
e.g. core pinning) and you're fine.

If there was any doubt left that speculative execution bugs were an entire new
class and not just a one-off gimmick...

~~~
zimmerfrei
>> \- It lets you extract the sealing and remote attestation keys. That's
about as bad as it gets.

It could definitely have been worse, with a leak of the fused secrets or a
breach of microcode integrity (the two things that together constitute the
TCB, which, put simply, is the only piece of the system you assume will never
be broken).

All in all, assuming a microcode update can counter the attack as Intel
claims, sealing and attestation secrets will be rekeyed via the KDF rooted in
the fused keys, so that you can start afresh.

Of course, operationally speaking, that is a total pain but it is frankly
remarkable to see this kind of deep recovery strategy finally built into
consumer devices (and yes, I know DRM is unfortunately the main driver, but
there are still some very legitimate use cases).

>> flush L1d (or prevent things from accessing the same L1d via e.g. core
pinning) and you're fine

No, you are not fine. As the paper explains, an adversary (which is by
definition more privileged than you are) can operate between the moment you
use secrets in L1 and the moment you flush them out. Only the CPU (silicon or
microcode) can assist you in the flushing of L1 when you exit enclave mode.

~~~
geogriffin
Agreed.

> Only the CPU (silicon or microcode) can assist you in the flushing of L1
> when you exit enclave mode.

This seems correct, upon double-checking. The interrupt process within SGX is
called Asynchronous Enclave Exit (AEX) and does not give the enclave an
opportunity to run any code upon interrupt, though it is possible to run code
upon every enclave entry (via code placed at the Asynchronous Exit Pointer,
AEP). I'm not sure that would help with any speculation-based exploits,
however.

~~~
lvh
There's more going on than just the SGX attack. What I'm not saying is "add
this 1 instruction and everything is copacetic" \-- what I am saying is that
the patches for at least some of the vulnerabilities are somewhat
straightforward.

------
cobookman
If helpful, a few cloud providers' responses:

Google Cloud

\- Google Cloud's protections against this new vulnerability: \-
[https://cloud.google.com/blog/products/gcp/protecting-
agains...](https://cloud.google.com/blog/products/gcp/protecting-against-the-
new-l1tf-speculative-vulnerabilities)

\- GCE Related information: \-
[https://cloud.google.com/compute/docs/security-
bulletins](https://cloud.google.com/compute/docs/security-bulletins)

\- GKE Related information: \- [https://cloud.google.com/kubernetes-
engine/docs/security-bul...](https://cloud.google.com/kubernetes-
engine/docs/security-bulletins)

Oracle Cloud

\-
[https://blogs.oracle.com/oraclesecurity/intel-l1tf](https://blogs.oracle.com/oraclesecurity/intel-l1tf)

Azure

\-
[https://blogs.technet.microsoft.com/virtualization/2018/08/1...](https://blogs.technet.microsoft.com/virtualization/2018/08/14/hyper-
v-hyperclear/)

~~~
praseodym
Google writes that ‘Google Compute Engine employs host isolation features
which ensure that an individual core is never concurrently shared between
distinct virtual machines’.

However, GCE does offer shared-core machine types (f1-micro and g1-small) with
0.2 and 0.5 vCPUs respectively. This seems to contradict their statement
(unless the cores are not shared after all, but that doesn’t make sense from
an economic standpoint).

Also, they offer machines with one vCPU, but since a vCPU is only a single
hyper-thread and not a full core, this still allows the core to be shared
across multiple VMs. If this means that Google will stop using hyper-threading
and instead give everyone a full CPU core per vCPU, that will likely give
noticeable performance benefits (but cost them more).

~~~
ssambros
I think the key word is 'concurrently'. My guess is that micro and small
instances indeed share CPUs, just never 'at the same time', and L1 is flushed
before transitioning from one VM to another.

I work for Google Cloud, but not on security or OS development, so I'm not
aware of how it is actually done.

~~~
jshap70
Seems slow. Having to flush on every swap, I mean. I wonder if there won't be
a move away from offering these kinds of instances: if offering 0.2 of a core
means expending 0.3 of a core's effort, that's a 20-30% drop in how many you
can run per core.

~~~
Dylan16807
512 entries, 12 cycles to refill each entry from L2, at 3GHz that's only 2
microseconds worst case to refill the L1 cache. And that's if you keep the
_same_ task in it. If you're switching tasks most of it is going to get
evicted anyway.
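That back-of-the-envelope number checks out; as a quick worked version (the cache geometry is an assumption: a 32 KiB L1d with 64-byte lines, which matches the 512-entry figure above):

```python
# Worst-case cost to refill L1d after a full flush, using the figures above.
entries = 32 * 1024 // 64          # 32 KiB L1d / 64-byte lines = 512 entries
cycles_per_fill = 12               # assumed L2 -> L1 refill latency per line
clock_hz = 3_000_000_000           # 3 GHz
worst_case_us = entries * cycles_per_fill / clock_hz * 1e6
print(f"{worst_case_us:.2f} microseconds")  # about 2 microseconds
```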

GCE uses KVM, which defaults to the linux scheduler with time slices from
0.75ms to 6ms, so the extra impact should be negligible. It's possible they
tuned it weirdly, but I can't think of any reason to do so.

Flushes that occur from hypervisor calls could possibly have an impact, but
those will happen whether you share a CPU or not.

------
walterbell
If hyper-threading should be disabled for maximum security, this is good news
for AMD CPUs, which maximize cores per socket.

Thread from 2 months ago on OpenBSD and hyper-threading:
[https://news.ycombinator.com/item?id=17350278](https://news.ycombinator.com/item?id=17350278)

~~~
c2h5oh
Or just keep using AMD CPUs, because yet again they are unaffected
[https://www.amd.com/en/corporate/security-
updates](https://www.amd.com/en/corporate/security-updates)

~~~
sebazzz
Never mind that AMD sockets used to last longer (less frequent incompatible
socket changes), which meant you could keep your motherboard longer. Not sure
if that is still the case, though.

~~~
stephengillie
We all miss Socket 7.
[https://en.m.wikipedia.org/wiki/Socket_7](https://en.m.wikipedia.org/wiki/Socket_7)

------
sofaofthedamned
We're at a stage where, to be safe on x86, we need multiple layers of
microcode and kernel mitigations.

At which point do we agree the performance increases over the last 20 years
have been built on sand and move elsewhere?

~~~
colechristensen
It's probably time for a new architecture that isn't so convoluted with
decades of optimizations and iterative improvements.

It's also time for a computer system with one and only one general purpose
processor (no tiny CPUs in storage or "system management" or every other
device)

Probably something like a programming language/OS/computer system written new
with a CPU based on current GPU designs.

~~~
slededit
You won't make any CPU of reasonable performance without speculative execution
and all the rest. You're limited by data dependencies, and the only way to
break them is to "cheat".

Unless you're willing to run on the equivalent of a Cortex-M0, you have to
live with it.

~~~
dnautics
I hear delay slots aren't so bad, and compiler optimizations have gotten
really good since the Itanium days, when they last (half-heartedly) tried
VLIW.

~~~
slededit
Delay slots are merely the pipeline of the uarch peeking through; it's bad
practice because you'll probably want to change the pipeline depth at some
point. Other uarchs have them too, but hide them from the public API.

VLIW only removes the logic to detect data dependency - it doesn't workaround
the actual need to wait for data to be ready.

None of this has much to do with speculative execution which is guessing which
way a branch will go. You simply can't have what would be considered a modern
computer without it.

------
miloignis
Getting the SGX attestation key would permanently break SGX-based blockchain
(Hyperledger Sawtooth?) mining, if I understand correctly. It's amazing that
(if this is correct) this vulnerability has permanently broken a large
software project.

~~~
geogriffin
SGX Remote Attestation was built specifically to deal with events like this.
Intel starts rejecting attestations from vulnerable microcode revisions after
some period following disclosure. In this case, they even postponed disclosure
until patched microcode revisions were available and already required for
successful attestation.

If said SGX application wasn't built around this model, then it's probably not
a valid use case for SGX.

~~~
bscphil
Isn't this pretty much a no-go for any large public software project, given
that microcode updates often depend on the OEMs, which are notoriously bad
about supporting devices older than about a year?

~~~
zimmerfrei
I think that is mostly the case for BIOS and platform firmware. CPU microcode
can be loaded by the OS (if the OS allows you to, as Linux does -
[https://www.cyberciti.biz/faq/install-update-intel-
microcode...](https://www.cyberciti.biz/faq/install-update-intel-microcode-
firmware-linux/)).
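For checking what actually got loaded, the running microcode revision is visible in /proc/cpuinfo on Linux x86 (a minimal sketch; returns None where that interface or field is absent, e.g. on non-x86 machines):

```python
def microcode_revision(cpuinfo="/proc/cpuinfo"):
    """Return the first CPU's reported microcode revision (e.g. '0x8e')."""
    try:
        with open(cpuinfo) as f:
            for line in f:
                if line.startswith("microcode"):
                    return line.split(":", 1)[1].strip()
    except OSError:
        pass
    return None  # file missing or no microcode field

print(microcode_revision())
```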

~~~
geogriffin
BIOS updates are required for most SGX-related microcode updates, as the
microcode has to be up to date before enabling the SGX feature via an MSR
(which is usually done by the BIOS). This is so you can't start an enclave
with old microcode, exploit it, upgrade the microcode, and still pass remote
attestation.

Also, the more major Spectre-related microcode updates have to be applied very
early (in the BIOS), probably for technical reasons. For this latest microcode
update, for example, Intel didn't even include it in the downloadable
microcode package you linked to. On my v6 Xeons, I was able to get to revision
0x84 with the latest OS microcode package, but 0x8e with a BIOS upgrade.

------
c2h5oh
I'm tempted to just buy the cheapest 8th-gen Intel CPU and play with that to
extract Widevine keys from SGX.

~~~
bscphil
Since pirates have most likely broken widevine (to the best of my knowledge -
I don't have direct confirmation of this), this was also my first thought. I
wonder if they've used something like this. As far as I know, it would
constitute a complete break of the Widevine DRM model.

------
dannyw
What is the net performance impact of all these Meltdown, Spectre, now
Foreshadow mitigations?

-10%? -20%? -30%?

Have we gone back 3 CPU generations?

~~~
Sohcahtoa82
This is something I'd really like answered.

I'm still running an i7-3770k on my desktop at home. I was considering
upgrading when the 9th gen comes out in October, but if the
Spectre/Meltdown/Foreshadow fixes have a significant performance impact, it
won't be worth spending the money. As it is, I'll already need a new
motherboard and RAM, since I'm still on DDR3.

------
mehrdadn
What secrets do typical VM hosts (like cloud service providers) have that must
be protected from guests?

~~~
mmozeiko
Private keys for HTTPS certificates. API keys or other credentials for
external systems.

~~~
mehrdadn
Huh, interesting. VM _hosts_ serve HTTPS sites and hold API keys? Those aren't
done by other servers?

~~~
icegreentea2
I think the idea is that a given VM client can more easily trust that their
keys and such will not be captured by other VM clients if they know that SGX
is working.

------
aosaigh
Can anyone "explain like I'm 5" this issue?

~~~
cowboysauce
Modern processors execute instructions speculatively, that is, without knowing
if the instructions should actually run. If it turns out they weren't supposed
to be executed, then the processor undoes the effects of the instructions.
However, not everything is undone. If the speculatively executed instructions
access RAM, they can move data in and out of the cache. By measuring how long
it takes to read memory, you can tell what memory the speculatively executed
instructions accessed.

Speculative execution is what allows Meltdown to work. You make the processor
speculatively execute an access to kernel memory, then access a memory address
based on the value of the data read from kernel memory. Intel processors
perform the speculative execution without first checking if the memory access
is allowed, while AMD processors check before the speculative execution. This
is why AMD processors aren't vulnerable to Meltdown.

SGX was thought not to be vulnerable to a speculative execution attack because
attempts to access SGX memory without the necessary access just yield -1 for
reads, and writes are ignored, as opposed to causing an exception as with
access to kernel memory. However, if the SGX memory is marked as not present,
then attempts to read it will trigger a page fault exception. The page fault
circumvents the normal SGX protection and allows the memory to be read by
speculatively executed instructions.
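The timing-based recovery step described in the first paragraph can be illustrated with a toy simulation (pure Python, not a real exploit: the hit/miss latencies and the 256-entry probe array mirror the usual flush+reload setup, and the "victim" and "cache" are entirely faked):

```python
HIT_CYCLES, MISS_CYCLES = 40, 300   # assumed cached vs. uncached latencies

def victim_touches(secret_byte):
    """Simulate the speculative access: only probe[secret] ends up cached."""
    return {secret_byte}

def time_probe(cached, index):
    """Simulated access time for probe-array entry `index`."""
    return HIT_CYCLES if index in cached else MISS_CYCLES

def recover_byte(cached):
    """Attacker: time all 256 probe entries; the fast one reveals the secret."""
    return min(range(256), key=lambda i: time_probe(cached, i))

secret = 0x42
print(hex(recover_byte(victim_touches(secret))))  # 0x42
```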

------
zimmerfrei
Interestingly, the upcoming CPUs with built-in resistance to Meltdown (new MSR
bit RDCL_NO set to 1) will already be immune to L1TF as well.

------
based2
[https://foreshadowattack.eu/](https://foreshadowattack.eu/)

via [https://lobste.rs/recent](https://lobste.rs/recent)

~~~
based2
[https://www.reddit.com/r/netsec/comments/97a6ty/foreshadow_e...](https://www.reddit.com/r/netsec/comments/97a6ty/foreshadow_extracting_the_keys_to_the_intel_sgx/)

------
lvh
AWS bulletin: [https://aws.amazon.com/security/security-
bulletins/AWS-2018-...](https://aws.amazon.com/security/security-
bulletins/AWS-2018-..).

Amazon Linux bulletin:
[https://alas.aws.amazon.com/ALAS-2018-1058.html](https://alas.aws.amazon.com/ALAS-2018-1058.html)

RHEL patches are out. CentOS after delay, presumably. Nothing yet for
Debian/Ubuntu.

TL;DR: AWS is patched. Go update your kernel (especially if you run other
people's code).

~~~
officialchicken
Ubuntu:
[https://wiki.ubuntu.com/SecurityTeam/KnowledgeBase/L1TF](https://wiki.ubuntu.com/SecurityTeam/KnowledgeBase/L1TF)

~~~
wgjordan
On Ubuntu 14.04 (Trusty), be warned that there is a showstopper bug in the
latest kernel update to 3.13.0-155 [1].

[1]
[https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1787127](https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1787127)

