
Intel SA-00145: Lazy FP State Restore - lkurusa
https://www.intel.com/content/www/us/en/security-center/advisory/intel-sa-00145.html
======
kentonv
Andy Lutomirski noted on another thread that he unintentionally fixed this two
years ago in Linux:

[https://news.ycombinator.com/item?id=17304947](https://news.ycombinator.com/item?id=17304947)

(He switched it to eager FP because it's faster on modern hardware.)

But a lot of people running old LTS kernels may be affected.

EDIT: Looks like Luto's change landed in kernel version 4.6.
[https://kernelnewbies.org/Linux_4.6#List_of_merges](https://kernelnewbies.org/Linux_4.6#List_of_merges)
[https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/lin...](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ecc026bff6e8444c6b50dcde192e7acdaf42bf82)

~~~
techntoke
Does that include Red Hat Enterprise Linux? I can't believe how old a kernel
they still use. I have to use it for work, but I run Arch at home, which is
running 4.16.13. I'm honestly surprised that Red Hat can't manage to keep
their distros current with stable package builds.

~~~
snuxoll
Red Hat backports security fixes, hardware support and certain new features to
their kernel packages; staying on a stable kernel version throughout the
product lifecycle is what allows them to guarantee a consistent kernel ABI
that virtually every other distribution tosses by the wayside. Personally I
prefer not having third-party kernel modules on my servers break every time I
run `yum upgrade` like I do with Fedora (oh VMware, how I hate you so).

~~~
vbezhenar
I'm using an HP B120i fakeraid controller with a proprietary driver and it
broke after the 7.4 upgrade, so while they're probably doing a good job on
binary compatibility (7.5 didn't break it), it's not ideal.

~~~
snuxoll
Unfortunately the kABI does not encompass every symbol exported by the kernel,
there is a whitelist maintained in the kernel-abi-whitelists package and
scripts to check conformance of a module to the whitelist. Symbols are only
ever added throughout the lifecycle of a RHEL release, so anything that
conforms to the 7.3 kABI will also work on 7.4 - but if the module uses a
symbol NOT whitelisted in 7.3 there's no guarantee the 7.4 update won't break
it.

Not that it's much solace when your proprietary blob breaks, but Red Hat does
make considerable effort to give vendors a stable target to build against -
unfortunately not everyone fully validates conformance which results in the
same old problems cropping up from time to time.

~~~
silly-silly
That, and the Spectre fixes broke a BUNCH of third-party code that relied on
non-exported symbols.

------
cperciva
I posted some details about this to Twitter:
[https://twitter.com/cperciva/status/1007010583244230656](https://twitter.com/cperciva/status/1007010583244230656)

~~~
jetrink
Here is the text of the tweets for convenience:

So about that "Lazy FPU" vulnerability (CVE-2018-3665)... this probably ought
to be a blog post, but the embargo just ended and I think it's important to
get some details out quickly.

This affects recent Intel CPUs. It might affect non-Intel CPUs but I have no
evidence of that. It is an information leak caused by speculative execution,
affecting operating systems which use "lazy FPU context switching". The impact
of this bug is disclosure of the contents of FPU/MMX/SSE/AVX registers. This
is very bad because AES encryption keys almost always end up in SSE registers.

You need to be able to execute code on the same CPU as the target process in
order to steal cryptographic keys this way. You also need to perform a
specific sequence of operations before the CPU pipeline completes, so there's
a narrow window for execution. I'm not going to say that it's impossible that
this could be executed via a web browser or a similarly "quasi-remote" attack,
but it's much harder than Meltdown was.

I was not part of the coordinated disclosure process for this vulnerability. I
became aware of this issue after attending a session organized by Theo de
Raadt at @BSDCan. It took me about 5 hours to write a working exploit based on
the details he announced. Theo says that he was not under NDA and was not part
of the coordinated disclosure process. I believe him. However, there were
details which he knew and attributed to "rumours" which very clearly came from
someone who was part of the embargo.

My understanding is that the original disclosure date for this was some time
in late July or early August. After I wrote an exploit for this, I contacted
the embargoed people to say "look, if I can do this in five hours, other
people can too; you can't wait that long". While I have exploit code and it is
being circulated among some of the relevant security teams, I'm not going to
publish it yet; the purpose was to convince the relevant people that they
couldn't afford to wait, and that purpose has been achieved.

I know from the years that I spent as FreeBSD security officer that it takes
some time to get patches out, and my goal is to make the world more secure,
not less. But after everybody has had time to push their patches out I'll
release the exploit code to help future researchers.

I think that's everything I need to say about this vulnerability right now.
Happy to answer questions, but I'm not part of the FreeBSD security team and
don't have any inside knowledge here -- FreeBSD takes embargoes seriously and
they didn't share anything with me. </thread>

One more thing, some advisories are going out giving me credit for co-
discovering this. I didn't; I just reproduced it and wrote exploit code after
all the important details leaked.

------
cmsimike
Serious question - should we consider speculative CPU execution to be A Bad
Idea (tm) and move on from it (since these problems keep coming up)? Or is the
thought that we have more or less been gaining performance on the back of
incorrectly written software (which does not take these speculative-execution
edge cases into account), and the only way forward is patching?

I guess another question I have is: can we win the performance back through
fixes in the CPU, or will speculative execution always be insecure and thus
need patching in software?

Try as I might, I am not a CPU person.

~~~
Tuna-Fish
Completely dropping all forms of speculative execution means dropping overall
performance to a tenth of today. There are really hard limits on how fast any
operation, especially memory operations, can be done. The way we have made our
CPUs faster is by making them do more operations in parallel, at all levels.
At the lowest level, in straight line code, this very often requires
speculation to achieve.

Speculation is not fundamentally incompatible with security. It's just that
literally no one in the industry ever thought that leaking information out of
a speculative context was possible -- and so there is no hardening anywhere.
Now that it was proven possible, people are rushing to find all the ways this
can be exploited. New CPUs currently being designed will fix all these, and
then eventually we will have speculation without security issues.

Except for Spectre variant 1. That will always stay with us, because there is
no sensible fix for it. The only real solution to that is to accept that
branches cannot be used as a security boundary. This is mostly relevant to
people implementing secure sandboxes and language runtimes. Going forward, the
only reasonable assumption is that if you let a third party run their code in
a process, no matter how you verify accesses or otherwise try to contain that
code, you should assume it has read access to the entire process. Any real
security requires you to make use of the proper OS-provided isolation.
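The variant 1 pattern in question is worth seeing concretely. Below is a
minimal sketch in C of the classic bounds-check-bypass gadget; the array
names and sizes are illustrative, following the shape popularized by the
Spectre paper:

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative Spectre variant 1 gadget.  Architecturally, the bounds
 * check below is airtight.  But the CPU may speculatively execute the
 * body with an out-of-bounds x before the comparison resolves, and the
 * cache line touched in array2 (indexed by the secret byte) survives
 * the pipeline squash -- a side channel an attacker can probe later. */
static uint8_t array1[16];
static size_t  array1_size = 16;
static uint8_t array2[256 * 512];

uint8_t victim(size_t x) {
    if (x < array1_size) {               /* the "security boundary"     */
        return array2[array1[x] * 512];  /* speculative OOB read leaks  */
    }                                    /* array1[x] via cache state   */
    return 0;
}
```

This is why the comment above says branches can't be used as a security
boundary: no amount of in-process access checking removes the speculative
path.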

~~~
kentonv
> Going forward, the only reasonable assumption is that if you let a third
> party run their code in a process, no matter how you verify accesses or
> otherwise try to contain that code, you should assume it has read access
> to the entire process.

If this is true, it is unbelievably bad for the future of security and
computing in general. People throwing around this assertion are, in my
opinion, not appreciating how bad it is. We need to try a lot harder before we
give up.

Here's why:

1. Finer-grained isolation makes security better, because it allows us to
apply the Principle of Least Authority to each component of the system, and
protect components of a system from bugs in other components. To make
meaningful gains in security going forward, we need to encourage _more_ fine-
grained isolation. If process-level isolation is the finest grain we'll ever
have, we can't make these advances.

2. The scalability of edge computing _requires_ finer-grained isolation than
process isolation. The trend is towards pushing more and more code to run
directly on devices or edge networks rather than centralized servers. That
means that the places where code runs need to be able to handle orders of
magnitude more tenants than before -- because everyone wants their code to run
in every location. If we can't achieve high multi-tenancy securely -- by which
I mean 10,000 or more independently isolated pieces of code running on the
same machine -- then the only solution will be to limit these resources to
big, trusted players. Small companies will be shut out from "the edge".
That's bad.

Luckily, the "process isolation is the only isolation" claim is wrong. It's
true that we need to evolve our approach to make sub-process isolation secure,
but it's not impossible at all. In fact, it's possible to design a sandbox
where you can trivially prove that code running inside it cannot observe side
channels.

Here's how that works: Think about Haskell, or another purely-functional
language. An attacker provides you with a pure function to execute. Because
it's a pure function, for some given set of inputs, it will always produce
exactly the same output, no matter what machine you run it on, or what else is
going on in the background. Therefore, the output _cannot possibly_
incorporate observations from side channels, no matter what the code does
internally.

So: It is possible to run attacker-provided code without exposing secrets from
the process's memory space.
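The determinism argument can be made concrete with a toy example. A sketch in
C (the function name and constants are invented for illustration): a function
whose output is fully determined by its inputs has no channel through which
to encode anything it "observed" while running.

```c
#include <stdint.h>

/* A pure function: the return value is a fixed function of the
 * arguments.  Even if the hardware speculates wildly underneath,
 * nothing the code could observe during execution (cache timings,
 * stale register contents) can influence what it returns.  The side
 * channel may still exist in the microarchitecture -- but the
 * sandboxed code has no instrument with which to read it. */
uint64_t pure_mix(uint64_t a, uint64_t b) {
    /* arbitrary LCG-style constants, illustrative only */
    return a * 6364136223846793005ULL + b * 1442695040888963407ULL;
}
```

Hand the same code a high-resolution clock, though, and purity is gone:
latency becomes an input, which is why the timer discussion later in this
comment matters.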

The question is, of course, how do we build a useful sandbox that relies on
this property. There is a lot of work to be done there. Luckily, we don't
really have to use a purely-functional language, we only need to use a
_deterministic_ language. It turns out that the world's most popular language,
JavaScript, is actually highly deterministic, in large part due to its single-
threaded nature.

We do need to remove access to timers, or find a way to make them not useful.
We also need to prevent attackers from being able to time their code's
execution remotely. Basically, we need to think carefully about the I/O
surface while paying attention to timing. But sandboxing has always been about
thinking carefully about the I/O surface, and we have a lot of control there.
We just have a new aspect that we need to account for.

I think it's doable.

~~~
jcranmer
> Here's how that works: Think about Haskell, or another purely-functional
> language. An attacker provides you with a pure function to execute. Because
> it's a pure function, for some given set of inputs, it will always produce
> exactly the same output, no matter what machine you run it on, or what else
> is going on in the background. Therefore, the output cannot possibly
> incorporate observations from side channels, no matter what the code does
> internally.

This is patently false. The attacker can do timing on his side and see how
long it takes your service to return a response. If you let an attacker have
any sort of output channel, you give him the power to use _his_ stuff to find
side channels, and there is nothing you can do to close it.

~~~
eridius
If the attacker can do timing on their side in a pure function, then by
definition the time the response takes is one of the function inputs.

~~~
makomk
Haskell functions are only pure in an abstract sense that ignores micro-
architectural side-effects and the resulting timing changes. All of the recent
speculative side-channel attacks are the result of the abstraction layer
exposed by modern processors leaking. You can't fix this by putting more
abstraction layers with neat theoretical properties on top.

~~~
Symmetry
You can't be guaranteed to prevent the attacker from writing information to
those side channels, but you can guarantee that the attacker can't read from
those side channels, because reading from them requires doing timing analysis,
and timing analysis requires I/O -- which we've stipulated hasn't been granted
to the attacking function.

------
0x0
Status in Debian: [https://security-tracker.debian.org/tracker/CVE-2018-3665](https://security-tracker.debian.org/tracker/CVE-2018-3665)

~~~
maurom
Just what I was looking for. Guess we'll get a new security announcement.

Thanks

~~~
0x0
... for debian 8 only, though. Looks like current stable dodged that bullet
quite a while ago.

------
kanox
What is the impact of this exactly? If "FP State" just means floating point
register values then those rarely contain interesting stuff.

If this also affects the registers used for crypto acceleration, then it could
be used for stuff like leaking browser secrets from JavaScript, right?

~~~
cperciva
"FP State" in this case includes MMX and SSE registers; they're handled via
the same "lazy context switching" mechanism. Yes, you can steal keys which are
being used for AESNI.

Leaking secrets via javascript -- I'll be impressed if someone pulls that off.
There's a very tight timing window to exploit this before a trap fires and
flushes your pipeline, and I doubt you can get the right instructions in there
using javascript.

~~~
caf
Can't you execute the trapping access in a not-taken speculatively-executed
branch à la Meltdown?

~~~
cperciva
Sure, but that gives you an even shorter window because the pipeline flushes
as soon as the CPU realizes that it mis-speculated.

~~~
caf
It is my understanding that you can make that window very long by having the
mis-speculated branch depend on a value that has to be loaded from main
memory.
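The window-widening trick caf describes can be sketched in a few lines of C
(x86-specific; `gate` and the function name are invented for illustration,
and the branch body stands in for the trapped FPU access in a real exploit):

```c
#include <stdint.h>
#include <emmintrin.h>   /* _mm_clflush */

static uint64_t gate;    /* branch condition, flushed before use */

/* Flush the branch condition so it must be fetched from DRAM.  The
 * CPU predicts the branch and keeps executing the (possibly wrong)
 * body for the hundreds of cycles the load takes to resolve --
 * turning a few-cycle speculation window into a long one. */
int speculate_long(uint64_t limit) {
    _mm_clflush((const void *)&gate);  /* force a cache miss         */
    if (gate < limit) {                /* slow-to-resolve condition  */
        return 1;                      /* body runs speculatively    */
    }                                  /* for the whole DRAM latency */
    return 0;
}
```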

~~~
cperciva
Hmm, good point. Yes, that would probably make this much easier to exploit.

Unless the CPU does a partial pipeline flush upon realizing that there would
be a trap. I don't know -- do Intel CPUs check for exceptions in order?

~~~
caf
Well, they don't for #PF at least, because that's how Meltdown works - I'm not
sure if it works differently for #NM though.

------
andreiw
Do we have PoC code? Has anyone tried attacking FP/SIMD state on other ISAs
like Power or AArch64?

~~~
cperciva
I have exploit code -- took me about 5 hours to write after Theo announced all
the important details of the vulnerability. I'm not going to publish it yet,
though.

AFAIK other systems aren't affected -- is lazy context switching even a thing
on them? The fundamental issue here is that one process' data is still in
registers when another process is running and we've been relying on getting a
trap to tell us when we need to restore the correct FP state.
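The lazy scheme described above can be modeled in a few lines of C. This is a
toy simulation, not kernel code -- all names here are invented, and the real
x86 mechanism is the CR0.TS bit plus the #NM (device-not-available) trap:

```c
#include <stdbool.h>
#include <string.h>

/* Toy model of lazy FP context switching: on a task switch the kernel
 * merely arms a trap and leaves the previous task's values live in the
 * register file.  Only when the new task actually touches the FPU does
 * the trap fire and the real save/restore happen.  The vulnerability
 * window is the gap where task B runs while task A's data is still in
 * the registers -- reachable speculatively despite the armed trap. */

typedef struct { double regs[8]; } fpu_state;

static fpu_state hw_fpu;        /* the physical register file */
static fpu_state *fpu_owner;    /* whose data is in hw_fpu    */
static bool fpu_trap_armed;     /* models CR0.TS              */

void lazy_switch_to(fpu_state *next) {
    (void)next;
    fpu_trap_armed = true;      /* defer the save/restore     */
}

void on_fpu_trap(fpu_state *current) {   /* models the #NM handler */
    if (fpu_owner && fpu_owner != current)
        memcpy(fpu_owner->regs, hw_fpu.regs, sizeof hw_fpu.regs);
    if (fpu_owner != current)
        memcpy(hw_fpu.regs, current->regs, sizeof hw_fpu.regs);
    fpu_owner = current;
    fpu_trap_armed = false;
}

/* What task B could (speculatively) observe in the FPU right after a
 * lazy switch, before its first real FPU use takes the trap. */
double demo_leak_window(void) {
    fpu_state a = {{ 1.0 }}, b = {{ 2.0 }};
    hw_fpu = a;                       /* A was using the FPU        */
    fpu_owner = &a;
    lazy_switch_to(&b);               /* B scheduled; regs untouched */
    double visible = hw_fpu.regs[0];  /* still A's data: the window  */
    on_fpu_trap(&b);                  /* first FPU use: now restored */
    return visible;
}
```

Eager switching -- the fix Linux adopted in 4.6 -- simply does the
save/restore unconditionally at switch time, closing the window.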

~~~
monocasa
Lazy context switching is a thing on pretty much every architecture with an
FPU.

~~~
cperciva
Hmm, I thought most RISCy CPUs kept FP values in GP registers?

~~~
monocasa
No, I can't think of an arch that does that. Power, SH, Mips, Sparc, Alpha,
ARM, and RISC-V all have separate architectural register files for the
floating point state.

Some ARM ABIs end up passing floats in integer registers, but that's just for
compatibility for code that doesn't assume the presence of an FPU and might be
doing everything soft float.

~~~
cperciva
Hmm, ok. It's been a long time since I looked at that. Come to think of it, I
think it might be 20 years since I opened CA:QA...

~~~
andreiw
Yeah, I'd say that modern OoO Arm implementations (A57, A72, ...) are worth
trying to speculate into trapped VFP state. Lazy FPU is definitely a thing
everywhere.

My hunch says that chips affected by 4a could easily be fair game (4a is
speculating reads from privileged regs... I wonder if 4a would work on regs
that are trapped; not inconceivable)

------
jibanes
Is it fixable by microcode update?

~~~
cperciva
Very unlikely. The key issue here is that traps aren't handled until a long
way down the pipeline.

But it's a (relatively) simple OS fix.

------
sctb
We'll let this thread take the frontpage spot now that the announcement is
official. Previous one:
[https://news.ycombinator.com/item?id=17304233](https://news.ycombinator.com/item?id=17304233).

------
andreiw
What are we calling this Spectre variant?

~~~
cperciva
Either "LazyFP" or "Lazy FPU".

~~~
andreiw
Not enumerating it? Variant 6 is as good as anything...

