
More details about mitigations for the CPU Speculative Execution issue - el_duderino
https://security.googleblog.com/2018/01/more-details-about-mitigations-for-cpu_4.html
======
iforgotpassword
OK so this might be only partially related, but earlier today I checked out
the PoC code for Spectre and saw that it uses that instruction to discard a
cache line. That reminded me of a performance issue Ryzen had after launch
in some game I don't remember the name of, which was tracked down to the
same instruction for flushing a cache line. In that game's case, the
instruction was for some reason emitted by the VC++ compiler in a tight
loop, for a variable that was accessed within it. On Intel CPUs that didn't
do any harm, because they don't actually evict the line from cache but just
mark it as least recently used, while Ryzen actually invalidated the cache
line and had to repeatedly load the data back from RAM.

Back then I already wondered how many valid use cases there are for this kind
of instruction, apart from synthetic benchmarks. And even if you can use it
to make an algorithm faster, how many times is it used incorrectly because
someone just thought it would be better without benchmarking properly (or
because some compiler messes it up)?

So my (naive) thought back then was "why not just no-op all those cache-
related instructions in the CPU itself?" Then during 2017 I saw a talk about
using these very same instructions to create a side channel between two VMs
in Amazon's cloud, and now this. I am aware you could still make sure the
data is not cached by just thrashing the cache (which would make this attack
much slower than it already is), but really, what have these flush and
prefetch instructions ever done for us?

~~~
cesarb
> what have these flush and prefetch instructions ever done for us?

Flushing the cache is important when the data _must_ be in the main memory,
for instance when an I/O device does DMA to it and is not coherent with the
caches. As for prefetch instructions, they are hard to use correctly, but in
some cases can make an algorithm much faster.

~~~
Unklejoe
Aren't most modern CPUs cache coherent? I know the recent PowerPC SoC I was
working with was. All of the DMA cache-sync functions were no-ops in the
Linux kernel for that architecture.

~~~
noselasd
On many architectures, coherency is turned off and managed in software by
the device driver, to achieve better performance.

~~~
Unklejoe
Any examples of this in the Linux kernel? Not doubting you, I'm just curious.
I would think this would be device driver specific rather than architecture
specific.

Most drivers I've seen just use the kernel's DMA API (dma_alloc_coherent),
which is guaranteed to provide a buffer that doesn't require any explicit
cache management. If the architecture is coherent, then it leaves caching
turned on for those pages. If it's non-coherent, then it turns it off.

------
renchap
> In our own testing, we have found that microbenchmarks can show an
> exaggerated impact.

This. All the talk about a 30% (or even 300%!) impact, based on a graph
without any details or methodology, often itself based on a microbenchmark
or a tweet that went viral, is really not helping.

~~~
acdha
> a tweet that went viral is really not helping.

I especially liked the breathlessly circulated AWS forum post from someone
claiming that their business-critical workload no longer fits on a single
m1.medium (1 vCPU, <4GB RAM), joined by someone else who was actually
swapping and hitting the OOM killer.

There is a real impact but this is going to be the lazy IT worker's new
plausible excuse of the day for months…

~~~
PuffinBlue
That particular AWS forum post doesn't support your argument, especially
since Matt, the AWS rep, specifically acknowledged in that very thread that
the update AWS had applied had had an effect on performance.

It doesn't matter what the workload was running on before, or whether the
user was lazy; the key takeaway from that post was that there _was_ an
impact on performance caused by the Meltdown/Spectre mitigations.

The fact that another commenter barged into the thread with an unrelated
issue simply shows that the AWS forum operates within the Law of Forums
(which states that all threads are to be cross-pollinated with unrelated
issues from at least one other user).

EDIT: I should say that if this fellow is the only affected user out of all
the computer users in the world, then the mitigations applied can be called
successful. That seems unlikely :-)

~~~
acdha
I think you missed my point, which was simply that any service which you care
about should be running on n > 1 servers and with at least enough capacity
planning so that a small percentage change in workload doesn't cause user-
visible failures. That's especially true in an environment like EC2 where
hosts fail and noisy neighbors can cause service degradation.

In this case, the thread made it clear that the real problem was that they
didn't have a good deployment story — notice how the original poster was
mentioning needing to move everything to an m3.medium manually? I would be
quite surprised if they haven't had other problems in the past (e.g. what
happens when a system update kicks off if that server is already running at
90+% utilization? or when they get more users and/or the existing users start
doing slightly more) but hadn't wanted to deal with the hassle of migrating.

------
chrononaut
This post seems to contradict Intel's recent release:

> _Intel has developed and is rapidly issuing updates for all types of Intel-
> based computer systems — including personal computers and servers — that
> render those systems immune from both exploits (referred to as “Spectre” and
> “Meltdown”)_

That states that Meltdown is specifically addressed and mitigated. However,
this post by Google does not indicate that "Variant 3", or "Meltdown", can be
addressed by a microcode update:

> _Mitigating this attack variant requires patching the operating system. For
> Linux, the patchset that mitigates Variant 3 is called Kernel Page Table
> Isolation (KPTI). Other operating systems /providers should implement
> similar mitigations._

The option of applying a microcode update is explicitly called out for their
mitigation of "Variant 2":

> _Mitigating this attack variant requires either installing and enabling a
> CPU microcode update from the CPU vendor (e.g., Intel 's IBRS microcode),
> ..._

Am I misreading one of these statements?

~~~
m_mueller
Reading through all this stuff I had similar thoughts. Statements by Google
are the most trustworthy at this moment IMO, and as far as I can tell they
don't square with what AMD and especially Intel state.

------
_stephan
I'm curious whether the Retpoline mitigation will still be
necessary/recommended for user applications (that don't operate as a JIT or
interpreter) once the kernel and CPU mitigations for Spectre that are
currently in the works have been applied.

~~~
dingo_bat
After the Intel microcode update it shouldn't be. At least according to Intel.

------
leonardinius
And now I have to wonder about the "negligible performance implications".

Does it mean Google applied targeted, point-and-shoot fixes in several
areas, e.g. the OS kernel and the hypervisor? Most likely.

Or did speculative execution not provide meaningful performance benefits in
the first place, at least for Google's workloads? If so, why all this extra
complexity?

~~~
jpatokal
The mitigations don't involve disabling speculative execution entirely, only
adding safeguards to how it's done.

------
jblz
The list linked at the bottom is pretty insightful for Google products &
their relevant mitigations:

> You can learn more about mitigations that have been applied to Google’s
> infrastructure, products, and services here[1].

Confirms the `chrome://flags/#enable-site-per-process` flag is useful here &
sure enough, the 2018-01-05 SPL was waiting for me on my Pixel when I looked.

1\.
[https://support.google.com/faqs/answer/7622138](https://support.google.com/faqs/answer/7622138)

------
noselasd
So I have read the meltdown.pdf paper, but I don't quite understand why/how
it bypasses the kernel/usermode checks of the page table. Does anyone have a
good explanation?

Furthermore, the paper mentions being unable to reproduce it on AMD and ARM,
but has some suggestions of things to try to make the exploit work. Other
sources, including AMD itself, claim not to be vulnerable to Meltdown. Does
anyone know the technical reasons why, and what is different compared to
Intel CPUs?

~~~
BraveNewCurency
Basically, you have some code that says "go read a byte from kernel memory. if
the high bit of that byte is true, then access page X of memory".

Normally, that code will just error out right away.

But if you add a new branch before the code (such that the branch is taken to
avoid the code, but the CPU predicts the branch to NOT be taken), the CPU will
speculatively execute the above code just past your branch. The speculative
execution doesn't check for memory violations (because that takes time).

Normally, that's cool: if the new branch IS taken, there is no harm because
the result of the (bad) kernel access will be thrown away. If the new branch
IS NOT taken, the CPU notices the bad access and complains.

But if you are extra devious, you can ensure that page X is NOT cached when
running this code. After, you check if page X suddenly got cached. That tells
you the value of the high-bit of your kernel memory. Keep scanning all the
bits and you can read out all of kernel memory.

~~~
RachelF
An excellent, succinct explanation!

------
tzs
How are GNU Hurd, Minix, and other microkernel systems affected by these
issues? I would expect that they would have less sensitive information in
kernel memory, and so exploits to read kernel memory would not be as dangerous
as on systems with a monolithic kernel.

~~~
noselasd
It's not so much about sensitive info residing in the kernel, but that the
kernel has an identity mapping of the entire physical memory. Thus if you
can read kernel memory, you can dump all RAM, where secrets from other
processes or virtual guests reside.

I don't know if Minix or Hurd maps all of RAM into the kernel address space,
though (or whether they map the kernel address space into each user-space
process, as the exploit also requires).

------
saganus
A bit off topic, but I asked the same question in a thread that got delisted
as [dupe], so I'm reposting here in the hope someone can enlighten me:

I've been reading up a bit on these attacks and I was wondering if there are
any particular requirements to implement them in an arbitrary language?

For example, can you implement the attack with Java but without JNI? i.e. are
syscalls required to be able to leverage the exploit?

~~~
talyian
The requirements are easy to meet - you just need to be able to time
operations at a suitable precision. The exploits are possible in JavaScript
running in a browser.

~~~
saganus
Yeah, I saw the PoC in the Spectre paper, but I was wondering if a JVM
language could meet those requirements.

I have absolutely no idea if using the JVM would, for example, mess with the
required precision, since I'm guessing one would need to use JNI/JNA to get
the timing, and that might not be suitable.

~~~
peoplewindow
Java provides access to high resolution timers without JNI and optimises them
very well.

------
j_coder
I suspect a better solution than KPTI would be to evict all user-space pages
from the cache when an invalid page access happens, if the fault was caused
by reading/writing kernel-space pages. My kernel days were so long ago that
I don't know if it is possible.

Massive performance hit, but only for misbehaved software. Well-behaved
software would not have the performance hit of KPTI.

The kernel could even switch dynamically to KPTI if it sees too many such
read/write attempts from user space.

~~~
corsix
Implementations of meltdown do not need to trigger a page fault (because the
instruction which would fault can be made to execute speculatively - in
addition to the instruction which leaks information into the cache executing
speculatively). Accordingly, there would be nothing for the kernel to observe
or respond to.

~~~
j_coder
I thought that:

    mov rax, [Somekerneladdress]

would trigger an interrupt even on speculative execution, as described on
[https://cyber.wtf/2017/07/28/negative-result-reading-
kernel-...](https://cyber.wtf/2017/07/28/negative-result-reading-kernel-
memory-from-user-mode/)

ADDED: So in the interrupt handler the kernel could evict all user-space
pages from the cache before returning control to user space, so it could not
use the timing attack on the cache left by the speculative execution of
mov rbx, [rax+Someusermodeaddress].

~~~
dmitrygr
And what if it was preceded with

       cmp $0, [some_readable_but_uncached_addr_containing_zero]
       je some_safe_location
       //now the exploit
       mov rax, [somekerneladdr]
       ...the rest of it...

The CPU may speculatively execute past the "je" and speculatively do the
load; no fault is generated.

~~~
j_coder
So it's game over here, unless Intel can change the microcode to force a
page fault in this case.

~~~
bstx
It doesn't make sense for speculatively executed code to throw architecturally
visible exceptions. The appropriate behavior would be to not perform
speculative loads across protection domains (i.e. the behavior of AMD
implementations).

~~~
j_coder
It would make sense if it were the only alternative, as the kernel could
handle it. The appropriate behavior is to remove all traces of the
speculative execution, including the cache fills.

~~~
GrayShade
Is that even possible? The data that would need to be removed from the cache
has already evicted other cache lines, and re-fetching those might have
observable effects, like timing differences.

------
SapphireSun
How do you go from knowing the location of memory to actually doing an attack
if the memory is read protected? That's the part I don't get.

------
XorNot
So, kind of left field, but what are the theoretical effects of these
exploits on the Mill?

Would it be vulnerable in the same way?

~~~
zeotroph
See [https://millcomputing.com/topic/security-on-mmooo-
processors...](https://millcomputing.com/topic/security-on-mmooo-processors-
is-hard/) and [https://millcomputing.com/topic/meltdown-and-
spectre/](https://millcomputing.com/topic/meltdown-and-spectre/)

The Mill has portals (possibly) gating everything, making microkernel design
easy and fast, so Meltdown would not work. Spectre, I'm not so sure; there
are claims the NaR (Not a Result) mechanism might prevent it.

I bet this will come up in the next talk, looking forward to it.

~~~
XorNot
Honestly, I'm just thinking that if there ever was a moment to break into
the CPU market... I'd say this is it. Half the planet is going to be looking
for architecture updates which can be proven immune to this type of
vulnerability.

I'm sure not going to be buying any new CPUs until some new architectures
come out which remove this (or make the mitigations performant).

------
benjaminjackman
Google Security Blog ... requires javascript enabled to read ...

~~~
thesandlord
Works fine for me with JS disabled. In fact it loads even faster...

[https://imgur.com/a/Hbyj8](https://imgur.com/a/Hbyj8)

~~~
the8472
Alas, content blockers won't put a site into noscript mode (which shows the
<noscript> tags), because they only block requests.

The site breaks if you block their JS.

~~~
benjaminjackman
That makes sense; maybe content blockers could be configured to also "run"
noscript tags.

