
Linux 5.8 Set to Optionally Flush the L1d Cache on Context Switch - blopeur
https://www.phoronix.com/scan.php?page=news_item&px=L1d-Cache-Flushing-Queued
======
rayiner
I'm not that old, but I've been closely following these trends since the late
1990s, and it seems to me like we are descending into madness now. We are
wiping out years of single-core performance gains--which have been hard to
come by over the last decade to begin with--through all of these mitigations.
It seems to me like maybe the mental model is broken. If untrusted code is
running on the same core/package/what have you, your security has already been
breached.

~~~
ghshephard
You run untrusted code every day you browse the web.

~~~
asveikau
One of the recent times I pointed this out, somebody retorted that the
browsers got rid of JS access to high-resolution timers, making a timing
attack infeasible. Is this true?

~~~
georgyo
Not true; at most it increases the number of samples required. But there are
other ways to simulate timers with adequate resolution.
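
A rough simulation of the "more samples" point, as a sketch rather than an
attack: it assumes a clock that only advances in 100 us ticks, measurements
that start at a random phase of the tick, and two made-up operations taking
3 us and 5 us. Every name and number here is hypothetical. A single reading
is either 0 or 100 us, but the average over many readings converges on the
true duration.

```python
import random

def coarse_elapsed(true_us, tick_us, rng):
    """One reading of a true_us-long operation from a clock with tick_us steps.
    The operation starts at a random phase within a tick."""
    start = rng.uniform(0.0, tick_us)
    end = start + true_us
    # Round both timestamps down to the last tick, as a coarse timer would.
    start_tick = (start // tick_us) * tick_us
    end_tick = (end // tick_us) * tick_us
    return end_tick - start_tick

def mean_elapsed(true_us, tick_us, samples, seed):
    """Average many coarse readings of the same operation."""
    rng = random.Random(seed)
    total = sum(coarse_elapsed(true_us, tick_us, rng) for _ in range(samples))
    return total / samples

fast = mean_elapsed(3.0, 100.0, 100_000, seed=1)
slow = mean_elapsed(5.0, 100.0, 100_000, seed=2)
print(f"fast ~ {fast:.2f} us, slow ~ {slow:.2f} us")
```

The coarser the clock, the more readings are needed for the two averages to
separate, which is why reduced timer resolution raises the cost of a timing
attack without ruling it out.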

~~~
mahkoh
Please post an example showing that this works. I'll run the example with
OS mitigations disabled to confirm.

~~~
dijit
This is a browser checker that does not use the high precision timers:
[https://jsfiddle.net/lukelol/43015xpv/1/](https://jsfiddle.net/lukelol/43015xpv/1/)

~~~
mahkoh
It shows "Browser not exploitable." even after rebooting with mitigations=off.

~~~
dijit
Congratulations!

Now pray that there is not another exploit that can be used. :)

Like many things, security is an onion: there are layers to it, and removing
some of those layers can be fine because the outer layers may protect you, but
ultimately you just increase risk.

~~~
mahkoh
Why did you even post if you were going to resort to "but can you prove that
there isn't an exploit?" anyway? I could have saved myself two reboots if you
had started with that.

~~~
dijit
I assume good faith, you asked for an exploit that works without precision
timers to test, I provided one.

Don’t get mad about the fact it didn’t work, I’m glad it didn’t work, but you
can’t walk away with the knowledge that something like that will _never_ work.

I’m not sure what you were trying to prove, but you can’t tell people to prove
you wrong without any knowledge of what browser you’re running with, or what
version or what you’re expecting to see.

A “fully working” exploit for the most modern browser is probably possible
frankly, but it’s not something that anyone is looking at with seriousness
because everyone has mitigations enabled anyway. It’s the very definition of
high work, low reward.

~~~
mahkoh
>I assume good faith, you asked for an exploit that works without precision
timers to test, I provided one.

No you didn't. You posted some code that doesn't do anything.

>I’m not sure what you were trying to prove, but you can’t tell people to
prove you wrong

I didn't ask anyone to prove me wrong. I asked georgyo to prove his claim that
an exploit is possible. You chimed in and posted some nonsense code which
obviously does nothing and never did anything, even before Meltdown and Spectre
were mitigated anywhere, because even the original native PoCs were much more
complicated.

>A “fully working” exploit for the most modern browser is probably possible
frankly

Stop talking out of your ass.

~~~
dang
You broke the site guidelines badly here. If you wouldn't mind reviewing
[https://news.ycombinator.com/newsguidelines.html](https://news.ycombinator.com/newsguidelines.html)
and sticking to the rules when posting here, we'd be grateful.

------
saagarjha
> This flushing does address CVE-2020-0550 for snoop-assisted L1 data sampling
> but the main emphasis seems to be on the "yet to be discovered
> vulnerabilities."

I am unsure whether this is just idle speculation (heh) that there may be
issues in this area, or whether there are issues that have been disclosed to
vendors but not to the public yet?

~~~
snazz
Yes, this is probably another coordinated disclosure where we see evidence in
Linux kernel commits first. I noticed that Apple hasn't released the security
credits page for iOS 13.5 yet; is that a normal delay or are we waiting on the
disclosure of another processor bug (that presumably also affects ARM)?

~~~
0x0
Maybe; the notes will probably be published when macOS 10.15.5 drops (this
week?). But who knows if the new 0day in the unc0ver jailbreak will shake
things up.

~~~
saagarjha
They're both out now.

~~~
0x0
Just as predicted :)

------
enitihas
The performance implications of this would be huge, and I hope it remains
opt-in for a long time.

~~~
lmilcin
I don't see how the implications will be huge.

This is L1d cache which is just 48kB for Ice Lake. We are also talking about
context switches which are not happening very frequently. Applications that
are generating load don't context switch all the time because they are busy
doing work.

Then, when you context switch, it is likely the context to which you are
switching would like to use that cache for something. By the time we switch
back to your original thread it is very likely L1d has already been filled
with something else.

I am pretty sure you would not notice anything except for very special, rare
situations.

~~~
ollien
What do you mean by "not happening very frequently?" The default timeslice is
something on the order of 100ms, isn't it? And that's if the process doesn't
yield. Clearing L1d every 100ms (at worst) seems pretty frequent to me.

~~~
magicalhippo
> The default timeslice is something on the order of 100ms, isn't it?

Consider that a single core on a modern CPU running at 2 GHz can execute over
20k instructions in those 100ms.

~~~
sgerenser
200 million instructions in 100ms (2 GHz for 0.1 s at one instruction per
cycle). More if IPC is >1.

~~~
magicalhippo
And that's why I shouldn't do math at night...

Anyway, 100ms is quite a lot in the life of a modern CPU.
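
Working the numbers through, as a back-of-the-envelope sketch: the 2 GHz
clock, the 100 ms timeslice and the 48 kB L1d come from this thread, while the
64-byte line size and the 14-cycle refill cost per line are assumptions (a
rough L2 hit latency with no overlap between misses). It only estimates the
cost of refilling an emptied L1d, not the cost of the flush operation itself.

```python
CLOCK_HZ = 2_000_000_000       # 2 GHz, as quoted in the thread
TIMESLICE_MS = 100             # the timeslice figure under discussion
L1D_BYTES = 48 * 1024          # 48 kB L1d quoted for Ice Lake
LINE_BYTES = 64                # assumption: 64-byte cache lines
REFILL_CYCLES_PER_LINE = 14    # assumption: rough L2 hit latency, fully serialized

# Cycles available in one timeslice.
cycles_per_slice = CLOCK_HZ * TIMESLICE_MS // 1000

# Number of cache lines that make up the whole L1d.
lines = L1D_BYTES // LINE_BYTES

# Pessimistic cost of pulling every line back in from L2, one miss at a time.
refill_cycles = lines * REFILL_CYCLES_PER_LINE

fraction = refill_cycles / cycles_per_slice
print(f"{cycles_per_slice:,} cycles per slice, {lines} lines, "
      f"{refill_cycles:,} refill cycles ({fraction:.4%} of the slice)")
```

Under those assumptions the refill is a tiny fraction of a 100 ms slice; the
picture changes for workloads that context switch far more often than that.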

~~~
lmilcin
Even 1ms is a lot. I have some experience with algorithmic trading. The
application took messages off the network, processed them and responded to
the market within 5 microseconds. That's 1/200th of 1ms. This was measured on
a special type of switch
([https://en.wikipedia.org/wiki/Cut-through_switching](https://en.wikipedia.org/wiki/Cut-through_switching)).

Lots of stuff happens during those 5us. The message is read from the network
device (directly by the application, no Linux or syscalls anywhere during
those 5us). Then it is parsed, deduplicated (multiple multicast channels carry
redundant copies of the messages) and uncompressed (the payload is compressed
with zlib); the uncompressed payload is parsed and interpreted (there are
multiple types of messages). Business logic is executed to update the state of
the market in memory and then to generate signals to listening algorithms. The
algorithm is run to figure out whether it wants to execute an order. The order
is verified against a decision tree (for example, to check that it does not
exceed the available budget). The market order packet is created and sent over
TCP.

Now imagine, all that stuff happens in 1/200th of 1ms. In comparison,
transferring 48kB from L2 or L3 to L1 is pretty damn insignificant.
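
For readers who have not seen this kind of feed handling, the deduplicate and
uncompress steps described above can be sketched as follows. This is an
illustration only: the `(sequence number, zlib payload)` packet shape, the
`handle_packets` name and the text messages are hypothetical, not the actual
system, and Python would come nowhere near a 5us budget.

```python
import zlib

def handle_packets(packets):
    """Drop redundant copies by sequence number, then inflate and decode."""
    seen = set()
    messages = []
    for seq, compressed in packets:
        if seq in seen:
            # Redundant copy that arrived on another multicast channel.
            continue
        seen.add(seq)
        payload = zlib.decompress(compressed)
        messages.append((seq, payload.decode("ascii")))
    return messages

# Two hypothetical channels carrying the same two messages.
feed_a = [(1, zlib.compress(b"BID 100")), (2, zlib.compress(b"ASK 101"))]
feed_b = [(1, zlib.compress(b"BID 100")), (2, zlib.compress(b"ASK 101"))]

# Interleave the channels the way copies might arrive on the wire.
packets = [pkt for pair in zip(feed_a, feed_b) for pkt in pair]
messages = handle_packets(packets)
print(messages)
```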

------
Trellmor
Last week Microsoft also rereleased the Intel microcode updates package
[1][2]. I kinda expect to see a new CPU flaw in the next few days.

[1] [https://support.microsoft.com/en-us/help/4497165/kb4497165-i...](https://support.microsoft.com/en-us/help/4497165/kb4497165-intel-microcode-updates)

[2] [https://www.windowslatest.com/2020/05/21/windows-10-kb449716...](https://www.windowslatest.com/2020/05/21/windows-10-kb4497165-update-released/)

------
Zenst
So with more cores and associated L1 cache, context switching would
potentially be less frequent, I would have thought; a small but maybe
measurable difference.

Interestingly enough:
[https://www.theregister.co.uk/2020/05/24/linus_torvalds_adop...](https://www.theregister.co.uk/2020/05/24/linus_torvalds_adopts_amd_threadripper/)

------
Zenst
Maybe it's time to redo the ring model, with multi-core CPUs now the norm.

Whether one or two CPU cores could be dedicated to the OS and locked out of
user space entirely would certainly be worth exploring.

~~~
wtallis
In general, dedicating some cores to userspace and some to kernel would mean
sending a lot more data up to L3 for core-to-core communication, and I'm not
sure that would be any better than flushing L1d. The exception would be with
SMT, but then you're locked into a 1:1 ratio for user and kernel virtual
cores, and still have to worry about side channel vulnerabilities.

~~~
Zenst
That is true, which does somewhat put a whole new perspective on this: will
they flush L2 and L3 next?

It would be nice, though, to have a properly isolated core or two for the OS;
after all, that is exactly what is done for enclave-based security and
management systems, though not all of them have a great track record.

------
ajsnigrutin
So... when will we get part of our money back, due to performance losses?
(I'm talking about a case like Volkswagen.)

~~~
KCUOJJQJ
>part of our money back

A replacement CPU would make more sense. It should support the same operating
systems and motherboards. I wonder if Intel would be asked by a lot of people
to replace their CPUs.

~~~
madars
If history is any guide, Intel took a $475M pre-tax charge against earnings
when it did the Pentium FDIV replacements:
[https://en.wikipedia.org/wiki/Pentium_FDIV_bug](https://en.wikipedia.org/wiki/Pentium_FDIV_bug)
. But I assume most people are more likely to raise a ruckus over potentially
invalid computations than over subtle, potentially leaky computations.

------
rasz
Thanks Intel.

