
The Performance Cost Of Spectre/Meltdown/Foreshadow Mitigations On Linux - rbanffy
https://www.phoronix.com/scan.php?page=article&item=linux-419-mitigations&num=1
======
captainmuon
Maybe we should just give up on running untrusted code safely and write it
off as a failed experiment.

For example, 90% of the JavaScript I run is for advertising purposes. The
pragmatic solution would be that either everybody moves to self-hosting ads,
or that the necessary telemetry for third parties is moved into a browser
function. Web pages (that are not applications) should not need a "Turing-
complete" language (in the vulgar sense of the term).

If you have real applications - like GMail or Facebook - you should run them
in a sandbox, but they should not be treated differently from native apps.
Yes, a native app could use Meltdown etc. to attack other apps, but it can
usually compromise you far more easily, since most desktop apps are not
sandboxed. It can just read the memory of other apps or abuse accessibility
functions (see the sketch below). My point is, I don't _expect_ MS Word or
Photoshop to be perfectly sandboxed. I explicitly trust the vendor when I
install their app. If we really _trusted_ what we trust, and really
_distrusted_ what we don't, a lot would be won.
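
To make that concrete: a minimal sketch of one same-user process reading
another's memory on Linux via process_vm_readv (assuming the permissive
default ptrace scope; the pid and address arguments are placeholders):

    #define _GNU_SOURCE
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <sys/uio.h>   /* process_vm_readv */

    int main(int argc, char **argv)
    {
        if (argc != 3) {
            fprintf(stderr, "usage: %s <pid> <hex-addr>\n", argv[0]);
            return 1;
        }
        pid_t pid = (pid_t)atoi(argv[1]);
        void *addr = (void *)(uintptr_t)strtoull(argv[2], NULL, 16);
        char buf[64];

        struct iovec local  = { buf,  sizeof buf };
        struct iovec remote = { addr, sizeof buf };

        /* Reads the target's memory directly; permitted for same-user
           processes when /proc/sys/kernel/yama/ptrace_scope is 0. */
        ssize_t n = process_vm_readv(pid, &local, 1, &remote, 1, 0);
        if (n < 0) { perror("process_vm_readv"); return 1; }
        printf("read %zd bytes from pid %d\n", n, pid);
        return 0;
    }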

Of course, I know that the world is not set up rationally and we'll continue
to be compelled to trust untrustworthy code on our PCs, so for the time being
it looks like we're stuck with the mitigations.

~~~
frankzinger
> desktop apps are not sandboxed. It can just read the memory of other apps

Care to elaborate? What about process isolation?

------
galadran
I guess we know why Intel tried to slip in that benchmarking restriction.
Serving static content from Nginx shows a 22% drop on Intel versus a 7% drop
on AMD. That's huge for cloud providers!

~~~
GrayShade
The microcode update enabled an optimization (VMX conditional cache flushes)
for virtual machine hosts. It was just a lawyer having fun, I guess.
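
(That "conditional cache flushes" wording comes straight from the kernel's
L1TF status reporting, if you want to check a host with kvm_intel loaded; the
second file is the module parameter that controls it:)

    cat /sys/devices/system/cpu/vulnerabilities/l1tf
    cat /sys/module/kvm_intel/parameters/vmentry_l1d_flush  # never/cond/always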

~~~
yoklov
The theory I heard, which seems more likely than a rogue lawyer, is that the
clause was accidentally left in from the license of a pre-release test
version -- I can understand why they'd want to prevent benchmarks of
unreleased software (not that I necessarily agree with that).

------
samfisher83
AMD seems to fare better with the patches, which is good for AMD. I remember
when AMD64 came out and AMD had a lead on Intel. It's good to have some
competition.

~~~
vlovich123
Except Intel with mitigations is still much faster on most workloads than AMD
without mitigations.

~~~
zrm
This isn't Bulldozer anymore. Intel has faster single-threaded performance on
some workloads, but the difference for "most workloads" is modest, and for
the same money AMD is offering more cores, more memory channels and more I/O.
For threaded (i.e. server) workloads it has been a solid choice even before
these mitigations started eating into Intel's single-threaded performance
lead.

~~~
ebikelaw
A 2-socket AMD setup has the same NUMA topology as an 8-socket Intel machine,
and because of that the performance is terrible on many workloads.

~~~
zrm
There are a lot of things that have to go the wrong way at once to get to the
point where that really matters.

The first is that your working set doesn't fit in the processor caches, so
you take regular cache misses to main memory -- but most of the Epyc line has
64MB of L3 cache.

Then the access pattern has to be random rather than sequential, which knocks
out a major class of the applications satisfying the first criterion (all the
ones that process big files in sequential order).

Then the operating system scheduler has to fail to schedule the process on a
core in the same node as its data, most commonly because you have a process
with more active threads than there are threads per node.

What you're left with is, basically, large databases. But large databases also
benefit significantly from more cores, memory channels and I/O. Which factor
dominates is going to depend on specific usage, e.g. a database with randomly
accessed individual bits will be more sensitive to latency whereas one
containing pictures or other medium-large blocks of data will be more
sensitive to memory bandwidth.

You can certainly find a worst-case usage pattern for one or the other but in
general they're going to counterbalance each other.
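
And on the occasions when it does matter, you don't have to leave it to the
scheduler; a sketch with numactl (the ./mydb binary is just a stand-in):

    # pin both the threads and their allocations to NUMA node 0
    numactl --cpunodebind=0 --membind=0 ./mydb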

~~~
ebikelaw
I don't think you really understand the architecture of the machine. EPYC
looks on paper like it has a large L3 cache, but it consists of separate L3
caches per "core complex", of which there are two per die and four dies per
package. So what you've actually got is a bunch of redundant 8MB caches,
which is not the same thing. Because of the baroque topology, especially when
you have two sockets, access to main memory varies from almost-as-fast-as-
Xeon to way-way-slower-than-Xeon. Combined with the small caches it's a total
disaster.

~~~
zrm
The amount of L3 per thread is still 64MB/#threads. Where the difference
you're describing matters most is single-threaded code, where in theory the
one thread could otherwise have the entire 64MB. But that isn't the situation
where NUMA latency bites anyway: if there is only one thread, the OS can
schedule it on the same node as its data.

Most working sets fit in even 8MB (or less) -- the reason for 64MB is to
provide for multiple threads. In that case, if one thread isn't using its
proportionate share, there are seven others that can use it. Sharing with
sixty-three instead would be "better", but at some point it's diminishing
returns.
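
To put numbers on it, taking a 32-core Epyc 7601 as the example: 4 dies x 2
CCXes x 8MB = 64MB of L3, and 64 threads / 8 CCXes = 8 threads sharing each
8MB slice -- which works out to the same 1MB per thread as dividing 64MB by
64.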

------
twtw
KPTI hurts (a lot). Syscall latency is dramatically increased with KPTI, in
large part due to the TLB flush typically required when switching between
user and kernel page tables.

This is probably why AMD CPUs don't see as big a drop: KPTI is not (urgently)
necessary on AMD CPUs.
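
This one is easy to see for yourself: a minimal sketch that times a cheap
syscall in a tight loop. Run it with KPTI on, then boot with pti=off and run
it again; the difference is almost pure kernel entry/exit overhead:

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>
    #include <sys/syscall.h>

    int main(void)
    {
        const long iters = 10 * 1000 * 1000;
        struct timespec t0, t1;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (long i = 0; i < iters; i++)
            syscall(SYS_getpid);  /* trivial syscall: cost is mostly entry/exit */
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double ns = (t1.tv_sec - t0.tv_sec) * 1e9
                  + (t1.tv_nsec - t0.tv_nsec);
        printf("%.1f ns per syscall\n", ns / iters);
        return 0;
    }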

------
alkonaut
Regardless of whether the impact is big or small, you are stuck with it if
you need the security. But a 5-10% performance hit is completely unacceptable
for the cases where you don't care (much) about security, such as gaming.
Given how hard it is to keep my Windows machine's updates and background
tasks from eating my CPU while I'm gaming anyway: should people start booting
into a separate partition with a legacy OS that lacks these mitigations, and
without a lot of the unnecessary background tasks? 10% more CPU might not be
enough (depending on where the bottleneck is), but for 10% higher FPS I'd do
it.

~~~
saati
Games are rarely CPU-bound and don't do much I/O, so they are much less
affected by the mitigations than server workloads.

~~~
bryanbuckley
I suppose. Though, personally, I have to manually toggle the Meltdown patch
(and reboot Windows) to be able to play certain games effectively. One game
that I play frequently has a benchmark launcher, and it shows a 40%
performance hit from the Meltdown patch...
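
(For anyone curious, the toggle on Windows is the registry override from
Microsoft's KB4073119 guidance -- roughly the following, plus a reboot; the
3/3 values turn off both the Spectre v2 and Meltdown mitigations:)

    reg add "HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management" /v FeatureSettingsOverride /t REG_DWORD /d 3 /f
    reg add "HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management" /v FeatureSettingsOverrideMask /t REG_DWORD /d 3 /f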

------
baq
this is very useful to know if i've already got a server or ten thousand, but
unfortunately the data doesn't help me make a purchase decision. does amd
overtake intel performance-wise after mitigations, or is the gap only
reduced?

~~~
GrayShade
See [http://openbenchmarking.org/result/1808296-RA-SMELTDOWN82&ob...](http://openbenchmarking.org/result/1808296-RA-SMELTDOWN82&obr_imw=y) for the full data set.

------
ams6110
tl;dr: 2%-20% performance degradation depending on the specific task; AMD
EPYC is less affected (sometimes significantly) than Xeon.

------
1996
What is the best way to disable the mitigations if you don't care about
security but care about performance?

~~~
ams6110
Don't update your OS/Microcode.
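
On Linux you don't have to go that far: 4.19-era kernels expose per-
mitigation boot switches. A sketch, assuming an Intel box:

    # kernel command line additions (e.g. in /etc/default/grub):
    pti=off spectre_v2=off spec_store_bypass_disable=off l1tf=off

    # then verify which mitigations remain active:
    grep . /sys/devices/system/cpu/vulnerabilities/*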

~~~
pritambaral
> Don't update your OS

Sounds like a good way to stay exposed to every other type of vulnerability.

~~~
shakna
It is, but if you care only about performance, and not security, you don't
care.

~~~
pritambaral
Let me rephrase: sounds like a good way to stay exposed to every other type
of vulnerability, and to never receive any performance improvements.

