
Meltdown Initial Performance Regressions - natbobc
http://www.brendangregg.com/blog/2018-02-09/kpti-kaiser-meltdown-performance.html
======
mrep
> Applications that have high syscall rates include proxies, databases, and
> others that do lots of tiny I/O. Also microbenchmarks, which often
> stress-test the system, will suffer the largest losses.

My team's RDS instances got hit hard with a 40% increase in CPU usage:
[https://imgur.com/a/khGxU](https://imgur.com/a/khGxU)

~~~
vanderZwan
> _Also microbenchmarks, which often stress-test the system, will suffer the
> largest losses._

Tangent: this only seems like more proof that we should not rely on them as
developers, no?

~~~
pdpi
It’s proof they serve a specific purpose. If the microbenchmark is especially
affected by the effect you’re trying to measure, it provides an upper bound of
sorts for the effect’s magnitude. This informs the rest of the analysis.

------
scott_s
The syscall in his benchmark made me laugh
([https://github.com/brendangregg/Misc/blob/master/s1bench/s1b...](https://github.com/brendangregg/Misc/blob/master/s1bench/s1bench.c#L124)):

    
    
    close(999);  // the syscall (it errors, but so what)

~~~
ajross
It's surprisingly difficult to come up with a syscall that is guaranteed to
enter the kernel and come back out on a fast path. Old choices for this were
things like getpid() or gettimeofday() which are now handled in the VDSO. This
seems fairly clever, honestly.
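
A minimal sketch of that kind of measurement, not the actual s1bench code: it times a loop of `close(999)` calls and reports the average nanoseconds per call. The helper name `ns_per_failed_close` is made up for illustration, and it assumes fd 999 is not open in the calling process, so every call enters the kernel and fails with EBADF.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: average wall-clock nanoseconds per close(999) call.
 * Assumption: fd 999 is not open, so each call enters the kernel, fails
 * with EBADF, and returns quickly. Returns -1.0 if that assumption does
 * not hold, if iterations <= 0, or if the clock cannot be read. */
double ns_per_failed_close(long iterations)
{
    struct timespec start, end;

    if (iterations <= 0)
        return -1.0;

    /* Refuse to run if fd 999 is actually open: we do not want to close it. */
    if (fcntl(999, F_GETFD) != -1 || errno != EBADF)
        return -1.0;

    if (clock_gettime(CLOCK_MONOTONIC, &start) != 0)
        return -1.0;
    for (long i = 0; i < iterations; i++)
        close(999);                 /* the syscall under test */
    if (clock_gettime(CLOCK_MONOTONIC, &end) != 0)
        return -1.0;

    double elapsed_ns = (double)(end.tv_sec - start.tv_sec) * 1e9
                      + (double)(end.tv_nsec - start.tv_nsec);
    return elapsed_ns / (double)iterations;
}
```

Calling this with a large iteration count on the same machine with KPTI on and off should give a rough per-syscall overhead; the absolute numbers will vary by CPU and kernel.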

~~~
Hello71
getpid is only fast on ia64 according to vdso(7).

~~~
jmgao
glibc caches getpid, so you'll only see the syscall happen after the cache
gets invalidated by a call to fork, clone, etc.
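
For anyone who wants getpid to hit the kernel regardless of what the C library does, a small sketch using `syscall(2)` directly. The name `raw_getpid` is hypothetical. As far as I know, the getpid(2) man page says the glibc PID cache existed from 2.3.4 through 2.24 and was removed in 2.25, so on newer systems the plain wrapper may already make the real call.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: issue getpid as a raw system call via syscall(2),
 * so the result cannot come from a user-space cache in the C library. */
pid_t raw_getpid(void)
{
    return (pid_t)syscall(SYS_getpid);
}
```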

------
jnordwick
An open question is still who should enable the mitigation. The risk/cost
trade-off doesn't seem to favor it in many scenarios.

Meltdown requires running native, untrusted code, and that doesn't apply to
too many servers. While it may be possible to chain this onto another exploit,
once an attacker has gained remote code execution, you have much bigger
problems.

While Meltdown is interesting, I wouldn't enable KPTI on my database servers
buried behind other network infrastructure.

~~~
mehrdadn
> Meltdown requires running native, untrusted code, and that doesn't apply to
> too many servers. While it may be possible to chain this onto another
> exploit, once an attacker has gained remote code execution, you have much
> bigger problems.

[https://react-etc.net/entry/exploiting-speculative-execution...](https://react-etc.net/entry/exploiting-speculative-execution-meltdown-spectre-via-javascript)

~~~
prewett
You realize he's talking about his servers, and the article is talking about
Javascript running in the browser, right? The only thing talking to his
database server is his own server code via SQL queries; if there are any
browsers and third-party Javascript running on that database server, he's had
a real security meltdown, of the kind that makes Meltdown irrelevant.

~~~
mehrdadn
Yes? I'm just addressing his assertion that _"Meltdown requires running
native, untrusted code"_.

~~~
cookiecaper
He's not wrong, it's just that your browser does that automatically every time
it loads a web page. That's an insecure-by-default security model and it's why
people run JavaScript blockers like NoScript.

A server, however, is unlikely to do this, so the risk profile of something
like Meltdown is much lower. It's not nil because if someone _does_ get on
your box, whether they're supposed to be there or not, Meltdown can be
leveraged to read the memory anywhere on the system, effectively neutering the
protections of a multi-user operating system. This is a major information risk
by itself since it could leak important things like encryption keys or user
login info, but it could also be used to make local privilege escalation
exploits simpler. (That said, if a remote user is able to get a shell on your
box and submit code to the CPU, you probably already have a latent local
escalation vulnerability that can be chained to get root.)

So he's right that people who don't execute arbitrary code on their CPU are
much less vulnerable, but they're not totally invulnerable because you can't
really guarantee that no one will be able to submit instructions to your CPU
due to RCEs and the like. Also, you essentially need to trust everyone who has
shell access to your server with anything that may be held in the server's
memory, including keys, passwords, etc.

People who are considering disabling KPTI will have to decide whether they
want to make the extra local attack surface available in exchange for the
performance gain under normal operation.
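
Before making that call, it helps to know what the kernel currently reports. A rough sketch, assuming a kernel that exposes `/sys/devices/system/cpu/vulnerabilities/meltdown` (not all do) and that the file's first line reads something like `Mitigation: PTI`, `Vulnerable`, or `Not affected`. The helper names `read_first_line` and `pti_reported` are made up for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy the first line of `path` into `buf`, with the
 * trailing newline stripped. Returns 0 on success, -1 on any failure. */
int read_first_line(const char *path, char *buf, size_t len)
{
    if (len == 0)
        return -1;
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, (int)len, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

/* Returns 1 if the status line mentions "Mitigation: PTI", 0 if it says
 * something else, and -1 if the status file cannot be read. */
int pti_reported(const char *status_path)
{
    char line[256];

    if (read_first_line(status_path, line, sizeof line) != 0)
        return -1;
    return strstr(line, "Mitigation: PTI") != NULL;
}
```

On a kernel with the sysfs entry, `pti_reported("/sys/devices/system/cpu/vulnerabilities/meltdown")` would be the call to make; a return of -1 only means the file was not readable, not that the machine is safe.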

------
Scaevolus
What syscall rates do different databases sustain at maximum load? Transparent
huge pages negating most of the overhead is very good news, but it probably
helps less with mmap'd I/O, which so many databases use.

~~~
sargun
Mmap’d IO is still a shit show because of clearing the CR3 register on page
faults.

~~~
amluto
What precisely do you mean by "clearing the CR3 register?"

~~~
vardump
Although slightly technically inaccurate, he clearly means a full TLB flush.

It needs to be done on Intel CPUs older than Haswell, that is, on CPUs without
INVPCID support.

With INVPCID you can partially invalidate the TLB.

~~~
amluto
Having recently rewritten Linux's TLB code, I can say this is quite wrong. For
an ordinary page fault, there's no flush at all -- changing a page from not
present to present doesn't require a flush on x86. _Removing_ a page from page
cache can be done with INVLPG, which has been around for a long, long time.

From 4.14 on, Linux has used PCID to improve context switches, independently
of PTI. While writing that code, I did a bunch of benchmarking. INVPCID is not
terribly useful, even with PCID. In fact, Linux only uses INVPCID on user
pages to assist with a PTI corner case. It's not entirely clear to me what
Intel had in mind when INVPCID was added.
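
To see whether a given box advertises these features at all, one option is to look for the `pcid` and `invpcid` words on the `flags` line of `/proc/cpuinfo`. A small sketch of the matching step only; `has_cpu_flag` is a hypothetical helper, and reading the file is left to the caller. It matches whole words so that `pcid` is not found inside `invpcid`.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `flag` appears in `flags` as a whole
 * whitespace-separated word, else 0. */
int has_cpu_flag(const char *flags, const char *flag)
{
    size_t n = strlen(flag);
    const char *p = flags;

    if (n == 0)
        return 0;
    while ((p = strstr(p, flag)) != NULL) {
        /* A match only counts if it is delimited on both sides. */
        int starts_word = (p == flags) || p[-1] == ' ' || p[-1] == '\t';
        int ends_word = p[n] == '\0' || p[n] == ' ' ||
                        p[n] == '\t' || p[n] == '\n';
        if (starts_word && ends_word)
            return 1;
        p += 1;     /* keep scanning past this partial match */
    }
    return 0;
}
```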

------
vbernat
It would be interesting to know the interaction between the patched host and
the patched guest. As a simple example, if the host aggressively flushes the
TLB, the performance impact on the guest of doing the same could be lower. On
the other hand, depending on how the host was patched, the loss of performance
of the guest could be different when using some features.

------
rmrfrmrf
Out of curiosity, why does the syscall rate scale descend from left to right?

~~~
tyingq
He chose to graph from high syscall rate to low. I was initially confused, as
most people would have shown the ramp up, left to right. Doesn't matter much
though, once you get it.

------
voidlogic
It would be great to see performance deltas for AMD CPUs too, especially since
Meltdown only affects Intel and AMD patches for Spectre Variant 2 are
considered optional. It would also be nice to see a discussion of AMD's ASID
and any differences it has with Intel's PCID when PCID is addressed.

~~~
mangix
What's being tested here has nothing to do with Spectre. Only KPTI.

~~~
bitL
Maybe a better-formulated question would be: do the Meltdown changes to the
kernel impose a performance penalty on AMD processors as well (regardless of
exploitability)?

~~~
paulmd
They are not enabled for AMD by default. If you force them to be enabled, they
impact performance (of course).

~~~
jnordwick
Even on Intel there should be an evaluation of performance cost vs. risk. If
you are not running unknown code, you may not want to enable KPTI either.

~~~
paulmd
Extremely correct. Database and frontend servers are hit pretty hard, nothing
in the middle is hit. But neither of those ends is actually running untrusted
code, for the most part.

I predict a decline of the "hyperconvergence" server and a return of the usual
"database server" + "app server" + "frontend server" combo.

------
b4lancesh33t
I'm kinda surprised Netflix weren't already using THP.

~~~
Xorlev
Not a Netflix employee, but at my last role we disabled THP on every instance
we had. We had issues with databases (MySQL/Cassandra/HBase), Hadoop, and Java
applications.

I hear it's gotten a lot better since then, and the compactor doesn't freeze
stuff like it used to.
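
For checking what mode a machine is in: on Linux the file `/sys/kernel/mm/transparent_hugepage/enabled` typically holds a line like `always [madvise] never`, with the active option in brackets. A sketch of pulling out that bracketed value; `thp_active_mode` is a made-up name, and reading the file is left to the caller.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the bracketed (active) option from a line such
 * as "always [madvise] never" into `out`. Returns 0 on success, -1 if no
 * bracketed option is found or `out` is too small. */
int thp_active_mode(const char *line, char *out, size_t len)
{
    const char *lbracket = strchr(line, '[');
    if (lbracket == NULL)
        return -1;
    const char *rbracket = strchr(lbracket, ']');
    if (rbracket == NULL)
        return -1;

    size_t n = (size_t)(rbracket - lbracket - 1);
    if (n + 1 > len)            /* need room for the terminating NUL */
        return -1;
    memcpy(out, lbracket + 1, n);
    out[n] = '\0';
    return 0;
}
```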

------
kev009
Do you have any idea how much it will regress your I/O workload on OCA?

~~~
brendangregg
The OCA team were working on it, but I don't have a number. My guess is small,
since it uses sendfile and then the packets are all done kernel to kernel. So
the syscall rate should be relatively low.
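
To illustrate why the syscall rate stays low with that approach, here is a generic sketch of the sendfile(2) pattern. It is not OCA code and the name `send_whole_file` is hypothetical. The kernel moves the data from one descriptor to the other, so user space makes one call per chunk instead of a read/write pair per buffer.

```c
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: push the whole contents of in_fd to out_fd using
 * sendfile(2). Returns the number of bytes sent, or -1 on error. */
ssize_t send_whole_file(int out_fd, int in_fd)
{
    struct stat st;
    off_t offset = 0;

    if (fstat(in_fd, &st) != 0)
        return -1;

    while (offset < st.st_size) {
        /* sendfile advances `offset` by the number of bytes it copied. */
        ssize_t n = sendfile(out_fd, in_fd, &offset,
                             (size_t)(st.st_size - offset));
        if (n < 0)
            return -1;
        if (n == 0)             /* nothing more to read; stop */
            break;
    }
    return (ssize_t)offset;
}
```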

