My team's RDS instances got hit hard with a 40% increase in CPU usage: https://imgur.com/a/khGxU
Still terrible though. :(
Tangent: this only seems like more proof that we should not rely on them as developers, no?
close(999); // the syscall (it fails with EBADF, but the kernel round trip still happens)
Meltdown requires running native, untrusted code, and that doesn't apply to many servers. While it may be possible to chain this with another exploit, once an attacker has gained remote code execution you have much bigger problems.
While Meltdown is interesting, I wouldn't enable KPTI on my database servers buried behind other network infrastructure.
A server, however, is unlikely to do this, so the risk profile of something like Meltdown is much lower. It's not nil, because if someone does get on your box, whether they're supposed to be there or not, Meltdown can be leveraged to read memory anywhere on the system, effectively neutering the protections of a multi-user operating system. That is a major information risk by itself, since it could leak important things like encryption keys or user login info, but it could also be used to make local privilege escalation exploits simpler. (That said, if a remote user is able to get a shell on your box and submit code to the CPU, it's likely there's already a latent local escalation vulnerability that can be chained to get root.)
So he's right that people who don't execute arbitrary code on their CPU are much less vulnerable, but they're not totally invulnerable because you can't really guarantee that no one will be able to submit instructions to your CPU due to RCEs and the like. Also, you essentially need to trust everyone who has shell access to your server with anything that may be held in the server's memory, including keys, passwords, etc.
People who are considering disabling KPTI will have to decide whether they want to make the extra local attack surface available in exchange for the performance gain under normal operation.
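For what it's worth, on Linux that choice doesn't require a kernel rebuild. The `pti=off` boot parameter and the sysfs vulnerabilities files are the stock interfaces for this (exact wording of the output varies by kernel version), roughly:

```shell
# Check whether the kernel considers this CPU vulnerable and whether PTI is active
cat /sys/devices/system/cpu/vulnerabilities/meltdown

# To disable KPTI, add pti=off to the kernel command line, e.g. in /etc/default/grub:
#   GRUB_CMDLINE_LINUX="... pti=off"
# then regenerate the grub config and reboot.
```

This is a config fragment rather than something to run blindly; disabling PTI re-opens the local attack surface described above.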
To exploit Meltdown you need to read specific memory addresses (and flush the probe array from cache, which might not be easy to do from JS).
This only needs to be done on Intel CPUs older than Haswell, i.e. those without INVPCID support.
With INVPCID you can partially invalidate TLB.
From 4.14 on, Linux has used PCID to improve context switches, independently of PTI. While writing that code, I did a bunch of benchmarking. INVPCID is not terribly useful, even with PCID. In fact, Linux only uses INVPCID on user pages to assist with a PTI corner case. It's not entirely clear to me what Intel had in mind when INVPCID was added.
They're slower, because the kernel needs to be mapped in and out of the virtual address space, just as it does for syscalls.
If the access pattern is sufficiently local, perhaps this could be mitigated by using large (2MB) pages. A bad idea for a random access pattern, of course.
I predict a decline of the "hyperconvergence" server and a return of the usual "database server" + "app server" + "frontend server" combo.
I hear it's gotten a lot better since then, and the compactor doesn't freeze stuff like it used to.