Meltdown Initial Performance Regressions (brendangregg.com)
226 points by natbobc 11 months ago | 43 comments

> Applications that have high syscall rates include proxies, databases, and others that do lots of tiny I/O. Also microbenchmarks, which often stress-test the system, will suffer the largest losses.

My team's RDS instances got hit hard with a 40% increase in CPU usage: https://imgur.com/a/khGxU

Clarification of the parent: A 40% increase over baseline (60% util -> 84% util), not an absolute 40% CPU util increase.
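The two readings differ only in which baseline the change is divided by. A minimal C sketch (the helper names are hypothetical) that reproduces the 60% to 84% numbers from the parent:

```c
#include <assert.h>

/* Relative increase: the change expressed as a percentage of the starting value. */
static double relative_increase_pct(double before, double after)
{
    assert(before > 0.0);
    return (after - before) / before * 100.0;
}

/* Absolute increase: the plain difference in percentage points of CPU utilization. */
static double absolute_increase_points(double before, double after)
{
    return after - before;
}
```

With the parent's numbers, relative_increase_pct(60.0, 84.0) is 40 (up to floating-point rounding), the "40% increase", while absolute_increase_points(60.0, 84.0) is only 24 points.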

Still terrible though. :(

> Also microbenchmarks, which often stress-test the system, will suffer the largest losses.

Tangent: this only seems like more proof that we should not rely on them as developers, no?

It’s proof they serve a specific purpose. If the microbenchmark is especially affected by the effect you’re trying to measure, it provides an upper bound of sorts for the effect’s magnitude. This informs the rest of the analysis.

The syscall in his benchmark made me laugh (https://github.com/brendangregg/Misc/blob/master/s1bench/s1b...):

  close(999);	// the syscall (it errors, but so what)

It's surprisingly difficult to come up with a syscall that is guaranteed to enter the kernel and come back out on a fast path. Old choices for this were things like getpid() or gettimeofday() which are now handled in the VDSO. This seems fairly clever, honestly.
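The same idea can be sketched in C. This is not the s1bench source, just a minimal loop assuming Linux/POSIX `clock_gettime`. Descriptor 999 is assumed to be closed, and the code checks that with `fcntl` before timing, so it never closes a real file:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

#define BOGUS_FD 999   /* assumed never to be open, as in the original benchmark */

/* Time `iterations` calls to close(BOGUS_FD). Each call enters the kernel,
   fails with EBADF and returns, so the loop approximates the bare syscall
   round trip. Returns nanoseconds per call, or -1.0 if BOGUS_FD is open. */
static double ns_per_bad_close(long iterations)
{
    struct timespec start, end;

    assert(iterations > 0);
    /* Check without closing anything: F_GETFD fails with EBADF on a free slot. */
    if (fcntl(BOGUS_FD, F_GETFD) != -1 || errno != EBADF)
        return -1.0;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < iterations; i++)
        (void)close(BOGUS_FD);   /* the syscall; the error is expected */
    clock_gettime(CLOCK_MONOTONIC, &end);

    double elapsed_ns = (double)(end.tv_sec - start.tv_sec) * 1e9
                      + (double)(end.tv_nsec - start.tv_nsec);
    return elapsed_ns / (double)iterations;
}
```

Comparing ns_per_bad_close(1000000) on the same machine with and without KPTI gives a rough per-syscall cost of the mitigation.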

This is a thing that generates many useless syscalls despite vDSO:


Why not just syscall(__NR_getpid)?

getpid is only fast on ia64 according to vdso(7).

glibc caches getpid (at least before glibc 2.25, which dropped the cache), so you'll only see the syscall happen after the cache gets invalidated by a call to fork, clone, etc.
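If you do want getpid as a guaranteed kernel round trip, going around the libc wrapper sidesteps any PID caching (older glibc releases had one, as described above). A minimal sketch using `syscall(2)` with `SYS_getpid`; `raw_getpid` is a hypothetical helper name:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Invoke the getpid system call directly through syscall(2), bypassing the
   libc wrapper so no userspace PID cache can answer without entering the kernel. */
static pid_t raw_getpid(void)
{
    long pid = syscall(SYS_getpid);
    assert(pid > 0);   /* getpid cannot fail */
    return (pid_t)pid;
}
```

Calling raw_getpid() in a tight loop gives a benchmark syscall that always reaches the kernel, much like close(999).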

manpages can generally be assumed to be out of date

It's still an open question who should enable the mitigation. The risk/cost trade-off doesn't seem to favor it in many scenarios.

Meltdown requires running native, untrusted code, and that doesn't apply to too many servers. While it may be possible to chain this onto another exploit, once an attacker has gained remote code execution, you have much bigger problems.

While Meltdown is interesting, I wouldn't enable KPTI on my database servers, which are buried behind other network infrastructure.

> Meltdown requires running native, untrusted code, and that doesn't apply to too many servers. While it may be possible to chain this onto another exploit, once an attacker has gained remote code execution, you have much bigger problems.


You realize he's talking about his servers, and the article is talking about JavaScript running in the browser, right? The only thing talking to his database server is his own server code via SQL queries; if there are any browsers and third-party JavaScript running on that database server, he's had a real security meltdown, of the kind that makes Meltdown irrelevant.

Yes? I'm just addressing his assertion that "Meltdown requires running native, untrusted code".

He's not wrong, it's just that your browser does that automatically every time it loads a web page. That's an insecure-by-default security model and it's why people run JavaScript blockers like NoScript.

A server, however, is unlikely to do this, so the risk profile of something like Meltdown is much lower. It's not nil, because if someone does get on your box, whether they're supposed to be there or not, Meltdown can be leveraged to read memory anywhere on the system, effectively neutering the protections of a multi-user operating system. This is a major information risk by itself, since it could leak important things like encryption keys or user login info, but it could also be used to make local privilege escalation exploits simpler. (That said, if a remote user is able to get a shell on your box and submit code to the CPU, you probably already have a latent local privilege escalation vulnerability that can be chained to get root.)

So he's right that people who don't execute arbitrary code on their CPU are much less vulnerable, but they're not totally invulnerable because you can't really guarantee that no one will be able to submit instructions to your CPU due to RCEs and the like. Also, you essentially need to trust everyone who has shell access to your server with anything that may be held in the server's memory, including keys, passwords, etc.

People who are considering disabling KPTI will have to decide whether they want to make the extra local attack surface available in exchange for the performance gain under normal operation.
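Before deciding, it helps to know what a given host is actually running. Recent kernels report the status in `/sys/devices/system/cpu/vulnerabilities/meltdown` (for example `Mitigation: PTI`, `Not affected`, or `Vulnerable`), and KPTI can be turned off with the `nopti` or `pti=off` boot parameter. A minimal C sketch that reads and classifies that file, assuming those three strings; anything else, or a missing file on older kernels, maps to unknown:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum meltdown_status {
    MELTDOWN_UNKNOWN,        /* file missing or text not recognized */
    MELTDOWN_NOT_AFFECTED,
    MELTDOWN_MITIGATED_PTI,  /* KPTI is active */
    MELTDOWN_VULNERABLE      /* affected CPU, KPTI off (e.g. booted with nopti) */
};

static int starts_with(const char *s, const char *prefix)
{
    return strncmp(s, prefix, strlen(prefix)) == 0;
}

/* Map one line of the sysfs file to a status; assumes the strings listed above. */
static enum meltdown_status classify_meltdown(const char *line)
{
    assert(line != NULL);
    if (starts_with(line, "Not affected"))
        return MELTDOWN_NOT_AFFECTED;
    if (starts_with(line, "Mitigation: PTI"))
        return MELTDOWN_MITIGATED_PTI;
    if (starts_with(line, "Vulnerable"))
        return MELTDOWN_VULNERABLE;
    return MELTDOWN_UNKNOWN;
}

static enum meltdown_status read_meltdown_status(void)
{
    char line[128] = "";
    FILE *f = fopen("/sys/devices/system/cpu/vulnerabilities/meltdown", "r");
    if (f == NULL)
        return MELTDOWN_UNKNOWN;   /* older kernel without the interface */
    if (fgets(line, sizeof line, f) == NULL)
        line[0] = '\0';
    fclose(f);
    return classify_meltdown(line);
}
```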

There is no Meltdown JavaScript exploit that I know of. This page lumps them together.

To exploit Meltdown you need to peek at specific places in memory (and flush the probe array from the cache, which might not be easy to do in JS).

Your knowledge is broken then. There's literally an example in the Spectre Whitepaper. Here is the example from the whitepaper: https://react-etc.net/page/meltdown-spectre-javascript-explo...

How does that apply to the Meltdown and these KPTI-related slowdowns?

Can you be a little more specific about this JavaScript meltdown implementation in the Spectre paper?

What syscall rates do different databases sustain at maximum load? Transparent huge pages negating most of the overhead is very good news, but that probably helps less with mmap'd I/O, which so many databases use.

Mmap’d IO is still a shit show because of clearing the CR3 register on page faults.

What is a good source to read more about that?

What precisely do you mean by "clearing the CR3 register?"

Although slightly technically inaccurate, he clearly means a full TLB flush.

It needs to be done on Intel CPUs older than Haswell, i.e. those without INVPCID support.

With INVPCID you can partially invalidate the TLB.

Having recently rewritten Linux's TLB code, I can say this is quite wrong. For an ordinary page fault, there's no flush at all: changing a page from not present to present doesn't require a flush on x86. Removing a page from the page cache can be done with INVLPG, which has been around for a long, long time.

From 4.14 on, Linux has used PCID to improve context switches, independently of PTI. While writing that code, I did a bunch of benchmarking. INVPCID is not terribly useful, even with PCID. In fact, Linux only uses INVPCID on user pages to assist with a PTI corner case. It's not entirely clear to me what Intel had in mind when INVPCID was added.

Why would that be the case? I don't think you'd be changing the page directory very often for mmap'd I/O.

I think he means a page fault every time a page is not present.

They're slower because the kernel needs to be mapped in and out of the virtual address space, just like for syscalls.
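Because each first touch of a not-present page is a fault, and therefore a kernel entry and exit, an mmap-heavy workload pays the KPTI transition cost per fault as well as per syscall. A minimal sketch that counts those faults through `getrusage`'s `ru_minflt`; it uses anonymous memory as a stand-in for a file mapping (an assumption to keep it self-contained), and file-backed mappings may fault less often because of fault-around:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Minor page faults this process has taken so far, or -1 on error. */
static long minor_faults(void)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return ru.ru_minflt;
}

/* Map `pages` fresh anonymous pages, write one byte to each, and return how
   many minor faults that produced (or -1 on error). */
static long faults_for_touching(size_t pages)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0 && pages > 0);
    size_t len = pages * (size_t)page;
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    long before = minor_faults();
    for (size_t i = 0; i < len; i += (size_t)page)
        p[i] = 1;   /* first write to a not-present page traps into the kernel */
    long after = minor_faults();
    munmap(p, len);
    return (before < 0 || after < 0) ? -1 : after - before;
}
```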

If the access pattern is sufficiently local, perhaps this could be mitigated by using large (2MB) pages. A bad idea for a random access pattern, of course.
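A minimal sketch of that idea using transparent huge pages: reserve a 2 MB-aligned region and hint it with `madvise(MADV_HUGEPAGE)`. Whether the kernel actually backs it with huge pages depends on the THP setting and on available memory, so treat it as a request rather than a guarantee. `map_thp_aligned` is a hypothetical helper:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#define HUGE_2MB ((size_t)2 * 1024 * 1024)

/* Map `len` bytes (rounded up to a 2 MB multiple) starting on a 2 MB boundary
   and ask for transparent huge pages. Returns NULL if the mapping fails. */
static void *map_thp_aligned(size_t len)
{
    assert(len > 0);
    len = (len + HUGE_2MB - 1) & ~(HUGE_2MB - 1);   /* round up to 2 MB */
    size_t span = len + HUGE_2MB;                    /* slack for alignment */
    unsigned char *raw = mmap(NULL, span, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED)
        return NULL;

    uintptr_t addr = (uintptr_t)raw;
    uintptr_t aligned = (addr + HUGE_2MB - 1) & ~(uintptr_t)(HUGE_2MB - 1);
    size_t head = aligned - addr;
    size_t tail = span - head - len;
    if (head)
        munmap(raw, head);                           /* trim unaligned start */
    if (tail)
        munmap((unsigned char *)aligned + len, tail); /* trim leftover end */

    /* Best effort: fails with EINVAL on kernels without THP; the mapping
       still works with ordinary pages in that case. */
    (void)madvise((void *)aligned, len, MADV_HUGEPAGE);
    return (void *)aligned;
}
```

Unmap with munmap(p, len) using the length rounded up to a 2 MB multiple. Whether this pays off still depends on the access pattern, as the comment says.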

It would be interesting to know the interaction between the patched host and the patched guest. As a simple example, if the host aggressively flushes the TLB, the performance impact on the guest of doing the same could be lower. On the other hand, depending on how the host was patched, the loss of performance of the guest could be different when using some features.

Out of curiosity, why does the syscall rate scale descend from left to right?

He chose to graph from high syscall rate to low. I was initially confused, as most people would have shown the ramp up, left to right. Doesn't matter much though, once you get it.

It would be great to see performance deltas for AMD CPUs too, especially since Meltdown only affects Intel and AMD patches for Spectre Variant 2 are considered optional. It would also be nice to see a discussion of AMD's ASID and any differences it has with Intel's PCID when PCID is addressed.

What's being tested here has nothing to do with Spectre. Only KPTI.

Maybe a better formulated question would be - do the Meltdown changes to kernel impose performance penalty on AMD processors as well (regardless of exploitability)?

They are not enabled for AMD by default. If you force them to be enabled, they impact performance (of course).

Even on Intel there should be an evaluation of performance cost vs risk. If you are not running unknown code, you may not want to enable kpti either.

Extremely correct. Database and frontend servers are hit pretty hard; nothing in the middle is hit. But neither of those ends is actually running untrusted code, for the most part.

I predict a decline of the "hyperconvergence" server and a return of the usual "database server" + "app server" + "frontend server" combo.

Thanks, I understand now. What I didn't realize is that KPTI is only related to Meltdown mitigation. Thanks.

I'm kinda surprised Netflix weren't already using THP.

Not a Netflix employee, but at my last role we disabled THP on every instance we had. We had issues with databases (MySQL/Cassandra/HBase), Hadoop, and Java applications.
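To check what a given host is doing before changing it, the active THP mode is the bracketed word in `/sys/kernel/mm/transparent_hugepage/enabled` (for example `always [madvise] never`). A minimal parser sketch with hypothetical helper names:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copy the bracketed (active) mode from a line such as "always [madvise] never".
   Returns 0 on success, -1 if there is no bracketed word or it does not fit. */
static int thp_active_mode(const char *line, char *out, size_t out_len)
{
    assert(line != NULL && out != NULL);
    const char *lb = strchr(line, '[');
    const char *rb = lb ? strchr(lb, ']') : NULL;
    if (lb == NULL || rb == NULL)
        return -1;
    size_t n = (size_t)(rb - lb - 1);
    if (n + 1 > out_len)
        return -1;
    memcpy(out, lb + 1, n);
    out[n] = '\0';
    return 0;
}

/* Read the host's current THP mode; returns -1 if the file is unavailable. */
static int read_thp_mode(char *out, size_t out_len)
{
    char line[128];
    FILE *f = fopen("/sys/kernel/mm/transparent_hugepage/enabled", "r");
    if (f == NULL)
        return -1;
    char *got = fgets(line, sizeof line, f);
    fclose(f);
    return got ? thp_active_mode(line, out, out_len) : -1;
}
```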

I hear it's gotten a lot better since then, and the compactor doesn't freeze stuff like it used to.

THP can cause serious problems (like “freeze your application for 1s while I compact its pages without asking”). Use at your own risk. It’s much better to use explicit huge pages.
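Explicit huge pages on Linux can be requested with `mmap(..., MAP_HUGETLB, ...)`, which only succeeds if huge pages have been reserved beforehand (for example via the `vm.nr_hugepages` sysctl). A minimal sketch that falls back to ordinary pages when none are available; `map_explicit_huge` is a hypothetical helper, and the length is assumed to be a multiple of the huge page size:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>

/* Try a hugetlb-backed mapping first; no compaction is involved because the
   pages were reserved up front. Falls back to normal pages if that fails.
   `*huge` reports which kind was obtained. Returns NULL if both fail. */
static void *map_explicit_huge(size_t len, bool *huge)
{
    assert(len > 0 && huge != NULL);
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (p != MAP_FAILED) {
        *huge = true;
        return p;
    }
    *huge = false;   /* typically ENOMEM: no huge pages reserved */
    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}
```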

Previous HN discussion on THP: https://news.ycombinator.com/item?id=15795337

Do you have any idea how much it will regress your I/O workload on OCA?

The OCA team were working on it, but I don't have a number. My guess is small, since it uses sendfile and then the packets are all done kernel to kernel. So the syscall rate should be relatively low.
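sendfile keeps the syscall rate low because one call moves a whole chunk of file data inside the kernel, with no per-buffer read/write round trips through userspace. A minimal sketch using Linux's `sendfile(2)` (other kernels' versions have different signatures, and this is not the OCA code); it copies file to file to stay self-contained, where a server would pass a socket as `out_fd`. `copy_with_sendfile` is a hypothetical helper:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/sendfile.h>
#include <sys/types.h>
#include <unistd.h>

/* Copy `len` bytes from the start of regular file `in_fd` to `out_fd`.
   Data never passes through a userspace buffer; `*calls` receives the number
   of sendfile() syscalls issued. Returns bytes copied, or -1 on error. */
static ssize_t copy_with_sendfile(int in_fd, int out_fd, size_t len, long *calls)
{
    assert(calls != NULL);
    off_t offset = 0;          /* read position in in_fd; sendfile advances it */
    size_t remaining = len;
    *calls = 0;
    while (remaining > 0) {
        ssize_t n = sendfile(out_fd, in_fd, &offset, remaining);
        ++*calls;
        if (n < 0) {
            if (errno == EINTR)
                continue;
            perror("sendfile");
            return -1;
        }
        if (n == 0)            /* EOF before len bytes: stop early */
            break;
        remaining -= (size_t)n;
    }
    return (ssize_t)(len - remaining);
}
```

For large files the loop usually finishes in very few calls, which is why a sendfile-based server issues relatively few syscalls per byte served.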
