
Meltdown fix impact on Redis performances in virtualized environments - ABS
https://gist.github.com/antirez/9e716670f76133ec81cb24036f86ee95
======
lettergram
Breakdown from post:

~8% pipelined SET performance reduction after patch

~15% pipelined GET performance reduction after patch

~29% non-pipelined SET performance reduction after patch

~25% non-pipelined GET performance reduction after patch
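
For reference, each figure above is a relative throughput drop, i.e. (before - after) / before. A minimal Python sketch of that arithmetic follows; the function name and the requests-per-second values are made-up placeholders for illustration, not measurements from the post.

```python
def percent_reduction(before_rps, after_rps):
    """Relative throughput drop, as a percentage of the unpatched figure."""
    if before_rps <= 0:
        raise ValueError("before_rps must be positive")
    return (before_rps - after_rps) / before_rps * 100.0


# Hypothetical placeholder values, NOT numbers taken from the gist.
example_before = 100_000.0
example_after = 75_000.0
print(f"~{percent_reduction(example_before, example_after):.0f}% reduction")
```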

~~~
BonesJustice
Note also that the host OS (the one running the hypervisor) was already
patched in all cases. These tests only reflect the difference between a
patched and unpatched _guest_ OS. The total performance reduction is likely
greater.

~~~
eric_h
Good point - I guess it's more or less impossible to do this test on the
public cloud, now (at least on the big three).

It'd be fun to set up the hardware and do a full matrix benchmark of
patched/unpatched host and patched/unpatched guest.
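
A rough sketch of what that full matrix would enumerate, in Python. The names `STATES` and `benchmark_matrix` are invented for illustration; actually running each cell would need hardware you control, which is the whole point of the comment.

```python
from itertools import product

# The two patch states each layer can be in.
STATES = ("unpatched", "patched")


def benchmark_matrix():
    """Return every (host, guest) patch-state combination to benchmark."""
    return [{"host": h, "guest": g} for h, g in product(STATES, repeat=2)]


for cell in benchmark_matrix():
    print(f"host={cell['host']:<9} guest={cell['guest']}")
```

Going by the parent comment, the tests in the gist only cover the two cells where the host is already patched.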

------
justinsaccount
[https://www.phoronix.com/scan.php?page=article&item=linux-kp...](https://www.phoronix.com/scan.php?page=article&item=linux-kpti-pcid&num=3)
shows that the kernel version has more of an impact.

~~~
antirez
That's super informative. As I wrote, my methodology was flawed, and indeed...
tests to redo. Btw, what's the reason for this terrible regression?

~~~
justinsaccount
The regression being the drop in performance you saw? I don't know.. I'd be
interested in seeing what the performance looks like on multiple "identical"
C4.X8LARGE instances, regardless of KPTI.

If you mean their results.. they actually show an increase in perf in later
kernel versions.

Based on

[https://github.com/phoronix-test-suite/test-profiles/tree/ma...](https://github.com/phoronix-test-suite/test-profiles/tree/master/pts/redis-1.0.0)

it looks like they just use the default configuration and run the benchmark
with

    -n 1000000 -P 32 -q -c 50 --csv
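
To make those flags concrete, here is a small Python sketch that assembles the same `redis-benchmark` invocation and annotates each flag. The helper name `build_command` and the `FLAG_MEANINGS` table are illustrative only; the flag descriptions paraphrase the tool's own help text, and nothing is executed here.

```python
import shlex

# Flags exactly as quoted in the comment above.
PHORONIX_FLAGS = "-n 1000000 -P 32 -q -c 50 --csv"

# Short paraphrases of what each redis-benchmark flag does.
FLAG_MEANINGS = {
    "-n": "total number of requests",
    "-P": "pipeline depth (requests sent per batch)",
    "-q": "quiet mode, only print requests per second",
    "-c": "number of parallel client connections",
    "--csv": "emit results as CSV",
}


def build_command(binary="redis-benchmark", flags=PHORONIX_FLAGS):
    """Return the argv list for the benchmark without running it."""
    return [binary] + shlex.split(flags)


argv = build_command()
print(" ".join(argv))
for token in argv[1:]:
    if token in FLAG_MEANINGS:
        print(f"  {token}: {FLAG_MEANINGS[token]}")
```

Passing `argv` to `subprocess.run` would run it against a local Redis on the default port, assuming one is listening.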

------
scrollaway
Yep, seeing a much more severe hit on all our ElastiCache instances than
anywhere else. CPU graph from two underused ones:
[https://twitter.com/Adys/status/949432228727218177](https://twitter.com/Adys/status/949432228727218177)

~~~
necubi
We also saw a huge hit on our memcached ElastiCache nodes. Eyeballing from our
CPU graph, it seems to be ~30%.

~~~
berbec
I've noticed a big slowdown in PV EC2. One machine used to be fine running
apache and mysql. I had to add a 2nd cpu and split the mysql off onto a
different instance to keep load to manageable levels.

------
freen
Here's hoping your infrastructure scaling plan has more than 30% CPU
headroom...

~~~
forgot-my-pw
Is this good news for AWS? Some customers will be forced to upgrade to a
larger instance.

~~~
meritt
Unless customers absolutely raise hell, this is going to be very good for AWS.
Suddenly all of these autoscaling setups are going to consume 30% more
resources in order to achieve the same throughput.

My expectation is we may see AWS apply some significant discounting to AWS
services as a result, though that doesn't help all of us who utilize mostly
prepaid reserved instances.

~~~
epmaybe
Is the majority of AWS revenue from casual users who probably wouldn't
care? If so, I think you're going to be proven right.

If, however, major customers make up the majority of revenue, then the answer
seems less clear to me.

~~~
meritt
Major customers are absolutely the primary revenue source. AWS revenue follows
the Pareto principle.

------
cookiecaper
Is there a real reason to set `pti=on` in virtualized kernels, other than an
abundance of caution? I'm not familiar with the internals of VT-x or
kvm-intel.ko, but as far as I know, this is just doubling the hit for no real
reason (unless you're doing nKVM or similar?).

~~~
loeg
Do you expect privilege isolation between users and kernel in your virtualized
environment? If so, you have to enable PTI. If your virtualized environment
just has a single root user, there's no need.
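
To see what a given guest is actually doing, something like the following Python sketch can help. It assumes a Linux kernel recent enough to expose the `vulnerabilities/meltdown` sysfs file and looks for the `pti=` / `nopti` boot parameters; the helper names are made up for illustration.

```python
from pathlib import Path

# Assumed location of the kernel's Meltdown status report (newer kernels only).
MELTDOWN_SYSFS = "/sys/devices/system/cpu/vulnerabilities/meltdown"


def pti_cmdline_override(cmdline):
    """Return the PTI setting forced on the kernel command line, or None."""
    setting = None
    for token in cmdline.split():
        if token == "nopti":
            setting = "off"
        elif token.startswith("pti="):
            setting = token.split("=", 1)[1]
    return setting


def read_text_or_none(path):
    """Read a small text file, returning None if it is missing or unreadable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


cmdline = read_text_or_none("/proc/cmdline") or ""
print("cmdline override:", pti_cmdline_override(cmdline))
print("sysfs status:", read_text_or_none(MELTDOWN_SYSFS))
```

A `None` override just means the kernel picked its default; the sysfs line, where present, reports the effective state.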

~~~
cookiecaper
Are the timing semantics within a virtual environment predictable enough to
pull off the attack within the limits of the guest context? I guess at this
point we could just test for ourselves.

EDIT: Actually, thinking about this some more ... if the host has retpoline
and the microcode update, the indirect jump technique can't be used to deduce
the state of the BHB, so the kernel addresses never leak. If KVM's _internal_
kernel space, which isn't actually protected by the CPU's ring 0 (I think,
unless this is also provided by VT-x?), is guarded by whichever of IBRS/IBPB
is relevant, then would those mitigations prevent guest exploitation as well
regardless of timing semantics?

~~~
comex
The mitigations you mentioned are all for Spectre, aka variants 1 and 2, aka
tricking code running in kernel mode into doing something weird. But KPTI is
specifically for Meltdown, aka variant 3, aka having user mode code
speculatively access kernel data directly (data which is mapped in its address
space but marked kernel-only - thus the ‘fix’ is to separate the address
spaces). Protecting kernel code with retpolines or by flushing the branch
predictor won’t help with Meltdown, because it doesn’t rely on kernel code at
all.

------
whalesalad
Fairly substantial slowdown BUT still a phenomenal set of performance numbers.
800k GETs per second is way above my pay grade.

------
jacksmith21006
Would be curious to see this also done on the Google cloud.

~~~
jedmeyers
Google Cloud does not have Redis as a service. You can deploy your own Redis
to a GCE VM and try it.

~~~
jacksmith21006
Yes, deploy your own and run benchmarks before and after they implemented the
patch.

~~~
jedmeyers
Check Container-optimized OS release notes to see which version has the fix.
If I am not mistaken stable-63 has the latest set of fixes.

[https://cloud.google.com/container-optimized-os/docs/release...](https://cloud.google.com/container-optimized-os/docs/release-notes)

------
jacksmith21006
Rather shocked that Amazon and the other cloud providers are not screaming
harder at Intel. Maybe they are behind the scenes.

I believe the increased cost of I/O is going to hit Amazon and cannot be
passed on to the customer, as what you pay is contractual.

Then with compute instances the customer is paying more for ultimately the
same amount of compute, and that is a PR issue for Amazon and the other cloud
providers.

Seems like Amazon would not be OK with that and will want to pass it to Intel
somehow.

------
jnordwick
And this is only going to be for installations that actually care about this
level of security, so mostly just shared infrastructure/serverless/cloud.
Internal systems aren't going to need the mitigation.

I think people are blowing this way out of proportion.

~~~
Xylakant
Like “all installations on AWS, GCE, DO and so on are affected”? Even
semi-private clouds might need mitigations since people rely on VMs as a
security boundary. So yes, this is a big deal.

~~~
cookiecaper
These test results do _not_ show the real effect of PTI, they only show the
effect of enabling PTI within a virtualized kernel. That's still useful
because a lot of people are going to be enabling it (even though the value of
this seems dubious), but real tests showing the approximate performance loss
would need to happen on host hardware.

~~~
fpoling
Meltdown on Linux allows reading all physical memory, not only the kernel
data. The defence against it is only dubious if one does not rely at all on
process isolation as an extra layer of defence.

~~~
cookiecaper
I understand that, but I don't know that either Spectre or Meltdown _needs_ to
be mitigated within the context of a KVM guest running on a host with
mitigations. They are timing attacks, and virtualization may incur enough
overhead/latency to make them infeasible (I don't know if it does or not, I
haven't tested).

Also, theoretically, the KVM driver and/or QEMU could be updated to block
speculative execution attacks specific to guest passthrough, though admittedly
I haven't done any KVM or VT-x hacking so I don't know its intricacies; maybe
this wouldn't work?

With a host that is using new vendor microcode to disable branch prediction
within unsafe contexts (IBRS as a global alternative to retpoline, or IBPB),
speculative execution attacks may be sufficiently mitigated at the host level
without necessitating additional mitigation at the guest level.

I'm not sure if anyone who doesn't work at Intel knows that for sure yet,
since it seems that some people are still trying to figure out how to get the
new microcodes...

~~~
Xylakant
Sure, a VM running on a patched host might not need extra mitigation, but the
host mitigation won’t magically be free, so running tests in a VM to get a
rough idea of the impact is still a good idea.

~~~
cookiecaper
Yeah, I agree, but these tests can't show the cost without the base layer of
mitigation (since at least PTI should be in effect across all instances at
Amazon), so they're incomplete. The "unmitigated" versions have at least the
host-level mitigations. I'd love to know more definitively whether host-only
mitigation is enough.

------
misterbowfinger
I don't quite understand what "pipelining" means. Can anyone TL;DR?

~~~
doublerebel
Please don't bring lazyweb to HN. We can already see the duplicate replies
(which are on the edge of karma whoring, for lack of a better term) filling
the comment thread.

There is Redis documentation for this, which is literally the first result for
searching "pipelining redis":

[https://redis.io/topics/pipelining](https://redis.io/topics/pipelining)
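
As a rough illustration of the idea on that page: pipelining means sending several commands in one write and reading the replies afterwards, instead of one write and one read per command. The Python sketch below only encodes commands in the Redis wire format (RESP) and counts how many writes a client would issue at a given pipeline depth; it does not talk to a server, and the helper names are invented for this example.

```python
def encode_command(*args):
    """Encode one command as a RESP array of bulk strings."""
    out = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg if isinstance(arg, bytes) else str(arg).encode()
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(out)


def batches(commands, depth):
    """Group encoded commands into write buffers of `depth` commands each."""
    encoded = [encode_command(*c) for c in commands]
    return [b"".join(encoded[i:i + depth]) for i in range(0, len(encoded), depth)]


commands = [("SET", f"key:{i}", i) for i in range(64)]
print("writes without pipelining:", len(batches(commands, 1)))
print("writes at pipeline depth 32:", len(batches(commands, 32)))
```

Fewer writes means fewer system calls per command, which is one plausible reason the pipelined numbers in this thread drop less after the patch: the KPTI cost is paid on each kernel entry.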

HN has a standard which at minimum is above LMGTFY.

~~~
misterbowfinger
Woah chill out. I don't think it's that big of a deal to ask a question,
especially for newbies who find documentation intimidating and tough to get
through.

~~~
doublerebel
I can see from your profile that your account is "half troll", and there is
not really room for troll accounts on HN, while we do value legitimate
"devil's advocate" or contrary positions. This is how we maintain our level of
discourse.

In the old days of the internet, asking for anything searchable was the action
of last resort, and generally considered quite rude unless truly unfindable or
incomprehensible. Taking action on one's own intellectual curiosity is
invaluable for education.

~~~
Axsuul
Not to play devil's advocate here, but there's no harm in labeling oneself as
a troll unless one's actions meet the criteria. I certainly don't believe he's
trolling in this instance, nor do his previous comments suggest such behavior
in the past. Anyway, just wanted to point out that this is a straw man argument.

