
Intel vulnerabilities costing 25% CPU performance loss to a cloud provider - samber
https://twitter.com/waxzce/status/1128711501206913028
======
morrbo
Does anyone actually have benchmarks of the latest-gen AMD vs the latest-gen
Intel processors with all mitigations for Spectre, Meltdown, and the ten other
side-channel/speculative-execution vulnerabilities applied?

I'd genuinely be curious to find out what the eventual results are, because as
I understand it AMD is not too far away from Intel as a standalone processor;
surely in a "real world" scenario they'd be significantly faster?

~~~
dhx
For Linux 5.0, as of March this year, the performance impact of enabling the
Linux kernel mitigations for Spectre/Meltdown on various CPUs with the latest
microcode is:

Intel: -13% for the Core i9 7980XE, -17% for the 8086K.

AMD: -3% for the 2700X.

Reference:
[https://www.phoronix.com/scan.php?page=article&item=linux50-...](https://www.phoronix.com/scan.php?page=article&item=linux50-spectre-meltdown&num=6)

Phoronix is due to release new benchmarks tomorrow showing the full impact of
Spectre/Meltdown/L1TF/MDS. There are some initial benchmarks at
[https://www.phoronix.com/scan.php?page=news_item&px=MDS-Zomb...](https://www.phoronix.com/scan.php?page=news_item&px=MDS-Zombieload-Initial-Impact)

~~~
adossi
What about for the newer processors like the 9980XE, 9900K, etc.? I would have
assumed that Intel's latest processors have some additional engineering in
place to mitigate the spectre/meltdown performance impacts.

~~~
kllrnohj
> What about for the newer processors like the 9980XE

The 9980XE is Skylake. It's not actually a "new" processor at all. Consumer
parts (Core ix-6xxx) and server parts (Xeon E3 v5) were both released in 2015.

In fact, the 9980XE itself isn't even a new offering in the HEDT space for
Intel, as it's basically a rebrand of the 7980XE. The differences are just a
soldered heat-spreader instead of paste & a small clock bump to go along with
it: +200 MHz turbo & +400 MHz base, complete with a power consumption increase
to match.

EDIT: The 9900K (Coffee Lake) does have in-silicon mitigations for Meltdown &
L1TF (Foreshadow), though:
[https://www.anandtech.com/show/13450/intels-new-core-and-xeo...](https://www.anandtech.com/show/13450/intels-new-core-and-xeon-w-processors-fixes-for-spectre-meltdown)

------
samber
@waxzce: "FYI, as cloud provider we rawly loss around 25% of CPU performances
the lasts 18 months due to different CVE and issues on CPU and mitigation
limiting capacity using microcode, so we stuff more CPUs, but prices didn't go
down at all... That's a kind of upselling. #IntelFail"

~~~
aiCeivi9
Is it #IntelFail if Intel had _supply_ issues in the last year? It doesn't
look like demand was affected much.

~~~
Traster
Any way you slice it, it's an Intel fail. They've basically had to lower the
performance of their existing chips, they've failed on the roadmap for new
chips, and they don't have capacity for more of the existing chips because
they thought 14nm would be winding down by now rather than at peak production.

------
jeltz
> If AMD was in a better shape, there is a real market momentup here.

What does this comment refer to? As far as I know AMD is in pretty good shape.

~~~
jasonvorhe
They probably can't meet the demand for server-grade CPUs. But I'm just
guessing.

~~~
ksec
Most of AMD's current offerings (APU, GPU, and CPU) are still fabbed on GF
14nm, which I believe to be very limited in capacity. Once the next-generation
GPU (Navi) and CPU (Zen 2), along with EPYC 2, move to TSMC 7nm, they should
be able to flex their muscles a lot more.

I wish they had made the move earlier, like the start of this year, but it is
now all set for Q3.

~~~
mtgx
AMD should probably keep production at 20% above whatever they think will
sell, to account for whatever new Intel screwup is revealed.

~~~
ksec
Pure speculation, but judging from the slight delay (one quarter) in the
EPYC 2 introduction, my guess is that AMD wants to launch and fulfil all the
demand from the big cloud providers (AWS, Azure, and Google) while still
having enough chips for the rest of the market. The worst thing that could
happen is AMD announcing EPYC 2 and AWS getting all the stock, with little
left for other channels.

I have high expectations, and I trust Dr Lisa Su and her executive team. And
Intel is currently at a new low in my book.

------
LinuxBender
We are doing perf testing of the AMD versions of HP/Dell servers. We may
tech-refresh about 40k servers with EPYC 2 if performance is better than
Intel's. It's about time for some new servers anyway.

~~~
tgtweak
It seems that they're on track to get EPYC into OEMs and cloud providers in
quantity. They've been successfully growing the ecosystem in the server space
over the last two years (Supermicro, Dell, HPE, and Lenovo are all pushing
EPYC SKUs now), so they're very well poised to capitalize on it if they can
keep up with supply. The power-efficiency gains on 7nm will be pretty tempting
to cloud providers and colocation clients alike. If you can put 256 cores
(4 EPYC 2 sockets) into a 1U server and keep it under 1200W, that's going to
be a landslide win for density and running costs. Factor in PCIe 4.0 and
things are looking pretty sexy.

I've read that wafer yields on Zen 2 are 70%+, which gives AMD a huge
advantage in cost and production efficiency. I think Intel's Skylake yield for
an equivalent 28-core die is sub-40%. If anything, the limiting factor is
going to be TSMC's ability to give AMD capacity on its very crowded and
in-demand 7nm fab.

~~~
LinuxBender
Dell and HP already have them in quantity. We just have to perform due
diligence to ensure all of our workloads perform the same or better. Thus far,
testing is looking good.

------
exabrial
What are the possible pitfalls of _not_ running the Spectre et al. mitigations
if you're _not_ hosting other people's code, but only your own, on your own
hardware?

~~~
chmod775
Just guessing here, but suppose an attacker is able to get unprivileged code
to run on your machine (by taking over some process). He is now able to
extract secrets from other processes on the machine that he ordinarily
wouldn't have access to. Think SSL keys etc.

But I believe you're right in that those exploits are not as dangerous in a
non-shared hosting scenario.

~~~
phire
I'm not even sure code execution is strictly necessary for this style of
attack.

An attacker could carefully craft network packets to force the control flow of
existing software to manipulate the CPU state. They could then use the timing
differences in network packets to read data out.

It would be painfully slow, but theoretically possible.

~~~
bannable
I don't think this is correct. The timing attacks here all require extremely
high-resolution timers, and network + I/O latency would obscure the variance
entirely.

~~~
dymk
People are able to crack poor password-comparison implementations over a
jittery, latency-heavy network. It's possible to get almost arbitrarily high
resolution when doing timing side-channel attacks; you just need many more
samples.
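
This is easy to demonstrate with a toy simulation (all numbers here are
invented for illustration): a per-character timing signal of 1 unit, buried
under Gaussian "network jitter" with a standard deviation 100 times larger,
still emerges once enough samples are averaged:

```python
import random
import statistics

SECRET = "hunter2"

def probe(guess: str) -> float:
    """Simulated round-trip time for one guess: an early-exit string
    compare leaks the matching prefix length (1 time unit per matching
    character), buried under jitter with sigma = 100 units."""
    prefix = 0
    for a, b in zip(SECRET, guess):
        if a != b:
            break
        prefix += 1
    return prefix + random.gauss(0, 100)

def mean_time(guess: str, samples: int) -> float:
    return statistics.fmean(probe(guess) for _ in range(samples))

random.seed(0)
n = 200_000  # noise on the mean shrinks as 1/sqrt(n), so the signal wins
right = mean_time("hun----", n)  # 3 correct leading characters
wrong = mean_time("xxx----", n)  # 0 correct leading characters
print(f"difference of means: {right - wrong:.2f}")  # close to 3, the prefix signal
```

This is also why constant-time comparisons (e.g. Python's
`hmac.compare_digest`) exist: they remove the data-dependent timing signal
entirely rather than trying to hide it under noise.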

------
walrus01
Don't get married to Intel. If you need big, beefy Xen or KVM hypervisor
machines, there are lots of good EPYC-based motherboards.

~~~
gcells
I am planning on making a home-server. Any suggestions for EPYC boards?

~~~
cyphar
I'm setting up a home server and decided to just go with Ryzen. You get a
bunch of server features (ECC RAM, lots of PCIe lanes, and lots of cores) for
a lot less cash than the equivalent EPYC build.

Obviously EPYC has its place, but for the home use case a Ryzen can substitute
for an Intel Xeon because of Ryzen's baseline features.

~~~
mfatica
Do you need Ryzen Pro for ECC?

~~~
mizzack
Nope. Regular Ryzen supports unbuffered ECC modules.

~~~
mfatica
What about registered (buffered) ECC modules?

~~~
mizzack
Nope, Ryzen and TR don't. Epyc does.
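
One caveat for home-server builds: the CPU and DIMMs supporting ECC doesn't
guarantee it's actually active (the board and firmware matter too). On Linux,
one way to check is the kernel's EDAC sysfs interface. A sketch, assuming a
loaded EDAC driver; it returns an empty list when ECC reporting is inactive:

```python
from pathlib import Path

def edac_controllers() -> list:
    """List memory controllers registered with the Linux EDAC subsystem,
    as (name, corrected_errors, uncorrected_errors) tuples. An empty list
    means no EDAC driver is loaded, i.e. ECC reporting is not active."""
    base = Path("/sys/devices/system/edac/mc")
    if not base.is_dir():
        return []
    result = []
    for mc in sorted(base.glob("mc*")):
        def count(name: str) -> int:
            f = mc / name
            return int(f.read_text()) if f.is_file() else 0
        result.append((mc.name, count("ce_count"), count("ue_count")))
    return result

if __name__ == "__main__":
    mcs = edac_controllers()
    if not mcs:
        print("no EDAC memory controllers found (ECC reporting inactive)")
    for name, ce, ue in mcs:
        print(f"{name}: {ce} corrected, {ue} uncorrected errors")
```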

~~~
mfatica
Thanks!

------
kierank
This is an argument for organisations to go back to being on-prem, running
their non-public facing workloads privately.

~~~
nixgeek
How is that an improvement over using the 'Dedicated Instances' or 'Bare
Metal' offerings that exist with cloud providers, where you're not colocated
with other entities but still enjoy all the benefits of managed
infrastructure and the flexibility to scale up and down (instead of acquiring
capital assets which depreciate over 36-48 months)?

~~~
kierank
External access to the bare metal is still possible, and thus information can
be leaked via Meltdown and Spectre.

------
botto
Is this a chance for ARM providers to move in and outcompete Intel, especially
if they can provide similar tech but with security? Basically an ARMs race?

~~~
AnIdiotOnTheNet
You can't stop this class of vulnerabilities without sacrificing performance
because the reason for the performance is also the reason for the
vulnerabilities. That's the choice you have, and no brand affiliation will
save you.

~~~
mfatica
It's not about brand affiliation; AMD and Intel have vastly different
implementations.

~~~
dfrage
Which didn't save AMD from Spectre bugs. And everyone else with
high-performance speculative out-of-order designs also had Meltdown bugs: ARM,
and IBM on both POWER and mainframe/Z:
[https://en.wikipedia.org/wiki/Meltdown_(security_vulnerabili...](https://en.wikipedia.org/wiki/Meltdown_\(security_vulnerability\))

~~~
mda
Yet, AMD looks way better considering all vulnerabilities reported so far. Why
downplay this fact?

~~~
013a
AMD was also measurably slower to begin with, in terms of per-core
performance. Not just by some small number; depending on the benchmark, the
difference can be as high as 40% on the latest architectures. What they lack
in single-core perf they make up for in a vastly superior multi-core
architecture.

Also, something everyone seems to forget: All of these cloud providers are
running Skylake chips, which were launched in 2015. Google Cloud in particular
will even give you chips OLDER than Skylake by default if you don't
specifically request Skylake. Even assuming instances like an AWS m5a are
running on Zen 1, not Zen 2, that's a 2017 architecture.

So you're presented with all these graphs that say "AMD only lost 5%, Intel
lost 25%, fuck Intel", but the reality is that Intel was previously far faster
than AMD, and AMD's best designs aren't even in the data center yet. Intel
definitely had more vulnerabilities and they WERE hit harder, but it's more
nuanced than just blindly wondering why more cloud providers aren't making a
fleet-wide switch to AMD.

~~~
nolok
> AMD was also measurably slower to begin with, in terms of per-core
> performance.

Not so surprising now that we know Intel merely skipped a lot of checks to get
there, though. A lot of the vulnerabilities impacting only Intel have been of
the form "when predicting, the processor does it even when it shouldn't, to
avoid the delay of checking".

Is it really fair, then, to say that AMD doing it the proper way was slower?
Shouldn't Intel's loss of performance instead be a reason to congratulate AMD
for being safer on that front?

~~~
qes
It would be fair to say that if AMD had the features but implemented them in a
way that wasn't vulnerable. That, however, is not the case: AMD simply lacks
the features.

------
imhoguy
Now, how much is this 25% in terms of CO2 emissions?

~~~
the-dude
About 25%.

------
userbinator
It's a little funny to see the "use AMD!" comments --- since, from what I
understand, Intel's optimisations that lead to these side channels are
specifically for performance, so using AMD instead of Intel might mean the
same amount of performance loss.

~~~
mda
They have completely different architectures, some optimizations can be done
in safer ways.

~~~
dfrage
Isn't one of the "secrets" to AMD's post-486 success adopting the general
approach the Pentium Pro and others took in the 1990s, which in turn is based
on IBM's 1960s Tomasulo out-of-order algorithm with speculative execution
added? This _general_ approach made _all_ out-of-order speculative designs,
including ARM, POWER, and IBM Z, vulnerable to Spectre, so I submit they're
not " _completely_ different".

------
adrianN
Why is nobody suing Intel? They sold defective chips.

~~~
deweller
Is there an End User License Agreement that users agree to before using these
chips? If it is anything like software EULAs, then Intel is likely protected
from lawsuits over defects.

~~~
ahartmetz
Except unreasonable EULA terms are void in most (all?) of Europe.

------
qaq
Intel vulnerabilities are providing additional 25% revenue to Cloud Providers.

~~~
kasey_junk
Most cloud providers are trying to move up the value chain and provide
higher-margin differentiated services (managed DBs, queues, etc.) instead of
staying in the race-to-the-bottom VM market.

These Intel mitigations impact those services just like everyone else.

~~~
tln
Does it? If customers can't run code on those servers, then these CPU level
side channel attacks aren't an issue.

This assumes the servers for S3 etc are dedicated to the task.

~~~
msbarnett
> Does it? If customers can't run code on those servers, then these CPU level
> side channel attacks aren't an issue. This assumes the servers for S3 etc
> are dedicated to the task.

I’m not sure that follows. Unpatched, most of these Intel CVEs act almost like
local privilege-escalation vulnerabilities. Once you’ve compromised a
low-privilege process, you can sniff your way to more and more powerful
credentials on the machine, the host, and in the network.

Unless higher-abstraction cloud service providers are dumb enough to just run
everything as root, in which case this doesn’t change anything, they probably
can’t (competently) get away with not patching these. Defence in depth
matters.

------
sureaboutthis
Perhaps this is the reason for Intel's recent push of Clear Linux, its own
self-created distribution?

~~~
yjftsjthsd-h
How does that help? You still take the perf hit?

~~~
sureaboutthis
Tests on Phoronix show Clear having significant wins in a number of
benchmarks. They have done so by rewriting libraries to perform well on Intel
hardware. Those who complain about a 25% performance hit elsewhere can now be
shown pretty good performance with Clear that isn't attainable with other
distros.

~~~
yjftsjthsd-h
Oh, gotcha: the boost from optimizations cancels out against the loss from
patches. I would argue that still constitutes a performance loss, since before
the patches you could run with just the improvements and get better
performance rather than breaking even. But you're not wrong.

------
mtgx
I bet Google isn't so giddy now about being FIRST!! [1] with Skylake in the
data center a couple of years ago, or "going on all-in" with Intel on the
Chromebooks (it didn't even give AMD a chance until very recently...), despite
Chrome OS being one of the very few operating systems that are truly
architecture-agnostic.

Now it's paying dearly for that mistake, with up to 40% performance loss on
Chromebooks due to the disabling of HT:

[https://www.techrepublic.com/article/mds-vulnerabilities-lea...](https://www.techrepublic.com/article/mds-vulnerabilities-lead-chrome-os-74-to-disable-hyper-threading/)

Google broke one of the most basic business rules: never rely on a single
supplier. You're always worse off _in the end_, even if the exclusivity deals
seem very tempting in the short-term.

[1] [https://cloud.google.com/blog/products/gcp/compute-engine-up...](https://cloud.google.com/blog/products/gcp/compute-engine-updates-bring-skylake-ga-extended-memory-and-more-vm-flexibility)

~~~
yjftsjthsd-h
> "going on all-in" with Intel on the Chromebooks

Weren't Chromebooks on ARM before x86?

> Google broke one of the most basic business rules: never rely on a single
> supplier. You're always worse off in the end, even if the exclusivity deals
> seem very tempting in the short-term.

I don't think this is _universally_ true, although I agree that it's probably
prudent. Diverse options/suppliers are a risk mitigation, but they do have a
cost.

