
Firmware Updates and Initial Performance Data for Data Center Systems - baq
https://newsroom.intel.com/news/firmware-updates-and-initial-performance-data-for-data-center-systems/
======
girst
> Impacts ranging from 0-2% on industry-standard measures of integer and
> floating point throughput, [...]

well, that's expected! it's _context switching_ that causes the slowdown. it
seems one cannot trust intel's PR on meltdown/spectre issues.

further down:

> For FlexibleIO, [..] When we conducted testing to stress the CPU (100% write
> case), we saw an 18% decrease in throughput performance

well, that's more like it.

~~~
jo909
I understand where you're coming from; Intel's communication on this issue was
pretty awful.

But in this specific case I don't really mind them publishing an "expected"
result of one benchmark together with a lot of other ones. Not everybody
understands the problem as well and is able to tell which use cases are
affected or not. If you send them to this site and all they see are negative
numbers they might not be able to tell that nevertheless there are other use
cases without significant impacts.

And I find it completely fair for them to show that not everything is affected
the same, and show the whole spectrum starting with zero impact.

~~~
philjohn
They should have been running actual workloads that are common in data
centres, rather than synthetic benchmarks.

~~~
semi-extrinsic
Well, to put it bluntly, I can tell you all the Real Programmers at national
labs, universities, government agencies etc who buy processors by the tens of
thousands for running big simulations of Important Things (as opposed to
running bucketfuls of Mongo instances for sharing streaksnaps and cat
pictures) are very interested in any effect on these synthetic benchmarks.

~~~
criley2
I love Hot Takes(TM) like this which purport to imply that the majority of
profitable programming isn't Real(TM), and only those applications that are
government, military or academic are somehow Real(TM).

The irony is of course that the software and hardware in use is only as
powerful and useful as it is because a billion people wanted processors to
take, share and look at cat pictures...

(That, and programmers with "real jobs" [see what I did there] happen to think
the opposite, that cashing your government grant isn't a real job!)

~~~
nickpsecurity
It was probably a reference to this old piece of satire:

[http://web.mit.edu/humor/Computers/real.programmers](http://web.mit.edu/humor/Computers/real.programmers)

In this case, there would be lots of programmers in the HPC field alone who want
the micro and macro benchmarks to be good, since they're squeezing every bit of
number crunching they can out of piles of machines they spent a fortune on. When
I studied supercomputing, they were timing everything from latency of memory
operations (esp. NUMA) to context switches on CPUs to raw MIPS. The suppliers
were competing on that stuff, too.

~~~
semi-extrinsic
Yes, I was not entirely serious. The word I was looking for was "flippant",
but I couldn't conjure it up when writing so I wrote "blunt".

------
jlgaddis
Meanwhile, "Red Hat slams into reverse on CPU fix for Spectre design blunder"
[0]:

> _Red Hat is no longer providing microcode to address Spectre, variant 2, due
> to instabilities introduced that are causing customer systems to not boot._

[0]:
[http://www.theregister.co.uk/2018/01/18/red_hat_spectre_firm...](http://www.theregister.co.uk/2018/01/18/red_hat_spectre_firmware_update_woes/)

~~~
cookiecaper
Yeah, Intel is caught with its pants down again, and clearly pushed out an
unfinished microcode just so people couldn't say they hadn't published
something, while quietly telling the Important People(tm) not to install it.

I've blacklisted the microcode update from my systems for now. I understand
this is being rushed out due to security issues, but the risk posed by complex
local exploits like Spectre is substantially less than the risk posed by
system lockups/reboots due to broken microcode. It appears that even ultra-
conservative Red Hat is being forced into that conclusion.

All of the mitigation stuff needs at least 4-6 more weeks in the oven before
it's anything near production-ready, and in the case of Intel, probably more
like 3-6 _months_ before they have a semi-stable microcode, if ever.

Disclaimer: I say this as an outside observer with no direct knowledge.

------
jgrahamc
We aren't seeing any significant overall performance impact on our service of
rolling out the 4.14 kernel. In synthetic tests of our HTTP/HTTPS workload we
saw a 2% increase in CPU but in production that doesn't appear to actually
happen. Will report on microcode later.

------
JepZ
> [...] for 90 percent of Intel CPUs introduced in the past five years [...]

Means neither '90% of Intel CPUs sold in the past five years' nor '90% of the
Intel CPUs currently in use'.

At least they took five years and not the usual two years...

------
codeulike
To address Spectre, I assume these patches involve turning off speculative
execution in some way. These various benchmarks that seem to show very little
performance degradation after the patch should perhaps lead to the question
"why was the processor ever doing that in the first place if cutting it out
barely affects performance?".

edit: And if it's not turning off speculative execution, how is it addressing
Spectre? I thought that was the only way.

~~~
VladTheImplier
To be fair, the source of the benchmarks is Intel itself, i.e. the one place
you should not take the numbers from due to conflict of interest, whether or
not they are actually accurate.

~~~
toyg
It doesn't help that during this whole thing, Intel behaved very much like the
Iraqi propaganda minister. If they had been a bit more honest from the start,
maybe I'd be keener to take their numbers at face value.

~~~
pdpi
Yeah. ARM, for example, were so upfront about their exposure that if they'd
posted this I wouldn't even blink.

------
jgrahamc
As an aside. Here's a fun story about how post-Cloudbleed, we went hunting for
bugs and ended up finding some of our crashes were caused by an Intel
processor bug: [https://blog.cloudflare.com/however-improbable-the-story-of-...](https://blog.cloudflare.com/however-improbable-the-story-of-a-processor-bug/)

------
magnat
Are those firmware updates related to Spectre and Meltdown? I thought those
bugs were unfixable via firmware/microcode updates and require either OS-level
workaround or completely new silicon design.

~~~
phire
Yeah, Intel has been continually misleading with their PR. By neglecting to
mention the software updates, they are trying to trick people into thinking
"oh there is a firmware/microcode update which fixes it".

Anyway, Meltdown is only fixable by an os update (that software patch which
causes the massive 5-20%).

The microcode updates give OS developers a few extra tools that allow them to
build Spectre mitigations, like temporarily disabling indirect branch prediction
while kernel code executes, or flushing the indirect branch entries on switch
to kernel mode.
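For anyone wondering which of these mitigations their own kernel actually ended up using: recent Linux kernels (4.15 onward, plus some stable and distribution backports) report one status line per issue under `/sys/devices/system/cpu/vulnerabilities/`. A minimal sketch, assuming that sysfs layout; the function name is hypothetical and the example status strings in the comments are only illustrations of the format:

```python
from pathlib import Path
from typing import Dict

# Assumption: the kernel exposes one file per issue in this directory,
# e.g. "meltdown", "spectre_v1", "spectre_v2" (Linux 4.15+ or a backport).
VULN_DIR = "/sys/devices/system/cpu/vulnerabilities"


def read_mitigation_status(base_dir: str = VULN_DIR) -> Dict[str, str]:
    """Return {issue_name: status_line} for every file in base_dir.

    Returns an empty dict if the directory does not exist (older kernel,
    non-Linux system, or sysfs not mounted).
    """
    base = Path(base_dir)
    if not base.is_dir():
        return {}
    status = {}
    for entry in sorted(base.iterdir()):
        if entry.is_file():
            # Each file holds a single line such as "Vulnerable" or
            # "Mitigation: ..." naming the technique in use.
            status[entry.name] = entry.read_text().strip()
    return status
```

Printing the returned dict on a patched and an unpatched box is a quick way to see whether the kernel is relying on retpolines, the new microcode features, or nothing at all.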

~~~
jo909
> Anyway, Meltdown is only fixable by an os update (that software patch which
> causes the massive 5-20%).

Yes, currently the OS update with the performance degradation is all we have.
But could there be other future solutions that work differently and thus have
lower performance impacts?

I think there could be. AMD is not affected, so it is not at all impossible to
have a CPU behave "correctly". Whether Intel is able to correct their behavior
only in microcode is of course a different question, one I'm not really able
to judge.

But it could still be possible for them to add special CPU instructions that
allow the kernel to explicitly protect its address space and go back to the
previous memory mapping.

I'm not super hopeful since they already had a lot of time to look into that
and did not come out or announce such a solution, but maybe they deferred that
in light of the fact that KPTI works and is "good enough" for a first mitigation.

~~~
xenadu02
My understanding is that the latest AMD CPUs use a neural net branch predictor
which requires that they store the entire address, not just some of the bits,
so you can’t effectively poison the branch predictor.

Everyone else will switch to the same model.
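To illustrate why storing only some of the address bits matters: if a branch target buffer keys its entries on just the low bits of the branch address, an attacker's branch at a different address can share (alias) the victim's entry and train it toward a target of the attacker's choosing. This is a toy sketch of that aliasing under stated assumptions; the 12-bit partial tag, the addresses, and the class are hypothetical and say nothing about how any real AMD or Intel predictor is organized:

```python
class ToyBTB:
    """Toy branch target buffer: branch address -> predicted target.

    tag_bits=None keys on the full address; an integer keeps only that
    many low-order bits (hypothetical parameter, illustration only).
    """

    def __init__(self, tag_bits=None):
        self.tag_bits = tag_bits
        self.entries = {}

    def _key(self, branch_addr: int) -> int:
        if self.tag_bits is None:
            return branch_addr
        return branch_addr & ((1 << self.tag_bits) - 1)

    def train(self, branch_addr: int, target: int) -> None:
        # Record the target actually taken by the branch at branch_addr.
        self.entries[self._key(branch_addr)] = target

    def predict(self, branch_addr: int):
        # Predicted target, or None if no entry matches.
        return self.entries.get(self._key(branch_addr))


VICTIM_BRANCH = 0xFFFF_8000_0000_1234    # hypothetical kernel-side branch
ATTACKER_BRANCH = 0x0000_4000_0000_1234  # user-space branch, same low 12 bits
GADGET = 0xFFFF_8000_00AB_C000           # attacker-chosen target

partial = ToyBTB(tag_bits=12)
partial.train(ATTACKER_BRANCH, GADGET)
print(partial.predict(VICTIM_BRANCH) == GADGET)  # True: victim steered to GADGET

full = ToyBTB()
full.train(ATTACKER_BRANCH, GADGET)
print(full.predict(VICTIM_BRANCH))  # None: full-address keys do not alias
```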

I suspect that cache changes will need to be tagged with the reorder buffer
slot and rolled back on a misprediction. It also means a hit to the N-way scheme
because you must be able to hold multiple instances of the cache line for the
same address.

I also worry there are undiscovered side channels lurking in arch-specific
registers or status bits.

~~~
phire
Rolling back the cache changes is just asking for trouble. Consider that any
cycles you spend undoing cache changes is also a side effect that can
potentially be measured.

Instead, hold the newly loaded cache lines in a "cache line buffer", the same
way stores are held in a store buffer. For any reads, the CPU will check
the cache line buffer before L1 cache.

Then once the instruction which triggered the cache read completes, the new
cache line will finally be applied to the L1 cache and the old line evicted.

In the case of a misprediction, the cache line buffer can be discarded
instantly.
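The buffer scheme above can be sketched as a toy model: a load that misses fills a side buffer instead of L1, lookups check the buffer first, retirement moves the line into L1 (evicting the old one), and a squash just drops the buffered lines, so L1 never changes for a mispredicted path. This is an illustrative model under stated assumptions (a tiny direct-mapped L1, instruction ids increasing in program order, hypothetical class and method names), not a description of any shipping CPU:

```python
class ToyCache:
    """Direct-mapped L1 plus a side buffer for speculatively loaded lines."""

    def __init__(self, num_sets: int = 4):
        self.num_sets = num_sets
        self.l1 = {}           # set index -> line address currently cached
        self.line_buffer = {}  # line address -> id of the loading instruction

    def _set(self, line: int) -> int:
        return line % self.num_sets

    def in_l1(self, line: int) -> bool:
        return self.l1.get(self._set(line)) == line

    def load(self, line: int, insn_id: int) -> str:
        # Reads check the line buffer before L1.
        if line in self.line_buffer:
            return "buffer hit"
        if self.in_l1(line):
            return "l1 hit"
        # Miss: the fetched line goes into the buffer only, so nothing in
        # L1 is evicted while the load is still speculative.
        self.line_buffer[line] = insn_id
        return "miss"

    def retire(self, insn_id: int) -> None:
        # The loading instruction completed: apply its lines to L1,
        # evicting whatever line occupied that set.
        for line, owner in list(self.line_buffer.items()):
            if owner == insn_id:
                self.l1[self._set(line)] = line
                del self.line_buffer[line]

    def squash(self, insn_id: int) -> None:
        # Misprediction: drop lines loaded by this instruction and any
        # younger one (assumes ids grow in program order). L1 is untouched.
        for line, owner in list(self.line_buffer.items()):
            if owner >= insn_id:
                del self.line_buffer[line]


# Demo: a squashed speculative load leaves L1 exactly as it was.
cache = ToyCache()
cache.l1[0] = 8                          # line 8 lives in set 0 (8 % 4 == 0)
print(cache.load(12, insn_id=1))         # miss: line 12 goes to the buffer only
cache.squash(1)                          # the branch was mispredicted
print(cache.in_l1(8), cache.in_l1(12))   # True False
```

Compared with rolling back L1 after the fact, nothing has to be undone here, which is the point phire makes about undo work itself being measurable.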

------
joeyh
You know, there are blind people working on software. Some of them might want
to read the detailed fine print in the enormous gif at the end of this post.
Shame.

~~~
chrisan
It is a PDF link which is fully searchable!

Granted, I don't know why they couldn't just put it in HTML.

At the very least they should change the link text and/or add some alt text
describing what exactly people will be clicking on.

~~~
joeyh
Thanks, I totally missed that. Apparently I've developed some sophisticated
circuits to avoid clicking on random gifs in the morning.

------
Animats
Bloomberg says that Intel is understating how big a business problem they
face. Big Intel buyers are not happy about this. Intel stock is down.[1]

This is an attack that lets an attacker read all of memory from user space.
Maybe even from Javascript in the browser. Remember, serious attackers don't
want to take over your computer and send spam. They want your data.

[1] [https://www.bloomberg.com/news/features/2018-01-18/intel-has...](https://www.bloomberg.com/news/features/2018-01-18/intel-has-a-big-problem-it-needs-to-act-like-it)

------
gcbirzan
What worries me is that those tests are with unpatched kernels. So these drops
are only from the microcode updates?!

~~~
sp332
These firmware updates are for Spectre Variant 2. The kernel patches are for
Meltdown (aka Variant 3). So it makes sense to benchmark the changes
separately. [https://arstechnica.com/gadgets/2018/01/heres-how-and-why-th...](https://arstechnica.com/gadgets/2018/01/heres-how-and-why-the-spectre-and-meltdown-patches-will-hurt-performance/)

Edit: I'm not sure this is right. RHEL/Centos kernel 3.10.0-693 is vulnerable
but 3.10.0-693.11.6 is patched.

~~~
danieldk
_The kernel patches are for Meltdown (aka Variant 3)._

There are also Spectre fixes landing in kernels. E.g. Linux 4.14.14 added
initial retpoline support:

[https://lwn.net/Articles/744621/](https://lwn.net/Articles/744621/)

The current LWN has very good coverage on the latest work on Spectre/Meltdown
mitigation in the kernel:

[https://lwn.net/SubscriberLink/744287/d868ef1ac3f68d70/](https://lwn.net/SubscriberLink/744287/d868ef1ac3f68d70/)

(Posting a subscriber link in good faith. If you like such content, please
subscribe to LWN.net, they are excellent!)

~~~
gcbirzan
> There are also Spectre fixes landing in kernels. E.g. Linux 4.14.14 added
> initial retpoline support:

Which has its own performance drawbacks, but the microcode update itself has
even more. And you need the microcode update for Broadwell and newer for
retpolines to work.

~~~
sp332
I thought the retpoline basically defeated speculation for certain indirect
jumps/calls. Why do you need a firmware update for that to work?

~~~
gcbirzan
That subscriber link above explains why, but in the whitepaper I read, it said
Broadwell, not only Skylake.

------
ControlledBurn
I feel like the majority of this article can be construed as "Works ok on my
machine ¯\\_(ツ)_/¯"

------
programbreeding
> As I noted in my blog post last week, while the firmware updates are
> effective at mitigating exposure to the security issues, customers have
> reported more frequent reboots on firmware updated systems.

It does go on to say they have been able to reproduce the issue and are making
progress towards finding the cause.

~~~
Filligree
"Install this patch, and your servers will start regularly crashing."

Let's say it like it is.

~~~
babilen
Absolutely! They phrased it in such a way that it sounds as if it is normal
that computers just randomly reboot and as if the firmware upgrade simply
increases that frequency.

Random reboots should really never happen and the fact that Intel is trying to
imply otherwise is deeply worrying.

We all know that things go wrong. The problem is rarely that mistakes are
made, but rather that people aren't open about them and don't simply provide
concise technical analyses.

------
nagora
It's 2018 and Intel and Microsoft are proud to announce that they've decided
to put security first.

~~~
overcast
I mean, it took how many decades to find this, and accidentally? Using the
"current year argument" is silly, because it assumes that any view from an
earlier time is just inherently inferior because of where it fell in history.

------
luckydude
Would there be any interest in a before/after run of LMbench on a Haswell box?

------
amq
> Energy efficiency: 100%

It's just embarrassing.

~~~
douglasfshearer
100% of the original power usage. Not a 100% increase in power usage, that
wouldn't be thermally possible in cases of high CPU load.
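Read the way douglasfshearer does, each figure is the patched result expressed as a percentage of the unpatched baseline, so 100% means "no change", and under that reading the 18% FlexibleIO throughput drop quoted at the top would appear as 82%. A minimal sketch of that normalization; the function name and the numbers are hypothetical, not Intel's data or method:

```python
def relative_score(baseline: float, patched: float) -> float:
    """Patched result as a percentage of the unpatched baseline.

    100.0 means unchanged; 82.0 means the patched system reached 82%
    of its previous value (an 18% decrease).
    """
    return 100.0 * patched / baseline


# Hypothetical numbers, not taken from Intel's published tables:
print(relative_score(1000.0, 820.0))  # 82.0  (throughput down 18%)
print(relative_score(200.0, 200.0))   # 100.0 (same power draw as before)
```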

------
axelfontaine
After the storm is before the storm:
[https://skyfallattack.com/](https://skyfallattack.com/)

~~~
organsnyder
Has anyone credible confirmed that this might be real?

~~~
slacka
This is most likely fake IT news. The site is hosted on a Raspberry Pi. Here's
u/kawaiineko5's analysis of it:

* Domain registered only 8 days after meltdownattack.com yet is "based on the work highlighted by Meltdown and Spectre". Hardly seems like enough time to have come up with something significant enough to give a name to. Goes out of its way to copy the font used by meltdownattack.com and advertises itself with the names of meltdown and spectre, and their CVE IDs without listing its own. Given what they said it should have its own CVE IDs reserved by now. Just looks like a cheap grab for attention as it is.

* Unlike Meltdown, where are the mysterious Linux patches being speculated about, if it's going to be announced when "operating system vendors have prepared patches"? Is it so early that no one's begun work on it? Did no one invite Linux to the party?

* If it's actually important enough to be under "embargo", why are they hinting details on a public website about it at all?

* Its current icon[1] is a really cheaply made recolour of the Intel logo. Worst of any "hip and cool vulnerabilities with a name, logo and a website" yet, if real. Seems like the kind of thing I'd expect someone who doesn't understand meltdown/spectre to create because they saw people shitting on Intel, and definitely not the creation of someone who is supposedly working with chip manufacturers and following "embargos".

* Also apparently there's a second icon[2] based on the solaris logo. If one vulnerability is intel-related (i.e. a general purpose attack) and the other is solaris-related (i.e. a specific attack on solaris), why would they be bundled together? It's either inconsistent or the logos have nothing to do with the vulnerabilities which would make even less sense.

[1] [https://skyfallattack.com/android-chrome-512x512.png](https://skyfallattack.com/android-chrome-512x512.png)

[2] [https://solaceattack.com/android-chrome-512x512.png](https://solaceattack.com/android-chrome-512x512.png)

------
acd
I want to be able to buy systems with an open source boot loader like Coreboot
and no Intel management features at all, plus secure system calls in the
CPU.

Please disable Intel boot guard for coreboot and work with the open source
community.

That way we will have more secure systems.

~~~
nerdponx
AFAIK these attacks had nothing to do with IME or proprietary UEFI.

~~~
orclev
Both you and madez are correct, although I still agree with the principle
behind the original statement. IME (and the AMD and ARM equivalents) are a
gaping security hole in all modern processors and that should terrify everyone
who cares about security at all. Think about things like Meltdown and Spectre,
and those are in parts of the processor we can actually audit. Now consider all
the exploitable flaws that are lurking in things like IME that we can't audit
(although presumably nation states can).

