Intel Security Issue Update: Addressing Reboot Issues (intel.com)
259 points by bcantrill 9 months ago | 117 comments

This is -- to say the least -- frustrating. First, the busted microcode is still available on the Intel Download Center[1], without any warning that they recommend that you not, in fact, install it. Second, the press release is still being evasive: they have not merely "received reports"; they in fact know that it's causing issues, and the press release is avoiding the much stronger language that Intel is giving privately (namely, don't install this).

The broken microcode is (at some level, anyway) forgivable; Intel's ongoing inability to communicate transparently and honestly with its customers during this crisis of its creation is much less so.

[1] https://downloadcenter.intel.com/download/27431/Linux-Proces...

Why should they bother communicating clearly with their customers? Who are those customers going to turn to? AMD? ARM?

Between Intel's numerous CPU bugs that they refused to refund customers for and ME, it's crystal clear what Intel thinks about their customers.

Customers actually could turn to AMD... their offerings are very competitive right now.

I'm thinking of building an AMD dev box. For enterprise consumers, if they're using 1U or blade servers, they could make the choice to switch to AMD for future nodes.

I strongly recommend that you go AMD. I went all-in on AMD - I agonized over the choice between 8-core Ryzen and 8-core ThreadRipper: ended up with a 12-core TR thanks to steep holiday-season discounts that lowered prices one rung down. TR4-socket motherboards are way more expensive compared to Ryzen ones (same-old AM4 socket).

I know my box is overkill for my needs now, but upgradeability is a big plus for me; I'm only using 16GB of RAM, but could up that to 128GB, and maybe I might swap out the CPU for a 64-core Zen4+ in 2022. For reference, my last dev box is from 2010[1](!) which I upgraded over time and this strategy has served me well. YMMV.

[1] Westmere - 1st Gen 'Intel Core'

I have a Ryzen developer box at work and it is awesome; paired with 32GB of RAM and SSDs, it absolutely screams.

My next home PC will be Ryzen 2 at some point this year.

Linux or Windows? I've been doing dev on a large React app recently and the thought of running npm install on Windows makes me anxious about the performance vs my Mac - wondering if Windows has gotten better of late with lots of tiny file I/O.

As an avid fan of AMD going back to the late 80s, they have always been a cheaper and better alternative. I am still bitter about RDRAM in regards to Intel.

They really haven't. AMD was so far behind Intel they were in danger of going extinct in data centers. Only very recently have they caught up again to be a credible competitor.

This bug and Intel's response is very good timing for AMD though.

It was Intel’s anti-competitive and illegal actions that prevented AMD from owning the market during the several year period when Opteron was not only the best CPU but the only 64-bit x86 CPU.

Unfortunately the legal process was far too slow and the penalties were a pittance compared to the profits.

It benefits all of us to have a competitive market for x86 CPUs.

They have been in a similar position before. It took Intel a while to respond to x64/Opteron. AMD was soaring for a while back then.

They are literally no better.

AMD and ARM also have vulnerabilities similar to Spectre and Meltdown. For example: http://fortune.com/2018/01/11/amd-chips-vulnerable-to-both-v... Switching to AMD or ARM won't save you.

Many CPUs are vulnerable to Spectre, but Meltdown is much more severe and far easier to exploit. Meltdown is fairly specific to Intel.

AMD is vulnerable to Spectre, but nothing like Meltdown.

There are issues with this 'laptop' (don't put it on your lap is one), but AMD is a viable option I think..


Well, AMD mobile CPUs are 4-core, so I would assume you bought one of the hackjob DTRs with a desktop CPU, or are just using a desktop in jest :P

It's a "hackjob" by Asus Republic of Gamers.

Not a home-built thing, but not what you'd call a laptop (portable, battery life) either..

> There are issues with this 'laptop' (don't put it on your lap is one)

Ouch, that is definitely not a good issue... but well, my late-2013 MBP gets hot as well.

Hm? The screenshot was merely showing the good parts (8 cores, 16 virtual).

It's not getting particularly hot in general, entirely depends on the use case. When I max out the cores or run a game? Quite hot. Otherwise: Mostly fine..

It's just unwieldy, big and heavy, hence not really useful on a lap..

Have you tried cleaning the dust accumulated inside?

I always do.

>Between Intel's numerous CPU bugs that they refused to refund customers for

How do you propose this could work?

Are side channel vulnerabilities in CPUs really bugs?

I mean, this one, yea. Speculative execution should not have side effects when wrong because it is Intel silently, sneakily breaking the model of how the CPU works (at least, if you only include the cache in how the PC works and not branch prediction).

I would have expected, if I thought to ask, that items were not added to the cache or were removed from the cache if the branch was not retired.

Removing items afterwards probably wouldn't work, as you might be able to stuff (instead of flush) the cache and figure out which line was emptied.
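To make the point in the comments above concrete, here is a toy sketch of why a squashed speculative load can still leak. It is only a model under stated assumptions: the `ToyCPU` class, the `SECRET` value, and the 1-vs-100 "latencies" are all made up for illustration, and no real speculation, cache, or timing measurement is involved.

```python
# Toy model only: nothing here touches real hardware, caches, or timers.
SECRET = 42  # hypothetical byte that the mispredicted path reads


class ToyCPU:
    def __init__(self):
        self.cache = set()    # probe lines currently resident in the "cache"
        self.register = None  # architectural state visible to software

    def flush(self):
        """Attacker evicts every probe line before the experiment."""
        self.cache.clear()

    def speculate_then_squash(self, secret):
        """Run the mispredicted path, then roll back architectural state."""
        saved = self.register
        self.register = secret  # transient result, never retired
        self.cache.add(secret)  # load of the probe line indexed by the secret
        self.register = saved   # squash: the register is restored...
        # ...but the cache line stays resident. That is the side effect.

    def access_time(self, line):
        """Simulated latency: hits are fast, misses are slow."""
        return 1 if line in self.cache else 100


def recover_secret(cpu):
    """Probe all 256 lines and return the one that answers fastest."""
    return min(range(256), key=cpu.access_time)


cpu = ToyCPU()
cpu.flush()
cpu.speculate_then_squash(SECRET)
leaked = recover_secret(cpu)
```

In this model the register never holds the secret after the squash, yet the probe still recovers it, which is the "effect that can be found" being argued about in this subthread.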

Intel isn't being sneaky; speculative reading was a standard and accepted feature for out-of-order processors for over 20 years (remember it affects ARM, AMD, Apple, IBM, etc. as well). Speculatively reading privileged memory while unprivileged was a big mistake though.

Intel's greatest PR success in this mess has been to conflate Meltdown with Spectre. Only Intel is affected by Meltdown because of their design, and it is a more easily exploited bug.

Meltdown is not only Intel. Some ARM and Apple designed ARM processors are affected by Meltdown as well. https://en.wikipedia.org/wiki/Meltdown_(security_vulnerabili...

There are no products on the market shipping with the one ARM-designed processor affected by Meltdown.

I think that's mainly out of luck. If the exploit had been discovered two years later, the story would likely be different. Apple has been much more ambitious with their ARM processor designs and has shipping iOS and AppleTV products affected by Meltdown.

Shipping or not, it illustrates that Intel was not unique.

Now can you argue that the given faulty design was not directly influenced by Intel's conscious and deliberate bad decisions?

I'm not sure what kind of answer you are expecting. All I am saying is that Intel is not uniquely in the wrong here. There is a whole industry of bad decisions. Whether the decisions were conscious, or only obvious in hindsight I can't say.

What’s your source for the Apple claim? Your link doesn’t support it.

"Apple has already released mitigations in iOS 11.2, macOS 10.13.2, and tvOS 11.2 to help defend against Meltdown. To help defend against Spectre, Apple has released mitigations in iOS 11.2.2, the macOS High Sierra 10.13.2 Supplemental Update, and Safari 11.0.2 for macOS Sierra and OS X El Capitan. Apple Watch is not affected by either Meltdown or Spectre." https://support.apple.com/en-us/HT208394

Meltdown is a variant of Spectre. This isn't Intel's classification; it is how Google Project Zero and, heck, even Intel's competitor AMD classify it.



It's also not the scariest variant, it's easily fixed (performance degradation aside), doesn't require a microcode update to be fixed hence is 100% software mitigated, doesn't allow you to cross between guest and host memory address spaces and isn't remotely exploitable.

On the other hand variant 1 and 2 are much scarier because they are the complete opposite of Meltdown.

Meltdown is not a variant of Spectre. Spectre itself has two variants.

And Meltdown was the easiest to exploit. Spectre is "bad" because it affects everyone, but it's less exploitable than Intel's Meltdown.

Meltdown is very much a variant of this; three variants exist, plus a 3.1 if you consider Meltdown on ARM.

Meltdown is the easiest to exploit and the easiest to fix; it’s also the least scary one as far as compromises go.

Meltdown is a specific type of Spectre exploit.

While it's more easily exploited, it's also patchable with minimal performance impact, unlike Spectre in general.

No, it's not. Please read the website of the attacks created by those who discovered and named them: https://meltdownattack.com and https://spectreattack.com

Potentially minimal is probably more accurate. It's workload dependent. In some cases, such as frequent interrupts or system calls on older CPUs without the PCID and INVPCID features to mitigate the cost, it can be very expensive.
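For anyone wondering whether their own machine has those features, here is a minimal sketch that looks for the `pcid` and `invpcid` feature flags in Linux's `/proc/cpuinfo` text. The helper names and the `SAMPLE` excerpt are made up for illustration; the sketch assumes the usual `flags : ...` line format on x86 Linux and says nothing about how large the slowdown actually is.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of feature flags from the first 'flags' line found."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def pcid_support(cpuinfo_text):
    """Report whether the PCID and INVPCID flags are advertised."""
    flags = cpu_flags(cpuinfo_text)
    return {"pcid": "pcid" in flags, "invpcid": "invpcid" in flags}


# Made-up excerpt standing in for the real file contents.
SAMPLE = "processor\t: 0\nflags\t\t: fpu vme pcid sse4_2\n"
support = pcid_support(SAMPLE)
```

On a real Linux box you would pass `open("/proc/cpuinfo").read()` instead of `SAMPLE`; a CPU reporting neither flag is the "older CPU" case described above.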

I don't mean they're literally being sneaky. The point was, from an OS or userland perspective, it should be invisible. Besides performance, it should have no effect because it is literally breaking the CPU model by executing code it shouldn't. It fixes it by not retiring the results, but the bug is in leaving an effect that can be found.

If you had said CPU designers were being sneaky it would be more obvious that you weren't being literal. By saying "Intel silently, sneakily...", it's more personal and seems as if you are being literal. It wasn't really silent either, it was well enough documented that they did speculative execution. Many many very technical and educated people from across the industry knew about this and didn't think it was an issue. They were wrong.

Let's not throw the baby out with the bathwater here. I don't think the problem is that speculative execution is not as invisible as it was once believed. The problem is more of awareness and documentation. If there was an option to disable speculative execution and awareness of the associated security issues from the beginning, I don't think anyone would have a problem with using it for a performance boost where it was safe to do so. The problem is there was an industry wide assumption that it wasn't a problem that turned out to be wrong.

They promise modern process isolation and fail to deliver it. Their fixes reduce performance significantly. IANAL, but that sounds like a defective product.

> They promise modern process isolation and fail to deliver it.

Before one makes such a statement, one has to define "modern process isolation" in a very formal way, so that nobody (neither Intel nor the customer) can redefine the meaning as they desire. I am not aware that Intel gave such a formal definition that they claim to adhere to (but perhaps fail to). So any operating system can only rely on very weak guarantees for the processor to provide "isolation" (using quotes since I have not defined the term "isolation" formally). Thus the OS has to implement the stronger isolation primitives that it desires by itself (by using the weak primitives that the processor provides).

>They promise modern process isolation and fail to deliver it.

Modern process isolation is not flawless, therefore it is not modern process isolation.

They can't, because every PR is written by a lawyer; this is full protection mode against lawsuits that are coming anyway.

I've only owned Intel processors my entire life but this crosses a line. I plan to buy my first AMD motherboard/cpu and not look back. I really hope that one day Intel realizes that it's not enough to distract us with new shiny toys. Nearly all of us want solid trustworthy hardware first and foremost.

This entire issue will eventually become a case study entitled "how to fail at incident response, PR and make customers hate you"

if you're going to make the accusation that they're lying, at least provide a source - near as I can tell they are being as transparent as is reasonable.

bcantrill is https://en.wikipedia.org/wiki/Bryan_Cantrill

If he is saying that Intel is giving other advice privately then you are welcome not to believe him (and note that it is you who is using the much stronger term "lying" here).

Personally I think un-sourced statements from him are worth listening to.

My general attitude is to presume good faith - both on Intel's part, and on the part of commenters online. I had no idea who he was till you raised the point with me - had he identified some source for his assertions (even first-hand observation on a large number of systems), I probably wouldn't have said anything.

Not that it's a source you can point to, but I have also heard the much stronger language — don't push this microcode — trickle out of Intel.

I think it's a bit ironic that the text, in a sense, blames Google for these problems by calling them the "Google Project Zero Exploits", as if Google were some sort of cyber crime syndicate using their evil powers to exploit Intel.

Yeah I thought that was interesting too. I think they are most likely name dropping Google just because they want to reinforce whatever type of association with Google that they can get. Lots of people won't register this as a bad thing per se, and will just think "Wow it's super cool that Intel is working with The Googles on something".

As developers, we should know this phenomenon well by now, as it's dictated an ever-increasing portion of our toolchain. "Oh, you say Google uses this thing?! I use it too then! Google and me are best buds!". (This applies equally to Facebook, and to a lesser extent, Amazon. Compare one of my son's favorite YouTube videos at [0]).

Alternatively, they may want customers to think "Oh boy you have to be a super genius guy like the Googles to beat up Intel so this isn't a big deal", or "How could Google do this to a nice company like Intel".

So many possibilities, but really all of them turn out well for Intel.

[0] https://youtu.be/6x0yWfmh-zk?t=19

Intel's spin throughout this has been so scummy that it's hard if not impossible for me to not go with Ryzen from now on. Especially as my Intel CPU keeps rebooting with no fix in sight.

Wow, that's a good point. "...the exploits uncovered by Google Project Zero" would be much, much more appropriate.

The "Intel CPU fundamental design defects uncovered by Google" is more like it.

"Intel CPU deliberate design defects now uncovered by Google"

I rather read it as giving them credit.

Uhm, what a mess. This, just when Linux vendors began pushing updated intel-microcode packages (Ubuntu just released intel-microcode 3.20180108.0). Should we put the update on hold until this issue is hopefully resolved, or should we still update as suggested in the last paragraph of this Intel press release, somehow believing that the random reboots don't apply to "end users"?
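If you want to see what your machines are actually running before deciding whether to hold the package, here is a minimal sketch that pulls the loaded microcode revision out of Linux's `/proc/cpuinfo` text. The function names are made up, the `SAMPLE` excerpt is invented, and the `WITHDRAWN` set is a placeholder: you would have to fill it in from your vendor's advisory, since no real revision numbers are given in this thread.

```python
def microcode_revisions(cpuinfo_text):
    """Collect the distinct 'microcode' revisions reported by all logical CPUs."""
    revisions = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "microcode":
            revisions.add(int(value.strip(), 16))  # e.g. "0x22" -> 34
    return revisions


def running_withdrawn(cpuinfo_text, withdrawn):
    """True if any logical CPU reports a revision listed as withdrawn."""
    return not microcode_revisions(cpuinfo_text).isdisjoint(withdrawn)


# Invented excerpt and placeholder revision, for illustration only.
SAMPLE = "processor\t: 0\nmicrocode\t: 0x22\nprocessor\t: 1\nmicrocode\t: 0x22\n"
WITHDRAWN = {0x23}

revs = microcode_revisions(SAMPLE)
affected = running_withdrawn(SAMPLE, WITHDRAWN)
```

On a real system, pass `open("/proc/cpuinfo").read()`; this only tells you which revision is loaded right now, not which one a pending package or BIOS update would install.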

Lenovo has put out an advisory about what to do with the BIOS updates that contain the microcode:


Withdrawn CPU Microcode Updates: Intel provides to Lenovo the CPU microcode updates required to address Variant 2, which Lenovo then incorporates into BIOS/UEFI firmware. Intel recently notified Lenovo of quality issues in two of these microcode updates, and concerns about one more. These are marked in the product tables with “Earlier update X withdrawn by Intel” and a footnote reference to one of the following:

1 – (Kaby Lake U/Y, U23e, H/S/X) Symptom: Intermittent system hang during system sleep (S3) cycling. If you have already applied the firmware update and experience hangs during sleep/wake, please flash back to the previous BIOS/UEFI level, or disable sleep (S3) mode on your system; and then apply the improved update when it becomes available. If you have not already applied the update, please wait until the improved firmware level is available.

2 – (Broadwell E) Symptom: Intermittent blue screen during system restart. If you have already applied the update, Intel suggests continuing to use the firmware level until an improved one is available. If you have not applied the update, please wait until the improved firmware level is available.

3 – (Broadwell E, H, U/Y; Haswell standard, Core Extreme, ULT) Symptom: Intel has received reports of unexpected page faults, which they are currently investigating. Out of an abundance of caution, Intel requested Lenovo to stop distributing this firmware.

It gets worse: Lenovo shoved out that firmware update as a 'critical' update back in December, and now it's causing major issues: https://forums.lenovo.com/t5/ThinkPad-T400-T500-and-newer-T/...

It is a mess. I suggest that you and anyone else asking these questions pay attention to Microsoft a little bit, to receive all of the information that there is to be had on this.

Microsoft has been telling people about problems with the mitigations up-front. There are, for starters, Microsoft KnowledgeBase articles detailing problems with older AMD CPUs and with anti-virus software that behaves like rootkit viruses, resulting in systems that will not boot, and web log articles discussing the performance considerations for server systems.

* https://news.ycombinator.com/item?id=16076660

Spin control. By "higher system reboots" they mean "the OS crashes after their CPU 'fix'".

Sorry if I have this wrong, but they knew of the issues in June, and they only fixed it in Jan.

I can hardly call this rushed.

The fixes weren't widely run until this week.

presumably because Intel does not have access to the necessary hardware configurations to test the fixes on?

I couldn't accept this as an excuse when you have possibly the worst CPU bug in x86 history, or perhaps all CPU history, with ample manpower and resources, along with a 6-month time frame.

Rush isn't a word I would use.

I highly doubt this, given how much hardware they appear to have lying around to throw at the Linux 0day Test Bot (which does full kernel compiles, boots, and integration tests for dozens of hardware configurations for every patch sent to most LKML lists).

No, because they don't have access to all the software in the world.

I think all the more jarring is the smiling face of the spokesperson next to this announcement. At least, can't the announcement be without a photo, or have a serious-looking photo of the spokesperson?

The photo seems to remind me of someone saying, "that's all you get suckers" coupled with an evil grin.

“We have received reports from a few customers of higher system reboots after applying firmware updates.”

What does higher mean here?

The original statement, as phrased by their engineers, probably was something like “Our latest firmware regularly crashes your system, triggering reboots” (plus a few paragraphs with a highly detailed description of why that happened that only the engineers who wrote the firmware would understand)

This is what they ended up with after a few reviews with legal (“we can’t say ‘our’; they’ll eat us in court”) and marketing (“We need a less emotionally loaded way to say ‘crash’”)

Legal aimed to maintain just enough meaning in the statement to be able to say “we warned customers as soon as we could”; marketing aimed to make it a positive message. I guess that’s why ‘higher’ won over ‘more’.

Means whoever typed that can't write.

As in more reboots than they had been experiencing before applying the updates

more frequent

With the implication that random reboots of a lesser, but non-zero, frequency are okay, or at least expected?

Odd wording for sure.

If you have Intel wireless or display drivers installed then it is perfectly normal for a computer to randomly reboot.

How often?

I have 495 days of uptime here with intel graphics & wireless.

Reboot your PC mate, the kernel needs updating.


Not for the scale of changes that KPTI needs, and almost certainly not for the scale of updates they have likely missed.

OpenBSD. I don't think the PTI patches are quite ready yet. Either way I'm not too worried.

Kpatch works on (or has an equivalent in) OpenBSD?


Happy face on profile of that post is a bit out of context with the customer's feelings...

Intel's "security pledge" is hilarious, when ME/AMT continues to be used as a backdoor.

This just in:


I think it's hard to see how this will affect Intel in the long term.

When Samsung phones were blowing up, I thought that was it, but somehow people kept preferring the phones.

Now, in retrospect, the Samsung battery issue affected only a small portion of users, whereas this will affect every single user in the form of decreased performance.

It raises awareness about issues. Change takes time. Look at the fight of blacks and women for equality in society. We must not stop pointing out the issues, and always demand change.

probably related news from Dell: "NOTE 1: 13G, select 12G, and select DSS server BIOS files have been pulled from http://dell.com/support. This note and article will be updated as soon as more information is available" [1]

The pulled BIOS update files for 13th gen were released on the 5th of Jan.

[1] http://www.dell.com/support/article/us/en/04/sln308588/micro...

To be fair, it may well be that some sloppy OEM drivers make too many assumptions about reserved bits in registers (which the new microcode may be legitimately changing) or about undocumented timing side-effects related to some instructions (which the new microcode may affect, that being the root problem in the first place!).

These symptoms are also the classic ones you get when you install an OS on a new-generation, well-functioning CPU.

Don't speculate, it adds nothing but confusion.

My comment is no more speculative than those blaming the new microcode for causing reboots. The real bad quality, bad update process and unjustified binary nature of OEM firmware is too often overlooked.

My broadwell (i5-2500k) windows desktop has started blue screening like crazy if I do large continuous network transfers (i.e. saturating gigabit ethernet). I didn't have this problem before I rebooted for my most recent update.

I thought it might be memory, but an 8+ hour memory scan (windows internal one, not the normal linux one) didn't tickle any bad bits, and it's not erroring in any unique component; each time it seems to be a different one (the first time I caught the blue screen, it was the network driver, which made sense, so I upgraded it, just in case), but then it started being NTFS and other things. Wondering if it's just limited to those arches, or others.

> My broadwell (i5-2500k)

That's Sandy Bridge, 3 generations older than Broadwell.

and apparently with the latest news, it perhaps wasn't all in my head


you are correct, I actually looked it up, and meant sandy bridge (hence why I wrote "limited to those arches, or others"), but brain farted while writing.

Doesn’t exactly reassure you, does it.

Well this promises to be fun, especially for cloud providers (and those running instances on the cloud, who now potentially get to suffer through host instability)

Intel's adaptation of Netflix's Chaos Monkey [0]?

[0]: https://en.wikipedia.org/wiki/Chaos_Monkey

I was thinking the same thing as I read this. “Chaos engineering” has served some customers very well here.

From speaking in my circles, I get the impression that those of us without our own data centers to worry about are much better off than those with them.

It is not clear if Xeon server processors are affected.

Intel has made such a complicated product line it's really difficult to figure out what is and isn't affected. As per the Lenovo update mentioned elsewhere in this thread:

"*3 – (Broadwell E, H, U/Y; Haswell standard, Core Extreme, ULT) Symptom: Intel has received reports of unexpected page faults, which they are currently investigating. Out of an abundance of caution, Intel requested Lenovo to stop distributing this firmware."

So far as I can figure, Xeons are covered by "Haswell standard". Core Extreme was those ridiculously overpriced i7s. ULT is the "Ultra Low TDP" chips.


It looks like from the desktop and mobile processor fields, if there is anything special about the core they put a suffix on denoting it, so Xeon may well classify as "Haswell standard"?

I heard from my colleague that he had boot issues with a server (new CentOS 7 kernel) and a RAID he set up today.

“We rushed a patch out and it’s causing problems we don’t understand yet. We’ll rush out an updated patch as soon as we can, so please don’t hesitate to install that.”

"It could be true hardware and software orchestration requires an understanding of how components will work in concert, we apologize we did not rehearse in advance."

Please don't use quotation marks to make it look like you're quoting someone when you're not.

It's clear that monochromatic's "quotation" was ironic.

A possible convention for ironic quotes:


> To avoid the potential for confusion between ironic quotes and direct quotations, some style guides specify single quotation marks for [irony], and double quotation marks for verbatim speech.

Is HN an authority on the proper way to write English? Ironic quotation marks are a fixture of the language.

In context it is very clear that the use is sarcastic, given how tight-lipped Intel has been about admitting any fault so far.

I think it was pretty clear from context that this was lighthearted and not a real quotation. Hell, I didn’t even include a source for where the quote allegedly came from.

I just had a hard crash/reboot on a Dell running Windows 10 on an AMD processor, followed by a couple of auto updates. There was no indication of updates being available before the crash.

Out of curiosity, how do you suppose an Intel microcode update caused a blue screen on your system running an AMD processor?

These issues aren't limited to Intel, and the people developing the mitigations are probably sharing ideas with each other.

As I have just written elsewhere on this very page, you need to pay attention to the several KnowledgeBase and web log articles that Microsoft has been publishing on this subject as things develop.

* https://news.ycombinator.com/item?id=16076660
