
Intel Skylake/Kaby Lake processors: broken hyper-threading - vbernat
https://lists.debian.org/debian-devel/2017/06/msg00308.html
======
userbinator
The problem description is short and scary:

 _Problem: Under complex micro-architectural conditions, short loops of less
than 64 instructions that use AH, BH, CH or DH registers as well as their
corresponding wider register (e.g. RAX, EAX or AX for AH) may cause
unpredictable system behavior. This can only happen when both logical
processors on the same physical processor are active._

I wonder how many users have experienced intermittent crashes etc. and just
nonchalantly attributed it to something else like "buggy software" or even
"cosmic ray", when it was actually a defect in the hardware. Or more
importantly, how many engineers at Intel, working on these processors, saw
this happen a few times and did the same.

More interestingly, I would love to read a detailed analysis of the
problem. Was it a software-like bug in microcode, e.g. a neglected edge
case, or a hardware-level race condition related to marginal timing (one that
could be worked around by, e.g., delaying an operation by a cycle or two)? It
reminds me of bugs like
[https://news.ycombinator.com/item?id=11845770](https://news.ycombinator.com/item?id=11845770)

This and the other rather scary post at
[http://danluu.com/cpu-bugs/](http://danluu.com/cpu-bugs/) suggest to me that CPU manufacturers
should do more regression testing, and _far more_ of it. I would recommend
adding demoscene productions, cracktros, and even certain malware to the test
suites, since they tend to exercise the hardware in ways that more "mainstream"
software wouldn't come close to. ;-)

(To those wondering about ARM and other "simpler" SoCs in embedded systems
etc.: they have as many hardware bugs as PCs, if not more. We don't hear
about them often, since they are usually worked around in the software, which
is usually customised exactly for the application and doesn't change much.)

~~~
wbl
CPU manufacturers do do huge amounts of testing, and Intel does formal
verification of some functional units. The reliability is far better than most
software, in part because making a new release costs billions.

~~~
davidmr
In my limited experience, their root cause analyses are really impressive as
well with lots of internal attention and resources. I'm not allowed to talk
about any Intel issues, but we reported a very strange issue to Nvidia, sent a
couple of dozen cards back and six months later got a truly fascinating report
back with hundreds of pages of compute test result tables, electron
microscope images and chemistry lab reports. Anything that hints at a
manufacturing problem is taken incredibly seriously.

~~~
baruch
I worked for a large company that used thousands of Intel CPUs every year, and
when we suspected a CPU bug we were mostly brushed off. We had a very
persistent person on the team who kept tracking the issue to find correlations,
and some very good kernel developers who went on to nearly pinpoint the
issue. Only then did Intel pay attention, and even then it took them
several months to acknowledge the issue, give a brief report on it and
confirm that our proposed workaround would indeed work.

I've never seen Intel do a very good job at failure analysis or at following
up on failures unless prodded very hard.

~~~
magila
Intel likely has very thorough data on the issue, but you'll never see it
unless you are one of their tier 1 customers* and have an NDA with them. In my
experience working for a large hardware manufacturer they are very skittish
about releasing detailed failure analysis data to outside companies.

* For Intel that would be companies like Dell, Apple, HP, and maybe a couple of others.

------
theGimp

      The issue was being investigated by the OCaml community since
      2017-01-06, with reports of malfunctions going at least as far back as
      Q2 2016.  It was narrowed down to Skylake with hyper-threading, which is
      a strong indicative of a processor defect.  Intel was contacted about
      it, but did not provide further feedback as far as we know.
     
      Fast-forward a few months, and Mark Shinwell noticed the mention of a
      possible fix for a microcode defect with unknown hit-ratio in the
      intel-microcode package changelog.  He matched it to the issues the
      OCaml community were observing, verified that the microcode fix indeed
      solved the OCaml issue, and contacted the Debian maintainer about it.
     
      Apparently, Intel had indeed found the issue, *documented it* (see
      below) and *fixed it*.  There was no direct feedback to the OCaml
      people, so they only found about it later.
    
    

Inexcusable.

~~~
d33
What exactly do you find inexcusable here?

~~~
theGimp
The contempt for users. I know what I do when a user files a real bug: respond
to them, acknowledge it's a problem, tell them when it's fixed.

The fact that Intel does not do that with a bug of this magnitude shows how
much respect they have for their users.

~~~
wyldfire
It's totally plausible that Intel detected this bug independently with their
own verification effort or through another customer. Matching different defect
reports when "unexplained" or nondeterministic behavior is the expected result
can be challenging.

------
fotcorn
The latest intel-microcode package from Ubuntu 16.04 does not fix the problem.
I installed the same package from Ubuntu 17.10 [0] which fixes the problem.
You can check your system with the script linked in the mailing list thread
[1].

[0] [https://packages.ubuntu.com/en/artful/amd64/intel-microcode/...](https://packages.ubuntu.com/en/artful/amd64/intel-microcode/download)

[1] [https://lists.debian.org/debian-devel/2017/06/msg00309.html](https://lists.debian.org/debian-devel/2017/06/msg00309.html)

~~~
arde
Indeed, the latest Intel microcode published for Ubuntu 16.04 is the ancient
20151106 [1]. Later Ubuntu releases do have more recent microcode packages
[2]. I cannot understand why they left out 16.04 there. So much for LTS, it
seems.

This recently came to my attention while debugging some increasingly frequent
lockups, which took me a solid week of eliminating all seemingly more likely
causes (VirtualBox, nVidia driver, faulty RAM, etc). In the end I found the
culprit while digging into the Intel Specification updates: my Core i7-5820K
(and most other Haswell-E and Broadwell processors) has a bug when leaving
package C-states, and the only workaround is to disable C-states above level
1. Timely updated microcode, which applies this workaround, would have saved
me my week.

[1] [https://launchpad.net/ubuntu/xenial/+source/intel-microcode](https://launchpad.net/ubuntu/xenial/+source/intel-microcode)

[2] [https://launchpad.net/ubuntu/+source/intel-microcode/+change...](https://launchpad.net/ubuntu/+source/intel-microcode/+changelog)

~~~
rlpb
> Indeed, the latest Intel microcode published for Ubuntu 16.04 is the ancient
> 20151106.

By ancient, perhaps you mean the version that was current at the time 16.04
shipped?

> I cannot understand why they left out 16.04 there. So much for LTS, it
> seems.

See
[https://wiki.ubuntu.com/StableReleaseUpdates](https://wiki.ubuntu.com/StableReleaseUpdates).
The point of an LTS (or any stable release, for that matter) is that it
doesn't change by default. For those who want to keep everything up-to-date,
Ubuntu ships a new release on a six month cadence. If you choose not to use
that, then you shouldn't be surprised when things aren't updated, since that's
exactly what you opted in to.

The microcode package may warrant an exception, however, and we have a bug to
track that. It's tricky because without the source we cannot pick apart what
changed, or determine whether any changes meet our update policy. We have to
be careful. Sooner or later some user will inevitably come along to tell us
that a microcode update broke things, and ask why we didn't fulfill our LTS
promise by not changing it.

~~~
arde
By ancient I mean that it is 1.7 years old and by now Intel has published 4
later releases (20160607, 20160714, 20161104 and 20170511). I think it's a
fair and tame adjective considering this is processor microcode we're talking
about and that it made me spend a week of my own time hunting it.

You say Ubuntu has a bug to track the microcode package as an exception, but
that doesn't seem to be having a positive effect, does it? Precisely because
Ubuntu cannot pick it apart, what is it that they're trying to decide in the
bug? Why is Ubuntu second guessing Intel in deciding which microcode update to
apply and which to skip? How would Ubuntu know that better than the
manufacturer? The Intel specification updates list tons of processor bugs
including some very critical ones, so we know the microcode updates do help
with some of those. When was the last time that an Intel microcode update
brought a new bug or made something worse? I'm not aware of any such instance,
and although that may indeed happen sometime it doesn't seem as likely as
facing existing known bugs, right?

I think it could be argued that it is up to the user to decide (say, a warning
during installation), or that Ubuntu could choose to apply all microcode
updates by default and let the user opt out. Ubuntu might impose a certain
delay, say a month or two at most, in order to see if a microcode update gets
withdrawn or ends up too buggy. But I don't think Ubuntu could reasonably
choose to skip all microcode updates for 1.7 years like it did in my case, or
to choose which ones to apply and which ones to skip, like it seems to be
trying to. Microcode should be treated like other proprietary software, but
with special diligence due to its criticality. If nVidia says a particular
driver release is very buggy and should be updated, Ubuntu promptly updates
it. Why would Ubuntu sit on known critical microcode updates then? If, and
it's really a big if, eventually some microcode update brings a new bug and
Ubuntu deployed it, it would be Intel's fault and not Ubuntu's.

~~~
rlpb
Why is this Ubuntu's sole responsibility? Can you not get a UEFI firmware
update from your vendor?

> I think it's a fair and tame adjective considering this is processor
> microcode we're talking about...

Processor microcode updates haven't, to my knowledge, ever automatically been
applied by distributions in the past. In light of that, I don't see how it's
reasonable to have an expectation otherwise.

> You say Ubuntu has a bug to track the microcode package as an exception, but
> that doesn't seem to be having a positive effect, does it?

By being careful before pushing out an update to millions of users? I'd say
that's a positive effect.

> ...what is it that they're trying to decide in the bug?

Whether to continue to let users have a choice, or by taking that choice away
by doing things automatically for them. There are also packaging-based
regressions to consider; not just the microcode ones. For example: if the
wrong microcode is applied to the wrong processor because of a packaging
error, who would you be blaming? Intel or Ubuntu?

> But I don't think Ubuntu could reasonably choose to skip all microcode
> updates for 1.7 years like it did in my case...

Ubuntu didn't "choose to skip" all microcode updates. Ubuntu didn't choose at
all; pushing an update requires a specific effort.

In light of this issue, Ubuntu is now considering what to do about it,
responsibly, for all users. Both for this particular issue, and for microcode
updates going forward.

~~~
arde
I don't think yours is a serious answer; what I said already refutes your
points. Ubuntu has already made me waste a week and I don't want to waste any
more, particularly when you can't or won't listen to what I say and look for
irrelevant excuses like the availability of UEFI vendor updates. Just FYI
Intel provides these updates for very good reasons and RHEL/CentOS/Fedora have
been providing processor microcode updates for ages now and with relatively
frequent updates (see the microcode_ctl RPM changelog). Bye now.

~~~
rlpb
> RHEL/CentOS/Fedora have been providing processor microcode updates for ages
> now and with relatively frequent updates (see the microcode_ctl RPM
> changelog)

I wasn't aware of this, thanks. Though I searched, and I found that they've
been causing their users problems by doing so:
[https://rhn.redhat.com/errata/RHBA-2017-0028.html](https://rhn.redhat.com/errata/RHBA-2017-0028.html)

I think this backs up my point: care must be taken.

------
pedrocr
Here's how to fix it on a Thinkpad on Linux. I've got a T460s and checked with
the script[1] that it was indeed affected. The Debian instructions said to
update your BIOS before updating the microcode package so I went to the model
support page[2] to the BIOS/UEFI section and downloaded the "BIOS Update
(Bootable CD)" one. The changelog included microcode updates so it looked
promising[3]. To get the ISO onto a usb drive I did the following:

    
    
      $ geteltorito n1cur14w.iso > eltorito-bios.iso # geteltorito is provided by the genisoimage package on Ubuntu
      $ sudo dd if=eltorito-bios.iso of=/dev/sdXXX bs=4M conv=fsync # replace sdXXX with your USB drive; double-check it is not your system disk
    

I then had a bootable USB drive that I ran by rebooting the computer, pressing
Enter and then F12 to get to the boot drive selection and selecting the USB.
From then it's just following the options it gives you. It's basically
pressing 2 to go into the update and then pressing Y and Enter a few times to
tell it you really want to do it. After that just let it reboot a few times
and the update is done. After booting again the same test script[1] now said I
had an affected CPU but new enough microcode.

[1] [https://lists.debian.org/debian-user/2017/06/msg01011.html](https://lists.debian.org/debian-user/2017/06/msg01011.html)

[2] [http://pcsupport.lenovo.com/pt/en/products/laptops-and-netbo...](http://pcsupport.lenovo.com/pt/en/products/laptops-and-netbooks/thinkpad-t-series-laptops/thinkpad-t460s/downloads)

[3] [https://download.lenovo.com/pccbbs/mobiles/n1cur14w.txt](https://download.lenovo.com/pccbbs/mobiles/n1cur14w.txt)

------
tyingq
There's a perl script on the debian mailing list that digs a bit deeper and
tells you if you're affected in the first place, if you're affected but
patched already, affected but have HT disabled, etc.

[https://lists.debian.org/debian-user/2017/06/msg01011.html](https://lists.debian.org/debian-user/2017/06/msg01011.html)
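The first step of such a check is small enough to sketch. This is a minimal POSIX sh illustration, not the Debian script: it assumes the affected parts are Intel family 6, models 78 and 94 (Skylake) and 142 and 158 (Kaby Lake), and it ignores microcode revision and hyper-threading state, which the real script also reports on. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch: does /proc/cpuinfo describe one of the Skylake/Kaby Lake models
# covered by the erratum? ASSUMPTION: affected = Intel family 6, models 78/94
# (Skylake) and 142/158 (Kaby Lake). Microcode revision and HT state are
# NOT checked here; use the Debian script for the full answer.

# classify_cpu VENDOR FAMILY MODEL -> prints skylake, kabylake or other
classify_cpu() {
  if [ "$1" != GenuineIntel ] || [ "$2" != 6 ]; then
    echo other
    return
  fi
  case $3 in
    78|94)   echo skylake ;;
    142|158) echo kabylake ;;
    *)       echo other ;;
  esac
}

# cpuinfo_field FILE NAME -> value of the first "NAME : value" line in FILE
cpuinfo_field() {
  awk -F': *' -v key="$2" '
    { sub(/[ \t]+$/, "", $1) }   # drop the tab padding before the colon
    $1 == key { print $2; exit }
  ' "$1"
}

# check_cpu [FILE] -> classify the first CPU listed in FILE (default /proc/cpuinfo)
check_cpu() {
  cpuinfo=${1:-/proc/cpuinfo}
  classify_cpu "$(cpuinfo_field "$cpuinfo" vendor_id)" \
               "$(cpuinfo_field "$cpuinfo" 'cpu family')" \
               "$(cpuinfo_field "$cpuinfo" model)"
}
```

Source the file and call `check_cpu`. The i5-6600 shown further down (model 94) would be classified as `skylake`; anything other than `other` is a reason to run the full Debian script.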

~~~
fattire
I ported this to bash since I have a Chromebook without Perl and (as of right
now) the fs is read-only, so I just piped the script to bash, and sure enough my
brand-new Samsung Chromebook Pro appears to be vulnerable, though apparently
patchable.

Details, and the script if you want it, are linked from here:
[https://forum.xda-developers.com/hardware-hacking/chromebook...](https://forum.xda-developers.com/hardware-hacking/chromebooks/samsung-chromebook-pro-asus-c302ca-t3627253)
- don't judge my shitty bash skills.

ft

------
Syzygies
In my experience with parallel code written in Haskell, hyper-threading offers
only a very mild speedup, perhaps 10%. It is essentially an illusion, a
logical convenience. (How long does it take to complete a parallel task on a
dedicated machine? Four cores with hyper-threading off has nearly the
performance of eight virtual cores with hyper-threading on.)

Many people have neither the interest nor the hardware access to overclock,
and these processors have less overclocking headroom than earlier designs.
Nevertheless, the hyper-threading hardware itself generates heat, restricting
the overclocking range for given cpu cooling hardware. In this case, turning
off hyper-threading pays for itself, because one can then overclock further,
overtaking any advantage to hyper-threading.

~~~
barrkel
It depends on what resources your code uses on-chip. If all threads are
contending on the same resources, then you won't see a speedup; if they're
using different resources, hyperthreading can increase throughput
significantly. I've seen hyperthreading give me the equivalent of 50% of
another CPU, particularly when I'm running multiple CPU-bound processes
concurrently (so they're not executing the same code at the same time in some
kind of parallel operation, and certainly aren't bound on synchronization
primitive overheads).

~~~
Syzygies
That makes sense. I'm a mathematician, and my experience is with pure
computations, homogeneous across each (virtual) core.

------
mjw1007
It's painful to have to read text like « select Intel Pentium processor models
».

If Intel used marketing names that were more closely related to technical
reality, then when something like this happens they wouldn't have so many
customers finding themselves in the "maybe I'm affected by this horrid bug"
box.

------
ourcat
So will this be affecting most Macbook Pros of the past few years?

If so, there's a way to disable hyper-threading, but you need Xcode
(Instruments).

Open Instruments. Go to Preferences. Choose 'CPU'. Uncheck "Hardware Multi-
Threading". Rebooting will reset it.

~~~
yborg
This is kind of like cutting off your leg because of a hangnail. I've been
running a Skylake MBP for more than 6 months for compilation workloads and
haven't seen a single processor hang.

I'm much more annoyed by the completely unpredictable desktop assignment on
monitors when hotplugging DisplayPort connections on multiple displays. This
one bothers me every day.

~~~
richdougherty
> haven't seen a single processor hang

If there was data corruption you might not notice.

------
onli
Rule of thumb: On a desktop, if you have an i5 you do not have Hyperthreading.
All i3s and i7s do have Hyperthreading, as do new Kaby Lake Pentiums (G4560,
4600, 4620).

On laptops, some i5s are not real quad cores but dual cores with
Hyperthreading.

~~~
decisiveness
> Rule of thumb: On a desktop, if you have an i5 you do not have
> Hyperthreading. All i3s and i7s do have Hyperthreading, as do new Kaby Lake
> Pentiums (G4560, 4600, 4620).

Hmm... either this statement is wrong or this desktop's /proc/cpuinfo is wrong:

    
    
        $ grep -E 'model|stepping|cpu cores' /proc/cpuinfo | sort -u
        cpu cores	    : 4
        model           : 94
        model name	    : Intel(R) Core(TM) i5-6600 CPU @ 3.30GHz
        stepping	    : 3
        $ grep -q '^flags.*[[:space:]]ht[[:space:]]' /proc/cpuinfo && echo "Hyper-threading is supported"
        Hyper-threading is supported
    

Intel's product spec page[1] lists this CPU as not supporting Hyper-Threading
so I'm a bit puzzled as to why the ht flag is present.

[1] [https://ark.intel.com/products/88188/Intel-Core-i5-6600-Proc...](https://ark.intel.com/products/88188/Intel-Core-i5-6600-Processor-6M-Cache-up-to-3_90-GHz)

~~~
justinclift
Hmmm, checking for "ht" seems to give weird info. On an i5-750 here (a few
years old), running Fedora 25:

    
    
        $ grep '^flags.*[[:space:]]ht[[:space:]]' /proc/cpuinfo
        flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush
        dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts
        rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2
        ssse3 cx16 xtpr pdcm sse4_1 sse4_2 popcnt lahf_lm tpr_shadow vnmi flexpriority ept vpid dtherm ida
    

"ht" is being returned even though the CPU only has 4 cores and no
hyperthreading:

[https://ark.intel.com/products/42915/Intel-Core-i5-750-Proce...](https://ark.intel.com/products/42915/Intel-Core-i5-750-Processor-8M-Cache-2_66-GHz)

dmidecode seems to give more accurate info for this:

    
    
        $ sudo dmidecode -t processor | grep 'Count:'
        	Core Count: 4
        	Thread Count: 4

~~~
decisiveness
It looks like dmidecode also contradicts itself with the hyper threading flag:

    
    
        $ sudo dmidecode -t processor | grep -E 'Flags:|HTT|Status|Count'
        	Flags:
    		HTT (Multi-threading)
    	Status: Populated, Enabled
    	Core Count: 4
    	Thread Count: 4

~~~
justinclift
Hmmm... yeah that's showing the same on mine too. According to dmidecode, this
CPU has hyperthreading.
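Both the `ht` flag in /proc/cpuinfo and dmidecode's `HTT` flag appear to come from CPUID's HTT bit, which only says that the package-level logical processor count is valid and is therefore also set on multi-core chips without hyper-threading. Neither says whether SMT is actually on. A more direct Linux check compares `siblings` (logical CPUs per package) with `cpu cores` (physical cores per package), the same idea as dmidecode's Core Count versus Thread Count above. A minimal POSIX sh sketch; the function names are made up for this example.

```shell
#!/bin/sh
# Sketch: is SMT (hyper-threading) actually active? In /proc/cpuinfo,
# "siblings" counts logical CPUs per package and "cpu cores" counts physical
# cores per package, so siblings > cpu cores means two threads per core.

# smt_active CORES SIBLINGS -> prints yes, no or unknown
smt_active() {
  if [ -z "$1" ] || [ -z "$2" ]; then
    echo unknown
  elif [ "$2" -gt "$1" ]; then
    echo yes
  else
    echo no
  fi
}

# smt_status [FILE] -> compare the first "cpu cores" and "siblings" lines
smt_status() {
  cpuinfo=${1:-/proc/cpuinfo}
  smt_active "$(awk -F': *' '/^cpu cores/ { print $2; exit }' "$cpuinfo")" \
             "$(awk -F': *' '/^siblings/ { print $2; exit }' "$cpuinfo")"
}
```

On the i5-750 above, dmidecode's 4 cores and 4 threads suggest this would print `no` despite the `ht` flag.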

------
age_bronze
I would have expected at least an example assembly snippet reproducing the bug.
How was it not discovered before, and only with the OCaml compiler? They say
"unpredictable system behavior"; does this mean that code compiled with this can give
incorrect results? Can this have any security implications? How much code was
compiled with similar patterns? Can the problem be reproduced with any JIT
compiler? We need to know what can cause this; maybe compiled and working code
already contains such patterns waiting to be abused...

------
zzalpha
Charming. I picked up a 5th Gen X1 Carbon configured with a Kaby Lake
processor, and apparently there's no way to disable hyperthreading in the
BIOS, and according to Intel's errata, no fix available yet.

Oh well... so far the machine (running Windows 10) has been stable minus one
or two random lockups in 2 months of heavy usage which could be attributed to
this. Guess I wait...

------
luckydude
That's a really nicely done announcement. Simple, to the point, no drama, all
the info you could want, scripts to figure out your processor, etc.

Well done Debian folks!

------
rwmj
So if I understand correctly, some affected processors can be fixed by a
microcode update, but there are some which _cannot_ be fixed at all?

Also the advisory seems to imply that the OCaml compiler uses gcc for code
generation, which it does not -- it generates assembly directly, only using
gcc as a front end to the linker.

~~~
dooglius
It sounds like they can all be fixed by disabling hyperthreading.

~~~
libeclipse
> fixed

mitigated

------
ComputerGuru
So, serious question: If the microcode "fix" for this ends up disabling HT,
how does one get a refund not just for the CPU but for the $3k laptop I spec'd
around it? Without needing to sue?

This isn't a hypothetical; what did Intel do when the only fix for broken
functionality was to disable TSX entirely?

~~~
mixmastamyk
I remember the Pentium bug in the mid-90s; they actually shipped out
replacement processors. I doubt that could be pulled off on laptops. Perhaps a
microcode update can work around it.

~~~
senectus1
Given my Surface Book got a 1/10 repairability score on iFixit... I don't
think they'll just replace the chip :-P

------
herpderperator
If anyone on Windows wishes to update their CPU microcode without waiting for
Microsoft to push it out via Windows Update, you can use this tool from VMware,
[https://labs.vmware.com/flings/vmware-cpu-microcode-update-d...](https://labs.vmware.com/flings/vmware-cpu-microcode-update-driver),
which can update microcode as well.

Windows stores its microcode in C:\Windows\System32\mcupdate_GenuineIntel.dll
which is a proprietary binary file and you can't simply replace it with
Intel's microcode.dat file (which is ASCII text), so you have to use a third-
party tool such as VMware's one.

Simply:

1. Download and extract the zip file linked in the first paragraph.
2. Modify the install.bat file so that the line which reads `for %%i IN (microcode.dat microcode_amd.bin microcode_amd_fam15h.bin) DO (` only contains the microcode.dat parameter (since you obviously don't have an AMD CPU, and the tool is made for both).
3. Download and extract microcode.dat from Intel's website ([https://downloadcenter.intel.com/download/26798/Linux-Proces...](https://downloadcenter.intel.com/download/26798/Linux-Processor-Microcode-Data-File)) and place it into the same directory as the VMware tool.
4. Run install.bat with admin privileges.
5. Hit cancel when it tells you that the AMD microcode files are missing, and you're done.

The CPU microcode will be updated immediately (yes, while Windows is running.)
The service will also run on each boot and update your CPU microcode, since
microcode updates are only temporary and are lost each time you restart. You
can check Event Viewer for entries from `cpumcupdate` to see what it has done.
It's advisable to run a tool that shows the microcode version (such as
HWiNFO64) before installing, so you can re-run it afterwards and confirm
that the version has changed.

I have done this and it works as described. I went from 0x74 to 0xba as shown
by the μCU field in HWiNFO64, and I have an i7-6700k.

------
wscott
Has anyone benchmarked one of these machines before and after applying this
microcode update? The options in microcode are rather limited and all are
likely to have performance impacts. This is likely disabling functionality to
avoid this case. I would hope the patch is smart enough to not apply if
threading is not enabled, but who knows.

~~~
jakeogh
Would a performance hit go unnoticed?

Usually (always?) it's not a ROM update; the encrypted microcode blob is
loaded into the CPU by the OS on every boot via CONFIG_MICROCODE.

some linkrot: [http://imgur.com/a/z1uLv](http://imgur.com/a/z1uLv)
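On Linux you can see which revision actually got loaded: the kernel reports it as the `microcode` field of /proc/cpuinfo (in hex on recent kernels; an assumption worth checking on yours), and `dmesg | grep -i microcode` usually shows whether an update was applied at boot. A minimal POSIX sh sketch of the comparison follows. The minimum revision is deliberately an argument, since the right value depends on the CPU signature and should come from the advisory, and the function names are hypothetical.

```shell
#!/bin/sh
# Sketch: read the microcode revision the kernel reports and compare it with
# a minimum. The minimum is an input: take it from the vendor or distribution
# advisory for your CPU signature, not from this example.

# microcode_rev [FILE] -> revision reported for the first CPU, e.g. 0xba
microcode_rev() {
  awk -F': *' '/^microcode/ { print $2; exit }' "${1:-/proc/cpuinfo}"
}

# rev_at_least CURRENT MINIMUM -> yes if CURRENT >= MINIMUM (hex like 0xba is fine)
rev_at_least() {
  if [ -z "$1" ] || [ -z "$2" ]; then
    echo unknown
  elif [ $(( $1 )) -ge $(( $2 )) ]; then
    echo yes
  else
    echo no
  fi
}
```

For example, `rev_at_least "$(microcode_rev)" 0xba` prints `yes` or `no`. Here 0xba is only the revision reported above for one i7-6700K after updating, not a threshold taken from the advisory.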

------
ncrmro
Just got the 2017 13" MacBook Pro without the Touch Bar, with the Kaby Lake i7.
Should I be worried? Can I even disable HT on a Mac? And presumably an update
will be provided, so the whole laptop is still OK?

I've been using the Thunderbolt 3 dock with two external monitors and
occasionally get a little glitch, prolly a loose cable I think.

I've downloaded the bitcoin blockchain, done quite a bit of work in pycharm +
chrome, multiple projects, flow and webpack in the background and haven't had
any sort of crashes tho.

------
isaac_is_goat
Holy cow. Definitely feel like I dodged a bullet by building an AMD/Ryzen
system this time around, which had its own set of issues (but they seem to be
more or less ironed out now).

~~~
gbin
This is not a fair comment: Ryzen had a crash that can be triggered by
compiling with GCC, and a memory compatibility issue where it cannot run memory
modules at their nominal speed. Ryzen is a really young architecture; it has
already had something like 6 stable microcode patches, and you can expect many more.

~~~
isaac_is_goat
Still seems to be doing better than "turn off half of your processor, sorry no
fix just buy a new one".

~~~
pvdebbe
Hyperthreads are just that - threads. It won't be 50 % slower with HT
disabled.

~~~
SXX
Intel still charge extra $100 for it.

~~~
theandrewbailey
That's less than 50% of the price for CPUs with HT.

------
joshschreuder
How does one get new microcode on Windows? Is that what the Intel Chipset
drivers are?

And is the microcode fix available for non-Linux systems yet?

~~~
aarongolliver
Windows update

~~~
tempestn
Has this already been patched via Windows Update? I didn't even know WU could
do CPU microcode patches.

------
riledhel
Does Windows have a patch for this too? Or just disabling HT is the safest
option?

~~~
pbsd
Windows does have a microcode update driver, as you would expect, so it can
fix this.

However, looking at the microcode update driver on an updated Windows 10 as of
right now, I don't see a recent enough microcode version to fix it. The latest
updates appear to be from 2015.

~~~
sqldba
That's pretty cool, I didn't know of this functionality. Is there a good
resource on how to look it up?

~~~
pbsd
I haven't seen a description of it anywhere that I can think of. The driver
lives in C:\Windows\system32\mcupdate_{genuineintel,authenticamd}.dll. All it
does is detect which CPU it's running on, and load the appropriate microcode.

Loading the microcode is straightforward: all you need to do is put a few
values in the appropriate MSRs. This is described in the Intel manuals, Volume
3. The microcode itself is embedded in the above DLLs, as a big binary table.
The latest entry I see is 20150812 for the Intel 0x40651, aka Haswell.

------
bleair
When intel had the floating point division hardware bug they recalled chips.
[https://en.wikipedia.org/wiki/Pentium_FDIV_bug](https://en.wikipedia.org/wiki/Pentium_FDIV_bug)

I wonder if intel will do something like that again or if the industry as a
whole is more tolerant of unreliable / buggy behavior and will just live with
it. See, for example, Apple telling people that poor reception was
their own fault, changing software to hide problems, etc.

~~~
aarongolliver
If you read the link you'll see they've fixed it with a microcode update.

------
paines
I have a Skylake mobile CPU (i7-6700HQ) and it pretty much rocks with Ubuntu
17. The system is also stable and fast, even under heavy load, e.g. games.
However, compiling a big (>10,000 modules) C++ project via ninja/cmake
under Qt Creator hangs the system reproducibly after ~15 minutes. I wonder
now if this broken hyper-threading could be the cause.

------
spektom
I wonder how this issue affects cloud providers?

------
itsoggy
When HT first started appearing on P4 chips, I was looking after NetWare, 2K
and XP boxes. They would freak out with HT enabled, with all kinds of oddities;
I suspect mostly because the OSes didn't fully support it.

To this day I disable it by reflex on everything!

------
jwildeboer
I do wonder, though: why didn't the Debian maintainers pick up the microcode
updates when Intel made them available? Why did it take a nudge from
the OCaml people for them to notice? Or am I missing something?

------
msimpson
The late 2016 Razer Blade uses the i7-6700HQ which is specifically a Family 6,
Model 94, Stepping 3 processor.

I wonder if a microcode update would solve some of the various issues I have
in Windows.

------
elnik
My CPU (6th generation i5) died last week. RIP.

I installed debian 9, installed virtualbox, vagrant, setup a clean development
machine for myself, everything took 4 hours to finish.

I rebooted the virtual machine, and boom, there was a kernel panic, which I
sadly don't remember exactly and didn't take a picture of. After I rebooted the
machine and opened a terminal, the system froze. The cursor wouldn't move.
I rebooted again, and the motherboard showed a CPU fail/undetected light. I
couldn't get it to boot after that.

I am both sad that this kind of bug exists and relieved that it's being
patched before it spreads further.

I sincerely hope I'll get a replacement from Intel.

~~~
broknbottle
CPUs rarely die unless you're OCing or the PSU went bad and took things out
with it. I am willing to bet your MoBo is the part that went bad.

------
Traubenfuchs
Are those microcode updates deployed via Windows (10) Update?

------
karussell
What is the probability for this to happen? Or how could I estimate the time
it takes for random code to hit this bug at least once with a probability of
over 90%?
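Nobody outside Intel seems to know the hit rate, but the second question is short arithmetic once you assume one. Assuming each hour of running is an independent trial with the same (unknown, hypothetical) probability p of hitting the bug, which real workloads certainly are not:

```latex
\begin{aligned}
P(\text{at least one hit in } n \text{ hours}) &= 1 - (1 - p)^{n} \ge 0.9 \\
\iff (1 - p)^{n} &\le 0.1 \\
\iff n &\ge \frac{\ln 0.1}{\ln(1 - p)} \approx \frac{2.303}{p} \qquad (p \ll 1)
\end{aligned}
```

The inequality flips in the last step because ln(1 - p) is negative. For a hypothetical p = 0.001 per hour this gives n >= ln 0.1 / ln 0.999, about 2301.4, so roughly 2302 hours, or about 96 days of continuous running.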

------
wfunction
A little off-topic, but does anybody know of any hacky ways to disable hyper-
threading (on Haswell if it matters) if the firmware doesn't provide the
option?
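On Linux, one workaround that doesn't need a firmware switch is to take the second thread of each core offline through the CPU hotplug files in sysfs (`/sys/devices/system/cpu/cpuN/online` and `topology/thread_siblings_list`). A minimal POSIX sh sketch, assuming a kernel with CPU hotplug support and root access. The change is lost on reboot, the function names are hypothetical, and whether an offlined thread counts as "inactive" for the erratum is an assumption not confirmed by Intel's text.

```shell
#!/bin/sh
# Sketch: keep one logical CPU per physical core by offlining every other
# SMT sibling through the Linux CPU hotplug interface in sysfs.
# Needs root; the effect is lost on reboot. Run with --apply to act.

# is_primary_thread CPU LIST -> yes if CPU is the first entry of a
# thread_siblings_list value such as "0,4", "0-1" or "3"
is_primary_thread() {
  first=${2%%[,-]*}          # text before the first ',' or '-'
  if [ "$1" = "$first" ]; then echo yes; else echo no; fi
}

offline_siblings() {
  for dir in /sys/devices/system/cpu/cpu[0-9]*; do
    # offline CPUs may not expose topology files; skip them
    [ -r "$dir/topology/thread_siblings_list" ] || continue
    cpu=${dir##*cpu}         # /sys/devices/system/cpu/cpu12 -> 12
    list=$(cat "$dir/topology/thread_siblings_list")
    if [ "$(is_primary_thread "$cpu" "$list")" = no ]; then
      echo 0 > "$dir/online" && echo "offlined cpu$cpu"
    fi
  done
}

if [ "${1:-}" = --apply ]; then
  offline_siblings
fi
```

Saved as, say, `smt-off.sh` (a hypothetical name), it would be run as `sudo sh smt-off.sh --apply`; writing 1 back to the same `online` files brings the threads back. The kernel lists siblings in ascending order, so the lowest-numbered thread of each core, including cpu0, is always the one kept.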

------
peter_retief
If I run

        $ grep name /proc/cpuinfo | sort -u
        model name : Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz

and then

        $ cat /proc/cpuinfo | grep ht

"ht" is definitely there under flags. Should I be concerned?

~~~
itchyouch
Your processor isn't affected. It would be i[357]-[67]xxx processors only.

But a new computer may serve you well...

------
convefefe
One of many typical erratum... nothing to see here, been patched months ago.
Most people are unlikely to ever encounter it even if unpatched.

------
ericfrederich
Is it fixed or not? Beginning of post says to disable hyperthreading but then
goes on to say Intel fixed this with a microcode update.

------
cJ0th
This is just great! Just yesterday I got a new laptop with a skylake
processor. Now I wonder whether I experienced that bug today, as I got kicked
out of DOSBox (on Debian) for no apparent reason. In the config file I had to
change the value of 'core' in the cpu section from 'automatic' to 'normal'.
It could be something entirely different, but the timing is funny.

------
octoploid
Well, at least Intel acknowledges, documents and finally fixes these CPU bugs
(via microcode updates).

AMD on the other hand doesn't even acknowledge an issue when multiple
customers report problems. See this Ryzen bug:
[https://community.amd.com/thread/215773](https://community.amd.com/thread/215773)

~~~
justinclift
Huh? There are plenty of AMD employees in that thread who have acknowledged
the problem. They're "looking into it", but just seem to have no progress to
report yet, and suck at keeping people in the loop.

~~~
lstamour
And at least one person near the end of the thread reports the problem occurs
when ASLR is turned on, for what it's worth:
[https://community.amd.com/thread/215773?start=105&tstart=0](https://community.amd.com/thread/215773?start=105&tstart=0)

------
walterbell
Does this affect execution of the OCaml runtime, or only the OCaml compiler?

~~~
rwmj
It affects code generated by the OCaml native compiler, which includes the
OCaml compiler itself since it is written in OCaml. See also:
[https://caml.inria.fr/mantis/view.php?id=7452](https://caml.inria.fr/mantis/view.php?id=7452)

It also affects code generated by GCC, but apparently GCC is less likely to
generate code sequences which trigger the CPU bug.

~~~
tom_mellior
> It affects code generated by the OCaml native compiler

Possibly, but that was not the issue here. It's clear from your own link that
the crashes were due to _C_ code in the OCaml runtime (used by the compiler
itself), which is written in _C_ and was compiled with _GCC_ at -O2. See
[https://caml.inria.fr/mantis/view.php?id=7452#c17129](https://caml.inria.fr/mantis/view.php?id=7452#c17129)

~~~
rwmj
Indeed, you're right.

------
geogriffin
Has anyone affected by this bug tried using a kernel configured with
hyper-threading support disabled? Would that work?

------
asow92
So what does this mean for the thousands of new MacBook Pro 2016/2017 owners
out there?

~~~
coldtea
Nothing, since they've been running their laptops without issues (it would
have been all over the news if it was some widespread issue) for 2+ years.

At some point in the near future Apple will package the microcode fix in an
update, and that will be it.

~~~
donkeyd
I've run into an issue where switching users causes a crash on my 2016 MBP.
Many more people are having this issue, according to the Google.

Also, up until the end of 2016, Apple didn't use Skylake processors, so it
wouldn't be 2+ years.

------
Magnificents
The poor guys from OCaml who found the bug. Imagine how much debugging it
takes to find such an issue and narrow it down to the precise register
sequence. I guess since it’s a hyper threading bug it even depends on multiple
threads doing certain things at the same time. Usually you trust your CPU to
execute code properly.

------
ManyEthers
Intel's communication is incredibly poor. Errata exist for all CPUs but this
one is quite important and resulted in no proper public communication it
seems.

------
angry_octet
Am I hellbanned?

~~~
jacquesm
nope

~~~
angry_octet
I was for a while there. Heisenban?

~~~
jacquesm
What makes you think you were?

~~~
angry_octet
Couldn't see multiple comments I made in unrelated threads when reading from
an incognito tab. Unless comment delay for non logged in viewers is an
undocumented feature?

Or this is inception level hellban, where some bots and devil curators lurk.
What is even real.

Btw good to see your Lego contraption article in IEEE Spectrum.

~~~
vertex-four
Pages are cached for significant lengths of time for non-logged-in users.

------
natehouk
So if I understand correctly, this was known all along?

------
salex89
Great: pay a premium for a top-of-the-line CPU to get more than 4
threads, then disable it...

~~~
Frenchgeek
Only until a fix is available, if it isn't already.

~~~
Godel_unicode
It is, the issue being fixed was the only feedback the OCaml community
received.

