
Simple instructions for freezing a Skylake Processor - yuhong
https://communities.intel.com/thread/96157?start=15&tstart=0
======
pierrec
Better change the link to this:
[https://communities.intel.com/thread/96157?start=0&tstart=0](https://communities.intel.com/thread/96157?start=0&tstart=0)
(which was probably the intended link)

Let's hope that the planned BIOS update won't involve disabling an entire
feature set like what happened with TSX:

[http://www.anandtech.com/show/8376/intel-disables-tsx-instru...](http://www.anandtech.com/show/8376/intel-disables-tsx-instructions-erratum-found-in-haswell-haswelleep-broadwelly)

------
duncans
There was me thinking this would be about some crazy new overclocking
technique ...

------
conistonwater
So how exactly does one debug such a thing? I'm assuming it's not like a
normal debugger?

They say "simple instructions", but it seems to involve running a specific
program for "minutes or hours", so they haven't figured out what exactly
causes this, but Intel somehow has.

~~~
nhaehnle
I recommend this CCC presentation by an AMD engineer:
[https://media.ccc.de/v/32c3-7171-when_hardware_must_just_wor...](https://media.ccc.de/v/32c3-7171-when_hardware_must_just_work)

The bottom line is that as a last resort debugging method, all modern chips
can do scan dumps, which means reading out the state of every flip-flop on the
chip. Obviously I have no idea whether that was needed by Intel to find this
bug. It's possible that a higher-level emulation or some other technique was
sufficient. But when all else fails, you can still do scan dumps.

~~~
wyager
That sounds incredibly expensive to route. _Every_ flip flop?

~~~
trsohmers
It's a process called scan chain insertion. As a chip designer, I can tell you
that DFT (Design for Test, which scan chain falls under) can be a real PITA
even though the tools are automated.

EDIT: Should have also said that it really isn't that big in terms of area...
a register plus two wires is nothing in the scheme of things.

[https://en.wikipedia.org/wiki/Scan_chain](https://en.wikipedia.org/wiki/Scan_chain)
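
For anyone wondering what "a register plus two wires" buys you, here is a toy software model of a scan chain, not real DFT tooling. The names `ScanFlop` and `ScanChain` are made up for illustration. It assumes each flop has a mux choosing between its normal input and the previous flop's output, so with scan enable asserted the whole design behaves like one long shift register that can be clocked out one bit at a time.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScanFlop:
    """Toy flip-flop with a scan mux in front of it (hypothetical model)."""
    q: int = 0

    def clock(self, d: int, scan_in: int, scan_enable: bool) -> None:
        # The mux: take the scan input in test mode, the normal D input otherwise.
        self.q = scan_in if scan_enable else d


class ScanChain:
    """All flops stitched into one shift register via their scan inputs."""

    def __init__(self, length: int) -> None:
        self.flops: List[ScanFlop] = [ScanFlop() for _ in range(length)]

    def capture(self, values: List[int]) -> None:
        # Functional clock edge: every flop loads its normal D input.
        for flop, d in zip(self.flops, values):
            flop.clock(d=d, scan_in=0, scan_enable=False)

    def shift(self, scan_in: int = 0) -> int:
        # One scan clock: the last flop's value appears on scan-out,
        # and every flop takes the value of its predecessor.
        scan_out = self.flops[-1].q
        previous = [flop.q for flop in self.flops]
        for i, flop in enumerate(self.flops):
            source = scan_in if i == 0 else previous[i - 1]
            flop.clock(d=0, scan_in=source, scan_enable=True)
        return scan_out

    def dump(self) -> List[int]:
        # Shift once per flop; bits arrive last-flop-first, so reverse them
        # to report the state in chain order.
        bits = [self.shift() for _ in range(len(self.flops))]
        return bits[::-1]


chain = ScanChain(4)
chain.capture([1, 0, 1, 1])
state = chain.dump()
print(state)
```

Note that in this model the dump is destructive: after shifting everything out, the chain is full of whatever was fed into scan-in (zeros here).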

------
kup0
I had a Gigabyte board (GA-Z170N-Gaming 5) that was very unstable with
Skylake. The newly released beta BIOS update resolved the issue. Not sure that
it's related to this issue directly but it did involve random freezes.

------
MichaelBurge
That's interesting that a BIOS upgrade can fix this. Maybe the BIOS is able to
disable an instruction set like AVX?

~~~
ah-
This talk from the 32c3 just two weeks ago looks at exactly that problem. They
have "chicken bits" (not sure if that's the technical term :)) which can turn
off certain features, and they can patch microcode.

[https://media.ccc.de/v/32c3-7171-when_hardware_must_just_wor...](https://media.ccc.de/v/32c3-7171-when_hardware_must_just_work)

    
    
      Software design and testing is hard, but what happens when each bug fix can cost months of delay and millions of dollars? In this talk we’ll take a behind-the-scenes look at the challenges in the design of a very complex, yet critical piece of hardware: the modern x86 CPU.
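
As a rough mental model of what a chicken bit does (purely a sketch; the register layout and names like `DISABLE_FEATURE_B` are invented here, and real bits are vendor-specific), think of a configuration register where setting a bit switches one feature off, so a BIOS or microcode update can apply a workaround without new silicon:

```python
from enum import IntFlag


class ChickenBits(IntFlag):
    """Hypothetical feature-disable bits; not any real CPU's register layout."""
    NONE = 0
    DISABLE_FEATURE_A = 1 << 0
    DISABLE_FEATURE_B = 1 << 1
    DISABLE_FEATURE_C = 1 << 2


def apply_workaround(register: ChickenBits, to_disable: ChickenBits) -> ChickenBits:
    # Setting a bit turns the corresponding feature off; other bits are untouched.
    return register | to_disable


def is_enabled(register: ChickenBits, feature_bit: ChickenBits) -> bool:
    # A feature stays enabled as long as its disable bit is clear.
    return not (register & feature_bit)


register = ChickenBits.NONE
register = apply_workaround(register, ChickenBits.DISABLE_FEATURE_B)
print(is_enabled(register, ChickenBits.DISABLE_FEATURE_A))
print(is_enabled(register, ChickenBits.DISABLE_FEATURE_B))
```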

~~~
rincebrain
Killbits is the one I've heard most commonly - as the talk says, most modern
processors have a bunch because it's impossible to test every possible edge
case a priori.

(Good recent examples include the AMD lockup bug that Matt Dillon found, and
the nasty counter-example of the Intel Quark segfault bug, which can't be
worked around so easily because the Quark doesn't have killbits. "Oops.")

~~~
wyager
>most modern processors have a bunch because it's impossible to test every
possible edge case a priori.

Only with the sort of engineering most current CPU manufacturers use.

It's possible to have formally verified hardware; it's just expensive with
current tech. Tools are improving, though. I suspect the next cutting edge
will be the unification of functional-language-to-hardware compilers like
Clash or Lambda-CCC with fully dependent languages like Agda or Idris, where
you can statically verify arbitrary properties through the type system.

~~~
throwupper247
I've just written an Idris program that determined "your mother's phone
number"; how's that for arbitrary facts? (Tip of the hat to xkcd.)

------
Rolpa
I just had a newly-built Skylake desktop completely lock up twice in the past
week while using a resource intensive development tool. I'm curious as to
whether this is related - I was freaking out over the possibility of my ~$1200
machine being defective in some way.

