So how exactly does one debug such a thing? I'm assuming it's not like a normal debugger?
They say "simple instructions", but reproducing it seems to involve running a specific program for "minutes or hours". In other words, the reporters haven't figured out what exactly triggers this, but Intel somehow has.
The bottom line is that, as a last-resort debugging method, all modern chips can do scan dumps, which means reading out the state of every flip-flop on the chip. Obviously I have no idea whether Intel needed that to find this bug. It's possible that a higher-level emulation or some other technique was sufficient. But when all else fails, you can still do scan dumps.
It's a process called scan chain insertion. As a chip designer, I can tell you that DFT (Design for Test, which scan chain falls under) can be a real PITA even though the tools are automated.
EDIT: Should have also said that it really isn't that big in terms of area... a register plus two wires is nothing in the scheme of things.
That sounds incredibly expensive to route. Every flip flop?
Almost all ICs nowadays already have scan chains connecting to 99+% of their internal flip flops.
This is because it has literally been decades since chips became too complex for reasonable "functional" tests in production. These tests are used not to verify that a design is correct, but to verify that the chip has been fabricated as designed, that there are no defective gates or wires.
By functional I mean (e.g. in the case of a CPU) writing instructions that exercise most of the combinational logic and most of the flip flops on the chip. Oftentimes even many millions of test vectors achieve less than 50% coverage.
Which is where scan comes in. A few external pins are needed. All the flip flops in the chip are then wired in series into groups called scan chains. Data can be shifted serially into the chip via these chains, then the mode is switched to exercise the "normal" logic between flip flops, then flipped again so the result data can be shifted out of the flip flops and verified by a tester.
It's still quite complex, especially when multiple clocks and multiple chains are involved. But it allows for 90%+ test coverage of the logic in a chip without requiring verification engineers to write functional test vectors. Instead, there are tools that automate the process of connecting flip flops in chains and in generating the patterns that exercise the logic between flip flops.
Kind of, but it's a chain. It works by shifting bits through all flip-flops until they've all "dropped out" of an I/O pin. So you don't need any expensive muxes or crossbars.
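To make that shift/capture/shift dance concrete, here's a toy model in Python. The chain length, the "logic", and the API are all invented for illustration; real chains are thousands of flops long and driven by a tester:

    # Toy model of one scan chain: flip-flops that either shift serially
    # (test mode) or capture the logic they feed (normal mode).
    class ScanChain:
        def __init__(self, length):
            self.flops = [0] * length

        def shift_in(self, pattern):
            # Serially shift a test pattern in through the scan-in pin,
            # one bit per test clock.
            for bit in pattern:
                self.flops = [bit] + self.flops[:-1]

        def capture(self, logic):
            # One functional clock: every flop captures the response of
            # the combinational logic to the current flop states.
            self.flops = logic(self.flops)

        def shift_out(self):
            # Serially shift the captured response out the scan-out pin.
            out = list(self.flops)
            self.flops = [0] * len(self.flops)
            return out

    # Pretend the logic between flops just inverts each bit.
    chain = ScanChain(8)
    chain.shift_in([1, 0, 1, 1, 0, 0, 1, 0])
    chain.capture(lambda bits: [b ^ 1 for b in bits])
    response = chain.shift_out()
    # The tester compares `response` against the value predicted from the
    # netlist; a mismatch localizes a defective gate or wire.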
It's actually one of the largest and most realistic real-world applications of TSP and related problems: choosing the order in which to stitch the flip flops together so that the total scan routing is as short as possible.
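For a flavor of that, here's a greedy nearest-neighbour chain-stitching sketch in Python. The flop names and placements are made up, and production DFT tools use far better TSP heuristics, but the objective is the same:

    import math

    # Hypothetical flop placements after place-and-route: (name, x, y).
    flops = [("ff_a", 0.0, 0.0), ("ff_b", 5.0, 1.0),
             ("ff_c", 1.0, 4.0), ("ff_d", 6.0, 5.0)]

    def dist(p, q):
        return math.hypot(p[1] - q[1], p[2] - q[2])

    def order_chain(flops):
        # Greedy nearest-neighbour tour: always hop to the closest
        # unstitched flop, (approximately) minimizing total scan wire.
        remaining = list(flops)
        chain = [remaining.pop(0)]
        while remaining:
            nxt = min(remaining, key=lambda f: dist(chain[-1], f))
            remaining.remove(nxt)
            chain.append(nxt)
        return chain

    print([name for name, _, _ in order_chain(flops)])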
I had a Gigabyte board (GA-Z170N-Gaming 5) that was very unstable with Skylake. The newly released beta BIOS update resolved the issue. Not sure that it's related to this issue directly but it did involve random freezes.
This talk from the 32c3 just two weeks ago looks at exactly that problem. They have "chicken bits" (not sure if that's the technical term :)) which can turn off certain features, and they can patch microcode.
> Software design and testing is hard, but what happens when each bug fix can cost months of delay and millions of dollars? In this talk we'll take a behind-the-scenes look at the challenges in the design of a very complex, yet critical piece of hardware: the modern x86 CPU.
Killbits is the one I've heard most commonly - as the talk says, most modern processors have a bunch because it's impossible to test every possible edge case a priori.
(Good recent examples include the AMD lockup bug that Matt Dillon found, and the nasty counter-example of the Intel Quark segfault bug, which can't be worked around so easily because the Quark doesn't have killbits. "Oops.")
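If you've never seen the idea, it's just a register full of feature-disable bits that steering logic checks, so a buggy optimization can be switched off in the field. A made-up Python illustration; no real CPU exposes anything like this API, and the feature names are invented:

    # Invented illustration of "killbits"/"chicken bits".
    LOOP_STREAM_DETECTOR = 1 << 0  # hypothetical feature bit
    UOP_FUSION           = 1 << 1  # hypothetical feature bit

    def run(op, chicken_bits=0):
        if op == "hot_loop" and not (chicken_bits & LOOP_STREAM_DETECTOR):
            return "fast path (feature enabled)"
        return "slower but known-safe fallback path"

    print(run("hot_loop"))                        # feature on
    print(run("hot_loop", LOOP_STREAM_DETECTOR))  # feature killed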
>most modern processors have a bunch because it's impossible to test every possible edge case a priori.
Only with the sort of engineering most current CPU manufacturers use.
It's possible to have formally verified hardware; it's just expensive with current tech. Tools are improving, though. I suspect the next cutting edge will be the unification of functional-language-to-hardware compilers like Clash or Lambda-CCC with fully dependently typed languages like Agda or Idris, where you can statically verify arbitrary properties through the type system.
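Even without dependent types you can get the flavor: for a tiny block you can simply check a property against the spec over the whole input space, which is (very loosely) what formal tools do at scale with SAT/SMT solvers and proof instead of enumeration. A toy Python sketch, with the circuit invented for illustration:

    from itertools import product

    def ripple_carry_add(a, b, width=4):
        # Gate-style ripple-carry adder: one full adder per bit position.
        carry, out = 0, 0
        for i in range(width):
            x, y = (a >> i) & 1, (b >> i) & 1
            out |= (x ^ y ^ carry) << i
            carry = (x & y) | (carry & (x ^ y))
        return out, carry

    # "Verification" by exhaustion: the implementation must match the
    # arithmetic spec for every possible input. Only feasible because
    # the state space here is 256 cases.
    for a, b in product(range(16), repeat=2):
        total, carry = ripple_carry_add(a, b)
        assert total | (carry << 4) == a + b, (a, b)
    print("all 256 cases match the spec")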
The CPU is also able to load microcode patches very early in the machine bring-up process. A microcode patch can be anything from as simple as a single hardware enable bit ("turn on this hardware unit") to as complex as "this instruction should be decoded as this sequence of micro-ops."
It's hard to know exactly what's wrong without more internal, never-to-see-the-light-of-day Intel information, but it's possible the patch is as simple as tweaking the decoding, or disabling an optimization. It's doubtful a fix like this would require disabling an entire instruction set or hardware unit (like the TSX fix required; those are fairly rare bugs, even for Intel).
Specifically, given that it's a bug that only manifests with FMA3 disabled and hyperthreading enabled, it's probably some kind of hardware scheduling deadlock, and the patch could be as simple as "when the decoder sees this sequence of instructions, add some strategically placed NOPs to avoid the deadlock."
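If that guess is right, the workaround could look (very schematically) like the sketch below; the mnemonics and the "bad pattern" are pure invention on my part:

    # Schematic sketch of a decoder-level workaround: when a known-bad
    # instruction pair is recognized, inject a NOP micro-op to break the
    # scheduling pattern that deadlocks. Everything here is invented.
    BAD_PATTERN = ("mulx", "mulx")  # hypothetical problematic pair

    def decode(instructions, patched=True):
        uops, prev = [], None
        for insn in instructions:
            if patched and (prev, insn) == BAD_PATTERN:
                uops.append("nop")  # strategically placed bubble
            uops.append("uop_" + insn)
            prev = insn
        return uops

    print(decode(["mulx", "mulx", "add"]))
    # -> ['uop_mulx', 'nop', 'uop_mulx', 'uop_add']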
I just had a newly-built Skylake desktop completely lock up twice in the past week while using a resource-intensive development tool. I'm curious as to whether this is related; I was freaking out over the possibility of my ~$1200 machine being defective in some way.
Let's hope that the planned BIOS update won't involve disabling an entire feature set like what happened with TSX:
http://www.anandtech.com/show/8376/intel-disables-tsx-instru...