Let's hope that the planned BIOS update won't involve disabling an entire feature set like what happened with TSX:
They say "simple instructions", but reproducing it seems to involve running a specific program for "minutes or hours". So they haven't figured out exactly what causes this, yet Intel somehow has.
The bottom line is that as a last resort debugging method, all modern chips can do scan dumps, which means reading out the state of every flip-flop on the chip. Obviously I have no idea whether that was needed by Intel to find this bug. It's possible that a higher-level emulation or some other technique was sufficient. But when all else fails, you can still do scan dumps.
EDIT: Should have also said that it really isn't that big in terms of area... a register plus two wires is nothing in the scheme of things.
Almost all ICs nowadays already have scan chains connecting to 99+% of their internal flip flops.
This is because, for decades now, chips have been too complex for reasonable "functional" tests in production. These tests are used not to verify that a design is correct, but to verify that the chip has been fabricated as designed, with no defective gates or wires.
By functional I mean (e.g. in the case of a CPU) writing instructions that exercise most of the combinational logic and most of the flip flops on the chip. Oftentimes it takes many millions of test vectors to achieve less than 50% coverage.
Which is where scan comes in. A few external pins are needed. All the flip flops in the chip are then wired in series in groups of scan chains. Data can be shifted in serially to the chip via these chains, then the mode is switched to exercise the "normal" logic between flip flops, then flipped again so the result data can be shifted out of the flip flops and verified by a tester.
It's still quite complex, especially when multiple clocks and multiple chains are involved. But it allows for 90%+ test coverage of the logic in a chip without requiring verification engineers to write functional test vectors. Instead, there are tools that automate the process of connecting flip flops in chains and in generating the patterns that exercise the logic between flip flops.
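To make the shift-in / capture / shift-out cycle concrete, here is a toy sketch in Python. It assumes a single chain, a single clock, and three flip flops; `ScanChain`, `run_pattern`, `good_logic`, and `faulty_logic` are made-up names for illustration, not anything from a real DFT tool.

```python
from typing import Callable, List

Logic = Callable[[List[int]], List[int]]


class ScanChain:
    """Toy model: flip flops wired in series as one scan chain."""

    def __init__(self, length: int, logic: Logic) -> None:
        self.flops = [0] * length
        self.logic = logic  # the "normal" logic between flops (hypothetical)

    def shift(self, scan_in: int) -> int:
        """Scan mode, one clock: every flop takes its neighbor's value."""
        scan_out = self.flops[-1]
        self.flops = [scan_in] + self.flops[:-1]
        return scan_out

    def capture(self) -> None:
        """Functional mode, one clock: flops load the logic outputs."""
        self.flops = self.logic(self.flops)


def run_pattern(chain: ScanChain, pattern: List[int]) -> List[int]:
    """Shift a pattern in, pulse the normal logic once, shift the result out."""
    for bit in reversed(pattern):       # last bit goes in first
        chain.shift(bit)
    chain.capture()
    out = [chain.shift(0) for _ in pattern]
    return out[::-1]                    # back into flop order


def good_logic(s: List[int]) -> List[int]:
    """The logic as designed (an arbitrary example circuit)."""
    return [s[0] ^ s[1], s[1] & s[2], 1 - s[2]]


def faulty_logic(s: List[int]) -> List[int]:
    """Same circuit with a fabrication defect: AND output stuck at 0."""
    out = good_logic(s)
    out[1] = 0
    return out


if __name__ == "__main__":
    pattern = [1, 1, 1]
    expected = run_pattern(ScanChain(3, good_logic), pattern)
    observed = run_pattern(ScanChain(3, faulty_logic), pattern)
    print("expected", expected, "observed", observed,
          "defect detected:", expected != observed)
```

Note that the pattern `[1, 0, 1]` gives the same result on both versions, so it would miss this defect. That is why the pattern-generation tools matter: they have to pick patterns that actually expose each possible fault.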
The following Wikipedia article touches on the process: https://en.wikipedia.org/wiki/Scan_chain
It's actually one of the largest and most realistic real-world applications of TSP and related problems.
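Assuming the TSP connection here is about the order in which flip flops get stitched into a chain (visit every flop once while keeping the total wire short), a rough sketch looks like the following. The coordinates are invented, `chain_order` and `wire_length` are hypothetical helpers, and greedy nearest-neighbor is just the simplest heuristic to show, not what production tools necessarily use.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def chain_order(positions: Sequence[Point]) -> List[int]:
    """Greedy nearest-neighbor ordering, starting from flop 0."""
    remaining = set(range(1, len(positions)))
    order = [0]
    while remaining:
        last = positions[order[-1]]
        # Pick the closest unvisited flop; break ties by index.
        nxt = min(remaining, key=lambda i: (math.dist(last, positions[i]), i))
        order.append(nxt)
        remaining.remove(nxt)
    return order


def wire_length(positions: Sequence[Point], order: Sequence[int]) -> float:
    """Total straight-line wire needed to connect flops in the given order."""
    return sum(math.dist(positions[a], positions[b])
               for a, b in zip(order, order[1:]))


if __name__ == "__main__":
    flops = [(0, 0), (10, 0), (1, 0), (9, 0)]   # made-up placements
    naive = list(range(len(flops)))
    greedy = chain_order(flops)
    print("naive ", naive, wire_length(flops, naive))
    print("greedy", greedy, wire_length(flops, greedy))
```

A chain is an open path from scan-in to scan-out rather than a closed tour, so this is strictly the path variant of TSP.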
Software design and testing is hard, but what happens when each bug fix can cost months of delay and millions of dollars? In this talk we’ll take a behind-the-scenes look at the challenges in the design of a very complex, yet critical piece of hardware: the modern x86 CPU.
(Good recent examples include the AMD lockup bug that Matt Dillon found, and the nasty counter-example of the Intel Quark segfault bug, which can't be worked around so easily because the Quark doesn't have killbits. "Oops.")
Only with the sort of engineering most current CPU manufacturers use.
It's possible to have formally verified hardware; it's just expensive with current tech. Tools are improving, though. I suspect the next cutting edge will be the unification of functional-language-to-hardware compilers like Clash or Lambda-CCC with fully dependent languages like Agda or Idris, where you can statically verify arbitrary properties through the type system.
It's hard to know exactly what's wrong without more internal, never-to-see-the-light-of-day Intel information, but it's possible the patch is as simple as tweaking the decoding, or disabling an optimization. It's doubtful a fix like this would require disabling an entire instruction set or hardware unit (like the TSX fix required; those are fairly rare bugs, even for Intel).
Specifically, given that it happens to be a bug that only happens with FMA3 disabled and hyperthreading enabled, it's probably some kind of hardware scheduling deadlock, and the patch could be as simple as "when the decoder sees this sequence of instructions, add some strategically placed NOPs to avoid the deadlock."