See Skylake, for example: the list of known errata starts on page 27 and continues through page 63: http://www.intel.com/content/dam/www/public/us/en/documents/...
http://www.guru3d.com/news-story/intel-atom-c2000-chips-are-...
Do CPU and GPU manufacturers do any type of fuzzing?
Here's an old, heavily cited paper from Intel on the topic; I'm sure their state of the art has advanced considerably in the 17 years since its publication:
http://dl.acm.org/citation.cfm?id=623013
Unfortunately fuzzing ultimately has a random component, so it can't really prove that you caught all of these bugs.
Unfortunately fuzzing ultimately has a random component
An AVX2 instruction that accepts 3 registers and outputs 1 has a 1024-bit problem space to validate.
This occurred in FMA3 with a ~512-bit problem space.
Repeat for _every_ instruction (HUNDREDS). You can see how a few bugs slip through the cracks. The problem space is as large as some cryptographic functions!! I'm honestly surprised we don't see more of them.
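To put rough numbers on the problem-space argument, here is a back-of-the-envelope sketch. The throughput figures (a billion test cases per second per core, a million cores) are my own assumptions for illustration, not anything from a vendor, and `years_to_exhaust` is just a hypothetical helper name.

```python
# Back-of-the-envelope: how long would exhaustive validation take?
# ASSUMPTIONS (not from the thread): 10**9 test cases per second per core,
# 10**6 cores running in parallel.
SECONDS_PER_YEAR = 365 * 24 * 3600


def years_to_exhaust(bits, tests_per_second=10**9, cores=10**6):
    """Whole years needed to try every case of a `bits`-wide problem space."""
    total_cases = 2 ** bits
    cases_per_year = tests_per_second * cores * SECONDS_PER_YEAR
    return total_cases // cases_per_year


if __name__ == "__main__":
    for bits in (64, 512, 1024):
        years = years_to_exhaust(bits)
        # Print the order of magnitude rather than the full integer.
        print(f"{bits}-bit space: a {len(str(years))}-digit number of years")
```

Under those assumptions a 64-bit space is exhausted in well under a year, while the 512-bit and 1024-bit spaces need far more than 10**100 years, which is why validation has to rely on directed tests, random sampling, and formal methods rather than enumeration.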
-- Keeping my earlier comment but it seems there's a microcode fix in testing currently.
http://forum.hwbot.org/showpost.php?p=480922&postcount=30
http://www.fudzilla.com/news/processors/43166-amd-confirms-r...
For a personal machine, it's probably not terrible if you are using Linux or something else where you can compile everything yourself. But running binaries built by someone else would be a crapshoot. I wonder how many games (of the Windows, AAA variety) use these multiply instructions?
There's also not a lot of detail in the article—like if the data has to be specific or just the combination of instructions is enough to crash it.
Does this mean disabling SMT will also fix the bug, or is that specific to this app?
I've heard that some things perform better on Ryzen with SMT off, supposedly because OS-level task schedulers still need to be optimized for it. But I still wonder if Ryzen's SMT implementation is on par with Intel's first implementations.
http://www.anandtech.com/show/2477/2
CPU errata like this aren't uncommon.
$ dmesg | grep -i microcode
[ 0.000000] microcode: microcode updated early to revision 0x1c, date = 2015-02-26
[ 1.415548] microcode: sig=0x306a9, pf=0x10, revision=0x1c
[ 1.415665] microcode: Microcode Update Driver: v2.2.
Most users never hit those bugs because modern OSes already apply microcode updates before execution (like the case above).
Of course, if AMD can't fix those bugs without performance regressions (remember the infamous TLB bug from earlier Phenoms?) it can be pretty bad. However, I don't think the majority of users need to be too cautious about it.
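If you'd rather check the loaded revision without digging through dmesg, here is a minimal sketch. It assumes an x86 Linux box, where /proc/cpuinfo carries a `microcode` field per logical CPU; other platforms may not expose it. `microcode_revisions` is a hypothetical helper name.

```python
import re


def microcode_revisions(cpuinfo_text):
    """Return the set of microcode revisions found in /proc/cpuinfo-style text."""
    # Each logical CPU has a line such as "microcode\t: 0x1c" (x86 Linux).
    return set(re.findall(r"^microcode\s*:\s*(\S+)", cpuinfo_text, flags=re.MULTILINE))


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print(microcode_revisions(f.read()))
    except OSError:
        print("could not read /proc/cpuinfo on this system")
```

More than one value in the set would mean the cores are not all running the same revision.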
[1] - https://en.wikipedia.org/wiki/Intel_Quark#Segfault_bug
(I also had a Phenom II for many years without any mysterious crashing problems, even though the chip was the correct revision and there was no BIOS erratum workaround enabled.)
But whether this specific sequence is ever run, that's another question, of course.
