Amusing that Zen emulates AVX512 and AVX2 via micro-op splitting, and yet it performs better under some workloads.
The real issue is that propagation delay across 512-bit-wide data paths is extremely non-trivial, and driving them costs a shitload of power. Just `mov`'ing into the AVX-512 registers (initially, when AVX-512 is not warm) can stall the CPU for 10,000+ cycles as it tries to power on all those registers.
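For a sense of scale, here is a rough back-of-the-envelope conversion of that 10,000-cycle figure into wall-clock time. The clock speeds are assumptions picked for illustration, not measurements of any particular chip, and `stall_microseconds` is just a hypothetical helper name.

```python
def stall_microseconds(cycles: int, clock_ghz: float) -> float:
    """Convert a stall measured in core cycles to microseconds."""
    # 1 GHz is 1000 cycles per microsecond, so divide by (GHz * 1000).
    return cycles / (clock_ghz * 1e3)


# Assumed core clocks; the 10,000-cycle count is the claim from the comment.
for ghz in (2.0, 3.0, 4.0):
    print(f"{ghz:.1f} GHz: {stall_microseconds(10_000, ghz):.2f} us")
```

So if the claim holds, the stall would be on the order of a few microseconds per warm-up, which matters mostly for code that touches the wide registers only occasionally.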
Also, source on the power up stall? AVX(2) didn’t have that and I’m highly surprised AVX512 would. Agner at least claims the same reduced throughput during warmup, but I think he only has early silicon.
As for Skylake-SP, however, that chip is reported to have both reduced throughput and fully halted periods.
Some have speculated it has to do with whether chips have an IVR (integrated voltage regulator): the models with an IVR being less capable of handling high dI/dt events. I don't know about that though (Skylake-SP still has external VR, right?).