AVX-512 does not do micro-op splitting on Intel server-class CPUs (it does on some Intel consumer CPUs).

Amusingly, Zen emulates AVX512 and AVX2 via micro-op splitting, and it performs better under some workloads.

The real issue is that path propagation delay across a 512-bit-wide datapath is extremely non-trivial and costs a shitload of power. Just `mov`'ing into the AVX-512 registers (initially, when AVX-512 is not warm) can stall the CPU for 10,000+ cycles as it tries to power on all those registers.
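For a sense of scale, here is a rough back-of-the-envelope sketch of what a 10,000-cycle stall means in wall-clock time. The clock speeds are assumptions picked for illustration, not measurements from any particular chip, and `stall_microseconds` is just a hypothetical helper name.

```python
def stall_microseconds(cycles: int, clock_ghz: float) -> float:
    """Convert a stall measured in core cycles to microseconds."""
    # 1 GHz is 1,000 cycles per microsecond.
    return cycles / (clock_ghz * 1_000)


# Assumed core clocks; the 10,000-cycle figure is the one quoted above.
for ghz in (2.0, 3.0, 4.0):
    print(f"{ghz:.1f} GHz: {stall_microseconds(10_000, ghz):.2f} us")
```

So a stall of that size would be on the order of a few microseconds at typical clocks, which only matters if the wide instructions are used rarely enough that the unit keeps powering back down.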

Someone analyzed AVX512 performance on the elusive Lenovo laptop? Link!

Also, source on the power-up stall? AVX(2) didn’t have that and I’m highly surprised AVX512 would. Agner at least claims the same reduced throughput during warmup, but I think he only has early silicon.

AVX(2) definitely had the power-up stall on many chips, including all client Skylake I think.

No, it had reduced throughput of AVX instructions while the ALUs powered up. Not a stall.

Yeah maybe you are right for Skylake client, I haven't tested carefully there, but I'll probably get around to it. This thread [1] indicates that it may have only been Haswell that had the halted portion.

As for Skylake-SP, however, that chip is reported to have both reduced throughput and fully halted periods [2].

Some have speculated it has to do with whether chips have an integrated voltage regulator (IVR): the models with an IVR being less capable of handling high dI/dt events. I don't know about that though (Skylake-SP still has external VR, right?).

[1] https://www.agner.org/optimize/blog/read.php?i=378#378

[2] https://software.intel.com/en-us/comment/1926876#comment-192...
