In my real-world benchmarks I’ve observed the same: the jump from AVX2 to AVX-512 is usually underwhelming. Is this widely recognized and understood, or just anecdotal?
On the Xeon Phi, IIRC, it was an improvement. When AVX-512 was added to larger, more power-hungry cores, it initially required downclocking for thermal reasons, which somewhat defeated the purpose. I hear newer cores can run it with less severe power-limiting measures.
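
To put rough numbers on why downclocking eats the gain, here is a back-of-the-envelope sketch. Everything in it is an assumption for illustration: the function name `estimated_speedup`, the Amdahl-style model, the idea that the clock drop applies to the whole workload, and the example inputs. None of these are measurements from any particular CPU.

```python
def estimated_speedup(vector_fraction, width_ratio=2.0, clock_ratio=1.0):
    """Estimate speedup over an AVX2 baseline (hypothetical model).

    vector_fraction: share of baseline runtime spent in vectorized code (0..1)
    width_ratio:     vector width relative to AVX2 (512 / 256 = 2.0)
    clock_ratio:     clock under AVX-512 relative to the AVX2 clock (<= 1.0)

    Simplifying assumption: the lower clock slows the scalar part too.
    """
    if not 0.0 <= vector_fraction <= 1.0:
        raise ValueError("vector_fraction must be between 0 and 1")
    if width_ratio <= 0 or clock_ratio <= 0:
        raise ValueError("width_ratio and clock_ratio must be positive")

    # Baseline runtime is normalized to 1.0.
    scalar_time = (1.0 - vector_fraction) / clock_ratio
    vector_time = vector_fraction / (width_ratio * clock_ratio)
    return 1.0 / (scalar_time + vector_time)


if __name__ == "__main__":
    # Illustrative inputs only, not benchmark results.
    for fraction, clock in [(1.0, 1.0), (0.5, 1.0), (0.5, 0.85), (1.0, 0.5)]:
        speedup = estimated_speedup(fraction, clock_ratio=clock)
        print(f"vectorized={fraction:.0%} clock={clock:.2f} -> {speedup:.2f}x")
```

Under this model, a fully vectorized workload only breaks even if the clock is halved, and a half-vectorized one starts from a ceiling of about 1.33x before any downclock is applied.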