I think you’re over-focusing on the AVX512 argument. You even said yourself it was a product release that happened months after he made the comments. Was he supposed to know what changes Intel made to AVX512 before they announced them? Also, he’s not even talking about clock downscaling. He’s saying that AVX512 is the wrong way to get significantly more performance relative to the amount of space and power it consumes (definitely true for most applications and 100% true for kernel work), and that Intel would be better served by focusing on general perf (which isn’t wrong given how they’ve been trounced by AMD and Apple).
Consider also that Linus worked at Transmeta for a relatively long time trying to build a CPU company. I would trust his opinions on chip ISA design more than most software developers’.
You’ve basically taken his post out of context, misinterpreted it to be saying something he isn’t, and claimed it’s wrong because some details on the ground have changed since he made it.
This is the problem with expert opinion. It’s so easy to misinterpret. It’s also possible to get some details wrong without invalidating the larger point. Figuring out when the expert is wrong is hard, but claiming that Linus doesn’t understand chip design simply ignores the context of Linus’ perspective and his actual professional experience.
> Also, he’s not even talking about clock downscaling
Quoting the post in question:
> not with some AVX512 power virus that takes away top frequency (because people ended up using it for memcpy!)
This is pretty obviously talking about the down-clocking issue.
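For concreteness, this is roughly the pattern in question (my own sketch, not code from any actual project): a plain copy loop written with 512-bit vectors. On Skylake-X, keeping the 512-bit units busy like this could drag the whole core down to a lower frequency license, which is exactly the downclocking he's referring to.

    #include <immintrin.h>
    #include <stddef.h>

    /* copy n bytes with 512-bit loads/stores; names are mine, purely illustrative */
    void copy_avx512(void *dst, const void *src, size_t n)
    {
        size_t i = 0;
        for (; i + 64 <= n; i += 64) {                        /* 64 bytes per iteration */
            __m512i v = _mm512_loadu_si512((const char *)src + i);
            _mm512_storeu_si512((char *)dst + i, v);
        }
        for (; i < n; i++)                                    /* scalar tail */
            ((char *)dst)[i] = ((const char *)src)[i];
    }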
> You’ve basically taken his post out of context, misinterpreted it to be saying something he isn’t, and claimed it’s wrong because some details on the ground have changed since he made it.
AVX on Sandy Bridge was worthless: it wasn't any faster than SSE because Sandy Bridge was still 128-bit (even on AVX code).
Anyone who has followed ISA development for decades would recognize that the first implementation of an ISA is often sub-par and sub-optimal. It wasn't until Haswell that AVX was faster than SSE in practice.
AVX512 was first implemented on Xeon Phi, then ported over to servers in Skylake-X. From there, AVX512 remained a "1st generation" product for a few years (much like AVX on Sandy Bridge, before Haswell). Given the history of these ISA extensions, we can generally expect downclocking issues or inefficiencies in the first few implementations.
-----
In fact, some portions of AVX still remain suboptimal: vgather, for example, isn't really much faster than standard loads/stores. Maybe it'll stay suboptimal forever, or maybe some hypothetical future CPU will optimize it.
Either way, I know that I shouldn't make an argument based on the "stupidity of vgather" instruction. There's an "obvious" optimization that could happen if that instruction becomes popular enough to deserve the transistors. At best, if I were to argue about vgather instructions, I'd preface it with "current implementations of vgather on Skylake-X", or similar wording.
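To make the vgather point concrete, here's a minimal sketch (AVX2 gather, function names are mine): on the Skylake-era parts I'm talking about, the single gather instruction gets cracked into essentially the same individual loads the scalar loop performs, which is why it often isn't much faster.

    #include <immintrin.h>

    /* one gather instruction: load 8 ints through an index vector */
    __m256i gather_simd(const int *base, __m256i idx)
    {
        return _mm256_i32gather_epi32(base, idx, 4);   /* scale = sizeof(int) */
    }

    /* the equivalent hand-written loads it competes against */
    void gather_scalar(const int *base, const int idx[8], int out[8])
    {
        for (int i = 0; i < 8; i++)
            out[i] = base[idx[i]];
    }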
CPUs change over time, much like programming languages or compilers. I expect a degree of forward-looking ability from experts, not for someone to get lost in the details of current implementations.
I don’t think I’ve ever met an engineer who’s a domain expert who talks like that. Unless you have actual specific information about future plans and reason to believe in them, expecting everyone to qualify their prognostication with “this generation but it might change in some hypothetical future” is just not useful. That’s generally true of all technical topics. What’s stated today can change in the future. These aren’t physical limits being discussed.
Finally, Linus’ main advice is to stop focusing on AVX and focus on architectural improvements. It took an enormous number of engineers to implement AVX512 and then even more to get rid of the power hit. I trust Linus (and back that up with my own experience and observations in the industry) that Intel would have been far better served focusing on architectural improvements like Apple did. It’s a blind spot Intel has: they know how to bump clocks and create x86 extensions, but they don’t trust architectural improvements to deliver massive gains. Hell, this isn’t the first time. Netburst (their earlier bet on raw clock speed) screwed up so badly they had to return to the P3 architecture with the Core line. Intel has the engineers, but their management structure must be incredibly bloated and inefficient to fail with such regularity.
> I don’t think I’ve ever met an engineer who’s a domain expert who talks like that. Unless you have actual specific information about future plans and reason to believe in them, expecting everyone to qualify their prognostication with “this generation but it might change in some hypothetical future” is just not useful. That’s generally true of all technical topics. What’s stated today can change in the future. These aren’t physical limits being discussed.
If I had claimed, 5 or 10 years ago, that ARM sucks because it's in-order, I'd have been making a similar mistake to the one Linus made. There's nothing about the ARM ISA that forces in-order execution; it was just the norm (back then, in-order chips were considered more power-efficient).
There's nothing in AVX512 that says "you must use more power than other instructions". That was just how it was implemented in Skylake-X. And lo and behold, one microarchitecture later, the power issue is solved and a non-issue.
Similarly, ARM chips over the past decade switched to out-of-order style for higher performance. Anyone who was claiming that ARM was innately in-order would have been proven wrong.
----------
Maybe it takes experience to see the ebbs and flows of the processor world. But... that's what I'd expect an "expert" to be. Someone who can kind of look forward through that.
I mean, I've watched NVidia GPUs go superscalar (executing INT + FP simultaneously). I've watched ARM go from largely in-order microcontrollers into out-of-order laptop-class processors. I've watched instruction sets come and go with the winds.
------
If that's not good enough for you, then remember that Centaur implemented AVX512 WITHOUT power-scaling issues back in 2019, still predating Linus's rant by months. A chip did exist without AVX512 power throttling, and he still ranted in that manner.
Nowhere did Linus say that AVX512 being a power sink is an unavoidable consequence and should be dropped for that reason. He just said "AVX512 sucks right now - they should spend their resources elsewhere because the ROI isn't there". When he made that statement he was factually correct - Intel had spent a lot of resources bringing AVX512 to market, the first version was shit (which makes it tricky to decide when turning on AVX512 support is actually worth it), and they had to spend even more correcting it.
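To illustrate why that decision is tricky, here's a rough dispatch sketch (the function names and the size cutoff are hypothetical; the 512-bit path would be something like a wide copy loop): the CPUID feature bit only tells you the instructions exist, not whether using them is a win once downclocking is accounted for.

    #include <stddef.h>
    #include <string.h>

    void copy_avx512(void *dst, const void *src, size_t n);   /* hypothetical 512-bit path */

    void copy_dispatch(void *dst, const void *src, size_t n)
    {
        /* GCC/Clang builtin: true whenever the CPU reports AVX512F */
        if (__builtin_cpu_supports("avx512f") && n >= 4096) {  /* arbitrary cutoff, illustrative */
            /* even then, whether this wins depends on the microarchitecture and
               on how much of the surrounding code pays the frequency penalty */
            copy_avx512(dst, src, n);
        } else {
            memcpy(dst, src, n);
        }
    }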
If you had said 10 years ago "ARM sucks because it's in-order", you wouldn't be factually wrong that in-order sucks for performance, but you would be factually wrong because the first iPhone was already partially out-of-order, and 10 years ago you'd be looking at the iPhone 4, a superscalar Cortex-A8 design. Also, in-order is fantastic for battery life - that's why Cortex-M4 and below are still in-order. Note that my saying this doesn't imply it will remain the case for all time, which is how you seem to be interpreting statements like that, and which would be an extremely poor interpretation on the part of the reader.