Hacker News
AMD Ryzen 7040 Series Shows Great AVX-512 Performance for Laptops / Mobile (phoronix.com)
101 points by mfiguiere on July 13, 2023 | 50 comments



AVX-512 is one of those technologies I thought I’d use all the time, but in practice all of the compute heavy workloads I use just moved to the GPU.

The real benefit of having AVX-512 client side is that I can test hyper-optimized server side code for unique operations.

The problem is that AVX-512 isn’t a singular instruction set. Different CPUs support different sets of AVX-512 instructions, meaning the instruction set on this AMD CPU might not match what our servers support anyway. Fortunately the common instructions are still useful.


Intel loves creating product segmentation to find reasons to charge some users more, but doing it in the instruction set is stupid: outside of a few highly optimized applications, developers target what the majority of hardware supports. That leads to poor uptake and eventual removal. Not sure how Intel hasn't learned this after it has happened so many times.


Because it's presumably still profitable for them. A very small subset of users with very deep pockets finds value in those higher-tiered products and pays for them anyways. They don't care that adoption isn't high because it stops being a selling point once they have "something better" to replace it.


This sort of worked with AVX-512. Sadly, it did not work with Optane. Optane could have completely revolutionized the way e.g. mobile devices work, creating a huge market, if it had been pushed incessantly to gain mindshare instead of being scarcely available in pricey high-end SSDs. A similar thing, sadly, happened to Plan 9.


Sure, but Plan 9 was never going to work. It didn't take security seriously until its last release, and even then only barely. Plan 9's ideas live on in Golang, and some parts of it made their way into Linux.

Optane is just Intel's hubris at work. Also the fact that AMD finally started spanking them in the CPU market.


Intel and AMD seem to have settled on a large subset of mutually supported AVX-512 instructions (since Ice Lake and Zen 4).

https://en.wikipedia.org/wiki/AVX-512#CPUs_with_AVX-512


Zen 4 is still lacking the FP16 instructions that Sapphire Rapids has.

Someone in the Clear Linux IRC told me that the distro will settle on a new x86 level sometime after Sapphire Rapids, once AVX512 is sufficiently differentiated from the x86-64-v4 AVX512 "baseline." I wonder if this means AMD will follow suit?


While Zen 4 is lacking the instructions for machine learning added by Sapphire Rapids, which was launched well after Zen 4, it includes the BF16 instructions introduced by Cooper Lake (and also the Vector Neural Network Instructions introduced by Cascade Lake).

Zen 4 includes all the AVX-512 instructions that existed at its launch and which had not been already deprecated by Intel (Intel has deprecated some AVX-512 instruction groups provided by Knights Landing and Knights Mill and also one AVX-512 instruction group introduced by Tiger Lake).


Oh yes. FP16 is awesome for inference-only machine learning tasks. No need to fire up the GPU, and it's so much easier to use.


Yeah, I didn't mean FP16 specifically, just that AMD/Intel hopefully won't go out on a limb and try to seriously bifurcate AVX512.


The things I want AVX512 for in a laptop:

- AV1 encoders

- VapourSynth

- llama.cpp

- whisper.cpp

- numpy

...That's about it. Autovectorized AVX512 system packages from CachyOS, Clear Linux and such would be cool, but I can live without that.

AV1/VapourSynth are branchy, and actually a reasonable fit for CPUs. Everything else seems like it would work well on IGPs, but alas it has not been done yet. shrug


AVX is very useful for audio DSP when you care about latency. Which is mostly in the pro audio niche.


Would love to get my Acustica Audio plugins to use less CPU. I've heard good things about the 7XXX processors for this.


Linux has this “pulseeffects” package with some nice features (eq, left-right balance), but it is a real CPU hog (especially for something that sits in the background all the time). Kinda wonder if it is worth re-compiling for avx-512…


>> Linux has this “pulseeffects” package with some nice features (eq, left-right balance), but it is a real CPU hog

Yeah, but one or two cores should handle it just fine. If it really is a hog, it probably needs some optimization. Real-time audio DSP has been a thing for decades and should not be a hog.


It is running on my laptop so I’m probably being unusually picky.


Looking at the source, it looks like pulseeffects doesn't run any of the DSP itself. It uses a number of lv2 format plugins to process the audio.


>The problem is that AVX-512 isn’t a singular instruction set.

Sure, but the bigger and (imo) the actual problem is that AVX-512 was never popular enough or widely adopted in the eyes of consumers buying hardware (until now, I guess). Intel pioneered it somewhat prematurely: just like with Optane, they tried to make it work so fast that they forgot people actually need some time to adopt it. [Key point: adoption follows need, and the need was low because virtually no dev targeted AVX-512 for the masses to 'need' it.]

In this case AVX-512 is a thing that can benefit way more people than the Optane audience (I'm giving this example because Optane died somewhat similarly to how Intel abandoned AVX-512). Right now AMD ironically steers the AVX-512 ship, because what matters most is whether many people can get their hands on it: that drives software adoption and makes it important to devs. Intel should humbly come back to AVX-512 now that AMD seems to be (hopefully) slowly but surely putting it in all their chips.


> The problem is that AVX-512 isn’t a singular instruction set. Different CPUs support different sets of AVX-512 instructions

CPUs supporting similar but different enough extensions is a real pain in the ass, this is why everyone should use Gentoo with custom compiler flags.


> The problem is that AVX-512 isn’t a singular instruction set. Different CPUs support different sets of AVX-512 instructions, meaning the instruction set on this AMD CPU might not match what our servers support anyway. Fortunately the common instructions are still useful.

People tend to overstate the complexity of AVX-512, though. The chronological order of feature introduction isn't the order in which products actually shipped, since the client, server, and Xeon Phi lines each followed their own independent development process.

No product segment has ever regressed AVX feature support under Intel (other than the obvious rugpull at the end with Alder Lake).

Alder Lake is a strict superset of Tiger Lake, Tiger Lake is a strict superset of Ice Lake, Ice Lake is a strict superset of Cannon Lake, Cannon Lake is a strict superset of Skylake-X.

(Rocket Lake is technically Ice Lake/Sunny Cove backported to 14nm, and its direct product predecessor is Skylake-X, not Tiger Lake, which was never officially released on desktop.)

Sapphire Rapids is a strict superset of Cooper Lake, Cooper Lake is a strict superset of Cascade Lake, Cascade Lake is a strict superset of Skylake-X/Skylake-SP.

For Xeon Phi, Knights Mill is a strict superset of Knights Landing. And you completely don't need to care about it unless your employer ends in "national labs".

Again, the thing to remember is that dev teams at Intel have been very siloed for a long time, and between 10nm delays, 14nm capacity shortages, server delays, etc, it has been a long time since any of the product teams hit their chronological roadmaps. So the point in time when any product segment rolls out a uarch generation is very hit-or-miss (Golden Cove was 12th-gen/Alder Lake on desktop but is only releasing in Sapphire Rapids servers this year), and client and server are also diverging somewhat in featureset.

(early 14nm didn't go well for Intel either - it's really been a rough decade for them, they have absolutely spun their wheels and barely managed to get product out the door for a long time now.)

But there is never a case where you take a given server/client product segment and drop in the newer thing and regress a feature (up until the final rugpull with Alder Lake). It's just different progression levels of the "common core" as you advance through uarch generations/Coves, and then server/client having a few different things bolted on the sides (but they've never regressed these things).

It's true, though, that AMD didn't do quite the same extensions as Intel either. But they're a lot closer to Alder Lake ("supports everything", until it didn't) than not. And this is simply part of developing a new standard: AMD had extensions like SSE4a and 3DNow! that didn't pan out either.

I made this chart a while ago by taking the chronological ordering of the features and rearranging them by the series in which they were introduced. Sapphire Rapids isn't quite right: it wasn't on the chart at the time so I added it, and it doesn't support VAES/VP2INTERSECT after all. But you can see the monotonic capability progression pretty clearly. If you order it by product generation, it's way less intimidating than people make it.

https://i.imgur.com/2HLrIjr.png


> No product segment has ever regressed AVX feature support under Intel (other than the obvious rugpull at the end with Alder Lake).

You can really extend this a bit further: very few x86 ISA features have been dropped by newer processors. The complete list of such features as of Raptor Lake is MPX, TSX (including hardware lock elision), SGX, Branch Monitoring Counters, and Power Aware Interrupt Routing, with an additional asterisk on AVX-512. There is also talk of a future processor dropping support for 16-bit stuff so that you can no longer run Windows 3.1 on your machine without full-on emulation.

The asterisk with AVX-512 is that processors with P cores and E cores don't support AVX-512, whereas the P core lineage did support AVX-512 (and in early Alder Lake steppings, you could enable support for AVX-512 if you disabled all the E cores).


Also the branch direction hints of Pentium 4 have been dropped.

Several AVX-512 instruction groups of Knights Mill/Knights Landing have been dropped, as have the complete 512-bit instruction sets of Knights Corner and Knights Ferry, but these were never mainstream.


> No product segment has ever regressed AVX feature support under Intel

Actually, it appears that AVX-512 VP2INTERSECT, which was introduced by Tiger Lake (and which is not supported by Zen 4), has been deprecated and will be missing from future Intel CPUs.

This will not be a loss, because even on Tiger Lake alternative instruction sequences could perform the same operation faster.


Out of curiosity, do you have any examples of workloads that you have shifted wholesale to the GPU instead of AVX?


Rendering is a transition in progress.

Some configs (like AMD Radeon + Windows on Cycles) are not yet supported, but looking in from the outside, CPU rendering looks like it's quickly becoming obsolete.


The real benefit of AVX-512 is not even the "512" part: even when used with 256-bit registers, the masking and the orthogonality of the instruction set make life so much easier for compiler writers and for those who deploy intrinsics manually.


I am torn between the i9-13900K and the 7900X. Anyone have Linux workstation experience with one of these? Missed a big discount on the 7900 yesterday (Prime Day) but have been leaning towards Intel. Just wondering how Linux does with the performance vs. efficiency cores.


The 13900K has more cores and depending on your use case that might be nice. As a software developer, I find the "efficiency cores" of the 13900K to be similar in steady-state throughput to the "performance cores" for embarrassingly parallel workloads like C++ builds or ETL from JSON. An analogy I have made before is the e-cores are like having a Xeon Gold 6130 as a coprocessor. If your use-case is different than mine, you might have a different experience.

I also find it amusing, although not professionally relevant, to experiment with the unusual abilities of the 13900K that descend from its Atom lineage: clusters of 4 e-cores that share L2, and the WAITPKG ISA extension that makes switching threads extremely fast. On the other hand the 7900X has AVX-512 and I would rather have that instead.


I haven't touched a 7900X, but AMD Rome and Milan have core complexes (CCXs) that each share 16MB of L3 cache. AMD Rome has significantly faster L3 cache than Intel (https://www.anandtech.com/show/14694/amd-rome-epyc-2nd-gen/7) and generally more of it overall, but it is split among many CCXs. This fragmentation can be a problem.

On the 13900K...

At home, I have a 13900K (128G 4090 8T+8TSSD) and use Mathematica often for personal use. I installed it initially on Windows 11 Pro and got abysmal performance in benchmarks (worse than a 12900-equivalent processor). With help from Wolfram customer support, I switched to Linux (Ubuntu): the performance is on par with what I'd expect.

My hypothesis is that either the Windows 11 Pro scheduler optimizes for a different performance point than Linux, or that it was oblivious to the performance vs. efficiency core split (the 13900K has double the efficiency cores of the previous generation, so if you pick randomly you have a much higher chance of ending up on a slow core).

This was in the previous minor release of Mathematica; I hope they implemented a workaround in the latest version. I don't think it was a Mathematica bug, but an OS-level problem (I haven't tried, but I think it would be easy to check): many report that Stable Diffusion works better on Linux than on Windows.

My personal toy project was a reinforcement learning thing, for which the CPU was the bottleneck on Windows (performance on par with MacStudio M1 Max) and I saw nearly 2x improvement switching to Linux.


I will need to do some digging on AVX-512. This is my current dev box, which is pretty old by today's standards. https://i.imgur.com/ebwJlBi.png


Coming off that machine, I guess I would consider a Xeon w7-3455 even though it costs a lot more. Definitely more comparable to that league of machine.



Phoronix test suite should have the data.


The i9 wins in most workloads, and it supports DDR4, which is a perk since I don't particularly see a need for DDR5 and the former is a lot cheaper.


You may also want to consider power and heat; I don't have numbers to share but I think 13900k is understood to be hot and power hungry.


One thing I kinda regret about going with DDR4 13th-gen: losing the ability to have 48GB per DIMM slot; instead I'm maxed out at 32GB per slot. I'm on a mini-ITX system with 2 DIMM slots.


I like the 7945HX.


The GP is asking about desktop CPUs; the 7945HX is a laptop CPU.


The 7945HX is very powerful and worth looking into, based on cpubenchmark. Frankly, I don't know much about its AVX-512 performance though.


Are these benchmarks applicable to any of y'all on HN?

I was excited to read this looking for media encoding/decoding and GenAI inference benchmarks... Only to find nothing really applicable. Stuff like CPU raytracing, CPU resnet, and gender detection seems far removed from what I would actually do on a 7000 series APU.


Yet another AMD performance evaluation where the author is comparing against Intel CPUs from years ago. I get that the author might have only had access to these older Intel chips, but why go through the process of doing a lengthy comparison that really isn't worthwhile?


Intel's later mobile processors don't have AVX-512, so this comparison is fair.


In addition to the linked articles from the start of this article (including Milan performance), they've previously compared Ice Lake vs Sapphire Rapids vs Genoa (https://www.phoronix.com/review/intel-sapphirerapids-avx512)


Newer Intel CPUs do not support AVX512. Tiger Lake is the most recent laptop CPU that does.


Didn't Intel kill AVX512 support on their 12th gen CPUs?


Intel removed AVX512 from their recent consumer chips, no?


Yes, but wouldn’t you have found this review more valuable if the comparison was made with a recently available Intel CPU that supported AVX-512 versus one from 2019?


Tiger Lake is the most recent Intel laptop CPU to officially support AVX-512... thus you can't do an Intel AVX-512 on/off comparison with any newer Intel laptop.

If you just want to see how the new AMD 7840U compares to a recent Intel Alder Lake, that's already been covered separately in: https://www.phoronix.com/review/amd-ryzen7-7840u


Michael buys all these machines with his own money. You can buy him one, snark semi intended.


Yep, sadly most laptop vendors don't care much about Linux... Or the Linux laptop vendors and others that do offer review samples want them back in 30 days, so that really doesn't work for long-term comparisons like this. But as mentioned by other commenters, the older Ice Lake and Tiger Lake chips were explicitly used because they have AVX-512 support. For those just wanting to see newer AMD vs. newer Intel, that has already been covered in a separate article; this article here is specifically about AVX-512.



