I recently ran some machine learning benchmarks on CPU versus GPU. The gap for a multicore CPU was much smaller than I expected. I, for one, am excited by a 64-core CPU in a way I wouldn't have been a year ago.
Between Hugging Face, Stable Diffusion, and Whisper, I'm using ML workloads a lot more. Being able to do so:
* with a standard instruction set
* with open-source software
* with my full system RAM
* without having to worry about what is in VRAM versus main RAM
is a big step up. I see about a 10x speed difference between an older 16-core CPU and a hot-off-the-press high-end Ampere card costing 3x as much as the CPU. If 64 cores could bring that within 2x, or even 4x, I'd dump the GPU entirely.
My 5950X's measured throughput is ~2 TFLOPS in single precision and ~1 TFLOPS in double precision (as expected, since double precision halves the effective SIMD width). This is a desktop-class 16-core machine.
I've tried it on a 10980XE (18-core), which got between 600 GFLOPS and 1.6 TFLOPS depending on the instruction, in quad-channel mode. I'll try a 32-core Threadripper later. The challenge there, I guess, is keeping all cores busy during training without repeating the same gradient computation (both a scheduling and a memory problem).
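For anyone curious how numbers like these are obtained, here is a minimal sketch of a peak-FLOPS micro-benchmark (an illustrative example, not the code behind the figures above). It runs independent AVX2 FMA chains on every core and counts 2 flops per lane per FMA; build with something like `gcc -O3 -mavx2 -mfma -fopenmp flops.c`.

```c
/* Illustrative peak-FLOPS sketch (not the benchmark behind the numbers above):
 * independent AVX2 FMA chains on every core, 8 lanes * 2 flops per FMA. */
#include <immintrin.h>
#include <omp.h>
#include <stdio.h>

#define ITERS  200000000L
#define CHAINS 8            /* independent accumulators to hide FMA latency */

int main(void) {
    double flops = 0.0;
    double t0 = omp_get_wtime();

    #pragma omp parallel reduction(+:flops)
    {
        __m256 acc[CHAINS];
        const __m256 a = _mm256_set1_ps(1.000001f);
        const __m256 b = _mm256_set1_ps(0.999999f);
        for (int c = 0; c < CHAINS; c++)
            acc[c] = _mm256_set1_ps((float)(c + 1));

        for (long i = 0; i < ITERS; i++)
            for (int c = 0; c < CHAINS; c++)
                acc[c] = _mm256_fmadd_ps(a, b, acc[c]);

        /* consume the accumulators so the loop is not optimized away */
        __m256 sum = acc[0];
        for (int c = 1; c < CHAINS; c++)
            sum = _mm256_add_ps(sum, acc[c]);
        float sink[8];
        _mm256_storeu_ps(sink, sum);
        if (sink[0] == 42.0f) puts("unlikely");

        flops += (double)ITERS * CHAINS * 8 * 2;
    }

    double dt = omp_get_wtime() - t0;
    printf("%.2f GFLOPS single precision\n", flops / dt / 1e9);
    return 0;
}
```

As a rough sanity check: a 16-core Zen 3 part with two 256-bit FMA pipes per core at roughly 4 GHz tops out around 16 × 2 × 8 × 2 × 4e9 ≈ 2 TFLOPS in single precision, which is consistent with the measurement above.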
Those are Tensor FLOPS; the numbers for the Zen CPU are "general-purpose" FLOPS (sometimes called "vector FLOPS" in marketing material).
The vector FLOPS for the 3090 Ti are 33 TFLOPS in single precision and 0.5 TFLOPS in double precision. So it's roughly 16x faster than the 5950X in single precision and 2x slower in double precision, at almost 3x the price and >4x the power consumption.
Of course, if all you care about is AI, then there's no argument - but then we are not really talking about a general-purpose device any more.
The narrative of GPUs being "hundreds of times" faster than CPUs is vastly blown out of proportion for general-purpose computing.
I think you missed that this whole discussion is in the context of deep learning, so your comment does not apply. It is 30x slower than a 3090 Ti for that purpose.
Here's the comment I assume you are trying to "correct":
> with full training you are out of luck with CPUs, the gap is much bigger. 64c TR could only get to roughly 1TFlops
1 TFLOPS is not the main part of that statement, and it is qualified with "roughly", which I suppose is not too far from the truth in context. And the context is "training ... the gap is much bigger", and in this case "much" is at least 30x even with the updated number.
Recently, I've been implementing my custom inference code in C for various models (GPT, Whisper) and am interested to see how it compares to various GPUs in terms of performance. So far, I've been running it only on my MacBook M1 as I don't have the necessary hardware.
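To give a sense of where the cycles go in that kind of plain-C inference code, here is a rough illustrative sketch (not the actual implementation mentioned above): essentially all of the time ends up in dense matrix-vector products over the model weights, something like the loop below, compiled with `-O3 -fopenmp`. For large models this loop tends to be memory-bandwidth bound rather than compute bound.

```c
/* Illustrative sketch only (not the implementation mentioned above):
 * the hot loop of plain-C fp32 inference, y = W*x + bias over the weights. */
#include <stddef.h>

void matvec_f32(const float *W,       /* n_out x n_in weights, row-major */
                const float *x,       /* n_in input activations          */
                const float *bias,    /* n_out bias, may be NULL         */
                float *y,             /* n_out output activations        */
                size_t n_out, size_t n_in)
{
    #pragma omp parallel for schedule(static)
    for (size_t i = 0; i < n_out; i++) {
        const float *row = W + i * n_in;
        float acc = bias ? bias[i] : 0.0f;
        for (size_t j = 0; j < n_in; j++)
            acc += row[j] * x[j];     /* compilers auto-vectorize this */
        y[i] = acc;
    }
}
```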
Is anyone besides Linus using 64 cores? :) I'm much more excited about the 12-core 7900X than about 64 cores. But I understand that the limited number of people who need this much power on a desktop can also be excited.
I could use a practically unlimited number of cores for fuzzing and compiling. Currently I have to limit my fuzzing runs to 12 cores because my three-year-old AMD machine can't handle more without impacting other development work.
But then, if 64 is not useful to you, why the 7900X instead of the 7700X and its 8 cores? The 7700X is way less power hungry and boosts to nearly the same speed as the 7900X.
Genuinely asking as I plan to replace my Ryzen 3700X with a 7700X.
I'm on a 3900X and was planning to upgrade to a 7900X; I tend to run a few VMs. But I'm not sure yet. It would be cool to get on DDR5, but this time it feels like upgrading just to have an upgrade. So, not sure.
Some common dev workloads that benefit: huge builds (especially for C++ and Rust), running lots of VMs to host a copy of a cloud infrastructure locally, emulating foreign hardware for testing (qemu), and large-scale data analytics done locally instead of paying some ridiculously expensive SaaS to do it.
I'm actually considering upgrading to a 5800X or 5800X3D as a cheap temporary upgrade, since the newest generation (which I initially planned on) just seems too expensive given the need for new DDR5 and new, very expensive motherboards, which will likely need at least another year to mature. So far I've been leaning towards the fairly cheap 5800X (280€ vs 450€ for the 5800X3D), since the difference doesn't actually look that big for real workloads (and since it's an upgrade for a shorter-than-usual timeframe). Is the 5800X3D actually 60% better in non-gaming workloads, to be worth it? If not (as it seems to me), I'm not sure why waiting specifically for the next 3D part makes sense.
X3D seems to shine in gaming, but it likely helps other code that is not computation-heavy as well. If you don't need AVX-512 or higher memory bandwidth, either 5800 CPU is probably good. The X3D is going to be a bit more future-proof, letting you put off the next upgrade, judging by how well the 5775C holds up even today.
Having 64 powerful cores sounds more impressive than 1024 weak cores.
Also, why even mention that 1024-core CPU instead of UPMEM? 128 cores per DIMM slot, up to 2560 cores in a single machine, and they are fast precisely because they are attached directly to memory, with a total memory bandwidth of 2.56 TB/s.
That was the platform; the company was Adapteva. Cavium and Tilera also had more cores than this approximately a decade ago. "Manycore" would be the generic term to search for.
IMHO, if CPU manufacturers figure out how to put a large cache on the same die (something like AMD's 3D V-Cache, but much bigger), we may actually see graphics cards become obsolete in favor of software rendering.
Specialized silicon will always beat general purpose silicon.
It is true that a chip like this probably could render pretty decent 3D in software, though. I wonder if combining this with the GPU in a clever way could allow more people to experience real-time ray tracing?
> Specialized silicon will always beat general purpose silicon.
The whole history of PCs has repeatedly proven otherwise. The NES had hardware sprites. Then Carmack & Romero showed up and proved you can have smooth side-scrolling in software, on an underpowered CPU. The whole concept of a PPU was thus rendered obsolete. Repeat for discrete FPUs, discrete sound cards, RAID cards (ZFS), and so on.
Specialised silicon will beat general purpose silicon at the given task, until general purpose silicon + software catches up. You need to keep pouring in proportional R&D effort for the specialised silicon to stay ahead.
What keeps GPUs relevant is that they're in fact much more general than what the "G" originally stood for.
CPUs have integrated a lot of specialized silicon as transistor budgets increased. x86 treats integer and floating-point arithmetic as separate things because the math coprocessor used to be a separate and optional chip. Nowadays it's GPU cores making the migration, but that's hardly going to be the end of it.
When the second generation of EPYC came out, Linus ran a "software rendered" version of Crysis that did all rendering on CPU cores instead of GPU shader units. At 640x480 it ran alright.
Possibly - there are a lot of ray tracing algorithms that don't really work well on GPUs (anything MCMC, for instance). But context- and time-aware denoising seems to be able to compensate.
The CPU is certified by AMD to run at up to 105 °C, but it thermally throttles automatically at 95 °C, so out of the box it's probably not enough to boil water, but just barely :P
The fun fact is that if you manually reduce the power limit to 65 W, the initial single-thread results show virtually no loss in ST performance versus 170 W, and it appears that the original AMD slides claiming 75% more efficient cores at that level were not too far off.
The previous generation of Intel and AMD CPUs could not consume more than 20 to 30 W with a single active core (non-overclocked).
So with the power limit set to 65 W or more the single-thread performance was always limited by the maximum turbo frequency (which may depend on the temperature of the CPU) and never by the power limit.
I have not yet seen any published value for the single-core power consumption of Zen 4, but it is likely that the single-core power is not higher. It is certainly much less than 65 W, even at 5.85 GHz.
So the expected behavior is that the single-thread performance does not depend on whether you set the steady-state power limit in the BIOS to 170 W, 105 W or 65 W. Only the multi-threaded performance is affected by the power limit, because when the limit is reached, the clock frequency is decreased until the power consumption matches it.
That's for the consumer variants; the Threadrippers will almost certainly not be at a lower rated TDP than the current gen's 280 W. If they increased it by the same percentage as they did for consumer, it'd be 450 W, but that's unlikely; 350 W might be in the cards, though.