
AMD Ryzen Threadripper 3000 32-core CPU is more bad news for Intel - rbanffy
https://www.zdnet.com/article/amd-ryzen-threadripper-3000-32-core-cpu-is-more-bad-news-for-intel/
======
twotwotwo
Worth noting AMD just announced that the first third-gen Threadripper will
come out in Nov with 24 cores, and the planned 16-core mainstream chip (the
3950X) is delayed to November: [https://www.tomshardware.com/news/amd-
ryzen-9-3950x-delay-la...](https://www.tomshardware.com/news/amd-
ryzen-9-3950x-delay-launch-third-gen-threadripper,40442.html)

A 32-core chip is still almost certain to show up, since benchmarks have
leaked, and folks have also leaked some pics of 3950X's and packaging, but I
guess supply/demand have kept everything from coming out yet.

As the Tom's post notes, server and client chips use the same chiplets, so it
could be that most of the higher-binned ones are going to server parts; some
are higher-margin (7742 is almost $100/core, vs. 3950X under $50/core) and
plausibly there are some enterprise orders (AWS etc.) getting priority. I
wonder if they tightened the binning for the 3950X to respond to the fuss over
turbo, too.

I had never really considered everything that goes into getting to that
SKU/price list and launch date: you don't really know how much supply you'll
have at various perf levels, or what demand there will be for what at what
price, and if you don't exactly match them you might end up losing money by
underpricing, by having to nerf good silicon to fill highly-demanded lower-end
SKUs, or by having a shortage that makes customers go elsewhere. (Plus vendors
don't just have the public SKU list, they're working out contracts with
OEMs/other big customers. And who knows what the competition will do.) High
stakes, and practically speaking no backsies; seeing how upset folks are about
turbo clocks, imagine if AMD announced a price hike. Glad it's not my job.

~~~
kllrnohj
> As the Tom's post notes, server and client chips use the same chiplets, so
> it could be that most of the higher-binned ones are going to server parts;
> some are higher-margin (7742 is almost $100/core, vs. 3950X under $50/core)

The 7742 is the "halo" chip, though. Or was, rather, until the Epyc 7H12 was
announced.

Other Epycs have much lower $/core. For example the 24-core EPYC 7352 is
$1350, making it $56/core. The 16-core EPYC 7282 is actually even cheaper than
the 3950X at $650, or $40/core.
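
As a rough check on those figures, here is a minimal C sketch of the $/core
arithmetic. The function names are made up for illustration; the prices and
core counts are the ones quoted above.

```c
#include <stdio.h>

/* List price divided by core count, in US dollars per core. */
double price_per_core(double list_price_usd, int cores)
{
    return list_price_usd / cores;
}

/* Print one line per SKU so the figures can be compared side by side. */
void print_price_per_core(const char *sku, double list_price_usd, int cores)
{
    printf("%-10s %2d cores  $%7.2f  -> $%.2f/core\n",
           sku, cores, list_price_usd,
           price_per_core(list_price_usd, cores));
}
```

For the 24-core EPYC 7352 at $1350 and the 16-core EPYC 7282 at $650 this
works out to 56.25 and 40.625 dollars per core.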

No doubt bulk orders are going to get the priority, but the margins may not
actually be that different depending on what companies are actually bulk-
ordering. The $/core drops pretty quickly even just going down slightly in the
stack. The 32c 7452 is $65/core.

And we don't entirely know which aspect of the binning is the limiting factor.
If it's just functional cores that's going to be different from if they run at
the right frequency/voltage. Epyc is all 225W or lower. With Threadripper 3000,
AMD could pretty easily slap a 300W TDP on it and ram voltage through the chips
that couldn't cut it at Epyc specs. The 2990WX is, after all, a 250W TDP part.
The existing socket is already spec'd for more power capability than the top-
end Epyc.

~~~
twotwotwo
I don't really hard-disagree with any of that; we're all guessing. The co.
sending chiplets where they fetch the most also doesn't seem outlandish as a
factor, though.

FWIW, here's the price-per-core chart for the whole second-gen server line:

[https://www.servethehome.com/wp-
content/uploads/2019/09/AMD-...](https://www.servethehome.com/wp-
content/uploads/2019/09/AMD-EPYC-7402P-in-7002-Series-Cost-Per-Core.jpg)

(Doesn't factor in voltage/freq needs for different SKUs, some of them needing
fully-working chiplets, etc. Still.)

The low pricing on the 7282 and a few others is interesting, too. Wonder if
it's a factor that it can use partially-working chiplets since the server I/O
die can take up to eight chiplets and the client die only two, or if that's
totally unrelated.

------
Symmetry
AMD is in a really good position right now.

Intel still has the lead in low idle power which is good in laptops.

Ryzen lets different cores have different max frequencies so if your code is
single threaded and your operating system isn't the newest that could be a
reason to go Intel. Likewise if it's single threaded and can take advantage of
AVX-512 or the Intel Math Kernel Library.

But otherwise?

~~~
dragontamer
The main selling point for Intel isn't speed anymore.

Intel still has superior performance counters and debugging features.
Mozilla's rr (Record and Replay framework) only works on Intel for example,
and Intel vTune is a very good tool. AVX512 is also an advantage, as you've
noted.

There are other instruction set advantages: I think Intel has faster division
/ modulus operator, and also has single-clock pext / pdep (useful in chess
programs).

For most people however, who might not be using those tools, I'd argue that
AMD's offerings are superior.

~~~
luizfelberti
Is AVX512 really an advantage though?

Intel is infamous for severely downclocking the processor for these and other
AVX/SSE family instructions, to a point where sometimes using them makes the
program slower than it would be otherwise, especially if you're constantly
provoking frequency switches between them and regular instructions.

AMD might not have implemented AVX512 specifically yet (there's nothing
legally keeping them from doing so however, they have patent sharing
agreements with Intel regarding the entire x86/x64 ISA and extensions), but
what they currently DO have is all common SIMD extensions implemented (up to
SSE4 and AVX2 if I'm not mistaken) without incurring any clock-speed penalties
for using them.

I can live without AVX512 for now, even though I'd be happier to have it. But
I would really rather not have it if it came out in the same crap
implementation that Intel has.

~~~
celrod
You can look at binning statistics for non-avx/avx2/avx512 clock speeds:
[https://siliconlottery.com/pages/statistics](https://siliconlottery.com/pages/statistics)

For example, the worst 7980XEs do 4.1/3.8/3.6 GHz for each of these
respectively. 0.5 GHz down clock isn't too bad. You can change these settings
in your bios however you'd like on the unlocked CPUs (ie, the HEDT lineup +
W3175X). I do find those down clocks are necessary; I've passed 100C with a
360mm radiator and roaring fans. AVX2 loads don't get anywhere near that hot.

But for all that, I do see a substantial benefit on many workloads from avx512
-- at least 50% better performance than what I'd get from avx2.

I definitely think it's nice to have, especially if you enjoy vectorizing code
and looking at assembly. With much bigger performance wins on the line, it's
more rewarding and more fun -- and you have more tools to play with, like
vector masking (unfortunately, gather/scatter have been disappointing). Fun or
not though, if you offer me avx512 on one hand vs twice the cores with full
avx2 for the same price on the other, I'd have a hard time rationalizing
avx512.

~~~
0-_-0
> I've passed 100C with a 360mm radiator and roaring fans

Damn, which radiator? Was there a GPU under load in your water loop?

~~~
celrod
Celsius S36. My GPU is on a separate loop and was idle at the time.

I was running benchmarks of Intel MKL's zgemm vs zgemm3m because of a Julia PR
that recommended replacing the former with the latter. I don't think anything
hits a CPU quite as hard as a good BLAS.

I think my thermal paste may be bad, because the CPU idles hot -- nearly 35 C.
I ordered a Direct Die-X and MO-RA-420 radiator, so I'm planning on swapping
the AIO for an open loop with way more radiator area and flow through the
fins.

Running dgemm, that CPU would hit just a tad below 2 teraflops. I'd like to
get it just over that (and run much cooler).
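
For a sense of where that 2 teraflop figure sits, a back-of-envelope peak can
be sketched in C. Everything below is an assumption layered on the comment:
that the CPU is the 18-core 7980XE mentioned upthread, that each core has two
AVX-512 FMA units working on 8 doubles apiece, and that it holds the 3.6 GHz
avx512 clock quoted earlier. The function name is hypothetical.

```c
/* Peak double-precision GFLOP/s = cores x FLOPs per cycle per core x GHz.
   All inputs are assumptions supplied by the caller, not measured values. */
double peak_dp_gflops(int cores, int fma_units, int doubles_per_vector,
                      double clock_ghz)
{
    /* Each fused multiply-add counts as two floating-point operations. */
    int flops_per_cycle = fma_units * doubles_per_vector * 2;
    return cores * flops_per_cycle * clock_ghz;
}
```

Under those assumptions, peak_dp_gflops(18, 2, 8, 3.6) comes to roughly 2074
GFLOP/s, so a dgemm result just under 2 teraflops would be close to the
theoretical peak.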

------
kuon
I just built a new workstation and went AMD. I'm amazed by the performance. I
have a threadripper 1920 and it performs really well under many different
loads (compiling, gaming, video editing). It also handles my dev vms very
well. Of course this is purely based on my feeling, but I had an i7-9700
before and while I had faster FPS in some games, it was really bad at handling
vms.

This is subjective, but I have a really good feeling about AMD, both on CPU
and GPU side.

Only thing I could ask is for more open source (open source CPU firmware), but
I think this is only a dream.

------
GiorgioG
I have the original Threadripper 1950X overclocked to 4.1GHz and I’ve been
pretty happy with it. As a developer I can’t find much reason to upgrade 2
years after its release. Single core improvements (from AMD or Intel) aren’t
game-changing. More cores won’t do anything for me (.NET Core, Angular, etc.).
I’ve considered switching back to Intel for their 5GHz processors but based on
benchmarks, I wouldn’t see anything but marginal improvements. All in all,
competition is good, but I wish they’d focus on single core performance
improvements.

~~~
arcticbull
They're doing both. AMD has ramped single-core performance, at least from an
IPC perspective, massively. From Excavator (2015) to Ryzen we saw a 52%
increase in IPC. From Ryzen 1000 to 2000 (Zen+) we saw a 3% increase in IPC.
From Ryzen 2000 to 3000 (Zen2) we saw a 15% increase in IPC.

From 2015 to 2019 we saw a total IPC boost on the AMD side of over 80%, and an
increase in max core count in the desktop line of 433%, from 6 to 32.
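
The totals compound rather than add. A small C sketch of that arithmetic,
using only the percentages quoted above (the function names are illustrative):

```c
/* Compound a list of per-generation percentage gains into one total gain. */
double compound_gain_percent(const double *gains_percent, int n)
{
    double factor = 1.0;
    for (int i = 0; i < n; i++) {
        factor *= 1.0 + gains_percent[i] / 100.0;
    }
    return (factor - 1.0) * 100.0;
}

/* Percentage increase going from old_value to new_value. */
double percent_increase(double old_value, double new_value)
{
    return (new_value - old_value) / old_value * 100.0;
}
```

With gains of 52, 3 and 15 percent the compounded result is just over 80%,
and going from 6 to 32 cores is an increase of about 433%.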

I'm okay with this progress :)

~~~
GiorgioG
The progress is ok, just incremental. AMD had been playing catch up, until
Ryzen/Threadripper.

Just for comparison, here are the numbers from my GeekBench 4 run:

Single-Threaded: 4,746 Multi-Threaded: 34,586

According to a leaked benchmark, the Threadripper 3000's numbers are:

Single-Threaded: 5,519 Multi-Threaded: 68,279

The multithreaded benchmark is 2x, that's a no-brainer since it's likely to
have 32 cores vs the 1950x's 16 cores. Now, I will say that TR3000 benchmark
is not overclocked (3.6ghz.) But from what I've read, it seems like there's
not much room for these latest chips to be overclocked. So, despite being the
3rd iteration, the TR3000 is only 16% faster in single-threaded benchmarks
than my 1950X.

Intel's 9900K (overclocked) gets a single-threaded GB4 score of ~7,000. That
(or its successor) may be my next machine.

~~~
x3sphere
Honestly, that seems rather low. It looks like a lot of Ryzen 3900X's are
doing well over 6000 on Geekbench? E.g.
[https://browser.geekbench.com/v4/cpu/14151649](https://browser.geekbench.com/v4/cpu/14151649)

Is the Threadripper 3000 test also from Geekbench V4? I noticed they added V5.

I don't see why it should be so far behind the Ryzen in single thread...
unless the boost isn't working properly or something, could be disabled if
it's a test chip.

~~~
GiorgioG
Yep, it's from GBv4. It could be that the 3900X is clocked higher than the
TR3000. Given that they're packing more cores on the chip, it's possible that
they can't dissipate heat as well and limit the single-core top-end speed on
the TR3000s more than on the Ryzen 3000 series.

------
khalilravanna
But will they actually be generally available for purchase? I was super
excited to grab an AMD Ryzen 3900x when they came out mid-July and they've
been consistently sold out since then. That's over 3 months with the only
reliable way to get one being to buy one for almost twice the MSRP from 3rd
party resellers at $800-900 vs MSRP of $500.

~~~
tuananh
Demand in the US is really high. In my country (SEA), the 3900X is available at
almost every big computer shop.

------
macawfish
Rumor has it there's a 64 core chip coming ~ this year too! With 128 threads,
of course.

The following article calls it a "one-two punch"... wow!

[https://www.digitaltrends.com/computing/amd-ryzen-
threadripp...](https://www.digitaltrends.com/computing/amd-ryzen-
threadripper-3000-cpu-64-core-2019/)

------
ethanpil
I'm ignorant in stock trading, I admit. But I don't understand why AMD stock
isn't blowing up. It's done a recovery after a market wide dip, but hasn't all
this Intel beating news raised the hype? So many other tech companies are so
severely overvalued and hyped daily. Why is AMD a steady 28-32?

~~~
christophilus
With a p/e of 166, there’s a lot of optimism priced in.

~~~
s1artibartfast
Even more amazing when you compare it to Intel with a p/e of 11

------
loeg
By what mechanism would the multicore performance rise 2x without increased
single-core performance? I'm kind of confused. Eliminating intra-CPU
concurrency bottlenecks, I guess?

~~~
FartyMcFarter
Which benchmark or comparison are you referring to?

~~~
loeg
Pretty much the only numbers in TFA:

> The single-core score of 1,275 is pretty much the same as for the current
> flagship Threadripper 2990WX, ...

> But when it comes to multi-core, the Threadripper 3000's score of 23,015
> absolutely destroys the Threadripper 2990WX's score of 13,400, ...

~~~
paulmd
2990WX was a weird ultra-NUMA setup where half the dies had no direct memory
access at all. Part of the gain will have been fixing that - particularly on
Windows where the scheduler did not understand this configuration at all
(Linux results have always been much better).

Another part is the power consumption and resulting higher clocks. The 2990WX
was basically power limited, the new chips are on 7nm which will drastically
reduce their power consumption and allow higher clocks inside the same power
envelope.

~~~
loeg
Wouldn't the higher clocks impact single-core performance? The article claims
that's about the same. I suppose the userbenchmark single-core perf test[1]
(the article links to some userbenchmark.com results) could just be a bad
test; I'm unfamiliar with it. (It could also be that 2990WX wasn't power/heat
limited on a single core, and neither is 3xxx; but say, at 16 core 100% clock,
3xxx can go higher than 2990WX due to a lower power process?)

Anyway, it's still 32 cores. While eliminating the ultra weird NUMA effect and
increasing clocks could explain a significant benefit, I'm still really
struggling to intuit 72% higher performance. But it is Windows 10, so your
remarks about the Windows scheduler may explain the gap.[2]

[1]: [https://cpu.userbenchmark.com/Faq/What-is-single-core-
intege...](https://cpu.userbenchmark.com/Faq/What-is-single-core-integer-
speed/72)

[2]:
[https://www.userbenchmark.com/UserRun/19698768](https://www.userbenchmark.com/UserRun/19698768)
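
One way to frame the question is scaling efficiency: how close the multi-core
score gets to single-core score times core count. The C sketch below uses the
article's numbers and assumes, since the article only says "pretty much the
same", that the 2990WX's single-core score is also 1,275. The function name is
illustrative.

```c
/* Fraction of ideal scaling: multi-core score over
   (single-core score x core count). 1.0 would be perfect scaling. */
double scaling_efficiency(double single_score, double multi_score, int cores)
{
    return multi_score / (single_score * cores);
}
```

With 32 cores each, that gives about 0.56 for the Threadripper 3000 result
(23,015) and about 0.33 for the 2990WX (13,400). Under that assumption the gap
is in how well the chip scales across cores, not in per-core speed.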

~~~
paulmd
Zen2 is moving to a somewhat deceptive advertising model for clocks.

AMD is advertising the absolute highest that any core on the die can hit under
extremely light load for an absolute instant. Sustained single-core clocks
will be 100-200 MHz less and sustained all-core will be significantly less.
Most cores on a chip are not capable of sustaining the advertised clockrate
even under severe voltage and even for instant loads, only the "preferred
core" on a chip. There is a significant "binning effect" not just between
chips, but between individual cores on a chip.

Intel and previous generations of AMD cores used to advertise the sustained
single-core rate, which could be achieved on any of the cores. This is a
changeup to how the clockrate has been advertised.

original:
[https://www.youtube.com/watch?v=DgSoZAdk_E8](https://www.youtube.com/watch?v=DgSoZAdk_E8)

followup after a patch:
[https://www.youtube.com/watch?v=3LesYlfhv3o](https://www.youtube.com/watch?v=3LesYlfhv3o)

As such the clockrates may be significantly different from what you're
intuiting based on the advertising.

~~~
loeg
I don't actually recall any Zen2 clock advertising in particular. My comment
was purely in response to your earlier statement:

> Another part is the power consumption and resulting higher clocks. The
> 2990WX was basically power limited, the new chips are on 7nm which will
> drastically reduce their power consumption and allow higher clocks inside
> the same power envelope.

------
elheffe80
But what processor is best for Dwarf Fortress?

~~~
dragontamer
Dwarf Fortress is IIRC mainly limited by RAM latency. So you really should be
trying to get 3600 MHz RAM with as tight timings (maybe CL16 timings?) as
possible.

Dwarf Fortress's simulation is all about pointer-indirection and jumping
around memory. The CPU doesn't really do much except wait for RAM most of the
time. It takes ~50ns to talk to RAM, but the CPU is clocked at 4GHz (0.25
nanoseconds), giving you an idea of scale. The RAM tightening can bring your
latency anywhere from 50ns to 200ns depending on how well you tune your RAM
parameters, and depending on chips and stuff. (Servers usually have lots of
slow LRDIMM RAM over multiple-sockets that can be 200ns latency or worse).
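
To put that scale in numbers, a tiny C sketch converting latency into core
clock cycles (function names are illustrative, the figures are the ones
above):

```c
/* Duration of one clock cycle in nanoseconds. */
double cycle_time_ns(double clock_ghz)
{
    return 1.0 / clock_ghz;
}

/* Core clock cycles that elapse during one memory access. */
double stall_cycles(double latency_ns, double clock_ghz)
{
    return latency_ns * clock_ghz;  /* ns x (cycles per ns) */
}
```

At 4GHz a 50ns access costs 200 cycles, and a 200ns access costs 800.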

AMD takes their design out of the server-playbook, and seems to have ~100ns
main DDR4 RAM Latency. So Dwarf Fortress probably will be faster on Intel
i9-9900k (monolithic design with integrated memory controller and 50ns main
memory latency).

~~~
d33
Interesting! Is there a simple way to measure memory latency?

~~~
dragontamer
Create an array of 1 billion 32-bit integers holding the values 0 to
1 billion - 1, each exactly once. Knuth shuffle the integers.

"Linked list" traverse the integers as follows:

    
    
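        // array holds the values 0..N-1 (N = 1 billion), shuffled
        uint32_t idx = 0;
        for (int i = 0; i < 200000000; i++) {
            idx = array[idx]; // random traversal: each load depends on the last
        }
        printf("%u\n", idx); // use idx so the compiler keeps the loop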
    

Pull out a stopwatch (or use Linux's "time" functionality). Divide the time by
200000000. You've now measured memory latency.

~~~
dlemire
I suggest using Sattolo’s algorithm instead: unlike a plain Knuth shuffle, it
always produces a single cycle that visits every element.

Here is a relevant blog post [https://lemire.me/blog/2018/11/13/memory-level-
parallelism-i...](https://lemire.me/blog/2018/11/13/memory-level-parallelism-
intel-skylake-versus-apple-a12-a12x/)
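
A minimal C sketch of the pointer-chasing measurement with Sattolo's
algorithm follows. The function names, the xorshift generator and the use of
clock() are choices made for this illustration, not something from the linked
post, and the modulo step has a slight bias that does not matter here.

```c
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Small xorshift64 generator; the state must be nonzero. */
uint64_t xorshift64(uint64_t *state)
{
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    return x;
}

/* Fill next[0..n-1] with a random permutation that forms ONE cycle
   (Sattolo's algorithm), so a chase from any slot visits every slot. */
void sattolo_cycle(uint32_t *next, uint32_t n, uint64_t seed)
{
    uint64_t state = seed ? seed : 1;
    for (uint32_t i = 0; i < n; i++) {
        next[i] = i;
    }
    for (uint32_t i = n; i > 1; i--) {
        /* j is drawn from [0, i-2], never i-1 itself; that restriction is
           what separates Sattolo from the Knuth shuffle. */
        uint32_t j = (uint32_t)(xorshift64(&state) % (i - 1));
        uint32_t tmp = next[i - 1];
        next[i - 1] = next[j];
        next[j] = tmp;
    }
}

/* Follow the chain for `steps` dependent loads; return the final index. */
uint32_t chase(const uint32_t *next, uint32_t start, uint64_t steps)
{
    uint32_t idx = start;
    for (uint64_t s = 0; s < steps; s++) {
        idx = next[idx];
    }
    return idx;
}

/* Average nanoseconds per dependent load over an n-element cycle.
   Returns a negative value on bad arguments or allocation failure. */
double measure_latency_ns(uint32_t n, uint64_t steps)
{
    if (n == 0 || steps == 0) {
        return -1.0;
    }
    uint32_t *next = malloc((size_t)n * sizeof *next);
    if (next == NULL) {
        return -1.0;
    }
    sattolo_cycle(next, n, 42);

    clock_t t0 = clock();
    volatile uint32_t sink = chase(next, 0, steps); /* keep the loop alive */
    clock_t t1 = clock();
    (void)sink;

    free(next);
    return (double)(t1 - t0) / CLOCKS_PER_SEC * 1e9 / (double)steps;
}
```

To measure main memory and not cache, n has to be far larger than the last
level cache (the 1 billion entries suggested above take 4GB), for example
measure_latency_ns(1000000000u, 200000000).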

------
archi42
The article references a Geekbench score that's now taken offline.

Google Cache:
[http://webcache.googleusercontent.com/search?hl=de&ei=ewaGXa...](http://webcache.googleusercontent.com/search?hl=de&ei=ewaGXa_NDoLSkwWai5LwBg&q=cache%3Ahttps%3A%2F%2Fbrowser.geekbench.com%2Fv5%2Fcpu%2F170821&oq=cache%3Ahttps%3A%2F%2Fbrowser.geekbench.com%2Fv5%2Fcpu%2F170821&gs_l=psy-
ab.3...1047.2108..2466...0.0..0.120.464.6j1......0....1j2..gws-
wiz.......0j0i67j0i131.ZfZMOTb1Yww&ved=0ahUKEwjv9peD5eHkAhUC6aQKHZqFBG4Q4dUDCAo&uact=5)

Results (also in the article): Geekbench v5.0.1 Tryout for Windows x86
(64-bit); 1275 single core, 23015 multi-core for 32C/64T @ 3.59GHz Base
Frequency (and with 32GB DDR4, no info on 4 or 8 channel).

------
abledon
As more people move to city cores from rural areas... I think computers are
becoming the next generation's 'car', in terms of an object to soup up with new
parts.

~~~
thelittleone
Interesting metaphor. It has been this way since the 1980s. For instance, in
1985/86 it was popular to upgrade your 386sx with an i387 math co-processor
which is something like adding a cat-back exhaust system on your car. Later on
came increasing RAM, installing QEMM to boost available memory for DOS games,
overclocking, and then add-on 3D video accelerator cards. There was even a
physics add-on card at one point. Mostly these focused on performance gains.
The shift towards LED bling and aesthetics is more recent though.

------
kraig911
Now if only programming multi-core work could catch up to this hardware.
Honestly chips with more cores are starting to sound like cars with more wheels
or I guess more valves lately.

The horsepower doesn't matter; the speed limit is the same until we can make
software better.

~~~
klodolph
This is a very outdated complaint, we are using all cores these days, just not
on all systems. Backend software is concurrent & distributed, and although
utilization is lower than we’d like, the extra cores are not going to waste.

However, if you put this in a machine you use for playing games or checking
Facebook then you might think that it’s a waste, and you’d be right. There is
plenty of software out there which is still single-threaded, but it’s not
dominating our utilization.

~~~
mrguyorama
Honestly with the way chrome and firefox behave nowadays, even grandma looking
at facebook benefits from multiple cores

~~~
capableweb
Not really, any window/tab that is not in focus gets heavily throttled by the
browser, grandma is probably not running any extensions and single websites are
still single threaded for most parts (more and more are adding web workers for
various parts though)

~~~
NullPrefix
>not running any extensions and single websites are still single threaded for
most parts

Such a shame that all that JS botnet is still single threaded, isn't it ?

------
eyegor
_> The single-core score of 1,275 is pretty much the same as for the current
flagship Threadripper 2990WX, and is actually slightly lower than the 1,334
that the Intel i9-9900K scores._

In what world is this a valid comparison? No one shopping for a $500 8 core
consumer cpu is looking at a $1700 32 core hedt chip. They're entirely
different markets. If you can use 32 cores you'd be looking at Intel's X-series
at a minimum, if not xeons. This is silly at best, if not intentionally
misleading.

------
qwerty456127
I'd rather choose really big caches per core than a really big number of cores.

~~~
kllrnohj
Zen2 is 16MB L3 per ccx across the entire stack (hence why even the Ryzen 5
3600 is 32MB of L3 - it has 2 CCXs). There's no reason to assume this won't be
true on Threadripper 3000. So a 32-core Threadripper 3000 would have at least
128MB of L3.
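
That arithmetic can be sketched in C. The 16MB per CCX figure is from the
comment above; the 4 cores per CCX limit is an assumption added here (it is
what makes a 6-core part need 2 CCXs and a 32-core part need 8), and the
names are illustrative.

```c
/* Assumption: a Zen 2 CCX holds at most 4 cores and carries 16MB of L3. */
enum { CORES_PER_CCX = 4, L3_MB_PER_CCX = 16 };

/* Fewest CCXs that can supply the given number of cores (rounds up). */
unsigned min_ccx_count(unsigned cores)
{
    return (cores + CORES_PER_CCX - 1) / CORES_PER_CCX;
}

/* Lower bound on total L3 in MB for a part with that many cores. */
unsigned min_l3_mb(unsigned cores)
{
    return min_ccx_count(cores) * L3_MB_PER_CCX;
}
```

min_l3_mb(6) gives 32, matching the Ryzen 5 3600, and min_l3_mb(32) gives
128. It is a lower bound because a part can ship with more CCXs than the
minimum and some cores disabled.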

~~~
qwerty456127
L2 cache is what matters. Zen2 has just 512 KiB per core. My old Pentium-M had
2MB. I wish modern CPUs had at least this much.

~~~
kllrnohj
Performance is what matters, not cache size. Your Pentium-M didn't even have
L3, hence why its L2 is so big.

------
fortran77
Why do y'all hate Intel so much?

~~~
floatboth
Because they're the dominant player (and the history of getting to that
dominance is very sketchy, including $billion fines in the EU for bribing OEMs
to use their processors)

------
guelo
Why is AMD making these chips with 14nm transistors instead of using TSMC's
7nm stuff? AMD is only catching up to Intel because Intel's 10nm has been
delayed so much, but if Intel can ever figure out their 10nm process they'll
be in the lead again.

~~~
jdietrich
AMD use 14nm for the IO die and 7nm for the compute chiplets. 10nm won't help
much - after all the security fixes that Intel have had to implement, AMD have
the IPC advantage.

~~~
arcticbull
And further, "nm" these days are marketing numbers. It's totally possible that
AMD's 7nm isn't directly comparable to a 7nm (or 10nm) Intel process.

