
AMD Ryzen Threadripper 3960X and 3970X Review - pella
https://www.anandtech.com/print/15044/the-amd-ryzen-threadripper-3960x-and-3970x-review-24-and-32-cores-on-7nm
======
shantara
The biggest indicator of the success of AMD's Threadripper line, to me, is that
Intel has been forced to cut the price of its newly released i9-10980XE CPU
by $1000 compared to the i9-9980XE while offering practically indistinguishable
performance.

~~~
bitL
The only advantages now are AVX512 and the optimized Intel compiler/libraries.

~~~
dnautics
Is AVX512 really an advantage? IIRC the machine has to go into another mode
where processing across the board becomes slower except for the AVX
instructions, which to me seems useful only for niche HPC "measuring contest"
applications.

~~~
bitL
Even if the 10980XE downclocks to 2.8GHz while using AVX512, it's still ~10x
faster running MKL than a first-gen Threadripper with MKL.

From the AnandTech review, 3D particle test was showing AVX512 effect nicely:

[https://images.anandtech.com/graphs/graph15044/113590.png](https://images.anandtech.com/graphs/graph15044/113590.png)

The 10980XE had a 3.9x per-core speedup over the 3970X when using AVX512.

So for some scientific computing purposes (maybe game physics?) AVX512 is
worth it.

~~~
dr_zoidberg
> (maybe game physics?)

Not game physics: since it puts the CPU into a lower speed regime, it would
have negative implications for the rest of the game's performance. So far, the
fact that AVX512 requires this lower speed (due to thermals) is an
implementation detail, and it's plausible that newer processes (Intel's 10 or
7 nm?) would let AVX512 workloads run at full speed.

Until that happens, and everyone has AVX512 (because it'd be a massive fail to
ship a game that requires a HEDT Intel processor to play), it'll remain a nice
gimmick for very specific tech demos and for performance-sensitive scientific
code that you know will run on a certain machine with certain characteristics.

~~~
Danieru
Games will ship with AVX512 special paths once the AMD chips in consoles
support it. Until then it is just a fancy feature to make already fast CPUs a
bit faster.

Game programmers will put time into making slow CPUs faster. Outside tech
demos or hardware marketing tie-ins no budget is allocated to making yet more
spare capacity.

~~~
shmerl
What kind of games care about such specific instructions? Unless you are
writing something in assembly, that's not something game developers usually
focus on.

~~~
gameswithgo
Unity is a very high-level engine that uses C#, and it now has a built-in
system that lets you write code that looks like C# but translates it to
whatever SIMD instruction set is available, much like ISPC does.

There are also various libraries that leverage metaprogramming to do similar
things. I don't think you understand what game devs are willing to do to get
a few more polygons and pixels on the screen!
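To make the idea concrete, here's a minimal sketch of the same principle in numpy (illustrative only, not Unity's system): you write plain array math, and the library dispatches it to whatever SIMD units the CPU exposes.

```python
import numpy as np

# A toy particle update: reads like scalar math, but numpy executes it as
# vectorized SIMD loops (SSE/AVX, picked at runtime) over the whole array.
n = 100_000
positions = np.zeros((n, 3), dtype=np.float32)
velocities = np.ones((n, 3), dtype=np.float32)
dt = np.float32(1.0 / 60.0)

positions += velocities * dt  # one line, n*3 multiplies and adds
```

Burst-style systems and ISPC go further by compiling your own kernels this way, but the programming model is the same: describe the math over arrays and let the toolchain pick the instruction set.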

~~~
shmerl
_> I don't think you understand what game devs are willing to do, to get a few
more polygons and pixels on the screen!_

Totally depends on the trade-offs. You can write your whole game in assembly,
target very specific hardware, and maybe beat an optimizing compiler
(doubtful). But at what cost? Time spent on that could be spent on making more
games.

Normal, up-to-date hardware handles games just fine, as long as they aren't
using some abysmal, poorly parallelized engine. Modern CPUs with more cores
are also helping there, especially after Ryzen processors opened the gates
for it.

------
stopads
Great review, but the highlight for me is the Windows Task Manager view of
CPU utilization. It's like looking at a spreadsheet:

[https://images.anandtech.com/doci/15044/3970X%20Task%20Manag...](https://images.anandtech.com/doci/15044/3970X%20Task%20Manager%202_575px.jpg)

~~~
theandrewbailey
I remember Linus playing around with a Xeon Phi CPU with a few hundred
threads. The task manager was all percent signs.

[https://youtu.be/fBxtS9BpVWs?t=200](https://youtu.be/fBxtS9BpVWs?t=200)

Looks like Microsoft has already got 1000+ cores on Windows:
[https://techcommunity.microsoft.com/t5/Windows-Kernel-Intern...](https://techcommunity.microsoft.com/t5/Windows-Kernel-Internals/One-Windows-Kernel/ba-p/267142)

Can we get Bruce Dawson[0] one of those? I wonder how many more bugs he'll run
into.

[0] [https://randomascii.wordpress.com/](https://randomascii.wordpress.com/)

~~~
Macha
I'm really curious about the machine it's running on. 896 physical cores is an
odd number - 32 x 28, 16 x 56 or 8 x 112 are the likely combinations. The
picture identifies it as a Xeon Platinum 8180 which is a 28C/56T CPU. Are
there systems that support 32 Intel CPUs in one host? I thought quad socket
was the practical limit these days.

~~~
paulmd
It says right there, Xeon Phi 7210.

Knight's Landing supports 4-wide threading per core so you get 256 threads,
which is exactly what it shows in task manager under "logical processors".

~~~
Macha
I was talking about the Microsoft article, not the LTT video, which are using
different CPUs.

The HP 32 socket chassis (8x4 socket boards) seems to be the answer.

------
3fe9a03ccd14ca5
> _Last AMD generation was 250W, this one is 280W: if we’re not there already,
> then this should be a practical limit._

A good point for people looking to use this in their home server like I am.
It’s going to be really hot if you’re getting your money out of it.

At some point there’s not an efficient way to _cool_ these processors when the
ambient room temperature rises too high. Anyone have suggestions for that? I’m
in a weird position where power is cheap but I can’t AC my garage (where it’s
located).

~~~
3fe9a03ccd14ca5
Also, 280W comes out to about $58 a month in electricity at Bay Area
pricing (29c/kWh). Colocation will probably be cheaper for some folks.
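The arithmetic behind that figure, assuming the full 280W is drawn around the clock at the quoted rate:

```python
# Monthly electricity cost for a 280 W part running 24x7.
watts = 280
price_per_kwh = 0.29        # Bay Area rate quoted above
hours = 24 * 30             # one month
kwh = watts / 1000 * hours  # 201.6 kWh
cost = kwh * price_per_kwh  # ~$58
print(f"{kwh:.1f} kWh/month -> ${cost:.2f}")
```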

~~~
m0zg
That's only if you use all cores 24x7 and all ALUs within those cores
(vectorized workloads), which is very unlikely unless you're doing straight up
linear algebra on the CPU.

In CA there's also the issue of not having electricity in the first place
sometimes though. So power _consumption_ seems kind of secondary.

~~~
Dylan16807
> That's only if you use all cores 24x7 and all ALUs within those cores

These chips don't run at a fixed frequency. They dynamically adjust based on
thermal limits and power limits.

You only have to burden _most_ of the cores/ALUs to hit maximum power. Any
load above that threshold uses the same amount of power, as frequencies nudge
down to compensate.

~~~
simooooo
That doesn’t sound right

~~~
Dylan16807
Why doesn't it sound right? Just look at the chart in the article:
[https://images.anandtech.com/doci/15044/3970X%20Power%20Grap...](https://images.anandtech.com/doci/15044/3970X%20Power%20Graph.png)

The power limit is configured in the BIOS, and can be disabled, but with these
massive chips they default to throttling down and capping power consumption so
that you aren't forced to go crazy with the cooling system (and motherboard
and PSU capacity).

------
pella
_" AMD Pre-Announces 64-core Threadripper 3990X: Time To Open Your Wallet"_

[https://www.anandtech.com/show/15151/amd-preannounces-64-cor...](https://www.anandtech.com/show/15151/amd-preannounces-64-core-threadripper-3990x)

~~~
qaq
How much would that run, 5K?

~~~
brians
Probably. Or AMD pulls another rabbit out and calls it $2500 in mid 2021.

~~~
bitL
I doubt it. This is new AMD with ridiculous prices when they face no
competition. I'd say it's more likely to expect $6k than $2.5k.

~~~
rubbingalcohol
I don't really see how the pricing is ridiculous though. They're still way
cheaper than what Intel charged for the same cores a year ago, and they have
more cores than Intel can even offer on their best workstation chips. If you
compare against 28 core Xeons, the new Threadrippers are a downright bargain.

~~~
bitL
I meant in comparison to what we were used to. Now a semi-decent TRX40 board
is $700, entry-level TR3 is $1300. Top-end x399 board is $550, entry-level TR
is $250. There was a huge jump in prices compared to previous generation of
HEDT.

~~~
pitaj
Board cost is mainly due to PCIe4 support.

~~~
bitL
Which is kinda unnecessary: there's no single GPU on the market capable of
saturating PCIe3, and situations where one needs sustained transfers between
multiple M.2 SSDs fast enough to saturate PCIe4 are very rare. Only 100Gbps+
LAN really exploits it, and that's practical for a handful of pro users at
most.

~~~
dragontamer
Actually, it's pretty easy to get bandwidth-bottlenecked in GPU compute.

I know video games don't really get bandwidth bottlenecked, but all you gotta
do is perform a "Scan" or "Reduce" on the GPU and bam, you're PCIe
bottlenecked. (I recommend NVidia CUB or AMD ROCprim for these kinds of
operations)

CUB Device-reduce is extremely fast if the data is already on the GPU:
[https://nvlabs.github.io/cub/structcub_1_1_device_reduce.htm...](https://nvlabs.github.io/cub/structcub_1_1_device_reduce.html).
However, if the data is CPU / DDR4 RAM side, then the slow PCIe connection
hampers you severely.

I pushed 1GB of data to a device-side reduce the other day (just playing with
ROCprim), and it took ~100ms to hipMemcpy the 1GB of data to the GPU, but only
5ms to actually execute the reduce. That's a PCIe bottleneck for sure.
(Numbers from memory... I don't remember them exactly, but those are roughly
the magnitudes we're talking about.) That was over PCIe 3.0 x16, which seems
to only push ~10GB/s one-way in practice. (~16GB/s in theory, but practice is
always lower than the specs.)

Yeah, I know CPU / GPU have like 10us of latency, but you can easily write a
"server" kind of CPU-master / GPU-slave scheduling algorithm to send these
jobs down to the GPU. So you can write software to ignore the latency problem
in many cases.

Software can't solve the bandwidth problem however. You gotta just buy a
bigger pipe.
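A back-of-envelope check of those recalled numbers (assumed: 1GB copied in ~100ms, ~5ms for the reduce itself):

```python
# How lopsided the copy-vs-compute split is for a 1 GB device-side reduce.
gigabytes = 1.0
copy_seconds = 0.100    # ~100 ms hipMemcpy, host -> device
reduce_seconds = 0.005  # ~5 ms for the reduce kernel itself

effective_bw = gigabytes / copy_seconds                      # GB/s over PCIe 3.0 x16
copy_share = copy_seconds / (copy_seconds + reduce_seconds)  # fraction of wall time

print(f"{effective_bw:.0f} GB/s effective; copy is {copy_share:.0%} of wall time")
```

So even doubling the link speed with PCIe 4.0 only halves the dominant term; the copy still dwarfs the kernel.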

------
FBISurveillance
AMD slaps Intel around a bit with a large trout.

~~~
mstade
Man, that's a blast from the past – thanks for the chuckle!

~~~
walkingolof
[https://www.youtube.com/watch?v=78b67l_yxUc](https://www.youtube.com/watch?v=78b67l_yxUc)

~~~
jacquesm
Right series, wrong clip, I thought it was going to be this:

[https://www.youtube.com/watch?v=T8XeDvKqI4E](https://www.youtube.com/watch?v=T8XeDvKqI4E)

------
bjoli
Why no compilation benchmarks? Say, a Linux kernel, Chromium, or GCC compile.
All that cache must make it fantastic at compiling.

~~~
IanCutress
Ever since moving to Win 1909, our compile benchmarks have been a bit off. I
was at Supercomputing last week, literally got back Saturday to start writing
the review, and I need to get some time to debug why it's not working as it
should. I've got Qualcomm's Tech Summit next week and IEDM the week after
that, so you'll have to wait a bit.

Ian (the editor of the review)

~~~
keldaris
I can't resist asking something that's been bugging me for a long time. The
issue of insane travel schedules has openly plagued AnandTech for a long time
now, both delaying reviews and severely impacting their quality when they do
come out. Every tech website out there covers the big events in roughly the
same way, very few do proper deep dive technical reviews like you've done in
the past. Is it really the best use of your time to (partially) squander the
comparative advantage you have in favor of rushing from one event to the next,
reporting the same things everyone else does?

~~~
cheez
If you don't show up, no one knows you exist. He has to do everything the
other guys do and more in order to stand out.

~~~
keldaris
His CPU reviews in particular have at times been virtually unparalleled in the
industry. I don't think name recognition is an issue. It's just painful to see
that suffer for run of the mill industry news reporting.

~~~
cheez
Think of it as networking.

------
boris
6 pages of gaming benchmarks and not one compilation speed test. Baffling.

~~~
ATsch
This is my frustration with almost all tech sites. The AMD press deck included
compilation benchmarks, but the only other outlet to reliably provide them is
Phoronix.

~~~
close04
Most reviewers don't bother doing compile benchmarks because they're not as
familiar with them and perhaps they don't come in the same "canned" form as
every other gaming benchmark. It may also be that each site caters to a
particular audience.

On the other hand, bench results have to be comparable and stay relevant over
time. That's easy when you run the same, still widely played GTA V year after
year on every new CPU. But comparing compilation times for kernel version 3.11
(released at the same time as GTA V) seems a lot less relevant today.

~~~
ATsch
Maybe this would change if someone would pre-package a build environment with
source code, a nice gui and fancy abstract visualization of the compile
process.

~~~
vlovich123
Phoronix does this already AFAIK.

[https://github.com/phoronix-test-suite/phoronix-test-suite](https://github.com/phoronix-test-suite/phoronix-test-suite)

~~~
ATsch
Phoronix does do this, but it's unfortunately harder to use than would be
required for wide adoption in the press. It really has to be as simple as
downloading an exe that pops up a window with a "go" button when run, and has
to show some nice things happening on screen. Game and graphics benchmarks do
this, so that's what they use.

------
ravedave5
I love how CPUs now have more onboard memory than my whole first computer.

~~~
bryanlarsen
It's been a while since the L1 cache size (64KB) exceeded the memory of my
first computer. What I find crazy is that a single Ryzen core holds more in
_registers_ than some of the computers I've used.

AMD64 has 16 64-bit architectural registers, but Ryzen actually has 168
physical ones behind the scenes so it can pipeline and reorder multiple
instructions simultaneously. That's about 1.3KB of memory, more than some
microcontrollers have in ROM or RAM.

~~~
chapplap
It's actually a lot larger if you look at the vector registers: AVX registers
are 32 bytes each, and there are also ~168 of them in modern AMD and Intel
CPUs, resulting in over 5KB of registers. With AVX-512 the number is over
10KB!
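The sizes those counts imply, taking ~168 physical entries per register file as given above:

```python
# Register-file capacity for 168 physical entries at each register width.
entries = 168
widths_bytes = {"64-bit integer": 8, "256-bit AVX": 32, "512-bit AVX-512": 64}
sizes = {name: entries * w for name, w in widths_bytes.items()}

for name, nbytes in sizes.items():
    print(f"{name}: {nbytes} bytes (~{nbytes / 1024:.2f} KB)")
```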

------
greatjack613
Respectfully, Intel looks like it's hiding under a chair by refusing to
release anything with similar core counts.

I mean, I would also be afraid of an AMD that has the edge in:

> Manufacturing Process

> Performance

> Power Consumption

with equal IPC.

~~~
colinchartier
Intel even refused to give i9-9900Ks to Linus Tech Tips because LTT had a
Ryzen 3950X and Intel didn't want them benchmarked side by side.

~~~
HeWhoLurksLate
They still got one, though, through Origin PC.

------
adrianmonk
> _Thread + Ripper was a clever play on words: anything that had plenty of
> threads, the hardware was designed to ‘rip’ through the workload._

Tangent, but is this really a play on words? And if so, what is the other
meaning here?

I get the meaning where lots of cores are ripping through workload. But the
other one I don't. The closest guess I have is that it's like a comic book
hero (The Incredible Hulk) ripping through his clothes because he is so big
and powerful. And I don't guess it refers to a seam ripper (the sewing tool).

~~~
spectramax
AMD's (and Intel's) marketing teams are awfully enslaved to the gamer market:
gamer aesthetics, big bold packaging, insane names that sound like something
out of Tron, the whole RGB enchilada.

AMD started it with Ryzen, a name that sounds like a character from The Lord
of the Rings. Then we got Threadripper, violent to say the least, and finally
EPYC, a cheap play on letters.

What happened to marketing like the IBM System/360? Eliot Noyes is rolling in
his grave. I don't think the marketing teams are to blame; it's the consumers,
and the Taiwanese influence on what a computer product should be marketed as.

~~~
adrianmonk
I guess I never thought of it in a gaming-oriented way, although it totally
makes sense.

Ryzen also sounds like "rising", and even like "horizon", which are both
pretty positive and non-gamer-y.

Ripping has other meanings too. It can mean going really fast or
energetically, like "the driver ripped right past the race leader on that
corner" or "let her rip" when you launch something at full speed. And a
sawmill or woodworker uses a circular saw, band saw, etc. to rip wood, which
means dividing it along the grain, the natural direction it wants to split.
Which I suppose makes a good analogy for embarrassingly parallel compute
problems.

------
jiggawatts
Has anyone seen independent reviews for database performance on EPYC 2 /
Threadripper 3?

I'd love to recommend the AMD platform for customers with large databases, but
there's zero reputable information available on the Internet. The vendors
can't be trusted, because they obviously cheat their asses off: Intel
disabling security patches, using ludicrous hardware configurations, etc...

~~~
petronio
Phoronix has done a few DB benchmarks on EPYC 2.
[https://www.phoronix.com/scan.php?page=search&q=EPYC](https://www.phoronix.com/scan.php?page=search&q=EPYC)

Don't forget that if you're comparing to old benchmarks, there's a bug with
Intel's TSX now, and the mitigations once again seem to impose a nice hit if
your database engine makes use of it.

------
misja
280W TDP?! That's more than what a large refrigerator uses...

~~~
gruez
>That's more than what a large refrigerator uses ..

That's partially due to good insulation in modern fridges. After all, if the
insulation is perfect, you could run a fridge on 0W (assuming you don't open
the doors).

~~~
all_blue_chucks
How's that? By powering them with Maxwell's Demon?

[https://en.wikipedia.org/wiki/Maxwell%27s_demon](https://en.wikipedia.org/wiki/Maxwell%27s_demon)

~~~
dsjimi
Watts are joules/second. With perfect insulation and the doors never opening,
the only energy used is whatever it took to cool the contents initially. As
time approaches infinity, watts approach zero, despite the energy consumed up
front: c/t -> 0 as t -> INF for constant c.
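In numbers (the 50kJ cooldown energy is a made-up figure for illustration):

```python
# Average power = one-time cooldown energy / elapsed time, so it decays to zero.
cooldown_joules = 50_000  # assumed energy to chill the contents once
for hours in (1, 24, 24 * 365):
    avg_watts = cooldown_joules / (hours * 3600)
    print(f"after {hours:>4} h: {avg_watts:.4f} W average")
```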

------
bitL
Any official announcement for 8-channel 4TB LRDIMM TRX80/WRX80 chipset yet?
TRX40 is underwhelming compared to high-end x399...

~~~
Jonnax
What's missing?

~~~
bitL
TRX40 can do only as much RAM as x399 (256GB), as there are only 32GB (ECC)
UDIMMs available. With the incoming 64-core TR, that's 4GB/core or 2GB/thread,
which is way too little for a CPU that will cost ~$5k. TRX40 boards also, for
some reason, have fewer PCIe slots (4) than x399 (6) or x299 (7) boards. I
can't call them "Pro" because of that; fewer GPUs for deep learning is not a
good idea in workstation-level tech.

~~~
pmjordan
To be honest, it sounds like your particular needs are best served by an Epyc.

~~~
bitL
TR usually has higher CPU frequencies and a more agile BIOS, for less money.

------
IOT_Apprentice
The benchmarks on the Ryzen Threadripper and Ryzen 9 3rd-gen CPUs look great.
I have been trying to find prebuilt gaming desktops that don't have issues
with those chips. Newegg reviews show failures around their units, and Amazon
isn't much help either. Is this truly a build-your-own situation? Not
necessarily issues with the CPUs, but with the overall quality of the build,
BIOS issues, and DOA status. Any recommendations for BYO or prebuilt options?

~~~
avgDev
Build your own! Single components (usually) offer longer warranties. You can
research decent RAM, decent hard drives, and so on.

Building a PC these days is trivial; back in the day there was a lot more
fiddling with the BIOS, but now things are simplified and it's mostly plug
and play.

I might actually build a computer and write a detailed guide, and I would
love to help if you decide to go this route.

------
naveen99
It'd be funny if AMD came out with something like Intel's Phi PCIe cards after
Intel discontinued them. But for my mostly UNet/FCN segmentation deep learning
models, GPUs will continue to be the go-to until we get into the few-hundred-
core range...

~~~
HeWhoLurksLate
I really want one of those, even though I have almost no use for one. Maybe?

------
fulafel
I started wondering about the physics of removing 280W of heat from the tiny
silicon die/dice. Is most of the heat conducted through the metal connections
or through the package? What material is the package made of?

~~~
kllrnohj
It's not actually that tiny and it's pretty spread out. The HEDT chips are
physically very large.

You can see how the dies themselves are laid out here:
[https://images.anandtech.com/doci/15151/amd_rome-678_678x452...](https://images.anandtech.com/doci/15151/amd_rome-678_678x452_575px_678x452.png)

And you can see how physically huge the package is here compared to desktop
CPUs you may be more familiar with:
[https://images.app.goo.gl/T5kpz8WyjGHV8Vqm8](https://images.app.goo.gl/T5kpz8WyjGHV8Vqm8)

------
jgalt212
32 cores at a good price point. Now there's no reason to lease all those VMs
and use Kubernetes. Unless, of course, we are all I/O bound due to the bulk of
the work being telemetry and tracking. Hmm....

------
posix_me_less
Some advice for the younger enthusiasts out there: don't buy this _yet_; it's
an overpriced pseudo-workstation platform for now, even if you can afford it.

It's nice to watch (and have) new CPUs pushing the available computing
performance up and bringing new features. With Ryzen 3000/EPYC Rome this
finally happened in a meaningful way, and TR is (or shortly will be) the most
powerful CPU there is. _AND_ Intel is getting kicked in the ass, which is
good.

But objectively, even if AMD is better value than Intel now, it still is
overpriced.

The first problem is the launch price. These days, this kind of system keeps
its top status for a year or so, then slips to mainstream level, where it can
be had for 50% or even 30% of the launch price. Look at what happened to the
2950X, or to first-gen EPYC platforms: they were similarly expensive at launch
and are mainstream performance now, and their prices have fallen dramatically
as well. (If you want AMD, those are a much better choice now.)

The second problem is that the TRX40 motherboards available now are simply
underwhelming and disappointing; take a look:

[https://www.pcmag.com/feature/372107/first-look-all-the-amd-...](https://www.pcmag.com/feature/372107/first-look-all-the-amd-trx40-motherboards-for-third-gen-thre/6)

You can get 16, 24, or 32 cores for $1000 and up, but only 4 channels of
memory, 3-4 PCIe slots, and 2010-era networking? The Xeon workstation boards
are so much better. Here is how you design a motherboard in this price range:

[https://www.servethehome.com/supermicro-x11spa-t-motherboard...](https://www.servethehome.com/supermicro-x11spa-t-motherboard-review-an-intel-xeon-w-3200-platform/)

It's like the mobo manufacturers are thinking: the TR CPUs are cooler than
Xeons, so let's make a killing on expensive gamer-like motherboards.

If you're after a nice performance punch on the cheap, go here, sort by
performance, then look for the rows with the highest value:

[https://www.cpubenchmark.net/CPU_mega_page.html](https://www.cpubenchmark.net/CPU_mega_page.html)

The result for me: if you're after performance but clever with money, go for a
3900X or 3700X on a B450 board; anything else is kind of stupid for most
people.

If you're after a solid workstation with ECC and 2020-level connectivity, get
an Intel Xeon W now, or wait for better motherboards (unlikely to happen).

------
somurzakov
can it run crysis, though?

~~~
leadingthenet
It might even run it in CPU-rendered mode...

~~~
bob1029
I think this is the part where it gets interesting. Once you have enough
parallel general-purpose compute horsepower available to run a physically-
based renderer (e.g. what Pixar uses) at frame rates beyond 30/second, you can
start to enter into a realm of arbitrarily-complex scenes within real-time
applications.

How far off are we from this possibility, assuming someone sat down and
optimized existing solutions for this use case?

------
pella
review _" AMD Ryzen Threadripper 3000 (Castle Leak) Review Roundup"_

[https://videocardz.com/83333/amd-ryzen-threadripper-3000-cas...](https://videocardz.com/83333/amd-ryzen-threadripper-3000-castle-leak-review-roundup)

------
kd3
All these products are great, but I still can't get a Ryzen 9 3950X anywhere,
for fuck's sake. I'm tired of waiting. AMD should work on their supply.

------
gowld
What does "nm" mean, really?

It's only barely useful for comparing generations within one company's fab
model.

What's a better measure to use? Transistors/mm^2?

Or at least label it properly as "7 AMD-nm" to avoid false comparison to
"Intel-nm"?

~~~
bryanlarsen
Million transistors per mm^2 has been proposed as a comparison metric:

[https://en.wikichip.org/wiki/mtr-mm%C2%B2](https://en.wikichip.org/wiki/mtr-mm%C2%B2)

There are lots of different transistor sizes on a chip, so it also specifies
what kind of transistors to measure.

