
AMD Reveals Threadripper 2: Up to 32 Cores, 250W, X399 Refresh - gnufied
https://www.anandtech.com/show/12906/amd-reveals-threadripper-2-up-to-32-cores-250w-x399-refresh
======
loser777
Just put together a first gen Threadripper build for work a few months ago
where out-of-the-box ECC support was too good to pass up for a box with >=
64GiB DRAM. If it’s possible to do a drop-in upgrade (cooling solution
notwithstanding) then this will be a big deal as we have basically not seen
backwards compatibility, especially with such a compelling upgrade, in this
part of the market in over a decade from Intel.

But 16 to 32 cores is a leap (250W TDP!!!!!), so I wonder what cooling
solutions will remain viable (e.g., what about the TR4 edition Noctua cooler
made just for TR?)

~~~
dabockster
Cooling Threadripper has been my issue with adopting it as well. The only pre-
assembled cooler I've seen is an Enermax AIO block. Otherwise, you have to
loop it on your own.

~~~
loser777
Do you mean for liquid-cooled solutions only? We've been using this Noctua
U14S TR4-SP3
([https://noctua.at/en/nh-u14s-tr4-sp3](https://noctua.at/en/nh-u14s-tr4-sp3))
on a 1950X (no overclock though) without issues.

~~~
reilly3000
Oh man, I love that fan. But I misread the height specs on it and I couldn’t
fit everything into my case any longer, but my old fan had died. I ordered a
replacement on Amazon and they sent out the wrong item twice, so for about 3
weeks I just ran it open and yelled at the cats if they came in my office.
Amazon finally gave up on trying to send the right case, and just told me to
buy it somewhere else, and eventually gave me a refund. I wanted to not take
any chances, and the cats were getting terrified of me, so I ordered a
ThermalTake core X9. The thing is insane. You could reasonably fit two
motherboards and 8 hard drives in it. Don’t try to fit it on a desk. It could
almost be a desk. But it is all worth it because I love that fan so much.

------
megaman22
What does this mean compared to say a 2011 i7?

I'm frankly amazed my first post-college desktop is soldiering on as a high-
end gaming system with gpu, ssd and dram updates

~~~
lev99
Generally speaking, gaming is single-thread intensive on the CPU side, and all
parallel tasks are offloaded to the GPU.

This is really more of a workstation or server cpu than a gaming cpu.

Edit: Tons of comments are pointing out exceptions. I was hoping to cover that
with "generally speaking".

~~~
pjmlp
Game studios are already taking advantage of all the cores they can get.

These guys go all the way up to 6 hardware ones, depending on availability.

[https://youtu.be/ptakmFGcIRU](https://youtu.be/ptakmFGcIRU)

~~~
majewsky
Fun fact: A friend has a first-gen Ryzen with 8 cores (16 hyperthreads), and
wondered why Civilization V was always crashing on his system. Turns out that
Civ V runs into weird race conditions with that many threads. When he pinned
the game to 8 or fewer hyperthreads, it started working properly.
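
For anyone wanting to try the same workaround from a script, here is a minimal sketch, assuming a Linux system (the report above was presumably on Windows, where Task Manager's affinity setting does the same job). `limit_affinity` is a hypothetical helper name; `os.sched_getaffinity` and `os.sched_setaffinity` are standard-library calls that only exist on some Unix platforms.

```python
import os

def limit_affinity(max_cpus, pid=0):
    """Restrict a process to at most max_cpus of its currently allowed logical CPUs.

    pid=0 means the calling process. Returns the set of CPUs that was applied.
    """
    if max_cpus < 1:
        raise ValueError("max_cpus must be at least 1")
    if not hasattr(os, "sched_getaffinity"):
        # The affinity calls are missing on e.g. macOS and Windows.
        raise NotImplementedError("CPU affinity is not exposed by os on this platform")
    allowed = sorted(os.sched_getaffinity(pid))
    chosen = set(allowed[:max_cpus])  # keep the lowest-numbered CPUs
    os.sched_setaffinity(pid, chosen)
    return chosen
```

Calling `limit_affinity(8)` caps the current process at 8 logical CPUs. Note that the lowest-numbered CPUs may be SMT siblings of each other, so this does not guarantee 8 separate physical cores.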

~~~
ethbro
Writing code on top of legacy single threaded frameworks / content interfaces,
under an insane holiday release deadline, while trying to enable
multithreading... is pretty much my definition of hell.

I feel bad for the poor programmers who got that job.

------
cmsimike
I was just telling myself today that I wasn't going to build a new desktop for
a while. Now I'm thinking current desktop might become the new VM server and I
get a new gaming/dev box!

~~~
onli
I would not count on this being great for gaming. The current Threadripper is
already slower in games than a regular Ryzen processor or Intel i5/i7; the
slower base and turbo clocks and added memory latency won't help with that.

~~~
josefx
Is it actually slower or is it just the good old Intel compiler at work again?
Binaries compiled with it have great performance on Intel chips, but perform a
vendor id check and slow down on non Intel chips. As long as Intel CPUs remain
dominant game developers will continue to use it.

~~~
onli
It really is slower. In games it is slower than AMD Ryzen processors, no
compiler manipulation here. It is also to be expected with a lower clock and
higher memory latency.

------
gnufied
Kinda excited and can't wait for more details. Key detail from anandtech
article:

"Threadripper this is still the case: the two now ‘active’ parts of the chip
do not have direct memory access. This technically adds latency to the
platform, however AMD is of the impression that for all but the most memory
bound tasks, this should not be an issue"

~~~
sliken
Sounds pretty much exactly like an Epyc, but with half the memory channels
disabled.

Seems kinda silly unless it's pretty cheap. After all the 7351P is $750,
guarantees ECC, and has twice the memory bandwidth of the threadripper.

~~~
loeg
7351P is the 16-core Epyc part. The comparable Epyc part is the 32-core 7551P,
at $2317 (Newegg).

There is definitely differentiation between the server line of chips and the
TR "HEDT" chips, other than the crippled memory channels.

The TR4 socket sacrifices half of the SP3 memory channels to use for more
power.

As a result, TR models come with (much) higher stock clocks: 3.0 GHz base ->
3.4 GHz all-core turbo vs Epyc 7551P 2.0 GHz base -> 2.55 GHz all core.

So all in all, TR may be better for your workflow than Epyc even at the same
core count and price (if clock matters more to you than memory bandwidth). And
keep in mind, server boards aren't free.

~~~
sliken
Not sure I get the value. Unless you like motherboards with LEDs hot glued
everywhere.

To compare:

Epyc 7351P, 2.9GHz, 16 core, 32 threads, 8 memory channels = $799

TR 1920X, 3.5GHz, 12 core, 24 threads, 4 memory channel = $761

TR 1950X, 3.4GHz, 16 core, 32 threads, 4 memory channel = $959

Sure the 1950X is 17% faster clocked, but with half the mem bandwidth and half
the mem channels (half the performance for random lookups).

So generally the 1950X will be 17% faster with no cache misses, and 50% as
fast with all cache misses.

Motherboard prices are pretty close, the Epyc, Dual GigE, 3x PCI-e x16, etc =
$369. I found some TR boards cheaper, but not with ECC or 3x PCI-e x16.

Sure the future thread ripper 2 will be faster... but so will the Epyc 2. Not
sure I see the point in the TR.

~~~
loeg
> Not sure I get the value. Unless you like motherboards with LEDs hot glued
> everywhere.

Clock and single core performance absolutely matter to some customers. This
is part of why Intel can continue to dominate the home and server market
despite selling lower core count chips — they are clocked a little higher and
have a little higher IPC.

(Also, I totally agree about the silly LEDs, but I also think board
manufacturers wouldn't do it unless it produced a return, i.e., added value on
average.)

> To compare: Epyc 7351P, 2.9GHz, 16 core, 32 threads, 8 memory channels =
> $799

> TR 1950X, 3.4GHz, 16 core, 32 threads, 4 memory channel = $959

7351P @2.9 GHz is the boost clock — base is 2.4.

1950x @3.4 is the _base_ clock — boost is 4.0, on up to 4 cores (not all-core
like the 7351P part).

> So generally the 1950X will be 17% faster with no cache misses, and 50% as
> fast with all cache misses.

Your 17% number is sort of wrong, or missing the point — yes, that's the clock
difference for all-core workloads, but TR boost kicks in at 1-4 thread
workloads.

4.0/2.9 is huge. That's 38% additional CPU on (very common) 1-4 thread
workflows. That's worth a premium to some people.
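
The two percentages being argued over fall straight out of the quoted clocks. A rough sketch, assuming performance scales linearly with clock speed (it ignores IPC, cache and memory effects); `clock_advantage` is just an illustrative name:

```python
def clock_advantage(fast_ghz, slow_ghz):
    """Return how much faster fast_ghz is than slow_ghz, as a percentage."""
    return (fast_ghz / slow_ghz - 1.0) * 100.0

# Clock figures as quoted in this thread.
all_core = clock_advantage(3.4, 2.9)  # 1950X base vs 7351P all-core boost
few_core = clock_advantage(4.0, 2.9)  # 1950X 1-4 core boost vs 7351P boost

print(f"all-core: {all_core:.0f}%  1-4 threads: {few_core:.0f}%")
# prints "all-core: 17%  1-4 threads: 38%"
```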

The obvious question you might then ask me is, if your workflow is only 1-4
threads, why buy a 16 core CPU? Well, sometimes people have workflows that
vary over time. Maybe some of the time you only need a few cores to manage
interrupt traffic and keep the GPUs fed, and maybe other times you do want all
of the cores to complete a parallelizable, CPU-bound task like compilation or
h.265 compression faster.

If your workload is exclusively embarrassingly parallel and you can keep all
cores busy all the time, yes, a server platform like Epyc 7351P is a much
better value for you than TR. Or if your workflow needs more than 80 GiB/s
(TR1) or 95 GiB/s (TR2) memory throughput, a server platform like Epyc is
probably a better value for you.

> Sure the future thread ripper 2 will be faster... but so will the Epyc 2.
> Not sure I see the point in the TR.

Epyc 2 will almost certainly not be clocked higher than TR2, due to the
additional power draw the TR socket has vs SP3 — though, the gap may shrink.

~~~
GiorgioG
I keep my TR (1950X) running at 4.0ghz 24/7...runs like a champ.

~~~
loeg
Neat. Just curious, if you don't mind — is it overvolted, and if so, by how
much? And what's your cooling solution and how well does it do (offset from
ambient)?

------
TylerE
Nice I guess, but I wish we could get back to working on getting FASTER cores
rather than more of them. Some tasks just don't multi-thread.

~~~
sliken
The laws of physics are intruding. The days of doubling clock rates without
prohibitive cost, power, and heat are gone. At 3 GHz a signal doesn't go very
far in 1/3rd of a ns. Chips are getting physically bigger (look at the Epyc). Easy
increases in IPC are gone. There's an embarrassing number of transistors
available, but few uses for them actually increase performance on things
consumers care about.
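
To put a number on the "1/3rd of a ns" point, here is a back-of-the-envelope sketch. It computes how far light travels in a vacuum during one clock period, which is only an upper bound; real on-chip signals are considerably slower. `velocity_factor` is a hypothetical knob for plugging in a slower propagation speed, not a measured value.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact by definition, in vacuum

def distance_per_cycle_cm(clock_hz, velocity_factor=1.0):
    """Distance a signal covers in one clock period, in centimetres."""
    period_s = 1.0 / clock_hz
    return SPEED_OF_LIGHT_M_PER_S * velocity_factor * period_s * 100.0

print(f"{distance_per_cycle_cm(3e9):.1f} cm per cycle at 3 GHz (vacuum upper bound)")
# prints "10.0 cm per cycle at 3 GHz (vacuum upper bound)"
```

So even in the best case a signal covers about 10 cm per cycle, which is not much slack once a package gets as large as an Epyc.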

Even GPUs are starting to peter out with lengthening product cycles, smaller
improvements, and the increasing tendency to just rename the previous
generation with a new name, as if it was actually significantly improved.

So CPUs have peaked, most of the low hanging fruit has already been taken.
Most easily shown by pretty much all the popular CPUs from the previous
generation maxed out at 95 watts or so (desktop or server), but are at 180
watts in the current generation. This happened years ago for desktops, but is
hitting laptops (last year), and smart phones (this year or so).

GPUs are about half way there, improvements are harder. No more annual updates
with 2x the performance on most common use cases. Feature improvements are
getting more obscure, gaming sites are using 16x zoom to show the
improvements. GTX1170 is rumored to be 50% faster than the GTX1070... from 2
years ago.

Specific use ASICs for mining, AI, and similarly narrow use cases are still
early in their development. Multiple vendors are showing 2x improvements with
each generation, even crazy things like including said chips in consumer
devices with tiny power budgets.

So basically a core twice as fast isn't happening, it's not because anyone is
lazy. There's tons of room for improvement in specific use cases where today's
silicon is crazy inefficient. Things like machine learning, software-defined
radios, vision-specific processing, and related areas are seeing rapid improvements.

~~~
merinowool
I think it is not happening because it doesn't have to. Current CPUs are more
than fine for home users, and the HEDT market is too small to justify huge
research spending. I think once Electron apps become more popular, home users
will get power hungry again and that will kick off another CPU revolution.

~~~
muxr
It's not happening because it's hard to do. Extracting instruction-level
parallelism (which is how CPUs gain IPC) is difficult, and the cores also have
to get wider, with exponential complexity. I.e., at the point where you have to
sacrifice 50% of the CPU's power efficiency for a 5% IPC gain, it's no longer
worth it.

------
qwerty456127
Are individual cores in this thing something nearly as powerful as in
conventional quad-core desktop/server CPUs?

Also can it sleep/downclock individual cores to cool down when its full power
is not needed?

~~~
JoshuaJB
Yes. Threadripper is essentially just four well-binned Ryzen 7s tied together
with a creative interconnect. The memory bandwidth doesn't increase as
proportionally as the core count, but each individual core should operate just
as fast as in a typical desktop part.

Edit: To answer the second part of the parent question, since the dies are the
same, it should have all the same power-saving features available in AMD's
laptop and desktop processors.

~~~
sliken
Epyc is 4 ryzen 7's. The threadripper (at least the ones shipping today) are
two Ryzen 7s.

The memory bandwidth does increase proportionally (it's 2x Ryzen for
Threadripper and 4x Ryzen for Epyc).

Individual cores do tend to be lower on the threadripper and epyc because of
power/heat limitations, so they tend to be lower clocked and tend to have less
cache per core.

~~~
JoshuaJB
According to the linked article, Threadripper 2 uses the equivalent of 4 Ryzen
7s, however they disable the memory busses on two of them which yields only a
2x increase in memory bandwidth. You're correct that Threadripper tends to
clock slightly lower, but the cache is on-die so it shouldn't be any different
per-core than in a typical Ryzen.

------
phkahler
I was hoping they would put a GPU and HBM2 in place of those non-functional
die. Like their old EHP concept that never saw the light of day. With enough
HBM you wouldn't even need regular RAM modules and could fit a monster system
on an ITX board.

~~~
paulmd
(1) the TR4 socket is too big for mITX, even the "hold my beer" team over at
Asrock only could stuff it into mATX.

(2) TR4 motherboards don't have video outputs, so a GPU would only be useful
for compute.

If you can't fit a discrete GPU the Hades Canyon NUC is still the best you can
do, although AMD is readying their own competitor called "Fenghuang" with a
bigger+newer GPU - 28 CU Vega vs the 24 CU Polaris on the Intel (yes, despite
Vega branding it's actually based on Polaris).

~~~
phkahler
It would fit if the GPU and RAM were both in the module. No GFX or RAM slots
needed on the board.

------
alfalfasprout
The problem is that Windows support for this many cores is still lacking. On a
Linux server, sure, we've been using multi-socket motherboards w/ high core
count CPUs for years. But many applications (e.g., Microsoft Edge) still crash
w/ 16 cores.

~~~
merinowool
I have Intel's 20 core CPU and never had problems.

~~~
akvadrako
My guess is it's a timing bug that likely exists with any number of cores, but
becomes more likely the higher the parallelism.

~~~
tjoff
My guess is it's something else entirely specific to that system.

Really weird to just single out many cores as the cause. It is also just one
application that barely has anything to do with windows anyway. There has
never been a lack of multiple-cpus nor many-cores (and 16 isn't even many) on
windows as the post implies either.

------
mpreda
Intel leads in AVX/AVX2/AVX512 perf by a large factor over Ryzen CPUs. (One
program that makes this evident is Prime95, the Mersenne primality tester.)
Looks like AMD did have to cut something after all.

~~~
sliken
Indeed, but the problem is that often AVX intensive code triggers intel to
significantly throttle, or run out of bandwidth from cache or main memory.

A surprising number of codes don't seem to benefit from the AVX performance.
Most notably, even the most FP-intensive codes don't seem to benefit much
from the higher-end Xeons (with two AVX512 units) over the lower models with a
single AVX512 unit.

So sure, if you compute Mersenne primes all day, go for it. If you are actually
doing some real-world work that's vector intensive you are often better off
with AMD (per $).

Keep in mind that today's chips have more performance per memory bandwidth
than previous generations. So generally memory is a bigger bottleneck and
AMD's 1.33x advantage in bandwidth/memory channels is an advantage on a wider
variety of codes than the previous generation.

~~~
paulmd
> Indeed, but the problem is that often AVX intensive code triggers intel to
> significantly throttle

Xeon-W, HEDT, and consumer processors suffer this to a much smaller extent
than Skylake-SP does, and this slowdown is already baked into the benchmarks
(the deviation from benchmarks only really exists for _mixed_ workloads). In
most cases, AVX is still a huge speedup.

Also, on consumer+HEDT you can manually configure the AVX offset anyway. Want
zero offset, and can handle the power/heat? Go hog wild...

> Most notably even on the most FP intensive code don't seem to benefit much
> from the higher end xeons (with two AVX512 units) over the lower models with
> a single AVX512 unit.

Intel's documentation on some of the HEDT chips was incorrect, the i7s
actually have dual AVX512 as well as the i9s. If you were referring to HEDT as
"Xeons" that would be why. Otherwise, it might be another documentation issue.
The difference should be very clear in something like x265 encoding.

------
valarauca1
2/4 CCX’s lack direct memory access. This is kind of a deal breaker.
Threadripper is already memory bound, so the 2 new packages will just be
waiting on RAM most of the time.

Cite: [http://en.community.dell.com/cfs-file/__key/telligent-
evolut...](http://en.community.dell.com/cfs-file/__key/telligent-evolution-
components-attachments/13-4491-00-00-20-44-47-63/Direct-from-
Development-_2D00_-PowerEdge-NUMA-Configs-with-AMD-EPYC-
Processors.pdf?forcedownload=true)

It would be nice if L3 wasn’t an eviction cache :/

~~~
loeg
> Threadripper is already memory bound

This depends heavily on your workflow. If you find TR memory bound, then you
want a server part. TR has 4 memory controllers that will each do DDR4-3200 in
TR2 per TFA. That should be able to push ~95 GiB/s.

What do you compare that against?

Intel's highest end HEDT part, i9 7980XE with 18 cores, only supports
DDR4-2666 and has the same number of controllers, so it should hit ~80 GiB/s.
It retails for $2000.

If you want more memory bandwidth you're either buying an even more expensive
Xeon, or an AMD Epyc part.

~~~
valarauca1
> This depends heavily on your workflow. If you find TR memory bound, then you
> want a server part. TR has 4 memory controllers that will each do DDR4-3200 in
> TR2 per TFA. That should be able to push ~95 GiB/s.

Wrong. You neither read the OP nor my citation. Epyc can push 95GiB/s with its
8 channels. TR2 isn't increasing the number of channels, so 2 CCX's (8 cores,
16 hyperthreads) don't have direct memory access. They have to work within
the inter-chip CCX bus to get memory access. This is going to carry a lot of
performance penalties.

~~~
loeg
> Wrong.

What? Are you seriously claiming that a workflow that only uses 4 GiB/s of
memory throughput is somehow memory bound on a controller that can push 95
GiB/s? Workflow matters and affects reasonable CPU choice.

> You neither read the OP nor my citation. Epyc can push 95GiB/s with its 8
> channels.

Um, also not true. Your link does not include any memory throughput figures
supporting your contention that TR is "memory bound." I believe you are
totally mistaken about the 95 GiB/s figure for Epyc; see below:

TR, quad channel:
[https://en.wikichip.org/wiki/amd/ryzen_threadripper/1950x#Me...](https://en.wikichip.org/wiki/amd/ryzen_threadripper/1950x#Memory_controller)

Epyc, octa channel:
[https://en.wikichip.org/wiki/amd/epyc/7551p#Memory_controlle...](https://en.wikichip.org/wiki/amd/epyc/7551p#Memory_controller)

Epyc can push 159 GiB/s with DDR4-2666. Per socket:

> In a dual-socket configuration, the maximum supported memory doubles to 4
> TiB along with the maximum theoretical bandwidth of 317.9 GiB/s.

You said:

> TR2 isn't increasing the number of channels, so 2 CCX's (8 cores, 16
> hyperthreads) don't have direct memory access.

Understood. I have never said otherwise.

> This is going to carry a lot of performance penalties.

The extra cores will have higher NUMA latency than the memory-local cores,
yes. But it does not somehow decrease the total memory bandwidth available
across the CPU.
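
The throughput figures in this exchange can be reproduced with simple arithmetic. A sketch, assuming the usual 64-bit (8-byte) DDR4 channel and treating the results as theoretical peaks, not measured throughput; `peak_bandwidth_gib_s` is an illustrative name:

```python
def peak_bandwidth_gib_s(mega_transfers_per_s, channels, bytes_per_transfer=8):
    """Theoretical peak DRAM bandwidth in GiB/s.

    Peak rate = transfers per second * bytes per transfer * number of channels.
    """
    bytes_per_s = mega_transfers_per_s * 1e6 * bytes_per_transfer * channels
    return bytes_per_s / 2**30

DDR4_3200 = 3200.0
DDR4_2666 = 8000.0 / 3.0  # "2666" is really 2666.67 MT/s

configs = {
    "TR2, 4ch DDR4-3200": (DDR4_3200, 4),
    "4ch DDR4-2666 (TR1, i9 7980XE)": (DDR4_2666, 4),
    "Epyc, 8ch DDR4-2666": (DDR4_2666, 8),
    "2x Epyc, 16ch DDR4-2666": (DDR4_2666, 16),
}

for name, (rate, channels) in configs.items():
    print(f"{name}: {peak_bandwidth_gib_s(rate, channels):.1f} GiB/s")
```

This gives roughly 95.4, 79.5, 158.9 and 317.9 GiB/s, matching the ~95, ~80, 159 and 317.9 figures cited above.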

------
ksec
Is there any reason why you wouldn't want to use Threadripper for a server?

Quad memory channels + 32 cores + NVMe sounds perfect to me for a server. And
it should be priced similar to the Xeon-D 16 Core.

~~~
sliken
No guarantee on ECC, half the cores and half the memory bandwidth of epyc.

~~~
ksec
TR supports ECC, and many have it running. Of course, if you need the memory
bandwidth, EPYC is the best fit.

------
nottorp
Sexy, but for home use I don't need a space heater, on the contrary. What's
the fastest (multiple core) you can get these days inside 65 W and who wins
there, AMD or Intel?

~~~
Zekio
Pretty sure it is AMD with Ryzen 7 1700 or Ryzen 7 2700 both rated at 65W TDP

~~~
krylon
I have a Ryzen 1700 at home, and I am very happy with it. The fan that came
with it runs very quietly even under heavy load. On workloads that utilize all
cores, frequency will drop to 2-2.5 GHz after a while, but for those, the
number of cores/threads tends to have a bigger impact on throughput than clock
frequency / performance per core.

~~~
nottorp
Aww thanks - for some reason I had forgotten there are Ryzen 7s at 65 W and
was only looking at the 5s...

~~~
krylon
I was very happy to discover the 1700 back then, because I wanted an 8-core
machine _really_ badly (not that I really need one, but hey, I only upgrade my
desktop every ~5 years), but I did not want an electric radiator. ;-)

------
SubiculumCode
Oh lovely competition. How I adore you.

------
baybal2
I wish I could get Threadripper on a mini-ITX motherboard

~~~
Analemma_
Have you seen a Threadripper socket? They're freakishly huge (78mm X 105mm).
That's a huge chunk of a mini-ITX board (170mm X 170mm) all by itself. There's
no way.
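
A quick sanity check on how much of the board that is, using the dimensions quoted above (taken as given, not independently verified):

```python
socket_mm = (78, 105)   # Threadripper socket footprint, as quoted
board_mm = (170, 170)   # mini-ITX board

socket_area = socket_mm[0] * socket_mm[1]   # 8190 mm^2
board_area = board_mm[0] * board_mm[1]      # 28900 mm^2
fraction = socket_area / board_area

print(f"socket covers {fraction:.0%} of a mini-ITX board")
# prints "socket covers 28% of a mini-ITX board"
```

That is before DIMM slots, VRMs, a PCIe slot or any cooler keep-out area are accounted for.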

~~~
loeg
And even with an ordinary socket, mITX boards have anemic amounts of PCIe
lanes. One of the top selling points of TR is the huge number of PCIe lanes
for a non-server part. All ATX boards around launch were required to support
the full complement of physical lanes. A mITX configuration would have both
anemic PCIe and anemic memory.

~~~
baybal2
On the other hand, I think TR can work without chipset. Am I right?

Would 4 DIMM slots be enough to saturate all controllers?

~~~
loeg
Yeah, 4 DIMMs should be sufficient[0], and I guess some mini-ITX boards have 4
slots (e.g., ASRock X299E-ITX/ac).

I don't know about TR working without a chipset. This picture[1] suggests the
X399 part isn't absolutely essential, but I don't know if there are other
considerations that make it essential.

Anandtech says[2]:

> Unlike Ryzen, the base processor is not a true SoC as the term has evolved
> over the years. In order to get the compliment of SATA and USB ports, each
> Threadripper CPU needs to be paired with an X399 chipset. So aside from the
> CPU PCIe lanes, the 'new' X399 chipset also gets some IO to play with.

So, maybe? If you are willing to forgo all the chipset-attached IO, and most
PCIe lanes. That could be useful for a niche workflow or maybe a small form-
factor desktop workstation that just needs CPU compute.

[0]:
[https://en.wikichip.org/wiki/amd/ryzen_threadripper/1950x](https://en.wikichip.org/wiki/amd/ryzen_threadripper/1950x)

[1]:
[https://en.wikichip.org/w/images/6/61/x399_platform.png](https://en.wikichip.org/w/images/6/61/x399_platform.png)

[2]: [https://www.anandtech.com/show/11685/amd-
threadripper-x399-m...](https://www.anandtech.com/show/11685/amd-
threadripper-x399-motherboards)

~~~
baybal2
Thank you for the full and informative reply!

------
MR4D
This looks like a fun CPU for running Redis on.

At one instance per core, this ..... ok, I'm getting carried away, aren't I ?

------
mastazi
Could an admin please fix the title?

------
dis-sys
AMD will make a killing in the market if they can price this 32 core chip
around $1999.

~~~
daveguy
The top end Threadripper 1 16c/32t was priced at $1k. I don't think this is
going to be much more than that. $1499 max is my guess. (Disclaimer: I own AMD
stock so I am optimistic)

~~~
hoistbypetard
That sounds right to me. At the beginning of the year I found a TR 9150 + a
nice motherboard bundled together at a local shop out the door for around $900
before rebates. I'd be a little surprised if this part came in at more than
$1500, and not at all surprised to see it available for around $1200 within a
couple months.

------
xamarinthrw
Somebody told me that cpu prices are like flight prices and your best bet is
to buy a bit old.

What is the best time to buy a CPU? After how many months does the biggest
drop occur?

~~~
a_f
If you are shopping for a CPU that's a bit older than the newest models, make
sure that a motherboard that fits your requirements is also available.
Motherboards tend to get more expensive second-hand once they stop being
manufactured, unlike CPUs.

~~~
xamarinthrw
I will keep this in mind, good info, thanks

