
AMD Ryzen 3rd Gen 'Matisse' Coming Mid 2019: Eight Core Zen 2 with PCIe 4.0 - MikusR
https://www.anandtech.com/show/13829/amd-ryzen-3rd-generation-zen-2-pcie-4-eight-core
======
oliveshell
Wow, performance competitive with i9-9900K at nearly half the power draw would
be huge if it turns out to be true. Can’t wait for benchmarks.

What an architecture Zen is.

~~~
kissiel
130W on AMD vs 180W Intel. Nearly 75%.

Also, I'm worried that they're at "engineering sample, glad it didn't crash"
stage. Optimistically this will hit shelves in Q3, and by that point we may
see a Sunny Cove-based 10nm CPU (whole chip, not only the "chiplet") from the
blue team.

I think AMD's success will come down to pricing...

~~~
nonbel
You are comparing power for the entire system. I think they subtracted 55W of
idle power consumption from both to get only the cpu, then compare: 75/125 =
0.6.
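
A quick sketch of that arithmetic, assuming the 130W and 180W figures are
whole-system draw under load and that the 55W idle baseline is the same for
both machines (the variable names are only for illustration):

```python
# Whole-system power under load, as quoted in the thread (watts).
amd_system_w = 130
intel_system_w = 180

# Assumed idle draw of the rest of the system, taken to be equal for both.
idle_w = 55

# Ratio of whole-system power: this is where "nearly 75%" comes from.
system_ratio = amd_system_w / intel_system_w

# Ratio after subtracting idle draw, as a rough proxy for CPU-only power.
cpu_ratio = (amd_system_w - idle_w) / (intel_system_w - idle_w)

print(f"system: {system_ratio:.2f}, cpu only: {cpu_ratio:.2f}")
```

Any idle power that actually belongs to the CPU would shift the second ratio
a little, so treat it as an estimate rather than a measurement.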

~~~
cm2187
But isn’t part of idle consumption due to the cpu?

~~~
mdorazio
Not very much. For example i9 uses under 13 watts at idle [1].

[1] [https://www.tomshardware.com/reviews/intel-core-i9-7900x-sky...](https://www.tomshardware.com/reviews/intel-core-i9-7900x-skylake-x,5092-10.html)

------
TwoNineA
Interesting, according to Anandtech, there might be space for something else
on the package:

[https://images.anandtech.com/doci/13829/cpu44.jpg](https://images.anandtech.com/doci/13829/cpu44.jpg)

Maybe a second Zen2 die?

~~~
DCKing
There were some rumors before this announcement that AMD would announce SKUs
with 1 CPU die (6-8 cores activated) + 1 GPU die or with 2 CPU dies (8-16
cores activated). But AMD didn't announce any SKUs - seems they're a bit
behind on schedule.

~~~
okl
Also, if they reveal too much, prospective buyers might be incentivized to
delay their purchase. As stupid as that might sound, it is a thing.

~~~
mbroncano
It even has a name: Osborne effect

[https://en.m.wikipedia.org/wiki/Osborne_effect](https://en.m.wikipedia.org/wiki/Osborne_effect)

------
berbec
What I'm salivating for is an AMD 6c/12t mobile chip. With this power draw on
desktop, chop the thermals and boost clock a bit and you have something
truly frightening on the battery life & burstable performance scale.

~~~
ben-schaaf
Considering last generation's APUs used the same "chiplets" as the desktop
processors, I'd wager the next generation of APUs will have a single 8c16t
zen2 die.

~~~
berbec
Ooh.. A 3500G, 6c with Vega 15 (or whatever) would be nice!

------
ineedasername
AMD seems to be making a push for the server market at the moment too. It
looks like a good time for it as Intel struggles to get production on 10nm
while AMD is already at 7nm for some of their range. But Intel has an
incumbency advantage. I'm no expert on the server market though. Could someone
with a bit more knowledge chime in on how big a threat this really is?

~~~
dunpeal
I don't see how this "incumbency advantage" would help Intel, exactly. The
server market is highly competitive. If AMD puts out a superior product at an
attractive price, nobody is going to say "but we've used Intel for a decade,
how can we suddenly switch to AMD?!".

~~~
hitpointdrew
>nobody is going to say "but we've used Intel for a decade, how can we
suddenly switch to AMD?!".

Maybe not those exact words, but people will say "We are adding a server to
our VMware ESXi cluster, so if we want it to be compatible with our existing
hardware we better stick with Intel."

VMware's high availability feature (a highly desired feature for any data
center) won't work across different CPU architectures, so unless you are
replacing your entire stack and not just adding server(s) then you have to
stick with the architecture you already have in place.

~~~
neuromancer2701
These aren't "different CPU architectures". They would all be x86-64 so unless
VMware uses some specific Intel feature it shouldn't be a problem.

~~~
hitpointdrew
I have been a Sys Admin for 7 years. It is a problem, vmotion won't work going
from AMD to Intel or vice versa (at least not in any official way). In fact it
is highly suggested when building ESXi stacks that you get the exact same
model CPU for each server.

[https://communities.vmware.com/thread/305792](https://communities.vmware.com/thread/305792)
[https://www.v-front.de/2013/04/how-to-vmotion-from-intel-to-...](https://www.v-front.de/2013/04/how-to-vmotion-from-intel-to-amd-and.html)

------
bluedino
>> Cinebench is an idealized situation for AMD

I wouldn't say it's especially suited for it, but AMD generally does well
because of the large number of cores/threads.

On the other hand, with something like 7-Zip, AMD has had a huge advantage
compared to Intel, even going back to the Bulldozer days. What is it about
that code that runs so well on AMD?

~~~
entropicdrifter
Compute (rather than I/O) heavy and highly multithreaded stuff skews towards
AMD. Most applications benefit more from Intel's caching and single-core turbo
boost features than they do from AMD's more discrete style of multithreading
and weaker single-core mode tech.

~~~
zozbot123
> Compute (rather than I/O) heavy and highly multithreaded stuff skews towards
> AMD.

This. It's worth noting that newer, legacy-free programming languages make it
a lot more feasible to parallelize compute-heavy parts of the code, and to
deal with the "Memory wall" (i.e. the rise of memory-bandwidth as a bottleneck
especially in high-CPU-frequency, high-core-count systems) by working with
efficient, low-level representations of data. This will help not just AMD, but
also newer entrants in this market segment such as ARM vendors.

~~~
wbl
Do the rooflines of these chips show that pattern? Intel has a big vector unit
with some seriously useful instructions in it, pushing the arithmetic speed
up.

------
bratao
"Identical Performance to the Core i9-9900K, At Just Over Half The Power" This
looks like trouble for Intel!

~~~
jandrese
Yeah, but that's comparing a mid 2019 chip to a chip you can buy today. Intel
will probably make some announcement about their chip that's faster than this
one right about the time this hits the shelves. It's the neverending rat race
for chipmakers.

~~~
qball
I'm not so sure about that.

The Intel playbook has, for the last 8 years, been either "same architecture,
same clock speed, less power draw" (remember, Sandy Bridge [2011] and
Broadwell [2014] perform within 10% of each other given the same clock speed-
a consequence of chasing thin-and-light laptops and tablets) or "same
architecture, higher clock speed, more power draw" after Zen hit store
shelves. Instructions per clock haven't changed significantly in many years;
Intel has just slowly gotten better at hitting higher clocks because smaller
transistors pull less power.

Overclockers have known for years that 5GHz is about the maximum that Intel's
architectures can do under standard conditions (i.e. not liquid nitrogen), and
the bottleneck there has primarily been due to heat generation and not
processor instability.

It hasn't escaped Intel that these kinds of clocks were easily possible- they
had special CPUs sold at those maximum speeds for high-frequency trading
applications (not sold to the general public), and that fact was being kept in
reserve as a strategic advantage against potential competition.

But that advantage has already been used up (not that AMD has one at the
moment either; Zen and Zen+ have practically zero overclocking headroom on
their fastest examples). Intel will need a new architecture to compete, and
that's still likely at least a year down the road- and unlike the last time
this happened (the Pentium 4 was supposed to reach 5GHz, but capped out at 3.8
for heat and power draw reasons) they don't have another architecture waiting
in the wings to save them.

~~~
kllrnohj
> the Pentium 4 was supposed to reach 5GHz

Somewhat minor nit but it was actually supposed to reach 10GHz:
[https://www.anandtech.com/show/680/6](https://www.anandtech.com/show/680/6)

"Realistically speaking, we should be able to see NetBurst based processors
reach somewhere between 8 – 10GHz in the next five years before the
architecture is replaced yet again. Reaching 2GHz isn’t much of a milestone,
however reaching 8 – 10GHz begins to make things much more exciting than they
are today. Obviously this 8 – 10GHz clock range would be based on Intel’s
0.07-micron process that is forecasted to debut in 2005. These processors will
run at less than 1 volt, 0.85v being the current estimate."

And overclockers did actually push that chip all the way to 8GHz with extreme
measures.

Core 2 then hit the reset button and clock speeds dropped substantially (from
3.8GHz to 3.0GHz) in exchange for huge boosts to IPC. Intel didn't exceed the
P4's 3.8GHz clocks until nearly 10 years later with the 4.0GHz i7-4790K (AMD
was actually the first to 4GHz with the AMD FX-4170).

------
m0zg
Also, natively wider SIMD (256 vs 128 bit), though still no AVX512 IIRC. I'm
waiting on third gen Threadrippers in particular. Now that will be one heck of
a chip for a quad GPU deep learning workstation.

~~~
hajile
Intel does AVX512 at 30+% slower clock speeds. AMD should be able to do an
AVX512 operation at full clock speed in double the cycles.

For equivalent architectures, you'd expect Intel to be faster at workloads
that are only avx512. They'd be slower at mixed workloads (the avx unit slows
down, but so do all the other ALUs executing in parallel along with the SMT
thread).

Most importantly, going to 512 bits means another 256 bits' worth of hardware,
which is a big addition to core size and power usage for what is a very fringe
workload. Saving that while getting better performance in common workloads
seems like a great tradeoff.
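
To put rough numbers on that, here is a toy model, not a benchmark. It
assumes the wide core executes a 512-bit op in one cycle but runs the whole
workload at a 30% lower clock, while the narrow core keeps full clock and
spends two cycles per 512-bit op. The function name and the `avx_fraction`
parameter are made up for illustration, and everything else about the two
cores is assumed equal.

```python
def run_times(avx_fraction, clock_penalty=0.30):
    """Return (wide, narrow) run times for one unit of work.

    avx_fraction is the share of the work that is 512-bit ops; 1.0 on
    either side means "all non-AVX work at full clock".
    """
    # Wide core: one cycle per 512-bit op, but every cycle is slower,
    # including the non-AVX work running alongside it.
    wide = 1.0 / (1.0 - clock_penalty)
    # Narrow core: full clock, but each 512-bit op costs two cycles.
    narrow = (1.0 - avx_fraction) + 2.0 * avx_fraction
    return wide, narrow


for f in (0.1, 0.5, 1.0):
    wide, narrow = run_times(f)
    print(f"avx512 share {f:.0%}: wide {wide:.2f}, narrow {narrow:.2f}")
```

Under these assumptions the break-even sits at an AVX512 share of
1/0.7 - 1, or roughly 43%: below that the full-clock design comes out ahead,
above it the native 512-bit design does.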

~~~
m0zg
I'm aware. But it's not just about clock speed per se. AVX512 has some _very_
specialized instructions that are specifically designed to speed up matrix-
matrix and matrix-vector multiply. Even at a significantly slower clock those
instructions improve things by a lot if you do linear algebra. Most of linalg
is done on GPUs nowadays, but not all of it.

~~~
bitL
As a consequence, TR/Ryzen is very slow in "classical" ML in Python (I can
tell), because MKL/AVX2 seems to be way slower. This could be fixed with Zen
2; AVX512 will then likely be at 2/3 of Intel's performance without slowing
down the rest of the system during computations.

~~~
m0zg
I agree with this assessment. Frankly, GPUs are so much faster for dense
linalg than even the highest end Intel (or AMD) chips that anything that can
be done on GPUs should be done there. But that requires one to know how to
work with them, if the primitives don't already exist, and most people are
oblivious to the finer points.

------
nicoburns
> This suggests that AMD’s new processors with the same amount of cores are
> offering performance parity in select benchmarks to Intel’s highest
> performing mainstream processor, while consuming a lot less power. Almost
> half as much power.
>
> How has AMD done this? IPC or Frequency?

Am I missing something? Wouldn't this be expected given that this new AMD
processor is running on 7nm while Intel hasn't really launched 10nm
processors yet. AMD is a process generation ahead...

~~~
stoobs
The processes are measured differently, so 7nm isn't physically 3nm smaller
than 10nm in this case.

I believe they said in one of the articles that more details on how they
achieved the power saving will be forthcoming nearer the launch date.

------
reacharavindh
Question from a curious sysadmin: How are these AMD GPUs for machine learning
and math related stuff?

I have only heard of Nvidia CUDA in this context. Does anybody do ML work on
AMD GPUs?

Do the popular libraries like PyTorch, TensorFlow, etc. support both CUDA and
AMD's equivalent?

~~~
microcolonel
> _Question from a curious sysadmin: How are these AMD GPUs for machine
> learning and math related stuff?_

They're just fine, though I think a tuned implementation on either tends to
show NVIDIA ahead.

> _I have only heard of Nvidia CUDA in this context. Does anybody do ML work
> on AMD GPUs?_

NVIDIA is still king here, but if AMD continues to develop their Radeon
Instinct stuff, it could be competitive for some applications; it then becomes
a question of whether or not there will be enough people who know something
other than CUDA (which is not currently supported on AMD GPUs, though work is
under way to make a compatible runtime).

~~~
celrod
HIP (part of the ROCm stack) is fairly similar to CUDA. They also have
goodies like tensorflow-rocm, available through pip or a docker container.

That stack is also open source. And writing in it is supposed to keep you
compatible with NVIDIA.

Wanting to support them -- and hoping they become more competitive in ML --
are why I bought Vega GPUs.

Although, I haven't found much time to actually try and get any of my software
running on a GPU, let alone optimized. As a huge fan of avx512, I would like
to try -- maybe I can get even better performance on a graphics card.

But with AMD, the lack of support for the uninitiated is apparent. There's not
much in the way of existing software and resources like online tutorials. I'd
like guides that cover optimizing kernels, organizing wavefronts, and memory
movement. Maybe I haven't looked hard enough.

It's especially painful in Julia, where almost half a year after 1.0, the only
supported way to use GPUs in any library is with CUDANative. The OpenCL
backend of the "cross platform" GPUArrays.jl still hasn't been updated. That
means all my coding will be done in HIP. Which is fine.

------
vkaku
Meanwhile ... their Radeon VII is still on PCIe 3

I'd rather they release a better chipset with at least 2 PCIe 3s than come up
with these fast I/O paths not even commercially usable today. Such agony.

~~~
bryanlarsen
Really? The mi60 uses the exact same chip as the Radeon VII, and it supports
pcie4. They will gimp the Radeon VII in some places so they don't completely
kill mi60 sales, so dropping pcie4 makes sense in isolation but is completely
whack in the bigger picture. Doesn't the right hand talk to the left hand at
AMD? Hopefully there's time to reverse this crazy decision.

------
erdewit
Does anyone know why the IO-die is so large? Does it contain L3 cache or
something?

~~~
wmf
The Zeppelin (Ryzen 1xxx) die is 212 mm2 and its CCXs are 88 mm2 leaving 124
mm2 for the uncore. Now we see that the Matisse (Ryzen 3xxx) IO die is 122 mm2
— virtually the same size on the same process for the same functionality. (I
don't see how L4 cache would fit in there BTW.)

As to why it's so large, I guess connecting cores, memory, and PCIe at
extremely high speed just requires a lot of transistors. Intel's uncore seems
to be far smaller; I'm not sure how or why.

~~~
pmarcelll
The Ryzen chips offer more PCIe lanes and ECC memory support. These chips also
use the Infinity Fabric high-speed interconnect, which is not needed by
Intel's single-chip designs. The two CCX units in Zeppelin also need to talk
to each other, so probably all this complexity just adds up.

------
sigi45
I was looking to buy a 9700K and hadn't considered AMD for probably 8 years.

Even with Nvidia and AMD, AMD came out and then I bought a 1060.

I might just wait this time for AMD.

~~~
reiichiroh
A Geforce GTX 1060 GPU?

~~~
nolok
That does seem rather weak given the CPU, I would expect at least a 1070. But
then again prices were inflated by at least a third for a good while...

------
yazr
What's the cache-coherency throughput of AMD vs Intel?

Isn't this a major, major issue with >=8 cores ?!

EDIT: yes, of course, lots of server code has very low memory contention.

------
gigatexal
I’m hoping to get a steal on the 2xxx line. These 3xxx chips, if they bench
well, should be awesome. Can’t wait to see reviews.

------
grecy
I assume PCIe 4.0 will support the next-generation of Thunderbolt?

~~~
Dylan16807
I'm not sure what you mean. It should be straightforward to connect a
thunderbolt bridge to any version of PCIe.

If you specifically mean "without going over 4 lanes", then that's probably
true.

~~~
grecy
Currently devices have Thunderbolt 3 (up to 40 Gbps), I'm wondering if PCIe
4.0 will increase that. I assume it would be called Thunderbolt 4, and I
assume it will be faster.

~~~
Dylan16807
Increase it how? That's like asking if faster ethernet is going to increase
thunderbolt speed. No, it's a totally different way of sending data over
wires.

A theoretical Thunderbolt 4 that's twice as fast could easily be fed with a
PCIe 3 connection. It could even be fed with a PCIe 2 connection!
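
As a back-of-the-envelope check, here is a sketch that compares raw link
rates only, using the published per-lane signalling rates and line encodings
for each PCIe generation. The 80 Gbps target is a hypothetical doubling of
Thunderbolt 3, not an announced spec, and the helper names are invented for
the example.

```python
import math

# Per-lane signalling rate (GT/s) and line-encoding efficiency.
PCIE_GENERATIONS = {
    2: (5.0, 8 / 10),      # 8b/10b encoding
    3: (8.0, 128 / 130),   # 128b/130b encoding
    4: (16.0, 128 / 130),  # 128b/130b encoding
}


def lane_gbps(gen):
    """Bandwidth of one lane in one direction, in Gbit/s, before protocol overhead."""
    rate, efficiency = PCIE_GENERATIONS[gen]
    return rate * efficiency


def lanes_needed(target_gbps, gen):
    """Smallest whole number of lanes whose combined bandwidth covers target_gbps."""
    return math.ceil(target_gbps / lane_gbps(gen))


for target in (40, 80):  # Thunderbolt 3 today, and a hypothetical doubling
    for gen in PCIE_GENERATIONS:
        print(f"{target} Gbps over PCIe {gen}.0: {lanes_needed(target, gen)} lanes")
```

The width of the link matters as much as the generation: an older PCIe
version can carry the same traffic if it is given more lanes.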

~~~
grecy
Huh, thanks.

I had always thought Thunderbolt was essentially "external PCIe".

------
sitkack
This is going to _own_ cloud. Costs are linear with power.

~~~
stoobs
Even the previous generations of Epyc are surprisingly cheap compared to Intel
- Azure has an L8s_v2 with 8 cores, 64 GB RAM and decent storage for under
£100/month, which is less than half the price of anything else. Obviously,
core speed may not suit every application, but it's pretty eye-opening!

------
throwaway2021
I'll never buy AMD again, I had nothing but problems with Raven Ridge on
Linux:

[https://bugzilla.kernel.org/show_bug.cgi?id=196683](https://bugzilla.kernel.org/show_bug.cgi?id=196683)

~~~
jtl999
Yeah. I'm concerned about similar issues too.

~~~
throwaway2021
No need to be concerned, it's working fine with Arch Linux (Linux 4.20, mesa
18.3).

It's still a problem with Ubuntu 18.04.1 (kernel 4.15), but Ubuntu 18.04.2
will hopefully be more stable with kernel 4.18.

------
vbezhenar
An unreleased AMD CPU performs a bit worse than an already released Intel CPU.
They are still behind. But it's very intriguing. If they manage to push 5+
GHz, it would be an awesome CPU.

~~~
rukittenme
You mean slightly better. The AMD CPU outperformed the 9900K. With the same
core count. Which means AMD was likely turboing to 5GHz during the demo.

~~~
sp332
The article body shows the AMD chip slightly behind the i9. The chart just
after it shows two different figures - "pre-brief" it's behind, "on stage"
it's ahead. Not a huge margin in any case.

~~~
dralley
Power consumption was significantly better. I'd say they're still ahead.

