
AMD Discloses Initial Zen 2 Details - throwaway2048
https://fuse.wikichip.org/news/1815/amd-discloses-initial-zen-2-details/
======
beatgammit
This is a pretty bold change and really makes Intel chips less appealing. With
the recent kernel changes for Intel's Spectre issues and this chip coming up,
I think it's a distinct possibility that AMD will take back a ton of
market share in the server space.

If these chips really do have a ~25% performance gain over Ryzen (not counting
IPC gains), I might just upgrade, and I think a lot of other people are in the
same boat.

This could be a fantastic win for AMD. I was worried that Zen 2 was going to
be a moderate improvement, but this seems to be pretty significant. It's been
a while since AMD has been first to market with a big winner...

~~~
astrodust
What's disappointing is that none of the major cloud vendors has made any real
commitments to AMD, at least none that I've seen.

If speculative execution is a problem and you need to give each VM its own exclusive cores, it's great to be able to source parts with 64+ cores on them!

~~~
dopenmoredope
What's even more disappointing is that Zen 2 doesn't include support for newer tech like the upcoming DDR5 and PCIe 5.0, which it seems won't be supported until Zen 4:

[https://www.anandtech.com/show/13578/naples-rome-milan-zen-4...](https://www.anandtech.com/show/13578/naples-rome-milan-zen-4-an-interview-with-amd-cto-mark-papermaster)

IC: AMD has already committed that Milan, the next generation after Rome, will
have the same socket as Rome. Can you make the same commitment with Zen 4 that
was shown on the roadmap slides?

MP: We’re certainly committed to that socket continuity through Milan, and we
haven’t commented beyond that. Obviously at some point the industry
transitions to PCIe 5.0 and DDR5 which will necessitate a socket change.

IC: So one might assume that an intercept might occur with Zen 4?

MP: No comment (!)

~~~
Scaevolus
PCIe 4.0 was only standardized last year, so it's not surprising that Zen 2 won't support 5.0, which is expected to be introduced mid next year.

Zen 2 will be the first CPUs to support 4.0.

~~~
dopenmoredope
The release date for PCIe 5.0 is Q1 2019.

Furthermore...

On June 5th, 2018, the PCI SIG released version 0.7 of the PCIe 5.0
specification to its members.

PLDA announced the availability of their XpressRICH5 PCIe 5.0 Controller IP based on draft 0.7 of the PCIe 5.0 specification on the same day.

[https://www.plda.com/products/xpressrich5](https://www.plda.com/products/xpressrich5)

and...

Historically, the earliest adopters of a new PCIe specification generally
begin designing with the Draft 0.5 as they can confidently build up their
application logic around the new bandwidth definition and often even start
developing for any new protocol features. At the Draft 0.5 stage, however,
there is still a strong likelihood of changes in the actual PCIe protocol
layer implementation, so designers responsible for developing these blocks
internally may be more hesitant to begin work than those using interface IP
from external sources.

From here:
[https://en.wikipedia.org/wiki/PCI_Express](https://en.wikipedia.org/wiki/PCI_Express)

AMD had plenty of time to include PCIe 5.0 even at the Draft 0.7 stage, which is pretty much the same as the final draft/release, but decided not to.

~~~
Tuna-Fish
> AMD had plenty of time to include PCIe 5.0 even at the Draft 0.7 stage, which is pretty much the same as the final draft/release, but decided not to.

... Just how long do you think it takes to make that kind of design change? This requires a major change in silicon. It would have had to be done more than a year ago.

~~~
shaklee3
Not to mention there are no PCIe switches that support it yet. You'd have to wait for Avago/PLX/Broadcom/whatever it's called now to update them.

------
ChuckMcM
"Oh the places you'll go" :-)

Assuming this chip doesn't trip all over itself moving things around, it will be an astonishing amount of compute power in a reasonably sized package. This is, for me, the only reason to work at an internet giant: they will build a tricked-out motherboard with two sockets, up to 8TB of RAM, and say a petabyte of attached non-volatile storage, and field solvers, CFD apps, and EM analysis would just melt away - whether it's designing a rocket engine, folding a protein, or annealing a semiconductor at the quantum level. So often these programs use approximations to allow them to run in finite time, and now more and more of the approximations are being replaced with exact numerical solutions, making the models more and more accurate.

Five years from now when people are throwing out these machines to replace
them with Zen3 or Zen4 machines, I'm going to be super glad to get one and
play with it.

~~~
majewsky
> exact numerical solutions

Isn't that an oxymoron? Numerical solutions always involve some sort of rounding error because of limited floating-point precision, so they cannot be exact.

~~~
ChuckMcM
Reminds me of the mathematician joke, "We know that _one side_ of one house is
painted brown." :-)

You are correct in that most of these systems don't have a closed-form solution that would yield an exact result, and they rely on either an interpolated value, which is inexact depending on the deviation, or a numerical solution, which typically iterates to the nearest value the system can represent in its limited 80- or 128-bit floating-point format.
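To see majewsky's point concretely, here's a throwaway C example of mine - any IEEE-754 double behaves this way:

    #include <stdio.h>

    int main(void) {
        // 0.1 and 0.2 have no exact binary representation, so the sum
        // is the nearest representable double, not the exact 0.3:
        printf("%.17g\n", 0.1 + 0.2);  // prints 0.30000000000000004
        return 0;
    }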

------
djsumdog
I hope AMD has something in their video chip line-up against Nvidia as well. I'm curious whether they knew Nvidia was working on ray tracing and plan to implement the same API, or if they have some other tech up their sleeve that they've been working on with vendors. I hate how Nvidia has such a monopoly on the video chip market. Who knows, maybe Intel will finally get back into the gamer 3D market and we might finally have three options again.

~~~
screye
I hate the CUDA monopoly in machine learning right now. I hope some of the mature libraries (TF, PyTorch) start officially supporting AMD GPUs.

Nvidia is looting customers that want to use GPUs for machine learning in the cloud (like AWS, GCP).

~~~
nabla9
Nvidia is cashing in on their investments in software and integration.

It's AMD's job to make machine learning work on their GPUs. If they don't believe in it and spend the necessary time and effort, nobody else will. The Radeon Open Compute platform (ROCm) has existed for years, but apparently it's not good enough.

ps. TensorFlow has ROCm backend support:
[https://hub.docker.com/r/rocm/tensorflow/](https://hub.docker.com/r/rocm/tensorflow/)
but is the MI25 competitive? [https://www.amd.com/en/products/professional-graphics/instin...](https://www.amd.com/en/products/professional-graphics/instinct-mi25)

~~~
ndesaulniers
> Tensorflow has ROCm backend support.

Why is it a separate repo rather than being contributed upstream to TF?

~~~
singhrac
PyTorch has AMD support in the main repo:
[https://github.com/pytorch/pytorch/tree/master/tools/amd_bui...](https://github.com/pytorch/pytorch/tree/master/tools/amd_build)

------
bluecalm
Damn, with 64 cores/128 threads becoming widely available, a lot of Windows software will have to be updated to use them, because of a "64 bits should be enough for everybody" kind of decision in Windows when implementing affinity. You can't get more than 64 threads in OpenMP when compiling with MinGW; you can get more with Clang, but the implementation wasn't very efficient when I tested it. I suspect the problem is there in most Windows thread pool implementations.
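The root cause is that a Windows affinity mask is a single pointer-sized bitmask, so logical processors beyond 64 land in separate "processor groups", and a thread stays inside one group unless you move it explicitly. A minimal C sketch of the group APIs (Windows 7+; error handling omitted):

    #define _WIN32_WINNT 0x0601   // processor-group APIs need Windows 7+
    #include <windows.h>
    #include <stdio.h>

    int main(void) {
        // Logical processors beyond 64 live in additional groups.
        WORD groups = GetActiveProcessorGroupCount();
        for (WORD g = 0; g < groups; g++)
            printf("group %u: %lu logical processors\n",
                   g, GetActiveProcessorCount(g));

        // By default a thread is confined to the group it started in,
        // even on a 128-thread box; move it to the last group explicitly.
        DWORD n = GetActiveProcessorCount((WORD)(groups - 1));
        GROUP_AFFINITY ga = {0};
        ga.Group = (WORD)(groups - 1);
        ga.Mask  = (n >= 64) ? ~(KAFFINITY)0 : (((KAFFINITY)1 << n) - 1);
        SetThreadGroupAffinity(GetCurrentThread(), &ga, NULL);
        return 0;
    }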

~~~
SlowRobotAhead
Last I saw, MS SQL licensing and others like it have a pricing structure per CPU core...

That’s going to need to be modified, I think!

~~~
adwf
Oracle says "Haha, no."

~~~
astrodust
Oracle says "Looks like our profits will double each time AMD doubles the core
count of their flagship chip."

~~~
walrus01
Larry Ellison needs a bigger yacht, time to re-up your Oracle per core
licensing agreement.

~~~
crazysim
Hah, yacht. He needs a bigger island.

------
mbell
Anyone know why we aren't seeing Intel/AMD go the 'ultra-wide' route that we've seen in ARM processors? (Apple's in particular)

e.g. we've seen Apple's A12 processor expand its ALUs from 4 to 6, along with what seems like a strong focus on cache latency, and these changes seem to be rather beneficial in real code. Why aren't we seeing the same from Intel/AMD? As someone who isn't particularly well informed on the topic, my guess is that AMD/Intel are scared of selling a wider-core but lower-clocked CPU given how much marketing is attached to clock speed, but I imagine there are architectural issues as well.

~~~
phkahler
Instruction set and the number of registers visible to the programmer influence the practical limits on issue width. AMD64 (x86_64) only has 16 general-purpose registers, so there are limits to how many instructions could possibly execute at one time. If I recall correctly, the ARM ISA has 32 registers, so there is potential for a lot more data sitting there ready to do something on any given cycle. There are limits imposed by software as well - lots of real-world program code simply doesn't have opportunities to do many things in parallel.
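To make that last point concrete, a toy C example (mine): the first loop is one serial dependency chain, so extra ALUs sit idle no matter how wide the core is; the second exposes four independent chains a wide machine can actually overlap.

    #include <stddef.h>

    // One long dependency chain: every add waits on the previous one.
    double serial_sum(const double *a, size_t n) {
        double s = 0.0;
        for (size_t i = 0; i < n; i++)
            s += a[i];
        return s;
    }

    // Four independent accumulators: four chains can run in parallel.
    double unrolled_sum(const double *a, size_t n) {
        double s0 = 0, s1 = 0, s2 = 0, s3 = 0;
        size_t i = 0;
        for (; i + 4 <= n; i += 4) {
            s0 += a[i]; s1 += a[i+1]; s2 += a[i+2]; s3 += a[i+3];
        }
        for (; i < n; i++)
            s0 += a[i];
        return (s0 + s1) + (s2 + s3);
    }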

Having said all that, any extra execution units can be used more effectively
with multi-threading. It sounds neat to have twice as many threads as cores,
but on my workloads that's only about a 20 percent performance increase. Going
wider would probably help the second thread quite a bit, but what would be
sacrificed is deep in the details of a given design. I suspect they increase
width so long as it doesn't impact single thread performance.

~~~
monocasa
I've heard from a CPU designer that the CISC nature of x86 lets it punch above its weight in terms of what you're talking about. There are a lot of instructions that don't reference any architectural registers but get allocated physical registers (and would have architectural registers allocated when compiled to something RISC). He claimed it was about equivalent to a 32-register RISC for that reason.

~~~
phkahler
x86 has plenty of instructions that use data from memory as one of the
operands. I'm sure that offsets the limited number of registers somewhat.

In the end it's all deep in the details. What I'd really like to see is a
RISC-V implementation done by a full team at Intel, AMD, or IBM. Even an ARM
implementation by those teams would make a great comparison, but that seems
even less likely ;-)

~~~
monocasa
> x86 has plenty of instructions that use data from memory as one of the
> operands.

And the fact that you can have a full register-width immediate in a single instruction means that you don't have to allocate an architectural register for intermediate immediate construction.
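A small C illustration of both points; the assembly in the comments is what compilers typically emit (my assumption about the codegen, not verified output - x86-64 SysV on top, AArch64 below):

    // x86-64 folds the memory operand and the full-width immediate:
    //     movabs rax, 0x123456789abcdef0   ; 64-bit immediate, one insn
    //     add    rax, qword ptr [rdi]      ; memory operand, no extra reg
    // A typical RISC needs an explicit load plus a multi-instruction
    // constant build (movz/movk on AArch64), each tying up a register:
    //     ldr  x1, [x0]
    //     mov  x2, #0xdef0
    //     movk x2, #0x9abc, lsl #16
    //     ... (two more movk for the top 32 bits, then an add)
    long long add_big_constant(const long long *p) {
        return *p + 0x123456789abcdef0LL;
    }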

------
bhouston
64 cores with 128 threads. Hope that comes to Threadripper. I love AMD for bringing back competition to the CPU market. We've bought a ton of AMD machines in the last year, great bargains in our space.

~~~
s3cur3
Benchmarks on the 32-core TR are... disappointing, to say the least.[1] If
you’re purely compute bound, it can be a win over the 16-core version, but if
memory access is a factor, it’s a wash due to the extra hops to memory. And to
my mind, there are very few pure-compute applications that wouldn’t benefit
more from AVX2 and the like... in which case a cheaper Intel CPU would still
wipe the floor with the 32-core chip.

I’m a huge fan of the Threadripper concept, just wish they hadn’t neutered the
32-core chip compared to its Epyc counterpart.

[1]: [https://www.anandtech.com/show/13124/the-amd-threadripper-29...](https://www.anandtech.com/show/13124/the-amd-threadripper-2990wx-and-2950x-review)

~~~
5436436347
Those (AnandTech) benchmarks were performed on Windows. All Threadripper benchmarks on Linux show that it is nowhere near as awful a performer as on Windows, and most compute workloads do scale okay. I've seen multiple ideas thrown around, like Windows not being NUMA-aware with this processor, or just plain bad core scheduling.

~~~
s3cur3
They did a follow-up changing the scheduling policy for thread 0 (again, still on Windows) and it didn’t make a difference for almost all their workloads: [https://www.anandtech.com/show/13446/the-quiz-on-cpu-0-playi...](https://www.anandtech.com/show/13446/the-quiz-on-cpu-0-playing-scheduler-wars-with-amds-threadripper-2990wx)

~~~
coder543
AnandTech really needs to hire a Linux-focused editor to do some benchmarks
there too, especially for these large systems that are unlikely to be running
Windows anyways.

The Phoronix benchmarks are quite clear;[0] I don't know why you keep linking to AnandTech's Windows benchmarks. I say this as someone who reads tons of AnandTech reviews because they're great, but Windows just doesn't do well with high-core-count hardware at all.

[0]: [https://www.phoronix.com/scan.php?page=article&item=2990wx-l...](https://www.phoronix.com/scan.php?page=article&item=2990wx-linux-windows&num=2)

~~~
0x8BADF00D
This seems like a familiar issue I've run into with workstations I've used in the past running Xeons. Not sure how NTOSKRNL handles scheduling of parallel tasks. I'd venture a guess and say it's hybrid (M:N threading), where multiple userland application threads are mapped to some "virtual processor" in kernel mode. That leads to priority inversion between the userland and kernel-mode threads, which could explain why Windows benchmarks are terrible when dealing with multiple physical cores.

~~~
temac
As far as I know, Win NT threads are 1:1.

I'm not even sure how it would work, or even make any sense, to have N:M handled by the kernel. N:M is usually mainly a userspace thing. And Windows is even less likely to use that kind of convolution, because IIRC it can call back from kernel to userspace (a design I would not recommend, btw, but oh well). You have fibers, of course, but that's a different thing.

Windows probably does not scale simply because the kernel is full of "big" locks (at least not small enough...) everywhere, and it has far fewer fancy structures and algorithms than Linux (is there any equivalent of RCU that is widely used in there? - not sure). Cf. the classic posts by the Chrome developer who every now and then encounters a ridiculous slowdown of his builds on moderately big machines, sometimes because of badly placed mutexes.

~~~
0x8BADF00D
> I'm not even sure how it would work, or even make any sense, to have N:M handled by the kernel. N:M is usually mainly a userspace thing.

Correct - I meant that the benchmarking program itself probably used that kind of implementation, not the Win NT kernel’s implementation of OS threads.

------
zdw
The most interesting thing about this is the speculation about the I/O die being flexible enough to take on other workloads - it's basically the "chipset" of old.

I wonder if they'll license it - with Apple's A12 already on the 7nm TSMC process, building a Xeon-crushing ARM monster for the new Mac Pro by swapping the Ryzen dies for ARM dies seems like a great bit of leverage, assuming Cook and Su could arrange it.

------
caycep
A question from a desktop/casual system builder/potential new MacBook Pro buyer: will these surpass Intel's Core i5/i7 offerings in single-thread performance + power efficiency, given Intel is stuck on 14nm for a while?

Thinking of upgrading my 2014-era stuff due to massive improvements in SSDs, memory, etc., but not sure it's worth dropping so much money on Coffee Lake, or whether to just wait for something better x86 (or ARM...)-wise.

------
piinbinary
Does 1.25x performance at the same power refer to clock speed? Does that mean we can expect 5 GHz in Ryzen 3000?

~~~
m_mueller
Look up Dennard Scaling and how it broke.

~~~
dragontamer
Dennard scaling would have meant a 2x clock rate from an improved node.

1.25x scaling from an improved node is way, way, way worse than the Dennard scaling of the past.

The Pentium III went from 733 MHz with Coppermine (1999) on the 180nm node to 1400 MHz with Tualatin (2001) on the 130nm node. THAT was Dennard scaling.

Today, we "only" get double-digit percentage gains from an improved process node. Dennard scaling was triple-digit gains. Furthermore, most CPU makers focus on the power-saving aspects (which seem to be scaling somewhat well still).
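For reference, the textbook Dennard relations, with shrink factor \kappa > 1 per node (the idealized version - real processes stopped following the voltage scaling long ago):

    L \to L/\kappa, \qquad V \to V/\kappa, \qquad C \to C/\kappa, \qquad f \to \kappa f

    \text{power density: } \frac{C V^2 f}{A} \;\to\; \frac{(C/\kappa)(V/\kappa)^2(\kappa f)}{A/\kappa^2} \;=\; \frac{C V^2 f}{A}

So frequency ideally grows by \kappa per shrink at constant power density; the 733 -> 1400 MHz jump actually beat even the ideal \kappa = 180/130 ~ 1.4, presumably via circuit and design improvements on top of the shrink.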

~~~
m_mueller
If you look at GP's question, (s)he was asking about 5 GHz in Zen 2. Since Epyc 1 is ~3 GHz, a 1.6x increase in clock speed for a ~1.4x smaller process size (AFAIK "7nm" is overselling it compared to 14nm) to me smells like Dennard scaling, and thus cannot be expected anymore (if you disregard tricks like turbo boost, where a bunch of hardware gets disabled so that the rest can be boosted).

~~~
Dylan16807
They're asking about a 1.25x increase on a 1.4x smaller process.

Even if you completely ignore boost clocks, the 1900X has a base clock of
3.8GHz, so interpret it as "4.75GHz base clock on a non-Epyc part" if you
must.

But I don't think you should ignore boost clocks. They're not a trick to make the silicon seem more capable. The silicon really is that capable, and boost clocks are a trick to cap power draw. It's entirely fair to look at the 4.2 GHz boost clock on the 2990WX and conclude that the silicon is capable of 4 GHz under non-exotic conditions.

------
gruez
> On the security side, Zen 2 introduces in-silicon enhanced Spectre mitigations that were originally offered in firmware and software in Zen.

Does Intel have something comparable on their roadmap?

~~~
wmf
[https://www.anandtech.com/show/13450/intels-new-core-and-xeo...](https://www.anandtech.com/show/13450/intels-new-core-and-xeon-w-processors-fixes-for-spectre-meltdown)

------
baybal2
Is there any more concrete evidence for the HBM2-on-package possibility?

~~~
tutanchamun
Do you mean the comment under the article that mentions HBM? If so, I think they're referring to these papers by AMD:

[https://www.computermachines.org/joe/publications/pdfs/hpca2...](https://www.computermachines.org/joe/publications/pdfs/hpca2017_exascale_apu.pdf)

[https://seal.ece.ucsb.edu/sites/seal.ece.ucsb.edu/files/publ...](https://seal.ece.ucsb.edu/sites/seal.ece.ucsb.edu/files/publications/2017-iccad-stow-activepassiveinterposers.pdf)

------
pmoriarty
Is it going to support ECC memory?

~~~
throwaway2048
All AMD processors have supported ECC for a very long time; it's trivial to support, Intel has just decided to gate it as a premium feature.

~~~
tolien
Sort of - they let mainboard vendors decide whether to support it or not, which means it can be a crapshoot. For example, MSI has been known in the past to kill ECC support with a BIOS update; some vendors state that you can use ECC RAM but won't enable any of the error correction (for example, Gigabyte says this in [1]: "non-ECC mode").

Selfishly, I really wish they would make it easier, because I'm in the market for a new personal-use storage machine and I've spent far too long researching all this crap, but it's looking like I'll have much more certainty that it'll all just work if I buy a Xeon E3/E-2000 series, and that's unfortunate.

1:
[http://download.gigabyte.eu/FileList/Manual/mb_manual_ga-h11...](http://download.gigabyte.eu/FileList/Manual/mb_manual_ga-h110m-s2h\(gsm\)\(ddr3\)_e.pdf)

~~~
bpye
ASRock has been good in my experience at enabling all features on their boards. Back when Intel's VT-d support depended on your board, they reliably had support, and I believe they support ECC on all their new AMD boards.

~~~
ploek
I have ECC working on an AB350M Pro4. See also [https://www.hardwarecanucks.com/forum/hardware-canucks-revie...](https://www.hardwarecanucks.com/forum/hardware-canucks-reviews/75030-ecc-memory-amds-ryzen-deep-dive.html). The 'edac_mce_amd' module needs to be loaded on Linux.
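If you want to check that corrections are actually being counted once that module is loaded, the EDAC subsystem exposes counters in sysfs. A minimal C sketch reading the standard path (adjust mc0 to your memory controller):

    #include <stdio.h>

    int main(void) {
        // ce_count = corrected (single-bit) errors since boot; there is
        // a matching ue_count file for uncorrected errors.
        FILE *f = fopen("/sys/devices/system/edac/mc/mc0/ce_count", "r");
        if (!f) { perror("EDAC not available"); return 1; }
        unsigned long ce;
        if (fscanf(f, "%lu", &ce) == 1)
            printf("corrected ECC errors: %lu\n", ce);
        fclose(f);
        return 0;
    }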

~~~
Already__Taken
I have one too, seems to be trucking along fine.

------
rawoke083600
Looks good. Side note: would love to see some ML benchmarks in future CPU comparison articles.

------
_emacsomancer_
Given that AMD has introduced their own version of Intel's IME, there's very
little incentive for me to consider their CPUs. At least there are workarounds
for some of Intel's CPUs.

~~~
MegaDeKay
You can disable the PSP in newer BIOSes. I can do it on my X370 Taichi. The
link below is old but accurate. Go AMD and don't look back.

[https://www.phoronix.com/scan.php?page=news_item&px=AMD-PSP-...](https://www.phoronix.com/scan.php?page=news_item&px=AMD-PSP-Disable-Option)

~~~
aseipp
That doesn't really change much in all honesty; it just disables support for
things like the fTPM, secure sleep states, and some communication mailbox
primitives (that allow things like offloading encryption to the PSP
coprocessor, through the Linux crypto API subsystem -- this is all supported
in upstream Linux.)

The PSP is still essential to the boot process and probably many other things (power management, etc.), and you aren't going to just magically turn it off with a UEFI option. If your concern is that the PSP is a black-box covert channel, this probably changes almost nothing.

Both AMD and Intel are functionally equivalent here, as far as I'm concerned.

(My ASRock X399 board also has this UEFI option and specifically calls out
that it only disables a few key features.)

~~~
coder543
Sources are conflicting, but the general consensus I've seen is that the Intel
ME has direct access to your networking hardware, while the AMD PSP does not,
and basically just exposes interfaces to the CPU.

If accurate, this is a significant functional difference as far as I'm
concerned.

But, what's the old saying again? "The only truly secure system is one that is
powered off, cast in a block of concrete and sealed in a lead-lined room with
armed guards - and even then I have my doubts."

~~~
lbbe
And PSP also taking part in the memory-training process during the boot AFAIK.
So you can't disable it completely.

------
Const-me
> meaning 256-bit AVX operations no longer need to be cracked into two 128-bit
> micro-ops per instruction

The instruction set stayed the same. And in the current instruction set, a lot of these AVX instructions still operate on 128-bit lanes. Instructions like vpshufd, vshufps, and vpblendw only shuffle/blend/permute within 128-bit lanes, as do their AVX-512 equivalents.
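Easy to see with intrinsics - a quick C demo of mine (compile with -mavx2), where vpshufd reverses elements within each 128-bit lane but nothing ever crosses the lane boundary:

    #include <immintrin.h>
    #include <stdio.h>

    int main(void) {
        __m256i v = _mm256_setr_epi32(0, 1, 2, 3, 4, 5, 6, 7);
        // vpshufd applies the same 4-element pattern to each 128-bit
        // lane independently; element 4 can never reach the low lane.
        __m256i r = _mm256_shuffle_epi32(v, _MM_SHUFFLE(0, 1, 2, 3));
        int out[8];
        _mm256_storeu_si256((__m256i *)out, r);
        for (int i = 0; i < 8; i++)
            printf("%d ", out[i]);    // prints: 3 2 1 0 7 6 5 4
        printf("\n");
        return 0;
    }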

~~~
Tuna-Fish
Yes, but that was not what the article was referring to. Zen 1 CPUs only have 128-bit FPU lanes, and execute wider instructions by splitting them into two in the frontend.

