
Zen 2 Missives – AMD now delivering efficiencies that are double that of Intel - olvy0
http://apollo.backplane.com/2019-Zen2Missive.html
======
guardiangod
I am surprised the article doesn't talk about Intel CPU performance
degradation due to speculative timing attack workarounds. The bugs are more
severe on Intel CPUs and as such, Intel CPUs took a much bigger performance
hit.
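
(On Linux the kernel reports per-vulnerability mitigation status under
sysfs, so you can see exactly what your CPU is paying for. A minimal
sketch, assuming a kernel new enough to expose that interface:)

```python
import os

VULN_DIR = "/sys/devices/system/cpu/vulnerabilities"

def read_mitigations(path=VULN_DIR):
    """Return {vulnerability: kernel status string}; empty if unsupported."""
    status = {}
    if os.path.isdir(path):
        for name in sorted(os.listdir(path)):
            with open(os.path.join(path, name)) as f:
                status[name] = f.read().strip()
    return status

for vuln, state in read_mitigations().items():
    print(f"{vuln:24s} {state}")
```

Entries like "Mitigation: ..." versus "Not affected" make the
Intel/AMD difference visible on the same kernel version.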

~~~
wilhil
Hear, hear...

This is a hidden thing that seems to always go under the radar.

My Intel i7 4790K has lost at least 40% of its speed since I originally
bought it. For years I thought Intel machines were getting slower without
any actual proof, whereas an AMD machine I returned to after years away
felt as fast as ever.

Now with microcode updates and similar getting more coverage, I'm certain that
this is what it is.

AMD's partner program is meh, whereas Intel is really partner focused - if
anything breaks, I can get a replacement shipped to me in advance the next
day for ~3-5 years (component depending). But I am seeing much more demand
for AMD as of late.

~~~
sundvor
Yeah, they silently killed my overclock 9 months ago by limiting the 6850K
to a 38x multiplier due to "Spectre Mitigation". It's the worst BS ever...

If I remove Intel's microcode update DLL from my Windows system folder then
I get my 41x back, but there's been no documentation forthcoming. Just
finding out why speeds were nerfed required lots of research.

Having said that, I can see that Asus may have _finally_ released the
required motherboard update for this. Updating requires me to reset
everything by hand, as Asus doesn't let me simply back up settings - they
are invalidated between BIOS versions. Between that and eating CPUs when
using XMP, it's the worst motherboard I ever bought.

[https://www.asus.com/us/Motherboards/ROG-STRIX-X99-GAMING/HelpDesk_BIOS/](https://www.asus.com/us/Motherboards/ROG-STRIX-X99-GAMING/HelpDesk_BIOS/)

Needless to say my days of building Intel systems are over.

~~~
op00to
Eating CPUs?

~~~
sundvor
The automatic XMP settings upped one of the more obscure voltage lines (new
to me since my last CPU, the 2600K, which still runs perfectly at 4.4GHz as
my son's PC) to dangerously high levels. I received two CPU replacements
under warranty before I found the particular setting that needed to be put
under manual control. I think I was a third of the way to the third CPU
dying.

This was incredibly sloppy work by Asus.

So yeah, eating CPUs.

~~~
sundvor
I see I got a few upvotes, so this must resonate here - thanks, wanted to back
this up with specific details:

The CPU was Broadwell-E. VCCSA ran at ~1.33v, when the maximum should have
been 1.25v. VCCIO was also at 1.25v, when it'd run happily at 1.096v.

The lesson was to go through every single voltage setting and research safe
ranges, and not trust auto for _anything_. There were changes in the CPU memory
controllers around this time, and that's what caused the CPU to die when
unsafe voltages were applied; Asus' XMP settings for my 4x8=32GB 3200 Trident
Z DDR4s (17-18-18-36) turned them to mush.
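
The sanity check described here boils down to comparing each rail against a
researched maximum. A toy sketch (the VCCSA 1.25v ceiling and the readings
come from this comment; the VCCIO ceiling is a placeholder, not a
researched value):

```python
# Toy voltage-rail sanity check, in volts. SAFE_MAX values are
# illustrative: VCCSA's ceiling is as stated above, VCCIO's is a guess.
SAFE_MAX = {"VCCSA": 1.25, "VCCIO": 1.25}

def over_limit(observed, safe_max):
    """Return the rails whose observed voltage exceeds the safe maximum."""
    return {rail: volts for rail, volts in observed.items()
            if volts > safe_max.get(rail, float("inf"))}

# Readings reported above: VCCSA ~1.33v and VCCIO 1.25v on auto/XMP.
print(over_limit({"VCCSA": 1.33, "VCCIO": 1.25}, SAFE_MAX))
# -> {'VCCSA': 1.33}
```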

I don't have a degree in electrical engineering; it seems like Asus wanted you
to have one.

Looking back I'm discovering more details... and another guy who experienced
essentially the same, with others backing up the instability of the platform:
[https://forums.anandtech.com/threads/my-6850x-just-died-just-like-that.2500188/](https://forums.anandtech.com/threads/my-6850x-just-died-just-like-that.2500188/)

Whilst I want to move over to AMD, I'd like my current 6850k to last longer as
my son's computer. It seems like I won the silicon lottery with my old 2600k
that he's currently using though; running fixed 4.4ghz which is a very nice
upgrade from the stock 3.4ghz.

The original article in this thread is music to my ears - for AM4, just add
a great cooler and away we go. It's tempting to upgrade both computers at
the same time... which kind of wrecks my hand-me-down thing, but I'm sure
the 6850K got degraded by 3 weeks on the bad settings, as I've had lots of
issues with it (unlike the 2600K).

(Final edit: thus my disdain for Asus' inability to back up BIOS settings
on this board when performing BIOS updates, as I now must do to get my
overclock back, unlike other boards I've had; if I miss even a single one
of the voltage settings, I can expect it to fry the CPU - unless they
actually fixed their sh*t in one of the BIOS updates.)

~~~
Grimm665
My Strix X99 just kicked the bucket a week ago...is this possibly why? Shit,
I've had it on XMP since I built it 3 years ago.

I've already gone through one CPU, about 4 months after I built it. Now the
whole machine is just dead with no signs of life. I bought a new board
assuming the board was bad but maybe I need to test the CPU now...

Agreed though, one of the worst boards I've ever bought, and I'm seriously
rethinking all Asus purchases in the future.

------
chx
Chiplets are what give AMD an absolutely brutal advantage. Their high end
chips do not need expensive large dies -- just a few small ones. Yields are
much better. And they can bin each chiplet separately. Oh and they don't spend
the expensive top notch process on the I/O part of the CPU either. Intel might
be hard pressed to catch up to this -- sure the 7nm EUV process in two years
and a bit will very likely be a serious jump in IPC but if you are comparing
similarly priced server CPUs then even that is very likely to be simply not
enough due to this chiplet strategy. For the foreseeable future, inertia alone
is the only reason for anyone to buy an Intel server chip.
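
The yield argument is easy to sketch with the classic Poisson
defect-density model (the defect density and die areas below are
illustrative, not real foundry numbers):

```python
import math

def poisson_yield(die_area_mm2, defects_per_mm2):
    """Fraction of dies with zero defects under a Poisson defect model."""
    return math.exp(-die_area_mm2 * defects_per_mm2)

d0 = 0.002  # hypothetical defect density, defects per mm^2
big = poisson_yield(700, d0)    # one large monolithic die
small = poisson_yield(75, d0)   # one small chiplet

print(f"700 mm^2 monolithic die yield: {big:.1%}")
print(f"75 mm^2 chiplet yield:         {small:.1%}")
# Even the probability of 8 independently-good chiplets beats the big die
# here, and in practice a bad chiplet only costs 75 mm^2, not the whole
# part - bad chiplets are discarded or binned down individually.
print(f"8 good chiplets:               {small**8:.1%}")
```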

~~~
jorvi
> Intel might be hard pressed to catch up to this

We were constantly saying this about AMD until Zen, and Zen is largely
credited to Jim Keller. Where did Jim Keller go after Zen? Intel. I'm lowkey
afraid that AMD will run out of steam after one or two gens and Intel+Keller
will have just finished developing an insane architecture that brings us right
back to the pre-Zen era.

~~~
karpodiem
People who know these things have said that Zen was principally designed by a
different individual, not Keller. I'm trying to find the link/who that person
is.

~~~
bri3d
Mike Clark. I have no inside information but the rumor I have read is that
Keller was mostly responsible for the cancelled AMD ARM effort while Clark did
Zen. Clark is explicitly credited as the architect/principal designer for Zen
2.

------
neilmovva
The author's comments on cache sizes are a bit reductive. Not all "L3" is
created equal, and designers always make tradeoffs between capacity and
latency.

In particular, the EPYC processors achieve such high cache capacities by
splitting L3 into slices across multiple silicon dies, and accessing non-local
L3 incurs huge interconnect latency - 132ns on latest EPYC vs 37ns on current
Xeon [1]. Even DDR4 on Intel (90ns) is faster than much of an EPYC chip's L3
cache.

Intel's monolithic die strategy keeps worst case latency low, but increases
costs significantly and totally precludes caches in the hundreds of MB.
Depending on workload, that may or may not be the right choice.

[1] [https://www.anandtech.com/show/14694/amd-rome-epyc-2nd-gen/7](https://www.anandtech.com/show/14694/amd-rome-epyc-2nd-gen/7)
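
For a rough sense of the tradeoff, the effective latency is just a
hit-fraction-weighted average. In this sketch only the 132ns remote figure
comes from the AnandTech link; the ~40ns local figure and the 85/15 split
are made-up illustrations:

```python
def avg_latency_ns(mix):
    """Expected latency for an access mix given as {latency_ns: fraction}."""
    assert abs(sum(mix.values()) - 1.0) < 1e-9  # fractions must sum to 1
    return sum(lat * frac for lat, frac in mix.items())

# Hypothetical: 85% of L3 hits land in a ~40ns local slice, 15% go remote
# at the 132ns AnandTech measured. The average lands well under DDR4's 90ns.
print(avg_latency_ns({40: 0.85, 132: 0.15}))
```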

~~~
pjc50
Is that really correct? That's _huge_ latency for something that's in the same
package. You can buy discrete SRAM with 70ns latency.

~~~
mort96
OP said only non-local L3 is 132ns. Local L3 (i.e. L3 close to the core) is way
faster, and the core would usually use local L3 cache.

~~~
pjc50
Oh I see - a tiny NUMA system within the package.

~~~
DiabloD3
Kind of.

In general, all Zen generations share two characteristics: cores are bound
into 4-core clusters called CCXes, and two of those are bound into a group
called a CCD. Both chips (Zen 1 and 1+) and chiplets (Zen 2) have only ever
had one CCD per chip(-let), and 1, 2, or 4 chip(-lets) have been put on a
socket.

In Zen 1 and 1+, each chip had a micro IO die, which contained the L3,
making a quasi-NUMA system. Example: a dual-processor Epyc of that
generation would have one of 8 memory controllers reply to a fetch/write
request (whoever had it closest: either somebody already had it in L3, or
somebody owned that memory channel).

L3 latency on such systems should be quoted as an average _or_ as a best
case/worst case. Stating L3 as worst case only ignores cache optimizations:
prefetchers grab from non-local L3, and fetches from L3 do not compete with
the finite RAM bandwidth but add to it, leading to a possible 2-4x
performance increase if multiple L3 caches are responding to your core. In
addition, Intel has similar performance issues: RAM on another socket
_also_ has a latency penalty (the nature of all NUMA systems, no matter who
manufactured them).

Where Zen 1 and 1+-based systems performed badly is when the prefetcher (or a
NUMA-aware program) did not get pages into L2 or local L3 cache fast enough to
hide the latency (Epyc had the problem of too many IO dies communicating with
each other, Ryzen had the issue of not enough (singular) IO die to keep the
system performing smoothly).

Zen 2 (the generation I personally adopted; wonderful architecture)
switched to a chiplet design: it still retains dual 4-core CCXs per CCD
(and thus per chiplet), but the IO die now lives in its own chiplet, so
memory access is uniform across the socket. The IO die is scaled to the
needs of the system instead of growing statically with additional CCDs.
Ryzen now performs ridiculously fast: it meets or beats Coffee Lake Refresh
performance (single- and multi-threaded) for the same price, while drawing
less power and outputting less heat; Epyc now scales up to ridiculously
huge sizes without losing performance in non-optimal cases or getting into
weird NUMA latency games (everyone's early tests with Epyc 2 four socket
systems on intentionally bad-for-NUMA workloads illustrate a very favorable
worst case, meeting or beating Intel's current gargantuan Xeons in
workloads sensitive to memory latency).

So, your statement of "a tiny NUMA system within the package" is correct for
older Zens, not correct (and, thankfully, vastly improved) for Zen 2.

~~~
smueller1234
Which EPYC 2 four socket systems? I don't think those exist.

~~~
DiabloD3
Sorry, I misspoke: dual-socket Epycs compared to four-socket Xeons. Intel
may follow AMD in abandoning >2-socket systems as well.

------
FullyFunctional
Very interesting info on the overclocking difference between TSMC 7nm and
Intel 14nm+++; however, a few misconceptions:

\- Intel staying at low core counts probably wasn't evil intent: the
software wasn't there, and Intel had better single-thread perf than AMD.
AMD was basically forced into more cores earlier because of weakness in
single-thread. Today, the software _is_ there (well, mostly) and we can all
take advantage of more cores.

\- Why did Intel fall behind? Easy: Brian Krzanich's hubris pushed the process
too hard, taking many risks, and the strategy failed spectacularly.

\- PCIe Gen4 does matter. M.2 NVMe has been read limited for a long time
already (NAND bandwidth scales trivially). The I/O section of this article is
basically nonsense.

\- There is nothing magical about x86, nor about the AMD and Intel design
teams. If the market is there, there will be competitive non-x86
alternatives. The data center market is pretty conservative for good reason
- but ML is upending a lot of conventional wisdom, so it'll be interesting
to see what happens.
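
On the PCIe Gen4 point, the nominal link bandwidth is simple arithmetic
from the transfer rate and the 128b/130b encoding (one-direction nominal
figures; real drives see somewhat less after protocol overhead):

```python
def pcie_gbytes_per_s(gt_per_s, lanes):
    """Nominal one-direction PCIe bandwidth in GB/s (128b/130b encoding)."""
    # GT/s * payload efficiency / 8 bits-per-byte, per lane, times lanes.
    return gt_per_s * (128 / 130) / 8 * lanes

gen3_x4 = pcie_gbytes_per_s(8, 4)    # PCIe 3.0 x4, the usual M.2 link
gen4_x4 = pcie_gbytes_per_s(16, 4)   # PCIe 4.0 x4
print(f"Gen3 x4: {gen3_x4:.2f} GB/s")  # ~3.94
print(f"Gen4 x4: {gen4_x4:.2f} GB/s")  # ~7.88
```

Gen3 x4's ~3.94 GB/s ceiling is exactly where fast NVMe reads have been
parked; Gen4 doubles it.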

~~~
ac29
> PCIe Gen4 does matter. M.2 NVMe has been read limited for a long time
> already (NAND bandwidth scales trivially). The I/O section of this article
> is basically nonsense.

I think the author's point was that storage is already plenty fast for many
tasks. Personally, I can't feel the difference in storage performance
between my NVMe system and older SATA SSD ones, despite NVMe being much
faster.

~~~
FullyFunctional
Ah, but that’s making assumptions about the user. There are lots of power
users for whom IO bandwidth matters; high-end video editing for example.

~~~
tjoff
And low core counts are not?! We have been core-starved for almost a decade!

IO matters but peak bandwidth sequential reads are not such a limiting factor
even for power users.

------
Animats
Intel is the last US company with a bleeding edge fab. All other fabs below
15nm are outside the US, except for one Samsung fab in Texas. When Intel falls
behind, that's the end of the US as a leader in the semiconductor industry.

~~~
nosianu
I am in no position to evaluate this article, but I found it interesting
that it exists, more so than the specifics, especially since it's a
conservative source:
[https://www.theamericanconservative.com/articles/americas-monopoly-crisis-hits-the-military/](https://www.theamericanconservative.com/articles/americas-monopoly-crisis-hits-the-military/)

They look at the military angle specifically (_"Wall Street's short-term
incentives have decimated our defense industrial base and undermined our
national security."_), but it's wider than that.

Example quotes:

> _...in the last 20 years, every single American producer of key
> telecommunication equipment sectors is gone. Today, only two European
> makers—Ericsson and Nokia—are left to compete with Huawei and another
> Chinese competitor, ZTE._

> _...public policies focused on finance instead of production, the United
> States increasingly cannot produce or maintain vital systems upon which our
> economy, our military, and our allies rely._

As for chip production capacity, "N. America" (is there anything in Canada,
or is this just the U.S.?) has just 12.8% of worldwide capacity, and 3/4 of
capacity is in Asia:
[https://anysilicon.com/semiconductor-wafer-capacity-per-region/](https://anysilicon.com/semiconductor-wafer-capacity-per-region/)

The idea of globalization was that where production is located does not
matter, market "magic" somehow makes it irrelevant. I don't see how it does
not matter when the imbalance becomes as extreme as it is nowadays. "Finance"
is not an industry (if anyone wants to argue with me about the definition of
that word I refer to
[https://www.lesswrong.com/posts/7X2j8HAkWdmMoS8PE/disputing-definitions](https://www.lesswrong.com/posts/7X2j8HAkWdmMoS8PE/disputing-definitions)
-- you know what I mean).

~~~
ryacko
Is it necessary for manufacturing to be in the United States? Certainly secure
shipping lanes and a country with a friendly government is enough.

~~~
forty
If the end users are in the US, I assume there are financial and (more
importantly) environmental costs associated with shipping when everything is
made in Asia.

We need to re-learn to consume local products if we want to save this planet.

~~~
pjc50
Shipping costs are fairly trivial to the environment compared to manufacturing
for semiconductors, because they're extremely value-dense. And the
manufacturing process is intensive of energy, water and nasty solvents. Some
of the early US semiconductor sites are now Superfund sites.

~~~
mcny
I didn't know what Superfund was so I googled it:

> Superfund sites are polluted locations in the United States requiring a
> long-term response to clean up hazardous material contaminations. They were
> designated under the Comprehensive Environmental Response, Compensation, and
> Liability Act (CERCLA) of 1980. CERCLA authorized the United States
> Environmental Protection Agency (EPA) to create a list of such locations,
> which are placed on the National Priorities List (NPL).

[https://en.wikipedia.org/wiki/List_of_Superfund_sites](https://en.wikipedia.org/wiki/List_of_Superfund_sites)

> Superfund is a United States federal government program designed to fund the
> cleanup of sites contaminated with hazardous substances and pollutants.
> Sites managed under this program are referred to as "Superfund" sites. It
> was established as the Comprehensive Environmental Response, Compensation,
> and Liability Act of 1980 (CERCLA).[1] It authorizes federal natural
> resource agencies, primarily the Environmental Protection Agency (EPA),
> states and Native American tribes to recover natural resource damages caused
> by hazardous substances, though most states have and most often use their
> own versions of CERCLA. CERCLA created the Agency for Toxic Substances and
> Disease Registry (ATSDR). The EPA may identify parties responsible for
> hazardous substances releases to the environment (polluters) and either
> compel them to clean up the sites, or it may undertake the cleanup on its
> own using the Superfund (a trust fund) and costs recovered from polluters by
> referring to the U.S. Department of Justice.

[https://en.wikipedia.org/wiki/Superfund](https://en.wikipedia.org/wiki/Superfund)

~~~
pjc50
Sorry, I shouldn't have used the Americanism. I had in mind specifically
Fairchild (trichloroethane, arsenic contamination of groundwater)
[https://cumulis.epa.gov/supercpad/SiteProfiles/index.cfm?fus...](https://cumulis.epa.gov/supercpad/SiteProfiles/index.cfm?fuseaction=second.contams&id=0901680)
, but there are plenty of others.

See also [https://www.nytimes.com/2018/03/26/lens/the-superfund-sites-of-silicon-valley.html](https://www.nytimes.com/2018/03/26/lens/the-superfund-sites-of-silicon-valley.html)

------
tiffanyh
It should be noted that the author is Matt Dillon of Dragonflybsd fame.

I'll repost a previous post I made

[https://news.ycombinator.com/item?id=15484735](https://news.ycombinator.com/item?id=15484735)
\------

To also give context as to what DragonFly BSD is: DragonFly BSD was forked
from FreeBSD 4.8 in June of 2003 by Matthew Dillon, over a difference of
opinion on how to handle SMP support in FreeBSD. DragonFly is generally
considered to have a much simpler (and cleaner) implementation of SMP,
which has allowed the core team to more easily maintain SMP support, yet
without sacrificing performance (numerous benchmarks demonstrate that
DragonFly is even more performant than FreeBSD [5]).

The core team of DragonFly developers is small but extremely talented
(e.g. they have frequently found hardware bugs in Intel/AMD CPUs that no
one else in the Linux/BSD community has found [6]). They strive for
correctness of code, ease of maintainability (e.g. supporting only the x86
architecture), and performance as project goals.

If you haven't already looked at DragonFly, I highly recommend you do so.

[5]
[https://www.dragonflybsd.org/performance/](https://www.dragonflybsd.org/performance/)

[6] [http://www.zdnet.com/article/amd-owns-up-to-cpu-bug/](http://www.zdnet.com/article/amd-owns-up-to-cpu-bug/)

------
twblalock
It's weird for Intel to be falling behind against AMD. Intel has about 10x the
revenue and 10x the number of employees. Yet AMD is able to compete directly,
and even beat Intel in some areas.

Good for AMD, but I'm more interested in an explanation of how Intel allowed
this to happen.

~~~
thunderbird120
Intel ties its architectures to its foundry nodes since it both designs the
chips and produces them, unlike AMD. This means that they can get some extra
performance but it also means that if there problems with the new node, like
the ones they had with 10nm, everyone just has to sit on their hands until
it's resolved. Going forward they've decided to decouple their architectures
and nodes so that this doesn't happen again. Despite what other commenters are
saying, this issue isn't really a result of them getting complacent. Intel's
original plan for their 10nm node was actually extremely ambitious both in
transistor density and in implementation of new techniques. Too ambitious, as
it turned out.

~~~
close04
> Despite what other commenters are saying, this issue isn't really a result
> of them getting complacent. Intel's original plan for their 10nm node was
> actually extremely ambitious [...] Too ambitious, as it turned out.

Couldn't it be both?

~~~
sounds
Basically Intel does fine when the lowest 2 levels of management are able to
keep the company functional.

The decision to bet the company on 10nm came from the top (Krzanich).

If the entire company is dysfunctional, no amount of low-level work can undo
that: if the captain runs the ship aground, adding more sailors bailing water
doesn't get the ship to its port.

From the outside, Intel looks complacent. To a degree, that's true, but the
inside story is more panic and confusion. The third-party vendors who provide
the EUV tools are in panic mode. The leadership is in confusion mode (no
amount of seppuku can make the old ways work; re-organizing those 2 levels of
low-level managers can't fix this; top-down cannot turn back time and assume
that 10nm will be this bad).

Consumers really only see the output of the product/sales division, such as
which features to disable on which SKUs. Intel is facing an actual engineering
failure; there are no bloody features to disable. Taking the few 10nm parts
that made it out the door and sticking them in the 14nm "10th generation" is a
product/sales decision. Ignoring the people telling you 10nm is a flop -
that's a CEO level decision.

------
virtualwhys
And for laptops everyone is stuck with Intel 14++ until AMD delivers 7nm for
mobile (Intel 10nm offerings look very underwhelming).

What a great time to be in the market for a new machine, lots of hot air
(literally) and throttling with the 6/8 core Intel i9s.

~~~
zaarn
10nm has to compete with 3 generations' worth of improvement on 14nm. Intel
dug their own grave there: they gambled on 10nm and lost, then they gambled
on 10nm being good enough to beat improved 14nm processes, and lost again.

~~~
virtualwhys
If they don't jump straight to 7nm, then the next iteration of 10nm may be
more impressive.

At 6-8 cores, a 10nm part with a low base clock and decent boost speeds
would probably be a decent option, provided it doesn't throttle in the
thin-and-light laptops of today.

The consumer is caught between worlds: the old power-hungry one, and the
new efficient one where long battery life and cool/quiet mobile systems are
the norm (at least that's the hope).

~~~
zaarn
From what we've seen in the first preliminary benchmarks, 10nm isn't
impressive in either performance or power consumption, and AMD trumps it in
both. The consumer is only trapped if they choose Intel.

------
tempguy9999
> (read: Intel trying real hard to keep people on 4 cores so they could charge
> an arm and a leg for more)

There's a quote, something along the lines of "You have to cannibalize your
own business. If you don't, someone else will." A lesson Intel chose to
forget because it made them more money, right up until it didn't.

~~~
reacweb
Interesting quote. I shall forward it to my boss (we are a software
publisher).

~~~
dagw
That is basically Clayton Christensen's whole shtick (it may even be his
quote). He published a rather influential book 20 years ago (The
Innovator's Dilemma) that popularized that idea. It's an interesting read,
even if not everything has aged equally well.

~~~
feb
That book is on my shelf to be read. In what way did it age poorly?

~~~
dagw
The book as a whole has aged pretty well (with the caveat that I haven't
read it in over a decade). My recollection is that it tended to treat
disrupting an entrenched player as a panacea and being disrupted as a death
knell. It was somewhat un-nuanced in its disruptor/disruptee split, and
doesn't really look much at why some disrupted companies remained perfectly
fine and why some disruptors floundered.

Still a great book and it was highly influential on how I think about
business. He's also written a bunch of books after that book that may or may
not cover some of those points.

------
Ygg2
I think number one is definitely wrong. Recent inquiry led reviewers to
discover that not all boards are created equal. While the differences
aren't drastic, on some boards you can't reach the boost clock written on
the box.

[https://youtu.be/o2SzF3IiMaE](https://youtu.be/o2SzF3IiMaE)

~~~
0-_-0
Reaching max boost clock could just mean some boards boost for a nanosecond
to 4.5 GHz and then immediately drop back, while others keep a lower but
stable clock. Aren't average clocks more important?
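
(One crude way to get at average rather than peak clocks, at least on
Linux, is to sample what the kernel reports in /proc/cpuinfo over a window;
a sketch:)

```python
import time

def sample_avg_mhz(seconds=1.0, interval=0.1, path="/proc/cpuinfo"):
    """Average of the kernel-reported core clocks over a window (Linux).

    Returns None if the cpuinfo file is unavailable or reports no clocks.
    """
    samples = []
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        try:
            with open(path) as f:
                # Lines look like "cpu MHz\t\t: 4200.000", one per core.
                mhz = [float(line.split(":")[1])
                       for line in f if line.lower().startswith("cpu mhz")]
        except OSError:
            return None
        if mhz:
            samples.append(sum(mhz) / len(mhz))
        time.sleep(interval)
    return sum(samples) / len(samples) if samples else None

print(sample_avg_mhz(seconds=0.5))
```

A momentary 4.5 GHz spike barely moves this number; a sustained boost does.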

~~~
Ygg2
I think that's the point. Most cores don't ever reach the advertised boost
clocks by default (let alone exceed them, as was advertised in AMD's
commercial explaining PBO). It seems you need to manually overclock each
core via trial and error, and even then you won't get overclocks on all
cores.

------
Joeri
Intel has been in this situation before and found their way out of it. They
may pull another rabbit out of a hat like they did with the Core
architecture, or they could always just buy AMD.

~~~
dreich
They cannot buy AMD.

~~~
wjnc
Given the financial positions involved, they could easily afford to (market
caps on the order of 30 vs 200 billion USD; Intel free cash flow of 5-15
billion per quarter). Regulatory authorities probably wouldn't let them,
but that is another argument to make.

~~~
FartyMcFarter
It's the same argument - it's just not going to happen, regardless of monetary
considerations.

------
floatboth
> on a Zen 2 system if you increase the voltage to push frequency you also
> wind up increasing the temperature which retards the maximum possible stable
> frequency

Well, that effect is there on Zen 1 to an extent, but you can overpower it
with >1.4V.

------
psurge
Does anyone know of any EPYC 7002 servers, preferably single socket ones, that
also support EDSFF E1.L SSDs?

~~~
wmf
Just get Supermicro to make it for you.

------
blingojames
Will this affect low end CPU prices?

------
superkuh
A bit off topic but this is the best webpage design I've seen on the HN
frontpage in months.

~~~
castratikron
[http://bettermotherfuckingwebsite.com/](http://bettermotherfuckingwebsite.com/)

I actually used this template for an internal development tool and everyone
loves how fast and simple it is.

~~~
echelon
Was this website written by Maddox? I haven't seen prose like this since the
early 2000's. It's way over the top and never calms down.

~~~
samplatt
By the looks of it, no; Drew McConville is not George Ouzounian. Guess Drew's
just a fan.

~~~
drew_mc
confirmed

------
ngcc_hk
That covers CPU counts, clocks, temperatures, and so on. But does it hold
for whole systems? If you run Nvidia chips for AI, does this matter, and
how? If you use Adobe's Premiere or Apple's equivalent, does it matter?
Does the CPU still matter?

