
AMD Claims World’s Fastest Per-Core Performance with New EPYC Rome 7Fx2 CPUs
https://www.tomshardware.com/news/amd-worlds-fastest-processor-epyc-rome-7fx2-cpus
======
com2kid
The 7F52 has 16MB of cache per core.

I'd love to see what is possible with a tiny runtime/OS in the kilobyte size
and running a microservice written in a native language off of each core,
everything out of the L3 cache.

I imagine the throughput would be amazing. Single thread per core, with
cooperative multitasking. Do this for stream-oriented workflows, or even for
processing data that comes in reasonably sized chunks, and it might be
screaming fast!

~~~
yjftsjthsd-h
When I was younger, I was very pleased when I stumbled across the trick of
putting all of a DOS system on a RAM-disk, which made the system fly. Then I
got to college and discovered in my CS classes that CPUs now had multi-
megabyte on-CPU cache, which immediately led me to wonder how hard it would be
to make a DOS system that ran completely out of cache. I'm not convinced that
it would be _practical_ for any number of reasons, but it still blows me away
every time I look at what modern CPUs have in-package :)

~~~
abbeyj
Coreboot does something similar to this. Very early in the boot process the
RAM has not yet been initialized and so cannot be used. You need to execute
some code to set up the RAM before you can use it. One approach to handling
this is to write that initialization code in assembly language and make sure
it only uses registers and doesn't touch any memory.

But that's inconvenient. You'd much rather be able to write your code in
something like C. Any C code compiled by an ordinary compiler is going to
access memory (at least for the stack). Writing and maintaining a custom C
compiler for this is a lot of work and still comes with a lot of limitations
on the C code that it will accept.

So they set up the cache in a way that it will never try to write its contents
out to RAM. Then you can use the output of an ordinary C compiler that uses
memory accesses and those accesses will all be served by the cache and never
touch RAM. They call this "Cache as RAM".

With some work you could probably boot into DOS like this.

[https://www.coreboot.org/data/yhlu/cache_as_ram_lb_09142006....](https://www.coreboot.org/data/yhlu/cache_as_ram_lb_09142006.pdf)

~~~
mcot2
Interesting that, to my knowledge, no one has built an architecture where SRAM
is the ‘memory’, so that it would be transparent to C code, with no slower
DRAM involved at all. 32MB is more than my 1998 computer had.

~~~
jnwatson
Many embedded systems have a block of SRAM directly addressable.

------
Robotbeat
What’s interesting and surprising to me is that the new Epyc 2 chips by AMD
have about the same cost per double-precision teraflop as a GPU, even the ones
with good floating point hardware support. I assume the cost for these
accelerators is probably a little cheaper for high-performance-computing folk
able to buy in bulk, but still I was very surprised. I expected there to be an
order of magnitude difference in cost per flop, even with doubles. Once AMD
introduces AVX-512, the cost per flop should improve even more.

Also, in the same vein, I was surprised that double- and single-precision
cost per flop has stayed fairly stagnant over the last couple of years, as
NVIDIA seems focused on improving lower-precision performance (i.e. for
machine learning inference).

~~~
einpoklum
This is probably because you're comparing them to consumer GPUs, which are
designed this way - favoring integer and single-precision functional units on
the SM cores. That's a sort of a marketing/tiering strategy by nVIDIA. The
Teslas have good double-precision performance - but they are priced waaaay
higher than the consumer cards.

~~~
kllrnohj
> The Teslas have good double-precision performance - but they are priced
> waaaay higher than the consumer cards.

Yup - the Tesla V100 is 7 TFLOPS of double precision at around $9,000.

A huge split between consumer & HPC happened in the aftermath of the Fermi
(2010) architecture. Fermi was really bad in the consumer space from all the
wasted die spent on unused double-precision. It was late, hot, and loud. And
barely even faster than the competition.

With Maxwell, Nvidia basically removed all the FP64 support from the
architecture itself ([https://www.anandtech.com/show/9059/the-nvidia-geforce-gtx-titan-x-review/2](https://www.anandtech.com/show/9059/the-nvidia-geforce-gtx-titan-x-review/2) -
the native FP64 rate is 1/32 the FP32 rate). The result was a huge boost to
gaming performance, but it also meant that HPC users who want double precision
_had_ to use Tesla cards. The actual architectures between GeForce & Tesla are
different now; it's not "just" a lockout anymore.

~~~
Robotbeat
Exactly. About 780 Gflops per $1000 for the Tesla V100 and 460 Gflops per
$1000 (peak) for the Epyc 7742.

I was very surprised it was this close. I thought the accelerator would be an
order of magnitude cheaper per double-precision Gflop. And AMD isn't using
AVX-512, yet.

~~~
jiggawatts
Imagine the very near future when AMD starts using TSMC's 5nm process, which
has approximately double the transistor density of the current 7nm process
used for the EPYC 7002 series.

They could go to DDR5, PCIe 5, AVX-512, and still have a transistor budget
left over for whatever they like.

The 'whatever' is the interesting part. What exactly does a GPU do that a CPU
doesn't?

Typical GPUs have crazy high memory bandwidths and good latency hiding by
using many (thousands of) threads.

So if AMD does something like increase the number of memory channels _and_
implement 4-way SMT, they're poised to upset NVIDIA in the HPC space in a
_big_ way.

Many people would much rather program for a general-purpose processor than the
CUDA platform with all of its quirks and limitations...

~~~
chapplap
That's not exactly true once you have to deal with vector extensions like
AVX-512. It's quite a pain to write by hand (C intrinsics), and many of the
ways to abstract it away end up giving you a GPU-like programming model (e.g.
Intel ISPC).

Plus, this has largely been tried before with Xeon Phi and it didn't end so
well.

Huge vector units like AVX-512 are mainly useful for workloads that need huge
amounts of RAM that you just can't get with a GPU, or for workloads that are
very latency sensitive and incompatible with GPU task scheduling because they
are in some other CPU-bound code.

~~~
jiggawatts
> It's quite a pain to write by hand

And we all know that autovectorisation is hit-and-miss at best.

I wonder if there will be a new C-like language that has portable SIMD-like
capabilities in the same sense that "C is a portable assembly language".

~~~
mmozeiko
There is - Intel SPMD Program Compiler.
[https://ispc.github.io/](https://ispc.github.io/)

~~~
benibela
But does that work on AMD ?

------
xvector
The rise of AMD has been like a dream come true. I only wish Intel could get
its act together and compete in the prosumer space.

It used to be that Intel was a good choice if money wasn't really an object,
but as time has gone on it has become harder to justify Intel chips regardless
of the pricing.

~~~
systemvoltage
> regardless of the pricing

If Intel sells $5 CPUs, will you change your mind? I don't understand the
obsessive praise of AMD. Keep in mind that Jim Keller, an Intel ex-chip
architect lead the AMD Zen platform when he was hired in 2013 (after working
at Apple on the A4/A5 SOCs). It is the same dude competing against himself.
Btw, he is back at Intel since 2018 to help Intel out.

~~~
amitp85
"Keep in mind that Jim Keller, an Intel ex-chip architect lead the AMD Zen
platform when he was hired in 2013"

Jim's wikipedia page doesn't say he was ex-Intel employee when he joined AMD
in 2013 to work on Zen.

[https://en.wikipedia.org/wiki/Jim_Keller_(engineer)](https://en.wikipedia.org/wiki/Jim_Keller_\(engineer\))

~~~
Cthulhu_
So Wikipedia is an authoritative source now?

~~~
Hikikomori
Is it wrong in this case?

------
cletus
So after years of an effective Intel monopoly I'm really glad to see AMD is
back in a way that I don't think I've seen since the Athlon64/Opteron days.
Back then it was AMD who pushed the x86-64 instruction set when Intel was
claiming EPIC was the future (ha).

At this point, Intel's move to 10nm processes is an embarrassment. I'm sure
it's a difficult problem but historically Intel has been reasonably good at
planning process advancements but in the case of 10nm they've been off by
_years_. I believe the original goal was 2017? And we're still not there yet.

I would dearly love to see an honest postmortem of this and see what went
wrong. Who made promises they'd miss by so much, why, what the issues were and
so on.

The last PC I built (because apparently I still do that, even though it annoys
me no end) has an Intel 9700 in it. At the time that was probably the best
choice. 6 months later and it would no doubt have been a Ryzen.

I hope AMD keeps this up as we need the competition.

~~~
lend000
They got comfortable with Moore's Law, which in fairness, had held for a long
time, and continued to use the model that increasing feature density
quadratically was a linear problem. Now, it turns out that once you get near
the 10nm gate size range, the difficulty diverges from linear to exponential
(and perhaps even higher eventually as some hard limit is approached).

That, and Intel isn't an innovative company anymore. Now they are a process
company riding on their manufacturing dominance and x86 market share. It looks
a lot like Apple under Tim Cook, except add another decade since there was
innovative leadership (Andy Grove). They are a few consultants removed from
IBM at this point.

~~~
thethethethe
> It looks a lot like Apple under Tim Cook, except add another decade since
> there was innovative leadership (Andy Grove)

Ehh, this is debatable. Apple has put out a few products that have completely
changed the market under Cook's tenure. AirPods have introduced a new
headphone paradigm. Apple Watch is waaaay ahead of the competition. The iPhone
X made full-screen phones mainstream and introduced UI gestures that were
copied by Android.

Sure there have been some missteps _cough_ butterfly keyboard _cough_ but I’d
say overall, they are still producing interesting products that define a large
part of the consumer tech market

~~~
mntmoss
What I think Cook misses that Jobs got, and what made for more exciting
releases, is the idea of a totally integrated service. The iPod's victory was
also a victory for iTunes. The iPhone was also the App Store. And when Jobs
left, those kinds of distinct pairings did too. They are hard to conceive of,
and to execute on.

In contrast, the AirPods and Apple Watch are more straightforward "make 'em
smaller" incremental moves. The engineering work is leading in many respects,
but it doesn't upend a market.

And Intel does have a history that was like Apple's in parts. A big part of
their advantage as the PC market heated up was in marketing an entire
nomenclature of what the platform could be and to provide comprehensive path-
of-least-resistance solutions around that, ensuring that the industry fell in
line around their technical lead rather than IBM or some competitor.

Those bones are still there in parts of the company - Intel chipsets are
pretty well regarded for dependability (seeing Windows crash because of Intel
drivers is a very rare event) and they've been good at getting the corporate
office to standardize on them - but increasingly the platform is getting
defined around mobile and server needs, which are a more competitive space
generally. Intel doesn't get to call the shots on 5G, for example - and huge
data center customers are in the business of optimizing the system end-to-end
to provide the most efficient general computing resource possible; everything
they touch commoditizes, and they will put their foot down if they smell
enterprise contract crap.

~~~
dannyw
I think you're simplifying the AirPods and Apple Watch while glamorizing
other Apple products. The iPod wasn't the first MP3 player; it was an MP3
player that worked well. The iPhone was not the first smartphone; it was the
first smartphone that worked well, thanks to its multitouch screen.

Do you remember the first generation iPad? I owned it, and let me tell you. It
was, almost literally, 9 iPhones stuck together.

The AirPods are more than just "make it smaller". People praise their
convenience, and their innovation is in skipping the cumbersome bluetooth
pairing process.
~~~
sudosysgen
To be fair, how often do people pair their headphones? I think I paired my
headphones only twice in the past month, and that took about 45 seconds.

~~~
dannyw
I own an iPhone, Macbook Pro, Work Macbook Pro, Surface, and a desktop gaming
PC.

Switching your bluetooth headphones between 5 devices can be... a chore.

------
shantara
ServeTheHome goes into much greater detail when it comes to reviewing
server-oriented hardware, its position in the market, and enterprise
specifics:

[https://www.servethehome.com/amd-epyc-7f52-benchmarks-review-and-market-perspective/](https://www.servethehome.com/amd-epyc-7f52-benchmarks-review-and-market-perspective/)

------
Robotbeat
One of the most impressive things about these Epyc 2 chips is the very high
PCIe bandwidth: a LOT of lanes, plus support for PCIe 4 (which doubles the
per-lane bandwidth). Interesting options for extreme SSD storage speed &
capacity (in a single node) if you combine it with a PCIe expansion box. (And
potentially other single-node performance metrics for accelerator cards
supporting PCIe 4, which NVIDIA doesn't yet.)

~~~
chx
And the lanes are cheap.

[https://www.avadirect.com/Tomcat-HX-S8030-S8030GM2NE-AMD-SoC-SP3-DDR4-3200-2TB-3DS-LRDIMM-8-VGA-GbLAN-2-ATX-Retail/Product/13225593](https://www.avadirect.com/Tomcat-HX-S8030-S8030GM2NE-AMD-SoC-SP3-DDR4-3200-2TB-3DS-LRDIMM-8-VGA-GbLAN-2-ATX-Retail/Product/13225593)
this is a $400 standard ATX board with 80 PCIe lanes (+ 16 more via risers).
That's the equivalent of 160 PCIe 3.0 lanes.

~~~
kllrnohj
The lanes come from the CPU you put into the board, not the board itself.
Although yes they are still cheap, at least if you go with something like the
EPYC 7252 at ~$500 (which still has the full 128 PCI-E 4.0 lanes)

That said I have no idea how you would actually _feed_ that many PCI-E lanes
with an EPYC 7252, but if you can pull it off it's an insane $/lane value.

~~~
chx
I know it's the CPU but the board I linked is a very rare standard ATX board,
almost all boards are proprietary.

I presume you could build an insane fast fileserver with a real lot of M.2
disks and multiple 100GbE ports?

~~~
kllrnohj
Sure, but I don't think the 8-core Epyc would actually keep up with that many
NVMe drives. At least not if you tried to actually hit 24+ of them at once.

Linus Tech Tips tried this and had to upgrade the CPU from the 24-core Epyc to
the 32-core to get performance up to what they wanted:
[https://youtu.be/xWjOh0Ph8uM](https://youtu.be/xWjOh0Ph8uM)

Maybe it was just a bad deployment, but there is overhead in filesystems,
especially with checksums and compression and redundancy and so on...

~~~
Robotbeat
It's possible to bypass the CPU in some cases using NVMe over an RDMA layer
with Infiniband. PCIe 4.0 dual-port 200Gbps Infiniband/Ethernet adapters
exist[1] which are compatible with this approach:
[https://store.mellanox.com/products/mellanox-mcx653106a-hdat-sp-single-pack-connectx-6-vpi-adapter-card-hdr-ib-and-200gbe-dual-port-qsfp56-pcie4-0-x16-tall-bracket.html](https://store.mellanox.com/products/mellanox-mcx653106a-hdat-sp-single-pack-connectx-6-vpi-adapter-card-hdr-ib-and-200gbe-dual-port-qsfp56-pcie4-0-x16-tall-bracket.html)

[1] Although you can't saturate both ports through even a 16-lane PCIe 4.0
slot, which has ~250Gbps of throughput each way... which to me means that PCIe
4.0 is not at all too soon.

------
m0zg
It's just bizarre to watch how once unassailable Intel is totally floundering
in multiple aspects of their main business. I wanted to upgrade my aging Core
i7 workstation and looked into the current Intel HEDT lineup. Only 14nm, and
even without Spectre/Meltdown mitigations the chips are way slower unless you
can use AVX-512. Ended up buying a Threadripper 3970X with a quad-GPU-capable
board, even though the CPU is _more_ expensive than anything HEDT that Intel
currently sells.

~~~
dmux
> It's just bizarre to watch how once unassailable Intel is totally
> floundering in multiple aspects of their main business.

Isn't this just history repeating itself though? We could easily replace
"Intel" with any number of previous market leaders that have fallen by the
wayside.

~~~
m0zg
I don't remember any company flubbing their unassailable lead quite this
badly. I sense there might still be some complacency behind it. Sales are
probably doing well enough to not worry about it quite yet. But it's much like
C19: if AMD gets the mindshare (which it is in the process of acquiring), with
some lag those sales will start to die, and it'll be too late to do much about
them then. Any countermeasures have to be preemptive, and I just don't see
anything exciting being announced by Intel until at least 2021, whereas AMD
keeps releasing bombshell products every quarter like clockwork.

~~~
dragonwriter
> I don't remember any company flubbing their unassailable lead quite this
> badly.

I do.

Heck, among other examples, I remember the company being Intel, the market
being x86 general purpose desktop/laptop processors, and the firm they blew
their long-established unassailable lead to being AMD. I also remember AMD
turning around much quicker and flubbing it back...

Actually, unless I'm mistaken, that happened _twice_ before, the first time
being the reason the now-universal standard for 64-bit x86 is what used to be
“AMD64”.

> AMD gets the mindshare (which it is in the process of acquiring), with some
> lag those sales will start to die, and it'll be too late to do much about
> them then

AMD had the mindshare for quite a while before, but Intel was able to do
enough about it that people apparently forget that it even happened. The
market is fickle, and AMD is at least as good at flubbing advantaged positions
as Intel, judging from history.

~~~
anon73044
...almost as if every time Jim Keller takes the reins at AMD, they pull away
from the competition...

~~~
m0zg
He's at Intel now. :-)

------
cced
For someone considering their next build with a use case of:

\- programming, docker, golang

\- gaming

can anyone recommend a resource for determining the relative performance of
processors? With all the news of how well AMD is doing, I’m still not sure how
to look at a given task and determine which processor would perform better.

Does anyone know of such a source?

~~~
mebutnotme
Unless you are compiling massive projects, your best bet will likely be a
3900X. You get more cores than you can likely use, to handle all the
programming multitasking, while also having a CPU that’s 5-10% off the best
gaming CPU available. All while keeping within a reasonable budget.

~~~
Analemma_
Without question, get the 3900x. It's a _bit_ behind Intel in single-threaded
performance, but only barely, and the embarrassment of cores you get compared
to the i9-9900K more than makes up for it. Microcenter currently has it on
sale for $379 if you buy it together with a compatible mobo, which is the deal
of the century.

~~~
DeathArrow
I think Ryzen 3 is coming this year, so it might be better to wait a bit.

~~~
greggyb
It's out and the darling of most tech news.

There is some confusion in their naming. Zen is the architecture name, and so
far we've had Zen, Zen+, and Zen 2. The consumer processor line is branded
Ryzen (and Ryzen Mobile for laptop parts). The HEDT processor line is branded
Threadripper, and the server line is branded Epyc.

Ryzen and Threadripper 1000-series are Zen.

Ryzen Mobile 2000-series is Zen.

Epyc 7001-series is Zen.

Ryzen and Threadripper 2000-series are Zen+.

Ryzen Mobile 3000-series is Zen+.

Ryzen and Threadripper 3000-series are Zen 2.

Ryzen Mobile 4000-series is Zen 2.

Epyc 7002-series is Zen 2 (Epyc skipped Zen+).

Zen 3 is expected in 2020, based on AMD guidance. Assuming they follow their
part-numbering scheme, we should expect this to appear in Ryzen and
Threadripper 4000-series, Ryzen Mobile 5000-series, and Epyc 7003-series.

------
grenoire
Has the rise of AMD led to a shift in talent going their way as well? Not sure
how the loyalty dynamics are in the chip engineering industry.

~~~
code_biologist
It already sounds like the environment at Intel isn't great, and a lot of the
staff dynamics are driven by cost-cutting. Great video on YouTube posted in
the last week with a bunch of leaks from Intel employees:
[https://youtu.be/agxSclh27uo](https://youtu.be/agxSclh27uo)

~~~
AtlasBarfed
Pretty standard fare for monopoly/cartel in all segments in America. Pump
stock, hit options, golden parachute.

I will say that the last time AMD had a brief lead on Intel, with Athlon, they
rested on their laurels and started milking customers in record time. I think
that was under Hector Ruiz.

That last time, it was pretty clear from the mobile processors that the core
engineering talent was still somewhere in the company; I think the Core
processor came from an Israeli team rather than the one pushing out the
high-frequency, pipeline-stalls-be-damned stuff.

But I get the feeling, with the stunning, STUNNING process-lead collapse, that
the engineering talent is fundamentally gone.

At one point Intel was thought to have a two- or three-year process lead.

------
jl6
256MB of L3 cache is incredible. There must be useful classes of application
that can fit entirely within that, OS and data included.

~~~
gameswithgo
Remember that with Ryzen that cache is split up between chunks of cores, so
it's not entirely flexible.

That gets better with the upcoming generation, though; it's more unified.

~~~
brobinson
Can you disable cores to get more cache per (actually running) core?

~~~
wmf
Yes. That is what AMD is doing with these chips; e.g. the 32-core version has
64 cores with half disabled and you can disable more if you want.

~~~
akiselev
I thought the chiplets were binned before getting placed onto the carrier
silicon, so they wouldn't need to do the core fusing that Intel does?

~~~
wmf
There's no carrier silicon and I'm not sure what distinction you're making
between binning and fusing. The only way to get 256 MB of cache is to also
have 64 cores, the only way to get 192 MB is to have 48, etc.

------
ssutch3
Anandtech Link: [https://www.anandtech.com/show/15715/amds-new-epyc-7f52-reviewed-the-f-is-for-frequency](https://www.anandtech.com/show/15715/amds-new-epyc-7f52-reviewed-the-f-is-for-frequency)

------
leoh
Could someone comment on how this compares with Ryzen Threadripper (e.g.
3990X)?

~~~
tracker1
More Ram and PCIe, lower clocks/heat.

------
ksec
I thought it was worth pointing out that Intel already sells a HEDT CPU that
is faster and cheaper than these AMD counterparts: the 10980XE, with 18 cores
and higher clock speeds.

All it takes is for Intel to bin them with ECC memory support and rename them
to Xeon to compete.

And it seems AMD is in no hurry to release Zen 3, giving plenty of time for
the market to digest Zen 2. I just hope their enterprise and server sales
departments do better, because right now, while on paper / in benchmarks they
are doing great, their sales figures aren't showing all the enthusiasm many
sites and comments are claiming.

And that is speaking as an AMD shareholder.

~~~
erulabs
BigCos move slowly. It'll take some time before sales cycles close. I'd give
it a quarter or two of lag between now and really promising sales numbers, to
account for how slowly things in a large data center change. A human,
somewhere, has to rack each of those things :P

------
chunsj
Yet, there’s no official, ready-to-use BLAS/LAPACK.

