
AMD Rome – is it for real? Architecture and initial HPC performance - rbanffy
https://www.dell.com/support/article/us/en/19/sln319015/amd-rome-is-it-for-real-architecture-and-initial-hpc-performance?lang=en
======
arcanus
Dell has historically been a completely Intel shop, so the fact they are even
tepidly considering AMD here is likely because of a large number of customer
requests for pricing and performance quotes.

~~~
ultraism
> Dell has historically been a completely Intel shop

Because Intel paid Dell and others to not use AMD[0][1]. Dell officially ended
their Intel exclusivity 13 years ago when they agreed to sell servers with
Opterons[2].

[0]
[https://en.wikipedia.org/wiki/Advanced_Micro_Devices%2C_Inc....](https://en.wikipedia.org/wiki/Advanced_Micro_Devices%2C_Inc._v._Intel_Corp).

[1] [https://business-ethics.com/2010/07/23/0901-dell-inc-agrees-...](https://business-ethics.com/2010/07/23/0901-dell-inc-agrees-to-pay-100-million-to-settle-sec-charges/)

[2] [https://www.cnet.com/news/dell-opts-for-amds-opteron/](https://www.cnet.com/news/dell-opts-for-amds-opteron/)

~~~
PedroBatista
But make no mistake about it, Intel has a fat wallet and will use it against
AMD (as confirmed in some leaked presentation).

Big discounts, kickbacks, "deals" and "programs" are being deployed and more
are on the way.

~~~
ultraism
To put some perspective on this, Intel's Marketing, general and administrative
costs in 2018 were 6.7 billion[0] whereas AMD's net revenue was 6.4
billion[1].

[0] [https://s21.q4cdn.com/600692695/files/doc_financials/2018/An...](https://s21.q4cdn.com/600692695/files/doc_financials/2018/Annual/Intel-2018-Annual-Report_INTC.pdf)

[1] [http://ir.amd.com/static-files/438f4934-2883-4c85-9193-d5218...](http://ir.amd.com/static-files/438f4934-2883-4c85-9193-d5218465c096)

~~~
flukus
And as a potential customer it's important to remember that you're the one
paying that $6.7 billion.

~~~
p1necone
I'm reminded of that any time I see cheap products with slick marketing
(mostly fast food) - you have to assume they're cutting corners in their
actual product to pay for all that graphic design and video editing.

~~~
dpark
Or their revenue and income are so massive that taking a bit from it for
advertising is a drop in the bucket. Intel has revenue of nearly 71
billion/year vs AMD at 6.5 billion.

~~~
alimbada
That may be true, but despite that they still cut corners. See Spectre and
Meltdown vulnerabilities and the impact to the performance of Intel CPUs once
mitigated. In fact, Intel recommended disabling HyperThreading if you want to
be completely protected from the vulnerabilities.

------
zelon88
Can anyone attribute the recent surge in AMD performance to a recent hiring
blitz or some other shift? I know they were struggling and trying out new
things with Bulldozer modules and trying to compartmentalize as much as
possible. It reminds me of when Intel hired all those Israeli engineers who
went back to the PIII design and miniaturized it, gaining performance by
throwing away years of work on the P4.

Was there any of this at play recently at AMD?

~~~
JimmyAustin
Jim Keller [1] is widely credited for a large amount of Zen's success.
Interestingly he is now at Intel.

[1]
[https://en.wikipedia.org/wiki/Jim_Keller_(engineer)](https://en.wikipedia.org/wiki/Jim_Keller_\(engineer\))

~~~
lliamander
There's a good talk by him about how Moore's law isn't dead:
[https://www.youtube.com/watch?v=oIG9ztQw2Gc](https://www.youtube.com/watch?v=oIG9ztQw2Gc).

Obviously, people have different ideas about what they mean when they say
"Moore's law is/is not dead", but I appreciate Jim's point that even if the
current innovation curves we've been riding are slowing down, there are other
innovation curves that we can take advantage of, and that the combination of
these curves means that there's still plenty of innovation and improvement to
be made in processor design.

Of course, whether the x86 architecture will be the foundation for that
future improvement remains to be seen. ARM and RISC-V are both eating
x86 from below, and the fact that both of the processor architectures allow
for greater competition among processor implementations suggests to me that
one (or both) may catch up with x86 at some point in the future.

~~~
sounds
Especially the part at
[https://youtu.be/oIG9ztQw2Gc?t=1917](https://youtu.be/oIG9ztQw2Gc?t=1917)

That Sunny Cove architecture slide shows a massive increase in the number of
execution units.

It may be that Intel's next architecture, combined with memory speed
improvements, is competitive with AMD.

But AMD also isn't sitting still. AMD's contributions (from what I can tell)
include:

* Core counts have blown past 6 cores / 12 threads

* CPU prices have just been cut by half

* PCIe lanes have blown way past 8

* ECC ram is being offered in consumer PCs

~~~
lliamander
I've also heard rumors of things like 4-way SMT. Of course, that's still a
ways off (if it ever materializes).

~~~
Symmetry
I wouldn't put much stock in that particular rumor unless AMD is going all in
on servers to the detriment of consumer chips. Or maybe they're developing two
cores but they've been working hard to re-use engineering effort until now.
They don't have the number of silicon engineers that Intel has and not even
Intel is developing separate server and consumer cores.

------
ossworkerrights
Thanks to AMD's recent developments, I basically cancelled buying a new Mac
Pro. Max RAM speed 2666 MHz, max per-core perf 3.5 GHz? What is this, 2015?

~~~
arvinsim
Given that Apple is tightly coupled with Intel for now, I would assume that
they did not expect AMD to deliver great new CPUs and did not design the Mac
Pro with competing against them in mind, IMO.

~~~
ossworkerrights
Indeed, but why would anyone buy a Mac Pro given the massive gap? The OS
aside, which for me personally is a massive incentive, I don't see a good
reason to do so.

------
bitL
64c EPYC with 3.4TFlops - wow! That's GPU territory!

~~~
jing
Actually I'm pretty sure the Rome numbers are for double precision whereas
most numbers quoted for GPUs are for single precision or less, making Rome's
3.4tf even more impressive.
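
For a rough sense of scale, here is a back-of-the-envelope sketch of theoretical peak double-precision throughput. It assumes a Zen 2 core can do 16 DP FLOPs per cycle (two 256-bit FMA pipes) and uses an assumed 2.0 GHz sustained clock, which is my placeholder and not a number from the article. The function name is made up for illustration. Theoretical peak is not the same thing as a measured benchmark result, and this sketch does not settle whether a quoted figure is for one socket or two.

```python
# Theoretical peak = cores x clock x FLOPs per core per cycle.

def peak_dp_flops(cores: int, clock_hz: float, flops_per_cycle: int) -> float:
    """Theoretical peak double-precision FLOP/s for one socket."""
    return cores * clock_hz * flops_per_cycle

# 2 FMA pipes x 4 doubles per 256-bit vector x 2 ops (multiply + add) per FMA
ZEN2_DP_FLOPS_PER_CYCLE = 2 * 4 * 2

ASSUMED_CLOCK_HZ = 2.0e9  # assumption: sustained all-core clock, not from the article

one_socket = peak_dp_flops(64, ASSUMED_CLOCK_HZ, ZEN2_DP_FLOPS_PER_CYCLE)
two_sockets = 2 * one_socket

print(f"{one_socket / 1e12:.3f} TFLOP/s per socket, "
      f"{two_sockets / 1e12:.3f} TFLOP/s for two sockets")
```

Plug in whatever clock the part actually sustains under AVX2 load; the result scales linearly with it.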

~~~
bitL
Half of Titan V/V100 FP64 sounds unbelievable! Can't wait to get my hands on
64c Threadripper with TRX80!

~~~
rrss
How much does threadripper usually cost relative to epyc for same # of cores?

~~~
bitL
20-30% usually. You get faster cores but no LRDIMM (i.e. you are constrained
effectively to 128GB ECC UDIMM, at best 256GB ECC UDIMM if you are lucky to
get 32GB ECC UDIMM modules). EPYC has 4TB ECC LRDIMM ceiling, new TR on TRX80
might have the same ceiling as well. I am glad that AMD provides TR as they
make way less $ on them than on EPYC, but it's a great marketing tool for
them. I am running some TRs for Deep Learning rigs (PCIe slots are most
important) on Linux, and they are great, Titan RTXs and Teslas run without any
issue, but Zen 2 should give me much better performance on classical ML with
Intel MKL/BLAS in PySpark/SciKit-Learn, so I can't wait to get some.

~~~
silvr
Naive question: Are you able to use MKL on an AMD chip without jumping through
too many hoops?

~~~
bitL
Yes, just pip install ..., but it's 2x slower than on Intel for Zen/Zen+. Only
Zen 2 is close to Intel.

~~~
sliken
Intel makes rather pessimistic assumptions about AMD and uses the model name
to pick which code path to use and ignores the CPU flags for floating point,
etc.

So if you want to compare performance fairly I'd use gcc (or at least a non-
intel compiler) and one of the MKL like libraries (ACML, gotoblas, openblas,
etc). AMD has been directly contributing to various projects to optimize for
AMD CPUs. They used to have their own compiler (that went from SGI -> cray ->
pathscale or similar), but since then I believe have been contributing to GCC,
LLVM, and various libraries.
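
To make the difference concrete, here is a small sketch contrasting the two dispatch styles. It is not MKL's actual logic: the function names, the code path labels, and the sample text are all hypothetical, and a vendor string check stands in for whatever identification a real library uses. The input format assumed is Linux `/proc/cpuinfo`.

```python
def parse_cpuinfo(text: str) -> dict:
    """Pull vendor_id and the flag set for the first CPU out of cpuinfo-style text."""
    info = {"vendor_id": "", "flags": set()}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key = key.strip()
        if key == "vendor_id" and not info["vendor_id"]:
            info["vendor_id"] = value.strip()
        elif key == "flags" and not info["flags"]:
            info["flags"] = set(value.split())
    return info

def pick_by_vendor(info: dict) -> str:
    """Pessimistic style: only trust the fast path on one vendor."""
    if info["vendor_id"] == "GenuineIntel" and {"avx2", "fma"} <= info["flags"]:
        return "avx2"
    return "sse2" if "sse2" in info["flags"] else "generic"

def pick_by_flags(info: dict) -> str:
    """Feature style: use whatever the CPU says it supports."""
    if {"avx2", "fma"} <= info["flags"]:
        return "avx2"
    return "sse2" if "sse2" in info["flags"] else "generic"

# Hypothetical, heavily trimmed sample; a real flags line is much longer.
SAMPLE = """\
processor : 0
vendor_id : AuthenticAMD
flags : fpu sse sse2 avx avx2 fma
"""

cpu = parse_cpuinfo(SAMPLE)
print("vendor-based:", pick_by_vendor(cpu))
print("flag-based:  ", pick_by_flags(cpu))
```

On Linux you could feed it `open("/proc/cpuinfo").read()` instead of the sample string.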

~~~
bitL
Yeah, still, Zen 2 is much faster in OpenBLAS and is faster in MKL than
Zen/Zen+ as well.

------
acd
Side note on these CPUs.

We are one step/generation away from running BGP IPv4 routing in PC CPU L3
cache. "256MB L3 cache." I believe one needs 512MB of L3 cache to fit the
current routing tables in cache, enabling very fast route lookups on generic
PC hardware.

~~~
zajio1am
Current global BGP IPv4 routes fit in 150 MB of RAM with all BGP attributes
(in BIRD).

For forwarding, you do not need most attributes, but you may need better data
structures for best-match lookups.
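
To illustrate what a best-match (longest-prefix) lookup looks like, here is a minimal sketch using a plain one-bit-per-level binary trie. This is not how BIRD or any production forwarding plane stores routes; real implementations use far more compact, cache-friendly structures. The class name, the next-hop labels, and the example prefixes are made up, and it handles IPv4 only.

```python
import ipaddress
from typing import Optional

class PrefixTrie:
    """Binary trie keyed on address bits; each node is [child0, child1, next_hop]."""

    def __init__(self) -> None:
        self.root = [None, None, None]

    def insert(self, prefix: str, next_hop: str) -> None:
        net = ipaddress.ip_network(prefix)
        bits = int(net.network_address)
        node = self.root
        for i in range(net.prefixlen):
            bit = (bits >> (31 - i)) & 1  # walk from the most significant bit
            if node[bit] is None:
                node[bit] = [None, None, None]
            node = node[bit]
        node[2] = next_hop

    def lookup(self, addr: str) -> Optional[str]:
        bits = int(ipaddress.ip_address(addr))
        node = self.root
        best = node[2]  # a 0.0.0.0/0 default route lives at the root
        for i in range(32):
            node = node[(bits >> (31 - i)) & 1]
            if node is None:
                break
            if node[2] is not None:
                best = node[2]  # remember the deepest (longest) match so far
        return best

table = PrefixTrie()
table.insert("0.0.0.0/0", "default")
table.insert("10.0.0.0/8", "hop-a")
table.insert("10.1.0.0/16", "hop-b")

for ip in ("10.1.2.3", "10.2.0.1", "8.8.8.8"):
    print(ip, "->", table.lookup(ip))
```

Each lookup touches at most 32 nodes, but those nodes are scattered pointers, which is exactly why cache footprint and layout matter for forwarding.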

------
vondur
Interesting that they are using Red Hat 7.6 with a 3.10 kernel. I'm guessing
RedHat must backport a bunch of stuff in order to make an older Linux release
work well on such newer hardware.

~~~
raverbashing
> RedHat must backport a bunch of stuff

Yes. Though you also have the option of running a newer kernel (through EPEL,
though not necessarily).

~~~
aorth
It's actually ELRepo, which is community run, not EPEL, which is
semi-officially run by Red Hat. But yes, it's very impressive that you can
get the latest stable and long-term kernels on CentOS!

------
ct520
Lol let me guess what’s in the link. Some information about rome, some
information comparing it to other Amd processors in some use cases and no sign
of dell comparing it to intel.

------
dragontamer
I'm surprised that 4NPS gives even a mild benefit to bandwidth (maybe 13% or
so), given that the central IO die handles all communication anyway. You'd
think that with the architecture here, there'd be no real benefit to splitting
the RAM channels.

~~~
loeg
Perhaps, as article sort of suggests, inside the IO chiplet there are four
quadrants that each have two memory controllers associated with them, and the
cross-quadrant bandwidth isn't sufficient for 75% non-local memory bandwidth.

~~~
wmf
I made a video yesterday with some details on this topic:
[https://youtu.be/ghFx_jyP1U8?t=390](https://youtu.be/ghFx_jyP1U8?t=390)

------
arminiusreturns
I knew AMD was on the right track back when I built a 4-CPU Opteron (64 cores
total) server for physics sim and one for number crunching (Comsol and genetic
sequencing) and started doing the math on the core/thread count relationship
to perf. It took them a few years but man did they deliver. I just wish I had
invested in them.

Can't wait to build a beast as my Valve Index VR rig.

~~~
wmf
Those Opterons were terrible which is why AMD was practically giving them away
at low prices. They only got on the right track five years later when they
ditched that architecture.

~~~
arminiusreturns
Very true, they had a fair share of issues, but it showed the different
direction they wanted to head in, which I agreed with. One of the biggest
issues was that I could only get the quad-CPU boards from two manufacturers,
and they were very hit-or-miss mb's. (And it was the mb's that exacerbated
the issues of the Opterons.)

------
shaklee3
I don't understand articles that do benchmarks like infiniband or Ethernet,
but set up a system where the nic is limiting the test. It would be useful to
do the same thing with enough cards to saturate the memory bandwidth.

~~~
wmf
Basically no one runs that way. And in this article they're not benchmarking
two systems against each other but different tuning parameters for one system.
And if the tuning helps then the NIC wasn't the limit.

~~~
shaklee3
No one runs what way? Running with multiple NICs is very common. You can see
in the graph that it peaks at line rate very early. Running a multiple NIC
test would have resolved that.

------
rotred
Why do these multimillion dollar companies have such shitty diagrams?

*edit: shitty as in pixelated; not the content

~~~
wmf
You want the shitty diagrams because those come from actual engineers. The
beautiful ones come from marketing.

~~~
typon
Being an engineer doesn't mean you have zero design mindset. Engineers make
beautiful diagrams too.

~~~
generatorguy
Beauty is in the eye of the beholder. A diagram that conveys the information,
took the minimum amount of time to generate, and has an aesthetic that
appeals to the engineer is beautiful. Maybe you have different taste?

~~~
rotred
My comment was concerning the quality of the image not the content. The
pictures in the article are pixelated.

