
AMD EPYC “Rome” Server Processors to Feature 8 to 64 Cores - areejs
https://www.techquila.co.in/amd-epyc-rome-64-core/
======
bhouston
64 cores but 128 threads with simultaneous multithreading. And in a 2P
(dual-socket) configuration, you get 256 threads. That is a beautiful thing.
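
The arithmetic, assuming 2 SMT threads per core and a dual-socket ("2P")
system as the comment describes:

```python
cores_per_socket = 64
threads_per_core = 2    # simultaneous multithreading (SMT)
sockets = 2             # dual-socket ("2P") server

logical_cpus = cores_per_socket * threads_per_core * sockets
print(logical_cpus)  # 256
```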

~~~
tasubotadas
It's a good time to be a Python developer :-D

~~~
geezerjay
It's a very good time to dive into microservices/containers/container
orchestration.

Good times.

------
rubyn00bie
The real question for me with all these AMD releases is: what's Intel gonna
release? It's surprising to me that Apple didn't announce an AMD-based Mac
Pro -- the chip Intel gave them must really be something. Too bad, though,
because I'm guessing there could be a lot more of them (eventual Mac Pros)
out there if they were using these Rome chips.

~~~
shereadsthenews
Intel chips are still significantly better than AMD for many common workloads.
If you are running SpecCPU or Cinebench in production then AMD might be right
for you; in all other cases I urge you to run your own realistic benchmarks
before buying. Intel’s “response” to Rome came out in April. It’s a 28-core
chip that costs $12k. The reason Intel doesn’t feel price pressure here is
they are still way ahead in performance on real high-end workloads like DBMSs
etc.

~~~
kllrnohj
Rome is Zen 2. You have absolutely no clue whatsoever how it performs. It may
still have the weaknesses of Zen 1, but it very well might not. AMD made a
bunch of changes, including a complete overhaul of the memory system (no more
NUMA).

We'll know for sure when the product is actually out and we have independent
benchmarks, but at this point you're just making things up and stating them as
facts.

That aside, Epyc was already ahead on real high-end workloads like povray or
NAMD ([http://www.ks.uiuc.edu/Research/namd/](http://www.ks.uiuc.edu/Research/namd/)).
Epyc also puts up top numbers on compilation performance and OpenSSL. So it
already isn't as black & white as you're pretending anyway. MySQL/DBMS is not
the only server workload that exists, even though it may be the only workload
you specifically care about.

------
nabla9
Does anyone have an idea how many different processors tapeouts are needed to
create this line?

It seems reasonable to assume that the 48-core chip is just a 64-core chip
with a few defective cores, and that a lower-clocked version is the same chip
as a higher-clocked one that did not pass some test.

~~~
opencl
There are three tapeouts used for the entire desktop, HEDT, and server lineup
combined. Everything uses the same 8 core CPU dies and there are two different
I/O dies- a smaller one used for desktop (which doubles as the X570 chipset)
and a big one for HEDT/server. There's also a separate 4 core + GPU die used
for laptop parts.

~~~
juergbi
This is correct for desktop and server. However, AMD hasn't said anything
about the HEDT I/O die yet, as far as I know.

Assuming Zen 2-based Threadripper will still have quad channel memory and 64
PCIe lanes, AMD might go for a medium sized I/O die instead of disabling half
of the large server I/O die. Or maybe the Threadripper volume is too low and a
separate tapeout is not worth it.

~~~
jjwhitaker
I haven't seen much Zen 2-based Threadripper news since the end of May, so
maybe I missed how they are doing those chips -- whether via the same 8-core
chiplets as the rest of Zen 2 or something larger. If their yields are really
good, maybe they are stacking 8-core chiplets for Rome and letting final
yields determine what range gets offered as Zen 2 Threadripper for HEDT, to
split 16-core Zen 2 from Rome. Maybe at 16 cores, Zen 2/Ryzen 9 is HEDT
enough to compete?

If they are doing larger chiplets for Rome, like 16 or 32 cores, but pairing
them up with Infinity Fabric for up to 64 cores, yields could also determine
what becomes Zen 2 Threadripper, with the weaker chiplets becoming 8-32 core
(2 chiplets), single-CPU HEDT material. Or will they have a whole other
solution here, like limiting Zen 2 Threadripper to the same socket as
previous gens to push upgrades, versus a first-gen TR user jumping to a Ryzen
3850X or low-end Rome?

------
leeter
So at a 225 W TDP I'd guess that part is going to be clocked in the
1.2-1.5 GHz range. Definitely a specialist part IMO, because with only 8
channels of memory that's one channel per eight cores, which is not a ton of
bandwidth. So for workloads that largely stay in the (I assume) ample L3/L2
caches, that part will rock. But anything that needs a lot of bandwidth
spread across cores (databases come to mind) will probably struggle where the
higher-clocking 24-core or 32-core parts will chug along fine. This oddly
seems like a case where the 48-core may still be a better buy even for
similar workloads due to higher base clocks.

Just my two cents.
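
A back-of-the-envelope check on the bandwidth worry, assuming DDR4-3200
(the actual supported memory speed wasn't confirmed at this point):

```python
# Each DDR4 channel is 64 bits (8 bytes) wide; at 3200 MT/s that is
# 3200e6 transfers/s * 8 B = 25.6 GB/s per channel.
channels = 8
gb_per_s_per_channel = 3200e6 * 8 / 1e9
cores = 64

total_gb_per_s = channels * gb_per_s_per_channel
per_core_gb_per_s = total_gb_per_s / cores
print(round(total_gb_per_s, 1), round(per_core_gb_per_s, 1))  # 204.8 3.2
```

So roughly 3.2 GB/s per core if all 64 cores pull from DRAM at once, which
is why cache-resident workloads fare much better on this part.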

~~~
juergbi
It seems like the 64 core part will be able to run at 2.35 GHz, see
[https://www.anandtech.com/show/13598/amd-64-core-rome-deployment-hlrs-hawk-at-235-ghz](https://www.anandtech.com/show/13598/amd-64-core-rome-deployment-hlrs-hawk-at-235-ghz)

~~~
leeter
Good to know. I was assuming they'd run a bit hotter. That will actually make
the memory situation worse in many regards, though.

------
abc_lisper
Giddy! At this rate I will be running a 64 Core processor on my desktop in 2
years!

~~~
mtgx
There's already a 64-core Threadripper rumored for the end of this year.

Top-tier Ryzen 9 should reach "only" 32 cores on the 5nm process in 2 years.

~~~
Ragib_Zaman
Is this conjecture or has AMD stated in a roadmap they plan to have 5nm in 2
years?

~~~
astrodust
They're going to use whatever TSMC uses, and TSMC is committed to 5nm. The
process is already being tested.

[https://www.tomshardware.com/news/tsmc-5nm-euv-process-node,38995.html](https://www.tomshardware.com/news/tsmc-5nm-euv-process-node,38995.html)

------
bitwize
Oh man, I can't wait to make -j 128...

~~~
nottorp
You'd better have a couple of very fast NVMe SSDs in a striped configuration
for I/O to keep up ;)

~~~
dman
Unless you use ramdisk! :)

------
cr0sh
I'm curious if anyone knows about that image of the AMD cpu - it says on it:

"DIFFUSED IN USA"

What exactly does that mean? Given the next line is "MADE IN CHINA", it would
seem like "DIFFUSED" should be "DESIGNED" - or does that word have a new
meaning?

~~~
floatboth
IIUC, diffused is where the silicon wafer is created. For 14/12nm, that's the
GloFo fab in New York.

Made is where it's attached to the substrate, packaged etc.

------
djsumdog
Do these processors use NUMA, similar to the high end Intel Xeon chips?

~~~
ip26
NUMA isn't a feature, it's a design compromise. Ideally every line of memory
takes the same amount of time to access. But in large complex designs, you can
increase performance for some memory at the price of lower performance for
other memory, aka Non Uniform Memory Access.

(These chips do exhibit NUMA)
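
As a side note on living with NUMA in practice: a common first step is
pinning a process to chosen CPUs so its memory lands on the local node. A
minimal Linux-only sketch using the Python stdlib (the function name is my
own, and this only mitigates NUMA effects, it doesn't remove them):

```python
import os

def pin_to_cpus(cpus):
    """Pin the calling process (pid 0) to the given logical CPU ids.

    With the kernel's first-touch policy, memory the process allocates
    afterwards tends to come from the NUMA node those CPUs sit on.
    """
    os.sched_setaffinity(0, cpus)
    return os.sched_getaffinity(0)

# Pin to the lowest-numbered CPU this process is currently allowed on.
first_cpu = min(os.sched_getaffinity(0))
print(pin_to_cpus({first_cpu}))
```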

~~~
eightysixfour
I believe these chips have Uniform Memory Access via a shared memory interface
on the I/O die. Am I mistaken?

Edit, just confirmed:

>Thanks to this improved design, each chiplet can access the memory with equal
latency. The multi-core beasts can support up to 4TB of DDR4 memory per socket.

[https://www.tomshardware.com/news/amd-64-core-128-thread-7nm-rome-cpu,38032.html](https://www.tomshardware.com/news/amd-64-core-128-thread-7nm-rome-cpu,38032.html)

~~~
ip26
They exhibit NUMA because if chiplet0 wants a line of memory that is held by
chiplet4, it has to go get it from chiplet4. So the degree of NUMA is improved
from the previous generation, but it is still not UMA.

~~~
paulmd
No memory is held by any chiplet; it's all behind the IO die, and the
chiplets ask the IO die to access memory for them.

So there is no longer "near" and "far". In a sense, it's all "far" now (but
hopefully not too far). But it is all _uniform_ now.

~~~
ip26
The chiplets have cache, which holds copies of memory. If a process has the
line open in an exclusive state, e.g. locked, other chiplets cannot just get
the line from memory, because it might be out of date. So they must go ask
whoever holds the lock to flush & release.

[https://en.wikipedia.org/wiki/MESIF_protocol](https://en.wikipedia.org/wiki/MESIF_protocol)

~~~
wmf
When you're talking about cache it's NUCA, not NUMA.

------
deevolution
What's stopping them from creating 1000 core cpus?

~~~
Symmetry
Yields are the main barrier to single pieces of silicon that big. You have a
certain chance of getting a defect per square mm of your chip, and as chips
get bigger the chance of a fatal defect gets higher and yields go down. Often
a defect will land in a place where you can just disable a core or a bank of
cache and still sell the chip, but not always. So yield rates tend to go down
as chips get bigger. Also, larger chips make less efficient use of the wafer.

Economically, there aren't so many people looking for 1000 cores that it makes
sense to put in the NRE to assemble a giant package to put all of that in
versus just selling a system that can have multiple sockets. Cooling limits
also make spreading out work across multiple sockets a better choice.
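
The yield argument can be sketched with the classic Poisson defect model,
Y = exp(-D * A). The defect density and die areas below are illustrative
assumptions, not foundry figures:

```python
import math

# Poisson yield model: P(zero fatal defects on a die) = exp(-D * A),
# where D is defect density (defects/cm^2) and A is die area (cm^2).
def poisson_yield(area_cm2, defect_density=0.1):
    return math.exp(-defect_density * area_cm2)

# A small ~0.8 cm^2 chiplet vs. a hypothetical ~8 cm^2 monolithic die:
print(round(poisson_yield(0.8), 3))  # 0.923
print(round(poisson_yield(8.0), 3))  # 0.449
```

The exponential is why stitching together small chiplets beats one giant
monolithic die: each small die is very likely defect-free, while the big
die's yield collapses as its area grows.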

~~~
Namrog84
How do GPU cores differ in that they have thousands of cores?

~~~
wmf
They don't; GPUs have <72 real cores and thousands of marketing cores. And
they can disable defective cores so their massive dies are still usable.

------
mastax
Looks like this is an account that only posts links to this obscure Indian
tech blog.

~~~
mft_
All blogs presumably start out obscure. This one seems reasonably written and
the content is interesting. (Genuinely) What's the problem?

------
Royal
Having a Rome 64 processor doesn't sound very promising.

[https://en.wikipedia.org/wiki/Great_Fire_of_Rome](https://en.wikipedia.org/wiki/Great_Fire_of_Rome)

