
Intel Doubles Down on Doubled Up Xeons for HPC - rbanffy
https://www.nextplatform.com/2018/11/12/intel-doubles-down-on-doubled-up-xeons-for-hpc/
======
drewg123
The problem with Intel is that they are too worried about competing with
themselves. Every bit of their server pricing is designed to prevent people
from finding a "bargain". That works really well when you're only competing
with yourself, but falls down when outside competition finally arrives.

Right now, the AMD Epyc is the only game in town in terms of I/O bandwidth if
you want to build a single-socket box that can serve 200Gb/s from NVMe to the
network. Part of this is because Intel treats heavy I/O as something where
they want to force you into multi-socket, so they limit PCIe to 48 lanes per
socket (which is enough for only about 150Gb/s).

So, I'm really looking forward to seeing where this chip falls in their
pricing structure, and whether or not it has more than 48 lanes of PCIe wired
out of the chip.
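The ~150 Gb/s ceiling on 48 lanes checks out on a napkin. Here's a sketch of that arithmetic; the 80% protocol-efficiency figure and the assumption that every served byte crosses PCIe twice (SSD to RAM, then RAM to NIC) are my own rough assumptions, not numbers from the comment:

```python
# Back-of-envelope: can 48 PCIe 3.0 lanes serve 200 Gb/s from NVMe to a NIC?
# Assumptions: PCIe 3.0 at 8 GT/s per lane with 128b/130b line encoding,
# ~80% usable after TLP/protocol overhead, and each served byte crossing
# PCIe twice (SSD -> memory, then memory -> NIC).

GT_PER_LANE = 8.0            # GT/s per PCIe 3.0 lane
ENCODING = 128 / 130         # 128b/130b line encoding
PROTOCOL_EFFICIENCY = 0.80   # headers, flow control, etc. (rough guess)
LANES = 48

raw_gbps = LANES * GT_PER_LANE * ENCODING      # aggregate raw bandwidth
usable_gbps = raw_gbps * PROTOCOL_EFFICIENCY   # after protocol overhead
serve_gbps = usable_gbps / 2                   # each byte crosses PCIe twice

print(f"aggregate raw:   {raw_gbps:.0f} Gb/s")    # ~378 Gb/s
print(f"after overhead:  {usable_gbps:.0f} Gb/s") # ~302 Gb/s
print(f"NVMe-to-network: {serve_gbps:.0f} Gb/s")  # ~151 Gb/s, short of 200
```

Under those assumptions the serving capacity lands right around 150 Gb/s, which is why 200 Gb/s needs more lanes than a single Skylake-SP socket wires out.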

------
xoa
This reads mostly like a pretty PR-fed puff piece version of the announcements
of the Cascade Lake MCM that HN discussed last week [1], and again the
Anandtech piece [2] is more interesting as well as more skeptical. Granted it
hasn't been out for deep 3rd party testing and inspection yet, but from what
was announced after chewing on it a week I think this still feels like a
desperate rapid-response move from Intel being caught off guard rather than
the result of a long planned strategy. It feels too much like they took what
they could given yield and TDP and simply applied a ton of glue they already
had until they kind of worked together in a single one-off product that could
at least produce headline numbers more competitive with AMD. No EMIB, no range
of products up and down the spectrum with smaller Xeon modules designed for
combination, just this ginormous 300W one-off with UPI. In fact this article
seems to have a bit of new info on that:

> "that leverages one of the three UltraPath Interconnect (UPI) links to hook
the two physical chips together to create what essentially is a 48-core socket
with 12 memory controllers"

If that's correct and it's 3 UPI, then presumably it'd be the square config so
unlike a normal 4P system each processor does not necessarily have a direct
link to all others, they may have to go through multiple hops. It definitely
would not be "essentially a 48-core" at all, since it'd complicate memory
access patterns significantly, wouldn't it? If that's the layout, the potential
for benchmark cherry-picking and specific optimization that isn't applicable
to other workloads seems even higher than normal.
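The hop-count difference is easy to make concrete. This sketch assumes my reading of the square layout: four dies, each spending one of its three UPI links on its in-package twin, leaving a ring between the four dies, compared against a fully connected 4P mesh:

```python
# Worst-case hop counts between four dies under two hypothetical topologies:
# a "square" (ring of 4, as the 3-UPI reading suggests) versus a fully
# connected 4P mesh where every processor links directly to every other.
from itertools import combinations

def max_hops(adjacency):
    """Longest shortest-path (in links) between any pair of nodes, via BFS."""
    worst = 0
    for src, dst in combinations(adjacency, 2):
        frontier, seen, hops = {src}, {src}, 0
        while dst not in frontier:
            frontier = {n for f in frontier for n in adjacency[f]} - seen
            seen |= frontier
            hops += 1
        worst = max(worst, hops)
    return worst

square = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}            # ring of 4
full   = {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2]}

print(max_hops(square))  # 2: diagonally opposite dies need an extra hop
print(max_hops(full))    # 1: every remote memory access is one link away
```

That extra hop to the diagonal die's memory controllers is exactly the NUMA wrinkle that makes "essentially a 48-core socket" an oversimplification.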

AMD got a leap on Intel last time with x86-64 because Intel didn't want to
bring 64-bit to the desktop; maybe MCM will turn out to be something a little
like that (though less dramatic) as well, given the realities of increasing
difficulty in large single-chip fabrication. It'd be interesting if internally
Intel is rethinking that, but this doesn't seem like "doubling down" so much
as a first scramble.

----

1:
[https://news.ycombinator.com/item?id=18389373](https://news.ycombinator.com/item?id=18389373)

2: [https://www.anandtech.com/show/13535/intel-goes-for-48cores-...](https://www.anandtech.com/show/13535/intel-goes-for-48cores-cascade-ap)

~~~
Symmetry
If it had been a considered move, planned well in advance, they would have
designed some Xeon dies to use EMIB interconnects [1] rather than making do
with the designs they had on hand.

1: [https://www.intel.com/content/www/us/en/foundry/emib.html](https://www.intel.com/content/www/us/en/foundry/emib.html)

~~~
rbanffy
Unless they consider this a transitional generation to future MCP processor
modules.

Launch early, assess market fit and plan accordingly. Intel did it differently
with Phi and it didn't quite work the way they (and I) expected.

~~~
saltcured
Also, this MCM approach is a common Intel strategy. They did it for early dual
core CPUs (Pentium D) and again with early quad core CPUs (e.g. Q6600 was
basically two E6600 chips).

------
nathanasmith
_Rajeeb Hazra, corporate vice president and general manager of Intel’s
Enterprise and Government Business Group was asked about AMD’s upcoming Rome
chip with the Zen 2 microarchitecture, saying that the company was confident in
the “unparalleled assets” that are driving the evolution of Intel’s products,
both hardware and software. “Those assets are not just around cores and
frequencies and things like that of the past, but around how we put a diverse
set of IP together, how we actually enable standards in the ecosystem … and
harness the energy from both hardware and software out there on those
platforms,” he said. “So, yes, in summary, I’d say we are going to be
extremely competitive and drive to this next phase of converged computing for
HPC and AI with the innovations we have planned and I’m very confident that
that will happen in a way that our customers can continue to bank on us for
the leading technologies and solutions they need to run their businesses.”_

For some reason this answer reminds me a lot of Steve Ballmer's (in)famous
response when asked about the iPhone.

------
physicsguy
My experience with 40 core Xeons is that the memory bandwidth makes it too
difficult to get really good performance out of them as it is.

~~~
rbanffy
It depends on your problem. If your workload spends a lot of time waiting for
a cache miss, halving the number of threads and spreading them across cores
will increase available cache for each thread and increase your throughput
substantially. If the working data is larger than the cache, additional memory
channels will make your life much better, so getting the CPU with the most
memory interfaces will be more important than getting the one with more cores,
threads, SIMD pipelines or GHz. I would say that serving web requests from one
of these or simulating quantum computers is not the best cost/benefit you'll
find, but if you need to do something like CFD, it seems like a good match.

I may be off a bit, but the last two generations of Xeon Phis could emulate a
general 10-12 qubit quantum computer without leaving the HBM memory on the
processor module and hitting the external memory buses. I'm curious if anyone
has tried that.
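The 10-12 qubit figure is plausible if "general" means simulating with a full density matrix (which captures noise) rather than a pure state vector. A rough footprint check, assuming complex128 amplitudes (16 bytes each) and the 16 GiB of on-package MCDRAM on a Knights Landing Phi; both assumptions are mine:

```python
# Rough memory footprint for brute-force quantum-computer simulation.
# A pure state vector needs 2^n complex amplitudes; a general
# (density-matrix) simulation needs a 2^n x 2^n matrix of them.
# Assumptions: complex128 (16 bytes/amplitude), 16 GiB of HBM/MCDRAM.

BYTES_PER_AMPLITUDE = 16   # complex128
HBM_BYTES = 16 * 2**30     # 16 GiB on-package memory

for n in (10, 12, 14):
    state_vector = BYTES_PER_AMPLITUDE * 2**n      # pure-state simulation
    density_matrix = BYTES_PER_AMPLITUDE * 4**n    # general (noisy) simulation
    fits = "fits" if density_matrix <= HBM_BYTES else "exceeds HBM"
    print(f"{n} qubits: state vector {state_vector / 2**20:.3f} MiB, "
          f"density matrix {density_matrix / 2**30:.2f} GiB ({fits})")
```

A 12-qubit density matrix is only a quarter GiB, and even 14 qubits (4 GiB) leaves workspace inside 16 GiB of HBM, so staying off the external memory buses for 10-12 qubits looks entirely feasible; pure state vectors at that size are trivially small.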

------
grecy
Surely this is the path for the new Mac Pro (2019 edition)

~~~
ksec
As soon as Thunderbolt 3 is opened up I would rather have an EPYC 2 + Vega 7nm
in the next Mac Pro.

~~~
grecy
Thunderbolt 3?

Surely it will be 4 so they can run at least one 8k display, if not two.

~~~
ksec
Well, that may be the original plan, since TB has been synced to PCI-Express.
But Intel doesn't have plans for PCIe 4.0 yet, at least not on their roadmap.
Maybe they will go straight to 5.0, we don't know. But the key is an open
standard.

------
m_mueller
well, looks like they finally have to compete again.

------
mtgx
Because they can charge at least 4x the prices.

~~~
rbanffy
If you get 4 times the throughput for 4 times the price, but can get away with
a single socket instead of 4 sockets at half the energy, space and cooling
budget of a comparable 4-socket solution, it's still an impressive gain.

Processor is a relatively small part of the TCO of a server. As of late,
memory and storage have been as impactful in the total upfront cost as
processors used to be, if not more, and space and cooling dominate the ongoing
costs.

~~~
ianai
Agreed. I wonder just how stymied tech has been by memory costs and tech.
Recent years have seen an increase in cores but not much in RAM. We should be
talking about persistent storage at RAM speeds, or sizable fractions of a
terabyte of RAM available for workstations.

------
karmenblack
I hope Intel gets nationalised so that affordable computing power can be
released to the masses. At present only huge corporations can afford it and
this should change.

~~~
Shivetya
Against my better judgement I will reply to this. At first I was hoping you
were being sarcastic, but I am not so sure.

The amount of computing power available to anyone in this day and age is
astonishing, and what keeps Intel's pricing in line is competition from all
the other manufacturers. If anything, there is such an abundance of power that
many fail to understand just how much can be done at lower levels, instead of
assuming having the most is the best solution.

