
Next-Generation IBM POWER10 Processor - mbrobbel
https://newsroom.ibm.com/2020-08-17-IBM-Reveals-Next-Generation-IBM-POWER10-Processor
======
jiggawatts
There's some great info in these slides:
[https://regmedia.co.uk/2020/08/17/ibm_power10_summary.pdf](https://regmedia.co.uk/2020/08/17/ibm_power10_summary.pdf)

\- They leapfrogged everyone else with PCIe v5 and DDR5

\- 1 TB/s memory bandwidth, which is comparable to high-end NVIDIA GPUs, but
for CPUs

\- Socket-to-socket interconnect is 1 TB/s also.

\- 120 GB/s/core L3 cache read rate sustained.

\- Floating point rate comparable to GPUs

\- 8-way SMT makes this into a hybrid between a CPU and a GPU in terms of the
latency hiding and memory management, but programmable exactly like a full
CPU, without the limitations of a GPU.

\- Memory disaggregation similar to how most modern enterprise architectures
separate disk from compute. You can have memory-less compute nodes talking to
a central memory node!

\- 16-socket glueless servers

\- Has instructions for accelerating gzip.

~~~
dragontamer
I appreciate the slides, but you've got some critical errors.

> \- Floating point rate comparable to GPUs

Nowhere close. POWER10 caps out at 60x SMT4, and only has 2x 128-bit SIMD
units per SMT4. That's 480 FLOPs per clock cycle. At 4GHz, that's only 1.9
TFlops single-precision compute.

An NVidia 2070 Super ($400 consumer GPU) hits 8.2 TFlops with 448 GB/s
bandwidth.
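
The arithmetic behind that estimate can be sketched in a few lines of Python. This is only a back-of-the-envelope check using the figures quoted in this comment. The variable names are illustrative, and it assumes one FLOP per 32-bit lane per cycle, which is the counting that yields 480 (counting a fused multiply-add as two operations would double the result).

```python
# Back-of-the-envelope peak FP32 estimate from the figures quoted above.
# Assumption: one FLOP per 32-bit lane per cycle (no FMA double-counting).
SMT4_CORES = 60            # "caps out at 60x SMT4"
SIMD_UNITS_PER_CORE = 2    # "2x 128-bit SIMD units per SMT4"
SIMD_WIDTH_BITS = 128
FP32_BITS = 32
CLOCK_GHZ = 4.0            # "At 4GHz"

lanes_per_unit = SIMD_WIDTH_BITS // FP32_BITS
flops_per_cycle = SMT4_CORES * SIMD_UNITS_PER_CORE * lanes_per_unit
peak_tflops = flops_per_cycle * CLOCK_GHZ / 1000.0

# GPU figure as quoted in the comment, not independently measured.
RTX_2070_SUPER_TFLOPS = 8.2
gpu_advantage = RTX_2070_SUPER_TFLOPS / peak_tflops

print(f"FLOPs per cycle: {flops_per_cycle}")
print(f"Peak FP32: {peak_tflops:.2f} TFlops")
print(f"2070 Super advantage: {gpu_advantage:.1f}x")
```

Under these assumptions the consumer GPU comes out a bit over four times faster in peak single-precision throughput.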

> \- 8-way SMT makes this into a hybrid between a CPU and a GPU in terms of
> the latency hiding and memory management, but programmable exactly like a
> full CPU, without the limitations of a GPU.

Note that 8-way SMT is the "big core", and I'm almost convinced that 8-SMT is
more about licensing than actually scaling. By supporting 8 threads per core
(and doubling the size of the core), you get a 30-core system that supports
240-threads. That's a licensing hack: since most enterprise software is paid
per-core.

I'd expect that 4-SMT will be more popular with consumers (ie: Talos II sized
systems). Similar to how Power9 was.

~~~
twoodfin
_Note that 8-way SMT is the "big core", and I'm almost convinced that 8-SMT is
more about licensing than actually scaling. By supporting 8 threads per core
(and doubling the size of the core), you get a 30-core system that supports
240-threads. That's a licensing hack: since most enterprise software is paid
per-core._

I don't think that's entirely fair: Massively multi-user transactional
applications (read: databases) are right in the wheelhouse of POWER, and
they're exactly the kind of applications that benefit most from SMT. Lots of
opportunities for latency hiding as you're chasing pointers from indexes to
data blocks.

~~~
dragontamer
While that's a fair point, databases are also among the most costly software
that is paid per-core. So an SMT8 system will have half the per-core costs of
an SMT4 system, because it has half the "cores" of any SMT4 system, even
though an SMT8 core is just two SMT4 cores grafted together.
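
A rough sketch of that licensing effect, assuming a made-up flat price per core (real enterprise licensing terms vary and are not given in this thread). The function name and the price are hypothetical. The 240-thread figure is the one from the earlier comment.

```python
# Sketch of per-core licensing cost for the same thread count packaged
# as SMT4 versus SMT8 cores. PRICE_PER_CORE is a hypothetical number.
def license_cost(total_threads: int, threads_per_core: int, price_per_core: int) -> int:
    """Cost when software is licensed per core, regardless of SMT width."""
    cores = total_threads // threads_per_core
    return cores * price_per_core

TOTAL_THREADS = 240      # "a 30-core system that supports 240-threads"
PRICE_PER_CORE = 1000    # hypothetical, for illustration only

smt4_cost = license_cost(TOTAL_THREADS, 4, PRICE_PER_CORE)  # 60 licensed cores
smt8_cost = license_cost(TOTAL_THREADS, 8, PRICE_PER_CORE)  # 30 licensed cores

print(f"SMT4: {smt4_cost}, SMT8: {smt8_cost}")
```

The same 240 hardware threads cost half as much to license when they are counted as 30 SMT8 cores instead of 60 SMT4 cores.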

------
reacharavindh
It’s very cool, but unfortunately inaccessible for those without sky high
budgets and time to talk to sales reps.

I would happily experiment with one of these at our HPC cluster (a small group
at a university), but the idea of talking to a sales rep to even figure out
what it would cost puts me off completely, ignoring the licenses for most
interesting things to do with the hardware you buy.

I wish the POWER boxes were as easy to buy as x86 boxes. Simply configure, get
an idea of a price, talk to the distributor and place an order.

~~~
__d
I worked with a financial trading firm that was interested in evaluating a
POWER system. They called IBM, who arranged a meeting at our offices.

From memory, about 8 IBM people showed up. They didn't seem to actually know
each other, but were from several different groups within IBM.

We sat down, and started by explaining what we did with our existing x86-64
systems, and what we thought we'd like to try with the POWER system. We asked
for a single box to evaluate, roughly the equivalent to our existing HP DL380
dual-Xeon boxes.

The folks from IBM then spent the next 40 minutes arguing with each other
about exactly which system we should be using. Five minutes before the meeting
was scheduled to end, one of them took charge and said they'd figure it out
offline, and get back to us with the details.

Several more rounds of email were exchanged, but we never actually got to the
point of being told what system we could have, or what its specs were, let
alone actually being able to physically get one and test it.

It was perhaps the most absurd situation I've seen in 30 years in the
industry.

~~~
foobar1962
There used to be a joke that IBM sales reps don't have children because all
they do is sit on the bed and tell their spouses how good it's going to be.

------
fancyfredbot
My take on performance using the slides mbrobbel posted, from
[https://regmedia.co.uk/2020/08/17/ibm_power10_summary.pdf](https://regmedia.co.uk/2020/08/17/ibm_power10_summary.pdf)

The dual chip module has 30 SMT8 cores running at 3+GHz, capable of 64 FP64
FLOPS/cycle when using the matrix unit. That gives 5.7TF of peak FP64
performance (Compared to 19.5TF on NVIDIA A100 when using tensor cores, and
9.7TF on A100 when not using tensor cores).
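
That peak figure follows from a simple product, sketched here in Python. The inputs are the numbers quoted in this comment (with "3+GHz" taken as exactly 3 GHz, which is an assumption), and the variable names are illustrative.

```python
# Peak FP64 estimate for the dual chip module, using the quoted figures.
# Assumption: the "3+GHz" clock is taken as exactly 3.0 GHz.
SMT8_CORES = 30
FP64_FLOPS_PER_CYCLE = 64   # per core, when using the matrix unit
CLOCK_GHZ = 3.0

peak_tf = SMT8_CORES * FP64_FLOPS_PER_CYCLE * CLOCK_GHZ / 1000.0

# A100 figures as quoted in the comment.
A100_TENSOR_TF = 19.5
A100_PLAIN_TF = 9.7

print(f"POWER10 dual chip module peak FP64: {peak_tf:.2f} TF")
print(f"A100 with tensor cores: {A100_TENSOR_TF / peak_tf:.1f}x")
print(f"A100 without tensor cores: {A100_PLAIN_TF / peak_tf:.1f}x")
```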

They say it has 3x the "general purpose socket performance" of Power9 in FP
workloads. Trying to make sense of this from the other data: they have 15 SMT8
cores per chip (12 on Power9). The single chip module runs at "4+"GHz and the
dual chip module at "3+"GHz (4GHz on Power9). Each SMT8 core has 30% additional
performance compared to Power9 (slide 13). Assuming the lowest possible clock
(3GHz) gets me to 2.4x when comparing the dual chip module to the previous
single chip module, whereas assuming a 3.75GHz clock would give 3x.
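
The scaling estimate above can be written out as a small Python sketch. It assumes socket performance scales linearly with core count, per-core gain and clock, which is a simplification. The function and parameter names are illustrative, not anything from IBM's slides.

```python
# Socket-level speedup of a POWER10 dual chip module over a Power9
# single chip module, assuming linear scaling in cores, per-core gain
# and clock frequency (a simplifying assumption).
def socket_speedup(p10_cores: int, p10_ghz: float,
                   p9_cores: int = 12, p9_ghz: float = 4.0,
                   per_core_gain: float = 1.30) -> float:
    return (p10_cores / p9_cores) * per_core_gain * (p10_ghz / p9_ghz)

low = socket_speedup(30, 3.0)     # lowest clock consistent with "3+" GHz
high = socket_speedup(30, 3.75)   # clock that roughly reproduces the 3x claim

# Clock needed to hit exactly 3x under the same assumptions.
clock_for_3x = 3.0 / ((30 / 12) * 1.30) * 4.0

print(f"At 3.00 GHz: {low:.2f}x")
print(f"At 3.75 GHz: {high:.2f}x")
print(f"Clock for exactly 3x: {clock_for_3x:.2f} GHz")
```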

------
deugtniet
Our institution runs multiple HPC clusters for all kinds of scientific use
cases. I remember reading about the head of the HPC department making the
switch from INTEL/AMD to IBM because it had a much larger memory / storage (I
can't remember which) bandwidth in some astronomy application. This made the
project feasible without having to invest in custom hardware.

It's good to see that POWER still excels in important use cases.

~~~
gnufx
POWER9 has large cache and can take large memory (at least for when it
appeared), but was particularly notable for bandwidth/latency. I think they
were the first with PCIe 4, for instance, when PCIe 3 was a bottleneck for HPC
interconnect.

The native interconnect in 10 looks interesting.

------
supernova87a
For the uninitiated, what's the value prop of these processors?

Cheaper $ cost per TFLOPs to make up for the trouble of dealing with a
specialty instruction set? Speed of certain specialized computations that
cannot be matched by alternatives?

Or how would one summarize it?

~~~
qubex
Open Architecture with no black boxes. Total control over your system of the
kind we used to take for granted before the Intel Management Engine
days. Very focussed on throughput and centralised operation (for example
homomorphic encryption and encrypted memory to forestall snooping).

And yes, when used ‘correctly’, these systems can be very fast... in the
steady marathon kind of way rather than the spasmodic sprint-racer clock-
boosting-and-throttling manner of today’s mainline chips.

~~~
fluffything
> Open Architecture with no black boxes.

> ‘Open’ means that you’re allowed to understand exactly how it works and that
> there’s no mysteries. It means having the blueprints of the machine, not a
> free machine.

No, this is 100% incorrect.

The Power ISA, i.e., the software/hardware interface of the CPU, is open
source. This means that if you want to build a Power CPU that implements its
software interface, you can do so "for free".

That's it. You don't get "the blueprints of the machine", you cannot look into
how the CPU works internally and understand it, etc.

That's like having a standard API that anybody can implement, e.g., the C
standard library, but which Apple, Microsoft, etc. ship as a black box binary
blob, so you can't understand their implementation, search/fix bugs, etc.

So no, your claim is completely incorrect. The benefits of an open ISA only
apply to those wanting to build their own CPUs, which for Power is just not
even a handful of companies, none of them making their blueprints of their
CPUs openly available...

For end users, your machine is as open/closed on a system with an open ISA
as on one with a closed one. People paying 10k$ for a Raptor II in the name
of openness are throwing their money away.

This is a completely different situation than, e.g., RISC-V, where not only
the ISA is open-source, but the VHDL implementation of many RISC-V cores is
also open source, and you can buy those cores today.

~~~
gnufx
It's not about the ISA (I assume). The point is that these systems have
essentially all free software firmware, as I understand it. You have remote
management, but it's something you can presumably fix if you need to. Apart
from trust issues, you know how valuable that is if, for instance, you've had
to deal with BMCs' brokenness continually over the years.

~~~
to11mtm
Reading between the lines on statements made by Raptor I'm thinking that
POWER10 will not be open immediately upon release.

~~~
gnufx
Yes, that does seem plausible, which would be unfortunate.

------
filereaper
Have they improved Load-Hit-Store penalties from previous generations?

Lots of transistors and opcodes have been sacrificed for fancy things like
transactional memory, runtime instrumentation and other features, but the
fundamentals haven't improved, requiring expensive compiler optimizations
which interpreters don't do and which are expensive for JITs.

The Intel chips did the fundamentals better, has POWER caught up?

------
gnufx
I noticed recently that OpenBLAS has gained some code using the POWER 10
matrix-multiplication units:
[https://github.com/xianyi/OpenBLAS/tree/develop/kernel/power](https://github.com/xianyi/OpenBLAS/tree/develop/kernel/power)

------
simonebrunozzi
IBM is dying a slow death; they still "milk" the market with old stuff like
AS/400, because of a superb lock-in.

The "Power" business is still doing ok; but I'd bet in a few years one of the
other big guys will go at it (maybe Nvidia?) and start eating at their market
share.

------
totetsu
What do they do with the edge bits of the silicon wafer that are not squares?

~~~
ehsankia
Even dumber question, why do they need to be circular?

CDs and DVDs write in a circular pattern starting from the middle going
outwards, but the actual chips on these wafers seem to be their own individual
squares.

~~~
tormeh
Also dumb question: Why do dies need to be square? Wouldn't you get less waste
at the edges with a hex shape?

~~~
jl6
That’s a great question.

I would guess that dies are built from modular sections (e.g. SRAM cells), and
it’s important that two identical modules perform identically - signal
propagation time is relevant at this scale, so the shape and layout of each
module must be identical. I would further guess that rectangular layouts are
easiest to reason about, easiest to make masks for, easiest to pack
efficiently at the transistor level, and easiest to test.

But I don’t know of a fundamental reason why a sufficiently advanced VHDL
“compiler” couldn’t produce hex-cell or even circular layouts.

~~~
TheOtherHobbes
Chip dicing hardware can produce hex-cells, or any other cell with straight
edges. (Not circular - that's not a good shape to expect from crystalline
silicon.)

But - as you say - the modular sections are rectangular, and for most
applications there's no good reason to make the dies any other shape.

There's actually a patent for hex-cell chips, but it doesn't seem to have been
used for any significant projects.

[https://patents.google.com/patent/US6030885A/en](https://patents.google.com/patent/US6030885A/en)

------
peter303
36% of Supercomputer 500 list use IBM power CPUs, including top two
supercomputers.

[https://www.top500.org/](https://www.top500.org/)

~~~
willvarfar
Japan's Fugaku with Fujitsu Spark CPUs has just taken the crown.

IBM Power CPUs are now 2nd and 3rd.

[https://www.top500.org/lists/top500/list/2020/06/](https://www.top500.org/lists/top500/list/2020/06/)

[https://en.wikipedia.org/wiki/Fujitsu_A64FX](https://en.wikipedia.org/wiki/Fujitsu_A64FX)

~~~
eqvinox
That's an ARM(v8.2-A) chip, not SPARC.

Also, SPARC is spelled SPARC.

~~~
DonHopkins
And SPARC spelled backwards is CRAPS.

------
tgflynn
So who is going to actually fabricate these chips, Samsung?

IBM transferred its own chip fab business to Global Foundries several years
ago and it was my understanding that they were tied to them for the following
10 years. But Global Foundries announced they were abandoning EUV so I don't
think they're going to be producing 7nm chips.

~~~
bgorman
From the link

> Samsung Electronics will manufacture the IBM POWER10 processor, combining
> Samsung's industry-leading semiconductor manufacturing technology with IBM's
> CPU designs.

~~~
tgflynn
Thanks, I missed that, didn't read the PR all the way to the end.

I wonder how they got out of their deal with Global Foundries.

~~~
monocasa
GloFo probably triggered all sorts of clauses when they announced they were
stopping R&D on newer nodes, if I had to guess.

~~~
tgflynn
Makes sense. That announcement certainly surprised me, especially so soon
after they took over IBM's chip fab business.

------
ckastner
It's interesting to see that they are using Samsung's 7nm process. I thought
that, apart from the work they do for Apple, Samsung kept their high-end
fabbing mostly to themselves.

~~~
tooltalk
Samsung, Glofo and IBM were members of the Common Platform. My ex-roommate
used to work at IBM Upstate where they trained Samsung engineers.

Apple moved to TSMC in Taiwan not too long after Tim Cook appeared on CBS
claiming that the engines of their mobile devices were made in the US, almost
6-7 years ago. Apple's share of Samsung's production probably isn't much these
days. But they are still #2 behind TSMC, and Samsung also announced recently
that they are investing $100B over the next 10 years in their logic business,
which includes their foundry.

~~~
ytch
Apple and Samsung have been partners for a long time. Some iPods and the
iPhone, iPhone 3G and 3GS use Samsung ARM processors. The Apple A4 to A7 were
made by Samsung too.

But A8 is made by TSMC, A9 has two versions, APL0898 by Samsung, APL1022 by
TSMC. There were some debates on which one is better.

After that, all Ax processors are made by TSMC.

------
gautamcgoel
It's bittersweet reading these IBM announcements. They clearly have amazing
hardware, but I'll probably never get to play with it, since they make no
effort to sell to consumers (unlike Intel, AMD, Nvidia, etc).

------
moonchild
> transparent memory encryption designed to support end-to-end security

Does this work with process isolation? I.e., can I make it so that each
process's memory is encrypted with a different key, to prevent snooping by
other processes? How (if at all) does that work with debuggers?

~~~
jiggawatts
I'm not sure about POWER, but in AMD EPYC it is implemented at the hypervisor
level. So each VM can have encrypted memory with a unique key, but within a VM
the processes see unencrypted memory.

It's typically implemented as an extension of the virtual memory page table,
and conceptually it wouldn't be too difficult to have finer-grained keys, such
as one for the kernel and one for user mode processes, or even one per
process.

~~~
moonchild
Interesting. Does that allay the concerns about speculative execution side
channel leaks in cloud VMs? (Because even if you can leak data from other VMs
running on the same physical device, that data will be garbage without the
other VM's encryption key.)

------
ognyankulev
Is IBM POWER the modern day Cray?

~~~
unixhero
A little bit recursive perhaps, but IBM is modern day IBM.

------
ponker
Can anyone give a description of what these chips are used for and by whom?
And who writes software for these architectures? Seems like a totally
different side of the industry that I know nothing about.

------
fancyfredbot
Is there a more informative writeup somewhere? I couldn't find any data on
performance outside AI inference workloads. There is a footnote about 30 cores
but very little detail even on that.

~~~
mbrobbel
Here are some slides:
[https://regmedia.co.uk/2020/08/17/ibm_power10_summary.pdf](https://regmedia.co.uk/2020/08/17/ibm_power10_summary.pdf)

------
staticelf
Looks cool, but seems to be pretty unavailable unless you're a large business.
Also, very weird that they write about themselves in third person.

~~~
pavlov
The third person is standard format for press releases. It’s so that
journalists can copy bits verbatim without having to rewrite who did what.

------
nullifidian
Considering the performance of Power9, it's likely to be slower than the
modern x86 cpus, with some specialized exceptions.

------
fortran77
Any chance Apple will return to POWER CPUs and be a real "supercomputer"
again?

[https://jeffhendricks.net/wp-content/uploads/2019/04/Powerma...](https://jeffhendricks.net/wp-content/uploads/2019/04/PowermacG4.jpg)

~~~
EricE
Ha! I wouldn't hold my breath - indeed they are ramping up to use their own
system on chips (SoC) - not just CPUs - in future Macs.

If they deliver a significant increase in performance - even if it's only for
a few specific use cases - the ripples will be interesting to watch play out
for decades to come.

------
bogomipz
> "With hardware co-optimized for Red Hat OpenShift, IBM POWER10-based servers
> will deliver the future of the hybrid cloud when they become available in the
> second half of 2021."

Can someone say what co-optimized means here? Is this just bad marketing speak?
What is it intended to mean, if so?

------
sys_64738
Is anybody buying non-ARM/non-x86 systems nowadays? Seems like a dying market.

~~~
Kaytaro
Yes, the financial sector loves them.

------
tikej
I had really high hopes for a CPU with native float128 support (quad precision)
with POWER9, but after tests it turned out it is only native for addition and
multiplication ops. We’ll see what the new generation brings to the table.

~~~
SSLy
Addition and multiplication ops are the stark majority.

------
tdhz77
In this moment I do feel like Apple has tackled its goal of making computing
more accessible. I just didn’t realize ibm wouldn’t change their playbook.

------
Something1234
[https://news.ycombinator.com/item?id=24184786](https://news.ycombinator.com/item?id=24184786)

------
mixmastamyk
So, will you be able to install 'Blue Hat' on it? Or do they have another
niche OS for it instead?

~~~
wmf
Yes, RHEL (and Ubuntu) are supported.

~~~
pr91
\+ SLES also

------
SwimSwimHungry
Now if I can get one of these puppies in a future Raptor Computing build...
That would be the dream.

------
jp0d
I used to work on an IBM AIX system as a data warehouse developer. That was on
Power architecture but unfortunately I didn't know much about this
architecture back then.

------
TheMerovingian
Anyone know if it will be little or big endian?

------
person_of_color
Nice. Where are the main HPC jobs located?

~~~
birdyrooster
National Labs in US

------
jbverschoor
Look at how huuuuuuge those dies are!

------
hestefisk
Imagine a Beowulf cluster of these.

------
Bud
Wait. IBM still makes CPUs? IBM still makes _anything_?

~~~
_-david-_
They design them but do not actually make CPUs. They are fabless.

