
Ampere Altra 80-core ARM CPU - cameron_b
https://www.servethehome.com/ampere-altra-80-arm-cores-for-cloud/
======
jnwatson
"What we will note is that Ampere de-rated both the AMD EPYC 7742 and Xeon
Platinum 8280 results by 16.5% and 24% respectively. This was done to adjust
for using GCC versus AOCC2.0 and ICC 19.0.1.144. Ampere disclosed this, and it
is a big impact. Arm servers tend to use GCC as the compiler while there are
more optimized compilers out there for AMD and Intel."

If I read this right, they reduced their competitors' benchmark results because
those competitors have better compilers? Can anyone justify this?
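
To make the adjustment concrete, here is a minimal sketch of what a de-rating does to a score. The 16.5% and 24% factors come from the quoted passage; the raw score of 100.0 is only a placeholder, not a published result, and `derate` is a hypothetical helper name.

```python
def derate(score: float, percent: float) -> float:
    """Reduce a benchmark score by the given percentage."""
    return score * (1.0 - percent / 100.0)


# De-rating percentages taken from the quoted article.
DERATE_PERCENT = {"EPYC 7742": 16.5, "Xeon Platinum 8280": 24.0}

# Placeholder raw score (NOT a real benchmark result).
RAW_SCORE = 100.0

for cpu, pct in DERATE_PERCENT.items():
    print(f"{cpu}: {RAW_SCORE:.1f} -> {derate(RAW_SCORE, pct):.1f}")
```

With a normalized score of 100, the two x86 parts drop to roughly 83.5 and 76, so the adjustment alone can decide which chip "wins" a close comparison.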

~~~
p1necone
Very little software is actually compiled with AOCC and ICC. Really Intel and
AMD are being dishonest by publishing benchmarks that don't match reality. Of
course it's different if you're compiling everything yourself, then those
benchmarks might be relevant.

~~~
adev_
> It's different if you're compiling everything yourself, then those
> benchmarks might be relevant.

And even if you do, it's irrelevant. Most common large/important frameworks
won't compile with proprietary compilers.

Running benchmarks with ICC, XLC or others is hypocritical and often does not
reflect anything useful.

Only the HPC world can afford to recompile everything with proprietary
compilers and justify the manpower to do so. And even so, they already moved
most compute-intensive kernels to GPGPUs with CUDA a long time ago.

~~~
pierrebai
No, the correct thing to do is to publish multiple columns showing performance
under the different compilers. Large companies do use the specific compiler
that will give them better performance. I know of many big software packages
compiled with ICC. Then, there is the lack of talk about MSVC. That's one
standard compiler used extensively.

In benchmarking you have two choices: publish the real numbers or not. Which
option you choose marks you as honest or not.

You can argue about why the numbers for your product are lower in the
discussion section of your report. Not in an asterisk.

------
wmf
STH has the slides and some analysis:
[https://www.servethehome.com/ampere-altra-80-arm-cores-for-c...](https://www.servethehome.com/ampere-altra-80-arm-cores-for-cloud/)

~~~
dang
Ok, we've changed to that from
[https://amperecomputing.com/ampere-altra-industrys-first-80-...](https://amperecomputing.com/ampere-altra-industrys-first-80-core-server-processor-unveiled/).
Thanks!

~~~
Patrick-STH
Hi dang. I know you did not have to do that but I just wanted to say from the
STH team we appreciate it. Have a great day.

------
cameron_b
192 PCIe Gen4 lanes in 2P platforms - looks like they're optimizing for next-
gen storage bandwidth or potentially GPU / TPU integrations. This could be
interesting from a company that's been busy working on their B-to-B sales,
hopefully solving for problems that cloud platform providers actually have.
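
For a rough sense of scale, the theoretical bandwidth of those lanes can be worked out from the PCIe Gen4 signalling rate (16 GT/s per lane with 128b/130b encoding). This is only a back-of-the-envelope sketch: it ignores protocol overhead, and the helper name is made up for illustration.

```python
# PCIe Gen4 signals at 16 GT/s per lane and uses 128b/130b line encoding.
GT_PER_S = 16e9
ENCODING_EFFICIENCY = 128 / 130

LANES_2P = 192  # lane count quoted for the 2P platform


def lane_bandwidth_gbytes(gt_per_s: float = GT_PER_S) -> float:
    """Theoretical per-lane, per-direction bandwidth in GB/s."""
    return gt_per_s * ENCODING_EFFICIENCY / 8 / 1e9


per_lane = lane_bandwidth_gbytes()
total = per_lane * LANES_2P
print(f"per lane: {per_lane:.2f} GB/s, {LANES_2P} lanes: {total:.0f} GB/s per direction")
```

That works out to a bit under 2 GB/s per lane and on the order of 378 GB/s per direction across all 192 lanes, before any real-world overhead.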

------
rwmj
For context this is an evolution of the Applied Micro X-gene (I believe this
is the 3rd generation). The 1st gen was the famous Mustang, one of the first
Aarch64 chips generally available that ran Linux. I still have one in my loft
somewhere.

Edit: I should note that if you used the X-gene 1 it was very slow, albeit a
reliable workhorse for early 64-bit ARM Linux development. These newer chips
have far better performance.

~~~
ksec
Thank you. There were so many ARM server startups, mergers and acquisitions
that I sort of lost count. I think we are left with Ampere and one from Marvell.

I sort of recall Applied Micro was doing POWER as well, is that still the
case with Ampere?

~~~
floatboth
Yes, seems like Ampere is the big public server player now.

Marvell bought Cavium which has the ThunderX line. (ThunderX2 being a rather
HPC-oriented chip, I think there's a supercomputer already built with it.)
Marvell also makes networking-gear-oriented smaller chips (e.g. Armada 8k),
one of which is in my little ARM Desktop (MACCHIATObin) :)

NXP (Layerscape) and Mellanox (BlueField) also make network-oriented chips
that have around 24 Cortex-A72 cores. NXP's is in SolidRun's newer workstation
product.

Meanwhile Amazon bought Annapurna Labs and they make the Graviton (2) for the
AWS cloud. This isn't something you can touch physically but it's going to
have the biggest impact of all things. This is the real confirmation that Arm
servers are legit and the x86/amd64 monopoly is over.

There's also Huawei HiSilicon's Taishan/Kunpeng stuff, which you apparently
can buy if you're a serious business, but now it's available in the public
Huawei Cloud, but only for the Chinese region it seems??

Oh and Fujitsu is making some epic chip with HBM2 memory and the new Scalable
Vector Extensions. But that's only available if you're making supercomputers.

And Nuvia is going to be a thing eventually.. they have not announced anything
yet, we have no idea which ISA they are even going to use (could be RISC-V or
POWER or SPARC for all we know) but a prominent UEFI/ACPI-on-Arm person is now
their VP of Software and is still referring to the Arm ecosystem as "we"
[https://twitter.com/jonmasters/status/1234734345350369281](https://twitter.com/jonmasters/status/1234734345350369281)
:)

And yeah.. press F to pay respects for Qualcomm Centriq and AMD Seattle.

~~~
ksec
>And Nuvia is going to be a thing eventually..

According to TechCrunch, they have confirmed it will be built on top of ARM.

Edit: That is assuming they sort out their lawsuit with Apple.

[1]
[https://techcrunch.com/2019/11/15/three-of-apple-and-googles...](https://techcrunch.com/2019/11/15/three-of-apple-and-googles-former-star-chip-designers-launch-nuvia-with-53m-in-series-a-funding/)

------
eoerl
the key issue when compared to Epyc is that this is mono-die, and not much
faster (even with metrics straight from Ampere). Mono-die means that the die
is huge, the yield is low, it's probably pretty expensive to produce (and the
reason why they went for 32MB cache, well below Arm's recommendations, core
count is a bigger seller than cache it seems). Unless they get massively
better performance (they don't), this has no chance vs a multi-die solution
which has a much better yield. Intel is cornered in a similar situation right
now. The same applies to Graviton, this stands absolutely no chance in the
long run.

Not saying that the future has to be multi-die, but if it is not, then it has
to be way faster than the cheaper-to-manufacture competition.

~~~
ksec
>the key issue when compared to Epyc is that this is mono-die,

This die cost metric is way overblown and the narrative is too narrowly
focused. That is especially true for ARM servers, where the unit cost dynamics
of ARM IP, along with the much higher margins on server CPUs, reduce the
multi-die BOM benefits. And the same definitely does not apply to Graviton,
where Amazon owns the whole stack.

~~~
shdh
I disagree.

More chips per wafer results in better yield, with fewer chips carrying potential flaws.

> Dr. Hansch’s research at Universitat Munchen for example, shows how as die
> size increases manufacturers realize an accelerating yield loss and thus
> accelerating manufacturing cost. Using their model, assuming best-case
> defect densities, AMD’s small chiplet approach achieves 90% yields vs.
> Intel’s 30%-40% yields from its large, monolithic die approach. [1]

[1]
[https://www.barrons.com/articles/amd-stock-can-gain-87-fund-...](https://www.barrons.com/articles/amd-stock-can-gain-87-fund-manager-51566400829)
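
The shape of that argument can be sketched with the simple Poisson yield model, Y = exp(-D0 * A). This is a textbook approximation and not necessarily the model used in the quoted research; the defect density and die areas below are illustrative assumptions, not vendor figures.

```python
import math


def poisson_yield(area_mm2: float, defects_per_cm2: float) -> float:
    """Fraction of defect-free dies under the Poisson model Y = exp(-D0 * A)."""
    area_cm2 = area_mm2 / 100.0
    return math.exp(-defects_per_cm2 * area_cm2)


D0 = 0.1  # assumed defect density in defects per cm^2 (illustrative only)

# Die areas are illustrative assumptions, not measured values.
for label, area in [("small chiplet", 75.0), ("large monolithic die", 700.0)]:
    print(f"{label} ({area:.0f} mm^2): {poisson_yield(area, D0):.0%} yield")
```

Because yield falls exponentially with area in this model, the gap between a small chiplet and a large monolithic die widens quickly as the defect density rises.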

~~~
ksec
Since you actually did some research and posted a link I will go along with
those numbers.

>We estimate Intel’s total server die cost at $162 per good server chip while
AMD costs about $108 per good server package.

EPYC is 8 dies + an IOD. Assuming the massive IOD of ~420mm2 only costs $10
(that suggests GF is selling 14nm / 12nm wafers at ~$1.2K, which even under the
AMD WSA with GF would likely not be feasible), that number suggests the cost
per compute die would be ($108 - $10)/8 ≈ $12, which is again off by quite a
bit. Compare that to the equivalent die size cost for Intel, which is not even
accurate because Intel does not have to pay TSMC's 65%+ gross profit margin
and it is on an older, mature node. And its yields are too pessimistic: we do
not have any numbers from Intel on defects per mm2, but taking a guess from
Nvidia's massive ~800mm2 die isn't too far off.

So again, let's assume both of those numbers are "relatively" correct. Do you
think $54 would matter for an Intel® Xeon® Platinum 8280 Processor with a
Recommended Customer Price of $10K? Or the same die they are selling at the
lower end for $3K?

While it would definitely be good to have that $54 as profit, the reality is
that in the server market its advantage is relatively minimal.
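
Redoing that arithmetic from the quoted $162 and $108 figures, as a quick sketch. The $10 IOD cost is the assumption made above, not a published number, and the two list prices are the round figures mentioned in the comment.

```python
# Figures quoted from the linked article (USD).
INTEL_COST_PER_GOOD_CHIP = 162.0
AMD_COST_PER_GOOD_PACKAGE = 108.0

# Assumptions from the comment above, not published numbers.
ASSUMED_IOD_COST = 10.0
COMPUTE_DIES = 8

cost_per_compute_die = (AMD_COST_PER_GOOD_PACKAGE - ASSUMED_IOD_COST) / COMPUTE_DIES
cost_gap = INTEL_COST_PER_GOOD_CHIP - AMD_COST_PER_GOOD_PACKAGE


def share_of_price(cost: float, list_price: float) -> float:
    """Cost expressed as a fraction of the list price."""
    return cost / list_price


print(f"implied cost per compute die: ${cost_per_compute_die:.2f}")
for price in (10_000.0, 3_000.0):
    print(f"cost gap ${cost_gap:.0f} is {share_of_price(cost_gap, price):.2%} of a ${price:,.0f} part")
```

Even at the lower list price, the die-cost gap stays below 2% of what the part sells for, which is the point being made about server margins.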

The bulk of the benefit of the chiplet approach is that you can reuse one or
two designs and deploy them across the whole range of the market, from server
to desktop. As design costs increase with pure-play foundries such as TSMC,
this is extremely important for a fabless design company like AMD; it costs
them hundreds of millions per design variation. Intel would have that problem
down the road, but it is mostly mitigated for now because all of their design
and fabs are in-house, and it would not cost them as much.

And since the original post was specific to ARM and servers, which have
different cost dynamics, you have less R&D compared to the x86 market. In ARM
you are paying an IP price for the N1 core and some interconnect, and those
costs are spread among all ARM players. Hence the conclusion that chiplets are
everything isn't as clear-cut, which is why I said it is too narrowly focused.

------
imtringued
How long until they shut it down? ARM vendors are infamous for quitting before
releasing their products.

~~~
reitzensteinm
The category is here to stay, regardless of which company is the first to make
it stick. Probably better to give them the benefit of the doubt.

------
tmikaeld
I'm curious how this will perform vs AMD's Epyc line in terms of performance
per Watt on different workloads.

~~~
drewg123
It depends on the workload. We tried the Ampere eMAG, and what killed it for
us was that TLS performance was nowhere near modern x86-64 CPUs (Intel or AMD).

~~~
Rebelgecko
Was that heavily cipher dependent? I wouldn't be surprised if Chacha20
performed much better than AES w/o any hardware acceleration (other than SIMD
instructions)

~~~
drewg123
Ah, that's interesting. We don't use chacha20. This was AES-GCM

~~~
Rebelgecko
The situation is probably better now that ARMv8 has some crypto-specific
instructions, but AES-GCM on older ARMs performed awfully without the
instructions specifically for doing AES and Galois field multiplication
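
One way to check whether a given Linux box advertises those instructions is to look at the feature flags in `/proc/cpuinfo`: on AArch64 the relevant `Features` entries are `aes` and `pmull`, and on x86-64 the `flags` entries are `aes` and `pclmulqdq`. A minimal sketch, with `crypto_features` being a hypothetical helper name:

```python
def crypto_features(cpuinfo_text: str) -> dict:
    """Report AES and carry-less-multiply support from /proc/cpuinfo text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        # AArch64 kernels label the line "Features", x86 kernels "flags".
        if key.strip().lower() in ("features", "flags"):
            flags.update(value.split())
    return {
        "aes": "aes" in flags,
        # GHASH in AES-GCM relies on a carry-less (polynomial) multiply.
        "carryless_multiply": bool(flags & {"pmull", "pclmulqdq"}),
    }


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print(crypto_features(f.read()))
    except FileNotFoundError:
        print("/proc/cpuinfo is not available on this platform")
```

If either entry is missing, a TLS library has to fall back to software AES or GHASH, which is where a cipher like ChaCha20 would be expected to do comparatively better.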

------
purplezooey
It would be neater if it were the Core Ultra 80-amp CPU

------
eecc
What’s the point of this wall of text without some hardware-porn?

;)

------
m0zg
Looks like a company by a bunch of ex-Intel people. How are they doing on
Spectre/Meltdown and other bugs caused by the culture of cutting corners?

~~~
wmf
The N1 core was designed by Arm Austin. It includes "The traps for EL1 and EL0
cache controls, PSTATE SSBS (Speculative Store Bypass Safe) bit that supports
software mitigation for Spectre Variant 4, and the speculation barriers (CSDB,
SSBB, PSSBB) instructions..."
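
On a running Linux system, the kernel reports what it thinks of each of these issues under `/sys/devices/system/cpu/vulnerabilities/`. A small sketch for dumping that directory; `read_mitigations` is a hypothetical helper name, and the set of files present depends on the kernel version and architecture.

```python
from pathlib import Path

VULN_DIR = Path("/sys/devices/system/cpu/vulnerabilities")


def read_mitigations(directory: Path = VULN_DIR) -> dict:
    """Map each vulnerability file name to the kernel's one-line status."""
    if not directory.is_dir():
        return {}  # not Linux, or a kernel too old to expose this directory
    return {
        entry.name: entry.read_text().strip()
        for entry in sorted(directory.iterdir())
        if entry.is_file()
    }


for name, status in read_mitigations().items():
    print(f"{name}: {status}")
```

The `spec_store_bypass` entry is the one that corresponds to Spectre Variant 4, the case the SSBS bit is meant to help with.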

------
mmoez
Growing sick of the trend of naming companies after famous scientists...

~~~
monocasa
You know that's a unit of electricity as well, right?

~~~
new_realist
A unit of electricity named after a famous scientist.

~~~
monocasa
And?

~~~
znpy
and he/she is growing sick of that, apparently

~~~
DannyB2
A "he" apparently...
[https://en.wikipedia.org/wiki/Andr%C3%A9-Marie_Amp%C3%A8re](https://en.wikipedia.org/wiki/Andr%C3%A9-Marie_Amp%C3%A8re)

He would be spinning in his grave. Which would generate an AC current.

~~~
posterboy
He would be well grounded though, having no effective voltage. Bad joke,
although it has potential.

------
walrus01
I've said this before on HN, but no ARM platform is going to catch on for
server use, chicken or egg problem, until people can buy a reasonably priced
ATX motherboard and CPU by themselves and build a PC with it. The motherboard
needs to have the same complement of I/O and bus ports that you can find on a
$90 to $120 category x86-64 motherboard.

There is nothing _anywhere near_ the performance of what I can get right now
by buying a $110 motherboard from one of the top six Taiwanese motherboard
manufacturers, and a $150 Ryzen 3000 series to socket into it.

Linux and *BSD developers are not going to be shelling out $6000 for a 2U
rackmount noisy system that's impossible to operate nicely in their home
offices.

~~~
wmf
ARM servers already happened; they're in production at AWS and probably MS.
Most of the software development is already done; since developers wouldn't
buy $6000 machines they just got access to them for free.

