
Ampere’s Product List: 80 Cores, up to 3.3 GHz at 250 W; 128 Core in Q4 - rbanffy
https://www.anandtech.com/show/15871/amperes-product-list-80-cores-up-to-33-ghz-at-250-w-128-core-in-q4
======
vxNsr
Sooo... is Intel, like, crying in a corner right now? On one side AMD is
eating their lunch in the consumer space while Intel still hasn’t launched a
full gamut of 10nm CPUs. Apple just announced that they’re dropping them
within basically the next 5 years. And now ARM really is encroaching on their
core server business.

I feel like 20 years from now we’re gonna be using Intel as a cautionary
tale of hubris and mismanagement. Or whatever it is that caused them to fail
so spectacularly.

~~~
raxxorrax
Honestly I think the proclaimed death of Intel is vastly exaggerated. AMD came
back from worse places, and Intel does still have the manufacturing edge. Intel
desktop CPUs still use less power, which is a big plus. How many people do
you know who bought the fastest CPU available recently? Glad AMD is back on
track; they were in a rough place, far worse than Intel's current situation.

~~~
blattimwind
For what it's worth, Intel is still faster in most applications, simply by
virtue of having a clock speed advantage that by far exceeds any IPC
difference, and also by having _much_ lower memory latencies. AMD has
basically a 20-30 ns extra latency over Intel; so with good memory you can do
~45 ns on current Intels, but that will give you ~65 ns on a Ryzen. That's
significant for a lot of code (e.g. pointer chasing, complex logic etc.).
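
To make the pointer-chasing point concrete, here's a minimal sketch of a
latency-bound loop (the node layout and sizes are my own assumptions, not from
any published benchmark). Each load's address depends on the previous load, so
out-of-order hardware can't overlap the misses, and every hop pays close to the
full memory latency; this is exactly where 45 ns vs 65 ns per access shows up:

    #include <algorithm>
    #include <chrono>
    #include <cstdio>
    #include <numeric>
    #include <random>
    #include <vector>

    struct Node { Node* next; char pad[56]; };  // 64 bytes: one cache line per node

    int main() {
        constexpr size_t kNodes = 1 << 24;      // ~1 GiB working set, far beyond LLC
        std::vector<Node> nodes(kNodes);

        // Link all nodes into one random cycle so the hops defeat the prefetcher.
        std::vector<size_t> order(kNodes);
        std::iota(order.begin(), order.end(), 0);
        std::shuffle(order.begin(), order.end(), std::mt19937_64{42});
        for (size_t i = 0; i + 1 < kNodes; ++i)
            nodes[order[i]].next = &nodes[order[i + 1]];
        nodes[order[kNodes - 1]].next = &nodes[order[0]];

        // Chase the chain; each iteration is serialized on the previous load.
        auto t0 = std::chrono::steady_clock::now();
        Node* p = &nodes[order[0]];
        for (size_t i = 0; i < kNodes; ++i) p = p->next;
        auto t1 = std::chrono::steady_clock::now();

        double ns = std::chrono::duration<double, std::nano>(t1 - t0).count();
        std::printf("%.1f ns per hop (%p)\n", ns / kNodes, (void*)p);  // print p so the loop isn't optimized away
    }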

On the other hand, few applications scale _efficiently_ to more than just four
cores. Yes, of course, AMD delivers more Cinebenchpoints-per-Dollar and
usually more Cinebenchpoints overall, but that's not necessarily an
interesting metric.

Personally I find that when I'm waiting on something to complete, the
application in question tends to use only a tiny number of cores for the task
at hand. Usually one.

Another significant weakness of AMD's current platform is idle power
consumption.

These factors leave me with a much more nuanced impression than "Intel is ded"
or "HOW IS INTEL GOING TO CATCH UP TO THIS????"; CPU reviews these days are
just pure clickbait.

~~~
jchw
The problem is that a lot of the tasks people want their CPU to be fast at are
exactly the ones that parallelize almost embarrassingly well: compiling code,
video rendering, compressing files. People buying CPUs for this are not as
concerned about how many cycles it takes to jump through a vtable, as long as
it’s not slow.
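
As a rough illustration of why those workloads scale, here's a toy sketch (my
own example, not from any real build or encode pipeline): the input splits into
independent chunks, each thread works on its own chunk with no sharing, so core
count translates almost directly into throughput:

    #include <algorithm>
    #include <cstdint>
    #include <cstdio>
    #include <thread>
    #include <vector>

    int main() {
        // Stand-in for real input (source files, video frames, archive blocks).
        std::vector<std::uint8_t> input(256 * 1024 * 1024, 7);
        unsigned n = std::max(1u, std::thread::hardware_concurrency());
        std::vector<std::uint64_t> partial(n, 0);
        std::vector<std::thread> pool;

        size_t chunk = input.size() / n;
        for (unsigned t = 0; t < n; ++t)
            pool.emplace_back([&, t] {
                size_t lo = t * chunk;
                size_t hi = (t + 1 == n) ? input.size() : lo + chunk;
                std::uint64_t local = 0;          // accumulate locally to avoid
                for (size_t i = lo; i < hi; ++i)  // false sharing on 'partial'
                    local += input[i];            // per-chunk work, zero coordination
                partial[t] = local;
            });
        for (auto& th : pool) th.join();

        std::uint64_t total = 0;
        for (auto s : partial) total += s;
        std::printf("checksum %llu across %u threads\n", (unsigned long long)total, n);
    }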

Meanwhile, pointing at memory latency as the flaw in Ryzen has been a popular
misdirection for a while now. People warned me about it being a performance
pitfall since before I bought my first Ryzen processor. In practice it doesn’t
show up as a serious issue even in the most complexity-intensive workloads.
For example, Zen 2 performs very well on hardware emulation. This is possibly
because where it takes a hit in memory latency it makes up in caching and
prefetching, but honestly I don’t know, and I am not sure how to measure it. In
any case it’s certainly favorably comparable to Intel’s best chips in
single-core workloads, even if not on top. Factor in price and multicore
workloads and you now have the exact reasons why people like me have been
singing the praises... Intel’s single-core lead may exist in some form, but it
is not what it once was; it is not an unconditional lead where an Intel core
beats an AMD core. Not even close.

None of this means Intel’s dead of course, but IMO that’s mostly because they
have a lot more going on than just being the best CPU. They’ve got their
dedicated GPU coming out, and plenty of ancillary technology as well. It does
seem like for a company like Intel having to take a backseat in CPUs for a
while will be painful; unlike AMD, this is a new position for Intel and maybe
not one they will handle well.

~~~
user5994461
I want my video games, email reader, Word, YouTube, IDE and general Python
code to run faster. None of those parallelize much of anything.

~~~
dageshi
Your email reader, Word, YouTube and IDE aren't likely to push the limits of
any modern CPU, and your video games are increasingly optimized around multiple
cores, because modern consoles ship with multi-core CPUs and need all the
performance they can get out of them. The only thing that might benefit from
single-core performance is probably your general Python code.

~~~
user5994461
Gmail and the IDE take ten seconds to load, while YouTube hammers the CPU
playing a 4K video (or even 1080p on a battery-saving laptop).

YouTube is possibly the single largest root cause of users upgrading laptops
over the past 10 years. It made a silent transition to 60 FPS videos last
year, which cut hundreds of millions of users off from watching HD.

~~~
StillBored
Destroying CPU in some configurations....

[https://www.youtube.com/watch?v=ef1wAfrMg5I](https://www.youtube.com/watch?v=ef1wAfrMg5I)
is ~10% of one CPU on my desktop using Chrome.

OTOH, I know what you're talking about; my Linux machine hates YouTube, but
that's because even with the chromium-freeworld fork with some codec
acceleration, it's still burning CPU like crazy.

So a big part of this isn't a hardware problem so much as a software one,
combined with the constant fights over whose codec is the one true choice. AKA
it's a YouTube and !Windows/Android+Chrome problem.

------
DCKing
It's worth noting that this is based on ARM's Neoverse N1 IP, which is also
used in the AWS Graviton2. The Graviton2 benchmarks damn close to the best AMD
and Intel stuff, so this chip looks very promising [1]. It's really looking to
be a breakthrough year for ARM outside of the mobile market.

[1]: [https://www.anandtech.com/show/15578/cloud-clash-amazon-grav...](https://www.anandtech.com/show/15578/cloud-clash-amazon-graviton2-arm-against-intel-and-amd)

~~~
Refefer
Phoronix paints a very different picture, especially in non-synthetic
workloads [1]. Graviton2 looks like a nice speedup over the first generation,
but either the optimization isn't there yet or there are areas which need
additional work to become more developer/HPC competitive. That said, I'm
thrilled we have competition in the architecture space for general purpose
compute again.

[1] [https://www.phoronix.com/scan.php?page=article&item=epyc-vs-...](https://www.phoronix.com/scan.php?page=article&item=epyc-vs-graviton2&num=1)

~~~
DCKing
Interesting data. Curious whether there's a logical explanation for these
discrepancies in their setups.

~~~
karkisuni
Didn't go too deep into it, but the AMD CPUs being compared are different.
AnandTech has an AWS-only EPYC 7571 (2 sockets, 32 cores each, 2.5 GHz);
Phoronix has an EPYC 7742 (1 socket, 64 cores, 2.2 GHz). On top of that,
AnandTech is testing on another AWS EC2 instance while Phoronix is testing on
a local machine on bare metal.

Still would be interesting to know what differences caused the gap in results,
but their setups were pretty different.

~~~
jeffbee
That doesn't seem like it could explain a 20x difference in PostgreSQL
performance.

------
jeffbee
Does anyone have an evaluation board for these things? Their marketing
materials scream "scam" to me. For one thing they compare to competing x86
parts by arbitrarily downrating them to 85% of their actual SPECrate scores.
Why? Then they switch baseline x86 chips when making claims about power
efficiency: for performance claims they use the AMD EPYC 7742, then for
performance/TDP they use the 7702, which tends to make AMD look worse because
the 7702 spends the same amount of power driving its uncore but is 11% slower
than the 7742.

Also, without pricing, all these efficiency claims are totally meaningless.

~~~
IanCutress
We're working with Ampere to get access when they're ready to let us test.

~~~
fomine3
I hope AnandTech gets many-core EPYC Rome chips like the 7702 and 7502 for review.

------
jzwinck
This reminds me of Tilera, who had a 64-core mesh-connected CPU about ten
years ago. The problems seemed to be that it was harder to optimize for due to
the mesh connectivity (like NUMA but multidimensional), low clock speeds, and
a lack of improvement after an initially promising launch.

Will this be the same? It seems possible. Does it really get more work done
per watt than x86?

And why does the article say "These Altra CPUs have no turbo mechanism" right
below a graphic saying "3.0 Ghz Turbo"?

~~~
rbanffy
You need a lot of memory bandwidth and large caches, or else the cores will
starve. That's also why IBM mainframes have up to 4.5 GB of L4 cache.

~~~
zozbot234
That's true of all high-frequency/high-core-count hardware. Which is why
running Java or Python code on this hardware makes very little sense. Rust is
more like it. Golang in a pinch.

~~~
imtringued
It's the opposite. Running lots of poorly optimized processes allows you to
amortize memory latency. If your software suffers from cache misses, then it's
not going to run out of memory bandwidth any time soon, and adding more
threads will increase memory bandwidth utilization. Meanwhile, hyper-optimized
AVX-512 code is going to max out memory bandwidth with a dozen cores or fewer.

~~~
rbanffy
> it's not going to run out of memory bandwidth any time soon

No, but the higher the memory bandwidth, the sooner those processes can get
back to their inefficiency.

~~~
CyberDildonics
That's really not true. Memory bandwidth, just like memory capacity becomes a
bottleneck when it is exceeded, but more doesn't automatically speed anything
up. Java and python programs will likely be hopping around in memory and
waiting on memory to make it to the CPU as a result.

Typically only multiple cores running optimized software that will run through
memory making heavy use of the prefetcher will exceed memory bandwidth.
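
To put rough shape on that, here's a minimal sketch contrasting the two access
patterns (sizes and structure are my own assumptions): the sequential pass
streams through memory and lets the prefetcher keep the bus busy, so it can
approach the bandwidth limit, while the shuffled pass is bound by per-miss
latency and barely touches the available bandwidth:

    #include <algorithm>
    #include <chrono>
    #include <cstdint>
    #include <cstdio>
    #include <numeric>
    #include <random>
    #include <vector>

    int main() {
        constexpr size_t kN = 64 * 1024 * 1024;   // 512 MiB of int64_t
        std::vector<std::int64_t> data(kN, 1);
        std::vector<std::uint32_t> idx(kN);
        std::iota(idx.begin(), idx.end(), 0u);
        std::shuffle(idx.begin(), idx.end(), std::mt19937{42});

        auto bench = [&](const char* name, auto body) {
            auto t0 = std::chrono::steady_clock::now();
            std::int64_t sum = body();
            double s = std::chrono::duration<double>(std::chrono::steady_clock::now() - t0).count();
            std::printf("%s: %.2f GB/s (sum=%lld)\n", name,
                        kN * sizeof(std::int64_t) / s / 1e9, (long long)sum);
        };

        // Streaming: the prefetcher runs ahead of the loop, so this is bandwidth-bound.
        bench("sequential", [&] {
            std::int64_t s = 0;
            for (size_t i = 0; i < kN; ++i) s += data[i];
            return s;
        });

        // Shuffled: every access is an independent cache miss, so this is
        // latency-bound and the measured "bandwidth" collapses even though
        // the memory bus is mostly idle.
        bench("random", [&] {
            std::int64_t s = 0;
            for (size_t i = 0; i < kN; ++i) s += data[idx[i]];
            return s;
        });
    }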

------
samcat116
Products like this show that Apple could have an ARM-based Mac Pro in two
years relatively easily. They already have PCIe Gen 4, and the TDP and memory
capacity are already more than Intel provides in the Xeon workstation line
that Apple uses.

~~~
ed25519FUUU
I think it’s a good time to invest in a Mac Pro. While working from home I’m
asking myself what the benefit of a laptop is when a desktop could give me so
much more performance.

~~~
coder543
The other person who replied to you probably paid half or a third of what you
would pay for an equivalent Mac Pro.

The profit margins on the Mac Pro are just incredible. (Yes, I'm sure that
equivalent professional workstation brands _also_ have huge profit margins...
no, that doesn't make me want to pay those lofty prices more.)

The _only_ real value the Mac Pro provides is that it's the most powerful
computer you're allowed to run macOS on legitimately. If you can do your work
from Windows (with WSL) or Linux, you can save upwards of tens of thousands of
dollars by building your own workstation, and that workstation can be
significantly more powerful than _any_ current Mac Pro at the same time.

For video professionals who rely on FCPX or similar macOS-only software, they
don't really have a choice, and they get the opportunity to essentially pay
$10k to $20k just for a license of macOS, which is fun.

~~~
systemvoltage
I have a Hackintosh (i7-8700K) and it feels about 2x faster than the top-spec
$4000 MacBook Pro, latest 2019 model (subjective opinion of course). It is
such a huge difference, especially when using PyCharm and Adobe apps.

It is pricey, but if it is something you want to buy for 5 years, it works out
to about $100/month. Some people might want to buy it.

------
emmanueloga_
Is anybody else confused by the "Ampere" brand name? I was trying to figure
out what Ampere is...

* There's one "Ampere Computing" [1], but I guess I'm not "in the know" since it is the first time I heard about it :-/

* There's one Ampere [2], "codename for a graphics processing unit (GPU) microarchitecture developed by Nvidia".

Are both things related? Is "Nvidia's Ampere" developed by "Ampere" the
company?

Also, I think Ampere is kind of a bad name for a processor line... it just
makes me think of high current, power hunger, low efficiency, etc. :-)

1: [https://en.wikipedia.org/wiki/Ampere_Computing](https://en.wikipedia.org/wiki/Ampere_Computing)

2: [https://en.wikipedia.org/wiki/Ampere_(microarchitecture)](https://en.wikipedia.org/wiki/Ampere_(microarchitecture))

~~~
why_only_15
They are not related as far as I can tell other than being named "Ampere".

------
shadykiller
Most logical naming of processors I’ve ever seen. E.g.:

Q80-33 - 80 cores, 3.3 GHz

Q32-17 - 32 cores, 1.7 GHz
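
The scheme is literally machine-readable. A toy decode, assuming the pattern
really is Q&lt;cores&gt;-&lt;GHz×10&gt; (which is just my reading of the two examples):

    #include <cstdio>

    int main() {
        const char* skus[] = {"Q80-33", "Q32-17"};
        for (const char* sku : skus) {
            int cores = 0, tenths = 0;
            if (std::sscanf(sku, "Q%d-%d", &cores, &tenths) == 2)
                std::printf("%s -> %d cores @ %.1f GHz\n", sku, cores, tenths / 10.0);
        }
    }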

------
sradman
> Where Graviton2 is designed to suit Amazon’s needs for Arm-based instances,
> Ampere’s goal is essentially to supply a better-than-Graviton2 solution to
> the rest of the big cloud service providers (CSPs).

So the question is whether they can land Google, Microsoft, and/or Alibaba as
customers for an alternative to AWS M6g instances.

~~~
klelatti
Oracle is an investor ($40M), and TechCrunch reports that they have been
working with Microsoft, so it sounds like they are making progress on getting
into the major cloud providers.

------
cesaref
I'm interested to know what applications really scale to these core counts.
When I was working with large datasets (for finance), other bottlenecks tended
to dominate, not computation: memory pressure and throughput from the SAN were
more important.

These high-density configurations were key when rack space was at a premium,
but these days power is the limitation, so providing more low-power cores is
interesting. I'm just not sure who is going to get the most benefit from
them...

~~~
tyingq
Plain old IO-bound multiprocess work would be a good match, like static
content and PHP sites, for example. I imagine there's quite a lot of that out
there.

~~~
ed25519FUUU
I'd wager that the bulk of the web is CPU-bound.

------
rbanffy
As cool as it is, these server announcements are somewhat disheartening.

I want a workstation with one of these.

~~~
gpm
It has PCIe lanes. What, other than price, stops you from buying a rack
server, sticking a graphics card in it, and calling it a workstation?

~~~
rbanffy
Two reasons, mostly.

Aesthetics is a big one: rackmount servers are ugly and, unless there are
panels covering them, they make horrendous deskside workstations.

Another is noise. These boxes are designed for environments where sounding
like a vacuum cleaner is not an issue. Because of that, they sound like vacuum
cleaners, with tiny whiny fans running at high speeds instead of more sedate,
larger fans and bigger heat exchangers.

HP sold, for some time, the ZX6000 workstation that was mostly a rack server
with a nice-looking mount. If someone decided to sell that mount, it'd solve
reason #1, at least.

~~~
greggyb
Shove it in a full tower case. You can mount most server hardware easily in
such a case. At that point, you can cool with big slow fans.

------
spott
I'm kind of curious: what is the selling point of an ARM server? Why would I
use an ARM instance on AWS or similar instead of an x86 one?

Are they significantly cheaper per GHz*core? If so, how hard is it to make use
of that power? Will a simple recompile work?

~~~
lowmemcpu
Yes. Here's what AWS' page says

> deliver significant cost savings over other general-purpose instances for
> scale-out applications such as web servers, containerized microservices,
> data/log processing, and other workloads that can run on smaller cores and
> fit within the available memory footprint.

> provide up to 40% better price performance over comparable current
> generation x86-based instances1 for a wide variety of workloads,

From what I read, it's not terribly hard to tell your compiler to target a
particular instruction set; you just need to do it. Cost savings and better
performance are great incentives, and Apple moving their Mac platform to ARM
will drive more market share, giving developers a reason to take the time to
recompile.

Edit: Forgot to add the source of those quotes:
[https://aws.amazon.com/ec2/graviton/](https://aws.amazon.com/ec2/graviton/)

~~~
bluGill
It might or might not be hard to compile for a different CPU. Intel lets you
play fast and loose with multi-threaded code without as many race conditions
biting you. As a result, code that works fine on Intel often randomly gives
wrong results on ARM. Fixing this can be very hard.

Once it is fixed you are fine. Most of the big programs you might use are
already fixed, and some languages give you guarantees that make it just work.
~~~
FnuGk
What is different on Intel that lets you play fast and loose with
multi-threading? Two threads reading and writing the same memory area without
any locking would cause problems regardless of the ISA, or am I missing
something?

~~~
prattmic
ARM has a weakly-ordered memory model, while x86 is much more strongly-
ordered. See
[https://en.wikipedia.org/wiki/Memory_ordering#Runtime_memory...](https://en.wikipedia.org/wiki/Memory_ordering#Runtime_memory_ordering).

So e.g., on x86 if you store to A then store to B, then if another core sees
the store to B it is guaranteed to see the store to A as well. This guarantee
does not exist on ARM.
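
A minimal sketch of that pattern in C++ atomics (the flag/data names are made
up): with relaxed stores, a reader on ARM can legitimately observe the flag set
while the data write is still invisible, a reordering that x86's stronger model
happens to forbid. Release/acquire is the portable fix:

    #include <atomic>
    #include <cassert>
    #include <thread>

    std::atomic<int>  data{0};
    std::atomic<bool> flag{false};

    void writer() {
        data.store(42, std::memory_order_relaxed);
        // Release: every write above becomes visible before the flag does.
        // Were this store relaxed too, ARM could make the flag visible first;
        // x86's strong ordering would still publish the two stores in order.
        flag.store(true, std::memory_order_release);
    }

    void reader() {
        while (!flag.load(std::memory_order_acquire)) {}     // pairs with the release
        assert(data.load(std::memory_order_relaxed) == 42);  // guaranteed by the pairing
    }

    int main() {
        std::thread w(writer), r(reader);
        w.join(); r.join();
    }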

------
nullifidian
How come there isn't a trademark issue with NVidia? I was very confused for a
moment.

~~~
dbancajas
"Ampere" can't be trade marked since it's a name of a scientist? Unless they
are operating on the same market/segment and can prove there is willful intent
to defraud customers? probably a hard sell.

~~~
nullifidian
So is Tesla. And Ford is the name of an entrepreneur. Are these also not
trademark protected?

>they are operating on the same market/segment

They are. Called computation.

>willful intent to defraud customers

Is it a requirement? I doubt it.

btw, I only clicked the link because I thought of Nvidia's product, so they
are definitely getting eyeball traffic due to the name.

UPD: I recognize that I'm unlearned in trademark law, so I'm not insisting on
anything.

~~~
klelatti
Ampere was founded (and the name presumably registered) in 2017; Nvidia's
Ampere was announced in 2020?

Ampere had products on sale in 2019.

If there is a case I can't see Nvidia winning it.

~~~
nullifidian
Nvidia's roadmap for microarchitecture names goes way back. I can google up
NVidia Ampere mentions in 2017.

~~~
klelatti
I think it was rumours in 2017 with an actual announcement later, but in any
event I'm not sure using a name on a slide has the same weight as using it for
a real product being bought by customers.

How long had Ampere been planning to use the name before 2017, and does Nvidia
using it on a slide in a presentation force them to change it? I still think
Nvidia would lose on this one.

~~~
sitkack
How about companies stop co-opting the names of famous scientists? Have a
little more creativity.

~~~
yjftsjthsd-h
To be fair, naming things is a pain. It's the same problem we have naming
software/services (i.e. the never-ending "Show HN"/launch posts with comments
like "this name conflicts with the following _multiple_ other things").

------
fizixer
Am I the only one who is super annoyed at having to figure out every time
whether this is Ampere the company or Ampere the new Nvidia line?

I mean it's probably not the fault of either, and it's a huge coincidence that
we're getting a flurry of news articles about both in the summer of 2020, but
come on (can we have some kind of edits in the titles of HN posts to make the
distinction clear?).

------
unexaminedlife
The thing that has me bearish on CPU manufacturers in general... From what I
understand, parallel architectures vastly simplify the overall schematics of
CPUs while retaining the power-saving benefits.

As we approach the critical velocity (supply / demand) for parallel
architectures, the prospect of bootstrapping a CPU manufacturing company will
become much more feasible. IMO it's currently mostly the specialized knowledge
needed to design CPUs that keeps this out of reach.

I'm no expert, just have an interest in the space, so any dissenting opinions
/ facts welcome.

------
goerz
Can anyone explain in a few sentences _why_ the ARM architecture seems to
outperform traditional CPUs so much? What fundamentally prevents Intel from
building something comparable?

~~~
dahfizz
It is a Reduced Instruction Set Computer: a greatly simplified design.

The x86_64 ISA is absolutely insane. The only way to implement it efficiently
in hardware is to "compile" the super-complicated instructions into micro-ops
which can actually be scheduled and executed on the CPU.

Said another way, Intel has to implement a compiler in hardware which
translates the machine code before it gets executed. The extra complexity
means more power and less performance.
You can read more about how microcode and micro ops work here:
[https://en.m.wikipedia.org/wiki/Intel_Microcode](https://en.m.wikipedia.org/wiki/Intel_Microcode)
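
A small illustration of what that "compiling" amounts to; the assembly lives
only in comments and is from memory, so treat the exact mnemonics as
approximate. The same one-line read-modify-write is a single memory-operand
instruction on x86 (cracked into micro-ops by the decoder) but an explicit
load/add/store on a load-store RISC ISA like AArch64:

    #include <cstdint>

    // One read-modify-write of a counter in memory.
    void bump(std::uint64_t* counter) {
        *counter += 1;
        // x86-64: one CISC instruction, split into ~3 uops internally:
        //     add qword ptr [rdi], 1
        // AArch64: the same steps, as separate architectural instructions:
        //     ldr x8, [x0]
        //     add x8, x8, #1
        //     str x8, [x0]
    }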

~~~
jeffbee
So, that's the freshman-year CS view of the topic, but back here in reality
land the "complicated" x86 instruction format has pretty much destroyed all
others and none of the supposed advantages of RISC actually exist. Remember
that the whole point of RISC is that the CPUs would supposedly run faster.
That hasn't happened. There are no RISC CPUs running faster than state-of-the-
art x86 CPUs. POWER8 comes closest, but does not exceed.

The whole RISC philosophy was a huge mistake. Yes, x86 instructions do not map
well to transistors, and they have to be unpacked into uops to be executed.
This is a form of compression. Having a compressed program image turns out to
be a massive advantage. RISC proponents thought that x86 was so complicated
they could beat Intel with their simple instruction decoders. That almost, but
not really, made sense in 1990 but since then has made increasingly less
sense, until today where the amount of sense this makes has hit zero. The x86
instruction decoder is a very small part of the floor plan of a modern CPU and
every time they rev the microarchitecture it gets smaller. The number of
transistors needed to decode the VEX prefix is like a speck of sand on the
beach of a 512x512-bit multiplier.

~~~
acidbaseextract
> The whole RISC philosophy was a huge mistake. Yes, x86 instructions do not
> map well to transistors, and they have to be unpacked into uops to be
> executed.

The RISC philosophy wasn't a mistake. Our architectures have just become more
sophisticated so that we don't have to make a binary choice. The hybrid is
good. The internal uops get the pipelining advantages of RISC, while we get
the encoding compression of a CISC instruction set.

------
klelatti
Two questions:

Does TSMC have the capacity to support AMD / AWS / Ampere etc. making a
significant dent in the server market alongside longstanding commitments to
Apple and others?

Given how much they spend on Intel CPUs, to what extent is it worth AWS /
Oracle etc. making low-hundred-million-dollar investments in their own
silicon, or in startups like Ampere, just to keep Intel's pricing competitive?

~~~
ksec
>TSMC....

TSMC _never_ had a capacity problem, whatever stories the mainstream media
likes to run. You don't go and ask if TSMC has a spare 10K wafers of capacity
sitting around. TSMC plans their capacity based on their clients' forecasting
and projections _many_ months in advance. They will happily expand capacity if
you are willing to commit to it. Like how Apple was willing to bet on TSMC,
and TSMC basically built a fab specifically for Apple.

This is much easier for AWS since they are using the chips themselves in their
own cloud offering. It is harder for AMD since they don't know how much they
could sell, and AMD being conservative means they don't order more than they
can chew.

>Given how much they spend on Intel CPUs, to what extent is it worth AWS /
Oracle etc. making low-hundred-million-dollar investments in their own
silicon, or in startups like Ampere, just to keep Intel's pricing competitive?

I am not sure I understand the question correctly. But AWS already invested
hundreds of millions in their own ARM CPU called Graviton.

------
paulsutter
Maybe Intel should become a fab like TSMC and leave the CPU market to more
innovative folks

~~~
ksec
They did, with Intel Custom Foundry. They tried and they failed, and they
currently have no intention of trying again. At least not until they admit
defeat, which is going to take at least another few years, if not longer.

~~~
dralley
>They did with Intel Custom Foundry. They tried and they failed.

From what I've heard, they didn't try very hard. Apparently they thought all
they had to do was make chips, and that the sheer "technical superiority" of
their process meant that they could treat their customers as second-class
stakeholders, withhold information about their production timelines, etc.

------
rurban
The most interesting blurb I read was "superscalar aggressive out-of-order
execution". But I read nothing about security mitigations or concerns with
such "aggressive" optimizations.

