
AWS Graviton2 - yarapavan
https://perspectives.mvdirona.com/2020/01/aws-graviton2/
======
otterley
(I work for AWS. Opinions are my own and not necessarily those of my
employer.)

I've been doing some initial M6g tests in my lab, and while I'm not able to
disclose benchmarks, I can say that my real-world experience so far reflects
what's been claimed elsewhere.

Graviton2 is going to be a game changer. It's not like the usual experience
with ARM where you have to trade off performance for price, and decide whether
migrating is worth the recompilation effort. In my lab, performance of the
workloads I've tried so far is uniformly _better_ than on the equivalent M5
configuration running on the Intel processor. You're not sacrificing anything
by running on Graviton2.

If your workloads are based on scripting languages, Java, or Go, or you can
recompile your C/C++ code, you're going to want to use these instances if you
can. The pricing is going to make it irresistible. Basically, unless you're
running COTS (commercial off-the-shelf software), it's a no-brainer.

~~~
vbezhenar
It is surprising, because I was under the impression that Java has so many
optimizations for x86 and ARM was so new that it's almost impossible to beat
x86 without very significant investments. It's nice to hear that I was wrong.

~~~
lallysingh
x86's age and complexity give it a significant disadvantage. Both the cache
coherency model and the instruction set incur a lot of overhead to implement
at speed.

~~~
atdt
That's very interesting. Could you elaborate?

~~~
hnuser123456
A lot of the most commonly run software out there doesn't use the large,
complex instructions offered on x86, so a bunch of pristine silicon goes to
waste. Use the space taken by AVX512 etc to make more, simpler cores, and you
get more performance for the same price, or less cost for the same
performance. Simpler cores are easier to clock higher with less voltage, and
less likely to have defects that would pull down yields.

~~~
Dylan16807
The big vector units aren't the problem though. They're a consequence of the
big complicated schedulers that most x86 cores are designed with. As long as
the core has to be huge anyway, you might as well spend some space on more
powerful math units.

It's possible to design an x86 chip with much more priority on throughput per
square centimeter, with many more simple cores working together, but I have no
idea how it would work out.

~~~
lallysingh
There's a lot of logic and complexity on decoding, fusing, etc.

------
Dunedan
> Here’s comparative data between M6g and M5, the previous generation instance
> type

Instead of comparing the 7nm Graviton2 processor against a 14nm Intel
processor, I'd like to see its performance compared to an AMD Epyc 2
processor, which would be a more apples-to-apples comparison as both are "7nm"
parts. Unfortunately Epyc 2 processors aren't available from AWS yet (but are
already announced: [https://aws.amazon.com/de/blogs/aws/in-the-works-new-amd-
pow...](https://aws.amazon.com/de/blogs/aws/in-the-works-new-amd-powered-
compute-optimized-ec2-instances-c5a-c5ad/)).

~~~
awill
If Epyc CPUs aren't available, then it isn't apples to apples.

A customer doesn't care about nm. They care about what's available. The apples
to apples comparison is the best x86 available vs the best ARM available.

~~~
jrockway
You can buy Epyc v2 CPUs from Newegg. It's just that AWS doesn't have instance
types that use them.

~~~
rumanator
> It's just that AWS doesn't have instance types that use them.

AWS offers instance types with Epyc CPUs

[https://aws.amazon.com/ec2/amd/](https://aws.amazon.com/ec2/amd/)

~~~
jrockway
Yeah, but not second generation Epyc.

Their blog says the instance type will be called C5a:
[https://aws.amazon.com/blogs/aws/in-the-works-new-amd-
powere...](https://aws.amazon.com/blogs/aws/in-the-works-new-amd-powered-
compute-optimized-ec2-instances-c5a-c5ad/)

Right now, according to the official C5a page, they are "coming soon".

------
gok
There are some interesting implications for widely deployed processors that
are literally never publicly seen because they spend their whole life in a
highly locked down data center. I wonder if things like Meltdown could have
ever been discovered if researchers could only poke at the chips via EC2.

~~~
QuinnyPig
I believe the “metal” variants expose the processor extensions you’d need to
discover Meltdown.

~~~
floatboth
Also, AWS processors use off-the-shelf Arm Cortex/Neoverse cores, and stuff
like Spectre is core-level.

------
yarapavan
* [James Hamilton] believes there is a high probability we are now looking at what will become the first high volume ARM Server. More speeds and feeds:
  * >30B transistors in 7nm process
  * 64KB icache, 64KB dcache, and 1MB L2 cache
  * 2TB/s internal, full-mesh fabric
  * Each vCPU is a full non-shared core (not SMT)
  * Dual SIMD pipelines/core including ML optimized int8 and fp16
  * Fully cache coherent L1 cache
  * 100% encrypted DRAM
  * 8 DRAM channels at 3200 MHz

* ARM Servers have been inevitable for a long time but it’s great to finally see them here and in customers hands in large numbers.
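
As a rough back-of-the-envelope check on the "8 DRAM channels at 3200" figure quoted above, here is a small sketch of the theoretical peak memory bandwidth. It assumes standard DDR4-3200 channels (3200 mega-transfers per second over a 64-bit data bus per channel), which the quoted list does not state explicitly. The function name is made up for illustration, and real sustained bandwidth will be lower than this ceiling.

```python
def peak_dram_bandwidth_gbs(channels: int,
                            mega_transfers_per_s: float,
                            bus_width_bits: int = 64) -> float:
    """Theoretical peak bandwidth in GB/s (10^9 bytes per second).

    Assumption: each channel moves bus_width_bits per transfer,
    as on a standard 64-bit DDR4 channel.
    """
    bytes_per_transfer = bus_width_bits / 8
    bytes_per_second = channels * mega_transfers_per_s * 1e6 * bytes_per_transfer
    return bytes_per_second / 1e9


# 8 channels at 3200 MT/s, 8 bytes per transfer
bw = peak_dram_bandwidth_gbs(channels=8, mega_transfers_per_s=3200)
print(f"{bw:.1f} GB/s")  # 204.8 GB/s
```

Under those assumptions the ceiling works out to 8 x 3200 x 8 bytes = 204.8 GB/s per socket.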

~~~
nine_k
What really stands out for me is "100% encrypted DRAM".

How efficient is this? Can different cores have different encryption keys, so
that different VMs under a hypervisor can't benefit from breaking the
hypervisor's protections?

~~~
amluto
Some Intel chips can encrypt memory with different keys for different VMs.
This sounds great for marketing but adds basically no security whatsoever. The
feature is called MKTME.

What’s going on here is that “different keys for different VMs” does not
actually improve isolation without a considerable amount of hardware or
microcode enforcement. Intel has no such tracking of which VM is which. I
don’t know what AMD does.

In any case, encryption makes little difference. Cores aren’t bound 1:1 to VMs,
so the core can access any VM’s data if it wants. And actually clearing the
key on a context switch would require flushing caches and require that there
is no cache shared between cores. The performance hit would be extreme.

~~~
thu2111
In fairness to Intel they also have SGX which has encrypted RAM and also a lot
of isolation logic, as well as working RA, recovery, versioned sealed data and
a lot of other things that AMD's equivalent just doesn't do well or at all.

~~~
amluto
This is true, but you can’t put a VM in SGX without massive software hackery.
Also, SGX has been broken so many times in the last couple years that it’s
silly.

~~~
thu2111
SGX has been broken by totally new classes of attacks and has been
successfully renewed via microcode patches every time.

SEV was broken once, completely (at least on EPYC) in such a way that it could
not be fixed. From what I understand.

So I'll give Intel a break here. Their performance is much better than AMD's.

The whole point of SGX is that people tried making an entire VM the security
surface. That was the prior generation of tech (Intel LaGrande/TXT) and it
didn't work. There's far too much code in an entire OS like Linux to make it
secure or auditable (and without auditing none of these schemes mean
anything).

Enclaves are a design idea that says, shrink the amount of code you have to
trust and read to the smallest size possible. Only then do you have a chance
of security.

It's unfortunate that this lesson has been learned and is now being lost
again.

~~~
amluto
> SGX has been broken by totally new classes of attacks and has been
> successfully renewed via microcode patches every time.

As far as I can tell, it’s only “successfully renewed” if you have HT off. If
HT is on, SGX is dead.

------
neonate
[https://web.archive.org/web/20200125180037/https://perspecti...](https://web.archive.org/web/20200125180037/https://perspectives.mvdirona.com/2020/01/aws-
graviton2/)

------
DSingularity
Amazon will dominate cloud computing with these server CPUs. Who can compete
with vertical integration at the sheer scale of AWS? AWS usage patterns tell
them exactly _what_ to accelerate with silicon. A process that has been
largely driven by Intel will be replaced by a process driven by the customer
workloads themselves. These processors will only get better with time.

~~~
pdelgallego
That is basically Amazon scale playbook 101 (aka flywheel), if there is an
efficiency that can benefit the customer (e.g. lowering prices), they will
chase it.

It doesn't matter if that means designing Graviton2 or challenging FedEx by
trying to build the biggest delivery network in the USA.

------
pranith
I wonder when and how Azure and Google Cloud will compete with AWS in this
market.

They could buy ARM processors available in the market, but I doubt they will
be able to get them as cheaply as AWS, which builds its own.

~~~
gundmc
Does Microsoft make any of their own silicon?

It feels like Google has been directing their in-house designs on ML/TPUs
while Amazon went all in on ARM. It will be interesting to see how those bets
pay off.

~~~
floatboth
No, but Microsoft bought some off the shelf Ampere and Cavium/Marvell servers.
But they keep them for internal use only for now :(

Huawei makes their own silicon and servers with that silicon — also only
internal, not available on huaweicloud :(

The only other player is Scaleway who bought first gen Cavium ThunderX's way
back when. And Packet of course but that's bare metal only, no cheap small
VPSes.

~~~
whs
Huawei Cloud does have Kunpeng ARM servers available in some AZs (at least I
know Bangkok AZ2 has some). They also run managed Redis on ARM so cheaply that
it would cost more to run it yourself on an Intel VM.

I'm excited to see the price drop when ElastiCache moves to ARM.

~~~
floatboth
huh! I see now that they are mentioned on the Chinese Mainland website, but
not on /intl.

------
miohtama
This is good news. Are Linux server distributions for ARM64 on par with their
PC counterparts yet? Is getting base layer software not going to be an issue?

~~~
QuinnyPig
Been using Ubuntu on one for a few weeks; the only things I missed were a few
Docker containers that weren’t built, and aws-vault didn’t have an ARM binary.
I built my own, and aws-vault shipped a new release with ARM support 20
minutes after I whined about it on Twitter.

Everything else has been flawless.

------
dehrmann
Are there any security fears with virtualization on ARM (think Meltdown and
Spectre)? I'd think it's been less studied than Intel's x86-64 chips.

------
ijl
What services are people using to run continuous integration for ARM? I see
Travis CI has an alpha. Azure Pipelines doesn't host ARM instances I think.

~~~
otterley
AWS CodeBuild supports ARM builds: [https://aws.amazon.com/about-aws/whats-
new/2019/11/aws-codeb...](https://aws.amazon.com/about-aws/whats-
new/2019/11/aws-codebuild-adds-support-for-arm--gpu--and-x-large-compute-typ/)

------
sdan
This is great news. Except that a ton of software doesn't support ARM.

When I was trying to shift all my current infrastructure onto a couple of
RPis, many of the Docker containers didn't support ARM (qemu and buildx
aren't reliable) and other software didn't support ARM either.

Unless there's a good way to go from amd64 to ARM, I'm not entirely sure how
great Graviton or other competitors will get.
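
For anyone scripting around the "image doesn't support ARM" problem, here is a minimal sketch of checking whether a list of image architectures covers the host. The alias table and both function names are hypothetical; the only real call is `platform.machine()` from the standard library. The amd64/arm64 spellings follow the naming container tooling commonly uses, and the list of architectures would have to come from wherever you track your images.

```python
import platform

# Hypothetical alias table: kernels and tools spell the same
# architecture differently (e.g. "x86_64" vs "amd64").
ARCH_ALIASES = {
    "x86_64": "amd64",
    "amd64": "amd64",
    "aarch64": "arm64",
    "arm64": "arm64",
}


def normalize_arch(machine: str) -> str:
    """Map a machine string to a canonical name; unknown names pass through."""
    name = machine.lower()
    return ARCH_ALIASES.get(name, name)


def image_supports_host(image_archs, machine=None) -> bool:
    """Return True if any listed image architecture matches the host."""
    host = normalize_arch(machine or platform.machine())
    return host in {normalize_arch(arch) for arch in image_archs}


print("host architecture:", normalize_arch(platform.machine()))
# An amd64-only image does not cover an aarch64 (Graviton-style) host.
print(image_supports_host(["amd64"], machine="aarch64"))           # False
print(image_supports_host(["amd64", "arm64"], machine="aarch64"))  # True
```

This only tells you whether a native image exists. It says nothing about how well emulation would work as a fallback.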

~~~
JunkDNA
Back in the late ’90s and early ’00s, there were a ton of CPU platforms
around: SGI MIPS, DEC Alphas, Intel, Sun SPARC, etc... while I will admit it
was a colossal pain working somewhere that had all of those, it was often
possible to recompile from source to get things to run. I’m not suggesting
it’s trivial, but given the incredible investment in ARM in the mobile space,
the wind is at least at your back today. It certainly has got to be much
easier than it was in the days of being the only person in the world trying to
recompile an obscure open source scientific computing package for DEC Alpha.
Commercial software is a different beast, but even there, the incentive will
be high to do a port if lots of people start migrating to this.

~~~
dehrmann
Windows NT 4 supported x86, Alpha, MIPS, and PowerPC. Yikes.

~~~
wbl
All have eight bit bytes with 2's complement, same available word sizes and
float formats (with some complexity on the Alpha due to VAX compat). C code
will mostly not care beyond endianness. The PDP is the strange one.
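
To make the endianness point concrete, here is a small sketch using Python's standard `struct` module. It shows that the same 32-bit value has a different byte layout depending on byte order, and that code which names the byte order explicitly decodes identically on any host. The sample value is arbitrary. As far as I know, mainstream Linux on both x86-64 and AArch64 is little-endian, so this mostly bites with on-disk or on-wire formats.

```python
import struct
import sys

value = 0x01020304

# Explicit byte order: "<" is little-endian, ">" is big-endian.
little = struct.pack("<I", value)
big = struct.pack(">I", value)

print("host byte order:", sys.byteorder)
print("little-endian bytes:", little.hex())  # 04030201
print("big-endian bytes:   ", big.hex())     # 01020304

# Two's complement: -1 as a signed 32-bit integer is all ones.
minus_one = struct.pack("<i", -1)
print("-1 as int32:", minus_one.hex())  # ffffffff

# Decoding with an explicit byte order gives the same value on any host.
assert struct.unpack("<I", little)[0] == value
assert struct.unpack(">I", big)[0] == value
```

Code that reads raw memory with the native layout is what breaks across architectures. Code that states the byte order, as above, generally does not care.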

------
fulafel
I guess the real benchmark is whether they'll put it to use with Lambda,
Fargate etc.

~~~
ksec
I think Amazon mentioned they intend to use their own chip on all of AWS
except their IaaS / EC2 offering, where you still get to choose servers running
on x86.

That is why it was mentioned as the fall of x86 on Servers.

~~~
fulafel
Sounds interesting, where did they say it?

~~~
ksec
_And AWS' initial strategy is to move its internal services to
Graviton2-based infrastructure. Graviton2 required significant investment, but
AWS can garner returns and improve its operating margins due to the ability to
cut out middlemen involved with procuring processors, power savings due to Arm
and efficiency gains from optimizing its own infrastructure.

AWS services like Amazon Elastic Load Balancing, Amazon ElastiCache, and
Amazon Elastic Map Reduce have tested the AWS Graviton2 instances and plan to
move them into production in 2020._

Normally I try to find primary sources rather than secondary ones like ZDNet
[1]. But I think that exact wording was quite widely reported at the time.

They say they are not anti-Intel or anti-AMD, which is true. (They are only
anti-x86.) And they said the same to UPS and FedEx at the time.

[1] [https://www.zdnet.com/article/aws-graviton2-what-it-means-
fo...](https://www.zdnet.com/article/aws-graviton2-what-it-means-for-arm-in-
the-data-center-cloud-enterprise-aws/)

------
dehrmann
Out of curiosity, what's the state of Jazelle on modern ARM? Would it help
server-side ARM, or has the world (and JITs) moved on?

~~~
pm215
Jazelle is dead. The v8 version (or maybe even v7; I forget) of the 32-bit
architecture basically mandated that only 'trivial' Jazelle (which is the not-
actually-there version) could be implemented, and 64-bit has never had
anything like it. It was at best a technology of its time (when phone Java
implementations were mostly interpreted, not JITs). It would be useless to a
modern Java implementation.

------
dman
When can I buy something similar for a homelab?

~~~
ipsum2
You can buy Cavium ThunderX2s off eBay. They're last gen ARM server chips. The
performance won't be as good, but if it's just for playing around with, they're
more than adequate.

------
xwowsersx
> believing that massive client volumes fund the R&D stream that feeds most
> server-side innovation.

What does he mean here?

~~~
wmf
Intel/AMD design laptop/desktop (client) cores then put those cores into
server processors. Because vastly more PCs are sold, they effectively
subsidize server processors. Arm has a similar advantage, designing cores for
phones/tablets and repurposing/extending them for servers.

~~~
xwowsersx
Ohh I see, thank you!

------
Can_Not
I'm curious about what languages or types of projects are already running on
ARM servers in the cloud (and actually benefiting!)

------
Koshkin
Annapurna, the goddess of job security.

------
taf2
Damn, has anyone tried Ruby on one of these CPUs? Is it really a 20% perf
improvement on nginx? These sound too good to be true.

------
gautamcgoel
This is great, but I'd be _really_ excited if we could go out and buy the
chips ourselves instead of having to pay the Amazon tax and run our code on
untrusted systems in the cloud. Of course, Amazon has little incentive to sell
the chips, since it gives them a competitive advantage against other cloud
providers.

