
Intel and ARM Performance Characteristics for S3 Compatible Object Storage - kiyanwang
https://blog.min.io/intel_vs_gravitron/
======
sitkack
If you aren't compute bound, most workloads are dictated by the speed of main
memory. And most business workloads spend their time waiting. Intel has
historically focused on single-core performance; that is what they are
_really_ good at, while this Graviton part has decided to go wide and focus on
many cores.

The memory controller on the ARM part must be excellent, given that the Intel
machine had two sockets.

I don't think a core-to-core comparison, or core to hyperthreaded core, makes
much sense. There are so many other unknowns; they only appear to be similar
metrics. $/work_unit is the benchmark I would focus on in a case like object
storage.

~~~
CalChris
Graviton isn't Arm IP. It's AWS, _née_ Amazon, IP. Amazon is an Arm
architecture licensee. That's important because ...

    
    
      Arm capitalization:    $  31.6B
      Intel capitalization:  $ 272.4B
      Amazon capitalization: $1240.0B
    

Amazon can throw an insane amount of money at a problem when they want to.

Edit to add: a good source on how much Arm IP Graviton2 uses, which is more
than I thought:

[https://www.anandtech.com/show/15578/cloud-clash-amazon-grav...](https://www.anandtech.com/show/15578/cloud-clash-amazon-graviton2-arm-against-intel-and-amd)

The DDR4-3200 memory controllers are probably licensed IP.

~~~
baybal2
> Amazon can throw an insane amount of money at a problem when they want to.

Take a look at what's happening with Chinese state-backed semiconductor
companies. They surely have money.

See how well the strategy of "just throw $100B at the problem" worked for
them.

It didn't. Chipmaking is a harder thing to do than running cat-video websites
or a web-hosting business.

~~~
parhamn
As an outsider, I’d guess the biggest impact is the ability to buy chipmaking
talent. That would certainly be easier for the American tech giants, no?

~~~
saagarjha
Perhaps this might’ve helped?
[https://en.wikipedia.org/wiki/Annapurna_Labs](https://en.wikipedia.org/wiki/Annapurna_Labs)

------
rafaelturk
The best part of this new era of ARM based servers is competition.

In the early 90s Intel slowly and steadily marched into the datacenter arena,
to the point that it became the dominant player.

Intel was brilliant and created a distribution strategy that allowed multiple
vendors to use the x86 chip, each with different server solutions. IMO this
multitude of server offerings based on the same architecture (x86) is what
killed all the other CPUs. Just to remember a few players: Digital Equipment
Corporation (DEC), Sun, HPE, and even IBM, all now well near defunct.

It's very exciting to see ARM rising in the datacenter arena. IMO ARM is a
superior CPU, and now they have, just like Intel, new distributors creating
multiple offerings based on a single well-known platform.

------
twoodfin
These benchmarks seem primarily to demonstrate that for “straight line”
computationally intensive algorithms, hyperthreading doesn’t buy you very
much—in my experience something like 20% additional throughput. In those
scenarios, all else being equal, 64 cores worth of compute will have a big
advantage over 36.

~~~
comex
Or to put it more strongly: The multi-core comparison makes no sense. If the
goal is to compare performance per core, as a way to evaluate CPU efficiency
in the abstract, then they should be comparing equal numbers of physical
cores, not logical cores.

Such a comparison does miss some practical benefits of the ARM instances. The
ARM instances seemingly have more physical cores available on a single AWS
instance – but “performance on a single instance” would be a different
comparison, one where the Intel instances shouldn’t be limited to 64 threads.
And the ARM instances may have a cost advantage (I haven’t checked), but
performance for equal cost would again be a different comparison.

~~~
jsnell
I agree that the article should at least have mentioned the cost, to allow for
comparison on a more meaningful basis than perf per core. Though in the end
that makes things look worse for Intel. The c5.18xlarge (Xeon) is $3.06/hour,
m6g.16xlarge (ARM) is $2.464/hour.
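
For concreteness, a tiny Go calculation of those prices per vCPU-hour; the
vCPU counts (72 for the c5.18xlarge, 64 for the m6g.16xlarge) are my
assumption from the published instance specs:

    package main

    import "fmt"

    func main() {
        // Hourly prices from the comment above; vCPU counts assumed from
        // the AWS instance specs (c5.18xlarge: 72, m6g.16xlarge: 64).
        intelPerVCPU := 3.06 / 72
        armPerVCPU := 2.464 / 64

        fmt.Printf("Intel: $%.4f per vCPU-hour\n", intelPerVCPU) // ~$0.0425
        fmt.Printf("ARM:   $%.4f per vCPU-hour\n", armPerVCPU)   // ~$0.0385
        fmt.Printf("ARM discount: %.0f%%\n", (1-armPerVCPU/intelPerVCPU)*100)
    }

So even per vCPU, before looking at any benchmark results, the ARM instance
is roughly 9% cheaper.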

------
gok
The dual- vs. single-socket section really shows how much of a performance
landmine NUMA is. You really need to properly pin your workloads, but that's a
huge pain in the ass that often just never happens during tuning.
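
For the CPU half of that pinning, here is a minimal Linux-only Go sketch using
golang.org/x/sys/unix. The CPU range for node 0 is a hypothetical example
(check /sys/devices/system/node/node0/cpulist on the real machine), and memory
binding would still need numactl or libnuma:

    package main

    import (
        "fmt"

        "golang.org/x/sys/unix"
    )

    func main() {
        // Hypothetical layout: CPUs 0-17 sit on NUMA node 0. Verify against
        // /sys/devices/system/node/node0/cpulist before pinning for real.
        var set unix.CPUSet
        for cpu := 0; cpu < 18; cpu++ {
            set.Set(cpu)
        }
        // pid 0 means "the calling process".
        if err := unix.SchedSetaffinity(0, &set); err != nil {
            panic(err)
        }
        fmt.Println("pinned to node 0 CPUs; memory binding still needs numactl")
    }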

~~~
m0xte
Completely. A lot of people I have worked with have no idea what NUMA even is,
while having NUMA machines. It's just a magic box. Zero understanding of the
hardware architecture at all.

~~~
halbritt
I've worked in places that had millions of dollars' worth of infrastructure
deployed and gained 50% more capacity just by enabling NUMA awareness.

It's something that I frequently look for, and more recently it seems to be
the default rather than not.

------
ezoe
Comparing the systems with Two sockets vs One socket, Hyperthreading vs None.
I don't think we can trust the result that much.

But just for general opinion. Intel CPU has strong memory guarantee between
cores compared to the ARM. If the implementation is really good at parallel
execution so it doesn't require a lot of data sharing between threads, ARM
architecture potentially be better than Intel, if its implementation were
good.

~~~
rrss
Not sure what you mean by trust the result.

Graviton 2 (single socket, no SMT) comes out ahead of Skylake (dual socket,
SMT enabled) in the multicore scenario. Do you think that result would change
if Graviton 2 was instead compared against single socket, SMT disabled
Skylake?

~~~
ex3ndr
Disable hyperthreading and use real cores, not hyperthreads.

~~~
ksec
Except a hyperthread is exactly what you get as a cloud vendor's vCPU.

------
captn3m0
I'd have liked to see cost on an axis on some of the graphs.

~~~
ksec
Would it matter, knowing that ARM instances are always cheaper for the same
vCPU config and win the benchmarks by a large margin past 30+ vCPUs?

~~~
wmf
You and I know that but many people reading these blog posts don't.

For software that supports scale-out I also wonder about the price/perf if you
choose an instance size that corresponds to the Intel peak.

------
discodave
The goal of the S3 team at Amazon, or somebody like Backblaze, is to make hard
drives 100% of costs. That is, they want to reduce the cost of compute,
datacenters, racks, etc. to approximately zero.

If CPUs are dominating your costs or performance, then you're doing S3
Compatible Object Storage wrong IMO.

~~~
wmf
I don't think Minio is trying to compete directly against S3, especially when
they run on EC2 (a service running on AWS will never be cheaper than AWS
itself). It sounds like they're targeting a higher-performance S3-compatible
niche.

------
pwarner
I assume AWS will start aggressively using Graviton for their S3, EBS, ELB,
etc. services?

~~~
fh973
The S3 protocol needs MD5 hashes, which these benchmarks don't cover. A Xeon
core can do that at a couple hundred MB/s. As it's a continuous hash of the
whole object, this limits S3 performance.

I guess AWS does not care that much about single-stream S3 performance but
will optimize for costs.
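
To see why, here is a minimal Go sketch that hashes a synthetic in-memory
buffer on one core. Absolute numbers will vary by machine, but MD5 is a serial
chain over the object, so one core's hash rate caps single-stream throughput:

    package main

    import (
        "crypto/md5"
        "fmt"
        "time"
    )

    func main() {
        buf := make([]byte, 64<<20) // 64 MiB stands in for object data
        h := md5.New()

        const rounds = 16 // hash 1 GiB in total
        start := time.Now()
        for i := 0; i < rounds; i++ {
            // Each MD5 block depends on the previous one, so a single
            // stream cannot be spread across cores.
            h.Write(buf)
        }
        elapsed := time.Since(start)

        totalMiB := float64(rounds) * 64
        fmt.Printf("single-core MD5: %.0f MiB/s (digest %x)\n",
            totalMiB/elapsed.Seconds(), h.Sum(nil))
    }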

~~~
justincormack
I don't think it uses MD5 any more; the newer protocol versions support SHA256
hashes, and that's only if the user checks. It may not be what is used
internally.

~~~
chillaxtian
Link? The ETag of an object is still the MD5, AFAIK.

~~~
rob-olmos
For the ETag, yes and no: whether it's an MD5 of the object depends on
encryption and multipart upload:
[https://docs.aws.amazon.com/AmazonS3/latest/API/RESTCommonRe...](https://docs.aws.amazon.com/AmazonS3/latest/API/RESTCommonResponseHeaders.html)

For the object upload, AWSv4 request signature uses SHA256 on the
payload/object, but I don't know if S3 also computes & compares the digest or
just uses the x-amz-content-sha256 header value.
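
For illustration, a small Go sketch of both values under discussion. The
multipart ETag shape (MD5 over the concatenated per-part MD5 digests, suffixed
with the part count) is commonly observed S3 behavior rather than a documented
contract, and the SHA256 is the lowercase-hex digest an unchunked SigV4 upload
would send in x-amz-content-sha256:

    package main

    import (
        "crypto/md5"
        "crypto/sha256"
        "encoding/hex"
        "fmt"
    )

    // multipartETag mimics the ETag S3 is commonly observed to assign to
    // multipart uploads: MD5 of the concatenated per-part MD5 digests,
    // suffixed with "-<part count>". Observed behavior, not a contract.
    func multipartETag(parts [][]byte) string {
        concat := make([]byte, 0, len(parts)*md5.Size)
        for _, part := range parts {
            sum := md5.Sum(part)
            concat = append(concat, sum[:]...)
        }
        final := md5.Sum(concat)
        return fmt.Sprintf("%x-%d", final, len(parts))
    }

    func main() {
        partA := []byte("part one of the object")
        partB := []byte("part two of the object")
        fmt.Println("multipart ETag:", multipartETag([][]byte{partA, partB}))

        // x-amz-content-sha256 for a single-part SigV4 PUT: hex SHA256 of
        // the whole payload.
        payload := append(partA, partB...)
        digest := sha256.Sum256(payload)
        fmt.Println("x-amz-content-sha256:", hex.EncodeToString(digest[:]))
    }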

~~~
rob-olmos
AWS support via Twitter said that the SHA256 digest of the payload is
calculated and compared for S3 when using AWSv4 signature requests.

------
MrStonedOne
Now try this with hyperthreading disabled, since it's disabled on the ARM
machine.

If you have a 24 core machine and you run the same task on 48 threads, you
will sometimes see some performance drop compared to running it on 24 threads.

HT doesn't change this. Some loads are not HT-friendly, and you should account
for this when deciding how many threads to spawn; see the sketch below.
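
A minimal Go sketch of that sizing decision. The halve-the-logical-CPUs
heuristic is purely illustrative; the right worker count depends on how
HT-friendly the load actually is, so in practice you would benchmark both
settings:

    package main

    import (
        "fmt"
        "runtime"
        "sync"
    )

    func main() {
        logical := runtime.NumCPU() // counts hyperthreads, not physical cores
        // Illustrative heuristic for HT-unfriendly loads: try half the
        // logical CPUs (i.e. physical cores on a 2-way SMT Intel part).
        workers := logical / 2
        if workers < 1 {
            workers = 1
        }

        jobs := make(chan int, 1024)
        var wg sync.WaitGroup
        for w := 0; w < workers; w++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                for j := range jobs {
                    _ = j * j // stand-in for the real CPU-bound work
                }
            }()
        }
        for j := 0; j < 100000; j++ {
            jobs <- j
        }
        close(jobs)
        wg.Wait()
        fmt.Printf("ran %d workers on %d logical CPUs\n", workers, logical)
    }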

They limited it to 64 cores to be fair to the ARM processor but did not limit
it to 18/24 cores to be fair to the Intel processor. Why is that?

The last graph is the most enlightening if you keep in mind the number of real
CPUs each processor has.

~~~
Taniwha
CPU architects make lots of different choices and tradeoffs. In this case
Intel has chosen hyperthreaded, physically larger CPUs, while the ARM chip's
designers have decided that hyperthreading brings no meaningful improvement,
but that forgoing it lets them build smaller (and maybe faster) CPUs, and as a
result fit more CPUs per die.

These are tradeoffs that we make all the time when building CPUs. None is
'best'; it's more that a group of changes is better for a particular case.

So the people benchmarking here have NOT turned off hyperthreading on the ARM
chips; there is none to turn off. Instead there are more CPUs: the ARM guys
have optimised their 64-core chip to be useful in their target market, which
happens to be closer to what these guys' application does.

~~~
MrStonedOne
Doesn't change my point.

Running 64 threads on the Intel CPUs slows them down vs. running the number of
real CPUs they have.

And as I said, since they limited the test to 64 threads even though one of
the CPUs has more than 64 vCPUs ("to be fair to the ARM processor"), the
moment they saw the final graph they should have done the same thing in
reverse, to be fair to the Intel processor. Otherwise it just reeks of
selective application of methodology.

Of course, as you said, the _real_ answer is that they should not have limited
the test to 64 threads; that doesn't match real workloads, where the number of
threads would be set to the number of CPUs or vCPUs.

Instead they should have done single-threaded tests; tests with both
processors maxed out at max(Intel vCPUs, ARM vCPUs) threads; tests with each
set to its respective max; and a repeat with the number of real CPUs.

------
tpurves
I’d really like to see a comparison against AMD EPYC as a platform.

~~~
kllrnohj
Amazon recently added EPYC Rome CPUs: [https://aws.amazon.com/blogs/aws/new-amazon-ec2-c5a-instance...](https://aws.amazon.com/blogs/aws/new-amazon-ec2-c5a-instances-powered-by-2nd-gen-amd-epyc-processors/)

It would be great if they tested against one of those.

~~~
karavelov
There are already some benchmarks:
[https://www.phoronix.com/scan.php?page=article&item=epyc-ec2...](https://www.phoronix.com/scan.php?page=article&item=epyc-ec2-c5a&num=1)

------
acd
What is the performance per watt for Graviton vs Intel?

~~~
wmf
There's no way to know the power consumption of stuff in AWS.

~~~
userbinator
However, how much they charge for it gives a general idea, since they are
paying for the power.

~~~
usr1106
That could or could not be correct. We don't know their bill of materials.
Probably some people can estimate it better for Intel. I understand the ARM
stuff is AWS proprietary, so I would be surprised is any figures are public.
Besides the costs are unkown the margins could be very different. AWS might be
willing to pay a high investment to get more independent of Intel in the long
term.

I worked at a big corporation in the mobile space before. They were willing to
pay a high investment to get more independent of their single ARM SoC
supplier. (It failed because Intel could not deliver in the end.)

------
guenthert
I found that very interesting, thank you very much. I didn't follow the
development closely and had no idea how close ARM64 (well, the Graviton2
implementation at least) apparently is in performance to Intel's server CPUs.

I wish there were some published results from a standardized test, though. Is
there any reason there are none for, e.g., SPECpower?

~~~
ksec
>I wish there were some published results from a standardized test, though.

[https://www.anandtech.com/show/15578/cloud-clash-amazon-grav...](https://www.anandtech.com/show/15578/cloud-clash-amazon-graviton2-arm-against-intel-and-amd)

------
ksec
A lot of the comments, and the article itself, seem to be missing some
context.

On Graviton -

People vastly underestimate the scale of hyperscalers (they are called
_hyper_ scalers for a reason) and overestimate the cost Amazon actually put
into / required / invested in G2 (Graviton2).

Over 50% of Intel's Data Center Group revenue comes from hyperscalers. Some
estimates put Amazon at 50% of all hyperscaler spend. That is, ~25% of Intel's
Data Center Group revenue comes from Amazon alone, or ~$7B per year. To put
this into perspective: AMD, a company that does R&D on CPUs and GPUs across
multiple market segments, made less revenue in 2019 ($6.7B) than Amazon spends
with Intel.

It is important to note that datacenter spending continues to grow with no end
in sight. Amazon estimates that less than 10% of enterprise IT workloads are
currently on the cloud, and enterprise IT is growing as well, which means
there is a double multiplier.

G2 is a custom Arm N1 design, basically with fewer cores and less L2 cache to
reduce die size and cost. The N1 is a TSMC-ready blueprint design, so while
you can't really buy a "Graviton2", in theory you could get a replica fabbed
by TSMC if you had the money (on the assumption Amazon did not put any custom
IP into it). And that means G2 isn't a billion-dollar project. Even if it cost
Amazon $200M, and Amazon only makes 1M G2 units over its lifetime, that is
still only $200 per unit, or less than $300 including wafer price. Compared to
the cost of buying the highest-core-count parts from Intel, that is nothing.

Also a reminder: Amazon doesn't sell chips. And the biggest recurring cost to
AWS other than engineers is actually power. In terms of work per watt, G2
currently has a huge lead in many areas. And as I have noted in another reply,
Amazon intends to move all of its services and SaaS to run on G2. The
energy-efficiency gains of G2 and future Gravitons at AWS scale would be
additional margin or advantage.

On the article and SMT / Hyper-Threading -

I guess the headline "Intel vs ARM" is a little clickbaity and may not be an
accurate description, which led to comments treating it as a technical
comparison between the two. But I don't think the article intends to be that.
It also assumes you are familiar with cloud vendors and pricing, which means
you know what vCPUs are, that instances are priced per vCPU, and that ARM
instances with the same vCPU count are always cheaper.

And that means that, from a cloud deployment and usage perspective, whether
the Intel core does SMT / Hyper-Threading or can reach up to 6GHz is
irrelevant, since the test is not designed to measure those. You are paying by
vCPU and you have a specific type of workload. You could buy a 64 vCPU Intel
instance, run a maximum 32-thread test on it, and compare it to a 32 vCPU G2
instance, but you would be paying more than _double_ the price. IBM POWER9
also has SMT4, or 4 threads per core; no one disables SMT just to test its
single-core performance.

And as noted in the first reply, the test is clearly memory bound, where G2
has the advantage of two more memory channels along with higher supported
memory speeds. It would be much better to see how it compares to the recently
announced AMD Zen 2 instances, which also come with 8-channel memory.

~~~
fwessels
As per last week's announcement from AWS about the availability of AMD EPYC
CPUs, we have repeated the "single socket" test for the EPYC CPU as well.

The updated chart is included in the post, or you can find it here:
[https://blog.min.io/content/images/2020/06/single-socket-per...](https://blog.min.io/content/images/2020/06/single-socket-performance--1-.png)

It clearly performs and scales a lot better than the Intel CPUs (no doubt also
benefiting from the increased memory bandwidth), and at high core counts it is
very close to the Graviton2.

Furthermore, I agree that the "investment costs" for Amazon are almost minute;
they might have already earned them back.

------
arpinum
I’ve seen rumours that AWS lost a lot of money developing their ARM chips. I
haven’t seen a strong source confirming this, but it does pour cold water on
the idea that ARM is ready to take on x86 for general server compute, though
no doubt there are niches where it can do well.

~~~
opencl
The Graviton chips are just Arm reference designs with a few tiny
modifications; very little R&D was required. The majority of the NRE cost was
probably the 7nm tapeout, which is supposedly in the tens of millions of
dollars.

Anandtech ran a pretty good article on the Graviton2 and its Neoverse N1
cores: [https://www.anandtech.com/show/15578/cloud-clash-amazon-grav...](https://www.anandtech.com/show/15578/cloud-clash-amazon-graviton2-arm-against-intel-and-amd)

~~~
baybal2
> 7nm tapeout, which is supposedly in the tens of millions of dollars.

Yes. Cookie-cutter SoCs stopped making money at around the 65nm node. You need
to put more on the table these days than bare Arm cores and generic
peripherals to sell an SoC.

Many dotcoms that ventured into chipmaking are now quietly bailing out.

Making your own chips just to undercut Intel is a stupid idea. People have
tried it more times than I can count. Just remember all the RISC epics of the
nineties: where are SPARC, PA-RISC, PPC, and MIPS now?

The fact that they use ARM instead of any other RISC core makes little
difference.

Internet companies are too small to make money from undercutting chipmakers,
but I believe they are not that far away from that point.

Maybe in 10 years or so, you will be working on a PC with a Genuine Google(R)
chip.

~~~
jsnell
You maybe aren't appreciating the scale of the cloud companies properly. In
2019, Amazon had 6x the capex of AMD's _entire revenue_; Google 4x, Microsoft
3x. Obviously not all of that capex is for compute, but given the cost
structure of servers, a significant chunk will be.

If AMD can profitably build and sell a state-of-the-art CPU in a world with
that scale, why couldn't the cloud companies?

~~~
baybal2
> why couldn't the cloud companies be able to?

That's easy to say; now think about why other companies with n times the money
"can't simply do that".

It takes quite a lot of effort and skill: execution capability. GAF companies
are easy-money companies; they and their executives are not made for that.

~~~
jsnell
But your initial argument wasn't about competence, it was an economic argument
about scale. To quote, "the internet companies are too small", "No amount I
can imagine can pay off for the tapeout on the latest node for them".

Given you've pivoted away from the economics to claiming that they just can't
do it, how do you explain the results from this article? Shouldn't Amazon have
been too incompetent to build a state of the art server CPU? How about
Google's TPUs, in-house switches, and in-house NIC ASICs? Just failed prestige
projects?

------
FpUser
This test is flawed in so many ways. I just do not understand how the authors
do not feel ashamed of what they're trying to sell to readers.

In a healthy environment, the author(s) presenting a study of such quality
would've been laughed out the door.

What their results actually show is that up to about 18 cores (the actual
number of physical cores on a single one of the Intel CPUs they tested) Intel
kicks the shit out of Graviton. So if you want a proper test comparing that
particular 64-core Graviton CPU with Intel, take a system with a single-socket
Intel CPU with 64 _real_ cores as well, and then come back with the results.

~~~
marcinzm
The 64-core Graviton costs less than the 18-core Intel. Cores are meaningless;
cost isn't. So Intel looks even worse when cost is accounted for.

~~~
FpUser
We are talking performance here. Not performance per dollar.

~~~
marcinzm
You, not I, argued that performance is not what to consider, but rather
performance per core (not per physical CPU). Don't push arbitrary metrics and
then complain when you're called out for it.

