The benchmarks appear to show AWS winning by a considerable margin in every test, and it is cheaper at all quoted price points, yet it barely edges out Azure in the cost-performance table. This appears to be because Azure's reserved pricing was used while AWS and GCP were priced on demand. That's misleading, as both GCP and AWS also offer discounts for yearly commitments.
Also, the RAM ratios for the machines differ across vendors. E.g., in the blog post, the AWS machines have 2 GB/vCPU vs. 4 GB/vCPU for GCE. Whether that matters does depend on the workload, though.
The article ends up feeling rather overwhelming. The main difference, and the advantage AWS holds, is that it uses the newer, more advanced (and more expensive) Neoverse V1 design, while GCP and Azure are based on the older, cheaper, less performant Neoverse N1. These gaps are largely down to how the chips were designed by ARM. One could argue that AWS also adds its own secret sauce, but so far that seems unlikely. A cursory search turns up a Phoronix article [0] with a much more in-depth comparison between V1 and N1 (via AWS's c7g vs. c6g instance types). There are also upcoming N2 and V2 designs; NVIDIA's Grace CPU is reportedly based on V2, which will be interesting to watch.
The 41% discount thrown in at the end for Azure, without any explanation, was also jarring. Maybe there truly is a promotional rate for Azure's ARM instances, but as another poster pointed out, it's more likely reserved pricing, which is available from all the providers.
There is no reason not to run AWS offerings like RDS and ElastiCache on ARM for instant cost savings. Performance is comparable to Intel/AMD, so I see no reason not to use ARM even if your own code isn't compatible.
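As an illustrative sketch: the family pairs below are real current-generation RDS classes, but the mapping table is mine, not an official API, so verify against AWS's instance-class docs before relying on it.

```python
# Hypothetical helper: translate a few common x86 RDS instance classes
# to their Graviton counterparts. Illustrative only; the family pairs
# are real, but coverage is deliberately incomplete.
X86_TO_GRAVITON = {
    "db.m6i": "db.m6g",
    "db.r6i": "db.r6g",
    "db.t3": "db.t4g",
}

def graviton_class(instance_class: str) -> str:
    """Translate e.g. 'db.r6i.large' -> 'db.r6g.large'.

    Unknown families are returned unchanged.
    """
    family, _, size = instance_class.rpartition(".")
    return f"{X86_TO_GRAVITON.get(family, family)}.{size}"

print(graviton_class("db.r6i.large"))  # db.r6g.large
```

The actual switch is then just a modify-instance call with the new class; for RDS it's typically an in-place change with a short restart.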
I have no doubt that AWS will also switch over services like DynamoDB, ELB and countless others that don't let you choose a machine type, to improve their cost basis.
It was mentioned in one of the topics during AWS re:Invent 2020. I just spent a minute searching the Internet for "aws graviton market share" and various sources cite everything from 10% to 15%.
Thanks, very interesting for sure. I have managed 1000s of AWS instances and never seen an ARM one. I guess it depends on your application stack and business.
Was hoping to see the results, but the images were all blurry dummy placeholders (some sort of lazy loading for images?). The images had a "click here for preview" label, but clicking it did nothing (I run ad blockers, so maybe some JS code was disabled).
But what the heck, this is not how webpages should work: click here to "unblur" the image? Why add such an extra step? To save bandwidth costs?
There seems to be a proper lazy-loaded image (with low-res + high-res) with a thumbnail (low-res) on top; and the thumbnail is removed with JS code.
This is probably meant as a fallback to look nicer on browsers which don't support lazy-loading, at the expense of breaking browsers which don't support JS or fail to execute it.
AFAIK, that's because the CDN provides the ability to serve the original images as WebP. I'll let the maintainers know about this case; may I ask which browser you're using?
this is a feature of asynchronously loading the "heavy" images so they don't block loading of the text and markup. The "click to unblur" appears because automatic loading failed, and as a fallback the page uses a CTA (i.e., an event explicitly triggered by the user) to retry loading the images. CTAs have a higher privilege for fetching content.
but without the virtualization overhead. Considering how much work Intel/AMD have put into virtualization, it would be interesting to see whether the overhead is larger or smaller on ARM.
Many (though not all) of those mitigations remain important even for single-tenant remote servers, lest you get your keys or other secrets exfiltrated.
Not to be forgotten: per-core shared tenancy. I'd love to be proven wrong, but I doubt AWS allocates a full core for every non-burstable Arm "vCPU" sold.
It was my understanding Amazon does in fact do this for most instance types, it's why the amount of RAM and other resources scales with CPU core count.
There are exceptions though in the very small "burstable" etc instances, like the micro etc.
EDIT: Seems I am wrong.
"Each vCPU is a thread of a CPU core, except for T2 instances and instances powered by AWS Graviton2 processors."
The exception for Graviton here is because every ARM vCPU is a full core, whereas an Intel/AMD vCPU is one hyperthread (i.e. half a core). They're not saying Graviton2 cores are shared. For Intel/AMD, you always get allocated both hyperthreads on a dedicated core; that's why hyperthreaded families start at 2 vCPUs. Only the "T" families feature shared cores.
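That vCPU-to-core arithmetic can be sketched as a tiny rule of thumb (the instance family names in the comments are just examples):

```python
# vCPU-to-physical-core rule described above, for non-burstable
# instance families: a Graviton vCPU is one full core; an x86 vCPU is
# one hyperthread, and two hyperthreads share each core.
def physical_cores(vcpus: int, graviton: bool) -> int:
    """Physical cores backing the vCPUs of a non-burstable instance."""
    if graviton:
        return vcpus        # one full core per vCPU
    return vcpus // 2       # two hyperthreads per core

print(physical_cores(16, graviton=True))   # 16 (e.g. a 16-vCPU c7g)
print(physical_cores(16, graviton=False))  # 8  (e.g. a 16-vCPU c6i)
```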
Google does; this was a real sticking point for me when I was evaluating cloud options. At the time, GCP was the only provider allocating (and affining) CPU cores to tenants.
I think the title is misleading: it is not a server performance comparison, just a comparison of a single application running a single kind of workload on a fixed dual-core setup.
It also lacks basic insight into the results. E.g., if both AWS and GCP are using ARM's Neoverse V1 design, why is there such a significant performance gap between those two servers? Maybe it was just caused by some bad software configuration in the stack, which could void all the relevant results?
When there are up to 64 cores available, why were only 2 cores used? Surely the other 62 cores are relevant, right?
Only Amazon, i.e. AWS uses Neoverse V1.
Google uses Neoverse N1.
Neoverse N1 is a much older and slower CPU, derived from Cortex-A76.
Neoverse V1 has a performance similar to Cortex-X1, but it also implements SVE (vector extension), which allows improved performance for floating-point applications.
The big ARM cores from the Cortex-X and Neoverse V series have more execution units, up to a double number, in comparison with the medium-size ARM cores from the Cortex-A and Neoverse N series.
For applications without much floating-point or large-integer computation (where the Intel/AMD CPUs are the best), the Cortex-A/Neoverse N cores have a performance somewhere between a thread and a core of the Intel/AMD CPUs, so you need a 128-core Neoverse N CPU to beat a 64-core AMD CPU (which has 128 threads).
For the same kind of applications, the Neoverse V cores have a performance similar to (slightly older) Intel/AMD cores, so a 64-core Neoverse V CPU can match a (previous generation) 64-core AMD CPU.
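That rule of thumb can be written out as a deliberately crude model. The per-thread weights below are the estimates stated above, not benchmark measurements:

```python
# Back-of-envelope throughput model, in arbitrary "x86-thread" units.
# Assumptions (from the comment above, not from benchmarks):
#   - one Neoverse N core ~ one x86 SMT thread (weight 1.0)
#   - one Neoverse V core ~ one x86 core       (weight 2.0)
def relative_throughput(cores: int, threads_per_core: int,
                        weight_per_thread: float) -> float:
    return cores * threads_per_core * weight_per_thread

amd_64c = relative_throughput(64, 2, 1.0)   # 64 cores, SMT2 -> 128 units
n1_128c = relative_throughput(128, 1, 1.0)  # needs 128 N1 cores to match
v1_64c = relative_throughput(64, 1, 2.0)    # 64 V1 cores already match

print(n1_128c >= amd_64c and v1_64c >= amd_64c)  # True
```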
So... the news here is that GCP seems to be significantly slower in practice, where all three products are priced at the same spot.
But... the weird question is why the Azure financials are funny. The listed prices are about the same, but Azure's "annual cost" seems to include a "41% off" discount that I can't find an explanation for. And then they use that discounted figure in the price-efficiency calculation, which seems... what? This is the kind of thing Comcast does in its marketing. What is that discount, and what's the justification for assuming that all users get it in perpetuity?
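To see how much an unexplained discount like that distorts a price-efficiency column, a quick sketch (all prices and scores below are hypothetical, not the article's):

```python
# How mixing pricing models skews a cost-performance table.
# Numbers are made up for illustration.
def cost_performance(score: float, hourly_price: float) -> float:
    """Benchmark score per dollar of annual compute."""
    return score / (hourly_price * 24 * 365)

# The same machine, priced two ways:
on_demand = cost_performance(score=100.0, hourly_price=0.10)
reserved = cost_performance(score=100.0, hourly_price=0.10 * (1 - 0.41))

# A 41% discount inflates apparent cost-performance by ~69%, so
# applying it to one vendor only makes the comparison apples to oranges.
print(round(reserved / on_demand, 2))  # 1.69
```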
Do you need a card to sign up for the free tier? I can't recall if they asked for one at initial registration, but I had to add a card and go through the fraud-detection BS (basically waiting close to a week to be approved) when I upgraded to the paid tier.
Busy playing with this at the moment (trying k8s)... their availability is quite patchy. Struggling to get 4x ARM instances even created, let alone reliably created via Terraform.
Probably not all that favorably, considering that macOS (more precisely, XNU and APFS) isn't really optimized for server use. Hell, to my knowledge most of the networking drivers are quite old, if not poached wholesale from early FreeBSD releases.
Running it on Asahi might tell a different story, but it's almost completely irrelevant considering how painful it is provisioning Macs as opposed to Graviton instances.
M1/M2 will probably be significantly ahead per core, especially price/performance-wise. But yeah, entirely different products: even if it might be worth it on paper, stacking 10 Mac minis is not a real alternative to something like an Altra Q80-30, even though it might be a bit cheaper.
Would be cool if Apple reintroduced the Xserve. Extremely unlikely, though, considering that server margins are way too low for Apple to even consider it, and they really can't bet that their CPUs will be ahead of everyone else's indefinitely.
Have you seen Monterey benchmarks compared against Linux on identical hardware? Performance for server applications and IO-constrained operations consistently favors Linux.
It comes down to the resulting cost and the resources required to reach the SLOs of your services. The major factors are the quality of arm64 support in your workloads and the specific optimizations available to them (e.g. code that relies on AVX-512 vs. something that benefits from SHA256 + AdvSIMD tricks).
Overall, compiled languages like Rust and C/C++ have very good support thanks to years of dev effort invested in LLVM/GCC for arm64. C# will also become one of the better-running platforms starting with .NET 7 (6 is OK, but 7 improves perf dramatically and vectorizes many code paths). Can't say anything about Java, though.
All in all, Graviton 2/3 on AWS is supposed to provide cheaper and denser compute which are its main selling points at the moment.
Azure for example will sell you an 8 vCPU x64 VM for more than an 8 vCPU ARM VM, but the ARM VM vCPUs are entire physical cores, whereas the Intel-compatible VM vCPUs are hyperthreads. So you get more cores and more cache for less money with ARM. For some workloads this makes a big difference.
While true, the Ampere cores are weaker than Intel/AMD ones. It remains an open question how much juice you get out of such hyperthreaded vCPUs at capacity vs. Ampere. Graviton 3, OTOH, is much closer in performance to what is usually offered in Xeon/Epyc.
My employer has been achieving decent (around 15%) cost savings on most services we migrate to ARM on AWS. Not all, though. Some of our services we had to revert to x64 because they didn't perform as well on ARM.
As always, you should measure and decide based on your own software and requirements. My personal experience has been that for throughput-oriented workloads the Graviton stuff tends to be cheaper than x86-64 EC2 instances, but the latest x86-64 instances have better straight-line single-core performance so they are more appropriate for critical latency-sensitive services, and cost more.
It would also be useful to compare AMD/Intel chips with ARM chips on security issues arising from speculative execution, etc.
The AMD/Intel server chips are used more intensively so they have greater scrutiny. Compensating for that, would it be possible to know which implementations are objectively better from a security standpoint?
If developers are using MacBooks, the main benefit is that you don't need to maintain both arm64 and x86 support for application/Docker images.
It's not simple to maintain support for both, and the only way to be sure an application works as expected is to run CI pipelines for both architectures, which can get quite expensive.
You don't have to pay the Intel premium, so you get better performance/$. I'd be willing to bet AWS uses Graviton internally for many of their services like S3, DynamoDB, etc. One thing to keep in mind is that Graviton doesn't use hyperthreading, so for the same vCPU count you get twice the number of physical cores compared to x86.
Assuming your workload can run on ARM, you should think of it as a different CPU generation or AMD vs Intel. For some applications, you might care about speed. Others, +/- 5ms isn't an issue. Then you see how much throughput each supports. Finally, you go through the total cost math. It's hard to make a blanket statement about which will be better for your usecase.
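A hedged sketch of that last step, the total cost math (every number below is made up for illustration):

```python
import math

# Fleet-level annual cost: how many instances you need to hit a target
# throughput, times the per-instance price. All inputs are hypothetical.
def annual_cost(required_rps: float, rps_per_instance: float,
                hourly_price: float) -> float:
    instances = math.ceil(required_rps / rps_per_instance)
    return instances * hourly_price * 24 * 365

# A slower-but-cheaper ARM instance can still win at the fleet level:
x86 = annual_cost(10_000, rps_per_instance=1_250, hourly_price=0.17)
arm = annual_cost(10_000, rps_per_instance=1_000, hourly_price=0.13)

print(arm < x86)  # True: more ARM boxes, but a cheaper fleet overall
```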
It's an auction price. I'm paying about half the normal cost on average.
The 'high' price was set as a ceiling in case my instances got killed because other AWS users outbid me. But the spot price has never risen that high.
The instances were killed anyway, so the cause must be some technical glitch, not the price.
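That "never outbid" claim is easy to check against the spot price history; a minimal sketch (the prices below are made up):

```python
# If the spot price never rose above the bid, the interruptions were
# not price-driven. The price history here is hypothetical.
def outbid_events(price_history: list, bid: float) -> int:
    """Count samples where the spot price exceeded our bid."""
    return sum(1 for p in price_history if p > bid)

history = [0.021, 0.019, 0.024, 0.022]  # made-up hourly spot prices
print(outbid_events(history, bid=0.10))  # 0 -> we were never outbid
```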
It's a CPU- and RAM-intensive process. Something in the VM governor kills VM processes if they exceed some threshold, and that's what is killing my instances.