
Is Arm ready for server dominance? - PeterCorless
https://www.scylladb.com/2019/12/05/is-arm-ready-for-server-dominance/
======
walrus01
My concern about this is that with the failure of Qualcomm Centriq, there is
no industry-standard, affordable, easy-to-buy ARM-based server platform that
ordinary people and small/medium-sized businesses can acquire.

It's great that Amazon has ARM based stuff, but it's something proprietary
they're purchasing in large quantities from a manufacturer they have a very
close relationship with. Undoubtedly the physical hypervisor platform and
motherboard these things are running on is something totally bespoke and
designed to Amazon's unique requirements.

I can't pull out my visa card and go buy a (atx, microatx, mini-itx) format
motherboard for an ARM CPU, the CPU itself, RAM, etc, and build a system to
run debian, centos, RHEL, ubuntu whatever on.

This means that, sure, you can get an EC2 ARM based server, but it's something
you can't physically own and you'll be paying cloud based service rates
forever if you want to keep running it. There are some categories of business
and government entities where not having things on-premises, or fully owning
and controlling the hypervisor all the way down to the bare metal, is a
non-starter.

If the ARM platform Amazon is buying becomes truly price/performance
competitive with a single/dual-socket Xeon or Threadripper/EPYC, it also gives
Amazon a possible competitive advantage over any medium-sized cloud VM
provider currently selling Xen- or KVM-based VMs on x86-64 hypervisors.

Based on what's available on the market right now I see no signs of there
being a viable hardware-purchasing alternative to Intel or AMD based
motherboards and CPUs.

~~~
derefr
> I can't pull out my visa card and go buy a (atx, microatx, mini-itx) format
> motherboard for an ARM CPU, the CPU itself, RAM, etc, and build a system to
> run debian, centos, RHEL, ubuntu whatever on.

The difference between ARM and x86 is that there's no standardized ARM
interface between the motherboard and the CPU—no ARM equivalent to ACPI. Every
integration of an ARM CPU into a board is bespoke. So you have to buy "a board
with an ARM CPU on it", not just an ARM CPU and board separately.

But, if you relax that restriction, it's not like it's hard to acquire "a
board with an ARM CPU on it." 80% of single-board computers (e.g. the
Raspberry Pi) are "a board with an ARM CPU on it." You can wipe pretty much
any Android device (not just phones, but also HDMI "streaming boxes", which
are a convenient form-factor for basing a workstation on) and install Linux on
them. There are also some higher-end development/SDK boards for ARM embedded
systems, like the Nvidia Jetson. What more do you need?

~~~
walrus01
My first question would be: why has no industry trade group attempted to
define a standard socket, or use an existing physical socket pin-out?

It's not hard to acquire something with an ARM CPU on it, but at prices an
ordinary person can afford, they're all in the category of toy computers. Try
to find an affordable ARM system with an M.2 2280 NVMe SSD slot on it like you
can find on a $100 desktop x86-64 motherboard. Or multiple PCI-Express 3.0
x8/x16 slots.

I've previously spent many years working for a hardware manufacturer. My
personal theory is that this is a real example of a chicken-and-egg problem
related to economies of scale. Nobody wants to spend tens of millions of
dollars tooling up to produce socketed ARM CPUs and motherboards, which may
or may not be price/performance competitive with current-gen Intel and AMD
parts by the time they're ready for release. And there's a huge risk in
manufacturing something like that and then discovering that the sales volumes
are really low.

Look at the sheer massive quantities that the top-ten Taiwanese motherboard
manufacturers churn out every year.

> You can wipe pretty much any Android device (not just phones, but also HDMI
> "streaming boxes", which are a convenient form-factor for basing a
> workstation on) and install Linux on them

No, you really can't, and phones aren't servers. Take any modern $600
smartphone and try installing something very close to stock Debian or CentOS
on it. Being able to maybe boot a Linux kernel on something doesn't mean that
there's anything like the market demand needed to make that hardware platform
a target for a _whole distribution_.

~~~
mattl
These have an M.2 slot. [https://www.seeedstudio.com/ROCK-Pi-4-Model-B-4GB-p-4137.htm...](https://www.seeedstudio.com/ROCK-Pi-4-Model-B-4GB-p-4137.html)

[https://www.youtube.com/explainingcomputers](https://www.youtube.com/explainingcomputers)
is a good YouTube channel for reviewing these types of single board computers.

~~~
a012
Having an M.2 slot doesn't mean you can put any M.2 SSD in it. M.2 2280 is
the most common size and is easy to buy in stores or online; smaller sizes are
mostly for OEM devices, and you don't have many choices available.

------
PeterCorless
An important update has been made to the microbenchmarks on this page. I will
quote my editorial comment for you:

"Editor’s Note: The microbenchmarks in this article have been updated to
reflect the fact that running a single instance of stress-ng would skew the
results in favor of the x86 platforms, since in SMT architectures a single
thread may not be enough to use all resources available in the physical core.
Thanks to our readers for bringing this to our attention."

You'll note that the newly used stress-ng command is: "stress-ng
--metrics-brief --cache 16 --icache 16 --matrix 16 --cpu 16 --memcpy 16
--qsort 16 --dentry 16 --timer 16 -t 1m"

In the original runs, each of these flags was given a count of 1. This update
shows even better numbers for Arm than we originally produced. Thanks to our
assiduous readers for pointing this out.

[https://www.scylladb.com/2019/12/05/is-arm-ready-for-server-...](https://www.scylladb.com/2019/12/05/is-arm-ready-for-server-dominance/)
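The editor's note turns on SMT: one busy thread may not saturate a physical core, so a single stress-ng worker can misstate what a core with two hardware threads can do. Passing 16 to a stressor such as `--cpu` starts 16 workers of it instead of 1. As a rough illustration of the same idea (not stress-ng itself), here is a minimal Rust sketch that runs a toy integer workload with either one worker or one worker per logical CPU as reported by `std::thread::available_parallelism`; `run_workers` and its workload are hypothetical.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Hypothetical helper: run `n` CPU-bound worker threads for roughly `duration`
/// and return the number of loop iterations each one completed (a toy op count).
fn run_workers(n: usize, duration: Duration) -> Vec<u64> {
    let stop = Arc::new(AtomicBool::new(false));
    let handles: Vec<_> = (0..n)
        .map(|i| {
            let stop = Arc::clone(&stop);
            thread::spawn(move || {
                let mut ops: u64 = 0;
                let mut x: u64 = i as u64 + 1;
                loop {
                    // Toy integer work (an LCG step), standing in for a real stressor.
                    x = x
                        .wrapping_mul(6364136223846793005)
                        .wrapping_add(1442695040888963407);
                    ops += 1;
                    // Relaxed is enough: the flag only needs to become visible eventually.
                    if stop.load(Ordering::Relaxed) {
                        break;
                    }
                }
                std::hint::black_box(x); // keep the work from being optimized away
                ops
            })
        })
        .collect();

    thread::sleep(duration);
    stop.store(true, Ordering::Relaxed);
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    // Logical CPUs, i.e. hardware threads (two per core on SMT x86 parts).
    let n = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    let one: u64 = run_workers(1, Duration::from_secs(1)).iter().sum();
    let all: u64 = run_workers(n, Duration::from_secs(1)).iter().sum();
    println!("1 worker: {one} ops; {n} workers: {all} ops");
}
```

Comparing the two totals on a given machine gives a crude sense of how much throughput a single worker leaves unused; it is no substitute for stress-ng's own metrics.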

------
RcouF1uZ4gsC
I think one of the big issues may be with high-performance multi-threaded
code. x86 (I am including x64 in this designation) has a much stronger memory
model than ARM. This has two implications. First, x86 is a lot more tolerant
of data races and missing explicit memory fences. When you port server
applications that have been running well on x86 to ARM, you may be in for some
surprises as data races and missing fences now manifest as data corruption.

The second implication is that on x86, the gap between a sequentially
consistent memory order and a relaxed memory order is not that large. Thus,
many programmers may use atomics with sequentially consistent memory order to
reduce complexity. On x86, this will generally yield decent performance.
On ARM, that gap is much bigger and you are liable to have severe performance
regressions.
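The first implication can be made concrete with a one-writer/one-reader handoff: a payload is written, then a flag is set to publish it. Below is a minimal Rust sketch of that pattern (the `Slot` type and `publish_and_read` function are hypothetical, for illustration). The `Release` store on the flag paired with the `Acquire` load is what guarantees the reader sees the payload. If both flag operations were `Relaxed`, the code would be wrong on every platform, but x86's stronger ordering tends to hide the bug, while on ARM the reader can observe the flag before the payload.

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical shared slot: one writer publishes a value, one reader consumes it.
struct Slot {
    data: AtomicU64,
    ready: AtomicBool,
}

/// Publish `value` on one thread and read it back on another.
/// The Release store on `ready` pairs with the Acquire load in the reader, so the
/// reader is guaranteed to see the write to `data` made before the flag was set.
/// With Relaxed on the flag, that guarantee is gone: x86 hardware keeps the two
/// stores in order, so the bug often stays hidden there, but ARM may not.
fn publish_and_read(value: u64) -> u64 {
    let slot = Arc::new(Slot {
        data: AtomicU64::new(0),
        ready: AtomicBool::new(false),
    });

    let writer = {
        let slot = Arc::clone(&slot);
        thread::spawn(move || {
            slot.data.store(value, Ordering::Relaxed); // payload
            slot.ready.store(true, Ordering::Release); // publish
        })
    };

    let reader = {
        let slot = Arc::clone(&slot);
        thread::spawn(move || {
            // Spin until the flag is visible; Acquire makes the payload visible too.
            while !slot.ready.load(Ordering::Acquire) {
                std::hint::spin_loop();
            }
            slot.data.load(Ordering::Relaxed)
        })
    };

    writer.join().unwrap();
    reader.join().unwrap()
}

fn main() {
    println!("{}", publish_and_read(42)); // prints "42"
}
```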

~~~
dman
One upside of code increasingly being written in languages that are
predominantly single threaded like Python/JS is that these issues do not
matter as much.

~~~
anonuser123456
If you want to exploit parallel execution for performance, not having
parallelism is not a benefit.

~~~
PeterCorless
We do parallelism _across_ CPUs and nodes. We run single-threaded to get the
most out of a CPU in a shared-nothing architecture. Many single-threaded apps
aren't written to really take advantage of all a CPU has to offer. But there
are also prices to pay to run multi-threaded: context switches, etc.
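In a shared-nothing design, data is partitioned so that each core's thread owns its slice outright and other threads reach it only by message passing, which avoids locks and cross-core contention. The Rust sketch below is a hypothetical, much-simplified illustration of that idea (`ShardedStore` and its methods are invented for this example; it is not Scylla's or Seastar's code): each shard thread owns a `HashMap`, and keys are routed to shards by hash over `std::sync::mpsc` channels. A real system would also pin each shard to a core; this sketch does not.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::mpsc::{channel, Sender};
use std::thread::{self, JoinHandle};

/// Requests a shard can handle. Replies travel back over a per-request channel.
enum Request {
    Put(String, u64),
    Get(String, Sender<Option<u64>>),
}

/// Hypothetical shared-nothing store: each shard thread owns its own HashMap,
/// so nothing is shared and no locks are needed; others only send it messages.
struct ShardedStore {
    shards: Vec<Sender<Request>>,
    handles: Vec<JoinHandle<()>>,
}

impl ShardedStore {
    fn new(n_shards: usize) -> Self {
        let mut shards = Vec::with_capacity(n_shards);
        let mut handles = Vec::with_capacity(n_shards);
        for _ in 0..n_shards {
            let (tx, rx) = channel::<Request>();
            handles.push(thread::spawn(move || {
                let mut map: HashMap<String, u64> = HashMap::new(); // owned by this thread only
                for req in rx {
                    match req {
                        Request::Put(k, v) => {
                            map.insert(k, v);
                        }
                        Request::Get(k, reply) => {
                            // Ignore send errors: the caller may have stopped waiting.
                            let _ = reply.send(map.get(&k).copied());
                        }
                    }
                }
            }));
            shards.push(tx);
        }
        ShardedStore { shards, handles }
    }

    /// Route a key to the shard that owns it.
    fn shard_for(&self, key: &str) -> &Sender<Request> {
        let mut h = DefaultHasher::new();
        key.hash(&mut h);
        &self.shards[(h.finish() as usize) % self.shards.len()]
    }

    fn put(&self, key: &str, value: u64) {
        self.shard_for(key).send(Request::Put(key.to_string(), value)).unwrap();
    }

    fn get(&self, key: &str) -> Option<u64> {
        let (tx, rx) = channel();
        self.shard_for(key).send(Request::Get(key.to_string(), tx)).unwrap();
        rx.recv().unwrap()
    }

    /// Close every shard's inbox and wait for the shard threads to exit.
    fn shutdown(self) {
        drop(self.shards);
        for h in self.handles {
            h.join().unwrap();
        }
    }
}

fn main() {
    let store = ShardedStore::new(4);
    store.put("user:1", 10);
    store.put("user:2", 20);
    println!("{:?} {:?}", store.get("user:1"), store.get("user:3")); // prints "Some(10) None"
    store.shutdown();
}
```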

------
MrBuddyCasino
"AWS, the biggest of the existing cloud providers released an Arm-based
offering in 2018 and now in 2019 catapults that offering to a world-class
spot. With results comparable to x86-based instances and AWS’s sure ability to
offer a lower price due to well known attributes of the Arm-based servers like
power efficiency, we consider the new M6g instances to be a game changer in a
red-hot market ripe for change."

I'm not sure how that conclusion follows from the numbers presented. Yes, the
new ARM processor has become much faster than the older one, but it clearly
loses against x86 in CPU-heavy benchmarks.

Might be a good option for I/O limited workloads, as the NVMe storage is newer
and therefore faster.

~~~
aloknnikhil
Exactly. In fact, I'm sure there will be newer x86 instances paired with the
new NVMe storage, and then the advantage is lost.

Perhaps the power saved will translate into cheaper instances. But I don't
think it's worth the performance penalty.

~~~
PeterCorless
Much of this depends on whether an app has true linearity in scale-out. If you
can use horizontal scalability, you can get the same (or better) aggregate
performance while still reaping savings in both power and dollars, as you
note. It's analogous to how SSDs allowed you to get "good enough" performance
for a database compared to all-RAM instances. You could still meet your SLAs
and pocket the difference. It's a game changer in that way.
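The scale-out argument is simple arithmetic: if throughput grows linearly with node count, what matters is the cost of meeting a throughput target, not per-node speed. The Rust sketch below uses made-up figures (not results from the article) purely to show how a slower but cheaper instance type can meet the same target for less; `instances_needed` and `hourly_cost` are hypothetical helpers.

```rust
/// Instances needed to reach `target_ops` when each one sustains `ops_per_instance`,
/// assuming perfectly linear scale-out (the assumption the argument relies on).
fn instances_needed(target_ops: f64, ops_per_instance: f64) -> u32 {
    (target_ops / ops_per_instance).ceil() as u32
}

/// Hourly cost of meeting `target_ops` with a given instance type.
fn hourly_cost(target_ops: f64, ops_per_instance: f64, price_per_hour: f64) -> f64 {
    instances_needed(target_ops, ops_per_instance) as f64 * price_per_hour
}

fn main() {
    // Hypothetical figures for illustration only; not measurements from the article.
    let target = 1_000_000.0;
    let faster = hourly_cost(target, 100_000.0, 0.80); // 10 instances
    let cheaper = hourly_cost(target, 80_000.0, 0.60); // 13 instances
    println!("faster: ${faster:.2}/h, cheaper: ${cheaper:.2}/h"); // prints "faster: $8.00/h, cheaper: $7.80/h"
}
```

With these illustrative inputs the slower instance type needs 13 nodes instead of 10 but still comes out slightly cheaper per hour; if scaling is sublinear, the comparison can flip.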

------
magriz
ARM has recently made some real progress in HPC. HPE is delivering ARM
clusters in Europe. Fujitsu has developed an "optimised" ARM CPU with SVE, and
it will power Japan's exascale HPC system[1].

[1][https://en.wikipedia.org/wiki/Fugaku_(supercomputer)](https://en.wikipedia.org/wiki/Fugaku_\(supercomputer\))

~~~
petschge
And Cray will happily sell you an XC50 with Arm cores inside if you want that.

------
iagovar
I tried ARM servers on Scaleway and honestly, unless you're something of a
sysadmin or you're motivated, it's mostly dealing with issues and getting less
power overall.

Also, AFAIR they were around the same price as x86 instances.

But then again, I have almost no sysadmin skills, so maybe it was my lack of
knowledge.

~~~
bisby
My experience with ARM (locally anyway, Raspberry Pis and Pinebooks) is that
everything works great, if it works.

ARM binaries available? Great. You're all set. (Hopefully it's for the right
generation of ARM, though. I can't get the Steam Link software to run on my
Pinebook because it does Raspberry Pi hardware version checks that obviously fail.)

Not available? You're gonna run into one of two scenarios:

Closed source app? Sorry. Just never going to get it (unless you want to run
it through qemu at a huge performance hit and YMMV).

Open source app? Compile it yourself! Which takes noticeably longer than x86
for me. Maybe in server this works out. For desktop, compiling st myself was
actually the path of least resistance to getting a terminal I liked. I
wouldn't have wanted to compile Firefox on ARM, though, without some serious
server horsepower. When I used to build Docker images, they would sometimes
take hours for things that took less than a minute on x86.

~~~
qxnqd
> Open source app? Compile it yourself! Which takes noticeably longer than x86
> for me.

Heh, it's funny, right? I switched from an x86 server to an arm server and now
it takes seconds and seconds to log me in using ssh. It's like the server
really struggles when crunching numbers.

~~~
bisby
In fairness, I tried a Raspberry Pi cluster, and it was much slower than my
Xeon server. Insanely slower.

But I could run the entire Pi cluster off a 6-port phone charger instead of
an 800 W power supply. The power bill was one of my driving motivations, but
ultimately I went back to x86 for performance.

------
tombert
I don't know much about the cloud hosting for ARM stuff (since I don't work in
that space), but I have been extremely happy with my ARM home-server setup in
my basement. Docker swarm has been extremely nice on my ODroids, and I
recently upgraded to the Nvidia Jetson Nano, which has perfectly fine
Kubernetes support.

I'll admit that maybe I'm not doing the most elaborate stress tests, but I
mostly use them for my video transcoding and my (very) recent interest in
machine learning, and I haven't had many issues. The thing that's given me the
biggest headache is the mediocre ZFS support in older versions of Ubuntu, which
has largely been fixed.

~~~
syntheticnature
I've been thinking about moving my home server to an ARM-based setup to reduce
power consumption/fan noise. My situation is closer to 'glorified NAS' that
runs a few additional oddball things, though. Are you just using USB 3.0 to
SATA in the cases where such is needed?

~~~
rb808
I have a Pentium NUC that works fine for this, makes no noise, and is still x86.

~~~
tombert
I've debated buying a NUC, but at least on Intel's website, buying just the
board costs somewhere in the neighborhood of 500 USD; for that price I can buy
10 Raspberry Pi 4s or ODroid XU4s. Granted, in order to use them to their
full potential, you end up having to learn a lot about distributed computing
(which is a bonus for geeks like me, but maybe not for most people), but if your
goal is to use it as a server, the NUCs seemed a bit overpriced to me.

That said, if anyone _is_ looking to stay within the x86/x64 family of CPUs, I
actually recommend looking for a used Wyse/Dell thin client on eBay. You can
often get a decent quad-core system with USB 3.0 and 4-8 GB of RAM for around a
hundred USD.

~~~
henryfjordan
That NUC is still going to have more computing power than your 10-node RPI
cluster though. By a lot.

------
microcolonel
I bought an AArch64 desktop board based on one of the newer NXP manycore CPUs,
because it seems to be the first of its kind.

Every SoC vendor should sell an mATX or Mini-ITX board compatible with PC
components, if they want server adoption.

That goes especially for any vendor facing the even harder uphill battle of
bringing RISC-V to servers: the server market was dominated by PC clones for a
reason.

------
bob1029
I don't think it's quite there yet. As others have noted, porting x86 apps to
ARM can be fraught with issues concerning memory models and concurrency,
especially for apps written in unmanaged languages. Newer apps written in
things like .NET Core (especially if you can keep any native dependencies out
of the equation) are probably going to be a lot easier to port when the time
is right.

I think we'd have to be at a point where the ARM server is <50% the cost of
the x86 server while offering equivalent real-world performance to make the
jump worth it for the average shop. You'd also have to have a very accessible
ecosystem of reliable ARM machines that developers could purchase and hack on.
There are many businesses that will happily incinerate millions of dollars to
keep x86 around just because changing things is frowned upon or otherwise
scary.

For some applications ARM is already here, and it's an excellent approach. But
for most it's still somewhere on the horizon.

------
rwmj
Can you meaningfully benchmark stuff in the cloud? It seems that to make any
claims about price/performance you'd want to use two local servers dedicated to
the benchmark, with hardware as near identical as possible (apart from the CPUs,
of course), and you'd want to know the full cost of both servers.

~~~
jafingi
If you run things in cloud it makes sense to benchmark that?

~~~
rwmj
Sure - you're benchmarking something a bit different from whether ARM is ready
for "server dominance". You're essentially benchmarking Amazon's prices to
early adopters of ARM vs the cutthroat x86 cloud marketplace. That may be
interesting for many people but tells you little about ARM hardware.

~~~
glommer
You can't benchmark "server dominance".

You can benchmark what you just described, piece together other parts of the
puzzle, like Nuvia's Series A, and then extrapolate from that.

At the end it's still my opinion, I don't have a crystal ball =)

------
emmanueloga_
Question for Common Lisp hackers: I'm curious how good the quality of the
most popular CL compilers is on ARM.

The table of support for ARM of sbcl [1] and ccl [2] shows only Linux is
supported on ARM (versus all sorts of OSs for x86/AMD64).

I imagine the Intel targets of these compilers are a lot more widely used and
hence have had more opportunities for bug fixing.

[1]: [http://www.sbcl.org/platform-table.html](http://www.sbcl.org/platform-table.html)

[2]: [https://ccl.clozure.com/](https://ccl.clozure.com/)

~~~
aidenn0
ARM64 Linux on sbcl is okay, and steadily improving. IIRC there's a port to
one of the BSDs (Free?) that is almost done.

Obviously there's no ARM64 Windows or Solaris port in the works.

~~~
mugsie
There is a port of Windows already out and running on ARM; MS even released a
laptop running it: [https://www.microsoft.com/en-us/p/surface-pro-x/8vdnrp2m6hhc...](https://www.microsoft.com/en-us/p/surface-pro-x/8vdnrp2m6hhc?activetab=overview)

~~~
aidenn0
I guarantee you no SBCL devs have one of those.

------
Tepix
I'd like to see cheap ARM server hardware for small servers _available for
purchase_. Something relatively low power (also power efficient) that can
replace an entry-level Intel Atom server like those offered by Kimsufi (OVH)
and Online.net.

The Raspberry Pi 4 with 4 GB RAM is getting close in terms of performance, but
it lacks some things I'd like to see in a server, i.e. at least two SATA or
NVMe ports and two LAN ports.

~~~
8fingerlouie
Depending on your preferences, the Helios4 might do the trick
([https://kobol.io/helios4/](https://kobol.io/helios4/)).

Only 2GB RAM, but 4 SATA ports.

Otherwise, Hardkernel.com might have something in their Odroid lineup that
you'd like.

~~~
Tepix
It lacks a second LAN port but I guess you could use USB 3.0 for that.

But the CPU with its Dual Core Cortex A9 (2011) is probably really slow.

------
contingencies
If you want to acquire large numbers of ARM boards for servers, our neighbours
in Zhongshan are Firefly. They make a cluster server capable of 11 x their own
RK3399 6 core 64-bit 'core boards', so 66 cores in total. It's cheap.
[http://shop.t-firefly.com/goods.php?id=111](http://shop.t-firefly.com/goods.php?id=111)

------
birdyrooster
At Kubecon this year, there were three different vendors doing ARM based
storage. Two were capable with NVMe and one only with SATA/SAS. I am sure the
answer to the article's title question is a no for right now, but in terms of
disaggregated storage, I think the answer is yes!

------
luord
The fact that ARM is written as Arm really threw me for a loop here. At first,
I thought they were talking about a new programming language or something.

------
pnako
Yes. ARM and MSP430 will surely dominate servers anytime soon.

------
techie128
Comparing EBS-backed instance performance with an NVMe-backed x86 instance is
plain wrong. I agree with the rest of the benchmark, though.

~~~
glommer
That comparison was never done. All CPU tests are on EBS backed instances, and
in the end the I/O subsystem is compared in isolation for NVMe-backed
instances in both cases.

