
Erlang: Pi2 ARM Cluster vs. Xeon VM - signa11
https://medium.com/@pieterjan_m/erlang-pi2-arm-cluster-vs-xeon-vm-40871d35d356#.uj7dm7a5a
======
Aissen
It was tried before, with professional ARM server hardware:

[http://www.anandtech.com/show/8776/arm-challinging-intel-in-...](http://www.anandtech.com/show/8776/arm-challinging-intel-in-the-server-market-an-overview/9)

[http://www.cnx-software.com/2014/10/26/applied-micro-x-gene-...](http://www.cnx-software.com/2014/10/26/applied-micro-x-gene-64-bit-arm-vs-intel-xeon-64-bit-x86-performance-and-power-usage/)

Spoiler: Intel still has the upper hand (for now), at least in perf-per-watt.

Even the Green500 ([http://green500.org/news/green500-list-november-2015?q=lists...](http://green500.org/news/green500-list-november-2015?q=lists/green201511)) is dominated by Xeon-based clusters.

~~~
jdietrich
Bear in mind that the X-Gene is on a 35nm node and the A1100 is on 28nm,
versus 14nm for Xeon-D. Applied Micro have announced a 16nm X-Gene 3 for this
year via TSMC, and Qualcomm have a 14nm server processor in development via
GlobalFoundries. We'd expect at least a 30% reduction in power from this
change in process node.
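To make the arithmetic behind that expectation explicit: if performance stayed the same and power dropped by 30%, perf-per-watt would rise by a factor of 1/(1 - 0.3), roughly 1.43x. The sketch below assumes performance is held constant across the node change (vendors often spend part of the saving on higher clocks instead); the function name is hypothetical.

```python
def perf_per_watt_gain(power_reduction: float) -> float:
    """Relative perf-per-watt improvement when performance is held constant
    and power drops by `power_reduction` (e.g. 0.30 for 30%).

    Assumption: same performance before and after the process shrink."""
    if not 0.0 <= power_reduction < 1.0:
        raise ValueError("power_reduction must be in [0, 1)")
    # With performance fixed, perf/W scales with 1 / power.
    return 1.0 / (1.0 - power_reduction)


gain = perf_per_watt_gain(0.30)
print(f"{gain:.2f}x perf-per-watt")  # 1.43x
```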

I think we could well see ARM server processors close the gap with Intel by
the end of this year, which is an exciting prospect.

IMO ARM's trump card is the diversity of their IP ecosystem. This has been
integral to their success in the mobile SoC market and could become a genuine
game-changer in the server market.

------
claudius
Keep in mind that this Xeon (E5620) was released in 2010, whereas the Cortex
A7-Quad seems to be from 2015? So it would be interesting to compare the ARM
cluster against a comparably new Intel CPU (say Haswell or Broadwell); in my
experience, even a mid-range notebook CPU (i5-5500U) from last year easily
outperforms Xeons (E5-2650) even from 2012.

~~~
vardump
> Keep in mind that this Xeon (E5620) was released in 2010, whereas the Cortex
> A7-Quad seems to be from 2015?

2015? The Cortex A7 was released in 2011 as a _lower performance_, energy-efficient in-order core. It's significantly slower per clock than the Cortex A9, released in 2007!

If you want fairness, just compare it to any Xeon available in 2007 running
with just a few watts.

~~~
claudius
Wikipedia¹ lists a Cortex A7 „designed specifically for Raspberry Pi 2“. The
Raspberry Pi 2 was released in 2015, so the CPU was released in 2015?

[1]
[https://en.wikipedia.org/wiki/ARM_Cortex-A7](https://en.wikipedia.org/wiki/ARM_Cortex-A7)

~~~
vardump
From the Wikipedia article:

> Broadcom BCM2836 (quad-core A7 + VideoCore IV GPU), designed specifically
> for Raspberry Pi 2[7]

No, it says the _BCM2836_ was designed specifically for the Raspberry Pi 2. The Cortex A7 is a pretty old design; that's probably how RPi 2 costs could be kept low.

Edit:

Here's AnandTech's 2011 article about the Cortex A7:

[http://www.anandtech.com/show/4991/arms-cortex-a7-bringing-c...](http://www.anandtech.com/show/4991/arms-cortex-a7-bringing-cheaper-dualcore-more-power-efficient-highend-devices)

Here's Wikipedia's Cortex-series DMIPS/MHz table. It says the Cortex A7 (1.9 DMIPS/MHz) is roughly as fast per clock as the Cortex A8 (2.0 DMIPS/MHz), released in _2003_:

[https://en.wikipedia.org/wiki/Comparison_of_ARMv7-A_cores](https://en.wikipedia.org/wiki/Comparison_of_ARMv7-A_cores)
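Since DMIPS/MHz is a per-clock figure, absolute Dhrystone throughput is that number times the clock and the core count. A minimal sketch using the figures quoted above; the 900 MHz clock is the Raspberry Pi 2's stock BCM2836 frequency, and the helper name is ours, not from any library:

```python
# Per-clock Dhrystone figures as quoted from Wikipedia's ARMv7-A comparison table
DMIPS_PER_MHZ = {"Cortex-A7": 1.9, "Cortex-A8": 2.0}


def dmips(core: str, mhz: float, cores: int = 1) -> float:
    """Absolute Dhrystone MIPS = DMIPS/MHz x clock (MHz) x number of cores."""
    return DMIPS_PER_MHZ[core] * mhz * cores


# Raspberry Pi 2 (BCM2836): four Cortex-A7 cores at a stock 900 MHz
per_core = dmips("Cortex-A7", 900)
whole_chip = dmips("Cortex-A7", 900, cores=4)
print(f"{per_core:.0f} DMIPS per core, {whole_chip:.0f} DMIPS total")

# Per-clock ratio of A7 to A8: "roughly as fast per clock"
ratio = DMIPS_PER_MHZ["Cortex-A7"] / DMIPS_PER_MHZ["Cortex-A8"]
print(f"A7/A8 per-clock ratio: {ratio:.2f}")
```

Dhrystone is a synthetic integer benchmark, so these numbers say little about memory- or I/O-bound workloads like the Erlang test in the article.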

~~~
claudius
Interesting. So how would a reasonably new ARM core compare to this? The point
is, the Xeon is definitely outdated and the Cortex A7 seems to be a little bit
dusty as well, so…what exactly was the point of this comparison? Two CPUs from
two different (or equal, if you take 2010/2011) points in time are roughly
comparable?

~~~
darklajid
I feel the point of the comparison was answering the question:

"Might it be feasible to replace our current infrastructure¹ with an arm-based solution², and at the same time: can we benefit from more cores and lower clock speeds?"

1: The article uses their current dev system - there's no reason to buy a
shiny new setup if that dev system works fine today and is 'fast enough'

2: The article explicitly mentions upcoming ARMv8 server boards but uses RPi2s as a general (and low) approximation of 'cheap arm performance' and to test their setup in a multi-node environment (vs. 'one single vm')

I think the article explains quite well what it wanted to do and doesn't claim
'arm is better' or anything like that. For their workload, using Erlang/OTP
and their real world application, a multi-node arm-based architecture might be
feasible.

~~~
vidarh
But then pointing out the cost of their current system confuses matters: as I've pointed out in another comment, a system with similar performance can easily be had today for 1/5th of the price they list. They're comparing against a system that is old, slow and expensive.

------
ansible
There's no argument that the Xeon server is going to be less power-efficient
than the RP2 cluster.

Before you go out and order a bunch, do keep in mind that the RP2 cluster doesn't support ECC memory or other features often considered necessary in a server environment, like remote management. I'm pleased to see the A1100 does support ECC.

Still, it is an interesting test. I look forward to reading more about ARM
servers. And I hope the Mill Computing guys will ship something eventually
too.

~~~
derefr
> ECC memory ... remote management ... often considered necessary in a server
> environment

Do note that an idiomatic Erlang cluster (where the distribution of the system
is used to achieve fault-tolerance, rather than for higher throughput) doesn't
strictly need ECC memory (though, obviously, it wouldn't hurt.)

Also, such a system _should_ be plenty manageable on its own—insofar as even
an Erlang unikernel makes for a relatively ops-friendly system. There's little
"remote management" can do that can't be done from an Erlang remsh. (Erlang
even offers easy ways to expose e.g. SNMP MIBs.)

If you're talking about being able to re-image the system into being a
completely non-Erlang system remotely, that might be hard—but a cluster-of-
inexpensive-low-power-nodes setup like this only really has this one niche
use-case to begin with, so I'm not sure what else you'd want to change it
into.

~~~
ansible
ECC can prevent program crashes, but it can also prevent the computation of incorrect results. Even if your memory is basically reliable, cosmic rays can still flip bits. As RAM sizes keep growing, so do your chances of seeing a memory error.
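One simple way to see why larger RAM raises the odds: if bit errors arrive independently at a constant rate per gigabyte-hour, their count follows a Poisson distribution and P(at least one error) = 1 - exp(-rate x GB x hours). The rate below is a made-up placeholder, not a measured value; only the trend with RAM size is the point. The 1GB and 96GB sizes echo the Pi 2 and the 2010-era server mentioned in this thread, and the function name is hypothetical.

```python
import math


def p_at_least_one_error(ram_gb: float, hours: float, errors_per_gb_hour: float) -> float:
    """P(>= 1 error) under a Poisson model: 1 - exp(-rate * GB * hours)."""
    expected_errors = errors_per_gb_hour * ram_gb * hours
    # expm1 keeps precision when expected_errors is tiny
    return -math.expm1(-expected_errors)


RATE = 1e-6       # HYPOTHETICAL errors per GB-hour, placeholder only
HOURS = 24 * 365  # one year of uptime

for gb in (1, 8, 96):
    p = p_at_least_one_error(gb, HOURS, RATE)
    print(f"{gb:>3} GB: P(>=1 error in a year) = {p:.4f}")
```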

My main point: if we're talking about costs and power consumption, it isn't quite fair to compare server-grade hardware (which is more expensive than consumer-grade) with an RP2.

------
insanebits
The test seems to be biased toward the Pis. First, because of the VM's RAM: who gives a 2-core VM only 3GB of RAM? A typical server from 2010 has 72-96GB of RAM, so a VM on it would typically get at least 8GB, which would be quite a different story.

As for the prices: you can get a used Gen6 server for under $400, which is comparable to the RPis considering that you're getting at the very least 6 times the performance (if you buy it with 2x L5640 processors with 6 cores each). That means you would need at least 18 (3*6) RPis to match its performance, in theory, which would total close to $700 (assuming $35 per Pi).
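Spelling out that estimate: reading "3*6" as "three Pis' worth of baseline performance, times six" (our interpretation), and assuming performance scales linearly with Pi count (optimistic for a cluster), the count and board cost work out as below. The boards alone come to $630; the gap to "close to $700" would presumably be power supplies, SD cards and networking, which the comment doesn't itemize. The function name is hypothetical.

```python
import math


def pis_needed(server_speedup: float, pis_per_baseline: int) -> int:
    """Pis needed to match a server `server_speedup` times faster than a baseline
    that `pis_per_baseline` Pis already match. Assumes linear scaling."""
    return math.ceil(server_speedup * pis_per_baseline)


n = pis_needed(server_speedup=6, pis_per_baseline=3)  # 18, as in the comment
board_cost = n * 35                                   # $35 per Pi, boards only
print(n, board_cost)  # 18 630
```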

Raspberry Pis are not really meant to be used as heavy-load servers. They can be used as a cheap cluster for learning purposes, but they are not really a viable option for replacing rack servers, at least today; who knows, maybe some day they will get to the point of competing.

~~~
xt00
Really what somebody needs to do is say "I built a custom board that has 32 octa-core Allwinner A80s on it." The A80 has an A15/A7 big.LITTLE architecture. I would guess the processors cost about $25 each, so the board would be about $800, plus maybe $200 more in other components like RAM and flash. So for $1000 you get a cluster with 256 cores, 32 parallel memory buses and 32 parallel flash chips; that $1000 would get you a pretty epic server. Plus, each of those chips has a PowerVR GPU, so you would have some GPU capability that Xeon servers don't have. A $35 Raspberry Pi is basically $5 of useful stuff and $30 of overhead. So it's better to actually build a legit server board covered in ARM processors and then compare how well that $1000 is spent against a dual 6-core Xeon blade.
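The back-of-the-envelope numbers in that comment can be written down as a tiny model. Every price here is xt00's guess rather than a quoted figure, and the model ignores board design, interconnect and power delivery costs; the class name is hypothetical:

```python
from dataclasses import dataclass


@dataclass
class ArmBoardEstimate:
    socs: int = 32              # Allwinner A80 chips on the board
    cores_per_soc: int = 8      # octa-core: 4x Cortex-A15 + 4x Cortex-A7 (big.LITTLE)
    soc_price: float = 25.0     # guessed price per chip, not a quoted price
    other_parts: float = 200.0  # guessed RAM, flash, etc.

    def total_cores(self) -> int:
        return self.socs * self.cores_per_soc

    def total_cost(self) -> float:
        return self.socs * self.soc_price + self.other_parts


est = ArmBoardEstimate()
print(est.total_cores(), est.total_cost())  # 256 1000.0
```

Changing a single field (say, `soc_price`) shows how sensitive the "$1000 for 256 cores" headline is to the guessed inputs.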

~~~
insanebits
There are GPUs with hundreds of cores, if that's what you're after. And there already are ARM server boards; for example, a quick Google search turned up:
[http://www.cavium.com/newsevents-GIGABYTE-announces-384-Core...](http://www.cavium.com/newsevents-GIGABYTE-announces-384-Core-2U-server-powered-by-Cavium-ThunderX-ARMv8-processors.html)

------
stevencorona
Pretty cool, but I would love to see it compared with a newer Xeon. I have a Xeon-D in my homelab that would be cool to compare it against: eight 2GHz cores @ 45W. There's a 1.7GHz version @ 35W now, too.

~~~
lazyjones
I have an Atom C2750 board (8 cores); it runs at 2.4 GHz with only a heatsink and has a TDP of 20W. It's much slower than current low-end desktop CPUs, but it runs circles around a Raspberry Pi 2. Its unique advantage for server loads is that the board holds 64GB of ECC RAM. The Xeon-D is currently an even better option, though, since some of the server boards with soldered CPUs can be kept fanless too (e.g. Supermicro X10SDV-4C-TLN4F) despite a 35W TDP and much better performance.

~~~
aeroevan
ASRock C2750D4I? I've got one running my home NAS and love it.

------
kitsune_
What about the Xeon-D? Slap a couple of those Mini-ITX SoCs together... very low TDP.

~~~
newman314
I was just reading that the Xeon D is still 5.5x faster than the AMD Opteron
A1100 (which does not have significantly better TDP) so it seems like Xeon-D
is where it's at.

[http://www.anandtech.com/show/9956/the-silver-lining-of-the-...](http://www.anandtech.com/show/9956/the-silver-lining-of-the-late-amd-opteron-a1100-arrival)

I built a home VMware server a while ago using a 4670T, which has similar performance to a Xeon D but lacks ECC and tops out at 32GB. It works pretty darn well even with the multiple VMs I have running on it. I think I measured power consumption at the wall at roughly 40-55W, so it's quite efficient.
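For a sense of what that wall draw means over a year of 24/7 operation: energy in kWh is average watts x 8,760 hours / 1,000. The electricity price below is a hypothetical placeholder, so substitute your local tariff; the function name is ours.

```python
HOURS_PER_YEAR = 24 * 365  # 8760


def annual_kwh(avg_watts: float) -> float:
    """Energy used by a year of 24/7 operation at `avg_watts`, in kWh."""
    return avg_watts * HOURS_PER_YEAR / 1000.0


PRICE_PER_KWH = 0.15  # HYPOTHETICAL tariff in $/kWh; substitute your own
for watts in (40, 55):
    kwh = annual_kwh(watts)
    print(f"{watts} W -> {kwh:.0f} kWh/yr, ~${kwh * PRICE_PER_KWH:.0f}/yr")
```

At 40 W that is about 350 kWh a year, and at 55 W about 482 kWh, before any price is applied.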

------
sciurus
For another take, here are benchmarks of a Raspberry Pi 2 and a Raspberry Pi Zero versus decade-old and modern Intel chips:

[http://www.phoronix.com/scan.php?page=article&item=raspberry...](http://www.phoronix.com/scan.php?page=article&item=raspberry-pi-burst&num=1)

~~~
TheOtherHobbes
That was very revealing - although maybe not a surprise to anyone who has
waited for a web page to load on a Pi.

~~~
fnordfnordfnord
Pretty much exactly what I expected, and matches my experience. Pi2 is almost
equivalent to a 10-15 year old low-end PC.

------
vidarh
Not much point in a price comparison like that when the prices are completely
crazy.

For the price of that blade today you can get _at least_ a 1U dual-CPU quad- or hex-core server with a far newer/faster CPU model, vastly faster storage (NVMe SSDs) and 10GbE ports.

For 2000 euro, I could instead get a server with about a 50% faster CPU, 16GB
RAM, and an 800GB NVMe Intel PCIe SSD that'd trounce that iSCSI any time. With
6Gbps SATA III drives instead, and either dropping down a bit in capacity or
going for spinning rust, you'd pay ~1000 euro for the same machine.

~~~
insanebits
Exactly my point: the prices are what the servers cost when they were brand new, and the processors are among the worst Xeons of that time. The E5600s draw a lot of power without giving very good performance; there is the X5600 series for performance and the L5600 series for efficiency.

------
plumeria
Is the rack [1] custom made? Would like to see a bill of materials for this
setup.

[1]
[https://d262ilb51hltx0.cloudfront.net/max/2000/1*KdGdonIRAPy...](https://d262ilb51hltx0.cloudfront.net/max/2000/1*KdGdonIRAPyJegxmpKMe9Q.jpeg)

~~~
claudius
That looks a lot like a standard 3.5" HDD case as found in many desktop
computers with the Raspberry Pis possibly just lying in it? I might very well
be wrong, but it looks a lot like e.g.
[http://www.overclock3d.net/gfx/articles/2014/11/21055622736l...](http://www.overclock3d.net/gfx/articles/2014/11/21055622736l.jpg)
or [http://www.scythe-eu.com/uploads/tx_cfamooflow/Gekkou-HDD-Ca...](http://www.scythe-eu.com/uploads/tx_cfamooflow/Gekkou-HDD-Cage-2_03.jpg).

~~~
plumeria
Yes, it looks indeed like those cases. Thanks.

------
geerlingguy
I was doing similar comparisons of a cluster of 6 Pi 2s vs my Core i7, as well
as DigitalOcean droplets, and I found that the i7 (quad core, single chip,
running 6 VMs) was at least 20% faster in real world use, and using six
DigitalOcean 1GB VMs was about 68% faster:
[http://www.pidramble.com/wiki/benchmarks/drupal](http://www.pidramble.com/wiki/benchmarks/drupal)
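Throughput comparisons like these are commonly stated as "X% faster", meaning the ratio of requests per second minus one. A minimal sketch with made-up numbers (not the pidramble.com results), assuming that definition; note that the same result looks smaller when expressed as time per request, since 20% more throughput is only about 16.7% less time:

```python
def percent_faster(baseline_rps: float, candidate_rps: float) -> float:
    """Throughput gain in percent: (candidate / baseline - 1) * 100."""
    return (candidate_rps / baseline_rps - 1.0) * 100.0


def percent_less_time(baseline_rps: float, candidate_rps: float) -> float:
    """The same comparison expressed as time saved per request, in percent."""
    return (1.0 - baseline_rps / candidate_rps) * 100.0


# HYPOTHETICAL requests/sec figures for illustration only
base, fast = 100.0, 120.0
print(round(percent_faster(base, fast), 1))     # 20.0
print(round(percent_less_time(base, fast), 1))  # 16.7
```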

There are tons of different use cases of course, and if measuring only raw CPU
(not network, not IO, etc.), the benchmarks could be slightly closer, but I'm
not going to recommend clients start ditching cloud infra for colocated Pi
clusters :)

------
newman314
I wonder how this would compare using ODROID-C2 boards.

~~~
mkesper
Maybe ODROID-XUs even more so, because of SATA?

~~~
ansible
Yeah, I like the ODROID-XU4:

[http://www.hardkernel.com/main/products/prdt_info.php](http://www.hardkernel.com/main/products/prdt_info.php)

The gigabit Ethernet alone is a reason to choose it over the RP2.

~~~
newman314
The C2 is case compatible with a Raspberry Pi as well as having GigE.

So it's almost a drop-in replacement depending on the distribution you use.
I'm trying to get OSMC to officially support the C2 which would essentially
make it a much more powerful media center.

[http://www.geek.com/news/new-odroid-dev-board-outmuscles-a-r...](http://www.geek.com/news/new-odroid-dev-board-outmuscles-a-raspberry-pi-for-just-5-more-1646393/)

------
eddd
Why are the agents here so heavy? I mean, why can't the entire cluster handle more than 256 agents concurrently?

------
_ZeD_
This reminds me of the "Beowulf cluster" slashdot meme...

------
amelius
I wonder about other applications. Say, I have a farm for converting video
from any format to mp4. Would a cluster of Pi2s be a cost-effective solution?

