
Ampere EMAG 64bit Arm Workstation - robin_reala
https://store.avantek.co.uk/ampere-emag-64bit-arm-workstation.html
======
glangdale
I've used a machine with this chip (remotely). It's not terrible; Daniel
Lemire blogs about some early performance impressions here
[https://lemire.me/blog/2019/03/26/hasty-comparison-
skylark-a...](https://lemire.me/blog/2019/03/26/hasty-comparison-skylark-arm-
versus-skylake-intel/) and I've ported simdjson to it
([https://github.com/lemire/simdjson](https://github.com/lemire/simdjson)).

I'm definitely not at the stage where I can say anything definitive about
performance. For a SIMD-intensive task, a preliminary run on a 3.3GHz eMag
core gets about 400 megabytes/second of JSON parsing relative to 2.2
gigabytes/second on a 4.0GHz Skylake. This may come down to having a maximum
of 2 x 128-bit NEON operations per cycle vs 3 x 256-bit AVX2 operations per cycle,
as well as some clock speed differences. Will eventually post about this at
branchfree.org when I don my Nomex long johns (required whenever doing any
benchmark that might imply to anyone that any processor runs faster or slower
than any other processor).
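
As a rough sanity check on that explanation, here is a minimal sketch that compares the peak SIMD issue width and clock speed quoted above with the observed throughput gap. The dictionary layout and function name are illustrative assumptions, and peak issue width is only a crude upper bound, not a performance model.

```python
# Figures quoted in the comment above; all of them are preliminary.
EMAG = {"clock_ghz": 3.3, "simd_ops_per_cycle": 2, "simd_bits": 128, "mb_per_s": 400}
SKYLAKE = {"clock_ghz": 4.0, "simd_ops_per_cycle": 3, "simd_bits": 256, "mb_per_s": 2200}


def peak_simd_gbits_per_second(cpu):
    """Crude upper bound: SIMD bits issued per cycle times clock in GHz."""
    return cpu["simd_ops_per_cycle"] * cpu["simd_bits"] * cpu["clock_ghz"]


# Gap predicted by SIMD width and clock speed alone.
predicted_gap = peak_simd_gbits_per_second(SKYLAKE) / peak_simd_gbits_per_second(EMAG)

# Gap actually observed in the JSON parsing run.
observed_gap = SKYLAKE["mb_per_s"] / EMAG["mb_per_s"]

print(f"predicted from SIMD width and clock: {predicted_gap:.2f}x")
print(f"observed in JSON parsing:            {observed_gap:.2f}x")
```

Under these assumptions, width and clock account for somewhat less than the observed gap, so other factors (memory system, instruction mix, compiler maturity) presumably matter too.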

~~~
linux4kix
Just ran Daniel's tests against our 16-core LX2160, which is still only
clocked at 1.9GHz (will be 2.2GHz in final production). These are our numbers:

binarytrees: 37s mandelbrot: 13.6s fasta: 0.9s

I will give your benchmark a run once I have some time.

~~~
linux4kix
Okay I couldn't help myself, waiting for a build to finish.

./parse jsonexamples/twitter.json

Min: 0.00171284 bytes read: 631514 Gigabytes/second: 0.368695

~~~
glangdale
Whoa, thanks. 368, huh? That's pretty impressive given the 1.9Ghz clock
speed... By coincidence, the 'twitter.json' file was the one I was using as my
slender data point.

Do note that at this stage the benchmark is so preliminary as to be largely
meaningless - I haven't really done much more than eyeball the results. But as
a preliminary data point, that seems to be a stronger showing clock-for-clock
than the eMag one.

~~~
linux4kix
Let me know if you want this data in GitHub. Just testing our overclocking,
bringing the CPU clock up to 2.4GHz and DDR to 3200.

Also building with gcc 8.2; note I needed to change -march=native to
-march=armv8-a+crypto

Min: 0.00111424 bytes read: 631514 Gigabytes/second: 0.566768
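
For anyone who wants to compare these runs clock-for-clock, the following is a small sketch that recomputes throughput from the `Min` time and `bytes read` values printed above, then divides by clock speed. The function name and the per-GHz metric are assumptions for illustration; the benchmark itself only prints gigabytes/second.

```python
def gigabytes_per_second(bytes_read, seconds):
    """Throughput in GB/s (10^9 bytes), from bytes parsed and elapsed time."""
    return bytes_read / seconds / 1e9


# (label, clock in GHz, bytes read, minimum time in seconds) from the runs above.
runs = [
    ("LX2160 @ 1.9GHz", 1.9, 631514, 0.00171284),
    ("LX2160 @ 2.4GHz, DDR 3200", 2.4, 631514, 0.00111424),
]

for label, ghz, nbytes, secs in runs:
    gbps = gigabytes_per_second(nbytes, secs)
    print(f"{label}: {gbps:.4f} GB/s, {gbps / ghz:.3f} GB/s per GHz")
```

The per-GHz figure improves in the overclocked run, which may reflect the faster DDR setting as much as the core clock.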

------
wmf
Packet also has eMAG for $1/hour in the cloud if you want to try before you
buy: [https://www.packet.com/cloud/servers/c2-large-
arm/](https://www.packet.com/cloud/servers/c2-large-arm/)

Geekbench scores:
[https://browser.geekbench.com/v4/cpu/compare/11678329?baseli...](https://browser.geekbench.com/v4/cpu/compare/11678329?baseline=12589322)

~~~
reikonomusha
Looks like Packet is sold out in all regions.

------
andersm
Out of personal interest, I did some very unscientific benchmarking by timing
a build of the Yocto core-image-sato distro for the BeagleBone. All source code
was downloaded beforehand, so that does not factor into the results. All the
machines were running Ubuntu 18.04 Server except the A10, which runs Ubuntu
18.04 Desktop.

Here are the results:

    
    
       34m9.347s  EPYC 7401P   24C/48T 2.2GHz       (Packet c2.medium.x86 bare metal server)
       75m14.661s eMAG         32C/32T 3.3GHz       (Packet c2.large.arm bare metal server)
       96m31.901s i5-8259U      4C/8T  2.30-3.80GHz (Intel NUC8I5, NVMe SSD)
      139m52.184s ThunderX     96C/96T 2.0GHz       (Packet c1.large.arm bare metal server)
      194m52.745s A10-6800K     4C/4T  4.1GHz       (Old self-built desktop, slow-ish SSD)
      535m52.642s Celeron N3150 4C/4T  1.60-2.08GHz (Gigabyte Brix, SSD)
    

I assume the eMAG results will improve somewhat once its support matures, but
the difference to the i5 is disappointingly small. Both ARM machines performed
reasonably well when all cores were used, but the relatively weak per-core
performance showed whenever utilization fell. But although the ThunderX was
slow, looking at 96 cores in htop felt pretty good...
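
To put the table above on a common scale, here is a minimal sketch that converts the `time`-style wall-clock strings to seconds and expresses each build as a multiple of the fastest one. The helper name `to_seconds` and the choice of the EPYC as baseline are assumptions for illustration.

```python
import re

# Wall-clock build times copied from the table above.
RESULTS = {
    "EPYC 7401P": "34m9.347s",
    "eMAG": "75m14.661s",
    "i5-8259U": "96m31.901s",
    "ThunderX": "139m52.184s",
    "A10-6800K": "194m52.745s",
    "Celeron N3150": "535m52.642s",
}


def to_seconds(text):
    """Convert a string such as '75m14.661s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text)
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


baseline = to_seconds(RESULTS["EPYC 7401P"])
for name, wall in RESULTS.items():
    secs = to_seconds(wall)
    print(f"{name:14s} {secs:10.3f}s  {secs / baseline:5.2f}x EPYC")
```

Dividing the i5 time by the eMAG time the same way gives a quick measure of how small the gap between those two machines is.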

------
topspin
That case surprised me. I just built an Intel based Linux system using that
exact case; it's a "be quiet!" Pure Base 600 Black [1]. They "debranded" the
case in the photo with a low-effort edit.

[1]
[https://www.newegg.com/Product/Product.aspx?Item=9SIA68V57M4...](https://www.newegg.com/Product/Product.aspx?Item=9SIA68V57M4892)

~~~
kencausey
Slightly off-topic but what do you think of the case at this point?

~~~
jayalpha
Looks ugly, and why not buy a rack?

~~~
mrweasel
Racks are terrible as desktops

~~~
jayalpha
Why?

~~~
sliken
Generally rack based equipment is designed with less vertical height and
strict front to back airflow. They are also often designed for relatively high
input temperatures, so they are designed for high airflow rates. These combine
to be very loud and often crazy inefficient. Read that as consuming
significant power to cool the equipment.

Mini/Mid/Full tower cases generally are designed to take advantage of heat
wanting to rise. So the intakes are often large and low (140mm is not
unusual), and the exhaust is at the top rear. Even 200mm isn't unusual for the
exhaust. Air-moving efficiency increases quickly with fan size. 1U fans often
spin at 15k rpm and make more noise and vibration than anything else. Desktop
fans are often 1200 rpm or lower and just take a few watts to dump substantial
heat.

As an example, the Fractal Design Mini C (a small, quiet, under $100 case) has
room for 2 x 140mm in the lower front, and 2 x 140mm on top. It's a smaller
case, so there's only 120mm in the rear. With $80 ish for the case and a few
extra fans (Fractal Design isn't bad, but not quite class leading) you can
easily dump a few hundred watts quietly.

Finding a rack mount case that can move as much air as quietly is challenging
and, when possible, often prohibitively expensive and/or crazy loud. Last time
I built one to house a single socket motherboard and a GPU (much like a
desktop) it included 4 Delta fans so loud that I needed ear protection to be
in the same room.

~~~
jayalpha
I don't see your point. Maybe it is true for off the shelf stuff. I may assume
that a lot of YC readers build their own system.

I just built a rack for my desk. I have not really put it under load, but I
have yet to see the GPU, power supply or case fan spinning. For now I have
only one fan for the case. The only fan that moves, but silently, is the
CPU fan. The only thing that you can hear: the 10 TB HDD.

------
fotcorn
The only available CPU from Ampere EMAG seems to be the 8180, so I think this
workstation also uses this.

Wikichip:
[https://en.wikichip.org/wiki/ampere_computing/emag/8180](https://en.wikichip.org/wiki/ampere_computing/emag/8180)

AnandTech: [https://www.anandtech.com/show/14141/ampere-emag-in-the-
clou...](https://www.anandtech.com/show/14141/ampere-emag-in-the-cloud-32-arm-
core-instance-for-1hr)

TL;DR: 32 Cores, up to 3.3 Ghz boost, 32MB L3 cache, 125W TDP

~~~
wmf
Where have I seen that model number before... oh yeah:
[https://ark.intel.com/content/www/us/en/ark/products/120496/...](https://ark.intel.com/content/www/us/en/ark/products/120496/intel-
xeon-platinum-8180-processor-38-5m-cache-2-50-ghz.html)

~~~
0x8BADF00D
Speaking of Xeon, you could probably put together a much cheaper workstation
with an older Xeon over the eMag. I just don’t see myself picking this up, at
its current price point.

~~~
zokier
Without knowing what the perf is like, it is really difficult to say. 32 cores
@ 2.8-3.3 GHz sounds like quite a lot; I don't think getting a 32-core Xeon
would be anywhere near as cheap as this.

~~~
scruffyherder
There is a Chinese motherboard that'll drive 2 Xeon processors. It fits in an
E-ATX case. It should be ~ $250 USD shipped.

I used some cheap ($20 USD!) Xeons (E5-2620v2) and have 24 cores.. so it's
doable for $500.

~~~
mrslave
To which model of motherboard dost thou allude?

------
nottorp
Is there anything similar but more hobbyist priced? DIY would be fine, but all
motherboards that I know of don’t have RAM slots or other desktop-like
amenities.

------
linux4kix
[https://www.cnx-software.com/2019/03/29/clearfog-itx-
worksta...](https://www.cnx-software.com/2019/03/29/clearfog-itx-workstation-
ultimate-arm-developer-platform/)

~~~
nottorp
That link whined at me for denying targeted advertising instead of showing me
the article so I guess I won't ever know what it's about...

~~~
linux4kix
[https://www.solid-run.com/nxp-lx2160a-family/clearfog-
itx/](https://www.solid-run.com/nxp-lx2160a-family/clearfog-itx/)

better :) we just announced it.

~~~
floatboth
Hi, I posted a comment on cnx-software but it's awaiting moderation, reposting
here:

> work with Linaro on the work they did for the MCBin to get a UEFI plugin
> that uses a QEMU emulation layer for supporting booting Video Card BIOS

Linaro? I thought the EDK2 for the MCbin was maintained by Semihalf :)

Anyway, 16 cores and dual-channel RAM for $500 is a lot more interesting than
4 cores and single-channel for $269. A few questions:

Will the EDK2 tree be fully open source? There was that one blob on the MCbin…

Will the core be overclockable?

~~~
linux4kix
Semihalf is maintaining the tree and patches, but I believe the original work
done to support using a GPU on the platform was done by Linaro. I could be
mis-remembering but I believe that is how things happened.

We are waiting for the final release of the EDK2 tree from NXP. I will refrain
from promising anything until that is integrated.

The cores are not overclockable. 2.2Ghz will be the limit.

------
admax88q
At this price I would rather buy a Raptor TALOS II or Blackbird.

~~~
floatboth
With Talos II, you'd get a workstation with only _4 cores_ at this price.
Blackbird board + 8-core CPU bundle is a bit better, but now you're limited by
the small form factor.

~~~
yjftsjthsd-h
I'm not sure what performance looks like on 4 POWER cores vs 32 ARM cores;
seems like it _might_ win. Genuinely curious; would love to see benchmarks.

~~~
DCKing
Those 4 POWER9 cores (16 threads with its 4-way SMT) will obliterate this ARM.
Depending on the benchmark, this 32 core ARM box will do 1/2 the single thread
performance and about 2.5x the multithreaded performance of my venerable Sandy
Bridge i7 desktop (so it does 2.5x the MT performance with 8x the amount of
physical cores). This is a hardware class from 2011, which you can usually
pick up for under $250 from eBay in second-hand corporate desktops. On the
other hand, the POWER9 in the Talos II can go toe-to-toe with Intel's Skylake
in both single thread and especially multithreaded use cases.

The reason to buy a workstation like this is not performance, but rather being
able to natively develop ARM software locally. That's the only real killer
feature it has over Intel/AMD/IBM processors, but a legitimate reason to buy
it given the availability of ARM servers at cloud providers.

~~~
floatboth
[https://lemire.me/blog/2019/03/26/hasty-comparison-
skylark-a...](https://lemire.me/blog/2019/03/26/hasty-comparison-skylark-arm-
versus-skylake-intel/)

Daniel Lemire's single-threaded Mandelbrot benchmark: 15s on the Ampere eMAG
(Skylark), 24s on a 4GHz Skylake. (Also wins in bitset_count.) It's definitely
faster than 1/2 of Sandy Bridge.

eMAG will have a disadvantage in SIMD, but for normal workloads (make -j32 on
a huge project :D) it should be plenty fast.

~~~
DCKing
I don't think that that Mandelbrot benchmark is single threaded. It would mean
that this processor is _much_ faster than Skylake in computational
instructions per clock, which would be a huge game changer (and it clearly
isn't). That 25s vs 18s figure makes a lot more sense if the benchmark is
multithreaded, since Skylake is around 1.7x to 2x faster than Sandy Bridge.

Moreover, the benchmark source code [0] clearly uses OpenMP to parallelize the
benchmark tasks.

[0]: [https://benchmarksgame-
team.pages.debian.net/benchmarksgame/...](https://benchmarksgame-
team.pages.debian.net/benchmarksgame/program/mandelbrot-gcc-6.html)

~~~
floatboth
His code [https://github.com/lemire/Code-used-on-Daniel-Lemire-s-
blog/...](https://github.com/lemire/Code-used-on-Daniel-Lemire-s-
blog/blob/master/2019/03/26/mandelbrot.c) does not use OpenMP.

Different benchmarks favor different processors… Here's another one, now
multi-threaded

My quick run of `sysbench cpu` (something with prime numbers) shows:

Skylark 3.3GHz 32core (at Packet; Ubuntu 18.04):

    
    
        events per second: 44287.87
    

Zen 3.85GHz 8c16t (my desktop; FreeBSD 13-CURRENT):

    
    
        events per second: 14632.61
    

And with a single thread, 1386.24 vs 1740.91. Divided by clock speed, it's
about 93% of the performance.
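
The per-clock figure can be reproduced with a short sketch like the one below. The variable names are illustrative, and dividing events per second by nominal clock speed is only a rough normalization, since it ignores boost behavior and differences between the two operating systems.

```python
# sysbench cpu results quoted above (events per second).
skylark_clock_ghz, zen_clock_ghz = 3.3, 3.85
skylark_mt, zen_mt = 44287.87, 14632.61   # all threads
skylark_st, zen_st = 1386.24, 1740.91     # single thread

# Single-thread events per second per GHz on each machine.
skylark_per_ghz = skylark_st / skylark_clock_ghz
zen_per_ghz = zen_st / zen_clock_ghz

per_clock_ratio = skylark_per_ghz / zen_per_ghz
multi_thread_ratio = skylark_mt / zen_mt

print(f"Skylark single-thread per GHz: {skylark_per_ghz:.1f}")
print(f"Zen single-thread per GHz:     {zen_per_ghz:.1f}")
print(f"Skylark vs Zen per clock:      {per_clock_ratio:.1%}")
print(f"Skylark vs Zen, all threads:   {multi_thread_ratio:.2f}x")
```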

~~~
kgardas
You can't really benchmark anything on FreeBSD X-CURRENT. There is a ton of
debug code there which slows down the whole system, including its libc. Very
unfair comparison, be warned!

------
joshka
Is there some context on what is notable about this?

~~~
etaioinshrdlu
I've never seen a high-performance workstation based on an ARM cpu before.

~~~
dragontamer
Thunder X2 Arm servers came out first. But it's good to see competition as
always.

------
newnewpdro
Anyone know where this machine would stand with regards to Respects Your
Freedom [1] certification barriers?

[1] [https://www.fsf.org/resources/hw/endorsement/respects-
your-f...](https://www.fsf.org/resources/hw/endorsement/respects-your-freedom)

~~~
Skunkleton
I cannot comment on this chip in particular, but my experience is that most
arm platforms have unavoidable non-free firmware.

------
dis-sys
"As low as £2,255.00"

GBPUSD is trading at 1.3

------
auvi
Just wondering, is any US-based shop selling similar machines?

