Ampere EMAG 64bit Arm Workstation (avantek.co.uk)
49 points by robin_reala on Mar 30, 2019 | 62 comments



I've used a machine with this chip (remotely). It's not terrible; Daniel Lemire blogs about some early performance impressions here https://lemire.me/blog/2019/03/26/hasty-comparison-skylark-a... and I've ported simdjson to it (https://github.com/lemire/simdjson).

I'm definitely not at the stage where I can say anything definitive about performance. For a SIMD-intensive task, a preliminary run on a 3.3GHz eMag core gets about 400 megabytes/second of JSON parsing, compared with 2.2 gigabytes/second on a 4.0GHz Skylake. This may come down to having a maximum of 2 x 128-bit NEON operations per cycle vs 3 x 256-bit AVX2 operations per cycle, as well as some clock speed differences. Will eventually post about this at branchfree.org when I don my Nomex long johns (required whenever doing any benchmark that might imply to anyone that any processor runs faster or slower than any other processor).
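
As a rough sanity check on the SIMD-width explanation, the theoretical peak vector throughput of each core can be estimated from the figures quoted in this comment. This is only a back-of-the-envelope sketch: it assumes every cycle issues the maximum number of vector operations (real JSON parsing will not), and the function name is made up for illustration.

```python
def peak_simd_bits_per_second(ops_per_cycle, bits_per_op, clock_ghz):
    """Upper bound on vector bits processed per second (assumes full issue every cycle)."""
    return ops_per_cycle * bits_per_op * clock_ghz * 1e9

# Figures quoted in the comment: 2 x 128-bit NEON at 3.3GHz vs 3 x 256-bit AVX2 at 4.0GHz.
emag_peak = peak_simd_bits_per_second(2, 128, 3.3)
skylake_peak = peak_simd_bits_per_second(3, 256, 4.0)

peak_ratio = skylake_peak / emag_peak   # theoretical gap from width and clock only
measured_ratio = 2200 / 400             # 2.2 GB/s vs 400 MB/s, both in MB/s

print(f"theoretical peak ratio: {peak_ratio:.2f}x")
print(f"measured ratio:         {measured_ratio:.2f}x")
```

Under those assumptions the theoretical gap comes out smaller than the measured one, so width and clock alone may not explain the whole difference.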


Just ran Daniel's tests against our 16-core LX2160, which is still only clocked at 1.9GHz (it will be 2.2GHz in final production). These are our numbers:

binarytrees: 37s mandelbrot: 13.6s fasta: 0.9s

I will give your benchmark a run once I have some time.


Okay I couldn't help myself, waiting for a build to finish.

./parse jsonexamples/twitter.json

Min: 0.00171284 bytes read: 631514 Gigabytes/second: 0.368695


Whoa, thanks. 368, huh? That's pretty impressive given the 1.9Ghz clock speed... By coincidence, the 'twitter.json' file was the one I was using as my slender data point.

Do note that at this stage the benchmark is so preliminary as to be largely meaningless - I haven't really done much more than eyeball the results. But as a preliminary data point, that seems to be a stronger showing clock-for-clock than the eMag one.
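
For anyone wanting to redo the clock-for-clock comparison, here is a minimal sketch of the arithmetic. It recomputes GB/s from the "Min" time and byte count in the parse output, then divides by clock speed. The eMag and Skylake figures are the rough ones quoted earlier in the thread, and the helper names are hypothetical.

```python
def gigabytes_per_second(bytes_read, seconds):
    """Throughput in GB/s (1 GB = 1e9 bytes), as the parse tool appears to report it."""
    return bytes_read / seconds / 1e9

def bytes_per_cycle(gb_per_s, clock_ghz):
    """Bytes parsed per clock cycle: GB/s divided by GHz."""
    return gb_per_s / clock_ghz

# LX2160 run on twitter.json: Min 0.00171284 s, 631514 bytes read.
lx2160_gbps = gigabytes_per_second(631514, 0.00171284)

runs = {
    "LX2160 @ 1.9GHz": (lx2160_gbps, 1.9),
    "eMag @ 3.3GHz": (0.4, 3.3),       # "about 400 megabytes/second"
    "Skylake @ 4.0GHz": (2.2, 4.0),    # "2.2 gigabytes/second"
}

for name, (gbps, ghz) in runs.items():
    print(f"{name}: {gbps:.3f} GB/s, {bytes_per_cycle(gbps, ghz):.3f} bytes/cycle")
```

This treats the quoted clocks as the actual running clocks, which is an assumption; turbo or throttling would shift the per-cycle numbers.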


Let me know if you want this data on GitHub. I'm just testing our overclocking, bringing the CPU clock up to 2.4GHz and the DDR to 3200.

Also, I'm building with gcc 8.2. Note that I needed to change -march=native to -march=armv8-a+crypto.

Min: 0.00111424 bytes read: 631514 Gigabytes/second: 0.566768


Sure I will keep an eye on your updates and keep testing.

I would also like to point out that the LX2160 is also a TDP of 32Watts compared to the 125W TDP of the eMag.


Packet also has eMAG for $1/hour in the cloud if you want to try before you buy: https://www.packet.com/cloud/servers/c2-large-arm/

Geekbench scores: https://browser.geekbench.com/v4/cpu/compare/11678329?baseli...


Looks like Packet is sold out in all regions.


Out of personal interest, I did some very unscientific benchmarking by timing a build of the Yocto core-image-sato distro for the BeagleBone. All sourcecode was downloaded beforehand, so that does not factor into the results. All the machines were running Ubuntu 18.04 Server except the A10, which runs Ubuntu 18.04 Desktop.

Here are the results:

   34m9.347s  EPYC 7401P   24C/48T 2.2GHz       (Packet c2.medium.x86 bare metal server)
   75m14.661s eMAG         32C/32T 3.3GHz       (Packet c2.large.arm bare metal server)
   96m31.901s i5-8259U      4C/8T  2.30-3.80GHz (Intel NUC8I5, NVMe SSD)
  139m52.184s ThunderX     96C/96T 2.0GHz       (Packet c1.large.arm bare metal server)
  194m52.745s A10-6800K     4C/4T  4.1GHz       (Old self-built desktop, slow-ish SSD)
  535m52.642s Celeron N3150 4C/4T  1.60-2.08GHz (Gigabyte Brix, SSD)
I assume the eMAG results will improve somewhat once its support matures, but the difference to the i5 is disappointingly small. Both ARM machines performed reasonably well when all cores were used, but the relatively weak per-core performance showed whenever utilization fell. But although the ThunderX was slow, looking at 96 cores in htop felt pretty good...
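
To put numbers on "disappointingly small", here is a small sketch that parses the `time`-style durations from the table above and expresses each machine's build time relative to the i5. The parsing helper is hypothetical and only handles the `NmS.SSSs` form shown in the table.

```python
import re

def parse_duration(text):
    """Convert a string such as '75m14.661s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", text)
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)

# Wall-clock build times copied from the table above.
build_times = {
    "EPYC 7401P": "34m9.347s",
    "eMAG": "75m14.661s",
    "i5-8259U": "96m31.901s",
    "ThunderX": "139m52.184s",
    "A10-6800K": "194m52.745s",
    "Celeron N3150": "535m52.642s",
}

seconds = {name: parse_duration(t) for name, t in build_times.items()}
baseline = seconds["i5-8259U"]

# Fastest first; the last column is the speedup relative to the i5.
for name, secs in sorted(seconds.items(), key=lambda item: item[1]):
    print(f"{name:14s} {secs / 60:7.1f} min  {baseline / secs:5.2f}x vs i5-8259U")
```

By this measure the 32-core eMAG finishes the build roughly 1.3x faster than the 4-core i5.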


That case surprised me. I just built an Intel based Linux system using that exact case; it's a "be quiet!" Pure Base 600 Black [1]. They "debranded" the case in the photo with a low-effort edit.

[1] https://www.newegg.com/Product/Product.aspx?Item=9SIA68V57M4...


Slightly off-topic but what do you think of the case at this point?


It's well engineered and the tooling and finish are very high quality. Everything snicks together with great precision.

I would not build a high end gaming machine with it. Motherboard tray cable routing cutouts are not well positioned if you have a large, high feature motherboard. I knew this but as I used an mATX board I didn't care. Also, airflow is limited by the solid front panel, as is always the case with 'quiet' designs.

The supplied fans are excellent but only one front case fan is supplied; I knew this and obtained a second "be quiet!" 140mm fan for my build.

I have mixed feelings about the gauge of steel. Typically quiet focused cases of the sort I've been using for many years now rely in part on heavy steel. The steel in this case is comparatively thin; similar to what you get with OEM machines from Dell or HP. On one hand I miss the rigidity of prior cases, on the other I've been surprised at how happy I am with the reduction in weight.

The size is perfect. It's a little larger in every dimension than a traditional mid tower making assembly and changes easier.

This is a bit of a boutique product; Amazon doesn't have this exact model and it took a while for shipment through Newegg; it shipped from the manufacturer's US warehouse and took extra time.

I would buy it again.

I did a double take when I saw the Arm workstation but I suppose it's not really surprising. The market is full of windowed, LED riddled gaming cases on one hand and low end 'value' stuff on the other. There are few quality 'grown up' looking cases available. I had 'workstation' in mind when I went hunting for a case and I imagine that's what Ampere was thinking as well; we made thoughtful choices and ended up in the same place.


Looks ugly, and why not buy a rack?


Racks are terrible as desktops


Why?


Generally rack based equipment is designed with less vertical height and strict front to back airflow. They are also often designed for relatively high input temperatures, so they are designed for high airflow rates. These combine to be very loud and often crazy inefficient. Read that as consuming significant power to cool the equipment.

Mini/Mid/Full tower cases generally are designed to take advantage of heat wanting to rise. So the intakes are often large and low (140mm is not unusual), with the exhaust at the top rear. Even 200mm isn't unusual for the exhaust. Air-moving efficiency increases quickly with fan size. 1U fans often spin at 15k rpm and make more noise and vibration than anything else. Desktop fans are often 1200 rpm or lower and take just a few watts to dump substantial heat.

As an example the Fractal Design Mini C (a small, quiet, under $100 case) has room for 2 x 140mm in the lower front, and 2 x 140mm on top. It's a smaller case, so there's only 120mm in the rear. With $80 ish for the case and a few extra fans (Fractal Design isn't bad, but not quite class leading) you can easily dump a few hundred watts quietly.

Finding a rack mount case that can move as much air as quietly is challenging, and when possible it is often prohibitively expensive and/or crazy loud. The last time I built one to house a single socket motherboard and a GPU (much like a desktop), it included 4 Delta fans, and I needed ear protection to be in the same room.


I don't see your point. Maybe it is true for off the shelf stuff. I may assume that a lot of YC readers build their own system.

I just built a rack for my desk. I have not really put it under load, but I have yet to see the GPU, power supply or case fan spinning. For now I have only one fan for the case. The only fan that moves, though silently, is the CPU fan. The only thing you can hear is the 10 TB HDD.


The only available CPU from Ampere EMAG seems to be the 8180, so I think this workstation also uses this.

Wikichip: https://en.wikichip.org/wiki/ampere_computing/emag/8180

AnandTech: https://www.anandtech.com/show/14141/ampere-emag-in-the-clou...

TL;DR: 32 Cores, up to 3.3 Ghz boost, 32MB L3 cache, 125W TDP


If you click customize/add to cart you see a breakdown with the CPU listed separately as: Ampere eMAG 8180 32 core 2.8GHz 3MB L3 (Turbo 3.3GHz) .

So you are right.


Where have I seen that model number before... oh yeah: https://ark.intel.com/content/www/us/en/ark/products/120496/...


Speaking of Xeon, you could probably put together a much cheaper workstation with an older Xeon over the eMag. I just don’t see myself picking this up, at its current price point.


Without knowing what the perf is like, it is really difficult to say. 32 cores @ 2.8-3.3 GHz sounds like quite a lot, I don't think getting 32 core Xeon would be anywhere near as cheap as this.


There is a Chinese motherboard that'll drive 2 Xeon processors. It fits in an E-ATX case. It should be ~ $250 USD shipped.

I used some cheap ($20 USD!) Xeons (E5-2620v2) and have 24 logical cores (12 physical).. so it's doable for $500.


To which model of motherboard dost thou allude?


How convenient to use the same number as the top bin Xeon CPU, eh? Can't copyright a number though, of course.


Hilariously enough, that's why it was called a Pentium and not a 586.


Yep, I'm definitely familiar with the history coming from Intel :)


It's also based on the Skylark microarchitecture.


Is there anything similar but more hobbyist priced? DIY would be fine, but all motherboards that I know of don’t have ram slots or other desktop like amenities



That link whined at me for denying targeted advertising instead of showing me the article, so I guess I won't ever know what it's about...



Hi, I posted a comment on cnx-software but it's awaiting moderation, reposting here:

> work with Linaro on the work they did for the MCBin to get a UEFI plugin that uses a QEMU emulation layer for supporting booting Video Card BIOS

Linaro? I thought the EDK2 for the MCbin was maintained by Semihalf :)

Anyway, 16 cores and dual-channel RAM for $500 is a lot more interesting than 4 cores and single-channel for $269. A few questions:

Will the EDK2 tree be fully open source? There was that one blob on the MCbin…

Will the core be overclockable?


Semihalf is maintaining the tree and patches, but I believe the original work done to support using a GPU on the platform was done by Linaro. I could be mis-remembering but I believe that is how things happened.

We are waiting for the final release of the EDK2 tree from NXP. I will refrain from promising anything until that is integrated.

The cores are not overclockable. 2.2Ghz will be the limit.


Ah, pretty. Hoping the 4x10 Gbe is optional, since I bet it costs as much as the rest of the board ;)


> The 10-100G ports are no extra cost. It is available on the SOC and brought out on the COM Express pins so we wanted it available for those that could use it. Additionally these are raw SERDES lanes so in theory could be attached to a PCIe expansion cage.

https://www.cnx-software.com/2019/03/29/clearfog-itx-worksta...

Looks like anyone who makes these 16-core A72 chips is making them for networking (this NXP one, Mellanox Bluefield is a somewhat similar idea I think). So you can't not pay the cost of the NIC that's on the chip already…


At this price I would rather buy a Raptor TALOS II or Blackbird.


With Talos II, you'd get a workstation with only 4 cores at this price. Blackbird board + 8-core CPU bundle is a bit better, but now you're limited by the small form factor.


I'm not sure what performance looks like on 4 POWER cores vs 32 ARM cores; seems like it might win. Genuinely curious; would love to see benchmarks.


Those 4 POWER9 cores (16 threads with its 4-way SMT) will obliterate this ARM. Depending on the benchmark, this 32 core ARM box will do 1/2 the single thread performance and about 2.5x the multithreaded performance of my venerable Sandy Bridge i7 desktop (so it does 2.5x the MT performance with 8x the amount of physical cores). This is a hardware class from 2011, which you can usually pick up for under $250 from eBay in second-hand corporate desktops. On the other hand, the POWER9 in the Talos II can go toe-to-toe with Intel's Skylake in both single thread and especially multithreaded use cases.

The reason to buy a workstation like this is not performance, but rather being able to natively develop ARM software locally. That's the only real killer feature it has over Intel/AMD/IBM processors, but a legitimate reason to buy it given the availability of ARM servers at cloud providers.
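
The per-core point can be made explicit with a one-line calculation. This is only a sketch using the rough figures from this comment (2.5x the multithreaded throughput from 8x the physical cores); the function name is made up, and the result says nothing about any particular benchmark.

```python
def per_core_ratio(throughput_ratio, core_ratio):
    """Relative throughput per physical core, given total throughput and core-count ratios."""
    return throughput_ratio / core_ratio

# eMAG vs the Sandy Bridge i7, using the comment's rough numbers.
emag_vs_sandy_bridge = per_core_ratio(2.5, 8)

print(f"per-core multithreaded throughput: {emag_vs_sandy_bridge:.2f}x of a Sandy Bridge core")
```

Note that this averages over all cores with every thread busy, so it is not the same thing as the single-thread figure quoted above.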


https://lemire.me/blog/2019/03/26/hasty-comparison-skylark-a...

Daniel Lemire's single-threaded Mandelbrot benchmark: 15s on the Ampere eMAG (Skylark), 24s on a 4GHz Skylake. (Also wins in bitset_count.) It's definitely faster than 1/2 of Sandy Bridge.

eMAG will have a disadvantage in SIMD, but for normal workloads (make -j32 on a huge project :D) it should be plenty fast.


I don't think that that Mandelbrot benchmark is single threaded. It would mean that this processor is much faster than Skylake in computational instructions per clock, which would be a huge game changer (and it clearly isn't). That 25s vs 18s figure makes a lot more sense if the benchmark is multithreaded, since Skylake is around 1.7x to 2x faster than Sandy Bridge.

Moreover, the benchmark source code [0] clearly uses OpenMP to parallelize the benchmark tasks.

[0]: https://benchmarksgame-team.pages.debian.net/benchmarksgame/...


His code https://github.com/lemire/Code-used-on-Daniel-Lemire-s-blog/... does not use OpenMP.

Different benchmarks favor different processors… Here's another one, now multi-threaded

My quick run of `sysbench cpu` (something with prime numbers) shows:

Skylark 3.3GHz 32core (at Packet; Ubuntu 18.04):

    events per second: 44287.87
Zen 3.85GHz 8c16t (my desktop; FreeBSD 13-CURRENT):

    events per second: 14632.61
And with a single thread, 1386.24 vs 1740.91. Divided by clock speed, it's about 93% of the performance.
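
For reference, the "about 93%" figure can be reproduced like this. The sketch assumes the single-thread numbers are listed in the same order as the machines above (Skylark first, Zen second) and that both cores actually ran at the quoted clocks; the helper name is hypothetical.

```python
def events_per_ghz(events_per_second, clock_ghz):
    """Normalise a sysbench 'events per second' figure by clock speed."""
    return events_per_second / clock_ghz

# Single-thread results, normalised per GHz.
skylark_1t = events_per_ghz(1386.24, 3.3)
zen_1t = events_per_ghz(1740.91, 3.85)
per_clock = skylark_1t / zen_1t

# All-core results, compared directly.
all_cores = 44287.87 / 14632.61

print(f"single thread, per clock: {per_clock:.1%} of Zen")
print(f"all cores: {all_cores:.2f}x Zen")
```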


You can't really benchmark anything on FreeBSD X-CURRENT. There is a ton of debug code there which slows down the whole system, including its libc. Very unfair comparison, be warned!


"obliterate" probably not. Looks like eMAG cores are pretty capable. What sucks is memory controller probably or L3 cache latency.


> 16 threads with its 4-way SMT

Oh, my mistake; I thought it was 4 logical cores (/threads) vs 32 logical cores. If it's 16 vs 32 logical cores, POWER will positively destroy ARM.


It is 16 logical cores vs 32 physical.


Me too! I do have remote access to POWER9, but this machine is kind of sick since its results are completely abysmal. As I don't know its configuration it may be whatever. I do have access to POWER8 which I know very well and I can benchmark that. Ubuntu 18.04 LTS, gcc 7.3.0:

    long lived tree of depth 21 check: 4194303
    real 18.00 user 17.97 sys 0.03

    pixels[15476] = 7
    real 7.61 user 7.61 sys 0.00

    tccgcggatttaccatcctc ctgatgttaattctctgtggtcagatacagaccaaaaac
    real 1.18 user 1.18 sys 0.00


The 32 core ARM is 30% faster than an 8 core Ryzen on Geekbench [0]. It's not exactly slow but it also has 4 times as many cores, which is pretty depressing.

[0] https://browser.geekbench.com/v4/cpu/compare/12621419?baseli...


The results are rather suspicious (SGEMM especially). How did people even get geekbench for linux-gnu-aarch64? Ripped out of the Android version?

With sysbench cpu (prime number something), it's 3 times faster than my Ryzen (8c16t 3.85GHz).


at that price i don't think punters are the target demographic.

if your build target is an arm environment (industrial, embedded?) or you are a uni teaching/researching arm isa, then i imagine it would be a pretty nice bit of kit.


If your target is some embedded device, you will still end up cross compiling for your target. ISA is only part of the picture.


Is there some context on what is notable about this?


I've never seen a high-performance workstation based on an ARM cpu before.


Thunder X2 Arm servers came out first. But it's good to see competition as always.


This is what Linus was asking for, in order for ARM to make strides in the server market.


The price doesn't momentarily stop respiration either. If I needed a pile of cores for something I'd consider it.


Yes, can we at least get some benchmarks?


Anyone know where this machine would stand with regards to Respects Your Freedom [1] certification barriers?

[1] https://www.fsf.org/resources/hw/endorsement/respects-your-f...


I cannot comment on this chip in particular, but my experience is that most arm platforms have unavoidable non-free firmware.


"As low as £2,255.00"

GBPUSD is trading at 1.3
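
Spelled out, as a quick sketch: this just multiplies the listed price by the quoted rate and ignores taxes, shipping and any conversion fees, none of which are stated here. The function name is made up.

```python
def gbp_to_usd(amount_gbp, gbpusd_rate):
    """Convert pounds to dollars at a given GBPUSD rate, rounded to cents."""
    return round(amount_gbp * gbpusd_rate, 2)

# "As low as £2,255.00" at GBPUSD 1.3.
price_usd = gbp_to_usd(2255.00, 1.3)

print(f"${price_usd:,.2f}")
```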


just wondering, is any US based shop selling similar machines?



