
AMD reveals its first ARM processor: 8-core Opteron A1100 - shawndumas
http://arstechnica.com/information-technology/2014/01/amd-reveals-its-first-arm-processor-8-core-opteron-a1100/
======
quackerhacker
I gotta give AMD massive credit...while I have mainly used Intel in my setups,
AMD has really pushed their offerings.

First, I remember everyone driving up the price of AMD video cards just
because of bitcoin mining.

Second, they got their chips into the PS4 and Xbox One.

Now an 8-core ARM CPU...although I find the clock speed (2GHz) kinda
underwhelming, AMD's pricing would still entice me to buy 2 for the price of 1
Intel i7.

~~~
SwellJoe
How about also comparing the power usage? At 25 watts, you can get three of
these for one six-core Intel CPU; so you're at 24 cores at 2 GHz vs 6 cores at
3 GHz (and probably still at a lower price). The GHz can't really be directly
compared, though. Even comparing GHz across Intel product generations isn't
useful. I have a 3.16 GHz Core 2 Duo in my desktop that I think (I haven't
really benchmarked, but I've run a Litecoin miner on both for testing) does
about half the work of the 3.2 GHz i7 in my laptop.

All that said, I have a 16 core AMD server in colo that is running at about 3%
usage across all CPUs, and yet it is slow as hell because the disk subsystem
can't keep up (replacing the spinning disks with SSDs as we speak). The
reality is that CPU is not the bottleneck in the _vast_ majority of web
service applications. Memory and disk subsystems are the bottlenecks in every
system I manage.

So, I love the idea of a low-power, low-cost CPU that still packs enough of a
punch to work in virtualized environments. Dedicating one of these cores to
each of your VMs would be pretty nice, I think.

~~~
hhw
The Avoton Atom C2750, which is already out, is also 8 cores, but at 2.4GHz
and with the entire SoC at 20W. It's supposed to have comparable performance
to a Xeon E5520 quad-core/8-thread 2.26GHz CPU from 3 generations back, or
about half the performance of a current quad-core/8-thread Xeon E3 1230v3. And
it supports virtualization extensions.

I agree that I/O and not the CPU is usually the bottleneck though.

~~~
justincormack
The AMD might have more I/O bandwidth - it ships with dual 10GbE, while the
C2750 ships with 4x2.5GbE (usually 1Gb unless on a backplane). The C2750 has
PCIe too, though, so who knows about total bandwidth.

~~~
gonzo
I've got a C2758 on my desk with dual 10G over PCIe.

Works just fine.

~~~
wmf
Except that costs ~$700 for the processor + mobo + NIC, right? Seattle is
supposed to be cheaper.

------
userbinator
I think reusing the Opteron name is really not a good idea, since now there'll
be x86 Opterons and ARM Opterons. Maybe Apteron would've been a better
choice...

~~~
alexandros
Apteron in Greek (Άπτερον) means "the one with no wings". Other things being equal,
maybe they should go with something else :)

~~~
userbinator
On the other hand, that's what we could call them if they were as successful
as the Itanic.

------
colanderman
Dual 10 GbE built-in? Sweet. Wonder what the price will be; a dual 10 GbE
Intel card goes for $500 alone.

More importantly – will they provide zero-copy I/O like you can get with Intel
network cards via their DPDK [1] or PF_RING/DNA [2]?

[1] [http://dpdk.org/](http://dpdk.org/) [2]
[http://www.ntop.org/products/pf_ring/dna/](http://www.ntop.org/products/pf_ring/dna/)
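
For reference, "zero-copy" here means the NIC DMAs packets straight into
buffers that the application polls, with no kernel copy in between. A minimal
sketch (mine, not from the article) of DPDK's burst-receive loop - EAL,
mempool, and port setup omitted, and details vary by DPDK version:

    #include <rte_ethdev.h>
    #include <rte_mbuf.h>

    #define BURST_SIZE 32

    /* Poll RX queue 0 of one port. The NIC has already DMA'd each
       packet into an mbuf, so we touch the data without any copy. */
    static void rx_loop(uint16_t port_id)
    {
        struct rte_mbuf *bufs[BURST_SIZE];

        for (;;) {
            uint16_t n = rte_eth_rx_burst(port_id, 0, bufs, BURST_SIZE);
            for (uint16_t i = 0; i < n; i++) {
                /* ... inspect or forward bufs[i] here ... */
                rte_pktmbuf_free(bufs[i]);
            }
        }
    }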

~~~
thrownaway2424
I don't think that's "sweet"; I think that's a bad decision. What if I don't
need two of those per 10 ARM cores? Now I'm just paying for gates I don't
need.

~~~
AnthonyMouse
What if you don't need AES encryption? Now you're just paying for gates you
don't need. What if you don't need SIMD instructions? Now you're just paying
for gates you don't need.

It doesn't matter. Modern processors have the complete opposite problem. It
isn't that transistors are expensive, it's that they're so cheap you end up
with too many and they generate too much heat. If you can stick a block on
there which 60% of your customers can use and the other 40% can shut off to
leave more headroom for frequency scaling, it's a win.

Also, the number of gates you need for a network controller is small.

------
dragontamer
I don't think people "get it".

This is a microserver, designed to connect up I/O bound resources to each
other. Imagine a cache like Squid running on this thing. Imagine multiple
RAID-0 SSD drives on one side, and 20Gbps going out through the network.

This is NOT a computationally difficult task. For computationally difficult
tasks, you have 8-core $2000 E5 Xeons (which get more and more efficient the
bigger the workload you have).

However, filling your datacenter with $2000 Xeons so that they can spend 0.01%
of their CPU power copying data from SSD drives to the network is a waste of
money and energy.

The A1100 looks like it will be a solution in the growing microserver space.
As Facebook and Google scale, they have learned that a large subset of their
datacenters are I/O bound and that they're grossly overspending on CPU power.

Big CPUs -> Big TDPs -> higher energy costs.

This machine is designed with big I/O throughput (multiple 10GbE and 8 SATA
ports _on-chip_ ), with the barest minimum CPU possible to save on energy
costs.

The upcoming competitors to this market are HP Moonshot (Intel Atoms), AMD
Opteron A1100, and... that's about it. Calxeda's Boston Viridis has died, so
that is one less competitor in this niche.

------
vardump
Last time someone made ARM CPU with 1100 in model name, Intel bought them:
[http://en.wikipedia.org/wiki/StrongARM#SA-1100](http://en.wikipedia.org/wiki/StrongARM#SA-1100).

------
elipsey
It's a relief that AMD has a performance/watt alternative to Bulldozer. I sure
hope they can stay in business so I have someone to buy hardware from that
doesn't fuse off features to screw us out of a 65% margin.

Either way, I'm hoping ARM64 will trickle up from iThingies to the desktop so
I can buy a CPU with Virtual MMIO without paying an extra hundred bucks.

~~~
ANTSANTS
Maybe their desktop and server CPUs aren't so hot right now, but I don't think
you have to worry about AMD for a while. All of the current generation of
consoles have AMD GPUs, those GPUs are on the same die as an AMD CPU for two
of the three (PS4 and Xbone), AMD GPUs remain competitive with NVidia's
offerings, and they seem to be winning mindshare with their lower-level Mantle
graphics API.

EDIT: And the whole Bitcoin-mining thing (or Litecoin/Dogecoin mining thing,
these days), as mentioned in another thread.

~~~
elipsey
Does AMD GPU hardware have a general advantage in mining?

err, can you link me to the other thread?

~~~
zanny
Nvidia intentionally cripples the double-precision floating point performance
of their "gaming" cards to make a market for Tesla cards.

AMD doesn't, which is why you don't see much FirePro, and why their entire
line sells for so much right now while hash mining is big.

~~~
dustcoin
Neither sha256 (bitcoin) nor scrypt (litecoin) mining uses floating point
operations.

AMD cards are faster because they are built with more, simpler "cores",
compared to Nvidia's fewer, more complex "cores". Mining benefits from the
increased parallelization and doesn't benefit from Nvidia's "fancy" "cores".
For sha256 mining, AMD cards gain an additional advantage by supporting a bit
rotation instruction that Nvidia cards do not.
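
To illustrate that last point: the rotation in question is the rotate-right
that SHA-256's Sigma functions lean on. A quick C sketch (mine, not from the
article):

    #include <stdint.h>

    /* Rotate right by n bits (n in 1..31). One instruction on hardware
       with a native rotate; several shifts and ORs otherwise. */
    static inline uint32_t rotr(uint32_t x, unsigned n)
    {
        return (x >> n) | (x << (32 - n));
    }

    /* One of SHA-256's Sigma functions: three rotations per call,
       evaluated billions of times per second when mining. */
    static inline uint32_t Sigma0(uint32_t x)
    {
        return rotr(x, 2) ^ rotr(x, 13) ^ rotr(x, 22);
    }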

------
jeffdavis
Does the ARM architecture have anything like the nested page tables in recent
x86-64 chips? Or is that an orthogonal processor feature that is not required
(or forbidden) in a particular implementation of ARM/x86-64?

To make a real entrance into the server market, I would expect good
virtualization support to be nearly a requirement.

~~~
robot
It has support for virtualization. There is a two-stage translation where the
first stage handles the guest operating system's mappings and the second stage
handles the hypervisor's mappings. Both stages have nested page tables.

Also, there is an IOMMU implementation supporting virtualization of I/O. For
example, the IOMMU and CPU MMU page table mappings are synchronized, such that
a DMA controller also adheres to the page table mappings set up for the
CPU.
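
To make that concrete, a toy model (my sketch - real hardware walks
multi-level tables at each stage, and the real SMMU interface is more
involved):

    #include <stdint.h>

    #define PAGE_SHIFT 12   /* 4 KiB pages */
    #define NPAGES     16

    static uint64_t stage1[NPAGES]; /* guest's tables: VA page -> IPA page */
    static uint64_t stage2[NPAGES]; /* hypervisor's:  IPA page -> PA page  */

    /* CPU access: guest virtual -> intermediate physical -> physical. */
    static uint64_t translate(uint64_t va)
    {
        uint64_t off = va & ((1u << PAGE_SHIFT) - 1);
        uint64_t ipa = stage1[va >> PAGE_SHIFT];  /* guest-controlled      */
        return (stage2[ipa] << PAGE_SHIFT) | off; /* hypervisor-controlled */
    }

    /* Device DMA goes through the same stage-2 table, which is how the
       IOMMU keeps DMA confined to pages the hypervisor mapped in. */
    static uint64_t iommu_translate(uint64_t bus_addr)
    {
        uint64_t off = bus_addr & ((1u << PAGE_SHIFT) - 1);
        return (stage2[bus_addr >> PAGE_SHIFT] << PAGE_SHIFT) | off;
    }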

~~~
jeffdavis
Regarding the IOMMU:

A recent post revealed some security problems using firewire (and a few other
technologies) related to DMA[1]. Would the IOMMU features you're talking about
prevent that problem?

[1]
[https://news.ycombinator.com/item?id=7123121](https://news.ycombinator.com/item?id=7123121)

~~~
robot
Right. DMA creates security holes because it does not sit behind an MMU. It
can change the memory of any guest OS. That means any OS or code that can
program the DMA controller can bypass security. The IOMMU prevents that,
because all I/O devices sit behind this MMU.

You can have this protection, but then face programming issues if the IOMMU
and CPU MMU use different page tables: you have to update both. The ARM IOMMU
is designed so that it is automatically in sync with the CPU tables.

------
msoad
I don't know anything about chips, but I know the ARM architecture has been
around for decades. Why is it hot again? I get the point of using it in
smartphones and tablets, but why should servers use ARM?

~~~
rayiner
It's hot because the tablet and smartphone market exploded, creating demand
for high-performance, low-power cores that could be flexibly integrated into
SoCs with other parts. With that new market came volume, and in the processor
market, volume is important. The reason x86 overtook RISC architectures is
that the high volume of x86 chips generated revenues that allowed for massive
capital investment in x86 designs. The tablet and phone market is driving a
similar process for ARM chips. Right now, there are at least three very well-
funded lines of ARM microarchitectures: Qualcomm's, Apple's, and ARM's. It's
been a long time since a non-x86 platform got that kind of investment.

~~~
userbinator
If you look at it from the perspective of expressiveness, the CISC-ness of the
x86 ISA also allows far more opportunity for hardware-level enhancements than
a RISC-style one: the code density is higher, meaning better cache usage and
less memory bandwidth needed (especially with multiple cores), and there's
still a lot of relatively complex instructions with the potential to be made
even faster. RISC came from a time when memory bandwidth was high relative to
CPU speed and the bottleneck was instruction execution inside the CPU; now it's
the opposite: memory bandwidth and latency are becoming the bottleneck. There's
only so much you can do to speed up an ARM core without adding new
instructions.

Linus has some interesting things to say about this too:
[http://yarchive.net/comp/linux/x86.html](http://yarchive.net/comp/linux/x86.html)

~~~
solarexplorer
It's pretty weird to think that x86 gives Intel any advantage over ARM.

Let's see: x86 code density is horrible for a CISC; there is hardly any
advantage over ARM, which does great for a RISC. Also remember that the
memory bandwidth is primarily a problem for data, but not code. ARM64 is a
brand-new ISA; it's the x86 ISA that is a relic from the time when processors
were programmed with microcode. Intel is doing a great job handling all this
baggage, but to claim that the ISA gives Intel an advantage is ridiculous.

And finally, Linus has been an Intel fanboy since day one. Go read the Usenet
archives to find out. He received quite a bit of criticism because the first
versions of Linux were not portable but tied to the i386.

~~~
userbinator
x86 code density may not be optimal, but it's better than regular ARM - only
Thumb mode can beat it, and just barely.

> Also remember that the memory bandwidth is primarily a problem for data, but
> not code

RISCs, by design, _need_ to bring the data into the processor for processing;
but I see things like
[http://en.wikipedia.org/wiki/Computational_RAM](http://en.wikipedia.org/wiki/Computational_RAM)
being more widely used in the future, where the computation is brought to the
data, and this becomes much easier to fit to a CISC like the x86 with its
ability to operate on data in memory directly with a single instruction.
Currently this is done with implicit reads/writes, but what I'm saying is that
the hardware can then optimise these instructions however it likes.
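
For instance (my example; this reflects typical compiler output, not anything
from the article):

    /* The same C statement compiles to one instruction on x86 but a
       load/modify/store triple on a classic RISC, roughly:
         x86-64:  add dword ptr [rdi], 1
         ARM64:   ldr w1, [x0]
                  add w1, w1, #1
                  str w1, [x0]
       The single x86 instruction is one unit the hardware can optimise
       however it likes - implicit read and write included. */
    void bump(int *counter)
    {
        *counter += 1;
    }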

The underlying principle is that breaking down complex operations into a
series of simpler ones is easy; combining a series of simpler operations into
a complex one, once the hardware can do the complex one faster, is much
harder. x86 lagged behind in performance at the beginning because of a
sequential microsequencer, but once Intel figured out how to parallelise that
with the P6, they leapt ahead.

Linus being an Intel fanboy has nothing to do with whether x86 has an
advantage or not. But even if you look at cross-CPU benchmarks like SPEC, x86
is consistently at the top of per-thread, per-GHz performance, beating out the
SPARCs and POWERs, and those are _high-performance_, very expensive RISCs.
I'd really like to see whether AMD's ARMs can do better than that.

------
rayiner
I don't see AMD's play here. What value do they add with a processor that they
don't design, don't fab, and can't produce in the kind of volume that tablet
and phone chips are produced in?

~~~
gnoway
I don't understand your comment. "Doesn't design and doesn't fab" describes
every ARM licensee. Also, this chip isn't going anywhere near phones or
tablets. Did you look at the specifications?

~~~
rayiner
The other ARM licensees have a hook: Qualcomm integrates with its LTE
basebands, Apple builds a phone around it, etc. The phone/tablet angle is
important because the high volumes in those markets help justify big design
teams at Qualcomm and Apple.

------
lazyjones
Where is the market for this, apart from Facebook (Open Compute Project)? Is
it set to compete with CPUs like the Xeon E3-1220L series? Will it end up in
HP's Moonshot? I thought that bigger boxes with virtualization would be more
economical for most uses than closets full of low-power CPUs.

Perhaps I/O is the key here: N of these A1100 CPUs can easily saturate N x 2 x
10GbE, while a single box with 64+ cores probably cannot push 16 x 10GbE.

~~~
rbanffy
The developer board runs Fedora. Any workload that does not depend on a
specific CPU architecture (mostly everything but Windows) should run on it.
The dev board is there to make it possible for developers to fine-tune their
implementations so they run well on the new platform.

Will server makers buy it? That remains to be seen.

Making a dev board available (let's hope it's also cheap enough for hobbyists
to buy) is rather clever. Without software tuned for it, the chip could fail
in the market like Sun's Niagara and Intel's Itanium did.

------
pippy
We're getting close to the age where you can buy an off-the-shelf AMD desktop
machine with Linux, a good graphics card, and the same performance as an
x86-64.

~~~
zanny
I don't think 8 A57 cores at 2GHz are anything close to x86 at 3.5+GHz on 4
cores.

Clock for clock, pipe for pipe (15 stages in A57, 14-19 in Haswell), x86 still
wins because of per-instruction operand volume.

Though I wonder how well this chip would perform at 50W. Double the watts,
maybe another 1.5GHz; might be viable for a cheap HTPC.

~~~
elwin
> Though I wonder how well this chip would perform at 50W. Double the watts,
> maybe another 1.5GHz; might be viable for a cheap HTPC.

As a general rule, power varies with the cube of frequency. 50W will only get
you to about 2.5 GHz.
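
Spelling out that arithmetic, assuming the cube rule and the article's 25 W /
2 GHz baseline:

    f_2 = f_1 \left( \frac{P_2}{P_1} \right)^{1/3}
        = 2\,\mathrm{GHz} \times (50/25)^{1/3}
        \approx 2.5\,\mathrm{GHz}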

------
GarrettBeck
I have a feeling history is going to repeat itself. Statements like "_AMD
believes that it will be the leader of this ARM server market_" remind me of
the DRAM boom-and-bust of 2006-2009. The new (old) _hot_ technology froths
the market into a frenzy, and semiconductor fabs start rushing to get a slice
of the action.

Does Qimonda ring a bell to anyone?

------
fleitz
This may be a much bigger threat to intel than AMD64 was.

~~~
skylan_q
ISAs eat the market from the bottom-up.

x86 is at risk here.

~~~
fleitz
Especially with AMD's ability to put GPU hardware on the CPU. ARM eats the
bottom end on a $/watt model, and the GPU eats the top end on a raw
performance model.

------
chx
> Two 10 Gigabit Ethernet ports.

Within a 25W envelope??? I thought dual 10GbE chips consume 10-15W.

~~~
donavanm
Optics are a big chunk also. A new controller with a DAC GbE PHY is probably
more like 7-8W. They only need to do the controller on this chip; a couple of
watts for the PHY are part of the motherboard's budget.

------
thomasfl
While this seems to be targeted at small low-power web servers, I really want
low-powered, cool, low-temperature laptops. Laptops with hot Intel processors
are cooking my body when I actually keep my laptop on my lap.

~~~
kybernetyk
I have the late 2013 MacBook Air with the new Haswell Core i5 CPU and it's the
first time I can put my notebook on my lap for as long as I want without
getting a heat stroke.

As long as I don't do compute-intensive stuff like playing games or visiting
websites that overuse JavaScript, the MBA runs really cool. Cooler than my
body temperature.

-- typed on my MBA, lying on the couch, having it placed on my belly ;)

------
signa11
Wasn't it someone at Facebook who remarked that they would be interested in
ARM CPUs once the frequency is > 2.5GHz? It also seems that Google has a bunch
of PA Semi guys, so them working on their own ARM chip isn't so far-fetched...

edit: found the link
[http://www.theregister.co.uk/2013/12/16/google_intel_arm_ana...](http://www.theregister.co.uk/2013/12/16/google_intel_arm_analysis/)

------
bitL
25W :-(

As much as I wish AMD would get ahead, this is not good news for efficient ARM
servers.

~~~
dragontamer
25W includes the I/O.

8 SATA-3 ports and two 10GbE ports will run you a lot of power. I can't think
of another SoC that supports that much I/O.

------
hosh
Cool. I can finally build that ZFS plug computer I wanted :-D

------
protomyth
I do wonder when we'll be able to order a motherboard and how much information
will be available to create new drivers.

------
jjoe
So excited about the prospects but I just can't get past this sentence:

_Rounding out the SoCs, they'll also include dedicated engines for
cryptography and compression._

"They" have ruined it for me...

~~~
dmm
Forgive me if I'm being dense but what harm would embedded crypto and
compression engines do? You don't have to use them right?

~~~
fleitz
There's little room for tampering with crypto; the only room is for tampering
with the RNGs.

As far as I know all crypto algorithms are deterministic based on the key/IV
and the data.

~~~
djcapelis
While that's true and required to successfully decrypt most algorithms, it is
also true that there are more types of tampering one can do than changing the
output ciphertext - usually involving storing the key or leaking data somehow.

~~~
shuzchen
Assuming they've already compromised the crypto bits of the chip, there's
nothing to gain in avoiding them, since the non-crypto bits could just as well
have the same compromises. Might as well just take the time/energy savings.

Tampering with the RNG probably provides the best value for an attacker, and
is harder to detect.

