
ISA showdown: Is ARM, x86, or MIPS intrinsically more power efficient? - indolering
http://www.extremetech.com/extreme/188396-the-final-isa-showdown-is-arm-x86-or-mips-intrinsically-more-power-efficient
======
userbinator
ARM and x86 are rather close, but the real disappointment here is the MIPS
Loongson, which is basically the "original RISC ISA". Unfortunately I've
encountered a huge number of people, particularly academics, who still think
(and teach) that MIPS or minor variants of it are the "best" ISAs and that one
can easily make cheap, fast, and power-efficient processors based on it.
Looking at the current state of things, it seems the only thing MIPS has
succeeded in is being cheap and pedagogical.

I think instruction density has quite some significance here too - x86
instructions vary between 1 and 15 bytes with 2-3 being average, and ARM has
Thumb mode where instructions are either 2 or 4 bytes, but _all_ MIPS
instructions are 4 bytes. The Loongson also has twice as much L1 cache as most
of the ARM and x86 processors, which apparently didn't help it much. Cache
consumes power too, and thus I
believe small variable-length encodings (like x86) are ultimately better since
they allow for better utilisation of cache; the extra complexity in the
decoder to handle this, which basically amounts to a few barrel shifters, is
almost nothing in comparison to the area and power that more cache would need.
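
A rough back-of-the-envelope sketch of the density argument: how many
instructions fit in an instruction cache at a given average encoding size. The
32 KiB cache size and the 3-byte average are illustrative assumptions (the
latter is the upper end of the 2-3 byte estimate above), not figures from the
study.

```python
def instructions_in_cache(cache_bytes: int, avg_instr_bytes: float) -> int:
    """Approximate number of instructions an I-cache of the given size holds."""
    if avg_instr_bytes <= 0:
        raise ValueError("average instruction size must be positive")
    return int(cache_bytes // avg_instr_bytes)


CACHE_BYTES = 32 * 1024  # assumed 32 KiB L1 instruction cache

fixed_4_byte = instructions_in_cache(CACHE_BYTES, 4)     # fixed 4-byte encoding
variable_3_byte = instructions_in_cache(CACHE_BYTES, 3)  # assumed variable-length average

print(fixed_4_byte, variable_3_byte)
```

This only counts instructions, not work done per instruction, so it is a
crude proxy for how much code stays resident.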

 _The entire reason CISC architectures emphasized complex multi-cycle
instruction execution is because memory accesses were orders of magnitude
slower than the processor and data storage was extremely limited._

When considering cache, these points are all true again. There's a common
belief about optimising for x86 to avoid the smaller but slower "CISC"
instructions, but in situations like tight loops, an instruction that's 2-3x
slower individually can be better than the faster longer one(s) if it means
the difference between code and data staying in cache or a 10x+ slowdown from
a cache miss somewhere else. Especially on an OoO/superscalar design where the
slower instruction can be executed in parallel with other nondependent ones.
(Intel/AMD's focus on speeding up these small CISC instructions - which they
have done - is possibly one of the reasons why x86 performance continues to
improve.)
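
To make the trade-off concrete, here is a toy cost model, assuming a loop's
average cost is its base cycle count plus the expected stall from cache
misses. Every number in it (cycle counts, miss rate, miss penalty) is made up
for illustration and does not come from the article.

```python
def average_cycles(base_cycles: float, miss_rate: float, miss_penalty: float) -> float:
    """Average cycles per loop iteration, including expected cache-miss stalls."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return base_cycles + miss_rate * miss_penalty


# Hypothetical loop using compact but individually slower instructions;
# assumed to fit in cache, so it never misses.
compact_loop = average_cycles(base_cycles=12, miss_rate=0.0, miss_penalty=200)

# Hypothetical loop using faster but longer instructions; assumed to push
# something out of cache, so 5% of iterations pay a 200-cycle miss.
larger_loop = average_cycles(base_cycles=8, miss_rate=0.05, miss_penalty=200)

print(compact_loop, larger_loop)
```

Under these assumed numbers the "slower" compact loop wins; with a miss rate
near zero the faster, larger loop would win instead.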

~~~
ajross
The Loongson is a 90nm part, the others are 32-45nm. No ISA is going to make
up for a doubled transistor size.

Really the notable thing to me isn't the ISA nonsense at all. It's how
singular a success the Cortex A9 core is. It came at exactly the right moment
in history and hit exactly the right sweet spot, being significantly beefier
than the A8 yet only minimally more power-hungry. Krait has followed on pretty
well, but the A15 can almost be considered a failure at this point.

~~~
rcthompson
Yeah, based on just the data in this article, I would see A9 as preferable to
A15 for using as the CPU of a mobile device.

~~~
rplst8
Definitely, though wasn't the A15 designed for more of a compact server role?
IIRC the A15 had different design goals.

~~~
rcthompson
OK, I would believe that. I don't know much about the correspondence between
A8/A9/A15 and real-world devices.

------
mljet
Link to the study referenced by the article:

[http://research.cs.wisc.edu/vertical/papers/2013/isa-power-s...](http://research.cs.wisc.edu/vertical/papers/2013/isa-power-struggles-tr.pdf)

------
hendrik42
What irks me is computing and comparing W/Mips for vastly different processors
like 45W Intel and 5W ARM chips. That just isn't reasonable, as performance
increase is very sublinear in power.
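
A sketch of why that comparison is skewed, assuming performance grows with
roughly the cube root of power. That exponent is a common first-order rule of
thumb for voltage/frequency scaling and is my assumption here, not something
taken from the article or the paper.

```python
def relative_performance(power_ratio: float, exponent: float = 1 / 3) -> float:
    """Performance ratio implied by a power ratio, assuming perf ~ power**exponent."""
    if power_ratio <= 0:
        raise ValueError("power_ratio must be positive")
    return power_ratio ** exponent


power_ratio = 45 / 5  # the 45 W and 5 W chips mentioned above
perf_ratio = relative_performance(power_ratio)
watts_per_perf_ratio = power_ratio / perf_ratio  # how much worse W/MIPS looks

print(round(perf_ratio, 2), round(watts_per_perf_ratio, 2))
```

Under that assumption the 45 W part is only about twice as fast for nine times
the power, so its W/MIPS looks several times worse even if the underlying
design were equally efficient.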

~~~
0x0
Also, running the i7 in 32-bit mode can't possibly be showing the Intel chip
from its best side?

~~~
sounds
Did anyone find a reasonably prominent link to the source?

It seems to me as if this article is mostly linkbait simply by reason of it
failing to provide anything more than vague phrases about the source: "This
paper is an updated version of one I’ve referenced in previous stories, ...
the team from the University of Wisconsin"

Half-baked studies frequently attempt to shout down the real hard science.

~~~
cjg_
Think we have the answer in the source article's abstract:

"Our methodical investigation demonstrates the role of ISA in modern
microprocessors’ performance and energy efficiency. We find that ARM and x86
processors are simply engineering design points optimized for different levels
of performance, and there is nothing fundamentally more energy efficient in
one ISA class or the other. The ISA being RISC or CISC seems irrelevant."

[http://research.cs.wisc.edu/vertical/papers/2013/isa-power-s...](http://research.cs.wisc.edu/vertical/papers/2013/isa-power-struggles-tr.pdf)

------
jhallenworld
The one argument I can make is that MIPS is too simple. But I would only make
this claim on the simplest of in-order single or dual issue implementations.
Think of a memcpy loop: 32-bit ARM and PowerPC can update the pointers as a
side effect of the load and store instructions, but MIPS cannot. You could
make similar arguments in favour of ARM's Thumb instruction set (more work
done per 32 bits of instruction loaded with low decoding overhead vs. x86).
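
To illustrate the memcpy point, here is a small count of instructions per
iteration for a word-at-a-time copy loop. The two assembly listings are
hand-written for this example (an assumption, not compiler output or anything
from the paper); the ARM version uses post-indexed addressing, while the MIPS
version needs separate pointer updates, one of which sits in the branch delay
slot.

```python
# Illustrative inner loops; registers and scheduling are assumptions.
ARM_LOOP = [
    "ldr  r3, [r1], #4",   # load word, post-increment source pointer
    "str  r3, [r0], #4",   # store word, post-increment destination pointer
    "subs r2, r2, #1",     # decrement word count and set flags
    "bne  loop",           # repeat while count != 0
]

MIPS_LOOP = [
    "lw    $t0, 0($a1)",       # load word
    "addiu $a1, $a1, 4",       # advance source pointer separately
    "addiu $a2, $a2, -1",      # decrement word count
    "sw    $t0, 0($a0)",       # store word
    "bne   $a2, $zero, loop",  # repeat while count != 0
    "addiu $a0, $a0, 4",       # advance destination pointer (branch delay slot)
]

BYTES_PER_INSTRUCTION = 4  # A32 and MIPS32 both use fixed 4-byte encodings


def loop_cost(instructions: list) -> tuple:
    """Return (instruction count, code bytes) for one loop iteration."""
    count = len(instructions)
    return count, count * BYTES_PER_INSTRUCTION


print("ARM :", loop_cost(ARM_LOOP))   # ARM : (4, 16)
print("MIPS:", loop_cost(MIPS_LOOP))  # MIPS: (6, 24)
```

On a simple in-order core, those two extra instructions per iteration are
extra issue slots; on a wide out-of-order core they mostly disappear into
spare ALU capacity.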

For implementations more advanced than this... I don't think you can make any
such claim based on ISA. x86 may be at a slight disadvantage due to decoding,
but that's about it.

------
josephlord
With ARM different companies can license the design and include different
system components on the chip. With Intel you need to take a packaged chip
provided by Intel. This can allow a system power and cost advantage compared
to having multiple chips. It is however a licensing/business model issue
rather than a fundamental ISA issue.

~~~
rational-future
Intel is starting to offer custom customer silicon on its chips. They signed a
contract with Rockchip a couple of months ago and are now marketing that
product to customers in China.

------
barrystaes
The way this data is normalized destroys any comparison of offsets between
mobile/server (and other) scenarios. What's wrong with using an actual unit of
measure, like watts?

------
etep
This is not a half-baked study. The right comparison is being made, namely
performance versus energy. Further, they attempt normalized comparisons, here
quoting:

To factor out the impact of technology, present technology-independent power
by scaling all processors to 45nm and normalizing the frequency to 1 GHz.

~~~
gvb
Normalization is nice for a mental exercise, but I cannot buy a normalized
phone with a normalized i7 that fits in my normalized pocket. Engineering is
the art of trade-offs and the i7 has traded off size and power to achieve
speed. That is great when you have an i7-scale size and power budget, but if
the i7 exceeds your power or size budget it is a non-starter regardless of how
efficient (when normalized) it is. Full stop.

The implicit argument of the paper is that Intel could produce a direct
size+power+speed replacement for a phone-scale ARM processor, they just need
to dial the knobs to small+small+slower. The counter argument is that they
have tried but not come close. The Atom line is roughly comparable with
respect to speed, but size and power are a problem. The Galileo processor is
roughly comparable with respect to power and size but speed is horribly
lacking.

~~~
Andys
There are x86 phones on the market that have similar weight/shape/battery life
to ARM phones. Anandtech reviewed one two years ago and found it in the middle
of the pack with respect to energy.

The question for Intel is dialing down the profit knob: how much of a hit do
they want to take on each unit shipped, by competing with ARM for tiny phone
chips.

~~~
gvb
Ref: [http://www.anandtech.com/show/5770/lava-xolo-x900-review-the...](http://www.anandtech.com/show/5770/lava-xolo-x900-review-the-first-intel-medfield-phone)

------
gioele
I hoped the article would look at which parts of those ISAs were reversible
[1], and thus did not dissipate energy.

For those interested, there is a master's thesis from the '90s that discusses
a prototype reversible ISA + RTL that wasted (in theory) no power for the
logic (non-IO) parts [2].

[1]
[http://en.wikipedia.org/wiki/Reversible_computing](http://en.wikipedia.org/wiki/Reversible_computing)
[2]
[http://dspace.mit.edu/bitstream/handle/1721.1/36039/33342527...](http://dspace.mit.edu/bitstream/handle/1721.1/36039/33342527.pdf)

~~~
pjc50
It's not clear that any of those technologies have been physically
implemented. They are very different from standard CMOS.

------
kristianp
"The ISA being RISC or CISC seems irrelevant.".

I thought these were all RISC processors when you get past the instruction
decoder.

~~~
wmf
Which still leaves the question of the cost of the instruction decoder.

~~~
sharpneli
It's a balancing act.

Power consumption by the instruction decoder vs the power consumption of
additional cache&memory bandwidth.

It's amusing that despite the complaints about x86 in the '90s, nowadays it's
actually a really good instruction packing format (though it became really
sensible only after AMD64).

------
fosap
Yet Out-of-the-Box Computing claims to beat all of these by an order of
magnitude. I'm looking forward to seeing how they compare.

------
aortega
ABI is too high level to have any effect on power.

~~~
hayfield
You can have an impact on power consumption at a range of levels. From
choosing appropriate algorithms, to switching to different data types, to
changing compiler flags. Add together a few of these 'easy' 5-10% energy
savings and you've just reduced your application's energy consumption by a
third (OK, it's not _quite_ that simple, but the principle stands).
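
A quick check of the arithmetic, assuming the savings are independent and
compound multiplicatively (each one applies to what is left after the others).
The four 10% figures are just the upper end of the range above, chosen for
illustration.

```python
from typing import Iterable


def combined_saving(savings: Iterable[float]) -> float:
    """Total fractional energy saving when independent savings compound."""
    remaining = 1.0
    for saving in savings:
        if not 0.0 <= saving <= 1.0:
            raise ValueError("each saving must be a fraction between 0 and 1")
        remaining *= 1.0 - saving
    return 1.0 - remaining


four_ten_percent = combined_saving([0.10, 0.10, 0.10, 0.10])
print(round(four_ten_percent, 4))  # 0.3439, versus 0.40 if simply summed
```

So four 10% savings land at roughly a third rather than 40%, which is the
"not quite that simple" part.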

A couple of citations:
[http://arxiv.org/pdf/1406.0117v1.pdf](http://arxiv.org/pdf/1406.0117v1.pdf)
(algorithms / data types)
[http://arxiv.org/pdf/1303.6485.pdf](http://arxiv.org/pdf/1303.6485.pdf)
(compiler flags)

