

The ARM, the PPC, the x86, and the iPad - ComputerGuru
http://neosmart.net/blog/2010/the-arm-the-ppc-the-x86-and-the-ipad/

======
pascal_cuoq
Article:

    
    
        In fact, it’s now a universally accepted truth that RISC
        is better than CISC! Actually, because of how much more
        efficient RISC machines are than their CISC counterparts,
        most CISC CPUs convert their CISC instructions into RISC
        instructions internally, then run them!
    

Most RISC CPUs convert their RISC instructions into RISC internally, then run
them! Look at the G5 (IBM PowerPC 970): it does register renaming, and splits
and regroups its instructions for the purpose of out-of-order execution, just
like any out-of-order CISC processor.

The truth is that instruction sets are always outdated compared to the number
of transistors that Moore's law allows designers to put on a chip. At one time
it seemed that 32 registers would make renaming unnecessary. At another, it
seemed VLIW would make possible a level of performance that older instruction
sets couldn't reach. Ask Intel and HP how that transition worked out for them.

RISC principles are only superior in the sense that '70s crazy haircuts are
superior to '60s crazy haircuts. They are dated too, just a little less so.
And in these days of memory-bound computations, the higher density of CISC
instruction sets seems to give them a slight advantage, if anything.

~~~
derefr
> Most RISC CPUs convert their RISC instructions into RISC internally, then
> run them!

A RISC processor is a processor with no microcode virtual machine level. These
processors aren't RISC in anyone's view but marketing's. Likely, there are
very few true RISC processors still being designed (for speed, anyway;
processors for embedded use are a different story).

~~~
1amzave
I think one thing that ought to be dealt with first before any RISC-CISC
debate is what exactly defines either one. Some attributes commonly identified
(in my experience) with RISC machines include:

- Fixed-width instructions (generally 32 bits)

- Explicit load and store instructions with arithmetic being performed only
between registers (hence RISC architectures sometimes being called
_load-store_)

- Large register files with few or no special-purpose registers (MIPS HI/LO
registers being an exception to this one, for example)

Other common (though perhaps less "defining") traits: three-operand
instructions, relatively few/simple addressing modes, procedure calls often
done via "branch-and-link" instructions, etc.

But...

> _A RISC processor is a processor with no microcode virtual machine level._

Huh? I've certainly never heard _that_ before. In fact, I'd say the RISC/CISC
dichotomy is primarily (if not entirely) an attribute of the ISA, not the
microarchitecture implementing it. It's generally pretty easy to look at e.g.
a PPC or x86 instruction set and classify it one way or another; in terms of
internal implementations though, the lines have gotten so blurred (in both
directions) in modern machines that I don't think it's real meaningful to talk
about RISC vs CISC anymore at the microarchitectural level, frankly.

~~~
derefr
> I've certainly never heard _that_ before.

It's because we've adopted (or rather, co-opted) the terms to refer to things
that shared phenotypical traits with their progenitors, but no longer held
true to the original definitions. In reality, RISC originally just _meant_
"exposes its microarchitecture as its instruction-set architecture." All the
other well-known properties of RISC machines were _effects_ of this decision.
But these days,

> the lines have gotten so blurred (in both directions) in modern machines

...that, like I said, there are very few RISC processors under the original,
theoretical definition of the term (and it's probably alright to just use
"RISC" under the new definition, since only the embedded programmers will
complain.)

------
jomohke
The article is a little out of date. The RISC vs. CISC distinction is very
blurry these days. Most RISC architectures have been made more CISCy, and vice
versa.

The article mentions CISC CPUs using RISC instructions internally, which isn't
the whole story anymore.

Ars Technica did a classic article on this debate (in 1999!):
<http://arstechnica.com/cpu/4q99/risc-cisc/rvc-1.html> and a follow-up here:
<http://arstechnica.com/hardware/news/2009/09/retrospect-and-prospect-ten-years-of-risc-vs-cisc.ars/>

A particularly interesting part of the follow up: "But a funny thing happened
with the Pentium M: processor designers discovered that processors of all
types are actually more power-efficient if their internal instruction format
is more complex, compound, and varied than it is with simple, atomic RISC
operations ... ... The end result is that even RISC processors needed to get
more CISC-y on the inside if they wanted to juggle the largest number of in-
flight instructions using the least amount of power."

~~~
ComputerGuru
Thanks for the link, it was an interesting read!

------
kjhghjmkedfcv
The RISC/CISC thing is a little oversimplified. One reason RISC has never
caught on in the desktop market is that memory speed hasn't kept up with CPU
speed (and can't, given the laws of physics). So if a RISC CPU takes 10
instructions to do what a CISC CPU can do in 1, it loses any speed advantage if
it takes 10x as long to get the next instruction from memory.

The principal reason people use ARM is low power. Part of its low power comes
from the RISC design, but it's not as simple as that. To reach the same overall
performance as an x86, the RISC chip may have to use more power, simply because
power increases faster than clock frequency.
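
The usual first-order model for dynamic CMOS power makes that point (a rough
approximation that ignores leakage):

    P_dynamic ≈ C · V² · f

and since raising the clock f usually also means raising the supply voltage V,
power grows much faster than linearly with clock speed.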

~~~
jws
The difference in RISC/CISC instruction count is closer to 2:1 than 10:1.
(Unless you are using a VAX polynomial-evaluation opcode, but that is an
extreme case.)

ARM ameliorates this by having multiple instruction sets. The Thumb
instructions are a denser encoding, if somewhat slower. 90/10 rules apply.

~~~
ryanpetrich
Thumb instructions are the same speed, but can only perform a subset of what
the ARM instruction set can; each instruction takes half the space. Thumb can
be faster if it keeps code from overflowing the instruction cache, but can also
be a lot slower if faster ARM instructions have to be emulated with Thumb
equivalents.
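
As a rough (hypothetical, not measured) illustration of the trade-off, the same
C compiled for ARM vs. classic 16-bit Thumb:

    /* ARM mode: every instruction is 4 bytes, and nearly any instruction can
     * be made conditional, so this often compiles to short straight-line code.
     * Classic Thumb: instructions are 2 bytes, so the code is denser, but only
     * branches are conditional, so the same logic needs extra compares and
     * branches on some paths. */
    int clamp_to_byte(int x) {
        if (x < 0)   return 0;
        if (x > 255) return 255;
        return x;
    }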

------
artsrc
One point of view is that we are pretty close to optimal. Here is another
perspective from:

    
    
        http://queue.acm.org/detail.cfm?id=1039523
    
    

Kay says:

    
    
        Just as an aside, to give you an interesting benchmark—on roughly the same system, 
        roughly optimized the same way, 
        a benchmark from 1979 at Xerox PARC runs only 50 times faster today. 
        Moore’s law has given us somewhere between 
        40,000 and 60,000 times improvement in that time. 
        So there’s approximately a factor of 1,000 in efficiency 
        that has been lost by bad CPU architectures.
    
        The myth that it doesn’t matter what your processor architecture is — 
        that Moore’s law will take care of you—is totally false. 
    

From my point of view garbage collection, JIT compilation, and late binding
are valuable, and the hardware is leaving too much to the VMs.

~~~
ippisl
Alan Kay might not be correct in this case. The correct factor between 1979
and today's computers is not 1000x, it's something between 10x and 50x, and
with the right compilers you get a factor of 3x-10x [1].

[1] <http://lists.canonical.org/pipermail/kragen-tol/2007-March/000850.html>

------
Hoff
It's building the business case and the revenues that makes this whole
processor design discussion interesting.

Not the microprocessor technology itself.

Competitive microprocessor designs (in terms of speed, power, cost, volume and
particularly the applications the users care about) are feasible but are Not
Cheap, and to get the costs down (and the revenues up) you need to build
production scale. And building scale means leapfrogging the existing players in
one or more dimensions sufficiently to draw over applications that the
end-users really care about.

You need to be significantly better here or sufficiently profligate with the
application vendors and the resellers, or sufficiently compatible to overcome
the inherent application inertia; you need to have enough "pull" (speed,
power, cost, particularly applications) to build up a base. Without this,
you're running a furnace with the contents of the corporate coffers.

You can't be just a little better with your processor designs, either. That
won't draw enough folks over. Both the Alpha and MIPS microprocessors had
Microsoft Windows NT, and that (even with porting tools such as FX!32) wasn't
enough to build up an installed base against the x86 designs.

Apple is playing the long game here.

------
pedalpete
This is one of the most interesting articles I've read here in a long time.

However, with the average use of mobile devices, does the CPU architecture
really matter that much?

As the focus of mobile devices is more the consumption of media (games, video,
etc.), isn't the GPU the bigger differentiator in this space?

Does either RISC or CISC have a benefit in the GPU space? Or am I completely
wrong with that previous statement?

~~~
z303
A GPU may help with media. GPUs tend to be SIMD / stream processors, so some
problems map well to that architecture and some less so.

Lots of GPUs tend to work on four-pixel quads of RGBA data with very
specialised instruction sets, which are not really CISC or RISC but are maybe
more RISC-like.

This is good for graphics rendering, both 2D and 3D. Video and audio encoders
and decoders are trickier: some parts, like motion estimation, can be
implemented on a GPU's SIMD architecture, but the bitstream processing is very
serial by design, so it fits badly on a SIMD architecture and is not great on a
CPU either; a custom piece of hardware or an FPGA may be a better solution.
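
A tiny sketch of the difference (a made-up example, not any particular GPU's
instruction set): the pixel loop below is embarrassingly parallel, so
SIMD/stream hardware handles it well, while a variable-length bitstream decoder
can't start symbol N+1 until it knows how long symbol N was.

    /* Data-parallel: every pixel is independent, so this maps well onto
     * SIMD / stream hardware (or vector units on a CPU). */
    void scale_pixels(unsigned char *p, int n, int gain /* fixed point, /256 */) {
        for (int i = 0; i < n; i++) {
            int v = (p[i] * gain) >> 8;
            p[i] = (unsigned char)(v > 255 ? 255 : v);
        }
    }
    /* By contrast, Huffman/arithmetic bitstream decoding is a loop-carried
     * dependency: the bit position of symbol i+1 depends on decoding symbol i,
     * so the iterations cannot run in parallel. */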

------
pdwoolcock
CPUs were/are one of the last pieces of computer-related hardware that I
still kind of considered "magic." This article definitely helped to alleviate
that...

------
wallflower
> Today, with the optimizations and internal RISC conversions that take place,
> CISC vs RISC isn’t really about the performance any more. It’s about the
> business, the politics… and the power consumption.

------
TimMontague
_In fact, it’s now a universally accepted truth that RISC is better than
CISC!_

Is that actually true? I'd like to see some actual numbers comparing RISC vs.
CISC performance/power consumption.

~~~
ComputerGuru
Author here.

You'd be comparing apples and oranges. CPUs are optimized for specific use
cases. There's always a tradeoff between the performance of each component,
and current x86 CPUs are designed/built for the desktop while current
ARM/MIPS/etc. are made for embedded and mobile devices.

Atom comes close to being comparable to certain embedded devices, but not
really, because it's actually a desktop architecture (it resembles older
in-order-execution x86 desktop CPUs) scaled down and simplified to reduce
power.

Within the field, though, it's taken as fact. To me, nothing says it more
clearly than the fact that Intel internally converts CISC instructions into a
series of RISC-like micro-ops, then feeds them into the pipeline. If it weren't
for compatibility issues, Intel would be RISC today.

~~~
sloughly
Microcode isn't the same as RISC. With an actual instruction set you are bound
to an architecture; microcode changes with the microarchitecture, so it can
contain optimizations that would break compatibility if exposed in a general
way.

Intel tried to move away from CISC with EPIC (Itanium), which could be
considered a kind of RISC, but it obviously didn't work out.

~~~
ComputerGuru
Didn't it? The Itanium is a great performer. Like I mention below in another
comment, if not for AMD, which beat Intel to the x64 punch with the hybrid
x86_64 architecture, we'd all be on Intel's vision of true 64-bit computing:
the Itanium.

All computer architecture PhDs I've spoken to have referred to the Itanium
with an air of awe. It was built from the ground up with all the optimizations
and bottlenecks in mind, and can be statically optimized to do magic... except
no one is ever going to use it, since it requires all applications to be
recompiled, and with x86_64 offering an easy way out, that's not going to
happen.

~~~
Nelson69
If AMD hadn't introduced x86-64, we'd all still be using PowerPC and 32-bit
x86. IA64 suffers from the same problems as all the other RISCs out there that
died: it has the potential to do magic, but there just aren't that many magic
compilers, and the output tends to look sort of mundane.

The whole RISC vs. CISC debate is beside the point; that was decided a long
while ago. x86 "CISC", which is quite a bit more RISCy than, say, the VAX or
the 370, happens to be a very compelling blend.

ARM does have an interesting position in the ultra-low-power field, though. I
suspect that has more to do with ARM being designed for that from the start
than with the instruction set.

------
elblanco
Anybody remember Transmeta?

------
woadwarrior01
> _z = x + y_

AFAIK, x86 doesn't have three-operand instructions yet.

~~~
1amzave
Generally true, though I think some of the proposed (future) SSE5 instructions
may be three-operand -- or even four in the case of FMA ops (perhaps this is
what you were subtly referring to by saying "yet").

Perhaps more importantly though, (again, AFAIK) x86 does _not_ have any
instructions that operate memory-to-memory in the way the article indicates.
There are plenty of memory-to-register, register-to-memory, and
register-to-register ops, and even read-modify-write forms (add [mem], reg,
cmpxchg, and so on), but no ALU instruction takes two distinct memory operands,
so nothing loads x and y from memory and stores the sum into z in a single
instruction.
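
To make that concrete, here's roughly how the article's `z = x + y` comes out
on each style of machine (the sequences in the comments are representative,
not exact compiler output):

    /* Three-operand RISC (MIPS/PowerPC style): separate destination register.
     *     lw  $t0, x          # load x
     *     lw  $t1, y          # load y
     *     add $t2, $t0, $t1   # t2 = t0 + t1
     *     sw  $t2, z          # store z
     *
     * x86 (two-operand, destructive): a memory source is allowed, but there is
     * no add-with-two-memory-operands form, and the destination is overwritten:
     *     mov eax, [x]
     *     add eax, [y]        # eax = eax + y
     *     mov [z], eax
     */
    int x, y, z;
    void compute(void) { z = x + y; }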

