
RISC vs. CISC: What's the Difference? - pathompong
http://www.eetimes.com/author.asp?section_id=36&doc_id=1327016
======
struct
I don't have access to the actual paper, but looking at the linked results[0]:

    
    
      Core Name     Performance (MIPS)   Energy (J)   Power (W)
      Cortex A8            178               25           0.8
      Cortex A9            625               11           1.5
      Atom N450            978               16           2.5
      i7-2700             6089               28          25.5
    

So A9 delivers 625/1.5 = 417 MIPS per Watt, whereas the i7 delivers 6089/25.5
= 239 MIPS per Watt and the Atom delivers 391 MIPS per Watt.
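
The same per-watt arithmetic in a few lines of Python, using just the measured figures from the table:

    # Performance-per-watt from the measured figures quoted above.
    cores = {
        "Cortex A8": (178, 0.8),     # (MIPS, measured power in W)
        "Cortex A9": (625, 1.5),
        "Atom N450": (978, 2.5),
        "i7-2700":   (6089, 25.5),
    }
    for name, (mips, watts) in cores.items():
        print(f"{name}: {mips / watts:.0f} MIPS/W")
    # Cortex A9 ~417, Atom N450 ~391, i7-2700 ~239 MIPS per watt.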

In addition, their spreadsheet has an "energy" tab calculated from a
"normalized" power figure (where Atom comes out on top), but if you multiply
the measured figures without that dubious adjustment, it seems the A9 is
actually more efficient (at least when you consider board power), and MIPS is
conspicuously absent from this spreadsheet. So the fundamental conclusion is
"either ARM or Intel is better, depending on what you measure under what
workload".

[0] http://research.cs.wisc.edu/vertical/wiki/index.php/Isa-power-struggles/Isa-power-struggles

~~~
daemonwrangler
Something else to keep in mind is that you can get significant power savings
when you lower the clock rate. Even so, if you measure the total energy
consumed to run a calculation, it may actually be more efficient to run it on
a fast CPU, finish quickly, and then drop into a low-power state than to run
it on a low-performance CPU for significantly longer.
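
A back-of-the-envelope sketch of that "race to idle" effect (all the wattages
and runtimes below are made up purely for illustration):

    # Toy "race to idle" comparison over a fixed 10 second window.
    # All power and runtime numbers are invented for illustration.
    WINDOW = 10.0                     # seconds

    def energy(active_w, runtime_s, idle_w=0.5):
        # energy (J) = power while working + idle power for the rest
        return active_w * runtime_s + idle_w * (WINDOW - runtime_s)

    fast = energy(active_w=25.0, runtime_s=1.0)   # fast CPU, finishes quickly
    slow = energy(active_w=4.0,  runtime_s=8.0)   # slow CPU, runs much longer
    print(fast, slow)   # 29.5 J vs 33.0 J -> fast-then-idle wins here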

~~~
stephengillie
I can't find a good reference now, but supposedly the i7 has a set of
transistors that works out whether its workload would execute faster on more
cores or on fewer, and can park cores to save heat and let the power be
focused into the unparked cores.

Intel's marketing material in 2008 mentioned that the number of transistors
doing the load calculations was about equal to the number of transistors in a
486. So you have a 486 constantly working out thread scheduling load, they
claimed.

~~~
wtallis
You misunderstood. The CPU doesn't get to decide how many cores are used; the
operating system's scheduler does. The CPU just tries to keep an accurate
running estimate of its power consumption and uses that to predict whether it
has enough headroom to boost the clock speed above the nominal full speed. If
some cores are temporarily idled by the OS, then that frees up a lot of power
and allows the remaining cores to have their clock speed boosted further.
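
A very rough sketch of that headroom idea, with the power budget and per-core
figures invented purely for illustration:

    # Toy model: boost the active cores until the package power budget is
    # used up (all numbers invented; real CPUs also track temperature,
    # current limits, instruction mix, etc.).
    PACKAGE_BUDGET_W = 45.0
    WATTS_PER_CORE_PER_GHZ = 4.0
    MAX_TURBO_GHZ = 4.0

    def boost_clock(active_cores):
        return min(MAX_TURBO_GHZ,
                   PACKAGE_BUDGET_W / (active_cores * WATTS_PER_CORE_PER_GHZ))

    for n in (8, 4, 2, 1):
        print(n, "active cores ->", round(boost_clock(n), 2), "GHz")
    # 8 -> 1.41, 4 -> 2.81, 2 -> 4.0, 1 -> 4.0: idling cores frees up
    # budget for the remaining cores to clock higher.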

~~~
stephengillie
Intel's marketing materials helped me misunderstand. Unless the OS is
leveraging that logic when it calculates which CPU to park.

Does any OS know that unparked CPU clock speeds might increase when it parks
a CPU?

~~~
wtallis
The operating systems have plenty of knowledge about how CPU power management
works. They are hampered somewhat by how things like Turbo Boost are
implemented in a backwards-compatible way through ACPI P-states that can't
directly convey this information, but it's still pretty straightforward for an
OS to support even more complicated schemes like ARM's big.LITTLE.

The real problem is that the OS seldom has enough information about the
software workload to know whether it is better run on all cores, or just a few
at higher clocks. It falls to application developers to not spawn more worker
threads than are necessary.
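
On that last point, even just capping the worker pool at the hardware thread
count (rather than spawning a thread per task) goes a long way; a minimal
Python sketch:

    # Cap the worker pool at the number of hardware threads instead of
    # spawning one worker per task.
    import os
    from concurrent.futures import ThreadPoolExecutor

    def process(item):
        return item * item          # stand-in for real work

    items = range(1000)
    workers = os.cpu_count() or 4   # fall back if the count is unknown
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(process, items))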

------
Symmetry
ARM up until the 64 bit transition was always one of the CISCiest RISC designs
and x86 wasn't nearly as CISCy as, say, VAX. 64 bit ARM is a much more
traditional RISC ISA than the previous encoding.

But anyways, here's the link I always post when people talk about RISC versus
CISC.
[http://userpages.umbc.edu/~vijay/mashey.on.risc.html](http://userpages.umbc.edu/~vijay/mashey.on.risc.html)

~~~
amyjess
I've always viewed x86 as the worst of both worlds. It lacks the orthogonality
of a good CISC, and it lacks the simplicity of a RISC. That modern x86 chips
perform so well is _despite_ their inefficient ISA, not because of it. If as
much money and research got poured into anything else, it'd perform even
better.

~~~
Symmetry
I can't believe I'm defending x86 but I think you're being too harsh.
Orthogonality isn't really that important these days now that everybody uses
compilers and the very messiness of x86 has allowed Intel to keep adding new
instructions over time.

Linus waxes poetic about 'rep movs', but I'd rather have something like PAL
code for architecture-specific optimized copy routines. Still, that's
something most ISAs don't have.

------
JoachimS
None of the CPUs compared has a very reduced (as in small) number of
instructions. We've come quite far from MIPS I, the IBM 801 and the first
SPARCs in terms of ISA complexity.

The big difference is really that x86 has an ISA->uop decoder, which basically
is another decoder in front of the decoder in a RISC.

~~~
amyjess
RISC is ((reduced instruction) set computing), not (reduced (instruction set)
computing). That is, the instructions are what's reduced, not the set. What
makes an ISA RISC or CISC is how simple or complex each individual instruction
is, not how many instructions are in the set.

The point of RISC is that each instruction does one thing and only one thing.
There are no addressing modes where a single instruction can access memory,
perform some operation on the contents, and write the result back to memory.
This is why RISC is sometimes described as a strict load/store architecture.
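
As a toy illustration of that split (a made-up three-register machine, not any
real ISA): the CISC-style operation touches memory and does the arithmetic in
one instruction, while the RISC version is three single-purpose instructions:

    # Toy machine state; the only point is the load/add/store split.
    mem  = {0x100: 7}
    regs = {"r0": 0, "r1": 5}

    # CISC-style: one instruction reads memory, adds a register, writes back.
    def add_mem_reg(addr, reg):
        mem[addr] = mem[addr] + regs[reg]

    # RISC-style: each instruction does exactly one thing.
    def load(reg, addr):   regs[reg] = mem[addr]
    def add(dst, a, b):    regs[dst] = regs[a] + regs[b]
    def store(addr, reg):  mem[addr] = regs[reg]

    load("r0", 0x100)       # r0 <- mem[0x100]
    add("r0", "r0", "r1")   # r0 <- r0 + r1
    store(0x100, "r0")      # mem[0x100] <- r0, same effect as
                            # add_mem_reg(0x100, "r1") in one instruction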

~~~
TheOtherHobbes
Originally RISC was a combination of:

Small simple instruction set, to minimise the size of the decoder

Single cycle execution

Aggressive pipelining

Replacement of decoder space with a much larger on-chip register space

The theory was that everything would work faster. And this was true for a while.

But eventually CISC cache killed the register speed advantage, CISC pipelining
became astonishingly clever and killed the pipelining advantage, and the
actual difference in efficiency between a CISC instruction decoded to u-ops
and compiler translation of complex statements to RISC instructions turned out
to be somewhere between not much, nothing, and negative.

So RISC basically wins for relatively low-performance, low-power computing.
It's not such a win for anything that requires SIMD, MMX, or any kind of DSP
extension - which today means most desktop computing.

The basic problem with the premise is that it's more efficient to cache memory
reads _and_ decoded instructions _and_ data than to keep data in registers and
assume a pipeline is going to give you cache-like performance for
instructions.

In fact, modern CISC chips include the equivalent of a hardware compiler that
tries to run an optimised internal RISC machine while also providing the
benefits of fast data and u-op caching.

The simple many register model is really a bit old fashioned now.

~~~
AlphaSite
Well no, doesn't modern x86 have an extremely large number of registers in
actuality?

~~~
daemonwrangler
Sort of. The physical register file has more registers than what's specified
in the ISA in order to support out-of-order execution. Hardware maps the small
number of ISA registers to the larger number of physical registers. All sorts
of complex stuff happens in the hardware to make it all work out right (e.g.,
bypass logic between pipeline stages to make sure dependent instructions can
use just-produced data before it is written back to the register file).
All to give the hardware greater scheduling flexibility.
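
A stripped-down sketch of just that renaming step, with the register names and
table size invented for illustration:

    # Toy register renaming: map a few ISA registers onto a larger physical
    # register file so back-to-back writes don't have to serialize.
    free_phys = list(range(64))      # 64 physical registers (number invented)
    rename = {}                      # ISA register name -> physical register

    def rename_dest(isa_reg):
        phys = free_phys.pop(0)      # allocate a fresh physical register
        rename[isa_reg] = phys
        return phys

    def rename_src(isa_reg):
        return rename[isa_reg]       # reads see the most recent mapping

    p1 = rename_dest("rax")          # first write to rax  -> physical 0
    p2 = rename_dest("rax")          # second write to rax -> physical 1
    # The two writes now live in different physical registers, so the second
    # doesn't have to wait for consumers of the first (real hardware also has
    # to free registers, recover from branch mispredicts, and so on).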

So yes, there are more registers than you'd think by looking at the ISA, but
they aren't available to the compiler, which can limit the kinds of
optimizations it can make. I think when AMD introduced x86-64, they only
increased ISA registers from 8 to 16. RISC ISAs at the time were offering
32-64 (and some also had larger physical register files to support OoO).
Granted these days there's also all of the vector registers for SSE/AVX, but
you'll need to have vectorizable code to leverage those.

All that being said, I don't think cache is a replacement for the register
file.

------
TheLoneWolfling
My thoughts on the matter:

Process sizes keep shrinking, and every time you shrink the process size you
can fit more on the chip, but heat doesn't scale down with it (the smaller the
process size, the more heat per in^2), and we're up against a heat wall as it
is. So we're at the point now where a large chunk of the chip _has_ to be dark
at any point in time. As such, CISCs are looking better and better: you can't
really scale frequency any further (due to heat concerns - roughly freq^2 heat
output, to a first approximation), most of the chip has to be dark at any one
time _anyways_, and you have the space, so you may as well spend it on things
that are optimized for rare use cases. And we're already seeing that: the
micro-ops on modern x86 processors are getting more and more complex and
specialized.

This will especially start happening once we get decent CPU caches - the
3D-ish stacks that are being talked about, where a separate die stacked under
or over the CPU uses a process optimized for RAM.

Note that this is _not_ talking about ISAs, this is talking about the
processor itself. Although it's not done much currently, you can just as
easily (or rather, with just as much effort) convert a RISC into CISC-like
micro-ops (macro-ops?) as convert a CISC into RISC-like micro-ops. It's
looking more and more as though ISAs can be successfully decoupled from the
actual processor design. Which is encouraging. Treat the instruction encoding
as effectively a compression scheme for the instructions that the actual
processor runs.

------
m0skit0
IMHO the author is actually missing the point of RISC architecture:
instruction homogeneity allows for simpler (and cheaper) hardware. Of course,
for software developers it doesn't actually matter whether it's RISC or CISC;
that's what abstraction layers are all about.

------
SixSigma
Headline : X found to be Y

"X is Y" as an assertion

"or that's what researchers claim in new report" as a caveat

I hate this style

------
higherpurpose
In other words, even if the x86 ISA itself is not bloated anymore, the CPUs
_can be_, because x86 CPUs still support a lot of 20-year-old legacy stuff.

------
VLM
Hundreds of MIPS is interesting for a certain class of application, but it
would be interesting to see the results for sub-MIPS applications, like the
microcontroller in a microwave oven.

