
RISC vs. CISC (2000) - jcr
http://cs.stanford.edu/people/eroberts/courses/soco/projects/risc/risccisc/
======
_delirium
For context, note that this site was part of a class project from 2000. Some
stuff is still relevant, some not.

Other similarly styled informational sites created by students in this course
over the years can be found here:
[http://cs.stanford.edu/people/eroberts/courses/soco/](http://cs.stanford.edu/people/eroberts/courses/soco/)

~~~
jerf
All of it is true in the historical sense; virtually none of it is relevant
today. With the advent of "instruction decoders", everything is RISC on the
inside and the "machine code" is now nothing more than another layer of
interface. Instruction sets no longer determine internal architecture, and
instruction decoders are a trivial fraction of the overall die.

RISC didn't die and CISC didn't die... "RISC vs. CISC" died instead.

~~~
voidlogic
In some ways this modern hybrid is better than either. RISC CPUs were almost
always internally more efficient, thanks to simpler decoding and easier
application of performance tricks, whereas CISC was always better externally
because of its higher memory/instruction density. So in some ways the
CISC-on-the-outside, RISC-on-the-inside approach is the best thing that could
have happened to a CISC architecture like x86.

~~~
1amzave
> _CISC was always better externally_

...aside from being orders of magnitude more of an ass-pain if you ever have
to write software that actually deals with machine code (at least on x86;
perhaps other CISCs aren't as bad).

Simple example: given an instruction boundary, find the preceding instruction
boundary. On most any RISC it's trivial (subtract 4, or perhaps 2 in some
cases). On x86...I'm not an expert, but I'm about 80% confident it's basically
impossible to do in the general case.
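
A minimal sketch of the contrast in C; the `insn_length` decoder below is
hypothetical, standing in for a real x86 disassembler:

```c
#include <stdint.h>
#include <stddef.h>

/* Fixed-width RISC (e.g. base RISC-V or AArch64): every instruction is
 * 4 bytes, so the preceding boundary is plain arithmetic. */
uintptr_t prev_boundary_risc(uintptr_t pc) {
    return pc - 4;
}

/* Hypothetical decoder: returns the length (1-15 bytes) of the x86
 * instruction starting at p. */
size_t insn_length(const uint8_t *p);

/* x86 instructions are variable-length and the byte stream is not
 * self-describing in reverse, so the usual workaround is to re-decode
 * forward from a trusted anchor (e.g. a function entry) and remember
 * the last boundary seen before pc. This is a heuristic, not a general
 * solution: there may be no trusted anchor to start from. */
uintptr_t prev_boundary_x86(uintptr_t anchor, uintptr_t pc) {
    uintptr_t cur = anchor, prev = anchor;
    while (cur < pc) {
        prev = cur;
        cur += insn_length((const uint8_t *)cur);
    }
    return prev;
}
```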

~~~
voidlogic
>...aside from being orders of magnitude more of an ass-pain if you ever have
to write software that actually deals with machine code

1. I was talking about external as in the memory hierarchy, not development
effort.

>(at least on x86; perhaps other CISCs aren't as bad).

This. The PDP-11 machines have a lovely instruction set, for example.

------
IvyMike
One argument I've seen: RISC is one way you can get an advantage in clock
speed, die size, and power.

But another way you can get an advantage in clock speed, die size, and power
is to go to a smaller fab process.

And given Moore's law, die shrinks marched ever on, and the next shrink was
never that far away. And Intel had a boatload of money to push their fabs
ahead of most everyone else.

(Yes, this is pretty hand-wavy, and you can pick a lot of nits, but I think
there's a lot of truth to it.)

~~~
Rusky
And combining the two ways, you get a CISC encoding turned into a RISC-like
internal microcode. This is what x86 does, for example.

~~~
notacoward
I was wondering who would point that out first. Amusingly, Intel was not the
last vendor to stop running x86 instructions natively. IIRC, Cyrix was still
doing so well after Intel themselves had switched to the front-end-decoder
("ROP") approach.

------
gilgoomesh
If anyone is wondering what happened to the RISC/CISC debate...

RISC won and all CPUs are now RISC CPUs. Even the x86 CPUs with CISC
instruction sets are now RISC CPUs.

This is done by using a thin instruction "cracking" layer that turns CISC
instructions into a series of "micro-ops" (equivalent to simple RISC
operations). Additionally, most RISC CPUs still need to break some of their
instructions down into micro-ops.
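
A rough sketch of the idea in C (the micro-op names and the three-way split
are illustrative, not any vendor's actual internal encoding):

```c
#include <stdio.h>

/* Illustrative micro-op kinds; real designs have many more. */
typedef enum { UOP_LOAD, UOP_ALU, UOP_STORE } uop_kind;

typedef struct {
    uop_kind kind;
    const char *effect;
} uop;

int main(void) {
    /* One CISC instruction, `add [rax], rbx` (a read-modify-write on
     * memory), cracked into three RISC-like micro-ops: */
    uop cracked[] = {
        { UOP_LOAD,  "tmp      <- mem[rax]"  },
        { UOP_ALU,   "tmp      <- tmp + rbx" },
        { UOP_STORE, "mem[rax] <- tmp"       },
    };
    for (size_t i = 0; i < sizeof cracked / sizeof cracked[0]; i++)
        printf("uop %zu: %s\n", i, cracked[i].effect);
    return 0;
}
```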

It just didn't require the death of the x86 instruction set.

~~~
ANTSANTS
Or did CISC win?

Old CISC: Microcoded interpreters interpreting higher level ISAs.

"New" CISC: JITs transpiling higher level ISAs into microcode.

Old RISC: Let's skip the middle man and deal with 80s CISC microcode directly,
so as to simplify instruction decoding and pipelining.

Somewhat new RISC: Oh wait, instruction decoding is trivial; at high
frequencies we can't ignore the fact that, e.g., multiplies inherently take
longer to compute than adds, and icache density _really matters_ in an age
when DRAM is orders of magnitude slower than the SRAM it feeds. I guess we'd
better start reading those Ars Technica articles on the design of the Pentium
from the late 90s.

Is "new" CISC wearing old RISC's clothing, or is new RISC just finally
catching up to 90s "new" CISC?

~~~
gilgoomesh
Everybody wins?

~~~
ANTSANTS
Sure, we all settled relatively steadily on fruitful common ground in the
end, and that's all that _really_ matters. I just think it's ironic that
people toot the RISC horn so hard in these threads when modern
high-performance RISC chips are much closer to the Pentium than vice versa.
So many people miss that RISC was inspired by CISC microcode, not the other
way around, and that the existence of complicated OoO RISC chips basically
proves false the RISC hypothesis that you could keep the architecture simple
and just crank clock frequencies up and up.

------
api
It's my understanding that memory bandwidth has always held back RISC.

Memory is considerably slower than the CPU, and CISC can be thought of as a
kind of data compression for code. The article does mention this, but kind of
glosses over its importance. In many cases a load from RAM can take hundreds
of clock cycles, so minimizing loads is a huge win. Smaller code footprints
also make programs load faster, since they require less disk or network
access.
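
For a feel of the density difference, compare the encodings of a memory
increment (byte counts assume x86-64 and base RV64I without the compressed
extension; other ISA variants differ):

```c
#include <stdio.h>

int main(void) {
    /* x86-64: `inc dword ptr [rdi]` is a single 2-byte instruction. */
    unsigned char x86[] = { 0xFF, 0x07 };

    /* Base RISC-V needs a load/add/store sequence, 4 bytes each:
     *   lw   t0, 0(a0)
     *   addi t0, t0, 1
     *   sw   t0, 0(a0)                                             */
    unsigned char riscv[] = { 0x83, 0x22, 0x05, 0x00,
                              0x93, 0x82, 0x12, 0x00,
                              0x23, 0x20, 0x55, 0x00 };

    printf("x86:    %zu bytes\n", sizeof x86);   /*  2 bytes */
    printf("RISC-V: %zu bytes\n", sizeof riscv); /* 12 bytes */
    return 0;
}
```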

~~~
duaneb
This is a little overstated, typically:

1) CISC code isn't really any smaller in some languages, e.g. C++, which uses
heavy code duplication.

2) The instruction cache these days is pretty fucking huge. Any tight loop
will fit in it just fine, and it's hardly like you're thrashing the cache
often at all (except, again, in cases where you would already be thrashing
the cache on CISC). It's very difficult to find a scenario where RISC is held
back by the instruction cache where CISC would not be. Obviously, if you
shrink the cache, use -Os, and try not to duplicate code, you could probably
run into this scenario more often, but it's just not a very relevant
argument. See: why people don't bother to use Thumb instructions on ARM. It's
very rarely the bottleneck for a chunk of code. In fact, the only place I'd
expect to see it make a difference is either a) with dynamic translation, a
la QEMU, where you might rapidly thrash a lot of code pages, or b) on
application load with heavily duplicated code (again, think C++).

In fact, I'd argue that the RISC vs. CISC argument is less relevant than ever
these days:

+ Compilers hide more assembly than ever, removing the need for human
readability of the underlying instructions.

+ Memory bandwidth and branch prediction are both larger issues on both
architectures than instruction decoding.

+ Intel has shown low-power CISC processors are in fact viable (Atom and
Celeron are both respectable), although it looks difficult to get RISC-like
performance/watt out of them.

+ Most modern CISC code translates VERY CLEANLY to a RISC-like microcode.
Assuming the translation cache is hot, translation really shouldn't add much
overhead.

No, I think the major downsides of CISC are maintainability (both of the ISA
and of actually laying down logic) and the difficulty of keeping
implementation semantics correct across diverse implementations, something
I'm frankly amazed Intel manages to reliably pull off.

~~~
api
Good points. Intel's ability to pull it off seems to be a result of bucks and
brute force. They've got the money to fling suicidal hordes of rampaging
Ph.D.s at the problem for decades.

------
faragon
The past was CISC, the past's future was RISC, and the present is
CISC-decoded-to-RISC. The future looks like more of that, i.e. the shorter
the opcodes, the better: if a short opcode can mean complex things, even
memory-to-memory operations, that is not a problem when it is scheduled on an
OoOE CPU.

In my opinion, these are the things to come:

- Simple in-order execution CPUs with tons of ALUs, as a generalization of
current GPUs (a la Larrabee or AMD/nVidia/ARM VLIW). These would be used for
graphics and signal processing.

- Complex OoOE CPUs with an expressive ISA for complex r-r/r-m/m-m
operations, transactional operations, and runtime-defined opcodes, focused on
maximizing sequential execution. With operating system support, this could
allow re-optimizing code, defining opcodes for groups of operations, so code
could get faster/smaller (de facto code self-compression).

~~~
Rusky
The Mill CPU ([http://millcomputing.com/](http://millcomputing.com/)) seems
to be going down both those routes: it's a very in-order, VLIW-like design
with some tweaks to make it work well with typical OoOE workloads, including
several ways to make the encoding much smaller.

~~~
faragon
It looks interesting. The VLIW-with-OoOE approach makes sense; maybe they
will follow that path some day. In my opinion the industry has not yet
reached global adoption of it because there is still some room for
enhancement elsewhere, Intel/AMD's macro-ops being the latest bullet in that
regard. IMO, if not an explicit VLIW ISA, current ISAs (x86 and ARM) will
follow a de facto VLIW transcoding, like Transmeta did back in the day (they
used in-order VLIW rather than OoOE VLIW, in order to reach their power
consumption goals).

------
AceJohnny2
This article doesn't seem to have been updated in a while.

No mention of ARM (which is a RISC architecture)? No mention of the
power-usage advantage? You're not just removing those code-translation
transistors to simplify your chip, you're also removing them to reduce power
usage!

~~~
bostonpete
The title does say (~2000).

~~~
rodgerd
ARM had been around for a decade and a half at that point, and was quite
popular. Intel even still owned their own (well, Digital's) ARM processors
back then!

------
Symmetry
I think John Mashey's long-ago Usenet post is still the best thing I've seen
on the topic of what RISC is:
[http://userpages.umbc.edu/~vijay/mashey.on.risc.html](http://userpages.umbc.edu/~vijay/mashey.on.risc.html)

