Whoa. Comp.arch. I learned so much from that during the 90s. There's nothing like learning from debates between experienced folks with anecdotes thrown in, and comp.arch back then had such a high signal-to-noise ratio that it was amazing.
Yeah I was thinking how different those days seemed. And how hard it would be to run into this sort of content these days if you’re a generally curious youngster.
Usenet was prevalent in those days for those who had internet, which probably meant a lot of academic institutes (and then those of us who had 2400 baud modems). So it would make sense that like-minded people would subscribe to those news feeds.
I'm not a social media expert, but with Mastodon, Bluesky, Discord, even Reddit (although perhaps the latter is now on a downward spiral; Slashdot used to be one such too) with their specialised servers/forums etc, maybe there are still high signal-to-noise forums (HN is probably one such).
The end result seems to be that the two entries from that old post that have seen the most success, x86 and ARM, are essentially the RISCiest of the CISCs and the CISCiest of the RISCs. A happy medium.
The simple answer is that a big symmetric register set has always been a hallmark of RISC ISAs, because having lots of symmetric registers increases throughput by reducing compiler-generated spill/fill overhead and permits more permissive ABIs for leaf functions that need the extra space.
The complicated answer is that, no, it's not especially "reduced" and you could implement all those other RISCy ideas in a much smaller set of GPRs without otherwise affecting the "RISCiness" of the design. But that answer is a little specious, since it's true of all the other RISCy traits too. Most of them stand alone, which is why the whole debate tends to be a little silly.
The name "RISC" is something we gave to a particular design philosophy, but not everything in that philosophy is summed up in the name. Classic RISC designs have 32 registers to make up for the lack of instructions that operate directly on memory addresses and to make things easier for the compiler and achieve good performance without load-op-store instructions.
If you actually dig into the details of RISC-V, especially the variable-width instruction extensions, it scratches all of my CISC-y itches just fine. Is it as CISC-y as ARM or x86? Not yet; not yet.
I can't agree with that -- POWER has slightly simpler addressing modes, in that `lwu` etc are always and only pre-increment/decrement, and doesn't have load/store multiple (the most CISCy thing in arm32) at all. And also, obviously, no predication, either per-instruction or with anything like `IT*`.
ARM64 gets a lot closer to CISC than ARM32, honestly (aside from the "everything predicated" and "double word stores"), with a larger suite of weirder instructions. Predication is also a distinct feature of ARM instructions, not a generic CISC thing.
POWER has instructions like RLIMI, DOZ, and STHBRX on the "complex" side.
The RISC/CISC distinction is not just about addressing and accessing memory (register-register vs register-memory operations). The headline operations for x86 actually were more about the compute side rather than the memory side - things like the "AAD" instruction did a lot of computing work in one bite. There were also RISC register-memory machines at the time the distinction was meaningful.
"General comment: this may sound weird, but in the long term, it might
be easier to deal with a really complicated bunch of instruction
formats, than with a complex set of addressing modes, because at least
the former is more amenable to pre-decoding into a cache of
decoded instructions that can be pipelined reasonably, whereas the pipeline
on the latter can get very tricky (examples to follow)."
So the distinction is more about how easily an ISA lends itself to pipelining (critical paths, dependencies between instructions/operands and so on) than about how rich the instruction set is or how much work one instruction may do.
No way. Quite the opposite. ARM64 is quite similar to e.g. RISC-V in most respects other than the use of condition codes.
> POWER has instructions like RLIMI, DOZ, and STHBRX on the "complex" side.
There is nothing complex or "un-RISC" about any of those.
RLIMI is similar to Arm32 BFI or Arm64 UBFM.
DOZ is just a subtract with the output zeroed if the MSB is 1. It is easily implemented with just a handful of straight-line instructions in any ISA with min/max or slt (or for that matter asr), or in hardware with a simple combinatorial add-on to standard subtraction.
STHBRX is just a little-endian store (on big-endian POWER). Plenty of other RISC ISAs (whether big-endian or little-endian) have that.
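To make that concrete, here is a rough C sketch of what those three do, following the descriptions above (my own approximation: 32-bit registers assumed, the rlimi mask passed as a plain 32-bit value rather than the encoded begin/end bit positions, and any flag/record-bit effects ignored):

```c
#include <stdint.h>

/* rlimi: rotate rs left by sh (0..31), then insert the bits selected by
   mask into ra, leaving the rest of ra untouched (cf. Arm BFI/UBFM). */
static uint32_t rlimi(uint32_t ra, uint32_t rs, unsigned sh, uint32_t mask) {
    uint32_t rot = sh ? (rs << sh) | (rs >> (32 - sh)) : rs;
    return (rot & mask) | (ra & ~mask);
}

/* doz: rb - ra, with the result forced to zero if its MSB is set. */
static uint32_t doz(uint32_t ra, uint32_t rb) {
    uint32_t diff = rb - ra;
    return (diff & 0x80000000u) ? 0 : diff;
}

/* sthbrx: store the low halfword of rs byte-reversed, i.e. a
   little-endian 16-bit store regardless of the machine's endianness. */
static void sthbrx(uint32_t rs, uint8_t *ea) {
    ea[0] = (uint8_t)(rs & 0xff);
    ea[1] = (uint8_t)((rs >> 8) & 0xff);
}
```

Each one is a small piece of combinatorial logic next to the existing ALU and store path, which is the point: none of them needs microcode or multi-cycle sequencing.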
> things like the "AAD" instruction did a lot of computing work in one bite
Less than other ISAs with a one-shot decimal add instruction (or mode, e.g. the 6502). Once again, it's a very simple combinatorial circuit -- no CISCy microcode or control flow needed.
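For comparison, the architectural effect of AAD boils down to roughly this (a sketch only; the flag updates are omitted, and the immediate base is 10 in the usual encoding):

```c
#include <stdint.h>

/* AAD: fold the two unpacked BCD digits in AH:AL into a binary value in AL.
   The base is 10 in the standard D5 0A encoding; SF/ZF/PF updates omitted. */
static void aad(uint8_t *ah, uint8_t *al, uint8_t base) {
    *al = (uint8_t)(*ah * base + *al);
    *ah = 0;
}
```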
> There were also RISC register-memory machines at the time the distinction was meaningful.
As I understand it, Intel apparently experimented with drastically lowering power consumption. They found that x86 chips can absolutely be made to sip power like ARM chips can. The conclusion was that fetch, decode, and interconnect were the most important power consumers, and CISC vs RISC didn’t quite matter as much as the physical architecture and layout.
Am I remembering that correctly? Does anyone have a link?
> The conclusion was that fetch, decode, and interconnect were the most important power consumers and CISC vs RISC didn’t quite matter as much
Wouldn't that mean that x86 and other CISCs would cost more, as they often have variable-length instructions and thus have the most decode complexity and consume the most power?
Now, if you told me that scheduling and reordering uops etc consumed the most power, then I'd understand why RISC vs CISC wouldn't matter, as at the uop level all high-performance chips are similar, VLIW-like computers (AFAICT, IANACD).
Though, for my money, I'm betting on a future dominated by gigantic core counts of really simple cores that may even be in-order. The extra transistors to speed up the processing are probably better spent on another core that can just handle the next request instead. Clock speeds brought down to some integer multiple of RAM speeds, and power costs brought down. At cloud scale this just seems economically inevitable. After that, the economies of scale mean it'll trickle down to everyone else except for specialised use cases.
I believe we're already seeing this happening with Zen C cores and Intel E cores, and that a simplified instruction set like RISC-V will eventually win out for the savings alone.
> on a future dominated by gigantic core counts of really simple cores that may even be in-order. The extra transistors to speed up the processing are probably better spent on another core that can just handle the next request instead.
I like how you think, but not sure we can get there.
Part of the RISC dream is that you shouldn't need reordering, as the ops should be similar in timing, so you could partially get there on that (although…look at divide, which IIRC the 801 didn't even have). So ops aren't really uniform in latency even before you consider memory issues.
Then there is memory — not all references can be handled before instruction decode because a computed indirect reference requires computation from the ALU and register state to perform. You can’t just send the following instruction to another unit as its computation may depend on the instruction you’re waiting for.
That’s why the instruction decode-and-perform has been smashed into lots of duplicated bits: to do what you say but on the individual steps.
At a different level of granularity, big.LITTLE is an approach that’s along your line of thinking. We tend to have a lot more computation than we need, even at the embedded level, these days and we may end up with some tiny power-sipping coprocessors that dominate most of the runtime of a device while doing almost nothing.
Was there ever a CPU that didn't break down CISC instructions into uops, but instead had dedicated hardware for each operation? My understanding is that CISC assembly is really more like a bytecode for a hardware virtual machine.
IBM mainframes used to be sold in tiers where the cheaper machines would implement more of the instruction set in microcode than the higher-end ones. I imagine even the top end had some amount of microcode kicking around though.
The venerable 6502. And if someone wants to argue that it was RISC: it doesn't fit under the RISC definition in the post (which I like because it sets some good delineation), as it has features like indirect addressing and variable-length instruction sizes.
>it has features like indirect addressing and variable-length instruction sizes
There's nothing inherently non-RISC there.
Indirect addressing was evaluated for RISC-V, and found to not be worth the required added complexity and encoding space.
Whereas RISC-V actually adopted variable-length instruction sizes, because it was found to be highly beneficial to code density, and doable in a manner that does not add too much decode complexity, in either small or large micro-architectures.
> Indirect addressing was evaluated for RISC-V, and found to not be worth the required added complexity and encoding space.
wtf?
If it was ever suggested by someone (and would NEVER be by Krste, Andrew, Yunsup, or Dave) then any evaluation would take less than 1 second before the "HELL NO!"
TWO instruction sizes, in the currently ratified ISA. And you can tell the size by looking at just 2 bits in the first byte of the instruction (the LSBs).
That is very far from "variable-length" in the sense of VAX or M68k or x86.
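For the two ratified lengths, that length check is literally a two-bit test on the first 16-bit parcel, something like this sketch:

```c
#include <stdint.h>

/* Instruction length in bytes from the first 16-bit parcel, covering the
   two lengths in the ratified ISA: bits [1:0] == 0b11 means a 32-bit
   instruction, anything else is a 16-bit compressed one. */
static unsigned rv_insn_len(uint16_t first_parcel) {
    return (first_parcel & 0x3) == 0x3 ? 4 : 2;
}
```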
> ... was found to be highly beneficial to code density, and doable in a manner that does not add too much decode complexity
Commercially known in microprocessors since at least SuperH 2A and Arm Thumb2. Not to mention 3 lengths (2, 4 or 6 bytes) in IBM 360, also decodable from just 2 bits in the instruction. Also CDC 6600 (arguably the first RISC), Cray-1 (also recognizably RISC), the first version of IBM 801, and Berkeley RISC-II (obviously familiar to the RISC-V designers).
And yes, RISC-V designers believe that addressing more complex than base+offset is a net loss in processor efficiency. You might need very slightly fewer instructions overall in a program, but it costs more transistors, silicon, and therefore dollars, and might even lower the achievable clock speed by more than the saved instructions.
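To spell out the trade-off, the work a scaled-index addressing mode does inside one memory operand, versus the same computation split across separate instructions on a base+offset-only ISA, looks roughly like this (plain C, purely illustrative):

```c
#include <stdint.h>

/* What an x86-style memory operand computes in one go:
   base + index*scale + displacement. */
static uintptr_t ea_complex(uintptr_t base, uintptr_t index,
                            unsigned scale_shift, intptr_t disp) {
    return base + (index << scale_shift) + disp;
}

/* The same address on a base+offset-only ISA: a separate shift and add
   feed a load/store whose addressing mode handles only base + disp. */
static uintptr_t ea_base_offset(uintptr_t base, uintptr_t index,
                                unsigned scale_shift, intptr_t disp) {
    uintptr_t scaled = index << scale_shift;  /* one shift instruction   */
    uintptr_t addr   = base + scaled;         /* one add instruction     */
    return addr + disp;                       /* folded into the access  */
}
```

The complex mode saves an instruction or two per access; the question the RISC-V designers answered "no" to is whether that saving pays for the extra address-generation hardware on every load/store.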
Some companies implementing RISC-V don't quite believe that (e.g. Andes, THead) and add more complex addressing modes as custom extensions. Once they've spent the silicon then you might as well use it -- same with WCH's shadow registers or automatic push/pop for interrupt handling (depending on the core) -- but it's not proven it's actually the best use of additional silicon [1].
Let the market decide.
If complex addressing wins then RISC-V always has the possibility to add a standard extension. It's much harder to take unneeded features away.
[1] admittedly the equation is different with a single-core microcontroller in a package, vs a chip with large numbers of cores (whether all the same, or all RISC-V, or not). On a stand-alone microcontroller the die size is often determined by fitting the pads around the outside, and if you don't use all the silicon inside that then it's wasted.
Not "always" but it has been around since at least the 70s, including the original 8086 which I don't think anyone would classify in good faith as RISC