The First RISC: John Cocke and the IBM 801 (thechipletter.substack.com)
80 points by klelatti on Oct 2, 2022 | 87 comments



When I worked at IBM in the early 2000s I remember occasionally hearing people say "IBM invented RISC!". As a young pup I knew to keep my mouth shut, but in my head I was like "best keep that news to yourself pop, the rest of the world only knows about the history of RISC-V" which is the only RISC we learned about in technical school in the late 90's. I didn't doubt that IBM did something similar early on, but it just came across to me as another funny bit of IBM community knowledge that was a little off-base, which wasn't too uncommon anyway. Obviously the IBM old timers were not off-base, though.

But it is nice to see history recognizing earlier pioneers in the field like John Cocke. And this article also demonstrates something I found perplexing as a young pup: IBM's marketing department dictated what engineering did. Obviously the point of the company is to make solutions that customers want to pay money for. Just because engineering has a better solution doesn't mean it's the best fit for customers (or the most profitable solution to the problem). But with my head stuck in FOSS philosophy of the day it was a good education in real life.


Actually in this case IBM really has the right to brag about it, because there is no doubt that IBM 801 included the first RISC CPU, even if the name RISC was invented only several years later.

Before the IBM 801, there were many simple CPUs which, if they were launched today, would be called RISC CPUs due to their simplicity.

However, for those old CPUs the simplicity was not intentional; it was imposed by the limitations of their manufacturing technology.

On the other hand, the IBM 801 project, which started in 1975, published significant documentation about its principles of operation during 1976, and had a working prototype by 1978, deliberately set out to reverse the trend towards more and more complex CPUs and to investigate the ways in which greater simplicity might in fact be a method for reaching higher performance.

All the principles for obtaining high performance with a simpler CPU were discussed in the IBM 801 documentation 4 to 5 years before the RISC project at Berkeley (1980 to 1984) and the MIPS project at Stanford (1981 to 1984).

Moreover, in March 1982 there was a very important symposium where all the teams working on RISC CPUs at that time presented their work, including the Berkeley RISC, Stanford MIPS and IBM 801 projects.

The presentations from that symposium led the designers of the future ARM CPU to change their architecture from some kind of 6502 extension to a RISC architecture. They were particularly impressed by the IBM 801 presentation, so instead of using weak addressing modes, as in the RISC and MIPS projects, they included in ARM the much more powerful IBM 801 addressing modes (i.e. including modes with base register auto-update).

While John Cocke did not invent the word RISC when designing the IBM 801, either he or another member of his team (which worked on the project that eventually became IBM POWER) did coin the word "superscalar" a few years later, in 1987, when the design target for CPUs had moved from executing one instruction per clock cycle, as in the early RISC CPUs, to executing multiple instructions per clock cycle.


>Before the IBM 801, there were many simple CPUs which, if they were launched today, would be called RISC CPUs due to their simplicity.

No, I don't believe that is the case. Very early computers were simple -- but neither high performance nor easy to program -- for that reason.

But by 1964, for example, there were already rather complex machines and ISAs such as the IBM 360.

Also in 1964 was Seymour Cray's CDC 6600, a complex high end machine with 10 functional units and superscalar out-of-order execution, but deliberately simple instructions that separated load/store from arithmetic. It would definitely and deservedly get the term "RISC" if introduced today.

The same goes for the Cray 1, which was also before the 801 project.

>While John Cocke did not invent the word RISC when designing the IBM 801, either he or another member of his team (which worked on the project that eventually became IBM POWER) did coin the word "superscalar" a few years later, in 1987, when the design target for CPUs had moved from executing one instruction per clock cycle, as in the early RISC CPUs, to executing multiple instructions per clock cycle.

Again, the 1964 CDC 6600 was superscalar. As indeed were high-end IBM 360s.

The term didn't exist then, but the techniques (scoreboard on CDC, Tomasulo's algorithm on IBM 360/91) did.


Which pre-RISC simple CPUs are you thinking about? I think the key design features of RISC are no microcode, pipelined execution, usually one instruction per clock, a load/store architecture (no memory operands to ALU instructions), fixed-length instructions, and a large orthogonal general-purpose register set to compensate for the lack of memory operands.

Maybe the CDC 6600 sort of fits, but it only had 8 60-bit operand registers, and they weren't orthogonal; you could load into five of them (X1–X5) and store from two (X6–X7), and you couldn't use them for addressing (addressing was done with 16 other 18-bit registers). It was only barely pipelined, though it was sort of superscalar. I do think it fulfills the "no microcode", "fixed-length instructions", and "usually one instruction per clock" desiderata — but you can hardly say it was a "simple CPU", its design objective being to build "the largest computer in the world", and sporting 60-bit registers, hardware floating point (single and double precision), and a heavily multibanked memory system.


>Which pre-RISC simple CPUs are you thinking about?

Would the 6502 qualify? I know that the ARM team moved away from it to design its own actual RISC architecture, but I've often heard it described as being very RISC-like.


I think the 6502 is a lot more like the PDP-8, except that it has variable-length instructions (interspersed opcode bytes and operands). Looking at my list above of RISC features:

- fixed-length instructions: no

- no microcode: the 6502's PLA amounts to fairly extensive microcode

- usually one instruction per clock: the 6502 doesn't come close to this, in part because of its microcoded execution approach. http://www.6502.org/tutorials/6502opcodes.html shows that different instructions take from 2 to 7 cycles, depending on the addressing mode.

- a load/store architecture (no memory operands to ALU instructions): because the 6502 has only one accumulator, all its ALU instructions have memory operands, and generally several different addressing modes are available (a few are sketched with their cycle counts just after this list).

- pipelined execution: no.

- a large orthogonal general-purpose register set: no, although you can use the zero page as a register file (as on the PDP-8) you can't use data on it for addressing (unlike the PDP-8!) or ALU operations with anything except the A register.
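
To make the memory-operand and cycle-count points concrete, here are a few forms of the same ALU instruction, ADC (add with carry to the accumulator), with the cycle counts from the 6502.org table above; this is just an illustrative sketch, not an exhaustive list of modes:

    ADC #$10      ; immediate operand:               2 cycles
    ADC $10       ; zero-page memory operand:        3 cycles
    ADC $1234     ; absolute memory operand:         4 cycles
    ADC $1234,X   ; absolute indexed:                4 cycles (+1 if a page boundary is crossed)
    ADC ($10),Y   ; indirect indexed via zero page:  5 cycles (+1 if a page boundary is crossed)

All of these target the single accumulator A, and all but the immediate form read a memory operand.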

So I think the 6502 is in every way at an opposite extreme from RISC. Which is not to say it's overcomplicated — it's an amazingly simple processor, actually, almost an order of magnitude smaller than the ARM 1 or any other RISC processor, roughly on par with the much more limited PDP-8 — but that the design tradeoffs it embodies are very different from RISC. They're not particularly similar to the systems the IBM 801 was designed in reaction to, either, except I guess that the 6502, like the IBM 360 and 370, is byte-addressed and supports both decimal and binary arithmetic; and the low-end S/360 Models 30 and 40 used an 8-bit ALU, like the 6502, but they hid it from the user: https://www.righto.com/2022/01/ibm360model50.html


I agree with you. The 6502 isn't anything at all like RISC, it's "we can only just barely make a computer at all with the number of transistors we have available".

The 6502 uses a similar number of transistors as the 6800 (different company, but largely the same designers) but is MUCH faster in actual practice. The 6800 is a lot simpler and more orthogonal, the 6502 is a lot trickier and more efficient.

What the ARM designers liked about the 6502 was that it read or wrote a byte from memory on every clock cycle, and you could quite accurately calculate the execution time of a 6502 program just by counting the number of bytes of instructions and data read and written (taking max(bytes, 2) for each instruction). They are on record as saying they tried to duplicate this in a 32 bit CPU. Making maximum use of memory cycles without a cache was also the reason for ARM's load/store multiple instructions, so that a single 32 bit opcode could drive up to 16 words of data read/write before needing to fetch another instruction. This feature became redundant (at least as far as speed went) once instruction caches were introduced, but ARMv7 CPUs are mostly running without caches to this day. ARMv8-A removed load/store multiple.
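
To illustrate, here is a rough sketch in classic ARM assembly (not anything from Acorn's actual code; the register roles are assumed: r0 = destination pointer, r1 = source pointer, r2 = word count, a multiple of 8):

    copy    LDMIA   r1!, {r3-r10}   ; one 32-bit opcode: load 8 sequential words from [r1], write r1 back
            STMIA   r0!, {r3-r10}   ; one 32-bit opcode: store those 8 words to [r0], write r0 back
            SUBS    r2, r2, #8      ; count down by 8 words, setting the flags
            BNE     copy            ; loop until the count hits zero

Roughly 16 of every 20 memory cycles in this loop move data rather than fetch instructions, which is exactly the cache-less memory utilisation described above.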


The 6502 is pretty amazing.

I mentioned the LDM/STM thing in https://news.ycombinator.com/item?id=33060256. As you of course know very well, but some other commenters may not, RISC-V omitted LDM/STM as well, even in the compressed-instruction extension; Waterman's dissertation includes a chapter where he measures the cost of providing LDM/STM in "millicode" instead of the instruction set, finding it quite reasonable, but this of course assumes the millicode is getting served out of the cache (at least most of the time) instead of its instruction fetches competing with the data traffic for memory bandwidth.


Right. A form of this millicode LDM/STM is implemented in gcc and clang when invoked with the -msave-restore flag.

This calls one of a set of (intertwined) functions on function entry. The called function saves N registers s0 .. s[N-1] plus the Return Address register on the stack. The millicode can save the RA from the original function because RISC-V allows using any register for the return address and the save-restore millicode uses a different one: t0 (register 5) instead of ra (register 1).

The corresponding restore function is jumped to (tail called) at the end of the function and restores s0 .. s[N-1] and then returns to the caller of the original function.
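
A rough sketch of what this looks like in RISC-V assembly; the millicode entry points are assumed here to be named __riscv_save_2 / __riscv_restore_2 (the names I believe gcc/clang's runtime uses), and the exact set of registers each entry point saves may differ between toolchain versions:

        # foo() compiled with -msave-restore, needing two callee-saved registers
    foo:
        call  t0, __riscv_save_2      # jal via t0: millicode pushes ra, s0, s1 (at least) onto the stack
        mv    s0, a0                  # the body is now free to clobber s0 and s1
        mv    s1, a1
        add   a0, s0, s1
        tail  __riscv_restore_2       # millicode pops s1, s0, ra and returns to foo's caller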

The three extra jump instructions involved with using -msave-restore (one each for the call, the return, and the tail call) of course add several clock cycles. But I have seen code using -msave-restore run faster than code not using it because of better utilisation of the icache. In particular, I have seen this with CoreMark on a machine with a 16 KB icache, where -msave-restore gave a 3% speed improvement.

At one point, the millicode for -msave-restore was 96 bytes of code. I don't know whether anyone has tweaked it since, but that is a pretty minor part of most icaches.


As you say, CDC 6600 and its successor CDC 7600 have many of the characteristics of the RISC CPUs.

Compared with the IBM System/360, which was introduced around the same time, the instruction set of CDC 6600 was very simple.

Having hardware floating-point is something that is orthogonal to being RISC.

The goal of the RISC methodology of designing CPUs was not to create crippled CPUs that are so slow as to be unusable for practical purposes. The goal was to create a higher performance CPU by removing some features with great cost and little benefit.

Hardware floating-point is not one of those features. Removing the hardware floating-point unit from a CPU cannot increase its performance; it just makes the CPU useless for floating-point applications, because software FP implementations are much too slow.

The first RISC and MIPS projects did not have an FPU because they were student projects and designing an FPU would have required too much work. When commercial CPUs were derived from those projects, they had to add hardware FPUs to become viable for the workstation market.

Also, the width of the registers is orthogonal to being RISC.

The same can be said about the partition of the registers into a set of address registers and a set of data registers. Depending on the ISA, such a partition can result in simpler hardware than a unified set of registers, e.g. by allowing fewer read/write ports on the register file. Simpler hardware is a RISC goal.

There were a large number of early computers with no microcode and with fixed-length instructions. Many were also pipelined to various degrees.

However, no early computer had a large number of registers, because registers were very expensive when made with discrete components; only a few computers had 8 or at most 16 registers, and the majority had fewer than that.

With only a few registers, having only register-register operations and load/store operations would have been inconvenient, so I do not remember any early computer before the CDC 6600 where most of the computation was done with register-register operations.

Splitting the load from the operation that uses its value, either as distinctly encoded instructions or as micro-operations, became necessary only when the CPU became faster than the main memory, as in the CDC 6600, or in monolithic CPUs after 1980. In slower CPUs, this would not have been a simplification.

A more precise formulation of the purpose of the large register set is not "to compensate for the lack of memory operands", but to allow the fast execution of the instructions despite the high latency of the loads from memory, by splitting the register-memory operations into register-register operations and separate loads. This split has the added bonus of enabling more efficient instruction encodings within the constraints of a fixed-length instruction format.
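
A tiny sketch of that split, in RISC-V-style assembly (the register-memory form is shown only as a comment, since a load/store ISA has no such instruction, and all the register names here are just for illustration):

    # Hypothetical register-memory form (one instruction, but the adder must wait for memory):
    #     add   a1, 8(a2)            # a1 = a1 + memory[a2 + 8]
    #
    # Load/store split: the load is a separate instruction, so it can be issued early
    # and independent work scheduled in between to hide the memory latency.
        lw    t1, 8(a2)              # load the memory operand into a register
        addi  a2, a2, 16             # independent work can run while the load is in flight
        add   a1, a1, t1             # register-register add, no memory access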

So a part of the RISC characteristics were present in the early computers, but having a large number of registers and doing computations only with register-register operations are choices that lead to a simpler CPU only in a certain technological context, which became available for cheap computers only around 1980, when the initial RISC projects were launched.


> Having hardware floating-point is something that is orthogonal to being RISC. ... Also, the width of the registers is orthogonal to being RISC.

Agreed, but it's not orthogonal to being simple. You said, "Before the IBM 801, there were many simple CPUs which ... would be called RISC CPUs". My reason for bringing up those points about the 6600 was to show that, however RISCy it may have been, it was anything but simple. So which machines did you mean if not the 6600?

(I don't think that it was very RISCy, given the non-orthogonality of its register set. You are probably right that its non-orthogonal register file simplified its implementation significantly, but it's a distinctly non-RISC feature. But whether such a complex machine, perhaps the most complex in the world at the time and certainly the fastest, was RISCy or not, has no bearing on whether there were simple CPUs that would have been called RISC.)

I don't think it's accurate to say that the early RISC projects (the 801, the MIPS, and the RISC) had a large register set to allow the fast execution of the instructions despite the high latency of the loads from memory, although it was true of the 6600. Rather, as I understand it, it was to speed up the instruction decoding, permit pipelining, and reduce the required bandwidth of the loads from memory. Zero-wait-state memory was still a thing in workstations up to the 90s; there was no latency of loads from memory.

Now you seem to be saying that you don't think there were any pre-801 load-store architectures except the 6600. That sounds dangerously close to the assertion that there weren't any pre-801 RISCs. But, as I understood it, your thesis that I was debating was that there were not just one but many pre-801 RISCs, and that moreover they were simple, but that they weren't called "RISC" because the term hadn't been invented.

Did I misunderstand? Did you change your mind? If not, what were these simple pre-801 RISCs, and what RISC characteristics did they have?


>Zero-wait-state memory was still a thing in workstations up to the 90s; there was no latency of loads from memory.

That is not correct.

Even PCs started to get caches around 1985-1987.

In 1985 the 80386 supported an optional external SRAM cache.

In 1984 the 68020 had a 256 byte instruction cache. 1987's 68030 added a 256 byte data cache. Apple's 1989 Mac IIci added an optional 16k external cache to the 68030, which was actually bought by everyone I knew who had one, as it made a considerable speed difference.

The 80486 (1989) and 68040 (1990) had 8k unified and 4k+4k I&D caches respectively.

RISC workstations of course had caches before Macs&PCs did.


I appreciate the corrections, and I think I went through most of these in my earlier comment https://news.ycombinator.com/item?id=33059665 but I had the impression that the reason for the caches was mostly to reduce the required memory bandwidth. This is of course a stronger reason for an icache than a dcache.

As you can see in that thread, I haven't been able to find any shipping RISC CPUs that had a cache before the 68020 in 01984. In particular I haven't found evidence that the ROMP had a cache, even though it was derived from the split-cache 801. Maybe you know of some?

I must have been wrong about the latency of DRAM in the late 01980s. Even today I suppose single-banked DRAM usually tops out at 10 MHz for random accesses?


The Intel 8086/8088 had slow memory access: an access cycle was at least 4 clock cycles, and it was easy to find DRAM that did not need wait cycles, e.g. for an IBM PC/XT.

The Intel 80286 completely changed the memory bus and could complete a memory access in only 2 clock cycles, and inside those 2 clock cycles there was little time from when the address was available until the data had to be returned by the memory if there were to be no wait states.

Even for the first model of the IBM PC/AT, with a very slow 80286 @ 6 MHz (despite the fact that the 80286 had been officially launched with clock frequencies starting at 8 MHz), it would have been difficult to find fast enough DRAM to avoid wait states. The desire to minimize or avoid wait states could have been the reason for the very low clock frequency, unless Intel had manufacturing problems and could not supply enough 80286s @ 8 MHz or @ 12 MHz.

For the later faster 80286 models and for the 80386, which started in 1985 at 12 MHz and 16 MHz, it was pretty much impossible to find DRAM that would allow a 2-clock-cycle memory access cycle.

The majority of the PC/AT clones with 80286 or 80386 used one or more wait states for the DRAM and all the high-end PC/AT clones with 80386DX CPUs had external cache memories made with fast SRAM, e.g. of 32 kbyte or 64 kbyte size, which was the only memory that could be used without wait states.

Even after Intel included the L1 cache in the 486 and Pentium, all the decent motherboards had an external L2 cache with SRAM. The external caches disappeared only after 2000, when the later models of the Intel Pentium III and AMD Athlon included the L2 cache on the chip as well.


As I have said, apart from the CDC 6600 and its successors, including the Cray-1, which used register-register operations, what I meant is that many early computers had a subset of the RISC features you enumerated, i.e. fixed-length instructions, no microcode, and in some cases also pipelining (which, before IBM Stretch introduced the word in 1959, was called overlapped execution). They also had simple addressing modes and a relatively small number of instructions.

Most of these characteristics were present in the first electronic computers made with vacuum tubes.

In time computers became more complex, so after about 15 years of evolution machines like the various IBM System/360 models had features like microcoded implementation, variable-length instruction encoding, register-memory operations, addressing modes with 3 components, and many complex instructions for special purposes that were seldom used. All these features, which became standard around 1965, were removed 10 years later in the IBM 801, in order to enable higher performance, by reverting a part of the evolution of the first 15 years of electronic computers.


A very important point which I have not seen anyone make is that RISC is not only about having simple instructions and implementation BUT ALSO doing that in a way that does not harm performance.

The early simple computers also had very poor performance, not just in an absolute sense (of course they had primitive circuitry) but RELATIVE to the technology they used e.g. the delay needed to clock two values into two sets of latches (flip-flops), feed the output into a full-adder, then clock the result into another (or the same) set of flip-flops. How fast can that clock cycle be in the technology? Now take some fixed task and measure how many of those clock cycles it takes to complete it. The much-maligned Dhrystone will do.

How many instructions are needed to complete the task? How many full-adder clock cycles in total do those instructions take?

Modern RISC does a good job of minimising those clock cycles while using a small amount of hardware.

The first computers used a lot more clock cycles. That applies to not only the first mainframes but also the first minicomputers and the first microprocessors. They were simple in a way that seriously hurt performance.


Can you name one? You've named Stretch but presumably you're not trying to say Stretch was a RISC; and the Cray-1 was developed after the 801 and isn't much RISCier than the 6600, and even less simple.

I admit I'm not very familiar with first- and second-generation CPU architectures, but the ones I have read about in any detail, like the LGP-30, the TX-0, and the IBM 1620, are very far from RISC.


I am too lazy to search right now through the library, but there are plenty of books about the history of electronic computers, which describe some of the vacuum-tube computers of the fifties which introduced various innovations, e.g. the computers built by NBS (SEAC & SWAC), various IAS derivatives, the IBM scientific computing series starting with IBM 701, and many others. For some of the early computers there are manuals at bitsavers.org.

All the early computers had by necessity features with low implementation cost, i.e. hardwired control, fixed-size instructions, simple addressing modes, low number of instructions, etc.

The RISC CPUs returned to this simplicity, but they did not become completely similar to the early computers, because the early computers did not have a large set of registers allowing the computations to be done by register-register operations.

The early computers did not have many registers for two reasons: first, they would have been too expensive when made with discrete components, and second, there was no need for them, because memory access was fast in comparison with the instruction execution time.

Nevertheless, the concept of an ISA separated into register-register computational instructions and load/store instructions for accessing memory was also not newly introduced by the RISC projects; it dates back to the CDC 6600, in 1964, even if at that time having only 8 data registers was the reasonable register set size.

So all the RISC recommendations had been used, for various reasons, in earlier computers.

What was new with RISC, and what appeared for the first time in the IBM 801 documentation, i.e. in 1975/1976, was the idea that by returning to the original simplicity of the older computers fewer hardware resources are needed to implement the CPU, and the saved hardware resources can then be used to implement other features that have a higher impact on performance, e.g. a larger register set, on-chip cache memory, faster parallel multipliers, and so on.

The RISC approach was a more rational CPU design optimization methodology, which advocated measuring the complete performance impact of any new feature, e.g. an extra instruction or addressing mode added to the ISA, and rejecting the features which add too much cost while providing too little performance increase, or even a performance decrease, because the added feature might accelerate a seldom-used application while slowing down all the frequently used applications.

The optimization approach proposed by RISC remains as correct today as in 1975 or 1980, but when different manufacturing technologies are available the optimum CPU design can be very different.

While including certain instructions in the ISA might have been a bad choice in 1980, e.g. an extra instruction for computing the tensor product of 2 vectors, a performance vs. cost analysis done in 2020 might conclude that it is in fact the best change that could be done to an ISA in the given technology, in order to provide maximum additional performance at minimum additional cost.

So what must be retained from RISC is not, e.g., the necessity of having a fixed-length instruction encoding, or any other feature that was optimal in 1980. Such a feature might happen to be still optimal in 2022, but this is not certain and must be demonstrated by a new analysis based on the current technological capabilities.

What must be retained from RISC is that CPU design is an optimization problem and any features of a design must be justified by concrete performance and cost numbers, and not by vague ideas about what might be useful or not useful to have in a CPU.

What is frustrating when looking at modern CPU or GPU designs, e.g. Zen 4, Raptor Lake or the RTX 4000 series, is that an outside observer cannot really assess whether the CPU or GPU design teams have done a good job or not.

An outside observer can measure the performance, i.e. the speed and energy consumption of a CPU or GPU when performing certain tasks, but there is no access to the cost information, which is a closely guarded secret, because foundries like TSMC do not provide design rules information, unless you are a potential big customer.

So an outsider cannot judge whether the current CPU and GPU architectures and micro-architectures are the best that can be done, or whether significant improvements are possible.


I endorse this comment, in full.

I'll only add that the CDC 6600 could get away with "only" 8 data registers (60 bits each) because those registers held only the floating point (usually) values being calculated on, while there were another 8 registers to hold memory addresses (18 bits each) AND another 8 registers to hold integer variables (also 18 bits) such as counters and array indexes and loop limits.

So, it's actually a quite respectable 24 registers.

Basically the same as the (relatively) modern 68000 series with 8 data registers, 8 address registers, and 8 FP registers.


Yeah, I can agree with it too.

I still feel like it's cheating a little because of the non-orthogonality of the registers: you can only load X1–X5, you can only store X6–X7, and you can't store into A1–A7 without causing a load or a store (which, hey, saves you an instruction when you're indexing through an array). But I guess there's nothing particularly un-RISCy about having floating point multiply and divide but no integer multiply and divide.

All this sounds an awful lot like someone designed a machine specifically to do two interleaved DGEMMs all the time. Why two? Maybe in order to hide the high latency of the heavily multibanked memory?

A funny thing is that this separation of address registers from "increment" registers (the integer variables you mention, which as I understand it actually number only 7 because one of them is the zero register, as on RISC-V) is faintly reminiscent of CHERI and even more so some of its predecessors, but of course the 6600 didn't attempt any sort of security with these.

You might be aware lkcl is basing his LibreSOC on the 6600 (and, sadly, on POWER rather than RISC-V). It'll be interesting to see how it comes out; I think he's more enamored of the superscalar approach than the instruction set.


Here's an example of the non-orthogonality. Here's a RISC-V loop that looks for a mismatch between two arrays of words (untested):

    loop:  lw   a0, (a3)      # load array item from s1
           lw   a1, (a4)      # and from s2
           bne  a0, a1, quit  # and compare them; if they were equal,
           addi a3, a3, 4     # advance the s1 pointer
           addi a4, a4, 4     # and the s2 pointer
           bne  a4, a5, loop  # and repeat if we haven’t reached the end
    quit:
That's 128 bits with C, and if the arrays are long, benefits hugely from even a fairly small instruction cache. I thought this would be more compact on the 6600 because incrementing the pointers also fetches from them, but this is the best I could come up with in my pidgin ASCENT:

    123456789 123456789 123456789 123456789 123456789 123456789 123456789 12
     LOOP     SA1  A1+1        .INCREMENT A1 BY 1 WORD, LOAD X1
              SA2  A2+1        .LOAD X2
              IX1  X1-X2       .COMPARE VALUES LOADED, LEAVING DIFF IN X1
              NZ   X1 QUIT     .LEAVE LOOP IF X1 IS NOT ZERO
              SB1  A1-B2       .SUBTRACT LIMIT VALUE IN B2 FROM A1 POINTER
              NE   B1 B0 LOOP  .IF DIFFERENCE /= 0 (B0 IS 0) REPEAT
     QUIT
I'm not sure this is actually correct (I think NZ does a floating-point zero test and IX1 computes a 60-bit integer subtraction) but even if it is, it's a disappointing result. It's still six instructions! And now they take up 180 bits instead of 128! (But that's probably mostly because the immediate arguments 1, 1, QUIT, and LOOP are all 18 bits; in RVC they're 8, 8, 11, and 11, and the jump offsets are PC-relative. More a question of RVC than orthogonality.)

I was thinking I could just use NE, just like RISC-V uses bne, but noo. NE is only for increment registers like B₁ and B₀! You can't use NE to compare A₁ and B₂, or A₁ and A₀, or X₁ and X₀. So the non-orthogonality of the register set stole back all the expressiveness I was hoping for from it.


I'd include the PDP-8 and successors/cloners like the DG NOVA


The PDP-8 is certainly simple, but it has microcode, doesn't have pipelined execution, never approaches one instruction per clock, has memory operands to virtually every instruction (including an indirection bit, a mortal sin in RISC), and has a tiny register set. You could say its zero page is like a "large orthogonal general-purpose register set" but ⓐ all your ALU instructions have to involve a register which isn't in that set (the single accumulator) and ⓑ the current PC page is just as accessible as the zero page. All it has in common with RISC is the fixed-width instruction set and a (very) small number of instruction formats.

So I would say the PDP-8 is the opposite extreme from RISC.

If you like the PDP-8 you should check out the 32-bit version: https://github.com/johnwcowan/pdp8x

The NOVA is maybe a little more plausible, since it at least has a load-store architecture and four accumulators instead of one, but it still doesn't feel very RISCy to me. I mean it's maybe less RISCy than the 80386?


Correction: although the PDP-8 has "microcoded" instructions, the "microcode" is actually in the instruction. There isn't a microcode store that runs a little program to implement each instruction, as there is on, for example, the 6502.


yeah you see I think that's the riscy-ish thing of it all, and why I nominated it: having bits in the instructions wired directly to gates in the ALU without any decode is RISC at its essence


In some ways. Maybe it's more like a short VLIW :-)

Either way, it's not conducive to having an Instruction Set Architecture with varying implementations.


I think that's fair - but then jump delay slots used in a lot of early riscs, including the 801 are very VLIWy too (the one I worked on could have up to 3 instructions in a delay slot)


I'd never thought of it that way!


> Moreover, in March 1982 there was a very important symposium where all the teams working on RISC CPUs at that time presented their work, including the Berkeley RISC, Stanford MIPS and IBM 801 projects.

> The presentations from that symposium led the designers of the future ARM CPU to change their architecture from some kind of 6502 extension to a RISC architecture.

That is absolutely fascinating. Are the papers from that conference available online?


The presentations from that symposium were published by ACM:

ASPLOS-I Proceedings of the Symposium on Architectural Support for Programming Languages and Operating Systems, March 1-3, 1982, Palo Alto, California. ACM Press, 1982, SIGARCH Computer Architecture News 10(2), SIGPLAN Notices 17(4)

The IBM 801 was presented by George Radin: "The 801 Minicomputer", pp. 39-47.

You can freely download all the presentations from:

https://dl.acm.org/doi/proceedings/10.1145/800050

There is an interview somewhere online with one of the ARM designers, I do not remember whether with Steve Furber or with Sophie Wilson, where they mention how they learned about RISC from this symposium.


Wow, that’s a goldmine of interesting papers.


Thanks, this is great! There's also a paper on the disastrous iAPX432 and a bunch of other interesting stuff.


Funnily enough, I worked for IBM marketing (my first job, for a short while) in 1984.

One of our jobs was to demo System/36 - one of IBM's minis - which might have been replaced by an 801 system had they taken it up.

One of my colleagues went through the demo with a room of customers. At the end he asked for questions. Straight away a hand went up.

"If it's this slow with one user, how slow is it with five?"

Sometimes engineering does know best!


> the rest of the world only knows about the history of RISC-V" which is the only RISC we learned about in technical school in the late 90's.

This surprises me, especially since RISC-V wasn't really a thing until around 2010. Not MIPS, SPARC, PowerPC, PA-RISC, ARM, all extant by this period and having a longer history, but RISC-V? What school was this?


I suspect the comment author typo'd and meant to write either MIPS or (more likely) Berkeley RISC.


It's technically correct, because the "V" in RISC-V refers to it being the 5th project from UC Berkeley (after RISC-I, RISC-II, and tenuously naming SOAR as RISC-III, and either SPUR or VLSI-BAM as RISC-IV).


It's surprising that you learned about RISC-V in technical school in the late 90s, since the RISC-V project began in 02010 at Berkeley. You should see if you can get in touch with your instructor and help him fix his time machine.


Perhaps parent meant MIPS.


Perhaps, but it would be a lot more interesting if it turned out his instructor did know about RISC-V in the 90s!


The RISC-II thesis from 1983 references the 801, but it looks like IBM only started talking about it at around this time. The RISC-V project didn't start until 2010.


There is a funny quote from David Patterson in his Oral History

> There was the 801 project was secretive. There is something called the 801, “What is it?” It was very hard to know even what they were doing. It was supposed to be very exciting, very breakthrough, but unless John Cocke came and told you what they were doing, you couldn't figure out anything about it. So, there was that kind of the mystery of the 801.

I think word did get out in the late 1970s but not officially!


One of the references in the Katevenis thesis is "J. Cocke: Informal discussion on the IBM 801 mini-computer, U.C.Berkeley Campus, June 1983".


From memory I think Cocke lectured at Berkeley (and maybe at Stanford too?)


The "SC" in RISC stood for Seymour Cray, went one joke in his memory that you can find on YT.

https://en.wikipedia.org/wiki/Seymour_Cray


>in my head I was like "best keep that news to yourself pop, the rest of the world only knows about the history of RISC-V" which is the only RISC we learned about in technical school in the late 90's

That must have been quite some trick, since RISC-V was created only in the 2010s!


The early RISC people gave a lot of credit to Cray.


Interesting: the IBM ROMP https://en.wikipedia.org/wiki/IBM_ROMP was a spin-off from the 801 project, was the CPU used in the IBM RT range, and saw the birth of AIX.


For those who haven’t heard of the RT, Bitsavers has a large hoard of docs. [0]

I was at CMU when the Andrew project got early-access PC RTs and it was crazy secretive due to IBM’s requirements. Special badge access, NDAs, machines chained to desks…very weird in a university environment. So it’s not surprising there isn’t a ton of public documentation of the 801.

[0] http://www.bitsavers.org/pdf/ibm/pc/rt/


Thank you so much for posting! I have an old Byte magazine issue with an article about the PC RT and I’d been searching for more information about it! You just made my day m8!


It came out around the time of the XT and AT, the XT being eXtended Technology, the AT being Advanced Technology, both x86 PCs in effect. The RT was RISC Technology and the birth of AIX, which I worked on back then. It was used in business, and I worked in reinsurance, where it was used. Eventually it got replaced by the RS/6000 line, which was the birth of the POWER CPU, which we still have today.

One thing though: 5 1/4 in floppy OS installations on the RT were a real PITA, more so if you found a later disc was corrupted, got a new one from the supplier, only to find their master copy was corrupted too, so you just got another corrupted disc. A real chocolate-teapot-waiting-for-the-kettle-to-boil moment, that.

They did well in finance, as they had good communication interfaces; things like SNA were a seller for linking to their mainframes and AS/400 systems, and then there was good serial handling via multiple ports.

The IBM RS/6000s came out around the time of the PS/2 phase in IBM's life, for a timeline comparison.

AIX 2 was also a rewrite, as IIRC AIX 1 was written in Prolog! Something like that. Back then, for comparison, there was no Sun; you had VAX, and the main alternative UNIX box of that time I encountered was the NEC Tower. Sorry, not many more details, but it was another common Unix platform of the time, though a little later on. Then around the RS/6000 time you had HP with its HP-UX come about, and then Sun. Interesting times, but darn, nobody ever wants to have to worry about termcap or terminfo, let alone link two different systems up for UUCP.

Alas, no old documents left from that era at my end; some later RS/6000 stuff, but buried.

BYTE was great, loved that; UNIX World was another good mag of that early era that I lamentably miss.


As an aside, the America's Cup anecdote is related incorrectly. "On August 22 [1851], the [U.S.-built schooner] America joined 14 British ships for a regatta around the Isle of Wight." It won by a good margin, and the Queen Victoria quote was at this race, not at a later first America's Cup race (named after the schooner and not the country).

-> https://www.history.com/this-day-in-history/u-s-wins-first-a....


Thanks! The version in the article is based on Cocke's own comments but I'll add a note to clarify.


Oh my gosh. Talk about a flashback. When I heard the bump music following the return from the commercial break my brain went into a time-warp. I remember watching Computer Chronicles. And that RISC IBM desktop. $10k with all of 4Megs for those hefty scientific jobs. Woo-hoo!


A few years ago I dedicated a couple of days to finding any evidence of documentation for the 801 beyond a small number of well known academic papers, but found none. It really bothers me, probably to an unreasonable degree, how common it is for corporations to neglect such an easy win. IBM is really bad about it, especially considering how long a history they have - just look at how anemic their public facing corporate archives page is. Yes, they had a long running tradition of publishing very low level papers in publicly accessible journals, a huge number of very important papers... but when they killed off their internal journals that served as the fountainhead for the effort - their chosen steward quickly locked it all behind a paywall. Young engineers WANT to revere their elders while bringing forward the state of the art, and there have been some good efforts made in the collection of narratives, but engineers need work product (not English lit) in order to do the ancients justice. There aren't enough articles like this, so my thanks to the author - well done... but his job could easily be made so much less difficult by corporations taking advantage of something with such a long tail of public good will: preserving and then releasing obsolete work product.


Much of IBM's internal informal history was lost in the transition from the RSCS-based TOOLSRUN forums to a mix of Lotus Notes/Domino/web forums around 1999–2000 (note that this was a business choice; all internal fora were already available via NNTP). IBM Research might have additional documentation on the 801 and related products, but it looks like you'd have to contact their research library directly; nothing online.


I wouldn't even dare to dream of version-controlled source... I was over the moon when roughly dated personal backups of Research Edition Unix started popping up. From what I've seen in IBM documentation and a small number of code leaks, it would actually be very difficult to faithfully present their historical code, simply due to the Rube Goldberg SCM they've employed. It was so wild that it permeated into their XCOFF executable format.


Depends on the product lab. The o/s I worked on had some form of SCM dating to the 1970s (CLEAR). The thing is, most of the source code from the 1960s and even 1970s would have been punch cards, shifting to tape in the 70s. There are probably landfills near Austin, Rochester, and Westchester county that have punch cards and tape libraries, since I doubt IBM held onto them. At some point IBM shifted from the belief that maintaining history was important to the business to believing anything older than 90 days was too dangerous (from a litigation perspective) to hold onto.


From IBM management's point of view, I think, the easy win was that even though they discovered RISC in 01974 and shipped it in ROMP in 01981, their competition didn't ship RISC until 01985, in the form of the janky MIPS R2000; ARM 2 and Fujitsu's SPARC MB86900 didn't ship until 01986.

Frank T. Cary and John R. Opel, who led IBM at the time, couldn't care less who young engineers in 02022 revere. Not only are they both dead, but also even at the time their objective was making a lot of money, not fostering engineering excellence, or garnering admiration for their minions.


I won't pretend to have enough familiarity with the goings-on in long-past IBM management to be able to state with any authority what they may or may not have thought, but I'd be surprised if your single-minded characterization was close to accurate. If you read through the very long list of IBM-published papers, you'll see that a lot of them covered novel solutions to real problems of the day that IBM was selling solutions for, in enough detail to save potential competitors a lot of time. That is a calculated risk that they've taken for a long time (until pretty recently). It isn't difficult making the business case for what I've proposed... rumor has it that even Intel has recognized how badly they screwed themselves over by treating senior engineers as poorly as they did in the late 90s.

The whole imperative to recruit motivated talent isn't a new thing that started with Google's masseuses and sous-chefs, R&D heavy industries have been mindful of the advertising value in publishing for a long time.


Oh, I absolutely believe that IBM researchers wanted to share their findings, and that IBM's management was to some extent willing to tolerate them doing so. But I don't think IBM's management (post-Watson at least) saw that as a benefit of having researchers, but rather a necessary cost, one they successfully curtailed during the 01970s. And I don't think IBM's management cared one way or the other what engineers outside IBM thought of it.

But none of them were my personal friends, so I might be inferring wrongly from their behavior. Their behavior was pretty egregious, though!


Rereading this again, I think I may have omitted one interesting point: that the 801 was, I think, one of the first computers with a split L1 cache.

Will add a footnote in the absence of evidence to the contrary!


I think it may have been the first microprocessor with an L1 cache of any kind. I can't find any earlier chips with on-die cache, anyway. Even the instruction prefetch queues of the 8086 and 68000 were still in the future when the IBM 801 went into manufacture.

Another consequence of the RISC philosophy. After discarding all that die area for control logic, there's still some space even in a tight transistor budget to implement a cache. And since it's a clean, new architecture, you can impose restrictions on things like self-modifying code that would otherwise have made splitting the I/D caches more difficult. Cache coherency is for the compiler to worry about.


To be fair, the 801 wasn't a microprocessor, and its L1 wasn't on die (because its CPU wasn't a single die to begin with). It was implemented in MECL 10K, which is at the sameish level of die integration as TTL, just a decent amount faster.


And there were earlier processors that had cache; WP cites the IBM 360/85 (01969) and the Atlas 2 (01964–01966) as the first computers with CPU cache.

I'm curious when the first microprocessor with cache was, since the 801 wasn't a microprocessor; the article mentions the 68040 from 01990 with a split on-die L1 cache (4K+4K), but the 68030 in 01987 had 256B+256B, and the 68020 in 01984 had 256B icache. The 68010 had a 6-byte instruction "cache" which allowed it to execute two-instruction loops without fetching instructions, but I don't think that counts.

I think maybe the RISC II (01983) or the RISC I (01982) might have had a cache before the 68020. Katevenis's dissertation is at https://archive.org/details/reducedinstructi0000kate but it isn't open access and I haven't read it. It had the overlapping register windows later popularized by SPARC, though, and we often think of a register window stack as a simpler (but less effective) alternative to a dcache.

According to https://en.wikipedia.org/wiki/Transistor_count neither the ARM 2 (01986) nor the RISC-II-descended Fujitsu SPARC MB86900 (01986) had a cache.


Huh, that's a great question. I'm certainly struggling to find something earlier than the 68020 as well.

> We decided that RISC I should not be burdened with the design of a full-blown on-chip cache, but an instruction cache would definitely be a good idea for the next-generation RISC.[0]

So it seems RISC-I didn't.

Additionally, a die shot [1] and an understanding of the architecture of RISC-II (doubling down on the big windowed register file as a potential substitute for an on-die D cache) appear to imply that it doesn't have any on-die cache, I or D. I certainly don't see any arrays of SRAM other than the register file.

[0] - https://people.eecs.berkeley.edu/~kubitron/courses/cs252-F00...

[1]- https://people.eecs.berkeley.edu/~pattrsn/Arch/RISC2.jpg


Thank you! I wasn't confident in understanding the die shots.

With 40 years of hindsight I think we can probably conclude that caches are a better use of excess on-die transistors than register windows are? Even though register windows theoretically reduce the cost of subroutine calls and thus of good factoring.

I feel like even a fairly small instruction cache, or explicitly-used instruction scratch memory, could have produced big payoffs in reducing instruction fetch cost, especially for RISC designs without hardware indirection? Like, if your compiler emitted explicit code at the beginning of every short leaf function to load it into instruction scratch memory, then run the scratch memory, instead of running it from RAM? At least if it contained a backward branch?


The Katevenis thesis describes another group at Berkeley designing an external I-cache chip to work with the RISC-II CPU; there are also a few pages analyzing how much difference a D-cache would make. It doesn't look to me like RISC-II had either.


Thanks!


The first ARM with a cache was ARM3 (1989, 4kiB)


Thanks, Tony! That's what I thought.

Given Sophie Wilson's concern at the time over the importance of memory bandwidth as a limiting factor in system efficiency, I wonder why they didn't at least include a 68010-like "loop mode", so you could eliminate the cost of instruction fetch in inner loops? My familiarity with ARM is not very good and so I'm not even sure if you can use LDM/STM to optimize memcpy but I'm sure it doesn't help with things like the FFT or string search.


Yes, LDM / STM were great for memcpy-like things (eg graphics). For other loops the usual technique was to use predicated instructions to avoid pipeline flushes, and loop unrolling.
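
As a sketch of the predication idiom (the textbook Euclid's GCD example, not anything specific to Acorn's code; r0 and r1 are assumed to hold the two values):

    gcd     CMP     r0, r1          ; compare the two values, setting the flags once
            SUBGT   r0, r0, r1      ; executed only if r0 > r1; does not touch the flags
            SUBLT   r1, r1, r0      ; executed only if r0 < r1; does not touch the flags
            BNE     gcd             ; branch on the CMP flags: loop until r0 == r1

The two conditional subtracts replace what would otherwise be short forward branches, so the only control transfer that can flush the pipeline is the loop branch itself.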

You would have to ask Sophie Wilson or Steve Furber why there is no loop cache; it’s an interesting question. I guess there simply wasn’t the transistor or complexity budget for it.


Off topic: I've downvoted you about five times today, and I really ought to explain why. You have good content, and it pains me to have to downvote.

I'm downvoting you because of the absurd thing you do with your dates.

What's absurd about it?

1. It's not the standard way of writing dates. (Communication is about finding things understood the same way between speaker and listener; non-standard usage messes with that.)

2. Because of #1, it's harder to read. It looks wrong. We have to take a half-second to think about it every single time. Multiply that by several posts a day, and also by hundreds of people reading each post, and it starts to add up to enough time to matter. This particular post is worse, because of all the CPU numbers that are five digits, and your year numbers start matching the pattern in our heads for CPU numbers.

3. It's trying to grind a Long Now axe, in a post that has nothing to do with anything related (other than having a year number in it). It's like, would you find it annoying if there was a zealous Christian on here, who had to work a reference to Jesus into every single post that mentioned philosophy or sociology? You might even find it annoying enough to start downvoting every time they did it.

So, please. Stop. If you're talking only to Long Now people, use the Long Now date format. If you're trying to explain the Long Now to everyone else, explain the dates as part of that. But if you're talking to everyone else about the history of RISC, just use everyone else's date format.


Aren't your complaints the explicit goal of that date format? It's supposed to trip people up for a second to get them to think about years with a different exponent on the floating-point/scientific-notation representation our brains fall into using pretty much all of the time. And that, in this context, there is no true 'appropriate' time besides maybe an emergency situation where understanding on a hard real-time latency is paramount.

Basically, you might have more success when your reasoning for asking someone to stop doesn't also coincide with why they might be doing it in the first place.


His clever way of tripping me up annoys me. Do you think that makes me more likely to adopt his viewpoint? Or less?

I may be alone in being annoyed enough to write a rant about it, but I suspect I'm not alone in being annoyed by it, especially when he does it over and over. His stunt may be useful in drawing attention, but I suspect it's counterproductive at drawing agreement.


Compared to some other possibilities, MCMLXX or 5730 (let alone An CLXXVIII or 62 AF), I find 01970 (or even 000001970) amazingly harmless.

(there is a —very— small community which schedules working hours around ~25 hour "sols", but I doubt they reckon by 668 sol years; does anyone know?)


300 years ago people like AnimalMuppet would have attacked you for writing the date as 1722 instead of MDCCXXII, for precisely the same reasons, and been precisely as correct. Fortunately not everyone is like that.


Yeah, thanks for telling me what I would have done 300 years ago. Nice time machine you've got, to be able to tell that.

300 years ago, the advantages of 1722 over MDCCXXII were blindingly obvious. It wouldn't be what I was then used to, but it had clear advantages. Whereas today, the advantages of writing 02022 are... um... that kragen likes to do it that way.


> Whereas today, the advantages of writing 02022 are...

Moving the N-digit rollover from 8000 years from now to 98000 years from now, which... isn't.


> Off topic: I've downvoted you about five times today, and I really ought to explain why. You have good content

You don't, so I don't see the relevance of your opinion about how I ought to write good content.

Write things your way, and I'll write them my way.

At its best, this site is for writing good content, not spelling flames and personal attacks on nonconformists for being weird. Nonconformists being weird is why we have computers and the internet in the first place.


Out of curiosity, why do you write years that way?


When I see a number with a leading zero, I think “octal”. Perhaps octal years are A Thing?


The basic architecture diagram for the 801 is fascinating. In one sense it’s boring, because to a first approximation every modern computer is similar. On the other hand, it was the prototype for at least the subsequent 50 years of general purpose computing.

From a high enough altitude, the only novelty in modern platforms is the GPU.


Absolutely. The familiarity makes it hard to appreciate the novelty.

I've looked at the architecture of some of IBM's minis of the era (and some of the more ambitious microprocessors - e.g. the iAPX432) and they just look so strange in comparison.


Tangentially related to the article: what happened to the RISC-V Beagle board? Is that still going to be available for purchase?


The chip it was going to use (the JH7100) never went into production and they had to cancel the project.

There's a Kickstarter for the newer JH7110 chip.

https://www.kickstarter.com/projects/starfive/visionfive-2


> The chip it was going to use (the JH7100) never went into production and they had to cancel the project.

The Beagle "Starlight" project that was supposed to have shipped in September 2021 was cancelled, but the JH7100 chip shipped on a very similar board as the "VisionFive v1" in December 2021 and is still available today. Though you'd be crazy to buy it for $180 when the much better VisionFive 2 (with, as you say, the JH7110) is now available for $45-$75 depending on RAM.

It's probably correct to say the JH7100 never went to MASS production, and the VisionFive V1 is probably using MPW chips.


Additionally, Pine64 is putting out a board with the same JH7110 chip.



