Hacker News

Wait, am I understanding you correctly? It used _main memory_ as registers? I feel like I must be misunderstanding this, because that seems absurd even for 1978.



> It used _main memory_ as registers?

Yes, that's exactly how it worked. The 9900 only had one internal register, which pointed at the current "register bank" in main memory. I worked at TI in those days and wrote code for the 9900. It wasn't a crazy idea when the chip was designed; after all it made context switches completely free. But after the chip went into production, the speed differences between CPUs and DRAM started becoming obvious.
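The workspace-pointer scheme described above is easy to sketch. Here's a toy Python model (illustrative only, not a cycle-accurate emulation; the memory size and addresses are made up): the sixteen "registers" R0-R15 are just consecutive words of main memory located by a single on-chip pointer, so a context switch is one register write.

```python
# Toy model of the TMS9900's workspace-pointer scheme: the only
# register-file state on the chip is the Workspace Pointer (WP),
# and "registers" R0-R15 live in main memory at WP+0 .. WP+15.

class TMS9900Sketch:
    def __init__(self, memory_words=1024):
        self.mem = [0] * memory_words  # main memory, one int per 16-bit word
        self.wp = 0                    # workspace pointer (word-addressed here)

    def read_reg(self, n):
        # Register n is just the memory word at WP + n.
        return self.mem[self.wp + n]

    def write_reg(self, n, value):
        self.mem[self.wp + n] = value & 0xFFFF

    def context_switch(self, new_wp):
        # A full context switch is one pointer update: no sixteen
        # loads/stores, because the old registers already live in memory.
        self.wp = new_wp

cpu = TMS9900Sketch()
cpu.write_reg(0, 0x1234)   # task A's R0, stored at memory word 0
cpu.context_switch(16)     # switch to task B's workspace
cpu.write_reg(0, 0xBEEF)   # task B's R0, stored at memory word 16
cpu.context_switch(0)      # switch back: task A's R0 is untouched
```

The flip side, of course, is that every ordinary register access in this model is a memory access, which is exactly the problem once DRAM stopped keeping up with the CPU.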


> because that seems absurd even for 1978.

At that time memory could be nearly the same speed as, or even faster than, the CPU; in fact the CISCs that were popular left the memory bus mostly idle while they executed instructions internally, which is what let the relatively memory-bandwidth-hungry RISCs become viable.

In fact I'd almost bet that, had memory always been slower than the CPU, RISC would've never been invented.


Case in point:

https://xania.org/201405/jsbeeb-getting-the-timings-right-cp...:

"So far so good - it seems unusual to our modern “memory is slow” mindset that the processor touches RAM every cycle, but this is from an age where processors and RAM were clocked at the same speed."

This page also shows that the 6502 happily did extra memory reads.

In fact, the memory had bandwidth to spare. http://www.6502.org/users/andre/osa/oa1.html#hw-io:

"The memory access is twice as fast as the CPU access, so that during Phi2 low the video readout is done and at Phi2 high the usual CPU access takes place"

I remember reading about a setup with two 6502s both running from the same single-ported RAM at full speed, but cannot find it.


> I remember reading about a setup with two 6502s both running from the same single-ported RAM at full speed, but cannot find it.

Commodore 8050


Sure, it's a decent idea. You can even make it fast if you cheat.

I believe that the PDP-10 (well, some versions) had the first few memory locations equivalent to registers.

The AT&T Hobbit (aka CRISP) chip had a stack pointer, and essentially cached memory around the stack aggressively. Once cached, stack-relative memory operations were as fast as registers. (The Apple Newton was going to use the Hobbit, but switched to ARM when it became clear that AT&T wasn't truly interested in committing to consumer-grade pricing of the CPUs.)


> I believe that the PDP-10 (well, some versions) had the first few memory locations equivalent to registers.

Essentially. Actually the registers were addressable as the first 16 addresses in memory for all models. For the PDP-6 and first PDP-10 (model KA) the registers were fast semiconductor devices (DTL, as I believe it predated TTL) while the rest of memory was literally core (convenient for when the power went out, as happened occasionally in Cambridge -- whatever process was running died since the registers were lost, but everything else was in core, so the machine could just be restarted).

Since they were addressable you could run code out of them, like bootstrap routines (or some deranged TECO code I once wrote). On the other hand any word of memory could be used as a stack pointer (two addresses fit in a word, so one half was the base and the other half the depth).

It was quite a RISC-like, highly symmetrical architecture for its time and a pleasure to program. I still miss it.


Huh! The first BeBoxes were built around the Hobbit as well. The switch to PowerPC happened when it was clear that its performance and addressing were always going to be weird. Not sure it even got to the point of discussing volume.


There was a PC modem board that had the Hobbit in it. I can't seem to find any references to it online, though.


Kind of a performance limiter if you actually do it. There are architectures where the programmers' reference manual is written as if registers are in main memory, but the CPU brings the active registers into physical registers and fakes it. In olden days (of water chillers) I worked on CPUs like that. It essentially requires creating a look-aside buffer so that memory addresses that map to registers in the current context get redirected to the physical registers, and you have to watch out for order dependencies.

So not totally crazy, but a severe performance limiter unless you can afford the complexity of the standard trickery.

TI did make a personal computer that competed with the likes of the VIC-20. It was sloooooow, but had nice (for the time) color graphics.

(edit typos)


When I was in college, a 6800-derived embedded-systems prototype board had 'really fast' in-package memory used for such a setup.

In the context of something vaguely like an SoC, where you can make "0 page" memory registers fast with a small bank of high-speed SRAM, it can make sense, particularly for decoupling manufacturing defects or silicon production processes.

Of course, for modern pipelined instruction systems, potentially out-of-order and speculatively branch-predicting, this is a horrid idea.


Well, actually, once you buy into everything you need for O-O-O execution with synchronous exceptions, a lot of stuff that seems difficult at first glance becomes cheap because you can build on the existing O-O-O infrastructure. Anytime you can belly-flop onto the reorder buffer scoreboard the hard stuff just falls out.


Damn, and I thought x86 was a bad architectural decision....


The PDP-10's register set occupied the first 16 memory address locations, although they were not implemented as memory.

This made instructions simpler, as register instructions did not need a distinct addressing mode.


Yes, a lot of machines of that era did. The Univac 1100 series did, and I believe the IBM 709 and 7090 machines did as well. The low-performance machines used actual memory; the higher-performance machines had backing registers in the CPU.


The UNIVAC 1100 machines didn't put registers in memory; they just allowed programs to reference them via memory addresses. This removed the need for register-to-register instructions.


The 1100/10 used main memory (plated wire) for the registers. The 1101, 1102, and 1108 may have as well, but I can't say for certain. The 1100/80 definitely used registers in the CPU and redirected matching memory addresses to the registers.


That is what the two above postings said, too. Or at least I understood them to say that.


I owned a TI99/4a as a kid and at the time I was interested in the TMS9900 architecture.

I always thought the workspace-pointer-to-register-set could make for some easy multitasking context switches. You just change the workspace pointer and immediately you're working in another context.

In practice it was slow though compared to processors with real registers.


I had one too. The really shocking bottleneck though was this: just 256 bytes of RAM were CPU-addressable; the 16k bytes it was advertised to come with were video RAM. If you bought the "mini memory" 4k expansion and coded in assembler, the speed seemed competitive with other home computers around then. Apparently it was coding in BASIC through the video memory bottleneck (and iirc an extra level of interpretation? I think I read somewhere that their BASIC interpreter was written in an interpreted VM code) that made your programs so slow on the TI-99.

(Added: https://en.wikipedia.org/wiki/Texas_Instruments_TI-99/4A#VDP...)


Correct, but you had the option to use fast (external) SRAM for the area which held the registers. Even so, the 64K limit was the bigger problem.

Actually the 8088 had a huge advantage: it was easy to port CP/M apps to the IBM PC. Even if the 68K had been ready to go, the 8088 was probably the better choice.


One of the things I remember reading in the late '80s was that for IBM the 68000 was a no-go because it would have doubled the number of DRAMs needed for a bare-bones system. And Motorola didn't have the 68008 ready yet. Important, because back then memory was a huge part of the cost of the machine.


By "bare bones" system do you mean 64kb? The original Mac had a 68k and only 128kb of RAM. I guess we are talking about the late 70s here though. The Commodore 64 didn't come out until 82 and it got along just fine with 64kb.

IIRC the 68k was an expensive chip period. It was the chip you used if you had money to burn.


The 68000 has a 16-bit bus but the 8088 is only 8 bit. I don't know about today, but RAM chips of the time supplied 1 bit per address, with the capacity being a square (since column and row are addressed using the same pins). So if you need twice as many bits per cycle, your options are basically to have twice as many chips (of a quarter the capacity each... probably cost-effective, but now your system has half the RAM), or twice as many chips (of the same capacity each... and now it's twice the cost).

Fitting twice as many chips on the board is probably a pain too. (And suppose you go for the 2 x quarter capacity option - now you need 4x as many if you want the same amount of RAM!)
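The trade-off above can be put in rough numbers. This Python sketch assumes the 16K×1-bit DRAMs typical of the era (illustrative figures, not a bill of materials):

```python
# Chip-count arithmetic for the bus-width trade-off: each DRAM supplies
# one bit of the bus, so a bank is bus_bits chips wide, and banks are
# stacked until the capacity target is reached.

def chips_needed(ram_bytes, bus_bits, chip_kbits=16):
    chip_bytes = chip_kbits * 1024 // 8      # bytes each 16K×1 chip holds
    bank_bytes = chip_bytes * bus_bits       # capacity of one full-width bank
    banks = -(-ram_bytes // bank_bytes)      # ceiling division
    return banks * bus_bits

# A bare-bones 16 KB system: the 16-bit bus doubles the chip count,
# because the minimum configuration is one full-width bank.
print(chips_needed(16 * 1024, 8))   # 8 chips  (8088-style 8-bit bus)
print(chips_needed(16 * 1024, 16))  # 16 chips (68000-style 16-bit bus)
```

This is exactly the "doubled the number of DRAMs for a bare-bones system" point from the grandparent: at larger capacities the counts converge, but at the entry-level configuration the wide bus is pure extra cost.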


(My memory is fuzzy)

I think around 1979-1980 that an Apple II with 16k was like $800. One with 48k was $1900. And they weren't passing any markup on the memory. Much different than today where the cost of DRAM is much smaller fraction of the cost.


I remember finding a scan of an invoice from the late 1960s for a Univac mainframe somewhere on the Internet. The total invoice was ~1.5 million dollars, the CPU costing ~500k, the RAM (~768 KB) costing 800k, and the rest ~200k.


Obvious reversal: the 8086 has a 16 bit bus, and the 68008 only 8.

There is a slower kid in every family. :)

Note that both the 8086 (1978) and 68000 (1979) were introduced ahead of, respectively, the 8088 (1979) and 68008 (1982). Basically these 8-bitsters were probably kind of a cost reduction following a familiar pattern in the hardware industry: product catches on, then customers want to put it into more and more things that are cheaper and cheaper, with simpler boards, where big MIPS aren't needed.


As the article points out at one point, at least some part of the reason to make the 8088 were existing peripheral devices that were not compatible with the 8086 I/O-wise.

(Those reasons are not mutually exclusive, of course.)


It was pretty easy to use 8080 peripheral chips with the 8086 and some very few clones did just that. IBM itself had to deal with the problem on the PC AT which had the same i/o chips but a 286 processor. It needed to replicate externally the circuit that the 8088 had internally due to the need to be compatible with the old 8 bit cards as well as the new ISA ones.

The 68000 would have been more of a problem since it moved from the matched memory and clock cycle scheme of the 6800 to a four clock cycle scheme with a complicated handshake. A special memory mode and two extra pins made it talk just fine to the 8 bit i/o chips. There was no need to wait for the 68008 for that.

One huge mistake that was made in the 8088 and 68008 (and I suppose the TMS9980 as well, though I haven't checked) was that they didn't have a simple way to take advantage of page-mode access in DRAMs like the original ARM did. If they had, the gap in performance compared to the 16-bit bus models would have been smaller.
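A rough model of what page mode buys on sequential fetches (the cycle numbers below are illustrative placeholders, not from any datasheet):

```python
# Fast-page-mode sketch: a full random DRAM access pays both the row
# strobe (RAS) and column strobe (CAS), but once a row is open,
# further accesses within that row pay only the CAS portion.

RAS_CAS_CYCLES = 4   # full random access (row + column)  -- made-up figure
CAS_ONLY_CYCLES = 2  # page-mode access within an open row -- made-up figure

def fetch_cycles(n_bytes, page_mode):
    if not page_mode:
        return n_bytes * RAS_CAS_CYCLES
    # First byte opens the row; the remaining sequential bytes in the
    # same row are CAS-only.
    return RAS_CAS_CYCLES + (n_bytes - 1) * CAS_ONLY_CYCLES

# Fetching 4 sequential bytes over an 8-bit bus:
print(fetch_cycles(4, page_mode=False))  # 16 cycles
print(fetch_cycles(4, page_mode=True))   # 10 cycles
```

Since an 8-bit-bus CPU fetches mostly sequential instruction bytes, this is where a page-mode-aware bus interface would have clawed back much of the narrow bus's penalty.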


And the 1979-1982 period is exactly when the original PC was released, hence the problem!


The base IBM PC (in 1981) came with 16KB of memory.

The Mac came out three years later.


It's hard to imagine getting any actual work done on a machine with 16KB of main memory and no memory manager. A single page of text is 2k, and you can't go too fancy with mappers and memory paging schemes because the code would be too complicated to fit in memory...

One can see reasons for C's design tradeoffs if you're worried about machines like that.

But even after you abuse every trick in the book it's hard to see how that machine isn't hobbled by its lack of memory.


> only 128kb of RAM

I remember reading the first IBM PC came with 16k. A former colleague of mine once reminisced about programming on a PC in the late 1980s that only had 256k of RAM (although I think that must have been an old or low-end machine).


> 128kb of RAM.

Probably sixteen 64K×1 DRAMs.

> IIRC the 68k was an expensive chip period.

Fuzzy memory, but the 68000 was a 64-pin ceramic package. I remember comments that the IC testers of the day didn't have enough I/O to test them. That upped the cost as well.


The 6502 does something similar with its use of the "zero page"; remember, back then the disparity between CPU and RAM speeds was a lot less than it is today.


Yes, but INX is 2 cycles while the zero page versions INC $55 and INC $55,X are 5 and 6 cycles, respectively. The internal regs are 2.5x to 3x the speed of external memory. So having no internal regs would just be a disaster.
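For comparison's sake, those quoted cycle counts as a little Python table:

```python
# 6502 cycle counts from the comment above: incrementing the internal
# X register vs incrementing a zero-page "register".

cycles = {
    "INX":       2,  # increment internal X register
    "INC $55":   5,  # increment a zero-page byte
    "INC $55,X": 6,  # increment a zero-page byte, indexed by X
}

for insn, c in cycles.items():
    print(f"{insn:10s} {c} cycles ({c / cycles['INX']:.1f}x the cost of INX)")
```

Which is where the 2.5x-3x figure comes from: the zero page is a handy register extension, but it's no substitute for on-chip registers.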


The 6502 'zero page' was pretty much the same concept.

https://en.wikipedia.org/wiki/Zero_page


Only in a vague sense. The 256-byte window can't be relocated. Most instructions (boolean and ALU) only leave results in the accumulator, not RAM. (Inc/dec are an exception, but very slow.) Internal registers are used to index the zero page because it can't index itself. Direct, arbitrary access to the 16-bit address space is also possible without updating a memory pointer via an additional write instruction. (The zero page is basically a parallel set of instructions with eight zeros in the upper 8 bits of address.)


> The 256 byte window can't be relocated.

That's true, but that doesn't change the principle. Other processors (like the 6809 for instance) used the same model and could relocate the 'zero' page.

> Most instructions (boolean and ALU) only leave results in the accumulator, not RAM.

Yes, but that's the way this is supposed to work. A, X and Y are scratch with the real results held in 'quick access' zero page variables.

> (The zero page is basically a parallel set of instructions with eight zeros in the upper 8 bits of address).

Yes, and this was explicitly designed in such a way to offset the rather limited register file of the CPU.

In the 6809 it was called the 'direct' page, and in that form it was a lot more usable since you could do a complete context switch with a single load (which the 6809 operating system OS/9 used to good effect).


I don't know about the TMS99 series, but on the 6502 a memory read could be as short as a single cycle. The 6502 had a concept of a "zero page" which treated the first 256 bytes of memory specially, with single-cycle access. So they could be used as a kind of register.

But of course there was only one ALU, and to use it you had to use the Accumulator register.


> On the 6502 a memory read could be as short as a single cycle…

No, it couldn't. Even a NOP was 2 cycles. Memory access is at least 3 cycles for a zero-page read (read opcode, read immediate byte, read data), or more for the more complicated addressing modes.


The 6502 definitely reads or writes every cycle! - consult the data sheet (or VICE's 64doc) for more info. This is also not hard to verify on common 6502-based hardware.

Suppose it executes a zero page read instruction. It reads the instruction on the first cycle, the operand address on the second cycle, and the operand itself on the third. 1 byte per cycle.

(For a NOP, it reads the instruction the first cycle, fetches the next byte on the second cycle, then ignores the byte it just fetched. I think this is because the logic is always 1 cycle behind the next memory access, so by the time it realises the instruction is 1 byte it's already committed to reading the next byte anyway and the best it can do is just not increment the program counter.)
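To make that per-cycle bus activity concrete, here's the zero-page load laid out as a trace (a Python sketch; the program-counter address is an arbitrary example):

```python
# Cycle-by-cycle bus activity for a 6502 zero-page load (LDA $55),
# per the description above: one memory read every cycle.

def lda_zp_trace(pc, zp_addr):
    # Returns (cycle, address read, description) tuples for LDA zero page.
    return [
        (1, pc,      "fetch opcode (LDA zero page)"),
        (2, pc + 1,  "fetch operand byte (the zero-page address)"),
        (3, zp_addr, "read the operand from the zero page"),
    ]

for cycle, addr, what in lda_zp_trace(0x0200, 0x55):
    print(f"cycle {cycle}: read ${addr:04X}  {what}")
```

Three cycles, three reads: the bus is busy every single cycle, which is the 1-byte-per-cycle point made above.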


Oh, it certainly accesses memory every cycle. But the effective "efficiency" of the memory access instructions is much lower -- if you were writing something like a memcpy(), for instance, they wouldn't be contributing to its transfer speed.


The Renesas 740 (M740) 6502 variant had a T processor status bit which would cause certain instructions to operate on $00,X instead of the accumulator.


If I'm not totally mistaken, similar concepts are still applied today. IIRC POWER 8 offloads some of its registers to L1 when running at higher SMT modes like SMT 8.


It might have been reasonable for an older minicomputer design, which would probably run memory at the same speed as the CPU, and would only read or write one register per cycle. For anything larger, though, it'd certainly be catastrophic.


My thought was that these early 16-bit machines were targeted toward the baby minicomputer market. But all the growth was in the personal computer market. In that market speed wasn't that big of a deal. However, users were rapidly bumping up against the limitations of a 64k address space. And that's why they needed 16-bit machines.


Actually, a pure 16 bit processor (like the TMS9900, MSP430, Z8000, etc) can only address 64KB (with byte addresses) or 128KB (with word addresses). Just like a pure 8 bit processor (like the Kenbak-1) can only address 256 bytes.

The solution is to either have a hybrid, such as 8 bit data and 16 bit address, or use some kind of memory management unit. So the 8088/8086 had a segmented sort of MMU built in while many 8 bit computers added external MMUs to break the 64KB barrier (MSX1 machines could have up to 128KB of RAM while the MSX2, still Z80A based, could have up to 4MB per slot).
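The arithmetic here is easy to check. A quick Python sketch, using the standard 8086 segment:offset formula (linear address = segment × 16 + offset):

```python
# Address-space arithmetic: a pure N-bit machine can name only 2**N
# locations, and the 8086's segmentation is one way out of that box.

def pure_address_space(bits, unit_bytes=1):
    # Bytes reachable by an N-bit address, given the addressable unit size.
    return (1 << bits) * unit_bytes

assert pure_address_space(16) == 64 * 1024                  # byte-addressed
assert pure_address_space(16, unit_bytes=2) == 128 * 1024   # word-addressed
assert pure_address_space(8) == 256                         # Kenbak-1 style

def linear_8086(segment, offset):
    # 8086: 20-bit physical address from two 16-bit values,
    # reaching 1 MB without any register wider than 16 bits.
    return ((segment << 4) + offset) & 0xFFFFF

print(hex(linear_8086(0xF000, 0xFFF0)))  # 0xffff0, the 8086 reset vector
```

Note the segmented scheme trades the hard 64KB wall for overlapping 64KB windows, which is why it counts as a "sort of MMU" rather than a flat bigger address space.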


I remember the Hitachi HD64180 and Zilog Z180 had a page-based MMU. 512k of memory.

And a lot of embedded 8051 designs used one of the 8 bit ports to extend the address space from 16 to 24 bits. I think both common C compilers for the 8051 supported that memory model.

Also, if I remember the 68000 correctly, indirect addressing was 16-bit register + 32-bit constant. Definitely not a 'pure' memory model. Though the 808x was far uglier.


I think you're remembering the 68000 backwards. All of the registers were 32-bit, and some addressing modes supported a 16-bit immediate displacement.


I am remembering it backwards. Thinking back, I recall finding a compiler bug that had to do with the 16-bit offset being computed incorrectly (it overflowed) for a large data structure.


Yes, it had an instruction set similar in feel to the PDP-11, using 16 general-purpose "registers" and about the same set of addressing modes, but you could change the base address in memory of the "register" set. (There was a BLWP instruction: branch and link with workspace pointer. https://en.wikipedia.org/wiki/Texas_Instruments_TMS9900#Inst...)



