
Why did post-8008 CPUs not keep the on-chip stack idea? - theandrewbailey
https://retrocomputing.stackexchange.com/questions/15078/why-did-post-8008-cpus-not-keep-the-on-chip-stack-idea
======
olliej
SPARC CPUs do functionally have an on chip stack via register windows.

Basically they have something like 256 registers set up so the output
registers overlap the input registers of the next call frame, e.g.

    
    
        iiirrrooo
              iiirrrooo
                    iiirrrooo
    

Where i == input register, r == general purpose register, and o == output
register. When code calls another function it places the first N arguments in
the output registers. The called function reads arguments out of the input
registers.

This results in very high call and return performance until you exhaust the
available register windows, at which point you fault and the kernel has to
copy the register window stack to memory. Similarly, if on return you have
reached the top of the stack, you fault and the kernel has to copy the
windows back from memory into the register windows.
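
A toy Python sketch of that overflow/underflow behaviour, counting how often
the "kernel" would have to step in. This is a simplified model, not SPARC's
actual mechanism: the class name `WindowFile`, the default of 8 windows, and
the spill-one-frame-per-trap policy are all assumptions made for illustration.

```python
class WindowFile:
    """Toy model of a windowed register file; counts spill/fill traps."""

    def __init__(self, n_windows=8):
        self.n_windows = n_windows  # hypothetical number of on-chip windows
        self.resident = 1           # frames currently held on chip
        self.spilled = 0            # frames copied out to memory
        self.overflow_traps = 0
        self.underflow_traps = 0

    def call(self):
        if self.resident == self.n_windows:
            # No free window: trap and spill the oldest frame to memory.
            self.overflow_traps += 1
            self.spilled += 1
            self.resident -= 1
        self.resident += 1

    def ret(self):
        self.resident -= 1
        if self.resident == 0 and self.spilled > 0:
            # The caller's window is not on chip: trap and fill it back in.
            self.underflow_traps += 1
            self.spilled -= 1
            self.resident += 1


def run(depth, n_windows=8):
    """Call `depth` levels deep, return all the way, and report the traps."""
    wf = WindowFile(n_windows)
    for _ in range(depth):
        wf.call()
    for _ in range(depth):
        wf.ret()
    return wf.overflow_traps, wf.underflow_traps
```

With these assumptions, shallow call chains never trap, while every call
beyond the window count costs one spill on the way down and one fill on the
way back up.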

For SPARC this was apparently an ok tradeoff as the "big iron" machines of the
past generally did not recurse heavily.

~~~
compiler-guy
One of the biggest problems with such an architecture is that you cannot
predict whether a given function call will overflow the on-chip physical
registers, and therefore will be expensive. So a simple change like adding a
wrapper function, or a compiler choosing a different inline heuristic can
affect the performance of your code in strange and unpredictable ways.

Context switches also get more expensive, because you have to swap out the
entire windowed register file, and not just what is visible to software. (That
can be mitigated by additional hardware support, but the complexity is high.)

~~~
olliej
Yeah, that's why it was considered acceptable for big iron/servers of the time.

There's not a huge amount of context switching (basically few processes
relative to the number of cpus in the system), and known types of programs.

They knew that in _general_ most code they ran would not blow the stack. The
cost, of course, was that if you ran code atypical of their designed purpose,
your perf would be clobbered.

------
Animats
In modern CPUs, the last few items on the stack rarely leave the vicinity of
the CPU and its caches. The programmer's model shows them as being stored in
addressable memory, but that's part of the illusion of superscalar cached
CPUs.

~~~
userbinator
I believe newer x86 actually has a set of registers that cache the top of the
stack and is as fast as the other registers, but it's hard to find information
about this feature. One of the ways in which this is visible is that using the
push/pop instructions is faster than manually writing to memory at the stack
pointer and then adjusting the stack pointer.

~~~
MaxBarraclough
> a set of registers that cache the top of the stack and is as fast as the
> other registers

I could have sworn I'd read something like this - the 'top' words of the stack
always being kept in a very low-level cache - but I was unable to find a
source on it.

~~~
recursivecaveat
The (theoretical) MMIX architecture that Knuth's books use has a register-stack.

~~~
p_l
It's a mechanism from a few real-world RISC cpus that was used in MMIX, i.e.
register windows which form a stack (RISC cpus otherwise usually didn't have a
stack other than by convention). One CPU that used that approach was SPARC,
although the difference is that MMIX has variable-length windows.

------
weinzierl
This is not only an excellent answer but also a fantastic history of the
development of the stack.

> _"I've found no particular evidence that the stack pointer was made a full
> 16 because they felt any need for a stack to be that large. It's clear that
> at least some experienced microprocessor developers (the MOS 6502 team) felt
> that an 8 bit stack pointer (256 byte stack) was plenty. It's possible that
> the 8080 designers disagreed, or it's possible that they felt they couldn't
> force a particular area to be RAM, as the 6502 designers could. (Even more
> than the MC6800, the 6502 design strongly encouraged page $00 to be RAM, so
> forcing page $01 to be RAM was no hardship.) Or perhaps it just didn't occur
> to them that registers pointing into memory could be any less than 16
> bits."_

To add a little bit of detail to the preceding paragraph:

A page in 6502 parlance is a contiguous block of 256 bytes in address space.
Page $00 are the first 256 bytes of address space, page $01 are bytes $0000 to
$00FF and so on. Page $00 (so called zeropage) is treated specially by the
processor and is therefore required to be backed by RAM (not ROM or IO).

The 6502 has a 16-bit address bus but only an 8-bit stack pointer. The stack
was fixed at 256 bytes (a page) in size and fixed in location at $0100-$01FF
(page $01).
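
A minimal Python sketch of that arrangement: the 8-bit stack pointer supplies
only the low address byte, and the high byte is fixed at $01. The class name,
the flat 64 KiB `bytearray`, and the initial pointer value of $FF are
illustrative assumptions (real programs set the pointer themselves, typically
with `LDX #$FF` / `TXS`).

```python
STACK_PAGE = 0x0100  # the stack always lives in page $01


class Stack6502:
    """Toy model of the 6502 stack: 8-bit pointer into a fixed page."""

    def __init__(self):
        self.mem = bytearray(0x10000)  # flat 64 KiB address space (assumed)
        self.sp = 0xFF                 # assumed initial value, set by software

    def push(self, value):
        # Store at $0100 + SP, then decrement SP modulo 256.
        self.mem[STACK_PAGE | self.sp] = value & 0xFF
        self.sp = (self.sp - 1) & 0xFF

    def pull(self):
        # Increment SP modulo 256, then read from $0100 + SP.
        self.sp = (self.sp + 1) & 0xFF
        return self.mem[STACK_PAGE | self.sp]
```

Because the pointer wraps modulo 256, pushing more than 256 bytes never
leaves page $01; it just starts overwriting the oldest entries.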

So the point made above is that because the 6502 already required the zeropage
to be RAM it was easy to require the RAM for the stack in a fixed place too,
while the designers of other contemporary processors didn't have that luxury.

~~~
a1369209993
> Page $00 are the first 256 bytes of address space, page $01 are bytes $0000
> to $00FF and so on.

Er, you mean page $01 is bytes $0100 to $01FF, right? And 6502 in general
didn't use flat addressing, IIRC; eg `12FE,x` would address
12FE,12FF,1200,1201,... with increasing x register, rather than 1300,etc.

~~~
weinzierl
> Er, you mean page $01 is bytes $0100 to $01FF, right?

Yes, yes, thanks for the correction.

> And 6502 in general didn't use flat addressing, IIRC;

Not so sure about that.

> eg `12FE,x` would address 12FE,12FF,1200,1201,... with increasing x
> register, rather than 1300,etc.

This is correct but I think this is more due to the fact that the index
register wraps around, and I would still call it flat addressing. A better
argument to not call it flat would be the use of bank switching, which I think
was common in 6502 based designs. But, yeah, this is probably just splitting
hairs over terminology and I agree with you. Thanks for the corrections.

------
stormbrew
IA-64 (Itanium) had a register stack that was used for a lot of the things
memory stacks are used for[1]. It's basically a set of virtual registers into
a large register file that's then spilled out to main memory as necessary if
the stack grows large enough.

So I don't think it's an idea that's completely gone away, it's probably just
that until relatively recently the complexity involved in doing it with modern
code (with deep stacks with large stack frames) was not really a worthwhile
use of die space. And now x86 and arm are so thoroughly dominant that other
paradigms have difficulty getting traction.

[1]
[https://devblogs.microsoft.com/oldnewthing/20050421-28/?p=35...](https://devblogs.microsoft.com/oldnewthing/20050421-28/?p=35833)

------
davidgould
My first computer was a wire wrapped home brew with a Signetics 2650 cpu, and
a handful of 74LS logic and 1kx4 static ram chips. The Signetics 2650 cpu had
an 8 level on chip return stack. It was a good microprocessor for the time,
easy to interface, and generally nice to program. But, there was no way to
extend the stack or even access it so it could only be used when the call
depth, including interrupts, could be guaranteed never to exceed seven levels.
It enjoyed some success as an embedded cpu, but was never used in personal
computers, probably because of the stack limitation.
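
A small Python sketch of why exceeding the depth is fatal on a fixed on-chip
return stack. It assumes the stack is a ring of slots indexed by a small
wrapping pointer, so that one call too many silently overwrites the oldest
return address. That wrap-around behaviour, the class name, and the method
names are assumptions made for illustration, not details taken from the 2650
documentation.

```python
class HardwareReturnStack:
    """Toy fixed-depth return stack with a wrapping pointer (assumed)."""

    def __init__(self, depth=8):
        self.depth = depth
        self.slots = [0] * depth
        self.ptr = 0  # index of the next free slot

    def call(self, return_addr):
        # Store the return address; no overflow check, the pointer just wraps.
        self.slots[self.ptr] = return_addr
        self.ptr = (self.ptr + 1) % self.depth

    def ret(self):
        # Step back one slot and hand out whatever address is stored there.
        self.ptr = (self.ptr - 1) % self.depth
        return self.slots[self.ptr]
```

Under this model, nesting nine calls on an eight-slot stack still unwinds
correctly for the innermost eight returns, but the outermost return jumps to
the wrong address because its slot was overwritten.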

------
JoeAltmaier
Curiously, this is an idea whose time has come again. Keeping the return
address space (call stack) separate from the display containing addressable
data could avoid some nasty security issues that plague most architectures.

~~~
Sniffnoy
Seems to me the issue of overflows corrupting return addresses could be
avoided if the stack just grew upwards instead of downwards. Wouldn't prevent
data corruption from overflows, but then neither would what you're
suggesting...

~~~
JoeAltmaier
How so? Indexing off a display structure negatively works as well as
positively...

------
stephc_int13
The Saturn CPU that was used in HP calculators back in the 90s had a hardware
return stack, with only eight levels. It was only possible to use six levels
safely, as the interrupt routine used the remaining two.

The hardware stack enabled nice tricks back in the day, and the whole CPU was
a fun little beast.

4-bit bus and addressable units (nibbles), plenty of 64-bit registers, 20-bit
addresses...

~~~
altmind
Low-end PIC8 micro-controllers still use a tiny hardware-based stack; these
are still sold and widely used.

------
seppel
The x87 had (and still has) a stack. Itanium tried something as well.

The problem is that it does not really fit compiled code. The compiler
usually has no clue how deep the stack is when a function is called (esp. with
function pointers, virtual methods, etc), so it will not or can not generate
optimal code for this. Plus you get the trouble with interrupts.

However, modern CPUs understand how the stack is used and actually will
maintain an on-chip stack for you. That is, if you are using the standard
function prologue and you don't play any tricks with the stack-modifying
instructions.

------
weinzierl
Some current real-time processors still have an on-chip stack for return
addresses, not entirely unlike the one the 8008 had. One example is the ARM
Cortex-R which can store up to 4 addresses on its dedicated return stack.

------
GeorgeTirebiter
The AT&T Hobbit chips were designed to run C, and were used in the AT&T EO
portable tablets.
[https://en.wikipedia.org/wiki/AT%26T_Hobbit](https://en.wikipedia.org/wiki/AT%26T_Hobbit)
It was entirely stack-based. Another 'stack' machine was the HP-3000.

------
timonoko
Stacks 4 n00bs. Cosmac did not have any.
[https://en.wikipedia.org/wiki/RCA_1802](https://en.wikipedia.org/wiki/RCA_1802)

------
mobilio
And how much RAM did the 8008 have for this?

Because on later CPUs you can exhaust it with a few calls.

