What you are showing is not a purebred stack machine; it's a machine with stack-based temporary calculations that rely on randomly accessed operands.
The first C that I used was like this. It was actually a tiny, toy subset of C powering a programming game called C Robots by Tom Poindexter. I think that's how I first heard about Yacc, too.
Anyway, if you want to defend the hypothesis that C is a stack language under the hood from an implementor's POV, it would help to show how expressions involving function calls and variables translate to something resembling canonical, point-free Forth.
The local variables must turn into anonymous stack locations accessed implicitly. If a variable is used as an operand and is still live (has a next use), you must DUP it to keep a copy on the stack.
Lisp isn't a stack machine either. Lisp expressions can map to stack operations (easily, if we allow random access to off-stack operands), but they don't have to. My own TXR Lisp uses a register-based VM.
1> (disassemble (compile-toplevel '(set y (+ (* a x) (foo b)))))
** expr-1:1: warning: unbound variable y
** expr-1:1: warning: unbound variable a
** expr-1:1: warning: unbound variable x
** expr-1:1: warning: unbound variable b
** expr-1:1: warning: unbound function foo
data:
syms:
0: y
1: sys:b+
2: sys:b*
3: a
4: x
5: foo
6: b
code:
0: 90040003 getlx t4 3
1: 90050004 getlx t5 4
2: 20020003 gcall t3 2 t4 t5
3: 00040002
4: 00000005
5: 90050006 getlx t5 6
6: 20010004 gcall t4 5 t5
7: 00050005
8: 20020002 gcall t2 1 t3 t4
9: 00030001
10: 00000004
11: 94020000 setlx t2 0
12: 10000002 end t2
instruction count:
8
#<sys:vm-desc: 8c0dc50>
The registers are allocated on the native stack, in a frame whose size is determined at compile time and sized to fit using alloca() at run-time.
Local variables (not seen here) turn into v registers, treated uniformly with t registers.
We don't see t1 and t0 above because t0 is a read-only register that holds nil, and t1 is the assembler temporary.
I don't have constant folding through variables working. So gcall t4 1 d1 d0 is still doing the (* a x) multiplication, though directly on the d1 d0 registers that hold these values. Ideally, this
The original unoptimized code is generated like this:
The optimization eliminated the variable frame allocation "frame 2 3" and its matching "end t2", and mapped all the v registers belonging to that frame to new t registers. It propagated the d0, d1 and d2 values, eliminating some of those registers. v00001 became t9, and the "movsr t2 t9" was eliminated, which involved changing the subsequent "end t2" to "end t9".
Another thing we see is the reduction of the + and * functions to internal binary-only variants sys:b+ and sys:b*, but that's not done as a pass over the code; the initial instruction selection does that, sensitive to *opt-level*.
I'm not sure how easy this kind of work is on stack machines.
> What you are showing is not a purebred stack machine
I see where you are coming from now, and yes, you are correct: these stack CPUs are not pure. The whole point of trying to do a stack-machine CPU (however impure) is to simplify register allocation, which is why the 1982 C Machine Stack Cache paper I cited has a title beginning with "Register Allocation for Free...". Whether writing a compiler or writing assembler by hand, at some point you hit the problem that it's not easy to fit all your variables into registers r0 through r15 (or whatever), and you must then spill them onto the stack. You evidently have first-hand experience with the challenges of keeping performance-critical variables in registers. So the essence of these stack processors, the whole point of trying to build them, is to just let you spill everything onto the stack and let the CPU worry about how to map stack locations to fast registers.
Excerpt from the C Machine Stack Cache paper[1]:
The goal of the Stack Cache is to keep the top elements of the stack in high speed registers. The problem we solve is how to perform the allocation of these registers without placing the burden on the compiler, and at the same time retaining the efficiency of register accesses. Control of the hardware, the instruction set, and a disciplined use of the aforementioned calling sequence allows a memory-to-memory style architecture (i.e. registerless to the compiler) to perform an automatic binding of memory addresses to machine registers.