In 1993-4 I was an intern at Inmos, when they were trying to get the T9000 transputer to work.
The transputer was a stack architecture, with a bytecode-style instruction set. The stack was shallow, just big enough to evaluate a typical expression you would write in a high-level language. It also had a local addressing mode, relative to a “workspace” pointer register, which was used for local variables.
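A toy sketch of that execution model, in Python, may make it concrete. The mnemonics (`ldl`, `stl`, `ldc`, `add`) are borrowed loosely from transputer assembly, but the tuple encoding, the stack depth of three, and the overflow behaviour are simplifying assumptions for illustration, not a faithful model of the real instruction set.

```python
# Toy model: a shallow evaluation stack plus workspace-relative locals.
# The encoding and semantics are simplified assumptions, not the real ISA.

STACK_DEPTH = 3  # assumed depth for this sketch


def run(program, memory, wptr):
    """Execute a list of (opcode, operand) pairs against `memory`.

    `wptr` plays the role of the workspace pointer: local variable n
    lives at memory[wptr + n].
    """
    stack = []
    for op, arg in program:
        if op == "ldl":        # load local: push memory[wptr + arg]
            stack.append(memory[wptr + arg])
        elif op == "ldc":      # load constant: push the operand itself
            stack.append(arg)
        elif op == "stl":      # store local: pop into memory[wptr + arg]
            memory[wptr + arg] = stack.pop()
        elif op == "add":      # pop two values, push their sum
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b)
        else:
            raise ValueError(f"unknown opcode: {op}")
        if len(stack) > STACK_DEPTH:
            # The toy simply refuses expressions that are too deep.
            raise OverflowError("expression too deep for the toy stack")
    return memory


# z = x + y, with x, y, z in local slots 0, 1, 2 of the workspace.
memory = [0] * 8
wptr = 4
memory[wptr + 0] = 10   # x
memory[wptr + 1] = 32   # y
program = [("ldl", 0), ("ldl", 1), ("add", None), ("stl", 2)]
run(program, memory, wptr)
# memory[wptr + 2] is now 42
```

Note how every operand of the expression passes through the stack, and every local access is an offset from the workspace pointer.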
To make the T9 go faster, they gave it a “workspace cache”, which was effectively a register file. The instruction decoder would collect up sequences of bytecodes and turn them into RISC-style ops that worked directly on the registers, so the stack was in effect JITted away by the CPU’s front end.
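The grouping idea can be sketched as a small peephole pass. This is only an illustration: the single pattern matched here (`ldl a; ldl b; add; stl c`), the fused op name `add3`, and the `group` and `run_grouped` helpers are all hypothetical, and the actual T9000 grouping rules are not described above.

```python
# Hypothetical peephole pass: fuse a stack-code sequence into one
# three-operand, register-style op. Not the real T9000 grouping logic.

def group(program):
    """Rewrite 'ldl a; ldl b; add; stl c' as ('add3', c, a, b).

    Instructions that do not match the pattern are passed through as-is.
    """
    out = []
    i = 0
    while i < len(program):
        window = program[i:i + 4]
        if [op for op, *_ in window] == ["ldl", "ldl", "add", "stl"]:
            a, b, c = window[0][1], window[1][1], window[3][1]
            out.append(("add3", c, a, b))   # regs[c] = regs[a] + regs[b]
            i += 4
        else:
            out.append(program[i])
            i += 1
    return out


def run_grouped(ops, regs):
    """Execute fused ops directly on `regs`, standing in for the workspace cache."""
    for op in ops:
        if op[0] == "add3":
            _, dst, src_a, src_b = op
            regs[dst] = regs[src_a] + regs[src_b]
        else:
            raise ValueError(f"toy executor only handles fused ops, got {op[0]}")
    return regs


stack_code = [("ldl", 0), ("ldl", 1), ("add", None), ("stl", 2)]
grouped = group(stack_code)          # [("add3", 2, 0, 1)]
regs = run_grouped(grouped, [10, 32, 0])
# regs[2] is now 42, and no evaluation stack was touched
```

The matched sequence has no net effect on the stack, which is what lets the toy replace it with a single op on the register file.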
A really cool way to revamp an old design; a pity that the T9 was horribly buggy and never reached its performance goals :-(