
Stack Computers: the new wave (1989) - falava
http://www.ece.cmu.edu/~koopman/stack_computers/index.html
======
tikhonj
I've been playing around with some Greenarray[1] chips lately, and I have to
admit that the core ideas are fascinating. Their particular variant of Forth
is actually almost pleasant to use--certainly better than MIPS, which is my
closest point of reference. (Ignoring crazy things like the color of
identifiers controlling your program's semantics.) The chips are also
_extremely_ power efficient.

[1]: <http://www.greenarraychips.com/>

I've found having a mental model of a stack-based system far easier than even
a simple register-based one. However, this might also be true because the
Greenarray chips themselves are far simpler than any other CPU I've ever
looked at.

And, of course, the fact that you have 144 cores on a tiny chip _despite_ a
cheap manufacturing process is also very neat.

~~~
gruseom
Oh, please write about your experiences programming those chips. I'm sure I'm
not the only one who'd like to hear all about it. I'd particularly like to
know how you deploy code to the 144 cores and how they talk to one another.

Edit: that request is to anyone who has programmed the GreenArrays chips.

~~~
batgaijin
Yes, please write this! I'd normally just upvote your comment, but I feel like
the sorts of people who play around with GreenArray chips are not that prone
to blogging, at least by my googling...

------
luu
Mini book review: On the off chance you're planning on implementing a stack
machine, you _must_ read this book. A few years ago I was working on a stack
machine, and I didn't know what to read because virtually all of the
references for fundamental material are decades old. Let me save you the
trouble and tell you this is the one you want. There are newer papers you
should read, of course, but if you're only going to read one thing, it should
be this.

Why would you want a stack machine in this day and age? They're small. Really,
really, small. The stack processor we made was so small that it fit in an
empty space in the floorplan of our "real" (x86) processor.

~~~
daniel-cussen
I'm using stack machines now because, like you said, they're small. Meaning,
you can fit a ton of them on a chip, and if you can get them to talk to each
other, you have yourself a pretty powerful parallel processor. The chip I'm
working with has 144 cores running at ~700MHz, and it costs $20. It's insane
how much being small buys you.

~~~
jallmann
Chuck Moore's GreenArrays chip? (He's also the guy behind Forth, the
quintessential stack-based VM.) How's that working?

~~~
daniel-cussen
The very same. It's working really well—initially I was intimidated by the
chips because of a few of the constraints they have in order to be as good
as they are. Three challenges:

1) Figuring out how to work within the very low (but reasonable) amount of RAM
each core has. They're independent computers, so each core needs to store its
code and its data, all within 64 18-bit words, which roughly corresponds to a
tweet of information if you tweeted in UTF-8.

2) Dealing with the IDE: it's attacking a problem nobody's solved--parallel
compilation. That said, it has only been the second greatest challenge so far,
after

3) Getting the hardware set up. My co-project-doer is light-years ahead of me
in the EE department and makes most of the EE decisions, with some input from
me about what capabilities the chip needs to have (a minimum amount of off-die
RAM, say). Sourcing parts is something I would not have been able to do on my
own, so I'd say this project requires a good deal of electronic literacy, if
not dexterity.

That being said, though it's a bit of a prima donna, it has every right and
reason to be. It's an amazing, amazing chip. It's fast, powerful, unclocked,
can work with all kinds of devices, is a crazy bargain both in hardware cost
and in energy cost, and best of all, for me at least, an endless source of
fun, hard problems. It's a fascinating chip.

~~~
listic
Can you give a hint as to what we can expect to do with this chip?

I was a Forth enthusiast and read the book in OP (in early 2000's) and
followed Chuck's work from afar, but I am not sure what exactly I could do
with this chip, given its peculiar limitations.

~~~
daniel-cussen
With low RAM like that, what you want to do is math, not UI. That means
electrical engineering or number crunching, AFAIK.

OK, so first, it has a lot of I/O ports of different kinds on just one chip.
In a lab, you could run lots of analogue knobs and many servos attached to
just one chip. This means a lot less hardware debugging, which should be a
Godsend.

Another possibility is HPC. You can tile a motherboard with these, cool them
all down to -50°C very easily and cheaply, and then overclock them as you
perform number crunching that you couldn't do with a GPU. After all, these
cores are truly independent, and can therefore branch independently, unlike
GPU cores. That opens the door to a wide variety of algorithms that aren't
GPGPU accessible.

~~~
listic
A very peculiar kind of number crunching that would be. With 64 words, that
should be some finely selected numbers :)

I'd love to be able to do some useful work with that, of course.

~~~
RodgerTheGreat
Don't think of the F18 computers in isolation--think about doing stream
processing with a network of them. You probably can't fit your application on
one F18, but you could easily fit one or two functions in 64 words, especially
given that a single machine word can contain up to four instructions.
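The packing described above can be sketched roughly like this (the 5/5/5/3
slot layout follows the F18 documentation, but the opcode values here are
made up for illustration):

```python
# Sketch: packing up to four stack-machine opcodes into one 18-bit word,
# F18-style. Slots 0-2 hold full 5-bit opcodes; slot 3 is only 3 bits,
# so it can hold only opcodes whose low two bits are zero.

SLOT_SHIFTS = (13, 8, 3, 0)  # bit positions of slots 0..3 in the word

def pack(opcodes):
    """Pack up to four 5-bit opcodes into a single 18-bit word."""
    assert len(opcodes) <= 4
    word = 0
    for i, op in enumerate(opcodes):
        if i < 3:
            word |= (op & 0x1F) << SLOT_SHIFTS[i]
        else:
            # the last slot keeps only the top 3 bits of the opcode
            assert op & 0b11 == 0, "slot-3 opcode must have low bits 00"
            word |= (op >> 2) & 0b111
    return word
```

So a word like `pack([dup, add, mul, nop])` costs 18 bits for four
instructions--a density a register-encoded ISA can't approach.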

------
zxcdw
One domain where stack machines shine is code density, because there is no
need to encode register operands. Instruction lengths aren't measured in
bytes, but rather in bits. With some form of binary coding, one can reduce
code size even further: 1-7 bits per instruction is not uncommon, even with a
rather elaborate instruction set that includes a handful of "macro"
instructions for common tasks such as initializing an array of data or
generating random numbers.

If you implement a stack-based VM within a program, the stack machine has some
overhead (because it has to be implemented with all its functionality and its
interfacing with the rest of the program), but that is mainly a curiosity. You
can implement a less-well-scaling ISA, without binary coding and with fixed
opcode lengths, in far fewer bytes of VM code, but then the cheap VM is offset
by worse code density. On the other hand, a more advanced VM takes far more
space, but the greater code density pays off after a certain amount of code.
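For concreteness, the "cheap" end of that trade-off is something like this
sketch--a trivial stack VM with zero-operand instructions (the instruction
names and tuple encoding are made up, not from any real system):

```python
# Minimal stack-VM sketch: every instruction is just an opcode (plus an
# optional immediate for push), so no register fields need encoding.
def run(program):
    stack = []
    for op, *arg in program:
        if op == "push":
            stack.append(arg[0])
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == "dup":
            stack.append(stack[-1])
    return stack

# (2 + 3) * (2 + 3), as push/push/add/dup/mul
run([("push", 2), ("push", 3), ("add",), ("dup",), ("mul",)])
```

The interpreter itself is tiny; the density win only shows up once you
also compress the opcode stream, as described above.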

Where this matters is of course demoscene. ;)

~~~
Tuna-Fish
Bit-aligned instructions are a really horrible idea if you ever want to make a
CPU implementing that instruction set actually fast. When you want to decode
multiple instructions in parallel, having more possible locations for the
beginning of the next instruction adds a lot of latency and power on a very
critical path. Fitting more instructions in cache, and getting more bandwidth
from it to the CPU, are problems that can be solved by Moore's law; having to
put more muxes in front of your decoder isn't, now that Moore's law buys you
more transistors at less power, but not faster transistors.

x86 is now considered extremely hard to decode because of its byte-aligned
instructions, and basically all modern instruction sets prefer fixed-width
instructions. ARM used to tout how space-efficient its Thumb-2/ARMv7
instruction set was compared to other RISC instruction sets because it had two
possible instruction lengths (16 and 32 bits), but now that they got to (had
to) reboot the ISA for 64-bit, ARMv8 is fixed-width.

~~~
zxcdw
This is definitely true.

To be clear, this only applies when the instruction set is actually compressed
somehow--not to stack-based machines in general.

------
jules
Since then, the result seems to be that stack vs. registers doesn't matter as
much as people thought. A modern out-of-order processor [1] has to reconstruct
the dependencies in the dataflow graph anyway, and out of all the things going
on in a processor, doing so from a register ISA versus a stack ISA is not that
different. A bare-bones stack machine may be smaller than a bare-bones
register machine, but nobody wants to run those anyway.

[1]: <http://en.wikipedia.org/wiki/Out-of-order_execution>

~~~
dfox
One of the problems with stack machines is that it is mostly impossible to
build a useful out-of-order stack machine (or even an in-order multiple-issue
one). The short or nonexistent pipelines and low CPI of stack machines tend to
offset this for certain classes of code (lots of branching and so on), but
they cannot rival the performance of modern register-based designs for other
classes of code (number crunching).

~~~
tailrecursion
It is straightforward to convert stack instructions to 3-address instructions
by keeping a counter--a stack pointer--that points into the register file. As
instructions are decoded, the stack pointer is read and its current value is
carried down the pipeline with the instruction itself.

From there on, the CPU is executing an instruction that names explicit
register numbers. The stack counter is kept with the decoder at the very start
of the pipeline. This "register number decoder" needs to run sequentially, and
it needs to run at a faster cycle time than the rest of the CPU.
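The scheme described above can be sketched as follows (the instruction names
and output tuple format are made up for illustration):

```python
# Sketch: a decoder-side "stack pointer" turns zero-operand stack ops
# into 3-address ops with explicit register numbers.
def decode(stack_ops):
    """Map stack instructions to (opcode, dst, src1, src2) register ops."""
    sp = 0            # index of the next free register in the file
    out = []
    for op in stack_ops:
        if op == "push":
            out.append(("load", sp, None, None))  # r[sp] <- operand
            sp += 1
        elif op in ("add", "mul"):
            # a binary op consumes r[sp-2], r[sp-1]; result lands in r[sp-2]
            out.append((op, sp - 2, sp - 2, sp - 1))
            sp -= 1
    return out
```

Note the sequential dependence: each instruction's register numbers depend
on the running `sp`, which is exactly why this stage has to run faster than
the rest of the pipeline.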

------
cookingrobot
Idea: Notch should use a stack machine as the basis for his new 0x10c game. (A
big part of the game involves programming an in-game 16-bit computer.) It'd be
a great excuse for a lot of people to explore this architecture.

~~~
rrmm
They could also use the Java VM or CLR, as those are both stack machines.

------
justincormack
I do hope more old technical books make it to a new life online. Otherwise
whole areas will get increasingly hard to access, as there are only a limited
number of library copies.

~~~
Evbn
ACM has a bunch of classic books for download.

