I actually tried to build something like this (with cables, not stacked) circa 1990. My plan was to implement a snooping protocol in the 16KB of extended memory. All from discrete TTL components (74LSXXX). Surprisingly, it never worked.
Then I moved on to playing with machines with ~100 FPGAs where I can instantiate as many RISC cores as I want. I'm not sure I'd want to move back to 6502 for parallel programming.
I'm glad you found the AppleCrate interesting. I have always found the Applesoft BASIC programming environment very inviting and satisfying, and the AppleCrate leverages this accessibility into parallel programming.
The NadaNet network is a low-speed "reinvention" of Ethernet that I developed to support simple networking of Apple II computers, like the AppleCrate. It was a very educational project, and has proven to be a robust implementation.
The mechanical compatibility of Apple II boards simplifies construction, and the low power makes it a desktop-friendly device.
It certainly won't win any speed contests with modern hardware, but the fact that it runs at a more nearly human-perceptible speed is actually a benefit--from a "blinkenlights" perspective. ;-)
I should put up some YouTube videos of it in operation. There are already a few AppleCrate/NadaNet videos up, but they are mostly presentation.
I wonder how hard it would be to copy this with old PC boards. Given how non-standard PC hardware is, I guess you'd have to know what motherboard you've got before you can start trying to mod it to network boot.
It looks like Coreboot doesn't support any boards older than the Pentium II timeframe.
I figured I'd be dredging up hardware old enough to not support PXE. If it's possible to add a PXE NIC and boot off the network without a PXE-enabled BIOS, that is both news to me and very cool.
From what I've heard (I have a friend who tried to build a Propeller-based embedded system) it's a miserable architecture to code for.
It assumes that you're using either assembler or the proprietary "Spin" language. The accessible memory space is small enough to break the code generated by C compilers; the only solution I know of involves a mini-hypervisor that swaps data in and out of memory.
It also doesn't support interrupts. Their solution? Dedicate an entire core to busy-waiting for a hardware ready signal. o_O
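For anyone who hasn't seen the pattern, here is a rough sketch of what "dedicate a core to busy-waiting" looks like. This is not Propeller or Spin code: a Python thread stands in for the dedicated core, a `threading.Event` stands in for the hardware ready line, and the names `poll_ready` and `run_demo` are made up for illustration.

```python
import threading
import time


def poll_ready(ready, handled, stop):
    """Spin on a 'ready' flag instead of waiting for an interrupt."""
    while not stop.is_set():
        if ready.is_set():          # stand-in for sampling a hardware ready pin
            ready.clear()           # acknowledge the signal
            handled.append(time.monotonic())  # "service" the event
        # No sleep here: this worker does nothing but poll,
        # which is the cost of having no interrupts.


def run_demo(events=3):
    """Raise the ready flag `events` times and count how many were serviced."""
    ready = threading.Event()
    stop = threading.Event()
    handled = []
    worker = threading.Thread(target=poll_ready, args=(ready, handled, stop))
    worker.start()
    for _ in range(events):
        ready.set()                 # the "device" signals that it is ready
        while ready.is_set():       # wait for the poller to acknowledge
            time.sleep(0.001)
    stop.set()
    worker.join()
    return len(handled)


if __name__ == "__main__":
    print(run_demo())
```

The poller never blocks, so it burns a whole worker even when nothing is happening. That is the tradeoff being complained about above.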