The obvious solution to the memory bandwidth problem is to partition the memory and connect the pieces directly to the cores. Even if there is also a shared region, cores don't have to worry about coherence when using their own private memory.
Yes, computing gets very interesting when what we have is no longer an overgrown IBM 5150.
Each SPE has a tiny amount of local memory (256 KB, IIRC), severely limited connectivity to other SPEs, and a downright cruel instruction set. A friendlier ISA and Transputer-like connectivity between the nodes would alleviate some of these problems.
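
For what it's worth, here is a rough sketch of the programming model being described: nodes that each own a small private memory and talk only over point-to-point links. It is plain Go, not SPE code; goroutines stand in for cores and channels stand in for Transputer-style links. The names `node`, `chainSum` and `localStoreWords` are made up for illustration, and the 8-word store is an arbitrary toy size.

```go
package main

import "fmt"

// localStoreWords is a hypothetical size for each node's private memory.
const localStoreWords = 8

// node models one core: it fills and reads only its own local store,
// then exchanges a single value with its neighbours over the links.
func node(id int, in <-chan int, out chan<- int) {
	// Private memory: no other goroutine ever sees this array,
	// so there is nothing to lock and nothing to keep coherent.
	var local [localStoreWords]int
	for i := range local {
		local[i] = id*localStoreWords + i
	}

	sum := 0
	for _, v := range local {
		sum += v
	}

	// Take the running total from the upstream link, add the local
	// contribution, and pass it downstream.
	total := <-in
	out <- total + sum
}

// chainSum wires n nodes into a chain of point-to-point links and
// returns the total that comes out of the last link.
func chainSum(n int) int {
	first := make(chan int, 1) // buffered so the initial send never blocks
	in := first
	for id := 0; id < n; id++ {
		out := make(chan int)
		go node(id, in, out)
		in = out
	}
	first <- 0 // inject the starting total
	return <-in
}

func main() {
	fmt.Println(chainSum(4)) // 496, the sum of 0..31
}
```

The only data that ever crosses a link is one integer per node; everything else stays in the node's own store, which is the whole point of partitioning the memory in the first place.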