This exists now. Some AI accelerators are a grid of independent compute units with their own memory, message passing between them. Graphcore's IPU is an instance.
An AMD GPU is a grid of independent compute units on a memory hierarchy. At the fine grain, it's a scalar integer unit (branches, arithmetic) and a predicated vector unit, with its own instruction pointer. A ballpark of 80 of those can be resident on a given compute unit at the same time, executed in some order, and partly simultaneously, by the scheduler. A GPU has on the order of 100 compute units, so that's ~8k completely independent programs running at the same time.
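To make that concrete, here's a minimal HIP sketch (assuming a ROCm install; the kernel and its two code paths are mine, not an existing API) where each workgroup picks its own control flow from its block index, so separate workgroups really do run as separate programs:

    #include <hip/hip_runtime.h>

    // Each workgroup selects its own code path from its block index,
    // so different workgroups follow entirely different control flow.
    __global__ void independent_programs(int *out) {
        int id = blockIdx.x;                 // one "program" per workgroup
        if (threadIdx.x != 0) return;        // lane 0 acts for the group here
        if (id % 2 == 0) out[id] = id * 2;   // even groups: one code path
        else             out[id] = id + 100; // odd groups: another
    }

    int main() {
        const int programs = 8192;           // ~8k independent programs
        int *out;
        hipMalloc((void**)&out, programs * sizeof(int));
        independent_programs<<<programs, 64>>>(out); // 64 lanes = one wavefront
        hipDeviceSynchronize();
        hipFree(out);
        return 0;
    }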
You've got a variety of programming languages available to work with all of that. There's a shared address space with other GPUs and the system processors, and direct access to both system memory and GPU-local memory. There's also scratchpad memory (LDS) you can use for fast coordination between small numbers of programs.
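As a sketch of that memory model (again assuming ROCm; buffer names are illustrative): pinned host memory is directly dereferenceable from a kernel through the shared address space, and __shared__ declares LDS for fast intra-workgroup coordination:

    #include <hip/hip_runtime.h>
    #include <cstdio>

    // Reads system memory directly through the shared address space and
    // coordinates the workgroup's lanes through LDS (__shared__).
    __global__ void sum_host_buffer(const int *host_data, int *result, int n) {
        __shared__ int partial;                // LDS: fast per-group scratch
        if (threadIdx.x == 0) partial = 0;
        __syncthreads();
        for (int i = threadIdx.x; i < n; i += blockDim.x)
            atomicAdd(&partial, host_data[i]); // direct system-memory reads
        __syncthreads();
        if (threadIdx.x == 0) *result = partial;
    }

    int main() {
        const int n = 1024;
        int *host_data, *result;
        hipHostMalloc((void**)&host_data, n * sizeof(int), hipHostMallocDefault);
        hipHostMalloc((void**)&result, sizeof(int), hipHostMallocDefault);
        for (int i = 0; i < n; ++i) host_data[i] = 1;
        sum_host_buffer<<<1, 256>>>(host_data, result, n);
        hipDeviceSynchronize();
        printf("sum = %d\n", *result);         // expect 1024
        return 0;
    }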
There's a bit of a disconnect between graphics shaders, the ROCm compute stack, and what you can build on the hardware if so inclined. The future you want is here today; it just goes by a different name than you expected.
OK: if I can transpile C/C++, Rust, or TypeScript to that and have full access to memory, threads, system APIs, network sockets, etc., then that would work for the use cases I have in mind. Running MIMD processes on SIMD hardware is something I'm definitely interested in (one possible shape of that is sketched after this message).
If there's no straightforward way to do that, then I'm afraid that hardware represents a huge investment in the wrong direction.
That's because a GPU can be built from the kind of general-purpose multicore CPU I'm talking about, but a CPU can't be built from a GPU.
What I'm getting at is that if I have to "drop down" to an orthodox way of solving problems, rather than being able to solve them in the freeform way my instincts lead me, then I will always be stifled.
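For what it's worth, "MIMD processes on SIMD hardware" is expressible today with a persistent-kernel pattern. A hedged sketch (assuming ROCm/HIP; the task queue and the task bodies are hypothetical stand-ins, not an existing API): each workgroup repeatedly claims a task ID from a global counter and dispatches to unrelated code paths, so distinct workgroups execute unrelated work concurrently:

    #include <hip/hip_runtime.h>

    __device__ int next_task = 0;  // global work queue: just a counter here

    // Persistent kernel: each workgroup loops, claims a task, and
    // dispatches on it -- MIMD-style scheduling on SIMD hardware.
    __global__ void mimd_worker(int *results, int num_tasks) {
        __shared__ int task;
        for (;;) {
            if (threadIdx.x == 0)
                task = atomicAdd(&next_task, 1); // claim one task per group
            __syncthreads();
            if (task >= num_tasks) return;       // queue drained: exit
            if (threadIdx.x == 0) {              // lane 0 runs the task body
                switch (task % 3) {              // stand-ins for real processes
                    case 0: results[task] = task * task; break;
                    case 1: results[task] = task + 7;    break;
                    case 2: results[task] = -task;       break;
                }
            }
            __syncthreads();                     // done before reusing 'task'
        }
    }

    int main() {
        const int num_tasks = 4096;
        int *results;
        hipMalloc((void**)&results, num_tasks * sizeof(int));
        mimd_worker<<<100, 64>>>(results, num_tasks); // ~one group per CU
        hipDeviceSynchronize();
        hipFree(results);
        return 0;
    }

The compilation side is plain C++ through hipcc; the honest caveat is that threads, sockets, and other system APIs stay on the host side of the boundary.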