Having many registers at the programmer level turned out not to be all that useful. SPARC machines did that, but it wasn't a huge win.
Manageable parallelism is an answer to "how to use a bajillion registers."
GPUs have a bajillion registers because that solves the problem they actually have, not because they're trying to find out "how to use" them. If you had the same amount of die area and a scalar-ish problem, you'd spend it on a traditional cache hierarchy just like CPUs do.
The answer is that this is too cumbersome and hard to deal with. So is memory, to a lesser degree - when things are in memory they can be affected by other threads, the outputs of different instructions can overlap, etc.
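To make the memory point concrete, here's a minimal C sketch of the "outputs can overlap" case. The function names are made up for illustration, and this only shows single-threaded aliasing, not the cross-thread case.

```c
/* Hypothetical helpers, for illustration only. */

/* The multiplier lives in memory. A store to dst[i] might overwrite *scale,
   so without further aliasing information the value has to be reloaded
   on every iteration. */
void scale_via_memory(int *dst, const int *scale, int n) {
    for (int i = 0; i < n; i++) {
        dst[i] *= *scale;
    }
}

/* The multiplier is copied into a local first. No store through dst can
   touch s, so it is free to sit in a register for the whole loop. */
void scale_via_local(int *dst, const int *scale, int n) {
    const int s = *scale;
    for (int i = 0; i < n; i++) {
        dst[i] *= s;
    }
}
```

If you call both on `{2, 3, 4}` with `scale` pointing at the first element, the memory version produces `{4, 12, 16}` (the multiplier changes after the first store) while the local version produces `{4, 6, 8}`. A value held in a register can't be clobbered that way, which is roughly why memory is the harder of the two to reason about.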