My point is to remove the distinction between the register file and main memory, so that the CPU's entire working set is linear and no copies are required, therefore drastically increasing speed.
When you do this, you lose all the cache control latency and context switch overhead, resulting in a much smaller and faster core, leaving plenty of space for 32Gb on die :)
No existing architecture does this, since they all rely on the memory hierarchy. I'm talking about a new architecture.
That has been tried several times before. As long as the small memory is "enough faster", small-and-fast plus large plus overhead beats large alone. (In really fast processors, active register values live in lots of places, so the register file itself is only accessed for values that haven't been used for a while.)
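To put rough numbers on the "enough faster" argument, here is a minimal sketch of the usual average-access-time comparison. The function names and the latencies (1 cycle for the small store, 100 for the large one, 5 cycles of miss overhead) are made up for illustration, not measurements of any real machine.

```python
def flat_access_time(t_large):
    """One big memory: every access pays the large-memory latency."""
    return t_large


def hierarchy_access_time(t_small, t_large, hit_rate, overhead=0.0):
    """Small fast store in front of a large slow one.

    Every access pays t_small; the fraction that misses also pays
    t_large plus whatever management overhead the hierarchy adds.
    """
    return t_small + (1.0 - hit_rate) * (t_large + overhead)


def break_even_hit_rate(t_small, t_large, overhead=0.0):
    """Hit rate at which the hierarchy exactly ties the flat memory."""
    return 1.0 - (t_large - t_small) / (t_large + overhead)


if __name__ == "__main__":
    # Hypothetical latencies in cycles, chosen only to show the shape.
    t_small, t_large, overhead = 1.0, 100.0, 5.0
    print("flat:", flat_access_time(t_large))
    print("hierarchy at 95% hits:",
          hierarchy_access_time(t_small, t_large, 0.95, overhead))
    print("break-even hit rate:",
          break_even_hit_rate(t_small, t_large, overhead))
```

With these assumed numbers the hierarchy only needs a hit rate of a few percent to break even, and real working sets hit far more often than that, which is why the small-and-fast level keeps winning.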
> When you do this, you lose all the cache control latency and context switch overhead, resulting in a much smaller and faster core,
Huh? Context switch overhead is time, not space. Cache control takes negligible space.
> leaving plenty of space for 32Gb on die :)
Not yet you don't. None of this stuff is as dense as DRAM, and DRAM is just now hitting 4Gbit. Since fast processors do take some space....
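The arithmetic behind that, as a quick sketch. It assumes the 4Gbit figure is per DRAM chip and treats "32Gb" both ways, since the original post doesn't say whether it means gigabits or gigabytes; `dram_chips_needed` is just an illustrative helper.

```python
DRAM_CHIP_GBIT = 4  # "just now hitting 4Gbit", assumed to be per chip


def dram_chips_needed(capacity_gbit, chip_gbit=DRAM_CHIP_GBIT):
    """How many DRAM chips' worth of area a given capacity would take,
    even if the on-die memory were as dense as DRAM (it isn't)."""
    return capacity_gbit / chip_gbit


if __name__ == "__main__":
    print("32 Gbit :", dram_chips_needed(32), "chips' worth")
    print("32 GByte:", dram_chips_needed(32 * 8), "chips' worth")
```

Either reading needs several whole DRAM chips' worth of area before the processor itself gets any room.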