That has been tried several times before. As long as the small memory is "enough faster", small&fast + large + overhead beats large alone. (In really fast processors, active register values live in lots of places, so the processor doesn't even access the register file except for values that haven't been used for a while.)
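To put rough numbers on "enough faster", here is a minimal sketch of the usual average-access-time arithmetic. The function name and all the cycle counts are made up for illustration; they are not measurements of any real processor.

```python
def effective_access_time(t_small, t_large, hit_rate, overhead=0.0):
    """Average access time for a small fast memory in front of a large slow one.

    Every access pays t_small; a miss additionally pays t_large plus any
    lookup/control overhead. All times are in the same unit (e.g. cycles).
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return t_small + (1.0 - hit_rate) * (t_large + overhead)


# Hypothetical numbers, in cycles, chosen only for illustration.
large_only = 100.0
with_cache = effective_access_time(t_small=2.0, t_large=100.0,
                                   hit_rate=0.95, overhead=5.0)
no_hits = effective_access_time(t_small=2.0, t_large=100.0,
                                hit_rate=0.0, overhead=5.0)

print(f"large only:          {large_only:.2f}")
print(f"small+large, 95% hit: {with_cache:.2f}")
print(f"small+large, 0% hit:  {no_hits:.2f}")
```

With these assumed numbers the combination wins easily at a 95% hit rate and only loses when the small memory almost never hits, which is the "enough faster" condition.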
> When you do this, you lose all the cache control latency and context switch overhead, resulting in a much smaller and faster core,
Huh? Context switch overhead is time, not space. Cache control takes negligible space.
> leaving plenty of space for 32Gb on die :)
Not yet you don't. None of this stuff is as dense as DRAM, and DRAM is just now hitting 4Gbit. Since fast processors do take some space....
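A back-of-the-envelope sketch of the gap, assuming the 4Gbit figure is per DRAM die and reading "32Gb" both ways since the unit is ambiguous. The helper name is hypothetical.

```python
DRAM_DIE_GBIT = 4  # largest DRAM die size mentioned above, in gigabits


def dram_dies_needed(capacity_gbit, die_gbit=DRAM_DIE_GBIT):
    """Number of whole DRAM dies needed to hold capacity_gbit (ceiling division)."""
    return -(-capacity_gbit // die_gbit)


as_gigabits = dram_dies_needed(32)       # reading "32Gb" as 32 gigabits
as_gigabytes = dram_dies_needed(32 * 8)  # reading "32Gb" as 32 gigabytes

print(as_gigabits, as_gigabytes)
```

Either way it is several full DRAM dies' worth of bits, at DRAM density, before the processor itself takes any area.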