Microcode (invented long ago by Maurice Wilkes, who did the EDSAC, arguably the first real programmable computer) rested on the argument that if you can make a small amount of memory plus CPU machinery that is much faster than main memory, then you can successfully program "machine-level" functionality as though it were just hardware. For example, the Alto could execute about 5 microinstructions for every main memory cycle -- this allowed us to make emulators that were "as fast as they could be".
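The argument is easier to see in miniature. Here is a small sketch in C (nothing like the Alto's actual microcode; the tiny "target" machine and the opcode names are invented just for illustration) in which a fast inner interpreter loop plays the role of the microcode engine, and each target instruction fetched from slow main memory is carried out by a handful of fast steps.

/* A toy emulator loop: the switch and its handlers stand in for the
   microinstructions; the fetches from mem[] stand in for the (slow)
   main memory cycles.  The "ISA" here is made up for illustration.  */
#include <stdint.h>
#include <stdio.h>

enum { OP_LOAD, OP_ADD, OP_STORE, OP_HALT };      /* hypothetical target ISA */

int main(void) {
    /* "Main memory": a tiny program at 0..6, its data at 8..10. */
    uint16_t mem[16] = {
        OP_LOAD, 8, OP_ADD, 9, OP_STORE, 10, OP_HALT,
        0, 3, 4, 0
    };
    uint16_t acc = 0, pc = 0;

    for (;;) {
        uint16_t op  = mem[pc++];                 /* fetch opcode          */
        uint16_t arg = mem[pc++];                 /* fetch operand address */
        switch (op) {                             /* decode + execute: the */
        case OP_LOAD:  acc = mem[arg];  break;    /* "fast" steps          */
        case OP_ADD:   acc += mem[arg]; break;
        case OP_STORE: mem[arg] = acc;  break;
        case OP_HALT:  printf("result = %d\n", (int)mem[10]); return 0;
        }
    }
}

The win on the real machine came from the ratio: because steps like these ran out of the fast microstore, every slow main memory cycle bought you roughly five of them, which is why the emulators could be "as fast as they could be".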
This fit in well with the nature, speed, capacity, etc. of the memory available at the time. But "life is not linear", so we have to look around carefully each time we set out to design something. As Butler Lampson has pointed out, one of the things that make good systems design very difficult is that the exponentials involved mean that major design rules may no longer obtain just a few years later.
So I would point you to FPGAs and their current capacities, especially for commingling processing and memory elements (they are the same) in highly parallel architectures. Chuck Thacker, who was mainly responsible for most of the hardware (and more) at Parc, did the world a service by designing the BEE3 as "an Alto for today" in the form of a number of large FPGA chips plus other goodies. Well worth looking at!
The basic principle here is that "Hardware is just software crystallized early", so it's always good to start off with what is essentially a pie-in-the-sky software architecture, and then start trying to see the best way to run it given the technology of a particular day and time.