I thought some of the ALUs were double pumped in the P4 and newer CPUs. If that's the case, we are already running some parts of the CPU at 3.7 * 2 = 7.4 GHz.
Edit: Yes, the integer ALUs operate at twice the clock speed. I think the move to 64-bit CPUs slowed down the clock speed race, as did adding more cores and insane amounts of L2 cache, due to heat issues. As a side note, when I picked up my quad-core CPU at 2.4 GHz it crushed my old 1.8 GHz 32-bit CPU, so I think they are making some wise tradeoffs given system bus limitations. Plus the old P4s seemed to trade off a lot of efficiency for pure clock speed. So once we get back on track, we might see a new clock speed race when we have 8 to 32 cores.
You've put the cart before the horse: we're moving to multiple cores because we can't increase the clock speed. The number of transistors we can fit on a chip is still following Moore's Law; we're just using them in a different way.
We were squeezing more and more performance out of single cores by lengthening the instruction pipeline, which increased the amount of instruction-level parallelism processors could exploit at runtime. The difficulty is that as the pipeline gets longer, it takes longer to send information across it. As we decrease the cycle time (the same thing as increasing the clock speed), it becomes harder and harder to communicate from one end of the pipeline to the other in a single clock cycle.
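To put rough numbers on the cycle-time point, here is a small back-of-the-envelope sketch. It only computes how long one cycle lasts at the clock speeds mentioned in this thread, and how far a signal could travel in that time if it moved at the speed of light in a vacuum. The function names and the `velocity_fraction` parameter are my own, made up for illustration. Real on-chip signals are limited by wire resistance and capacitance and are much slower than light, so treat the distance as a generous upper bound, not a measurement of any actual chip.

```python
# Rough upper bound on how far a signal can travel in one clock cycle.
# Assumption: vacuum speed of light; real on-chip wires are far slower.
SPEED_OF_LIGHT_M_PER_S = 299_792_458


def cycle_time_ns(clock_ghz):
    """Length of one clock cycle in nanoseconds (1 / frequency)."""
    return 1.0 / clock_ghz


def max_distance_cm(clock_ghz, velocity_fraction=1.0):
    """Distance in cm covered during one cycle.

    velocity_fraction is a hypothetical knob: the fraction of light
    speed at which the signal is assumed to propagate.
    """
    seconds = cycle_time_ns(clock_ghz) * 1e-9
    return SPEED_OF_LIGHT_M_PER_S * velocity_fraction * seconds * 100


# Clock speeds taken from the posts above; 7.4 GHz is the double-pumped case.
for ghz in (1.8, 2.4, 3.7, 7.4):
    print(f"{ghz:>4} GHz: cycle {cycle_time_ns(ghz):.3f} ns, "
          f"light-speed limit {max_distance_cm(ghz):.1f} cm")
```

Doubling the clock halves the distance budget, which is why a signal that comfortably crossed the pipeline in one cycle at a lower clock may no longer make it at a higher one.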