Software is eating the world. As the speed and availability of the network continue to improve dramatically, computing workloads will continue to centralize, and companies will serve their massive user bases with wimpy cores at massive scale.
Amdahl's Law says that "the speedup of a program using multiple processors in parallel computing is limited by the sequential fraction of the program."
So while you hit a hard limit when trying to improve the performance of a single function through parallelization, there is no such limit on running multiple instances of the same function to respond to multiple concurrent, independent requests.
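To put rough numbers on that contrast, here is a minimal sketch. The 95% parallel fraction and the 10 ms per request are illustrative assumptions of mine, not measurements, and the throughput function assumes requests are fully independent with no shared bottleneck.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Speedup of ONE task when `parallel_fraction` of it can use `cores` cores."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def request_throughput(cores: int, seconds_per_request: float) -> float:
    """Requests per second when every core serves its own independent request."""
    return cores / seconds_per_request


if __name__ == "__main__":
    # Assumed values for illustration only.
    parallel_fraction = 0.95
    seconds_per_request = 0.010
    for cores in (1, 8, 64, 1024):
        speedup = amdahl_speedup(parallel_fraction, cores)
        throughput = request_throughput(cores, seconds_per_request)
        print(f"{cores:5d} cores: single-task speedup {speedup:6.2f}x, "
              f"throughput {throughput:10.0f} req/s")
```

With a 5% sequential fraction, the single-task speedup can never reach 20x no matter how many cores you add, while the throughput column keeps growing linearly with the core count.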
Individual cores need only be fast enough that the latency of a single request/response is acceptable. After that, any increase in single-core performance requires a commensurate increase in the complexity of the scheduler, leading to rapidly diminishing returns, not to mention significantly lowering 'Responses per kWh'.
The fundamental shift toward multi-core processors may have been necessary due to physics, but not coincidentally (I think) it's also exactly the architecture you want if you are trying to serve large numbers of requests over a network.