
Why CPU Frequency Stalled (2008) - sajid
http://spectrum.ieee.org/computing/hardware/why-cpu-frequency-stalled
======
CalChris
This was written in 2008 and so I'm surprised there's no mention of
Patterson's power wall argument.

Multicore wasn't some great idea; chipmakers were forced into it. Intel hit
the power wall first and then went multicore.

 _But around 2003, chipmakers found they could no longer reduce the operating
voltage as sharply as they had in the past as they strived to make transistors
smaller and faster. That in turn caused the amount of waste heat that had to
be dissipated from each square millimeter of silicon to go up. Eventually
designers hit what they call the power wall, the limit on the amount of power
a microprocessor chip could reasonably dissipate. After all, a laptop that
burned your lap would be a tough sell._ --David Patterson.
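
For context, Patterson's point falls straight out of the CMOS dynamic-power
relation P = a * C * V^2 * f. Here's a minimal sketch of the arithmetic, using
the textbook Dennard-scaling idealizations rather than measured values:

    # Dynamic power of switching logic: P = a * C * V^2 * f
    # (a = activity factor, C = switched capacitance, V = supply voltage)
    def dyn_power(a, C, V, f):
        return a * C * V ** 2 * f

    s = 0.7  # one ideal process shrink: C and V scale by ~0.7, f by ~1/0.7
    before = dyn_power(1.0, 1.0, 1.0, 1.0)
    # Classic Dennard scaling: V drops with the geometry, density doubles,
    # and power per unit area stays roughly flat.
    print(2 * dyn_power(1.0, s, s, 1 / s) / before)    # ~1.0
    # Post-2003: same shrink, but V can no longer drop with it.
    print(2 * dyn_power(1.0, s, 1.0, 1 / s) / before)  # ~2.0, density climbs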

A few power wall cites:

[https://www2.eecs.berkeley.edu/bears/presentations/06/Patter...](https://www2.eecs.berkeley.edu/bears/presentations/06/Patterson.ppt)
[http://spectrum.ieee.org/computing/software/the-trouble-with...](http://spectrum.ieee.org/computing/software/the-trouble-with-multicore)
[http://www.edwardbosworth.com/My5155_Slides/Chapter01/ThePow...](http://www.edwardbosworth.com/My5155_Slides/Chapter01/ThePowerWall.pdf)

This article strikes me as revisionist history.

~~~
CalChris
I'll add that the power wall and the transition to multicore is covered very
well in Chapter 1 of Computer Organization and Design which is available as a
sample chapter.

[http://booksite.elsevier.com/samplechapters/9780123747501/Pa...](http://booksite.elsevier.com/samplechapters/9780123747501/Patterson_Chapter%201.pdf)

------
marmaduke
I was expecting something a little more interesting from IEEE about the
characteristic time scale of transistor switching, also as a function of
voltage. So you can switch faster, but you either need to raise the voltage or
accept defects or errors? Then finally it makes sense to discuss heat
dissipation etc.

~~~
VLM
Part of the answer, yes: no one has any idea how to improve (deg C)/watt,
which is how you measure thermal resistance.

Thermodynamics was all figured out in the days of steam engines and we're
kinda stuck now at a maximum.

If you examine the equation, there's no reason you can't dump 1000 watts if
you're willing to heat the silicon die to 1000 C like a glowing vacuum tube.
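
The equation in question is just Ohm's law for heat: T_junction = T_ambient +
P * theta, with theta in deg C per watt. A toy sketch; the 1 C/W figure below
is a made-up illustration, not any real package:

    # Junction temperature via thermal resistance: T_j = T_a + P * theta
    def junction_temp(t_ambient_c, power_w, theta_c_per_w):
        return t_ambient_c + power_w * theta_c_per_w

    # Hypothetical 1 C/W die-to-ambient path, 1000 W, 25 C room:
    print(junction_temp(25, 1000, 1.0))  # ~1025 C -- the glowing-tube case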

However (insert massive hand waving), there are characteristics of silicon
doping vs. temperature, all of which depend on deg C, such that I assure you
that you can't build a junction that works with today's technology at BOTH
room temp and 1000C (so you'd have to preheat your CPU, perhaps with a
flame?). Worst case scenario, with current chemical processes I'm not sure you
can build a junction that works at 1000C at all. I'd have to think about that
for a while.

Anyway, pull most datasheets: even for bulk power transistors (like an RF amp
or the switching transistors in your power supply) the max die temp is 125C or
150C, and I don't casually remember ever seeing a transistor with a max die
temp over 175C (which is hardly scientific proof that there's no exotic chip
out there that runs at 200C or 400C...).

Your stereotypical small signal bipolar NPN like a 2N3904 might max out
junction temp at 150C. Some of the graphs, like collector leakage current, are
exponential with temp, and by 150C you'd better design that leakage into your
circuit. Note that at a couple hundred degrees per watt of thermal resistance
from junction to ambient you're not going to dump much power in a small signal
transistor like that, but it's certainly enough to run relays and LEDs and
stuff, and the same general temp limits apply to all transistors. Some beastly
multi-GHz multi-watt monster might have a staggeringly lower thermal
resistance, but the junction temp limit is still gonna be 150C or so.
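
To put rough numbers on that (the ~200 C/W figure is a typical TO-92
junction-to-ambient spec, but treat it as an assumption since it varies by
vendor and mounting):

    # Max dissipation before hitting T_j(max): solve T_j = T_a + P*theta for P
    def max_dissipation(t_j_max_c, t_ambient_c, theta_c_per_w):
        return (t_j_max_c - t_ambient_c) / theta_c_per_w

    print(max_dissipation(150, 25, 200))  # ~0.6 W at 25 C ambient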

I guess in summary: if, off the shelf, you could run junctions at 500C, then
heat wouldn't be an issue or limiter for another, I dunno, decade or two. But
the 150 or 175 or whatever limit is a hard physics limit with current tech,
so...

If you want to impress an EE with spaceship X-Files Area 51 BS, give him a
semiconductor that works while glowing orange hot or so. That would have
pretty interesting performance specs, I bet. It's also impossible, AFAIK, at
current human technology levels. Either that or a semiconductor with room
temperature superconductive bond wires. Which is more likely first? Who knows.

~~~
derefr
> assure you that you can't build a junction that works with today's
> technology at BOTH room temp and 1000C

How about a chip with cores that are specc'ed for low-temp, and cores that are
specc'ed for high-temp, where the high-temp cores are designed to be the heat-
sink for the low-temp cores? The low-temp cores would be something like a
"starter motor."

~~~
VLM
That is a very good idea. Superficially it would be expensive to ship two
identical cores at high and low power, but an interesting strange idea I came
up with is to ship all new cores at high temp, transition all older core
designs to low temp, and bond in multiple pieces of silicon. After all, when
it's cold/off it's not doing anything computationally intensive, so the cold
core being an older/cheaper process wouldn't matter very much.

So a CPU with a burned out hot core is still usable as a low performance
cold-only machine, but a CPU with a burned out cold core can only be booted if
you hold a match to the heatsink for a while... that will be an interesting
troubleshooting technique for youtube videos.

------
kijin
4GHz seems to be the soft limit with x86 processors.

Ten years ago, the single-core Pentium 4 stalled just shy of 4GHz. Today's
high-end Core i7's also have a stock frequency around 4GHz, though of course
they're at least ten times more powerful. AMD tried 5GHz at one point, but
that was just a gimmick and most of their current models stick to 4GHz or
less.

Despite radically different designs and process sizes, roughly the same
frequency remains the point beyond which energy consumption just runs out of
control. Performance-wise it doesn't matter, because we've gone multi-core and
found ways to get more work done per clock cycle, but I wonder why the top
frequency seems to be so consistent.

------
static_noise
I was wondering why the graphs stopped like a decade ago with a Pentium 4.
Then I saw the date of the article, which is 2008 AD.

------
polskibus
Please put 2008 in the title

------
gravypod
I think we also forget that frequency does not equal CPU power.

~~~
vbezhenar
Why not? All things being equal, a higher frequency lets you do more
operations per second.

~~~
biofox
The shortcoming of clock frequency as a performance measure is illustrated by
comparing millions of instructions per second (MIPS) [1].

e.g. ARM Cortex A7: 2,850 MIPS at 1.5 GHz

Qualcomm Krait (Cortex A15-like, 2-core): 9,900 MIPS at 1.5 GHz

Both processors have the same clock frequency, but one has over three times
the processing speed in MIPS.

[1]
[https://en.wikipedia.org/wiki/Instructions_per_second#Timeli...](https://en.wikipedia.org/wiki/Instructions_per_second#Timeline_of_instructions_per_second)
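
Since the clocks are identical, the whole gap is instructions retired per
clock. A quick check of the arithmetic on those two entries:

    # Aggregate instructions per clock = MIPS / clock in MHz
    def ipc(mips, clock_mhz):
        return mips / clock_mhz

    print(ipc(2850, 1500))  # Cortex A7:      ~1.9 instructions/clock
    print(ipc(9900, 1500))  # Krait (2-core): ~6.6 instructions/clock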

~~~
static_noise
Does this hold true for algorithms which are hard to parallelize because
everything depends on intermediate results which are hard to predict?

~~~
biofox
As others have said below, issues that have little to do with parallelisation
can heavily influence performance, e.g. caching and I/O.

Referring to the list on wikipedia again, compare two different 4-core CPUs:

Intel Core i5-2500K 4-core: 83,000 MIPS at 3.3 GHz

Intel Core i7-875K 4-core: 92,100 MIPS at 2.93 GHz

~~~
static_noise
I'll have to repeat my question in other words: how many MIPS, do you think,
remain if every third instruction is a mispredicted branch and every second
memory access is a cache miss?

Modern CPUs use pipelining, which executes many instructions in parallel. This
only works well if everything goes as predicted. If you have an algorithm
which works contrary to what the branch predictor thinks, and a cache which
does not hold the data you need, your performance goes down the drain. Those
MIPS mean nothing if not put into the right context.
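
Back-of-the-envelope, with penalty figures that are assumptions (ballpark
values for a ~3 GHz out-of-order core, not measurements), the numbers
collapse:

    # Effective CPI for the pathological mix described above.
    base_cpi       = 0.33   # ~3-wide issue when everything goes right (assumed)
    branch_rate    = 1 / 3  # every third instruction is a mispredicted branch
    mispredict_pen = 15     # pipeline refill cost in cycles (assumed)
    mem_rate       = 1 / 3  # fraction of instructions touching memory (assumed)
    miss_rate      = 0.5    # every second memory access misses the cache
    miss_pen       = 200    # DRAM round trip in cycles (assumed)

    cpi = (base_cpi
           + branch_rate * mispredict_pen
           + mem_rate * miss_rate * miss_pen)
    print(cpi)         # ~38.7 cycles per instruction
    print(3000 / cpi)  # ~78 MIPS at 3 GHz, vs ~9000 MIPS peak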

~~~
twoodfin
One of my favorite papers is relevant here:

[http://www.hpl.hp.com/techreports/Compaq-DEC/WRL-93-6.pdf](http://www.hpl.hp.com/techreports/Compaq-DEC/WRL-93-6.pdf)

It explores the limits of instruction-level parallelism. If you had a
processor that could dispatch an unlimited number of independent instructions
simultaneously, how much of an improvement in common algorithms would you get?
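
The paper's oracle experiment can be illustrated with a toy version: treat the
trace as a dependence graph, schedule every instruction as early as its inputs
allow, and the ILP limit is trace length divided by critical-path length. A
minimal sketch with a made-up six-instruction trace:

    # Oracle ILP of a trace: instructions / dependence-chain depth.
    def ilp_limit(trace):
        ready = {}  # register -> cycle at which its value is available
        depth = 0
        for dst, srcs in trace:
            t = 1 + max((ready.get(r, 0) for r in srcs), default=0)
            ready[dst] = t
            depth = max(depth, t)
        return len(trace) / depth

    # Each entry is (destination, (source registers)):
    trace = [("r1", ()), ("r2", ()), ("r3", ("r1", "r2")),
             ("r4", ("r3",)), ("r5", ()), ("r6", ("r4", "r5"))]
    print(ilp_limit(trace))  # 6 instructions / depth 4 = 1.5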

~~~
static_noise
They get an average parallelism of 4-10 on some standard algorithms, assuming
unlimited resources. Where are we now with modern Intel processors on those
algorithms? 2-3?

------
yuhong
I noticed that the 65nm Pentium D was "losing" ~30W of TDP per new stepping at
the same clock speed. I can imagine a 125W Pentium D 990 at 4.2GHz based on
the D0 stepping. I wonder if this was the original plan.

------
amelius
> The rising power consumption of CPUs made it less attractive to focus on
> cycles per second, so clock rates stalled.

I'd pay for more single-threaded performance on my desktop, where power is not
an issue. But I guess the average user (word processing, browsing) does not
care. Gamers also don't care, since they can use special hardware and games
are more easily parallelized (if you divide the screen into individual blocks,
you can just distribute the rendering work across individual cores).

~~~
pavlov
The IBM POWER8 goes up to 5GHz, for those few applications where single-
threaded performance really matters.

~~~
dom0
IBM zEC12 goes to eleven, err 5.5 GHz.

------
nfbush
(2008)

------
castratikron
It's just rotational dynamics from Physics I. The kinetic energy of something
spinning is (1/2)Iω^2: the energy is proportional to the square of the speed
at which it spins, or I guess you could say "O(n^2)". If you double the clock
frequency, you quadruple the heat, and at a certain point you can't go any
farther because you don't have a way of removing heat fast enough before it
melts.

They even show it in the article:

> underclocking a single core by 20 percent saves half the power

0.8^2 = 0.64, pretty close.
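
For comparison, the standard CMOS dynamic-power relation P = a * C * V^2 * f
gives different exponents depending on whether voltage scales along with
frequency:

    f = 0.8        # 20% underclock
    print(f)       # 0.800: P ~ f, voltage held fixed
    print(f ** 2)  # 0.640: the square law above
    print(f ** 3)  # 0.512: P ~ V^2 * f with V scaled down along with f

So the article's "half the power" lines up best with the cubic case, where the
underclock also allows a voltage reduction.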

~~~
static_noise
I see you are very knowledgeable and an expert in the field. Could you please
give us a short introduction into how spintronic transistors work?

