
An Inferno on the Head of a Pin - ingve
https://blog.codinghorror.com/an-inferno-on-the-head-of-a-pin/
======
soylentcola
The talk of older processors not needing heatsinks/fans reminded me of my
first (and thankfully last) lesson on this matter back around 2001-ish.

In the past I'd assembled several PCs and my less-than-rigorous setup process
typically involved slotting the CPU, RAM, and other cards, then plugging it
into a monitor and powering up to run through a quick POST just to make sure
nothing was misaligned or improperly seated. Then I'd power off, put down some
thermal paste, mount the heatsink/fan, and connect any remaining cables for
drives, etc.

Well, in the past, this was fine. Sure, I wouldn't want to run things without
at least the crappy little Intel or AMD CPU fan but 5 or 10 seconds was no big
deal. But this time I was building for a friend who had me order one of those
fancy new 1GHz Athlon chips. I was psyched as well because the small markup
I'd added for my time helped me afford my own fancy 1GHz CPU and new
motherboard for my own PC.

So I followed my usual procedure and slotted everything for the initial POST
(nothing worse than getting everything connected in one of those old PC cases
with IDE cables and wires everywhere just to have to unhook it all because
something wasn't seated properly) and powered on.

Blink.

Nothing.

Hmm...Did I not get the CPU seated properly? Let's pop it out and reseat it.
OWW! Burned the crap out of my finger and got a blister!

Uh oh. It sorta smells like the magic smoke even though I don't see any. Well,
eventually I realized what had happened and learned that even a few seconds
without a heatsink/fan had been enough to fry that brand new 1GHz Athlon.

Thankfully I had the one I'd ordered for myself and I was able to finish the
build that my friend had paid for. But it was a rather expensive (for me at
the time at least) lesson in the need for proper cooling on these modern CPUs.

~~~
nine_k
You're just not looking at Intel processors that are old enough :)

80286 needed no heat sinks, even 20-MHz models.

80386s used to do fine without heat sinks too, or with entirely passive
cooling, even the faster DX variants.

Frankly, I don't remember low-power i486s (e.g. the 40-MHz versions) having
heat sinks either. You definitely wanted a heat sink, though usually a passive
one, on a DX2 (66 MHz) or DX4 (100 MHz).

~~~
Thoreandan
The column "The Hard Edge" in Computer Shopper magazine (Bill O'Brien and
Alice Hill) teased Intel about the heat output of the new "Pentium" chip by
saying they'd get a tiny frying pan made to fry an egg on it. :-) (can't find
the original article, but it's referred to in this 1997 article
[https://www.highbeam.com/doc/1G1-19334858.html](https://www.highbeam.com/doc/1G1-19334858.html)
)

Before then, I don't remember my IBM 386/75 or 486 needing fans ... but the
486DX2/50 had problems with RF emissions from the corners of motherboard
traces (IIRC) so they left external clock speed lower for many moons ...

------
ReidZB
From the article:

> I remember cooling the early CPUs with simple heatsinks; no fan. Those days
> are long gone.

Interestingly, for desktop machines, this is not quite correct... there is
still a "fanless" movement going on. My roommate has a PC that doesn't have
any physical moving components: no fans, no hard drives.

Fanless designs are not space-efficient, though; see this fanless CPU
heatsink: [https://www.quietpc.com/nof-icepipe](https://www.quietpc.com/nof-icepipe)
which is rated for up to a 95W TDP CPU - enough to run an Intel i7-6700K,
which has a TDP rating of 91W.

If you're willing to build a very quiet machine instead of a silent machine,
liquid cooling with very quiet fans (that _eventually_ ramp up based on temps)
is very workable, and means your machine is effectively silent (i.e. at or
around ambient noise levels) 99% of the time.

~~~
Theodores
Because of office noise I needed a way to work 'at my desktop' but from
somewhere quieter in the office, e.g. an unused meeting room.

So I invested in a cheap and cheerful Chromebook, put Linux on it (Gallium OS
is the easiest operating system to install, ever), got the NFS mounts working,
got my desktop to proxy-serve my dev domains, put 'Synergy' on it (so I can
use the same keyboard/mouse on all machines from the Chromebook), and made
sure I could also work fully remote with it (local repo). I also got 'X
Windows' working nicely.

As a glorified terminal my Chromebook has full HD, massive battery life (all
day) and zero noise. In fact I wish it was 'warmer' in use, such is the low-
end Celeron's feeble and fanless heat output.

It seems to me that laptops actually do not last the distance if they are over
a certain size - 15". The thermal management is just not up to it and it is
only a matter of time before the fan is running permanently with the CPU
throttled. After a couple of dead laptops one thinks 'desktop/server', somehow
silent and low power so it can be left on... But this doesn't really exist
unless you invest in silent cooling.

With my bargain basement Chromebook I can do everything I want to do,
although for some things, like graphics work, I will go to the faster machine.
This faster machine no longer has lots of cables attached to it; the
keyboard/mouse is shared from the Chromebook, so it becomes a box with power
in, network and HDMI out, neatly tucked out of sight in an adjacent cupboard
rather than roaring away on/under my desk.

Another bonus of the Chromebook is that its lameness is a feature. People
have rubbish computers as well as posh ones; I need to test for all devices,
and what works on a low-end Chromebook will fly along on anything more
normal.

~~~
sethrin
GalliumOS is easy to install, on something that's not a Chromebook perhaps. In
order to install it, I had to:

> Put the chromebook into 'developer mode', wiping the hard drive

> Physically open the device, breaking the 'warranty void' sticker, in order
> to remove a write-protect screw

> Change firmware flags to allow booting off of unsigned partitions

> Replace the firmware entirely, risking bricking the device

Apart from that it was easy, and the installation program was faultless. I do
use my Chromebook as a dev machine, and I agree with you that the poor
performance is more of a help than a hindrance. Code editors don't stress out
any remotely modern computer, and if your code runs well on the Chromebook
then it should be fine anywhere else; very few machines have worse specs. I
think they're fine machines, and GalliumOS is everything one would wish, but I
really couldn't count ease of installation amongst their features.

~~~
Theodores
My hunch is that you opened up a Chromebook Pixel (2013). I thought about it
but decided against 'mutilating' the design classic that is the original
Pixel, stepped back from the edge and bought an Acer 14" full HD Chromebook
with 4GB RAM for £250.

One thing though - sound. It only works over HDMI, which again is a feature -
I can't procrastinate with videos. Installing 'WinZip' on a PC back in the day
when I used 'Windows' was harder and certainly more fraught with danger.

------
xja
The article mentions microcontrollers that use 100 milliwatts as the lower end
of embedded CPUs.

There are actually microcontrollers that use around 1 milliwatt for very low
power applications, for example the MSP430. TI have a neat video of one
running on power generated from grapes:

[https://youtu.be/nPZISRQAQpw](https://youtu.be/nPZISRQAQpw)

~~~
david-given
Last time this came up someone pointed me at the new generation of ultra-low-
power ARMs:

[http://www.atmel.com/products/microcontrollers/arm/sam-l.asp...](http://www.atmel.com/products/microcontrollers/arm/sam-l.aspx)

35µA/MHz, so with careful selection of peripherals you probably don't even
need a whole milliamp.
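
As a rough back-of-envelope check (a sketch assuming the quoted 35µA/MHz
figure holds across the clock range, and ignoring peripheral and sleep
currents), the core draw stays well under a milliamp at modest clocks:

    # Back-of-envelope: core current at the 35 uA/MHz figure quoted above.
    # Assumes linear scaling and ignores peripherals and sleep modes.
    UA_PER_MHZ = 35

    for mhz in (1, 4, 8, 16):
        ua = UA_PER_MHZ * mhz
        print(f"{mhz:>2} MHz -> {ua:>3} uA ({ua / 1000:.3f} mA)")
    # 16 MHz -> 560 uA: still comfortably under a whole milliamp for the core.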

~~~
sbierwagen
Designing microamp-level systems can be interesting. From
[http://www.ganssle.com/rants/leaks_and_drains.html](http://www.ganssle.com/rants/leaks_and_drains.html)
:

    
    
      I put one of the boards under a microscope and looked at 
      some of the ancillary parts. There's a 22 uF decoupling 
      capacitor. No real surprise there. It appears to be one of 
      those nice Kemet polymer tantalums designed for high-density 
      SMT applications.
      
      The datasheet pegs the cap's internal leakage at 22 uA. That 
      means the capacitor draws a thousand times more power than 
      the dozing CPU.
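
Working through the comparison in that quote (the 3 V supply rail below is an
assumed figure for illustration; the thousand-to-one ratio comes from the
quote itself):

    # 22 uA of capacitor leakage vs. a dozing CPU drawing ~1/1000 of that,
    # per the quote above. The 3 V rail is an assumed value for illustration.
    leak_amps = 22e-6                  # datasheet leakage of the decoupling cap
    cpu_sleep_amps = leak_amps / 1000  # implied by "a thousand times more"
    supply_volts = 3.0                 # assumed supply rail

    print(f"cap leakage: {leak_amps * supply_volts * 1e6:.0f} uW")       # ~66 uW
    print(f"dozing CPU:  {cpu_sleep_amps * supply_volts * 1e9:.0f} nW")  # ~66 nW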

------
tyingq
No issues with the article as a whole, but comparing Intel's processors
solely on core count and clock speed isn't right. The table that compares an
E5-1630 with an E5-1680, for example, omits the fact that the E5-1680 has
twice as much cache, despite only having 1.5x the number of cores.

------
sulam
The odd thing here is that no DC I know of has the power and cooling to
support a rack full of these things without surrounding them with empty space
to meet the watts/sqft budget. That picture of racks full of 1U servers is
basically a lie -- for more reasons than power, but power is the killer.

The follow up would be "an Inferno in your Rack".

~~~
teh_klev
I'd disagree; the fairly newish DC we're in is designed to supply and cool
20kW per rack. That's enough for 40 or so 1U servers burning ~450 watts each.
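
The arithmetic behind "40 or so", spelled out using those same figures:

    # 20 kW of rack power budget divided by ~450 W per loaded 1U server.
    rack_watts = 20_000
    server_watts = 450
    print(rack_watts / server_watts)   # ~44 servers, i.e. "40 or so" per rack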

~~~
dfox
One thing that burned us in the past was a DC that provided 20kW per rack in
power, but required you to dissipate only 5kW in heat. Needless to say, we
changed DC providers about a month after they made this clear (and they
combined that with an offer of some kind of "cloud provider" package, which
meant 15kW of both at triple the price). After a few more such DC-related
incidents (mostly unbelievable ones: circuit breakers catching fire, a 600V
peak on a 230V AC line, double flooring collapsing from overload and such) I'm
quite relieved that I don't deal with DC procurement anymore.

~~~
mikeash
What the...? Were they expecting you to radiate the remaining 15kW as
microwaves or something?

~~~
dfox
That's exactly the question I asked them, along the lines of "or do you want
us to radiate the remaining 15kW into this single-mode fiber going to your
switch? We can get a laser capable of that."

~~~
pierrebai
Does the wattage coming in convert entirely to heat? I always assumed it
didn't, but I have no idea in what proportion.

~~~
mikeash
Energy in has to equal energy out over the long term. For electronic
equipment that isn't doing mechanical work, there aren't a lot of options for
how that energy can come out: it'll either be electromagnetic radiation of
some kind (i.e. light or radio or similar) or heat. Most computer equipment
emits relatively little electromagnetic radiation, so it's essentially all
heat.

~~~
0xffff2
Isn't heat just lower frequency EM radiation?

~~~
mikeash
Heat can be transmitted through EM radiation (of all frequencies, but IR is
predominant for temperatures we're accustomed to) but also by direct contact
between materials. For typical temperatures, radiation doesn't transmit much
heat. Vacuum makes for a pretty good insulator. If you want to remove 20kW of
heat from a small space, you'll need to do most of that removal by
transferring the heat to a fluid of some kind, e.g. by putting it into the
air.

To see the difference in practical terms, pick an item that's noticeably warm,
but not hot enough to burn you. Hold your finger very close to it. The heat
you feel there is radiated. Then touch it, and feel the heat that's conducted
through touch. You'll feel _much_ more heat with the latter.

------
pokemon-trainer
>Is this extreme? Putting 140 TDP of CPU heat in a 1U server? Not really. Nick
at Stack Overflow told me they just put two 22 core, 145W TDP Xeon 2699v4 CPUs
and four 300W TDP GPUs in a single Dell C4130 1U server. I'd sure hate to be
in the room when those fans spin up. I'm also a little afraid to find out what
happens if you run MPrime plus full GPU load on that box.

What could Stack Overflow be doing that requires such a dense GPU/CPU
configuration? I didn't think a commenting site would require that level of
parallel processing.

~~~
heartbreak
He doesn't work at StackOverflow anymore.

~~~
teh_klev
He was referring to Jeff's conversation with Nick Craver, who does work at
StackOverflow.

------
gigatexal
As a hardware guy first and a (wannabe) software guy second, this post made
me really happy.

~~~
Infinitesimus
Hardware guy now doing software here, and I was all tingly inside reading
about hacks to keep a CPU happy under unrealistic load.

(Anyone else out there run Prime95 and FurMark for fun in the past?)

~~~
gigatexal
That's how I validate my overclocks. And then after that it's 24 hours of
memtest86+

------
msimpson
How do you measure TDP with a Kill-a-Watt? Energy consumption does not equate
to thermal design power. Nor is TDP even a measure of peak thermal output ...

~~~
MertsA
>Energy consumption does not equate to thermal design power.

Sure it does; the energy has to go somewhere. If it's not being stored in
some way or emitted as EM, then heat is pretty much all that is left. If you
measure the voltage and current for Vcore going into the CPU, you can easily
calculate the amount of heat it's generating, since the CPU can't really
store any appreciable amount of energy and there's basically nothing else
that would allow that energy to leave the CPU.
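
A minimal sketch of that calculation (the voltage and current below are
made-up illustrative values, not measurements from the article):

    # Power delivered to the core is voltage times current; with nowhere else
    # for it to go, essentially all of it leaves the package as heat.
    vcore_volts = 1.2     # hypothetical core voltage
    icore_amps = 100.0    # hypothetical current into the core under load

    heat_watts = vcore_volts * icore_amps    # P = V * I
    print(f"~{heat_watts:.0f} W dissipated as heat")   # ~120 W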

~~~
msimpson
Yet he is not measuring the Vcore going into the CPU. He is measuring the
overall wattage consumed by the PSU at the outlet using a Kill-a-Watt. This
includes a lot more draw than just the CPU itself, and it ignores a wealth of
other variables that can amount to tens of watts. So, to be more precise:

The overall energy consumption of a computer does not equate to the TDP of
the CPU alone.

~~~
nucleardog
> Unfortunately, here's what I actually measured with my trusty Kill-a-Watt
> for each server build as I performed my standard stability testing, with
> completely identical parts except for the CPU:

The two CPUs were in identical test rigs. There is an 80W difference changing
only the CPU. While you can expect some of this to be lost in the PSU, the
simple fact is that more power is ending up inside the computer, and it has
nowhere to go but out as heat. The most reasonable explanation, given
otherwise identical builds, is that this difference is due to the changed
component.
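
A quick sketch of that reasoning (the 90% PSU efficiency is an assumed,
typical figure, not something given in the article):

    # An 80 W increase at the wall, changing only the CPU, with an assumed
    # ~90% efficient PSU: most of the extra power is dissipated inside the case.
    wall_delta_watts = 80.0    # measured difference at the Kill-a-Watt
    psu_efficiency = 0.90      # assumed, typical figure

    extra_heat_inside = wall_delta_watts * psu_efficiency
    print(f"~{extra_heat_inside:.0f} W of additional heat in the case")  # ~72 W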

So if we assume that Intel's 4-core is actually 140W TDP, then there's no way
this 6-core can also be 140W.

Yes, this isn't an exact, scientific test, but it's certainly reliable enough
to say "this 6-core processor is emitting more heat than the 4-core, although
they are rated identically", which it seems to me was the only point he was
trying to make, given that the context was "two more cores, slightly lower
clock speed, that might be an okay tradeoff - OH WAIT, MORE HEAT".

~~~
msimpson
> So if we assume that Intel's 4-core is actually 140W TDP, then there's no
> way this 6-core can also be 140W.

Why? How can you say that under typical usage the thermal dissipation for both
chips isn't the same? Atwood's numbers measure overall power consumption while
idle and under heavy load with mprime, neither of which is what TDP seeks to
measure.

TDP is like fuel economy for cars. You don't claim it's a lie when you go one
hundred and fifty miles an hour for 20 miles and burn two gallons of gas. You
simply realize those highway numbers are meant for more of a sixty mile per
hour journey over the same distance.

~~~
nucleardog
Okay:

Yes, TDP is an inexactly defined and meaningless term and while to the layman
it would generally be understood to have some relation to the heat emitted by
the CPU during normal operation, it's possible that both chips in fact only
generate 1W of heat and were given a 140W TDP because Intel had a pre-existing
cooling solution and a warehouse full of spare parts. Yes, you are correct on
the semantics. Power consumption has no relation to TDP because heat generated
has no defined relation to TDP.

However, given any meaningfully bounded definition of TDP, the case is still
made that the TDP of these chips should be dissimilar. The later numbers show
that under full load the power draw at the wall increases by 20W per core
used. Unless the "typical load" used for determining TDP does not actually
make use of the full number of cores (which I would think could be fine for a
desktop processor, but certainly not for a server), it's clear that the heat
generated by these two processors should be dissimilar under any load.
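
Spelling that out with the ~20W-per-core figure (a sketch using the numbers
above; actual per-core draw will vary with workload):

    # If wall draw rises ~20 W per active core, a "typical workload" that
    # loads all cores cannot dissipate the same amount on both chips.
    watts_per_core = 20.0   # per-core increase in wall draw under full load
    for cores in (4, 6):
        print(f"{cores} cores -> ~{cores * watts_per_core:.0f} W of draw")
    # 4 cores -> ~80 W, 6 cores -> ~120 W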

If the 4 core actually needs to dissipate 140W under a typical use case, then
the 6 core should absolutely need to dissipate more unless the "typical use
case" is uselessly applied.

If we want to talk cars... Let's say I sell a base model with a top speed of
80mph, and a sport model with a top speed of 120mph. But I tell you you only
need to put 80mph rated tires on the sport model because that's as fast as a
typical person drives. Would you really claim that the sport model's tires are
correctly rated? Would you not be surprised when you drove 100mph in the sport
model and the tires exploded? Why on Earth would anyone even buy the sport
model if it's crippled to nearly the same performance as the base model?

~~~
msimpson
Let's get to the crux here:

> Unless your "typical load" used for determining TDP does not actually make
> use of the full number of cores

> If the 4 core actually needs to dissipate 140W under a typical use case,
> then the 6 core should absolutely need to dissipate more unless the "typical
> use case" is uselessly applied.

Taken from Intel's own specs for the E5-1650:

"Thermal Design Power (TDP) represents the average power, in watts, the
processor dissipates when operating at Base Frequency with all cores active
under an Intel-defined, high-complexity workload. Refer to Datasheet for
thermal solution requirements."

Immediately notice "Base Frequency", not Max Turbo, "all cores active", and
"Intel-defined, high-complexity workload." Until you can perform the same test
on both chips, you cannot assume that "these two processors should be
dissimilar under any load."

In regard to your car analogy, if the sport version had a management interface
to limit speed due to the rating on the tires, then it would be like an Intel
CPU. Read this:

[http://www.intel.com/content/dam/www/public/us/en/documents/...](http://www.intel.com/content/dam/www/public/us/en/documents/guides/xeon-e5-v3-thermal-guide.pdf)

------
odonnellryan
This is cool. I knew a guy who literally painted gallium onto his processors
as thermal paste. He said it worked really well.

~~~
wlesieutre
Gallium-based thermal compounds are commercially available, often called
"liquid metal". See "Coollaboratory Liquid Ultra" for one example.

They're very effective, but must be used with extreme care because aluminum is
highly soluble in liquid gallium. Aluminum is the most common material for
heatsinks, and it will literally dissolve if the gallium touches it. Gallium
is also electrically conductive, so if you accidentally dripped any into your
processor socket I assume you're going to have a problem.

If you're careful with it and make sure to get a cooler with a copper contact
area, it'll cool more effectively than traditional thermal pastes. I thought
about trying it, but it seemed like more hassle than it was worth.

~~~
throwawayish
> Aluminum is the most common material for heatsinks,

Aluminium is the most common material for heatsink _fins_. The cold plate is
almost always (nickel-plated) copper.

------
jpfed
I can't see a table like the one in the article without poking at it some
more in Excel. Anyway, as you go down the table, the increasing core counts
do indeed come with a higher cores*GHz. If you look at dollars per core-GHz
(yes, I recognize this is silly), you get a generally increasing trend as you
go down the table, but the E5-1680 is $63.35 per core-GHz and the E5-2680 is
only $60.59 per core-GHz.
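
For anyone who wants to redo the exercise, a tiny sketch of the metric (the
entries below are placeholder values, not the article's actual table; plug in
the real cores, clocks, and prices to reproduce the figures above):

    # Dollars per core-GHz: price divided by (cores * base clock in GHz).
    # Placeholder values for illustration only.
    cpus = {
        "example-4-core": {"cores": 4, "ghz": 3.7, "price_usd": 600.0},
        "example-8-core": {"cores": 8, "ghz": 3.4, "price_usd": 1700.0},
    }

    for name, spec in cpus.items():
        core_ghz = spec["cores"] * spec["ghz"]
        print(f"{name}: ${spec['price_usd'] / core_ghz:.2f} per core-GHz")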

~~~
codinghorror
Yeah that is a logical way to look at it, and better than $ per core.

------
lightedman
I love how Intel got caught on the TDP lie. I know for a fact that many Intel
processors run just as hot as AMD's, despite Intel blatantly lying about it.

------
antgiant
As a mere desktop guy, where could I find a quality resource on safe ambient
temperatures for desktop computers? I tend to assume that the magnetic HDD
would be the limiter.

------
protomyth
For me, the Pentium 60 was the turning point for heat. The 486 was pretty
easy, but those Pentiums sure put off a lot of heat. I seem to remember that
the Pentium 90 was cooler.

------
LeonM
Why try to 'accept' the thermal challenge when you can just buy an
HP/Dell/whatever 1U box that already has the engineering effort done for you?
Or even better, why buy a box at all? Just use the cloud.

In the old days, I used to tinker with my machines, improve airflow, better
CPU cooler, liquid cooling etc. Now I just want stuff to work, so I buy a
laptop.

~~~
Tepix
It says so in the article: For therapeutic reasons.

~~~
grandalf
I think when one works on software long enough, a sort of pressure starts to
build up making the person long to work on hardware in any way possible. I've
resorted to woodworking, DIY car repairs, and VHDL... all very therapeutic :)

~~~
vonmoltke
This longing is even worse when one started out one's career working in
hardware.

