

Defective Heat Sinks Causing Garbage Gaming - blackhole
http://randomascii.wordpress.com/2013/08/06/defective-heat-sinks-causing-garbage-gaming/

======
incision
_> "When I told the customer of my suspicions they said that they had recently
replaced their processor heat sink. When they put the old heat sink back on
their performance problems went away."_

A person who buys and installs their own heatsink but can't troubleshoot a
temperature-driven performance problem strikes me as an odd combination.

I expect providing support to members of the "enthusiast" PC market is an
interesting challenge.

~~~
saraid216
I'd actually be capable of buying and installing my own heat sink (I have
never bothered doing so; I don't really do hardware mods), but I do not know
how to troubleshoot a temperature-driven performance problem.

I mean, it's just thermal paste and a fan, right? That's not exactly hard
once you clear a fairly low bar of know-how.

~~~
jlgreco
I think you are overthinking it; this sort of troubleshooting doesn't
require much domain-specific knowledge (the author of the article certainly
came at it with more than was strictly necessary).

If you install new fancy fuel injection nozzles in your car, and suddenly your
car is shit, then swap them back out and see if the problem goes away. After
you determine that it does, you can start investigating if the installation
was botched or if they are broken, or whatever. You don't need to know
anything about cars to do that; it's just generic troubleshooting.

------
exDM69
Cleaning dust (and in my case, cat hair) out of the CPU heat sink and fan can
sometimes give a surprising performance boost, especially on cheap laptops,
which are often sold with inadequate cooling.

CPU thermal throttling can have a huge effect on performance.
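
A crude way to see this effect for yourself is to time the same fixed chunk of CPU-bound work over and over and watch for runs that get noticeably slower while nothing else changes. Below is a minimal Python sketch of that idea. The busy-loop workload, the run count, and the 1.2x slowdown threshold are all arbitrary assumptions, and a slow run only hints at throttling, since background load or ordinary power-saving clock changes can produce the same symptom.

```python
import time


def busy_work(iterations: int = 200_000) -> int:
    """Fixed, CPU-bound workload (an arbitrary choice for illustration)."""
    total = 0
    for i in range(iterations):
        total += i * i
    return total


def time_runs(runs: int = 10, iterations: int = 200_000) -> list[float]:
    """Return the wall-clock seconds taken by each run of the same workload."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        busy_work(iterations)
        durations.append(time.perf_counter() - start)
    return durations


def flag_slow_runs(durations: list[float], threshold: float = 1.2) -> list[int]:
    """Return indices of runs slower than `threshold` times the fastest run.

    The fastest run is treated as the unthrottled baseline; a sustained
    string of flagged runs under constant load hints at clock throttling.
    """
    baseline = min(durations)
    return [i for i, d in enumerate(durations) if d > baseline * threshold]


if __name__ == "__main__":
    durations = time_runs()
    for i, d in enumerate(durations):
        print(f"run {i}: {d * 1000:.1f} ms")
    print("suspiciously slow runs:", flag_slow_runs(durations))
```

Comparing against the fastest run rather than the average keeps one slow outlier from dragging the baseline down with it.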

~~~
stephengillie
Pulling the gaming boxen over to the garage & blasting them out with the air
compressor hose is part of spring cleaning here.

~~~
shabble
If doing this (or using canned air), it's a good idea to use a small stick or
bit of plastic to jam any fans so they can't spin too fast, which can ruin
their bearings or throw a blade; I've had both happen. Laptop fans seem
particularly susceptible, probably because of their size and typically higher
operating temperatures.

~~~
jws
Or acting as a generator and throwing an overvoltage into a circuit that
isn't protected from it. (Also, don't push your robot around with the motors
connected unless you know you've protected the motor driver circuitry.)

~~~
stephengillie
I had never considered this possibility, either for my laptop's fans or for
my Arduino robot. But that's a great pro tip I'll keep in mind. (It may be why
my Arduino stops working normally after a couple of hours and takes a couple
of days to straighten out.)

------
PaulKeeble
It's common now, in hardware enthusiast forums and discussions, for a
performance problem to prompt two initial questions: the first is about clock
speed and the second is about temperature. Clock throttling is very often the
underlying cause of performance problems on today's systems, and temperature
is the number one cause.

The second major cause, representing about 30% of cases, is software/firmware
bugs: in the graphics card drivers, or in the Windows scheduler doing odd
things with the CPU under a particular workload. As component performance
depends more and more on boosting and dynamic clock changes to save power, we
are seeing more inconsistent behaviour and more problems with the
implementations.

------
lgeek
> Thermal throttling is extremely difficult to detect in xperf traces. It is
> done automatically by the CPU or the motherboard, and the operating system
> (OS) doesn’t realize that it is happening. I use the xperf toolset for these
> investigations but its CPU frequency graphs show the CPU running at full
> throttle and its power management events say that all is well, and yet…

At least with a Sandy Bridge CPU on Linux, you get power management events
that show up in dmesg, and the reduced frequency is correctly reported. This
is definitely a case of the drivers/monitoring software not looking for the
right things rather than the hardware not exposing the information.
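
On Linux, the kernel's x86 thermal-throttling handler also keeps per-CPU event counters in sysfs, under /sys/devices/system/cpu/cpuN/thermal_throttle/ (core_throttle_count and package_throttle_count). A minimal sketch that reads them, assuming those files exist on your kernel and CPU; they are mainly present on Intel hardware and are often missing on virtual machines, in which case the function just returns an empty dict. Counters that keep climbing while a game or benchmark runs are a good hint that throttling is happening.

```python
from pathlib import Path

SYSFS_CPU = Path("/sys/devices/system/cpu")
COUNTERS = ("core_throttle_count", "package_throttle_count")


def read_throttle_counts(root: Path = SYSFS_CPU) -> dict[str, dict[str, int]]:
    """Map e.g. 'cpu0' -> {'core_throttle_count': 3, ...}.

    CPUs or counters whose files are missing are silently skipped, so on
    systems without these counters the result is simply empty.
    """
    result: dict[str, dict[str, int]] = {}
    # "cpu[0-9]*" matches cpu0, cpu12, ... but not cpufreq or cpuidle.
    for cpu_dir in sorted(root.glob("cpu[0-9]*")):
        throttle_dir = cpu_dir / "thermal_throttle"
        counts = {}
        for name in COUNTERS:
            counter_file = throttle_dir / name
            if counter_file.is_file():
                counts[name] = int(counter_file.read_text().strip())
        if counts:
            result[cpu_dir.name] = counts
    return result


if __name__ == "__main__":
    counts = read_throttle_counts()
    if not counts:
        print("no thermal_throttle counters found")
    for cpu, values in counts.items():
        print(cpu, values)
```

Sampling the counters before and after a gaming session and diffing them avoids having to interpret the absolute numbers, which accumulate since boot.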

~~~
X-Istence
> I hope that some day the ETW code in Windows that provides the CPU frequency
> will be fixed to detect thermal throttling – and a temperature provider
> would also be nice. Chapter 14 in the Intel Software Developer’s Manual,
> Volume 3A: System Programming Guide, Part 1 would be a good starting point…

He mentioned that in the article ...

------
outworlder
This should be made more visible by OS vendors.

Windows popping up one of those alert balloons would be very helpful.

------
nkurz
I recently discovered 'turbostat' for reading the actual CPU frequency on Linux:
[http://lxr.free-electrons.com/source/tools/power/x86/turbostat/turbostat.c](http://lxr.free-electrons.com/source/tools/power/x86/turbostat/turbostat.c)

Bruce's direct testing approach has a certain elegance, but instead of timing
a loop, 'turbostat' queries the CPU's performance registers. So in addition to
being handy if you are running Linux, it could serve as a guide for developers
of other systems trying to figure out which MSRs are needed and how to
interpret them.
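
The central registers here are IA32_MPERF and IA32_APERF: while a core is in C0, MPERF counts at a fixed reference rate and APERF counts at the clock the core is actually running, so the ratio of their deltas over an interval shows how fast the core really ran while busy. The sketch below only does the arithmetic on two counter snapshots, roughly the way turbostat derives its %c0 and GHz columns. It assumes, as on the Intel parts turbostat targets, that MPERF ticks at the same rate as the TSC; reading the MSRs themselves (which needs root and the msr driver) is left out, and the sample numbers are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One snapshot of a core's counters, as read from the MSRs."""
    tsc: int    # time-stamp counter: ticks at a constant rate, always
    mperf: int  # IA32_MPERF: fixed reference rate, only while in C0
    aperf: int  # IA32_APERF: actual core clock, only while in C0


def busy_stats(before: Sample, after: Sample,
               interval_s: float) -> tuple[float, float, float]:
    """Return (busy_fraction, tsc_ghz, busy_ghz) for one interval.

    busy_fraction ~ %c0 / 100, the share of the interval spent in C0
    (assumes MPERF ticks at the TSC rate).
    busy_ghz is the average clock while busy; under thermal throttling it
    drops below the nominal TSC rate even though the core looks "busy".
    """
    d_tsc = after.tsc - before.tsc
    d_mperf = after.mperf - before.mperf
    d_aperf = after.aperf - before.aperf
    tsc_ghz = d_tsc / interval_s / 1e9
    busy_fraction = d_mperf / d_tsc
    # A core that never left idle has no busy clock to report.
    busy_ghz = tsc_ghz * d_aperf / d_mperf if d_mperf else 0.0
    return busy_fraction, tsc_ghz, busy_ghz


if __name__ == "__main__":
    # Hypothetical one-second sample: 3.4 GHz TSC, core busy half the time,
    # but running at only half its reference clock while busy.
    t0 = Sample(tsc=0, mperf=0, aperf=0)
    t1 = Sample(tsc=3_400_000_000, mperf=1_700_000_000, aperf=850_000_000)
    frac, tsc_ghz, busy_ghz = busy_stats(t0, t1, 1.0)
    print(f"%c0={frac * 100:.1f}  TSC={tsc_ghz:.2f} GHz  busy={busy_ghz:.2f} GHz")
```

Because both counters only advance in C0, the ratio is unaffected by how much of the interval the core spent idle, which is what makes it useful for spotting a busy core that is quietly running slow.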

~~~
Florin_Andrei

        cat /proc/cpuinfo

~~~
nkurz
This would be simpler, but it doesn't work if you are running a non-standard
clock: /proc/cpuinfo gives you the nominal frequency, but turbostat gives you
the actual one. In the example below, the 3.79 GHz reported by turbostat is the
actual speed and the 2.527 GHz reported by /proc/cpuinfo is not. I think
thermal throttling is another case where turbostat is correct but the kernel
info is not.

    
    
      nate@centos:~$ cat /proc/cpuinfo | grep -i mhz
      cpu MHz		: 1596.000
      cpu MHz		: 1596.000
      cpu MHz		: 1596.000
      cpu MHz		: 1596.000
      cpu MHz		: 2527.000
      cpu MHz		: 1596.000
      cpu MHz		: 1596.000
      cpu MHz		: 1596.000
    
      cor CPU    %c0  GHz  TSC    %c1    %c3    %c6   %pc3   %pc6
                2.82 3.75 3.79   2.92   0.57  93.69   0.00   0.00
        0   0   0.29 2.76 3.79   0.26   1.24  98.21   0.00   0.00
        0   7   0.11 3.02 3.79   0.45
        1   3   0.14 2.45 3.79   0.14   0.56  99.16
        1   6   0.03 2.79 3.79   0.25
        2   1   0.06 3.07 3.79   0.12   0.26  99.56
        2   5   0.05 2.85 3.79   0.13
        3   2   0.05 2.81 3.79  21.88   0.23  77.84
        3   4  21.83 3.78 3.79   0.11
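
To line the two sources up programmatically, the "cpu MHz" values can be pulled out of /proc/cpuinfo with a small parser. A minimal sketch follows; it takes the file's text as input so it can be tested on a sample, and, given the discrepancy shown above, it should be read as extracting the kernel's reported figure, not as a measurement of the real clock. On architectures whose /proc/cpuinfo has no "cpu MHz" lines it simply returns an empty list.

```python
import re

# Real /proc/cpuinfo lines look like "cpu MHz\t\t: 1596.000".
MHZ_LINE = re.compile(r"^cpu MHz\s*:\s*([0-9.]+)", re.MULTILINE)


def cpuinfo_mhz(text: str) -> list[float]:
    """Return the reported 'cpu MHz' value of every logical CPU, in file order."""
    return [float(value) for value in MHZ_LINE.findall(text)]


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            mhz = cpuinfo_mhz(f.read())
    except FileNotFoundError:
        mhz = []  # not Linux, or /proc is not mounted
    print(f"{len(mhz)} CPUs report 'cpu MHz': {mhz}")
```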

