no, no, NOOOOOOOOOOOOO
Nvidia has been making faulty GPU chips for 9 years now (the underfill between the die and the BGA carrier package). Starting with G84, up to at least Fermi, EVERY SINGLE GPU will die from thermal stress if used extensively at high temps. This is very well documented in numerous lawsuits that forced Nvidia, DELL, APPLE and other manufacturers to do repeated expensive recalls.
But in other cases, this can be managed entirely by the BIOS. I know (or rather, think -- didn't boot Linux on a machine with memory problems recently) that some Dell machines will mark bad locations/ranges as reserved in the ACPI memory map if you run the built-in diagnostic tool from the boot menu.
This sounds like too many fans to me. Fans can even have negative side effects on the air flow. Usually you want one flow from the front to the back. In some cases just the PSU fan is enough. You generally only want 120mm fans, since as fan size goes up, airflow rises faster than noise.
1U servers are evil.
> At first, try to determine WHY the caps failed.
In sufficiently old gear, they failed because they were sitting idle too long, and can probably be reformed.
I did some Googling a few months ago on this subject to clear it up for myself and found a bunch of information that strongly suggests the above line of thinking is very very wrong: it's CRTs that store massive amounts of energy for ages - they do this to store energy for future warmups, so the necessary surge of power can partially come from the CRT itself. So this is a design feature.
Most current PSUs have bleed resistors to drain the internal capacitors, and if I understand correctly, the short/very quiet high-pitched squeal you hear when you disconnect a PSU is the high-frequency oscillator circuit rapidly winding down as the charge in the capacitors is drained (I recommend a quiet room/environment to test this). This happens within about a second or so on the low-wattage PSUs that I have here.
If putting a screwdriver across the capacitors in a PSU produces an arc 5 or 10 seconds after that PSU is off, I'd be very very surprised.
The "PSUs KILL" line has been touted by CRT techs, probably from a time when bleed resistors and other safety measures weren't as prevalent. Well, it worked... and now we believe everything's dangerous. I personally feel a lot more confident eg swapping a fan in a PSU - I hear the squeak (in the PSUs I have) and know that it's now safe to work on.
With the above said, if I were tinkering with an unknown PSU, or especially a cheap Chinese OEM supply in a set-top box or similar device, I'd probably poke everything with a big insulated metal stick before I worked on/near it.
Of course, the standard disclaimers for this type of info apply - definitely do your own research before trusting the above!
If any engineers/electricians/similar can chime in here and confirm/disagree that would be great.
A CRT stores energy because it acts as a large capacitor. Some better monitors (especially late '90s open cage arcade monitors) had bleed resistors on the anode of the CRT to make servicing safer. It's not actually a ton of energy; the voltage is very high, but that's about it. A charged, unpowered tube will sting, but it's seldom a killer. The flyback transformer will give a potentially lethal shock, though. The anode cap has the thick insulation and is surrounded by aquadag for a reason!
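To put rough numbers on "very high voltage, but not a ton of energy": the energy stored in a capacitor is E = 1/2 * C * V^2. Here is a minimal sketch of that arithmetic. Every component value in it is an assumed ballpark figure, not a measurement of any particular tube or supply (roughly 1 nF at 25 kV for a tube, 470 uF at 325 V for a PSU bulk capacitor).

```python
def stored_energy_joules(capacitance_farads: float, voltage_volts: float) -> float:
    """Energy stored in an ideal capacitor: E = 1/2 * C * V^2."""
    return 0.5 * capacitance_farads * voltage_volts ** 2


# Assumed ballpark values for illustration only, not measurements.
CRT_CAPACITANCE_F = 1e-9        # assumption: tube capacitance on the order of 1 nF
CRT_ANODE_VOLTAGE_V = 25_000    # assumption: typical-ish anode voltage
PSU_CAPACITANCE_F = 470e-6      # assumption: one primary-side bulk capacitor
PSU_BULK_VOLTAGE_V = 325        # assumption: roughly the peak of rectified 230 V mains

crt_energy = stored_energy_joules(CRT_CAPACITANCE_F, CRT_ANODE_VOLTAGE_V)
psu_energy = stored_energy_joules(PSU_CAPACITANCE_F, PSU_BULK_VOLTAGE_V)

print(f"CRT tube (assumed values):     {crt_energy:.2f} J")
print(f"PSU bulk cap (assumed values): {psu_energy:.2f} J")
```

With these assumed values the tube comes out well under a joule and the PSU capacitor at a couple of dozen joules. Stored energy is only one part of how bad a shock is, so read this as a comparison of stored energy and nothing more.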
Considering the level of energy stored in the average PSU cap, if I use a decent 5W or 10W resistor, should I hold it in a pair of pliers so I don't get burnt? (I'm guessing I'd definitely need to do that if I use a tiny 1W or 2W.) Also, as someone whose knowledge of electronics is very poor - is a resistor the best thing to use in this instance?
And it makes a lot of sense to put bleed resistors on open cage CRTs - those types of units are unlikely to frequently power down/back up, so completely discharging the unit isn't an issue. Design win!
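On the resistor question above, the numbers that matter can be estimated from the usual RC discharge formulas: peak power is V^2 / R at the instant of contact, and the voltage decays as V0 * exp(-t / RC). A rough sketch follows. The capacitor, starting voltage, resistor value and target voltage are all assumptions picked for illustration, not a recommendation for any particular PSU.

```python
import math


def peak_power_watts(voltage_volts: float, resistance_ohms: float) -> float:
    """Instantaneous power in the resistor at the moment of contact: P = V^2 / R."""
    return voltage_volts ** 2 / resistance_ohms


def time_to_voltage_seconds(capacitance_farads: float, resistance_ohms: float,
                            start_volts: float, target_volts: float) -> float:
    """Time for an ideal RC discharge to fall from start_volts to target_volts."""
    return resistance_ohms * capacitance_farads * math.log(start_volts / target_volts)


# Assumed values for illustration only.
CAPACITANCE_F = 470e-6    # assumption: one primary-side bulk capacitor
START_VOLTS = 325         # assumption: fully charged from rectified 230 V mains
RESISTANCE_OHMS = 2_200   # assumption: hypothetical discharge resistor
TARGET_VOLTS = 5          # assumption: arbitrary "mostly drained" threshold

peak = peak_power_watts(START_VOLTS, RESISTANCE_OHMS)
time_constant = RESISTANCE_OHMS * CAPACITANCE_F
drain_time = time_to_voltage_seconds(CAPACITANCE_F, RESISTANCE_OHMS,
                                     START_VOLTS, TARGET_VOLTS)
total_energy = 0.5 * CAPACITANCE_F * START_VOLTS ** 2  # all of it ends up as heat

print(f"peak power:    {peak:.1f} W")
print(f"time constant: {time_constant:.2f} s")
print(f"time to {TARGET_VOLTS} V:   {drain_time:.1f} s")
print(f"total heat:    {total_energy:.1f} J")
```

With these assumed values the peak is about 48 W, but it decays with a time constant of about a second, and the total heat is the same couple of dozen joules whichever resistor is used. A larger resistance lowers the peak power and stretches the discharge out. Whether a given 5W or 10W part tolerates that brief overload depends on its pulse rating, which this sketch does not model.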
I've heard stories about service techs getting bitten by the 70kV HV while testing tubes that were powered up for testing...