Broken Hardware, Fixes and Hacks Over 8 Years (hookrace.net)
63 points by def- on June 21, 2016 | 23 comments

>"Putting the GPU into an oven and baking it for a while fixed the problem for a few days by resoldering cracked solder points, but then it returned. This is probably related to the switch to lead-free solder in 2006, which is more brittle."


Nvidia has been making faulty GPU chips for 9 years now (the underfill between die and BGA carrier package). Starting with G84, up to at least Fermi, EVERY SINGLE GPU will die from thermal stress if used extensively at high temps. This is very well documented in numerous lawsuits that forced Nvidia, Dell, Apple and other manufacturers to do repeated expensive recalls.

People still believe this myth even though "baking" won't fix the problem permanently. If it really was the joints (and baking them would actually do something), then the problem would have gone away. The chips themselves are faulty.

I had a similar problem with a DLP chipset controller (BGA) in an expensive home projector. Instead of attempting to reflow the solder joints, I made a clamp which held the chip in place. Four years later, I've not had a problem.

The memmap trick is pretty clever, though I wonder what the syntax is to make grub reserve multiple unrelated bytes instead of a 1-byte contiguous 'chunk.'
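A rough sketch of one possible answer, not a tested recipe: as far as I know the kernel's `memmap=nn$ss` parameter (reserve `nn` bytes starting at address `ss`) can be given more than once, so unrelated bad spots can each get their own entry. The helper name `memmap_args`, the 4 KiB page size, and the example addresses are all made up for illustration.

```python
def memmap_args(bad_addresses, page_size=4096):
    """Turn bad byte addresses into page-aligned memmap=SIZE$START arguments.

    Assumption: the kernel accepts several memmap= entries and hex values.
    Adjacent bad pages are merged into a single reserved range.
    """
    pages = sorted({addr // page_size for addr in bad_addresses})
    ranges = []  # list of [first_page, one_past_last_page]
    for page in pages:
        if ranges and page == ranges[-1][1]:
            ranges[-1][1] = page + 1  # extend the current range
        else:
            ranges.append([page, page + 1])  # start a new range
    return [
        f"memmap={(end - start) * page_size:#x}${start * page_size:#x}"
        for start, end in ranges
    ]


if __name__ == "__main__":
    # Hypothetical faulty addresses, e.g. as reported by a memory tester.
    for arg in memmap_args([0x12345678, 0x12346000, 0x40000000]):
        print(arg)
```

Note that the `$` usually needs escaping when the arguments go into a GRUB config file, and the exact escaping depends on how the config is generated.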

memtest86 can be configured to display address bit patterns that can then be consumed by the Linux kernel as a command line argument: https://lwn.net/Articles/440319/

But in other cases, this can be managed entirely by the BIOS. I know (or rather, think -- didn't boot Linux on a machine with memory problems recently) that some Dell machines will mark bad locations/ranges as reserved in the ACPI memory map if you run the built-in diagnostic tool from the boot menu.

I've heard rumors of low prices on high-capacity "bad RAM", but never been able to find anywhere to buy it. Have you investigated this possibility before?

Never heard that one before; I don't know much about how problem spots usually develop, so I would be wary about the "stability" of any known faults when using a module like that.

I was thinking of replacing the faulty DRAM chip, but sadly, with modern RAM using BGA, this is not a trivial job, though probably not impossible.

> 7 case fans

This sounds like too many fans to me. Fans can even have negative side effects on the air flow. Usually you want one flow from the front to the back. In some cases just the PSU fan is enough. You generally only want 120mm fans, as airflow rises faster than noise with size.

> You generally only want 120mm fans, as airflow rises faster than noise with size.

1U servers are evil.

The Logitech MX518 mouse is quite possibly the best mouse I've ever used. I've had it for 8-9 years now and it's still going strong.

Ditto. I've got an MX510 (blue) and an MX518 (grey). Both are great and, although over a decade old by now, are still in regular use (the MX518 is my daily critter on my main machine). Not had a day's trouble with either of them.

Yeah, I'm kinda scared it's going to break. I don't abuse them, but still...

The first post: No, Nay, Never replace with "some old caps". At first, try to determine WHY the caps failed. E.g., is there something exceedingly hot nearby, and can you fix it? Second, the best thing to do is to use caps with a higher voltage rating, because many manufacturers cut the margin exceedingly close, resulting in premature failure. Third, due to the heat in the power supply, use aluminum caps rather than tantalum caps, which have a tendency to start burning. Fourth, look at how the caps are arranged. When there are several in parallel, use normal ones. Otherwise, you must use low-ESR ones.

  > At first, try to determine WHY the caps failed.
In current consumer gear, 99% of the time the caps failed because they are garbage that comes nowhere near to meeting their supposed specification, in designs where the specifications are already cut to the absolute minimum.

In sufficiently old gear, they failed because they were sitting idle too long, and can probably be reformed.

I repair old pinball machines for fun and you pretty much need to replace every capacitor in them due to age - anything over 5-10 years old, even if working, is a time bomb waiting to go off.

> "When you open up hardware you have to be careful about electric shocks. I never received one, but especially power supply units (PSUs), which were the broken parts inside of my displays, can store a high amount of energy for a long time after disconnecting them from power."

I did some Googling a few months ago on this subject to clear it up for myself and found a bunch of information that strongly suggests the above line of thinking is very very wrong: it's CRTs that store massive amounts of energy for ages - they do this to store energy for future warmups, so the necessary surge of power can partially come from the CRT itself. So this is a design feature.

Most current PSUs have bleed resistors to drain the internal capacitors, and if I understand correctly, the short/very quiet high-pitched squeal you hear when you disconnect a PSU is the high-frequency oscillator circuit rapidly winding down as the charge in the capacitors is drained (I recommend a quiet room/environment to test this). This happens within about a second or so on the low-wattage PSUs that I have here.

If putting a screwdriver across the capacitors in a PSU produces an arc 5 or 10 seconds after that PSU is off, I'd be very very surprised.

The "PSUs KILL" line has been touted by CRT techs, probably from a time when bleed resistors and other safety measures weren't as prevalent. Well, it worked... and now we believe everything's dangerous. I personally feel a lot more confident about, e.g., swapping a fan in a PSU: I hear the squeak (in the PSUs I have) and know that it's now safe to work on.

With the above said, if I were tinkering with an unknown PSU, or especially a cheap Chinese OEM supply in a set-top box or similar device, I'd probably poke everything with a big insulated metal stick before I worked on/near it.

Of course, the standard disclaimers for this type of info apply - definitely do your own research before trusting the above!

If any engineers/electricians/similar can chime in here and confirm/disagree that would be great.

Many PSUs do have bleed resistors, but plenty (especially cheaper ones) will not. More importantly, a damaged PSU may have a non-working bleeder circuit, and you'll only know when you discharge a cap. Hopefully across a resistor, rather than a screwdriver, and very much hopefully not an open wound on your skin.

A CRT stores energy because it acts as a large capacitor. Some better monitors (especially late '90s open cage arcade monitors) had bleed resistors on the anode of the CRT to make servicing safer. It's not actually a ton of energy; the voltage is very high, but that's about it. A charged, unpowered tube will sting, but it's seldom a killer. The flyback transformer will give a potentially lethal shock, though. The anode cap has the thick insulation and is surrounded by aquadag for a reason!

I wasn't aware that "plenty" of supplies lack bleeder resistors - and ouch, I didn't consider the case where the unit might be damaged, thanks.

Considering the level of energy stored in the average PSU cap, if I use a decent 5W or 10W resistor, should I hold it in a pair of pliers so I don't get burnt? (I'm guessing I'd definitely need to do that if I use a tiny 1W or 2W.) Also, as someone whose knowledge of electronics is very poor - is a resistor the best thing to use in this instance?

And it makes a lot of sense to put bleed resistors on open cage CRTs - those types of units are unlikely to frequently power down/back up, so completely discharging the unit isn't an issue. Design win!
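On the resistor question above, a back-of-the-envelope sketch may help. It uses the standard capacitor formulas (stored energy E = ½CV², RC discharge V(t) = V0·e^(−t/RC)); the component values are assumptions picked for illustration (a 470 µF bulk cap charged to 325 V, a 10 kΩ discharge resistor), not measurements from any real PSU.

```python
import math


def stored_energy_joules(capacitance_f, voltage_v):
    """Energy held in a charged capacitor: E = 1/2 * C * V^2."""
    return 0.5 * capacitance_f * voltage_v ** 2


def peak_power_watts(voltage_v, resistance_ohm):
    """Power dissipated in the resistor at the first instant: P = V^2 / R."""
    return voltage_v ** 2 / resistance_ohm


def discharge_time_s(capacitance_f, resistance_ohm, v_start, v_end):
    """Time for an RC discharge to fall from v_start to v_end: t = RC * ln(v_start / v_end)."""
    return resistance_ohm * capacitance_f * math.log(v_start / v_end)


if __name__ == "__main__":
    # Assumed example values, not taken from the thread.
    C = 470e-6   # farads
    V0 = 325.0   # volts, roughly rectified 230 V mains
    R = 10e3     # ohms

    print(f"stored energy: {stored_energy_joules(C, V0):.1f} J")
    print(f"peak power in resistor: {peak_power_watts(V0, R):.1f} W")
    print(f"time to fall below 5 V: {discharge_time_s(C, R, V0, 5.0):.1f} s")
```

Under these assumed values the resistor sees its peak power only at the first instant and the dissipation falls off quickly as the voltage drops, so the resistor's value matters as much as its wattage rating: a smaller resistance discharges faster but runs hotter at the start.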

I've heard stories about service techs getting bitten by the 70kV HV while testing tubes that were powered up for testing...

Very curious what he chose for his new set-up.

If you're having fun, it's not wasted time. Otherwise, this would totally be wasted time.

great article thanks!

unsurprisingly, he uses Gentoo :D very cool in my eyes
