How to repair the parts that explode in Lenovo yoga laptops (adammunich.com)
131 points by adammunich 9 days ago | 61 comments

> It is very likely that the reason the transistor failed was because of poor cooling, as it is a low cost, high-resistance transistor that is running near its design limits.

This wouldn't be my first guess based on that hole, which was formed more abruptly than a little overheating. Cooling the outside of the package is only going to buy a little more margin, not fix the root problem.

Looking at that schematic snippet, what is up with the capacitance on that gate drive circuit? They want a slower turn on to avoid a popping sound? Maybe it's a reference circuit that worked fine with a larger package or something, and a UF6 doesn't have enough thermal mass? If the turn on/off time is the problem, removing one of C489 or C530 could stop it from happening again.

Or it could be something downstream drawing too much power on that rail or shorting it out. Since you're replacing the part, you might as well just use a FET with a lower resistance and some hope to address this possibility.

FWIW some thinner flux (like a Kester 951) will generally let you fix solder bridges without starting all over.

I've commented elsewhere on the thread, but, this also has the potential to be destroyed by software. That AUDIO_PWR_EN appears to be a signal that can be controlled by software. If you switch it fast enough such that the MOSFET is kept mostly in the linear region (due to the RC on the gate), you can kill the MOSFET very quickly.

Yeah this looks like an SOA violation to me based on the hole. Usually you would turn a FET like that on slowly to avoid high inrush current into the downstream circuit, but ~10ms seems a bit silly. That would also explain why this escaped their initial testing because the FET wouldn't get hot in normal operation.
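
To put rough numbers on why a slow turn-on stresses the FET, here is a minimal Python sketch. It assumes an idealized linear output ramp and a hypothetical load of 100 µF in parallel with 10 Ω on a 5 V rail; none of these values come from the schematic. While the output is still ramping, the FET drops the difference between input and output and carries the full load current, so the energy it absorbs grows with the ramp time.

```python
def turn_on_energy_j(v_in, c_load_f, r_load_ohm, ramp_s, steps=10_000):
    """Energy (J) dissipated in the pass FET while the output ramps 0 -> v_in.

    Idealization: the output voltage rises linearly over ramp_s, and the
    load is a capacitor in parallel with a resistor. All values hypothetical.
    """
    dt = ramp_s / steps
    dv_dt = v_in / ramp_s
    energy = 0.0
    for k in range(steps):
        v_out = v_in * (k + 0.5) / steps               # midpoint of this time slice
        i_fet = c_load_f * dv_dt + v_out / r_load_ohm  # capacitor + resistor current
        energy += (v_in - v_out) * i_fet * dt          # FET drops (v_in - v_out)
    return energy


# Compare a 1 ms ramp with a 10 ms ramp for the same assumed load.
for ramp_ms in (1, 10):
    e = turn_on_energy_j(5.0, 100e-6, 10.0, ramp_ms / 1000)
    print(f"{ramp_ms} ms ramp: {e * 1000:.2f} mJ in the FET")
```

Under these assumptions the result matches the closed form ½·C·V² + V²·T/(6·R): the capacitor term does not depend on ramp time, while the resistive term grows linearly with it. Whether that is enough to leave the real part's safe operating area depends on its datasheet and on the real load, which this sketch does not model.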

Although, if it's a 5V0 or 3V3 rail they may just not be turning on the FET enough. Its Rds(on) is high at ~3 V, and they divide down the gate drive to get reasonably symmetric on/off times and slow speed.

I basically came here to say the same, regarding the FET: Most likely there are parts with the same form factor but much lower resistance (both switching and fully on), which don't have that problem. The price-difference might matter in mass production, but for a repair at home, I don't see a reason to go with a cheap part.

Decreasing the turn on/off time might also be a good idea to prevent future repairs (nice spot!).

I'd be curious to see what shorting across the transistor would do. It would probably reveal whatever behavior they wanted to avoid.

Insufficient cooling has given me issues with certain Lenovo ThinkPads over the years. Most recently it was throttling on a X220, fixed by swapping to a redesigned fan/heatsink.

The first time for me was with their T43. This model was mostly a platform refresh of the T42. There's a chip called a southbridge that's located below the palmrest. Between the models this chip went from 400 MHz to 533 MHz. Lenovo decided it wasn't important to improve the cooling: no heatsink whatsoever was included on the southbridge. However, they kept ample temp sensors in the case, so the fans would pulse nonstop as they failed to cool this one chip.

I had two nicknames for that laptop. The 'nut roaster' and the 'vacuum cleaner'. I eventually fixed the problem by hammering out an old copper penny and sticking it on with thermal compound to bridge the gap between the heatsink and chip. It wasn't perfect but it dropped the temps like 10C. The rest could be controlled with an installed fan daemon.

I have a similar problem with a T420s from 2012 - great laptop, but there's a lot of coil whine and a horrible fan with a very small heatsink. It's a model without a dedicated GPU, but the i5 alone is enough to push the fan to constant medium-high speed and it's still 70-80°C running Windows 10 with Chrome. Repasting or cleaning the heatsink does not seem to help.

I have a T420s but with an Optimus configuration. The dedicated GPU is pretty much useless because of thermal throttling and yes, even the i5 alone is often enough to go into throttling territory. This is after repasting and thorough cleaning of the fan assembly. And if you want to access the fan assembly, then prepare to take apart the whole laptop.

Huh, interesting. I have two T420s laptops (i7s with dedicated GPUs), and have a friend with one too (i5 integrated). None of them has those issues, but we all run Ubuntu. One of mine did have fan issues once, but it was due to the fan itself going bad, and once replaced it was fine.

Thanks for the hint on replacing the heatsink on the x220. A quick search led me to a reddit post claiming the heatsink from the x230 can be fitted to a x220. Is this what you did, or did you do something else entirely?

> Most recently it was throttling on a X220, fixed by swapping to a redesigned fan/heatsink.

Personally I believe the major source of heat of the X220 comes from the Sandy Bridge processor. The next-gen Ivy Bridge in X230 has almost no performance gain, but reduces heat dissipation significantly (ironically, Ivy Bridge is also the turning point when Intel started to use crappy thermal paste in the desktop CPU packaging, leading to thermal issues...).

I own both machines and the difference is great. Also, two of my X220s from different sources both suffer from random shutdown issues. I suspect the cause is cold solder joints under the power MOSFET or a defective MOSFET (I saw a troubleshooting post about a similar issue in a forum), but it's BGA, so...

Anyway, that's why I recommend everyone who wants a X220 to get a X230. You can even replace the motherboard and the bottom case to "upgrade" the X220 to X230.

> the heatsink from the x230 can be fitted to a x220.

I believe it's virtually identical. I've never heard that the X230 heatsink performs better; perhaps that's the case, and I may give it a try someday...

> It is very likely that the reason the transistor failed was because of poor cooling, as it is a low cost, high-resistance transistor that is running near its design limits. To prevent the problem from re-occuring, a soft / deformable silicone heat transfer pad can be applied to sink excess heat to the aluminum shell of the computer. In the 1990s, this was a very popular way to keep the motor controllers in CD-ROM drives cool without any additional hardware.

So it seems like the problem is a cooling issue, which doesn't surprise me. "Ultrabooks" often seem to sacrifice decent cooling for the sake of thickness, and I think it is because most people really do not tax their hardware all that much. If you do - and you do not have a proper workstation-class laptop - you're likely to run into similar issues.

My Thinkpad X280 suffers from poor cooling that quickly leads to throttling, even undervolted. My old work A485 wasn't that great, either.

No, cooling is not the issue. This is not a power transistor; it's not meant to pump a lot of amps. What's more, once it's turned ON its internal resistance is insignificant (225mΩ).

What can be happening here:

- badly designed output Audio section shorting power in some rare circumstances, might be as weird as mechanically stressed audio jack touching traces underneath it.

- software glitch around audio power enable routine enabling/driving that transistor hundreds or thousands of times per second (PWM) in some circumstances, keeping it in the linear region

- badly designed, underpowered MOSFET driver, with either too low a voltage or not enough current, keeping it in the linear region

You do not cool power rail switches like this one; they aren't meant to dissipate any meaningful power when designed properly.

So the author is wrong in his conclusion? I am happy to know the correct answer.

The author's solution may still help somewhat, but yes, they are almost certainly wrong. This transistor really just isn't being driven properly, because it really shouldn't generate heat in this application.

It could be a whole slew of problems, and it's impossible to tell without measuring. But I would guess that the gate voltage is not far enough above the threshold to drop the Rds(on) to its low-loss on-state.

It's not cooling. A switching transistor in that application shouldn't make too much heat.

We don't know the voltage on the rail, so we can't say exactly what it is, but the likely candidates are an SOA violation because they're switching it absurdly slowly, or the gate drive is below a reasonable threshold.

> So it seems like the problem is a cooling issue

Not just a cooling issue. It's a MOSFET thermal runaway, as it is being driven to its limit, higher temperature => higher resistance => higher temperature. It can be stopped by cooling it, but the bigger picture is that it is either being driven improperly, or cheap, low power components are used to cut costs without an adequate safety margin.
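
The temperature => resistance => temperature loop can be checked with a small fixed-point iteration. This is a minimal Python sketch under loudly assumed values: 0.225 Ω taken as the 25 °C on-resistance, a 0.5 %/°C resistance tempco, 300 °C/W junction-to-ambient thermal resistance, and 50 °C ambient. All of these are hypothetical placeholders, not datasheet figures for the actual part.

```python
def settle_junction_temp(current_a, r25_ohm=0.225, tempco_per_c=0.005,
                         theta_ja_c_per_w=300.0, ambient_c=50.0,
                         limit_c=175.0, max_iter=1000):
    """Iterate T = ambient + theta_JA * I^2 * R(T) for a fully-on switch.

    Returns the settled junction temperature in deg C, or None if the
    temperature exceeds limit_c or fails to settle. All defaults are
    illustrative assumptions.
    """
    temp_c = ambient_c
    for _ in range(max_iter):
        # On-resistance rises roughly linearly with temperature.
        r_on = r25_ohm * (1.0 + tempco_per_c * (temp_c - 25.0))
        new_temp_c = ambient_c + theta_ja_c_per_w * current_a ** 2 * r_on
        if new_temp_c > limit_c:
            return None
        if abs(new_temp_c - temp_c) < 1e-6:
            return new_temp_c
        temp_c = new_temp_c
    return None


print(settle_junction_temp(0.5))  # modest load current
print(settle_junction_temp(3.0))  # heavy overload
```

With these placeholder numbers the loop gain (theta_JA · I² · R25 · tempco) is far below 1 at half an amp, so the temperature settles rather than running away; the feedback only diverges at currents of a couple of amps or more. That is only as good as the assumptions, but it shows how to test the runaway claim once real values are known.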

Laptop audio is ~1-2W of power from the 5V rail. That's at most 500mA going through this 225 milliohm Rds(on) MOSFET.
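
Spelled out as arithmetic, in a short Python sketch using only the figures quoted in this comment (1-2 W load, 5 V rail, 225 mΩ); the function name is just for illustration:

```python
def switch_conduction(load_power_w, rail_v, rds_on_ohm):
    """Return (current A, FET loss W, FET voltage drop V) for a fully-on switch."""
    current_a = load_power_w / rail_v
    return current_a, current_a ** 2 * rds_on_ohm, current_a * rds_on_ohm


for power_w in (1.0, 2.0):
    amps, loss_w, drop_v = switch_conduction(power_w, 5.0, 0.225)
    print(f"{power_w} W load: {amps * 1000:.0f} mA, "
          f"{loss_w * 1000:.1f} mW in the FET, {drop_v * 1000:.0f} mV drop")
```

Even at the 500 mA upper bound, I²R is 0.25 × 0.225 ≈ 56 mW, which is why a fully-on switch in this position should barely get warm.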

From the article, it seemed the issue was more just bad design. As the author demonstrated, adequate cooling could be achieved within the existing space constraints.

Something strikes me in the middle of the article. If you have a Metcal station (expensive pro stuff), clearly you must have some experience with SMDs... This transistor can be replaced with a preheater and a hot air gun in seconds, and it is not the smallest package out there.

It's risky to use hot air spot-heating a high density multi layer PCB, as these boards aren't really designed for differential heating and can delaminate easily.

You really should heat the whole board uniformly to be safe, but that is dangerous to do when you aren't sure of the melting temperature of the solders used.

You set the preheater to 120°C and the hot air, with the smallest nozzle, to 240°C. Works 100% like a charm.

Do you still need to apply flux before desoldering?

You need it while desoldering, while cleaning off the lead-free (no-Pb) solder (yes, you have to clean it off, or the joint will crack later), and while soldering in the replacement. Flux is cheap, better use it all the time.

"The bigger the glob, the better the job" is not the best practice when you're soldering components, but it's very true when you are reworking a SMD board...

You will need to apply it afterward to clean the pads.

Safe to say this is way out of technical capabilities of the average (power) user; but kudos for documenting the process nevertheless! Maybe someone in a local hackerspace would help out...

Sudo room in Oakland California!

It’s a typical thing in industrial design to conduct the heat to the metal enclosure. I did this for transistors in a stepper motor driver and for FPGA designs. The cost of a conductive sheet is close to zero compared to the possible problems. Lenovo saved this 0.2-cent drop of thermal paste. Maybe that’s worth it when producing thousands of consumer devices, though I would rather buy properly designed products.

Edit: thank you for the explanation. This was not a thermal problem then. I am just curious why all laptops didn’t have this problem?

That circuit appears to assume that AUDIO_PWR_EN signal will switch very infrequently.

If AUDIO_PWR_EN is a pin that is controlled by software, and, if you keep switching that pin such that the MOSFET is kept in the linear region due to the RC on the gate, that MOSFET will be toast very quickly.
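
A toy simulation makes the mechanism concrete. This is a minimal Python sketch that treats the gate as a first-order RC driven by the enable signal and counts how much of the time the gate overdrive sits between "just conducting" and "fully on". Every number is an assumption for illustration (10 ms time constant, 5 V drive, 1.0 V threshold, 2.5 V for fully on); the real component values and FET thresholds are not given in the thread.

```python
import math


def partial_on_fraction(toggle_period_s, tau_s=0.010, v_drive=5.0,
                        v_th=1.0, v_full=2.5, sim_time_s=20.0, dt_s=1e-4):
    """Fraction of time the gate-overdrive magnitude lies between v_th and v_full.

    The enable signal is a 50% duty square wave with the given period.
    All defaults are hypothetical, chosen only to illustrate the effect.
    """
    decay = math.exp(-dt_s / tau_s)  # exact RC update factor per time step
    half_period_steps = max(1, int(round(toggle_period_s / 2 / dt_s)))
    steps = int(sim_time_s / dt_s)
    v_gate = 0.0
    in_band = 0
    for i in range(steps):
        enable_high = (i // half_period_steps) % 2 == 0
        target = v_drive if enable_high else 0.0
        v_gate = target + (v_gate - target) * decay  # RC relaxes toward target
        if v_th < v_gate < v_full:
            in_band += 1  # FET is only partially enhanced in this step
    return in_band / steps


print("toggle every 5 s:  ", partial_on_fraction(10.0))
print("toggle every 10 ms:", partial_on_fraction(0.020))
```

With infrequent switching the gate only passes through the partial-conduction band briefly at each edge. When the toggle period is comparable to the RC time constant, the gate never settles and the FET sits in that band for a large share of the time, which is the failure scenario described above.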

This is intended to be a power saving feature; if enabled, audio detection will be driving the enable pin with a few seconds of hysteresis: http://www.thinkwiki.org/wiki/How_to_enable_audio_codec_powe...

Now the problem is, why would a MOSFET be destroyed even if there is a few seconds of hysteresis?! Well, a defective driver is still a possibility, but perhaps it's a hardware problem.

I would refer you to comment by mindslight down here.

It is just bad design by Lenovo. You should not put a switching transistor in linear mode. That rc-dampener is just bullshit.

The damage to this mid-power MOSFET surprises me. The audio circuit probably consumes just a few tens or hundreds of milliamps at most.

Yet it has a drastic crater-like dent in the IC enclosure. That tells me it was a voltage surge or current spike.

The hypothesis of entering a linear region caused by a random software glitch producing an unconscious PWM on a filter slope sounds plausible as well. But that crater-like dent says that the real cause might be way more intensive than that.

Lenovo's quality is crap. We have around 10 Yoga 720s in our workplace, and around 7 have had their touchscreen/touchpad fail within a year. I'm out of options on what to buy if I don't want Apple.

If you just want something that'll work for at least 3 years, Dell Latitude with an on site warranty. I've had about 6 failures in 20 years, each time a man has arrived the next working day to fix it wherever I am in the world. None of this "go to the nearest Apple Store" bullshit.

Stop buying consumer (Yoga) products for your work (ThinkPad) space. Consumer devices are meant to die when the warranty runs out; the business line is designed to minimize contracted service work.

ThinkPads, T/X2/X1. NOT Yoga. NOT IdeaPad. NOT whatever-the-hell-consumer-they-produce this year.

Very happy with my Dell Latitude. Has a few ports, keyboard is nice, build quality is great. Rock solid so far. One option at least.

Well Apple is not that much better for a much higher price. The keyboard issue is still not fixed.

Thinkpads are usually pretty solid, and I like the (relative) repairability compared to MacBooks. But I agree there are plenty of issues with stupid stuff on modern Lenovo machines and their qc needs to get better to justify their prices.

I have a 4 year old Yoga (a Flip or something I think?) that still works great. My wife has an (admittedly $5000) Thinkpad (yes not Yoga but still Lenovo) that works great (although the ordering was a mess - but we did get a 10% refund in the end without asking for it).

Regardless of the merits of the OP's problem, my point is that any large manufacturer is going to have duds (and/or dud models). People like to hate on Dell, but I've had some great machines from them over the years, for very good prices. There's more to brand quality than 'I had a bad experience once'.

Another recommendation: I've been happy so far with my newish LG Gram. There are some Linux-related issues on some models, but we've found workarounds for it and compatibility is pretty solid now.

I've been looking at the Gram, the 17 looks like a pretty nice machine for a decent price, and going on sale regularly now with the newer i7s out. Worried about the quality though, and that it'll have similar problems to what the Lenovos and other ultralights seem to have.

I use mine primarily for software development and systems administration, and I'm not super picky about things like graphics and audio, so as far as those things go, I can't say much that would be helpful. Nor have I been running it long enough to expose the kind of defects that started this whole thread.

Overall though, the laptop feels well built. I like the key action (but personal preference). The hinge seems to be well made. Nothing flexes when opening or closing it. The screen, to my eyes, is fantastic, and with the Celicious anti-glare film on it, it works well in daylight.

It's wicked light. I don't love ultralight laptops and originally was trying to avoid them, but this one has almost all the ports I want (except for a mmc instead of sd card slot) and appears to be more serviceable than most modern non-ultralight laptops. One of the things I looked for was a user-replaceable battery, and surprisingly, this one should be. We'll see how much it actually sucks to replace in a couple of years.

If you try to run any Linux on the Gram 17, you will need the info at https://bugzilla.kernel.org/show_bug.cgi?id=203617

I'm happy with my DELL XPS laptop.

>'If you are not careful you can rip the metal off the circuit board, and if that happens, there is very little you will be able to do to fix that.'

I've done that before, managed to fix it with silver based conductive paint.

One can also scrape the soldermask off of the nearest segment of the trace that got broken and jumper to it. It's easy to pull up traces on accident but it's uncommon for that trace adhesion failure to cause the nearest IC pin or via to fail as well.

Much harder to do with buried vias :-)

Overheating (and the subsequent throttling) seems to be a growing problem for Intel. Also, most of their mobile line is locked to the UHD 620 GPU.

As I understand it they are working off of an older fab than AMD and are hitting its EOL, all while commanding a price premium. AMD has room to expand their lead and appears to be beginning to, despite being a 10x smaller company.

Yikes! Placing alcohol and tools on the battery like that makes me nervous!

Hug of death?

Curious to know where in the world you are, the server is working well for me on the west coast.

Timing out for me in Denmark

I just rebooted it again, working now, clearly we have a bug.

Hug of death?

Hmm, I just rebooted the webserver and it seems fine now.

Interesting, I learned a new word. Apologies for this irrelevant comment.

Refer to google

Now imagine these were Apple laptops that had their power circuitry randomly explode.

The amount of angry tweetstorms, memes and class action lawsuits would break the internet.

Well, they do. Usually you go to the genius bar and get the whole board replaced for $800+
