The Race to Zero Defects in Auto ICs (semiengineering.com)
62 points by PaulHoule on June 8, 2022 | 38 comments



"Automakers are demanding 10 defective parts per billion limits across the whole spectrum of automotive ICs and single transistor/diode devices." Now we're getting somewhere.

What really drives quality up is a big, hardass customer. Back in 2018 I commented on the USAF Reliability Program.[1] The USAF investigated failures in their electronics down to the level of opening up transistor cans. They were naming and shaming, with press releases to Aviation Week about bad components and who made them. Parts got a lot better. It's rare to see a transistor opened up today to see what went wrong.

[1] https://news.ycombinator.com/item?id=18007339


> What really drives quality up is a big, hardass customer.

What drives up quality is the cost of defects to the manufacturer. Cost battlegrounds change over time. Currently, some of the big ones are in electronic parts elimination and reliability.

> It's rare to see a transistor opened up today to see what went wrong.

Literally everyone cuts open their electronics nowadays. You don't see it because there's no longer any need to complain about it.


Define "everyone"


Every big electronics shop will have a failure-analysis team with an electron microscope, able to decap any faulty chip and figure out which MOSFET has blown up.

It's time-consuming, expensive work though, so smaller places just replace the faulty parts, or revise the design and hope.


The categories mentioned in the quoted comments: semiconductor, defense, and car companies. Others too, like industrial and certain consumer appliances and electronics. Search job listings for electronics failure analysis to get an idea. Their job is to find the failure mechanism so the design team knows what to change.


> What really drives quality up is a big, hardass customer.

You forgot the part where the customer is ready to pay the money for quality.


In my experience, the majority of customers are willing to pay for quality, but unable to distinguish high and low quality, so they buy cheap.


If they are ready to pay, then they can buy a sample of both and spend some time testing the products.

If it is the bean counters who choose, then they are buying cheap for the sake of cheap.


Perhaps we do need to define what it means to "pay." For low volume products, what you suggest could multiply the per-unit cost by a large integer factor, when what most buyers want is to pay 10-20% more for a supplier that will spend maybe 1-5% more buying quality parts and doing proper in-house QA.

There are plenty of cases where adding an extra "9" to the end will significantly increase the costs, but there is also a lot of low-hanging fruit that stays low-hanging because so much purchasing is low-information.

Hence the original comment you replied to requiring a "big, hardass customer." Small customers often can't amortize the NRE across enough products to determine quality, so even if they are willing to pay 2x on a part for it to be better, they don't know whether they are getting something of higher quality or just something with a higher markup.


Years ago I remember a Freakonomics podcast episode that delved into the hostility public and private policy makers had toward running experiments. It mainly comes down to not wanting to bear the downside risk of an experiment producing unfavorable or economically useless results, whereas if they just don't know, and no one else does either, there is no consequence for sticking with the status quo even if the results are poor.


In this specific case, my working hypothesis is that the overwhelming majority (if not all) of my choices are of equally mediocre quality, so spending a lot of money to test this would be a waste.

The "big hardass customer" buys enough product to have a reasonable expectation of effecting change in the suppliers, so running both the experiments suggested by GP, and performing post-mortem analysis of failed components can be expected to steadily improve quality over time.


This is something Apple have absolutely perfected in their marketing. They're able to lower the activation energy to the point where people feel they need the new Mac, whereas the same people probably wouldn't be so relaxed about spending extra money on (say) expensive shoes versus the shoe equivalent of a nice but artless Dell laptop.


Sounds like Sam Vimes' Boots Theory[0].

[0] https://www.goodreads.com/quotes/72745-the-reason-that-the-r...


You forgot the part where the customer has already paid for working parts.


Most people are happy paying moderate prices for 99.9% reliability. Very few people are willing to pay extreme prices for 99.99999% reliability.


Actually, 1000 dppm (defective parts per million, i.e. 99.9% good) is not acceptable except for hobbyists. Typical commercial manufacturing requires <100 dppm due to the number of components and the cost of returns. This can be met quite reasonably with statistical sampling and process controls. Every single IC gets tested; it's just a question of how thoroughly and whether the control limits are set well. You need to do R&R studies, but the infrastructure/procedures are usually available to approach six sigma (even if it's a bit hokey).

Where this is going a bit nuts is that it's hard to make 100 gold contacts last for 1000 compression cycles at 1 dppm (better than six sigma!). That requires a lot of extra maintenance on the test infrastructure, which will be expensive... and your assembly connectors/SMT likely have a defect rate orders of magnitude higher than 10 dppb, so it won't improve final-assembly performance.

Also, I think you need another 9 on that 99.99999% to hit 10 ppb, realistically two if you want an acceptable yield rate, and three on the test equipment, since it's used 10k times every day.
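To put rough numbers on why the per-part targets get pushed from dppm toward dppb, here's a back-of-the-envelope sketch; the part count and defect rates are illustrative assumptions, and part failures are treated as independent.

    # Back-of-the-envelope sketch: how per-part defect rates compound at the
    # vehicle level. The part count and dppm values are illustrative
    # assumptions, and part failures are assumed independent.

    def vehicle_defect_rate(parts_per_vehicle: int, dppm: float) -> float:
        """Probability that at least one part in a vehicle is defective,
        given a per-part defect rate in defective parts per million."""
        p_part_good = 1.0 - dppm / 1e6
        return 1.0 - p_part_good ** parts_per_vehicle

    parts = 3000  # assumed IC/discrete count for a modern car
    for dppm in (1000, 100, 1, 0.01):  # 0.01 dppm == 10 dppb
        rate = vehicle_defect_rate(parts, dppm)
        print(f"{dppm:>8} dppm per part -> {rate:.3%} of vehicles with at least one bad part")

At 1000 dppm per part and 3000 parts, roughly 95% of vehicles would contain a defective part; at 10 dppb it drops to a few per hundred thousand.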


> > Most people are happy paying moderate prices for 99.9% reliability. Very few people are willing to pay extreme prices for 99.99999% reliability.

> Actually, 1000 dppm (defective parts per million, i.e. 99.9% good) is not acceptable except for hobbyists.

Exactly: There are many more hobbyists than manufacturers, so most people are content to pay lower prices for lower reliability.


When the cost of the electronics is a tiny fraction of the overall product, the equation is different. Automotive is a perfect example, where spending extra on a few key components is a worthwhile investment if it can increase the reliability of the entire vehicle.


If the unit that has to be replaced is large enough, as it is for cars, it pays to get the component reliability up. Failure of an IC can mean replacement of the dashboard or power train.


I'm confused. Is the issue that the parts are out of spec or that the auto makers want to up the spec?


Not-working parts are out of spec (kind of by definition).


Not true. Every supplier worth its salt will provide you with quality details as part of the product spec (along with cost and schedules/volume). 100% is rarely the expectation.


It appears you're conflating the parametric/functional spec of a product with the statistical spec of a process in manufacturing said product at volume.


Only if he actually paid for working parts. He who buys cheap deserves cheap.


If I buy a part, I'm paying for a working part.

If I buy a million parts, I'm probably paying for an acceptable (to me) failure rate, which is probably negotiated as part of the contract. He who buys cheap, buys a higher failure rate.


You forget the part where "quality" is a statistical factor. You can only assess the quality after you've bought the product, and the producer wants to make a profit first. Most steps to get a quality product are expensive (testing). That's why they want zero defects: every failed part on the production line is money lost.


What these demands drive up are the End Of Line tests performed by the semi companies, and the more time you spend on verification, the higher the cost of manufacturing. That's why you can find basically the same ICs on DigiKey or Mouser, with the only difference being that one is automotive-certified and orders of magnitude more expensive.


The article focuses on defects in ICs, but automotive testing of things like assembled PCBs is interesting to me. I work in aerospace, and at least everything I've worked on is designed to environmental requirements and then tested to those requirements plus a margin. This doesn't work if you want to make a million cars that stay on the road safely for 10-20 years. You can't exactly test a statistically significant number of boards for long enough to know much about failures after 10 years, or how many parts will fail during the warranty period. So they do highly accelerated life testing during development, with vibration, humidity, and hot/cold temperature cycling applied at the same time. Failure modes are discovered and fixed as much as possible in the available time until the product is released (and presumably the release is delayed if a major issue is seen).
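For a sense of the arithmetic behind the temperature part of accelerated life testing, here's a minimal sketch of the standard Arrhenius acceleration-factor calculation; the activation energy, temperatures, and stress hours below are illustrative assumptions, not numbers from the article or the comment above.

    import math

    # Minimal sketch: Arrhenius acceleration factor often used to justify
    # accelerated temperature testing. The activation energy, temperatures,
    # and stress hours are illustrative assumptions.

    K_B = 8.617e-5  # Boltzmann constant in eV/K

    def arrhenius_af(ea_ev: float, t_use_c: float, t_stress_c: float) -> float:
        """Acceleration factor between field-use and stress temperatures."""
        t_use = t_use_c + 273.15
        t_stress = t_stress_c + 273.15
        return math.exp((ea_ev / K_B) * (1.0 / t_use - 1.0 / t_stress))

    af = arrhenius_af(ea_ev=0.7, t_use_c=55.0, t_stress_c=125.0)
    field_years = 1000 * af / (24 * 365)  # 1000 stress hours scaled to field time
    print(f"Acceleration factor ~{af:.0f}x; 1000 stress hours ~ {field_years:.1f} field years")

With these assumed numbers, a 125°C chamber run of about 1000 hours stands in for roughly a decade at a 55°C use temperature, which is why the temperature cycling can be compressed into a development schedule at all.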


My understanding is that accelerated reliability / lifetime testing is often not terribly representative of actual performance (sometimes failures are underestimated, sometimes overestimated; in either case, the mechanisms for failure can simply be different from those that are emphasized during the testing).


It is representative if you do it correctly. But testing is expensive. And the development time is short. Etc.


ECU manufacturers like Continental do accelerated climate chamber testing. The PCB has to work from -40°C to +80°C, Alaska to Dubai so to speak.

That and vibration resistance are the main reasons why common off-the-shelf components can't "just be used in the car".


Usually, car parts have an expected lifetime of 15 years. Testing is done as you mentioned above, and after the product is in the field, some things will still be fixed. Also, experience from previous products is taken into consideration.


> and 6% were retested and did not fail the second time

Really hate this, but not surprised to see it. At work we've strived to stop our vendors doing this, but they simply don't get it. If they find a marginal part during their n seconds of low-stress testing, we are going to see that same failure within a few minutes of turning the full product on. We don't want those parts! We'd happily pay to not have those parts; the rework cost of a fully assembled board is huge.


It's great that they're focusing on eliminating physical defects and it makes sense that an EE industry publication would focus on that topic, but I think improving the firmware engineering process (better testing, stricter software architecture standards, better fuzzing and simulation) is probably a higher priority in most automotive applications right now.


Yes, and formal verification; it's been around since the '90s. And now we even have high-level tools like TLA+ available to reason about distributed system requirements.

And while we're at it, the full force of standard software engineering principles for automotive firmware. Remember what they found when they examined the Toyota code. https://www.edn.com/toyotas-killer-firmware-bad-design-and-i...


I'd like to see more research into electrical self-testing.

For example, wire bonds on an IC can be tested by injecting 100 GHz+ signals through the bond and looking at reflections.

Every IC could be designed to do that on every pin (IO and power), which would test everything external to the IC, and some internal IC components like pin drive MOSFETs.

The resulting data can be checked for similarity to 'master' data of known-good units, or even simulations.

This could happen once in the factory, or even potentially on every power up when the device is in the field.
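As a toy sketch of the comparison step described here: normalize a measured per-pin reflection trace against "golden" data from known-good units and flag traces whose deviation exceeds a budget. The traces, threshold, and function names are hypothetical, not from any real self-test standard.

    import numpy as np

    # Toy sketch of the comparison step described above: normalize a measured
    # per-pin reflection trace and compare it against a "golden" trace from
    # known-good units. The traces, shapes, and threshold are hypothetical.

    def pin_self_test(measured: np.ndarray, golden: np.ndarray,
                      max_rms_error: float = 0.05) -> bool:
        """Return True if the measured reflection trace matches the golden
        trace within an RMS-error budget (both normalized to peak amplitude)."""
        m = measured / np.max(np.abs(measured))
        g = golden / np.max(np.abs(golden))
        return float(np.sqrt(np.mean((m - g) ** 2))) <= max_rms_error

    # A cracked wire bond might show up as an extra reflection "bump".
    t = np.linspace(0.0, 1.0, 512)
    golden = np.exp(-40 * t) * np.cos(2 * np.pi * 12 * t)    # known-good signature
    faulty = golden + 0.2 * np.exp(-40 * (t - 0.3) ** 2)     # extra reflection

    print(pin_self_test(golden + 0.01 * np.random.randn(512), golden))  # expected: True
    print(pin_self_test(faulty, golden))                                # expected: False

A real implementation would need per-pin golden data and temperature/process-aware thresholds, but the pass/fail decision could be this simple.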


Are you alluding to e.g. IEEE 1149.6[1] and IEEE 1687[2], or something different?

[1] https://standards.ieee.org/ieee/1149.6/4706/

[2] https://standards.ieee.org/ieee/1687/3931/


Copy-paste of mil/aero with a dash of space. I can't wait for my car to be 20 years out of date and cost 500,000 USD.



