We had one particularly bad customer experience on some wireless lighting control systems (power electronics and RF in a tight package) where some nodes would randomly drop from the network and couldn't be controlled.
After two sets of embedded engineers trying to solve the problem with software updates, our RF guru got involved.
He traced the problem to the RF transceiver losing PLL lock at high temperatures without setting its PLL status bit. We had to change our uC firmware to reset the RF chip every X minutes, with the interval based on worst-case temperature-rise scenarios.
No one else would have figured this out for months. The bug had gone unfixed through three generations of this transceiver, and the guru had dealt with it years before.