To help us troubleshoot this, my boss asked me to program the unit to give a missed call to the server every hour. If we got a missed call, we knew that unit was still working. In countries like India, giving a missed call is a zero cost way to communicate. For example: You would pull up in front of a friend's place and give them a "missed call" to let them know that you are waiting outside etc.
Anyway, I implemented the logic and we sent off our field techs to intercept trucks at highways and update the firmware.
The way I implemented the logic was the unit was to call our server's modem number every hour at the top of the hour. No random delay nothing. So, soon after that, around 50 units tried to call our server at the same time. Remember the clocks in the units are being run off GPS and they are super accurate. This caused our telecom company's cell tower BTS to crash. Cell service in my office area, a busy part of Bangalore, was down for a whole 2 hours.
I was called into the telecom company's head office for their postmortem. They didn't yell at me or anything. They were super nice. In fact, when I finished explaining my side of the story, one of their engineers opened his wallet and gave a hundred rupees to another guy. Guess they were betting on the root cause. From what I understand, they escalated the bug to Ericsson who manufactured the BTS and got it fixed. For my part, I added a random delay and eventually removed that feature.
BTW, the term "missed call" may not be familiar to people outside India. For those not familiar, it's ringing a number and them disconnecting before they pick up. Serves to notify them that you called :)