At university I worked with small unmanned aircraft. We had a crash due to a pie...

Someone · on Sept 4, 2013

So, you changed a somewhat reproducible bug into one that has about a four billion times lower chance of occurring?

Good luck to the poor chap who will have to figure out what happened when that bug hits.

Also, you introduced a new error condition: a corrupt packet that should set a value of 1, but arrives as a value of 2 will not initiate the landing routine.

The right thing to do, IMO, is to prevent corrupt data packets from doing such stuff. Checksum the packets or, better yet, checksum and encrypt them. That prevents the enemy from taking over your plane.

Finally, I do not see how 'no proper bool' is relevant here. If the packet contained a single bit indicating the value of the flag, it still could get corrupted.

asynchronous13 · on Sept 4, 2013

> So, you changed a somewhat reproducible bug into one that has about a four billion times lower chance of occurring?

Yes. You seem to imply that's a bad thing?

> Also, you introduced a new error condition: a corrupt packet that should set a value of 1, but arrives as a value of 2 will not initiate the landing routine.

A corrupt packet should not do anything, so that's good, not an error. We do not want the landing routine to be accidentally triggered in flight. Missing a valid packet is much better than triggering on an invalid packet. (It's a UDP protocol, so the entire system is designed to handle missed packets: the ground station re-sends commands until it receives a positive acknowledgement from the aircraft.)
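To make the difference concrete, here is a minimal sketch of the two ways of decoding the flag. The names (`CMD_FIELD_LAND`, `action_t`, `decode_truthy`, `decode_strict`) are made up for illustration, and the 32-bit field width is only an assumption inferred from the "four billion" figure above; the real packet layout isn't given in this thread.

```c
#include <stdint.h>

/* Hypothetical command value; the real protocol's value is not given. */
#define CMD_FIELD_LAND 1u

typedef enum { ACTION_NONE, ACTION_LAND } action_t;

/* Truthiness test: every one of the 2^32 - 1 nonzero values lands. */
action_t decode_truthy(uint32_t field)
{
    return field ? ACTION_LAND : ACTION_NONE;
}

/* Exact match: only one of the 2^32 possible values lands.
 * Anything else is treated as "no command"; the sender is expected
 * to keep retrying until it sees an acknowledgement. */
action_t decode_strict(uint32_t field)
{
    return (field == CMD_FIELD_LAND) ? ACTION_LAND : ACTION_NONE;
}
```

With a corrupted field value of 2, `decode_truthy` starts the landing while `decode_strict` ignores the packet, which is the trade-off described above.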

> The right thing to do, IMO, is to prevent corrupt data packets from doing such stuff. Checksum the packets or, better yet, checksum and encrypt them. That prevents the enemy from taking over your plane.

Exactly right. We were already using a checksum in the datalink, and the corrupted packet that caused the crash passed the checksum as valid! During the post-crash analysis, I discovered that the datalink was using an 8-bit XOR checksum implemented years earlier. 8-bit XOR is OK for detecting single-bit errors, but is not good at detecting burst errors -- it does not detect ~12% of highly corrupted packets. After that incident I also updated the system to use a significantly more robust checksum.
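For anyone curious how an XOR checksum lets corruption through, here is a small sketch. `xor8` is a plain 8-bit XOR over the bytes. `crc16_ccitt` is a bitwise CRC-16/CCITT-FALSE, picked purely as an example of a stronger check; it is an assumption for illustration, not necessarily the checksum the system moved to.

```c
#include <stddef.h>
#include <stdint.h>

/* 8-bit XOR checksum: XOR of every byte in the buffer. */
uint8_t xor8(const uint8_t *data, size_t len)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum ^= data[i];
    return sum;
}

/* CRC-16/CCITT-FALSE (polynomial 0x1021, initial value 0xFFFF),
 * computed one bit at a time. Example of a stronger check only. */
uint16_t crc16_ccitt(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFF;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((uint16_t)data[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000)
                crc = (uint16_t)((crc << 1) ^ 0x1021);
            else
                crc = (uint16_t)(crc << 1);
        }
    }
    return crc;
}
```

Take the payload `{0x10, 0x00, 0x00, 0x00}` and flip the lowest bit in the two middle bytes, giving `{0x10, 0x01, 0x01, 0x00}`. The two flips cancel in the XOR, so `xor8` returns the same value for both buffers, while `crc16_ccitt` returns different values. In general, any corruption that flips the same bit position in an even number of bytes is invisible to the XOR.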

Someone · on Sept 4, 2013

It was a bad thing, the way you described it. Now that I know you also fixed the root cause of the problem, I can see it as an additional line of defense.

I think I wouldn't add it, though. Time is better spent on tooling that checks the variable doesn't get an incorrect value.

asynchronous13 · on Sept 5, 2013

Like you said, an additional line of defense. For a web application there's no need. For a flight-critical application, where a failure means you just lost a few $100k worth of hardware, I'll take every measure possible.

arethuza · on Sept 5, 2013

But that's not quite the same - older versions of C didn't have a real Boolean type so you could well run into those kinds of problems.

In a language that does have a proper Boolean type I still think checking equality with literal true/false values is a bit silly.
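A quick sketch of the distinction, in C since that's the language under discussion. `LEGACY_BOOL` and `LEGACY_TRUE` are hypothetical stand-ins for the kind of int-based "boolean" older codebases define; the second function uses the C99 `bool` from `<stdbool.h>`.

```c
#include <stdbool.h>

/* Pre-C99 style "boolean": just an int with a conventional macro.
 * Both names are hypothetical, for illustration only. */
typedef int LEGACY_BOOL;
#define LEGACY_TRUE 1

/* Returns 1 when `if (flag)` and `if (flag == LEGACY_TRUE)` disagree. */
int legacy_tests_disagree(LEGACY_BOOL flag)
{
    int truthy = flag ? 1 : 0;
    int equals_true = (flag == LEGACY_TRUE) ? 1 : 0;
    return truthy != equals_true;
}

/* With C99 bool, conversion turns any nonzero value into true (1),
 * so the two tests cannot disagree once the value is stored. */
int c99_tests_disagree(int raw)
{
    bool flag = raw;
    int truthy = flag ? 1 : 0;
    int equals_true = (flag == true) ? 1 : 0;
    return truthy != equals_true;
}
```

With the int-based type, a value of 2 is truthy but not equal to `LEGACY_TRUE`, so the comparison actually changes behavior. With a real `bool` the comparison is redundant, which is the sense in which it looks silly.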