All this implies that the projects weren't testing the right stuff. The suggestion to spend more time thinking about error cases is probably a good one; in almost all cases people forget about the fascinating variety of ways in which things might fail. On the other hand, when you have a large number of permutations to test, things get a lot messier:
> The specific order of events is important in 88% of the failures that require multiple input events.
In cases like this, you get a lot more mileage out of Jepsen-style torture testing and QuickCheck-style property testing, where the code is tested with large numbers of random inputs. This simplifies the programmer's job a lot, since they're no longer responsible for intuiting an exact series of inputs that might make something fall over.
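A minimal sketch of that style of testing in plain Python, with no QuickCheck/Hypothesis dependency: a toy `BoundedQueue` (an invented example, not from the paper) is driven with many random orderings of push/pop events, and an invariant is checked after every event. The programmer only states the property; the random driver supplies the orderings.

```python
import random

class BoundedQueue:
    """Toy system under test: a queue that should never exceed its capacity."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = []

    def push(self, x):
        if len(self.items) < self.capacity:
            self.items.append(x)

    def pop(self):
        if self.items:
            return self.items.pop(0)

def invariant_holds(q):
    # The property we assert, rather than any specific input sequence.
    return len(q.items) <= q.capacity

def run_random_trial(seed, num_events=100):
    """Drive the system with one random sequence of events, checking the
    invariant after each step. Returns (passed, seed) so a failing seed
    can be replayed deterministically."""
    rng = random.Random(seed)
    q = BoundedQueue(capacity=8)
    for _ in range(num_events):
        if rng.random() < 0.6:
            q.push(rng.randint(0, 99))
        else:
            q.pop()
        if not invariant_holds(q):
            return False, seed
    return True, seed

# Hammer the system with many random input orderings.
failures = [seed for seed in range(1000) if not run_random_trial(seed)[0]]
```

Seeding each trial is the important design choice: when a trial fails, the seed is a complete reproduction of the exact event ordering that broke things, which is what makes random torture testing debuggable rather than just noisy.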
Of course, not all failures are even this difficult to flush out. It's interesting that the authors got quite quick and substantial gains from their code analysis tool, especially when you look at how simple it is:
> (i) the error handler is simply empty or only contains a log printing statement, (ii) the error handler aborts the cluster on an overly-general exception, and (iii) the error handler contains expressions like “FIXME” or “TODO” in the comments
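To show just how simple those rules are, here is a rough Python sketch using the standard `ast` module that approximates heuristics (i) and (iii) for Python source — empty or log-only handlers, and TODO/FIXME comments — omitting (ii), the cluster-abort check, for brevity. The "log-only" detection (any callee name containing "log", or a bare `print`) is my own approximation, not the paper's exact rule, and the real tool analyzed Java rather than Python.

```python
import ast

TRIVIAL_MARKERS = ("TODO", "FIXME")

def suspicious_handlers(source):
    """Flag except blocks that are empty, log-only, or marked TODO/FIXME."""
    findings = []
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if not isinstance(node, ast.ExceptHandler):
            continue
        body = node.body
        if len(body) == 1 and isinstance(body[0], ast.Pass):
            # Heuristic (i), first half: handler does nothing at all.
            findings.append((node.lineno, "empty handler"))
        elif (len(body) == 1 and isinstance(body[0], ast.Expr)
              and isinstance(body[0].value, ast.Call)):
            # Heuristic (i), second half: handler only prints or logs.
            callee = ast.unparse(body[0].value.func).lower()
            if "log" in callee or callee == "print":
                findings.append((node.lineno, "log-only handler"))
    # Heuristic (iii): comments aren't in the AST, so scan the raw text.
    for i, line in enumerate(source.splitlines(), 1):
        if "#" in line and any(m in line.split("#", 1)[1]
                               for m in TRIVIAL_MARKERS):
            findings.append((i, "TODO/FIXME comment"))
    return findings

sample = (
    "try:\n    risky()\nexcept Exception:\n    pass\n"
    "try:\n    risky()\n"
    "except IOError:\n    logger.warning('oops')  # FIXME: handle this\n"
)
report = suspicious_handlers(sample)
```

Even with checks this crude, every finding points at a place where an error path was written to be deferred rather than handled, which is exactly the class of bug the paper found behind most of the catastrophic failures.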
I agree that lots of random thrashing is good. It may not find all the bugs, but boy howdy it will shake out many of them. Where I work we have our 'bot army': hundreds of programmed clients that log into the same space and thrash around, chatting and videoing and switching their mics and headsets on and off. It's a threshold for a release to run a week on the bot army without issues (crashes, leaks, stuck bots).