Hacker News new | past | comments | ask | show | jobs | submit login

The study argues that most bugs found in production can be reproduced with simple test cases. This does not imply that just adding a bunch of simple test cases would have prevented these bugs. At least some of the tested projects have very large test suites, and all of these bugs made it through that first line of defense.

All this implies that the projects weren't testing the right stuff. The suggestion to spend more time thinking about error cases is probably a good one; in almost all cases people forget about the fascinating variety of ways in which things might fail. On the other hand, when you have a large number of permutations to test, things get a lot messier:

> The specific order of events is important in 88% of the failures that require multiple input events.

In cases like this, you get a lot more mileage out of Jepsen-style torture testing and QuickCheck-style property testing, where the code is tested with large numbers of random inputs. This simplifies the programmer's job a lot, since they're no longer responsible for intuiting an exact series of inputs that might make something fall over.

Of course, not all failures are even this difficult to flush out. It's interesting that the authors got quite quick and substantial gains from their code analysis tool, especially when you look at how simple it is:

> (i) the error handler is simply empty or only contains a log printing statement, (ii) the error handler aborts the cluster on an overly-general exception, and (iii) the error handler contains expressions like “FIXME” or “TODO” in the comments




I don't know how trusting in test cases isn't the same as saying "hidesite is 20-20". Blaming bad test cases is easy; thinking of the right test cases up front is hard.

I agree that lots of random thrashing is good. It may not find all the bugs but boy howdy it will shake out many of them. Where I work we have our 'bot army' that is 100's of programmed clients that log into the same space and thrash around, chatting and videoing and switching their mike and headset on and off. Its a threshold for a release, to run a week on the bot army without issues (crashes, leaks, stuck bots)




Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact

Search: