I maintained a fairly successful legacy C/C++ application that performed business-critical functions. The core developers were long gone, and countless changes had gone into it since. The only way the application survived was by automatically comparing its outputs and intermediate data structures against "ideal" values. There were hundreds of business scenarios, each with its own input dataset. No one really understood what each scenario or ideal value meant. So your goal was to minimize divergence, and to update the test case when a divergence was deemed acceptable.