So given an infinite set of possible theories that could explain a data set, which theory should be preferred? As you point out, accuracy on the known data alone is frequently not sufficient, because of overfitting. With noisy data (and data is almost always noisy), a sufficiently flexible theory can fit the noise as well as the signal, so maximal accuracy can't be the sole criterion for which theory to prefer. One way forward is to combine accuracy with some measure of the theory's complexity (e.g. minimum description length), which has much of the flavor of Occam's Razor.
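
To make that concrete, here is a rough sketch of the idea, not a definitive implementation. It assumes the "theories" are polynomials of increasing degree fit to noisy points from a straight line, and it uses a BIC-style score (fit error plus a penalty per parameter) as a crude stand-in for description length. The data, the noise level, the candidate degrees, and all function names are my own illustrative choices.

```python
import math
import random


def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations (fine for small degrees)."""
    k = degree + 1
    # Augmented matrix [A^T A | A^T y], where A is the Vandermonde matrix of xs.
    m = [[sum(x ** (i + j) for x in xs) for j in range(k)]
         + [sum(y * x ** i for x, y in zip(xs, ys))]
         for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, k):
            factor = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution; coeffs[i] multiplies x**i.
    coeffs = [0.0] * k
    for i in reversed(range(k)):
        tail = sum(m[i][j] * coeffs[j] for j in range(i + 1, k))
        coeffs[i] = (m[i][k] - tail) / m[i][i]
    return coeffs


def rss(xs, ys, degree):
    """Residual sum of squares of the best-fit polynomial of this degree."""
    coeffs = fit_polynomial(xs, ys, degree)
    return sum((y - sum(c * x ** i for i, c in enumerate(coeffs))) ** 2
               for x, y in zip(xs, ys))


def penalized_score(xs, ys, degree):
    """BIC-style score: n*ln(RSS/n) rewards fit, k*ln(n) charges for each parameter."""
    n, k = len(xs), degree + 1
    return n * math.log(rss(xs, ys, degree) / n) + k * math.log(n)


# Hypothetical data: a straight line y = 1 + 2x plus Gaussian noise.
rng = random.Random(0)
n_points = 30
xs = [-1 + 2 * i / (n_points - 1) for i in range(n_points)]
ys = [1 + 2 * x + rng.gauss(0, 0.3) for x in xs]

degrees = range(0, 7)
by_accuracy = min(degrees, key=lambda d: rss(xs, ys, d))
by_penalty = min(degrees, key=lambda d: penalized_score(xs, ys, d))

print("degree preferred by accuracy alone:", by_accuracy)
print("degree preferred by accuracy + complexity penalty:", by_penalty)
```

Since adding terms can only lower the error on the data you already have, picking by accuracy alone drifts toward the most complex candidate. The penalized score can never choose a more complex polynomial than the accuracy-only choice, and it only accepts extra terms when they reduce the error by more than they cost.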