It could also be that the flags are so tangled together that a change to one part of the system can break many other parts that seem completely unrelated. Sure, you can run a unit test for X, but what about Y? When everything is so tangled that you can’t predict what a change could affect, retesting everything is all you can do.
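
To make the cost of "retesting everything" concrete, here is a minimal sketch. It assumes the flags are plain booleans and that any flag may interact with any other, which is the worst case described above. The flag names and both helper functions are hypothetical, made up only for illustration.

```python
from itertools import product

# Hypothetical flag names, chosen only for illustration.
FLAGS = ["new_checkout", "beta_search", "dark_mode"]


def all_flag_states(flags):
    """Yield every on/off combination of the given flags as a dict."""
    for values in product([False, True], repeat=len(flags)):
        yield dict(zip(flags, values))


def count_states(n_flags):
    """Number of distinct configurations for n independent boolean flags."""
    return 2 ** n_flags


if __name__ == "__main__":
    # With tangled flags, each of these configurations is a separate
    # system that a full retest would have to cover.
    for state in all_flag_states(FLAGS):
        print(state)
    print(count_states(len(FLAGS)), "configurations for", len(FLAGS), "flags")
```

The number of configurations doubles with every flag added, so even a few dozen entangled flags put an exhaustive retest out of reach. If the flags were known to be independent, each one could be tested on its own instead.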