To that end, btw, I think a service like yours is potentially quite valuable!
We collect a variety of metrics. When we do an A/B test, we look at a all of them as a way of understanding what effect our change has on user behavior and long-term outcomes.
A particular change may be intended to effect just one metric, but that's in an all-else-equal way. It's not often the case that our changes affect only one metric. And that's great, because that gives us hints as to what our next test should be.