
The good news is that when you do A/B testing, you can create a properly randomized experiment, where variations in, e.g., the number of men/women assigned to the A and B groups are accounted for as part of the sampling error.
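
A quick sketch of what that randomization buys you (Python with numpy; the gender covariate, the 50/50 split, and the sample size are made-up illustration values): under a per-user coin flip, any male/female imbalance between A and B is itself just sampling noise, which the usual significance test already budgets for.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 10_000

    # Hypothetical covariate: 1 = man, 0 = woman, drawn 50/50 purely for illustration.
    is_male = rng.integers(0, 2, size=n)

    # Properly randomized assignment: a coin flip per user, independent of any covariate.
    in_a = rng.integers(0, 2, size=n).astype(bool)

    # The male share in each group differs only by sampling noise (~1/sqrt(n)),
    # which is exactly the error a significance test already accounts for.
    print(f"male share in A: {is_male[in_a].mean():.3f}")
    print(f"male share in B: {is_male[~in_a].mean():.3f}")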

Where this stuff really gets you is in retrospective data analysis, where, with the right choice of would-be confounding variables, you can argue either direction on almost any question.
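
To make that concrete, here's a toy Simpson's-paradox-style example with invented counts: treatment A wins inside every subgroup, yet B wins once you pool across subgroups, so whether you "control for" the subgroup decides which story the data tells.

    # Invented counts: (successes, trials) per treatment within two subgroups.
    data = {
        "subgroup_1": {"A": (90, 100),   "B": (800, 1000)},
        "subgroup_2": {"A": (300, 1000), "B": (20, 100)},
    }

    def rate(successes, trials):
        return successes / trials

    # Within each subgroup, A beats B...
    for name, arms in data.items():
        print(name, "A:", rate(*arms["A"]), "B:", rate(*arms["B"]))

    # ...but pooled over subgroups, B beats A.
    for arm in ("A", "B"):
        s = sum(data[g][arm][0] for g in data)
        t = sum(data[g][arm][1] for g in data)
        print("overall", arm, round(rate(s, t), 3))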




The problem I have seen with frameworks that handle A/B testing is that they ignore basic statistics. All groups are different given a large enough n; that's what p is about: do you have enough data to tell the means apart? The closer the means, the more data you need. What you really care about is how large the difference is. That's called the effect size, which is really just how far apart the means are, divided by a combined (pooled) standard deviation for both groups.
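
A minimal sketch of that effect size (Cohen's d: mean difference over a pooled standard deviation). The data below is invented; with n this large a t-test would flag even this tiny difference as significant, while d stays negligible.

    import numpy as np

    def cohens_d(a, b):
        """Effect size: difference of means divided by the pooled standard deviation."""
        a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
        # Pooled variance, weighting each group's variance by its degrees of freedom (n - 1).
        pooled_var = ((len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1)) \
            / (len(a) + len(b) - 2)
        return (a.mean() - b.mean()) / np.sqrt(pooled_var)

    # Invented conversion-rate-like data: a tiny mean difference, huge n.
    rng = np.random.default_rng(1)
    group_a = rng.normal(0.101, 0.05, size=100_000)
    group_b = rng.normal(0.100, 0.05, size=100_000)

    # ~0.02: statistically detectable at this n, practically negligible.
    print(f"Cohen's d: {cohens_d(group_a, group_b):.3f}")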

I certainly wouldn't make rash decisions on data without a good effect size metric.



