
I think the underlying cause is similar to the publication bias they mention in the article.

For almost all A/B tests, A is actually not much different from B. But with the small sample sizes you see after the first week, you will get, at random, a rather strong positive or negative result. That's where the publication/selection bias kicks in. If you see strongly negative results in the first week, you quickly give up and start a new A/B test. If the initial results are positive, you get excited and keep testing, but then in most cases you see regression to the mean as the sample size grows. The same thing would most likely have happened to the experiments you terminated early, but you selected those away, so in your memory regression to the mean only ever seems to happen to the tests with good initial results.
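A quick simulation makes this concrete (a rough sketch, not a real experiment: the 5% conversion rate, the 500-visitor week-one sample, the 50,000-visitor full run, and the +20% "winner" cutoff are all made-up numbers):

    # Simulate many A/B tests where A and B are truly identical, "peek" after a
    # small week-one sample, keep only the tests that looked like big winners,
    # and see how the apparent lift holds up at the full sample size.
    import random

    random.seed(0)
    TRUE_RATE = 0.05     # both variants truly convert at 5% (assumed)
    WEEK1_N = 500        # visitors per variant after week one (assumed)
    FULL_N = 50_000      # visitors per variant if you keep running (assumed)

    def conversions(n):
        return sum(random.random() < TRUE_RATE for _ in range(n))

    winners = []
    for _ in range(2000):
        a1, b1 = conversions(WEEK1_N), conversions(WEEK1_N)
        if a1 and (b1 - a1) / a1 > 0.20:   # looks like a +20% winner after week 1
            # keep running the same test: add the remaining traffic
            a_full = a1 + conversions(FULL_N - WEEK1_N)
            b_full = b1 + conversions(FULL_N - WEEK1_N)
            winners.append(((b1 - a1) / a1, (b_full - a_full) / a_full))

    early = sum(e for e, _ in winners) / len(winners)
    final = sum(f for _, f in winners) / len(winners)
    print(f"{len(winners)} early 'winners': avg lift {early:+.1%} after week 1, "
          f"{final:+.1%} at full sample")

Because both variants are identical, the tests that looked like big week-one winners see their apparent lift all but disappear once the full sample comes in.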

Most A/B tests, run for long enough, will show no significant difference. Blogs may give a different impression, but that again is explained by publication bias.

I always compare A/B testing to genetic mutations. Almost none have a strong impact on the fitness of an animal, but once in a very long while you get a positive one. Luckily they accumulate, so you can still get some impressive results with A/B testing (aka natural selection).

