

A/A Split Testing - fezzl
http://www.getelastic.com/product-list-ab-test/

======
harrybr
Essentially the article says (paraphrased) "Wow, look, if you A/B test two
versions of the exact same design, you get slightly different conversion
rates. Can split testing be trusted?"

I don't understand why they find this surprising. Of course there's going to
be some variation in the conversion rates. This is why GWO (Google Website
Optimizer) reports statistical significance.

~~~
sokoloff
While I don't disagree with your main point, there is another question just
under the surface:

Is the method by which you're distributing sessions into your tests somehow
biased?

I'm going to call A' the "new" (but identical) test, and A the original test.

If you've been running A for a long time, and now add A', what is the chance
that the visitor populations between A and A' will be different enough to
drive statistically significant differences that are population-related rather
than test-content-related?

Put slightly differently: If your returning session conversion rate is higher
than your first session conversion rate, you will need to take some pains to
ensure that each of the tests is getting a fair shake at the traffic. In many
cases, that means ending test A, and creating a new test a and a' such that
neither a nor a' has an advantage. It's easy to ASSume that there is no
meaningful test bias, while the reality is that it's quite easy to have test
bias creep in.
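
To make that concrete, here is a minimal Python sketch (the rates and traffic
mix are invented) of how a skewed visitor population alone can open a gap
between two identical variants:

    import random

    random.seed(1)

    # Invented rates: returning sessions convert better than first sessions.
    NEW_RATE, RETURNING_RATE = 0.02, 0.05

    def conversion_rate(n_sessions, returning_fraction):
        """Observed rate for a variant whose traffic mix is skewed."""
        conversions = 0
        for _ in range(n_sessions):
            rate = RETURNING_RATE if random.random() < returning_fraction else NEW_RATE
            conversions += random.random() < rate
        return conversions / n_sessions

    # A has accumulated returning visitors; the newly added A' sees mostly
    # first sessions. The page served is identical in both arms.
    print("A :", conversion_rate(20_000, returning_fraction=0.40))
    print("A':", conversion_rate(20_000, returning_fraction=0.10))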

~~~
hartror
Another common bias: if you are running multiple split tests in parallel, you
need to assign each visitor to an option in each split test at RANDOM,
independently of the other tests. If you don't, you'll end up with 50% of the
users going through all the same 50% of the options and the other 50% of users
going through the other 50% of the options.
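
One common way to get independent assignments (a sketch, not something the
article prescribes; the test names are hypothetical) is to hash the visitor ID
salted with the test name, so the bucket in one test says nothing about the
bucket in another:

    import hashlib

    def bucket(visitor_id: str, test_name: str, n_options: int = 2) -> int:
        """Deterministic per-visitor bucket, independent across tests."""
        digest = hashlib.md5(f"{test_name}:{visitor_id}".encode()).hexdigest()
        return int(digest, 16) % n_options

    # The same visitor can land in option 0 of one test and option 1 of
    # another; reusing a single coin flip for every test would send the same
    # half of the users through the same half of all the options.
    print(bucket("visitor-42", "button-color"))
    print(bucket("visitor-42", "product-list-layout"))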

------
almost
Can split testing be trusted? No, not if you misunderstand the most basic
concepts. Non-statistically significant results are, surprise surprise, not
statistically significant.

~~~
hartror
This is exactly the problem with people implementing split testing, and it's
hurting their business.

They hear about it, run off and implement the A/B versions of their pages, and
add some tracking. Then they proceed to draw incorrect conclusions and make
bad decisions based on them. They don't read up on it enough to understand the
statistics behind it. I used my stats book from Uni (I knew it had a better
use than a laptop stand; a Java book now has that honour), but there are
plenty of tutorials out there. For example: [http://visualwebsiteoptimizer.com/split-
testing-blog/what-yo...](http://visualwebsiteoptimizer.com/split-testing-
blog/what-you-really-need-to-know-about-mathematics-of-ab-split-testing/)
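
For reference, here's a minimal sketch of the kind of significance check those
tutorials walk through (a two-proportion z-test; the conversion counts below
are invented):

    from math import sqrt, erf

    def z_test(conv_a, n_a, conv_b, n_b):
        """Two-proportion z-test; returns (z, two-sided p-value)."""
        p_a, p_b = conv_a / n_a, conv_b / n_b
        pooled = (conv_a + conv_b) / (n_a + n_b)
        se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
        z = (p_a - p_b) / se
        # Normal CDF via erf, then a two-sided p-value.
        p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
        return z, p_value

    # Invented counts: 200/10000 conversions on A vs 230/10000 on B.
    z, p = z_test(200, 10_000, 230, 10_000)
    print(f"z = {z:.2f}, p = {p:.3f}")  # not significant at the usual 0.05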

It really is a toss-up between statistics and medicine for which is the most
misunderstood of the sciences.

~~~
ramit
Unfortunately, nobody wants to learn about statistics -- they just want it to
work so they can make more money. I just wrote a post about behavioral change
and understanding your users here:
[http://www.iwillteachyoutoberich.com/blog/why-personal-
finan...](http://www.iwillteachyoutoberich.com/blog/why-personal-finance-
experts-continue-writing-worthless-advice/)

That's why sites like Visual Website Optimizer are getting increasingly good.
They use plain language to explain when to keep running the tests.

Again: It's unrealistic to expect lay users to learn statistics. They don't
want to become statistical experts, they usually just want to increase sales.
This is where software can help.

~~~
hartror
I think even laymen should learn the why behind the statistics; a few
paragraphs about statistical significance would have helped the article's
author.

As an aside, wow another one of my favourite bloggers responding to a comment
of mine on HN this week!

------
dfranke
Better not let Zed Shaw read this.

<http://www.zedshaw.com/essays/programmer_stats.html>

------
smallegan
It seems that if an A/A test is showing a variance then your sample size is
too small.

~~~
btilly
There is always a variance. Even with huge tests.

The question is whether it is statistically significant.
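
A quick simulation makes the point (the rate and sample size are invented):
two identical variants never measure exactly the same conversion rate, even
with a large sample; whether the gap matters is what the significance test
decides.

    import random

    random.seed(7)

    TRUE_RATE, N = 0.03, 100_000  # identical variants, large sample

    def observed_rate():
        return sum(random.random() < TRUE_RATE for _ in range(N)) / N

    # The two measured rates differ on every run even though the underlying
    # page is identical; the question is whether the gap clears significance.
    for _ in range(3):
        print(f"A: {observed_rate():.5f}   A': {observed_rate():.5f}")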

