
How not to run an A/B test (2010) - neya
http://www.evanmiller.org/how-not-to-run-an-ab-test.html
======
lmkg
Part of this is a tooling issue: A/B testing products let you use them in
ways that give erroneous answers. They don't warn you that anything is wrong
when you do, and I would go as far as to say that it's intentional.

I've spoken to a co-worker who used to be a product manager for one of them.
This is intentional. In her words, "The sales guys want to be able to show
quick wins to the prospective customer." It would seem that the market favors
rose-tinted sycophantic products that tell you your test is awesome, rather
than products which give accurate outputs. I'm not surprised, but I am
disappointed.

~~~
ssharp
I've long suspected that A/B products' "confidence" stats are intentionally
bad. I see far too many egregious declarations of "wins" in VWO. Even if they
temper it by saying "you should wait a full seven days", I still think they do
a large disservice by not forcing larger sample sizes before steering you in
any direction. These tools make it very easy to run tests, and assuming their
users are sophisticated enough to draw their own statistical conclusions is
such an obvious mistake that the vendors have to be aware of it.
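
To put a number on "larger sample sizes", here's a rough sketch of the
standard two-proportion sample-size calculation. The 5% baseline rate and 10%
relative lift are made-up illustration values, not anything from a real tool:

    # Rough per-variant sample size for a two-proportion test,
    # alpha = 0.05 (two-sided), power = 0.80. Baseline rate and
    # lift are made-up numbers for illustration.
    from scipy.stats import norm

    p1 = 0.05                         # hypothetical baseline conversion rate
    p2 = p1 * 1.10                    # hypothetical 10% relative lift
    z_alpha = norm.ppf(1 - 0.05 / 2)  # ~1.96
    z_beta = norm.ppf(0.80)           # ~0.84

    pbar = (p1 + p2) / 2
    n = ((z_alpha * (2 * pbar * (1 - pbar)) ** 0.5
          + z_beta * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2
         / (p2 - p1) ** 2)
    print(round(n))                   # ~31,000 visitors per variant

Tens of thousands of visitors per variant to detect a modest lift, which is
exactly the kind of number these tools don't put in front of you.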

The conversion optimization consulting companies I've seen aren't much better.
They have no problem reporting "increased" revenue and conversion rates well
before there is enough data to support it. I'm calling these types of tests
A/B Dog and Pony Shows: it's easy to report these gains up the chain, get some
short-term love from your boss or client, and hope the chickens never come
home to roost, or that things are forgotten before they do.

There was a blog post on Unbounce last week talking about a "successful" A/B
test in which A and B combined for 12 conversions over the entire testing
period. This is the kind of analysis a company specializing in split-testing
landing pages is publishing. The article has since been taken down, but it's
pretty clear that a large subset of conversion-rate folks, and the products
used for conversion rate optimization, have little interest in dealing with
legitimate results.
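
For scale: suppose those 12 conversions split 8 to 4 on 500 visitors per page
(invented numbers, since the post is gone). Even a 2x observed difference is
nowhere near significant on that little data:

    # Fisher's exact test on a hypothetical 8-vs-4 split of 12 total
    # conversions, 500 visitors per variant (invented numbers).
    # Rows: [converted, not converted] for each page.
    from scipy.stats import fisher_exact

    table = [[8, 500 - 8],
             [4, 500 - 4]]
    odds_ratio, p_value = fisher_exact(table)
    print(p_value)   # ~0.39 -- nowhere near any usual threshold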

Spend some time reading the A/B testing articles on Inbound.org or
GrowthHackers. They're roughly a 50/50 mix: articles explaining why your A/B
testing results suck and don't stick, and articles bragging about A/B testing
results that aren't statistically valid.

~~~
btilly
One of the problems that the products face is that customers are going to ask,
_"Why can't you show me the statistics that X offers?"_ And if you try to say
that it is statistically impossible to really offer that, they will tell you,
_"Well X does it, and they are smart. So you must be wrong!"_

A classic example is that when Google put together Visual Website Optimizer
they included a heuristic for the chance for each version to beat the control.
This heuristic was statistically wrong. But since Google offered it, everyone
else figured out Google's heuristic and did the same thing.
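
For reference, the generic version of that number (I don't know the exact
formula GWO used) is usually a Beta-posterior Monte Carlo: sample each
variant's conversion rate from its posterior and count how often the
variant's draw beats the control's. The conversion counts here are invented:

    # Generic "chance to beat control" sketch (not GWO's actual
    # formula): draw from Beta posteriors with uniform priors and
    # count how often the variant beats the control. Invented counts.
    import numpy as np

    rng = np.random.default_rng(0)
    draws = 100_000
    control = rng.beta(1 + 40, 1 + 960, draws)   # 40/1000 conversions
    variant = rng.beta(1 + 55, 1 + 945, draws)   # 55/1000 conversions
    print((variant > control).mean())

Numbers like this look authoritative, but nothing in them controls how often
the procedure declares a winner when there is no real difference.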

~~~
siddharthdeswal
I think you mean Google Website Optimizer, not Visual Website Optimizer.

~~~
btilly
Whoops, yes.

------
btilly
See [http://elem.com/~btilly/ab-testing-multiple-looks/index.html](http://elem.com/~btilly/ab-testing-multiple-looks/index.html)
for a series of responses to this article.

I never finished the series, but the second post explains why, in the real
world of web tests, the issue the article is concerned about is pretty close
to a non-issue. (At least if you wait until you've reached a fair portion of
the effort you would have been willing to put forth.)
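
For anyone who hasn't read the original article: the "multiple looks" issue
is that checking for significance repeatedly and stopping at the first "win"
inflates the false positive rate. A toy simulation with two identical
variants (all parameters arbitrary):

    # Toy simulation of the repeated-peeking effect: two identical
    # variants (no real difference), a pooled z-test checked after
    # every batch, stopping at the first nominally significant result.
    # All parameters are arbitrary illustration values.
    import numpy as np

    rng = np.random.default_rng(1)
    p, batch, checks, z_crit = 0.05, 1_000, 20, 1.96
    trials, false_positives = 2_000, 0

    for _ in range(trials):
        ca = cb = na = nb = 0
        for _ in range(checks):
            ca += rng.binomial(batch, p)
            na += batch
            cb += rng.binomial(batch, p)
            nb += batch
            pooled = (ca + cb) / (na + nb)
            se = (pooled * (1 - pooled) * (1 / na + 1 / nb)) ** 0.5
            if se > 0 and abs(ca / na - cb / nb) / se > z_crit:
                false_positives += 1
                break

    print(false_positives / trials)   # well above the nominal 0.05

Whether that distortion matters in practice depends on your stopping
behavior, which is what the series above digs into.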

------
greggarious
This is what happens when people who haven't taken a statistics course try to
do statistics.

Any competent UX researcher should know about the multiple comparisons
problem:
[https://en.wikipedia.org/wiki/Multiple_comparisons_problem](https://en.wikipedia.org/wiki/Multiple_comparisons_problem)
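
The arithmetic behind it is short. Assuming independent tests, the chance of
at least one false "win" across k variants tested at alpha = 0.05 each grows
quickly, which is why a Bonferroni-style correction is the usual first-aid
fix (illustrative numbers):

    # Family-wise error rate for k independent tests at alpha each,
    # plus the Bonferroni-corrected per-test alpha. Illustrative values.
    alpha, k = 0.05, 10

    fwer = 1 - (1 - alpha) ** k
    print(f"{fwer:.2f}")    # ~0.40: a 10-variant test "wins" 40% of
                            # the time even when nothing works
    print(alpha / k)        # 0.005, Bonferroni-corrected threshold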

