

The Ultimate Guide To A/B Testing - paraschopra
http://www.smashingmagazine.com/2010/06/24/the-ultimate-guide-to-a-b-testing/

======
patio11
While the A/B testing content is interesting as well, this is also a good
example of how to get your product onto authority sites (solve a problem for
them, such as their lack of authoritatively presented content about an area
you're expert in) and an example of how you can grow your market via out-
teaching the competition.

Anyone ever wondered how you compete with a competitor who has more money than
God and gives the product away for free? This is one way. (Clocking the
competition in product quality is also an option. In this case, a surprisingly
achievable option. Every time I show off Visual Website Optimizer it knocks
socks off versus the product from the big ad company.)

~~~
michael_dorfman
_Every time I show off Visual Website Optimizer it knocks socks off versus the
product from the big ad company.)_

A tangential question: Do you have any similar recommendation for an Analytics
tool that knocks socks off vs the free product from the big ad company?

I know there are several competitors out there, but I haven't heard the pitch
yet that would lead me to switch.

~~~
dageroth
I work for an analytics company, and it really depends on your needs, as
there are dozens and dozens of tools out there. We specialize in
individualisation, integration of other data, and the ability to filter and
segment on any recorded data, but that comes with quite a hefty price tag and
requires some time and knowledge to really be of use to a company. So I
hesitate to recommend ourselves to startups that aren't heavily into
analytics.

Mixpanel offers some good stuff for web applications. If you want, shoot me an
email letting me know what you are doing and what you need, and I can perhaps
make a recommendation.

------
whyenot
One problem: he never discusses how to analyze the results. That is a pretty
big missing piece for an "ultimate" guide. There should be some discussion of
statistical tests. Yes, it may seem obvious, but A/B case studies have been
posted to HN where nobody performed any inferential statistics at all. If you
don't do any statistics, you don't have all the information you need to make a
decision: results that look conclusive may not be conclusive at all.

~~~
paulgb
By inferential statistics, are you referring to confidence intervals and tests
for significance, or is there more to it than that?

~~~
nostrademons
There's a huge black art to interpreting data. It's not just confidence
intervals and significance tests: you also need to watch _very_ closely for
any sources of bias in your data. Different user populations, unexpected
feature interactions, bugs in your logging code, changes in the site midway
through your experiment period, etc.

------
spuz
Some of those tools look great for doing A/B testing for websites but are
there any guidelines out there on programming design patterns to implement A/B
testing in generic software? I'm reluctant to litter my code with statements
like:

    
      if (userid % 2 == 0) {
          // do test A logic
      } else {
          // do test B logic
      }

~~~
patio11
You mean software not running on a server? Well, if you've got a Java
enterprise programmer's love for design patterns, you could do A/B tests using
a Strategy pattern. And if you wanted to decouple that from your code, you
could have the Strategies created by a Factory. And if you configured your
StrategyFactory in XML then you would never have any A/B testing code in your
code at all... and this topic is giving me flashbacks so I'm going to stop
now.

The nuts and bolts of doing this in downloadable software are not
extraordinarily difficult. Pick a unique random identifier at install time,
then report that identifier along with conversion events to the central
server. (Passing it as a query parameter when folks open your website from
within the app is so easy it is almost cheating. You can also ask folks for a
"hardware ID" to generate their license key, or something similar.)

See the presentation Paras linked to if you need implementation advice.
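As a rough sketch of the install-time identifier approach described above (the
function names and hashing scheme here are hypothetical, not from any
particular product):

```python
import hashlib
import uuid

def generate_install_id():
    """Run once at install time; persist the result locally."""
    return uuid.uuid4().hex

def assign_variant(install_id, experiment="signup-button"):
    """Deterministically bucket an install into variant 'A' or 'B'.

    Hashing the experiment name together with the install ID means each
    experiment gets an independent bucket assignment, while any given
    install always sees the same variant for a given experiment.
    """
    digest = hashlib.sha256(f"{experiment}:{install_id}".encode()).hexdigest()
    return "A" if int(digest, 16) % 2 == 0 else "B"
```

The same identifier then travels with each conversion report, so the central
server can split results by variant.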

------
oscardelben
I think statistical significance is the most important thing in A/B testing,
but they only included links to external sources. It's hard to tell the
difference between coincidence and a real difference between two versions when
changing simple things like button colors, and I still feel ignorant about the
subject.

~~~
paraschopra
Do you think the external sources weren't enough? The main statistical tools
used for A/B testing are the G-test, chi-square test, and Z-test (after
approximating the sum of binomial variables with a normal distribution).
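For the Z-test case, a minimal sketch of a two-proportion test under that
normal approximation might look like this (conversion counts are illustrative,
and a real analysis would also check the approximation's assumptions):

```python
from math import sqrt, erf

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the conversion counts of two variants."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)          # pooled conversion rate
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value
```

For example, 100 conversions out of 1000 visitors versus 150 out of 1000
yields a z around 3.4 and a p-value well below 0.01, so the difference would
count as significant at the usual 95% level.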

~~~
oscardelben
What I was trying to say is that in a "definitive guide" to A/B testing I
would expect to find some of it. You've done a great job linking all the
external resources, and I probably couldn't have done better, but I'd
personally have appreciated more about statistical significance, so that I
could have better understood the links you included.

~~~
paraschopra
Sure, point well taken. My only fear was that discussing statistical
significance might have made the article a bit intimidating, but yes, if I was
going to call it an "ultimate guide", I should have discussed it.

------
harrybr
Why is everyone obsessed with AB testing sign up button color? Sure, it's easy
to test, but it's such a tiny factor, in amongst huge things like copywriting,
page layout and the inter-page structuring of user journeys...

~~~
paraschopra
That's because it's the most "oh-wow" example you can give to a person who has
never tried A/B testing before. A lot of people still don't do A/B testing
because they either don't know about it or don't think it works. The sign-up
button color example works well for educating both types of people.

~~~
bad_user
Still, even if the reaction is an "oh-wow" ... it still takes work to do, even
for simple things like changing the color of a button.

And since we are talking about statistical confidence, you may end up wasting
time instead of doing more meaningful work ... so I don't think A/B testing
helps when you're small, unless you have the resources to spare.

I am happy to read about tools/advice that I can use. Nice article.

~~~
paraschopra
>so I don't think A/B testing helps when you're small, unless you have the
resources to spare.

That's not necessarily true. I recently blogged on this topic:
[http://visualwebsiteoptimizer.com/split-testing-blog/optimiz...](http://visualwebsiteoptimizer.com/split-testing-blog/optimization-vs-validation-two-distinct-uses-of-ab-testing/)

------
sfard
Hey - I work in design innovation for a very large online travel company
(e.g., Expedia, Priceline, Booking). Here are some additional thoughts:

(1) Don't get obsessed with statistical significance. 95% is an arbitrary
number - sometimes you need to make decisions, and a 70% probability of one
option being better, while not ideal, can be enough to support a decision.

(2) Volume! You hear of case studies where people changed a color and
conversion went up 20%, but that rarely happens in the real world. With
smaller improvements you need A LOT of volume to get statistical significance,
so think about what measurement really matters - sometimes there are proxies
for conversion that require less traffic (e.g., click-through vs. conversion).

(3) Tradeoffs and attribution. It's easy to draw conclusions like "this
element increased conversion - so it's good", but sometimes page elements
improve a desired outcome at the expense of other things. For example, we did
an A/B test on a hotel page that increased conversion on that specific hotel,
so people concluded it must be good, but it came at the expense of
cross-selling other hotels.
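The volume point can be made concrete with a standard per-variant sample-size
estimate for a two-proportion test; this is a sketch assuming the usual 95%
confidence and 80% power defaults (hence the z values of 1.96 and 0.84), and
the example rates are illustrative:

```python
from math import ceil

def required_sample_size(p_base, p_variant, z_alpha=1.96, z_beta=0.84):
    """Rough per-variant sample size needed to detect a shift in
    conversion rate from p_base to p_variant, at roughly 95%
    confidence and 80% power (the default z values)."""
    variance = p_base * (1 - p_base) + p_variant * (1 - p_variant)
    effect = p_variant - p_base
    return ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)
```

Detecting a lift from 5% to 6% conversion works out to on the order of 8,000
visitors per variant, while a lift from 5% to 10% needs only a few hundred -
which is why small improvements demand so much traffic.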

~~~
paraschopra
I am pretty amazed that you conclude test results even at 70% confidence. Do
you do that only in cases where the stakes are not high, or is it the norm?

------
miguelpais
I just get the impression that someone who wants to do A/B testing after
reading this article will simply create a different version of the website,
probably compare the average of the results for each version, and end up
deciding solely on that.

Hope they don't miss the second point of the do's; it should be emphasized
more:

> Don’t conclude too early. There is a concept called “statistical confidence”
> that determines whether your test results are significant (that is, whether
> you should take the results seriously). It prevents you from reading too
> much into the results if you have only a few conversions or visitors for
> each variation. Most A/B testing tools report statistical confidence, but if
> you are testing manually, consider accounting for it with an online
> calculator.

By the way, Excel can do it.

------
nhebb
I would love to see someone _with artistic talent_ create A/Button.com. I've
tried a number of Windows-based button generators, and they all sucked. The
same could be said for the online tools I've seen. Recommendations are
welcome.

------
eliot_sykes
Warning: I've read that A/B testing extreme differences can get wrongly
diagnosed as cloaking by the search engines. My guess is that this is more
likely to happen with a server-side A/B test than with a JavaScript-
implemented one.

------
Vindexus
This site may interest you guys: <http://www.abtests.com/>

------
bdickason
Great post paras!

~~~
paraschopra
Thank you :)

