The Ultimate Guide To A/B Testing

patio11 · on June 24, 2010

While the A/B testing content is interesting as well, this is also a good example of how to get your product onto authority sites (solve a problem for them, such as their lack of authoritatively presented content about an area you're expert in) and an example of how you can grow your market via out-teaching the competition.

Anyone ever wondered how you compete with a competitor who has more money than God and gives the product away for free? This is one way. (Clocking the competition in product quality is also an option. In this case, a surprisingly achievable option. Every time I show off Visual Website Optimizer it knocks socks off versus the product from the big ad company.)

michael_dorfman · on June 24, 2010

> Every time I show off Visual Website Optimizer it knocks socks off versus the product from the big ad company.

A tangential question: Do you have any similar recommendation for an Analytics tool that knocks socks off vs the free product from the big ad company?

I know there are several competitors out there, but I haven't heard the pitch yet that would lead me to switch.

dageroth · on June 24, 2010

I work for an analytics company, and it really depends on your needs, as there are dozens and dozens of tools out there. We specialize in individualisation, integration of other data, and the ability to filter and segment on any recorded data, but that comes with quite a hefty price tag and takes some time and knowledge to really be of use to a company. So I hesitate to recommend ourselves to startups that are not really heavily into analytics.

Mixpanel offers some good stuff for web applications. If you want, shoot me an email and let me know what you are doing and what you need, and I can perhaps make a recommendation.

sachinag · on June 24, 2010

Use KISSmetrics for funnels, Mixpanel for cohorts, and Clicky for visitor-level details.

Sadly, there's no good "one true alternative" for metrics yet.

bad_user · on June 24, 2010

Unfortunately my mind is more at ease when choosing the free products of the big ad company ... customers don't like making choices, and your product has to be clearly superior for them to choose you over a big brand.

Out-teaching is not enough.

whyenot · on June 24, 2010

One problem: he never discusses how to analyze the results. That is a pretty big missing piece for an "ultimate" guide. There should be some discussion of statistical tests. Yes, it may seem obvious, but there have been A/B case studies posted to HN where nobody performed any inferential statistics at all. If you don't do any statistics, you don't have all the information you need to make a decision. Results that look conclusive may not be conclusive at all.

Vindexus · on June 24, 2010

I'm pretty lost when it comes to statistics. Are you talking about not having enough data to make a conclusion, or something else?

whyenot · on June 24, 2010

Suppose in A/B testing A has a higher "conversion rate" than B. Based on the data you collected, you conclude that A is the better option. Sounds good, BUT if you did not take the time to actually do some statistical hypothesis testing, you could be making a mistake.

What are the chances that A and B will have exactly the same conversion rate? For any reasonably large set of data, it is very close to 0. That means that A > B or B > A no matter what A and B actually are. The difference you observed between A and B could be real or it could be noise. Various statistical tests[1] can help you in deciding between the two possibilities.

Flip a coin twice. Let's say you get two heads. If you then conclude that for this coin heads are much more likely than tails, that would be wrong. This is the sort of mistake statistical tests can help you avoid.

[1] I'm being intentionally vague here because what statistical tests you should use depends on what A and B are, what assumptions you are willing to make, sample size, and other factors. A good starting place is probably to use a chi-square test.
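
As a rough illustration of the chi-square suggestion above, here is a minimal sketch using only the Python standard library. The function name and the visitor counts are made up for the example, and it assumes a plain 2x2 table (converted / not converted for A and B) with no continuity correction; whether that is the right test still depends on the caveats in the footnote.

```python
import math


def chi_square_2x2(conv_a, n_a, conv_b, n_b):
    """Pearson chi-square test for a 2x2 conversion table.

    conv_*: number of conversions, n_*: number of visitors.
    Returns (chi2 statistic, p-value) for 1 degree of freedom.
    """
    observed = [
        [conv_a, n_a - conv_a],
        [conv_b, n_b - conv_b],
    ]
    row_totals = [n_a, n_b]
    col_totals = [conv_a + conv_b, (n_a - conv_a) + (n_b - conv_b)]
    total = n_a + n_b
    if 0 in col_totals or 0 in row_totals:
        raise ValueError("need at least one conversion and one non-conversion")

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if A and B truly had the same conversion rate.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # With 1 degree of freedom, P(X > chi2) equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical data: A converts 50 of 1000 visitors, B converts 65 of 1000.
chi2, p = chi_square_2x2(50, 1000, 65, 1000)
print(f"chi2 = {chi2:.2f}, p = {p:.2f}")
```

With these made-up numbers B looks 30% better than A, yet the p-value comes out at about 0.15, so a difference this large would not be unusual even if the two pages performed identically.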

sesqu · on June 27, 2010

Switching when there is no effect is harmless, from the perspective of the product. It only hurts your takeaways, and if you're doing A/B, you're conceding that your previous takeaways weren't the final word anyway.

Now, if switching is expensive for some reason, and your A/B isn't as conclusive as you'd need, there's a pretty good chance your change resistance will catch that. So even then it's probably not a big deal.

paulgb · on June 24, 2010

By inferential statistics, are you referring to confidence intervals and tests for significance, or is there more to it than that?

nostrademons · on June 24, 2010

There's a huge black art to interpreting data. It's not just confidence intervals and significance tests: you also need to watch very closely for any sources of bias in your data. Different user populations, unexpected feature interactions, bugs in your logging code, changes in the site midway through your experiment period, etc.

whyenot · on June 24, 2010

Drawing conclusions from the dataset as opposed to descriptive statistics such as the mean, variance, etc.

spuz · on June 24, 2010

Some of those tools look great for doing A/B testing for websites but are there any guidelines out there on programming design patterns to implement A/B testing in generic software? I'm reluctant to litter my code with statements like:

  if (userid % 2 == 0) {
    // do test A logic
  } else {
    // do test B logic
  }

patio11 · on June 24, 2010

You mean software not running on a server? Well, if you've got a Java enterprise programmer's love for design patterns, you could do A/B tests using a Strategy pattern. And if you wanted to decouple that from your code, you could have the Strategies created by a Factory. And if you configured your StrategyFactory in XML then you would never have any A/B testing code in your code at all... and this topic is giving me flashbacks so I'm going to stop now.

The nuts and bolts of doing this in downloadable software are not extraordinarily difficult. Pick a unique random identifier at install time, and send that identifier along with reports of conversion to the central server. (Passing it as a query parameter when folks open your website from within the app is so easy it is almost cheating. You can also ask folks for a "hardware ID" to generate their license key, or something similar.)

See the presentation Paras linked to if you need implementation advice.
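
The install-time identifier idea above could look roughly like the sketch below. Everything named here is hypothetical: the function names, the `install_id` and `variant` query parameters, and the `https://example.com/buy` URL are illustrations, and deriving the variant by hashing the identifier is one possible choice rather than something the comment prescribes.

```python
import hashlib
import uuid
from urllib.parse import urlencode


def new_install_id():
    """Generate a random identifier once, at install time, and persist it."""
    return uuid.uuid4().hex


def variant_for(install_id, experiment, variants=("A", "B")):
    """Deterministically map an install to a variant of a named experiment.

    Hashing the experiment name together with the ID means the same install
    always sees the same variant, and different experiments split differently.
    """
    digest = hashlib.sha256(f"{experiment}:{install_id}".encode("utf-8")).digest()
    return variants[int.from_bytes(digest[:8], "big") % len(variants)]


def purchase_url(base_url, install_id, experiment):
    """Build the link the app opens, carrying ID and variant as query parameters."""
    query = urlencode({
        "install_id": install_id,
        "variant": variant_for(install_id, experiment),
    })
    return f"{base_url}?{query}"


install_id = new_install_id()
print(purchase_url("https://example.com/buy", install_id, "pricing-page"))
```

The server can then log the identifier and variant next to each conversion, without the desktop app needing any analysis code of its own.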

paraschopra · on June 24, 2010

Frameworks like Patrick's A/Bingo and Vanity are specifically meant for integrating A/B testing within software. In fact, I remember a presentation by Patrick where he outlined how easy it is to adapt his library to other languages.

Here is the link: http://www.slideshare.net/patio11/ab-testing-framework-desig...

nostrademons · on June 24, 2010

Ultimately that's what you have to do - you've got two different code paths, so there needs to be an if or function pointer in there somewhere to distinguish. As patio11 mentions, you can hide that with Strategy patterns, closures, frameworks, etc, but I'm not sure it buys you anything. You'd still need to retrofit your code to use those patterns, which may be more invasive than just using an if-statement. (Particularly since you want to remove the code if the experiment doesn't pan out, to avoid bloat.)

Another option is to branch your codebase and then proxy requests through to a new appserver running on the branch. This keeps the individual codebases simple, but merges suck - and if you don't stop development entirely, you'll need to be merging several times over the length of the experiment. (Also, this is one way experiments go wrong - a change to an unrelated feature can often have unexpected results on your data.) It's also a deployment pain if you're just a startup with a couple developers.
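
For the "if or function pointer" approach mentioned above, one way to keep the branching in a single place is a small lookup table of code paths. This is only a sketch: the greeting functions, the `GREETING_EXPERIMENT` table, and the `user_id % n` bucketing (borrowed from the snippet in the question) are placeholders, not a recommendation for how to assign users.

```python
def greeting_control(name):
    """Existing behaviour (variant A)."""
    return f"Welcome back, {name}."


def greeting_variant(name):
    """Experimental behaviour (variant B)."""
    return f"Hi {name}, ready to pick up where you left off?"


# One table per experiment: bucket number -> code path (the "function pointer").
GREETING_EXPERIMENT = {0: greeting_control, 1: greeting_variant}


def choose(experiment, user_id):
    """Pick the code path for this user; the only branching lives here."""
    return experiment[user_id % len(experiment)]


def render_greeting(user_id, name):
    # Call sites stay free of if/else; they just call whatever was chosen.
    return choose(GREETING_EXPERIMENT, user_id)(name)


print(render_greeting(2, "Ann"))
print(render_greeting(3, "Ann"))
```

Removing a failed experiment then means deleting one function and one table entry, which speaks to the bloat concern, though it is still a retrofit of the calling code.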

oscardelben · on June 24, 2010

I think statistical significance is the most important thing in A/B testing, but they only included links to external sources. It's hard to tell the difference between coincidence and a real difference between two versions when changing simple things like button colors, and I still feel ignorant about the subject.

paraschopra · on June 24, 2010

Do you think the external sources weren't enough? The main statistical tools used for A/B testing are the G-test, the chi-square test and the Z-test (after approximating the sum of binomial variables with a normal distribution).
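
To make two of the tests named above concrete, here is a minimal standard-library sketch of a two-proportion Z-test and a G-test on the same 2x2 data. The function names and the visitor counts are invented for the example, and the Z-test assumes samples large enough for the normal approximation to be reasonable.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided Z-test for a difference in conversion rates (pooled variance)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


def g_test_2x2(conv_a, n_a, conv_b, n_b):
    """G-test (log-likelihood ratio) on the 2x2 table, 1 degree of freedom."""
    conv_total = conv_a + conv_b
    total = n_a + n_b
    observed = [conv_a, n_a - conv_a, conv_b, n_b - conv_b]
    expected = [
        n_a * conv_total / total,
        n_a * (total - conv_total) / total,
        n_b * conv_total / total,
        n_b * (total - conv_total) / total,
    ]
    # Cells with zero observations contribute nothing to the sum.
    g = 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)
    g = max(g, 0.0)  # guard against tiny negative values from rounding
    # Chi-square survival function with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(g / 2))
    return g, p_value


# Hypothetical data: A converts 50 of 1000 visitors, B converts 65 of 1000.
z, p_z = two_proportion_z_test(50, 1000, 65, 1000)
g, p_g = g_test_2x2(50, 1000, 65, 1000)
print(f"z = {z:.2f} (p = {p_z:.2f}), G = {g:.2f} (p = {p_g:.2f})")
```

For a 2x2 table the square of this z statistic equals the Pearson chi-square statistic, and the G statistic is usually close to it, so the three tests tend to agree at reasonable sample sizes.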

oscardelben · on June 24, 2010

What I was trying to say is that in a "definitive guide" about ab testing I would expect to find some of it. You've done a great job linking all the external resources, and I probably couldn't have done it better, but I'd personally have appreciated more about statistical significance so that I could have understood the links you have included better.

paraschopra · on June 24, 2010

Sure, point well taken. My only fear was that discussing statistical significance may have made the article a bit intimidating but yes, if I were to call it "ultimate guide", I should have discussed it.

harrybr · on June 24, 2010

Why is everyone obsessed with AB testing sign up button color? Sure, it's easy to test, but it's such a tiny factor, in amongst huge things like copywriting, page layout and the inter-page structuring of user journeys...

paraschopra · on June 24, 2010

That's because that is the most "oh-wow" example you can give to a person who has never tried A/B testing before. A lot of people still don't do A/B testing because they either don't know about it or don't think it works. Sign-up button color examples work well for educating both types of people.

patio11 · on June 24, 2010

Just as important, skeptics can be convinced that the button mattered. People think with their eyes, so visually obvious changes feel real. If your first exposure to AB testing was me telling you that a one word tweak to the sidebar of my landing page increased signups by ten percent you might conclude that I am as barking mad as someone with a perpetual motion machine.

bad_user · on June 24, 2010

Still, even if the reaction is an "oh-wow" ... it still takes work to do, even for simple things like changing the color of a button.

And since we are talking about statistical confidence, you may end up wasting time instead of doing more meaningful work ... so I don't think A/B testing helps when you're small, unless you have the resources to spare.

I am happy to read about tools and advice that I can use. Nice article.

paraschopra · on June 24, 2010

>so I don't think A/B testing helps when you're small, unless you have the resources to spare.

That may not necessarily be true. I recently blogged on this topic: http://visualwebsiteoptimizer.com/split-testing-blog/optimiz...

sfard · on June 24, 2010

Hey - I work in design innovation for a very large online travel company (e.g., Expedia, Priceline, Booking). Here are some additional thoughts:

(1) Don't get obsessed with statistical significance. 95% is an arbitrary number - sometimes you need to make decisions and a 70% probability of one option being better, while not ideal, can be enough to support a decision.

(2) Volume! You hear of case studies where people changed a color and conversion went up 20%, but that rarely happens in the real world. With smaller improvements you need A LOT of volume to get statistical significance, so think about what measurement really matters - sometimes there are proxies for conversion that require less traffic (e.g., click through vs. conversion).

(3) Tradeoffs and attribution. It's easy to make conclusions like "this element increased conversion - so it's good" but sometimes page elements can improve a desired outcome at the expense of other things. For example, we did an A/B on a hotel page that increased conversion on that specific hotel, so people concluded it must be good, but it came at the expense of cross-selling other hotels.
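
To put rough numbers on the volume point, the sketch below uses a common normal-approximation formula for the sample size of a two-sided, two-proportion test. The function name, the 5% base conversion rate, and the 80% power default are assumptions chosen for illustration, not figures from the comment above.

```python
import math
from statistics import NormalDist


def visitors_per_variant(base_rate, relative_lift, alpha=0.05, power=0.8):
    """Approximate visitors needed in EACH variant to detect a relative lift.

    base_rate: current conversion rate, e.g. 0.05 for 5%.
    relative_lift: improvement to detect, e.g. 0.20 for +20%.
    alpha: two-sided significance level (0.05 corresponds to "95%").
    power: probability of detecting the lift if it is real.
    """
    p1 = base_rate
    p2 = base_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)


# Hypothetical scenarios at a 5% base conversion rate.
for lift in (0.20, 0.05):
    for alpha in (0.05, 0.30):
        n = visitors_per_variant(0.05, lift, alpha=alpha)
        print(f"lift +{lift:.0%}, alpha {alpha:.2f}: {n} visitors per variant")
```

Under these assumptions a 20% lift needs on the order of eight thousand visitors per variant at the 95% level, while a 5% lift needs well over a hundred thousand. Relaxing to a 70% level cuts the requirement substantially, which is the trade-off described in point (1).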

paraschopra · on June 24, 2010

I am pretty amazed at your point about concluding test results even at 70% confidence. Do you do it only when the stakes are not high, or is it the norm?

miguelpais · on June 24, 2010

I just get the impression that someone who wants to do A/B testing after reading this article will just create a different version of the website, and probably analyze the average of the results in each version and end up deciding solely on that.

I hope they don't miss the second point of the do's; it should be emphasized more:

> Don’t conclude too early. There is a concept called “statistical confidence” that determines whether your test results are significant (that is, whether you should take the results seriously). It prevents you from reading too much into the results if you have only a few conversions or visitors for each variation. Most A/B testing tools report statistical confidence, but if you are testing manually, consider accounting for it with an online calculator.

By the way, Excel can do it.

nhebb · on June 24, 2010

I would love to see someone with artistic talent create A/Button.com. I've tried a number of Windows based button generators, and they all sucked. Same could be said for the on-line tools I've seen. Recommendations are welcome.

eliot_sykes · on June 25, 2010

Warning: I've read that A/B testing extreme differences might get wrongly diagnosed as cloaking by the search engines. My guess is that this is more likely to happen with a server-side A/B test than with a JavaScript-implemented one.

Vindexus · on June 24, 2010

This site may interest you guys: http://www.abtests.com/

bdickason · on June 24, 2010

Great post paras!

paraschopra · on June 24, 2010

Thank you :)