

Most AB-tests will fail - m4xt3r
http://www.jitbit.com/news/185-most-of-your-abtests-will-fail/

======
AJ007
#1 It's good to see a post using statistically significant sample sizes. Most
articles I have read on this topic either use sample sizes so embarrassingly
tiny (30 to 300) that it raises questions about whether the author even
understood his statistics class, or simply don't include numbers at all.

#2 If your audience is already pre-sold on your product when they visit your
site, landing page changes won't be as meaningful. Possibly relevant to
Jitbit, in this case.

#3 When you do conversion optimization, eventually you hit a number that
simply becomes unbeatable. Stating the obvious, you can't convert at over
100%. Depending on traffic source/quality/intention you may find that ceiling
to be lower, around 60-70%.

~~~
btilly
In reverse order.

#3. For real companies with completely unmotivated visitors, a 60% conversion
rate from visitor to paying customer is unattainable. 5% is probably too much
to hope for.

#2. Absolutely. A/B tests should be focused on actual balanced decision
points. "Should I sign up?" "Should I open this email?" "Should I go back to
this site?" But not on people who are already committed to doing what they
were going to do anyways.

#1. On sample sizes, it is important to think carefully about the maximum
effort you're willing to put into a test. Be very, very cautious about
accepting test results that arrive early. No, 99% confidence is not enough to
stop with 200 conversions, and 99.9% probably isn't either. Don't worry about
statistical significance when you kill tests that have run too long to be
worth continuing. This is a line of reasoning that is sadly rare in our
industry. (I keep meaning to write my next article on that topic. But
http://elem.com/~btilly/ab-testing-multiple-looks/part2-limited-data.html
explains one way to come up with such a strategy.)
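
To make the peeking problem concrete, here is a minimal sketch (my own
illustrative simulation, not code from the linked article) of an A/A test
where nothing differs between the variants, yet stopping at the first peek
that hits p < 0.01 still "finds" winners far more than 1% of the time:

```python
# Sketch: simulate A/A tests (no real difference between variants) and
# stop at the first peek that shows p < 0.01. Counts how often we would
# wrongly declare a winner. All numbers are illustrative assumptions.
import random
from statistics import NormalDist

def two_sided_p_value(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test, two-sided p-value."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = (p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b)) ** 0.5
    if se == 0:
        return 1.0
    z = abs(conv_a / n_a - conv_b / n_b) / se
    return 2 * (1 - NormalDist().cdf(z))

def stops_early(base_rate=0.02, visitors=50_000, peek_every=1_000):
    """True if any peek reaches p < 0.01 despite there being no real effect."""
    conv_a = conv_b = 0
    for i in range(1, visitors + 1):
        conv_a += random.random() < base_rate
        conv_b += random.random() < base_rate
        if i % peek_every == 0 and two_sided_p_value(conv_a, i, conv_b, i) < 0.01:
            return True
    return False

false_winners = sum(stops_early() for _ in range(200))
print(f"Declared a 'winner' in {false_winners} of 200 A/A tests")
```

Fixing the sample size in advance, or using a properly designed sequential
approach, is what keeps that error rate where you think it is.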

~~~
pbiggar
Look at OP's point about traffic source. It's likely that one of your traffic
sources converts way better than 5%, while unqualified people straight from
Google convert much lower.

------
czr80
This takeaway: "Don’t finish your tests before you have a statistically
significant result" is potentially highly misleading.

See: http://www.evanmiller.org/how-not-to-run-an-ab-test.html

------
mikebell
Just because the A/B test doesn't produce the results you expected doesn't
mean it produced no results. Research scientists who don't prove their
hypothesis don't say they learned nothing from the experiments; they just
write a different paper.

Just to pick one example from this post: the whole-site redesign didn't change
conversions. Did it change something else? More importantly, you now know that
investing effort in a full redesign didn't really have any effect. Next time
someone suggests changing the site, you have evidence that it may not be worth
the engineering cost, and that maybe it's better to focus somewhere else
first.

------
drhayes9
> No company have ever increased the revenue by changing an insignificant
> detail like a button color. None. Zero. Get over it.

I'm sensitive to hyperbole, and I know he later refines his statement, but
this is not true. Google Ads change what seem to be insignificant details by
tiny amounts all the time (luminance on links, vertical space below the search
bar, padding around sitelinks) and derive significant revenue changes.

I think one issue is that most sites' A/B test results are difficult to
distinguish from noise, while Google has what you might call a significant
user base.

~~~
janesvilleseo
I agree with your above statement. However, in regard to your example, I often
wonder how much of the increase is due to Google overcoming ad blindness and
how much is because the ads are truly more engaging.

~~~
drhayes9
Good question. That's where longer-term measurement comes in. Overcoming ad
blindness is a transitory thing that goes away after a period of time while
engagement tends to stick (as measured by CTR, say).

------
btilly
True, most A/B tests fail.

However, looking at the listed tests, they mostly seem to be general branding
tests. Changing your branding, including fairly major redesigns, reliably
fails to make a difference.

But lots of other kinds of changes reliably do make a difference. Big button.
Putting signup link in multiple places. The word "Free". Streamlining your
form. (No, you don't want to ask for first/last name.) Email headlines.
Providing relevant results first.

However, they make small differences. You need to plan on tens of thousands of
successful conversions in your test. You can make this easier by running
multiple tests at once. If you're a small company, you can make the conversion
be "to the next step in the conversion funnel" instead of "to the end". (But be
warned, companies that test long enough and get large enough tend to find
cases where that shortcut burned them.)

However it is very important to set expectations. Expect to find few big wins.
Expect most to be modest wins. Keep track. For an established company, if A/B
testing is adding 5-20% to your bottom line each year, it is totally
worthwhile. (If you have traffic and have not been testing, then odds are that
your first year will find more than a 20% boost. But this declines in future
years.)

A/B testing is a valuable tool. It isn't magic pixie dust.

------
tribeofone
"No company have ever increased the revenue by changing an insignificant
detail like a button color. None. Zero. Get over it."

This is not entirely true: if you have enough traffic, even the slightest
change can make a difference. HOWEVER, for 99% of sites what you say is true.
A/B testing should be used last, to squeeze out the last few percent of
optimization once you've done everything else. Not as a 'first step'.

One cool thing that can come of these tests, as an anecdote, is to see what is
really working on a page, as opposed to what needs to be optimized.

We ran a multivariate test once for a signup form. The results were all
similar, except for one which was much, much worse than the others. We could
not understand why, because there were no big differences between the
elements, but this one particular combination was doing badly. Turns out the
headline was longer for one of the variations. This wrapped the text and
pushed the call to action down below the fold, resulting in a ~30% drop in
conversion compared to the variations where it stayed above the fold.

~~~
m4xt3r
Those things happened (and still happen) a lot to us. At first we were all
excited when there was a significant difference between alternatives, but now
I just assume that something is wrong by default.

------
JacobJans
In my experience, a focus on A/B testing can sometimes severely limit you.

How? Because sometimes you need a change in strategy, and A/B testing is more
about changes in the implementation of a strategy.

For example, sometimes you need to do 'lead gen' instead of 'add to cart.'

Or, sometimes the answer to the problem is to spend a bunch of money on buying
Facebook likes from Facebook, instead of tweaking your landing page design.

Or, better yet, sometimes the best strategy is to keep the landing page
exactly the same, because it's already optimized, and focus on improving the
product, so your customers will stick around longer.

That being said, always be ready to split test everything. And this means,
always keep close track of your numbers.

Why? Because as your strategy shifts, you don't want to go blind. You'll have
lots of implementation ideas, and you need to know which ones work best.

------
LanceJones
Interesting headline on the HN post. Joanna and I at Copy Hackers respectfully
disagree. A properly-designed split test will ALWAYS produce a result --
you'll generate negative, positive, or neutral lift. And no matter what type
of lift you achieve, you can ALWAYS learn from the test.

We just concluded an 11-site split test where we changed nothing but the home
page headline -- written as a value proposition -- and we produced 9 of 11
"winning" tests, with an average conversion increase of 34% on the primary KPI
(and the 34% includes the "losing" tests).

Bottom line is (from our many years of testing copy)... you have to change
meaningful things to create conversion lift. And "meaningful" means in the
eyes of your visitors.

------
aresant
Most A/B testing will fail when you're testing the wrong elements.

I have a challenge for you.

The examples you list are top-of-the-funnel tests: buttons for engagement, a
homepage redesign.

But when I go deeper into your funnel - your pricing grid and cart checkout
pages - I see lots of tweaks that, in my experience, can drive the significant
gains that you are looking for.

I'm going to use this page as my starting point, by way of example:

http://www.jitbit.com/hosted-helpdesk/purchase/

When I clicked Order Now on the "Startup" plan and got to the cart page, here
was my experience:

a) The time from click to page load for checkout was >3 seconds for me. You
can get some idea of why here http://tools.pingdom.com/fpt/ or here
http://www.webpagetest.org/

b) Your design changes drastically between those pages, breaking the user's
flow and attention. 37signals products do a nice job of maintaining
consistency from pricing grid to checkout, as an example.

c) You're treating a SaaS checkout like a product checkout - asking about
quantity, list price, etc. What % of your users add multiple services to the
cart? Can you keep this in more of a flow?

d) You have no visible security certificate on the page at all, when in fact
even the placement of your security seal on checkout can have a drastic impact
on conversion, e.g.
http://www.conversionvoodoo.com/blog/2010/07/proper-placement-of-trust-logos-can-make-a-huge-difference-in-conversion-rate/

e) Yes, some people are still confused by what a CVV2 / CVC2 code is, and
displaying a visual explanation can help your conversions.

So my challenge for you is to focus on the RIGHT area of your page to test -
your highest-intent traffic - and I think you will find that far fewer cycles
get wasted, and that A/B testing remains one of the highest-ROI activities
your org can embark on.

~~~
m4xt3r
Thanks a lot for taking the time to write this awesome comment!

The thing is we have very little control over the checkout area, since it is
hosted by our payment provider and we cannot run tests there. We are not happy
about it and will move to Stripe as soon as it becomes available outside the
US.

Really, thanks a lot for this. I will look into it right now; maybe we can fix
some of these.

EDIT: We've removed the quantity field from the SaaS products, never thought
of that before, thanks.

~~~
jordo37
I would also highly encourage you to look at Balanced or Stripe for checkout -
the integration isn't too bad, the pricing is great and you have full control
over the look and feel of your checkout pages.

Also, great tip on the CVV2 info, realized we didn't have it in our system
either.

------
pbiggar
I think this post and most posts on the topic are missing a major advantage of
A/B testing. Sure, one of its uses is to experiment and try to keep optimizing
everything. The other use of A/B testing is to prevent regressions in a
statistically rigorous way.

Basically, if you can afford it, every change to the website could be A/B
tested, and you'd know that the change doesn't negatively impact the
conversion rate. Sometimes it's really useful to know that you can safely
change the button from red to a colour that matches your palette better, and
"no change" is a great result.

~~~
JacobJans
I've been using this thinking recently to great advantage.

My basic strategy has been to make the design more beautiful, and if the
conversion rate stays the same or improves, I've succeeded.

This way, I've been able to significantly improve the aesthetic appeal of many
of my sites without risking the bottom line.

------
Slix
> No company have ever increased the revenue by changing an insignificant
> detail like a button color. None. Zero. Get over it.

This contradicts Wired's article about A/B testing, which used the Obama
campaign as an example.

[http://www.wired.com/business/2012/04/ff_abtesting](http://www.wired.com/business/2012/04/ff_abtesting)

~~~
m4xt3r
Alright, that was a bold statement on my side. It's true for 99.9% of
companies, not 100%.

Again, most of us don't have enough traffic to run successful button color
tests.

~~~
justinmarsan
Have you tried MAB? It seems like the best way to do A/B testing on small-
traffic pages: let it run forever without losing much, thanks to the reduced
frequency of the poorly performing variants.

Set up your test with 3 or 4 colors and the program will automatically favor
the best one over time, so it doesn't matter how long you let it run - weeks,
months or years - and when you don't want to test anymore you pick the one
that performed best over the many visitors you've had over that long period.

http://en.wikipedia.org/wiki/Multi-armed_bandit
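
A minimal sketch of the idea (Thompson sampling over simulated conversion
rates; the "true" rates below are made up for the simulation, and in practice
a testing tool does this bookkeeping for you):

```python
# Sketch: Thompson-sampling bandit choosing between button colours.
# The "true" rates are made-up numbers for the simulation; in production
# each reward would come from a real visitor converting or not.
import random

true_rates = {"green": 0.020, "blue": 0.022, "orange": 0.021}  # unknown in real life
wins = {c: 0 for c in true_rates}
losses = {c: 0 for c in true_rates}

for visitor in range(100_000):
    # Draw a plausible rate for each colour from its Beta posterior and
    # show this visitor the colour with the highest draw.
    colour = max(true_rates,
                 key=lambda c: random.betavariate(wins[c] + 1, losses[c] + 1))
    if random.random() < true_rates[colour]:
        wins[colour] += 1
    else:
        losses[colour] += 1

for c in true_rates:
    print(c, wins[c] + losses[c], "impressions,", wins[c], "conversions")
# Most traffic gradually drifts to the best-performing colour, so a
# long-running test wastes relatively little on the losing variants.
```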

------
posabsolute
Not quite sure I fully agree with this. Having played with A/B tests myself
for big-box advertising, changing the message has led to tremendous changes.
Same for the website: conversions have changed a lot, not always for the
better.

I do fully agree that button color really doesn't matter, which may explain
OP's new vs. old website conversion rates; frankly, it's about the same
layout, without any different calls to action.

------
alohahacker
"No company have ever increased the revenue by changing an insignificant
detail like a button color. None. Zero. Get over it."

I remember Josh, the founder of Omniture, describing how they first got hired
by eBay back in the day. Changing the color of the eBay homepage and button to
yellow helped eBay increase sales by 15%. That slight increase on a heavily
trafficked website resulted in millions of dollars in increased sales.

------
Felix21
If all you test is the layout then, for the most part, you are Right! Right!
Right!

But...

Even with little traffic, you can still achieve significant gains by focusing
on the copy as opposed to the layout and colours.

I learnt this mostly with Adwords. At the prices they charge now, you just
have to find the gains wherever they are.

~~~
m4xt3r
Same principle applies to copy, especially to microcopy. If you change a
couple of words in a headline, it rarely makes a difference.

~~~
btilly
Depends where the copy is. Changing a couple of words in an email subject line
can and often does make a huge difference. Of course the fine line you walk is
that some of the most effective words at driving behavior (Free! Now!) are
also effective at getting you marked as spam.

~~~
m4xt3r
I think wording in the subject line is important, because it is the only thing
a user sees. There are no other factors that affect whether he will open it or
not.

------
grandalf
Using A/B testing and then measuring conversion rate skips a lot of useful
steps.

A/B testing can be used to increase engagement with the site, which may result
in a conversion days or weeks later.

~~~
m4xt3r
Yes, but those things are impossible to track. If metrics change weeks later,
there is no way of knowing that it was caused by a test you ran earlier.

~~~
lucisferre
It's getting easier and easier to track; ad retargeting works this way, for
example. But yes, it is still harder to track accurately.

------
ebbv
In our bizdev meeting yesterday we had a conversation that was very much along
the lines of this post. It completely mirrors our experience over years of A/B
testing.

------
TallboyOne
This article is full of so much wrong I can't even finish reading it. Why is
this on the front page?

Source: I do conversion optimization for a living.

~~~
lucisferre
Source: Your opinion. Perhaps you would like to elaborate and add to the
conversation instead.

