

How We Improved Our Conversion Rate by 72% - dmix
http://dmix.ca/2010/05/how-we-increased-our-conversion-rate-by-72/

======
patio11
A bit of professional curiosity: I note the site appears to be written in
Rails. What did you use for the A/B testing? [Edit: I see below that the
answer is A/Bingo. I approve, for the obvious reason.] Any opinions on it?

You probably already know this, but the statistical magic behind A/B testing
lets you know if A was better than B but not by how much. You can calculate
"Hmm, 60% more conversions" but that conclusion is written on water, as I
discover with disturbing heaven-smites-you-for-arrogance regularity every time
I mention the results of the test that way.

If you sustain the 70% higher level, though, hats off and keep spreading the
testing gospel!

~~~
dmix
I use A/Bingo [1] on CareLogger, although at work (Learnhub.com) we use Vanity
[2].

I found that Vanity splits the participants more equally. I noticed that with
A/Bingo one alternative would end up with 50 more trials than the others. Not a
big deal unless you check the dashboard constantly.

They both do a simple task well so either will work fine.

Regarding sustainability, this is something I've noticed as well. The
conversion rate fluctuates heavily depending on the day of the week (the
middle of the week is best). It swings back and forth, but 25% is the new
middle ground now, not just the good days as it was before.

[1] <http://www.bingocardcreator.com/abingo/>

[2] <http://vanity.labnotes.org/ab_testing.html>

~~~
patio11
Both A/Bingo and Vanity split participants in essentially the same fashion:
each new participant is assigned totally randomly. (Not only is it the same
effect, the algorithm we use for it is practically identical, too.)

This tends to produce a phenomenon well-known to coin flippers: the more coins
you flip, the closer the _percentage_ of heads and tails will converge to
50/50 and the farther your _counts_ of heads and tails will diverge from each
other.
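
A quick simulation makes that divergence concrete. This is just an
illustration in plain Ruby, not code from either library:

    # As n grows, the heads percentage converges to 50% while the
    # absolute gap between the heads and tails counts tends to grow.
    [100, 10_000, 1_000_000].each do |n|
      heads = n.times.count { rand < 0.5 }
      tails = n - heads
      puts format("n=%-9d heads=%.2f%%  |heads-tails|=%d",
                  n, 100.0 * heads / n, (heads - tails).abs)
    end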

~~~
dmix
I figured as much. It's likely I noticed it because the environment where I
use Vanity gets much more traffic than CareLogger, so the percentages even out
much more quickly.

------
mburney
I really like articles like these, because they give concrete details rather
than just general advice like "be determined, never give up", etc.

~~~
AndyKelley
Do a barrel roll

------
Groxx
I wonder if the red/green difference is partly due to blues and greens
becoming common for sign-up buttons, so red stands out. Could it just be an
example of staying with, or ahead of, the curve?

~~~
dmix
Yep, as I mentioned in the article, it worked so well primarily because green
was used on multiple parts of the homepage, so the red was a significant
contrast to the rest of the layout.

It's like 7UP's logo: they put a red dot on it so it draws your eye when
you're scanning a row of cans.

------
chegra
idk, what would be interesting would be actual figures, not percentages.

Maybe you had 7 clicks before and now you have 12. Or maybe you had 14 clicks
and now you have 24. The actual figures would help us judge the significance
of the result.

~~~
dmix
I did share my signup conversion rate at the beginning; you can apply that to
the trials to get an idea.

Here's an example I grabbed from A/Bingo dashboard for the 2 different
headlines test:

Version 1: 672 participants - 96 (14.42%) conversions

Version 2: 683 participants - 129 (18.97%) conversions

~~~
paraschopra
It might not be a big deal, but the difference isn't significant at the 95%
confidence level. It is significant at the 90% confidence level, but I
personally prefer to shoot for >95% and ideally a 99% confidence level.
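
For anyone who wants to check numbers like these themselves, here is a
minimal two-proportion z-test sketch in Ruby. This is just one standard
method, not necessarily what any particular dashboard does; tools differ
(pooled vs. unpooled variance, one- vs. two-tailed tests, interval-overlap
comparisons), which is partly why borderline results get called differently:

    # Pooled two-proportion z-test: |z| >= 1.96 is the usual bar for
    # 95% confidence (two-tailed); |z| >= 1.645 for 90%.
    def z_score(conversions_a, trials_a, conversions_b, trials_b)
      p_a = conversions_a.to_f / trials_a
      p_b = conversions_b.to_f / trials_b
      pooled = (conversions_a + conversions_b).to_f / (trials_a + trials_b)
      se = Math.sqrt(pooled * (1 - pooled) * (1.0 / trials_a + 1.0 / trials_b))
      (p_b - p_a) / se
    end

    puts z_score(96, 672, 129, 683) # the headline test above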

~~~
tansey
Out of curiosity, why?

To me, there is a difference between theoretical significance and practical
significance. A 90% likelihood that the second version is actually better than
the first version is enough for me to switch.

What is the downside of switching? About 10% of the time you'll be making a
change that is no better than the old version. Unless you REALLY love green
buttons, I think it's worth the risk. :)

~~~
paraschopra
There isn't any downside. But if the test costs are negligible and you can
afford to run the test for a week longer, it is always worth doing so. I have
seen too many tests where the confidence level, after touching >95%, came back
down to 70% or so once the test was extended.

An even better way is to do a follow-up A/A test where both variations are
red. If you still see significant variance in that test, then I don't think
you should take the original results seriously.

When you are testing, it is always better to try to prove a hypothesis wrong
than to try to prove it right.

EDIT: clarified some parts.

~~~
jules
Perhaps multi-armed bandit algorithms can help here. They automatically
balance testing which version is better against using the best version as
much as possible.

The multi-armed bandit problem models a gambler with a number of levers at his
disposal: he chooses which lever to pull and then receives a reward. In this
case lever 1 is "show page version A" and lever 2 is "show page version B".
The algorithms balance discovering which lever is best against pulling the
best lever.

Here's a very simple example algorithm (sketched in code below). Record the
average profit for page A and page B in two variables. Now, with probability p
(for example p=95%), choose the page with the highest average profit so far;
with probability 1-p, pick one at random. A more advanced algorithm could vary
p over time so that it starts at 0% and increases towards 100%.

<http://en.wikipedia.org/wiki/Multi-armed_bandit>
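
A minimal sketch of that epsilon-greedy scheme in Ruby, where epsilon = 1-p
(the class and method names here are made up for illustration):

    class EpsilonGreedy
      def initialize(arms, epsilon = 0.05)
        @arms    = arms     # e.g. [:version_a, :version_b]
        @epsilon = epsilon  # 1-p: how often we explore at random
        @pulls   = Hash.new(0)
        @profit  = Hash.new(0.0)
      end

      # Try each arm at least once, explore at random with probability
      # epsilon, otherwise exploit the highest average profit so far.
      def choose
        untried = @arms.select { |a| @pulls[a].zero? }
        return untried.sample unless untried.empty?
        return @arms.sample if rand < @epsilon
        @arms.max_by { |a| @profit[a] / @pulls[a] }
      end

      # Record the reward observed after showing a version.
      def update(arm, profit)
        @pulls[arm]  += 1
        @profit[arm] += profit
      end
    end

On each page view you'd call choose to pick a version, and call update when
you observe the resulting profit (e.g. zero, or the sale amount).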

------
charliepark
Out of curiosity, was the change to "get started now" based on my comment here
(<http://news.ycombinator.com/item?id=1380017>)? I've been lobbying for more
people to try that language and to share their results. Thanks for doing that,
even if it was independent of my own stuff.

~~~
dmix
I did come across a discussion on HN recently about "Get Started" that
inspired me to do it.

Although, checking, that particular comment was from 4 days ago and I ran the
test before that, so maybe it was one of your earlier comments?

------
rriepe
The "signup for free" part had me wondering. "Sign up" is the verb, where
"signup" is a noun. I wonder if the benefit was just in eliminating the
misused word. Most other A/B tests I've seen favor the phrase with "free" in
it.

It's similar to the "login" vs. "log in" discussion, but I think it's a bit
more clear cut with "sign up."

~~~
TheSOB88
Grammatical mistakes like this have always bothered me personally, but most
people don't even realize it's wrong. So it's odd to think that this would
have an effect on the populace at large.

Perhaps this site caters to an unusually literate part of the
diabetes-suffering population? I guess those who are technically inclined tend
to be more educated.

------
underdown
Good read, but I would point out that the whole green/red button thing is
completely dependent on your site design. I've run dozens of split tests on
dozens of sites and there is no one right answer. In fact, sometimes
increasing contrast on conversion points lowers the conversion rate. The
change in message is the big takeaway from this article.

------
nreece
Related reading on the button color test: Red Beats Green -
<http://blog.performable.com/post/631526233/button-color-test-red-beats-green>

------
paraschopra
Dmix, I must congratulate you on your success! Plus your site design is very
professional. Great job.

I noticed on your homepage you are still using 'Sign up for free' (at the
bottom). Any specific reason for that?

~~~
dmix
About 75% of our signups come from the homepage CTA, so I only bothered to
test that one. Also, I wasn't sure if A/Bingo lets you run the same test in
multiple places.

I'm working on replacing the green footer calls to action with "Get started
now" today.

------
moolave
Congratulations! I like empirical inputs like these. I also read in one of
KISSmetrics' articles that using the phrase "It's Free" also increases
conversion rates.

~~~
AndyKelley
I've become suspicious of the word "free" on any website. It almost has the
opposite of the intended effect on me. I wonder if this is true for other
people?

~~~
maushu
You aren't the only one. My guess is that in a few years the 'free' effect
will decrease (though not reverse, since there are always new users showing
up).

------
GrandMasterBirt
Thanks for the info.

Question though: how do you get these measurements? Out of X people who visit
the website and don't log in, how many sign up versus not?

~~~
patio11
<http://www.bingocardcreator.com/abingo> , apparently.

The brief version is that you cookie each visitor with a random unique
identifier. Each identifier maps to one of the versions under testing in a
durable fashion. The first time you see an identifier for a particular test,
you increment the participant count for the appropriate version. When a
conversion -- here, a signup -- happens, you look at the identifier, check
which version they saw, and increment the conversion counter. From there it is
just a math problem. So when you see a conversion rate like 24%, that means
24% of the people who viewed a page the test was active on signed up before
becoming lost to the system (by, for example, leaving forever, clearing
cookies, etc.).
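
A sketch of that mechanism in Ruby. These helper names are mine, not
A/Bingo's actual API; the point is that assignment is random per visitor but
deterministic per identifier:

    require 'digest/md5'
    require 'securerandom'

    # Give each visitor a durable random identity (stored in a cookie).
    def visitor_identity(cookies)
      cookies[:ab_identity] ||= SecureRandom.hex(8)
    end

    # The same identity and test name always hash to the same version,
    # so a returning visitor keeps seeing what they saw the first time.
    def alternative_for(identity, test_name, alternatives)
      digest = Digest::MD5.hexdigest("#{test_name}:#{identity}").to_i(16)
      alternatives[digest % alternatives.size]
    end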

Although one could theoretically exclude folks who log in later from the
participant count, I think that is a poor use of your time for most people.
Existing site users will be split across all alternatives evenly, so their
failure to sign up for the site affects all alternatives equally. Since A/B
testing doesn't really care about the exact value of the conversion rates and
focuses on the differences between them, that comes out in the wash. (Plus,
for many services, first-time visitors swamp existing users of the service, so
even if you were worried about distortion it would be minimal.)

