
Redesign: Users: Thrilled. Conversion Rates: Up. Sales: Unchanged.  - wglb
http://www.kalzumeus.com/2012/04/19/ab-testing-is-frustrating/
======
patio11
Happy to answer questions, as always.

By the way, roughly 3/4 of A/B tests I participate in (my own and for clients)
fail to improve business results. This compares with approximately 2% of A/B
tests I've ever seen with in-depth blog posts, so I tried to redress the
balance here.

~~~
maukdaddy
How do you break that news to clients after they've paid $$$ for your
services? Do you heavily caveat the potential before the engagement?

~~~
patio11
That's surprisingly less of a problem than I naively thought it would be. I'm
upfront with clients prior to, during, and after engagements, and I try to set
expectations accurately. That said, for the overwhelming majority of my
clients, if e.g. the entire budget for my wedding was spent on a project whose
final conclusion was "Experimental results: we now know one more thing that
didn't work" that would not be a catastrophic loss.

Businesses routinely spend $$$, $$$$$, and $$$$$$$$$ on things that don't end
up working out the way they were planned. They're pretty much OK with that,
since they run "portfolio strategies" in terms of directions. Nobody is
betting the company on a two-week engagement with me, any more than they bet
the company on any particular man-month of engineering time.

n.b. This is partially influenced by being picky about clients and working
with people who are clueful and have budget. This is one reason why if a
startup with $X0,000 in the bank comes to me and asks to roll the dice on an
engagement I give them a link to my blog, some suggestions, and invite them to
come back when we're better positioned to help each other out.

~~~
adrianhoward
I suspect that people hire you not to do "a successful experiment" - but so
that they can learn _how_ to experiment in productive ways.

It's a skill that seems sadly lacking with many of the folk that I talk to :-)

------
csomar
I'm not sure if I have high expectations, became too damn good at it, or this
is just a bad designer.

So take this as constructive criticism: I have nothing against the
designer, nor do I know him.

I'll begin with the HTML/CSS parts. He is a Web Designer, and not simply a
designer; so he should use best practices:

1- Use the HTML5 Doctype

2- Remove unnecessary whitespace (empty lines and spaces). Why the extra bits?

3- JavaScript, I see that you are loading a good chunk of JavaScript in the
HEAD and you are loading a bunch of it in the end. For example the Amazon JS
file
([http://s3.amazonaws.com/new.cetrk.com/pages/scripts/0004/737...](http://s3.amazonaws.com/new.cetrk.com/pages/scripts/0004/7378.js))
is certainly not usable, and is yet a whole HTTP request.

4- In the JavaScript file
([http://images2.bingocardcreator.com/javascripts/bcc-all.js?1...](http://images2.bingocardcreator.com/javascripts/bcc-all.js?1134712726044062)),
you are loading a ColourPicker and other unused stuff. This is a waste of
bandwidth, latency, memory and speed. You don't need JavaScript in your main
page, apart from the analytics and maybe A/B testing stuff.

5- Unobtrusive JavaScript. (line 181 of the HTML)

Enough, though there are endless problems with the coding part so don't take
this list as an exhaustive review. For the design, it just sucks. I agree that
simple designs (K.I.S.S) are better, but they should be crafted. I won't
complain much, but here are two snapshots of what I'm talking about

1- <http://dl.dropbox.com/u/2777218/shot1.png>

2- <http://dl.dropbox.com/u/2777218/shot2.png>

Again, take this as constructive feedback; I have nothing against your
business, and what matters in the end is the sales/$$$.

Take a look at <http://www.premiumpixels.com/> if you want to see some
carefully crafted designs.

~~~
tptacek
Patrick bootstrapped himself out of his previous full-time programming job,
replacing his salary and then some, in single-digit hours _per week_ , using
the OLD DESIGN.

You gave a numbered list of good suggestions and wrote the kernel of a
detailed, thoughtful comment, but sabotaged it with poor framing and
assumptions. I don't care or anything, but for your benefit: this is a nerdly
inclination that will serve you very, very poorly in your professional career.
HN often has the opposite problem from your comment (a bias towards being
"right" even when we're not "correct", such as in civil liberties stories);
here, you're "correct" but not "right", since adoption of all your
recommendations is unlikely to change the bottom line for Patrick at all.

One consultant to (I presume) another: my suggestion for next time: either:

(a) (expensively) do exhaustive research so you can frame your high-level
assessment and recommendations in the context of someone's business (ie,
expend the effort to make sure you're both correct _and_ right), _or_

(b) (much simpler) develop a habit of writing your comments in a neutral,
helpful tone, so that you can be correct without having to be right at all.

By the way, I feel comfortable writing this comment because I have _oh my God_
exactly the same problem with my comments. Look at me on a crypto thread
sometime.

~~~
csomar
I apologize to Patrick and the readers if my comment did more harm than
good. That wasn't my intent. Sometimes I'm a bit of an "asshole", and I have
to acknowledge that. I thank you for taking the time to correct that side of
me.

~~~
tptacek
Getting the tone of a message board comment wrong doesn't make you an asshole.

I hope.

------
InfinityX0
Something fundamentally missing from both versions - trust signals. Why does a
user want to trust this software? You go to <http://www.seomoz.org/>, you see
Zillow, Home Depot, Yelp, etc. "love their software". Same with 37signals -
WB and Kellogg's are using Basecamp? <http://37signals.com/> Why isn't my
company?

It is not explicitly apparent that anyone loves this software, especially
nobody they know or have heard of, so why should they use it? It doesn't have
to be who uses it, but could also be "featured on" if you (Patio11) got
coverage and co-linked to the service there as well, which I bet is more
prominent. Of course, families don't care about Techcrunch - relevant news is
needed.

~~~
godDLL
I think that's important to people that want to be in good company. Not when
futzing around with your Macintosh, trying to put together 30+ bingo cards for
tomorrow's class.

EDIT: removed an article to clarify

------
hop
Honestly I think the site is really ugly - the color scheme, all the same size
text, '96 aesthetic... it looks like a ghetto SEO trap website. Please don't
take offense, that's my snap judgement on seeing it. What if you tried letting
a designer do something on par with your payment processor Stripe.com -
picture those cards on the right as bingo cards, good simple headline, call to
action.

~~~
ashraful
I hope you're talking about the old design.

The site shows the old version to some visitors. You can click the link in the
footer to get the new redesign.

~~~
sim0n
Both designs are pretty bad unfortunately.

~~~
sunkencity
I agree. Old site, doesn't look that good, but benefits from old school
'mom&pop web design' look that might or might not inspire trust in buyers.

New site has less character and is more bland. I cannot get over the cards and
the mac. Why is the background in the screen the same as outside? It totally
breaks the message of software in a computer. Make it black at least, so that
the image can be read more easily; right now the computer looks like an open
frame. Or show a stylized screenshot of the software. And why do the cards look
so boring? Add some shadows or something. Why are links underlined in the menu?
Why is try now more prominent than buy now? The new design needs to get its
priorities straight!

I would like to see an A/B test on whether changing some of the fonts to
ComicSans would increase conversion. I think that for this audience it might.

~~~
silvestrov
Ditto. The new design looks too bland, like a gazillion other 'designed'
sites. Keeping the mom&pop look is important.

I'd keep the old layout, but fix the colors. If you get a designer to improve
the colors and the gradients so they are less Windows 95-ish, it would be much
better.

~~~
philh
Keeping the mom&pop look is important for what?

~~~
sunkencity
Trust. The mom&pop look properly executed is the perfect carrier to convey
honesty and earnestness.

------
jseims
I own a subscription-based online business, and I've done many A/B tests of
new designs, and my conclusion is pretty similar to yours.

Namely, it's always _way_ more work than you expect, especially when you have
legacy customers with prior expectations.

And results hardly ever budge.

My hypothesis is A/B testing can move the needle for light engagement, like
"try a free trial". But pulling out a credit card requires a lot of
motivation, and the 0.1% of your visitors that have this motivation are
relatively unaffected by your design.

~~~
patio11
Any particular change (like this redesign) might not move the needle, but
there exists SUBSTANTIAL evidence that A/B testing can help the bottom line of
the company. Read e.g. any of the multiple times the 37 Signals blog covered
how they got like 40% lifts to paid account signups by redoing a pricing or
landing page.

Do you find me credible on this topic? I'm legally, morally, and practically
constrained regarding how much detail I can go into here, but if you trust me:
I have recent experience which makes me disagree in the _strongest possible
way_ with the generalization of your hypothesis. A recent client CCs me on
their weekly Optimizely status report. I've been evangelizing A/B testing for
a few years. Every time I read one of those emails my mind gets blown again.

~~~
femto
Are you aware of anyone experimenting with the automatic evolution of web
pages?

For example, use genetic programming to periodically modify the page served
up. Both pages are served up, A/B style, and the page which has the highest
metric (eg. $ earned) survives. Repeat.

It would be interesting to see what the results are after an extended period
of time.

~~~
Estragon
Not quite what you're asking about, but an interesting related innovation:

[http://untyped.com/untyping/2011/02/11/stop-ab-testing-and-m...](http://untyped.com/untyping/2011/02/11/stop-ab-testing-and-make-out-like-a-bandit/)
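
For anyone curious what a bandit looks like in code, here is a minimal sketch of epsilon-greedy, one simple bandit strategy (not necessarily the one the linked post describes). The function name, the `true_rates` list and the simulated visitors are all made up for illustration; a real site would record actual conversions instead of drawing random numbers.

```python
import random

def epsilon_greedy(true_rates, visitors=10_000, epsilon=0.1, seed=0):
    """Simulate epsilon-greedy traffic allocation across page variants.

    true_rates: hypothetical conversion rate per variant (unknown in real life).
    Returns (shows, wins): impressions and conversions per variant.
    """
    rng = random.Random(seed)
    shows = [0] * len(true_rates)
    wins = [0] * len(true_rates)
    for _ in range(visitors):
        if rng.random() < epsilon or 0 in shows:
            # Explore: pick a variant at random (also used until every
            # variant has been shown at least once).
            arm = rng.randrange(len(true_rates))
        else:
            # Exploit: pick the variant with the best observed rate so far.
            arm = max(range(len(true_rates)), key=lambda i: wins[i] / shows[i])
        shows[arm] += 1
        if rng.random() < true_rates[arm]:
            wins[arm] += 1
    return shows, wins

shows, wins = epsilon_greedy([0.02, 0.03])
print(shows, wins)
```

Unlike a fixed 50/50 split, the allocation shifts toward whichever variant currently looks better, while `epsilon` keeps a trickle of traffic going to the others in case the early numbers were a fluke.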

------
mcfunley
You don't have enough data to really draw any conclusions. In the last step in
the funnel, the completion rate with the redesign is between 94-97% with 95%
confidence. With the old design it's 92-96% with 95% confidence. Those regions
overlap and the designs might be the same or they might be different.
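
A rough sketch of that kind of check, using the Wilson score interval for a proportion. The counts below are hypothetical (the raw numbers are not in this thread), and the Wilson interval is an assumption about method, not necessarily how the ranges above were computed.

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return centre - half, centre + half

def intervals_overlap(a, b):
    """True if closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

# Hypothetical completion counts for the last funnel step.
new_design = wilson_interval(955, 1000)
old_design = wilson_interval(940, 1000)
print(new_design, old_design, intervals_overlap(new_design, old_design))
```

Overlapping intervals mean the data cannot rule out the two designs performing the same; they do not show that the designs are the same.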

------
smattiso
In the past you have talked about just using off the shelf themes from
ThemeForest and tweaking those. Is there a reason you thought BingoCardCreator
needed a truly unique design?

I'm building a couple sites at the moment and am having them designed from
scratch but I'm wondering how you determine the cost/benefit of tweaking
something off the shelf versus building your own?

~~~
patio11
BCC doesn't _need_ to look very good -- it sold $200k of software on the backs
of either a template or the uncoordinated mishmash of the previous designer's
work being slowly butchered by myself. I just was willing to experiment with
whether turning up the pretty a few notches would move the needle.

As to cost/benefits: I'm at a point in my life/business where "Spend a week
hacking a template to save < $2,000" is an _astoundingly_ poor use of my time.

------
ekanes
Thanks for sharing. Just wanted to add that if people's subjective perceptions
of BCC (modern! clean!) are improved, this may improve (admittedly hard to
measure) word of mouth. Keep it up!

~~~
wildwood
Actually, I wonder if a simple email referral template might help with nicely
measurable word-of-mouth. Plenty of parents will have their kids' teachers'
email address, and could shoot them a targeted recommendation with a little
help.

~~~
patio11
I tried giving people a send-to-friend option with a double-sided referral
incentive. (i.e. Think like Dropbox: you and they both get something if they
sign up.) It was epically unmotivational: only ~1,300 people clicked through
to send an invite to anyone (of > 50k users invited to), less than 200 people
signed up as a result of invites, and a grand total of _two_ ended up buying
BCC.
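
To put those numbers in funnel terms, here is a quick back-of-the-envelope sketch. It treats the rough figures above (~1,300, > 50k, < 200) as if they were exact, which they are not, so read the output as ballpark rates only. The stage labels and the `step_rates` helper are just for illustration.

```python
# Rough figures from the comment above, treated as exact for illustration.
funnel = [
    ("invited to refer", 50_000),
    ("sent an invite", 1_300),
    ("signed up via invite", 200),
    ("bought", 2),
]

def step_rates(stages):
    """Conversion rate from each stage to the next."""
    return [(name, count / prev_count)
            for (_, prev_count), (name, count) in zip(stages, stages[1:])]

for name, rate in step_rates(funnel):
    print(f"{name}: {rate:.1%} of previous stage")

overall = funnel[-1][1] / funnel[0][1]
print(f"overall: {overall:.4%} of invited users produced a sale")
```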

------
stevenj
Personally, I think the landing page should simply be this (minus the
"Featured Bingo Activities" heading and half-horizontal line on the bottom):
<http://imgur.com/GyRu8>

and then add the rest of the info that's currently below this part into an
additional tab up top, perhaps called: "Info" (and it should be the first tab
-- furthest to the left).

------
kiba
VLC is the media player that can play anything!

Well, when it couldn't play something, it was very memorable to me.

------
wildwood
Thanks for writing this, these are some cool numbers to poke at.

You mention in the write-up that you've already done a lot of work to get your
User Success percentage high. Is this the first time that you've seen a
disconnect between User Success improvements and sales increases? Or did you
pick this more as a useful, dramatic example?

------
pbreit
Bingo card creation is one of the few apps where I would probably avoid a
"simple, elegant, modern" design. I would expect more of a cartoonish, game-y
design.

------
MarkMc
Great article, Patrick, but I'm a little concerned about this line: "I will
likely finalize the redesign and kill the old version in the coming weeks."

Please, please tell me that if you kill the old version you will have the
numbers to show that the old one is not, say, at least 5% better than the
new one. At the moment you seem to be 'leaning' towards the new version even
though sales from the old version are higher!

It doesn't matter how much time and money you spent on the new version, or how
much better it looks, or how much easier it is to use. What matters at the end
of the day is how much money it makes, so run the split test until you are
(statistically) confident that you are not throwing money away when you kill
one of the designs.

~~~
asr
Given that Patrick's tentative conclusion was that the new version makes a
similar amount of money, but allows more people to get value from his
software, leaving up the old design would seem a bit... hard-hearted.

~~~
MarkMc
The problem is that such a conclusion is _too_ tentative at the moment. It's
possible that the old version is 5% better than the new version. My guess is
that 5% of revenue over the next 10 years could be around $40,000 - that's a
lot of money to give up! Why not keep the split test running until you can be
statistically confident that the two versions make similar amounts of money?

------
bemmu
Part of it might be that people who really really need something are more
likely to convert even if a design is a bit ugly. By improving design you get
more of the less serious people to give it a try, but then they are not as
likely to convert.

------
johndevor
This is probably what Craigslist figured out a long time ago.

------
droithomme
Hi Patrick! Good article, thanks.

I don't see the new web page as an improved design though, the lack of
alignment between elements, weird open spaces and element sizing not being
appropriate to the layout are all evidence the designer was amateurish.

If interested, commentary on the page as annotation is available here:
<http://imgur.com/HsG6E>

(Fair disclosure: I don't sell such services and am not pimping, just
commenting.)

------
pestaa
I think it is too early to judge. I've never done A/B testing, but I think the
point is to do it continuously; not for a week and decide it is stupid.

Let's say Patrick would continue this for 9 more weeks; so far we've seen 10%
of the whole test. Let me illustrate an edge case: after 10 weeks, the new
site has made 134 sales, the old one has 125. 13/13 after 1 week seems about
right, but 134/125 is a more than 7% increase.

~~~
klbarry
A test of statistical significance can answer that question easily - I think
most optimizers use it.
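
As a minimal sketch of one such test, applied to the hypothetical 134 vs 125 example above: if both designs get equal traffic (an assumption; a 50% split is mentioned elsewhere in this thread), then under "no difference" each sale is equally likely to come from either design, so the split of sales can be checked with an exact binomial test. The function name is made up, and this is only one of several tests an optimizer might use.

```python
from math import comb

def two_sided_binomial_p(k, n):
    """Exact two-sided p-value for k successes in n fair coin flips.

    Sums the probability of every outcome at least as far from n/2 as k is.
    """
    expected = n / 2
    distance = abs(k - expected)
    extreme = sum(comb(n, i) for i in range(n + 1)
                  if abs(i - expected) >= distance)
    return extreme / 2 ** n

# Hypothetical totals from the example above: 134 sales (new) vs 125 (old).
p_value = two_sided_binomial_p(134, 134 + 125)
print(f"p = {p_value:.2f}")
```

For these numbers the p-value comes out well above the conventional 0.05 threshold, so a 134 vs 125 split on its own would not show that either design sells better.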

~~~
adrianhoward
As well as the issue with the low numbers only showing a very significant
effect at this point, there's also the assumption that the redesign will act
immediately.

Some products don't have a "search, find and purchase immediately" pattern to
sales. Especially when you move out of the B2C market.

Some businesses' sales can look more like "Visit half a dozen different sites.
Go away for a week and think. Visit best sites again. Go away for a few days
and come to a decision. Visit final option, browse and purchase".

Tracking these multiple visits can be non-trivial/impossible since it may be
different people and different browsers visiting the site at the different
stages. It also leads to long lead-times for the effects that design changes
make.

------
adrianhoward
One comment on the amount of time you've looked at the new redesign over. A
pattern I've noticed with A/B testing more radical redesigns is that there's
often a dip/level track for the first week or two - followed by another more
radical jump (in either direction :-) in the following month.

I'd be interested if you see something similar as the month progresses.

Also - a question not directly related to the new design - but I'm curious :-)

On either home page design there's no social proof info (testimonials, number
of users, total #bingo cards made, etc.). Which intrigues me since it's
something that pretty much always has a positive effect in my experience
(which, I admit, is largely in sites fairly different from BCC). In one case
we got a twenty-something% increase in conversion in the checkout process by
adding in some targeted quotes on value-received/money-saved on the final
"give me your money" pages.

This seems like such an obvious thing that you've probably tried it already.
Is there a reason you didn't go for it?

------
hrabago
Lesson learned in the opening paragraphs - unsolicited email still works, even
amongst the most internet-smart targets.

* with the right circumstances

------
bambax
> _and the before and after redesigns are very compatible at the DOM label_

level?

~~~
patio11
Thanks, fixed.

------
rbxbx
This is a wonderful example of "Working Code Wins". Keep in mind though that
often this complexity doesn't scale, for those of you on a team looking to
implement similar A/B test code ;)

Way to be scrappy, Patrick.

------
damoncali
Isn't the obvious next step to further restrict the freebies? I have a hard
time putting this in the "didn't matter" category.

------
dataminer
I would suggest adding some human faces to the design (e.g pictures of a
classroom, family playing bingo). Add a tutorial video on the front page
displaying how "easy" it is to use bingo card creator. Also remove the "try
now" button since you already have a 30 days return policy and see how it
goes.

------
underwater
Patrick, why choose 50% as the split for the new design? Were you concerned
about confusing customers if the new design tested poorly and you
switched everyone back to the old one?

------
mmhd
You know, just because you have an immaculate Bingo generating website,
doesn't mean people's interest in making bingo cards will suddenly go up.

------
valladont
Thanks for sharing. There is some very useful and interesting information in
this article.

~~~
PauloPatricio
Hi! Sorry, but I didn't get how much - roughly - you paid for the design.

