Redesign: Users: Thrilled. Conversion Rates: Up. Sales: Unchanged. (kalzumeus.com)
273 points by wglb on Apr 19, 2012 | 107 comments

Happy to answer questions, as always.

By the way, roughly 3/4 of the A/B tests I participate in (my own and for clients) fail to improve business results. Among the A/B tests I've seen written up in in-depth blog posts, the figure is more like 2%, so I tried to redress the balance here.

Something you might want to look at is the number of free cards you are giving away. With many states mandating class-size reduction, classes are smaller, and 15 would cover a lot of them. My wife's class this year has had as few as 12 students and as many as 18. With 15 free cards, she would just make the other three by hand. A lot of that is because we already spend a large amount of money on her class, due to budget shortfalls and a salary freeze for all teachers for three years now.

Another issue you might be encountering is that states are adopting the Common Core standards http://www.corestandards.org/ and word bingo doesn't meet any of the standards for English language arts. Since everything in classrooms is documented (lesson plans are stored, and in some cases handed over to the state), there aren't going to be many use cases for activities that can't be tied to the standards, even if those activities are just for fun around a holiday.

I've experimented with 12 vs. 15 before. Null result. I might try pushing it down to 8 sometime. But, realistically speaking, only if it strikes my fancy some morning.

Ok, I freely admit that I know nothing about selling bingo cards. But it "feels" to me like that number should be closer to 5. I was really surprised to see that you give away 15 cards for free. You could A/B test this too, right?

Another idea: start high and ratchet down with each subsequent word list?

How do you break that news to clients after they've paid $$$ for your services? Do you heavily caveat the potential before the engagement?

That's less of a problem than I naively thought it would be. I'm upfront with clients before, during, and after engagements, and I try to set expectations accurately. That said, for the overwhelming majority of my clients, if e.g. the entire budget for my engagement was spent on a project whose final conclusion was "Experimental results: we now know one more thing that didn't work," that would not be a catastrophic loss.

Businesses routinely spend $$$, $$$$$, and $$$$$$$$$ on things that don't end up working out the way they were planned. They're pretty much OK with that, since they run "portfolio strategies" in terms of directions. Nobody is betting the company on a two-week engagement with me, any more than they bet the company on any particular man-month of engineering time.

n.b. This is partially influenced by being picky about clients and working with people who are clueful and have budget. This is one reason why, if a startup with $X0,000 in the bank comes to me and asks to roll the dice on an engagement, I give them a link to my blog and some suggestions, and invite them to come back when we're better positioned to help each other out.

I suspect that people hire you not to do "a successful experiment" - but so that they can learn how to experiment in productive ways.

It's a skill that seems sadly lacking with many of the folk that I talk to :-)

The glass half empty is that there was no benefit to the redesign.

The glass half full is that you have a more visually pleasing, refreshed website that is at least as effective as the old design. Patrick also notes that users tend to associate visual changes with programmatic improvements, even when there are no programmatic improvements (a placebo effect).

You just run more tests, so that the 25% of tests that work out still make $$$$ overall.

Well, keep in mind that, with a significance threshold of 5%, one in twenty null-result tests will appear to have significance.

So just continuing to test until you get a result is not necessarily going to net you anything.
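To put a number on that: if every test you run is a true null (no real difference) and the tests are independent, the chance that at least one of them clears a 5% threshold grows quickly with the number of tests. A minimal sketch, assuming independent tests; the helper names are hypothetical:

```python
def prob_any_false_positive(num_tests: int, alpha: float = 0.05) -> float:
    """Chance that at least one of `num_tests` independent null tests
    comes out 'significant' at level `alpha`."""
    # Each null test correctly shows no effect with probability 1 - alpha.
    return 1 - (1 - alpha) ** num_tests


def expected_false_positives(num_tests: int, alpha: float = 0.05) -> float:
    """Average number of spurious 'wins' among `num_tests` null tests."""
    return alpha * num_tests


for k in (1, 5, 20):
    print(k, round(prob_any_false_positive(k), 3), expected_false_positives(k))
```

With 20 null tests, the chance of at least one spurious "win" is roughly 64%, and on average you would expect one.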

One potential upside is that, as far as the numbers are concerned, the client is free to use whichever version they want.

You recommend Stripe in a throwaway line in the post - it sounds like your experience with them has been really positive! I'm working on a project where I may end up taking people's money for services on a subscription basis. I've heard that Stripe is better for one-time charges than for recurring charges. Is that true in your experience, as someone who's done work with Stripe?

Stripe is a dream come true for developers. I use Spreedly (backed by Paypal) on Appointment Reminder to do subscription management, and am very happy with it, too. I hear you can use them together now, which sounds like the PB&J of money if you ask me.

P.S. Don't hate me: I love Paypal, too.

As a user I love paypal because it is really, really easy to end a subscription and there is nothing, whatsoever, the merchant can do about it.

Does Paypal still have the restriction that if you use them for credit card processing, you must also offer a Paypal checkout? Offering Appointment Reminder through standard PayPal seems like a giant additional headache.

I'd never heard of Spreedly! Thank you for pointing that out: it makes this project just the tiniest bit more likely to replace my day job. :)

Maybe you covered this before and I forgot, but how did you arrive at the number you use for the max cards creatable with a free account? I get the 'more than the # of kids someone would have in a family' but 15 seems... high.

You're on the right track, but it's more than the number of kids in a classroom — not family. He mentioned it in a podcast somewhere.

Yep, most A/B tests won't give results. But the ones that do are worth it (provided they were quick and easy to set up). There's always an opportunity cost and a design cost involved with A/B testing, but it is sort of ironic that you can't be judgmental about which A/B tests to run, because if you already knew which ones were going to work, why A/B test in the first place?

Do you have any sort of logical explanation or intuition about your stated result - that 75 percent of tests do not improve business results?

It seems to suggest that design just has to be good enough: below some threshold of visual appeal and/or usability, the design hinders money-making, but above that threshold, further improvement isn't necessary? If this is a reasonable interpretation, that is actually quite sad, and it suggests that all the new design-centered startups are wasting a lot of money.

You mentioned that sales were strongly up for reasons unrelated to the post. What are those reasons?

Short version: traffic way the heck up, probably through a combination of Google algorithm changes, just natural growth of the Internet among people who matter to my business, and organic growth.

No questions here, just a small niggle: "Fewer users", not "less".

Actually, that's a widespread misconception. See here, for example: http://itre.cis.upenn.edu/~myl/languagelog/archives/003775.h...

That's a linguist's take on it - you can find many more by googling [fewer less languagelog] or similar.

That mainly refers to units of time, amounts, etc. "Less than 5 users", I can maybe accept. "Less users", no.

You sell Bingo Card Creator 5,000 times a month? I never expected bingo to be such a big deal. Congrats! :)

Not sure where you're getting that number from but it is not accurate. http://www.bingocardcreator.com/stats/sales-by-month Units on the graph are USD.

Curiously, 26 conversions (@ $29) in 7 days seems low compared to your March figures - naively dividing out I would have expected 39 sales in 7 days...

Sorry, didn't see that the charts are in USD there. Anyway, thank you for sharing your insights with us.

I'm not sure whether I have high expectations, have become too damn good at this, or this is just a bad designer.

So take this as constructive criticism: I have nothing against the designer, nor do I know him.

I'll begin with the HTML/CSS parts. He is a web designer, not simply a designer, so he should follow best practices:

1- Use the HTML5 Doctype

2- Remove unnecessary whitespace (empty lines and spaces). Why the extra bits?

3- JavaScript: I see that you are loading a good chunk of JavaScript in the HEAD and a bunch more at the end. For example, the Amazon JS file (http://s3.amazonaws.com/new.cetrk.com/pages/scripts/0004/737...) is certainly not useful, and it is yet another whole HTTP request.

4- In the JavaScript file (http://images2.bingocardcreator.com/javascripts/bcc-all.js?1...), you are loading a ColourPicker and other unused stuff. This is a waste of bandwidth, latency, memory and speed. You don't need JavaScript on your main page, apart from the analytics and maybe the A/B testing stuff.

5- Unobtrusive JavaScript (see line 181 of the HTML).

That's enough, though there are endless problems with the code, so don't take this list as an exhaustive review. As for the design, it just sucks. I agree that simple designs (K.I.S.S.) are better, but they should be crafted. I won't complain much, but here are two snapshots of what I'm talking about:

1- http://dl.dropbox.com/u/2777218/shot1.png

2- http://dl.dropbox.com/u/2777218/shot2.png

Again, take this as constructive feedback; I have nothing against your business, and what matters in the end are the sales/$$$.

Take a look at http://www.premiumpixels.com/ if you want to see some carefully crafted designs.

Patrick bootstrapped himself out of his previous full-time programming job, replacing his salary and then some, in single-digit hours per week, using the OLD DESIGN.

You gave a numbered list of good suggestions and wrote the kernel of a detailed, thoughtful comment, but sabotaged it with poor framing and assumptions. I don't care or anything, but for your benefit: this is a nerdly inclination that will serve you very, very poorly in your professional career. HN often has the opposite problem from your comment (a bias towards being "right" even when we're not "correct", such as in civil liberties stories); here, you're "correct" but not "right", since adoption of all your recommendations is unlikely to change the bottom line for Patrick at all.

One consultant to (I presume) another, my suggestion for next time is to do one of two things:

(a) (expensively) do exhaustive research so you can frame your high-level assessment and recommendations in the context of someone's business (i.e., expend the effort to make sure you're both correct and right), or

(b) (much simpler) develop a habit of writing your comments in a neutral, helpful tone, so that you can be correct without having to be right at all.

By the way, I feel comfortable writing this comment because I have oh my God exactly the same problem with my comments. Look at me on a crypto thread sometime.

I apologize to Patrick and the readers if my comment did more harm than good. That wasn't my intent. Sometimes I'm a bit of an "asshole", and I have to acknowledge that. Thank you for taking the time to correct that side of me.

Getting the tone of a message board comment wrong doesn't make you an asshole.

I hope.

to your point, the problem isn't so much pointing out small design errors, but it's being a dick when pointing out small design errors. the design may not be pixel perfect but it basically prints money with no effort. it doesn't suck when it comes to what actually matters, and that is making sales.

Could you elaborate on what you mean by "correct without having to be right" within the context of the parent post?

Is this a matter of the problems pointed out are "correct" technically, but they may or may not be the "right" thing to spend time on?

Just in case the parent doesn't respond

"Is this a matter of the problems pointed out are "correct" technically, but they may or may not be the "right" thing to spend time on?"

I'm pretty sure you've already got it, this sounds like it. I would only add that a conscientious person checks whether their comment is adding to the conversation (not anxiously; just subconsciously, as a habit). The parent to your comment is saying that if you're 'correct' you're in the clear that way.

I think the thing about being "right" is that if you're right (i.e. your addressee is wrong) you have enough social capital to take that tone with somebody. If you're wrong and you take up an 'I'm right' tone, you come off like you don't care about thinking things through before you make an investment of your time.

He's "correct" in that all his recommendations seem valid. Ceteris paribus, Patrick should adopt them.

He's not "right", though, because adopting any of these recommendations would cost anywhere from tens to thousands of dollars, and even if all of them were adopted, they would be unlikely in the aggregate to make Patrick one additional dollar.

My designer is totally blameless for any implementation infelicities: I have ultimate control over that and hack the heck out of his HTML/CSS before it hits my pages.

Other parts are the way they are simply because, well, we have very different standards for things which should delay shipping. Bits are cheap, my time is expensive, optimizing for bits saved (on my cute little re-demoted-to-hobby project, no less) seems like a poor idea.

Similarly, failure to achieve pixel perfection of the price display in Australian dollars is not exactly the kind of thing that would scare me away from a design. You know what would terrify me? "It looks nice -- women generally hate it." That would cause the business to fold like an origami crane. Happily, it doesn't seem to have that problem.

I'm generally happy to get feedback, even negative feedback, but as one professional to another your feedback doesn't leave me with the impression "Tie a string around my finger to address this later" because it does not persuasively implicate anything I really care about.

One could argue that it affects your Google organic search traffic. The better your HTML, and lighter your Javascript, the more likely the mighty Google will send people to you.

Patrick, you're the expert here on Google traffic direction: are any of these changes likely to impact your ranking?

There might be something to this. The site is #6 for "bingo cards" search on google for me.

For whatever it's worth: Patrick's original strategy for ranking (and perhaps still the strategy that accounts for most of his revenue) isn't the Google search [bingo cards]. It's 1000 closely related but technologically distinct searches.

I've heard that a lot, but haven't actually seen it proven yet.

It's a very good excuse to push people to improve their HTML and website speed, but I would still like to see the impact on Google organic results of speed improvements and HTML fixes alone.

On a tangential note: Though the different alignment of the AUD amount wouldn't even register on my radar - I'm 99% sure the exchange rate you're using _will_ be significant for most of your target market. The AUD has been trading above the USD for many months now - and _everybody_ here who buys _anything_ online knows that.


It might be worth revisiting those non-USD prices...

Not sure if this is the case or not, but certain templating systems make it really hard to remove extra whitespace/lines (I'm looking at you Jinja2) and they get added when you use standard templating stuff like "if/then" statements.

Removing them after the fact with an HTML tidier would work, or you could just not care and let gzip handle the compression.

> Not sure if this is the case or not, but certain templating systems make it really hard to remove extra whitespace/lines (I'm looking at you Jinja2)

Jinja2 is a generic templating engine, and as such can't make assumptions about relevance of whitespace.

However, writing an extension on top of it isn't hard, and mitsuhiko has written an extension to do that.


Nice! Going to integrate this into my blogofile setup so that I can output nicer HTML!

> 2- Remove unnecessary whitespace (empty lines and spaces). Why the extra bits?

The extra whitespace is most probably template output. If he is using ERB, ERB is a generic templating engine and doesn't make assumptions about the relevance of whitespace. Unless you optimize the output by hand, you will still get all the newlines and indents.

Some people use minifiers, most people don't care. Almost all servers and clients speak gzip, and the extra overhead is so low on the list of things to optimize that it probably isn't even on the list.
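As a rough illustration of why whitespace is so low on the list: gzip already compresses repetitive indentation and blank lines very well. A sketch using Python's standard `gzip` module on hypothetical template output; `naive_minify` is a deliberately crude, illustrative regex, not a production minifier:

```python
import gzip
import re


def naive_minify(html: str) -> str:
    """Drop whitespace between tags. Crude: it would mangle <pre> blocks and
    spaces that matter between inline elements."""
    return re.sub(r">\s+<", "><", html).strip()


def size_report(html: str) -> dict:
    """Byte counts for the page raw and minified, each with and without gzip."""
    minified = naive_minify(html)
    return {
        "raw": len(html.encode("utf-8")),
        "raw_minified": len(minified.encode("utf-8")),
        "gzip": len(gzip.compress(html.encode("utf-8"))),
        "gzip_minified": len(gzip.compress(minified.encode("utf-8"))),
    }


# Hypothetical template output: indentation plus stray blank lines around each item.
page = "<ul>\n" + "".join(
    f"    \n        <li>item {i}</li>\n\n" for i in range(200)
) + "</ul>\n"

report = size_report(page)
print(report)
# Bytes saved by minifying, before and after compression:
print(report["raw"] - report["raw_minified"], report["gzip"] - report["gzip_minified"])
```

The point to look for in the output is that the saving from minifying shrinks dramatically once both versions are gzipped, which is what most servers and browsers already negotiate.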

> 3- JavaScript: I see that you are loading a good chunk of JavaScript in the HEAD and a bunch more at the end. For example, the Amazon JS file (http://s3.amazonaws.com/new.cetrk.com/pages/scripts/0004/737...) is certainly not useful, and it is yet another whole HTTP request.

Again, I am taking a guess: layout.html.erb yields under body, and the head and scripts are the same for all pages. Now, patio11 could hand-include just the needed scripts on each page, but he chose to include all scripts on all pages. This isn't something the designer controls.

> 2- http://dl.dropbox.com/u/2777218/shot2.png

That menu change really looks weird.

Maybe Patrick could run an A/B test of the new version of the site vs. a highly optimized version of the new site (doing even more than you suggested; there is a lot more possible if you start searching) and see if it makes any difference. That could be really interesting.

I doubt he has the time and 'itch' right now, though.

Something fundamentally missing from both versions: trust signals. Why should a user trust this software? You go to http://www.seomoz.org/ and you see that Zillow, Home Depot, Yelp, etc. "love their software". Same with 37signals: WB and Kellogg's are using Basecamp? http://37signals.com/ Why isn't my company?

It is not explicitly apparent that anyone loves this software, especially nobody they know or have heard of, so why should they use it? It doesn't have to be who uses it; it could also be "featured on", if you (Patio11) got coverage and were co-linked there, which I bet is more prominent. Of course, families don't care about TechCrunch; relevant news coverage is needed.

I think that's important to people that want to be in good company. Not when futzing around with your Macintosh, trying to put together 30+ bingo cards for tomorrow's class.

EDIT: removed an article to clarify

Honestly, I think the site is really ugly: the color scheme, all the same size text, '96 aesthetic... it looks like a ghetto SEO trap website. Please don't take offense; that's my snap judgment on seeing it. What if you let a designer do something on par with your payment processor Stripe.com? Picture those cards on the right as bingo cards, a good simple headline, a call to action.

I hope you're talking about the old design.

The site shows the old version to some visitors. You can click the link in the footer to get the new redesign.

Ahh you are right - I was A/B sent to old site. Reloaded to the newer one - cleaner, nicer aesthetic, still could use some Armor All.

I'm pretty sure he was. Though it does sound like he is saying the new design is cool.

Both designs are pretty bad unfortunately.

I agree. The old site doesn't look that good, but it benefits from an old-school 'mom&pop web design' look that might or might not inspire trust in buyers.

The new site has less character and is blander. I cannot get over the cards and the Mac. Why is the background on the screen the same as outside it? That totally breaks the message of software running on a computer; make it black at least, so the image can be read more easily. As it is, the computer looks like an open frame. Or show a stylized screenshot of the software. And why do the cards look so boring? Add some shadows or something. Why are the links in the menu underlined? Why is "try now" more prominent than "buy now"? The new design needs to get its priorities straight!

I would like to see an A/B test of whether changing some of the fonts to Comic Sans would increase conversion. I think that for this audience it might.

Ditto. The new design looks too bland, like a gazillion other 'designed' sites. Keeping the mom&pop look is important.

I'd keep the old layout, but fix the colors. If you get a designer to improve the colors and the gradients so they are less Windows 95-ish, it would be much better.

Keeping the mom&pop look is important for what?

Trust. The mom&pop look properly executed is the perfect carrier to convey honesty and earnestness.

This comment is clownish, because both designs appear to make asymptotically close to the most money you can make on the Internet with a topic-specific bingo card generator (which is a surprisingly high amount of money).

I'm not sticking up for Patrick (he doesn't need my help, and I probably make him look worse), so much as I am repeatedly trying to talk unfunded startups out of wasting time and opportunity chasing cool-kid aesthetics which will have nothing to do with their eventual business success.

Come now. If you are going to randomly put down people, at least point out why you think so.

Oh come on. It is arrogant, unhelpful comments like this that make the tech community look like a bunch of pricks.

Criticism is supposed to be helpful and offer direction.

I own a subscription-based online business, I've done many A/B tests of new designs, and my conclusion is pretty similar to yours.

Namely, it's always way more work than you expect, especially when you have legacy customers with prior expectations.

And results hardly ever budge.

My hypothesis is A/B testing can move the needle for light engagement, like "try a free trial". But pulling out a credit card requires a lot of motivation, and the 0.1% of your visitors that have this motivation are relatively unaffected by your design.

Any particular change (like this redesign) might not move the needle, but there exists SUBSTANTIAL evidence that A/B testing can help the bottom line of the company. Read e.g. any of the multiple times the 37 Signals blog covered how they got like 40% lifts to paid account signups by redoing a pricing or landing page.

Do you find me credible on this topic? I'm legally, morally, and practically constrained regarding how much detail I can go into here, but if you trust me: I have recent experience which makes me disagree in the strongest possible way with the generalization of your hypothesis. A recent client CCs me on their weekly Optimizely status report. I've been evangelizing A/B testing for a few years. Every time I read one of those emails my mind gets blown again.

Are you aware of anyone experimenting with the automatic evolution of web pages?

For example, use genetic programming to periodically modify the page served up. Both pages are served up, A/B style, and the page which has the highest metric (eg. $ earned) survives. Repeat.

It would be interesting to see what the results are after an extended period of time.
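The loop described above can be sketched as a simple (1+1)-style evolutionary search, a simpler relative of full genetic programming: keep a champion page, mutate one design attribute to get a challenger, split traffic between them, and keep the challenger only if it wins by a statistically significant margin. Everything here is hypothetical: the `OPTIONS` page "genome", the simulated visitors, and the `toy_rate` conversion model stand in for real traffic and real pages.

```python
import math
import random

# Hypothetical page "genome": each design choice and its allowed values.
OPTIONS = {
    "headline": ["Make bingo cards fast", "Free bingo card generator"],
    "button_color": ["green", "orange", "blue"],
    "layout": ["one_column", "two_column"],
}


def mutate(page, rng):
    """Return a copy of `page` with one randomly chosen attribute changed."""
    child = dict(page)
    key = rng.choice(sorted(OPTIONS))
    child[key] = rng.choice([v for v in OPTIONS[key] if v != page[key]])
    return child


def simulate_conversions(page, visitors, true_rate, rng):
    """Stand-in for real traffic: each visitor converts with probability true_rate(page)."""
    rate = true_rate(page)
    return sum(rng.random() < rate for _ in range(visitors))


def z_score(conv_a, conv_b, n):
    """Pooled two-proportion z-score for B vs A, with n visitors per arm."""
    pooled = (conv_a + conv_b) / (2 * n)
    if pooled in (0.0, 1.0):
        return 0.0
    se = math.sqrt(pooled * (1 - pooled) * 2 / n)
    return (conv_b - conv_a) / n / se


def evolve(start, true_rate, generations=20, visitors=2000, seed=0):
    rng = random.Random(seed)
    champion = dict(start)
    for _ in range(generations):
        challenger = mutate(champion, rng)
        a = simulate_conversions(champion, visitors, true_rate, rng)
        b = simulate_conversions(challenger, visitors, true_rate, rng)
        # Only replace the champion on a significant win (one-sided, ~5% level).
        if z_score(a, b, visitors) > 1.645:
            champion = challenger
    return champion


def toy_rate(page):
    """Hypothetical conversion model used only for this simulation."""
    rate = 0.02
    if page["button_color"] == "orange":
        rate += 0.01
    if page["layout"] == "one_column":
        rate += 0.005
    return rate
```

For example, `evolve({"headline": "Make bingo cards fast", "button_color": "green", "layout": "two_column"}, toy_rate)` returns whichever page survived. Requiring a significant win before swapping champions is one way to avoid chasing pure noise, though over many generations some spurious swaps are still expected.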

Not quite what you're asking about, but an interesting related innovation:


Having played with genetic algorithms before, that is a hugely interesting idea to me. You would need to take special care on the types of mutations used though. I don't think it is possible to do it well at scale, but it would be very impressive nonetheless.

I'm sure others have achieved significant gains... I just haven't experienced them, and it sounds like you haven't, either.

What I found does move the needle are: 1) Changing how much I charge, including adding upsells, and 2) Getting new sources of traffic.

Sounds like you haven't either....

That's like telling the director of the FBI he knows nothing about law enforcement because he won't reveal facts that he has a professional obligation not to reveal.

But wouldn't there be big gains initially and then, as the easy fruit was plucked, the changes would give less and less and the successful tests would become fewer and fewer?

Likely yes. But what matters is return on time in contrast to any of the other things you could be doing. "always work on the most important thing". Even small percentage point improvements in conversion go directly to the bottom line, which is far from obviously the case when fixing bugs or adding features. And if your bottom line is already pretty big, the marginal value of someone working on optimization is likely to continue to be high, regardless of how plucked the fruit seems to be.

> My hypothesis is A/B testing can move the needle for light engagement, like "try a free trial". But pulling out a credit card requires a lot of motivation, and the 0.1% of your visitors that have this motivation are relatively unaffected by your design.

That's not true in my experience. I've seen some big increases in the percentages that complete the checkout process through an iterative process of redesign/test for example.

The motivation behind a user purchase can be very different in different situations. If Mary is purchasing a new gearbox for the car, and she's already done a comparison shop, then she's likely to finish the purchase as long as it's vaguely sane. If Bob is impulse buying that new laptop bag he saw on swissmiss then every single grain of friction between Bob and the final "purchase" button is going to make it more likely that he remembers he really needs that $80 for groceries this week.

At some point you're going to reach a maximum of course and only see small or no increases - but, with my clients anyway, folk seem to be quite a long way from that in many instances :-)

You don't have enough data to really draw any conclusions. In the last step in the funnel, the completion rate with the redesign is between 94-97% with 95% confidence. With the old design it's 92-96% with 95% confidence. Those regions overlap and the designs might be the same or they might be different.
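One common way to get intervals like the ones quoted above is the Wilson score interval for a proportion. The raw funnel counts aren't reproduced in this thread, so the counts below are hypothetical, and this is not necessarily the method the commenter used:

```python
import math
from statistics import NormalDist


def wilson_interval(successes: int, trials: int, confidence: float = 0.95):
    """Wilson score interval for a binomial proportion, e.g. a funnel step's completion rate."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p_hat = successes / trials
    denom = 1 + z**2 / trials
    center = (p_hat + z**2 / (2 * trials)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2)) / denom
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Hypothetical counts: 95 of 100 visitors complete the last funnel step.
low, high = wilson_interval(95, 100)
print(f"{low:.3f} - {high:.3f}")
```

With only 100 hypothetical visitors, even a 95% observed completion rate comes with an interval about nine points wide, which is why intervals built from a week of data tend to overlap.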

In the past you have talked about just using off the shelf themes from ThemeForest and tweaking those. Is there a reason you thought BingoCardCreator needed a truly unique design?

I'm building a couple sites at the moment and am having them designed from scratch but I'm wondering how you determine the cost/benefit of tweaking something off the shelf versus building your own?

BCC doesn't need to look very good -- it sold $200k of software on the backs of either a template or the uncoordinated mishmash of the previous designer's work being slowly butchered by myself. I just was willing to experiment with whether turning up the pretty a few notches would move the needle.

As to cost/benefits: I'm at a point in my life/business where "Spend a week hacking a template to save < $2,000" is an astoundingly poor use of my time.

Thanks for sharing. Just wanted to add that if people's subjective perceptions of BCC (modern! clean!) are improved, this may improve (admittedly hard to measure) word of mouth. Keep it up!

Actually, I wonder if a simple email referral template might help with nicely measurable word-of-mouth. Plenty of parents will have their kids' teachers' email addresses, and could shoot them a targeted recommendation with a little help.

I tried giving people a send-to-friend option with a double-sided referral incentive. (i.e. Think like Dropbox: you and they both get something if they sign up.) It was epically unmotivational: only ~1,300 people clicked through to send an invite to anyone (of > 50k users invited to), less than 200 people signed up as a result of invites, and a grand total of two ended up buying BCC.

Personally, I think the landing page should simply be this (minus the "Featured Bingo Activities" heading and half-horizontal line on the bottom): http://imgur.com/GyRu8

and then add the rest of the info that's currently below this part into an additional tab up top, perhaps called: "Info" (and it should be the first tab -- furthest to the left).

VLC is the media player that can play anything!

Well, when it couldn't play something, it was very memorable to me.

Thanks for writing this, these are some cool numbers to poke at.

You mention in the write-up that you've already done a lot of work to get your User Success percentage high. Is this the first time that you've seen a disconnect between User Success improvements and sales increases? Or did you pick this more as a useful, dramatic example?

Bingo card creation is one of the few apps where I would probably avoid a "simple, elegant, modern" design. I would expect more of a cartoonish, game-y design.

Great article, Patrick, but I'm a little concerned about this line: "I will likely finalize the redesign and kill the old version in the coming weeks."

Please, please tell me that if you kill the old version, you will have the numbers to show that the old one is not, say, at least 5% better than the new one. At the moment you seem to be 'leaning' towards the new version even though sales from the old version are higher!

It doesn't matter how much time and money you spent on the new version, or how much better it looks, or how much easier it is to use. What matters at the end of the day is how much money it makes, so run the split test until you are (statistically) confident that you are not throwing money away when you kill one of the designs.

Given that Patrick's tentative conclusion was that the new version makes a similar amount of money, but allows more people to get value from his software, leaving up the old design would seem a bit... hard-hearted.

The problem is that such a conclusion is too tentative at the moment. It's possible that the old version is 5% better than the new version. My guess is that 5% of revenue over the next 10 years could be around $40,000 - that's a lot of money to give up! Why not keep the split test running until you can be statistically confident that the two versions make similar amounts of money?

Part of it might be that people who really really need something are more likely to convert even if a design is a bit ugly. By improving design you get more of the less serious people to give it a try, but then they are not as likely to convert.

This is probably what Craigslist figured out a long time ago.

Hi Patrick! Good article, thanks.

I don't see the new web page as an improved design though, the lack of alignment between elements, weird open spaces and element sizing not being appropriate to the layout are all evidence the designer was amateurish.

If interested, commentary on the page as annotation is available here: http://imgur.com/HsG6E

(Fair disclosure: I don't sell such services and am not pimping, just commenting.)

I think it is too early to judge. I've never done A/B testing, but I think the point is to do it continuously, not for a week and then decide it is stupid.

Let's say Patrick continues this for 9 more weeks; so far we've seen 10% of the whole test. Let me illustrate an edge case: after 10 weeks, the new site has made 134 sales and the old one 125. 13/13 after 1 week seems about right, but 134/125 is a more than 7% increase.

Imagine a PHB-engineer conversation something like "What are you working on?" "The jQuery plugin we're using doesn't have full support for the latest Chrome nightlies so I'm trying to write some Javascript to achieve the same effect." "Why are you wasting time? I don't know all that much about jQuery, but you can use an iPad to do it."

We can all appreciate that that sort of conversation is pretty exasperating for the engineer, but if he were a nice guy, he'd want to try to explain at least enough engineering to the boss such that the boss understood why "Use an iPad" is not a compelling option there. But he might be disinclined to start that conversation at 2:40 in the morning because it would take a while.

It is 2:40 AM in Japan and improving your understanding of A/B testing would take a while. There exist many comprehensible beginner's guides to it on the Internet. If after reading them you still don't understand why you need two more pieces of data in your hypothetical and why it is not extraordinarily likely that the 7% increase you measure in it reflects an actual change in user behavior, I will be happy to explain it to you some day when it is not 2:40 AM.
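The missing data in the hypothetical is presumably how many visitors saw each version. A minimal sketch of a pooled two-proportion z-test, assuming, purely for illustration, 10,000 visitors per arm (a made-up number, not from the post):

```python
import math
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Pooled two-proportion z-test. Returns (z, two-sided p-value)."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_a - rate_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Sales from the hypothetical above; the visitor counts are made up.
z, p = two_proportion_z_test(134, 10_000, 125, 10_000)
print(f"z = {z:.2f}, p = {p:.2f}")
```

Under those assumed denominators, the "7% increase" gives a p-value of roughly 0.57: entirely consistent with the two designs performing identically.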

Some background:

A/B testing is founded on statistics. You take Option A and Option B and see which one achieves more Goal C.

But you can't just look at the percent difference and decide that Option B must be better! Look, it has a higher percent Goal C! But that could be due to chance, so A/B tests employ tests of statistical significance to determine whether the test results are _probably_ chance or _probably_ reflect a genuine causal increase in Goal C.

For example, if you flip a coin four times and get heads three of those times, without a statistical significance test you might conclude that heads is 3x as likely to appear as tails. We know that's wrong, though: each side of a fair coin has a 50% chance of landing face-up on each flip.

The flaw in this experiment is that we tried to extrapolate a result from a very small set of data. A statistical significance test would take these results and say "we have a <very small percent> confidence level that heads is more likely to come up, and doesn't just come up more often by chance".

If we flipped the coin 10,000 times instead, you'd get something pretty close to 50% heads and 50% tails, and your significance test would return a high confidence level that those numbers are accurate.
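The coin example can be made concrete with an exact binomial test. This minimal sketch expresses the result as a p-value rather than the "confidence level" wording above. The function names are hypothetical, and the probabilities are computed in log space so that 10,000 flips do not overflow.

```python
import math

def binom_pmf(i, n, p):
    """Probability of exactly i heads in n flips, computed via log-gamma
    so large n (e.g. 10,000) does not overflow."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
               + i * math.log(p) + (n - i) * math.log(1 - p))
    return math.exp(log_pmf)

def binomial_test(k, n, p=0.5):
    """Exact two-sided binomial test: total probability, under a fair coin,
    of every outcome no more likely than the observed k heads in n flips."""
    observed = binom_pmf(k, n, p)
    # Small relative tolerance so mirror-image outcomes (e.g. 1 and 3 heads)
    # are not dropped because of floating-point rounding.
    total = sum(q for q in (binom_pmf(i, n, p) for i in range(n + 1))
                if q <= observed * (1 + 1e-7))
    return min(1.0, total)

print(binomial_test(3, 4))          # 3 heads in 4 flips
print(binomial_test(5100, 10_000))  # 5,100 heads in 10,000 flips
```

Three heads in four flips gives p = 10/16 = 0.625, which is entirely unremarkable for a fair coin. A 51% heads rate over 10,000 flips lands right around the conventional 0.05 threshold. The same-sized deviation only becomes meaningful once there is far more data.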

Short story long, you need lots of datapoints to determine whether an A/B test result is chance or an actual difference, and the smaller the difference between how Option A and Option B perform, the more datapoints you need to be confident they're actually different. Patrick's numbers are so close together that he'd need far more than 300 sales to reach the gold-standard 95% confidence level that there's actually a difference.
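The "lots of datapoints" point can be quantified with the standard sample-size formula for a two-proportion z-test. This is a minimal sketch; the function name and the 2.5% baseline conversion rate are assumptions for illustration, not Patrick's actual numbers.

```python
import math
from statistics import NormalDist

def sample_size_per_arm(p1, p2, alpha=0.05, power=0.8):
    """Approximate visitors needed in EACH arm for a two-sided two-proportion
    z-test to detect a change in conversion rate from p1 to p2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)

# ASSUMED baseline of 2.5% conversion.
small_lift = sample_size_per_arm(0.025, 0.0275)  # a 10% relative lift
doubling = sample_size_per_arm(0.025, 0.05)      # conversion rate doubles
print(small_lift, doubling)
```

Under these assumptions, a 10% relative lift needs tens of thousands of visitors per arm. That is well over a thousand sales on each side. Doubling the conversion rate needs fewer than a thousand visitors per arm. The closer the two options perform, the faster the required sample grows, roughly with the inverse square of the difference.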

A test of statistical significance can answer that question easily - I think most optimizers use it.

As well as the issue that numbers this low can only reveal a very large effect at this point, there's also the assumption that the redesign will take effect immediately.

Some products don't have a "search, find and purchase immediately" pattern to sales. Especially when you move out of the B2C market.

Some businesses' sales can look more like "Visit half a dozen different sites. Go away for a week and think. Visit best sites again. Go away for a few days and come to a decision. Visit final option, browse and purchase".

Tracking these multiple visits can be difficult or impossible, since different people using different browsers may visit the site at the different stages. It also means the effects of design changes can take a long time to show up.

With only 13 purchases on each side of the test, a test of statistical significance is only going to pick out very strong effects: typically ones that bump conversion rates by more than 10%, which is a pretty huge change.

For more subtle effects, you need more observations, plain and simple; it doesn't matter how you do the math if you don't have the data to reach the significance levels you care about.
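One way to see what "very strong effects" means for 13 purchases per side is to turn the question around and ask for the minimum detectable effect at a fixed sample size. This is a back-of-the-envelope sketch. The function name is hypothetical, and it assumes both arms share roughly the baseline variance. The 2.5% conversion rate, and therefore the figure of about 520 visitors per arm behind 13 sales, is also an assumption.

```python
import math
from statistics import NormalDist

def minimum_detectable_lift(baseline, visitors_per_arm, alpha=0.05, power=0.8):
    """Approximate smallest ABSOLUTE change in conversion rate that a two-sided
    two-proportion z-test can reliably detect, assuming both arms have about
    the baseline variance (a common simplification)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    se = math.sqrt(2 * baseline * (1 - baseline) / visitors_per_arm)
    return (z_alpha + z_beta) * se

# ASSUMED: 2.5% conversion, so 13 sales corresponds to about 520 visitors per arm.
baseline = 0.025
mde = minimum_detectable_lift(baseline, 520)
print(mde, mde / baseline)  # absolute lift, and lift relative to the baseline
```

Under these particular assumptions, the relative lift comes out a bit above 1.0. In other words, the change would have to be on the order of a doubling of the conversion rate before 13 sales per side could reliably reveal it. Quadrupling the traffic only halves that threshold, since the detectable effect shrinks with the square root of the sample size.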

One comment on how long you've been watching the redesign. A pattern I've noticed when A/B testing more radical redesigns is that there's often a dip or flat period for the first week or two, followed by a more dramatic jump (in either direction :-) over the following month.

I'd be interested if you see something similar as the month progresses.

Also - a question not directly related to the new design - but I'm curious :-)

On either home page design there's no social proof info (testimonials, number of users, total #bingo cards made, etc.). That intrigues me, since it's something that pretty much always has a positive effect in my experience (which, I admit, is largely with sites fairly different from BCC). In one case we got a twenty-something% increase in conversion in the checkout process by adding some targeted quotes on value-received/money-saved to the final "give me your money" pages.

This seems like such an obvious thing that you've probably tried it already. Is there a reason you didn't go for it?

Lesson learned in the opening paragraphs: unsolicited email still works*, even amongst the most internet-smart targets.

* under the right circumstances

> and the before and after redesigns are very compatible at the DOM label


Thanks, fixed.

This is a wonderful example of "Working Code Wins". Keep in mind though that often this complexity doesn't scale, for those of you on a team looking to implement similar A/B test code ;)
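For anyone on a team building their own A/B test code, one piece that tends to scale well is deterministic assignment. Hashing a visitor ID together with an experiment name puts the same visitor in the same arm every time, without storing the assignment. The sketch below is a generic illustration with hypothetical names (`assign_variant`, `weight_b`). It is not how any particular library, including the one used in the post, necessarily does it.

```python
import hashlib

def assign_variant(user_id: str, experiment: str, weight_b: float = 0.5) -> str:
    """Deterministically bucket a visitor into arm "A" or "B".

    The same (experiment, user_id) pair always yields the same arm, and
    including the experiment name keeps buckets independent across tests."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    # Keep 53 bits of the hash so the division is exact: bucket is in [0, 1).
    bucket = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    return "B" if bucket < weight_b else "A"

# Hypothetical usage: a 50/50 split on a homepage-redesign experiment.
print(assign_variant("visitor-42", "homepage_redesign"))
```

Changing `weight_b` moves the split (e.g. 0.1 sends about 10% of visitors to B). Because assignment depends only on the hash, repeat visits from the same ID see a consistent design. Visits from a different browser or device with a new ID will not.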

Way to be scrappy, Patrick.

Isn't the obvious next step to further restrict the freebies? I have a hard time putting this in the "didn't matter" category.

I would suggest adding some human faces to the design (e.g. pictures of a classroom or a family playing bingo). Add a tutorial video on the front page showing how "easy" it is to use Bingo Card Creator. Also remove the "try now" button, since you already have a 30-day return policy, and see how it goes.

Patrick, why did you choose 50% as the split for the new design? Were you concerned about confusing customers if the new design tested poorly and you switched everyone back to the old one?

You know, just because you have an immaculate Bingo-generating website doesn't mean people's interest in making bingo cards will suddenly go up.

Thanks for sharing. There is some very useful and interesting information in this article.

Hi! Sorry, but I didn't catch roughly how much you paid for the design.
