By the way, roughly 3/4 of the A/B tests I participate in (my own and for clients) fail to improve business results, compared with roughly 2% of the A/B tests I've ever seen written up in in-depth blog posts, so I tried to redress the balance here.
Another issue you might be encountering is that states are adopting the Common Core standards (http://www.corestandards.org/), and word bingo doesn't meet any of the standards for English language arts. Since everything in classrooms is documented, and lesson plans are stored and in some cases handed over to the state, there aren't going to be many use cases for activities that can't be applied to the standards, even if those activities are just for fun because of a holiday.
Businesses routinely spend $$$, $$$$$, and $$$$$$$$$ on things that don't end up working out the way they were planned. They're pretty much OK with that, since they run "portfolio strategies" in terms of directions. Nobody is betting the company on a two-week engagement with me, any more than they bet the company on any particular man-month of engineering time.
n.b. This is partially influenced by being picky about clients and working with people who are clueful and have budget. This is one reason why, if a startup with $X0,000 in the bank comes to me and asks to roll the dice on an engagement, I give them a link to my blog and some suggestions, and invite them to come back when we're better positioned to help each other out.
It's a skill that seems sadly lacking with many of the folk that I talk to :-)
The glass-half-full view is that you have a more visually pleasing, refreshed web site that is at least as effective as the old design. Patrick also notes that users tend to associate visual changes with programmatic improvements, even when there are no programmatic improvements (placebo effect).
So just continuing to test until you get a result is not necessarily going to net you anything.
P.S. I also don't hate PayPal; I rather love it, too.
It seems to suggest that design just has to be good enough: below some threshold for visual appeal and/or usability, the design hinders money-making, but above that threshold, further improvement isn't necessary. If this is a reasonable interpretation, it's actually quite sad and suggests that all the new design-centered startups are wasting a lot of money.
That's a linguist's take on it - you can find many more by googling [fewer less languagelog] or similar.
So take this as constructive criticism; I have nothing against the designer, nor do I know him.
I'll begin with the HTML/CSS parts. He is a web designer, not simply a designer, so he should follow best practices:
1- Use the HTML5 Doctype
2- Remove unnecessary whitespace (empty lines and spaces). Why the extra bits?
Enough; there are endless problems with the coding part, so don't take this list as an exhaustive review. As for the design, it just sucks. I agree that simple designs (K.I.S.S.) are better, but they should still be crafted. I won't complain much, but here are two snapshots of what I'm talking about.
Again, take this as constructive feedback; I have nothing against your business, and what matters in the end is the sales/$$$.
Take a look at http://www.premiumpixels.com/ if you want to see some carefully crafted designs.
You gave a numbered list of good suggestions and wrote the kernel of a detailed, thoughtful comment, but sabotaged it with poor framing and assumptions. I don't care or anything, but for your benefit: this is a nerdly inclination that will serve you very, very poorly in your professional career. HN often has the opposite problem from your comment (a bias towards being "right" even when we're not "correct", such as in civil liberties stories); here, you're "correct" but not "right", since adoption of all your recommendations is unlikely to change the bottom line for Patrick at all.
One consultant to (I presume) another, my suggestion for next time is to either:
(a) (expensively) do exhaustive research so you can frame your high-level assessment and recommendations in the context of someone's business (i.e., expend the effort to make sure you're both correct and right), or
(b) (much simpler) develop a habit of writing your comments in a neutral, helpful tone, so that you can be correct without having to be right at all.
By the way, I feel comfortable writing this comment because I have oh my God exactly the same problem with my comments. Look at me on a crypto thread sometime.
Is this a matter of the problems pointed out being technically "correct", while they may or may not be the "right" thing to spend time on?
"Is this a matter of the problems pointed out are "correct" technically, but they may or may not be the "right" thing to spend time on?"
I'm pretty sure you've already got it, this sounds like it. I would only add that a conscientious person checks whether their comment is adding to the conversation (not anxiously; just subconsciously, as a habit). The parent to your comment is saying that if you're 'correct' you're in the clear that way.
I think the thing about being "right" is that if you're right (i.e. your addressee is wrong) you have enough social capital to take that tone with somebody. If you're wrong and you take up an 'I'm right' tone, you come off like you don't care about thinking things through before you make an investment of your time.
He's not "right", though, because adopting any of these recommendations would cost anywhere from tens- to thousands- of dollars, and would be unlikely in the aggregate, even if all of them were adopted, to make Patrick one additional dollar.
Other parts are the way they are simply because, well, we have very different standards for things which should delay shipping. Bits are cheap, my time is expensive, optimizing for bits saved (on my cute little re-demoted-to-hobby project, no less) seems like a poor idea.
Similarly, failure to achieve pixel perfection of the price display in Australian dollars is not exactly the kind of thing that would scare me away from a design. You know what would terrify me? "It looks nice -- women generally hate it." That would cause the business to fold like an origami crane. Happily, it doesn't seem to have that problem.
I'm generally happy to get feedback, even negative feedback, but as one professional to another your feedback doesn't leave me with the impression "Tie a string around my finger to address this later" because it does not persuasively implicate anything I really care about.
It's a very good excuse to push people into improving their HTML and website speed, but I would still like to see the impact on Google organic results from speed improvements and bad-HTML fixes alone.
It might be worth revisiting those non-USD prices...
Removing them after the fact with an HTML tidier would work, or you could just not care and let gzip handle the compression.
Jinja2 is a generic templating engine, and as such can't make assumptions about relevance of whitespace.
However, writing an extension on top of it isn't hard, and mitsuhiko has written an extension to do that.
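For anyone following along, current Jinja2 versions also ship built-in whitespace-control flags that get you most of the way there. A minimal sketch (a standalone toy template, not anything from Patrick's actual site):

    from jinja2 import Environment

    # trim_blocks drops the newline after a {% %} tag; lstrip_blocks strips the
    # indentation before it, so block-heavy templates stop emitting blank lines.
    env = Environment(trim_blocks=True, lstrip_blocks=True)

    template = env.from_string(
        "<ul>\n"
        "  {% for item in items %}\n"
        "  <li>{{ item }}</li>\n"
        "  {% endfor %}\n"
        "</ul>\n"
    )
    print(template.render(items=["word", "bingo"]))

With both flags set, the rendered list comes out without the stray blank lines and leading spaces the default settings would leave in.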
The extra whitespace is most probably template output. If he is using erb: erb is a generic templating engine and doesn't make assumptions about the relevance of whitespace either. Even hand-authored markup will still have all the newlines and indents in the output.
Some people use minifiers, most people don't care. Almost all servers and clients speak gzip, and the extra overhead is so low on the list of things to optimize that it probably isn't even on the list.
Again, I am taking a guess: layout.html.erb yields under body, and the head and scripts are the same for all pages. patio11 could hand-include just the needed scripts on every page, but he chose to include all scripts on all pages. This isn't something the designer controls.
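On the gzip point above, a quick back-of-the-envelope check is easy to run; the markup below is invented purely for illustration:

    import gzip

    # Hypothetical page fragment: the same list rendered "minified" versus with the
    # extra newlines and indentation a generic template engine tends to leave in.
    row = '<li class="card"><a href="/cards/1">Word Bingo</a></li>'
    minified = "<ul>" + row * 200 + "</ul>"
    indented = "<ul>\n" + ("        " + row + "\n") * 200 + "</ul>\n"

    for label, html in (("minified", minified), ("indented", indented)):
        raw = html.encode("utf-8")
        print(label, "raw:", len(raw), "bytes, gzipped:", len(gzip.compress(raw)), "bytes")

Runs of whitespace compress extremely well, so the gzipped sizes land within a few percent of each other; the bytes you'd save by hand-trimming the template mostly vanish once the response is compressed.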
> 2- http://dl.dropbox.com/u/2777218/shot2.png
That menu change really looks weird.
I doubt he has the time and 'itch' right now, though.
It is not explicitly apparent that anyone loves this software, especially nobody they know or have heard of, so why should they use it? It doesn't have to be "who uses it"; it could also be a "featured on" section if you (patio11) got coverage and linked to the service from there as well, which I bet is more prominent. Of course, families don't care about TechCrunch; relevant news is needed.
EDIT: removed an article to clarify
The site shows the old version to some visitors. You can click the link in the footer to get the new redesign.
The new site has less character and is more bland. I cannot get over the cards and the Mac. Why is the background inside the screen the same as outside it? That totally breaks the message of software running on a computer; make it black at least, so the image reads more easily. Right now the computer looks like an open frame. Or show a stylized screenshot of the software. And why do the cards look so boring? Add some shadows or something. Why are the links in the menu underlined? Why is "try now" more prominent than "buy now"? The new design needs to get its priorities straight!
I would like to see an A/B test on whether changing some of the fonts to Comic Sans would increase conversion. I think that for this audience it might.
I'd keep the old layout, but fix the colors. If you get a designer to improve the colors and the gradients so they are less Windows 95-ish, it would be much better.
I'm not sticking up for Patrick (he doesn't need my help, and I probably make him look worse), so much as I am repeatedly trying to talk unfunded startups out of wasting time and opportunity chasing cool-kid aesthetics which will have nothing to do with their eventual business success.
Criticism is supposed to be helpful and offer direction.
Namely, it's always way more work than you expect, especially when you have legacy customers with prior expectations.
And results hardly ever budge.
My hypothesis is A/B testing can move the needle for light engagement, like "try a free trial". But pulling out a credit card requires a lot of motivation, and the 0.1% of your visitors that have this motivation are relatively unaffected by your design.
Do you find me credible on this topic? I'm legally, morally, and practically constrained regarding how much detail I can go into here, but if you trust me: I have recent experience which makes me disagree in the strongest possible way with the generalization of your hypothesis. A recent client CCs me on their weekly Optimizely status report. I've been evangelizing A/B testing for a few years. Every time I read one of those emails my mind gets blown again.
For example, use genetic programming to periodically modify the page served up. Both pages are served up, A/B style, and the page which has the highest metric (e.g., $ earned) survives. Repeat.
It would be interesting to see what the results are after an extended period of time.
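For the curious, the loop being described might look something like the sketch below; every name in it, and especially serve_and_measure, is a made-up placeholder rather than anyone's real setup:

    import random

    # A "page" is just a bundle of discrete choices (headline, button color, ...).
    OPTIONS = {
        "headline": ["Make bingo cards in seconds", "Bingo cards for busy teachers"],
        "button": ["green", "orange"],
        "price_box": ["left", "right"],
    }

    def mutate(page):
        """Copy the page and randomly change one choice."""
        child = dict(page)
        gene = random.choice(list(OPTIONS))
        child[gene] = random.choice(OPTIONS[gene])
        return child

    def serve_and_measure(page, visitors=1000):
        """Placeholder: really this would serve the variant A/B-style for a while
        and return revenue per visitor; here it's just a noisy fake score."""
        return random.gauss(1.0, 0.1)

    champion = {key: choices[0] for key, choices in OPTIONS.items()}
    for generation in range(50):
        challenger = mutate(champion)
        # Serve both variants side by side and keep whichever earned more.
        if serve_and_measure(challenger) > serve_and_measure(champion):
            champion = challenger
    print(champion)

The catch, per the rest of this thread, is that every "survives" decision needs enough traffic behind it to be statistically meaningful, otherwise the loop just wanders around chasing noise.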
What I've found does move the needle:
1) Changing how much I charge, including adding upsells, and
2) Getting new sources of traffic.
That's like telling the director of the FBI he knows nothing about law enforcement because he won't reveal facts that he has a professional obligation not to reveal.
That's not true in my experience. I've seen some big increases in the percentage of visitors who complete the checkout process through an iterative redesign/test cycle, for example.
The motivation behind a user purchase can be very different in different situations. If Mary is purchasing a new gearbox for the car, and she's already done a comparison shop, then she's likely to finish the purchase as long as it's vaguely sane. If Bob is impulse buying that new laptop bag he saw on swissmiss then every single grain of friction between Bob and the final "purchase" button is going to make it more likely that he remembers he really needs that $80 for groceries this week.
At some point you're going to reach a maximum of course and only see small or no increases - but, with my clients anyway, folk seem to be quite a long way from that in many instances :-)
I'm building a couple of sites at the moment and am having them designed from scratch, but I'm wondering: how do you determine the cost/benefit of tweaking something off the shelf versus building your own?
As to cost/benefits: I'm at a point in my life/business where "Spend a week hacking a template to save < $2,000" is an astoundingly poor use of my time.
and then add the rest of the info that's currently below this part into an additional tab up top, perhaps called "Info" (and it should be the first tab -- furthest to the left).
Well, when it couldn't play something, it was very memorable to me.
You mention in the write-up that you've already done a lot of work to get your User Success percentage high. Is this the first time that you've seen a disconnect between User Success improvements and sales increases? Or did you pick this more as a useful, dramatic example?
Please, please tell me that if you kill the old version you will have the numbers to show that the old one is not, say, at least 5% better than the new one. At the moment you seem to be 'leaning' towards the new version even though sales from the old version are higher!
It doesn't matter how much time and money you spent on the new version, or how much better it looks, or how much easier it is to use. What matters at the end of the day is how much money it makes, so run the split test until you are (statistically) confident that you are not throwing money away when you kill one of the designs.
I don't see the new web page as an improved design, though: the lack of alignment between elements, the weird open spaces, and element sizing that isn't appropriate to the layout are all evidence that the designer was amateurish.
If you're interested, annotated commentary on the page is available here: http://imgur.com/HsG6E
(Fair disclosure: I don't sell such services and am not pimping, just commenting.)
Let's say Patrick continues this for 9 more weeks, so what we've seen so far is 10% of the whole test. Let me illustrate an edge case: after 10 weeks, the new site has made 134 sales and the old one 125. 13 vs. 13 after one week is consistent with that, yet 134 vs. 125 is more than a 7% increase.
We can all appreciate that that sort of conversation is pretty exasperating for the engineer, but if he were a nice guy, he'd want to try to explain at least enough engineering to the boss such that the boss understood why "Use an iPad" is not a compelling option there. But he might be disinclined to start that conversation at 2:40 in the morning because it would take a while.
It is 2:40 AM in Japan and improving your understanding of A/B testing would take a while. There exist many comprehensible beginner's guides to it on the Internet. If after reading them you still don't understand why you need two more pieces of data in your hypothetical, and why it is not extraordinarily likely that the 7% increase you measure in it reflects an actual change in user behavior, I will be happy to explain it to you some day when it is not 2:40 AM.
A/B testing is founded on statistics. You take Option A and Option B and see which one achieves more Goal C.
But you can't just look at the percent difference and decide that Option B must be better! Look, it has a higher percent Goal C! But that could be due to chance, so A/B tests employ tests of statistical significance to determine whether the test results are _probably_ chance or _probably_ reflect a genuine causal increase in Goal C.
For example, if you flip a coin four times and get heads three of those times, without a statistical significance test you might conclude heads is 3x as likely to appear as tails. We know that's wrong, though: each side of a coin has a 50% chance of landing face-up on each flip.
The flaw in this experiment is that we tried to extrapolate a result from a very small set of data. A statistical significance test would take these results and say "we have a <very small percent> confidence level that heads is more likely to come up, and doesn't just come up more often by chance".
If we flipped the coin 10,000 times instead, you'd get something pretty close to 50% heads and 50% tails, and your significance test would return a high confidence level that those numbers are accurate.
Short story long, you need lots of datapoints to determine whether an A/B test result is chance or an actual difference, and the smaller the difference between how Option A and Option B perform, the more datapoints you need to be confident they're actually different. Patrick's numbers are so close together that he'd need far more than 300 sales to reach the gold-standard 95% confidence level that there's actually a difference.
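To make that concrete, here is roughly the arithmetic an A/B testing tool runs under the hood: a normal-approximation two-proportion test, standard library only. The visitor counts below are invented for illustration, not Patrick's real traffic:

    from math import erf, sqrt

    def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
        """Two-sided p-value for 'the variants convert at different rates'."""
        p_a, p_b = conv_a / n_a, conv_b / n_b
        pooled = (conv_a + conv_b) / (n_a + n_b)
        se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
        z = (p_b - p_a) / se
        # Convert the z-score to a two-sided p-value via the normal CDF.
        return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

    # 13 sales each on ~2,000 visitors each: p-value of 1.0, no evidence of any difference.
    print(two_proportion_p_value(13, 2000, 13, 2000))
    # Even the "7% better" edge case upthread (125 vs. 134 sales on 20,000 visitors
    # per variant) comes out around p = 0.57 -- nowhere near significance.
    print(two_proportion_p_value(125, 20000, 134, 20000))

This is also presumably the "two more pieces of data" asked for upthread: without the visitor count behind each variant's sales figure, there is nothing to run the arithmetic on.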
Some products don't have a "search, find and purchase immediately" pattern to sales. Especially when you move out of the B2C market.
Some businesses' sales can look more like: "Visit half a dozen different sites. Go away for a week and think. Visit the best sites again. Go away for a few days and come to a decision. Visit the final option, browse, and purchase."
Tracking these multiple visits can be non-trivial/impossible since it may be different people and different browsers visiting the site at the different stages. It also leads to long lead-times for the effects that design changes make.
For more subtle effects, you need more observations, plain and simple; it doesn't matter how you do the math if you don't have the data to reach the significance levels you care about.
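A rough sense of scale, using the standard sample-size formula for comparing two proportions (95% confidence, 80% power, and a 1% baseline conversion rate assumed purely for illustration):

    from math import ceil, sqrt

    def visitors_per_variant(baseline, relative_lift, z_alpha=1.96, z_power=0.84):
        """Approximate visitors needed per variant to detect a relative lift
        at 95% confidence with 80% power (two-proportion formula)."""
        p1 = baseline
        p2 = baseline * (1 + relative_lift)
        p_bar = (p1 + p2) / 2
        n = ((z_alpha * sqrt(2 * p_bar * (1 - p_bar))
              + z_power * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2) / (p2 - p1) ** 2
        return ceil(n)

    for lift in (0.50, 0.20, 0.05):
        print(f"{lift:.0%} relative lift on a 1% baseline: "
              f"~{visitors_per_variant(0.01, lift):,} visitors per variant")

On those assumptions, a 50% lift needs on the order of thousands of visitors per variant, while a 5% lift needs hundreds of thousands -- which is why small sites rarely get clean answers about subtle changes.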
I'd be interested if you see something similar as the month progresses.
Also - a question not directly related to the new design - but I'm curious :-)
On either home page design there's no social proof (testimonials, number of users, total # of bingo cards made, etc.). Which intrigues me, since it's something that pretty much always has a positive effect in my experience (which, I admit, is largely with sites fairly different from BCC). In one case we got a twenty-something percent increase in conversion in the checkout process by adding some targeted quotes on value received/money saved to the final "give me your money" pages.
This seems like such an obvious thing that you've probably tried it already. Is there a reason you didn't go for it?
* with the right circumstances
Way to be scrappy, Patrick.