What works in e-commerce – A meta-analysis of online experiments [pdf] (qubit.com)
424 points by sweezyjeezy on June 21, 2017 | 85 comments



If you want to see pretty much all the most effective measures in action, go to a ticket reseller like Viagogo and follow something through to the basket. It is both impressive and amazing the amount of psychological steers there are on the site.


Just tried it out, wow does that feel slimy. The waiting queue felt totally fake and the "while you were waiting, these sold out" thing was obnoxious.

Of course the psychological aspects might have worked differently if I were actually there to buy tickets for an event I really wanted to go to.


The worst is the bottom banner, which is complete bullshit:

"The website is experiencing heavy traffic - This may result in reduced performance"

For the "too lazy to navigate, add to cart", a screenshot: http://i.imgur.com/o8zUP1u.png

Edit: Offtopic, but Floyd Mayweather vs Conor McGregor in a "boxing only" match, with no kicks, submissions, wrestling, etc. allowed? Who thinks that would be fun to watch? Ugh, that's going to be sad to watch. McGregor must be making a fortune to agree to that...he is going to be trounced.


I continued navigating through to the actual point of purchase. The default quantity of tickets selected was, "1" and there was a warning message which said, "About to sell out - only 2 tickets left." When I selected 5 tickets, that message changed right in front of my eyes to, "About to sell out - only 6 tickets left," and so on.

Perhaps they really are experiencing heavy traffic...I have no real way of knowing that for sure. But when you program a robot to scam me, at least put some effort in, right?


Or flight/hotel/car rental booking websites. Google flights is a notable exception.


The easiest way to make me go somewhere else is to adopt these tactics. They make me feel like I'm locked in a battle of wits with an autistic con-artist, or maybe a suspicious Fed - if weird rituals are not faithfully observed under conditions of vast information asymmetry and artificial stress, I will suffer.

There's no hope of avoiding that with airlines for me - I don't fly enough to avoid the prole line. But literally any storefront where I can avoid that shit, I do, and it breeds not just dislike - it makes me want to see them fail.


The people with the knowledge of these tricks are so few that unfortunately in most cases the additional revenue is too much to keep the minority happy. It's fascinating to look at ecommerce storefronts and search for all of the subtle tactics they're trying to use on you.


I don't see these tactics on many airline or hotel owned sites. They seem to be more common on aggregators.


Great research with lots of data and tangible results.

I'm missing 'authority' as way to improve conversion. Funny that they use it themselves in two ways: letting PWC check the results and, in a milder form, using a scientific way of communicating the results (LaTeX, article layout etc). The impact would have been less if it was just a blog post :-). I'm curious about results on authority, maybe someone from qubit can give us some insights on that?

At my company[0] we offer a solution to sites to implement these strategies through notifications/nudges. Having said that: we firmly believe in A/B testing but we believe even more in recognizing (we do that through machine learning) what technique works best on a personal level. This means that a site can have, for example, two strategies and that we apply none, either one or both on the visitor. That way you can reach higher uplifts.

[0] https://www.conversify.com


This is a really useful article. It's a shame that so much development time is wasted on large numbers of fruitless optimisations just because they are "easy" (eg. tweaking the colour of a CTA).

That being said, I'm surprised many of the results are so negative. It would be great to also see the max uplift achieved for each category. A number of retailers I've worked with have been able to beat these uplifts by quite a bit. I wonder if it might be significantly skewed by the kind of clients Qubit has?


Two things - firstly, each of the scores you see in the key findings is just the average. We have also estimated the size of the standard deviation (see table in section 2, or appendix A). So for some treatments, large uplifts are not out of the question.

Maybe more importantly - every A/B test ever run suffers from measurement error, and usually in e-commerce this error is on the scale of the effect you are trying to measure. This means that sometimes you will 'see' massive uplifts, where in actuality most of the size of the effect was due to random noise. This is kind of the curse of e-commerce: most people have enough data to say something (we are 95% sure this test was positive), but most not with any notable precision (we are 95% sure the uplift was between +8% and +9%). Basically all the stats in this analysis are trying to remove this noise, and this is what we got.
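To put rough numbers on that precision point, here is a minimal sketch using a plain normal approximation for two conversion rates. It is not the method from the paper, and the visitor and conversion counts are invented purely for illustration.

```python
import math

def uplift_interval(conv_a, n_a, conv_b, n_b, z=1.96):
    """Approximate 95% interval for the relative uplift of B over A.

    Normal approximation on the difference of two conversion rates,
    divided by the control rate (uncertainty in the control rate itself
    is ignored, so the real interval is a little wider).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    diff = p_b - p_a
    return (diff - z * se) / p_a, diff / p_a, (diff + z * se) / p_a

# Hypothetical test: 100,000 visitors per arm, 3.0% vs 3.3% conversion.
low, point, high = uplift_interval(3000, 100_000, 3300, 100_000)
print(f"uplift {point:+.1%}, 95% interval {low:+.1%} to {high:+.1%}")
```

With these made-up numbers the interval excludes zero, so the test "won", yet it still spans roughly +5% to +15%. That is the "sure it is positive, not sure how big" situation described above.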


Great to see a multilevel model used to shrink the effects. I was reading the abstract and thought, they probably didn't correct for sampling error - but you did.

I'm not an expert on plate notation, though, so I'm not sure which MLM you used. Is it basically `Revenue ~ (Covariate_1 + ... + Covariate_n | Treatment | Category)`?
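For anyone unfamiliar with what "shrinking the effects" means, a minimal sketch of normal-normal partial pooling follows. This is a generic textbook form, not the model from the paper, and every number in it is hypothetical.

```python
def shrink(observed, std_err, group_mean, group_sd):
    """Posterior mean of a test's true effect under a normal-normal model.

    observed, std_err: the test's measured uplift and its standard error.
    group_mean, group_sd: assumed mean and spread of true effects
    across tests in the same category.
    """
    # Weight on the data: close to 1 for precise tests, close to 0 for noisy ones.
    weight = group_sd**2 / (group_sd**2 + std_err**2)
    return weight * observed + (1 - weight) * group_mean

# Hypothetical: two tests both report +20%, in a category that averages
# +2% with a spread of 3 percentage points.
noisy = shrink(observed=0.20, std_err=0.10, group_mean=0.02, group_sd=0.03)
precise = shrink(observed=0.20, std_err=0.01, group_mean=0.02, group_sd=0.03)
print(f"noisy test   -> {noisy:+.1%}")
print(f"precise test -> {precise:+.1%}")
```

The noisy test gets pulled most of the way back to the category average, while the precise one barely moves. That is the correction for sampling error in a nutshell.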


> It would be great to see the max uplift achieved for each category

Indeed. What matters most with these kinds of experiments isn't really the average results, but what is possible and the distribution among beneficial results only. After all, the whole point of A/B testing is to try experiments and then either keep the changes if they improve results or stay with what you've already got if the changes didn't bring an improvement. Surely all the treatments that led to negative changes would just have been discarded in practice? It's still important to see the full picture as well, if only to guide decisions about which experiments are even worth trying, but I think there's another side that doesn't fully come through here.


I think the big error in A/B testing is that expectations are quite often very unrealistic. Designers typically have a reasonably good idea about what will work and what will not. Finding 'million dollar buttons' is rare. Of course a couple of percent or even tens of percent of improvement is nothing to sneeze at. But thinking that by A/B testing forever you're going to make a shrub grow into a tree is imo not realistic. Aside from the detail that a continuously changing user interface is often in itself a barrier to sales.

Ironically, the companies that have benefited most from A/B testing were the ones that were doing a terrible job of it in the first place so then there is lots of low hanging fruit making the consultants look good.

Yet another item often missed: A/B testing success is a direct function of the length of the lever you are pulling. If that lever commands billions of dollars then it is easy to make it pay for itself. But if you're trying to turn $10000 into $11500 then you likely are wasting your time.


> isn't really the average results, but what is possible and the distribution among beneficial results only

No, the bad results also matter: you are still spending visitors and revenues in testing out bad variants, which is part of determining the costs and benefits. Even with a bandit approach, you incur logarithmic regret in the number of variants. And testing a bad variant is common: the best category, 'scarcity', has a 16% probability of the variant being harmful. A Value of Information calculation has to take into account the harm done while testing.
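A quick sketch of where a number like 16% can come from, assuming true effects within a category are normally distributed. The mean is the scarcity figure quoted in this thread; the standard deviation is my assumption, picked only because it reproduces a figure near 16%, so check the paper's table for the real value.

```python
from statistics import NormalDist

def prob_harmful(mean_uplift, sd_uplift):
    """P(true effect < 0) if true effects are normally distributed."""
    return NormalDist(mean_uplift, sd_uplift).cdf(0.0)

# mean: scarcity uplift quoted in the thread.
# sd: assumed value (equal to the mean), NOT taken from the paper.
p = prob_harmful(mean_uplift=0.029, sd_uplift=0.029)
print(f"P(harmful) = {p:.1%}")
```

In other words, a 16% chance of harm is what you get when the spread of outcomes is about as large as the average uplift.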


(Hence the final sentence of my previous comment.)


Well, running a rudimentary eBay store you realize these things help pretty quickly, but it's good to have data.

However... Starting a new e-commerce property? Good luck finding traffic in anything profitable. Amazon and other giants dominate search rankings, so I'm not sure how you will find your traffic unless you create a new niche. Maybe you're a thought leader in a hobbyist space, that can work... But you're not going to be succeeding because of these tricks.


That's simply false. There are always categories or "side"-categories that can be highly profitable and big stores only care about as a side business (small category in their store).

Being a small category for a huge store means they don't care about that category too much but it doesn't mean there isn't a lot of cash to be made.

I've helped several companies to build an online store that did exactly that and they are all doing 5 digit revenue month after month.

It's easy to rank higher in search engines for something you are 100% focused on compared to a big store where it's only a small thing that they don't highlight.


> It's easy to rank higher in search engines for something you are 100% focused on compared to a big store where it's only a small thing that they don't highlight.

Exactly. The classic example that always comes to my mind: I work with audio and music production in my spare time, and looking for equipment from that category will almost certainly result in a first search result from Thomann, a store focused on studio gear (like Sweetwater for Europeans/Germans), rather than Amazon, eBay and Co.


Agreed. I work for a company in such a niche and there are definitely side-niche markets still out there for people to find and dominate, sometimes even on SEO alone. A small piece of a huge pie is still a very large piece to me.


Have a good example of one that took that piece? Outside of Amazon


5 digit revenue could be as low as $10,000 a month.

When you factor in cost of goods and labor, what's the net income on that $120k/year? An educated guess would put it below the median wage for the country.

How many of them have at least $1,000,000/year revenue?


If you think about that these are only side gigs without much labor involved (mostly drop shipping) and made on a tight budget, even $10k would be good money.

Launch such a store every month and you have $120k MRR after 12 months.


Can you give any examples of small categories? I can't think of any, let alone get my brain round what kind of thing they would be.


For example: http://www.ties.com

(disclaimer: I am a co-founder of this company)


I can think of plenty, but I'm not going to say... Tehehe sorry for the joshing, it's interesting that free shipping doesn't translate into increased revenues, or at least not as much as one would think (although I'd like to see this same test on merchant-fulfilled Amazon listings).


I'm not saying it's not hard, but success is very much possible.

E.g. Shopify handled $15.4 billion worth of transactions[1] in 2016, and that's just one platform.

[1]: https://www.shopify.com/2016


Counter-argument: http://fortune.com/2017/05/11/tophatter-e-commerce/

Disclosure: I work at Tophatter.


This is a marketplace though, I am talking more of direct online retailing.


Shopify as just one example of a platform is still aggressively growing, no? Beyond that I know for sure there are still niches or unique ways to take to make money. I haven't done it personally, but know of a few people.


Inconsistencies to report.

Page 2:

    • scarcity (stock pointers) +2.9% uplift
    • urgency (countdown timers) +2.3% uplift
    • social proof (informing users of others’ behaviour) +1.5% uplift

Page 6 table 2.2:

    scarcity 2.9%
    social proof 2.3%
    urgency 1.5%


facepalm. The table is correct. Will correct this tomorrow. Thanks for the heads up.


I'm surprised free shipping has a negative impact on revenue. Worst case I imagine would be that revenue would increase but with less or negative profit.

Likewise most GUI-related tweaks seem to have a negative effect (mobile friendliness, search, navigation). Assuming it gives a better mobile experience, why would anyone spend less - unless the goal is to get them off mobile and onto the desktop.


I think that the GUI-related tweaks must be on sites that are done properly and well established. For these guys the customers know what to expect so a change in design will be received negatively by people that are used to the old site. Although not in ecommerce, the redesign of the BBC site is like that, people want to cling on to the old site and instinctively don't like the new. In time they adjust or get a touchscreen computer and 'get' the changes and what the redesign is about.

Most people are working on sites other than the top ones. With these websites where customers are not regulars, change to the GUI is received differently. Sensible updates to the UX will convert.


Thanks for sharing some real data. It's always interesting to see.

I too am surprised to see the conclusions come out as negative as this over such a large overall data set. Just from our own experience, even making quite modest changes to small web sites, it's not that unusual to see the kinds of change that came out with a negative mean in this report actually making a very noticeable positive difference.

I wonder whether this is partly a matter of interpretation and presentation. A lot of the treatments that had a slightly negative mean also had a lot of variance, which suggests that quite often those treatments do work but it's not reliable and requires experimentation to make sure you only keep the genuinely beneficial cases. It seems plausible that there were a few of the "75% improvement in our case study!" kinds of results lost in the long tails, but that what the data is telling us is that those really are outliers and don't happen nearly as often as we might wish.


Lots of interesting data and some findings I didn't expect. Thanks.

What I have missed in this paper is the impact of customer reviews (a 4.5 ranking has a higher uplift potential than a 5.0 star ranking - according to some studies). And the number of reviews has impact as well. Not enough studies around for a meta analysis?


Anyone who hangs out on Facebook group e-commerce forums has basically seen the outrageous claims the authors of this paper allude to.

Nice to see an analysis like this for a change.


My favorite is: "Our buy now button should be orange, that converts best" without testing it, verifying the results, and just pointing at some blogposts.


can you recommend good groups?


A lot of Shopify groups.

Search up anything Shopify or ecommerce mastermind.

Be aware that a lot of them are set up as groups that simply try to sell you stuff.

The best ecommerce folks are basically the best marketers.


Thanks!


The PwC assurance report URL gives a 404: http://www.qubit.com/sites/default/files/pdf/pwc-qubit-assur...



Now that's pretty bad. There is a modal that opens before the document is displayed.

"Chrome PDF Viewer

Click YES to accept the terms of the disclaimer on page 1 of this document.

If you click NO the document will close.

[OK] [CANCEL]"

Not only does this want me to agree to something before displaying it, the legally binding options (Yes / No) aren't even choices in the modal.


Thankfully, clicking CANCEL opens the PDF anyway.


In my experience, the A/B tests that are most likely to win are the ones where you make UX changes designed to make it easier for visitors to do what you want them to do. These not only improve your conversion rates, they are also less spammy and intrusive than things like exit-intent modals. They are also the types of gains that do compound.

Want an easy win? Make mobile checkout better. It's generally the worst. I was on a fairly large, publicly traded, retailer's site over the weekend and had a goofy error that was extremely easy to make on their mobile checkout page. While I was alerted to the error, it also emptied my shopping cart and erased all the address and payment info I spent time typing in.


Very interesting, thanks for sharing. As someone working in e-commerce, I was smiling when I saw some of them, but it is frustrating how often these ineffective experiments are repeated.

I would have loved to see the cut of performance by industry/sector. My hunch is some of the things would work really well in travel but not as well in others, especially low-involvement categories and categories with a lower average selling price. It would also be interesting to know the average duration of these A/B tests. I think some of the things like scarcity and urgency will have a larger effect over a shorter duration than others like UI changes, which will take a while to produce substantial results, mostly because customers will have to learn new behaviours. Product recommendations are interesting because they are notoriously difficult to get right, and I feel they tend to work better in long-tail categories like media than in head-heavy categories like mobiles or laptops. They may also not work well in categories where brand influence is high and that are generally high involvement and high cost.


It's good to see an analysis, albeit most of this info was common knowledge, maybe except call to action buttons causing a decrease. I thought it was the opposite.

Sometimes factoring all variables when doing testing becomes impossible.


This stuff is less common knowledge than you might expect. When bringing on new clients at Qubit (particularly smaller ones), we still find so many of them obsessing over tiny UI changes expecting it to make an impact. Much of the e-commerce industry has bought into the idea that the cosmetics of a site is the most important thing to get right.


I believe that the main focus of any ecommerce site should be pricing, availability and having the payment and delivery options your customers expect.

E-commerce sites want to "engage" with their customers, but fail to realise that the customers don't want a relationship. They just want whatever product you're selling as quickly and cheaply as possible. Most of your customers will be coming via price comparison sites or Google, so they aren't going to you directly (unless you're Amazon or eBay). For most e-commerce sites the customers aren't going to stay long enough to notice imperfections in the UI.

What you do need is a dead simple checkout (no signup required) and an equally easy return form. Everything else will be used by only a tiny percentage of your customers, often those you don't want to deal with anyway.

For some weird reason, making the checkout too simple will result in a large number of purchases being cancelled right after placement. In my experience customers are more likely to cancel within 15 minutes of placing the order than at any other time.


Two counterpoints:

I have done a/b tests where increasing the price of the product increased the conversion rate. The original price was too low and the customers thought this was a cheap, low-quality product.

Also I, and many people I know, are not shopping for the lowest price, but for the best overall package. (Reputation of seller, return policies, shipping duration, display of the product in the shop, filter and search abilities to find the right product in the shop, ...) There are of course many users who simply want the lowest price, but one important thing if you a/b test is: You don't have to go with the biggest group of users (in this case price sensitive), there are so many other opportunities if you understand your users. In e-commerce these opportunities open especially often if the user is not exactly sure what product/brand he needs or wants to buy. Competing on price is really hard, competing on advice is often easier as a small shop.


Everyone is price sensitive once you've met the minimum threshold for those other things.

What you're selling sets the threshold for those things. Return policies, advice and reputation matter more for embroidered clothing than they do on bulk sales of nuts and bolts.


You seem to be taking a very narrow definition of e-commerce here. What you say seems plausible enough for sites selling low value or commodity products to casual purchasers, but I wouldn't assume the same situation for someone selling luxury products, services, digital subscriptions, etc.


+1. We run a small business that manufactures bicycle trailers which we sell direct to our customers over the internet. The only reason we've been able to succeed has been because we deal one-on-one with our customers.


> For most e-commerce sites the customers aren't going to stay long enough to notice imperfections in the UI.

> What you do need is a dead simple checkout (no signup required)

These two statements are contradicting each other, because "imperfections" in the UI can in fact greatly affect the perceived simplicity of the checkout process - and optimizing that is a large part of what people want to achieve with A/B testing.


I've built a couple lead generation sites and some of the things you've determined don't make a difference with e-commerce seem to have a huge impact on people filling out forms on line.


It sounds like the experiment there wasn't to add call to action buttons, but rather to change the wording to be more action-suggestive.

And they might have flubbed them - I see people go way too far on that sometimes. Changing it from something reasonable that matched the site, to something over the top and not fitting.


I wish they would have evaluated the effect of dynamic pricing. That is, showing different prices to different visitors for the same product. Perhaps not enough online retailers employ the practice, although it seems to be an important tool for retailers like Amazon[0].

0. https://www.theatlantic.com/magazine/archive/2017/05/how-onl...


What you are describing is an A/B test of a price, not typically what industry calls "dynamic" or "variable" pricing. Amazon tried A/B testing prices and caught hell for it. They no longer do it, but they will do dynamic pricing.


Finding out a company employs this technique is the quickest way for me to not buy anything from them.


A lot of these are brilliantly displayed with varying degrees of integration and success on airline pages. [1],[2]

[1] http://sas.dk (Search for e.g. Copenhagen -> London)

[2] http://lufthansa.com (Search for e.g. Frankfurt -> Copenhagen)


I've done the countdown timer thing before, but not in a sleazy way like Viagogo. I've done limited-time sales where the item starts at a percentage off and gradually increases to full price. It seems to work well.
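Roughly, the pricing rule looks like the sketch below. The linear ramp, the function name, and the numbers are illustrative assumptions, not the exact setup I used.

```python
def sale_price(full_price, start_discount, elapsed_hours, sale_hours):
    """Price during a sale whose discount shrinks linearly to zero.

    start_discount is a fraction (0.30 means 30% off at launch).
    Before the sale starts the full discount applies; after it ends
    the item is back at full price.
    """
    progress = min(max(elapsed_hours / sale_hours, 0.0), 1.0)
    discount = start_discount * (1.0 - progress)
    return round(full_price * (1.0 - discount), 2)

# Hypothetical 48-hour sale on a $100 item starting at 30% off.
for hour in (0, 12, 24, 36, 48):
    print(hour, sale_price(100.0, 0.30, hour, 48))
```

The timer here reflects a real, published price schedule, which is what keeps it from being sleazy: the deadline actually exists.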


This is interesting but I definitely question some of the results, for instance reporting a negative impact for changing search results. Many businesses have been built on improving conversion through search result optimization.


I interpreted this as them creating/improving on-site search. Not related to adwords or display optimization.


Sure, but what were the improvements to on-site search? More relevant results? Tweaking the display? I know from experience that improving search result relevance in e-commerce has a (sometimes drastic) improvement in conversion.


This was really interesting. Thanks for posting it. Does anyone know of a place to find similar content? (Analysis of web trends and practices from a data driven perspective.)


Nielsen Norman Group ( https://www.nngroup.com ) is a good place to look for somewhat similar content. NNG focuses more on user experience, and so their studies are usually based on direct observation of users, rather than large amounts of data gathered through analytics.

Still, it has proven a very valuable resource for me when trying to explain a decision I've made in a new website design. They have many free articles that offer some good insights, as well as some more in-depth reports about specific sectors that will cost a few hundred dollars each.


We're pretty sure this is the only analysis of its kind at the moment. You have to have a pretty large number of experiments to be able to do this analysis. Hopefully we see more soon (co-author speaking)


tl;dr: The 3 items possibly statistically significant are:

- Saying there are just a handful items left in stock (+2.9% revenue per client)

- Saying other people are watching this product (+2.3%)

- Time limited offer (+1.5%)

I did not see mention of combining these factors. I doubt the gains are cumulative.

My main takeaway is that most optimizations are not worthy if you have the opportunity to spend your time/money on something else to bring value to the consumer.

Also, I think #1 and #3 are dick moves and #2 needs some good crafting to not be. I doubt the cost in reputation is worth the increase in revenue.


Whether they're dick moves is all in how you handle them. In the nonprofit world, these things are well-known and tend to be handled with finesse and even grace.

1. It's not just about items in stock. "Our SomeNonprofit membership program has just 10 places left at the platinum level, where you receive the following benefits..."

2. Social proof is more than just what people are watching. "Fans of SomeNonprofit donate an average of $121 to SomeNonprofit every year." Or, "When we asked SomeNonprofit donors what they liked most, they said it was the way SomeNonprofit does SomeThing. If you like SomeThing, too, then donate to support SomeNonprofit." It's really just about demonstrating that others have done a thing so it's okay for you to do it, too.

3. One nonprofit I'm involved with very successfully does a fundraising campaign for the last 24 hours of every year. It's an artificial time restraint in some ways, but it also capitalizes on being the very last day each year that tax-deductible donations count toward a year's taxes. It's a true time-constraint!


OK, agreed here. I need to specify it a bit more: 1 and 3 are dick moves when they are totally artificial, like saying only 3 left while the stock is around 300 or saying "only 2 hours left" when the time constraints are imaginary.

In your case they are not, so I agree that it is worthwhile to remind people of them.


are you still involved w/ non-profits?

we offer our social proof platform 100% free to all charities: https://www.usefomo.com

ping me (ryan@usefomo.com) for a free-forever account, no strings.


I like your expression "dick moves", summarizes it quite well. John Gruber started calling static mobile navigation bars "dickbars" [1]. Maybe we should build on that and call these moves accordingly ;-) e.g. "dicktimer"

[1] https://daringfireball.net/2017/06/medium_dickbars


>> - Saying there are just a handful items left in stock (+2.9% revenue per client)

I appreciate this one, as long as it's actually true. Many times I've been on the fence about something, come back the next day to find out it was gone and wasn't going to be restocked. Knowing would have changed my decision. On the other hand, sometimes I've bought something and the number claimed to be in stock didn't change for months after. THAT'S a dick move.

Same with #3. Time limited offers make sense when you're trying to attract attention or compete with other sales (e.g. opening day sales, Black Friday, etc.) Some brick-and-mortar stores have time-limited offers constantly, and then it's clearly just manipulative.


> I doubt the gains are cumulative

Why do you doubt that? Aren't they presented as independent results?

> bring value to the consumer

This is vague, but also a separate goal. Perhaps another way of representing customer value is churn reduction. In any case, it could be appropriate to invest in gaining 3% more revenue per additional customer, then use the gains to invest in more customer value.

> #1 and #3 are dick moves and #2 needs some good crafting

Real scarcity exists, and there's value to communicating it to the customer. For example, saying there are two copies of something available, and the rest are backordered for three weeks, can be pretty useful information.


> Real scarcity exists, and there's value to communicating it to the customer.

I assume what was meant is that they're dick moves if done purely for sales optimisation (i.e. If they are lying) and not if the scarcity or time limitations actually naturally exist.


I think they are not purely cumulative because I assume that consumers have a bit of slack in their decision/budget and that each optimization consumes some of this slack. The basic idea behind 1 and 3 is to stress out the consumer into believing that he may miss a sale. I think that stressing them out with 1 or 3 more or less gives the same result and does not need to be applied twice.

Real scarcity exists, but this paper gives an incentive to create artificial scarcity. I have seen a lot of websites mention low stocks but none mentioning backorders or restocking estimates. Saying "Only 3 items left" has a very different effect than "only 3 items left but 100 more arrive in 2 weeks".
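For scale, here is what fully independent stacking of the three headline uplifts would give, using the figures from the paper's table as quoted elsewhere in this thread. Independence is the assumption being questioned, so treat the result as a ceiling rather than a prediction.

```python
# Mean uplifts per treatment, as listed in the paper's table 2.2.
uplifts = {"scarcity": 0.029, "social proof": 0.023, "urgency": 0.015}

# Simple sum of the three uplifts.
additive = sum(uplifts.values())

# Compounding: each treatment multiplies revenue independently.
multiplicative = 1.0
for u in uplifts.values():
    multiplicative *= 1.0 + u
multiplicative -= 1.0

print(f"additive:       {additive:+.2%}")
print(f"multiplicative: {multiplicative:+.2%}")
```

Either way the best case is a bit under 7%. If the slack argument holds, the real combined effect would land somewhere below that.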


> Saying there are just a handful items left in stock (+2.9% revenue per client)

This one is interesting. When doing product design the opposite is important. I expect it depends on what you're selling. Knowing there are 20,000 of an electrical component available vs 20 is a good thing. Availability weighs a lot more than: Quick! Get the last one!


AirBnB does a lot of these — and they aren't necessarily dick moves when you're looking at something with limited availability.

If 1,000 people have looked at an AirBnB listing, it does actually make sense to book faster than a property 3 people were looking at.


It makes sense when the information is accurate and complete. I tend to not trust companies to do the right thing when they have an incentive to do the opposite.

For instance, do you count people "watching" this property through suggestions while they are browsing another one? Do you count people who clicked it but went back after checking the address? How long before you do not consider a person "watching" anymore?

Each of these decisions has a different answer depending on whether the goal is to raise revenues or to provide accurate information to consumers.


Fair. Personally I'll occasionally look at the source to see if there's any obvious trickery going on. It would be interesting to do a study to see what people consciously think about that type of thing.


my company would happily fund a study like this.

we rely on these strategies "working" for our customers, in order to exist.

i'll wager that when consumers know a "widget" or live notification is provided by an external source, they trust it more than when the build is internal.

ping me (ryan@usefomo.com) if you want to get involved!


Hotels.com uses #2 ("X people have looked at hotels in this area in the last hour.") and it feels like a dick move.


Why is that a dick move if it's true?


what's unfortunate is we'll never know if hotels.com is true, because they built the component in-house.

the use of 3rd party platforms increases trust, ie restaurants using Yelp vs self-hosted text boxes.



