Of course the psychological aspects might have worked differently if I were actually there to buy tickets for an event I really wanted to go to.
"The website is experiencing heavy traffic - This may result in reduced performance"
For the "too lazy to navigate, add to cart", a screenshot: http://i.imgur.com/o8zUP1u.png
Edit: Off-topic, but Floyd Mayweather vs Conor McGregor in a "boxing only" match, with no kicks, submissions, wrestling, etc. allowed? Who thinks that would be fun to watch? Ugh, that's going to be sad to watch. McGregor must be making a fortune to agree to that...he is going to be trounced.
Perhaps they really are experiencing heavy traffic...I have no real way of knowing that for sure. But when you program a robot to scam me, at least put some effort in, right?
There's no hope of avoiding that with airlines for me - I don't fly enough to avoid the prole line. But literally any storefront where I can avoid that shit, I do, and it breeds not just dislike - it makes me want to see them fail.
I'm missing 'authority' as a way to improve conversion. Funny that they use it themselves in two ways: letting PwC check the results and, in a milder form, using a scientific style of communicating the results (LaTeX, article layout, etc.). The impact would have been less if it were just a blog post :-). I'm curious about results on authority; maybe someone from Qubit can give us some insights on that?
At my company we offer a solution to sites to implement these strategies through notifications/nudges. Having said that: we firmly believe in A/B testing but we believe even more in recognizing (we do that through machine learning) what technique works best on a personal level. This means that a site can have, for example, two strategies and that we apply none, either one or both on the visitor. That way you can reach higher uplifts.
That being said, I'm surprised many of the results are so negative. It would be great to also see the max uplift achieved for each category. A number of retailers I've worked with have been able to beat these uplifts by quite a bit. I wonder if it might be significantly skewed by the kind of clients Qubit has?
Maybe more importantly - every A/B test ever run suffers from measurement error, and in e-commerce this error is usually on the scale of the effect you are trying to measure. This means that sometimes you will 'see' massive uplifts where in actuality most of the size of the effect was due to random noise. This is kind of the curse of e-commerce: most people have enough data to say something (we are 95% sure this test was positive), but usually not with any notable precision (we are 95% sure the uplift was between +8% and +9%). Basically, all the stats in this analysis are trying to remove this noise, and this is what we got.
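The scale of that noise is easy to put in numbers. A minimal sketch, assuming (purely for illustration) a 2% baseline conversion rate and 50,000 visitors per arm:

```python
import math

# Assumed figures for illustration: 2% baseline conversion, 50,000 visitors per arm.
n = 50_000
p = 0.02

# Standard error of each arm's conversion rate, and of the A/B difference.
se_arm = math.sqrt(p * (1 - p) / n)
se_diff = math.sqrt(2) * se_arm

# 95% CI half-width on the *relative* uplift (difference divided by baseline).
half_width = 1.96 * se_diff / p
print(f"95% CI half-width on relative uplift: +/-{half_width:.1%}")
```

With those numbers the noise band is roughly +/-8.7%, so a true +3% effect is completely swamped - exactly the "enough data to say something, but not with any precision" regime.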
I'm not an expert on plate notation, though, so I'm not sure which MLM you used. Is it basically `Revenue ~ (Covariate_1 + ... + Covariate_n | Treatment | Category)`?
Indeed. What matters most with these kinds of experiments isn't really the average results, but what is possible and the distribution among beneficial results only. After all, the whole point of A/B testing is to try experiments and then either keep the changes if they improve results or stay with what you've already got if the changes didn't bring an improvement. Surely all the treatments that led to negative changes would just have been discarded in practice? It's still important to see the full picture as well, if only to guide decisions about which experiments are even worth trying, but I think there's another side that doesn't fully come through here.
Ironically, the companies that have benefited most from A/B testing were the ones that were doing a terrible job of it in the first place, so there was lots of low-hanging fruit making the consultants look good.
Yet another item often missed: A/B testing success is a direct function of the length of the lever you are pulling. If that lever commands billions of dollars then it is easy to make it pay for itself. But if you're trying to turn $10000 into $11500 then you likely are wasting your time.
No, the bad results also matter: you are still spending visitors and revenues in testing out bad variants, which is part of determining the costs and benefits. Even with a bandit approach, you incur logarithmic regret in the number of variants. And testing a bad variant is common: the best category, 'scarcity', has a 16% probability of the variant being harmful. A Value of Information calculation has to take into account the harm done while testing.
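As a toy illustration of that last point, the expected-harm term of such a calculation can be sketched with assumed figures (the visitor volume, order value, and size of the harm are all hypothetical; only the 16% comes from the paper):

```python
# Expected-harm term of a Value of Information calculation.
# All figures except p_harmful are assumptions for illustration.
visitors_to_variant = 50_000    # visitors exposed to the variant during the test
baseline_cvr = 0.02             # control conversion rate
avg_order_value = 80.0          # revenue per conversion
p_harmful = 0.16                # paper's figure for 'scarcity' variants
harm_if_harmful = 0.05          # assumed relative revenue loss when the variant hurts

baseline_revenue = visitors_to_variant * baseline_cvr * avg_order_value
expected_cost = baseline_revenue * p_harmful * harm_if_harmful
print(f"expected revenue lost while testing: ${expected_cost:,.0f}")
```

That expected cost of running the test has to be netted against the expected gain from keeping a winning variant forever.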
However... Starting a new e-commerce property? Good luck finding traffic in anything profitable. Amazon and other giants dominate search rankings, so I'm not sure how you will find your traffic unless you create a new niche. Maybe you're a thought leader in a hobbyist space; that can work... But you're not going to succeed because of these tricks.
Being a small category for a huge store means they don't care about that category too much but it doesn't mean there isn't a lot of cash to be made.
I've helped several companies build online stores that did exactly that, and they are all doing five-digit revenue month after month.
It's easier to rank in search engines for something you are 100% focused on than it is for a big store where it's only a small category that they don't highlight.
Exactly. The classic example that always comes to my mind: I work with audio and music production in my spare time, and searching for equipment in that category will almost certainly return a first result from Thomann, a store focused on studio gear (the European/German equivalent of Sweetwater), rather than Amazon, eBay and co.
When you factor in cost of goods and labor, what's the net income on that $120k/year? An educated guess would put it below the median wage for the country.
How many of them have at least $1,000,000/year revenue?
Launch such a store every month and you have $120k MRR after 12 months.
(disclaimer: I am a co-founder of this company)
E.g. Shopify handled $15.4 billion worth of transactions in 2016, and that's just one platform.
Disclosure: I work at Tophatter.
• scarcity (stock pointers) +2.9% uplift
• urgency (countdown timers) +2.3% uplift
• social proof (informing users of others’ behaviour) +1.5% uplift
social proof 2.3%
Likewise, most GUI-related tweaks seem to have a negative effect (mobile friendliness, search, navigation). Assuming it gives a better mobile experience, why would anyone spend less - unless the goal is to get them off mobile and onto the desktop?
Most people are working on sites other than the top ones. With these websites where customers are not regulars, change to the GUI is received differently. Sensible updates to the UX will convert.
I too am surprised to see the conclusions come out as negative as this over such a large overall data set. Just from our own experience, even making quite modest changes to small web sites, it's not that unusual to see the kinds of change that came out with a negative mean in this report actually making a very noticeable positive difference.
I wonder whether this is partly a matter of interpretation and presentation. A lot of the treatments that had a slightly negative mean also had a lot of variance, which suggests that quite often those treatments do work but it's not reliable and requires experimentation to make sure you only keep the genuinely beneficial cases. It seems plausible that there were a few of the "75% improvement in our case study!" kinds of results lost in the long tails, but that what the data is telling us is that those really are outliers and don't happen nearly as often as we might wish.
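That reading can be sanity-checked with a toy simulation. Assuming (purely for illustration) that true treatment effects are drawn from a normal distribution with a -0.5% mean and 3% standard deviation:

```python
import random
import statistics

random.seed(1)

# Assumed effect distribution: slightly negative mean, wide spread.
effects = [random.gauss(-0.005, 0.03) for _ in range(10_000)]

mean = statistics.mean(effects)
share_positive = sum(e > 0 for e in effects) / len(effects)
p90 = sorted(effects)[int(0.9 * len(effects))]  # 90th percentile

print(f"mean effect:     {mean:+.2%}")
print(f"share positive:  {share_positive:.0%}")
print(f"90th percentile: {p90:+.2%}")
```

Despite the negative mean, roughly four in ten treatments help and the best decile clears +3%, so a "test, keep the winners, discard the rest" policy can still pay off even in a category with a negative average.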
What I have missed in this paper is the impact of customer reviews (a 4.5 ranking has a higher uplift potential than a 5.0 star ranking - according to some studies). And the number of reviews has impact as well.
Not enough studies around for a meta analysis?
Nice to see an analysis like this for a change.
Search up anything Shopify or ecommerce mastermind.
Be aware that a lot of them are set up as groups that simply try to sell you stuff.
The best ecommerce folks are basically the best marketers.
"Chrome PDF Viewer
Click YES to accept the terms of the disclaimer on page 1 of this document.
If you click NO the document will close."
Not only does this want me to agree to something before displaying it, but the legally binding options (Yes / No) aren't even the choices offered in the modal.
Want an easy win? Make mobile checkout better. It's generally the worst. I was on a fairly large, publicly traded retailer's site over the weekend and hit a goofy error that was extremely easy to make on their mobile checkout page. While I was alerted to the error, it also emptied my shopping cart and erased all the address and payment info I had spent time typing in.
I would have loved to see the cut of performance by industry/sector. My hunch is some of these things would work really well in travel but not as well in others, especially low-involvement categories and categories with a lower average selling price. It would also be interesting to know the average duration of these A/B tests: I think things like scarcity and urgency will have a larger effect over shorter durations, while UI changes will take a while to produce substantial results, mostly because customers have to learn new behaviours. Product recommendations are interesting because they are notoriously difficult to get right, and I feel they tend to work better in long-tail categories like media than in head-heavy categories like mobiles or laptops. They may also not work well in categories where brand influence is high and purchases are generally high-involvement and high-cost.
Sometimes factoring in all the variables when testing becomes impossible.
E-commerce sites want to "engage" with their customers, but fail to realise that the customers don't want a relationship. They just want whatever product you're selling, as quickly and cheaply as possible. Most of your customers will be coming via price comparison sites or Google, so they aren't going to you directly (unless you're Amazon or eBay). For most e-commerce sites the customers aren't going to stay long enough to notice imperfections in the UI.
What you do need is a dead simple checkout (no signup required) and an equally easy return form. Everything else will be used by only a tiny percentage of your customers, often those you don't want to deal with anyway.
For some weird reason, making the checkout too simple results in a large number of purchases being cancelled right after placement. In my experience customers are more likely to cancel within 15 minutes of placing the order than at any other time.
I have done a/b tests where increasing the price of the product increased the conversion rate. The original price was too low and customers thought it was a cheap, low-quality product.
Also I, and many people I know, are not shopping for the lowest price, but for the best overall package (reputation of the seller, return policies, shipping duration, presentation of the product in the shop, filter and search abilities to find the right product, ...). There are of course many users who simply want the lowest price, but one important thing when you a/b test is: you don't have to go after the biggest group of users (in this case the price-sensitive ones); there are so many other opportunities if you understand your users. In e-commerce these opportunities open up especially often when the user is not exactly sure what product/brand they need or want to buy. Competing on price is really hard; competing on advice is often easier as a small shop.
What you're selling sets the threshold for those things. Return policies, advice and reputation matter more for embroidered clothing than they do on bulk sales of nuts and bolts.
These two statements are contradicting each other, because "imperfections" in the UI can in fact greatly affect the perceived simplicity of the checkout process - and optimizing that is a large part of what people want to achieve with A/B testing.
And they might have flubbed them - I see people go way too far on that sometimes, changing it from something reasonable that matched the site to something over the top that doesn't fit.
 http://sas.dk (Search for e.g. Copenhagen -> London)
 http://lufthansa.com (Search for e.g. Frankfurt -> Copenhagen)
Still, it has proven a very valuable resource for me when trying to explain a decision I've made in a new website design. They have many free articles that offer some good insights, as well as some more in-depth reports about specific sectors that will cost a few hundred dollars each.
- Saying there are just a handful items left in stock (+2.9% revenue per client)
- Saying other people are watching this product (+2.3%)
- Time limited offer (+1.5%)
I did not see mention of combining these factors. I doubt the gains are cumulative.
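A back-of-the-envelope upper bound: if the three reported mean uplifts were fully independent and multiplicative, they would compound like this (in practice the tactics overlap heavily - all three are pressure cues - so the real combined effect is likely smaller):

```python
# The three reported mean uplifts, treated as independent multiplicative effects.
uplifts = [0.029, 0.023, 0.015]

combined = 1.0
for u in uplifts:
    combined *= 1 + u

print(f"combined uplift under full independence: {combined - 1:+.1%}")
```

Even under the most optimistic independence assumption, the combined effect is only about +6.8%, not a transformative gain.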
My main takeaway is that most optimizations are not worthwhile if you have the opportunity to spend your time/money on something else that brings value to the consumer.
Also, I think #1 and #3 are dick moves and #2 needs some good crafting to not be. I doubt the cost in reputation is worth the increase in revenue.
1. It's not just about items in stock. "Our SomeNonprofit membership program has just 10 places left at the platinum level, where you receive the following benefits..."
2. Social proof is more than just what people are watching. "Fans of SomeNonprofit donate an average of $121 to SomeNonprofit every year." Or, "When we asked SomeNonprofit donors what they liked most, they said it was the way SomeNonprofit does SomeThing. If you like SomeThing, too, then donate to support SomeNonprofit." It's really just about demonstrating that others have done a thing so it's okay for you to do it, too.
3. One nonprofit I'm involved with very successfully does a fundraising campaign for the last 24 hours of every year. It's an artificial time constraint in some ways, but it also capitalizes on being the very last day that tax-deductible donations count toward that year's taxes. It's a true time constraint!
In your case they are not artificial, so I agree it makes sense to remind people of them.
we offer our social proof platform 100% free to all charities:
ping me (email@example.com) for a free-forever account, no strings.
I appreciate this one, as long as it's actually true. Many times I've been on the fence about something, come back the next day, and found out it was gone and wasn't going to be restocked. Knowing would have changed my decision. On the other hand, sometimes I've bought something and the number claimed to be in stock didn't change for months after. THAT'S a dick move.
Same with #3. Time-limited offers make sense when you're trying to attract attention or compete with other sales (e.g. opening day sales, Black Friday, etc.). Some brick-and-mortar stores have time-limited offers constantly, and then it's clearly just manipulative.
Why do you doubt that? Aren't they presented as independent results?
> bring value to the consumer
This is vague, but also a separate goal. Perhaps another way of representing customer value is churn reduction. In any case, it could be appropriate to invest in gaining 3% more revenue per additional customer, then use the gains to invest in more customer value.
> #1 and #3 are dick moves and #2 needs some good crafting
Real scarcity exists, and there's value to communicating it to the customer. For example, saying there are two copies of something available, and the rest are backordered for three weeks, can be pretty useful information.
I assume what was meant is that they're dick moves if done purely for sales optimisation (i.e. If they are lying) and not if the scarcity or time limitations actually naturally exist.
Real scarcity exists, but this paper gives an incentive to create artificial scarcity. I have seen a lot of websites mention low stocks but none mentioning backorders or restocking estimates. Saying "Only 3 items left" has a very different effect than "only 3 items left but 100 more arrive in 2 weeks".
This one is interesting. When doing product design the opposite is important. I expect it depends on what you're selling. Knowing there are 20,000 of an electrical component available vs 20 is a good thing. Availability weighs a lot more than: Quick! Get the last one!
If 1,000 people have looked at an AirBnB listing, it does actually make sense to book faster than a property 3 people were looking at.
For instance, do you count people "watching" this property through suggestions while they are browsing another one? Do you count people who clicked it but went back after checking the address? How long before you do not consider a person "watching" anymore?
Each of these decisions has a different answer depending on whether the goal is to raise revenue or to provide accurate information to consumers.
we rely on these strategies "working" for our customers, in order to exist.
i'll wager that when consumers know a "widget" or live notification is provided by an external source, they trust it more than when it's built in-house.
ping me (firstname.lastname@example.org) if you want to get involved!
the use of 3rd party platforms increases trust, ie restaurants using Yelp vs self-hosted text boxes.