
Why small tweaks and split testing don't work, and what to do instead - pedrocortes
https://www.cortes.design/post/saas-website-conversion-split-test
======
jermaustin1
I don't buy this article at all.

I am no longer with the company, but around 10 years ago, I turned a failing
consultancy into an SEO SaaS company. We went from -$40k/month to $90k+/month
in revenue after laying off more than half the staff when we lost our last
client and deciding to take our destiny into our own hands by building a suite
of SEO products.

We ran split tests on almost every design change. They weren't always tiny;
some were entire page layouts. But even small things like changing the header
text produced measurable differences.

I built the ability to split test into every facet of our framework. We had
the ability to split test pages, transactional emails, and marketing emails,
and we would tweak everything all the time. If we noticed group A wasn't
performing as well as group B, we would drop A.
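
For anyone curious what the assignment side of a framework like that can look like, here's a minimal sketch (my own illustration, not the commenter's actual code; all names here are hypothetical): deterministically bucket users by hashing a stable user ID together with the experiment name, so a visitor always sees the same variant.

```python
import hashlib

def assign_variant(user_id: str, experiment: str, variants=("A", "B")) -> str:
    """Deterministically assign a user to a split-test variant.

    Hashing experiment + user_id gives a stable, evenly distributed
    bucket: the same visitor always lands in the same group, and each
    experiment gets an independent split.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]

# e.g. routing the onboarding-email subject line test mentioned below
subject = {
    "A": "Almost Done...",
    "B": "Activate your account",
}[assign_variant("user-12345", "onboarding-subject")]
```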

There were some tiny things that made major differences:

\- Email "From" being a person's name and not just the service bumped
onboarding by nearly 20%.

\- The onboarding subject "Almost Done..." performed about 10% better than
"Activate your account".

\- Having the video auto-play converted visitors almost 50% more often than
making them click the play button (it may be evil, but it makes money).

\- Having the video be an animation vs just a spokesperson talking into a
webcam converted 5% better with the same script (+2% better with just reading
the text from the page).

We had a large audience after we launched our first product free of charge for
a limited time (linklicious.co -- formerly linklicious.me -- formerly lts.me),
then used that momentum to build out a new product every month or so for the
next couple of years. It was a great job in a terrible industry. 80% of my job
was inventing new ideas.

~~~
shanghaiaway
\- Take your email list

\- Split it in two equal halves

\- Send each half the same email

\- Open and click rates will differ between the two lists
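
A minimal simulation (mine, purely illustrative) makes this concrete: give 10,000 recipients an identical 20% true open rate, split them into two halves, and compare the raw rates.

```python
import random

random.seed(42)
TRUE_OPEN_RATE = 0.20  # every recipient gets the identical email

opens = [random.random() < TRUE_OPEN_RATE for _ in range(10_000)]
half_a, half_b = opens[:5_000], opens[5_000:]

print(f"Half A open rate: {sum(half_a) / len(half_a):.2%}")
print(f"Half B open rate: {sum(half_b) / len(half_b):.2%}")
# The two raw rates almost never match exactly -- pure sampling noise.
```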

~~~
teisman
But if you apply statistics, you will see that the open rates only differ
significantly 5% of the time if you use a 95% confidence interval.

~~~
mobjack
It will only differ 5% of the time if you have an adequate sample size and
only check for significance once the sample size is reached.

If you end the test the moment the data reaches 95% significance, it will show
a difference about 50% of the time for the same email. Many people make this
mistake.

A 95% confidence interval doesn't mean much if you don't follow good
statistical practices.
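
Here's a quick A/A simulation of exactly that mistake (my own sketch, not the parent's data): both groups convert at the same true rate, and we compare stopping at the first "significant" peek against testing once at the planned sample size.

```python
import math
import random

def z_test_p_value(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test; returns the two-sided p-value."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = abs(conv_a / n_a - conv_b / n_b) / se
    return 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))

def run_aa_test(rate=0.10, batches=50, batch_size=100, peek=True):
    """Simulate an A/A test. With peek=True, stop at the first p < .05."""
    conv_a = conv_b = n = 0
    for _ in range(batches):
        n += batch_size
        conv_a += sum(random.random() < rate for _ in range(batch_size))
        conv_b += sum(random.random() < rate for _ in range(batch_size))
        if peek and z_test_p_value(conv_a, n, conv_b, n) < 0.05:
            return True  # "significant" -- a false positive by construction
    # Without peeking: test once, at the planned sample size
    return z_test_p_value(conv_a, n, conv_b, n) < 0.05

random.seed(0)
trials = 500
for peek in (True, False):
    fp = sum(run_aa_test(peek=peek) for _ in range(trials)) / trials
    print(f"peek={peek}: false positive rate ~ {fp:.0%}")
```

The peeking version flags a "winner" several times more often than the nominal 5%, which is exactly the mistake described above.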

~~~
closed
I don't understand why you would need a sample of a certain size. Setting a
significance threshold at 5% takes sample size into account. For example, if
I ran a permutation test with a sample size of 5 in each group, it might never
be significant at that threshold, and "never" is < 5%!

A small sample size would lower your power to detect meaningful differences,
which the original scenario doesn't have (by definition).

(If distributional assumptions, etc, are violated, then that's a different
story!)

~~~
jonathankoren
You need to make sure you have enough samples in order to know if you rejected
the null hypothesis by chance. Stopping your test early is a form of
p-hacking. See:

[https://heapanalytics.com/blog/data-stories/dont-stop-your-ab-tests-part-way-through](https://heapanalytics.com/blog/data-stories/dont-stop-your-ab-tests-part-way-through)

~~~
closed
Peeking at your data and calculating the sample size you need for a test are
separate statistical issues. I agree that peeking messes up significance
levels :).

The point I was trying to make was that you can decide to run a test with a
very small sample (e.g. n = 5), and it will still have the Type 1 error rate
you set if you chose a significance level of .05.

> You need to make sure you have enough samples in order to know if you
> rejected the null hypothesis by chance.

You do this when you decide the significance level (e.g. .05). The value
needed to reject, given a significance level, is a function of sample size.

The definition of Type 1 error on wikipedia has a good explanation of this:

[https://en.wikipedia.org/wiki/Type_I_and_type_II_errors](https://en.wikipedia.org/wiki/Type_I_and_type_II_errors)
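
A tiny simulation of that point (my own illustration): run many A/A comparisons with only n = 5 per group through an exact permutation test at α = .05. The rejection rate stays at or below the nominal 5%, because the critical value already accounts for the sample size; what a tiny n costs you is power, not Type 1 error control.

```python
import itertools
import random

def permutation_p_value(a, b):
    """Exact two-sided permutation test on the difference in means."""
    pooled = a + b
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    hits = total = 0
    for idx in itertools.combinations(range(len(pooled)), len(a)):
        chosen = set(idx)
        ga = [pooled[i] for i in chosen]
        gb = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        if abs(sum(ga) / len(ga) - sum(gb) / len(gb)) >= observed - 1e-12:
            hits += 1
        total += 1
    return hits / total

random.seed(1)
trials, rejections = 1000, 0
for _ in range(trials):
    # A/A comparison: both tiny groups come from the same distribution
    a = [random.gauss(0, 1) for _ in range(5)]
    b = [random.gauss(0, 1) for _ in range(5)]
    if permutation_p_value(a, b) < 0.05:
        rejections += 1

print(f"Type 1 error rate with n = 5 per group: {rejections / trials:.1%}")
```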

------
hn_throwaway_99
Sorry to be so harsh, but this article is mostly BS. Booking.com built their
entire corporate culture around A/B testing, and they (i.e. parent Priceline)
were one of the top performing stocks of the 00s.

If anything, the biggest issue I've seen with A/B testing is that it biases
organizations toward things that are easy to measure (i.e. shopping
conversion), sometimes at the expense of things that are longer term/harder to
measure (like brand reputation). I'd be most interested to hear how some
companies have dealt with that shortcoming.

But all that said, one of the biggest benefits of a culture around A/B testing
is that it gets companies away from the endless back-and-forth around opinions
(like this article) and builds a culture of "OK, we'll try your suggestion and
see if it succeeds or fails".

~~~
hammock
Hi, I highly doubt booking.com's success came from CTA button colors - or any
testing on the consumer side of the platform, for that matter. Their success
is rooted in how they are able to take advantage of hotels and their
distribution power.

~~~
hn_throwaway_99
This is absolutely false, and anyone who works at booking.com will tell you
this. [https://taplytics.com/blog/how-booking-comss-tests-like-nobodys-business/](https://taplytics.com/blog/how-booking-comss-tests-like-nobodys-business/)

~~~
hammock
Just because they test a lot doesn't mean testing is critical to their
success. That's a cargo cult fallacy.

Your article points to 2-3x industry average conversion rates (for existing
traffic), but says nothing about the more important factors of acquisition or
inventory.

~~~
hn_throwaway_99
Except that Booking actually has data to back this all up. It's very common to
do "holdbacks," where you give an old version of your website to a very small
portion of your users, then compare its conversion rate to a version of your
site that has all the latest A/B test winners. Booking also tracks religiously
how conversion rates change over time.

The whole point being that the way Booking (and other) companies act is the
exact opposite of a cargo cult. Everything needs to be backed up with data and
challenging conclusions is part of the culture.

All that said, a common complaint about Booking is that it has a ton of "dark
UI" patterns, so it will be interesting to see whether there is any long-term
blowback against short-term A/B winners that erode goodwill toward the brand
over time.

------
madrox
Every few years, I read an article critical of testing. It's always written by
a designer. It's always the same arguments, and the "solutions" are always
qualitative. They rely a lot on rationalizing after the fact. For example,
"this redesign failed because you didn't build _your website around the EXACT
words your customers might describe their problems and their solution_ "
(italics a quote from this article). Saying these things doesn't actually help
you arrive at a better process.

At this point, all these articles get me thinking about is why they keep
getting written. Were these designers abused by bad product managers? Are they
ego-driven and don't like having their creativity reduced to quantitative
values? I don't know, but I do know that designers who talk this way tend to
be toxic to a productive culture. I've experienced that firsthand.

~~~
nostrademons
Nah, it's because they're consultants, which means they need to constantly
drum up business, which means that they need potential customers to believe
they have unique knowledge that'll help improve their business and all the
alternatives are shit. It's a sales pitch. And because they're consultants,
they don't need 80% or even 10% of website viewers to convert: they just need
a handful of customers who will each pay them tens of thousands of dollars for
services.

Controversy is great advertising: they get all the folks who hate them to help
spread the word about their services.

It's the same thing with software methodologists, gurus, and architects.
Roughly daily there's a new blog post about how [common practice] is now
considered harmful, and you need [proprietary expertise held by consultant] to
implement some other replacement instead. These posts may or may not be
helpful to your software engineering efforts, but they are certainly helpful
to the poster's bottom line.

------
aresant
The #1 most straightforward piece of CRO advice, which works basically every
time, remains: fix your website's speed (especially mobile)!!!

In a previous life I ran a Conversion Rate Optimization as a Service business
and optimized $100s of millions of transactions for customers across
categories.

The amount of research on this topic is staggering, and a lot of it dates back
years, like Walmart's seminal 2012 review (1) showing the devastating impact
of speed on conversion rate.

In case that's not enough for you Google set up not one, but TWO separate page
speed tools to help webmasters fix this chronic problem before they just threw
in the towel and took over the job for you with their AMP effort. (2)

Oh, and by the way, did you know that if your site is slow, Google will
penalize your SEO? And drop your Quality Score, resulting in higher PPC
charges?

The best technical guide on the internet for addressing this problem is here -
[http://httpfast.com/](http://httpfast.com/) - and a comprehensive monitoring
service built by the same author is here -
[https://www.machmetrics.com/](https://www.machmetrics.com/).

(1) [http://www.webperformancetoday.com/2012/02/28/4-awesome-slides-showing-how-page-speed-correlates-to-business-metrics-at-walmart-com/](http://www.webperformancetoday.com/2012/02/28/4-awesome-slides-showing-how-page-speed-correlates-to-business-metrics-at-walmart-com/)

(2)
[https://testmysite.thinkwithgoogle.com/](https://testmysite.thinkwithgoogle.com/)
and
[https://developers.google.com/speed/pagespeed/insights/](https://developers.google.com/speed/pagespeed/insights/)
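
If you want a zero-dependency starting point before reaching for those tools, here's a rough sketch (my own, purely illustrative) that times time-to-first-byte and total download time for a URL:

```python
import time
import urllib.request

def time_page(url: str) -> None:
    """Crude speed check: time-to-first-byte and full download time."""
    start = time.perf_counter()
    with urllib.request.urlopen(url) as resp:
        ttfb = time.perf_counter() - start   # headers received
        body = resp.read()
    total = time.perf_counter() - start
    print(f"{url}: TTFB {ttfb * 1000:.0f} ms, "
          f"full body ({len(body) / 1024:.0f} KiB) in {total * 1000:.0f} ms")

time_page("https://example.com/")
```

Note this only captures network time, not rendering; the Google tools above measure what users actually experience.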

~~~
stupidbird
Fun story: I pushed back on the implementation of an A/B testing platform and
instead said we should focus on improving performance and my Marketing VP
literally walked out of the meeting in anger. Infrastructure work is never
sexy, but it's so important to everything — even sales performance.

A/B testing small changes is something that really big sites like Amazon can
do because they have enough volume to justify it. It's kind of like blood
doping in sports. If you're already at the top of your game it will make
enough of a difference to be significant, but if you're just some average
person who can't run a mile... blood doping is the last thing you should do.

~~~
mikekchar
Just yesterday I was watching a video about a triathlon bike which uses a
frame setup that's illegal in normal bike racing. The question was, was it a
faster bike? The conclusion was that the bike was quite heavy and not very
stiff, so not really a great bike for normal bike racing. However, as a _time
trial bike_ where drafting is not allowed (the normal situation in
triathlons), this bike could shave off 40 seconds over 40km. Which is a pretty
huge amount, the presenter said with a grin.

He said it with a grin because it _is_ a huge amount in the context of a
competitive time trial -- like the difference between first place and 10th or
20th place perhaps. But as a percentage difference it's roughly a 1.4%
increase in speed (assuming you can maintain 50 km/h). It's practically
nothing in real terms.

In competitive cycling, though, the margins are _super_ small. You might win
the Tour de France by 2 minutes, which seems like a pretty big lead, until you
realise that's 2 minutes in 80 or 90 _hours_ of cycling. This is why Team
Sky's approach of "marginal gains" is so successful -- the difference between
first and second place is something like 0.04% performance.
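
Both figures check out (a quick verification of the presenter's numbers):

```python
# Time trial: 40 km at 50 km/h takes 40/50 h = 2880 s; saving 40 s of that:
print(f"{40 / (40 / 50 * 3600):.1%}")   # ~1.4%

# Tour de France: a 2-minute winning margin over roughly 85 hours of racing:
print(f"{120 / (85 * 3600):.2%}")       # ~0.04%
```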

We've got this idea that we need to optimise performance (in terms of SEO,
etc, etc), but I've never seen anybody quantify the margin of "victory"
required. How much better do you need to get to push you over the edge?
Because that's what's going to dictate what strategy you need to pursue.

------
anotheryou
So basically: be bold enough to break out of tiny local maxima instead of
wasting your time on them. Further: use reason to make leaps in the right
direction.

------
gofor
I'm confused. The author starts by stating:

"When you focus too much on increasing the conversion of a certain page you'll
either not get any positive results or just push the problem to the rest of
the funnel, none of which, will increase your revenue btw..."

And then they launch into their "How to redesign a SaaS website in 3 weeks"
solution with:

"#1 - Focus on the money pages. Why would you focus on redesigning pages that
barely no one visits or that is not related to a conversion goal?!

You need to focus on the pages that are part of the buyer's journey from
landing on your website the first time to completing a goal..."

Am I reading this wrong or is the author contradicting himself?

------
jimmy1
What if you have done 1, 2 and 3, and now small tweaks are all you have left?

At what point do you consider an experience "optimized" and say we just can't
squeeze anything else out of this?

~~~
towelr34dy
IMO, never.

Rule #1: Perfection is unattainable

Rule #2: The universe is not constant: Screen sizes change, browsers, screen
types, technology, new competitors come up changing people's expectations,
people's tastes change, you expand into new demographics/demos/sales channels
which react differently than existing demographics/sales channels, etc.

Ideally this work is done by a combined marketing/programming/operations team,
so all aspects are considered, along with their trade-offs, building a
collective understanding of how to solve user pain points.

~~~
jimmy1
I am not talking about optimized in terms of perfection; I am talking about
optimized in terms of revenue generation.

At a certain point your offering is what it is, people have found the maximum
value in it, and making it easier to use or testing different ways of
delivering that functionality is just adding lipstick.

At what point does it make sense to pursue other revenue models or strategies
rather than squeezing out another 1-3%?

I am bringing this up for a reason -- my company is going through this very
thing.

------
andyidsinga
The section on customer acquisition cost really resonates with me.

The corollary, I suppose, is that once you have some customer traction,
looking at where those costs come from is where you want to spend your
optimization time. However, this raises some questions:

If I'm spending $500 for a $5000 customer - should I change anything at all to
drive down that $500?

...which leads to: At what CAC:Customer Value ratio should I start trying to
drive down that cost relative to other costs in the biz?

Anyone have any real experiences from their biz they can share on this
subject?

~~~
pedrocortes
Thanks, glad you liked it. This is something I've noticed people need to come
back to when they focus too much on conversions.

To answer your question: that totally depends, because in the end what you'll
need is cash flow to maintain that reinvestment process, and that's something
(finances/management) I'm not an expert in :P

~~~
andyidsinga
Thanks for chiming in here. I've been reading some of your other articles. I
like the very practical style and results focus.

