
How Coderwall grew over 50% with an A/B test - ryanb
http://www.rankscience.com/coderwall-seo-split-test
======
birken
This is a good basic A/B test, and while it is certainly better than nothing
and something a lot of websites should try, there are some important caveats
that aren't mentioned in the article:

1\. How were the experiment groups chosen? You have to be really careful with
SEO tests because some pages might get 100x the traffic of another page, so
saying you split up 20,000 pages randomly isn't enough for a test like this.
It is only meaningful if you split up 20,000 pages that had similar traffic
profiles and were getting enough aggregate traffic to be able to notice an
increase.

2\. SEO tests take a long time, so if you are running one, I'd recommend more
than 2 variations. You need to put the test up, wait for Google to index it, then
wait a few weeks to see how the traffic changes. So since your turn-around
time is at least a few weeks and maybe longer, try 4 or 8 variations if your
traffic can support it.

3\. I prefer my A/B tests to be a little more crazy, or at least try a crazy
variation among more normal ones. In my experience the biggest gainers (and
also the biggest losers) are ideas that seem crazy. Getting your feet wet with
adding "(Example)" to the title is fine, but also try a crazy variation that
is completely different and see how it does. Give yourself a chance to be
surprised by your audience so you can learn more about them. And if the crazy
variation loses by a lot, you have learned something important even if it
isn't a winner.

4\. The facts are a little dubious here. Traffic is up 50% over a timespan...
there is no control. Traffic generally goes up for growing websites even if
you don't do anything. You should also show your work on the 14.8% increase
and give us some error bars. You say it is significant; how did you calculate
this? It does seem like the test was better, but it is also important to make
valid claims about it. Intellectual honesty when running A/B tests is _really
really_ important [1].

1: [http://danbirken.com/ab/testing/2014/04/08/pitfalls-of-bias-...](http://danbirken.com/ab/testing/2014/04/08/pitfalls-of-bias-in-ab-testing.html)

~~~
llambda
> You have to be really careful with SEO tests because some pages might get
> 100x the traffic of another page, so saying you split up 20,000 pages
> randomly isn't enough for a test like this.

Is this actually true? Shouldn't the fact you're choosing your pages from a
very large set of pages negate this concern? Yes, you might have outliers, but
your method of deriving significance should account for this.

Given that we don't know the specifics of the tests (this is a case study
after all and we should expect details to be elided), I'm not sure it's fair
to opine as you are on their quality. I would guess a company focused on this
specific problem is both aware of and actively applying all the best practices
you've mentioned.

~~~
birken
I can't speak for every website, but it is my experience that traffic to
diverse sets of landing pages follows much more of a power law distribution
than anything close to constant. This was true at Thumbtack (where we did
these types of tests regularly), and this is true of basically all of the SEO
sites I run.

For example, one of my sites (champsorchumps.us) has a landing page for every
pro sports team. Here is a graph of the traffic to all of those landing pages
in the past month I just pulled from GA:
[http://imgur.com/cB2igLc](http://imgur.com/cB2igLc)

Note: The content on each landing page is basically the same (of course with
different data for different sports teams), but as you can see the traffic to
each page is vastly different.

The top page gets nearly 25% of the traffic. Most of the pages get little or
no traffic. If you consider the top page, traffic on that particular page goes
up and down randomly all the time. If I have that page in a bucket with other
pages for an SEO title test, and its traffic happens to randomly go up by 50%,
the variance from that page alone might be equal to a naive "significant
result" for the whole test.

You can still run a title A/B test (something I do on basically every site I
run), but you have to _be thoughtful_ about it. You have to consider the
buckets carefully and consider what gains would be significant before you run
it.
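
For illustration, here is a rough sketch in Python of the kind of bucketing I
mean (the page names and traffic numbers below are made up): rank pages by
recent traffic, then randomly assign one page of each adjacent pair to test
and the other to control, so both buckets end up with a similar traffic profile.

    import random

    # Hypothetical monthly sessions per landing page, pulled from analytics.
    page_sessions = {
        "/teams/yankees": 4200,
        "/teams/lakers": 900,
        "/teams/jets": 310,
        "/teams/royals": 45,
    }

    def stratified_split(sessions, seed=42):
        """Rank pages by traffic, then randomly assign one page of each
        adjacent pair to test and the other to control, so both buckets
        get a similar mix of high- and low-traffic pages."""
        rng = random.Random(seed)
        ranked = sorted(sessions, key=sessions.get, reverse=True)
        test, control = [], []
        for i in range(0, len(ranked) - 1, 2):  # an odd page out is left untested
            pair = [ranked[i], ranked[i + 1]]
            rng.shuffle(pair)
            test.append(pair[0])
            control.append(pair[1])
        return test, control

    test_pages, control_pages = stratified_split(page_sessions)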

I'm not suggesting the authors of the post didn't think about it. Maybe they
did. However I've talked to a lot of people about A/B testing and most of them
don't. The problem is that all of these A/B testing posts always _yada yada_
over the important parts of running A/B tests, proper setup and impartial
analysis, and shoot straight to whatever variations they used and talk about
the huge gains. So when people read them they think all they need to do is
come up with some fun variations and boom, they are going to get huge gains.
It just isn't true. If you skip over the important parts of creating a test,
you can just as easily move backwards as forwards.

~~~
blahi
They use counterfactuals, so they are comparing apples to apples. There are
other ways to do it. One of the easiest, yet most effective, ways is matching.
You can match your observations with various statistical models (k-means would
do a very good job here) and then run a regression. Compare parameters and you
are done. It's not rocket science.
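
For what it's worth, here is a loose sketch of the matching idea with
simulated page data (just an illustration of the approach, not what
RankScience actually did): cluster pages on their pre-test traffic, then
regress post-test sessions on the treatment flag plus cluster dummies.

    import numpy as np
    from sklearn.cluster import KMeans
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    n = 2000
    baseline = rng.lognormal(mean=3, sigma=1.5, size=n)  # pre-test sessions per page
    treated = rng.integers(0, 2, size=n)                 # 1 = new title, 0 = old title
    after = baseline * 1.05 + 0.10 * baseline * treated + rng.normal(0, 2, size=n)

    # "Match" pages with similar baseline traffic by clustering on it.
    clusters = KMeans(n_clusters=20, n_init=10, random_state=0).fit_predict(
        baseline.reshape(-1, 1))

    # Regress post-test sessions on the treatment flag plus cluster dummies,
    # then compare parameters: the first coefficient is the treatment effect.
    X = np.column_stack([treated, np.eye(20)[clusters]])
    fit = sm.OLS(after, X).fit()
    print("treatment effect:", fit.params[0], "p-value:", fit.pvalues[0])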

------
wdewind
> The number of users clicking on Coderwall from Google increased 14.8% (yes,
> this was statistically significant).

Would be very curious how they calculated significance here.

~~~
samscully
You can calculate the significance using the difference-in-differences method
popular in econometrics. I derived this result for a previous employer and it
worked well. I can look up my notes and explain the method in more detail if
anyone is interested.

~~~
wdewind
Yes please

~~~
samscully
Found my notes on the difference-in-differences method. First, model the
sessions originating on page [i] at time [t] as:

    S[it] = b0 + b1 * X[it] + b2 * A[i] + b3 * D[it] + e

Where:

    S[it]: number of sessions originating on page [i] at time [t]
    b0: base traffic for all pages
    A[i]: base traffic for page [i]
    X[it]: seasonal traffic for page [i] at time [t]
    D[it]: dummy variable, 1 for after treatment 0 for control and before treatment
    b3: effect size of treatment
    e: noise/error term

Taking the average over time:

    E[S[i]] = b0 + b1 * E[X[i]] + b2 * A[i] + b3 * E[D[i]] + e

Subtract this average from S[it] (the first "differences"):

    S[it] - E[S[i]] = (b0 - b0) + b1(X[it] - E[X[i]]) + b2(A[i] - A[i]) + b3(D[it] - E[D[i]]) + e

which simplifies to:

    S[it] - E[S[i]] = b1(X[it] - E[X[i]]) + b3(D[it] - E[D[i]]) + e

Split all pages [i] into control and treatment groups 0 and 1. Take the
"difference in differences" of the two groups:

    (S[1t] - E[S[1]]) - (S[0t] - E[S[0]]) = b1(X[1t] - E[X[1]] - X[0t] + E[X[0]]) + b3(D[1t] - E[D[1]] - D[0t] + E[D[0]]) + e

Given D[0t] = E[D[0]] = 0, and for a large number of pages X[it] - E[X[i]] =
X[jt] - E[X[j]]:

    (S[1t] - E[S[1]]) - (S[0t] - E[S[0]]) = b3(D[1t] - E[D[1]]) + e

Which gives a formula in the form of y = mx from which we can calculate the
effect size b3 and the p value using linear regression.

The key assumption is that X[it] - E[X[i]] = X[jt] - E[X[j]] for large numbers
of pages. I found this assumption does hold in practice. I'm not a
statistician so forgive me if I've made any statistical errors but I think
this analysis is correct.

I might write up a blog post at some point to explain it in more detail.
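
In the meantime, here is a minimal sketch of that last regression in Python;
the session counts below are simulated, not data from any real test:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    days = 28
    D = np.array([0] * 14 + [1] * 14)            # D[1t]: 0 before the title change, 1 after

    # Simulated daily sessions, summed over each group of pages.
    S0 = 500 + rng.normal(0, 20, days)           # control group, S[0t]
    S1 = 480 + 60 * D + rng.normal(0, 20, days)  # treatment group, S[1t], true effect = 60

    # Subtract each series' own time-average (the first differences),
    # then take the difference between the two groups.
    y = (S1 - S1.mean()) - (S0 - S0.mean())
    x = D - D.mean()                             # D[1t] - E[D[1]]

    # y = b3 * x + e; linregress also fits an intercept, which comes out
    # as essentially zero here because both series are centered.
    fit = stats.linregress(x, y)
    print("effect size b3:", fit.slope, "p-value:", fit.pvalue)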

~~~
wdewind
Thank you!

------
JabavuAdams
I don't understand the "delta between test and control" figure. Can anyone
explain?

~~~
jdpigeon
I'm confused as well. Pretty poor figure design without axis labels.

------
samfisher83
But why did people click more often when (Example) was put in there?

~~~
ryanb
Good question -- we think it's because people searching for programming-related
help are lost and want clear, simple answers to their problems.

Which one of these results would you click on? (query: mysql split string)

[http://i.imgur.com/CWxs0dK.png](http://i.imgur.com/CWxs0dK.png)

------
ssharp
I don't really understand the methodology on this test. How are you getting a
fair control/variant split in Google SERPs?

~~~
zck
They took 20,000 pages, split them into test and control groups, and changed
the title of questions in the test group from "$QUESTION - Coderwall" to
"$QUESTION (Example) - Coderwall". They then measured how many more clicks the
test group got compared to the control group.
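
If it helps, here's a trivial sketch of that title change (the helper function
below is hypothetical, not Coderwall's actual code):

    def page_title(question, in_test_group):
        """Hypothetical title builder for the experiment described above."""
        if in_test_group:
            return f"{question} (Example) - Coderwall"
        return f"{question} - Coderwall"

    print(page_title("MySQL split string", in_test_group=True))
    # -> MySQL split string (Example) - Coderwall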

------
ourmandave
I love a good click-bait title.

What exactly grew, and 50% of what? Are you getting 15 hits per month, up
from 10?

------
voycey
Or they realised that coders love to copy and paste ;)

------
eonw
Not a very useful or informative article. Correlation is not causation, and
many things could make you move up the SERPs in addition to a title change.

~~~
ryanb
There wasn't a sudden increase in rankings - this test showed CTR from Google
increased dramatically with the change. When it was rolled out to the entire
site, clicks went up, and over many weeks impressions/rankings followed.

~~~
eonw
But the algorithm also changes, so that could have also been part of it (or
your competition made changes). If you don't control all the variables, it's
flawed, IMO. Downvote all you want. I've seen a 1000%+ increase without any
changes, so by that math, could I say that not changing anything at all can
increase your CTR by that percentage?

