

Our homegrown A/B testing framework - darrennix
http://blog.42floors.com/behind-scenes-ab-split-testing/

======
patio11
Awesome to see more ways of solving this problem. I usually wouldn't reach out
to recommend that one to clients because the maintenance costs of doing
parallel development on three products steadily diverging from each other get
quite considerable indeed, but if it works for your circumstances then awesome
for you.

If hypothetically you wanted to try this with A/Bingo without making in-line
calls everywhere, it's possible, but not documented anywhere. I treat the
knowledge like a spell to summon the Elder Gods: too terrible a magick gained
at too high a price to destroy outright, but it will certainly be buried under
a forgotten temple and guarded by fanatical cultists (and maybe a dragon) to
keep it from destroying the minds of the unwary.

Just kidding: it's actually just "Monkeypatch the Rails internals to change
the bit which finds templates from looking in
/app/views/$RAILS_CALCULATES_THIS_PART to
/app/ab_tests/$YOU_CALCULATE_THIS_PART/$RAILS_CALCULATES_THIS_PART." (Though
sometimes when looking at that code I wish for Cthulhu's sweet embrace.)
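
A minimal sketch of what that kind of patch can look like, assuming Rails's
_prefixes template-lookup hook and a hypothetical ab_variant helper (with the
variant trees living under app/views/ab_tests/ rather than the /app/ab_tests
layout above):

    class ApplicationController < ActionController::Base
      private

      # _prefixes is the hook Rails uses to compute the
      # $RAILS_CALCULATES_THIS_PART portion of template lookup
      # ("products", "application", ...). Prepending a variant-scoped
      # prefix makes Rails try app/views/ab_tests/<variant>/products/
      # first and fall back to app/views/products/.
      def _prefixes
        @_variant_prefixes ||= super.flat_map do |prefix|
          ["ab_tests/#{ab_variant}/#{prefix}", prefix]
        end
      end

      # Hypothetical: however you bucket visitors into variants.
      def ab_variant
        cookies[:ab_variant] || "a"
      end
    end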

I hear you though if you also want to do extensive logical reworking in
addition to reskinning substantially all the pages of the site. The one time I
did this using A/Bingo, in lieu of doing a total app rearchitecture, I made a
utility method to encapsulate the A/Bingo call and then hit the 10ish points
in the code base which were going to be shared but not totally equivalent
between the A and B variants. A bit of a pain in the keister for a Saturday
morning but the test raised sales by 60% so I'll put it in the win column.
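
Roughly along these lines (a sketch built on A/Bingo's ab_test call; the test
and method names here are made up):

    # Hypothetical helper so every shared-but-divergent point in the
    # code base asks the same question. ab_test is A/Bingo's API: it
    # assigns the visitor to an alternative once and keeps returning
    # the same answer thereafter.
    module RedesignTest
      def redesign?
        ab_test("big_redesign", ["control", "redesign"]) == "redesign"
      end
    end

Each of the 10ish shared points then just branches on redesign? instead of
making a fresh in-line A/Bingo call.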

~~~
xionon
Disclaimer: I've not used A/Bingo, so there might be a caveat I'm missing. But
it sounds like you're talking about prepend_view_path, which doesn't require
monkey patching.

    class ApplicationController < ActionController::Base
      before_filter ->() {
        prepend_view_path "app/views/YOU_CALCULATE_THIS_PART"
      }
    end

With that, Rails should look for e.g. "products/index.html.erb" first in
"app/views/YOU_CALCULATE_THIS_PART/products/index.html.erb", and only if it
doesn't find it there does it fall back to "app/views/products/index.html.erb".

It seems like you could do something like this:

    class ApplicationController < ActionController::Base
      before_filter ->() {
        if cookies[:test_a]
          prepend_view_path "app/views/test_a"
        elsif cookies[:test_b]
          prepend_view_path "app/views/test_b"
        end
      }
    end

The neat thing about prepend_view_path is that if it doesn't find a matching
file in the prepended path, it falls back to the default rails paths. This
makes piecemeal upgrades or redesigns much easier.

------
ameister14
I like the idea and it clearly worked for you, but there's a problem.

You're not controlling your variables. You're changing 20 things at once, and
then doing it again to get wildly different tests.

That's cool, but you're not going to get the same insight into WHY people
convert better until you start controlling your tests more.

I get why you're not doing that: you want to change big things drastically and
see what happens. But you're kind of shooting in the dark that way. It might
work; it might not.

Even on the current test you're running, there are multiple different elements
between A and B. You can do that, but to do it well you'd need A/B/C/D/...
variants, so that you can figure out whether it's one change, a combination of
changes, or something else driving the result.
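
In A/Bingo terms, that just means listing one alternative per isolated change,
something like this (hypothetical test and alternative names):

    # Sketch: one alternative per change, so a winning variant points
    # at a specific cause instead of a bundle of twenty changes.
    variant = ab_test("services_page", [
      "control",       # old services, copy, and photos
      "new_copy",      # new copy only
      "new_photos",    # new photos only
      "new_services"   # new service list only
    ])

The price is that each extra alternative splits your traffic further, which
feeds straight into the sample-size concerns raised elsewhere in this thread.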

~~~
gingerlime
For me that's one of the biggest mistakes someone can make when doing A/B
tests. The desire to understand the universe gets in the way of improving your
website / app / business / bank balance. Yet it's a common pitfall for many.

What's the purpose of those tests? To make you a better UX specialist? To
scientifically prove a theory? Or to get the conversion rate up??

~~~
ameister14
The purpose of the test is to figure out what direction to go in, in order to
improve the conversion rate.

If I understand you right, you're questioning why one would control the
variables and try to figure out why something converts better than something
else, rather than just observing the results.

Well, firstly, if you figure out what exactly people like about one design
versus another, it's easier to reproduce a higher conversion rate in other
parts of your site. It also builds a deeper understanding of your customers
and what exactly they want from you. If you have a bunch of services on a page
and you change several of the offerings at once, as well as the content
describing the services and the pictures displayed with them, and you observe
a jump in conversion, what does that tell you?

Do people not want some of the services you offered before? Not necessarily.
Do they like the new photos, or does the copy better explain the services? Or
is it that they like some of the new services listed? Do they like all of
them, or just like some so much it doesn't matter that the others are there?

What if you change your price at the same time you change the design of your
page and see conversion drop? Was it wrong to change your price, or was the
redesign wrong?

Controlling your variables makes it so that you don't have to go back to
answer all these questions. You get a better gradual understanding of your
customer and can more accurately predict performance changes based on future
redesign, making you more likely to increase conversion with each new test.

~~~
gingerlime
That's all well and good in theory. In practice, it's an excruciatingly long
process (unless you're at a massive scale already) and you're only likely to
reach a local maximum. Whereas big sweeping changes, which alter not just a
single feature but the whole _message_ , are more likely to give you a
conversion boost.

Once you find out what works and produces a big win, then you optimize more
gradually and try to figure out what in particular it is. But it's usually not
that important at that stage. Your big changes will reveal the stuff that's
important to your customers.

~~~
ameister14
I agree; shot-in-the-dark big changes can be very useful for giving you a
starting point to optimize from. I wouldn't say they are more likely to give
you a conversion boost, though.

Why would that be the case? Let's say I design my product and page. It's well
thought out and I put some time into it, and the UX isn't bad. Why would
changing that drastically be more likely to give me a boost than small
changes?

Now, the shift might be larger with big changes; that I'd agree with. With
small changes you're talking about small increases adding up to a large
aggregate increase. Then again, if you're established, small changes also
don't put your customer base at risk.

I'd agree that you can go with this as a concept at the start; I just think,
as it appears you do, that after you've established a good base you need to
get more controlled from there.

------
weixiyen
> The reason why we put so much effort into split testing is that we’re trying
> to find the global maximum. We worry that making linear iterations will lead
> us to a local maximum that would be far less than our potential. So, we
> force ourselves to try ideas that are radically different from past
> experiments. At some point we’ll probably run out of crazy ideas and then
> we’ll settle into optimizing the winning UX instead of trashing and
> rewriting it periodically.

This article is really good. Way too many companies focus on A/B testing an
existing design, and have no way to actually test something radically
different easily. It's interesting to see 42floors take A/B testing to this
level.

I believe that in order to branch quickly into these types of iterations, your
software architecture plays a huge role. The idea is to change the minimum
amount of code necessary in each "branch" to get the job done.

There are probably ways of doing this that don't involve splitting traffic by
server while still avoiding nasty code.
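
For instance, a sketch of doing the split at the Rack layer with a sticky
cookie instead of at the server level (my assumptions, not 42floors' actual
setup):

    # Hypothetical Rack middleware: buckets each visitor once via a
    # cookie, so a single app process can serve every variant without
    # server-level traffic splitting.
    class VariantSplitter
      VARIANTS = %w[a b].freeze

      def initialize(app)
        @app = app
      end

      def call(env)
        request = Rack::Request.new(env)
        variant = request.cookies["variant"] || VARIANTS.sample
        env["ab.variant"] = variant # downstream code picks its branch off this

        status, headers, body = @app.call(env)
        response = Rack::Response.new(body, status, headers)
        response.set_cookie("variant", value: variant, path: "/")
        response.finish
      end
    end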

------
jdlshore
@darrennix, I see that your sample sizes are pretty low, and your actual
conversion counts are incredibly low. Are you testing for statistical
significance, and are you accounting for repeated significance testing errors?
[1]

Genuinely curious, not trying to shoot down what looks like an interesting
approach.

[1] [http://www.evanmiller.org/how-not-to-run-an-ab-test.html](http://www.evanmiller.org/how-not-to-run-an-ab-test.html)
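
For reference, the basic check behind that question is a two-proportion z-test
with the sample size fixed up front, rather than peeking repeatedly until
significance appears. A plain-Ruby sketch with made-up numbers:

    # Sketch: two-proportion z-test for comparing conversion rates.
    def z_score(conv_a, n_a, conv_b, n_b)
      p_a = conv_a.to_f / n_a
      p_b = conv_b.to_f / n_b
      pool = (conv_a + conv_b).to_f / (n_a + n_b)
      se = Math.sqrt(pool * (1 - pool) * (1.0 / n_a + 1.0 / n_b))
      (p_b - p_a) / se
    end

    # Made-up counts: 12/1000 vs. 19/1000 conversions.
    puts z_score(12, 1000, 19, 1000).round(2)
    # => 1.27; |z| < 1.96, so not significant at the 95% level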

~~~
RA_Fisher
It's possible to work with these sample sizes using survival analysis, but I
doubt Bayes would find the difference even with a really well-initialized
prior. I don't think traditional significance testing is the answer here
either; those tests require even larger sample sizes (b/c you're not limiting
the space of the estimate). It's all about limiting the domain-space. If you
know conversion rates on your homepage are never over 10%, why not rule that
region out and give your statistical work more power? That's at least how I
think about it.
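
One reading of that "limit the domain-space" idea (my sketch, not RA_Fisher's
actual method): truncate the prior on the conversion rate to [0, 10%] and
compare posteriors on a grid.

    # Sketch: grid posterior for a conversion rate, with the prior
    # truncated to (0, 0.10) -- rates above 10% are ruled out up
    # front, which concentrates the posterior and adds power.
    GRID = (1..999).map { |i| i / 10_000.0 } # 0.0001 .. 0.0999

    def posterior(conversions, visitors)
      log_like = GRID.map do |p|
        conversions * Math.log(p) + (visitors - conversions) * Math.log(1 - p)
      end
      m = log_like.max
      weights = log_like.map { |l| Math.exp(l - m) }
      total = weights.sum
      weights.map { |w| w / total }
    end

    # P(rate B > rate A) for made-up counts: 12/1000 vs. 19/1000.
    post_a = posterior(12, 1000)
    post_b = posterior(19, 1000)
    cdf_a = post_a.each_with_object([]) { |p, acc| acc << (acc.last || 0.0) + p }
    p_b_wins = post_b.each_with_index.sum do |pb, i|
      i.zero? ? 0.0 : pb * cdf_a[i - 1]
    end
    puts p_b_wins.round(3) # posterior probability that B beats A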

~~~
jdlshore
It sounds like your stat-fu is way better than mine. :-) Is there a layman-
friendly article you can point to that explains what you're talking about in
more detail?

------
biesnecker
Great results, but

> "While it makes the eventual git merge a nightmare when the branches
> diverge, a day of wrestling with merge conflicts is a small price to pay for
> unfettered experimentation."

terrifies me. It sounds like the sort of thing that makes a lot of sense until
one day it doesn't and you're left with a massive bill of work to reconcile
two very different codebases, all for the sake of UI experimentation.

~~~
RA_Fisher
Exactly. A big price to pay for avoiding inline code. Although I do hate our
own inline code (even though we remove it diligently once it's no longer used).

------
RA_Fisher
I'm in a similar position, in a sense, but because the turnaround on our
experiments can be super fast. Instead of using a chi-square p-value type
test, or even a Bayesian implementation (like a Beta prior with alpha = 5,
beta = 100), survival analysis gives us the profile of the difference.

I'm in a similar position because we did choose to pollute our code base with
inline experiments (which we shut down and remove). However, ideally I want
these experiments to "run" forever once the winner is installed, because then
I can always compare the winner back to the survival curves of the
experiment's control and test groups to make sure it's behaving as expected.

So my problem is: how do I escape from inline code and yet continue to record
everything afterwards?

Been thinking a lot about it.

------
teleclimber
Very cool.

> Ongoing experiments have a dedicated chart up on a monitor in the lunch area
> so everybody can see how they’re doing.

Is your tracking and reporting (charts) system home-grown too?

~~~
darrennix
We use Chartio for all our long-lived reporting, but we use custom charts for
A/B test reporting since we need to combine data from several sources (e.g.
Twilio, Mixpanel, GA, our local DB) in one chart.

