

Confidence.js – make sense of your A/B test results - bvanvugt
https://github.com/sendwithus/confidence

======
dsl
This is awesome and solves a huge problem.

Every startup constantly gets told to A/B test everything, but nobody ever
tells them that startups rarely have enough initial traffic to do any
meaningful testing. As a result, you end up spending a lot of effort on what
is effectively a random landing page or email template that might actually be
worse than what you started with.

~~~
kylerush
Low traffic levels do not mean that you cannot do any meaningful testing. They
mean that you cannot reliably detect subtle effects on the conversion rate.
Whereas Google may be able to reliably detect a 0.005% effect on its
conversion rate, a small startup may only be able to reliably detect a 60%
effect. A small startup arguably should not be chasing small effects on the
conversion rate anyway.

Evan Miller has a great sample size calculator to illustrate this:
[http://www.evanmiller.org/ab-testing/sample-size.html](http://www.evanmiller.org/ab-testing/sample-size.html)

In the example above, Google's minimum detectable effect (MDE) is 0.005% and
the small startup's MDE is 60%. If the small startup's baseline conversion
rate is 5%, then with 95% confidence and 80% statistical power, they can
reliably detect a 60% effect (positive or negative) with 1,968 visits.
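
For anyone who wants to see the arithmetic, here is a minimal sketch of the
standard normal-approximation sample-size formula for two proportions that
calculators like Evan Miller's are built on. The function name is
illustrative, not part of Confidence.js, and exact visit counts vary slightly
depending on which variant of the formula a given calculator uses:

```javascript
// Per-variant sample size for a two-proportion test.
// zAlpha and zBeta are the standard normal quantiles for
// 95% two-sided confidence and 80% power respectively.
function sampleSizePerVariant(baselineRate, relativeMde) {
  var zAlpha = 1.96;
  var zBeta = 0.84;
  var p1 = baselineRate;                      // e.g. 5% baseline
  var p2 = baselineRate * (1 + relativeMde);  // e.g. 5% lifted by 60% -> 8%
  var delta = Math.abs(p2 - p1);
  var variance = p1 * (1 - p1) + p2 * (1 - p2);
  return Math.ceil(Math.pow(zAlpha + zBeta, 2) * variance / (delta * delta));
}

// 5% baseline, 60% relative MDE -> roughly 1,000 visits per variant,
// i.e. about 2,000 visits across both arms of the test.
console.log(sampleSizePerVariant(0.05, 0.60));
```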

It's not just the 60% increase that you should be mindful of. I bet all small
startups want to know if a change they made to their homepage decreases the
conversion rate by 60% or more. With an A/B test, they can learn that.

Another point to keep in mind is that you can learn a lot from a statistical
tie (not enough data to conclude there is a difference). In fact, no matter
what your traffic levels are, most of your experiments will be statistical
ties. Learning what doesn't work is just as important as learning what does,
and it can really help you with prioritization. In the example above, the
small startup can use a statistical tie to conclude that some tasks on their
product roadmap will move the conversion rate by less than 60%, and can
prioritize them accordingly.
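
To make "statistical tie" concrete, here is a sketch of a two-proportion
z-test (a hypothetical helper, not Confidence.js's actual API) that reports a
tie whenever |z| is below the 95% critical value:

```javascript
// Compare two variants by conversion count and visit count.
// A "tie" just means the data can't distinguish them yet.
function compareVariants(convA, visitsA, convB, visitsB) {
  var pA = convA / visitsA;
  var pB = convB / visitsB;
  var pooled = (convA + convB) / (visitsA + visitsB);
  var se = Math.sqrt(pooled * (1 - pooled) * (1 / visitsA + 1 / visitsB));
  var z = (pB - pA) / se;
  var significant = Math.abs(z) >= 1.96; // two-sided, 95% confidence
  return {
    z: z,
    significant: significant,
    winner: significant ? (z > 0 ? 'B' : 'A') : null
  };
}

// 5.0% vs 5.6% on 1,000 visits each is a tie -- but a tie at this sample
// size already makes anything close to a 60% effect very unlikely.
console.log(compareVariants(50, 1000, 56, 1000));
```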

Ultimately it will be quite difficult for the small startup to produce
variations that have at least a 60% effect on the conversion rate, but it's
not unheard of, and again, you can learn from a statistical tie.

Just because you have low levels of traffic doesn't mean you can't learn
anything from a/b testing.

~~~
darkxanthos
Yup. When you have a small amount of traffic, this logic really just points
out that your primary goal should be either getting more traffic or designing
only tests that should have a large effect.

Some people reply to this by saying, "Well then we'll just ship it and measure
results without testing." That is a test... but it's a poorly designed one. If
you can't see the improvement in a controlled experiment, it's rare that you'd
be able to see a noticeable improvement in a before/after comparison.
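
A quick, hand-wavy simulation of why (all numbers here are invented for
illustration): in a before/after comparison, the ambient conversion rate can
drift by more than the effect you shipped, while a concurrent split exposes
both versions to the same conditions:

```javascript
// Draw conversions for a batch of visits at a given true rate.
function simulateVisits(visits, rate) {
  var conversions = 0;
  for (var i = 0; i < visits; i++) {
    if (Math.random() < rate) conversions++;
  }
  return conversions;
}

// Before/after: the "before" week happens to be a strong week (6% ambient
// rate); the "after" week is weak (5%), even though the new page adds a
// genuine 20% lift. The lift is drowned out by the drift.
var before = simulateVisits(700, 0.06);
var after = simulateVisits(700, 0.05 * 1.2);
console.log('before/after:', before, after);

// Concurrent split: both arms see the same week, so the 20% lift is the
// only systematic difference between them.
var control = simulateVisits(350, 0.05);
var variant = simulateVisits(350, 0.05 * 1.2);
console.log('a/b split:', control, variant);
```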

------
jessicaraygun
Hey! I'm the author and I'd be happy to answer any questions you might have.

Confidence.js is based on the A/B testing code that we use at sendwithus.

A/B testing math is tricky, and we've worked really hard on making this great.
Let us know what you think!

~~~
teleclimber
Thank you for sharing this.

Can you recommend a resource that explains the math for people who slept
through statistics class?

I actually need to brush up on this topic for a new project so anything you
can recommend would be greatly appreciated.

Thanks!

~~~
jessicaraygun
Stats class was sleepy, for sure... Real-world applications like A/B testing
are definitely more interesting.

We did a lot of digging and there aren't a lot of resources that are easy to
digest - but this one really stood out to us:

[http://visualwebsiteoptimizer.com/split-testing-blog/what-you-really-need-to-know-about-mathematics-of-ab-split-testing/](http://visualwebsiteoptimizer.com/split-testing-blog/what-you-really-need-to-know-about-mathematics-of-ab-split-testing/)

I'd love to write a summary of everything I've learned - maybe a blog post in
the near future!

------
peter_mcrae
Awesome work! Will look at replacing some hand-rolled stuff internally. One
gap I see with a lot of A/B testing analysis is that it only solves for
conversion. While conversion is a great metric for many tests, in my
experience, revenue is often the metric that matters most. Whether it's
traditional eCommerce or selling tiered subscriptions, a lot of testing is
geared towards 1) getting the customer to buy and 2) getting them into a more
expensive product or plan. In the subscription scenario, some sort of customer
lifetime value model is even better. I don't pretend to know all the math, but
the calcs I've seen focused on revenue (AOV * conversion) need order-level
data (as opposed to aggregates), so it's not as easy to solve generically.
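
To illustrate why order-level data matters (a rough sketch under assumed
inputs, not Confidence.js functionality): comparing revenue per visitor needs
individual order values, because the variance of those values drives the test,
whereas a conversion test only needs counts. A large-sample Welch-style
comparison might look like this:

```javascript
// Sample mean and (unbiased) variance of an array of numbers.
function meanAndVar(xs) {
  var n = xs.length;
  var mean = xs.reduce(function (a, b) { return a + b; }, 0) / n;
  var ss = xs.reduce(function (a, x) { return a + (x - mean) * (x - mean); }, 0);
  return { n: n, mean: mean, variance: ss / (n - 1) };
}

// t-like statistic for revenue per visitor; for large samples, compare
// against ~1.96 for 95% confidence.
function revenuePerVisitorStat(revenueA, revenueB) {
  var a = meanAndVar(revenueA);
  var b = meanAndVar(revenueB);
  var se = Math.sqrt(a.variance / a.n + b.variance / b.n);
  return (b.mean - a.mean) / se;
}

// One entry per visitor: 0 if they didn't buy, otherwise the order value.
// Aggregate conversion counts alone can't reconstruct these amounts.
var variantA = [0, 0, 49, 0, 0, 0, 99, 0];
var variantB = [0, 199, 0, 0, 0, 0, 0, 49];
console.log(revenuePerVisitorStat(variantA, variantB));
```

(With samples this small the normal approximation isn't valid; the arrays are
just to show the shape of the data you'd need.)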

~~~
jessicaraygun
You're right about revenue being a very important metric, and I think
optimizing for conversions has its place too. I would be interested to hear
the results of your repurposing!

------
cleverjake
Was expecting this to actually be
[https://github.com/spumko/confidence](https://github.com/spumko/confidence),
from WM Labs. Strange that both would choose the same name for A/B testing.

