
A/B Testing With Limited Data - btilly
http://elem.com/~btilly/ab-testing-multiple-looks/part2-limited-data.html
======
btilly
This is part of a planned series. See
<http://news.ycombinator.com/item?id=5112670> for the initial plan. This is
probably the most important in the series, since it addresses a very practical
and common problem. That plan likely will change - I've received a few random
additional ideas, and I really think that at some point I should address the
question of how much I've been giving up with the simplification of only
looking at conversions. (Short version: depends on conversion rates. If
absolute conversion rates are low, then not much. If conversion rates are
high, then a more complex analysis is significantly better. But most websites
have pretty low conversion rates so I haven't given up much.)

However I would not recommend holding your breath too eagerly for the next
one. According to a back of the envelope calculation, the opportunity cost on
each article has been about $4k for me. I think I know what I want to write,
then I begin adding things, removing things, playing around with extra
simulations, deciding to change subtle details, etc. I think I know what I want
to write for the next one as well, but I said that the last two times.
Therefore I should assume that the same thing will happen the next time.

I want to do this series. So it will happen eventually. But I can't afford to
pump them out quickly.

~~~
tel
I really love these articles. You're solving very practical problems with a
depth of statistics that most people never appreciate. It's really great work.

I'm sorry to hear that you feel the opportunity cost pain so severely. It
might not be worth it to you in the larger scheme of things, but I think you
could definitely package up and sell an expanded version of this series that
spends more time elaborating the value proposition and provides some
simplified decision processes for various common situations. More Fisher than
Jaynes.

~~~
btilly
For me, the difficulty of writing goes up nonlinearly with the length of the
thing that I am trying to write. If I told my wife that I was writing a book,
she'd want to kill me. And I wouldn't blame her for that.

This subject is particularly hard, because I'm trying to require as little
background as possible, on a topic that normally requires much more.

~~~
tel
I wouldn't think a book is necessary. A short "pamphlet" basically could be
really great for (a) convincing people that statistics can be done more
flexibly than they might think and (b) showing a number of case examples as to
how to do it.

The value of further published material might be improved by this kind of
publication as well.

------
jules
This series of articles is just _screaming_ for a Bayesian approach, rather
than this ad hoc approach.

~~~
aaronjg
That approach was discussed by Anscombe, and I wrote up a summary on the
Custora blog. However, just because an approach is frequentist or "ad hoc" does
not necessarily mean that there is anything wrong with it. The Bayesian
approach requires making assumptions about the number of visitors to your site
after you stop the test, which isn't really any less ad hoc than picking an
error cutoff.

<http://blog.custora.com/2012/05/a-bayesian-approach-to-ab-testing/>

~~~
btilly
I like that article, but have one major qualm about it. Everything that you do
in a Bayesian model depends on the prior. Yet you often see - as there -
someone tell you, "Here is the rule to use" but without telling you the prior.

However the prior actually matters. For instance when you look at what Nate
Silver did, most of the mathematical horsepower went to determining a really
good prior to use based on historical data. And armed with that he both can
and does make inferences. (Which he's willing to publish.)
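
To make the dependence on the prior concrete, here is a minimal sketch assuming a Beta prior on the conversion rate and binomial data. The function name, the two priors, and the sample counts are all made up for illustration; they do not come from the linked article.

```python
def posterior_mean(prior_a: float, prior_b: float,
                   conversions: int, visitors: int) -> float:
    """Posterior mean conversion rate under a Beta(prior_a, prior_b) prior.

    With binomial data the posterior is
    Beta(prior_a + conversions, prior_b + non-conversions).
    """
    if not 0 <= conversions <= visitors:
        raise ValueError("conversions must be between 0 and visitors")
    post_a = prior_a + conversions
    post_b = prior_b + (visitors - conversions)
    return post_a / (post_a + post_b)


# Hypothetical early data: 3 conversions out of 20 visitors.
conversions, visitors = 3, 20

# Uniform prior, Beta(1, 1): "I know nothing about the conversion rate".
flat = posterior_mean(1, 1, conversions, visitors)

# Hypothetical informed prior, Beta(5, 95): centred near a 5% conversion rate.
informed = posterior_mean(5, 95, conversions, visitors)

print(f"flat prior: {flat:.3f}, informed prior: {informed:.3f}")
```

Same data, same update rule, and the estimate is roughly 18% under the flat prior versus under 7% under the informed one. With this little data, any decision rule built on top of the posterior inherits that gap.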

That said, the Bayesian approach is conceptually so much better that Bayesian
with a questionable prior can be better than a frequentist approach.

Finally the fact that a Bayesian approach needs a somewhat arbitrary planning
horizon does not particularly bother me. Financial theory tells us that
businesses really should apply a discounting factor to future projected
income, and when you apply an exponentially decaying discounting factor, the
weighted number of future visitors generally comes out to a finite number. And
yes, there are a lot of arbitrary factors in how you get to that number. But
you can generally do it in a reasonable enough way to be way less sloppy in
your A/B test than every other part of the business is. Heck - you can just
say that your planning horizon is 1 year, and use the expected number of
visitors in that time as a cutoff.
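
A minimal sketch of why the discounted visitor count is finite, assuming a constant number of visitors per month and a constant monthly discount factor. Both numbers and both function names are hypothetical. The weights form a geometric series, so the total is visitors / (1 - discount factor).

```python
def discounted_visitors(visitors_per_month: float, monthly_discount: float) -> float:
    """Closed form of sum over t >= 0 of visitors_per_month * monthly_discount**t."""
    if not 0.0 < monthly_discount < 1.0:
        raise ValueError("monthly_discount must be strictly between 0 and 1")
    return visitors_per_month / (1.0 - monthly_discount)


def flat_horizon_visitors(visitors_per_month: float, months: int = 12) -> float:
    """The cruder alternative: count every visitor equally up to a fixed horizon."""
    return visitors_per_month * months


# Hypothetical inputs: 10,000 visitors a month, each month weighted 0.9x the last.
effective = discounted_visitors(10_000, 0.9)

# Sanity check: a long partial sum of the series approaches the closed form.
partial = sum(10_000 * 0.9 ** t for t in range(600))

one_year = flat_horizon_visitors(10_000)

print(f"discounted: {effective:,.0f}  partial sum: {partial:,.0f}  one year: {one_year:,.0f}")
```

With these made-up inputs the discounted total is about 100,000 visitors and the flat one-year horizon gives 120,000. The two approaches land in the same ballpark, which is the sense in which either can serve as a cutoff.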

Anyways I'd like to eventually get into this kind of issue with this series.
But whether I can, I don't know. It certainly will be hard if I keep on trying
to pitch it to the level of mathematical background that I've been aiming for
so far.

------
msellout
> It is nowhere near significance. Stopping it is completely wrong under the
> procedure we're using, we'll have no idea whether we're making the right
> decision!

I don't know anyone who runs an experiment _until_ they find "significance".
That's comparable to only reporting outliers. What if the variable you are
testing is simply not causal? That's why you pick the sample size before
running the experiment, based on the minimum causal effect you want to detect.
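
As a rough sketch of what picking the sample size up front looks like, here is the usual normal-approximation formula for comparing two conversion rates with a two-sided test. The function name, the default significance level and power, and the example rates are assumptions for illustration, not anything from the article.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_arm(base_rate: float, min_detectable_rate: float,
                        alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate visitors needed in EACH arm of a two-sided two-proportion test."""
    if base_rate == min_detectable_rate:
        raise ValueError("the two rates must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    pooled = (base_rate + min_detectable_rate) / 2
    # Standard deviation terms under the null and under the alternative.
    null_sd = sqrt(2 * pooled * (1 - pooled))
    alt_sd = sqrt(base_rate * (1 - base_rate)
                  + min_detectable_rate * (1 - min_detectable_rate))
    delta = abs(min_detectable_rate - base_rate)
    return ceil(((z_alpha * null_sd + z_power * alt_sd) / delta) ** 2)


# Hypothetical goal: detect a lift from a 5% to a 6% conversion rate.
n = sample_size_per_arm(0.05, 0.06)
print(f"visitors needed per arm: {n}")
```

For these example rates the answer is on the order of eight thousand visitors per arm, and halving the detectable lift roughly quadruples it. That is why the minimum effect has to be chosen before the test starts.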

~~~
btilly
People don't as a practical matter because they simply can't. But a lot of
people just haphazardly give up, and have little sense of what kinds of errors
they may or may not be making. (But they hope that the errors are not too
big.)

The main point of this article is to give people a sense of what a more
organized "giving up" curve could look like. And also to give them some much
more concrete information about what their potential for serious error is.

------
jeroenjanssens
I recently wrote a blog post about how we use Bayesian A/B testing for front
page headlines:
<http://visualrevenue.com/blog/2013/02/tech-bayesian-instant-headline-testing.html>

------
bbrooks
Thanks for doing this series! I read 'How Not To Run An A/B Test' recently and
was struggling with the implications.

My only advice:

body { font-family: Georgia, serif; line-height: 1.5; margin: 0 auto; max-width: 640px; }

------
sethev
Does anyone have a recommendation for a good book that covers this sort of
thing (A/B testing) in detail?

~~~
btilly
I have read a couple of books, but have not been particularly impressed on the
technical side.

But if you want a longer presentation about how to do A/B testing, with
various gotchas, development considerations, and so on,
<http://elem.com/~btilly/effective-ab-testing/> is a tutorial that I did at
OSCON a few years ago. (Be warned, it is long and divided into sections.
Different sections are aimed at different people in a business.)

