A/B Testing With Limited Data (elem.com)
52 points by btilly on Feb 12, 2013 | 20 comments



This is part of a planned series. See http://news.ycombinator.com/item?id=5112670 for the initial plan. This is probably the most important in the series, since it addresses a very practical and common problem. That plan likely will change - I've received a few random additional ideas, and I really think that at some point I should address the question of how much I've been giving up with the simplification of only looking at conversions. (Short version: depends on conversion rates. If absolute conversion rates are low, then not much. If conversion rates are high, then a more complex analysis is significantly better. But most websites have pretty low conversion rates so I haven't given up much.)

However I would not recommend holding your breath too eagerly for the next one. According to a back of the envelope calculation, the opportunity cost on each article has been about $4k for me. I think I know what I want to write, then I begin adding things, removing things, play around with extra simulations, decide to change subtle details, etc. I think I know what I want to write for the next one as well, but I said that the last two times. Therefore I should assume that the same thing will happen the next time.

I want to do this series. So it will happen eventually. But I can't afford to pump them out quickly.


> According to a back of the envelope calculation, the opportunity cost on each article has been about $4k for me. I think I know what I want to write, then I begin adding things, removing things, play around with extra simulations, decide to change subtle details, etc.

Out of curiosity, is that rough guess with or without the value of everything you're learning in the second sentence?


That figure is based on, "Here is how much I didn't get paid that I could otherwise have expected to be paid by the contract that I let slide so that I could work on this article."

So no, I'm not valuing my increase in knowledge. And I am not valuing the publicity that these articles have received. (Though the fact that I have people ready to pay me for my time suggests that I don't need marketing at the moment.) It is a look at where my bank account stands right now relative to where it would have stood if I had not chosen to write these articles.


Ah. That's a reasonable way to calculate it.


I really love these articles. You're solving very practical problems with a depth of statistics that most people never appreciate. It's really great work.

I'm sorry to hear that you feel the opportunity cost pain so severely. It might not be worth it to you in the larger scheme of things, but I think you could definitely package up and sell an expanded version of this series that spends more time elaborating the value proposition and provides some simplified decision processes for various common situations. More Fisher than Jaynes.


For me, the difficulty of writing goes up nonlinearly with the length of the thing that I am trying to write. If I told my wife that I was writing a book, she'd want to kill me. And I wouldn't blame her for that.

This subject is particularly hard, because I'm trying to require as little background as possible, on a topic that normally requires much more.


I wouldn't think a book is necessary. A short "pamphlet" could be really great for (a) convincing people that statistics can be done more flexibly than they might think and (b) showing a number of case examples as to how to do it.

The value of further published material might be improved by this kind of publication as well.


Agreed, I'd be in the market for such a book.


This series of articles is just screaming for a Bayesian approach, rather than this ad hoc approach.


That approach was discussed by Anscombe, and I wrote up a summary in the Custora Blog. However, just because an approach is frequentist or "ad hoc" does not necessarily mean that there is anything wrong with it. The Bayesian approach requires making assumptions about the number of visitors to your site after you stop the test, which isn't really any less ad hoc than picking an error cutoff.

http://blog.custora.com/2012/05/a-bayesian-approach-to-ab-te...


I like that article, but have one major qualm about it. Everything that you do in a Bayesian model depends on the prior. Yet you often see - as there - someone tell you, "Here is the rule to use" but without telling you the prior.

However the prior actually matters. For instance when you look at what Nate Silver did, most of the mathematical horsepower went to determining a really good prior to use based on historical data. And armed with that he both can and does make inferences. (Which he's willing to publish.)

That said, the Bayesian approach is conceptually so much better that Bayesian with a questionable prior can be better than a frequentist approach.

Finally the fact that a Bayesian approach needs a somewhat arbitrary planning horizon does not particularly bother me. Financial theory tells us that businesses really should apply a discounting factor to future projected income, and when you apply an exponentially decaying discounting factor, the weighted number of future visitors generally comes out to a finite number. And yes, there are a lot of arbitrary factors in how you get to that number. But you can generally do it in a reasonable enough way to be way less sloppy in your A/B test than every other part of the business is. Heck - you can just say that your planning horizon is 1 year, and use the expected number of visitors in that time as a cutoff.
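To make the discounting argument concrete, here is a minimal sketch, assuming a constant number of visitors per period and a constant per-period discount factor (both are simplifying assumptions for illustration, and the function name and numbers are made up). With exponential discounting the weighted visitor count is a geometric series, so it stays finite even with no cutoff at all.

```python
from typing import Optional


def discounted_visitors(visitors_per_period: float,
                        discount: float,
                        periods: Optional[int] = None) -> float:
    """Sum of visitors_per_period * discount**t for t = 0, 1, 2, ...

    If periods is None the sum runs forever and converges to
    visitors_per_period / (1 - discount). Otherwise it stops after
    `periods` terms (a hard planning horizon).
    """
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1) for the sum to converge")
    if periods is None:
        return visitors_per_period / (1 - discount)
    return visitors_per_period * (1 - discount ** periods) / (1 - discount)


# Hypothetical numbers: 10,000 visitors a month, future months weighted by 0.9.
infinite_horizon = discounted_visitors(10_000, 0.9)
one_year_discounted = discounted_visitors(10_000, 0.9, periods=12)
one_year_flat = 10_000 * 12  # the "just say 1 year" shortcut, no discounting
```

Either number can serve as the "visitors after the test" input; the point is only that it is finite and can be justified roughly.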

Anyway, I'd like to eventually get into this kind of issue with this series. But whether I can, I don't know. It certainly will be hard if I keep on trying to pitch it to the level of mathematical background that I've been aiming for so far.


It is no less ad hoc, but it is clearer. Instead of making assumptions about the right p-values, you make assumptions about real world quantities.

And if you have actual data/projections about future visitors, it is less ad hoc.


If you look through the proposed plan, you'll find that I do intend to show Bayesian approaches. But as I get there, I want to explain the trade-offs.

In particular the choice of a Bayesian prior is as arbitrary as anything that I've done so far. And how you do it hides some important implicit assumptions. That said, a Bayesian framework does have substantial conceptual advantages over the frequentist approaches that I've used so far.
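As a rough illustration of how much the prior can matter, here is a sketch assuming a Beta prior on each conversion rate and a simple Monte Carlo estimate of the probability that B beats A. The function name, the counts, and both priors are hypothetical choices for the example, not anything taken from the articles.

```python
import random


def prob_b_beats_a(conv_a: int, n_a: int, conv_b: int, n_b: int,
                   prior_alpha: float, prior_beta: float,
                   draws: int = 20_000, seed: int = 0) -> float:
    """Estimate P(rate_B > rate_A) under independent Beta posteriors.

    Each arm starts from the same Beta(prior_alpha, prior_beta) prior and
    is updated with its observed conversions and non-conversions.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        rate_a = rng.betavariate(prior_alpha + conv_a,
                                 prior_beta + n_a - conv_a)
        rate_b = rng.betavariate(prior_alpha + conv_b,
                                 prior_beta + n_b - conv_b)
        if rate_b > rate_a:
            wins += 1
    return wins / draws


# Same small data set (A: 2/50, B: 6/50) under two different priors.
flat = prob_b_beats_a(2, 50, 6, 50, prior_alpha=1, prior_beta=1)
# An informative prior worth 1000 earlier visitors at a 5% conversion rate.
informed = prob_b_beats_a(2, 50, 6, 50, prior_alpha=50, prior_beta=950)
```

With limited data the informative prior pulls both arms toward 5%, so it is noticeably less confident that B is better than the flat prior is. The same data gives a different decision depending on a choice that often goes unstated.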


At my job we use a Bayesian approach, or at least we've started using it. I'll ask around to see if they'll let me publish it.

The Bayesian approach is indeed quite simple and understandable.


> It is nowhere near significance. Stopping it is completely wrong under the procedure we're using, we'll have no idea whether we're making the right decision!

I don't know anyone who runs an experiment until they find "significance". That's comparable to only reporting outliers. What if the variable you are testing is simply not causal? That's why you pick the sample size before running the experiment, based on the minimum causal effect you want to detect.
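For reference, a sketch of the fixed-sample-size calculation described above, using the standard normal-approximation formula for comparing two proportions with a two-sided test. The function name, the default significance level and power, and the example rates are assumptions for illustration.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(base_rate: float, target_rate: float,
                        alpha: float = 0.05, power: float = 0.80) -> int:
    """Visitors needed in EACH arm to detect base_rate vs target_rate.

    Normal approximation for a two-sided two-proportion z-test.
    """
    if base_rate == target_rate:
        raise ValueError("the two rates must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance threshold
    z_power = NormalDist().inv_cdf(power)          # desired power
    pooled = (base_rate + target_rate) / 2
    numerator = (
        z_alpha * math.sqrt(2 * pooled * (1 - pooled))
        + z_power * math.sqrt(base_rate * (1 - base_rate)
                              + target_rate * (1 - target_rate))
    ) ** 2
    return math.ceil(numerator / (target_rate - base_rate) ** 2)


# Hypothetical: 5% baseline, smallest lift worth detecting is 5% -> 6%.
n_needed = sample_size_per_arm(0.05, 0.06)
```

At low conversion rates the required sample runs into the thousands per arm even for a fairly large relative lift, which is why sites with limited traffic often cannot follow this procedure in practice.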


People don't as a practical matter because they simply can't. But a lot of people just haphazardly give up, and have little sense of what kinds of errors they may or may not be making. (But they hope that the errors are not too big.)

The main point of this article is to give people a sense of what a more organized "giving up" curve could look like. And also to give them some much more concrete information about what their potential for serious error is.


I recently wrote a blog post about how we use Bayesian A/B testing for front page headlines. http://visualrevenue.com/blog/2013/02/tech-bayesian-instant-...


Thanks for doing this series! I read 'How Not To Run An A/B Test' recently and was struggling with the implications.

My only advice:

body { font-family: Georgia, serif; line-height: 1.5; margin: 0 auto; max-width: 640px; }


Does anyone have a recommendation for a good book that covers this sort of thing (A/B testing) in detail?


I have read a couple of books, but have not been particularly impressed on the technical side.

But if you want a longer presentation about how to do A/B testing, with various gotchas, development considerations, and so on, http://elem.com/~btilly/effective-ab-testing/ is a tutorial that I did at OSCON a few years ago. (Be warned, it is long and divided into sections. Different sections are aimed at different people in a business.)



