

Easy A/B testing at Nextdoor - kilimchoi
https://engblog.nextdoor.com/2015/07/09/ab-testing//

======
birken
Wow, this is like stepping into a time warp back to the early days at
Thumbtack. Our A/B test bucketing and methodology were extremely similar to
this.

Things we learned:

> Our guiding principle has been ease and accessibility: when testing is
> effortless, more people ship more tests, and our product steadily improves

Thumbs up. Whenever we made running tests easier, we'd end up running more
tests.

> Ramp the number of exposed subjects up or down

You don't want to do this. If there are time-based effects (people behave
differently on weekends vs weekdays, daytime vs nighttime, etc), changing the
ratio on the fly can skew your data, because the treatment and control groups
end up drawn from different mixes of time periods. If you need to roll back an
experiment because it is causing a bug or something, roll it back, fix it,
then restart it. But don't change the ratios on the fly.
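
To make that concrete, here is a rough sketch with made-up numbers (the
traffic, shares, and conversion rates are all hypothetical). Both arms convert
identically within each period, but the treatment share is ramped from 10% to
50% right as the better-converting weekend traffic arrives:

```python
# Hypothetical periods: (name, visitors, treatment_share, conversion_rate).
# The conversion rate is the SAME for both arms, so the true effect is zero.
ramped = [
    ("weekday", 10_000, 0.10, 0.05),
    ("weekend", 10_000, 0.50, 0.10),
]
fixed = [
    ("weekday", 10_000, 0.50, 0.05),
    ("weekend", 10_000, 0.50, 0.10),
]


def pooled_rates(periods):
    """Return (treatment_rate, control_rate) pooled across all periods."""
    t_visitors = t_conversions = 0.0
    c_visitors = c_conversions = 0.0
    for _name, visitors, share, rate in periods:
        t = visitors * share          # visitors bucketed into treatment
        c = visitors * (1 - share)    # visitors bucketed into control
        t_visitors += t
        t_conversions += t * rate
        c_visitors += c
        c_conversions += c * rate
    return t_conversions / t_visitors, c_conversions / c_visitors


for label, periods in (("ramped", ramped), ("fixed", fixed)):
    treatment, control = pooled_rates(periods)
    print(f"{label}: treatment={treatment:.4f} control={control:.4f}")
```

With the ramp, treatment pools to 550/6000 (about 9.2%) against 950/14000
(about 6.8%) for control, even though nothing actually changed. With a fixed
50/50 split both arms pool to the same rate.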

> and whether the differences were statistically significant

It looks like you are starting/ending your tests based on time. This is good,
much better than cherry-picking results at arbitrary intervals, but it might
be slowing you down. Another method is to define your sample size threshold
before the test starts (using something like Evan Miller's sample size
calculator [1]), then end the test as soon as it hits that threshold. We
discussed and picked a company-wide power and significance level to try to
balance moving quickly with acceptable errors from testing.
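
For a sense of what fixing the threshold up front looks like, here is a
minimal sketch of the textbook sample size formula for comparing two
proportions with a two-sided test. It is not necessarily the exact formula
behind the calculator in [1], and the function name, the defaults (5%
significance, 80% power), and the example rates are my own assumptions:

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_arm(baseline, min_effect, alpha=0.05, power=0.80):
    """Visitors needed in EACH arm to detect an absolute lift of
    `min_effect` over `baseline` (two-sided test, normal approximation)."""
    p1 = baseline
    p2 = baseline + min_effect
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance quantile
    z_power = NormalDist().inv_cdf(power)          # power quantile
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_power * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


# Hypothetical example: 10% baseline conversion, detect a 2-point lift.
print(sample_size_per_arm(0.10, 0.02))
```

Pick alpha and power once, compute the threshold before the test starts, and
stop when each arm reaches it rather than when the numbers look good.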

> P-value of 0.045 in your test sample

I ran your numbers through ABBA [2], the open-source split test calculator
that one of Thumbtack's engineers made, and ABBA puts your P-value a little
over 0.08 (Evan Miller's tool [3] agrees with Thumbtack's). That's a big
difference between 0.08 and 0.045. There are different statistical methods you
can use to evaluate split tests, but we ended up preferring the one underlying
ABBA (which you can read about on the ABBA page).
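
If you want to sanity-check a P-value yourself, here is a minimal sketch of a
pooled two-proportion z-test, which gives the same P-value as a Pearson
chi-squared test without continuity correction. I'm not claiming this is
exactly what ABBA computes (its page describes its own method), and the
function name and the counts in the example are hypothetical, not the numbers
from the post:

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided P-value for the difference between two conversion rates,
    using the pooled-variance normal approximation."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical counts: 100/1000 conversions in control, 130/1000 in treatment.
print(round(two_proportion_p_value(100, 1000, 130, 1000), 4))
```

Different methods (continuity corrections, exact tests, one-sided vs
two-sided) can move a borderline result to either side of 0.05, so it's worth
settling on one and sticking with it.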

1: http://www.evanmiller.org/ab-testing/sample-size.html

2: https://www.thumbtack.com/labs/abba/

3: http://www.evanmiller.org/ab-testing/chi-squared.html

