

A/B Testing iOS Apps Like a Pro - lukabratos
http://lukabratos.me/blog/2014/01/14/a-slash-b-testing-ios-apps-like-a-pro/

======
umanwizard
Neat project. A few concerns I have:

1) ideally you would be able to measure change in every metric, not just ones
you whitelist for a specific experiment. What if adding one feature changes
how people interact with a completely different feature? You would want to
know about this.

2) just showing the change without any sort of hypothesis testing is begging
for people to draw unfounded conclusions from the results. Instead of a vague
note that more than 100 sessions are needed to get significance, you need
real confidence intervals at the very least.
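
To illustrate, here's a rough sketch of a normal-approximation interval on
the difference between two conversion rates (the counts below are made up):

    import math

    # Hypothetical counts from an experiment -- not real data.
    conv_a, n_a = 40, 1000   # variant A: 4.0% conversion
    conv_b, n_b = 55, 1000   # variant B: 5.5% conversion

    p_a, p_b = conv_a / n_a, conv_b / n_b
    diff = p_b - p_a

    # Standard error of a difference of two proportions.
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)

    # 95% confidence interval (1.96 is the z-value for 95%).
    lo, hi = diff - 1.96 * se, diff + 1.96 * se
    print(f"difference: {diff:.3%}, 95% CI: [{lo:.3%}, {hi:.3%}]")

Even with 1000 sessions per variant, an interval like this can easily
straddle zero, which is exactly the kind of thing the tool should surface.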

~~~
RA_Fisher
I'm a data scientist and I actually think this is a smart way to go.

The author could have implemented a simple Chi-square test and gotten CIs. The
problem is that conversion rates are usually < 6% and that means you'd have to
have a MASSIVE sample size to detect a difference.
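
For what it's worth, the chi-square test itself is a one-liner with scipy
(the 2x2 table below is invented for illustration):

    from scipy.stats import chi2_contingency

    # Hypothetical table: [converted, did not convert] per variant.
    table = [[40, 960],    # variant A: 40 of 1000 sessions converted
             [55, 945]]    # variant B: 55 of 1000 sessions converted

    chi2, p_value, dof, expected = chi2_contingency(table)
    print(f"chi2 = {chi2:.2f}, p = {p_value:.3f}")

With these numbers the p-value comes out above 0.05, which is the "massive
sample size" problem in action.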

Basically, our Type II error matters much more here than in typical
statistical applications. Statistical power is super important in this
setting.

The author could implement Bayesian statistics with a Beta distribution prior
initialized with alpha = 3, beta = 100 (mimicking a ~3% conversion rate). The
results would be robust to this prior information. The problem is that there
is no closed-form likelihood solution. This means you need to use Markov
Chain Monte Carlo simulation. Web servers don't like that.
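
A minimal sketch of that Beta-prior idea, using plain Monte Carlo draws from
the conjugate posteriors as a stand-in for a full MCMC setup (counts are
invented):

    import numpy as np

    rng = np.random.default_rng(0)

    # Prior mimicking a ~3% conversion rate, as described above.
    alpha0, beta0 = 3, 100

    # Hypothetical observed counts per variant.
    conv_a, n_a = 40, 1000
    conv_b, n_b = 55, 1000

    # Beta prior + binomial data -> Beta posterior (conjugate update).
    samples_a = rng.beta(alpha0 + conv_a, beta0 + n_a - conv_a, 100_000)
    samples_b = rng.beta(alpha0 + conv_b, beta0 + n_b - conv_b, 100_000)

    # Plain Monte Carlo over the posteriors rather than MCMC.
    print("P(B beats A) ~", (samples_b > samples_a).mean())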

In my experience, if you see a nice 10% boost in conversion rate (conv. b /
conv. a) after some representative period of time like a few days, you should
just go with that result.
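
Concretely, that rule of thumb is nothing fancier than this (rates made up):

    # Hypothetical rates after a few days of traffic.
    conv_a = 0.040   # baseline conversion rate
    conv_b = 0.045   # variant conversion rate

    relative_boost = conv_b / conv_a - 1   # e.g. 0.125 = 12.5% boost
    if relative_boost >= 0.10:
        print(f"~{relative_boost:.0%} boost -- go with variant B")
    else:
        print(f"only ~{relative_boost:.0%} -- keep watching")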

That way, you don't ignore what's smacking you in the head: did the new
implementation have a higher conv. rate over a few days, or not? Detecting
small differences really well with stats is fairly pointless in this space.

