
The good news is that when you do A/B testing, you can create a properly randomized experiment, where variations in, e.g., the number of men/women assigned to the A and B groups are accounted for as part of the sampling error.
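
A quick sketch of what that randomization buys you (Python with numpy; the gender covariate, the 50/50 split, and the sample size are made-up illustration values): under a per-user coin flip, any male/female imbalance between A and B is itself just sampling noise, which the usual significance test already budgets for.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 10_000

    # Hypothetical covariate: 1 = man, 0 = woman, drawn 50/50 purely for illustration.
    is_male = rng.integers(0, 2, size=n)

    # Properly randomized assignment: a coin flip per user, independent of any covariate.
    in_a = rng.integers(0, 2, size=n).astype(bool)

    # The male share in each group differs only by sampling noise (~1/sqrt(n)),
    # which is exactly the error a significance test already accounts for.
    print(f"male share in A: {is_male[in_a].mean():.3f}")
    print(f"male share in B: {is_male[~in_a].mean():.3f}")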

Where this stuff really gets you is in retrospective data analysis, where, with the right choice of would-be confounding variables, you can argue either direction on almost any question.
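
To make that concrete, here's a toy Simpson's-paradox-style example with invented counts: treatment A wins inside every subgroup, yet B wins once you pool across subgroups, so whether you "control for" the subgroup decides which story the data tells.

    # Invented counts: (successes, trials) per treatment within two subgroups.
    data = {
        "subgroup_1": {"A": (90, 100),   "B": (800, 1000)},
        "subgroup_2": {"A": (300, 1000), "B": (20, 100)},
    }

    def rate(successes, trials):
        return successes / trials

    # Within each subgroup, A beats B...
    for name, arms in data.items():
        print(name, "A:", rate(*arms["A"]), "B:", rate(*arms["B"]))

    # ...but pooled over subgroups, B beats A.
    for arm in ("A", "B"):
        s = sum(data[g][arm][0] for g in data)
        t = sum(data[g][arm][1] for g in data)
        print("overall", arm, round(rate(s, t), 3))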




The problem I have seen with frameworks that handle A/B testing is that they ignore basic statistics. All groups are different given a large enough n; that's what p is about: do you have enough data to tell the means apart? The closer the means, the more data you need. What you really care about is how large the difference is. That's called the effect size, which is really just how far apart the means are, divided by a combined (pooled) standard deviation for both groups.
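
A minimal sketch of that effect size (Cohen's d: mean difference over a pooled standard deviation). The data below is invented; with n this large a t-test would flag even this tiny difference as significant, while d stays negligible.

    import numpy as np

    def cohens_d(a, b):
        """Effect size: difference of means divided by the pooled standard deviation."""
        a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
        # Pooled variance, weighting each group's variance by its degrees of freedom (n - 1).
        pooled_var = ((len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1)) \
            / (len(a) + len(b) - 2)
        return (a.mean() - b.mean()) / np.sqrt(pooled_var)

    # Invented conversion-rate-like data: a tiny mean difference, huge n.
    rng = np.random.default_rng(1)
    group_a = rng.normal(0.101, 0.05, size=100_000)
    group_b = rng.normal(0.100, 0.05, size=100_000)

    # ~0.02: statistically detectable at this n, practically negligible.
    print(f"Cohen's d: {cohens_d(group_a, group_b):.3f}")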

I certainly wouldn't make rash decisions on data without a good effect size metric.



