
The Imperfect World of A/B Testing Selection - brensudol
http://engineering.simondata.com/the-imperfect-world-of-ab-selection/
======
jasondavis
Thanks Dan. I actually wrote a piece a couple years back about this as well:

[http://drjasondavis.com/blog/2013/09/12/eight-ways-youve-mis...](http://drjasondavis.com/blog/2013/09/12/eight-ways-youve-misconfigured-your-ab-test)

Can't tell you how many bad a/b tests we ran at Etsy until we figured things
out ;)

------
danmccorm
Great article. Designing a/b systems always seems (relatively) simple at the
start, but in my experience there are 1,000 things you don't think of until
you have massive amounts of worthless results. Add this to the list of things
to watch out for.

~~~
btilly
My experience is the opposite. Spend time simplifying your thinking, and it
stays simple. But it is very, very easy to start overthinking things and then
you go down a rabbit hole.

Consider this as an example. If you're testing per-session behavior, then you
can just use a session cookie. If you're testing logged-in behavior, you can
use the login id. You've just covered most of the things you want to test.
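
A minimal sketch of that idea, assuming variants are assigned by hashing
whichever id the test is keyed on. The function name, experiment names, and
id values are hypothetical, not taken from the article:

```python
import hashlib

def assign_variant(unit_id: str, experiment: str, variants=("A", "B")) -> str:
    """Map a unit id (session cookie value or login id) to a variant.

    The same (experiment, unit_id) pair always gets the same variant.
    """
    key = f"{experiment}:{unit_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes of the hash as an integer, then pick a bucket.
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]

# Per-session test: key on the session cookie value (hypothetical value).
print(assign_variant("session-cookie-123", "landing-page-test"))
# Logged-in test: key on the login id instead (hypothetical value).
print(assign_variant("login-id-42", "checkout-test"))
```

Including the experiment name in the hash keeps assignments in one test from
lining up with assignments in another.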

When you start worrying about cross-device behavior, both logged in and not,
then you have a world of pain. So treat it as an identity problem, throw away
all of the users you find questionable, and work with that. And yes, this is a
pain, which is why you do it as seldom as you can!

~~~
yummyfajitas
_So treat it as an identity problem, throw away all of the users you find
questionable,..._

And if questionability is correlated with the thing you are trying to measure,
you've just added bias. For example, consider trying to measure engagement or
something correlated with it. Are users who connect to your site from 3
different devices more or less engaged than normal? Great - you just threw out
your most engaged users.
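
A rough simulation of that effect, under made-up assumptions: engagement is a
uniform score in [0, 1), and the chance of being a multi-device (and therefore
"questionable") user grows with engagement. None of these numbers come from
real data:

```python
import random

def simulate(n_users=100_000, seed=0):
    """Return (mean engagement of all users, mean engagement of kept users)."""
    rng = random.Random(seed)
    everyone, kept = [], []
    for _ in range(n_users):
        engagement = rng.random()  # hypothetical engagement score
        # Assumption: more engaged users are more likely to use several devices.
        multi_device = rng.random() < 0.5 * engagement
        everyone.append(engagement)
        if not multi_device:
            kept.append(engagement)  # "questionable" users are thrown away
    return sum(everyone) / len(everyone), sum(kept) / len(kept)

all_mean, kept_mean = simulate()
print(f"all users: {all_mean:.3f}, after filtering: {kept_mean:.3f}")
```

Under these assumptions the filtered population looks less engaged than the
real one, purely because of who got dropped.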

Similarly, you can't just use a session cookie to test per-session behavior.
This introduces correlations between sessions, which violates the IID
assumption in all the standard statistical tests.

[https://www.chrisstucchio.com/blog/2015/no_free_samples.html](https://www.chrisstucchio.com/blog/2015/no_free_samples.html)

You can fix this if you want by using the weakly mixing central limit theorem
or just explicitly putting the mixing into a Bayesian analysis. But that's
probably a lot trickier than just using a long term cookie.

~~~
btilly
You have to know the limitations of the approach you are using.

Also, about session cookies: there is no correlation created if the A/B test
behavior is tied to the session. The downside is that the same user can get
different behaviors on different days. This may be a bad user experience. The
upside is that it is quick and simple for things like landing pages.

In the end there is no solution that avoids actually understanding what your
data really says.

~~~
yummyfajitas
There is absolutely correlation between sessions. If visitor 1 (corresponding
to sessions 1,2,3) has a high conversion probability, while visitor 2
(corresponding to sessions 4,5,6) has a low conversion probability, then
sessions 1,2,3 are correlated with each other, and so are sessions 4,5,6. This
breaks the CLT and all the usual independence assumptions.
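
To make that concrete, here is a small simulation sketch. The two conversion
probabilities, the visitor count, and three sessions per visitor are all
hypothetical. It compares the standard error you get by treating every session
as independent with one computed from per-visitor conversion rates:

```python
import random
import statistics

def simulate_sessions(n_visitors=5000, sessions_per_visitor=3, seed=0):
    """Each visitor has a fixed conversion probability shared by all their sessions."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_visitors):
        p = rng.choice([0.02, 0.40])  # hypothetical low / high converters
        data.append([1 if rng.random() < p else 0
                     for _ in range(sessions_per_visitor)])
    return data

def naive_se(data):
    """Standard error of the conversion rate, pretending sessions are IID."""
    sessions = [s for visitor in data for s in visitor]
    n = len(sessions)
    rate = sum(sessions) / n
    return (rate * (1 - rate) / n) ** 0.5

def visitor_level_se(data):
    """Standard error using one observation (mean rate) per visitor."""
    rates = [sum(visitor) / len(visitor) for visitor in data]
    return statistics.stdev(rates) / len(rates) ** 0.5

data = simulate_sessions()
print(f"naive SE: {naive_se(data):.5f}")
print(f"visitor-level SE: {visitor_level_se(data):.5f}")
```

In this setup the naive figure comes out smaller than the visitor-level one,
so a test that assumes independent sessions would be overconfident.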

If most of your visitors only have one session this may not matter...but then
again with only session cookies you don't even have a way to know this.

