

Beating A/B Tests - m0th87
https://www.themuse.com/advice/beating-ab-tests

======
vijayaggarwal
Disclaimer: I work for Visual Website Optimizer (VWO). Still, I will try to be
as neutral as possible.

> Bandit tests allow you to run as many variations at the same time as you
> want, versus A/B testing, which limits you to two.

True, the phrase _A/B Testing_ in a strict sense covers only two variations,
but almost all tools offer something called _A/B/n Testing_, which allows for
multiple variations. The mathematical foundations of A/B/n testing are as well
established as those of A/B testing.
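To make that concrete, here is a minimal sketch (all counts are made up) of the statistic underlying A/B/n testing: a chi-squared test of homogeneity over a 2-by-n contingency table of conversions and non-conversions. With n = 2 it reduces to the familiar A/B test.

```python
# Illustrative sketch: chi-squared test of homogeneity across n variations,
# the statistical basis of A/B/n testing. All counts are hypothetical.
def chi_squared(conversions, visitors):
    """Return the chi-squared statistic for an n-variation test."""
    non_conversions = [v - c for c, v in zip(conversions, visitors)]
    total_conv = sum(conversions)
    total_non = sum(non_conversions)
    total = total_conv + total_non
    stat = 0.0
    for conv, non, vis in zip(conversions, non_conversions, visitors):
        # Expected counts under the null hypothesis of equal conversion rates.
        for observed, row_total in ((conv, total_conv), (non, total_non)):
            expected = row_total * vis / total
            stat += (observed - expected) ** 2 / expected
    return stat

# Three variations: a control plus two challengers (hypothetical data).
stat = chi_squared(conversions=[30, 45, 28], visitors=[1000, 1000, 1000])
print(round(stat, 2))
```

The statistic is then compared against a chi-squared critical value with n - 1 degrees of freedom; a library such as `scipy.stats.chi2_contingency` would do the same computation plus the p-value.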

> But there's a deeper, less obvious benefit. Had we run these social button
> variations as a series of A/B tests, we'd be at much greater risk of
> reaching a local maxima. That is, the odds of us missing the best performing
> variation in favor of one that's merely adequate would be higher.

Again, there is another concept called _Multivariate Testing_ (MVT) for
testing multiple changes per page. MVT is also offered by many online A/B
testing tools, and again, its theoretical foundation is very well established.
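As a rough illustration of what MVT adds over a series of A/B tests: a full-factorial multivariate test enumerates every combination of the page elements under test, so interactions between elements are not missed. The element names below are hypothetical.

```python
# Sketch of full-factorial multivariate testing: every combination of the
# elements under test becomes a variation. Element values are hypothetical.
from itertools import product

headline = ["control headline", "new headline"]
button_color = ["green", "red", "blue"]
button_text = ["Sign up", "Get started"]

combinations = list(product(headline, button_color, button_text))
print(len(combinations))  # 2 * 3 * 2 = 12 variations
```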

> So I'd have to be on top of my game to make the code-changes as soon as the
> Chi Squared value was high enough.

Good A/B testing services do this automatically for you: losing variations are
automatically stopped, and the winning variation is automatically given 100%
of the traffic. Of course, you can enable or disable this behavior.
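A hedged sketch of that auto-stop behavior (not any particular tool's implementation): once the test statistic crosses a significance threshold, the best-performing variation is promoted to 100% of traffic. The threshold 3.841 is the chi-squared critical value for 1 degree of freedom at p = 0.05, i.e. a two-variation test.

```python
# Sketch of automatic stopping: promote the winner once the test statistic
# crosses a critical value; otherwise keep splitting traffic evenly.
# 3.841 is the chi-squared critical value, df = 1, p = 0.05.
def allocate_traffic(stat, rates, threshold=3.841):
    """Return per-variation traffic shares given a test statistic."""
    if stat >= threshold:
        # Significant: send all traffic to the best observed variation.
        winner = rates.index(max(rates))
        return [1.0 if i == winner else 0.0 for i in range(len(rates))]
    # Not yet significant: keep the even split.
    return [1.0 / len(rates)] * len(rates)

print(allocate_traffic(stat=2.5, rates=[0.030, 0.045]))  # still exploring
print(allocate_traffic(stat=5.2, rates=[0.030, 0.045]))  # winner promoted
```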

> there's nothing stopping you from tuning a bandit test to behave exactly
> like an A/B test.

Yes, of course. That is because multi-armed bandit (MAB) is _another strategy_
for running A/B tests (more generally, A/B/n tests; even more generally,
multivariate tests). So, technically, you cannot beat A/B tests with MAB, as
your article's headline suggests, because MAB is itself a strategy for running
A/B tests. What you can try to beat is the _classical strategy_ used by most
tools today. The post by VWO linked in your article clearly details the pros
and cons of both approaches and explains why a head-to-head comparison is, in
fact, inappropriate.
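The claim that MAB is a strategy for running A/B tests can be shown with a minimal epsilon-greedy sketch (illustrative, not any tool's algorithm): with epsilon = 1.0 the bandit always explores, allocating traffic uniformly at random, which is exactly the classical even split; lowering epsilon shifts it toward exploiting the best observed arm.

```python
# Epsilon-greedy multi-armed bandit sketch. epsilon = 1.0 degenerates to a
# uniform random split, i.e. classical A/B/n allocation. Counts are made up.
import random

def choose_variation(epsilon, conversions, visitors):
    """Pick an arm: explore uniformly with prob. epsilon, else exploit."""
    if random.random() < epsilon:
        # Explore: uniform random choice, as in classical A/B/n testing.
        return random.randrange(len(visitors))
    # Exploit: pick the arm with the best observed conversion rate.
    rates = [c / v if v else 0.0 for c, v in zip(conversions, visitors)]
    return rates.index(max(rates))

random.seed(0)
# epsilon = 1.0: every call explores, so allocation is an even random split.
picks = [choose_variation(1.0, [30, 45], [1000, 1000]) for _ in range(1000)]
print(abs(picks.count(0) / 1000 - 0.5) < 0.1)  # roughly even allocation
```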

> ...the critique is utterly ridiculous

As this point talks directly about VWO, I will take a break from my attempt at
neutrality. _Tuning a bandit test to behave exactly like an A/B test_ was not
the point of our article, and it is not the point of your article either. We
have both attempted to compare the two approaches (MAB vs. classical) to A/B
testing. I do not mind you criticizing our article, but it would only be
appropriate to substantiate your criticism with facts; criticism without facts
does not help the cause of discourse. I am fairly well versed in the theory of
A/B testing and happy to take any questions here.

