"So, comparing A/B testing and multi-armed bandit algorithms head to head is wrong because they are clearly meant for different purposes. A/B testing is meant for strict experiments where focus is on statistical significance, whereas multi-armed bandit algorithms are meant for continuous optimization where focus is on maintaining higher average conversion rate."
Like any business owner, at the end of the day I care more about conversion rates than about statistical significance.
I'm not a scientist trying to advance the sum of humanity's knowledge. I'm a business owner trying to find the shortest path between what my customers need and what I can profitably offer them.
In a way, chasing statistical significance strikes me as a bit of a fool's errand, because results that are significant in one context may not generalize to another. Even if we know with near certainty what worked best, it's hard to apply that knowledge reliably in the future.
Of course, with MAB we could still wait for statistical significance if we want it, before turning off variations that are performing worse. And we can certainly still try to draw conceptually useful conclusions by designing our test variations in ways that facilitate easy comparison.
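To make that concrete, here is a rough Python sketch of what I have in mind. It is only an illustration under my own assumptions: I use a simple epsilon-greedy bandit (I don't know which algorithm Myna actually runs), a pooled two-proportion z-test as the significance check, and made-up names and defaults (`run_bandit`, `z_score`, `epsilon`, `z_cutoff`, `min_samples`). A variation is only turned off once it is significantly worse than the current leader.

```python
import math
import random


def z_score(conv_a, n_a, conv_b, n_b):
    """Two-proportion z statistic for (rate_a - rate_b), pooled standard error."""
    if n_a == 0 or n_b == 0:
        return 0.0
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return 0.0 if se == 0 else (conv_a / n_a - conv_b / n_b) / se


def run_bandit(true_rates, visitors, epsilon=0.1, z_cutoff=1.96,
               min_samples=100, seed=0):
    """Epsilon-greedy bandit that retires an arm only when it is
    significantly worse than the leader. All parameters are illustrative."""
    rng = random.Random(seed)
    shown = [0] * len(true_rates)
    converted = [0] * len(true_rates)
    active = set(range(len(true_rates)))

    def rate(i):
        return converted[i] / shown[i] if shown[i] else 0.0

    for _ in range(visitors):
        arms = sorted(active)
        # Explore a random active arm with probability epsilon,
        # otherwise exploit the arm with the best observed rate.
        if rng.random() < epsilon:
            arm = rng.choice(arms)
        else:
            arm = max(arms, key=rate)

        # Simulate the visitor: true_rates stands in for real traffic.
        shown[arm] += 1
        if rng.random() < true_rates[arm]:
            converted[arm] += 1

        # Retire arms that are significantly worse than the leader.
        # Caveat: checking after every visitor inflates false positives;
        # a real test would correct for this repeated peeking.
        best = max(arms, key=rate)
        for i in arms:
            if i == best or min(shown[i], shown[best]) < min_samples:
                continue
            z = z_score(converted[best], shown[best], converted[i], shown[i])
            if z > z_cutoff:
                active.discard(i)

    return shown, converted, active
```

Calling something like `run_bandit([0.05, 0.07], 20000)` returns how often each variation was shown, how often it converted, and which variations are still running. The point of the design is that traffic shifts toward the leader from the start, while the decision to kill a variation outright still waits on a significance check.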
But with MAB and Myna, it sounds like we can also pretty well count on higher conversion rates, and that counts for a lot in a business context.
I'm grateful to the VWO team for writing up their analysis and findings, and being so frank about the relative advantages of A/B and MAB. Their summary above tells me what I need to know.