(Edit) So the two steps to run are:
1. Run a traditional A/B test until 95% confidence is reached. This is full exploration.
2. Then switch to the MAB, which shows the better-performing variant most of the time. As time goes on, the worse-performing variants are shown less and less often.
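
To make the two steps concrete, here is a rough sketch in Python. It is only an illustration under several assumptions that are mine, not part of the procedure above: conversions are simulated from made-up true rates, step 1 uses a sample size fixed in advance with a pooled two-proportion z-test at the 5% level, and step 2 uses Thompson sampling as the bandit (any other MAB policy could be swapped in). The names `two_proportion_p_value` and `run_two_phase` are hypothetical.

```python
import math
import random


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value of a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no variation at all, so no evidence of a difference
    z = (conv_a / n_a - conv_b / n_b) / se
    # 2 * (1 - Phi(|z|)) written with the complementary error function.
    return math.erfc(abs(z) / math.sqrt(2))


def run_two_phase(true_rates, n_per_arm, n_bandit, alpha=0.05, seed=0):
    """Simulate step 1 (A/B test) followed by step 2 (bandit) for two variants.

    true_rates are hypothetical conversion rates used only for simulation.
    Returns (p_value, significant, bandit_shows).
    """
    rng = random.Random(seed)
    shows = [0, 0]  # impressions per variant
    wins = [0, 0]   # conversions per variant

    # Step 1: traditional A/B test, equal allocation (full exploration).
    for arm in (0, 1):
        for _ in range(n_per_arm):
            shows[arm] += 1
            wins[arm] += rng.random() < true_rates[arm]

    p_value = two_proportion_p_value(wins[0], shows[0], wins[1], shows[1])
    significant = p_value < alpha

    # Step 2: Thompson sampling, starting from the data gathered in step 1.
    # Each variant gets a Beta(1 + conversions, 1 + non-conversions) posterior;
    # we draw from both and show the variant with the larger draw.
    bandit_shows = [0, 0]
    for _ in range(n_bandit):
        draws = [
            rng.betavariate(1 + wins[a], 1 + shows[a] - wins[a])
            for a in (0, 1)
        ]
        arm = 0 if draws[0] > draws[1] else 1
        shows[arm] += 1
        bandit_shows[arm] += 1
        wins[arm] += rng.random() < true_rates[arm]

    return p_value, significant, bandit_shows


if __name__ == "__main__":
    p, sig, bandit_shows = run_two_phase((0.05, 0.15), 1000, 2000)
    print(f"p-value after step 1: {p:.3g}, significant: {sig}")
    print(f"impressions during step 2: {bandit_shows}")
```

Because the posteriors in step 2 keep sharpening as data comes in, the worse variant wins the draw less and less often, which is the decreasing exposure described in step 2.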
I emphasize this because it is a common mistake among A/B test practitioners. For a fuller discussion of the problems, see the papers by Armitage (frequentist) and Anscombe (Bayesian) on the topic. Or see my summary of the issue here: