Personally, I don't see the need to compare bandits with A/B tests.

The goal of an A/B test is to decide on the best variant as quickly as possible.

The goal of a bandit is to optimize how the given content is served.

I wouldn't use a bandit to "decide" the button color, but rather as a simple recommendation system. That seems far more natural to me, since it reacts better to changing optima.

Example: Let's say I run a larger "fun content" media website. I have "awesome videos", "funny images" and "goofy articles". At the bottom and on the right side of each content page I show follow-up content. I would use a bandit here to optimize the mix of images, videos and text I recommend.
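
As a rough sketch of that setup, here is a tiny epsilon-greedy bandit with one arm per content type. Epsilon-greedy is just my pick for illustration (any bandit strategy would do), and the class name, the arm labels and the `epsilon` default are all made up for this example.

```python
import random


class EpsilonGreedyMix:
    """Epsilon-greedy bandit with one arm per content type (illustrative only)."""

    def __init__(self, arms, epsilon=0.1, seed=None):
        self.arms = list(arms)
        self.epsilon = epsilon          # share of traffic spent on exploration
        self.rng = random.Random(seed)
        self.shows = {arm: 0 for arm in self.arms}
        self.clicks = {arm: 0 for arm in self.arms}

    def click_rate(self, arm):
        # Observed click-through rate; 0.0 for an arm that was never shown.
        shown = self.shows[arm]
        return self.clicks[arm] / shown if shown else 0.0

    def choose(self):
        # Explore a random arm with probability epsilon,
        # otherwise exploit the arm with the best observed click rate.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        return max(self.arms, key=self.click_rate)

    def record(self, arm, clicked):
        # Update the counters after a recommendation was shown.
        self.shows[arm] += 1
        if clicked:
            self.clicks[arm] += 1


bandit = EpsilonGreedyMix(["videos", "images", "articles"], epsilon=0.1, seed=42)
slot = bandit.choose()              # content type for the next recommendation slot
bandit.record(slot, clicked=True)   # feed back whether the user clicked
```

Because a fraction of the traffic keeps exploring, the mix can shift again when users start preferring a different content type.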

To spice it up: I would create cohorts for the typical behaviour of my users (e.g. registered male users) and only include interactions from the last two weeks in my bandit calculations.
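
One way this could look, as a sketch only: keep click statistics per (cohort, content type) pair and drop everything older than 14 days before computing a rate. The class, the cohort label `"registered_male"` and the assumption that events arrive in time order are mine, purely for illustration; in practice you would combine `best_arm` with some exploration step.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

WINDOW = timedelta(days=14)  # "last two weeks"


class WindowedCohortStats:
    """Per-cohort click statistics over a sliding time window (illustrative only)."""

    def __init__(self, window=WINDOW):
        self.window = window
        # (cohort, arm) -> deque of (timestamp, clicked), oldest first
        self.events = defaultdict(deque)

    def record(self, cohort, arm, clicked, when):
        # Assumes events are recorded in chronological order.
        self.events[(cohort, arm)].append((when, bool(clicked)))

    def click_rate(self, cohort, arm, now):
        events = self.events[(cohort, arm)]
        # Forget interactions that fell out of the window.
        while events and now - events[0][0] > self.window:
            events.popleft()
        if not events:
            return 0.0
        return sum(clicked for _, clicked in events) / len(events)

    def best_arm(self, cohort, arms, now):
        # Pure exploitation step: best recent click rate within this cohort.
        return max(arms, key=lambda arm: self.click_rate(cohort, arm, now))


now = datetime(2024, 1, 31)
stats = WindowedCohortStats()
stats.record("registered_male", "videos", True, now - timedelta(days=30))  # too old
stats.record("registered_male", "images", True, now - timedelta(days=1))
stats.record("registered_male", "images", False, now - timedelta(days=1))
best = stats.best_arm("registered_male", ["videos", "images", "articles"], now)
```

The old video click no longer counts, so the recent image interactions decide the recommendation for that cohort.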

