Absolutely. That's my biggest objection to bandit methods, but it's also the fuzziest objection, and the one least likely to appeal to hyper-analytical people. There's a strong temptation (as we can see from this article) is to treat bandit optimization as a black box that just spits out an infallible answer (i.e. as a Lazy Button).
It's the same human tendency that has led to "you should follow me on twitter" to be one of the more common n-grams on the interwebs (even though it probably never worked for more than Dustin Curtis, and likely causes a counter-intuitive backlash now).
Of course, these algorithms are cool in certain contexts where you never want to think about it again. For example, ad placement for a giant ad network running against a lot of UGC content. They remind me of neural networks like that: the post office doesn't care how the machine recognizes hand-written zip codes, just as long as they do it reliably.
But startups are all about learning, and interface design is where you have direct contact with users. I want to soak in the data. I want to hang out in people's living rooms when they're using my site. A fancy Lazy Button (great phrase) is the last thing I need.
The A/B test just gives you the conversion rate of each option. And so does the bandit. As I understand, the only difference is that the bandit will be bugging your users with bad results less often.
Of course, the real answer to why not is "we have an A/B system and I'm not going to add bandit stuff for no benefit". But even if I were doing it from scratch, it seems more complex. The benefit of these approaches seems to be that one no longer has to think. We want to think.
No, because the assumptions that underpin many statistical techniques are violated when you're not assigning people to cohorts consistently, and at random.
The standard confidence tests -- t-tests, G-tests, chi-squared tests, etc. -- based on distributions of independent, identically distributed (iid) data.
I'd have to think about it more, but I believe that btilly's examples are also the most intuitive reasons why independence matters. If your data is time-dependent, then assigning users to cohorts based on past performance lets the time dependency dominate. There may be other good examples.