I've seen minor tweaks to a form raise conversions by 20%. The person running a given test may not know that such a change was significant, or even that it happened. The change could be as subtle as another running test deciding that version X is better, which then interferes with existing tests.
As for #5, when you run into these situations with an A/B test, you're able to break down and crunch the numbers in multiple ways, and then have a discussion about how you want to proceed. But with a multi-armed bandit approach, whatever complexities you have not thought of and baked into your approach are not going to be noticed.
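
To make "break down and crunch the numbers" concrete, here is a rough sketch of the kind of after-the-fact breakdown I mean. The counts are made up purely for illustration, and the names (`ROWS`, `conversion_rates`, the desktop/mobile segments) are hypothetical, not from any real tool.

```python
from collections import defaultdict

# Hypothetical, made-up A/B test results for illustration only:
# (variant, segment, visitors, conversions)
ROWS = [
    ("A", "desktop", 1000, 100),
    ("A", "mobile", 1000, 20),
    ("B", "desktop", 1000, 80),
    ("B", "mobile", 1000, 60),
]


def conversion_rates(rows, key):
    """Sum visitors and conversions per group, then return the rate per group.

    `key` maps (variant, segment) to whatever grouping you want to look at.
    """
    totals = defaultdict(lambda: [0, 0])  # group -> [visitors, conversions]
    for variant, segment, visitors, conversions in rows:
        group = key(variant, segment)
        totals[group][0] += visitors
        totals[group][1] += conversions
    return {group: conv / vis for group, (vis, conv) in totals.items()}


# The single number a simple bandit would optimize: one rate per variant.
overall = conversion_rates(ROWS, lambda variant, segment: variant)

# The same data, sliced one extra way after the fact.
by_segment = conversion_rates(ROWS, lambda variant, segment: (variant, segment))

for group, rate in sorted(overall.items()):
    print(group, round(rate, 3))
for group, rate in sorted(by_segment.items()):
    print(group, round(rate, 3))
```

With these invented counts, B wins overall but loses on desktop. With an A/B test you still have all the data to find that and argue about it. A bandit that only looks at the overall rate would just keep shifting traffic to B, unless someone had thought to build the segment into it up front.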