Well it's not Popperian, so it is controversial in that regard.
I'm actually completely fine with the first part of your hypothesis ("With the following 'arms' for the bandit"), since an n-armed bandit experiment is just shorthand for n-1 "this arm vs. everything else" experiments.
However I'm a bit confused/concerned by the procedure for developing more arms. What kind of situation are you imagining where that's useful? (Like I'd really want to hear a concrete example of this style of hypothesis).
If you find that two actions are both fairly good in a given state, the system might try automatically generating a new action that's some synthesis of the two, either an average or some kind of genetic algorithm crossover.
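To make that concrete, here's a minimal sketch of what "synthesizing" a new arm could look like, assuming arms are just numeric parameter vectors (all names here are illustrative, not a real system):

```python
import random

def average_arm(a, b):
    """Candidate new arm: component-wise mean of two good arms."""
    return [(x + y) / 2 for x, y in zip(a, b)]

def crossover_arm(a, b, rng=random):
    """Candidate new arm: GA-style crossover, each component
    taken from one parent at random."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]

# Two arms that both performed well in some state:
best_two = [[2, 8, 5], [4, 6, 7]]
candidate = average_arm(*best_two)  # [3.0, 7.0, 6.0]
```

The synthesized candidate then enters the bandit like any other arm and either earns its keep or gets explored away.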
I don't know if you can have a useful falsifiable hypothesis about individual arms. But you can about the entire system. The hypothesis you want to fail to falsify is that the system gradually improves its performance (measured by some reward function) over time.
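As a toy illustration of that system-level hypothesis: the testable claim is about the reward trend, not any single arm. Here's a deliberately simplified bandit with deterministic rewards (real runs would be noisy and need a statistical test on the trend); everything about it is a sketch, not a protocol:

```python
def run_bandit(true_means, steps):
    """Explore each arm once, then exploit the best estimate."""
    n = len(true_means)
    counts = [0] * n
    est = [0.0] * n
    rewards = []
    for t in range(steps):
        arm = t % n if t < n else max(range(n), key=lambda i: est[i])
        r = true_means[arm]  # deterministic stand-in for a noisy reward
        counts[arm] += 1
        est[arm] += (r - est[arm]) / counts[arm]  # running mean update
        rewards.append(r)
    return rewards

rewards = run_bandit([0.2, 0.5, 0.9], 30)
# The falsifiable claim: average reward in the later half is higher.
assert sum(rewards[15:]) / 15 > sum(rewards[:15]) / 15
```

If the trend flattens or reverses under honest measurement, the "system gradually improves" hypothesis is falsified.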
I think we need to be able to fund some things that are a bit more exploratory. Imagine we're going to test how a growing culture of a particular bacterium responds to a hundred chemicals. We can only test, say, 10 at a time for cost or equipment reasons. We know from previous science that the chemicals are related in some way, because organic chemicals have various structural relationships. Suppose that, as we m.a.b.-explore across our set of 100 possible chemicals, we find after testing only 30 that things with benzene rings seem to have whatever effect we're looking for. We started with a list of chemicals that each have at most one such ring. The experimenter would like to try something with two rings.
The proposal is that there ought to be a lower-weight process for going to some committee or funding board or something and saying "According to these statistics, we believe we should try two benzene rings" and getting fast approval. You follow the multi-armed bandit process after that. Suppose for the sake of argument it actually fails miserably, so you stop trying that quickly in accordance with the process. You tried something really quickly, didn't need new funding, and didn't have to wait a year to run a new study. Maybe next week another grad student has another idea about what the real effect may be, and they run something with one more acetyl group than anything else, and it has 25 times the effect anything else has had up to this point, and in accordance with the multi-armed bandit approach, you begin exploring down that route.
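Sketching that fast-approval loop in code, under the assumption that a newly approved arm gets a small fixed trial budget and is retired if it can't beat the incumbent (the function name, budget, and numbers are all illustrative, not a real trial protocol):

```python
def trial_new_arm(current_best_mean, new_arm_rewards, budget=5):
    """Pull the newly approved arm `budget` times; keep it only if
    its estimated mean beats the incumbent's."""
    pulls = new_arm_rewards[:budget]
    est = sum(pulls) / len(pulls)
    return est > current_best_mean, est

# The two-benzene-ring candidate fails fast: retired after 5 pulls,
# no new grant cycle needed.
keep, est = trial_new_arm(0.9, [0.1, 0.2, 0.1, 0.15, 0.1])
assert not keep

# Next week's extra-acetyl-group idea wins big, so exploration
# redirects down that route per the usual m.a.b. machinery.
keep, est = trial_new_arm(0.9, [2.5, 2.4, 2.6, 2.5, 2.5])
assert keep
```

The point is that both outcomes are cheap: the failure costs five pulls, and the success is discovered within the running experiment rather than a year later.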
By no means am I proposing that the experimenter just gets to do whatever change they want at whatever point they want. There is a lot of virtue in calling your procedures in advance.
But, we're programmers. We, of all people, should be aware that it is possible to define a "process" that does not consist of a straight set of instructions. Our processes are allowed to contain loops, conditionals, mathematics, etc. Science should be too. The process could be to go to a committee; the process could say "we reserve the right to have 5 free new thing trials in advance"; the process could be all sorts of things.
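For instance, the "5 free new thing trials" clause is itself just a conditional plus a budget, pre-registered like any other procedure (names and the budget of 5 are illustrative assumptions):

```python
class PreRegisteredProcess:
    """A called-in-advance procedure that contains a conditional,
    not just a straight list of steps."""

    def __init__(self, free_trials=5):
        # "We reserve the right to 5 free new-thing trials" — fixed
        # in the pre-registration, before any data is seen.
        self.free_trials = free_trials

    def request_new_arm(self):
        """Fast-path approval while the pre-approved budget lasts."""
        if self.free_trials > 0:
            self.free_trials -= 1
            return True
        return False  # budget spent: back to the committee

process = PreRegisteredProcess()
approvals = [process.request_new_arm() for _ in range(7)]
# first five approved under the pre-registration, the rest refused
```

Nothing about this is less rigorous than a straight-line protocol; the branching logic was called in advance too.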
And we are not obligated to be stupid about the process. We do not need to blindly run m.a.b. trials on human medical trials or anything obviously stupid like that. And there would still be a place for old-fashioned, call-your-shot "We're going to do this exact thing and report what happens". It would just be a lot smaller than it is today.
In this example, after testing 40 out of 100 chemicals, perhaps 10 of which weren't on the initial plan, we found a significant improvement. In the current system, the study plods on through testing all 100 chemicals, regardless of the information obtained partway through, and publishes a paper with those results and a proposal that, someday, maybe somebody ought to try the additional benzene ring, which then never gets funded because the initial study found nothing significant and nobody sees a reason to follow this line of inquiry.
What happens in reality is that if it's cheap, the professor might just try this on their own, but it's unsanctioned and takes away from what they are actually funded to do. We're depending on them to be dedicated enough to the cause to do this experiment despite the fact that literally all the concrete incentives are aligned against them. And if it's expensive, it doesn't get done at all; no funding. Scientists need more leeway to explore, but with rigorous procedures for exploring.
I'd actually want to go back and carefully read Popper's original writing before I declared it's not "Popperian". It may not be; I don't know. But I would think it's distinctly possible that what we consider Popperian is actually a bastardization of his original points into this oversimplified view of science. This being an HN posting and not an academic debate, I don't have the time to properly dig into that right now.
Science has calcified around some procedures that are simply not mathematically justifiable. In a more scientific world, scientists would read that (and the justification behind it) and rush to fix their underlying foundational procedures before anything else. The fact that scientists as a whole seem rather blasé about the whole problem significantly degrades my respect for them.
(And since they seem nearly equally blasé as a whole about the reproducibility crisis, my respect isn't running all that high to begin with. "But jerf, there are many people talking about it and dealing with it." Yes, but that's not the correct response. The correct response is to make fixing it your number one priority. When you know you have a reproducibility problem, but allocate <1% of your resources to fixing it while pouring the rest into something that, by your own methodology, you've just established is a hole in the ground, you're not behaving sanely.)