I could be misreading it, but I believe the premise is "more traction => more action".
If I'm right, the idea is that the algorithm dynamically weights the "display frequency" of the two (or n) options. As one of the A/B options shows itself to be more successful, it's shown more frequently. Because the test is self-correcting, you as an A/B test runner don't have to decide when the results are significant enough; the program itself will automatically converge on the more successful option.
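To make that concrete, here's a minimal sketch of one such bandit strategy, epsilon-greedy. This is just one possible implementation of the "more traction => more action" idea, not necessarily what the original poster had in mind; the class and option names are made up for illustration.

```python
import random

class EpsilonGreedyBandit:
    """Minimal epsilon-greedy bandit: mostly shows the best-performing
    option so far, but explores the other options a small fraction of
    the time so their estimates keep improving."""

    def __init__(self, options, epsilon=0.1):
        self.epsilon = epsilon                      # fraction of traffic used to explore
        self.shows = {o: 0 for o in options}        # times each option was displayed
        self.successes = {o: 0 for o in options}    # conversions per option

    def choose(self):
        # Explore with probability epsilon, otherwise exploit the current leader.
        if random.random() < self.epsilon:
            option = random.choice(list(self.shows))
        else:
            option = max(self.shows, key=self._rate)
        self.shows[option] += 1
        return option

    def record_success(self, option):
        self.successes[option] += 1

    def _rate(self, option):
        # Unshown options get an optimistic rate so each is tried at least once.
        if self.shows[option] == 0:
            return 1.0
        return self.successes[option] / self.shows[option]
```

In a simulated test where option "B" truly converts better than "A", the bandit ends up displaying "B" far more often, without anyone manually deciding when the result is significant.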
The difference in performance between evolutionary approaches and bandit algorithms is complex and problem-dependent. There is no blanket "one algorithm is better than the other".
I've used both approaches for a different application, and neither dominates the other.
That looks like it uses genetic algorithms, which are far less sample-efficient and more exploratory than bandit algorithms. What noelwelsh is proposing would lead to better results much more quickly, and without random permutations of elements.