I don't think this flaw is real.

If "whatever design was most popular at that time" has a billion trials and 90% successes, and "a new superior design" has 100 trials and 97% successes (97 out of 100), then the new design is favored by the algorithm, because it compares success rates rather than raw counts. No need to "catch up" to the absolute number of successes.
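To make that concrete, here is a minimal sketch of the greedy (exploit) step, assuming the algorithm picks the arm with the highest observed success rate. The counts come from the example above; the names `arms`, `success_rate` and `greedy_choice` are made up for illustration and are not from the original article.

```python
# Hypothetical counts matching the example: a billion trials at 90%
# versus 100 trials at 97%.
arms = {
    "popular": {"trials": 1_000_000_000, "successes": 900_000_000},
    "new": {"trials": 100, "successes": 97},
}


def success_rate(arm):
    """Observed successes divided by trials for one arm."""
    return arm["successes"] / arm["trials"]


def greedy_choice(arms):
    """Return the name of the arm with the highest observed success rate."""
    return max(arms, key=lambda name: success_rate(arms[name]))


print(greedy_choice(arms))  # new
```

The absolute number of successes never enters the comparison, so the new design wins on its rate alone.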

Exactly what I was thinking. What are we missing?

What was meant was this: suppose there is a period of time when you get a lot of visits, lots of clicks, and an abnormally high CTR. This could happen due to external factors, for example if you make the front page of HN. Over time this effect will vanish, but for a long time you will be stuck with an inflated CTR estimate for whichever design was in place when it happened.
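A rough back-of-the-envelope sketch of how long "a long time" can be, assuming the estimate is a plain cumulative average (clicks / views) with no decay or windowing. All numbers are invented for illustration: a spike of 10,000 views at 10% CTR, a true CTR of 2% afterwards, and a rival design whose CTR is 3%. The function name is hypothetical.

```python
from fractions import Fraction


def views_until_estimate_drops(spike_views, spike_ctr, true_ctr, rival_ctr):
    """Post-spike views needed before the cumulative CTR falls to rival_ctr.

    Solves (spike_clicks + true_ctr * n) / (spike_views + n) = rival_ctr
    for n, assuming post-spike traffic converts at exactly true_ctr and
    that true_ctr < rival_ctr < spike_ctr.
    """
    spike_clicks = spike_views * spike_ctr
    excess_clicks = spike_clicks - rival_ctr * spike_views
    return excess_clicks / (rival_ctr - true_ctr)


# Fractions keep the arithmetic exact.
n = views_until_estimate_drops(
    spike_views=10_000,
    spike_ctr=Fraction(10, 100),
    true_ctr=Fraction(2, 100),
    rival_ctr=Fraction(3, 100),
)
print(int(n))  # 70000
```

Under these assumptions the spiked design needs seven times the spike's traffic at its normal rate before the better design overtakes it in the estimates.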

