In their results, once they compare a MAB with a 50% exploration rate to their split test, the two take a comparable amount of time to converge. They also only show the results of a single simulation, which has a lot of randomness in it. Given we're all stats nerds, it would've been handy to see a box plot for each style of simulation across multiple runs.
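For what it's worth, here is a rough sketch of the kind of multi-run comparison I mean. Everything in it is my own assumption rather than anything from the article: the conversion rates (4% vs 5%), the visitor and run counts, the function names, treating the split test as an epsilon-greedy bandit with epsilon = 1.0 (pure random assignment), and using "share of traffic sent to the better arm" as a stand-in for convergence. It prints the five numbers a box plot would draw for each strategy.

```python
import random
import statistics


def run_epsilon_greedy(rates, epsilon, n_visitors, rng):
    """Simulate one run; return the share of visitors shown the truly best arm."""
    shows = [0] * len(rates)
    wins = [0] * len(rates)
    best = max(range(len(rates)), key=lambda i: rates[i])

    def observed_rate(i):
        # Untried arms rank first so every arm gets shown at least once.
        return wins[i] / shows[i] if shows[i] else float("inf")

    for _ in range(n_visitors):
        if rng.random() < epsilon:
            arm = rng.randrange(len(rates))  # explore: pick an arm at random
        else:
            arm = max(range(len(rates)), key=observed_rate)  # exploit
        shows[arm] += 1
        if rng.random() < rates[arm]:
            wins[arm] += 1
    return shows[best] / n_visitors


def five_number_summary(values):
    """Min, quartiles and max: the numbers a box plot would display."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, median, q3, max(values)


def compare(rates=(0.04, 0.05), n_visitors=2000, n_runs=200, seed=0):
    """Repeat each strategy n_runs times and summarise the spread of outcomes."""
    rng = random.Random(seed)
    strategies = (
        ("eps=0.1", 0.1),
        ("eps=0.5", 0.5),
        ("split test (eps=1.0)", 1.0),  # assumption: uniform random assignment
    )
    results = {}
    for label, eps in strategies:
        shares = [
            run_epsilon_greedy(rates, eps, n_visitors, rng) for _ in range(n_runs)
        ]
        results[label] = five_number_summary(shares)
    return results


if __name__ == "__main__":
    for label, summary in compare().items():
        print(label, [round(x, 3) for x in summary])
```

The spread between the min and max for each strategy is the interesting part: a single run can land anywhere in that range, which is why one simulation on its own doesn't tell you much.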
Having said all of that, my biggest concern with MAB as it's being sold is the lack of thought around the experiments. In the end it's just data, and it doesn't mean anything without human intuition and preconceptions guiding it. Examples: day of week, time of year or time of day, a new page people are still getting used to, an old one that garners lots of repeat traffic, etc. When running tests there's a lot more to consider than just "whatever the data says".