So, couldn't it be that the authors of the paper ran the model with a random tie break and got lucky? This blog post seems to assume they had the "rand" flag deactivated. Please correct me if I am wrong.
From what I understand of the post, getting lucky enough to see a change that big in this situation would be like flipping 1000 heads in a row. It's not luck you could ever expect to get.
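To put a rough number on the coin-flip analogy: assuming a fair coin and independent flips (my assumption, the post's actual odds may differ), the chance of 1000 heads in a row is 0.5^1000. A quick sketch, where `prob_all_heads` is just a helper name I made up for illustration:

```python
import math

def prob_all_heads(n: int) -> float:
    """Probability that n independent flips of a fair coin all land heads."""
    return 0.5 ** n

# Assumption: fair coin, independent flips; this only illustrates the analogy,
# it is not the actual probability from the paper or the blog post.
p = prob_all_heads(1000)

# The same value on a log scale, which is easier to read for tiny numbers.
log10_p = 1000 * math.log10(0.5)

print(f"p = {p:.3e}")
print(f"log10(p) = {log10_p:.2f}")
```

That works out to a number with about 301 zeros after the decimal point, so rerunning the experiment a few more times would not make it plausible.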