In section 4.8 they compare accuracy at the same training time for the largest configuration of each model (BERT-large vs. ALBERT-xxlarge) and show that ALBERT is substantially better.