Is there a speed-up? In their paper in table 3, once you compare each ALBERT mod... | Hacker News


zuzun on Jan 8, 2020 | on: ALBERT: A Lite BERT for Self-Supervised Learning o...

Is there a speed-up? In Table 3 of their paper, once you compare each ALBERT model with the smaller BERT model, you're looking at similar accuracies and longer training times.

nl on Jan 9, 2020

They are comparing the time it takes to train for 125K steps, not the time to reach a given accuracy.

In Section 4.8 they compare accuracy after the same amount of training time for the largest model of each type, and show that ALBERT is substantially better.
