
they can also use more layers without increasing model size.

Additionally, in section 4.9 they compare more layers and find "The difference between 12-layer and 24-layer ALBERT-xxlarge configurations in terms of downstream accuracy is negligible, with the Avg score being the same. We conclude that, when sharing all cross-layer parameters (ALBERT-style), there is no need for models deeper than a 12-layer configuration."
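
To make the "more layers without increasing model size" point concrete, here is a rough back-of-the-envelope sketch. It is not taken from the paper: the function names are made up, the per-layer formula is a simplified count for a generic Transformer encoder layer (embeddings are ignored), and the dimensions in the demo are only illustrative. The point is just that when every layer reuses one set of weights, the encoder's parameter count does not depend on depth.

```python
def encoder_layer_params(hidden: int, intermediate: int) -> int:
    """Rough parameter count for one generic Transformer encoder layer.

    Assumption: standard self-attention plus a two-layer feed-forward block,
    counting weights and biases. This is a simplification, not the paper's
    exact accounting.
    """
    # Q, K, V, and output projections: four hidden x hidden matrices plus biases.
    attention = 4 * (hidden * hidden + hidden)
    # Feed-forward block: hidden -> intermediate -> hidden, with biases.
    feed_forward = (hidden * intermediate + intermediate) + (intermediate * hidden + hidden)
    # Two LayerNorms, each with a scale and a shift vector.
    layer_norms = 2 * (2 * hidden)
    return attention + feed_forward + layer_norms


def encoder_stack_params(num_layers: int, hidden: int, intermediate: int,
                         share_layers: bool) -> int:
    """Parameter count for the whole encoder stack (hypothetical helper).

    With cross-layer sharing, all layers reuse one set of weights, so depth
    does not change the count. Without sharing, the count grows linearly.
    """
    per_layer = encoder_layer_params(hidden, intermediate)
    return per_layer if share_layers else per_layer * num_layers


if __name__ == "__main__":
    # Illustrative dimensions only.
    hidden, intermediate = 4096, 16384
    for depth in (12, 24):
        shared = encoder_stack_params(depth, hidden, intermediate, share_layers=True)
        unshared = encoder_stack_params(depth, hidden, intermediate, share_layers=False)
        print(f"{depth} layers: shared={shared:,} unshared={unshared:,}")
```

Note that sharing only keeps the parameter count flat. A 24-layer shared model still runs twice as many layer computations as a 12-layer one, which is why the "no accuracy gain past 12 layers" result matters.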



