
Is GPT-2's architecture any different from GPT-3's?


Not hugely, but yes. I tend to think of GPT as a style of architecture with consistent themes and major features, but varying minor features and implementation details. Off the top of my head, I believe the most important difference is that GPT-3 alternates global and local attention while GPT-2 is all global attention.
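Concretely, the difference shows up in the attention mask each layer uses. Here's a toy NumPy sketch of the two mask types (sequence length, window size, and names are just illustrative, not GPT-3's actual values):

    import numpy as np

    def global_causal_mask(seq_len):
        # Global (dense) attention: every position can attend to
        # all earlier positions and itself.
        return np.tril(np.ones((seq_len, seq_len), dtype=bool))

    def local_causal_mask(seq_len, window):
        # Local attention: every position only attends to the most
        # recent `window` positions (including itself).
        mask = np.tril(np.ones((seq_len, seq_len), dtype=bool))
        mask &= ~np.tril(np.ones((seq_len, seq_len), dtype=bool), k=-window)
        return mask

    # GPT-2-style stack: global attention in every layer.
    # GPT-3-style stack: alternate global and local layers.
    n_layers, seq_len, window = 12, 1024, 256
    layer_masks = [
        global_causal_mask(seq_len) if i % 2 == 0
        else local_causal_mask(seq_len, window)
        for i in range(n_layers)
    ]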

The two published GPT-Neo models follow GPT-3's lead, but the repo lets the user pick whether to use global or local attention layers.
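If I remember the repo's config format right (treat the exact field name and shape below as approximate), the per-layer choice is written as repeated patterns that expand to one attention type per layer, e.g.:

    # GPT-Neo-style spec: [pattern, repeat_count] pairs.
    # The published models alternate global and local layers; using
    # ["global"] as the pattern would give a GPT-2-style stack instead.
    attention_types = [[["global", "local"], 12]]  # -> 24 layers

    def expand_attention_types(spec):
        # Flatten [[pattern, n], ...] into one attention type per layer.
        layers = []
        for pattern, n in spec:
            layers.extend(pattern * n)
        return layers

    print(expand_attention_types(attention_types))
    # ['global', 'local', 'global', 'local', ...]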



