
BERT, from a year earlier, also makes the list at https://en.wikipedia.org/wiki/Large_language_model#List — but I think that's what the (2023) is supposed to indicate: outside of the few initial models from years ago, >= 7B parameters is the typical expectation for the term (and it actually lines up with that table extremely well).

At the same time, if you're off by less than an order of magnitude (where GPT-2 would fall if it were released today), I don't think anyone will harp on the 7B figure. Gotta leave a bit of fuzzy interpretation for the real world: no single number is going to please everyone in all cases, but some number in the ballpark is still useful to discuss.
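The "order of magnitude" point above can be sketched numerically. A minimal check, using rough parameter counts (GPT-2 at ~1.5B against the ~7B threshold discussed here):

```python
import math

def within_order_of_magnitude(a: float, b: float) -> bool:
    """True if a and b differ by less than a factor of 10."""
    return abs(math.log10(a / b)) < 1

# GPT-2 (~1.5B params) vs. the ~7B cutoff: a factor of ~4.7,
# so well within one order of magnitude.
print(within_order_of_magnitude(1.5e9, 7e9))
```

By this measure GPT-2 sits close enough to the cutoff that the label is defensible, whereas a 100M-parameter model would not be.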


