I've never heard that before. I agree that the "language model" part has an accepted definition. I'd call e.g. GPT-2 an LLM and don't think anyone would bat an eye.
BERT from a year prior also makes the list at https://en.wikipedia.org/wiki/Large_language_model#List, but I think that's what the "(2023)" is supposed to represent: outside the few initial models from years ago, >= 7B parameters is the typical expectation for the term (it actually lines up with that table extremely well).
At the same time, if you're off by less than an order of magnitude (where GPT-2 would fall if released today), I don't think anyone will be harping on 7B. Gotta leave a bit of fuzzy interpretation for the real world, as no single number is going to please everyone in all cases, but some number in the ballpark is still useful to discuss.
Ah, good question. I think I have read that statement from some other people. But the limit is kind of arbitrary. And of course, this limit will keep rising over time; that's why I put the year.
I think the limit should also not be much lower. We already have language models three orders of magnitude larger (>1T params), and we also call them "large", so in this context, all those single-digit-billion-parameter models feel quite small.
Similarly, when is a network "deep"? It used to mean more than 2 or 3 layers. And then there was a definition for "very deep", starting at more than 10 layers (I think Schmidhuber introduced that definition many years ago, https://arxiv.org/abs/1404.7828). Obviously, that's totally outdated now. Networks are often very deep; e.g., those large language models often have 96 layers.