Ah, good question. I think I have seen that statement from a few other people, but the limit is somewhat arbitrary. And of course it will keep rising over time, which is why I put the year.
I think the limit should also not be much lower. We already have language models three orders of magnitude larger (>1T params), and we also call them "large", so in this context all those single-digit-billion-parameter models feel quite small.
Similarly, when is a network "deep"? It used to mean more than 2 or 3 layers, and then there was a definition of "very deep" for more than 10 layers (I think Schmidhuber introduced that definition many years ago: https://arxiv.org/abs/1404.7828). Obviously, that's totally outdated now. Networks are routinely very deep; large language models often have 96 layers, for example.
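To make the "96 layers" point concrete, here is a quick back-of-envelope parameter count (just a rough sketch: the ~12·L·d² approximation ignores embeddings and biases, and the 96 layers / d_model = 12288 are the published GPT-3 figures, not something from this thread):

```python
# Rough transformer parameter count: each layer has ~4*d^2 attention
# weights (Q, K, V, output projection) plus ~8*d^2 MLP weights (two
# d <-> 4d projections), i.e. ~12*d^2 per layer. Embeddings, biases,
# and layer norms are ignored here.
def approx_params(num_layers: int, d_model: int) -> int:
    return 12 * num_layers * d_model**2

# GPT-3: 96 layers, d_model = 12288 (Brown et al., 2020)
print(f"{approx_params(96, 12288) / 1e9:.0f}B")  # ~174B, close to the quoted 175B
```

So even at "only" 96 layers, the width pushes such models into the hundreds of billions of parameters, which is why the single-digit-billion models look small by comparison.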