
> The model requires ~264GB of RAM

This feels as crazy as Grok. Was there a generation of models recently where we decided to just crank up the parameter count?
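
As a rough sanity check (the exact figure depends on architecture and quantization, and the 132B parameter count below is an assumption on my part, not something stated in the post), the RAM estimate is roughly parameter count times bytes per parameter:

  # back-of-envelope memory estimate; assumed numbers, not from the post
  params = 132e9          # assumed total parameter count
  bytes_per_param = 2     # fp16/bf16 weights
  ram_gb = params * bytes_per_param / 1e9
  print(f"{ram_gb:.0f} GB")  # ~264 GB, ignoring KV cache and activation overhead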




Cranking up the parameter count is literally how the current LLM craze got started. Hence the "large" in "large language model".


If you read their blog post, they mention it was pretrained on 12 trillion tokens of text. That is roughly 6x the 2 trillion tokens used for the Llama 2 training runs.

From that, it seems likely we've hit a wall on improving LLMs below a given parameter count simply by scaling up the training data, which basically forces everyone to keep scaling up model size if they want to keep up with SOTA.
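
One way to quantify that intuition is the Chinchilla-style heuristic of roughly 20 training tokens per parameter for compute-optimal training (a rough approximation, and not necessarily what the post's authors targeted):

  # illustrative Chinchilla-style estimate; a heuristic, not from the linked post
  tokens = 12e12           # pretraining tokens cited in the blog post
  tokens_per_param = 20    # rough compute-optimal ratio from the Chinchilla paper
  print(f"~{tokens / tokens_per_param / 1e9:.0f}B params")  # ~600B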


Not recently. GPT-3 from 2020 required even more RAM; so did the open-source BLOOM from 2022.

In my view, the main value of larger models is distillation (we see this, for instance, in how Claude Haiku matches release-day GPT-4 at less than a tenth of the cost). Hopefully the distilled models will be easier to run.
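
For anyone unfamiliar with the term, distillation here means training a smaller student model to imitate a larger teacher's output distribution. A minimal sketch of the standard soft-label loss in PyTorch (the generic technique, not a claim about how any particular lab trains its smaller models):

  import torch.nn.functional as F

  def distillation_loss(student_logits, teacher_logits, T=2.0):
      # soften both distributions with temperature T, then match the student
      # to the teacher with KL divergence; the T*T factor keeps gradient
      # magnitudes comparable across temperatures
      p_teacher = F.softmax(teacher_logits / T, dim=-1)
      log_p_student = F.log_softmax(student_logits / T, dim=-1)
      return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * T * T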


Isn’t that pretty much the last 12 months?



