
It also means the largest models can be scaled up significantly with the same inference budget.
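A rough way to see why: if weights drop from fp16 (16 bits) to ternary (~1.58 bits, as in the BitNet line of work), a fixed weight-memory budget holds roughly 10x the parameters. A back-of-envelope sketch, with illustrative numbers (the 24 GB budget is just an example, not a benchmark):

```python
# Sketch: parameters that fit in a fixed weight-memory budget
# at fp16 vs ~1.58-bit ternary weights. Illustrative figures only;
# ignores activations, KV cache, and runtime overhead.
budget_gb = 24  # hypothetical inference memory budget

fp16_params = budget_gb * 1e9 * 8 / 16        # 16 bits per weight
ternary_params = budget_gb * 1e9 * 8 / 1.58   # ~1.58 bits per weight

print(f"fp16:    {fp16_params / 1e9:.1f}B params")
print(f"ternary: {ternary_params / 1e9:.1f}B params")
```

Same budget, ~10x the parameter count — which is where the "scale up the largest models at the same inference cost" claim comes from.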



Depends. The only paper they cite on training (https://arxiv.org/pdf/2310.11453.pdf) doesn't improve training costs much, and most models are already training-constrained. Not everyone has $200m to throw at training another model from scratch.


Is there any scope for indie builders?


Not really. These techniques slightly reduce memory during pre-training and fine-tuning, but not enough to make a 4090 usable even for a 7B model.
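To put numbers on that: with a standard fp16 + Adam setup, weight, gradient, and optimizer-state memory alone dwarfs a 4090's 24 GB. A rough estimate (assumed byte counts per parameter; activations and framework overhead ignored, which only make it worse):

```python
# Back-of-envelope: memory just for weights, gradients, and Adam
# optimizer state when fine-tuning a 7B model. Assumed layout:
# fp16 weights + fp16 grads + fp32 Adam momentum and variance.
params = 7e9
bytes_per_param = (
    2    # fp16 weights
    + 2  # fp16 gradients
    + 8  # Adam state: fp32 momentum + fp32 variance
)
total_gb = params * bytes_per_param / 1e9
print(f"~{total_gb:.0f} GB needed vs 24 GB on an RTX 4090")
```

Even halving a term or two via quantized training leaves you far above 24 GB, which is why this doesn't open the door for single consumer-GPU fine-tuning of 7B models.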



