
It also means the largest models can be scaled up significantly with the same inference budget.
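A rough way to see why: if weights drop from fp16 (16 bits) to ternary (~1.58 bits, as in the BitNet line of work), a fixed weight-memory budget holds roughly 10x the parameters. A back-of-envelope sketch, with illustrative numbers (the 24 GB budget is just an example, not a benchmark):

```python
# Sketch: parameters that fit in a fixed weight-memory budget
# at fp16 vs ~1.58-bit ternary weights. Illustrative figures only;
# ignores activations, KV cache, and runtime overhead.
budget_gb = 24  # hypothetical inference memory budget

fp16_params = budget_gb * 1e9 * 8 / 16        # 16 bits per weight
ternary_params = budget_gb * 1e9 * 8 / 1.58   # ~1.58 bits per weight

print(f"fp16:    {fp16_params / 1e9:.1f}B params")
print(f"ternary: {ternary_params / 1e9:.1f}B params")
```

Same budget, ~10x the parameter count — which is where the "scale up the largest models at the same inference cost" claim comes from.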



Depends. The only paper they cite on training (https://arxiv.org/pdf/2310.11453.pdf) doesn't improve training costs much, and most models are already training-constrained. Not everyone has $200m to throw at training another model from scratch.


Is there any scope for indie builders?


Not really. These techniques slightly reduce memory during pre-training and fine-tuning, but not enough to make a 4090 usable even for a 7B model.
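To put numbers on that: with a standard fp16 + Adam setup, weight, gradient, and optimizer-state memory alone dwarfs a 4090's 24 GB. A rough estimate (assumed byte counts per parameter; activations and framework overhead ignored, which only make it worse):

```python
# Back-of-envelope: memory just for weights, gradients, and Adam
# optimizer state when fine-tuning a 7B model. Assumed layout:
# fp16 weights + fp16 grads + fp32 Adam momentum and variance.
params = 7e9
bytes_per_param = (
    2    # fp16 weights
    + 2  # fp16 gradients
    + 8  # Adam state: fp32 momentum + fp32 variance
)
total_gb = params * bytes_per_param / 1e9
print(f"~{total_gb:.0f} GB needed vs 24 GB on an RTX 4090")
```

Even halving a term or two via quantized training leaves you far above 24 GB, which is why this doesn't open the door for single consumer-GPU fine-tuning of 7B models.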



