
Can you clarify what you mean?



Because the training data / model size / compute trade-off derived from that paper is highly suboptimal (too many parameters) compared to the one from the later DeepMind scaling laws [1]. And then Meta researchers recommended using even smaller models, to trade off training-time against inference-time compute [2] (which I thought was pretty obvious if you care about more than just benchmarks); rough arithmetic sketch below the references.

[1] https://arxiv.org/abs/2203.15556 Training Compute-Optimal Large Language Models

[2] https://arxiv.org/abs/2302.13971 LLaMA: Open and Efficient Foundation Language Models
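To make the training- vs inference-time trade-off concrete, here is a minimal sketch assuming the usual approximations (training FLOPs ~ 6*N*D, inference FLOPs ~ 2*N per generated token). All parameter counts, token counts, and the served-token figure are illustrative assumptions, not numbers from the papers or this thread.

    # Rough arithmetic behind the training- vs inference-compute trade-off.
    # Assumptions: training FLOPs ~= 6*N*D, inference FLOPs ~= 2*N per
    # generated token; all counts below are illustrative.

    def train_flops(params: float, train_tokens: float) -> float:
        # ~6 FLOPs per parameter per training token (forward + backward)
        return 6 * params * train_tokens

    def infer_flops(params: float, served_tokens: float) -> float:
        # ~2 FLOPs per parameter per generated token (forward only)
        return 2 * params * served_tokens

    BUDGET = train_flops(70e9, 1.4e12)  # fix the training budget (Chinchilla-style 70B / 1.4T)
    SERVED = 2e12                       # hypothetical lifetime of served tokens

    for params in (70e9, 30e9, 13e9):
        train_tokens = BUDGET / (6 * params)          # spend the same training budget
        total = BUDGET + infer_flops(params, SERVED)  # lifetime cost = training + serving
        print(f"{params/1e9:>4.0f}B params, "
              f"{train_tokens/1e12:.1f}T train tokens, "
              f"total {total:.3e} FLOPs")

With the training spend held fixed, the smaller, longer-trained models shrink the inference term, so total lifetime compute drops as long as quality holds up, which is the bet the LLaMA paper makes explicit.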


He seems to be implying that OpenAI released that paper to throw others off the scent of the direction they were actually taking.



