
Can you clarify what you mean?



Because the training data/model size/compute trade-off derived from that paper is highly suboptimal (too many parameters) compared to the one from DeepMind's later scaling laws [1]. And then Meta researchers recommended using even smaller models, to trade off training-time against inference-time compute [2] (which I thought was pretty obvious if you care about more than just benchmarks). A rough back-of-the-envelope comparison is sketched below the references.

[1] https://arxiv.org/abs/2203.15556 Training Compute-Optimal Large Language Models

[2] https://arxiv.org/abs/2302.13971 LLaMA: Open and Efficient Foundation Language Models
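
To make the trade-off concrete, here is a minimal sketch (not taken verbatim from either paper): it uses the common C ≈ 6·N·D FLOPs approximation for training a dense transformer and the roughly 20-tokens-per-parameter rule of thumb implied by the Chinchilla results, then compares a compute-optimal allocation with a LLaMA-style smaller model trained on more tokens. The example budget and the 13B model size are assumptions for illustration only.

    import math

    # Rough training-cost approximation for a dense transformer:
    # C ~ 6 * N * D  (N = parameters, D = training tokens). Assumption, not exact.
    def train_flops(n_params: float, n_tokens: float) -> float:
        return 6.0 * n_params * n_tokens

    # Chinchilla-style compute-optimal split: roughly D ~ 20 * N,
    # so N = sqrt(C / (6 * 20)) and D = 20 * N.
    def chinchilla_optimal(compute_budget: float) -> tuple[float, float]:
        n_params = math.sqrt(compute_budget / (6.0 * 20.0))
        return n_params, 20.0 * n_params

    if __name__ == "__main__":
        budget = 1e23  # example training budget in FLOPs (assumption)
        n_opt, d_opt = chinchilla_optimal(budget)
        print(f"compute-optimal: ~{n_opt / 1e9:.0f}B params on ~{d_opt / 1e12:.2f}T tokens")

        # LLaMA-style choice: spend the same budget on a smaller model
        # trained on many more tokens, so inference is cheaper per query.
        n_small = 13e9  # hypothetical 13B model, chosen for illustration
        d_small = budget / (6.0 * n_small)
        print(f"smaller model:   ~{n_small / 1e9:.0f}B params on ~{d_small / 1e12:.2f}T tokens")

With this budget the compute-optimal point lands around 29B parameters on ~0.6T tokens, while the 13B model absorbs the same compute by training on ~1.3T tokens; the smaller model is then much cheaper to serve, which is the point the LLaMA paper makes.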


He seems to be implying that OpenAI released that paper to throw others off the scent of the direction they were taking.





