
I just meant Chinchilla-optimal in terms of the corrected scaling curves from the Chinchilla paper. By those curves, the original GPT-3 was far larger than it needed to be for the amount of data it was trained on.
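To make that concrete, here's a rough sketch of the comparison, using the commonly cited Chinchilla rule of thumb of roughly 20 training tokens per parameter (the paper's actual fits are more nuanced, so treat this as a back-of-the-envelope illustration):

```python
def chinchilla_optimal_tokens(params, tokens_per_param=20):
    """Approximate compute-optimal token count under the ~20 tokens/param
    rule of thumb from the Chinchilla paper (Hoffmann et al., 2022)."""
    return params * tokens_per_param

# Published GPT-3 numbers: 175B parameters, ~300B training tokens.
gpt3_params = 175e9
gpt3_tokens = 300e9

optimal = chinchilla_optimal_tokens(gpt3_params)
print(f"Chinchilla-optimal tokens for 175B params: ~{optimal / 1e12:.1f}T")
print(f"Actual GPT-3 training tokens: ~{gpt3_tokens / 1e9:.0f}B")
# By this rule, 175B parameters would call for ~3.5T tokens, an order of
# magnitude more than GPT-3 saw -- i.e. GPT-3 was over-parameterized for
# its data budget.
```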



It's also worth noting that we don't know any specifics (parameter count, training tokens) of GPT-3.5; those numbers have only been published for GPT-3.




