
GPT-3.5 Turbo is (most likely) Curie, which is (most likely) 6.7B params. So, yeah, it makes perfect sense that a 70B model can't compete with it on cost.
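As a rough back-of-the-envelope check, a common rule of thumb is that a dense transformer's forward pass costs about 2 FLOPs per parameter per token. The sketch below applies that rule. The 6.7B figure is only the size speculated above, not a confirmed number for GPT-3.5 Turbo.

```python
def forward_flops_per_token(n_params: float) -> float:
    # Rule of thumb: dense transformer forward pass ~ 2 FLOPs per parameter per token
    return 2.0 * n_params

small = forward_flops_per_token(6.7e9)  # speculated Curie-sized model (assumption)
large = forward_flops_per_token(70e9)   # Llama 2 70B
ratio = large / small
print(f"70B / 6.7B compute per token: {ratio:.1f}x")
```

Under that assumption, the 70B model needs roughly 10x the compute per generated token, before accounting for batching, memory bandwidth, or hardware differences.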




GPT-3.5 Turbo is a new model, not Curie. As others have stated, it probably uses Mixture of Experts (MoE), which lowers inference cost.
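Whether GPT-3.5 Turbo actually uses MoE is speculation in this thread. The general reason MoE lowers inference cost is that a gating function routes each token to only the top-k of E expert sub-networks, so most expert parameters sit idle for any given token. Here is a minimal generic sketch of top-k routing. The toy "experts" (simple scaling functions), the dot-product gate, and the choice E=8, k=2 are illustrative assumptions. None of this describes OpenAI's architecture.

```python
import math
import random

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(x, gate_weights, experts, k=2):
    # Gate score per expert: dot product of the expert's gate vector with the token vector
    scores = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in gate_weights]
    # Keep only the k highest-scoring experts; the rest are never executed
    top = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    probs = softmax([scores[i] for i in top])
    out = [0.0] * len(x)
    for p, i in zip(probs, top):
        y = experts[i](x)
        out = [o + p * yi for o, yi in zip(out, y)]  # weighted sum of chosen experts
    return out, top

d, num_experts = 4, 8
calls = [0] * num_experts  # counts how often each toy expert runs

def make_expert(i, scale):
    # Hypothetical toy expert: scales its input (stands in for a feed-forward block)
    def expert(x):
        calls[i] += 1
        return [scale * v for v in x]
    return expert

random.seed(0)
experts = [make_expert(i, 0.5 + i) for i in range(num_experts)]
gate_weights = [[random.gauss(0, 1) for _ in range(d)] for _ in range(num_experts)]
x = [random.gauss(0, 1) for _ in range(d)]

out, chosen = moe_forward(x, gate_weights, experts, k=2)
print("experts run for this token:", sum(calls), "of", num_experts)
```

With 8 experts and top-2 routing, only 2/8 of the expert compute runs per token, even though all 8 experts' parameters count toward the model's total size.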


Is there a source on that? I've never seen anyone suggest it's even below 70B.


It still does a much better job at translation than even Llama 2 70B, at 6.7B params.


If it's MoE, that may explain why it's faster and better...


MOE?



I thought it was fairly well established that GPT-3.5 has something like 130B parameters and that GPT-4 is on the order of 600-1,000B.


I remember:

- gpt-3.5 175b params

- gpt-4 1800b params



