Hacker News

Why does “cushman-ml” suggest a 12B model instead of the 175B model?



The model with the most similar name in this list is code-cushman-001 which is described as "Codex model that is a stronger, multilingual version of the Codex (12B) model in the paper".

https://crfm-models.stanford.edu/static/help.html

The next stronger Codex model is called code-davinci-001, which appears to be a fine-tuned version of the GPT-3 Davinci model, which is known to have 175B parameters. The model names are alphabetical in order of increasing model size:

https://blog.eleuther.ai/gpt3-model-sizes/

See also A.2 here: https://arxiv.org/pdf/2204.00498.pdf#page=6
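A quick sketch of the naming pattern, using the parameter counts estimated in the linked EleutherAI post (the Ada/Babbage/Curie figures are community estimates, not official OpenAI numbers):

```python
# Estimated GPT-3 API model sizes (per the EleutherAI blog post linked
# above; only Davinci's 175B comes from the GPT-3 paper itself).
models = {
    "ada": 350e6,      # ~350M (estimate)
    "babbage": 1.3e9,  # ~1.3B (estimate)
    "curie": 6.7e9,    # ~6.7B (estimate)
    "davinci": 175e9,  # 175B (GPT-3 paper)
}

# Sorting the names alphabetically gives the same order as sorting
# by parameter count, illustrating the naming convention.
print(sorted(models) == sorted(models, key=models.get))  # True
```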


Code is the base model in more recent iterations [0].

[0] https://beta.openai.com/docs/model-index-for-researchers


Most likely for latency and cost reasons: a model that's 10x as big requires roughly 10x the hardware to serve at the same latency. Since most generations are not very long, a smaller fine-tuned model should work well enough.
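A back-of-the-envelope version of that argument: per-token inference compute for a dense transformer scales roughly linearly with parameter count (about 2 × N FLOPs per token, ignoring attention overhead), so the 175B model costs about 14.6x the compute of the 12B model per generated token:

```python
# Rough sketch (not a benchmark): per-token forward-pass FLOPs for a
# dense transformer scale as ~2 * n_params, so the hardware needed to
# hit the same latency scales roughly with model size.
def flops_per_token(n_params: float) -> float:
    return 2 * n_params

ratio = flops_per_token(175e9) / flops_per_token(12e9)
print(f"{ratio:.1f}x")  # ~14.6x more compute per token
```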



