The model with the most similar name in this list is code-cushman-001, which is described as "Codex model that is a stronger, multilingual version of the Codex (12B) model in the paper".
The next stronger Codex model is called code-davinci-001, which appears to be a fine-tuned version of the GPT-3 Davinci model, known to have 175B parameters. The model naming is alphabetical in order of increasing model size: ada < babbage < curie < davinci.
Most likely for latency and cost reasons. A model that is 10x as big requires roughly 10x the hardware to serve at the same latency. Since most completions are not very long, a smaller fine-tuned model should work well enough.
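To make the cost gap concrete, here is a minimal back-of-envelope sketch, assuming the common approximation of roughly 2 × parameters FLOPs per generated token at decode time (the figures and the helper name are illustrative, not from the original answer):

```python
# Rough decode-time compute comparison (assumed approximation:
# ~2 * parameters FLOPs per generated token; ignores attention
# cost, batching, and memory bandwidth effects).
def flops_per_token(n_params: float) -> float:
    """Approximate FLOPs to generate one token with a dense model."""
    return 2 * n_params

davinci = flops_per_token(175e9)  # ~175B-parameter GPT-3 Davinci
cushman = flops_per_token(12e9)   # ~12B-parameter Codex model

print(f"davinci: {davinci:.2e} FLOPs/token")
print(f"cushman: {cushman:.2e} FLOPs/token")
print(f"ratio:   {davinci / cushman:.1f}x")  # ~14.6x more compute per token
```

Under this rough model, serving the 175B-parameter model costs on the order of 15x more compute per generated token, which is why a smaller fine-tuned model is attractive when quality is good enough.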