I wonder if regenerating the same prompt with the same model multiple times at a higher temperature would be equivalent to running different models. I suspect the perceived variance among different frontier models may be largely due to randomness associated with non-zero temperatures.

Models seem to be trained to return nice, round numbers of items, like 5, 10, or 15 (perhaps because of interference from training on marketing materials?). On top of that, recall is far from 100% on large contexts. So if your code has 27 bugs, each run may find a different set of 10 issues out of the 27, whether you use several models or call the same one repeatedly.
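To put rough numbers on the 27-bug example, here is a small Python sketch. It rests on a strong simplifying assumption that is mine, not something measured: every run reports exactly 10 bugs chosen uniformly at random, independently of earlier runs. Real models probably find the obvious bugs far more often than the subtle ones, so this likely overstates how fast repeated runs cover everything. The function names are made up for illustration.

```python
import random


def expected_distinct(total: int, per_run: int, runs: int) -> float:
    """Expected number of distinct bugs seen after `runs` runs.

    Assumes (hypothetically) that each run reports a uniformly random
    subset of `per_run` bugs, independent of every other run.
    """
    # Chance that one particular bug is missed by every single run.
    miss_prob = (1 - per_run / total) ** runs
    return total * (1 - miss_prob)


def simulate(total: int, per_run: int, runs: int,
             trials: int = 10_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same quantity, as a sanity check."""
    rng = random.Random(seed)
    found_sum = 0
    for _ in range(trials):
        found = set()
        for _ in range(runs):
            # One "run": a random set of per_run bugs out of total.
            found.update(rng.sample(range(total), per_run))
        found_sum += len(found)
    return found_sum / trials


if __name__ == "__main__":
    for runs in range(1, 9):
        print(runs,
              round(expected_distinct(27, 10, runs), 2),
              round(simulate(27, 10, runs), 2))
```

Under that assumption, nothing in the calculation depends on which model produced each run, only on how many runs you make. That is the point above: if the differences between models are mostly sampling noise, rerunning one model should cover the bug list about as well as switching models.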