> They simply compare the prompting strategies that work best with each model In...

> They simply compare the prompting strategies that work best with each model

Incorrect.

# Gemini marketing website, MMLU

- Gemini Ultra 90.0% with CoT@32*

- GPT-4 86.4% with 5-shot* (reported)

# gemini_1_report.pdf, MMLU

- Gemini Ultra 90.0% with CoT@32*

- Gemini Ultra 83.7% with 5-shot

- GPT-4 87.29% with CoT@32 (via API*)

- GPT-4 86.4% with 5-shot (reported)

Gemini marketing website compared best Gemini Ultra prompting strategy with a worse-performing (5-shot) GPT-4 prompting strategy.