Mixtral 8x7B Above Gemini Pro – Chatbot Arena Leaderboard Updated (huggingface.co)
2 points by jafitc 11 months ago | 1 comment



This is based on users choosing the better of two models at a time, then calculating an Elo rating from who beats whom.
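
To make the who-beats-whom idea concrete, here is a minimal sketch of a standard online Elo update over pairwise votes. The K-factor of 32, the starting rating of 1000, the model names, and the votes are all illustrative assumptions, not the leaderboard's actual settings or data, and ties are ignored.

```python
def expected_score(r_a, r_b):
    """Probability that a player rated r_a beats one rated r_b under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def update_elo(ratings, winner, loser, k=32.0, initial=1000.0):
    """Apply one vote in place. k and initial are assumed values for illustration."""
    r_w = ratings.get(winner, initial)
    r_l = ratings.get(loser, initial)
    e_w = expected_score(r_w, r_l)
    # The winner gains exactly what the loser gives up, so the total is conserved.
    ratings[winner] = r_w + k * (1.0 - e_w)
    ratings[loser] = r_l - k * (1.0 - e_w)
    return ratings


# Hypothetical votes: each tuple is (preferred model, other model).
votes = [("model_a", "model_b"), ("model_a", "model_c"), ("model_b", "model_c")]

ratings = {}
for winner, loser in votes:
    update_elo(ratings, winner, loser)

for name, rating in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {rating:.1f}")
```

An online update like this depends on the order of the votes, which is one reason a ranking taken from a single pass should be read with some caution.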

It's BYOT (bring-your-own-tests) style.

This gives a better picture of real-world performance and is more robust against benchmark contamination.

They collected over 6000 and 1500 votes for Mixtral-8x7B and Gemini Pro, respectively.

While Elo ratings are widely used to rank performance in chess or among sports teams, here's a disclaimer by the makers of the leaderboard:

---

> Please note Arena is a "live eval" and pretty much a sampling process to estimate models capability.

> That's why we show the confidence intervals through bootstrapping. Statistically, these models (e.g., GPT-3.5, Mixtral, Gemini Pro) are very close and only looking at their ranking can be misleading.

https://twitter.com/lmsysorg/status/1735729398672716114

https://twitter.com/lmsysorg/status/1735751052287226059
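
For a sense of what bootstrapped confidence intervals could look like here, the sketch below resamples the votes with replacement, recomputes Elo ratings for each resample, and takes percentiles of the results. This is one plausible reading of the disclaimer, not the leaderboard's actual code; the vote data, K-factor, starting rating, number of rounds, and seeds are all assumptions.

```python
import random


def elo_from_votes(votes, k=32.0, initial=1000.0):
    """Run a plain online Elo pass over (winner, loser) pairs; k and initial are assumed."""
    ratings = {}
    for winner, loser in votes:
        r_w = ratings.get(winner, initial)
        r_l = ratings.get(loser, initial)
        e_w = 1.0 / (1.0 + 10 ** ((r_l - r_w) / 400.0))
        ratings[winner] = r_w + k * (1.0 - e_w)
        ratings[loser] = r_l - k * (1.0 - e_w)
    return ratings


def bootstrap_intervals(votes, rounds=200, alpha=0.05, seed=0):
    """Percentile bootstrap: resample votes with replacement and re-rate each time."""
    rng = random.Random(seed)
    samples = {}
    for _ in range(rounds):
        resampled = rng.choices(votes, k=len(votes))
        for model, rating in elo_from_votes(resampled).items():
            samples.setdefault(model, []).append(rating)
    intervals = {}
    for model, values in samples.items():
        values.sort()
        low = values[int((alpha / 2) * (len(values) - 1))]
        high = values[int((1 - alpha / 2) * (len(values) - 1))]
        intervals[model] = (low, high)
    return intervals


# Hypothetical data: model_a is preferred in 55 of 100 head-to-head votes.
votes = [("model_a", "model_b")] * 55 + [("model_b", "model_a")] * 45
random.Random(1).shuffle(votes)

intervals = bootstrap_intervals(votes)
for model, (low, high) in sorted(intervals.items()):
    print(f"{model}: [{low:.1f}, {high:.1f}]")
```

When the intervals for two models overlap substantially, their relative order in the ranking says little, which is the point the disclaimer makes.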



