Looking at the benchmarks, it seems like Phi-3 Small (7B) marginally beats out L...

zone411 · 2024-05-22T02:29:46.000000Z

On my benchmark (NYT Connections), Phi-3 Small performs well (8.4) but Llama 3 8B Instruct is still better (12.3). Phi-3 Medium 4k is disappointing and often fails to properly follow the output format.

Filligree · 2024-05-21T20:18:17.000000Z

Have you found either model to be good enough to do anything interesting, reliably?

ukuina · 2024-05-22T15:31:43.000000Z

Llama3-8B is adequate for non-technical summarization or simple categorization.

ashu1461 · 2024-05-22T10:12:28.000000Z

It also seems to be comparable to GPT-3.5 Turbo, which I find hard to believe. People have obviously found a way around these benchmarks.