
Looking at the benchmarks, it seems like Phi-3 Small (7B) marginally beats out Llama3-8B on most tasks, which is pretty exciting!



On my benchmark (NYT Connections), Phi-3 Small performs well (8.4) but Llama 3 8B Instruct is still better (12.3). Phi-3 Medium 4k is disappointing and often fails to properly follow the output format.


Have you found either model good enough to do anything interesting reliably?


Llama3-8B is adequate for non-technical summarization or simple categorization.


It also seems to be comparable to GPT-3.5 Turbo, which I find hard to believe. People have obviously found a way around these benchmarks.



