Hacker News
My benchmark for large language models (carlini.com)
4 points by cheviethai123 3 months ago
2 comments
cheviethai123
3 months ago
Notice how low Gemini scores here compared to other LLM tests. I'm also impressed that the evaluation method can assess performance without relying on tailored prompts.
hoamatcuoi
3 months ago
But the benchmark only scores Gemini-Pro 1. I'm curious how Gemini Ultra would perform here, but I guess we can't know yet.