Hacker News
tedsanders, 5 days ago | on: FrontierCode: An eval to measure whether you would...
Makes sense, thanks. I suppose error bars are tricky when you're trying to handle problem-to-problem variance, rubric-to-rubric variance, and run-to-run variance all at once.
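One common way to fold all three sources into a single interval is a hierarchical (nested) bootstrap. The sketch below is illustrative only and is not FrontierCode's method. It assumes a hypothetical layout where `scores[p][r][k]` is the score of run `k` on problem `p` under rubric `r`, and it resamples problems, then rubrics within each problem, then runs within each rubric. The function name `hierarchical_bootstrap` and the equal weighting at every level are assumptions.

```python
import random
import statistics


def hierarchical_bootstrap(scores, n_boot=2000, seed=0):
    """Return (point, lo, hi): the overall mean and a 95% percentile interval.

    Assumed (hypothetical) layout: scores[p][r][k] is the score of run k on
    problem p, graded under rubric r. Every level is weighted equally.
    """
    rng = random.Random(seed)
    # Point estimate: mean over problems of mean over rubrics of mean over runs.
    point = statistics.fmean(
        statistics.fmean(statistics.fmean(runs) for runs in rubrics)
        for rubrics in scores
    )
    boot_means = []
    for _ in range(n_boot):
        problem_means = []
        # Level 1: resample problems with replacement.
        for rubrics in rng.choices(scores, k=len(scores)):
            rubric_means = []
            # Level 2: resample rubrics within the chosen problem.
            for runs in rng.choices(rubrics, k=len(rubrics)):
                # Level 3: resample runs within the chosen rubric.
                sample = rng.choices(runs, k=len(runs))
                rubric_means.append(statistics.fmean(sample))
            problem_means.append(statistics.fmean(rubric_means))
        boot_means.append(statistics.fmean(problem_means))
    boot_means.sort()
    lo = boot_means[int(0.025 * n_boot)]
    hi = boot_means[int(0.975 * n_boot) - 1]
    return point, lo, hi


if __name__ == "__main__":
    # Toy data: 2 problems x 2 rubrics x 3 runs (made up for illustration).
    toy = [
        [[0.9, 0.8, 1.0], [0.7, 0.8, 0.9]],
        [[0.2, 0.4, 0.3], [0.5, 0.3, 0.4]],
    ]
    print(hierarchical_bootstrap(toy))
```

Resampling at every level tends to give a somewhat conservative (wide) interval, which may be acceptable for an eval leaderboard. Note also that if rubrics and runs are crossed (every run graded by every rubric) rather than nested, this layout is itself a simplification.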