Hacker News
gunalx 20 days ago | on: DeepSeekMath-V2: Towards Self-Verifiable Mathemati...
If I read it right, it used multiple samples of itself to verify the accuracy, but isn't this problematic?
zamadatix 20 days ago
Problematic in that it's still not formal verification, not problematic as in "it's worse to do this than not".
viraptor 20 days ago
In what way? The panel-of-experts approach has been around for a while now, and it's documented to improve quality.
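To make the panel-of-experts argument concrete, here is a toy sketch (the classic Condorcet jury calculation, not DeepSeekMath-V2's actual procedure). It assumes each panelist is an independent judge who is right with probability p and computes the chance that a strict majority of n such judges is right. The function name `majority_accuracy` and the numbers used are illustrative only.

```python
from math import comb


def majority_accuracy(p: float, n: int) -> float:
    """Probability that a strict majority of n judges is correct.

    Toy assumption: each judge is correct independently with probability p.
    n must be odd so that votes cannot tie.
    """
    if n % 2 == 0:
        raise ValueError("use an odd panel size to avoid ties")
    need = n // 2 + 1  # smallest strict majority
    # Sum the binomial probabilities of k = need..n correct votes.
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1))


if __name__ == "__main__":
    # Compare a single judge against larger panels of equally good judges.
    for n in (1, 5, 15):
        print(n, round(majority_accuracy(0.7, n), 4))
```

Under these assumptions a single judge is right 70% of the time, while a five-judge panel is right about 83.7% of the time. The gain comes entirely from the independence assumption.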
gunalx 19 days ago
Well, problematic because they are using their own verifier as a panel of experts, with their own model trained specifically to satisfy this verifier. On the benchmark runs, they don't mention using human experts to cross-validate their scores.
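The concern about a panel drawn from one model can be sketched with a toy mixture model. Assume that with probability rho all panelists repeat a single shared verdict (for instance a blind spot they inherit from being the same model), and otherwise they vote independently. This model and the names `correlated_panel_accuracy` and `rho` are hypothetical illustrations, not a measurement of DeepSeekMath-V2's verifier.

```python
from math import comb


def majority_accuracy(p: float, n: int) -> float:
    """Strict-majority accuracy of n independent judges, each correct w.p. p (n odd)."""
    if n % 2 == 0:
        raise ValueError("use an odd panel size to avoid ties")
    need = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1))


def correlated_panel_accuracy(p: float, n: int, rho: float) -> float:
    """Hypothetical mixture model of a panel drawn from one shared model.

    With probability rho every judge repeats one shared verdict (correct w.p. p),
    so the panel is only as good as a single judge; otherwise the judges vote
    independently and the usual majority-vote gain applies.
    """
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must be in [0, 1]")
    return rho * p + (1 - rho) * majority_accuracy(p, n)


if __name__ == "__main__":
    # As rho grows, the five-judge panel loses its advantage over one judge.
    for rho in (0.0, 0.5, 1.0):
        print(rho, round(correlated_panel_accuracy(0.7, 5, rho), 4))
```

At rho = 1 the panel is exactly as accurate as a single judge, so in this toy model extra samples from the same verifier add nothing once their errors are fully shared.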
cubefox 19 days ago
I assume they use self-verification only during RL training to provide the reward signal, but not for benchmarks.