(1) Companies will probably increasingly invest in building their own evals for their use cases because its becoming clear public/allegedly private benchmarks have misaligned incentives with labs sponsoring/cheating
(2) Those evals will prob be proprietary "IP" - guarded as closely as the code or research itself
(3) Conversely, public benchmarks are exhausted and SOMEONE has to invest in funding more frontier benchmarks. So this is prob going to continue.
(1) Companies will probably increasingly invest in building their own evals for their use cases because its becoming clear public/allegedly private benchmarks have misaligned incentives with labs sponsoring/cheating (2) Those evals will prob be proprietary "IP" - guarded as closely as the code or research itself (3) Conversely, public benchmarks are exhausted and SOMEONE has to invest in funding more frontier benchmarks. So this is prob going to continue.
reply