> 1. The nerf is psychological, not actual. 2. The nerf is real, but in a way that is perceptible to humans and not to benchmarks.
They could publish weekly benchmarks to disprove it. They almost certainly have internal benchmarking.
The shift is certainly real. It might not be model performance but contextual changes or token performance (tasks take longer even if the model stays the same).
Anyone can publish weekly benchmarks. If you think Anthropic is lying about not nerfing their models, you shouldn't trust benchmarks they release anyway.