I'd say it's more that benchmarks don't tell the whole story. They're still good for catching obvious regressions before any code hits a real user's machine, but real-world data should help identify a whole bunch of bottlenecks that were previously unknown, and show whether performance tweaks are actually helping.