I'd say it's more that benchmarks don't tell the whole story. They're still good for catching obvious regressions before any code hits a real user's machine, but real-world data should help identify a whole bunch of bottlenecks that were previously unknown, and show whether performance tweaks are actually helping.