For instance, a tool I incorporated into one system would watch for a particular parameter on a web request and, when it saw one, issue a command to Oracle telling it to log everything that happened on the database for that connection, turning logging off again afterwards. So, for instance, we could take a slow web page, add the parameter, and a minute later be studying a log generated by Oracle telling us exactly what that web request did, and where it spent its time.
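The original tool's code isn't shown, but the idea can be sketched with Oracle's real `DBMS_MONITOR` tracing package. Everything else here is an assumption for illustration: the `cursor` object, the request-handler shape, and the name of the debug parameter.

```python
from contextlib import contextmanager

@contextmanager
def traced_session(cursor):
    """Turn on full SQL tracing (waits and bind values) for this
    connection, and guarantee it is turned off when the request ends."""
    cursor.execute(
        "BEGIN DBMS_MONITOR.SESSION_TRACE_ENABLE("
        "waits => TRUE, binds => TRUE); END;"
    )
    try:
        yield
    finally:
        # Runs even if the traced request raises an exception.
        cursor.execute("BEGIN DBMS_MONITOR.SESSION_TRACE_DISABLE; END;")

def run_query(cursor):
    # Stand-in for the real work a request would do.
    cursor.execute("SELECT 1 FROM dual")

def handle_request(params, cursor):
    # Hypothetical handler: only trace when the debug parameter is
    # present, so ordinary traffic pays no tracing cost.
    if params.get("trace") == "1":
        with traced_session(cursor):
            return run_query(cursor)
    return run_query(cursor)
```

The context manager is the important part: because the disable call sits in a `finally` block, a request that errors out mid-trace can't leave tracing switched on for every later request that reuses the connection.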
Having the ability to do this selectively on the live production system, against live data, with a problematic request while it was being problematic, was huge. We were tracking down problems that only showed up in production, under production load, so no amount of profiling in development would have helped. Using the same idea, every day we would take one random database handle, turn logging on for half an hour, and use it as a canary to look for potential problems. We found a lot of things that way.
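The canary step can be sketched the same way. This is a guess at the shape, not the author's code: the connection pool is assumed to be a list of cursor-like objects, and a timer stands in for whatever scheduling the real system used.

```python
import random
import threading

def start_canary_trace(pool, duration_seconds=30 * 60):
    """Pick one random connection from a (hypothetical) pool, enable
    full tracing on it, and schedule tracing to be disabled after
    `duration_seconds` (half an hour by default, as in the text)."""
    cursor = random.choice(pool)
    cursor.execute(
        "BEGIN DBMS_MONITOR.SESSION_TRACE_ENABLE("
        "waits => TRUE, binds => TRUE); END;"
    )
    timer = threading.Timer(
        duration_seconds,
        lambda: cursor.execute(
            "BEGIN DBMS_MONITOR.SESSION_TRACE_DISABLE; END;"
        ),
    )
    timer.start()
    return timer
```

Run once a day from a cron job or similar, this yields a steady trickle of full traces of ordinary production traffic, which is exactly what makes it useful as a canary.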
Addendum (added later): It is also worth noting that in many horizontally scaled systems you can trivially have a fair amount of logging, even in production, if you're willing to accept a constant-factor overhead. This can be utterly invaluable in tracking down latency, bottlenecks, and other larger scalability problems. Every large system that I've seen that was well-run did this to some extent.