Linux (the kernel) has been aware of this result for a long time. Here's a thread from 2001, where Linus says
"I would, for example, suspect that a "correct" optimization strategy for 99% of all real-world cases (not benchmarks) is: if it doesn't have floating point, optimize for the smallest size possible. Never do loop unrolling or anything fancy like that. "
Unrolling loops made sense when memory access was cheap relative to computation — roughly until the mid-80s. Since the advent of caches, the winning strategy has been to make your code (and as much of your data as possible) fit inside them.
http://gcc.gnu.org/ml/gcc/2001-07/msg01543.html
and a similar thread http://lkml.indiana.edu/hypermail/linux/kernel/0302.0/1068.h...
And here is the 2.6.15 (2005) changelog, which introduced a configuration option to compile the kernel optimized for size: http://lkml.org/lkml/2005/12/18/139