

Automated Locality Optimization Based on the Reuse Distance of String Operations - chorola
http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/pubs/archive/40679.pdf

======
kevingadd
This seems like a pretty clever approach to tuning the performance of memcpy
and memset, since proper instrumentation should let you make the right
decision about whether to bypass the cache on a given operation most of the
time. Where you get it wrong, you can probably spot it with a second
instrumentation pass after applying the optimizations (to see if any of them
actually caused an overall performance hit).
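
To make the "bypass cache or not" decision concrete, here is a toy Python
sketch of a reuse-distance check. It is only an illustration under simplifying
assumptions, not the paper's implementation: the trace is a list of cache-line
labels, the cache is treated as fully associative LRU, and the names
reuse_distance_after and should_bypass_cache are made up for this example.

```python
def reuse_distance_after(trace, i):
    """Count the distinct cache lines touched between access i and the next
    access to the same line. Returns None if the line is never touched again."""
    target = trace[i]
    seen = set()
    for line in trace[i + 1:]:
        if line == target:
            return len(seen)
        seen.add(line)
    return None


def should_bypass_cache(distance, cache_lines):
    """Assumption: with a fully associative LRU cache holding cache_lines
    lines, data reused at distance >= cache_lines is evicted before its next
    use, so writing it through the cache buys nothing."""
    return distance is None or distance >= cache_lines


# Hypothetical trace of cache-line accesses.
trace = ["A", "B", "C", "B", "A", "D"]
for i, line in enumerate(trace):
    d = reuse_distance_after(trace, i)
    print(line, d, should_bypass_cache(d, cache_lines=2))
```

In a real setup the distances would come from profiling runs, and the "bypass"
branch would map to something like non-temporal stores inside memcpy.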

It's nice to see that they tested this on a bunch of use cases, as well... and
on that note, the adsense-serving benchmark spends 37.5% of its CPU time in
memcpy! That's insane! I wonder if it's basically just a web server benchmark
where it's calling memcpy a lot to serve up static assets over HTTP?

