In my experience (limited to single-threaded code, though), modern C++ compilers optimize really well; MSVC in particular generally does a better job than GCC. It's interesting to compare the assembly generated for the same operations by VC6 and by VC2008/2010.
In line with what was mentioned below, VC6 used a series of lea instructions to reach the (a*x+b)th byte in a structure, whereas VC2010 just uses a single multiply.
And something that surprised me: VC6 always used fnstsw ax; test ax, 40h; jz $+16 to compare floating-point values, whereas VC2010 uses jp instead. I'd presume for a good reason as well.
Anecdote: One time and one time only I witnessed GCC write more "clever" code than MSVC, though I haven't benchmarked this:
switch (val) {
case 1:
case 2:
case 6:
case 15:
    return foo;
default:
    return bar;
}
Both compilers handled val > 15 normally (returning bar), but optimized the in-range values differently. MSVC compiled this into a conventional two-step jump table:
mov eax, table1[dl] // assuming dl contains val, table1 contains either 1 for bar or 0 for foo
jmp table2[al] // contains addresses of two branches
GCC did the equivalent of this: