In my experience (limited to single-threaded code, though), modern C++ compilers optimize really well; MSVC in particular generally does a better job than GCC. It's interesting to compare the assembly generated for the same operations by VC6 and by VC2008/2010.
In line with what was mentioned below, VC6 used a series of lea instructions to reach the (a*x+b)th byte in a structure, whereas VC2010 just uses a single multiply.
And something that surprised me: VC6 always used fnstsw ax; test ax, 40h; jz $+16 to compare floating-point values, whereas VC2010 uses jp instead. I'd presume for a good reason as well.
Anecdote: One time and one time only I witnessed GCC write more "clever" code than MSVC, though I haven't benchmarked this:
switch (val) {
case 1:
case 2:
case 6:
case 15:
    return foo;
default:
    return bar;
}
Both compilers handled val > 15 normally (returning bar), but optimized the in-range values differently. MSVC compiled this into a conventional two-step jump table:
mov eax, table1[dl] // assuming dl contains val, table1 contains either 1 for bar or 0 for foo
jmp table2[al] // contains addresses of two branches
GCC did the equivalent of this: