Somebody should write a new edition of this book which takes account of multithreading optimizations and best practices.
In line with what was mentioned below, VC6 used a series of lea instructions to reach the (a*x+b)th byte in a structure, whereas VC2010 just uses a single multiply.
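To make the lea-versus-multiply point concrete, here is a rough sketch in C of the same offset computed both ways. The struct, its field names, and the constants a = 12 and b = 8 are made up for illustration; they are not taken from any real compiler output.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 12-byte record; the names are for illustration only. */
struct record {
    int32_t id;
    int32_t x;
    int32_t y;
};

/* Offset of field y in element i, computed with one multiply: a*i + b. */
size_t offset_mul(size_t i)
{
    return sizeof(struct record) * i + offsetof(struct record, y);
}

/* Same offset built from shift-and-add steps. Each step has the shape
   base + index*scale + displacement (scale 1, 2, 4 or 8), which is what
   a single lea can compute. */
size_t offset_lea(size_t i)
{
    size_t t = i + i * 2;   /* lea t, [i + i*2]  gives 3*i      */
    return 8 + t * 4;       /* lea r, [t*4 + 8]  gives 12*i + 8 */
}
```

Both functions return the same value for every i. Which form is faster depends on the multiplier latency of the target CPU, which is presumably why the two compiler generations differ.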
And something I was surprised by: VC6 always used fnstsw ax; test ah, 40h; jz $+16; to compare floating-point values. VC2010 uses jp instead, presumably for a good reason as well.
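One plausible reason for the jp (a guess, not something confirmed in this thread): on x86, compare instructions such as fcomi and ucomisd set the parity flag when the operands are unordered, i.e. when at least one of them is NaN, so a jp can route the NaN case before testing for equality. A minimal C sketch of the semantics the compiler has to preserve, with made-up function names:

```c
#include <math.h>
#include <stdbool.h>

/* Plain equality: false whenever either operand is NaN. */
bool fp_equal(double a, double b)
{
    return a == b;
}

/* True only when the comparison is unordered (at least one NaN). */
bool fp_unordered(double a, double b)
{
    return isunordered(a, b);
}
```

So fp_equal(NAN, NAN) is false even though both operands hold the same value, and the generated code needs some way to tell that case apart from a genuine match.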
Anecdote: One time and one time only I witnessed GCC write more "clever" code than MSVC, though I haven't benchmarked this:
mov eax, table1[dl] // assuming dl contains val, table1 contains either 1 for bar or 0 for foo
jmp table2[al] // contains addresses of two branches
return ((1 << val) & ((1 << 1) | (1 << 2) | (1 << 6) | (1 << 15)))
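For anyone who wants to play with the bitmask expression above, here is a small self-contained C sketch that checks it against a plain switch. The function names and the range guard are my additions; the set {1, 2, 6, 15} is the one from the expression.

```c
#include <stdbool.h>

/* Bit i of the mask is set when i is in the set {1, 2, 6, 15}. */
#define SET_MASK ((1u << 1) | (1u << 2) | (1u << 6) | (1u << 15))

/* Table-free membership test. The range check is needed because
   shifting by 32 or more bits is undefined behaviour in C. */
bool in_set_mask(unsigned val)
{
    return val < 32 && ((1u << val) & SET_MASK) != 0;
}

/* Reference version: the kind of switch a compiler might turn into
   either a jump table or the mask test above. */
bool in_set_switch(unsigned val)
{
    switch (val) {
    case 1: case 2: case 6: case 15:
        return true;
    default:
        return false;
    }
}
```

The mask version replaces both table loads with a shift and an AND, at the cost of only working when all the case values fit in one machine word.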
I think I'll have to take a look at state-of-the-art implementations of shifters and multipliers. Does anybody know of some resources regarding this topic (papers or reasonably optimized VHDL code)?
Having said that, the following might be useful references.
Glenn Colón-Bonet and Paul Winterrowd, "Multiplier Evolution: A Family of Multiplier VLSI Implementations" -- http://comjnl.oxfordjournals.org/cgi/content/abstract/bxm123...
Shailendra Jain, Vasantha Erraguntla, Sriram R. Vangal, Yatin Hoskote, Nitin Borkar, Tulasi Mandepudi, Karthik VP, "A 90mW/GFlop 3.4GHz Reconfigurable Fused/Continuous Multiply-Accumulator for Floating-Point and Integer Operands in 65nm," VLSI Design, International Conference on, pp. 252-257, 2010 23rd International Conference on VLSI Design, 2010. -- http://www.computer.org/portal/web/csdl/doi/10.1109/VLSI.Des...
I would have thought the state of the art is a trade secret for Intel and AMD, unless anyone actually depackages their chips to look at the multipliers? Patent applications might be worth looking at, but they are probably incomprehensible.
Also, http://www.ece.ucsb.edu/~parhami/ece_252b.htm looks like a good course web page on VLSI maths, but I don't know.
Dadda, L. (1965). "Some schemes for parallel multipliers". Alta Frequenza 34: 349–356.