> "For example, as of this writing, the first two google hits for popcnt benchmark (and 2 out of the top 3 bing hits) claim that Intel’s hardware popcnt instruction is slower than a software implementation that counts the number of bits set in a buffer, via a table lookup using the SSSE3 pshufb instruction."
I believe that my benchmark is one of those top hits. My description is at http://www.dalkescientific.com/writings/diary/archive/2011/1... . There I wrote: "My answer is that if you have a chip with the POPCNT instruction built-in then use it. I still don't have one of those chips, but I know someone who does, who has given me some numbers."
My own code's logic is "if POPCNT exists then use it, otherwise test one of a few possibilities to find the fastest, since the best choice depends on the hardware."
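To make that logic concrete, here is a minimal sketch in C. It is not my actual code: every name in it (`choose_popcount`, `popcount_hw`, `popcount_lut8`, `popcount_swar`) is made up for illustration, the two software candidates are stand-ins for "a few possibilities", and the CPU check assumes GCC or Clang on x86 (`__builtin_cpu_supports`). The timing is deliberately crude.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical common signature: count the set bits in n 64-bit words. */
typedef uint64_t (*popcount_fn)(const uint64_t *words, size_t n);

/* Candidate 1: 8-bit lookup table, filled lazily on first use. */
static uint8_t byte_counts[256];

static uint64_t popcount_lut8(const uint64_t *words, size_t n) {
    if (byte_counts[255] == 0)              /* table not built yet */
        for (int i = 1; i < 256; i++)
            byte_counts[i] = (uint8_t)((i & 1) + byte_counts[i / 2]);
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        for (uint64_t w = words[i]; w != 0; w >>= 8)
            total += byte_counts[w & 0xff];
    return total;
}

/* Candidate 2: parallel (SWAR) bit counting within each word. */
static uint64_t popcount_swar(const uint64_t *words, size_t n) {
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t w = words[i];
        w = w - ((w >> 1) & 0x5555555555555555ULL);
        w = (w & 0x3333333333333333ULL) + ((w >> 2) & 0x3333333333333333ULL);
        w = (w + (w >> 4)) & 0x0f0f0f0f0f0f0f0fULL;
        total += (w * 0x0101010101010101ULL) >> 56;
    }
    return total;
}

/* Hardware path: assumes GCC or Clang targeting x86. */
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
#define HAVE_X86_DISPATCH 1
__attribute__((target("popcnt")))
static uint64_t popcount_hw(const uint64_t *words, size_t n) {
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += (uint64_t)__builtin_popcountll(words[i]);
    return total;
}
#endif

/* Crude timer: CPU time for `reps` passes over the buffer. */
static double time_candidate(popcount_fn fn, const uint64_t *buf,
                             size_t n, int reps) {
    volatile uint64_t sink = 0;             /* keep the calls from being optimized out */
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        sink += fn(buf, n);
    (void)sink;
    return (double)(clock() - start);
}

/* If POPCNT exists, use it; otherwise time the candidates and keep the fastest. */
popcount_fn choose_popcount(void) {
#ifdef HAVE_X86_DISPATCH
    if (__builtin_cpu_supports("popcnt"))
        return popcount_hw;
#endif
    enum { N = 4096, REPS = 200 };
    static uint64_t buf[N];
    for (size_t i = 0; i < N; i++)          /* arbitrary non-trivial bit patterns */
        buf[i] = (uint64_t)(i + 1) * 0x9e3779b97f4a7c15ULL;

    popcount_fn candidates[] = { popcount_lut8, popcount_swar };
    popcount_fn best = candidates[0];
    double best_time = time_candidate(best, buf, N, REPS);
    for (size_t i = 1; i < sizeof candidates / sizeof candidates[0]; i++) {
        /* Sanity check: every candidate must agree before it is timed. */
        assert(candidates[i](buf, N) == candidates[0](buf, N));
        double t = time_candidate(candidates[i], buf, N, REPS);
        if (t < best_time) {
            best_time = t;
            best = candidates[i];
        }
    }
    return best;
}
```

Call `choose_popcount()` once at startup and keep the returned function pointer. The timing branch only runs on machines without POPCNT, which is where the best choice actually varies with the hardware.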
I now have a machine with a hardware POPCNT, and a version with inline assembly. I should rerun the numbers...