
Abusing the CPU's Adder Circuits - scaramanga
https://giannitedesco.github.io/2019/06/15/abusing-add.html
======
sriram_malhar
I'll attempt a lay explanation of Kernighan's trick without referring to
hardware logic.

1\. Say your odometer is showing 146799. The next reading is 146800. If it is
146999, the next reading is 147000. Observe that if you run the odometer in
reverse, all trailing 0's get flipped to 9, and the last non-zero digit is
decremented. The digits to the left are unaffected. Let's do this in binary.

2\. Take any bitstring x, and focus on the last 1 bit. By definition (of
'last'), the bits following that bit are all zeroes. That is, x is of the
form

    
    
       .....1,  .....10, .....100  etc.
    
    

3\. Now run the odometer in reverse. x-1 is of the form

    
    
       .....0,  .....01, .....011  etc.
    

Observe (a) the "...." part remains unchanged. Only the bits following and
including the last set 1 are flipped. (b) the position of the last '0' in x-1
is the same position as the last '1' in x.

    
    
          x   = 10101100
          x-1 = 10101011
    

4\. Bitwise-AND the two. The "..." part remains unchanged, and the trailing
zeroes of 'x' remain zero. The last 1 of x lines up with the last 0 of x-1,
so only that last 1 of x is set to 0.

    
    
        x&x-1 = 10101000
    

5\. In a loop, compute x &= x-1. Each iteration clears the last 1 bit of x.
Count the number of iterations until x == 0; that count is the number of 1
bits in the original x.
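
The loop in step 5 can be sketched in C as follows. This is a minimal illustration, not code from the article; the function name `popcount_kernighan` and the choice of a 64-bit operand are assumptions made here.

```c
#include <stdint.h>

/* Count the 1 bits in x by clearing the last (lowest) 1 bit once per
 * iteration. The name and the 64-bit width are illustrative choices. */
unsigned popcount_kernighan(uint64_t x)
{
    unsigned count = 0;
    while (x != 0) {
        x &= x - 1;   /* step 4: clears only the last 1 bit of x */
        count++;
    }
    return count;
}
```

The loop runs once per 1 bit, so sparse inputs finish quickly while an all-ones word takes 64 iterations.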

------
ncmncm
The lack of a popcount instruction in RISC-V has made many of us distrustful
of the whole project. But if the Bitmanip extension goes forward, RISC-V could
end up as the best-equipped ISA of all for bitwise operations.

~~~
dragontamer
Intel's PEXT and PDEP instructions are the #1 innovation in bit manipulation.
They're basically bitwise gather / bitwise scatter, some of the most flexible
operations I've ever used personally.
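
For readers who haven't met them, here is a rough portable sketch of what the two operations compute. The names `pext_soft` and `pdep_soft` are made up for this illustration; on x86 with BMI2 the real thing is a single instruction, exposed as the `_pext_u64` and `_pdep_u64` intrinsics in `<immintrin.h>`.

```c
#include <stdint.h>

/* Bitwise gather: collect the bits of src selected by mask and pack
 * them into the low bits of the result (software model of PEXT). */
uint64_t pext_soft(uint64_t src, uint64_t mask)
{
    uint64_t result = 0;
    for (uint64_t out_bit = 1; mask != 0; out_bit <<= 1) {
        uint64_t lowest = mask & (0 - mask);  /* isolate lowest 1 of mask */
        if (src & lowest)
            result |= out_bit;
        mask &= mask - 1;                     /* clear that bit of mask */
    }
    return result;
}

/* Bitwise scatter: spread the low bits of src out to the positions
 * selected by mask (software model of PDEP). */
uint64_t pdep_soft(uint64_t src, uint64_t mask)
{
    uint64_t result = 0;
    for (uint64_t in_bit = 1; mask != 0; in_bit <<= 1) {
        uint64_t lowest = mask & (0 - mask);
        if (src & in_bit)
            result |= lowest;
        mask &= mask - 1;
    }
    return result;
}
```

With mask `0x0F0F`, gathering from `0xABCD` yields `0xBD`, and scattering `0xBD` back yields `0x0B0D`. These loops take one iteration per mask bit, which is exactly the cost the hardware instructions remove.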

Popcount is... ridiculously standard. Even GPUs offer 64-bit popcount. Bit-
reversal is surprisingly useful in my experience as well (ARM and GPUs offer
single-cycle bit-reversal). So I'm surprised to hear that RISC-V doesn't have
popcount standard.

Bit-reversal is great because an adder circuit's carry only goes in one direction.
So all of the "least-significant set bit" tricks that are done can be inverted
into "most-significant set bit" rather easily with just a bit-reversal. x86 is
missing out on bit-reverse, while all other platforms (GPUs, ARM, and Power9)
seem to have it.
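
As a sketch of that inversion: isolate the lowest set bit with the usual `x & -x` trick, but sandwich it between two bit reversals to get the highest set bit instead. The function names are illustrative, and the software `bit_reverse64` stands in for the single instruction that ARM and GPUs provide.

```c
#include <stdint.h>

/* Reverse all 64 bits by swapping progressively larger groups. */
uint64_t bit_reverse64(uint64_t x)
{
    x = ((x >> 1)  & 0x5555555555555555ULL) | ((x & 0x5555555555555555ULL) << 1);
    x = ((x >> 2)  & 0x3333333333333333ULL) | ((x & 0x3333333333333333ULL) << 2);
    x = ((x >> 4)  & 0x0F0F0F0F0F0F0F0FULL) | ((x & 0x0F0F0F0F0F0F0F0FULL) << 4);
    x = ((x >> 8)  & 0x00FF00FF00FF00FFULL) | ((x & 0x00FF00FF00FF00FFULL) << 8);
    x = ((x >> 16) & 0x0000FFFF0000FFFFULL) | ((x & 0x0000FFFF0000FFFFULL) << 16);
    return (x >> 32) | (x << 32);
}

/* Classic carry-based trick: keeps only the least-significant 1 bit. */
uint64_t lowest_set_bit(uint64_t x)
{
    return x & (0 - x);
}

/* Same trick mirrored: reverse, isolate the lowest bit, reverse back. */
uint64_t highest_set_bit(uint64_t x)
{
    return bit_reverse64(lowest_set_bit(bit_reverse64(x)));
}
```

On hardware with a cheap bit-reverse this costs two extra instructions; the software reversal here is only for illustration.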

I'm also of the opinion that multiply-xor-bitreversal-multiply is a very
powerful hashing tool (multiplication by odd numbers is bijective, xor is
bijective, bit-reversal is bijective... so a multiply-xor-bitreverse-multiply
cycle maps every number to its own unique 'random'-looking number across the
whole 64-bit space). x86 fans don't get bit-reversal, but bswap can be used
to largely the same effect for hashing.
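
A minimal sketch of such a mixing round, with its inverse to show that every step really is reversible. The constants are arbitrary picks for illustration (any odd multipliers work), and `mix64` / `unmix64` are hypothetical names, not an established hash.

```c
#include <stdint.h>

/* Hypothetical constants: the multipliers only need to be odd. */
#define MIX_MUL1 0x9E3779B97F4A7C15ULL
#define MIX_MUL2 0xBF58476D1CE4E5B9ULL
#define MIX_XOR  0x0123456789ABCDEFULL

/* Plain loop-based 64-bit reversal; stands in for a hardware instruction. */
static uint64_t mix_reverse_bits(uint64_t x)
{
    uint64_t r = 0;
    for (int i = 0; i < 64; i++) {
        r = (r << 1) | (x & 1);
        x >>= 1;
    }
    return r;
}

/* Multiplicative inverse of an odd number modulo 2^64 (Newton iteration). */
static uint64_t mix_inverse_odd(uint64_t a)
{
    uint64_t inv = a;            /* correct to 3 bits: a*a == 1 (mod 8) */
    for (int i = 0; i < 5; i++)
        inv *= 2 - a * inv;      /* each step doubles the correct bits */
    return inv;
}

/* multiply, xor, bit-reverse, multiply */
uint64_t mix64(uint64_t x)
{
    x *= MIX_MUL1;
    x ^= MIX_XOR;
    x = mix_reverse_bits(x);
    x *= MIX_MUL2;
    return x;
}

/* Undo each step in the opposite order. */
uint64_t unmix64(uint64_t h)
{
    h *= mix_inverse_odd(MIX_MUL2);
    h = mix_reverse_bits(h);
    h ^= MIX_XOR;
    h *= mix_inverse_odd(MIX_MUL1);
    return h;
}
```

Because `unmix64(mix64(x)) == x` for every input, no two inputs can collide. The reversal sits between the multiplies because a multiply only carries information from low bits toward high bits; flipping the word lets the second multiply spread the former high bits as well.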

~~~
qtplatypus
Popcount is ridiculously standard as many US government computer contracts
require chips that support that operation. Also popcount is very useful if you
are doing cryptanalysis.

------
djmips
Without having a popcnt instruction it's faster to use a SWAR approach instead
of that Kernighan trick.

[http://graphics.stanford.edu/~seander/bithacks.html#CountBit...](http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel)
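
For reference, a common 64-bit form of that parallel count looks roughly like this. It is a sketch in the spirit of the linked page rather than a copy of it, and the name `popcount_swar` is chosen here.

```c
#include <stdint.h>

/* SWAR popcount: sum bits in parallel within 2-, 4-, then 8-bit fields,
 * then add the eight byte counts with one multiply. */
unsigned popcount_swar(uint64_t x)
{
    x = x - ((x >> 1) & 0x5555555555555555ULL);                            /* 2-bit sums */
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);  /* 4-bit sums */
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;                            /* 8-bit sums */
    return (unsigned)((x * 0x0101010101010101ULL) >> 56);                  /* top byte = total */
}
```

Unlike the Kernighan loop, this takes the same handful of operations no matter how many bits are set, and it has no data-dependent branch.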

~~~
scaramanga
Author here. Yeah, you can expect some more SIMD and SWAR articles from me in
the future I hope. Was trying to keep it short and sweet :)

~~~
nkurz
You are obviously already on top of the concept, but in the paper "Faster
Population Counts Using AVX2 Instructions" we tried to give a clear
explanation (with diagrams) of how it can be faster to roll your own popcnt
using Carry-Save-Adders:
[https://arxiv.org/abs/1611.07612](https://arxiv.org/abs/1611.07612)
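
The core carry-save-adder idea can be shown in scalar C. This is a simplified sketch, not the paper's AVX2 code, and all names here are made up for illustration: a CSA compresses three words into a "twos" word and a "ones" word, so two popcounts do the work of three.

```c
#include <stdint.h>

/* Simple reference popcount used only to keep the sketch self-contained. */
static unsigned csa_popcount64(uint64_t x)
{
    unsigned n = 0;
    while (x != 0) {
        x &= x - 1;
        n++;
    }
    return n;
}

/* Carry-save adder: treats each bit position as an independent full adder.
 * *low gets the sum bits, *high gets the carry bits (weight 2). */
static void csa(uint64_t *high, uint64_t *low,
                uint64_t a, uint64_t b, uint64_t c)
{
    uint64_t u = a ^ b;
    *high = (a & b) | (u & c);
    *low = u ^ c;
}

/* popcount(a) + popcount(b) + popcount(c) using only two popcounts. */
unsigned popcount3_csa(uint64_t a, uint64_t b, uint64_t c)
{
    uint64_t high, low;
    csa(&high, &low, a, b, c);
    return 2 * csa_popcount64(high) + csa_popcount64(low);
}
```

Chaining CSAs into a tree pushes the expensive popcount step further and further out of the inner loop, which is where the savings come from on long arrays.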

