The Construction of Bit Mixers

jandrewrogers · on Aug 15, 2021

The similarity of bit mixers is an example of convergent evolution in code.

Many people have done brute-force searches for bit mixers with a good balance of several quality criteria and performance, the latter being somewhat hardware dependent. If you restrict yourself to basic ALU operations, you pretty much always end up in the same part of the bit mixer phase space even allowing for moderate differences in selection function.

There are two design elements of interest that are more difficult to search for using brute force, though possible in principle. The first is taking advantage of ALU parallelism, which can have a large performance impact but is more challenging to model. The second, particularly in the context of hashing, is taking into account that the input may be partially pre-mixed. A generally optimal bit mixer may be slower than one that is specifically optimized from the assumption that the inputs have been partially pre-mixed by a specific algorithm.

gkbrk · on Aug 15, 2021

I've also been playing around with bit mixers and hash functions recently. I use avalanche diagrams to visually inspect the diffusion, and SSE (sum of squared errors) to compare the bias of different mixers.

https://www.gkbrk.com/wiki/avalanche-diagram/

I guess the next step is to evolve / optimize good constants to improve them further, and rewrite everything in C to make evaluation faster.

sillycross · on Aug 15, 2021

It's very interesting to see how those hash function finalizers are constructed/discovered behind the scene. It seems to me that instruction count is not the best metric (since slow instructions and fast instructions have drastically different speeds, and CPU internally can pipeline instructions), probably sorting the candidates by real clock time performance is more reasonable?

Straw · on Aug 15, 2021

Nice work, I've always wondered if we can brute-force search for good mixers like this!

Take a look at the PCG paper for a strong family of mixers, including most of what you've tried:

https://www.pcg-random.org/paper.html

Multiplication is very good for mixing low to high, and xorshift moves the bits back to low. PCG also has tricks like "random" xor-shift, where the shift is determined by the top bits.

Additionally, you might want to try varying the word size as in PCG- as you get into strong mixers, they'll all pass PractRand (or take excessively long to fail), so to make things harder you can force them to have less state to work with.

Oh, to convert these invertible mixers into RNGs, you might not want to output the entire result, or you'll fail a birthday test after 2^32 outputs or so. You could do something like squish two halves of the output together, or perhaps even xor the counter into the output like in ChaCha.

tangjurine · on Aug 15, 2021

Take a look at https://en.m.wikipedia.org/wiki/Feistel_cipher for generating hash functions gaurenteed to be invertible.

aappleby · on Aug 15, 2021

Very nice work. Try putting byteswap into the set of possible instructions and see what happens.

-Austin, murmur3 author

lloydatkinson · on Aug 15, 2021

Why do so many sites insist on not having links to the home page? I had to delete some of the url to go there which is just painful on mobile.