I have worked on this problem many times, at many companies. I am working on it ...

dr_zoidberg · on Aug 31, 2021

With regard to 1, I wonder: why would calculating the Hamming distance be slow? In Python you can easily do it like this:

    hamming_dist = bin(a^b).count("1")

It relies on string operations, but takes ~1 microsecond on an old i5-7200U to compare 32-bit numbers. In Python 3.10 we'll get int.bit_count() to get the same result without this kind of workaround (and a ~6x speedup on the operation, though I suspect the XOR and Python's integer handling might already be a large part of the running time for this calculation).
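To make the two approaches concrete, here is a minimal sketch that uses int.bit_count() when the interpreter has it and falls back to the string trick otherwise. The function name hamming_distance is just something I made up for illustration, and it assumes non-negative integers.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ.

    Hypothetical helper for illustration; assumes a and b are non-negative.
    """
    x = a ^ b  # XOR leaves a 1 in every position where the inputs differ
    if hasattr(int, "bit_count"):
        # Available from Python 3.10 onwards
        return x.bit_count()
    # Older interpreters: count the "1" characters in the binary string
    return bin(x).count("1")


if __name__ == "__main__":
    print(hamming_distance(0b1011, 0b0010))
```

If you want to check the timings on your own machine, the standard timeit module is probably the easiest way; I wouldn't expect the numbers above to carry over exactly to other hardware or Python versions.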

If you need to go faster, you can basically compute the Hamming distance with just two assembly instructions: XOR and POPCNT. I haven't gone that low level in a long time, but you should be able to get into the nanosecond range using those.
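For anyone wondering what POPCNT actually computes, here is a rough sketch of the same XOR-then-count idea using the classic bit-twiddling (SWAR) population count on 32-bit values. To be clear, this is not the hardware instruction and it won't be fast in Python; it only illustrates the operation. The names popcount32 and hamming32 are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def popcount32(x: int) -> int:
    """Count set bits in the low 32 bits of x (software stand-in for POPCNT)."""
    x &= MASK32
    x = x - ((x >> 1) & 0x55555555)                  # 2-bit partial sums
    x = (x & 0x33333333) + ((x >> 2) & 0x33333333)   # 4-bit partial sums
    x = (x + (x >> 4)) & 0x0F0F0F0F                  # 8-bit partial sums
    # The multiply adds the four byte sums into the top byte;
    # mask to 32 bits because Python integers do not overflow.
    return ((x * 0x01010101) & MASK32) >> 24


def hamming32(a: int, b: int) -> int:
    """Hamming distance of two 32-bit values: XOR, then count the ones."""
    return popcount32(a ^ b)


if __name__ == "__main__":
    print(hamming32(0b1011, 0b0010))
```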