It's a brutally simple algorithm, really: XOR plus a bitcount. The bitcount loop only iterates once per 1 bit, so the lower the total Hamming distance, the faster it runs.
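For anyone who wants to see the shape of it, here is a minimal sketch of that XOR + bitcount idea, not the actual 80-line program. The names popcount32 and hamming160 are made up for illustration, and it assumes the two SHA-1 digests are already available as five 32-bit words each (uint32_t is used here instead of unsigned long just to keep the word width fixed).

```c
#include <stdint.h>

#define SHA1_WORDS 5 /* 160-bit digest as five 32-bit words (assumed layout) */

/* Bitcount that loops once per set bit: x & (x - 1) clears the lowest 1 bit. */
static unsigned popcount32(uint32_t x)
{
    unsigned n = 0;
    while (x) {
        x &= x - 1;
        n++;
    }
    return n;
}

/* Hamming distance between two 160-bit digests: XOR each word, count the 1s. */
static unsigned hamming160(const uint32_t a[SHA1_WORDS],
                           const uint32_t b[SHA1_WORDS])
{
    unsigned dist = 0;
    for (int i = 0; i < SHA1_WORDS; i++)
        dist += popcount32(a[i] ^ b[i]);
    return dist;
}
```

Because the inner loop runs once per differing bit, two digests that are nearly identical cost almost nothing to compare, while a typical pair of unrelated digests differs in roughly half of the 160 bits.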
It's in C, so everything works in unsigned long blocks. Compiled with -O3 it's:
for 1M each on two forked processes, and 80 lines of code.
With the base string hash pulled out of the loop, it's looking more like 1.7M SHA-1 + Hamming distance calcs per core per second.
Of course, my main calculation (in Perl) is only one line:
$hamming_dist = unpack("%160b*", sha1($attempt_phrase) ^ $challenge_sha);