This is my code (using mighty, our Python-based cluster service):
word_generator = mighty.word_generator("wordlist", (mighty.APPEND_NUMERALS, 5))
phrase_generator = mighty.word_concatenate(word_generator, ' ')
job = mighty.match_to_hash('PROVIDEDHASH', mighty.SHA1, phrase_generator.iterate_all)
job.priority = mighty.CRITICAL
EDIT: added CRITICAL as the priority, or it takes longer :)
EDIT: OK, so the code is a bit the wrong side of the tracks. It wouldn't take long to refactor it :)
Since this is much larger than the total hash space (2^160 ≈ 1.5×10^48), the best you can hope for is a collision after trying about half the hash space, i.e. roughly 5×10^47 hashes. Spread over 5 minutes of runtime, that means generating on the order of 10^45 hashes per second.
Bottom line: even if you can run this on 200,000 cores, each core would have to generate around 10^40 hashes per second, and that seems impossible to me.
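If anyone wants to check the arithmetic, here's the back-of-the-envelope calculation in Python (the 5-minute budget and 200,000-core count are just the figures quoted above):

```python
# Back-of-the-envelope: what hash rate would brute force need?
hash_space = 2 ** 160              # total SHA1 output space, ~1.5e48
expected_tries = hash_space // 2   # ~half the space before a likely hit
seconds = 5 * 60                   # the 5-minute runtime budget
cores = 200_000

total_rate = expected_tries / seconds
per_core_rate = total_rate / cores
print(f"total:    {total_rate:.1e} hashes/sec")    # ~2.4e+45
print(f"per core: {per_core_rate:.1e} hashes/sec") # ~1.2e+40
```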
I can't iterate the whole lot, but I can cover a substantially larger portion of the keyspace in 5 minutes than most could in the 30 hours. I bet 5 minutes will easily get me damn close :)
You won't even cover a fraction of a percent of the keyspace before the sun burns out.
Also, (a) you're allowed to permute the case of letters, exploding the search space even more, which is handy because (b) I presume the target SHA1 will come from words not on the word list, making an exact match unlikely (hence scoring based on Hamming distance).
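For what it's worth, enumerating the case permutations of a word is a few lines; here's a minimal sketch (my own, not from anyone's submission):

```python
from itertools import product

def case_variants(word):
    # one (lower, upper) choice per alphabetic char, then the cross product
    choices = [(c.lower(), c.upper()) if c.isalpha() else (c,) for c in word]
    for combo in product(*choices):
        yield ''.join(combo)

print(list(case_variants("ab")))  # ['ab', 'aB', 'Ab', 'AB']
```

A word of n letters expands to 2^n variants, which is exactly why this explodes the search space.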
Anyway thanks for the code examples, I'm pretty sure I can learn something from them.
Yes, you get 500 times the attempts in the 30-hour period. But if you run the figures, it's something like a 0.00000001% difference in the amount of keyspace you can cover :)
AKA just run it on a couple of PCs and hope you get lucky!
I had very similar thinking, but decided against it because of usage policies. :-p
Be careful. This is not exactly legal, and it could hurt you. Make sure you also throw some legitimate computational work at the clusters so you at least have an alibi.
Note that it doesn't append any random chars to the end of the string and doesn't change the case of the words.
I've done mine in Pike, which will be faster (though not as fast as C).
My thought is that there are so many possibilities and so few CPU cycles available to me that making random guesses is a better way to approach it. Kinda like winning the lottery, except harder.
Personally I'd run through a thousand bucks of computer time to play around with something or test a hunch before considering a lotto ticket.
Lotto tickets are a tax on people who aren't good at mathematics.
Intel(R) Xeon(R) CPU 5148 @ 2.33GHz gives me 300k/sec.
Anyone else have a quick & dirty benchmark?
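Here's a quick & dirty one in Python if anyone wants a baseline (pure hashlib on a single core; a rough sketch, your numbers will obviously differ):

```python
import hashlib
import time

def bench_sha1(seconds=1.0):
    # hash an incrementing counter for ~`seconds`, return hashes/sec
    count = 0
    start = time.perf_counter()
    while time.perf_counter() - start < seconds:
        hashlib.sha1(str(count).encode()).digest()
        count += 1
    return count / (time.perf_counter() - start)

print(f"{bench_sha1():.0f} SHA1/sec")
```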
It's a brutally simple algorithm, really: XOR + bitcount, where the counting loop only iterates once per 1 bit, so the lower the total Hamming distance, the faster it runs.
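That's Kernighan's bit-counting trick: `x &= x - 1` clears the lowest set bit, so the loop runs exactly once per 1 bit. A Python rendering of the idea (the C original works on unsigned long blocks instead of a big int):

```python
def popcount_kernighan(x: int) -> int:
    # x &= x - 1 clears the lowest set bit, so we loop once per 1 bit
    count = 0
    while x:
        x &= x - 1
        count += 1
    return count

print(popcount_kernighan(0b1011))  # 3
```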
It's in C, so everything works in unsigned long blocks. Compiled with -O3 it's:
for 1M each on two forked processes, and 80 lines of code.
With the base string hash pulled out of the loop, it's looking more like 1.7M SHA1 + HD calcs/core/sec.
Of course my main calculation is only one line:
$hamming_dist = unpack("%160b*", sha1($attempt_phrase) ^ $challenge_sha);
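(That's Perl: the `%160b*` checksum template sums the 1 bits of the XORed digests.) A rough Python equivalent of the same one-liner, assuming `challenge_sha` holds the raw 20-byte target digest:

```python
import hashlib

def hamming_dist(attempt_phrase: bytes, challenge_sha: bytes) -> int:
    # SHA1 the candidate, XOR against the target digest, count the 1 bits
    digest = hashlib.sha1(attempt_phrase).digest()
    xored = int.from_bytes(digest, 'big') ^ int.from_bytes(challenge_sha, 'big')
    return bin(xored).count('1')

target = hashlib.sha1(b'hello world').digest()
print(hamming_dist(b'hello world', target))  # 0: exact match
```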
The multiprocessing package offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads.
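A minimal sketch of fanning the hashing out across subprocesses with `multiprocessing.Pool` (the candidate phrases here are made up for illustration):

```python
import hashlib
from multiprocessing import Pool

def sha1_hex(phrase: str) -> str:
    # each call runs in a worker subprocess, so the GIL never serializes them
    return hashlib.sha1(phrase.encode()).hexdigest()

if __name__ == '__main__':
    phrases = ['correct horse', 'battery staple', 'hunter2']  # hypothetical candidates
    with Pool(processes=2) as pool:
        digests = pool.map(sha1_hex, phrases)
    print(len(digests))  # 3
```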