

Engineyard Challenge Python Code - bcl
http://fpaste.org/paste/18766

======
ErrantX
I don't wish to be, umm, mean, but I know of at least 3 companies (ours
included) that could run this in a few minutes.

this is my code (using mighty, our python based cluster service):

      import mighty

      word_generator = mighty.word_generator("wordlist", (mighty.APPEND_NUMERALS, 5))
      phrase_generator = mighty.word_concatenate(word_generator, ' ')

      job = mighty.match_to_hash('PROVIDEDHASH', mighty.SHA1, phrase_generator.iterate_all)
      job.priority = mighty.CRITICAL
      mighty.submit_job(job)
    

Done. Takes about 5 minutes (I won't enter - it's not fair).

EDIT: added CRITICAL as the priority or it takes longer :)

EDIT: ok so the code is a bit the wrong side of the tracks. It wouldn't take
long to refactor it :)

~~~
decode
This isn't actually correct, is it? Assuming an average word length of 6
letters (so 2^6 case variants per word), a 1,000-word list, 12 words per
phrase, and 5 trailing characters drawn from 94 printables, there are about
10^67 valid combinations that your iterate_all() can go through:
(((2^6) * 1000) ^ 12) * (94 ^ 5) ≈ 3.5 * 10^67

Since this is much larger than the total hash space (2^160 ≈ 1.5 * 10^48), you
can expect a collision after a number of tries equal to about half of the hash
space (10^48 / 2). That means you'd have to generate roughly 10^47 hashes. At 5
minutes of runtime, that works out to about 10^45 hashes per second.

Bottom line: even if you could run this on 200,000 cores, you'd have to
generate 10^40 hashes per second per core, and that seems impossible to me.
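
The arithmetic above is easy to verify directly (a quick sketch using the
same assumptions: a 1,000-word list, 12-word phrases, 2^6 case variants per
6-letter word, and 5 trailing characters from 94 printables):

```python
# Sanity-check of the keyspace estimate above (same assumptions:
# 1,000-word list, 12 words per phrase, 2^6 case variants per
# 6-letter word, 5 trailing chars from 94 printable characters).
keyspace = ((2 ** 6) * 1000) ** 12 * 94 ** 5
print(f"keyspace: {keyspace:.1e}")         # ~3.5e67 combinations

hash_space = 2 ** 160                      # ~1.5e48 possible SHA1 digests
hashes_needed = hash_space // 2            # expected tries for a collision
per_second = hashes_needed / (5 * 60)      # spread over a 5-minute run
print(f"required rate: {per_second:.1e}")  # ~2.4e45 hashes/sec
```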

~~~
ErrantX
Weeell, we have 20,023 computers (that fluctuates a tad). That's probably
about 30,000 cores in the end.

I can't iterate the whole lot - but I can cover a substantially larger portion
of the keyspace in 5 minutes than most could in the 30 hours. I bet 5 minutes
will easily get me damn close :)

~~~
decode
I'm seeing about 1 million hashes per second on a single core. So you can do
about 10^11 hashes per second on 30,000 cores. At that rate, it will only take
you on the order of 10^49 years to cover the keyspace.

You won't even cover a fraction of a percent of the keyspace before the sun
burns out.
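
A back-of-the-envelope check of that timescale (assuming the ~3.5e67-phrase
keyspace and the per-core rate above; the exact exponent moves with the
assumptions, but the conclusion doesn't):

```python
# Time to enumerate the whole ~3.5e67-phrase keyspace at
# ~1e6 hashes/sec/core across 30,000 cores (thread estimates).
keyspace = 3.5e67
rate = 1e6 * 30_000                  # ~3e10 hashes/sec for the cluster
seconds = keyspace / rate
years = seconds / (365.25 * 24 * 3600)
print(f"{years:.1e} years")          # on the order of 1e49 years
```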

------
herrherr
Theoretically I have around 500 CPUs at 3 GHz each and 2,000 GB of RAM at my
disposal. Maybe I should give it a try. Hint: university cluster =)

Anyway thanks for the code examples, I'm pretty sure I can learn something
from them.

~~~
ErrantX
500 cores doesn't really give you a higher chance of hitting a good
combination than someone with one PC.

Yes, you get 500 times the attempts in the 30-hour period. But if you run the
figures, either way you cover such a vanishingly small fraction of the
keyspace that the difference doesn't matter :)

AKA just run it on a couple of PCs and hope you get lucky!
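
Running those figures (a rough sketch, assuming ~1e6 hashes/sec per core, the
rate reported elsewhere in the thread, against the ~3.5e67-phrase keyspace):

```python
# Fraction of the ~3.5e67-phrase keyspace covered in the 30-hour
# contest window, assuming ~1e6 hashes/sec per core.
keyspace = 3.5e67
window = 30 * 3600                 # seconds in the 30-hour window

for cores in (2, 500):
    covered = cores * 1e6 * window
    print(f"{cores:>3} cores: {covered / keyspace:.1e} of the keyspace")
```

Both fractions come out far below 10^-53, so the extra cores change the odds
by an amount too small to matter.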

------
bcl
Here's a first pass at a python implementation for the Engineyard Challenge
(<http://www.engineyard.com/blog/2009/programming-contest-win-iphone-3gs-2k-cloud-credit/>).

Note that it doesn't do any random chars at the end of the string and doesn't
change the case of the words.
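
The core loop of such a generator might look like this (an illustrative
sketch, not bcl's code; the word list and phrase length are stand-ins, and,
like the original, it skips the random suffix and case variations):

```python
import hashlib
import itertools

# Stand-in word list; the contest supplied a 1,000-word dictionary.
WORDS = ["alpha", "bravo", "charlie", "delta"]

def candidates(words, phrase_len=2):
    """Yield (phrase, SHA1 digest) for every ordered word combination."""
    for combo in itertools.product(words, repeat=phrase_len):
        phrase = " ".join(combo)
        yield phrase, hashlib.sha1(phrase.encode()).digest()

# Peek at the first few candidates.
for phrase, digest in itertools.islice(candidates(WORDS), 3):
    print(phrase, digest.hex()[:12])
```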

------
jamesotron
Ahh, I see that you're generating an entirely new random string each
iteration. Interesting.

I've done mine in Pike, which will be faster (though not as fast as C).

<http://github.com/jamesotron/HashChallenge/tree/master>

~~~
ncarlson
I've started mine in Erlang. I'm going to run 10 dollars' worth of EC2 time
and see how far I get.

~~~
rottencupcakes
I've got mine in C. I probably won't run it; I just wrote it for fun.
Honestly, if you were thinking of spending a cent on this competition, I think
you'd get a better ROI buying a SuperLOTTO ticket.

~~~
jacquesm
Lotto tickets aren't nearly as much fun as coding, so what's another 10 bucks.

Personally I'd run through a thousand bucks of computer time to play around
with something or test a hunch before considering a lotto ticket.

Lotto tickets are a tax on people who are not good at mathematics.

~~~
ighost
or they just play for fun

~~~
eru
Alas, being good in mathematics spoils the fun.

------
matta
I'm getting about 250k SHA1 + Hamming distance calculations per core per
second, on an Intel(R) Xeon(R) CPU 3050 @ 2.13GHz (from /proc/cpuinfo). This
is a full implementation... I can just drop in the word list and challenge
phrase on game day.

Intel(R) Xeon(R) CPU 5148 @ 2.33GHz gives me 300k/sec.

Anyone else have a quick & dirty benchmark?

~~~
cyberfox
I'm getting more than 1M SHA1 + Hamming distance calcs/core/sec on an Intel(R)
Core(TM)2 Duo CPU E8400 @ 3.00GHz, and I haven't really optimized beyond the
obvious.

It's a brutally simple algorithm, really: XOR + bitcount, which iterates once
per 1 bit, so the lower the total Hamming distance, the faster it finishes.

It's in C, so everything works in unsigned long blocks. Compiled with -O3
it's:

real 0m0.968s

for 1M each on two forked processes, and 80 lines of code.
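
The XOR + bitcount step described above is small enough to sketch. This is a
guess at the shape of it in Python rather than cyberfox's C, using
Kernighan's n & (n - 1) trick so the loop runs once per set bit:

```python
import hashlib

def hamming_distance(digest_a: bytes, digest_b: bytes) -> int:
    """Number of differing bits between two equal-length digests."""
    n = int.from_bytes(digest_a, "big") ^ int.from_bytes(digest_b, "big")
    dist = 0
    while n:
        n &= n - 1        # clears the lowest set bit: one pass per 1 bit,
        dist += 1         # so small distances finish fastest
    return dist

a = hashlib.sha1(b"hello").digest()
b = hashlib.sha1(b"world").digest()
print(hamming_distance(a, b))     # some value between 0 and 160
```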

~~~
cyberfox
Oooog. Boy is my face red. I was recalculating the main SHA1 each time through
the loop. A minor refactoring mistake with a major impact...

With the base string hash pulled out of the loop, it's looking more like 1.7M
SHA1 + HD calcs/core/sec.

~~~
matta
Yeah, mine's in Perl and running on a slower processor, so the difference
feels about right.

Of course my main calculation is only one line:

    $hamming_dist = unpack("%160b*", sha1($attempt_phrase) ^ $challenge_sha);

:)

------
jamesotron
I found this academic paper on finding collisions in SHA1, but unfortunately
I'm not smart enough to figure out what it's telling me:
<http://people.csail.mit.edu/yiqun/SHA1AttackProceedingVersion.pdf>

------
ighost
Is this supposed to run on Unladen Swallow? Or does the GIL not come into play
here?

~~~
danielh
It uses the multiprocessing package, which is new in Python 2.6. From the
docs:

 _The multiprocessing package offers both local and remote concurrency,
effectively side-stepping the Global Interpreter Lock by using subprocesses
instead of threads._ <http://docs.python.org/library/multiprocessing.html>

~~~
bcl
multiprocessing is also available for Python 2.5 (which is what I used) as a
backport; just do:

    easy_install multiprocessing
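
Roughly how that looks on a modern Python 3 (an illustrative sketch, not
bcl's actual code): each worker process has its own interpreter and its own
GIL, so the CPU-bound hashing runs in parallel.

```python
import hashlib
from multiprocessing import Pool

def sha1_hex(phrase):
    # CPU-bound work; runs inside a separate worker process.
    return hashlib.sha1(phrase.encode()).hexdigest()

if __name__ == "__main__":
    phrases = ["one two three", "four five six", "seven eight nine"]
    with Pool(processes=2) as pool:
        # pool.map fans the phrases out across the worker processes.
        for phrase, digest in zip(phrases, pool.map(sha1_hex, phrases)):
            print(phrase, "->", digest[:12])
```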

------
sync
And here's a probably slower, brute-force Perl variety:
<http://gist.github.com/149173>

