

Engineyard SHA Contest Leaderboard I threw together - jazzychad
http://jazzychad.com/engineyard/

======
mrduncan
It looks like "hashbreaker" and "CodingCrypto" are both running through nearly
the exact same word sets. Either they are from two different people using the
exact same algorithm or (what I suspect is more likely) they have created
multiple twitter accounts to get around the submission limit.

    
    
      hashbreaker
      bUGs BUgs bUGS bUgS bLAnk BlAnK BlANK BLank buGS bugS bUGS bUGs dCao2
    
      CodingCrypto
      BugS BUgs buGs BUgs bLANK Blank bLanK BlAnk BuGs BuGs BUGS bUGs FcBkw
    
      CodingCrypto
      cOWs cOwS cOwS cowS DIRtY Dirty DiRTY DIrtY cOWS Cows CoWs Cows owjo0
    
      hashbreaker
      CoWS coWS CowS cOWS DiRTY DIrtY DirTy DIRtY COWS COWS cOwS cOWS bvpDq
    
    

Edit: Also, both accounts posted their first tweets on the 19th.

~~~
DocSavage
I didn't scrutinize the SHA-1 algorithm, but is it possible to optimize its
computation (e.g., precompute intermediates) when using a small number of
repeated words as input? This might explain why a group of people all arrived
at similar solutions, since it only matters how quickly you can guess if all
guesses are pretty much equal.

I lucked out and got a solution at 35. Coded something up in python and only
got about 85k hashes/sec on one core. The CUDA program, though, was about 100x
faster on my low-end Nvidia 8400 GS.

~~~
gojomo
Yes: very roughly, you could precompute the internal state after 'abcdefg' --
essentially checkpointing the algorithm -- and then compute SHA1('abcdefgX'),
SHA1('abcdefgY'), etc. more easily than other strings. (In fact, you'd want
your checkpoint aligned at some multiple -- maybe 512 bits? -- but that's the
general idea.)
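
To make the checkpointing idea concrete, here is a minimal sketch using Python's `hashlib`, whose hash objects have a `copy()` method that clones the internal state. The prefix `abcdefg` is taken from the example above; the helper name `sha1_with_suffix` is made up for illustration. Note that `copy()` works at any prefix length, but the compression work is only actually saved for complete 64-byte (512-bit) blocks of the prefix.

```python
import hashlib

# Fixed prefix from the example above; any bytes would do.
prefix = b"abcdefg"

# Feed the prefix in once and keep the resulting state as a checkpoint.
checkpoint = hashlib.sha1(prefix)


def sha1_with_suffix(suffix: bytes) -> str:
    """Finish the hash from a copy of the checkpoint, leaving it reusable."""
    h = checkpoint.copy()
    h.update(suffix)
    return h.hexdigest()


# The checkpointed result matches hashing the whole string from scratch.
for suffix in (b"X", b"Y"):
    assert sha1_with_suffix(suffix) == hashlib.sha1(prefix + suffix).hexdigest()
```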

I figured the 'final 5 arbitrary characters' was a nod to this approach. So
the best strategy I could think of -- and I didn't try coding anything up for
the contest -- was to (1) guess the word dictionary; (2) precompute a whole
bunch of checkpointed-except-for-last-five-characters function states; (3) at
the moment the real dictionary and target was announced, try many different
last-five-character extensions against any valid precomputed states.
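
A rough sketch of that three-step strategy, not something anyone in the contest is known to have run. It assumes entries are scored by the number of differing bits between their SHA-1 and the target's (Hamming distance). The target phrase, the guessed word phrases, and the suffix alphabet are all hypothetical stand-ins, and only the first 2000 suffixes are tried to keep it quick.

```python
import hashlib
import string
from itertools import islice, product

# Assumed suffix alphabet; the real rules may have allowed other characters.
ALPHABET = string.ascii_letters + string.digits


def hamming(a: bytes, b: bytes) -> int:
    """Number of bit positions in which two equal-length digests differ."""
    return bin(int.from_bytes(a, "big") ^ int.from_bytes(b, "big")).count("1")


def best_suffix(prefix_state, target: bytes, suffixes):
    """Try each suffix on one precomputed state; return (distance, suffix)."""
    best = (161, "")  # worse than any real SHA-1 distance (maximum is 160)
    for suffix in suffixes:
        h = prefix_state.copy()  # reuse the checkpoint, don't rehash the words
        h.update(suffix.encode())
        best = min(best, (hamming(h.digest(), target), suffix))
    return best


# Steps 1 and 2: guess some word phrases and precompute their states.
# These phrases are hypothetical placeholders.
guessed_phrases = ["cows dirty cows ", "bugs blank bugs "]
states = {p: hashlib.sha1(p.encode()) for p in guessed_phrases}

# Step 3: once the target is known, extend every state with many suffixes.
target = hashlib.sha1(b"hypothetical target phrase").digest()
suffixes = ["".join(s) for s in islice(product(ALPHABET, repeat=5), 2000)]
for phrase, state in states.items():
    distance, suffix = best_suffix(state, target, suffixes)
    print(distance, phrase + suffix)
```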

------
brown9-2
Kind of interesting or funny that these entries are pretty close to the top of
the standings:

    
    
      31  	d241c4b4dd38aacb197c8e48e9f8488264e39a83  	
      bUGs BUgs bUGS bUgS bLAnk BlAnK BlANK BLank buGS bugS bUGS bUGs dCao2
     
      32  	334945bffdd1000f1d198e02497188a2b17f1991
      cOWs cOwS cOwS cowS DIRtY Dirty DiRTY DIrtY cOWS Cows CoWs Cows owjo0
    

(assuming the scores are correct)

------
theblackbox
Would the contest have been any easier had one made the (dubious) assumption
that the "passphrase" encoded would _likely_ be a valid sentence?

I've not tried any crypto work before but this has really grabbed my interest.
It just occurred to me however, that discarding lines of inquiry leading down
the "BuGS bugs bUgs BUGs BLAnK BlAnK....." route in favour of something that
made more sense grammatically could be a good tactic to save time?.... or it
could just increase the overhead in generating the next hash?...

~~~
eru
Unless you found the exact sequence, the difference between hashes won't tell
you how close you are.

~~~
theblackbox
I don't see what you mean by "the difference between hashes won't tell you how
close you are" .... isn't that the whole point of this competition?

Say for instance you coded four stages:

1/ Generating a list of strings for analysis
2/ Parsing that list for sentences that made sense
3/ Hashing these sentences
4/ Comparing the hashes generated with the one being sought

To me it just looked as though a lot of gibberish was analysed because there
is a lot more entropy than order to sift through and nobody tried step 2...
does my logic make sense or am I getting mixed up and not really seeing the
bigger picture?

~~~
jodrellblank
There's no benefit in trying step 2 because the gibberish input is as likely
to be a close match as a sentence that makes sense.

A sentence that starts 'Be' (01000010...) and the same sentence changed to
'He' (01001000...) would produce very different hash results because of the
Avalanche Effect ( <http://en.wikipedia.org/wiki/Avalanche_effect> ). So having
similar letters in a similar order or in similar positions, or using similar
length words, won't help.
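
A quick way to see this for yourself, as a sketch: the two sentences below are made up and differ only in the first letter ('B' is 0x42, 'H' is 0x48, so two input bits change). For a well-behaved hash you would expect roughly half of the 160 output bits to flip, though the exact count depends on the inputs.

```python
import hashlib


def bit_diff(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


# Hypothetical sentences that differ only in their first letter.
s1 = b"Be kind to your bugs"
s2 = b"He kind to your bugs"

input_changed = bit_diff(s1, s2)
output_changed = bit_diff(hashlib.sha1(s1).digest(), hashlib.sha1(s2).digest())

print("input bits changed: ", input_changed)
print("output bits changed:", output_changed, "of 160")
```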

 _Isn't that the whole point of this competition?_

The point was to get a hash that differs from their hash in as few bits as
possible (the smallest Hamming distance), not to guess their sentence (they
said what it was).

~~~
theblackbox
_The point was ... not to guess their sentence (they said what it was)._

Ahh, I missed that! That makes sense in terms of the Avalanche Effect - cheers
for the knowledge

------
0wned
Twitter accounts with "private" updates do not show up on the public
timeline... my three submissions are nowhere to be found and they do validate.

------
bcl
Good job! Could you add a timestamp and option to sort by it? (I know, feature
creep!)

ETA: I also find it interesting that the lowest score from those not using the
optional 5 random chars is currently 38. All the lower ones use random chars.

~~~
philfreo
Right now you're showing duplicate entries (the same entry of value 31 is
showing from two people)

~~~
jazzychad
That's because somebody just copied and pasted somebody else's entry. The
system is working as intended. I'm sure Engineyard will catch it in their
judging (or so I hope).

------
twism
The rules say you can "append up to 5 characters" but you consider entries
with fewer than 5 optional characters invalid.

~~~
jazzychad
No, there are several listed with less than 5 extra characters. It's just that
most people are using all 5.

~~~
rneufeld
My entry (rkneufeld) is cut off at the end. Actual suffix is: "3:OID". This
moved me from 34 to 68 :( . Just thought you might like to know

~~~
jazzychad
Apologies. I have corrected it.

