

Engine Yard Programming Contest - Win an iPhone 3GS PLUS $2k in Cloud Credit - wifelette
http://www.engineyard.com/blog/2009/programming-contest-win-iphone-3gs-2k-cloud-credit/
We’re kicking off a programming contest today that is sure to challenge even the most comp-sci heavy engineers out there, and we’re excited to see what you all come up with. With the difficulty of the challenge in mind, we’ve got some great prizes for the winner: an iPhone 3GS AND $2,000 of Cloud (Flex or Solo) credit.

You must tweet a sequence of twelve words that when hashed is bit-wise closest to a hash of a challenge phrase that we will announce the morning of July 20th. All words must be from a 1,000 word dictionary we will provide at that same time. You are allowed to append up to five random characters to the end of your entry. We’re pretty confident you’ll want to write a program to automate the finding of close matches, so announcing this a week in advance should give you enough time to get your programs up and running.

See the blog post for more details, rules and prizes!
======
brianliu
Hey, if you guys are working on this, check out these papers on SHA1
collisions:

generated in 2^39 perms -
<http://www.cs.cmu.edu/~dbrumley/srg/spring06/sha-0.pdf>

generated in 2^52 perms - <http://eprint.iacr.org/2009/259.pdf>

Since it would take an average supercomputer 5-6 days to generate the
collision, I think I'm going to give the contest a rest (generating something
close isn't useful outside the contest).

Another way to go about this is to sort huge rainbow tables by distance. A few
reverse hash lookup sites have already built such tables
(<http://md5.rednoize.com/>, <http://hashcrack.com>, <http://sha1-lookup.com>).
You'll probably need to contact them directly or hurry up and make your
own.

Here's the SHA1 algorithm itself for anyone interested: (Google Cache)
[http://74.125.95.132/search?q=cache:QjZpAaNXJr0J:https://www...](http://74.125.95.132/search?q=cache:QjZpAaNXJr0J:https://www.reverse-engineering.net/viewtopic.php%3Ff%3D6%26t%3D1208+reverse+sha-1&cd=10&hl=en&ct=clnk&gl=us)

~~~
moonpolysoft
Those papers reference a collision attack, which is not the same thing as the
point of this contest. The papers describe ways to generate pairs of messages
that will hash to the same value.

The contest describes a preimage attack, where one has a hash value and must
guess an input which would have generated that hash. SHA-1 has no known
preimage vulnerabilities, so the contest winner will either have to be lucky
or break SHA-1.

The best advice would probably be to iterate the keyspace in a novel way and
hope you happen upon a good match that someone else doesn't get.

------
MicahWedemeyer
I'm glad to see it's a fun challenge and not a "code an app for our platform"
type contest. Those have really started to annoy me as they seem very close to
spec work.

------
akeefer
I might suggest that there's some irony in suggesting that one use Ruby to
write an algorithm that's going to basically be 100% compute bound . . . seems
like this contest is a classic example of exactly the sort of problem where
you do, in fact, care about raw language performance.

~~~
gamache
I don't see anywhere on the page where they suggest Ruby. As you mention, it
is a terrible choice.

~~~
akeefer
Well, the example they give includes the words Ruby, DHH, active, record,
rspec, mongrel, jruby, and rubinius as the potential dictionary, and they
close the post with "The Ruby community is nothing if not persistent, creative
and intelligent — so show us what you’ve got!" So I'm kind of assuming the
contest is aimed at the Ruby community.

~~~
ezmobius
feel free to write the code in any language or platform you want, we don't
care.

------
luccastera
This is smart. After the contest, anytime you search for one of the terms in
their dictionary on search.twitter.com, then @engineyard will show up
(assuming there are a lot of entries).

All they have to do is make the dictionary include terms that potential
engineyard customers might search for on Twitter.

~~~
johns
Not exactly 'anytime' since Twitter search history doesn't go back that far. I
can't find the link now, but I'm pretty sure it only searches up to about 30
days back.

------
brown9-2
Just to be clear, there is probably no realistic way of winning this without
having lots and lots and lots of computing resources available to you once the
dictionary is released, is there?

Since you need to choose a twelve word permutation from a possible set of one
thousand words, plus append a 0-5 character random string, calculate the SHA1
hash for all possible permutations, and calculate the Hamming distance of each
to the hash of the challenge phrase... correct?

    
    
      1000! / 988! permutations of dictionary words * 36^5 permutations of random chars
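
The scoring step described above (hash a candidate, then take the Hamming distance to the hash of the challenge phrase) can be sketched in a few lines of Python. This is only an illustration: the helper names `sha1_int` and `hamming_distance` are made up here, and UTF-8 encoding of the phrase is an assumption, not something the contest rules in this thread specify.

```python
import hashlib


def sha1_int(text: str) -> int:
    """Return the SHA-1 digest of text (UTF-8 encoded, an assumption) as a 160-bit integer."""
    return int.from_bytes(hashlib.sha1(text.encode("utf-8")).digest(), "big")


def hamming_distance(a: str, b: str) -> int:
    """Count the bit positions in which the SHA-1 digests of a and b differ."""
    return bin(sha1_int(a) ^ sha1_int(b)).count("1")


if __name__ == "__main__":
    challenge = "I am not a big believer in fortune telling"
    candidate = "ruby ruby ruby ruby ruby ruby ruby ruby ruby ruby ruby ruby"
    print(hamming_distance(challenge, candidate), "of 160 bits differ")
```

XOR leaves a 1 in every position where the two digests disagree, so counting the 1 bits gives the distance directly; a lower score is a closer match.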

~~~
timmaah
What about..

"You may permute capitalization for the dictionary words (i.e. you may use
Ruby, rUby, RUBY, and RUBy)"
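
To get a feel for how fast the capitalization rule multiplies the candidates, here is a small sketch that enumerates every upper/lower-case variant of one word. The function name `capitalizations` is hypothetical; each letter contributes a factor of two, so an n-letter word has 2^n variants.

```python
from itertools import product


def capitalizations(word: str):
    """Yield every upper/lower-case variant of word (non-letters are left alone)."""
    choices = [
        (ch.lower(), ch.upper()) if ch.isalpha() else (ch,)
        for ch in word
    ]
    for combo in product(*choices):
        yield "".join(combo)


if __name__ == "__main__":
    variants = list(capitalizations("ruby"))
    # A four-letter word has 2**4 variants, including Ruby, rUby, RUBY and RUBy.
    print(len(variants), variants)
```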

~~~
brown9-2
Forgot to consider that; along with the fact that the 5 random characters can
be any printable ASCII characters, not just alphanumeric.

So this is somewhere upwards of 10^50 phrases to hash (in a naive
implementation), right?
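
A rough back-of-the-envelope check of that estimate, as a sketch. The assumptions are loud ones: no repeated words, 95 printable ASCII characters available for the suffix, a suffix of zero to five characters, and a guessed average word length of six letters (the real dictionary had not been published when this was discussed).

```python
import math

DICT_SIZE = 1000      # words in the dictionary
WORDS = 12            # words per entry
PRINTABLE = 95        # printable ASCII characters, assumed all allowed in the suffix
MAX_SUFFIX = 5        # up to five appended characters
AVG_WORD_LEN = 6      # assumption: average letters per word

# Ordered selections of 12 distinct words: 1000! / 988!
word_orders = math.perm(DICT_SIZE, WORDS)

# Suffixes of every length from 0 to 5 characters.
suffixes = sum(PRINTABLE ** k for k in range(MAX_SUFFIX + 1))

# Each letter can be upper or lower case.
capitalizations = 2 ** (AVG_WORD_LEN * WORDS)

without_caps = word_orders * suffixes
with_caps = without_caps * capitalizations

print(f"without capitalization: about 10^{math.log10(without_caps):.1f}")
print(f"with capitalization:    about 10^{math.log10(with_caps):.1f}")
```

Under these assumptions the count lands below 10^50 before capitalization is considered and well above it afterwards, so "upwards of 10^50" holds once the capitalization rule is included.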

~~~
sophacles
Well, are you also including the fact that there is nothing forbidding
repeated words?

~~~
jcl
While this has a marginal effect on the total number of possible hashes,
repeated words are still an important consideration: If you can guess even one
word that is likely to appear in their 1000-word dictionary, you can
precompute the hashes for all possible variants of the corresponding phrase
(that word repeated twelve times) prior to the contest, giving you a jump on
other contestants.

I'd guess that we'll either see the rules revised to prevent this or see a
dictionary filled with non-dictionary words.

------
jah
IMO, this seems to be more of a "lottery for nerds" than a programming
contest.

------
ncarlson
Here's some math:

[http://www38.wolframalpha.com/input/?i=%281000+choose+12%29+...](http://www38.wolframalpha.com/input/?i=%281000+choose+12%29+*+%282^72%29+*+%2895+choose+5%29)

The real killer is the allowed capitalization of any letter. If we assume that
the average word length will be 6 characters, then we can assume that the
average character length of any 12 word string will be 72 (not counting
spaces).

With capitalization, the permutations act like a bit string. That is, the
capitalization is either on or off. This gives us 2^72.

The resulting hash space is somewhere near (1000 choose 12) * (2^72) * (95
choose 5), or roughly 2^188.
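
The arithmetic can be reproduced with Python's exact integers. This sketch evaluates the expression exactly as written above, then, as a comparison that is not part of the original estimate, an ordered variant: word order and suffix order both change the hash, so permutations are arguably the more natural count.

```python
import math

CAPITALIZATIONS = 2 ** 72  # 12 words * 6 letters, each letter upper or lower case

# The estimate as written: unordered word choices and unordered suffix characters.
as_written = math.comb(1000, 12) * CAPITALIZATIONS * math.comb(95, 5)

# Comparison (an addition, not from the comment): ordered words, ordered suffix.
ordered = math.perm(1000, 12) * CAPITALIZATIONS * 95 ** 5

print(f"as written: 2^{math.log2(as_written):.1f}")
print(f"ordered:    2^{math.log2(ordered):.1f}")
```

The first figure comes out a little above 2^188, matching the estimate. Either count is larger than the 2^160 possible SHA-1 outputs.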

------
edawerd
Engine Yard seems to suggest that renting cloud computing resources is the key
to win this contest, but I'd suggest using parallel computing resources to
anyone doing this. If you have an Nvidia CUDA-enabled GPU lying around, this
might be perfect for you.

------
sanj
Any tips for walking to nearby hashes from HN's crypto-gurus?

~~~
jrockway
This is the sort of thing that hash functions are designed to avoid.

Since getting closer words does not get you a closer hash, usual optimization
algorithms (like genetic algorithms, simulated annealing, etc.) are not going
to be helpful.

Perhaps I'm missing something, but I think brute force and luck are going to
win this contest, assuming that SHA1 is not broken between now and July 20th.
If you want an iPhone 3G S, you should probably just head over to the Apple
store and buy one and spend your valuable time on something else.
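
The point that nearby inputs do not give nearby hashes is easy to see empirically. The sketch below changes one character at a time in the example phrase and measures how many of the 160 output bits flip; for a well-behaved hash the average should sit near 80. The helper names and the choice of replacement letters are arbitrary illustration choices.

```python
import hashlib


def sha1_int(text: str) -> int:
    """SHA-1 digest of text (UTF-8 encoded) as a 160-bit integer."""
    return int.from_bytes(hashlib.sha1(text.encode("utf-8")).digest(), "big")


def bit_distance(a: str, b: str) -> int:
    """Number of differing bits between the SHA-1 digests of a and b."""
    return bin(sha1_int(a) ^ sha1_int(b)).count("1")


base = "I am not a big believer in fortune telling"

# Every phrase that differs from base in exactly one character position.
variants = [
    base[:i] + ch + base[i + 1:]
    for i in range(len(base))
    for ch in "xyz"
    if base[i] != ch
]

distances = [bit_distance(base, v) for v in variants]
average = sum(distances) / len(distances)

print(f"{len(variants)} one-character edits: min {min(distances)}, "
      f"mean {average:.1f}, max {max(distances)} of 160 bits")
```

Because a one-character edit scrambles roughly half the bits, a hill-climbing step has no gradient to follow, which is why genetic algorithms and simulated annealing offer no advantage over blind sampling here.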

~~~
sanj
I'm aware that this is what hashes are supposed to avoid. However, many crypto
attacks seem to build on weaknesses in their design.

Hence my question.

~~~
javanix
If someone had come up with something like you suggested it would have been a
lot bigger news and this "contest" really wouldn't have happened.

------
gojomo
It's like Swoopo for compute cycles!

------
callmeed
I'm definitely not great at this kind of stuff, but I'll give it a try ...

Check out the Amatch ruby gem ... it can compute the hamming distance ...
might be easy to automate with ruby ...

<http://amatch.rubyforge.org/doc/index.html>

~~~
potatolicious
Except doing this through Ruby is a guaranteed bet to not get done in time.

~~~
sho
I agree that the gem mentioned might not be suitable, but Ruby could still
have a part to play, especially as it allows direct use of native libraries,
or just managing processes running on the OS.

Someone doing this should probably write the fast-spinning cogs in C, but it
might be worth using Ruby to manage threads, processes, data segments, etc -
indeed, maybe prototype the C code there first too. You'd gain an awful lot of
convenience for very little overhead.

------
jpwagner
Anyone able to break 10 using the example they gave?

    
    
      "I am not a big believer in fortune telling"

~~~
sunir
10 is a stretch. After a couple hours of computing several hundreds of
millions of attempts, the best I achieved was a distance of 43 bits (assuming
duplicate words are acceptable).

jruby rubinius MRI Cloud postgresSQL record MRI exception record mongrel tokyo
Cloud

This task is so unbalanced as a contest. The thought of spending a couple
hundred dollars of cloud computing time to just buy more 'lottery tickets' for
$3000 worth of prizes (assuming you could use the cloud time) is too much of a
stretch for me.

~~~
0wned
10 is a _huge_ stretch. My testing in C++ agrees with you. The lowest score
I've gotten is 41 after about 12 hours. Sure, I may get lucky and get one in
the high 30's or something, but it would just be blind luck and persistence,
nothing more. Not a very cool contest IMO.

------
icey
Cool contest, but starting it on a Monday morning sucks pretty badly.

~~~
profquail
You should be coding it now, you know what all of the rules are. Come Monday,
you would just plug in the desired hash value and word dictionary and let it
run for the 24 hours of the contest.

I think I'm going to try to put something together this weekend just for the
heck of it. It would be cool if they published the code (with permission) of
the top submitters, just to see how they went about solving it.

~~~
icey
If you can seriously write software and not have to tweak it to run as you go
for this type of challenge, then my hat is off to you. Unfortunately, I'm a
mere mortal and would most likely need to make changes once I see the "real"
data / dictionary.

~~~
profquail
You could make your own word dictionary, pick 12 random words (and append a
few random characters to the end of the string) and come up with your own
hash.

Now that you have the hash and the word dictionary, you "forget" that you know
the real string, and you set your program to try to find it.

At least that would make sure there weren't any major bugs in it (logic bugs
not included) before you tried to solve the real problem. Since you only have
24 hours to crack it, every extra second will count ;)
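
A dry run along those lines might look like the following sketch. Everything in it is a stand-in: the dictionary is 1,000 random six-letter strings, the suffix is alphanumeric and separated by a space (the real formatting rules may differ), and the search is plain random sampling rather than anyone's actual contest strategy.

```python
import hashlib
import random
import string


def sha1_int(text: str) -> int:
    """SHA-1 digest of text (UTF-8 encoded) as a 160-bit integer."""
    return int.from_bytes(hashlib.sha1(text.encode("utf-8")).digest(), "big")


def distance(phrase: str, target: int) -> int:
    """Hamming distance between the hash of phrase and the target hash."""
    return bin(sha1_int(phrase) ^ target).count("1")


def make_dictionary(rng: random.Random, size: int = 1000, length: int = 6) -> list:
    """Build a stand-in dictionary of distinct random lowercase words."""
    words = set()
    while len(words) < size:
        words.add("".join(rng.choice(string.ascii_lowercase) for _ in range(length)))
    return sorted(words)  # sorted so runs with the same seed are reproducible


def random_phrase(rng: random.Random, words: list, count: int = 12, max_suffix: int = 5) -> str:
    """Pick `count` words (repeats allowed) plus an optional short random suffix."""
    phrase = " ".join(rng.choice(words) for _ in range(count))
    suffix_len = rng.randint(0, max_suffix)
    if suffix_len:
        alphabet = string.ascii_letters + string.digits
        phrase += " " + "".join(rng.choice(alphabet) for _ in range(suffix_len))
    return phrase


def search(rng: random.Random, words: list, target: int, tries: int):
    """Sample random phrases and keep the one whose hash is closest to target."""
    best_phrase, best = None, 161  # 161 is worse than any real distance
    for _ in range(tries):
        candidate = random_phrase(rng, words)
        d = distance(candidate, target)
        if d < best:
            best_phrase, best = candidate, d
    return best_phrase, best


rng = random.Random(2009)
words = make_dictionary(rng)
secret = random_phrase(rng, words)   # the phrase we then "forget"
target = sha1_int(secret)

best_phrase, best = search(rng, words, target, tries=20000)
print(f"best distance after 20000 tries: {best} bits")
print(best_phrase)
```

A run like this will not recover the secret phrase, but it does exercise the hashing, scoring and bookkeeping code end to end, which is the point of the rehearsal.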

~~~
icey
I don't think we're talking about the same issue.

Of course you'd have to write the application before the 20th. My point is
that for people with day jobs, Mondays are almost always the busiest day of
the week so I'm sure that will preclude a number of people from participating.

------
oliverkofoed
Is this a marketing or a recruitment exercise? Or both? :-)

~~~
profquail
At least marketing, perhaps a bit of recruiting if you come up with a clever
solution. If they aren't hiring, perhaps someone else who is will see your name
up on the leaderboard.

