Great writeup, thanks! I'm confused by your conclusion, though: "I wish I would have identified Rackspace initially as the platform of choice."
From the results of the contest, I would have concluded that the platform of choice was CUDA. Do you think otherwise? Is there a reason you'd choose to use a flotilla of Rackspace servers for a problem like this rather than a graphics card? If you are going to be programming in C anyway, it seems like a much simpler approach.
If I were using just my own machine, or had spent a few grand to put together my own CUDA farm, CUDA would have been great.
I think using cloud resources was the way to go though, because it's easy to search for keys concurrently, and you can summon an effectively unlimited amount of computing power pretty cheaply. You can add as many cloud servers as you want to your cluster without worrying about locking or divvying up work, and the number of keys your cluster is able to check per second increases linearly with the number of machines.
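To illustrate why no locking or work division is needed, here is a minimal sketch of an independent random-search worker. Everything in it is an assumption made for illustration: the scoring function (Hamming distance between SHA-1 digests), the 16-character candidates, and the names `worker`, `hamming_distance`, and `ALPHABET` are hypothetical stand-ins, not the code anyone in this thread actually ran.

```python
import hashlib
import random
import string

# Hypothetical candidate alphabet; the real contest rules may differ.
ALPHABET = string.ascii_letters + string.digits


def hamming_distance(a: bytes, b: bytes) -> int:
    """Count the bits that differ between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def worker(seed: int, target: bytes, attempts: int):
    """Try `attempts` random candidates and return (best_score, best_candidate).

    Each worker owns its own seeded RNG and shares no state, so any number
    of workers can run on separate machines without coordinating.
    """
    rng = random.Random(seed)
    best_score = len(target) * 8 + 1  # worse than any achievable score
    best_candidate = ""
    for _ in range(attempts):
        candidate = "".join(rng.choice(ALPHABET) for _ in range(16))
        digest = hashlib.sha1(candidate.encode()).digest()
        score = hamming_distance(digest, target)
        if score < best_score:
            best_score, best_candidate = score, candidate
    return best_score, best_candidate


if __name__ == "__main__":
    target = hashlib.sha1(b"example target phrase").digest()
    # Stand-in for separate machines: each seed is one independent worker.
    results = [worker(seed, target, 2000) for seed in range(4)]
    print(min(results))
```

Because each worker only reports its own best result, adding a machine adds its full throughput to the total, which is the linear scaling described above. The trade-off is that random workers may occasionally retest the same candidate, which is negligible when the search space is far larger than the number of keys checked.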
The winning team, awesome guys with a cluster of 8 very powerful computers and a CUDA program, were able to check just over 400 million keys per second. If I had used 400 Rackspace cloud servers, it would have cost me about $100 to run them for the duration of the contest, and I would have achieved similar speeds.
I guess I feel that the CUDA solution pays for itself pretty quickly if you are going to be using it for the long term. Personally, I was checking 200 million keys per second on a $300 Nvidia card and got down to 34 (with at least three 35s).
Presuming your costs are accurate, this would imply that the card would be a cheaper solution if run for more than a week. Considering the ~100x speed advantage of the card, it's rather surprising that their cloud solution is as affordable as it is.
Uh-oh, I missed a decimal place. 3 billion/sec would mean about 3000 of the cheapest Rackspace cloud machines, which could have been operated for 18 hours for about $810. I'm not sure Rackspace lets you have that many, though.
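As a quick sanity check on those figures, here is a small sketch of the arithmetic. It uses only the numbers quoted above (3000 machines, 18 hours, about $810, 3 billion keys/sec); the per-machine hourly rate and per-machine throughput are derived from them rather than taken from any Rackspace price list, and `cluster_cost` is a hypothetical helper.

```python
# Figures quoted in the thread.
MACHINES = 3000
HOURS = 18
TOTAL_COST_USD = 810.0
TARGET_KEYS_PER_SEC = 3_000_000_000

# Derived values (assumptions implied by the figures, not official pricing).
rate_per_machine_hour = TOTAL_COST_USD / (MACHINES * HOURS)
keys_per_machine_per_sec = TARGET_KEYS_PER_SEC / MACHINES


def cluster_cost(machines: int, hours: float,
                 rate: float = rate_per_machine_hour) -> float:
    """Cost in USD of running `machines` servers for `hours` at `rate` per machine-hour."""
    return machines * hours * rate


if __name__ == "__main__":
    print(f"implied rate: ${rate_per_machine_hour:.3f} per machine-hour")
    print(f"implied throughput: {keys_per_machine_per_sec:,.0f} keys/sec per machine")
    print(f"400 machines for 18 hours: ${cluster_cost(400, HOURS):.2f}")
```

The implied rate works out to 1.5 cents per machine-hour and about 1 million keys/sec per machine. At that rate, 400 machines for 18 hours come to $108, which matches the "about $100" estimate given earlier.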