- Running EMR jobs on cc2.8xlarge machines as spot instances is a great way to get a LOT of compute power very cheaply. Because our jobs are periodic, we run both Core and Task nodes as spot instances and simply retry the job if our machines get terminated. I did a lot of benchmarking and found that a small number of cc2.8xlarge machines outperforms, and costs less than, a large number of lesser instances (and I tried most of the lesser machines). In us-west-2 it's very uncommon to lose our instances, unlike us-east-1, which has major price fluctuations (this is true for all types of spot instance).
- The cr1.8xlarge has fantastic performance relative to the rest of the AWS machines. It's also very expensive compared to the cost of hardware or a similar solution on another cloud provider. Since we're fully integrated with AWS and don't want to run our own hardware, we're sucking up the cost for now, but it's definitely a sore point in our budget. The cr1.8xlarge is also a better all-round machine than the hi1.4xlarge, which has a lot of disk but is pitiful in terms of CPU.
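For anyone wondering what "simply retry the job" might look like, here is a rough sketch of a retry loop. It is not our actual code: `submit_job` and `ClusterTerminated` are hypothetical names standing in for whatever launches the cluster and whatever signals that the spot instances were reclaimed.

```python
import time


class ClusterTerminated(Exception):
    """Hypothetical error: spot instances were reclaimed before the job finished."""


def run_with_retries(submit_job, max_attempts=3, delay_seconds=0.0):
    """Call submit_job() until it succeeds or max_attempts is exhausted.

    submit_job is a hypothetical callable (an assumption for this sketch)
    that launches the cluster, runs the job, and raises ClusterTerminated
    if the spot instances are lost along the way.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return submit_job()
        except ClusterTerminated:
            if attempt == max_attempts:
                raise  # out of attempts, let the caller see the failure
            time.sleep(delay_seconds)  # pause before relaunching the cluster
```

This only works because the jobs are periodic and safe to rerun from scratch; a job with side effects would need more care.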
I took a look at Whirr, but I don't see how having a cloud-agnostic platform helps. Are there really alternatives to EMR out there? Can they give me 500+ cc2.8xlarge-equivalent machines on demand but at spot prices?
Assuming this is the Whirr to which you refer: https://whirr.apache.org/
Thanks for the tip!
thanks for sharing.