I am using a number of high-performance instances - cc2.8xlarge machines for EMR jobs, and cr1.8xlarge machines for analytics databases. From my testing:
- Running EMR jobs on cc2.8xlarge machines as spot instances is a great way to get a LOT of compute power very cheaply. Because our jobs are periodic, we run both Core and Task nodes as spots and simply retry the job if our machines get terminated. I did a lot of benchmarking and found that a small number of cc2.8xlarge machines outperforms, and is cheaper than, a large number of lesser instances (and I tried most of the lesser machines). In us-west-2 it's very uncommon to lose our instances, unlike us-east-1, which has major price fluctuations (this is true for all types of spot instances).
- The cr1.8xlarge has fantastic performance relative to the rest of the AWS machines. It's also very expensive compared to the cost of hardware or a similar solution on another cloud provider. Since we're fully integrated with AWS and don't want to run our own hardware, we're sucking up the cost for now, but it's definitely a sore point in our budget. The cr1.8xlarge is also a better all-round machine than the hi.4xlarge, which has a lot of disk but is pitiful in terms of CPU.
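For anyone curious what "simply retry the job" can look like, here is a minimal sketch. It assumes you have some callable that runs the job and raises when the spot machines are reclaimed. The names `ClusterLost`, `run_with_retries` and `run_job` are made up for illustration and are not part of EMR or any AWS library.

```python
import time


class ClusterLost(Exception):
    """Hypothetical signal that the spot instances were terminated mid-job."""


def run_with_retries(run_job, max_attempts=3, wait_seconds=0.0):
    """Call run_job() until it succeeds or max_attempts is used up.

    run_job is a hypothetical callable that starts a cluster, runs the
    job, and raises ClusterLost if the machines are terminated.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return run_job()
        except ClusterLost:
            if attempt == max_attempts:
                raise  # give up and surface the failure
            time.sleep(wait_seconds)  # optional pause before trying again
```

This only works well if the job is safe to rerun from scratch, which is the case for periodic batch jobs like ours.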
One thing to note with EMR: you still pay 25% of the on-demand price as an overhead for using EMR, even on spot instances. If you're bringing up and turning off clusters all the time, it's probably worth it, but you might want to look into using Whirr instead.
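To see how much that overhead matters on spots, here is a rough sketch of the arithmetic. It assumes the fee is a flat 25% of the on-demand price per instance-hour, as described above. The function names are made up, and the prices in the usage example are placeholders, not real AWS prices.

```python
def emr_hourly_cost(instance_price, on_demand_price, nodes, emr_fee_rate=0.25):
    """Hourly cost of a cluster on EMR.

    Each node costs what you pay for the instance (spot or on-demand)
    plus an EMR fee, assumed here to be emr_fee_rate * on_demand_price.
    """
    return nodes * (instance_price + emr_fee_rate * on_demand_price)


def ec2_hourly_cost(instance_price, nodes):
    """Hourly cost of the same nodes on plain EC2, with no EMR fee."""
    return nodes * instance_price


# Placeholder prices only: on-demand 2.00/hour, spot 0.50/hour, 10 nodes.
on_emr = emr_hourly_cost(instance_price=0.50, on_demand_price=2.00, nodes=10)
on_ec2 = ec2_hourly_cost(instance_price=0.50, nodes=10)
```

Because the fee is tied to the on-demand price rather than the spot price, it becomes a bigger share of the bill the cheaper your spots are.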
You've a good point about the EMR charge - that's easy to overlook.
I took a look at Whirr, but I don't see how having a cloud-agnostic platform helps. Are there really alternatives to EMR out there? Can they give me 500+ cc2.8xlarge-equivalent machines on demand but at spot prices?
Whirr just uses the AWS APIs to provision a cluster for you, but then you're getting a cluster built on plain EC2 instances rather than EMR, so you don't pay that EMR overhead. You can choose spot or on-demand instances. If your spots get reaped, I think it would fail in much the same way as an EMR cluster built on spots. I have no idea how quickly it could provision a 500-node cluster, however.
Ahh, I see. I thought it was about being able to move my EMR jobs to other providers. Running on EC2 without the EMR charge would be nice, and assuming there's no special-casing of spot requests for EMR (I don't know of any; the EMR spot instance requests show up like normal ones), it may well be able to get me that many instances.