
$1,279-per-hour, 30,000-core cluster built on Amazon EC2 cloud - duck
http://arstechnica.com/business/news/2011/09/30000-core-cluster-built-on-amazon-ec2-cloud.ars
======
dholowiski
I wonder what kind of 'bad activity' detection Amazon does, if any? Did they
have to call Amazon ahead of time, just to warn them they were about to boot
up "3,809" instances? If I tried to do that, would Amazon prevent it?

The reason I ask... how hard would it be to boot up, say 10,000 micro-
instances (using a stolen credit card or AWS account) to be used for a DDOS?
What do you have to do before red lights start appearing in the AWS NOC?

~~~
davidblair
Amazon limits you to 100 Spot instances per region without contacting them to
change the limit for your account.

[http://aws.amazon.com/ec2/faqs/#How_many_instances_can_I_run...](http://aws.amazon.com/ec2/faqs/#How_many_instances_can_I_run_in_Amazon_EC2)

~~~
pavel_lishin
That wouldn't prevent someone from creating multiple accounts with stolen
credit cards though, right? num_accounts * 100 = DDOS.

~~~
dholowiski
That wouldn't prevent it but it would make it much more difficult. Although, I
suppose you could automate the whole thing, right from the stealing of the
credentials to the booting up of 100 new instances and adding them to the ddos
cloud. I wonder if this has ever been done, or if the number of AWS users is
just too low to make it worthwhile (like the old argument of why macs don't
have viruses)?

------
ChuckMcM
This is an interesting result: the customer pays slightly less than a
million per month (1279 x 24 x 30 = 920,880) to run such a cluster. Using a
'Westmere' class processor (2 procs/mobo, 6 cores per proc), that is 2,500
machines; at 400W each, that's a MW of electricity (call it $111K/month at a
$0.15/kWh cost, including cooling). It would be interesting to price out the
other costs for the machines to understand what sort of revenue that would be.
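
Running those numbers as a quick sanity check (using the figures from the comment: 30,000 cores, dual-socket Westmere boxes at 12 cores each, 400W per box, $0.15/kWh with cooling folded in, and a 720-hour month):

```python
# Back-of-the-envelope check of the cluster economics above.
# Assumptions taken from the comment: 30,000 cores, 2 procs/mobo
# with 6 cores per proc, 400 W per machine, $0.15/kWh, 720-hour month.

cores = 30_000
cores_per_machine = 2 * 6                    # dual-socket, 6 cores/proc
machines = cores // cores_per_machine        # 2,500 machines

watts = machines * 400                       # 1,000,000 W = 1 MW
hours_per_month = 24 * 30                    # 720 hours
kwh_per_month = watts / 1000 * hours_per_month
power_cost = kwh_per_month * 0.15            # raw power, before cooling margin

revenue = 1279 * hours_per_month             # what the customer pays Amazon

print(f"{machines} machines, ${power_cost:,.0f}/mo power, ${revenue:,.0f}/mo revenue")
```

The raw power figure comes out to $108K/month, which is consistent with the ~$111K/month estimate once a cooling margin is added.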

I wondered about spreading it out around the country, though; seems like you
would incur a lot of latency, which might become a bottleneck.

~~~
ww520
The advantage of an AWS cluster is that you can shut it down when you don't
need it, whereas your own cluster needs to be running all the time to justify
its comparative cost.

~~~
ChuckMcM
Aye, I agree with that wholeheartedly. It's the whole timeshare market I
wonder about. Basically, if there is a market for a cluster of this 'size' at
about $1,300 an hour, could you build one of these and rent it out like
Amazon does, but more efficiently? (Sort of: can you cut costs by
specializing in a particularly lucrative segment of the market?)

So if 100% 'occupancy' on your cluster is worth a million a month, and your
cost of 'owning' a cluster of this scale is a quarter million a month, then
you need better than 25% occupancy to break even, and anything above that is
profit. It is an interesting financial exercise if nothing else.
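
That break-even point can be computed directly. (The 25% figure uses the round $1M revenue number; with the exact $920,880/month from the article, it comes out slightly higher.)

```python
# Break-even occupancy for the hypothetical owned cluster above.
# Assumptions from the thread: full-occupancy revenue of $1,279/hr
# over a 720-hour month, and a $250,000/month cost of ownership.

full_revenue = 1279 * 24 * 30        # $920,880/month at 100% occupancy
monthly_cost = 250_000               # hypothetical ownership cost

break_even = monthly_cost / full_revenue
print(f"break-even occupancy: {break_even:.1%}")
```

With the exact revenue figure the break-even is about 27%, so the rounded 25% in the comment is in the right ballpark.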

~~~
ajdecon
The company I work for, R Systems (<http://www.rsystemsinc.com/>), basically
does this. We own and admin several medium-sized compute clusters (200-500
node clusters) which are rented out to customers who need a lot of compute on
a temporary basis.

A lot of what we do is heavily custom, and we provide a lot of support for our
customers, but we still typically beat Amazon on both price and performance.
EC2 might work nicely for embarrassingly parallel workloads, but they don't
have Infiniband available if you're latency-bound... :)

------
ericHosick
Amazon's EC2 technology is quite amazing (to me anyway). We've been using
their CloudFormation service. This has allowed us to create our entire server
environment (rails/php, mysql, mongo, load balancer, route53, security groups,
alarms, etc.) for different stacks (sandboxes, staging, production) with a few
clicks.

"Cycle combines several technologies to ease the process" - Did that include
Amazon's CloudFormation service?

~~~
nivertech
I doubt it, since CFN (CloudFormatioN) doesn't support spot instances yet.

------
mrb
Chances are that this pharmaceutical company's "embarrassingly parallel"
workload could be ported to GPUs to run on anywhere from 1/10th to 1/100th the
number of machines.

Porting the application to GPUs may offer a good ROI if the company intends to
run this workload often enough.

~~~
ghshephard
Depends on how memory/storage intensive the tasks were: "26.7TB of RAM and
2PB (petabytes) of disk space."

One advantage of going with Amazon is the really high-speed, high-volume
ephemeral storage available per instance, in addition to your EBS-backed root
volume.

~~~
mrb
They sounded completely CPU-bound from the blog post. They went with "high-
CPU" c1.xlarge instances that had modest RAM and storage specs. They gave no
details and expressed no concerns whatsoever about RAM or storage bottlenecks.

[http://blog.cyclecomputing.com/2011/09/new-cyclecloud-cluste...](http://blog.cyclecomputing.com/2011/09/new-cyclecloud-cluster-is-a-triple-threat-30000-cores-massive-spot-instances-grill-chef-monitoring-g.html)

~~~
ghshephard
One way to find out - I posted a comment on that blog. We'll see if we get an
answer.

------
dotBen
If they have the headroom to spin up 30,000 cores, I wonder what their total
headroom is in each availability zone - and what percentage that is of their
total cores.

Sure, we'll never know - but it's an interesting thought experiment.

------
calloc
What I find more amazing than anything else is that Amazon apparently has
30,000 spare cores that one can ask for when needed.

~~~
robryan
Probably not spare, probably just running spot instances at a lower bid price.
You would assume that these guys paid a little bit of a premium to jump over
everyone else.

------
merrick
I worked on a small-scale cluster at a pharma company 6 years ago. The
scientists' goal was to discover new compounds to patent without having to do
the screening by hand. The cluster would screen thousands of compounds
simultaneously, all day long. It was very interesting work.

------
pavel_lishin
I'd be curious to see how much $$$ in terms of developer hours it took to set
this up.

~~~
pinko
I know the guys who did it. It took an enormous amount of hard-won experience
from similar but somewhat smaller spinups; experience that would be very
expensive to replicate with a fresh team. Given that experience, however, it
took them relatively few developer-hours to accomplish.

------
bmh100
I find this extremely interesting. Does anyone have any experience or strong
interest in this sort of work? I am looking to one day use many cores like
this, although at a smaller scale.

------
praeclarum
So when your latency goes from 150ms to 300ms, you'll know who to blame.

