

10,000-core Linux supercomputer built in Amazon cloud - jbrodkin
http://www.networkworld.com/news/2011/040611-linux-supercomputer.html
HPC vendor Cycle Computing recently built a 10,000-core Linux cluster on Amazon's Elastic Compute Cloud, in what might have been the largest HPC deployment to date on the Amazon service.
======
BoppreH
Curious about the expression, which I hadn't heard before, I went searching
and found that "embarrassingly parallel" is indeed the accepted technical term:
<http://en.wikipedia.org/wiki/Embarrassingly_parallel>
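
For illustration (my example, not from the thread): an embarrassingly parallel
job just maps independent work items across workers, with no communication
between tasks, e.g. with Python's multiprocessing:

```python
from multiprocessing import Pool

def simulate(task_id):
    # Each task is fully independent: no data exchanged with other tasks.
    return task_id * task_id  # stand-in for a real computation

if __name__ == "__main__":
    # Workers never talk to each other, so adding cores scales linearly.
    with Pool(4) as pool:
        results = pool.map(simulate, range(8))
    print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```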

------
codex
The speed of the interconnect really matters for most supercomputer problems.
Without knowing the characteristics of that interconnect I would be hesitant
to call 10,000 machines at Amazon a supercomputer.

~~~
Florin_Andrei
The article says the algorithm is "embarrassingly parallel", so in this case
it's fine.

But you're right, if there are any interdependencies then the interconnect
becomes important.

~~~
codex
Ah, forgive me; I didn't see that.

I wonder how they'd count Folding@Home, then: 500K active clients, 6M total
clients, but only a fraction of their clients are active at any given point in
time.

------
mgw
This makes me wonder how many idle computing resources Amazon keeps on
standby. Are there any public numbers on this?

~~~
BoppreH
The thing that always bothered me about cloud computing is where the cloud
gets its resources when _everybody_ has a spike, such as during holidays.

~~~
lallysingh
I recall EC2 going to complete crap during the holidays -- it is (or at least
was) their spare Christmas capacity.

~~~
amock
EC2 has never been spare capacity:
<http://www.quora.com/How-and-why-did-Amazon-get-into-the-cloud-computing-business>

------
jensnockert
The bottleneck in HPC is less often pure CPU horsepower, though; it is more
often cache or memory bandwidth, or the interconnect.

I guess you might be able to build a system in the cloud to provide TOP500
level of performance, but it would be pretty hard even with the fancy EC2 HPC
instances (<http://aws.amazon.com/ec2/hpc-applications/>).

~~~
quail_bird
better than might... you can: <http://www.top500.org/system/10661>

~~~
adestefan
Amazon can. That page has no information about how the nodes were allocated.
They could have hand-picked X racks of nodes that were all connected to the
same switch, etc. You don't get that guarantee from AWS.

~~~
quail_bird
Fair enough, I guess... they _could_ have done many things.

Although they don't provide an answer, here are some links with additional
info. I spent some time searching for details on the Top500 setup but found
little:

* <http://aws.typepad.com/aws/2010/07/the-new-amazon-ec2-instance-type-the-cluster-compute-instance.html>
* <http://news.ycombinator.com/item?id=1904590>

------
aeroevan
> its calculations were "embarrassingly parallel," with no communication
> between nodes

That's probably the only type of process that would work in the cloud. Most
HPC applications require lots of communication between nodes, so I don't think
I would call this a proper supercomputer.

~~~
arctangent
I agree. At large scales the speed of light becomes a limiting factor.

------
brianbreslin
I must say these are my favorite types of articles on HN. I also think these
are the perfect use cases for cloud computing platforms such as AWS. Not sure
why massively parallel and "embarrassingly parallel" computing intrigue me.

------
riobard
The best part of it? It doesn't cost millions of dollars to use! Only
thousands!

------
tarpsocks
Am I missing something? If performance scales linearly, they have 1,000
computers internally (1/10 * 10,000), and the job was said to take 8 hours.
That would only be 80 hours if they hadn't used this service.

This makes me believe someone is lying about something in this article.
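
The arithmetic in the comment above can be checked as a back-of-envelope
core-hour calculation (assuming the perfectly linear scaling the article
claims; the 1/10 figure is from the comment, not the article):

```python
cloud_cores = 10_000
cloud_hours = 8
internal_cores = cloud_cores // 10            # 1,000, per the 1/10 figure

total_core_hours = cloud_cores * cloud_hours  # 80,000 core-hours of work
internal_hours = total_core_hours / internal_cores
print(internal_hours)  # 80.0 hours on the in-house cluster
```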

~~~
kalleboo
Perhaps their internal capacity is already tied up in other tasks, so while
they have 1000 cores internally, they can't all be monopolized for 80 hours
for a single task like the AWS machines can.

------
nraynaud
who cares about P=NP when you can do that for $8000?

~~~
nraynaud
Thanks guys for telling me about P/NP

time for me to teach you something: <http://en.wikipedia.org/wiki/Humour>

~~~
jokermatt999
Humour is pretty much seen as noise on Hacker News. If a joke also has some
insight about the article, it will get upvotes, but if it's just a joke it
gets downvoted.

------
ruslan
Seems like those guys knew nothing about HPC. Why didn't they run the LINPACK
benchmark? It's essential for measuring any parallel computing system, even
one of just two cores. Also, any first-year CS student knows that the most
significant part of an HPC system is not the cores but the network. You need
to connect hosts using Infiniband or the like; regular Ethernet is futile
because of its high latency -- you will waste 90% of CPU cycles in data
exchange/synchronization wait loops. I bet they could achieve way better
results on just 1/3 the number of cores, or even fewer.
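
A rough sketch of that efficiency argument (the 90%-waste figure is from the
comment above; the 1/3 cluster size and the low-latency efficiency number are
hypothetical, for illustration only):

```python
def effective_cores(cores, efficiency):
    # efficiency = fraction of cycles doing real work rather than
    # waiting on network communication/synchronization.
    return cores * efficiency

# 90% of cycles wasted on a slow interconnect -> 1,000 useful cores
ethernet_cluster = effective_cores(10_000, 0.10)

# A third of the cores on a hypothetical low-latency fabric at 90% efficiency
infiniband_cluster = effective_cores(10_000 // 3, 0.90)

print(ethernet_cluster, infiniband_cluster)
```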

~~~
mukyu
LINPACK is about as useful a benchmark as BogoMips.

""". Genentech benefited from the high number of cores because its
calculations were "embarrassingly parallel," with no communication between
nodes, so performance stats "scaled linearly with the number of cores," Corn
said."""

That is a direct quote from the article.

