
$4,829-per-hour supercomputer built on Amazon cloud to fuel cancer research - evo_9
http://arstechnica.com/business/news/2012/04/4829-per-hour-supercomputer-built-on-amazon-cloud-to-fuel-cancer-research.ars
======
JunkDNA
While it's nice from a technical perspective, this is unlikely to lead to a
cancer cure. Having worked in cancer drug development, I can tell you, there
is no shortage of cancer targets. Researchers have a list of targets they want
to hit, and chemists are pretty darn good at designing small molecule
compounds to hit them.

The problem in cancer is not that we don't understand individual proteins or
the way that drugs bind to them (the problem being solved in this article). It
is that the biology of cancer is a crazy web of highly complex interactions
and feedback loops of which we have a pathetically rudimentary understanding.
So even when you think you're hitting a target that should kill the cancer,
you find out there's some side pathway that spools up and limits the drug's
efficacy (or worse, the cancer cells actively pump your compound out). If I
recall, something like >95% of new cancer therapies fail in clinical trials.
Most of the failure in this drug category is due to lack of efficacy (even
though they hit a target, they don't do jack for treating the cancer). If you
could make that number 85%, you'd probably be a Nobel contender.

Unfortunately, there's no digital route here. What needs to be done is a _lot_
of slow, messy "wet bench" biology. There is no electronic shortcut to
understanding how living cells work.

~~~
xaa
(Bioinformatician here). Although I think bench work is the most obvious route
and the most likely way these problems will be solved, there are _in
principle_ some computational ways they could be addressed.

If we had good computational models of how perturbations would affect
transcription networks, for example, we could predict these "side pathways"
that so often occur in humans but not in mice.
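The kind of perturbation modeling described above can be caricatured with a toy Boolean network. Everything here is hypothetical, invented purely for illustration: a made-up "MAIN" drug-target pathway, a compensatory "BACKUP" side pathway, and a downstream "GROWTH" readout.

```python
# Toy Boolean model of a signaling network with a compensatory "side pathway".
# Gene names and update rules are hypothetical, purely for illustration.

def step(state):
    """One synchronous update of the network."""
    return {
        # Growth is driven by the main pathway OR the backup pathway.
        "GROWTH": state["MAIN"] or state["BACKUP"],
        # The backup pathway spools up only when the main one is inhibited.
        "BACKUP": not state["MAIN"],
        # Externally controlled (the drug target).
        "MAIN": state["MAIN"],
    }

def simulate(main_on, steps=5):
    state = {"MAIN": main_on, "BACKUP": False, "GROWTH": False}
    for _ in range(steps):
        state = step(state)
    return state

# Untreated: the main pathway drives growth.
print(simulate(main_on=True)["GROWTH"])   # True
# Drug knocks out MAIN, but the backup pathway rescues growth anyway.
print(simulate(main_on=False)["GROWTH"])  # True
```

In this caricature, a model that only knew about MAIN would predict the drug works; only a model that encodes the feedback (BACKUP activating when MAIN is inhibited) predicts the loss of efficacy. Real transcription networks have thousands of nodes and mostly unknown rules, which is the hard part.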

But you're right, the reality is complex, and you have to deal not only with
transcription, but also translation, post-translational modification, non-coding
RNAs, the list goes on... And most current models of this type don't delve
into 3D simulation. Some people are working on whole-cell modeling, but that's
in its infancy.

Ultimately I believe the breakthroughs will come fastest if we can "close the
feedback loop" by automating a lot of bench biology, and then have computers
both generate and test hypotheses.

~~~
jboggan
(former bioinformatician here)

I agree. The totality of the interactions in a single cell is so many orders
of magnitude beyond what we are currently capable of modeling that I fear these
computational approaches are dangerously over-hyped. Having been privy to
state-of-the-art projects in a lab with a ~5,000-node cluster, I was still
disappointed to see how rough the whole-cell modeling approaches were. It's
really tough just to model a small corner of the cytoplasm and get the
diffusion of different proteins and metabolites right, let alone address
organelles or chromosomal folding and surface availability. It's a mess.

~~~
xaa
Over-hyped? In my neck of the woods these approaches are treated with extreme
skepticism for just the reasons you mention.

For instance: _De novo_ protein folding is not a solved problem, so how can a
simulation predict dynamics for a protein whose conformation isn't even known?

I'm sure in the year 2150 when my grandchildren go to the doctor to be scanned
by the tricorder, the results will go to the full-cell (and full-body)
simulator...but for now, I think bioinformatics is better served by sticking
to higher levels of abstraction like transcript and protein counts (for
disease modeling purposes).

------
reitzensteinm
> As impressive as it sounds, such a cluster can be spun up by anyone with the
> proper expertise, without talking to a single employee of Amazon.

This isn't actually true. There's initially an instance limit of 20, and you
have to contact Amazon to get it lifted. You could probably order just as many
servers at Softlayer or purchase them at Dell without talking to anyone
(except the guy who confirms your credit card).

After all, they're not going to let anyone run up a $3 million bill on the
hope that it'll be paid at the end of the month!

~~~
iqster
Bump ... I've had personal experience where I needed a few hundred VMs for a
short experiment. A quick message to Amazon got the job done. Took less than a
day to get approval.

------
garethsprice
$4,829 per hour isn't the most impressive metric here. The cluster ran for
only 3 hours, at a total cost of $14,486 - compare that to the build cost and
lead time involved in building a "$20-25m data center".

This has huge implications for the availability of supercomputers for smaller
organizations and use cases.

------
ajdecon
_The cluster used a mix of 10 Gigabit Ethernet and 1 Gigabit Ethernet
interconnects. However, the workload was what’s often known as "embarrassingly
parallel," meaning that the calculations are independent of each other. As
such, the speed of the interconnect didn’t really matter._

Ah, if only this were true for everything. It'd make everyone's lives a lot
easier. :-) Unfortunately there are a lot of applications out there where
inter-node communication is incredibly important: computational fluid
dynamics, in all its varied forms, is a good example of a latency-bound
application. This covers weather forecasting, aerodynamics, dynamic mechanical
modeling, and so on.
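The distinction can be sketched in a few lines: an embarrassingly parallel job is just an independent map over inputs, with no communication between workers. The scoring function below is a stand-in (Glide itself is proprietary), using an arbitrary hash so the example is self-contained.

```python
from multiprocessing import Pool

def score(compound_id):
    # Stand-in for one docking run: each input is scored independently,
    # with no data exchanged between workers -- "embarrassingly parallel".
    # (Arbitrary Knuth-style hash, just to produce a deterministic number.)
    return (compound_id, (compound_id * 2654435761) % 1000)

if __name__ == "__main__":
    with Pool(4) as pool:
        # Workers never communicate, so interconnect speed is irrelevant;
        # contrast with CFD, where every step needs neighbors' results.
        results = pool.map(score, range(100))
    print(min(results, key=lambda r: r[1]))
```

Because the only communication is scattering inputs and gathering results, this scales to instances spread across datacenters on different continents; a latency-bound solver would grind to a halt in the same setup.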

What I find interesting is that to get to 51,000 cores, they had to use AWS
datacenters all over the world. I'd love to know what kind of resources are
actually available in any given datacenter. It will vary at any given time,
but it would be useful to know how many cores are "close" to each other in a
networking sense, for applications where latency matters.

------
greglindahl
Wow. This press release even got covered in the NYT!

<http://bits.blogs.nytimes.com/2012/04/19/supercomputing-rented-by-the-hour/>

The actual feat accomplished here would not surprise anyone involved in
supercomputing, and companies have done very similar things with non-
scientific tasks, such as this article from 2008:

<http://open.blogs.nytimes.com/2008/05/21/the-new-york-times-archives-amazon-web-services-timesmachine/>

describing how the New York Times itself did a similar computation.

------
tybris
Stowe was just speaking at AWS summit, lots of AWS goodness there:
<http://aws.amazon.com/live/>

------
drostie
Man, and I thought it was bad when I set in motion a 3-hour computation and
lost it due to a bug in the code that writes the results to disk. At least it
didn't set me back $15k to redo the computation...

------
iqster
I was in the audience at the NY AWS summit ... quite a loud applause when they
gave their per-hour cost number. I was also impressed. However, I found it a
bit surprising that it took 2 hours to acquire the VMs and 1 hour to do the
actual work.

~~~
reustle
I was there as well. Pretty cool stuff!

------
ChuckMcM
Two things struck me about that article, the pharmaceutical guy who felt like
with enough cores they could find the cure to cancer, and Cycle's challenge of
moving past 50,000 cores.

What struck me is that I wonder why Google (or Amazon) hasn't put out the cure
for cancer. The actual extent of Google's infrastructure is classified, but
using open sources it's clear that putting together even half a million 'cores'
is not a huge project for them. (Think of it this way: they spent a billion
dollars building a data center whose actual building/land/etc. estimates out
at about $200M.)

~~~
kisielk
Google does invest in life sciences startups:
<http://www.googleventures.com/portfolio#life-sciences>

~~~
ChuckMcM
Perhaps I should be a bit more clear and less snarky. I think Cycle has put
out a great press release: it advertises their product well, and like any good
press release it doesn't read so much like an advertisement for a particular
company as it does like real news. This is great execution of the PR technique
known as 'article placement.'

One of the qualities that makes it a great soundbite is that Schrodinger, the
company that is nominally the topic of the story, goes on about how their
1,500-core cluster can't give them the resolution they need but a 50,000-core
cluster makes everything clear. Understand that a 'Westmere' processor is
potentially 12 cores (if you use Intel's definition of threads, and I'm sure
they do), and the typical motherboard is 2 CPUs, so that is 24 'cores' per
machine [1]. A 1,500-core cluster is about 62 machines; that is a couple of
cabinets' worth if you're using Supermicro boxes, less than a cabinet if
you're using OpenCompute-type cabinets [2]. And at maybe $3K each that is an
investment of $186K, maybe $250K if you include switches. Which is about 1/3
of what it would have cost a pharmaceutical company to buy a VAX minicomputer
back in the day.

My point is that if you're in a multi-billion-dollar marketplace, you can
afford to spend more on your hardware. And even at approximately $5K/hr, a
50,000-core cluster is only about 2,100 Open Compute servers in 24 of their
'triplet' cabinets.
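The back-of-the-envelope sizing above can be reproduced directly (the per-box price is ChuckMcM's rough estimate, not a quoted figure):

```python
# Reproducing the sizing arithmetic above. Prices are rough estimates.
cores_per_machine = 24                          # 2 Westmere sockets x 12 threads

machines_1500 = 1500 // cores_per_machine       # a couple of cabinets' worth
machines_50k = -(-50000 // cores_per_machine)   # ceiling division, ~2,100 boxes

hardware_cost = machines_1500 * 3000            # at ~$3K per box, pre-switches

print(machines_1500, machines_50k, hardware_cost)
```

That yields 62 machines for the 1,500-core cluster (about $186K in boxes) and 2,084 machines for 50,000 cores, consistent with the "~2,100 Open Compute servers" figure.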

That is about 1MW of critical compute power (1MW being the power commitment
you would have to buy from a colocation center), and even _that_ is a fairly
small footprint at Amazon, Google, Facebook, or even Apple with its new $1B
data center [3].

So when I read the story, I was left thinking, "Gee, if using a cluster that
was 33x bigger got them such great results, why not use one that's 3,000x
bigger? Wouldn't that just answer the question?" And of course I took a moment
to analyze that thought and asked the question every critical reader has to
ask, which is, "What is this article trying to say anyway? And do I believe
it?"

That is when it became obvious that what the article is really saying is that
Cycle, the company that makes a living creating virtual supercomputers for
embarrassingly parallel problems out of EC2 instances, has reached the point
where they can get 50K cores running the same problem. Which is great and all,
but like a long story that is just a setup for a bad pun, it leaves me feeling
jaded. Hence my snarky remark that if all it takes is more cores, Google
should stop trying to be a great advertising company and switch to being a
pharmaceutical company. When you say that out loud, you realize it couldn't
possibly be that easy, and yes, it's a snarky way of expressing irritation
that I was led to believe there was something newsworthy here when there
wasn't.

[1] <http://opencompute.org/projects/intel-motherboard/>

[2] <http://opencompute.org/projects/triplet-racks/>

[3] <http://gigaom.com/apple/apples-new-north-carolina-data-center-ready-to-roll-2/>

------
xaa
How efficient is this compared with running your own cluster? 2x slower? 10x?
In addition to latency between nodes, surely you're paying some kind of
performance penalty for virtualization?

~~~
michaelgrosner
Our department, which currently runs a 1000+ core machine and a 2500+ core
machine for MD simulations, is still waiting to jump onto the Amazon/cloud
bandwagon, mostly because it's still not cheaper than owning a cluster for a
few years. Granted, being at a large research university means they're not the
only clusters on campus, and there's an infrastructure already in place to
maintain them.

In terms of speed, I would assume Amazon to be faster than most clusters since
they probably offer the latest and greatest computers. Lastly, MD is an
embarrassingly parallel problem (don't ask me how it is), so latency isn't a
major issue.

------
sentinel
In what programming language would this cross-CPU simulation software be
built?

~~~
bbgm
The actual application used was Glide,
<http://www.schrodinger.com/products/14/5/>, which is likely Fortran or C++
given the type of app. I don't know what Cycle's core software for managing
the whole infrastructure is written in, but they use a lot of Chef, schedulers
like Torque or Condor, and Boto to orchestrate the EC2 side.

------
rgc4
And yet, no results from their massive computation. Schrodinger is well known
for being a company of liars and frauds, and unfriendly to open source and
other ideals of our community.

~~~
anusinha
To be fair, Gaussian is orders of magnitude worse. See
<http://www.bannedbygaussian.org/>.

