

Ask HN: Workflow for Scientific Computing in the Cloud? - diab0lic

I have recently come into the situation where I need to run cloud computing on demand for my research. Amazon's EC2 Spot Instances are an ideal platform for this, as I can requisition an appropriate instance for the given experiment {high CPU, high memory, GPU instance} depending on its needs. However, I currently spin up the instance manually, set it up, run the experiment, and then terminate it manually. Monitoring experiments for completion gets tedious, and I incur unnecessary costs if, for example, a job finishes while I'm sleeping. The whole thing really should be automated. I'm looking for a workflow somewhat similar to this:

Manually create an Amazon Machine Image (AMI) for the experiment.

Manually issue a command to start the AMI on a specified spot instance type.

Automatically connect EBS to the instance for result storage.

Automatically run the specified experiment; bonus if this can be parameterized.

Automatically terminate the spot instance on job completion.

Something like Docker that spun up on-demand spot instances of a specified type for each run and terminated said instance at run completion would be absolutely perfect. I also know HTCondor can back onto EC2 spot instances, but I haven't really been able to find any concise information on how to set up a personal cloud. I also think managing an HTCondor installation may negate the savings of all the automation. Do any other HN users have similar problems? How did you solve it? What is your workflow? Thanks!

NOTE: I posted this to Slashdot as well, but reposted here as I don't think it'll get much attention there.
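For illustration, the five steps could be sketched in Python roughly as follows. This is only a sketch under assumptions, not a tested deployment: it assumes the third-party `boto3` library with credentials already configured, an existing EBS volume in the same availability zone as the instance, and that the volume shows up inside the instance as `/dev/xvdf` (the name varies by instance type). The helper names (`build_user_data`, `build_launch_request`, `launch`) and the result path are made up for the example.

```python
import shlex


def build_user_data(command, params, device="/dev/xvdf", mount_point="/mnt/results"):
    """Build the boot script: wait for the volume, run the job, power off.

    `device` and `mount_point` are assumptions; adjust them for your AMI.
    """
    # Turn {"lr": 0.01} into "--lr=0.01" so runs can be parameterized.
    args = " ".join(shlex.quote(f"--{k}={v}") for k, v in sorted(params.items()))
    lines = [
        "#!/bin/bash",
        # The volume is attached after boot, so poll until the device exists.
        f"while [ ! -e {device} ]; do sleep 5; done",
        f"mkdir -p {mount_point}",
        f"mount {device} {mount_point}",
        f"{command} {args} > {mount_point}/run.log 2>&1",
        # No `set -e`: the machine powers off even if the experiment fails.
        "shutdown -h now",
    ]
    return "\n".join(lines) + "\n"


def build_launch_request(ami_id, instance_type, availability_zone, user_data):
    """Keyword arguments for boto3's EC2 `run_instances` call."""
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "MinCount": 1,
        "MaxCount": 1,
        "UserData": user_data,
        "Placement": {"AvailabilityZone": availability_zone},
        # Intended to make the OS-level shutdown above terminate the instance.
        "InstanceInitiatedShutdownBehavior": "terminate",
        "InstanceMarketOptions": {
            "MarketType": "spot",
            "SpotOptions": {"SpotInstanceType": "one-time"},
        },
    }


def launch(ami_id, instance_type, availability_zone, volume_id, command, params):
    """Start the spot instance and attach the results volume (needs AWS access)."""
    import boto3  # third-party; imported here so the builders work without it

    ec2 = boto3.client("ec2")
    user_data = build_user_data(command, params)
    request = build_launch_request(ami_id, instance_type, availability_zone, user_data)
    instance_id = ec2.run_instances(**request)["Instances"][0]["InstanceId"]
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
    ec2.attach_volume(VolumeId=volume_id, InstanceId=instance_id, Device="/dev/sdf")
    return instance_id
```

A run would then be a single call such as `launch("ami-...", "c3.large", "us-east-1a", "vol-...", "python run.py", {"seed": 3})`, with the IDs replaced by your own. The script assumes the volume already carries a filesystem.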
======
jrmcauliffe
I don't have a scientific computing workflow or background, but the tool AWS
recommends for automated deployment to AWS is CloudFormation:

[http://aws.amazon.com/cloudformation/](http://aws.amazon.com/cloudformation/)

Connecting Spot Instances to EBS volumes isn't really possible via their APIs,
though. Amazon's API seems to regard Spot Instances as stateless compute
entities, so you need to push the results to a real on-demand instance (with
an attached EBS volume) or to something like S3.
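As a rough sketch of the "push the results somewhere durable" option, the job could copy its output directory to S3 before the instance goes away. This assumes the third-party `boto3` library and an existing bucket; the function names, bucket, and prefix are placeholders for the example.

```python
import os


def plan_uploads(results_dir, prefix):
    """Map every file under results_dir to an S3 key below prefix."""
    plan = []
    for root, _dirs, files in os.walk(results_dir):
        for name in files:
            path = os.path.join(root, name)
            rel = os.path.relpath(path, results_dir)
            # S3 keys always use forward slashes, whatever the local OS uses.
            key = "/".join([prefix.strip("/")] + rel.split(os.sep))
            plan.append((path, key))
    return sorted(plan)


def push_results(results_dir, bucket, prefix):
    """Upload the planned files (needs AWS credentials with write access)."""
    import boto3  # third-party; imported here so plan_uploads works without it

    s3 = boto3.client("s3")
    for path, key in plan_uploads(results_dir, prefix):
        s3.upload_file(Filename=path, Bucket=bucket, Key=key)
```

Calling `push_results("/mnt/results", "my-bucket", "run-42")` as the last step of the job, before shutting down, keeps the spot instance itself stateless.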

If you have massive compute requirements, companies like

[http://www.cyclecomputing.com/](http://www.cyclecomputing.com/)

will do the hard work for you. Otherwise, with a bit of configuration work, you
can spin up a master node with an EBS volume along with a set of spot
instances (of any type) to perform the work.

Check out the CloudFormation template examples. There's bound to be something
that's pretty close to your use case (including software installs, etc.).
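To give a feel for what such a template contains, here is a minimal sketch that builds one as a Python dict and prints it as JSON. It covers only the master node and its EBS results volume; the spot workers are left out, and it assumes an account with a default VPC. The function name, the logical resource names (`Master`, `ResultsVolume`, `ResultsMount`), and the AMI ID are placeholders, not taken from any official example.

```python
import json


def build_template(ami_id, master_type, volume_gb):
    """Return a CloudFormation template (as a dict) for a master node + volume."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Master node with an EBS results volume (sketch)",
        "Resources": {
            "Master": {
                "Type": "AWS::EC2::Instance",
                "Properties": {"ImageId": ami_id, "InstanceType": master_type},
            },
            "ResultsVolume": {
                "Type": "AWS::EC2::Volume",
                "Properties": {
                    "Size": volume_gb,
                    # A volume must live in the same zone as its instance.
                    "AvailabilityZone": {"Fn::GetAtt": ["Master", "AvailabilityZone"]},
                },
            },
            "ResultsMount": {
                "Type": "AWS::EC2::VolumeAttachment",
                "Properties": {
                    "InstanceId": {"Ref": "Master"},
                    "VolumeId": {"Ref": "ResultsVolume"},
                    "Device": "/dev/sdf",
                },
            },
        },
    }


if __name__ == "__main__":
    # Placeholder values; substitute your own AMI, instance type, and size.
    print(json.dumps(build_template("ami-12345678", "m3.medium", 100), indent=2))
```

The printed JSON can be saved to a file and used as the starting point for a stack.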

