
Ask HN: What is the best way to run a scientific experiment on a cloud platform? - georgiev
My current project is to try and classify proteins by one of their properties and I'm currently exploring which ML algorithm has the best accuracy on the dataset. I'm trying out 4-5 classifiers from python's sklearn library and I want to run them in parallel on a cloud platform. The reason for running them in the cloud is that they take 6GB of memory each so I don't want to run them on my computer.

My current solution is the following script: http://pastebin.com/6X6JsXzH but I'm wondering if there's a better/easier way to do this.
======
im_down_w_otp
Seems like a perfectly good way to solve your problem. If you wanted something
that was a little more "conventional" to act as your provisioner and
coordinator you could use Ansible for what you're doing here.

You might find that it gives you a helpful model/template for setting up a
group of machines, ensuring they're configured as you need, that your
artifacts get deployed to them consistently, and also that your jobs run as
you intended and the results can be collected and pushed someplace.

I suggest Ansible, rather than other similar tools, entirely because it's so
simple in scope of responsibility. It's a pretty straightforward DSL for
orchestrating servers (just pushing commands to them to execute) that has no
additional dependencies besides SSH and Python. Since it seems like that's all
you really need for this task, there's not a lot of reason to use bigger
frameworks/platforms.
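
To make the "just pushing commands over SSH" model concrete, here's a rough
Python sketch of the idea. It is not Ansible and not the script from the
pastebin, only plain `ssh` driven from the standard library. The host names,
the `run_classifier.py` script and its `--classifier` flag are all made up
for illustration, and it assumes key-based SSH access is already set up.

```python
import shlex
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_ssh_command(host, classifier, script="run_classifier.py"):
    """Build the argv for running one classifier job on one remote host.

    `script` and the `--classifier` flag are hypothetical placeholders.
    """
    remote = f"python {shlex.quote(script)} --classifier {shlex.quote(classifier)}"
    return ["ssh", host, remote]


def assign_jobs(hosts, classifiers):
    """Pair each classifier with its own host (one 6GB job per machine)."""
    if len(hosts) < len(classifiers):
        raise ValueError("need at least one host per classifier")
    return list(zip(hosts, classifiers))


def run_all(hosts, classifiers, runner=subprocess.run):
    """Start every job at once and wait for all of them to finish.

    `runner` defaults to subprocess.run; pass a stub to dry-run the plan.
    Results come back in the same order as `classifiers`.
    """
    jobs = assign_jobs(hosts, classifiers)
    if not jobs:
        return []
    with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
        futures = [
            pool.submit(runner, build_ssh_command(host, clf),
                        capture_output=True, text=True)
            for host, clf in jobs
        ]
        return [future.result() for future in futures]
```

Calling `run_all(["host1", "host2"], ["svm", "random_forest"])` would start
both jobs at the same time and hand back one `CompletedProcess` per job, so
you can read `stdout` and `returncode` for each. Ansible layers inventory
files, idempotent configuration and result collection on top of roughly this
loop.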

Or... just keep using your shell scripts. :-)

~~~
georgiev
Thanks for the Ansible recommendation. I was wondering if I was reinventing
the wheel but I guess I wasn't.

