
Ask HN: Cloud server vs. custom PC for running machine learning algorithms - tsaprailis
Hello all,

I have been pondering the pros and cons of building a high-end PC vs. renting a high-performance instance from a cloud provider (e.g. a P2 instance from AWS) for running Data Science/Machine Learning algorithms.

Has anyone faced the same dilemma before, and if so, what were the deciding factors for you?
======
boulos
I'm biased (Disclosure: I work on Google Cloud), but having a single GPU in
your local machine for testing and then doing real training on a cloud
provider seems best.

If you load your workstation at home with the best GPUs money can buy today,
you've spent a big pile of cash and in six-to-twelve months it's no longer the
best (see our announcement about bringing P100s to Compute Engine in the next
few months). Moreover, your single awesome machine can only train one thing at
a time. By renting several such VMs (on us or other providers), you can use
distributed training to iterate more quickly on your models or explore totally
different problems in parallel.

Doing the same at home means buying another machine for each parallel job. With
the pay-as-you-go model, if you were going to compute the answer anyway, it
costs (roughly) the same to do it all at once as to do it serially. So why wait?
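As a rough sketch of that pay-as-you-go arithmetic (the hourly rate below is a made-up illustration, not a real cloud price):

```python
# Hypothetical hourly rate for one GPU VM (illustrative only, not a quote).
rate_per_hour = 1.50   # dollars per VM-hour
gpu_hours_needed = 100  # total compute the training job requires

# Serial: one VM grinding for 100 hours.
serial_cost = rate_per_hour * gpu_hours_needed          # ~4 days wall clock

# Parallel: ten VMs for 10 hours each -- same total VM-hours.
num_vms = 10
parallel_cost = rate_per_hour * num_vms * (gpu_hours_needed / num_vms)

# Same spend either way; the parallel run just finishes ~10x sooner.
print(serial_cost, parallel_cost)
```

The wrinkle in practice is that distributed training rarely scales perfectly linearly, so "roughly the same cost" is the honest phrasing.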

Again, big disclosure: I work on Google Cloud, and have ML Services and GPUs
to sell you.

------
trengrj
Provided your own desktop or laptop is powerful enough, I would just use that
for testing and development. Cloud wins out over a custom PC build because of
the flexibility you get.

Want to run an algorithm on 10 nodes for 5 hours? No problem. Want to leave
something running for days connecting to the twitter firehose? Sure.

However, building PCs and your own clusters is quite fun! Check these out:
[https://www.picocluster.com/collections/cubes/products/pico-...](https://www.picocluster.com/collections/cubes/products/pico-3-odroid-c2-cluster-cube)

------
brudgers
Curious what hardware you currently have available, what workloads you are
considering running, and for what purpose. I mean:

1. Today's low-end hardware is the high-performance hardware of just a few
years ago.

2. Most data isn't big, and if the data is big, then moving compute to the data
is usually a better practice than moving data to the compute.

