
ALSO: For all the discussion of on-prem, for ML in particular, consider running training on a dedicated local hardware box and running only inference in the cloud (which can be CPU-only).
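
A minimal sketch of that split, assuming PyTorch and a TorchScript export (the toy model, file path, and input shape are all stand-ins, not anyone's actual setup):

    import torch
    import torch.nn as nn

    # --- local GPU box: train, then export a portable artifact ---
    model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 2))
    if torch.cuda.is_available():
        model = model.cuda()
    # ... training loop elided ...
    model = model.eval().cpu()
    torch.jit.script(model).save("model.pt")  # TorchScript: no model class needed at load time

    # --- cloud CPU instance: load the artifact and serve inference only ---
    model = torch.jit.load("model.pt", map_location="cpu")
    with torch.inference_mode():
        out = model(torch.randn(1, 16))  # stand-in for a real request payload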



I’ve been mulling this idea over recently: investing $2-3k in building a machine to do exactly that (and using it as a normal day-to-day dev machine when it’s not training), because the economics of it appear surprisingly good.

Have you (or anyone else here) had experience doing this? Did it end up being a worthwhile approach, even for a while?


It depends on how long it's on.

If you're only training for a short while, you may do better setting up a cloud training workflow that keeps the server on only while training (a sketch of that pattern follows below). If it's on a lot, then a private box makes more sense (e.g., Lambda Labs, at home/office/colo). Then set it up as a shared box for the team.
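
A minimal sketch of that on-demand pattern, assuming AWS/boto3 and a hypothetical instance ID; the same start/run/stop idea applies to any provider with an instance API:

    import boto3

    # Hypothetical instance ID for a GPU training box that stays stopped between runs
    INSTANCE_ID = "i-0123456789abcdef0"
    ec2 = boto3.client("ec2")

    def run_training_job(launch_remote_job):
        """Boot the GPU instance, run the job, then stop it so billing stops too."""
        ec2.start_instances(InstanceIds=[INSTANCE_ID])
        ec2.get_waiter("instance_running").wait(InstanceIds=[INSTANCE_ID])
        try:
            launch_remote_job()  # e.g., ssh in and kick off the training script
        finally:
            # stop (not terminate) so the disk and environment persist
            ec2.stop_instances(InstanceIds=[INSTANCE_ID])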

A lot of the time ends up being dev work, not actual training, and folks end up accidentally leaving dev cloud GPUs on. We still use cloud GPUs for that, but do primary dev on local GPU laptops. We started with System76 machines for everyone (Ubuntu + Nvidia), but those had major issues (weight, battery drain...). I then tried a lightweight ASUS Zenbook myself, but that was too lightweight all around. Next time I'll pick something in between or explore ThinkPad options.

And yep, as a small team, this mix dropped our cloud opex by like 90%, and it was pretty fast to offset the capex bump.
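
Back-of-the-envelope break-even, with made-up but plausible numbers (the box price echoes the $2-3k upthread; the cloud rate and weekly hours are assumptions):

    # All numbers are made up for illustration
    box_capex = 2500        # one-time cost of the local box, USD
    cloud_rate = 3.00       # on-demand cloud GPU, USD per hour
    hours_per_week = 40     # dev + training time the GPU is actually on

    weekly_cloud_cost = cloud_rate * hours_per_week    # $120/week
    breakeven_weeks = box_capex / weekly_cloud_cost    # ~21 weeks
    print(f"~{breakeven_weeks:.0f} weeks to offset the capex")

Accidentally-left-on cloud GPUs (mentioned above) only tilt this further toward the local box.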



